Data Reference & Preprocessing
Dataset: data/cumulative_2025.09.20_12.15.37.csv
Columns used
- koi_period (days)
- koi_prad (Earth radii)
- koi_teq (K)
- koi_srad (Solar radii)
- koi_slogg (log10 cm/s^2)
- koi_steff (K)
- koi_impact (dimensionless)
- koi_duration (hours)
- koi_depth (ppm)
Label mapping
- CONFIRMED → 1
- CANDIDATE → 1
- FALSE POSITIVE → 0
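As a rough illustration of the mapping above, here is a minimal sketch in plain Python. The column name koi_disposition and the helper encode_disposition are assumptions made for this example; the source does not say which column holds the disposition or how the package encodes it internally.

```python
# Mapping copied from the list above.
LABEL_MAP = {"CONFIRMED": 1, "CANDIDATE": 1, "FALSE POSITIVE": 0}


def encode_disposition(value):
    """Hypothetical helper: return 1 or 0 for a known disposition, else None."""
    return LABEL_MAP.get(value.strip().upper())


# Toy rows; "koi_disposition" is an assumed column name, not taken from the source.
rows = [
    {"koi_disposition": "CONFIRMED"},
    {"koi_disposition": "CANDIDATE"},
    {"koi_disposition": "FALSE POSITIVE"},
]
labels = [encode_disposition(r["koi_disposition"]) for r in rows]
print(labels)  # → [1, 1, 0]
```

Note that CANDIDATE and CONFIRMED share label 1, so the classifier separates false positives from everything else rather than confirmed planets from candidates.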
Preprocessing
- SimpleImputer(mean) → models/imputer.pkl
- StandardScaler → models/scaler.pkl
- Train/test split: test_size=0.2, random_state=42
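The steps above could look roughly like the following scikit-learn sketch. It runs on synthetic data, not the KOI table, and makes several assumptions that the source does not confirm: the split happens before the imputer and scaler are fitted, the transformers are saved with pickle, and the files go to a temporary directory instead of models/.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the nine feature columns (not real KOI data).
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 9))
X[::10, 0] = np.nan  # inject a few missing values into the first column
y = rng.integers(0, 2, size=100)

# Split with the parameters listed above.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumption: fit on the training split only, then reuse on the test split.
imputer = SimpleImputer(strategy="mean")
scaler = StandardScaler()
X_train_p = scaler.fit_transform(imputer.fit_transform(X_train))
X_test_p = scaler.transform(imputer.transform(X_test))

# Assumption: plain pickle files; a temp directory stands in for models/.
out_dir = tempfile.mkdtemp()
for name, obj in (("imputer.pkl", imputer), ("scaler.pkl", scaler)):
    with open(os.path.join(out_dir, name), "wb") as fh:
        pickle.dump(obj, fh)
```

Fitting on the training split alone keeps test-set statistics out of the imputed means and scaling factors. If the package fits on the full dataset instead, the saved transformers will differ slightly from the ones in this sketch.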
Train with a new CSV
from exobengal.exobengal import DetectExoplanet

# Point data_path at a CSV that contains the columns listed above.
m = DetectExoplanet()
m.train_random_forest(data_path="data/your_new_file.csv")
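Before training on a new file, it may help to confirm that the CSV has the feature columns listed under "Columns used". The sketch below uses only the standard library. The helper missing_columns is hypothetical and not part of exobengal, and skipping lines that start with # is an assumption in case the file begins with a comment header. The label column is not checked, since the source does not name it.

```python
import csv
import io

# Feature columns from the "Columns used" list.
REQUIRED_COLUMNS = [
    "koi_period", "koi_prad", "koi_teq", "koi_srad", "koi_slogg",
    "koi_steff", "koi_impact", "koi_duration", "koi_depth",
]


def missing_columns(lines, required=REQUIRED_COLUMNS):
    """Hypothetical helper: list required columns absent from the CSV header."""
    # Ignore leading comment lines, then read the first real row as the header.
    data_lines = (ln for ln in lines if not ln.startswith("#"))
    header = next(csv.reader(data_lines), [])
    present = {name.strip() for name in header}
    return [col for col in required if col not in present]


# In-memory sample that lacks koi_depth; a real file opened with open() works the same way.
sample = io.StringIO(
    "# comment line\n"
    "koi_period,koi_prad,koi_teq,koi_srad,koi_slogg,koi_steff,koi_impact,koi_duration\n"
    "9.49,2.26,793,0.93,4.47,5455,0.15,2.96\n"
)
print(missing_columns(sample))  # → ['koi_depth']
```

An empty list means every listed feature column is present in the header.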