
Data Reference & Preprocessing

Dataset: data/cumulative_2025.09.20_12.15.37.csv

Columns used

  • koi_period (days)
  • koi_prad (Earth radii)
  • koi_teq (K)
  • koi_srad (Solar radii)
  • koi_slogg (log10 cm/s^2)
  • koi_steff (K)
  • koi_impact (dimensionless impact parameter)
  • koi_duration (hours)
  • koi_depth (ppm)
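
As a rough illustration of selecting the columns listed above, the following sketch loads a CSV with pandas and keeps only those nine features. The names `FEATURE_COLUMNS` and `load_features` are hypothetical and not part of exobengal. The `comment="#"` argument assumes the file may begin with `#`-prefixed comment lines, as NASA Exoplanet Archive exports often do.

```python
import io

import pandas as pd

# The nine feature columns listed in this section, in the same order.
FEATURE_COLUMNS = [
    "koi_period", "koi_prad", "koi_teq", "koi_srad", "koi_slogg",
    "koi_steff", "koi_impact", "koi_duration", "koi_depth",
]


def load_features(csv_source):
    """Read a CSV (path or file-like object) and return only the feature columns.

    Hypothetical helper: lines starting with '#' are treated as comments,
    which is an assumption about how the file was exported.
    """
    df = pd.read_csv(csv_source, comment="#")
    missing = [c for c in FEATURE_COLUMNS if c not in df.columns]
    if missing:
        raise KeyError(f"CSV is missing expected columns: {missing}")
    return df[FEATURE_COLUMNS]
```

Checking for missing columns up front gives a clearer error than a failure later in training, which can help when pointing the code at a newly downloaded file.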

Label mapping

  • CONFIRMED → 1
  • CANDIDATE → 1
  • FALSE POSITIVE → 0
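
The mapping above can be sketched as a small lookup. `LABEL_MAP` and `encode_labels` are hypothetical names used only for illustration. Which column holds the disposition strings is not stated here; in the Kepler cumulative table it is probably `koi_disposition`, but treat that as an assumption.

```python
# Disposition string -> binary label, exactly as listed in this section.
LABEL_MAP = {"CONFIRMED": 1, "CANDIDATE": 1, "FALSE POSITIVE": 0}


def encode_labels(dispositions):
    """Convert an iterable of disposition strings into a list of 0/1 labels.

    Hypothetical helper: surrounding whitespace and letter case are
    normalized, and any value outside LABEL_MAP raises ValueError rather
    than being silently dropped.
    """
    encoded = []
    for value in dispositions:
        key = str(value).strip().upper()
        if key not in LABEL_MAP:
            raise ValueError(f"unexpected disposition: {value!r}")
        encoded.append(LABEL_MAP[key])
    return encoded
```

Note that CANDIDATE and CONFIRMED share label 1, so a model trained on these labels separates likely planets from false positives rather than confirmed planets from everything else.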

Preprocessing

  • SimpleImputer(strategy="mean") → models/imputer.pkl
  • StandardScaler → models/scaler.pkl
  • Train/test split: test_size=0.2, random_state=42
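
A minimal sketch of these three steps with scikit-learn follows. The function `preprocess` and its `model_dir` parameter are hypothetical. The order of operations inside exobengal is not stated here, so this sketch makes its own choice: it splits first and fits the imputer and scaler on the training portion only. Using `pickle` for the `.pkl` files is likewise an assumption.

```python
import pickle
from pathlib import Path

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler


def preprocess(X, y, model_dir=None):
    """Split, impute, and scale features; optionally save the fitted transformers.

    Hypothetical helper, not the library's implementation.
    """
    # Split parameters as listed above.
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=42
    )

    # Fit on the training rows only, then apply the same transform to test rows.
    imputer = SimpleImputer(strategy="mean")
    scaler = StandardScaler()
    X_train = scaler.fit_transform(imputer.fit_transform(X_train))
    X_test = scaler.transform(imputer.transform(X_test))

    if model_dir is not None:
        # File names follow the ones listed above (imputer.pkl, scaler.pkl).
        model_dir = Path(model_dir)
        model_dir.mkdir(parents=True, exist_ok=True)
        with open(model_dir / "imputer.pkl", "wb") as f:
            pickle.dump(imputer, f)
        with open(model_dir / "scaler.pkl", "wb") as f:
            pickle.dump(scaler, f)

    return X_train, X_test, y_train, y_test
```

Fitting the transformers on the training split alone keeps test-set statistics out of the imputed means and scaling parameters. If the library fits them on the full dataset instead, results may differ slightly.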

Train with a new CSV

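from exobengal.exobengal import DetectExoplanet

# Create the detector, then retrain the random forest on a different CSV.
m = DetectExoplanet()
# data_path is the path to the CSV file to train on.
m.train_random_forest(data_path="data/your_new_file.csv")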