Models & Artifacts

The models/ directory contains pre-trained machine learning models and preprocessing artifacts for exoplanet detection. All models are trained on NASA Exoplanet Archive Kepler mission data.

Model Artifacts

1. Random Forest Classifier

  • File: random_forest_classifier.pkl
  • Type: Scikit-learn RandomForestClassifier
  • Size: ~5-10 MB
  • Description: Ensemble classifier combining multiple decision trees for robust predictions
  • Performance: ~90% accuracy, AUC-ROC 0.94-0.96

2. Decision Tree Classifier

  • File: decision_tree_classifier.pkl
  • Type: Scikit-learn DecisionTreeClassifier
  • Size: ~1-3 MB
  • Description: Single decision tree for interpretable rule-based classification
  • Performance: ~85% accuracy, AUC-ROC 0.88-0.91

3. CNN Model

  • File: cnn_model.h5
  • Type: TensorFlow/Keras Sequential neural network
  • Size: ~500 KB - 2 MB
  • Description: Deep learning model with dense layers and dropout regularization
  • Architecture: Input(9) → Dense(64) → Dropout(0.3) → Dense(32) → Dropout(0.3) → Dense(16) → Dropout(0.3) → Output(1)
  • Performance: ~89% accuracy, AUC-ROC 0.92-0.95
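As a rough sanity check on the architecture line above, the number of trainable parameters can be worked out from the layer widths alone. The sketch below assumes every dense layer has a bias term (the Keras default for Dense); the helper name is illustrative and not part of exobengal.

```python
def dense_param_counts(layer_sizes):
    """Return the trainable parameter count of each fully connected layer.

    A dense layer mapping n_in inputs to n_out units has n_in * n_out
    weights plus n_out biases. Dropout layers add no parameters.
    """
    return [
        n_in * n_out + n_out
        for n_in, n_out in zip(layer_sizes, layer_sizes[1:])
    ]


# Layer widths taken from the architecture line: Input(9) ... Output(1).
sizes = [9, 64, 32, 16, 1]
counts = dense_param_counts(sizes)
total = sum(counts)
print(counts, total)  # [640, 2080, 528, 17] 3265
```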

4. k-Nearest Neighbors

  • File: knn_model.pkl
  • Type: Scikit-learn KNeighborsClassifier
  • Size: ~10-20 MB (stores training data)
  • Description: Instance-based classifier using k=5 nearest neighbors
  • Performance: ~87% accuracy, AUC-ROC 0.90-0.93
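The reason a k-NN artifact stores the training data is that every prediction compares the query against all stored points. The following is a minimal brute-force sketch of that idea with k=5 and Euclidean distance, using made-up two-feature toy data; scikit-learn's KNeighborsClassifier may use faster index structures internally, so this is an illustration rather than the library's implementation.

```python
import math


def knn_positive_fraction(train_X, train_y, query, k=5):
    """Fraction of the k nearest training points whose label is 1."""
    # Every prediction scans the full training set, which is why
    # inference cost grows with the amount of stored data.
    distances = sorted(
        (math.dist(row, query), label)
        for row, label in zip(train_X, train_y)
    )
    nearest_labels = [label for _, label in distances[:k]]
    return sum(nearest_labels) / k


# Toy data (hypothetical): three negatives near the origin,
# four positives near (1, 1).
train_X = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1),
           (1.0, 1.0), (1.1, 1.0), (1.0, 1.1), (0.9, 0.9)]
train_y = [0, 0, 0, 1, 1, 1, 1]

fraction = knn_positive_fraction(train_X, train_y, (0.8, 0.8))
print(fraction)  # 0.8: four of the five nearest neighbors are positive
```

With k=5 the score can only take the values 0.0, 0.2, 0.4, 0.6, 0.8, or 1.0.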

5. Standard Scaler

  • File: scaler.pkl
  • Type: Scikit-learn StandardScaler
  • Size: ~1-5 KB
  • Description: Feature normalization to zero mean and unit variance
  • Critical: Inference must use the same fitted scaler that was saved during training

6. Simple Imputer

  • File: imputer.pkl
  • Type: Scikit-learn SimpleImputer
  • Size: ~1-5 KB
  • Description: Fill missing values using mean strategy
  • Critical: Inference must use the same fitted imputer that was saved during training
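To see why the scaler and imputer must be the ones fitted at training time, it helps to look at what they store: per-column fill values, means, and standard deviations learned from the training data. The sketch below reproduces that idea in plain Python. It is a simplified illustration, not scikit-learn's code: missing values are represented as None rather than NaN, and the function names are hypothetical.

```python
import statistics


def fit_preprocessing(rows):
    """Learn per-column fill values, means, and standard deviations."""
    columns = list(zip(*rows))
    # Mean strategy: the fill value is the mean of the observed entries.
    fill = [
        statistics.fmean([v for v in col if v is not None])
        for col in columns
    ]
    imputed_cols = [
        [f if v is None else v for v in col]
        for col, f in zip(columns, fill)
    ]
    mean = [statistics.fmean(col) for col in imputed_cols]
    # Population standard deviation; constant columns fall back to 1.0.
    std = [statistics.pstdev(col) or 1.0 for col in imputed_cols]
    return {"fill": fill, "mean": mean, "std": std}


def transform(row, params):
    """Impute, then standardize, one row using previously fitted params."""
    imputed = [
        f if v is None else v for v, f in zip(row, params["fill"])
    ]
    return [
        (v - m) / s
        for v, m, s in zip(imputed, params["mean"], params["std"])
    ]


train = [[1.0, 10.0], [2.0, None], [3.0, 30.0]]
params = fit_preprocessing(train)       # done once, at training time
new_row = transform([2.0, None], params)  # reused unchanged at inference
print(new_row)  # [0.0, 0.0]
```

Refitting these statistics on different data would shift every transformed feature, so a model trained against the old values would receive inputs on a different scale.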

Model Architectures & Hyperparameters

Random Forest

  • n_estimators: 100
  • max_depth: None
  • threshold: 0.5

CNN

  • hidden_layers: [64,32,16]
  • dropout: 0.3
  • epochs: 50
  • threshold: 0.6

kNN

  • n_neighbors: 5
  • metric: euclidean
  • threshold: 0.6

Decision Tree

  • max_depth: 10
  • criterion: gini
  • threshold: 0.6
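The threshold values listed above determine how a predicted probability becomes a yes/no classification. A minimal sketch of that step follows, assuming the comparison is "greater than or equal to" (the library may use a strict comparison instead) and using a made-up score of 0.55; the dictionary keys and function name are illustrative.

```python
# Thresholds as listed in the hyperparameter lists above.
THRESHOLDS = {
    "random_forest": 0.5,
    "cnn": 0.6,
    "knn": 0.6,
    "decision_tree": 0.6,
}


def is_planet(model_name, probability):
    """Return True when the positive-class probability meets the threshold.

    Assumption: the comparison is ">="; the library may use ">" instead.
    """
    return probability >= THRESHOLDS[model_name]


score = 0.55  # hypothetical probability, the same for every model
decisions = {name: is_planet(name, score) for name in THRESHOLDS}
print(decisions)
```

Under these assumptions the same score of 0.55 counts as a detection for Random Forest but not for the other three models, so thresholds matter when comparing outputs across models.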

Performance Comparison

Model          | Accuracy | AUC-ROC   | Training Time       | Inference Speed | Best For
Random Forest  | ~90%     | 0.94-0.96 | Medium (1-2 min)    | Fast            | Production, general use
CNN            | ~89%     | 0.92-0.95 | Slow (5-15 min CPU) | Fast            | Complex patterns, research
kNN            | ~87%     | 0.90-0.93 | Fast (instant)      | Slow            | Quick prototyping
Decision Tree  | ~85%     | 0.88-0.91 | Fast (10-20 sec)    | Fast            | Education, interpretability

Prediction (auto-load)

Models are loaded automatically on first prediction. Before predicting, the detector checks whether the requested model is already loaded and, if not, calls the corresponding load method.

from exobengal import DetectExoplanet

detector = DetectExoplanet()
# A single sample: nine feature values, in the order used during training
sample = [365.0, 1.0, 288.0, 1.0, 4.44, 5778, 0.1, 5.0, 100.0]
print(detector.random_forest(sample))
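The load-on-first-use behavior described above is a common lazy-loading pattern. The sketch below shows the general idea with a hypothetical wrapper class and a stand-in loader; it is not exobengal's actual code, and the class and attribute names are assumptions made for illustration.

```python
class LazyModel:
    """Hypothetical wrapper illustrating lazy loading; not part of exobengal."""

    def __init__(self, loader):
        self._loader = loader  # zero-argument callable that returns a model
        self._model = None     # nothing is loaded until first use
        self.load_count = 0

    def load(self):
        """Load the model eagerly (the explicit pre-loading path)."""
        self._model = self._loader()
        self.load_count += 1

    def predict(self, sample):
        # Load only if an earlier call has not already done so.
        if self._model is None:
            self.load()
        return self._model(sample)


def fake_loader():
    # Stand-in for reading a model file; returns a trivial "model".
    return lambda sample: sum(sample) > 0


lazy = LazyModel(fake_loader)
first = lazy.predict([1.0, 2.0])     # triggers the load
second = lazy.predict([-1.0, -2.0])  # reuses the loaded model
print(first, second, lazy.load_count)  # True False 1
```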

Explicit Pre-loading

from exobengal import DetectExoplanet

detector = DetectExoplanet()
detector.load_rf_model()  # Pre-load for faster predictions

# The first prediction no longer pays the model-loading cost
result = detector.random_forest([365.0, 1.0, 288.0, 1.0, 4.44, 5778, 0.1, 5.0, 100.0])
print(result)

Retraining

Each train_* method overwrites its model file and rewrites scaler.pkl and imputer.pkl. Always back up existing models before retraining.
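The backup step can also be done from Python. The helper below is a small sketch using only the standard library; the function name and the timestamped directory naming are assumptions, not part of exobengal.

```python
import shutil
import time
from pathlib import Path


def backup_models(models_dir="models", backup_root="."):
    """Copy models_dir to a timestamped directory and return its path."""
    source = Path(models_dir)
    if not source.is_dir():
        raise FileNotFoundError(f"No such directory: {source}")
    stamp = time.strftime("%Y%m%d_%H%M%S")
    target = Path(backup_root) / f"{source.name}_backup_{stamp}"
    # copytree raises FileExistsError if the target already exists,
    # so an earlier backup is never silently overwritten.
    shutil.copytree(source, target)
    return target
```

Calling backup_models("models") before a train_* method leaves a copy such as models_backup_20250101_120000/ next to the original directory.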

Important Notes

  • Each train_* method overwrites its model file and rewrites scaler.pkl and imputer.pkl
  • Always back up existing models before retraining: cp -r models/ models_backup/
  • Training automatically handles data loading, preprocessing, an 80/20 train/test split, and evaluation
  • Outputs include a classification report, a confusion matrix, and the AUC-ROC score
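For readers unfamiliar with the evaluation outputs listed above, the sketch below shows an 80/20 split, a binary confusion matrix, and AUC-ROC computed in plain Python on made-up labels and scores. It illustrates the quantities only; how the library shuffles, seeds, or stratifies its split is not stated here, so those details are assumptions.

```python
import random


def split_80_20(rows, seed=42):
    """Shuffle a copy of rows and return (train, test) in an 80/20 ratio."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * 0.8)
    return shuffled[:cut], shuffled[cut:]


def confusion_counts(y_true, y_pred):
    """Return (tn, fp, fn, tp) for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tn = pairs.count((0, 0))
    fp = pairs.count((0, 1))
    fn = pairs.count((1, 0))
    tp = pairs.count((1, 1))
    return tn, fp, fn, tp


def auc_roc(y_true, scores):
    """Chance that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one example per class.
    """
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Made-up labels and predicted probabilities for six test samples.
y_true = [0, 0, 1, 1, 1, 0]
scores = [0.1, 0.4, 0.35, 0.8, 0.9, 0.7]
y_pred = [int(s >= 0.5) for s in scores]

train_rows, test_rows = split_80_20(range(10))
tn, fp, fn, tp = confusion_counts(y_true, y_pred)
accuracy = (tn + tp) / len(y_true)
auc = auc_roc(y_true, scores)
print(len(train_rows), len(test_rows), (tn, fp, fn, tp), accuracy, auc)
```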

Retraining Example

from exobengal import DetectExoplanet

detector = DetectExoplanet()

# Retrain Random Forest with custom hyperparameters
detector.train_random_forest(
    data_path="data/cumulative.csv",
    n_estimators=200,
    max_depth=20
)

Loading Custom Models

You can specify custom model paths when initializing the detector:

from exobengal import DetectExoplanet

detector = DetectExoplanet(
    rf_model_path="/path/to/my_rf_model.pkl",
    scaler_path="/path/to/my_scaler.pkl",
    imputer_path="/path/to/my_imputer.pkl"
)

Troubleshooting

Common Issues

  • Model file not found: Train the models first or download them from the repository
  • TensorFlow warnings: Safe to ignore; alternatively, install tensorflow-cpu
  • Inconsistent predictions: Ensure the same scaler and imputer are used for training and inference
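When diagnosing a "model file not found" error, it can help to list which of the documented artifact files are absent. The helper below is a sketch using the file names from the Model Artifacts section; the function itself is hypothetical and not part of exobengal, and it assumes the default models/ directory layout.

```python
from pathlib import Path

# File names as listed in the Model Artifacts section.
EXPECTED_ARTIFACTS = [
    "random_forest_classifier.pkl",
    "decision_tree_classifier.pkl",
    "cnn_model.h5",
    "knn_model.pkl",
    "scaler.pkl",
    "imputer.pkl",
]


def missing_artifacts(models_dir="models"):
    """Return the expected artifact file names absent from models_dir."""
    root = Path(models_dir)
    return [
        name for name in EXPECTED_ARTIFACTS
        if not (root / name).is_file()
    ]
```

An empty list from missing_artifacts("models") means every documented file is present; any names returned need to be trained or downloaded first.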