Documentation
Tutorials & Learning
A comprehensive learning path of Jupyter notebooks that covers the complete pipeline, from installation through training to inference. Follow the step-by-step guides to learn exoplanet detection with machine learning.
Learning Path
Our tutorials are organized into progressive phases, each building on the previous one. Start with Phase 1 if you're new to ExoBengal, or jump to advanced topics if you're already familiar with the basics.
- Phase 1: Getting Started (15-30 min) - Basic setup and first prediction
- Phase 2: Understanding the Data (30-45 min) - Exoplanet parameters and their meanings
- Phase 3: Training Your First Model (45-60 min) - Train a Random Forest classifier
- Phase 4: Exploring Different Algorithms (60-90 min) - Compare all four models
- Phase 5: Hyperparameter Tuning (90-120 min) - Optimize model performance
- Phase 6: Advanced Topics (120+ min) - Batch predictions, ensemble methods
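Phase 3 centers on training a Random Forest classifier, and the later phases on evaluating and tuning it. The sketch below gives a rough idea of that workflow in plain scikit-learn on synthetic data. It does not use the ExoBengal API, and the synthetic feature matrix and hyperparameter values (n_estimators, max_depth) are illustrative assumptions, not the tutorials' settings.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import classification_report, confusion_matrix, roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the exoplanet feature table (assumption: the real
# tutorials build their features from cumulative.csv instead).
X, y = make_classification(
    n_samples=500, n_features=8, n_informative=5, random_state=42
)

# Hold out 20% of the rows for evaluation, keeping the class balance.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Illustrative hyperparameters only.
model = RandomForestClassifier(n_estimators=100, max_depth=8, random_state=42)
model.fit(X_train, y_train)

y_pred = model.predict(X_test)               # hard class labels
y_score = model.predict_proba(X_test)[:, 1]  # probability of the positive class

print(classification_report(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)
auc = roc_auc_score(y_test, y_score)
print(cm)
print(f"AUC-ROC: {auc:.3f}")
```

The same fit, predict, and score steps apply when the classifier is swapped for k-Nearest Neighbors or a Decision Tree, which is what makes the side-by-side comparison in Phase 4 straightforward.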
Available Notebooks
test.ipynb for local development and pip_test.ipynb for Google Colab with Drive integration
Training Models
Train Random Forest, CNN, k-Nearest Neighbors, and Decision Tree models with custom hyperparameters on NASA data
Making Predictions
Load pre-trained models and classify exoplanet candidates with probability scores and ESI calculation
Running Locally
Set up Jupyter Lab/Notebook on your machine and run the tutorials in a local Python environment
Running on Colab
Use Google Colab's free GPU for faster training with Drive mounting and pip installation
Notebook Walkthrough
A cell-by-cell explanation of test.ipynb, with expected outputs and how to interpret them
Prerequisites
- Python 3.8+ required
- Dependencies: numpy, pandas, matplotlib, seaborn, scikit-learn, joblib, tensorflow
- Data: NASA Exoplanet Archive cumulative table (cumulative.csv)
- Hardware: 4 GB RAM minimum, 8 GB RAM recommended
- For Colab: Google account for Drive access
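Before opening the notebooks, it can help to confirm that the environment meets the list above. The following is a minimal sketch using only the standard library. The helper names (find_missing, python_ok) are made up for this example, and the module list simply mirrors the dependencies named above, with scikit-learn checked under its import name sklearn.

```python
import importlib.util
import sys

# Import names for the listed dependencies; scikit-learn is imported as "sklearn".
REQUIRED_MODULES = [
    "numpy", "pandas", "matplotlib", "seaborn", "sklearn", "joblib", "tensorflow",
]


def find_missing(modules):
    """Return the module names that cannot be located in this environment."""
    return [name for name in modules if importlib.util.find_spec(name) is None]


def python_ok(minimum=(3, 8)):
    """Return True if the running interpreter is at least the minimum version."""
    return sys.version_info >= minimum


missing = find_missing(REQUIRED_MODULES)
print("Python 3.8+:", "yes" if python_ok() else "no")
print("Missing modules:", ", ".join(missing) if missing else "none")
```

find_spec only locates a module without importing it, so the check stays fast even when TensorFlow is installed.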
Quick Start
Local Setup
cd ExoBengal/tutorial
jupyter lab
# Open test.ipynb
Google Colab
Click the "Open in Colab" badge in pip_test.ipynb, mount Google Drive when prompted, and run the pip install cell.
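For orientation, the Drive mounting step typically looks like the sketch below. It uses the google.colab drive.mount call, which is only available inside a Colab runtime, so the import is guarded. The helper name mount_drive_if_colab and the /content/drive mount point are assumptions for this example, and the actual cells in pip_test.ipynb may differ.

```python
def mount_drive_if_colab(mount_point="/content/drive"):
    """Mount Google Drive when running inside Colab; return True if mounted."""
    try:
        from google.colab import drive  # only importable inside Colab
    except ImportError:
        print("Not running in Colab; skipping Drive mount.")
        return False
    drive.mount(mount_point)  # prompts for authorization in the notebook
    return True


in_colab = mount_drive_if_colab()

# In a Colab cell, the install step would then be a shell command such as:
#   !pip install exobengal
# (package name taken from the "Common Issues" list; treat it as an assumption)
```

Outside Colab the function simply reports that it skipped the mount, so the same cell can be left in a notebook that is also run locally.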
What You'll Learn
- How to set up ExoParams with exoplanet parameters
- Training workflows for all four ML models
- Making predictions with trained models
- Comparing model performance and outputs
- Understanding evaluation metrics (classification reports, confusion matrices, AUC-ROC)
- Calculating and interpreting Earth Similarity Index (ESI)
- Hyperparameter tuning for better accuracy
- Batch predictions and ensemble methods
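To give a feel for the Earth Similarity Index mentioned above, here is a sketch of one commonly used two-parameter form, based on planet radius and stellar flux in Earth units. This is an assumption for illustration: ExoBengal may use a different formulation (for example one with more parameters and weights), and the function name esi_radius_flux is hypothetical.

```python
import math


def esi_radius_flux(radius_earth, flux_earth):
    """Two-parameter ESI from radius and stellar flux, both in Earth units.

    Uses ESI = 1 - sqrt(0.5 * [((S - 1)/(S + 1))^2 + ((R - 1)/(R + 1))^2]),
    where Earth's own values are S = R = 1.
    """
    if radius_earth <= 0 or flux_earth <= 0:
        raise ValueError("radius and flux must be positive")
    r_term = ((radius_earth - 1.0) / (radius_earth + 1.0)) ** 2
    s_term = ((flux_earth - 1.0) / (flux_earth + 1.0)) ** 2
    return 1.0 - math.sqrt(0.5 * (r_term + s_term))


# Earth itself scores exactly 1; values closer to 1 mean more Earth-like.
print(esi_radius_flux(1.0, 1.0))
print(round(esi_radius_flux(2.0, 0.5), 3))
```

In this form the index depends only on how far each parameter is from Earth's value relative to their sum, so a planet with twice Earth's radius is penalized as much as one with half of it.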
Common Issues
- Data file not found: Ensure cumulative.csv is in the data/ directory
- TensorFlow installation issues: Install tensorflow-cpu on CPU-only machines
- Memory errors: Reduce the CNN batch size or use a subset of the data
- Slow training: Enable the GPU runtime in Colab or reduce the number of epochs
- Import errors: Reinstall the exobengal package
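The first issue in the list is usually a working-directory problem. A small check like the following sketch can make the failure explicit before any training code runs. The helper name locate_data_file is made up for this example, and the path is assumed to be relative to the directory the notebook is started from.

```python
from pathlib import Path


def locate_data_file(base_dir=".", relative_path="data/cumulative.csv"):
    """Return the path to the data file, or raise with the full path it expected."""
    path = Path(base_dir) / relative_path
    if not path.is_file():
        raise FileNotFoundError(
            f"Expected data file at {path.resolve()}; "
            "download the cumulative table and place it there."
        )
    return path


try:
    print("Found", locate_data_file())
except FileNotFoundError as err:
    print(err)
```

Printing the resolved absolute path shows which directory the notebook is actually running from, which is often the real cause.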
Next Steps
After completing the tutorials:
- Explore the API Reference for detailed class documentation
- Read the Model Artifacts documentation for architecture details
- Try the live Cerebrium API for production deployments
- Contribute to the project on GitHub