Challenge
A simple 'Yes/No' isn't enough for medical AI. I wanted to provide a 'Why' for every prediction.
Solution
Integrated SHAP explainability directly into the API response to highlight top physiological risk factors.
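As a minimal sketch of the response-building step: the feature names and SHAP values below are illustrative stand-ins; in the real pipeline they would come from something like shap.TreeExplainer applied to the fitted XGBoost model for a single patient row.

```python
import numpy as np

# Illustrative per-feature SHAP contributions for one patient.
# In practice these would come from shap.TreeExplainer(model).
feature_names = ['Glucose', 'BMI', 'Age', 'BloodPressure', 'Insulin']
shap_values = np.array([0.42, 0.18, -0.05, 0.11, -0.21])  # made-up numbers

def top_risk_factors(names, values, k=3):
    # Rank features by absolute SHAP contribution, largest first
    order = np.argsort(np.abs(values))[::-1][:k]
    return [
        {'feature': names[i],
         'impact': round(float(values[i]), 3),
         'direction': 'increases risk' if values[i] > 0 else 'decreases risk'}
        for i in order
    ]

# The API response pairs the prediction with its top drivers
response = {
    'prediction': 'high risk',
    'explanation': top_risk_factors(feature_names, shap_values),
}
```

Sorting by absolute value keeps strong protective factors (negative SHAP values) in the explanation instead of only risk-increasing ones.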
File Explorer
src
train.py
data
diabetes.csv
train.py
import optuna
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

def objective(trial):
    # Bayesian optimization with Optuna: sample candidate hyperparameters
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 7),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
    }

    # imblearn's Pipeline applies SMOTE only when fitting on training folds,
    # so the validation folds are never oversampled
    pipeline = ImbPipeline([
        ('imputer', KNNImputer(n_neighbors=5)),
        ('scaler', StandardScaler()),
        ('smote', SMOTE(random_state=42)),
        ('model', XGBClassifier(**params)),
    ])

    # X_train / y_train come from the train split of data/diabetes.csv
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    return cross_val_score(pipeline, X_train, y_train, cv=cv, scoring='roc_auc').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)
Output
Initializing Medical ML Environment...
Loading PIMA Indians Dataset...
Ready.
Learning Optimization
My exploration of using Optuna to find the best hyperparameters for medical diagnosis.
Medical Insight
I learned that automated tuning is much more reliable than manual guessing for complex models.
Pipeline Specs
Dataset
PIMA Indians
Stack
XGBoost + Optuna
Result
83.4% ROC-AUC
Focus
Search Space
Key Technologies
- Cross-Validation
- SMOTE Imbalance Correction
- KNN Imputation
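A small, hedged illustration of the KNN imputation step: the toy values below are made up, and in the Pima data the physiologically impossible zeros in fields like Glucose and BMI are typically converted to NaN before this stage.

```python
import numpy as np
from sklearn.impute import KNNImputer

# Toy two-feature matrix (e.g. Glucose, BMI) with missing readings as NaN
X = np.array([
    [80.0, 22.0],
    [np.nan, 23.0],
    [90.0, np.nan],
    [85.0, 24.0],
])

# Each missing entry is filled with that feature's mean over the nearest
# rows (2 here; the pipeline above uses n_neighbors=5)
X_filled = KNNImputer(n_neighbors=2).fit_transform(X)
```

Unlike mean imputation, this preserves local structure: a patient is filled in from similar patients rather than from the whole cohort.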