Challenge

A simple 'Yes/No' isn't enough for medical AI. I wanted to provide a 'why' for every prediction.

Solution

Integrated SHAP explainability directly into the API response to highlight top physiological risk factors.
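As a sketch of what that response could look like: assuming the SHAP values for one patient have already been computed (e.g. with shap.TreeExplainer on the trained XGBoost model), the API layer only needs to rank features by absolute contribution. The payload shape, field names, and SHAP values below are illustrative, not the actual implementation.

```python
# Illustrative sketch: turning precomputed SHAP values into an API payload.
# Feature names match the PIMA dataset; the SHAP values are hypothetical.
def build_explanation(shap_values, feature_names, top_k=3):
    """Rank features by absolute SHAP contribution, build a response dict."""
    ranked = sorted(
        zip(feature_names, shap_values),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return {
        "risk_factors": [
            {"feature": name, "impact": round(value, 3)}
            for name, value in ranked[:top_k]
        ]
    }

features = ["Glucose", "BMI", "Age", "BloodPressure", "Insulin"]
shap_row = [0.42, 0.18, -0.05, 0.02, -0.11]  # hypothetical, one patient

payload = build_explanation(shap_row, features)
# Glucose ranks first (largest |impact|), then BMI, then Insulin.
```

Signed impacts are kept in the payload so the client can show whether a factor pushed the risk up or down.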

train.py
import optuna
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

def objective(trial):
    # Bayesian optimization with Optuna: sample one candidate configuration.
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 7),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
    }

    # SMOTE sits inside the pipeline so oversampling happens per CV fold,
    # never on the validation split.
    pipeline = ImbPipeline([
        ('imputer', KNNImputer(n_neighbors=5)),
        ('scaler', StandardScaler()),
        ('smote', SMOTE(random_state=42)),
        ('model', XGBClassifier(**params))
    ])

    # Fixed random_state keeps the fold splits identical across trials,
    # so scores are comparable.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    return cross_val_score(pipeline, X_train, y_train, cv=cv, scoring='roc_auc').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

Learning Optimization

My exploration into using Optuna to find the best hyperparameters for medical diagnosis.
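Stripped of the libraries, the core loop can be illustrated with plain random search over the same search space (Optuna's TPE sampler proposes candidates more cleverly, but the loop shape is identical). The quadratic toy_score below is a stand-in for cross-validated ROC-AUC, with a made-up optimum; it is not the real objective.

```python
import math
import random

random.seed(42)

def sample_params():
    # Same search space as the study; learning_rate is drawn log-uniformly,
    # matching suggest_float(..., log=True).
    return {
        'n_estimators': random.randint(100, 500),
        'max_depth': random.randint(3, 7),
        'learning_rate': math.exp(random.uniform(math.log(0.01), math.log(0.2))),
    }

def toy_score(params):
    # Stand-in for CV ROC-AUC: peaks at max_depth=5, learning_rate=0.05.
    return (1.0
            - (params['max_depth'] - 5) ** 2 * 0.01
            - (math.log(params['learning_rate']) - math.log(0.05)) ** 2 * 0.02)

# "study.optimize(objective, n_trials=100)" reduced to its essence:
best = max((sample_params() for _ in range(100)), key=toy_score)
```

Log-uniform sampling matters here: on a linear scale, most draws of learning_rate would cluster near 0.2, starving the small values that often work best.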

Medical Insight

I learned that automated Bayesian tuning is far more reliable than manual guessing for a pipeline like this: with imputation, scaling, and resampling all interacting with the model's hyperparameters, the search space is too entangled to tune by hand.

Pipeline Specs

Dataset: PIMA Indians
Stack: XGBoost + Optuna
Result: 83.4% ROC-AUC
Focus: Search Space

Key Technologies

  • Cross-Validation
  • SMOTE Imbalance Correction
  • KNN Imputation
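Of these, SMOTE is the easiest to demystify: each synthetic minority sample is a random point on the line segment between a real minority sample and one of its minority-class nearest neighbors. A minimal pure-Python sketch of that interpolation follows; the real imblearn implementation adds k-NN search, randomized neighbor selection, and edge-case handling. The two "patients" are hypothetical.

```python
import random

random.seed(0)

def smote_interpolate(sample, neighbor, alpha=None):
    """Create one synthetic point between a minority sample and a neighbor.

    alpha in [0, 1] places the result on the segment sample -> neighbor;
    SMOTE draws alpha uniformly at random, as done here by default.
    """
    if alpha is None:
        alpha = random.random()
    return [s + alpha * (n - s) for s, n in zip(sample, neighbor)]

# Two hypothetical minority-class patients (glucose, BMI):
a = [150.0, 33.0]
b = [160.0, 35.0]

synthetic = smote_interpolate(a, b)
# Each coordinate of `synthetic` lies between the corresponding
# coordinates of a and b.
midpoint = smote_interpolate(a, b, alpha=0.5)  # exactly halfway: [155.0, 34.0]
```

Because interpolation happens in feature space, running it after imputation and inside each CV fold (as in the pipeline above) keeps the synthetic points realistic and leak-free.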