Challenge

A simple 'Yes/No' isn't enough for medical AI. I wanted to provide a 'why' for every prediction.

Solution

Integrated SHAP explainability directly into the API response to highlight top physiological risk factors.
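As a sketch of what that response could look like: assuming the SHAP values for one patient have already been computed (e.g. with shap.TreeExplainer on the trained XGBoost model), the API layer only needs to rank features by absolute contribution. The payload shape, field names, and SHAP values below are illustrative, not the actual implementation.

```python
# Illustrative sketch: turning precomputed SHAP values into an API payload.
# Feature names match the PIMA dataset; the SHAP values are hypothetical.
def build_explanation(shap_values, feature_names, top_k=3):
    """Rank features by absolute SHAP contribution, build a response dict."""
    ranked = sorted(
        zip(feature_names, shap_values),
        key=lambda pair: abs(pair[1]),
        reverse=True,
    )
    return {
        "risk_factors": [
            {"feature": name, "impact": round(value, 3)}
            for name, value in ranked[:top_k]
        ]
    }

features = ["Glucose", "BMI", "Age", "BloodPressure", "Insulin"]
shap_row = [0.42, 0.18, -0.05, 0.02, -0.11]  # hypothetical, one patient

payload = build_explanation(shap_row, features)
# Glucose ranks first (largest |impact|), then BMI, then Insulin.
```

Signed impacts are kept in the payload so the client can show whether a factor pushed the risk up or down.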

train.py
import optuna
from imblearn.over_sampling import SMOTE
from imblearn.pipeline import Pipeline as ImbPipeline
from sklearn.impute import KNNImputer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.preprocessing import StandardScaler
from xgboost import XGBClassifier

def objective(trial):
    # Bayesian optimization with Optuna: sample one candidate configuration.
    params = {
        'n_estimators': trial.suggest_int('n_estimators', 100, 500),
        'max_depth': trial.suggest_int('max_depth', 3, 7),
        'learning_rate': trial.suggest_float('learning_rate', 0.01, 0.2, log=True),
    }

    # SMOTE sits inside the pipeline so oversampling happens per CV fold,
    # never on the validation split.
    pipeline = ImbPipeline([
        ('imputer', KNNImputer(n_neighbors=5)),
        ('scaler', StandardScaler()),
        ('smote', SMOTE(random_state=42)),
        ('model', XGBClassifier(**params))
    ])

    # Fixed random_state keeps the fold splits identical across trials,
    # so scores are comparable.
    cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
    return cross_val_score(pipeline, X_train, y_train, cv=cv, scoring='roc_auc').mean()

study = optuna.create_study(direction='maximize')
study.optimize(objective, n_trials=100)

Learning Optimization

My exploration into using Optuna to find the best hyperparameters for medical diagnosis.
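Stripped of the libraries, the core loop can be illustrated with plain random search over the same search space (Optuna's TPE sampler proposes candidates more cleverly, but the loop shape is identical). The quadratic toy_score below is a stand-in for cross-validated ROC-AUC, with a made-up optimum; it is not the real objective.

```python
import math
import random

random.seed(42)

def sample_params():
    # Same search space as the study; learning_rate is drawn log-uniformly,
    # matching suggest_float(..., log=True).
    return {
        'n_estimators': random.randint(100, 500),
        'max_depth': random.randint(3, 7),
        'learning_rate': math.exp(random.uniform(math.log(0.01), math.log(0.2))),
    }

def toy_score(params):
    # Stand-in for CV ROC-AUC: peaks at max_depth=5, learning_rate=0.05.
    return (1.0
            - (params['max_depth'] - 5) ** 2 * 0.01
            - (math.log(params['learning_rate']) - math.log(0.05)) ** 2 * 0.02)

# "study.optimize(objective, n_trials=100)" reduced to its essence:
best = max((sample_params() for _ in range(100)), key=toy_score)
```

Log-uniform sampling matters here: on a linear scale, most draws of learning_rate would cluster near 0.2, starving the small values that often work best.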

Medical Insight

I learned that automated Bayesian tuning is far more reliable than manual guessing for a pipeline like this: with imputation, scaling, and resampling all interacting with the model's hyperparameters, the search space is too entangled to tune by hand.

Pipeline Specs

Dataset: PIMA Indians
Stack: XGBoost + Optuna
Result: 83.4% ROC-AUC
Focus: Search Space

Key Technologies

  • Cross-Validation
  • SMOTE Imbalance Correction
  • KNN Imputation
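Of these, SMOTE is the easiest to demystify: each synthetic minority sample is a random point on the line segment between a real minority sample and one of its minority-class nearest neighbors. A minimal pure-Python sketch of that interpolation follows; the real imblearn implementation adds k-NN search, randomized neighbor selection, and edge-case handling. The two "patients" are hypothetical.

```python
import random

random.seed(0)

def smote_interpolate(sample, neighbor, alpha=None):
    """Create one synthetic point between a minority sample and a neighbor.

    alpha in [0, 1] places the result on the segment sample -> neighbor;
    SMOTE draws alpha uniformly at random, as done here by default.
    """
    if alpha is None:
        alpha = random.random()
    return [s + alpha * (n - s) for s, n in zip(sample, neighbor)]

# Two hypothetical minority-class patients (glucose, BMI):
a = [150.0, 33.0]
b = [160.0, 35.0]

synthetic = smote_interpolate(a, b)
# Each coordinate of `synthetic` lies between the corresponding
# coordinates of a and b.
midpoint = smote_interpolate(a, b, alpha=0.5)  # exactly halfway: [155.0, 34.0]
```

Because interpolation happens in feature space, running it after imputation and inside each CV fold (as in the pipeline above) keeps the synthetic points realistic and leak-free.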