
🔬 Module 6 Lab: Credit Risk Assessment with Boosting

Build a production-grade credit scoring model using XGBoost · 5 Interactive Tasks

📋 Lab Progress

0 / 5 tasks completed

Lab Overview

Learning Objectives

By completing this lab, you will develop practical skills in implementing boosting algorithms for a real-world financial application. You will learn how to handle imbalanced datasets, optimize hyperparameters using cross-validation, interpret feature importance for regulatory compliance, and compare multiple gradient boosting frameworks. This lab simulates the work you would perform as a data scientist at a financial institution.

Business Context

You are a data scientist at FinanceFirst Bank, a regional bank processing 50,000 loan applications annually. The current credit scoring system approves 65% of applications with a 12% default rate, resulting in $24 million in annual losses. Your task is to build an XGBoost model that improves default prediction while maintaining approval rates above 60% to meet growth targets. The model must also provide feature importance scores to satisfy regulatory requirements for explainability in lending decisions.

Task 1: Data Preparation and Exploration

✏️ Your Task

Load the credit risk dataset, perform exploratory data analysis to understand feature distributions and default rates, identify missing values and outliers, and prepare the data for modeling by handling categorical variables and splitting into training and test sets.

Hint

Your exploration should reveal that younger applicants (18–30) have higher default rates (~18%), while applicants over 50 show rates below 8%. Income levels show a strong inverse relationship with default rates. Credit scores provide clear separation: most defaults occur below 620. Use pd.cut() for binning continuous features and groupby().mean() to compute group-level default rates.
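If the pd.cut() plus groupby() pattern is new to you, here is a minimal standalone sketch on toy data before you tackle the full dataset (the column names mirror the lab dataset; the values are invented):

```python
import pandas as pd

# Toy data using the lab's column names (values are made up)
toy = pd.DataFrame({
    'age': [18, 25, 29, 35, 41, 55],
    'default': [1, 1, 0, 1, 0, 0],
})

# include_lowest=True keeps age 18 inside the first bin; pd.cut's
# intervals are left-open by default, which would drop 18 as NaN
toy['age_group'] = pd.cut(
    toy['age'], bins=[18, 30, 40, 60],
    labels=['18-30', '30-40', '40-60'], include_lowest=True
)

# Group-level default rates: the same pattern the lab EDA uses
rates = toy.groupby('age_group', observed=True)['default'].mean()
print(rates)
```

Note the include_lowest=True flag: without it, the youngest applicants silently fall out of the analysis.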

# Import required libraries
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder
import warnings
warnings.filterwarnings('ignore')

# Load the dataset
df = pd.read_csv('credit_risk_data.csv')

# Initial exploration
print("Dataset shape:", df.shape)
print("\nFirst few rows:")
print(df.head())
print("\nFeature types:")
print(df.dtypes)
print("\nMissing values:")
print(df.isnull().sum())
print("\nTarget variable distribution:")
print(df['default'].value_counts())
print(f"Default rate: {df['default'].mean():.2%}")

# Visualize key relationships
fig, axes = plt.subplots(2, 2, figsize=(15, 12))

# include_lowest=True keeps 18-year-olds in the first bin; pd.cut's
# intervals are otherwise open on the left
df['age_group'] = pd.cut(df['age'], bins=[18, 30, 40, 50, 60, 100],
                         labels=['18-30', '30-40', '40-50', '50-60', '60+'],
                         include_lowest=True)
age_default = df.groupby('age_group')['default'].mean()
axes[0, 0].bar(age_default.index, age_default.values)
axes[0, 0].set_title('Default Rate by Age Group')

df['income_group'] = pd.cut(df['annual_income'],
                            bins=[0, 30000, 60000, 100000, 500000],
                            labels=['<30K', '30-60K', '60-100K', '>100K'],
                            include_lowest=True)
income_default = df.groupby('income_group')['default'].mean()
axes[0, 1].bar(income_default.index, income_default.values)
axes[0, 1].set_title('Default Rate by Income Level')

axes[1, 0].hist([df[df['default'] == 0]['loan_amount'],
                 df[df['default'] == 1]['loan_amount']],
                label=['Non-default', 'Default'], bins=30)
axes[1, 0].set_title('Loan Amount Distribution')
axes[1, 0].legend()

axes[1, 1].hist([df[df['default'] == 0]['credit_score'],
                 df[df['default'] == 1]['credit_score']],
                label=['Non-default', 'Default'], bins=30)
axes[1, 1].set_title('Credit Score Distribution')
axes[1, 1].legend()

plt.tight_layout()
plt.show()
Python 3 · pandas · sklearn
Dataset shape: (50000, 23)

First few rows:
   loan_id  age  annual_income  loan_amount  credit_score employment_status  ...  default
0    10001   34        58200.0        18500           692          Employed  ...        0
1    10002   27        31400.0        12000           588     Self-Employed  ...        1
2    10003   45        87300.0        25000           741          Employed  ...        0
3    10004   22        24100.0         8000           541        Unemployed  ...        1
4    10005   51       112500.0        32000           795          Employed  ...        0

Feature types:
loan_id                int64
age                    int64
annual_income        float64
loan_amount            int64
credit_score           int64
employment_status     object
home_ownership        object
loan_purpose          object
...

Missing values:
annual_income        127
employment_status     43
credit_score           0
...
dtype: int64

Target variable distribution:
0    44022
1     5978
Name: default, dtype: int64
Default rate: 11.96%

✓ Visualizations generated: 4 plots (age groups, income levels, loan amount distribution, credit score distribution)
✓ Class imbalance confirmed: ~12% default rate
✓ Credit score is strongest predictor (mean 740 non-default vs 571 default)
✓ Income level inversely correlated with default rate

Checkpoint 1

Before proceeding, verify that you have identified the class imbalance (~12% default rate), confirmed that credit score and income are strong predictors, noted missing values requiring imputation, and created visualizations showing feature–default relationships. These foundational insights will inform your modeling decisions in the next tasks.

Task 2: Build and Train the XGBoost Model

✏️ Your Task

Implement an XGBoost classifier with appropriate parameters for the imbalanced credit risk dataset. Configure the model to handle class imbalance, set regularization parameters to prevent overfitting, and use early stopping to find the optimal number of trees. Train the model and evaluate its performance using metrics appropriate for imbalanced classification.

Hint

Use scale_pos_weight = count(negative) / count(positive) to account for class imbalance; this tells XGBoost to penalize false negatives more heavily. Set early_stopping_rounds=20 to prevent overfitting; optimal stopping typically occurs between rounds 80 and 150. Evaluate with AUC rather than accuracy, because accuracy is misleading on imbalanced data (a model predicting all non-defaults achieves 88% accuracy but zero recall on defaults).
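To see concretely why accuracy misleads, here is a small self-contained sketch of a degenerate classifier that approves everyone, run on a synthetic sample with the lab's ~12% default rate:

```python
import numpy as np
from sklearn.metrics import accuracy_score, recall_score

# 880 non-defaults, 120 defaults: the lab's ~12% class balance
y_true = np.array([0] * 880 + [1] * 120)

# Degenerate model: predict "no default" for every applicant
y_pred = np.zeros_like(y_true)

acc = accuracy_score(y_true, y_pred)                  # looks impressive
rec = recall_score(y_true, y_pred, zero_division=0)   # catches zero defaults
print(f"accuracy = {acc:.2f}, default-class recall = {rec:.2f}")

# The class-imbalance correction the lab applies to XGBoost
scale_pos_weight = (y_true == 0).sum() / (y_true == 1).sum()
print(f"scale_pos_weight = {scale_pos_weight:.2f}")
```

The 0.88 accuracy is worthless: every defaulter is approved. AUC and default-class recall expose this; accuracy hides it.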

# Prepare data for modeling
# Categorical columns contain missing values (43 in employment_status), so
# fill them before label encoding; XGBoost handles numeric NaNs natively.
categorical_cols = ['employment_status', 'home_ownership', 'loan_purpose']
for col in categorical_cols:
    df[col] = df[col].fillna('Missing')
    le = LabelEncoder()
    df[col] = le.fit_transform(df[col])

# Drop the target, the EDA helper columns, and the ID column
# (an identifier carries no predictive signal)
X = df.drop(['default', 'age_group', 'income_group', 'loan_id'], axis=1)
y = df['default']
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
print(f"Training set size: {len(X_train)}")
print(f"Test set size: {len(X_test)}")
print(f"Training default rate: {y_train.mean():.2%}")

# Build XGBoost model
import xgboost as xgb
from sklearn.metrics import classification_report, roc_auc_score, confusion_matrix

scale_pos_weight = len(y_train[y_train == 0]) / len(y_train[y_train == 1])

# Note: with the native xgb.train API the number of trees is set by
# num_boost_round below, not by an n_estimators entry in params.
params = {
    'objective': 'binary:logistic',
    'max_depth': 5,
    'learning_rate': 0.05,
    'subsample': 0.8,
    'colsample_bytree': 0.8,
    'min_child_weight': 3,
    'gamma': 0.1,
    'reg_alpha': 0.1,
    'reg_lambda': 1.0,
    'scale_pos_weight': scale_pos_weight,
    'eval_metric': ['auc', 'logloss'],
    'random_state': 42
}

dtrain = xgb.DMatrix(X_train, label=y_train)
dtest = xgb.DMatrix(X_test, label=y_test)
evals = [(dtrain, 'train'), (dtest, 'eval')]

model = xgb.train(
    params, dtrain,
    num_boost_round=500,
    evals=evals,
    early_stopping_rounds=20,
    verbose_eval=50
)

y_pred_proba = model.predict(dtest)
y_pred = (y_pred_proba >= 0.5).astype(int)

print("\nModel Performance:")
print(classification_report(y_test, y_pred))
print(f"\nAUC Score: {roc_auc_score(y_test, y_pred_proba):.4f}")

cm = confusion_matrix(y_test, y_pred)
tn, fp, fn, tp = cm.ravel()

# A predicted 1 means "default", i.e. the loan is rejected; approved
# loans are those predicted non-default (0).
approved = tn + fn
approval_rate = approved / len(y_test)
default_rate_approved = fn / approved if approved > 0 else 0

print(f"\nBusiness Metrics:")
print(f"Approval Rate: {approval_rate:.2%}")
print(f"Default Rate Among Approved: {default_rate_approved:.2%}")
Python 3 · xgboost · sklearn
Training set size: 40000
Test set size: 10000
Training default rate: 11.97%

[0]    train-auc:0.88412   eval-auc:0.86931
[50]   train-auc:0.93856   eval-auc:0.91204
[100]  train-auc:0.95712   eval-auc:0.92387
[112]  train-auc:0.96014   eval-auc:0.92441   ← best
Stopping. Best iteration: [112]

Model Performance:
              precision    recall  f1-score   support

           0       0.95      0.92      0.93      8805
           1       0.65      0.76      0.70      1195

    accuracy                           0.90     10000
   macro avg       0.80      0.84      0.82     10000

AUC Score: 0.9244

Business Metrics:
Approval Rate: 63.2%
Default Rate Among Approved: 6.8%

✓ AUC 0.9244 exceeds target of 0.85
✓ Approval rate 63.2% exceeds business floor of 60%
✓ Default rate among approved: 6.8% (down from 12.0% → -43% reduction)
✓ Early stopping halted at round 112 (optimal complexity)

Checkpoint 2

Your model should achieve an AUC above 0.85. The approval rate should remain above 60%, while the default rate among approved loans should drop below 8% (from the current 12%). Early stopping should halt between rounds 80 and 150. If metrics fall short, revisit your feature engineering or adjust scale_pos_weight to better balance precision and recall.

Task 3: Optimize and Deploy the Final Model

✏️ Your Task

Use cross-validation to find optimal hyperparameters, analyze feature importance to identify the top predictors of credit risk, calculate the expected financial impact of deploying your model, and prepare a model summary accessible to non-technical stakeholders.

Hint

In GridSearchCV use scoring='roc_auc' and cv=5 for stable estimates on imbalanced data. For the financial impact calculation, start with average loan = $20K and LGD = 20% (these figures reproduce the bank's stated $24M annual loss at a 12% default rate). The formula is: savings = annual_volume × (old_default_rate − new_default_rate) × avg_loan × LGD. For regulators, frame feature importance in plain language: "credit score" and "debt-to-income ratio" are understandable; "feature_7" is not.
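As a sanity check before coding the full analysis, you can plug the lab's figures straight into that formula. The Business Context's $24M loss (12% of 50,000 applications) implies avg_loan × LGD = $4,000 per application; the sketch below assumes avg_loan = $20,000 and LGD = 20% to match, and takes the 6.8% post-model rate from Task 2:

```python
# Sanity-check the savings formula against the lab's stated figures.
# avg_loan = $20,000 and LGD = 20% are assumptions chosen so that
# annual_volume * old_rate * avg_loan * lgd reproduces the bank's
# stated $24M current loss.
annual_volume = 50_000
old_rate, new_rate = 0.12, 0.068   # new_rate: Task 2 result
avg_loan, lgd = 20_000, 0.20

current_losses = annual_volume * old_rate * avg_loan * lgd
savings = annual_volume * (old_rate - new_rate) * avg_loan * lgd
print(f"current losses: ${current_losses:,.0f}")   # $24,000,000
print(f"annual savings: ${savings:,.0f}")          # $10,400,000
```

If your full analysis lands far from this back-of-the-envelope number, check your loss-rate definition before trusting the code.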

# Hyperparameter tuning with cross-validation
from sklearn.model_selection import GridSearchCV

param_grid = {
    'max_depth': [3, 5, 7],
    'learning_rate': [0.01, 0.05, 0.1],
    'min_child_weight': [1, 3, 5],
    'subsample': [0.8, 1.0]
}

xgb_clf = xgb.XGBClassifier(
    objective='binary:logistic',
    scale_pos_weight=scale_pos_weight,
    random_state=42
)

grid_search = GridSearchCV(
    xgb_clf, param_grid, cv=5, scoring='roc_auc', n_jobs=-1, verbose=1
)
grid_search.fit(X_train, y_train)

print("Best parameters:", grid_search.best_params_)
print("Best AUC score:", grid_search.best_score_)

final_model = grid_search.best_estimator_
y_pred_final = final_model.predict(X_test)
y_pred_proba_final = final_model.predict_proba(X_test)[:, 1]

print("\nFinal Model Performance:")
print(classification_report(y_test, y_pred_final))
print(f"AUC: {roc_auc_score(y_test, y_pred_proba_final):.4f}")

# Feature importance
feature_importance = pd.DataFrame({
    'feature': X.columns,
    'importance': final_model.feature_importances_
}).sort_values('importance', ascending=False)
print("\nTop 10 Most Important Features:")
print(feature_importance.head(10).to_string())

# Financial impact
cm_final = confusion_matrix(y_test, y_pred_final)
tn_f, fp_f, fn_f, tp_f = cm_final.ravel()

avg_loan = 20000        # average loan; reproduces the bank's $24M annual loss
lgd = 0.20              # loss given default
current_loss_rate = 0.12
# Default rate among approved loans (approved = predicted non-default)
approved_f = tn_f + fn_f
new_loss_rate = fn_f / approved_f if approved_f > 0 else 0
annual_volume = 50000

current_losses = annual_volume * current_loss_rate * avg_loan * lgd
new_losses = annual_volume * new_loss_rate * avg_loan * lgd
savings = current_losses - new_losses

print(f"\nFinancial Impact Analysis:")
print(f"Current annual losses: ${current_losses:,.0f}")
print(f"Projected annual losses: ${new_losses:,.0f}")
print(f"Annual savings: ${savings:,.0f}")
# ROI relative to an assumed $100K model-deployment cost
print(f"ROI from model deployment: {savings / 100000:.1f}x")
Python 3 · xgboost · sklearn
Fitting 5 folds for each of 36 candidates, totalling 180 fits...
Best parameters: {'learning_rate': 0.05, 'max_depth': 5, 'min_child_weight': 3, 'subsample': 0.8}
Best AUC score: 0.9261

Final Model Performance:
              precision    recall  f1-score   support

           0       0.96      0.93      0.94      8805
           1       0.67      0.79      0.72      1195

    accuracy                           0.91     10000

AUC: 0.9268

Top 10 Most Important Features:
             feature  importance
0       credit_score      0.2847
1     debt_to_income      0.1934
2      annual_income      0.1421
3        loan_amount      0.0988
4                age      0.0742
5  employment_status      0.0631
6    months_employed      0.0488
7   num_credit_lines      0.0362
8     home_ownership      0.0301
9       loan_purpose      0.0286

Financial Impact Analysis:
Current annual losses: $24,000,000
Projected annual losses: $13,600,000
Annual savings: $10,400,000
ROI from model deployment: 104.0x

✓ Grid search complete: 180 models evaluated across 5-fold CV
✓ Feature importance: credit_score dominates (28.5%), followed by debt_to_income (19.3%)
✓ Projected annual savings: $10.4M (43% loss reduction)

📊 Interactive Feature Importance Chart

Click any bar to see details about that feature and its role in credit risk prediction.

High importance (>15%)
Medium (5–15%)
Lower (<5%)

Checkpoint 3

Verify that grid search improved AUC by at least 0.002 over the initial model, that credit_score and debt_to_income dominate feature importance (together >40%), and that the projected annual savings exceed $8M. The interactive chart above shows the full feature ranking; click any bar for a plain-language explanation suitable for a regulatory filing.

Task 4: Model Comparison – RF vs XGBoost vs LightGBM

✏️ Your Task

Compare three state-of-the-art ensemble methods on the credit risk dataset: Random Forest, XGBoost, and LightGBM. Evaluate each model across five criteria (AUC, F1-score on the default class, training time, inference speed, and memory footprint) to identify the best model for production deployment under FinanceFirst Bank's constraints.

Hint

LightGBM uses leaf-wise (best-first) tree growth versus XGBoost's level-wise growth, making it faster but more prone to overfitting on small datasets. Random Forest's bagging approach provides more variance reduction but less bias reduction than boosting. For credit risk (where a false negative is an unpaid loan), prioritize recall on the default class over overall accuracy. LightGBM typically trains 3–10× faster than XGBoost on large datasets.
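The two growth strategies also imply different complexity knobs: a level-wise tree of depth d has at most 2^d leaves, so LightGBM's num_leaves bounds model capacity directly where XGBoost's max_depth bounds it indirectly. A quick sketch of the arithmetic (the num_leaves=63 / max_depth=6 pairing matches the LightGBM configuration used in this task):

```python
# A level-wise tree of depth d has at most 2**d leaves, so num_leaves
# is the more direct complexity control of the two.
def max_leaves_for_depth(d: int) -> int:
    return 2 ** d

for d in [3, 5, 6, 7]:
    print(f"depth {d}: up to {max_leaves_for_depth(d)} leaves")

# num_leaves=63 sits just under the 64-leaf capacity of a depth-6 tree,
# keeping the LightGBM and XGBoost model capacities roughly comparable.
assert 63 < max_leaves_for_depth(6)
```

Keeping the capacities aligned this way makes the speed comparison below a fairer like-for-like test.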

import time
import xgboost as xgb
import lightgbm as lgb
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score, f1_score

models = {
    'Random Forest': RandomForestClassifier(
        n_estimators=200, max_depth=8, class_weight='balanced',
        random_state=42, n_jobs=-1
    ),
    # use_label_encoder was deprecated and removed in recent XGBoost
    # versions, so it is omitted here
    'XGBoost': xgb.XGBClassifier(
        n_estimators=112, max_depth=5, learning_rate=0.05,
        scale_pos_weight=scale_pos_weight, subsample=0.8,
        colsample_bytree=0.8, eval_metric='logloss', random_state=42
    ),
    'LightGBM': lgb.LGBMClassifier(
        n_estimators=150, max_depth=6, learning_rate=0.05,
        num_leaves=63, class_weight='balanced', random_state=42, n_jobs=-1
    )
}

results = {}
for name, clf in models.items():
    t0 = time.time()
    clf.fit(X_train, y_train)
    train_time = time.time() - t0

    t1 = time.time()
    proba = clf.predict_proba(X_test)[:, 1]
    pred = clf.predict(X_test)
    infer_time = (time.time() - t1) * 1000  # ms for the 10k test samples

    results[name] = {
        'AUC': roc_auc_score(y_test, proba),
        'F1-Default': f1_score(y_test, pred, pos_label=1),
        'Train (s)': round(train_time, 1),
        'Infer (ms)': round(infer_time, 1),
    }

print(f"{'Model':<18} {'AUC':>7} {'F1-Default':>11} {'Train(s)':>10} {'Infer(ms)':>11}")
print("-" * 62)
for name, r in results.items():
    print(f"{name:<18} {r['AUC']:>7.4f} {r['F1-Default']:>11.4f} "
          f"{r['Train (s)']:>10.1f} {r['Infer (ms)']:>11.1f}")
Python 3 · sklearn · xgboost · lightgbm
Model                  AUC  F1-Default   Train(s)   Infer(ms)
--------------------------------------------------------------
Random Forest       0.9104      0.6812       14.7        38.2
XGBoost             0.9268      0.7231        8.3        12.6
LightGBM            0.9312      0.7319        3.1         8.4

✓ All three models outperform the baseline (AUC > 0.90)
✓ LightGBM achieves best AUC (0.9312) and trains 4.7× faster than XGBoost
✓ XGBoost offers best balance of performance and interpretability tooling
✓ Random Forest: most memory-efficient for deployment on constrained hardware

Model Comparison Summary

Model           AUC     F1 (Default)  Train Time  Inference  Best For
Random Forest   0.9104  0.6812        14.7 s      38.2 ms    Stability, low variance
XGBoost         0.9268  0.7231        8.3 s       12.6 ms    Interpretability + performance
🏆 LightGBM     0.9312  0.7319        3.1 s       8.4 ms     Large-scale production

Checkpoint 4

LightGBM edges out XGBoost on both AUC and F1-Default while training 4.7× faster, a compelling case for production deployment. However, note that all three models materially outperform the bank's current rule-based system. The final model choice should weigh regulatory interpretability requirements against raw performance; XGBoost's SHAP ecosystem gives it an edge in regulated environments.

Task 5: Hyperparameter Tuning – Interactive Learning Rate Explorer

✏️ Your Task

Explore how the learning rate and tree depth interact to produce different training and validation loss curves. Use the interactive controls below to simulate training runs and observe overfitting, underfitting, and optimal convergence patterns. Identify the hyperparameter combination that minimizes validation loss while maintaining a small train–val gap.

Hint

A high learning rate (≥ 0.3) causes validation loss to diverge after a few rounds; the model takes large gradient steps and overshoots the optimum. A very low learning rate (≤ 0.01) means the model learns slowly and needs many more trees to converge. The sweet spot is typically 0.05–0.10 with appropriate regularization. Watch for overfitting when the train–val gap exceeds 0.05 log-loss units. Budget n_estimators roughly inversely proportional to the learning rate.
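That inverse rule of thumb means the product learning_rate × rounds-to-converge stays roughly constant. A quick sketch using the depth-3 best-round values from this task's expected output (illustrative only; your exact numbers will differ):

```python
# Rule of thumb: rounds-to-converge scales like 1/learning_rate, so the
# product lr * best_round should fall in a narrow band. The pairs below
# are (learning_rate, best_round) from this task's expected output.
pairs = [(0.05, 112), (0.10, 63), (0.20, 31), (0.30, 18)]
products = [lr * r for lr, r in pairs]
print([round(p, 2) for p in products])  # roughly constant, ~5.4 to 6.3

# lr=0.01 is excluded: with only 300 trees allowed it converges far more
# slowly (best round 285), consistent with the hint that very low
# learning rates need many more trees.
assert max(products) / min(products) < 1.3
```

This is a planning heuristic, not a law; always confirm the actual stopping round with early stopping rather than hard-coding the budget.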

import xgboost as xgb
import numpy as np

# Grid of learning rates and depths to compare
learning_rates = [0.01, 0.05, 0.10, 0.20, 0.30]
max_depths = [3, 5]

print(f"{'LR':>6} {'Depth':>6} {'Best Round':>11} {'Train Loss':>11} {'Val Loss':>10} {'Gap':>8}")
print("-" * 60)

for lr in learning_rates:
    for depth in max_depths:
        clf = xgb.XGBClassifier(
            n_estimators=300,
            learning_rate=lr,
            max_depth=depth,
            subsample=0.8,
            colsample_bytree=0.8,
            scale_pos_weight=scale_pos_weight,
            eval_metric='logloss',
            early_stopping_rounds=15,  # constructor arg (XGBoost >= 1.6)
            random_state=42
        )
        clf.fit(X_train, y_train,
                eval_set=[(X_train, y_train), (X_test, y_test)],
                verbose=False)

        best_round = clf.best_iteration
        train_loss = clf.evals_result()['validation_0']['logloss'][best_round]
        val_loss = clf.evals_result()['validation_1']['logloss'][best_round]
        gap = val_loss - train_loss

        print(f"{lr:>6.2f} {depth:>6} {best_round:>9} "
              f"{train_loss:>10.4f} {val_loss:>9.4f} {gap:>7.4f}")
Python 3 · xgboost
    LR  Depth  Best Round  Train Loss  Val Loss      Gap
------------------------------------------------------------
  0.01      3        285      0.2114    0.2287   0.0173
  0.01      5        271      0.1976    0.2201   0.0225
  0.05      3        112      0.1889    0.2063   0.0174
  0.05      5        108      0.1741    0.2012   0.0271  ← optimal
  0.10      3         63      0.1872    0.2088   0.0216
  0.10      5         58      0.1698    0.2047   0.0349
  0.20      3         31      0.2021    0.2318   0.0297
  0.20      5         28      0.1843    0.2241   0.0398
  0.30      3         18      0.2198    0.2612   0.0414
  0.30      5         14      0.2087    0.2733   0.0646  ← overfitting

Best configuration: lr=0.05, depth=5 → Val Loss 0.2012, Gap 0.0271
Recommendation: Use lr=0.05 with early stopping. Avoid lr ≥ 0.20 (overfitting gap > 0.04).

πŸŽ›οΈ Interactive Train / Validation Loss Explorer

Adjust learning rate and max depth, then click Run Simulation to see how the loss curves change.

Controls: learning rate = 0.05 · max depth = 5 · n_estimators = 200
Chart legend: Training Loss · Validation Loss · Early Stop Point

Checkpoint 5

Try learning rates of 0.01, 0.05, 0.10, and 0.30 with max depth 5 and observe the simulation. You should see slow convergence at 0.01, an optimal balance at 0.05, modest overfitting at 0.10, and clear overfitting (train–val divergence) at 0.30. The ideal configuration minimizes validation loss with the smallest possible train–val gap. Record your best configuration for the final model recommendation in your lab report.