Code a production-ready loan approval system in 90 minutes
Begin by loading the bank's historical loan data and understanding its structure. This dataset contains 50,000 loan applications from the past 5 years.
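A minimal sketch of this first look, assuming pandas. The column names, the `default` target column, and the file name `loan_data.csv` are hypothetical; a small in-memory CSV stands in for the real file so the snippet runs on its own.

```python
import io
import pandas as pd

# Hypothetical stand-in for the bank's file; in practice this would be
# something like pd.read_csv("loan_data.csv") (file name is an assumption).
sample_csv = io.StringIO(
    "income,loan_amount,credit_score,default\n"
    "52000,12000,710,0\n"
    "31000,18000,590,1\n"
    "78000,25000,760,0\n"
    "45000,9000,,0\n"
)
df = pd.read_csv(sample_csv)

# Structure: size, column types, and missing values per column
print(df.shape)
print(df.dtypes)
missing_per_column = df.isna().sum()
print(missing_per_column)

# Class balance: share of defaults (1) versus repaid loans (0)
default_rate = df["default"].mean()
print(f"Default rate: {default_rate:.1%}")
```

Checking the default rate this early shows how imbalanced the target is before any modeling choices are made.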
Split the data strategically and handle class imbalance. Remember: defaults are rare but costly!
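# Calculate the cost ratio: $25,000 lost per default vs. $3,500 earned per repaid loan
cost_ratio = 25000 / 3500  # ≈ 7.14
# This means a false negative (an approved loan that defaults) costs about 7x as much as a false positive!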
# Use 'balanced' or custom weights
class_weight = {0: 1, 1: cost_ratio}
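One way to split while preserving the default rate is a stratified split. The sketch below assumes scikit-learn and uses synthetic data with an assumed 10% default rate in place of the real loan table; only `cost_ratio` and `class_weight` come from the text above.

```python
import numpy as np
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
# Synthetic stand-in data: 1,000 applications, roughly 10% defaults (assumed rate)
X = rng.normal(size=(1000, 4))
y = (rng.random(1000) < 0.10).astype(int)

# Stratify on y so train and test keep the same share of defaults
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Cost-based weights: a missed default (class 1) counts about 7x a good loan
cost_ratio = 25000 / 3500
class_weight = {0: 1, 1: cost_ratio}

print(f"Train default rate: {y_train.mean():.3f}")
print(f"Test default rate:  {y_test.mean():.3f}")
```

Without `stratify=y`, a rare class can end up over- or under-represented in the test set purely by chance, which distorts any cost estimate made on it.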
Create a decision tree that balances accuracy with interpretability. Remember: regulators require explanations!
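A minimal sketch of such a tree, assuming scikit-learn's `DecisionTreeClassifier` and a synthetic, imbalanced stand-in for the loan data. The settings used are `max_depth=5`, `min_samples_split=20`, `class_weight='balanced'`, and `criterion='gini'`; the data and the `random_state` values are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic, imbalanced stand-in for the loan data (about 10% positives)
X, y = make_classification(
    n_samples=2000, n_features=6, weights=[0.9, 0.1], random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=0
)

model = DecisionTreeClassifier(
    max_depth=5,
    min_samples_split=20,
    class_weight="balanced",
    criterion="gini",
    random_state=0,
)
model.fit(X_train, y_train)

print(f"Tree depth: {model.get_depth()}")
print(f"Test accuracy: {model.score(X_test, y_test):.3f}")
```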
max_depth=5 - Deep enough to capture patterns but still interpretable
min_samples_split=20 - Regulatory requirement for statistical significance
class_weight='balanced' - Automatically adjusts for imbalanced classes
criterion='gini' - Faster than entropy for large datasets
Great work! You've built a basic decision tree.
The moment of truth! Calculate the actual dollar impact of your model's decisions.
# tp, fp, tn, fn are confusion-matrix counts, with default (class 1) as the positive class
# Each correct rejection saves a potential $25,000 loss
saved_from_defaults = tp * 25000
# Each incorrect rejection loses $3,500 in profit
lost_opportunities = fp * 3500
# Each correct approval earns $3,500
earned_from_loans = tn * 3500
# Each incorrect approval costs $25,000
losses_from_defaults = fn * 25000
# Net impact (want this positive and large!)
total_impact = (saved_from_defaults + earned_from_loans -
                lost_opportunities - losses_from_defaults)
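The calculation above can be wrapped into a reusable function. This is a sketch assuming scikit-learn's `confusion_matrix` and that class 1 means "default"; the function name `financial_impact` and the tiny label lists are illustrative, not part of the original exercise.

```python
from sklearn.metrics import confusion_matrix


def financial_impact(y_true, y_pred, loss_per_default=25000, profit_per_loan=3500):
    """Net dollar impact, treating class 1 (default) as the positive class."""
    # With labels=[0, 1], ravel() yields tn, fp, fn, tp in that order
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()

    saved_from_defaults = tp * loss_per_default   # correct rejections
    lost_opportunities = fp * profit_per_loan     # incorrect rejections
    earned_from_loans = tn * profit_per_loan      # correct approvals
    losses_from_defaults = fn * loss_per_default  # incorrect approvals

    total = (saved_from_defaults + earned_from_loans
             - lost_opportunities - losses_from_defaults)
    return int(total)


# Six applications: 3 correct approvals, 1 incorrect rejection,
# 1 correct rejection, 1 incorrect approval
y_true = [0, 0, 0, 0, 1, 1]
y_pred = [0, 0, 0, 1, 1, 0]
print(financial_impact(y_true, y_pred))  # 25000 + 10500 - 3500 - 25000 = 7000
```

Passing `labels=[0, 1]` fixes the order of the confusion matrix even when one class happens to be absent from a batch.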
Find the optimal tree depth that maximizes financial impact while maintaining interpretability.
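One possible way to run this search is to train a tree per candidate depth and score each on a held-out validation set by dollar impact. The sketch assumes scikit-learn, synthetic stand-in data, and a depth cap of 8 as an illustrative interpretability limit; `financial_impact` is a hypothetical helper using the dollar figures from the previous step.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier


def financial_impact(y_true, y_pred, loss_per_default=25000, profit_per_loan=3500):
    """Net dollar impact, treating class 1 (default) as the positive class."""
    tn, fp, fn, tp = confusion_matrix(y_true, y_pred, labels=[0, 1]).ravel()
    return int((tp - fn) * loss_per_default + (tn - fp) * profit_per_loan)


# Synthetic, imbalanced stand-in for the loan data (about 10% positives)
X, y = make_classification(
    n_samples=3000, n_features=6, weights=[0.9, 0.1], random_state=1
)
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=1
)

results = {}
for depth in range(2, 9):  # cap at 8 to keep trees explainable (assumed limit)
    tree = DecisionTreeClassifier(
        max_depth=depth,
        min_samples_split=20,
        class_weight="balanced",
        random_state=1,
    )
    tree.fit(X_train, y_train)
    results[depth] = financial_impact(y_val, tree.predict(X_val))

best_depth = max(results, key=results.get)
for depth, impact in results.items():
    print(f"max_depth={depth}: ${impact:,}")
print(f"Best depth by validation impact: {best_depth}")
```

Selecting on a validation set rather than the training set matters here, since deeper trees almost always look better on the data they were fit to.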
Regulatory compliance requires explaining every loan decision. Build a system to extract human-readable rules.
# Get indices of nodes in path
node_indicator = decision_path.toarray()[0]
# For each node in the path
for node_id in range(len(node_indicator)):
    if node_indicator[node_id] == 1:
        # This node is in the path
        if model.tree_.children_left[node_id] != -1:
            # Not a leaf - add the rule
            feature_id = model.tree_.feature[node_id]
            threshold_value = model.tree_.threshold[node_id]
            # Determine direction based on sample value
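The fragment above stops at the direction step. A self-contained sketch of the whole path walk might look like the following, assuming scikit-learn, where a sample goes left when its value is less than or equal to the node threshold. The function name `explain_decision`, the feature names, and the synthetic data are illustrative assumptions.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier


def explain_decision(model, sample, feature_names):
    """Return the threshold rules a single sample passes through."""
    sample = np.asarray(sample, dtype=float).reshape(1, -1)
    # Row 0 of the indicator marks every node this sample visits
    node_indicator = model.decision_path(sample).toarray()[0]
    rules = []
    for node_id in range(len(node_indicator)):
        if node_indicator[node_id] != 1:
            continue  # node is not on this sample's path
        if model.tree_.children_left[node_id] == -1:
            continue  # leaf node: no split rule to report
        feature_id = model.tree_.feature[node_id]
        threshold_value = model.tree_.threshold[node_id]
        value = sample[0, feature_id]
        # Direction: values <= threshold go to the left child, others go right
        op = "<=" if value <= threshold_value else ">"
        rules.append(
            f"{feature_names[feature_id]} = {value:.2f} {op} {threshold_value:.2f}"
        )
    return rules


# Hypothetical feature names and synthetic data, for illustration only
feature_names = ["income", "loan_amount", "credit_score", "debt_ratio"]
X, y = make_classification(n_samples=500, n_features=4, random_state=0)
model = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

rules = explain_decision(model, X[0], feature_names)
for rule in rules:
    print(rule)
```

Each returned string corresponds to one split on the way to the leaf, so a tree of depth 5 yields at most five rules per decision.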
Final step! Package your model for production deployment with monitoring and safeguards.
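What "monitoring and safeguards" means will depend on the bank's infrastructure, so the following is only a sketch under stated assumptions: a hypothetical `LoanDecisionService` wrapper that refuses to score incomplete applications and tracks the rejection rate against an assumed alert threshold. The stub model stands in for the trained tree; any object with a scikit-learn style `predict` method would fit.

```python
import logging
from dataclasses import dataclass, field

logger = logging.getLogger("loan_model")


@dataclass
class LoanDecisionService:
    """Hypothetical wrapper adding input checks and simple monitoring."""
    model: object
    feature_names: list
    max_reject_rate: float = 0.5  # assumed alert threshold
    decisions: list = field(default_factory=list)

    def decide(self, application: dict) -> str:
        missing = [n for n in self.feature_names if application.get(n) is None]
        if missing:
            # Safeguard: never score incomplete input
            logger.warning("Missing fields %s; routing to manual review", missing)
            return "manual_review"
        row = [[float(application[n]) for n in self.feature_names]]
        prediction = int(self.model.predict(row)[0])  # 1 = predicted default
        self.decisions.append(prediction)
        return "reject" if prediction == 1 else "approve"

    def reject_rate(self) -> float:
        return sum(self.decisions) / len(self.decisions) if self.decisions else 0.0

    def needs_attention(self) -> bool:
        # Monitoring hook: an unusually high rejection rate may signal drift
        return self.reject_rate() > self.max_reject_rate


class StubModel:
    """Stand-in for the trained tree: flags credit scores below 600."""
    def predict(self, rows):
        return [1 if row[1] < 600 else 0 for row in rows]


service = LoanDecisionService(StubModel(), ["income", "credit_score"])
print(service.decide({"income": 50000, "credit_score": 720}))
print(service.decide({"income": 30000, "credit_score": 550}))
print(service.decide({"income": 30000}))
print(f"Reject rate so far: {service.reject_rate():.0%}")
```

Routing incomplete applications to manual review, rather than imputing values silently, keeps every automated decision traceable to inputs the applicant actually supplied.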
Outstanding work! You've built a production-ready decision tree system that will save Regional Bank $52 million annually.