The art of drawing the best possible line through messy reality.
Your Story Progress at DataCorp: 1. The Problem · 2. OLS Fit · 3. Gradient Descent · 4. Overfitting · 5. Business Impact
🎯 Day 1 at DataCorp
Your first week as a junior analyst at DataCorp. The VP of Operations drops a spreadsheet on your desk: 18 months of advertising spend vs. monthly revenue. "We're burning $4M a quarter on ads," she says. "Tell me which campaigns are actually moving the needle — and predict next quarter's revenue. I need the answer by Friday."
You open the file. 216 rows of numbers. No model. No baseline. Just data and a deadline.
This is the moment linear regression was invented for.
📊 Step 2: Finding the Best Line (OLS)
Ordinary Least Squares (OLS) finds the line that minimizes the sum of squared vertical distances from each point to the line. It has a closed-form solution: β₁ = Cov(X,Y) / Var(X) and β₀ = ȳ − β₁x̄.
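Those two formulas are all it takes to fit the line directly. A minimal sketch in Python (illustrative, not the lab's actual implementation):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS for one feature: returns (intercept b0, slope b1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # b1 = Cov(X,Y) / Var(X)
    intercept = y.mean() - slope * x.mean()            # b0 = y_bar - b1 * x_bar
    return intercept, slope
```

On a noiseless line such as y = 2 + 3x this recovers the coefficients exactly; on real data it returns the least-squares compromise. Note `bias=True` and `np.var` both use population (divide-by-n) statistics, so the two cancel consistently.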
🔬 Interactive Regression Lab
Adjust the controls below. The regression line and every metric are recomputed in real time from the OLS formulas above; nothing is hardcoded. Five live readouts update as you drag:

- R² (variance explained)
- RMSE
- MAE
- Slope β₁
- Intercept β₀

Two behaviors worth provoking: spreading the points along a clear trend pushes R² up (more of the variance is explained by the line), while forcing the line against the trend can drive R² negative, meaning the line predicts worse than simply guessing the mean of Y.
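The three metrics themselves take only a few lines to compute. A hypothetical helper (the function name is my own, not the lab's):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    ss_res = (resid ** 2).sum()                      # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,                 # share of variance explained
        "rmse": np.sqrt((resid ** 2).mean()),        # error in revenue units
        "mae": np.abs(resid).mean(),                 # robust to large outliers
    }
```

This definition also shows why R² can go negative: whenever `ss_res` exceeds `ss_tot`, the predictions are worse than always guessing the mean.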
⛰️ Step 3: Gradient Descent — Sliding Down the Loss Mountain
OLS gives us a closed-form solution. But what if we have millions of features and can't invert a matrix? Gradient descent iteratively nudges the parameters downhill on the loss surface until it converges. Same destination, different path.
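For the single-feature case the update rule is short enough to sketch directly. A toy version (the learning rate and step count here are arbitrary choices, not tuned values):

```python
import numpy as np

def gd_fit(x, y, lr=0.05, steps=2000):
    """Gradient descent on MSE for the model y ≈ b0 + b1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b0 = b1 = 0.0
    for _ in range(steps):
        resid = (b0 + b1 * x) - y            # current prediction errors
        b0 -= lr * 2.0 * resid.mean()        # gradient of MSE w.r.t. b0
        b1 -= lr * 2.0 * (resid * x).mean()  # gradient of MSE w.r.t. b1
    return b0, b1
```

On well-scaled data this lands on the same coefficients as the closed-form fit; raise `lr` far enough and the iterates overshoot and diverge instead, which is exactly the point of the quiz below.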
🎮 Loss Landscape Visualization
The heatmap below shows Mean Squared Error for different values of β₀ and β₁. The red dot is the current parameter estimate. Press Animate to watch gradient descent find the minimum.
Live readouts: iteration count (from 0), current MSE, and the running estimates of β₀ and β₁.
❓ Learning Rate Intuition
What happens if you set the learning rate too high?

- Convergence is faster and more stable
- The algorithm overshoots and may diverge (correct)
- Nothing; the algorithm auto-adjusts
- R² automatically increases
⚠️ Step 4: The Overfitting Trap
Overfitting means your model memorizes the training data instead of learning the pattern. It performs great in-sample but collapses on new data. A polynomial of degree 15 fits 20 training points perfectly — but predicts garbage for new inputs.
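The same train-versus-test gap is easy to reproduce offline. A sketch using NumPy's `polyfit` on synthetic data (the sine curve and noise level are my own stand-ins for the lab's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, x_test.size)

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def train_test_r2(degree):
    """Fit a degree-d polynomial on train, score R² on train and test."""
    coefs = np.polyfit(x_train, y_train, degree)
    return (r2(y_train, np.polyval(coefs, x_train)),
            r2(y_test, np.polyval(coefs, x_test)))

results = {d: train_test_r2(d) for d in (1, 3, 15)}
```

Train R² can only rise as the degree grows, because each higher-degree family contains the lower one; test R² peaks near the true complexity and then typically falls off as the extra wiggles start fitting noise.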
🎓 Train/Test Split Demo
Drag the polynomial degree slider. Watch training R² climb toward 1.0 as overfitting kicks in — but notice the test R² plummet.
Live readouts: Train R², Test R², Train RMSE, Test RMSE, and the current model degree (starting at 1).
⚠️ Overfitting detected! Train R² ≫ Test R². This model memorizes noise, not signal. Use cross-validation or regularization to fix this.
💻 Write Your Prediction Function
Complete the Python function below to predict revenue given ad spend. Use the OLS formulas from Step 2.
# prediction.py — DataCorp Revenue Forecaster
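One way to complete the exercise, using only the Step 2 formulas (the function name and signature are my own; the lab's grader may expect something different):

```python
# prediction.py — DataCorp Revenue Forecaster (one possible completion)

def fit_and_predict(ad_spend, revenue, new_spend):
    """Fit OLS on historical (ad_spend, revenue) pairs, then forecast
    revenue for a proposed spend level. Plain Python; no imports needed."""
    n = len(ad_spend)
    mean_x = sum(ad_spend) / n
    mean_y = sum(revenue) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(ad_spend, revenue)) / n
    var_x = sum((x - mean_x) ** 2 for x in ad_spend) / n
    slope = cov_xy / var_x                # b1 = Cov(X,Y) / Var(X)
    intercept = mean_y - slope * mean_x   # b0 = y_bar - b1 * x_bar
    return intercept + slope * new_spend
```

For example, fitting on the exact line y = 1 + 2x and asking for a forecast at x = 3 returns 7.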
🏆 Step 5: Business Impact
🎉 Your model is deployed to DataCorp's inventory planning system
$2.3M
projected combined annual impact
Your linear regression model reduced forecast error by 34%, enabling DataCorp to cut overstock by $1.1M and eliminate stockouts worth $1.2M in lost sales — a combined $2.3M annual impact.
📋 Model Summary Report for the VP
| Metric | Value | Interpretation |
|---|---|---|
| R² | — | Share of revenue variance explained by ad spend |
| RMSE | — | Typical prediction error in revenue units |
| Slope β₁ | — | Revenue increase per $1M of extra ad spend |
| Sample size | 216 monthly observations | 18 months × 12 regional markets |
The VP approves the model for Q3 planning. You've just delivered a $2.3M result in your first week.
🎓 What You Learned
- OLS finds the unique line minimizing the sum of squared residuals, with a closed-form solution
- R² measures explained variance; RMSE and MAE measure prediction error
- Gradient descent reaches the same minimum iteratively, which is essential for large-scale problems
- Overfitting happens when model complexity exceeds what the data can support
- Always evaluate on held-out test data; training error is an optimistic lie