The art of drawing the best possible line through messy reality.
Your Story Progress at DataCorp: 1. The Problem · 2. OLS Fit · 3. Gradient Descent · 4. Overfitting · 5. Business Impact
🎯 Day 1 at DataCorp
Your first week as a junior analyst at DataCorp. The VP of Operations drops a spreadsheet on your desk: 18 months of advertising spend vs. monthly revenue. "We're burning $4M a quarter on ads," she says. "Tell me which campaigns are actually moving the needle — and predict next quarter's revenue. I need the answer by Friday."
You open the file. 216 rows of numbers. No model. No baseline. Just data and a deadline.
This is the moment linear regression was invented for.
📊 Step 2: Finding the Best Line (OLS)
Ordinary Least Squares (OLS) finds the line that minimizes the sum of squared vertical distances from each point to the line. It has a closed-form solution: β₁ = Cov(X,Y) / Var(X) and β₀ = ȳ − β₁x̄.
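Those two formulas are all it takes to fit the line directly. A minimal sketch in Python (illustrative, not the lab's actual implementation):

```python
import numpy as np

def ols_fit(x, y):
    """Closed-form OLS for one feature: returns (intercept b0, slope b1)."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    slope = np.cov(x, y, bias=True)[0, 1] / np.var(x)  # b1 = Cov(X,Y) / Var(X)
    intercept = y.mean() - slope * x.mean()            # b0 = y_bar - b1 * x_bar
    return intercept, slope
```

On a noiseless line such as y = 2 + 3x this recovers the coefficients exactly; on real data it returns the least-squares compromise. Note `bias=True` and `np.var` both use population (divide-by-n) statistics, so the two cancel consistently.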
🔬 Interactive Regression Lab
Adjust the controls below. The regression line and every metric are recomputed in real time from the OLS formulas above; nothing is hardcoded. Five live readouts update as you drag:

- R² (variance explained)
- RMSE
- MAE
- Slope β₁
- Intercept β₀

Two behaviors worth provoking: spreading the points along a clear trend pushes R² up (more of the variance is explained by the line), while forcing the line against the trend can drive R² negative, meaning the line predicts worse than simply guessing the mean of Y.
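The three metrics themselves take only a few lines to compute. A hypothetical helper (the function name is my own, not the lab's):

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return R², RMSE, and MAE for a set of predictions."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    resid = y_true - y_pred
    ss_res = (resid ** 2).sum()                      # residual sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return {
        "r2": 1.0 - ss_res / ss_tot,                 # share of variance explained
        "rmse": np.sqrt((resid ** 2).mean()),        # error in revenue units
        "mae": np.abs(resid).mean(),                 # robust to large outliers
    }
```

This definition also shows why R² can go negative: whenever `ss_res` exceeds `ss_tot`, the predictions are worse than always guessing the mean.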
⛰️ Step 3: Gradient Descent — Sliding Down the Loss Mountain
OLS gives us a closed-form solution. But what if we have millions of features and can't invert a matrix? Gradient descent iteratively nudges the parameters downhill on the loss surface until it converges. Same destination, different path.
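For the single-feature case the update rule is short enough to sketch directly. A toy version (the learning rate and step count here are arbitrary choices, not tuned values):

```python
import numpy as np

def gd_fit(x, y, lr=0.05, steps=2000):
    """Gradient descent on MSE for the model y ≈ b0 + b1*x."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    b0 = b1 = 0.0
    for _ in range(steps):
        resid = (b0 + b1 * x) - y            # current prediction errors
        b0 -= lr * 2.0 * resid.mean()        # gradient of MSE w.r.t. b0
        b1 -= lr * 2.0 * (resid * x).mean()  # gradient of MSE w.r.t. b1
    return b0, b1
```

On well-scaled data this lands on the same coefficients as the closed-form fit; raise `lr` far enough and the iterates overshoot and diverge instead, which is exactly the point of the quiz below.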
🎮 Loss Landscape Visualization
The heatmap below shows Mean Squared Error for different values of β₀ and β₁. The red dot is the current parameter estimate. Press Animate to watch gradient descent find the minimum.
Live readouts: iteration count (from 0), current MSE, and the running estimates of β₀ and β₁.
❓ Learning Rate Intuition
What happens if you set the learning rate too high?

- Convergence is faster and more stable
- The algorithm overshoots and may diverge (correct)
- Nothing; the algorithm auto-adjusts
- R² automatically increases
⚠️ Step 4: The Overfitting Trap
Overfitting means your model memorizes the training data instead of learning the pattern. It performs great in-sample but collapses on new data. A polynomial of degree 15 fits 20 training points perfectly — but predicts garbage for new inputs.
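The same train-versus-test gap is easy to reproduce offline. A sketch using NumPy's `polyfit` on synthetic data (the sine curve and noise level are my own stand-ins for the lab's dataset):

```python
import numpy as np

rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 20)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.3, x_train.size)
x_test = np.linspace(0.02, 0.98, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.3, x_test.size)

def r2(y_true, y_pred):
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

def train_test_r2(degree):
    """Fit a degree-d polynomial on train, score R² on train and test."""
    coefs = np.polyfit(x_train, y_train, degree)
    return (r2(y_train, np.polyval(coefs, x_train)),
            r2(y_test, np.polyval(coefs, x_test)))

results = {d: train_test_r2(d) for d in (1, 3, 15)}
```

Train R² can only rise as the degree grows, because each higher-degree family contains the lower one; test R² peaks near the true complexity and then typically falls off as the extra wiggles start fitting noise.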
🎓 Train/Test Split Demo
Drag the polynomial degree slider. Watch training R² climb toward 1.0 as overfitting kicks in — but notice the test R² plummet.
Live readouts: Train R², Test R², Train RMSE, Test RMSE, and the current model degree (starting at 1).
⚠️ Overfitting detected! Train R² ≫ Test R². This model memorizes noise, not signal. Use cross-validation or regularization to fix this.
💻 Write Your Prediction Function
Complete the Python function below to predict revenue given ad spend. Use the OLS formulas from Step 2.
# prediction.py — DataCorp Revenue Forecaster
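One way to complete the exercise, using only the Step 2 formulas (the function name and signature are my own; the lab's grader may expect something different):

```python
# prediction.py — DataCorp Revenue Forecaster (one possible completion)

def fit_and_predict(ad_spend, revenue, new_spend):
    """Fit OLS on historical (ad_spend, revenue) pairs, then forecast
    revenue for a proposed spend level. Plain Python; no imports needed."""
    n = len(ad_spend)
    mean_x = sum(ad_spend) / n
    mean_y = sum(revenue) / n
    cov_xy = sum((x - mean_x) * (y - mean_y)
                 for x, y in zip(ad_spend, revenue)) / n
    var_x = sum((x - mean_x) ** 2 for x in ad_spend) / n
    slope = cov_xy / var_x                # b1 = Cov(X,Y) / Var(X)
    intercept = mean_y - slope * mean_x   # b0 = y_bar - b1 * x_bar
    return intercept + slope * new_spend
```

For example, fitting on the exact line y = 1 + 2x and asking for a forecast at x = 3 returns 7.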
🏆 Step 5: Business Impact
🎉 Your model is deployed to DataCorp's inventory planning system
$2.3M
projected combined annual impact
Your linear regression model reduced forecast error by 34%, enabling DataCorp to cut overstock by $1.1M and eliminate stockouts worth $1.2M in lost sales — a combined $2.3M annual impact.
📋 Model Summary Report for the VP
| Metric | Value | Interpretation |
|---|---|---|
| R² | — | Share of revenue variance explained by ad spend |
| RMSE | — | Typical prediction error in revenue units |
| Slope β₁ | — | Revenue increase per $1M of extra ad spend |
| Sample size | 216 monthly observations | 18 months × 12 regional markets |
The VP approves the model for Q3 planning. You've just delivered a $2.3M result in your first week.
🎓 What You Learned
- OLS finds the unique line minimizing the sum of squared residuals, with a closed-form solution
- R² measures explained variance; RMSE and MAE measure prediction error
- Gradient descent reaches the same minimum iteratively, which is essential for large-scale problems
- Overfitting happens when model complexity exceeds what the data can support
- Always evaluate on held-out test data; training error is an optimistic lie