โ† Pipeline Engineering Course Home Next: Enterprise Integration โ†’
ML Frameworks & Applied Analytics · Framework 3 of 5

🚀 Model Deployment & MLOps

From Jupyter notebook to production system: keeping your model alive, monitored, and profitable

โฑ 90 min ๐ŸŽ“ Business Students ๐Ÿ›  5 Hands-on Labs ๐Ÿ’ฐ $50K/hr Stakes

🚨 3 AM. Your Phone Rings.

"It's 3 AM. Your phone rings. The recommendation model is returning the same product for every user. Revenue is dropping $50,000 per hour. What do you do?"


This isn't a hypothetical. It happened to a major e-commerce company in 2022. The root cause? A Docker container that worked perfectly on every engineer's laptop, but silently failed when the base image was updated in production.

Today you'll learn to prevent incidents, detect them faster, and recover in minutes, not hours.

🗺 Your Learning Journey: 5 MLOps Pillars

1. Containerization · 2. Model Serving · 3. A/B Testing · 4. Monitoring · 5. Rollback

📦 Pillar 1: Containerization for ML (Docker)

A container packages your model, code, and exact environment into one portable unit. If it runs in your container, it runs in production. That's the promise of Docker, and the cure for "works on my machine."

Without Docker 😱

  • Python 3.8 locally → Python 3.11 in prod
  • scikit-learn 1.0 → 1.3 in prod
  • No GPU driver on prod server
  • Missing system library libgomp
  • Result: 3 AM phone call

With Docker ✅

  • Exact Python version locked
  • Exact library versions pinned
  • All system dependencies included
  • Runs identically everywhere
  • Result: peaceful sleep
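Docker locks the environment at build time; a startup self-check can still catch gaps at run time. A minimal sketch in Python — the pins and the `check_environment` helper are illustrative, not part of any real deployment; real values would come from your Dockerfile and requirements.txt:

```python
# Startup self-check: verify the runtime matches the pinned environment.
# All pins here are hypothetical examples.
import sys
import ctypes.util


def check_environment(observed, pins):
    """Return a list of human-readable mismatches between env and pins."""
    problems = []
    for name, expected in pins.items():
        actual = observed.get(name)
        if actual != expected:
            problems.append(f"{name}: expected {expected}, got {actual}")
    return problems


def current_environment():
    """Collect the facts the 3 AM incident hinged on."""
    return {
        "python": f"{sys.version_info.major}.{sys.version_info.minor}",
        # a missing libgomp in the base image is the classic silent failure
        "libgomp": "present" if ctypes.util.find_library("gomp") else "missing",
    }


issues = check_environment(current_environment(),
                           {"python": "3.11", "libgomp": "present"})
print(issues or "environment OK")
```

Running this as the container's first step turns a silent production failure into a loud, immediate one.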

🔨 Lab 1: Build Your Dockerfile Layer by Layer

Add each layer by clicking the buttons below. Watch the Dockerfile assemble, and see how image size grows with each addition.


🧩 Quiz 1: The Production Gotcha

Your container works perfectly locally but crashes immediately in production. The error says: libgomp.so.1: cannot open shared object file. What's most likely the cause?

A) Your Python code has a bug that only appears at scale
B) The production server has too little RAM
C) Your Dockerfile uses a different base image than production (missing system library)
D) You forgot to push the latest model file

⚡ Pillar 2: Model Serving Patterns

Not all inference is equal. Choosing the wrong serving pattern can mean paying 10× more for the same results, or missing your latency SLA entirely.

📦 Batch
⚡ Real-time
🌊 Streaming

Batch Inference

Process large volumes of data at scheduled intervals (hourly, nightly). Results are pre-computed and stored.

  • Latency: Minutes to hours (acceptable)
  • Throughput: Very high (millions of rows)
  • Cost: Low (run only when needed)
  • Use when: Results don't need to be instant
🎬 Netflix
Batch Serving
Recommendations computed nightly for all 260M subscribers. By morning, your homepage is ready instantly. Real-time computation would cost 100× more.

Real-time Inference

Model responds to each request within milliseconds. Results are computed on demand.

  • Latency: < 100ms (tight SLA)
  • Throughput: Medium (requests/sec)
  • Cost: High (always-on infrastructure)
  • Use when: User is waiting for the answer
๐Ÿ” Google Search
Real-time Serving
Every search query triggers 200+ ML models in <200ms โ€” spam detection, query understanding, result ranking. You can't pre-compute "what will people search?"

Streaming Inference

Continuously process data streams as events arrive in near-real-time (seconds, not hours).

  • Latency: 1–10 seconds
  • Throughput: Very high (event streams)
  • Cost: Medium (always-on + scale)
  • Use when: Data arrives continuously, decisions needed fast
🚗 Uber
Streaming Serving
Surge pricing model ingests real-time GPS data from millions of drivers and riders. Prices update every few seconds. Too slow → wrong price. Too fast → unstable UI.

📊 Lab 2: Latency vs Throughput Simulator

Adjust batch size and see how latency and throughput change. Find the sweet spot for your use case.

(Interactive simulator: slide batch size from 1 (real-time) through 128 to 512 (batch). Sample readout at batch size 1: P99 latency 8 ms, throughput 125 req/s, cost $0.08 per 1K requests, <200 ms SLA met, pattern: real-time.)
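The tradeoff the simulator demonstrates fits in a few lines. A toy cost model, where the 5 ms overhead and 0.5 ms per-item figures are made-up constants rather than measurements:

```python
# Toy model of the Lab 2 tradeoff: bigger batches amortize the fixed
# cost of a forward pass (throughput rises), but every request waits
# for the whole batch (latency rises). Constants are illustrative.
OVERHEAD_MS = 5.0    # assumed fixed cost per model invocation
PER_ITEM_MS = 0.5    # assumed marginal cost per example


def batch_profile(batch_size):
    """Return (latency in ms, throughput in requests/sec)."""
    latency_ms = OVERHEAD_MS + PER_ITEM_MS * batch_size
    throughput_rps = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput_rps


for size in (1, 128, 512):
    lat, tput = batch_profile(size)
    verdict = "meets" if lat < 200 else "misses"
    print(f"batch={size:>3}  latency={lat:6.1f} ms  "
          f"throughput={tput:7.0f} req/s  {verdict} the 200 ms SLA")
```

Under these assumed constants, batch size 512 roughly doubles the throughput of batch size 128 but blows through the 200 ms SLA: the "sweet spot" depends entirely on which metric your use case cares about.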

🎯 Which Pattern Fits Your Use Case?

Use Case | Recommended Pattern | Key Reason
Credit card fraud detection | Real-time | Transaction must be approved/denied in milliseconds
Weekly sales forecast | Batch | Results consumed next morning; high volume
Ride-share surge pricing | Streaming | Driver/rider GPS updates every few seconds
Email spam filter | Real-time | User expects immediate delivery or block
Product recommendations (homepage) | Batch | Pre-compute for all users nightly
Social media content moderation | Streaming | Posts arrive continuously; hours is too slow
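The table above amounts to a simple decision rule. A sketch, where the `choose_pattern` function and its thresholds are illustrative cut-offs rather than industry standards:

```python
# The use-case table as a decision rule. Thresholds are illustrative.
def choose_pattern(answer_needed_within_s, data_arrives_continuously):
    if answer_needed_within_s <= 0.2:       # someone is actively waiting
        return "real-time"
    if data_arrives_continuously and answer_needed_within_s <= 60:
        return "streaming"
    return "batch"


print(choose_pattern(0.05, False))      # fraud check -> real-time
print(choose_pattern(10, True))         # surge pricing -> streaming
print(choose_pattern(8 * 3600, False))  # weekly forecast -> batch
```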

🧩 Quiz 2: Serving Pattern Choice

A hospital wants to flag patients at high risk of sepsis. The model analyzes vital signs (updated every 5 minutes). Which serving pattern is most appropriate?

A) Batch: run the model nightly for all patients
B) Streaming: process vital sign updates as they arrive in near-real-time
C) Real-time API: wait for a doctor to request a prediction
D) No ML needed: use rule-based thresholds only

🎲 Pillar 3: A/B Testing & Canary Deployments

Never deploy a new model to 100% of users at once. Split the risk. Measure the impact. Let data, not opinions, decide if the new model is better.

🧪 Lab 3: Run Your Own A/B Test

Simulate an A/B test comparing your current model (control) against a new model (treatment). Adjust parameters and watch statistical significance emerge.

(Interactive: choose a sample size per arm from 100 to 10,000 and a true lift from -3% to +10%, then run the test to see control vs. treatment conversion rates, the p-value, and statistical power.)
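Behind the lab's p-value readout is a standard two-sided two-proportion z-test. A stdlib-only sketch; the conversion counts are invented, and `two_proportion_p_value` is a hypothetical helper, not a library function:

```python
# Two-sided two-proportion z-test, like the one behind Lab 3's p-value.
from math import sqrt, erf


def two_proportion_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for H0: both arms share one conversion rate."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    phi = 0.5 * (1 + erf(abs(z) / sqrt(2)))   # standard normal CDF
    return 2 * (1 - phi)


# invented example: 3.2% control CVR vs 4.0% treatment CVR, 5,000 per arm
p = two_proportion_p_value(160, 5000, 200, 5000)
print(f"p = {p:.4f}")   # around 0.03: significant at 0.05 for ONE metric
# Quiz 3's trap: if you tested k metrics, compare p against 0.05/k
# (Bonferroni correction) instead of 0.05.
```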

๐Ÿค Lab 3b: Canary Deployment โ€” Gradual Traffic Shift

A canary deployment gradually shifts traffic to the new model. If metrics degrade at any stage, you roll back, and only a fraction of users is affected.

Baseline (traffic split: 0% new model, 100% current v1): error rate 0.1% (normal), P99 latency 82 ms (normal), conversion rate 3.2% (baseline), 0 users affected. Set the canary percentage to begin the gradual rollout.
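The rollout logic can be sketched as a loop over traffic stages. The stage fractions, baseline error rate, and 3× degradation threshold below are illustrative choices, not recommended values:

```python
# Canary controller sketch: shift traffic in stages, abort if the new
# model's error rate degrades. Stages and thresholds are illustrative.
STAGES = [0.01, 0.05, 0.25, 0.50, 1.00]   # fraction of traffic on new model
BASELINE_ERROR_RATE = 0.001               # current model's error rate (0.1%)
MAX_DEGRADATION = 3.0                     # tolerate up to 3x baseline


def run_canary(observe_error_rate):
    """Return ('promoted', 1.0) or ('rolled_back', failing_stage)."""
    for fraction in STAGES:
        if observe_error_rate(fraction) > BASELINE_ERROR_RATE * MAX_DEGRADATION:
            return "rolled_back", fraction
        # a real controller would also watch latency and conversion here
    return "promoted", 1.0


# healthy rollout: new model matches baseline at every stage
print(run_canary(lambda frac: 0.001))          # ('promoted', 1.0)
# broken rollout: errors spike once the canary reaches 25% of traffic
print(run_canary(lambda frac: 0.05 if frac >= 0.25 else 0.001))
```

In the broken rollout, only a quarter of users ever saw the bad model, which is the whole point of the stages.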

🧩 Quiz 3: The Multiple Testing Trap

You ran an A/B test. Your new recommendation model shows a 2% conversion lift with p = 0.04. Your team is excited. Do you ship it?

A) Yes: p < 0.05, it's statistically significant. Ship immediately.
B) Yes: a 2% lift is huge business value regardless of p-value.
C) Not yet. Ask: Was this the only metric tested? Did we peek early? Multiple testing inflates false positives.
D) No: p < 0.05 is not significant enough for production deployment.

📡 Pillar 4: Model Monitoring & Drift Detection

A model that was 92% accurate at launch might be 71% accurate 6 months later, without a single line of code changing. Why? The world changed. Your model didn't.

Data Drift

Input feature distribution changes. Customers who used to be 25–34 years old are now predominantly 45–54.

PSI · KS Test · Jensen-Shannon

Concept Drift

The relationship between features and outcome changes. A "good credit" score meant something different before vs. after a recession.

Performance metrics · Label shift
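PSI (Population Stability Index), the first of the drift scores listed under data drift, compares a feature's binned distribution in production against the one seen at training time. A small sketch; the age-bucket fractions are invented for illustration:

```python
# Population Stability Index (PSI), a standard data-drift score.
# The age-bucket fractions below are invented for illustration.
from math import log


def psi(expected_pct, actual_pct, eps=1e-6):
    """PSI between two binned distributions given as lists of fractions."""
    total = 0.0
    for e, a in zip(expected_pct, actual_pct):
        e, a = max(e, eps), max(a, eps)   # guard against empty bins
        total += (a - e) * log(a / e)
    return total


training_ages = [0.10, 0.40, 0.30, 0.15, 0.05]   # distribution at training
live_ages     = [0.05, 0.20, 0.30, 0.30, 0.15]   # same buckets in production
# common rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 act now
print(f"PSI = {psi(training_ages, live_ages):.3f}")
```

Here the audience has shifted toward the older buckets and PSI lands well above 0.25, the same territory as the PSI = 0.31 alert in the incident walkthrough below.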

📊 Lab 4: Live Monitoring Dashboard

This simulates a real production monitoring dashboard. Click "Start Monitoring" and watch for anomalies, then diagnose and respond.

Healthy baseline (simulated time T+0h): model accuracy 92.1% (healthy), PSI score 0.04 (no drift), KS statistic 0.06 (stable), average latency 87 ms (normal). Alert thresholds for accuracy (%) and latency (ms) are configurable.

🚑 Incident Response: Step Through the Process

Click each step to advance through a real incident response. This is the process your on-call engineer follows at 3 AM.

🔍 1. Detect
PagerDuty alert fires. Accuracy dropped from 92% → 61%. Drift PSI = 0.31.

🩺 2. Diagnose
Is it data drift, concept drift, or infrastructure failure? Check feature distributions vs. training data.

🛠 3. Mitigate
Option A: Rollback to v1 (immediate). Option B: Hotfix input preprocessing. Option C: Emergency retrain.

✅ 4. Verify
Monitor for 30 min post-fix. Confirm accuracy recovered. Check no new alerts.

📝 5. Postmortem
Write incident report: root cause, timeline, what broke, what was missing in monitoring, prevention plan.

🧩 Quiz 4: Drift Detection

Your fraud detection model's accuracy is still 91% (same as launch), but fraud losses have increased 40% over 6 months. What type of drift is most likely occurring?

A) Data drift: input features have shifted distribution
B) Concept drift: fraudsters adapted to your model; the accuracy metric is misleading because the fraud rate itself changed
C) Infrastructure drift: latency increased, causing more fraud
D) No drift: 91% accuracy proves the model is working fine

โช Pillar 5: Rollback Strategies

Every deployment needs an escape hatch. The fastest fix is almost always rolling back to the last known-good model version, not debugging at 3 AM.

8 min: average time to roll back with a good MLOps pipeline, versus 4+ hours of debugging without one.

🔄 Lab 5: Practice a Model Rollback

Walk through a simulated rollback. Your model is failing: you need to detect the issue, switch versions, and verify recovery.

Starting state: model v2.3 serving 100% of traffic, accuracy 92.1% (normal), error rate 0.1% (normal), $0 revenue lost so far.
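Fast rollback depends on keeping every deployed version addressable. A toy registry showing the core idea; real registries such as MLflow expose the same operations through their own APIs, and this class is purely illustrative:

```python
# Toy model registry with the one operation that matters at 3 AM:
# roll back to the newest version still marked healthy.
class ModelRegistry:
    def __init__(self):
        self.versions = []      # [version, healthy] in deployment order
        self.active = None

    def deploy(self, version):
        self.versions.append([version, True])
        self.active = version

    def mark_unhealthy(self, version):
        for entry in self.versions:
            if entry[0] == version:
                entry[1] = False

    def rollback(self):
        """Point traffic at the newest healthy version; return it."""
        for version, healthy in reversed(self.versions):
            if healthy and version != self.active:
                self.active = version
                return version
        raise RuntimeError("no healthy version to roll back to")


registry = ModelRegistry()
registry.deploy("v2.2")
registry.deploy("v2.3")            # the release that pages you at 3 AM
registry.mark_unhealthy("v2.3")
print(registry.rollback())         # switches traffic back to v2.2
```

Because the previous artifact is still registered, the switch is a pointer flip measured in minutes, not a debugging session measured in hours.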

🌳 Decision Tree: Rollback vs Hotfix vs Retrain?

Use this framework when an incident occurs. Click to explore the decision path.

โ“ Model performance is degrading in production. What now?
โ†“

📋 Rollback vs Hotfix vs Retrain: Quick Reference

Option | When to Use | Time to Execute | Risk
Rollback | New model broke something; previous version was good | Minutes | Low
Hotfix | Infrastructure/data pipeline issue, not model logic | 30 min – 2 hrs | Medium
Retrain | Data drift, world has changed, no good previous version | Hours to days | High
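The quick-reference table reduces to a short decision function. A sketch, where the simplified three-way `cause` taxonomy is an assumption made for illustration:

```python
# The quick-reference table as a decision function. The three-way
# 'cause' taxonomy is a simplification made for illustration.
def choose_response(previous_version_good, cause):
    """cause is one of: 'model', 'infrastructure', 'drift'."""
    if cause == "model" and previous_version_good:
        return "rollback"   # minutes to execute, low risk
    if cause == "infrastructure":
        return "hotfix"     # 30 min - 2 hrs, medium risk
    return "retrain"        # hours to days, high risk


print(choose_response(True, "model"))           # broken release -> rollback
print(choose_response(True, "infrastructure"))  # pipeline bug -> hotfix
print(choose_response(False, "drift"))          # world changed -> retrain
```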

🧩 Quiz 5: The Right Response

At 2 PM on Black Friday, your pricing model starts returning $0.00 for all products. Error rate spikes to 95%. Your last working deployment was 3 hours ago. What do you do FIRST?

A) Start investigating the root cause: understand the bug before acting
B) Retrain the model with today's data to fix the issue
C) Immediately roll back to the version from 3 hours ago, then investigate
D) Disable the ML model and use manual pricing rules

🎯 MLOps Mastery: Your Incident Survival Kit

📦 Containerize
Lock your environment. Docker prevents "works on my machine."

⚡ Right Serving
Match batch/real-time/streaming to your latency requirements.

🎲 Test & Canary
Never deploy to 100%. Measure first. Let data decide.

📡 Monitor Drift
Your model decays. Watch PSI, KS, accuracy continuously.

⏪ Rollback Fast
Every minute costs $800+. Roll back first, debug second.

Coming up next: Framework 4, Enterprise ML Integration: connecting models to business systems, governance, and organizational change management.

ML Frameworks & Applied Analytics · Framework 3 of 5 · Chenhao Zhou, Rutgers Business School