From Jupyter notebook to production system: keeping your model alive, monitored, and profitable
"It's 3 AM. Your phone rings. The recommendation model is returning the same product for every user. Revenue is dropping $50,000 per hour. What do you do?"
This isn't a hypothetical. It happened to a major e-commerce company in 2022. The root cause? A Docker container that worked perfectly on every engineer's laptop, but silently failed when the base image was updated in production.
Today you'll learn to prevent incidents, detect them faster, and recover in minutes, not hours.
A container packages your model, code, and exact environment into one portable unit. If it runs in your container, it runs in production. That's the promise of Docker, and the cure for "works on my machine."
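That promise only holds if the environment really is pinned. A minimal Dockerfile sketch for a Python model server (file names like `serve.py` and the `model/` directory are placeholders): it pins the base image tag and installs the system libraries the ML runtime needs, rather than relying on whatever the base image happens to ship.

```dockerfile
# Pin an exact tag (or, stricter, a digest) so a base-image update
# can't silently change what ships to production
FROM python:3.11-slim

# Install system libraries the ML stack needs at runtime
# (e.g. libgomp1, the OpenMP runtime many compiled wheels link against)
RUN apt-get update && apt-get install -y --no-install-recommends libgomp1 \
    && rm -rf /var/lib/apt/lists/*

WORKDIR /app

# Install pinned Python dependencies first so this layer caches
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

# Copy the model artifact and serving code last (they change most often)
COPY model/ model/
COPY serve.py .

CMD ["python", "serve.py"]
```

Ordering the layers from least- to most-frequently-changed keeps rebuilds fast, and the explicit `libgomp1` install is exactly the kind of dependency that "works on my machine" because the laptop's base image happened to include it.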
Add each layer by clicking the buttons below. Watch the Dockerfile assemble and see how image size grows with each addition.
Your container works perfectly locally but crashes immediately in production. The error says: `libgomp.so.1: cannot open shared object file`. What's most likely the cause?
Not all inference is equal. Choosing the wrong serving pattern can mean paying 10× more for the same results, or missing your latency SLA entirely.
Process large volumes of data at scheduled intervals (hourly, nightly). Results are pre-computed and stored.
Model responds to each request within milliseconds. Results are computed on demand.
Continuously process data streams as events arrive in near-real-time (seconds, not hours).
Adjust batch size and see how latency and throughput change. Find the sweet spot for your use case.
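The tradeoff the slider demonstrates can be sketched with a simple cost model (the overhead and per-item numbers below are illustrative, not measured): each batch pays a fixed overhead plus a per-item cost, so larger batches raise latency but amortize the overhead into higher throughput.

```python
# Hypothetical cost model for batched inference:
# fixed per-batch overhead + linear per-item compute.
OVERHEAD_MS = 20.0   # request handling, data transfer, kernel launch
PER_ITEM_MS = 0.5    # marginal cost of one more item in the batch

def batch_metrics(batch_size: int) -> tuple[float, float]:
    """Return (latency in ms, throughput in items/sec) for one batch."""
    latency_ms = OVERHEAD_MS + PER_ITEM_MS * batch_size
    throughput = batch_size / (latency_ms / 1000.0)
    return latency_ms, throughput

for size in (1, 8, 64, 256):
    lat, thr = batch_metrics(size)
    print(f"batch={size:4d}  latency={lat:6.1f} ms  throughput={thr:8.1f}/s")
```

The "sweet spot" is where throughput gains flatten out while latency keeps climbing past your SLA; batch serving lives at the high end of this curve, real-time serving at the low end.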
| Use Case | Recommended Pattern | Key Reason |
|---|---|---|
| Credit card fraud detection | Real-time | Transaction must be approved/denied in milliseconds |
| Weekly sales forecast | Batch | Results consumed next morning; high volume |
| Ride-share surge pricing | Streaming | Driver/rider GPS updates every few seconds |
| Email spam filter | Real-time | User expects immediate delivery or block |
| Product recommendations (homepage) | Batch | Pre-compute for all users nightly |
| Social media content moderation | Streaming | Posts arrive continuously; hours is too slow |
A hospital wants to flag patients at high risk of sepsis. The model analyzes vital signs (updated every 5 minutes). Which serving pattern is most appropriate?
Never deploy a new model to 100% of users at once. Split the risk. Measure the impact. Let data, not opinions, decide if the new model is better.
Simulate an A/B test comparing your current model (control) against a new model (treatment). Adjust parameters and watch statistical significance emerge.
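The significance number the simulator reports can be computed with a standard two-proportion z-test. A stdlib-only sketch (the function name and the sample counts in the usage line are illustrative):

```python
from math import erf, sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates.

    conv_a/n_a: conversions and users in control; conv_b/n_b: treatment.
    Returns (z statistic, two-sided p-value).
    """
    p_a, p_b = conv_a / n_a, conv_b / n_b
    # Pooled rate under the null hypothesis that both arms convert equally
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Normal CDF via erf: Phi(x) = 0.5 * (1 + erf(x / sqrt(2)))
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# Illustrative: 5.0% vs 6.0% conversion on 10,000 users per arm
z, p = two_proportion_z(500, 10_000, 600, 10_000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

Note what the p-value does and doesn't tell you: it bounds the chance of seeing this lift under the null, but says nothing about whether the lift is large enough to matter or whether the test ran long enough to cover weekly seasonality.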
A canary deployment gradually shifts traffic to the new model. If metrics degrade at any stage, you roll back, and only a fraction of users are affected.
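The traffic split at each canary stage is typically done with deterministic, hash-based routing so a given user always sees the same model as the fraction ramps up. A minimal sketch (stage percentages and the `route` helper are illustrative):

```python
import hashlib

# Illustrative ramp-up schedule: 5% -> 25% -> 50% -> 100%
CANARY_STAGES = [0.05, 0.25, 0.50, 1.00]

def route(user_id: str, canary_fraction: float) -> str:
    """Assign a user to 'canary' or 'stable' deterministically.

    Hashing the user id to a bucket in [0, 1) means the same user is
    routed consistently, and raising the fraction only adds new users
    to the canary -- it never flips existing canary users back.
    """
    digest = hashlib.md5(user_id.encode()).hexdigest()
    bucket = int(digest[:8], 16) / 2**32  # uniform-ish in [0, 1)
    return "canary" if bucket < canary_fraction else "stable"

# Illustrative ramp: count canary users at each stage
for frac in CANARY_STAGES:
    n = sum(route(f"user{i}", frac) == "canary" for i in range(1000))
    print(f"fraction={frac:.2f}  canary users: {n}/1000")
```

Because the bucket depends only on the user id, rolling back is just setting the fraction to 0: every user immediately routes to the stable model again.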
You ran an A/B test. Your new recommendation model shows a 2% conversion lift with p = 0.04. Your team is excited. Do you ship it?
A model that was 92% accurate at launch might be 71% accurate 6 months later, without a single line of code changing. Why? The world changed. Your model didn't.
Input feature distribution changes. Customers who used to be 25โ34 years old are now predominantly 45โ54.
The relationship between features and outcome changes. A "good credit" score meant something different before vs. after a recession. (Distribution shifts are commonly detected with metrics like PSI, the KS test, or Jensen-Shannon divergence.)
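PSI (Population Stability Index) is the most common of these drift scores: it bins a feature, compares the bin fractions between training and production, and sums the weighted log-ratios. A stdlib-only sketch (bin count and the 1e-4 floor are conventional choices, not fixed rules):

```python
from math import log

def psi(expected: list[float], actual: list[float], bins: int = 10) -> float:
    """Population Stability Index between a reference and a live sample.

    Rule of thumb often quoted: < 0.1 stable, 0.1-0.25 investigate,
    > 0.25 significant shift.
    """
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))

    def bin_fractions(data: list[float]) -> list[float]:
        counts = [0] * bins
        for x in data:
            # Map x to a bin index; clamp the max value into the last bin
            idx = min(int((x - lo) / (hi - lo) * bins), bins - 1) if hi > lo else 0
            counts[idx] += 1
        # Floor tiny fractions so empty bins don't produce log(0)
        return [max(c / len(data), 1e-4) for c in counts]

    e, a = bin_fractions(expected), bin_fractions(actual)
    return sum((ai - ei) * log(ai / ei) for ei, ai in zip(e, a))
```

On identical samples PSI is 0; a population that shifts into bins the training data never occupied drives the score up quickly, which is why it works well as an alert threshold (the incident below fires at PSI = 0.31).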
This simulates a real production monitoring dashboard. Click "Start Monitoring" and watch for anomalies, then diagnose and respond.
Click each step to advance through a real incident response. This is the process your on-call engineer follows at 3 AM.
PagerDuty alert fires. Accuracy dropped from 92% to 61%. Drift PSI = 0.31.
Is it data drift, concept drift, or infrastructure failure? Check feature distributions vs. training data.
Option A: Rollback to v1 (immediate). Option B: Hotfix input preprocessing. Option C: Emergency retrain.
Monitor for 30 min post-fix. Confirm accuracy recovered. Check no new alerts.
Write incident report: root cause, timeline, what broke, what was missing in monitoring, prevention plan.
Your fraud detection model's accuracy is still 91% (same as launch), but fraud losses have increased 40% over 6 months. What type of drift is most likely occurring?
Every deployment needs an escape hatch. The fastest fix is almost always rolling back to the last known-good model version, not debugging at 3 AM.
Walk through a simulated rollback. Your model is failing โ you need to detect the issue, switch versions, and verify recovery.
Use this framework when an incident occurs. Click to explore the decision path.
| Option | When to Use | Time to Execute | Risk |
|---|---|---|---|
| Rollback | New model broke something; previous version was good | Minutes | Low |
| Hotfix | Infrastructure/data pipeline issue, not model logic | 30 min to 2 hrs | Medium |
| Retrain | Data drift, world has changed, no good previous version | Hours to days | High |
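The decision path in the table reduces to a small, testable function, which is exactly the form you want in a runbook so a 3 AM responder doesn't have to improvise. A sketch (the three boolean inputs are illustrative names for the diagnosis questions above):

```python
def choose_response(model_regression: bool,
                    infra_or_pipeline_issue: bool,
                    good_previous_version: bool) -> str:
    """Map an incident diagnosis to a response, mirroring the table above.

    model_regression: did the problem start with the new model's logic?
    infra_or_pipeline_issue: is the fault in serving infra or data prep?
    good_previous_version: is there a known-good version to return to?
    """
    # Fastest, lowest-risk option first: undo the change that broke things
    if model_regression and good_previous_version:
        return "rollback"
    # The model is fine but its inputs or environment are broken
    if infra_or_pipeline_issue:
        return "hotfix"
    # The world changed under the model; no prior version helps
    return "retrain"
```

The ordering encodes the section's core advice: prefer the minutes-scale, low-risk action, and only reach for an hours-to-days emergency retrain when nothing known-good exists to fall back to.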
At 2 PM on Black Friday, your pricing model starts returning $0.00 for all products. Error rate spikes to 95%. Your last working deployment was 3 hours ago. What do you do FIRST?
Coming up next: Framework 4, Enterprise ML Integration: connecting models to business systems, governance, and organizational change management.