From Jupyter notebooks to production-grade ML systems that actually run.
"A data scientist at your company built a model in Jupyter that achieves 95% accuracy. 'Ship it!' says the VP of Product. You know this won't end well. Here's why โ and how to fix it."
Jupyter notebooks are fantastic for exploration. They're terrible for production. Here's an actual notebook from a real project โ can you spot what's wrong?
This notebook achieved 95% accuracy. Click any cell to inspect it. Find the 5 production problems hidden inside.
💡 Click on highlighted cells (marked PROBLEM) to identify the issues.
Select all 5 production problems you found. Check each one that applies:
Hardcoded file path (/Users/john/Desktop/...)
A pipeline is a directed acyclic graph (DAG) of tasks with defined dependencies. Tools like Apache Airflow, Prefect, and Dagster manage these workflows at scale.
Click each pipeline stage to learn what it does. The arrows show data flow and dependencies.
Notice: Validate and Transform run in parallel after Extract. This is a key optimization: independent tasks run simultaneously, cutting total pipeline time.
from airflow import DAG
from airflow.operators.python import PythonOperator

with DAG('ml_pipeline') as dag:
    extract = PythonOperator(task_id='extract', ...)
    validate = PythonOperator(task_id='validate', ...)
    transform = PythonOperator(task_id='transform', ...)
    load = PythonOperator(task_id='load', ...)
    train = PythonOperator(task_id='train', ...)
    evaluate = PythonOperator(task_id='evaluate', ...)

    extract >> [validate, transform]
    [validate, transform] >> load
    load >> [train, evaluate]
In the DAG above, the Deploy node should only run when:
Raw data is rarely model-ready. Feature engineering transforms it into representations that capture the signal your model needs. The key insight: feature engineering should live inside a reproducible pipeline, not scattered across notebook cells.
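To make "inside a reproducible pipeline" concrete, here is a minimal sketch using scikit-learn's Pipeline, assuming a tiny hypothetical numeric dataset (the feature values and labels below are made up for illustration):

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Hypothetical toy data: two numeric features, binary label
X = np.array([[1.0, 200.0], [2.0, 180.0], [3.0, 220.0], [4.0, 160.0]])
y = np.array([0, 0, 1, 1])

# All preprocessing lives inside the Pipeline object, so training
# and serving apply exactly the same transformations.
pipe = Pipeline([
    ("scale", StandardScaler()),
    ("model", LogisticRegression()),
])
pipe.fit(X, y)
predictions = pipe.predict(X)
```

Because the scaler is fitted as part of the pipeline, its learned means and variances travel with the model: one object to version, serialize, and deploy, instead of transformation code scattered across notebook cells.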
Watch how raw customer data is transformed step by step. Click each stage to apply the transformation.
Drag the slider to see how feature engineering investment affects accuracy. Based on real ML project data.
Complete the pipeline below. Replace # YOUR CODE HERE with a StandardScaler() step, then click Run.
What is the most important production benefit of wrapping preprocessing in a Pipeline object?
Models decay. The world changes: customer behavior shifts, economic conditions evolve, new product lines launch. A model trained on last year's data becomes less accurate over time. This is called concept drift.
This chart shows model accuracy over 12 months. Use the controls to simulate a retraining event and explore different drift patterns.
Compare the three main retraining strategies. Click each to see pros and cons.
Retrain every N days, regardless of performance
Retrain when accuracy drops below threshold
Monitor input data distribution; retrain when it shifts
Your churn model's accuracy dropped from 92% to 87% this month. What should you check first?
Would you deploy software without testing it? ML pipelines need tests too, but they're different from regular software tests. You test data, transformations, model behavior, and end-to-end system behavior.
Click each layer to see what to test at that level.
This is a Great Expectations-style data validation suite. Edit and run to see validation results.
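The core idea of such a suite can be approximated in plain Python, without the Great Expectations library itself. The check names and records below are hypothetical, chosen to mirror the kind of expectations the suite above expresses:

```python
# Hypothetical incoming records
records = [
    {"age": 34, "plan": "pro"},
    {"age": -1, "plan": "basic"},   # invalid: negative age
    {"age": 52, "plan": "free"},
]

# Each "expectation" is a named predicate over a single record
checks = {
    "age_between_0_and_120": lambda r: 0 <= r["age"] <= 120,
    "plan_in_allowed_set": lambda r: r["plan"] in {"free", "basic", "pro"},
}

# Collect every (check, record) pair that fails validation
failures = [
    (name, r)
    for r in records
    for name, check in checks.items()
    if not check(r)
]
for name, r in failures:
    print(f"FAILED {name}: {r}")
```

A real validation suite adds schema checks, null-rate thresholds, and reporting, but the shape is the same: declarative expectations evaluated against each batch, with failures surfaced before the data reaches training.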
Your data validation suite detects that 3% of incoming records have age = -1 (clearly invalid). Your pipeline is set to fail on any validation error. What should you do?
What you learned:
Key tools to know: