Building production-grade ML platforms: from monoliths to microservices, feature stores to model registries.
Your company just raised $50M Series B. The CEO walks into your office and asks:
"How do we build an ML platform that scales? We're competing with companies that have 10Γ our team. What's our move?"
You have 6 months and a team of 5. Every architectural decision you make in the next 30 minutes will determine whether you ship on time, or burn through the Series B with nothing to show.
In this module, you'll learn the 4 architectural pillars every production ML system needs. We'll make the same decisions real companies made, and see which ones paid off.
Day 1 decision: Your team needs to ship a fraud detection model in 8 weeks. Do you build everything in one system, or split it into independent services? This choice will shape your codebase for years.
Click on any component to learn what it does and where it belongs. Use the tabs to compare the two architectures.
Fast to build: one repo, one deploy. Great for small teams and early stage.
Each service independently deployable: scale only what needs scaling.
| Metric | Monolith |
|---|---|
| Deploy time | 15 min |
| Team size fit | 1–8 engineers |
| Scaling | Scale everything together |
| Fault isolation | None |
| Initial cost | $$ (low) |
| Tech debt risk | High at scale |

Best for: MVPs, small teams, homogeneous workloads
| Metric | Microservices |
|---|---|
| Deploy time | 2–4 hours setup |
| Team size fit | 5–500+ engineers |
| Scaling | Per-service |
| Fault isolation | Excellent |
| Initial cost | $$$$ (high infra) |
| Tech debt risk | Low long-term |

Best for: Scale-ups, diverse models, high availability
Scenario: You're a Series A startup with 3 ML engineers. You need to ship a recommendation engine in 6 weeks for your flagship product. What do you build?
Netflix is the canonical example of a monolith-to-microservices migration done right. Here's how they did it:
Lesson: They started monolithic deliberately, not by accident. The monolith let them build product intuition before investing in infrastructure.
Week 3: Your fraud detection model is in production. Now the recommendation team wants to use some of the same features β user transaction history, session behavior. Without a feature store, they'll spend 3 months rebuilding pipelines you already built. There's a better way.
How much engineering time do you save as feature reuse increases? Drag the slider to see.
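The slider's math can be sketched as a simple model. All the numbers below (months per pipeline, reuse rate) are illustrative assumptions for demonstration, not measured data:

```python
# Illustrative model of feature-store savings: engineering time saved
# as the share of features reused from the store grows.

def eng_months_saved(num_later_models: int,
                     reuse_rate: float,
                     months_per_pipeline: float = 3.0) -> float:
    """Months saved when later models reuse existing feature pipelines.

    num_later_models: models built after the first one
    reuse_rate: fraction of each model's features served from the store (0-1)
    months_per_pipeline: assumed cost of rebuilding a feature pipeline
    """
    return num_later_models * reuse_rate * months_per_pipeline

# The recommendation team reusing 80% of the fraud model's features:
print(eng_months_saved(num_later_models=1, reuse_rate=0.8))  # 2.4 months
```

With one downstream team and 80% reuse, that is most of the 3 months of rebuilding avoided; the savings compound with every additional model.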
Click "Run Step" to walk through how raw data becomes model-ready features in a production feature store.
Click "Run Next Step" to start the pipeline simulation.
Your recommendation model needs features that must be computed in real-time (e.g., "items the user clicked in the last 5 minutes") AND historical features computed offline (e.g., "user's 30-day purchase history"). What architecture supports both?
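The answer is a dual-path (sometimes called "lambda-style") feature store: an offline store for batch-computed history plus a low-latency online store, merged at request time. A minimal in-memory sketch, with hypothetical store contents and feature names:

```python
# Offline store: batch-computed historical features, refreshed by a
# nightly job (stubbed here as a dict).
offline_store = {"user_42": {"purchases_30d": 7, "avg_order_value": 31.5}}

# Online store: real-time features updated by a streaming job
# (e.g. clicks in the last 5 minutes).
online_store = {"user_42": {"clicks_5min": ["item_9", "item_3"]}}

def get_features(user_id: str) -> dict:
    """Merge offline (historical) and online (real-time) features so the
    model sees a single consistent feature vector at serving time."""
    features = dict(offline_store.get(user_id, {}))
    features.update(online_store.get(user_id, {}))  # online wins on conflict
    return features

print(get_features("user_42"))
```

Production feature stores such as Feast follow this same split: the offline store backs training and backfills, while the online store serves the sub-millisecond lookups.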
Uber's Michelangelo is one of the most influential ML platforms ever built. Here's the problem it solved:
Business impact: Uber's ETA model accuracy improvement (enabled by richer features) is estimated to have reduced driver idle time by 20%, generating ~$300M/year in efficiency gains.
Month 2: Your fraud model is live. It's working great. Then an engineer updates a preprocessing step, retrains the model, and pushes it directly to production. Performance tanks; fraud escapes undetected. You don't know which version is running or how to roll back. Sound familiar?
You deployed a model without versioning. The model starts performing poorly. Which nightmare scenario are you in?
Simulate promoting a model through the development lifecycle. Click on each stage to move the model forward.
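The promotion flow above can be sketched as a toy registry with MLflow-style stages. This is an illustration of the concept, not MLflow's actual API; class and stage names are assumptions:

```python
# Toy model registry: every (name, version) pair has exactly one stage,
# so "which version is in production?" always has an answer -- the
# question the Month 2 incident left you unable to ask.
STAGES = ["None", "Staging", "Production", "Archived"]

class Registry:
    def __init__(self):
        self.models = {}  # (name, version) -> stage

    def register(self, name: str, version: int) -> None:
        self.models[(name, version)] = "None"

    def promote(self, name: str, version: int, stage: str) -> None:
        if stage not in STAGES:
            raise ValueError(f"unknown stage: {stage}")
        self.models[(name, version)] = stage

    def production_version(self, name: str):
        """Rollback starts here: find what is actually serving traffic."""
        for (n, v), stage in self.models.items():
            if n == name and stage == "Production":
                return v
        return None

reg = Registry()
reg.register("fraud", 1)
reg.promote("fraud", 1, "Staging")
reg.promote("fraud", 1, "Production")
print(reg.production_version("fraud"))  # 1
```

A real registry adds artifact storage, lineage to training data, and an approval gate on the Staging-to-Production transition.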
Month 4: Your model is solid. Now engineering wants to integrate it into the mobile app. You need to expose it as an API. How you design this API determines latency, reliability, and how easily other teams can use your model.
Different serving patterns have very different latency profiles. Use the controls to simulate request types and see the impact.
Tip: For 1 request, real-time is best. Use a synchronous REST API with a <50ms SLA.
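The crossover between real-time and batch serving comes down to amortizing per-call overhead. A back-of-envelope sketch (both timing constants are assumptions, not benchmarks):

```python
# Per-call overhead (network + serialization) vs per-item inference cost.
PER_REQUEST_OVERHEAD_MS = 10
PER_ITEM_INFERENCE_MS = 5

def realtime_latency_ms(n_requests: int) -> int:
    """Each request pays overhead and inference individually."""
    return n_requests * (PER_REQUEST_OVERHEAD_MS + PER_ITEM_INFERENCE_MS)

def batch_latency_ms(n_requests: int) -> int:
    """One call amortizes the overhead across the whole batch."""
    return PER_REQUEST_OVERHEAD_MS + n_requests * PER_ITEM_INFERENCE_MS

print(realtime_latency_ms(1), batch_latency_ms(1))      # 15 15
print(realtime_latency_ms(100), batch_latency_ms(100))  # 1500 510
```

For a single request the two are identical, so the simpler synchronous path wins; at 100 requests, batching cuts total work roughly 3x in this toy model, which is why nightly scoring jobs are batched.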
Your fraud detection model must return a decision within 200ms while the user is completing a checkout. The model uses 50 features. What serving pattern do you use?
Write a FastAPI prediction endpoint. Your task: implement the /predict endpoint that (1) validates input, (2) loads features, (3) runs inference, and (4) returns a structured response. Click "Run" to test it.
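One possible shape for this exercise, shown framework-free so the four steps stand out. The feature store and model are stubbed with hypothetical in-memory stand-ins (`user_42`, `txn_count_24h`, `fraud-v3` are made-up names); in FastAPI you would wrap `handle_predict` in an `@app.post("/predict")` route with a Pydantic request model:

```python
MODEL_VERSION = "fraud-v3"
FEATURE_STORE = {"user_42": {"txn_count_24h": 4, "avg_amount_7d": 52.0}}

def run_inference(features: dict) -> float:
    # Stand-in for the real model: a hypothetical threshold rule.
    return 0.9 if features["txn_count_24h"] > 3 else 0.1

def handle_predict(payload: dict) -> dict:
    # 1. Validate input
    user_id = payload.get("user_id")
    if not isinstance(user_id, str):
        return {"error": "user_id (string) is required", "status": 422}
    # 2. Load features
    features = FEATURE_STORE.get(user_id)
    if features is None:
        return {"error": f"no features for {user_id}", "status": 404}
    # 3. Run inference
    score = run_inference(features)
    # 4. Structured response, with model_version for traceability
    return {"fraud_score": score,
            "model_version": MODEL_VERSION,
            "status": 200}

print(handle_predict({"user_id": "user_42"}))
```

Returning explicit status codes for bad input (422) and missing features (404) keeps failure modes distinguishable for callers instead of collapsing everything into a 500.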
Tip: include model_version in every response. When predictions change, clients know why.
Based on everything you've learned, here's the playbook for your company's ML platform:
| Month | Decision | Why | Cost |
|---|---|---|---|
| 1–2 | Monolithic MVP | Ship fast, learn what matters | $2K/mo infra |
| 3–4 | Feature Store (Feast) | Second model proves reuse value | 1 eng-month setup |
| 4–5 | Model Registry (MLflow) | Multiple models: governance needed | Open source, free |
| 5–6 | REST APIs (FastAPI) | Product integrations go live | Minimal overhead |
| 6+ | Begin microservices migration | Now you know what to split | 3–6 months effort |
ML Frameworks & Applied Analytics Β· Chenhao Zhou Β· Rutgers Business School
Framework 1 of 6 Β· Teaching Portfolio