RetailMax has 2 million customers but treats them all the same. Every customer gets the same emails, the same promotions, the same recommendations. The result: a 12% email open rate, $3.2M per year wasted on irrelevant promotions, and customers churning because they feel misunderstood.
Your job: find the hidden groups. Within those 2 million customers are segments with distinct behaviors, values, and needs. Discover them, and you can personalize at scale.
Unsupervised learning: finding structure when you have no labels
Everything we've done so far was supervised: we had labeled examples (spam/not-spam, price, churn/no-churn) and learned to predict them.
Clustering is different. You have data with no labels. You're exploring: are there natural groups? How many? What defines each group?
K-Means: Assign each point to the nearest centroid, recompute the centroids, and repeat until stable. Fast and scalable. Best for: known # of clusters, spherical shapes.

Hierarchical: Merge the closest points or clusters iteratively. Produces a tree (dendrogram). Best for: unknown # of clusters, nested groups.

DBSCAN: Find dense regions and label sparse points as noise. Best for: irregular shapes, outlier detection.

The workhorse of customer segmentation
WCSS = Σⱼ Σᵢ∈Cⱼ ||xᵢ − μⱼ||²

Click on the canvas to place data points. Then click "Run K-Means" to watch the algorithm cluster them in real time.
| Aspect | Detail | Implication |
|---|---|---|
| ✓ Speed | O(n·k·iterations) | Scales to millions of customers |
| ✓ Simplicity | Easy to understand & explain | Stakeholders can grasp the segments |
| ⚠️ Needs k upfront | Must specify number of clusters | Use elbow method or business knowledge |
| ⚠️ Spherical clusters | Assumes similar-sized round clusters | Fails on elongated or irregular shapes |
| ⚠️ Sensitive to init | Random start → different results | Use k-means++ or multiple restarts |
| ✗ No noise handling | Every point must join a cluster | Outliers distort centroids |
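The loop described above (assign points to the nearest centroid, recompute, repeat) is what any k-means implementation runs under the hood. A minimal sketch using scikit-learn (an assumption; the lesson names no library) on synthetic two-blob data, with k-means++ initialization and multiple restarts to address the init sensitivity noted in the table:

```python
import numpy as np
from sklearn.cluster import KMeans

# Synthetic stand-in for customer features (the real RetailMax data is not shown)
rng = np.random.default_rng(42)
customers = np.vstack([
    rng.normal(loc=[2, 2], scale=0.5, size=(100, 2)),   # one dense group
    rng.normal(loc=[8, 8], scale=0.5, size=(100, 2)),   # another dense group
])

# k-means++ init plus n_init restarts mitigates sensitivity to random starts
km = KMeans(n_clusters=2, init="k-means++", n_init=10, random_state=0)
labels = km.fit_predict(customers)

print(km.inertia_)          # WCSS: the quantity k-means minimizes
print(km.cluster_centers_)  # one centroid per cluster
```

With two well-separated blobs, every restart converges to the same two centroids; on messier data, the restarts are what keep a bad random start from sticking.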
How many clusters is the right number?
The hardest part of K-Means is choosing k. Too few clusters → segments are too broad. Too many → segments are tiny and not meaningful.
The elbow method plots WCSS (inertia) against k. As k increases, WCSS always decreases. But there's a point of diminishing returns, the "elbow", where adding more clusters stops being worth it.
Adjust the slider to see how WCSS and silhouette score change with different values of k on the RetailMax dataset.
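Outside the interactive widget, the same elbow curve is just WCSS computed for a range of k values. A sketch on synthetic data with three true groups, assuming scikit-learn is available (all data here is illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
# Three well-separated blobs, so the elbow should appear around k = 3
data = np.vstack([rng.normal(c, 0.4, (60, 2)) for c in ([0, 0], [5, 5], [0, 5])])

wcss = []
for k in range(1, 9):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(data)
    wcss.append(km.inertia_)   # inertia_ is exactly the WCSS from the formula above

# WCSS keeps shrinking as k grows, but the drop flattens sharply after the elbow
print([round(w) for w in wcss])
```

Plot `wcss` against k and the curve falls steeply until k = 3, then flattens: the elbow.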
The silhouette score measures how similar each point is to its own cluster vs. other clusters:
s(i) = (b(i) − a(i)) / max(a(i), b(i))

where a(i) is the mean distance from point i to the other points in its own cluster, and b(i) is the mean distance to the points in the nearest other cluster.

RetailMax runs K-Means with k=2 (WCSS=1200) and k=5 (WCSS=380). Should they always choose k=5?
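Unlike WCSS, the silhouette score does not automatically improve as k grows, which makes it a useful cross-check. A sketch using scikit-learn's `silhouette_score` (an assumption; data is synthetic with two true groups):

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(1)
# Two clearly separated blobs: the "true" k is 2
data = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in ([0, 0], [4, 4])])

scores = {}
for k in (2, 3, 4):
    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(data)
    scores[k] = silhouette_score(data, labels)   # mean s(i) over all points

for k, s in scores.items():
    print(k, round(s, 3))
```

The score peaks at k = 2 and drops for larger k, because splitting a natural group shrinks b(i) for the split points.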
Building a tree of relationships
Instead of picking k upfront, hierarchical clustering builds a complete tree of merges. You can cut the tree at any height to get any number of clusters.
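Build-the-tree-then-cut can be sketched with SciPy's `linkage` and `fcluster` (an assumption; the lesson names no library). Cutting the same tree at two different heights yields two different cluster counts without re-running anything:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(2)
# Three compact groups of 20 points each (illustrative data)
data = np.vstack([rng.normal(c, 0.3, (20, 2)) for c in ([0, 0], [5, 0], [0, 5])])

# Build the complete merge tree; Ward linkage favors compact clusters
Z = linkage(data, method="ward")

# "Cutting" the dendrogram at different heights gives different cluster counts
counts = {}
for height in (5.0, 100.0):
    labels = fcluster(Z, t=height, criterion="distance")
    counts[height] = len(set(labels))
print(counts)
```

A low cut (below the big inter-group merges) recovers the three groups; a cut above every merge collapses everything into one cluster.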
Click on the dendrogram to cut at different heights and see how many clusters result. The colored bands show different cluster groupings.
A dendrogram shows a very long vertical line before the final merge at the top. What does this suggest?
Find clusters of any shape, ignore noise
DBSCAN (Density-Based Spatial Clustering of Applications with Noise) doesn't look for compact spheres. Instead, it finds dense regions and labels sparse points as noise.
Core point: has ≥ minPts neighbors within ε. Center of a dense region.

Border point: fewer than minPts neighbors, but reachable from a core point.

Noise point: not core, not reachable from any core. Outlier.
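The core/border/noise logic above is what scikit-learn's `DBSCAN` applies (an assumption; the lesson names no library): points in dense regions get cluster labels, and anything unreachable from a core point gets the label -1. A sketch on synthetic data with two dense groups and two planted outliers:

```python
import numpy as np
from sklearn.cluster import DBSCAN

rng = np.random.default_rng(3)
dense = np.vstack([rng.normal(c, 0.2, (40, 2)) for c in ([0, 0], [3, 3])])
outliers = np.array([[10.0, 10.0], [-8.0, 5.0]])   # far from any dense region
data = np.vstack([dense, outliers])

# eps is the ε radius; min_samples is minPts
db = DBSCAN(eps=0.5, min_samples=5).fit(data)
labels = db.labels_                     # -1 marks noise points

n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
print(n_clusters, list(labels[-2:]))    # the two planted outliers come out as noise
```

Note that no cluster count was specified anywhere: the two clusters and the two outliers fall out of ε and minPts alone.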
Adjust epsilon to see how the density threshold changes cluster formation. Watch how noise points (red ×) appear when epsilon is too small.
Which algorithm is BEST for finding fraudulent transactions (rare, unusual patterns) in a customer dataset?
From algorithm to actionable strategy
RetailMax has 3 key features per customer: Purchase Frequency (orders/year), Average Order Value ($), and Recency (days since last purchase).
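Because frequency, dollar value, and days live on very different scales, the features must be standardized before K-Means, or Euclidean distance is dominated by whichever feature has the largest numbers. A sketch with scikit-learn on hypothetical stand-in data (the real RetailMax dataset is not shown; distributions here are invented for illustration):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import KMeans

# Hypothetical stand-ins for RetailMax's three features per customer
rng = np.random.default_rng(7)
frequency = rng.poisson(6, 500).astype(float)    # orders/year
order_value = rng.gamma(2.0, 40.0, 500)          # average order value ($)
recency = rng.exponential(60.0, 500)             # days since last purchase
X = np.column_stack([frequency, order_value, recency])

# Standardize so each feature contributes comparably to the distance
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X_scaled)

# Map centroids back to original units so the segments can be named
centroids = scaler.inverse_transform(km.cluster_centers_)
for c in centroids:
    print(f"freq={c[0]:.1f}/yr  aov=${c[1]:.0f}  recency={c[2]:.0f} days")
```

Reading the centroids in original units (orders/year, dollars, days) is what turns an anonymous "cluster 2" into a business segment like "At Risk".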
Click "Segment Customers" to run K-Means on the RetailMax data and see the resulting business segments.
| Segment | Profile | Size | Strategy | Expected Lift |
|---|---|---|---|---|
| Champions | High frequency, high value, recent | 8% | VIP program, early access, referral rewards | +25% AOV |
| Potential Loyalists | Mid frequency, growing value | 22% | Loyalty tier, personalized recommendations | +40% retention |
| At Risk | Was frequent, now dormant | 31% | Win-back campaign, "We miss you" discounts | Recover 15% |
| New Customers | Low frequency, recent | 39% | Onboarding series, first-purchase discounts | +60% 2nd purchase |
After running K-Means, one cluster contains 80% of all customers while the other two are tiny. What went wrong?
The right tool for the right job
| Criterion | K-Means | Hierarchical | DBSCAN |
|---|---|---|---|
| Need to specify k? | ✗ Yes | ✓ No (cut later) | ✓ No (automatic) |
| Scalability | ✓ Excellent (millions) | ⚠️ O(n²), medium datasets | ✓ Good |
| Cluster shapes | Spherical only | Any (via dendrogram) | Any shape |
| Handles noise? | ✗ No | ✗ No | ✓ Yes (explicitly) |
| Interpretability | High (centroids) | High (dendrogram) | Medium |
| Best use case | Customer segmentation, profiling | Taxonomy, hierarchy understanding | Anomaly detection, geo clusters |
RetailMax wants to identify customer segments to target with email campaigns. The marketing team insists on exactly 4 segments (one per team member). Which algorithm is MOST appropriate?