๐Ÿ  Course Home ๐ŸŽฏ K-Means Interactive ๐ŸŒณ Hierarchical ๐Ÿ”ฌ Lab

📊 Module 9: Clustering – Find the Hidden Groups

🎯 The Challenge

RetailMax has 2 million customers but treats them all the same. Every customer gets the same emails, the same promotions, the same recommendations. Result: 12% open rate on emails, $3.2M wasted on irrelevant promotions per year, and customers churning because they feel misunderstood.

Your job: find the hidden groups. Within those 2 million customers are segments with distinct behaviors, values, and needs. Discover them, and you can personalize at scale.

• 2M customers to segment
• $3.2M annual waste from generic targeting
• 340% ROI from personalized campaigns


1. What is Clustering?

Unsupervised learning – finding structure when you have no labels

Supervised vs. Unsupervised Learning

Everything we've done so far was supervised: we had labeled examples (spam/not-spam, price, churn/no-churn) and learned to predict them.

Clustering is different. You have data with no labels. You're exploring: are there natural groups? How many? What defines each group?

Key insight: Clustering is about discovering structure, not predicting outcomes. Two customers who buy premium brands, live in urban areas, and are 25–35 years old are "similar" even if we never labeled them as such.

Three Main Approaches

🎯 K-Means

Assign each point to the nearest centroid. Iterate until stable. Fast, scalable.

Best for: known # clusters, spherical shapes

🌳 Hierarchical

Merge closest points iteratively. Produces a tree (dendrogram).

Best for: unknown # clusters, nested groups

🌊 DBSCAN

Find dense regions, label sparse points as noise.

Best for: irregular shapes, outlier detection

2. K-Means Clustering

The workhorse of customer segmentation

How K-Means Works

  1. Choose k: Decide how many clusters you want
  2. Initialize centroids: Place k random points
  3. Assign points: Each point joins its nearest centroid
  4. Update centroids: Move each centroid to the mean of its cluster
  5. Repeat: Until centroids stop moving (convergence)
The math: K-Means minimizes the within-cluster sum of squares (WCSS):
WCSS = Σₖ Σᵢ∈Cₖ ||xᵢ - μₖ||²
where μₖ is the centroid of cluster k and Cₖ is the set of points assigned to it.
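The five steps above fit in a few lines of NumPy. This is a minimal sketch on made-up toy data (the `kmeans` helper and the two-blob dataset are illustrative only); a production run would use scikit-learn's `KMeans`, which adds k-means++ initialization and multiple restarts.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain K-Means following steps 1-5 above (illustrative helper)."""
    rng = np.random.default_rng(seed)
    # Step 2: initialize centroids as k random data points
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Step 3: assign each point to its nearest centroid
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Step 4: move each centroid to the mean of its cluster
        new_centroids = np.array([X[labels == j].mean(axis=0) for j in range(k)])
        # Step 5: stop once the centroids no longer move (convergence)
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    # The objective being minimized: within-cluster sum of squares
    wcss = float(((X - centroids[labels]) ** 2).sum())
    return labels, centroids, wcss

# Toy data: two well-separated blobs, so k=2 recovers them cleanly
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
labels, centroids, wcss = kmeans(X, k=2)
```

Because the blobs are far apart, the algorithm converges to the same split regardless of which data points the random initialization picks.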

🎮 Interactive K-Means Visualization

Click on the canvas to place data points. Then click "Run K-Means" to watch the algorithm cluster them in real time.


K-Means Strengths & Limitations

| Aspect | Detail | Implication |
|---|---|---|
| ✅ Speed | O(n·k·iterations) | Scales to millions of customers |
| ✅ Simplicity | Easy to understand & explain | Stakeholders can grasp the segments |
| ⚠️ Needs k upfront | Must specify number of clusters | Use elbow method or business knowledge |
| ⚠️ Spherical clusters | Assumes similar-sized round clusters | Fails on elongated or irregular shapes |
| ⚠️ Sensitive to init | Random start → different results | Use k-means++ or multiple restarts |
| ❌ No noise handling | Every point must join a cluster | Outliers distort centroids |

3. Choosing k: The Elbow Method

How many clusters is the right number?

The hardest part of K-Means is choosing k. Too few clusters and your segments are too broad; too many and they become tiny and meaningless.

The elbow method plots WCSS (inertia) against k. As k increases, WCSS always decreases, but there is a point of diminishing returns, the "elbow", where adding more clusters stops being worth it.
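A typical elbow sweep looks like this. The sketch uses scikit-learn on a synthetic three-blob dataset standing in for the RetailMax features (the blob centers are invented):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Synthetic stand-in with 3 true clusters at invented centers
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [7, 0], [3.5, 6]],
                  cluster_std=1.0, random_state=42)

# WCSS (inertia) for each candidate k
inertias = {}
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    inertias[k] = km.inertia_

# WCSS always falls as k grows; the elbow is where the drop flattens
drops = {k: inertias[k - 1] - inertias[k] for k in range(2, 8)}
```

With three true clusters, the drop from k=2 to k=3 should be large and the drop from k=3 to k=4 small: the elbow sits at k=3.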

๐Ÿ“ Interactive Elbow Method

Adjust the slider to see how WCSS and silhouette score change with different values of k on the RetailMax dataset.


Silhouette Score: A Better Metric

The silhouette score measures how similar each point is to its own cluster vs. the nearest other cluster:

s(i) = (b(i) - a(i)) / max(a(i), b(i))

where a(i) is the mean distance from point i to the other points in its own cluster and b(i) is the mean distance to the points of the nearest other cluster. Scores range from -1 to +1; values near +1 indicate well-separated clusters.
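In practice you sweep k and keep the value with the highest average silhouette. A sketch with scikit-learn's `silhouette_score` on synthetic, well-separated blobs (centers invented):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score

# Three well-separated synthetic clusters
X, _ = make_blobs(n_samples=300, centers=[[0, 0], [8, 0], [4, 7]],
                  cluster_std=0.8, random_state=7)

# Silhouette peaks at the k that best matches the true structure
scores = {}
for k in range(2, 7):
    labels = KMeans(n_clusters=k, n_init=10, random_state=7).fit_predict(X)
    scores[k] = silhouette_score(X, labels)

best_k = max(scores, key=scores.get)
```

Unlike WCSS, the silhouette score does not automatically improve as k grows, which is why it is the better model-selection metric.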

โ“ Quick Check 1

RetailMax runs K-Means with k=2 (WCSS=1200) and k=5 (WCSS=380). Should they always choose k=5?

Yes – lower WCSS always means better clustering
No – more clusters doesn't mean more useful segments
Yes – k=5 captures more customer nuance
No – WCSS can't be compared across different k values

4. Hierarchical Clustering

Building a tree of relationships

Instead of picking k upfront, hierarchical clustering builds a complete tree of merges. You can cut the tree at any height to get any number of clusters.

Agglomerative (Bottom-Up) Algorithm

  1. Start: each point is its own cluster (n clusters)
  2. Find the two closest clusters
  3. Merge them into one
  4. Repeat until one cluster remains
  5. Cut the dendrogram at the desired height
Linkage methods define "closest":
• Single: minimum distance between any two points
• Complete: maximum distance between any two points
• Average: average of all pairwise distances
• Ward: minimizes increase in total WCSS (best for most use cases)
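With SciPy, the whole agglomerative procedure is two calls: `linkage` builds the full merge tree (the dendrogram data) and `fcluster` cuts it. A sketch on a made-up two-group dataset:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two tight groups of points
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(4, 0.3, (20, 2))])

# Build the complete merge tree with Ward linkage
Z = linkage(X, method="ward")

# Cut the tree to yield exactly 2 clusters (instead of picking a height)
labels = fcluster(Z, t=2, criterion="maxclust")
```

The same `Z` matrix can be cut again at any other level, which is exactly the "decide k later" advantage described above.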

🌳 Interactive Dendrogram

Click on the dendrogram to cut at different heights and see how many clusters result. The colored bands show different cluster groupings.


โ“ Quick Check 2

A dendrogram shows a very long vertical line before the final merge at the top. What does this suggest?

The data has many outliers
There are likely 2 natural clusters – the two groups are very different
K-Means would be a better algorithm here
The linkage method is wrong

5. DBSCAN: Density-Based Clustering

Find clusters of any shape, ignore noise

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) doesn't look for compact spheres. Instead, it finds dense regions and labels sparse points as noise.

Two Parameters

• ε (epsilon): the radius of the neighborhood searched around each point
• minPts: the minimum number of neighbors within ε for a point to count as dense

Three Types of Points

🔵 Core Point

Has ≥ minPts neighbors within ε. Center of a dense region.

🟡 Border Point

Fewer than minPts neighbors, but reachable from a core point.

🔴 Noise Point

Not core, not reachable from any core. Outlier.
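A sketch with scikit-learn's `DBSCAN`, whose `eps` and `min_samples` parameters correspond to ε and minPts. The two blobs and three outliers are made-up toy data:

```python
import numpy as np
from sklearn.cluster import DBSCAN

# Two dense blobs plus three far-away outliers
rng = np.random.default_rng(3)
X = np.vstack([
    rng.normal(0, 0.15, (40, 2)),              # dense region A
    rng.normal(5, 0.15, (40, 2)),              # dense region B
    [[10.0, 10.0], [-8.0, 9.0], [9.0, -9.0]],  # isolated outliers
])

# eps plays the role of epsilon, min_samples of minPts
labels = DBSCAN(eps=0.75, min_samples=5).fit_predict(X)

# DBSCAN labels noise points -1; clusters are numbered 0, 1, ...
n_clusters = len(set(labels)) - (1 if -1 in labels else 0)
n_noise = int((labels == -1).sum())
```

Note that no number of clusters was specified anywhere: the density parameters alone determine both the cluster count and which points end up as noise.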

🌊 Interactive DBSCAN

Adjust epsilon to see how the density threshold changes cluster formation. Watch how noise points (red ✕) appear when epsilon is too small.


When to Use DBSCAN

DBSCAN shines when: clusters have irregular, non-spherical shapes; the data contains outliers you want flagged explicitly; the number of clusters is unknown in advance.

DBSCAN struggles when: clusters have very different densities (a single ε can't suit them all), or the data is high-dimensional, where distance-based density estimates become unreliable.

โ“ Quick Check 3

Which algorithm is BEST for finding fraudulent transactions (rare, unusual patterns) in a customer dataset?

K-Means with k=10
Hierarchical with Ward linkage
DBSCAN – fraud cases will be labeled as noise/outliers
Any algorithm works equally well here

6. Business Case: RetailMax Customer Segmentation

From algorithm to actionable strategy

The Data

RetailMax has 3 key features per customer: Purchase Frequency (orders/year), Average Order Value ($), and Recency (days since last purchase).
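Before clustering, these three features must be put on a common scale: otherwise recency (days) and order value (dollars) swamp frequency (orders/year) in the distance computation. A sketch with invented numbers for six hypothetical customers:

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

# Hypothetical customers: [frequency (orders/yr), avg order value ($), recency (days)]
X = np.array([
    [24.0, 150.0, 5.0], [20.0, 140.0, 9.0], [22.0, 160.0, 7.0],  # active, high-value
    [2.0, 30.0, 200.0], [3.0, 25.0, 180.0], [1.0, 35.0, 220.0],  # dormant, low-value
])

# Standardize so dollars and days don't drown out orders/year
X_scaled = StandardScaler().fit_transform(X)

labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X_scaled)
```

After clustering, you would profile each segment by averaging the original (unscaled) features per label, which is how tables like the one below are built.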

🎯 RetailMax Segmentation Demo

Click "Segment Customers" to run K-Means on the RetailMax data and see the resulting business segments.

From Segments to Strategy

| Segment | Profile | Size | Strategy | Expected Lift |
|---|---|---|---|---|
| 💎 Champions | High frequency, high value, recent | 8% | VIP program, early access, referral rewards | +25% AOV |
| 🌱 Potential Loyalists | Mid frequency, growing value | 22% | Loyalty tier, personalized recommendations | +40% retention |
| 💤 At Risk | Was frequent, now dormant | 31% | Win-back campaign, "We miss you" discounts | Recover 15% |
| 👋 New Customers | Low frequency, recent | 39% | Onboarding series, first-purchase discounts | +60% 2nd purchase |

โ“ Quick Check 4

After running K-Means, one cluster contains 80% of all customers while the other two are tiny. What went wrong?

The data wasn't normalized – features with large ranges dominate
k is too small – you need more clusters
Bad centroid initialization
All of the above could cause this – always normalize data, validate k, and use k-means++

7. Choosing the Right Algorithm

The right tool for the right job

| Criterion | K-Means | Hierarchical | DBSCAN |
|---|---|---|---|
| Need to specify k? | ✅ Yes | ❌ No (cut later) | ❌ No (automatic) |
| Scalability | ✅ Excellent (millions) | ⚠️ O(n²) – medium datasets | ✅ Good |
| Cluster shapes | Spherical only | Any (via dendrogram) | Any shape |
| Handles noise? | ❌ No | ❌ No | ✅ Yes (explicitly) |
| Interpretability | High (centroids) | High (dendrogram) | Medium |
| Best use case | Customer segmentation, profiling | Taxonomy, hierarchy understanding | Anomaly detection, geo clusters |

โ“ Quick Check 5 (Final)

RetailMax wants to identify customer segments to target with email campaigns. The marketing team insists on exactly 4 segments (one per team member). Which algorithm is MOST appropriate?

K-Means with k=4 – fast, interpretable, meets the constraint
DBSCAN โ€” it finds the natural number of clusters
Hierarchical with single linkage
No algorithm โ€” you need labels for email marketing

๐Ÿ“ Module 9 Summary
