
🌳 Module 9B: Hierarchical Clustering — Structure, Not Just Groups

πŸ“ Your Journey Through Hierarchical Clustering

0%
1Concept
2Build
3Dendrogram
4Linkage
5Apply

🎯 The Business Problem K-Means Can't Solve

"The marketing team wants to know not just WHAT groups exist, but HOW they relate to each other. Are 'budget shoppers' a completely separate segment, or are they a sub-group of 'value seekers'? Which premium customers are closest to standard-tier? We need a hierarchy, not just a partition."

— CMO of a $2B retail company during a strategy meeting

K-Means gave us k clusters. Great. But it can't tell us:

• How the clusters relate to each other
• Whether one segment is a sub-group of another
• Which clusters are each other's closest neighbors

Hierarchical clustering builds a tree of relationships — a dendrogram — that answers all of these at once. You can cut it at any level to get 2, 3, 5, or 10 clusters without rerunning anything.

πŸ—οΈ Step 1: Agglomerative Clustering β€” Building From the Bottom Up

Agglomerative (bottom-up) clustering starts with every point as its own cluster, then repeatedly merges the two closest clusters until only one remains. Watch it happen:

🎬 Live Merge Animation

[Interactive widget: click "New Points" to generate data, then "Next Merge" to step through agglomerative clustering. Colors show the current clusters; the last merged pair is highlighted.]
Algorithm in Plain English:
1. Start: N points β†’ N singleton clusters
2. Find the two closest clusters (by chosen linkage method)
3. Merge them into one cluster
4. Repeat from step 2 until 1 cluster remains
5. Record every merge — this becomes your dendrogram!
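The five steps above can be sketched directly in code. This is a deliberately naive single-linkage version (quadratic search per merge — real library implementations are far more efficient); it assumes NumPy is available:

```python
# Naive agglomerative clustering with single linkage — a teaching sketch.
import numpy as np

def agglomerate(points):
    """Merge the two closest clusters until one remains; return the merge log."""
    clusters = {i: [i] for i in range(len(points))}  # step 1: N singleton clusters
    merges = []                                      # (cluster_a, cluster_b, distance)
    next_id = len(points)
    while len(clusters) > 1:
        best = None
        # Step 2: find the two closest clusters (single linkage = min pairwise distance)
        for a in clusters:
            for b in clusters:
                if a >= b:
                    continue
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        clusters[next_id] = clusters.pop(a) + clusters.pop(b)  # step 3: merge
        merges.append((a, b, d))                               # step 5: record the merge
        next_id += 1                                           # step 4: repeat
    return merges

pts = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0]])
log = agglomerate(pts)
for a, b, d in log:
    print(f"merged {a} and {b} at distance {d:.2f}")
```

The recorded merge log is exactly what a dendrogram plots: which clusters merged, in what order, and at what distance.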

🌲 Step 2: Reading the Dendrogram

The dendrogram records every merge: which clusters merged and at what distance. The height of each horizontal bar = the distance between clusters when they merged. Cut the tree at any height to get your clusters.

βœ‚οΈ Interactive Dendrogram β€” Drag to Cut

50 Clusters: 2

⬆ Drag the slider or click on the dendrogram to set the cut height. Each color = one cluster.

Adjust the cut height slider to see how many clusters you get.

🔑 Key Insight: Long Branches = Natural Clusters

📏
Height = Distance

Taller bars = clusters that were far apart when merged. Look for large gaps.

✂️
Cut = k Clusters

Cut at height h → count the vertical lines crossing the cut line = number of clusters.

🌿
Hierarchy Preserved

Clusters at k=4 are sub-divisions of clusters at k=2. No re-running needed.
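In practice, building the tree and cutting it is a few lines with SciPy (assuming `scipy` and `numpy` are installed, on synthetic data): `linkage` records the merge log, and `fcluster` cuts the same tree at different levels without re-running anything.

```python
# Build the tree once, cut it at any level — a SciPy sketch.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Two well-separated synthetic blobs
pts = np.vstack([rng.normal(0, 0.5, (10, 2)),
                 rng.normal(5, 0.5, (10, 2))])

Z = linkage(pts, method="ward")  # merge log: rows of [idx_a, idx_b, distance, size]
labels2 = fcluster(Z, t=2, criterion="maxclust")  # cut into 2 clusters
labels4 = fcluster(Z, t=4, criterion="maxclust")  # finer cut of the SAME tree
print(len(set(labels2)), len(set(labels4)))
```

Note the nesting: every k=4 cluster sits entirely inside one k=2 cluster, which is the "hierarchy preserved" property above. (`scipy.cluster.hierarchy.dendrogram(Z)` plots the tree itself.)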

🔗 Step 3: Linkage Methods — How Do We Measure Cluster Distance?

Once we have clusters (not just points), how do we measure distance between clusters? Different answers give dramatically different trees.

βš–οΈ Compare Linkage Methods Side-by-Side

[Interactive widget: select a linkage method to see the point clusters and the resulting dendrogram it produces.]
| Linkage | Distance = ? | Pros | Cons | Best For |
|---|---|---|---|---|
| Single | Minimum distance between any two points across clusters | Handles non-spherical shapes | Chaining effect — long, stringy clusters | Detecting elongated shapes |
| Complete | Maximum distance between any two points | Compact, tight clusters | Sensitive to outliers | Equal-size, compact clusters |
| Average | Average of all pairwise distances | Balanced, robust | Less intuitive | General-purpose use |
| Ward | Increase in total within-cluster variance | Minimizes variance — similar to K-Means criterion | Tends toward equal-size clusters | Most business applications ✅ |
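To make the differences concrete, here is a sketch (SciPy assumed; the data is synthetic) that cuts the same dataset into two clusters under each linkage. Single linkage chains along the elongated "string" and recovers it intact, while the compact-cluster methods may split it:

```python
# Same data, four linkage methods — compare the resulting 2-cluster cuts.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(1)
# An elongated "string" of points plus a compact blob far away
string = np.column_stack([np.linspace(0, 8, 12), np.zeros(12)])
blob = rng.normal([4.0, 6.0], 0.3, (12, 2))
pts = np.vstack([string, blob])

for method in ("single", "complete", "average", "ward"):
    labels = fcluster(linkage(pts, method=method), t=2, criterion="maxclust")
    sizes = sorted(np.bincount(labels)[1:].tolist(), reverse=True)
    print(f"{method:>8}: cluster sizes {sizes}")
```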

πŸ“ Distance Metrics: What Does "Close" Mean?

Before we can merge clusters, we need to define distance between points. The choice can change your results dramatically β€” especially with high-dimensional or text data.

📊 Visual Comparison of Distance Metrics

📏
Euclidean

Straight-line distance. √(Σ(aᵢ-bᵢ)²)
Use for: spatial, continuous numeric data

🏙️
Manhattan

Grid-path distance. Σ|aᵢ-bᵢ|
Use for: outlier-prone data, city-block movement

📐
Cosine

Angle between vectors. 1 - (a·b)/(|a||b|)
Use for: text, high-dimensional data, NLP
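A quick NumPy sketch of all three metrics on toy vectors (the numbers are purely illustrative):

```python
# Three distance metrics on the same pair of vectors.
import numpy as np

a = np.array([3.0, 4.0])
b = np.array([6.0, 8.0])   # same direction as a, twice the magnitude

euclidean = np.linalg.norm(a - b)        # √(Σ(aᵢ-bᵢ)²) = 5.0
manhattan = np.abs(a - b).sum()          # Σ|aᵢ-bᵢ| = 7.0
cosine = 1 - (a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))  # 0.0: same angle

print(euclidean, manhattan, cosine)
```

Note how cosine distance is zero here even though Euclidean distance is not: cosine ignores magnitude and compares only direction, which is exactly why it suits text vectors of very different lengths.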

🎯 Step 4: Choosing the Right Number of Clusters

With hierarchical clustering, you can choose k after building the tree. Two methods help you pick the optimal cut:

πŸ“ Dendrogram Gap Method + Silhouette Score

Dendrogram β€” Find the Largest Gap

The red arrow points to the largest gap β€” that's where you should cut!

Silhouette Score by k

Higher silhouette = better-separated clusters. Pick the k with the highest score.

Two Rules of Thumb:
🌲 Dendrogram Gap: Find the longest vertical line with no horizontal crossing — cut in the middle of that gap.
📊 Silhouette Score: Ranges from -1 to 1. Score near 1 = tight, well-separated clusters. Pick the k that maximizes it.
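The silhouette rule of thumb in code — a sketch assuming scikit-learn is installed, on synthetic blobs where k=3 is the natural answer:

```python
# Choosing k by silhouette score with agglomerative clustering.
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(2)
# Three well-separated synthetic blobs, so k=3 should score highest
pts = np.vstack([rng.normal(center, 0.4, (20, 2))
                 for center in ([0, 0], [6, 0], [3, 5])])

scores = {}
for k in range(2, 7):
    labels = AgglomerativeClustering(n_clusters=k, linkage="ward").fit_predict(pts)
    scores[k] = silhouette_score(pts, labels)

best_k = max(scores, key=scores.get)
print("best k:", best_k)
```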

💼 Business Case: Customer Hierarchy at LuxRetail

Scenario: LuxRetail has 50,000 customers with purchase history, visit frequency, and average spend. The CMO wants a customer hierarchy — not just 3 segments, but a full tree showing how segments relate, so marketing can target at multiple levels of granularity.

πŸͺ Customer Clustering Hierarchy β€” Interactive Explore

📋 Discovered Customer Hierarchy (3-level)

All Customers
├── 💎 Premium Tier (top 20%, avg spend $850/visit)
│   ├── Luxury Loyalists (frequent, high spend, brand-conscious)
│   └── Occasion Splurgers (infrequent but very high single-visit spend)
├── 📦 Standard Tier (middle 55%, avg spend $210/visit)
│   ├── Regular Shoppers (steady frequency, moderate spend)
│   └── Deal Hunters (medium frequency, discount-driven)
└── 🏷️ Budget Tier (bottom 25%, avg spend $45/visit)
    ├── Occasional Browsers (low frequency, low spend)
    └── Churn Risk (declining engagement)

🎯 Marketing Actions at Each Level

| Level | Segment | Strategy | Expected ROI |
|---|---|---|---|
| Broad (2 clusters) | Premium vs. Rest | VIP program vs. mass campaign | 15% lift |
| Medium (3 clusters) | Premium / Standard / Budget | Tiered loyalty rewards | 23% lift |
| Granular (5+ clusters) | All sub-segments | Personalized 1:1 messaging | 31% lift |

Key advantage: You ran hierarchical clustering once. Marketing can cut the dendrogram at whatever level makes sense for this month's campaign — no rerunning, no choosing k upfront.
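"Run once, cut at any level" looks like this in code (SciPy assumed; the customer features are synthetic stand-ins, and a real pipeline would scale the features first so spend doesn't dominate the distance):

```python
# One linkage run, three marketing cuts — a synthetic sketch.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(3)
# Stand-in customer features: [visits/month, avg spend per visit]
customers = np.vstack([
    rng.normal([12, 850], [2, 60], (30, 2)),   # premium-like
    rng.normal([6, 210], [2, 40], (60, 2)),    # standard-like
    rng.normal([2, 45], [1, 15], (30, 2)),     # budget-like
])

Z = linkage(customers, method="ward")              # built once
broad = fcluster(Z, t=2, criterion="maxclust")     # broad campaign: 2 clusters
tiers = fcluster(Z, t=3, criterion="maxclust")     # loyalty tiers: 3 clusters
granular = fcluster(Z, t=6, criterion="maxclust")  # 1:1 messaging: 6 clusters
print(len(set(broad)), len(set(tiers)), len(set(granular)))
```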

βš”οΈ K-Means vs. Hierarchical: When to Use Which?

| Dimension | 🔵 K-Means | 🌳 Hierarchical |
|---|---|---|
| Input required | Must specify k upfront | No k needed — choose after |
| Result | Flat partition (k clusters) | Full tree (all k at once) |
| Scalability | ✅ Scales to millions of points | ⚠️ Slow on large data (O(n²) or O(n³)) |
| Interpretability | Cluster centroids easy to explain | Dendrogram shows relationships |
| Reproducibility | Random init → different results | Deterministic (same tree every run) |
| Cluster shape | Assumes spherical/convex | Flexible (depends on linkage) |
| Outlier handling | Sensitive to outliers | Can isolate outliers as singletons |
| Data size | 10K–10M+ rows | Up to ~5K–10K rows practical |
| Best use case | Large-scale segmentation with known k | Exploratory analysis, unknown k, need hierarchy |
🔵 Use K-Means when...

• Dataset is large (>10K rows)
• You know how many clusters you want
• Speed matters
• Clusters are roughly spherical

🌳 Use Hierarchical when...

• You want to explore k without rerunning
• You need cluster relationships (hierarchy)
• Dataset is moderate size (<10K rows)
• You want a reproducible, deterministic result

🧠 Knowledge Check — 5 Questions

Test your understanding of hierarchical clustering. Answer all 5 to complete the module.


Question 1 of 5

In agglomerative clustering, what happens at the very first step?

All points are merged into one cluster immediately
Each point starts as its own singleton cluster
You must specify k clusters to start with
The algorithm randomly assigns points to clusters

Question 2 of 5

What does the height of a bar in a dendrogram represent?

The number of points in the merged cluster
The order in which the merge happened
The distance between the two clusters when they merged
The silhouette score at that merge step

Question 3 of 5

Which linkage method is most prone to the "chaining effect" (long, stringy clusters)?

Single linkage
Complete linkage
Ward's method
Average linkage

Question 4 of 5

You have 500,000 customer records and need to segment them quickly into 5 groups. Which method is better and why?

Hierarchical — it gives a better dendrogram for large datasets
K-Means — it scales much better (O(nk)) vs. hierarchical's O(n²)
Hierarchical — because you don't need to specify k
Both are equally good at this scale

Question 5 of 5

When using the dendrogram gap method to choose k, what should you look for?

The point where the tree has exactly k leaves
The merge step with the smallest distance
The longest vertical line with no horizontal merges crossing it
The merge step where exactly half the points are combined

πŸ—ΊοΈ Module 9 Navigation

πŸ“š Class Material πŸ”΅ K-Means Interactive 🏠 Course Home