Hierarchical Clustering is an unsupervised learning technique that groups similar data points into a hierarchy of nested clusters. Unlike K-Means, you don't need to predefine the number of clusters: the algorithm builds a tree-like structure (a dendrogram) that records how groups merge or split.
There are two main types of hierarchical clustering:
Agglomerative (bottom-up): Start with each data point as its own cluster and iteratively merge the closest pair (see the merge-sequence sketch after this list).
Divisive (top-down): Start with all data in one cluster and split recursively.
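To make the bottom-up idea concrete, here is a minimal sketch that prints the merge sequence SciPy records for five made-up points:
import numpy as np
import scipy.cluster.hierarchy as sch

X = np.array([[0.0, 0.0], [0.0, 1.0], [5.0, 5.0], [5.0, 6.0], [10.0, 10.0]])

# Each row of the linkage matrix records one bottom-up merge:
# [cluster i, cluster j, merge distance, size of the new cluster]
print(sch.linkage(X, method='single'))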
Key Concepts
Dendrogram
A dendrogram is a tree-like diagram that records the sequences of merges or splits. Cutting the dendrogram at a chosen level yields the desired number of clusters.
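For instance, a minimal sketch using SciPy's fcluster to cut a hierarchy at the level that yields a chosen number of clusters (the toy points are made up):
import numpy as np
import scipy.cluster.hierarchy as sch

# Toy data: five 2-D points
X = np.array([[1.0, 2.0], [1.0, 4.0], [5.0, 8.0], [6.0, 8.0], [1.0, 0.5]])

# Build the merge hierarchy with Ward linkage
Z = sch.linkage(X, method='ward')

# Cut the tree so that exactly 2 clusters remain
labels = sch.fcluster(Z, t=2, criterion='maxclust')
print(labels)  # one cluster id (1 or 2) per point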
Distance Metrics
Hierarchical clustering relies on a distance metric to measure how similar two points are; common choices (illustrated after this list) include:
Euclidean (default)
Manhattan
Cosine, etc.
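A quick sketch with SciPy's pdist, which computes the pairwise distances that the linkage step consumes; swapping the metric changes the geometry of the hierarchy (the points are made up):
import numpy as np
from scipy.spatial.distance import pdist

points = np.array([[1.0, 0.0], [3.0, 4.0], [6.0, 8.0]])

# Pairwise distances in the order (0,1), (0,2), (1,2)
print(pdist(points, metric='euclidean'))  # straight-line distances
print(pdist(points, metric='cityblock'))  # Manhattan (L1) distances
print(pdist(points, metric='cosine'))     # 1 - cosine similarity; (3,4) and (6,8) point the same way, so their cosine distance is 0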
Linkage Criteria
Defines how the distance between two clusters is computed (the sketch after this list compares all four):
Single Linkage – Minimum distance between any two points in the two clusters
Complete Linkage – Maximum distance between any two points in the two clusters
Average Linkage – Average distance over all pairs of points across the two clusters
Ward’s Method – Merges the pair of clusters that least increases the total within-cluster variance
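A sketch comparing the four criteria on the same random data; the only change is the method argument, and the height of the final merge differs accordingly:
import numpy as np
import scipy.cluster.hierarchy as sch

rng = np.random.default_rng(0)
X = rng.normal(size=(20, 2))

for method in ['single', 'complete', 'average', 'ward']:
    Z = sch.linkage(X, method=method)
    # Column 2 of the linkage matrix holds merge distances; the last row is the final merge
    print(f"{method:>8}: final merge height = {Z[-1, 2]:.2f}")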
Python Example: Agglomerative Clustering
from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch
# 1. Generate synthetic data
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.60, random_state=42)
# 2. Plot dendrogram to visualize hierarchy
plt.figure(figsize=(8, 5))
sch.dendrogram(sch.linkage(X, method='ward'))
plt.title("Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Euclidean Distance")
plt.show()
# 3. Fit Agglomerative Clustering with 3 clusters
model = AgglomerativeClustering(n_clusters=3, metric='euclidean', linkage='ward')  # 'affinity' was renamed to 'metric' in scikit-learn 1.2
labels = model.fit_predict(X)
# 4. Visualize Clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("Agglomerative Clustering (3 Clusters)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()
Output: the script first shows a dendrogram of the merge hierarchy, then a scatter plot of the three recovered clusters.
Explanation
Step 1: Create data with known centers.
Step 2: Plot a dendrogram to observe the natural cluster hierarchy.
Step 3: Use AgglomerativeClustering with ward linkage.
Step 4: Visualize the clustering results.
When to Use Hierarchical Clustering
When the number of clusters is unknown.
When you want to understand the data structure or nested groupings.
For smaller datasets (does not scale well for very large data).
Advantages
No need to predefine the number of clusters.
Produces a dendrogram for visual analysis.
Can capture nested data groupings.
Limitations
Computationally expensive (O(n²) memory for the distance matrix and, in common implementations, O(n³) time).
Sensitive to noise and outliers.
Final clusters may be suboptimal: early merges are greedy and can never be undone (no reassignment step).
Tips for Beginners
Use dendrograms to visually decide on the number of clusters.
Stick with Ward linkage and Euclidean distance for standard tasks.
Normalize data if features have different scales (a scaling sketch follows this list).
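For the normalization tip, a minimal sketch with scikit-learn's StandardScaler; the feature values are made up:
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

# Hypothetical features on very different scales, e.g. income vs. age
X = np.array([[30000.0, 25.0], [90000.0, 40.0], [32000.0, 60.0], [88000.0, 22.0]])

# Standardize each feature to mean 0 and unit variance; otherwise the
# income column dominates every Euclidean distance
X_scaled = StandardScaler().fit_transform(X)

labels = AgglomerativeClustering(n_clusters=2, linkage='ward').fit_predict(X_scaled)
print(labels)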
Tips for Professionals
For large datasets, use scikit-learn’s connectivity constraints to speed up clustering (sketched after this list).
Combine PCA with hierarchical clustering for better performance and visualization.
Use hierarchical clustering as a preprocessing step to inform other models.
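Both professional tips in one sketch: PCA reduces dimensionality first, and kneighbors_graph supplies a connectivity constraint that restricts merges to nearby points (the blob parameters are made up):
from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=2000, centers=3, n_features=10, random_state=42)

# Project to two principal components before clustering
X_reduced = PCA(n_components=2).fit_transform(X)

# Only allow merges between each point's 10 nearest neighbors; this
# sparsifies the problem and speeds up clustering on larger datasets
connectivity = kneighbors_graph(X_reduced, n_neighbors=10, include_self=False)

model = AgglomerativeClustering(n_clusters=3, connectivity=connectivity, linkage='ward')
labels = model.fit_predict(X_reduced)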
Summary
Hierarchical clustering builds a hierarchy of clusters without needing to specify K.
Dendrograms help visualize and select cluster groups.
Best suited for small to medium datasets with clear hierarchical structures.