Hierarchical Clustering

Hierarchical Clustering is an unsupervised learning technique that groups similar data points into nested clusters. Unlike K-Means, you don't need to predefine the number of clusters; instead, the algorithm builds a tree-like structure (a dendrogram) representing nested groupings of the data.

There are two main types of hierarchical clustering:

  • Agglomerative (bottom-up): Start with each data point as its own cluster and repeatedly merge the two closest clusters.
  • Divisive (top-down): Start with all the data in one cluster and split it recursively.

Key Concepts

Dendrogram

A dendrogram is a tree-like diagram that records the sequences of merges or splits. Cutting the dendrogram at a chosen level yields the desired number of clusters.
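
For instance, once a linkage matrix has been built with SciPy, the tree can be cut programmatically rather than by eye. A minimal sketch (the toy data and the cut level of 2 clusters are arbitrary choices for illustration):

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy data: two well-separated groups on a line
X = np.array([[1.0], [1.2], [0.9], [8.0], [8.3], [7.9]])

# Build the merge hierarchy (Ward linkage)
Z = linkage(X, method='ward')

# Cut the tree so that exactly 2 clusters remain
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)  # e.g. [1 1 1 2 2 2]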

Distance Metrics

Hierarchical clustering relies on a distance metric:

  • Euclidean (default)
  • Manhattan
  • Cosine, etc.
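
As a quick illustration of how the choice of metric changes pairwise distances, the sketch below compares them on two arbitrary 2-D points using SciPy:

import numpy as np
from scipy.spatial.distance import pdist

points = np.array([[1.0, 0.0], [3.0, 4.0]])

print(pdist(points, metric='euclidean'))   # [4.47...]  straight-line distance
print(pdist(points, metric='cityblock'))   # [6.]       Manhattan distance
print(pdist(points, metric='cosine'))      # [0.4]      1 - cosine similarity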

Linkage Criteria

Defines how the distance between two clusters is calculated (a short comparison follows the list):

  • Single Linkage – Minimum distance between any pair of points, one from each cluster
  • Complete Linkage – Maximum distance between any pair of points, one from each cluster
  • Average Linkage – Average distance over all pairs of points, one from each cluster
  • Ward’s Method – Merges the pair of clusters that least increases the total within-cluster variance
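
The sketch below runs each linkage option on the same synthetic blobs so their effect can be compared directly (the make_blobs parameters are arbitrary):

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=100, centers=3, random_state=0)

for linkage in ['single', 'complete', 'average', 'ward']:
    model = AgglomerativeClustering(n_clusters=3, linkage=linkage)
    labels = model.fit_predict(X)
    print(linkage, labels[:10])  # first ten labels under each linkage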

Python Example: Agglomerative Clustering

from sklearn.datasets import make_blobs
from sklearn.cluster import AgglomerativeClustering
import matplotlib.pyplot as plt
import scipy.cluster.hierarchy as sch

# 1. Generate synthetic data
X, _ = make_blobs(n_samples=150, centers=3, cluster_std=0.60, random_state=42)

# 2. Plot dendrogram to visualize hierarchy
plt.figure(figsize=(8, 5))
dendrogram = sch.dendrogram(sch.linkage(X, method='ward'))
plt.title("Dendrogram")
plt.xlabel("Data Points")
plt.ylabel("Euclidean Distance")
plt.show()

# 3. Fit Agglomerative Clustering with 3 clusters
model = AgglomerativeClustering(n_clusters=3, linkage='ward')  # Euclidean distance is the default (and required for ward linkage)
labels = model.fit_predict(X)

# 4. Visualize Clusters
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis')
plt.title("Agglomerative Clustering (3 Clusters)")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.show()

Output: the script first displays the dendrogram, then a scatter plot of the three clusters colored by label.

Explanation

  • Step 1: Create data with known centers.
  • Step 2: Plot a dendrogram to observe the natural cluster hierarchy.
  • Step 3: Use AgglomerativeClustering with ward linkage.
  • Step 4: Visualize the clustering results.

When to Use Hierarchical Clustering

  • When the number of clusters is unknown.
  • When you want to understand the data structure or nested groupings.
  • For small to medium datasets (it does not scale well to very large data).

Advantages

  • No need to predefine number of clusters.
  • Produces a dendrogram for visual analysis.
  • Can capture nested data groupings.

Limitations

  • Computationally expensive: memory grows as O(n²) and runtime is typically between O(n²) and O(n³), depending on the linkage and implementation.
  • Sensitive to noise and outliers.
  • Final clusters are not always optimal due to early merge decisions (no reassignments).

Tips for Beginners

  • Use dendrograms to visually decide on the number of clusters.
  • Stick with ward linkage and Euclidean distance for standard tasks (ward only works with Euclidean distance).
  • Normalize or standardize data if features have different scales, as in the sketch after this list.
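
A minimal sketch of standardizing features before clustering (the feature ranges below are made up to exaggerate the scale difference):

import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.cluster import AgglomerativeClustering

rng = np.random.default_rng(0)

# One feature in the range 0-1, another in the thousands
X = np.column_stack([rng.random(100), rng.random(100) * 1000])

# Without scaling, the second feature dominates every distance computation
X_scaled = StandardScaler().fit_transform(X)  # zero mean, unit variance per feature

labels = AgglomerativeClustering(n_clusters=2).fit_predict(X_scaled)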

Tips for Professionals

  • For large datasets, use scikit-learn’s connectivity constraints (e.g. a k-nearest-neighbours graph) to speed up clustering.
  • Combine PCA with hierarchical clustering for better performance and visualization; both tips are combined in the sketch after this list.
  • Use hierarchical clustering as a preprocessing step to inform other models.
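
A minimal sketch applying the first two tips together, assuming a moderately large dataset (the sample size, number of components, and neighbour count are illustrative, not prescriptive):

from sklearn.datasets import make_blobs
from sklearn.decomposition import PCA
from sklearn.neighbors import kneighbors_graph
from sklearn.cluster import AgglomerativeClustering

X, _ = make_blobs(n_samples=2000, centers=4, n_features=10, random_state=42)

# Reduce dimensionality first: cheaper distance computations and easier visualization
X_reduced = PCA(n_components=2).fit_transform(X)

# Restrict merges to each point's nearest neighbours to speed up clustering
connectivity = kneighbors_graph(X_reduced, n_neighbors=10, include_self=False)

model = AgglomerativeClustering(n_clusters=4, connectivity=connectivity, linkage='ward')
labels = model.fit_predict(X_reduced)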

Summary

  • Hierarchical clustering builds a hierarchy of clusters without needing to specify K.
  • Dendrograms help visualize and select cluster groups.
  • Best suited for small to medium datasets with clear hierarchical structures.