Self-Organizing Maps (SOM)

Self-Organizing Maps (SOM) are a type of artificial neural network introduced by Teuvo Kohonen. SOMs are mainly used for dimensionality reduction and data visualization, especially for high-dimensional data.

Unlike supervised learning models, SOMs learn without labels, organizing input data into a 2D map where similar inputs end up close together. They are well suited to clustering, pattern recognition, and visualization.


How SOM Works

Architecture

  • Input layer: accepts n-dimensional feature vectors
  • Map layer: typically a 2D grid of neurons (nodes), each with an associated weight vector of the same dimension as the input
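The two layers can be pictured as plain arrays. Below is a minimal NumPy sketch. The 7x7 grid and 4 input features are chosen to match the MiniSom example later in this article, and the random initialization is an assumption for illustration. It shows that every neuron stores a weight vector with the same length as an input, and that every neuron also has a fixed (row, col) position on the grid.

```python
import numpy as np

rows, cols, n_features = 7, 7, 4  # assumed sizes, matching the MiniSom example
rng = np.random.default_rng(0)

# Map layer: weights[i, j] is the weight vector of the neuron at grid position (i, j);
# it has the same length as an input vector
weights = rng.random((rows, cols, n_features))

# Grid coordinates of each neuron: grid[i, j] == [i, j].
# Neighborhoods are measured on these coordinates, not in the input space.
grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

# Input layer: a single n-dimensional feature vector
x = rng.random(n_features)
print(weights.shape, grid.shape, x.shape)  # (7, 7, 4) (7, 7, 2) (4,)
```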

Training Process

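  1. Initialize weight vectors randomly (for example, with small random values or with randomly chosen input samples)
  2. For each input vector:
    • Find the Best Matching Unit (BMU): the neuron whose weight vector is closest to the input, usually by Euclidean distance
    • Move the BMU and its grid neighbors toward the input; neurons closer to the BMU on the grid move more
  3. Gradually shrink the learning rate and the neighborhood radius as training proceeds
  4. Over many iterations, the map self-organizes to reflect the data distribution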
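The training loop above can be sketched from scratch in a few lines of NumPy. This is a minimal illustration, not MiniSom's implementation. The Gaussian neighborhood, the decay schedule `value / (1 + t / (n_iter / 2))`, and all parameter names are assumptions chosen for clarity. Each iteration picks one sample, finds its BMU, and applies the update w ← w + lr · h · (x − w). Here h is 1 at the BMU and falls off with grid distance.

```python
import numpy as np


def train_som(data, rows=7, cols=7, n_iter=1000, lr0=0.5, sigma0=None, seed=0):
    """Minimal online SOM training: one randomly chosen sample per iteration."""
    rng = np.random.default_rng(seed)
    n_features = data.shape[1]
    if sigma0 is None:
        sigma0 = max(rows, cols) / 2.0  # start with a neighborhood covering half the map
    # Initialize each neuron's weights with a randomly chosen input sample
    idx = rng.integers(0, len(data), size=rows * cols)
    weights = data[idx].reshape(rows, cols, n_features).astype(float)
    # (row, col) coordinates of every neuron, shape (rows, cols, 2)
    grid = np.stack(np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1)

    for t in range(n_iter):
        x = data[rng.integers(len(data))]
        # BMU: the neuron with the smallest Euclidean distance to x
        dists = np.linalg.norm(weights - x, axis=-1)
        bmu = np.array(np.unravel_index(np.argmin(dists), dists.shape))
        # Shrink learning rate and radius over time (assumed schedule)
        decay = 1.0 + t / (n_iter / 2.0)
        lr, sigma = lr0 / decay, sigma0 / decay
        # Gaussian neighborhood: 1.0 at the BMU, smaller for neurons farther away on the grid
        grid_d2 = np.sum((grid - bmu) ** 2, axis=-1)
        h = np.exp(-grid_d2 / (2.0 * sigma ** 2))
        # Move the BMU and its neighbors toward x: w <- w + lr * h * (x - w)
        weights += lr * h[..., None] * (x - weights)
    return weights


# Usage: two synthetic 4-D clusters
rng = np.random.default_rng(1)
data = np.vstack([rng.normal(0.2, 0.05, (50, 4)), rng.normal(0.8, 0.05, (50, 4))])
weights = train_som(data, n_iter=2000)
# Mean distance from each sample to its BMU (the quantization error)
qe = np.mean([np.min(np.linalg.norm(weights - s, axis=-1)) for s in data])
print(weights.shape, round(float(qe), 3))
```

Because lr · h never exceeds 1, each update is a weighted average of the old weights and the input. The weights therefore stay inside the range of the data.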

Key Features of SOM

  • Topology Preservation: Similar data points map to nearby neurons
  • Dimensionality Reduction: High-dimensional data projected onto a 2D space
  • Unsupervised Learning: No labeled data required
  • Clustering & Visualization: Helps identify natural clusters and structure in data
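To make the dimensionality-reduction point concrete, here is a small sketch under stated assumptions. The `project` helper and the hand-picked 2x2 map are hypothetical stand-ins for a trained SOM. Once a map is trained, each n-dimensional sample can be replaced by the (row, col) position of its BMU, which gives two integer coordinates per sample.

```python
import numpy as np


def project(data, weights):
    """Map each n-dimensional sample to the (row, col) of its BMU on the grid."""
    flat = weights.reshape(-1, weights.shape[-1])                     # (rows*cols, n_features)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)  # (n_samples, rows*cols)
    idx = np.argmin(d, axis=1)
    rows, cols = np.unravel_index(idx, weights.shape[:2])
    return np.stack([rows, cols], axis=1)


# Hypothetical 2x2 map with 3-D weights, chosen by hand for illustration
weights = np.array([[[0.0, 0.0, 0.0], [0.0, 0.0, 1.0]],
                    [[1.0, 0.0, 0.0], [1.0, 1.0, 1.0]]])
samples = np.array([[0.1, 0.0, 0.1],
                    [0.9, 0.9, 0.8],
                    [0.1, 0.1, 0.9]])
coords = project(samples, weights)
print(coords.tolist())  # [[0, 0], [1, 1], [0, 1]]
```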

Python Example Using MiniSom

We’ll use the MiniSom package for a basic SOM implementation.

Install Library

pip install minisom

Sample Code

from minisom import MiniSom
from sklearn.datasets import load_iris
from sklearn.preprocessing import MinMaxScaler
import matplotlib.pyplot as plt
import numpy as np

# Load and normalize data
iris = load_iris()
X = iris.data
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)

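# Initialize SOM: 7x7 grid, input_len = 4 (one per iris feature); random_seed makes runs reproducible
som = MiniSom(x=7, y=7, input_len=X_scaled.shape[1], sigma=1.0, learning_rate=0.5, random_seed=42)
som.random_weights_init(X_scaled)  # start from randomly chosen data samples
som.train_random(X_scaled, 1000)   # 1000 updates on randomly picked samples (about 6-7 passes over 150 samples)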

# Visualize SOM distance map (U-Matrix)
plt.figure(figsize=(7, 7))
plt.pcolor(som.distance_map().T, cmap='coolwarm')  # distance map as heatmap
plt.colorbar()
plt.title("SOM - Distance Map (U-Matrix)")
plt.show()

Output: a 7x7 heatmap titled "SOM - Distance Map (U-Matrix)", with a color bar.

Output Explanation

  • With the coolwarm colormap, warmer (red) cells are far from their neighbors in weight space and mark boundaries between clusters
  • Cooler (blue) cells are similar to their neighbors and mark regions inside a cluster
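To see what the distance map measures, here is a simplified U-Matrix computed by hand. This is a sketch that assumes a rectangular grid and uses the mean distance to the up-to-8 surrounding neurons, scaled so the maximum is 1. MiniSom's `distance_map()` computes a comparable neighbor-distance map, but its exact neighborhood and normalization may differ. The 4x4 map is hypothetical: its left two columns hold one weight vector and its right two columns another.

```python
import numpy as np


def u_matrix(weights):
    """Simplified U-Matrix for a rectangular grid: mean Euclidean distance from each
    neuron's weights to those of its up-to-8 grid neighbors, scaled so the maximum is 1."""
    rows, cols, _ = weights.shape
    umat = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            dists = [
                np.linalg.norm(weights[i, j] - weights[i + di, j + dj])
                for di in (-1, 0, 1)
                for dj in (-1, 0, 1)
                if (di, dj) != (0, 0) and 0 <= i + di < rows and 0 <= j + dj < cols
            ]
            umat[i, j] = np.mean(dists)
    peak = umat.max()
    return umat / peak if peak > 0 else umat


# Hypothetical 4x4 map: left two columns near one cluster, right two near another
weights = np.zeros((4, 4, 3))
weights[:, 2:] = 1.0
umat = u_matrix(weights)
print(np.round(umat, 2))
```

Columns 0 and 3 come out as 0 because every neighbor has identical weights. Columns 1 and 2, which sit on the boundary, get high values. Those boundary cells are the ones a coolwarm heatmap draws in red.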

Applications of SOM

  • Customer segmentation
  • Anomaly detection
  • Document or text clustering
  • Gene expression pattern analysis
  • Image compression
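For anomaly detection, one common approach scores each sample by the distance to its BMU's weight vector. Samples the map represents poorly get high scores. The sketch below is an assumption, not a method taken from this article. It flags new samples whose score exceeds the 99th percentile of scores on the training data. The hand-picked 2x2 map and the `bmu_distances`/`flag_anomalies` helpers are hypothetical stand-ins for a trained SOM.

```python
import numpy as np


def bmu_distances(data, weights):
    """Distance from each sample to the weight vector of its Best Matching Unit."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1)


def flag_anomalies(train, new, weights, percentile=99.0):
    """Flag rows of `new` whose BMU distance exceeds a percentile of the training distances."""
    threshold = np.percentile(bmu_distances(train, weights), percentile)
    return bmu_distances(new, weights) > threshold


# Hypothetical 2x2 map whose neurons sit on four cluster centers (stand-in for a trained SOM)
weights = np.array([[[0.0, 0.0], [0.0, 1.0]],
                    [[1.0, 0.0], [1.0, 1.0]]])
rng = np.random.default_rng(0)
train = np.repeat(weights.reshape(-1, 2), 50, axis=0) + rng.normal(0.0, 0.05, (200, 2))
new = np.array([[0.02, 0.98],   # close to a neuron: normal
                [0.5, 0.5],     # between clusters
                [3.0, 3.0]])    # far from every neuron
flags = flag_anomalies(train, new, weights)
print(flags)  # [False  True  True]
```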

Advantages

  • Intuitive data visualization
  • Preserves data topology
  • Works well with unlabeled, high-dimensional data
  • Identifies patterns/clusters without supervision

Limitations

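  • SOMs require careful tuning of map size, learning rate, neighborhood radius (sigma), and number of iterations
  • Training can be slow for large datasets, since every update searches all neurons for the BMU
  • A SOM does not assign cluster labels directly; clusters are read from the U-Matrix or found by clustering the neuron weights, so boundaries are less explicit than those from a dedicated clustering model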

Tips for Beginners

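  • Always normalize features before training a SOM; the BMU search is distance-based, so features with large numeric ranges would otherwise dominate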
  • Use the U-Matrix to visually interpret cluster boundaries
  • Start with small grid sizes and scale as needed
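The reason normalization matters can be shown with a BMU-style nearest-vector search on hypothetical data. In the sketch below, feature 0 spans [0, 1] and feature 1 spans [100, 200]. Measured in raw units, a small change in feature 1 outweighs a full-range change in feature 0. After min-max scaling (the same idea as scikit-learn's `MinMaxScaler` in the example above), the nearest vector flips. The `min_max_scale` helper is a hypothetical NumPy stand-in.

```python
import numpy as np


def min_max_scale(data):
    """Rescale each feature (column) to [0, 1]; constant columns become 0."""
    lo, hi = data.min(axis=0), data.max(axis=0)
    span = np.where(hi > lo, hi - lo, 1.0)
    return (data - lo) / span


# Hypothetical data: feature 0 spans [0, 1], feature 1 spans [100, 200]
data = np.array([[0.0, 100.0],
                 [1.0, 100.0],
                 [0.0, 110.0],
                 [1.0, 200.0]])
query, candidates = 0, [1, 2]  # which of rows 1 and 2 is closer to row 0?

raw_d = np.linalg.norm(data[candidates] - data[query], axis=1)
scaled = min_max_scale(data)
scaled_d = np.linalg.norm(scaled[candidates] - scaled[query], axis=1)

print("raw nearest:", candidates[np.argmin(raw_d)])        # raw nearest: 1
print("scaled nearest:", candidates[np.argmin(scaled_d)])  # scaled nearest: 2
```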

Tips for Professionals

  • Use SOM as a preprocessing step for classification or anomaly detection
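  • Combine with other clustering techniques, for example running k-means or DBSCAN on the trained neuron weights, for hybrid modeling
  • Customize heatmaps for deeper insight, for example by overlaying class labels or plotting each feature's weights across the map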
  • Evaluate map quality using quantization and topographic errors
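Both quality measures are simple to define. Quantization error is the mean distance from each sample to its BMU's weight vector (lower means the map fits the data more closely). Topographic error is the fraction of samples whose best and second-best matching units are not grid neighbors (lower means topology is better preserved). MiniSom exposes `quantization_error` and `topographic_error` methods. The NumPy versions below are simplified sketches and may not match MiniSom's exact conventions. Here, adjacency includes diagonal neighbors; some definitions use only the 4 direct neighbors. The tiny 1x3 maps are hypothetical.

```python
import numpy as np


def quantization_error(data, weights):
    """Mean Euclidean distance between each sample and its BMU's weight vector."""
    flat = weights.reshape(-1, weights.shape[-1])
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    return d.min(axis=1).mean()


def topographic_error(data, weights):
    """Fraction of samples whose best and second-best matching units are not grid
    neighbors (8-neighborhood on a rectangular grid)."""
    rows, cols, n_features = weights.shape
    flat = weights.reshape(-1, n_features)
    d = np.linalg.norm(data[:, None, :] - flat[None, :, :], axis=-1)
    best_two = np.argsort(d, axis=1)[:, :2]          # flat indices of 1st and 2nd BMU
    r, c = np.unravel_index(best_two, (rows, cols))  # grid coordinates, each (n_samples, 2)
    chebyshev = np.maximum(np.abs(r[:, 0] - r[:, 1]), np.abs(c[:, 0] - c[:, 1]))
    return np.mean(chebyshev > 1)


weights_ordered = np.array([[[0.0], [1.0], [2.0]]])  # 1x3 map, weights increase along the grid
weights_folded = np.array([[[0.0], [2.0], [1.0]]])   # same values, but the map is "twisted"
data = np.array([[0.1], [1.9]])

print(quantization_error(data, weights_ordered))  # about 0.1
print(topographic_error(data, weights_ordered))   # 0.0
print(topographic_error(data, weights_folded))    # 0.5
```

The folded map fits the data just as closely, since its quantization error is also about 0.1. Its topographic error is higher because the sample at 0.1 has its two best units at opposite ends of the grid.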

Summary

Self-Organizing Maps (SOM) are a powerful tool for unsupervised learning and visualization of complex data. They help you discover hidden structures and relationships, especially in high-dimensional datasets, which makes them valuable for exploratory data analysis and clustering.