DBSCAN (Density-Based Spatial Clustering of Applications with Noise) is a clustering algorithm that groups points by local density rather than by distance to a cluster center, as K-Means does. It is particularly useful for discovering clusters of arbitrary shape and for identifying outliers (noise).
Unlike K-Means, DBSCAN does not require the number of clusters to be specified in advance and handles non-spherical clusters and noise naturally.
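To make the contrast with K-Means concrete, here is a minimal sketch that runs both algorithms on two interleaving half-moons and scores each against the generating labels. The dataset, the parameter values (eps=0.2, min_samples=5), and the use of the adjusted Rand index as a score are assumptions chosen for illustration, not a benchmark.

```python
from sklearn.cluster import DBSCAN, KMeans
from sklearn.datasets import make_moons
from sklearn.metrics import adjusted_rand_score

# Two interleaving half-moons: a common non-spherical test dataset.
X, y_true = make_moons(n_samples=300, noise=0.05, random_state=42)

# K-Means has to be told the number of clusters up front.
kmeans_labels = KMeans(n_clusters=2, n_init=10, random_state=42).fit_predict(X)

# DBSCAN takes density parameters instead (values assumed for this sketch).
dbscan_labels = DBSCAN(eps=0.2, min_samples=5).fit_predict(X)

# Adjusted Rand index: 1.0 means perfect agreement with the generating labels.
ari_kmeans = adjusted_rand_score(y_true, kmeans_labels)
ari_dbscan = adjusted_rand_score(y_true, dbscan_labels)

print(f"K-Means ARI: {ari_kmeans:.2f}")
print(f"DBSCAN  ARI: {ari_dbscan:.2f}")
```

On this kind of data K-Means tends to cut each moon in half with a straight boundary, while DBSCAN can follow the curved shapes. The exact scores depend on the noise level and on the parameters.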
DBSCAN uses two key parameters:
eps (epsilon): Maximum distance between two points for them to be considered neighbors
min_samples: Minimum number of points required to form a dense region (cluster)

Core point: has at least min_samples points within its eps radius
Border point: lies within eps of a core point but has fewer than min_samples neighbors
Noise point: neither a core point nor a border point

Choosing eps and min_samples is tricky.

Example using sklearn

pip install scikit-learn matplotlib

from sklearn.datasets import make_moons
from sklearn.cluster import DBSCAN
import matplotlib.pyplot as plt
import numpy as np
# Generate synthetic data
X, y = make_moons(n_samples=300, noise=0.05, random_state=42)
# Fit DBSCAN
dbscan = DBSCAN(eps=0.2, min_samples=5)
labels = dbscan.fit_predict(X)
# Visualize clusters
plt.figure(figsize=(8, 5))
plt.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=50)
plt.title("DBSCAN Clustering Result")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.grid(True)
plt.show()

Output: a scatter plot of the two moons, colored by cluster label.
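The core, border, and noise categories can be read back from a fitted model. The sketch below refits the same data and uses scikit-learn's `core_sample_indices_` and `labels_` attributes; the mask variable names are illustrative choices, not part of the library.

```python
import numpy as np
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)

# In scikit-learn, the min_samples count includes the point itself.
dbscan = DBSCAN(eps=0.2, min_samples=5).fit(X)
labels = dbscan.labels_

# Core points: indices are exposed through core_sample_indices_.
core_mask = np.zeros(len(X), dtype=bool)
core_mask[dbscan.core_sample_indices_] = True

# Noise points: DBSCAN gives them the label -1.
noise_mask = labels == -1

# Border points: assigned to a cluster but not core points themselves.
border_mask = ~core_mask & ~noise_mask

# Number of clusters, not counting the noise label.
n_clusters = len(set(labels.tolist()) - {-1})

print(f"clusters: {n_clusters}")
print(f"core:     {int(core_mask.sum())}")
print(f"border:   {int(border_mask.sum())}")
print(f"noise:    {int(noise_mask.sum())}")
```

Every point falls into exactly one of the three masks, so the three counts add up to the number of samples.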
make_moons creates non-linearly separable data.
DBSCAN assigns the label -1 to points it treats as noise or outliers.
The result depends on the eps value.

DBSCAN is a robust density-based clustering algorithm that identifies clusters of arbitrary shapes and handles outliers effectively. It is well suited to real-world applications such as geolocation data, anomaly detection, and pattern recognition in noisy environments.
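Because the outcome is sensitive to eps, one simple way to get a feel for it is to refit with a few values and compare the number of clusters and noise points. This is a rough sketch: the helper name `summarize` and the candidate eps values are hypothetical choices for illustration, and min_samples is held at 5.

```python
from sklearn.cluster import DBSCAN
from sklearn.datasets import make_moons

X, _ = make_moons(n_samples=300, noise=0.05, random_state=42)


def summarize(eps, min_samples=5):
    """Hypothetical helper: fit DBSCAN and return (n_clusters, n_noise)."""
    labels = DBSCAN(eps=eps, min_samples=min_samples).fit_predict(X)
    n_noise = int((labels == -1).sum())
    # Count distinct labels, excluding the noise label -1.
    n_clusters = len(set(labels.tolist()) - {-1})
    return n_clusters, n_noise


# Candidate eps values are arbitrary picks for this sketch.
results = {eps: summarize(eps) for eps in (0.05, 0.1, 0.2, 0.5, 1.0)}

for eps, (n_clusters, n_noise) in results.items():
    print(f"eps={eps:<4}  clusters={n_clusters:<3}  noise={n_noise}")
```

Small eps values tend to fragment the data and mark more points as noise, while large values merge everything into a single cluster. A sweep like this only narrows the range; it does not replace judgment about what a sensible neighborhood size is for the data at hand.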