K-Nearest Neighbors (KNN) Algorithm

K-Nearest Neighbors (KNN) is one of the simplest and most intuitive supervised machine learning algorithms used for both classification and regression tasks. It makes predictions based on the labels of the nearest data points in the training set.


What You'll Learn

  • What is KNN and how it works
  • Use cases and advantages
  • KNN for classification with Python code
  • How to choose the value of K
  • Limitations of KNN

What is KNN?

The K-Nearest Neighbors algorithm classifies a data point based on how its neighbors are classified. There is no explicit training phase (KNN is a "lazy learner"): the model simply stores the training data. When a new input arrives, it calculates the distance from that input to every point in the training set and returns the majority label of the K closest ones.


How KNN Works

  1. Choose the number of neighbors K.
  2. Measure the distance (typically Euclidean) between the new point and every point in the training data.
  3. Select the K nearest neighbors.
  4. Assign the label by majority vote (for classification) or the average of the neighbors' values (for regression), as the sketch below illustrates.
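
The steps above can be written in a few lines of plain Python. The sketch below is a minimal, illustrative implementation (the names euclidean and knn_predict are our own, not from any library) that classifies a single query point by majority vote over its K nearest training points; a regressor would instead average the neighbors' values.

import math
from collections import Counter

def euclidean(a, b):
    # Straight-line distance between two feature vectors
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(X_train, y_train, query, k=3):
    # Steps 1-2: distance from the query to every training point
    distances = [(euclidean(x, query), label) for x, label in zip(X_train, y_train)]
    # Step 3: keep the k closest points
    k_nearest = sorted(distances, key=lambda d: d[0])[:k]
    # Step 4: majority vote over their labels
    labels = [label for _, label in k_nearest]
    return Counter(labels).most_common(1)[0][0]

# Tiny toy dataset: two features, two classes
X_toy = [[1.0, 1.0], [1.5, 2.0], [5.0, 5.0], [6.0, 5.5]]
y_toy = ["A", "A", "B", "B"]

print(knn_predict(X_toy, y_toy, query=[1.2, 1.4], k=3))  # prints A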

Example: KNN for Classification (Iris Dataset)

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score

# Load dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)

# Predict
y_pred = knn.predict(X_test)

# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))

Output:

Accuracy: 1.0
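
Once fitted, the same model can classify a brand-new measurement. The snippet below is a small usage sketch; the four feature values (sepal length, sepal width, petal length, petal width in cm) are chosen for illustration.

# Predict the species of a single new flower (values chosen for illustration)
new_flower = [[5.1, 3.5, 1.4, 0.2]]
predicted = knn.predict(new_flower)
print("Predicted species:", iris.target_names[predicted[0]])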

How to Choose the Best K

  • Small K (e.g., 1 or 3) → Sensitive to noise
  • Large K (e.g., 15 or 20) → More stable but may miss local patterns
  • Use cross-validation to find the optimal K (a cross-validation sketch follows this example)

A quick first check is to loop over candidate values of K and compare the accuracy on the held-out test set:

# Trying different values of K
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"K={k}, Accuracy={accuracy_score(y_test, pred)}")

Output:

K=1, Accuracy=1.0
K=2, Accuracy=1.0
K=3, Accuracy=1.0
K=4, Accuracy=1.0
K=5, Accuracy=1.0
K=6, Accuracy=1.0
K=7, Accuracy=1.0
K=8, Accuracy=1.0
K=9, Accuracy=1.0
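
Accuracy on a single train/test split can be optimistic or noisy; cross-validation averages performance over several splits. A minimal sketch using scikit-learn's cross_val_score with 5 folds on the full dataset:

from sklearn.model_selection import cross_val_score

# Compare candidate K values with 5-fold cross-validation
for k in range(1, 11):
    model = KNeighborsClassifier(n_neighbors=k)
    scores = cross_val_score(model, X, y, cv=5)
    print(f"K={k}, mean CV accuracy={scores.mean():.3f}")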

Distance Metrics

  • Euclidean Distance (default in most libraries)
  • Manhattan Distance
  • Minkowski Distance

You can choose the distance metric with the metric parameter of KNeighborsClassifier.
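
For example, switching to Manhattan distance only changes how neighbor distances are measured; the rest of the workflow stays the same (a minimal sketch, reusing the Iris split from above):

# Same classifier, but neighbors are ranked by Manhattan (L1) distance
knn_manhattan = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
knn_manhattan.fit(X_train, y_train)
print("Accuracy:", accuracy_score(y_test, knn_manhattan.predict(X_test)))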

When to Use KNN

  • When the dataset is not too large (prediction requires computing distances to every stored point)
  • When you need an interpretable algorithm
  • For problems like classification of text, images, or recommendation systems

Advantages of KNN

  • Simple and easy to understand
  • No training phase (fast setup)
  • Works well with multi-class problems

Limitations

  • Slow prediction on large datasets
  • Performance depends heavily on the choice of distance metric and value of K
  • Sensitive to irrelevant features and to differences in feature scale (standardize features before fitting, as shown below)
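
Because KNN relies on raw distances, a feature measured on a large scale can dominate the vote. A common remedy is to standardize features before fitting; a minimal sketch using a scikit-learn pipeline (reusing the Iris split from above):

from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Standardize each feature to zero mean and unit variance, then apply KNN
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)
print("Accuracy with scaling:", accuracy_score(y_test, scaled_knn.predict(X_test)))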

Summary

K-Nearest Neighbors is a practical, non-parametric algorithm that's ideal for beginners. While it's not suited for very large datasets or high-dimensional data, it's often surprisingly effective for simple problems.