K-Nearest Neighbors (KNN) Algorithm
K-Nearest Neighbors (KNN) is one of the simplest and most intuitive supervised machine learning algorithms used for both classification and regression tasks. It makes predictions based on the labels of the nearest data points in the training set.
What You'll Learn
- What is KNN and how it works
- Use cases and advantages
- KNN for classification with Python code
- How to choose the value of K
- Limitations of KNN
What is KNN?
The K-Nearest Neighbors algorithm classifies a data point based on how its neighbors are classified. It has no explicit training phase (it is a lazy learner): fitting the model simply stores the training data. When a new input is given, it calculates the distance to every point in the training set and returns the majority label among the K closest ones.
How KNN Works
- Choose the number of neighbors K.
- Measure the distance (typically Euclidean) between the new point and every point in the training data.
- Select the K nearest neighbors.
- Assign the label based on majority vote (for classification) or average (for regression).
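The steps above can be sketched from scratch with NumPy. This is a minimal illustration, not how scikit-learn implements KNN internally; the function name knn_predict, its task parameter, and the toy data are all made up for this example.

```python
from collections import Counter

import numpy as np


def knn_predict(X_train, y_train, x_new, k=3, task="classification"):
    """Predict the target of x_new from its k nearest training points.

    Hypothetical helper written for illustration only.
    """
    X_train = np.asarray(X_train, dtype=float)
    y_train = np.asarray(y_train)
    x_new = np.asarray(x_new, dtype=float)

    # Measure the Euclidean distance from x_new to every training point
    distances = np.sqrt(((X_train - x_new) ** 2).sum(axis=1))

    # Select the indices of the k smallest distances
    nearest = np.argsort(distances)[:k]
    neighbor_targets = y_train[nearest]

    # Majority vote for classification, mean for regression
    if task == "classification":
        return Counter(neighbor_targets.tolist()).most_common(1)[0][0]
    return float(neighbor_targets.mean())


# Toy data (invented): two well-separated clusters
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_labels = ["a", "a", "a", "b", "b", "b"]
y_values = [10.0, 12.0, 11.0, 50.0, 52.0, 48.0]

print(knn_predict(X_toy, y_labels, [1.1, 1.0], k=3))
print(knn_predict(X_toy, y_values, [5.0, 5.0], k=3, task="regression"))
```

Because every prediction computes the distance to all stored points, the cost of a single query grows with the size of the training set.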
Example: KNN for Classification (Iris Dataset)
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score
# Load dataset
iris = load_iris()
X = iris.data
y = iris.target
# Split into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)
# Create KNN model
knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(X_train, y_train)
# Predict
y_pred = knn.predict(X_test)
# Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))
Output:
Accuracy: 1.0
How to Choose the Best K
- Small K (e.g., 1 or 3) → Sensitive to noise
- Large K (e.g., 15 or 20) → More stable but may miss local patterns
- Use cross-validation to find the optimal K
# Trying different values of K (1 through 9) on the train/test split from above
for k in range(1, 10):
    model = KNeighborsClassifier(n_neighbors=k)
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    print(f"K={k}, Accuracy={accuracy_score(y_test, pred)}")
Output:
K=1, Accuracy=1.0
K=2, Accuracy=1.0
K=3, Accuracy=1.0
K=4, Accuracy=1.0
K=5, Accuracy=1.0
K=6, Accuracy=1.0
K=7, Accuracy=1.0
K=8, Accuracy=1.0
K=9, Accuracy=1.0
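The loop above scores each K on a single test split, which is why every value looks equally good. Cross-validation, as recommended earlier, averages the score over several splits and usually separates the candidates better. A minimal sketch, assuming 5 folds and a search range of K from 1 to 20 (both arbitrary choices for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

cv_scores = {}
for k in range(1, 21):
    model = KNeighborsClassifier(n_neighbors=k)
    # 5-fold cross-validation; classifiers are scored by accuracy by default
    scores = cross_val_score(model, X, y, cv=5)
    cv_scores[k] = scores.mean()

# Pick the K with the highest mean accuracy across folds
best_k = max(cv_scores, key=cv_scores.get)
print(f"Best K by 5-fold CV: {best_k} (mean accuracy {cv_scores[best_k]:.3f})")
```

If several values of K tie, max returns the smallest one here because the dictionary is filled in increasing order of K.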
Distance Metrics
- Euclidean Distance (default in most libraries)
- Manhattan Distance
- Minkowski Distance
You can choose one using the metric parameter in KNeighborsClassifier.
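To make the three metrics concrete, the sketch below computes them by hand for two made-up points and then shows how a metric might be selected in scikit-learn. The minkowski helper and the variable names are illustrative; the choice of p=3 is arbitrary.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Two invented points, chosen so the distances are round numbers
a = np.array([1.0, 2.0])
b = np.array([4.0, 6.0])


def minkowski(u, v, p):
    """Minkowski distance of order p between two vectors (illustrative helper)."""
    return float(np.sum(np.abs(u - v) ** p) ** (1.0 / p))


euclidean = float(np.sqrt(np.sum((a - b) ** 2)))  # sqrt(3^2 + 4^2)
manhattan = float(np.sum(np.abs(a - b)))          # |3| + |4|

print("Euclidean:", euclidean)
print("Manhattan:", manhattan)
# Minkowski generalizes both: p=1 gives Manhattan, p=2 gives Euclidean
print("Minkowski p=1:", minkowski(a, b, 1))
print("Minkowski p=2:", minkowski(a, b, 2))

# Selecting a metric when building the classifier
knn_manhattan = KNeighborsClassifier(n_neighbors=3, metric="manhattan")
knn_minkowski = KNeighborsClassifier(n_neighbors=3, metric="minkowski", p=3)
```

In scikit-learn, the default for KNeighborsClassifier is the Minkowski metric with p=2, which is the same as Euclidean distance.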
When to Use KNN
- When data is not too large (KNN is computationally expensive)
- When you need an interpretable algorithm
- For problems like classification of text, images, or recommendation systems
Advantages of KNN
- Simple and easy to understand
- No training phase (fast setup)
- Works well with multi-class problems
Limitations
- Slow prediction on large datasets
- Performance depends heavily on the choice of distance metric and value of K
- Sensitive to irrelevant features and feature scaling
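Because KNN relies on raw distances, a feature with a large numeric range can dominate the others. One common way to address the scaling sensitivity is to standardize features before fitting. The sketch below is one possible setup, assuming StandardScaler and the same Iris split used earlier; the variable name scaled_knn is made up for this example.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42
)

# The pipeline learns the scaling from the training data only,
# then applies the same transformation at prediction time
scaled_knn = make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=3))
scaled_knn.fit(X_train, y_train)

print("Accuracy with scaling:", scaled_knn.score(X_test, y_test))
```

On Iris the four features share similar units, so scaling may change little; the effect tends to be larger when features are measured on very different scales.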
Summary
K-Nearest Neighbors is a practical, non-parametric algorithm that's ideal for beginners. While it's not suited for very large datasets or high-dimensional data, it's often surprisingly effective for simple problems.