Feature scaling is one of the most essential preprocessing steps in machine learning. Algorithms that rely on distances or gradients (like KNN, SVM, or Gradient Descent-based models) can perform poorly if features are on different scales.
In this tutorial, we’ll explore two core techniques: Normalization and Standardization, understand when to use each, and implement them with Python.
Imagine you have two features:
Many ML algorithms (like K-Means, Logistic Regression, SVM, and KNN) treat larger-scale features as more important unless the features are scaled properly.
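To make the effect concrete, here is a minimal sketch of how a large-scale feature can dominate a Euclidean distance. The three points and their Age and Income values are hypothetical, chosen only for illustration.

```python
import numpy as np

# Hypothetical people described by (Age in years, Income in dollars).
a = np.array([25.0, 50000.0])
b = np.array([60.0, 50500.0])  # very different age, similar income
c = np.array([26.0, 52000.0])  # nearly the same age, somewhat different income


def euclidean(u, v):
    """Plain Euclidean distance between two feature vectors."""
    return float(np.sqrt(np.sum((u - v) ** 2)))


d_ab = euclidean(a, b)
d_ac = euclidean(a, c)

# Share of the squared a-b distance that comes from the Income column.
squared_diff = (a - b) ** 2
income_share = float(squared_diff[1] / squared_diff.sum())

print(f"distance a-b: {d_ab:.1f}")
print(f"distance a-c: {d_ac:.1f}")
print(f"share of a-b squared distance due to Income: {income_share:.4f}")
```

On the raw values, `a` comes out closer to `b` than to `c` even though `a` and `b` differ by 35 years in age, because the Income column contributes almost all of the distance.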
Without scaling:
Normalization (min-max scaling) rescales each feature to a fixed range, usually [0, 1].
Formula:
$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}$$
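As a minimal sketch, min-max scaling can also be written out by hand with NumPy: subtract the column minimum, then divide by the column range. The helper name `min_max_scale` is an assumption made for illustration, and the values are the Age column from this tutorial's sample data.

```python
import numpy as np

# Age values from the tutorial's sample data.
ages = np.array([20, 25, 30, 35, 40], dtype=float)


def min_max_scale(x):
    """Rescale a 1-D array to [0, 1]; assumes max(x) > min(x)."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min)


scaled_ages = min_max_scale(ages)
print(scaled_ages)
```

The smallest value maps to 0, the largest to 1, and the evenly spaced ages land at 0.25, 0.5, and 0.75. A constant column would cause a division by zero in this sketch, so it would need a guard in real use.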
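from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Sample data: two features on very different scales
df = pd.DataFrame({
    'Age': [20, 25, 30, 35, 40],
    'Income': [20000, 40000, 60000, 80000, 100000]
})

# Fit the scaler on the data and rescale each column to [0, 1]
scaler = MinMaxScaler()
normalized = scaler.fit_transform(df)

# fit_transform returns a NumPy array, so rebuild a DataFrame with the original column names
normalized_df = pd.DataFrame(normalized, columns=df.columns)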
print(normalized_df)
Output:
Age Income
0 0.00 0.00
1 0.25 0.25
2 0.50 0.50
3 0.75 0.75
4 1.00 1.00
Standardization transforms data to have zero mean and unit variance.
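Formula:
$$z = \frac{x - \mu}{\sigma}$$
Where: $\mu$ is the mean of the feature and $\sigma$ is its standard deviation.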
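As a minimal sketch, standardization can be written out by hand with NumPy: subtract the mean, then divide by the standard deviation. The helper name `standardize` is an assumption made for illustration, and the values are the Age column from this tutorial's sample data.

```python
import numpy as np

# Age values from the tutorial's sample data.
ages = np.array([20, 25, 30, 35, 40], dtype=float)


def standardize(x):
    """Center to zero mean and scale to unit variance; assumes std > 0."""
    mu = x.mean()
    sigma = x.std()  # population std (ddof=0), the convention StandardScaler uses
    return (x - mu) / sigma


z_ages = standardize(ages)
print(np.round(z_ages, 6))
```

The choice of `ddof` matters on small samples: NumPy's `std` defaults to the population formula (`ddof=0`), while pandas' `Series.std` defaults to the sample formula (`ddof=1`), so standardizing with pandas defaults would give slightly different numbers than `StandardScaler`.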
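from sklearn.preprocessing import StandardScaler

# Reuses df and pd from the normalization example above
# Fit the scaler and transform each column to zero mean and unit variance
scaler = StandardScaler()
standardized = scaler.fit_transform(df)

# Rebuild a DataFrame with the original column names
standardized_df = pd.DataFrame(standardized, columns=df.columns)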
print(standardized_df)
Output:
Age Income
0 -1.414214 -1.414214
1 -0.707107 -0.707107
2 0.000000 0.000000
3 0.707107 0.707107
4 1.414214 1.414214

| Aspect | Normalization | Standardization |
|---|---|---|
| Range | [0, 1] | Unbounded (mean = 0, std = 1) |
| Sensitive to outliers | Yes | Less |
| When to use | KNN, neural networks, distance-based models | Linear models, PCA, models with Gaussian assumptions |
| Requires normal distribution | No | No, but works best when features are roughly Gaussian |
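The "Range" row of the table can be checked directly on the tutorial's sample data. This is a minimal sketch that rebuilds the same DataFrame so it runs on its own; it only inspects the column statistics after each transform.

```python
import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Same sample data as in the examples above.
df = pd.DataFrame({
    "Age": [20, 25, 30, 35, 40],
    "Income": [20000, 40000, 60000, 80000, 100000],
})

normalized = MinMaxScaler().fit_transform(df)
standardized = StandardScaler().fit_transform(df)

# Min-max scaling: each column should span exactly [0, 1].
print("min-max column minima:", normalized.min(axis=0))
print("min-max column maxima:", normalized.max(axis=0))

# Standardization: each column should have mean ~0 and standard deviation ~1,
# but its values are not confined to a fixed interval.
print("standardized column means:", np.round(standardized.mean(axis=0), 6))
print("standardized column stds: ", np.round(standardized.std(axis=0), 6))
```

Note that the standardized values extend beyond [-1, 1] here (roughly ±1.41), which is why the standardized range is described by its mean and standard deviation rather than by fixed bounds.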