Feature Engineering & Data Preprocessing
Overview
- Handling Missing Data in ML
- Feature Scaling (Normalization vs. Standardization)
- Encoding Categorical Variables
- Feature Selection Techniques
- Dimensionality Reduction Techniques
- Feature Extraction from Text and Images
- Handling Imbalanced Data (SMOTE, Class Weights)
Dimensionality Reduction Techniques
Dimensionality reduction is the process of reducing the number of input variables or features in a dataset. It is a crucial preprocessing step in machine learning, especially when dealing with high-dimensional data. Reducing the number of features can help improve model performance, reduce computational cost, and combat the curse of dimensionality.
Why Dimensionality Reduction Is Important
- Improves Model Performance: Redundant or irrelevant features can degrade model performance. Removing them helps the model focus on meaningful patterns.
- Reduces Overfitting: With fewer features, the model is less likely to learn noise from the training data.
- Increases Training Efficiency: Fewer features mean faster training and less memory usage.
- Enables Visualization: Reducing the data to two or three dimensions makes it possible to plot it and gain insight into its structure.
Types of Dimensionality Reduction Techniques
Dimensionality reduction techniques fall into two major categories:
- Feature Selection: Selecting a subset of the original features without transforming them.
- Feature Extraction: Transforming data from a high-dimensional space to a lower-dimensional space.
In this tutorial, we focus on Feature Extraction techniques; the short sketch below contrasts the two approaches before we dive in.
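Before moving on, a minimal sketch can make the distinction concrete: feature selection keeps a subset of the original columns unchanged, while feature extraction builds new columns from combinations of all of them. The dataset and the choice of 5 features/components below are arbitrary, purely for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.decomposition import PCA
# Load a small tabular dataset (30 numeric features)
data = load_breast_cancer()
X, y = data.data, data.target
# Feature selection: keep 5 of the original 30 columns, values unchanged
selector = SelectKBest(score_func=f_classif, k=5)
X_selected = selector.fit_transform(X, y)
print("Selected original features:", data.feature_names[selector.get_support()])
# Feature extraction: build 5 new columns, each a combination of all 30 originals
pca = PCA(n_components=5)
X_extracted = pca.fit_transform(X)
print("Extracted components shape:", X_extracted.shape)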
1. Principal Component Analysis (PCA)
Description:
Principal Component Analysis is a linear technique that projects data onto a lower-dimensional subspace such that the variance of the projected data is maximized. It finds the axes (principal components) along which the data varies the most.
Key Concepts:
- PCA transforms the original features into new uncorrelated features (principal components).
- The first component captures the most variance, the second captures the second-most, and so on.
- PCA is unsupervised (does not use target labels).
Use Case:
Best used when you want to reduce dimensionality for continuous variables and when you suspect multicollinearity among features.
Code Example:
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
import pandas as pd
# Load the dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names
target_names = data.target_names
# Step 1: Standardize the data
# PCA is affected by the scale of features, so standardization is important
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 2: Apply PCA to reduce dimensions to 2 components
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X_scaled)
# Step 3: Explained variance ratio
print("Explained Variance Ratio for each component:")
print(pca.explained_variance_ratio_)
# Step 4: Visualize the reduced feature space
pca_df = pd.DataFrame(data=X_pca, columns=['PC1', 'PC2'])
pca_df['Target'] = y
plt.figure(figsize=(8, 6))
for label, target_name in enumerate(target_names):
    plt.scatter(
        pca_df[pca_df['Target'] == label]['PC1'],
        pca_df[pca_df['Target'] == label]['PC2'],
        label=target_name,
        alpha=0.7
    )
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.title('PCA - Breast Cancer Dataset')
plt.legend()
plt.grid(True)
plt.tight_layout()
plt.savefig("pca_plot.png") # Save the figure
Output -
Explained Variance Ratio for each component:
[0.44272026 0.18971182]
Code Explanation:
- Dataset: The Breast Cancer dataset is loaded for binary classification.
- Standardization: PCA is a variance-based technique and is sensitive to feature scales, so standardization with StandardScaler is applied first.
- PCA Transformation: The data is projected onto 2 principal components, which capture a large share of the variance in the dataset.
- Variance Ratio: explained_variance_ratio_ shows how much variance is retained by each principal component; the sketch below shows how to use it to choose the number of components.
- Visualization: A 2D scatter plot shows how the two classes (malignant and benign) separate in the reduced feature space.
2. Linear Discriminant Analysis (LDA)
Description:
LDA is a supervised dimensionality reduction technique that seeks to find a feature space that maximizes class separability. It projects data in a way that maximizes the distance between classes and minimizes the variation within each class.
Key Concepts:
- Unlike PCA, LDA uses class labels.
- It is suitable for classification tasks.
Use Case:
Best used when the goal is to improve classification performance by reducing dimensions while maintaining class separation.
Code Example:
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.preprocessing import StandardScaler
import pandas as pd
import matplotlib.pyplot as plt
# Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
feature_names = data.feature_names
target_names = data.target_names
# Step 1: Standardize the data
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 2: Apply LDA to reduce to 1 component (since it’s a binary classification)
lda = LinearDiscriminantAnalysis(n_components=1)
X_lda = lda.fit_transform(X_scaled, y)
# Step 3: Add target labels for visualization
lda_df = pd.DataFrame(X_lda, columns=['LDA1'])
lda_df['Target'] = y
# Step 4: Visualize the distribution along the LDA component
plt.figure(figsize=(8, 4))
for label, name in enumerate(target_names):
    plt.hist(
        lda_df[lda_df['Target'] == label]['LDA1'],
        bins=30,
        alpha=0.6,
        label=name
    )
plt.title('LDA Projection - Breast Cancer Dataset')
plt.xlabel('Linear Discriminant 1')
plt.ylabel('Frequency')
plt.legend()
plt.grid(True)
plt.tight_layout()
# Save the plot instead of showing if in headless environment
plt.savefig("lda_projection.png")
# plt.show() # Uncomment this line if running in an interactive environment
Output -
The script saves a histogram of the two classes along the LDA axis as lda_projection.png.
Code Explanation:
- LDA is supervised, meaning it uses the class labels (y) to find a projection that maximizes separation between classes and minimizes intra-class variance.
- The dataset has 30 features, which are projected onto 1 component (LDA1) because there are 2 classes; LDA can extract at most n_classes - 1 components (the sketch below shows this with a three-class dataset).
- The final plot shows how the classes are separated along the LDA axis.
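To make the n_classes - 1 limit and the classification use case concrete, here is a small sketch on the three-class Iris dataset. The choice of logistic regression and 5-fold cross-validation is illustrative, not a recommendation.
from sklearn.datasets import load_iris
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X, y = load_iris(return_X_y=True)
# Iris has 3 classes, so LDA can produce at most 3 - 1 = 2 components
lda = LinearDiscriminantAnalysis(n_components=2)
X_lda = lda.fit_transform(X, y)
print("Reduced shape:", X_lda.shape)  # (150, 2)
# LDA as a dimensionality reduction step inside a classification pipeline
pipeline = make_pipeline(
    StandardScaler(),
    LinearDiscriminantAnalysis(n_components=2),
    LogisticRegression(max_iter=1000)
)
scores = cross_val_score(pipeline, X, y, cv=5)
print("Cross-validated accuracy with LDA features:", scores.mean())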
3. t-Distributed Stochastic Neighbor Embedding (t-SNE)
Description:
t-SNE is a non-linear technique particularly well-suited for visualizing high-dimensional data in 2D or 3D space. It preserves local structure by modeling similar data points close together in the lower-dimensional space.
Key Concepts:
- Not suitable for predictive modeling.
- Primarily used for visualization and understanding clusters in data.
Limitations:
- Computationally expensive.
- Not deterministic unless the random state is fixed.
Code Example:
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
import pandas as pd
# Step 1: Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
target_names = data.target_names
# Step 2: Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Apply t-SNE to reduce dimensions to 2
tsne = TSNE(n_components=2, perplexity=30, random_state=0)
X_tsne = tsne.fit_transform(X_scaled)
# Step 4: Convert to DataFrame for visualization
tsne_df = pd.DataFrame(X_tsne, columns=['TSNE1', 'TSNE2'])
tsne_df['Target'] = y
# Step 5: Plot the 2D t-SNE output
plt.figure(figsize=(8, 6))
for label, name in enumerate(target_names):
    subset = tsne_df[tsne_df['Target'] == label]
    plt.scatter(subset['TSNE1'], subset['TSNE2'], label=name, alpha=0.6)
plt.title("t-SNE Projection of Breast Cancer Dataset")
plt.xlabel("t-SNE Component 1")
plt.ylabel("t-SNE Component 2")
plt.legend()
plt.grid(True)
plt.tight_layout()
# Save plot for non-interactive environments
plt.savefig("tsne_projection.png")
# plt.show() # Uncomment if running interactively
Output -
The script saves the 2D t-SNE scatter plot as tsne_projection.png.
Code Explanation:
- t-SNE is a non-linear, unsupervised dimensionality reduction technique best suited for visualizing high-dimensional data in 2 or 3 dimensions.
- It preserves local structure — nearby points in high-dimensional space stay close in the lower-dimensional projection.
- It's particularly useful when you want to visualize how well your data clusters without needing labels (though we use them just for plotting here).
Important Notes:
- Perplexity is a key parameter. Values between 5 and 50 usually work well, depending on the dataset size; the sketch below compares a few values.
- t-SNE is computationally intensive, and results may vary between runs unless random_state is set.
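The following sketch compares a few perplexity values side by side on the same data; the values 5, 30, and 50 are arbitrary picks from the commonly cited range.
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import StandardScaler
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt
data = load_breast_cancer()
X_scaled = StandardScaler().fit_transform(data.data)
y = data.target
# One subplot per perplexity value
fig, axes = plt.subplots(1, 3, figsize=(15, 4))
for ax, perplexity in zip(axes, [5, 30, 50]):
    embedding = TSNE(n_components=2, perplexity=perplexity, random_state=0).fit_transform(X_scaled)
    ax.scatter(embedding[:, 0], embedding[:, 1], c=y, cmap='viridis', s=10, alpha=0.6)
    ax.set_title(f"perplexity = {perplexity}")
plt.tight_layout()
plt.savefig("tsne_perplexity_comparison.png")  # Use plt.show() if running interactively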
4. Autoencoders (Neural Network-based)
Description:
Autoencoders are a type of neural network trained to compress input data into a lower-dimensional representation and then reconstruct it back to the original data. The compressed representation (bottleneck layer) is the reduced feature space.
Key Concepts:
- Non-linear dimensionality reduction.
- Can learn complex feature representations.
- Requires more data and computation compared to PCA/LDA.
Architecture:
- Input layer
- Encoding layers (compress data)
- Bottleneck layer (lowest dimension)
- Decoding layers (reconstruct data)
Use Case:
Useful for reducing dimensions in complex, non-linear datasets like images or sensor data.
Code Example (using Keras):
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler
from keras.models import Model
from keras.layers import Input, Dense
import matplotlib.pyplot as plt
# Step 1: Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Step 2: Normalize input data (autoencoders work best with scaled data)
scaler = MinMaxScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Define Autoencoder architecture
input_layer = Input(shape=(X_scaled.shape[1],))
# Encoder
encoded = Dense(64, activation='relu')(input_layer)
encoded = Dense(32, activation='relu')(encoded)
bottleneck = Dense(2, activation='linear')(encoded)
# Decoder
decoded = Dense(32, activation='relu')(bottleneck)
decoded = Dense(64, activation='relu')(decoded)
output_layer = Dense(X_scaled.shape[1], activation='sigmoid')(decoded)
# Step 4: Build and compile autoencoder
autoencoder = Model(inputs=input_layer, outputs=output_layer)
autoencoder.compile(optimizer='adam', loss='mse')
# Step 5: Train the autoencoder
autoencoder.fit(X_scaled, X_scaled, epochs=50, batch_size=32, shuffle=True, verbose=0)
# Step 6: Extract encoder model to get reduced features
encoder = Model(inputs=input_layer, outputs=bottleneck)
X_reduced = encoder.predict(X_scaled)
# Step 7: Visualize the 2D reduced representation
plt.figure(figsize=(8, 6))
plt.scatter(X_reduced[:, 0], X_reduced[:, 1], c=y, cmap='viridis', alpha=0.7)
plt.title("2D Representation from Autoencoder Bottleneck")
plt.xlabel("Encoded Feature 1")
plt.ylabel("Encoded Feature 2")
plt.colorbar(label='Target')
plt.grid(True)
plt.tight_layout()
plt.savefig("autoencoder_projection.png") # Use plt.show() if running interactively
Output -
The script saves a scatter plot of the 2D bottleneck representation as autoencoder_projection.png.
Code Explanation:
- An Autoencoder is a type of neural network that learns to compress (encode) the data into a lower dimension and then reconstruct it (decode) back to the original.
- The bottleneck layer is the reduced-dimensional representation. Here, we set it to 2 dimensions.
- Unlike PCA or t-SNE, Autoencoders are non-linear, learned, and can scale well for large and complex data.
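One way to judge how much information the 2-dimensional bottleneck keeps is the reconstruction error, as suggested in the best practices at the end of this page. Assuming the autoencoder and X_scaled from the example above are still in scope, a minimal check could look like this:
import numpy as np
# Reconstruct the inputs and measure how far they are from the originals
X_reconstructed = autoencoder.predict(X_scaled)
reconstruction_mse = np.mean(np.square(X_scaled - X_reconstructed))
print("Mean reconstruction MSE:", reconstruction_mse)
# Per-sample error can flag rows that the bottleneck represents poorly
per_sample_error = np.mean(np.square(X_scaled - X_reconstructed), axis=1)
print("Worst-reconstructed sample index:", np.argmax(per_sample_error))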
5. Independent Component Analysis (ICA)
Description:
ICA separates a multivariate signal into additive, independent non-Gaussian components. It is often used in signal processing and is useful when the source signals are statistically independent.
Use Case:
Commonly used in audio signal separation, image processing, or other cases where independent source signals are mixed.
Code Example:
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import FastICA
from sklearn.preprocessing import StandardScaler
import matplotlib.pyplot as plt
# Step 1: Load dataset
data = load_breast_cancer()
X, y = data.data, data.target
# Step 2: Standardize the features
scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)
# Step 3: Apply FastICA
ica = FastICA(n_components=2, random_state=0)
X_ica = ica.fit_transform(X_scaled)
# Step 4: Visualize the reduced data
plt.figure(figsize=(8, 6))
plt.scatter(X_ica[:, 0], X_ica[:, 1], c=y, cmap='plasma', alpha=0.7)
plt.title("2D Representation using FastICA")
plt.xlabel("Independent Component 1")
plt.ylabel("Independent Component 2")
plt.grid(True)
plt.colorbar(label='Target')
plt.tight_layout()
plt.savefig("fastica_projection.png") # Use plt.show() if running interactively
Output -
The script saves the 2D FastICA scatter plot as fastica_projection.png.
Code Explanation:
- Independent Component Analysis (ICA) is a statistical technique that separates a multivariate signal into additive independent components.
- Unlike PCA (which looks for uncorrelated axes), ICA looks for statistically independent components, which is useful in signal processing and data unmixing (see the sketch after this list).
- Here, we reduce to 2 components and visualize the result.
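The breast cancer features are not really mixed signals, so the example above mainly demonstrates the API. A sketch closer to ICA's typical use case is blind source separation of synthetic signals; the sine and square-wave sources and the mixing matrix below are made up purely for illustration.
import numpy as np
from sklearn.decomposition import FastICA
rng = np.random.default_rng(0)
t = np.linspace(0, 8, 2000)
s1 = np.sin(2 * t)                    # source 1: sine wave
s2 = np.sign(np.sin(3 * t))           # source 2: square wave
S = np.c_[s1, s2] + 0.1 * rng.standard_normal((2000, 2))  # add a little noise
A = np.array([[1.0, 0.5], [0.5, 2.0]])  # mixing matrix (made up)
X_mixed = S @ A.T                       # the observed, mixed signals
# FastICA tries to recover the original independent sources from the mixture
ica = FastICA(n_components=2, random_state=0)
S_estimated = ica.fit_transform(X_mixed)
print("Recovered sources shape:", S_estimated.shape)  # (2000, 2)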
Choosing the Right Dimensionality Reduction Technique
Technique | Supervised | Linear/Non-Linear | Use Case |
---|---|---|---|
PCA | No | Linear | General dimensionality reduction |
LDA | Yes | Linear | Classification tasks |
t-SNE | No | Non-Linear | Data visualization |
Autoencoders | No | Non-Linear | Complex data, unsupervised |
ICA | No | Linear | Signal separation |
Best Practices
- Standardize your data before applying PCA or LDA to ensure that all features contribute equally.
- Use PCA for data exploration and noise reduction.
- Use t-SNE for visualizing clusters or embedding.
- Evaluate reconstruction error when using autoencoders.
- Avoid using dimensionality reduction blindly in production without evaluating the impact on model performance; a quick comparison sketch follows below.
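For the last point, a minimal sketch of such an evaluation is to compare cross-validated scores with and without the reduction step in a pipeline. The classifier and the choice of 10 PCA components here are illustrative assumptions, not recommendations.
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
X, y = load_breast_cancer(return_X_y=True)
# Same model with and without a PCA step, compared on the same folds
baseline = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
with_pca = make_pipeline(StandardScaler(), PCA(n_components=10), LogisticRegression(max_iter=1000))
print("Baseline accuracy:", cross_val_score(baseline, X, y, cv=5).mean())
print("With PCA accuracy:", cross_val_score(with_pca, X, y, cv=5).mean())
Keep whichever configuration performs better on held-out data before shipping the reduction step to production.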