Principal Component Analysis (PCA) is a dimensionality reduction technique used in unsupervised learning. It helps simplify datasets by transforming features into a smaller set of uncorrelated variables called principal components, which capture the maximum variance in the data.
PCA is widely used for:
Visualization of high-dimensional data
Speeding up training by reducing input size
Eliminating multicollinearity
Noise reduction
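As a minimal sketch of the visualization use case (assuming scikit-learn and its bundled Iris dataset are available), reducing a 4-feature dataset to 2 components might look like:

```python
from sklearn.datasets import load_iris
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

# Load a small 4-feature dataset (150 samples, 4 features)
X = load_iris().data

# Standardize first: PCA is sensitive to feature scale
X_scaled = StandardScaler().fit_transform(X)

# Keep the top 2 principal components for a 2-D plot
pca = PCA(n_components=2)
X_2d = pca.fit_transform(X_scaled)

print(X_2d.shape)                     # (150, 2)
print(pca.explained_variance_ratio_)  # share of variance per component
```

The two retained components capture most of the variance in the four original features, which is why a 2-D scatter plot of `X_2d` is still informative.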
Why Use PCA?
Real-world datasets can have dozens or hundreds of features.
Not all features contribute equally to the model.
PCA projects data to a lower-dimensional space while preserving the most critical information (variance).
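To make "preserving the most critical information" concrete, scikit-learn's PCA accepts a float `n_components` meaning "keep enough components to explain this fraction of the variance." A hedged sketch on synthetic data (the dataset, sizes, and noise level below are illustrative assumptions):

```python
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
# Synthetic data: 20 observed features generated from only 3 latent factors
# plus a little noise, so most variance lives in a low-dimensional subspace
latent = rng.normal(size=(500, 3))
mixing = rng.normal(size=(3, 20))
X = latent @ mixing + 0.1 * rng.normal(size=(500, 20))

X_scaled = StandardScaler().fit_transform(X)

# Keep enough components to explain 95% of the variance
pca = PCA(n_components=0.95)
X_reduced = pca.fit_transform(X_scaled)

print(X_reduced.shape[1])  # far fewer than the 20 original features
```

Because the 20 features are driven by only 3 underlying factors, a handful of components is enough to cross the 95% threshold.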
How PCA Works
1. Standardize the data: PCA is sensitive to scale, so each feature should be standardized to zero mean and unit variance.
2. Compute the covariance matrix: this captures the pairwise relationships between features.
3. Compute eigenvectors and eigenvalues: the eigenvectors (principal components) define the new axes, and the eigenvalues represent the variance captured along each axis.
4. Select the top k components: choose k based on the cumulative explained variance.
5. Project the original data: transform the data onto the new k-dimensional subspace.
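The five steps above can be sketched directly in NumPy. This is a from-scratch illustration under assumed data, not a production implementation (libraries such as scikit-learn compute PCA via the SVD instead):

```python
import numpy as np

def pca_project(X, k):
    """Project X (n_samples, n_features) onto its top-k principal components."""
    # 1. Standardize: zero mean, unit variance per feature
    X_std = (X - X.mean(axis=0)) / X.std(axis=0)

    # 2. Covariance matrix of the standardized features
    cov = np.cov(X_std, rowvar=False)

    # 3. Eigen-decomposition (eigh, since covariance matrices are symmetric)
    eigenvalues, eigenvectors = np.linalg.eigh(cov)

    # 4. Sort by eigenvalue (descending) and keep the top-k eigenvectors
    order = np.argsort(eigenvalues)[::-1]
    components = eigenvectors[:, order[:k]]
    explained = eigenvalues[order[:k]] / eigenvalues.sum()

    # 5. Project the data onto the k-dimensional subspace
    return X_std @ components, explained

# Illustrative data: 5 features built from 2 underlying factors plus noise
rng = np.random.default_rng(42)
base = rng.normal(size=(200, 2))
X = np.hstack([base, base @ rng.normal(size=(2, 3)) + 0.05 * rng.normal(size=(200, 3))])

X_k, ratios = pca_project(X, k=2)
print(X_k.shape)     # (200, 2)
print(ratios.sum())  # high, since the data is nearly 2-dimensional
```

Sorting by eigenvalue in step 4 matters: `np.linalg.eigh` returns eigenvalues in ascending order, so the largest-variance directions are at the end, not the front.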