Bias-Variance Tradeoff

The bias-variance tradeoff is a fundamental concept in machine learning that helps explain the sources of error in model predictions. Understanding this tradeoff allows you to build models that generalize well, avoiding both underfitting and overfitting.


What You'll Learn

  • What bias and variance are
  • How they impact model performance
  • The tradeoff between bias and variance
  • Visual and code-based explanations
  • How to manage this tradeoff in real-world ML tasks

What is Bias?

Bias is the error introduced by approximating a complex problem with an overly simple model. A model with high bias pays little attention to the training data and misses the underlying pattern, which leads to underfitting.

Example: Predicting house prices using just the average price regardless of features like size or location.
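Below is a minimal sketch of that idea, using a handful of hypothetical sizes and prices and scikit-learn's DummyRegressor as a stand-in for a "predict the average" model:

import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Hypothetical data: house sizes (sq ft) and prices (in $1000s)
sizes = np.array([[800], [1200], [1500], [2000], [2500]])
prices = np.array([150, 220, 280, 360, 450])

# A maximally biased model: always predict the mean price,
# ignoring every feature
model = DummyRegressor(strategy='mean').fit(sizes, prices)
preds = model.predict(sizes)

print(preds)                              # same value for every house
print(mean_squared_error(prices, preds))  # large error even on training data

Every house gets the same predicted price, so the error stays large even on the training data itself: the signature of high bias.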


What is Variance?

Variance is the error introduced by the model's sensitivity to small fluctuations in the training data. A high-variance model pays too much attention to the training data, including its noise, and often performs poorly on unseen data, causing overfitting.

Example: A decision tree that grows very deep, perfectly fitting training data but failing on test data.
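A short sketch of the same effect, using a noisy cubic dataset (mirroring the one used later in this article); the depth limit of 3 is an illustrative choice:

import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Noisy cubic data, mirroring the dataset used later in this article
rng = np.random.RandomState(0)
X = np.sort(rng.rand(100, 1) * 2 - 1, axis=0)
y = (X**3).ravel() + rng.normal(0, 0.1, size=100)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

# Unrestricted tree: memorizes the training points, noise and all
deep = DecisionTreeRegressor(random_state=0).fit(X_train, y_train)
# Depth-limited tree: smoother, less sensitive to individual points
shallow = DecisionTreeRegressor(max_depth=3, random_state=0).fit(X_train, y_train)

for name, model in [('deep', deep), ('shallow', shallow)]:
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f'{name}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}')

The unrestricted tree drives its training error to nearly zero, yet its test error is typically worse than the depth-limited tree's: the signature of high variance.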


The Bias-Variance Tradeoff

  • High bias, low variance → Underfitting
  • Low bias, high variance → Overfitting
  • Optimal balance → Good generalization

Error Decomposition:

Expected Error = Bias² + Variance + Irreducible Error

The squared bias measures how far the model's average prediction lies from the true value, the variance measures how much predictions scatter across different training sets, and the irreducible error is the noise inherent in the data itself.
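This decomposition can be estimated empirically: refit a model on many freshly drawn training sets and measure how its predictions at a fixed point scatter around the true value. Here is a sketch under the same cubic-plus-noise assumptions as the visual example below; the evaluation point x0 = 0.5 and trial count are arbitrary choices:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# True function: the cubic used in the example below, with noise level 0.1
def f(x):
    return x**3

rng = np.random.RandomState(0)
x0 = np.array([[0.5]])        # point at which the error is decomposed
n_trials, noise = 200, 0.1

for degree in (1, 3, 15):
    preds = []
    for _ in range(n_trials):
        # Draw a fresh training set each trial
        X = rng.rand(100, 1) * 2 - 1
        y = f(X).ravel() + rng.normal(0, noise, size=100)
        model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
        preds.append(model.fit(X, y).predict(x0)[0])
    preds = np.array(preds)
    bias_sq = (preds.mean() - f(x0).item()) ** 2
    variance = preds.var()
    print(f'degree {degree}: bias^2 = {bias_sq:.5f}, variance = {variance:.5f}')

The low-degree model shows the larger squared bias, while the high-degree model shows the larger variance, matching the tradeoff described above.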

Visual Example Using Python

import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate a noisy cubic dataset
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 2 - 1, axis=0)
y = X**3 + np.random.normal(0, 0.1, size=(100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Fit a polynomial model of the given degree and plot it against the test set
def plot_models(degree):
    poly = PolynomialFeatures(degree)
    X_poly = poly.fit_transform(X_train)
    model = LinearRegression().fit(X_poly, y_train)

    X_test_poly = poly.transform(X_test)
    y_pred = model.predict(X_test_poly)

    # Sort the test points so the fitted curve draws left to right
    order = np.argsort(X_test[:, 0])
    plt.scatter(X_test, y_test, color='black', label='Test Data')
    plt.plot(X_test[order, 0], y_pred[order], label=f'Degree {degree}')
    plt.legend()
    plt.title(f'Degree {degree} → MSE: {mean_squared_error(y_test, y_pred):.3f}')
    plt.show()

plot_models(1)   # High bias: a straight line underfits the cubic
plot_models(15)  # High variance: wiggles chase the noise
plot_models(3)   # Balanced: matches the true cubic shape

Output: three plots, one per degree, each showing the test data and the fitted curve, with the test MSE in the title.


How to Manage the Tradeoff

  • Use cross-validation to get a reliable estimate of generalization error
  • Apply regularization (Lasso, Ridge) to reduce variance (see the sketch after this list)
  • Decrease model complexity to fight variance, or increase it to fight bias
  • Add more training data to reduce variance
  • Perform feature selection or extraction
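As a concrete illustration of the regularization point, here is a sketch that reuses the noisy cubic dataset from the visual example above; the alpha value is an arbitrary choice that you would normally tune with cross-validation:

import numpy as np
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

# Same noisy cubic dataset as the visual example above
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 2 - 1, axis=0)
y = X**3 + np.random.normal(0, 0.1, size=(100, 1))
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

# Degree-15 features produce a high-variance model; the L2 penalty
# in Ridge shrinks the coefficients and reins the variance back in
for name, reg in [('plain', LinearRegression()), ('ridge', Ridge(alpha=1.0))]:
    model = make_pipeline(
        PolynomialFeatures(15, include_bias=False), StandardScaler(), reg)
    model.fit(X_train, y_train)
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f'{name}: test MSE = {test_mse:.4f}')

Shrinking the degree-15 coefficients keeps the model's flexibility in check, which typically lowers the test MSE relative to the unregularized fit without forcing you to pick a lower degree.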

Summary

Model Behavior        Bias       Variance   Problem
Underfitting          High       Low        Too simple
Overfitting           Low        High       Too complex
Good Generalization   Balanced   Balanced   Ideal scenario

Mastering the bias-variance tradeoff is essential for designing effective machine learning models. It helps you understand the cause of model errors and guides your model selection, complexity, and tuning strategies.