The bias-variance tradeoff is a fundamental concept in machine learning that helps explain the sources of error in model predictions. Understanding this tradeoff allows you to build models that generalize well, avoiding both underfitting and overfitting.
Bias is the error introduced by approximating a complex problem with a model that is too simple. A high-bias model pays little attention to the training data and misses relevant patterns, which leads to underfitting.
Example: Predicting house prices using just the average price regardless of features like size or location.
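To make this concrete in code, here is a minimal sketch of that mean-only baseline. The synthetic house-price data and scikit-learn's `DummyRegressor` are illustrative choices, not part of the original example:

```python
import numpy as np
from sklearn.dummy import DummyRegressor
from sklearn.metrics import mean_squared_error

# Illustrative data: price depends strongly on size...
rng = np.random.default_rng(0)
size = rng.uniform(50, 250, size=(200, 1))             # square meters
price = 3000 * size[:, 0] + rng.normal(0, 20000, 200)  # dollars

# ...but this high-bias baseline ignores every feature and
# always predicts the training-set mean price
baseline = DummyRegressor(strategy="mean").fit(size, price)
print("Mean-only MSE:", mean_squared_error(price, baseline.predict(size)))
```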
Variance is the error introduced by the model's sensitivity to small fluctuations in the training data. A high-variance model pays too much attention to the training data, capturing noise as well as signal; it fits the training set closely but performs poorly on unseen data, which is overfitting.
Example: A decision tree that grows very deep, perfectly fitting training data but failing on test data.
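You can see this failure mode directly by fitting an unrestricted `DecisionTreeRegressor` and comparing training and test error. This is a sketch on synthetic cubic data, mirroring the dataset used in the full example afterwards, which puts both failure modes side by side:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(1)
X = rng.uniform(-1, 1, size=(200, 1))
y = X[:, 0] ** 3 + rng.normal(0, 0.1, size=200)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=1)

# An unrestricted tree memorizes the training set (near-zero train error)...
tree = DecisionTreeRegressor(max_depth=None, random_state=1).fit(X_tr, y_tr)
print("train MSE:", mean_squared_error(y_tr, tree.predict(X_tr)))
# ...but its test error is much larger: the signature of high variance
print("test MSE:", mean_squared_error(y_te, tree.predict(X_te)))
```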
```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Generate dataset
np.random.seed(42)
X = np.sort(np.random.rand(100, 1) * 2 - 1, axis=0)
y = X**3 + np.random.normal(0, 0.1, size=(100, 1))
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)

# Function to fit and plot models with different complexities
def plot_models(degree):
    poly = PolynomialFeatures(degree)
    X_poly = poly.fit_transform(X_train)
    model = LinearRegression().fit(X_poly, y_train)
    X_test_poly = poly.transform(X_test)
    y_pred = model.predict(X_test_poly)
    plt.scatter(X_test, y_test, color='black', label='Test Data')
    plt.plot(np.sort(X_test[:, 0]), y_pred[np.argsort(X_test[:, 0])], label=f'Degree {degree}')
    plt.legend()
    plt.title(f'Degree {degree} → MSE: {mean_squared_error(y_test, y_pred):.3f}')
    plt.show()

plot_models(1)   # High bias
plot_models(15)  # High variance
plot_models(3)   # Balanced
```
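Building on the same variables, a short sweep over polynomial degrees (a sketch reusing `X_train`, `X_test`, and the imports above) shows the pattern numerically: training error keeps falling as the degree grows, while test error falls and then rises again once variance dominates:

```python
# Sweep model complexity on the same train/test split
for degree in range(1, 16):
    poly = PolynomialFeatures(degree)
    model = LinearRegression().fit(poly.fit_transform(X_train), y_train)
    train_mse = mean_squared_error(y_train, model.predict(poly.transform(X_train)))
    test_mse = mean_squared_error(y_test, model.predict(poly.transform(X_test)))
    print(f"degree {degree:2d}  train MSE: {train_mse:.4f}  test MSE: {test_mse:.4f}")
```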
| Model Behavior | Bias | Variance | Problem |
|---|---|---|---|
| Underfitting | High | Low | Too simple |
| Overfitting | Low | High | Too complex |
| Good Generalization | Balanced | Balanced | Ideal scenario |
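Because the true function is known in this synthetic setting, you can also estimate the two error components directly. The sketch below (the `bias_variance` helper and its constants are illustrative, not from the original example) refits the model on many fresh noise draws around x³ and decomposes the error at each input into bias² and variance:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(42)
X = np.sort(rng.uniform(-1, 1, size=(100, 1)), axis=0)
true_f = X[:, 0] ** 3  # known ground-truth function

def bias_variance(degree, n_rounds=200, noise=0.1):
    """Estimate bias^2 and variance by refitting on fresh noise draws."""
    poly = PolynomialFeatures(degree)
    X_poly = poly.fit_transform(X)
    preds = np.empty((n_rounds, len(X)))
    for i in range(n_rounds):
        y_noisy = true_f + rng.normal(0, noise, size=len(X))
        preds[i] = LinearRegression().fit(X_poly, y_noisy).predict(X_poly)
    bias_sq = np.mean((preds.mean(axis=0) - true_f) ** 2)  # mean prediction vs truth
    variance = np.mean(preds.var(axis=0))                  # spread across refits
    return bias_sq, variance

for degree in (1, 3, 15):
    b2, v = bias_variance(degree)
    print(f"degree {degree:2d}  bias^2: {b2:.5f}  variance: {v:.5f}")
```

The printed numbers mirror the table: degree 1 shows high bias² and low variance, degree 15 the reverse, and degree 3 keeps both small.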
Mastering the bias-variance tradeoff is essential for designing effective machine learning models: it explains where prediction error comes from and guides your choices of model family, complexity, and tuning strategy.