Overfitting and Underfitting in Models

When training machine learning models, the goal is a model that fits the training data well and also generalizes to data it has not seen. Two common ways to miss that goal are overfitting and underfitting. Understanding both helps you build models that perform well on training and unseen data alike.

What You'll Learn

What overfitting and underfitting mean
How to detect them
Examples using Python
Strategies to prevent and correct these problems

What is Overfitting?

Overfitting happens when a model learns the training data too well, including noise and minor fluctuations. It performs well on training data but poorly on new, unseen data.

Symptoms:

High accuracy on training data
Low accuracy on test/validation data

Visual Example:

A highly complex curve that passes through every training point but fails to predict test data properly.
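The curve described above can be reproduced in a few lines. The following is a minimal sketch, not a definitive recipe: the noisy sine data, the helper `make_noisy_sine`, the sample sizes, and the polynomial degree are all assumptions chosen for illustration. It fits a degree-11 polynomial to 12 training points and compares training and test error.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(0)


def make_noisy_sine(n_samples):
    """Hypothetical helper: samples of sin(2*pi*x) plus Gaussian noise (std 0.2)."""
    x = rng.uniform(0.0, 1.0, n_samples)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n_samples)
    return x.reshape(-1, 1), y


X_train, y_train = make_noisy_sine(12)
X_test, y_test = make_noisy_sine(200)

# A degree-11 polynomial has 12 coefficients, enough to pass through
# (or very close to) all 12 training points, noise included.
wiggly_model = make_pipeline(PolynomialFeatures(degree=11), LinearRegression())
wiggly_model.fit(X_train, y_train)

train_mse = mean_squared_error(y_train, wiggly_model.predict(X_train))
test_mse = mean_squared_error(y_test, wiggly_model.predict(X_test))
print(f"Train MSE: {train_mse:.4f}")
print(f"Test MSE:  {test_mse:.4f}")
```

A training error far below the test error is the numerical counterpart of the curve that hits every training point but misses new ones.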

What is Underfitting?

Underfitting occurs when a model is too simple to capture the underlying pattern of the data. It performs poorly on both training and test data.

Symptoms:

Low training accuracy
Low test accuracy

Visual Example:

A straight line trying to fit a non-linear relationship, missing the trend entirely.
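The straight-line case can be sketched the same way. The quadratic data below is an assumption made for illustration (it does not come from a real dataset); the point is only that a linear model scores poorly on both the training and the test split when the relationship is curved.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Synthetic non-linear relationship: y = x^2 plus Gaussian noise (std 0.5)
X = rng.uniform(-3.0, 3.0, size=(200, 1))
y = X.ravel() ** 2 + rng.normal(0.0, 0.5, size=200)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# A straight line cannot follow the U-shaped trend
line = LinearRegression().fit(X_train, y_train)

# score() returns R^2: 1.0 is a perfect fit, values near 0 mean the model
# does little better than always predicting the mean of y
train_r2 = line.score(X_train, y_train)
test_r2 = line.score(X_test, y_test)
print(f"Train R^2: {train_r2:.3f}")
print(f"Test R^2:  {test_r2:.3f}")
```

Both scores being low, rather than only the test score, is what separates underfitting from overfitting.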

Example in Python

from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
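# Generate a dataset with one feature: a linear signal plus Gaussian noise (std = 15)
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Simple model: a straight line, which matches the linear signal in this dataset
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)

# Overfitting model: a tree deep enough to memorize most or all of the 80 training points
tree_model = DecisionTreeRegressor(max_depth=20)
tree_model.fit(X_train, y_train)
y_pred_tree = tree_model.predict(X_test)

# Evaluation on the held-out test set
print("Linear Regression MSE (Simple model):", mean_squared_error(y_test, y_pred_linear))
print("Decision Tree MSE (Overfitting):", mean_squared_error(y_test, y_pred_tree))

Output:

Linear Regression MSE (Simple model): 234.45500969670806
Decision Tree MSE (Overfitting): 500.20712573958053

Note that make_regression produces a linear relationship, so the straight line is not underfitting here. Its test MSE is close to the noise variance (15^2 = 225), which is about the best any model can achieve on this data. The deep tree fits the noise in the training set, and its test error is roughly twice as high. A linear model underfits only when the true relationship is non-linear, as in the visual example described earlier.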
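The example above reports test error only, and a single test number cannot tell you which kind of problem a model has. A sketch that reuses the same dataset and models and also reports training error might look like this. The `random_state` on the tree is an addition for repeatability, and no output is shown because it has not been reproduced here.

```python
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same dataset and split as in the example above
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    # random_state added here only so repeated runs give the same tree
    "Decision Tree (max_depth=20)": DecisionTreeRegressor(max_depth=20, random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name}: train MSE = {train_mse:.2f}, test MSE = {test_mse:.2f}")
```

A large gap between the two numbers for one model points to overfitting; two similar numbers that are both high point to underfitting or to irreducible noise.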

How to Detect Overfitting and Underfitting

Training Error	Test Error	Observation
High	High	Underfitting
Low	High	Overfitting
Low	Low	Good Fit
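The table can be written down as a small helper. This is only a sketch: `diagnose_fit` and its `high_error` threshold are hypothetical, and what counts as a "high" error depends on the problem, the metric, and the noise level, so the threshold has to be chosen by you.

```python
def diagnose_fit(train_error, test_error, high_error):
    """Label a model using the rules of thumb from the table above.

    `high_error` is a hypothetical, problem-specific cut-off: errors at or
    above it count as "high", errors below it count as "low".
    """
    train_high = train_error >= high_error
    test_high = test_error >= high_error

    if train_high and test_high:
        return "underfitting"
    if not train_high and test_high:
        return "overfitting"
    if not train_high and not test_high:
        return "good fit"
    # High training error with low test error is not covered by the table;
    # it usually suggests a very small or unrepresentative test set.
    return "unclear"


# Illustrative, made-up error values with a made-up threshold of 100
print(diagnose_fit(400.0, 420.0, high_error=100.0))  # underfitting
print(diagnose_fit(2.0, 500.0, high_error=100.0))    # overfitting
print(diagnose_fit(20.0, 25.0, high_error=100.0))    # good fit
```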

How to Prevent Overfitting

Use simpler models (reduce depth, number of features)
Apply regularization (L1, L2 penalties)
Use cross-validation to choose model complexity and hyperparameters
Prune decision trees
Increase training data
Apply early stopping in iterative models
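As one illustration of the strategies above, the sketch below applies an L2 penalty (scikit-learn's `Ridge`) to a flexible polynomial model. The synthetic sine data, the helper names `sample` and `poly_model`, the degree, and `alpha=0.01` are assumptions picked for the example, not recommended settings. The penalty shrinks the coefficients; whether test error improves depends on the data and on `alpha`, which is normally tuned with cross-validation.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(1)


def sample(n_samples):
    """Hypothetical helper: noisy samples of sin(2*pi*x) on [0, 1]."""
    x = rng.uniform(0.0, 1.0, n_samples)
    y = np.sin(2 * np.pi * x) + rng.normal(0.0, 0.2, n_samples)
    return x.reshape(-1, 1), y


def poly_model(regressor):
    """Degree-10 polynomial features, standardized, then the given regressor."""
    return make_pipeline(
        PolynomialFeatures(degree=10, include_bias=False),
        StandardScaler(),
        regressor,
    )


X_train, y_train = sample(15)
X_test, y_test = sample(200)

unregularized = poly_model(LinearRegression()).fit(X_train, y_train)
regularized = poly_model(Ridge(alpha=0.01)).fit(X_train, y_train)

coef_norms = {}
for name, model in [("No penalty", unregularized), ("L2 penalty", regularized)]:
    final_estimator = model.steps[-1][1]  # the fitted regressor in the pipeline
    coef_norms[name] = np.linalg.norm(final_estimator.coef_)
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    print(f"{name}: coefficient norm = {coef_norms[name]:.2f}, test MSE = {test_mse:.4f}")
```

Smaller coefficients correspond to a smoother curve, which is why the penalty limits how closely the model can chase noise in the training points.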

How to Prevent Underfitting

Use more complex models
Increase model capacity (more layers, features)
Reduce regularization
Improve feature engineering
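To make "increase model capacity" concrete, here is a minimal sketch that compares a decision tree limited to a single split with a somewhat deeper one. The noisy sine data and the two depths are assumptions chosen for illustration. A tree with one split can only predict two values, so it is too simple for a curved trend.

```python
import numpy as np
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Synthetic non-linear data: one period of a sine wave plus Gaussian noise (std 0.1)
X = rng.uniform(0.0, 2 * np.pi, size=(300, 1))
y = np.sin(X.ravel()) + rng.normal(0.0, 0.1, size=300)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

errors = {}
for depth in (1, 5):  # depth 1 = a single split, depth 5 = up to 32 leaves
    tree = DecisionTreeRegressor(max_depth=depth, random_state=0)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    errors[depth] = (train_mse, test_mse)
    print(f"max_depth={depth}: train MSE = {train_mse:.4f}, test MSE = {test_mse:.4f}")
```

When added capacity lowers training and test error together, the simpler model was underfitting. If only the training error keeps falling as depth grows, the model has moved into overfitting.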

Conclusion

Overfitting and underfitting are crucial problems in model development. Striking the right balance helps build models that generalize well to unseen data. Monitoring training and validation metrics throughout the training process is a good way to keep these issues in check.
