Overfitting and Underfitting in Models
When training machine learning models, achieving a balance between accuracy and generalization is key. Two common issues that arise during this process are overfitting and underfitting. Understanding these concepts helps build models that perform well on both training and unseen data.
What You'll Learn
- What overfitting and underfitting mean
- How to detect them
- Examples using Python
- Strategies to prevent and correct these problems
What is Overfitting?
Overfitting happens when a model learns the training data too well, including noise and minor fluctuations. It performs well on training data but poorly on new, unseen data.
Symptoms:
- High accuracy on training data
- Low accuracy on test/validation data
Visual Example:
A highly complex curve that passes through every training point but fails to predict test data properly.
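A minimal sketch of that picture, assuming NumPy and a hypothetical noisy sine curve (the data, sizes, and noise level are illustrative, not taken from this article). With ten training points, a degree-9 polynomial has exactly enough coefficients to pass through every one of them.

```python
import numpy as np

rng = np.random.default_rng(0)


def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


# Hypothetical data: a sine trend plus Gaussian noise (std 0.2)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=x_train.size)
x_test = rng.uniform(0, 1, size=50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=x_test.size)

# Degree 9 means ten coefficients for ten points, so the curve can interpolate the training set
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = mse(y_train, np.polyval(coeffs, x_train))
test_mse = mse(y_test, np.polyval(coeffs, x_test))

print(f"train MSE: {train_mse:.6f}")
print(f"test MSE:  {test_mse:.6f}")
```

The training error should be essentially zero because the curve hits every training point, while the error on fresh points from the same curve is larger: the wiggles between training points are fitted noise.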
What is Underfitting?
Underfitting occurs when a model is too simple to capture the underlying pattern of the data. It performs poorly on both training and test data.
Symptoms:
- Low training accuracy
- Low test accuracy
Visual Example:
A straight line trying to fit a non-linear relationship, missing the trend entirely.
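A comparable sketch for underfitting, assuming scikit-learn and the same kind of hypothetical sine data. A straight line cannot bend to follow a full period of the sine wave, so its error should stay well above the noise variance on both the training and the test split.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical non-linear data: one full sine period plus Gaussian noise (std 0.1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# A straight line is too simple for this curve, so it misses the trend on both splits
line = LinearRegression().fit(X_train, y_train)
train_mse = mean_squared_error(y_train, line.predict(X_train))
test_mse = mean_squared_error(y_test, line.predict(X_test))
noise_var = 0.1 ** 2  # the lowest expected error any model could reach on this data

print(f"train MSE: {train_mse:.3f}  test MSE: {test_mse:.3f}  noise variance: {noise_var:.3f}")
```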
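Example in Python
The snippet compares a linear regression and a very deep decision tree on the same synthetic data. Note that `make_regression` produces a target that is a linear function of `X` plus Gaussian noise, so a straight line matches how this data was generated: the linear model here shows a good fit rather than underfitting, while the deep tree shows overfitting.
```python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error

# Generate a dataset: the target is linear in the single feature, plus Gaussian noise with std 15
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Linear model: matches the linear data-generating process, so it should generalize well
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)

# Overfitting model: up to 20 levels is enough depth to fit the 80 training points almost exactly
tree_model = DecisionTreeRegressor(max_depth=20)
tree_model.fit(X_train, y_train)
y_pred_tree = tree_model.predict(X_test)

# Evaluation on the held-out test set
print("Linear Regression MSE (Good fit):", mean_squared_error(y_test, y_pred_linear))
print("Decision Tree MSE (Overfitting):", mean_squared_error(y_test, y_pred_tree))
```
Output:
```
Linear Regression MSE (Good fit): 234.45500969670806
Decision Tree MSE (Overfitting): 500.20712573958053
```
With `noise=15`, the irreducible error is about 15² = 225, so the linear model's test MSE is close to the best any model could achieve on this data. The deep tree's test MSE is roughly twice as high because it has learned the noise in the training points.
How to Detect Overfitting and Underfitting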
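| Training Error | Test Error | Diagnosis |
|---|---|---|
| High | High | Underfitting |
| Low | High | Overfitting |
| Low | Low | Good fit |

"High" and "low" are relative to the error level that is acceptable for the task; a large gap between training and test error is the telltale sign of overfitting.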
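One way to apply this table systematically is a validation curve: train the same model at increasing complexity and compare training and validation error at each step. The sketch below assumes scikit-learn's `validation_curve` with 5-fold cross-validation, a `DecisionTreeRegressor` whose `max_depth` is varied, and hypothetical sine data; the depth values are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.model_selection import validation_curve
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical non-linear data: one sine period plus Gaussian noise (std 0.1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)

# Tree depths to try, from very shallow (simple) to very deep (complex)
depths = [1, 2, 3, 4, 6, 20]

# 5-fold cross-validation; scikit-learn reports negated MSE, so flip the sign
train_scores, val_scores = validation_curve(
    DecisionTreeRegressor(random_state=0),
    X,
    y,
    param_name="max_depth",
    param_range=depths,
    cv=5,
    scoring="neg_mean_squared_error",
)
train_err = -train_scores.mean(axis=1)
val_err = -val_scores.mean(axis=1)

for depth, tr, va in zip(depths, train_err, val_err):
    print(f"max_depth={depth:>2}  train MSE={tr:.4f}  validation MSE={va:.4f}")
```

Reading the printout against the table: at the smallest depths both errors should be high (underfitting), and at the largest depth the training error should approach zero while the validation error stays clearly above it (overfitting).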
How to Prevent Overfitting
- Use simpler models (reduce depth, number of features)
- Apply regularization (L1, L2 penalties)
- Use cross-validation
- Prune decision trees
- Increase training data
- Apply early stopping in iterative models
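Several of these techniques map directly onto scikit-learn options. The sketch below assumes hypothetical sine data and illustrative settings (polynomial degree 12, `alpha=1.0`, `ccp_alpha=0.005`, 10 rounds of patience). It shows an L2 penalty via `Ridge`, `cross_val_score` for cross-validation, cost-complexity pruning via `ccp_alpha`, and early stopping via `n_iter_no_change` in `GradientBoostingRegressor`.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Hypothetical non-linear data: one sine period plus Gaussian noise (std 0.1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)


def poly_model(regressor):
    """Degree-12 polynomial features, standardized, feeding the given regressor."""
    return make_pipeline(
        PolynomialFeatures(degree=12, include_bias=False), StandardScaler(), regressor
    )


# Regularization: an L2 (ridge) penalty shrinks the polynomial's coefficients
plain = poly_model(LinearRegression()).fit(X, y)
ridge = poly_model(Ridge(alpha=1.0)).fit(X, y)
plain_norm = np.linalg.norm(plain[-1].coef_)
ridge_norm = np.linalg.norm(ridge[-1].coef_)
print(f"coefficient norm: no penalty={plain_norm:.1f}, ridge={ridge_norm:.1f}")

# Cross-validation: judge each variant on held-out folds rather than on its training data
for name, reg in [("no penalty", LinearRegression()), ("ridge", Ridge(alpha=1.0))]:
    scores = cross_val_score(poly_model(reg), X, y, cv=5, scoring="neg_mean_squared_error")
    print(f"{name}: 5-fold CV MSE = {-scores.mean():.4f}")

# Pruning: cost-complexity pruning removes branches that barely reduce the error
full_tree = DecisionTreeRegressor(random_state=0).fit(X, y)
pruned_tree = DecisionTreeRegressor(ccp_alpha=0.005, random_state=0).fit(X, y)
print(f"tree leaves: full={full_tree.get_n_leaves()}, pruned={pruned_tree.get_n_leaves()}")

# Early stopping: stop adding boosting stages once a held-out validation split stops improving
gbr = GradientBoostingRegressor(
    n_estimators=1000, validation_fraction=0.2, n_iter_no_change=10, random_state=0
).fit(X, y)
print(f"boosting stages used: {gbr.n_estimators_} of 1000")
```

The specific values are starting points, not recommendations; in practice `alpha`, `ccp_alpha`, and the patience would themselves be tuned with cross-validation.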
How to Prevent Underfitting
- Use more complex models
- Increase model capacity (more layers, features)
- Reduce regularization
- Improve feature engineering
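A sketch of three of these remedies on hypothetical sine data, assuming scikit-learn: a straight line as the underfitting baseline, polynomial features (degree 5, an illustrative choice) for extra capacity, and the same features under a heavy versus a light ridge penalty.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, Ridge
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

rng = np.random.default_rng(0)

# Hypothetical non-linear data: one sine period plus Gaussian noise (std 0.1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)


def holdout_mse(model):
    """Fit on the training split and return mean squared error on the test split."""
    model.fit(X_train, y_train)
    return mean_squared_error(y_test, model.predict(X_test))


def poly_ridge(alpha):
    """Degree-5 polynomial features, standardized, with a ridge penalty of strength alpha."""
    return make_pipeline(
        PolynomialFeatures(degree=5, include_bias=False), StandardScaler(), Ridge(alpha=alpha)
    )


# Baseline: a straight line is too simple for the curve
line_mse = holdout_mse(LinearRegression())

# More capacity via feature engineering: add polynomial terms up to x**5
poly_mse = holdout_mse(make_pipeline(PolynomialFeatures(degree=5), LinearRegression()))

# Less regularization: the same features under a heavy vs. a light penalty
heavy_mse = holdout_mse(poly_ridge(1000.0))
light_mse = holdout_mse(poly_ridge(0.001))

print(f"straight line:        {line_mse:.4f}")
print(f"degree-5 polynomial:  {poly_mse:.4f}")
print(f"ridge, alpha=1000:    {heavy_mse:.4f}")
print(f"ridge, alpha=0.001:   {light_mse:.4f}")
```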
Conclusion
Overfitting and underfitting are crucial problems in model development. Striking the right balance helps build models that generalize well to unseen data. Monitoring training and validation metrics throughout the training process is a good way to keep these issues in check.
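For boosted models, one way to monitor both metrics stage by stage is `staged_predict` in scikit-learn's `GradientBoostingRegressor`. This is a sketch on hypothetical sine data with an illustrative 300 stages; the stage with the lowest validation error is a reasonable candidate for how many stages to keep.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

# Hypothetical non-linear data: one sine period plus Gaussian noise (std 0.1)
X = rng.uniform(0, 1, size=(200, 1))
y = np.sin(2 * np.pi * X[:, 0]) + rng.normal(0, 0.1, size=200)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.25, random_state=42)

model = GradientBoostingRegressor(n_estimators=300, random_state=0).fit(X_train, y_train)

# staged_predict yields predictions after each boosting stage, so both curves can be tracked
train_curve = [mean_squared_error(y_train, p) for p in model.staged_predict(X_train)]
val_curve = [mean_squared_error(y_val, p) for p in model.staged_predict(X_val)]

best_stage = int(np.argmin(val_curve)) + 1  # stages are counted from 1
print(f"lowest validation MSE {min(val_curve):.4f} after {best_stage} stages")
print(f"final stage: train MSE {train_curve[-1]:.4f}, validation MSE {val_curve[-1]:.4f}")
```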