Overfitting and Underfitting in Models
When training machine learning models, achieving a balance between accuracy and generalization is key. Two common issues that arise during this process are overfitting and underfitting. Understanding these concepts helps build models that perform well on both training and unseen data.
What You'll Learn
- What overfitting and underfitting mean
- How to detect them
- Examples using Python
- Strategies to prevent and correct these problems
What is Overfitting?
Overfitting happens when a model learns the training data too well, including noise and minor fluctuations. It performs well on training data but poorly on new, unseen data.
Symptoms:
- High accuracy (low error) on training data
- Noticeably lower accuracy (higher error) on test/validation data
Visual Example:
A highly complex curve that passes through every training point but fails to predict test data properly.
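The curve described above can be reproduced numerically. This is a minimal sketch with NumPy, assuming a toy dataset of 10 noisy samples from a sine wave (the data, noise level and polynomial degree are illustrative choices, not part of the original example). A degree-9 polynomial has as many coefficients as there are training points, so it can pass through every one of them: training error collapses to nearly zero, while error on fresh points drawn from the same curve stays much larger.

```python
import numpy as np

# Assumed toy setup: 10 noisy samples of a sine curve for training,
# 50 fresh noisy samples from the same curve for testing
rng = np.random.default_rng(0)
x_train = np.linspace(0, 1, 10)
y_train = np.sin(2 * np.pi * x_train) + rng.normal(0, 0.2, size=x_train.size)
x_test = np.linspace(0.05, 0.95, 50)
y_test = np.sin(2 * np.pi * x_test) + rng.normal(0, 0.2, size=x_test.size)


def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


# Degree 9 means 10 coefficients: enough to interpolate all 10 training points,
# including their noise
coeffs = np.polyfit(x_train, y_train, deg=9)
train_mse = mse(y_train, np.polyval(coeffs, x_train))
test_mse = mse(y_test, np.polyval(coeffs, x_test))
print(f"train MSE: {train_mse:.6f}  test MSE: {test_mse:.6f}")
```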
What is Underfitting?
Underfitting occurs when a model is too simple to capture the underlying pattern of the data. It performs poorly on both training and test data.
Symptoms:
- Low training accuracy (high training error)
- Similarly low test accuracy (high test error)
Visual Example:
A straight line trying to fit a non-linear relationship, missing the trend entirely.
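The same kind of sketch shows underfitting, again on assumed toy data: points scattered around the parabola y = x² on [-3, 3]. A straight line (degree 1) cannot bend to follow the trend, so both its training and test errors stay high; a degree-2 fit, which matches the true shape, brings both down to roughly the noise level.

```python
import numpy as np

# Assumed toy data: a quadratic trend y = x**2 plus Gaussian noise (std 0.5)
rng = np.random.default_rng(1)
x_train = rng.uniform(-3, 3, 80)
y_train = x_train ** 2 + rng.normal(0, 0.5, x_train.size)
x_test = rng.uniform(-3, 3, 40)
y_test = x_test ** 2 + rng.normal(0, 0.5, x_test.size)


def mse(y_true, y_pred):
    """Mean squared error between two arrays."""
    return float(np.mean((y_true - y_pred) ** 2))


results = {}
# Degree 1 = straight line (too simple); degree 2 matches the underlying trend
for degree in (1, 2):
    coeffs = np.polyfit(x_train, y_train, deg=degree)
    train_err = mse(y_train, np.polyval(coeffs, x_train))
    test_err = mse(y_test, np.polyval(coeffs, x_test))
    results[degree] = (train_err, test_err)
    print(f"degree {degree}: train MSE {train_err:.2f}, test MSE {test_err:.2f}")
```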
Example in Python
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.model_selection import train_test_split
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
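# Generate a dataset whose target is a linear function of X plus Gaussian noise (std 15)
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Linear model: its form matches the linear relationship in this data
linear_model = LinearRegression()
linear_model.fit(X_train, y_train)
y_pred_linear = linear_model.predict(X_test)
# Very deep tree: flexible enough to memorize noise in the training set (overfitting)
tree_model = DecisionTreeRegressor(max_depth=20)
tree_model.fit(X_train, y_train)
y_pred_tree = tree_model.predict(X_test)
# Evaluation on the held-out test set
print("Linear Regression MSE:", mean_squared_error(y_test, y_pred_linear))
print("Decision Tree MSE (Overfitting):", mean_squared_error(y_test, y_pred_tree))
Output:
Linear Regression MSE: 234.45500969670806
Decision Tree MSE (Overfitting): 500.20712573958053
Because make_regression generates a linear target, linear regression is the appropriate model here rather than an underfitting one: its test MSE is close to the noise variance (15² = 225), which is the error no model can remove. The deep tree's test error is roughly twice as large because it has fit the noise in the training points. Underfitting only appears when the model is too simple for the data, such as a straight line fit to a curved trend.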
How to Detect Overfitting and Underfitting
Training Error | Test Error | Diagnosis
---|---|---
High | High | Underfitting
Low | High | Overfitting
Low | Low | Good fit
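In practice the table is applied by computing the error on both splits, not only on the test set. A minimal sketch, reusing the synthetic data from the example above and sweeping the tree depth (the depths 1, 3 and 20 are arbitrary illustrative choices): training error can only stay the same or shrink as the tree grows deeper, and a large gap between low training error and high test error for the deepest tree is the overfitting pattern from the second row of the table.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

# Same synthetic setup as the earlier example
X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

errors = {}  # depth -> (train MSE, test MSE)
for depth in (1, 3, 20):
    tree = DecisionTreeRegressor(max_depth=depth, random_state=42)
    tree.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, tree.predict(X_train))
    test_mse = mean_squared_error(y_test, tree.predict(X_test))
    errors[depth] = (train_mse, test_mse)
    print(f"max_depth={depth:>2}: train MSE {train_mse:8.1f}, test MSE {test_mse:8.1f}")
```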
How to Prevent Overfitting
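- Use simpler models (for example, limit tree depth or reduce the number of features)
- Apply regularization (L1 or L2 penalties)
- Use cross-validation to tune model complexity and regularization strength
- Prune decision trees
- Collect more training data
- Apply early stopping in iteratively trained models (gradient boosting, neural networks)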
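Several of these remedies can be combined: cross-validation picks how much to restrict the tree, and the restrictions themselves (max_depth, min_samples_leaf) act as pre-pruning. The sketch below uses scikit-learn's GridSearchCV on the same synthetic data as the earlier example; the grid values are assumptions chosen for illustration, not recommended defaults.

```python
from sklearn.datasets import make_regression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.tree import DecisionTreeRegressor

X, y = make_regression(n_samples=100, n_features=1, noise=15, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Candidate limits on tree growth (illustrative values)
param_grid = {"max_depth": [2, 3, 4, 5, 8, 20], "min_samples_leaf": [1, 5, 10]}
search = GridSearchCV(
    DecisionTreeRegressor(random_state=42),
    param_grid,
    cv=5,  # 5-fold cross-validation on the training split only
    scoring="neg_mean_squared_error",  # scikit-learn maximizes scores, so MSE is negated
)
search.fit(X_train, y_train)

best_tree = search.best_estimator_  # refit on the full training split by default
test_mse = mean_squared_error(y_test, best_tree.predict(X_test))
print("best params:", search.best_params_)
print("test MSE:", test_mse)
```

scikit-learn trees also support cost-complexity post-pruning through the ccp_alpha parameter, which can be tuned the same way. For linear models, the analogous knob is the penalty strength, such as alpha in Ridge (L2) or Lasso (L1).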
How to Prevent Underfitting
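- Switch to a more flexible model (for example, a tree ensemble instead of a plain linear model)
- Increase model capacity (more layers in a neural network, more input features)
- Reduce regularization strength
- Improve feature engineering (for example, add polynomial or interaction terms)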
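Feature engineering is often the cheapest fix for underfitting. A minimal sketch, assuming non-linear data generated from y = sin(x) plus noise: a plain LinearRegression underfits, while adding polynomial terms with PolynomialFeatures lets the same linear learner follow the curve. The degree of 5 is an illustrative choice; pushing it much higher would eventually trade underfitting for overfitting.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Assumed non-linear data: y = sin(x) plus Gaussian noise (std 0.1)
rng = np.random.default_rng(0)
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(0, 0.1, size=200)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "linear": LinearRegression(),
    # Expand x into [1, x, x^2, ..., x^5] before the linear fit
    "degree-5 polynomial": make_pipeline(PolynomialFeatures(degree=5), LinearRegression()),
}

results = {}  # name -> (train MSE, test MSE)
for name, model in models.items():
    model.fit(X_train, y_train)
    train_mse = mean_squared_error(y_train, model.predict(X_train))
    test_mse = mean_squared_error(y_test, model.predict(X_test))
    results[name] = (train_mse, test_mse)
    print(f"{name}: train MSE {train_mse:.4f}, test MSE {test_mse:.4f}")
```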
Conclusion
Overfitting and underfitting are two of the most common problems in model development. Striking the right balance helps build models that generalize well to unseen data. Tracking training and validation metrics side by side throughout training is a practical way to catch both issues early.