Decision Trees and Random Forests
Decision Trees and Random Forests are powerful supervised learning algorithms used for both classification and regression tasks. Decision trees in particular are easy to understand, interpret, and visualize, and both methods are popular choices for real-world problems.
What is a Decision Tree?
A Decision Tree is a flowchart-like tree structure where:
- Each internal node represents a test on a feature (attribute), such as "Income <= 70000".
- Each branch represents an outcome of that test (a decision rule).
- Each leaf node represents an output label (class or value).
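At each node, the tree splits the data on the feature and threshold that best separate the target values, measured by criteria such as Gini impurity (often called the Gini index) or information gain (based on entropy). It then repeats this process recursively on each resulting subset.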
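To make the split criteria concrete, here is a minimal plain-Python sketch. It scores two candidate splits on the small purchase dataset used later in this article. The helper names (gini, entropy, split_on, weighted_gini, information_gain) are hypothetical illustrations and are not part of scikit-learn. A lower weighted Gini impurity, or a higher information gain, means a better split.

```python
import math
from collections import Counter

def gini(labels):
    # Gini impurity: 1 minus the sum of squared class proportions
    n = len(labels)
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def entropy(labels):
    # Shannon entropy in bits
    n = len(labels)
    return -sum((count / n) * math.log2(count / n) for count in Counter(labels).values())

def split_on(values, labels, threshold):
    # Hypothetical helper: send rows with value <= threshold left, the rest right
    left = [lab for v, lab in zip(values, labels) if v <= threshold]
    right = [lab for v, lab in zip(values, labels) if v > threshold]
    return left, right

def weighted_gini(left, right):
    # Impurity of a split = size-weighted average impurity of the children
    n = len(left) + len(right)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def information_gain(parent, left, right):
    # Entropy of the parent minus the size-weighted entropy of the children
    n = len(parent)
    children = len(left) / n * entropy(left) + len(right) / n * entropy(right)
    return entropy(parent) - children

age = [25, 30, 45, 35, 22]
income = [50000, 60000, 80000, 120000, 30000]
buy = ['No', 'No', 'Yes', 'Yes', 'No']

print(round(gini(buy), 3))  # 0.48 (3 "No", 2 "Yes")

# Income <= 70000 separates the classes perfectly
left, right = split_on(income, buy, 70000)
print(round(weighted_gini(left, right), 3))          # 0.0
print(round(information_gain(buy, left, right), 3))  # 0.971

# Age <= 27 leaves a mixed right branch, so the impurity stays higher
left2, right2 = split_on(age, buy, 27)
print(round(weighted_gini(left2, right2), 3))        # 0.267
```

A tree-building algorithm evaluates many such candidate thresholds for every feature and keeps the best one at each node.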
Example Use Cases
- Credit scoring
- Medical diagnosis
- Customer segmentation
- Fraud detection
Building a Simple Decision Tree in Python
We’ll classify whether a person will buy a product based on age and income.
from sklearn.tree import DecisionTreeClassifier
import pandas as pd
# Sample dataset
data = {
'Age': [25, 30, 45, 35, 22],
'Income': [50000, 60000, 80000, 120000, 30000],
'Buy': ['No', 'No', 'Yes', 'Yes', 'No']
}
df = pd.DataFrame(data)
# Features and target
X = df[['Age', 'Income']]
y = df['Buy']
# Create and train model (random_state makes tie-breaking between equally good splits reproducible)
model = DecisionTreeClassifier(random_state=42)
model.fit(X, y)
# Predict with column names to match training data
new_data = pd.DataFrame([[28, 55000]], columns=['Age', 'Income'])
print(model.predict(new_data))
Output-
['No']
Visualizing the Tree
from sklearn.tree import plot_tree
import matplotlib.pyplot as plt
plt.figure(figsize=(10, 6))
plot_tree(model, feature_names=['Age', 'Income'], class_names=['No', 'Yes'], filled=True)
plt.show()
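If you want the learned rules as plain text instead of a plot, scikit-learn's export_text function prints the same tree as nested conditions. This sketch retrains the model from the example above so that it runs on its own. The root feature may be either Age or Income, because both separate this tiny dataset perfectly.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

# Same toy dataset as the earlier example
data = {
    'Age': [25, 30, 45, 35, 22],
    'Income': [50000, 60000, 80000, 120000, 30000],
    'Buy': ['No', 'No', 'Yes', 'Yes', 'No'],
}
df = pd.DataFrame(data)
X = df[['Age', 'Income']]
y = df['Buy']

model = DecisionTreeClassifier(random_state=42)
model.fit(X, y)

# Each line is a threshold test; leaves show the predicted class
rules = export_text(model, feature_names=['Age', 'Income'])
print(rules)
```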
What is a Random Forest?
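A Random Forest is an ensemble of decision trees. Each tree is trained on a bootstrap sample of the data (rows drawn with replacement) and considers only a random subset of the features at each split. The forest then averages the trees' predictions for regression or combines their votes for classification. Note that scikit-learn's RandomForestClassifier averages the trees' predicted class probabilities rather than counting hard votes.
Because the individual trees make partly independent errors, combining them reduces variance. This usually reduces overfitting and improves accuracy and generalization compared with a single tree.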
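To show the idea behind the ensemble, here is a minimal hand-rolled bagging sketch. It fits several decision trees on bootstrap samples, with a random feature subset per split via max_features="sqrt", and combines them by hard majority vote as described above. The function names fit_bagged_trees and predict_majority are hypothetical, and the synthetic data comes from make_classification. This is not scikit-learn's RandomForestClassifier implementation, which averages probabilities instead.

```python
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

def fit_bagged_trees(X, y, n_trees=25, seed=0):
    # Hypothetical helper: train n_trees trees, each on its own bootstrap sample
    rng = np.random.default_rng(seed)
    n = len(X)
    trees = []
    for _ in range(n_trees):
        idx = rng.integers(0, n, size=n)  # sample row indices with replacement
        tree = DecisionTreeClassifier(
            max_features="sqrt",  # random subset of features at each split
            random_state=int(rng.integers(1_000_000)),
        )
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_majority(trees, X):
    # Hypothetical helper: each tree votes; the most common label wins per sample
    votes = np.stack([t.predict(X) for t in trees])  # shape (n_trees, n_samples)
    return np.array([Counter(col).most_common(1)[0][0] for col in votes.T])

X, y = make_classification(n_samples=200, n_features=6, random_state=0)
trees = fit_bagged_trees(X, y)
pred = predict_majority(trees, X[:5])
print(pred)
```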
Python Example: Random Forest Classifier
from sklearn.ensemble import RandomForestClassifier
import pandas as pd
# Create and train Random Forest model on the same X and y used for the decision tree
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X, y)
# Prediction with column names to avoid warning
new_data = pd.DataFrame([[28, 55000]], columns=['Age', 'Income'])
print(rf_model.predict(new_data))
Output-
['No']
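You can check the averaging behaviour directly, because a fitted forest exposes its individual trees through the estimators_ attribute. This sketch uses synthetic data from make_classification and make_regression for illustration. It shows that the classifier's predict_proba equals the mean of the trees' probabilities, and that the regressor's prediction equals the mean of the trees' predictions.

```python
import numpy as np
from sklearn.datasets import make_classification, make_regression
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification: forest probabilities = mean of per-tree probabilities
Xc, yc = make_classification(n_samples=200, n_features=5, random_state=0)
clf = RandomForestClassifier(n_estimators=50, random_state=42).fit(Xc, yc)
tree_proba = np.mean([tree.predict_proba(Xc[:3]) for tree in clf.estimators_], axis=0)
print(np.allclose(tree_proba, clf.predict_proba(Xc[:3])))  # True

# Regression: forest prediction = mean of per-tree predictions
Xr, yr = make_regression(n_samples=200, n_features=5, random_state=0)
reg = RandomForestRegressor(n_estimators=50, random_state=42).fit(Xr, yr)
tree_mean = np.mean([tree.predict(Xr[:3]) for tree in reg.estimators_], axis=0)
print(np.allclose(tree_mean, reg.predict(Xr[:3])))  # True
```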
Advantages and Disadvantages
| Decision Trees | Random Forests |
|---|---|
| Easy to interpret and visualize | Usually more accurate than a single tree |
| Can overfit easily | Less prone to overfitting |
| Fast training on small data | Slower and more resource-intensive |
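To see the overfitting row of the table in practice, here is a rough sketch. It compares training and test accuracy for an unrestricted decision tree and a random forest on a noisy synthetic dataset. The dataset settings are arbitrary choices for illustration. An unrestricted tree typically scores perfectly on its training data, and the gap to its test score shows overfitting. Exact numbers will vary with the data.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Noisy synthetic data (flip_y randomly flips about 10% of labels)
X, y = make_classification(n_samples=500, n_features=20, n_informative=5,
                           flip_y=0.1, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# No max_depth limit: the tree keeps splitting until its leaves are pure
tree = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

for name, m in [("Decision tree", tree), ("Random forest", forest)]:
    print(f"{name}: train={m.score(X_train, y_train):.3f} test={m.score(X_test, y_test):.3f}")
```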
Evaluation Metrics
Use the same metrics as other classifiers:
- Accuracy
- Precision, Recall, F1 Score
- Confusion Matrix
from sklearn.metrics import accuracy_score, classification_report
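# Evaluated on the training data, so a perfect score is expected; use a held-out test set to estimate real performance
y_pred = model.predict(X)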
print("Accuracy:", accuracy_score(y, y_pred))
print(classification_report(y, y_pred))
Output -
Accuracy: 1.0
              precision    recall  f1-score   support

          No       1.00      1.00      1.00         3
         Yes       1.00      1.00      1.00         2

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
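A more realistic evaluation holds out data the model never saw during training. This sketch swaps in scikit-learn's bundled Iris dataset, because the five-row example is too small to split. It reports accuracy, a confusion matrix, and a classification report on a stratified 30% test split.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score, classification_report, confusion_matrix
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)

# Stratify so each class keeps its proportion in both splits
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=42
)

model = RandomForestClassifier(n_estimators=100, random_state=42)
model.fit(X_train, y_train)
y_pred = model.predict(X_test)

print("Test accuracy:", accuracy_score(y_test, y_pred))
cm = confusion_matrix(y_test, y_pred)  # rows = true class, columns = predicted class
print(cm)
print(classification_report(y_test, y_pred))
```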
Tips for Beginners
- Decision Trees are great for understanding basic concepts of supervised learning.
- Use them for initial modeling and understanding feature importance.
Tips for Professionals
- Tune hyperparameters such as max_depth, min_samples_split, and n_estimators (Random Forests only) to improve performance.
- Use the feature_importances_ attribute of a fitted RandomForestClassifier (for example, rf_model.feature_importances_) to rank inputs.
- Combine tuning with grid search and cross-validation to choose settings that generalize to new data.
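The tips above can be combined in a short sketch that uses GridSearchCV with 5-fold cross-validation. It tunes the three hyperparameters named above on the bundled Iris dataset, then ranks features by the best model's feature_importances_. The grid values are arbitrary examples, not recommended settings.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

iris = load_iris()
X, y = iris.data, iris.target

# Small illustrative grid: 2 x 2 x 2 = 8 combinations, each scored with 5-fold CV
param_grid = {
    "n_estimators": [50, 100],
    "max_depth": [None, 3],
    "min_samples_split": [2, 5],
}
search = GridSearchCV(RandomForestClassifier(random_state=42), param_grid, cv=5)
search.fit(X, y)
print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))

# best_estimator_ is refit on all the data with the best parameters
best_rf = search.best_estimator_
importances = best_rf.feature_importances_  # impurity-based, sums to 1
for i in np.argsort(importances)[::-1]:
    print(f"{iris.feature_names[i]}: {importances[i]:.3f}")
```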
Summary
- Decision Trees are intuitive and easy to use for both regression and classification.
- Random Forests improve on single trees by training many trees on random subsets of the data and features and averaging their predictions.
- Together, they form a robust part of any machine learning toolkit.