Decision Trees and Random Forests

Decision Trees and Random Forests are powerful supervised learning algorithms used for both classification and regression tasks. Decision Trees are easy to understand, interpret, and visualize, and Random Forests build on them for stronger predictive performance, which makes both popular choices for real-world problems.


What is a Decision Tree?

A Decision Tree is a flowchart-like tree structure where:

  • Each internal node represents a test on a feature (attribute).
  • Each branch represents an outcome of that test (a decision rule).
  • Each leaf node represents an output label (class or value).

The tree splits the data recursively, at each node choosing the feature (and threshold) that gives the best separation according to a metric such as the Gini index or Information Gain (based on entropy).
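To make these metrics concrete, here is a minimal sketch, using only NumPy, that scores one candidate split (Age <= 30) on the five-row sample dataset used in the example below; the threshold and data are purely illustrative.

import numpy as np

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions (0 = pure node)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # Entropy: -sum(p * log2(p)) over class proportions (0 = pure node)
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

# Candidate split "Age <= 30" on labels ['No', 'No', 'Yes', 'Yes', 'No']
left = np.array(['No', 'No', 'No'])   # rows with Age <= 30
right = np.array(['Yes', 'Yes'])      # rows with Age > 30
parent = np.concatenate([left, right])
n = len(parent)

# Weighted impurity after the split (lower is better) and information gain
split_gini = len(left) / n * gini(left) + len(right) / n * gini(right)
info_gain = entropy(parent) - (len(left) / n * entropy(left)
                               + len(right) / n * entropy(right))

print(f"Gini after split: {split_gini:.3f}")  # 0.000 -- the split is pure
print(f"Information gain: {info_gain:.3f}")   # 0.971 -- all parent entropy removed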


Example Use Cases

  • Credit scoring
  • Medical diagnosis
  • Customer segmentation
  • Fraud detection

Building a Simple Decision Tree in Python

We’ll classify whether a person will buy a product based on age and income.

from sklearn.tree import DecisionTreeClassifier
import pandas as pd

# Sample dataset
data = {
    'Age': [25, 30, 45, 35, 22],
    'Income': [50000, 60000, 80000, 120000, 30000],
    'Buy': ['No', 'No', 'Yes', 'Yes', 'No']
}

df = pd.DataFrame(data)

# Feature and target
X = df[['Age', 'Income']]
y = df['Buy']

# Create and train model
model = DecisionTreeClassifier()
model.fit(X, y)

# Predict with column names to match training data
new_data = pd.DataFrame([[28, 55000]], columns=['Age', 'Income'])
print(model.predict(new_data))

Output:

['No']

Visualizing the Tree

from sklearn.tree import plot_tree
import matplotlib.pyplot as plt

# Render the fitted tree: each node shows its split rule, impurity, and class counts
plt.figure(figsize=(10, 6))
plot_tree(model, feature_names=['Age', 'Income'], class_names=['No', 'Yes'], filled=True)
plt.show()
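If you prefer a text rendering of the same tree (for a terminal or a log file), scikit-learn's export_text prints it as indented decision rules; a small sketch:

from sklearn.tree import export_text

# Print the fitted tree as indented if/else-style rules
print(export_text(model, feature_names=['Age', 'Income']))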

What is a Random Forest?

A Random Forest is an ensemble of decision trees. It builds many trees, each trained on a bootstrap sample of the data and considering a random subset of features at each split, then aggregates their predictions: averaging for regression, majority voting for classification.

This reduces overfitting and improves accuracy and generalization.
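To see what majority voting means mechanically, here is a minimal, purely illustrative sketch that builds a tiny "forest" by hand: each tree is trained on a bootstrap sample (rows drawn with replacement), and the final prediction is the most common vote. RandomForestClassifier, used below, does this internally and also randomizes the features considered at each split.

import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(42)
hand_built_trees = []

# Train each tree on a bootstrap sample of the (X, y) defined earlier
for _ in range(5):
    idx = rng.integers(0, len(X), size=len(X))
    tree = DecisionTreeClassifier().fit(X.iloc[idx], y.iloc[idx])
    hand_built_trees.append(tree)

# Majority vote across the trees for the new example defined earlier
votes = [tree.predict(new_data)[0] for tree in hand_built_trees]
print(votes, '->', max(set(votes), key=votes.count))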

Python Example: Random Forest Classifier

from sklearn.ensemble import RandomForestClassifier

# Create and train a Random Forest on the same X and y as before
rf_model = RandomForestClassifier(n_estimators=100, random_state=42)
rf_model.fit(X, y)

# Prediction with column names to avoid warning
new_data = pd.DataFrame([[28, 55000]], columns=['Age', 'Income'])
print(rf_model.predict(new_data))

Output:

['No']

Advantages and Disadvantages

Decision Trees:

  • Easy to interpret and visualize
  • Can overfit easily
  • Fast training on small data

Random Forests:

  • More accurate than single trees
  • Resistant to overfitting
  • Slower and more resource-intensive

Evaluation Metrics

Use the same metrics as other classifiers:

  • Accuracy
  • Precision, Recall, F1 Score
  • Confusion Matrix

Note that the snippet below scores the tree on its own training data, so an unpruned decision tree will look perfect; treat it as a sanity check rather than an estimate of real-world performance.

from sklearn.metrics import accuracy_score, classification_report

# Evaluate on the training set (see the caveat above)
y_pred = model.predict(X)
print("Accuracy:", accuracy_score(y, y_pred))
print(classification_report(y, y_pred))

Output:

Accuracy: 1.0
              precision    recall  f1-score   support

          No       1.00      1.00      1.00         3
         Yes       1.00      1.00      1.00         2

    accuracy                           1.00         5
   macro avg       1.00      1.00      1.00         5
weighted avg       1.00      1.00      1.00         5
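
For an honest performance estimate, evaluate on data the model never saw during training. A minimal sketch of the usual pattern (the five-row dataset here is far too small for the numbers to be meaningful, but the pattern carries over):

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

# Hold out part of the data purely for evaluation
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.4, random_state=42
)

held_out_model = DecisionTreeClassifier().fit(X_train, y_train)
print("Held-out accuracy:", accuracy_score(y_test, held_out_model.predict(X_test)))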

Tips for Beginners

  • Decision Trees are great for understanding basic concepts of supervised learning.
  • Use them for initial modeling and understanding feature importance.

Tips for Professionals

  • Tune hyperparameters like max_depth, min_samples_split, and n_estimators to improve performance.
  • Use RandomForestClassifier with feature importance (model.feature_importances_) to rank inputs.
  • Combine with grid search or cross-validation for better models; a combined sketch follows this list.
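
As a minimal sketch tying these tips together, the snippet below runs a small cross-validated grid search over the hyperparameters named above and then prints the winning forest's feature importances. The grid values and cv=2 are placeholders chosen for the toy dataset; adapt them to your data.

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

# Candidate values are illustrative; adjust the ranges for your dataset
param_grid = {
    'n_estimators': [50, 100, 200],
    'max_depth': [None, 3, 5],
    'min_samples_split': [2, 4],
}

search = GridSearchCV(
    RandomForestClassifier(random_state=42),
    param_grid,
    cv=2,          # tiny fold count for this toy dataset; 5 is a common default
    scoring='accuracy',
)
search.fit(X, y)

print("Best parameters:", search.best_params_)

# Rank features by the fitted forest's impurity-based importances
best_rf = search.best_estimator_
for name, score in zip(X.columns, best_rf.feature_importances_):
    print(f"{name}: {score:.3f}")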

Summary

  • Decision Trees are intuitive and easy to use for both regression and classification.
  • Random Forests improve on single trees by building many of them and aggregating their predictions: majority voting for classification, averaging for regression.
  • Together, they form a robust part of any machine learning toolkit.