Association Rule Learning (Apriori, FP-Growth)
Association Rule Learning is an unsupervised learning method used to discover interesting relationships, patterns, or correlations among a set of items in large datasets. It is commonly used in market basket analysis to identify products frequently bought together.
Two popular algorithms for association rule mining are:
- Apriori
- FP-Growth (Frequent Pattern Growth)
These algorithms help generate frequent itemsets and association rules from transaction data.
Key Concepts
Term | Description |
---|---|
Itemset | A group of one or more items |
Support | Fraction of transactions in the dataset that contain the itemset |
Confidence | Of the transactions that contain A, the fraction that also contain B: support(A and B) / support(A) |
Lift | Confidence of A → B divided by support(B); values above 1 mean A and B occur together more often than expected by chance |
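To make the three metrics concrete, here is a minimal pure-Python sketch that computes them directly from their definitions. The toy baskets and the helper names (support, confidence, lift) are assumptions chosen for illustration, not part of any library.

```python
# Toy baskets, assumed for illustration only.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """support(A and B) / support(A) for the rule A -> B."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """confidence(A -> B) / support(B); above 1 means a positive association."""
    return confidence(antecedent, consequent, transactions) / support(consequent, transactions)


# Rule {milk} -> {butter}: 2 of the 3 milk baskets also contain butter.
print(support({"milk", "butter"}, transactions))
print(confidence({"milk"}, {"butter"}, transactions))
print(lift({"milk"}, {"butter"}, transactions))
```

Here the lift comes out slightly above 1, so milk and butter appear together a little more often than their individual frequencies would suggest.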
Apriori Algorithm
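Apriori uses a bottom-up approach: it generates candidate itemsets of increasing size and prunes those that fall below a minimum support. The pruning relies on the fact that every subset of a frequent itemset must itself be frequent. It is iterative and simple, but it rescans the data for each itemset size and can be slow on large datasets.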
How Apriori Works:
- Identify all frequent itemsets (itemsets whose support meets the minimum threshold)
- Generate association rules from these itemsets
- Prune rules based on confidence and lift thresholds
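The steps above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not an optimized implementation: the function names apriori_sketch and rules_sketch are hypothetical, and support is recomputed by scanning all transactions for each candidate.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    def keep_frequent(candidates):
        scored = {c: support(c) for c in candidates}
        return {c: s for c, s in scored.items() if s >= min_support}

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    current = keep_frequent({frozenset([i]) for i in items})
    frequent = dict(current)

    k = 2
    while current:
        prev = set(current)
        # Join: combine frequent (k-1)-itemsets into candidate k-itemsets.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates that have an infrequent (k-1)-subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in prev for s in combinations(c, k - 1))
        }
        current = keep_frequent(candidates)
        frequent.update(current)
        k += 1
    return frequent


def rules_sketch(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) for rules meeting min_confidence."""
    rules = []
    for itemset, supp in frequent.items():
        for r in range(1, len(itemset)):
            for antecedent in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is frequent, so the lookup succeeds.
                conf = supp / frequent[antecedent]
                if conf >= min_confidence:
                    rules.append((antecedent, itemset - antecedent, conf))
    return rules


baskets = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "bread", "butter", "jam"],
    ["bread", "jam"],
]
frequent = apriori_sketch(baskets, min_support=0.6)
rules = rules_sketch(frequent, min_confidence=0.7)
```

On these toy baskets the candidate {bread, butter, milk} is pruned without being counted, because its subset {butter, milk} is not frequent.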
FP-Growth Algorithm
FP-Growth avoids candidate generation and instead compresses the transactions into a compact FP-tree, from which frequent itemsets are mined directly. It is typically faster than Apriori, especially on large datasets.
How FP-Growth Works:
- Scan the data once to count item frequencies, then a second time to construct the FP-Tree from the frequent items
- Mine frequent patterns from the tree recursively
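The two steps above might look like the following in plain Python. This is a compact sketch, assuming transactions are passed as (items, count) pairs and that the threshold is an absolute count rather than a fraction; FPNode, build_fp_tree, and fp_growth_sketch are hypothetical names, not a library API.

```python
from collections import defaultdict


class FPNode:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(weighted_transactions, min_count):
    """Build an FP-tree from (items, count) pairs; return (root, header table)."""
    # Pass 1: count items and keep only the frequent ones.
    counts = defaultdict(int)
    for items, count in weighted_transactions:
        for item in set(items):
            counts[item] += count
    frequent = {i: c for i, c in counts.items() if c >= min_count}

    # Pass 2: insert each transaction with items in descending frequency order,
    # so that common prefixes share nodes.
    root = FPNode(None, None)
    header = defaultdict(list)  # item -> all tree nodes holding that item
    for items, count in weighted_transactions:
        ordered = sorted((i for i in set(items) if i in frequent),
                         key=lambda i: (-frequent[i], i))
        node = root
        for item in ordered:
            if item not in node.children:
                node.children[item] = FPNode(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return root, header


def fp_growth_sketch(weighted_transactions, min_count, suffix=frozenset()):
    """Yield (itemset, count) for every itemset occurring at least min_count times."""
    _, header = build_fp_tree(weighted_transactions, min_count)
    for item, nodes in header.items():
        itemset = suffix | {item}
        yield itemset, sum(n.count for n in nodes)
        # Conditional pattern base: the prefix path above each node of `item`.
        conditional = []
        for n in nodes:
            path, parent = [], n.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                conditional.append((path, n.count))
        # Recurse on the conditional base to extend the current itemset.
        yield from fp_growth_sketch(conditional, min_count, itemset)


baskets = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "bread", "butter", "jam"],
    ["bread", "jam"],
]
# min_count=3 corresponds to a minimum support of 0.6 on five baskets.
patterns = dict(fp_growth_sketch([(b, 1) for b in baskets], min_count=3))
```

No candidate itemsets are generated and tested here; each pattern is read off the tree and its conditional sub-trees.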
Python Example: Market Basket Analysis
We'll use the mlxtend library, which provides both Apriori (apriori) and FP-Growth (fpgrowth) in mlxtend.frequent_patterns. The sample below uses apriori.
Install Dependencies
pip install mlxtend
Sample Code
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Sample transaction data: each inner list is one basket
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam']
]

# Convert to a one-hot encoded DataFrame (one boolean column per item)
te = TransactionEncoder()
te_data = te.fit_transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)

# Apply Apriori: keep itemsets present in at least 60% of transactions
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

# Generate rules with a confidence of at least 0.7
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])
Output
Frequent Itemsets:
   support         itemsets
0      1.0          (bread)
1      0.6         (butter)
2      0.6           (milk)
3      0.6  (butter, bread)
4      0.6    (milk, bread)
Association Rules:
  antecedents consequents  support  confidence  lift
0    (butter)     (bread)      0.6         1.0   1.0
1      (milk)     (bread)      0.6         1.0   1.0
Output Explanation
- Frequent Itemsets: Shows item combinations that meet the minimum support.
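- Association Rules: Displays the rules that meet the confidence threshold, here {butter} → {bread} and {milk} → {bread}, with metrics such as confidence and lift. Both rules have a lift of 1.0 because bread appears in every transaction, so buying butter or milk does not make bread any more likely than it already is.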
Applications of Association Rule Learning
- Market Basket Analysis
- Recommendation Systems
- Website Navigation Patterns
- Fraud Detection
- Medical Diagnosis (symptom-disease patterns)
Advantages
- Easy to implement and understand
- Generates interpretable rules
- Works well for retail, e-commerce, and log data
Limitations
- Apriori is computationally expensive for large datasets
- Can generate too many rules unless they are pruned
- Doesn’t consider temporal or causal relationships
Tips for Beginners
- Use min_support and min_confidence thresholds to filter meaningful rules (in mlxtend, the confidence cutoff is passed as min_threshold with metric="confidence")
- Visualize itemset frequencies using bar charts or heatmaps
- Start with small datasets to understand algorithm behavior
Tips for Professionals
- Use FP-Growth over Apriori for scalability
- Apply rule pruning techniques to reduce noisy or redundant rules
- Combine with user segmentation for personalized recommendations
- Use Lift to identify truly interesting rules beyond random co-occurrence
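As a small illustration of the lift tip, the sketch below scores every item pair on a toy dataset and keeps only the pairs with lift above 1. The baskets and variable names are assumptions for illustration. Lift is computed from raw counts to avoid floating-point noise around the value 1.

```python
from itertools import combinations

# Toy baskets, assumed for illustration only.
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "jam"},
]

n = len(transactions)
items = sorted({item for t in transactions for item in t})
count = {i: sum(i in t for t in transactions) for i in items}

interesting = []
for a, b in combinations(items, 2):
    n_ab = sum(a in t and b in t for t in transactions)
    # lift = support(A and B) / (support(A) * support(B)), expressed in counts
    lift = n_ab * n / (count[a] * count[b])
    if lift > 1:
        interesting.append((a, b, round(lift, 3)))

print(interesting)  # [('butter', 'milk', 1.111)]
```

Every pair involving bread has a lift of exactly 1 because bread is in all baskets, so those pairs are filtered out even though their support and confidence are high.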
Summary
Association Rule Learning is a powerful method for discovering item relationships in transaction data. Apriori is simple and suitable for smaller datasets, while FP-Growth is more efficient for larger volumes. Both help uncover valuable insights in retail, finance, and web usage patterns.