Association Rule Learning (Apriori, FP-Growth)

Association Rule Learning is an unsupervised learning method used to discover interesting relationships, patterns, or correlations among a set of items in large datasets. It is commonly used in market basket analysis to identify products frequently bought together.

Two popular algorithms for association rule mining are:

  • Apriori
  • FP-Growth (Frequent Pattern Growth)

These algorithms help generate frequent itemsets and association rules from transaction data.


Key Concepts

Term          Description
Itemset       A group of one or more items
Support       How frequently the itemset appears in the dataset
Confidence    The probability that item B is purchased when item A is purchased
Lift          How much more likely B is purchased with A, compared to chance alone
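
To make these metrics concrete, here is a minimal sketch that computes support, confidence, and lift by hand for the rule {milk} → {bread}; the transactions are the same toy basket data used in the worked example later in this section.

# Minimal sketch: computing support, confidence, and lift by hand
# for the rule {milk} -> {bread} over a toy transaction list.
transactions = [
    {'milk', 'bread', 'butter'},
    {'bread', 'butter'},
    {'milk', 'bread'},
    {'milk', 'bread', 'butter', 'jam'},
    {'bread', 'jam'},
]
n = len(transactions)

def support(itemset):
    # Fraction of transactions that contain every item in the itemset
    return sum(itemset <= t for t in transactions) / n

support_a = support({'milk'})            # 3/5 = 0.6
support_b = support({'bread'})           # 5/5 = 1.0
support_ab = support({'milk', 'bread'})  # 3/5 = 0.6

confidence = support_ab / support_a      # P(bread | milk) = 1.0
lift = confidence / support_b            # 1.0 -> no better than chance

print(f"support={support_ab:.2f}, confidence={confidence:.2f}, lift={lift:.2f}")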

Apriori Algorithm

Apriori uses a bottom-up approach by generating candidate itemsets and pruning them based on minimum support. It is iterative and simple but can be slow on large datasets.

How Apriori Works:

  1. Identify all frequent itemsets (itemsets whose support meets the minimum threshold), growing them one item at a time
  2. Generate association rules from these itemsets
  3. Prune rules based on confidence and lift thresholds
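
To illustrate the generate-and-prune loop in step 1, here is a simplified, unoptimized sketch in plain Python (rule generation, steps 2 and 3, is left to the library example below). It assumes transactions are Python sets and is meant for illustration rather than production use.

# Simplified Apriori sketch: grow frequent itemsets level by level,
# pruning candidates below min_support. Illustrative only; use a
# library such as mlxtend for real workloads.
def apriori_sketch(transactions, min_support=0.6):
    n = len(transactions)

    def support(s):
        return sum(s <= t for t in transactions) / n

    # Level 1: frequent single items
    frequent = {frozenset([i]) for t in transactions for i in t}
    frequent = {s for s in frequent if support(s) >= min_support}

    all_frequent = {}
    k = 1
    while frequent:
        all_frequent.update({s: support(s) for s in frequent})
        # Join frequent k-itemsets to form (k+1)-item candidates
        candidates = {a | b for a in frequent for b in frequent if len(a | b) == k + 1}
        # Prune candidates that fall below the support threshold
        frequent = {c for c in candidates if support(c) >= min_support}
        k += 1
    return all_frequent

transactions = [
    {'milk', 'bread', 'butter'},
    {'bread', 'butter'},
    {'milk', 'bread'},
    {'milk', 'bread', 'butter', 'jam'},
    {'bread', 'jam'},
]
for itemset, sup in apriori_sketch(transactions).items():
    print(set(itemset), round(sup, 2))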

FP-Growth Algorithm

FP-Growth avoids explicit candidate generation and instead compresses the transactions into a compact FP-Tree, from which frequent itemsets are mined. It is usually much faster than Apriori, especially on large datasets.

How FP-Growth Works:

  1. Construct an FP-Tree in two scans of the data (one to count item frequencies, one to build the tree)
  2. Mine frequent patterns from the tree recursively using conditional FP-Trees
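
A full FP-Tree implementation is beyond this section, but the short sketch below (illustrative only, not mlxtend's internals) shows the preprocessing that makes the tree compact: count item frequencies, drop infrequent items, and reorder each transaction by descending frequency so that transactions share common prefixes when inserted into the tree.

from collections import Counter

# FP-Growth preprocessing sketch: count item frequencies, drop
# infrequent items, and sort each transaction by descending count
# so transactions share prefixes in the FP-Tree. Illustrative only.
transactions = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam'],
]
min_count = 3  # equivalent to min_support = 0.6 over 5 transactions

counts = Counter(item for t in transactions for item in t)
frequent = {item for item, c in counts.items() if c >= min_count}

ordered = [
    sorted((i for i in t if i in frequent), key=lambda i: -counts[i])
    for t in transactions
]
print(ordered)  # e.g. [['bread', 'milk', 'butter'], ['bread', 'butter'], ...]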

Python Example: Market Basket Analysis

We'll use the mlxtend library for Apriori and FP-Growth.

Install Dependencies

pip install mlxtend

Sample Code

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
from mlxtend.preprocessing import TransactionEncoder

# Sample transaction data
dataset = [
    ['milk', 'bread', 'butter'],
    ['bread', 'butter'],
    ['milk', 'bread'],
    ['milk', 'bread', 'butter', 'jam'],
    ['bread', 'jam']
]

# Convert to a one-hot encoded DataFrame
te = TransactionEncoder()
te_data = te.fit_transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)

# Apply Apriori
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)

# Generate rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)

print("Frequent Itemsets:")
print(frequent_itemsets)

print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Output:

Frequent Itemsets:
   support         itemsets
0      1.0          (bread)
1      0.6         (butter)
2      0.6           (milk)
3      0.6  (butter, bread)
4      0.6    (milk, bread)

Association Rules:
  antecedents consequents  support  confidence  lift
0    (butter)     (bread)      0.6         1.0   1.0
1      (milk)     (bread)      0.6         1.0   1.0

Output Explanation

  • Frequent Itemsets: Shows item combinations that meet the minimum support.
  • Association Rules: Displays rules like {butter} → {bread} with metrics such as support, confidence, and lift.
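
mlxtend also provides an FP-Growth implementation with the same interface as apriori, so it can be used as a drop-in replacement in the example above. At the same min_support it should return the same frequent itemsets; the speed difference only becomes noticeable on larger datasets.

from mlxtend.frequent_patterns import fpgrowth

# Drop-in replacement for the apriori call above: same input
# DataFrame, same min_support, same output columns.
frequent_itemsets_fp = fpgrowth(df, min_support=0.6, use_colnames=True)
print(frequent_itemsets_fp)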

Applications of Association Rule Learning

  • Market Basket Analysis
  • Recommendation Systems
  • Website Navigation Patterns
  • Fraud Detection
  • Medical Diagnosis (symptom-disease patterns)

Advantages

  • Easy to implement and understand
  • Generates interpretable rules
  • Works well for retail, e-commerce, and log data

Limitations

  • Apriori is computationally expensive for large datasets
  • Can generate an overwhelming number of rules unless support and confidence thresholds are used to prune them
  • Doesn’t consider temporal or causal relationships

Tips for Beginners

  • Use min_support and min_confidence thresholds to filter meaningful rules
  • Visualize itemset frequencies using bar charts or heatmaps (see the chart sketch after this list)
  • Start with small datasets to understand algorithm behavior
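
For example, a quick bar chart of the supports from the earlier example could look like the sketch below (it assumes matplotlib is installed and frequent_itemsets is still in scope):

import matplotlib.pyplot as plt

# Bar chart of itemset supports from the earlier example.
# Assumes matplotlib is installed and frequent_itemsets is in scope.
labels = frequent_itemsets['itemsets'].apply(lambda s: ', '.join(sorted(s)))
plt.bar(labels, frequent_itemsets['support'])
plt.ylabel('Support')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()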

Tips for Professionals

  • Use FP-Growth over Apriori for scalability
  • Apply rule pruning techniques to reduce noisy or redundant rules (see the filtering sketch after this list)
  • Combine with user segmentation for personalized recommendations
  • Use Lift to identify truly interesting rules beyond random co-occurrence
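
As one example of pruning, the snippet below filters the rules DataFrame from the earlier example, keeping only rules with lift above 1 and high confidence; the thresholds are illustrative and should be tuned per dataset.

# Keep only rules that beat random co-occurrence (lift > 1) and are
# reasonably reliable (confidence >= 0.8); thresholds are illustrative.
pruned = rules[(rules['lift'] > 1.0) & (rules['confidence'] >= 0.8)]
pruned = pruned.sort_values('lift', ascending=False)
print(pruned[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

On the toy data above, both rules have a lift of exactly 1.0, so they would be filtered out, which is the point: they occur together no more often than chance would predict.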

Summary

Association Rule Learning is a powerful method for discovering item relationships in transaction data. Apriori is simple and suitable for smaller datasets, while FP-Growth is more efficient for larger volumes. Both help uncover valuable insights in retail, finance, and web usage patterns.