Association Rule Learning is an unsupervised learning method used to discover interesting relationships, patterns, or correlations among a set of items in large datasets. It is commonly used in market basket analysis to identify products frequently bought together.
Two popular algorithms for association rule mining are Apriori and FP-Growth.
These algorithms help generate frequent itemsets and association rules from transaction data.
| Term | Description |
|---|---|
| Itemset | A group of one or more items |
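| Support | Fraction of transactions in the dataset that contain the itemset |
| Confidence | Estimated probability that B is purchased given that A is purchased: support(A ∪ B) / support(A) |
| Lift | Confidence of A → B divided by the support of B; values above 1 mean A and B occur together more often than expected if they were independent |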
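To make the definitions in the table concrete, the following sketch computes the three metrics in plain Python. The helper names (`support`, `confidence`, `lift`) are illustrative and not part of any library; the baskets are the five sample transactions used in this tutorial's mlxtend example.

```python
# Sample baskets (same five transactions as the mlxtend example).
transactions = [
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "bread"},
    {"milk", "bread", "butter", "jam"},
    {"bread", "jam"},
]


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(antecedent, consequent, transactions):
    """Estimate of P(consequent | antecedent) from the transactions."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)


def lift(antecedent, consequent, transactions):
    """Confidence of the rule relative to the consequent's base rate."""
    return confidence(antecedent, consequent, transactions) / support(
        consequent, transactions
    )


# Rule {milk} -> {butter}
print("support:   ", round(support({"milk", "butter"}, transactions), 3))
print("confidence:", round(confidence({"milk"}, {"butter"}, transactions), 3))
print("lift:      ", round(lift({"milk"}, {"butter"}, transactions), 3))
```

A lift close to 1 means the antecedent tells you little about the consequent beyond its overall popularity.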
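Apriori works bottom-up: it generates candidate itemsets of increasing size and prunes those that fall below the minimum support. The pruning relies on the fact that every subset of a frequent itemset must itself be frequent. The algorithm is iterative and simple, but it rescans the data for each itemset size and can produce many candidates, so it can be slow on large datasets.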
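The generate-and-prune loop can be sketched in plain Python. This is a simplified illustration under stated assumptions, not how mlxtend implements Apriori; the function name `apriori_sketch` and the `baskets` variable are made up for the example.

```python
from itertools import combinations


def apriori_sketch(transactions, min_support):
    """Return {frozenset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def support(itemset):
        return sum(itemset <= t for t in transactions) / n

    # Level 1: keep the single items that are frequent on their own.
    items = sorted({item for t in transactions for item in t})
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent = dict(current)
    k = 2
    while current:
        # Join: merge pairs of frequent (k-1)-itemsets into k-item candidates.
        candidates = {
            a | b for a, b in combinations(list(current), 2) if len(a | b) == k
        }
        # Prune: discard candidates that have an infrequent (k-1)-subset.
        candidates = {
            c
            for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        # Count: scan the transactions and keep candidates with enough support.
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1
    return frequent


baskets = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "bread", "butter", "jam"],
    ["bread", "jam"],
]
result = apriori_sketch(baskets, min_support=0.6)
for itemset in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(itemset), result[itemset])
```

With a minimum support of 0.6, the single items bread, butter, and milk survive, along with the pairs {bread, butter} and {bread, milk}. The candidate {bread, butter, milk} is pruned without being counted because its subset {butter, milk} is not frequent.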
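FP-Growth avoids candidate generation. It scans the data twice to build a compact prefix tree (the FP-tree) in which transactions that share frequent items share nodes, then mines frequent itemsets from that tree. It is typically faster and more memory-efficient than Apriori, especially on large datasets.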
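As a rough illustration of why the FP-tree is compact, the sketch below covers only the tree-construction phase of FP-Growth, not the recursive mining step that follows it. The names `FPNode`, `build_fp_tree`, and `count_nodes` are hypothetical helpers written for this example, and ties in item frequency are broken alphabetically as an arbitrary choice.

```python
from collections import Counter


class FPNode:
    """One node of the prefix tree: an item plus how many baskets pass through it."""

    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # Pass 1: count how many transactions contain each item.
    counts = Counter(item for t in transactions for item in set(t))
    frequent = {item: c for item, c in counts.items() if c >= min_count}

    root = FPNode(None)
    # Pass 2: insert each transaction with its frequent items ordered by
    # descending frequency, so common prefixes share nodes.
    for t in transactions:
        ordered = sorted(
            (item for item in set(t) if item in frequent),
            key=lambda item: (-frequent[item], item),
        )
        node = root
        for item in ordered:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, parent=node)
                node.children[item] = child
            child.count += 1
            node = child
    return root


def count_nodes(node):
    """Number of item nodes below `node` (the root itself is not counted)."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def show(node, depth=0):
    """Print the tree with indentation reflecting depth."""
    for child in node.children.values():
        print("  " * depth + f"{child.item}: {child.count}")
        show(child, depth + 1)


baskets = [
    ["milk", "bread", "butter"],
    ["bread", "butter"],
    ["milk", "bread"],
    ["milk", "bread", "butter", "jam"],
    ["bread", "jam"],
]
tree = build_fp_tree(baskets, min_count=3)
show(tree)
print("nodes:", count_nodes(tree))
```

For these five baskets with a minimum count of 3, the 11 occurrences of frequent items collapse into 4 tree nodes, because every basket shares the bread prefix.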
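We'll use the mlxtend library, which implements both algorithms in `mlxtend.frequent_patterns`. The example below uses `apriori`; `fpgrowth` from the same module accepts the same `df`, `min_support`, and `use_colnames` arguments.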
pip install mlxtend

import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Sample transaction data
dataset = [
['milk', 'bread', 'butter'],
['bread', 'butter'],
['milk', 'bread'],
['milk', 'bread', 'butter', 'jam'],
['bread', 'jam']
]
# Convert to one-hot encoded DataFrame
from mlxtend.preprocessing import TransactionEncoder
te = TransactionEncoder()
te_data = te.fit_transform(dataset)
df = pd.DataFrame(te_data, columns=te.columns_)
# Apply Apriori
frequent_itemsets = apriori(df, min_support=0.6, use_colnames=True)
# Generate rules
rules = association_rules(frequent_itemsets, metric="confidence", min_threshold=0.7)
print("Frequent Itemsets:")
print(frequent_itemsets)
print("\nAssociation Rules:")
print(rules[['antecedents', 'consequents', 'support', 'confidence', 'lift']])

Output:
Frequent Itemsets:
support itemsets
0 1.0 (bread)
1 0.6 (butter)
2 0.6 (milk)
3 0.6 (butter, bread)
4 0.6 (milk, bread)
Association Rules:
antecedents consequents support confidence lift
0 (butter) (bread) 0.6 1.0 1.0
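1 (milk) (bread) 0.6 1.0 1.0

Both rules have a lift of 1.0 because bread appears in every transaction, so buying butter or milk does not change the likelihood of buying bread.

Rules are written in the form {bread} → {butter} and evaluated with metrics such as confidence and lift.

Use min_support and min_confidence thresholds to filter meaningful rules (in mlxtend, min_confidence corresponds to min_threshold with metric="confidence").

Association Rule Learning is a powerful method for discovering item relationships in transaction data. Apriori is simple and suitable for smaller datasets, while FP-Growth is more efficient for larger volumes. Both help uncover valuable insights in retail, finance, and web usage patterns.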