Lecture 4: Association
Market Basket Analysis
Analysis of Customer Behavior and Service Modeling
What Is Association Mining?
Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.
Applications:
– Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.
Examples:
– Rule form: “Body → Head [support, confidence]”
• buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
• major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]
Support and Confidence
Support
– Percentage of samples that contain both A and B
– support(A → B) = P(A ∩ B)
Confidence
– Percentage of samples containing A that also contain B
– confidence(A → B) = P(B|A)
Example
– computer → financial_management_software [support = 2%, confidence = 60%]
Association Rules: Basic Concepts
Given: (1) database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)
Find: all rules that correlate the presence of one set of items with that of another set of items
– e.g., 98% of people who purchase tires and auto accessories also get automotive services done
Applications
– Home Electronics – What other products should the store stock up on?
– Retailing – Shelf design, promotion structuring, direct marketing
Rule Measures: Support and Confidence
Find all the rules A → C with minimum confidence and support
– Support (s): probability that a transaction contains {A & C}
– Confidence (c): conditional probability that a transaction having {A} also contains {C}
Transaction ID    Items Bought
2000              A, B, C
1000              A, C
4000              A, D
5000              B, E, F
With minimum support 50% and minimum confidence 50%, we have
– A → C (50%, 66.7%)
– C → A (50%, 100%)
[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]
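The support and confidence definitions can be checked directly on the four-transaction table. The following is a minimal Python sketch, not part of the lecture; the function names `support` and `confidence` are chosen here for illustration.

```python
# Transactions from the lecture's four-row example table.
transactions = [
    {"A", "B", "C"},   # ID 2000
    {"A", "C"},        # ID 1000
    {"A", "D"},        # ID 4000
    {"B", "E", "F"},   # ID 5000
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """Estimate P(rhs | lhs) as support(lhs and rhs together) / support(lhs)."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)

for lhs, rhs in [("A", "C"), ("C", "A")]:
    s = support({lhs, rhs}, transactions)
    c = confidence({lhs}, {rhs}, transactions)
    print(f"{lhs} -> {rhs}: support = {s:.1%}, confidence = {c:.1%}")
```

Both rules share the same support because support depends only on the combined itemset {A, C}; the confidences differ because A appears in three transactions while C appears in only two.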
Mining Association Rules: An Example
Target: min. support 50%, min. confidence 50%
Transaction ID    Items Bought
2000              A, B, C
1000              A, C
4000              A, D
5000              B, E, F
Frequent Itemset    Support
{A}                 75%
{B}                 50%
{C}                 50%
{A, C}              50%
For rule A → C:
– support = support({A, C}) = 50%
– confidence = support({A, C}) / support({A}) = 66.7%
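The frequent itemsets of the four-transaction example can be reproduced by brute force. The sketch below is an illustration only: it enumerates every candidate itemset, which is exponential in the number of items, and it is not the algorithm the lecture or any particular tool uses. The name `frequent_itemsets` is hypothetical.

```python
from itertools import combinations

transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]
MIN_SUPPORT = 0.5

def frequent_itemsets(transactions, min_support):
    """Return {itemset tuple: support} for all itemsets meeting min_support."""
    items = sorted(set().union(*transactions))
    n = len(transactions)
    result = {}
    for size in range(1, len(items) + 1):
        found_any = False
        for candidate in combinations(items, size):
            count = sum(set(candidate) <= t for t in transactions)
            if count / n >= min_support:
                result[candidate] = count / n
                found_any = True
        if not found_any:
            # Every subset of a frequent itemset is itself frequent, so if
            # no itemset of this size qualifies, no larger one can either.
            break
    return result

frequent = frequent_itemsets(transactions, MIN_SUPPORT)
for itemset, s in frequent.items():
    print("{" + ", ".join(itemset) + "}", f"{s:.0%}")
```

Rules are then generated only from the frequent itemsets with two or more items, here {A, C}, and each candidate rule is kept if its confidence meets the minimum.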
An Example of Market Basket (1)
There are 8 transactions involving three items: A (Apple), B (Banana), and C (Carrot).
Check the associations for the two cases below.
(1) A → B (2) (A, B) → C
# Basket
1 A
2 B
3 C
4 A, B
5 A, C
6 B, C
7 A, B, C
8 A, B, C
An Example of Market Basket (2)
The basic probabilities are below:
              (1) A → B                       (2) (A, B) → C
LHS           P(A) = 5/8 = 0.625              P(A,B) = 3/8 = 0.375
RHS           P(B) = 5/8 = 0.625              P(C) = 5/8 = 0.625
Coverage      LHS = 0.625                     LHS = 0.375
Support       P(A∩B) = 3/8 = 0.375            P((A,B)∩C) = 2/8 = 0.25
Confidence    P(B|A) = 0.375/0.625 = 0.6      P(C|(A,B)) = 0.25/0.375 = 0.667
Lift          0.375/(0.625*0.625) = 0.96      0.25/(0.375*0.625) = 1.07
Leverage      0.375 - 0.391 = -0.016          0.25 - 0.234 = 0.016
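The figures for the two rules can be recomputed from the eight baskets with a short script. This is a sketch for checking the arithmetic; the helper names `prob` and `rule_metrics` are assumptions made here, not names from the lecture.

```python
# The eight baskets from the Apple (A), Banana (B), Carrot (C) example.
baskets = [
    {"A"}, {"B"}, {"C"},
    {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"}, {"A", "B", "C"},
]

def prob(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= b for b in baskets) / len(baskets)

def rule_metrics(lhs, rhs, baskets):
    """Coverage, support, confidence, lift and leverage for lhs -> rhs."""
    p_lhs = prob(lhs, baskets)
    p_rhs = prob(rhs, baskets)
    p_both = prob(set(lhs) | set(rhs), baskets)
    return {
        "coverage": p_lhs,
        "support": p_both,
        "confidence": p_both / p_lhs,
        "lift": p_both / (p_lhs * p_rhs),
        "leverage": p_both - p_lhs * p_rhs,
    }

rule1 = rule_metrics({"A"}, {"B"}, baskets)       # (1) A -> B
rule2 = rule_metrics({"A", "B"}, {"C"}, baskets)  # (2) (A, B) -> C
for name, metrics in [("A -> B", rule1), ("(A, B) -> C", rule2)]:
    print(name, {k: round(v, 3) for k, v in metrics.items()})
```

Rule (1) comes out with lift slightly below 1 and a small negative leverage, and rule (2) with lift slightly above 1 and a small positive leverage, so neither rule shows a strong association in this tiny data set.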
What are good association rules? (How to interpret them?)
Lift
– If lift is close to 1, there is no association between the two items (sets).
– If lift is greater than 1, there is a positive association between the two items (sets).
– If lift is less than 1, there is a negative association between the two items (sets).
Leverage
– Leverage = P(A∩B) - P(A)*P(B); there are three cases: ① Leverage > 0 ② Leverage = 0 ③ Leverage < 0
– ① The two items (sets) are positively associated
– ② The two items (sets) are independent
– ③ The two items (sets) are negatively associated
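Lift and leverage always agree on the direction of an association, because both compare P(A∩B) with P(A)*P(B): one as a ratio, the other as a difference. A short derivation, assuming P(A) > 0 and P(B) > 0:

```latex
\begin{align*}
\mathrm{lift}(A \to B) &= \frac{P(A \cap B)}{P(A)\,P(B)}, \\
\mathrm{leverage}(A \to B) &= P(A \cap B) - P(A)\,P(B) \\
  &= P(A)\,P(B)\,\bigl(\mathrm{lift}(A \to B) - 1\bigr).
\end{align*}
```

Since P(A)*P(B) is positive, leverage is positive exactly when lift is greater than 1, zero exactly when lift equals 1, and negative exactly when lift is less than 1. Leverage additionally reflects how common the items are, so a rule on rare items can have a high lift but a small leverage.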
Lab on Association Rules (1)
SPSS Clementine and SAS Enterprise Miner both include association rule software.
This exercise uses Magnum Opus. Go to http://www.rulequest.com and download the Magnum Opus evaluation version.
After you install the program, the initial screen appears. From the menu, choose File – Import Data (Ctrl – O).
Demo data sets are already included. Magnum Opus has two types of data sets available: transaction data (*.idi, *.itl) and attribute-value data (*.data, *.nam).
Transaction data comes in two formats (*.idi, *.itl):
idi (identifier-item file)
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes
itl (item list file)
apples, oranges, bananas
apples, carrots, lettuce, tomatoes
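The two transaction formats carry the same information: an identifier-item file lists one item per line tagged with its basket identifier, while an item list file puts each basket on one line. The Python sketch below converts the lecture's sample from the first layout to the second. It assumes the simple "identifier, item" layout shown in the sample; the full file grammar accepted by Magnum Opus may differ, and `idi_to_baskets` is a hypothetical helper name.

```python
from collections import defaultdict

# Sample lines in identifier-item (*.idi) layout, copied from the lecture.
idi_lines = [
    "001, apples", "001, oranges", "001, bananas",
    "002, apples", "002, carrots", "002, lettuce", "002, tomatoes",
]

def idi_to_baskets(lines):
    """Group "identifier, item" lines into one item list per identifier."""
    baskets = defaultdict(list)
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        identifier, item = (field.strip() for field in line.split(",", 1))
        baskets[identifier].append(item)
    return dict(baskets)

baskets = idi_to_baskets(idi_lines)

# Item-list (*.itl) layout: one comma-separated line per basket.
itl_lines = [", ".join(items) for items in baskets.values()]
print("\n".join(itl_lines))
```

The identifier is dropped in the item list layout, since each line already stands for one basket.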
If you open tutorial.idi in Notepad, you can see that the file contents follow the identifier-item format.
The example on the slide has 5 transactions (baskets).
Choose File – Import Data, or click the toolbar icon, then select Tutorial.idi.
Check "Identifier – item file" and click Next >.
Click Yes and click Next > …
Click Next > …
Click Next > …
What percentage of the whole file do you want to use? Type 50% and click Next > …
Click Import Data.
A new screen then appears.
Leave the settings as they are:
– Search by: LIFT
– Minimum lift: 1
– Maximum no. of rules: 10
Click GO.
Results are saved in the tutorial.out file. Below is a derived rule:
lettuce & carrots
  are associated with tomatoes
  with strength = 0.857
  coverage = 0.042: 21 cases satisfy the LHS
  support = 0.036: 18 cases satisfy both the LHS and the RHS
  lift 3.51: the strength is 3.51 times greater than the strength if there were no association
  leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association
lettuce & carrots → tomatoes
– When lettuce and carrots are purchased, tomatoes are purchased as well.
coverage = 0.042: 21 cases satisfy the LHS
– P(LHS) = P(lettuce & carrots) = 21/500 = 0.042
support = 0.036: 18 cases satisfy both the LHS and the RHS
– P((lettuce & carrots) ∩ tomatoes) = 18/500 = 0.036
strength (confidence) = 0.857
– P(RHS|LHS) = 18/21 = 0.036/0.042 = 0.857
lift 3.51: the strength is 3.51 times greater than the strength if there were no association
– That is, (18/21)/(122/500) = 3.51, where 122/500 = 0.244 is P(RHS) = P(tomatoes)
leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association
– P(LHS ∩ RHS) - P(LHS)*P(RHS) = 0.036 - 0.042*0.244 = 0.0258
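The reported measures for the lettuce & carrots rule can be rebuilt from the raw counts alone. The sketch below uses only the counts quoted in the lecture (500 cases, 21 matching the LHS, 18 matching both sides, 122 containing tomatoes); the variable names are chosen here for illustration.

```python
# Counts quoted in the lecture for the rule lettuce & carrots -> tomatoes.
n_cases = 500   # cases in the imported data
n_lhs = 21      # cases containing lettuce and carrots
n_both = 18     # cases containing lettuce, carrots and tomatoes
n_rhs = 122     # cases containing tomatoes

coverage = n_lhs / n_cases             # P(LHS)
support = n_both / n_cases             # P(LHS and RHS)
strength = n_both / n_lhs              # confidence, P(RHS | LHS)
p_rhs = n_rhs / n_cases                # P(RHS)
lift = strength / p_rhs                # strength relative to independence
leverage = support - coverage * p_rhs  # support in excess of independence
extra_cases = leverage * n_cases       # leverage expressed as a case count

print(f"coverage = {coverage:.3f}")
print(f"support  = {support:.3f}")
print(f"strength = {strength:.3f}")
print(f"lift     = {lift:.2f}")
print(f"leverage = {leverage:.4f} ({extra_cases:.1f} cases)")
```

Multiplying the leverage by the number of cases gives the "12.9 cases" in the output: under independence about 5.1 of the 500 baskets would be expected to contain all three items, and 18 were observed.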