Lecture 4: Association, Market Basket Analysis (Analysis of Customer Behavior and Service Modeling)


Upload: chrystal-watkins

Post on 14-Dec-2015


Lecture 4: Association

Market Basket Analysis

Analysis of Customer Behavior and Service Modeling


What Is Association Mining?

Association rule mining:
– Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories.

Applications:
– Market basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc.

Examples:
– Rule form: “Body → Head [support, confidence]”
• buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%]
• major(x, “CS”) ^ takes(x, “DB”) → grade(x, “A”) [1%, 75%]


Support and Confidence

Support
– Percent of samples that contain both A and B
– support(A → B) = P(A ∩ B)

Confidence
– Percent of samples containing A that also contain B
– confidence(A → B) = P(B|A)

Example
– computer → financial_management_software [support = 2%, confidence = 60%]
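As a small worked step, the two definitions can be tied together through the definition of conditional probability. The sketch below assumes nothing beyond the support and confidence figures quoted in the example; the implied value of P(computer) is derived, not stated in the slide.

```latex
\[
\text{confidence}(A \Rightarrow B) = P(B \mid A)
  = \frac{P(A \cap B)}{P(A)}
  = \frac{\text{support}(A \Rightarrow B)}{P(A)}
\]
\[
P(\text{computer}) = \frac{\text{support}}{\text{confidence}}
  = \frac{0.02}{0.60} \approx 0.033
\]
```

In words: if 2% of all transactions contain both items and 60% of computer buyers also buy the software, then roughly 3.3% of all transactions contain a computer.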


Association Rules: Basic Concepts

Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit)

Find: all rules that correlate the presence of one set of items with that of another set of items
– e.g., 98% of people who purchase tires and auto accessories also get automotive services done

Applications
– Home Electronics – What other products should the store stock up on?
– Retailing – Shelf design, promotion structuring, direct marketing

Rule Measures: Support and Confidence

[Figure: Venn diagram of customers who buy diapers, customers who buy beer, and customers who buy both]

Find all the rules A → C with minimum confidence and support
– Support (s): probability that a transaction contains {A & C}
– Confidence (c): conditional probability that a transaction having {A} also contains {C}

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

With minimum support 50% and minimum confidence 50%, we have
– A → C (50%, 66.6%)
– C → A (50%, 100%)
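The two rule measures can be checked directly against the four transactions on this slide. The following is a minimal sketch; the function names `support` and `confidence` are illustrative choices, not part of any tool used in the lecture.

```python
# The four transactions from this slide, keyed by transaction ID.
transactions = {
    2000: {"A", "B", "C"},
    1000: {"A", "C"},
    4000: {"A", "D"},
    5000: {"B", "E", "F"},
}

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for items in transactions.values() if itemset <= items)
    return hits / len(transactions)

def confidence(lhs, rhs):
    """Support of LHS and RHS together, divided by the support of the LHS."""
    return support(lhs | rhs) / support(lhs)

for lhs, rhs in [({"A"}, {"C"}), ({"C"}, {"A"})]:
    print(f"{sorted(lhs)} -> {sorted(rhs)}: "
          f"support={support(lhs | rhs):.1%}, "
          f"confidence={confidence(lhs, rhs):.1%}")
```

The confidence of A → C is 2/3, which the slide reports truncated as 66.6%.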

Mining Association Rules: An Example

Target:
– Min. support 50%
– Min. confidence 50%

Transaction ID   Items Bought
2000             A, B, C
1000             A, C
4000             A, D
5000             B, E, F

Frequent Itemset   Support
{A}                75%
{B}                50%
{C}                50%
{A,C}              50%

For rule A → C:
– support = support({A, C}) = 50%
– confidence = support({A, C})/support({A}) = 66.6%
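The frequent itemset table can be reproduced by brute force: enumerate every candidate itemset and keep those that meet the minimum support. This is only a sketch for a tiny example; it is exhaustive enumeration, not the Apriori algorithm or whatever search a production tool uses, and the names `MIN_SUPPORT` and `frequent` are hypothetical.

```python
from itertools import combinations

# The four transactions from this slide.
transactions = [
    {"A", "B", "C"},
    {"A", "C"},
    {"A", "D"},
    {"B", "E", "F"},
]
MIN_SUPPORT = 0.5  # the 50% target from the slide

def support(itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    wanted = set(itemset)
    return sum(1 for t in transactions if wanted <= t) / len(transactions)

# Every distinct item that appears in any transaction, in sorted order.
items = sorted(set().union(*transactions))

# Test every non-empty combination of items (feasible only for small data).
frequent = {}
for size in range(1, len(items) + 1):
    for candidate in combinations(items, size):
        s = support(candidate)
        if s >= MIN_SUPPORT:
            frequent[candidate] = s

for itemset, s in frequent.items():
    print("{" + ",".join(itemset) + "}", f"{s:.0%}")
```

Exhaustive enumeration grows exponentially with the number of items, which is why real tools prune the search instead.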

An Example of Market Basket (1)

There are 8 transactions over three items: A (Apple), B (Banana), and C (Carrot).

Check the associations for the two cases below.

(1) A → B   (2) (A, B) → C

#   Basket
1   A
2   B
3   C
4   A, B
5   A, C
6   B, C
7   A, B, C
8   A, B, C

An Example of Market Basket (2)

The basic probabilities are as follows:

Measure      (1) A → B                        (2) (A, B) → C
LHS          P(A) = 5/8 = 0.625               P(A,B) = 3/8 = 0.375
RHS          P(B) = 5/8 = 0.625               P(C) = 5/8 = 0.625
Coverage     LHS = 0.625                      LHS = 0.375
Support      P(A∩B) = 3/8 = 0.375             P((A,B)∩C) = 2/8 = 0.25
Confidence   P(B|A) = 0.375/0.625 = 0.6       P(C|(A,B)) = 0.25/0.375 = 0.667
Lift         0.375/(0.625*0.625) = 0.96       0.25/(0.375*0.625) = 1.07
Leverage     0.375 - 0.391 = -0.016           0.25 - 0.234 = 0.016
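The measures for both rules can be recomputed from the eight baskets. This is a minimal sketch; `rule_measures` is a hypothetical helper name, and the formulas are the ones used in this example (coverage = P(LHS), support = P(LHS ∩ RHS), confidence = support / coverage, lift = confidence / P(RHS), leverage = support − P(LHS)·P(RHS)).

```python
# The eight baskets from the example: A (Apple), B (Banana), C (Carrot).
baskets = [
    {"A"}, {"B"}, {"C"},
    {"A", "B"}, {"A", "C"}, {"B", "C"},
    {"A", "B", "C"}, {"A", "B", "C"},
]

def prob(itemset):
    """Fraction of baskets that contain every item in `itemset`."""
    return sum(1 for b in baskets if itemset <= b) / len(baskets)

def rule_measures(lhs, rhs):
    """Return the five measures for the rule lhs -> rhs."""
    coverage = prob(lhs)
    support = prob(lhs | rhs)
    confidence = support / coverage
    lift = confidence / prob(rhs)
    leverage = support - coverage * prob(rhs)
    return {
        "coverage": coverage,
        "support": support,
        "confidence": confidence,
        "lift": lift,
        "leverage": leverage,
    }

for lhs, rhs in [({"A"}, {"B"}), ({"A", "B"}, {"C"})]:
    measures = rule_measures(lhs, rhs)
    rounded = {name: round(value, 3) for name, value in measures.items()}
    print(sorted(lhs), "->", sorted(rhs), rounded)
```

The exact leverage values are -0.015625 and +0.015625, so the two rules deviate from independence by the same amount in opposite directions.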

What are good association rules? (How to interpret them?)

Lift
– If lift is close to 1, there is no association between the two items (sets).
– If lift is greater than 1, there is a positive association between the two items (sets).
– If lift is less than 1, there is a negative association between the two items (sets).

Leverage

– Leverage = P(A∩B) - P(A)*P(B); it falls into three cases:
① Leverage > 0: the two items (sets) are positively associated
② Leverage = 0: the two items (sets) are independent
③ Leverage < 0: the two items (sets) are negatively associated
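The lift and leverage criteria always agree on the direction of an association. A short derivation, assuming P(A) > 0 and P(B) > 0 and using lift as computed in the earlier market basket example (support divided by P(A)·P(B)):

```latex
\[
\text{lift}(A \Rightarrow B) = \frac{P(A \cap B)}{P(A)\,P(B)}
\]
\[
\text{leverage}(A \Rightarrow B) = P(A \cap B) - P(A)\,P(B)
  = P(A)\,P(B)\,\bigl(\text{lift}(A \Rightarrow B) - 1\bigr)
\]
```

Since P(A)·P(B) is positive, leverage > 0 exactly when lift > 1, leverage = 0 exactly when lift = 1, and leverage < 0 exactly when lift < 1. The two measures differ in scale: lift is a ratio, while leverage is a difference in probability and so also reflects how often the itemset occurs.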

Lab on Association Rules (1)

SPSS Clementine and SAS Enterprise Miner include association rule software.

This exercise uses Magnum Opus. Go to http://www.rulequest.com and download the Magnum Opus evaluation version.

After you install the program, you will see the initial screen below. From the menu, choose File – Import Data (Ctrl – O).

Demo data sets are already included. Magnum Opus accepts two types of data sets: transaction data (*.idi, *.itl) and attribute-value data (*.data, *.nam).

Transaction data comes in two formats (*.idi, *.itl):

idi (identifier-item file)
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes

itl (item list file)
apples, oranges, bananas
apples, carrots, lettuce, tomatoes
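The two transaction formats carry the same information, so one can be converted into the other. The sketch below is based only on the sample listings on this slide: it assumes one comma-separated "identifier, item" pair per line, and `idi_to_itl` is a hypothetical helper, not a Magnum Opus function. The real file formats may have rules not shown here.

```python
# Sample identifier-item data, copied from the slide.
idi_text = """\
001, apples
001, oranges
001, bananas
002, apples
002, carrots
002, lettuce
002, tomatoes
"""

def idi_to_itl(text):
    """Group 'identifier, item' lines into one comma-separated line per basket."""
    baskets = {}  # identifier -> list of items, in first-seen order
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        identifier, item = (part.strip() for part in line.split(",", 1))
        baskets.setdefault(identifier, []).append(item)
    return "\n".join(", ".join(items) for items in baskets.values())

itl_text = idi_to_itl(idi_text)
print(itl_text)
```

Each output line is one basket, which matches the item list layout shown for the itl format.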

If you open tutorial.idi in Notepad, you can see the contents of the file, as shown at left.

The example at left has 5 transactions (baskets).

Choose File – Import Data (or click the toolbar icon), then click Tutorial.idi.

Select Identifier – item file and click Next >.

Click Yes and click Next > …

Click Next > …

Click Next > …

What percentage of the whole file do you want to use? Type 50% and click Next > …

Click Import Data.

You will then see a screen like the one below left.

Leave the settings as they are:
– Search by: LIFT
– Minimum lift: 1
– Maximum no. of rules: 10

Click GO.

The results are saved in the tutorial.out file. Below is one of the derived rules:

lettuce & carrots are associated with tomatoes with strength = 0.857
coverage = 0.042: 21 cases satisfy the LHS
support = 0.036: 18 cases satisfy both the LHS and the RHS
lift 3.51: the strength is 3.51 times greater than the strength if there were no association
leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association

lettuce & carrots → tomatoes
– When lettuce and carrots are purchased, tomatoes are also purchased

coverage = 0.042: 21 cases satisfy the LHS
– P(lettuce & carrots) = 21/500 = 0.042

support = 0.036: 18 cases satisfy both the LHS and the RHS
– P((lettuce & carrots) ∩ tomatoes) = 18/500 = 0.036

strength (confidence) = 0.857
– P(RHS|LHS) = 18/21 = 0.036/0.042 = 0.857

lift 3.51: the strength is 3.51 times greater than the strength if there were no association
– That is, (18/21)/(122/500) = 3.51, where 122/500 = 0.244 is P(RHS), the share of cases that contain tomatoes

leverage = 0.0258: the support is 0.0258 (12.9 cases) greater than if there were no association
– P(LHS ∩ RHS) – P(LHS)*P(RHS) = 0.036 – 0.042*0.244 = 0.0258
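All five reported figures follow from four counts in the walkthrough: 500 cases in total, 21 matching the LHS, 18 matching both sides, and 122 matching the RHS. The sketch below recomputes them; the variable names are illustrative, and reading the 122 as the number of cases containing tomatoes is inferred from the lift calculation rather than stated in the output.

```python
# Counts taken from the walkthrough of the rule lettuce & carrots -> tomatoes.
n_total = 500  # cases used in the analysis
n_lhs = 21     # cases containing lettuce and carrots
n_both = 18    # cases containing lettuce, carrots and tomatoes
n_rhs = 122    # cases containing tomatoes (inferred from the lift formula)

coverage = n_lhs / n_total                 # P(LHS)
support = n_both / n_total                 # P(LHS and RHS)
strength = n_both / n_lhs                  # confidence, P(RHS | LHS)
p_rhs = n_rhs / n_total                    # P(RHS)
lift = strength / p_rhs                    # ratio to the no-association case
leverage = support - coverage * p_rhs      # difference from the no-association case
leverage_cases = leverage * n_total        # the same difference, in cases

print(f"coverage = {coverage:.3f}")
print(f"support  = {support:.3f}")
print(f"strength = {strength:.3f}")
print(f"lift     = {lift:.2f}")
print(f"leverage = {leverage:.4f} ({leverage_cases:.1f} cases)")
```

With no association, about 21 × 0.244 ≈ 5.1 of the LHS cases would be expected to contain tomatoes; the observed 18 exceeds that by the 12.9 cases reported as leverage.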