Association Rules: Hawaii International Conference on System Sciences (HICSS-40), January 2007, David L. Olson and Yanhong Li

Page 1: Association Rules

Association Rules

Hawaii International Conference on System Sciences (HICSS-40)

January 2007

David L. Olson

Yanhong Li

Page 2: Association Rules

Fuzzy Association Rules

• Association rule mining provides information for assessing significant correlations in large databases

• IF X THEN Y
  – Initial data mining analysis
  – Not predictive

• SUPPORT: the proportion of cases in which the relationship (X and Y together) appears in the data

• CONFIDENCE: the conditional probability of Y given X, that is, the probability that if X holds, then Y also holds
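As a minimal illustration with crisp (non-fuzzy) data, support and confidence can be computed directly from a list of transactions. The basket contents and the function names below are hypothetical, chosen only to show the two definitions.

```python
from typing import FrozenSet, List, Set


def support(itemset: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x: Set[str], y: Set[str], transactions: List[FrozenSet[str]]) -> float:
    """Estimate of P(Y | X): support of X and Y together over support of X."""
    sup_x = support(x, transactions)
    if sup_x == 0:
        return 0.0  # X never occurs, so the rule has no evidence
    return support(x | y, transactions) / sup_x


# Hypothetical market-basket data.
baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk"}),
]

print(support({"bread", "milk"}, baskets))       # 0.5
print(confidence({"bread"}, {"milk"}, baskets))  # about 0.667 (2 of the 3 bread baskets)
```

Bread and milk appear together in 2 of 4 baskets (support 0.5), and milk appears in 2 of the 3 baskets that contain bread (confidence about 0.667).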

Page 3: Association Rules

Association Rule Algorithms

• Apriori: Agrawal et al., 1993; Agrawal & Srikant, 1994
  – Finds correlations among transactions with binary values

• Weighted association rules: Cai et al., 1998; Lu et al., 2001

• Cardinal data: Srikant & Agrawal, 1996
  – Partitions the attribute domain and combines adjacent partitions, so the data can be treated as binary
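The level-wise idea behind Apriori can be sketched on crisp transactions as follows. This is a simplified illustration (support counted by rescanning every transaction, candidates joined by set union), not the optimized algorithm from the cited papers; the function name and the data are hypothetical.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List


def apriori(transactions: List[FrozenSet[str]], minsup: float) -> Dict[FrozenSet[str], float]:
    """Return every itemset whose support is at least `minsup`, found level by level."""
    n = len(transactions)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: frequent single items.
    items = {item for t in transactions for item in t}
    level: Dict[FrozenSet[str], float] = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= minsup:
            level[frozenset([item])] = s

    frequent: Dict[FrozenSet[str], float] = {}
    k = 1
    while level:
        frequent.update(level)
        k += 1
        # Join step: unions of two frequent (k-1)-itemsets that contain exactly k items.
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must already be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in level for sub in combinations(c, k - 1))
        }
        # Count support only for the surviving candidates.
        level = {}
        for c in candidates:
            s = support(c)
            if s >= minsup:
                level[c] = s
    return frequent


baskets = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter"}),
    frozenset({"bread", "milk", "butter"}),
    frozenset({"milk", "butter"}),
]
result = apriori(baskets, minsup=0.5)
for itemset, s in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), s)
```

With minsup 0.5, all three single items (support 0.75) and all three pairs (support 0.5) are frequent, while the full triple (support 0.25) is not, so the search stops after the second level.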

Page 4: Association Rules

Fuzzy Analysis

Approaches to dealing with vagueness and uncertainty:

• Fuzzy Set Theory – Zadeh [1965]

• Probability Theory – Pearl [1988]

• Rough Set Theory – Pawlak [1982]

• Set Pair Theory – Zhao [2000]

Page 5: Association Rules

Fuzzy Association Rules

• Most are based on the Apriori algorithm

• Treat all attributes uniformly

• The number of rules can be increased by lowering minimum support and minimum confidence
  – Generates many uninteresting rules
  – Software takes much longer to run

Page 6: Association Rules

Gyenesei (2000)

• Studied weighted quantitative association rules in a fuzzy domain
  – With and without normalization
  – NONNORMALIZED
    • Used the product operator to define the combined weight and fuzzy value
    • If weights are small, the support level is small; tends to have data overflow
  – NORMALIZED
    • Used the geometric mean of item weights as the combined weight
    • Support is then very small
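The two ways of combining item weights named above can be contrasted in a few lines. The slide names only the operators; the weighted-support formula here (combined weight times the product of a record's memberships, averaged over records) and all function names are assumptions for illustration, not Gyenesei's exact definitions.

```python
import math
from typing import List, Sequence


def product_weight(weights: Sequence[float]) -> float:
    """Non-normalized combined weight: the product of the item weights."""
    return math.prod(weights)


def geometric_mean_weight(weights: Sequence[float]) -> float:
    """Normalized combined weight: the geometric mean of the item weights."""
    return math.prod(weights) ** (1.0 / len(weights))


def weighted_support(weights: Sequence[float], rows: List[Sequence[float]], combine) -> float:
    """Average over records of (combined weight x product of memberships).

    Combining memberships with the product operator is an assumption here.
    """
    w = combine(weights)
    return sum(w * math.prod(row) for row in rows) / len(rows)


# Hypothetical example: three items, each weighted 0.5, fully present in one record.
weights = [0.5, 0.5, 0.5]
rows = [[1.0, 1.0, 1.0]]
print(weighted_support(weights, rows, product_weight))         # 0.125
print(weighted_support(weights, rows, geometric_mean_weight))  # about 0.5
```

Under the product operator the combined weight shrinks with every item added (0.5 × 0.5 × 0.5 = 0.125), whereas the geometric mean of three equal weights stays at 0.5.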

Page 7: Association Rules

Algorithm

• Get membership functions, minimum support, and minimum confidence

• Assign a weight to each fuzzy membership region of each attribute (categorical)

• Calculate support for each fuzzy region

• If support > minimum support, OK

• If confidence > minimum confidence, OK

• If both are OK, generate rules

Page 8: Association Rules

Demo Model: Loan Application

Case  Age  Income  Risk     Credit  Result
1     20   52623   -38954   Red     0
2     26   23047   -23636   Green   1
3     46   56810   45669    Green   1
4     31   38388   -7968    Amber   1
5     28   80019   -35125   Green   1
6     21   74561   -47592   Green   1
7     46   65341   58119    Green   1
8     25   46504   -30022   Green   1
9     38   65735   30571    Green   1
10    27   26047   -6       Red     1

Page 9: Association Rules

Fuzzified Age

Figure 2: The membership functions of attribute Age. (Chart: membership value vs. Age, with axis ticks at 0, 25, 35, 40, 50, 100; fuzzy sets Young, Middle, Old.)

Page 10: Association Rules

Fuzzify Age

Case  Age  Young  Middle  Old
1     20   1      0       0
2     26   0.9    0.1     0
3     46   0      0.4     0.6
4     31   0.4    0.6     0
5     28   0.7    0.3     0
6     21   1      0       0
7     46   0      0.4     0.6
8     25   1      0       0
9     38   0      1       0
10    27   0.8    0.2     0
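The Age memberships above can be reproduced with piecewise-linear (trapezoidal) functions. The breakpoints 25, 35, 40 and 50 are read from the axis of Figure 2; the straight-line shape between them is an assumption, but it matches every row of the table. The function names are hypothetical.

```python
def young(age: float) -> float:
    """1 up to age 25, falling linearly to 0 at age 35."""
    if age <= 25:
        return 1.0
    if age >= 35:
        return 0.0
    return (35 - age) / 10


def middle(age: float) -> float:
    """0 up to 25, rising to 1 at 35, flat to 40, falling to 0 at 50."""
    if age <= 25 or age >= 50:
        return 0.0
    if age < 35:
        return (age - 25) / 10
    if age <= 40:
        return 1.0
    return (50 - age) / 10


def old(age: float) -> float:
    """0 up to age 40, rising linearly to 1 at age 50 and staying there."""
    if age <= 40:
        return 0.0
    if age >= 50:
        return 1.0
    return (age - 40) / 10


def fuzzify_age(age: float) -> tuple:
    """Return the (Young, Middle, Old) membership values for one age."""
    return (young(age), middle(age), old(age))


# Ages of the ten demo cases, in case order.
for age in [20, 26, 46, 31, 28, 21, 46, 25, 38, 27]:
    print(age, [round(m, 1) for m in fuzzify_age(age)])
```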

Page 11: Association Rules

Calculate Support for Each Pair of Fuzzy Categories

• Membership value
  – Identify weights for each attribute
  – Identify the highest fuzzy membership category for each case
    • Membership value = minimum weight associated with the highest fuzzy membership category

• Support
  – Average membership value over all cases
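One possible reading of this step, sketched below: each case contributes weight × membership for a region, several regions are combined with the minimum operator, and support is the average over all cases. The minimum combination and the "highest category" step are not modeled exactly as the slide describes; treat this as an assumption. For a single region it reduces to weight × average membership, and on the ten fuzzified ages with weight 0.45 it gives 0.261 for Young and 0.135 for Middle, the R11 and R12 values in the single-item support table.

```python
from typing import Dict, List, Sequence


def case_value(case: Dict[str, float], itemset: Sequence[str], weights: Dict[str, float]) -> float:
    """Weighted membership of one case in an itemset.

    Each region contributes weight * membership; several regions are combined
    with the minimum operator (an assumption, not stated on the slide).
    """
    return min(weights[r] * case[r] for r in itemset)


def fuzzy_support(cases: List[Dict[str, float]], itemset: Sequence[str],
                  weights: Dict[str, float]) -> float:
    """Support = average weighted membership value over all cases."""
    return sum(case_value(c, itemset, weights) for c in cases) / len(cases)


# Young (R11) and Middle (R12) memberships of the ten demo cases.
young = [1.0, 0.9, 0.0, 0.4, 0.7, 1.0, 0.0, 1.0, 0.0, 0.8]
middle = [0.0, 0.1, 0.4, 0.6, 0.3, 0.0, 0.4, 0.0, 1.0, 0.2]
cases = [{"R11": y, "R12": m} for y, m in zip(young, middle)]
weights = {"R11": 0.45, "R12": 0.45}

print(round(fuzzy_support(cases, ["R11"], weights), 3))  # 0.261
print(round(fuzzy_support(cases, ["R12"], weights), 3))  # 0.135
```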

Page 12: Association Rules

Support by Single Item

Attribute  Category  Region  Weight  Sup(Rjk)
Age        Young     R11     0.45    0.261
Age        Middle    R12     0.45    0.135
Age        Old       R13     0.45    0.059
Income     High      R21     0.55    0.000
Income     Middle    R22     0.55    0.490
Income     Low       R23     0.55    0.060
Risk       High      R31     0.70    0.320
Risk       Middle    R32     0.70    0.146
Risk       Low       R33     0.70    0.233
Credit     Good      R41     0.80    0.576
Credit     Bad       R42     0.80    0.244

Page 13: Association Rules

Support

• If support for a pair of categories is above minimum support, retain it

• Identifies all pairs of fuzzy categories with a sufficiently strong relationship

• For outcomes, R51 (On Time) is strong; R52 (Default) is not
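Candidate pairs can be formed by keeping the single regions whose support meets minsup and pairing regions from different attributes. The sketch assumes labels of the form Rjk (attribute j, region k), as in the support table, and adds R51 by assumption because its support is not listed. With minsup 0.25 it produces exactly the ten pairs shown on the pair-support slide.

```python
from itertools import combinations

MINSUP = 0.25

# Single-region supports from the "Support by Single Item" table.
single_support = {
    "R11": 0.261, "R12": 0.135, "R13": 0.059,
    "R21": 0.000, "R22": 0.490, "R23": 0.060,
    "R31": 0.320, "R32": 0.146, "R33": 0.233,
    "R41": 0.576, "R42": 0.244,
}

frequent_singles = sorted(r for r, s in single_support.items() if s >= MINSUP)
# R51 (On Time) is described as strong, but its support is not listed in the
# table; it is added here by assumption.
frequent_singles.append("R51")


def same_attribute(a: str, b: str) -> bool:
    """Hypothetical helper: assumes labels R<j><k>, with j the attribute index."""
    return a[:2] == b[:2]


candidate_pairs = [
    (a, b) for a, b in combinations(frequent_singles, 2) if not same_attribute(a, b)
]
print(candidate_pairs)
```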

Page 14: Association Rules

Support by Pair: minsup 0.25

R11R22 0.235 R22R41 0.419

R11R31 0.207 R22R51 0.449

R11R41 0.212 R31R41 0.266

R11R51 0.230 R31R51 0.264

R22R31 0.237 R41R51 0.560

Page 15: Association Rules

Support by Triplet: minsup 0.25

R22R41R51 0.417

R22R31R41 0.198

R22R31R51 0.196

R31R41R51 0.264

Page 16: Association Rules

Quartets

• None qualify, so algorithm stops
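The stopping point can be checked with the standard Apriori join-and-prune step applied to the supports listed on the pair and triplet slides. Note that the prune step would not even count R22R31R41 or R22R31R51, because their subset R22R31 (0.237) is below minsup; the slides list those triplets anyway. The function names are hypothetical.

```python
from itertools import combinations

MINSUP = 0.25

# Pair supports from the "Support by Pair" slide.
pair_support = {
    ("R11", "R22"): 0.235, ("R22", "R41"): 0.419,
    ("R11", "R31"): 0.207, ("R22", "R51"): 0.449,
    ("R11", "R41"): 0.212, ("R31", "R41"): 0.266,
    ("R11", "R51"): 0.230, ("R31", "R51"): 0.264,
    ("R22", "R31"): 0.237, ("R41", "R51"): 0.560,
}
# Triplet supports from the "Support by Triplet" slide.
triplet_support = {
    ("R22", "R41", "R51"): 0.417,
    ("R22", "R31", "R41"): 0.198,
    ("R22", "R31", "R51"): 0.196,
    ("R31", "R41", "R51"): 0.264,
}


def frequent(support):
    """Itemsets (as frozensets) whose support is at least MINSUP."""
    return {frozenset(k) for k, s in support.items() if s >= MINSUP}


def next_candidates(freq_k):
    """Apriori join (two k-itemsets whose union has k+1 items) and prune
    (every k-subset of the candidate must be frequent)."""
    sets = list(freq_k)
    if not sets:
        return set()
    k = len(sets[0])
    joined = {a | b for a, b in combinations(sets, 2) if len(a | b) == k + 1}
    return {c for c in joined if all(frozenset(s) in freq_k for s in combinations(c, k))}


triplet_candidates = next_candidates(frequent(pair_support))
quartet_candidates = next_candidates(frequent(triplet_support))
print(sorted(sorted(c) for c in triplet_candidates))
print(quartet_candidates)  # set(): R22R31R41R51 is pruned
```

The only quartet that can be joined, R22R31R41R51, is pruned because its subset R22R31R41 is not frequent, which agrees with the slide.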

Page 17: Association Rules

Confidence

• Identify the direction of each rule (X → Y versus Y → X)

• Of the training-set cases that satisfy the rule's antecedent, what proportion came out as predicted?
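Written as a formula, confidence divides the support of the whole itemset by the support of the antecedent. Using the fuzzy supports from the single-item and pair slides, for example:

```latex
\operatorname{conf}(X \rightarrow Y) = \frac{\operatorname{sup}(X \cup Y)}{\operatorname{sup}(X)},
\qquad
\operatorname{conf}(R_{22} \rightarrow R_{51}) = \frac{\operatorname{sup}(R_{22} R_{51})}{\operatorname{sup}(R_{22})} = \frac{0.449}{0.490} \approx 0.916
```

Reversing the direction changes the denominator, which is why R22→R41 (0.419 / 0.490 ≈ 0.855) and R41→R22 (0.419 / 0.576 ≈ 0.727) differ.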

Page 18: Association Rules

Confidence Values (minimum confidence 0.9)

Pairs:
R22→R41  0.855
R41→R22  0.727
R22→R51  0.916
R51→R22  0.697
R31→R41  0.831
R41→R31  0.462
R31→R51  0.825
R51→R31  0.410
R41→R51  0.972
R51→R41  0.870

Triplets:
R41R22→R51  0.995
R41R51→R22  0.744
R22R51→R41  0.928
R31R41→R51  0.993
R31R51→R41  1.000
R51R41→R31  0.472
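Rules can be generated from the frequent itemsets by dividing each itemset's support by its antecedent's support and keeping those at or above minconf. The sketch below restricts consequents to the outcome R51; that restriction appears to match the selection of rules on the next slide, since R22R51→R41 (0.928) and R31R51→R41 (1.000) also exceed 0.9 but predict Credit rather than the outcome. Recomputing from the rounded supports can differ in the third decimal (0.992 here versus 0.993 on the slide). The function name is hypothetical.

```python
# Fuzzy supports from the single-item, pair and triplet slides
# (only itemsets needed for rules with outcome R51 as consequent).
SUPPORT = {
    frozenset({"R22"}): 0.490,
    frozenset({"R31"}): 0.320,
    frozenset({"R41"}): 0.576,
    frozenset({"R22", "R41"}): 0.419,
    frozenset({"R22", "R51"}): 0.449,
    frozenset({"R31", "R41"}): 0.266,
    frozenset({"R31", "R51"}): 0.264,
    frozenset({"R41", "R51"}): 0.560,
    frozenset({"R22", "R41", "R51"}): 0.417,
    frozenset({"R31", "R41", "R51"}): 0.264,
}


def outcome_rules(support, outcome="R51", minconf=0.9):
    """Return (antecedent, consequent, confidence) for rules X -> outcome."""
    rules = []
    for itemset, sup in support.items():
        if outcome not in itemset or len(itemset) < 2:
            continue
        antecedent = itemset - {outcome}
        conf = sup / support[antecedent]  # conf(X -> Y) = sup(X u Y) / sup(X)
        if conf >= minconf:
            rules.append((tuple(sorted(antecedent)), outcome, round(conf, 3)))
    return sorted(rules)


for rule in outcome_rules(SUPPORT):
    print(rule)
```

R31→R51 (0.825) is the only outcome rule in this set that falls below the 0.9 threshold.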

Page 19: Association Rules

4 Rules

• IF Income is Middle THEN Outcome is On-Time
  – R22→R51: support 0.490, confidence 0.916

• IF Credit is Good THEN Outcome is On-Time
  – R41→R51: support 0.576, confidence 0.972

• IF Income is Middle AND Credit is Good THEN Outcome is On-Time
  – R22R41→R51: support 0.419, confidence 0.995

• IF Risk is High AND Credit is Good THEN Outcome is On-Time
  – R31R41→R51: support 0.266, confidence 0.993

Page 20: Association Rules

Rules vs. Support

Figure 7: The relationship between the number of association rules and minsup using the proposed method. (Chart: number of association rules, 0 to 20, vs. minsup at 0.2, 0.25, 0.3, 0.35, 0.4, 0.55; one series per minconf = 0.55, 0.65, 0.75, 0.85, 0.95, 1.)

Page 21: Association Rules

Rules vs. Confidence

Figure 8: The relationship between the number of association rules and minconf using the proposed method. (Chart: number of association rules, 0 to 20, vs. minconf at 0.55, 0.65, 0.75, 0.85, 0.95, 1; one series per minsup = 0.2, 0.25, 0.3, 0.35, 0.4, 0.55.)

Page 22: Association Rules

Higher order combinations

• Try triplets
  – If ambitious, sets of 4 and beyond

• Here, no sets of 4 qualify

• Problems:
  – Computational complexity explodes
  – Doesn’t guarantee total coverage
    • Guaranteeing coverage would also explode complexity
    • Coverage can be controlled by lowering minsup and minconf

Page 23: Association Rules

Simulation Testing

• Selected 550 cases
  – Held out 100

• Randomly assigned weights to each fuzzy region of each attribute
  – minsup {0.35, 0.45, 0.55, 0.65}
  – minconf {0.7, 0.8, 0.9}
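The experimental setup can be sketched as a holdout split plus a parameter grid. The slide does not say how weights were drawn or how cases were chosen for holdout, so the uniform weights in [0, 1), the shuffled split, the region labels and the function names below are all assumptions; the rule mining and accuracy evaluation themselves are not shown.

```python
import itertools
import random

# Hypothetical region labels (R_jk: attribute j, region k), as in the support table.
REGIONS = ["R11", "R12", "R13", "R21", "R22", "R23",
           "R31", "R32", "R33", "R41", "R42"]

MINSUP_GRID = [0.35, 0.45, 0.55, 0.65]
MINCONF_GRID = [0.7, 0.8, 0.9]


def holdout_split(cases, n_test, seed=0):
    """Shuffle a copy of the cases and hold out `n_test` of them for testing."""
    rng = random.Random(seed)
    shuffled = list(cases)
    rng.shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


def random_weights(regions, seed=0):
    """Draw one weight per fuzzy region uniformly from [0, 1) (an assumption)."""
    rng = random.Random(seed)
    return {r: rng.random() for r in regions}


cases = list(range(550))  # stand-ins for the 550 selected cases
train, test = holdout_split(cases, n_test=100)
weights = random_weights(REGIONS)
settings = list(itertools.product(MINSUP_GRID, MINCONF_GRID))
print(len(train), len(test), len(settings))  # 450 100 12
```

Each of the 12 (minsup, minconf) settings would then be mined on the 450 training cases and scored on the 100 held-out cases.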

Page 24: Association Rules

Simulation Results

Accuracy vs. minsup & minconf

(Chart: accuracy, 0 to 0.7, vs. minsup at 0.35, 0.45, 0.55, 0.65; one series per weighted minconf = 0.7, 0.8, 0.9.)