Association Rules
Hawaii International Conference on System Sciences (HICSS-40)
January 2007
David L. Olson
Yanhong Li
Fuzzy Association Rules
• Association rules mining provides information to assess significant correlations in large databases
• IF X THEN Y
– Initial data mining analysis
– Not predictive
• SUPPORT: degree to which relationship appears in data
• CONFIDENCE: probability that if X, then Y
Association Rule Algorithms
• APriori
– Agrawal et al., 1993; Agrawal & Srikant, 1994
– Find correlations among transactions, binary values
• Weighted association rules
– Cai et al., 1998; Lu et al., 2001
• Cardinal data
– Srikant & Agrawal, 1996
– Partitions attribute domain, combines adjacent partitions until binary
Fuzzy Analysis
Deal with vagueness & uncertainty
• Fuzzy Set Theory
– Zadeh [1965]
• Probability Theory
– Pearl [1988]
• Rough Set Theory
– Pawlak [1982]
• Set Pair Theory
– Zhao [2000]
Fuzzy Association Rules
• Most based on APriori algorithm
• Treat all attributes as uniform
• Can increase number of rules by decreasing minimum support, decreasing minimum confidence
– Generates many uninteresting rules
– Software takes a lot longer
Gyenesei (2000)
• Studied weighted quantitative association rules in fuzzy domain
– With & without normalization
• NONNORMALIZED
– Used product operator to define combined weight and fuzzy value
– If weight small, support level small, tends to have data overflow
• NORMALIZED
– Used geometric mean of item weights as combined weight
– Support then very small
Algorithm
• Get membership functions, minimum support, minimum confidence
• Assign weight to each fuzzy membership for each attribute (categorical)
• Calculate support for each fuzzy region
• If support > minimum, OK
• If confidence > minimum, OK
• If both OK, generate rules
Demo Model: Loan App
Case Age Income Risk Credit Result
1 20 52623 -38954 Red 0
2 26 23047 -23636 Green 1
3 46 56810 45669 Green 1
4 31 38388 -7968 Amber 1
5 28 80019 -35125 Green 1
6 21 74561 -47592 Green 1
7 46 65341 58119 Green 1
8 25 46504 -30022 Green 1
9 38 65735 30571 Green 1
10 27 26047 -6 Red 1
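As a point of comparison before fuzzifying, the crisp (non-fuzzy) support and confidence of a single IF X THEN Y rule can be computed directly from this table. The sketch below is illustrative only: the function name `crisp_rule_stats` is hypothetical, and it uses the usual crisp definitions (support = share of all cases where X and Y both hold, confidence = share of X cases where Y also holds), which are assumed here rather than taken from the slides.

```python
# Loan table from the slide, reduced to (case, credit, result).
cases = [
    (1, "Red", 0), (2, "Green", 1), (3, "Green", 1), (4, "Amber", 1),
    (5, "Green", 1), (6, "Green", 1), (7, "Green", 1), (8, "Green", 1),
    (9, "Green", 1), (10, "Red", 1),
]

def crisp_rule_stats(rows, credit_value, result_value):
    """Support and confidence of IF Credit = credit_value THEN Result = result_value."""
    antecedent = [row for row in rows if row[1] == credit_value]
    both = [row for row in antecedent if row[2] == result_value]
    support = len(both) / len(rows)
    confidence = len(both) / len(antecedent) if antecedent else 0.0
    return support, confidence

green_support, green_confidence = crisp_rule_stats(cases, "Green", 1)
red_support, red_confidence = crisp_rule_stats(cases, "Red", 1)
print("IF Credit = Green THEN Result = 1:", green_support, green_confidence)
print("IF Credit = Red THEN Result = 1:", red_support, red_confidence)
```

Credit is already categorical, so it needs no fuzzification; Age, Income, and Risk are numeric and are the attributes the fuzzy approach is meant to handle.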
Fuzzified Age
Figure 2: The membership functions of attribute Age (x-axis: Age, ticks at 0, 25, 35, 40, 50, 100; y-axis: membership value, 0 to 1.2; curves: Young, Middle, Old)
Fuzzify Age
Case Age Young Middle Old
1 20 1.000 0 0
2 26 0.9 0.1 0
3 46 0 0.4 0.6
4 31 0.4 0.6 0
5 28 0.7 0.3 0
6 21 1 0 0
7 46 0 0.4 0.6
8 25 1 0 0
9 38 0 1 0
10 27 0.8 0.2 0
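The table values are consistent with piecewise-linear membership functions whose breakpoints sit at ages 25, 35, 40, and 50, the interior ticks on the Figure 2 axis. The sketch below is inferred from those ticks and from the ten rows above, not taken from the authors' software; the function name `age_memberships` is hypothetical.

```python
def age_memberships(age):
    """Return (young, middle, old) membership values for an age.

    Breakpoints 25, 35, 40, 50 are an assumption read off the figure axis:
    Young falls from 1 to 0 between 25 and 35, Middle is 1 between 35 and 40,
    and Old rises from 0 to 1 between 40 and 50.
    """
    young = 1.0 if age <= 25 else max(0.0, (35 - age) / 10)
    old = 0.0 if age <= 40 else min(1.0, (age - 40) / 10)
    if age <= 25 or age >= 50:
        middle = 0.0
    elif age < 35:
        middle = (age - 25) / 10
    elif age <= 40:
        middle = 1.0
    else:
        middle = (50 - age) / 10
    return round(young, 3), round(middle, 3), round(old, 3)

# Ages of the ten demo cases, in case order.
for case, age in enumerate([20, 26, 46, 31, 28, 21, 46, 25, 38, 27], start=1):
    print(case, age, *age_memberships(age))
```

Under these assumed breakpoints the three memberships sum to 1 at every age, which matches each row of the table.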
Calculate Support for Each Pair of Fuzzy Categories
• Membership value
– Identify weights for each attribute
– Identify highest fuzzy membership category for each case
• Membership value = minimum weight associated with highest fuzzy membership category
• Support
– Average membership value for all cases
Support by Single Item
Category Region Weight Sup(Rjk)
Age Young R11 0.45 0.261
Age Middle R12 0.45 0.135
Age Old R13 0.45 0.059
Income High R21 0.55 0.000
Income Middle R22 0.55 0.490
Income Low R23 0.55 0.060
Risk High R31 0.70 0.320
Risk Middle R32 0.70 0.146
Risk Low R33 0.70 0.233
Credit Good R41 0.80 0.576
Credit Bad R42 0.80 0.244
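One reading of the single-item supports, offered as an assumption rather than as the authors' exact formula, is that support equals the attribute weight times the average membership value over all cases. Applied to the fuzzified Age table with weight 0.45, this reproduces 0.261 for Young and 0.135 for Middle. For Old it gives 0.054 against the listed 0.059, so the slide's figure may come from a slightly different calculation. The name `weighted_support` is hypothetical.

```python
# Membership values per case, copied from the "Fuzzify Age" table.
age_membership_table = {
    "Young":  [1.0, 0.9, 0.0, 0.4, 0.7, 1.0, 0.0, 1.0, 0.0, 0.8],
    "Middle": [0.0, 0.1, 0.4, 0.6, 0.3, 0.0, 0.4, 0.0, 1.0, 0.2],
    "Old":    [0.0, 0.0, 0.6, 0.0, 0.0, 0.0, 0.6, 0.0, 0.0, 0.0],
}
AGE_WEIGHT = 0.45  # weight listed for the Age regions R11..R13

def weighted_support(memberships, weight):
    """Assumed formula: weight times the mean membership over all cases."""
    return weight * sum(memberships) / len(memberships)

age_support = {
    region: round(weighted_support(values, AGE_WEIGHT), 3)
    for region, values in age_membership_table.items()
}
print(age_support)
```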
Support
• If support for pair of categories is above minimum support, retain
• Identifies all pairs of fuzzy categories with sufficiently strong relationship
• For outcomes, R51 (On Time) strong, R52 (Default) not
Support by Pair: minsup 0.25
R11R22 0.235 R22R41 0.419
R11R31 0.207 R22R51 0.449
R11R41 0.212 R31R41 0.266
R11R51 0.230 R31R51 0.264
R22R31 0.237 R41R51 0.560
Support by Triplet: minsup 0.25
R22R41R51 0.417
R22R31R41 0.198
R22R31R51 0.196
R31R41R51 0.264
Quartets
• None qualify, so algorithm stops
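The pair, triplet, and quartet steps follow the level-wise pattern of APriori: a larger itemset is only worth considering if every one of its smaller subsets already passed minsup. The sketch below applies that pruning to the supports listed on the slides, using the strict "support > minimum" test from the Algorithm slide. The helper names `frequent` and `candidates` are hypothetical, and the code filters the published supports rather than recomputing them from the data.

```python
from itertools import combinations

MINSUP = 0.25

pair_support = {
    ("R11", "R22"): 0.235, ("R11", "R31"): 0.207, ("R11", "R41"): 0.212,
    ("R11", "R51"): 0.230, ("R22", "R31"): 0.237, ("R22", "R41"): 0.419,
    ("R22", "R51"): 0.449, ("R31", "R41"): 0.266, ("R31", "R51"): 0.264,
    ("R41", "R51"): 0.560,
}
triplet_support = {
    ("R22", "R41", "R51"): 0.417, ("R22", "R31", "R41"): 0.198,
    ("R22", "R31", "R51"): 0.196, ("R31", "R41", "R51"): 0.264,
}

def frequent(table, minsup):
    """Itemsets whose listed support exceeds minsup."""
    return {frozenset(items) for items, sup in table.items() if sup > minsup}

def candidates(frequent_sets, size):
    """Itemsets of the given size whose (size - 1)-subsets are all frequent."""
    items = sorted({item for itemset in frequent_sets for item in itemset})
    return [
        combo for combo in combinations(items, size)
        if all(frozenset(sub) in frequent_sets
               for sub in combinations(combo, size - 1))
    ]

frequent_pairs = frequent(pair_support, MINSUP)
triplet_candidates = candidates(frequent_pairs, 3)
frequent_triplets = frequent(triplet_support, MINSUP)
quartet_candidates = candidates(frequent_triplets, 4)

print("frequent pairs:", sorted(sorted(p) for p in frequent_pairs))
print("triplet candidates:", triplet_candidates)
print("quartet candidates:", quartet_candidates)
```

With these numbers, five pairs survive, two triplets survive, and no quartet candidate can be formed, which agrees with the slide's conclusion that the algorithm stops at quartets.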
Confidence
• Identify direction
• For those training set cases involving the pair of attributes, what proportion came out as predicted?
Confidence Values: Pairs
Minimum confidence 0.9
R22→R41 0.855
R41→R22 0.727
R22→R51 0.916
R51→R22 0.697
R31→R41 0.831
R41→R31 0.462
R31→R51 0.825
R51→R31 0.410
R41→R51 0.972
R51→R41 0.870
Triplets
R41R22→R51 0.995
R41R51→R22 0.744
R22R51→R41 0.928
R31R41→R51 0.993
R31R51→R41 1.000
R51R41→R31 0.472
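The pair confidences listed here agree, to three decimals, with the standard ratio conf(X→Y) = Sup(X and Y) / Sup(X) applied to the support tables. The sketch below assumes that definition and checks the rules that predict R51 against a minimum confidence of 0.9; the dictionary layout and the function name `confidence` are hypothetical. Because the inputs are the rounded supports from the slides, triplet results can differ from the listed values in the last digit (for example 0.992 here versus 0.993 for R31R41→R51).

```python
MINCONF = 0.9

# Supports copied from the single-item, pair, and triplet tables.
support = {
    frozenset({"R22"}): 0.490,
    frozenset({"R41"}): 0.576,
    frozenset({"R22", "R41"}): 0.419,
    frozenset({"R31", "R41"}): 0.266,
    frozenset({"R22", "R51"}): 0.449,
    frozenset({"R41", "R51"}): 0.560,
    frozenset({"R22", "R41", "R51"}): 0.417,
    frozenset({"R31", "R41", "R51"}): 0.264,
}

def confidence(antecedent, consequent):
    """Assumed definition: Sup(antecedent and consequent) / Sup(antecedent)."""
    lhs = frozenset(antecedent)
    return support[lhs | frozenset(consequent)] / support[lhs]

rule_antecedents = [("R22",), ("R41",), ("R22", "R41"), ("R31", "R41")]
accepted = []
for lhs in rule_antecedents:
    conf = round(confidence(lhs, ("R51",)), 3)
    if conf > MINCONF:
        accepted.append((lhs, conf))
    print(" AND ".join(lhs), "-> R51", conf)
```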
4 Rules
• IF Income is Middle THEN Outcome is On-Time
– R22→R51 support 0.490 confidence 0.916
• IF Credit is Good THEN Outcome is On-Time
– R41→R51 support 0.576 confidence 0.972
• IF Income is Middle AND Credit is Good THEN Outcome is On-Time
– R22R41→R51 support 0.419 confidence 0.995
• IF Risk is High AND Credit is Good THEN Outcome is On-Time
– R31R41→R51 support 0.266 confidence 0.993
Rules vs. Support
Figure 7: The relationship between number of association rules and minsup using the proposed method (x-axis: minsup at 0.2, 0.25, 0.3, 0.35, 0.4, 0.55; y-axis: the number of association rules, 0 to 20; one series each for minconf = 0.55, 0.65, 0.75, 0.85, 0.95, 1)
Rules vs. Confidence
Figure 8: The relationship between number of association rules and minconf using the proposed method (x-axis: minconf at 0.55, 0.65, 0.75, 0.85, 0.95, 1; y-axis: the number of association rules, 0 to 20; one series each for minsup = 0.2, 0.25, 0.3, 0.35, 0.4, 0.55)
Higher order combinations
• Try triplets
– If ambitious, sets of 4, and beyond
• Here, none
• Problems:
– Computational complexity explodes
– Doesn’t guarantee total coverage
• That also would explode complexity
• Can control by lowering minsup, minconf
Simulation Testing
• Selected 550 cases
– Held out 100
• Randomly assigned weights to each fuzzy region of each attribute
– minsup {0.35, 0.45, 0.55, 0.65}
– minconf {0.7, 0.8, 0.9}
Simulation Results
Figure: Accuracy vs. minsup & minconf (x-axis: minsup at 0.35, 0.45, 0.55, 0.65; y-axis: Accuracy, 0 to 0.7; series: weighted minconf=0.7, weighted minconf=0.8, weighted minconf=0.9)