
1

Associative Classification of Imbalanced Datasets

Sanjay Chawla

School of IT

University of Sydney

2

Overview

• Data Mining Tasks

• Associative Classifiers

• Downside of Support and Confidence

• Mining Rules from Imbalanced Data Sets
– Fisher’s Exact Test
– Class Correlation Ratio (CCR)
– Searching and Pruning Strategies
– Experiments

3

Data Mining

• Data mining research has settled into an equilibrium involving four tasks:
– Pattern Mining (Association Rules)
– Classification
– Clustering
– Anomaly or Outlier Detection

[Diagram labels: Associative Classifier; DB; ML]

5

Association Rules (Agrawal, Imielinski and Swami, SIGMOD 1993)

– An implication expression of the form X → Y, where X and Y are itemsets

– Example: {Milk, Diaper} → {Beer}

• Rule Evaluation Metrics
– Support (s): fraction of transactions that contain both X and Y
– Confidence (c): measures how often items in Y appear in transactions that contain X

Example: {Milk, Diaper} → {Beer}, where σ(·) is the support count

$$s = \frac{\sigma(\text{Milk, Diaper, Beer})}{|T|} = \frac{2}{5} = 0.4$$

$$c = \frac{\sigma(\text{Milk, Diaper, Beer})}{\sigma(\text{Milk, Diaper})} = \frac{2}{3} = 0.67$$

TID Items

1 Bread, Milk

2 Bread, Diaper, Beer, Eggs

3 Milk, Diaper, Beer, Coke

4 Bread, Milk, Diaper, Beer

5 Bread, Milk, Diaper, Coke

From “Introduction to Data Mining”, Tan, Steinbach and Kumar
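The two metrics can be checked against the five-transaction table with a few lines of Python. This is a minimal sketch. The names `support_count`, `support` and `confidence` are illustrative and do not come from any library.

```python
# Transactions from the example table (Tan, Steinbach and Kumar).
TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def support_count(itemset, transactions):
    """sigma(itemset): number of transactions containing every item in itemset."""
    return sum(1 for t in transactions if itemset <= t)


def support(x, y, transactions):
    """s(X -> Y): fraction of transactions that contain both X and Y."""
    return support_count(x | y, transactions) / len(transactions)


def confidence(x, y, transactions):
    """c(X -> Y): fraction of transactions containing X that also contain Y."""
    return support_count(x | y, transactions) / support_count(x, transactions)


x, y = {"Milk", "Diaper"}, {"Beer"}
print(f"s = {support(x, y, TRANSACTIONS):.2f}")     # s = 0.40
print(f"c = {confidence(x, y, TRANSACTIONS):.2f}")  # c = 0.67
```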

6

Mining Association Rules

• Two-step approach:

1. Frequent Itemset Generation
– Generate all itemsets whose support ≥ minsup

2. Rule Generation
– Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset

• Frequent itemset generation is computationally expensive
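The two-step approach can be sketched as a small level-wise (Apriori-style) miner. This is an illustrative sketch under simple assumptions: transactions are Python sets, support is a fraction, and counting is brute force. It is not optimized, and the function names are hypothetical.

```python
from itertools import combinations

TRANSACTIONS = [
    {"Bread", "Milk"},
    {"Bread", "Diaper", "Beer", "Eggs"},
    {"Milk", "Diaper", "Beer", "Coke"},
    {"Bread", "Milk", "Diaper", "Beer"},
    {"Bread", "Milk", "Diaper", "Coke"},
]


def frequent_itemsets(transactions, minsup):
    """Step 1: return {itemset: support} for every itemset with support >= minsup."""
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    items = sorted({i for t in transactions for i in t})
    level = [frozenset([i]) for i in items if sup(frozenset([i])) >= minsup]
    frequent = {s: sup(s) for s in level}
    k = 2
    while level:
        # Join: combine frequent (k-1)-itemsets into k-itemsets.
        candidates = {p | q for p in level for q in level if len(p | q) == k}
        # Prune: a k-itemset can only be frequent if all its (k-1)-subsets are.
        candidates = [c for c in candidates
                      if all(frozenset(s) in frequent for s in combinations(c, k - 1))]
        level = [c for c in candidates if sup(c) >= minsup]
        frequent.update((c, sup(c)) for c in level)
        k += 1
    return frequent


def generate_rules(frequent, minconf):
    """Step 2: split each frequent itemset into X -> Y and keep high-confidence rules."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for x in map(frozenset, combinations(itemset, r)):
                conf = s / frequent[x]  # X is frequent because itemset is
                if conf >= minconf:
                    rules.append((set(x), set(itemset - x), s, conf))
    return rules


freq = frequent_itemsets(TRANSACTIONS, minsup=0.4)
for x, y, s, c in generate_rules(freq, minconf=0.6):
    print(sorted(x), "->", sorted(y), f"s={s:.2f}", f"c={c:.2f}")
```

Most of the cost sits in step 1, which matches the remark above. The prune step uses the fact that every subset of a frequent itemset is frequent.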


8

Associative Classifiers

• Most associative classifiers are built from rules discovered using the support-confidence framework.

• The classifier itself is a collection of rules ranked by their support or confidence.
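A classifier of this kind can be sketched as a ranked rule list that predicts with the first matching rule. The ordering below sorts by confidence and breaks ties by support. It is similar in spirit to CBA's precedence ordering, but it is an assumption here and not a rule stated in this talk. `Rule`, `rank` and `classify` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    antecedent: frozenset  # items that must all be present
    label: str             # class predicted by the rule
    support: float
    confidence: float


def rank(rules):
    """Order rules by confidence, breaking ties by support (both descending)."""
    return sorted(rules, key=lambda r: (-r.confidence, -r.support))


def classify(ranked_rules, items, default):
    """Predict with the highest-ranked rule whose antecedent is contained in items."""
    for rule in ranked_rules:
        if rule.antecedent <= items:
            return rule.label
    return default  # no rule fires


# Illustrative rules; their numbers match the Gender example on the next slide.
RULES = rank([
    Rule(frozenset({"Bread", "Milk"}), "F", support=0.4, confidence=2 / 3),
    Rule(frozenset({"Beer", "Diaper"}), "M", support=0.6, confidence=1.0),
])
print(classify(RULES, {"Bread", "Milk", "Diaper", "Beer"}, default="F"))  # M
print(classify(RULES, {"Bread", "Milk", "Coke"}, default="M"))            # F
```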

9

Associative Classifiers (2)

TID Items Gender

1 Bread, Milk F

2 Bread, Diaper, Beer, Eggs M

3 Milk, Diaper, Beer, Coke M

4 Bread, Milk, Diaper, Beer M

5 Bread, Milk, Diaper, Coke F

In a classification task, we want to predict the class label (Gender) from the attributes (the items).

A good (albeit stereotypical) rule is {Beer, Diaper} → Male, whose support is 60% and confidence is 100%.
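The 60% and 100% figures can be reproduced by treating the class label as the rule's consequent. This is a minimal sketch. The helper `class_rule_stats` is a hypothetical name, and a rule that covers no rows is given confidence 0 by assumption.

```python
# (items, class label) rows of the Gender table above.
DATA = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]


def class_rule_stats(x, label, data):
    """Support and confidence of the class association rule X -> label."""
    covered = [y for items, y in data if x <= items]  # rows matching X
    hits = sum(1 for y in covered if y == label)        # ...that also have the label
    support = hits / len(data)
    confidence = hits / len(covered) if covered else 0.0  # assumed convention
    return support, confidence


s, c = class_rule_stats({"Beer", "Diaper"}, "M", DATA)
print(f"support = {s:.0%}, confidence = {c:.0%}")  # support = 60%, confidence = 100%
```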


11

Imbalanced Data Set

• In some application domains, data sets are imbalanced:
– The proportion of samples from one class is much smaller than that of the other class or classes.
– The smaller class is the class of interest.

• Support and confidence are biased toward the majority class, and do not perform well in such cases.

12

Downsides of Support

• Support is biased towards the majority class
– E.g. classes = {yes, no}, sup({yes}) = 90%
– minSup > 10% wipes out any rule predicting “no”
– Suppose X → no has confidence 1 and support 3%. The rule is discarded if minSup > 3%, even though it perfectly predicts 30% of the instances in the minority class!
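The arithmetic behind this example fits in a few lines. This sketch assumes only the numbers on the slide. `kept` is a hypothetical helper that models the usual "support ≥ minSup" filter.

```python
P_NO = 0.10          # sup({no}) = 1 - sup({yes}); the minority class
RULE_SUPPORT = 0.03  # sup(X -> no); the slide gives conf(X -> no) = 1


def kept(rule_support, min_sup):
    """The support-confidence framework keeps a rule only if support >= minSup."""
    return rule_support >= min_sup


# Fraction of "no" instances the rule covers (all correctly, since conf = 1).
coverage = RULE_SUPPORT / P_NO
print(f"covers {coverage:.0%} of the minority class")  # covers 30% of the minority class

# No rule predicting "no" can have support above sup({no}) = 10%.
for min_sup in (0.01, 0.05, 0.15):
    print("minSup", min_sup, "keeps X -> no:", kept(RULE_SUPPORT, min_sup))
```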

13

Downside of Confidence(1)

      C     ¬C     Σ

A     20    5      25

¬A    70    5      75

Σ     90    10     100

Conf(A → C) = 20/25 = 0.8

Support(A → C) = 20/100 = 0.2

Correlation between A and C:

$$\frac{P(A, C)}{P(A)\,P(C)} = \frac{0.20}{0.25 \times 0.90} \approx 0.89$$

Thus, when the data set is imbalanced, a rule with high support and high confidence does not necessarily have a positively correlated antecedent and consequent: here conf(A → C) = 0.8, yet A and C are negatively correlated (0.89 < 1).
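The three numbers on this slide can be computed directly from the four cell counts. This is a minimal sketch. `rule_measures` is a hypothetical helper, and "correlation" here means P(A, C) / (P(A) P(C)), as on the slide.

```python
def rule_measures(n_a_c, n_a_notc, n_nota_c, n_nota_notc):
    """Support, confidence and correlation of A -> C from the four cell counts."""
    n = n_a_c + n_a_notc + n_nota_c + n_nota_notc
    p_ac = n_a_c / n                # P(A, C)
    p_a = (n_a_c + n_a_notc) / n    # P(A), the A row total
    p_c = (n_a_c + n_nota_c) / n    # P(C), the C column total
    return {
        "support": p_ac,
        "confidence": p_ac / p_a,
        "correlation": p_ac / (p_a * p_c),  # > 1 positive, < 1 negative
    }


m = rule_measures(20, 5, 70, 5)
print({k: round(v, 2) for k, v in m.items()})
# {'support': 0.2, 'confidence': 0.8, 'correlation': 0.89}
```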

14

Downside of Confidence (2)

• It is reasonable to expect that, for “good rules”, the antecedent and consequent are not independent!

• Suppose
– P(Class=Yes) = 0.9
– P(Class=Yes|X) = 0.9
– Then X → Yes has confidence 0.9, yet X and the class are independent.

15

Downsides of Confidence (3)

Another useful observation:

• For a rule predicting the minority class, higher confidence (support) implies higher correlation, and lower correlation implies lower confidence; neither implication holds for the majority class.

• Confidence (support) is therefore biased toward the majority class.
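One way to see this asymmetry is to rewrite the correlation of an antecedent X and a class y in terms of the rule's confidence. This is a standard identity, added here for illustration:

$$\mathrm{corr}(X \to y) = \frac{P(X, y)}{P(X)\,P(y)} = \frac{\mathrm{conf}(X \to y)}{P(y)}$$

So X → y is positively correlated exactly when conf(X → y) > P(y). For a minority class with P(y) = 0.1, any confidence above 0.1 already implies positive correlation. For a majority class with P(y) = 0.9, a confidence of 0.8 still gives corr = 0.8/0.9 ≈ 0.89 < 1, as in the A → C example.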


17

Contingency Table

        X        ¬X       ∑rows

y       a        b        a + b

¬y      c        d        c + d

∑cols   a + c    b + d    n = a + b + c + d

• A 2 × 2 contingency table for X → y.

• We will use the notation [a, b; c, d] to represent this table.
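Counting the four cells from labelled transactions is straightforward. This sketch assumes rows are (item set, class label) pairs, with y / ¬y as rows and X present / absent as columns. `contingency_table` is a hypothetical name, and the example data is the Gender table from the Associative Classifiers slide.

```python
def contingency_table(x, y, data):
    """Counts (a, b, c, d) for X -> y in the layout [a, b; c, d]:
    rows are y / not y, columns are X present / X absent."""
    a = b = c = d = 0
    for items, label in data:
        has_x = x <= items
        if label == y:
            if has_x:
                a += 1  # X and y
            else:
                b += 1  # not X, y
        elif has_x:
            c += 1      # X, not y
        else:
            d += 1      # not X, not y
    return a, b, c, d


DATA = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]
print(contingency_table({"Beer", "Diaper"}, "M", DATA))  # (3, 0, 0, 2)
```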

18

Fisher Exact Test

• Given a table [a, b; c, d], Fisher’s Exact Test computes the probability (p-value) of obtaining the given table, or one more positively associated, under the hypothesis that {X, ¬X} and {y, ¬y} are independent.

• The margin sums (∑rows, ∑cols) are fixed.

19

Fisher Exact Test (2)

• The p-value is given by:

$$p([a, b; c, d]) = \sum_{i=0}^{\min(b, c)} \frac{(a+b)!\,(c+d)!\,(a+c)!\,(b+d)!}{n!\,(a+i)!\,(b-i)!\,(c-i)!\,(d+i)!}$$

• We will only use rules whose p-values are below the desired significance level (e.g. 0.01).

• Rules that pass this test are statistically significant in the positively associated direction (e.g. X → y).
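The p-value formula can be evaluated exactly with integer binomial coefficients. Each term of the sum is the hypergeometric probability of the table [a+i, b−i; c−i, d+i] with the same margins. This is a sketch using only the standard library, and `fisher_p_value` is a hypothetical name.

```python
from math import comb


def fisher_p_value(a, b, c, d):
    """One-sided Fisher's exact test p-value for the table [a, b; c, d].

    Sums, over i = 0..min(b, c), the hypergeometric probability of the table
    [a+i, b-i; c-i, d+i]: the observed table plus every table with the same
    margins that is more positively associated (larger a).
    """
    n = a + b + c + d
    # comb(a+b, a+i) * comb(c+d, c-i) / comb(n, a+c) equals the factorial formula.
    numerator = sum(comb(a + b, a + i) * comb(c + d, c - i)
                    for i in range(min(b, c) + 1))
    return numerator / comb(n, a + c)


print(fisher_p_value(3, 1, 1, 3))  # 17/70, about 0.243
print(fisher_p_value(3, 0, 0, 2))  # 0.1: {Beer, Diaper} -> M on the five-row Gender table
```

On the five-row toy table, even the perfect rule {Beer, Diaper} → M has p = 0.1, so it would not pass a 0.01 significance level. If SciPy is available, `scipy.stats.fisher_exact([[a, b], [c, d]], alternative="greater")` should report the same one-sided p-value.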


21

Class Correlation Ratio

• In Class Correlation, we are interested in rules X → y where X is more positively correlated with y than it is with ¬y.

• The correlation is defined by:

$$\mathrm{corr}(X \to y) = \frac{\sup(X \cup y)\,|T|}{\sup(X)\,\sup(y)} = \frac{a\,n}{(a + c)(a + b)}$$

where |T| = n is the number of transactions.

22

Class Correlation Ratio (2)

• We then use corr() to measure how correlated X is with y compared to ¬y.

• X and y are positively correlated if corr(X→y)>1, and negatively correlated if corr(X→y)<1.
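In terms of the table [a, b; c, d], corr(X → ¬y) is obtained by swapping the y and ¬y rows. This is a minimal sketch. The helper names are hypothetical, and it assumes X, y and ¬y each occur at least once, so no denominator is zero.

```python
def corr(a, b, c, d):
    """corr(X -> y) = a n / ((a + c)(a + b)) for the table [a, b; c, d]."""
    n = a + b + c + d
    return a * n / ((a + c) * (a + b))


def corr_not_y(a, b, c, d):
    """corr(X -> not y): swap the y and not-y rows, giving c n / ((a + c)(c + d))."""
    return corr(c, d, a, b)


# The Downside of Confidence (1) table with X = A, y = C: a=20, b=70, c=5, d=5.
print(round(corr(20, 70, 5, 5), 2))  # 0.89 -> negatively correlated with C
print(corr_not_y(20, 70, 5, 5))      # 2.0 -> positively correlated with not C
```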

23

Class Correlation Ratio (3)

• Based on correlation corr(), we define the Class Correlation Ratio (CCR):

$$\mathrm{CCR}(X \to y) = \frac{\mathrm{corr}(X \to y)}{\mathrm{corr}(X \to \neg y)} = \frac{a\,(c + d)}{c\,(a + b)}$$

• The CCR measures how much more positively the antecedent is correlated with the class it predicts (e.g. y), relative to the alternative class (e.g. ¬y).
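The closed form a(c + d) / (c(a + b)) avoids computing the two correlations separately. In this sketch, the handling of c = 0 (X never occurs with ¬y) and a = 0 is an assumption of the example, not something the slides specify. `ccr` is a hypothetical name.

```python
import math


def ccr(a, b, c, d):
    """CCR(X -> y) = corr(X -> y) / corr(X -> not y) = a (c + d) / (c (a + b))."""
    if c == 0:
        # corr(X -> not y) = 0: treat the ratio as infinite, or undefined
        # if a == 0 too (an assumption of this sketch).
        return math.inf if a > 0 else math.nan
    if a == 0:
        return 0.0  # corr(X -> y) = 0
    return a * (c + d) / (c * (a + b))


print(round(ccr(20, 70, 5, 5), 2))  # 0.44: confidence 0.8, yet CCR < 1
print(ccr(3, 0, 0, 2))              # inf: X = {Beer, Diaper} never occurs with F
```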

24

Class Correlation Ratio (4)

$$\mathrm{CCR}(X \to y) = \frac{\mathrm{corr}(X \to y)}{\mathrm{corr}(X \to \neg y)}$$

• We only use rules whose CCR exceeds a desired threshold (e.g. 1), so that no rule is used that is more positively associated with a class it does not predict.

25

The two measurements

• We perform the following tests to determine whether a potentially interesting rule is indeed interesting:
– Check the significance of a rule X → y by performing Fisher’s Exact Test.
– Check whether CCR(X → y) > 1.

• Rules that pass both tests are candidates for the classification task.
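The two tests can be combined into a single filter. This sketch assumes a significance level of 0.01 and a CCR threshold of 1, and it restates the one-sided Fisher p-value so the block runs on its own. `passes_both_tests` is a hypothetical name.

```python
from math import comb


def fisher_p_value(a, b, c, d):
    """One-sided Fisher's exact test p-value (positive association) for [a, b; c, d]."""
    n = a + b + c + d
    return sum(comb(a + b, a + i) * comb(c + d, c - i)
               for i in range(min(b, c) + 1)) / comb(n, a + c)


def passes_both_tests(a, b, c, d, alpha=0.01, min_ccr=1.0):
    """Keep X -> y only if it is significant (p < alpha) and CCR(X -> y) > min_ccr."""
    if fisher_p_value(a, b, c, d) >= alpha:
        return False
    # Cross-multiplied form of a (c + d) / (c (a + b)) > min_ccr,
    # which avoids dividing by zero when c = 0.
    return a * (c + d) > min_ccr * c * (a + b)


print(passes_both_tests(20, 5, 5, 70))  # True
print(passes_both_tests(20, 70, 5, 5))  # False: conf(X -> y) = 0.8 but not significant, CCR < 1
```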


27

Search and Pruning Strategies

• To avoid examining the whole set of possible rules, we use search strategies that ensure the concept of being potentially interesting is anti-monotonic:

X → y might be considered potentially interesting if and only if all {X′ → y | X′ ⊂ X} have been found to be potentially interesting.
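The anti-monotone search can be sketched as a level-wise enumeration of antecedents for a fixed class y. The predicate `is_potentially_interesting` is a hypothetical stand-in for the tests in this talk (Fisher’s Exact Test and CCR against generalizations). The sketch only shows how pruning limits which antecedents are ever evaluated.

```python
from itertools import combinations


def levelwise_search(items, is_potentially_interesting, max_len=3):
    """Return every antecedent X (up to max_len items) found potentially interesting.

    X is evaluated only if all of its (k-1)-item subsets were found potentially
    interesting; by induction, this covers all proper non-empty subsets of X.
    """
    level = [frozenset([i]) for i in sorted(items)
             if is_potentially_interesting(frozenset([i]))]
    found = set(level)
    k = 2
    while level and k <= max_len:
        candidates = {p | q for p in level for q in level if len(p | q) == k}
        level = [x for x in candidates
                 if all(frozenset(s) in found for s in combinations(x, k - 1))
                 and is_potentially_interesting(x)]
        found.update(level)
        k += 1
    return found


# Toy predicate (assumed): any antecedent containing "z" is uninteresting.
evaluated = []


def toy_predicate(x):
    evaluated.append(x)
    return "z" not in x


print(sorted(sorted(x) for x in levelwise_search({"a", "b", "z"}, toy_predicate)))
# [['a'], ['a', 'b'], ['b']]
```

Because {z} fails, no superset of it, such as {a, z}, is ever passed to the predicate.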

28

Search and Pruning Strategies (2)

• For the Aggressive search strategy, the significance of the rule X → y is tested in comparison to one of its generalizations X − {z} → y, using the following contingency table [a, b; c, d]:

}){sup()sup(}){sup()sup(

)}{sup()sup()}{sup()sup(:

}}{sup()sup()}{sup()sup(:

}{:}{::

zXdcbaXzXdbXca

yzXdcyXyzXdyXctyt

yzXbayXyzXbyXatyt

tzXttztzXttXt
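This table can be counted directly from labelled transactions, restricted to those that contain the generalization X − {z}. This is a sketch that assumes z ∈ X and rows given as (item set, class label) pairs. `aggressive_table` is a hypothetical name, and the data is the Gender table from earlier in the talk.

```python
def aggressive_table(x, z, y, data):
    """[a, b; c, d] comparing X -> y with its generalization (X - {z}) -> y.

    Only transactions containing X - {z} are counted. Column 1 holds those that
    also contain z (i.e. contain X); column 2 holds those that do not contain z.
    Rows are y / not y. Assumes z is an item of X.
    """
    general = x - {z}
    a = b = c = d = 0
    for items, label in data:
        if not general <= items:
            continue  # outside the generalization's coverage
        contains_x = z in items
        if label == y:
            if contains_x:
                a += 1
            else:
                b += 1
        elif contains_x:
            c += 1
        else:
            d += 1
    return a, b, c, d


DATA = [
    ({"Bread", "Milk"}, "F"),
    ({"Bread", "Diaper", "Beer", "Eggs"}, "M"),
    ({"Milk", "Diaper", "Beer", "Coke"}, "M"),
    ({"Bread", "Milk", "Diaper", "Beer"}, "M"),
    ({"Bread", "Milk", "Diaper", "Coke"}, "F"),
]
print(aggressive_table(frozenset({"Beer", "Diaper"}), "Beer", "M", DATA))  # (3, 0, 0, 1)
```

The cell sums match the definitions above. a + c = 3 is sup({Beer, Diaper}), and a + b + c + d = 4 is sup({Diaper}). The resulting table can then be passed to Fisher’s Exact Test and the CCR.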

29

Example

• Suppose we have already determined that the rules (A = a1) → 1 and (A = a2) → 1 are significant.

• Now we want to test whether the rule X → 1, where X = (A = a1) ∧ (A = a2), is significant.

• We then carry out Fisher’s Exact Test and calculate the CCR on X against X − {A = a1} (i.e. z = (A = a1)), and on X against X − {A = a2} (i.e. z = (A = a2)).

• If the smaller of their p-values is less than the significance level, and their CCR values are greater than 1, we keep the rule X → 1; otherwise we discard it.

30

Ranking Rules

• Strength Score (SS):
– To determine how interesting a rule is, we need a ranking (ordering) of the rules; this ordering is defined by the Strength Score.


32

Experiments (Balanced Data)

• The approach described above is called “SPARCCC”.

• Experiments on balanced data sets show that the average accuracy of SPARCCC compares favourably with CBA and C4.5.
– The table below gives the prediction accuracy on balanced data sets.

33

Experiments (Imbalanced Data)

• True Positive Rate (Recall/Sensitivity) is a better performance measure for imbalanced data sets.

• SPARCCC outperforms other rule-based techniques such as CBA and CCCS.
– The table below gives the true positive rate of the minority class on imbalanced versions of the data sets.
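To see why the minority-class true positive rate is more informative than accuracy here, the following sketch scores a trivial classifier that always predicts the majority class. The labels are hypothetical, and the helper names are illustrative.

```python
def true_positive_rate(y_true, y_pred, positive):
    """TPR (recall / sensitivity) = TP / (TP + FN) for the given positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    return tp / (tp + fn)


def accuracy(y_true, y_pred):
    """Fraction of all predictions that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical labels: 90 "yes" (majority) and 10 "no" (minority, of interest).
y_true = ["yes"] * 90 + ["no"] * 10
always_yes = ["yes"] * 100  # a trivial majority-class predictor

print(accuracy(y_true, always_yes))                  # 0.9
print(true_positive_rate(y_true, always_yes, "no"))  # 0.0
```

The classifier looks good by accuracy yet never finds a single minority-class instance, which is what the true positive rate exposes.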

34

References

• Florian Verhein and Sanjay Chawla. Using Significant, Positively Associated and Relatively Class Correlated Rules for Associative Classification of Imbalanced Datasets. In The 2007 IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, October 28-31, 2007.