
Page 1: Cost-Sensitive Classifier Evaluation

Robert Holte
Computing Science Dept.
University of Alberta

Co-author: Chris Drummond
IIT, National Research Council, Ottawa

Page 2: Classifiers

• A classifier assigns an object to one of a predefined set of categories or classes.
• Examples:
  – A metal detector either sounds an alarm or stays quiet when someone walks through.
  – A credit card application is either approved or denied.
  – A medical test's outcome is either positive or negative.
• This talk: only two classes, "positive" and "negative".

Page 3: Two Types of Error

False negative ("miss"), FN: the alarm doesn't sound but the person is carrying metal.

False positive ("false alarm"), FP: the alarm sounds but the person is not carrying metal.

Page 4: 2-class Confusion Matrix

True class        Predicted pos    Predicted neg
positive (#P)     #TP              #P - #TP
negative (#N)     #FP              #N - #FP

• Reduce the 4 numbers to two rates:
  true positive rate = TP = (#TP)/(#P)
  false positive rate = FP = (#FP)/(#N)
• Rates are independent of class ratio*

* subject to certain conditions
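As a concrete illustration, here is a minimal sketch (Python; our own code, not from the talk) of reducing the four counts to the two rates:

```python
def rates(n_tp, n_fn, n_fp, n_tn):
    """Reduce a 2-class confusion matrix to (TP rate, FP rate)."""
    tp_rate = n_tp / (n_tp + n_fn)   # (#TP)/(#P)
    fp_rate = n_fp / (n_fp + n_tn)   # (#FP)/(#N)
    return tp_rate, fp_rate

# Classifier 1 from the next slide: 40/60 on positives, 30/70 on negatives
print(rates(40, 60, 30, 70))  # -> (0.4, 0.3)
```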

Page 5: Example: 3 classifiers

Classifier 1 (TP = 0.4, FP = 0.3):

True \ Predicted    pos    neg
pos                  40     60
neg                  30     70

Classifier 2 (TP = 0.7, FP = 0.5):

True \ Predicted    pos    neg
pos                  70     30
neg                  50     50

Classifier 3 (TP = 0.6, FP = 0.2):

True \ Predicted    pos    neg
pos                  60     40
neg                  20     80

Page 6: Assumptions

• Standard Cost Model
  – correct classification costs 0
  – cost of misclassification depends only on the class, not on the individual example
  – over a set of examples costs are additive
• Costs or Class Distributions:
  – are not known precisely at evaluation time
  – may vary with time
  – may depend on where the classifier is deployed
• True FP and TP do not vary with time or location, and are accurately estimated.

Page 7: How to Evaluate Performance?

• Scalar Measures
  – Accuracy
  – Expected cost
  – Area under the ROC curve
• Visualization Techniques
  – ROC curves
  – Cost Curves

Page 8: What's Wrong with Scalars?

• A scalar does not tell the whole story.
  – There are fundamentally two numbers of interest (FP and TP); a single number invariably loses some information.
  – How are errors distributed across the classes?
  – How will each classifier perform in different testing conditions (costs or class ratios other than those measured in the experiment)?
• A scalar imposes a linear ordering on classifiers.
  – What we want is to identify the conditions under which each is better.

Page 9: What's Wrong with Scalars?

• A table of scalars is just a mass of numbers.
  – No immediate impact
  – Poor way to present results in a paper
  – Equally poor way for an experimenter to analyze results
• Some scalars (accuracy, expected cost) require precise knowledge of costs and class distributions.
  – Often these are not known precisely and might vary with time or location of deployment.

Page 10: Why Visualize Performance?

• The shape of a curve is more informative than a single number.
• A curve informs about
  – all possible misclassification costs*
  – all possible class ratios*
  – under what conditions C1 outperforms C2
• Immediate impact (if done well)

* subject to certain conditions

Page 11: Example: 3 classifiers (Page 5 repeated for reference)

Classifier 1: TP = 0.4, FP = 0.3
Classifier 2: TP = 0.7, FP = 0.5
Classifier 3: TP = 0.6, FP = 0.2

Page 12: ROC Plot for the 3 Classifiers

[Figure: the three classifiers as points in ROC space (FP on the x-axis, TP on the y-axis), with reference points marked: the ideal classifier at (0,1), the chance diagonal, "always negative" at (0,0), and "always positive" at (1,1).]

Page 13: Dominance

[Figure: an ROC curve that lies above and to the left of another at every point dominates it; the dominated classifier is never the better choice, whatever the costs and class distribution.]
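In code, dominance between two ROC points is a simple comparison; a sketch (Python; our own naming, not from the talk):

```python
def dominates(a, b):
    """True if ROC point a = (fp, tp) dominates b: at least as good on
    both rates and strictly better on at least one."""
    fp_a, tp_a = a
    fp_b, tp_b = b
    return fp_a <= fp_b and tp_a >= tp_b and (fp_a, tp_a) != (fp_b, tp_b)

# Classifier 3 (FP=0.2, TP=0.6) dominates Classifier 1 (FP=0.3, TP=0.4)
print(dominates((0.2, 0.6), (0.3, 0.4)))  # -> True
```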

Page 14: Operating Range

[Figure: ROC points joined to the trivial classifiers by straight lines.]

The slope of the line from (0,0) to a classifier's ROC point indicates the class distributions and misclassification costs for which the classifier is better than always-negative; the slope of the line from the ROC point to (1,1) does the same for always-positive.
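Under the standard cost model this can be made precise. Writing m = [(1 - p(+)) • C(+|-)] / [p(+) • C(-|+)] for the slope of an iso-performance line in ROC space, a classifier at (FP, TP) beats always-negative exactly when m < TP/FP, and beats always-positive exactly when m > (1 - TP)/(1 - FP). A small sketch (Python; our derivation, not code from the talk):

```python
def operating_range(fp, tp):
    """Interval of iso-performance slopes m = (1-p)*C(+|-) / (p*C(-|+))
    for which the classifier at ROC point (fp, tp) beats both trivial
    classifiers (always-negative and always-positive)."""
    lo = (1 - tp) / (1 - fp)   # m must exceed this to beat always-positive
    hi = tp / fp               # m must stay below this to beat always-negative
    return lo, hi

print(operating_range(0.2, 0.6))  # Classifier 3 -> (0.5, 3.0)
```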

Page 15: Convex Hull

[Figure: the convex hull of the ROC points.]

The slope of the hull segment joining the red classifier's ROC point to the blue one's indicates the class distributions and misclassification costs for which the red classifier performs the same as the blue one.
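The hull itself is a standard upper convex hull over the ROC points plus the two trivial classifiers; a monotone-chain sketch (Python; our own code, not from the talk):

```python
def roc_hull(points):
    """Upper convex hull of ROC points (fp, tp), with the trivial
    classifiers (0,0) and (1,1) included as endpoints."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # pop the last hull point while it lies on or below the chord to p
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

# Classifiers 1-3 as (fp, tp): only Classifier 3 survives on the hull
print(roc_hull([(0.3, 0.4), (0.5, 0.7), (0.2, 0.6)]))
# -> [(0.0, 0.0), (0.2, 0.6), (1.0, 1.0)]
```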

Page 16: Creating an ROC Curve

• A classifier produces a single ROC point.
• If the classifier has a "sensitivity" parameter, varying it produces a series of ROC points (confusion matrices).
• Alternatively, if the classifier is produced by a learning algorithm, a series of ROC points can be generated by varying the class ratio in the training set.
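A minimal sketch of the first method (Python; our own code, not from the talk): sweep a threshold over the classifier's scores and record one (FP, TP) point per threshold.

```python
def roc_points(scores, labels):
    """One ROC point per threshold; scores: higher = more positive,
    labels: 1 = positive, 0 = negative."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    pts = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        pts.append((fp / n_neg, tp / n_pos))
    return pts

print(roc_points([0.9, 0.8, 0.7, 0.4], [1, 0, 1, 0]))
# -> [(0.0, 0.0), (0.0, 0.5), (0.5, 0.5), (0.5, 1.0), (1.0, 1.0)]
```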

Page 17: ROC Curve

[Figure: a full ROC curve traced out by one of these methods.]

Page 18: What's Wrong with ROC Curves?

Page 19: ROC curves for two classifiers

[Figure: ROC curves for two classifiers, C4.5 and IB1, together with the default classifiers.]

• When to switch from C4.5 to IB1?
• What is the performance difference?
• When to use the default classifiers?
• How to tell if two ROC curves' difference is statistically significant?

Page 20: ROC curves from two cross-validation runs

[Figure: ROC curves from two cross-validation runs.]

• How to average them?
• How to compute a confidence interval for the average ROC curve?

Page 21:

And we would like to be able to answer all these questions by visual inspection …

Page 22: Cost Curves

Page 23: Cost Curves (1)

[Figure: cost curves. X-axis: Probability of Positive, P(+), from 0.0 to 1.0; y-axis: Error Rate, from 0.0 to 1.0. Each classifier is a straight line from (0, FP) to (1, FN = 1 - TP): Classifier 1 (TP = 0.4, FP = 0.3), Classifier 2 (TP = 0.7, FP = 0.5), Classifier 3 (TP = 0.6, FP = 0.2).]
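Each line is easy to reproduce; a sketch (Python; our own code, not from the talk) of the error-rate line for a single classifier:

```python
def cost_line(tp, fp, p_pos):
    """Error rate of a fixed classifier as a function of P(+):
    a straight line from (0, FP) to (1, FN)."""
    fn = 1 - tp
    return fn * p_pos + fp * (1 - p_pos)

# Classifier 3 (TP=0.6, FP=0.2): endpoints are FP at P(+)=0 and FN at P(+)=1
print(cost_line(0.6, 0.2, 0.0), cost_line(0.6, 0.2, 1.0))  # -> 0.2 0.4
```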

Page 24: Cost Curves (2)

[Figure: the same axes (Error Rate vs. P(+)) with the trivial classifiers added: "always negative" is the diagonal from (0,0) to (1,1), "always positive" the diagonal from (0,1) to (1,0). A classifier's operating range is the interval of P(+) where its line lies below both.]

Page 25: Lower Envelope

[Figure: the lower envelope of the classifiers' cost lines, i.e. the pointwise minimum of error rate over all classifiers at each value of P(+).]
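The envelope is just a pointwise minimum across classifiers; a sketch (Python; our own code, using the example classifiers from the earlier slides):

```python
def lower_envelope(classifiers, n=11):
    """Pointwise minimum of the cost lines; classifiers is a list of
    (tp, fp) pairs, and each grid point is a value of P(+)."""
    env = []
    for i in range(n):
        x = i / (n - 1)
        y = min((1 - tp) * x + fp * (1 - x) for tp, fp in classifiers)
        env.append((round(x, 2), round(y, 3)))
    return env

# Classifiers 1-3 as (tp, fp)
print(lower_envelope([(0.4, 0.3), (0.7, 0.5), (0.6, 0.2)]))
```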

Page 26: Cost Curves

[Figure: cost curves for a set of classifiers, with the "always negative" and "always positive" lines shown for reference.]

Page 27: Taking Costs Into Account

Y = FN • X + FP • (1 - X)

So far, X = p(+), making Y = error rate.

Taking costs into account, Y becomes the expected cost normalized to [0, 1], and X becomes the probability-cost:

X = [ p(+) • C(-|+) ] / [ p(+) • C(-|+) + (1 - p(+)) • C(+|-) ]
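A sketch of both axes in code (Python; our own naming, not from the talk, with C(-|+) the cost of a false negative and C(+|-) the cost of a false positive):

```python
def pc_pos(p_pos, c_fn, c_fp):
    """Probability-cost X: the x-axis once costs are taken into account."""
    return p_pos * c_fn / (p_pos * c_fn + (1 - p_pos) * c_fp)

def norm_expected_cost(tp, fp, p_pos, c_fn, c_fp):
    """Y = FN*X + FP*(1 - X), the expected cost normalized to [0, 1]."""
    x = pc_pos(p_pos, c_fn, c_fp)
    return (1 - tp) * x + fp * (1 - x)

# Classifier 3 when a false negative costs 3x a false positive, P(+) = 0.5
print(norm_expected_cost(0.6, 0.2, 0.5, c_fn=3.0, c_fp=1.0))  # -> 0.35
```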

Page 28: Comparing Cost Curves

Page 29: Averaging ROC Curves

Page 30: Averaging Cost Curves

Page 31: Cost Curve Avg. in ROC Space

Page 32: Confidence Intervals

Original (TP = 0.78, FP = 0.40):

True \ Predicted    pos    neg
pos                  78     22
neg                  40     60

Resample #1 (TP = 0.75, FP = 0.45):

True \ Predicted    pos    neg
pos                  75     25
neg                  45     55

Resample #2 (TP = 0.83, FP = 0.38):

True \ Predicted    pos    neg
pos                  83     17
neg                  38     62

Resample the confusion matrix 10000 times and take the 95% envelope.
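A sketch of the resampling (Python; our own code, since the talk does not specify an implementation): treat the positive and negative rows as independent binomials and redraw them.

```python
import random

def bootstrap_rates(n_tp, n_fn, n_fp, n_tn, n_resamples=10000, seed=0):
    """Resample the confusion matrix: redraw each positive example as a hit
    with probability TP, each negative as a false alarm with probability FP."""
    rng = random.Random(seed)
    n_pos, n_neg = n_tp + n_fn, n_fp + n_tn
    tp_rate, fp_rate = n_tp / n_pos, n_fp / n_neg
    samples = []
    for _ in range(n_resamples):
        tp_b = sum(rng.random() < tp_rate for _ in range(n_pos))
        fp_b = sum(rng.random() < fp_rate for _ in range(n_neg))
        samples.append((tp_b / n_pos, fp_b / n_neg))
    return samples

# each resampled (TP, FP) pair gives one cost line; the 95% envelope of
# all 10000 lines is the confidence band drawn on the cost curve
samples = bootstrap_rates(78, 22, 40, 60)
```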

Page 33: Confidence Interval Example

Page 34: Paired Resampling to Test Statistical Significance

For the 100 test examples in the negative class:

Classifier1 \ Classifier2    pos    neg
pos                           30     10
neg                            0     60

FP for classifier1: (30 + 10)/100 = 0.40
FP for classifier2: (30 + 0)/100 = 0.30
FP2 - FP1 = -0.10

Resample this matrix 10000 times to get (FP2 - FP1) values. Do the same for the matrix based on positive test examples. Plot and take the 95% envelope as before.
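A sketch of the paired resampling for the negative-class matrix (Python; our own code and data layout, not from the talk):

```python
import random

def paired_fp_diffs(joint, n_resamples=10000, seed=0):
    """joint maps (classifier1 prediction, classifier2 prediction) to a
    count over the negative test examples; returns FP2 - FP1 per resample."""
    rng = random.Random(seed)
    cells, weights = zip(*joint.items())
    n = sum(weights)
    diffs = []
    for _ in range(n_resamples):
        draw = rng.choices(cells, weights=weights, k=n)
        fp1 = sum(p1 == "pos" for p1, _ in draw) / n
        fp2 = sum(p2 == "pos" for _, p2 in draw) / n
        diffs.append(fp2 - fp1)
    return diffs

diffs = paired_fp_diffs({("pos", "pos"): 30, ("pos", "neg"): 10,
                         ("neg", "pos"): 0, ("neg", "neg"): 60})
```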

Page 35: Paired Resampling to Test Statistical Significance

[Figure: the resampled (FP2 - FP1, FN2 - FN1) values for classifier1 vs. classifier2, plotted with the 95% envelope.]

Page 36: Correlation between Classifiers

High Correlation:

Classifier1 \ Classifier2    pos    neg
pos                           30     10
neg                            0     60

Low Correlation (same FP1 and FP2 as above):

Classifier1 \ Classifier2    pos    neg
pos                            0     40
neg                           30     30

Page 37: Low Correlation = Low Significance

[Figure: the cloud of resampled (FP2 - FP1, FN2 - FN1) differences for classifier1 vs. classifier2; with low correlation the cloud is much wider, so the same observed difference is no longer significant.]

Page 38: Limited Range of Significance

Page 39: Better Data Analysis

Page 40: ROC, C4.5 Splitting Criteria

Page 41: Cost Curve, C4.5 Splitting Criteria

Page 42: ROC, Selection Procedure

Suppose this classifier was produced by a training set with a class ratio of 10:1, and was used whenever the deployment situation had a 10:1 class ratio.

Page 43: Cost Curves, Selection Procedure

Page 44: ROC, Many Points

Page 45: Cost Curves, Many Lines

Page 46: Conclusions

• Scalar performance measures should not be used if costs and class distributions are not exactly known or might vary with time or location.
• Cost curves enable easy visualization of
  – average performance (expected cost)
  – operating range
  – confidence intervals on performance
  – difference in performance and its significance

Page 47: Fin

• Cost curve software is available. Contact: [email protected]
• Thanks to the Alberta Ingenuity Centre for Machine Learning (www.aicml.ca)