Cost-Sensitive Classifier Evaluation
Robert Holte
Computing Science Dept.
University of Alberta
Co-author
Chris Drummond
IIT, National Research Council, Ottawa
Classifiers
• A classifier assigns an object to one of a predefined set of categories or classes.
• Examples:
  – A metal detector either sounds an alarm or stays quiet when someone walks through.
  – A credit card application is either approved or denied.
  – A medical test’s outcome is either positive or negative.
• This talk: only two classes, “positive” and “negative”.
Two Types of Error
False negative (“miss”), FN: alarm doesn’t sound but person is carrying metal.
False positive (“false alarm”), FP: alarm sounds but person is not carrying metal.
2-class Confusion Matrix
• Reduce the 4 numbers to two rates:
  true positive rate  TP = (#TP)/(#P)
  false positive rate FP = (#FP)/(#N)
• Rates are independent of class ratio*

                 Predicted class
True class       positive    negative
positive (#P)      #TP       #P - #TP
negative (#N)      #FP       #N - #FP

* subject to certain conditions
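The reduction from four counts to two rates can be sketched in Python (the function name and layout are ours, not from the talk):

```python
def rates(tp, fn, fp, tn):
    """Reduce a 2-class confusion matrix (rows = true class) to two rates."""
    tpr = tp / (tp + fn)  # true positive rate  TP = #TP / #P
    fpr = fp / (fp + tn)  # false positive rate FP = #FP / #N
    return tpr, fpr

# e.g. the example matrix with pos row (60, 40) and neg row (20, 80)
print(rates(60, 40, 20, 80))  # (0.6, 0.2)
```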
Example: 3 classifiers
Classifier 1: TP = 0.4, FP = 0.3
      Predicted
True   pos  neg
pos     40   60
neg     30   70

Classifier 2: TP = 0.7, FP = 0.5
      Predicted
True   pos  neg
pos     70   30
neg     50   50

Classifier 3: TP = 0.6, FP = 0.2
      Predicted
True   pos  neg
pos     60   40
neg     20   80
Assumptions
• Standard Cost Model
  – correct classification costs 0
  – cost of misclassification depends only on the class, not on the individual example
  – over a set of examples, costs are additive
• Costs or Class Distributions:
  – are not known precisely at evaluation time
  – may vary with time
  – may depend on where the classifier is deployed
• True FP and TP do not vary with time or location, and are accurately estimated.
How to Evaluate Performance ?
• Scalar Measures
  – Accuracy
  – Expected cost
  – Area under the ROC curve
• Visualization Techniques
  – ROC curves
  – Cost Curves
What’s Wrong with Scalars ?
• A scalar does not tell the whole story.
  – There are fundamentally two numbers of interest (FP and TP); a single number invariably loses some information.
  – How are errors distributed across the classes ?
  – How will each classifier perform in different testing conditions (costs or class ratios other than those measured in the experiment) ?
• A scalar imposes a linear ordering on classifiers.
  – What we want is to identify the conditions under which each is better.
What’s Wrong with Scalars ?
• A table of scalars is just a mass of numbers.
  – No immediate impact
  – Poor way to present results in a paper
  – Equally poor way for an experimenter to analyze results
• Some scalars (accuracy, expected cost) require precise knowledge of costs and class distributions.
  – Often these are not known precisely and might vary with time or location of deployment.
Why visualize performance ?
• The shape of a curve is more informative than a single number.
• A curve informs about:
  – all possible misclassification costs*
  – all possible class ratios*
  – under what conditions C1 outperforms C2
• Immediate impact (if done well)
* subject to certain conditions
Example: 3 classifiers
Classifier 1: TP = 0.4, FP = 0.3
      Predicted
True   pos  neg
pos     40   60
neg     30   70

Classifier 2: TP = 0.7, FP = 0.5
      Predicted
True   pos  neg
pos     70   30
neg     50   50

Classifier 3: TP = 0.6, FP = 0.2
      Predicted
True   pos  neg
pos     60   40
neg     20   80
ROC plot for the 3 Classifiers

[ROC plot: TP vs. FP, with the ideal classifier at the top-left corner, the chance diagonal, “always negative” at (0,0), and “always positive” at (1,1).]
Dominance
Operating Range
Slope indicates the class distributions and misclassification costs for which the classifier is better than always-negative; ditto for always-positive.
Convex Hull
Slope indicates the class distributions and misclassification costs for which the red classifier is the same as the blue one.
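The slope in question can be computed directly. A minimal sketch (function names are ours), writing C(-|+) for the cost of a false negative and C(+|-) for the cost of a false positive:

```python
def iso_performance_slope(p_pos, cost_fn, cost_fp):
    """Slope of an iso-performance line in ROC space: two ROC points
    joined by a line of this slope have equal expected cost."""
    return ((1 - p_pos) * cost_fp) / (p_pos * cost_fn)

def segment_slope(point_a, point_b):
    """Slope of the segment joining two ROC points given as (FP, TP)."""
    (fp_a, tp_a), (fp_b, tp_b) = point_a, point_b
    return (tp_b - tp_a) / (fp_b - fp_a)
```

With equal costs and balanced classes the iso-performance slope is 1; the two classifiers perform identically exactly when the slope of the segment joining their ROC points equals this value.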
Creating an ROC Curve
• A classifier produces a single ROC point.
• If the classifier has a “sensitivity” parameter, varying it produces a series of ROC points (confusion matrices).
• Alternatively, if the classifier is produced by a learning algorithm, a series of ROC points can be generated by varying the class ratio in the training set.
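The sensitivity-parameter approach can be sketched for a scoring classifier by sweeping a decision threshold over its scores (a simple sketch; tied scores are not handled specially):

```python
def roc_points(scores, labels):
    """ROC points from sweeping a threshold over classifier scores.

    scores: higher = more confidently positive; labels: 1 = pos, 0 = neg.
    Returns (FP rate, TP rate) pairs running from (0,0) to (1,1).
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    tp = fp = 0
    # Lower the threshold one example at a time, from the highest score down.
    for _, label in sorted(zip(scores, labels), reverse=True):
        if label == 1:
            tp += 1
        else:
            fp += 1
        points.append((fp / n_neg, tp / n_pos))
    return points
```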
ROC Curve
What’s Wrong with ROC Curves ?
ROC curves for two classifiers.

• When to switch from C4.5 to IB1 ?
• What is the performance difference ?
• When to use the default classifiers ?
• How to tell if two ROC curves’ difference is statistically significant ?
ROC curves from two cross-validation runs.

• How to average them ?
• How to compute a confidence interval for the average ROC curve ?
And we would like to be able to answer all these questions by visual inspection …
Cost Curves
Cost Curves (1)
[Plot: Error Rate (y-axis, 0.0 to 1.0) vs. Probability of Positive P(+) (x-axis, 0.0 to 1.0). Each classifier is a straight line from (0, FP) to (1, FN = 1 - TP).]

Classifier 1: TP = 0.4, FP = 0.3
Classifier 2: TP = 0.7, FP = 0.5
Classifier 3: TP = 0.6, FP = 0.2
Cost Curves (2)
[Plot: Error Rate vs. Probability of Positive P(+), with the “always negative” and “always positive” lines and the classifier’s Operating Range marked.]
Lower Envelope
[Plot: Error Rate vs. Probability of Positive P(+); the lower envelope of the classifiers’ cost lines.]
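The lower envelope is the pointwise minimum of the classifiers' cost lines. A sketch (names are ours; classifiers given as (TP, FP) pairs):

```python
def cost_line(fp, fn):
    """Error rate as a function of x = P(+): a line from (0, FP) to (1, FN)."""
    return lambda x: fn * x + fp * (1 - x)

def lower_envelope(classifiers, x):
    """Lowest error rate at P(+) = x over the given (TP, FP) classifiers,
    plus the trivial 'always negative' and 'always positive' classifiers."""
    lines = [cost_line(fp, 1 - tp) for tp, fp in classifiers]
    lines += [cost_line(0.0, 1.0), cost_line(1.0, 0.0)]  # trivial classifiers
    return min(line(x) for line in lines)
```

At P(+) = 0.5 the three example classifiers give error rates 0.45, 0.40, and 0.30, so the envelope there is 0.30; near P(+) = 0 the “always negative” classifier wins.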
Cost Curves
[Plot: Error Rate vs. Probability of Positive P(+), with the “always negative” and “always positive” lines.]
Taking Costs Into Account
Y = FN•X + FP•(1-X)

So far, X = p(+), making Y = error rate.

In general, Y = expected cost normalized to [0,1], and

X = p(+)•C(-|+) / [ p(+)•C(-|+) + (1-p(+))•C(+|-) ]
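Spelled out in code (a sketch; C(-|+) and C(+|-) are passed as cost_fn and cost_fp, and the function names are ours):

```python
def probability_cost(p_pos, cost_fn, cost_fp):
    """The x-axis of a cost curve; reduces to p(+) when the costs are equal."""
    num = p_pos * cost_fn
    return num / (num + (1 - p_pos) * cost_fp)

def normalized_expected_cost(tp, fp, p_pos, cost_fn, cost_fp):
    """Y = FN*x + FP*(1-x) evaluated at x = probability_cost(...)."""
    x = probability_cost(p_pos, cost_fn, cost_fp)
    return (1 - tp) * x + fp * (1 - x)
```

With equal costs this recovers the error rate: for TP = 0.6, FP = 0.2 at p(+) = 0.5 the normalized expected cost is 0.3, matching the cost-curve value above.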
Comparing Cost Curves
Averaging ROC Curves
Averaging Cost Curves
Cost Curve Avg. in ROC Space
Confidence Intervals
Original: TP = 0.78, FP = 0.40
      Predicted
True   pos  neg
pos     78   22
neg     40   60

Resample #1: TP = 0.75, FP = 0.45
      Predicted
True   pos  neg
pos     75   25
neg     45   55

Resample #2: TP = 0.83, FP = 0.38
      Predicted
True   pos  neg
pos     83   17
neg     38   62

Resample the confusion matrix 10000 times and take the 95% envelope.
Confidence Interval Example
Paired Resampling to Test Statistical Significance
Joint predictions for the 100 test examples in the negative class:

                 Predicted by Classifier2
Predicted by       pos    neg
Classifier1  pos    30     10
             neg     0     60

FP for classifier1: (30+10)/100 = 0.40
FP for classifier2: (30+0)/100  = 0.30
FP2 - FP1 = -0.10

Resample this matrix 10000 times to get (FP2 - FP1) values.
Do the same for the matrix based on positive test examples.
Plot and take the 95% envelope as before.
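A sketch of the paired bootstrap for the negative-class matrix (the cell keys and function name are ours):

```python
import random

def paired_fp_diffs(counts, n_boot=10000, seed=0):
    """Bootstrap FP2 - FP1 from the joint prediction counts on the
    negative test examples.

    counts is keyed by (classifier1's prediction, classifier2's prediction).
    """
    rng = random.Random(seed)
    cells = list(counts)
    weights = [counts[c] for c in cells]
    n = sum(weights)
    diffs = []
    for _ in range(n_boot):
        # Redraw the n examples with replacement, cell by cell.
        sample = rng.choices(cells, weights=weights, k=n)
        fp1 = sum(p1 == 'pos' for p1, _ in sample) / n
        fp2 = sum(p2 == 'pos' for _, p2 in sample) / n
        diffs.append(fp2 - fp1)
    return diffs
```

Resampling the joint matrix (rather than each classifier separately) preserves the correlation between the two classifiers' predictions, which is what makes the test paired.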
Paired Resampling to Test Statistical Significance
[Plot: resampled (FP2 - FP1, FN2 - FN1) differences for classifier1 vs. classifier2, with the 95% envelope.]
Correlation between Classifiers
High Correlation
                 Predicted by Classifier2
Predicted by       pos    neg
Classifier1  pos    30     10
             neg     0     60

Low Correlation (same FP1 and FP2 as above)
                 Predicted by Classifier2
Predicted by       pos    neg
Classifier1  pos     0     40
             neg    30     30
Low correlation = Low significance
[Plot: resampled (FP2 - FP1, FN2 - FN1) differences for classifier1 vs. classifier2 in the low-correlation case.]
Limited Range of Significance
Better Data Analysis
ROC, C4.5 Splitting Criteria
Cost Curve, C4.5 Splitting Criteria
ROC, Selection procedure
Suppose this classifier was produced by a training set with a class ratio of 10:1, and was used whenever the deployment situation had a 10:1 class ratio.
Cost Curves, Selection Procedure
ROC, Many Points
Cost Curves, Many Lines
Conclusions
• Scalar performance measures should not be used if costs and class distributions are not exactly known or might vary with time or location.
• Cost curves enable easy visualization of:
  – average performance (expected cost)
  – operating range
  – confidence intervals on performance
  – difference in performance and its significance
Fin
• Cost curve software is available. Contact: [email protected]
• Thanks to the Alberta Ingenuity Centre for Machine Learning (www.aicml.ca)