TRANSCRIPT
F. Provost and T. Fawcett
Confusion Matrix
Bitirgen - CS678
Introduction
Data mining requires:
- Experiments with a wide variety of learning algorithms
- Using different algorithm parameters
- Varying output threshold values
- Using different training regimens
Using accuracy alone is inadequate because:
- Class distributions are skewed
- Misclassification (FP, FN) costs are not uniform
Class Distributions - Problems with Accuracy
Accuracy assumes that the class distribution among examples is constant and relatively balanced, which is not the case in real life.
Classifiers are generally used to scan a large number of normal entities to find a small number of unusual ones:
- Looking for defrauded customers
- Checking an assembly line
Skews of 10^6 have been reported (Clearwater & Stern 1991).
Misclassification Costs - Problems with Accuracy
The 'equal error costs' assumption does not hold in real-life problems: disease tests, fraud detection, …
Instead of maximizing accuracy, we need to minimize the expected error cost:
Cost = FP • c(Y,n) + FN • c(N,p)
where c(Y,n) is the cost of a false positive and c(N,p) is the cost of a false negative.
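The cost formula can be sketched in a few lines of Python. The counts and costs below are invented for illustration; they show a highly accurate classifier that is nonetheless far more costly than a less accurate one.

```python
# Illustrative sketch (numbers invented): with skewed classes and unequal
# costs, the more accurate classifier can be the more costly one.

def accuracy(tp, fp, fn, tn):
    return (tp + tn) / (tp + fp + fn + tn)

def cost(fp, fn, c_fp, c_fn):
    # Cost = FP * c(Y,n) + FN * c(N,p)
    return fp * c_fp + fn * c_fn

# 1000 examples, only 10 positives; a false negative costs 100x a false positive.
# Classifier A labels everything negative: 99% accurate, cost 1000.
acc_a = accuracy(tp=0, fp=0, fn=10, tn=990)
cost_a = cost(fp=0, fn=10, c_fp=1, c_fn=100)

# Classifier B catches 9 of 10 positives at the price of 50 false alarms:
# only 94.9% accurate, but cost 150.
acc_b = accuracy(tp=9, fp=50, fn=1, tn=940)
cost_b = cost(fp=50, fn=1, c_fp=1, c_fn=100)

print(acc_a, cost_a)  # 0.99 1000
print(acc_b, cost_b)  # 0.949 150
```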
ROC Plot and ROC Area
Receiver Operating Characteristic
Developed in WWII to statistically model "false positive" and "false negative" detections of radar operators.
Becoming more popular in ML, and a standard measure in medicine and biology.
However, an ROC graph by itself does a poor job of deciding the choice between classifiers; that choice also depends on class distributions and error costs.
ROC graph of four classifiers
Informally, one point in ROC space is better than another if it is to the northwest of it (higher TP rate, lower FP rate).
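The northwest rule can be made concrete with a small sketch. The confusion-matrix counts below are invented for illustration.

```python
# Sketch (counts invented): computing each classifier's ROC point from its
# confusion matrix, and testing the informal "northwest" dominance rule.

def roc_point(tp, fp, fn, tn):
    """(FP rate, TP rate) for one classifier."""
    return fp / (fp + tn), tp / (tp + fn)

def northwest_of(a, b):
    """a dominates b: no worse FP rate, no worse TP rate, not identical."""
    return a[0] <= b[0] and a[1] >= b[1] and a != b

a = roc_point(tp=80, fp=10, fn=20, tn=90)   # (0.1, 0.8)
b = roc_point(tp=60, fp=30, fn=40, tn=70)   # (0.3, 0.6)
print(northwest_of(a, b))  # True
```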
Iso-performance Lines
Expected cost of classification by a classifier at ROC point (FP, TP):
Expected cost = p(p) • (1 - TP) • c(N,p) + p(n) • FP • c(Y,n)
Therefore, two points (FP1, TP1) and (FP2, TP2) have the same performance if
(TP2 - TP1) / (FP2 - FP1) = (c(Y,n) • p(n)) / (c(N,p) • p(p))
Iso-performance line: all classifiers corresponding to points on a line of this slope have the same expected cost.
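A minimal sketch of these relations, assuming the standard Provost & Fawcett formulas: expected cost p(p)·(1-TP)·c(N,p) + p(n)·FP·c(Y,n), and iso-performance slope m = c(Y,n)·p(n) / (c(N,p)·p(p)). The numbers are invented.

```python
# Sketch (invented numbers) of the expected-cost formula and the
# iso-performance slope m = c(Y,n)*p(n) / (c(N,p)*p(p)).

def expected_cost(fp_rate, tp_rate, p_pos, c_fn, c_fp):
    p_neg = 1.0 - p_pos
    return p_pos * (1.0 - tp_rate) * c_fn + p_neg * fp_rate * c_fp

def iso_slope(p_pos, c_fn, c_fp):
    return c_fp * (1.0 - p_pos) / (c_fn * p_pos)

p_pos, c_fn, c_fp = 0.5, 2.0, 1.0
m = iso_slope(p_pos, c_fn, c_fp)         # 0.5

# Moving from one ROC point along a line of slope m leaves the cost unchanged:
a = (0.25, 0.5)
b = (0.25 + 0.5, 0.5 + m * 0.5)          # a second point on the same line
print(expected_cost(*a, p_pos, c_fn, c_fp))  # 0.625
print(expected_cost(*b, p_pos, c_fn, c_fp))  # 0.625
```

With equal costs and a 10:1 negative-to-positive ratio, this slope formula gives m = 10, matching the scenario on the next slide.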
ROC Convex Hull
If a point is not on the convex hull, the classifier represented by that point cannot be optimal.
In this example, B and D cannot be optimal because none of their points are on the convex hull.
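The hull itself is easy to compute; a sketch using a monotone-chain upper hull (not code from the paper), with invented ROC points:

```python
# Sketch (points invented): computing the ROC convex hull with a monotone-
# chain upper hull. (0,0) and (1,1) are the trivial always-negative and
# always-positive classifiers, which are always available.

def roc_convex_hull(points):
    """Upper convex hull of (FP rate, TP rate) points, left to right."""
    pts = sorted(set(points) | {(0.0, 0.0), (1.0, 1.0)})
    hull = []
    for p in pts:
        # The upper hull turns only clockwise; pop points that would make
        # a counter-clockwise (or straight) turn.
        while len(hull) >= 2:
            (ox, oy), (ax, ay) = hull[-2], hull[-1]
            if (ax - ox) * (p[1] - oy) - (ay - oy) * (p[0] - ox) >= 0:
                hull.pop()
            else:
                break
        hull.append(p)
    return hull

points = [(0.1, 0.5), (0.3, 0.7), (0.5, 0.6), (0.7, 0.9)]
print(roc_convex_hull(points))
# (0.5, 0.6) is left out: it lies under the hull, so it cannot be optimal.
```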
How to use the ROC Convex Hull
p(n):p(p) = 10:1
Scenario A: c(N,p) = c(Y,n), so m(iso_perf) = 10
Scenario B: c(N,p) = 100 • c(Y,n), so m(iso_perf) = 0.1
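Given a slope m as in Scenarios A and B, the optimal classifier is the hull vertex that an iso-performance line of slope m touches first, i.e. the one maximizing TP - m·FP. A sketch with an invented hull:

```python
# Sketch (hull points invented): the optimal classifier for slope m is the
# hull vertex maximizing TP - m * FP (equivalently, minimizing expected cost).

def best_on_hull(hull, m):
    return max(hull, key=lambda pt: pt[1] - m * pt[0])

hull = [(0.0, 0.0), (0.02, 0.3), (0.1, 0.6), (0.3, 0.85), (0.6, 0.98), (1.0, 1.0)]
print(best_on_hull(hull, 10.0))   # steep line, as in Scenario A -> (0.02, 0.3)
print(best_on_hull(hull, 0.1))    # shallow line, as in Scenario B -> (0.6, 0.98)
```

A steep line (false positives expensive or positives rare) favors conservative classifiers near the origin; a shallow line favors liberal ones near (1,1).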
Adding New Classifiers
Adding new classifiers may or may not extend the existing hull.
E may be optimal under some conditions, since it extends the hull.
F and G cannot be optimal, since they fall inside the hull.
What if distributions & costs are unknown?
With no information, the ROC convex hull identifies all classifiers that may be optimal under some conditions.
With complete information, the method identifies the single optimal classifier.
What about the cases in between?
Sensitivity Analysis
Imprecise distribution and cost information defines a range of slopes for iso-performance lines.
p(n):p(p) = 10:1
Scenario C:
○ $5 < c(Y,n) < $10
○ $500 < c(N,p) < $1000
○ 0.05 < m(iso_perf) < 0.2
Sensitivity Analysis - 2
Imprecise distribution and cost information defines a range of slopes for iso-performance lines.
p(n):p(p) = 10:1
Scenario D:
○ 0.2 < m(iso_perf) < 2
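When only a slope range is known, as in Scenarios C and D, the possibly-optimal classifiers are the hull vertices between the optimum for the steepest line and the optimum for the shallowest line: as the slope sweeps through the range, the touching point moves monotonically along the hull. A sketch with an invented hull:

```python
# Sketch (hull points invented): with the slope only known to lie in
# [m_lo, m_hi], every hull vertex between the optimum at m_hi and the
# optimum at m_lo is optimal for some slope in the range.

def best_index(hull, m):
    return max(range(len(hull)), key=lambda i: hull[i][1] - m * hull[i][0])

def possibly_optimal(hull, m_lo, m_hi):
    i = best_index(hull, m_hi)   # steepest line: leftmost optimum
    j = best_index(hull, m_lo)   # shallowest line: rightmost optimum
    return hull[i:j + 1]

hull = [(0.0, 0.0), (0.02, 0.3), (0.1, 0.6), (0.3, 0.85), (0.6, 0.98), (1.0, 1.0)]
# Scenario D: 0.2 < m(iso_perf) < 2
print(possibly_optimal(hull, 0.2, 2.0))
# -> [(0.1, 0.6), (0.3, 0.85), (0.6, 0.98)]
```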
Sensitivity Analysis - 3
Can the "do nothing" strategy (the trivial classifier at the origin, which labels everything negative) be better than any of the available classifiers?
Conclusion
- Accuracy alone is inadequate as a performance metric, for various reasons
- ROC plots give more complete information about the performance of classifiers
- The ROC convex hull method:
  - Is an efficient solution to the problem of comparing multiple classifiers in imprecise environments
  - Allows us to incorporate new classifiers easily
  - Allows us to select the classifiers that are potentially optimal