TRANSCRIPT
Evaluation of classification performance on small, imbalanced datasets
Kay H. Brodersen1,2, Cheng Soon Ong1, Klaas E. Stephan1,2,3, Joachim M. Buhmann1
1 Department of Computer Science, ETH Zurich, Switzerland
2 Laboratory for Social and Neural Systems Research, University of Zurich, Switzerland
3 Wellcome Trust Centre for Neuroimaging, University College London, United Kingdom
The balanced accuracy
Is the accuracy a faithful performance measure?
Figure: two example confusion matrices (predicted +/– vs. actual +/–).
Assessing classification performance

Setting: n observations x with labels y ∈ {−1, +1}.

Classification-based confusion matrix:

              actual +   actual –
predicted +      TP         FP
predicted –      FN         TN
total             P          N

with C := TP + TN correct and I := FP + FN incorrect predictions (n = C + I).

Performance assessment:
Accuracy: A = (TP + TN) / n
Balanced accuracy: B = 1/2 · (TP / (TP + FN) + TN / (FP + TN))
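The two definitions above can be sketched in a few lines of Python; the example counts below are hypothetical, chosen to show how accuracy and balanced accuracy diverge under class imbalance and prediction bias:

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of correct predictions: A = (TP + TN) / n."""
    return (tp + tn) / (tp + fp + fn + tn)

def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the per-class accuracies: B = (TP/(TP+FN) + TN/(FP+TN)) / 2."""
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))

# Hypothetical confusion matrix: strong class imbalance (90 positives,
# 10 negatives) and a classifier biased towards the positive class.
tp, fp, fn, tn = 90, 8, 0, 2
print(accuracy(tp, fp, fn, tn))           # 0.92 - looks good
print(balanced_accuracy(tp, fp, fn, tn))  # 0.6  - much closer to chance
```

The plain accuracy rewards the bias towards the majority class; the balanced accuracy averages over the two classes and exposes it.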
The posterior distribution of the accuracy

Assuming a flat prior on the interval [0, 1], the posterior of the accuracy follows a Beta distribution:

A ~ Beta(a, b), with a = C + 1 and b = I + 1,

i.e., its density is p(x; C + 1, I + 1) = x^C (1 − x)^I / B(C + 1, I + 1), where B(·, ·) denotes the Beta function.

From this we can compute:
the mean: (C + 1) / (C + I + 2)
the mode: C / (C + I)
a posterior probability interval: [ F_B^{−1}(α/2; C + 1, I + 1), F_B^{−1}(1 − α/2; C + 1, I + 1) ], where F_B^{−1} is the inverse cumulative distribution function of the Beta distribution
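These three posterior statistics are straightforward to evaluate with `scipy.stats.beta`; a sketch, assuming SciPy is available and using hypothetical counts C and I:

```python
from scipy.stats import beta

C, I = 92, 8           # hypothetical correct / incorrect counts
a, b = C + 1, I + 1    # posterior A ~ Beta(C + 1, I + 1) under a flat prior

post_mean = beta.mean(a, b)               # (C + 1) / (C + I + 2)
post_mode = (a - 1) / (a + b - 2)         # C / (C + I)
lo, hi = beta.ppf([0.025, 0.975], a, b)   # central 95% posterior interval

print(f"mean={post_mean:.3f} mode={post_mode:.3f} "
      f"95% interval=[{lo:.3f}, {hi:.3f}]")
```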
The posterior distribution of the balanced accuracy

Assuming a flat prior on the interval [0, 1], the posterior of the balanced accuracy B = 1/2 (A_P + A_N) is given by the convolution of two Beta distributions:

p_B(x) = 2 ∫_0^1 p_A(2x − z; TP + 1, FN + 1) · p_A(z; TN + 1, FP + 1) dz

Based on this density, we can compute:
the mean
the mode
a posterior probability interval
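The convolution above has no convenient closed form, but the posterior of B is easy to approximate by sampling the two Beta posteriors directly; a Monte Carlo sketch with hypothetical counts, assuming NumPy is available:

```python
import numpy as np

tp, fp, fn, tn = 90, 8, 0, 2   # hypothetical confusion matrix

rng = np.random.default_rng(0)
n_samples = 200_000
a_pos = rng.beta(tp + 1, fn + 1, n_samples)  # posterior accuracy on class +
a_neg = rng.beta(tn + 1, fp + 1, n_samples)  # posterior accuracy on class -
b = 0.5 * (a_pos + a_neg)                    # samples from the posterior of B

post_mean = b.mean()
lo, hi = np.quantile(b, [0.025, 0.975])
print(f"balanced accuracy: mean={post_mean:.3f}, "
      f"95% interval=[{lo:.3f}, {hi:.3f}]")
```

Since E[B] = 1/2 (E[A_P] + E[A_N]), the sample mean can be checked against the exact value 1/2 ((TP+1)/(TP+FN+2) + (TN+1)/(TN+FP+2)).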
Two examples

Figure: two confusion matrices with, for each, the average accuracy ± 2 standard errors, the posterior mean accuracy with its 95% probability mass, the posterior mean balanced accuracy with its 95% probability mass, and the chance level.
Example 1: fair overall accuracy, high class imbalance, strong prediction bias.
Example 2: high accuracies on both classes, no imbalance, no bias.
Posterior densities

Figure: posterior distributions of the accuracy and the balanced accuracy for an example confusion matrix, marking the mean, median, mode, 95% posterior probability interval, average balanced accuracy, and chance level.
Smooth precision-recall curves
Decision values and the binormal assumption

Figure: distributions of the decision values of negative and positive examples; under the binormal assumption, the decision values of each class follow a Gaussian distribution.
Empirical and parametric curves

Figure: ROC curve (TPR vs. FPR, i.e., 1 − specificity) and PR curve (precision vs. recall), comparing the empirical and binormal estimates with the true curves.
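The binormal construction can be sketched as follows: fit a Gaussian to each class's decision values, then sweep the decision threshold to obtain smooth ROC and PR curves. All parameters below (synthetic decision values, class prior) are hypothetical:

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m, s

# Synthetic decision values for the two classes (hypothetical parameters).
random.seed(0)
neg = [random.gauss(-1.0, 1.0) for _ in range(200)]
pos = [random.gauss(+1.0, 1.0) for _ in range(200)]
mu_n, sd_n = fit_gaussian(neg)
mu_p, sd_p = fit_gaussian(pos)

pi_pos = 0.5  # assumed fraction of positive examples

# Smooth curves: sweep the decision threshold t.
curve = []
for i in range(101):
    t = -5.0 + 0.1 * i
    fpr = 1.0 - phi((t - mu_n) / sd_n)   # P(score > t | negative)
    tpr = 1.0 - phi((t - mu_p) / sd_p)   # P(score > t | positive) = recall
    denom = pi_pos * tpr + (1 - pi_pos) * fpr
    precision = pi_pos * tpr / denom if denom > 0 else 1.0
    curve.append((fpr, tpr, precision))

# Under the binormal assumption the AUC has a closed form.
auc = phi((mu_p - mu_n) / math.sqrt(sd_p ** 2 + sd_n ** 2))
```

Unlike the empirical step curves, these parametric curves are smooth and can be evaluated at any operating point.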
The effect of class imbalance on the PR curve

Figure: RMSE of the estimated average precision (AP) as a function of the fraction of positive examples.
Figure: estimated minus true average precision (AP) for the empirical and binormal estimators, as a function of the fraction of positive examples.
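For reference, the empirical average precision summarized in these figures can be computed directly from ranked predictions; one common definition averages the precision at each rank where a positive example occurs (a sketch; the scores and labels are hypothetical):

```python
def average_precision(scores, labels):
    """Empirical AP: mean precision at the ranks of the positive
    examples, with examples ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank   # precision at this rank
    return ap / tp

# Hypothetical scores and labels:
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 5/6
```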
Take-home messages

Don'ts:
report the average and the standard error of the accuracy across cross-validation folds
look at empirical ROC or PR curves

Do's:
report a statistic of the posterior distribution of the balanced accuracy
compute a smooth ROC or PR curve under parametric assumptions
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The balanced accuracy and its posterior distribution. Proceedings of the 20th International Conference on Pattern Recognition (in press).
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The binormal assumption on precision-recall curves. Proceedings of the 20th International Conference on Pattern Recognition (in press).