TRANSCRIPT
Evaluation of classification performance on small, imbalanced datasets
Kay H. Brodersen1,2, Cheng Soon Ong1, Klaas E. Stephan1,2,3, Joachim M. Buhmann1
1 Department of Computer Science, ETH Zurich, Switzerland
2 Laboratory for Social and Neural Systems Research, University of Zurich, Switzerland
3 Wellcome Trust Centre for Neuroimaging, University College London, United Kingdom
The balanced accuracy
Is the accuracy a faithful performance measure?
Figure: two example confusion matrices (predicted +/– vs. actual +/–).
Assessing classification performance

Setting: n observations x with labels y ∈ {−1, +1}.

Classification-based confusion matrix:

              actual +   actual –
predicted +      TP         FP
predicted –      FN         TN
total             P          N

with C := TP + TN correct and I := FP + FN incorrect predictions (n = C + I).

Performance assessment:
Accuracy: A = (TP + TN) / n
Balanced accuracy: B = 1/2 · (TP / (TP + FN) + TN / (FP + TN))
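The two definitions above can be sketched in a few lines of Python; the example counts below are hypothetical, chosen to show how accuracy and balanced accuracy diverge under class imbalance and prediction bias:

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of correct predictions: A = (TP + TN) / n."""
    return (tp + tn) / (tp + fp + fn + tn)

def balanced_accuracy(tp, fp, fn, tn):
    """Mean of the per-class accuracies: B = (TP/(TP+FN) + TN/(FP+TN)) / 2."""
    return 0.5 * (tp / (tp + fn) + tn / (fp + tn))

# Hypothetical confusion matrix: strong class imbalance (90 positives,
# 10 negatives) and a classifier biased towards the positive class.
tp, fp, fn, tn = 90, 8, 0, 2
print(accuracy(tp, fp, fn, tn))           # 0.92 - looks good
print(balanced_accuracy(tp, fp, fn, tn))  # 0.6  - much closer to chance
```

The plain accuracy rewards the bias towards the majority class; the balanced accuracy averages over the two classes and exposes it.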
The posterior distribution of the accuracy

Assuming a flat prior on the interval [0, 1], the posterior of the accuracy follows a Beta distribution:

A ~ Beta(a, b), with a = C + 1 and b = I + 1,

i.e., its density is p(x; C + 1, I + 1) = x^C (1 − x)^I / B(C + 1, I + 1), where B(·, ·) denotes the Beta function.

From this we can compute:
the mean: (C + 1) / (C + I + 2)
the mode: C / (C + I)
a posterior probability interval: [ F_B^{−1}(α/2; C + 1, I + 1), F_B^{−1}(1 − α/2; C + 1, I + 1) ], where F_B^{−1} is the inverse cumulative distribution function of the Beta distribution
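These three posterior statistics are straightforward to evaluate with `scipy.stats.beta`; a sketch, assuming SciPy is available and using hypothetical counts C and I:

```python
from scipy.stats import beta

C, I = 92, 8           # hypothetical correct / incorrect counts
a, b = C + 1, I + 1    # posterior A ~ Beta(C + 1, I + 1) under a flat prior

post_mean = beta.mean(a, b)               # (C + 1) / (C + I + 2)
post_mode = (a - 1) / (a + b - 2)         # C / (C + I)
lo, hi = beta.ppf([0.025, 0.975], a, b)   # central 95% posterior interval

print(f"mean={post_mean:.3f} mode={post_mode:.3f} "
      f"95% interval=[{lo:.3f}, {hi:.3f}]")
```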
The posterior distribution of the balanced accuracy

Assuming a flat prior on the interval [0, 1], the posterior of the balanced accuracy B = 1/2 (A_P + A_N) is given by the convolution of two Beta distributions:

p_B(x) = 2 ∫_0^1 p_A(2x − z; TP + 1, FN + 1) · p_A(z; TN + 1, FP + 1) dz

Based on this density, we can compute:
the mean
the mode
a posterior probability interval
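The convolution above has no convenient closed form, but the posterior of B is easy to approximate by sampling the two Beta posteriors directly; a Monte Carlo sketch with hypothetical counts, assuming NumPy is available:

```python
import numpy as np

tp, fp, fn, tn = 90, 8, 0, 2   # hypothetical confusion matrix

rng = np.random.default_rng(0)
n_samples = 200_000
a_pos = rng.beta(tp + 1, fn + 1, n_samples)  # posterior accuracy on class +
a_neg = rng.beta(tn + 1, fp + 1, n_samples)  # posterior accuracy on class -
b = 0.5 * (a_pos + a_neg)                    # samples from the posterior of B

post_mean = b.mean()
lo, hi = np.quantile(b, [0.025, 0.975])
print(f"balanced accuracy: mean={post_mean:.3f}, "
      f"95% interval=[{lo:.3f}, {hi:.3f}]")
```

Since E[B] = 1/2 (E[A_P] + E[A_N]), the sample mean can be checked against the exact value 1/2 ((TP+1)/(TP+FN+2) + (TN+1)/(TN+FP+2)).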
Two examples

Figure: two confusion matrices with, for each, the average accuracy ± 2 standard errors, the posterior mean accuracy with its 95% probability mass, the posterior mean balanced accuracy with its 95% probability mass, and the chance level.
Example 1: fair overall accuracy, high class imbalance, strong prediction bias.
Example 2: high accuracies on both classes, no imbalance, no bias.
Posterior densities

Figure: posterior distributions of the accuracy and the balanced accuracy for an example confusion matrix, marking the mean, median, mode, 95% posterior probability interval, average balanced accuracy, and chance level.
Smooth precision-recall curves
Decision values and the binormal assumption

Figure: distributions of the decision values of negative and positive examples; under the binormal assumption, the decision values of each class follow a Gaussian distribution.
Empirical and parametric curves

Figure: ROC curve (TPR vs. FPR, i.e., 1 − specificity) and PR curve (precision vs. recall), comparing the empirical and binormal estimates with the true curves.
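The binormal construction can be sketched as follows: fit a Gaussian to each class's decision values, then sweep the decision threshold to obtain smooth ROC and PR curves. All parameters below (synthetic decision values, class prior) are hypothetical:

```python
import math
import random

def phi(x):
    """Standard normal CDF."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def fit_gaussian(values):
    """Maximum-likelihood mean and standard deviation."""
    m = sum(values) / len(values)
    s = math.sqrt(sum((v - m) ** 2 for v in values) / len(values))
    return m, s

# Synthetic decision values for the two classes (hypothetical parameters).
random.seed(0)
neg = [random.gauss(-1.0, 1.0) for _ in range(200)]
pos = [random.gauss(+1.0, 1.0) for _ in range(200)]
mu_n, sd_n = fit_gaussian(neg)
mu_p, sd_p = fit_gaussian(pos)

pi_pos = 0.5  # assumed fraction of positive examples

# Smooth curves: sweep the decision threshold t.
curve = []
for i in range(101):
    t = -5.0 + 0.1 * i
    fpr = 1.0 - phi((t - mu_n) / sd_n)   # P(score > t | negative)
    tpr = 1.0 - phi((t - mu_p) / sd_p)   # P(score > t | positive) = recall
    denom = pi_pos * tpr + (1 - pi_pos) * fpr
    precision = pi_pos * tpr / denom if denom > 0 else 1.0
    curve.append((fpr, tpr, precision))

# Under the binormal assumption the AUC has a closed form.
auc = phi((mu_p - mu_n) / math.sqrt(sd_p ** 2 + sd_n ** 2))
```

Unlike the empirical step curves, these parametric curves are smooth and can be evaluated at any operating point.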
The effect of class imbalance on the PR curve

Figure: RMSE of the estimated average precision (AP) as a function of the fraction of positive examples.
Figure: estimated minus true average precision (AP) for the empirical and binormal estimators, as a function of the fraction of positive examples.
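For reference, the empirical average precision summarized in these figures can be computed directly from ranked predictions; one common definition averages the precision at each rank where a positive example occurs (a sketch; the scores and labels are hypothetical):

```python
def average_precision(scores, labels):
    """Empirical AP: mean precision at the ranks of the positive
    examples, with examples ranked by decreasing score."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp, ap = 0, 0.0
    for rank, (_, y) in enumerate(ranked, start=1):
        if y == 1:
            tp += 1
            ap += tp / rank   # precision at this rank
    return ap / tp

# Hypothetical scores and labels:
print(average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0]))  # 5/6
```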
Take-home messages

Don'ts:
report the average and the standard error of the accuracy across cross-validation folds
look at empirical ROC or PR curves

Do's:
report a statistic of the posterior distribution of the balanced accuracy
compute a smooth ROC or PR curve under parametric assumptions
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The balanced accuracy and its posterior distribution. Proceedings of the 20th International Conference on Pattern Recognition (in press).
K.H. Brodersen, C.S. Ong, K.E. Stephan, J.M. Buhmann (2010) The binormal assumption on precision-recall curves. Proceedings of the 20th International Conference on Pattern Recognition (in press).