The Expected Performance Curve
Samy Bengio, Johnny Mariéthoz, Mikaela Keller
MI – 25 October 2007, Kresten Toftgaard Andersen


TRANSCRIPT

Page 1: The Expected Performance Curve Samy Bengio, Johnny Mariéthoz, Mikaela Keller

The Expected Performance Curve
Samy Bengio, Johnny Mariéthoz, Mikaela Keller

MI – 25 October 2007
Kresten Toftgaard Andersen

Page 2

Introduction to the paper

By Samy Bengio, Johnny Mariéthoz and Mikaela Keller, 2005. For the machine learning community and researchers etc. who need to compare models.

Content of the paper:
• Introduces ROC curves very briefly.
• Points out some risks when using ROC curves to compare different classification models.
• Argues, with example results, that ROC curves can be misleading.
• The authors contribute a so-called "Expected Performance Curve" (EPC) and argue why it is better for comparing models.
• Extends the EPC with confidence intervals and statistical difference tests.
• Concludes by summarizing their contribution and listing strengths and weaknesses of ROC and EPC.
• Acknowledgements and references.

Page 3

Content

• Motivation
• Introduce terminology and notation, define the problem
• Introduce ROC curves
• Example: how to calculate a ROC
• Present arguments why ROC curves should be used with great care
• Introduce EPC
• Continue the example, showing how to calculate an EPC
• Present arguments why EPC might be better than ROC
• Confidence intervals
• My opinion
• Discussion

Page 4

Motivation

ROC analysis is an important way to compare binary classifier models. It can be used to select optimal models and discard suboptimal ones.

Areas of use:
• Medicine (diagnostic testing, evaluating evidence-based medicine approaches)
• Epidemiology (factors affecting health, evaluating optimal treatment approaches)
• Radiology (radar signals, evaluating new radiology techniques)
• Psychology (signal detection, assessing human detection of weak signals)
• Machine learning (evaluation of machine learning techniques)
• …

Page 5

Definition of 2-class classifiers

Definition of a 2-class classification problem: a scoring function f with an associated threshold θ; an example x is assigned to the positive class if f(x) is above θ, and to the negative class otherwise.

Apply the function and its associated threshold to a separate test data set (the true classes must be known) and count the outcomes.

Page 6

Confusion matrix

Given a 2-class classifier and an instance, there are four possible outcomes:

• TP: the instance is positive and is classified as positive
• FN: the instance is positive and is classified as negative
• TN: the instance is negative and is classified as negative
• FP: the instance is negative and is classified as positive
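The four outcomes can be counted directly from a set of scored examples; a minimal sketch (the scores and labels are made-up illustration data):

```python
# Count the four confusion-matrix outcomes for a score-plus-threshold
# classifier. A minimal sketch; the scores and labels are made-up data.

def confusion_counts(scores, labels, threshold):
    """Classify as positive when score > threshold; return (TP, FN, TN, FP)."""
    tp = fn = tn = fp = 0
    for score, label in zip(scores, labels):
        predicted_positive = score > threshold
        if label == 1:
            tp += predicted_positive      # positive classified as positive
            fn += not predicted_positive  # positive classified as negative
        else:
            fp += predicted_positive      # negative classified as positive
            tn += not predicted_positive  # negative classified as negative
    return tp, fn, tn, fp

scores = [0.9, 0.8, 0.4, 0.3, 0.7, 0.2]
labels = [1, 1, 1, 0, 0, 0]
print(confusion_counts(scores, labels, 0.5))  # (2, 1, 2, 1)
```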

Page 7

Performance metrics

The selected measure is a pair, generically called V1 and V2. V1 and V2 can be calculated in many ways depending on the situation. All are simple combinations of TP, TN, FP and FN. The exact calculation of V1 and V2 is not important in this paper.

Page 8

Performance metrics

A unique measure, generically called V, combines V1 and V2. V can also be calculated in several ways depending on the situation, e.g.

V = (V1 + V2) / 2  (Half Total Error Rate)
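As an illustration of one such choice, with V1 = false acceptance rate (FAR) and V2 = false rejection rate (FRR), V becomes the Half Total Error Rate; a minimal sketch (the variable names and example counts are mine):

```python
# Compute V1 = FAR, V2 = FRR and V = HTER from the four confusion counts.
# A minimal sketch of one common choice of V1/V2/V; variable names are mine.

def far_frr_hter(tp, fn, tn, fp):
    far = fp / (fp + tn)    # false acceptance rate: negatives accepted
    frr = fn / (fn + tp)    # false rejection rate: positives rejected
    hter = (far + frr) / 2  # Half Total Error Rate
    return far, frr, hter

print(far_frr_hter(tp=8, fn=2, tn=9, fp=1))  # FAR = 0.1, FRR = 0.2, HTER ≈ 0.15
```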

Page 9

What is a ROC curve?

ROC is an abbreviation for "Receiver Operating Characteristics". It is a technique for visualizing, organizing and selecting classifiers based on their performance. A ROC can be presented either as a graph or as a curve.

Classifiers:
• Discrete classifiers (decision trees, rule sets etc.)
• Probabilistic classifiers (naive Bayes, neural networks etc.)
• Varying a threshold for a probabilistic classifier will trace a curve (the ROC)

The following example will show this.
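The threshold sweep the example illustrates can be sketched in a few lines (made-up scores and labels; each observed score is used as a candidate threshold):

```python
# Trace ROC points (false positive rate vs. true positive rate) by sweeping
# a threshold over the observed scores. Made-up illustration data.

def roc_points(scores, labels):
    """Return (FPR, TPR) pairs, one per candidate threshold, high to low."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = []
    for t in sorted(set(scores), reverse=True):
        tp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 1)
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        points.append((fp / neg, tp / pos))
    return points

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
for fpr, tpr in roc_points(scores, labels):
    print(f"FPR={fpr:.2f}  TPR={tpr:.2f}")
```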

Pages 10–17: Example (figure-only slides: the example's score distributions and the ROC points traced as the threshold is moved; images not reproduced in this transcript)

Page 18

ROC curves

• BEP = Break-Even Point
• The BEP corresponds to the threshold nearest to a solution such that V1 = V2.
• The selected threshold has a significant impact on the model.
• The threshold represents a trade-off between giving importance to V1 or V2.
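Finding the BEP amounts to picking the candidate threshold where |V1 − V2| is smallest; a minimal sketch, assuming V1 = false positive rate and V2 = false negative rate (my choice for illustration, with made-up data):

```python
# Find the Break-Even Point threshold: the candidate threshold where
# |V1 - V2| is smallest. Here V1 = false positive rate, V2 = false
# negative rate (my choice for illustration); made-up data.

def break_even_threshold(scores, labels):
    pos = sum(labels)
    neg = len(labels) - pos
    best_t, best_gap = None, float("inf")
    for t in sorted(set(scores)):
        fp = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0)
        fn = sum(1 for s, y in zip(scores, labels) if s < t and y == 1)
        gap = abs(fp / neg - fn / pos)
        if gap < best_gap:
            best_t, best_gap = t, gap
    return best_t

scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.3]
labels = [1, 1, 0, 1, 0, 0]
print(break_even_threshold(scores, labels))  # 0.7: both error rates are 1/3
```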

Page 19

Potential risk of using ROC

• Each point corresponds to a particular setting of the threshold, but in real applications the threshold needs to be decided before seeing the test set.
• Normally the threshold is found by searching for the BEP using some equation. There is a possibility of mismatch because the training set is different from the test set.
• Situations may occur where the optimal threshold found using the training set does not correspond to the optimal threshold on the test set.
• One parameter, the threshold, is tuned using the training set. There is a potential risk in expecting the training error to reflect the general error.
• "Real applications often suffer from an additional mismatch between training and test conditions."
• Risk of a different trade-off (V1, V2) in the test set. ROC curves do not take the risk of a mismatch into account. This probability should be reflected in the procedure when calculating the performance curve.
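The mismatch risk is easy to demonstrate: the threshold minimizing an error measure on a development set need not minimize it on the test set. A toy sketch with made-up scores, using HTER as the measure:

```python
# The threshold that minimizes HTER on a development set need not be
# optimal on the test set. A toy sketch with made-up scores.

def hter(scores, labels, t):
    pos = sum(labels)
    neg = len(labels) - pos
    frr = sum(1 for s, y in zip(scores, labels) if s < t and y == 1) / pos
    far = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / neg
    return (far + frr) / 2

dev = ([0.9, 0.8, 0.6, 0.7, 0.4, 0.3], [1, 1, 1, 0, 0, 0])
test = ([0.9, 0.5, 0.4, 0.8, 0.3, 0.2], [1, 1, 1, 0, 0, 0])

candidates = sorted(set(dev[0]) | set(test[0]))
best_dev = min(candidates, key=lambda t: hter(*dev, t))
best_test = min(candidates, key=lambda t: hter(*test, t))
print("chosen on dev:", best_dev, "-> test HTER", hter(*test, best_dev))
print("optimal on test:", best_test, "-> test HTER", hter(*test, best_test))
```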

Page 20

Potential risk of using ROC

ROCs of two real models for a text-independent speaker verification task.

Looking at the curves only, model B seems to be better than model A.

Looking at the thresholds, A is actually the better model.

Page 21

Expected performance curve

The EPC presents a range of possible expected performance on the test set. The calculation takes the possible mismatch into account while estimating the desired threshold. A parameter α is used to model the possible mismatch of the threshold.

Framework:

Parametric performance measure: C(V1(θ, D), V2(θ, D); α). It depends on the parameter α, and on V1 and V2 computed on some data D for the threshold θ.

Example:
C(V1(θ, D), V2(θ, D); α) = C(Precision(θ, D), Recall(θ, D); α) = −(α · Precision(θ, D) + (1 − α) · Recall(θ, D))

Procedure: vary α inside a reasonable range; for each α, estimate the θ that minimizes C(·, ·; α) on a development set, and then use the obtained θ to compute V on the test set. Finally, plot V with respect to α.

Page 22

EPC Algorithm
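The slide's algorithm listing is not reproduced in this transcript. A minimal Python sketch of the procedure described on the previous page, assuming V1 = FAR, V2 = FRR, V = HTER and the criterion C = α·FAR + (1 − α)·FRR (one possible choice of measures; the data below is made up):

```python
# Sketch of the EPC procedure: for each alpha, estimate on the development
# set the threshold minimizing C = alpha*FAR + (1 - alpha)*FRR, then report
# V = HTER for that threshold on the test set. FAR/FRR/HTER is one possible
# choice of V1/V2/V; the data below is made up.

def error_rates(scores, labels, t):
    pos = sum(labels)
    neg = len(labels) - pos
    far = sum(1 for s, y in zip(scores, labels) if s >= t and y == 0) / neg
    frr = sum(1 for s, y in zip(scores, labels) if s < t and y == 1) / pos
    return far, frr

def epc(dev, test, alphas):
    dev_scores, dev_labels = dev
    points = []
    for alpha in alphas:
        def criterion(t):
            far, frr = error_rates(dev_scores, dev_labels, t)
            return alpha * far + (1 - alpha) * frr
        theta = min(sorted(set(dev_scores)), key=criterion)  # dev set only
        far, frr = error_rates(*test, theta)
        points.append((alpha, (far + frr) / 2))  # V = HTER on the test set
    return points

dev = ([0.9, 0.8, 0.6, 0.7, 0.4, 0.3], [1, 1, 1, 0, 0, 0])
test = ([0.85, 0.75, 0.5, 0.65, 0.45, 0.2], [1, 1, 1, 0, 0, 0])
for alpha, v in epc(dev, test, [0.1, 0.5, 0.9]):
    print(f"alpha={alpha:.1f}  test HTER={v:.3f}")
```

Plotting the resulting (α, V) pairs gives the EPC.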

Pages 23–27: Example (figure-only slides: the example continued, showing step by step how the EPC is calculated; images not reproduced in this transcript)

Page 28

Example of a typical EPC

α > 0.5: more importance is given to false acceptance errors

α < 0.5: more importance is given to false rejection errors

Page 29

EPC in real applications

Expected Performance Curves for person authentication, where one wants to trade off false acceptance rates against false rejection rates.

Expected Performance Curves for text categorization, where one wants to trade off precision against recall and plot the F1 measure.

Page 30

Confidence Interval

Confidence intervals are used to indicate the reliability of an estimate.
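One standard way to attach a confidence interval to an error-rate estimate is the bootstrap percentile method (a sketch; this is not necessarily the exact construction used in the paper):

```python
# Bootstrap percentile confidence interval for an error-rate estimate.
# A sketch of one standard construction; not necessarily the exact
# interval used in the paper.
import random

def bootstrap_ci(errors, n_resamples=1000, level=0.95, seed=0):
    """Percentile interval for the mean of per-example 0/1 errors."""
    rng = random.Random(seed)
    n = len(errors)
    means = sorted(
        sum(rng.choice(errors) for _ in range(n)) / n
        for _ in range(n_resamples)
    )
    lo = means[int((1 - level) / 2 * n_resamples)]
    hi = means[int((1 + level) / 2 * n_resamples) - 1]
    return lo, hi

errors = [0, 0, 1, 0, 0, 0, 1, 0, 0, 0] * 10  # 20% error rate, 100 examples
lo, hi = bootstrap_ci(errors)
print(f"95% CI for the error rate: [{lo:.2f}, {hi:.2f}]")
```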

Page 31

My opinion

• The authors have a point, and the idea is good.
• Good for comparing models…
• …but it is hard to read much from an EPC; a ROC is more informative.
• Cumbersome to compute an EPC.
• Useful… maybe? Apparently only used by the authors?

Page 32

End of Line

Questions
Discussion