week 2 video 2 - google sites · pdf file · 2015-08-29later this week we’ll...

39
Diagnostic Metrics, Part 1 Week 2 Video 2

Upload: lethuan

Post on 27-Mar-2018

217 views

Category:

Documents


3 download

TRANSCRIPT

Page 1: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Diagnostic Metrics, Part 1

Week 2 Video 2

Page 2: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Different Methods, Different Measures

¨  Today we’ll focus on metrics for classifiers ¨  Later this week we’ll discuss metrics for regressors

¨  And metrics for other methods will be discussed later in the course

Page 3: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Metrics for Classifiers

Page 4: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Accuracy

Page 5: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Accuracy

¨  One of the easiest measures of model goodness is accuracy

¨  Also called agreement, when measuring inter-rater reliability

# of agreements total number of codes/assessments

Page 6: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Accuracy

¨  There is general agreement across fields that accuracy is not a good metric

Page 7: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Accuracy

¨  Let’s say that my new Kindergarten Failure Detector achieves 92% accuracy

¨  Good, right?

Page 8: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Non-even assignment to categories

¨  Accuracy does poorly when there is non-even assignment to categories ¤ Which is almost always the case

¨  Imagine an extreme case ¤ 92% of students pass Kindergarten ¤ My detector always says PASS

¨  Accuracy of 92% ¨  But essentially no information

Page 9: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Kappa

Page 10: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

(Agreement – Expected Agreement) (1 – Expected Agreement)

Kappa

Page 11: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 12: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the percent agreement?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 13: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the percent agreement? •  80%

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 14: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is Data’s expected frequency for on-task?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 15: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is Data’s expected frequency for on-task? •  75%

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 16: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is Detector’s expected frequency for on-task?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 17: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is Detector’s expected frequency for on-task? •  65%

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 18: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected on-task agreement?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 19: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected on-task agreement? •  0.65*0.75= 0.4875

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60

Page 20: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected on-task agreement? •  0.65*0.75= 0.4875

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60 (48.75)

Page 21: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What are Data and Detector’s expected frequencies for

off-task behavior?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60 (48.75)

Page 22: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What are Data and Detector’s expected frequencies for off-task behavior? •  25% and 35%

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60 (48.75)

Page 23: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected off-task agreement?

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60 (48.75)

Page 24: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected off-task agreement? •  0.25*0.35= 0.0875

Detector Off-Task

Detector On-Task

Data Off-Task

20 5

Data On-Task

15 60 (48.75)

Page 25: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the expected off-task agreement? •  0.25*0.35= 0.0875

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 26: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the total expected agreement?

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 27: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is the total expected agreement? •  0.4875+0.0875 = 0.575

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 28: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is kappa?

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 29: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Computing Kappa (Simple 2x2 example)

•  What is kappa? •  (0.8 – 0.575) / (1-0.575) •  0.225/0.425 •  0.529

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 30: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

So is that any good?

•  What is kappa? •  (0.8 – 0.575) / (1-0.575) •  0.225/0.425 •  0.529

Detector Off-Task

Detector On-Task

Data Off-Task

20 (8.75) 5

Data On-Task

15 60 (48.75)

Page 31: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Interpreting Kappa

¨  Kappa = 0 ¤ Agreement is at chance

¨  Kappa = 1 ¤ Agreement is perfect

¨  Kappa = -1 ¤ Agreement is perfectly inverse

¨  Kappa > 1 ¤  You messed up somewhere

Page 32: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Kappa<0

¨  This means your model is worse than chance

¨  Very rare to see unless you’re using cross-validation ¨  Seen more commonly if you’re using cross-validation

¤  It means your model is junk

Page 33: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

0<Kappa<1

¨  What’s a good Kappa?

¨  There is no absolute standard

Page 34: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

0<Kappa<1

¨  For data mined models, ¤ Typically 0.3-0.5 is considered good enough to call the

model better than chance and publishable ¤  In affective computing, lower is still often OK

Page 35: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Why is there no standard?

¨  Because Kappa is scaled by the proportion of each category

¨  When one class is much more prevalent ¤ Expected agreement is higher than

¨  If classes are evenly balanced

Page 36: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Because of this…

¨  Comparing Kappa values between two data sets, in a principled fashion, is highly difficult ¤  It is OK to compare Kappa values within a data set

¨  A lot of work went into statistical methods for comparing Kappa values in the 1990s

¨  No real consensus

¨  Informally, you can compare two data sets if the proportions of each category are “similar”

Page 37: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Quiz

•  What is kappa? A: 0.645 B: 0.502 C: 0.700 D: 0.398

Detector Insult during Collaboration

Detector No Insult during Collaboration

Data Insult

16 7

Data No Insult

8 19

Page 38: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Quiz

•  What is kappa? A: 0.240 B: 0.947 C: 0.959 D: 0.007

Detector Academic Suspension

Detector No Academic Suspension

Data Suspension

1 2

Data No Suspension

4 141

Page 39: Week 2 Video 2 - Google Sites · PDF file · 2015-08-29Later this week we’ll discuss metrics for regressors ! ... Non-even assignment to categories ! ... Typically 0.3-0.5 is considered

Next lecture

¨  ROC curves ¨  A’ ¨  Precision ¨  Recall