Evaluation (practice)

Post on 19-Dec-2015




Predicting performance

Assume the estimated error rate is 25%. How close is this to the true error rate? It depends on the amount of test data.

Prediction is just like tossing a (biased!) coin: "head" is a "success", "tail" is an "error".

In statistics, a succession of independent events like this is called a Bernoulli process. Statistical theory provides us with confidence intervals for the true underlying proportion.


Confidence intervals

We can say: p lies within a certain specified interval with a certain specified confidence.

Example: S = 750 successes in N = 1000 trials. Estimated success rate: 75%. How close is this to the true success rate p?

Answer: with 80% confidence, p ∈ [73.2%, 76.7%].

Another example: S = 75 and N = 100. Estimated success rate: 75%. With 80% confidence, p ∈ [69.1%, 80.1%]. I.e. the probability that p ∈ [69.1%, 80.1%] is 0.8.

The bigger N is, the more confident we are, i.e. the surrounding interval is smaller. Above, for N = 100 we were less confident than for N = 1000.


Mean and Variance

Let Y be the random variable with possible values 1 for success and 0 for error. Let the probability of success be p. Then the probability of error is q = 1 − p.

What's the mean? 1·p + 0·q = p

What's the variance? (1 − p)²·p + (0 − p)²·q = q²·p + p²·q = pq(p + q) = pq
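The algebra above can be checked numerically for any p; a quick sketch with p = 0.25, the error rate from the opening example:

```python
p = 0.25
q = 1 - p

mean = 1 * p + 0 * q                         # E[Y] = p
var = (1 - p) ** 2 * p + (0 - p) ** 2 * q    # q^2*p + p^2*q, should equal pq
```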


Estimating p

Well, we don't know p. Our goal is to estimate p. For this we make N trials, i.e. tests. The more trials we do, the more confident we are.

Let S be the random variable denoting the number of successes, i.e. S is the sum of N sampled values of Y. Now, we approximate p with the success rate in N trials, i.e. S/N.

By the Central Limit Theorem, when N is big, the probability distribution of the random variable f = S/N is approximated by a normal distribution with mean p and variance pq/N.
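The Central Limit Theorem claim can be checked by simulation: draw many independent estimates f = S/N and compare their empirical mean and variance with p and pq/N. A sketch, with p = 0.25 and N = 1000 (both chosen here for illustration):

```python
import random

random.seed(0)

p, N, runs = 0.25, 1000, 2000

# Draw `runs` independent estimates f = S/N, each from N Bernoulli trials.
fs = []
for _ in range(runs):
    S = sum(1 for _ in range(N) if random.random() < p)
    fs.append(S / N)

mean_f = sum(fs) / runs
var_f = sum((f - mean_f) ** 2 for f in fs) / (runs - 1)
# CLT prediction: mean_f ~ p = 0.25, var_f ~ pq/N = 0.25*0.75/1000
```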


Estimating p

A c% confidence interval [−z ≤ X ≤ z] for a random variable X with 0 mean is given by: Pr[−z ≤ X ≤ z] = c

With a symmetric distribution: Pr[−z ≤ X ≤ z] = 1 − 2·Pr[X ≥ z]

Confidence limits for the normal distribution with 0 mean and a variance of 1 come from a standard table; e.g. Pr[X ≥ 1.65] = 5%.

Thus: Pr[−1.65 ≤ X ≤ 1.65] = 90%. To use this we have to reduce our random variable f = S/N to have 0 mean and unit variance.


Estimating p

Thus: Pr[−1.65 ≤ X ≤ 1.65] = 90%. To use this we have to reduce our random variable S/N to have 0 mean and unit variance:

Pr[−1.65 ≤ (S/N − p) / σ_{S/N} ≤ 1.65] = 90%

where σ_{S/N} = sqrt(pq/N). Now we solve two equations:

(S/N − p) / σ_{S/N} = 1.65
(S/N − p) / σ_{S/N} = −1.65


Estimating p

Let N = 100 and S = 70. σ_{S/N} is sqrt(pq/N), and we approximate it by sqrt(p′(1 − p′)/N), where p′ is the estimate of p, i.e. 0.7. So σ_{S/N} is approximated by sqrt(.7·.3/100) ≈ .046.

The two equations become:

(0.7 − p) / .046 = 1.65   →   p = .7 − 1.65·.046 = .624
(0.7 − p) / .046 = −1.65  →   p = .7 + 1.65·.046 = .776

Thus, we say: with 90% confidence, the success rate p of the classifier satisfies

0.624 ≤ p ≤ 0.776
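The arithmetic on this slide fits in a few lines (a sketch of the normal approximation, with z = 1.65 for 90% confidence):

```python
import math

N, S = 100, 70
z = 1.65                                   # 90% two-sided confidence

f = S / N                                  # estimated success rate, 0.7
sigma = math.sqrt(f * (1 - f) / N)         # approximates sqrt(pq/N), ~0.046
lo = f - z * sigma                         # ~0.624
hi = f + z * sigma                         # ~0.776
```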


Exercise

Suppose I want to be 95% confident in my estimation. Looking at a detailed table we find: Pr[−2 ≤ X ≤ 2] ≈ 95%.

Normalizing S/N, we need to solve:

(S/N − p) / σ_f = 2
(S/N − p) / σ_f = −2

We approximate σ_f with sqrt((1/N)·(S/N)·(1 − S/N)), where p′ = S/N is the estimate of p through trials.

So we need to solve:

(S/N − p) / sqrt((1/N)·(S/N)·(1 − S/N)) = ±2

So,

p = S/N ± 2·sqrt((1/N)·(S/N)·(1 − S/N))


Exercise

Suppose N = 1000 trials, S = 590 successes. Then p′ = S/N = 590/1000 = .59, and

p = S/N ± 2·sqrt((1/N)·(S/N)·(1 − S/N)) = .59 ± 2·sqrt(.59·.41/1000) = .59 ± 2·.0155 = .59 ± .031
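Carrying out this exercise numerically (a sketch, with z = 2 for roughly 95% confidence):

```python
import math

N, S = 1000, 590
z = 2                                      # ~95% two-sided confidence

f = S / N                                  # p' = 0.59
half = z * math.sqrt(f * (1 - f) / N)      # 2 * .0155 ~ .031
lo, hi = f - half, f + half                # p in roughly [.559, .621]
```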


Cross-validation

k-fold cross-validation:

First step: split the data into k subsets of equal size.

Second step: use each subset in turn for testing and the remainder for training.

The k error estimates are averaged to yield an overall error estimate.
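The two steps above can be sketched in a few lines; `train_and_test` is a hypothetical callback standing in for whatever scheme fits a model on the training part and returns its error rate on the test part:

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k roughly equal, disjoint test folds."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in fold_sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds

def cross_validation_error(data, k, train_and_test):
    """Average the k per-fold error estimates.

    `train_and_test(train, test)` is a hypothetical callback that fits a
    model on `train` and returns its error rate on `test`.
    """
    folds = k_fold_indices(len(data), k)
    errors = []
    for test_idx in folds:
        test_set = set(test_idx)
        train = [data[i] for i in range(len(data)) if i not in test_set]
        test = [data[i] for i in test_idx]
        errors.append(train_and_test(train, test))
    return sum(errors) / k
```

In practice the data is shuffled (often stratified by class) before splitting; the sketch keeps the original order for simplicity.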


Comparing data mining schemes

Frequent question: which of two learning schemes performs better?

Obvious way: compare, for example, 10-fold cross-validation estimates.

Problem: variance in the estimate. We don't know whether the results are reliable; we need to use a statistical test for that.


Paired t-test

Student's t-test tells us whether the means of two samples are significantly different. In our case the samples are cross-validation estimates for different datasets from the domain.

We use a paired t-test because the individual samples are paired: the same cross-validation splits are applied to both schemes.


Distribution of the means

x1, x2, …, xk and y1, y2, …, yk are the 2k samples for the k different datasets; mx and my are the means.

With enough samples, the mean of a set of independent samples is normally distributed. The estimated variances of the means are sx²/k and sy²/k.

If μx and μy are the true means, then the following are approximately normally distributed with mean 0 and variance 1:

(mx − μx) / sqrt(sx²/k)    and    (my − μy) / sqrt(sy²/k)


Student's distribution

With small samples (k < 30) the mean follows Student's distribution with k − 1 degrees of freedom: a similar shape to the normal distribution, but wider.

Confidence limits (mean 0 and variance 1) come from a table of the t distribution; for example, with 9 degrees of freedom the two-tailed 90% critical value is 1.83.


Distribution of the differences

Let md = mx − my. The difference of the means (md) also has a Student's distribution with k − 1 degrees of freedom.

Let sd² be the estimated variance of the difference. The standardized version of md is called the t-statistic:

t = md / sqrt(sd²/k)


Performing the test

Fix a significance level α. If a difference is significant at the α% level, there is a (100 − α)% chance that the true means differ.

Divide the significance level by two because the test is two-tailed. Look up the value z that corresponds to α/2.

If t ≤ −z or t ≥ z, then the difference is significant, i.e. the null hypothesis (that the difference is zero) can be rejected.


Example

We have compared two classifiers through cross-validation on 10 different datasets. The error rates are:

Dataset   Classifier A   Classifier B   Difference
   1          10.6           10.2           .4
   2           9.8            9.4           .4
   3          12.3           11.8           .5
   4           9.7            9.1           .6
   5           8.8            8.3           .5
   6          10.6           10.2           .4
   7           9.8            9.4           .4
   8          12.3           11.8           .5
   9           9.7            9.1           .6
  10           8.8            8.3           .5


Example

md = 0.48
sd = 0.0789

t = md / sqrt(sd²/k) = 0.48 / sqrt(0.0789²/10) = 19.24

The critical value of t for a two-tailed statistical test with α = 10% and 9 degrees of freedom is 1.83. Since 19.24 is far bigger than 1.83, classifier B is significantly better than A.
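The whole computation for this example fits in a few lines:

```python
import math

# Per-dataset error rates of the two classifiers (from the table above).
a = [10.6, 9.8, 12.3, 9.7, 8.8, 10.6, 9.8, 12.3, 9.7, 8.8]
b = [10.2, 9.4, 11.8, 9.1, 8.3, 10.2, 9.4, 11.8, 9.1, 8.3]

d = [x - y for x, y in zip(a, b)]          # per-dataset differences
k = len(d)
m_d = sum(d) / k                           # mean difference, 0.48
s_d = math.sqrt(sum((x - m_d) ** 2 for x in d) / (k - 1))  # ~0.0789
t = m_d / math.sqrt(s_d ** 2 / k)          # t-statistic, ~19.24

# Two-tailed critical value for alpha = 10%, 9 degrees of freedom: 1.83
significant = abs(t) > 1.83
```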


Dependent estimates

We assumed that we have enough data to create several datasets of the desired size. We need to reuse data if that's not the case, e.g. running cross-validations with different randomizations on the same data.

The samples then become dependent, and insignificant differences can become significant.

A heuristic test for this case is the corrected resampled t-test. Assume we use the repeated holdout method, with n1 instances for training and n2 for testing. The new test statistic is:

t = md / sqrt((1/k + n2/n1)·sd²)
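A sketch of the corrected statistic, reusing the differences from the earlier example purely for illustration (treating them, hypothetically, as coming from 10 repeated 90%/10% holdout runs):

```python
import math

def corrected_resampled_t(diffs, n1, n2):
    """Corrected resampled t-test statistic.

    `diffs` holds the k per-run differences from repeated holdout with
    n1 training and n2 test instances; the 1/k factor of the standard
    paired t-test is replaced by 1/k + n2/n1 to compensate for the
    dependence between overlapping runs.
    """
    k = len(diffs)
    m_d = sum(diffs) / k
    var_d = sum((x - m_d) ** 2 for x in diffs) / (k - 1)
    return m_d / math.sqrt((1 / k + n2 / n1) * var_d)

# Hypothetical setting: the 10 differences above, 90 train / 10 test.
diffs = [0.4, 0.4, 0.5, 0.6, 0.5, 0.4, 0.4, 0.5, 0.6, 0.5]
t_corrected = corrected_resampled_t(diffs, n1=90, n2=10)
```

Because 1/k + n2/n1 > 1/k, the corrected statistic is always smaller than the standard one (here about 13.2 versus 19.24), making it harder for a difference to appear significant.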