INTRODUCTION TO
Machine Learning
ETHEM ALPAYDIN © The MIT Press, 2004
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml
Lecture Slides for
CHAPTER 14:
Assessing and Comparing Classification Algorithms
Introduction
Questions:
Assessment of the expected error of a learning algorithm: Is the error rate of 1-NN less than 2%?
Comparing the expected errors of two algorithms: Is k-NN more accurate than MLP?
Training/validation/test sets
Resampling methods: K-fold cross-validation
Algorithm Preference
Criteria (Application-dependent):
Misclassification error, or risk (loss functions)
Training time/space complexity
Testing time/space complexity
Interpretability
Easy programmability
Cost-sensitive learning
Resampling and K-Fold Cross-Validation
The need for multiple training/validation sets: {Ti, Vi}i, the training/validation set pair of fold i
K-fold cross-validation: divide X into K equal parts Xi, i = 1, ..., K; each part serves once as the validation set while the remaining parts form the training set, so any two training sets Ti share K−2 parts:

$$\begin{aligned}
\mathcal{V}_1 &= \mathcal{X}_1, \quad \mathcal{T}_1 = \mathcal{X}_2 \cup \mathcal{X}_3 \cup \cdots \cup \mathcal{X}_K \\
\mathcal{V}_2 &= \mathcal{X}_2, \quad \mathcal{T}_2 = \mathcal{X}_1 \cup \mathcal{X}_3 \cup \cdots \cup \mathcal{X}_K \\
&\;\;\vdots \\
\mathcal{V}_K &= \mathcal{X}_K, \quad \mathcal{T}_K = \mathcal{X}_1 \cup \mathcal{X}_2 \cup \cdots \cup \mathcal{X}_{K-1}
\end{aligned}$$
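As a minimal sketch of generating these folds (assuming NumPy; the helper name `k_fold_splits` is illustrative, not from the book):

```python
import numpy as np

def k_fold_splits(N, K, seed=0):
    """Yield (T_i, V_i) index pairs for K-fold cross-validation."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(N)           # shuffle the N instance indices once
    parts = np.array_split(idx, K)     # the parts X_1, ..., X_K
    for i in range(K):
        val = parts[i]                 # V_i = X_i
        train = np.concatenate([parts[j] for j in range(K) if j != i])
        yield train, val               # T_i = union of the remaining parts
```

Any two of the yielded training sets share K−2 of the K parts, as in the equations above.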
5×2 Cross-Validation
5 times 2-fold cross-validation (Dietterich, 1998): 5 replications of 2-fold cross-validation, where each half of the data serves once as the training and once as the validation set:

$$\begin{aligned}
\mathcal{T}_1 &= \mathcal{X}_1^{(1)}, \quad \mathcal{V}_1 = \mathcal{X}_1^{(2)} \\
\mathcal{T}_2 &= \mathcal{X}_1^{(2)}, \quad \mathcal{V}_2 = \mathcal{X}_1^{(1)} \\
\mathcal{T}_3 &= \mathcal{X}_2^{(1)}, \quad \mathcal{V}_3 = \mathcal{X}_2^{(2)} \\
\mathcal{T}_4 &= \mathcal{X}_2^{(2)}, \quad \mathcal{V}_4 = \mathcal{X}_2^{(1)} \\
&\;\;\vdots \\
\mathcal{T}_9 &= \mathcal{X}_5^{(1)}, \quad \mathcal{V}_9 = \mathcal{X}_5^{(2)} \\
\mathcal{T}_{10} &= \mathcal{X}_5^{(2)}, \quad \mathcal{V}_{10} = \mathcal{X}_5^{(1)}
\end{aligned}$$
Bootstrapping
Draw N instances from a dataset of size N with replacement
The probability that we do not pick a particular instance after N draws is

$$\left(1 - \frac{1}{N}\right)^{N} \approx e^{-1} = 0.368$$

that is, only 36.8% of the instances are never drawn; the bootstrap sample contains roughly 63.2% of the distinct instances.
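A quick simulation of this fact (hypothetical N; NumPy assumed):

```python
import numpy as np

rng = np.random.default_rng(0)
N = 100_000                              # hypothetical dataset size
sample = rng.integers(0, N, size=N)      # N draws with replacement
print(np.unique(sample).size / N)        # ~0.632 distinct, ~0.368 never drawn
```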
Measuring Error
Error rate = # of errors / # of instances = (FN + FP) / N
Recall = # of found positives / # of positives = TP / (TP + FN) = sensitivity = hit rate
Precision = # of found positives / # of found = TP / (TP + FP)
Specificity = TN / (TN + FP)
False alarm rate = FP / (FP + TN) = 1 − Specificity
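These definitions translate directly into code; a minimal sketch (the function name and dict layout are illustrative):

```python
def confusion_measures(tp, fp, tn, fn):
    """Error measures from the entries of a 2x2 confusion matrix."""
    n = tp + fp + tn + fn
    return {
        "error rate":       (fn + fp) / n,
        "recall":           tp / (tp + fn),   # sensitivity, hit rate
        "precision":        tp / (tp + fp),
        "specificity":      tn / (tn + fp),
        "false alarm rate": fp / (fp + tn),   # = 1 - specificity
    }
```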
ROC Curve
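The slide shows an ROC curve: the true positive rate (hit rate) plotted against the false positive rate (false alarm rate) as the decision threshold is varied. A minimal sketch of computing its points (NumPy assumed; `roc_points` is an illustrative helper, not from the book):

```python
import numpy as np

def roc_points(scores, labels):
    """Return (false alarm rate, hit rate) arrays by sweeping the threshold."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    order = np.argsort(-scores)        # sort by descending classifier score
    labels = labels[order]             # 1 = positive, 0 = negative
    tp = np.cumsum(labels)             # true positives above each threshold
    fp = np.cumsum(1 - labels)         # false positives above each threshold
    return fp / (1 - labels).sum(), tp / labels.sum()
```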
Interval Estimation
$\mathcal{X} = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
$m = \sum_t x^t / N \sim \mathcal{N}(\mu, \sigma^2/N)$

$$\sqrt{N}\,\frac{(m-\mu)}{\sigma} \sim Z$$

$$P\left\{-1.96 < \sqrt{N}\,\frac{(m-\mu)}{\sigma} < 1.96\right\} = 0.95$$

$$P\left\{m - 1.96\,\frac{\sigma}{\sqrt{N}} < \mu < m + 1.96\,\frac{\sigma}{\sqrt{N}}\right\} = 0.95$$

$$P\left\{m - z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}} < \mu < m + z_{\alpha/2}\,\frac{\sigma}{\sqrt{N}}\right\} = 1 - \alpha$$

which is a 100(1−α) percent confidence interval for μ.
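A sketch of the two-sided z interval in code (NumPy/SciPy assumed; `z_interval` is an illustrative name):

```python
import numpy as np
from scipy import stats

def z_interval(x, sigma, alpha=0.05):
    """100(1 - alpha)% two-sided interval for mu when sigma is known."""
    m, N = np.mean(x), len(x)
    z = stats.norm.ppf(1 - alpha / 2)      # z_{alpha/2}; 1.96 for alpha = 0.05
    half = z * sigma / np.sqrt(N)
    return m - half, m + half
```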
One-sided:

$$P\left\{\sqrt{N}\,\frac{(m-\mu)}{\sigma} < 1.64\right\} = 0.95$$

$$P\left\{m - 1.64\,\frac{\sigma}{\sqrt{N}} < \mu\right\} = 0.95$$

$$P\left\{m - z_{\alpha}\,\frac{\sigma}{\sqrt{N}} < \mu\right\} = 1 - \alpha$$

When σ² is not known:

$$S^2 = \frac{\sum_t \left(x^t - m\right)^2}{N-1} \qquad \sqrt{N}\,\frac{(m-\mu)}{S} \sim t_{N-1}$$

$$P\left\{m - t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}} < \mu < m + t_{\alpha/2,\,N-1}\,\frac{S}{\sqrt{N}}\right\} = 1 - \alpha$$
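And the t-based counterpart when σ is estimated by S (same NumPy/SciPy assumptions; `t_interval` is illustrative):

```python
import numpy as np
from scipy import stats

def t_interval(x, alpha=0.05):
    """100(1 - alpha)% two-sided interval for mu when sigma is unknown."""
    m, N = np.mean(x), len(x)
    S = np.std(x, ddof=1)                    # divides by N - 1, as on the slide
    t = stats.t.ppf(1 - alpha / 2, df=N - 1)
    half = t * S / np.sqrt(N)
    return m - half, m + half
```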
Hypothesis Testing
Reject a null hypothesis if it is not supported by the sample with enough confidence
$\mathcal{X} = \{x^t\}_t$ where $x^t \sim \mathcal{N}(\mu, \sigma^2)$
H0: μ = μ0 vs. H1: μ ≠ μ0
Accept H0 with level of significance α if μ0 is in the 100(1−α) percent confidence interval
Two-sided test: accept if

$$\sqrt{N}\,\frac{(m-\mu_0)}{\sigma} \in \left(-z_{\alpha/2},\; z_{\alpha/2}\right)$$
One-sided test: H0: μ ≤ μ0 vs. H1: μ > μ0
Accept if

$$\sqrt{N}\,\frac{(m-\mu_0)}{\sigma} \in \left(-\infty,\; z_{\alpha}\right)$$

Variance unknown: use t instead of z
Accept H0: μ = μ0 if

$$\sqrt{N}\,\frac{(m-\mu_0)}{S} \in \left(-t_{\alpha/2,\,N-1},\; t_{\alpha/2,\,N-1}\right)$$
Assessing Error: H0: p ≤ p0 vs. H1: p > p0
Single training/validation set: Binomial Test
If the error probability is p0, the probability that there are e errors or fewer in N validation trials is

$$P\{X \le e\} = \sum_{j=0}^{e} \binom{N}{j}\, p_0^j \left(1-p_0\right)^{N-j}$$

Accept H0 if this probability is less than 1−α.
[Figure: binomial distribution of the number of errors for N = 100, with e = 20 and the 1−α region marked]
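A sketch of the test with SciPy (N and e follow the slide's illustration; p0 here is a made-up claimed error probability):

```python
from scipy.stats import binom

N, e, p0, alpha = 100, 20, 0.15, 0.05   # p0 = 0.15 is a hypothetical H0 value
prob = binom.cdf(e, N, p0)              # P{X <= e} under H0: p = p0
accept_h0 = prob < 1 - alpha            # accept H0: p <= p0, as on the slide
```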
Normal Approximation to the Binomial
The number of errors X is approximately normal with mean Np0 and variance Np0(1−p0):

$$\frac{X - Np_0}{\sqrt{Np_0\left(1-p_0\right)}} \;\sim\; Z \quad \text{(approximately)}$$

Accept H0 if this statistic, evaluated at X = e, is less than z_{1−α}.
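The same decision under the normal approximation (same hypothetical numbers as the binomial sketch above):

```python
import numpy as np
from scipy.stats import norm

N, e, p0, alpha = 100, 20, 0.15, 0.05
z = (e - N * p0) / np.sqrt(N * p0 * (1 - p0))   # standardized error count
accept_h0 = z < norm.ppf(1 - alpha)             # accept if below z_{1-alpha}
```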
Paired t Test
Multiple training/validation sets
$x_i^t = 1$ if instance t is misclassified on fold i
Error rate of fold i:

$$p_i = \frac{\sum_{t=1}^{N} x_i^t}{N}$$

With m and S² the average and variance of the p_i, we accept that the error is p0 or less if

$$\frac{\sqrt{K}\left(m - p_0\right)}{S} \sim t_{K-1}$$

is less than t_{α,K−1}.
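As a sketch (NumPy/SciPy assumed; `p` holds the K per-fold error rates and `error_t_test` is an illustrative name):

```python
import numpy as np
from scipy import stats

def error_t_test(p, p0, alpha=0.05):
    """Accept H0: error <= p0 from per-fold error rates p (one-sided t test)."""
    K = len(p)
    m, S = np.mean(p), np.std(p, ddof=1)
    t_stat = np.sqrt(K) * (m - p0) / S
    return t_stat < stats.t.ppf(1 - alpha, df=K - 1)   # below t_{alpha, K-1}
```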
Comparing Classifiers: H0: µ0 = µ1 vs. H1: µ0 ≠ µ1
Single training/validation set: McNemar's Test
e01: number of validation instances misclassified by classifier 1 but not by classifier 2; e10: number misclassified by 2 but not by 1
Under H0, we expect e01 = e10 = (e01 + e10)/2
Accept H0 if

$$\frac{\left(\left|e_{01} - e_{10}\right| - 1\right)^2}{e_{01} + e_{10}} \sim \chi^2_1$$

is less than χ²_{α,1}.
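A direct transcription of the test (SciPy assumed; `mcnemar_accept` is an illustrative name):

```python
from scipy.stats import chi2

def mcnemar_accept(e01, e10, alpha=0.05):
    """Accept H0 (equal error) from the two disagreement counts."""
    stat = (abs(e01 - e10) - 1) ** 2 / (e01 + e10)   # continuity-corrected
    return stat < chi2.ppf(1 - alpha, df=1)          # below X^2_{alpha,1}
```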
K-Fold CV Paired t Test
Use K-fold cross-validation to get K training/validation folds
$p_i^1, p_i^2$: errors of classifiers 1 and 2 on fold i
$p_i = p_i^1 - p_i^2$: paired difference on fold i
The null hypothesis is that p_i has mean 0:

$$H_0: \mu = 0 \quad \text{vs.} \quad H_1: \mu \neq 0$$

$$m = \frac{\sum_{i=1}^{K} p_i}{K} \qquad s^2 = \frac{\sum_{i=1}^{K} \left(p_i - m\right)^2}{K-1}$$

$$\frac{\sqrt{K}\,(m - 0)}{s} = \frac{\sqrt{K}\,m}{s} \sim t_{K-1}$$

Accept H0 if this statistic is in (−t_{α/2,K−1}, t_{α/2,K−1}).
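A sketch of the whole test given the two classifiers' per-fold errors (names illustrative):

```python
import numpy as np
from scipy import stats

def cv_paired_t(p1, p2, alpha=0.05):
    """Accept H0: mean paired difference is 0, from per-fold errors p1, p2."""
    p = np.asarray(p1) - np.asarray(p2)    # paired differences p_i
    K = len(p)
    m, s = np.mean(p), np.std(p, ddof=1)
    t_stat = np.sqrt(K) * m / s
    crit = stats.t.ppf(1 - alpha / 2, df=K - 1)
    return -crit < t_stat < crit           # two-sided acceptance region
```

The same statistic is what `scipy.stats.ttest_rel(p1, p2)` computes, so the library call can replace the hand-rolled version.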
5×2 cv Paired t Test
Use 5×2 cross-validation to get 5 replications of 2 training/validation folds (Dietterich, 1998)
$p_i^{(j)}$: difference between the error rates of classifiers 1 and 2 on fold j = 1, 2 of replication i = 1, ..., 5

$$\bar{p}_i = \frac{p_i^{(1)} + p_i^{(2)}}{2} \qquad s_i^2 = \left(p_i^{(1)} - \bar{p}_i\right)^2 + \left(p_i^{(2)} - \bar{p}_i\right)^2$$

$$\frac{p_1^{(1)}}{\sqrt{\sum_{i=1}^{5} s_i^2 \,/\, 5}} \sim t_5$$

Two-sided test: accept H0: μ0 = μ1 if the statistic is in (−t_{α/2,5}, t_{α/2,5})
One-sided test: accept H0: μ0 ≤ μ1 if it is less than t_{α,5}
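A sketch with the differences arranged in a 5×2 array (the layout and names are illustrative choices):

```python
import numpy as np
from scipy import stats

def five_two_cv_t(p, alpha=0.05):
    """5x2 cv paired t test; p[i, j] is the error difference on fold j of rep i."""
    p = np.asarray(p, dtype=float)            # shape (5, 2)
    pbar = p.mean(axis=1)                     # mean of the two fold differences
    s2 = (p[:, 0] - pbar) ** 2 + (p[:, 1] - pbar) ** 2
    t_stat = p[0, 0] / np.sqrt(s2.mean())     # p_1^(1) / sqrt(sum s_i^2 / 5)
    crit = stats.t.ppf(1 - alpha / 2, df=5)
    return -crit < t_stat < crit              # two-sided acceptance
```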
5×2 cv Paired F Test
Two-sided test: accept H0: μ0 = μ1 if

$$\frac{\sum_{i=1}^{5} \sum_{j=1}^{2} \left(p_i^{(j)}\right)^2}{2 \sum_{i=1}^{5} s_i^2} \sim F_{10,5}$$

is less than F_{α,10,5}.
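The corresponding F-test sketch (same 5×2 array layout as the t-test sketch above):

```python
import numpy as np
from scipy import stats

def five_two_cv_f(p, alpha=0.05):
    """5x2 cv paired F test on the (5, 2) array of error differences."""
    p = np.asarray(p, dtype=float)
    pbar = p.mean(axis=1)
    s2 = (p[:, 0] - pbar) ** 2 + (p[:, 1] - pbar) ** 2
    f_stat = (p ** 2).sum() / (2 * s2.sum())   # sum of squares over variances
    return f_stat < stats.f.ppf(1 - alpha, dfn=10, dfd=5)
```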
Comparing L>2 Algorithms: Analysis of Variance (ANOVA)
Errors of L algorithms on K folds:

$$X_{ij} \sim \mathcal{N}\!\left(\mu_j, \sigma^2\right), \quad j = 1, \ldots, L, \;\; i = 1, \ldots, K$$

$$H_0: \mu_1 = \mu_2 = \cdots = \mu_L$$

We construct two estimators of σ². One is valid only if H0 is true; the other is always valid. We reject H0 if the two estimators disagree.
If H0 is true:

$$m_j = \frac{\sum_{i=1}^{K} X_{ij}}{K} \sim \mathcal{N}\!\left(\mu, \sigma^2/K\right)$$

$m = \sum_j m_j / L$ is an estimator of μ, and $\sum_j (m_j - m)^2/(L-1)$ estimates σ²/K. Thus an estimator of σ² is K times this, namely,

$$\hat{\sigma}^2 = K\,\frac{\sum_j \left(m_j - m\right)^2}{L-1}$$

So when H0 is true, we have

$$SSb \equiv K \sum_j \left(m_j - m\right)^2 \qquad \frac{SSb}{\sigma^2} \sim \chi^2_{L-1}$$
Regardless of H0, our second estimator of σ² is the average of the group variances S_j²:

$$S_j^2 = \frac{\sum_{i=1}^{K} \left(X_{ij} - m_j\right)^2}{K-1} \qquad \hat{\sigma}^2 = \frac{\sum_j S_j^2}{L} = \frac{\sum_j \sum_i \left(X_{ij} - m_j\right)^2}{L(K-1)}$$

$$SSw \equiv \sum_j \sum_i \left(X_{ij} - m_j\right)^2 \qquad \frac{SSw}{\sigma^2} \sim \chi^2_{L(K-1)}$$

$$\frac{SSb/(L-1)}{SSw/(L(K-1))} \sim F_{L-1,\,L(K-1)}$$

Accept H0: μ1 = μ2 = ... = μL if this statistic is less than F_{α, L−1, L(K−1)}.
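A sketch of the full computation, with X arranged as an L×K array of per-fold errors (names illustrative):

```python
import numpy as np
from scipy import stats

def anova_accept(X, alpha=0.05):
    """Accept H0: mu_1 = ... = mu_L from an (L, K) array of fold errors."""
    X = np.asarray(X, dtype=float)
    L, K = X.shape
    m_j = X.mean(axis=1)                     # per-algorithm means m_j
    m = m_j.mean()                           # overall mean
    SSb = K * ((m_j - m) ** 2).sum()         # between-group sum of squares
    SSw = ((X - m_j[:, None]) ** 2).sum()    # within-group sum of squares
    F = (SSb / (L - 1)) / (SSw / (L * (K - 1)))
    return F < stats.f.ppf(1 - alpha, dfn=L - 1, dfd=L * (K - 1))
```

For reference, `scipy.stats.f_oneway(*X)` computes the same F statistic from the rows.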
Other Tests
Range test (Newman-Keuls)
Nonparametric tests (Sign test, Kruskal-Wallis)
Contrasts: check if 1 and 2 differ from 3, 4, and 5
Multiple comparisons require Bonferroni correction: if there are m tests, to have an overall significance of α, each test should have a significance of α/m
Regression: the CLT states that the sum of iid variables from any distribution is approximately normal, so the preceding methods can be used
Other loss functions?