ML 2012 Lecture 02a - Free University of Bozen-Bolzano
Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Learning a Class from Examples
• Class C of a "family car"
• Prediction: Is car x a family car?
• Knowledge extraction: What do people expect from a family car?
• Output of the classification target function: Positive (+) and negative (−) examples
• Input representation: x1: price, x2: engine power
Training set X

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$

$$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$$

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
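A minimal sketch of how such a training set might be represented in Python with NumPy (the array layout and the example values are illustrative assumptions, not from the slides):

```python
import numpy as np

# Each row is one instance x^t = [price, engine power]; values are made up.
X = np.array([
    [16000.0, 150.0],   # family car  -> r = 1 (positive)
    [32000.0, 280.0],   # sports car  -> r = 0 (negative)
    [14000.0, 120.0],   # family car  -> r = 1 (positive)
])
r = np.array([1, 0, 1])  # labels r^t

N = X.shape[0]  # number of training examples
```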
Class C

$$C: (p_1 \le \text{price} \le p_2) \text{ AND } (e_1 \le \text{engine power} \le e_2)$$
Hypothesis class H

$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$$

$$H: (p' \le \text{price} \le p'') \text{ AND } (e' \le \text{engine power} \le e'')$$

Error of h on X

$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\big(h(\mathbf{x}^t) \ne r^t\big)$$

$$1(a \ne b) = \begin{cases} 1 & \text{if } a \ne b \\ 0 & \text{if } a = b \end{cases}$$
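A sketch of a rectangle hypothesis and its empirical error (the function names and the (p1, p2, e1, e2) parameterization are assumptions for illustration):

```python
import numpy as np

def h(X, p1, p2, e1, e2):
    # Rectangle hypothesis: predict 1 iff price and engine power fall
    # inside the axis-aligned rectangle, 0 otherwise.
    price, power = X[:, 0], X[:, 1]
    return ((p1 <= price) & (price <= p2) &
            (e1 <= power) & (power <= e2)).astype(int)

def empirical_error(X, r, p1, p2, e1, e2):
    # E(h|X) = sum_t 1(h(x^t) != r^t): count of misclassified examples.
    return int(np.sum(h(X, p1, p2, e1, e2) != r))
```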
S, G, and the Version Space
• most specific hypothesis, S: the tightest rectangle that includes all positive examples and none of the negative ones
• most general hypothesis, G: the largest such rectangle
• Every h ∈ H between S and G is consistent with the training set, and together these hypotheses make up the version space (Mitchell, 1997)
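As an illustration, a minimal sketch of computing S for the rectangle hypothesis class (assumes the X, r arrays from the earlier sketch and at least one positive example):

```python
import numpy as np

def most_specific_hypothesis(X, r):
    # S: the tightest axis-aligned rectangle around the positive examples.
    pos = X[r == 1]
    p1, e1 = pos.min(axis=0)   # lower bounds on price, engine power
    p2, e2 = pos.max(axis=0)   # upper bounds
    return p1, p2, e1, e2
```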
Margin
• Choose the h with the largest margin, i.e., the largest distance between the boundary of h and the closest instances on either side
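One crude way to measure such a margin for a rectangle hypothesis, as a sketch (the slides define margin only pictorially, so this per-axis distance is an illustrative assumption):

```python
import numpy as np

def rectangle_margin(X, p1, p2, e1, e2):
    # Smallest per-axis distance from any instance to one of the
    # rectangle's four edges; larger values mean a safer boundary.
    price, power = X[:, 0], X[:, 1]
    d = np.minimum.reduce([np.abs(price - p1), np.abs(p2 - price),
                           np.abs(power - e1), np.abs(e2 - power)])
    return float(d.min())
```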
Noise and Model Complexity
Use the simpler one because it is
• Simpler to use (lower computational complexity)
• Easier to train (lower space complexity)
• Easier to explain (more interpretable)
• Generalizes better (lower variance; Occam's razor)
Multiple Classes, $C_i$, $i = 1, \dots, K$

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$$

$$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

Train hypotheses $h_i(\mathbf{x})$, $i = 1, \dots, K$:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

$$E(\{h_i\}_{i=1}^{K} \mid \mathcal{X}) = \sum_{t=1}^{N} \sum_{i=1}^{K} 1\big(h_i(\mathbf{x}^t) \ne r_i^t\big)$$

reject: when no $h_i$, or more than one $h_i$, claims an instance, the classifier rejects it
Regression

$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \qquad r^t \in \mathbb{R}$$

$$r^t = f(x^t) + \varepsilon$$

Error of g on X:

$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - g(x^t)\big]^2$$

$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - (w_1 x^t + w_0)\big]^2$$

Linear and quadratic models:

$$g(x) = w_1 x + w_0$$

$$g(x) = w_2 x^2 + w_1 x + w_0$$
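A brief sketch of fitting the linear model by least squares and computing E(g|X) (np.polyfit is used as one convenient way to get the closed-form fit; the interface is illustrative):

```python
import numpy as np

def fit_linear(x, r):
    # Least-squares estimates of w1, w0 minimizing E(w1, w0 | X).
    w1, w0 = np.polyfit(x, r, deg=1)
    return w1, w0

def regression_error(x, r, w1, w0):
    # E(g | X) = (1/N) * sum_t (r^t - g(x^t))^2
    g = w1 * x + w0
    return float(np.mean((r - g) ** 2))
```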
Model Selection & Generalization
• Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution
  • Examining all available examples of family and non-family cars does not yield a unique classification function that works on every instance
• The need for inductive bias: assumptions about H
  • H is a rectangle in the input space
  • H is a linear function
• Model selection: the choice of the right bias
• Generalization: how well a model performs on new data
  • Underfitting: H less complex than C or f
  • Overfitting: H more complex than C or f (a small sketch of both follows)
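A toy sketch of under- and overfitting by varying polynomial degree (the target function, noise level, and degrees are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
r = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy samples of f
x_new = np.linspace(0, 1, 200)
f_new = np.sin(2 * np.pi * x_new)                        # new data from f

for deg in (1, 3, 15):   # too simple, about right, too complex
    w = np.polyfit(x, r, deg)
    train_err = np.mean((r - np.polyval(w, x)) ** 2)
    new_err = np.mean((f_new - np.polyval(w, x_new)) ** 2)
    print(f"degree {deg:2d}: training error {train_err:.4f}, "
          f"error on new data {new_err:.4f}")
```

Training error keeps falling as the degree grows, while error on new data first falls and then rises: the underfitting/overfitting pattern described above.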
Triple Trade-Off
• There is a trade-off between three factors (Dietterich, 2003):
  1. the complexity of H, i.e., its capacity to represent good target functions;
  2. the size N of the training set;
  3. the generalization error, E, on new data.
• As N increases, E decreases
• As the complexity of H increases, E first decreases and then increases
Cross-Validation
• To estimate the generalization error of different models, we need data unseen during training.
• We split the data X as follows (a code sketch follows the list):
  • Training set (50%): the learning algorithm uses this data to find the best model in each hypothesis class
  • Validation set (25%): the generalization error of the best models is estimated on it, and the model with minimal error is selected
  • Test set (25%): the selected model is tested on new data to evaluate its performance
• Resampling when data are scarce: the split can be done several times and the generalization error of each model averaged.
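A minimal sketch of the 50/25/25 split (the shuffling and the function interface are illustrative assumptions; the proportions come from the slide):

```python
import numpy as np

def split_data(X, r, seed=0):
    # Shuffle once, then carve out 50% train / 25% validation / 25% test.
    idx = np.random.default_rng(seed).permutation(len(r))
    n_tr, n_va = len(r) // 2, len(r) // 4
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], r[tr]), (X[va], r[va]), (X[te], r[te])
```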
Dimensions of a Supervised Learner
• Learning problem
  • Given a training set: $\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$
  • Build a good and useful approximation to $r^t$ using the model $g(\mathbf{x}^t \mid \theta)$
• Decisions
  1. Model: $g(\mathbf{x} \mid \theta)$
  2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\big(r^t, g(\mathbf{x}^t \mid \theta)\big)$
  3. Optimization procedure: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$
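To make the three decisions concrete, a sketch instantiating them for simple linear regression (the model, squared-error loss, and gradient-descent optimizer are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def g(x, theta):
    # 1. Model: g(x | theta) = w1*x + w0
    w1, w0 = theta
    return w1 * x + w0

def E(theta, x, r):
    # 2. Loss function: sum_t L(r^t, g(x^t | theta)) with squared-error L
    return np.sum((r - g(x, theta)) ** 2)

def optimize(x, r, lr=1e-3, steps=5000):
    # 3. Optimization procedure: plain gradient descent on E
    #    (the learning rate assumes roughly unit-scale inputs)
    theta = np.zeros(2)
    for _ in range(steps):
        resid = r - g(x, theta)
        grad = np.array([-2.0 * np.sum(resid * x),   # dE/dw1
                         -2.0 * np.sum(resid)])      # dE/dw0
        theta -= lr * grad
    return theta  # approximates theta* = argmin_theta E(theta | X)
```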