ML 2012 Lecture 02a - Free University of Bozen-Bolzano
Lecture Slides for Introduction to Machine Learning 2e
ETHEM ALPAYDIN © The MIT Press, 2010
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e
Learning a Class from Examples
• Class C of a "family car"
• Prediction: Is car x a family car?
• Knowledge extraction: What do people expect from a family car?
• Output of the classification target function: Positive (+) and negative (−) examples
• Input representation: x1: price, x2: engine power
Training set X

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$

$$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$$

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
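A minimal sketch of how such a training set might be represented in Python with NumPy (the array layout and the example values are illustrative assumptions, not from the slides):

```python
import numpy as np

# Each row is one instance x^t = [price, engine power]; values are made up.
X = np.array([
    [16000.0, 150.0],   # family car  -> r = 1 (positive)
    [32000.0, 280.0],   # sports car  -> r = 0 (negative)
    [14000.0, 120.0],   # family car  -> r = 1 (positive)
])
r = np.array([1, 0, 1])  # labels r^t

N = X.shape[0]  # number of training examples
```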
Class C

$$C: (p_1 \le \text{price} \le p_2) \text{ AND } (e_1 \le \text{engine power} \le e_2)$$
Hypothesis class H

$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$$

$$H: (p' \le \text{price} \le p'') \text{ AND } (e' \le \text{engine power} \le e'')$$

Error of h on X

$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\big(h(\mathbf{x}^t) \ne r^t\big)$$

$$1(a \ne b) = \begin{cases} 1 & \text{if } a \ne b \\ 0 & \text{if } a = b \end{cases}$$
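A sketch of a rectangle hypothesis and its empirical error (the function names and the (p1, p2, e1, e2) parameterization are assumptions for illustration):

```python
import numpy as np

def h(X, p1, p2, e1, e2):
    # Rectangle hypothesis: predict 1 iff price and engine power fall
    # inside the axis-aligned rectangle, 0 otherwise.
    price, power = X[:, 0], X[:, 1]
    return ((p1 <= price) & (price <= p2) &
            (e1 <= power) & (power <= e2)).astype(int)

def empirical_error(X, r, p1, p2, e1, e2):
    # E(h|X) = sum_t 1(h(x^t) != r^t): count of misclassified examples.
    return int(np.sum(h(X, p1, p2, e1, e2) != r))
```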
S, G, and the Version Space
• most specific hypothesis, S: the tightest rectangle that includes all positive examples and none of the negative ones
• most general hypothesis, G: the largest such rectangle
• Every h ∈ H between S and G is consistent with the training set, and together these hypotheses make up the version space (Mitchell, 1997)
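As an illustration, a minimal sketch of computing S for the rectangle hypothesis class (assumes the X, r arrays from the earlier sketch and at least one positive example):

```python
import numpy as np

def most_specific_hypothesis(X, r):
    # S: the tightest axis-aligned rectangle around the positive examples.
    pos = X[r == 1]
    p1, e1 = pos.min(axis=0)   # lower bounds on price, engine power
    p2, e2 = pos.max(axis=0)   # upper bounds
    return p1, p2, e1, e2
```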
Margin
• Choose the h with the largest margin, i.e., the largest distance between the boundary of h and the closest instances on either side
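One crude way to measure such a margin for a rectangle hypothesis, as a sketch (the slides define margin only pictorially, so this per-axis distance is an illustrative assumption):

```python
import numpy as np

def rectangle_margin(X, p1, p2, e1, e2):
    # Smallest per-axis distance from any instance to one of the
    # rectangle's four edges; larger values mean a safer boundary.
    price, power = X[:, 0], X[:, 1]
    d = np.minimum.reduce([np.abs(price - p1), np.abs(p2 - price),
                           np.abs(power - e1), np.abs(e2 - power)])
    return float(d.min())
```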
Noise and Model Complexity
Use the simpler one because it is
• Simpler to use (lower computational complexity)
• Easier to train (lower space complexity)
• Easier to explain (more interpretable)
• Generalizes better (lower variance; Occam's razor)
Multiple Classes, $C_i$, $i = 1, \dots, K$

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}$$

$$r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

Train hypotheses $h_i(\mathbf{x})$, $i = 1, \dots, K$:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

$$E(\{h_i\}_{i=1}^{K} \mid \mathcal{X}) = \sum_{t=1}^{N} \sum_{i=1}^{K} 1\big(h_i(\mathbf{x}^t) \ne r_i^t\big)$$

reject: when no $h_i$, or more than one $h_i$, claims an instance, the classifier rejects it
Regression

$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \qquad r^t \in \mathbb{R}$$

$$r^t = f(x^t) + \varepsilon$$

Error of g on X:

$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - g(x^t)\big]^2$$

$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \big[r^t - (w_1 x^t + w_0)\big]^2$$

Linear and quadratic models:

$$g(x) = w_1 x + w_0$$

$$g(x) = w_2 x^2 + w_1 x + w_0$$
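A brief sketch of fitting the linear model by least squares and computing E(g|X) (np.polyfit is used as one convenient way to get the closed-form fit; the interface is illustrative):

```python
import numpy as np

def fit_linear(x, r):
    # Least-squares estimates of w1, w0 minimizing E(w1, w0 | X).
    w1, w0 = np.polyfit(x, r, deg=1)
    return w1, w0

def regression_error(x, r, w1, w0):
    # E(g | X) = (1/N) * sum_t (r^t - g(x^t))^2
    g = w1 * x + w0
    return float(np.mean((r - g) ** 2))
```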
Model Selection & Generalization
• Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution
  • Examining all available examples of family and non-family cars does not yield a unique classification function that works on every instance
• The need for inductive bias: assumptions about H
  • H is a rectangle in the input space
  • H is a linear function
• Model selection: the choice of the right bias
• Generalization: how well a model performs on new data
  • Underfitting: H less complex than C or f
  • Overfitting: H more complex than C or f (a small sketch of both follows)
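A toy sketch of under- and overfitting by varying polynomial degree (the target function, noise level, and degrees are all illustrative assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
r = np.sin(2 * np.pi * x) + rng.normal(0, 0.2, x.size)  # noisy samples of f
x_new = np.linspace(0, 1, 200)
f_new = np.sin(2 * np.pi * x_new)                        # new data from f

for deg in (1, 3, 15):   # too simple, about right, too complex
    w = np.polyfit(x, r, deg)
    train_err = np.mean((r - np.polyval(w, x)) ** 2)
    new_err = np.mean((f_new - np.polyval(w, x_new)) ** 2)
    print(f"degree {deg:2d}: training error {train_err:.4f}, "
          f"error on new data {new_err:.4f}")
```

Training error keeps falling as the degree grows, while error on new data first falls and then rises: the underfitting/overfitting pattern described above.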
Triple Trade-Off
• There is a trade-off between three factors (Dietterich, 2003):
  1. the complexity of H, i.e., its capacity to represent good target functions;
  2. the size N of the training set;
  3. the generalization error, E, on new data.
• As N increases, E decreases
• As the complexity of H increases, E first decreases and then increases
Cross-Validation
• To estimate the generalization error of different models, we need data unseen during training.
• We split the data X as follows (a code sketch follows the list):
  • Training set (50%): the learning algorithm uses this data to find the best model in each hypothesis class
  • Validation set (25%): the generalization error of the best models is estimated on it, and the model with minimal error is selected
  • Test set (25%): the selected model is tested on new data to evaluate its performance
• Resampling when data are scarce: the split can be done several times and the generalization error of each model averaged.
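A minimal sketch of the 50/25/25 split (the shuffling and the function interface are illustrative assumptions; the proportions come from the slide):

```python
import numpy as np

def split_data(X, r, seed=0):
    # Shuffle once, then carve out 50% train / 25% validation / 25% test.
    idx = np.random.default_rng(seed).permutation(len(r))
    n_tr, n_va = len(r) // 2, len(r) // 4
    tr, va, te = idx[:n_tr], idx[n_tr:n_tr + n_va], idx[n_tr + n_va:]
    return (X[tr], r[tr]), (X[va], r[va]), (X[te], r[te])
```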
Dimensions of a Supervised Learner
• Learning problem
  • Given a training set: $\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$
  • Build a good and useful approximation to $r^t$ using the model $g(\mathbf{x}^t \mid \theta)$
• Decisions
  1. Model: $g(\mathbf{x} \mid \theta)$
  2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\big(r^t, g(\mathbf{x}^t \mid \theta)\big)$
  3. Optimization procedure: $\theta^* = \arg\min_\theta E(\theta \mid \mathcal{X})$
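To make the three decisions concrete, a sketch instantiating them for simple linear regression (the model, squared-error loss, and gradient-descent optimizer are illustrative choices, not prescribed by the slide):

```python
import numpy as np

def g(x, theta):
    # 1. Model: g(x | theta) = w1*x + w0
    w1, w0 = theta
    return w1 * x + w0

def E(theta, x, r):
    # 2. Loss function: sum_t L(r^t, g(x^t | theta)) with squared-error L
    return np.sum((r - g(x, theta)) ** 2)

def optimize(x, r, lr=1e-3, steps=5000):
    # 3. Optimization procedure: plain gradient descent on E
    #    (the learning rate assumes roughly unit-scale inputs)
    theta = np.zeros(2)
    for _ in range(steps):
        resid = r - g(x, theta)
        grad = np.array([-2.0 * np.sum(resid * x),   # dE/dw1
                         -2.0 * np.sum(resid)])      # dE/dw0
        theta -= lr * grad
    return theta  # approximates theta* = argmin_theta E(theta | X)
```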