ML 2012 Lecture 02a - Free University of Bozen-Bolzano


Page 1

05/03/12

Lecture Slides for Introduction to Machine Learning 2e

ETHEM ALPAYDIN © The MIT Press, 2010
[email protected]
http://www.cmpe.boun.edu.tr/~ethem/i2ml2e

Page 2


Learning a Class from Examples

- Class C of a "family car"
  - Prediction: Is car x a family car?
  - Knowledge extraction: What do people expect from a family car?
- Output of the classification target function: positive (+) and negative (–) examples
- Input representation: x1: price, x2: engine power

Training set X

$$\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$$

$$r = \begin{cases} 1 & \text{if } \mathbf{x} \text{ is positive} \\ 0 & \text{if } \mathbf{x} \text{ is negative} \end{cases}$$

$$\mathbf{x} = \begin{bmatrix} x_1 \\ x_2 \end{bmatrix}$$
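To make the notation concrete, the training set can be written out in NumPy. This sketch is an addition to the slides; the prices and engine powers in it are invented placeholders, not data from the lecture:

```python
import numpy as np

# Each row is one input x^t = [price, engine power]; r^t = 1 marks a
# positive example (family car), r^t = 0 a negative one.
# All numbers are made-up placeholders.
X = np.array([
    [27000.0, 120.0],   # positive
    [31000.0, 150.0],   # positive
    [12000.0,  60.0],   # negative
    [95000.0, 400.0],   # negative
])
r = np.array([1, 1, 0, 0])
N = len(X)  # number of training examples
```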

Page 3


Class C

$$C \equiv (p_1 \le \text{price} \le p_2) \text{ AND } (e_1 \le \text{engine power} \le e_2)$$

Hypothesis class H

$$H \equiv (p' \le \text{price} \le p'') \text{ AND } (e' \le \text{engine power} \le e'')$$

$$h(\mathbf{x}) = \begin{cases} 1 & \text{if } h \text{ says } \mathbf{x} \text{ is positive} \\ 0 & \text{if } h \text{ says } \mathbf{x} \text{ is negative} \end{cases}$$

Error of h on X:

$$E(h \mid \mathcal{X}) = \sum_{t=1}^{N} 1\left(h(\mathbf{x}^t) \ne r^t\right), \qquad 1(a \ne b) = \begin{cases} 1 & \text{if } a \ne b \\ 0 & \text{if } a = b \end{cases}$$
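A minimal Python sketch of the rectangle hypothesis and its empirical error, reusing the X and r arrays from the earlier sketch; the parameter names p1, p2, e1, e2 follow the slides, while the function names are illustrative additions:

```python
def h(x, p1, p2, e1, e2):
    """Axis-aligned rectangle hypothesis: 1 if x falls inside, else 0."""
    price, power = x
    return int(p1 <= price <= p2 and e1 <= power <= e2)

def empirical_error(X, r, p1, p2, e1, e2):
    """E(h|X): the number of training examples that h misclassifies."""
    return sum(h(x, p1, p2, e1, e2) != rt for x, rt in zip(X, r))
```

For example, `empirical_error(X, r, 20000, 40000, 100, 200)` returns 0 on the placeholder data above, since that rectangle contains exactly the two positives.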

Page 4


S, G, and the Version Space

- most specific hypothesis, S
- most general hypothesis, G
- Any h ∈ H between S and G is consistent with the training set; together these hypotheses make up the version space (Mitchell, 1997)

Margin

- Choose the h with the largest margin, i.e., the distance between the boundary of h and the instances closest to it on either side
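As an added sketch (not from the slides): S is the tightest rectangle enclosing all positive examples, G is the largest rectangle still excluding all negatives, and a margin-maximizing h would sit midway between them. Computing S is straightforward, assuming X and r are the NumPy arrays defined earlier and at least one example is positive:

```python
def most_specific_hypothesis(X, r):
    """S: the tightest axis-aligned rectangle containing every positive example."""
    positives = X[r == 1]          # assumes at least one positive example
    (p1, e1), (p2, e2) = positives.min(axis=0), positives.max(axis=0)
    return p1, p2, e1, e2
```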

Page 5


Noise and Model Complexity

Use the simpler model because it is

- Simpler to use (lower computational complexity)
- Easier to train (lower space complexity)
- Easier to explain (more interpretable)
- Generalizes better (lower variance - Occam's razor)

Multiple Classes, C_i, i = 1,...,K

$$\mathcal{X} = \{\mathbf{x}^t, \mathbf{r}^t\}_{t=1}^{N}, \qquad r_i^t = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

Train K hypotheses h_i(x), i = 1,...,K:

$$h_i(\mathbf{x}^t) = \begin{cases} 1 & \text{if } \mathbf{x}^t \in C_i \\ 0 & \text{if } \mathbf{x}^t \in C_j,\ j \ne i \end{cases}$$

Total empirical error:

$$E(\{h_i\}_{i=1}^{K} \mid \mathcal{X}) = \sum_{t=1}^{N} \sum_{i=1}^{K} 1\left(h_i(\mathbf{x}^t) \ne r_i^t\right)$$

If no h_i, or more than one, says positive, the classifier rejects the instance.

Page 6


Regression

$$\mathcal{X} = \{x^t, r^t\}_{t=1}^{N}, \qquad r^t \in \mathbb{R}, \qquad r^t = f(x^t) + \varepsilon$$

$$E(g \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[r^t - g(x^t)\right]^2$$

$$E(w_1, w_0 \mid \mathcal{X}) = \frac{1}{N} \sum_{t=1}^{N} \left[r^t - (w_1 x^t + w_0)\right]^2$$

Linear and quadratic models:

$$g(x) = w_1 x + w_0$$

$$g(x) = w_2 x^2 + w_1 x + w_0$$
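To ground the least-squares criterion, here is a short sketch (an addition to the slides, with invented sample points) that fits the linear model g(x) = w1 x + w0 by minimizing E(w1, w0 | X) in closed form:

```python
import numpy as np

def fit_linear(x, r):
    """Least-squares estimates of (w1, w0) for g(x) = w1*x + w0."""
    A = np.column_stack([x, np.ones_like(x)])   # rows [x^t, 1]
    (w1, w0), *_ = np.linalg.lstsq(A, r, rcond=None)
    return w1, w0

# Invented noisy sample around r = 2x + 1
x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
r = np.array([1.1, 2.9, 5.2, 6.8, 9.1])
w1, w0 = fit_linear(x, r)
mse = np.mean((r - (w1 * x + w0)) ** 2)   # E(w1, w0 | X)
```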

Model Selection & Generalization

- Learning is an ill-posed problem: the data alone are not sufficient to find a unique solution
  - Examining every example of family and non-family cars does not yield a unique classification function that works on any instance
- Hence the need for inductive bias: assumptions about H
  - H is a rectangle in the input space
  - H is a linear function
- Model selection: the choice of the right bias
- Generalization: how well a model performs on new data
  - Underfitting: H less complex than C or f
  - Overfitting: H more complex than C or f

Page 7


Triple Trade-Off

There is a trade-off between three factors (Dietterich, 2003):

1. the complexity of H, i.e., its capacity to represent good target functions;
2. the size N of the training set;
3. the generalization error, E, on new data.

- As N increases, E decreases
- As the complexity of H increases, E first decreases and then increases

Cross-Validation

- To estimate the generalization error of different models, we need data unseen during training.
- We split the data X into (see the sketch below):
  - Training set (50%): the learning algorithm uses this data to find the best model in each hypothesis class
  - Validation set (25%): the generalization error of the best models is computed on it, and the model with minimal error is selected
  - Test set (25%): the selected model is evaluated on this new data to assess its performance
- Resampling when data are scarce: the split can be done several times and the generalization error of each model averaged.
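A minimal sketch of the 50/25/25 split described above, assuming the data are NumPy arrays as in the earlier sketches; this illustrates the protocol rather than reproducing lecture code:

```python
import numpy as np

def split_data(X, r, seed=0):
    """Shuffle, then split into 50% training, 25% validation, 25% test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(X))
    n_train, n_val = len(X) // 2, len(X) // 4
    train, val, test = (idx[:n_train],
                        idx[n_train:n_train + n_val],
                        idx[n_train + n_val:])
    return (X[train], r[train]), (X[val], r[val]), (X[test], r[test])
```

Repeating the call with different seeds and averaging the validation errors implements the resampling idea from the last bullet.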

Page 8


Dimensions of a Supervised Learner

- Learning problem: given a training set $\mathcal{X} = \{\mathbf{x}^t, r^t\}_{t=1}^{N}$, build a good and useful approximation to $r^t$ using the model $g(\mathbf{x}^t \mid \theta)$
- Decisions:

1. Model: $g(\mathbf{x} \mid \theta)$
2. Loss function: $E(\theta \mid \mathcal{X}) = \sum_t L\left(r^t, g(\mathbf{x}^t \mid \theta)\right)$
3. Optimization procedure: $\theta^* = \arg\min_{\theta} E(\theta \mid \mathcal{X})$
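As a closing sketch (an addition, instantiated for the linear-regression case above), the three decisions map directly onto code: a parametric model g, a loss function E, and an optimizer returning θ*:

```python
import numpy as np

def g(x, theta):
    """Decision 1 - model: linear g(x|theta) with theta = (w1, w0)."""
    w1, w0 = theta
    return w1 * x + w0

def E(theta, x, r):
    """Decision 2 - loss: E(theta|X) = sum_t L(r^t, g(x^t|theta)), squared-error L."""
    return float(np.sum((r - g(x, theta)) ** 2))

def optimize(x, r):
    """Decision 3 - theta* = argmin_theta E(theta|X); closed form for this model."""
    A = np.column_stack([x, np.ones_like(x)])
    theta_star, *_ = np.linalg.lstsq(A, r, rcond=None)
    return theta_star
```

Swapping in a different g, L, or optimizer changes the learner while the three-part structure stays the same, which is the point of the slide.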