Computational Learning Theory: Occam’s Razor

Machine Learning, Fall 2017
Slides based on material from Dan Roth, Avrim Blum, Tom Mitchell and others



This lecture: Computational Learning Theory

• The Theory of Generalization
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic Learning
• Shattering and the VC dimension


Where are we?

• The Theory of Generalization
  – When can we trust the learning algorithm?
  – What functions can be learned?
  – Batch learning
• Probably Approximately Correct (PAC) learning
• Positive and negative learnability results
• Agnostic Learning
• Shattering and the VC dimension


This section

1. Analyze a simple algorithm for learning conjunctions
2. Define the PAC model of learning
3. Make formal connections to the principle of Occam’s razor


This section

✓ Analyze a simple algorithm for learning conjunctions
✓ Define the PAC model of learning
3. Make formal connections to the principle of Occam’s razor


Occam’s Razor

Named after William of Occam (AD 1300s)

Prefer simpler explanations over more complex ones

“Numquam ponenda est pluralitas sine necessitate”
(Never posit plurality without necessity.)

Historically, a widely prevalent idea across different schools of philosophy


Towards formalizing Occam’s Razor

Claim: The probability that there is a hypothesis h ∈ H that
1. is consistent with m examples, and
2. has err_D(h) > ε
is less than |H|(1 - ε)^m.
(Assuming consistency. That is, a hypothesis that is consistent yet bad.)

Proof: Let h be such a bad hypothesis, with error greater than ε.
The probability that h is consistent with one example is Pr[f(x) = h(x)] < 1 - ε.
The training set consists of m examples drawn independently, so the probability that h is consistent with all m examples is less than (1 - ε)^m.
By the union bound, the probability that some bad hypothesis in H is consistent with m examples is less than |H|(1 - ε)^m.
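
Not from the slides, but a quick way to sanity-check the claim: the Python sketch below (names such as `estimate_consistency_prob` are my own) estimates by simulation the probability that a single hypothesis with true error ε agrees with the target on all m i.i.d. examples, and compares it with (1 - ε)^m; the claim then follows by the union bound over the at most |H| bad hypotheses.

```python
import random

def estimate_consistency_prob(epsilon, m, trials=100_000):
    """Estimate Pr[a hypothesis with true error epsilon agrees with the
    target f on all m i.i.d. examples drawn from D]."""
    consistent = 0
    for _ in range(trials):
        # Each example independently lands in the disagreement region of h
        # with probability epsilon; h stays consistent iff none of them do.
        if all(random.random() >= epsilon for _ in range(m)):
            consistent += 1
    return consistent / trials

if __name__ == "__main__":
    epsilon, m = 0.1, 30
    print("simulated:   ", estimate_consistency_prob(epsilon, m))
    print("(1 - eps)^m: ", (1 - epsilon) ** m)
    # Union bound over H: Pr[some bad h in H is consistent] < |H| * (1 - eps)^m
```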



Occam’s Razor

The probability that there is a hypothesis h ∈ H that is
1. consistent with m examples, and
2. has err_D(h) > ε
is less than |H|(1 - ε)^m.

Just like before, we want to make this probability small, say smaller than δ:
|H|(1 - ε)^m < δ
ln(|H|) + m ln(1 - ε) < ln δ

We know that ln(1 - ε) < -ε. Let’s use this to get a safer (sufficient) condition:
ln(|H|) - mε < ln δ

That is, if m > (1/ε)(ln |H| + ln(1/δ)), then the probability of getting a bad hypothesis is small.
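
As an illustration (this helper and its name `sample_complexity` are mine, not from the slides), the resulting bound can be computed directly: the function returns the smallest integer m with m > (1/ε)(ln |H| + ln(1/δ)).

```python
import math

def sample_complexity(h_size, epsilon, delta):
    """Smallest integer m with m > (ln|H| + ln(1/delta)) / epsilon,
    which guarantees |H| * (1 - epsilon)^m < delta."""
    return math.floor((math.log(h_size) + math.log(1.0 / delta)) / epsilon) + 1

if __name__ == "__main__":
    # Example: conjunctions over n = 10 Boolean variables, |H| <= 3^10
    print(sample_complexity(h_size=3 ** 10, epsilon=0.1, delta=0.05))  # about 140 examples
```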


Occam’s Razor

Let H be any hypothesis space. With probability 1 - δ, a hypothesis h ∈ H that is consistent with a training set of size m will have error < ε on future examples if
m > (1/ε)(ln |H| + ln(1/δ))

This is called Occam’s Razor because it expresses a preference towards smaller hypothesis spaces.

It shows when an m-consistent hypothesis generalizes well (i.e., error < ε).

Complicated/larger hypothesis spaces are not necessarily bad. But simpler ones are unlikely to fool us by being consistent with many examples!

Reading the bound:
1. Expecting lower error increases the sample complexity (i.e., more examples are needed for the guarantee).
2. A larger hypothesis space makes learning harder (i.e., higher sample complexity).
3. Wanting higher confidence in the classifier we produce also increases the sample complexity.
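
To make the three observations concrete, here is a small illustration with made-up parameter values (the numbers, and the helper `m_bound`, are mine; the helper just restates the bound so the snippet is self-contained). Each line varies one quantity in m > (1/ε)(ln |H| + ln(1/δ)) and prints how the required number of examples grows.

```python
import math

def m_bound(h_size, epsilon, delta):
    """Occam bound: smallest integer m with m > (ln|H| + ln(1/delta)) / epsilon."""
    return math.floor((math.log(h_size) + math.log(1.0 / delta)) / epsilon) + 1

# 1. Lower target error eps -> higher sample complexity
print([m_bound(2 ** 20, eps, 0.05) for eps in (0.2, 0.1, 0.05)])
# 2. Larger hypothesis space |H| -> higher sample complexity
print([m_bound(h, 0.1, 0.05) for h in (2 ** 10, 2 ** 20, 2 ** 40)])
# 3. Higher confidence (smaller delta) -> higher sample complexity
print([m_bound(2 ** 20, 0.1, d) for d in (0.1, 0.05, 0.01)])
```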


Consistent Learners and Occam’s Razor

From the definition, we get the following general scheme for PAC learning.

Given a sample D of m examples:
• Find some h ∈ H that is consistent with all m examples
• If m is large enough, a consistent hypothesis must be close enough to f
• Check that m does not have to be too large (i.e., polynomial in the relevant parameters): we showed that the “closeness” guarantee requires
  m > (1/ε)(ln |H| + ln(1/δ))
• Show that the consistent hypothesis h ∈ H can be computed efficiently


We worked out the details for conjunctions:
• The Elimination algorithm finds a hypothesis h that is consistent with the training set (easy to compute).
• We showed directly that if we have sufficiently many examples (polynomial in the parameters), then h is close to the target function.
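
The Elimination algorithm itself is not spelled out on these slides; the sketch below is my reconstruction of the standard version for conjunctions over Boolean features (the names `learn_conjunction` and `predict`, and the toy data, are mine). It starts from the conjunction of all 2n literals and drops every literal falsified by a positive example, yielding a hypothesis consistent with the sample whenever the target really is a conjunction.

```python
def learn_conjunction(examples):
    """Elimination algorithm for conjunctions over Boolean features.

    examples: iterable of (x, y) where x is a tuple of 0/1 feature values and
    y is 1 for positive, 0 for negative. Returns the surviving literals as a
    set of (index, polarity) pairs; polarity True means x_i, False means NOT x_i.
    """
    examples = list(examples)
    n = len(examples[0][0])
    # Start with every literal: x_i and NOT x_i for each feature i.
    literals = {(i, True) for i in range(n)} | {(i, False) for i in range(n)}
    for x, y in examples:
        if y == 1:
            # A positive example eliminates every literal it falsifies.
            literals = {(i, pol) for (i, pol) in literals if bool(x[i]) == pol}
    return literals

def predict(literals, x):
    """h(x) = 1 iff every surviving literal is satisfied by x."""
    return int(all(bool(x[i]) == pol for (i, pol) in literals))

if __name__ == "__main__":
    # Hypothetical toy data labeled by the target: x0 AND NOT x2
    data = [((1, 0, 0), 1), ((1, 1, 0), 1), ((0, 1, 0), 0), ((1, 1, 1), 0)]
    h = learn_conjunction(data)
    print(sorted(h))                          # surviving literals
    print([predict(h, x) for x, _ in data])   # consistent with the sample
```

Since each variable can appear positively, negatively, or not at all, |H| is on the order of 3^n, so ln |H| = O(n) and the bound m > (1/ε)(ln |H| + ln(1/δ)) is polynomial in n, 1/ε, and ln(1/δ), matching the claim above.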


Exercises

We have seen the decision tree learning algorithm. Suppose our problem has n binary features. What is the size of the hypothesis space?

Are decision trees efficiently PAC learnable?