Machine Learning for Intelligent Systems
Instructors: Nika Haghtalab (this time) and Thorsten Joachims
Lecture 3: Supervised Learning and Decision Trees
Reading: UML 18-18.2
Placement Exam
[Bar chart of placement-exam scores (axis roughly 82-94) across Calculus, Optimization, Probability, and Lin. Algebra.]
Example: Apple Harvest Festival
Learn to classify tasty apples based on experience.
→ Training set: apples you have tasted in the past, their features, and whether they were tasty.
→ You see a new apple; would you buy it?

| Apple | Farm | Color | Size | Firmness | Tasty? |
|-------|------|-------|--------|----------|--------|
| #1 | A | Red | Medium | Soft | No |
| #2 | A | Green | Small | Crunchy | No |
| #3 | A | Red | Medium | Soft | No |
| #4 | B | Red | Large | Crunchy | Yes |
| #5 | B | Green | Large | Crunchy | Yes |
| #6 | B | Green | Small | Crunchy | No |
| #7 | A | Red | Small | Soft | No |
| #8 | B | Red | Small | Crunchy | Yes |
| #9 | B | Green | Medium | Soft | No |
| #10 | A | Red | Small | Crunchy | Yes |
| #11 | A | Red | Medium | Crunchy | ? |

Farm, Color, Size, and Firmness are the features; Tasty? is the label.
Supervised Learning
Instance space: An instance space X, including the feature representation. E.g., X = {A, B} × {red, green} × {large, medium, small} × {crunchy, soft}.
Target attributes (labels): A set Y of labels. E.g., Y = {Tasty, Not Tasty} or Y = {Yes, No}.
Hidden target function: An unknown function f: X → Y, e.g., how an apple's features correlate with its tastiness in real life.
Training data: A set of labeled pairs (x, f(x)) ∈ X × Y that we have seen before. E.g., we have tasted an (A, red, large, crunchy) apple before that was Tasty.
Our main goal (informal): Given a large enough number of training examples, learn a hypothesis h: X → Y that approximates f(⋅).
Hypothesis Space
Hypothesis space: the set H of functions we consider when looking for h: X → Y.
(Apple table of examples #1-#10, as above.)
Example: all hypotheses that are ANDs of feature-value tests, e.g.,
(farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy).
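A conjunctive hypothesis like the one above can be sketched in code as a mapping from constrained features to required values, where an absent feature means "whatever". This is a minimal sketch; the names `h_predict` and the dict encoding are illustrative, not from the lecture.

```python
def h_predict(hypothesis, x):
    """Predict 'Yes' iff every constrained feature of x has the required value.

    hypothesis: dict mapping feature name -> required value; features not in
    the dict are wildcards ("whatever"). x: dict mapping feature name -> value.
    """
    return "Yes" if all(x[f] == v for f, v in hypothesis.items()) else "No"

# The slide's hypothesis: (color = red) AND (firmness = crunchy),
# with farm and size left as "whatever".
h = {"color": "Red", "firmness": "Crunchy"}

x = {"farm": "A", "color": "Red", "size": "Medium", "firmness": "Crunchy"}
print(h_predict(h, x))  # Yes: both constraints are satisfied
```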
Consistency
A function h: X → Y is consistent with a set of labeled training examples S if and only if h(x) = y for all (x, y) ∈ S.

Example: Is (farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy) consistent with the table of apples?
Is it consistent with the apples that came from farm A?

| Apple | Farm | Color | Size | Firmness | Tasty? |
|-------|------|-------|--------|----------|--------|
| #1 | A | Red | Medium | Soft | No |
| #2 | A | Green | Small | Crunchy | No |
| #3 | A | Red | Medium | Soft | No |
| #7 | A | Red | Small | Soft | No |
| #10 | A | Red | Small | Crunchy | Yes |
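The consistency check above is a one-line loop over S. A sketch answering both questions, with the apple table hard-coded (variable and function names are illustrative):

```python
# Each row: (farm, color, size, firmness, tasty-label).
APPLES = [
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("A", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("B", "Red",   "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Small",  "Soft",    "No"),
    ("B", "Red",   "Small",  "Crunchy", "Yes"),
    ("B", "Green", "Medium", "Soft",    "No"),
    ("A", "Red",   "Small",  "Crunchy", "Yes"),
]

def h(farm, color, size, firmness):
    """The slide's rule: (color = red) AND (firmness = crunchy)."""
    return "Yes" if color == "Red" and firmness == "Crunchy" else "No"

def consistent(h, S):
    """h is consistent with S iff h(x) = y for every labeled example (x, y)."""
    return all(h(*row[:4]) == row[4] for row in S)

print(consistent(h, APPLES))                                # False: apple #5 is Tasty but green
print(consistent(h, [r for r in APPLES if r[0] == "A"]))    # True on the farm-A apples
```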
Inductive Learning
Our main goal (more formal): Given a large enough number of training examples and a hypothesis space H, learn a hypothesis h ∈ H that approximates f(⋅).

Our hope: there is an h ∈ H that approximates f.
Our strategy: our training examples are representative of reality:
→ We have seen many examples.
→ They were selected without any bias.
→ So any hypothesis that is consistent with the training data would be accurate on unobserved instances.
Therefore: find h ∈ H that is consistent with S; then h(x) = f(x) for almost all x ∈ X.
Using Consistency in Learning
Idea: on a training set S, return any h ∈ H that is consistent with it.

Version space: The version space VS(H, S) is the subset of functions from H that are consistent with S.

Example: For the set S of training examples given by the farm-A apples above, and H the set of all hypotheses that are ANDs of feature-value settings, what is VS(H, S)?
Computing the Version Space
List-then-Eliminate algorithm:
• Initialize VS = H.
• For each training example (x, y) ∈ S, remove any h ∈ VS such that h(x) ≠ y.
• Output VS.

At a high level: list all hypotheses in H, then remove each one that is inconsistent with some example in S.

Takeaway: the algorithm works, but
→ keeping track of the version space is time- and space-consuming;
→ we only need one consistent hypothesis;
→ so build a consistent hypothesis directly.
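List-then-Eliminate can be run exactly for the small AND-of-feature-values space (only 3 × 3 × 4 × 3 = 108 hypotheses over the apple features). A sketch on the farm-A training set; the dict encoding and all names are illustrative:

```python
from itertools import product

VALUES = {
    "farm": ["A", "B"],
    "color": ["Red", "Green"],
    "size": ["Large", "Medium", "Small"],
    "firmness": ["Crunchy", "Soft"],
}

def all_hypotheses():
    """Enumerate H: every AND rule; None for a feature means 'whatever'."""
    feats = list(VALUES)
    for combo in product(*[[None] + VALUES[f] for f in feats]):
        yield {f: v for f, v in zip(feats, combo) if v is not None}

def predict(h, x):
    return "Yes" if all(x[f] == v for f, v in h.items()) else "No"

S = [  # the farm-A apples: (features, label)
    ({"farm": "A", "color": "Red",   "size": "Medium", "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Green", "size": "Small",  "firmness": "Crunchy"}, "No"),
    ({"farm": "A", "color": "Red",   "size": "Medium", "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Red",   "size": "Small",  "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Red",   "size": "Small",  "firmness": "Crunchy"}, "Yes"),
]

# "Initialize VS = H, eliminate inconsistent h" done in one comprehension:
vs = [h for h in all_hypotheses()
      if all(predict(h, x) == y for x, y in S)]
print(len(vs), vs)  # 4 surviving rules, all requiring color=Red and firmness=Crunchy
```

Every survivor must constrain color = Red (to reject apple #2) and firmness = Crunchy (to reject apple #7); farm = A and size = Small remain optional, giving the four hypotheses.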
Decision Trees

Example tree:

Firmness?
├─ soft → No
└─ crunchy → Color?
             ├─ red → Yes
             └─ green → Size?
                        ├─ large → Yes
                        ├─ medium → No
                        └─ small → No

Internal nodes: test one feature, e.g., Color.
Branches from an internal node: the possible values of the feature being tested, e.g., {red, green}.
Leaf nodes: the label of instances whose features correspond to the root-to-leaf path, e.g., Yes or No.
(Apple table of examples #1-#10, as above.)
Using a Decision Tree
Let the function f be the decision tree above.
1. What is f((B, Green, Large, Crunchy))?
2. What is f((B, Red, Large, Soft))?
3. What is f((A, Green, Small, Crunchy))?
Is f consistent with the training set in the table?
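To evaluate f, walk from the root to a leaf, following the branch that matches each tested feature. A sketch, encoding the tree as nested dicts (one reading of the slide's figure: soft → No; crunchy → split on color, red → Yes; green → split on size, large → Yes, else No; this reading is consistent with the apple table). All names are illustrative:

```python
# Internal node: {"feature": name, "branches": {value: subtree}}; leaf: a label.
TREE = {"feature": "firmness", "branches": {
    "Soft": "No",
    "Crunchy": {"feature": "color", "branches": {
        "Red": "Yes",
        "Green": {"feature": "size", "branches": {
            "Large": "Yes", "Medium": "No", "Small": "No"}},
    }},
}}

def dt_predict(tree, x):
    """Follow root-to-leaf branches matching x's feature values."""
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["feature"]]]
    return tree

print(dt_predict(TREE, {"farm": "B", "color": "Green", "size": "Large",  "firmness": "Crunchy"}))  # Yes
print(dt_predict(TREE, {"farm": "B", "color": "Red",   "size": "Large",  "firmness": "Soft"}))     # No
print(dt_predict(TREE, {"farm": "A", "color": "Green", "size": "Small",  "firmness": "Crunchy"}))  # No
```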
Making a Tree
Example: What is the decision tree corresponding to each of the following functions?
1. (firmness = crunchy)
2. (firmness = crunchy) ∧ (color = red)
3. (color = red) ∨ (size = large)
4. (firmness = crunchy) ∧ ((color = red) ∨ (size = large))
Top-Down Induction of Decision Trees
Idea:
• Don't use List-then-Eliminate; it is too inefficient.
• Instead, grow a good decision tree directly.
→ We grow from the root to the leaves.
→ Repeatedly take an existing leaf that does not yet have a definitive label and replace it with an internal node.

TD-IDT(S, y):
• If all examples in S have the same label y: make a leaf with label y.
• Else:
→ Pick a feature A.
→ For each value a_v of A, make a child node with S_v = {(x, y) ∈ S : feature A of x has value a_v}.
→ Make a tree with A as the root and the TD-IDT(S_v, y) as subtrees.
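The recursion above can be sketched directly. This minimal version just picks the first unused feature as A (the question of picking A well is the topic of the next slides); all names are illustrative:

```python
from collections import Counter

def td_idt(S, features, default):
    """Grow a tree top-down. S: list of (feature-dict, label) pairs."""
    if not S:
        return default                       # no examples left: use parent's majority label
    labels = [y for _, y in S]
    if len(set(labels)) == 1:
        return labels[0]                     # all examples share a label: make a leaf
    majority = Counter(labels).most_common(1)[0][0]
    if not features:
        return majority                      # features exhausted: best-effort leaf
    A, rest = features[0], features[1:]      # naive choice of A; see split criteria below
    return {"feature": A, "branches": {
        v: td_idt([(x, y) for x, y in S if x[A] == v], rest, majority)
        for v in {x[A] for x, _ in S}}}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["feature"]]]
    return tree

FEATS = ["farm", "color", "size", "firmness"]
ROWS = [("A","Red","Medium","Soft","No"), ("A","Green","Small","Crunchy","No"),
        ("A","Red","Medium","Soft","No"), ("B","Red","Large","Crunchy","Yes"),
        ("B","Green","Large","Crunchy","Yes"), ("B","Green","Small","Crunchy","No"),
        ("A","Red","Small","Soft","No"), ("B","Red","Small","Crunchy","Yes"),
        ("B","Green","Medium","Soft","No"), ("A","Red","Small","Crunchy","Yes")]
S = [(dict(zip(FEATS, r[:4])), r[4]) for r in ROWS]

tree = td_idt(S, FEATS, "No")
print(all(predict(tree, x) == y for x, y in S))  # True: the grown tree is consistent with S
```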
Grow a Consistent DT
(Apply TD-IDT, as defined above, to the apple table of examples #1-#10.)
Which Split?
How should we pick the feature A for splitting?
→ How much does the prediction improve after the split?
→ At each node, count the positive and negative instances, e.g., (4 Yes, 2 No).
→ If we were to stop there, it is best to predict the majority label, e.g., Yes.
→ How do we achieve a more definite prediction?

Candidate splits of the full training set (4Y, 6N):
• Firmness: crunchy → (4Y, 2N); soft → (0Y, 4N).
• Size: large → (2Y, 0N); medium → (0Y, 3N); small → (2Y, 3N).
• Farm: A → (1Y, 4N); B → (3Y, 2N).
Which Split?
(Candidate splits of (4Y, 6N) by Firmness, Size, and Farm, as on the previous slide.)

Different ways to measure this:
• Pick the attribute that minimizes the number of errors:
  • Before the split: Err(S) = size of the minority label in S.
  • After the split: Err(S|A) = Σ_v Err(S_v).
• Pick the attribute that minimizes entropy (i.e., maximizes information gain):
  • Before the split: H(S) = −Σ_x p(x) log p(x).
  • After the split: H(S|A) = Σ_v p(S_v) H(S_v).
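Both measures are easy to compute for the three candidate splits of the apple table. A sketch, assuming binary Yes/No labels and log base 2; all names are illustrative:

```python
from math import log2

ROWS = [("A","Red","Medium","Soft","No"), ("A","Green","Small","Crunchy","No"),
        ("A","Red","Medium","Soft","No"), ("B","Red","Large","Crunchy","Yes"),
        ("B","Green","Large","Crunchy","Yes"), ("B","Green","Small","Crunchy","No"),
        ("A","Red","Small","Soft","No"), ("B","Red","Small","Crunchy","Yes"),
        ("B","Green","Medium","Soft","No"), ("A","Red","Small","Crunchy","Yes")]
FEATS = {"farm": 0, "color": 1, "size": 2, "firmness": 3}

def err(rows):
    """Err(S): size of the minority label."""
    yes = sum(r[4] == "Yes" for r in rows)
    return min(yes, len(rows) - yes)

def entropy(rows):
    """H(S) = -sum_x p(x) log2 p(x) over the two labels."""
    yes = sum(r[4] == "Yes" for r in rows)
    total = 0.0
    for k in (yes, len(rows) - yes):
        if k:
            p = k / len(rows)
            total -= p * log2(p)
    return total

def split(rows, feat):
    groups = {}
    for r in rows:
        groups.setdefault(r[FEATS[feat]], []).append(r)
    return list(groups.values())

def err_after(rows, feat):       # Err(S|A) = sum_v Err(S_v)
    return sum(err(g) for g in split(rows, feat))

def entropy_after(rows, feat):   # H(S|A) = sum_v p(S_v) H(S_v)
    return sum(len(g) / len(rows) * entropy(g) for g in split(rows, feat))

for feat in FEATS:
    print(feat, err_after(ROWS, feat), round(entropy_after(ROWS, feat), 3))
# Error counts after splitting: firmness 2, size 2, farm 3 (vs. Err(S) = 4 before).
# Entropy prefers Size: it alone gives the lowest conditional entropy.
```

Note that the error count ties Firmness and Size, while entropy separates them; this is one reason information gain is the more common criterion.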
Example: Text Classification
• Task: learn a rule that classifies Reuters business news.
  • Class +: "Corporate Acquisitions"
  • Class −: other articles
  • 2000 training instances
• Representation:
  • Boolean attributes, indicating the presence of a keyword in the article
  • 9947 such keywords (more accurately, word "stems")
Example article (class +): LAROCHE STARTS BID FOR NECO SHARES

Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possible all NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 gmt, March 30, 1987.

Example article (class −): SALANT CORP 1ST QTR FEB 28 NET

Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.
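The Boolean keyword representation can be sketched as mapping an article to a 0/1 attribute per vocabulary stem. The tiny vocabulary and the naive whitespace tokenization here are illustrative stand-ins for the lecture's 9947 stems and real stemming:

```python
# A few stems drawn from the learned tree on the next slide (illustrative subset).
VOCAB = ["vs", "export", "rate", "stake", "share", "takeover"]

def to_features(text):
    """Map an article to Boolean attributes: 1 iff the stem occurs as a word."""
    words = set(text.lower().split())        # naive tokenization, no real stemming
    return {stem: int(stem in words) for stem in VOCAB}

x = to_features("he is offering to purchase a stake at a fixed rate")
print(x)  # {'vs': 0, 'export': 0, 'rate': 1, 'stake': 1, 'share': 0, 'takeover': 0}
```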
Decision Tree for "Corporate Acq."
vs = 1: −
vs = 0:
|  export = 1: …
|  export = 0:
|  |  rate = 1:
|  |  |  stake = 1: +
|  |  |  stake = 0:
|  |  |  |  debenture = 1: +
|  |  |  |  debenture = 0:
|  |  |  |  |  takeover = 1: +
|  |  |  |  |  takeover = 0:
|  |  |  |  |  |  file = 0: −
|  |  |  |  |  |  file = 1:
|  |  |  |  |  |  |  share = 1: +
|  |  |  |  |  |  |  share = 0: −
… and many more
The learned tree:
• has 437 nodes
• is consistent
Accuracy of the learned tree: 11% error rate.
Note: word stems expanded for improved readability.