Machine Learning for Intelligent Systems
Instructors: Nika Haghtalab (this time) and Thorsten Joachims
Lecture 3: Supervised Learning and Decision Trees
Reading: UML 18-18.2
Placement Exam
[Bar chart of placement-exam scores (axis roughly 82-94) across Calculus, Optimization, Probability, and Lin. Algebra.]
Example: Apple Harvest Festival
Learn to classify tasty apples based on experience.
→ Training set: apples you have tasted in the past, their features, and whether they were tasty.
→ You see a new apple; would you buy it?

| Apple | Farm | Color | Size | Firmness | Tasty? |
|-------|------|-------|--------|----------|--------|
| #1 | A | Red | Medium | Soft | No |
| #2 | A | Green | Small | Crunchy | No |
| #3 | A | Red | Medium | Soft | No |
| #4 | B | Red | Large | Crunchy | Yes |
| #5 | B | Green | Large | Crunchy | Yes |
| #6 | B | Green | Small | Crunchy | No |
| #7 | A | Red | Small | Soft | No |
| #8 | B | Red | Small | Crunchy | Yes |
| #9 | B | Green | Medium | Soft | No |
| #10 | A | Red | Small | Crunchy | Yes |
| #11 | A | Red | Medium | Crunchy | ? |

Farm, Color, Size, and Firmness are the features; Tasty? is the label.
Supervised Learning
Instance space: An instance space X, including the feature representation. E.g., X = {A, B} × {red, green} × {large, medium, small} × {crunchy, soft}.
Target attributes (labels): A set Y of labels. E.g., Y = {Tasty, Not Tasty} or Y = {Yes, No}.
Hidden target function: An unknown function f: X → Y, e.g., how an apple's features correlate with its tastiness in real life.
Training data: A set of labeled pairs (x, f(x)) ∈ X × Y that we have seen before. E.g., we have tasted an (A, red, large, crunchy) apple before that was Tasty.
Our main goal (informal): Given a large enough number of training examples, learn a hypothesis h: X → Y that approximates f(⋅).
Hypothesis Space
Hypothesis space: the set H of functions we consider when looking for h: X → Y.
(Apple table of examples #1-#10, as above.)
Example: all hypotheses that are ANDs of feature-value tests, e.g.,
(farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy).
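A conjunctive hypothesis like the one above can be sketched in code as a mapping from constrained features to required values, where an absent feature means "whatever". This is a minimal sketch; the names `h_predict` and the dict encoding are illustrative, not from the lecture.

```python
def h_predict(hypothesis, x):
    """Predict 'Yes' iff every constrained feature of x has the required value.

    hypothesis: dict mapping feature name -> required value; features not in
    the dict are wildcards ("whatever"). x: dict mapping feature name -> value.
    """
    return "Yes" if all(x[f] == v for f, v in hypothesis.items()) else "No"

# The slide's hypothesis: (color = red) AND (firmness = crunchy),
# with farm and size left as "whatever".
h = {"color": "Red", "firmness": "Crunchy"}

x = {"farm": "A", "color": "Red", "size": "Medium", "firmness": "Crunchy"}
print(h_predict(h, x))  # Yes: both constraints are satisfied
```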
Consistency
A function h: X → Y is consistent with a set of labeled training examples S if and only if h(x) = y for all (x, y) ∈ S.

Example: Is (farm = whatever) ∧ (color = red) ∧ (size = whatever) ∧ (firmness = crunchy) consistent with the table of apples?
Is it consistent with the apples that came from farm A?

| Apple | Farm | Color | Size | Firmness | Tasty? |
|-------|------|-------|--------|----------|--------|
| #1 | A | Red | Medium | Soft | No |
| #2 | A | Green | Small | Crunchy | No |
| #3 | A | Red | Medium | Soft | No |
| #7 | A | Red | Small | Soft | No |
| #10 | A | Red | Small | Crunchy | Yes |
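The consistency check above is a one-line loop over S. A sketch answering both questions, with the apple table hard-coded (variable and function names are illustrative):

```python
# Each row: (farm, color, size, firmness, tasty-label).
APPLES = [
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("A", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Medium", "Soft",    "No"),
    ("B", "Red",   "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Large",  "Crunchy", "Yes"),
    ("B", "Green", "Small",  "Crunchy", "No"),
    ("A", "Red",   "Small",  "Soft",    "No"),
    ("B", "Red",   "Small",  "Crunchy", "Yes"),
    ("B", "Green", "Medium", "Soft",    "No"),
    ("A", "Red",   "Small",  "Crunchy", "Yes"),
]

def h(farm, color, size, firmness):
    """The slide's rule: (color = red) AND (firmness = crunchy)."""
    return "Yes" if color == "Red" and firmness == "Crunchy" else "No"

def consistent(h, S):
    """h is consistent with S iff h(x) = y for every labeled example (x, y)."""
    return all(h(*row[:4]) == row[4] for row in S)

print(consistent(h, APPLES))                                # False: apple #5 is Tasty but green
print(consistent(h, [r for r in APPLES if r[0] == "A"]))    # True on the farm-A apples
```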
Inductive Learning
Our main goal (more formal): Given a large enough number of training examples and a hypothesis space H, learn a hypothesis h ∈ H that approximates f(⋅).

Our hope: there is an h ∈ H that approximates f.
Our strategy: our training examples are representative of reality:
→ We have seen many examples.
→ They were selected without any bias.
→ So any hypothesis that is consistent with the training data would be accurate on unobserved instances.
Therefore: find h ∈ H that is consistent with S; then h(x) = f(x) for almost all x ∈ X.
Using Consistency in Learning
Idea: on a training set S, return any h ∈ H that is consistent with it.

Version space: The version space VS(H, S) is the subset of functions from H that are consistent with S.

Example: For the set S of training examples given by the farm-A apples above, and H the set of all hypotheses that are ANDs of feature-value settings, what is VS(H, S)?
Computing the Version Space
List-then-Eliminate algorithm:
• Initialize VS = H.
• For each training example (x, y) ∈ S, remove any h ∈ VS such that h(x) ≠ y.
• Output VS.

At a high level: list all hypotheses in H, then remove each one that is inconsistent with some example in S.

Takeaway: the algorithm works, but
→ keeping track of the version space is time- and space-consuming;
→ we only need one consistent hypothesis;
→ so build a consistent hypothesis directly.
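List-then-Eliminate can be run exactly for the small AND-of-feature-values space (only 3 × 3 × 4 × 3 = 108 hypotheses over the apple features). A sketch on the farm-A training set; the dict encoding and all names are illustrative:

```python
from itertools import product

VALUES = {
    "farm": ["A", "B"],
    "color": ["Red", "Green"],
    "size": ["Large", "Medium", "Small"],
    "firmness": ["Crunchy", "Soft"],
}

def all_hypotheses():
    """Enumerate H: every AND rule; None for a feature means 'whatever'."""
    feats = list(VALUES)
    for combo in product(*[[None] + VALUES[f] for f in feats]):
        yield {f: v for f, v in zip(feats, combo) if v is not None}

def predict(h, x):
    return "Yes" if all(x[f] == v for f, v in h.items()) else "No"

S = [  # the farm-A apples: (features, label)
    ({"farm": "A", "color": "Red",   "size": "Medium", "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Green", "size": "Small",  "firmness": "Crunchy"}, "No"),
    ({"farm": "A", "color": "Red",   "size": "Medium", "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Red",   "size": "Small",  "firmness": "Soft"},    "No"),
    ({"farm": "A", "color": "Red",   "size": "Small",  "firmness": "Crunchy"}, "Yes"),
]

# "Initialize VS = H, eliminate inconsistent h" done in one comprehension:
vs = [h for h in all_hypotheses()
      if all(predict(h, x) == y for x, y in S)]
print(len(vs), vs)  # 4 surviving rules, all requiring color=Red and firmness=Crunchy
```

Every survivor must constrain color = Red (to reject apple #2) and firmness = Crunchy (to reject apple #7); farm = A and size = Small remain optional, giving the four hypotheses.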
Decision Trees

Example tree:

Firmness?
├─ soft → No
└─ crunchy → Color?
             ├─ red → Yes
             └─ green → Size?
                        ├─ large → Yes
                        ├─ medium → No
                        └─ small → No

Internal nodes: test one feature, e.g., Color.
Branches from an internal node: the possible values of the feature being tested, e.g., {red, green}.
Leaf nodes: the label of instances whose features correspond to the root-to-leaf path, e.g., Yes or No.
(Apple table of examples #1-#10, as above.)
Using a Decision Tree
Let the function f be the decision tree above.
1. What is f((B, Green, Large, Crunchy))?
2. What is f((B, Red, Large, Soft))?
3. What is f((A, Green, Small, Crunchy))?
Is f consistent with the training set in the table?
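To evaluate f, walk from the root to a leaf, following the branch that matches each tested feature. A sketch, encoding the tree as nested dicts (one reading of the slide's figure: soft → No; crunchy → split on color, red → Yes; green → split on size, large → Yes, else No; this reading is consistent with the apple table). All names are illustrative:

```python
# Internal node: {"feature": name, "branches": {value: subtree}}; leaf: a label.
TREE = {"feature": "firmness", "branches": {
    "Soft": "No",
    "Crunchy": {"feature": "color", "branches": {
        "Red": "Yes",
        "Green": {"feature": "size", "branches": {
            "Large": "Yes", "Medium": "No", "Small": "No"}},
    }},
}}

def dt_predict(tree, x):
    """Follow root-to-leaf branches matching x's feature values."""
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["feature"]]]
    return tree

print(dt_predict(TREE, {"farm": "B", "color": "Green", "size": "Large",  "firmness": "Crunchy"}))  # Yes
print(dt_predict(TREE, {"farm": "B", "color": "Red",   "size": "Large",  "firmness": "Soft"}))     # No
print(dt_predict(TREE, {"farm": "A", "color": "Green", "size": "Small",  "firmness": "Crunchy"}))  # No
```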
Making a Tree
Example: What is the decision tree corresponding to each of the following functions?
1. (firmness = crunchy)
2. (firmness = crunchy) ∧ (color = red)
3. (color = red) ∨ (size = large)
4. (firmness = crunchy) ∧ ((color = red) ∨ (size = large))
Top-Down Induction of Decision Trees
Idea:
• Don't use List-then-Eliminate; it is too inefficient.
• Instead, grow a good decision tree directly.
→ We grow from the root to the leaves.
→ Repeatedly take an existing leaf that does not yet have a definitive label and replace it with an internal node.

TD-IDT(S, y):
• If all examples in S have the same label y: make a leaf with label y.
• Else:
→ Pick a feature A.
→ For each value a_v of A, make a child node with S_v = {(x, y) ∈ S : feature A of x has value a_v}.
→ Make a tree with A as the root and the TD-IDT(S_v, y) as subtrees.
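The recursion above can be sketched directly. This minimal version just picks the first unused feature as A (the question of picking A well is the topic of the next slides); all names are illustrative:

```python
from collections import Counter

def td_idt(S, features, default):
    """Grow a tree top-down. S: list of (feature-dict, label) pairs."""
    if not S:
        return default                       # no examples left: use parent's majority label
    labels = [y for _, y in S]
    if len(set(labels)) == 1:
        return labels[0]                     # all examples share a label: make a leaf
    majority = Counter(labels).most_common(1)[0][0]
    if not features:
        return majority                      # features exhausted: best-effort leaf
    A, rest = features[0], features[1:]      # naive choice of A; see split criteria below
    return {"feature": A, "branches": {
        v: td_idt([(x, y) for x, y in S if x[A] == v], rest, majority)
        for v in {x[A] for x, _ in S}}}

def predict(tree, x):
    while isinstance(tree, dict):
        tree = tree["branches"][x[tree["feature"]]]
    return tree

FEATS = ["farm", "color", "size", "firmness"]
ROWS = [("A","Red","Medium","Soft","No"), ("A","Green","Small","Crunchy","No"),
        ("A","Red","Medium","Soft","No"), ("B","Red","Large","Crunchy","Yes"),
        ("B","Green","Large","Crunchy","Yes"), ("B","Green","Small","Crunchy","No"),
        ("A","Red","Small","Soft","No"), ("B","Red","Small","Crunchy","Yes"),
        ("B","Green","Medium","Soft","No"), ("A","Red","Small","Crunchy","Yes")]
S = [(dict(zip(FEATS, r[:4])), r[4]) for r in ROWS]

tree = td_idt(S, FEATS, "No")
print(all(predict(tree, x) == y for x, y in S))  # True: the grown tree is consistent with S
```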
Grow a Consistent DT
(Apply TD-IDT, as defined above, to the apple table of examples #1-#10.)
Which Split?
How should we pick the feature A for splitting?
→ How much does the prediction improve after the split?
→ At each node, count the positive and negative instances, e.g., (4 Yes, 2 No).
→ If we were to stop there, it is best to predict the majority label, e.g., Yes.
→ How do we achieve a more definite prediction?

Candidate splits of the full training set (4Y, 6N):
• Firmness: crunchy → (4Y, 2N); soft → (0Y, 4N).
• Size: large → (2Y, 0N); medium → (0Y, 3N); small → (2Y, 3N).
• Farm: A → (1Y, 4N); B → (3Y, 2N).
Which Split?
(Candidate splits of (4Y, 6N) by Firmness, Size, and Farm, as on the previous slide.)

Different ways to measure this:
• Pick the attribute that minimizes the number of errors:
  • Before the split: Err(S) = size of the minority label in S.
  • After the split: Err(S|A) = Σ_v Err(S_v).
• Pick the attribute that minimizes entropy (i.e., maximizes information gain):
  • Before the split: H(S) = −Σ_x p(x) log p(x).
  • After the split: H(S|A) = Σ_v p(S_v) H(S_v).
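Both measures are easy to compute for the three candidate splits of the apple table. A sketch, assuming binary Yes/No labels and log base 2; all names are illustrative:

```python
from math import log2

ROWS = [("A","Red","Medium","Soft","No"), ("A","Green","Small","Crunchy","No"),
        ("A","Red","Medium","Soft","No"), ("B","Red","Large","Crunchy","Yes"),
        ("B","Green","Large","Crunchy","Yes"), ("B","Green","Small","Crunchy","No"),
        ("A","Red","Small","Soft","No"), ("B","Red","Small","Crunchy","Yes"),
        ("B","Green","Medium","Soft","No"), ("A","Red","Small","Crunchy","Yes")]
FEATS = {"farm": 0, "color": 1, "size": 2, "firmness": 3}

def err(rows):
    """Err(S): size of the minority label."""
    yes = sum(r[4] == "Yes" for r in rows)
    return min(yes, len(rows) - yes)

def entropy(rows):
    """H(S) = -sum_x p(x) log2 p(x) over the two labels."""
    yes = sum(r[4] == "Yes" for r in rows)
    total = 0.0
    for k in (yes, len(rows) - yes):
        if k:
            p = k / len(rows)
            total -= p * log2(p)
    return total

def split(rows, feat):
    groups = {}
    for r in rows:
        groups.setdefault(r[FEATS[feat]], []).append(r)
    return list(groups.values())

def err_after(rows, feat):       # Err(S|A) = sum_v Err(S_v)
    return sum(err(g) for g in split(rows, feat))

def entropy_after(rows, feat):   # H(S|A) = sum_v p(S_v) H(S_v)
    return sum(len(g) / len(rows) * entropy(g) for g in split(rows, feat))

for feat in FEATS:
    print(feat, err_after(ROWS, feat), round(entropy_after(ROWS, feat), 3))
# Error counts after splitting: firmness 2, size 2, farm 3 (vs. Err(S) = 4 before).
# Entropy prefers Size: it alone gives the lowest conditional entropy.
```

Note that the error count ties Firmness and Size, while entropy separates them; this is one reason information gain is the more common criterion.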
Example: Text Classification
• Task: learn a rule that classifies Reuters business news.
  • Class +: "Corporate Acquisitions"
  • Class −: other articles
  • 2000 training instances
• Representation:
  • Boolean attributes, indicating the presence of a keyword in the article
  • 9947 such keywords (more accurately, word "stems")
Example article (class +): LAROCHE STARTS BID FOR NECO SHARES

Investor David F. La Roche of North Kingstown, R.I., said he is offering to purchase 170,000 common shares of NECO Enterprises Inc at 26 dlrs each. He said the successful completion of the offer, plus shares he already owns, would give him 50.5 pct of NECO's 962,016 common shares. La Roche said he may buy more, and possible all NECO shares. He said the offer and withdrawal rights will expire at 1630 EST/2130 gmt, March 30, 1987.

Example article (class −): SALANT CORP 1ST QTR FEB 28 NET

Oper shr profit seven cts vs loss 12 cts. Oper net profit 216,000 vs loss 401,000. Sales 21.4 mln vs 24.9 mln. NOTE: Current year net excludes 142,000 dlr tax credit. Company operating in Chapter 11 bankruptcy.
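The Boolean keyword representation can be sketched as mapping an article to a 0/1 attribute per vocabulary stem. The tiny vocabulary and the naive whitespace tokenization here are illustrative stand-ins for the lecture's 9947 stems and real stemming:

```python
# A few stems drawn from the learned tree on the next slide (illustrative subset).
VOCAB = ["vs", "export", "rate", "stake", "share", "takeover"]

def to_features(text):
    """Map an article to Boolean attributes: 1 iff the stem occurs as a word."""
    words = set(text.lower().split())        # naive tokenization, no real stemming
    return {stem: int(stem in words) for stem in VOCAB}

x = to_features("he is offering to purchase a stake at a fixed rate")
print(x)  # {'vs': 0, 'export': 0, 'rate': 1, 'stake': 1, 'share': 0, 'takeover': 0}
```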
Decision Tree for "Corporate Acq."
vs = 1: −
vs = 0:
|  export = 1: …
|  export = 0:
|  |  rate = 1:
|  |  |  stake = 1: +
|  |  |  stake = 0:
|  |  |  |  debenture = 1: +
|  |  |  |  debenture = 0:
|  |  |  |  |  takeover = 1: +
|  |  |  |  |  takeover = 0:
|  |  |  |  |  |  file = 0: −
|  |  |  |  |  |  file = 1:
|  |  |  |  |  |  |  share = 1: +
|  |  |  |  |  |  |  share = 0: −
… and many more
The learned tree:
• has 437 nodes
• is consistent
Accuracy of the learned tree: 11% error rate.
Note: word stems expanded for improved readability.