introduction to machine learning
DESCRIPTION
Introduction to Machine Learning. Learning: an agent has made observations (data) and now must make sense of them (hypotheses). Hypotheses alone may be important (e.g., in basic science), or may be used for inference (e.g., forecasting) or to take sensible actions (decision making).
TRANSCRIPT
![Page 1: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/1.jpg)
INTRODUCTION TO MACHINE LEARNING
![Page 2: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/2.jpg)
LEARNING
• Agent has made observations (data)
• Now must make sense of it (hypotheses)
  • Hypotheses alone may be important (e.g., in basic science)
  • For inference (e.g., forecasting)
  • To take sensible actions (decision making)
• A basic component of economics, social and hard sciences, engineering, …
![Page 3: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/3.jpg)
LAST TIME
• Going from observed data to unknown hypothesis
• 3 types of statistical learning techniques
  • Bayesian inference
  • Maximum likelihood
  • Maximum a posteriori
• Applied to learning:
  • Candy bag example (5 discrete hypotheses)
  • Coin flip probability (infinite hypotheses from 0 to 1)
![Page 4: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/4.jpg)
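BAYESIAN VIEW OF LEARNING
• $P(h_i \mid d) = \alpha\, P(d \mid h_i)\, P(h_i)$ is the posterior
  • (Recall, $1/\alpha = P(d) = \sum_i P(d \mid h_i)\, P(h_i)$)
• $P(d \mid h_i)$ is the likelihood
• $P(h_i)$ is the hypothesis prior

| Hypothesis | C | L |
|---|---|---|
| h1 | 100% | 0% |
| h2 | 75% | 25% |
| h3 | 50% | 50% |
| h4 | 25% | 75% |
| h5 | 0% | 100% |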
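The posterior update above can be sketched in a few lines of Python. The L fractions come from the five hypotheses on the slide. Reading C and L as cherry and lime, the prior (0.1, 0.2, 0.4, 0.2, 0.1), and the observed sequence of two limes are all assumptions made for illustration, since the slide does not state them.

```python
from fractions import Fraction

# P(lime | h_i) for h1..h5, from the slide (L: 0%, 25%, 50%, 75%, 100%).
P_LIME = [Fraction(0), Fraction(1, 4), Fraction(1, 2), Fraction(3, 4), Fraction(1)]

# Assumed prior P(h_i); the slide does not give one.
PRIOR = [Fraction(1, 10), Fraction(2, 10), Fraction(4, 10), Fraction(2, 10), Fraction(1, 10)]


def posterior(observations, prior=PRIOR):
    """Return P(h_i | d) for a sequence of 'lime' / 'cherry' observations."""
    unnormalized = []
    for p_lime, p_h in zip(P_LIME, prior):
        likelihood = Fraction(1)
        for candy in observations:
            likelihood *= p_lime if candy == "lime" else 1 - p_lime
        unnormalized.append(likelihood * p_h)  # P(d | h_i) P(h_i)
    evidence = sum(unnormalized)  # P(d) = 1 / alpha
    return [u / evidence for u in unnormalized]


# Hypothetical data: two limes in a row.
post = posterior(["lime", "lime"])
```

Exact fractions are used so that the normalization by P(d) can be checked by hand.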
![Page 5: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/5.jpg)
BAYESIAN VS. MAXIMUM LIKELIHOOD VS MAXIMUM A POSTERIORI
• Bayesian reasoning requires thinking about all hypotheses
• ML and MAP just try to get the “best”
• ML ignores prior information
• MAP uses it
  • Smooths out the estimate for small datasets
• All are asymptotically equivalent given large enough datasets
[Figure: curves labelled $P(X \mid h_{ML})$, $P(X \mid d)$, and $P(X \mid h_{MAP})$]
![Page 6: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/6.jpg)
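LEARNING BERNOULLI DISTRIBUTIONS
Example data

| A | B | C | # obs |
|---|---|---|---|
| 1 | 1 | 1 | 3 |
| 1 | 1 | 0 | 5 |
| 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 10 |
| 0 | 1 | 1 | 4 |
| 0 | 1 | 0 | 7 |
| 0 | 0 | 1 | 6 |
| 0 | 0 | 0 | 7 |

ML estimates of $P(C \mid A, B)$

| Condition | Estimate |
|---|---|
| A, ¬B | θ2 = 1/11 |
| A, B | θ1 = 3/8 |
| ¬A, B | θ3 = 4/11 |
| ¬A, ¬B | θ4 = 6/13 |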
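As a minimal sketch, the ML estimates are just the fraction of C=1 observations within each (A, B) group. The dictionary layout and the function name `ml_estimate` are illustrative choices, not part of the slides.

```python
from fractions import Fraction

# (A, B, C) -> number of observations, copied from the example data.
COUNTS = {
    (1, 1, 1): 3, (1, 1, 0): 5,
    (1, 0, 1): 1, (1, 0, 0): 10,
    (0, 1, 1): 4, (0, 1, 0): 7,
    (0, 0, 1): 6, (0, 0, 0): 7,
}


def ml_estimate(a, b, counts=COUNTS):
    """ML estimate of P(C=1 | A=a, B=b): fraction of matching rows with C=1."""
    positives = counts[(a, b, 1)]
    total = positives + counts[(a, b, 0)]
    return Fraction(positives, total)


theta = {(a, b): ml_estimate(a, b) for a in (1, 0) for b in (1, 0)}
```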
![Page 7: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/7.jpg)
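MAXIMUM LIKELIHOOD FOR BN
• For any BN, the ML parameters of any CPT can be derived by the fraction of observed values in the data
[Figure: network with nodes Earthquake and Burglar pointing to Alarm]
N = 1000; E: 500, B: 200, so P(E) = 0.5 and P(B) = 0.2
Alarm counts: A | E,B: 19/20; A | B: 188/200; A | E: 170/500; A | neither: 1/380

| E | B | $P(A \mid E,B)$ |
|---|---|---|
| T | T | 0.95 |
| F | T | 0.94 |
| T | F | 0.34 |
| F | F | 0.003 |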
![Page 8: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/8.jpg)
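MAXIMUM A POSTERIORI WITH BETA PRIORS
Example data

| A | B | C | # obs |
|---|---|---|---|
| 1 | 1 | 1 | 3 |
| 1 | 1 | 0 | 5 |
| 1 | 0 | 1 | 1 |
| 1 | 0 | 0 | 10 |
| 0 | 1 | 1 | 4 |
| 0 | 1 | 0 | 7 |
| 0 | 0 | 1 | 6 |
| 0 | 0 | 0 | 7 |

MAP estimates of $P(C \mid A, B)$, assuming a Beta prior with a = b = 3 (virtual counts 2H, 2T)

| Condition | Estimate |
|---|---|
| A, B | θ1 = (3+2)/(8+4) |
| A, ¬B | θ2 = (1+2)/(11+4) |
| ¬A, B | θ3 = (4+2)/(11+4) |
| ¬A, ¬B | θ4 = (6+2)/(13+4) |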
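The same counts can be pushed through the MAP formula as a short sketch. With a Beta(a, b) prior the posterior mode is (positives + a − 1) / (total + a + b − 2), which for a = b = 3 adds the 2 virtual heads and 2 virtual tails from the slide. The function name and parameter names are illustrative.

```python
from fractions import Fraction

# (A, B, C) -> number of observations, copied from the example data.
COUNTS = {
    (1, 1, 1): 3, (1, 1, 0): 5,
    (1, 0, 1): 1, (1, 0, 0): 10,
    (0, 1, 1): 4, (0, 1, 0): 7,
    (0, 0, 1): 6, (0, 0, 0): 7,
}


def map_estimate(a, b, alpha=3, beta=3, counts=COUNTS):
    """MAP estimate of P(C=1 | A=a, B=b) under a Beta(alpha, beta) prior."""
    positives = counts[(a, b, 1)]
    negatives = counts[(a, b, 0)]
    # Mode of Beta(positives + alpha, negatives + beta).
    return Fraction(positives + alpha - 1,
                    positives + negatives + alpha + beta - 2)


theta_map = {(a, b): map_estimate(a, b) for a in (1, 0) for b in (1, 0)}
```

Compared with the ML value 1/11 for A, ¬B, the MAP value 3/15 is pulled toward 1/2, which is the smoothing effect on small datasets mentioned earlier.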
![Page 9: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/9.jpg)
TOPICS IN MACHINE LEARNING
Applications
• Document retrieval
• Document classification
• Data mining
• Computer vision
• Scientific discovery
• Robotics
• …
Tasks & settings
• Classification
• Ranking
• Clustering
• Regression
• Decision-making
• Supervised
• Unsupervised
• Semi-supervised
• Active
• Reinforcement learning
Techniques
• Bayesian learning
• Decision trees
• Neural networks
• Support vector machines
• Boosting
• Case-based reasoning
• Dimensionality reduction
• …
![Page 10: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/10.jpg)
WHAT IS LEARNING?
• Mostly generalization from experience: “Our experience of the world is specific, yet we are able to formulate general theories that account for the past and predict the future” (M.R. Genesereth and N.J. Nilsson, in Logical Foundations of AI, 1987)
• Concepts, heuristics, policies
• Supervised vs. un-supervised learning
![Page 11: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/11.jpg)
INDUCTIVE LEARNING
• Basic form: learn a function from examples
  • f is the unknown target function
  • An example is a pair (x, f(x))
  • Problem: find a hypothesis h such that h ≈ f, given a training set of examples D
• Instance of supervised learning
  • Classification task: f(x) ∈ {0,1,…,C} (usually C=1)
  • Regression task: f(x) ∈ reals
![Page 12: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/12.jpg)
INDUCTIVE LEARNING METHOD
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:
![Page 17: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/17.jpg)
INDUCTIVE LEARNING METHOD
• Construct/adjust h to agree with f on training set
• (h is consistent if it agrees with f on all examples)
• E.g., curve fitting:
• h = D is a trivial, but perhaps uninteresting solution (caching)
![Page 18: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/18.jpg)
CLASSIFICATION TASK
• The target function f(x) takes on values True and False
• An example is positive if f is True, else it is negative
• The set X of all examples is the example set
• The training set is a subset of X
  • a small one!
![Page 19: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/19.jpg)
LOGIC-BASED INDUCTIVE LEARNING
• Here, examples (x, f(x)) take on discrete values
Note that the training set does not say whether an observable predicate is pertinent or not
![Page 21: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/21.jpg)
REWARDED CARD EXAMPLE
• Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”
• Background knowledge KB:
  • ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
  • ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
  • ((s=S) ∨ (s=C)) ⇔ BLACK(s)
  • ((s=D) ∨ (s=H)) ⇔ RED(s)
• Training set D:
  • REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
• Possible inductive hypothesis: h ≡ (NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s]))
• There are several possible inductive hypotheses
![Page 23: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/23.jpg)
LEARNING A LOGICAL PREDICATE (CONCEPT CLASSIFIER)
• Set E of objects (e.g., cards)
• Goal predicate CONCEPT(x), where x is an object in E, that takes the value True or False (e.g., REWARD)
• Observable predicates A(x), B(x), … (e.g., NUM, RED)
• Training set: values of CONCEPT for some combinations of values of the observable predicates
• Find a representation of CONCEPT in the form: CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built with the observable predicates, e.g.: CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x))
![Page 25: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/25.jpg)
HYPOTHESIS SPACE
• A hypothesis is any sentence of the form: CONCEPT(x) ⇔ S(A,B,…), where S(A,B,…) is a sentence built using the observable predicates
• The set of all hypotheses is called the hypothesis space H
• A hypothesis h agrees with an example if it gives the correct value of CONCEPT
![Page 26: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/26.jpg)
INDUCTIVE LEARNING SCHEME
[Figure: the example set X = {[A, B, …, CONCEPT]} drawn as a cloud of positive (+) and negative (-) examples; a training set D taken from X leads to an inductive hypothesis h in the hypothesis space H = {[CONCEPT(x) ⇔ S(A,B,…)]}]
![Page 27: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/27.jpg)
SIZE OF HYPOTHESIS SPACE
• n observable predicates
• $2^n$ entries in truth table defining CONCEPT, and each entry can be filled with True or False
• In the absence of any restriction (bias), there are $2^{2^n}$ hypotheses to choose from
• n = 6 → about $2 \times 10^{19}$ hypotheses!
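The count is easy to confirm with a couple of lines of Python; the function names are illustrative only.

```python
def truth_table_entries(n):
    """Number of rows in a truth table over n Boolean predicates."""
    return 2 ** n


def hypothesis_count(n):
    """Number of distinct Boolean functions of n predicates (no bias)."""
    return 2 ** truth_table_entries(n)


count_for_six = hypothesis_count(6)  # 2**64, roughly 1.8e19
```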
![Page 28: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/28.jpg)
MULTIPLE INDUCTIVE HYPOTHESES
• Deck of cards, with each card designated by [r,s], its rank and suit, and some cards “rewarded”
• Background knowledge KB:
  • ((r=1) ∨ … ∨ (r=10)) ⇔ NUM(r)
  • ((r=J) ∨ (r=Q) ∨ (r=K)) ⇔ FACE(r)
  • ((s=S) ∨ (s=C)) ⇔ BLACK(s)
  • ((s=D) ∨ (s=H)) ⇔ RED(s)
• Training set D:
  • REWARD([4,C]) ∧ REWARD([7,C]) ∧ REWARD([2,S]) ∧ ¬REWARD([5,H]) ∧ ¬REWARD([J,S])
• The following hypotheses all agree with all the examples in the training set:
  • h1 ≡ NUM(r) ∧ BLACK(s) ⇔ REWARD([r,s])
  • h2 ≡ BLACK(s) ∧ ¬(r=J) ⇔ REWARD([r,s])
  • h3 ≡ ([r,s]=[4,C]) ∨ ([r,s]=[7,C]) ∨ ([r,s]=[2,S]) ⇔ REWARD([r,s])
  • h4 ≡ ¬([r,s]=[5,H]) ∧ ¬([r,s]=[J,S]) ⇔ REWARD([r,s])
Need for a system of preferences – called an inductive bias – to compare possible hypotheses
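A small sketch makes the point concrete: all four hypotheses agree with the training set, yet they disagree on an unseen card. Encoding ranks and suits as strings, and the queen of spades as the test card, are choices made here for illustration.

```python
NUMBERED = {str(r) for r in range(1, 11)}  # NUM(r)
BLACK = {"S", "C"}                          # BLACK(s)

# Training set D: (rank, suit) -> REWARD
TRAINING = {
    ("4", "C"): True,
    ("7", "C"): True,
    ("2", "S"): True,
    ("5", "H"): False,
    ("J", "S"): False,
}

HYPOTHESES = {
    "h1": lambda r, s: r in NUMBERED and s in BLACK,
    "h2": lambda r, s: s in BLACK and r != "J",
    "h3": lambda r, s: (r, s) in {("4", "C"), ("7", "C"), ("2", "S")},
    "h4": lambda r, s: (r, s) not in {("5", "H"), ("J", "S")},
}


def agrees(h, training=TRAINING):
    """True if hypothesis h gives the correct REWARD value on every example."""
    return all(h(r, s) == reward for (r, s), reward in training.items())


consistent = {name: agrees(h) for name, h in HYPOTHESES.items()}

# An unseen card (hypothetical query): the hypotheses no longer agree.
queen_of_spades = {name: h("Q", "S") for name, h in HYPOTHESES.items()}
```

Since the training data cannot separate these hypotheses, some preference among them has to come from outside the data.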
![Page 30: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/30.jpg)
NOTION OF CAPACITY
• It refers to the ability of a machine to learn any training set without error
• A machine with too much capacity is like a botanist with photographic memory who, when presented with a new tree, concludes that it is not a tree because it has a different number of leaves from anything he has seen before
• A machine with too little capacity is like the botanist’s lazy brother, who declares that if it’s green, it’s a tree
• Good generalization can only be achieved when the right balance is struck between the accuracy attained on the training set and the capacity of the machine
![Page 31: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/31.jpg)
KEEP-IT-SIMPLE (KIS) BIAS
Examples
• Use far fewer observable predicates than the training set
• Constrain the learnt predicate, e.g., to use only “high-level” observable predicates such as NUM, FACE, BLACK, and RED and/or to have simple syntax
Motivation
• If a hypothesis is too complex it is not worth learning it (data caching does the job as well)
• There are far fewer simple hypotheses than complex ones, hence the hypothesis space is smaller
Einstein: “A theory must be as simple as possible, but not simpler than this”
If the bias allows only sentences S that are conjunctions of k << n predicates picked from the n observable predicates, then the size of H is $O(n^k)$
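As a rough illustration of that bound, the number of ways to pick k unnegated predicates out of n is the binomial coefficient C(n, k), which never exceeds n^k. Ignoring negated predicates is a simplifying assumption of this sketch.

```python
from math import comb


def conjunction_count(n, k):
    """Number of conjunctions of k distinct (unnegated) predicates out of n."""
    return comb(n, k)


n, k = 6, 2
restricted = conjunction_count(n, k)   # size of H under the bias
upper_bound = n ** k                   # the O(n^k) bound
unrestricted = 2 ** (2 ** n)           # size of H with no bias
```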
![Page 34: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/34.jpg)
CAPACITY IS NOT THE ONLY CRITERION
• Accuracy on training set isn’t the best measure of performance
[Figure: example set X of positive (+) and negative (-) examples; a hypothesis in the hypothesis space H is learned from the training set D and tested on the rest of X]
![Page 35: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/35.jpg)
GENERALIZATION ERROR
• A hypothesis h is said to generalize well if it achieves low error on all examples in X
[Figure: example set X of positive (+) and negative (-) examples; a hypothesis in the hypothesis space H is learned and then tested on X]
![Page 36: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/36.jpg)
ASSESSING PERFORMANCE OF A LEARNING ALGORITHM
• Samples from X are typically unavailable
• Take out some of the training set
  • Train on the remaining training set
  • Test on the excluded instances
  • Cross-validation
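The procedure above can be sketched as a k-fold loop. The fold construction, the toy majority-vote learner, and the synthetic data are all assumptions made for illustration; any learner that maps a training list to a prediction function could be plugged in.

```python
import random


def k_fold_splits(examples, k, seed=0):
    """Yield (train, test) pairs; each example is tested exactly once."""
    shuffled = list(examples)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i, test in enumerate(folds):
        train = [ex for j, fold in enumerate(folds) if j != i for ex in fold]
        yield train, test


def majority_learner(train):
    """Toy learner: always predict the most common training label."""
    labels = [label for _, label in train]
    majority = max(set(labels), key=labels.count)
    return lambda x: majority


def cross_validate(examples, learner, k):
    """Average accuracy on the held-out fold over k folds."""
    accuracies = []
    for train, test in k_fold_splits(examples, k):
        h = learner(train)  # train on the remaining examples
        correct = sum(h(x) == label for x, label in test)
        accuracies.append(correct / len(test))  # test on the excluded ones
    return sum(accuracies) / k


# Hypothetical labelled examples: 4 positives, 8 negatives.
data = [(i, i % 3 == 0) for i in range(12)]
score = cross_validate(data, majority_learner, k=4)
```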
![Page 37: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/37.jpg)
CROSS-VALIDATION
• Split original set of examples D, train a hypothesis from the hypothesis space H on the training portion
• Evaluate hypothesis on testing set
• Compare true concept against prediction
[Figure: positive (+) and negative (-) examples split into a training set and a testing set; in the slide’s example the learned hypothesis gets 9/13 of the testing set correct]
![Page 41: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/41.jpg)
TENNIS EXAMPLE
• Evaluate learning algorithm: PlayTennis = S(Temperature, Wind)
• Trained hypotheses shown on successive slides:
  • PlayTennis = (T=Mild or Cool) ∧ (W=Weak): training errors = 3/10, testing errors = 4/4
  • PlayTennis = (T=Mild or Cool): training errors = 3/10, testing errors = 1/4
  • PlayTennis = (T=Mild or Cool): training errors = 3/10, testing errors = 2/4
![Page 45: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/45.jpg)
TEN COMMANDMENTS OF MACHINE LEARNING
Thou shalt not:
• Train on examples in the testing set
• Form assumptions by “peeking” at the testing set, then formulating inductive bias
![Page 46: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/46.jpg)
SUPERVISED LEARNING FLOW CHART
[Figure: flow chart connecting target function, datapoints, training set, learner, hypothesis space, inductive hypothesis, prediction, and test set]
• Target function: unknown concept we want to approximate
• Training set: observations we have seen
• Hypothesis space and learner: choice of learning algorithm
• Test set: observations we will see in the future
• Better quantities to assess performance
![Page 47: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/47.jpg)
HOW TO CONSTRUCT A BETTER LEARNER? Ideas?
![Page 48: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/48.jpg)
PREDICATE AS A DECISION TREE
The predicate CONCEPT(x) ⇔ A(x) ∧ (¬B(x) ∨ C(x)) can be represented by the following decision tree:
A?
  • False → False
  • True → B?
    • False → True
    • True → C?
      • True → True
      • False → False
Example: A mushroom is poisonous iff it is yellow and small, or yellow, big and spotted
• x is a mushroom
• CONCEPT = POISONOUS
• A = YELLOW
• B = BIG
• C = SPOTTED
Additional observable predicates for the mushroom example:
• D = FUNNEL-CAP
• E = BULKY
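The equivalence between the logical sentence and the tree A? → B? → C? can be checked exhaustively, since there are only eight assignments. The function names below are illustrative.

```python
from itertools import product


def concept(a, b, c):
    """CONCEPT(x) <=> A(x) and (not B(x) or C(x))."""
    return a and (not b or c)


def decision_tree(a, b, c):
    """The same predicate evaluated as the tree A? -> B? -> C?."""
    if not a:
        return False   # not yellow: not poisonous
    if not b:
        return True    # yellow and small: poisonous
    return c           # yellow and big: poisonous iff spotted


equivalent = all(
    concept(a, b, c) == decision_tree(a, b, c)
    for a, b, c in product([True, False], repeat=3)
)

# A yellow, small, unspotted mushroom.
small_yellow = decision_tree(a=True, b=False, c=False)
```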
![Page 50: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/50.jpg)
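TRAINING SET

| Ex. # | A | B | C | D | E | CONCEPT |
|---|---|---|---|---|---|---|
| 1 | False | False | True | False | True | False |
| 2 | False | True | False | False | False | False |
| 3 | False | True | True | True | True | False |
| 4 | False | False | True | False | False | False |
| 5 | False | False | False | True | True | False |
| 6 | True | False | True | False | False | True |
| 7 | True | False | False | True | False | True |
| 8 | True | False | True | False | True | True |
| 9 | True | True | True | False | True | True |
| 10 | True | True | True | True | True | True |
| 11 | True | True | False | False | False | False |
| 12 | True | True | False | False | True | False |
| 13 | True | False | True | True | True | True |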
![Page 51: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/51.jpg)
POSSIBLE DECISION TREE
[Figure: the training set table next to a decision tree rooted at D, with lower nodes testing C, E, B, and A and leaves labelled T or F]
![Page 53: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/53.jpg)
POSSIBLE DECISION TREE
[Figure: the large tree rooted at D beside the small tree A? → B? → C?]
• Large tree: CONCEPT ⇔ (D ∧ (¬E ∨ A)) ∨ (¬D ∧ (C ∧ (B ∨ (¬B ∧ ((E ∧ ¬A) ∨ (¬E ∧ A))))))
• Small tree: CONCEPT ⇔ A ∧ (¬B ∨ C)
• KIS bias → Build smallest decision tree
• Computationally intractable problem → greedy algorithm
![Page 54: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/54.jpg)
GETTING STARTED:TOP-DOWN INDUCTION OF DECISION TREE
Ex. # A B C D E CONCEPT
1 False False True False True False
2 False True False False False False
3 False True True True True False
4 False False True False False False
5 False False False True True False
6 True False True False False True
7 True False False True False True
8 True False True False True True
9 True True True False True True
10 True True True True True True
11 True True False False False False
12 True True False False True False
13 True False True True True True
The distribution of training set is:
• True: 6, 7, 8, 9, 10, 13
• False: 1, 2, 3, 4, 5, 11, 12
![Page 55: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/55.jpg)
GETTING STARTED: TOP-DOWN INDUCTION OF DECISION TREE
The distribution of training set is:
• True: 6, 7, 8, 9, 10, 13
• False: 1, 2, 3, 4, 5, 11, 12
Without testing any observable predicate, we could report that CONCEPT is False (majority rule) with an estimated probability of error P(E) = 6/13.
Assuming that we will only include one observable predicate in the decision tree, which predicate should we test to minimize the probability of error (i.e., the # of misclassified examples in the training set)? → Greedy algorithm
![Page 56: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/56.jpg)
ASSUME IT’S A
• A = T → True: 6, 7, 8, 9, 10, 13; False: 11, 12
• A = F → True: (none); False: 1, 2, 3, 4, 5
If we test only A, we will report that CONCEPT is True if A is True (majority rule) and False otherwise.
The number of misclassified examples from the training set is 2.
![Page 57: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/57.jpg)
ASSUME IT’S B
• B = T → True: 9, 10; False: 2, 3, 11, 12
• B = F → True: 6, 7, 8, 13; False: 1, 4, 5
If we test only B, we will report that CONCEPT is False if B is True and True otherwise.
The number of misclassified examples from the training set is 5.
![Page 58: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/58.jpg)
ASSUME IT’S C
• C = T → True: 6, 8, 9, 10, 13; False: 1, 3, 4
• C = F → True: 7; False: 2, 5, 11, 12
If we test only C, we will report that CONCEPT is True if C is True and False otherwise.
The number of misclassified examples from the training set is 4.
![Page 59: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/59.jpg)
ASSUME IT’S D
• D = T → True: 7, 10, 13; False: 3, 5
• D = F → True: 6, 8, 9; False: 1, 2, 4, 11, 12
If we test only D, we will report that CONCEPT is True if D is True and False otherwise.
The number of misclassified examples from the training set is 5.
![Page 60: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/60.jpg)
ASSUME IT’S E
• E = T → True: 8, 9, 10, 13; False: 1, 3, 5, 12
• E = F → True: 6, 7; False: 2, 4, 11
If we test only E, we will report that CONCEPT is False, independent of the outcome.
The number of misclassified examples from the training set is 6.
So, the best predicate to test is A
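The five single-predicate counts can be reproduced with a short sketch over the 13-example training set. The compact T/F encoding of the table and the helper names are choices made here; the error count for a split is the number of minority-class examples in each branch under the majority rule.

```python
# Training set: example number, A, B, C, D, E, CONCEPT (T = True, F = False).
TABLE = """
1 F F T F T F
2 F T F F F F
3 F T T T T F
4 F F T F F F
5 F F F T T F
6 T F T F F T
7 T F F T F T
8 T F T F T T
9 T T T F T T
10 T T T T T T
11 T T F F F F
12 T T F F T F
13 T F T T T T
"""
PREDICATES = "ABCDE"


def load(table=TABLE):
    """Parse the table into (attribute dict, label) pairs."""
    examples = []
    for line in table.strip().splitlines():
        values = [field == "T" for field in line.split()[1:]]
        examples.append((dict(zip(PREDICATES, values[:-1])), values[-1]))
    return examples


def majority_errors(examples):
    """Misclassified examples when predicting the majority label."""
    positives = sum(label for _, label in examples)
    return min(positives, len(examples) - positives)


def split_errors(examples, predicate):
    """Misclassified examples when testing a single predicate."""
    true_branch = [ex for ex in examples if ex[0][predicate]]
    false_branch = [ex for ex in examples if not ex[0][predicate]]
    return majority_errors(true_branch) + majority_errors(false_branch)


EXAMPLES = load()
errors = {p: split_errors(EXAMPLES, p) for p in PREDICATES}
best = min(errors, key=errors.get)
```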
![Page 62: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/62.jpg)
CHOICE OF SECOND PREDICATE
• A = F → False
• A = T → test C
  • C = T → True: 6, 8, 9, 10, 13; False: (none)
  • C = F → True: 7; False: 11, 12
The number of misclassified examples from the training set is 1.
![Page 63: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/63.jpg)
CHOICE OF THIRD PREDICATE
• A = F → False
• A = T → test C
  • C = T → True
  • C = F → test B
    • B = T → False: 11, 12
    • B = F → True: 7
![Page 64: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/64.jpg)
FINAL TREE
A?
  • False → False
  • True → C?
    • True → True
    • False → B?
      • True → False
      • False → True
CONCEPT ⇔ A ∧ (C ∨ ¬B), which is the same predicate as CONCEPT ⇔ A ∧ (¬B ∨ C) represented by the tree A? → B? → C?
![Page 65: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/65.jpg)
TOP-DOWN INDUCTION OF A DT
DTL(D, Predicates)
1. If all examples in D are positive then return True
2. If all examples in D are negative then return False
3. If Predicates is empty then return failure
4. A ← error-minimizing predicate in Predicates
5. Return the tree whose:
   - root is A,
   - left branch is DTL(D+A, Predicates − A),
   - right branch is DTL(D−A, Predicates − A)
Here D+A is the subset of examples that satisfy A, and D−A is the subset that do not.
Step 3 signals noise in the training set! The algorithm may return the majority rule instead of failure.
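A minimal Python sketch of DTL, run on the 13-example training set, is shown below. Several details are assumptions of this sketch and not taken from the slides: trees are nested tuples `(predicate, true_branch, false_branch)`, ties between predicates are broken by list order, an empty branch inherits the parent's majority label, and step 3 returns the majority rule instead of failure.

```python
TABLE = """
1 F F T F T F
2 F T F F F F
3 F T T T T F
4 F F T F F F
5 F F F T T F
6 T F T F F T
7 T F F T F T
8 T F T F T T
9 T T T F T T
10 T T T T T T
11 T T F F F F
12 T T F F T F
13 T F T T T T
"""


def load(table=TABLE, names="ABCDE"):
    """Parse the table into (attribute dict, label) pairs."""
    rows = [[f == "T" for f in line.split()[1:]]
            for line in table.strip().splitlines()]
    return [(dict(zip(names, row[:-1])), row[-1]) for row in rows]


def majority(examples):
    """Majority label; ties go to False (an arbitrary choice)."""
    return 2 * sum(label for _, label in examples) > len(examples)


def split(examples, predicate):
    """Return (D+A, D-A): examples that satisfy the predicate, and the rest."""
    yes = [ex for ex in examples if ex[0][predicate]]
    no = [ex for ex in examples if not ex[0][predicate]]
    return yes, no


def errors_after_split(examples, predicate):
    """Training errors if we test this predicate and apply majority rule."""
    total = 0
    for branch in split(examples, predicate):
        positives = sum(label for _, label in branch)
        total += min(positives, len(branch) - positives)
    return total


def dtl(examples, predicates, default=False):
    if not examples:
        return default                      # empty branch: parent's majority
    labels = {label for _, label in examples}
    if labels == {True}:
        return True                         # step 1
    if labels == {False}:
        return False                        # step 2
    if not predicates:
        return majority(examples)           # step 3, majority rule on noise
    best = min(predicates, key=lambda p: errors_after_split(examples, p))
    rest = [p for p in predicates if p != best]
    yes, no = split(examples, best)
    fallback = majority(examples)
    return (best, dtl(yes, rest, fallback), dtl(no, rest, fallback))


def classify(tree, attributes):
    """Walk the tree until a True/False leaf is reached."""
    while isinstance(tree, tuple):
        predicate, if_true, if_false = tree
        tree = if_true if attributes[predicate] else if_false
    return tree


EXAMPLES = load()
tree = dtl(EXAMPLES, list("ABCDE"))
```

On this data the sketch tests A, then C, then B. At the third level B and D both give zero errors, so the choice of B depends on the tie-breaking order.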
![Page 67: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/67.jpg)
COMMENTS
• Widely used algorithm
• Greedy
• Robust to noise (incorrect examples)
• Not incremental
![Page 68: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/68.jpg)
MISCELLANEOUS ISSUES
• Assessing performance:
  • Training set and test set
  • Learning curve
[Figure: typical learning curve; x-axis: size of training set; y-axis: % correct on test set, up to 100]
Some concepts are unrealizable within a machine’s capacity
![Page 72: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/72.jpg)
MISCELLANEOUS ISSUES
• Assessing performance:
  • Training set and test set
  • Learning curve
• Overfitting
  • Risk of using irrelevant observable predicates to generate a hypothesis that agrees with all examples in the training set
• Tree pruning
  • Terminate recursion when # errors / information gain is small
  • The resulting decision tree + majority rule may not classify correctly all examples in the training set
![Page 73: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/73.jpg)
MISCELLANEOUS ISSUES
• Assessing performance:
  • Training set and test set
  • Learning curve
• Overfitting
  • Tree pruning
• Incorrect examples
• Missing data
• Multi-valued and continuous attributes
![Page 74: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/74.jpg)
CONTINUOUS ATTRIBUTES
• Continuous attributes can be converted into logical ones via thresholds: X ⇒ X < a
• When considering splitting on X, pick the threshold a to minimize # of errors
[Figure: a row of values along the X axis: 7 7 6 5 6 5 4 5 4 3 4 5 4 5 6 7]
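Threshold selection can be sketched as a scan over candidate cut points. Using midpoints between consecutive distinct values as candidates, applying the majority rule on each side, and the sample data itself are assumptions of this example, not details given on the slide.

```python
def best_threshold(samples):
    """Return (a, errors) minimizing training errors for the test X < a.

    samples is a list of (x, label) pairs with at least two distinct x values.
    """
    xs = sorted({x for x, _ in samples})
    candidates = [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]
    best = None
    for a in candidates:
        below = [label for x, label in samples if x < a]
        above = [label for x, label in samples if x >= a]
        # Majority rule on each side: the minority examples are the errors.
        errors = sum(min(sum(side), len(side) - sum(side))
                     for side in (below, above))
        if best is None or errors < best[1]:
            best = (a, errors)
    return best


# Hypothetical one-dimensional data.
samples = [(1.0, False), (2.0, False), (3.0, True),
           (4.0, False), (5.0, True), (6.0, True)]
threshold, errors = best_threshold(samples)
```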
![Page 75: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/75.jpg)
LEARNABLE CONCEPTS
• Some simple concepts cannot be represented compactly in DTs
  • Parity(x) = X1 xor X2 xor … xor Xn
  • Majority(x) = 1 if most of Xi’s are 1, 0 otherwise
• Exponential size in # of attributes
• Need exponential # of examples to learn exactly
• The ease of learning is dependent on shrewdly (or luckily) chosen attributes that correlate with CONCEPT
![Page 76: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/76.jpg)
APPLICATIONS OF DECISION TREE
• Medical diagnostics / Drug design
• Evaluation of geological systems for assessing gas and oil basins
• Early detection of problems (e.g., jamming) during oil drilling operations
• Automatic generation of rules in expert systems
![Page 77: Introduction to Machine Learning](https://reader036.vdocuments.us/reader036/viewer/2022062501/56815c36550346895dca1d92/html5/thumbnails/77.jpg)
HUMAN-READABILITY
• DTs also have the advantage of being easily understood by humans
• Legal requirement in many areas
  • Loans & mortgages
  • Health insurance
  • Welfare