Decision Trees
Example
#  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1  Sunny   Warm     Normal    Strong  Warm   Same      Yes
2  Sunny   Warm     High      Strong  Warm   Same      Yes
3  Rainy   Cold     High      Strong  Warm   Change    No
4  Sunny   Warm     High      Strong  Cool   Change    Yes
5  Cloudy  Warm     High      Weak    Cool   Same      Yes
6  Cloudy  Cold     High      Weak    Cool   Same      No
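For the computations on the following slides, the training set can be written down directly; a minimal Python sketch (the variable name `examples` is an illustrative choice, not from the slides):

```python
# The six EnjoySport training examples from the table above.
examples = [
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "Normal",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Rainy",  "AirTemp": "Cold", "Humidity": "High",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Change", "EnjoySport": "No"},
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Strong", "Water": "Cool", "Forecast": "Change", "EnjoySport": "Yes"},
    {"Sky": "Cloudy", "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Weak",   "Water": "Cool", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Cloudy", "AirTemp": "Cold", "Humidity": "High",
     "Wind": "Weak",   "Water": "Cool", "Forecast": "Same",   "EnjoySport": "No"},
]
```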
Decision Trees

[Figure: a decision tree. Root node Sky with branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp; the AirTemp node has branches Warm → Yes and Cold → No.]

The tree corresponds to the expression (Sky = Sunny) ∨ (Sky = Cloudy ∧ AirTemp = Warm)
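The same tree as code, as a minimal sketch (the function name `enjoy_sport` is illustrative): each root-to-leaf path is one conjunction of tests, and the Yes leaves together give the disjunction above.

```python
def enjoy_sport(sky: str, air_temp: str) -> str:
    """Classify an instance with the tree above; equivalent to
    (Sky = Sunny) or (Sky = Cloudy and AirTemp = Warm)."""
    if sky == "Sunny":
        return "Yes"
    if sky == "Rainy":
        return "No"
    # Sky = Cloudy: descend to the AirTemp node
    return "Yes" if air_temp == "Warm" else "No"
```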
Decision Trees

[Figure: the same tree: Sky with branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp, where Warm → Yes and Cold → No.]

Two new, unclassified examples:

7  Rainy   Warm  Normal  Weak    Cool  Same    ?
8  Cloudy  Warm  High    Strong  Cool  Change  ?

The tree classifies example 7 as No (Sky = Rainy) and example 8 as Yes (Sky = Cloudy and AirTemp = Warm).
Decision Trees

[Figure: an alternative tree consistent with the same training examples. Root node Humidity with branches Normal → Yes and High → Sky; the Sky node has branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp, with Warm → Yes and Cold → No.]
Decision Trees

[Figure: training instances plotted as + in a two-dimensional instance space; the lines A1 = v1 and A2 = v2 mark the axis-parallel boundaries induced by the tree's tests.]
Homogeneity of Examples

• Entropy(S) = −p+ log2 p+ − p- log2 p-

[Figure: entropy plotted against the proportion p+ of positive examples; it is 0 for a pure sample and maximal (1.0) at p+ = 0.5.]
Homogeneity of Examples

• Entropy(S) = Σ_{i=1..c} −p_i log2 p_i, an impurity measure over c classes
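As a minimal sketch of this computation in Python (the helper name `entropy` is my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels: sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

# 4 Yes / 2 No, as in the worked example two slides below:
print(entropy(["Yes"] * 4 + ["No"] * 2))  # ≈ 0.918 (the slides round to 0.917)
```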
Information Gain

• Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v|/|S|) · Entropy(S_v)

[Figure: attribute A splits S into one subset per attribute value: S_v1, S_v2, ...]
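A sketch of information gain on top of the `entropy` helper above (function and parameter names are illustrative; examples are dicts as in the dataset sketch):

```python
def gain(examples, attribute, target="EnjoySport"):
    """Gain(S, A) = Entropy(S) minus the size-weighted entropy of each
    subset S_v, where S_v holds the examples with attribute value v."""
    total = len(examples)
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder
```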
Example

• Entropy(S) = −p+ log2 p+ − p- log2 p- = −(4/6) log2(4/6) − (2/6) log2(2/6)
  = 0.389 + 0.528 = 0.917

• Gain(S, Sky)
  = Entropy(S) − Σ_{v ∈ {Sunny, Rainy, Cloudy}} (|S_v|/|S|) · Entropy(S_v)
  = Entropy(S) − [(3/6) · Entropy(S_Sunny) + (1/6) · Entropy(S_Rainy) + (2/6) · Entropy(S_Cloudy)]
  = Entropy(S) − (2/6) · Entropy(S_Cloudy)        (S_Sunny and S_Rainy are pure, so their entropy is 0)
  = Entropy(S) − (2/6) · [−(1/2) log2(1/2) − (1/2) log2(1/2)]
  = 0.917 − 0.333 = 0.584
Example

• Entropy(S) = −p+ log2 p+ − p- log2 p- = −(4/6) log2(4/6) − (2/6) log2(2/6)
  = 0.389 + 0.528 = 0.917

• Gain(S, Water)
  = Entropy(S) − Σ_{v ∈ {Warm, Cool}} (|S_v|/|S|) · Entropy(S_v)
  = Entropy(S) − [(3/6) · Entropy(S_Warm) + (3/6) · Entropy(S_Cool)]
  = Entropy(S) − (3/6) · 2 · [−(2/3) log2(2/3) − (1/3) log2(1/3)]        (each subset has 2 Yes, 1 No)
  = Entropy(S) − 0.389 − 0.528
  = 0
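Running the `gain()` sketch on the `examples` list from the first slide reproduces both results up to rounding (a usage sketch, assuming the earlier definitions are in scope):

```python
print(gain(examples, "Sky"))    # ≈ 0.585 (the slides round to 0.584)
print(gain(examples, "Water"))  # ≈ 0.0
```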
Example

[Figure: the partially grown tree. Root Sky with branches Sunny → Yes, Rainy → No, and Cloudy → ?, a node still to be expanded.]

• Gain(S_Cloudy, AirTemp)
  = Entropy(S_Cloudy) − Σ_{v ∈ {Warm, Cold}} (|S_v|/|S_Cloudy|) · Entropy(S_v)
  = 1        (AirTemp separates the two Cloudy examples perfectly)

• Gain(S_Cloudy, Humidity)
  = Entropy(S_Cloudy) − Σ_{v ∈ {Normal, High}} (|S_v|/|S_Cloudy|) · Entropy(S_v)
  = 0        (both Cloudy examples have Humidity = High)

AirTemp therefore becomes the test at the Cloudy node, completing the tree shown earlier.
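Applying this greedy attribute choice recursively is the core of the ID3 algorithm; a minimal sketch, assuming the `entropy`/`gain` helpers and the `Counter` import from the sketches above (the dict-of-dicts tree representation is an illustrative choice):

```python
def id3(examples, attributes, target="EnjoySport"):
    """Grow a tree greedily: return a class label at pure nodes,
    otherwise split on the attribute with the highest information gain."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:      # pure node: all examples agree
        return labels[0]
    if not attributes:             # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, rest, target)
    return tree
```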
Inductive Bias
• Hypothesis space: complete!
• Shorter trees are preferred over larger trees
• Prefer the simplest hypothesis that fits the data (Occam's razor)
Inductive Bias

• The decision tree algorithm searches incompletely through a complete hypothesis space: a preference bias.
• Candidate-Elimination searches completely through an incomplete hypothesis space: a restriction bias.
Overfitting

• h ∈ H is said to overfit the training data if there exists h’ ∈ H such that h has a smaller error than h’ over the training examples, but h’ has a smaller error than h over the entire distribution of instances. Possible causes:
– There is noise in the data
– The number of training examples is too small to produce a representative sample of the target concept
Homework
Exercises 3-13.4 (Chapter 3, ML textbook)