Decision Trees
Example
#  Sky     AirTemp  Humidity  Wind    Water  Forecast  EnjoySport
1  Sunny   Warm     Normal    Strong  Warm   Same      Yes
2  Sunny   Warm     High      Strong  Warm   Same      Yes
3  Rainy   Cold     High      Strong  Warm   Change    No
4  Sunny   Warm     High      Strong  Cool   Change    Yes
5  Cloudy  Warm     High      Weak    Cool   Same      Yes
6  Cloudy  Cold     High      Weak    Cool   Same      No
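For the computations on the following slides, the training set can be written down directly; a minimal Python sketch (the variable name `examples` is an illustrative choice, not from the slides):

```python
# The six EnjoySport training examples from the table above.
examples = [
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "Normal",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Rainy",  "AirTemp": "Cold", "Humidity": "High",
     "Wind": "Strong", "Water": "Warm", "Forecast": "Change", "EnjoySport": "No"},
    {"Sky": "Sunny",  "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Strong", "Water": "Cool", "Forecast": "Change", "EnjoySport": "Yes"},
    {"Sky": "Cloudy", "AirTemp": "Warm", "Humidity": "High",
     "Wind": "Weak",   "Water": "Cool", "Forecast": "Same",   "EnjoySport": "Yes"},
    {"Sky": "Cloudy", "AirTemp": "Cold", "Humidity": "High",
     "Wind": "Weak",   "Water": "Cool", "Forecast": "Same",   "EnjoySport": "No"},
]
```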
Decision Trees

[Figure: a decision tree. Root node Sky with branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp; the AirTemp node has branches Warm → Yes and Cold → No.]

The tree corresponds to the expression (Sky = Sunny) ∨ (Sky = Cloudy ∧ AirTemp = Warm)
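The same tree as code, as a minimal sketch (the function name `enjoy_sport` is illustrative): each root-to-leaf path is one conjunction of tests, and the Yes leaves together give the disjunction above.

```python
def enjoy_sport(sky: str, air_temp: str) -> str:
    """Classify an instance with the tree above; equivalent to
    (Sky = Sunny) or (Sky = Cloudy and AirTemp = Warm)."""
    if sky == "Sunny":
        return "Yes"
    if sky == "Rainy":
        return "No"
    # Sky = Cloudy: descend to the AirTemp node
    return "Yes" if air_temp == "Warm" else "No"
```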
Decision Trees

[Figure: the same tree: Sky with branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp, where Warm → Yes and Cold → No.]

Two new, unclassified examples:

7  Rainy   Warm  Normal  Weak    Cool  Same    ?
8  Cloudy  Warm  High    Strong  Cool  Change  ?

The tree classifies example 7 as No (Sky = Rainy) and example 8 as Yes (Sky = Cloudy and AirTemp = Warm).
Decision Trees

[Figure: an alternative tree consistent with the same training examples. Root node Humidity with branches Normal → Yes and High → Sky; the Sky node has branches Sunny → Yes, Rainy → No, and Cloudy → AirTemp, with Warm → Yes and Cold → No.]
Decision Trees

[Figure: training instances plotted as + in a two-dimensional instance space; the lines A1 = v1 and A2 = v2 mark the axis-parallel boundaries induced by the tree's tests.]
Homogeneity of Examples

• Entropy(S) = −p+ log2 p+ − p- log2 p-

[Figure: entropy plotted against the proportion p+ of positive examples; it is 0 for a pure sample and maximal (1.0) at p+ = 0.5.]
Homogeneity of Examples

• Entropy(S) = Σ_{i=1..c} −p_i log2 p_i, an impurity measure over c classes
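As a minimal sketch of this computation in Python (the helper name `entropy` is my own):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Entropy of a collection of class labels: sum over classes of -p_i * log2(p_i)."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

# 4 Yes / 2 No, as in the worked example two slides below:
print(entropy(["Yes"] * 4 + ["No"] * 2))  # ≈ 0.918 (the slides round to 0.917)
```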
Information Gain

• Gain(S, A) = Entropy(S) − Σ_{v ∈ Values(A)} (|S_v|/|S|) · Entropy(S_v)

[Figure: attribute A splits S into one subset per attribute value: S_v1, S_v2, ...]
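A sketch of information gain on top of the `entropy` helper above (function and parameter names are illustrative; examples are dicts as in the dataset sketch):

```python
def gain(examples, attribute, target="EnjoySport"):
    """Gain(S, A) = Entropy(S) minus the size-weighted entropy of each
    subset S_v, where S_v holds the examples with attribute value v."""
    total = len(examples)
    labels = [e[target] for e in examples]
    remainder = 0.0
    for v in {e[attribute] for e in examples}:
        subset = [e[target] for e in examples if e[attribute] == v]
        remainder += (len(subset) / total) * entropy(subset)
    return entropy(labels) - remainder
```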
Example

• Entropy(S) = −p+ log2 p+ − p- log2 p- = −(4/6) log2(4/6) − (2/6) log2(2/6)
  = 0.389 + 0.528 = 0.917

• Gain(S, Sky)
  = Entropy(S) − Σ_{v ∈ {Sunny, Rainy, Cloudy}} (|S_v|/|S|) · Entropy(S_v)
  = Entropy(S) − [(3/6) · Entropy(S_Sunny) + (1/6) · Entropy(S_Rainy) + (2/6) · Entropy(S_Cloudy)]
  = Entropy(S) − (2/6) · Entropy(S_Cloudy)        (S_Sunny and S_Rainy are pure, so their entropy is 0)
  = Entropy(S) − (2/6) · [−(1/2) log2(1/2) − (1/2) log2(1/2)]
  = 0.917 − 0.333 = 0.584
Example

• Entropy(S) = −p+ log2 p+ − p- log2 p- = −(4/6) log2(4/6) − (2/6) log2(2/6)
  = 0.389 + 0.528 = 0.917

• Gain(S, Water)
  = Entropy(S) − Σ_{v ∈ {Warm, Cool}} (|S_v|/|S|) · Entropy(S_v)
  = Entropy(S) − [(3/6) · Entropy(S_Warm) + (3/6) · Entropy(S_Cool)]
  = Entropy(S) − (3/6) · 2 · [−(2/3) log2(2/3) − (1/3) log2(1/3)]        (each subset has 2 Yes, 1 No)
  = Entropy(S) − 0.389 − 0.528
  = 0
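Running the `gain()` sketch on the `examples` list from the first slide reproduces both results up to rounding (a usage sketch, assuming the earlier definitions are in scope):

```python
print(gain(examples, "Sky"))    # ≈ 0.585 (the slides round to 0.584)
print(gain(examples, "Water"))  # ≈ 0.0
```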
Example

[Figure: the partially grown tree. Root Sky with branches Sunny → Yes, Rainy → No, and Cloudy → ?, a node still to be expanded.]

• Gain(S_Cloudy, AirTemp)
  = Entropy(S_Cloudy) − Σ_{v ∈ {Warm, Cold}} (|S_v|/|S_Cloudy|) · Entropy(S_v)
  = 1        (AirTemp separates the two Cloudy examples perfectly)

• Gain(S_Cloudy, Humidity)
  = Entropy(S_Cloudy) − Σ_{v ∈ {Normal, High}} (|S_v|/|S_Cloudy|) · Entropy(S_v)
  = 0        (both Cloudy examples have Humidity = High)

AirTemp therefore becomes the test at the Cloudy node, completing the tree shown earlier.
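Applying this greedy attribute choice recursively is the core of the ID3 algorithm; a minimal sketch, assuming the `entropy`/`gain` helpers and the `Counter` import from the sketches above (the dict-of-dicts tree representation is an illustrative choice):

```python
def id3(examples, attributes, target="EnjoySport"):
    """Grow a tree greedily: return a class label at pure nodes,
    otherwise split on the attribute with the highest information gain."""
    labels = [e[target] for e in examples]
    if len(set(labels)) == 1:      # pure node: all examples agree
        return labels[0]
    if not attributes:             # no tests left: majority class
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(examples, a, target))
    tree = {best: {}}
    for v in {e[best] for e in examples}:
        subset = [e for e in examples if e[best] == v]
        rest = [a for a in attributes if a != best]
        tree[best][v] = id3(subset, rest, target)
    return tree
```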
Inductive Bias
• Hypothesis space: complete!
• Shorter trees are preferred over larger trees
• Prefer the simplest hypothesis that fits the data (Occam's razor)
Inductive Bias

• The decision tree algorithm searches incompletely through a complete hypothesis space: a preference bias.
• Candidate-Elimination searches completely through an incomplete hypothesis space: a restriction bias.
Overfitting

• h ∈ H is said to overfit the training data if there exists h’ ∈ H such that h has a smaller error than h’ over the training examples, but h’ has a smaller error than h over the entire distribution of instances. Possible causes:
– There is noise in the data
– The number of training examples is too small to produce a representative sample of the target concept
Homework
Exercises 3-13.4 (Chapter 3, ML textbook)