Machine Learning, Lecture 10: Decision Trees (G53MLE Machine Learning, Dr Guoping Qiu)


TRANSCRIPT

Page 1: Machine Learning

Machine Learning

Lecture 10

Decision Trees


Page 2: Machine Learning

Trees: Node, Root, Leaf, Branch, Path, Depth

Page 3: Machine Learning

Decision Trees

A hierarchical data structure that represents data by implementing a divide-and-conquer strategy. It can be used as a non-parametric classification method: given a collection of examples, learn a decision tree that represents it, then use this representation to classify new examples.

Page 4: Machine Learning

Decision Trees

Each node is associated with a feature (one of the elements of the feature vector that represents an object).
Each node tests the value of its associated feature.
There is one branch for each value of the feature.
Leaves specify the categories (classes).
A tree can categorize instances into multiple disjoint categories (multi-class).

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Page 5: Machine Learning

Decision Trees: Play Tennis example. Feature vector = (Outlook, Temperature, Humidity, Wind)

[Figure: the Play Tennis decision tree from Page 4]

Page 6: Machine Learning

Decision Trees

[Figure: the Play Tennis decision tree from Page 4; the Outlook, Humidity and Wind nodes are each labelled "Node associated with a feature"]

Page 7: Machine Learning

Decision Trees: Play Tennis example. Feature values:

Outlook = (sunny, overcast, rain)
Temperature = (hot, mild, cool)
Humidity = (high, normal)
Wind = (strong, weak)

Page 8: Machine Learning

Decision Trees: Outlook = (sunny, overcast, rain)

[Figure: the Play Tennis decision tree from Page 4; the Sunny, Overcast and Rain branches of Outlook are each labelled "One branch for each value"]

Page 9: Machine Learning

Decision Trees: Class = (Yes, No)

[Figure: the Play Tennis decision tree from Page 4; the Yes and No leaves are labelled "Leaf nodes specify classes"]

Page 10: Machine Learning

Decision Trees: Designing a decision tree classifier involves two steps: picking the root node, then recursively branching.

Page 11: Machine Learning

Decision Trees: Picking the root node

Consider data with two Boolean attributes (A,B) and two classes + and –

{ (A=0, B=0), - }: 50 examples
{ (A=0, B=1), - }: 50 examples
{ (A=1, B=0), - }: 3 examples
{ (A=1, B=1), + }: 100 examples

Page 12: Machine Learning

Decision Trees: Picking the root node

The candidate trees look structurally similar; which attribute should we choose?

Candidate 1: split on A first
A = 0 -> -
A = 1 -> test B (B = 0 -> -, B = 1 -> +)

Candidate 2: split on B first
B = 0 -> -
B = 1 -> test A (A = 0 -> -, A = 1 -> +)

Page 13: Machine Learning

Decision Trees: Picking the root node

The goal is to have the resulting decision tree as small as possible (Occam’s Razor)

The main decision in the algorithm is the selection of the next attribute to condition on (start from the root node).

We want attributes that split the examples to sets that are relatively pure in one label; this way we are closer to a leaf node.

The most popular heuristic is based on information gain, which originated with Quinlan's ID3 system.

Page 14: Machine Learning

Entropy

S is a sample of training examples

p+ is the proportion of positive examples in S

p- is the proportion of negative examples in S

Entropy measures the impurity of S

Entropy(S) = - p+ log2 p+ - p- log2 p-
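This two-class entropy is easy to compute directly; a minimal sketch in Python (the function name and the 0 * log2(0) = 0 convention are illustrative, not from the slides):

```python
import math

def entropy(p_pos):
    """Entropy of a two-class sample S, given the proportion p+ of positives.

    Uses the convention 0 * log2(0) = 0, so pure samples have entropy 0.
    """
    h = 0.0
    for p in (p_pos, 1.0 - p_pos):
        if p > 0:
            h -= p * math.log2(p)
    return h

print(entropy(0.5))  # maximally impure sample: 1.0
print(entropy(1.0))  # pure sample: 0.0
```

Entropy is maximal (1 bit) when the two classes are equally likely and zero when the sample is pure.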

Page 15: Machine Learning

[Figure: two collections of + and - examples. A thoroughly mixed collection is highly disorganized: high entropy, much information required to describe it. A collection separated into an all-+ group and an all-- group is highly organized: low entropy, little information required.]

Page 16: Machine Learning

Information Gain

Gain (S, A) = expected reduction in entropy due to sorting on A

Values(A) is the set of all possible values of attribute A; Sv is the subset of S for which attribute A has value v; |S| and |Sv| denote the number of samples in S and in Sv respectively.

Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|Sv| / |S|) * Entropy(Sv)
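The formula transcribes directly into code; a sketch (the helper names and the index-based attribute encoding are illustrative, not from the lecture):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy of a sample given its list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, labels, a):
    """Gain(S, A): expected entropy reduction from splitting on attribute index a."""
    n = len(examples)
    remainder = 0.0
    for v in set(x[a] for x in examples):
        s_v = [lab for x, lab in zip(examples, labels) if x[a] == v]
        remainder += (len(s_v) / n) * entropy(s_v)  # |Sv|/|S| * Entropy(Sv)
    return entropy(labels) - remainder
```

Splitting on an attribute that perfectly predicts the label recovers the full entropy of S as gain; splitting on an irrelevant attribute gains nothing.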

Page 17: Machine Learning

Information Gain

Example: choose A or B?

Split on A:  A = 1 -> 100+, 3-;  A = 0 -> 100-
Split on B:  B = 1 -> 100+, 50-;  B = 0 -> 53-
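Working through these counts numerically shows that splitting on A gains far more information than splitting on B; a sketch (the `gain` helper and count encoding are illustrative):

```python
import math

def entropy(pos, neg):
    """Binary entropy from raw class counts, with 0 * log2(0) = 0."""
    total = pos + neg
    h = 0.0
    for c in (pos, neg):
        if c:
            p = c / total
            h -= p * math.log2(p)
    return h

# (A, B) -> (number of + examples, number of - examples), from Page 11
data = {(0, 0): (0, 50), (0, 1): (0, 50), (1, 0): (0, 3), (1, 1): (100, 0)}

def gain(attr):
    """Information gain of splitting on attribute attr (0 for A, 1 for B)."""
    total = sum(p + n for p, n in data.values())
    pos = sum(p for p, n in data.values())
    remainder = 0.0
    for v in (0, 1):
        v_pos = sum(p for k, (p, n) in data.items() if k[attr] == v)
        v_neg = sum(n for k, (p, n) in data.items() if k[attr] == v)
        remainder += ((v_pos + v_neg) / total) * entropy(v_pos, v_neg)
    return entropy(pos, total - pos) - remainder

print(round(gain(0), 3), round(gain(1), 3))  # splitting on A gains more than on B
```

Intuitively, A = 0 yields a pure all-negative branch of 100 examples, while neither value of B yields a branch that is both large and pure.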

Page 18: Machine Learning

Example

Play Tennis Example

Day    Outlook   Temperature  Humidity  Wind    PlayTennis
Day1   Sunny     Hot          High      Weak    No
Day2   Sunny     Hot          High      Strong  No
Day3   Overcast  Hot          High      Weak    Yes
Day4   Rain      Mild         High      Weak    Yes
Day5   Rain      Cool         Normal    Weak    Yes
Day6   Rain      Cool         Normal    Strong  No
Day7   Overcast  Cool         Normal    Strong  Yes
Day8   Sunny     Mild         High      Weak    No
Day9   Sunny     Cool         Normal    Weak    Yes
Day10  Rain      Mild         Normal    Weak    Yes
Day11  Sunny     Mild         Normal    Strong  Yes
Day12  Overcast  Mild         High      Strong  Yes
Day13  Overcast  Hot          Normal    Weak    Yes
Day14  Rain      Mild         High      Strong  No

Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.94

Page 19: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Split on Humidity:
High: 3+, 4-  (Entropy = 0.985)
Normal: 6+, 1-  (Entropy = 0.592)

Gain(S, Humidity) = 0.94 - (7/14) * 0.985 - (7/14) * 0.592 = 0.151
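This arithmetic can be checked directly; a small sketch (`h` is an illustrative helper for the binary entropy of a proportion):

```python
import math

def h(p):
    """Binary entropy of a proportion p of positives."""
    return 0.0 if p in (0.0, 1.0) else -p * math.log2(p) - (1 - p) * math.log2(1 - p)

e_s = h(9 / 14)      # entropy of the full sample, about 0.94
e_high = h(3 / 7)    # High branch: 3+, 4-
e_normal = h(6 / 7)  # Normal branch: 6+, 1-
gain_humidity = e_s - (7 / 14) * e_high - (7 / 14) * e_normal
print(round(gain_humidity, 2))  # about 0.15, matching the slide's 0.151
```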

Page 20: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Split on Wind:
Weak: 6+, 2-  (Entropy = 0.811)
Strong: 3+, 3-  (Entropy = 1.0)

Gain(S, Wind) = 0.94 - (8/14) * 0.811 - (6/14) * 1.0 = 0.048

Page 21: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Split on Outlook:
Sunny (Days 1, 2, 8, 9, 11): 2+, 3-  (Entropy = 0.970)
Overcast (Days 3, 7, 12, 13): 4+, 0-  (Entropy = 0.0)
Rain (Days 4, 5, 6, 10, 14): 3+, 2-  (Entropy = 0.970)

Gain(S, Outlook) = 0.94 - (5/14) * 0.970 - (4/14) * 0.0 - (5/14) * 0.970 = 0.246

Page 22: Machine Learning

Example

Pick Outlook as the root

[Play Tennis training data table, repeated from Page 18]

Outlook (branches: Sunny, Overcast, Rain)

Gain(S, Wind) = 0.048
Gain(S, Humidity) = 0.151
Gain(S, Temperature) = 0.029
Gain(S, Outlook) = 0.246
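All four gains can be reproduced from the Page 18 table; a self-contained sketch (the row tuples and helper names are illustrative):

```python
import math
from collections import Counter

# Play Tennis data from Page 18: (Outlook, Temperature, Humidity, Wind, PlayTennis)
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),
    ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),
    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temperature", "Humidity", "Wind"]

def entropy(labels):
    """Entropy of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, i):
    """Gain(S, A) for splitting the rows (label in last position) on column i."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for v in set(r[i] for r in rows):
        sub = [r[-1] for r in rows if r[i] == v]
        remainder += (len(sub) / len(rows)) * entropy(sub)
    return entropy(labels) - remainder

for i, name in enumerate(attributes):
    print(name, round(gain(rows, i), 3))
# Outlook has the largest gain, so it becomes the root.
```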

Page 23: Machine Learning

Example

Pick Outlook as the root

[Play Tennis training data table, repeated from Page 18]

Outlook
  Sunny (Days 1, 2, 8, 9, 11; 2+, 3-) -> ?
  Overcast (Days 3, 7, 12, 13; 4+, 0-) -> Yes
  Rain (Days 4, 5, 6, 10, 14; 3+, 2-) -> ?

Continue until every attribute is included in the path, or all examples in the leaf have the same label.
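This stopping rule completes the algorithm; a minimal recursive ID3-style sketch (a hypothetical illustration, not the lecture's code: rows are tuples whose last element is the label, and attributes are column indices):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(rows, a):
    """Information gain of splitting rows (label in last position) on column a."""
    labels = [r[-1] for r in rows]
    remainder = 0.0
    for v in set(r[a] for r in rows):
        sub = [r[-1] for r in rows if r[a] == v]
        remainder += (len(sub) / len(rows)) * entropy(sub)
    return entropy(labels) - remainder

def id3(rows, attrs):
    """Grow a tree of (attribute, {value: subtree}) nodes, with labels as leaves."""
    labels = [r[-1] for r in rows]
    if len(set(labels)) == 1:
        return labels[0]  # all examples share a label: pure leaf
    if not attrs:
        return Counter(labels).most_common(1)[0][0]  # attributes exhausted: majority leaf
    best = max(attrs, key=lambda a: gain(rows, a))
    rest = [a for a in attrs if a != best]
    return (best, {v: id3([r for r in rows if r[best] == v], rest)
                   for v in set(r[best] for r in rows)})
```

Run on the Play Tennis table, this should pick Outlook at the root, matching the hand computation.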

Page 24: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Outlook
  Sunny (Days 1, 2, 8, 9, 11; 2+, 3-) -> ?
  Overcast (Days 3, 7, 12, 13; 4+, 0-) -> Yes
  Rain -> ?

Gain(Ssunny, Humidity) = 0.97 - (3/5) * 0 - (2/5) * 0 = 0.97
Gain(Ssunny, Temperature) = 0.97 - (2/5) * 0 - (2/5) * 1 - (1/5) * 0 = 0.57
Gain(Ssunny, Wind) = 0.97 - (2/5) * 1 - (3/5) * 0.92 = 0.02

Day  Outlook  Temperature  Humidity  Wind    PlayTennis
1    Sunny    Hot          High      Weak    No
2    Sunny    Hot          High      Strong  No
8    Sunny    Mild         High      Weak    No
9    Sunny    Cool         Normal    Weak    Yes
11   Sunny    Mild         Normal    Strong  Yes
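The three Ssunny gains can be verified numerically; a sketch (`h` is an illustrative helper computing entropy from class counts, with branch counts read off the five Sunny rows):

```python
import math

def h(counts):
    """Entropy from a list of class counts, with empty classes contributing 0."""
    n = sum(counts)
    return -sum((c / n) * math.log2(c / n) for c in counts if c)

e_sunny = h([2, 3])  # Sunny subset: 2 Yes, 3 No
# Humidity: High -> 0 Yes, 3 No; Normal -> 2 Yes, 0 No
g_humidity = e_sunny - (3 / 5) * h([0, 3]) - (2 / 5) * h([2, 0])
# Temperature: Hot -> 0 Yes, 2 No; Mild -> 1 Yes, 1 No; Cool -> 1 Yes, 0 No
g_temp = e_sunny - (2 / 5) * h([0, 2]) - (2 / 5) * h([1, 1]) - (1 / 5) * h([1, 0])
# Wind: Weak -> 1 Yes, 2 No; Strong -> 1 Yes, 1 No
g_wind = e_sunny - (3 / 5) * h([1, 2]) - (2 / 5) * h([1, 1])
print(round(g_humidity, 2), round(g_temp, 2), round(g_wind, 2))  # 0.97 0.57 0.02
```

Humidity separates the Sunny subset perfectly, so it is the clear winner for this branch.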

Page 25: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Outlook
  Sunny -> Humidity (High -> No, Normal -> Yes)
  Overcast -> Yes
  Rain -> ?

Humidity has the largest gain under Sunny (Gain(Ssunny, Humidity) = 0.97), so it is chosen for that branch.

Page 26: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Outlook
  Sunny -> Humidity (High -> No, Normal -> Yes)
  Overcast -> Yes
  Rain (Days 4, 5, 6, 10, 14; 3+, 2-) -> ?

Gain(Srain, Humidity) =
Gain(Srain, Temperature) =
Gain(Srain, Wind) =

Page 27: Machine Learning

Example

[Play Tennis training data table, repeated from Page 18]

Outlook
  Sunny -> Humidity
    High -> No
    Normal -> Yes
  Overcast -> Yes
  Rain -> Wind
    Strong -> No
    Weak -> Yes

Page 28: Machine Learning

Tutorial/Exercise Questions


An experiment has produced the following 3-D feature vectors X = (x1, x2, x3), each belonging to one of two classes. Design a decision tree classifier and use it to classify the unknown feature vector X = (1, 2, 1).

x1  x2  x3  Class
1   1   1   1
1   1   1   2
1   1   1   1
2   1   1   2
2   1   2   1
2   2   2   2
2   2   2   1
2   2   1   2
1   2   2   2
1   1   2   1

Classify X = (1, 2, 1) = ?