Induction of Decision Trees


Page 1: DecisionTrees1.ppt

Induction of Decision Trees

Page 2: DecisionTrees1.ppt

Induction of Decision Trees

• Data Set (Learning Set)
  – Each example = Attributes + Class
• Induced description = Decision tree
• TDIDT – Top-Down Induction of Decision Trees
• Recursive partitioning (see the sketch below)
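As a concrete illustration of the recursive-partitioning loop, here is a minimal Python sketch. It is not the slides' own algorithm, just one plausible rendering: examples are assumed to be (attribute-dict, class-label) pairs, and best_attribute (defined with the information-gain helpers sketched at slides 5–7 and 14) picks the split.

    # Minimal TDIDT sketch: recursively partition the data set,
    # splitting on the most informative attribute at each node.
    def tdidt(examples, attributes):
        classes = {c for _, c in examples}
        if len(classes) == 1:                     # pure node -> leaf
            return classes.pop()
        if not attributes:                        # nothing left to split on -> majority leaf
            return max(classes, key=lambda c: sum(1 for _, k in examples if k == c))
        a = best_attribute(examples, attributes)  # highest information gain
        tree = {a: {}}
        for v in {ex[a] for ex, _ in examples}:   # one branch per observed value of a
            subset = [(ex, c) for ex, c in examples if ex[a] == v]
            tree[a][v] = tdidt(subset, [x for x in attributes if x != a])
        return tree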

Page 3: DecisionTrees1.ppt

Example

     #  Outlook   Temperature  Humidity  Windy  Play (class)
     1  sunny     hot          high      no     N
     2  sunny     hot          high      yes    N
     3  overcast  hot          high      no     P
     4  rainy     moderate     high      no     P
     5  rainy     cold         normal    no     P
     6  rainy     cold         normal    yes    N
     7  overcast  cold         normal    yes    P
     8  sunny     moderate     high      no     N
     9  sunny     cold         normal    no     P
    10  rainy     moderate     normal    no     P
    11  sunny     moderate     normal    yes    P
    12  overcast  moderate     high      yes    P
    13  overcast  hot          normal    no     P
    14  rainy     moderate     high      yes    N

Attributes: Outlook, Temperature, Humidity, Windy; Class: Play (P / N).

Page 4: DecisionTrees1.ppt

Simple Tree

    Outlook?
    ├─ sunny    → Humidity?
    │             ├─ high   → N
    │             └─ normal → P
    ├─ overcast → P
    └─ rainy    → Windy?
                  ├─ yes → N
                  └─ no  → P
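One common way to represent such a tree in code is with nested dictionaries: each inner node maps an attribute name to its branches, and each leaf is a class label. This is only a sketch of that representation, not code from the slides; the classify helper is reused for the final tree at the end of the deck.

    # The "Simple Tree" as nested dicts; leaves are class labels.
    SIMPLE_TREE = {"Outlook": {
        "sunny":    {"Humidity": {"high": "N", "normal": "P"}},
        "overcast": "P",
        "rainy":    {"Windy": {"yes": "N", "no": "P"}},
    }}

    def classify(tree, example):
        """Follow the branch matching the example's attribute value until a leaf."""
        while isinstance(tree, dict):
            attribute, branches = next(iter(tree.items()))
            tree = branches[example[attribute]]
        return tree

    example1 = {"Outlook": "sunny", "Temperature": "hot",
                "Humidity": "high", "Windy": "no"}
    print(classify(SIMPLE_TREE, example1))   # N, matching row 1 of the table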

Page 5: DecisionTrees1.ppt

Information-Theoretic Approach

• To classify an object, a certain amount of information is needed:
  – I, information
• After we have learned the value of attribute A, only some remaining amount of information is needed to classify the object:
  – Ires, residual information
• Gain:
  – Gain(A) = I – Ires(A)
• The most "informative" attribute is the one that minimizes Ires, i.e., maximizes the gain.

Page 6: DecisionTrees1.ppt

Entropy

• The average amount of information I needed to classify an object is given by the entropy measure; for class probabilities p(c),

  I = −Σ_c p(c) · log2 p(c)

• For a two-class problem:

  I = −p(c1) log2 p(c1) − (1 − p(c1)) log2 (1 − p(c1))

[Figure: plot of the two-class entropy against p(c1); it is 0 when p(c1) is 0 or 1 and peaks at 1 bit when p(c1) = 0.5.]
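In code, the entropy measure can be computed directly from class counts. A minimal Python helper (an assumption of this transcript, not from the slides; it is reused in the later sketches):

    import math

    def entropy(class_counts):
        """I = -sum_c p(c) * log2 p(c), in bits; empty classes contribute 0."""
        n = sum(class_counts)
        return -sum((k / n) * math.log2(k / n) for k in class_counts if k > 0)

    print(entropy([7, 7]))    # 1.0 bit: a maximally impure two-class set
    print(entropy([14, 0]))   # -0.0, i.e. 0 bits: a pure set needs no further information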

Page 7: DecisionTrees1.ppt

Residual Information

• After applying attribute A, the set S is partitioned into subsets according to the values v of A.
• Ires is the weighted sum of the amounts of information for the subsets:

  Ires(A) = −Σ_v p(v) Σ_c p(c|v) · log2 p(c|v)
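Continuing the sketch, residual information and gain follow directly from the entropy helper above; examples are again assumed to be (attribute-dict, class-label) pairs as in the TDIDT sketch.

    from collections import Counter

    def residual_info(examples, attribute):
        """Ires(A): subset entropies weighted by subset size."""
        n = len(examples)
        total = 0.0
        for v in {ex[attribute] for ex, _ in examples}:
            subset = [c for ex, c in examples if ex[attribute] == v]
            total += (len(subset) / n) * entropy(Counter(subset).values())
        return total

    def gain(examples, attribute):
        """Gain(A) = I - Ires(A)."""
        class_counts = Counter(c for _, c in examples).values()
        return entropy(class_counts) - residual_info(examples, attribute)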

Page 8: DecisionTrees1.ppt

Triangles and Squares

     #  Color   Outline  Dot  Shape (class)
     1  green   dashed   no   triangle
     2  green   dashed   yes  triangle
     3  yellow  dashed   no   square
     4  red     dashed   no   square
     5  red     solid    no   square
     6  red     solid    yes  triangle
     7  green   solid    no   square
     8  green   dashed   no   triangle
     9  yellow  solid    yes  square
    10  red     solid    no   square
    11  green   solid    yes  square
    12  yellow  dashed   yes  square
    13  yellow  solid    no   square
    14  red     dashed   yes  triangle

Attributes: Color, Outline, Dot; Class: Shape (triangle / square).

Page 9: DecisionTrees1.ppt

Triangles and Squares

[Figure: the 14 classified objects drawn as triangles and squares, next to the same table as on the previous slide.]

Data Set: a set of classified objects.

Page 10: DecisionTrees1.ppt

Entropy

• 5 triangles
• 9 squares
• class probabilities: p(triangle) = 5/14, p(square) = 9/14
• entropy: I = −(5/14) log2 (5/14) − (9/14) log2 (9/14) ≈ 0.940 bits
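The same number falls out of the entropy helper from the earlier sketch:

    print(round(entropy([5, 9]), 3))   # 0.94 bits for 5 triangles vs. 9 squares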

Page 11: DecisionTrees1.ppt

Entropy Reduction by Data Set Partitioning

[Figure: the 14 objects partitioned on Color? into three subsets: red (5 objects), yellow (4 objects), green (5 objects).]

Page 12: DecisionTrees1.ppt

[Figure: the same Color? partition, now annotated with the subset entropies that follow from the table: red (2 triangles, 3 squares) I = 0.971 bits; yellow (4 squares) I = 0 bits; green (3 triangles, 2 squares) I = 0.971 bits.]

Page 13: DecisionTrees1.ppt

Information Gain

[Figure: the Color? partition annotated with the gain computation:]

Ires(Color) = (5/14) · 0.971 + (4/14) · 0 + (5/14) · 0.971 ≈ 0.694 bits
Gain(Color) = I – Ires(Color) = 0.940 – 0.694 = 0.246 bits

Page 14: DecisionTrees1.ppt

Information Gain of the Attribute

• Attributes:
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen.
• This heuristic is local (it minimizes impurity locally, at the current node only); see the sketch below.
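Putting the pieces together, the selection step is an argmax over the gain function sketched earlier. The data set encoding below is taken from the slide 8 table; everything else is assumed glue code, not the slides' own implementation.

    # The triangles-and-squares data set from slide 8.
    ROWS = [
        ("green",  "dashed", "no",  "triangle"),
        ("green",  "dashed", "yes", "triangle"),
        ("yellow", "dashed", "no",  "square"),
        ("red",    "dashed", "no",  "square"),
        ("red",    "solid",  "no",  "square"),
        ("red",    "solid",  "yes", "triangle"),
        ("green",  "solid",  "no",  "square"),
        ("green",  "dashed", "no",  "triangle"),
        ("yellow", "solid",  "yes", "square"),
        ("red",    "solid",  "no",  "square"),
        ("green",  "solid",  "yes", "square"),
        ("yellow", "dashed", "yes", "square"),
        ("yellow", "solid",  "no",  "square"),
        ("red",    "dashed", "yes", "triangle"),
    ]
    EXAMPLES = [({"Color": c, "Outline": o, "Dot": d}, shape)
                for c, o, d, shape in ROWS]

    def best_attribute(examples, attributes):
        """The local heuristic: pick the attribute with the highest gain."""
        return max(attributes, key=lambda a: gain(examples, a))

    for a in ("Color", "Outline", "Dot"):
        print(a, round(gain(EXAMPLES, a), 3))
    # Color 0.247, Outline 0.152, Dot 0.048 -- the slides truncate
    # the first two to 0.246 and 0.151.
    print(best_attribute(EXAMPLES, ["Color", "Outline", "Dot"]))  # Color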

Page 15: DecisionTrees1.ppt

[Figure: the tree so far: Color? with branches red, yellow, green; the green subset (3 triangles, 2 squares, I = 0.971 bits) is examined next.]

Gain(Outline) = 0.971 – 0 = 0.971 bits
Gain(Dot) = 0.971 – 0.951 = 0.020 bits

Page 16: DecisionTrees1.ppt

[Figure: the tree after adding Outline? (dashed / solid) under the green branch; the red subset (2 triangles, 3 squares, I = 0.971 bits) is examined next.]

Gain(Outline) = 0.971 – 0.951 = 0.020 bits
Gain(Dot) = 0.971 – 0 = 0.971 bits
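These per-branch gains can be checked with the gain helper and the EXAMPLES encoding assumed in the earlier sketches:

    green = [(ex, c) for ex, c in EXAMPLES if ex["Color"] == "green"]
    red   = [(ex, c) for ex, c in EXAMPLES if ex["Color"] == "red"]
    print(round(gain(green, "Outline"), 3), round(gain(green, "Dot"), 3))  # 0.971 0.02
    print(round(gain(red,   "Outline"), 3), round(gain(red,   "Dot"), 3))  # 0.02  0.971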

Page 17: DecisionTrees1.ppt

[Figure: the tree after adding Dot? (yes / no) under the red branch. With Color?, Outline?, and Dot? in place, every leaf is now pure.]

Page 18: DecisionTrees1.ppt

Decision Tree

    Color?
    ├─ red    → Dot?
    │           ├─ yes → triangle
    │           └─ no  → square
    ├─ yellow → square
    └─ green  → Outline?
                ├─ dashed → triangle
                └─ solid  → square
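Encoded with the nested-dict convention from the Simple Tree sketch, the induced tree can be checked against the full data set; classify and EXAMPLES are the helpers assumed earlier.

    TREE = {"Color": {
        "red":    {"Dot": {"yes": "triangle", "no": "square"}},
        "yellow": "square",
        "green":  {"Outline": {"dashed": "triangle", "solid": "square"}},
    }}

    # The induced tree reproduces the class of all 14 training examples.
    assert all(classify(TREE, ex) == c for ex, c in EXAMPLES)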