DecisionTrees1.ppt
TRANSCRIPT
Page 1
Induction of Decision Trees
Page 2

Induction of Decision Trees

• Data Set (Learning Set)
  – Each example = Attributes + Class
• Induced description = Decision tree
• TDIDT
  – Top-Down Induction of Decision Trees
• Recursive Partitioning (a minimal sketch follows below)
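A minimal sketch of that recursive-partitioning loop, assuming each example is an (attribute-dict, class-label) pair; `choose_best_attribute` is a placeholder for the selection heuristic developed on the following slides:

```python
from collections import Counter

def tdidt(examples, attributes):
    """Top-Down Induction of Decision Trees by recursive partitioning.

    examples   -- list of (attribute_values: dict, class_label) pairs
    attributes -- attribute names still available for splitting
    """
    classes = [c for _, c in examples]
    if len(set(classes)) == 1:             # pure partition -> leaf
        return classes[0]
    if not attributes:                      # nothing left to split on -> majority leaf
        return Counter(classes).most_common(1)[0][0]

    best = choose_best_attribute(examples, attributes)  # placeholder heuristic (see later slides)
    branches = {}
    for value in {av[best] for av, _ in examples}:      # one branch per value of the attribute
        subset = [(av, c) for av, c in examples if av[best] == value]
        branches[value] = tdidt(subset, [a for a in attributes if a != best])
    return (best, branches)
```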
Page 3

Example

| #  | Outlook  | Temperature | Humidity | Windy | Play (Class) |
|----|----------|-------------|----------|-------|--------------|
| 1  | sunny    | hot         | high     | no    | N            |
| 2  | sunny    | hot         | high     | yes   | N            |
| 3  | overcast | hot         | high     | no    | P            |
| 4  | rainy    | moderate    | high     | no    | P            |
| 5  | rainy    | cold        | normal   | no    | P            |
| 6  | rainy    | cold        | normal   | yes   | N            |
| 7  | overcast | cold        | normal   | yes   | P            |
| 8  | sunny    | moderate    | high     | no    | N            |
| 9  | sunny    | cold        | normal   | no    | P            |
| 10 | rainy    | moderate    | normal   | no    | P            |
| 11 | sunny    | moderate    | normal   | yes   | P            |
| 12 | overcast | moderate    | high     | yes   | P            |
| 13 | overcast | hot         | normal   | no    | P            |
| 14 | rainy    | moderate    | high     | yes   | N            |
Page 4

Simple Tree

Outlook
  sunny → Humidity
    high → N
    normal → P
  overcast → P
  rainy → Windy
    yes → N
    no → P
Page 5
Information-Theoretic Approach
• To classify an object, a certain amount of information is needed
  – I, information
• After we have learned the value of attribute A, only some remaining amount of information is needed to classify the object
  – Ires, residual information
• Gain
  – Gain(A) = I – Ires(A)
• The most ‘informative’ attribute is the one that minimizes Ires, i.e., maximizes the Gain
Page 6

Entropy

• The average amount of information I needed to classify an object is given by the entropy measure:

$$I = -\sum_{c} p(c)\,\log_2 p(c)$$

• For a two-class problem, entropy as a function of p(c1):

[Figure: entropy curve over p(c1), equal to 0 at p(c1) = 0 or 1 and peaking at 1 bit when p(c1) = 0.5]
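A minimal sketch of the entropy measure in Python (the counts-based interface is my choice):

```python
from math import log2

def entropy(class_counts):
    """I = -sum_c p(c) * log2 p(c), in bits, from per-class counts."""
    total = sum(class_counts)
    probs = [n / total for n in class_counts if n > 0]  # treat 0 * log2(0) as 0
    return -sum(p * log2(p) for p in probs)
```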
Page 7

Residual Information

• After applying attribute A, S is partitioned into subsets according to the values v of A
• Ires is equal to the weighted sum of the amounts of information for the subsets:

$$I_{\mathrm{res}}(A) = -\sum_{v} p(v) \sum_{c} p(c \mid v)\,\log_2 p(c \mid v)$$
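Continuing the sketch, Ires and the gain follow these definitions directly (reusing `entropy` from the previous page and the example representation from the page 2 sketch):

```python
from collections import Counter

def i_res(examples, attribute):
    """Ires(A): entropy of each subset S_v, weighted by p(v) = |S_v| / |S|."""
    total = len(examples)
    res = 0.0
    for value in {av[attribute] for av, _ in examples}:
        labels = [c for av, c in examples if av[attribute] == value]
        res += len(labels) / total * entropy(Counter(labels).values())
    return res

def gain(examples, attribute):
    """Gain(A) = I - Ires(A)."""
    return entropy(Counter(c for _, c in examples).values()) - i_res(examples, attribute)
```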
Page 8

Triangles and Squares

| #  | Color  | Outline | Dot | Shape (Class) |
|----|--------|---------|-----|---------------|
| 1  | green  | dashed  | no  | triangle      |
| 2  | green  | dashed  | yes | triangle      |
| 3  | yellow | dashed  | no  | square        |
| 4  | red    | dashed  | no  | square        |
| 5  | red    | solid   | no  | square        |
| 6  | red    | solid   | yes | triangle      |
| 7  | green  | solid   | no  | square        |
| 8  | green  | dashed  | no  | triangle      |
| 9  | yellow | solid   | yes | square        |
| 10 | red    | solid   | no  | square        |
| 11 | green  | solid   | yes | square        |
| 12 | yellow | dashed  | yes | square        |
| 13 | yellow | solid   | no  | square        |
| 14 | red    | dashed  | yes | triangle      |
Page 9

Triangles and Squares

Data Set: a set of classified objects
[Figure: the 14 objects drawn as triangles and squares, alongside the attribute table from the previous page]
Page 10

Entropy

• 5 triangles, 9 squares
• class probabilities: p(triangle) = 5/14, p(square) = 9/14
• entropy:

$$I = -\tfrac{5}{14}\log_2\tfrac{5}{14} - \tfrac{9}{14}\log_2\tfrac{9}{14} \approx 0.940\ \text{bits}$$
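The `entropy` sketch from page 6 reproduces this value:

```python
>>> round(entropy([5, 9]), 3)   # 5 triangles, 9 squares
0.94
```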
Page 11

Entropy reduction by data set partitioning

[Figure: the objects partitioned by Color? into red, yellow, and green subsets]
Page 12

[Figure: the red, yellow, and green subsets of the Color? split, continued]
Page 13

Information Gain

[Figure: the Color? split; Gain(Color) = 0.940 – 0.694 ≈ 0.246 bits]
Page 14

Information Gain of the Attribute

• Attributes
  – Gain(Color) = 0.246
  – Gain(Outline) = 0.151
  – Gain(Dot) = 0.048
• Heuristic: the attribute with the highest gain is chosen (the sketch below recomputes these values)
• This heuristic is local (local minimization of impurity)
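Recomputing these numbers with the `gain` sketch from page 7 over the triangles-and-squares table (the `data` encoding and attribute spellings are my assumption; the slide's 0.246 and 0.151 round the intermediate entropies first, while exact arithmetic gives 0.247 and 0.152):

```python
# The 14 examples from the triangles-and-squares table
data = [
    ({'Color': 'green',  'Outline': 'dashed', 'Dot': 'no'},  'triangle'),
    ({'Color': 'green',  'Outline': 'dashed', 'Dot': 'yes'}, 'triangle'),
    ({'Color': 'yellow', 'Outline': 'dashed', 'Dot': 'no'},  'square'),
    ({'Color': 'red',    'Outline': 'dashed', 'Dot': 'no'},  'square'),
    ({'Color': 'red',    'Outline': 'solid',  'Dot': 'no'},  'square'),
    ({'Color': 'red',    'Outline': 'solid',  'Dot': 'yes'}, 'triangle'),
    ({'Color': 'green',  'Outline': 'solid',  'Dot': 'no'},  'square'),
    ({'Color': 'green',  'Outline': 'dashed', 'Dot': 'no'},  'triangle'),
    ({'Color': 'yellow', 'Outline': 'solid',  'Dot': 'yes'}, 'square'),
    ({'Color': 'red',    'Outline': 'solid',  'Dot': 'no'},  'square'),
    ({'Color': 'green',  'Outline': 'solid',  'Dot': 'yes'}, 'square'),
    ({'Color': 'yellow', 'Outline': 'dashed', 'Dot': 'yes'}, 'square'),
    ({'Color': 'yellow', 'Outline': 'solid',  'Dot': 'no'},  'square'),
    ({'Color': 'red',    'Outline': 'dashed', 'Dot': 'yes'}, 'triangle'),
]

for a in ('Color', 'Outline', 'Dot'):
    print(a, round(gain(data, a), 3))
# Color 0.247
# Outline 0.152
# Dot 0.048
```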
Page 15

[Figure: the green subset of the Color? split]

Gains within the green subset:
Gain(Outline) = 0.971 – 0 = 0.971 bits
Gain(Dot) = 0.971 – 0.951 = 0.020 bits
Page 16

[Figure: Outline? (dashed / solid) added under the green branch; the red subset remains]

Gains within the red subset (recomputed in the sketch below):
Gain(Outline) = 0.971 – 0.951 = 0.020 bits
Gain(Dot) = 0.971 – 0 = 0.971 bits
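The same `gain` sketch, restricted to the red subset (using the `data` list from the page 14 example), reproduces both numbers:

```python
red = [(av, c) for av, c in data if av['Color'] == 'red']
print(round(gain(red, 'Outline'), 3))   # 0.02
print(round(gain(red, 'Dot'), 3))       # 0.971
```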
Page 17

[Figure: the tree so far: Color? at the root; Outline? (dashed / solid) under green; Dot? (no / yes) under red]
Page 18

Decision Tree

Color
  red → Dot
    yes → triangle
    no → square
  yellow → square
  green → Outline
    dashed → triangle
    solid → square
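Read as rules, the finished tree is straightforward to apply; a minimal sketch (function name mine), checked against the `data` list from the earlier example:

```python
def classify(example):
    """Walk the final tree: Color first, then Dot (red branch) or Outline (green branch)."""
    if example['Color'] == 'yellow':
        return 'square'
    if example['Color'] == 'red':
        return 'triangle' if example['Dot'] == 'yes' else 'square'
    # green branch
    return 'triangle' if example['Outline'] == 'dashed' else 'square'

# The tree reproduces the class of every training example:
assert all(classify(av) == c for av, c in data)
```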