
Page 1

Decision Trees

Page 2

Outline
- What is a decision tree?
- How to construct a decision tree?
  - What are the major steps in decision tree induction?
  - How to select the attribute to split the node?
- What are the other issues?

Page 3

Classification by Decision Tree Induction

Decision tree:
- A flow-chart-like tree structure
- An internal node denotes a test on an attribute
- A branch represents an outcome of the test
- Leaf nodes represent class labels or a class distribution

Age?
+- <=30  -> Student?
|          +- no  -> NO
|          +- yes -> YES
+- 31…40 -> YES
+- >40   -> Credit?
           +- excellent -> NO
           +- fair      -> YES

Page 4

Training Dataset

      Age    Income  Student  Credit     Buys_computer
P1    <=30   high    no       fair       no
P2    <=30   high    no       excellent  no
P3    31…40  high    no       fair       yes
P4    >40    medium  no       fair       yes
P5    >40    low     yes      fair       yes
P6    >40    low     yes      excellent  no
P7    31…40  low     yes      excellent  yes
P8    <=30   medium  no       fair       no
P9    <=30   low     yes      fair       yes
P10   >40    medium  yes      fair       yes
P11   <=30   medium  yes      excellent  yes
P12   31…40  medium  no       excellent  yes
P13   31…40  high    yes      fair       yes
P14   >40    medium  no       excellent  no

Page 5

Output: A Decision Tree for "buys_computer"

Age?
+- <=30  -> Student?
|          +- no  -> NO
|          +- yes -> YES
+- 31…40 -> YES
+- >40   -> Credit?
           +- excellent -> NO
           +- fair      -> YES
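As a quick aside (not in the original slides), this tree is small enough to write out directly as nested conditionals. A minimal Python sketch, with attribute values given as the strings used in the table on Page 4:

    def classify(age, student, credit):
        """Predict buys_computer ('yes'/'no') using the tree above."""
        if age == "<=30":
            return "yes" if student == "yes" else "no"
        if age == "31…40":
            return "yes"
        # age == ">40"
        return "no" if credit == "excellent" else "yes"

    # P9 (<=30, student, fair) -> "yes"; P14 (>40, not a student, excellent) -> "no"
    print(classify("<=30", "yes", "fair"), classify(">40", "no", "excellent"))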

Page 6

Outline
- What is a decision tree?
- How to construct a decision tree?
  - What are the major steps in decision tree induction?
  - How to select the attribute to split the node?
- What are the other issues?

Page 7

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm):
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all training examples are at the root
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Examples are partitioned recursively based on the selected attributes
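A minimal Python sketch of this basic algorithm (names are illustrative, not from the slides; tuples are assumed to be dicts of categorical attribute values, and the selection measure is the information gain defined on Pages 19-32):

    import math
    from collections import Counter

    def info(labels):
        """I(S): information needed to classify a tuple drawn from `labels`."""
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def gain(rows, labels, attr):
        """Gain(A) = I(S) - E(A), where E(A) is the weighted info after splitting."""
        n = len(labels)
        e_a = 0.0
        for v in set(r[attr] for r in rows):
            block = [lab for r, lab in zip(rows, labels) if r[attr] == v]
            e_a += len(block) / n * info(block)
        return info(labels) - e_a

    def build_tree(rows, labels, attrs):
        """Greedy top-down, recursive, divide-and-conquer induction."""
        if len(set(labels)) == 1:                 # pure node: make a leaf
            return labels[0]
        if not attrs:                             # nothing left to test: majority leaf
            return Counter(labels).most_common(1)[0][0]
        best = max(attrs, key=lambda a: gain(rows, labels, a))
        node = {}
        for v in set(r[best] for r in rows):      # one branch per attribute value
            sub = [(r, lab) for r, lab in zip(rows, labels) if r[best] == v]
            node[(best, v)] = build_tree([r for r, _ in sub],
                                         [lab for _, lab in sub],
                                         [a for a in attrs if a != best])
        return node

On the Page 4 dataset (with attrs = ["Age", "Income", "Student", "Credit"]), the root attribute this selects is Age, matching the hand construction traced on Pages 9-16.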

Page 8

Training Dataset

      Age    Income  Student  Credit     Buys_computer
P1    <=30   high    no       fair       no
P2    <=30   high    no       excellent  no
P3    31…40  high    no       fair       yes
P4    >40    medium  no       fair       yes
P5    >40    low     yes      fair       yes
P6    >40    low     yes      excellent  no
P7    31…40  low     yes      excellent  yes
P8    <=30   medium  no       fair       no
P9    <=30   low     yes      fair       yes
P10   >40    medium  yes      fair       yes
P11   <=30   medium  yes      excellent  yes
P12   31…40  medium  no       excellent  yes
P13   31…40  high    yes      fair       yes
P14   >40    medium  no       excellent  no

Page 9

Construction of a Decision Tree for "buys_computer"

?   [P1,…,P14]  Yes: 9, No: 5

(All 14 training examples start at the root; the test attribute is not yet chosen.)

Page 10

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm):
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all training examples are at the root
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Examples are partitioned recursively based on the selected attributes

Page 11

Construction of a Decision Tree for "buys_computer"

Age?  [P1,…,P14]  Yes: 9, No: 5
+- <=30
+- 31…40
+- >40

Page 12

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm):
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all training examples are at the root
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Examples are partitioned recursively based on the selected attributes

Page 13

Training Dataset

      Age    Income  Student  Credit     Buys_computer
P1    <=30   high    no       fair       no
P2    <=30   high    no       excellent  no
P3    31…40  high    no       fair       yes
P4    >40    medium  no       fair       yes
P5    >40    low     yes      fair       yes
P6    >40    low     yes      excellent  no
P7    31…40  low     yes      excellent  yes
P8    <=30   medium  no       fair       no
P9    <=30   low     yes      fair       yes
P10   >40    medium  yes      fair       yes
P11   <=30   medium  yes      excellent  yes
P12   31…40  medium  no       excellent  yes
P13   31…40  high    yes      fair       yes
P14   >40    medium  no       excellent  no

Page 14

Construction of a Decision Tree for "buys_computer"

Age?  [P1,…,P14]  Yes: 9, No: 5
+- <=30  -> [P1,P2,P8,P9,P11]   Yes: 2, No: 3  -> ?
+- 31…40 -> [P3,P7,P12,P13]     Yes: 4, No: 0  -> YES
+- >40   -> [P4,P5,P6,P10,P14]  Yes: 3, No: 2  -> ?

Page 15

Construction of a Decision Tree for "buys_computer"

Age?  [P1,…,P14]  Yes: 9, No: 5
+- <=30  -> Student?  [P1,P2,P8,P9,P11]  Yes: 2, No: 3
|          +- no  -> [P1,P2,P8]  Yes: 0, No: 3  -> NO
|          +- yes -> [P9,P11]    Yes: 2, No: 0  -> YES
+- 31…40 -> [P3,P7,P12,P13]     Yes: 4, No: 0  -> YES
+- >40   -> [P4,P5,P6,P10,P14]  Yes: 3, No: 2  -> ?

Page 16

Construction of a Decision Tree for "buys_computer"

Age?  [P1,…,P14]  Yes: 9, No: 5
+- <=30  -> Student?  [P1,P2,P8,P9,P11]  Yes: 2, No: 3
|          +- no  -> [P1,P2,P8]  Yes: 0, No: 3  -> NO
|          +- yes -> [P9,P11]    Yes: 2, No: 0  -> YES
+- 31…40 -> [P3,P7,P12,P13]     Yes: 4, No: 0  -> YES
+- >40   -> Credit?  [P4,P5,P6,P10,P14]  Yes: 3, No: 2
           +- excellent -> [P6,P14]     Yes: 0, No: 2  -> NO
           +- fair      -> [P4,P5,P10]  Yes: 3, No: 0  -> YES

Page 17

Algorithm for Decision Tree Induction

Basic algorithm (a greedy algorithm):
- Attributes are categorical (continuous-valued attributes are discretized in advance)
- The tree is constructed in a top-down, recursive, divide-and-conquer manner
- At the start, all training examples are at the root
- Test attributes are selected on the basis of a heuristic or statistical measure (e.g., information gain)
- Examples are partitioned recursively based on the selected attributes

Page 18

Outline
- What is a decision tree?
- How to construct a decision tree?
  - What are the major steps in decision tree induction?
  - How to select the attribute to split the node?
- What are the other issues?

Page 19

Which Attribute is the Best?

The attribute most useful for classifying examples.

Information gain:
- An information-theoretic approach
- Measures how well an attribute separates the training examples
- Use the attribute with the highest information gain to split
- Minimize the expected number of tests needed to classify a new tuple

How useful? How well separated? How pure is the splitting result? Information gain answers all three questions.

Page 20

Choosing an attribute

Idea: a good attribute splits the examples into subsets that are (ideally) "all positive" or "all negative".

(Figure: the restaurant examples split by Patrons? and by Type?, as on Page 31.) Patrons? is the better choice.

Page 21

Information theory

If there are n equally probable possible messages, then the probability p of each is 1/n.

The information conveyed by a message is -log2(p) = log2(n).

E.g., if there are 16 messages, then log2(16) = 4, and we need 4 bits to identify/send each message.

In general, if we are given a probability distribution P = (p1, p2, …, pn), then the information conveyed by the distribution (a.k.a. the entropy of P) is:

I(P) = -(p1*log2(p1) + p2*log2(p2) + … + pn*log2(pn))

Page 22

Information theory II

Information conveyed by a distribution (a.k.a. the entropy of P):

I(P) = -(p1*log2(p1) + p2*log2(p2) + … + pn*log2(pn))

Examples:
- If P is (0.5, 0.5), then I(P) is 1
- If P is (0.67, 0.33), then I(P) is 0.92
- If P is (1, 0), then I(P) is 0

The more uniform the probability distribution, the greater its information: more information is conveyed by a message telling you which event actually occurred.

Entropy is the average number of bits per message needed to represent a stream of messages.
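These three values can be checked directly; a minimal sketch, skipping zero probabilities per the convention 0*log2(0) = 0:

    import math

    def I(P):
        """Entropy of a probability distribution P."""
        return sum(-p * math.log2(p) for p in P if p > 0)

    print(round(I((0.5, 0.5)), 2))   # 1.0
    print(round(I((2/3, 1/3)), 2))   # 0.92
    print(round(I((1, 0)), 2))       # 0.0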

Page 23

Information for classification

If a set S of records is partitioned into disjoint exhaustive classes (C1, C2, …, Ck) on the basis of the value of the class attribute, then the information needed to identify the class of an element of S is Info(S) = I(P), where P is the probability distribution of the partition (C1, C2, …, Ck):

P = (|C1|/|S|, |C2|/|S|, …, |Ck|/|S|)

(Figure: two partitions of S into C1, C2, C3 — a balanced partition has high information; a heavily skewed one has low information.)

Page 24

Information, Entropy, and Information Gain

S contains si tuples of class Ci for i = {1, …, m}. Information measures "the amount of info" required to classify any arbitrary tuple:

I(s1, s2, …, sm) = - Σ_{i=1..m} pi*log2(pi)

where pi = si/|S| is the probability that an arbitrary tuple belongs to Ci.

Example: S contains 100 tuples; 25 belong to class C1 and 75 belong to class C2:

I(25, 75) = -(25/100)*log2(25/100) - (75/100)*log2(75/100) = 0.811
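The same calculation from raw class counts; a small sketch (the helper name is illustrative):

    import math

    def info(counts):
        """I(s1, ..., sm) computed from class counts."""
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    print(round(info([25, 75]), 3))   # 0.811

The Page 25 examples come out the same way: info([0, 100]) gives 0 and info([50, 50]) gives 1.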

Page 25

Information, Entropy, and Information Gain

Information reflects the "purity" of the data set:
- A low information value indicates high purity
- A high information value indicates high diversity

Example: S contains 100 tuples.
- 0 belong to class C1 and 100 belong to class C2:
  I(0, 100) = -(0/100)*log2(0/100) - (100/100)*log2(100/100) = 0   (taking 0*log2(0) = 0)
- 50 belong to class C1 and 50 belong to class C2:
  I(50, 50) = -(50/100)*log2(50/100) - (50/100)*log2(50/100) = 1

Page 26

Information for classification II

If we partition S w.r.t. attribute X into sets {T1, T2, …, Tn}, then the information needed to identify the class of an element of S becomes the weighted average of the information needed to identify the class of an element of Ti, i.e., the weighted average of Info(Ti):

Info(X, S) = Σ_{i=1..n} (|Ti|/|S|) * Info(Ti)

(Figure: two partitions into C1, C2, C3 — one with high information, one with low information.)
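A sketch of this weighted average (illustrative; each block Ti is passed as its list of class labels):

    import math
    from collections import Counter

    def info(labels):
        """Info(Ti): information of one block of the partition."""
        n = len(labels)
        return sum(-(c / n) * math.log2(c / n) for c in Counter(labels).values())

    def info_x(blocks):
        """Info(X, S): weighted average of Info(Ti) over the partition blocks."""
        total = sum(len(t) for t in blocks)
        return sum(len(t) / total * info(t) for t in blocks)

    # Splitting the Page 4 dataset on Age gives blocks with class counts
    # (2 yes, 3 no), (4 yes, 0 no), and (3 yes, 2 no):
    blocks = [["yes"] * 2 + ["no"] * 3, ["yes"] * 4, ["yes"] * 3 + ["no"] * 2]
    print(round(info_x(blocks), 3))   # 0.694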

Page 27

Information gain

Consider the quantity Gain(X, S), defined as

Gain(X, S) = Info(S) - Info(X, S)

This represents the difference between the information needed to identify an element of S and the information needed to identify an element of S after the value of attribute X has been obtained. That is, it is the gain in information due to attribute X.

We can use this to rank attributes and to build decision trees where each node holds the attribute with the greatest gain among the attributes not yet considered in the path from the root.

The intent of this ordering is:
- To create small decision trees, so that records can be identified after only a few questions
- To match a hoped-for minimality of the process represented by the records being considered (Occam's Razor)

Page 28

Information, Entropy, and Information Gain

S contains si tuples of class Ci for i = {1, …, m}. Attribute A has values {a1, a2, …, av}. Let sij be the number of tuples which belong to class Ci and have value aj in attribute A.

The entropy of attribute A is

E(A) = Σ_{j=1..v} [(s1j + … + smj) / |S|] * I(s1j, …, smj)

The information gained by branching on attribute A is

Gain(A) = I(s1, s2, …, sm) - E(A)

Page 29

Information, Entropy, and Information Gain

Let Tj be the set of tuples having value aj in attribute A. Then s1j + … + smj = |Tj| and I(s1j, …, smj) = I(Tj).

The entropy of attribute A is

E(A) = Σ_{j=1..v} [(s1j + … + smj) / |S|] * I(s1j, …, smj) = Σ_{j=1..v} (|Tj| / |S|) * I(Tj)

i.e., each term weights the information of Tj by the proportion of |Tj| over |S|.

Page 30

Information, Entropy, and Information Gain

S contains 100 tuples: 40 belong to class C1 (red) and 60 belong to class C2 (blue). Attribute A splits S into A=a1 (20 tuples: 10 C1, 10 C2), A=a2 (30 tuples: 10 C1, 20 C2), and A=a3 (50 tuples: 20 C1, 30 C2).

I(40, 60) = 0.971
I(10, 10) = 1
I(10, 20) = 0.918
I(20, 30) = 0.971

E(A) = (10+10)/100 * 1 + (10+20)/100 * 0.918 + (20+30)/100 * 0.971 = 0.961

Gain(A) = I(40, 60) - E(A) = 0.971 - 0.961 = 0.01
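The same arithmetic in a few lines of Python (a sketch; the per-value class counts are read off the example above):

    import math

    def info(counts):
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    splits = [(10, 10), (10, 20), (20, 30)]   # (C1, C2) counts for A = a1, a2, a3
    e_a = sum((c1 + c2) / 100 * info([c1, c2]) for c1, c2 in splits)
    print(round(e_a, 3))                      # 0.961
    print(round(info([40, 60]) - e_a, 2))     # Gain(A) = 0.01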

Page 31

Computing information gain

(Figure: the 12 restaurant examples, 6 positive and 6 negative, grouped by Patrons? — None: 2 negative; Some: 4 positive; Full: 2 positive and 4 negative — and by Type? — French, Italian, Thai, and Burger are each an even mix of positive and negative.)

I(S) = -(1/2*log2(1/2) + 1/2*log2(1/2)) = 1/2 + 1/2 = 1

I(Pat, S) = 1/6*(0) + 1/3*(0) + 1/2*(-(2/3*log2(2/3) + 1/3*log2(1/3)))
          ≈ 1/2*(2/3*0.6 + 1/3*1.6) ≈ 0.47

I(Type, S) = 1/6*(1) + 1/6*(1) + 1/3*(1) + 1/3*(1) = 1

Gain(Pat, S) = 1 - 0.47 = 0.53
Gain(Type, S) = 1 - 1 = 0
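The same computation as a sketch, with the per-value (positive, negative) counts read off the figure. Note the slides round -log2(2/3) to 0.6 and -log2(1/3) to 1.6; exact arithmetic gives Gain(Pat, S) ≈ 0.54 rather than 0.53:

    import math

    def info(counts):
        total = sum(counts)
        return sum(-(c / total) * math.log2(c / total) for c in counts if c > 0)

    def remainder(splits, n=12):
        """Weighted information after the split, as on the previous pages."""
        return sum((p + q) / n * info([p, q]) for p, q in splits)

    patrons = [(0, 2), (4, 0), (2, 4)]           # None, Some, Full
    types = [(1, 1), (1, 1), (2, 2), (2, 2)]     # French, Italian, Thai, Burger

    print(round(info([6, 6]) - remainder(patrons), 2))   # 0.54
    print(round(info([6, 6]) - remainder(types), 2))     # 0.0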

Page 32

Regarding the Definition of Entropy…

On textbook page 134 (Eq. 3.6):

Entropy(S) = - Σ_{i=1..m} pi*log2(pi)

On textbook page 287 (Eq. 7.2):

E(A) = Σ_{j=1..v} [(s1j + … + smj) / |S|] * I(s1j, …, smj)

Polymorphism: when entropy is defined on tuples, use Eq. 3.6; when entropy is defined on an attribute, use Eq. 7.2.

Page 33

How well does it work?

Many case studies have shown that decision trees are at least as accurate as human experts.
- A study on diagnosing breast cancer had humans correctly classifying the examples 65% of the time; the decision tree classified 72% correctly.
- British Petroleum designed a decision tree for gas-oil separation on offshore oil platforms that replaced an earlier rule-based expert system.
- Cessna designed an airplane flight controller using 90,000 examples and 20 attributes per example.

Page 34

Outline
- What is a decision tree?
- How to construct a decision tree?
  - What are the major steps in decision tree induction?
  - How to select the attribute to split the node?
- What are the other issues?

Page 35

Extracting Classification Rules from Trees

- Represent knowledge in the form of IF-THEN rules
- One rule is created for each path from the root to a leaf
- Each attribute-value pair along a path forms a conjunction
- The leaf node holds the class prediction
- Rules are easier for humans to understand

Page 36

Examples of Classification Rules

Age?
+- <=30  -> Student?
|          +- no  -> NO
|          +- yes -> YES
+- 31…40 -> YES
+- >40   -> Credit?
           +- excellent -> NO
           +- fair      -> YES

Classification rules:
1. IF age = "<=30" AND student = "no" THEN buys_computer = "no"
2. IF age = "<=30" AND student = "yes" THEN buys_computer = "yes"
3. IF age = "31…40" THEN buys_computer = "yes"
4. IF age = ">40" AND credit_rating = "excellent" THEN buys_computer = "no"
5. IF age = ">40" AND credit_rating = "fair" THEN buys_computer = "yes"
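One rule per root-to-leaf path can be read off mechanically. A sketch using the nested-dict tree produced by the induction sketch on Page 7 (names illustrative):

    def extract_rules(tree, path=(), target="buys_computer"):
        """Yield one IF-THEN rule per root-to-leaf path of the tree."""
        if not isinstance(tree, dict):                 # leaf: class prediction
            conds = " AND ".join(f'{attr} = "{val}"' for attr, val in path)
            yield f'IF {conds} THEN {target} = "{tree}"'
            return
        for (attr, val), subtree in tree.items():      # one conjunct per test
            yield from extract_rules(subtree, path + ((attr, val),), target)

Each attribute-value test along the path becomes one conjunct, exactly as described on Page 35.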

Page 37

Avoid Over-fitting in Classification

The generated tree may over-fit the training data:
- Too many branches, some of which may reflect anomalies due to noise or outliers
- The result is poor accuracy on unseen samples

Two approaches to avoiding over-fitting:
- Pre-pruning: halt tree construction early; do not split a node if this would push the goodness measure below a threshold. It is difficult to choose an appropriate threshold.
- Post-pruning: remove branches from a "fully grown" tree, obtaining a sequence of progressively pruned trees. Use a set of data different from the training data to decide which is the "best pruned tree" (a sketch of this selection step follows this list).
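A minimal sketch of the post-pruning selection step: given candidate trees from progressive pruning, keep the one most accurate on held-out (non-training) data. The tree representation is the nested dict from the Page 7 sketch; all names are illustrative:

    def predict(tree, row):
        """Walk the nested-dict tree; leaves are class labels."""
        while isinstance(tree, dict):
            attr = next(iter(tree))[0]    # every key at a node tests the same attribute
            tree = tree[(attr, row[attr])]
        return tree

    def best_pruned_tree(candidates, val_rows, val_labels):
        """Pick the 'best pruned tree' by accuracy on the held-out set."""
        def acc(t):
            hits = sum(predict(t, r) == lab for r, lab in zip(val_rows, val_labels))
            return hits / len(val_labels)
        return max(candidates, key=acc)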

Page 38

Enhancements to basic decision tree induction

- Dynamic discretization for continuous-valued attributes: dynamically define new discrete-valued attributes that partition a continuous attribute's values into a discrete set of intervals
- Handle missing attribute values: assign the most common value of the attribute, or assign a probability to each of the possible values (a sketch of the first strategy follows this list)
- Attribute construction: create new attributes based on existing ones that are sparsely represented; this reduces fragmentation (the number of samples at a branch becomes too small to be statistically significant), repetition (an attribute is repeatedly tested along a branch), and replication (duplicate subtrees)
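A sketch of the most-common-value strategy (illustrative; tuples as dicts, with None marking a missing value):

    from collections import Counter

    def fill_with_mode(rows, attr):
        """Assign the most common value of `attr` to tuples where it is missing."""
        observed = [r[attr] for r in rows if r.get(attr) is not None]
        mode = Counter(observed).most_common(1)[0][0]
        for r in rows:
            if r.get(attr) is None:
                r[attr] = mode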

Page 39

Classification in Large Databases

Classification is a classical problem extensively studied by statisticians and machine learning researchers.

Scalability: classifying data sets with millions of examples and hundreds of attributes at reasonable speed.

Why decision tree induction in data mining?
- Relatively fast learning speed (compared with other classification methods)
- Convertible to simple, easy-to-understand classification rules
- Can use SQL queries for accessing databases
- Classification accuracy comparable with other methods

Page 40

Scalable Decision Tree Induction Methods

- SLIQ (EDBT'96, Mehta et al.): builds an index for each attribute; only the class list and the current attribute list reside in memory
- SPRINT (VLDB'96, J. Shafer et al.): constructs an attribute-list data structure
- PUBLIC (VLDB'98, Rastogi & Shim): integrates tree splitting and tree pruning; stops growing the tree earlier
- RainForest (VLDB'98, Gehrke, Ramakrishnan & Ganti): separates the scalability aspects from the criteria that determine the quality of the tree; builds an AVC-list (attribute, value, class label)

Page 41

Summary

What is a decision tree?
- A flow-chart-like tree: internal nodes, branches, and leaf nodes

How to construct a decision tree? What are the major steps in decision tree induction?
- Test attribute selection
- Sample partition

How to select the attribute to split the node?
- Select the attribute with the highest information gain:
  - Calculate the information of the node
  - Calculate the entropy of the attribute
  - Calculate the difference between the information and the entropy

What are the other issues?