Learning and Decision Trees



    Learning decision trees

Problem: decide whether to wait for a table at a restaurant, based on the following attributes:

    1. Alternate: is there an alternative restaurant nearby?

    2. Bar: is there a comfortable bar area to wait in?

3. Fri/Sat: is today Friday or Saturday?

4. Hungry: are we hungry?

5. Patrons: number of people in the restaurant (None, Some, Full)

    6. Price: price range ($, $$, $$$)

7. Raining: is it raining outside?

8. Reservation: have we made a reservation?

9. Type: kind of restaurant (French, Italian, Thai, Burger)

10. WaitEstimate: estimated waiting time (0-10, 10-30, 30-60, >60)
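To make the domain concrete, here is a minimal sketch of one possible encoding in Python. The attribute names and value ranges come from the list above; the dictionary layout and the single labelled example are illustrative assumptions, not part of the slides.

```python
# Hypothetical encoding of the restaurant domain described above.
# Attribute names and value sets come from the slides; the layout
# and the single example below are illustrative assumptions.
ATTRIBUTES = {
    "Alternate":    [True, False],
    "Bar":          [True, False],
    "Fri/Sat":      [True, False],
    "Hungry":       [True, False],
    "Patrons":      ["None", "Some", "Full"],
    "Price":        ["$", "$$", "$$$"],
    "Raining":      [True, False],
    "Reservation":  [True, False],
    "Type":         ["French", "Italian", "Thai", "Burger"],
    "WaitEstimate": ["0-10", "10-30", "30-60", ">60"],
}

# A labelled training example: (attribute values, whether we waited).
example = ({"Alternate": True, "Bar": False, "Fri/Sat": False,
            "Hungry": True, "Patrons": "Some", "Price": "$$$",
            "Raining": False, "Reservation": True, "Type": "French",
            "WaitEstimate": "0-10"}, True)
```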


    Attribute-based representations

Examples described by attribute values (Boolean, discrete, continuous)

    E.g., situations where I will/won't wait for a table:

Classification of examples is positive (T) or negative (F)


    Learning Decision Trees

problem: find a decision tree that agrees with the training set

trivial solution: construct a tree with one branch for each sample of the training set
- works perfectly for the samples in the training set
- may not work well for new samples (generalization)
- results in relatively large trees

better solution: find a concise tree that still agrees with all samples
- corresponds to the simplest hypothesis that is consistent with the training set


    Constructing Decision Trees

in general, constructing the smallest possible decision tree is an intractable problem

algorithms exist for constructing reasonably small trees

basic idea: test the most important attribute first
- the attribute that makes the most difference for the classification of an example
- can be determined through information theory
- hopefully will yield the correct classification with few tests


    Decision Tree Algorithm

recursive formulation:
- select the best attribute to split positive and negative examples
- if only positive or only negative examples are left, we are done
- if no examples are left, no such examples were observed; return a default value calculated from the majority classification at the node's parent
- if we have positive and negative examples left, but no attributes to split them, we are in trouble: the samples have the same description but different classifications; this may be caused by incorrect data (noise), by a lack of information, or by a truly non-deterministic domain
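This recursive formulation translates almost directly into code. The sketch below is one possible rendering, assuming examples are (attribute-dict, label) pairs as in the earlier snippet; choose_attribute is a placeholder for the information-theoretic selection discussed on the following slides, and all names here are mine, not the slides'.

```python
from collections import Counter

def majority_value(examples):
    """Most common classification among the examples."""
    return Counter(label for _, label in examples).most_common(1)[0][0]

def dtl(examples, attributes, default, choose_attribute, domains):
    """Recursive decision-tree learning, case by case as on the slide."""
    if not examples:
        # No such examples were observed: return the default value
        # computed from the majority classification at the parent.
        return default
    labels = {label for _, label in examples}
    if len(labels) == 1:
        # Only positive or only negative examples are left: done.
        return labels.pop()
    if not attributes:
        # Positive and negative examples left but no attributes to split
        # them (noise, missing information, or a non-deterministic
        # domain): fall back to the majority classification.
        return majority_value(examples)
    best = choose_attribute(examples, attributes)  # e.g. highest info gain
    tree = {best: {}}
    for value in domains[best]:
        subset = [(x, y) for x, y in examples if x[best] == value]
        rest = [a for a in attributes if a != best]
        tree[best][value] = dtl(subset, rest, majority_value(examples),
                                choose_attribute, domains)
    return tree
```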


    Decision Tree Learning

    Aim: find a small tree consistent with the training examples

Idea: (recursively) choose "most significant" attribute as root of (sub)tree


    Information


    Entropy
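The formulas on the Information and Entropy slides did not survive the transcript, but the I(·,·) notation used on the information-gain slide is the standard information content of a distribution, I(p1, ..., pn) = -Σ pi log2 pi. A minimal sketch, assuming that standard definition:

```python
from math import log2

def information(*probs):
    """Information content I(p1, ..., pn) in bits; 0*log 0 is taken as 0."""
    return -sum(p * log2(p) for p in probs if p > 0)

# Sanity check against the next slide: a 50/50 split carries one full bit.
assert abs(information(6/12, 6/12) - 1.0) < 1e-12
```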


    Information gain

For the training set, p = n = 6, so I(6/12, 6/12) = 1 bit

Consider the attributes Patrons and Type (and others too):

Patrons has the highest IG of all attributes and so is chosen by the DTL algorithm as the root

IG(Patrons) = 1 - [2/12 I(0,1) + 4/12 I(1,0) + 6/12 I(2/6, 4/6)] = 0.541 bits

IG(Type) = 1 - [2/12 I(1/2, 1/2) + 2/12 I(1/2, 1/2) + 4/12 I(2/4, 2/4) + 4/12 I(2/4, 2/4)] = 0 bits
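These two values can be checked mechanically. The sketch below reuses the information function from the previous snippet; the per-value (positive, negative) counts are read off the formulas above:

```python
from math import log2

def information(*probs):
    return -sum(p * log2(p) for p in probs if p > 0)

def gain(splits, total=12):
    """IG = I(6/12, 6/12) - expected information remaining after the split.

    splits: one (positives, negatives) pair per attribute value,
    over the 12-example training set with p = n = 6.
    """
    remainder = sum((p + n) / total * information(p / (p + n), n / (p + n))
                    for p, n in splits if p + n > 0)
    return 1 - remainder

print(gain([(0, 2), (4, 0), (2, 4)]))          # Patrons: ~0.541 bits
print(gain([(1, 1), (1, 1), (2, 2), (2, 2)]))  # Type:    0.0 bits
```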


    Example contd.

    Decision tree learned from the 12 examples:

Substantially simpler than the true tree: a more complex hypothesis isn't justified by such a small amount of data


    Expressiveness of Decision Trees

decision trees can also be expressed as implication sentences

in principle, they can express propositional logic sentences
- each row in the truth table of a sentence can be represented as a path in the tree

    often there are more efficient trees

some functions require exponentially large decision trees
- e.g. the parity function and the majority function
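The parity case can be seen numerically: for a Boolean parity (XOR) function every individual attribute has zero information gain, so no choice of test order helps, and the tree ends up branching on every attribute. A small check (names and setup are mine, not the slides'):

```python
from itertools import product
from math import log2

def information(*probs):
    return -sum(p * log2(p) for p in probs if p > 0)

# 3-bit parity: each single bit leaves a perfect 50/50 split behind,
# so its information gain is 0, yet the full tree needs 2^3 leaves.
rows = [(bits, sum(bits) % 2) for bits in product([0, 1], repeat=3)]
for i in range(3):
    remainder = 0.0
    for v in (0, 1):
        subset = [label for bits, label in rows if bits[i] == v]
        p = sum(subset) / len(subset)
        remainder += len(subset) / len(rows) * information(p, 1 - p)
    print(f"gain(bit {i}) = {1 - remainder}")  # 0.0 for every bit
```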


    Expressiveness

Decision trees can express any function of the input attributes.

E.g., for Boolean functions, truth table row → path to leaf:

Trivially, there is a consistent decision tree for any training set with one path to leaf for each example (unless f is nondeterministic in x), but it probably won't generalize to new examples


    Performance of Decision Tree Learning

quality of predictions
- predictions for the classification of unknown examples that agree with the correct result are obviously better
- can be measured easily after the fact
- can be assessed in advance by splitting the available examples into a training set and a test set: learn from the training set, then assess the performance on the test set (see the sketch below)

size of the tree
- a smaller tree (especially depth-wise) is a more concise representation
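A minimal sketch of that train/test protocol, assuming the labelled-example format and the hypothetical dtl learner from the earlier snippets:

```python
import random

def train_test_split(examples, test_fraction=0.25, seed=0):
    """Shuffle labelled examples and split into training and test sets."""
    shuffled = examples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def accuracy(predict, test_set):
    """Fraction of test examples whose prediction matches the true label."""
    return sum(predict(x) == y for x, y in test_set) / len(test_set)

# Usage sketch (dtl, choose_attribute and a tree-walking classify helper
# are the hypothetical pieces from the earlier snippets):
#   train, test = train_test_split(examples)
#   tree = dtl(train, list(ATTRIBUTES), None, choose_attribute, ATTRIBUTES)
#   print(accuracy(lambda x: classify(tree, x), test))
```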
