
Decision Tree & Random Forest Algorithm

Outline

Introduction
Example of Decision Tree
Principles of Decision Tree
– Entropy
– Information gain
Random Forest


The problem

Given a set of training cases/objects and their attribute values, try to determine the target attribute value of new examples.

– Classification
– Prediction

The overall framework: a learning algorithm performs induction on the training set to learn a model; applying that model to the test set (deduction) then predicts the missing class labels.

Training Set

Tid  Attrib1  Attrib2  Attrib3  Class
1    Yes      Large    125K     No
2    No       Medium   100K     No
3    No       Small    70K      No
4    Yes      Medium   120K     No
5    No       Large    95K      Yes
6    No       Medium   60K      No
7    Yes      Large    220K     No
8    No       Small    85K      Yes
9    No       Medium   75K      No
10   No       Small    90K      Yes

Test Set

Tid  Attrib1  Attrib2  Attrib3  Class
11   No       Small    55K      ?
12   Yes      Medium   80K      ?
13   Yes      Large    110K     ?
14   No       Small    95K      ?
15   No       Large    67K      ?

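A rough scikit-learn sketch of this loop (the library and the one-hot encoding are assumptions here, not something the slides prescribe): fit a decision tree on the training set above, then predict the unknown classes of the test set.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Training set from the slide
train = pd.DataFrame({
    "Attrib1": ["Yes", "No", "No", "Yes", "No", "No", "Yes", "No", "No", "No"],
    "Attrib2": ["Large", "Medium", "Small", "Medium", "Large",
                "Medium", "Large", "Small", "Medium", "Small"],
    "Attrib3": [125, 100, 70, 120, 95, 60, 220, 85, 75, 90],  # in thousands
    "Class":   ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"],
})

# Test set from the slide (class labels unknown)
test = pd.DataFrame({
    "Attrib1": ["No", "Yes", "Yes", "No", "No"],
    "Attrib2": ["Small", "Medium", "Large", "Small", "Large"],
    "Attrib3": [55, 80, 110, 95, 67],
})

# Induction: learn a model from the training set.
X_train = pd.get_dummies(train[["Attrib1", "Attrib2", "Attrib3"]])
model = DecisionTreeClassifier(criterion="entropy", random_state=0)
model.fit(X_train, train["Class"])

# Deduction: apply the model to the test set.
X_test = pd.get_dummies(test).reindex(columns=X_train.columns, fill_value=0)
print(model.predict(X_test))
```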

Key Requirements

Attribute-value description: each object or case must be expressible in terms of a fixed collection of properties or attributes (e.g., hot, mild, cold).

Predefined classes (target values): the target function has discrete output values (boolean or multiclass).

Sufficient data: enough training cases should be provided to learn the model.


A simple example


Principled Criterion

Choosing the most useful attribute for classifying examples.

Entropy
- A measure of the homogeneity of the set of examples.
- If the sample is completely homogeneous the entropy is zero; if the sample is equally divided between the classes, the entropy is one.

Information Gain
- Measures how well a given attribute separates the training examples according to their target classification.
- This measure is used to select among the candidate attributes at each step while growing the tree.
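In standard notation (assuming p_i is the proportion of class i in the set S, and attribute A splits S into subsets S_v):

Entropy(S) = -\sum_{i} p_i \log_2 p_i

Gain(S, A) = Entropy(S) - \sum_{v \in Values(A)} \frac{|S_v|}{|S|} \, Entropy(S_v)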


Information Gain

Step 1: Calculate the entropy of the target.
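A small Python sketch of this step; the target distribution used below (9 Yes and 5 No, as in the classic play-golf data) is an assumed example rather than a value taken from the slide.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

# Assumed target column: 9 "Yes" and 5 "No" cases.
target = ["Yes"] * 9 + ["No"] * 5
print(f"{entropy(target):.3f}")  # 0.940
```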


Information Gain (Cont’d)

Step 2: Calculate the information gain for each attribute.
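The same calculation per attribute, sketched in Python; the four-row table is hypothetical and only illustrates the shape of the computation.

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attribute, target):
    """Gain(S, A) = Entropy(S) - sum over values v of |S_v|/|S| * Entropy(S_v)."""
    n = len(rows)
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for v in set(r[attribute] for r in rows):
        subset = [r[target] for r in rows if r[attribute] == v]
        remainder += len(subset) / n * entropy(subset)
    return base - remainder

# Hypothetical rows shaped like the play-golf table
rows = [
    {"Windy": False, "Play": "Yes"},
    {"Windy": False, "Play": "Yes"},
    {"Windy": True,  "Play": "No"},
    {"Windy": True,  "Play": "Yes"},
]
print(f"{information_gain(rows, 'Windy', 'Play'):.3f}")  # 0.311
```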


Information Gain (Cont’d)

Step 3: Choose the attribute with the largest information gain as the decision node.
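In code, this step is just an argmax over the per-attribute gains; the numbers below are the values commonly quoted for the play-golf data and are assumptions, not figures read from the slide.

```python
# Assumed information-gain values for the candidate attributes
gains = {"Outlook": 0.247, "Temperature": 0.029, "Humidity": 0.152, "Windy": 0.048}

# The attribute with the largest gain becomes the decision (root) node.
print(max(gains, key=gains.get))  # Outlook
```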


Information Gain (Cont’d)

Step 4a: A branch with entropy of 0 is a leaf node.


Information Gain (Cont’d)

Step 4b: A branch with entropy more than 0 needs further splitting.


Information Gain (Cont’d)

Step 5: The algorithm is run recursively on the non-leaf branches, until all data is classified.
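Putting steps 1-5 together, a compact ID3-style sketch of the recursion (a generic illustration, not code taken from the slides):

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(rows, attr, target):
    n = len(rows)
    base = entropy([r[target] for r in rows])
    rem = 0.0
    for v in set(r[attr] for r in rows):
        subset = [r[target] for r in rows if r[attr] == v]
        rem += len(subset) / n * entropy(subset)
    return base - rem

def build_tree(rows, attrs, target):
    labels = [r[target] for r in rows]
    # Step 4a: a pure branch (entropy 0) becomes a leaf node.
    if len(set(labels)) == 1 or not attrs:
        return Counter(labels).most_common(1)[0][0]
    # Steps 2-3: pick the attribute with the largest information gain.
    best = max(attrs, key=lambda a: information_gain(rows, a, target))
    tree = {best: {}}
    # Steps 4b-5: recurse on every branch that still mixes classes.
    for v in set(r[best] for r in rows):
        branch = [r for r in rows if r[best] == v]
        remaining = [a for a in attrs if a != best]
        tree[best][v] = build_tree(branch, remaining, target)
    return tree

# Hypothetical miniature dataset, just to exercise the recursion
rows = [
    {"Windy": False, "Humidity": "High",   "Play": "No"},
    {"Windy": False, "Humidity": "Normal", "Play": "Yes"},
    {"Windy": True,  "Humidity": "Normal", "Play": "No"},
    {"Windy": True,  "Humidity": "High",   "Play": "No"},
]
print(build_tree(rows, ["Windy", "Humidity"], "Play"))
```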


Random Forest

Decision Tree: one tree
Random Forest: more than one tree
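In other words, a random forest is an ensemble of decision trees, each grown on a bootstrap sample of the training data, whose predictions are combined by majority vote. A minimal scikit-learn sketch (the library choice is an assumption, not something the slides name):

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.tree import DecisionTreeClassifier

X, y = make_classification(n_samples=200, n_features=8, random_state=0)

tree = DecisionTreeClassifier(random_state=0).fit(X, y)                     # one tree
forest = RandomForestClassifier(n_estimators=3, random_state=0).fit(X, y)   # more than one tree

print(len(forest.estimators_))                    # 3 -- the individual decision trees
print(tree.predict(X[:1]), forest.predict(X[:1]))
```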


Decision Tree & Random Forest

(Figure: a single decision tree compared with a random forest consisting of Tree 1, Tree 2, and Tree 3.)

Decision Tree

Outlook  Temp.  Humidity  Windy  Play Golf
Rainy    Mild   High      False  ?

Result: No

Random Forest

Tree 1: No
Tree 2: No
Tree 3: Yes

Votes – Yes: 1, No: 2

Result: No
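The forest's decision is a majority vote over the individual trees, e.g. with the votes from this slide:

```python
from collections import Counter

votes = {"Tree 1": "No", "Tree 2": "No", "Tree 3": "Yes"}  # predictions from the slide

# Majority vote: the class predicted by most trees wins.
result, count = Counter(votes.values()).most_common(1)[0]
print(result, count)  # No 2
```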

OOB Error Rate

Each tree in a random forest is grown on a bootstrap sample of the training data, so roughly one third of the cases are left out of ("out of bag" for) any given tree. The OOB error rate measured on those held-out cases can be used to get a running unbiased estimate of the classification error as trees are added to the forest.
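As a rough scikit-learn illustration (again an assumed library choice): with oob_score=True the fitted forest exposes the OOB accuracy, and the OOB error rate is its complement.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=500, n_features=10, random_state=0)

forest = RandomForestClassifier(n_estimators=100, oob_score=True, random_state=0)
forest.fit(X, y)

# oob_score_ is the accuracy measured on out-of-bag samples;
# the OOB error rate is its complement.
print(1.0 - forest.oob_score_)
```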

