![Page 1: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/1.jpg)
Chapter 4.1 - 1/44
Chapter 4
Classification: Basic Concepts, Decision Trees & Model Evaluation
Part 1: Classification With Decision Tree
![Page 2: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/2.jpg)
Classification: Definition
![Page 3: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/3.jpg)
Example of Classification Task
![Page 4: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/4.jpg)
General Approach for Building Classification Model
![Page 5: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/5.jpg)
Classification Techniques
![Page 6: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/6.jpg)
Example of Decision Tree
![Page 7: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/7.jpg)
Another Example of Decision Tree
![Page 8: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/8.jpg)
Decision Tree Classification Task
![Page 9: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/9.jpg)
Apply Model to Test Data
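To apply a learned tree to a test record, start at the root. At each node, follow the branch whose test condition the record satisfies. Stop at a leaf; its class label is the prediction.

The minimal sketch below walks a small tree in exactly that way. It assumes a hypothetical tree over `Refund`, `MarSt` and `TaxInc` (income in thousands, split at 80). It is only an illustration, not necessarily the tree pictured on the slide.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Leaf:
    label: str


@dataclass
class Node:
    # Maps a record to the name of the branch it should follow.
    test: Callable[[dict], str]
    children: Dict[str, "Tree"] = field(default_factory=dict)


Tree = Union[Leaf, Node]


def classify(tree: Tree, record: dict) -> str:
    """Walk from the root to a leaf, following the branch each test selects."""
    while isinstance(tree, Node):
        tree = tree.children[tree.test(record)]
    return tree.label


# Hypothetical tree: Refund, then marital status, then taxable income.
tree = Node(
    test=lambda r: r["Refund"],
    children={
        "Yes": Leaf("No"),
        "No": Node(
            test=lambda r: "Married" if r["MarSt"] == "Married" else "Single/Divorced",
            children={
                "Married": Leaf("No"),
                "Single/Divorced": Node(
                    test=lambda r: "<80" if r["TaxInc"] < 80 else ">=80",
                    children={"<80": Leaf("No"), ">=80": Leaf("Yes")},
                ),
            },
        ),
    },
)

test_record = {"Refund": "No", "MarSt": "Married", "TaxInc": 80}
print(classify(tree, test_record))  # No
```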
![Page 10: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/10.jpg)
Decision Tree Classification Task
![Page 11: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/11.jpg)
Decision Tree Induction
![Page 12: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/12.jpg)
General Structure of Hunt’s Algorithm
![Page 13: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/13.jpg)
Hunt’s Algorithm
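Hunt's algorithm grows a tree recursively from D_t, the training records that reach node t:

- If D_t is empty, t becomes a leaf labeled with the parent's majority class.
- If all records in D_t share one class, t becomes a leaf with that class.
- Otherwise, an attribute test splits D_t into subsets and the procedure recurses on each subset.

The compact sketch below makes some simplifying assumptions:

- Attributes are categorical and every split is multiway.
- Attributes are tested in the given order; a real inducer would choose the best split with an impurity measure.
- When no attributes remain, it falls back to the majority class.
- The name `hunt` and the nested-dict tree format are hypothetical.

```python
from collections import Counter


def majority(labels):
    """Most common class label (ties go to the label seen first)."""
    return Counter(labels).most_common(1)[0][0]


def hunt(records, labels, attributes, default=None):
    """Grow a tree recursively; internal nodes are dicts, leaves are class labels."""
    if not records:                     # empty D_t: use the parent's majority class
        return default
    if len(set(labels)) == 1:           # pure D_t: leaf with that class
        return labels[0]
    if not attributes:                  # nothing left to test: majority-class leaf
        return majority(labels)
    attr = attributes[0]                # simplification: no best-split search here
    node = {"attr": attr, "children": {}}
    for value in sorted({r[attr] for r in records}):
        idx = [i for i, r in enumerate(records) if r[attr] == value]
        node["children"][value] = hunt(
            [records[i] for i in idx],
            [labels[i] for i in idx],
            attributes[1:],
            default=majority(labels),
        )
    return node


# Hypothetical training records and their class labels.
records = [
    {"Refund": "Yes", "MarSt": "Single"},
    {"Refund": "No", "MarSt": "Married"},
    {"Refund": "No", "MarSt": "Single"},
    {"Refund": "Yes", "MarSt": "Married"},
    {"Refund": "No", "MarSt": "Divorced"},
]
labels = ["No", "No", "Yes", "No", "Yes"]
tree = hunt(records, labels, ["Refund", "MarSt"])
print(tree)
```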
![Page 14: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/14.jpg)
Design Issues of Decision Tree Induction
![Page 15: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/15.jpg)
Methods for Expressing Test Conditions
![Page 16: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/16.jpg)
Test Condition for Nominal Attributes
![Page 17: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/17.jpg)
Test Condition for Ordinal Attributes
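A nominal attribute with k distinct values can be split in two ways:

- Multiway, with one branch per value.
- Binary, into two groups of values. There are 2^(k-1) - 1 such groupings.

An ordinal attribute can be grouped too, but a grouping should not break the value order. That leaves only k - 1 binary cuts.

The sketch below enumerates both kinds of binary split. The value sets (car types, sizes) and function names are illustrative.

```python
from itertools import combinations


def nominal_binary_splits(values):
    """All ways to divide a nominal attribute's values into two non-empty groups."""
    values = list(values)
    first, rest = values[0], values[1:]
    splits = []
    # Keeping the first value on the left avoids listing each split twice;
    # r stops at len(rest) - 1 so the right group is never empty.
    for r in range(len(rest)):
        for combo in combinations(rest, r):
            left = {first, *combo}
            right = set(values) - left
            splits.append((left, right))
    return splits


def ordinal_binary_splits(ordered_values):
    """Binary splits that respect the order: cut between two neighbouring values."""
    v = list(ordered_values)
    return [(v[:i], v[i:]) for i in range(1, len(v))]


for left, right in nominal_binary_splits(["Family", "Sports", "Luxury"]):
    print(sorted(left), "|", sorted(right))
print(ordinal_binary_splits(["Small", "Medium", "Large", "Extra Large"]))
```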
![Page 18: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/18.jpg)
Test Condition for Continuous Attributes
![Page 19: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/19.jpg)
Splitting Based on Continuous Attributes
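A continuous attribute is commonly handled in one of two ways:

- Discretize it into ordinal ranges.
- Test it with a binary condition of the form A < v.

For discretization, two common bucketing schemes are equal-width (equal interval) and equal-frequency (percentile) bucketing. The sketch below compares them on made-up income values. The helper names are hypothetical, and the sketch ignores edge cases such as heavily tied values, where equal-frequency edges can repeat.

```python
def equal_width_edges(values, k):
    """Interior cut points that split [min, max] into k intervals of equal width."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / k
    return [lo + i * width for i in range(1, k)]


def equal_frequency_edges(values, k):
    """Interior cut points that put roughly len(values) / k sorted values per bucket."""
    s = sorted(values)
    n = len(s)
    return [s[(i * n) // k] for i in range(1, k)]


def bucketize(x, edges):
    """Bucket index of x: bucket i holds edges[i-1] <= x < edges[i]."""
    return sum(x >= e for e in edges)


incomes = [60, 70, 75, 85, 90, 95, 100, 120, 125, 220]  # hypothetical, in thousands
for name, edges in (("equal width", equal_width_edges(incomes, 4)),
                    ("equal frequency", equal_frequency_edges(incomes, 4))):
    sizes = [0] * 4
    for x in incomes:
        sizes[bucketize(x, edges)] += 1
    print(name, edges, sizes)
```

With one outlying value (220), equal-width buckets end up badly unbalanced (6, 3, 0 and 1 records). Equal-frequency buckets stay close to even.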
![Page 20: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/20.jpg)
How to Determine the Best Split / 1
![Page 21: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/21.jpg)
How to Determine the Best Split / 2
![Page 22: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/22.jpg)
Measures of Node Impurity
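For reference, the three impurity measures used in the following slides are shown below in their conventional textbook form. Here p(j | t) is the relative frequency of class j at node t, and c is the number of classes. The slide images may use slightly different notation.

```latex
\begin{aligned}
\mathrm{GINI}(t)    &= 1 - \sum_{j=1}^{c} \bigl[p(j \mid t)\bigr]^{2} \\
\mathrm{Entropy}(t) &= -\sum_{j=1}^{c} p(j \mid t)\,\log_2 p(j \mid t) \qquad (0 \log_2 0 := 0) \\
\mathrm{Error}(t)   &= 1 - \max_{j}\; p(j \mid t)
\end{aligned}
```

All three measures are 0 for a pure node. They are largest when the classes are equally frequent: 1 - 1/c for Gini and error, and log2 c for entropy.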
![Page 23: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/23.jpg)
Finding the Best Split / 1
![Page 24: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/24.jpg)
Finding the Best Split / 2
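Choosing between candidate splits takes three steps:

1. Compute the parent's impurity M0.
2. For each candidate, compute the children's impurity, weighted by each child's share of records.
3. Keep the candidate with the largest gain, M0 minus that weighted impurity.

The sketch below follows these steps, with node class counts given as lists. It uses Gini by default, but any impurity function on class counts can be passed in. The name `best_split` and the counts are hypothetical.

```python
def gini(counts):
    """Gini index of a node from its class counts."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def weighted_impurity(children, impurity=gini):
    """Impurity after a split: each child's impurity weighted by its share of records."""
    n = sum(sum(child) for child in children)
    return sum(sum(child) / n * impurity(child) for child in children)


def best_split(parent, candidates, impurity=gini):
    """Return (name, gain) of the candidate with the largest M0 - M_after."""
    m0 = impurity(parent)
    gains = {name: m0 - weighted_impurity(children, impurity)
             for name, children in candidates.items()}
    best = max(gains, key=gains.get)
    return best, gains[best]


# Hypothetical parent with 6 records of C0 and 6 of C1, and two candidate tests.
parent = [6, 6]
candidates = {
    "A": [[5, 2], [1, 4]],   # children N1, N2
    "B": [[4, 3], [2, 3]],   # children N3, N4
}
print(best_split(parent, candidates))
```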
![Page 25: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/25.jpg)
Measure of Impurity: GINI
![Page 26: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/26.jpg)
Computing GINI Index of a Single Node
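The Gini index is 0 for a pure node. It peaks at 1 - 1/c when records are spread evenly over c classes, which is 0.5 for two classes. The sketch below computes it from a node's class counts, using a few illustrative two-class counts.

```python
def gini(counts):
    """GINI(t) = 1 - sum_j p(j|t)^2, computed from the class counts at node t."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in counts)


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(gini(counts), 3))   # 0.0, 0.278, 0.444, 0.5
```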
![Page 27: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/27.jpg)
Computing GINI Index for a Collection of Nodes
![Page 28: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/28.jpg)
Binary Attributes: Computing GINI Index
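When a node is split into k children, the Gini index of the split weights each child's Gini by its fraction of the records: GINI_split = Σ (n_i / n) GINI(i). The sketch below applies this to a two-way split on a binary attribute. The counts are illustrative, and `gini_split` is a hypothetical helper name.

```python
def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(children):
    """GINI_split = sum_i (n_i / n) * GINI(i) over the child nodes of a split."""
    n = sum(sum(ch) for ch in children)
    return sum(sum(ch) / n * gini(ch) for ch in children)


# Hypothetical binary attribute B; the parent holds 6 records of C1 and 6 of C2.
n1 = [5, 2]   # B = yes: 5 x C1, 2 x C2
n2 = [1, 4]   # B = no:  1 x C1, 4 x C2
print(round(gini([6, 6]), 3), round(gini_split([n1, n2]), 3))  # 0.5 0.371
```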
![Page 29: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/29.jpg)
Categorical Attributes: Computing GINI Index
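For a categorical attribute, the class counts per attribute value (a count matrix) are enough to score two kinds of split:

- A multiway split, with one child per value.
- Every two-way grouping of the values.

The sketch below scores both, using a made-up count matrix for a hypothetical CarType attribute.

```python
from itertools import combinations


def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def gini_split(children):
    n = sum(sum(ch) for ch in children)
    return sum(sum(ch) / n * gini(ch) for ch in children)


def merge(matrix, values):
    """Sum the class counts of several attribute values into one child node."""
    return [sum(col) for col in zip(*(matrix[v] for v in values))]


# Hypothetical count matrix for CarType: value -> [C1 count, C2 count]
matrix = {"Family": [1, 4], "Sports": [2, 1], "Luxury": [1, 1]}

# Multiway split: one child per distinct value.
multiway = gini_split(list(matrix.values()))

# Two-way splits: every grouping of the values into two non-empty sets.
values = list(matrix)
first, rest = values[0], values[1:]
two_way = {}
for r in range(len(rest)):
    for combo in combinations(rest, r):
        left = (first, *combo)
        right = tuple(v for v in values if v not in left)
        two_way[(left, right)] = gini_split([merge(matrix, left), merge(matrix, right)])

best = min(two_way, key=two_way.get)
print(f"multiway: {multiway:.3f}")
print(f"best two-way {best}: {two_way[best]:.3f}")
```

The Gini index is concave, so merging children can never lower the weighted Gini. As a result, the multiway split always scores at least as well as any two-way grouping of the same values.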
![Page 30: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/30.jpg)
Continuous Attributes: Computing GINI Index / 1
![Page 31: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/31.jpg)
Continuous Attributes: Computing GINI Index / 2
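For a continuous attribute, there is no need to recount classes for every candidate threshold. Sort the values once, then scan left to right, moving one record at a time from the right partition to the left and updating the class counts as you go.

The sketch below evaluates the midpoints between adjacent distinct values as candidate thresholds. The income values and labels are hypothetical (in thousands), and `best_threshold` is an illustrative name.

```python
def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def best_threshold(values, labels):
    """Best binary split (A <= v vs. A > v) found with one scan over sorted values."""
    classes = sorted(set(labels))
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    left = {c: 0 for c in classes}
    right = {c: 0 for c in classes}
    for _, y in pairs:
        right[y] += 1
    best_v, best_g = None, float("inf")
    for i in range(n - 1):
        v, y = pairs[i]
        left[y] += 1            # record i moves from the right side to the left side
        right[y] -= 1
        if v == pairs[i + 1][0]:
            continue            # no cut point between equal values
        g = ((i + 1) / n) * gini(list(left.values())) \
            + ((n - i - 1) / n) * gini(list(right.values()))
        if g < best_g:
            best_v, best_g = (v + pairs[i + 1][0]) / 2, g
    return best_v, best_g


incomes = [125, 100, 70, 120, 95, 60, 220, 85, 75, 90]
labels = ["No", "No", "No", "No", "Yes", "No", "No", "Yes", "No", "Yes"]
print(best_threshold(incomes, labels))
```

Sorting costs O(n log n) per attribute and the scan is linear. The naive alternative recomputes counts over all n records for each of the n - 1 candidates.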
![Page 32: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/32.jpg)
Measure of Impurity: Entropy
![Page 33: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/33.jpg)
Computing Entropy of a Single Node
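Entropy is 0 for a pure node. It reaches log2(c) when records are spread evenly over c classes, which is 1 bit for two classes. The sketch below computes it from class counts, using a few illustrative two-class counts. Classes with a zero count are skipped, which implements the convention 0 log 0 = 0.

```python
from math import log2


def entropy(counts):
    """Entropy(t) = -sum_j p(j|t) log2 p(j|t); zero-count classes contribute 0."""
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(entropy(counts), 3))   # 0.0, 0.65, 0.918, 1.0
```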
![Page 34: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/34.jpg)
Computing Information Gain After Splitting
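Information gain measures how much a split reduces entropy: the parent's entropy minus the record-weighted entropy of the children. The sketch below computes it for illustrative two-class counts.

```python
from math import log2


def entropy(counts):
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    """GAIN_split = Entropy(parent) - sum_i (n_i / n) * Entropy(child_i)."""
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)


parent = [6, 6]
print(round(information_gain(parent, [[5, 2], [1, 4]]), 3))
print(information_gain(parent, [[6, 0], [0, 6]]))   # a perfect split gains 1 bit
```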
![Page 35: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/35.jpg)
Problems with Information Gain
![Page 36: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/36.jpg)
Gain Ratio
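Information gain favors attributes that shatter the data into many small, pure partitions, such as a record ID, even though such splits do not generalize. Gain ratio, the criterion C4.5 uses, divides the gain by SplitINFO, the entropy of the partition sizes. SplitINFO grows with the number of partitions, so it penalizes such splits.

The sketch below compares a hypothetical ID-like split with a two-way split of the same 12 records.

```python
from math import log2


def entropy(counts):
    n = sum(counts)
    if n == 0:
        return 0.0
    return sum(-(c / n) * log2(c / n) for c in counts if c > 0)


def information_gain(parent, children):
    n = sum(parent)
    return entropy(parent) - sum(sum(ch) / n * entropy(ch) for ch in children)


def split_info(children):
    """SplitINFO = -sum_i (n_i / n) log2(n_i / n), i.e. the entropy of the child sizes."""
    return entropy([sum(ch) for ch in children])


def gain_ratio(parent, children):
    si = split_info(children)
    return information_gain(parent, children) / si if si > 0 else 0.0


parent = [6, 6]
id_like = [[1, 0]] * 6 + [[0, 1]] * 6   # one pure child per record, like a customer ID
binary = [[5, 1], [1, 5]]               # hypothetical two-way split
for name, children in (("id_like", id_like), ("binary", binary)):
    print(name, round(information_gain(parent, children), 3),
          round(gain_ratio(parent, children), 3))
```

Information gain ranks the ID-like split first (1.0 against about 0.35). Gain ratio reverses the order, because the ID-like split's SplitINFO is log2 12.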
![Page 37: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/37.jpg)
Measure of Impurity: Classification Error
![Page 38: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/38.jpg)
Computing Error of a Single Node
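Classification error is the fraction of a node's records that would be misclassified if the node predicted its majority class. It is 0 for a pure node and at most 1 - 1/c. The sketch below computes it for illustrative counts.

```python
def classification_error(counts):
    """Error(t) = 1 - max_j p(j|t): share of records outside the majority class."""
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - max(counts) / n


for counts in ([0, 6], [1, 5], [2, 4], [3, 3]):
    print(counts, round(classification_error(counts), 3))   # 0.0, 0.167, 0.333, 0.5
```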
![Page 39: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/39.jpg)
Comparison among Impurity Measures
For binary (2-class) classification problems
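In a two-class problem, each measure depends on a single number, p, the fraction of records in one class. All three measures:

- are 0 at p = 0 and p = 1;
- peak at p = 0.5, where entropy reaches 1 and Gini and error reach 0.5.

The sketch below prints a table of the three curves instead of plotting them.

```python
from math import log2


def measures(p):
    """Gini, entropy and classification error of a two-class node with class fraction p."""
    q = 1.0 - p
    gini = 1.0 - p * p - q * q
    entropy = sum(-x * log2(x) for x in (p, q) if x > 0)
    error = 1.0 - max(p, q)
    return gini, entropy, error


print("  p    gini  entropy  error")
for i in range(11):
    p = i / 10
    g, h, e = measures(p)
    print(f"{p:.1f}  {g:.3f}  {h:.3f}    {e:.3f}")
```

In the table, error ≤ Gini ≤ entropy at every p.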
![Page 40: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/40.jpg)
Misclassification Error vs. Gini Index
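Classification error only looks at the majority class, so a split can make one child purer without changing the error at all. Gini still registers that improvement. In the hypothetical split below, the weighted error stays at 0.3, while the weighted Gini drops from 0.42 to about 0.343.

```python
def gini(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - sum((c / n) ** 2 for c in counts)


def error(counts):
    n = sum(counts)
    return 0.0 if n == 0 else 1.0 - max(counts) / n


def weighted(measure, children):
    """Record-weighted average of a measure over the children of a split."""
    n = sum(sum(ch) for ch in children)
    return sum(sum(ch) / n * measure(ch) for ch in children)


parent = [7, 3]                 # hypothetical: 7 records of C1, 3 of C2
children = [[3, 0], [4, 3]]     # N1 is pure, N2 is mixed
for name, m in (("gini", gini), ("error", error)):
    print(name, round(m(parent), 3), round(weighted(m, children), 3))
```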
![Page 41: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/41.jpg)
Example: C4.5
• Simple depth-first construction
• Uses information gain, normalized into the gain ratio
• Sorts continuous attributes at each node
• Needs the entire data set to fit in memory
• Unsuitable for large data sets
  - Needs out-of-core sorting
• You can download the software from: http://www.cse.unsw.edu.au/~quinlan/c4.5r8.tar.gz
![Page 42: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/42.jpg)
Scalable Decision Tree Induction / 1
• How scalable is decision tree induction?
  - Particularly suitable for small data sets
• SLIQ (EDBT'96, Mehta et al.)
  - Builds an index for each attribute; only the class list and the current attribute list reside in memory
![Page 43: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/43.jpg)
Scalable Decision Tree Induction / 2
• SLIQ
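Sample data for the class buys_computer:

| RID | Credit_rating | Age | Buys_computer |
|-----|---------------|-----|---------------|
| 1 | excellent | 38 | yes |
| 2 | excellent | 26 | yes |
| 3 | fair | 35 | no |
| 4 | excellent | 49 | no |

Disk-resident attribute lists:

| Credit_rating | RID |
|---------------|-----|
| excellent | 1 |
| excellent | 2 |
| excellent | 4 |
| fair | 3 |
| … | … |

| Age | RID |
|-----|-----|
| 26 | 2 |
| 35 | 3 |
| 38 | 1 |
| 49 | 4 |
| … | … |

Memory-resident class list:

| RID | Buys_computer | Node |
|-----|---------------|------|
| 1 | yes | 5 |
| 2 | yes | 2 |
| 3 | no | 3 |
| 4 | no | 6 |
| … | … | … |

[Figure: decision tree whose nodes are numbered 0 to 6]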
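The SLIQ layout above can be mimicked in a few lines of Python:

- Each attribute list holds (value, RID) pairs, sorted once up front.
- The class list maps every RID to its class label and the tree node it currently sits in.

Applying or evaluating a split then means scanning one attribute list and looking records up in the class list.

This is a simplified in-memory sketch. The function names and the first split `Age <= 30` are hypothetical. Real SLIQ streams the attribute lists from disk and evaluates candidate splits during these scans.

```python
from collections import defaultdict

# The four sample records: RID -> (Credit_rating, Age, Buys_computer)
records = {
    1: ("excellent", 38, "yes"),
    2: ("excellent", 26, "yes"),
    3: ("fair", 35, "no"),
    4: ("excellent", 49, "no"),
}

# Attribute lists: (value, RID) pairs sorted once, up front (disk-resident in SLIQ).
credit_list = sorted((r[0], rid) for rid, r in records.items())
age_list = sorted((r[1], rid) for rid, r in records.items())

# Class list (memory-resident): RID -> [class label, tree node the record is in].
class_list = {rid: [r[2], 0] for rid, r in records.items()}   # all start at root 0


def apply_split(attr_list, class_list, node, test, left, right):
    """Move the records of `node` to its two children with one scan of an attribute list."""
    for value, rid in attr_list:
        if class_list[rid][1] == node:
            class_list[rid][1] = left if test(value) else right


def class_counts(attr_list, class_list):
    """Class histogram per tree node, gathered in one scan via class-list lookups."""
    counts = defaultdict(lambda: defaultdict(int))
    for _value, rid in attr_list:
        label, node = class_list[rid]
        counts[node][label] += 1
    return {node: dict(hist) for node, hist in counts.items()}


# Hypothetical first split at the root: Age <= 30 goes to node 1, the rest to node 2.
apply_split(age_list, class_list, node=0, test=lambda age: age <= 30, left=1, right=2)
print(class_list)
print(class_counts(credit_list, class_list))
```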
![Page 44: Bab 4.1 - 1/44 Bab 4 Classification: Basic Concepts, Decision Trees & Model Evaluation Part 1 Classification With Decision tree](https://reader035.vdocuments.us/reader035/viewer/2022062421/56649c7b5503460f9492fb15/html5/thumbnails/44.jpg)
Decision Tree Based Classification
• Advantages
  - Inexpensive to construct
  - Extremely fast at classifying unknown records
  - Easy to interpret for small-sized trees
  - Accuracy is comparable to other classification techniques for many data sets
• Practical Issues of Classification
  - Underfitting and Overfitting
  - Missing Values
  - Costs of Classification