Classification with Decision Trees II
Instructor: Qiang Yang, Hong Kong University of Science and Technology (Qyang@cs.ust.hk)
Thanks: Eibe Frank and Jiawei Han
Part II: Industrial-strength algorithms
Requirements for an algorithm to be useful in a wide range of real-world applications:
• Can deal with numeric attributes
• Doesn't fall over when missing values are present
• Is robust in the presence of noise
• Can (at least in principle) approximate arbitrary concept descriptions
Basic schemes (may) need to be extended to fulfill these requirements
Decision trees
Extending ID3 to deal with numeric attributes: pretty straightforward
Dealing sensibly with missing values: a bit trickier
Stability for noisy data: requires sophisticated pruning mechanism
End result of these modifications: Quinlan’s C4.5
Best-known and (probably) most widely-used learning algorithm
Commercial successor: C5.0
Numeric attributes
Standard method: binary splits (e.g. temp < 45)
Difference to nominal attributes: every attribute offers many possible split points
Solution is a straightforward extension:
• Evaluate info gain (or another measure) for every possible split point of the attribute
• Choose the "best" split point
• Info gain for the best split point is the info gain for the attribute
Computationally more demanding
An example
Split on temperature attribute from weather data:

64   65  68  69  70  71  72  72  75  75  80  81  83  85
Yes  No  Yes Yes Yes No  No  Yes Yes Yes No  Yes Yes No

E.g. 4 yeses and 2 nos for temperature < 71.5, and 5 yeses and 3 nos for temperature >= 71.5:

Info([4,2],[5,3]) = (6/14) info([4,2]) + (8/14) info([5,3]) = 0.939 bits

Split points are placed halfway between values
All split points can be evaluated in one pass!
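The info computation for this split can be reproduced in a few lines; an illustrative Python sketch, not C4.5's actual code:

```python
from math import log2

def entropy(counts):
    """Entropy (in bits) of a class distribution given as counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def split_info(left, right):
    """Weighted average entropy after a binary split: info([left],[right])."""
    n = sum(left) + sum(right)
    return (sum(left) / n) * entropy(left) + (sum(right) / n) * entropy(right)

# temperature < 71.5 gives [4 yes, 2 no]; temperature >= 71.5 gives [5 yes, 3 no]
print(round(split_info([4, 2], [5, 3]), 3))  # 0.939
```

The same two functions score every candidate split point; the point with the lowest info (highest gain) wins.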
Avoiding repeated sorting
Instances need to be sorted according to the values of the numeric attribute considered
Time complexity for sorting: O(n log n)
Does this have to be repeated at each node? No! The sort order from the parent node can be used to derive the sort order for the children
Time complexity of derivation: O(n)
Only drawback: need to create and store an array of sorted indices for each numeric attribute
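How the children's sort orders fall out of the parent's in a single pass can be sketched as follows (the `goes_left` mapping is a hypothetical helper, not from the slides):

```python
def child_sort_orders(sorted_idx, goes_left):
    """Derive the children's sorted index arrays from the parent's
    in one O(n) pass: a stable filter preserves the sort order."""
    left = [i for i in sorted_idx if goes_left[i]]
    right = [i for i in sorted_idx if not goes_left[i]]
    return left, right

# parent's order by some numeric attribute; goes_left[i] gives instance i's branch
order = [3, 0, 2, 1]
side = {3: True, 0: False, 2: True, 1: False}
print(child_sort_orders(order, side))  # ([3, 2], [0, 1])
```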
Notes on binary splits
Information in nominal attributes is computed using one multi-way split on that attribute
This is not the case for binary splits on numeric attributes
The same numeric attribute may be tested several times along a path in the decision tree
Disadvantage: tree is relatively hard to read
Possible remedies: pre-discretization of numeric attributes, or multi-way splits instead of binary ones
Example of Binary Split

A single path may test the same numeric attribute repeatedly, e.g.:
Age <= 3, then Age <= 5, then Age <= 10
Missing values
C4.5 splits instances with missing values into pieces (with weights summing to 1)
A piece going down a particular branch receives a weight proportional to the popularity of the branch
Info gain etc. can be used with fractional instances using sums of weights instead of counts
During classification, the same procedure is used to split instances into pieces
Probability distributions are merged using weights
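The branch-weight computation for an instance with a missing value can be sketched in a few lines (hypothetical helper name, illustrating the scheme rather than C4.5's code):

```python
def fractional_weights(branch_sizes):
    """Weight of each piece of a missing-value instance,
    proportional to the popularity (size) of each branch."""
    total = sum(branch_sizes)
    return [s / total for s in branch_sizes]

# branches reached by 5, 3 and 2 instances with known values
print(fractional_weights([5, 3, 2]))  # [0.5, 0.3, 0.2]
```

The weights sum to 1, so info gain can be computed with sums of weights in place of counts.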
Stopping Criteria
When all cases have the same class. The leaf node is labeled by this class.
When there is no available attribute. The leaf node is labeled by the majority class.
When the number of cases is less than a specified threshold. The leaf node is labeled by the majority class.
Pruning
Pruning simplifies a decision tree to prevent overfitting to noise in the data
Two main pruning strategies:
1. Postpruning: takes a fully-grown decision tree and discards unreliable parts
2. Prepruning: stops growing a branch when information becomes unreliable
Postpruning is preferred in practice because of early stopping in prepruning
Prepruning
Usually based on a statistical significance test
Stops growing the tree when there is no statistically significant association between any attribute and the class at a particular node
Most popular test: chi-squared test
ID3 used the chi-squared test in addition to information gain
• Only statistically significant attributes were allowed to be selected by the information gain procedure
The Weather example: Observed Count

Outlook    Play=Yes  Play=No  Outlook Subtotal
Sunny         2         0           2
Cloudy        0         1           1
Play Subtotal 2         1      Total count in table = 3
The Weather example: Expected Count

Outlook    Play=Yes        Play=No         Subtotal
Sunny      2*2/3 = 1.33    2*1/3 = 0.67       2
Cloudy     1*2/3 = 0.67    1*1/3 = 0.33       1
Subtotal   2               1            Total count in table = 3

If the attributes were independent, the counts would look like this (expected count = row subtotal * column subtotal / total).
Question: How different are the observed and expected counts?

• If the chi-squared value is very large, then A1 and A2 are not independent, that is, they are dependent!
• Degrees of freedom: if the table has n*m items, then freedom = (n-1)*(m-1)
• If all attributes in a node are independent of the class attribute, then stop splitting further.
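The chi-squared statistic for a contingency table can be computed directly; a small Python sketch (for the 2x2 weather table above, df = (2-1)*(2-1) = 1):

```python
def chi_squared(observed):
    """Chi-squared statistic for an n x m contingency table:
    sum over cells of (observed - expected)^2 / expected,
    with expected = row_total * col_total / grand_total."""
    row = [sum(r) for r in observed]
    col = [sum(c) for c in zip(*observed)]
    n = sum(row)
    return sum((observed[i][j] - row[i] * col[j] / n) ** 2 / (row[i] * col[j] / n)
               for i in range(len(row)) for j in range(len(col)))

# observed table from the slides: Sunny (2 yes, 0 no), Cloudy (0 yes, 1 no)
print(round(chi_squared([[2, 0], [0, 1]]), 2))  # 3.0
```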
Postpruning
Builds the full tree first and prunes it afterwards
• Attribute interactions are visible in the fully-grown tree
Problem: identification of subtrees and nodes that are due to chance effects
Two main pruning operations:
1. Subtree replacement
2. Subtree raising
Possible strategies: error estimation, significance testing, MDL principle
Subtree replacement
Bottom-up: tree is considered for replacement once all its subtrees have been considered
Subtree raising
Deletes a node and redistributes its instances
Slower than subtree replacement (worthwhile?)
Estimating error rates
Pruning operation is performed if this does not increase the estimated error
Of course, error on the training data is not a useful estimator (would result in almost no pruning)
One possibility: using hold-out set for pruning (reduced-error pruning)
C4.5’s method: using upper limit of 25% confidence interval derived from the training data
Standard Bernoulli-process-based method
Training Set

Outlook   Temperature  Humidity  Windy  Class
sunny     hot          high      false  N
sunny     hot          high      true   N
overcast  hot          high      false  P
rain      mild         high      false  P
rain      cool         normal    false  P
rain      cool         normal    true   N
overcast  cool         normal    true   P
sunny     mild         high      false  N
sunny     cool         normal    false  P
rain      mild         normal    false  P
sunny     mild         normal    true   P
overcast  mild         high      true   P
overcast  hot          normal    false  P
rain      mild         high      true   N
Post-pruning in C4.5

Bottom-up pruning: at each non-leaf node v, if merging the subtree at v into a leaf node improves accuracy, perform the merging.
Method 1: compute accuracy using examples not seen by the algorithm.
Method 2: estimate accuracy using the training examples:
• Consider classifying E examples incorrectly out of N examples as observing E events in N trials of a binomial distribution.
• For a given confidence level CF, the upper limit on the error rate over the whole population is U_CF(E, N), with CF% confidence.
Usage in Statistics: Sampling error estimation

Example:
• population: 1,000,000 people, could be regarded as infinite
• population mean: percentage of left-handed people
• sample: 100 people
• sample mean: 6 left-handed
How to estimate the REAL population mean?

Pessimistic Estimate
[Figure: sampling distribution with lower limit L0.25(100,6), the sample value 6, and upper limit U0.25(100,6); the 75% confidence interval lies between the two limits.]
Usage in Decision Tree (DT): error estimation for some node in the DT

Example:
• unknown testing data: could be regarded as an infinite universe
• population mean: percentage of errors made by this node
• sample: 100 examples from the training data set
• sample mean: 6 errors on the training data set
How to estimate the REAL average error rate?

Pessimistic Estimate
[Figure: same distribution; the pessimistic estimate takes the upper limit U0.25(100,6) of the 75% confidence interval.]

Heuristic! But works well...
C4.5's method

Error estimate for a subtree is the weighted sum of the error estimates for all its leaves
Error estimate for a node:

  e = ( f + z^2/(2N) + z * sqrt( f/N - f^2/N + z^2/(4N^2) ) ) / ( 1 + z^2/N )

If c = 25% then z = 0.69 (from the normal distribution)
f is the error on the training data
N is the number of instances covered by the leaf
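The pessimistic estimate can be checked numerically; a sketch with z = 0.69 for c = 25%, reproducing the e = 0.47 value for a leaf with f = 0.33 and N = 6 used in the pruning example on a later slide:

```python
from math import sqrt

def pessimistic_error(f, N, z=0.69):
    """Upper confidence limit on the true error rate, given training
    error f on N instances; z = 0.69 corresponds to c = 25%."""
    return (f + z * z / (2 * N)
            + z * sqrt(f / N - f * f / N + z * z / (4 * N * N))) / (1 + z * z / N)

# a leaf with f = 2/6 training error on N = 6 instances
print(round(pessimistic_error(2 / 6, 6), 2))  # 0.47
```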
Example for Estimating Error

Consider a subtree rooted at Outlook with 3 leaf nodes:
• Sunny: Play = yes (0 errors, 6 instances)
• Overcast: Play = yes (0 errors, 9 instances)
• Cloudy: Play = no (0 errors, 1 instance)

U0.25(6,0) = 0.206,  U0.25(9,0) = 0.143,  U0.25(1,0) = 0.750

The estimated error for this subtree is 6*0.206 + 9*0.143 + 1*0.750 = 3.273

If the subtree is replaced with the leaf "yes", the estimated error is
16 * U0.25(16,1) = 16 * 0.157 = 2.512

Since 2.512 < 3.273, the pruning is performed and the tree is merged (see next page)
Example continued

Before pruning:
  Outlook
  ├─ sunny → yes
  ├─ overcast → yes
  └─ cloudy → no

After pruning: the subtree is replaced by the single leaf "yes"
Example

Three leaves: f = 0.33, e = 0.47; f = 0.5, e = 0.72; f = 0.33, e = 0.47
Parent node: f = 5/14, e = 0.46
Combined using ratios 6:2:6 this gives 0.51
Since the combined estimate 0.51 exceeds the parent's 0.46, the subtree is pruned
Complexity of tree induction

Assume m attributes, n training instances, and a tree depth of O(log n)
• Cost for building a tree: O(mn log n)
• Complexity of subtree replacement: O(n)
• Complexity of subtree raising: O(n (log n)^2)
  - Every instance may have to be redistributed at every node between its leaf and the root: O(n log n) redistributions
  - Cost per redistribution (on average): O(log n)
• Total cost: O(mn log n) + O(n (log n)^2)
The CART Algorithm

Splitting criterion: standard deviation reduction

  SDR = sd(T) - sum_i ( |T_i| / |T| ) * sd(T_i)
  SD(T) = sum_{x in T} P(x) * (x - mean)^2

Linear model fitted at a node (for training instance (1)):

  y^(1) = w_0 x_0^(1) + w_1 x_1^(1) + w_2 x_2^(1) + ... + w_k x_k^(1) = sum_j w_j x_j^(1)

Least-squares weights:

  W = (X^T X)^(-1) X^T y
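For a single attribute, the least-squares solution W = (XᵀX)⁻¹Xᵀy reduces to two closed-form coefficients; an illustrative sketch, not the CART implementation:

```python
def least_squares(xs, ys):
    """Closed-form least-squares fit of y = w0 + w1*x (one attribute):
    w1 = cov(x, y) / var(x), w0 = mean(y) - w1 * mean(x)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    w1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    w0 = my - w1 * mx
    return w0, w1

# points lying exactly on y = 1 + 2x
print(least_squares([0, 1, 2, 3], [1, 3, 5, 7]))  # (1.0, 2.0)
```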
Numeric prediction

Counterparts exist for all schemes that we previously discussed
• Decision trees, rule learners, SVMs, etc.
All classification schemes can be applied to regression problems using discretization
• Prediction: weighted average of the intervals' midpoints (weighted according to class probabilities)
Regression is more difficult than classification (cf. percent correct vs. mean squared error)
Regression trees
Differences to decision trees:
• Splitting criterion: minimizing intra-subset variation
• Pruning criterion: based on a numeric error measure
• Leaf node predicts the average class value of the training instances reaching that node
Can approximate piecewise constant functions
Easy to interpret
More sophisticated version: model trees
Model trees
Regression trees with linear regression functions at each node
Linear regression is applied to the instances that reach a node after the full regression tree has been built
Only a subset of the attributes is used for LR
• Attributes occurring in the subtree (+ maybe attributes occurring in the path to the root)
Fast: the overhead for LR is not large because usually only a small subset of attributes is used in the tree
Smoothing

The naïve prediction method outputs the value of the LR model for the corresponding leaf node
Performance can be improved by smoothing predictions using internal LR models
• Predicted value is a weighted average of the LR models along the path from root to leaf
Smoothing formula:

  p' = (n*p + k*q) / (n + k)

where p is the prediction passed up from below, q is the value predicted by the model at this node, n is the number of training instances reaching the node below, and k is a smoothing constant
The same effect can be achieved by incorporating the internal models into the leaf nodes
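The smoothing step is one line of arithmetic; a sketch, where the default k = 15 is an assumption for illustration, not stated on the slides:

```python
def smooth(p, q, n, k=15):
    """Smoothed prediction p' = (n*p + k*q) / (n + k).
    p: prediction passed up from the node below; q: this node's model value;
    n: training instances reaching the node below; k: smoothing constant
    (the default 15 is an assumed value for illustration)."""
    return (n * p + k * q) / (n + k)

# with few instances below (n = 5), the node's own model q dominates
print(smooth(p=10.0, q=0.0, n=5))  # 2.5
```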
Building the tree

Splitting criterion: standard deviation reduction

  SDR = sd(T) - sum_i ( |T_i| / |T| ) * sd(T_i)

Termination criteria (important when building trees for numeric prediction):
• Standard deviation becomes smaller than a certain fraction of the sd for the full training set (e.g. 5%)
• Too few instances remain (e.g. fewer than four)
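SDR can be computed with the standard library; an illustrative sketch using the population standard deviation:

```python
from statistics import pstdev

def sdr(parent, subsets):
    """Standard deviation reduction: sd(T) - sum(|Ti|/|T| * sd(Ti))."""
    n = len(parent)
    return pstdev(parent) - sum(len(s) / n * pstdev(s) for s in subsets)

# a perfect split removes all variation: SDR equals the parent's sd
print(sdr([1, 1, 5, 5], [[1, 1], [5, 5]]))  # 2.0
```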
Pruning

Pruning is based on the estimated absolute error of the LR models
Heuristic estimate:

  (n + v) / (n - v) * average_absolute_error

where n is the number of instances and v the number of parameters in the model
LR models are pruned by greedily removing terms to minimize the estimated error
Model trees allow for heavy pruning: often a single LR model can replace a whole subtree
Pruning proceeds bottom-up: the error for the LR model at an internal node is compared to the error for the subtree
Nominal attributes

Nominal attributes are converted into binary attributes (that can be treated as numeric ones)
• Nominal values are sorted using the average class value
• If there are k values, k-1 binary attributes are generated
• The i-th binary attribute is 0 if an instance's value is one of the first i in the ordering, 1 otherwise
It can be proven that the best split on one of the new attributes is the best binary split on the original
But M5' only does the conversion once
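The conversion can be sketched as follows; the average class values in the example are made-up numbers for illustration:

```python
def nominal_to_binary(values, avg_class):
    """Encode a k-valued nominal attribute as k-1 binary attributes.
    Values are ordered by average class value; the i-th binary attribute
    (0-indexed) is 0 if the value is among the first i+1 in that
    ordering, 1 otherwise."""
    order = sorted(avg_class, key=avg_class.get)
    rank = {v: i for i, v in enumerate(order)}
    k = len(order)
    return [[0 if rank[v] <= i else 1 for i in range(k - 1)] for v in values]

# hypothetical averages: 'a' -> 0.2, 'b' -> 0.9, 'c' -> 1.4
print(nominal_to_binary(['a', 'c', 'b'], {'a': 0.2, 'b': 0.9, 'c': 1.4}))
# [[0, 0], [1, 1], [1, 0]]
```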
Missing values

Modified splitting criterion:

  SDR = (m / |T|) * [ sd(T) - sum_i ( |T_i| / |T| ) * sd(T_i) ]

where m is the number of instances without missing values for that attribute
Procedure for deciding into which subset an instance goes: surrogate splitting
• Choose the attribute for splitting that is most highly correlated with the original attribute
• Problem: complex and time-consuming
• Simple solution: always use the class
Testing: replace the missing value with the average
Pseudo-code for M5’
Four methods:
• Main method: MakeModelTree()
• Method for splitting: split()
• Method for pruning: prune()
• Method that computes error: subtreeError()
We'll briefly look at each method in turn
The linear regression method is assumed to perform attribute subset selection based on error
MakeModelTree()
MakeModelTree(instances) {
  SD = sd(instances)
  for each k-valued nominal attribute
    convert into k-1 synthetic binary attributes
  root = newNode
  root.instances = instances
  split(root)
  prune(root)
  printTree(root)
}
split()

split(node) {
  if sizeof(node.instances) < 4 or sd(node.instances) < 0.05*SD
    node.type = LEAF
  else
    node.type = INTERIOR
    for each attribute
      for all possible split positions of the attribute
        calculate the attribute's SDR
    node.attribute = attribute with maximum SDR
    split(node.left)
    split(node.right)
}
prune()
prune(node) {
  if node = INTERIOR then
    prune(node.leftChild)
    prune(node.rightChild)
    node.model = linearRegression(node)
    if subtreeError(node) > error(node)
      then node.type = LEAF
}
subtreeError()
subtreeError(node) {
  l = node.left; r = node.right
  if node = INTERIOR then
    return (sizeof(l.instances)*subtreeError(l)
          + sizeof(r.instances)*subtreeError(r)) / sizeof(node.instances)
  else
    return error(node)
}
Model tree for servo data [figure not reproduced in the transcript]
Variations of CART

Applying Logistic Regression:
• predict the probability of "True" or "False" instead of making a numerical-valued prediction
• predict a probability value (p) rather than the outcome itself
• the log odds are modeled as a linear function:

  log( p / (1 - p) ) = sum_i w_i x_i

  p = 1 / (1 + e^(-W*X))
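A minimal sketch of the logistic link between the linear score and the probability:

```python
from math import exp

def logistic_prob(x, w):
    """p = 1 / (1 + e^(-w.x)), so log(p / (1 - p)) is linear in x."""
    return 1.0 / (1.0 + exp(-sum(wi * xi for wi, xi in zip(w, x))))

# a zero score maps to probability 0.5
print(logistic_prob([0.0, 0.0], [1.0, 1.0]))  # 0.5
```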
Other Trees

• Classification trees (misclassification error)
  Current node: Q = min(p_0, p_1)
  Children nodes (L, R): Q = p_L * min(p_L0, p_L1) + p_R * min(p_R0, p_R1)
• Decision trees (entropy)
  Current node: Q = -p_0 log p_0 - p_1 log p_1
  Children nodes (L, R): Q = p_L * Q_L + p_R * Q_R
• GINI index used in CART (STD = sqrt(p_0 * p_1))
  Current node: Q = p_0 * p_1
  Children nodes (L, R): Q = p_L * Q_L + p_R * Q_R
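The three two-class impurity measures side by side (an illustrative sketch):

```python
from math import log2

def misclassification(p0, p1):
    """Misclassification error of a node with class proportions p0, p1."""
    return min(p0, p1)

def entropy_q(p0, p1):
    """Entropy impurity: -p0 log p0 - p1 log p1 (in bits)."""
    return -sum(p * log2(p) for p in (p0, p1) if p > 0)

def gini_q(p0, p1):
    """Gini-style impurity p0*p1 for two classes."""
    return p0 * p1

# all three peak at an evenly mixed node
print(misclassification(0.5, 0.5), entropy_q(0.5, 0.5), gini_q(0.5, 0.5))
# 0.5 1.0 0.25
```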
Efforts on Scalability

Most algorithms assume the data can fit in memory. Recent efforts focus on disk-resident implementations for decision trees:
• Random sampling
• Partitioning
Examples:
• SLIQ (EDBT'96 [MAR96])
• SPRINT (VLDB'96 [SAM96])
• PUBLIC (VLDB'98 [RS98])
• RainForest (VLDB'98 [GRG98])