Fitting Models to Data
Linear and Quadratic Discriminant Analysis
Decision Trees
| Year | What | Notes | Who |
|------|------|-------|-----|
| 1963 | AID: Automatic Interaction Detector | Continuous | James Morgan, John Sonquist |
| 1973 | THAID: THeta AID | Categorical | James Morgan, Robert Messenger |
| 1980 | CHAID: CHi-Square AID | Multiple Splits | Kass |
| 1984 | CART: Classification and Regression Trees | Popular Approach | Leo Breiman |
| 1986 | Iterative Dichotomiser 3 (ID3) | Categorical | Ross Quinlan |
| 1994 | C4.5 Algorithm | Continuous and Categorical | Ross Quinlan |
| 1994 | Bagging | Resampling | Leo Breiman |
|      | Boosting | Cascading Small Trees | Rob Schapire, Jerry Friedman |
| 2001 | Random Forests | Many trees | Leo Breiman, Adele Cutler |
AID: Automatic Interaction Detector
Association, Co-Occurrence
CHAID
CART: Classification and Regression Trees
The CART family is oriented to statistics and uses the concept of impurity, which measures how well the two classes are separated. Ideally we would like to separate all 0s from all 1s.
http://freakonometrics.hypotheses.org/1279
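The slide does not say which impurity measure is meant. As a minimal sketch, assuming the Gini index (one common choice in CART) and a single numeric predictor, the idea of choosing the split that best separates the 0s from the 1s could look like this. The function names and the toy data are illustrative, not taken from the slides.

```python
from collections import Counter

def gini(labels):
    """Gini impurity of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def split_impurity(xs, ys, threshold):
    """Size-weighted Gini impurity of the two groups produced by x <= threshold."""
    left = [y for x, y in zip(xs, ys) if x <= threshold]
    right = [y for x, y in zip(xs, ys) if x > threshold]
    n = len(ys)
    return len(left) / n * gini(left) + len(right) / n * gini(right)

def best_split(xs, ys):
    """Try midpoints between consecutive distinct x values (needs at least two)."""
    values = sorted(set(xs))
    candidates = [(a + b) / 2 for a, b in zip(values, values[1:])]
    return min(((t, split_impurity(xs, ys, t)) for t in candidates),
               key=lambda pair: pair[1])

# Toy data (made up for illustration): the classes separate cleanly around x = 4.5.
xs = [1.0, 2.0, 3.0, 6.0, 7.0, 8.0]
ys = [0, 0, 0, 1, 1, 1]
print(gini(ys))            # 0.5, a 50/50 mix is the most impure two-class node
print(best_split(xs, ys))  # (4.5, 0.0), a perfectly pure split
```

An impurity of 0 means every observation in a node belongs to one class, which is the ideal separation the slide describes.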
Fitting Models to Data
Overfitting
Bagging
• Builds multiple decision trees by repeatedly resampling the training data with replacement
• Fits a model to each sample
• Votes across the trees for a consensus prediction
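The three bagging steps can be sketched in a few lines. This is an illustration under stated assumptions, not the deck's implementation: each "tree" is a one-split stump on a single numeric predictor, and `fit_stump`, `bagging`, `predict`, the seed, and the toy data are all hypothetical names and values chosen for the example.

```python
import random
from collections import Counter

def fit_stump(xs, ys):
    """Fit a one-split classifier (a 'stump') by minimising training errors."""
    best = None
    for t in sorted(set(xs)):
        left = [y for x, y in zip(xs, ys) if x <= t]
        right = [y for x, y in zip(xs, ys) if x > t]
        # Predict the majority class on each side of the threshold.
        left_label = Counter(left).most_common(1)[0][0]
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = (sum(y != left_label for y in left)
                  + sum(y != right_label for y in right))
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label

def bagging(xs, ys, n_models=25, seed=0):
    """Fit one stump per bootstrap sample (drawn with replacement)."""
    rng = random.Random(seed)
    n = len(xs)
    models = []
    for _ in range(n_models):
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        models.append(fit_stump([xs[i] for i in idx], [ys[i] for i in idx]))
    return models

def predict(models, x):
    """Consensus prediction: majority vote across the fitted models."""
    return Counter(model(x) for model in models).most_common(1)[0][0]

# Toy data (made up for illustration).
xs = [1, 2, 3, 4, 6, 7, 8, 9]
ys = [0, 0, 0, 0, 1, 1, 1, 1]
models = bagging(xs, ys)
print(predict(models, 2.0), predict(models, 8.0))
```

An odd number of models avoids tied votes in the two-class case.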
Boosting
• Learns slowly
• Given the current model, we fit a decision tree to the residuals (misclassifications) from the model.
• We then add this new decision tree into the fitted function in order to update the residuals.
• Each of these trees can be rather small, with just a few terminal nodes, determined by the parameter d in the algorithm.
• By fitting small trees to the residuals, we slowly improve the fit in areas where it does not perform well.
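The residual-fitting loop described above can be sketched as follows. This is a minimal illustration for a numeric response, assuming d = 1 (each tree is a single-split stump) and a shrinkage factor that scales each tree's contribution. The shrinkage parameter, the function names, and the toy data are assumptions for this example and do not appear in the slides.

```python
def fit_regression_stump(xs, rs):
    """Least-squares stump: one split, predicting the mean residual on each side.

    Assumes xs contains at least two distinct values.
    """
    best = None
    values = sorted(set(xs))
    for a, b in zip(values, values[1:]):
        t = (a + b) / 2
        left = [r for x, r in zip(xs, rs) if x <= t]
        right = [r for x, r in zip(xs, rs) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def boost(xs, ys, n_trees=100, shrinkage=0.1):
    """Repeatedly fit a small tree to the current residuals and add it to the model."""
    trees = []
    residuals = list(ys)  # the starting model predicts 0, so residuals equal y
    for _ in range(n_trees):
        tree = fit_regression_stump(xs, residuals)
        trees.append(tree)
        # Add a shrunken copy of the new tree, then update the residuals.
        residuals = [r - shrinkage * tree(x) for x, r in zip(xs, residuals)]
    return lambda x: shrinkage * sum(tree(x) for tree in trees)

def mse(model, xs, ys):
    """Mean squared training error of a fitted model."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (made up for illustration): y = x squared.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [x ** 2 for x in xs]
print(mse(boost(xs, ys, n_trees=10), xs, ys))
print(mse(boost(xs, ys, n_trees=200), xs, ys))
```

A small shrinkage value is what makes the procedure learn slowly: each tree corrects only a fraction of the remaining error, so many small trees are needed and the training error falls gradually.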
Random Forests
http://www.stat.berkeley.edu/~breiman/RandomForests/
Gradient Boosting
Many Algorithms
Decision Trees
• rpart (CART)
• tree (CART)
• ctree (conditional inference tree)
• CHAID (chi-squared automatic interaction detection)
• evtree (evolutionary algorithm)
• mvpart (multivariate CART)
• knnTree (nearest-neighbor-based trees)
• RWeka (J4.8, M5', LMT)
• LogicReg (Logic Regression)
• BayesTree
• TWIX (with extra splits)
• party (conditional inference trees, model-based trees)
Random Forests
• randomForest (CART-based random forests)
• randomSurvivalForest (for censored responses)
• party (conditional random forests)
• gbm (tree-based gradient boosting)
• mboost (model-based and tree-based gradient boosting)