Machine Learning Workshop
DESCRIPTION
Presentation on Decision Trees and Random Forests given for the Boston Predictive Analytics Machine Learning Workshop on December 2, 2012. Code to accompany the slides is available at www.github.com/dgerlanc/mlclass or http://www.enplusadvisors.com/wp-content/uploads/2012/12/mlclass_1.0.tar.gz
TRANSCRIPT
Hands-On Classification: Decision Trees and Random Forests
Daniel Gerlanc, Managing Director
Enplus Advisors, [email protected]
Predictive Analytics Meetup Group
Machine Learning Workshop
December 2, 2012
© Daniel Gerlanc, 2012. All rights reserved.
If you’d like to use this material for any purpose, please contact [email protected]
What You’ll Learn
•Intuition behind decision trees and random forests
•Implementation in R
•Assessing the results
Dataset
•Chemical Analysis of Italian Wines
•http://www.parvus.unige.it/
•178 records, 14 attributes
Follow along
> library(mlclass)
> data(wine)
> str(wine)
'data.frame': 178 obs. of 14 variables:
 $ Type       : Factor w/ 2 levels "Grig","No": 2 2 2 2 2 2 2 2 2 2 ...
 $ Alcohol    : num 14.2 13.2 13.2 14.4 13.2 ...
 $ Malic      : num 1.71 1.78 2.36 1.95 2.59 1.76 1.87 2.15 1.64 1.35 ...
 $ Ash        : num 2.43 2.14 2.67 2.5 2.87 2.45 2.45 2.61 2.17 2.27 ...
 $ Alcalinity : num 15.6 11.2 18.6 16.8 21 15.2 14.6 17.6 14 16 ...
What are Decision Trees?
•Model for partitioning an input space
What’s partitioning?
See rf-1.R
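rf-1.R itself isn't reproduced in the transcript; a minimal sketch of this kind of scatter plot, where the choice of the Alcohol and Malic predictors and the split value 12.8 are illustrative assumptions rather than the workshop's actual plot:

```r
# Plot two predictors, coloring points by wine type, and overlay a
# candidate vertical split (the split value 12.8 is illustrative).
plot(wine$Alcohol, wine$Malic, col = wine$Type,
     xlab = "Alcohol", ylab = "Malic", pch = 19)
abline(v = 12.8, lty = 2)
```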
Create the 1st split.
(plot: the split divides the points into regions labeled "G" and "Not G")
See rf-1.R
Create the 2nd Split
(plot: regions now labeled "G", "Not G", "G")
See rf-1.R
Create more splits…
(plot: four regions labeled "G", "Not G", "G", "Not G")
I drew this one in.
Another view of partitioning
See rf-2.R
Use R to do the partitioning.
library(rpart)
library(rpart.plot)
tree.1 <- rpart(Type ~ ., data=wine)
prp(tree.1, type=4, extra=2)
• See the 'rpart' and 'rpart.plot' R packages.
• Many parameters are available to control the fit.
See rf-2.R
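As one hedged example of those fit-control parameters (the values and the tree.2 name are illustrative, not from the workshop code), the tree can be constrained via rpart.control:

```r
# Illustrative: require at least 20 records to attempt a split, prune
# with a complexity parameter of 0.01, and cap the depth at 5.
ctrl <- rpart.control(minsplit = 20, cp = 0.01, maxdepth = 5)
tree.2 <- rpart(Type ~ ., data = wine, control = ctrl)
```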
Make predictions on a test dataset
predict(tree.1, newdata=wine, type="vector")
How’d it do?
Guessing: 60.11%
CART: 94.38% Accuracy
• Precision: 92.95% (66 / 71)
• Sensitivity/Recall: 92.95% (66 / 71)
|                | Actual Grig | Actual No |
|----------------|-------------|-----------|
| Predicted Grig | 66          | 5         |
| Predicted No   | 5           | 102       |
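The slides don't show the computation behind this table; a minimal sketch that produces the same kind of summary, assuming predictions on the training data as in the predict() call above:

```r
# Confusion matrix and accuracy; type = "class" returns factor labels
# rather than the numeric codes given by type = "vector".
pred <- predict(tree.1, newdata = wine, type = "class")
cm <- table(Predicted = pred, Actual = wine$Type)
sum(diag(cm)) / sum(cm)   # overall accuracy
```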
Decision Tree Problems
•Overfitting the data
•May not use all relevant features
•Perpendicular decision boundaries
Random Forests
One Decision Tree
Many Decision Trees (Ensemble)
Random Forest Fixes
•Overfitting the data
•May not use all relevant features
•Perpendicular decision boundaries
Building RF
For each tree:
Sample from the data
At each split, sample from the available variables
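A toy version of this recipe, assuming rpart is available (note the simplification: real random forests resample the candidate variables at every split, whereas this sketch samples columns once per tree; grow_tree is a hypothetical helper):

```r
library(rpart)

# Grow one tree of a toy forest: bootstrap the rows, then fit a tree
# on a random subset of the predictors.
grow_tree <- function(data, mtry) {
  rows <- sample(nrow(data), replace = TRUE)          # bootstrap sample
  cols <- sample(setdiff(names(data), "Type"), mtry)  # random predictors
  rpart(reformulate(cols, "Type"), data = data[rows, ])
}

trees <- replicate(25, grow_tree(wine, mtry = 3), simplify = FALSE)
```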
Bootstrap Sampling
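A bootstrap sample draws n rows with replacement, so on average about 63.2% of the original rows appear at least once (the figure quoted for sampsize later in the deck). A quick check:

```r
# Draw a bootstrap sample of row indices and measure coverage.
idx <- sample(nrow(wine), replace = TRUE)
mean(seq_len(nrow(wine)) %in% idx)   # roughly 0.632
```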
Sample Attributes at each split
Motivations for RF
•Create uncorrelated trees
•Variance reduction
•Subspace exploration
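One standard way to make the variance-reduction point precise (a textbook result, not from the slides): for $B$ identically distributed trees, each with variance $\sigma^2$ and pairwise correlation $\rho$, the variance of the ensemble average is

$$\operatorname{Var}(\bar{T}) = \rho\sigma^2 + \frac{1-\rho}{B}\,\sigma^2$$

Growing more trees shrinks the second term toward zero, while bootstrapping and per-split variable sampling lower $\rho$, and with it the irreducible floor $\rho\sigma^2$.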
Random Forests
library(randomForest)
rffit.1 <- randomForest(Type ~ ., data=wine)
See rf-3.R
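rf-3.R isn't reproduced in the transcript; two quick ways to inspect the fit above (a sketch, not necessarily what rf-3.R does):

```r
print(rffit.1)       # OOB error estimate and confusion matrix
varImpPlot(rffit.1)  # which variables the forest leans on
```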
RF Parameters in R
Most important parameters are:

| Variable | Description | Default |
|----------|-------------|---------|
| ntree | Number of trees | 500 |
| mtry | Number of variables to randomly select at each node | square root of # predictors for classification; # predictors / 3 for regression |
| nodesize | Minimum number of records in a terminal node | 1 for classification; 5 for regression |
| sampsize | Number of records to select in each bootstrap sample | 63.2% |
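For example, a hedged sketch overriding a few of these defaults (the values and the rffit.2 name are chosen for illustration only):

```r
# Illustrative: more trees, wider variable sampling, larger leaves.
rffit.2 <- randomForest(Type ~ ., data = wine,
                        ntree = 1000, mtry = 4, nodesize = 5)
```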
How’d it do?
Guessing Accuracy: 60.11%
Random Forest: 98.31% Accuracy
• Precision: 95.77% (68 / 71)
• Sensitivity/Recall: 100% (68 / 68)
|                | Actual Grig | Actual No |
|----------------|-------------|-----------|
| Predicted Grig | 68          | 3         |
| Predicted No   | 0           | 107       |
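A sketch of computing these metrics from the fit; note that predict() with no newdata returns out-of-bag predictions, and whether the slide used OOB or training-set predictions is an assumption:

```r
cm <- table(Predicted = predict(rffit.1), Actual = wine$Type)
cm[1, 1] / sum(cm[1, ])   # precision for "Grig"
cm[1, 1] / sum(cm[, 1])   # sensitivity/recall for "Grig"
```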
Tuning RF: Grid Search
See rf-4.R
(plot annotation: "This is the default.")
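rf-4.R isn't reproduced in the transcript; a minimal hand-rolled grid search scored by out-of-bag error might look like this (the grids over mtry and nodesize are illustrative):

```r
grid <- expand.grid(mtry = 2:6, nodesize = c(1, 5, 10))
grid$oob <- sapply(seq_len(nrow(grid)), function(i) {
  fit <- randomForest(Type ~ ., data = wine,
                      mtry = grid$mtry[i], nodesize = grid$nodesize[i])
  fit$err.rate[fit$ntree, "OOB"]   # final out-of-bag error rate
})
grid[which.min(grid$oob), ]        # best parameter combination
```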
Tuning is Expensive
•Grid size grows multiplicatively with each tuning parameter: e.g., 5 values of mtry × 3 of nodesize = 15 fits
•Repeated model fitting in cross-validation multiplies the cost again: with 10-fold CV, that's 15 × 10 = 150 fits
Benefits of RF
•Good performance with default settings
•Relatively easy to make parallel
•Many implementations
•R, Weka, RapidMiner, Mahout
References
• Liaw, A. and M. Wiener (2002). Classification and Regression by randomForest. R News 2(3), 18-22.
• Breiman, Leo. Classification and Regression Trees. Belmont, Calif.: Wadsworth International Group, 1984. Print.
• Breiman, Leo, and Adele Cutler. Random Forests. http://www.stat.berkeley.edu/~breiman/RandomForests/cc_contact.htm