LECTURE 19: TREE-BASED METHODS PT. 2
November 15, 2017
SDS 293: Machine Learning
Outline
✓ Basic mechanics of tree-based methods
✓ Classification example
✓ Choosing good splits
✓ Pruning
• How to avoid over-fitting
- Bootstrap aggregation (“bagging”)
- Random forests
- Boosting
• Lab
Flashback

[Diagram: two decision trees fit to the same training data end up with different splits — “Different!”]
Flashback

Problem: high variance
Discussion

• Question: what can we do to combat high variance?
• Answer: bootstrap and aggregate!
Bagging

• Big idea: use regular old bootstrapping to generate a bunch of sample training sets, build trees for each one, and average their resulting predictions (sketched below)

(Image courtesy of Rachopin on DeviantArt)
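Not from the slides: a minimal Python sketch of this idea on hypothetical toy data. The lab itself uses R's tree/randomForest/gbm packages; scikit-learn's BaggingRegressor wraps essentially the same loop.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # toy predictors (hypothetical data)
y = X[:, 0] + rng.normal(size=200)   # toy response

B = 100                              # number of bootstrapped training sets
trees = []
for b in range(B):
    idx = rng.integers(0, len(X), len(X))            # bootstrap sample, drawn with replacement
    trees.append(DecisionTreeRegressor().fit(X[idx], y[idx]))

# Bagged prediction: average the B trees' predictions
y_hat = np.mean([t.predict(X) for t in trees], axis=0)
```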
Estimating test error
• Fun fact: there is a straightforward way to estimate the test error of a bagged model, without needing to use a test set or cross-validate!
• Key: trees are repeatedly fit to bootstrapped subsets
- Each bagged tree only trains on ~2/3 of the observations
- The remaining ~1/3 are called the out-of-bag (OOB) observations
• Question: how does this help?
OOB error

[Diagram: a set of bagged trees, each labeled with whether it was trained on a given observation]

• Repeat for every observation: average to get MSE or classification error (sketched below)
• *With enough trees, this is essentially equivalent to LOO CV error
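As a hedged sketch (toy data again, not the lab's code): each observation is predicted using only the trees whose bootstrap samples left it out, which yields a test-error estimate without a held-out set.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # same kind of toy data as above
y = X[:, 0] + rng.normal(size=200)

n, B = len(X), 100
oob_sum, oob_count = np.zeros(n), np.zeros(n)

for b in range(B):
    idx = rng.integers(0, n, n)                      # bootstrap sample
    oob = np.setdiff1d(np.arange(n), idx)            # the ~1/3 left out of this tree
    tree = DecisionTreeRegressor().fit(X[idx], y[idx])
    oob_sum[oob] += tree.predict(X[oob])             # predict only where the tree never trained
    oob_count[oob] += 1

oob_pred = oob_sum / np.maximum(oob_count, 1)        # average each observation's OOB predictions
oob_mse = np.mean((y - oob_pred) ** 2)               # OOB estimate of the test MSE
```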
Measuring predictor importance

• With just one tree, it was easy to pick out the most important predictor (which one was it?)
Measuring predictor importance

• With lots of trees, we can’t just “read from the top”
• Instead, we can look at average reduction in RSS or Gini due to splits on a given predictor (we’ll see this in lab, and in the sketch below)
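A quick sketch of how this looks in code: scikit-learn exposes the mean impurity (RSS/Gini) reduction per predictor as feature_importances_, and R's randomForest offers importance() and varImpPlot(). Toy data as in the earlier sketches.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))        # toy predictors (hypothetical data)
y = X[:, 0] + rng.normal(size=200)

rf = RandomForestRegressor(n_estimators=500, random_state=0).fit(X, y)
for j, imp in enumerate(rf.feature_importances_):    # average impurity reduction per predictor
    print(f"predictor {j}: importance {imp:.3f}")
```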
Just one problem…

• One issue with bagging is that it sometimes gives us trees that are pretty highly correlated (why?)
• Averaging highly correlated values doesn’t reduce variance as much as averaging uncorrelated quantities

If we have one very strong predictor in the data set, most or all of the trees will use this predictor in the top split
Random forests
• Strange idea: what if each time we go to make a split, we randomly limit the choice to some subset of predictors?
- Only consider m out of p predictors each time we decide on a split
- Roughly (p−m)/p of the splits won’t even consider that bully predictor, so other predictors will have a chance
- Note: when m = p, this is just bagging! (see the sketch below)
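As a sketch: in scikit-learn the subset size m is the max_features argument (R's randomForest calls it mtry); everything else is the bagging loop from before.

```python
from sklearn.ensemble import RandomForestRegressor

# m = p: every split may consider any predictor -- this is plain bagging
bagging = RandomForestRegressor(n_estimators=500, max_features=None)

# m ≈ sqrt(p): each split sees a random subset, decorrelating the trees
forest = RandomForestRegressor(n_estimators=500, max_features="sqrt")
```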
Boosting

• Previous methods: generate a bunch of training sets, fit a tree on each one independently, and aggregate results
• Boosting works in a similar way, except that each tree is grown using information from previous trees
Boosting
• Big idea: fit each new tree using the residuals from the previous tree as the response
• A shrinkage parameter 𝜆 slows the process even more, allowing different-shaped trees to try to deal w/ residuals
• By fitting small trees to the residuals (i.e. variance we haven’t yet explained), we slowly* improve the model in areas where it makes mistakes
* in general, statistical learning approaches that learn slowly tend to perform well
Boosting: algorithm

1. Initialize f̂(x) = 0 and rᵢ = yᵢ for all i in the training set
2. For each b = 1, 2, …, B:
   a) Fit a tree f̂ᵇ with d splits (d + 1 terminal nodes) to the training data (X, r)
   b) Update f̂ by adding in a shrunken version of the new tree: f̂(x) ← f̂(x) + 𝜆f̂ᵇ(x)
   c) Update the residuals: rᵢ ← rᵢ − 𝜆f̂ᵇ(xᵢ)
3. Output the boosted model: f̂(x) = Σ 𝜆f̂ᵇ(x), summing over b = 1, 2, …, B
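A direct sketch of the algorithm above (toy data; B, d, and 𝜆 are illustrative choices, not values from the lecture). Step b) is realized implicitly here: we keep the list of trees and sum their shrunken predictions at the end.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))          # toy predictors (hypothetical data)
y = X[:, 0] + rng.normal(size=200)     # toy response

B, d, lam = 1000, 1, 0.01              # number of trees, splits per tree, shrinkage 𝜆
r = y.copy()                           # step 1: f̂(x) = 0, so residuals start as r_i = y_i
trees = []
for b in range(B):                     # step 2
    tree = DecisionTreeRegressor(max_leaf_nodes=d + 1).fit(X, r)  # a) fit a small tree to (X, r)
    trees.append(tree)                 # b) the model grows by this shrunken tree
    r -= lam * tree.predict(X)         # c) remove the shrunken tree's contribution from the residuals

def f_hat(X_new):                      # step 3: the boosted model is the sum of shrunken trees
    return lam * sum(t.predict(X_new) for t in trees)
```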
Takeaways
• Tree-based methods partition the predictor space into a number of simple regions, and use the average value of each region to make predictions
• While easy to interpret, trees typically won’t outperform other methods we’ve seen in terms of prediction accuracy
• Bagging, random forests, and boosting all try to fix this by growing multiple trees and using “consensus prediction”
• In lab, we will see that combining a large number of trees can result in dramatic improvements in prediction accuracy (at the expense of some loss in interpretability)
Lab: bagging, random forests, & boosting
• To do today’s lab in R: tree, randomForest, gbm
• To do today’s lab in Python: graphviz
• Instructions and code:
[course website]/labs/lab14-r.html
[course website]/labs/lab14-py.html
• Full version can be found beginning on p. 324 of ISLR
Coming up
• A7 due tonight by 11:59pm
• Next week: final project workshop
• After break: support vector machines