Ensemble Learning and the Heritage Health Prize
Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers
iCAMP 2012, University of California, Irvine
Advisors: Dr. Max Welling, Dr. Alexander Ihler, Sungjin Ahn, and Qiang Liu
December 4, 2013
Outline
- Heritage Health Prize
  - Data
  - Evaluation
- Our Approach
  - Motivation
  - Models
  - Blending
- Results
The Heritage Health Prize
- Goal: Identify patients who will be admitted to a hospital within the next year, using historical claims data [1]
- 1,659 teams
Purpose
- Reduce the cost of unnecessary hospital admissions per year
- Identify at-risk patients earlier
Evaluation
Root Mean Squared Logarithmic Error (RMSLE)
\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\log(p_i + 1) - \log(a_i + 1)\right]^2}

where p_i is the predicted value and a_i the actual value for patient i.

Threshold: \varepsilon \le 0.4
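A minimal sketch of this metric in Python (the variable names are assumptions):

```python
import numpy as np

def rmsle(pred, actual):
    """Root Mean Squared Logarithmic Error, as defined above."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((np.log1p(pred) - np.log1p(actual)) ** 2))

# A submission meets the milestone threshold when rmsle(pred, actual) <= 0.4.
```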
Our Approach
- Motivation
- Individual Models
- Optimized Ensemble
Blending
Blend several predictors to create a more accurate predictor. Motivated by the solution to the Netflix Prize [3].
Prediction Models
- Preprocessing: Feature Selection
- K-Nearest Neighbors
- Random Forests
- Gradient Boosting Machines
- Logistic Regression
- Support Vector Regression
- Neural Networks
Feature Selection
- Used the Market Makers method [2]
- Reduced each patient to a vector of 139 features
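A rough sketch of the per-patient aggregation this kind of feature construction involves, using pandas; the file and column names (Claims.csv, MemberID, Specialty) are illustrative only, and the exact 139 features are described in [2]:

```python
import pandas as pd

# Hypothetical claims table: one row per claim, keyed by MemberID.
claims = pd.read_csv("Claims.csv")

# Collapse claim-level rows into one fixed-length vector per patient:
# an overall claim count plus per-category counts such as claims per
# provider specialty.
n_claims = claims.groupby("MemberID").size().rename("n_claims")
specialty_counts = pd.crosstab(claims["MemberID"], claims["Specialty"])
features = pd.concat([n_claims, specialty_counts], axis=1).fillna(0)
```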
Kitchen Sink
- Generate a large number of features
- Need a method for eliminating useless features
- Compare distributions of each feature in the test and training sets
- Basis pursuit
Basis Pursuit
- Find the best generated feature (the one that reduces error the most)
- Add it to the feature set
- Repeat until enough useful features are found
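A minimal sketch of this greedy loop; the inner model (plain linear regression), the validation split, and the stopping rule are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def forward_select(candidates, X_train, y_train, X_val, y_val, max_features=20):
    """Greedy 'basis pursuit' loop from the slide: repeatedly add the
    candidate feature that reduces validation error the most."""
    selected = []
    while len(selected) < max_features:
        best_feat, best_err = None, np.inf
        for feat in candidates:
            if feat in selected:
                continue
            cols = selected + [feat]
            model = LinearRegression().fit(X_train[cols], y_train)
            err = mean_squared_error(y_val, model.predict(X_val[cols])) ** 0.5
            if err < best_err:
                best_feat, best_err = feat, err
        selected.append(best_feat)
    return selected
```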
K-Nearest Neighbors
- Weighted average of the closest neighbors
- Very slow

Neighbors: k = 1000
RMSLE: 0.475197 (996th place)
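A hedged sketch using scikit-learn; X_train, y_train, and X_test are assumed to hold the 139-feature vectors and the (log-transformed) targets:

```python
from sklearn.neighbors import KNeighborsRegressor

# k = 1000 neighbors, distance-weighted average of their targets.
knn = KNeighborsRegressor(n_neighbors=1000, weights="distance")
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
```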
Decision Trees
Random Forests
RMSLE: 0.464918 (469th place)
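A hedged scikit-learn sketch; the forest size and other hyperparameters are illustrative, since the slide does not list them:

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
```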
Gradient Boosting Machines
Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum observations = 100

RMSLE: 0.462998 (325th place)
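A hedged scikit-learn sketch with the slide's hyperparameters; mapping "minimum observations" onto min_samples_leaf is an assumption:

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=8000,     # Trees = 8000
    learning_rate=0.002,   # Shrinkage = 0.002
    max_depth=7,           # Depth = 7
    min_samples_leaf=100,  # Minimum observations = 100
)
gbm.fit(X_train, y_train)
pred = gbm.predict(X_test)
```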
Logistic Regression
- Optimized with gradient descent

RMSLE: 0.466726 (580th place)
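A minimal sketch of logistic regression fit by batch gradient descent; how the day counts were mapped into [0, 1] targets is not stated on the slide, so the target scaling here is an assumption:

```python
import numpy as np

def logistic_regression_gd(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the logistic (cross-entropy) loss.
    y is assumed to be scaled into [0, 1]."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # add a bias column
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)        # gradient step
    return w
```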
Support Vector Regression
\varepsilon = 0.02
RMSLE: 0.467152 (629th place)
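A hedged scikit-learn sketch; only epsilon = 0.02 comes from the slide, and the kernel choice is an assumption:

```python
from sklearn.svm import SVR

svr = SVR(kernel="rbf", epsilon=0.02)
svr.fit(X_train, y_train)
pred = svr.predict(X_test)
```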
Neural Networks
Number of hidden neurons = 7
Number of cycles = 3000

RMSLE: 0.465705 (511th place)
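A hedged scikit-learn sketch; mapping "cycles" onto max_iter, and the solver defaults, are assumptions:

```python
from sklearn.neural_network import MLPRegressor

nn = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000, random_state=0)
nn.fit(X_train, y_train)
pred = nn.predict(X_test)
```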
Individual Predictors (Summary)
- K-Nearest Neighbors: 0.475197 (996th place)
- Support Vector Regression: 0.467152 (629th place)
- Logistic Regression: 0.466726 (580th place)
- Neural Networks: 0.465705 (511th place)
- Random Forests: 0.464918 (469th place)
- Gradient Boosting Machines: 0.462998 (325th place)
The Blending Equation
Express the blended prediction \hat{X} as a linear combination of the individual predictors:

\hat{X} = Xw

Minimize the cost function

C = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{X}_i\right)^2
The Blending Equation
Optimizing the predictors' weights:

w = (Y^{\top}X)(X^{\top}X)^{-1}

For each predictor c, Y^{\top}X_c can be recovered from leaderboard feedback:

Y^{\top}X_c = \tfrac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + \sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n}(X_{ic} - Y_i)^2\right) = \tfrac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\right)

Y = actual values (unknown)
X = our predictions (known)
\varepsilon_c = leaderboard feedback for predictor c (known); \varepsilon_0 = feedback for an all-zero submission
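A minimal sketch of solving for the blend weights from leaderboard feedback alone; the assumption that \varepsilon_0 comes from an all-zero submission, and that predictions and targets are kept in log(x + 1) space so the squared-error identity applies, follows the reconstruction above:

```python
import numpy as np

def blend_weights(X, eps, eps0):
    """w = (Y^T X)(X^T X)^{-1}, with Y^T X recovered from feedback.

    X    : (n, m) matrix; column c holds predictor c's submitted values
    eps  : length-m array of leaderboard errors for each predictor
    eps0 : leaderboard error of an all-zero submission (assumption)
    """
    n = X.shape[0]
    ytx = 0.5 * ((X ** 2).sum(axis=0) + n * eps0 ** 2 - n * eps ** 2)
    # X^T X is symmetric, so solving (X^T X) w = (Y^T X)^T gives the same w.
    return np.linalg.solve(X.T @ X, ytx)

# Blended submission: blended = X_new @ blend_weights(X_leaderboard, eps, eps0)
```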
Blending Results
RMSLE: 0.461432
88th place on the final milestone leaderboard
Results (Summary)
- K-Nearest Neighbors: 0.475197 (996th place)
- Support Vector Regression: 0.467152 (629th place)
- Logistic Regression: 0.466726 (580th place)
- Neural Networks: 0.465705 (511th place)
- Random Forests: 0.464918 (469th place)
- Gradient Boosting Machines: 0.462998 (325th place)
- Blending: 0.461432 (88th place)
Observations and Problems
- Blends with fewer repeated classifiers worked better
- Overfitting to the leaderboard feedback
- Test and training data were inconsistent
Future Work
- Optimize the blending equation with a regularization constant (see the sketch below):

  w = (Y^{\top}X)(X^{\top}X + \lambda I)^{-1}

- More predictors
- Adjust for changes over time
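A small sketch of this regularized variant, reusing the Y^T X vector recovered in the earlier blending sketch; the value of lambda is illustrative:

```python
import numpy as np

def blend_weights_ridge(X, ytx, lam=1.0):
    """Regularized blend: w = (Y^T X)(X^T X + lam * I)^{-1}."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), ytx)
```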
Questions
References
[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp
[2] Phil Brierley, David Vogel, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.
[3] Andreas Töscher and Michael Jahrer. The BigChaos Solution to the Netflix Grand Prize. September 2009.