Ensemble Learning and the Heritage Health Prize
Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers
iCAMP 2012, University of California, Irvine
Advisors: Max Welling, Alexander Ihler, Sungjin Ahn, and Qiang Liu
August 14, 2012
The Heritage Health Prize
I Goal: Identify patients who will be admitted to a hospital within the next year, using historical claims data. [1]
I 1,250 teams
Purpose
I Reduce the cost of unnecessary hospital admissions per year
I Identify at-risk patients earlier
Kaggle
I Public online competitions
I Gives feedback on prediction models
Data
I Provided through Kaggle
I Three years of patient data
I Two years include days spent in hospital (training set)
Evaluation
Root Mean Squared Logarithmic Error (RMSLE)
\[
\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl[\log(p_i + 1) - \log(a_i + 1)\bigr]^2}
\]
where p_i is the predicted and a_i the actual number of days in hospital for patient i.
Threshold: ε ≤ 0.4
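For concreteness, a minimal NumPy sketch of this metric (added for illustration; the function and array names are ours):

import numpy as np

def rmsle(predicted, actual):
    """Root mean squared logarithmic error, as defined above."""
    p = np.asarray(predicted, dtype=float)
    a = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((np.log(p + 1) - np.log(a + 1)) ** 2))

# Example: predicted vs. actual days in hospital for five patients
# rmsle([0.2, 0.2, 1.0, 0.0, 3.0], [0, 1, 2, 0, 4])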
The Netflix Prize
I $1 Million prize
I Leading teams combined predictors to pass the threshold
Blending
Blend several predictors to create a more accurate predictor
Prediction Models
I Optimized Constant Value
I K-Nearest Neighbors
I Logistic Regression
I Support Vector Regression
I Random Forests
I Gradient Boosting Machines
I Neural Networks
Feature Selection
I Used Market Makers method [2]
I Reduced each patient to a vector of 139 features
Optimized Constant Value
I Predicts the same number of days for each patient
I Best constant prediction is p = 0.209179
RMSLE: 0.486459 (800th place)
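The slides do not say how the constant was chosen; one natural way (an assumption on our part) is the closed-form minimizer of the RMSLE over the training labels:

import numpy as np

def best_constant(actual_days):
    """Constant p minimizing RMSLE against the training labels:
    log(p + 1) equals the mean of log(a_i + 1)."""
    a = np.asarray(actual_days, dtype=float)
    return np.exp(np.mean(np.log(a + 1))) - 1

# p = best_constant(train_days)   # hypothetical training labels; slides report p = 0.209179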
K-Nearest Neighbors
I Weighted average of closest neighbors
I Very slow
Eigenvalue Decomposition
Reduces the number of features for each patient
\[
X_k = \lambda_k^{-1/2}\, U_k^T X_c
\]
(here U_k and λ_k presumably denote the leading eigenvectors and eigenvalues of the feature covariance, and X_c the centered data)
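A minimal NumPy sketch of this projection, assuming X is the n-by-139 feature matrix and k the number of retained components (function and variable names are ours):

import numpy as np

def whiten(X, k):
    """Project centered data onto the top-k eigenvectors of the feature
    covariance, scaled by lambda^{-1/2}, as in the formula above."""
    Xc = X - X.mean(axis=0)                      # center the features
    lam, U = np.linalg.eigh(np.cov(Xc, rowvar=False))
    idx = np.argsort(lam)[::-1][:k]              # top-k eigenvalues/eigenvectors
    return Xc @ U[:, idx] / np.sqrt(lam[idx])    # n-by-k reduced representation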
K-Nearest Neighbors Results
Neighbors: k = 1000
RMSLE: 0.475197 (600th place)
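A sketch of a distance-weighted k-nearest-neighbors regressor on the reduced features, using scikit-learn (the library and weighting choice are assumptions; only k = 1000 comes from the slide):

from sklearn.neighbors import KNeighborsRegressor

# Weighted average of the 1000 closest neighbors' days in hospital
knn = KNeighborsRegressor(n_neighbors=1000, weights="distance")
# knn.fit(X_train_reduced, y_train_days)   # hypothetical training arrays
# pred = knn.predict(X_test_reduced)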
Logistic Regression
RMSLE: 0.466726 (375th place)
Support Vector Regression
ε = 0.02
RMSLE: 0.467152 (400th place)
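A scikit-learn sketch of epsilon-insensitive support vector regression (only ε = 0.02 comes from the slide; the kernel and other settings are assumptions):

from sklearn.svm import SVR

svr = SVR(kernel="rbf", epsilon=0.02)   # epsilon-insensitive tube of width 0.02
# svr.fit(X_train, y_train_days)        # hypothetical training arrays
# pred = svr.predict(X_test)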
Decision Trees
Random Forests
RMSLE: 0.464918 (315th place)
Gradient Boosting Machines
Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum Observations = 100
RMSLE: 0.462998 (200th place)
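The listed settings map naturally onto a gradient-boosted regression tree implementation; a scikit-learn sketch (treating "Minimum Observations" as the minimum samples per leaf is our assumption):

from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=8000,      # Trees = 8000
    learning_rate=0.002,    # Shrinkage = 0.002
    max_depth=7,            # Depth = 7
    min_samples_leaf=100,   # Minimum Observations = 100
)
# gbm.fit(X_train, y_train_days)   # hypothetical training arrays
# pred = gbm.predict(X_test)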
Artificial Neural Networks
Back Propagation in Neural Networks
Neural Network Results
Number of hidden neurons = 7
Number of cycles = 3000
RMSLE: 0.465705 (340th place)
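A sketch of a comparable single-hidden-layer network in scikit-learn (interpreting "cycles" as training iterations is our assumption; only the 7 hidden neurons and 3000 cycles come from the slide):

from sklearn.neural_network import MLPRegressor

nn = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000)   # 7 hidden neurons, 3000 cycles
# nn.fit(X_train, y_train_days)   # hypothetical training arrays
# pred = nn.predict(X_test)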
Individual Predictors (Summary)
I Optimized Constant Value 0.486459 (800th place)
I K-Nearest Neighbors 0.475197 (600th place)
I Logistic Regression 0.466726 (375th place)
I Support Vector Regression 0.467152 (400th place)
I Random Forests 0.464918 (315th place)
I Gradient Boosting Machines 0.462998 (200th place)
I Neural Networks 0.465705 (340th place)
Deriving the Blending Algorithm
Error (RMSE)
\[
\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2}
\]
For predictor c (with prediction X_{ic} for patient i and true value Y_i) and for the all-zero prediction:
\[
n\varepsilon_c^2 = \sum_{i=1}^{n}(X_{ic} - Y_i)^2, \qquad n\varepsilon_0^2 = \sum_{i=1}^{n}Y_i^2
\]
Deriving the Blending Algorithm (Continued)
X as a combination of predictors
\[
\tilde{X} = Xw \quad\text{or}\quad \tilde{X}_i = \sum_c w_c X_{ic}
\]
Deriving the Blending Algorithm (Continued)
Minimizing the cost function
\[
C = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \tilde{X}_i\bigr)^2
\]
\[
\frac{\partial C}{\partial w_c} = \sum_i \Bigl(Y_i - \sum_{c'} w_{c'} X_{ic'}\Bigr)(-X_{ic}) = 0
\]
Deriving the Blending Algorithm (Continued)
Minimizing the cost function (continued)
\[
\sum_i Y_i X_{ic} = \sum_i \sum_{c'} w_{c'} X_{ic'} X_{ic}
\]
In matrix form:
\[
Y^T X = w^T X^T X
\]
Deriving the Blending Algorithm (Continued)
Optimizing predictors’ weights
\[
w^T = (Y^T X)(X^T X)^{-1}
\]
\[
\sum_i Y_i X_{ic} = \frac{1}{2}\Bigl(\sum_i X_{ic}^2 + \sum_i Y_i^2 - \sum_i (Y_i - X_{ic})^2\Bigr)
\]
\[
\sum_i Y_i X_{ic} = \frac{1}{2}\Bigl(\sum_i X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\Bigr)
\]
Blending Algorithm (Summary)
1. Submit and record all predictions X and their errors ε_c
2. Calculate M = (X^T X)^{-1} and
   \[
   v_c = (X^T Y)_c = \frac{1}{2}\Bigl(\sum_i X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\Bigr)
   \]
   (Y is unknown for the test set, but each submission's reported error ε_c supplies (X^T Y)_c through this identity.)
3. Because w^T = (Y^T X)(X^T X)^{-1}, calculate the weights w = Mv
4. The final blended prediction is X̃ = Xw
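A minimal NumPy sketch of these steps, assuming preds is an n-by-C matrix of recorded submissions, errs their reported errors, and err0 the error of an all-zero submission (all names are ours):

import numpy as np

def blend_weights(preds, errs, err0):
    """Recover w = (X^T X)^{-1} X^T Y from submission errors alone,
    following steps 2 and 3 above."""
    X = np.asarray(preds, dtype=float)                    # n-by-C prediction matrix
    n, _ = X.shape
    M = np.linalg.inv(X.T @ X)                            # M = (X^T X)^{-1}
    v = 0.5 * ((X ** 2).sum(axis=0)                       # sum_i X_ic^2
               + n * err0 ** 2                            # + n * eps_0^2
               - n * np.asarray(errs, dtype=float) ** 2)  # - n * eps_c^2
    return M @ v                                          # w = M v

# blended = preds @ blend_weights(preds, errs, err0)      # final prediction X~ = Xw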
Blending Results
RMSLE: 0.461432 (98th place)
Future Work
I Optimize the blending equation with a regularization constant (see the sketch after this list):
\[
w^T = (Y^T X)(X^T X + \lambda I)^{-1}
\]
I Improved feature selection
I More predictors
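A brief sketch of the regularized variant, reusing the hypothetical names from the blending snippet above and an assumed ridge constant lam:

import numpy as np

def blend_weights_ridge(preds, errs, err0, lam=1.0):
    """Ridge-regularized blend weights: w = (X^T X + lambda*I)^{-1} v."""
    X = np.asarray(preds, dtype=float)
    n, C = X.shape
    v = 0.5 * ((X ** 2).sum(axis=0) + n * err0 ** 2 - n * np.asarray(errs, dtype=float) ** 2)
    return np.linalg.solve(X.T @ X + lam * np.eye(C), v)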
Questions
References
[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp
[2] David Vogel, Phil Brierley, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.