Ensemble Learning and the Heritage Health Prize
Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers
iCAMP 2012, University of California, Irvine
Advisors: Dr. Max Welling, Dr. Alexander Ihler, Sungjin Ahn, and Qiang Liu
December 4, 2013
Outline
- Heritage Health Prize
  - Data
  - Evaluation
- Our Approach
  - Motivation
  - Models
  - Blending
- Results
The Heritage Health Prize
- Goal: Identify patients who will be admitted to a hospital within the next year, using historical claims data [1]
- 1,659 teams
Purpose
- Reduce the cost of unnecessary hospital admissions per year
- Identify at-risk patients earlier
Evaluation
Root Mean Squared Logarithmic Error (RMSLE)
\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\log(p_i + 1) - \log(a_i + 1)\right]^2}

where p_i is the predicted value and a_i the actual value for patient i.

Threshold: \varepsilon \le 0.4
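A minimal sketch of this metric in Python (the variable names are assumptions):

```python
import numpy as np

def rmsle(pred, actual):
    """Root Mean Squared Logarithmic Error, as defined above."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    return np.sqrt(np.mean((np.log1p(pred) - np.log1p(actual)) ** 2))

# A submission meets the milestone threshold when rmsle(pred, actual) <= 0.4.
```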
Our Approach
- Motivation
- Individual Models
- Optimized Ensemble
Blending
Blend several predictors to create a more accurate predictor. Motivated by the solution to the Netflix Prize [3].
Prediction Models
- Preprocessing: Feature Selection
- K-Nearest Neighbors
- Random Forests
- Gradient Boosting Machines
- Logistic Regression
- Support Vector Regression
- Neural Networks
Feature Selection
- Used the Market Makers method [2]
- Reduced each patient to a vector of 139 features
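A rough sketch of the per-patient aggregation this kind of feature construction involves, using pandas; the file and column names (Claims.csv, MemberID, Specialty) are illustrative only, and the exact 139 features are described in [2]:

```python
import pandas as pd

# Hypothetical claims table: one row per claim, keyed by MemberID.
claims = pd.read_csv("Claims.csv")

# Collapse claim-level rows into one fixed-length vector per patient:
# an overall claim count plus per-category counts such as claims per
# provider specialty.
n_claims = claims.groupby("MemberID").size().rename("n_claims")
specialty_counts = pd.crosstab(claims["MemberID"], claims["Specialty"])
features = pd.concat([n_claims, specialty_counts], axis=1).fillna(0)
```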
Kitchen Sink
- Generate a large number of features
- Need a method for eliminating useless features
- Compare distributions of each feature in the test and training sets
- Basis pursuit
Basis Pursuit
- Find the best generated feature (the one that reduces error the most)
- Add it to the feature set
- Repeat until enough useful features are found
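A minimal sketch of this greedy loop; the inner model (plain linear regression), the validation split, and the stopping rule are assumptions:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

def forward_select(candidates, X_train, y_train, X_val, y_val, max_features=20):
    """Greedy 'basis pursuit' loop from the slide: repeatedly add the
    candidate feature that reduces validation error the most."""
    selected = []
    while len(selected) < max_features:
        best_feat, best_err = None, np.inf
        for feat in candidates:
            if feat in selected:
                continue
            cols = selected + [feat]
            model = LinearRegression().fit(X_train[cols], y_train)
            err = mean_squared_error(y_val, model.predict(X_val[cols])) ** 0.5
            if err < best_err:
                best_feat, best_err = feat, err
        selected.append(best_feat)
    return selected
```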
K-Nearest Neighbors
- Weighted average of the closest neighbors
- Very slow

Neighbors: k = 1000
RMSLE: 0.475197 (996th place)
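A hedged sketch using scikit-learn; X_train, y_train, and X_test are assumed to hold the 139-feature vectors and the (log-transformed) targets:

```python
from sklearn.neighbors import KNeighborsRegressor

# k = 1000 neighbors, distance-weighted average of their targets.
knn = KNeighborsRegressor(n_neighbors=1000, weights="distance")
knn.fit(X_train, y_train)
pred = knn.predict(X_test)
```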
Decision Trees
Random Forests
RMSLE: 0.464918 (469th place)
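A hedged scikit-learn sketch; the forest size and other hyperparameters are illustrative, since the slide does not list them:

```python
from sklearn.ensemble import RandomForestRegressor

rf = RandomForestRegressor(n_estimators=500, n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)
pred = rf.predict(X_test)
```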
Gradient Boosting Machines
Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum observations = 100

RMSLE: 0.462998 (325th place)
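A hedged scikit-learn sketch with the slide's hyperparameters; mapping "minimum observations" onto min_samples_leaf is an assumption:

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=8000,     # Trees = 8000
    learning_rate=0.002,   # Shrinkage = 0.002
    max_depth=7,           # Depth = 7
    min_samples_leaf=100,  # Minimum observations = 100
)
gbm.fit(X_train, y_train)
pred = gbm.predict(X_test)
```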
Logistic Regression
- Optimized with gradient descent

RMSLE: 0.466726 (580th place)
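A minimal sketch of logistic regression fit by batch gradient descent; how the day counts were mapped into [0, 1] targets is not stated on the slide, so the target scaling here is an assumption:

```python
import numpy as np

def logistic_regression_gd(X, y, lr=0.1, n_iters=1000):
    """Batch gradient descent on the logistic (cross-entropy) loss.
    y is assumed to be scaled into [0, 1]."""
    X = np.hstack([np.ones((X.shape[0], 1)), X])  # add a bias column
    w = np.zeros(X.shape[1])
    for _ in range(n_iters):
        p = 1.0 / (1.0 + np.exp(-X @ w))          # sigmoid
        w -= lr * (X.T @ (p - y)) / len(y)        # gradient step
    return w
```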
Support Vector Regression
\varepsilon = 0.02
RMSLE: 0.467152 (629th place)
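A hedged scikit-learn sketch; only epsilon = 0.02 comes from the slide, and the kernel choice is an assumption:

```python
from sklearn.svm import SVR

svr = SVR(kernel="rbf", epsilon=0.02)
svr.fit(X_train, y_train)
pred = svr.predict(X_test)
```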
Neural Networks
Number of hidden neurons = 7
Number of cycles = 3000

RMSLE: 0.465705 (511th place)
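A hedged scikit-learn sketch; mapping "cycles" onto max_iter, and the solver defaults, are assumptions:

```python
from sklearn.neural_network import MLPRegressor

nn = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000, random_state=0)
nn.fit(X_train, y_train)
pred = nn.predict(X_test)
```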
Individual Predictors (Summary)
- K-Nearest Neighbors: 0.475197 (996th place)
- Support Vector Regression: 0.467152 (629th place)
- Logistic Regression: 0.466726 (580th place)
- Neural Networks: 0.465705 (511th place)
- Random Forests: 0.464918 (469th place)
- Gradient Boosting Machines: 0.462998 (325th place)
The Blending Equation
Express the blended prediction \hat{X} as a linear combination of the individual predictors:

\hat{X} = Xw

Minimize the cost function

C = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \hat{X}_i\right)^2
The Blending Equation
Optimizing the predictors' weights:

w = (Y^{\top}X)(X^{\top}X)^{-1}

For each predictor c, Y^{\top}X_c can be recovered from leaderboard feedback:

Y^{\top}X_c = \tfrac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + \sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n}(X_{ic} - Y_i)^2\right) = \tfrac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\right)

Y = actual values (unknown)
X = our predictions (known)
\varepsilon_c = leaderboard feedback for predictor c (known); \varepsilon_0 = feedback for an all-zero submission
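A minimal sketch of solving for the blend weights from leaderboard feedback alone; the assumption that \varepsilon_0 comes from an all-zero submission, and that predictions and targets are kept in log(x + 1) space so the squared-error identity applies, follows the reconstruction above:

```python
import numpy as np

def blend_weights(X, eps, eps0):
    """w = (Y^T X)(X^T X)^{-1}, with Y^T X recovered from feedback.

    X    : (n, m) matrix; column c holds predictor c's submitted values
    eps  : length-m array of leaderboard errors for each predictor
    eps0 : leaderboard error of an all-zero submission (assumption)
    """
    n = X.shape[0]
    ytx = 0.5 * ((X ** 2).sum(axis=0) + n * eps0 ** 2 - n * eps ** 2)
    # X^T X is symmetric, so solving (X^T X) w = (Y^T X)^T gives the same w.
    return np.linalg.solve(X.T @ X, ytx)

# Blended submission: blended = X_new @ blend_weights(X_leaderboard, eps, eps0)
```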
Blending Results
RMSLE: 0.461432
88th place on the final milestone leaderboard
Results (Summary)
- K-Nearest Neighbors: 0.475197 (996th place)
- Support Vector Regression: 0.467152 (629th place)
- Logistic Regression: 0.466726 (580th place)
- Neural Networks: 0.465705 (511th place)
- Random Forests: 0.464918 (469th place)
- Gradient Boosting Machines: 0.462998 (325th place)
- Blending: 0.461432 (88th place)
Observations and Problems
- Blends with fewer repeated classifiers worked better
- Overfitting to the leaderboard feedback
- Test and training data were inconsistent
Future Work
- Optimize the blending equation with a regularization constant (see the sketch below):

  w = (Y^{\top}X)(X^{\top}X + \lambda I)^{-1}

- More predictors
- Adjust for changes over time
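A small sketch of this regularized variant, reusing the Y^T X vector recovered in the earlier blending sketch; the value of lambda is illustrative:

```python
import numpy as np

def blend_weights_ridge(X, ytx, lam=1.0):
    """Regularized blend: w = (Y^T X)(X^T X + lam * I)^{-1}."""
    m = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(m), ytx)
```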
Questions
References
[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp
[2] Phil Brierley, David Vogel, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.
[3] Andreas Töscher and Michael Jahrer. The BigChaos Solution to the Netflix Grand Prize. September 2009.