
Page 1: Ensemble Learning and the Heritage Health Prize

Ensemble Learning and the Heritage Health Prize

Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers

iCAMP 2012, University of California, Irvine

Advisors: Dr. Max Welling, Dr. Alexander Ihler, Sungjin Ahn, and Qiang Liu

December 4, 2013

Stroud, Enverga, Silverstein, Song, and Rogers Ensemble Learning

Page 2

Outline

- Heritage Health Prize
  - Data
  - Evaluation
- Our Approach
  - Motivation
  - Models
  - Blending
- Results

Page 3

The Heritage Health Prize

- Goal: identify patients who will be admitted to a hospital within the next year, using historical claims data [1]
- 1,659 competing teams

Page 4

Purpose

- Reduce the cost of unnecessary hospital admissions each year
- Identify at-risk patients earlier

Page 5

Evaluation

Root Mean Squared Logarithmic Error (RMSLE)

\[
\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left[\log(p_i + 1) - \log(a_i + 1)\right]^2}
\]

where p_i is the predicted and a_i the actual value for patient i.

Threshold: ε ≤ 0.4
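As a minimal sketch (not the competition's scoring code), the metric can be computed with NumPy; the function name and sample values are illustrative:

```python
import numpy as np

def rmsle(p, a):
    """Root mean squared logarithmic error between predictions p and actuals a."""
    p, a = np.asarray(p, dtype=float), np.asarray(a, dtype=float)
    # log1p(x) = log(x + 1), matching the formula above
    return np.sqrt(np.mean((np.log1p(p) - np.log1p(a)) ** 2))

perfect = rmsle([0, 1, 3], [0, 1, 3])  # identical predictions give zero error
```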

Page 6

Our Approach

- Motivation

- Individual Models

- Optimized Ensemble

Page 7

Blending

Blend several predictors to create a single, more accurate predictor. Motivated by the winning solution to the Netflix Prize [3].

Page 8

Prediction Models

- Preprocessing: Feature Selection

- K-Nearest Neighbors

- Random Forests

- Gradient Boosting Machines

- Logistic Regression

- Support Vector Regression

- Neural Networks

Page 9

Feature Selection

- Used the Market Makers method [2]

- Reduced each patient to a vector of 139 features

Page 10

Kitchen Sink

- Generate a large number of candidate features
- Need a method for eliminating useless features:
  - Compare the distributions of each feature in the test and training sets
  - Basis pursuit
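The distribution check can be sketched as follows; this uses a crude mean-shift rule as a stand-in for a proper two-sample test, and the data and 0.5-standard-deviation cutoff are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)
train = rng.normal(size=(500, 3))
test = train.copy()
test[:, 2] += 3.0  # feature 2 drifts between the training and test sets

# Keep only features whose test-set mean stays close to the training-set
# mean, measured in training standard deviations
keep = [j for j in range(train.shape[1])
        if abs(test[:, j].mean() - train[:, j].mean()) < 0.5 * train[:, j].std()]
```

Here `keep` retains features 0 and 1 and drops the drifted feature 2.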

Page 11

Basis Pursuit

- Find the best generated feature (the one that reduces error the most)

- Add it to the feature set

- Repeat until enough useful features are found
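The loop above amounts to greedy forward selection. A minimal NumPy sketch, with synthetic data and illustrative names:

```python
import numpy as np

def forward_select(X, y, k):
    """Greedily add the candidate feature that most reduces
    least-squares training error, k times."""
    chosen = []
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in range(X.shape[1]):
            if j in chosen:
                continue
            cols = X[:, chosen + [j]]
            w, *_ = np.linalg.lstsq(cols, y, rcond=None)
            err = np.mean((cols @ w - y) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        chosen.append(best_j)
    return chosen

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 5))
y = 2.0 * X[:, 3] + 0.01 * rng.normal(size=200)  # only feature 3 matters
picked = forward_select(X, y, 1)                 # picks feature 3 first
```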

Page 12

K-Nearest Neighbors

- Weighted average of the closest neighbors

- Very slow

Neighbors: k = 1000
RMSLE: 0.475197 (996th place)
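An illustrative sketch (not the team's code) of a distance-weighted k-NN regressor with scikit-learn, on synthetic data and with a smaller k than the slide's k = 1000:

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 10))
y = X[:, 0] + rng.normal(scale=0.1, size=2000)

# Prediction = distance-weighted average of the k closest neighbors
knn = KNeighborsRegressor(n_neighbors=100, weights="distance").fit(X, y)
pred = knn.predict(X[:5])
```

Every prediction requires a search over the whole training set, which is why this model is slow at HHP scale.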

Page 13

Decision Trees

Page 14

Random Forests

RMSLE: 0.464918 (469th place)
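A random forest averages many decorrelated decision trees (bagging plus random split candidates). A scikit-learn sketch on synthetic data, with illustrative settings rather than the team's:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor

X, y = make_regression(n_samples=300, n_features=10, random_state=0)

# Each tree sees a bootstrap sample; predictions are averaged over trees
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
score = rf.score(X, y)  # R^2 on the training data
```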

Page 15

Gradient Boosting Machines

Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum Observations = 100

RMSLE: 0.462998 (325th place)
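A scaled-down sketch with scikit-learn's GradientBoostingRegressor; mapping the slide's settings (trees → n_estimators, shrinkage → learning_rate, depth → max_depth, minimum observations → min_samples_leaf) is an assumption, and the values here are reduced so the example runs quickly:

```python
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor

X, y = make_regression(n_samples=300, n_features=10, random_state=0)

# Slide settings were 8000 trees, shrinkage 0.002, depth 7, min obs 100
gbm = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=3, min_samples_leaf=10,
                                random_state=0).fit(X, y)
score = gbm.score(X, y)  # R^2 on the training data
```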

Page 16

Logistic Regression

- Optimized with gradient descent

RMSLE: 0.466726 (580th place)
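A minimal sketch of logistic regression fit by batch gradient descent on the log-loss; the data, learning rate, and step count are illustrative assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fit_logistic(X, y, lr=0.1, steps=2000):
    """Batch gradient descent on the logistic log-loss."""
    w = np.zeros(X.shape[1])
    for _ in range(steps):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)  # mean log-loss gradient
        w -= lr * grad
    return w

rng = np.random.default_rng(0)
X = np.c_[np.ones(500), rng.normal(size=(500, 2))]  # bias + two features
true_w = np.array([0.0, 2.0, -1.0])
y = (rng.random(500) < sigmoid(X @ true_w)).astype(float)

w = fit_logistic(X, y)
acc = np.mean((sigmoid(X @ w) > 0.5) == (y == 1))
```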

Page 17

Support Vector Regression

ε = 0.02
RMSLE: 0.467152 (629th place)
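A scikit-learn sketch using the slide's ε-insensitive tube width of 0.02; the kernel choice and synthetic data are assumptions:

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(300, 2))
y = X[:, 0] - 0.5 * X[:, 1]

# Errors smaller than epsilon = 0.02 incur no loss
svr = SVR(kernel="rbf", epsilon=0.02).fit(X, y)
err = np.mean((svr.predict(X) - y) ** 2)  # training MSE
```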

Page 18

Neural Networks

Number of hidden neurons = 7
Number of cycles = 3000

RMSLE: 0.465705 (511th place)
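An illustrative scikit-learn sketch mirroring the slide's settings (one hidden layer of 7 units; treating "cycles" as training iterations is an assumption), on synthetic data:

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 5))
y = X @ np.array([1.0, 2.0, 0.0, 0.0, -1.0])  # smooth target to learn

net = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000,
                   random_state=0).fit(X, y)
score = net.score(X, y)  # R^2 on the training data
```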

Page 19

Individual Predictors (Summary)

- K-Nearest Neighbors 0.475197 (996th place)

- Support Vector Regression 0.467152 (629th place)

- Logistic Regression 0.466726 (580th place)

- Neural Networks 0.465705 (511th place)

- Random Forests 0.464918 (469th place)

- Gradient Boosting Machines 0.462998 (325th place)

Page 20

The Blending Equation

Express the blended prediction as a linear combination of the individual predictors:

\[ \tilde{X} = Xw \]

and choose the weights w to minimize the cost function

\[ C = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \tilde{X}_i\right)^2 \]

Page 21

The Blending Equation

Optimizing the predictors' weights:

\[ w = (Y^T X)(X^T X)^{-1} \]

X^T X is known directly from our predictions, and each entry of Y^T X can be recovered from leaderboard feedback:

\[
(Y^T X)_c = \frac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + \sum_{i=1}^{n} Y_i^2 - \sum_{i=1}^{n}\left(X_{ic} - Y_i\right)^2\right)
          = \frac{1}{2}\left(\sum_{i=1}^{n} X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\right)
\]

Y = actual values (unknown)
X = our predictions (known)
ε_c = leaderboard feedback for predictor c; ε_0 = feedback for the all-zero submission (known)
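When Y is available (e.g. on a holdout set), the same least-squares blend can be sketched directly in NumPy; the synthetic predictors and noise levels here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 1000
Y = rng.normal(size=n)  # "actual" values

# Three predictors of varying quality: Y plus increasing noise
X = np.column_stack([Y + rng.normal(scale=s, size=n)
                     for s in (0.5, 1.0, 2.0)])

# Least-squares blending weights: solve (X^T X) w = X^T Y
w = np.linalg.solve(X.T @ X, X.T @ Y)
blend = X @ w

def rmse(a, b):
    return np.sqrt(np.mean((a - b) ** 2))

best_single = min(rmse(X[:, j], Y) for j in range(X.shape[1]))
blend_err = rmse(blend, Y)  # never worse than any single predictor here
```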

Page 22

Blending Results

RMSLE: 0.461432
88th place on the final milestone leaderboard

Page 23

Results (Summary)

- K-Nearest Neighbors 0.475197 (996th place)

- Support Vector Regression 0.467152 (629th place)

- Logistic Regression 0.466726 (580th place)

- Neural Networks 0.465705 (511th place)

- Random Forests 0.464918 (469th place)

- Gradient Boosting Machines 0.462998 (325th place)

- Blending 0.461432 (88th place)

Page 24

Observations and Problems

- Blends with fewer repeated classifiers worked better

- Overfitting to the leaderboard feedback

- Test and training data were inconsistent

Page 25

Future Work

- Optimize the blending equation with a regularization constant:

  \[ w = (Y^T X)(X^T X + \lambda I)^{-1} \]

- More predictors

- Adjust for changes over time
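The regularized weights can be sketched the same way as the plain blend; the near-duplicate predictors and λ = 1 below are illustrative, chosen to show why the ridge term helps when predictors are highly correlated:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500
Y = rng.normal(size=n)

# Two nearly collinear predictors make the plain normal equations unstable
base = Y + rng.normal(scale=1.0, size=n)
X = np.column_stack([base, base + rng.normal(scale=1e-3, size=n)])

# Ridge-regularized blend: solve (X^T X + lambda I) w = X^T Y
lam = 1.0
w_ridge = np.linalg.solve(X.T @ X + lam * np.eye(2), X.T @ Y)
```

Without the λI term, the two columns would receive huge offsetting weights; the regularizer keeps both weights moderate.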

Page 26

Questions

Page 27

References

[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp

[2] Phil Brierley, David Vogel, and Randy Axelrod. Market Makers - Milestone 1 Description. September 2011.

[3] Andreas Töscher and Michael Jahrer. The BigChaos Solution to the Netflix Grand Prize. September 2009.
