
Ensemble Learning and the Heritage Health Prize

Jonathan Stroud, Igii Enverga, Tiffany Silverstein, Brian Song, and Taylor Rogers

iCAMP 2012, University of California, Irvine

Advisors: Max Welling, Alexander Ihler, Sungjin Ahn, and Qiang Liu

August 14, 2012


The Heritage Health Prize

- Goal: Identify patients who will be admitted to a hospital within the next year, using historical claims data [1]

- 1,250 teams


Purpose

- Reduce the cost of unnecessary hospital admissions per year
- Identify at-risk patients earlier


Kaggle

- Public online competitions

- Gives feedback on prediction models


Data

- Provided through Kaggle

- Three years of patient data

- Two years include days spent in hospital (training set)


Evaluation

Root Mean Squared Logarithmic Error (RMSLE)

\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\bigl[\log(p_i + 1) - \log(a_i + 1)\bigr]^2}

where p_i is the predicted and a_i the actual number of days spent in hospital.

Threshold: \varepsilon \le 0.4
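As a concrete check on the formula, here is a minimal NumPy sketch of the metric; the array names are placeholders, not from the slides.

```python
import numpy as np

def rmsle(predicted, actual):
    """Root mean squared logarithmic error between two arrays."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    # Squared differences of log(x + 1), averaged over patients, then square-rooted
    return np.sqrt(np.mean((np.log(predicted + 1) - np.log(actual + 1)) ** 2))

# Example: a constant prediction of 0.2 days against a few actual values
print(rmsle([0.2, 0.2, 0.2, 0.2], [0, 0, 1, 0]))
```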


The Netflix Prize

- $1 Million prize

- Leading teams combined predictors to pass the threshold


Blending

Blend several predictors to create a more accurate predictor


Prediction Models

- Optimized Constant Value

- K-Nearest Neighbors

- Logistic Regression

- Support Vector Regression

- Random Forests

- Gradient Boosting Machines

- Neural Networks


Feature Selection

- Used the Market Makers method [2]

- Reduced each patient to a vector of 139 features


Optimized Constant Value

- Predicts the same number of days for each patient

- Best constant prediction is p = 0.209179

RMSLE: 0.486459 (800th place)
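The slides do not say how the constant was found; since RMSLE is an ordinary squared error in log(x + 1) space, one way is to average the training labels in that space and map back. A sketch, with actual_days a placeholder for the training labels:

```python
import numpy as np

def best_constant(actual_days):
    """Constant prediction minimizing RMSLE: the mean in log(x + 1) space, mapped back."""
    actual_days = np.asarray(actual_days, dtype=float)
    return np.exp(np.mean(np.log(actual_days + 1))) - 1

# On the competition's training labels this kind of estimate lands near p = 0.209179
```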


K-Nearest Neighbors

- Weighted average of closest neighbors

- Very slow


Eigenvalue Decomposition

Reduces the number of features for each patient:

X_k = \lambda_k^{-1/2} U_k^{T} X_c

where X_c is the centered data, U_k holds the top k eigenvectors of its covariance, and \lambda_k the corresponding eigenvalues.
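A NumPy sketch of this projection, assuming X is the patients-by-features matrix and X_c its centered version; the function name and the choice of k are illustrative.

```python
import numpy as np

def reduce_features(X, k):
    """Whitened projection onto the top-k eigenvectors: X_k = lambda_k^(-1/2) U_k^T X_c."""
    Xc = X - X.mean(axis=0)                    # center each feature (X_c)
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xc, rowvar=False))
    top = np.argsort(eigvals)[::-1][:k]        # indices of the k largest eigenvalues
    U_k, lam_k = eigvecs[:, top], eigvals[top]
    return (np.diag(lam_k ** -0.5) @ U_k.T @ Xc.T).T   # patients x k reduced features

# Example: reduce 139 features to 20 on stand-in data
X_reduced = reduce_features(np.random.default_rng(0).normal(size=(500, 139)), k=20)
```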


K-Nearest Neighbors Results

Neighbors: k = 1000

RMSLE: 0.475197 (600th place)
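A sketch of a distance-weighted nearest-neighbor regressor at k = 1000, shown here with scikit-learn's KNeighborsRegressor rather than the team's own implementation; the stand-in data exists only to make the snippet runnable.

```python
import numpy as np
from sklearn.neighbors import KNeighborsRegressor

rng = np.random.default_rng(0)
X_train, y_train = rng.normal(size=(5000, 20)), rng.poisson(0.2, size=5000)  # stand-in data
X_test = rng.normal(size=(10, 20))

# Prediction = distance-weighted average of the 1000 closest training patients
knn = KNeighborsRegressor(n_neighbors=1000, weights="distance")
knn.fit(X_train, y_train)
predictions = knn.predict(X_test)
```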


Logistic Regression

RMSLE: 0.466726 (375th place)


Support Vector Regression

ε = 0.02

RMSLE: 0.467152 (400th place)
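For reference, the ε = 0.02 setting as it would look with scikit-learn's SVR; the kernel and C values are assumptions, since the slide gives only ε.

```python
from sklearn.svm import SVR

# epsilon = 0.02: prediction errors smaller than 0.02 days incur no penalty.
# kernel and C are illustrative defaults, not values from the slides.
svr = SVR(kernel="rbf", C=1.0, epsilon=0.02)
# svr.fit(features, days_in_hospital)
```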


Decision Trees


Random Forests

RMSLE: 0.464918 (315th place)


Gradient Boosting Machines

Trees = 8000
Shrinkage = 0.002
Depth = 7
Minimum Observations = 100

RMSLE: 0.462998 (200th place)
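A sketch of the same configuration with scikit-learn's GradientBoostingRegressor, assuming Trees, Shrinkage, Depth, and Minimum Observations map onto n_estimators, learning_rate, max_depth, and min_samples_leaf; the team may have used a different implementation, so treat this as a mapping rather than a reproduction.

```python
from sklearn.ensemble import GradientBoostingRegressor

gbm = GradientBoostingRegressor(
    n_estimators=8000,     # Trees = 8000
    learning_rate=0.002,   # Shrinkage = 0.002
    max_depth=7,           # Depth = 7
    min_samples_leaf=100,  # Minimum Observations per terminal node = 100
)
# gbm.fit(features, days_in_hospital)  # fit on the 139-feature vectors
```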


Artificial Neural Networks


Back Propagation in Neural Networks


Neural Network Results

Number of hidden neurons = 7
Number of cycles = 3000

RMSLE: 0.465705 (340th place)
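Read as one hidden layer of 7 units trained for 3000 iterations, the configuration could be sketched with scikit-learn's MLPRegressor; the solver and remaining settings are defaults and are assumptions.

```python
from sklearn.neural_network import MLPRegressor

# 7 hidden neurons, 3000 training cycles (iterations); other settings left at defaults
mlp = MLPRegressor(hidden_layer_sizes=(7,), max_iter=3000)
# mlp.fit(features, days_in_hospital)
```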


Individual Predictors (Summary)

- Optimized Constant Value: 0.486459 (800th place)

- K-Nearest Neighbors: 0.475197 (600th place)

- Logistic Regression: 0.466726 (375th place)

- Support Vector Regression: 0.467152 (400th place)

- Random Forests: 0.464918 (315th place)

- Gradient Boosting Machines: 0.462998 (200th place)

- Neural Networks: 0.465705 (340th place)



Deriving the Blending Algorithm

Error (RMSE)

\varepsilon = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(X_i - Y_i)^2}

n\varepsilon_c^2 = \sum_{i=1}^{n}(X_{ic} - Y_i)^2 \qquad\qquad n\varepsilon_0^2 = \sum_{i=1}^{n} Y_i^2

Here Y_i is the true target for patient i, X_{ic} is the prediction of model c, \varepsilon_c is that model's error, and \varepsilon_0 is the error of the all-zero prediction.


Deriving the Blending Algorithm (Continued)

X as a combination of predictors

\tilde{X} = Xw \qquad\text{or}\qquad \tilde{X}_i = \sum_c w_c X_{ic}


Deriving the Blending Algorithm (Continued)

Minimizing the cost function

C = \frac{1}{n}\sum_{i=1}^{n}\bigl(Y_i - \tilde{X}_i\bigr)^2

\frac{\partial C}{\partial w_c} = \sum_i \Bigl(Y_i - \sum_c w_c X_{ic}\Bigr)(-X_{ic}) = 0


Deriving the Blending Algorithm (Continued)

Minimizing the cost function (continued)

\sum_i Y_i X_{ic} = \sum_i \sum_{c'} w_{c'} X_{ic} X_{ic'}

Y^{T}X = w^{T} X^{T} X


Deriving the Blending Algorithm (Continued)

Optimizing predictors’ weights

w^{T} = (Y^{T}X)(X^{T}X)^{-1}

\sum_i Y_i X_{ic} = \tfrac{1}{2}\Bigl(\sum_i X_{ic}^2 + \sum_i Y_i^2 - \sum_i (Y_i - X_{ic})^2\Bigr)

\sum_i Y_i X_{ic} = \tfrac{1}{2}\Bigl(\sum_i X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\Bigr)



Blending Algorithm (Summary)

1. Submit and record all predictions X and their errors \varepsilon_c

2. Calculate M = (X^{T}X)^{-1} and v_c = (X^{T}Y)_c = \tfrac{1}{2}\bigl(\sum_i X_{ic}^2 + n\varepsilon_0^2 - n\varepsilon_c^2\bigr)

3. Because w^{T} = (Y^{T}X)(X^{T}X)^{-1}, the weights are w = Mv

4. The final blended prediction is \tilde{X} = Xw
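A NumPy sketch of these four steps, assuming X is the n-by-C matrix of recorded predictions (one column per predictor), errors holds each predictor's leaderboard error, and eps0 is the error of the all-zero submission, so that v can be recovered without access to Y. The slides derive everything in squared-error terms, so X and the errors are understood in the same (log-transformed) space as the metric.

```python
import numpy as np

def blend_weights(X, errors, eps0):
    """Blending weights w = M v from recorded predictions and submission errors.

    X      : (n, C) matrix of predictions, one column per predictor
    errors : length-C array of each predictor's leaderboard error (epsilon_c)
    eps0   : error of the all-zero submission, so n * eps0**2 = sum of Y_i**2
    """
    n = X.shape[0]
    M = np.linalg.inv(X.T @ X)                        # step 2: M = (X^T X)^-1
    v = 0.5 * ((X ** 2).sum(axis=0)                   # step 2: v_c = (X^T Y)_c
               + n * eps0 ** 2 - n * np.asarray(errors) ** 2)
    return M @ v                                      # step 3: w = M v

def blend(X, w):
    return X @ w                                      # step 4: blended prediction X~ = X w
```

Solving X^T X w = v with np.linalg.solve would be numerically safer than forming the explicit inverse, but the explicit M mirrors step 2 of the slide.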


Blending Results

RMSLE: 0.461432 (98th place)


Future Work

- Optimizing the blending equation with a regularization constant (see the sketch after this list):

  w^{T} = (Y^{T}X)(X^{T}X + \lambda I)^{-1}

- Improved feature selection

- More predictors
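A variant of the blend_weights sketch above under this regularized formula; the value of λ (lam) is a tuning parameter not specified on the slide.

```python
import numpy as np

def blend_weights_ridge(X, errors, eps0, lam):
    """Regularized blending weights: w = (X^T X + lam * I)^{-1} v."""
    n, C = X.shape
    v = 0.5 * ((X ** 2).sum(axis=0) + n * eps0 ** 2 - n * np.asarray(errors) ** 2)
    return np.linalg.solve(X.T @ X + lam * np.eye(C), v)
```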


Questions


References

[1] Heritage Provider Network Health Prize, 2012. http://www.heritagehealthprize.com/c/hhp

[2] David Vogel, Phil Brierley, and Randy Axelrod. Market Makers - Milestone 1 Description, September 2011.

