
Page 1: Minghui Conference Cross-Validation Talk


Challenges with the Use of Cross-validation for Comparing Structured Models

Wei Wang, joint work with Andrew Gelman

Department of Statistics, Columbia University

April 13, 2013

Page 2: Minghui Conference Cross-Validation Talk


Overview

1. Multilevel Models

2. Decision-Theoretic Model Assessment Framework

3. Data and Model

4. Results

Page 3: Minghui Conference Cross-Validation Talk

Overview

1. Multilevel Models

2. Decision-Theoretic Model Assessment Framework

3. Data and Model

4. Results


Page 4: Minghui Conference Cross-Validation Talk


Bayesian Interpretation of Multilevel Models

Multilevel Models have long been proposed to handle data with group structures, e.g., a longitudinal study with multiple observations for each participant, or a national survey with various demographic and geographic variables.

From a Bayesian point of view, what Multilevel Modeling does is to partially pool the estimates through a prior, as opposed to doing a separate analysis for each group (no pooling) or analyzing the data as if there were no group structure (complete pooling).


Page 6: Minghui Conference Cross-Validation Talk


Multilevel Models for Deeply Nested Data Structure

Our substantive interest is survey data with deeply nested structures resulting from various categorical demographic-geographic variables, e.g., state, income, education, ethnicity, etc.

One typical conundrum is how many interactions between those demographic-geographic variables to include in the model.


Page 8: Minghui Conference Cross-Validation Talk


Three Prototypes of Models

In the simple case of two predictors, the three prototypes of models are shown below. The response $y_{ij}$ is binary.

Complete Pooling model:
$E\,y_{ij} \sim g^{-1}(\mu_{ij})$, $\mu_{ij} = \mu_0 + a_i + b_j$

No Pooling model:
$E\,y_{ij} \sim g^{-1}(\mu_{ij})$, $\mu_{ij} = \mu_0 + a_i + b_j + r_{ij}$

Partial Pooling model:
$E\,y_{ij} \sim g^{-1}(\mu_{ij})$, $\mu_{ij} = \mu_0 + a_i + b_j + \gamma_{ij}$, $\gamma \sim \Phi(\cdot)$
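As a rough, hypothetical illustration of the three prototypes (not the talk's actual fitting procedure), the sketch below contrasts them on simulated binomial cell counts. The "partial pooling" line uses a simple Beta-binomial shrinkage toward the overall rate as a crude stand-in for the prior $\Phi(\cdot)$ on the interaction; all names and data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical two-way layout: 10 "state" levels x 5 "income" levels.
J1, J2 = 10, 5
n = rng.integers(5, 200, size=(J1, J2))       # unbalanced cell sample sizes
true_p = 0.3 + 0.2 * rng.random((J1, J2))     # true cell probabilities
y = rng.binomial(n, true_p)                   # observed successes per cell

# Complete pooling: one common rate for every cell.
p_complete = np.full((J1, J2), y.sum() / n.sum())

# No pooling: each cell estimated on its own.
p_none = y / np.maximum(n, 1)

# Partial pooling (crude stand-in): shrink each cell toward the overall rate
# with a Beta prior centered at m; k controls the amount of shrinkage.
m, k = y.sum() / n.sum(), 50.0
p_partial = (y + m * k) / (n + k)

for name, p in [("complete", p_complete), ("none", p_none), ("partial", p_partial)]:
    rmse = np.sqrt(np.mean((p - true_p) ** 2))
    print(f"{name:>8} pooling: RMSE vs true cell probabilities = {rmse:.4f}")
```

With unbalanced cells, the no-pooling estimates are typically noisy in the small cells, while the shrinkage estimates sit between the two extremes.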

Page 9: Minghui Conference Cross-Validation Talk

Overview

1. Multilevel Models

2. Decision-Theoretic Model Assessment Framework

3. Data and Model

4. Results


Page 10: Minghui Conference Cross-Validation Talk


True model, Pseudo-true model and Actual Belief model

We assume there is a true underlying model $p_t(\cdot)$, from which the observations (both available and future) arise.

While acknowledging that the true distribution is never accessible, some researchers propose basing the discussion on a rich enough Actual Belief Model, which supposedly fully reflects the uncertainty about future data (Bernardo and Smith 1994).

Page 11: Minghui Conference Cross-Validation Talk


M-closed, M-completed and M-open views

In the M-closed view, it is assumed that the true model is included in an enumerable collection of models, and the Actual Belief Model is the Bayesian Model Averaging predictive distribution.

In the M-completed view, the Actual Belief Model $p(y \mid D, M)$ is considered to be the best available description of the uncertainty of future data.

In the M-open view, the correct specification of the Actual Belief Model is avoided, and the strategy is to generate Monte Carlo samples from it, for example via sample re-use methods.

Page 12: Minghui Conference Cross-Validation Talk


A Decision-Theoretic Framework

We define a loss function $l(y, a_M)$, which is the loss incurred from our inferential action $a_M$, based on a model $M$, in the face of a future observation $y$.

Then the predictive loss from our inferential action $a_M$ is
$$L_p(p_t, M, D, l) = E_{p_t(y)}\, l(y, a_M) = \int l(y, a_M)\, p_t(y)\, dy$$

It is often convenient and theoretically desirable to use the whole posterior predictive distribution as $a_M$ and the log loss as $l(\cdot,\cdot)$:
$$L_{pred}(p_t, M, D) = E_{p_t}\!\left[-\log p(y \mid D, M)\right] = -\int p_t(y) \log p(y \mid D, M)\, dy$$
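As a sanity check on the definition above, here is a minimal Monte Carlo sketch of the expected log predictive loss for a Bernoulli example in which the true distribution is assumed known. The numbers are hypothetical illustrations of the formula, not the talk's computation.

```python
import numpy as np

# Hypothetical setup: true Bernoulli rate p_t, and a model whose posterior
# predictive probability of y = 1 is q (e.g., a plug-in estimate).
p_t = 0.40   # true distribution p_t(y)
q = 0.41     # model's predictive probability p(y = 1 | D, M)

# Exact expected log predictive loss: E_{p_t}[-log p(y | D, M)].
exact = -(p_t * np.log(q) + (1 - p_t) * np.log(1 - q))

# Monte Carlo version of the same integral, averaging over draws from p_t.
rng = np.random.default_rng(1)
y = rng.binomial(1, p_t, size=1_000_000)
mc = -np.mean(np.where(y == 1, np.log(q), np.log(1 - q)))

print(f"exact: {exact:.5f}   Monte Carlo: {mc:.5f}")
```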

Page 13: Minghui Conference Cross-Validation Talk


Decision-Theoretic Framework Cont'd

For the Model Selection task, from a pool of candidate models $\{M_k : k \in K\}$, we should select the model that minimizes the expected predictive loss:
$$\min_{M_k : k \in K} \; -\int p_t(y) \log p(y \mid D, M_k)\, dy$$

For the Model Assessment task of a particular model $M$, we look at the Kullback-Leibler divergence between the true model and the posterior predictive distribution. We call it the predictive error:
$$\mathrm{Err}(p_t, M, D) = -\int p_t(y) \log p(y \mid D, M)\, dy + \int p_t(y) \log p_t(y)\, dy = \mathrm{KL}\big(p_t(\cdot) \,\|\, p(\cdot \mid D, M)\big)$$


Page 15: Minghui Conference Cross-Validation Talk


Estimating Expected Predictive Loss

The central obstacle in estimating the Expected Predictive Loss is that we don't know the true distribution $p_t(\cdot)$.

An M-closed or M-completed view will substitute the true distribution with a reference distribution.

From an M-open view, plugging in the available sample gives us the Training Loss, which has a downward bias, since we use the sample twice:
$$L_{training}(M, D) = -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M)$$

There exist two approaches to getting an unbiased estimate of the Predictive Loss: bias correction, which leads to various Information Criteria; and held-out practices, which lead to Leave-one-out Cross Validation and k-fold Cross Validation.
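A small numerical sketch of the optimism of the Training Loss, under a hypothetical Bernoulli setup with a plug-in MLE (not the talk's models): on average, the in-sample loss falls below the expected predictive loss.

```python
import numpy as np

rng = np.random.default_rng(2)
p_t, n, reps = 0.3, 20, 20_000

train_losses, pred_losses = [], []
for _ in range(reps):
    y = rng.binomial(1, p_t, size=n)
    p_hat = np.clip(y.mean(), 1e-3, 1 - 1e-3)  # plug-in MLE, kept away from 0/1
    # Training loss: average negative log-likelihood on the same sample.
    train_losses.append(-np.mean(y * np.log(p_hat) + (1 - y) * np.log(1 - p_hat)))
    # Expected predictive loss: same quantity averaged over the true distribution.
    pred_losses.append(-(p_t * np.log(p_hat) + (1 - p_t) * np.log(1 - p_hat)))

print(f"mean training loss:   {np.mean(train_losses):.4f}")  # smaller: optimistic
print(f"mean predictive loss: {np.mean(pred_losses):.4f}")
```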


Page 19: Minghui Conference Cross-Validation Talk


Estimation Methods

There is a long list of variants of Information Criteria: AIC, BIC, DIC, TIC, NIC, WAIC, etc.

LOO Cross Validation has been shown to be asymptotically equivalent to AIC/WAIC, but its computational burden is huge. The Importance Sampling method introduces a new problem: the reliability of the importance weights.

We are using the computationally convenient k-fold cross validation, in which the data set is randomly partitioned into k parts, and in each fold one part is used as the testing set while the rest serve as the training set.
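A minimal sketch of this k-fold machinery, with a random partition and a plug-in Bernoulli rate standing in for the multilevel fits used in the talk; the data and "model" are hypothetical placeholders.

```python
import numpy as np

def kfold_log_loss(y, k=5, seed=0):
    """Average held-out negative log predictive density under a plug-in
    Bernoulli rate fit on each training split."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(y))
    folds = np.array_split(idx, k)          # random partition into k parts
    total = 0.0
    for test in folds:
        train = np.setdiff1d(idx, test)
        p_hat = np.clip(y[train].mean(), 1e-3, 1 - 1e-3)  # fit on the training part
        total += -np.sum(y[test] * np.log(p_hat) + (1 - y[test]) * np.log(1 - p_hat))
    return total / len(y)

rng = np.random.default_rng(3)
y = rng.binomial(1, 0.35, size=1000)
print(f"5-fold CV estimate of predictive loss: {kfold_log_loss(y):.4f}")
```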


Page 22: Minghui Conference Cross-Validation Talk


k-fold Cross Validation

Then the k-fold Cross Validation estimate of the Predictive Loss is given by
$$L_{CV}(M, D) = -\sum_{k=1}^{K} \sum_{i \in \mathrm{test}_k} \log p(y_i \mid D_k, M) = -\sum_{i=1}^{N} \log p(y_i \mid D_{(\setminus i)}, M)$$

To estimate the Predictive Error, we still need an estimate of the Entropy of the true distribution. We can use the training loss of the saturated model as a surrogate:
$$-\int p_t(y) \log p_t(y)\, dy \approx -\frac{1}{n} \sum_{i=1}^{n} \log p(y_i \mid D, M_{saturated})$$
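A hypothetical sketch of this Predictive Error estimate for binomial cell counts: the loss of a candidate model minus the training loss of a "saturated" model (one free rate per cell) used as the entropy surrogate. For brevity the candidate model is scored on the full data here; in the talk that term comes from k-fold cross-validation.

```python
import numpy as np

def binom_log_loss(y, n, p):
    """Per-observation negative log-likelihood for cell counts (y, n)
    under cell probabilities p."""
    p = np.clip(p, 1e-6, 1 - 1e-6)
    return -np.sum(y * np.log(p) + (n - y) * np.log(1 - p)) / n.sum()

rng = np.random.default_rng(4)
n = rng.integers(20, 300, size=40)            # hypothetical cell sizes
true_p = rng.uniform(0.2, 0.6, size=40)
y = rng.binomial(n, true_p)

# Candidate model: here complete pooling (one common rate for all cells).
loss_candidate = binom_log_loss(y, n, np.full(40, y.sum() / n.sum()))

# Saturated model: one free probability per cell; its training loss is the
# entropy surrogate for -∫ p_t log p_t.
loss_saturated = binom_log_loss(y, n, y / n)

print(f"estimated predictive error ≈ {loss_candidate - loss_saturated:.4f}")
```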


Page 24: Minghui Conference Cross-Validation Talk

Overview

1. Multilevel Models

2. Decision-Theoretic Model Assessment Framework

3. Data and Model

4. Results


Page 25: Minghui Conference Cross-Validation Talk


Data Set

Cooperative Congressional Election Survey 2006

N=30,000

71 social and political response outcomes

Deeply nested demographic variables, e.g., state, income, education, ethnicity, gender, etc.

Page 26: Minghui Conference Cross-Validation Talk


Data Set Cont'd

Figure: A sample of the questions in CCES 2006 survey.

Page 27: Minghui Conference Cross-Validation Talk


Model Setup

For demonstration, we only consider two demographic variables, state and income, together with their interaction. The responses are all yes-no binary outcomes.

Complete Pooling:
$\pi_{j_1 j_2} = \mathrm{logit}^{-1}\big(\beta^{stt}_{j_1} + \beta^{inc}_{j_2}\big)$

No Pooling:
$\pi_{j_1 j_2} = \mathrm{logit}^{-1}\big(\beta^{stt}_{j_1} + \beta^{inc}_{j_2} + \beta^{stt \times inc}_{j_1 j_2}\big)$

Partial Pooling:
$\pi_{j_1 j_2} = \mathrm{logit}^{-1}\big(\beta^{stt}_{j_1} + \beta^{inc}_{j_2} + \beta^{stt \times inc}_{j_1 j_2}\big)$, with $\beta^{stt \times inc}_{j_1 j_2} \sim \Phi(\cdot)$

Page 28: Minghui Conference Cross-Validation Talk


k-fold Cross Validation Estimate

Due to computational constraints, we are using the Maximum A Posteriori plug-in estimate instead of the full Bayesian estimate: $p(y \mid D, M) \approx p(y \mid \pi_{ij}(D), M)$.

Then under the aforementioned setup, the Cross Validation estimate of the Predictive Loss is
$$\begin{aligned}
L_{CV}(M, D) &= -\frac{1}{N} \sum_{k=1}^{K} \sum_{l \in \mathrm{test}_k} \log p(y_l \mid D_k, M) \\
&= -\frac{1}{N} \sum_{k=1}^{K} \sum_{i,j} \Big[ y^{\mathrm{test}_k}_{ij} \log \pi_{ij}(D^{\mathrm{train}}_k) + \big(n^{\mathrm{test}_k}_{ij} - y^{\mathrm{test}_k}_{ij}\big) \log\big(1 - \pi_{ij}(D^{\mathrm{train}}_k)\big) \Big] \\
&= -\frac{1}{N} \sum_{i,j} \sum_{k=1}^{K} \Big[ y^{\mathrm{test}_k}_{ij} \log \pi_{ij}(D^{\mathrm{train}}_k) + \big(n^{\mathrm{test}_k}_{ij} - y^{\mathrm{test}_k}_{ij}\big) \log\big(1 - \pi_{ij}(D^{\mathrm{train}}_k)\big) \Big] \\
&= -\frac{1}{N} \sum_{i,j} \Big[ y_{ij} \log \pi_{ij}(D^{\mathrm{train}}) + (n_{ij} - y_{ij}) \log\big(1 - \pi_{ij}(D^{\mathrm{train}})\big) \Big] \\
&= -\sum_{i,j} \frac{n_{ij}}{N} \Big[ \pi_{ij} \log \pi_{ij}(D^{\mathrm{train}}) + (1 - \pi_{ij}) \log\big(1 - \pi_{ij}(D^{\mathrm{train}})\big) \Big]
\end{aligned}$$
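To illustrate the collapse of the observation-level CV loss into the cell-level binomial form used in the last lines above, this hypothetical check computes the same loss both ways for one set of fitted cell probabilities (the fold structure is omitted).

```python
import numpy as np

rng = np.random.default_rng(5)
n = rng.integers(10, 100, size=12)                   # hypothetical cell sizes
y = rng.binomial(n, rng.uniform(0.3, 0.7, size=12))  # successes per cell
pi_hat = np.clip(rng.uniform(0.3, 0.7, size=12), 1e-6, 1 - 1e-6)  # fitted cell probs

# Observation-level form: -(1/N) * sum over individual 0/1 outcomes.
obs = np.concatenate([np.repeat([1, 0], [yi, ni - yi]) for yi, ni in zip(y, n)])
cell = np.repeat(np.arange(12), n)
loss_obs = -np.mean(np.where(obs == 1, np.log(pi_hat[cell]), np.log(1 - pi_hat[cell])))

# Cell-level form: -(1/N) * sum_ij [ y_ij log pi_ij + (n_ij - y_ij) log(1 - pi_ij) ].
loss_cell = -np.sum(y * np.log(pi_hat) + (n - y) * np.log(1 - pi_hat)) / n.sum()

print(f"observation-level: {loss_obs:.6f}   cell-level: {loss_cell:.6f}")
```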


Page 30: Minghui Conference Cross-Validation Talk


Calibration of Improvement

Suppose we have only one cell, with true proportion 0.4. The good model gives a posterior estimate of the proportion of roughly 0.41, and the lesser models give estimates of roughly 0.38 or 0.44. Then the Predictive Loss under the good model is $-[0.4 \log(0.41) + 0.6 \log(0.59)] = 0.67322$, and under the two lesser models it is $-[0.4 \log(0.38) + 0.6 \log(0.62)] = 0.67386$ and $-[0.4 \log(0.44) + 0.6 \log(0.56)] = 0.67628$, respectively. So the improvement in Predictive Loss is between roughly 0.0006 and 0.003.

Also, the lower bound is $-[0.4 \log(0.4) + 0.6 \log(0.6)] = 0.67301$, so the Predictive Error of the good model is about 0.0002.
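The arithmetic on this slide can be checked directly; a tiny script reproduces the numbers (note that 0.38 is the closer of the two lesser estimates to the truth and hence gives the smaller loss):

```python
import numpy as np

def bernoulli_log_loss(p_true, p_hat):
    """Expected negative log predictive density when the truth is Bernoulli(p_true)."""
    return -(p_true * np.log(p_hat) + (1 - p_true) * np.log(1 - p_hat))

p_true = 0.4
for p_hat in (0.41, 0.38, 0.44, 0.40):
    print(f"p_hat = {p_hat:.2f}: loss = {bernoulli_log_loss(p_true, p_hat):.5f}")
# p_hat = 0.41: loss = 0.67322
# p_hat = 0.38: loss = 0.67386
# p_hat = 0.44: loss = 0.67628
# p_hat = 0.40: loss = 0.67301  (the lower bound: the entropy of the truth)
```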

Page 31: Minghui Conference Cross-Validation Talk

Overview

1. Multilevel Models

2. Decision-Theoretic Model Assessment Framework

3. Data and Model

4. Results


Page 32: Minghui Conference Cross-Validation Talk


Cross Validation Results on All Outcomes

[Figure: Estimated Predictive Error (y-axis, roughly 0.01 to 0.05) versus response (x-axis, the 71 outcomes), for the complete pooling, partial pooling, and no pooling models.]

Figure: Measure of fit (Estimated Predictive Error) for all response outcomes in the CCES 2006 survey data. Responses are ordered by the lower bound (training loss of the saturated model). The No Pooling model gives a very bad fit, while the Predictive Error of Partial Pooling is dominated by that of Complete Pooling, but the differences seem small.

Page 33: Minghui Conference Cross-Validation Talk


Compare Partial Pooling and Complete Pooling

In the previous figure, No Pooling is apparently doing very badly, but the differences between Partial Pooling and Complete Pooling seem small. We need to calibrate them further.

The summary of the differences between Partial Pooling and Complete Pooling over all the outcomes is:

      Min.    1st Qu.     Median       Mean    3rd Qu.       Max.
-0.0003405  0.0001821  0.0003827  0.0006041  0.0005630  0.0053770

We can see that the improvement in terms of the Predictive Loss indeed corresponds to some meaningful improvement in prediction accuracy.


Page 36: Minghui Conference Cross-Validation Talk


Simulations Based on Real Data

We want to explore how the structure of the multilevel models affects the relative performance of the different models. Specifically, we are interested in the total sample size and in how balanced the cells are in terms of cell size.

We generated simulated data sets based on the real data set, i.e., we use the estimates from the Multilevel model fit to the real data set and enlarge the total sample size by 2, 3 and 4 times, either keeping the original relative proportions (highly unequal) of the different cells or making the proportions roughly equal.
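A hedged sketch of this augmentation scheme: given fitted cell probabilities and the observed cell sizes, new data sets are drawn with the total sample size scaled up, either preserving the original (unequal) cell proportions or equalizing them. The fitted probabilities below are placeholders, not the actual CCES fits.

```python
import numpy as np

def simulate_augmented(pi_fit, n_cells, scale=2, balanced=False, seed=0):
    """Draw a simulated data set with total size scale * sum(n_cells),
    either keeping the original cell proportions or making them equal."""
    rng = np.random.default_rng(seed)
    total = scale * n_cells.sum()
    if balanced:
        new_n = np.full(len(n_cells), total // len(n_cells))
    else:
        new_n = np.round(total * n_cells / n_cells.sum()).astype(int)
    new_y = rng.binomial(new_n, pi_fit)    # binary outcomes drawn from the fitted rates
    return new_n, new_y

# Hypothetical fitted cell probabilities and original (unequal) cell sizes.
pi_fit = np.array([0.2, 0.35, 0.5, 0.65])
n_cells = np.array([40, 400, 120, 15])
for balanced in (False, True):
    n_new, y_new = simulate_augmented(pi_fit, n_cells, scale=3, balanced=balanced)
    print("balanced" if balanced else "original", n_new, y_new)
```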


Page 38: Minghui Conference Cross-Validation Talk


Simulation Results: Total Sample Size

[Figure: three panels of Estimated Predictive Error versus response (ordered by the lower bound), for the complete pooling, partial pooling, and no pooling models, one panel per enlarged total sample size.]

Figure: Estimated Predictive Error of all response outcomes for the "augmented" data sets.

Page 39: Minghui Conference Cross-Validation Talk


Simulation Results: Total Sample Size on House Rep Vote

[Figure: Estimated Predictive Error (roughly 0.002 to 0.014) versus total sample size (about 50,000 to 200,000), for the complete pooling, partial pooling, and no pooling models.]

Figure: Predictive Error of the three models as the sample size grows. The outcome under consideration is the Republican vote in the House election.

Page 40: Minghui Conference Cross-Validation Talk


Simulation Results: Balancedness of the Structure

[Figure: Estimated Predictive Error (roughly 0.010 to 0.030) versus response (ordered by the lower bound), for the complete pooling, partial pooling, and no pooling models, on the balanced simulated data.]

Figure: Measure of fit (Predictive Error) for all responses, ordered by the lower bound. The data set is simulated from the real data set and has the same total sample size as the real data set, but keeps all demographic-geographic cells balanced.

Page 41: Minghui Conference Cross-Validation Talk


Conclusions

Cross-validation is not a very sensitive instrument for comparing multilevel models.

Careful calibration is needed for a better understanding of the results.

We also explored how different aspects of the data set's structure affect the margin of improvement.