mcmc estimation for random effect modelling – the mlwin experience dr william j. browne school of...

MCMC Estimation for Random Effect Modelling – The MLwiN

experienceDr William J. Browne

School of Math Sciences

University of Nottingham

Contents

• Random effect modelling and MLwiN.• Methods comparison – Guatemalan child

health example.• Extendibility of MCMC algorithms:• Cross classified and multiple membership

models.• Artificial insemination and Danish chicken

examples.• Further Extensions.

Random effect models

• Models that account for the underlying structure in the dataset.

• Originally developed for nested structures (multilevel models), for example in education, pupils nested within schools.

• An extension of linear modelling with the inclusion of random effects.

A typical 2-level model is

Here i indexes pupils and j indexes schools.

),0(~),,0(~ 22

10

eijuj

ijjijij

NeNu

euxy

MLwiN

• Software package designed specifically for fitting multilevel models.

• Developed by a team led by Harvey Goldstein and Jon Rasbash at the Institute of Education in London over past 15 years or so. Earlier incarnations ML2, ML3, MLN.

• Originally contained ‘classical’ IGLS estimation methods for fitting models.

• MLwiN launched in 1998 also included MCMC estimation.

• My role in team was as developer of MCMC functionality in MLwiN during 4.5 years at the IOE.

Estimation Methods for Multilevel Models

Due to additional random effects no simple matrix formulae exist for finding estimates in multilevel models.

Two alternative approaches exist:1. Iterative algorithms e.g. IGLS, RIGLS, EM in

HLM that alternate between estimating fixed and random effects until convergence. Can produce ML and REML estimates.

2. Simulation-based Bayesian methods e.g. MCMC that attempt to draw samples from the posterior distribution of the model.

Methods for non-normal responses

• When the response variable is Binomial or Poisson then different algorithms are required.

• IGLS/RIGLS methods give quasilikelihood estimates e.g. MQL, PQL.

• MCMC algorithms including Metropolis Hastings sampling and Adaptive Rejection sampling are possible.

• Numerical Quadrature can give ML estimates but is not without problems.

So why use MCMC?

• Often gives better estimates for non-normal responses.

• Gives full posterior distribution so interval estimates for derived quantities are easy to produce.

• Can easily be extended to more complex problems.

• Potential downside 1: Prior distributions required for all unknown parameters.

• Potential downside 2: MCMC estimation is much slower than the IGLS algorithm.

The Guatemalan Child Health dataset.

This consists of a subsample of 2,449 respondents from the 1987 National Survey of Maternal and Child Helath, with a 3-level structure of births within mothers within communities.

The subsample consists of all women from the chosen communities who had some form of prenatal care during pregnancy. The response variable is whether this prenatal care was modern (physician or trained nurse) or not.

Rodriguez and Goldman (1995) use the structure of this dataset to consider how well quasi-likelihood methods compare with considering the dataset without the multilevel structure and fitting a standard logistic regression.

They perform this by constructing simulated datasets based on the original structure but with known true values for the fixed effects and variance parameters.

They consider the MQL method and show that the estimates of the fixed effects produced by MQL are worse than the estimates produced by standard logistic regression disregarding the multilevel structure!

The Guatemalan Child Health dataset.

Goldstein and Rasbash (1996) consider the same problem but use the PQL method. They show that the results produced by PQL 2nd order estimation are far better than for MQL but still biased.

The model in this situation is

In this formulation i,j and k index the level 1, 2 and 3 units respectively.

The variables x1,x2 and x3 are composite scales at each level because the original model contained many covariates at each level.

Browne and Draper (2004) considered the hybrid Metropolis-Gibbs method in MLwiN and two possible variance priors (Gamma-1(ε,ε) and Uniform.

).,0(~ and ),0(~ where

)(logit

with)(Bernouilli~

22

3322110

vkujk

kjkkjkijkijk

ijkijk

NvNu

vuxxxp

py

Simulation Results

The following gives point estimates (MCSE) for 4 methods and 500 simulated datasets.

Parameter (True)

MQL1 PQL2 Gamma Uniform

β0 (0.65) 0.474 (0.01) 0.612 (0.01) 0.638 (0.01) 0.655 (0.01)

β1 (1.00) 0.741 (0.01) 0.945 (0.01) 0.991 (0.01) 1.015 (0.01)

β2 (1.00) 0.753 (0.01) 0.958 (0.01) 1.006 (0.01) 1.031 (0.01)

β3 (1.00) 0.727 (0.01) 0.942 (0.01) 0.982 (0.01) 1.007 (0.01)

σ2v (1.00) 0.550 (0.01) 0.888 (0.01) 1.023 (0.01) 1.108 (0.01)

σ2u (1.00) 0.026 (0.01) 0.568 (0.01) 0.964 (0.02) 1.130 (0.02)

Simulation Results

The following gives interval coverage probabilities (90%/95%) for 4 methods and 500 simulated datasets.

Parameter (True)

MQL1 PQL2 Gamma Uniform

β0 (0.65) 67.6/76.8 86.2/92.0 86.8/93.2 88.6/93.6

β1 (1.00) 56.2/68.6 90.4/96.2 92.8/96.4 92.2/96.4

β2 (1.00) 13.2/17.6 84.6/90.8 88.4/92.6 88.6/92.8

β3 (1.00) 59.0/69.6 85.2/89.8 86.2/92.2 88.6/93.6

σ2v (1.00) 0.6/2.4 70.2/77.6 89.4/94.4 87.8/92.2

σ2u (1.00) 0.0/0.0 21.2/26.8 84.2/88.6 88.0/93.0

Summary of simulations

The Bayesian approach yields excellent bias and coverage results.

For the fixed effects, MQL performs badly but the other 3 methods all do well.

For the random effects, MQL and PQL both perform badly but MCMC with both priors is much better.

Note that this is an extreme scenario with small levels 1 in level 2 yet high level 2 variance and in other examples MQL/PQL will not be so bad.

Extension 1: Cross-classified modelsFor example, schools by neighbourhoods. Schools will draw pupils from many different neighbourhoods and the pupils of a neighbourhood will go to several schools. No pure hierarchy can be found and pupils are said to be contained within a cross-classification of schools by neighbourhoods :

nbhd 1 nbhd 2 Nbhd 3

School 1 xx x

School 2 x x

School 3 xx x

School 4 x xxx

School S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

Notation

With hierarchical models we use a subscript notation that has one subscript per level and nesting is implied reading from the left. For example, subscript pattern ijk denotes the i’th level 1 unit within the j’th level 2 unit within the k’th level 3 unit.

If models become cross-classified we use the term classification instead of level. With notation that has one subscript per classification, that captures the relationship between classifications, notation can become very cumbersome. We propose an alternative notation introduced in Browne et al. (2001) that only has a single subscript no matter how many classifications are in the model.

Single subscript notationSchool S1 S2 S3 S4

Pupil P1 P2 P3 P4 P5 P6 P7 P8 P9 P10 P11 P12

Nbhd N1 N2 N3

i nbhd(i) sch(i)1 1 12 2 13 1 14 2 25 1 26 2 27 2 38 3 39 3 410 2 411 3 412 3 4

)1()3()(

)2()(0 iischinbhdi euuy

We write the model as

1)3(

4)2(

3011

1)3(

1)2(

101

euuy

euuy

Where classification 2 is neighbourhood and classification 3 is school. Classification 1 always corresponds to the classification at which the response measurements are made, in this case patients. For pupils 1 and 11 equation (1) becomes:

Classification diagrams

School

Pupil

Neighbourhood

School

Pupil

Neighbourhood

Nested structure where schools are contained within neighbourhoods

Cross-classified structure where pupils from a school come from many neighbourhoods and pupils from a neighbourhood attend several schools.

In the single subscript notation we lose information about the relationship(crossed or nested) between classifications. A useful way of conveying this information is with the classification diagram. Which has one node per classification and nodes linked by arrows have a nested relationship and unlinked nodes have a crossed relationship.

Example : Artificial insemination by donor

Women w1 w2 w3 Cycles c1 c2 c3 c4… c1 c2 c3 c4… c1 c2 c3 c4… Donations d1 d2 d1 d2 d3 d1 d2 Donors m1 m2 m3

1901 women279 donors 1328 donations12100 ovulatory cyclesresponse is whether conception occurs in a given cycle

In terms of a unit diagram:

Donor

Woman

Cycle

Donation

Or a classification diagram:

Model for artificial insemination data

),0(~

),0(~

),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2()(

)4()(

)3()(

)2()(i

uidonor

uidonation

uiwoman

idonoridonationiwomani

ii

Nu

Nu

Nu

uuuX

Binomialy

We can write the model as

2)4(u

0

1

2

3

4

5

6

7

2)2(u

2)3(u

Parameter Description Estimate(se)

intercept -4.04(2.30)

azoospermia * 0.22(0.11)

semen quality 0.19(0.03)

womens age>35 -0.30(0.14)

sperm count 0.20(0.07)

sperm motility 0.02(0.06)

insemination to early -0.72(0.19)

insemination to late -0.27(0.10)

women variance 1.02(0.21)

donation variance 0.644(0.21)

donor variance 0.338(0.07)

Results:

Note cross-classified models can be fitted in IGLS but are far easier to fit using MCMC estimation.

Extension 2: Multiple membership models

When level 1 units are members of more than one higher level unit we describe a model for such data as a multiple membership model.

For example,

• Pupils change schools/classes and each school/class has an effect on pupil outcomes.

• Patients are seen by more than one nurse during the course of their treatment.

Notation

),0(~

)2(),0(~

)(

2

2)2(

)2(

)(

)2()2(,

ei

uj

inursejijjiii

Ne

Nu

euwXBy

Note that nurse(i) now indexes the set of nurses that treat patient i and w(2)

i,j is a weighting factor relating patient i to nurse j. For example, with four patients and three nurses, we may have the following weights:

n1(j=1) n2(j=2) n3(j=3)

p1(i=1) 0.5 0 0.5

p2(i=2) 1 0 0

p3(i=3) 0 0.5 0.5

p4(i=4) 0.5 0.5 0

4)2(

2)2(

144

3)2(

3)2(

233

2)2(

122

1)2(

3)2(

111

5.05.0)(

5.05.0)(

1)(

5.05.0)(

euuXBy

euuXBy

euXBy

euuXBy

Here patient 1 was seen by nurse 1 and 3 but not nurse 2 and so on. If we substitute the values of w(2)

i,j , i and j. from the table into (2) we get the series of equations :

Classification diagrams for multiple membership relationships

Double arrows indicate a multiple membership relationship between classifications.

patient

nurseWe can mix multiple membership, crossed and hierarchical structures in a single model.

patient

nurse

hospital

GP practice

Here patients are multiple members of nurses, nurses are nested within hospitals and GP

practice is crossed with both nurse and hospital.

Example involving nesting, crossing and multiple membership – Danish chickens

Production hierarchy10,127 child flocks 725 houses 304 farms

Breeding hierarchy10,127 child flocks200 parent flocks

farm f1 f2… Houses h1 h2 h1 h2 Child flocks c1 c2 c3… c1 c2 c3…. c1 c2 c3…. c1 c2 c3…. Parent flock p1 p2 p3 p4 p5….

Child flock

House

Farm

Parent flock

As a unit diagram: As a classification diagram:

Model and results

),0(~

),0(~),0(~

)()logit(

),1(~

2)4(

)4()(

2)3(

)3()(

2)2(

)2(

)(.

)4()(

)3()(

)2()2(,i

uifarm

uihouseuj

iflockpjiifarmihousejjii

ii

Nu

NuNu

euuuwXB

Binomialy

0

1

2

3

4

5

2)2(u

2)3(u

2)4(u

Parameter Description Estimate(se)

intercept -2.322(0.213)

1996 -1.239(0.162)

1997 -1.165(0.187)

hatchery 2 -1.733(0.255)

hatchery 3 -0.211(0.252)

hatchery 4 -1.062(0.388)

parent flock variance 0.895(0.179)

house variance 0.208(0.108)

farm variance 0.927(0.197)

Results:

Note multiple membership models can be fitted in IGLS and this model/dataset represents roughly the most complex model that the method can handle.

Such models are far easier to fit using MCMC estimation.

Further Extensions / Work in progress

1. Multilevel factor models

2. Response variables at different levels

3. Missing data and multiple imputation

4. Sample size calculations

Multilevel factor analysis modelling

• In survey analysis often there are many responses for each individual.

• Techniques like factor analysis are often used to identify underlying latent traits amongst the responses.

• Multilevel factor analysis allows factor analysis modelling to identify factors at various classifications in the dataset.

• Due to the nature of MCMC algorithms by adding a step to allow for multilevel factor models in MLwiN, cross-classified models can also be fitted without any additional programming!

• See Goldstein and Browne (2002,2005) for more detail.

Responses at different levels

• In a household survey some responses refer to individuals in a household while others refer to the household itself.

• Models that combine these responses can be fitted using the IGLS algorithm in MLwiN and shouldn’t pose any problems to MCMC.

• The Centre for Multilevel modelling in Bristol are investigating such models.

• See also Nicky Best’s talk on combining datasets which is a similar issue.

Missing data and multiple imputation

• Missing data is proliferate in survey research. • One popular approach to dealing with missing data is

multiple imputation (Rubin 1987) where several imputed datasets are created and then the model of interest is fitted to each dataset and the estimates combined.

• Using a multivariate normal response multilevel model to generate the imputations using MCMC in MLwiN is described in chapter 17 of Browne (2003)

• James Carpenter (LSHTM) has begun work on macros in MLwiN that automate the multiple imputation procedure.

Sample size calculations

• Another issue in data collection is how big a sample do we need to collect?

• Such sample size calculations have simple formulae if we can assume that an independent sample can be generated.

• If however we wish to account for the data structure in the calculation then things are more complex.

• One possibility is a simulation-based approach similar to that used in the model comparisons described earlier where many datasets are simulated to look at the power for a fixed sample size.

• Bonne Zijlstra will be joining me in Nottingham in January on an ESRC grant looking at such an approach.

Conclusions

• In this talk we have attempted to show the flexibility of MCMC methods in attempting to match the complexity of data structures found in real problems.

• We have also contrasted the methods with the IGLS algorithm.

• Although we have used MLwiN, all the models considered here could also be fitted in WinBUGS.

• WinBUGS offers even more flexibility in model choice for complex data structures however it is typically slower in fitting models that can also be fitted in MLwiN.

• Genstat and the GLLAMM package in Stata also deserve mention as alternatives in the ML/REML world for the models considered.

References • Browne, W.J. (2003). MCMC Estimation in MLwiN. London: Institute of Education,

University of London.• Browne, W.J. and Draper, D. (2004). A Comparison of Bayesian and Likelihood-

based methods for fitting multilevel models. University of Nottingham Research Report 04-01

• Browne, W.J., Goldstein, H. and Rasbash, J. (2001). Multiple membership multiple classification (MMMC) models. Statistical Modelling 1: 103-124.

• Goldstein, H. and Browne, W. J. (2002). Multilevel factor analysis modelling using Markov Chain Monte Carlo (MCMC) estimation. In Marcoulides and Moustaki (Eds.), Latent Variable and Latent Structure Models. p 225-243. Lawrence Erlbaum, New Jersey.

• Goldstein, H. and Browne, W.J. (2005). Multilevel Factor Analysis Models for Continuous and Discrete Data. In Maydeu-Olivares, A and McArdle, J.J. (Eds.), Contemporary psychometrics: a festschrift for Roderick P. McDonald, p 453-475. Lawrence Erlbaum, New Jersey.

• Goldstein, H. and Rasbash, J. (1996). Improved approximations for multilevel models with binary responses. Journal of the Royal Statistical Society, A. 159: 505-13.

• Rodriguez, G. and Goldman, N. (1995). An assessment of estimation procedures for multilevel models with binary responses. Journal of the Royal Statistical Society, Series A, 158, 73-89.

• Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. New York: J. Wiley and Sons.

mcmc estimation for random effect modelling – the mlwin experience dr william j. browne school of...

Documents

mcmc estimation

random effect models

multilevel structure

fitting multilevel models

fitting models

interval estimates

iglsrigls methods

fixed effects