Longitudinal Data
Analysis
Geert Verbeke
Geert Molenberghs
Food and Drug Administration
Friday, August 1, 2003
Contents
1 Related References 1
2 Introduction 5
2.1 Repeated Measures / Longitudinal Data . . . . . . . . . . . . . . . . . . . . . 6
2.2 Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 The BLSA Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Simple Methods 14
3.1 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 A Model for Longitudinal Data 17
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 A 2-stage Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.1 Stage 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.2 Stage 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
i
4.3 Example: The Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4 The General Linear Mixed-effects Model . . . . . . . . . . . . . . . . . . . . . 26
4.5 Hierarchical versus Marginal Model . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Example: The Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.7 A Model for the Residual Covariance Structure . . . . . . . . . . . . . . . . . . 34
5 Estimation and Inference in the Marginal Model 39
5.1 ML and REML Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.2 Maximum Likelihood Estimation (ML) . . . . . . . . . . . . . . . . . . 43
5.1.3 Restricted Maximum Likelihood Estimation (REML) . . . . . . . . . . . 44
5.2 Fitting Linear Mixed Models in SAS . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Hierarchical versus Marginal Model . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3.1 Example 1: The rat data . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3.2 Example 2: Bivariate Observations . . . . . . . . . . . . . . . . . . . . 54
5.4 Inference for the Mean Structure . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.2 Approximate Wald Test . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4.3 Approximate t-test and F -test . . . . . . . . . . . . . . . . . . . . . . . 58
5.4.4 Robust Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
ii
5.4.5 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.5 Inference for the Variance Components . . . . . . . . . . . . . . . . . . . . . . 65
5.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5.2 Caution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6 Inference for the Random Effects 68
6.1 Empirical Bayes Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Example: Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Shrinkage Estimators bi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.4 Example: Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.5 The Normality Assumption for Random Effects . . . . . . . . . . . . . . . . . . 79
7 Non-Continuous Data: Motivating Studies 82
7.1 Toenail Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.2 The Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8 Generalized Linear Models 86
8.1 The Bernoulli Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.2 Generalized Linear Models (GLM) . . . . . . . . . . . . . . . . . . . . . . . . . 89
9 Parametric Modeling Families 90
9.1 Continuous Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
iii
9.2 Longitudinal Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . 91
9.2.1 Marginal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.2.2 Random-effects Models . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10 Generalized Estimating Equations 94
10.1 Large Sample Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.2 Unknown Covariance Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 97
10.3 The Sandwich Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
10.4 The Working Correlation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 99
10.5 Estimation of Working Correlation . . . . . . . . . . . . . . . . . . . . . . . . . 100
10.6 Fitting GEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.7 Application to Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.8 Code and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.9 Comparison of GEE Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
11 Random-Effects Models 105
12 Generalized Linear Mixed Models 106
12.1 Generalized Linear Mixed Models (GLMM) . . . . . . . . . . . . . . . . . . . . 107
12.2 Numerical Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3 Application to Toenail Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
iv
12.3.1 SAS PROC NLMIXED Program . . . . . . . . . . . . . . . . . . . . . . 113
12.3.2 SAS output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
12.3.3 Accuracy of Numerical Integration . . . . . . . . . . . . . . . . . . . . 116
13 Variations to a Common Theme 118
13.1 Application to the Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.1.1 PROC NLMIXED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.1.2 MIXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.1.3 PQL2 (MLwiN) Without and With Extra-dispersion Parameter . . . . . . 122
13.1.4 PQL (MLwiN) Without and With Extra-dispersion Parameter . . . . . . 123
13.1.5 PQL (GLIMMIX) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13.1.6 Overview of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
14 Marginal Versus Random-effects Inference 126
14.1 Marginal Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.2 Sampling Based Marginalization on Toenail Data . . . . . . . . . . . . . . . . . 133
15 Key Points 137
16 Missing Data 138
16.1 Factorizing the Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
16.2 Missing Data Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
v
16.3 Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
16.4 Model Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
17 Simple Methods 144
17.1 Simple Methods and Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . 145
18 Three Likelihood Approaches 146
19 Direct Likelihood Approach 147
19.1 Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
20 Weighted Estimating Equations 151
20.1 A Model for Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
20.2 Computing the Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
20.3 Weighted Estimating Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 157
21 Selection Models for MNAR and Sensitivity Analysis 159
21.1 Mastitis in Dairy Cattle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
21.2 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
vi
Chapter 1
Related References
• Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.
• Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.
• Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.
• Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.
1
• Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.
• Diggle, P.J., Heagerty, P.J., Liang, K.Y., and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edition). Oxford: Oxford University Press.
• Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.
• Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.
• Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.
• Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.
2
• Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.
• Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.
• Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
• Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.
• McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd edition). London: Chapman and Hall.
• Molenberghs, G. and Verbeke, G. (2004). Discrete Repeated Measures. New York: Springer-Verlag (in preparation).
3
• Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-PLUS. Springer Series in Statistics and Computing. New York: Springer-Verlag.
• Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.
• Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.
• Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models in Practice: A SAS-Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.
• Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.
• Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.
4
Chapter 2
Introduction
• Repeated Measures / Longitudinal data
• Examples
5
2.1 Repeated Measures / Longitudinal Data
Repeated measures are obtained when a response is measured repeatedly on a set of units
• Units:
. Subjects, patients, participants, . . .
. Animals, plants, . . .
. Clusters: families, towns, branches of a company, . . .
. . . .
• Special case: Longitudinal data
6
2.2 Rat Data
• Department of dentistry, K.U.Leuven
• Research question:
How does craniofacial growth depend on testosterone production?
• Randomized experiment in which 50 male Wistar rats are randomized to:
. Control (15 rats)
. Low dose of Decapeptyl (18 rats)
. High dose of Decapeptyl (17 rats)
• Treatment starts at the age of 45 days.
7
• Measurements taken every 10 days, from day 50 on.
• The responses are distances (pixels) between well-defined points on X-ray pictures of the skull of each rat:
• Measurements with respect to the roof, base and height of the skull.
• Here, we consider only one response, reflecting the height of the skull.
8
• Individual profiles:
• Complication: Dropout due to anaesthesia (56%):
# Observations
Age (days) Control Low High Total
50 15 18 17 50
60 13 17 16 46
70 13 15 15 43
80 10 15 13 38
90 7 12 10 29
100 4 10 10 24
110 4 8 10 22
9
• Remarks:
. Much variability between rats
. Much less variability within rats
. Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for a known reason.
. Measurements taken at fixed timepoints
10
2.3 The BLSA Prostate Data
• References:
. Carter et al., Cancer Research (1992).
. Carter et al., Journal of the American Medical Association (1992).
. Morrell et al., Journal of the American Statistical Association (1995).
. Pearson et al., Statistics in Medicine (1994).
• Prostate disease is one of the most common and most costly medical problems in the United States
• Important to look for markers which can detect the disease at an early stage
• Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells
• PSA level is related to the volume of prostate tissue.
11
• Problem: Patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level
• Overlap in PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer.
• Research question (hypothesis based on clinical practice):
Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
• A retrospective case-control study based on frozen serum samples:
. 16 control patients
. 20 BPH cases
. 14 local cancer cases
. 4 metastatic cancer cases
12
• Complication: No perfect match for age at diagnosis and years of follow-up possible
• Individual profiles:
• Remarks:
. Much variability between subjects
. Little variability within subjects
. Highly unbalanced data
13
Chapter 3
Simple Methods
• Summary statistics
• Disadvantages
14
3.1 Summary Statistics
• General procedure:
. Step 1: Summarize the data of each subject into one statistic, a summary statistic
. Step 2: Analyze the summary statistics, e.g. analysis of covariance to compare groups after correction for important covariates
• This way, the analysis of longitudinal data is reduced to the analysis of independent observations, for which classical statistical procedures are available.
• Examples:
. Analysis at each time point separately
. Analysis of Area Under the Curve (AUC)
. Analysis of endpoints
. Analysis of increments
. Analysis of covariance
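As a sketch of the two-step procedure, the following Python fragment (with made-up profiles; the course itself uses SAS) reduces each subject's profile to an area under the curve (AUC) and then compares the group means of those independent summaries:

```python
# Sketch of the two-step procedure on made-up profiles (the course itself
# uses SAS): step 1 reduces each subject to one number, the area under the
# curve (AUC); step 2 analyzes those independent summaries classically.

def auc(times, values):
    """Trapezoidal area under one subject's response profile."""
    return sum((t2 - t1) * (v1 + v2) / 2.0
               for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:]))

times = [50, 60, 70]                          # measurement days
group_a = [[68, 70, 73], [66, 70, 73]]        # hypothetical control profiles
group_b = [[69, 74, 78], [70, 75, 79]]        # hypothetical treated profiles

auc_a = [auc(times, y) for y in group_a]
auc_b = [auc(times, y) for y in group_b]

# Step 2: any classical method applies; here, the difference in group means
# (in practice one would feed the AUCs into a t-test or ANCOVA).
diff = sum(auc_b) / len(auc_b) - sum(auc_a) / len(auc_a)
print(auc_a, auc_b, diff)
```

The profile data and group labels are purely illustrative; only the reduction-then-analyze structure mirrors the procedure above.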
15
3.2 Disadvantages
• (Lots of) information is lost
• They often do not allow one to draw conclusions about the way the endpoint has been reached
• Results are not always meaningful, e.g., analysis of covariance
16
Chapter 4
A Model for Longitudinal Data
• Introduction
• The 2-stage model formulation
• Example: Rat data
• The general linear mixed-effects model
• Hierarchical versus marginal model
• Example: Rat data
• A model for the residual covariance structure
17
4.1 Introduction
• In practice: often unbalanced data:
. unequal number of measurements per subject
. measurements not taken at fixed time points
• Therefore, multivariate regression techniques are often not applicable
• Often, subject-specific longitudinal profiles can be well approximated by linear regression functions
• This leads to a 2-stage model formulation:
. Stage 1: Linear regression model for each subject separately
. Stage 2: Explain variability in the subject-specific regression coefficients using known covariates
18
4.2 A 2-stage Model Formulation
4.2.1 Stage 1
• Response Yij for the ith subject, measured at time tij, i = 1, . . . , N, j = 1, . . . , ni
• Response vector Yi for ith subject:
Yi = (Yi1, Yi2, . . . , Yini)′
• Stage 1 model:
Yi = Ziβi + εi
• Zi is a (ni × q) matrix of known covariates
• βi is a q-dimensional vector of subject-specificregression coefficients
19
• εi ∼ N (0,Σi), often Σi = σ2Ini
• Note that the above model describes the observed variability within subjects
20
4.2.2 Stage 2
• Between-subject variability can now be studied by relating the βi to known covariates
• Stage 2 model:
βi = Kiβ + bi
• Ki is a (q × p) matrix of known covariates
• β is a p-dimensional vector of unknown regression parameters
• bi ∼ N (0, D)
21
4.3 Example: The Rat Data
• Individual profiles:
22
• Transformation of the time scale to linearize the profiles:
\[
\text{Age}_{ij} \;\longrightarrow\; t_{ij} = \ln[1 + (\text{Age}_{ij} - 45)/10]
\]
• Note that t = 0 corresponds to the start of the treatment (moment of randomization)
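A quick numerical check of this transformation (illustrative Python, not part of the course material):

```python
import math

# Quick numerical check of the time transformation (illustrative, not from
# the course): t = ln[1 + (Age - 45)/10] is 0 at randomization (day 45) and
# compresses later ages, linearizing the growth profiles.

def t_transform(age):
    return math.log(1 + (age - 45) / 10.0)

ages = [50, 60, 70, 80, 90, 100, 110]        # measurement days in the study
times = [round(t_transform(a), 3) for a in ages]
print(times)
```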
• Stage 1 model:
Yij = β1i + β2itij + εij, j = 1, . . . , ni,
• Matrix notation:
Yi = Ziβi + εi
• Zi matrix:
\[
Z_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}
\]
23
• In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
• Stage 2 model:
β1i = β0 + b1i,
β2i = β1Li + β2Hi + β3Ci + b2i,
• Li, Hi, and Ci are indicator variables:
\[
L_i = \begin{cases} 1 & \text{if low dose} \\ 0 & \text{otherwise} \end{cases}
\qquad
H_i = \begin{cases} 1 & \text{if high dose} \\ 0 & \text{otherwise} \end{cases}
\qquad
C_i = \begin{cases} 1 & \text{if control} \\ 0 & \text{otherwise} \end{cases}
\]
24
• Parameter interpretation:
. β0: average response at the start of the treatment (independent of treatment)
. β1, β2, and β3: average time effect for each treatment group
25
4.4 The General Linear Mixed-effects Model
• A 2-stage approach can be performed explicitly in the analysis
• However, this is just another example of the use of summary statistics:
. Yi is summarized by β̂i
. summary statistics β̂i analysed in the second stage
• The associated drawbacks can be avoided by combining the two stages into one model:
\[
\left.
\begin{aligned}
Y_i &= Z_i\beta_i + \varepsilon_i \\
\beta_i &= K_i\beta + b_i
\end{aligned}
\right\}
\;\Longrightarrow\;
Y_i = Z_iK_i\beta + Z_ib_i + \varepsilon_i = X_i\beta + Z_ib_i + \varepsilon_i
\]
26
• General linear mixed-effects model:
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D),
εi ∼ N (0,Σi),
b1, . . . , bN, ε1, . . . , εN independent,
• Terminology:
. Fixed effects: β
. Random effects: bi
. Variance components: elements in D and Σi
27
4.5 Hierarchical versus Marginal Model
• The general linear mixed model is given by:
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D),
εi ∼ N (0,Σi),
b1, . . . , bN, ε1, . . . , εN independent,
• It can be rewritten as:
Yi|bi ∼ N (Xiβ + Zibi,Σi)
bi ∼ N (0, D)
28
• It is therefore also called a hierarchical model:
. A model for Yi given bi
. A model for bi
• Marginally, we have that Yi is distributed as:
Yi ∼ N (Xiβ, ZiDZ′i + Σi)
• Hence, very specific assumptions are made about the dependence of mean and covariance on the covariates Xi and Zi:
. Implied mean : Xiβ
. Implied covariance : Vi = ZiDZ′i + Σi
• Note that the hierarchical model implies the marginalone, not vice versa
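The implied marginal covariance Vi = ZiDZ′i + Σi is straightforward to compute; a minimal numpy sketch with made-up variance components:

```python
import numpy as np

# Sketch (made-up variance components): the hierarchical model implies the
# marginal covariance V_i = Z_i D Z_i' + Sigma_i; here Sigma_i = sigma^2 I
# and Z_i holds a random intercept and a random slope.

t = np.array([0.0, 0.5, 1.0, 1.5])           # measurement times, subject i
Z = np.column_stack([np.ones_like(t), t])
D = np.array([[3.0, 0.1],
              [0.1, 0.4]])                   # Var(b_1i), Var(b_2i), their cov
sigma2 = 1.5

V = Z @ D @ Z.T + sigma2 * np.eye(len(t))
# Diagonal of V: d22*t^2 + 2*d12*t + d11 + sigma^2 -- quadratic in time.
print(np.round(np.diag(V), 3))
```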
29
4.6 Example: The Rat Data
• Stage 1 model:
Yij = β1i + β2itij + εij, j = 1, . . . , ni,
• Stage 2 model:
β1i = β0 + b1i,
β2i = β1Li + β2Hi + β3Ci + b2i,
• Combined:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + (\beta_1 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + (\beta_2 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + (\beta_3 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}
\]
30
• Implied marginal mean structure:
. Linear average evolution in each group
. Equal average intercepts
. Different average slopes
• Implied marginal covariance structure (Σi = σ²Ini):
\[
\mathrm{Cov}(Y_i(t_1), Y_i(t_2))
= \begin{pmatrix} 1 & t_1 \end{pmatrix} D \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \sigma^2\delta_{\{t_1,t_2\}}
= d_{22}t_1t_2 + d_{12}(t_1 + t_2) + d_{11} + \sigma^2\delta_{\{t_1,t_2\}}.
\]
• Note that the model implicitly assumes that the variance function is quadratic over time, with positive curvature d22.
31
• A model which assumes that all variability in subject-specific slopes can be ascribed to treatment differences can be obtained by omitting the random slopes b2i from the above model:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i)t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + \beta_1 t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + \beta_2 t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + \beta_3 t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}
\]
• This is the so-called random-intercepts model
• The same marginal mean structure is obtained as under the model with random slopes
32
• Implied marginal covariance structure (Σi = σ²Ini):
\[
\mathrm{Cov}(Y_i(t_1), Y_i(t_2))
= \begin{pmatrix} 1 \end{pmatrix} D \begin{pmatrix} 1 \end{pmatrix} + \sigma^2\delta_{\{t_1,t_2\}}
= d_{11} + \sigma^2\delta_{\{t_1,t_2\}}.
\]
• Hence, the implied covariance matrix is compound-symmetric:
. constant variance d11 + σ²
. constant correlation ρI = d11/(d11 + σ²) between any two repeated measurements within the same rat
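A minimal sketch (made-up d11 and σ²) of the compound-symmetric matrix this implies:

```python
# Sketch (made-up d11 and sigma^2): the compound-symmetric covariance implied
# by a random-intercepts model, with its constant intraclass correlation
# rho = d11 / (d11 + sigma^2).

d11, sigma2 = 3.0, 1.5
n = 4                                        # measurements per subject

V = [[d11 + sigma2 if j == k else d11 for k in range(n)] for j in range(n)]
rho = d11 / (d11 + sigma2)
print(rho)   # 3.0 / 4.5 = 0.666...
```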
33
4.7 A Model for the Residual Covariance Structure
• Often, Σi is taken equal to σ²Ini
• We then obtain conditional independence: conditional on bi, the elements in Yi are independent
• In the presence of no, or little, random effects, conditional independence is often unrealistic
• For example, the random-intercepts model not only implies constant variance, it also implicitly assumes constant correlation between any two measurements within subjects.
• Hence, when there is no evidence for (additional) random effects, or if they would have no substantive meaning, the correlation structure in the data can be accounted for in an appropriate model for Σi
34
• Frequently used model:
\[
Y_i = X_i\beta + Z_ib_i + \underbrace{\varepsilon_{(1)i} + \varepsilon_{(2)i}}_{\varepsilon_i}
\]
• 3 stochastic components:
. bi: between-subject variability
. ε(1)i: measurement error
. ε(2)i: serial correlation component
• ε(2)i represents the belief that part of an individual's observed profile is a response to time-varying stochastic processes operating within that individual.
• This results in a correlation between serial measurements, which is usually a decreasing function of the time separation between these measurements
35
• Graphical representation of 3 stochastic components:
• The correlation matrix Hi of ε(2)i is assumed to have (j, k) element of the form hijk = g(|tij − tik|) for some decreasing function g(·) with g(0) = 1
• Frequently used functions g(·):
. Exponential serial correlation: g(u) = exp(−φu)
. Gaussian serial correlation: g(u) = exp(−φu²)
36
• Graphical representation (φ = 1):
• Extreme cases:
. φ = +∞: components in ε(2)i independent
. φ = 0: components in ε(2)i perfectly correlated
• In general, the smaller φ, the stronger is the serial correlation.
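The two serial correlation functions are easy to tabulate; an illustrative Python sketch (times and φ are made up):

```python
import math

# Illustrative sketch: the exponential and Gaussian serial correlation
# functions g(u), and the (j, k) element h_ijk = g(|t_ij - t_ik|) of H_i.

def g_exp(u, phi):
    return math.exp(-phi * u)

def g_gauss(u, phi):
    return math.exp(-phi * u ** 2)

phi = 1.0
t = [0.0, 0.5, 1.0, 2.0]
H = [[g_exp(abs(tj - tk), phi) for tk in t] for tj in t]

# g(0) = 1 on the diagonal; the correlation decays with time separation.
print([round(x, 3) for x in H[0]])
```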
37
• Resulting final linear mixed model:
Yi = Xiβ + Zibi + ε(1)i + ε(2)i
bi ∼ N (0, D)
ε(1)i ∼ N (0, σ2Ini)
ε(2)i ∼ N (0, τ 2Hi)
independent
38
Chapter 5
Estimation and Inference in the Marginal Model
• ML and REML estimation
• Fitting linear mixed models in SAS
• Estimation problems
• Inference for mean structure
• Inference for variance components
39
5.1 ML and REML Estimation
5.1.1 Introduction
• Recall that the general linear mixed model equals
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D)
εi ∼ N (0,Σi)
independent
• The implied marginal model equals
Yi ∼ N (Xiβ, ZiDZ′i + Σi).
• Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects
40
• Notation:
. β: vector of fixed effects (as before)
. α: vector of all variance components in D and Σi
. θ = (β′, α′)′: vector of all parameters in the marginal model
• Marginal likelihood function:
\[
L_{\mathrm{ML}}(\theta) = \prod_{i=1}^{N} \left\{ (2\pi)^{-n_i/2}\, |V_i(\alpha)|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (Y_i - X_i\beta)'\, V_i^{-1}(\alpha)\, (Y_i - X_i\beta) \right) \right\}
\]
• If α were known, the MLE of β equals
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iy_i,
\]
where Wi equals V −1i.
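The GLS expression above can be sketched numerically; the following simulation (made-up design and variance components) recovers β from N = 200 subjects:

```python
import numpy as np

# Sketch of the GLS estimator above, on simulated data (made-up design and
# variance components).  Here Z_i = X_i (random intercept and slope) and
# every subject shares the same design, so W_i = V_i^{-1} is computed once.

rng = np.random.default_rng(0)
beta = np.array([2.0, 1.0])                  # true fixed effects
t = np.array([0.0, 0.5, 1.0, 1.5])
X = np.column_stack([np.ones_like(t), t])
D = np.diag([1.0, 0.2])                      # random-effects covariance
V = X @ D @ X.T + 0.5 * np.eye(len(t))       # V_i = Z_i D Z_i' + sigma^2 I
W = np.linalg.inv(V)

N = 200
lhs = np.zeros((2, 2))
rhs = np.zeros(2)
for _ in range(N):
    y_i = rng.multivariate_normal(X @ beta, V)
    lhs += X.T @ W @ X                       # sum_i X_i' W_i X_i
    rhs += X.T @ W @ y_i                     # sum_i X_i' W_i y_i

beta_hat = np.linalg.solve(lhs, rhs)
print(np.round(beta_hat, 2))                 # close to the true (2.0, 1.0)
```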
41
• In most cases, α is not known, and needs to be replaced by an estimate α̂
• Two frequently used estimation methods for α:
. Maximum likelihood
. Restricted maximum likelihood
42
5.1.2 Maximum Likelihood Estimation (ML)
• α̂ML is obtained by maximizing LML(α, β̂(α)) with respect to α
• The resulting estimate β̂(α̂ML) for β will be denoted by β̂ML
• α̂ML and β̂ML can also be obtained by maximizing LML(θ) with respect to θ, i.e., with respect to α and β simultaneously.
43
5.1.3 Restricted Maximum Likelihood Estimation (REML)
• Correct for the 'loss of degrees of freedom' in the estimation of β
• We first combine all models
Yi ∼ N(Xiβ, Vi)
into one model
Y ∼ N(Xβ, V)
in which
\[
Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_N \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad
V(\alpha) = \begin{pmatrix} V_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_N \end{pmatrix}
\]
• Further, the data are transformed orthogonally to X:
U = A′Y ∼ N(0, A′V(α)A)
44
• The MLE of α based on U is called the REML estimate, and is denoted by α̂REML
• The resulting estimate β̂(α̂REML) for β will be denoted by β̂REML
• α̂REML and β̂REML can also be obtained by maximizing
\[
L_{\mathrm{REML}}(\theta) = \left| \sum_{i=1}^{N} X_i'W_i(\alpha)X_i \right|^{-\frac{1}{2}} L_{\mathrm{ML}}(\theta)
\]
with respect to θ, i.e., with respect to α and β simultaneously.
• Although strictly speaking not a likelihood, LREML(θ) is therefore still called the REML likelihood function.
45
5.2 Fitting Linear Mixed Models in SAS
• A model for the prostate data:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_6\mathrm{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ (\beta_{11}\mathrm{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij}.
\]
• We will assume Σi = σ2Ini
• SAS program:
proc mixed data=prostate method=reml;
class id group timeclss;
model lnpsa = group age group*time age*time
group*time2 age*time2
/ noint solution;
random intercept time time2
/ type=un subject=id g gcorr v vcorr;
repeated timeclss / type=simple subject=id r rcorr;
run;
46
• ML and REML estimates:

Effect               Parameter   MLE (s.e.)       REMLE (s.e.)
Age effect           β1          0.026 (0.013)    0.027 (0.014)
Intercepts:
  Control            β2          −1.077 (0.919)   −1.098 (0.976)
  BPH                β3          −0.493 (1.026)   −0.523 (1.090)
  L/R cancer         β4          0.314 (0.997)    0.296 (1.059)
  Met. cancer        β5          1.574 (1.022)    1.549 (1.086)
Age×time effect      β6          −0.010 (0.020)   −0.011 (0.021)
Time effects:
  Control            β7          0.511 (1.359)    0.568 (1.473)
  BPH                β8          0.313 (1.511)    0.396 (1.638)
  L/R cancer         β9          −1.072 (1.469)   −1.036 (1.593)
  Met. cancer        β10         −1.657 (1.499)   −1.605 (1.626)
Age×time² effect     β11         0.002 (0.008)    0.002 (0.009)
Time² effects:
  Control            β12         −0.106 (0.549)   −0.130 (0.610)
  BPH                β13         −0.119 (0.604)   −0.158 (0.672)
  L/R cancer         β14         0.350 (0.590)    0.342 (0.656)
  Met. cancer        β15         0.411 (0.598)    0.395 (0.666)
Covariance of bi:
  var(b1i)           d11         0.398 (0.083)    0.452 (0.098)
  var(b2i)           d22         0.768 (0.187)    0.915 (0.230)
  var(b3i)           d33         0.103 (0.032)    0.131 (0.041)
  cov(b1i, b2i)      d12 = d21   −0.443 (0.113)   −0.518 (0.136)
  cov(b2i, b3i)      d23 = d32   −0.273 (0.076)   −0.336 (0.095)
  cov(b3i, b1i)      d13 = d31   0.133 (0.043)    0.163 (0.053)
Residual variance:
  var(εij)           σ²          0.028 (0.002)    0.028 (0.002)
Log-likelihood                   −1.788           −31.235
47
• Fitted average profiles at median age at diagnosis:
48
5.3 Hierarchical versus Marginal Model
5.3.1 Example 1: The rat data
• Reconsider the model for the rat data:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \varepsilon_{ij}
\]
in which tij equals ln[1 + (Ageij − 45)/10]
• REML estimates obtained from SAS procedure MIXED:
Effect               Parameter   REMLE (s.e.)
Intercept            β0          68.606 (0.325)
Time effects:
  Low dose           β1          7.503 (0.228)
  High dose          β2          6.877 (0.231)
  Control            β3          7.319 (0.285)
Covariance of bi:
  var(b1i)           d11         3.369 (1.123)
  var(b2i)           d22         0.000 ( )
  cov(b1i, b2i)      d12 = d21   0.090 (0.381)
Residual variance:
  var(εij)           σ²          1.445 (0.145)
REML log-likelihood              −466.173
49
• Brown & Prescott (1999, p. 237):
• This suggests that the REML likelihood could be further increased by allowing negative estimates for d22
• In SAS, this can be done by adding the option 'nobound' to the PROC MIXED statement.
50
• Results:

Parameter restrictions for α:
                                 dii ≥ 0, σ² ≥ 0      dii ∈ IR, σ² ∈ IR
Effect               Parameter   REMLE (s.e.)         REMLE (s.e.)
Intercept            β0          68.606 (0.325)       68.618 (0.313)
Time effects:
  Low dose           β1          7.503 (0.228)        7.475 (0.198)
  High dose          β2          6.877 (0.231)        6.890 (0.198)
  Control            β3          7.319 (0.285)        7.284 (0.254)
Covariance of bi:
  var(b1i)           d11         3.369 (1.123)        2.921 (1.019)
  var(b2i)           d22         0.000 ( )            −0.287 (0.169)
  cov(b1i, b2i)      d12 = d21   0.090 (0.381)        0.462 (0.357)
Residual variance:
  var(εij)           σ²          1.445 (0.145)        1.522 (0.165)
REML log-likelihood              −466.173             −465.193
• As expected, a negative estimate for d22 is obtained
• Note that the REML log-likelihood value has been further increased.
51
• Meaning of a negative 'variance'?
. Fitted variance function:
\[
\widehat{\mathrm{Var}}(Y_i(t)) = \begin{pmatrix} 1 & t \end{pmatrix} \widehat{D} \begin{pmatrix} 1 \\ t \end{pmatrix} + \widehat{\sigma}^2
= \widehat{d}_{22}t^2 + 2\widehat{d}_{12}t + \widehat{d}_{11} + \widehat{\sigma}^2
= -0.287t^2 + 0.924t + 4.443
\]
. Hence, the negative variance component suggests negative curvature in the variance function.
. This is supported by the sample variance function:
52
• This again shows that the hierarchical and marginal models are not equivalent:
. The marginal model allows negative variance components, as long as the marginal covariances Vi = ZiDZ′i + σ²Ini are positive definite
. The hierarchical interpretation of the model does not allow negative variance components
53
5.3.2 Example 2: Bivariate Observations
• Balanced data, two measurements per subject (ni = 2).
• Model 1: Random intercepts + heterogeneous errors
\[
V = \begin{pmatrix} 1 \\ 1 \end{pmatrix} (d) \begin{pmatrix} 1 & 1 \end{pmatrix}
+ \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
= \begin{pmatrix} d + \sigma_1^2 & d \\ d & d + \sigma_2^2 \end{pmatrix}
\]
• Model 2: Uncorrelated intercepts and slopes + measurement error
\[
V = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
+ \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}
= \begin{pmatrix} d_1 + \sigma^2 & d_1 \\ d_1 & d_1 + d_2 + \sigma^2 \end{pmatrix}
\]
54
• Different hierarchical models can produce the same marginal model
• Hence, a good fit of the marginal model cannot be interpreted as evidence for any of the hierarchical models.
• A satisfactory treatment of the hierarchical model is only possible within a Bayesian context.
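A numerical illustration of this non-identifiability, with parameter values chosen so that the two marginal covariances coincide:

```python
# Numerical illustration (parameters chosen to match): two different
# hierarchical models implying the identical 2x2 marginal covariance matrix.

def V_model1(d, s1sq, s2sq):
    """Random intercept (variance d) + heterogeneous errors."""
    return [[d + s1sq, d], [d, d + s2sq]]

def V_model2(d1, d2, ssq):
    """Uncorrelated random intercept (d1) and slope (d2) + error ssq."""
    return [[d1 + ssq, d1], [d1, d1 + d2 + ssq]]

# Matching: d1 = d, d1 + ssq = d + s1sq, d1 + d2 + ssq = d + s2sq.
V1 = V_model1(d=2.0, s1sq=1.0, s2sq=1.5)
V2 = V_model2(d1=2.0, d2=0.5, ssq=1.0)
print(V1, V1 == V2)
```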
55
5.4 Inference for the Mean Structure
5.4.1 Introduction
• Estimate for β:
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iy_i
\]
with α replaced by its ML or REML estimate
• Conditional on α, β̂(α) is multivariate normal with mean β and covariance
\[
\mathrm{Var}(\widehat{\beta})
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \left( \sum_{i=1}^{N} X_i'W_i\,\mathrm{Var}(Y_i)\,W_iX_i \right) \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
\]
• In practice one again replaces α by its ML or REML estimate
56
5.4.2 Approximate Wald Test
For any known matrix L:
\[
(\widehat{\beta} - \beta)'\, L' \left[ L \left( \sum_{i=1}^{N} X_i'V_i^{-1}(\widehat{\alpha})X_i \right)^{-1} L' \right]^{-1} L\, (\widehat{\beta} - \beta)
\;\approx\; \chi^2_{\mathrm{rank}(L)}
\]
57
5.4.3 Approximate t-test and F-test
• Wald test, but corrected for the variability introduced by replacing α by an estimate, replacing the χ² distribution by an F distribution:
\[
\frac{(\widehat{\beta} - \beta)'\, L' \left[ L \left( \sum_{i=1}^{N} X_i'V_i^{-1}(\widehat{\alpha})X_i \right)^{-1} L' \right]^{-1} L\, (\widehat{\beta} - \beta)}{\mathrm{rank}(L)}
\;\approx\; F_{\mathrm{rank}(L),\,\nu}
\]
• Denominator degrees of freedom ν to be estimated from the data:
. Containment method
. Satterthwaite approximation
. Kenward and Roger approximation
. . . .
58
5.4.4 Robust Inference
• Estimate for β:
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iY_i
\]
with α replaced by its ML or REML estimate
• Conditional on α, β̂ has mean
\[
E\!\left[\widehat{\beta}(\alpha)\right]
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_i\,E(Y_i)
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iX_i\beta
= \beta
\]
provided that E(Yi) = Xiβ
• Hence, in order for β̂ to be unbiased, it is sufficient that the mean of the response is correctly specified.
59
• Conditional on α, β̂ has covariance
\[
\mathrm{Var}(\widehat{\beta})
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \left( \sum_{i=1}^{N} X_i'W_i\,\mathrm{Var}(Y_i)\,W_iX_i \right) \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
\]
• Note that this assumes that the covariance matrix Var(Yi) is correctly modelled as Vi = ZiDZ′i + Σi
• This covariance estimate is therefore often called the 'naive' estimate.
• The so-called 'robust' estimate for Var(β̂), which does not assume the covariance matrix to be correctly specified, is obtained by replacing Var(Yi) by [Yi − Xiβ̂][Yi − Xiβ̂]′ rather than Vi
60
• The only condition for [Yi − Xiβ̂][Yi − Xiβ̂]′ to be unbiased for Var(Yi) is that the mean is again correctly specified.
• The so-obtained estimate is called the 'robust' variance estimate, also called the sandwich estimate:
\[
\mathrm{Var}(\widehat{\beta})
= \underbrace{\left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}}_{\text{BREAD}}
\underbrace{\left( \sum_{i=1}^{N} X_i'W_i\,\widehat{\mathrm{Var}}(Y_i)\,W_iX_i \right)}_{\text{MEAT}}
\underbrace{\left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}}_{\text{BREAD}}
\]
• Based on this sandwich estimate, robust versions of the Wald test as well as of the approximate t-test and F-test can be obtained.
• Note that this suggests that as long as interest is only in inferences for the mean structure, little effort should be spent in modeling the covariance structure, provided that the data set is sufficiently large
61
• Extreme point of view: OLS with robust standard errors
• Appropriate covariance modeling may still be of interest:
. for the interpretation of random variation in data
. for gaining efficiency
. in the presence of missing data, robust inference is only valid under very severe assumptions about the underlying missingness process (see later)
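A simulation sketch of the naive versus sandwich estimates (made-up numbers; the working covariance is deliberately misspecified as ordinary least squares):

```python
import numpy as np

# Simulation sketch (made-up numbers): naive vs 'sandwich' covariance of the
# estimator when the working covariance W_i is deliberately misspecified
# (ordinary least squares, W_i = I) while the data are compound-symmetric.
# The robust estimate only needs the mean E(Y_i) = X_i beta to be correct.

rng = np.random.default_rng(1)
beta = np.array([1.0, 0.5])
t = np.array([0.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(t), t])
V_true = 0.8 * np.ones((3, 3)) + 0.5 * np.eye(3)   # true Var(Y_i)
W = np.eye(3)                                      # working covariance: OLS

N = 500
ys = [rng.multivariate_normal(X @ beta, V_true) for _ in range(N)]

bread = np.linalg.inv(N * (X.T @ W @ X))           # (sum_i X_i' W_i X_i)^{-1}
beta_hat = bread @ sum(X.T @ W @ y for y in ys)

naive = bread                                      # pretends Var(Y_i) = W_i^{-1}
meat = np.zeros((2, 2))
for y in ys:
    r = y - X @ beta_hat                           # observed residual
    meat += X.T @ W @ np.outer(r, r) @ W @ X       # sum_i X_i' W_i r r' W_i X_i
robust = bread @ meat @ bread

print(np.sqrt(np.diag(naive)), np.sqrt(np.diag(robust)))
```

With this misspecification the naive standard errors are too small for the intercept and too large for the slope, while the sandwich estimate tracks the true sampling variance.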
62
5.4.5 Likelihood Ratio Test
• Comparison of nested models with different mean structures, but equal covariance structure
• Example: prostate data, testing H0 : β1 = 0 in the reduced model:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ \beta_{14}(L_i + M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij},
\]
• Results under ML estimation:

ML estimation
Under β1 ∈ IR        LML = −3.575
Under H0 : β1 = 0    LML = −6.876
−2 ln λN             6.602
degrees of freedom   1
p-value              0.010
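The LR statistic and p-value reported above can be reproduced from the two maximized log-likelihoods (using the fact that the χ²₁ tail probability equals erfc(√(x/2))):

```python
import math

# Reproducing the LR test above from its two maximized ML log-likelihoods:
# -2 ln(lambda) = -2 (L_reduced - L_full), referred to chi-square with 1 d.f.
# (the chi-square_1 tail probability is erfc(sqrt(x / 2))).

L_full, L_reduced = -3.575, -6.876
lr = -2.0 * (L_reduced - L_full)
p_value = math.erfc(math.sqrt(lr / 2.0))
print(round(lr, 3), round(p_value, 3))
```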
63
• Results under REML estimation:
REML estimation
Under β1 ∈ IR        LREML = −20.165
Under H0 : β1 = 0    LREML = −19.003
−2 ln λN             −2.324
degrees of freedom
p-value

• Conclusion:
LR tests for the mean structure are not valid under REML
64
5.5 Inference for the Variance Components
5.5.1 Introduction
• Inference for the mean structure is usually of primary interest.
• However, inference for the covariance structure is of interest as well:
. interpretation of the random variation in the data
. overparameterized covariance structures lead to inefficient inferences for the mean
. too restrictive models invalidate inferences for the mean structure
• Approximate Wald tests based on asymptotic normality of MLEs and REMLEs
• LR tests based on ML or REML
65
5.5.2 Caution
• We reconsider the reduced model for the prostate data:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ \beta_{14}(L_i + M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij},
\]
• Selected SAS output:
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
UN(1,1) XRAY 0.4432 0.09349 4.74 <.0001
UN(2,1) XRAY -0.4903 0.1239 -3.96 <.0001
UN(2,2) XRAY 0.8416 0.2033 4.14 <.0001
UN(3,1) XRAY 0.1480 0.04702 3.15 0.0017
UN(3,2) XRAY -0.3000 0.08195 -3.66 0.0003
UN(3,3) XRAY 0.1142 0.03454 3.31 0.0005
timeclss XRAY 0.02837 0.002276 12.47 <.0001
66
• Problem 1: Marginal vs. hierarchical model interpretation:
H0 : d33 = 0
vs.
H0 : d33 = d32 = d31 = 0
• Problem 2: Boundary problems !
67
Chapter 6
Inference for the Random Effects
• Empirical Bayes inference
• Example: Prostate data
• Shrinkage
• Example: Prostate data
• Normality assumption for random effects
68
6.1 Empirical Bayes Inference
• Random effects bi reflect how the evolution for the ith subject deviates from the expected evolution Xiβ.
• Estimation of the bi is helpful for detecting outlying profiles
• This is only meaningful under the hierarchical model interpretation:
Yi|bi ∼ N (Xiβ + Zibi, Σi)
bi ∼ N (0, D)
• Since the bi are random, it is most natural to use Bayesian methods
69
• Terminology: prior distribution N (0, D) for bi
• Posterior density:
f (bi|yi) ≡ f (bi|Yi = yi)
          = f (yi|bi) f (bi) / ∫ f (yi|bi) f (bi) dbi
          ∝ f (yi|bi) f (bi)
          ∝ . . .
          ∝ exp{ −(1/2) (bi − DZ′iWi(yi − Xiβ))′ Λi⁻¹ (bi − DZ′iWi(yi − Xiβ)) }
for some positive definite matrix Λi.
• Posterior distribution:
bi | yi ∼ N ( DZ′iWi(yi − Xiβ), Λi )
70
• Posterior mean as estimate for bi:
b̂i(θ) = E [bi | Yi = yi] = ∫ bi f (bi|yi) dbi = DZ′iWi(α)(yi − Xiβ)
• b̂i(θ) is normally distributed with covariance matrix
var(b̂i(θ)) = DZ′i { Wi − WiXi ( ∑_{i=1}^N X′iWiXi )⁻¹ X′iWi } ZiD
• Note that inference for bi should account for the variability in b̂i
• Therefore, inference for bi is usually based on
var(b̂i(θ) − bi) = D − var(b̂i(θ))
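A toy numerical sketch of the posterior-mean formula above; all matrices and the observed profile are invented for illustration, with Wi = Vi⁻¹ and Vi = ZiDZ′i + Σi:

```python
import numpy as np

# Toy one-subject illustration of the empirical Bayes posterior mean;
# all numbers are made up for this example.
ni = 4
Xi = np.column_stack([np.ones(ni), np.arange(ni)])   # intercept + time
Zi = Xi                                              # random intercept + slope
beta = np.array([2.0, 0.5])
D = np.array([[1.0, 0.2], [0.2, 0.5]])               # random-effects covariance
Sigma_i = 0.3 * np.eye(ni)                           # residual covariance

Vi = Zi @ D @ Zi.T + Sigma_i                         # marginal covariance V_i
Wi = np.linalg.inv(Vi)                               # W_i = V_i^{-1}

yi = np.array([2.5, 3.4, 4.1, 4.6])                  # observed profile

# Posterior mean  b_hat_i = D Z_i' W_i (y_i - X_i beta)
b_hat = D @ Zi.T @ Wi @ (yi - Xi @ beta)
print(b_hat)
```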
71
• Wald tests can be derived
• Parameters in θ are replaced by their ML or REML estimates, obtained from fitting the marginal model.
• b̂i = b̂i(θ̂) is called the ‘Empirical Bayes’ estimate of bi.
• Approximate t- and F-tests to account for the variability introduced by replacing θ by θ̂, similar to tests for fixed effects.
72
6.2 Example: Prostate Data
• We reconsider the reduced model:
ln(PSAij + 1) = β1Agei + β2Ci + β3Bi + β4Li + β5Mi
                + (β8Bi + β9Li + β10Mi) tij + β14(Li + Mi) t²ij
                + b1i + b2itij + b3it²ij + εij
• In practice, histograms and scatterplots of certain components of b̂i are used to detect model deviations or subjects with ‘exceptional’ evolutions over time
• In SAS the estimates can be obtained by adding the option ‘solution’ to the random statement:
random intercept time time2
       / type=un subject=id solution;
ods listing exclude solutionr;
ods output solutionr=out;
• The ODS statements are used to write the EB estimates into a SAS output data set, and to prevent SAS from printing them in the output window.
73
• Histograms and pairwise scatterplots:
74
• Strong negative correlations, in agreement with the correlation matrix corresponding to the fitted D:
Dcorr =   1.000  −0.803   0.658
         −0.803   1.000  −0.968
          0.658  −0.968   1.000
• Histograms and scatterplots show outliers
• Subjects #22, #28, #39, and #45 have the highest four slopes for time2 and the smallest four slopes for time, i.e., the strongest (quadratic) growth.
• Subjects #22, #28 and #39 have been further examined and have been shown to be metastatic cancer cases which were misclassified as local cancer cases.
• Subject #45 is the metastatic cancer case with the strongest growth
75
6.3 Shrinkage Estimators b̂i
• Consider the prediction of the evolution of the ith subject:
Ŷi ≡ Xiβ̂ + Zib̂i
    = Xiβ̂ + ZiDZ′iVi⁻¹(yi − Xiβ̂)
    = ( Ini − ZiDZ′iVi⁻¹ ) Xiβ̂ + ZiDZ′iVi⁻¹ yi
    = ΣiVi⁻¹ Xiβ̂ + ( Ini − ΣiVi⁻¹ ) yi
• Hence, Ŷi is a weighted mean of the population-averaged profile Xiβ̂ and the observed data yi, with weights ΣiVi⁻¹ and Ini − ΣiVi⁻¹, respectively.
• Note that Xiβ̂ gets much weight if the residual variability is ‘large’ in comparison to the total variability.
76
• This phenomenon is usually called shrinkage:
The observed data are shrunk towards the prior average profile Xiβ.
• This is also reflected in the fact that for any linear combination λ′bi of the random effects,
var(λ′b̂i) ≤ var(λ′bi).
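The weighted-mean identity behind the shrinkage interpretation can be verified numerically; the matrices below are toy values chosen only to make the check concrete:

```python
import numpy as np

# Numerical check: X beta + Z b_hat equals the weighted mean
# Sigma V^{-1} X beta + (I - Sigma V^{-1}) y  (toy numbers).
rng = np.random.default_rng(0)
ni = 5
Xi = np.column_stack([np.ones(ni), np.arange(ni)])
Zi = Xi
beta = np.array([1.0, -0.3])
D = np.array([[0.8, 0.1], [0.1, 0.4]])
Sigma_i = 0.5 * np.eye(ni)
Vi = Zi @ D @ Zi.T + Sigma_i
yi = rng.normal(size=ni)

b_hat = D @ Zi.T @ np.linalg.inv(Vi) @ (yi - Xi @ beta)
pred1 = Xi @ beta + Zi @ b_hat                   # X beta + Z b_hat

W = Sigma_i @ np.linalg.inv(Vi)                  # weight on the prior mean
pred2 = W @ (Xi @ beta) + (np.eye(ni) - W) @ yi  # weighted-mean form

print(np.allclose(pred1, pred2))                 # True
```

The equality follows from ZiDZ′iVi⁻¹ = (Vi − Σi)Vi⁻¹ = Ini − ΣiVi⁻¹.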
77
6.4 Example: Prostate Data
• Comparison of predicted, average, and observed profiles for subjects #15 and #28, obtained under the reduced model:
• Illustration of the shrinkage effect:

var(b̂i) =   0.403  −0.440   0.131
           −0.440   0.729  −0.253
            0.131  −0.253   0.092

D̂ =   0.443  −0.490   0.148
     −0.490   0.842  −0.300
      0.148  −0.300   0.114
78
6.5 The Normality Assumption forRandom Effects
• In practice, histograms of EB estimates are often used to check the normality assumption for the random effects
• However, since
b̂i = DZ′iWi(yi − Xiβ)
var(b̂i) = DZ′i { Wi − WiXi ( ∑_{i=1}^N X′iWiXi )⁻¹ X′iWi } ZiD
one should at least first standardize the EB estimates
• Further, due to the shrinkage property the EB estimates do not fully reflect the heterogeneity in the data.
79
• Small simulation example:
. 1000 profiles with 5 measurements, balanced
. 1000 random intercepts sampled from the mixture ½ N (−2, 1) + ½ N (2, 1)
. Histogram of sampled intercepts:
. Σi = σ²Ini, σ² = 30
. Data analysed assuming normality for the intercepts
80
. Histogram of EB estimates:
. Clearly, severe shrinkage forces the estimates b̂i to satisfy the normality assumption
• Conclusion:
EB estimates obtained under normality cannot be used to check normality
• This suggests that the only possibility to check the normality assumption is to fit a more general model, with the classical linear mixed model as a special case, and to compare both models using LR methods
81
Chapter 7
Non-Continuous Data: MotivatingStudies
7.1 Toenail Data
• Toenail Dermatophyte Onychomycosis (TDO): common toenail infection, difficult to treat, affecting more than 2% of the population.
• Classical treatments with antifungal compounds need to be administered until the whole nail has grown out healthy.
• New compounds have been developed which reduce treatment to 3 months
82
• Randomized, double-blind, parallel group, multicenter study for the comparison of two such new compounds (A and B) for oral treatment.
• Research question:
Severity of TDO relative to treatment?
• 2 × 189 patients randomized, 36 centers
• 48 weeks of total follow-up (12 months)
• 12 weeks of treatment (3 months)
• measurements at months 0, 1, 2, 3, 6, 9, 12.
83
7.2 The Analgesic Trial
• single-arm trial with 530 patients recruited (491 selected for analysis)
• analgesic treatment for pain caused by chronic nonmalignant disease
• treatment was to be administered for 12 months
• we will focus on Global Satisfaction Assessment(GSA)
• GSA scale goes from 1=very good to 5=very bad
• GSA was rated by each subject 4 times during thetrial, at months 3, 6, 9, and 12.
84
Observed Frequencies
Questions
• Evolution over time
• Relation with baseline covariates: age, sex, duration of the pain, type of pain, disease progression, Pain Control Assessment (PCA), . . .
• Investigation of dropout
85
Chapter 8
Generalized Linear Models
1. E(Yi) = µi
2. η(μi) = x′iβ with η(.) the link function
3. Var(Yi) = φv(µi), where
• v(.) is a known variance function
• φ is a scale (overdispersion) parameter
• exponential family p.d.f.
f (y|θi, φ) = exp{ φ⁻¹[yθi − ψ(θi)] + c(y, φ) }
with θi the natural parameter and ψ(.) a function satisfying
. μi = ψ′(θi)
. v(μi) = ψ″(θi)
86
8.1 The Bernoulli Model
Y ∼ Bernoulli(π)
• Density function:
f (y|θ, φ) = πʸ(1 − π)¹⁻ʸ
           = exp { y ln π + (1 − y) ln(1 − π) }
           = exp { y ln( π / (1 − π) ) + ln(1 − π) }
87
• Exponential family:
. θ = ln( π / (1 − π) )
. φ = 1
. ψ(θ) = −ln(1 − π) = ln(1 + exp(θ))
. c(y, φ) = 0
• Mean and variance function:
. μ = exp(θ) / (1 + exp(θ)) = π
. v(μ) = exp(θ) / (1 + exp(θ))² = π(1 − π)
• Note that, under this model, the mean and variance are related:
φv(μ) = μ(1 − μ)
• The link function is the logit link:
θ = ln( μ / (1 − μ) )
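These identities can be checked numerically; π = 0.3 below is an arbitrary choice, and note that ψ(θ) = ln(1 + e^θ) = −ln(1 − π):

```python
import math

# Numerical check of the Bernoulli exponential-family identities.
pi = 0.3
theta = math.log(pi / (1 - pi))        # natural parameter (logit)

psi = math.log(1 + math.exp(theta))    # cumulant function, equals -ln(1 - pi)
mu = math.exp(theta) / (1 + math.exp(theta))       # psi'(theta)
v = math.exp(theta) / (1 + math.exp(theta)) ** 2   # psi''(theta)

assert abs(psi - (-math.log(1 - pi))) < 1e-12
assert abs(mu - pi) < 1e-12                        # mean equals pi
assert abs(v - pi * (1 - pi)) < 1e-12              # variance equals pi(1 - pi)
```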
88
8.2 Generalized Linear Models (GLM)
• Suppose a sample Y1, . . . , YN of independent observations is available
• All Yi have densities f (yi|θi, φ) which belong to the exponential family
• Linear predictor:
θi = x′iβ
• Log-likelihood:
ℓ(β, φ) = (1/φ) ∑_i [yiθi − ψ(θi)] + ∑_i c(yi, φ)
• Score equations:
S(β) = ∑_i (∂μi/∂β) vi⁻¹ (yi − μi) = 0
89
Chapter 9
Parametric Modeling Families
9.1 Continuous Outcomes
• Marginal Models
E(Yij|xij) = x′ijβ
• Random-Effects Models
E(Yij|bi,xij) = x′ijβ + z′ijbi
• Transition Models
E(Yij|Yi,j−1, . . . , Yi1,xij) = x′ijβ + αYi,j−1
90
9.2 Longitudinal Generalized LinearModels
• Normal case: easy transfer between models
• Also non-normal data can be measured repeatedly (over time)
• Lack of a key distribution such as the normal =⇒
. A lot of modeling options
. Introduction of non-linearity
. No easy transfer between model families

                       cross-sectional    longitudinal
  normal outcome       linear model       LMM
  non-normal outcome   GLM                ?
91
9.2.1 Marginal Models
• Non-likelihood methods:
. Empirical generalized least squares (EGLS)∗ (Koch et al, 1977) (SAS procedure CATMOD)
. Generalized estimating equations (GEE)∗ Liang and Zeger (1986) (SAS procedure GENMOD)
• Likelihood methods:
. Multivariate Probit Model
. Bahadur Model
. Odds ratio model
92
9.2.2 Random-effects Models
• Beta-binomial model
. Skellam (1948)
. Kleinman (1973)
. Molenberghs, Declerck, and Aerts (1997)
• Generalized linear mixed models
. Engel and Keen (1992)
. Breslow and Clayton (JASA 1993)
. Wolfinger and O’Connell (JSCS 1993)
• Hierarchical generalized linear models
. Lee and Nelder (JRSSB 1996)
• . . .
93
Chapter 10
Generalized Estimating Equations
• Univariate GLM, score function of the form (scalar Yi):
S(β) = ∑_{i=1}^N (∂μi/∂β) vi⁻¹ (yi − μi) = 0
with vi = Var(Yi).
• In the longitudinal setting, Y = (Y1, . . . , YN):
S(β) = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ (yi − μi) = 0
where
. Di is an ni × p matrix with (j, k)th element ∂μij/∂βk
. Vi is ni × ni diagonal,
. yi and μi are ni-vectors with elements yij and μij
94
• The corresponding matrices Vi = Var(Yi) involve a set of nuisance parameters, α say, which determine the covariance structure of Yi.
• Same form as for the full likelihood procedure
• We restrict specification to the first moment only
• The second moment is only specified in the variances, not in the correlations.
• Solving these equations:
. version of iteratively weighted least squares
. Fisher scoring
• Liang and Zeger (1986)
95
10.1 Large Sample Properties
As N → ∞,
√N (β̂ − β) ∼ N (0, I0⁻¹)
where
I0 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Di
• (Unrealistic) Conditions:
. α is known
. the parametric form for V i(α) is known
• Solution: working correlation matrix
96
10.2 Unknown Covariance Structure
Keep the score equations
S(β) = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ (yi − μi) = 0
BUT
• suppose Vi(.) is not the true variance of Yi but only a plausible guess, a so-called working correlation matrix
• specify correlations and not covariances, because the variances follow from the mean structure
• the score equations are solved as before
The asymptotic normality results change to
√N (β̂ − β) ∼ N (0, I0⁻¹ I1 I0⁻¹)
I0 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Di
I1 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Var(Yi) [Vi(α)]⁻¹ Di.
97
10.3 The Sandwich Estimator
• This is the so-called sandwich estimator:
. I0 is the bread
. I1 is the filling (ham or cheese)
• correct guess =⇒ likelihood variance
• the estimators β̂ are consistent even if the working correlation matrix is incorrect
• An estimate is found by replacing the unknown variance matrix Var(Yi) by
(Yi − μ̂i)(Yi − μ̂i)′.
Even if this estimator is poor for Var(Yi), it leads to a good estimate of I1, provided that:
. replication in the data is sufficiently large
. the same model for μi is fitted to groups of subjects
. observation times do not vary too much between subjects
• a bad choice of working correlation matrix can affect the efficiency of β̂
98
10.4 The Working Correlation Matrix
Write
Vi(β, α) = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β).
Variance function: Ai is (ni × ni) diagonal with elements v(μij), the known GLM variance function.
Working correlation: Ri(α), possibly dependent on a different set of parameters α.
Overdispersion parameter: φ, assumed 1 or estimated from the data.
The unknown quantities are expressed in terms of the Pearson residuals
eij = (yij − μij) / √v(μij).
Note that eij depends on β.
99
10.5 Estimation of Working Correlation
Liang and Zeger (1986) proposed moment-based estimates for the working correlation.
Some of the more popular ones:

Structure       Corr(Yij, Yik)    Estimate
Independence    0                 —
Exchangeable    α                 α̂ = (1/N) ∑_{i=1}^N [1/(ni(ni − 1))] ∑_{j≠k} eij eik
AR(1)           α^{|j−k|}         α̂ = (1/N) ∑_{i=1}^N [1/(ni − 1)] ∑_{j≤ni−1} eij ei,j+1
Unstructured    αjk               α̂jk = (1/N) ∑_{i=1}^N eij eik
The dispersion parameter is estimated by
φ̂ = (1/N) ∑_{i=1}^N (1/ni) ∑_{j=1}^{ni} e²ij.
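A sketch of these moment estimators in Python; the residual arrays are simulated placeholders standing in for the Pearson residuals of a fitted mean model:

```python
import numpy as np

# Moment estimates of the exchangeable working correlation and the
# dispersion parameter, following the formulas above.
def exchangeable_alpha(residuals):
    """residuals: list of 1-d arrays e_i of Pearson residuals per subject."""
    N = len(residuals)
    total = 0.0
    for e in residuals:
        ni = len(e)
        cross = np.sum(np.outer(e, e)) - np.sum(e ** 2)   # sum over j != k
        total += cross / (ni * (ni - 1))
    return total / N

def dispersion(residuals):
    N = len(residuals)
    return sum(np.mean(e ** 2) for e in residuals) / N

rng = np.random.default_rng(1)
res = [rng.normal(size=n) for n in (4, 4, 3, 5)]    # placeholder residuals
print(exchangeable_alpha(res), dispersion(res))
```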
100
10.6 Fitting GEE
The standard procedure, implemented in the SAS procedure GENMOD.
1. Compute initial estimates for β, using a univariateGLM (i.e., assuming independence).
2. • Compute Pearson residuals eij.
• Compute estimates for α.
• Compute Ri(α).
• Compute estimate for φ.
• Compute Vi(β, α) = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β).
3. Update the estimate for β:
β(t+1) = β(t) − [ ∑_{i=1}^N Di′ Vi⁻¹ Di ]⁻¹ [ ∑_{i=1}^N Di′ Vi⁻¹ (yi − μi) ].
4. Iterate 2.–3. until convergence.
Estimates of precision by means of I0⁻¹ and I0⁻¹ I1 I0⁻¹.
101
10.7 Application to Analgesic Trial
10.8 Code and Output
proc genmod data=gsa;
ods listing exclude parameterestimates classlevels parminfo;
class patid timecls;
model gsabin = pca0 time|time / dist=b;
repeated subject=patid / type=un corrw within=timecls modelse;
run;
Selected Output
The GENMOD Procedure
Analysis Of Initial Parameter Estimates
Standard Wald 95% Chi-
Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq
Intercept 1 2.8016 0.4902 1.8408 3.7624 32.66 <.0001
pca0 1 -0.2058 0.0864 -0.3751 -0.0365 5.67 0.0172
TIME 1 -0.7864 0.3874 -1.5456 -0.0271 4.12 0.0424
TIME*TIME 1 0.1774 0.0793 0.0219 0.3329 5.00 0.0254
Scale 0 1.0000 0.0000 1.0000 1.0000
102
GEE Model Information
Correlation Structure Unstructured
Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 1
Working Correlation Matrix
Col1 Col2 Col3 Col4
Row1 1.0000 0.1770 0.2481 0.2021
Row2 0.1770 1.0000 0.1811 0.1177
Row3 0.2481 0.1811 1.0000 0.4594
Row4 0.2021 0.1177 0.4594 1.0000
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 2.8731 0.4592 1.9730 3.7731 6.26 <.0001
pca0 -0.2278 0.0959 -0.4157 -0.0398 -2.37 0.0176
TIME -0.7779 0.3234 -1.4117 -0.1440 -2.41 0.0162
TIME*TIME 0.1670 0.0656 0.0384 0.2955 2.55 0.0109
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 2.8731 0.4836 1.9252 3.8209 5.94 <.0001
pca0 -0.2278 0.1025 -0.4287 -0.0268 -2.22 0.0263
TIME -0.7779 0.3275 -1.4198 -0.1359 -2.37 0.0176
TIME*TIME 0.1670 0.0665 0.0366 0.2973 2.51 0.0121
Scale 1.0000 . . . . .
103
10.9 Comparison of GEE Estimates
Variable INDependence EXCHangeable AutoRegressive UNstructured
Intercept 2.80 (0.469; 0.490) 2.92 (0.463; 0.494) 2.94 (0.469; 0.488) 2.87 (0.459; 0.484)
Time -0.79 (0.341; 0.387) -0.83 (0.328; 0.343) -0.90 (0.334; 0.352) -0.78 (0.323; 0.328)
Time2 0.18 (0.070; 0.079) 0.18 (0.067; 0.070) 0.20 (0.069; 0.072) 0.17 (0.066; 0.067)
Basel. PCA -0.21 (0.095; 0.086) -0.23 (0.095; 0.103) -0.22 (0.095; 0.099) -0.23 (0.096; 0.103)
Correlation 0 0.219 0.235 see output
Parameter estimates and standard errors (empirical; model-based).
104
Chapter 11
Random-Effects Models
• Assume the responses Yi1, . . . , Yini conditionally independent given an unobserved vector bi.
• E(Yij|bi) = μij
• η(μij) = x′ijβ + z′ijbi
• var(Yij|bi) = φv(μij)
• Marginal likelihood
f (yi) = ∫ ∏_{j=1}^{ni} f (yij|bi) g(bi) dbi
where g(.) denotes the p.d.f. of bi.
. f and g normal: linear mixed model
. f exp. fam. & g normal: gen. lin. mix. model
. f exp. fam. & g general: hier. gen. lin. model
105
Chapter 12
Generalized Linear Mixed Models
• We re-consider the linear mixed model:
Yi|bi ∼ N (Xiβ + Zibi,Σi)
bi ∼ N (0, D)
• The implied marginal model equals
Yi ∼ N (Xiβ, ZiDZ′i + Σi)
• Hence, even under conditional independence, i.e., all Σi equal to σ²Ini, a marginal association structure is implied through the random effects.
• Linear models versus GLM: similarities, but also differences!
106
12.1 Generalized Linear Mixed Models(GLMM)
• Given a vector bi of random effects for cluster i, it is assumed that all responses Yij are independent, with density
f (yij|θij, φ) = exp{ φ⁻¹[yijθij − ψ(θij)] + c(yij, φ) }
in which θij is now modelled as
θij = x′ijβ + z′ijbi
• It is generally assumed that
bi ∼ N (0, D)
107
• Let fij(yij|bi, β, φ) denote the conditional density of Yij given bi; the marginal distribution of Yi is then given by
fi(yi|β, D, φ) = ∫ ∏_{j=1}^{ni} fij(yij|bi, β, φ) f (bi|D) dbi
where f (bi|D) is the density of the N (0, D) distribution.
• The likelihood function for β, D, and φ now equals
L(β, D, φ) = ∏_{i=1}^N fi(yi|β, D, φ)
           = ∏_{i=1}^N ∫ ∏_{j=1}^{ni} fij(yij|bi, β, φ) f (bi|D) dbi
• Under the normal linear model, the integral can be worked out analytically.
108
• In general, approximations are required
• Two approaches can be followed:
. approximation of the integrand
. approximation of the integral: numericalquadrature
• Roughly speaking, under the first approach, ∏_j fij(yij|bi, β, φ) is approximated by a normal density such that the integral can be calculated analytically, as in the normal linear model.
109
12.2 Numerical Quadrature
• Two popular approaches:
. Gaussian quadrature
. Adaptive Gaussian quadrature (assuming the integrand is approximately normal)
• Adaptive Gaussian quadrature often needs fewer points, but the evaluations are more time consuming
• The number of quadrature points can influence the results!
• Implemented in PROC NLMIXED
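A sketch of (non-adaptive) Gauss–Hermite quadrature for one cluster's contribution to the marginal likelihood of a random-intercept logistic model; the response vector and linear-predictor values are made up, and this is not the NLMIXED implementation:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_lik(y, eta_fixed, sigma, n_quad):
    """Approximate  integral of prod_j Bernoulli(y_j | expit(eta_j + b)) * N(b; 0, sigma^2) db."""
    x, w = np.polynomial.hermite.hermgauss(n_quad)
    total = 0.0
    for xk, wk in zip(x, w):
        b = np.sqrt(2.0) * sigma * xk          # change of variables for N(0, sigma^2)
        p = expit(eta_fixed + b)
        total += wk * np.prod(np.where(y == 1, p, 1 - p))
    return total / np.sqrt(np.pi)

y = np.array([1, 1, 0, 1, 0])
eta = np.array([-0.5, -0.2, 0.1, 0.4, 0.7])    # x_ij' beta, made-up values
for q in (3, 5, 10, 20):
    print(q, marginal_lik(y, eta, sigma=2.0, n_quad=q))
```

Running this shows the approximation drifting with the number of quadrature points, mirroring the warning above.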
110
12.3 Application to Toenail Data
The Model
• As an illustration, we perform a GLMM analysis of the severity of infection in the toenail data set.
• The outcome of interest is the binary indicator reflecting the severity of the infection.
• Frequencies at each visit (both treatments):
111
• Conditionally on a random intercept bi, logistic regression models will be used to describe the marginal distributions, i.e., the distribution of the outcome at each time point separately:
Yij|bi ∼ Bernoulli(πij)
log( πij / (1 − πij) ) = β0 + bi + β1Ti + β2tij + β3Titij
• Notation:
. Ti: treatment indicator for subject i
. tij: time point at which the jth measurement is taken for the ith subject
• More complex mean models can be considered as well (e.g. including polynomial time effects, covariates, or other random effects than just intercepts).
112
12.3.1 SAS PROC NLMIXED Program
• SAS code (random intercept):
proc nlmixed data=test noad qpoints=3;
parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5 sigma=3.9;
teta = beta0 + b + beta1*treatn + beta2*time + beta3*timetr;
expteta = exp(teta);
p = expteta/(1+expteta);
model onyresp ~ binary(p);
random b ~ normal(0,sigma**2) subject=idnum;
run;
• The inclusion of a random slope can be specified asfollows:
proc nlmixed data=test noad qpoints=3;
parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5
d11=3.9 d12=0 d22=0.1;
teta = beta0 + b1 + beta1*treatn + beta2*time
+ b2*time + beta3*timetr;
expteta = exp(teta);
p = expteta/(1+expteta);
model onyresp ~ binary(p);
random b1 b2 ~ normal([0, 0] , [d11, d12, d22])
subject=idnum;
run;
113
12.3.2 SAS output
• Model and data specifications:
Specifications
Data Set WORK.TEST
Dependent Variable onyresp
Distribution for Dependent Variable Binary
Random Effects b
Distribution for Random Effects Normal
Subject Variable idnum
Optimization Technique Dual Quasi-Newton
Integration Method Gaussian Quadrature
• Dimensions of the data set:
Dimensions
Observations Used 1908
Observations Not Used 0
Total Observations 1908
Subjects 294
Max Obs Per Subject 7
Parameters 5
Quadrature Points 3
• Analysis of initial parameter estimates:
Parameters
beta0 beta1 beta2 beta3 sigma NegLogLike
-1.6 0 -0.4 -0.5 3.9 760.941002
114
• Minus twice the maximized log-likelihood, as well as associated information criteria:
Fit Statistics
-2 Log Likelihood 1344.1
AIC (smaller is better) 1354.1
BIC (smaller is better) 1372.6
• Parameter estimates, standard errors, approximate t-tests and confidence intervals:
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha
beta0 -1.5221 0.3063 293 -4.97 <.0001 0.05
beta1 -0.3932 0.3812 293 -1.03 0.3031 0.05
beta2 -0.3198 0.03481 293 -9.19 <.0001 0.05
beta3 -0.09098 0.05236 293 -1.74 0.0833 0.05
sigma 2.2555 0.1217 293 18.54 <.0001 0.05
Parameter Estimates
Parameter Lower Upper Gradient
beta0 -2.1250 -0.9192 -0.00007
beta1 -1.1433 0.3570 -0.00002
beta2 -0.3883 -0.2513 0.000058
beta3 -0.1940 0.01207 0.000319
sigma 2.0161 2.4950 -0.00003
115
12.3.3 Accuracy of Numerical Integration
• In order to investigate the accuracy of the numerical integration method, the model was refitted for varying numbers of quadrature points, and for adaptive as well as non-adaptive Gaussian quadrature:
Gaussian quadrature
Q = 3 Q = 5 Q = 10 Q = 20 Q = 50
β0 -1.52 (0.31) -2.49 (0.39) -0.99 (0.32) -1.54 (0.69) -1.65 (0.43)
β1 -0.39 (0.38) 0.19 (0.36) 0.47 (0.36) -0.43 (0.80) -0.09 (0.57)
β2 -0.32 (0.03) -0.38 (0.04) -0.38 (0.05) -0.40 (0.05) -0.40 (0.05)
β3 -0.09 (0.05) -0.12 (0.07) -0.15 (0.07) -0.14 (0.07) -0.16 (0.07)
σ 2.26 (0.12) 3.09 (0.21) 4.53 (0.39) 3.86 (0.33) 4.04 (0.39)
−2` 1344.1 1259.6 1254.4 1249.6 1247.7
Adaptive Gaussian quadrature
Q = 3 Q = 5 Q = 10 Q = 20 Q = 50
β0 -2.05 (0.59) -1.47 (0.40) -1.65 (0.45) -1.63 (0.43) -1.63 (0.44)
β1 -0.16 (0.64) -0.09 (0.54) -0.12 (0.59) -0.11 (0.59) -0.11 (0.59)
β2 -0.42 (0.05) -0.40 (0.04) -0.41 (0.05) -0.40 (0.05) -0.40 (0.05)
β3 -0.17 (0.07) -0.16 (0.07) -0.16 (0.07) -0.16 (0.07) -0.16 (0.07)
σ 4.51 (0.62) 3.70 (0.34) 4.07 (0.43) 4.01 (0.38) 4.02 (0.38)
−2` 1259.1 1257.1 1248.2 1247.8 1247.8
116
• (Log-)likelihoods are not comparable
• Different Q can lead to considerable differences in estimates and standard errors.
• For example, using non-adaptive quadrature with Q = 3, we found no significant difference in the time effect between the two treatment groups (t = −0.09/0.05, p = 0.0833).
• Using adaptive quadrature with Q = 50, we find a significant interaction between the time effect and the treatment (t = −0.16/0.07, p = 0.0255).
• Assuming that Q = 50 is sufficient, the ‘final’ results are well approximated with smaller Q under adaptive quadrature, but not under non-adaptive quadrature.
117
Chapter 13
Variations to a Common Theme
• Numerical integration methods
. SAS procedure NLMIXED
. MIXOR
• Expansion methods
. SAS macro GLIMMIX
• allows for random effects at several levels
• allows for serial correlation
. multilevel modeling and MLwiN
• Fully Bayesian approach
. WinBUGS
118
13.1 Application to the Analgesic Trial
13.1.1 PROC NLMIXED
• Code:
proc nlmixed data=gsa npoints=20 noad noadscale tech=newrap;
parms beta0=3 beta1=-0.8 beta2=0.2 beta3=-0.2 su=1;
eta = beta0 + beta1*time + beta2*time2 + beta3*pca0 + u;
expeta = exp(eta);
p = expeta/(1+expeta);
model gsabin ~ binary(p);
random u ~ normal(0,su**2) subject=patid;
estimate ’ICC’ su**2/(arcos(-1)**2/3 + su**2);
run;
• Selected Output:
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower
beta0 4.0474 0.7097 394 5.70 <.0001 0.05 2.6521
beta1 -1.1600 0.4657 394 -2.49 0.0131 0.05 -2.0755
beta2 0.2445 0.09472 394 2.58 0.0102 0.05 0.05826
beta3 -0.2997 0.1428 394 -2.10 0.0365 0.05 -0.5805
su 1.5914 0.2125 394 7.49 <.0001 0.05 1.1736
Parameter Upper Gradient
beta0 5.4427 -1.56E-9
beta1 -0.2445 -2E-9
beta2 0.4307 1.313E-9
beta3 -0.01893 1.468E-9
su 2.0092 -4.52E-9
Additional Estimates
Standard
Label Estimate Error DF t Value Pr > |t| Alpha Lower Upper
ICC 0.4350 0.06564 394 6.63 <.0001 0.05 0.3059 0.5640
119
13.1.2 MIXOR
MIXOR - The program for mixed-effects ordinal regression analysis
(version 2)
Global Satisfaction Assessment
Response function: logistic
Random-effects distribution: normal
Covariate(s) and random-effect(s) mean subtracted from thresholds
==> positive coefficient = positive association between regressor
and ordinal outcome
Numbers of observations
-----------------------
Level 1 observations = 1137
Level 2 observations = 395
The number of level 1 observations per level 2 unit are:
1 1 4 2 1 4 3 4 3 3 4 4 4 3 1 1 1 3 2
...
Descriptive statistics for all variables
----------------------------------------
Variable Minimum Maximum Mean Stand. Dev.
GSAbin 0.00000 1.00000 0.81882 0.38534
intcpt 1.00000 1.00000 1.00000 0.00000
Time 1.00000 4.00000 2.25330 1.12238
Time2 1.00000 16.00000 6.33597 5.55444
PCA0 1.00000 5.00000 3.02375 0.89519
Categories of the response variable GSAbin
--------------------------------------------
Category Frequency Proportion
0.00 206.00 0.18118
1.00 931.00 0.81882
120
Starting values
---------------
mean 1.022
covariates 0.295 -0.066 0.079
var. terms 0.574
==> The number of level 2 observations with non-varying responses
= 284 ( 71.90 percent )
---------------------------------------------------------
* Final Results - Maximum Marginal Likelihood Estimates *
---------------------------------------------------------
Total Iterations = 10
Quad Pts per Dim = 20
Log Likelihood = -506.275
Deviance (-2logL) = 1012.549
Ridge = 0.000
Variable Estimate Stand. Error Z p-value
-------- ------------ ------------ ------------ ------------
intcpt 4.04741 0.71278 5.67835 0.00000 (2)
Time -1.16003 0.47453 -2.44457 0.01450 (2)
Time2 0.24449 0.09678 2.52624 0.01153 (2)
PCA0 -0.29971 0.15375 -1.94932 0.05126 (2)
Random effect variance term (standard deviation)
intcpt 1.59139 0.20578 7.73355 0.00000 (1)
note: (1) = 1-tailed p-value
(2) = 2-tailed p-value
Calculation of the intracluster correlation
-------------------------------------------
residual variance = pi*pi / 3 (assumed)
cluster variance = (1.591 * 1.591) = 2.533
intracluster correlation = 2.533 / ( 2.533 + (pi*pi/3)) = 0.435
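The intracluster correlation calculation in the MIXOR output can be reproduced directly:

```python
import math

# ICC implied by a random-intercept logistic model:
# ICC = sigma^2 / (sigma^2 + pi^2/3), with the logistic residual variance pi^2/3.
sigma = 1.5914                      # estimated random-intercept standard deviation
icc = sigma**2 / (sigma**2 + math.pi**2 / 3)
print(round(icc, 3))                # 0.435
```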
121
13.1.3 PQL2 (MLwiN) Without and With Extra-dispersion Parameter
122
13.1.4 PQL (MLwiN) Without and With Extra-dispersion Parameter
123
13.1.5 PQL (GLIMMIX)
• Code:
%glimmix(data=gsa, procopt=%str(method=ml noclprint covtest),
stmts=%str(
class patid timecls;
model gsabin = time|time pca0 / s;
repeated timecls / sub=patid type=un rcorr=3;
),
error=binomial);
• Output:
The Mixed Procedure
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
Intercept PATID 3.1651 0.3911 8.09 <.0001
Residual 0.4954 0.02521 19.65 <.0001
Solution for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > |t|
Intercept 4.0292 0.5476 394 7.36 <.0001
TIME -1.2788 0.3341 739 -3.83 0.0001
TIME*TIME 0.2590 0.06800 739 3.81 0.0002
pca0 -0.2922 0.1300 739 -2.25 0.0249
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
TIME 1 739 14.65 0.0001
TIME*TIME 1 739 14.50 0.0002
pca0 1 739 5.05 0.0249
124
13.1.6 Overview of Results
GLIM- NL- MLwiN
Parameter logistic GEE MIX MIXED MIXOR (PQL1) (PQL2)
Interc. β0 2.802 2.873 4.029 4.047 4.047 3.021 4.067
(0.490) (0.484;0.459) (0.548) (0.710) (0.713) (0.547) (0.703)
Time β1 -0.786 -0.778 -1.279 -1.160 -1.160 -0.868 -1.172
(0.387) (0.328;0.323) (0.334) (0.466) (0.475) (0.405) (0.477)
Time2 β2 0.177 0.167 0.259 0.245 0.244 0.189 0.245
(0.079) (0.067;0.066) (0.068) (0.095) (0.097) (0.083) (0.098)
Base β3 -0.206 -0.228 -0.292 -0.300 -0.300 -0.223 -0.309
(0.086) (0.103;0.096) (0.130) (0.143) (0.154) (0.108) (0.150)
R.I. var. σ2 3.165 2.533 1.019 2.594
(0.391) (0.676) (0.248) (0.473)
R.I. std. σ 1.591 1.591
(0.213) (0.206)
resid.var. - 0.495
(0.025)
ICC 0.435 0.435
(0.066)
• A wall between the marginal model and the random-effects model (GLMM)
• Numerical-integration-based methods are to be preferred
• Some approximations are poor
• What model/type of inference to choose?
125
Chapter 14
Marginal Versus Random-effectsInference
                         model family
                       ↙            ↘
              marginal                random-effects
               model                      model
                 ↓                          ↓
             inference                  inference
            ↙         ↘               ↙           ↘
     likelihood       GEE        marginal     hierarchical
         ↓             ↓             ↓              ↓
        βM            βM            βRE        (βRE, b̂i)
                                     ↓              ↓
                                   “βM”           “βM”
126
14.1 Marginal Interpretation
• We compare our GLMM results for the toenail data with those from fitting GEEs (unstructured working correlation):
GLMM GEE
Parameter Estimate (s.e.) Estimate (s.e.)
Intercept group A −1.6308 (0.4356) −0.7219 (0.1656)
Intercept group B −1.7454 (0.4478) −0.6493 (0.1671)
Slope group A −0.4043 (0.0460) −0.1409 (0.0277)
Slope group B −0.5657 (0.0601) −0.2548 (0.0380)
• The strong differences can be explained as follows:
. Consider the following GLMM:
Yij|bi ∼ Bernoulli(πij)
log( πij / (1 − πij) ) = β0 + bi + β1tij
127
. The conditional means E(Yij|bi), as functions of tij, are given by
E(Yij|bi) = exp(β0 + bi + β1tij) / (1 + exp(β0 + bi + β1tij))
. Graphical representation:
128
. The marginal average evolution is now obtained by averaging over the random effects:
E(Yij) = E[E(Yij|bi)]
       = E[ exp(β0 + bi + β1tij) / (1 + exp(β0 + bi + β1tij)) ]
       ≠ exp(β0 + β1tij) / (1 + exp(β0 + β1tij))
. Graphical representation:
129
• Hence, the parameter vector β in the GEE model needs to be interpreted completely differently from the parameter vector β in the GLMM:
. GEE: marginal interpretation
. GLMM: conditional interpretation, conditionally upon the level of the random effects
• In general, the model for the marginal average is not of the same parametric form as the conditional average in the GLMM.
• For logistic mixed models with normally distributed random intercepts, it can be shown that the marginal model can be well approximated by again a logistic model, but with parameters approximately satisfying
β^RE / β^M = √(c²σ² + 1) > 1
σ² = variance of the random intercepts
c = 16√3 / (15π)
130
• For the toenail application, σ was estimated as 4.0164, such that the ratio equals √(c²σ² + 1) = 2.5649.
• The ratios between the GLMM and GEE estimates are:
GLMM GEE
Parameter Estimate (s.e.) Estimate (s.e.) Ratio
Intercept group A −1.6308 (0.4356) −0.7219 (0.1656) 2.2590
Intercept group B −1.7454 (0.4478) −0.6493 (0.1671) 2.6881
Slope group A −0.4043 (0.0460) −0.1409 (0.0277) 2.8694
Slope group B −0.5657 (0.0601) −0.2548 (0.0380) 2.2202
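The attenuation ratio can be recomputed from the fitted σ, with c = 16√3/(15π); the tiny discrepancy with the 2.5649 reported above is rounding:

```python
import math

# Approximate ratio between conditional (GLMM) and marginal (GEE) coefficients
# for a random-intercept logistic model.
c = 16 * math.sqrt(3) / (15 * math.pi)
sigma = 4.0164                      # fitted random-intercept standard deviation
ratio = math.sqrt(c**2 * sigma**2 + 1)
print(round(ratio, 3))              # 2.565
```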
• Note that this problem does not occur in linear mixed models:
. Conditional mean: E(Yi|bi) = Xiβ + Zibi
. Specifically: E(Yi|bi = 0) = Xiβ
. Marginal mean: E(Yi) = Xiβ
131
• The problem arises from the fact that, in general,
E[g(Y )] ≠ g[E(Y )]
• So, whenever the random effects enter the conditional mean in a non-linear way, the regression parameters in the marginal model need to be interpreted differently from the regression parameters in the mixed model.
• In practice, the marginal mean can be derived from the GLMM output by integrating out the random effects.
• This can be done numerically via Gaussian quadrature, or based on sampling methods.
132
14.2 Sampling Based Marginalizationon Toenail Data
• As an example, we plot the average evolutions based on the GLMM output obtained in the toenail example:
P (Yij = 1) =
  E[ exp(−1.6308 + bi − 0.4043tij) / (1 + exp(−1.6308 + bi − 0.4043tij)) ]   (Treatment A)
  E[ exp(−1.7454 + bi − 0.5657tij) / (1 + exp(−1.7454 + bi − 0.5657tij)) ]   (Treatment B)
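A minimal sampling-based marginalization of the treatment-A fit: draw random intercepts from the fitted normal distribution (σ = 4.0164, the estimate mentioned in the previous section) and average the conditional probabilities; the sample size and seed are arbitrary.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
b = rng.normal(0.0, 4.0164, size=200_000)   # draws of the random intercept

t = 0.0                                     # month 0
marginal = expit(-1.6308 + b - 0.4043 * t).mean()   # E[ expit(beta0 + b + beta1 t) ]
conditional_at_0 = expit(-1.6308)                   # 'average subject' (b_i = 0)
print(marginal, conditional_at_0)
```

The two numbers differ markedly, which is exactly the marginal-versus-conditional gap discussed above.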
133
• Output:
• The code can easily be extended to models which (also) include random effects other than intercepts.
134
• Average evolutions obtained from the GEE analyses:
P (Yij = 1) =
  exp(−0.7219 − 0.1409tij) / (1 + exp(−0.7219 − 0.1409tij))   (Treatment A)
  exp(−0.6493 − 0.2548tij) / (1 + exp(−0.6493 − 0.2548tij))   (Treatment B)
• Graphical representation:
135
• In a GLMM context, rather than plotting the marginal averages, one can also plot the profile for an ‘average’ subject, i.e., a subject with random effect bi = 0:
P (Yij = 1|bi = 0) =
  exp(−1.6308 − 0.4043tij) / (1 + exp(−1.6308 − 0.4043tij))   (Treatment A)
  exp(−1.7454 − 0.5657tij) / (1 + exp(−1.7454 − 0.5657tij))   (Treatment B)
• Graphical representation:
136
Chapter 15
Key Points
Linear versus Generalized Linear
Distribution: Normal versus exponential family
Mean-variance link: absent versus present
Link: linear versus non-linear
Random effects and measurement error: same place versus
different place
Need for numerical approximation/integration: absent versus
present
Model Families
• Marginal model: β has a population-averaged interpretation
• Random effects model: β conditional upon level of random effect
137
Chapter 16
Missing Data
• Subject i at occasion (time) j = 1, . . . , ni
• Measurement Yij
• Dropout indicator
Rij = 1 if Yij is observed, 0 otherwise.
• Group Yij into a vector
Yi = (Yi1, . . . , Yi,ni)′ = (Yi^o, Yi^m), where
Yi^o contains the Yij for which Rij = 1,
Yi^m contains the Yij for which Rij = 0.
• Group Rij into a vector Ri = (Ri1, . . . , Ri,ni)′
• Di: time of dropout: Di = 1 + ∑_{j=1}^{ni} Rij
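As a concrete (hypothetical) example of these definitions, consider a subject with five planned measurements who drops out after the third:

```python
# Hypothetical monotone dropout pattern: observed at occasions 1-3, then dropout
R_i = [1, 1, 1, 0, 0]   # R_ij = 1 if Y_ij is observed

# D_i = 1 + sum_j R_ij: the first occasion at which the subject is no longer observed
D_i = 1 + sum(R_i)

print(D_i)  # -> 4
```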
138
16.1 Factorizing the Distribution
Consider the distribution of the full data:
f (Yi, Di|θ,ψ)
• θ parametrizes the measurement distribution,
• ψ parametrizes the missingness process.
Several routes are possible:
Selection Models:
f (Yi|θ)f (Di|Yi,ψ)
Pattern-Mixture Models:
f (Yi|Di, θ)f (Di|ψ)
Shared parameter models:
f (Yi, Di|bi, θ,ψ)
139
16.2 Missing Data Processes
f(Di | Yi, ψ) = f(Di | Yi^o, Yi^m, ψ).
Missing Completely At Random (MCAR): Missingness is independent of the measurements:
f(Di | ψ).
Missing At Random (MAR): Missingness is independent of the unobserved (missing) measurements, possibly depending on the observed measurements:
f(Di | Yi^o, ψ).
Missing Not At Random (MNAR): Missingness depends on the missing values.
Ignorability:
• Likelihood/Bayesian + MAR
• Frequentist + MCAR
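The distinction between the mechanisms can be made concrete with a small simulation in which dropout at the second occasion depends only on the observed first measurement, i.e., an MAR (but not MCAR) mechanism. All parameter values below are illustrative:

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(7)
n = 50_000
dropped_low, dropped_high = [], []

for _ in range(n):
    y1 = random.gauss(0.0, 1.0)         # observed first measurement
    p_drop = expit(-1.0 + 1.5 * y1)     # MAR: depends on the observed y1 only
    drop = random.random() < p_drop
    (dropped_low if y1 < 0 else dropped_high).append(drop)

rate_low = sum(dropped_low) / len(dropped_low)
rate_high = sum(dropped_high) / len(dropped_high)
print(rate_low, rate_high)  # dropout is far more frequent when observed y1 is high
```

Because the mechanism never looks at the unobserved second measurement, likelihood-based inference that ignores the dropout model remains valid here; under MCAR the two printed rates would be equal.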
140
16.3 Growth Data
• Taken from Potthoff and Roy, Biometrika (1964)
• Research question:
Is dental growth related to gender?
• The distance from the center of the pituitary to the pterygomaxillary fissure was recorded at ages 8, 10, 12, and 14, for 11 girls and 16 boys
141
• Individual profiles:
• Remarks:
. Much variability between girls / boys
. Considerable variability within girls / boys
. Fixed number of measurements per subject
. Measurements taken at fixed timepoints
142
16.4 Model Fitting
    Mean       Covar    par   −2ℓ       Ref   G²       df    p
1   unstr.     unstr.   18    416.509
2   ≠ slopes   unstr.   14    419.477   1      2.968   4     0.5632
3   = slopes   unstr.   13    426.153   2      6.676   1     0.0098
4   ≠ slopes   Toepl.    8    424.643   2      5.166   6     0.5227
5   ≠ slopes   AR(1)     6    440.681   2     21.204   8     0.0066
                                        4     16.038   2     0.0003
6   ≠ slopes   random    8    427.806   2      8.329   6     0.2150
7   ≠ slopes   CS        6    428.639   2      9.162   8     0.3288
                                        4      3.996   2     0.1356
                                        6      0.833   2     0.6594
                                        6      0.833   1:2   0.5104
8   ≠ slopes   simple    5    478.242   7     49.603   1    <0.0001
                                        7     49.603   0:1  <0.0001
143
Chapter 17
Simple Methods
MCAR
• Rectangular matrix by deletion: complete case
analysis
• Rectangular matrix by completion or imputation
. Vertical: Unconditional mean imputation
. Horizontal: Last observation carried forward
. Diagonal: Conditional mean imputation
• Using data as is: available case analysis
. Likelihood-based frequentist analysis: difficult, not generally valid
. Likelihood-based MAR analysis: simple and correct
144
17.1 Simple Methods and Growth Data
Method               Model   Mean        Covar.   par
Complete case          7     = slopes    CS        6
LOCF                   2c    quadratic   unstr.   16
Unconditional mean     7     = slopes    CS        6
Conditional mean       1     unstr.      unstr.   18
The simplest methods are usually distorting.
145
Chapter 18
Three Likelihood Approaches
MAR
• Direct likelihood maximization
. continuous: SAS PROC MIXED, . . .
. categorical: generalized linear mixed models (SAS PROC NLMIXED, . . . )
. NOT: generalized estimating equations !!!
• EM algorithm
Match the data to the “complete” model
• Multiple Imputation
Accounts properly for uncertainty due to missingness
146
Chapter 19
Direct Likelihood Approach
MAR
Likelihood-based inference is valid whenever
• the mechanism is MAR,
• the parameters describing the missingness mechanism are distinct from the measurement model parameters.
PROC MIXED gives valid inference!
147
19.1 Growth Data
    Mean       Covar    par   −2ℓ       Ref   G²       df    P
1   unstr.     unstr.   18    386.957
2   ≠ slopes   unstr.   14    393.288   1      6.331   4     0.1758
3   = slopes   unstr.   13    397.400   2      4.112   1     0.0426
4   ≠ slopes   banded    8    398.030   2      4.742   6     0.5773
5   ≠ slopes   AR(1)     6    409.523   2     16.235   8     0.0391
6   ≠ slopes   random    8    400.452   2      7.164   6     0.3059
7   ≠ slopes   CS        6    401.313   6      0.861   2     0.6502
                                        6      0.861   1:2   0.5018
8   ≠ slopes   simple    5    441.583   7     40.270   1    <0.0001
                                        7     40.270   0:1  <0.0001
148
SAS Code
• For Model 1:
proc mixed data=growthav method=ml;
title ’Jennrich and Schluchter (MAR, Altern.), Model 1’;
class sex idnr age;
model measure=sex age*sex / s;
repeated age / type=un subject=idnr r rcorr;
run;
• For Model 2:
data help;
set growthav;
agec=age;
run;
proc mixed data=help method=ml;
title ’Jennrich and Schluchter (MAR, Altern.), Model 2’;
class sex idnr agec;
model measure=sex age*sex / s;
repeated agec / type=un subject=idnr r rcorr;
run;
149
Graphical Summary
150
Chapter 20
Weighted Estimating Equations
MAR and non-ignorable !
• Standard GEE inference is correct only under MCAR.
• Under MAR: weighted GEE is necessary.
Robins, Rotnitzky & Zhao (JASA, 1995)
Fitzmaurice, Molenberghs & Lipsitz (JRSSB, 1995)
151
• The idea is to weight each subject’s contribution in the GEEs by the inverse probability that a subject drops out at the time he dropped out.
• This can be calculated as
P[Di = di] = ∏_{k=2}^{di−1} (1 − P[Rik = 0 | Ri2 = · · · = Ri,k−1 = 1]) × P[Ri,di = 0 | Ri2 = · · · = Ri,di−1 = 1]^{I(di ≤ T)}
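The product formula can be checked with a small Python sketch, using hypothetical conditional dropout probabilities (dropout possible at occasions 2 through T = 4; subjects with di = T + 1 are completers):

```python
# Hypothetical conditional dropout probabilities:
# p[k] = P(R_ik = 0 | R_i2 = ... = R_i,k-1 = 1), for k = 2, ..., T
p = {2: 0.10, 3: 0.15, 4: 0.20}
T = 4

def prob_dropout_at(d_i):
    """P[D_i = d_i]: remain in the study up to d_i - 1, then (if d_i <= T) drop out at d_i."""
    prob = 1.0
    for k in range(2, d_i):
        prob *= 1.0 - p[k]      # factor (1 - P[R_ik = 0 | ...]): still in at occasion k
    if d_i <= T:                # indicator I(d_i <= T): completers get no final factor
        prob *= p[d_i]
    return prob

print(prob_dropout_at(3))  # (1 - 0.10) * 0.15, roughly 0.135
print(prob_dropout_at(5))  # completer: 0.90 * 0.85 * 0.80, roughly 0.612
```

As a sanity check, the probabilities of the possible dropout times 2, . . . , T + 1 sum to one.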
• Such a model can be fitted using logistic regression with:
. outcome: whether dropout occurs at a given time, yes or no (from the start of the measurement sequence until time of dropout or the end of the sequence)
. covariates: the outcomes at previous occasions, supplemented with genuine covariate information
• Some data manipulation is needed
• Illustration using analgesic trial
152
20.1 A Model for Dropout
• Code to fit appropriate model:
data gsac;
set gsac;
gendis=axis5_0;
run;
proc genmod data=gsac;
class prevgsa;
model dropout = prevgsa pca0 physfct gendis / pred dist=b;
ods output obstats=pred;
ods listing exclude obstats;
run;
• There is some evidence for MAR:
Model on conditional dropout P[Di = j | Di ≥ j] shows dependence on previous GSA measurements.
• Previous GSA is treated as a class level covariate.
• The model further includes: baseline PCA, physical functioning and genetic/congenital disorder.
153
• Selected output:
Class Level Information
Class Levels Values
prevgsa 5 1 2 3 4 5
Response Profile
Ordered Ordered
Level Value Count
1 0 800
2 1 163
Analysis Of Parameter Estimates
Standard Wald 95% Chi-
Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq
Intercept 1 -1.8043 0.4856 -2.7562 -0.8525 13.80 0.0002
prevgsa 1 1 -1.0183 0.4131 -1.8278 -0.2087 6.08 0.0137
prevgsa 2 1 -1.0374 0.3772 -1.7767 -0.2980 7.56 0.0060
prevgsa 3 1 -1.3439 0.3716 -2.0721 -0.6156 13.08 0.0003
prevgsa 4 1 -0.2636 0.3828 -1.0140 0.4867 0.47 0.4910
prevgsa 5 0 0.0000 0.0000 0.0000 0.0000 . .
pca0 1 0.2542 0.0986 0.0609 0.4474 6.65 0.0099
PHYSFCT 1 0.0090 0.0038 0.0015 0.0165 5.54 0.0186
gendis 1 0.5863 0.2420 0.1120 1.0607 5.87 0.0154
Scale 0 1.0000 0.0000 1.0000 1.0000
• The predicted probabilities from this model need to be translated into weights
154
20.2 Computing the Weights
• Predicted values have been saved by means of ODS OUTPUT in the logistic regression (GENMOD call)
• The weights are now defined at the individual measurement level:
. At the first occasion, the weight is w = 1
. At other than the last occasion, the weight is the already accumulated weight, multiplied by 1 − the predicted probability
. At the last occasion within a sequence where dropout occurs, the weight is multiplied by the predicted probability
. At the end of the process, the weight is inverted
• Program statements (creating at the same time indicators for the dropout patterns out of the missingness patterns, for the case where this could be useful)
155
data gsaw;
merge pred gsac(drop=dropout);
run;
data wgt(keep=patid wi);
set gsaw;
by patid;
retain wi;
if first.patid then wi=1;
if not last.patid then wi=wi*(1-pred);
if last.patid then do;
if mpattern not in ("----","--*-","-*--","-**-") then wi=wi*pred;
else wi=wi*(1-pred);
wi=1/wi;
output;
end;
run;
data repbin.gsaw;
merge repbin.gsa wgt;
by patid;
dpattern=mpattern;
if mpattern in ("***-","*-*-","*---","-**-","-*--","--*-")
then dpattern="----";
else if mpattern in ("**-*","*--*","-*-*") then dpattern="---*";
else if mpattern in ("*-**") then dpattern="--**";
run;
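For readers less familiar with SAS data steps, the weight accumulation above can be mirrored in Python for a single subject (the predicted probabilities are hypothetical, and the pattern test is reduced to a simple dropped-out flag):

```python
# Mirror of the SAS weight computation for one subject (illustrative values only)
pred = [0.10, 0.15, 0.20]   # hypothetical predicted dropout probabilities per occasion
dropped_out = True          # True if the sequence ends in dropout rather than study end

w = 1.0                     # corresponds to 'if first.patid then wi=1'
for j, p_j in enumerate(pred):
    is_last = (j == len(pred) - 1)
    if not is_last:
        w *= 1.0 - p_j      # 'if not last.patid then wi=wi*(1-pred)'
    elif dropped_out:
        w *= p_j            # last record of a dropout sequence: multiply by pred
    else:
        w *= 1.0 - p_j      # last record of a completer: multiply by 1 - pred
wi = 1.0 / w                # 'wi=1/wi': invert to get the inverse probability weight

print(wi)  # roughly 6.54 for these values
```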
156
20.3 Weighted Estimating Equations
• Given the preparatory work, we only need to include the weights by means of the SCWGT statement within the GENMOD procedure.
• Together with the use of the REPEATED statement, weighted GEE are then obtained.
• Code:
proc genmod data=repbin.gsaw;
scwgt wi;
class patid timecls;
model gsabin = time|time pca0 / dist=b;
repeated subject=patid / type=un corrw within=timecls;
run;
157
• GEE versus WGEE (empirical s.e.):
Variable GEE WGEE
Intercept 2.950 (0.465) 2.166 (0.694)
Time -0.842 (0.325) -0.437 (0.443)
Time2 0.181 (0.066) 0.120 (0.089)
Baseline PCA -0.244 (0.097) -0.159 (0.130)
GEE working correlation matrix:
  1   0.173   0.246   0.201
      1       0.177   0.113
              1       0.456
                      1
WGEE working correlation matrix:
  1   0.215   0.253   0.167
      1       0.196   0.113
              1       0.409
                      1
158
Chapter 21
Selection Models for MNAR and Sensitivity Analysis
MNAR
Diggle and Kenward Selection Model (1994)
• Linear mixed model for Y i:
Y i = Xiβ + Zibi + εi
• Logistic regressions for dropout:
logit [P (Di = j | Di ≥ j, Yi,j−1, Yij)]
= ψ0 + ψ1Yi,j−1 + ψ2Yij
159
21.1 Mastitis in Dairy Cattle
• Infectious disease of the udder
• Leads to a reduction in milk yield
• There is a belief among dairy scientists that high yielding cows are more susceptible
• But this cannot be measured directly because of the effect of the disease: evidence is missing
Is the probability of occurrence of mastitis
related to the yield that would have been observed
had mastitis not occurred?
160
Analysis of Diggle and Kenward
• Model for milk yield: f (Yi)
Yi ∼ N (µ,Σ)
• Model for occurrence of mastitis: f (Ri | Yi)
logit [P (Ri = 1 | Yi1, Yi2)] = ψ0 + ψ1Yi1 + ψ2Yi2
• Fitted model for occurrence of mastitis:
logit [P (Ri = 1 | Yi1, Yi2)]
= 0.37 + 2.25Yi1 − 2.54Yi2
= 0.37 − 0.29Yi1 − 2.54(Yi2 − Yi1)
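The second expression is an exact reparameterization of the first, obtained by writing the predictor in terms of Yi1 and the increment Yi2 − Yi1; a quick numeric check in Python:

```python
# Fitted dropout-model coefficients from the slide
psi0, psi1, psi2 = 0.37, 2.25, -2.54

def form1(y1, y2):
    return psi0 + psi1 * y1 + psi2 * y2

def form2(y1, y2):
    # psi0 + (psi1 + psi2) * y1 + psi2 * (y2 - y1): same predictor, rearranged
    return psi0 + (psi1 + psi2) * y1 + psi2 * (y2 - y1)

print(round(psi1 + psi2, 2))             # -> -0.29, the Yi1 coefficient on the slide
print(form1(5.0, 6.0), form2(5.0, 6.0))  # identical values
```

The rearranged form shows that, given the increment Yi2 − Yi1, the dependence on the first measurement itself is weak (−0.29), while missingness depends strongly on the change in yield (−2.54).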
• LR test for H0 : ψ2 = 0 : G2 = 5.11
Diggle and Kenward (1994)
161
21.2 Issues
• Fitting such a model
. OSWALD in SPlus (see book)
. user-defined software
. convergence almost always an issue
. many properties still under study (e.g., asymptotics)
• Sensitivity !
. How much belief can be put in the model and its conclusions?
162
Criticism −→ . . .
• Nice to have MNAR tools ?
Rubin (1994):
“. . . , even inferences for the data parameters generally
depend on the posited missingness mechanism, a fact that
typically implies greatly increased sensitivity of
inference. . . ”
Laird (1994):
“. . . , estimating the ‘unestimable’ can be accomplished only
by making modelling assumptions,. . . . The consequences
of model misspecification will (. . . ) be more severe in the
non-random case.”
Little (1995):
“. . . , but estimates rely heavily on normal assumptions and
the correct specification of the dropout model. . . ”
Molenberghs, Kenward & Lesaffre (1997):
“. . . conclusions are conditional on the appropriateness of the
assumed model, which in a fundamental sense is not
testable.”
• Forget about MNAR ?
163
. . .−→ Sensitivity Analysis
• Fit several plausible models
• Ranges of inference: Interval of Ignorance
• Change distributional assumptions (Kenward 1998)
• Local influence
• Anomalous subjects/measures
• Influential observations
• Semi-parametric framework (Scharfstein et al 1999)
• Pattern-mixture models
• Flexible and general modeling framework
164