Longitudinal Data
Analysis
Geert Verbeke
Geert Molenberghs
Food and Drug Administration
Friday, August 1, 2003
Contents
1 Related References 1
2 Introduction 5
2.1 Repeated Measures / Longitudinal Data . . . . . . . . . . . . . . . . . . . . . 6
2.2 Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 7
2.3 The BLSA Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
3 Simple Methods 14
3.1 Summary Statistics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
3.2 Disadvantages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
4 A Model for Longitudinal Data 17
4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 18
4.2 A 2-stage Model Formulation . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.1 Stage 1 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
4.2.2 Stage 2 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 21
i
4.3 Example: The Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 22
4.4 The General Linear Mixed-effects Model . . . . . . . . . . . . . . . . . . . . . 26
4.5 Hierarchical versus Marginal Model . . . . . . . . . . . . . . . . . . . . . . . . 28
4.6 Example: The Rat Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
4.7 A Model for the Residual Covariance Structure . . . . . . . . . . . . . . . . . . 34
5 Estimation and Inference in the Marginal Model 39
5.1 ML and REML Estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
5.1.2 Maximum Likelihood Estimation (ML) . . . . . . . . . . . . . . . . . . 43
5.1.3 Restricted Maximum Likelihood Estimation (REML) . . . . . . . . . . . 44
5.2 Fitting Linear Mixed Models in SAS . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Hierarchical versus Marginal Model . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3.1 Example 1: The rat data . . . . . . . . . . . . . . . . . . . . . . . . . 49
5.3.2 Example 2: Bivariate Observations . . . . . . . . . . . . . . . . . . . . 54
5.4 Inference for the Mean Structure . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
5.4.2 Approximate Wald Test . . . . . . . . . . . . . . . . . . . . . . . . . . 57
5.4.3 Approximate t-test and F -test . . . . . . . . . . . . . . . . . . . . . . . 58
5.4.4 Robust Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 59
ii
5.4.5 Likelihood Ratio Test . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
5.5 Inference for the Variance Components . . . . . . . . . . . . . . . . . . . . . . 65
5.5.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
5.5.2 Caution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66
6 Inference for the Random Effects 68
6.1 Empirical Bayes Inference . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
6.2 Example: Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 73
6.3 Shrinkage Estimators bi . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
6.4 Example: Prostate Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 78
6.5 The Normality Assumption for Random Effects . . . . . . . . . . . . . . . . . . 79
7 Non-Continuous Data: Motivating Studies 82
7.1 Toenail Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 82
7.2 The Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 84
8 Generalized Linear Models 86
8.1 The Bernoulli Model . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 87
8.2 Generalized Linear Models (GLM) . . . . . . . . . . . . . . . . . . . . . . . . . 89
9 Parametric Modeling Families 90
9.1 Continuous Outcomes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 90
iii
9.2 Longitudinal Generalized Linear Models . . . . . . . . . . . . . . . . . . . . . . 91
9.2.1 Marginal Models . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
9.2.2 Random-effects Models . . . . . . . . . . . . . . . . . . . . . . . . . . 93
10 Generalized Estimating Equations 94
10.1 Large Sample Properties . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96
10.2 Unknown Covariance Structure . . . . . . . . . . . . . . . . . . . . . . . . . . 97
10.3 The Sandwich Estimator . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
10.4 The Working Correlation Matrix . . . . . . . . . . . . . . . . . . . . . . . . . . 99
10.5 Estimation of Working Correlation . . . . . . . . . . . . . . . . . . . . . . . . . 100
10.6 Fitting GEE . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 101
10.7 Application to Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.8 Code and Output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
10.9 Comparison of GEE Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
11 Random-Effects Models 105
12 Generalized Linear Mixed Models 106
12.1 Generalized Linear Mixed Models (GLMM) . . . . . . . . . . . . . . . . . . . . 107
12.2 Numerical Quadrature . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
12.3 Application to Toenail Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . 111
iv
12.3.1 SAS PROC NLMIXED Program . . . . . . . . . . . . . . . . . . . . . . 113
12.3.2 SAS output . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
12.3.3 Accuracy of Numerical Integration . . . . . . . . . . . . . . . . . . . . 116
13 Variations to a Common Theme 118
13.1 Application to the Analgesic Trial . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.1.1 PROC NLMIXED . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
13.1.2 MIXOR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
13.1.3 PQL2 (MLwiN) Without and With Extra-dispersion Parameter . . . . . . 122
13.1.4 PQL (MLwiN) Without and With Extra-dispersion Parameter . . . . . . 123
13.1.5 PQL (GLIMMIX) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
13.1.6 Overview of Results . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
14 Marginal Versus Random-effects Inference 126
14.1 Marginal Interpretation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127
14.2 Sampling Based Marginalization on Toenail Data . . . . . . . . . . . . . . . . . 133
15 Key Points 137
16 Missing Data 138
16.1 Factorizing the Distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
16.2 Missing Data Processes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
v
16.3 Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
16.4 Model Fitting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
17 Simple Methods 144
17.1 Simple Methods and Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . 145
18 Three Likelihood Approaches 146
19 Direct Likelihood Approach 147
19.1 Growth Data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 148
20 Weighted Estimating Equations 151
20.1 A Model for Dropout . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 153
20.2 Computing the Weights . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
20.3 Weighted Estimating Equations . . . . . . . . . . . . . . . . . . . . . . . . . . 157
21 Selection Models for MNAR and Sensitivity Analysis 159
21.1 Mastitis in Dairy Cattle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
21.2 Issues . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
vi
Chapter 1
Related References
• Aerts, M., Geys, H., Molenberghs, G., and Ryan, L.M. (2002). Topics in Modelling of Clustered Data. London: Chapman and Hall.
• Brown, H. and Prescott, R. (1999). Applied Mixed Models in Medicine. New York: John Wiley & Sons.
• Crowder, M.J. and Hand, D.J. (1990). Analysis of Repeated Measures. London: Chapman and Hall.
• Davidian, M. and Giltinan, D.M. (1995). Nonlinear Models for Repeated Measurement Data. London: Chapman and Hall.
1
• Davis, C.S. (2002). Statistical Methods for the Analysis of Repeated Measurements. New York: Springer-Verlag.
• Diggle, P.J., Heagerty, P.J., Liang, K.Y., and Zeger, S.L. (2002). Analysis of Longitudinal Data (2nd edition). Oxford: Oxford University Press.
• Fahrmeir, L. and Tutz, G. (2002). Multivariate Statistical Modelling Based on Generalized Linear Models (2nd edition). Springer Series in Statistics. New York: Springer-Verlag.
• Goldstein, H. (1979). The Design and Analysis of Longitudinal Studies. London: Academic Press.
• Goldstein, H. (1995). Multilevel Statistical Models. London: Edward Arnold.
• Hand, D.J. and Crowder, M.J. (1995). Practical Longitudinal Data Analysis. London: Chapman and Hall.
2
• Jones, B. and Kenward, M.G. (1989). Design and Analysis of Crossover Trials. London: Chapman and Hall.
• Kshirsagar, A.M. and Smith, W.B. (1995). Growth Curves. New York: Marcel Dekker.
• Lindsey, J.K. (1993). Models for Repeated Measurements. Oxford: Oxford University Press.
• Longford, N.T. (1993). Random Coefficient Models. Oxford: Oxford University Press.
• McCullagh, P. and Nelder, J.A. (1989). Generalized Linear Models (2nd edition). London: Chapman and Hall.
• Molenberghs, G. and Verbeke, G. (2004). Discrete Repeated Measures. New York: Springer-Verlag (in preparation).
3
• Pinheiro, J.C. and Bates, D.M. (2000). Mixed Effects Models in S and S-PLUS. Springer Series in Statistics and Computing. New York: Springer-Verlag.
• Searle, S.R., Casella, G., and McCulloch, C.E. (1992). Variance Components. New York: Wiley.
• Senn, S.J. (1993). Cross-over Trials in Clinical Research. Chichester: Wiley.
• Verbeke, G. and Molenberghs, G. (1997). Linear Mixed Models in Practice: A SAS-Oriented Approach. Lecture Notes in Statistics 126. New York: Springer-Verlag.
• Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. Springer Series in Statistics. New York: Springer-Verlag.
• Vonesh, E.F. and Chinchilli, V.M. (1997). Linear and Non-linear Models for the Analysis of Repeated Measurements. Basel: Marcel Dekker.
4
Chapter 2
Introduction
• Repeated Measures / Longitudinal data
• Examples
5
2.1 Repeated Measures / Longitudinal Data
Repeated measures are obtained when a response is measured repeatedly on a set of units
• Units:
. Subjects, patients, participants, . . .
. Animals, plants, . . .
. Clusters: families, towns, branches of a company, . . .
. . . .
• Special case: Longitudinal data
6
2.2 Rat Data
• Department of dentistry, K.U.Leuven
• Research question:
How does craniofacial growth depend on testosterone production?
• Randomized experiment in which 50 male Wistar rats are randomized to:
. Control (15 rats)
. Low dose of Decapeptyl (18 rats)
. High dose of Decapeptyl (17 rats)
• Treatment starts at the age of 45 days.
7
• Measurements taken every 10 days, from day 50 on.
• The responses are distances (pixels) between well-defined points on X-ray pictures of the skull of each rat:
• Measurements with respect to the roof, base and height of the skull.
• Here, we consider only one response, reflecting the height of the skull.
8
• Individual profiles:
• Complication: Dropout due to anaesthesia (56%):
# Observations
Age (days) Control Low High Total
50 15 18 17 50
60 13 17 16 46
70 13 15 15 43
80 10 15 13 38
90 7 12 10 29
100 4 10 10 24
110 4 8 10 22
9
• Remarks:
. Much variability between rats
. Much less variability within rats
. Fixed number of measurements scheduled per subject, but not all measurements available due to dropout, for a known reason.
. Measurements taken at fixed timepoints
10
2.3 The BLSA Prostate Data
• References:
. Carter et al., Cancer Research (1992).
. Carter et al., Journal of the American Medical Association (1992).
. Morrell et al., Journal of the American Statistical Association (1995).
. Pearson et al., Statistics in Medicine (1994).
• Prostate disease is one of the most common and most costly medical problems in the United States
• Important to look for markers which can detect the disease at an early stage
• Prostate-Specific Antigen (PSA) is an enzyme produced by both normal and cancerous prostate cells
• PSA level is related to the volume of prostate tissue.
11
• Problem: Patients with Benign Prostatic Hyperplasia (BPH) also have an increased PSA level
• Overlap in PSA distribution for cancer and BPH cases seriously complicates the detection of prostate cancer.
• Research question (hypothesis based on clinical practice):
Can longitudinal PSA profiles be used to detect prostate cancer at an early stage?
• A retrospective case-control study based on frozen serum samples:
. 16 control patients
. 20 BPH cases
. 14 local cancer cases
. 4 metastatic cancer cases
12
• Complication: No perfect match for age at diagnosis and years of follow-up possible
• Individual profiles:
• Remarks:
. Much variability between subjects
. Little variability within subjects
. Highly unbalanced data
13
Chapter 3
Simple Methods
• Summary statistics
• Disadvantages
14
3.1 Summary Statistics
• General procedure:
. Step 1: Summarize the data of each subject into one statistic, a summary statistic
. Step 2: Analyze the summary statistics, e.g. analysis of covariance to compare groups after correction for important covariates
• This way, the analysis of longitudinal data is reduced to the analysis of independent observations, for which classical statistical procedures are available.
• Examples:
. Analysis at each time point separately
. Analysis of Area Under the Curve (AUC)
. Analysis of endpoints
. Analysis of increments
. Analysis of covariance
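As a sketch of the two-step procedure, the following Python fragment (with made-up profiles; the course itself uses SAS) reduces each subject's profile to an area under the curve (AUC) and then compares the group means of those independent summaries:

```python
# Sketch of the two-step procedure on made-up profiles (the course itself
# uses SAS): step 1 reduces each subject to one number, the area under the
# curve (AUC); step 2 analyzes those independent summaries classically.

def auc(times, values):
    """Trapezoidal area under one subject's response profile."""
    return sum((t2 - t1) * (v1 + v2) / 2.0
               for t1, t2, v1, v2 in zip(times, times[1:], values, values[1:]))

times = [50, 60, 70]                          # measurement days
group_a = [[68, 70, 73], [66, 70, 73]]        # hypothetical control profiles
group_b = [[69, 74, 78], [70, 75, 79]]        # hypothetical treated profiles

auc_a = [auc(times, y) for y in group_a]
auc_b = [auc(times, y) for y in group_b]

# Step 2: any classical method applies; here, the difference in group means
# (in practice one would feed the AUCs into a t-test or ANCOVA).
diff = sum(auc_b) / len(auc_b) - sum(auc_a) / len(auc_a)
print(auc_a, auc_b, diff)
```

The profile data and group labels are purely illustrative; only the reduction-then-analyze structure mirrors the procedure above.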
15
3.2 Disadvantages
• (Lots of) information is lost
• They often do not allow one to draw conclusions about the way the endpoint has been reached
• Results are not always meaningful, e.g., analysis of covariance
16
Chapter 4
A Model for Longitudinal Data
• Introduction
• The 2-stage model formulation
• Example: Rat data
• The general linear mixed-effects model
• Hierarchical versus marginal model
• Example: Rat data
• A model for the residual covariance structure
17
4.1 Introduction
• In practice: often unbalanced data:
. unequal number of measurements per subject
. measurements not taken at fixed time points
• Therefore, multivariate regression techniques are often not applicable
• Often, subject-specific longitudinal profiles can be well approximated by linear regression functions
• This leads to a 2-stage model formulation:
. Stage 1: Linear regression model for each subject separately
. Stage 2: Explain variability in the subject-specific regression coefficients using known covariates
18
4.2 A 2-stage Model Formulation
4.2.1 Stage 1
• Response Yij for the ith subject, measured at time tij, i = 1, . . . , N, j = 1, . . . , ni
• Response vector Yi for ith subject:
Yi = (Yi1, Yi2, . . . , Yini)′
• Stage 1 model:
Yi = Ziβi + εi
• Zi is a (ni × q) matrix of known covariates
• βi is a q-dimensional vector of subject-specificregression coefficients
19
• εi ∼ N (0,Σi), often Σi = σ2Ini
• Note that the above model describes the observed variability within subjects
20
4.2.2 Stage 2
• Between-subject variability can now be studied by relating the βi to known covariates
• Stage 2 model:
βi = Kiβ + bi
• Ki is a (q × p) matrix of known covariates
• β is a p-dimensional vector of unknown regression parameters
• bi ∼ N (0, D)
21
4.3 Example: The Rat Data
• Individual profiles:
22
• Transformation of the time scale to linearize the profiles:
\[
\text{Age}_{ij} \;\longrightarrow\; t_{ij} = \ln[1 + (\text{Age}_{ij} - 45)/10]
\]
• Note that t = 0 corresponds to the start of the treatment (moment of randomization)
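A quick numerical check of this transformation (illustrative Python, not part of the course material):

```python
import math

# Quick numerical check of the time transformation (illustrative, not from
# the course): t = ln[1 + (Age - 45)/10] is 0 at randomization (day 45) and
# compresses later ages, linearizing the growth profiles.

def t_transform(age):
    return math.log(1 + (age - 45) / 10.0)

ages = [50, 60, 70, 80, 90, 100, 110]        # measurement days in the study
times = [round(t_transform(a), 3) for a in ages]
print(times)
```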
• Stage 1 model:
Yij = β1i + β2itij + εij, j = 1, . . . , ni,
• Matrix notation:
Yi = Ziβi + εi
• Zi matrix:
\[
Z_i = \begin{pmatrix} 1 & t_{i1} \\ 1 & t_{i2} \\ \vdots & \vdots \\ 1 & t_{in_i} \end{pmatrix}
\]
23
• In the second stage, the subject-specific intercepts and time effects are related to the treatment of the rats
• Stage 2 model:
β1i = β0 + b1i,
β2i = β1Li + β2Hi + β3Ci + b2i,
• Li, Hi, and Ci are indicator variables:
\[
L_i = \begin{cases} 1 & \text{if low dose} \\ 0 & \text{otherwise} \end{cases}
\qquad
H_i = \begin{cases} 1 & \text{if high dose} \\ 0 & \text{otherwise} \end{cases}
\qquad
C_i = \begin{cases} 1 & \text{if control} \\ 0 & \text{otherwise} \end{cases}
\]
24
• Parameter interpretation:
. β0: average response at the start of the treatment (independent of treatment)
. β1, β2, and β3: average time effect for each treatment group
25
4.4 The General Linear Mixed-effects Model
• A 2-stage approach can be performed explicitly in the analysis
• However, this is just another example of the use of summary statistics:
. Yi is summarized by β̂i
. summary statistics β̂i analysed in the second stage
• The associated drawbacks can be avoided by combining the two stages into one model:
\[
\left.
\begin{aligned}
Y_i &= Z_i\beta_i + \varepsilon_i \\
\beta_i &= K_i\beta + b_i
\end{aligned}
\right\}
\;\Longrightarrow\;
Y_i = Z_iK_i\beta + Z_ib_i + \varepsilon_i = X_i\beta + Z_ib_i + \varepsilon_i
\]
26
• General linear mixed-effects model:
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D),
εi ∼ N (0,Σi),
b1, . . . , bN, ε1, . . . , εN independent,
• Terminology:
. Fixed effects: β
. Random effects: bi
. Variance components: elements in D and Σi
27
4.5 Hierarchical versus Marginal Model
• The general linear mixed model is given by:
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D),
εi ∼ N (0,Σi),
b1, . . . , bN, ε1, . . . , εN independent,
• It can be rewritten as:
Yi|bi ∼ N (Xiβ + Zibi,Σi)
bi ∼ N (0, D)
28
• It is therefore also called a hierarchical model:
. A model for Yi given bi
. A model for bi
• Marginally, we have that Yi is distributed as:
Yi ∼ N (Xiβ, ZiDZ′i + Σi)
• Hence, very specific assumptions are made about the dependence of mean and covariance on the covariates Xi and Zi:
. Implied mean : Xiβ
. Implied covariance : Vi = ZiDZ′i + Σi
• Note that the hierarchical model implies the marginalone, not vice versa
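The implied marginal covariance Vi = ZiDZ′i + Σi is straightforward to compute; a minimal numpy sketch with made-up variance components:

```python
import numpy as np

# Sketch (made-up variance components): the hierarchical model implies the
# marginal covariance V_i = Z_i D Z_i' + Sigma_i; here Sigma_i = sigma^2 I
# and Z_i holds a random intercept and a random slope.

t = np.array([0.0, 0.5, 1.0, 1.5])           # measurement times, subject i
Z = np.column_stack([np.ones_like(t), t])
D = np.array([[3.0, 0.1],
              [0.1, 0.4]])                   # Var(b_1i), Var(b_2i), their cov
sigma2 = 1.5

V = Z @ D @ Z.T + sigma2 * np.eye(len(t))
# Diagonal of V: d22*t^2 + 2*d12*t + d11 + sigma^2 -- quadratic in time.
print(np.round(np.diag(V), 3))
```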
29
4.6 Example: The Rat Data
• Stage 1 model:
Yij = β1i + β2itij + εij, j = 1, . . . , ni,
• Stage 2 model:
β1i = β0 + b1i,
β2i = β1Li + β2Hi + β3Ci + b2i,
• Combined:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + (\beta_1 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + (\beta_2 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + (\beta_3 + b_{2i})t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}
\]
30
• Implied marginal mean structure:
. Linear average evolution in each group
. Equal average intercepts
. Different average slopes
• Implied marginal covariance structure (Σi = σ²Ini):
\[
\mathrm{Cov}(Y_i(t_1), Y_i(t_2))
= \begin{pmatrix} 1 & t_1 \end{pmatrix} D \begin{pmatrix} 1 \\ t_2 \end{pmatrix} + \sigma^2\delta_{\{t_1,t_2\}}
= d_{22}t_1t_2 + d_{12}(t_1 + t_2) + d_{11} + \sigma^2\delta_{\{t_1,t_2\}}.
\]
• Note that the model implicitly assumes that the variance function is quadratic over time, with positive curvature d22.
31
• A model which assumes that all variability in subject-specific slopes can be ascribed to treatment differences can be obtained by omitting the random slopes b2i from the above model:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i)t_{ij} + \varepsilon_{ij}
= \begin{cases}
\beta_0 + b_{1i} + \beta_1 t_{ij} + \varepsilon_{ij}, & \text{if low dose} \\
\beta_0 + b_{1i} + \beta_2 t_{ij} + \varepsilon_{ij}, & \text{if high dose} \\
\beta_0 + b_{1i} + \beta_3 t_{ij} + \varepsilon_{ij}, & \text{if control}
\end{cases}
\]
• This is the so-called random-intercepts model
• The same marginal mean structure is obtained as under the model with random slopes
32
• Implied marginal covariance structure (Σi = σ²Ini):
\[
\mathrm{Cov}(Y_i(t_1), Y_i(t_2))
= \begin{pmatrix} 1 \end{pmatrix} D \begin{pmatrix} 1 \end{pmatrix} + \sigma^2\delta_{\{t_1,t_2\}}
= d_{11} + \sigma^2\delta_{\{t_1,t_2\}}.
\]
• Hence, the implied covariance matrix is compound-symmetric:
. constant variance d11 + σ²
. constant correlation ρI = d11/(d11 + σ²) between any two repeated measurements within the same rat
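A minimal sketch (made-up d11 and σ²) of the compound-symmetric matrix this implies:

```python
# Sketch (made-up d11 and sigma^2): the compound-symmetric covariance implied
# by a random-intercepts model, with its constant intraclass correlation
# rho = d11 / (d11 + sigma^2).

d11, sigma2 = 3.0, 1.5
n = 4                                        # measurements per subject

V = [[d11 + sigma2 if j == k else d11 for k in range(n)] for j in range(n)]
rho = d11 / (d11 + sigma2)
print(rho)   # 3.0 / 4.5 = 0.666...
```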
33
4.7 A Model for the Residual Covariance Structure
• Often, Σi is taken equal to σ²Ini
• We then obtain conditional independence: conditional on bi, the elements in Yi are independent
• In the presence of no, or little, random effects, conditional independence is often unrealistic
• For example, the random-intercepts model not only implies constant variance, it also implicitly assumes constant correlation between any two measurements within subjects.
• Hence, when there is no evidence for (additional) random effects, or if they would have no substantive meaning, the correlation structure in the data can be accounted for in an appropriate model for Σi
34
• Frequently used model:
\[
Y_i = X_i\beta + Z_ib_i + \underbrace{\varepsilon_{(1)i} + \varepsilon_{(2)i}}_{\varepsilon_i}
\]
• 3 stochastic components:
. bi: between-subject variability
. ε(1)i: measurement error
. ε(2)i: serial correlation component
• ε(2)i represents the belief that part of an individual's observed profile is a response to time-varying stochastic processes operating within that individual.
• This results in a correlation between serial measurements, which is usually a decreasing function of the time separation between these measurements
35
• Graphical representation of 3 stochastic components:
• The correlation matrix Hi of ε(2)i is assumed to have (j, k) element of the form hijk = g(|tij − tik|) for some decreasing function g(·) with g(0) = 1
• Frequently used functions g(·):
. Exponential serial correlation: g(u) = exp(−φu)
. Gaussian serial correlation: g(u) = exp(−φu²)
36
• Graphical representation (φ = 1):
• Extreme cases:
. φ = +∞: components in ε(2)i independent
. φ = 0: components in ε(2)i perfectly correlated
• In general, the smaller φ, the stronger is the serial correlation.
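The two serial correlation functions are easy to tabulate; an illustrative Python sketch (times and φ are made up):

```python
import math

# Illustrative sketch: the exponential and Gaussian serial correlation
# functions g(u), and the (j, k) element h_ijk = g(|t_ij - t_ik|) of H_i.

def g_exp(u, phi):
    return math.exp(-phi * u)

def g_gauss(u, phi):
    return math.exp(-phi * u ** 2)

phi = 1.0
t = [0.0, 0.5, 1.0, 2.0]
H = [[g_exp(abs(tj - tk), phi) for tk in t] for tj in t]

# g(0) = 1 on the diagonal; the correlation decays with time separation.
print([round(x, 3) for x in H[0]])
```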
37
• Resulting final linear mixed model:
Yi = Xiβ + Zibi + ε(1)i + ε(2)i
bi ∼ N (0, D)
ε(1)i ∼ N (0, σ2Ini)
ε(2)i ∼ N (0, τ 2Hi)
independent
38
Chapter 5
Estimation and Inference in the Marginal Model
• ML and REML estimation
• Fitting linear mixed models in SAS
• Estimation problems
• Inference for mean structure
• Inference for variance components
39
5.1 ML and REML Estimation
5.1.1 Introduction
• Recall that the general linear mixed model equals
Yi = Xiβ + Zibi + εi
bi ∼ N (0, D)
εi ∼ N (0,Σi)
independent
• The implied marginal model equals
Yi ∼ N (Xiβ, ZiDZ′i + Σi).
• Note that inferences based on the marginal model do not explicitly assume the presence of random effects representing the natural heterogeneity between subjects
40
• Notation:
. β: vector of fixed effects (as before)
. α: vector of all variance components in D and Σi
. θ = (β′, α′)′: vector of all parameters in the marginal model
• Marginal likelihood function:
\[
L_{\mathrm{ML}}(\theta) = \prod_{i=1}^{N} \left\{ (2\pi)^{-n_i/2}\, |V_i(\alpha)|^{-\frac{1}{2}} \exp\left( -\frac{1}{2} (Y_i - X_i\beta)'\, V_i^{-1}(\alpha)\, (Y_i - X_i\beta) \right) \right\}
\]
• If α were known, the MLE of β equals
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iy_i,
\]
where Wi equals V −1i.
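The GLS expression above can be sketched numerically; the following simulation (made-up design and variance components) recovers β from N = 200 subjects:

```python
import numpy as np

# Sketch of the GLS estimator above, on simulated data (made-up design and
# variance components).  Here Z_i = X_i (random intercept and slope) and
# every subject shares the same design, so W_i = V_i^{-1} is computed once.

rng = np.random.default_rng(0)
beta = np.array([2.0, 1.0])                  # true fixed effects
t = np.array([0.0, 0.5, 1.0, 1.5])
X = np.column_stack([np.ones_like(t), t])
D = np.diag([1.0, 0.2])                      # random-effects covariance
V = X @ D @ X.T + 0.5 * np.eye(len(t))       # V_i = Z_i D Z_i' + sigma^2 I
W = np.linalg.inv(V)

N = 200
lhs = np.zeros((2, 2))
rhs = np.zeros(2)
for _ in range(N):
    y_i = rng.multivariate_normal(X @ beta, V)
    lhs += X.T @ W @ X                       # sum_i X_i' W_i X_i
    rhs += X.T @ W @ y_i                     # sum_i X_i' W_i y_i

beta_hat = np.linalg.solve(lhs, rhs)
print(np.round(beta_hat, 2))                 # close to the true (2.0, 1.0)
```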
41
• In most cases, α is not known, and needs to be replaced by an estimate α̂
• Two frequently used estimation methods for α:
. Maximum likelihood
. Restricted maximum likelihood
42
5.1.2 Maximum Likelihood Estimation (ML)
• α̂ML is obtained by maximizing LML(α, β̂(α)) with respect to α
• The resulting estimate β̂(α̂ML) for β will be denoted by β̂ML
• α̂ML and β̂ML can also be obtained by maximizing LML(θ) with respect to θ, i.e., with respect to α and β simultaneously.
43
5.1.3 Restricted Maximum Likelihood Estimation (REML)
• Correct for the 'loss of degrees of freedom' in the estimation of β
• We first combine all models
Yi ∼ N(Xiβ, Vi)
into one model
Y ∼ N(Xβ, V)
in which
\[
Y = \begin{pmatrix} Y_1 \\ \vdots \\ Y_N \end{pmatrix}, \quad
X = \begin{pmatrix} X_1 \\ \vdots \\ X_N \end{pmatrix}, \quad
V(\alpha) = \begin{pmatrix} V_1 & \cdots & 0 \\ \vdots & \ddots & \vdots \\ 0 & \cdots & V_N \end{pmatrix}
\]
• Further, the data are transformed orthogonally to X:
U = A′Y ∼ N(0, A′V(α)A)
44
• The MLE of α based on U is called the REML estimate, and is denoted by α̂REML
• The resulting estimate β̂(α̂REML) for β will be denoted by β̂REML
• α̂REML and β̂REML can also be obtained by maximizing
\[
L_{\mathrm{REML}}(\theta) = \left| \sum_{i=1}^{N} X_i'W_i(\alpha)X_i \right|^{-\frac{1}{2}} L_{\mathrm{ML}}(\theta)
\]
with respect to θ, i.e., with respect to α and β simultaneously.
• Although strictly speaking not a likelihood, LREML(θ) is therefore still called the REML likelihood function.
45
5.2 Fitting Linear Mixed Models in SAS
• A model for the prostate data:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_6\mathrm{Age}_i + \beta_7 C_i + \beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ (\beta_{11}\mathrm{Age}_i + \beta_{12} C_i + \beta_{13} B_i + \beta_{14} L_i + \beta_{15} M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij}.
\]
• We will assume Σi = σ2Ini
• SAS program:
proc mixed data=prostate method=reml;
class id group timeclss;
model lnpsa = group age group*time age*time
group*time2 age*time2
/ noint solution;
random intercept time time2
/ type=un subject=id g gcorr v vcorr;
repeated timeclss / type=simple subject=id r rcorr;
run;
46
• ML and REML estimates:

Effect               Parameter   MLE (s.e.)       REMLE (s.e.)
Age effect           β1          0.026 (0.013)    0.027 (0.014)
Intercepts:
  Control            β2          −1.077 (0.919)   −1.098 (0.976)
  BPH                β3          −0.493 (1.026)   −0.523 (1.090)
  L/R cancer         β4          0.314 (0.997)    0.296 (1.059)
  Met. cancer        β5          1.574 (1.022)    1.549 (1.086)
Age×time effect      β6          −0.010 (0.020)   −0.011 (0.021)
Time effects:
  Control            β7          0.511 (1.359)    0.568 (1.473)
  BPH                β8          0.313 (1.511)    0.396 (1.638)
  L/R cancer         β9          −1.072 (1.469)   −1.036 (1.593)
  Met. cancer        β10         −1.657 (1.499)   −1.605 (1.626)
Age×time² effect     β11         0.002 (0.008)    0.002 (0.009)
Time² effects:
  Control            β12         −0.106 (0.549)   −0.130 (0.610)
  BPH                β13         −0.119 (0.604)   −0.158 (0.672)
  L/R cancer         β14         0.350 (0.590)    0.342 (0.656)
  Met. cancer        β15         0.411 (0.598)    0.395 (0.666)
Covariance of bi:
  var(b1i)           d11         0.398 (0.083)    0.452 (0.098)
  var(b2i)           d22         0.768 (0.187)    0.915 (0.230)
  var(b3i)           d33         0.103 (0.032)    0.131 (0.041)
  cov(b1i, b2i)      d12 = d21   −0.443 (0.113)   −0.518 (0.136)
  cov(b2i, b3i)      d23 = d32   −0.273 (0.076)   −0.336 (0.095)
  cov(b3i, b1i)      d13 = d31   0.133 (0.043)    0.163 (0.053)
Residual variance:
  var(εij)           σ²          0.028 (0.002)    0.028 (0.002)
Log-likelihood                   −1.788           −31.235
47
• Fitted average profiles at median age at diagnosis:
48
5.3 Hierarchical versus Marginal Model
5.3.1 Example 1: The rat data
• Reconsider the model for the rat data:
\[
Y_{ij} = (\beta_0 + b_{1i}) + (\beta_1 L_i + \beta_2 H_i + \beta_3 C_i + b_{2i})t_{ij} + \varepsilon_{ij}
\]
in which tij equals ln[1 + (Ageij − 45)/10]
• REML estimates obtained from SAS procedure MIXED:
Effect               Parameter   REMLE (s.e.)
Intercept            β0          68.606 (0.325)
Time effects:
  Low dose           β1          7.503 (0.228)
  High dose          β2          6.877 (0.231)
  Control            β3          7.319 (0.285)
Covariance of bi:
  var(b1i)           d11         3.369 (1.123)
  var(b2i)           d22         0.000 ( )
  cov(b1i, b2i)      d12 = d21   0.090 (0.381)
Residual variance:
  var(εij)           σ²          1.445 (0.145)
REML log-likelihood              −466.173
49
• Brown & Prescott (1999, p. 237):
• This suggests that the REML likelihood could be further increased by allowing negative estimates for d22
• In SAS, this can be done by adding the option 'nobound' to the PROC MIXED statement.
50
• Results:

Parameter restrictions for α:
                                 dii ≥ 0, σ² ≥ 0      dii ∈ IR, σ² ∈ IR
Effect               Parameter   REMLE (s.e.)         REMLE (s.e.)
Intercept            β0          68.606 (0.325)       68.618 (0.313)
Time effects:
  Low dose           β1          7.503 (0.228)        7.475 (0.198)
  High dose          β2          6.877 (0.231)        6.890 (0.198)
  Control            β3          7.319 (0.285)        7.284 (0.254)
Covariance of bi:
  var(b1i)           d11         3.369 (1.123)        2.921 (1.019)
  var(b2i)           d22         0.000 ( )            −0.287 (0.169)
  cov(b1i, b2i)      d12 = d21   0.090 (0.381)        0.462 (0.357)
Residual variance:
  var(εij)           σ²          1.445 (0.145)        1.522 (0.165)
REML log-likelihood              −466.173             −465.193
• As expected, a negative estimate for d22 is obtained
• Note that the REML log-likelihood value has been further increased.
51
• Meaning of a negative 'variance'?
. Fitted variance function:
\[
\widehat{\mathrm{Var}}(Y_i(t)) = \begin{pmatrix} 1 & t \end{pmatrix} \widehat{D} \begin{pmatrix} 1 \\ t \end{pmatrix} + \widehat{\sigma}^2
= \widehat{d}_{22}t^2 + 2\widehat{d}_{12}t + \widehat{d}_{11} + \widehat{\sigma}^2
= -0.287t^2 + 0.924t + 4.443
\]
. Hence, the negative variance component suggests negative curvature in the variance function.
. This is supported by the sample variance function:
52
• This again shows that the hierarchical and marginal models are not equivalent:
. The marginal model allows negative variance components, as long as the marginal covariances Vi = ZiDZ′i + σ²Ini are positive definite
. The hierarchical interpretation of the model does not allow negative variance components
53
5.3.2 Example 2: Bivariate Observations
• Balanced data, two measurements per subject (ni = 2).
• Model 1: Random intercepts + heterogeneous errors
\[
V = \begin{pmatrix} 1 \\ 1 \end{pmatrix} (d) \begin{pmatrix} 1 & 1 \end{pmatrix}
+ \begin{pmatrix} \sigma_1^2 & 0 \\ 0 & \sigma_2^2 \end{pmatrix}
= \begin{pmatrix} d + \sigma_1^2 & d \\ d & d + \sigma_2^2 \end{pmatrix}
\]
• Model 2: Uncorrelated intercepts and slopes + measurement error
\[
V = \begin{pmatrix} 1 & 0 \\ 1 & 1 \end{pmatrix}
\begin{pmatrix} d_1 & 0 \\ 0 & d_2 \end{pmatrix}
\begin{pmatrix} 1 & 1 \\ 0 & 1 \end{pmatrix}
+ \begin{pmatrix} \sigma^2 & 0 \\ 0 & \sigma^2 \end{pmatrix}
= \begin{pmatrix} d_1 + \sigma^2 & d_1 \\ d_1 & d_1 + d_2 + \sigma^2 \end{pmatrix}
\]
54
• Different hierarchical models can produce the same marginal model
• Hence, a good fit of the marginal model cannot be interpreted as evidence for any of the hierarchical models.
• A satisfactory treatment of the hierarchical model is only possible within a Bayesian context.
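A numerical illustration of this non-identifiability, with parameter values chosen so that the two marginal covariances coincide:

```python
# Numerical illustration (parameters chosen to match): two different
# hierarchical models implying the identical 2x2 marginal covariance matrix.

def V_model1(d, s1sq, s2sq):
    """Random intercept (variance d) + heterogeneous errors."""
    return [[d + s1sq, d], [d, d + s2sq]]

def V_model2(d1, d2, ssq):
    """Uncorrelated random intercept (d1) and slope (d2) + error ssq."""
    return [[d1 + ssq, d1], [d1, d1 + d2 + ssq]]

# Matching: d1 = d, d1 + ssq = d + s1sq, d1 + d2 + ssq = d + s2sq.
V1 = V_model1(d=2.0, s1sq=1.0, s2sq=1.5)
V2 = V_model2(d1=2.0, d2=0.5, ssq=1.0)
print(V1, V1 == V2)
```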
55
5.4 Inference for the Mean Structure
5.4.1 Introduction
• Estimate for β:
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iy_i
\]
with α replaced by its ML or REML estimate
• Conditional on α, β̂(α) is multivariate normal with mean β and covariance
\[
\mathrm{Var}(\widehat{\beta})
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \left( \sum_{i=1}^{N} X_i'W_i\,\mathrm{Var}(Y_i)\,W_iX_i \right) \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
\]
• In practice one again replaces α by its ML or REML estimate
56
5.4.2 Approximate Wald Test
For any known matrix L:
\[
(\widehat{\beta} - \beta)'\, L' \left[ L \left( \sum_{i=1}^{N} X_i'V_i^{-1}(\widehat{\alpha})X_i \right)^{-1} L' \right]^{-1} L\, (\widehat{\beta} - \beta)
\;\approx\; \chi^2_{\mathrm{rank}(L)}
\]
57
5.4.3 Approximate t-test and F-test
• Wald test, but corrected for the variability introduced by replacing α by an estimate, replacing the χ² distribution by an F distribution:
\[
\frac{(\widehat{\beta} - \beta)'\, L' \left[ L \left( \sum_{i=1}^{N} X_i'V_i^{-1}(\widehat{\alpha})X_i \right)^{-1} L' \right]^{-1} L\, (\widehat{\beta} - \beta)}{\mathrm{rank}(L)}
\;\approx\; F_{\mathrm{rank}(L),\,\nu}
\]
• Denominator degrees of freedom ν to be estimated from the data:
. Containment method
. Satterthwaite approximation
. Kenward and Roger approximation
. . . .
58
5.4.4 Robust Inference
• Estimate for β:
\[
\widehat{\beta}(\alpha) = \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iY_i
\]
with α replaced by its ML or REML estimate
• Conditional on α, β̂ has mean
\[
E\!\left[\widehat{\beta}(\alpha)\right]
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_i\,E(Y_i)
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \sum_{i=1}^{N} X_i'W_iX_i\beta
= \beta
\]
provided that E(Yi) = Xiβ
• Hence, in order for β̂ to be unbiased, it is sufficient that the mean of the response is correctly specified.
59
• Conditional on α, β̂ has covariance
\[
\mathrm{Var}(\widehat{\beta})
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1} \left( \sum_{i=1}^{N} X_i'W_i\,\mathrm{Var}(Y_i)\,W_iX_i \right) \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
= \left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}
\]
• Note that this assumes that the covariance matrix Var(Yi) is correctly modelled as Vi = ZiDZ′i + Σi
• This covariance estimate is therefore often called the 'naive' estimate.
• The so-called 'robust' estimate for Var(β̂), which does not assume the covariance matrix to be correctly specified, is obtained by replacing Var(Yi) by [Yi − Xiβ̂][Yi − Xiβ̂]′ rather than Vi
60
• The only condition for [Yi − Xiβ̂][Yi − Xiβ̂]′ to be unbiased for Var(Yi) is that the mean is again correctly specified.
• The so-obtained estimate is called the 'robust' variance estimate, also called the sandwich estimate:
\[
\mathrm{Var}(\widehat{\beta})
= \underbrace{\left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}}_{\text{BREAD}}
\underbrace{\left( \sum_{i=1}^{N} X_i'W_i\,\widehat{\mathrm{Var}}(Y_i)\,W_iX_i \right)}_{\text{MEAT}}
\underbrace{\left( \sum_{i=1}^{N} X_i'W_iX_i \right)^{-1}}_{\text{BREAD}}
\]
• Based on this sandwich estimate, robust versions of the Wald test as well as of the approximate t-test and F-test can be obtained.
• Note that this suggests that as long as interest is only in inferences for the mean structure, little effort should be spent in modeling the covariance structure, provided that the data set is sufficiently large
61
• Extreme point of view: OLS with robust standard errors
• Appropriate covariance modeling may still be of interest:
. for the interpretation of random variation in data
. for gaining efficiency
. in the presence of missing data, robust inference is only valid under very severe assumptions about the underlying missingness process (see later)
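A simulation sketch of the naive versus sandwich estimates (made-up numbers; the working covariance is deliberately misspecified as ordinary least squares):

```python
import numpy as np

# Simulation sketch (made-up numbers): naive vs 'sandwich' covariance of the
# estimator when the working covariance W_i is deliberately misspecified
# (ordinary least squares, W_i = I) while the data are compound-symmetric.
# The robust estimate only needs the mean E(Y_i) = X_i beta to be correct.

rng = np.random.default_rng(1)
beta = np.array([1.0, 0.5])
t = np.array([0.0, 1.0, 2.0])
X = np.column_stack([np.ones_like(t), t])
V_true = 0.8 * np.ones((3, 3)) + 0.5 * np.eye(3)   # true Var(Y_i)
W = np.eye(3)                                      # working covariance: OLS

N = 500
ys = [rng.multivariate_normal(X @ beta, V_true) for _ in range(N)]

bread = np.linalg.inv(N * (X.T @ W @ X))           # (sum_i X_i' W_i X_i)^{-1}
beta_hat = bread @ sum(X.T @ W @ y for y in ys)

naive = bread                                      # pretends Var(Y_i) = W_i^{-1}
meat = np.zeros((2, 2))
for y in ys:
    r = y - X @ beta_hat                           # observed residual
    meat += X.T @ W @ np.outer(r, r) @ W @ X       # sum_i X_i' W_i r r' W_i X_i
robust = bread @ meat @ bread

print(np.sqrt(np.diag(naive)), np.sqrt(np.diag(robust)))
```

With this misspecification the naive standard errors are too small for the intercept and too large for the slope, while the sandwich estimate tracks the true sampling variance.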
62
5.4.5 Likelihood Ratio Test
• Comparison of nested models with different mean structures, but equal covariance structure
• Example: prostate data, testing H0 : β1 = 0 in the reduced model:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ \beta_{14}(L_i + M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij},
\]
• Results under ML estimation:

ML estimation
Under β1 ∈ IR        LML = −3.575
Under H0 : β1 = 0    LML = −6.876
−2 ln λN             6.602
degrees of freedom   1
p-value              0.010
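The LR statistic and p-value reported above can be reproduced from the two maximized log-likelihoods (using the fact that the χ²₁ tail probability equals erfc(√(x/2))):

```python
import math

# Reproducing the LR test above from its two maximized ML log-likelihoods:
# -2 ln(lambda) = -2 (L_reduced - L_full), referred to chi-square with 1 d.f.
# (the chi-square_1 tail probability is erfc(sqrt(x / 2))).

L_full, L_reduced = -3.575, -6.876
lr = -2.0 * (L_reduced - L_full)
p_value = math.erfc(math.sqrt(lr / 2.0))
print(round(lr, 3), round(p_value, 3))
```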
63
• Results under REML estimation:
REML estimation
Under β1 ∈ IR        LREML = −20.165
Under H0 : β1 = 0    LREML = −19.003
−2 ln λN             −2.324
degrees of freedom
p-value

• Conclusion:
LR tests for the mean structure are not valid under REML
64
5.5 Inference for the Variance Components
5.5.1 Introduction
• Inference for the mean structure is usually of primary interest.
• However, inference for the covariance structure is of interest as well:
. interpretation of the random variation in the data
. overparameterized covariance structures lead to inefficient inferences for the mean
. too restrictive models invalidate inferences for the mean structure
• Approximate Wald tests based on asymptotic normality of MLEs and REMLEs
• LR tests based on ML or REML
65
5.5.2 Caution
• We reconsider the reduced model for the prostate data:
\[
\ln(\mathrm{PSA}_{ij} + 1)
= \beta_1\mathrm{Age}_i + \beta_2 C_i + \beta_3 B_i + \beta_4 L_i + \beta_5 M_i
+ (\beta_8 B_i + \beta_9 L_i + \beta_{10} M_i)\,t_{ij}
+ \beta_{14}(L_i + M_i)\,t_{ij}^2
+ b_{1i} + b_{2i}t_{ij} + b_{3i}t_{ij}^2 + \varepsilon_{ij},
\]
• Selected SAS output:
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
UN(1,1) XRAY 0.4432 0.09349 4.74 <.0001
UN(2,1) XRAY -0.4903 0.1239 -3.96 <.0001
UN(2,2) XRAY 0.8416 0.2033 4.14 <.0001
UN(3,1) XRAY 0.1480 0.04702 3.15 0.0017
UN(3,2) XRAY -0.3000 0.08195 -3.66 0.0003
UN(3,3) XRAY 0.1142 0.03454 3.31 0.0005
timeclss XRAY 0.02837 0.002276 12.47 <.0001
66
• Problem 1: Marginal vs. hierarchical model interpretation:
H0 : d33 = 0
vs.
H0 : d33 = d32 = d31 = 0
• Problem 2: Boundary problems !
67
Chapter 6
Inference for the Random Effects
• Empirical Bayes inference
• Example: Prostate data
• Shrinkage
• Example: Prostate data
• Normality assumption for random effects
68
6.1 Empirical Bayes Inference
• Random effects bi reflect how the evolution for the ith subject deviates from the expected evolution Xiβ.
• Estimation of the bi is helpful for detecting outlying profiles
• This is only meaningful under the hierarchical model interpretation:
Yi|bi ∼ N (Xiβ + Zibi, Σi)
bi ∼ N (0, D)
• Since the bi are random, it is most natural to use Bayesian methods
69
• Terminology: prior distribution N (0, D) for bi
• Posterior density:
f (bi|yi) ≡ f (bi|Yi = yi)
          = f (yi|bi) f (bi) / ∫ f (yi|bi) f (bi) dbi
          ∝ f (yi|bi) f (bi)
          ∝ . . .
          ∝ exp{ −(1/2) (bi − DZ′iWi(yi − Xiβ))′ Λi⁻¹ (bi − DZ′iWi(yi − Xiβ)) }
for some positive definite matrix Λi.
• Posterior distribution:
bi | yi ∼ N ( DZ′iWi(yi − Xiβ), Λi )
70
• Posterior mean as estimate for bi:
b̂i(θ) = E [bi | Yi = yi] = ∫ bi f (bi|yi) dbi = DZ′iWi(α)(yi − Xiβ)
• b̂i(θ) is normally distributed with covariance matrix
var(b̂i(θ)) = DZ′i { Wi − WiXi ( ∑_{i=1}^N X′iWiXi )⁻¹ X′iWi } ZiD
• Note that inference for bi should account for the variability in b̂i
• Therefore, inference for bi is usually based on
var(b̂i(θ) − bi) = D − var(b̂i(θ))
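A toy numerical sketch of the posterior-mean formula above; all matrices and the observed profile are invented for illustration, with Wi = Vi⁻¹ and Vi = ZiDZ′i + Σi:

```python
import numpy as np

# Toy one-subject illustration of the empirical Bayes posterior mean;
# all numbers are made up for this example.
ni = 4
Xi = np.column_stack([np.ones(ni), np.arange(ni)])   # intercept + time
Zi = Xi                                              # random intercept + slope
beta = np.array([2.0, 0.5])
D = np.array([[1.0, 0.2], [0.2, 0.5]])               # random-effects covariance
Sigma_i = 0.3 * np.eye(ni)                           # residual covariance

Vi = Zi @ D @ Zi.T + Sigma_i                         # marginal covariance V_i
Wi = np.linalg.inv(Vi)                               # W_i = V_i^{-1}

yi = np.array([2.5, 3.4, 4.1, 4.6])                  # observed profile

# Posterior mean  b_hat_i = D Z_i' W_i (y_i - X_i beta)
b_hat = D @ Zi.T @ Wi @ (yi - Xi @ beta)
print(b_hat)
```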
71
• Wald tests can be derived
• Parameters in θ are replaced by their ML or REML estimates, obtained from fitting the marginal model.
• b̂i = b̂i(θ̂) is called the ‘Empirical Bayes’ estimate of bi.
• Approximate t- and F-tests to account for the variability introduced by replacing θ by θ̂, similar to tests for fixed effects.
72
6.2 Example: Prostate Data
• We reconsider the reduced model:
ln(PSAij + 1) = β1Agei + β2Ci + β3Bi + β4Li + β5Mi
                + (β8Bi + β9Li + β10Mi) tij + β14(Li + Mi) t²ij
                + b1i + b2itij + b3it²ij + εij
• In practice, histograms and scatterplots of certain components of b̂i are used to detect model deviations or subjects with ‘exceptional’ evolutions over time
• In SAS the estimates can be obtained by adding the option ‘solution’ to the random statement:
random intercept time time2
       / type=un subject=id solution;
ods listing exclude solutionr;
ods output solutionr=out;
• The ODS statements are used to write the EB estimates into a SAS output data set, and to prevent SAS from printing them in the output window.
73
• Histograms and pairwise scatterplots:
74
• Strong negative correlations, in agreement with the correlation matrix corresponding to the fitted D:
Dcorr =   1.000  −0.803   0.658
         −0.803   1.000  −0.968
          0.658  −0.968   1.000
• Histograms and scatterplots show outliers
• Subjects #22, #28, #39, and #45 have the highest four slopes for time2 and the smallest four slopes for time, i.e., the strongest (quadratic) growth.
• Subjects #22, #28 and #39 have been further examined and have been shown to be metastatic cancer cases which were misclassified as local cancer cases.
• Subject #45 is the metastatic cancer case with the strongest growth
75
6.3 Shrinkage Estimators b̂i
• Consider the prediction of the evolution of the ith subject:
Ŷi ≡ Xiβ̂ + Zib̂i
    = Xiβ̂ + ZiDZ′iVi⁻¹(yi − Xiβ̂)
    = ( Ini − ZiDZ′iVi⁻¹ ) Xiβ̂ + ZiDZ′iVi⁻¹ yi
    = ΣiVi⁻¹ Xiβ̂ + ( Ini − ΣiVi⁻¹ ) yi
• Hence, Ŷi is a weighted mean of the population-averaged profile Xiβ̂ and the observed data yi, with weights ΣiVi⁻¹ and Ini − ΣiVi⁻¹, respectively.
• Note that Xiβ̂ gets much weight if the residual variability is ‘large’ in comparison to the total variability.
76
• This phenomenon is usually called shrinkage:
The observed data are shrunk towards the prior average profile Xiβ.
• This is also reflected in the fact that for any linear combination λ′bi of the random effects,
var(λ′b̂i) ≤ var(λ′bi).
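The weighted-mean identity behind the shrinkage interpretation can be verified numerically; the matrices below are toy values chosen only to make the check concrete:

```python
import numpy as np

# Numerical check: X beta + Z b_hat equals the weighted mean
# Sigma V^{-1} X beta + (I - Sigma V^{-1}) y  (toy numbers).
rng = np.random.default_rng(0)
ni = 5
Xi = np.column_stack([np.ones(ni), np.arange(ni)])
Zi = Xi
beta = np.array([1.0, -0.3])
D = np.array([[0.8, 0.1], [0.1, 0.4]])
Sigma_i = 0.5 * np.eye(ni)
Vi = Zi @ D @ Zi.T + Sigma_i
yi = rng.normal(size=ni)

b_hat = D @ Zi.T @ np.linalg.inv(Vi) @ (yi - Xi @ beta)
pred1 = Xi @ beta + Zi @ b_hat                   # X beta + Z b_hat

W = Sigma_i @ np.linalg.inv(Vi)                  # weight on the prior mean
pred2 = W @ (Xi @ beta) + (np.eye(ni) - W) @ yi  # weighted-mean form

print(np.allclose(pred1, pred2))                 # True
```

The equality follows from ZiDZ′iVi⁻¹ = (Vi − Σi)Vi⁻¹ = Ini − ΣiVi⁻¹.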
77
6.4 Example: Prostate Data
• Comparison of predicted, average, and observed profiles for subjects #15 and #28, obtained under the reduced model:
• Illustration of the shrinkage effect:

var(b̂i) =   0.403  −0.440   0.131
           −0.440   0.729  −0.253
            0.131  −0.253   0.092

D̂ =   0.443  −0.490   0.148
     −0.490   0.842  −0.300
      0.148  −0.300   0.114
78
6.5 The Normality Assumption forRandom Effects
• In practice, histograms of EB estimates are often used to check the normality assumption for the random effects
• However, since
b̂i = DZ′iWi(yi − Xiβ)
var(b̂i) = DZ′i { Wi − WiXi ( ∑_{i=1}^N X′iWiXi )⁻¹ X′iWi } ZiD
one should at least first standardize the EB estimates
• Further, due to the shrinkage property the EB estimates do not fully reflect the heterogeneity in the data.
79
• Small simulation example:
. 1000 profiles with 5 measurements, balanced
. 1000 random intercepts sampled from the mixture ½ N (−2, 1) + ½ N (2, 1)
. Histogram of sampled intercepts:
. Σi = σ²Ini, σ² = 30
. Data analysed assuming normality for the intercepts
80
. Histogram of EB estimates:
. Clearly, severe shrinkage forces the estimates b̂i to satisfy the normality assumption
• Conclusion:
EB estimates obtained under normality cannot be used to check normality
• This suggests that the only possibility to check the normality assumption is to fit a more general model, with the classical linear mixed model as a special case, and to compare both models using LR methods
81
Chapter 7
Non-Continuous Data: MotivatingStudies
7.1 Toenail Data
• Toenail Dermatophyte Onychomycosis (TDO): common toenail infection, difficult to treat, affecting more than 2% of the population.
• Classical treatments with antifungal compounds need to be administered until the whole nail has grown out healthy.
• New compounds have been developed which reduce treatment to 3 months
82
• Randomized, double-blind, parallel group, multicenter study for the comparison of two such new compounds (A and B) for oral treatment.
• Research question:
Severity of TDO relative to treatment?
• 2 × 189 patients randomized, 36 centers
• 48 weeks of total follow-up (12 months)
• 12 weeks of treatment (3 months)
• measurements at months 0, 1, 2, 3, 6, 9, 12.
83
7.2 The Analgesic Trial
• single-arm trial with 530 patients recruited (491 selected for analysis)
• analgesic treatment for pain caused by chronic nonmalignant disease
• treatment was to be administered for 12 months
• we will focus on Global Satisfaction Assessment(GSA)
• GSA scale goes from 1=very good to 5=very bad
• GSA was rated by each subject 4 times during thetrial, at months 3, 6, 9, and 12.
84
Observed Frequencies
Questions
• Evolution over time
• Relation with baseline covariates: age, sex, duration of the pain, type of pain, disease progression, Pain Control Assessment (PCA), . . .
• Investigation of dropout
85
Chapter 8
Generalized Linear Models
1. E(Yi) = µi
2. η(μi) = x′iβ with η(.) the link function
3. Var(Yi) = φv(µi), where
• v(.) is a known variance function
• φ is a scale (overdispersion) parameter
• exponential family p.d.f.
f (y|θi, φ) = exp{ φ⁻¹[yθi − ψ(θi)] + c(y, φ) }
with θi the natural parameter and ψ(.) a function satisfying
. μi = ψ′(θi)
. v(μi) = ψ″(θi)
86
8.1 The Bernoulli Model
Y ∼ Bernoulli(π)
• Density function:
f (y|θ, φ) = πʸ(1 − π)¹⁻ʸ
           = exp { y ln π + (1 − y) ln(1 − π) }
           = exp { y ln( π / (1 − π) ) + ln(1 − π) }
87
• Exponential family:
. θ = ln( π / (1 − π) )
. φ = 1
. ψ(θ) = −ln(1 − π) = ln(1 + exp(θ))
. c(y, φ) = 0
• Mean and variance function:
. μ = exp(θ) / (1 + exp(θ)) = π
. v(μ) = exp(θ) / (1 + exp(θ))² = π(1 − π)
• Note that, under this model, the mean and variance are related:
φv(μ) = μ(1 − μ)
• The link function is the logit link:
θ = ln( μ / (1 − μ) )
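These identities can be checked numerically; π = 0.3 below is an arbitrary choice, and note that ψ(θ) = ln(1 + e^θ) = −ln(1 − π):

```python
import math

# Numerical check of the Bernoulli exponential-family identities.
pi = 0.3
theta = math.log(pi / (1 - pi))        # natural parameter (logit)

psi = math.log(1 + math.exp(theta))    # cumulant function, equals -ln(1 - pi)
mu = math.exp(theta) / (1 + math.exp(theta))       # psi'(theta)
v = math.exp(theta) / (1 + math.exp(theta)) ** 2   # psi''(theta)

assert abs(psi - (-math.log(1 - pi))) < 1e-12
assert abs(mu - pi) < 1e-12                        # mean equals pi
assert abs(v - pi * (1 - pi)) < 1e-12              # variance equals pi(1 - pi)
```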
88
8.2 Generalized Linear Models (GLM)
• Suppose a sample Y1, . . . , YN of independent observations is available
• All Yi have densities f (yi|θi, φ) which belong to the exponential family
• Linear predictor:
θi = x′iβ
• Log-likelihood:
ℓ(β, φ) = (1/φ) ∑_i [yiθi − ψ(θi)] + ∑_i c(yi, φ)
• Score equations:
S(β) = ∑_i (∂μi/∂β) vi⁻¹ (yi − μi) = 0
89
Chapter 9
Parametric Modeling Families
9.1 Continuous Outcomes
• Marginal Models
E(Yij|xij) = x′ijβ
• Random-Effects Models
E(Yij|bi,xij) = x′ijβ + z′ijbi
• Transition Models
E(Yij|Yi,j−1, . . . , Yi1,xij) = x′ijβ + αYi,j−1
90
9.2 Longitudinal Generalized LinearModels
• Normal case: easy transfer between models
• Also non-normal data can be measured repeatedly (over time)
• Lack of a key distribution such as the normal =⇒
. A lot of modeling options
. Introduction of non-linearity
. No easy transfer between model families

                       cross-sectional    longitudinal
  normal outcome       linear model       LMM
  non-normal outcome   GLM                ?
91
9.2.1 Marginal Models
• Non-likelihood methods:
. Empirical generalized least squares (EGLS)∗ (Koch et al, 1977) (SAS procedure CATMOD)
. Generalized estimating equations (GEE)∗ Liang and Zeger (1986) (SAS procedure GENMOD)
• Likelihood methods:
. Multivariate Probit Model
. Bahadur Model
. Odds ratio model
92
9.2.2 Random-effects Models
• Beta-binomial model
. Skellam (1948)
. Kleinman (1973)
. Molenberghs, Declerck, and Aerts (1997)
• Generalized linear mixed models
. Engel and Keen (1992)
. Breslow and Clayton (JASA 1993)
. Wolfinger and O’Connell (JSCS 1993)
• Hierarchical generalized linear models
. Lee and Nelder (JRSSB 1996)
• . . .
93
Chapter 10
Generalized Estimating Equations
• Univariate GLM, score function of the form (scalar Yi):
S(β) = ∑_{i=1}^N (∂μi/∂β) vi⁻¹ (yi − μi) = 0
with vi = Var(Yi).
• In the longitudinal setting, Y = (Y1, . . . , YN):
S(β) = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ (yi − μi) = 0
where
. Di is an ni × p matrix with (j, k)th element ∂μij/∂βk
. Vi is ni × ni diagonal,
. yi and μi are ni-vectors with elements yij and μij
94
• The corresponding matrices Vi = Var(Yi) involve a set of nuisance parameters, α say, which determine the covariance structure of Yi.
• Same form as for the full likelihood procedure
• We restrict specification to the first moment only
• The second moment is only specified in the variances, not in the correlations.
• Solving these equations:
. version of iteratively weighted least squares
. Fisher scoring
• Liang and Zeger (1986)
95
10.1 Large Sample Properties
As N → ∞,
√N (β̂ − β) ∼ N (0, I0⁻¹)
where
I0 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Di
• (Unrealistic) Conditions:
. α is known
. the parametric form for V i(α) is known
• Solution: working correlation matrix
96
10.2 Unknown Covariance Structure
Keep the score equations
S(β) = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ (yi − μi) = 0
BUT
• suppose Vi(.) is not the true variance of Yi but only a plausible guess, a so-called working correlation matrix
• specify correlations and not covariances, because the variances follow from the mean structure
• the score equations are solved as before
The asymptotic normality results change to
√N (β̂ − β) ∼ N (0, I0⁻¹ I1 I0⁻¹)
I0 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Di
I1 = ∑_{i=1}^N Di′ [Vi(α)]⁻¹ Var(Yi) [Vi(α)]⁻¹ Di.
97
10.3 The Sandwich Estimator
• This is the so-called sandwich estimator:
. I0 is the bread
. I1 is the filling (ham or cheese)
• correct guess =⇒ likelihood variance
• the estimators β̂ are consistent even if the working correlation matrix is incorrect
• An estimate is found by replacing the unknown variance matrix Var(Yi) by
(Yi − μ̂i)(Yi − μ̂i)′.
Even if this estimator is poor for Var(Yi), it leads to a good estimate of I1, provided that:
. replication in the data is sufficiently large
. the same model for μi is fitted to groups of subjects
. observation times do not vary too much between subjects
• a bad choice of working correlation matrix can affect the efficiency of β̂
98
10.4 The Working Correlation Matrix
Write
Vi(β, α) = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β).
Variance function: Ai is (ni × ni) diagonal with elements v(μij), the known GLM variance function.
Working correlation: Ri(α), possibly dependent on a different set of parameters α.
Overdispersion parameter: φ, assumed 1 or estimated from the data.
The unknown quantities are expressed in terms of the Pearson residuals
eij = (yij − μij) / √v(μij).
Note that eij depends on β.
99
10.5 Estimation of Working Correlation
Liang and Zeger (1986) proposed moment-based estimates for the working correlation.
Some of the more popular ones:

Structure       Corr(Yij, Yik)    Estimate
Independence    0                 —
Exchangeable    α                 α̂ = (1/N) ∑_{i=1}^N [1/(ni(ni − 1))] ∑_{j≠k} eij eik
AR(1)           α^{|j−k|}         α̂ = (1/N) ∑_{i=1}^N [1/(ni − 1)] ∑_{j≤ni−1} eij ei,j+1
Unstructured    αjk               α̂jk = (1/N) ∑_{i=1}^N eij eik
The dispersion parameter is estimated by
φ̂ = (1/N) ∑_{i=1}^N (1/ni) ∑_{j=1}^{ni} e²ij.
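A sketch of these moment estimators in Python; the residual arrays are simulated placeholders standing in for the Pearson residuals of a fitted mean model:

```python
import numpy as np

# Moment estimates of the exchangeable working correlation and the
# dispersion parameter, following the formulas above.
def exchangeable_alpha(residuals):
    """residuals: list of 1-d arrays e_i of Pearson residuals per subject."""
    N = len(residuals)
    total = 0.0
    for e in residuals:
        ni = len(e)
        cross = np.sum(np.outer(e, e)) - np.sum(e ** 2)   # sum over j != k
        total += cross / (ni * (ni - 1))
    return total / N

def dispersion(residuals):
    N = len(residuals)
    return sum(np.mean(e ** 2) for e in residuals) / N

rng = np.random.default_rng(1)
res = [rng.normal(size=n) for n in (4, 4, 3, 5)]    # placeholder residuals
print(exchangeable_alpha(res), dispersion(res))
```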
100
10.6 Fitting GEE
The standard procedure, implemented in the SAS procedure GENMOD.
1. Compute initial estimates for β, using a univariateGLM (i.e., assuming independence).
2. • Compute Pearson residuals eij.
• Compute estimates for α.
• Compute Ri(α).
• Compute estimate for φ.
• Compute Vi(β, α) = φ Ai^{1/2}(β) Ri(α) Ai^{1/2}(β).
3. Update the estimate for β:
β(t+1) = β(t) − [ ∑_{i=1}^N Di′ Vi⁻¹ Di ]⁻¹ [ ∑_{i=1}^N Di′ Vi⁻¹ (yi − μi) ].
4. Iterate 2.–3. until convergence.
Estimates of precision by means of I0⁻¹ and I0⁻¹ I1 I0⁻¹.
101
10.7 Application to Analgesic Trial
10.8 Code and Output
proc genmod data=gsa;
ods listing exclude parameterestimates classlevels parminfo;
class patid timecls;
model gsabin = pca0 time|time / dist=b;
repeated subject=patid / type=un corrw within=timecls modelse;
run;
Selected Output
The GENMOD Procedure
Analysis Of Initial Parameter Estimates
Standard Wald 95% Chi-
Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq
Intercept 1 2.8016 0.4902 1.8408 3.7624 32.66 <.0001
pca0 1 -0.2058 0.0864 -0.3751 -0.0365 5.67 0.0172
TIME 1 -0.7864 0.3874 -1.5456 -0.0271 4.12 0.0424
TIME*TIME 1 0.1774 0.0793 0.0219 0.3329 5.00 0.0254
Scale 0 1.0000 0.0000 1.0000 1.0000
102
GEE Model Information
Correlation Structure Unstructured
Within-Subject Effect timecls (4 levels)
Subject Effect PATID (395 levels)
Number of Clusters 395
Correlation Matrix Dimension 4
Maximum Cluster Size 4
Minimum Cluster Size 1
Working Correlation Matrix
Col1 Col2 Col3 Col4
Row1 1.0000 0.1770 0.2481 0.2021
Row2 0.1770 1.0000 0.1811 0.1177
Row3 0.2481 0.1811 1.0000 0.4594
Row4 0.2021 0.1177 0.4594 1.0000
Analysis Of GEE Parameter Estimates
Empirical Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 2.8731 0.4592 1.9730 3.7731 6.26 <.0001
pca0 -0.2278 0.0959 -0.4157 -0.0398 -2.37 0.0176
TIME -0.7779 0.3234 -1.4117 -0.1440 -2.41 0.0162
TIME*TIME 0.1670 0.0656 0.0384 0.2955 2.55 0.0109
Analysis Of GEE Parameter Estimates
Model-Based Standard Error Estimates
Standard 95% Confidence
Parameter Estimate Error Limits Z Pr > |Z|
Intercept 2.8731 0.4836 1.9252 3.8209 5.94 <.0001
pca0 -0.2278 0.1025 -0.4287 -0.0268 -2.22 0.0263
TIME -0.7779 0.3275 -1.4198 -0.1359 -2.37 0.0176
TIME*TIME 0.1670 0.0665 0.0366 0.2973 2.51 0.0121
Scale 1.0000 . . . . .
103
10.9 Comparison of GEE Estimates
Variable INDependence EXCHangeable AutoRegressive UNstructured
Intercept 2.80 (0.469; 0.490) 2.92 (0.463; 0.494) 2.94 (0.469; 0.488) 2.87 (0.459; 0.484)
Time -0.79 (0.341; 0.387) -0.83 (0.328; 0.343) -0.90 (0.334; 0.352) -0.78 (0.323; 0.328)
Time2 0.18 (0.070; 0.079) 0.18 (0.067; 0.070) 0.20 (0.069; 0.072) 0.17 (0.066; 0.067)
Basel. PCA -0.21 (0.095; 0.086) -0.23 (0.095; 0.103) -0.22 (0.095; 0.099) -0.23 (0.096; 0.103)
Correlation 0 0.219 0.235 see output
Parameter estimates and standard errors (empirical; model-based).
104
Chapter 11
Random-Effects Models
• Assume the responses Yi1, . . . , Yini conditionally independent given an unobserved vector bi.
• E(Yij|bi) = μij
• η(μij) = x′ijβ + z′ijbi
• var(Yij|bi) = φv(μij)
• Marginal likelihood
f (yi) = ∫ ∏_{j=1}^{ni} f (yij|bi) g(bi) dbi
where g(.) denotes the p.d.f. of bi.
. f and g normal: linear mixed model
. f exp. fam. & g normal: gen. lin. mix. model
. f exp. fam. & g general: hier. gen. lin. model
105
Chapter 12
Generalized Linear Mixed Models
• We re-consider the linear mixed model:
Yi|bi ∼ N (Xiβ + Zibi,Σi)
bi ∼ N (0, D)
• The implied marginal model equals
Yi ∼ N (Xiβ, ZiDZ′i + Σi)
• Hence, even under conditional independence, i.e., all Σi equal to σ²Ini, a marginal association structure is implied through the random effects.
• Linear models versus GLM: similarities, but also differences!
106
12.1 Generalized Linear Mixed Models(GLMM)
• Given a vector bi of random effects for cluster i, it is assumed that all responses Yij are independent, with density
f (yij|θij, φ) = exp{ φ⁻¹[yijθij − ψ(θij)] + c(yij, φ) }
in which θij is now modelled as
θij = x′ijβ + z′ijbi
• It is generally assumed that
bi ∼ N (0, D)
107
• Let fij(yij|bi, β, φ) denote the conditional density of Yij given bi; the marginal distribution of Yi is then given by
fi(yi|β, D, φ) = ∫ ∏_{j=1}^{ni} fij(yij|bi, β, φ) f (bi|D) dbi
where f (bi|D) is the density of the N (0, D) distribution.
• The likelihood function for β, D, and φ now equals
L(β, D, φ) = ∏_{i=1}^N fi(yi|β, D, φ)
           = ∏_{i=1}^N ∫ ∏_{j=1}^{ni} fij(yij|bi, β, φ) f (bi|D) dbi
• Under the normal linear model, the integral can be worked out analytically.
108
• In general, approximations are required
• Two approaches can be followed:
. approximation of the integrand
. approximation of the integral: numericalquadrature
• Roughly speaking, under the first approach, ∏_j fij(yij|bi, β, φ) is approximated by a normal density such that the integral can be calculated analytically, as in the normal linear model.
109
12.2 Numerical Quadrature
• Two popular approaches:
. Gaussian quadrature
. Adaptive Gaussian quadrature (assuming the integrand is approximately normal)
• Adaptive Gaussian quadrature often needs fewer points, but the evaluations are more time consuming
• The number of quadrature points can influence the results!
• Implemented in PROC NLMIXED
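A sketch of (non-adaptive) Gauss–Hermite quadrature for one cluster's contribution to the marginal likelihood of a random-intercept logistic model; the response vector and linear-predictor values are made up, and this is not the NLMIXED implementation:

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

def marginal_lik(y, eta_fixed, sigma, n_quad):
    """Approximate  integral of prod_j Bernoulli(y_j | expit(eta_j + b)) * N(b; 0, sigma^2) db."""
    x, w = np.polynomial.hermite.hermgauss(n_quad)
    total = 0.0
    for xk, wk in zip(x, w):
        b = np.sqrt(2.0) * sigma * xk          # change of variables for N(0, sigma^2)
        p = expit(eta_fixed + b)
        total += wk * np.prod(np.where(y == 1, p, 1 - p))
    return total / np.sqrt(np.pi)

y = np.array([1, 1, 0, 1, 0])
eta = np.array([-0.5, -0.2, 0.1, 0.4, 0.7])    # x_ij' beta, made-up values
for q in (3, 5, 10, 20):
    print(q, marginal_lik(y, eta, sigma=2.0, n_quad=q))
```

Running this shows the approximation drifting with the number of quadrature points, mirroring the warning above.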
110
12.3 Application to Toenail Data
The Model
• As an illustration, we perform a GLMM analysis of the severity of infection in the toenail data set.
• The outcome of interest is the binary indicator reflecting the severity of the infection.
• Frequencies at each visit (both treatments):
111
• Conditionally on a random intercept bi, logistic regression models will be used to describe the marginal distributions, i.e., the distribution of the outcome at each time point separately:
Yij|bi ∼ Bernoulli(πij)
log( πij / (1 − πij) ) = β0 + bi + β1Ti + β2tij + β3Titij
• Notation:
. Ti: treatment indicator for subject i
. tij: time point at which the jth measurement is taken for the ith subject
• More complex mean models can be considered as well (e.g. including polynomial time effects, covariates, or other random effects than just intercepts).
112
12.3.1 SAS PROC NLMIXED Program
• SAS code (random intercept):
proc nlmixed data=test noad qpoints=3;
parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5 sigma=3.9;
teta = beta0 + b + beta1*treatn + beta2*time + beta3*timetr;
expteta = exp(teta);
p = expteta/(1+expteta);
model onyresp ~ binary(p);
random b ~ normal(0,sigma**2) subject=idnum;
run;
• The inclusion of a random slope can be specified asfollows:
proc nlmixed data=test noad qpoints=3;
parms beta0=-1.6 beta1=0 beta2=-0.4 beta3=-0.5
d11=3.9 d12=0 d22=0.1;
teta = beta0 + b1 + beta1*treatn + beta2*time
+ b2*time + beta3*timetr;
expteta = exp(teta);
p = expteta/(1+expteta);
model onyresp ~ binary(p);
random b1 b2 ~ normal([0, 0] , [d11, d12, d22])
subject=idnum;
run;
113
12.3.2 SAS output
• Model and data specifications:
Specifications
Data Set WORK.TEST
Dependent Variable onyresp
Distribution for Dependent Variable Binary
Random Effects b
Distribution for Random Effects Normal
Subject Variable idnum
Optimization Technique Dual Quasi-Newton
Integration Method Gaussian Quadrature
• Dimensions of the data set:
Dimensions
Observations Used 1908
Observations Not Used 0
Total Observations 1908
Subjects 294
Max Obs Per Subject 7
Parameters 5
Quadrature Points 3
• Analysis of initial parameter estimates:
Parameters
beta0 beta1 beta2 beta3 sigma NegLogLike
-1.6 0 -0.4 -0.5 3.9 760.941002
114
• Minus twice the maximized log-likelihood, as well as associated information criteria:
Fit Statistics
-2 Log Likelihood 1344.1
AIC (smaller is better) 1354.1
BIC (smaller is better) 1372.6
• Parameter estimates, standard errors, approximate t-tests and confidence intervals:
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha
beta0 -1.5221 0.3063 293 -4.97 <.0001 0.05
beta1 -0.3932 0.3812 293 -1.03 0.3031 0.05
beta2 -0.3198 0.03481 293 -9.19 <.0001 0.05
beta3 -0.09098 0.05236 293 -1.74 0.0833 0.05
sigma 2.2555 0.1217 293 18.54 <.0001 0.05
Parameter Estimates
Parameter Lower Upper Gradient
beta0 -2.1250 -0.9192 -0.00007
beta1 -1.1433 0.3570 -0.00002
beta2 -0.3883 -0.2513 0.000058
beta3 -0.1940 0.01207 0.000319
sigma 2.0161 2.4950 -0.00003
115
12.3.3 Accuracy of Numerical Integration
• In order to investigate the accuracy of the numerical integration method, the model was refitted for varying numbers of quadrature points, and for adaptive as well as non-adaptive Gaussian quadrature:
Gaussian quadrature
Q = 3 Q = 5 Q = 10 Q = 20 Q = 50
β0 -1.52 (0.31) -2.49 (0.39) -0.99 (0.32) -1.54 (0.69) -1.65 (0.43)
β1 -0.39 (0.38) 0.19 (0.36) 0.47 (0.36) -0.43 (0.80) -0.09 (0.57)
β2 -0.32 (0.03) -0.38 (0.04) -0.38 (0.05) -0.40 (0.05) -0.40 (0.05)
β3 -0.09 (0.05) -0.12 (0.07) -0.15 (0.07) -0.14 (0.07) -0.16 (0.07)
σ 2.26 (0.12) 3.09 (0.21) 4.53 (0.39) 3.86 (0.33) 4.04 (0.39)
−2` 1344.1 1259.6 1254.4 1249.6 1247.7
Adaptive Gaussian quadrature
Q = 3 Q = 5 Q = 10 Q = 20 Q = 50
β0 -2.05 (0.59) -1.47 (0.40) -1.65 (0.45) -1.63 (0.43) -1.63 (0.44)
β1 -0.16 (0.64) -0.09 (0.54) -0.12 (0.59) -0.11 (0.59) -0.11 (0.59)
β2 -0.42 (0.05) -0.40 (0.04) -0.41 (0.05) -0.40 (0.05) -0.40 (0.05)
β3 -0.17 (0.07) -0.16 (0.07) -0.16 (0.07) -0.16 (0.07) -0.16 (0.07)
σ 4.51 (0.62) 3.70 (0.34) 4.07 (0.43) 4.01 (0.38) 4.02 (0.38)
−2` 1259.1 1257.1 1248.2 1247.8 1247.8
116
• (Log-)likelihoods are not comparable
• Different Q can lead to considerable differences in estimates and standard errors.
• For example, using non-adaptive quadrature with Q = 3, we found no significant difference in the time effect between the two treatment groups (t = −0.09/0.05, p = 0.0833).
• Using adaptive quadrature with Q = 50, we find a significant interaction between the time effect and the treatment (t = −0.16/0.07, p = 0.0255).
• Assuming that Q = 50 is sufficient, the ‘final’ results are well approximated with smaller Q under adaptive quadrature, but not under non-adaptive quadrature.
117
Chapter 13
Variations to a Common Theme
• Numerical integration methods
. SAS procedure NLMIXED
. MIXOR
• Expansion methods
. SAS macro GLIMMIX
• allows for random effects at several levels
• allows for serial correlation
. multilevel modeling and MLwiN
• Fully Bayesian approach
. WinBUGS
118
13.1 Application to the Analgesic Trial
13.1.1 PROC NLMIXED
• Code:
proc nlmixed data=gsa npoints=20 noad noadscale tech=newrap;
parms beta0=3 beta1=-0.8 beta2=0.2 beta3=-0.2 su=1;
eta = beta0 + beta1*time + beta2*time2 + beta3*pca0 + u;
expeta = exp(eta);
p = expeta/(1+expeta);
model gsabin ~ binary(p);
random u ~ normal(0,su**2) subject=patid;
estimate ’ICC’ su**2/(arcos(-1)**2/3 + su**2);
run;
• Selected Output:
Parameter Estimates
Standard
Parameter Estimate Error DF t Value Pr > |t| Alpha Lower
beta0 4.0474 0.7097 394 5.70 <.0001 0.05 2.6521
beta1 -1.1600 0.4657 394 -2.49 0.0131 0.05 -2.0755
beta2 0.2445 0.09472 394 2.58 0.0102 0.05 0.05826
beta3 -0.2997 0.1428 394 -2.10 0.0365 0.05 -0.5805
su 1.5914 0.2125 394 7.49 <.0001 0.05 1.1736
Parameter Upper Gradient
beta0 5.4427 -1.56E-9
beta1 -0.2445 -2E-9
beta2 0.4307 1.313E-9
beta3 -0.01893 1.468E-9
su 2.0092 -4.52E-9
Additional Estimates
Standard
Label Estimate Error DF t Value Pr > |t| Alpha Lower Upper
ICC 0.4350 0.06564 394 6.63 <.0001 0.05 0.3059 0.5640
119
13.1.2 MIXOR
MIXOR - The program for mixed-effects ordinal regression analysis
(version 2)
Global Satisfaction Assessment
Response function: logistic
Random-effects distribution: normal
Covariate(s) and random-effect(s) mean subtracted from thresholds
==> positive coefficient = positive association between regressor
and ordinal outcome
Numbers of observations
-----------------------
Level 1 observations = 1137
Level 2 observations = 395
The number of level 1 observations per level 2 unit are:
1 1 4 2 1 4 3 4 3 3 4 4 4 3 1 1 1 3 2
...
Descriptive statistics for all variables
----------------------------------------
Variable Minimum Maximum Mean Stand. Dev.
GSAbin 0.00000 1.00000 0.81882 0.38534
intcpt 1.00000 1.00000 1.00000 0.00000
Time 1.00000 4.00000 2.25330 1.12238
Time2 1.00000 16.00000 6.33597 5.55444
PCA0 1.00000 5.00000 3.02375 0.89519
Categories of the response variable GSAbin
--------------------------------------------
Category Frequency Proportion
0.00 206.00 0.18118
1.00 931.00 0.81882
120
Starting values
---------------
mean 1.022
covariates 0.295 -0.066 0.079
var. terms 0.574
==> The number of level 2 observations with non-varying responses
= 284 ( 71.90 percent )
---------------------------------------------------------
* Final Results - Maximum Marginal Likelihood Estimates *
---------------------------------------------------------
Total Iterations = 10
Quad Pts per Dim = 20
Log Likelihood = -506.275
Deviance (-2logL) = 1012.549
Ridge = 0.000
Variable Estimate Stand. Error Z p-value
-------- ------------ ------------ ------------ ------------
intcpt 4.04741 0.71278 5.67835 0.00000 (2)
Time -1.16003 0.47453 -2.44457 0.01450 (2)
Time2 0.24449 0.09678 2.52624 0.01153 (2)
PCA0 -0.29971 0.15375 -1.94932 0.05126 (2)
Random effect variance term (standard deviation)
intcpt 1.59139 0.20578 7.73355 0.00000 (1)
note: (1) = 1-tailed p-value
(2) = 2-tailed p-value
Calculation of the intracluster correlation
-------------------------------------------
residual variance = pi*pi / 3 (assumed)
cluster variance = (1.591 * 1.591) = 2.533
intracluster correlation = 2.533 / ( 2.533 + (pi*pi/3)) = 0.435
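The intracluster correlation calculation in the MIXOR output can be reproduced directly:

```python
import math

# ICC implied by a random-intercept logistic model:
# ICC = sigma^2 / (sigma^2 + pi^2/3), with the logistic residual variance pi^2/3.
sigma = 1.5914                      # estimated random-intercept standard deviation
icc = sigma**2 / (sigma**2 + math.pi**2 / 3)
print(round(icc, 3))                # 0.435
```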
121
13.1.3 PQL2 (MLwiN) Without and With Extra-dispersion Parameter
122
13.1.4 PQL (MLwiN) Without and With Extra-dispersion Parameter
123
13.1.5 PQL (GLIMMIX)
• Code:
%glimmix(data=gsa, procopt=%str(method=ml noclprint covtest),
stmts=%str(
class patid timecls;
model gsabin = time|time pca0 / s;
repeated timecls / sub=patid type=un rcorr=3;
),
error=binomial);
• Output:
The Mixed Procedure
Covariance Parameter Estimates
Standard Z
Cov Parm Subject Estimate Error Value Pr Z
Intercept PATID 3.1651 0.3911 8.09 <.0001
Residual 0.4954 0.02521 19.65 <.0001
Solution for Fixed Effects
Standard
Effect Estimate Error DF t Value Pr > |t|
Intercept 4.0292 0.5476 394 7.36 <.0001
TIME -1.2788 0.3341 739 -3.83 0.0001
TIME*TIME 0.2590 0.06800 739 3.81 0.0002
pca0 -0.2922 0.1300 739 -2.25 0.0249
Type 3 Tests of Fixed Effects
Num Den
Effect DF DF F Value Pr > F
TIME 1 739 14.65 0.0001
TIME*TIME 1 739 14.50 0.0002
pca0 1 739 5.05 0.0249
124
13.1.6 Overview of Results
GLIM- NL- MLwiN
Parameter logistic GEE MIX MIXED MIXOR (PQL1) (PQL2)
Interc. β0 2.802 2.873 4.029 4.047 4.047 3.021 4.067
(0.490) (0.484;0.459) (0.548) (0.710) (0.713) (0.547) (0.703)
Time β1 -0.786 -0.778 -1.279 -1.160 -1.160 -0.868 -1.172
(0.387) (0.328;0.323) (0.334) (0.466) (0.475) (0.405) (0.477)
Time2 β2 0.177 0.167 0.259 0.245 0.244 0.189 0.245
(0.079) (0.067;0.066) (0.068) (0.095) (0.097) (0.083) (0.098)
Base β3 -0.206 -0.228 -0.292 -0.300 -0.300 -0.223 -0.309
(0.086) (0.103;0.096) (0.130) (0.143) (0.154) (0.108) (0.150)
R.I. var. σ2 3.165 2.533 1.019 2.594
(0.391) (0.676) (0.248) (0.473)
R.I. std. σ 1.591 1.591
(0.213) (0.206)
resid.var. - 0.495
(0.025)
ICC 0.435 0.435
(0.066)
• A wall between the marginal model and the random-effects model (GLMM)
• Numerical-integration-based methods are to be preferred
• Some approximations are poor
• What model/type of inference to choose?
125
Chapter 14
Marginal Versus Random-effectsInference
                         model family
                       ↙            ↘
              marginal                random-effects
               model                      model
                 ↓                          ↓
             inference                  inference
            ↙         ↘               ↙           ↘
     likelihood       GEE        marginal     hierarchical
         ↓             ↓             ↓              ↓
        βM            βM            βRE        (βRE, b̂i)
                                     ↓              ↓
                                   “βM”           “βM”
126
14.1 Marginal Interpretation
• We compare our GLMM results for the toenail data with those from fitting GEEs (unstructured working correlation):
GLMM GEE
Parameter Estimate (s.e.) Estimate (s.e.)
Intercept group A −1.6308 (0.4356) −0.7219 (0.1656)
Intercept group B −1.7454 (0.4478) −0.6493 (0.1671)
Slope group A −0.4043 (0.0460) −0.1409 (0.0277)
Slope group B −0.5657 (0.0601) −0.2548 (0.0380)
• The strong differences can be explained as follows:
. Consider the following GLMM:
Yij|bi ∼ Bernoulli(πij)
log( πij / (1 − πij) ) = β0 + bi + β1tij
127
. The conditional means E(Yij|bi), as functions of tij, are given by
E(Yij|bi) = exp(β0 + bi + β1tij) / (1 + exp(β0 + bi + β1tij))
. Graphical representation:
128
. The marginal average evolution is now obtained by averaging over the random effects:
E(Yij) = E[E(Yij|bi)]
       = E[ exp(β0 + bi + β1tij) / (1 + exp(β0 + bi + β1tij)) ]
       ≠ exp(β0 + β1tij) / (1 + exp(β0 + β1tij))
. Graphical representation:
129
• Hence, the parameter vector β in the GEE model needs to be interpreted completely differently from the parameter vector β in the GLMM:
. GEE: marginal interpretation
. GLMM: conditional interpretation, conditionally upon the level of the random effects
• In general, the model for the marginal average is not of the same parametric form as the conditional average in the GLMM.
• For logistic mixed models with normally distributed random intercepts, it can be shown that the marginal model can be well approximated by again a logistic model, but with parameters approximately satisfying
β^RE / β^M = √(c²σ² + 1) > 1
σ² = variance of the random intercepts
c = 16√3 / (15π)
130
• For the toenail application, σ was estimated as 4.0164, such that the ratio equals √(c²σ² + 1) = 2.5649.
• The ratios between the GLMM and GEE estimates are:
GLMM GEE
Parameter Estimate (s.e.) Estimate (s.e.) Ratio
Intercept group A −1.6308 (0.4356) −0.7219 (0.1656) 2.2590
Intercept group B −1.7454 (0.4478) −0.6493 (0.1671) 2.6881
Slope group A −0.4043 (0.0460) −0.1409 (0.0277) 2.8694
Slope group B −0.5657 (0.0601) −0.2548 (0.0380) 2.2202
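The attenuation ratio can be recomputed from the fitted σ, with c = 16√3/(15π); the tiny discrepancy with the 2.5649 reported above is rounding:

```python
import math

# Approximate ratio between conditional (GLMM) and marginal (GEE) coefficients
# for a random-intercept logistic model.
c = 16 * math.sqrt(3) / (15 * math.pi)
sigma = 4.0164                      # fitted random-intercept standard deviation
ratio = math.sqrt(c**2 * sigma**2 + 1)
print(round(ratio, 3))              # 2.565
```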
• Note that this problem does not occur in linear mixed models:
. Conditional mean: E(Yi|bi) = Xiβ + Zibi
. Specifically: E(Yi|bi = 0) = Xiβ
. Marginal mean: E(Yi) = Xiβ
131
• The problem arises from the fact that, in general,
E[g(Y )] ≠ g[E(Y )]
• So, whenever the random effects enter the conditional mean in a non-linear way, the regression parameters in the marginal model need to be interpreted differently from the regression parameters in the mixed model.
• In practice, the marginal mean can be derived from the GLMM output by integrating out the random effects.
• This can be done numerically via Gaussian quadrature, or based on sampling methods.
132
14.2 Sampling Based Marginalizationon Toenail Data
• As an example, we plot the average evolutions based on the GLMM output obtained in the toenail example:
P (Yij = 1) =
  E[ exp(−1.6308 + bi − 0.4043tij) / (1 + exp(−1.6308 + bi − 0.4043tij)) ]   (Treatment A)
  E[ exp(−1.7454 + bi − 0.5657tij) / (1 + exp(−1.7454 + bi − 0.5657tij)) ]   (Treatment B)
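A minimal sampling-based marginalization of the treatment-A fit: draw random intercepts from the fitted normal distribution (σ = 4.0164, the estimate mentioned in the previous section) and average the conditional probabilities; the sample size and seed are arbitrary.

```python
import numpy as np

def expit(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(42)
b = rng.normal(0.0, 4.0164, size=200_000)   # draws of the random intercept

t = 0.0                                     # month 0
marginal = expit(-1.6308 + b - 0.4043 * t).mean()   # E[ expit(beta0 + b + beta1 t) ]
conditional_at_0 = expit(-1.6308)                   # 'average subject' (b_i = 0)
print(marginal, conditional_at_0)
```

The two numbers differ markedly, which is exactly the marginal-versus-conditional gap discussed above.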
133
• Output:
• The code can easily be extended to models which (also) include random effects other than intercepts.
134
• Average evolutions obtained from the GEE analyses:
P (Yij = 1) =
  exp(−0.7219 − 0.1409tij) / (1 + exp(−0.7219 − 0.1409tij))   (Treatment A)
  exp(−0.6493 − 0.2548tij) / (1 + exp(−0.6493 − 0.2548tij))   (Treatment B)
• Graphical representation:
135
• In a GLMM context, rather than plotting the marginal averages, one can also plot the profile for an ‘average’ subject, i.e., a subject with random effect bi = 0:
P (Yij = 1|bi = 0) =
  exp(−1.6308 − 0.4043tij) / (1 + exp(−1.6308 − 0.4043tij))   (Treatment A)
  exp(−1.7454 − 0.5657tij) / (1 + exp(−1.7454 − 0.5657tij))   (Treatment B)
• Graphical representation:
136
Chapter 15
Key Points
Linear versus Generalized Linear
Distribution: Normal versus exponential family
Mean-variance link: absent versus present
Link: linear versus non-linear
Random effects and measurement error: same place versus
different place
Need for numerical approximation/integration: absent versus
present
Model Families
• Marginal model: β has a population-averaged interpretation
• Random effects model: β conditional upon level of random effect
137
Chapter 16
Missing Data
• Subject i at occasion (time) j = 1, . . . , ni
• Measurement Yij
• Dropout indicator
Rij = 1 if Yij is observed, 0 otherwise.
• Group Yij into a vector
Yi = (Yi1, . . . , Yi,ni)′ = (Yi^o, Yi^m), where
Yi^o contains the Yij for which Rij = 1,
Yi^m contains the Yij for which Rij = 0.
• Group Rij into a vector Ri = (Ri1, . . . , Ri,ni)′
• Di: time of dropout: Di = 1 + ∑_{j=1}^{ni} Rij
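As a concrete (hypothetical) example of these definitions, consider a subject with five planned measurements who drops out after the third:

```python
# Hypothetical monotone dropout pattern: observed at occasions 1-3, then dropout
R_i = [1, 1, 1, 0, 0]   # R_ij = 1 if Y_ij is observed

# D_i = 1 + sum_j R_ij: the first occasion at which the subject is no longer observed
D_i = 1 + sum(R_i)

print(D_i)  # -> 4
```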
138
16.1 Factorizing the Distribution
Consider the distribution of the full data:
f (Yi, Di|θ,ψ)
• θ parametrizes the measurement distribution,
• ψ parametrizes the missingness process.
Several routes are possible:
Selection Models:
f (Yi|θ)f (Di|Yi,ψ)
Pattern-Mixture Models:
f (Yi|Di, θ)f (Di|ψ)
Shared parameter models:
f (Yi, Di|bi, θ,ψ)
139
16.2 Missing Data Processes
f(Di | Yi, ψ) = f(Di | Yi^o, Yi^m, ψ).
Missing Completely At Random (MCAR): Missingness is independent of the measurements:
f(Di | ψ).
Missing At Random (MAR): Missingness is independent of the unobserved (missing) measurements, possibly depending on the observed measurements:
f(Di | Yi^o, ψ).
Missing Not At Random (MNAR): Missingness depends on the missing values.
Ignorability:
• Likelihood/Bayesian + MAR
• Frequentist + MCAR
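The distinction between the mechanisms can be made concrete with a small simulation in which dropout at the second occasion depends only on the observed first measurement, i.e., an MAR (but not MCAR) mechanism. All parameter values below are illustrative:

```python
import math
import random

def expit(x):
    return 1.0 / (1.0 + math.exp(-x))

random.seed(7)
n = 50_000
dropped_low, dropped_high = [], []

for _ in range(n):
    y1 = random.gauss(0.0, 1.0)         # observed first measurement
    p_drop = expit(-1.0 + 1.5 * y1)     # MAR: depends on the observed y1 only
    drop = random.random() < p_drop
    (dropped_low if y1 < 0 else dropped_high).append(drop)

rate_low = sum(dropped_low) / len(dropped_low)
rate_high = sum(dropped_high) / len(dropped_high)
print(rate_low, rate_high)  # dropout is far more frequent when observed y1 is high
```

Because the mechanism never looks at the unobserved second measurement, likelihood-based inference that ignores the dropout model remains valid here; under MCAR the two printed rates would be equal.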
140
16.3 Growth Data
• Taken from Potthoff and Roy, Biometrika (1964)
• Research question:
Is dental growth related to gender?
• The distance from the center of the pituitary to the pterygomaxillary fissure was recorded at ages 8, 10, 12, and 14, for 11 girls and 16 boys
141
• Individual profiles:
• Remarks:
. Much variability between girls / boys
. Considerable variability within girls / boys
. Fixed number of measurements per subject
. Measurements taken at fixed timepoints
142
16.4 Model Fitting
    Mean       Covar    par   −2ℓ       Ref   G²       df    p
1   unstr.     unstr.   18    416.509
2   ≠ slopes   unstr.   14    419.477   1      2.968   4     0.5632
3   = slopes   unstr.   13    426.153   2      6.676   1     0.0098
4   ≠ slopes   Toepl.    8    424.643   2      5.166   6     0.5227
5   ≠ slopes   AR(1)     6    440.681   2     21.204   8     0.0066
                                        4     16.038   2     0.0003
6   ≠ slopes   random    8    427.806   2      8.329   6     0.2150
7   ≠ slopes   CS        6    428.639   2      9.162   8     0.3288
                                        4      3.996   2     0.1356
                                        6      0.833   2     0.6594
                                        6      0.833   1:2   0.5104
8   ≠ slopes   simple    5    478.242   7     49.603   1    <0.0001
                                        7     49.603   0:1  <0.0001
143
Chapter 17
Simple Methods
MCAR
• Rectangular matrix by deletion: complete case
analysis
• Rectangular matrix by completion or imputation
. Vertical: Unconditional mean imputation
. Horizontal: Last observation carried forward
. Diagonal: Conditional mean imputation
• Using data as is: available case analysis
. Likelihood-based frequentist analysis: difficult, not generally valid
. Likelihood-based MAR analysis: simple and correct
144
17.1 Simple Methods and Growth Data
Method               Model   Mean        Covar.   par
Complete case          7     = slopes    CS        6
LOCF                   2c    quadratic   unstr.   16
Unconditional mean     7     = slopes    CS        6
Conditional mean       1     unstr.      unstr.   18
The simplest methods are usually distorting.
145
Chapter 18
Three Likelihood Approaches
MAR
• Direct likelihood maximization
. continuous: SAS PROC MIXED, . . .
. categorical: generalized linear mixed models (SAS PROC NLMIXED, . . . )
. NOT: generalized estimating equations !!!
• EM algorithm
Match the data to the “complete” model
• Multiple Imputation
Accounts properly for uncertainty due to missingness
146
Chapter 19
Direct Likelihood Approach
MAR
Likelihood-based inference is valid whenever
• the mechanism is MAR,
• the parameters describing the missingness mechanism are distinct from the measurement model parameters.
PROC MIXED gives valid inference!
147
19.1 Growth Data
    Mean       Covar    par   −2ℓ       Ref   G²       df    P
1   unstr.     unstr.   18    386.957
2   ≠ slopes   unstr.   14    393.288   1      6.331   4     0.1758
3   = slopes   unstr.   13    397.400   2      4.112   1     0.0426
4   ≠ slopes   banded    8    398.030   2      4.742   6     0.5773
5   ≠ slopes   AR(1)     6    409.523   2     16.235   8     0.0391
6   ≠ slopes   random    8    400.452   2      7.164   6     0.3059
7   ≠ slopes   CS        6    401.313   6      0.861   2     0.6502
                                        6      0.861   1:2   0.5018
8   ≠ slopes   simple    5    441.583   7     40.270   1    <0.0001
                                        7     40.270   0:1  <0.0001
148
SAS Code
• For Model 1:
proc mixed data=growthav method=ml;
title ’Jennrich and Schluchter (MAR, Altern.), Model 1’;
class sex idnr age;
model measure=sex age*sex / s;
repeated age / type=un subject=idnr r rcorr;
run;
• For Model 2:
data help;
set growthav;
agec=age;
run;
proc mixed data=help method=ml;
title ’Jennrich and Schluchter (MAR, Altern.), Model 2’;
class sex idnr agec;
model measure=sex age*sex / s;
repeated agec / type=un subject=idnr r rcorr;
run;
149
Graphical Summary
150
Chapter 20
Weighted Estimating Equations
MAR and non-ignorable !
• Standard GEE inference is correct only under MCAR.
• Under MAR: weighted GEE is necessary.
Robins, Rotnitzky & Zhao (JASA, 1995)
Fitzmaurice, Molenberghs & Lipsitz (JRSSB, 1995)
151
• The idea is to weight each subject’s contribution in the GEEs by the inverse probability that a subject drops out at the time he dropped out.
• This can be calculated as
P[Di = di] = ∏_{k=2}^{di−1} (1 − P[Rik = 0 | Ri2 = · · · = Ri,k−1 = 1]) × P[Ri,di = 0 | Ri2 = · · · = Ri,di−1 = 1]^{I(di ≤ T)}
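The product formula can be checked with a small Python sketch, using hypothetical conditional dropout probabilities (dropout possible at occasions 2 through T = 4; subjects with di = T + 1 are completers):

```python
# Hypothetical conditional dropout probabilities:
# p[k] = P(R_ik = 0 | R_i2 = ... = R_i,k-1 = 1), for k = 2, ..., T
p = {2: 0.10, 3: 0.15, 4: 0.20}
T = 4

def prob_dropout_at(d_i):
    """P[D_i = d_i]: remain in the study up to d_i - 1, then (if d_i <= T) drop out at d_i."""
    prob = 1.0
    for k in range(2, d_i):
        prob *= 1.0 - p[k]      # factor (1 - P[R_ik = 0 | ...]): still in at occasion k
    if d_i <= T:                # indicator I(d_i <= T): completers get no final factor
        prob *= p[d_i]
    return prob

print(prob_dropout_at(3))  # (1 - 0.10) * 0.15, roughly 0.135
print(prob_dropout_at(5))  # completer: 0.90 * 0.85 * 0.80, roughly 0.612
```

As a sanity check, the probabilities of the possible dropout times 2, . . . , T + 1 sum to one.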
• Such a model can be fitted using logistic regression with:
. outcome: whether dropout occurs at a given time, yes or no (from the start of the measurement sequence until time of dropout or the end of the sequence)
. covariates: the outcomes at previous occasions, supplemented with genuine covariate information
• Some data manipulation is needed
• Illustration using analgesic trial
152
20.1 A Model for Dropout
• Code to fit appropriate model:
data gsac;
set gsac;
gendis=axis5_0;
run;
proc genmod data=gsac;
class prevgsa;
model dropout = prevgsa pca0 physfct gendis / pred dist=b;
ods output obstats=pred;
ods listing exclude obstats;
run;
• There is some evidence for MAR:
Model on conditional dropout P[Di = j | Di ≥ j] shows dependence on previous GSA measurements.
• Previous GSA is treated as a class level covariate.
• The model further includes: baseline PCA, physical functioning and genetic/congenital disorder.
153
• Selected output:
Class Level Information
Class Levels Values
prevgsa 5 1 2 3 4 5
Response Profile
Ordered Ordered
Level Value Count
1 0 800
2 1 163
Analysis Of Parameter Estimates
Standard Wald 95% Chi-
Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq
Intercept 1 -1.8043 0.4856 -2.7562 -0.8525 13.80 0.0002
prevgsa 1 1 -1.0183 0.4131 -1.8278 -0.2087 6.08 0.0137
prevgsa 2 1 -1.0374 0.3772 -1.7767 -0.2980 7.56 0.0060
prevgsa 3 1 -1.3439 0.3716 -2.0721 -0.6156 13.08 0.0003
prevgsa 4 1 -0.2636 0.3828 -1.0140 0.4867 0.47 0.4910
prevgsa 5 0 0.0000 0.0000 0.0000 0.0000 . .
pca0 1 0.2542 0.0986 0.0609 0.4474 6.65 0.0099
PHYSFCT 1 0.0090 0.0038 0.0015 0.0165 5.54 0.0186
gendis 1 0.5863 0.2420 0.1120 1.0607 5.87 0.0154
Scale 0 1.0000 0.0000 1.0000 1.0000
• The predicted probabilities from this model need to be translated into weights
154
20.2 Computing the Weights
• Predicted values have been saved by means of ODS OUTPUT in the logistic regression (GENMOD call)
• The weights are now defined at the individual measurement level:
. At the first occasion, the weight is w = 1
. At other than the last occasion, the weight is the already accumulated weight, multiplied by 1 − the predicted probability
. At the last occasion within a sequence where dropout occurs, the weight is multiplied by the predicted probability
. At the end of the process, the weight is inverted
• Program statements (creating at the same time indicators for the dropout patterns out of the missingness patterns, for the case where this could be useful)
155
data gsaw;
merge pred gsac(drop=dropout);
run;
data wgt(keep=patid wi);
set gsaw;
by patid;
retain wi;
if first.patid then wi=1;
if not last.patid then wi=wi*(1-pred);
if last.patid then do;
if mpattern not in ("----","--*-","-*--","-**-") then wi=wi*pred;
else wi=wi*(1-pred);
wi=1/wi;
output;
end;
run;
data repbin.gsaw;
merge repbin.gsa wgt;
by patid;
dpattern=mpattern;
if mpattern in ("***-","*-*-","*---","-**-","-*--","--*-")
then dpattern="----";
else if mpattern in ("**-*","*--*","-*-*") then dpattern="---*";
else if mpattern in ("*-**") then dpattern="--**";
run;
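For readers less familiar with SAS data steps, the weight accumulation above can be mirrored in Python for a single subject (the predicted probabilities are hypothetical, and the pattern test is reduced to a simple dropped-out flag):

```python
# Mirror of the SAS weight computation for one subject (illustrative values only)
pred = [0.10, 0.15, 0.20]   # hypothetical predicted dropout probabilities per occasion
dropped_out = True          # True if the sequence ends in dropout rather than study end

w = 1.0                     # corresponds to 'if first.patid then wi=1'
for j, p_j in enumerate(pred):
    is_last = (j == len(pred) - 1)
    if not is_last:
        w *= 1.0 - p_j      # 'if not last.patid then wi=wi*(1-pred)'
    elif dropped_out:
        w *= p_j            # last record of a dropout sequence: multiply by pred
    else:
        w *= 1.0 - p_j      # last record of a completer: multiply by 1 - pred
wi = 1.0 / w                # 'wi=1/wi': invert to get the inverse probability weight

print(wi)  # roughly 6.54 for these values
```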
156
20.3 Weighted Estimating Equations
• Given the preparatory work, we only need to include the weights by means of the SCWGT statement within the GENMOD procedure.
• Together with the use of the REPEATED statement, weighted GEE are then obtained.
• Code:
proc genmod data=repbin.gsaw;
scwgt wi;
class patid timecls;
model gsabin = time|time pca0 / dist=b;
repeated subject=patid / type=un corrw within=timecls;
run;
157
• GEE versus WGEE (empirical s.e.):
Variable GEE WGEE
Intercept 2.950 (0.465) 2.166 (0.694)
Time -0.842 (0.325) -0.437 (0.443)
Time2 0.181 (0.066) 0.120 (0.089)
Baseline PCA -0.244 (0.097) -0.159 (0.130)
GEE working correlation matrix:
  1   0.173   0.246   0.201
      1       0.177   0.113
              1       0.456
                      1
WGEE working correlation matrix:
  1   0.215   0.253   0.167
      1       0.196   0.113
              1       0.409
                      1
158
Chapter 21
Selection Models for MNAR and Sensitivity Analysis
MNAR
Diggle and Kenward Selection Model (1994)
• Linear mixed model for Y i:
Y i = Xiβ + Zibi + εi
• Logistic regressions for dropout:
logit [P (Di = j | Di ≥ j, Yi,j−1, Yij)]
= ψ0 + ψ1Yi,j−1 + ψ2Yij
159
21.1 Mastitis in Dairy Cattle
• Infectious disease of the udder
• Leads to a reduction in milk yield
• There is a belief among dairy scientists that high yielding cows are more susceptible
• But this cannot be measured directly because of the effect of the disease: evidence is missing
Is the probability of occurrence of mastitis
related to the yield that would have been observed
had mastitis not occurred?
160
Analysis of Diggle and Kenward
• Model for milk yield: f (Yi)
Yi ∼ N (µ,Σ)
• Model for occurrence of mastitis: f (Ri | Yi)
logit [P (Ri = 1 | Yi1, Yi2)] = ψ0 + ψ1Yi1 + ψ2Yi2
• Fitted model for occurrence of mastitis:
logit [P (Ri = 1 | Yi1, Yi2)]
= 0.37 + 2.25Yi1 − 2.54Yi2
= 0.37 − 0.29Yi1 − 2.54(Yi2 − Yi1)
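The second expression is an exact reparameterization of the first, obtained by writing the predictor in terms of Yi1 and the increment Yi2 − Yi1; a quick numeric check in Python:

```python
# Fitted dropout-model coefficients from the slide
psi0, psi1, psi2 = 0.37, 2.25, -2.54

def form1(y1, y2):
    return psi0 + psi1 * y1 + psi2 * y2

def form2(y1, y2):
    # psi0 + (psi1 + psi2) * y1 + psi2 * (y2 - y1): same predictor, rearranged
    return psi0 + (psi1 + psi2) * y1 + psi2 * (y2 - y1)

print(round(psi1 + psi2, 2))             # -> -0.29, the Yi1 coefficient on the slide
print(form1(5.0, 6.0), form2(5.0, 6.0))  # identical values
```

The rearranged form shows that, given the increment Yi2 − Yi1, the dependence on the first measurement itself is weak (−0.29), while missingness depends strongly on the change in yield (−2.54).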
• LR test for H0 : ψ2 = 0 : G2 = 5.11
Diggle and Kenward (1994)
161
21.2 Issues
• Fitting such a model
. OSWALD in SPlus (see book)
. user-defined software
. convergence almost always an issue
. many properties still under study (e.g., asymptotics)
• Sensitivity !
. How much belief can be put in the model and its conclusions?
162
Criticism −→ . . .
• Nice to have MNAR tools ?
Rubin (1994):
“. . . , even inferences for the data parameters generally
depend on the posited missingness mechanism, a fact that
typically implies greatly increased sensitivity of
inference. . . ”
Laird (1994):
“. . . , estimating the ‘unestimable’ can be accomplished only
by making modelling assumptions,. . . . The consequences
of model misspecification will (. . . ) be more severe in the
non-random case.”
Little (1995):
“. . . , but estimates rely heavily on normal assumptions and
the correct specification of the dropout model. . . ”
Molenberghs, Kenward & Lesaffre (1997):
“. . . conclusions are conditional on the appropriateness of the
assumed model, which in a fundamental sense is not
testable.”
• Forget about MNAR ?
163
. . .−→ Sensitivity Analysis
• Fit several plausible models
• Ranges of inference: Interval of Ignorance
• Change distributional assumptions (Kenward 1998)
• Local influence
• Anomalous subjects/measures
• Influential observations
• Semi-parametric framework (Scharfstein et al 1999)
• Pattern-mixture models
• Flexible and general modeling framework
164