
Page 1: A Short Course in Longitudinal Data  Analysis

A Short Course in Longitudinal Data Analysis

Peter J Diggle, Nicola Reeve, Michelle Stanton

(School of Health and Medicine, Lancaster University)

Lancaster, June 2011

Page 2: A Short Course in Longitudinal Data  Analysis

Timetable

Day 1

9.00 Registration
9.30 Lecture 1 Motivating examples, exploratory analysis

11.00 BREAK
11.30 Lab 1 Introduction to R

12.30 LUNCH
13.30 Lecture 2 Linear modelling of repeated measurements
15.00 BREAK
15.30 Lab 2 Exploring longitudinal data
17.00 CLOSE

Page 3: A Short Course in Longitudinal Data  Analysis

Day 2

9.00 Lecture 3 Generalized linear models (GLM’s)
10.00 Lab 3 The nlme package
11.30 BREAK
12.00 Lecture 4 Joint modelling
13.00 LUNCH
14.00 Lab 4 Marginal and random effects GLM’s
16.00 CLOSE

Page 4: A Short Course in Longitudinal Data  Analysis

Lecture 1

• examples

• scientific objectives

• why longitudinal data are correlated and why this matters

• balanced and unbalanced data

• tabular and graphical summaries

• exploring mean response profiles

• exploring correlation structure

Page 5: A Short Course in Longitudinal Data  Analysis

Example 1. Reading ability and age

[Figure: scatterplot of reading ability against age (5–10 years)]



Longitudinal designs enable us to distinguish cross-sectional and longitudinal effects.

Page 8: A Short Course in Longitudinal Data  Analysis

Example 2. CD4+ cell numbers

Cohort of 369 HIV seroconverters, CD4+ cell-count measured at approximately six-month intervals, variable number of measurements per subject.

[Figure: square-root CD4 against time (weeks since seroconversion)]


Page 10: A Short Course in Longitudinal Data  Analysis

Example 3. Schizophrenia trial

• randomised clinical trial of drug therapies

• three treatments:

– haloperidol (standard)

– placebo

– risperidone (novel)

• dropout due to “inadequate response to treatment”

Treatment      Number of non-dropouts at week
               0     1     2     4     6     8
haloperidol    85    83    74    64    46    41
placebo        88    86    70    56    40    29
risperidone    345   340   307   276   229   199
total          518   509   451   396   315   269

Page 11: A Short Course in Longitudinal Data  Analysis

Schizophrenia trial data (PANSS)

[Figure: PANSS scores against time (weeks since randomisation)]

Page 12: A Short Course in Longitudinal Data  Analysis

Scientific Objectives

Pragmatic philosophy: method of analysis should take account of the scientific goals of the study.

All models are wrong, but some models are useful

G.E.P. Box

• scientific understanding or empirical description?

• individual-level or population-level focus?

• mean response or variation about the mean?

Page 13: A Short Course in Longitudinal Data  Analysis

Example 5. Smoking and health

• public health perspective – how would smoking reduction policies/programmes affect the health of the community?

• clinical perspective – how would smoking reduction affect the health of my patient?

Page 14: A Short Course in Longitudinal Data  Analysis

Correlation and why it matters

• different measurements on the same subject are typically correlated

• and this must be recognised in the inferential process.

Page 15: A Short Course in Longitudinal Data  Analysis

Estimating the mean of a time series

Y_1, Y_2, ..., Y_t, ..., Y_n,   Y_t ∼ N(µ, σ²)

Classical result from elementary statistical theory:

Ȳ ± 2√(σ²/n)

But if Y_t is a time series:

• E[Ȳ] = µ

• Var{Ȳ} = (σ²/n) × {1 + n⁻¹ Σ_{u≠t} Corr(Y_t, Y_u)}
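To see the formula in action, here is a minimal R sketch; the AR(1) correlation Corr(Y_t, Y_u) = ρ^|t−u| and all parameter values are illustrative assumptions, not part of the course examples:

n <- 100; sigma2 <- 1; rho <- 0.5
R <- rho^abs(outer(1:n, 1:n, "-"))   # n x n correlation matrix
var.naive <- sigma2 / n              # classical result, assumes independence
var.true  <- sigma2 * sum(R) / n^2   # = (sigma2/n) * {1 + n^-1 * sum over u != t}
c(var.naive, var.true)               # positive correlation inflates Var(Y-bar)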

Page 16: A Short Course in Longitudinal Data  Analysis

Correlation may or may not hurt you

Y_it = α + β(t − t̄) + Z_it,   i = 1, ..., m;  t = 1, ..., n

[Figure: simulated response profiles against time]


Page 19: A Short Course in Longitudinal Data  Analysis

Correlation may or may not hurt you

Y_it = α + β(t − t̄) + Z_it,   i = 1, ..., m;  t = 1, ..., n

Parameter estimates and standard errors:

       ignoring correlation       recognising correlation
       estimate   std error       estimate   std error
α      5.234      0.074           5.234      0.202
β      0.493      0.026           0.493      0.011
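The comparison can be reproduced in outline with nlme::gls, fitting the same model once ignoring and once recognising the correlation; the simulated data below, with a shared random intercept per subject, are a stand-in, not the data behind the table:

library(nlme)
set.seed(123)
m <- 10; n <- 10
dat <- data.frame(id = rep(1:m, each = n), t = rep(1:n, m))
dat$tc <- dat$t - mean(dat$t)
dat$y <- 5 + 0.5 * dat$tc + rep(rnorm(m), each = n) + rnorm(m * n, sd = 0.5)
fit.ols <- gls(y ~ tc, data = dat)   # ignores the within-subject correlation
fit.cs <- gls(y ~ tc, data = dat, correlation = corCompSymm(form = ~ 1 | id))
summary(fit.ols)   # compare standard errors with fit.cs
summary(fit.cs)

With a shared random intercept the pattern in the table emerges: ignoring the correlation understates the standard error of α and overstates that of β.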

Page 20: A Short Course in Longitudinal Data  Analysis

Balanced and unbalanced designs

Y_ij = jth measurement on ith subject
t_ij = time at which Y_ij is measured

• balanced design: t_ij = t_j for all subjects i

• a balanced design may generate unbalanced data

Page 21: A Short Course in Longitudinal Data  Analysis

Missing values

• dropout

• intermittent missing values

• loss-to-follow-up

Page 22: A Short Course in Longitudinal Data  Analysis

Random sample of PANSS response profiles

[Figure: random sample of PANSS response profiles against time (weeks)]

Page 23: A Short Course in Longitudinal Data  Analysis

Tabular summary

PANSS treatment group 1 (standard drug)

Week   Mean    Variance   Correlation
0      93.61   214.69     1.00  0.46  0.44  0.49  0.45  0.41
1      89.07   272.46     0.46  1.00  0.71  0.59  0.65  0.51
2      84.72   327.50     0.44  0.71  1.00  0.81  0.77  0.54
4      80.68   358.30     0.49  0.59  0.81  1.00  0.88  0.72
6      74.63   376.99     0.45  0.65  0.77  0.88  1.00  0.84
8      74.32   476.02     0.41  0.51  0.54  0.72  0.84  1.00
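A sketch of how such a summary can be computed in R, assuming a long-format data frame panss with columns id, week and y (hypothetical names):

with(panss, tapply(y, week, mean))    # mean response at each week
with(panss, tapply(y, week, var))     # variance at each week
Y <- with(panss, tapply(y, list(id, week), mean))   # subjects x weeks matrix
round(cor(Y, use = "pairwise.complete.obs"), 2)     # correlation matrix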

Page 24: A Short Course in Longitudinal Data  Analysis

More than one treatment?

• separate tables for each treatment group

• look for similarities and differences

Covariates?

• use residuals from working model fitted by ordinary least squares

Page 25: A Short Course in Longitudinal Data  Analysis

Graphical summary

• spaghetti plots

• mean response profiles

• non-parametric smoothing

• pairwise scatterplots

• variograms

Page 26: A Short Course in Longitudinal Data  Analysis

A spaghetti plot

[Figure: spaghetti plot of PANSS scores against time (weeks since randomisation)]

Page 27: A Short Course in Longitudinal Data  Analysis

A slightly better spaghetti plot

[Figure: improved spaghetti plot of PANSS scores against time (weeks since randomisation)]

Page 28: A Short Course in Longitudinal Data  Analysis

A set of mean response profiles

[Figure: mean response profiles, PANSS against time (weeks post-randomisation)]

Page 29: A Short Course in Longitudinal Data  Analysis

Kernel smoothing

• useful for unbalanced data

• simplest version is mean response within a moving time-window

µ̂(t) = average(y_ij : |t_ij − t| < h/2)

• more sophisticated version:

– kernel function k(·) (symmetric pdf)

– band-width h

µ̂(t) = Σ_ij y_ij k{(t_ij − t)/h} / Σ_ij k{(t_ij − t)/h}
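A minimal sketch of this estimator in R, assuming pooled vectors tij and yij of measurement times and responses (hypothetical names), a Gaussian kernel and an illustrative band-width:

kernel.mean <- function(t0, tij, yij, h) {
  w <- dnorm((tij - t0) / h)      # kernel weights k{(tij - t)/h}
  sum(w * yij) / sum(w)
}
tgrid <- seq(min(tij), max(tij), length.out = 100)
mu.hat <- sapply(tgrid, kernel.mean, tij = tij, yij = yij, h = 0.5)
lines(tgrid, mu.hat, lwd = 2)     # add the smooth to an existing data plot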

Page 30: A Short Course in Longitudinal Data  Analysis

Smoothing the CD4 data

Data

[Figure: CD4 cell counts against time (years since seroconversion)]

Page 31: A Short Course in Longitudinal Data  Analysis

Smoothing the CD4 data

Data, uniform kernel

[Figure: CD4 data with uniform-kernel smooth]

Page 32: A Short Course in Longitudinal Data  Analysis

Smoothing the CD4 data

Data, uniform and Gaussian kernels

[Figure: CD4 data with uniform and Gaussian kernel smooths]

Page 33: A Short Course in Longitudinal Data  Analysis

Smoothing the CD4 data

Data, Gaussian kernels with small and large band-widths

[Figure: CD4 data with Gaussian kernel smooths, small and large band-widths]

Page 34: A Short Course in Longitudinal Data  Analysis

The variogram

The variogram of a stochastic process Y(t) is

V(u) = ½ Var{Y(t) − Y(t − u)}

• well-defined for stationary and some non-stationary processes

• for stationary processes,

V(u) = σ²{1 − ρ(u)}

• easier to estimate V(u) than ρ(u) when data are unbalanced

Page 35: A Short Course in Longitudinal Data  Analysis

Estimating the variogram

r_ij = residual from preliminary model for mean response

• Define

v_ijkl = ½ (r_ij − r_kl)²

• Estimate

V̂(u) = average of all quantities v_ijil such that |t_ij − t_il| ≈ u

• Estimate of process variance

σ̂² = average of all quantities v_ijkl such that i ≠ k.
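A sketch of the within-subject part of this calculation in R, assuming vectors id, t and r holding subject labels, measurement times and residuals (hypothetical names):

variogram.cloud <- function(id, t, r) {
  u <- v <- NULL
  for (i in unique(id)) {
    ti <- t[id == i]; ri <- r[id == i]
    if (length(ri) < 2) next
    keep <- upper.tri(diag(length(ri)))            # each within-subject pair once
    u <- c(u, abs(outer(ti, ti, "-"))[keep])       # time separations
    v <- c(v, (0.5 * outer(ri, ri, "-")^2)[keep])  # variogram ordinates
  }
  data.frame(u = u, v = v)
}
vg <- variogram.cloud(id, t, r)
plot(vg$u, vg$v, pch = ".")   # smooth or average within bands of u to estimate V(u)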

Page 36: A Short Course in Longitudinal Data  Analysis

Example 2. Square-root CD4+ cell numbers

[Figure: empirical variogram ordinates v against u for the square-root CD4 data]

Very large sampling fluctuations hide the information

Page 37: A Short Course in Longitudinal Data  Analysis

Smoothing the empirical variogram

• For irregularly spaced data:

– group time-differences u into bands

– take averages of corresponding v_ijil

• For data from a balanced design, usually no need to average over bands of values for u

Page 38: A Short Course in Longitudinal Data  Analysis

Example 2. CD4+ cell numbers

[Figure: smoothed empirical variogram, V(u) against u, square-root CD4 data]

Page 39: A Short Course in Longitudinal Data  Analysis

Example 3. Schizophrenia trial

[Figure: empirical variogram of the PANSS data (variogram against lag)]

Page 40: A Short Course in Longitudinal Data  Analysis

Sampling distribution of the empirical variogram

• Gaussian distribution

• balanced data

• s subjects in each of p experimental groups

• n measurement times per subject

• mean responses estimated by ordinary least squares from saturated treatments-by-times model

Page 41: A Short Course in Longitudinal Data  Analysis

Properties of V̂(u)

• Marginal distribution of empirical variogram ordinates is

v_ijil ∼ {(s − 1)/s} V(u_ijil) χ²₁

• {s/(s − 1)} V̂(u) is unbiased for V(u)

• expression available for covariance between any two quantities v_ijil

• hence can compute variance of V̂(u)

• typically, Var{V̂(u)} increases (sharply) as u increases

Page 42: A Short Course in Longitudinal Data  Analysis

Where does the correlation come from?

• differences between subjects

• variation over time within subjects

• measurement error

Page 43: A Short Course in Longitudinal Data  Analysis

# read the CD4 data and inspect the first few rows
cd4.data <- read.table("CD4.data", header = TRUE)
cd4.data[1:3, ]

time <- cd4.data$time
CD4 <- cd4.data$CD4

# scatterplot of all measurements, pooled over subjects
plot(time, CD4, pch = 19, cex = 0.25)

# overlay the response profiles of the first ten subjects
id <- cd4.data$id
uid <- unique(id)
for (i in 1:10) {
  take <- (id == uid[i])
  lines(time[take], CD4[take], col = i, lwd = 2)
}

Page 44: A Short Course in Longitudinal Data  Analysis

Lecture 2

• The general linear model with correlated residuals

• parametric models for the covariance structure

• the clever ostrich (why ordinary least squares may not be a silly thing to do)

• weighted least squares as maximum likelihood under Gaussian assumptions

• missing values and dropouts

Page 45: A Short Course in Longitudinal Data  Analysis

General linear model, correlated residuals

E(Y_ij) = x_ij1 β₁ + ... + x_ijp β_p

Y_i = X_i β + ε_i

Y = Xβ + ε

• measurements from different subjects independent

• measurements from same subject typically correlated.

Page 46: A Short Course in Longitudinal Data  Analysis

Parametric models for covariance structure

Three sources of random variation in a typical set oflongitudinal data:

• Random effects (variation between subjects)

– characteristics of individual subjects

– for example, intrinsically high or low responders

– influence extends to all measurements on the subject in question.

Page 47: A Short Course in Longitudinal Data  Analysis

Parametric models for covariance structure

Three sources of random variation in a typical set oflongitudinal data:

• Random effects

• Serial correlation (variation over time within subjects)

– measurements taken close together in time typically more strongly correlated than those taken further apart in time

– on a sufficiently small time-scale, this kind of structure is almost inevitable

Page 48: A Short Course in Longitudinal Data  Analysis

Parametric models for covariance structure

Three sources of random variation in a typical set oflongitudinal data:

• Random effects

• Serial correlation

• Measurement error

– when measurements involve delicate determinations, duplicate measurements at same time on same subject may show substantial variation

Page 49: A Short Course in Longitudinal Data  Analysis

Some simple models

• Compound symmetry

Y_ij − µ_ij = U_i + Z_ij

U_i ∼ N(0, ν²)

Z_ij ∼ N(0, τ²)

Implies that Corr(Y_ij, Y_ik) = ν²/(ν² + τ²), for all j ≠ k
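To see where this comes from: for j ≠ k the Z terms are independent of everything else, so Cov(Y_ij, Y_ik) = Var(U_i) = ν², while Var(Y_ij) = ν² + τ²; the ratio gives the stated correlation, the same for every pair of times.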

Page 50: A Short Course in Longitudinal Data  Analysis

• Random intercept and slope

Y_ij − µ_ij = U_i + W_i t_ij + Z_ij

(U_i, W_i) ∼ BVN(0, Σ)

Z_ij ∼ N(0, τ²)

Often fits short sequences well, but extrapolation dubious, for example Var(Y_ij) quadratic in t_ij
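To spell out the quadratic behaviour: writing the elements of Σ as σ₁₁ = Var(U_i), σ₂₂ = Var(W_i) and σ₁₂ = Cov(U_i, W_i),

Var(Y_ij) = σ₁₁ + 2σ₁₂ t_ij + σ₂₂ t_ij² + τ²,

so the implied variance is a parabola in t_ij and grows without bound beyond the range of the data.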

Page 51: A Short Course in Longitudinal Data  Analysis

• Autoregressive

Y_ij − µ_ij = α(Y_i,j−1 − µ_i,j−1) + Z_ij

Y_i1 − µ_i1 ∼ N{0, τ²/(1 − α²)}

Z_ij ∼ N(0, τ²), j = 2, 3, ...

Not a natural choice for underlying continuous-time processes

Page 52: A Short Course in Longitudinal Data  Analysis

• Stationary Gaussian process

Y_ij − µ_ij = W_i(t_ij)

W_i(t) a continuous-time Gaussian process

E[W(t)] = 0,  Var{W(t)} = σ²

Corr{W(t), W(t − u)} = ρ(u)

ρ(u) = exp(−u/φ) gives continuous-time version of the autoregressive model
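A minimal sketch of simulating one realisation of such a process in R, with illustrative values of σ and φ:

times <- seq(0, 10, by = 0.1)
sigma <- 1; phi <- 2
R <- exp(-abs(outer(times, times, "-")) / phi)   # Corr{W(t), W(t-u)} = exp(-u/phi)
set.seed(1)
W <- sigma * drop(t(chol(R)) %*% rnorm(length(times)))   # one realisation of W(t)
plot(times, W, type = "l")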

Page 53: A Short Course in Longitudinal Data  Analysis

Time-varying random effects

[Figure: realisation of a random intercept-and-slope process R(t)]

Page 54: A Short Course in Longitudinal Data  Analysis

Time-varying random effects: continued

[Figure: realisation of a stationary process R(t)]

Page 55: A Short Course in Longitudinal Data  Analysis

• A general model

Y_ij − µ_ij = d′_ij U_i + W_i(t_ij) + Z_ij

U_i ∼ MVN(0, Σ)  (random effects)

d_ij = vector of explanatory variables for random effects

W_i(t) = continuous-time Gaussian process  (serial correlation)

Z_ij ∼ N(0, τ²)  (measurement errors)

Even when all three components of variation are needed in principle, one or two may dominate in practice

Page 56: A Short Course in Longitudinal Data  Analysis

The variogram of the general model

Y_ij − µ_ij = d′_ij U_i + W_i(t_ij) + Z_ij

V(u) = τ² + σ²{1 − ρ(u)},   Var(Y_ij) = ν² + σ² + τ²

[Figure: variogram V(u) of the general model against u]

Page 57: A Short Course in Longitudinal Data  Analysis

Fitting the model

1. A non-technical summary

2. The gory details

Page 58: A Short Course in Longitudinal Data  Analysis

Fitting the model: non-technical summary

• Ad hoc methods won’t do

• Likelihood-based inference is the statistical gold standard

• But be sure you know what you are estimating

Page 59: A Short Course in Longitudinal Data  Analysis

Fitting the model: the gory details

1. The clever ostrich: robust version of ordinary least squares

2. The very clever ostrich: robust version of weighted least squares

3. Likelihood-based inference: ML and REML

Page 60: A Short Course in Longitudinal Data  Analysis

The clever ostrich

• use ordinary least squares for exploratory analysis and point estimation (ostrich)

• use sample covariance matrix of residuals to give consistent estimates of standard errors (clever)

Procedure as follows:

• y = Xβ + ε :  Var(ε) = V

• β̂ = (X′X)⁻¹X′y ≡ Dy

• Var(β̂) = D V D′ ≈ D V̂ D′,

V̂ = sample covariance matrix of OLS residuals

β̂ ∼ MVN(β, D V̂ D′)
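A minimal sketch of the procedure for balanced data, assuming Y is an m × n response matrix (one row per subject) and X the common n × p design matrix (hypothetical objects):

XtXi <- solve(t(X) %*% X)
beta.hat <- drop(XtXi %*% t(X) %*% colMeans(Y))   # OLS estimate (the ostrich)
res <- sweep(Y, 2, drop(X %*% beta.hat))          # residual matrix
V.hat <- cov(res)                                 # sample covariance of OLS residuals
var.beta <- XtXi %*% t(X) %*% V.hat %*% X %*% XtXi / nrow(Y)   # D V-hat D' (the clever part)
sqrt(diag(var.beta))                              # consistent standard errors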

Page 61: A Short Course in Longitudinal Data  Analysis

Good points:

• technically simple

• often reasonably efficient, and efficiency can be improved by using plausible weighting matrix W to reflect likely covariance structure (see below)

• don’t need to specify covariance structure.

Bad points:

• sometimes very inefficient (recall linear regression example)

• accurate non-parametric estimation of V needs high replication (small n_i, large m)

• assumes missingness completely at random (more on this later)

Page 62: A Short Course in Longitudinal Data  Analysis

Weighted least squares estimation

Weighted least squares estimate of β minimizes

S(β) = (y − Xβ)′ W (y − Xβ)

where W is a symmetric weight matrix

Solution is β̂_W = (X′WX)⁻¹X′Wy.

• unbiased: E(β̂_W) = β, for any choice of W,

• Var(β̂_W) = {(X′WX)⁻¹X′W} V {WX(X′WX)⁻¹} = Σ

Inference: β̂_W ∼ MVN(β, Σ)

Page 63: A Short Course in Longitudinal Data  Analysis

Special cases

1. W = I: ordinary least squares

• β̂ = (X′X)⁻¹X′y,

• Var(β̂) = (X′X)⁻¹X′VX(X′X)⁻¹.

2. W = V⁻¹: maximum likelihood under Gaussian assumptions with known V

• β̂ = (X′V⁻¹X)⁻¹X′V⁻¹y,

• Var(β̂) = (X′V⁻¹X)⁻¹.

Page 64: A Short Course in Longitudinal Data  Analysis

Maximum likelihood estimation (V₀ known)

Log-likelihood for observed data y is

L(β, σ², V₀) = −0.5{nm log σ² + m log|V₀| + σ⁻²(y − Xβ)′(I ⊗ V₀)⁻¹(y − Xβ)}   (1)

where I ⊗ V₀ is block-diagonal matrix, non-zero blocks V₀

Given V₀, estimator for β is

β̂(V₀) = {X′(I ⊗ V₀)⁻¹X}⁻¹X′(I ⊗ V₀)⁻¹y,   (2)

the weighted least squares estimate with W = (I ⊗ V₀)⁻¹.

Explicit estimator for σ² also available as

σ̂²(V₀) = RSS(V₀)/(nm)   (3)

RSS(V₀) = {y − Xβ̂(V₀)}′(I ⊗ V₀)⁻¹{y − Xβ̂(V₀)}.

Page 65: A Short Course in Longitudinal Data  Analysis

Maximum likelihood estimation, V₀ unknown

Substitute (2) and (3) into (1) to give reduced log-likelihood

L(V₀) = −0.5m[n log{RSS(V₀)} + log|V₀|].   (4)

Numerical maximization of (4) then gives V̂₀, hence β̂ ≡ β̂(V̂₀) and σ̂² ≡ σ̂²(V̂₀).

• Dimensionality of optimisation is ½n(n + 1) − 1

• Each evaluation of L(V₀) requires inverse and determinant of an n by n matrix.

Page 66: A Short Course in Longitudinal Data  Analysis

REML: what is it and why use it?

• design matrix X influences estimation of covariance structure, hence

– wrong X gives inconsistent estimates of σ² and V₀.

• remedy for designed experiments (n measurement times and g treatment groups) is to assume a saturated treatments-by-times model for estimation of σ² and V₀

• but

– model for mean response then has ng parameters

– if ng is large, maximum likelihood estimates of σ² and V₀ may be seriously biased

• saturated model not well-defined for most observational studies

Page 67: A Short Course in Longitudinal Data  Analysis

Restricted maximum likelihood (REML)

• REML is a generalisation of the unbiased sample variance estimator,

s² = (n − 1)⁻¹ Σ_{i=1}^n (Y_i − Ȳ)²

• Assume that Y follows a linear model as before,

Y ∼ MVN{Xβ, σ²(I ⊗ V₀)}.

• transform data Y to Y* = AY, matrix A chosen to make distribution of Y* independent of β.

Example: transform to ordinary least squares residual space:

β̂ = (X′X)⁻¹X′Y,   Y* = Y − Xβ̂ = {I − X(X′X)⁻¹X′}Y

Page 68: A Short Course in Longitudinal Data  Analysis

REML calculations

• Y* = AY

• Y* has singular multivariate Normal distribution,

Y* ∼ MVN{0, σ²A(I ⊗ V₀)A′}

independent of β.

• estimate σ² and V₀ by maximising likelihood based on the transformed data Y*.

Page 69: A Short Course in Longitudinal Data  Analysis

β̂(V₀) = {X′(I ⊗ V₀)⁻¹X}⁻¹X′(I ⊗ V₀)⁻¹Y

σ̂²(V₀) = RSS(V₀)/(N − p)

L*(V₀) = −0.5m[n log{RSS(V₀)} + log|V₀|] − 0.5 log|X′(I ⊗ V₀)⁻¹X|
       = L(V₀) − 0.5 log|X′(I ⊗ V₀)⁻¹X|

Note that:

• different choices for A correspond to different rotations of coordinate axes within the residual space

• hence, REML estimates do not depend on A

Page 70: A Short Course in Longitudinal Data  Analysis

# ordinary least squares fit, ignoring the within-subject correlation
fit1 <- lm(CD4 ~ time)
summary(fit1)

# random-intercept (compound symmetry) model fitted with nlme
library(nlme)
?lme
fit2 <- lme(CD4 ~ time, random = ~1 | id)
summary(fit2)

Page 71: A Short Course in Longitudinal Data  Analysis

Lecture 3

Generalized linear models for longitudinal data

• marginal, transition and random effects models: why they address different scientific questions

• generalized estimating equations: what they can and cannot do

Page 72: A Short Course in Longitudinal Data  Analysis

Analysing non-Gaussian data

The classical GLM unifies previously disparate methodologies for a wide range of problems, including:

• multiple regression/ANOVA (Gaussian responses)

• probit and logit regression (binary responses)

• log-linear modelling (categorical responses)

• Poisson regression (counted responses)

• survival analysis (non-negative continuous responses).

How should we extend the classical GLM to analyse longitudinal data?

Page 73: A Short Course in Longitudinal Data  Analysis

Generalized linear models for independent responses

Applicable to mutually independent responses Y_i : i = 1, ..., n.

1. E(Y_i) = µ_i : h(µ_i) = x′_i β, where h(·) is known link function, x_i is vector of explanatory variables attached to ith response, Y_i

2. Var(Y_i) = φ v(µ_i) where v(·) is known variance function

3. pdf of Y_i is f(y_i; µ_i, φ)

Page 74: A Short Course in Longitudinal Data  Analysis

Two examples of classical GLM’s

Example 1: simple linear regression

Y_i ∼ N(β₁ + β₂d_i, σ²)

• x_i = (1, d_i)′

• h(µ) = µ

• v(µ) = 1, φ = σ²

• f(y_i; µ_i, φ) = N(µ_i, φ)

Page 75: A Short Course in Longitudinal Data  Analysis

Example 2: Bernoulli logistic model (binary response)

P(Y_i = 1) = exp(β₁ + β₂d_i)/{1 + exp(β₁ + β₂d_i)}

• x_i = (1, d_i)′

• h(µ) = log{µ/(1 − µ)}

• v(µ) = µ(1 − µ), φ = 1

• f(y_i; µ_i) = Bernoulli

Page 76: A Short Course in Longitudinal Data  Analysis

Three GLM constructions for longitudinal data

• random effects models

• transition models

• marginal models

Page 77: A Short Course in Longitudinal Data  Analysis

Random effects GLM

Responses Y_i1, ..., Y_in_i on an individual subject conditionally independent, given unobserved vector of random effects U_i, i = 1, ..., m.

U_i ∼ g(θ) represents properties of individual subjects that vary randomly between subjects

• E(Y_ij | U_i) = µ_ij : h(µ_ij) = x′_ij β + z′_ij U_i

• Var(Y_ij | U_i) = φ v(µ_ij)

• (Y_i1, ..., Y_in_i) are mutually independent conditional on U_i.

Likelihood inference requires evaluation of

f(y) = ∏_{i=1}^m ∫ f(y_i | U_i) g(U_i) dU_i
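In practice this integral rarely has a closed form; it is evaluated numerically, for example by Gauss–Hermite quadrature or a Laplace-type approximation, which is what packages such as glmmML (used in Lab 4) do internally.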

Page 78: A Short Course in Longitudinal Data  Analysis

Transition GLM

Conditional distribution of each Y_ij modelled directly in terms of preceding Y_i1, ..., Y_i,j−1.

• E(Y_ij | history) = µ_ij

• h(µ_ij) = x′_ij β + Y′_(ij) α, where Y_(ij) = (Y_i1, ..., Y_i,j−1)

• Var(Y_ij | history) = φ v(µ_ij)

Construct likelihood as product of conditional distributions, usually assuming restricted form of dependence, for example:

f_k(y_ij | y_i1, ..., y_i,j−1) = f_k(y_ij | y_i,j−1)

and condition on y_i1 as model does not directly specify f_i1(y_i1).

Page 79: A Short Course in Longitudinal Data  Analysis

Marginal GLM

Let h(·) be a link function which operates component-wise,

• E(y_ij) = µ_ij : h(µ_ij) = x′_ij β

• Var(y_ij) = φ v(µ_ij)

• Corr(y_ij, y_ik) = {R(α)}_jk.

Not a fully specified probability model

May require constraints on variance function v(·) and correlation matrix R(·) for valid specification

Inference for β uses method of generalized estimating equations (the clever ostrich revisited)

Page 80: A Short Course in Longitudinal Data  Analysis

Indonesian children’s health study

• ICHS – of interest is to investigate the association between risk of respiratory illness and vitamin A deficiency

• Over 3000 children medically examined quarterly for up to six visits to assess whether they suffered from respiratory illness (yes/no = 1/0) and xerophthalmia (an ocular manifestation of vitamin A deficiency) (yes/no = 1/0).

• Let Y_ij be the binary random variable indicating whether child i suffers from respiratory illness at time t_ij.

• Let x_ij be the covariate indicating whether child i is vitamin A deficient at time t_ij.

Page 81: A Short Course in Longitudinal Data  Analysis

Marginal model for ICHS study

• E[Y_ij] = µ_ij = P(Y_ij = 1)

• log{µ_ij/(1 − µ_ij)} = β₀ + β₁x_ij

• Var(Y_ij) = µ_ij(1 − µ_ij)

• Corr(Y_ij, Y_ik) = α.

Page 82: A Short Course in Longitudinal Data  Analysis

Marginal model – interpretation of regression parameters

• exp(β₀) is the odds of infection for any child with replete vitamin A.

• exp(β₁) is the odds ratio for any child – i.e. the odds of infection among vitamin A deficient children divided by the odds of infection among children replete with vitamin A.

• exp(β₁) is a ratio of population frequencies – a population-averaged parameter.

• β₁ represents the effect of the explanatory variable (vitamin A status) on any child’s chances of respiratory infection.

Page 83: A Short Course in Longitudinal Data  Analysis

Random effects model for ICHS study

• E[Y_ij | U_i] = µ_ij = P(Y_ij = 1 | U_i)

• log{µ_ij/(1 − µ_ij)} = β*₀ + U_i + β*₁x_ij where U_i ∼ N(0, γ²)

• U_i represents the ith child’s propensity for infection attributed to unmeasured factors (which could be genetic, environmental ...)

• Var(Y_ij | U_i) = µ_ij(1 − µ_ij)

• Y_ij | U_i ⊥ Y_ik | U_i for j ≠ k.

Page 84: A Short Course in Longitudinal Data  Analysis

Random effects model – interpretation of regression parameters

• exp(β*₀) is the odds of infection for a child with average propensity for infection and with replete vitamin A.

• exp(β*₁) is the odds ratio for a specific child – i.e. the odds of infection for a vitamin A deficient child divided by the odds of infection for the same child replete with vitamin A.

• β*₁ represents the effect of the explanatory variable (vitamin A status) upon an individual child’s chance of respiratory infection.

Page 85: A Short Course in Longitudinal Data  Analysis

Estimating equations

Estimating equations for β in a classical GLM:

S(β_j) = Σ_{i=1}^n (∂µ_i/∂β_j) v_i⁻¹ (Y_i − µ_i) = 0 : j = 1, ..., p

where v_i = Var(Y_i).

In vector-matrix notation:

S(β) = D′_µβ V⁻¹(Y − µ) = 0

• D_µβ is an n × p matrix with ijth element ∂µ_i/∂β_j

• V is an n × n diagonal matrix with non-zero elements proportional to Var(Y_i)

• Y and µ are n-element vectors with elements Y_i and µ_i

Page 86: A Short Course in Longitudinal Data  Analysis

Generalized estimating equations (GEE)

In longitudinal setting:

• in previous slide Y_i and µ_i were scalars. In the longitudinal setting they are replaced by n_i-element vectors Y_i and µ_i, associated with ith subject

• corresponding matrices V_i(α) = Var(Y_i) are no longer diagonal

Estimating equations for complete set of data, Y = (Y₁, ..., Y_m),

S(β) = Σ_{i=1}^m {D_µiβ}′{V_i(α)}⁻¹(Y_i − µ_i) = 0

Page 87: A Short Course in Longitudinal Data  Analysis

Large-sample properties of resulting estimates β̂:

√m(β̂ − β) ∼ MVN(0, I₀⁻¹)   (5)

where

I₀ = Σ_{i=1}^m {D_µiβ}′{V_i(α)}⁻¹ D_µiβ

What to do when variance matrices V_i(α) are unknown?

Page 88: A Short Course in Longitudinal Data  Analysis

The working covariance matrix

S(β) = Σ_{i=1}^m {D_µiβ}′{V*_i(α)}⁻¹(Y_i − µ_i) = 0

V*_i(·) is a guess at the covariance matrix of Y_i, called the working covariance matrix

Result (5) on distribution of β̂ now modified to

√m(β̂ − β) ∼ MVN(0, I₀⁻¹ I₁ I₀⁻¹)   (6)

where

I₀ = Σ_{i=1}^m {D_µiβ}′{V*_i(α)}⁻¹ D_µiβ

and

I₁ = Σ_{i=1}^m {D_µiβ}′{V*_i(α)}⁻¹ Var(Y_i) {V*_i(α)}⁻¹ D_µiβ

Page 89: A Short Course in Longitudinal Data  Analysis

Properties:

• result (6) reduces to (5) if V*_i(·) = V_i(·)

• estimator β̂ is consistent even if V*_i(·) ≠ V_i(·)

• to calculate an approximation to I₁, replace Var(Y_i) by

(Y_i − µ̂_i)(Y_i − µ̂_i)′

where µ̂_i = µ_i(β̂)

Gives a terrible estimator of Var(Y_i), but OK in practice provided:

– number of subjects, m, is large

– same model for µ_i fitted to groups of subjects;

– observation times common to all subjects

• but a bad choice of V*_i(·) does affect efficiency of β̂.

Page 90: A Short Course in Longitudinal Data  Analysis

What are we estimating?

• in marginal modelling, β measures population-averaged effects of explanatory variables on mean response

• in transition or random effects modelling, β measures effects of explanatory variables on mean response of an individual subject, conditional on

– subject’s measurement history (transition model)

– subject’s own random characteristics U_i (random effects model)

Page 91: A Short Course in Longitudinal Data  Analysis

Example: Simulation of a logistic regression model, probability of positive response from subject i at time t is p_i(t),

logit{p_i(t)} = α + βx(t) + γU_i,

x(t) is a continuous covariate and U_i is a random effect

[Figure: P(y = 1) against x, individual subject curves and the population average]

Page 92: A Short Course in Longitudinal Data  Analysis

Example: Effect of mother’s smoking on probability of intra-uterine growth retardation (IUGR).

Consider a binary response Y = 1/0 to indicate whether a baby experiences IUGR, and a covariate x to measure the mother’s amount of smoking.

Two relevant questions:

1. public health: by how much might population incidence of IUGR be reduced by a reduction in smoking?

2. clinical/biomedical: by how much is a baby’s risk of IUGR reduced by a reduction in their mother’s smoking?

Question 1 is addressed by a marginal model, question 2 by a random effects model

Page 93: A Short Course in Longitudinal Data  Analysis

set.seed(2346)

# 50 subjects, each observed at covariate values x = 1, ..., 10
x <- rep(1:10, 50)
logit <- 0.1 * (x - mean(x))
subject <- rep(1:50, each = 10)

# subject-specific random effects, shared by all ten responses of a subject
re <- rep(2 * rnorm(50), each = 10)

# simulate correlated binary responses
prob <- exp(re + logit) / (1 + exp(re + logit))
y <- (runif(500) < prob)

# ordinary logistic regression, ignoring the correlation
fit1 <- glm(y ~ x, family = binomial)
summary(fit1)

# marginal model fitted by GEE
library(gee)
fit2 <- gee(y ~ x, id = subject, family = binomial)
summary(fit2)

# random effects model
library(glmmML)
fit3 <- glmmML(y ~ x, family = binomial, cluster = subject)
summary(fit3)
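A note on what to look for in the output: fit1 and fit2 estimate a population-averaged slope, so their point estimates should be similar, but only the GEE fit gives standard errors that allow for the within-subject correlation; fit3 estimates the subject-specific slope, which is typically larger in absolute value when the random-effect standard deviation (here 2) is appreciable.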

Page 94: A Short Course in Longitudinal Data  Analysis

Lecture 4.

• Dropouts

– classification of missing value mechanisms

– modelling the missing value process

– what are we estimating?

• Joint modelling

– what is it?

– why do it?

– random effects models

– transformation models

Page 95: A Short Course in Longitudinal Data  Analysis

[Figure: outcome against time, response profiles grouped by dropout time F (labelled F > 7, 5 < F < 6 and F = 6.4)]

Page 96: A Short Course in Longitudinal Data  Analysis

Missing values and dropouts

Issues concerning missing values in longitudinal data can be addressed at two different levels:

• technical: can the statistical method I am using cope with missing values?

• conceptual: why are the data missing? Does the fact that an observation is missing convey partial information about the value that would have been observed?

These same questions also arise with cross-sectional data, but the correlation inherent to longitudinal data can sometimes be exploited to good effect.

Page 97: A Short Course in Longitudinal Data  Analysis

Rubin’s classification

• MCAR (completely at random): P(missing) depends neither on observed nor unobserved measurements

• MAR (at random): P(missing) depends on observed measurements, but not on unobserved measurements conditional on observed measurements

• MNAR (not at random): conditional on observed measurements, P(missing) depends on unobserved measurements.

Page 98: A Short Course in Longitudinal Data  Analysis

Example: Longitudinal clinical trial

• completely at random: patient leaves the study because they move house

• at random: patient leaves the study on their doctor’s advice, based on observed measurement history

• not at random: patient misses their appointment because they are feeling unwell.

Page 99: A Short Course in Longitudinal Data  Analysis

Intermittent missing values and dropouts

• dropouts: subjects leave study prematurely, and never come back

• intermittent missing values: everything else

Sometimes reasonable to assume intermittent missing values are also missing completely at random

Not so for dropouts

It is always helpful to know why subjects drop out

Page 100: A Short Course in Longitudinal Data  Analysis

Modelling the missing value process

• Y = (Y₁, ..., Yₙ), intended measurements on a single subject

• t = (t₁, ..., tₙ), intended measurement times

• M = (M₁, ..., Mₙ), missingness indicators

• for dropout, M reduces to a single dropout time D, in which case:

– (Y₁, ..., Y_{D−1}) observed

– (Y_D, ..., Yₙ) missing

A model for data subject to missingness is just a specification of the joint distribution

[Y, M]

Page 101: A Short Course in Longitudinal Data  Analysis

Modelling the missing value process: three approaches

• Selection factorisation

[Y, M] = [Y][M |Y]

• Pattern mixture factorisation

[Y, M] = [M][Y |M]

• Random effects

[Y, M] = ∫ [Y |U][M |U][U] dU

Page 102: A Short Course in Longitudinal Data  Analysis

Comparing the three approaches

• Pattern mixture factorisation has a natural data-analytic interpretation (sub-divide data into different dropout-cohorts)

• Selection factorisation may have a more natural mechanistic interpretation in the dropout setting (avoids conditioning on the future)

• Random effects conceptually appealing, especially for noisy measurements, but make stronger assumptions and usually need computationally intensive methods for likelihood inference

Page 103: A Short Course in Longitudinal Data  Analysis

Fitting a model to data with dropouts

• MCAR

1. almost any method will give sensible point estimates of mean response profiles

2. almost any method which takes account of correlation amongst repeated measurements will give sensible point estimates and standard errors

Page 104: A Short Course in Longitudinal Data  Analysis

• MAR

1. likelihood-based inference implicitly assumes MAR

2. for inferences about a hypothetical dropout-free population, there is no need to model the dropout process explicitly

3. but be sure that a hypothetical dropout-free population is the required target for inference

Page 105: A Short Course in Longitudinal Data  Analysis

• MNAR

1. joint modelling of repeated measurements and dropout times is (more or less) essential

2. but inferences are likely to be sensitive to modelling assumptions that are difficult (or impossible) to verify empirically

Page 106: A Short Course in Longitudinal Data  Analysis

Longitudinal data with dropouts: the gory details

New notation for measurements on a single subject:

• Y* = (Y*₁, ..., Y*ₙ) : complete intended sequence

• t = (t₁, ..., tₙ) : times of intended measurements

• Y = (Y₁, ..., Yₙ) : incomplete observed sequence

• H_k = {Y₁, ..., Y_{k−1}} : observed history up to time t_{k−1}

Core assumption:

Y_k = Y*_k : k = 1, 2, ..., D − 1
Y_k = 0 : k ≥ D

No a priori separation into sub-populations of potential dropouts and non-dropouts

Page 107: A Short Course in Longitudinal Data  Analysis

The likelihood function

Two basic ingredients of any model:

1. y* ∼ f*(y; β, α),

2. P(D = d | history) = p_d(H_d, y*_d; φ).

• β parameterises mean response profile for y*

• α parameterises covariance structure of y*

• φ parameterises dropout process.

For inference, need the likelihood for the observed data, y, rather than for the intended data y*

Page 108: A Short Course in Longitudinal Data  Analysis

Let f*_k(y | H_k; β, α) denote conditional pdf of Y*_k given H_k.

Model specifies f*_k(·), we need f_k(·).

1. P(Y_k = 0 | H_k, Y_{k−1} = 0) = 1

because dropouts never re-enter the study.

2. P(Y_k = 0 | H_k, Y_{k−1} ≠ 0) = ∫ p_k(H_k, y; φ) f*_k(y | H_k; β, α) dy

3. For Y_k ≠ 0,

f_k(y | H_k; β, α, φ) = {1 − p_k(H_k, y; φ)} f*_k(y | H_k; β, α).

Page 109: A Short Course in Longitudinal Data  Analysis

Multiply sequence of conditional distributions for Y_k given H_k to define joint distribution of Y, and hence likelihood function

1. for a complete sequence Y = (Y₁, ..., Yₙ):

f(y) = f*(y) ∏_{k=2}^n {1 − p_k(H_k, y_k)}

2. for an incomplete sequence Y = (Y₁, ..., Y_{d−1}, 0, ..., 0):

f(y) = f*_{d−1}(y) ∏_{k=2}^{d−1} {1 − p_k(H_k, y_k)} P(Y_d = 0 | H_d, Y_{d−1} ≠ 0)

where f*_{d−1}(y) denotes joint pdf of (Y*₁, ..., Y*_{d−1}).

Page 110: A Short Course in Longitudinal Data  Analysis

Now consider a set of data with m subjects.

• β and α parameterise measurement process y∗

• φ parameterises dropout process

Hence, log-likelihood can be partitioned into three components:

L(β, α, φ) = L₁(β, α) + L₂(φ) + L₃(β, α, φ)

L₁(β, α) = Σ_{i=1}^m log{f*_{di−1}(y_i)}

L₂(φ) = Σ_{i=1}^m Σ_{k=1}^{di−1} log{1 − p_k(H_ik, y_ik)}

L₃(β, α, φ) = Σ_{i: di ≤ n} log{P(Y_i,di = 0 | H_i,di, Y_i,di−1 ≠ 0)}.

Page 111: A Short Course in Longitudinal Data  Analysis

When is likelihood inference straightforward?

L₃(β, α, φ) = Σ_{i: di ≤ n} log{P(Y_i,di = 0 | H_i,di, Y_i,di−1 ≠ 0)}.

If L₃(·) only depends on φ, inference is straightforward, because we can then:

• absorb L₃(·) into L₂(·)

• maximise L₁(β, α) and L₂(φ) separately

Page 112: A Short Course in Longitudinal Data  Analysis

L₃(β, α, φ) = Σ_{i: di ≤ n} log{P(Y_i,di = 0 | H_i,di, Y_i,di−1 ≠ 0)}.

• P(Y_k = 0 | H_k, Y_{k−1} ≠ 0) = ∫ p_k(H_k, y; φ) f*_k(y | H_k; β, α) dy

• MAR implies p_k(H_k, y; φ) = p_k(H_k; φ) does not depend on y

• It follows that

P(Y_k = 0 | H_k, Y_{k−1} ≠ 0) = p_k(H_k; φ) ∫ f*_k(y | H_k; β, α) dy = p_k(H_k; φ),

since conditional pdf must integrate to one.

Page 113: A Short Course in Longitudinal Data  Analysis

Key result:

• If dropouts are MAR, then L₃(β, α, φ) = L₃(φ) and parameter estimates for the model can be obtained by separate maximisation of:

– L₁(β, α)

– L*₂(φ) ≡ L₂(φ) + L₃(φ)

Page 114: A Short Course in Longitudinal Data  Analysis

Is MAR ignorable?

Conventional wisdom: if dropout is MAR and we only want estimates of β and α we can ignore the dropout process

Two caveats:

• If MAR holds, but measurement and dropout models have parameters in common, ignoring dropouts is potentially inefficient

• More importantly, parameters of the measurement model may not be the most appropriate target for inference

Page 115: A Short Course in Longitudinal Data  Analysis

Example: simulated MAR data

• Y*-process: mean response µ(t) = 1, constant correlation ρ between any two measurements on same subject.

• dropout sub-model: logit(p_ij) = α + βy_i,j−1

• simulated realisation for ρ = 0.9, α = −1 and β = −2

[Figure: simulated realisation, y against time]
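The simulation is straightforward to reproduce in outline; the sketch below generates exchangeably correlated responses and applies the dropout sub-model (number of subjects chosen arbitrarily):

set.seed(1)
m <- 500; n <- 10; rho <- 0.9; alpha <- -1; beta <- -2
y <- matrix(NA, m, n)
for (i in 1:m) {
  yi <- 1 + sqrt(rho) * rnorm(1) + sqrt(1 - rho) * rnorm(n)   # Corr(y_j, y_k) = rho
  for (j in 2:n) {
    if (runif(1) < plogis(alpha + beta * yi[j - 1])) {        # dropout probability
      yi[j:n] <- NA; break
    }
  }
  y[i, ] <- yi
}
round(colMeans(y, na.rm = TRUE), 2)   # observed means rise as low responders drop out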

Page 116: A Short Course in Longitudinal Data  Analysis

In the simulation:

• empirical means show a steadily rising trend

• likelihood analysis ignoring dropout concludes that mean response is constant over time.

Explanation:

• empirical means are estimating conditional expectation,

E(Y ∗(t)|dropout time > t)

• likelihood analysis is estimating unconditional expectation

E[Y ∗(t)]

Which, if either, of these do you want to estimate?

Page 117: A Short Course in Longitudinal Data  Analysis

Under random dropout, conditional and unconditional means are different because the data are correlated.

Diagram below shows simulation with ρ = 0, i.e. no correlation, but α = −1 and β = −2 as before.

[Figure: simulated realisation with ρ = 0, y against time]

Empirical means now tell same story as likelihood analysis, namely that mean response is constant over time.

Page 118: A Short Course in Longitudinal Data  Analysis

[Figure: the two simulated data sets shown side by side (ρ = 0.9 and ρ = 0)]

Page 119: A Short Course in Longitudinal Data  Analysis

PJD’s take on ignorability

For correlated data, dropout mechanism can be ignored only if dropouts are completely random

In all other cases, need to:

• think carefully what are the relevant practical questions,

• fit an appropriate model for both measurement process and dropout process

• use the model to answer the relevant questions.

Page 120: A Short Course in Longitudinal Data  Analysis

Joint modelling: what is it?

• Subjects i = 1, ...,m.

• Longitudinal measurements Yij at times tij, j = 1, ..., ni.

• Times-to-event Fi (possibly censored).

• Baseline covariates xi.

• Parameters θ.

[Y, F |x, θ]

Page 121: A Short Course in Longitudinal Data  Analysis

Prothrombin index data

• Placebo-controlled RCT of prednisone for liver cirrhosis patients.

Total m = 488 subjects.

• F = time of death

Y = time-sequence of prothrombin index measurements (months ≈ 0, 3, 6, 12, 24, 36, ..., 96)

• ≈ 30% survival to 96 months

Andersen, Borgan, Gill and Keiding, 1993

Page 122: A Short Course in Longitudinal Data  Analysis

[Figure: smoothed mean prothrombin index against time (years); treatment (n = 251) vs control (n = 237)]

Page 123: A Short Course in Longitudinal Data  Analysis

[Figure: Kaplan–Meier survival curves; treatment (n = 251) vs control (n = 237)]

Page 124: A Short Course in Longitudinal Data  Analysis

Schizophrenia trial data

• Data from placebo-controlled RCT of drug treatments for schizophrenia:

– Placebo; Haloperidol (standard); Risperidone (novel)

• Y = sequence of weekly PANSS measurements

• F = dropout time

• Total m = 516 subjects, but high dropout rates:

week         −1     0      1      2      4      6      8
missing      0      3      9      70     122    205    251
proportion   0.00   0.01   0.02   0.14   0.24   0.40   0.49

• Dropout rate also treatment-dependent (P > H > R)

Page 125: A Short Course in Longitudinal Data  Analysis

Schizophrenia data: PANSS responses from haloperidol arm

[Figure: individual PANSS response profiles against time (weeks)]

Page 126: A Short Course in Longitudinal Data  Analysis

Heart surgery data

• Data from RCT to compare efficacy of two types of artificial heart-valves

– homograft; stentless

• m = 289 subjects

• Y = time-sequence of left-ventricular-mass-index (LVMI)

• F = time of death

• two other repeated measures of heart-function also available (ejection fraction, gradient)

Lim et al, 2008

Page 127: A Short Course in Longitudinal Data  Analysis

Heart surgery data: mean log-LVMI response profiles

[Figure: mean log-LVMI against time (years); stentless vs homograft]

Page 128: A Short Course in Longitudinal Data  Analysis

Heart surgery data: survival curves adjusted for baseline covariates

[Figure: observed and fitted survival curves against time, Cox proportional hazards model with baseline covariates]

Page 129: A Short Course in Longitudinal Data  Analysis

Joint modelling: why do it?

To analyse failure time F, whilst exploiting correlation with an imperfectly measured, time-varying risk-factor Y

Example: prothrombin index data

• interest is in time to progression/death

• but slow progression of disease implies heavy censoring

• hence, joint modelling improves inferences about marginal distribution [F]

Page 130: A Short Course in Longitudinal Data  Analysis

Joint modelling: why do it?

To analyse a longitudinal outcome measure Y with potentially informative dropout at time F

Example: Schizophrenia data

• interest is in reducing mean PANSS score

• but informative dropout process would imply that modelling only [Y] may be misleading

Page 131: A Short Course in Longitudinal Data  Analysis

Joint modelling: why do it?

Because relationship between Y and F is of direct interest

Example: heart surgery data

• long-term build-up of left-ventricular muscle mass may increase hazard for fatal heart-attack

• hence, interested in modelling relationship between survival and subject-level LVMI

• also interested in inter-relationships amongst LVMI, ejection fraction, gradient and survival time

Page 132: A Short Course in Longitudinal Data  Analysis

Random effects models

• linear Gaussian sub-model for repeated measurements

• proportional hazards sub-model with time-dependent frailty for time-to-event

• sub-models linked through shared random effects

[Diagram: shared random effects structure linking the measurement outcome Y and the time-to-event outcome F through latent processes R₁ and R₂, with parameters θ, β and α]

Page 133: A Short Course in Longitudinal Data  Analysis

Example: Henderson, Diggle and Dobson, 2000

Ingredients of model are:

• a latent stochastic process; a measurement sub-model; a hazard sub-model

Latent stochastic process

Bivariate Gaussian process R(t) = {R₁(t), R₂(t)}

• R_k(t) = D_k(t)U_k + W_k(t)

• {W₁(t), W₂(t)}: bivariate stationary Gaussian process

• (U₁, U₂): multivariate Gaussian random effects

Bivariate process R(t) realised independently between subjects

Page 134: A Short Course in Longitudinal Data  Analysis

Measurement sub-model

Y_ij = µ_i(t_ij) + R_1i(t_ij) + Z_ij

• Z_ij ∼ N(0, τ²)

• µ_i(t_ij) = X_1i(t_ij)β₁

Hazard sub-model

h_i(t) = h₀(t) exp{X_2i(t)β₂ + R_2i(t)}

• h₀(t) = non-parametric baseline hazard

• η_2i(t) = X_2i(t)β₂ + R_2i(t) = linear predictor for hazard

Page 135: A Short Course in Longitudinal Data  Analysis

Schizophrenia trial data: mean response by dropout cohort

[Figure: mean PANSS response against time (weeks), plotted separately for each dropout cohort]

Page 136: A Short Course in Longitudinal Data  Analysis

Model formulation

Measurement sub-model

For subject in treatment group k,

µ_i(t) = β_0k + β_1k t + β_2k t²

Y_ij = µ_i(t_ij) + R_1i(t_ij) + Z_ij

Hazard sub-model

For subject in treatment group k,

h_i(t) = h₀(t) exp{α_k + R_2i(t)}

Page 137: A Short Course in Longitudinal Data  Analysis

Latent process

Illustrative choices for measurement process component:

R₁(t) = U₁ + W₁(t)

R₁(t) = U₁ + U₂t

And for hazard process component:

R₂(t) = γ₁R₁(t)

R₂(t) = γ₁(U₁ + U₂t) + γ₂U₂ = γ₁R₁(t) + γ₂U₂

Page 138: A Short Course in Longitudinal Data  Analysis

Schizophrenia trial data: mean response (random effects model)

[Figure: fitted mean response against time (weeks): placebo, haloperidol, risperidone]

Page 139: A Short Course in Longitudinal Data  Analysis

Schizophrenia trial data: empirical and fitted variograms

[Figure: empirical and fitted variograms against lag]

Page 140: A Short Course in Longitudinal Data  Analysis

A simple transformation model

(Y, log F) ∼ MVN(µ, Σ)

• write S = log F

• µ = (µ_Y, µ_S)

• Σ =
  [ V(θ)    g(φ) ]
  [ g′(φ)   ν²   ]

• subjects provide independent replicates of (Y, S)

Cox, 1999

Page 141: A Short Course in Longitudinal Data  Analysis

Comparing approaches

Random effects models

• intuitively appealing

• flexible

• more-or-less essential for subject-level prediction

But

• likelihood-based inference computationally intensive

• robustness to non-Normality suspect

Page 142: A Short Course in Longitudinal Data  Analysis

Transformation model

• very simple to use

• transparent diagnostic checks

But

• purely empirical

• requires more-or-less balanced data

Page 143: A Short Course in Longitudinal Data  Analysis

More on the transformation model

• the likelihood function

• missing values and censoring

• modelling the covariance structure

• diagnostics

Page 144: A Short Course in Longitudinal Data  Analysis

The likelihood function

• Write S = log F, hence [Y, S] = MVN(µ, Σ)

• Use factorisation [Y, S] = [Y][S|Y]

• µ = (µ_Y, µ_S)

• Standard result for [S|Y]:

– S | Y ∼ N(µ_S|Y, σ²_S|Y)

– µ_S|Y = µ_S + g′(φ)V(θ)⁻¹(Y − µ_Y)

– σ²_S|Y = ν² − g′(φ)V(θ)⁻¹g(φ)
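These formulas translate directly into R; in the sketch below V, g, nu2, muY, muS and y are hypothetical objects holding V(θ), g(φ), ν², µ_Y, µ_S and a subject’s measurement vector:

Vi <- solve(V)
mu.S.given.Y <- muS + drop(t(g) %*% Vi %*% (y - muY))   # conditional mean of S = log F
sig2.S.given.Y <- nu2 - drop(t(g) %*% Vi %*% g)         # conditional variance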

Page 145: A Short Course in Longitudinal Data  Analysis

Missing values and censoring

• uncensored S_i:

[Y_i] × [S_i | Y_i]

• right-censored S_i > t_ij:

[Y_i] × [1 − Φ{(t_ij − µ_S|Yi)/σ_S|Y}]

• interval-censored t_ij < S_i < t_i,j+1:

[Y_i] × [Φ{(t_i,j+1 − µ_S|Yi)/σ_S|Y} − Φ{(t_ij − µ_S|Yi)/σ_S|Y}]

• missing Y_ij

– reduce dimensionality of Y_i accordingly

– OK for Y_ij intermittently missing and/or Y_ij missing because S_i < log t_ij

Page 146: A Short Course in Longitudinal Data  Analysis

Modelling the covariance structure

• Notation for covariance structure:

– Var(Y) = V(θ)

– Var(S) = ν²

– g(φ) = Cov(Y, S)

• Standard choices for V(θ) include:

– Random intercept and slope (Laird and Ware, 1982)

Y_ij − µ_ij = A_i + B_i t_ij + Z_ij : j = 1, ..., n_i; i = 1, ..., m

– Three components of variation (Diggle, 1988)

Y_ij − µ_ij = A_i + W_i(t_ij) + Z_ij

– Compound symmetry

Y_ij − µ_ij = A_i + Z_ij

Page 147: A Short Course in Longitudinal Data  Analysis

• Models for g(φ)?

– uniform correlation

– saturated

– intermediate?

Choice for V (θ) implies constraints on g(φ)

Page 148: A Short Course in Longitudinal Data  Analysis

Diagnostics

Assume balanced data, i.e. t_ij = t_j

• Fit to [Y]:

– consider all ‘survivors’ at each follow-up time t_j

– classify according to whether they do or do not survive to time t_{j+1}

– check goodness-of-fit to distributions implied by the model

• Fit to [S|Y]:

– Gaussian P-P and Q-Q plots with multiple imputation of censored log S

– Check that deviation from linearity is comparable with simulated N(0, 1) samples.

Page 149: A Short Course in Longitudinal Data  Analysis

Re-analysis of schizophrenia trial data: dropout is not completely at random

[Figure: empirical cumulative distributions of PANSS at t = 0, 1, 2, 4 and 6, comparing subjects about to drop out with subjects not about to drop out]

Page 150: A Short Course in Longitudinal Data  Analysis

Re-analysis of schizophrenia trial data: model specification

• measurements, Y: random intercept and slope

Y_ij − µ_ij = A_i + B_i t_ij + Z_ij : j = 1, ..., n_i; i = 1, ..., m

• dropout time, F

S = log F ∼ N(µ_S, ν²)

• cross-covariances

Cov(Y_j, S) = φ_j : j = 1, ..., 6

Page 151: A Short Course in Longitudinal Data  Analysis

Re-analysis of schizophrenia trial data: goodness-of-fit, mean response profiles

[Figure: for each arm (haloperidol, placebo, risperidone), observed means compared with fitted E[Y_j | log F > log t_j] and E[Y_j]]

Page 152: A Short Course in Longitudinal Data  Analysis

Re-analysis of schizophrenia trial data: fitted mean response profiles

[Figure: for each arm (haloperidol, placebo, risperidone), observed means compared with E[Y_j] ignoring drop-out and E[Y_j] under the joint model]

Page 153: A Short Course in Longitudinal Data  Analysis

Closing remarks

• the role of modelling

“We buy information with assumptions”

Coombs (1964)

• choice of model/method should relate to scientificpurpose.

“Analyse problems, not data”

PJD

• simple models/methods are useful when exploring a range of modelling options, for example to select from many potential covariates.

Page 154: A Short Course in Longitudinal Data  Analysis

• complex models/methods are useful when seeking to understand subject-level stochastic variation.

• likelihood-based inference is usually a good idea

• different models may fit a data-set almost equally well

• joineR library under development

• longitudinal analysis is challenging, but rewarding

“La peinture de l’huile, c’est très difficile,
mais c’est beaucoup plus beau que la peinture de l’eau”

(“Oil painting is very difficult, but it is much more beautiful than watercolour”)

Winston Churchill

Page 155: A Short Course in Longitudinal Data  Analysis

Reading list

Books

The course is based on selected chapters from Diggle, Heagerty, Liang and Zeger (2002). Fitzmaurice, Laird and Ware (2004) covers similar ground. Fitzmaurice, Davidian, Verbeke and Molenberghs (2009) is an extensive edited compilation. Verbeke and Molenberghs (2000) and Molenberghs and Verbeke (2005) are companion volumes that together cover linear models for continuous data and a range of models for discrete data. Andersen, Borgan, Gill and Keiding (1993) is a detailed account of modern methods of survival analysis and related topics. Daniels and Hogan (2008) covers missing value methods in more detail than do the more general texts.

Andersen, P.K., Borgan, Ø., Gill, R.D. and Keiding, N. (1993). Statistical Models Based on Counting Processes. New York: Springer-Verlag.
Daniels, M.J. and Hogan, J.W. (2008). Missing Data in Longitudinal Studies. Chapman and Hall/CRC Press.
Diggle, P.J., Heagerty, P., Liang, K.Y. and Zeger, S.L. (2002). Analysis of Longitudinal Data (second edition). Oxford: Oxford University Press.
Fitzmaurice, G.M., Davidian, M., Verbeke, G. and Molenberghs, G. (2009). Longitudinal Data Analysis. Boca Raton: Chapman and Hall/CRC.
Fitzmaurice, G.M., Laird, N.M. and Ware, J.H. (2004). Applied Longitudinal Analysis. New Jersey: Wiley.
Molenberghs, G. and Verbeke, G. (2005). Models for Discrete Longitudinal Data. New York: Springer.
Verbeke, G. and Molenberghs, G. (2000). Linear Mixed Models for Longitudinal Data. New York: Springer.

Page 156: A Short Course in Longitudinal Data  Analysis

Journal articles

Berzuini, C. and Larizza, C. (1996). A unified approach for modelling longitudinal and failure time data, with application in medical monitoring. IEEE Transactions on Pattern Analysis and Machine Intelligence, 18, 109–123.
Brown, E.R. and Ibrahim, J.G. (2003). A Bayesian semiparametric joint hierarchical model for longitudinal and survival data. Biometrics, 59, 221–228.
Cox, D.R. (1972). Regression models and life tables (with Discussion). Journal of the Royal Statistical Society B, 34, 187–220.
Cox, D.R. (1999). Some remarks on failure-times, surrogate markers, degradation, wear and the quality of life. Lifetime Data Analysis, 5, 307–314.
Diggle, P.J., Farewell, D. and Henderson, R. (2007). Longitudinal data with dropout: objectives, assumptions and a proposal (with Discussion). Applied Statistics, 56, 499–550.
Diggle, P.J., Heagerty, P., Liang, K-Y and Zeger, S.L. (2002). Analysis of Longitudinal Data (second edition). Oxford: Oxford University Press.
Dobson, A. and Henderson, R. (2003). Diagnostics for joint longitudinal and dropout time modelling. Biometrics, 59, 741–751.
Faucett, C.L. and Thomas, D.C. (1996). Simultaneously modelling censored survival data and repeatedly measured covariates: a Gibbs sampling approach. Statistics in Medicine, 15, 1663–1686.
Finkelstein, D.M. and Schoenfeld, D.A. (1999). Combining mortality and longitudinal measures in clinical trials. Statistics in Medicine, 18, 1341–1354.
Henderson, R., Diggle, P. and Dobson, A. (2000). Joint modelling of longitudinal measurements and event time data. Biostatistics, 1, 465–480.
Henderson, R., Diggle, P. and Dobson, A. (2002). Identification and efficacy of longitudinal markers for survival. Biostatistics, 3, 33–50.
Hogan, J.W. and Laird, N.M. (1997a). Mixture models for the joint distribution of repeated measures and event times. Statistics in Medicine, 16, 239–257.
Hogan, J.W. and Laird, N.M. (1997b). Model-based approaches to analysing incomplete longitudinal and failure time data. Statistics in Medicine, 16, 259–272.

Page 157: A Short Course in Longitudinal Data  Analysis

Lim, E., Ali, A., Theodorou, P., Sousa, I., Ashrafian, H., Chamageorgakis, A.D., Henein, M., Diggle, P. and Pepper, J. (2008). A longitudinal study of the profile and predictors of left ventricular mass regression after stentless aortic valve replacement. Annals of Thoracic Surgery, 85, 2026–2029.
Little, R.J.A. (1995). Modelling the drop-out mechanism in repeated-measures studies. Journal of the American Statistical Association, 90, 1112–1121.
Pawitan, Y. and Self, S. (1993). Modelling disease marker processes in AIDS. Journal of the American Statistical Association, 88, 719–726.
Robins, J.M., Rotnitzky, A. and Zhao, L.P. (1995). Analysis of semiparametric regression models for repeated outcomes in the presence of missing data. Journal of the American Statistical Association, 90, 106–121.
Rotnitzky, A., Robins, J.M. and Scharfstein, D.O. (1998). Semiparametric regression for repeated outcomes with nonignorable nonresponse. Journal of the American Statistical Association, 93, 1321–1339.
Scharfstein, D.O., Rotnitzky, A. and Robins, J.M. (1999). Adjusting for nonignorable drop-out using semiparametric nonresponse models. Journal of the American Statistical Association, 94, 1096–1146.
Taylor, J.M.G., Cumberland, W.G. and Sy, J.P. (1994). A stochastic model for analysis of longitudinal AIDS data. Journal of the American Statistical Association, 89, 727–736.
Tsiatis, A.A. and Davidian, M.A. (2004). Joint modelling of longitudinal and time-to-event data: an overview. Statistica Sinica, 14, 809–834.
Tsiatis, A.A., Degruttola, V. and Wulfsohn, M.S. (1995). Modelling the relationship of survival to longitudinal data measured with error. Applications to survival and CD4 counts in patients with AIDS. Journal of the American Statistical Association, 90, 27–37.
Wulfsohn, M.S. and Tsiatis, A.A. (1997). A joint model for survival and longitudinal data measured with error. Biometrics, 53, 330–339.
Xu, J. and Zeger, S.L. (2001). Joint analysis of longitudinal data comprising repeated measures and times to events. Applied Statistics, 50, 375–387.