Post on 04-Jan-2016

Lecture 15: Time Varying Covariates

Time-varying covariates

Time-Dependent Covariates

• Thus far we’ve only considered “fixed” time covariates

• Examples of time varying covariates
  – Cumulative exposure
  – Smoking status
  – Blood pressure

• Now, the data structure is [T, δ, Z(t); 0 < t < T]

CPHM with Time Varying Covariates

• The model looks like what we’ve been working with.

• Now however, Z is a function of t:

$$h(t \mid Z(t)) = h_0(t)\,\exp\left(\sum_{k=1}^{p} \beta_k Z_k(t)\right)$$
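As a small worked case (an illustration, not taken from the slides), suppose p = 1 and Z(t) is a binary indicator that switches from 0 to 1 at a change time t_a, for example the onset of AGvHD. Under that assumption the model reduces to two hazards, and exp(β) compares subjects at the same time t who have and have not yet switched:

```latex
h(t \mid Z(t)) =
\begin{cases}
h_0(t), & t \le t_a \quad (Z(t) = 0),\\[4pt]
h_0(t)\, e^{\beta}, & t > t_a \quad (Z(t) = 1),
\end{cases}
\qquad
\frac{h(t \mid Z(t) = 1)}{h(t \mid Z(t) = 0)} = e^{\beta}.
```

The symbol t_a is introduced here only for the illustration.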

Likelihood Time Varying Covariates

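• Again, we can use the partial likelihood approach to estimate β

• But Z is now a function of t (as in the model statement):

$$L(\beta) = \prod_{i=1}^{D} \frac{\exp\left(\sum_{k=1}^{p} \beta_k Z_{(i)k}(t_i)\right)}{\sum_{j \in R(t_i)} \exp\left(\sum_{k=1}^{p} \beta_k Z_{jk}(t_i)\right)}$$

  – t_1 < … < t_D are the distinct event times
  – Z_{(i)k}(t_i) is the k-th covariate, at time t_i, of the subject who fails at t_i
  – R(t_i) is the set of subjects at risk at t_i

• Otherwise, testing and estimation are the same as for fixed covariates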
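To make the risk-set bookkeeping concrete, here is a minimal base-R sketch of the log partial likelihood for one time-varying covariate. The toy data, the column names (start, stop, event, z) and the function pl_loglik are hypothetical, and the sketch assumes no tied event times.

```r
# Hypothetical toy data in (start, stop] form; z is a time-varying 0/1 covariate.
d <- data.frame(
  id    = c(1, 1, 2, 3),
  start = c(0, 3, 0, 0),
  stop  = c(3, 7, 5, 9),
  event = c(0, 1, 1, 0),
  z     = c(0, 1, 0, 1)
)

# Log partial likelihood for a single coefficient beta (no tied event times).
pl_loglik <- function(beta, d) {
  event_times <- d$stop[d$event == 1]
  contrib <- sapply(event_times, function(t) {
    at_risk <- d$start < t & t <= d$stop              # rows whose interval covers t
    failing <- at_risk & d$stop == t & d$event == 1   # the row that fails at t
    # covariate values are the ones in force at time t
    beta * d$z[failing] - log(sum(exp(beta * d$z[at_risk])))
  })
  sum(contrib)
}

pl_loglik(0, d)
```

With beta = 0, each event contributes minus the log of its risk-set size. Here the risk sets have 3 and 2 rows, so the value is -log(6).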

Example: Bone Marrow Transplant

• Main covariate of interest is disease type:
  – ALL
  – low risk AML
  – high risk AML

• Interest is in determining factors associated with disease-free survival (death or relapse)

BMT Fixed Time Covariates

• There are several fixed time covariates we’ve found to be important
  – Patient Age
  – Donor Age
  – FAB identification
  – Disease type
  – Hospital

BMT Time Varying Covariates

• There are also several time varying covariates
  – Acute graft vs. host disease (AGvHD)
  – Chronic graft vs. host disease (CGvHD)
  – Platelet recovery (PR)

• These all occur after BMT or not at all
• They can also vary over the course of the study

R: Time-Varying Covariates

• Expand data to describe all scenarios
• Need to consider the possible combinations of events
• Example: AGVHD and DFS
  – Possible scenarios at any point in time during the study for subject 1
    • No AGVHD: DFS?
    • AGVHD: DFS?
  – For all patients with TTAGVHD < DFS (TTAGVHD = time to AGVHD), need two rows in the dataset to describe the variation
  – For all patients with TTAGVHD > DFS, need only one row in the dataset

Timeline Examples: Observed Event

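[Figure: two timelines, each starting at t0]

Subject with AGVHD (timeline t0, ta, te):
t0 to ta: no AGVHD until ta, no event
ta to te: AGVHD, event

Subject without AGVHD (timeline t0, te):
t0 to te: no AGVHD, event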

Timeline Examples: Censored Event

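[Figure: two timelines, each starting at t0]

Subject with AGVHD (timeline t0, ta, tc):
t0 to ta: no AGVHD, no event
ta to tc: AGVHD, no event (censored)

Subject without AGVHD (timeline t0, tc):
t0 to tc: no AGVHD, no event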

Time-Varying Covariates

• First, look at each time varying covariate
• Which (if any) are associated with DFS, adjusting for diagnosis
• Estimation and inference are the same as with fixed time covariates
• Difference
  – Data structure

Data Set-up

> data[1:15,c(1,25,4:8)]
 ID Disease  DFS Death Relapse Either TAGvH AGvH
  1       1 2081     0       0      0    67    1
  2       1 1602     0       0      0  1602    0
  3       1 1496     0       0      0  1496    0
  4       1 1462     0       0      0    70    1
  5       1 1433     0       0      0  1433    0
  6       1 1377     0       0      0  1377    0
  7       1 1330     0       0      0  1330    0
  8       1  996     0       0      0    72    1
  9       1  226     0       0      0   226    0
 10       1 1199     0       0      0  1199    0
 11       1 1111     0       0      0  1111    0
 12       1  530     0       0      0    38    1
 13       1 1182     0       0      0  1182    0
 14       1 1167     0       0      0    39    1
 15       1  418     1       0      1   418    0

Expansion

• Consider row 1
  – Now, two rows
  – Row 1: start time = 0, stop time = 67, agvhd = 0, …
  – Row 2: start time = 67, stop time = 2081, agvhd = 1, …
• Consider row 2
  – Still 1 row
  – Row 1: start time = 0, stop time = 1602, agvhd = 0, …
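The two cases above can be sketched in base R for a single time-dependent covariate. This is a simplified illustration, not the lecture's expansion code: the helper expand_one is hypothetical, and the toy data frame copies only rows 1 and 2 of the data set-up table.

```r
# Rows 1 and 2 of the data set-up table (only the columns needed here).
toy <- data.frame(ID = c(1, 2), DFS = c(2081, 1602), Either = c(0, 0),
                  TAGvH = c(67, 1602), AGvH = c(1, 0))

# Hypothetical helper: split follow-up at the AGvHD time when it precedes DFS.
expand_one <- function(d) {
  switched <- d$AGvH == 1 & d$TAGvH < d$DFS
  # Interval before the change (or the whole follow-up if there is no change)
  first <- data.frame(ID = d$ID, Tstart = 0,
                      Tstop = ifelse(switched, d$TAGvH, d$DFS),
                      AGvHt = 0,
                      event = ifelse(switched, 0, d$Either))
  # Interval after the change, only for subjects whose covariate switched
  s <- d[switched, ]
  second <- data.frame(ID = s$ID, Tstart = s$TAGvH, Tstop = s$DFS,
                       AGvHt = rep(1, nrow(s)), event = s$Either)
  out <- rbind(first, second)
  out[order(out$ID, out$Tstart), ]
}

long <- expand_one(toy)
print(long)
```

Subject 1 ends up with two rows, (0, 67] with AGvHt = 0 and (67, 2081] with AGvHt = 1, and subject 2 keeps a single row. The event indicator is attached only to each subject's final interval.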

What About Dependence?

• You might be asking whether we need to worry about correlated data?

• In this case we do not need to worry about it.

• There are two exceptions:
  – When subjects have multiple events
  – When a subject appears in overlapping intervals

• The 2nd case is almost always a data error

• A subject can be at risk in multiple strata at the same time
  – Corresponds to being simultaneously at risk for two distinct outcomes.
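Since overlapping intervals usually signal a data error, a quick check on the expanded data can catch them before fitting. The sketch below is one possible check in base R. The function has_overlap and both toy data frames are hypothetical, and it assumes columns named ID, Tstart and Tstop.

```r
# Hypothetical check: does any subject have an interval that starts
# before that subject's previous interval has ended?
has_overlap <- function(d) {
  d <- d[order(d$ID, d$Tstart), ]
  same_id <- d$ID[-1] == d$ID[-nrow(d)]               # consecutive rows, same subject
  any(same_id & d$Tstart[-1] < d$Tstop[-nrow(d)])
}

ok  <- data.frame(ID = c(1, 1, 2), Tstart = c(0, 67, 0), Tstop = c(67, 2081, 1602))
bad <- data.frame(ID = c(1, 1),    Tstart = c(0, 50),    Tstop = c(67, 2081))

has_overlap(ok)
has_overlap(bad)
```

In this toy case the first call finds no overlap. The second flags subject 1, whose second interval starts at 50, before the first one ends at 67.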

R Expansion

n<-nrow(bmt)
adata<-bmt[, c(1:2,14:23)] #fixed time columns
for (i in 1:n){
  #times and indicators for AGvHD, CGvHD, platelet recovery, and DFS
  times1<-c(bmt$TAGvH[i], bmt$TCGvH[i], bmt$TRP[i], bmt$DFS[i])
  events<-c(bmt$AGvH[i], bmt$CGvH[i], bmt$RP[i], bmt$Either[i])
  #keep only the times that occur no later than DFS
  times2<-times1[which(times1<=times1[4])]
  utimes<-sort(unique(times2))
  for (j in 1:length(utimes)) {
    #vec holds the covariate and event values for interval j
    if (length(utimes)==1) {vec<-events}
    if (length(utimes)>1 & j==1) {vec<-c(0,0,0,0)}
    if (j>1 & j<length(utimes)){
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, loc, events[loc]) }
    if (j>1 & j==length(utimes)) {
      loc<-which(times1==utimes[j-1])
      vec<-replace(vec, c(loc,4), events[c(loc,4)])}
    #append the row (start, stop, fixed covariates, vec)
    if (j==1 & i==1) {bmt.long<-unlist(c(0, utimes[j], adata[i,], vec))}
    if (j==1 & i>1) {bmt.long<-rbind(bmt.long, c(0, utimes[j], adata[i,],vec))}
    if (j>1) {bmt.long<-rbind(bmt.long, c(utimes[j-1], utimes[j], adata[i,],vec))}
  }
}
bmt.long<-as.data.frame(matrix(as.vector(unlist(bmt.long)), nrow=342, ncol=18, byrow=F))
colnames(bmt.long)<-c("Tstart","Tstop",colnames(adata),"AGvH","CGvH","PR","event")
sum(bmt.long$event)

Expanded Data

> bmt[1:2,]
 ID Disease  TTD  TTR Death Relapse Either TAGvH AGvH TCGvH CGvH TRP RP PtAge
  1       1 2081 2081     0       0      0    67    1   121    1  13  1    26
  2       1 1602 1602     0       0      0  1602    0   139    1  18  1    21
….

> bmt.long[1:8,]
 Tstart Tstop ID Disease PtAge AGvH CGvH PR event
      0    13  1       1    26    0    0  0     0
     13    67  1       1    26    0    0  1     0
     67   121  1       1    26    1    0  1     0
    121  2081  1       1    26    1    0  1     0
      0    18  2       1    21    0    0  0     0
     18   139  2       1    21    0    0  1     0
    139  1602  2       1    21    0    1  1     0
      0    12  3       1    26    0    0  0     0
….

Alternatively Use: expand.breakpoints

• The previous approach creates rows in the dataset for each change in a time-dependent covariate

• expand.breakpoints was written by John Maindonald
• It expands the dataset into multiple rows per person, using either the observed times or a pre-specified set of times

expand.breakpoints Approach

> bps<-sort(unique(c(bmt$DFS, bmt$TAGvH, bmt$TCGvH, bmt$TRP)))
> bps
  [1] 1 2 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 …
[215] 1850 1857 1870 2024 2081 2133 2140 2204 2218 2246 2252 2409 2430 2506 2569 2640

> bmt.long2<-expand.breakpoints(bmt, index="id", status="Either", tevent="DFS", breakpoints=bps)
> bmt.long2
 ID Tstart Tstop Either epoch Disease  TTD  TTR Death Relapse TAGvH AGvH TCGvH CGvH TRP RP
  1      0     1      0     1       1 2081 2081     0       0    67    1   121    1  13  1
  1      1     2      0     2       1 2081 2081     0       0    67    1   121    1  13  1
  1      2     7      0     3       1 2081 2081     0       0    67    1   121    1  13  1
  1      7     8      0     4       1 2081 2081     0       0    67    1   121    1  13  1
…
  1   1870  2024      0   218       1 2081 2081     0       0    67    1   121    1  13  1
  1   2024  2081      0   219       1 2081 2081     0       0    67    1   121    1  13  1
  2      0     1      0     1       1 1602 1602     0       0  1602    0   139    1  18  1

Still Not Done

• That provides us with separate intervals per patient for all intervals of interest

• BUT, treats AGvHD, CGvHD, and PR as “fixed” time covariates

• We need to create time-dependent versions

R

#create time-dependent covariates
#(computed on bmt.long2, the expand.breakpoints output, which carries TAGvH, TCGvH, TRP and RP)
> bmt.long2$AGvHt<-ifelse(bmt.long2$TAGvH<=bmt.long2$Tstart & bmt.long2$AGvH==1, 1, 0)
> bmt.long2$CGvHt<-ifelse(bmt.long2$TCGvH<=bmt.long2$Tstart & bmt.long2$CGvH==1, 1, 0)
> bmt.long2$PRt<-ifelse(bmt.long2$TRP<=bmt.long2$Tstart & bmt.long2$RP==1, 1, 0)

#Look again at pts 1 and 2 to see time dependent variables
> bmt.long2$AGvH[which(bmt.long2$ID==1)]
  [1] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 …
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1

> bmt.long2$AGvHt[which(bmt.long2$ID==1)]
  [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 …
[175] 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
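The indicator rule used above (covariate is 1 on an interval only if the change time is at or before the interval's start) can be checked on a handful of rows. The sketch below is illustrative: the data frame p1 is a hypothetical cut-down version of patient 1, using the four intervals shown for that patient in the hand-expanded data.

```r
# Patient 1: AGvHD at day 67, follow-up split at days 13, 67 and 121.
p1 <- data.frame(ID = 1,
                 Tstart = c(0, 13, 67, 121),
                 Tstop  = c(13, 67, 121, 2081),
                 TAGvH  = 67,
                 AGvH   = 1)

# Time-dependent version: 0 before the AGvHD time, 1 from the interval that starts at it.
p1$AGvHt <- ifelse(p1$TAGvH <= p1$Tstart & p1$AGvH == 1, 1, 0)
print(p1)
```

The fixed column AGvH is 1 on every row, while AGvHt is 0 on the first two intervals and 1 on the last two.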

Syntax in R

• To define the time to event variable, there are two options:
  – Surv(time, y)
  – Surv(start.time, stop.time, y)

• For time varying covariates (or left-truncated data), usually simpler to use the latter convention

• In most other cases, simpler to use the former
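A short sketch of the two conventions, using Surv() from the survival package. The numbers are hypothetical toy values, not taken from the BMT data.

```r
library(survival)  # provides Surv()

# Option 1: one row per subject, follow-up time and event indicator.
s_right <- Surv(time = c(5, 9, 12), event = c(1, 0, 1))

# Option 2: (start, stop] intervals, the form used for time-varying covariates.
s_count <- Surv(time = c(0, 3, 0), time2 = c(3, 7, 5), event = c(0, 1, 1))

print(s_right)
print(s_count)

# The object records which convention was used.
attr(s_right, "type")
attr(s_count, "type")
```

The first object is stored as right-censored data and the second as counting-process data. Either can be used as the response in coxph().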

Testing Time-Varying Covariates Controlling for Diagnosis

#Acute graft vs. host disease
rega<-coxph(Surv(Tstart, Tstop, Either)~ AGvHt+factor(Disease), data=bmt.long2)

#Chronic graft vs. host disease
regc<-coxph(Surv(Tstart, Tstop, Either)~ CGvHt+factor(Disease), data=bmt.long2)

#Platelet recovery time
regp<-coxph(Surv(Tstart, Tstop, Either)~ PRt+factor(Disease), data=bmt.long2)

AGvHD

> rega
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ AGvHt + factor(Disease), data = bmt.long2)

                    coef exp(coef) se(coef)     z     p
AGvHt              0.323     1.381    0.285  1.31 0.264
factor(Disease)2  -0.551     0.576    0.288 -1.91 0.055
factor(Disease)3   0.435     1.546    0.272  1.60 0.110

Likelihood ratio test=14.7 on 3 df, p=0.00214  n= 19070, number of events= 83

CGvHD

> regc
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ CGvHt + factor(Disease), data = bmt.long2)

                    coef exp(coef) se(coef)      z     p
CGvHt             -0.186     0.830    0.288 -0.646 0.520
factor(Disease)2  -0.620     0.538    0.296 -2.094 0.036
factor(Disease)3   0.367     1.444    0.268  1.368 0.170

Likelihood ratio test=13.9 on 3 df, p=0.00309  n= 19070, number of events= 83

Platelet Recovery

> regp
Call:
coxph(formula = Surv(Tstart, Tstop, Either) ~ PRt + factor(Disease), data = bmt.long2)

                    coef exp(coef) se(coef)     z       p
PRt               -1.120     0.326    0.329 -3.40 0.00067
factor(Disease)2  -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3   0.382     1.465    0.268  1.43 0.15000

Likelihood ratio test=22.9 on 3 df, p=4.32e-05  n= 19070, number of events= 83

Interpretation?

• Patients with low risk AML have a lower risk of an event compared to patients with ALL

• Patients with high risk AML have a greater risk of an event relative to patients with ALL

• Patients who have experienced platelet recovery by a given time have a lower risk of an event relative to those who have not yet experienced platelet recovery

Back to Our Original Models

• Only platelet recovery is significantly associated with disease free survival

• Now investigate a model that adjusts for the previously mentioned fixed time covariates
  – Disease type
  – FAB
  – Donor/patient age and interaction
  – Hospital

Models with and without PRt

#Model w/ donor/patient age, intx, FAB, dx, hosp, & PR
> st<-Surv(bmt.long2$Tstart, bmt.long2$Tstop, bmt.long2$Either)

> reg.fixed<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge, data=bmt.long2)

> reg.tv<-coxph(st~factor(Disease)+PRt, data=bmt.long2)

> reg.all<-coxph(st~factor(Disease)+FAB+PtAge+DonAge+PtAge*DonAge+PRt, data=bmt.long2)

> LRT<-2*(reg.all$loglik[2]-reg.tv$loglik[2])
> pchisq(LRT, 4, lower.tail=F)
[1] 0.001878685
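The same test can be approximated from the printed summaries alone. Each model's "Likelihood ratio test" line compares that model with the null model, so the difference of the two statistics equals 2 times the difference in log-likelihoods. The sketch below uses the rounded values printed for reg.tv (22.9 on 3 df) and reg.all (39.9 on 7 df). The variable names are illustrative.

```r
# Likelihood ratio statistics as printed in the model summaries (rounded).
lrt_all <- 39.9   # reg.all vs. null model, 7 df
lrt_tv  <- 22.9   # reg.tv  vs. null model, 3 df

lrt_diff <- lrt_all - lrt_tv   # equals 2 * (loglik_all - loglik_tv), up to rounding
df_diff  <- 7 - 3              # four extra parameters in reg.all
p_value  <- pchisq(lrt_diff, df = df_diff, lower.tail = FALSE)
p_value
```

This gives a p-value of about 0.0019, in line with the value computed from the unrounded log-likelihoods. The small gap comes from rounding the printed statistics.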

Recall Fixed Time Covariate Model

> reg.fixed
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge, data = bmt.long)

                      coef exp(coef) se(coef)     z       p
factor(Disease)2  -1.09065     0.336 0.354279 -3.08 0.00210
factor(Disease)3  -0.40391     0.668 0.362777 -1.11 0.27000
FAB                0.83742     2.310 0.278464  3.01 0.00260
PtAge             -0.08164     0.922 0.036107 -2.26 0.02400
DonAge            -0.08459     0.919 0.030097 -2.81 0.00490
PtAge:DonAge       0.00316     1.003 0.000951  3.32 0.00089

Likelihood ratio test=32.8 on 6 df, p=1.14e-05  n= 342, number of events= 83

Time Covariate + Disease Type

> reg.tv
Call:
coxph(formula = st ~ factor(Disease) + PR, data = bmt.long)

                    coef exp(coef) se(coef)     z       p
factor(Disease)2  -0.497     0.608    0.289 -1.72 0.08600
factor(Disease)3   0.382     1.465    0.268  1.43 0.15000
PR                -1.120     0.326    0.329 -3.40 0.00067

Likelihood ratio test=22.9 on 3 df, p=4.32e-05  n= 342, number of events= 83

Full Model

> reg.all
Call:
coxph(formula = st ~ factor(Disease) + FAB + PtAge + DonAge + PtAge * DonAge + PR, data = bmt.long)

                      coef exp(coef) se(coef)     z      p
factor(Disease)2  -1.03245     0.356 0.353200 -2.92 0.0035
factor(Disease)3  -0.41398     0.661 0.365222 -1.13 0.2600
FAB                0.81180     2.252 0.283236  2.87 0.0042
PtAge             -0.07102     0.931 0.035449 -2.00 0.0450
DonAge            -0.07607     0.927 0.030007 -2.54 0.0110
PR                -0.98307     0.374 0.338109 -2.91 0.0036
PtAge:DonAge       0.00287     1.003 0.000935  3.07 0.0021

Likelihood ratio test=39.9 on 7 df, p=1.3e-06  n= 342, number of events= 83

Interactions Coding by Hand

#Interaction coding
#Diagnosis 2 (low risk AML)*PRT
#Diagnosis 3 (hi risk AML)*PRT
#FAB*PRT
#PRT*donor age, PRT*patient age, PRT*Donor age*Patient age
bmt.long2$ageint<-(bmt.long2$PtAge-28)*(bmt.long2$DonAge-28)
bmt.long2$dx2.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==2, 1, 0)
bmt.long2$dx3.pr<-ifelse(bmt.long2$PRt==1 & bmt.long2$Disease==3, 1, 0)
bmt.long2$fab.pr<-bmt.long2$PRt*bmt.long2$FAB
bmt.long2$dnr.pr<-bmt.long2$PRt*(bmt.long2$DonAge-28)
bmt.long2$pt.pr<-bmt.long2$PRt*(bmt.long2$PtAge-28)
bmt.long2$pt.pr.dnr<-bmt.long2$PRt*(bmt.long2$ageint)

Interactions

1. Diag 2 x PRT: “additional hazard of failure after platelet recovery in those with diagnosis of low risk AML vs. those with ALL”
2. Diag 3 x PRT: “additional hazard of failure after platelet recovery in those with diagnosis of high risk AML vs. those with ALL”
3. PRT x donor age: “additional hazard of failure after platelet recovery with an increase in donor age”
4. PRT x patient age: “additional hazard of failure after platelet recovery with an increase in patient age”
5. PRT x donor age x patient age (confusing): “additional hazard of failure after platelet recovery with an increase in the interaction between the patient and donor age”
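As a sanity check on the hand coding, the diagnosis-by-PRt indicator columns can be compared with the interaction columns that R's formula interface builds. The sketch below uses base R's model.matrix() on a hypothetical six-row data frame, so only the column names Disease and PRt are taken from the lecture.

```r
# Hypothetical toy rows covering each diagnosis before and after platelet recovery.
toy <- data.frame(Disease = c(1, 2, 3, 2, 3, 1),
                  PRt     = c(0, 0, 0, 1, 1, 1))

# Hand-coded interaction indicators, as in the lecture code.
toy$dx2.pr <- ifelse(toy$PRt == 1 & toy$Disease == 2, 1, 0)
toy$dx3.pr <- ifelse(toy$PRt == 1 & toy$Disease == 3, 1, 0)

# Design matrix produced by the formula interface for the same interaction.
X <- model.matrix(~ factor(Disease) * PRt, data = toy)
colnames(X)

all(X[, "factor(Disease)2:PRt"] == toy$dx2.pr)
all(X[, "factor(Disease)3:PRt"] == toy$dx3.pr)
```

Both comparisons are TRUE for these rows, so the hand-coded columns and the formula-generated interaction columns coincide here.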

Series of Models

reg1<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt, data=bmt.long2)

reg2<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+dx3.pr, data=bmt.long2)

reg3<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+fab.pr, data=bmt.long2)

reg4<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dnr.pr+pt.pr+ pt.pr.dnr, data=bmt.long2)

reg5<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+fab.pr, data=bmt.long2)

reg6<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)

reg7<-coxph(st~factor(Disease)+FAB+DonAge+PtAge+ageint+PRt+dx2.pr+ dx3.pr+fab.pr+dnr.pr+pt.pr+pt.pr.dnr, data=bmt.long2)

Full Model with Interactions

> reg7
                    coef exp(coef) se(coef)      z      p
factor(Disease)2   1.325     3.765    0.819  1.618 0.1100
factor(Disease)3   1.134     3.108    1.225  0.926 0.3500
FAB               -1.250     0.286    1.112 -1.124 0.2600
DonAge             0.116     1.123    0.043  2.679 0.0074
PtAge             -0.154     0.857    0.054 -2.820 0.0048
ageint             0.0026    1.003    0.001  1.337 0.1800
PRt               -0.286     0.751    0.695 -0.412 0.6800
dx2.pr            -3.057     0.047    0.926 -3.299 0.0010
dx3.pr            -1.894     0.150    1.291 -1.467 0.1400
fab.pr             2.471    11.831    1.159  2.131 0.0330
dnr.pr            -0.147     0.863    0.048 -3.054 0.0023
pt.pr              0.193     1.213    0.058  3.289 0.0010
pt.pr.dnr          0.000     1.000    0.002  0.060 0.9500

Likelihood ratio test=63.6 on 13 df, p=1.19e-08  n= 19070, number of events= 83

Fitting Interactions Directly

> reg7b
Call:
coxph(formula = st ~ factor(Disease) + FAB + DonAge + PtAge + DonAge * PtAge + PR + PR * factor(Disease) + PR * FAB + PR * DonAge + PR * PtAge + DonAge * PtAge * PR, data = bmt.long)

                         coef exp(coef) se(coef)      z       p
factor(Disease)2       1.3257     3.765  0.81952  1.618 0.11000
factor(Disease)3       1.1341     3.108  1.22487  0.926 0.35000
FAB                   -1.2503     0.286  1.11245 -1.124 0.26000
DonAge                 0.0436     1.045  0.05866  0.744 0.46000
PtAge                 -0.2264     0.797  0.09118 -2.484 0.01300
PR                    -1.4817     0.227  2.11360 -0.701 0.48000
DonAge:PtAge           0.0026     1.003  0.00194  1.337 0.18000
factor(Disease)2:PR   -3.0568     0.047  0.92646 -3.299 0.00097
factor(Disease)3:PR   -1.8941     0.150  1.29132 -1.467 0.14000
FAB:PR                 2.4707    11.831  1.15926  2.131 0.03300
DonAge:PR             -0.1506     0.860  0.06967 -2.162 0.03100
PtAge:PR               0.1894     1.209  0.10127  1.871 0.06100
DonAge:PtAge:PR      0.000138     1.000  0.00230  0.060 0.95000

Likelihood ratio test=63.6 on 13 df, p=1.19e-08  n= 342, number of events= 83

Low Risk AML vs. ALL

• Interaction between diagnosis and platelet recovery

• Low-risk AML vs. ALL, prior to platelet recovery
  – β = 1.326
  – HR (95% CI): 3.76 (0.76, 18.76)

• Low-risk AML vs. ALL, after platelet recovery
  – β = 1.326 + (-3.06) = -1.73
  – HR (95% CI): 0.18 (0.08, 0.41)

R Code for the HR and 95% CI

> betahr<-reg7$coef[1]+reg7$coef[8]
> betahr
factor(Disease)2
       -1.731125

> seintx<-sqrt(reg7$var[1,1]+reg7$var[8,8]+2*reg7$var[1,8])
> seintx
[1] 0.4263292

> exp(betahr - qnorm(0.975)*seintx)
factor(Disease)2
      0.07678741

> exp(betahr + qnorm(0.975)*seintx)
factor(Disease)2
        0.408389
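The same calculation generalizes to any linear combination of coefficients: the variance of w'β is w'Vw, where V is the estimated covariance matrix. The helper lincom_hr below is a hypothetical sketch, and the two-parameter beta and V are made-up numbers chosen so the result is easy to verify by hand.

```r
# Hypothetical helper: hazard ratio and Wald CI for the combination sum(w * beta).
lincom_hr <- function(beta, V, w, level = 0.95) {
  est <- sum(w * beta)
  se  <- sqrt(drop(t(w) %*% V %*% w))   # var(w'beta) = w' V w
  z   <- qnorm(1 - (1 - level) / 2)
  c(HR = exp(est), lower = exp(est - z * se), upper = exp(est + z * se))
}

# Made-up example: main effect 1.0, interaction -2.0, covariance -0.01.
beta <- c(1.0, -2.0)
V    <- matrix(c(0.04, -0.01,
                 -0.01, 0.07), nrow = 2)
res <- lincom_hr(beta, V, w = c(1, 1))
res
```

Here the combined log hazard ratio is -1 and its standard error is sqrt(0.04 + 0.07 - 0.02) = 0.3. For the lecture's model, a weight vector with 1 in positions 1 and 8 and 0 elsewhere, applied to reg7's coefficients and covariance matrix, would correspond to the betahr and seintx calculation above.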

Other Interactions?

• High risk AML vs. ALL? High risk AML vs. Low Risk AML?

• Age?
• …

What About Continuous Covariates

• Continuous variables can change over time as well

• Given the times measurements are taken, we can expand the data in the same way.

• We are assuming the value is unchanging during the interval between measurements
  – A little unrealistic BUT…
  – This is no different from treating a single measure (e.g. blood pressure) as a fixed time covariate
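A minimal base-R sketch of this expansion for a continuous covariate, holding each measurement constant until the next one. All data and column names here (meas, fu, sbp, futime) are hypothetical, and the sketch assumes every measurement is taken before the end of follow-up.

```r
# Hypothetical repeated blood pressure measurements and follow-up data.
meas <- data.frame(id   = c(1, 1, 1, 2, 2),
                   time = c(0, 30, 90, 0, 45),
                   sbp  = c(120, 135, 128, 110, 118))
fu   <- data.frame(id = c(1, 2), futime = c(150, 60), event = c(1, 0))

long <- merge(meas, fu, by = "id")
long <- long[order(long$id, long$time), ]

# Time of the next measurement within each subject (NA for the last one).
nxt <- ave(long$time, long$id, FUN = function(x) c(x[-1], NA))

long$Tstart <- long$time
long$Tstop  <- ifelse(is.na(nxt), long$futime, nxt)   # last interval ends at follow-up
long$status <- ifelse(is.na(nxt), long$event, 0)      # event only on the last interval
long[, c("id", "Tstart", "Tstop", "sbp", "status")]
```

Each row carries the most recent measurement over its interval. The resulting columns would fit a call of the form coxph(Surv(Tstart, Tstop, status) ~ sbp, data = long).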

Next Time

• Regression Diagnostics… checking the proportional hazards assumption.