1 using martingale residuals to assess goodness of fit for sampled risk set data Ørnulf borgan...

1

Using martingale residuals to assess goodness of fit for sampled risk set data

Ørnulf Borgan

Department of Mathematics

University of Oslo

Based on joint work with Bryan Langholz

2

Outline:

• Example: Uranium miners cohort

• Cohort model, data and martingale residuals

• Risk set sampling

• Martingale residuals and goodness-of-fit tests for sampled risk set data

• Concluding remarks

3

Uranium miners cohort:

• 3347 uranium miners from Colorado Plateau included in study cohort 1950-60

• Followed up until the end of 1982

• 258 lung cancer deaths

• Interested in effect of radon and smoking exposure on the risk of lung cancer death

• Have exposure information for the full cohort. Will sample from the risk sets for illustration

(e.g. Langholz & Goldstein, 1996)

4

Relative risk regression models

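Hazard rate for individual i:

$$\alpha_i(t) = \psi_i \, \alpha_0(t)$$

where $\psi_i$ is the relative risk and $\alpha_0(t)$ is the baseline hazard.

The relative risk for individual i depends on covariates $x_{i1}, x_{i2}, \ldots, x_{ip}$ (possibly time-dependent)

Cox:

$$\psi_i = \exp(\beta_1 x_{i1} + \cdots + \beta_p x_{ip})$$

Excess relative risk:

$$\psi_i = (1 + \beta_1 x_{i1}) \cdots (1 + \beta_p x_{ip})$$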
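The two relative risk forms on this slide can be compared numerically. The following is a minimal sketch only: the function names, coefficients and covariate values are hypothetical and chosen purely for illustration.

```python
import math

def cox_relative_risk(beta, x):
    """Log-linear (Cox) form: exp(beta_1*x_1 + ... + beta_p*x_p)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

def excess_relative_risk(beta, x):
    """Excess relative risk form: product of (1 + beta_k*x_k) over covariates."""
    return math.prod(1.0 + b * xi for b, xi in zip(beta, x))

# Hypothetical coefficients and covariate values (not taken from the study).
beta = [0.5, 0.2]
x = [2.0, 1.0]

print(cox_relative_risk(beta, x))     # exp(0.5*2.0 + 0.2*1.0) = exp(1.2)
print(excess_relative_risk(beta, x))  # (1 + 0.5*2.0) * (1 + 0.2*1.0)
```

Both forms equal 1 when all covariates are zero, so the baseline hazard is the hazard of an individual with zero covariate values in either model.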

5

Cohort data:

[Figure: individuals at risk plotted against study time; arrows are censored observations]

6

$t_1 < t_2 < t_3 < \cdots$ times of failures

$i_j$ individual failing at $t_j$ ("case")

Counting process for individual i:

$$N_i(t) = \sum_{j} I\{t_j \le t,\ i_j = i\}$$

Intensity process $\lambda_i(t)$ is given by

$$\lambda_i(t)\,dt = P(dN_i(t) = 1 \mid \text{"past"})$$

7

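Intensity processes:

$$\lambda_i(t) = Y_i(t)\,\alpha_i(t) = Y_i(t)\,\psi_i\,\alpha_0(t)$$

where $Y_i(t)$ is the at risk indicator and $\alpha_i(t)$ is the hazard rate.

Cumulative intensity processes:

$$\Lambda_i(t) = \int_0^t \lambda_i(u)\,du = \int_0^t Y_i(u)\,\psi_i\,dA_0(u)$$

Martingales:

$$M_i(t) = N_i(t) - \Lambda_i(t)$$

Martingale residual processes:

$$\hat{M}_i(t) = N_i(t) - \hat{\Lambda}_i(t)$$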

8

Martingale residual processes may be used to assess goodness of fit:

• Plot individual martingale residuals $\hat{M}_i = \hat{M}_i(\tau)$ versus covariates (Therneau, Grambsch & Fleming, 1990)

• Plot grouped martingale residual processes $M_g^{*}(t) = \sum_{i \in g} \hat{M}_i(t)$ versus time (Aalen, 1993; Grønnesby & Borgan, 1996)

The latter may be extended to sampled risk set data

9

Risk set sampling

• Cohort studies need information on covariates for all individuals at risk

• Expensive to collect and check (!) this information for all individuals in large cohorts

• For risk set sampling designs one only needs to collect covariate information for the cases and for a few controls sampled at the failure times

10

If a case occurs at time t, select m − 1 controls among the n(t) − 1 non-failures at risk, i.e. match on study time

[Figure: illustration for m = 2, showing each case and its sampled control]
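The selection rule on this slide can be sketched in a few lines. This is an illustrative sketch under simple assumptions: the function name, the integer identifiers and the risk set of size 10 are hypothetical, and controls are drawn by simple random sampling without replacement.

```python
import random

def sample_risk_set(case, at_risk, m, rng):
    """Return the sampled risk set: the case plus m - 1 controls drawn
    without replacement from the non-failures at risk at the failure time."""
    candidates = [i for i in at_risk if i != case]
    controls = rng.sample(candidates, m - 1)
    return [case] + controls

rng = random.Random(0)        # fixed seed, for reproducibility only
at_risk = list(range(10))     # hypothetical risk set with n(t) = 10
sampled = sample_risk_set(case=3, at_risk=at_risk, m=2, rng=rng)
print(sampled)                # the case followed by one sampled control
```

Because a new sample is drawn at every failure time, an individual may serve as a control for several cases and may later become a case.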

11

A sampled risk set $R_j$ consists of the case $i_j$ and its controls

A sampling design for the controls is described by its sampling distribution

The classical nested case-control design: If individual i fails at time t, the probability of selecting the set r as the sampled risk set is

$$\pi_t(r \mid i) = \binom{n(t) - 1}{m - 1}^{-1}$$

(we assume that r is a subset of the risk set, that r is of size m and that i is in r)

A number of sampling designs are available
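For the classical nested case-control design, the sampling probability is one over the number of ways to choose m − 1 controls from the n(t) − 1 non-failures at risk. A small sketch, with a hypothetical function name and hypothetical sizes:

```python
from math import comb

def ncc_sampling_probability(n_at_risk, m):
    """Probability of any particular sampled risk set of size m containing
    the case, when the m - 1 controls are drawn by simple random sampling
    from the n_at_risk - 1 non-failures."""
    return 1.0 / comb(n_at_risk - 1, m - 1)

print(ncc_sampling_probability(10, 2))  # one control among 9 candidates: 1/9
print(ncc_sampling_probability(10, 4))  # three controls among 9 candidates: 1/84
```

The probability does not depend on which member of the set is the case, which is what makes the partial likelihood simplify for this design.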

12

Inference on the regression coefficients can be based on the partial likelihood

$$L(\beta) = \prod_{t_j} \frac{\psi_{i_j}\,\pi_{t_j}(R_j \mid i_j)}{\sum_{l \in R_j} \psi_l\,\pi_{t_j}(R_j \mid l)}$$

The partial likelihood enjoys usual likelihood properties (Borgan, Goldstein & Langholz 1995)

For the classical nested case-control design, the partial likelihood simplifies
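The simplification for the classical nested case-control design can be written out as follows. This is a sketch in assumed notation: $\psi_l$ denotes the relative risk of individual $l$, $R_j$ the sampled risk set at $t_j$, and $\pi_{t_j}(R_j \mid l)$ the sampling probability. Under simple random sampling of the controls, the sampling probability takes the same value for every $l \in R_j$, so it cancels from numerator and denominator:

$$L(\beta) = \prod_{t_j} \frac{\psi_{i_j}\,\binom{n(t_j) - 1}{m - 1}^{-1}}{\sum_{l \in R_j} \psi_l\,\binom{n(t_j) - 1}{m - 1}^{-1}} = \prod_{t_j} \frac{\psi_{i_j}}{\sum_{l \in R_j} \psi_l}$$

This has the same form as the Cox partial likelihood, with the sampled risk sets in place of the full risk sets.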

13

Martingale residuals and goodness-of-fit tests for sampled risk set data

Introduce the counting processes

$$N_{(i,r)}(t) = \sum_{j} I\{t_j \le t,\ (i_j, R_j) = (i, r)\}$$

Intensity processes take the form:

$$\lambda_{(i,r)}(t) = \lambda_i(t)\,\pi_t(r \mid i) = Y_i(t)\,\psi_i\,\alpha_0(t)\,\pi_t(r \mid i)$$

14

Corresponding martingales:

$$M_{(i,r)}(t) = N_{(i,r)}(t) - \int_0^t \lambda_{(i,r)}(u)\,du$$

Martingale residual processes:

$$\hat{M}_{(i,r)}(t) = N_{(i,r)}(t) - \int_0^t \hat{\lambda}_{(i,r)}(u)\,du$$

The $\hat{M}_{(i,r)}(t)$ are of little practical use on their own, but they may be aggregated over groups of individuals to produce useful plots

15

For group g

$$M_g^{*}(t) = \sum_{i \in g} \sum_{r} \hat{M}_{(i,r)}(t) = \sum_{i \in g} N_i(t) - \sum_{t_j \le t} \frac{\sum_{l \in R_j \cap g} \hat{\psi}_l\,\pi_{t_j}(R_j \mid l)}{\sum_{l \in R_j} \hat{\psi}_l\,\pi_{t_j}(R_j \mid l)}$$

May be interpreted as "observed − expected" number of failures in group g

Asymptotic distribution may be derived using counting process methods

Simplifies for classical nested case-control
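For the classical nested case-control design the sampling probabilities cancel, so the "expected" contribution at each failure time reduces to the share of estimated relative risk in the sampled risk set that belongs to group g. The following sketch accumulates observed minus expected over failure times. It is an illustration under stated assumptions: the function name, data layout and toy values are hypothetical, and group membership is treated as fixed over time, whereas a grouping by cumulative exposure would change with time.

```python
def grouped_residual(sampled_risk_sets, rel_risk, group):
    """Observed minus expected failures in `group` after each failure time.

    sampled_risk_sets: list of (case, members) pairs ordered by failure time,
        where `members` holds the case and its sampled controls.
    rel_risk: dict mapping individual -> estimated relative risk.
    group: set of individuals belonging to group g (assumed time-fixed here).
    """
    path, observed, expected = [], 0.0, 0.0
    for case, members in sampled_risk_sets:
        total = sum(rel_risk[l] for l in members)
        in_group = sum(rel_risk[l] for l in members if l in group)
        observed += 1.0 if case in group else 0.0
        expected += in_group / total
        path.append(observed - expected)
    return path

# Hypothetical toy data: two failure times, one control per case (m = 2).
sets = [("a", ["a", "b"]), ("c", ["c", "d"])]
rel_risk = {"a": 2.0, "b": 1.0, "c": 1.0, "d": 3.0}
in_g = grouped_residual(sets, rel_risk, {"a", "d"})
out_g = grouped_residual(sets, rel_risk, {"b", "c"})
print(in_g, out_g)
```

When the groups partition the individuals, the grouped processes sum to zero at every failure time, since each failure contributes exactly one observed and one expected event in total.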

16

Illustration: uranium miners cohort

Fit excess relative risk model:

$$\psi_i = (1 + \beta_1 x_{i1})(1 + \beta_2 x_{i2})$$

$x_{i1}$ = cumulative radon (100 WLMs)

$x_{i2}$ = cumulative smoking (1000 packs)

For classical nested case-control with three controls per case:

$\hat{\beta}_1 = 0.556$ (0.215) per 100 WLMs

$\hat{\beta}_2 = 0.276$ (0.093) per 1000 packs

17

Aggregate martingale residual processes in three groups according to cumulative radon exposure:

Groups: I: < 500 WLMs; II: 500-1500 WLMs; III: > 1500 WLMs

There are indications of an interaction between cumulative radon exposure and age

18

Observed and expected number of failures in the groups for ages below and above 60 years:

Age and group                  Observed   Expected
Below 60 years & group I          30        30.7
Below 60 years & group II         39        45.9
Below 60 years & group III        81        73.4
Above 60 years & group I          27        27.7
Above 60 years & group II         45        36.1
Above 60 years & group III        36        44.2

Chi-squared statistic with 2(3 − 1) = 4 df takes the value 10.5 (P-value 3.2%)
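The numbers on this slide can be checked for internal consistency. The sketch below computes observed minus expected in each cell and the upper tail probability of a chi-squared distribution with 4 df at the reported value 10.5. The statistic itself is taken from the slide and not recomputed, since it depends on the estimated covariance of the grouped residual processes, which is not given here. Variable and function names are hypothetical.

```python
import math

observed = [30, 39, 81, 27, 45, 36]
expected = [30.7, 45.9, 73.4, 27.7, 36.1, 44.2]

# Observed minus expected failures per cell, in the order of the table.
differences = [round(o - e, 1) for o, e in zip(observed, expected)]

def chi2_upper_tail_4df(x):
    """Upper tail probability of a chi-squared variable with 4 df.
    For 4 df the survival function has the closed form exp(-x/2) * (1 + x/2)."""
    return math.exp(-x / 2.0) * (1.0 + x / 2.0)

p_value = chi2_upper_tail_4df(10.5)
print(differences)
print(sum(observed), round(sum(expected), 1))
print(round(p_value, 3))
```

The observed counts add up to the 258 lung cancer deaths in the cohort, and the expected counts add up to the same total. The tail probability at 10.5 is about 3.3%, close to the reported 3.2%; the small difference is presumably due to rounding of the statistic.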

19

Concluding remarks

The counting process formulation of nested case-control studies:

• Introduces a time aspect that is usually disregarded for sampled risk set data

• Gives a model formulation similar to that for cohort data, and thereby opens the way to methodological developments similar to those for cohort studies

• Grouped martingale residual processes are one example of this. They make it possible to check for time-dependent effects and other deviations from the model

20

Questions and further developments of grouped martingale residual plots and related goodness-of-fit methods

• How should the grouping be performed?

• How do specific deviations from the model turn up in the plots?

• Kolmogorov-Smirnov and Cramér-von Mises type tests? (Durbin's approximation, Lin et al.'s simulation trick)
