TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE ESTIMATES FROM A GEOGRAPHICAL INFORMATION SYSTEM

Jonas Björk (1) & Ulf Strömberg (2)

(1) Competence Center for Clinical Research

(2) Occupational and Environmental Medicine

Lund University Hospital

OUTLINE OF TALK

• Previous project: What have we done? (Jonas Björk)

• Ongoing project: What shall we do? (Ulf Strömberg)

Two-stage procedure for case-control studies

1st stage: Complete data obtained from registries

• Disease status

• General characteristics

• Group affiliation (e.g. occupation or residential area)

Group-level exposure XG

2nd stage: Individual exposure data for a subset of the 1st stage sample

Exposure database: group-level exposure

• JEM = Job Exposure Matrix: for each occupational group, the proportion exposed

• GIS: for each residential group (area), the average concentration of an air pollutant

[Figure: JEM, proportion exposed (vertical axis, 0 to 0.5) for Group 0 to Group 4. Annotation: most data are typically in groups with low XG.]

Linear Relation between Proportion Exposed and Relative Risk

• No confounding between/within groups

Example: RR (exposed vs. unexposed) = 2.0

Proportion exposed (XG)    Average RR

0%                         1.0

10%                        0.10 * 2.0 + 0.90 * 1.0 = 1.1

50%                        1.5

100%                       2.0
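The table can be reproduced with a short calculation. The sketch below assumes, as in the example, no confounding and a relative risk of 1.0 among the unexposed; the function name `average_rr` is illustrative only.

```python
def average_rr(proportion_exposed, rr_exposed=2.0):
    """Group-average relative risk when a fraction of the group is exposed.

    Assumes no confounding between or within groups and a baseline
    (unexposed) relative risk of 1.0, as in the example above.
    """
    return proportion_exposed * rr_exposed + (1.0 - proportion_exposed) * 1.0


# Print the average RR for the proportions used in the table.
for p in (0.0, 0.10, 0.50, 1.0):
    print(f"{p:.0%}: average RR = {average_rr(p):.1f}")
```

The expression simplifies to 1 + (RR - 1) * XG, which is why the average RR is linear in the proportion exposed.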

Linear OR model: OR(XG) = 1 + β XG

XG = Exposure proportion

OR for exposed vs. unexposed = OR(1) = 1 + β

[Figure: OR(XG) plotted against XG from 0 to 1, rising linearly from 1 to OR(1). Annotation: most data are typically in groups with low XG.]

Confounding between groups

• General confounders (e.g., gender and age) can normally be adjusted for

• Assuming no confounding within groups and no effect modification in any stratum sk:

OR(XG; s1, s2, ..., sK) = (1 + β XG) * exp(Σk γk sk)
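As a minimal sketch, the adjusted model can be written as a small function: the linear term in XG is multiplied by an exponential term for the stratum indicators. The function and parameter names (`adjusted_or`, `gammas`, `strata`) are assumptions made for illustration and do not come from the talk.

```python
import math


def adjusted_or(xg, beta, gammas=(), strata=()):
    """Linear OR model in XG with multiplicative confounder adjustment.

    xg     : group-level exposure proportion (0 to 1)
    beta   : slope of the linear OR model, so OR(1) = 1 + beta
    gammas : log-OR coefficients for the stratum indicators (hypothetical names)
    strata : stratum indicator values s_k for one subject
    """
    if len(gammas) != len(strata):
        raise ValueError("gammas and strata must have the same length")
    linear_part = 1.0 + beta * xg
    confounder_part = math.exp(sum(g * s for g, s in zip(gammas, strata)))
    return linear_part * confounder_part
```

Because the stratum term is multiplicative, the ratio of the OR at XG = 1 to the OR at XG = 0 equals 1 + β in every stratum, which is what "no effect modification" means here.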

Combining 1st and 2nd stage data

• Assumption: 2nd stage data are missing at random (MAR), conditional on disease status and 1st stage group affiliation

• For subjects with missing 2nd stage data: use 1st stage data to calculate the expected number of exposed/unexposed

• Expectation-maximization (EM) algorithm

EM algorithm (Wacholder & Weinberg 1994)

1. Select a starting value, e.g. OR = 1

2. E-step: Among the non-participants, calculate the expected number of exposed/unexposed cases and controls in each group

3. M-step: Maximize the likelihood for the observed + expected cell frequencies using the chosen risk model for individual-level data (not necessarily linear). This gives a new OR estimate.

4. Repeat steps 2 and 3 until convergence
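The four steps can be sketched in code for the simplest case: a dichotomous exposure, one common OR, and no confounding between groups. Under those assumptions the M-step reduces to the crude odds ratio of the completed 2x2 table. This is an illustrative simplification, not necessarily the implementation used by the authors, and the dictionary keys are hypothetical names.

```python
def em_odds_ratio(groups, or_start=1.0, tol=1e-8, max_iter=500):
    """EM estimate of a common exposed-vs-unexposed OR.

    groups: list of dicts, one per group, with hypothetical keys
      "xg"         : external proportion exposed in the group
      "case_exp"   : observed exposed cases (2nd stage)
      "case_unexp" : observed unexposed cases (2nd stage)
      "ctrl_exp"   : observed exposed controls (2nd stage)
      "ctrl_unexp" : observed unexposed controls (2nd stage)
      "m1", "m0"   : cases / controls with missing 2nd stage data
    """
    or_hat = or_start                      # step 1: starting value
    for _ in range(max_iter):
        a = b = c = d = 0.0                # completed 2x2 table
        for g in groups:
            xg = g["xg"]
            # E-step: expected share of exposed among non-participating cases
            p_case = xg * or_hat / (1.0 + (or_hat - 1.0) * xg)
            a += g["case_exp"] + g["m1"] * p_case
            b += g["case_unexp"] + g["m1"] * (1.0 - p_case)
            # E-step: non-participating controls follow the group proportion
            c += g["ctrl_exp"] + g["m0"] * xg
            d += g["ctrl_unexp"] + g["m0"] * (1.0 - xg)
        # M-step: with a single OR and no covariates, the ML estimate
        # from the completed table is the crude odds ratio
        new_or = (a * d) / (b * c)
        if abs(new_or - or_hat) < tol:     # step 4: convergence check
            return new_or
        or_hat = new_or
    return or_hat
```

With confounders or a non-linear risk model, the M-step would instead be a weighted regression fit on the observed plus expected cell frequencies.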

E-step in our situation (Strömberg & Björk, submitted)

Complete the data in each group G:

• m0 controls with missing 2nd stage data

Expected number of exposed = m0 * XG

• m1 cases with missing 2nd stage data

Expected number of exposed = m1 * XG * ÔR / [1 + (ÔR - 1) * XG]

ÔR = current OR estimate
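The expression for the cases follows from Bayes' rule. The derivation below is a sketch under two assumptions that the slide does not spell out: the disease is rare enough that the OR approximates the risk ratio, and XG is the exposure prevalence in the source population of group G (and hence, approximately, among its controls). E denotes the exposure indicator, a symbol introduced here for illustration.

```latex
\begin{align*}
P(E = 1 \mid \text{case}, G)
  &= \frac{P(\text{case} \mid E = 1, G)\, X_G}
          {P(\text{case} \mid E = 1, G)\, X_G
           + P(\text{case} \mid E = 0, G)\,(1 - X_G)} \\[4pt]
  &\approx \frac{\widehat{OR}\, X_G}{\widehat{OR}\, X_G + (1 - X_G)}
   = \frac{X_G\, \widehat{OR}}{1 + (\widehat{OR} - 1)\, X_G}
\end{align*}
```

Multiplying this probability by m1 gives the expected number of exposed among the cases with missing 2nd stage data. At ÔR = 1 it reduces to m1 * XG, the same form as for the controls.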

Simulated case-control studies

• 400 cases, 1200 controls in the 1st stage

• 2nd stage participation: 75% of the cases, 25% of the controls

• Selective participation of 2nd stage controls: Corr(Participation, XG) = 0, > 0, < 0

• 1000 replications in each scenario

• True OR = 3

Simulations - Results

Participation          1st stage data only      2nd stage data only      EM method
                       (400 + 1200)             (300 + 300)              (400 + 1200)
                       OR    SD    Coverage     OR    SD    Coverage     OR    SD    Coverage
Corr(Part., XG) = 0    3.0   0.18  95.0%        3.0   0.23  95.6%        3.0   0.15  95.5%
Corr(Part., XG) < 0    3.0   0.18  95.0%        5.3   0.29  45.8%        3.0   0.15  95.0%
Corr(Part., XG) > 0    3.0   0.18  95.0%        1.8   0.20  32.9%        3.0   0.15  95.5%

SD = Empirical standard deviation of the ln(OR) estimates

Coverage = Coverage of 95% confidence intervals
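The two summary measures can be computed from the replicate estimates as sketched below. The sketch assumes Wald-type intervals, ln(OR) ± 1.96 * SE; the talk does not state how the confidence intervals were constructed, so this is an assumption, and the function name is illustrative.

```python
import math
import statistics


def summarize_replications(log_or_hats, std_errors, true_or):
    """Empirical SD of ln(OR) estimates and coverage of 95% intervals.

    log_or_hats : ln(OR) estimate from each simulated study
    std_errors  : standard error of ln(OR) from each simulated study
    true_or     : the OR used to generate the data
    Assumes Wald intervals: estimate +/- 1.96 * SE on the log scale.
    """
    z = 1.959964
    true_log_or = math.log(true_or)
    sd = statistics.stdev(log_or_hats)
    covered = sum(
        1
        for est, se in zip(log_or_hats, std_errors)
        if est - z * se <= true_log_or <= est + z * se
    )
    return sd, covered / len(log_or_hats)
```

Coverage well below 95%, as in the "2nd stage data only" column under selective participation, indicates that the intervals are centred on a biased estimate.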

Simulations - Conclusions

Combining 1st and 2nd stage data using the EM method can:

1. Improve precision

2. Remove bias from selective participation

Method is sensitive to errors in the (1st stage) external exposure data!

Simulations – Conclusions II

EM-method is sensitive to

1. Violations of the MAR-assumption

(conditional on disease status and 1st stage group affiliation)

2. Errors in the (1st stage) external exposure data

Ongoing methodological research project

• Focus on exposure estimates from a GIS

GIS data: NO2 (Scania)

Two-stage exposure assessment procedure

[Diagram: groups with XG = 4.8, XG = 10.1, XG = 20.1, ...; individual exposures xi within each group]

1st stage: XG represents mean exposure levels rather than proportion exposed

2nd stage: xi is a continuous, rather than a dichotomous, exposure variable

Assume a linear relation between xi and disease odds (cf. radon exposure and lung cancer [Weinberg et al., 1996]).

[Figure: disease odds plotted against xi]

For the "only 1st stage" subjects: no bias is expected from using their XG values (Berkson errors), provided MAR in each group, independent of disease status. EM method? Exposure variation in each group?

Two-stage exposure assessment procedure – related work

• Multilevel studies with applications to a study of air pollution [Navidi et al., 1994]: pooling exposure effect estimates based on individual-level and group-level models, respectively

Collecting data on confounders or effect modifiers at 2nd stage

[Diagram: groups with XG = 4.8, XG = 10.1, XG = 20.1, ...; covariate values ci within each group]

1st stage: XG = mean exposure levels

2nd stage: ci is a covariate, e.g. smoking history

Data on confounders or effect modifiers at 2nd stage – estimation of exposure effect

• Confounder adjustment based on logistic regression: pseudo-likelihood approach [Cain & Breslow, 1988]

• More general approach: EM method [Wacholder & Weinberg, 1994]

Design stage (“stage 0”)

[Diagram: Group 1, Group 2, Group 3, ...; number of subjects in each group?]

1st stage: How many geographical areas (groups)?

2nd stage: Fractions of the 1st stage cases and controls?

Design stage – related work

• Two-stage exposure assessment: power depends more strongly on the number of groups than on the number of subjects per group [Navidi et al., 1994]

References I

• Björk & Strömberg. Int J Epidemiol 2002;31:154-60.

• Strömberg & Björk. “Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures”. Submitted.

References II

• Cain & Breslow. Am J Epidemiol 1988;128:1198-1206.

• Navidi et al. Environ Health Perspect 1994;102(Suppl 8):25-32.

• Wacholder & Weinberg. Biometrics 1994;50:350-7.

• Weinberg et al. Epidemiology 1996;7:190-7.
