TWO-STAGE CASE-CONTROL STUDIES USING EXPOSURE
ESTIMATES FROM A GEOGRAPHICAL INFORMATION
SYSTEM
Jonas Björk (1) & Ulf Strömberg (2)
(1) Competence Center for Clinical Research
(2) Occupational and Environmental Medicine
Lund University Hospital
OUTLINE OF TALK
• Previous project: What have we done? (Jonas Björk)
• Ongoing project: What shall we do? (Ulf Strömberg)
Two-stage procedure for case-control studies
1st stage: Complete data obtained from registries
• Disease status
• General characteristics
• Group affiliation (e.g. occupation or residential area)
• Group-level exposure XG
2nd stage: Individual exposure data for a subset of the 1st stage sample
Exposure database → group-level exposure
• JEM = Job Exposure Matrix: occupational group → proportion exposed
• GIS: residential group (area) → average concentration of an air pollutant
JEM - proportion exposed
[Figure: proportion exposed XG (axis from 0 to 0.5) for Group 0 to Group 4. Most data are typically in groups with low XG.]
Linear Relation between Proportion Exposed and Relative Risk
• No confounding between/within groups
• Example: RR (exposed vs. unexposed) = 2.0

Proportion exposed XG    Average RR
  0%                     1.0
 10%                     0.10 * 2.0 + 0.90 * 1.0 = 1.1
 50%                     1.5
100%                     2.0
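The arithmetic in the table can be checked with a minimal Python sketch. The function name `average_rr` is hypothetical (it does not come from the talk), and the sketch assumes no confounding, so that the group-level RR is simply the exposure-weighted mix of the RR among the exposed and the reference value 1.0 among the unexposed.

```python
def average_rr(proportion_exposed: float, rr_exposed: float = 2.0) -> float:
    """Average RR in a group: exposed subjects contribute rr_exposed,
    unexposed subjects contribute the reference value 1.0."""
    return proportion_exposed * rr_exposed + (1.0 - proportion_exposed) * 1.0


# Reproduce the rows of the table for RR (exposed vs. unexposed) = 2.0
for p in (0.0, 0.10, 0.50, 1.0):
    print(f"XG = {p:.0%}: average RR = {average_rr(p):.1f}")
```

The same quantity can be written as 1 + (RR - 1) * XG, which makes the linearity in XG explicit.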
Linear OR model: OR(XG) = 1 + β XG
XG = Exposure proportion
OR for exposed vs. unexposed = OR(1) = 1 + β
[Figure: OR(XG) plotted against XG, rising linearly from 1 at XG = 0 to OR(1) at XG = 1. Most data are typically in groups with low XG.]
Confounding between groups
• General confounders (e.g. gender and age) can normally be adjusted for
• Assuming no confounding within groups and no effect modification in any stratum (s1, s2, ..., sk):
OR(XG; s1, s2, ..., sk) = (1 + β XG) exp(γ1 s1 + γ2 s2 + ... + γk sk)
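As an illustration only, the model above can be evaluated with a few lines of Python. The names `odds_ratio`, `beta`, `gammas` and `strata` are hypothetical, and the parameter values in the example are made up rather than estimates from the talk.

```python
import math
from typing import Sequence


def odds_ratio(x_g: float, beta: float,
               gammas: Sequence[float] = (),
               strata: Sequence[float] = ()) -> float:
    """OR(XG; s1..sk) = (1 + beta * XG) * exp(gamma1*s1 + ... + gammak*sk)."""
    if len(gammas) != len(strata):
        raise ValueError("gammas and strata must have the same length")
    linear_part = 1.0 + beta * x_g  # linear effect of the group-level exposure
    confounder_part = math.exp(sum(g * s for g, s in zip(gammas, strata)))
    return linear_part * confounder_part


# Without confounders, OR(1) = 1 + beta is the OR for exposed vs. unexposed
print(odds_ratio(1.0, beta=2.0))
# One binary stratum indicator (s1 = 1) with a made-up coefficient
print(odds_ratio(0.5, beta=2.0, gammas=[0.3], strata=[1]))
```

The multiplicative form means the confounders shift the baseline odds but leave the linear exposure effect unchanged, which is what "no effect modification" amounts to here.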
Combining 1st and 2nd stage data
• Assumption: 2nd stage data are missing at random (MAR), conditional on disease status and 1st stage group affiliation
• For subjects with missing 2nd stage data: use 1st stage data to calculate the expected number of exposed/unexposed
• Expectation-maximization (EM) algorithm
EM algorithm (Wacholder & Weinberg 1994)
1. Select a starting value, e.g. OR = 1
2. E-step: among the non-participants, calculate the expected number of exposed/unexposed cases and controls in each group
3. M-step: maximize the likelihood for the observed + expected cell frequencies using the chosen risk model for individual-level data (not necessarily linear); this gives a new OR estimate
4. Repeat steps 2 and 3 until convergence
E-step in our situation (Strömberg & Björk, submitted)
Complete the data in each group G:
• m0 controls with missing 2nd stage data:
  expected number of exposed = m0 * XG
• m1 cases with missing 2nd stage data:
  expected number of exposed = m1 * XG * ÔR / [1 + (ÔR - 1) * XG]
ÔR = current OR estimate
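The E-step expressions above can be turned into a small iteration. The following is a minimal sketch, not the authors' implementation: the `Group` record and function names are hypothetical, and the M-step is simplified to the crude OR of the pooled, completed 2x2 table, which assumes no confounding between or within groups. The talk allows any individual-level risk model in the M-step. The example counts are constructed (fractional) so that they agree exactly with OR = 3.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Group:
    x_g: float             # group-level proportion exposed (assumed known)
    cases_exp: float       # 2nd stage cases observed as exposed
    cases_unexp: float     # 2nd stage cases observed as unexposed
    controls_exp: float    # 2nd stage controls observed as exposed
    controls_unexp: float  # 2nd stage controls observed as unexposed
    m1: float              # cases with missing 2nd stage data
    m0: float              # controls with missing 2nd stage data


def e_step(g: Group, or_hat: float) -> Tuple[float, float]:
    """Expected number of exposed among the m1 cases and m0 controls."""
    exp_cases = g.m1 * g.x_g * or_hat / (1.0 + (or_hat - 1.0) * g.x_g)
    exp_controls = g.m0 * g.x_g
    return exp_cases, exp_controls


def em_odds_ratio(groups: List[Group], or_start: float = 1.0,
                  tol: float = 1e-10, max_iter: int = 1000) -> float:
    or_hat = or_start
    for _ in range(max_iter):
        a = b = c = d = 0.0  # completed table: cases exp/unexp, controls exp/unexp
        for g in groups:
            e1, e0 = e_step(g, or_hat)
            a += g.cases_exp + e1
            b += g.cases_unexp + (g.m1 - e1)
            c += g.controls_exp + e0
            d += g.controls_unexp + (g.m0 - e0)
        or_new = (a * d) / (b * c)  # simplified M-step: crude OR
        if abs(or_new - or_hat) < tol:
            return or_new
        or_hat = or_new
    return or_hat


# Constructed example: 75% of cases and 25% of controls have 2nd stage data
example_groups = [
    Group(0.1, 22.5, 67.5, 2.5, 22.5, m1=30, m0=75),
    Group(0.3, 67.5, 52.5, 7.5, 17.5, m1=40, m0=75),
    Group(0.5, 112.5, 37.5, 12.5, 12.5, m1=50, m0=75),
]
print(round(em_odds_ratio(example_groups), 4))
```

The sketch does not guard against empty cells (a zero denominator in the crude OR), and it yields no standard error. Both would need separate treatment in practice.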
Simulated case-control studies
• 400 cases, 1200 controls in the 1st stage
• 2nd stage participation:
  75% of the cases
  25% of the controls
• Selective participation of 2nd stage controls:
  Corr(Participation, XG) = 0, > 0, < 0
• 1000 replications in each scenario
• True OR = 3
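One replication of such a study might be generated roughly as follows. This is a sketch under assumptions that the slides do not state: the group-level proportions in `x_groups` are made up, controls are spread evenly over groups, cases are allocated to groups in proportion to 1 + (OR - 1) * XG, and participation is non-selective (the Corr = 0 scenario only). The function name `simulate_study` is hypothetical.

```python
import random
from typing import List, Tuple

# (is_case, x_g, exposed, participates_in_2nd_stage)
Record = Tuple[int, float, bool, bool]


def simulate_study(x_groups: List[float], n_cases: int = 400,
                   n_controls: int = 1200, true_or: float = 3.0,
                   p_part_cases: float = 0.75, p_part_controls: float = 0.25,
                   seed: int = 0) -> List[Record]:
    rng = random.Random(seed)
    records: List[Record] = []

    # Controls: group chosen uniformly (assumption), exposure ~ Bernoulli(XG)
    for _ in range(n_controls):
        x_g = rng.choice(x_groups)
        exposed = rng.random() < x_g
        participates = rng.random() < p_part_controls
        records.append((0, x_g, exposed, participates))

    # Cases: over-represented in high-exposure groups (assumption)
    weights = [1.0 + (true_or - 1.0) * x for x in x_groups]
    for _ in range(n_cases):
        x_g = rng.choices(x_groups, weights=weights)[0]
        p_exposed = x_g * true_or / (1.0 + (true_or - 1.0) * x_g)
        exposed = rng.random() < p_exposed
        participates = rng.random() < p_part_cases
        records.append((1, x_g, exposed, participates))

    return records


study = simulate_study([0.0, 0.1, 0.2, 0.3, 0.4])
n_second_stage = sum(1 for r in study if r[3])
print(len(study), n_second_stage)
```

Selective participation could be mimicked by letting the control participation probability depend on XG, but the exact mechanism used in the talk's simulations is not given here.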
Simulations - Results
                       1st stage data only       2nd stage data only       EM method
                       (400 + 1200)              (300 + 300)               (400 + 1200)
Participation          OR    SD     Coverage     OR    SD     Coverage     OR    SD     Coverage
Corr(Part., XG) = 0    3.0   0.18   95.0%        3.0   0.23   95.6%        3.0   0.15   95.5%
Corr(Part., XG) < 0    3.0   0.18   95.0%        5.3   0.29   45.8%        3.0   0.15   95.0%
Corr(Part., XG) > 0    3.0   0.18   95.0%        1.8   0.20   32.9%        3.0   0.15   95.5%

SD = empirical standard deviation of the ln(OR) estimates
Coverage = coverage of 95% confidence intervals
Simulations - Conclusions
Combining 1st and 2nd stage data using the EM method can:
1. Improve precision
2. Remove bias from selective participation
Method is sensitive to errors in the
(1st stage) external exposure data!
Simulations – Conclusions II
EM-method is sensitive to
1. Violations of the MAR assumption
(conditional on disease status and 1st stage group affiliation)
2. Errors in the (1st stage) external exposure data
Ongoing methodological research project
• Focus on exposure estimates from a GIS
GIS data: NO2 (Scania)
Two-stage exposure assessment procedure
[Figure: residential areas (groups) with XG = 4.8, XG = 10.1, XG = 20.1, ...; individual exposures xi within each group]
1st stage: XG represents mean exposure levels rather than proportion exposed
2nd stage: xi is a continuous, rather than a dichotomous, exposure variable
Assume a linear relation between xi and disease odds (cf. radon exposure and lung cancer [Weinberg et al., 1996]).
[Figure: disease odds plotted against xi]
For the “only 1st stage” subjects: no bias is expected from using their XG values (Berkson errors), provided MAR in each group, independent of disease status. EM method? Exposure variation in each group?
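A small numeric check may help show why the linear model matters for the Berkson-error argument. Under a linear relation, the average of the individual relative odds in a group equals the relative odds evaluated at the group mean XG. The slope `beta` and the individual exposures below are made-up values, and the sketch ignores sampling variability and the case-control design.

```python
from statistics import mean


def relative_odds(x: float, beta: float) -> float:
    """Linear model: odds relative to zero exposure."""
    return 1.0 + beta * x


beta = 0.05                                # hypothetical slope per unit of exposure
x_individual = [2.8, 4.1, 4.8, 5.5, 6.8]   # hypothetical xi values in one group
x_group = mean(x_individual)               # plays the role of XG for this group

avg_of_individual = mean([relative_odds(x, beta) for x in x_individual])
at_group_mean = relative_odds(x_group, beta)

print(avg_of_individual, at_group_mean)
```

With a non-linear model (for example, exponential in xi) the two quantities would generally differ, so substituting XG for xi would no longer be harmless.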
Two-stage exposure assessment procedure – related work
• Multilevel studies with applications to a study of air pollution [Navidi et al., 1994]: pooling exposure effect estimates based on individual-level and group-level models, respectively
Collecting data on confounders or effect modifiers at 2nd stage
[Figure: groups with XG = 4.8, XG = 10.1, XG = 20.1, ...; individual covariate values ci within each group]
1st stage: XG = mean exposure levels
2nd stage: ci is a covariate, e.g. smoking history
Data on confounders or effect modifiers at 2nd stage –
estimation of exposure effect
• Confounder adjustment based on logistic regression: pseudo-likelihood approach [Cain & Breslow, 1988]
• More general approach: EM method [Wacholder & Weinberg, 1994]
Design stage (“stage 0”)
[Figure: Group 1, Group 2, Group 3, ...; subjects in each group marked “?”]
1st stage: How many geographical areas (groups)?
2nd stage: Fractions of the 1st stage cases and controls?
Design stage – related work
• Two-stage exposure assessment: power depends more strongly on the number of groups than on the number of subjects per group [Navidi et al., 1994]
References I
• Björk & Strömberg. Int J Epidemiol 2002;31:154-60.
• Strömberg & Björk. “Incorporating group-level exposure information in case-control studies with missing data on dichotomous exposures”. Submitted.
References II
• Cain & Breslow. Am J Epidemiol 1988;128:1198-1206.
• Navidi et al. Environ Health Perspect 1994;102(Suppl 8):25-32.
• Wacholder & Weinberg. Biometrics 1994;50:350-7.
• Weinberg et al. Epidemiology 1996;7:190-7.