How to use data to get “The Right Answer”
Donna Spiegelman
Departments of Epidemiology and Biostatistics
Harvard School of Public Health
[email protected]
Triumphs of Modern Epidemiology
• Alcohol and esophageal cancer
• Hep B virus and liver cancer
• HPV and cervical cancer
• H. pylori and peptic ulcer
• Folic acid and neural tube defects
• Asbestos and lung cancer, mesothelioma
• Aniline dye and bladder cancer
• Vinyl chloride and angiosarcoma of the liver
• Nickel and nasal cancer
• Radon and lung cancer *
• Aspirin and MI
• Dalkon Shield IUD and PID ...
www.epimonitor.net/EpiMonday/Triumph62501.htm
http://www-cie.iarc.fr/monoeval/crthgr01.html
Definition of junk science: “It is a hodgepodge of biased data, spurious inference, and logical legerdemain, patched together by researchers whose enthusiasm for discovery and diagnosis far outstrips their skill. It is a catalog of every conceivable kind of error: data dredging, wishful thinking, truculent dogmatism, and, now and again, outright fraud.”
The Start of Junk Science??
The Junkyard Dogs
“Unfortunately, and increasingly today, one can find examples of junk science that compromise the integrity of the field of science and, at the same time, create a scare environment where unnecessary regulations on industry in particular, are rammed through without respect to rhyme, reason, effect or cause.”
- Michael A. Miles, former CEO of the Philip Morris tobacco company
More controversial: “weak effects”
• Air pollution and all-cause mortality, CVD mortality
• Low dose exposure to radon and lung cancer
• Low dose exposure to lead and neurotoxicity in children
• Passive cigarette smoking and lung cancer
• Alcohol and breast cancer
• Oral contraceptives and breast cancer
Population Attributable Risk and Weak Effects: Still Important!
Population attributable risk (%), by relative risk (RR) and exposure prevalence

          Exposure Prevalence (%)
RR        10      40      80
1.2        2       7      14
1.5        5      17      29
2.0       17      44      66
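The slide does not say how the table entries were computed. A minimal sketch, assuming Levin's formula for the population attributable risk, PAR = p(RR - 1) / (1 + p(RR - 1)), with p the exposure prevalence; the function name is illustrative and not from the talk. The sketch only checks the RR = 1.2 and RR = 1.5 rows.

```python
def population_attributable_risk(prevalence, relative_risk):
    """Levin's formula (an assumption here): PAR = p(RR - 1) / (1 + p(RR - 1))."""
    excess = prevalence * (relative_risk - 1.0)
    return excess / (1.0 + excess)


# Tabulate PAR (%) for weak relative risks across three exposure prevalences.
for rr in (1.2, 1.5):
    row = [round(100 * population_attributable_risk(p, rr)) for p in (0.10, 0.40, 0.80)]
    print(f"RR = {rr}: PAR (%) at prevalence 10/40/80% = {row}")
```

The point the slide makes falls out of the formula: even with RR = 1.2, a common exposure (80% prevalence) accounts for a sizeable share of cases.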
The research (and the researcher) that Philip Morris did not want you to see
(Ragnar) Rylander was at that time at another Swedish university and had previously undertaken assignments for both Lorillard (another tobacco company) and Philip Morris. He was to be “officially … carried on the books as a consultant to FTR [Fabriques de tabac réunies, a Philip Morris subsidiary] and would be paid by FTR”.
By means of material from internal industry documents it can be revealed that one company, Philip Morris, acquired a research facility, INBIFO, in Germany and created a complex mechanism seeking to ensure that the work done in the facility could not be linked to Philip Morris. In particular it involved the appointment of a Swedish professor as a ‘co-ordinator’, who would synthesise reports for onward transmission to the USA. Various arrangements were made to conceal this process
Relation between Philip Morris and INBIFO
Source: Diethelm et al. Lancet 2005
“Weak Associations” (NEJM 1990)
Rylander
“Studies … a relation between exposure to environmental tobacco smoke and lung cancer must take into account other … factors and the possibility that exposure to environmental tobacco smoke may be confounded. This has not been considered in the majority of such studies. Until this has been done, the claim of causality between environmental tobacco smoke and lung cancer remains uncertain.”
Angell
“There is no question that epidemiologic studies of risk factors for disease are of growing interest and importance, both for individuals and for the public health. It is important, however, to remember the pitfalls in interpreting them and to be cautious in advising patients on the basis of single or conflicting studies. This is particularly true of studies that purport to show only weak associations between exposures and disease.”
RCTs vs. observational studies
• β-carotene and lung cancer
• HRT and CVD incidence and mortality
• VIOXX
• Dietary fat and breast cancer
- Standard designs & analysis sometimes not adequately controlling for
  - confounding
  - information bias
  - selection bias
Wrong answer?
- Agreed: We can be doing a better job
- Not agreed: HOW
Confounding
What do we do?

Design                                                           Analysis
Randomize                                                        Intent-to-treat
Restrict                                                         Simple (crude)
Match on key confounders                                         Matched
Collect data on known & suspected confounders (time-invariant)   Multivariate models
------ “industry standard”: END of mainstream epi methods ------
Collect data on known & suspected time-varying confounders       MSMs, G-causal algorithm
Confounding – outstanding problems
• unmeasured confounding
  • known or suspected confounders
  • unknown confounders
Fact: ~ 47% of US breast cancer incidence explained by known risk factors (Madigan et al., JNCI, 1987:1681-1695)
r² in most epi regressions (blood pressure, serum hormones): 20%-40% (Pediatric Task Force on BP Control in Children, Pediatrics, 2004; Hankinson, personal communication)
Undiscovered genes?
Unimagined environmental factors?
Complex non-linear interactions?
Solution to confounding by unknown risk factors: randomization
VERY limited applicability
Outstanding questions:
a few strong risk factors or many weak ones?
many rare ones or a few common ones?
modeling of scenarios: do biases cancel?
NEW IDEAS NEEDED
Effects of unknown confounders
References
“The impact of residual and unmeasured confounding in epidemiological studies: a simulation study”, Davey-Smith and colleagues, Am J Epidemiol 2007; 166:646-655
“Poppers, Kaposi’s sarcoma, and HIV infection: empirical evidence of a strong confounding effect?”, Morabia, Prev Med 1995; 24:90-95.
Marshall and Hastrup, Am J Epidemiol, 1999; 150:88-96, 1996; 143: 1069-1078
Unmeasured confounding by known or suspected risk factors:
We can use the data to get ‘the right answer’/’improve the validity of new studies’!
Design: two-stage (Reilly & Salim, http://meb.ki.se/~marrei/software/)
Stage 1 (Di, Ei, C1i), i = 1, . . . , n
Stage 2 (Di, Ei, C1i, C2i), i = 1, . . . , n2 or
(Ei, C1i, C2i), i = 1, . . . , n2
So that
(Di, Ei, C1i, . ), i = n2+ 1, . . . , n1 + n2
n1 >> n2
Analysis: MLE of 2-stage likelihood
References:
Reilly M, 1996; Weinberg & Wacholder, 1990; Zhao & Lipsitz, 1992;
Robins et al., 1994; + many others
Cain & Breslow, AJE, 1988
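f (D | E, C1, C2; β): pdf of complete data
Pr (I | D, E, C1), I = 1 if in stage 2, 0 otherwise

$$f(D, I \mid E, C_1; \beta, \theta) = \Pr(I \mid D, E, C_1) \int_{C_2} f(D \mid E, C_1, c_2; \beta)\, f(c_2 \mid E, C_1; \theta)\, dc_2$$

likelihood of 2-stage design (on the log scale) =

$$\sum_{\text{Stage 1}} \log\left[f(D \mid E, C_1; \beta, \theta)\right] + \sum_{\text{Stage 2}} \log\left[f(D \mid E, C_1, C_2; \beta)\right] + \sum_{\text{Stage 2}} \log\left[f(C_2 \mid E, C_1; \theta)\right]$$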
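As a rough illustration of the two-stage likelihood, the sketch below evaluates it for the simplest case. Everything specific here is an assumption made for the example and not part of the talk: D and C2 are binary, both models are logistic, and the function and parameter names are hypothetical. The selection term Pr(I | D, E, C1) is left out because, when it is fixed by design, it does not involve β or θ.

```python
import math


def expit(x):
    return 1.0 / (1.0 + math.exp(-x))


def p_disease(e, c1, c2, beta):
    """Assumed logistic model for f(D = 1 | E, C1, C2; beta)."""
    b0, b_e, b_c1, b_c2 = beta
    return expit(b0 + b_e * e + b_c1 * c1 + b_c2 * c2)


def p_c2(e, c1, theta):
    """Assumed logistic model for Pr(C2 = 1 | E, C1; theta)."""
    t0, t_e, t_c1 = theta
    return expit(t0 + t_e * e + t_c1 * c1)


def bernoulli_log(y, p):
    return math.log(p if y == 1 else 1.0 - p)


def two_stage_loglik(stage1_only, stage2, beta, theta):
    """stage1_only: (d, e, c1) records with C2 missing; stage2: (d, e, c1, c2) records."""
    total = 0.0
    for d, e, c1 in stage1_only:
        # C2 is unobserved: sum (the discrete analogue of integrating) over its values.
        q = p_c2(e, c1, theta)
        marginal = q * p_disease(e, c1, 1, beta) + (1.0 - q) * p_disease(e, c1, 0, beta)
        total += bernoulli_log(d, marginal)
    for d, e, c1, c2 in stage2:
        total += bernoulli_log(d, p_disease(e, c1, c2, beta))
        total += bernoulli_log(c2, p_c2(e, c1, theta))
    return total
```

In practice the negative of this function would be handed to a numerical optimizer to obtain the MLE of (β, θ); that step is omitted here.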
Example: Kyle Steenland – retrospective cohort study of lung cancer in relation to occupational silica exposure
(Steenland & Greenland, AJE 2004;160:384-392)

f (D | E, C); E = silica, C = smoking

$$f(D \mid E) = \sum_{s=1}^{S} f(D \mid E, C = s)\, \Pr(C = s \mid E)$$

$$\Pr(C = s \mid E = r) = P_{rs}, \quad \text{where } \sum_{s=1}^{S} P_{rs} = 1$$

Likelihood (silica + 1987 smoking data + US smoking data + ACS lung cancer & smoking data), on the log scale:

$$
\underbrace{\sum_{i=1}^{n_1} \log\left[f(D_i \mid E_i)\right]}_{\text{silica}}
+ \underbrace{\sum_{i=1}^{n_2} \sum_{r=1}^{R} \sum_{s=1}^{S} I(E_i = r)\, I(C_i = s) \log P_{rs}}_{\text{1987 silica smoking data}}
+ \underbrace{\sum_{i=1}^{n_3} \sum_{s=1}^{S} I(C_i = s) \log P_{0s}}_{\text{U.S.}}
+ \underbrace{\sum_{i=1}^{n_4} \log f(D_i \mid C_i, E_i = 0)}_{\text{ACS}}
$$

• assume distribution of smoking during entire period ~ 1987 distribution
• could treat as known

n1 silica workers in retrospective cohort study
n2 silica workers in 1987 smoking prevalence study
n3 NHIS participants on general population smoking rates in 1986
n4 ACS prospective cohort data on smoking & lung cancer
r = 1, …, R levels of exposure
s = 1, …, S levels of smoking
Obstacles:
software? Design software available;
Offsets or weights in PROC GENMOD or PROC PHREG can be used for analysis
training?
funding?
Result: The right answer? A better answer?
Is it worth it?
INFORMATION BIAS:
What do we usually do?
NOTHING!
What can we do?
Design                              Analysis
main study/validation study         measurement error methods
MS/EVS, MS/IVS, IVS                 misclassification methods
References:
Carroll, Ruppert, Stefanski, 1995, Chapman + Hall
Rosner et al., AJE, 1990, 1992
Spiegelman, “Reliability studies”, “Validation studies”, Encyclopedia of Biostatistics
Robins et al., JASA, 1994
EXAMPLE
FRAMINGHAM HEART STUDY
MAIN STUDY
- 1731 men free of CHD (non-fatal MI, fatal CHD) at exam 4
- Followed for 10 years for CHD incidence (163 events, cumulative incidence = 9.4%)
REPRODUCIBILITY STUDY
- 1346 men with all risk factor information at exams 2+3 (subgroup of 1731 men)
- Risk factors in main study: Age, BMI, Serum Cholesterol, Serum Glucose, Smoking, SBP
- Risk factors in reproducibility study: Serum Cholesterol, BMI, Serum Glucose, SBP, Smoking
Example: (from Rosner, Spiegelman, Willett; AJE, 1992)
Framingham Heart Study
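Reliability study: (n = 1346 men)

$$
\begin{pmatrix} CHOL_{ij} \\ GLUC_{ij} \\ BMI_{ij} \\ SBP_{ij} \end{pmatrix}
=
\begin{pmatrix} CHOL_{i} \\ GLUC_{i} \\ BMI_{i} \\ SBP_{i} \end{pmatrix}
+ \varepsilon_{ij},
\qquad E(\varepsilon_{ij}) = 0, \quad Var(\varepsilon_{ij}) = \Sigma_w
$$

Left-hand side: subject i’s observed value at time j
Right-hand side: subject i’s true mean

Reliability Coefficients
CHOL 75%
GLUC 52%
BMI 95%
SBP 72%

$$R_I = \frac{\sigma_B^2}{\sigma_B^2 + \sigma_W^2}$$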
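Assumptions
1. Measurement error model

$$Z_{ij} = X_i + \varepsilon_{ij}$$

$$E(\varepsilon_{ij}) = 0; \quad Var(\varepsilon_{ij}) = \sigma_W^2 \quad \text{(within)}$$

$$E(X_i) = \mu; \quad Var(X_i) = \sigma_x^2 \quad \text{(between)}$$

2. Disease incidence model

$$\log\left(\frac{D_i}{1 - D_i}\right) = \beta_0^* + \beta_1^* X_i + \beta_2^* U_i$$

3. • Pr (Di) is small
   • Measurement error independent of disease status
4. Reliability substudy “representative” of main study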
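The Procedure
- For one variable measured with unbiased, additive error {simplest case}:

$$Z = X + \varepsilon, \quad \text{where } Corr(X, \varepsilon) = 0$$

Step 1. Run a logistic regression of D on Z, U in main study

$$\operatorname{logit}\left[\Pr(D = 1 \mid Z, U)\right] = \beta_0 + \beta_1 Z + \beta_2 U$$

Z: measured with error
U: measured without error (>1)

Step 2. Estimate reliability coefficient from reliability substudy (n2 subjects, r replicates). Need same # of replicates per subject.

$$\hat{R}_I = \hat\sigma_x^2 / \left(\hat\sigma_x^2 + \hat\sigma_e^2\right)$$

where

$$\hat\sigma_z^2 = \frac{\sum_{i=1}^{n_2} \left(\bar{Z}_i - \bar{Z}_{..}\right)^2}{n_2 - 1} \ \text{(TOTAL)}, \qquad
\hat\sigma_e^2 = \frac{\sum_{i=1}^{n_2} \sum_{j=1}^{r} \left(Z_{ij} - \bar{Z}_i\right)^2}{n_2 (r - 1)} \ \text{(within-person variance, estimated)}$$

$$\hat\sigma_x^2 = \hat\sigma_z^2 - \hat\sigma_e^2 / r$$

Step 3. Correct.

$$\underbrace{\hat\beta_1^*}_{\text{corrected}} = \underbrace{\hat\beta_1}_{\text{uncorrected}} / \hat{R}_I$$

$$\widehat{Var}\left(\hat\beta_1^*\right) = \underbrace{\frac{\widehat{Var}(\hat\beta_1)}{\hat{R}_I^2}}_{\text{MAIN STUDY}} + \underbrace{\frac{\hat\beta_1^2\, \widehat{Var}(\hat{R}_I)}{\hat{R}_I^4}}_{\text{RELIABILITY STUDY}}$$

RELIABILITY STUDY term: this contributes much less.

$$\widehat{Var}\left(\hat{R}_I\right) = \frac{2\,(n_2 r - 1)\left(1 - \hat{R}_I\right)^2 \left[1 + (r - 1)\hat{R}_I\right]^2}{r^2 (r - 1)\, n_2 (n_2 - 1)}$$

(Donner, Intl Stat Review, 1986)

95% C.I. for odds ratio:

$$\exp\left\{\Delta \left[\hat\beta_1^* \pm 1.96 \sqrt{\widehat{Var}\left(\hat\beta_1^*\right)}\right]\right\}$$

Δ = biologically meaningful comparison, e.g. 90th percentile – 10th percentile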
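The three steps can be sketched in a few lines of code. This is a minimal illustration of the single-variable case only, written from the formulas on these slides as best they can be read; it is not the SAS macro mentioned later in the talk, and all function names are hypothetical. It assumes β̂1 and its variance have already been obtained from the Step 1 logistic regression.

```python
import math


def variance_components(replicates):
    """replicates: one list of r repeated measurements per reliability-study subject."""
    n2 = len(replicates)
    r = len(replicates[0])
    if any(len(row) != r for row in replicates):
        raise ValueError("need the same number of replicates per subject")
    means = [sum(row) / r for row in replicates]
    grand_mean = sum(means) / n2
    var_z = sum((m - grand_mean) ** 2 for m in means) / (n2 - 1)  # variance of subject means
    var_e = sum((z - m) ** 2 for row, m in zip(replicates, means) for z in row) / (n2 * (r - 1))
    var_x = var_z - var_e / r  # between-person variance
    return var_x, var_e, n2, r


def reliability(var_x, var_e):
    return var_x / (var_x + var_e)


def var_reliability(r_i, n2, r):
    """Large-sample variance of the estimated reliability coefficient, as on the slide."""
    num = 2 * (n2 * r - 1) * (1 - r_i) ** 2 * (1 + (r - 1) * r_i) ** 2
    den = r ** 2 * (r - 1) * n2 * (n2 - 1)
    return num / den


def corrected_log_or(beta, var_beta, r_i, var_r_i):
    """Step 3: divide by the reliability coefficient; delta-method variance."""
    beta_star = beta / r_i
    var_star = var_beta / r_i ** 2 + beta ** 2 * var_r_i / r_i ** 4
    return beta_star, var_star


def odds_ratio_ci(beta_star, var_star, delta):
    """Point estimate and 95% CI for the odds ratio over an increment delta."""
    half_width = 1.96 * math.sqrt(var_star)
    return tuple(math.exp(delta * (beta_star + k * half_width)) for k in (0, -1, 1))
```

A typical call chain would be `variance_components` on the replicate data, then `reliability` and `var_reliability`, then `corrected_log_or` with the main-study estimate, and finally `odds_ratio_ci` with the chosen Δ.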
Results: 10-year cumulative incidence of CHD (163 events / 1731 men)

                             Estimated OR (95% CI)
Variable     Δ               Uncorrected           Corrected
GLUC         34 mg/dl        1.27 (0.97, 1.66)     1.75 (0.87, 3.52)
CHOL         100 mg/dl       2.21 (1.43, 3.39)     2.91 (1.62, 5.24)
BMI          9.7 kg/m2       1.64 (1.04, 2.58)     1.49 (0.92, 2.43)
SBP          49 mmHg         2.80 (1.85, 4.24)     3.93 (2.19, 7.05)
SMOKE        30 cig/day      1.70 (1.17, 2.47)     1.69 (1.16, 2.47)
AGE 45-54                    2.05 (1.27, 3.33)     1.89 (1.16, 3.07)
AGE 55-64                    3.21 (1.95, 5.29)     2.85 (1.72, 4.74)
AGE 65-69                    4.30 (2.06, 8.98)     3.73 (1.67, 8.35)
General framework for estimation and inference in failure time regression models
- Main study/validation study designs
The data:
(Di, Ti, Xi, Vi), i = 1, . . ., n1 main study subjects
(Di, Ti, xi, Xi, Vi), i = n1 + 1, . . ., n1 + n2 validation study subjects
where Ti = survival time
Di = 1 if case at Ti, 0 o.w.
xi = perfect exposure measurement
Xi = surrogate exposure measurement for x
Vi = other perfectly measured covariate data
- assume sampling into validation study is at random
Spiegelman and Logan, submitted
Effect of radon exposure on lung cancer mortality rates:
UNM uranium miners
                                          Mortality RR (95% CI)
              $\hat\beta_1$ (SE) $\times 10^3$     Δ = 100 WLM        Δ = 500 WLM
Uncorrected   3.52 (0.658)              1.4 (1.3, 1.6)     5.8 (3.1, 11)
EPL           5.00 (1.00)               1.7 (1.4, 2.0)     12 (4.6, 32)

• > 30% attenuation in $\hat\beta_1$
• policy implications for risk assessment
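The slide does not state the rate model behind the radon table. A small worked check, assuming a log-linear model in which the mortality rate ratio for an exposure increment Δ is exp(β1 × Δ), gives values close to the tabulated ones (exact agreement is not expected, since the slide shows rounded coefficients). The function name is illustrative.

```python
import math


def rate_ratio(beta_per_wlm, delta_wlm):
    """Rate ratio for an increment of delta_wlm under an assumed log-linear model."""
    return math.exp(beta_per_wlm * delta_wlm)


# Coefficients as shown on the slide (beta x 10^3 = 3.52 and 5.00).
estimates = {"Uncorrected": 3.52e-3, "EPL": 5.00e-3}
for label, beta in estimates.items():
    ratios = [round(rate_ratio(beta, delta), 1) for delta in (100, 500)]
    print(label, "RR at 100 and 500 WLM:", ratios)
```

Because the rate ratio is exponential in β1, a moderate attenuation of the coefficient roughly halves the estimated RR at 500 WLM, which is why the correction matters for risk assessment.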
Nutritional epidemiology:
Tworoger SS, Eliassen AH, Rosner B, Sluss P, Hankinson SE. Plasma prolactin concentrations and risk of premenopausal breast cancer. Cancer Research, 2004;64:6814-6819.
Hankinson SE, Willett WC, Michaud DS, Manson JE, Colditz GA, Longcope C, Rosner B, Speizer FE. Plasma prolactin levels and subsequent risk of breast cancer in postmenopausal women. Journal of the National Cancer Institute 1999; 91:629-634.
Smith-Warner SA, Spiegelman D, Adami H, Beeson L, van den Brandt P, Folsom A, Fraser G, Freudenheim J, Goldbohm R, Graham S, Kushi L, Miller A, Rohan T, Speizer FE, Toniolo P, Willett WC, Wolk A, Zeleniuch-Jacquotte A, Hunter DJ. Types of dietary fat and breast cancer: a pooled analysis of cohort studies. International Journal of Cancer 2001; 92:767-774.
Holmes MD, Stampfer MJ, Wolf AM, Jones CP, Spiegelman D, Manson JE, Colditz GA. Can behavioral risk factors explain the difference in body mass index between African-American and European-American women? Ethnicity and Disease 1999; 8:331-339.
Rich-Edwards JW, Hu F, Michels K, Stampfer MJ, Manson JE, Rosner B, Willett WC. Breastfeeding in infancy and risk of cardiovascular disease in adult women. Epidemiology, 2004; 15:550-556.
Koh-Banerjee P, Chu NF, Spiegelman D, Rosner B, Colditz GA, Willett WC, Rimm EB. Prospective study of the association of changes in dietary intake, physical activity, alcohol consumption, and smoking with 9-year gain in waist circumference among 15,587 men. Am J Clin Nutr 2003; 78:719-727.
Koh-Banerjee P, Franz M, Sampson L, Liu S, Jacobs Jr. DR, Spiegelman D, Willett WC, Rimm EB. Changes in whole grain, bran and cereal fiber consumption in relation to 8-year weight gain among men. Am J Clin Nutr, 2004; 5:1237-1245.
Environmental epidemiology
Keshaviah AP, Weller EA, Spiegelman D. Occupational exposure to methyl tertiary-butyl ether in relation to key health symptom prevalence: the effect of measurement error correction. Environmetrics, 2002; 14:573-582.
Thurston SW, Williams P, Hauser R, Hu H, Hernandez-Avila M, Spiegelman D. A comparison of regression calibration methods for measurement error in main study/internal validation study designs. Journal of Statistical Planning and Inference, 2005; 131:175-190.
Weller EA, Milton DK, Eisen EA, Spiegelman D. Regression calibration for logistic regression with multiple surrogates for one exposure. Journal of Statistical Planning and Inference, 2007; 137:449-461.
Horick N, Milton DK, Gold D, Weller E, Spiegelman D. Household dust endotoxin exposure and respiratory effects in infants: correction for measurement error bias. Environmental Health Perspectives, 2006; 114:135-140.
Li R, Weller EA, Dockery DW, Neas LM, Spiegelman D. Association of indoor nitrogen dioxide with respiratory symptoms in children: the effect of measurement error correction with multiple surrogates. Journal of Exposure Analysis and Environmental Epidemiology, 2006; 16:342-350.
Fetal lead exposure in relation to birth weight; MS/IVS; bone lead vs. cord lead (r=0.19)
Metal working fluids exposure in relation to lung function; MS/EVS; job characteristics vs. personal monitors (r=0.82)
SOFTWARE IS AVAILABLE!
• http://www.hsph.harvard.edu/facres/spglmn.html
SAS macros for regression calibration in main study/validation study designs (Rosner et al., AJE, 1990, 1992; Spiegelman et al., AJCN, 1997; Spiegelman et al., SIM, 2001)
• STATA (Carroll et al. SIMEX, regression calibration)
So why are methods under-utilized?
No validation data
Insufficient training of statisticians & epidemiologists
Either/or about assumptions
Quantitative correction for selection bias:
Design                              Analysis
main study/’selection’ study        ML
                                    SPE E-E
Note: large overlap w/ missing data literature when D is missing, potential for selection bias
References:
ML: Little & Rubin, Wiley, 1986
SPE E-E: Scharfstein et al., 1998; Rotnitzky et al., 1997; Robins et al., 1995
Basic idea:
Let I=1 if selected, 0 otherwise,
Pr (I | E, C) = selection probability
Selection study has data on those not in main study (Di, Ei, Ci = (Ci, Ui ), i=1, …, n2)
Surrogates for D,
risk factors for D
IPW: Wi = 1 / Pr (Ii = 1 | Di, Ei, Ci)
Use PROC GENMOD w/ robust variance + weights Wi; i=1, …, n1
REPEATED SUBJECT = ID / TYPE = IND;
Mail, phone, house visit to get data
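The inverse-probability-weighting idea can be sketched without any modeling when D, E and C are all categorical. In the illustration below, selection probabilities are estimated as the selected fraction within each (D, E, C) cell. It assumes the selection study covers all non-participants (or has been scaled up to represent them), and the function names are hypothetical; a real analysis would fit a weighted regression with a robust variance, as the slide indicates.

```python
from collections import Counter


def selection_weights(selected, not_selected):
    """Records are (d, e, c) tuples. Returns {cell: W}, with W = 1 / Pr(I = 1 | d, e, c)
    estimated by the fraction of each (d, e, c) cell that ended up in the main study."""
    n_in = Counter(selected)
    n_out = Counter(not_selected)
    return {cell: (n_in[cell] + n_out[cell]) / n_in[cell] for cell in n_in}


def weighted_risk(selected, weights, exposure):
    """Weighted proportion with D = 1 among main-study subjects at the given exposure."""
    cases = sum(weights[rec] for rec in selected if rec[1] == exposure and rec[0] == 1)
    total = sum(weights[rec] for rec in selected if rec[1] == exposure)
    return cases / total


# Toy data: exposed cases were under-selected into the main study.
main_study = [(1, 1, 0), (1, 1, 0), (0, 1, 0), (0, 1, 0)]
selection_study = [(1, 1, 0), (1, 1, 0)]
weights = selection_weights(main_study, selection_study)
print("weighted risk among exposed:", weighted_risk(main_study, weights, exposure=1))
```

Each selected subject stands in for W subjects like them in the target population, so cells that were under-selected are weighted up.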
For dependent censoring (a.k.a. biased loss to follow-up):

$$W_i(t) = \Pr{}^{-1}\left(I_i(t) = 1 \mid E_i(t), C_i(t)\right)$$

Assumes

$$\Pr\left(I_i(t) = 1 \mid E_i(t), C_i(t), T_i, D_i\right) = \Pr\left(I_i(t) = 1 \mid E_i(t), C_i(t)\right)$$
ENAR 2007 Spring Meeting
Double Sampling Designs for Addressing Loss to Follow-up In Estimating Mortality
Ming-Wen An, Johns Hopkins University
Constantine Frangakis, Johns Hopkins University
Donald B. Rubin, Harvard University
Constantin T. Yiannoutsos, Indiana University School of Medicine
Loss to follow-up is an important challenge that arises when estimating mortality, and is of particular concern in developing countries. In the absence of more active follow-up systems, resulting mortality estimates may be biased. One design approach to address this is ‘double sampling’, where a subset of patients who are lost to follow-up is chosen to be actively followed, often subject to resource constraints, with the goal of obtaining valid and efficient estimators. We demonstrate our results using data from Africa, which were collected to estimate HIV mortality as part of the evaluation of the President’s Emergency Plan for AIDS Relief (PEPFAR).
CONCLUSIONS
- Methods EXIST for efficient study design and valid data analysis when standard design with standard analysis gives the wrong answer
- Barriers to utilization
• software gaps
• software unfriendly, no QC
• inadequate training of students + practitioners (Epi & Biostat)
• are two-stage designs fundable @ NIH?
- Why do epidemiologists routinely adjust for one source of bias only?
(confounding by measured risk factors)