4 threats to validity from confounding bias and effect modification
Threats to Validity from Confounding and Effect Modification
• Overview: Random vs. systematic error
• Confounding
• Effect Modification
• Logistic regression (time permitting)
• Special thanks for some of the materials in these lectures:
– Professor Jen Ahern (UCB)
– Professor Madhu Pai (McGill, a former 250b GSI)
2014 Page 1
The cardinal rule of epidemiology
• Remember that all results based on epidemiology studies are likely to be …
The cardinal rule of epidemiology (continued)
• WRONG…
– unless proper care has been taken to eliminate all sources of error in the estimate (…and sometimes even then the results will be wrong because of unknown sources of error)

Example: Confounding
• A colleague with outside funding believes that cigarette smoke is not a “cause” (in any sense) of lung cancer but that exposure to matches (yes, matches) is the cause. This colleague has conducted a large case-control study to test the null hypothesis:
H0: “Matches are not associated with lung cancer”.
• What’s the rationale (in the Popperian sense) for stating the null hypothesis rather than the alternative:
HA: “Matches are associated with lung cancer”?
• What does the colleague hope to do (in terms of hypothesis testing)?
• What do you think of the term “associated”? Would it be better to write “a cause of”?

• “We can never finally prove our scientific theories, we can merely (provisionally) confirm or (conclusively) refute them.” - Karl Popper
Sir Karl Raimund Popper CH FBA FRS (28 July 1902 – 17 September 1994) was an Austrian-British philosopher and professor at the London School of Economics. He is generally regarded as one of the greatest philosophers of science of the 20th century. Popper is known for his rejection of the classical inductivist views on the scientific method, in favour of empirical falsification. (wikipedia.com)

Confounding: smoking, matches, and lung cancer
• Your colleague has located 1000 cases of lung cancer, of whom 820 carry matches.
• Among 1000 reference patients (selected randomly from a population with recently taken normal chest x-rays), 340 carry matches.
• Strengths of the reference selection process? Weaknesses?
• Describe the relationship between matches and lung cancer in your colleague’s data.
• Would you like to analyze the data in any other fashion?

Confounding: smoking, matches, and lung cancer
• Odds ratio = (820 * 660) / (180 * 340)
• OR = 8.8
• 95% CI (7.2, 10.9)
              Cancer   No cancer
Matches        820       340
No matches     180       660

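The odds ratio and interval on this slide can be checked with a few lines of Python. This is a minimal sketch that assumes the interval is the usual log-based (Woolf) approximation; the slide does not say which method was used, and the function name `odds_ratio_ci` is only illustrative.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a log-based (Woolf) confidence interval.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    The Woolf interval is an assumption; the slide does not name its method.
    """
    or_hat = (a * d) / (b * c)
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper

# Matches vs. lung cancer, pooled over smoking status (counts from the slide).
or_hat, lower, upper = odds_ratio_ci(a=820, b=340, c=180, d=660)
print(f"OR = {or_hat:.1f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

With the slide's counts this approximation gives values that round to the OR and interval shown above.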
Confounding: smoking, matches, and lung cancer
• You decide to look at the relationship between matches and lung cancer in the smokers separately from the non-smokers.
• You find that among the 1000 cases, 900 are smokers and 810 (of the 900) carry matches.
• Among the 1000 reference patients, 300 are smokers and 270 (of the 300) carry matches.
• Calculate the relevant measure(s) of effect.
• What should your colleague do about future funding?

Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches        820       340
No Matches     180       660
Smokers       Cancer   No cancer
Matches        810       270
No Matches      90        30
Non-smokers   Cancer   No cancer
Matches         10        70
No Matches      90       630

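A short sketch of the stratified calculation: the two smoking strata are entered as cell counts, each stratum gets its own odds ratio, and the strata are summed to recover the pooled table. The dictionary layout and variable names are illustrative choices, not part of the lecture.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a 2x2 table laid out as (a, b, c, d)."""
    return (a * d) / (b * c)

# Each tuple is (matches & cancer, matches & no cancer,
#                no matches & cancer, no matches & no cancer).
strata = {
    "smokers": (810, 270, 90, 30),
    "nonsmokers": (10, 70, 90, 630),
}

# Collapsing over smoking status means adding the strata cell by cell.
pooled = tuple(sum(cells) for cells in zip(*strata.values()))

stratum_ors = {name: odds_ratio(*cells) for name, cells in strata.items()}
pooled_or = odds_ratio(*pooled)

print("pooled table:", pooled)
print("pooled OR:", round(pooled_or, 2))
for name, value in stratum_ors.items():
    print(f"OR among {name}: {value:.1f}")
```

The stratum-specific odds ratios are both 1.0 even though the collapsed table gives 8.84, which is the pattern the slide is pointing to.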
Confounding: smoking, matches, and lung cancer
• To be complete, you also decide to examine the relationship between smoking and lung cancer.
• What tables should you construct to do this?

Confounding: smoking, matches, and lung cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
Pooled        Cancer   No cancer
Smoking        900       300
No Smoking     100       700
Matches       Cancer   No cancer
Smoking        810       270
No Smoking      10        70
No matches    Cancer   No cancer
Smoking         90        30
No Smoking      90       630

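One way to build intuition about the interval widths is to look at the standard error of ln(OR) in each table. The sketch below assumes the log-based (Woolf) standard error, sqrt(1/a + 1/b + 1/c + 1/d); the intervals printed on the slide appear to come from a different (possibly exact) method, so the numbers will not match exactly, but the ordering of the widths is the point.

```python
import math

def se_log_or(a, b, c, d):
    """Woolf standard error of ln(OR); dominated by the smallest cell."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# (smoking & cancer, smoking & no cancer, no smoking & cancer, no smoking & no cancer)
tables = {
    "pooled": (900, 300, 100, 700),
    "matches": (810, 270, 10, 70),
    "no matches": (90, 30, 90, 630),
}

standard_errors = {name: se_log_or(*cells) for name, cells in tables.items()}
for name, se in standard_errors.items():
    print(f"{name}: SE of ln(OR) = {se:.3f}, smallest cell = {min(tables[name])}")
```

All three odds ratios equal 21.0, yet the stratum with a cell of only 10 subjects has the largest standard error and therefore the widest interval.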
Confounder?
[Diagram: Exposure → Disease, with a possible confounder (marked “?”) linked to both; the unadjusted RR is contrasted with the adjusted RR]

Why is confounding so important in epidemiology?
● BMJ Editorial: “The scandal of poor epidemiological research” [16 October 2004]
● “Confounding, the situation in which an apparent effect of an exposure on risk is explained by its association with other factors, is probably the most important cause of spurious associations in observational epidemiology.”
BMJ 2004;329:868-869 (16 October)

Overview
● Causality is the central concern of epidemiology
● Confounding is the central concern with establishing causality
● Confounding can be understood using multiple different approaches
● A strong understanding of various approaches to confounding and its control is essential for all those who engage in health research

Confounding is one of the key biases in identifying causal effects
[Diagram: the observed RR (association) is separated from the causal RR (“truth”) by the following layers]
Causal Effect
Random Error
Confounding
Information bias (misclassification)
Selection bias
Bias in inference
Reporting & publication bias
Bias in knowledge use
Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.

Confounding: 4 ways to understand it!
1. “Mixing of effects”
2. “Classical” approach based on a priori criteria
3. Collapsibility and data-based criteria
4. “Counterfactual” and non-comparability approaches

First approach: Confounding: mixing of effects
● “Confounding is confusion, or mixing, of effects; the effect of the exposure is mixed together with the effect of another variable, leading to bias” - Rothman, 2002
Latin: “confundere” is to mix together
Rothman KJ. Epidemiology: An Introduction. Oxford: Oxford University Press, 2002

Example: Association between birth order and Down syndrome
Data from Stark and Mantel (1966). Source: Rothman 2002

Association between maternal age and Down syndrome
Data from Stark and Mantel (1966). Source: Rothman 2002

Association between maternal age and Down syndrome, stratified by birth order
Data from Stark and Mantel (1966). Source: Rothman 2002

Mixing of Effects: the water pipes analogy
[Diagram: Confounder, Exposure, and Outcome]
Mixing of effects – cannot separate the effect of exposure from that of confounder
Exposure and disease share a common cause (‘parent’)
Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall, 2003

Mixing of Effects: “control” of the confounder
[Diagram: Confounder, Exposure, and Outcome]
Successful “control” of confounding (adjustment)
If the common cause (‘parent’) is blocked, then the exposure – disease association becomes clearer
Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall, 2003

Second approach: “Classical” approach based on a priori criteria
“Bias of the estimated effect of an exposure on an outcome due to the presence of a common cause of the exposure and the outcome” – Porta 2008
● A factor is a confounder if 3 criteria are met:
● a) a confounder must be causally or noncausally associated with the exposure in the source population (study base) being studied;
● b) a confounder must be a causal risk factor (or a surrogate measure of a cause) for the disease in the unexposed cohort; and
● c) a confounder must not be an intermediate cause (in other words, a confounder must not be an intermediate step in the causal pathway between the exposure and the disease)

Confounding Schematic
[Diagram: Confounder (C) linked to both Exposure (E) and Disease (outcome) (D)]
Szklo M, Nieto FJ. Epidemiology: Beyond the Basics. Aspen Publishers, Inc., 2000. Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th Edition.

[Diagram: Exposure (E), Confounder (C), Disease (D); here C is shown as an intermediate cause]

[Diagram: Exposure (E), Confounder (C), Disease (D)]
General idea: a confounder could be a ‘parent’ of the exposure, but should not be a ‘daughter’ of the exposure

Example of schematic (from Gordis)
Confounding Schematic
[Diagram: Birth Order (E), Down Syndrome (D), Confounding factor: Maternal Age (C)]

Association between HRT and heart disease
[Diagram: HRT use, Heart disease, Confounding factor: SES]
Are confounding criteria met?

[Diagram: BRCA1 gene, Breast cancer, Confounding factor: Age, with one link crossed out (x)]
Are confounding criteria met? Should we adjust for age, when evaluating the association between a genetic factor and risk of breast cancer?
No!

[Diagram: Sex with multiple partners, Cervical cancer, Confounding factor: HPV]
Are confounding criteria met?

[Diagram: Sex with multiple partners → HPV → Cervical cancer]
What if this was the underlying causal mechanism?

[Diagram: Obesity, Mortality, Confounding factor: Hypertension]
Are confounding criteria met?

[Diagram: Obesity → Hypertension → Mortality]
What if this was the underlying causal mechanism?

Direct vs indirect effects
[Diagram: Obesity → Hypertension → Mortality (indirect effect); Obesity → Mortality (direct effect)]
Direct effect is the portion of the total effect that does not act via an intermediate cause

Simple causal graphs
[Diagram: nodes E, D, C]
Maternal age (C) can confound the association between multivitamin use (E) and the risk of certain birth defects (D)
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

Complex causal graphs
[Diagram: nodes E, D, C, U]
History of birth defects (C) may increase the chance of periconceptional vitamin intake (E). A genetic factor (U) could have been the cause of previous birth defects in the family, and could again cause birth defects in the current pregnancy
Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

More complicated causal graphs!
[Diagram nodes: A = Smoking, B = Physical Activity, C = BMI, D = Bone fractures, E = Calcium supplementation, U]
Source: Hertz-Picciotto

The ultimate complex causal graph!
A PowerPoint diagram meant to portray the complexity of American strategy in Afghanistan!

Third approach: Collapsibility and data-based approaches
● According to this definition, a factor is a confounding variable if
● a) the effect measure is homogeneous across the strata defined by the confounder and
● b) the crude and common stratum-specific (adjusted) effect measures are unequal (this is called “lack of collapsibility”)
● Usually evaluated using 2x2 tables, and simple stratified analyses to compare crude effects with adjusted effects
“Collapsibility is equality of stratum-specific measures of effect with the crude (collapsed), unstratified measure” Porta, 2008, Dictionary

Crude vs. Adjusted Effects
● Crude: does not take into account the effect of the confounding variable
● Adjusted: accounts for the confounding variable(s) (what we get by pooling stratum-specific effect estimates)
● Generated using methods such as the Mantel-Haenszel estimator
● Also generated using multivariate analyses (e.g. logistic regression)
● Confounding is likely when:
● RRcrude =/= RRadjusted
● ORcrude =/= ORadjusted

Stratified Analysis
Crude 2 x 2 table: calculate crude OR (or RR) → ORCrude
Stratify by confounder → Stratum 1, Stratum 2
Calculate ORs for each stratum → OR1, OR2
If stratum-specific ORs are similar, calculate adjusted RR (e.g. MH)
If Crude OR =/= Adjusted OR, confounding is likely
If Crude OR = Adjusted OR, confounding is unlikely
JC: introduce “test of homogeneity”

Examples: crude vs adjusted RR
Study   Crude RR   Stratum 1 RR   Stratum 2 RR   Adjusted RR   Confounding?
1       6.00       3.20           3.50           3.30
2       2.00       1.02           1.10           1.08
3       1.10       2.00           2.00           2.00
4       0.56       0.50           0.60           0.54
5       4.20       4.00           4.10           4.04
6       1.70       0.03           3.50

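As an aid for filling in the last column, the sketch below expresses how far each crude RR sits from its adjusted RR, as a percentage of the adjusted value. The percent-change summary is an illustrative choice and no cut-off is implied; the lecture treats "differs meaningfully" as a matter of judgment. Study 6 is omitted because the table lists no adjusted RR for it.

```python
# (crude RR, adjusted RR) for studies 1-5, copied from the table above.
studies = {
    1: (6.00, 3.30),
    2: (2.00, 1.08),
    3: (1.10, 2.00),
    4: (0.56, 0.54),
    5: (4.20, 4.04),
}

def percent_change(crude, adjusted):
    """Difference between crude and adjusted RR, as a percent of the adjusted RR."""
    return 100.0 * (crude - adjusted) / adjusted

changes = {
    study: percent_change(crude, adjusted)
    for study, (crude, adjusted) in studies.items()
}

for study, change in changes.items():
    print(f"Study {study}: crude differs from adjusted by {change:+.1f}%")
```

A positive value means the crude estimate is further from the adjusted one in the upward direction; a negative value (study 3) means the crude estimate understates the adjusted RR.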
Fourth approach: Causality: counterfactual model
● Ideal “causal contrast” between exposed and unexposed groups:
● “A causal contrast compares disease frequency under two exposure distributions, but in one target population during one etiologic time period”
● If the ideal causal contrast is met, the observed effect is the “causal effect”
Maldonado & Greenland, Int J Epi 2002;31:422-29

What happens actually?
IDEAL: RRcausal = Iexp / Iunexp
ACTUAL: RRassoc = Iexp / Isubstitute

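The two ratios can be related algebraically. The short derivation below uses only the symbols from this slide and shows that the observed association equals the causal RR multiplied by a factor that reflects how well the substitute population stands in for the counterfactual one.

```latex
\[
RR_{assoc}
  = \frac{I_{exp}}{I_{substitute}}
  = \frac{I_{exp}}{I_{unexp}} \times \frac{I_{unexp}}{I_{substitute}}
  = RR_{causal} \times \frac{I_{unexp}}{I_{substitute}}
\]
```

When Isubstitute equals Iunexp the factor is 1 and RRassoc = RRcausal; otherwise the two differ.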
Ideal counterfactual comparison to determine causal effects
[Diagram: Exposed cohort (Iexp) compared with Counterfactual, unexposed cohort (Iunexp)]
RRcausal = Iexp / Iunexp
“A causal contrast compares disease frequency under two exposure distributions, but in one target population during one etiologic time period”
“Initial conditions” are identical in the exposed and unexposed groups, because they are the same population!
Maldonado & Greenland, Int J Epi 2002;31:422-29

What happens actually?
[Diagram: Exposed cohort (Iexp); Counterfactual, unexposed cohort (Iunexp); Substitute, unexposed cohort (Isubstitute)]
The counterfactual state is not observed
A substitute will usually be a population other than the target population during the etiologic time period - INITIAL CONDITIONS MAY BE DIFFERENT

Counterfactual definition of confounding
● “Confounding is present if the substitute population imperfectly represents what the target would have been like under the counterfactual condition”
● “An association measure is confounded (or biased due to confounding) for a causal contrast if it does not equal that causal contrast because of such an imperfect substitution”
RRcausal =/= RRassoc
Maldonado & Greenland, Int J Epi 2002;31:422-29

Residual confounding
• Confounding can persist, even after adjustment
• Why?
– Not all confounders were adjusted for (unmeasured confounding)
– Some variables were actually not confounders!
– Confounders were measured with error (misclassification of confounders)
– Categories of the confounding variable are improperly defined (e.g. age categories were too broad)

Simulating the counterfactual comparison: Experimental Studies: RCT
[Diagram: Eligible patients → Randomization → Treatment → Outcomes; Randomization → Placebo → Outcomes]
Randomization helps to make the groups “comparable” (i.e. similar initial conditions) with respect to known and unknown confounders
Therefore confounding is unlikely at randomization - time t0

Confounding: Methods to control or reduce confounding
• Methods used in study design to reduce confounding
– Randomization
– Restriction
– Matching
• Methods used in study analysis to reduce confounding
– Stratified analysis
– Multivariate analysis

Confounding: The use of randomization to reduce confounding
• Randomization
– Useful only for intervention studies
– Definition: random assignment of study subjects to exposure categories
– The special strength of randomization is its ability to control/reduce the effect of confounding variables about which the investigator is unaware
– If there is maldistribution of potentially confounding variables after randomization (the reason for the classic “Table I: Baseline characteristics” in the randomized trial) then other confounding control options (see below) are applied

[Diagram: Exposed cohort; Counterfactual, unexposed cohort; Substitute, unexposed cohort]
“Confounding is present if the substitute population imperfectly represents what the target would have been like under the counterfactual condition”
Maldonado & Greenland, Int J Epi 2002;31:422-29

Confounding: The use of restriction to reduce confounding
• Confounding cannot occur if the distribution of the potential confounding factors does not vary across exposure or disease categories
– Implication of this is that an investigator may restrict study subjects to only those falling within specific level(s) of a confounding variable
• Extreme example: an investigator only selects subjects of exactly the same age.
• Advantages of restriction
– straightforward, convenient, inexpensive

Confounding: The use of restriction to reduce confounding (cont.)
• Disadvantages
– May limit number of eligible subjects
– Residual confounding may persist if restriction categories not sufficiently narrow (e.g. “decade of age” might be too broad)
– Not possible to evaluate the relationship of interest at different levels of the confounder
• Question: How does restriction differ from matching?

Confounding: The use of matching to reduce confounding
• Subjects with all levels of a potential confounder are admitted into the study BUT the control/reference subjects (either with respect to exposure in a cohort or disease in a case-reference study) are chosen to have the same distribution of the potential confounder
• The use of matching may also require special analysis techniques (matched analyses and conditional logistic regression)

• Disadvantages of matching
– Finding appropriate control/reference subjects may be difficult and expensive and limit sample size
– Matching is most often used in case-reference (i.e. case-control) studies because in a large cohort study the cost of matching may be prohibitive
• Thus, in cohort studies it’s often cheaper to just enroll available controls and use analytic methods (below) to control confounding; this doesn’t apply to computerized “free” data

Confounding: The use of matching to reduce confounding (cont.)
• Disadvantages of matching (cont.)
– Confounding factor used to match subjects cannot be itself evaluated with respect to the outcome/disease
– Obviously, matching does not control for confounding by factors other than that used to match
– The use of matching makes the use of stratified analysis (for the control of other potential but non-matched factors) very difficult
• One way around this problem is the use of conditional logistic regression but there is a large reduction in “effective” sample size because only discordant pairs are used.

• Advantages of matching
– Matching may be the only way to obtain sufficient numbers of control/reference subjects with relevant levels of the confounding factor(s)
– Example: controlling for “neighborhood” (and all that it implies) by any approach other than matching is very difficult

• Advantages of matching (cont.)
– Useful in very small studies in which chance differences in confounding factors are likely to exist between the study groups and other forms of control for the confounders (such as stratification or multivariate adjustment) are not possible (because of the limited sample size)
– The full benefit of matching (in terms of the reduction of confounding) is obtained only if the proper form of matched analysis is used (to be reviewed later in the course)

• Basic goal of stratification is to evaluate the relationship between the predictor (“cause”) and outcome (“effect”) variable in strata homogeneous with respect to potentially confounding variables

Confounding: The use of stratification to reduce confounding
• For example, to examine the relationship between smoking and lung cancer while controlling for the potentially confounding effect of gender:
– Create a 2x2 table (smoking vs. lung cancer) for men and women separately
– To control for multiple confounders simultaneously, stratify by pairs (or triplets or higher) of confounding factors. For example, to control for gender and race/ethnicity determine the OR for smoking vs. lung cancer in multiple strata: white women, black women, Hispanic women, white men, black men, Hispanic men, etc.

• (From the earlier example): Goal: create a summary or “adjusted” estimate for the relationship between matches and lung cancer while adjusting for the two levels of smoking (the potential confounder)
• This process is analogous to the standardization of rates earlier in the course; in those examples the purpose of adjustment was to remove the confounding effect of age on the relationship between populations (A vs. B etc.) and rates of disease or death.
• In the present example the goal is to remove the confounding effect of smoking on the relationship between matches and lung cancer.

Confounding: Types of summary estimators to determine uniform effect over strata
• Mantel-Haenszel
– We will use this estimator in the present course
– Resistant to the effects of small strata or cells with a value of “0”
– Computationally a piece of cake
• Directly pooled estimators (e.g. Woolf)
– Sensitive to small strata and cells with value “0”
– Computationally messy but doable
• Maximum likelihood
– The most “appropriate” estimator
– Resistant to the effects of small strata or cells with a value of “0”
– Computationally challenging

Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches        820       340
No Matches     180       660
Smokers       Cancer   No cancer
Matches        810       270
No Matches      90        30
Non-smokers   Cancer   No cancer
Matches         10        70
No Matches      90       630

An aside: Terminology
• Pooled = combined = collapsed = unadjusted
• Adjusted = summary = weighted, etc.
– All of these reflect some adjustment process such as Mantel-Haenszel or Woolf or maximum likelihood estimation to weight the strata and develop confidence intervals about the estimate.

Confounding: Notation used in Mantel-Haenszel estimators of relative risk
• Notation for case-control or cohort studies with count data
              Cases    Controls   Total
Exposed         a         b       a + b
Nonexposed      c         d       c + d
Total         a + c     b + d     a + b + c + d = T
Case-control: RR = OR = ad / bc
Cohort: RR = Ie / I0 = [a / (a + b)] / [c / (c + d)]

Confounding: Notation used in Mantel-Haenszel estimators of relative risk (cont.)
• Notation for cohort studies with person-time data
              Cases    Controls   Person-years
Exposed         a       ------       PY1
Nonexposed      c       ------       PY0
Total         a + c     ------        T
RR = Ie / I0 = (a / PY1) / (c / PY0)

Confounding: Mantel-Haenszel estimators of relative risk for stratified data
Case-Control Study:
RRMH = ∑(ad / T)i / ∑(bc / T)i
Cohort Study with Count Denominators:
RRMH = ∑{a(c + d) / T}i / ∑{c(a + b) / T}i
Cohort Study with Person-years Denominators:
RRMH = ∑{a(PY0) / T}i / ∑{c(PY1) / T}i

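The two cohort estimators can be sketched as short functions. Here a is exposed cases, b exposed non-cases, c cases among the nonexposed, d nonexposed non-cases, and T the stratum total (subjects for count data, person-years for person-time data), following the notation slides. The function names and all the strata below are hypothetical, made up only so that each stratum has a relative risk of exactly 2.

```python
def mh_rr_counts(strata):
    """Mantel-Haenszel risk ratio for cohort data with count denominators.

    strata: iterable of (a, b, c, d) tuples, one per stratum.
    """
    num = sum(a * (c + d) / (a + b + c + d) for a, b, c, d in strata)
    den = sum(c * (a + b) / (a + b + c + d) for a, b, c, d in strata)
    return num / den

def mh_rr_person_time(strata):
    """Mantel-Haenszel rate ratio for cohort data with person-years.

    strata: iterable of (a, py1, c, py0) tuples, one per stratum.
    """
    num = sum(a * py0 / (py1 + py0) for a, py1, c, py0 in strata)
    den = sum(c * py1 / (py1 + py0) for a, py1, c, py0 in strata)
    return num / den

# Hypothetical count data: risks 0.20 vs 0.10 and 0.40 vs 0.20.
count_strata = [(20, 80, 10, 90), (40, 60, 40, 160)]
rr_counts = mh_rr_counts(count_strata)

# Hypothetical person-time data: rates 0.030 vs 0.015 and 0.020 vs 0.010.
person_time_strata = [(30, 1000, 15, 1000), (10, 500, 20, 2000)]
rr_person_time = mh_rr_person_time(person_time_strata)

print(round(rr_counts, 3), round(rr_person_time, 3))
```

Because every hypothetical stratum has the same relative risk, both summary estimates come out at 2; with unequal strata the result is a weighted average lying between the stratum-specific values.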
Confounding: smoking, matches, and lung cancer
• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
Pooled        Cancer   No cancer
Matches        820       340
No Matches     180       660
Smokers       Cancer   No cancer
Matches        810       270
No Matches      90        30
Non-smokers   Cancer   No cancer
Matches         10        70
No Matches      90       630

Confounding: Mantel-Haenszel estimators of relative risk for stratified data (smoking, matches, lung cancer)
RRMH = ∑(ad / T)i / ∑(bc / T)i
Numerator of MH estimator:
• For smokers: (ad/T)=(810*30)/1200=20.25;
• For nonsmokers: (ad/T)=(10*630)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (numerator)
Denominator of MH estimator:
• For smokers: (bc/T)=(270*90)/1200=20.25;
• For nonsmokers: (bc/T)=(90*70)/800=7.88;
• Add these together: 20.25 + 7.88=28.13 (denominator)
• ORMH = 28.13 / 28.13 = 1.0 (as expected since both stratified ORs were = 1.0)
• Be sure to try this on stratified data in which the two strata are not exactly equal to each other (but also not so different as to suggest that effect modification is present)

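The hand calculation above can be reproduced with a small function. The first call uses the smoking/matches counts from the lecture; the second uses two made-up strata (odds ratios 2.0 and 2.5) as one way to follow the suggestion to try unequal strata. The function name and the second data set are hypothetical.

```python
def mh_odds_ratio(strata):
    """Mantel-Haenszel odds ratio: sum(ad/T) / sum(bc/T) over strata.

    strata: iterable of (a, b, c, d) tuples with
    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den

# Lecture data: matches vs. lung cancer within smokers and nonsmokers.
smokers = (810, 270, 90, 30)
nonsmokers = (10, 70, 90, 630)
or_mh_lecture = mh_odds_ratio([smokers, nonsmokers])

# Hypothetical strata with similar but unequal odds ratios (2.0 and 2.5).
or_mh_unequal = mh_odds_ratio([(40, 20, 10, 10), (30, 30, 10, 25)])

print(f"ORMH (lecture data) = {or_mh_lecture:.2f}")
print(f"ORMH (hypothetical unequal strata) = {or_mh_unequal:.2f}")
```

For the hypothetical strata the summary estimate falls between the two stratum-specific odds ratios, as a weighted average should.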
Confounding: Interpretation of ORMH
• If ORMH (=1.0 in this example) “differs meaningfully” from ORunadjusted (=8.8 in this example) then confounding is present
• What does “differs meaningfully” mean?
– This is a matter of judgment based on biologic/clinical sense rather than on a statistical test
– Even if they “differ” only slightly, generally the ORMH rather than the ORcombined is reported as the summary effect estimate
• But what is one disadvantage of reporting ORMH?
– Although there do exist statistical tests of confounding they are not widely recommended (these tests evaluate H0: ORMH = ORunadjusted)

JC: test of homogeneity
Hennekens, 1987, p305
Review what the X^2 means in this context.

Direction of Confounding Bias
• Confounding “pulls” the observed association away from the true association
– It can either exaggerate/over-estimate the true association (positive confounding)
• Example
– RRcausal = 1.0
– RRobserved = 3.0
or
– It can hide/under-estimate the true association (negative confounding)
• Example
– RRcausal = 3.0
– RRobserved = 1.0

Confounding: Summary of steps to evaluate confounding
Table 12-10. Steps for the control of confounding and the evaluation of effect modification through stratified analysis
1. Stratify by levels of the potential confounding factor.
2. Compute stratum-specific unconfounded relative risk estimates.
3. Evaluate similarity of the stratum-specific estimates by either eyeballing or performing test of statistical significance. (More on this step later)
4. If the effect is thought to be uniform, calculate a pooled unconfounded summary estimate using RRMH. If effect is not uniform (i.e. effect modification is present), skip to step 6.
5. Perform hypothesis testing on the unconfounded estimate, using Mantel-Haenszel chi-square and compute confidence interval.
6. If effect is not thought to be uniform (i.e., if effect modification is present):
a. Report stratum-specific estimates, results of hypothesis testing, and confidence intervals for each estimate
b. If desired, calculate a summary unconfounded estimate using a standardized formula

Effect modification (Interaction)
• Goals of stratification of data
– Evaluate and reduce/remove confounding
– Evaluate and describe effect modification
• Description of effect modification
– A change in the magnitude of an effect measure (between exposure and disease) according to the level of some third variable
– What two “classes” of effect measures have we used so far in the course?

Effect modification: example #1
• Disease incidence by exposure and age
– Does the relationship between exposure and disease change over the value of the potential confounder (age)? How?

Effect modification: example #2
• Disease incidence by exposure and age
• Does the relationship between exposure and disease change over the value of the potential confounder (age)? How?
Rothman ’86 (p 178)

Effect modification: contrast with confounding
• Confounding
– A bias that an investigator hopes to remove
– A nuisance that may or may not be present in a given study design
• Properties of a confounding variable (Rothman, p123):
– a) be a risk factor for disease among the non-exposed;
– b) be associated with the exposure variable; and
– c) not be an intermediate step in the “causal pathway”

Effect modification: contrast with confounding
• Effect modification
– A more detailed description of the “true” relationship between the exposure and the outcome
– Effect modification is a finding to be reported (even celebrated), not a bias to be eliminated
– Effect modification is a “natural phenomenon” that exists independently of the study design
– The presence and interpretation of effect modification depends upon the choice of effect measure (ratio vs. difference)

Some lingo
• Covariate
– Confounder, potential confounder
– Effect modification, interaction
– Intermediate variable

Effect modification: contrast with confounding
• Note that for any association under study, a given factor may be:
– Both a confounder and an effect modifier or
– A confounder but not an effect modifier or
– An effect modifier but not a confounder or
– Neither

Examples of confounding/effect modification
Level 1   Level 2   Crude/collapsed/combined   Uniform estimate (ORMH)   Confounding   Interaction
                    (“unadjusted”)             (“adjusted”)              present       present
4.0       4.0       4.0                        4.0                       NO            NO
4.0       0.25      1.0                        1.0                       NO            YES
1.0       1.0       8.4                        1.0                       YES           NO
4.0       0.25      1.0                        2.0                       YES (?relevance)   YES

Effect modification: test of homogeneity
• Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect (such as a Mantel-Haenszel estimator)
• Notation:
– X2(N-1) is chi-square with (N-1) degrees of freedom;
– N is the number of strata (N=2 in our smoking/matches example);
– ln^Ri is the natural logarithm of the estimated (hence the “^”) effect measure for each stratum (ORi in our example);
– ln^R is the natural logarithm of the uniform effect estimate (e.g. ORMH in our example; the computer will use the maximum likelihood estimate)
• One formula to test homogeneity:
X2(N-1) = ∑(i=1 to N) [ln(^Ri) – ln(RMH)]^2 / Var[ln(^Ri)]
JC: Comment on choice of significance level for test of homogeneity

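A minimal sketch of this test for odds ratios, under two assumptions that the slide does not spell out: the variance of each stratum's ln(OR) is taken as 1/a + 1/b + 1/c + 1/d (the Woolf approximation), and the uniform estimate is ORMH. The p-value helper only covers the two-stratum case (1 degree of freedom), and the second data set is hypothetical.

```python
import math

def homogeneity_chi2(strata):
    """Chi-square statistic comparing each stratum's ln(OR) with ln(ORMH).

    strata: iterable of (a, b, c, d) tuples. Returns (chi2, degrees of freedom).
    Assumes Var[ln(OR_i)] = 1/a + 1/b + 1/c + 1/d (Woolf approximation).
    """
    strata = list(strata)
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    ln_or_mh = math.log(num / den)

    chi2 = 0.0
    for a, b, c, d in strata:
        ln_or_i = math.log((a * d) / (b * c))
        var_i = 1 / a + 1 / b + 1 / c + 1 / d
        chi2 += (ln_or_i - ln_or_mh) ** 2 / var_i
    return chi2, len(strata) - 1

def p_value_df1(chi2):
    """Upper-tail p-value of a chi-square with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2))

# Lecture data: both stratum ORs equal 1.0, so there is no heterogeneity.
chi2_lecture, df_lecture = homogeneity_chi2([(810, 270, 90, 30), (10, 70, 90, 630)])

# Hypothetical strata with opposite effects (ORs of 16 and 1/16).
chi2_hetero, df_hetero = homogeneity_chi2([(40, 10, 10, 40), (10, 40, 40, 10)])

print(chi2_lecture, p_value_df1(chi2_lecture))
print(round(chi2_hetero, 1), p_value_df1(chi2_hetero))
```

In the hypothetical case ORMH is 1.0 even though neither stratum has an odds ratio near 1, and the large chi-square flags that a single uniform estimate would be misleading.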
Paradox
• If effect modification is present, a uniform estimator of effect (such as ORMH) cannot (or at least should not) be reported.
• However, in order to determine if effect modification is present, it is necessary to calculate the value of a uniform estimator of effect (such as ORMH) because it is needed in the calculation of the test of homogeneity.

Effect modification: test of homogeneity (or is it heterogeneity?)
• Comments
– If the test of homogeneity is “significant” (=“reject homogeneity”) this is evidence that there is heterogeneity (i.e. no homogeneity) and that effect modification may be present.
• (Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect)
– The choice of a significance level (e.g. p < 0.05) is somewhat open to interpretation.
• One “conservative” approach, because of inherent limitations in the power of the test of homogeneity, is to treat the data as if interaction is present for p < 0.20.
• In other words, one would rather err on the side of assuming that interaction is present (and reporting the stratified estimates of effect) than on reporting a uniform estimate that may not be true across strata.

UC Berkeley
Additive versus multiplicative scale effect modification
● Notation: RXZ
● No additive interaction if (R11 – R01) = (R10 – R00)
○ Rewrite as: (R11-R01)-(R10-R00)=0
● In words: Difference in risk for (X=1 vs. X=0) when Z=1 is equal to difference in risk for (X=1 vs. X=0) when Z=0
● Note: the values R11, R10, etc. are risks (not counts)

Additive versus multiplicative scale effect modification
● Notation: RXZ
● No multiplicative interaction if (R11/R01)=(R10/R00)
○ Rewrite as: (R11/R01)/(R10/R00)=1
● In words: Ratio of risks/rates when X=1 vs. X=0 when Z=1 is equal to ratio of risks/rates when X=1 vs. X=0 when Z=0

Effect modification is scale-dependent
• Evidence for effect modification/statistical interaction if the RR or the AR differs between two groups
• However, effect modification/statistical interaction is scale-dependent
– If you do not have interaction on the additive scale (AR is homogeneous) then you will have interaction on the multiplicative scale (RR must be heterogeneous), provided both factors have an effect on risk
– If you do not have interaction on the multiplicative scale (RR is homogeneous) then you will have interaction on the additive scale (AR must be heterogeneous), under the same proviso
– Note: It is common to have evidence of interaction on both scales.

Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (50-10) = 0
○ Interaction not present on the additive scale
● Relative scale: (60/20) / (50/10)=0.6
○ Interaction present on the relative scale
        Z=1   Z=0
X=1     60    50
X=0     20    10

Example
● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (30-10) = 20
○ Interaction present on the additive scale
● Relative scale: (60/20) / (30/10)=1
○ Interaction not present on the relative scale
        Z=1   Z=0
X=1     60    30
X=0     20    10

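Both examples can be checked with one small function that evaluates the additive and multiplicative contrasts from the four risks, using the RXZ notation from the slides. The function name is an illustrative choice.

```python
def interaction_contrasts(r11, r10, r01, r00):
    """Return (additive contrast, multiplicative contrast) for risks R_XZ.

    additive       = (R11 - R01) - (R10 - R00); 0 means no additive interaction
    multiplicative = (R11 / R01) / (R10 / R00); 1 means no multiplicative interaction
    """
    additive = (r11 - r01) - (r10 - r00)
    multiplicative = (r11 / r01) / (r10 / r00)
    return additive, multiplicative

# First example table: X=1 row is 60, 50; X=0 row is 20, 10.
add_1, mult_1 = interaction_contrasts(r11=60, r10=50, r01=20, r00=10)

# Second example table: X=1 row is 60, 30; X=0 row is 20, 10.
add_2, mult_2 = interaction_contrasts(r11=60, r10=30, r01=20, r00=10)

print(add_1, round(mult_1, 2))  # additive 0, multiplicative 0.6
print(add_2, round(mult_2, 2))  # additive 20, multiplicative 1.0
```

The first table is homogeneous on the additive scale only and the second on the multiplicative scale only, which is the scale dependence described earlier.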
Logistic Regression (time permitting)

Confounding: smoking, matches, and lung cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
Pooled        Cancer   No cancer
Smoking       900      300
No Smoking    100      700
Matches       Cancer   No cancer
Smoking       810      270
No Smoking    10       70
No matches    Cancer   No cancer
Smoking       90       30
No Smoking    90       630
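As a rough aid to the discussion of the confidence intervals, the sketch below computes each odds ratio with a Woolf (log-based) 95% interval. The slides do not say which interval method was used, so Woolf is an assumption here, and its limits need not reproduce the slide's intervals exactly. The function name `odds_ratio_woolf` is illustrative.

```python
import math

def odds_ratio_woolf(a, b, c, d, z=1.96):
    """OR and Woolf (log-based) CI for a 2x2 table.

    a = exposed cases,   b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    odds_ratio = (a * d) / (b * c)
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # SE of ln(OR)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper

# Cell counts copied from the three tables on the slide
tables = {
    "pooled": (900, 300, 100, 700),
    "matches": (810, 270, 10, 70),
    "no matches": (90, 30, 90, 630),
}
results = {name: odds_ratio_woolf(*cells) for name, cells in tables.items()}
for name, (odds_ratio, lower, upper) in results.items():
    print(f"{name}: OR={odds_ratio:.1f}, 95% CI=({lower:.1f}, {upper:.1f})")
```

The standard error is dominated by the smallest cell, which is one way to see why the matches stratum (with only 10 non-smoking cases) has the widest interval even though it contains most of the subjects.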
2014 Page 102
A brief introduction to logistic regression
Let X1 = smoking (1=yes; 0=no)
Let X2 = matches (1=yes; 0=no)
Let Cancer = cancer (1=yes; 0=no)
Recall earlier tables:
Collapsed   Cancer=1   Cancer=0
X1=1        900        300
X1=0        100        700
OR=21.0
X2=1    Cancer=1   Cancer=0        X2=0    Cancer=1   Cancer=0
X1=1    810        270             X1=1    90         30
X1=0    10         70              X1=0    90         630
OR=21.0                            OR=21.0
Conclusions: No confounding by matches of the relationship between smoking and lung cancer; no effect modification by matches of the relationship between smoking and lung cancer
2014 Page 103
Data structure for computer analysis
• Most computer programs would want to see the data for the individual subjects in the study in the following form:
Subject ID   X1   X2   Cancer   How many?
A            1    1    1
B            1    1    0
C            0    1    1
D            0    1    0
E            1    0    1
F            1    0    0
G            0    0    1
H            0    0    0
2014 Page 104
Data structure for computer analysis
• Most computer programs would want to see the data for the individual subjects in the study in the following form:
Subject ID   X1   X2   Cancer   How many?
A            1    1    1        810 of these
B            1    1    0        270 of these
C            0    1    1        10 of these
D            0    1    0        70 of these
E            1    0    1        90 of these
F            1    0    0        30 of these
G            0    0    1        90 of these
H            0    0    0        630 of these
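To make the "one row per subject" layout concrete, the grouped counts in the table can be expanded into individual records. This is a minimal sketch; the variable names `patterns` and `rows` are illustrative and not part of the lecture.

```python
# (X1, X2, Cancer, count) for covariate patterns A through H in the table
patterns = [
    (1, 1, 1, 810),  # A
    (1, 1, 0, 270),  # B
    (0, 1, 1, 10),   # C
    (0, 1, 0, 70),   # D
    (1, 0, 1, 90),   # E
    (1, 0, 0, 30),   # F
    (0, 0, 1, 90),   # G
    (0, 0, 0, 630),  # H
]

# One (X1, X2, Cancer) record per subject, repeated "count" times
rows = [(x1, x2, cancer) for x1, x2, cancer, n in patterns for _ in range(n)]

print(len(rows))                   # total number of subjects
print(sum(row[2] for row in rows)) # total number of cancer cases
```

Many packages also accept the grouped form directly through a frequency weight, which avoids storing identical rows.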
2014 Page 105
The basic logistic equation for this problem
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• ln (odds of disease) = a + b1(smoking) + b2(matches) + b3(smoking)(matches)
2014 Page 106
Solving a logistic equation
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln ( ) =
• a =
• So now: ln (odds) =
2014 Page 107
X2=1    Cancer=1   Cancer=0        X2=0    Cancer=1   Cancer=0
X1=1    810        270             X1=1    90         30
X1=0    10         70              X1=0    90         630
OR=21.0                            OR=21.0
2014 Page 108
Solving a logistic equation
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln (90/630) = -1.946
• a = -1.946
• So now: ln (odds) = -1.946 + b1X1 + b2X2 + b3X1X2
2014 Page 109
Solving a logistic equation (cont.)
• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) =
• b1 =
• So now: ln (odds) =
2014 Page 110
X2=1    Cancer=1   Cancer=0        X2=0    Cancer=1   Cancer=0
X1=1    810        270             X1=1    90         30
X1=0    10         70              X1=0    90         630
OR=21.0                            OR=21.0
2014 Page 111
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) = ln (90/30) = 1.099 = -1.946 + b1
• b1 = 3.045
• So now: ln (odds) = -1.946 + 3.045X1 + b2X2 + b3X1X2
2014 Page 112
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln ( ) =
• b2 =
• So now: ln (odds) =
2014 Page 113
X2=1    Cancer=1   Cancer=0        X2=0    Cancer=1   Cancer=0
X1=1    810        270             X1=1    90         30
X1=0    10         70              X1=0    90         630
OR=21.0                            OR=21.0
2014 Page 114
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln (10/70) = -1.946 = -1.946 + 0 + b2 + 0
• b2 = 0
• So now: ln (odds) = -1.946 + 3.045X1 + 0 + b3X1X2
2014 Page 115
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) =
• ln (odds) =
• Solve for b3
• ln (odds) =
• b3 =
• So now: ln (odds) =
2014 Page 116
X2=1    Cancer=1   Cancer=0        X2=0    Cancer=1   Cancer=0
X1=1    810        270             X1=1    90         30
X1=0    10         70              X1=0    90         630
OR=21.0                            OR=21.0
2014 Page 117
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) = -1.946 + b1 + b2 + b3
• ln (odds) = -1.946 + 3.045 + 0 + b3
• Solve for b3
• ln (odds) = ln (810/270) = 1.099 = -1.946 + 3.045 + b3
• b3 = 0
• So now: ln (odds) = -1.946 + 3.045X1 + 0 + 0
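The four hand calculations (a, then b1, b2, b3) follow one pattern: take the ln(odds) in a cell and subtract the terms already solved. A short sketch of that pattern, using the cell counts from the stratified tables; the names `counts` and `log_odds` are illustrative.

```python
import math

# counts[(X1, X2)] = (cancer cases, non-cases), from the stratified tables
counts = {
    (0, 0): (90, 630),
    (1, 0): (90, 30),
    (0, 1): (10, 70),
    (1, 1): (810, 270),
}

def log_odds(x1, x2):
    cases, noncases = counts[(x1, x2)]
    return math.log(cases / noncases)

a = log_odds(0, 0)                  # X1 = 0, X2 = 0
b1 = log_odds(1, 0) - a             # X1 = 1, X2 = 0
b2 = log_odds(0, 1) - a             # X1 = 0, X2 = 1
b3 = log_odds(1, 1) - a - b1 - b2   # X1 = 1, X2 = 1

print(round(a, 3), round(b1, 3), round(b2, 3), round(b3, 3))
```

Because the model has four coefficients and the data have four covariate patterns, the equation reproduces every cell's odds exactly; this is what lets the coefficients be solved one cell at a time.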
2014 Page 118
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• This simplifies (earlier calculations) to:
– ln (odds) = -1.946 + 3.045X1 + 0 + 0
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches: ln (odds) =
• Calculate the ln(odds) for a smoker who doesn’t use matches: ln (odds) =
• Now calculate the odds ratio for (smokers vs. non-smokers // matches+)
• At home, calculate the odds ratio for (smokers vs. non-smokers // matches-)
2014 Page 119
Solving a logistic equation (cont.)
• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• This simplifies (earlier calculations) to:
– ln (odds) = -1.946 + 3.045(X1) + 0(X2) + 0(X1X2)
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches (X1 = 1 and X2 = 1): ln (odds) = -1.946 + 3.045 = 1.099
• Calculate the ln(odds) for a smoker who doesn’t use matches (X1 = 1 and X2 = 0): ln (odds) = -1.946 + 3.045 = 1.099
• Now calculate the odds ratio for (smokers vs. non-smokers // matches)
• At home, calculate the odds ratio for (smokers vs. non-smokers // no matches)
2014 Page 120
Logistic Regression
Using the logistic model developed in class for the matches-smoking-lung cancer data (stratified by matches), evaluate the odds ratio for lung cancer for:
1. (in-class) A smoker who uses matches vs. a non-smoker who uses matches.
2. (at home) A smoker who uses matches vs. a non-smoker who does not use matches.
SEPARATE ASSIGNMENT
Develop a logistic model for the matches-smoking-lung cancer data (stratified by smoking status). Use this model to evaluate the odds ratio for lung cancer for:
1. (at home) A user of matches who smokes vs. a non-user of matches who smokes.
2. (at home) A smoker who uses matches vs. a non-smoker who uses matches. Is this result consistent with the one you arrived at in the in-class example above?
2014 Page 121
Find OR for smokers (who use matches) vs. non-smokers (who use matches)
For smokers who use matches: X1 = 1, X2 = 1
For non-smokers who use matches: X1 = 0, X2 = 1
From prior slides we determined that: ln (odds) = -1.946 + 3.045 (X1)
2014 Page 122
For smokers who use matches (X1 = 1; X2 = 1): ln (odds) = -1.946 + 3.045 (1) = 1.099
For non-smokers who use matches (X1 = 0; X2 = 1): ln (odds) = -1.946 + 0 + 0 + 0 = -1.946
We want to solve:
ln OR = 1.099 - (-1.946) = 3.045
e^(ln OR) = OR = e^3.045 = 21.0
Therefore, the odds ratio (determined using logistic regression) comparing smokers using matches to non-smokers using matches is 21.0. This agrees with the stratified data presented earlier.
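The same subtraction works for any pair of covariate patterns, which is how the in-class and at-home comparisons can be checked. A minimal sketch, using the rounded coefficients from the slides; the helper names `ln_odds` and `odds_ratio` are illustrative.

```python
import math

# Rounded coefficients from the slides: a, b1 (smoking), b2 (matches), b3 (interaction)
A, B1, B2, B3 = -1.946, 3.045, 0.0, 0.0

def ln_odds(x1, x2):
    """ln(odds of disease) for smoking status x1 and match use x2."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2

def odds_ratio(pattern, reference):
    """OR comparing two (X1, X2) patterns: exponentiate the difference in ln(odds)."""
    return math.exp(ln_odds(*pattern) - ln_odds(*reference))

# Smokers who use matches vs. non-smokers who use matches
or_among_match_users = odds_ratio((1, 1), (0, 1))
print(round(or_among_match_users, 1))
```

The intercept cancels in the difference, so the odds ratio depends only on the coefficients of the variables that change between the two patterns.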
2014 Page 123
Confounding: smoking, matches, and lung cancer
• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
Pooled        Cancer   No cancer
Smoking       900      300
No Smoking    100      700
Matches       Cancer   No cancer
Smoking       810      270
No Smoking    10       70
No matches    Cancer   No cancer
Smoking       90       30
No Smoking    90       630
2014 Page 124
Some concluding comments on logistic regression
• Interpretations of the final logistic equation for these data:
ln (odds of disease) = a + b1(smoking) + b2(matches) + b3(smoking)(matches)
ln (odds) = -1.946 + 3.045(smoking) + 0(matches) + 0(smoking)(matches)
• This equation describes the data whether they are stratified by matches or by smoking.
• The logistic equation can adjust for multiple variables simultaneously.
• The estimates of the coefficients for the equation are derived through maximum likelihood techniques.
• This technique is very widely used in epidemiologic (and other) applications when the outcome variable of interest is dichotomous.
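To illustrate what "maximum likelihood techniques" means in practice, the sketch below fits the same four-coefficient model by Newton-Raphson iteration on the grouped data. It is a teaching sketch in plain Python, not the routine any particular statistics package uses; the function names and the fixed iteration count are assumptions made for the example.

```python
import math

# Grouped data: (X1, X2, cancer cases, total subjects) per covariate pattern
data = [(0, 0, 90, 720), (1, 0, 90, 120), (0, 1, 10, 80), (1, 1, 810, 1080)]

def design(x1, x2):
    return [1.0, x1, x2, x1 * x2]  # intercept, smoking, matches, interaction

def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x

def fit_logistic(data, iterations=25):
    beta = [0.0] * 4
    for _ in range(iterations):
        score = [0.0] * 4                       # gradient of the log-likelihood
        info = [[0.0] * 4 for _ in range(4)]    # information matrix
        for x1, x2, cases, total in data:
            x = design(x1, x2)
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))    # fitted probability of disease
            for j in range(4):
                score[j] += x[j] * (cases - total * p)
                for k in range(4):
                    info[j][k] += total * p * (1 - p) * x[j] * x[k]
        step = solve(info, score)               # Newton step: info^-1 * score
        beta = [b + s for b, s in zip(beta, step)]
    return beta

beta = fit_logistic(data)
print([round(b, 3) for b in beta])
```

For these data the iterative fit should land on the same coefficients as the hand solution, because the model has as many coefficients as covariate patterns. With more covariates than can be tabulated by hand, only the iterative approach remains practical.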
2014 Page 125
Some concluding comments on logistic regression
• Comments
– Having multiple strata (which this technique makes possible)
– Test of homogeneity (b3)
– Maximum likelihood estimation for coefficient estimation
• Modifications of logistic regression exist for coping with
– Outcome variables with multiple levels = polytomous logistic regression
– Studies in which matching was used = conditional logistic regression
2014 Page 126
. use http://www.stata-press.com/data/r8/lbw
variable name   storage type   display format   variable label
-------------------------------------------------------------------------------
id              int            %8.0g            identification code
low             byte           %8.0g            birth weight<2500g
age             byte           %8.0g            age of mother
lwt             int            %8.0g            weight at last menstrual period
race            byte           %8.0g            race
smoke           byte           %8.0g            smoked during pregnancy
ptl             byte           %8.0g            premature labor history (count)
ht              byte           %8.0g            has history of hypertension
ui              byte           %8.0g            presence, uterine irritability
ftv             byte           %8.0g            number of visits to physician during 1st trimester
bwt             int            %8.0g            birth weight (grams)
2014 Page 127
Special (and very useful) STATA command “xi” (= “interaction expansion”)
• xi: logistic low age lwt i.race smoke ptl ht ui
• In this example, a variable named “race” has three levels (e.g. white/hispanic/black) that might be coded as “0=white”; “1=hispanic”; “2=black”
• The combined use of xi and i.race directs STATA to create indicator variables for the levels of race and compare each one to a reference level (by default, the lowest-coded level). This can be a HUGE time-saver (avoids the user having to manually recode such variables)!
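The manual recoding that xi saves can be pictured as follows. This Python sketch only illustrates the idea of indicator (dummy) coding with the lowest-coded level as reference; it is not STATA's implementation, and the function name, column names, and six example subjects are hypothetical.

```python
def expand_indicators(values, prefix):
    """Create 0/1 indicator columns for every level except the lowest-coded one."""
    levels = sorted(set(values))
    reference = levels[0]  # omitted level; other levels are compared to it
    columns = {
        f"{prefix}_{level}": [1 if v == level else 0 for v in values]
        for level in levels[1:]
    }
    return reference, columns

# Hypothetical subjects, coded 0=white, 1=hispanic, 2=black as in the slide's example
race = [0, 1, 2, 0, 2, 1]
reference, columns = expand_indicators(race, "race")
print(reference)
print(columns)
```

A three-level variable becomes two indicator columns, so the logistic model estimates two coefficients for race, each an ln(odds ratio) relative to the reference level.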
2014 Page 128
Assignments
• Write the logistic model describing these data (next slide).
• What is the risk of low birth weight (LBW) for a smoker, adjusted for all other variables?
• How can the 95% CI be determined?
• What is the risk of LBW for a Hispanic baby (compared to a white baby)?
• What is the risk of LBW for a black baby (compared to a Hispanic baby)?
2014 Page 129
2014 Page 130
Discuss intercept
2014 Page 131
2014 Page 132