4 threats to validity from confounding bias and effect modification

Threats to Validity from Confounding and Effect Modification

• Overview: Random vs. systematic error
• Confounding
• Effect Modification
• Logistic regression (time permitting)
• Special thanks for some of the materials in this lecture:
– Professor Jen Ahern (UCB)
– Professor Madhu Pai (McGill; a former 250b GSI)

2014


The cardinal rule of epidemiology

• Remember that all results based on epidemiology studies are likely to be …

The cardinal rule of epidemiology (continued)

• WRONG…
– unless proper care has been taken to eliminate all sources of error in the estimate (…and sometimes even then the results will be wrong because of unknown sources of error)

Example: Confounding

• A colleague with outside funding believes that cigarette smoke is not a “cause” (in any sense) of lung cancer but that exposure to matches (yes, matches) is the cause. This colleague has conducted a large case-control study to test the null hypothesis:

H0: “Matches are not associated with lung cancer”.

• What’s the rationale (in the Popperian sense) for stating the null hypothesis rather than the alternative:

HA: “Matches are associated with lung cancer”.

• What does the colleague hope to do (in terms of hypothesis testing)?

• What do you think of the term “associated”? Would it be better to write “a cause of”?

• “We can never finally prove our scientific theories, we can merely (provisionally) confirm or (conclusively) refute them.” - Karl Popper

Sir Karl Raimund Popper CH FBA FRS (28 July 1902 – 17 September 1994) was an Austrian-British philosopher and professor at the London School of Economics. He is generally regarded as one of the greatest philosophers of science of the 20th century. Popper is known for his rejection of the classical inductivist views on the scientific method, in favour of empirical falsification. (Source: Wikipedia)




Confounding: smoking, matches, and lung cancer

• Your colleague has located 1000 cases of lung cancer, of whom 820 carry matches.
• Among 1000 reference patients (selected randomly from a population with recently taken normal chest x-rays), 340 carry matches.
• Strengths of the reference selection process? Weaknesses?
• Describe the relationship between matches and lung cancer in your colleague’s data.
• Would you like to analyze the data in any other fashion?


Confounding: smoking, matches, and lung cancer

               Cancer   No cancer
Matches          820       340
No matches       180       660

• Odds ratio = (820 * 660) / (180 * 340)
• OR = 8.8
• 95% CI (7.2, 10.9)
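The crude odds ratio and its interval can be checked with a few lines of Python. This is a minimal sketch, not the method used to produce the slide: the function name `odds_ratio_ci` is hypothetical, and the interval is assumed to be a Woolf (log-based) 95% confidence interval.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a = exposed cases, b = exposed non-cases,
    c = unexposed cases, d = unexposed non-cases.
    """
    or_ = (a * d) / (b * c)
    # Assumed variance estimate: Var[ln(OR)] = 1/a + 1/b + 1/c + 1/d
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_) - z * se)
    upper = math.exp(math.log(or_) + z * se)
    return or_, lower, upper

# Matches vs. lung cancer, crude table from the slide
or_, lower, upper = odds_ratio_ci(820, 340, 180, 660)
print(f"OR = {or_:.2f}, 95% CI ({lower:.1f}, {upper:.1f})")
```

Under the Woolf assumption the printed values agree with the OR of 8.8 and the interval of roughly 7.2 to 10.9 shown above.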


Confounding: smoking, matches, and lung cancer

• You decide to look at the relationship between matches and lung cancer in the smokers separately from the non-smokers.
• You find that among the 1000 cases, 900 are smokers and 810 (of the 900) carry matches.
• Among the 1000 reference patients, 300 are smokers and 270 (of the 300) carry matches.
• Calculate the relevant measure(s) of effect.
• What should your colleague do about future funding?

Confounding: smoking, matches, and lung cancer

Pooled         Cancer   No cancer
Matches          820       340
No matches       180       660

Smokers        Cancer   No cancer
Matches          810       270
No matches        90        30

Non-smokers    Cancer   No cancer
Matches           10        70
No matches        90       630

• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)
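The stratum tables follow from the counts given in the exercise by subtraction. The sketch below shows one way to derive them and the stratum-specific odds ratios; the helper names `stratum_counts` and `odds_ratio` are hypothetical and introduced only for illustration.

```python
def stratum_counts(n_total, n_matches, n_smokers, n_smokers_matches):
    """Split one group (cases or referents) into (matches, no matches) per smoking stratum."""
    smokers = (n_smokers_matches, n_smokers - n_smokers_matches)
    n_nonsmokers = n_total - n_smokers
    nonsmokers_matches = n_matches - n_smokers_matches
    nonsmokers = (nonsmokers_matches, n_nonsmokers - nonsmokers_matches)
    return smokers, nonsmokers

def odds_ratio(cases, referents):
    """OR = ad / bc, where each argument is a (matches, no matches) pair."""
    (a, c), (b, d) = cases, referents
    return (a * d) / (b * c)

# Counts stated in the exercise: 1000 cases (820 carry matches; 900 smokers, 810 of them with matches)
cases_smokers, cases_nonsmokers = stratum_counts(1000, 820, 900, 810)
# 1000 referents (340 carry matches; 300 smokers, 270 of them with matches)
refs_smokers, refs_nonsmokers = stratum_counts(1000, 340, 300, 270)

or_smokers = odds_ratio(cases_smokers, refs_smokers)
or_nonsmokers = odds_ratio(cases_nonsmokers, refs_nonsmokers)
print(cases_smokers, refs_smokers, or_smokers)
print(cases_nonsmokers, refs_nonsmokers, or_nonsmokers)
```

Both stratum-specific odds ratios come out at 1.0, in contrast to the pooled value of 8.84.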

Confounding: smoking, matches, and lung cancer

• To be complete, you also decide to examine the relationship between smoking and lung cancer.
• What tables should you construct to do this?

Confounding: smoking, matches, and lung cancer

Pooled         Cancer   No cancer
Smoking          900       300
No smoking       100       700

Matches        Cancer   No cancer
Smoking          810       270
No smoking        10        70

No matches     Cancer   No cancer
Smoking           90        30
No smoking        90       630

• ORpooled = 21.0 (16.3, 27.1)
• ORmatches = 21.0 (10.5, 46.2)
• ORno matches = 21.0 (12.9, 34.7)
• Discuss your intuitions about the 95% CIs
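One way to build intuition about the interval widths is to look at the standard error of ln(OR) in each table. The sketch below assumes Woolf's variance estimate; the intervals printed on the slide may have been computed with a different method, so only the standard errors are compared here. The name `woolf_se` is hypothetical.

```python
import math

def woolf_se(a, b, c, d):
    """Standard error of ln(OR): square root of the summed reciprocal cell counts (Woolf)."""
    return math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)

# (smoking cases, smoking non-cases, non-smoking cases, non-smoking non-cases)
tables = {
    "pooled": (900, 300, 100, 700),
    "matches": (810, 270, 10, 70),
    "no matches": (90, 30, 90, 630),
}

se = {name: woolf_se(*cells) for name, cells in tables.items()}
for name, cells in tables.items():
    print(f"{name}: smallest cell = {min(cells)}, SE of ln(OR) = {se[name]:.3f}")
```

The table with the smallest cell (10 in the matches stratum) has the largest standard error, which is consistent with it having the widest interval even though all three odds ratios equal 21.0.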

Confounder?

[Diagram: Exposure and Disease, with a possible confounder linked to both. Compare the unadjusted RR with the adjusted RR.]

Why is confounding so important in epidemiology?

● BMJ Editorial: “The scandal of poor epidemiological research” [16 October 2004]
● “Confounding, the situation in which an apparent effect of an exposure on risk is explained by its association with other factors, is probably the most important cause of spurious associations in observational epidemiology.”

BMJ 2004;329:868-869 (16 October)

Overview

● Causality is the central concern of epidemiology
● Confounding is the central concern with establishing causality
● Confounding can be understood using multiple different approaches
● A strong understanding of various approaches to confounding and its control is essential for all those who engage in health research

Confounding is one of the key biases in identifying causal effects

[Diagram: sources of error separating the causal effect, RRcausal (“truth”), from the observed RRassociation:]
● Random error
● Confounding
● Information bias (misclassification)
● Selection bias
● Bias in inference
● Reporting & publication bias
● Bias in knowledge use

Adapted from: Maclure M, Schneeweiss S. Epidemiology 2001;12:114-122.

Confounding: 4 ways to understand it!

1. “Mixing of effects”
2. “Classical” approach based on a priori criteria
3. Collapsibility and data-based criteria
4. “Counterfactual” and non-comparability approaches

First approach: Confounding: mixing of effects

● “Confounding is confusion, or mixing, of effects; the effect of the exposure is mixed together with the effect of another variable, leading to bias” - Rothman, 2002

Latin: “confundere” is to mix together

Rothman KJ. Epidemiology. An introduction. Oxford: Oxford University Press, 2002

Example: Association between birth order and Down syndrome

[Figure not reproduced. Data from Stark and Mantel (1966). Source: Rothman 2002]

Association between maternal age and Down syndrome

[Figure not reproduced]

Association between maternal age and Down syndrome, stratified by birth order

[Figure not reproduced]

Mixing of Effects: the water pipes analogy

[Diagram: Exposure, Confounder, Outcome]

Mixing of effects – cannot separate the effect of exposure from that of confounder

Exposure and disease share a common cause (‘parent’)

Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall, 2003

Mixing of Effects: “control” of the confounder

[Diagram: Exposure, Confounder, Outcome]

Successful “control” of confounding (adjustment)

If the common cause (‘parent’) is blocked, then the exposure–disease association becomes clearer

Adapted from: Jewell NP. Statistics for Epidemiology. Chapman & Hall, 2003

Second approach: “Classical” approach based on a priori criteria

“Bias of the estimated effect of an exposure on an outcome due to the presence of a common cause of the exposure and the outcome” – Porta 2008

● A factor is a confounder if 3 criteria are met:
● a) a confounder must be causally or noncausally associated with the exposure in the source population (study base) being studied;
● b) a confounder must be a causal risk factor (or a surrogate measure of a cause) for the disease in the unexposed cohort; and
● c) a confounder must not be an intermediate cause (in other words, a confounder must not be an intermediate step in the causal pathway between the exposure and the disease)

Confounding Schematic

[Diagram: Confounder (C), Exposure (E), Disease (outcome) (D)]

Szklo M, Nieto JF. Epidemiology: Beyond the basics. Aspen Publishers, Inc., 2000. Gordis L. Epidemiology. Philadelphia: WB Saunders, 4th Edition.

Intermediate cause

[Diagram: Exposure (E), Confounder (C), Disease (D)]

General idea: a confounder could be a ‘parent’ of the exposure, but should not be a ‘daughter’ of the exposure

[Diagram: Exposure (E), Confounder (C), Disease (D)]

Example of schematic (from Gordis)

Confounding Schematic

[Diagram: Birth Order (E), Down Syndrome (D); confounding factor: Maternal Age (C)]

Association between HRT and heart disease

[Diagram: HRT use, Heart disease; confounding factor: SES]

Are confounding criteria met?

[Diagram: BRCA1 gene, Breast cancer; confounding factor: Age, with one link crossed out (x)]

Are confounding criteria met? Should we adjust for age, when evaluating the association between a genetic factor and risk of breast cancer?

No!

[Diagram: Sex with multiple partners, Cervical cancer; confounding factor: HPV]

What if this was the underlying causal mechanism?

[Diagram: Sex with multiple partners, HPV, Cervical cancer]

[Diagram: Obesity, Mortality; confounding factor: Hypertension]

What if this was the underlying causal mechanism?

[Diagram: Obesity, Hypertension, Mortality]

Direct vs indirect effects

[Diagram: Obesity, Hypertension, Mortality, with paths labelled “Indirect effect” and “Direct effect”]

Direct effect is the portion of the total effect that does not act via an intermediate cause

Simple causal graphs

[Diagram: nodes E, D, C]

Maternal age (C) can confound the association between multivitamin use (E) and the risk of certain birth defects (D)

Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

Complex causal graphs

[Diagram: nodes E, D, C, U]

History of birth defects (C) may increase the chance of periconceptional vitamin intake (E). A genetic factor (U) could have been the cause of previous birth defects in the family, and could again cause birth defects in the current pregnancy

Hernan MA, et al. Causal knowledge as a prerequisite for confounding evaluation: an application to birth defects epidemiology. Am J Epidemiol 2002;155(2):176-84.

More complicated causal graphs!

[Diagram: nodes A (Smoking), B (Physical Activity), C (BMI), D (Bone fractures), E (Calcium supplementation), U]

Source: Hertz-Picciotto

The ultimate complex causal graph!

[Figure: A PowerPoint diagram meant to portray the complexity of American strategy in Afghanistan!]

Third approach: Collapsibility and data-based approaches

● According to this definition, a factor is a confounding variable if
● a) the effect measure is homogeneous across the strata defined by the confounder and
● b) the crude and common stratum-specific (adjusted) effect measures are unequal (this is called “lack of collapsibility”)
● Usually evaluated using 2x2 tables, and simple stratified analyses to compare crude effects with adjusted effects

“Collapsibility is equality of stratum-specific measures of effect with the crude (collapsed), unstratified measure” Porta, 2008, Dictionary

Crude vs. Adjusted Effects

● Crude: does not take into account the effect of the confounding variable
● Adjusted: accounts for the confounding variable(s) (what we get by pooling stratum-specific effect estimates)
● Generated using methods such as the Mantel-Haenszel estimator
● Also generated using multivariate analyses (e.g. logistic regression)
● Confounding is likely when:
● RRcrude ≠ RRadjusted
● ORcrude ≠ ORadjusted

Stratified Analysis

1. Crude 2 x 2 table: calculate the crude OR (or RR): ORCrude
2. Stratify by confounder (Stratum 1, Stratum 2)
3. Calculate ORs for each stratum: OR1, OR2
4. If stratum-specific ORs are similar, calculate adjusted RR (e.g. MH)
5. If Crude OR ≠ Adjusted OR, confounding is likely. If Crude OR = Adjusted OR, confounding is unlikely

JC: introduce “test of homogeneity”

Examples: crude vs adjusted RR

Study   Crude RR   Stratum1 RR   Stratum2 RR   Adjusted RR   Confounding?
1         6.00        3.20          3.50          3.30
2         2.00        1.02          1.10          1.08
3         1.10        2.00          2.00          2.00
4         0.56        0.50          0.60          0.54
5         4.20        4.00          4.10          4.04
6         1.70        0.03          3.50
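The crude-versus-adjusted comparison in these examples can be sketched in code. The lecture treats “differs meaningfully” as a judgment call, so the 10% relative-change cutoff below is only an assumed rule of thumb, and `relative_change` and `assess` are hypothetical helper names.

```python
# (study, crude RR, (stratum 1 RR, stratum 2 RR), adjusted RR or None if none is given)
studies = [
    (1, 6.00, (3.20, 3.50), 3.30),
    (2, 2.00, (1.02, 1.10), 1.08),
    (3, 1.10, (2.00, 2.00), 2.00),
    (4, 0.56, (0.50, 0.60), 0.54),
    (5, 4.20, (4.00, 4.10), 4.04),
    (6, 1.70, (0.03, 3.50), None),
]

THRESHOLD = 0.10  # assumption: a common rule of thumb, not a value from the lecture

def relative_change(crude, adjusted):
    """Relative difference between the crude and adjusted estimates."""
    return abs(crude - adjusted) / adjusted

def assess(crude, adjusted):
    if adjusted is None:
        # Strata differ too much for a uniform estimate: report them separately
        return "no uniform estimate; examine the strata"
    if relative_change(crude, adjusted) > THRESHOLD:
        return "confounding likely"
    return "confounding unlikely"

results = {study: assess(crude, adjusted) for study, crude, _strata, adjusted in studies}
for study, verdict in results.items():
    print(study, verdict)
```

A different cutoff would change the verdict for borderline studies, which is why the final call remains a matter of biologic and clinical judgment.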

Fourth approach: Causality: counterfactual model

● Ideal “causal contrast” between exposed and unexposed groups:
● “A causal contrast compares disease frequency under two exposure distributions, but in one target population during one etiologic time period”
● If the ideal causal contrast is met, the observed effect is the “causal effect”

Maldonado & Greenland, Int J Epi 2002;31:422-29

What happens actually?

IDEAL: RRcausal = Iexp / Iunexp

ACTUAL: RRassoc = Iexp / Isubstitute

Ideal counterfactual comparison to determine causal effects

[Diagram: Exposed cohort (Iexp) compared with the counterfactual, unexposed cohort (Iunexp)]

RRcausal = Iexp / Iunexp

“A causal contrast compares disease frequency under two exposure distributions, but in one target population during one etiologic time period”

“Initial conditions” are identical in the exposed and unexposed groups, because they are the same population!

Maldonado & Greenland, Int J Epi 2002;31:422-29

What happens actually?

[Diagram: Exposed cohort (Iexp); counterfactual, unexposed cohort (Iunexp); substitute, unexposed cohort (Isubstitute)]

The counterfactual state is not observed

A substitute will usually be a population other than the target population during the etiologic time period - INITIAL CONDITIONS MAY BE DIFFERENT

Counterfactual definition of confounding

● “Confounding is present if the substitute population imperfectly represents what the target would have been like under the counterfactual condition”
● “An association measure is confounded (or biased due to confounding) for a causal contrast if it does not equal that causal contrast because of such an imperfect substitution”

RRcausal ≠ RRassoc

Maldonado & Greenland, Int J Epi 2002;31:422-29

Residual confounding

• Confounding can persist, even after adjustment
• Why?
– Not all confounders were adjusted for (unmeasured confounding)
– Some variables were actually not confounders!
– Confounders were measured with error (misclassification of confounders)
– Categories of the confounding variable are improperly defined (e.g. age categories were too broad)

Simulating the counterfactual comparison: Experimental Studies: RCT

[Diagram: Eligible patients, Randomization, Treatment or Placebo, Outcomes]

Randomization helps to make the groups “comparable” (i.e. similar initial conditions) with respect to known and unknown confounders

Therefore confounding is unlikely at randomization - time t0

Confounding: Methods to control or reduce confounding

• Methods used in study design to reduce confounding
– Randomization
– Restriction
– Matching
• Methods used in study analysis to reduce confounding
– Stratified analysis
– Multivariate analysis

Confounding: The use of randomization to reduce confounding

• Randomization
– Useful only for intervention studies
– Definition: random assignment of study subjects to exposure categories
– The special strength of randomization is its ability to control/reduce the effect of confounding variables of which the investigator is unaware
– If there is maldistribution of potentially confounding variables after randomization (the reason for the classic “Table I: Baseline characteristics” in the randomized trial) then other confounding control options (see below) are applied


Confounding: The use of restriction to reduce confounding

• Confounding cannot occur if the distribution of the potential confounding factors does not vary across exposure or disease categories
– Implication of this is that an investigator may restrict study subjects to only those falling within specific level(s) of a confounding variable
• Extreme example: an investigator only selects subjects of exactly the same age.
• Advantages of restriction
– straightforward, convenient, inexpensive

Confounding: The use of restriction to reduce confounding (cont.)

• Disadvantages
– May limit number of eligible subjects
– Residual confounding may persist if restriction categories are not sufficiently narrow (e.g. “decade of age” might be too broad)
– Not possible to evaluate the relationship of interest at different levels of the confounder
• Question: How does restriction differ from matching?

Confounding: The use of matching to reduce confounding

• Subjects with all levels of a potential confounder are admitted into the study BUT the control/reference subjects (either with respect to exposure in a cohort or disease in a case-reference study) are chosen to have the same distribution of the potential confounder
• The use of matching (may) also require special analysis techniques (matched analyses and conditional logistic regression)

• Disadvantages of matching
– Finding appropriate control/reference subjects may be difficult and expensive and limit sample size
– Matching is most often used in case-reference (i.e. case-control) studies because in a large cohort study the cost of matching may be prohibitive
• Thus, in cohort studies it’s often cheaper to just enroll available controls and use analytic methods (below) to control confounding; this doesn’t apply to computerized “free” data

Confounding: The use of matching to reduce confounding (cont.)

• Disadvantages of matching (cont.)
– Confounding factor used to match subjects cannot be itself evaluated with respect to the outcome/disease
– Obviously, matching does not control for confounding by factors other than that used to match
– The use of matching makes the use of stratified analysis (for the control of other potential but non-matched factors) very difficult
• One way around this problem is the use of conditional logistic regression but there is a large reduction in “effective” sample size because only discordant pairs are used.

• Advantages of matching
– Matching may be the only way to obtain sufficient numbers of control/reference subjects with relevant levels of the confounding factor(s)
– Example: controlling for “neighborhood” (and all that it implies) by any approach other than matching is very difficult

• Advantages of matching (cont.)
– Useful in very small studies in which chance differences in confounding factors are likely to exist between the study groups and other forms of control for the confounders (such as stratification or multivariate adjustment) are not possible (because of the limited sample size)
– The full benefit of matching (in terms of the reduction of confounding) is obtained only if the proper form of matched analysis is used (to be reviewed later in the course)

Confounding: The use of stratification to reduce confounding

• Basic goal of stratification is to evaluate the relationship between the predictor (“cause”) and outcome (“effect”) variable in strata homogeneous with respect to potentially confounding variables
• For example, to examine the relationship between smoking and lung cancer while controlling for the potentially confounding effect of gender:
– Create a 2x2 table (smoking vs. lung cancer) for men and women separately
– To control for multiple confounders simultaneously, stratify by pairs (or triplets or higher) of confounding factors. For example, to control for gender and race/ethnicity determine the OR for smoking vs. lung cancer in multiple strata: white women, black women, Hispanic women, white men, black men, Hispanic men, etc.

• (From the earlier example): Goal: create a summary or “adjusted” estimate for the relationship between matches and lung cancer while adjusting for the two levels of smoking (the potential confounder)
• This process is analogous to the standardization of rates earlier in the course; in those examples the purpose of adjustment was to remove the confounding effect of age on the relationship between populations (A vs. B etc.) and rates of disease or death.
• In the present example the goal is to remove the confounding effect of smoking on the relationship between matches and lung cancer.

Confounding: Types of summary estimators to determine uniform effect over strata

• Mantel-Haenszel
– We will use this estimator in the present course
– Resistant to the effects of small strata or cells with a value of “0”
– Computationally a piece of cake
• Directly pooled estimators (e.g. Woolf)
– Sensitive to small strata and cells with value “0”
– Computationally messy but doable
• Maximum likelihood
– The most “appropriate” estimator
– Resistant to the effects of small strata or cells with a value of “0”
– Computationally challenging



An aside: Terminology

• Pooled = combined = collapsed = unadjusted
• Adjusted = summary = weighted, etc.
– All of these reflect some adjustment process such as Mantel-Haenszel or Woolf or maximum likelihood estimation to weight the strata and develop confidence intervals about the estimate.

Confounding: Notation used in Mantel-Haenszel estimators of relative risk

• Notation for case-control or cohort studies with count data

              Cases    Controls   Total
Exposed         a         b       a + b
Nonexposed      c         d       c + d
Total         a + c     b + d     a + b + c + d = T

Case-control: $RR = OR = ad / bc$

Cohort: $RR = \frac{I_e}{I_0} = \frac{a / (a + b)}{c / (c + d)}$

Confounding: Notation used in Mantel-Haenszel estimators of relative risk (cont.)

• Notation for cohort studies with person-time data

              Cases    Person-years
Exposed         a         PY1
Nonexposed      c         PY0
Total         a + c        T

$RR = \frac{I_e}{I_0} = \frac{a / PY_1}{c / PY_0}$

Confounding: Mantel-Haenszel estimators of relative risk for stratified data

Case-Control Study:

$RR_{MH} = \frac{\sum_i (ad / T)_i}{\sum_i (bc / T)_i}$

Cohort Study with Count Denominators:

$RR_{MH} = \frac{\sum_i \{a(c + d) / T\}_i}{\sum_i \{c(a + b) / T\}_i}$

Cohort Study with Person-years Denominators:

$RR_{MH} = \frac{\sum_i \{a(PY_0) / T\}_i}{\sum_i \{c(PY_1) / T\}_i}$


Pooled         Cancer   No cancer
Matches          820       340
No matches       180       660

Smokers        Cancer   No cancer
Matches          810       270
No matches        90        30

Non-smokers    Cancer   No cancer
Matches           10        70
No matches        90       630

• ORpooled = 8.84 (7.2, 10.9)
• ORsmokers = 1.0 (0.6, 1.5)
• ORnonsmokers = 1.0 (0.5, 2.0)

Confounding: Mantel-Haenszel estimators of relative risk for stratified data (smoking, matches, lung cancer)

RRMH = ∑(ad / T)i / ∑(bc / T)i

Numerator of MH estimator:
• For smokers: (ad/T) = (810*30)/1200 = 20.25
• For nonsmokers: (ad/T) = (10*630)/800 = 7.88
• Add these together: 20.25 + 7.88 = 28.13 (numerator)

Denominator of MH estimator:
• For smokers: (bc/T) = (270*90)/1200 = 20.25
• For nonsmokers: (bc/T) = (90*70)/800 = 7.88
• Add these together: 20.25 + 7.88 = 28.13 (denominator)

• ORMH = 28.13 / 28.13 = 1.0 (as expected since both stratified ORs were = 1.0)
• Be sure to try this on stratified data in which the two strata are not exactly equal to each other (but also not so different as to suggest that effect modification is present)
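The hand calculation above can be reproduced with a short function. This is a minimal sketch of the case-control Mantel-Haenszel odds ratio; the function name `mantel_haenszel_or` is hypothetical, and the second data set is invented purely to follow the suggestion of trying strata that are similar but not identical.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel odds ratio for a list of (a, b, c, d) tables.

    a = exposed cases, b = exposed controls,
    c = unexposed cases, d = unexposed controls.
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator

# Matches and lung cancer, stratified by smoking (data from the slides)
matches_strata = [
    (810, 270, 90, 30),   # smokers, T = 1200
    (10, 70, 90, 630),    # non-smokers, T = 800
]
or_mh = mantel_haenszel_or(matches_strata)
print(f"OR_MH = {or_mh:.2f}")

# Hypothetical strata with similar but unequal odds ratios (about 2.67 and 2.70)
hypothetical_strata = [
    (40, 60, 20, 80),
    (30, 20, 25, 45),
]
or_mh_hyp = mantel_haenszel_or(hypothetical_strata)
print(f"OR_MH (hypothetical) = {or_mh_hyp:.2f}")
```

For the hypothetical data the summary estimate falls between the two stratum-specific odds ratios, as a weighted average should.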

Confounding: Interpretation of ORMH

• If ORMH (=1.0 in this example) “differs meaningfully” from ORunadjusted (=8.8 in this example) then confounding is present
• What does “differs meaningfully” mean?
– This is a matter of judgment based on biologic/clinical sense rather than on a statistical test
– Even if they “differ” only slightly, generally the ORMH rather than the ORcombined is reported as the summary effect estimate
• But what is one disadvantage of reporting ORMH?
– Although there do exist statistical tests of confounding they are not widely recommended (these tests evaluate H0: ORMH = ORunadjusted)

67

JC: test of homogeneity

2014

Hennekens, 1987, p305


Review what the X^2 means in this context.


Direction of Confounding Bias

• Confounding “pulls” the observed association away from the true association
– It can either exaggerate/over-estimate the true association (positive confounding)
• Example
– RRcausal = 1.0
– RRobserved = 3.0
or
– It can hide/under-estimate the true association (negative confounding)
• Example
– RRcausal = 3.0
– RRobserved = 1.0

Confounding: Summary of steps to evaluate confounding

Table 12-10. Steps for the control of confounding and the evaluation of effect modification through stratified analysis
1. Stratify by levels of the potential confounding factor.
2. Compute stratum-specific unconfounded relative risk estimates.
3. Evaluate similarity of the stratum-specific estimates by either eyeballing or performing test of statistical significance. (More on this step later)
4. If the effect is thought to be uniform, calculate a pooled unconfounded summary estimate using RRMH. If effect is not uniform (i.e. effect modification is present), skip to step 6.
5. Perform hypothesis testing on the unconfounded estimate, using Mantel-Haenszel chi-square and compute confidence interval.
6. If effect is not thought to be uniform (i.e., if effect modification is present):
a. Report stratum-specific estimates, results of hypothesis testing, and confidence intervals for each estimate
b. If desired, calculate a summary unconfounded estimate using a standardized formula


Effect modification (Interaction)

• Goals of stratification of data
– Evaluate and reduce/remove confounding
– Evaluate and describe effect modification
• Description of effect modification
– A change in the magnitude of an effect measure (between exposure and disease) according to the level of some third variable
– What two “classes” of effect measures have we used so far in the course?

Effect modification: example #1

• Disease incidence by exposure and age
– Does the relationship between exposure and disease change over the value of the potential confounder (age)? How?

Effect modification: example #2

• Disease incidence by exposure and age
• Does the relationship between exposure and disease change over the value of the potential confounder (age)? How?

Rothman ’86 (p 178)

Effect modification: contrast with confounding

• Confounding
– A bias that an investigator hopes to remove
– A nuisance that may or may not be present in a given study design
• Properties of a confounding variable (Rothman, p123):
– a) be a risk factor for disease among the non-exposed;
– b) be associated with the exposure variable; and
– c) not be an intermediate step in the “causal pathway”


• Effect modification
– A more detailed description of the “true” relationship between the exposure and the outcome
– Effect modification is a finding to be reported (even celebrated), not a bias to be eliminated
– Effect modification is a “natural phenomenon” that exists independently of the study design
– The presence and interpretation of effect modification depends upon the choice of effect measure (ratio vs. difference)

Some lingo

• Covariate
– Confounder, potential confounder
– Effect modification, interaction
– Intermediate variable


• Note that for any association under study, a given factor may be:
– Both a confounder and an effect modifier or
– A confounder but not an effect modifier or
– An effect modifier but not a confounder or
– neither

Examples of confounding/effect modification

Level 1   Level 2   Crude (“unadjusted”)   ORMH (“adjusted”)   Confounding present   Interaction present
4.0       4.0       4.0                    4.0                 NO                    NO
4.0       0.25      1.0                    1.0                 NO                    YES
1.0       1.0       8.4                    1.0                 YES                   NO
4.0       0.25      1.0                    2.0                 YES (?relevance)      YES

Crude = collapsed = combined; ORMH = uniform estimate.

Effect modification: test of homogeneity

• Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect (such as a Mantel-Haenszel estimator)
• Notation:
– $\chi^2_{(N-1)}$ is chi-square with (N-1) degrees of freedom;
– N is the number of strata (N=2 in our smoking/matches example);
– $\ln \hat{R}_i$ is the natural logarithm of the estimated (hence the “^”) effect measure for each stratum (ORi in our example);
– $\ln \hat{R}$ is the natural logarithm of the uniform effect estimate (e.g. ORMH in our example; the computer will use the maximum likelihood estimate)
• One formula to test homogeneity:

$$\chi^2_{(N-1)} = \sum_{i=1}^{N} \frac{[\ln(\hat{R}_i) - \ln(R_{MH})]^2}{\mathrm{Var}[\ln(\hat{R}_i)]}$$

JC: Comment on choice of significance level for test of homogeneity
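The homogeneity statistic can be sketched for stratified 2x2 tables as follows. The lecture does not say how Var[ln(ORi)] is estimated, so this sketch assumes Woolf's estimate (the sum of reciprocal cell counts); `homogeneity_chi2` is a hypothetical name, and the p-value shortcut via `math.erfc` is valid only for 1 degree of freedom (two strata).

```python
import math

def homogeneity_chi2(strata, uniform_or):
    """Chi-square statistic comparing stratum-specific ORs with a uniform estimate.

    strata: list of (a, b, c, d) tables; uniform_or: e.g. the Mantel-Haenszel OR.
    Returns (chi-square statistic, degrees of freedom).
    """
    log_uniform = math.log(uniform_or)
    chi2 = 0.0
    for a, b, c, d in strata:
        log_or_i = math.log((a * d) / (b * c))
        var_i = 1 / a + 1 / b + 1 / c + 1 / d  # assumption: Woolf variance of ln(OR_i)
        chi2 += (log_or_i - log_uniform) ** 2 / var_i
    return chi2, len(strata) - 1

# Smoking/matches example: both stratum ORs and OR_MH equal 1.0
strata = [(810, 270, 90, 30), (10, 70, 90, 630)]
chi2, df = homogeneity_chi2(strata, uniform_or=1.0)

# For df = 1 only, the upper-tail chi-square probability is erfc(sqrt(chi2 / 2))
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.3f}, df = {df}, p = {p_value:.3f}")
```

With identical stratum-specific odds ratios the statistic is 0 and the p-value is 1, so there is no evidence against homogeneity in this example.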

Paradox

• If effect modification is present, a uniform estimator of effect (such as ORMH) cannot (or at least should not) be reported.

• However, in order to determine if effect modification is present, it is necessary to calculate the value of a uniform estimator of effect (such as ORMH) because it is needed in the calculation of the test of homogeneity.

79

2014

Effect modification: test of homogeneity (or is it heterogeneity?)

• Comments
– If the test of homogeneity is “significant” (=“reject homogeneity”) this is evidence that there is heterogeneity (i.e. no homogeneity) and that effect modification may be present.
• (Null hypothesis: The individual stratified estimates of the effect do not differ from some uniform estimate of effect)
– The choice of a significance level (e.g. p < 0.05) is somewhat open to interpretation.
• One “conservative” approach, because of inherent limitations in the power of the test of homogeneity, is to treat the data as if interaction is present for p < 0.20.
• In other words, one would rather err on the side of assuming that interaction is present (and reporting the stratified estimates of effect) than on reporting a uniform estimate that may not be true across strata.

UC Berkeley


Additive versus multiplicative scale effect modification

● Notation: RXZ
● No additive interaction if (R11 – R01) = (R10 – R00)
○ Rewrite as: (R11-R01)-(R10-R00)=0
● In words: Difference in risk for (X=1 vs. X=0) when Z=1 is equal to difference in risk for (X=1 vs. X=0) when Z=0
● Note: the values R11, R10, etc. are risks (not counts)

Additive versus multiplicative scale effect modification

● Notation: RXZ
● No multiplicative interaction if (R11/R01)=(R10/R00)
○ Rewrite as: (R11/R01)/(R10/R00)=1
● In words: Ratio of risks/rates when X=1 vs. X=0 when Z=1 is equal to ratio of risks/rates when X=1 vs. X=0 when Z=0

Effect modification is scale-dependent

• Evidence for effect modification/statistical interaction if the RR or the AR differs between two groups
• However, effect modification/statistical interaction is scale-dependent (the two statements below assume that both factors have an effect on risk)
– If you do not have interaction on the additive scale (AR is homogeneous) then you will have interaction on the multiplicative scale (RR must be heterogeneous)
– If you do not have interaction on the multiplicative scale (RR is homogeneous) then you will have interaction on the additive scale (AR must be heterogeneous)
– Note: It is common to have evidence of interaction on both scales.

Example

         Z=1   Z=0
X=1       60    50
X=0       20    10

● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (50-10) = 0
○ Interaction not present on the additive scale
● Relative scale: (60/20) / (50/10)=0.6
○ Interaction present on the relative scale

Example

         Z=1   Z=0
X=1       60    30
X=0       20    10

● No additive scale interaction if (R11-R01)-(R10-R00)=0
● No relative scale interaction if (R11/R01)/(R10/R00)=1
● Additive scale: (60-20) - (30-10) = 20
○ Interaction present on the additive scale
● Relative scale: (60/20) / (30/10)=1
○ Interaction not present on the relative scale
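Both worked examples can be checked with the two contrasts defined on the earlier slides. This is a small sketch; `interaction_contrasts` is a hypothetical helper, and its arguments follow the RXZ notation (first index X, second index Z).

```python
def interaction_contrasts(r11, r10, r01, r00):
    """Return the additive and multiplicative interaction contrasts for risks R_XZ."""
    additive = (r11 - r01) - (r10 - r00)          # 0 means no additive interaction
    multiplicative = (r11 / r01) / (r10 / r00)    # 1 means no multiplicative interaction
    return additive, multiplicative

# First example: R11=60, R10=50, R01=20, R00=10
add_1, mult_1 = interaction_contrasts(60, 50, 20, 10)
# Second example: R11=60, R10=30, R01=20, R00=10
add_2, mult_2 = interaction_contrasts(60, 30, 20, 10)

print(add_1, mult_1)
print(add_2, mult_2)
```

The first table shows no interaction on the additive scale but interaction on the relative scale; the second shows the reverse.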

Logistic Regression (time permitting)


’

• ORpooled = 21.0 (16.3, 27.1)

• ORmatches = 21.0 (10.5, 46.2)

• ORno matches = 21.0 (12.9, 34.7)

• Discuss your intuitions about the 95% CIs

Pooled          Cancer   No cancer
Smoking         900      300
No Smoking      100      700

Matches         Cancer   No cancer
Smoking         810      270
No Smoking      10       70

No matches      Cancer   No cancer
Smoking         90       30
No Smoking      90       630
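One way to build intuition about the interval widths is to compute them directly. The sketch below uses the Woolf (log-based) approximation, which is an assumption on my part: the slides do not say which method produced their intervals, and the Woolf limits come out close to, but not identical with, the values shown. The function name `odds_ratio_ci` is illustrative.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.96):
    """Odds ratio with a Woolf (log-based) confidence interval.

    a, b = exposed cases, exposed non-cases
    c, d = unexposed cases, unexposed non-cases
    """
    or_hat = (a * d) / (b * c)
    # Standard error of ln(OR): driven mostly by the smallest cell
    se_log_or = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log_or)
    upper = math.exp(math.log(or_hat) + z * se_log_or)
    return or_hat, lower, upper


pooled = odds_ratio_ci(900, 300, 100, 700)
matches = odds_ratio_ci(810, 270, 10, 70)
no_matches = odds_ratio_ci(90, 30, 90, 630)

for label, (or_hat, lower, upper) in [
    ("pooled", pooled),
    ("matches", matches),
    ("no matches", no_matches),
]:
    print(f"{label}: OR = {or_hat:.1f}, approx. 95% CI ({lower:.1f}, {upper:.1f})")
```

All three odds ratios are the same, but the interval is narrowest for the pooled table and widest for the matches stratum, whose smallest cell holds only 10 subjects.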

A brief introduction to logistic regression

Let X1 = smoking (1=yes; 0=no)
Let X2 = matches (1=yes; 0=no)
Let Cancer = cancer (1=yes; 0=no)

Recall earlier tables:

Collapsed (OR=21.0)
        Cancer=1   Cancer=0
X1=1    900        300
X1=0    100        700

X2=1 (OR=21.0)                       X2=0 (OR=21.0)
        Cancer=1   Cancer=0                  Cancer=1   Cancer=0
X1=1    810        270               X1=1    90         30
X1=0    10         70                X1=0    90         630

Conclusions: No confounding by matches of the relationship between smoking and lung cancer; no effect modification by matches of the relationship between smoking and lung cancer

Data structure for computer analysis

• Most computer programs would want to see the data for the individual subjects in the study in the following form:

Subject ID   X1   X2   Cancer   How many?
A            1    1    1
B            1    1    0
C            0    1    1
D            0    1    0
E            1    0    1
F            1    0    0
G            0    0    1
H            0    0    0

Data structure for computer analysis

• Most computer programs would want to see the data for the individual subjects in the study in the following form:

Subject ID   X1   X2   Cancer   How many?
A            1    1    1        810 of these
B            1    1    0        270 of these
C            0    1    1        10 of these
D            0    1    0        70 of these
E            1    0    1        90 of these
F            1    0    0        30 of these
G            0    0    1        90 of these
H            0    0    0        630 of these
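The individual-level layout can be generated from the eight cell counts. The following is a minimal sketch in plain Python; the names `cell_counts` and `subjects`, and the dictionary keys, are illustrative and not tied to any particular statistics package.

```python
# (X1 smoking, X2 matches, Cancer) -> number of subjects with that pattern
cell_counts = {
    (1, 1, 1): 810,
    (1, 1, 0): 270,
    (0, 1, 1): 10,
    (0, 1, 0): 70,
    (1, 0, 1): 90,
    (1, 0, 0): 30,
    (0, 0, 1): 90,
    (0, 0, 0): 630,
}

# Expand to one record per subject, the form most programs expect
subjects = [
    {"x1": x1, "x2": x2, "cancer": cancer}
    for (x1, x2, cancer), n in cell_counts.items()
    for _ in range(n)
]

total_subjects = len(subjects)
smoker_cases = sum(1 for s in subjects if s["x1"] == 1 and s["cancer"] == 1)
print(total_subjects, smoker_cases)
```

Counting smokers with cancer in the expanded data recovers the 900 in the collapsed table, which is a quick check that the expansion preserved the counts.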


The basic logistic equation for this problem

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2

• ln (odds of disease) = a + b1(smoking) + b2(matches) + b3(smoking)(matches)


Solving a logistic equation

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln ( ) =
• a =
• So now: ln (odds) =

X2=1 (OR=21.0)                       X2=0 (OR=21.0)
        Cancer=1   Cancer=0                  Cancer=1   Cancer=0
X1=1    810        270               X1=1    90         30
X1=0    10         70                X1=0    90         630

Solving a logistic equation

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 0, solve for “a”
• ln (odds) = a = ln (90/630) = -1.946
• a = -1.946
• So now: ln (odds) = -1.946 + b1X1 + b2X2 + b3X1X2

Solving a logistic equation (cont.)

• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) =
• b1 =
• So now: ln (odds) =

X2=1 (OR=21.0)                       X2=0 (OR=21.0)
        Cancer=1   Cancer=0                  Cancer=1   Cancer=0
X1=1    810        270               X1=1    90         30
X1=0    10         70                X1=0    90         630

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 0, solve for b1
• ln (odds) = ln (90/30) = 1.099 = -1.946 + b1
• b1 = 3.045
• So now: ln (odds) = -1.946 + 3.045X1 + b2X2 + b3X1X2

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln ( ) =
• b2 =
• So now: ln (odds) =

X2=1 (OR=21.0)                       X2=0 (OR=21.0)
        Cancer=1   Cancer=0                  Cancer=1   Cancer=0
X1=1    810        270               X1=1    90         30
X1=0    10         70                X1=0    90         630

• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 0 and X2 = 1, solve for b2:
• ln (odds) = ln (10/70) = -1.946 = -1.946 + 0 + b2 + 0
• b2 = 0
• So now: ln (odds) = -1.946 + 3.045X1 + 0 + b3X1X2


• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) =
• ln (odds) =
• Solve for b3
• ln (odds) =
• b3 =
• So now: ln (odds) =

X2=1 (OR=21.0)                       X2=0 (OR=21.0)
        Cancer=1   Cancer=0                  Cancer=1   Cancer=0
X1=1    810        270               X1=1    90         30
X1=0    10         70                X1=0    90         630


• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• When X1 = 1 and X2 = 1 then:
• ln (odds) = -1.946 + b1 + b2 + b3
• ln (odds) = -1.946 + 3.045 + 0 + b3
• Solve for b3
• ln (odds) = ln (810/270) = 1.099 = -1.946 + 3.045 + b3
• b3 = 0
• So now: ln (odds) = -1.946 + 3.045X1 + 0 + 0
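The four solving steps can be reproduced from the cell counts. The sketch below follows the order used on the slides (a, then b1, then b2, then b3); the names `cells` and `log_odds` are illustrative. It works because the model has four coefficients for four exposure patterns, so each coefficient can be read off one cell at a time.

```python
import math

# Cell counts keyed by (X1 smoking, X2 matches): (cancer, no cancer)
cells = {
    (0, 0): (90, 630),
    (1, 0): (90, 30),
    (0, 1): (10, 70),
    (1, 1): (810, 270),
}


def log_odds(x1, x2):
    """Observed ln(odds of disease) in the cell with the given X1, X2."""
    cases, noncases = cells[(x1, x2)]
    return math.log(cases / noncases)


a = log_odds(0, 0)                    # X1 = 0, X2 = 0
b1 = log_odds(1, 0) - a               # X1 = 1, X2 = 0
b2 = log_odds(0, 1) - a               # X1 = 0, X2 = 1
b3 = log_odds(1, 1) - a - b1 - b2     # X1 = 1, X2 = 1

print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, b3 = {b3:.3f}")
```

Exponentiating b1 returns the odds ratio of 21.0 seen in the stratified tables.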


• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• This simplifies (earlier calculations) to: ln (odds) = -1.946 + 3.045X1 + 0 + 0
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches: ln (odds) =
• Calculate the ln(odds) for a smoker who doesn’t use matches: ln (odds) =
• Now calculate the odds ratio for smokers vs. non-smokers among those who use matches
• At home, calculate the odds ratio for smokers vs. non-smokers among those who do not use matches


• ln (odds of disease) = a + b1X1 + b2X2 + b3X1X2
• This simplifies (earlier calculations) to: ln (odds) = -1.946 + 3.045(X1) + 0(X2) + 0(X1X2)
• One can now use the logistic equation to efficiently describe relationships in the table
• Calculate the ln(odds) for a smoker who uses matches (X1 = 1 and X2 = 1): ln (odds) = -1.946 + 3.045 = 1.099
• Calculate the ln(odds) for a smoker who doesn’t use matches (X1 = 1 and X2 = 0): ln (odds) = -1.946 + 3.045 = 1.099
• Now calculate the odds ratio for smokers vs. non-smokers among those who use matches
• At home, calculate the odds ratio for smokers vs. non-smokers among those who do not use matches

Logistic Regression

Using the logistic model developed in class for the matches-smoking-lung cancer data (stratified by matches), evaluate the odds ratio for lung cancer comparing:

1. (in-class) A smoker who uses matches vs. a non-smoker who uses matches.

2. (at home) A smoker who uses matches vs. a non-smoker who does not use matches.

SEPARATE ASSIGNMENT

Develop a logistic model for the matches-smoking-lung cancer data (stratified by smoking status). Use this model to evaluate the odds ratio for lung cancer comparing:

1. (at home) A user of matches who smokes vs. a non-user of matches who smokes.

2. (at home) A smoker who uses matches vs. a non-smoker who uses matches. Is this result consistent with the one you arrived at in the in-class example above?

Find OR for smokers (who use matches) vs. non-smokers (who use matches)

For smokers who use matches: X1 = 1, X2 = 1

For non-smokers who use matches: X1 = 0, X2 = 1

From prior slides we determined that: ln (odds) = -1.946 + 3.045 (X1)

For smokers who use matches (X1 = 1; X2 = 1): ln (odds) = -1.946 + 3.045 (1) = 1.0990

For non-smokers who use matches (X1 = 0; X2 = 1): ln (odds) = -1.946 + 0 + 0 + 0 = -1.946

We want to solve:
ln OR = 1.0990 - (-1.946) = 3.045
e^(ln OR) = OR = e^3.045 = 21.0

Therefore, the odds ratio (determined using logistic regression) comparing smokers using matches to non-smokers using matches is 21.0. This agrees with the stratified data presented earlier.
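The same calculation can be written as a small function that compares any two exposure patterns. This is a sketch using the rounded coefficients from the slides; the names `ln_odds` and `odds_ratio` are illustrative.

```python
import math

# Coefficients from the fitted equation on the slides (rounded)
A, B1, B2, B3 = -1.946, 3.045, 0.0, 0.0


def ln_odds(x1, x2):
    """ln(odds of disease) for smoking status x1 and match use x2."""
    return A + B1 * x1 + B2 * x2 + B3 * x1 * x2


def odds_ratio(index, reference):
    """Odds ratio comparing two (x1, x2) patterns: e^(difference in ln odds)."""
    return math.exp(ln_odds(*index) - ln_odds(*reference))


# In-class comparison: smoker who uses matches vs. non-smoker who uses matches
or_in_class = odds_ratio((1, 1), (0, 1))
print(f"OR = {or_in_class:.1f}")
```

The at-home comparisons only require different `(x1, x2)` pairs passed to `odds_ratio`.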


• ORpooled = 21.0 (16.3, 27.1)

• ORmatches = 21.0 (10.5, 46.2)

• ORno matches = 21.0 (12.9, 34.7)

• Discuss your intuitions about the 95% CIs

Pooled          Cancer   No cancer
Smoking         900      300
No Smoking      100      700

Matches         Cancer   No cancer
Smoking         810      270
No Smoking      10       70

No matches      Cancer   No cancer
Smoking         90       30
No Smoking      90       630

Some concluding comments on logistic regression

• Interpretations of the final logistic equation for these data:

ln (odds of disease) = a + b1(smoking) + b2(matches) + b3(smoking)(matches)

ln (odds) = -1.946 + 3.045(smoking) + 0(matches) + 0(matches)(smoking)

• This equation describes the data whether stratified by matches or by smoking.

• The logistic equation can adjust for multiple variables simultaneously.

• The estimates of the coefficients for the equation are derived through maximum likelihood techniques.

• This technique is very widely used in epidemiologic (and other) applications when the outcome variable of interest is dichotomous.
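To give a sense of what "maximum likelihood techniques" involves, the sketch below fits the four-coefficient model to the grouped matches-smoking data by Newton-Raphson, using only the Python standard library. It is an illustration under stated assumptions, not a description of how any particular package works: the helper names (`design_row`, `solve_linear`, `fit_logistic`) and the fixed iteration count are my own choices.

```python
import math

# Grouped data: (X1 smoking, X2 matches, cancer cases, total subjects)
patterns = [
    (1, 1, 810, 1080),
    (0, 1, 10, 80),
    (1, 0, 90, 120),
    (0, 0, 90, 720),
]


def design_row(x1, x2):
    """Covariates for a + b1*X1 + b2*X2 + b3*X1*X2."""
    return [1.0, float(x1), float(x2), float(x1 * x2)]


def solve_linear(matrix, rhs):
    """Solve matrix @ x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    m = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_logistic(data, iterations=25):
    """Newton-Raphson maximization of the binomial log-likelihood."""
    beta = [0.0] * 4
    for _ in range(iterations):
        score = [0.0] * 4                      # gradient of the log-likelihood
        info = [[0.0] * 4 for _ in range(4)]   # information matrix
        for x1, x2, cases, total in data:
            x = design_row(x1, x2)
            eta = sum(b * xi for b, xi in zip(beta, x))
            p = 1.0 / (1.0 + math.exp(-eta))
            weight = total * p * (1.0 - p)
            for i in range(4):
                score[i] += x[i] * (cases - total * p)
                for j in range(4):
                    info[i][j] += weight * x[i] * x[j]
        step = solve_linear(info, score)
        beta = [b + s for b, s in zip(beta, step)]
    return beta


a, b1, b2, b3 = fit_logistic(patterns)
print(f"a = {a:.3f}, b1 = {b1:.3f}, b2 = {b2:.3f}, b3 = {b3:.3f}")
```

Because this model has as many coefficients as exposure patterns, the maximum likelihood estimates coincide with the values solved by hand on the earlier slides. With more covariates than can be tabulated, the iterative fit is the only practical route.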

Some concluding comments on logistic regression

• Comments
– Having multiple strata (how this technique makes it possible)
– Test of homogeneity (b3)
– Maximum likelihood estimation for coefficient estimation

• Modifications of logistic regression exist for coping with
– Outcome variables with multiple levels = polytomous logistic regression
– Studies in which matching was used = conditional logistic regression

. use http://www.stata-press.com/data/r8/lbw

              storage  display   value
variable name   type   format    label    variable label
-------------------------------------------------------------------------------
id              int    %8.0g              identification code
low             byte   %8.0g              birth weight<2500g
age             byte   %8.0g              age of mother
lwt             int    %8.0g              weight at last menstrual period
race            byte   %8.0g              race
smoke           byte   %8.0g              smoked during pregnancy
ptl             byte   %8.0g              premature labor history (count)
ht              byte   %8.0g              has history of hypertension
ui              byte   %8.0g              presence, uterine irritability
ftv             byte   %8.0g              number of visits to physician during 1st trimester
bwt             int    %8.0g              birth weight (grams)

http://www.stata-press.com/data/r8/lbw

Special (and very useful) STATA command “xi” (= “interaction expansion”)

• xi: logistic low age lwt i.race smoke ptl ht ui

• In this example, a variable named “race” has three levels (e.g. white/hispanic/black) that might be coded as “0=white”; “1=hispanic”; “2=black”

• The combined use of xi and i.race directs STATA to create indicator variables for the levels of race and compare each to the first (lowest-coded) level; this can be a HUGE time-saver (avoids the user having to manually recode such variables)!
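The recoding that xi saves the user from doing by hand amounts to creating one 0/1 indicator per non-reference level. The sketch below shows the idea in Python. The function `expand_indicators`, the `race_` column prefix, and the sample values are hypothetical illustrations and do not reproduce STATA's own variable names.

```python
def expand_indicators(values, prefix="race", reference=None):
    """Dummy-code a categorical variable, omitting the reference level."""
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]  # lowest-coded level serves as the reference
    kept = [level for level in levels if level != reference]
    return [
        {f"{prefix}_{level}": int(value == level) for level in kept}
        for value in values
    ]


# Hypothetical coding from the slide: 0=white, 1=hispanic, 2=black
race = [0, 1, 2, 1, 0]
rows = expand_indicators(race)
for value, row in zip(race, rows):
    print(value, row)
```

Subjects in the reference level have 0 on every indicator, so each fitted coefficient compares one level with the reference level.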

Assignments

• Write the logistic model describing these data (next slide).
• What is the risk of low birth weight (LBW) for a smoker, adjusted for all other variables?
• How can the 95% CI be determined?
• What is the risk of LBW for an Hispanic baby (compared to a white baby)?
• What is the risk of LBW for a black baby (compared to an Hispanic baby)?

Discuss intercept
