
Post on 19-Dec-2015


Confounding and Interaction: Part III

• When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are:
– Intermediary Variable
– Effect Modifier
– Confounder
– No Effect

• Using Stratification to Form “Adjusted” Summary Estimates to Evaluate Presence of Confounding
– Concept of weighted average
  • Woolf’s Method
  • Mantel-Haenszel Method
– Avoid statistical testing
– Handling more than one potential confounder

• Limitations of Stratification to Adjust for Confounding
– the motivation for multivariable regression

When Assessing the Association Between an Exposure and a Disease, What are the Possible Effects of a Third Variable?

[Diagram: the exposure, the disease (D), and a third variable shown in each of four roles]

• Intermediary Variable (I)
• Effect Modifier (+ / _): Interaction, MODIFIES THE EFFECT OF THE EXPOSURE
• Confounding (C): ANOTHER PATHWAY TO GET TO THE DISEASE
• No Effect


What are the Possible Effects of a 3rd Variable?

• Intermediary Variable

• Effect Modifier (interaction)

• Confounder

• No Effect

[Flowchart, read as a sequence of decisions]

1. Intermediary Variable (conceptual decision)?
– yes: Report Crude Estimate
– no: go to 2

2. Effect Modifier?
– yes: Report stratum-specific estimates
– no: go to 3

3. Confounder?
– yes: Report “adjusted” summary estimate
– no: No Effect: Report Crude Estimate

Interaction with a Third Variable

Crude

              Delayed   Not Delayed
Smoking          26         133
No Smoking       64         601

RR crude = 1.7

Stratified

No Caffeine Use

              Delayed   Not Delayed
Smoking          15          61
No Smoking       47         528

RR no caffeine use = 2.4

Heavy Caffeine Use

              Delayed   Not Delayed
Smoking          11          72
No Smoking       17          73

RR caffeine use = 0.7
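The three relative risks on this slide can be checked directly from the cell counts. The short Python sketch below is only an illustration (the function name `risk_ratio` and the dictionary layout are assumptions, not part of the lecture); it treats each table as exposed row first, outcome column first.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio for a 2x2 table.

    a, b = exposed with / without the outcome
    c, d = unexposed with / without the outcome
    """
    risk_exposed = a / (a + b)
    risk_unexposed = c / (c + d)
    return risk_exposed / risk_unexposed


# Smoking (exposure) and delayed conception (outcome), counts from the slide
tables = {
    "crude": (26, 133, 64, 601),
    "no caffeine use": (15, 61, 47, 528),
    "heavy caffeine use": (11, 72, 17, 73),
}

rr = {name: risk_ratio(*cells) for name, cells in tables.items()}
for name, value in rr.items():
    print(f"RR {name} = {value:.1f}")
```

The stratum-specific values fall on opposite sides of 1, which is the pattern the slide labels as interaction.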

Is Interaction Present?

• Does the relationship between the exposure and the outcome vary meaningfully (in a clinical/biologic sense) across strata of the third variable?

• Does an average (adjusted) effect (formed by averaging the strata formed on the basis of the third variable) reasonably represent all strata?
– if yes, go on to form an average (adjusted) measure
– if no, stop - this is interaction; report stratum-specific estimates

Declare vs Ignore Interaction?

Relative Risks for a Given Exposure and Disease, by Potential Effect Modifier

Modifier   Modifier   P value for      Declare or Ignore
Present    Absent     heterogeneity    Interaction
  2.3        2.6         0.45          Ignore
  2.3        2.6         0.001         Ignore
  2.0       20.0         0.001         Declare
  2.0       20.0         0.20          Declare
  2.0       20.0         0.50          Defer to prior prob.
  3.0        4.5         0.30          Ignore
  3.0        4.5         0.001         +/-
  0.5        3.0         0.001         Declare
  0.5        3.0         0.20          Declare
  0.5        3.0         0.30          +/-

No Effect of Third Variable

Crude

              Lung Ca   No Lung Ca
Smoking         900        300
No Smoking      100        700

OR crude = 21.0 (95% CI: 16.4 - 26.9)

Stratified

Matches Present

              Lung Ca   No Lung Ca
Smoking         810        270
No Smoking       10         70

OR matches = 21.0

Matches Absent

              Lung Ca   No Lung Ca
Smoking          90         30
No Smoking       90        630

OR no matches = 21.0

OR adj = 21.0 (95% CI: 14.2 - 31.1)

Confounding by Third Variable

Crude

              Lung Ca   No Lung Ca
Matches         820        340
No Matches      180        660

OR crude = 8.8

Stratified

Smokers

              Lung Ca   No Lung Ca
Matches         810        270
No Matches       90         30

OR smokers = 1.0

Non-Smokers

              Lung Ca   No Lung Ca
Matches          10         70
No Matches       90        630

OR non-smokers = 1.0

OR adj = 1.0
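To see the confounding numerically, the crude and stratum-specific odds ratios can be recomputed from the counts on this slide. This is a minimal Python sketch; the helper name `odds_ratio` is an assumption made for illustration.

```python
def odds_ratio(a, b, c, d):
    """Cross-product odds ratio: a, b = exposed row; c, d = unexposed row."""
    return (a * d) / (b * c)


# Matches (exposure) and lung cancer (outcome), counts from the slide
or_crude = odds_ratio(820, 340, 180, 660)
or_smokers = odds_ratio(810, 270, 90, 30)
or_non_smokers = odds_ratio(10, 70, 90, 630)

print(f"OR crude       = {or_crude:.1f}")
print(f"OR smokers     = {or_smokers:.1f}")
print(f"OR non-smokers = {or_non_smokers:.1f}")
```

The crude association disappears within each smoking stratum, which is why the adjusted estimate is 1.0.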

Forming an Adjusted Summary Estimate

Crude

                  Down’s   No Down’s
Spermicide Use       4        109
No Spermicide       12       1145

OR crude = 3.5

Stratified

Age < 35

                  Down’s   No Down’s
Spermicide           3        104
No Spermicide        9       1059

OR age < 35 = 3.4

Age >= 35

                  Down’s   No Down’s
Spermicide           1          5
No Spermicide        3         86

OR age >= 35 = 5.7

Test of homogeneity: p = 0.71

Assuming Interaction is not Present, Form a Summary of the Unconfounded Stratum-Specific Estimates

• Construct a weighted average
– Assign weights to the individual strata
– Summary Estimate = Weighted Average of the stratum-specific estimates

\[ \text{Summary estimate} = \frac{\sum_i w_i \, (\text{effect estimate in stratum } i)}{\sum_i w_i} \]

– a simple mean is a weighted average where the weights are equal to 1
– which weights to use depends on type of effect estimate desired (OR, RR, RD) and characteristics of the data
– e.g.
  • Woolf’s method
  • Mantel-Haenszel method

Right. We need to assign a weight to each stratum and then perform a weighted average.

How do we decide on a weight?

Hopefully the concept of a weighted average is understood by everyone. A simple mean is in fact a weighted average where the weights equal one. To get the average height of everyone in class, we add up everyone’s height and divide by the number of persons contributing. The weight is one.

The second approach to getting a summary estimate is actually the one used by multivariable modeling approaches, and we will touch on this briefly today. It is called the maximum likelihood approach.
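As a small illustration of the weighted-average idea, the Python sketch below computes a simple mean as the special case where every weight is one, then shows how unequal weights pull the average toward the heavier item. The heights are made-up numbers, and the function name is an assumption.

```python
def weighted_average(estimates, weights):
    """Sum of weight * estimate, divided by the sum of the weights."""
    return sum(w * x for w, x in zip(weights, estimates)) / sum(weights)


# Hypothetical heights (cm) for three people in the class
heights_cm = [160.0, 170.0, 180.0]

# All weights equal to one: this is just the simple mean
simple_mean = weighted_average(heights_cm, [1, 1, 1])

# Giving the third value twice the weight pulls the average toward it
tilted_mean = weighted_average(heights_cm, [1, 1, 2])

print(simple_mean, tilted_mean)
```

Woolf's and Mantel-Haenszel's methods differ only in how the weights are chosen.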

Forming a Summary Estimate for Stratified Data

• Goal:
– Create a summary “adjusted” estimate for the relationship in question while adjusting for the potential confounder
– e.g.:
  • Case-control study of post-exposure AZT use in preventing HIV seroconversion after needlestick (NEJM 1997)

Crude

            HIV   No HIV
AZT           8     131
No AZT       19     189
Total        27     320     347

OR crude = 0.61 (95% CI: 0.26 - 1.4)

After we have formed our strata and gotten rid of confounding, how do we summarize what the unconfounded estimates from the two or more strata are telling us? In the examples of last week, the measures of association from the different strata were identical. This is seldom the case.

The goal is to combine or average the results from the different strata into one summary estimate. Any thoughts on how to do this?

A more realistic example is described in the Rothman chapter regarding the question of whether spermicide use might cause Down’s.

Post-exposure prophylaxis with AZT after a needlestick

[Diagram relating three variables: AZT Use, HIV, Severity of Exposure]

Forming a Summary Estimate for Stratified Data

• Potential confounder: severity of exposure

Crude

            HIV   No HIV
AZT           8     131
No AZT       19     189
Total        27     320     347

OR crude = 0.61

Stratified

Minor Severity

            HIV   No HIV
AZT           0      91
No AZT        3     161
Total         3     252     255

OR = 0.0

Major Severity

            HIV   No HIV
AZT           8      40
No AZT       16      28
Total        24      68      92

OR = 0.35

. cc HIV AZTuse, by(severity)

        severity |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           minor |        0              0    2.302373     1.070588
           major |      .35       .1344565    .9144599     6.956522
-----------------+-------------------------------------------------
Test of homogeneity (B-D)  chi2(1) = 0.60  Pr>chi2 = 0.4400

To stratify the subjects into those women with maternal age less than 35 and those with maternal age >= 35, you add a “by(matage)” option. If you add a “, pool” option as I have here, the program will give you not only the default MH summary but also the Woolf estimate.

Finally, you are already familiar with this command, but for the sake of comparison let’s look at the summary estimate as obtained by logistic regression, which as you know uses the MLE approach. As you can see, the MH estimate is essentially identical to the MLE in this problem.


Summary Estimators: Woolf’s Method

• aka Directly pooled or precision estimator
• Woolf’s estimate for odds ratio

\[ \log \mathrm{OR}_{Woolf} = \frac{\sum_i w_i \, \log(\mathrm{OR}_i)}{\sum_i w_i}, \qquad \mathrm{OR}_{Woolf} = e^{\log \mathrm{OR}_{Woolf}} \]

– where

\[ w_i = \frac{1}{\frac{1}{a_i} + \frac{1}{b_i} + \frac{1}{c_i} + \frac{1}{d_i}} \]

– wi is the inverse of the variance of the stratum-specific log(odds ratio)

              Disease   No Disease
Exposed         ai         bi
Unexposed       ci         di

One of the first approaches developed for forming summary adjusted estimates was Woolf’s method.

This is the inverse of the variance of the log odds ratio. This makes sense: the more precise strata have the smallest variances, and the inverse of a small number is a large number.

Calculating a Summary Effect Using the Woolf Estimator

• e.g. AZT use, severity of needlestick, and HIV

Crude: OR crude = 0.61
Minor Severity: AZT 0 / 91, No AZT 3 / 161 (HIV / No HIV); OR = 0.0
Major Severity: AZT 8 / 40, No AZT 16 / 28 (HIV / No HIV); OR = 0.35

\[ \log \mathrm{OR}_{Woolf} = \frac{ \left[ \dfrac{1}{\frac{1}{0} + \frac{1}{91} + \frac{1}{3} + \frac{1}{161}} \right] \log(0) + \left[ \dfrac{1}{\frac{1}{8} + \frac{1}{40} + \frac{1}{16} + \frac{1}{28}} \right] \log(0.35) }{ \dfrac{1}{\frac{1}{0} + \frac{1}{91} + \frac{1}{3} + \frac{1}{161}} + \dfrac{1}{\frac{1}{8} + \frac{1}{40} + \frac{1}{16} + \frac{1}{28}} } \]

Summary Estimators: Woolf’s Method

• Conceptually straightforward although computationally messy

• Best when:
– number of strata is small
– sample size within each stratum is large

• Cannot be calculated when any cell in any stratum is zero because log(0) is undefined
– 1/2 cell corrections have been suggested but are subject to bias

• Formulae for Woolf’s summary estimates for other measures (RR, RD, AR) available in texts and software documentation
– sensitive to small strata, cells with “0”
– computationally messy

It seems most reasonable to weight each stratum according to how sure you are of the inference, and the variance of the estimate is the best measure we have for this.

I discuss this approach first not only because it was one of the first proposed but also because it is the most conceptually straightforward.

In the days before computers, this was considered computationally messy, such that other, easier methods were sought.
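A minimal Python sketch of Woolf's estimator follows, assuming strata are passed as (a, b, c, d) tuples; the function name and the choice to raise `ValueError` on a zero cell are illustrative assumptions, not the behaviour of any particular package. It is applied to the two maternal-age strata of the spermicide example and to the AZT strata, where the zero cell in the minor-severity stratum makes the estimate undefined.

```python
import math


def woolf_or(strata):
    """Woolf (directly pooled) summary odds ratio.

    strata: list of (a, b, c, d) tuples, one 2x2 table per stratum.
    Each weight is the inverse variance of the stratum log odds ratio.
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for a, b, c, d in strata:
        if 0 in (a, b, c, d):
            # log(OR) or its variance is undefined with an empty cell
            raise ValueError("Woolf's estimate is undefined when a cell is zero")
        weight = 1.0 / (1 / a + 1 / b + 1 / c + 1 / d)
        weighted_sum += weight * math.log((a * d) / (b * c))
        weight_total += weight
    return math.exp(weighted_sum / weight_total)


# Spermicide and Down's, stratified by maternal age (< 35, >= 35)
spermicide = [(3, 104, 9, 1059), (1, 5, 3, 86)]
woolf_spermicide = woolf_or(spermicide)
print(f"Woolf OR, spermicide example: {woolf_spermicide:.2f}")

# AZT and HIV, stratified by severity (minor, major): minor has a zero cell
azt = [(0, 91, 3, 161), (8, 40, 16, 28)]
try:
    woolf_or(azt)
    azt_defined = True
except ValueError as err:
    azt_defined = False
    print(f"AZT example: {err}")
```

For the spermicide strata this agrees with the "Pooled (direct)" row that Stata prints with the pool option.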

Summary Estimators: Mantel-Haenszel

• Mantel-Haenszel estimate for odds ratios

\[ \mathrm{OR}_{MH} = \frac{\sum_i \dfrac{a_i d_i}{N_i}}{\sum_i \dfrac{b_i c_i}{N_i}} = \frac{\sum_i \dfrac{b_i c_i}{N_i} \cdot \dfrac{a_i d_i}{b_i c_i}}{\sum_i \dfrac{b_i c_i}{N_i}} \]

\[ w_i = \frac{b_i c_i}{N_i} \]

– wi is inverse of the variance of the stratum-specific odds ratio under the null hypothesis (OR = 1)

              Disease   No Disease
Exposed         ai         bi
Unexposed       ci         di

ai + bi + ci + di = Ni

A more robust approach is the Mantel-Haenszel method.

Again, using the same cell definitions, the M-H estimate for the summary OR is the sum of a times d divided by T (written N on the slide) divided by the sum of . . .

If we decompose this slightly, we can see that the weight for each stratum is actually b times c divided by T. This is actually the inverse of the . . .

And by the same logic as before, strata with the smallest variance get the most weight.


Summary Estimators: Mantel-Haenszel

• Mantel-Haenszel estimate for odds ratios

– resistant to the effects of large numbers of strata with few observations

– resistant to cells with a value of “0”

– computationally easy

– most commonly used

The MH is the most commonly used estimator.

It is fairly resistant (i.e., it doesn’t blow up) . . .

Although really not a factor in the computer era, the computation of the MH estimator is a breeze.

More important, the M-H estimate closely approximates the MLE, which is generally regarded as the most accurate.

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• e.g. AZT use, severity of needlestick, and HIV

Crude: OR crude = 0.61
Minor Severity: AZT 0 / 91, No AZT 3 / 161 (HIV / No HIV), N = 255; OR = 0.0
Major Severity: AZT 8 / 40, No AZT 16 / 28 (HIV / No HIV), N = 92; OR = 0.35

\[ \mathrm{OR}_{MH} = \frac{\sum_i \dfrac{a_i d_i}{N_i}}{\sum_i \dfrac{b_i c_i}{N_i}} = \frac{\sum_i \dfrac{b_i c_i}{N_i} \cdot \dfrac{a_i d_i}{b_i c_i}}{\sum_i \dfrac{b_i c_i}{N_i}} \]

\[ \mathrm{OR}_{MH} = \frac{\dfrac{0 \times 161}{255} + \dfrac{8 \times 28}{92}}{\dfrac{91 \times 3}{255} + \dfrac{40 \times 16}{92}} = 0.30 \]
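The same arithmetic can be sketched in a few lines of Python. The function name `mantel_haenszel_or` and the tuple layout (a, b, c, d) are assumptions for illustration; the counts are the AZT strata from the slide.

```python
def mantel_haenszel_or(strata):
    """Mantel-Haenszel summary odds ratio.

    strata: list of (a, b, c, d) tuples, one 2x2 table per stratum.
    Numerator sums a*d/N, denominator sums b*c/N (the M-H weights).
    """
    numerator = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    denominator = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return numerator / denominator


# AZT and HIV, stratified by severity of exposure (minor, major)
azt = [(0, 91, 3, 161), (8, 40, 16, 28)]

# The per-stratum weights b*c/N
mh_weights = [b * c / (a + b + c + d) for a, b, c, d in azt]
or_mh = mantel_haenszel_or(azt)

print("M-H weights:", [round(w, 6) for w in mh_weights])
print(f"OR_MH = {or_mh:.2f}")
```

Unlike Woolf's method, the zero cell in the minor-severity stratum causes no problem here: that stratum simply contributes nothing to the numerator.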

Calculating a Summary Effect in Stata

• epitab command - Tables for epidemiologists
– see Reference manual A-G

• To produce crude estimates and 2 x 2 tables:
– For cross-sectional or cohort studies:
  • cs var_case var_exposed
– For case-control studies:
  • cc var_case var_exposed

• To stratify by a third variable:
– cs var_case var_exposed, by(var_thirdvariable)
– cc var_case var_exposed, by(var_thirdvariable)

• Default summary estimator is Mantel-Haenszel
– “, pool” will also produce Woolf’s method

How can we make our lives a lot easier and implement all of this on the computer?

The epitab command - Tables for Epidemiologists is quite a handy little command. Has anyone used it?

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• e.g. AZT use, severity of needlestick, and HIV

. cc HIV AZTuse, by(severity) pool

        severity |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
           minor |        0              0    2.302373     1.070588
           major |      .35       .1344565    .9144599     6.956522
-----------------+-------------------------------------------------
           Crude | .6074729       .2638181    1.401432
 Pooled (direct) |        .              .           .
    M-H combined |   .30332       .1158571    .7941072
-----------------+-------------------------------------------------
Test of homogeneity (B-D)  chi2(1) = 0.60  Pr>chi2 = 0.4400

Test that combined OR = 1:
                    Mantel-Haenszel chi2(1) = 6.06
                    Pr>chi2 = 0.0138

Calculating a Summary Effect Using the Mantel-Haenszel Estimator

• In addition to the odds ratio, Mantel-Haenszel estimators are also available in Stata for:

– risk ratio
  • “cs var_case var_exposed, by(var_thirdvariable)”
  • or st stir

– rate ratio
  • “ir var_case var_exposed var_time, by(var_thirdvariable)”
  • or st strate

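Mantel-Haenszel Confidence Interval and Hypothesis Testing

\[ \mathrm{SE}(\log \mathrm{OR}_{MH}) = \sqrt{ \frac{\sum_{i=1}^{k} P_i R_i}{2 \left( \sum_{i=1}^{k} R_i \right)^2} + \frac{\sum_{i=1}^{k} (P_i w_i + Q_i R_i)}{2 \left( \sum_{i=1}^{k} R_i \right) \left( \sum_{i=1}^{k} w_i \right)} + \frac{\sum_{i=1}^{k} Q_i w_i}{2 \left( \sum_{i=1}^{k} w_i \right)^2} } \]

where

\[ P_i = \frac{a_i + d_i}{N_i}; \quad Q_i = \frac{b_i + c_i}{N_i}; \quad R_i = \frac{a_i d_i}{N_i}; \quad w_i = \frac{b_i c_i}{N_i} \]

\[ 95\%\ \mathrm{CI} = e^{\, \log \mathrm{OR}_{MH} \pm (1.96 \times \mathrm{SE}(\log \mathrm{OR}_{MH}))} \]

\[ \chi^2_{MH} = \frac{ \left( \left| \sum_{i=1}^{k} a_i - \sum_{i=1}^{k} E_i \right| - 0.5 \right)^2 }{ \sum_{i=1}^{k} \dfrac{n_{1i} \, n_{2i} \, m_{1i} \, m_{2i}}{N_i^2 (N_i - 1)} } \]

where E_i is the expected value for the a cell in each stratum, E_i = m_{1i} n_{1i} / N_i

              Disease   No Disease
Exposed         ai         bi         m1i
Unexposed       ci         di         m2i
                n1i        n2i        Ni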
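The formulas on this slide can be sketched in Python as below, applied to the AZT strata. The function name, the dictionary it returns, and the `continuity` argument are assumptions for illustration. The slide's chi-square includes the 0.5 continuity correction; setting `continuity=0.0` is shown as well because, for these data, the uncorrected value is the one that reproduces the 6.06 in the Stata output.

```python
import math


def mh_summary(strata, continuity=0.5):
    """M-H odds ratio, 95% CI and chi-square for a list of (a, b, c, d) strata."""
    sum_r = sum_w = 0.0
    sum_pr = sum_pw_qr = sum_qw = 0.0
    sum_a = sum_e = sum_var = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        p, q = (a + d) / n, (b + c) / n
        r, w = a * d / n, b * c / n
        sum_r += r
        sum_w += w
        sum_pr += p * r
        sum_pw_qr += p * w + q * r
        sum_qw += q * w
        # Margins: m1, m2 = row totals; n1, n2 = column totals
        m1, m2, n1, n2 = a + b, c + d, a + c, b + d
        sum_a += a
        sum_e += m1 * n1 / n  # expected value of the a cell
        sum_var += n1 * n2 * m1 * m2 / (n ** 2 * (n - 1))
    or_mh = sum_r / sum_w
    se = math.sqrt(
        sum_pr / (2 * sum_r ** 2)
        + sum_pw_qr / (2 * sum_r * sum_w)
        + sum_qw / (2 * sum_w ** 2)
    )
    log_or = math.log(or_mh)
    chi2 = (abs(sum_a - sum_e) - continuity) ** 2 / sum_var
    return {
        "or": or_mh,
        "ci_low": math.exp(log_or - 1.96 * se),
        "ci_high": math.exp(log_or + 1.96 * se),
        "chi2": chi2,
    }


azt = [(0, 91, 3, 161), (8, 40, 16, 28)]
corrected = mh_summary(azt)                    # formula as written on the slide
uncorrected = mh_summary(azt, continuity=0.0)  # no continuity correction
print(corrected)
print(f"chi2 without correction = {uncorrected['chi2']:.2f}")
```

The interval agrees with the "M-H combined" row of the Stata output for these strata.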

Mantel-Haenszel Techniques

• Mantel-Haenszel estimators
• Mantel-Haenszel chi-square statistic
• Mantel’s test for trend (dose-response)

Assessment of Confounding: Interpretation of Summary Estimate

• Compare “adjusted” summary estimate to crude estimate
– e.g. compare ORMH (= 0.30 in the example) to ORcrude (= 0.61 in the example)

• If “adjusted” measure “differs meaningfully” from crude estimate, then confounding is present
– e.g., does ORMH = 0.30 “differ meaningfully” from ORcrude = 0.61?

• What does “differs meaningfully” mean?
– a matter of judgement based on biologic/clinical sense rather than on a statistical test
– no one correct answer
– 10% change often used
– your threshold needs to be stated a priori and included in your methods section

So, it’s in the hands of the researcher.

If the summary estimate, here a M-H OR estimator of 3.8

Summary Effect in Stata - example

• e.g. Spermicide use, maternal age and Down’s

Crude

                     Down’s   No Down’s
Spermicide use          4        109
No spermicide use      12       1145

OR = 3.5

Stratified

Age < 35 (N = 1175)

                     Down’s   No Down’s
Spermicide use          3        104
No spermicide use       9       1059

OR = 3.4

Age >= 35 (N = 95)

                     Down’s   No Down’s
Spermicide use          1          5
No spermicide use       3         86

OR = 5.7

With this in mind, let’s consider an example using . . .

Should we pool these?

Is there confounding present?

. cc downs spermici, by(matage) pool

          matage |       OR       [95% Conf. Interval]   M-H Weight
-----------------+-------------------------------------------------
            < 35 | 3.394231       .9800358    11.80389     .7965957
           >= 35 | 5.733333              0     50.8076     .1578947
-----------------+-------------------------------------------------
           Crude | 3.501529       1.171223    10.49699
 Pooled (direct) | 3.824166       1.196437    12.22316
    M-H combined | 3.781172        1.18734    12.04142
-----------------+-------------------------------------------------
Test for heterogeneity (direct)  chi2(1) = 0.137  Pr>chi2 = 0.7109
Test for heterogeneity (M-H)     chi2(1) = 0.138  Pr>chi2 = 0.7105

Test that combined OR = 1:
                    Mantel-Haenszel chi2(1) = 5.81
                    Pr>chi2 = 0.0159

Presence or Absence of Confounding by a Third Variable?

Relative Risks

          Third Factor   Third Factor               Adjust or
Crude     Present        Absent         Adjusted    Ignore?
 4.1        1.9            2.1            2.0       Adjust
 4.0        1.2            1.0            1.1       Adjust
 0.2        0.7            0.9            0.8       Adjust
 4.0        3.8            4.2            4.1       Ignore
 4.0        8.2            7.7            7.9       Adjust
 1.0        3.1            2.7            3.0       Adjust
 1.9        1.6            1.9            1.8       Prob. Ignore
 0.9        0.1            0.2            0.1       Adjust
 4.0        0.4            0.6            0.5       Adjust
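One way to make the "10% change" rule of thumb concrete is sketched below in Python. The function name is hypothetical, and measuring the change relative to the adjusted estimate is an assumption (the slides do not say which estimate goes in the denominator). Applied to the crude and adjusted columns of the table above, this particular convention happens to reproduce the Adjust / Ignore column, with "Prob. Ignore" counted as Ignore.

```python
def needs_adjustment(crude, adjusted, threshold=0.10):
    """True when crude and adjusted estimates differ by more than the threshold.

    Assumption: the change is expressed relative to the adjusted estimate.
    """
    return abs(crude - adjusted) / adjusted > threshold


# (crude, adjusted) relative risks, one pair per row of the table
rows = [
    (4.1, 2.0), (4.0, 1.1), (0.2, 0.8),
    (4.0, 4.1), (4.0, 7.9), (1.0, 3.0),
    (1.9, 1.8), (0.9, 0.1), (4.0, 0.5),
]

decisions = ["Adjust" if needs_adjustment(c, a) else "Ignore" for c, a in rows]
for (crude, adjusted), decision in zip(rows, decisions):
    print(f"crude {crude:>4}  adjusted {adjusted:>4}  ->  {decision}")
```

Whatever threshold and denominator are chosen, the point from the earlier slide stands: they should be fixed a priori and reported in the methods section.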

Statistical Testing for Confounding is Inappropriate

• Testing for statistically significant differences between crude and adjusted measures is inappropriate
– e.g. when examining an association for which a factor is a known confounder (say age in the association between HTN and CAD)
– if the study has a small sample size, even large differences between crude and adjusted measures will not be statistically different
– yet, we know confounding is present
– therefore, the difference between crude and adjusted measures cannot be ignored as merely chance and must be reported as confounding

Statistical Testing for Confounding is Inappropriate

• Furthermore, with large sample sizes, even factors which truly are not confounders can appear to cause confounding that is “statistically significant”
  • e.g., study of sunlight exposure and melanoma
  • prior knowledge: no relationship between gum chewing and melanoma
  • data: gum chewing is associated with sunlight exposure and with melanoma, and the adjusted measure of association is statistically different from the crude association
  • should gum chewing be controlled for?

• To resolve this paradox, only adjust for factors for which you have biologic rationale (i.e., some prior probability)

Page 32: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

Stratification - Effect of Excessive Correlation Between Exposure & Confounder

• e.g. race/SES; income/education; no. of sexual partners/no. of anal intercourse partners
  – aka collinearity
  – precludes ability to adjust

Crude

              HIV    No HIV
Low income     40       960
Hi income       5       995

RRcrude = 8.0

Stratified

Low Education
              HIV    No HIV
Low income     38       930
Hi income       0         1

RR = undefined

High Education
              HIV    No HIV
Low income      2        30
Hi income       5       994

RR = 12.5

The goal is to combine or average the results from the different strata into one summary estimate. Any thoughts on how to do this?

After we have formed our strata and gotten rid of confounding, how do we summarize what the unconfounded estimates from the two or more strata are telling us? In the examples of last week, the measures of association from the different strata were identical. This is seldom the case.

A more realistic example is described in the Rothman chapter regarding the question of whether spermicide use might cause Down’s
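The relative risks on this slide can be recomputed directly from the cell counts, which also shows why one stratum has no usable estimate. This is a minimal sketch; the function name and the choice to return `None` for an undefined ratio are illustrative assumptions.

```python
# Risk ratio from a 2x2 table; returns None when it cannot be computed.

def risk_ratio(exp_cases, exp_noncases, unexp_cases, unexp_noncases):
    """Risk in the exposed divided by risk in the unexposed."""
    risk_exposed = exp_cases / (exp_cases + exp_noncases)
    risk_unexposed = unexp_cases / (unexp_cases + unexp_noncases)
    if risk_unexposed == 0:
        return None  # zero risk in the reference group: the RR is undefined
    return risk_exposed / risk_unexposed

# Exposure = low income, reference = high income, outcome = HIV.
crude = risk_ratio(40, 960, 5, 995)
sparse_stratum = risk_ratio(38, 930, 0, 1)  # only one high-income person here
other_stratum = risk_ratio(2, 30, 5, 994)

print("crude RR:", round(crude, 1))
print("sparse stratum RR:", sparse_stratum)
print("other stratum RR:", round(other_stratum, 1))
```

Because income and education are so strongly correlated, one stratum contains a single high-income person with no cases, so the stratum-specific estimates cannot be sensibly averaged into an adjusted measure.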

Page 33: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

When More than One Additional Variable is Present

Crude: 2×2 table of Chlamydia (yes/no) by CAD (yes/no).

Stratified: one Chlamydia-by-CAD 2×2 table for each of six race-by-smoking strata (white smokers, white non-smokers, black smokers, black non-smokers, latino smokers, latino non-smokers).

Page 34: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

The Need for Evaluation of Joint Confounding

• Variables that evaluated alone show no confounding may show confounding when evaluated jointly

                          Exposed                 Unexposed
Stratum             Disease  No Disease     Disease  No Disease     OR

Crude
All                      12           4          30          22    2.2

Stratified by Factor 1
F1 +                      6           2          15          11    2.2
F1 -                      6           2          15          11    2.2

Stratified by Factor 2
F2 +                      6           2          15          11    2.2
F2 -                      6           2          15          11    2.2

Stratified by Factor 1 & 2
F1+ F2+                   1           1          10          10    1.0
F1- F2+                   5           1           5           1    1.0
F1+ F2-                   5           1           5           1    1.0
F1- F2-                   1           1          10          10    1.0

The examples I have shown thus far have just one potential confounder to worry about. What should we do when more than . . .

In this example, the crude estimate is identical to the stratum specific measures when the 2 other variables are looked at separately.
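The odds ratios in this example can be checked from the cell counts. The sketch below also pools each stratification with the Mantel-Haenszel odds ratio, the summary method named earlier in the lecture; the function and variable names are illustrative assumptions.

```python
# Each table is (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases).

def odds_ratio(a, b, c, d):
    """Cross-product odds ratio for a single 2x2 table."""
    return (a * d) / (b * c)

def mantel_haenszel_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

crude = (12, 4, 30, 22)
by_f1 = [(6, 2, 15, 11), (6, 2, 15, 11)]          # F1+, F1-
by_f2 = [(6, 2, 15, 11), (6, 2, 15, 11)]          # F2+, F2-
joint = [(1, 1, 10, 10), (5, 1, 5, 1),            # F1+F2+, F1-F2+
         (5, 1, 5, 1), (1, 1, 10, 10)]            # F1+F2-, F1-F2-

print("crude OR:", round(odds_ratio(*crude), 2))
print("adjusted for F1 only:", round(mantel_haenszel_or(by_f1), 2))
print("adjusted for F2 only:", round(mantel_haenszel_or(by_f2), 2))
print("adjusted for F1 and F2:", round(mantel_haenszel_or(joint), 2))
```

Adjusting for either factor alone leaves the estimate at the crude value, while adjusting for both together moves it to the null, which is the joint confounding the slide describes.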

Page 35: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

Approaches for When More than One Potential Confounder is Present

• Backward versus forward confounder evaluation strategies
  – relevant both for stratification and especially multivariable modeling
• Backwards Strategy
  – initially evaluate all potential confounders together (look for joint confounding)
  – conceptually preferred because in nature variables are all present and act together
  – Procedure:
    • with all potential confounders considered, form adjusted estimate
    • one variable can then be dropped and the adjusted estimate is re-calculated (adjusted for remaining variables)
    • if the dropping of the first variable results in an inconsequential change, it can be eliminated
    • procedure continues until no more variables can be dropped
  – Problem:
    • with many potential confounders, cells become very sparse and strata very imprecise
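The backward procedure can be sketched on the two-factor joint-confounding example from the previous slide. Everything here is a simplified illustration: the 10% "inconsequential change" cutoff, the use of the Mantel-Haenszel odds ratio as the adjusted estimate, the choice to compare each trial against the fully adjusted estimate, and all names are assumptions.

```python
# Backward change-in-estimate sketch (assumed 10% cutoff, illustrative names).
# (F1, F2) -> (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
JOINT = {
    ("+", "+"): (1, 1, 10, 10),
    ("-", "+"): (5, 1, 5, 1),
    ("+", "-"): (5, 1, 5, 1),
    ("-", "-"): (1, 1, 10, 10),
}
FACTORS = ("F1", "F2")

def collapse(joint, keep):
    """Sum the joint tables over every factor not listed in `keep`."""
    out = {}
    for levels, table in joint.items():
        key = tuple(lv for name, lv in zip(FACTORS, levels) if name in keep)
        prev = out.get(key, (0, 0, 0, 0))
        out[key] = tuple(p + t for p, t in zip(prev, table))
    return list(out.values())

def mh_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def backward_select(joint, threshold_pct=10.0):
    """Start fully adjusted; drop a factor only if the estimate barely moves."""
    kept = list(FACTORS)
    reference = mh_or(collapse(joint, kept))  # adjusted for all candidates
    dropped_one = True
    while dropped_one and kept:
        dropped_one = False
        for name in list(kept):
            trial = [k for k in kept if k != name]
            estimate = mh_or(collapse(joint, trial))
            if abs(estimate - reference) / reference * 100 <= threshold_pct:
                kept, dropped_one = trial, True
                break
    return kept, mh_or(collapse(joint, kept))

kept, adjusted = backward_select(JOINT)
print("retained:", kept, "adjusted OR:", round(adjusted, 2))
```

In this example neither factor can be dropped, because removing either one moves the estimate from the jointly adjusted value back to the crude value.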

This introduces the whole topic of

I know you are learning a bit about this in biostatistics. Which is preferable - backward or forwards?

In fact, you may not even be able to get off the ground because the initial stratification is just too thin

Page 36: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

Approaches for When More than One Potential Confounder is Present

• Forward Strategy
  – start with the variable that has the biggest “change-in-estimate” impact
  – then add the variable with the second biggest impact
  – keep this variable if its presence meaningfully changes the adjusted estimate
  – procedure continues until no other added variable has an important impact
  – Advantage:
    • avoids the initial sparse cell problem of backwards approach
  – Problem:
    • does not evaluate joint confounding effects of many variables
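The forward strategy's blind spot can be illustrated with the same two-factor example used for joint confounding. As before, this is a hedged sketch: the 10% cutoff, the Mantel-Haenszel odds ratio as the adjusted estimate, and all names are assumptions rather than the lecture's specification.

```python
# Forward change-in-estimate sketch (assumed 10% cutoff, illustrative names).
# (F1, F2) -> (exposed cases, exposed non-cases, unexposed cases, unexposed non-cases)
JOINT = {
    ("+", "+"): (1, 1, 10, 10),
    ("-", "+"): (5, 1, 5, 1),
    ("+", "-"): (5, 1, 5, 1),
    ("-", "-"): (1, 1, 10, 10),
}
FACTORS = ("F1", "F2")

def collapse(joint, keep):
    """Sum the joint tables over every factor not listed in `keep`."""
    out = {}
    for levels, table in joint.items():
        key = tuple(lv for name, lv in zip(FACTORS, levels) if name in keep)
        prev = out.get(key, (0, 0, 0, 0))
        out[key] = tuple(p + t for p, t in zip(prev, table))
    return list(out.values())

def mh_or(tables):
    """Mantel-Haenszel pooled odds ratio across strata."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den

def forward_select(joint, threshold_pct=10.0):
    """Start from the crude estimate; add the factor with the largest impact."""
    kept = []
    current = mh_or(collapse(joint, kept))  # no adjustment = crude estimate
    remaining = list(FACTORS)
    while remaining:
        trials = {n: mh_or(collapse(joint, kept + [n])) for n in remaining}
        best = max(remaining, key=lambda n: abs(trials[n] - current))
        if abs(trials[best] - current) / current * 100 <= threshold_pct:
            break  # no remaining factor changes the estimate meaningfully
        kept.append(best)
        current = trials[best]
        remaining.remove(best)
    return kept, current

kept, estimate = forward_select(JOINT)
print("retained:", kept, "final OR:", round(estimate, 2))
```

Here the forward search stops without adding either factor and reports the crude odds ratio, even though adjusting for both factors together would bring the estimate to the null.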

In the forward selection approach, you start with . . .

Page 37: Confounding and Interaction: Part III When Evaluating Association Between an Exposure and an Outcome, the Possible Effects of a 3rd Variable are: –Intermediary

Stratification to Reduce Confounding

• Advantages
  – straightforward to implement and comprehend
  – easy to evaluate interaction
• Limitations
  – Looks at only one exposure-disease assoc. at a time
  – Requires continuous variables to be discretized
    • loses information; possibly results in “residual confounding”
  – Deteriorates with multiple confounders
    • e.g. suppose 4 confounders with 3 levels
      – 3x3x3x3=81 strata needed
      – unless huge sample, many cells have “0”’s and strata have undefined effect measures
  – Solution:
    • Mathematical modeling (multivariable regression)
      – e.g.
        » linear regression
        » logistic regression
        » proportional hazards regression
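The strata arithmetic in the limitations above is easy to verify, and it shows how quickly data get thin. In this small sketch the sample size of 500 is a purely hypothetical figure chosen for illustration.

```python
from itertools import product

# Four confounders with three levels each, as in the slide's example.
levels_per_confounder = [3, 3, 3, 3]
strata = list(product(*(range(k) for k in levels_per_confounder)))
n_strata = len(strata)

# ASSUMPTION: a hypothetical study of 500 subjects, spread evenly across strata.
n_subjects = 500
avg_per_stratum = n_subjects / n_strata
avg_per_cell = avg_per_stratum / 4  # each stratum is a 2x2 table with 4 cells

print("strata needed:", n_strata)
print("average subjects per stratum:", round(avg_per_stratum, 1))
print("average subjects per 2x2 cell:", round(avg_per_cell, 1))
```

Even with an even spread, each cell would average fewer than two subjects in this hypothetical study, and real data are rarely spread evenly, so zero cells are likely.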

Although you are all now learning about the wonderful world of multivariable modeling, I would encourage you to examine your data whenever you can with stratification because it is the most native way to see your data and the easiest way to explain your data to others.

It does, however, have its limitations, principally that it breaks down with multiple confounders.

These approaches are the topics of Mitch Katz’s upcoming sessions and your Thursday sessions.