linear models ii wednesday, may 30, 10:15-12:00

78
Linear Models II Wednesday, May 30, 10:15-12:00 Deborah Rosenberg, PhD Research Associate Professor Division of Epidemiology and Biostatistics University of IL School of Public Health Training Course in MCH Epidemiology

Upload: lieu

Post on 22-Mar-2016

42 views

Category:

Documents


0 download

DESCRIPTION

Linear Models II Wednesday, May 30, 10:15-12:00. Deborah Rosenberg, PhD Research Associate Professor Division of Epidemiology and Biostatistics University of IL School of Public Health Training Course in MCH Epidemiology. Linear Models: Using the Regression Equation - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Linear Models II Wednesday, May 30, 10:15-12:00

Linear Models IIWednesday, May 30, 10:15-12:00

Deborah Rosenberg, PhDResearch Associate ProfessorDivision of Epidemiology and BiostatisticsUniversity of IL School of Public Health

Training Course in MCH Epidemiology

Page 2: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

2

Linear Models: Using the Regression Equation1.Assessing Effect Modification and

Confounding 2.Using "dummy" variables3.Generating and testing custom contrasts4.Strategies for model-building

Page 3: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

3

Confounding and Effect Modification

If another factor is related to both the risk factor and outcome of interest, and if the association between the risk factor and outcome is different after considering a third factor, then confounding is present.

If the association between a risk factor and outcome is not only different after considering a third factor, but the difference varies depending on values of the third factor, then effect modification is present.

Page 4: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Confounding and Effect Modification

The concepts of confounding and effect modification apply to any modeling approach as well as to data at any level of measurement.

For example, if we conduct 'normal' regression with a continuous, normally distributed outcome variable, we are interested in whether differences between means on a variable vary across levels of another variable or whether the crude difference between means is different than the adjusted difference between means.

4

Page 5: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

5

Confounding and Effect Modification

Comparing Results—Contingency Tables and Linear Modeling:2 Dichotomous Independent Variables (coded 1 and 0)and a Dichotomous Outcome

tables VarB*VarA*outcome / relrisk riskdiff cmh;

A model with 2 main effects only (no product term)model outcome = VarA VarB / link = __ dist = __ ;

A model with 2 main effects and a product term

model outcome = VarA VarB VarA*VarB / link = __ dist = __ ;

Page 6: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

6

Confounding and Effect Modification

Comparing Results--Contingency Tables and Linear Modeling:2 Dichotomous Independent Variables (coded 1 and 0)and a Dichotomous Outcome

Outcome VarB = Y Y N VarA Y

N Outcome VarB = N Y N VarA Y

N

Analysis Of Maximum Likelihood Parameter Estimates Standard Parameter DF Estimate Error Pr > ChiSq Intercept 1 VarA 1 VarB 1

Analysis Of Maximum Likelihood Parameter Estimates Standard Parameter DF Estimate Error Pr > ChiSq Intercept 1 VarA 1 VarB 1 VarA*VarB

Page 7: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Confounding and Effect Modification

tables VarB*VarA*outcome / relrisk riskdiff cmh;

Given how these tables are specified, we will obtain:1. An OR and RR along with a statistical test for the

association between VarA and the outcome, adjusted for VarB

2. Stratum-specific ORs and RRs along with statistical tests for the associations between VarA and the outcome at each level of VarB

3. A statistical test for homogeneity of the stratum-specific measures of association (effect modification/interaction)

7

Page 8: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Confounding and Effect Modification

tables VarB*VarA*outcome / relrisk riskdiff cmh;

Given how these tables are specified, we will not obtain:

1. An OR and RR, or a statistical test for the association between VarB and the outcome, adjusted for VarA

2. Stratum-specific ORs and RRs, or the statistical tests for the associations between VarB and the outcome at each level of VarA

8

Page 9: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Confounding and Effect Modification

A model with 2 main effects only (no product term)model outcome = VarA VarB

/ link = __ dist = __ ;

Given how the model is specified, we will obtain: 1. ORs or RRs along with statistical tests for both the

association between VarA and the outcome, adjusted for VarB, and the association between VarB and the outcome, adjusted for VarA

2. A global statistical test for the simultaneous effect of VarA and VarB

9

Page 10: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

10

Confounding and Effect Modification

A model with 2 main effects only (no product term)model outcome = VarA VarB

/ link = __ dist = __ ;

Given how the model is specified, we will not obtain:

1. Any stratum-specific estimates, either for VarA stratified by VarB or vice versa

2. A statistical test for homogeneity of stratum-specific measures of association (effect modification/interaction)

Page 11: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

11

ExampleUsing Logistic Regression

A model with 2 main effects only (no product term)model outcome = smoking late_no_pnc

Mutually Adjusted ORsTo assess confounding,

compare each toits own crude OR

Analysis of Maximum Likelihood Estimates

Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -2.8622 0.0176 26390.9295 <.0001 smoking 1 0.6064 0.0382 251.5499 <.0001 late_no_pnc 1 0.2739 0.0362 57.3687 <.0001

Odds Ratio Estimates

Point 95% Wald Effect Estimate Confidence Limits smoking 1.834 1.701 1.977 late_no_pnc 1.315 1.225 1.412

Page 12: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

12

Example:Using Logistic Regression

Computing the OR for the association between smoking and low birthweight adjusting for late/no prenatal care:

Holding late_no_pnc constant at 0 Holding late_no_pnc constant at 1

The result is the same regardless of whether we use '1' or '0' for the value of the prenatal care variable—this is the meaning of 'adjustment' or 'control for' confounding.

834.1e

eee

6064.0

016064.0

02739.006064.08622.2

02739.016064.08622.2

834.1e

eee

6064.0

016064.0

12739.006064.08622.2

12739.016064.08622.2

Page 13: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

13

Example:Using Log-binomial Regression

A model with 2 main effects only (no product term)model outcome = smoking late_no_pnc

In logistic regression, the predicted values—odds—are not that meaningful, but from log-binomial (or Poisson) regression, estimated prevalences / risks can be reported in addition to adjusted measures of association.

Page 14: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

14

Example:Using Log-binomial Regression

Among women who got late Among women who got earlyor no prenatal care, the low prenatal care, the lowbirthweight rate was 12% for birthweight rate was 9.5% forsmokers compared to 7% for smokers compared to 5.4% for nonsmokers. nonsmokers.

095.0e 02548.015593.09174.2

054.0e 02548.005593.09174.2 07.0e 12548.005593.09174.2

122.0e 12548.015593.09174.2

Page 15: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

15

Example:Using Log-binomial Regression

Only a Model Without Considering Effect ModificationYields Adjusted Relative Prevalences

Estimated Adjusted Relative Prevalence for Smoking Controlling for late_no_pncThe summary relative prevalence was: Cohort Mantel-Haenszel 1.749 (1.635,1.873)

Calculations yield the same result regardless of the category of late_no_pnc

s.e. for smoking from modeling = 0.0347

749.1e

ee

112548.0015593.0

12548.005593.09174.2

12548.015593.09174.2

749.1e

ee

002548.0015593.0

02548.005593.09174.2

02548.015593.09174.2

873.1,634.1eCI %95 0347.096.15593.0

Page 16: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Confounding and Effect Modification

A model with 2 main effects and a product termmodel outcome = VarA VarB VarA*VarB

/ link = __ dist = __;

Given how the model is specified, we will obtain: 1. Stratum-specific ORs / RRs along with statistical tests for the

associations between VarA and the outcome in the stratum where VarB = 0, and between VarB and the outcome in the stratum where VarA=0

2. A statistical test for homogeneity of stratum-specific measures of association (multiplicative effect modification /interaction)

16

Page 17: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

17

Confounding and Effect Modification

A model with 2 main effects and a product term

model outcome = VarA VarB VarA*VarB / link = __ dist = __;

Given how the model is specified, we will not obtain: 1. Any adjusted ORs/RRs, either for VarA adjusted for

VarB or vice versa. Including a product term in a model by definition assumes that effect modification is present and that we only want to report stratum specific, rather than adjusted measures.

Page 18: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

18

Modeling Effect Modification

So why a product term in the model?

Effect modification/interaction is commonly assessed under a multiplicative model. The null hypothesis can be stated in terms of joint and separate effects as follows:

This approach explicitly acknowledges why this is multiplicative effect modification.

1VarB ,0VarA0VarB ,1VarA1VarB and 1VarA0 Effect .SepEffect .SepEffectJoint :H

Page 19: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

19

Modeling Effect Modification

So why a product term in the model?Alternatively, the null hypothesis of no multiplicative interaction can be stated in the more familiar terms of homogeneity of stratum-specific estimates:

2 stratum1 stratum0

2 stratum1 stratum0

2 stratum1 stratum0

OROR:Hor

RRRR:Hor

MeanMean:H

Page 20: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

20

The null hypothesis of no multiplicative interaction

Joint and Separate Effects

Stratum-specificEffects

Modeling Effect Modification

Page 21: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

21

Modeling Effect Modification

The product term is literally the multiplication of the values of two independent variables.

Assuming two dichotomous variables, values of the modeled variables are:

VarA VarB VarA *VarB1 1 11 0 00 1 00 0 0

Page 22: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

22

Modeling Effect Modification

Assuming one dichotomous and one continuous variable, values of the modeled variables might be:

VarA Age VarA*Age1 22 221 35 350 19 00 32 0

Page 23: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

23

Modeling Effect Modification

Assuming two ordinal variables, values of the modeled variables might be:

ord1 ord2 ord1*ord22 2 41 2 20 2 02 1 21 1 10 1 02 0 01 0 00 0 0

Page 24: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Example:

Case-Control (OR) Mantel-Haenszel 1.8355 1.7028-1.9784Cohort (RP) Mantel-Haenszel 1.7499 1.6349-1.8731

Breslow-Day Test p =0.2524

24

Table 1 of smoking by lbw Controlling for late_no_pnc=Late or No PNC smoking lbw Frequency‚ Row Pct ‚ yes ‚no ‚ Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ yes ‚ 289 ‚ 1988 ‚ 2277 ‚ 12.69 ‚ 87.31 ‚ ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ no ‚ 775 ‚ 10503 ‚ 11278 ‚ 6.87 ‚ 93.13 ‚ ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 1064 12491 13555

Table 2 of smoking by lbw Controlling for late_no_pnc=First Trimester PNC smoking lbw Frequency‚ Row Pct ‚ yes ‚no ‚ Total ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ yes ‚ 649 ‚ 6333 ‚ 6982 ‚ 9.30 ‚ 90.70 ‚ ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ no ‚ 3271 ‚ 57000 ‚ 60271 ‚ 5.43 ‚ 94.57 ‚ ƒƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆƒƒƒƒƒƒƒƒˆ Total 3920 63333 67253

Odds Ratio 1.9701 1.7070-2.2738 Cohort 1.8470 1.6261-2.0979

Odds Ratio 1.7858 1.6351-1.9503 Cohort 1.7127 1.5803-1.8563

Page 25: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

25

Example:Modeling Effect Modification

/*product term in model */proc logistic order=formatted; model lbw = smoking late_no_pnc smoking*late_no_pnc;run;

How do we compute the adjusted estimates? How do we compute the stratum-specific estimates? What is the interpretation of the 2 main effects?

Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -2.8580 0.0180 25267.2474 <.0001 smoking 1 0.5799 0.0450 166.2930 <.0001 late_no_pnc 1 0.2514 0.0413 36.9870 <.0001 smoking*late_no_pnc 1 0.0987 0.0858 1.3219 0.2503

Page 26: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

26

Example:Modeling Effect Modification

Only Stratum-Specific Measures of Associationare possible for variables included in a product term in a model

Computation for 1st stratum--- >The association between smoking and LBW among women with late/no PNC.

<---Computation for 2nd stratumThe association between smoking and LBW among women with early PNC.

971.1

10987.015799.0

010987.0015799.0

00987.012514.005799.08858.2

10987.012514.015799.08858.2

e

eee

7859.1

15799.0

015799.0

00987.002514.005799.08858.2

00987.002514.015799.08858.2

e

eee

Page 27: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

27

Modeling Effect Modification

Computation of joint effect: Computation of Separate Effects:Both smoking and late / no PNC Smoking only: Smoking & early PNC

Late / no PNC only: Nonsmoking & late / no PNC

The common reference group forall three odds ratios is nonsmoking and early PNC

5345.2e

e

eee

93.0

10987.012514.015799.0

010987.0012514.0015799.0

00987.002514.005799.08858.2

10987.012514.015799.08858.2

7859.1 e

eee

15799.0

015799.0

00987.002514.005799.08858.2

00987.002514.015799.08858.2

2858.1 e

eee

12514.0

000987.0012514.0005799.0

00987.002514.005799.08858.2

00987.012514.005799.08858.2

Page 28: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

28

Modeling Effect Modification

Whether considering stratum-specific or joint and separate effects in modeling, the test for interaction is the test of the product term in the model.

eractionint

2eractionint2

1 bVariance0b

Page 29: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

29

Outcome = vA + vB + vA*vB

Either way, the null hypothesis of no interaction is

Ho: vA*vB = 0.

Modeling Effect Modification

1b1b1b

0b0b0bb

0b0b1bb

0b1b0bb

1b1b1bb

vAvB*vAvA

2v*1v2v1v0

2v*1v2v1v0

vB*vAvBvA0

vB*vAvBvA0

eeee

ee

2 Stratum 1 Stratum

1b1b1b1b1b

0b0b0bb

0b1b0bb

0b0b0bb

0b0b1bAb

0b0b0bb

1b1b1bb

2v1v2v*1v2v1v

vB*vAvBvA0

vB*vAvBvA0

vB*vAvBvA0

vB*vAvB0

2v*1vvBvA0

vB*vAvBvA0

eeeee

ee

ee

Effect vB Sep. Effect vA Sep. vB&Effect vA Joint

Page 30: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

Modeling Effect Modification

While the p-value for the product term is the test for multiplicative interaction, what is the meaning of the beta coefficient itself?

For our example :model lbw = smoking late_no_pnc smoking*late_no_pnc1. The additional effect of smoking on LBW in the presence of late

or no pnc compared to the separate effect of smoking aloneand

2. The additional effect of late or no pnc in the presence of smoking compared to the separate effect of the late or no pnc alone

e0.0987 = 1.1037 = 1.9701 / 1.7858 (OR1 / OR2) 30

Page 31: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

31

Re-specify a model using joint and separate effects Instead of setting up stratified tables or using a product

term in a model, the data are arranged to look directly at the joint effect, and each of the separate effects

Construct a composite variable with categories corresponding to combinations of values from the original variables:

if VarA = 1 and VarB = 1 then joint_sep = 3; else if VarA = 1 and VarB = 0 then joint_sep = 2; else if VarA = 0 and VarB = 1 then joint_sep = 1; else if VarA = 0 and VarB = 0 then joint_sep = 0;

Modeling Effect Modification

Page 32: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

32

RR=2.08 Breslow-Day Test p <0.0001 RR=1.03

Example:Modeling Effect Modification

Table 1 of VarA by outcome Controlling for VarB=yes

VarA outcome

Frequency| Percent | Row Pct | Col Pct | yes |no | Total ---------+--------+--------+ yes | 365 | 2135 | 2500 | 7.30 | 42.70 | 50.00 | 14.60 | 85.40 | | 67.59 | 47.87 | ---- -----+--------+--------+ no | 175 | 2325 | 2500 | 3.50 | 46.50 | 50.00 | 7.00 | 93.00 | | 32.41 | 52.13 | ---------+--------+--------+ Total 540 4460 5000 10.80 89.20 100.00

Table 2 of VarA by outcome Controlling for VarB=no VarA outcome Frequency| Percent | Row Pct | Col Pct | yes |no | Total ---------+--------+--------+ yes | 135 | 1865 | 2000 | 1.35 | 18.65 | 20.00 | 6.75 | 93.25 | | 20.45 | 19.97 | ---------+--------+--------+ no | 525 | 7475 | 8000 | 5.25 | 74.75 | 80.00 | 6.56 | 93.44 | | 79.55 | 80 .03 | ---------+--------+--------+ Total 660 9340 10000 6.60 93.40 100.00

Page 33: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

33

proc logistic data = interaction order=formatted; model outcome = VarA VarB varA*VarBrun;

Standard WaldParameter DF Estimate Error Chi-Square Pr > ChiSq

Intercept 1 -2.6559 0.0452 3460.2770 <.0001VarA 1 0.0302 0.0999 0.0912 0.7626VarB 1 0.0692 0.0905 0.5857 0.4441VarA*VarB 1 0.7903 0.1390 32.3012 <.0001

VarB=Yes VarB=No

What are the adjusted estimates?

27.2e

ee

17903.010302.0

07903.010692.000302.06559.2

17903.010692.010302.06559.2

03.1e

ee

10302.0

07903.000692.000302.06559.2

07903.000692.010302.06559.2

Example:Using Logistic Regression

Page 34: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

34

Example:Using Logistic Regression

Computation of joint effect: Computation of Separate Effects:

Both VarA = 1 and VarB = 1 VarA only: VarA = 1 and VarB = 0

VarB only: VarA = 0 and VarB = 1

The common reference group forall three odds ratios is VarA = 0 and VarB = 0

43.2e

ee

17903.010692.010302.0

07903.000692.000302.06559.2

17903.010692.010302.06559.2

07.1e

ee

10692.0

07903.000692.000302.06559.2

07903.010692.000302.06559.2

03.1e

ee

10302.0

07903.000692.000302.06559.2

07903.000692.010302.06559.2

Page 35: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

35

Example:Using Logistic Regression

proc logistic order=formatted; class joint_sep(ref='VarA n,VarB n') / param=ref; model outcome = joint_sep;run;

With no product term in the model, the joint and separate effects are obtained with simple exponentiation of each beta coefficient.

Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -2.6559 0.0452 3460.2770 <.0001 joint_sep VarA y,VarB y 1 0.8897 0.0724 150.8699 <.0001 joint_sep VarA y,VarB n 1 0.0302 0.0999 0.0912 0.7626 joint_sep VarA n,VarB y 1 0.0692 0.0905 0.5857 0.4441

Point 95% Wald Effect Estimate Confidence Limits VarA y,VarB y vs VarA n,VarB n 2.434 2.112 2.806 VarA y,VarB n vs VarA n,VarB n 1.031 0.847 1.254 VarA n,VarB y vs VarA n,VarB n 1.072 0.898 1.280

Page 36: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

36

Example:Using Logistic Regression

Using the re-specified model to get Back to Stratum-Specific Estimates

Computation for 1st stratum--->where VarB = ‘yes’

Computation for 2nd stratum--->where covariate = ‘no’

27.2eee

10692.018897.0

10692.000302.008897.06559.2

00692.000302.018897.06559.2

03.1eee

10302.0

00692.000302.008897.06559.2

00692.010302.008897.06559.2

Page 37: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

37

Confounding and Effect Modification

Summary

In multivariable modeling (>1 variable in a model):

• In a model without a product term or joint and separate effects, it is not possible to compute stratum-specific estimates

• main effects are mutually adjusted measures of association

• To assess confounding, an adjusted estimate from a model has to be compared to a crude estimate

Page 38: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

38

Confounding and Effect Modification

Summary• In a model with a product term, it is not possible to

compute adjusted estimates• In a model with a product term, main effects are

measures of association within one stratum--the separate effects

• The p-value for the product term is the test for multiplicative interaction

• Joint and separate effects can be modeled directly, but then none of the beta coefficients provide a statistical test for interaction

Page 39: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

39

Modeling with Dummy Variables

Page 40: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

40

What if you want to model a nominal independent variable—a variable with more than 2 categories?

For example, what if you want to model three categories of race/ethnicity—African American, Hispanic, and White?

If the outcome variable were continuous, this situation would be Analysis of Variance (ANOVA), testing a hypothesis about the difference between three means:

Ho: 1 = 2 = 3

Modeling with Dummy Variables

Page 41: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

41

Modeling with Dummy Variables

In logistic regression, the hypothesis could be written:

Ho: ln(Odds1) = ln(Odds2) = ln(Odds3)

In log-binomial regression, the hypothesis could be written:

Ho: ln(prop1) = ln(prop2) = ln(prop3)

In any case, we need to “trick” the modeling procedure into handling the nominal variable appropriately.

Page 42: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

42

Modeling with Dummy Variables

Race/ethnicity as a single variable coded, for example 1=African American, 2=Hispanic, and 3=White, will be treated as ordinal in a regression model.

proc genmod order=formatted; model outcome = ethnicity / link=log dist=bin;run;

The incorrect interpretation of the resulting beta coefficient for ‘ethnicity’ would be, “for every unit change in ‘ethnicity’, there is a ____ change in the log(prevalence/risk) of the outcome”.

Page 43: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

43

Modeling with Dummy Variables

So, what’s the trick?

Dummy variables, or indicator variables, are a set of dichotomous variables which together capture the nominal construct of interest.

For a nominal variable with k categories, a set of k-1 dummy variables will capture the entire construct.

If variables for all k categories are created, there will be redundancy in the model.

Page 44: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

44

Modeling with Dummy Variables

Each dichotomous variable considered separately is indeed assumed to be ‘ordinal’ by the modeling procedure

For example, we know that ‘sex’ can be appropriately modeled even though it is a nominal variable.

The beta coefficient for sex is interpreted as the difference between means (OLS) or the difference between log proportions (log-binomial) for males and females.

Page 45: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

45

Modeling with Dummy Variables

Example: Dummy variables for race/ethnicity:

We create 2 dummies for our 3 category race/ethnicity variable:

Original Dummy1 Dummy 2Coding af_am hisp

African American 1 1 0 Hispanic 2 0 1White 3 0 0

Here, whites are being considered as the reference group.

Page 46: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

46

Modeling with Dummy Variables

Explicit coding in SAS: If we’re not sure which level we want as the reference group, we can code ‘k’ dummies and then decide which k-1 we will model:

if race = 1 then af_am = 1; else if race ^= . then af_am = 0; if race = 2 then hisp = 1; else if race ^= . then hisp = 0; if race = 3 then white = 1; else if race^= . then white = 0;

Page 47: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

47

Modeling with Dummy Variables

Now, race/ethnicity can be modeled as follows:

proc logistic order=formatted; model outcome = af_am hisp;run;

The beta coefficient for af_am is the difference in log odds between African Americans and whites; the beta coefficient for hisp is the difference in log odds between Hispanics and whites. Exponentiating the betas, we get the odds ratios for African Americans v. whites & for Hispanics v. whites.

Page 48: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

48

Modeling with Dummy Variables

SAS will automatically code dummy variables within a proc step, but then the analyst needs to control the coding and the reference group. For the race/ethnicity example, rather than using the dummies af_am and hisp which were coded in the data step, the following model could be specified using the CLASS statement in SAS:

proc logistic order=formatted; class race(ref=‘3’) / param = ref; model outcome = race;run;

Page 49: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

49

Modeling with Dummy Variables

So far, whichever way we created dummy variables, there was no direct comparison of African Americans to Hispanics. One way to do this is to rerun the model:

proc logistic 0rder=formatted; model outcome = af_am white;run; orproc logistic order=formatted; class race(ref=‘2’) / param = ref; model outcome = race;run; (hold the thought for other ways to get a desired comparison).

Page 50: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

50

Modeling with Dummy Variables

Suppose you have an apparently ordinal variable such as income:

Should you include it in a model in this ordinal form?proc logistic order=formatted; model outcome = income;run;

Should you use dummy variables? How many dummy variables will you create? What is the reference group? Why?

Page 51: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

51

Modeling with Dummy Variables

1 = < $25,000Suppose income is 2 = $25,000 - $49,999coded as follows: 3 = $50,000 - $74,999

4 = >= $75,000

What would be the best reference group?

Page 52: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

52

Modeling with Dummy Variables

1.proc logistic order=formatted; class income(ref=‘2’) / param=ref; model outcome = income;run;2.proc logistic order=formatted; class income(ref=‘1’) / param=ref; model outcome = income;run;

Page 53: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

53

Type 3 Analysis of Effects Wald Effect DF Chi-Square Pr > ChiSq smoking 1 255.0274 <.0001 late_no_pnc 1 43.3175 <.0001 momage 2 95.8383 <.0001

Modeling with Dummy Variables

The smoking and lbw example:

SAS createsthe dummyvariables for ‘momage’

For a variable with > 1 df, aType 3 test looks at the significance of the whole construct

Class Level Information Design Class Value Variables momage <=17 1 0 18-34 0 0 >=35 0 1

Page 54: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

54

Modeling with Dummy Variables

Analysis of Maximum Likelihood Estimates Standard Wald Parameter DF Estimate Error Chi-Square Pr > ChiSq Intercept 1 -2.8987 0.0191 23034.3553 <.0001 smoking 1 0.6115 0.0383 255.0274 <.0001 late_no_pnc 1 0.2403 0.0365 43.3175 <.0001 momage <=17 1 0.5951 0.0613 94.1146 <.0001 momage >=35 1 0.0971 0.0436 4.9638 0.0259 Odds Ratio Estimates Point 95% Wald Effect Estimate Confidence Limits smoking 1.843 1.710 1.987 late_no_pnc 1.272 1.184 1.366 momage <=17 vs 18-34 1.813 1.608 2.045 momage >=35 vs 18-34 1.102 1.012 1.200

Page 55: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

55

Modeling with Dummy Variables

Summary: Dummy Variables Dummy variables are mutually exclusive, unlike distinct

dichotomous variables. A woman can only be coded ‘1’ on one of the ‘momage’ dummies, while she can be coded ‘1’ or ‘0’ on both of the distinct, non-mutually exclusive dichotomous variables for ‘smoking’ and ‘late_no_pnc’

Dummy variables are used for categorical variables:1. to examine an inherently nominal construct2. to examine an apparently ordinal construct as though

it were nominal to empirically assess the meaning of each category

Page 56: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

56

Modeling with Dummy Variables

Dummy variables can also be used to combine multiple variables into a single composite variable. For example, dummy variables could be used to create an age / education measure or a SES index that synthesizes several socio-demographic variables.

We've already seen how this approach can be used to handle effect modification by directly modeling joint and separate effects.

Page 57: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

57

Custom Estimates

Just as a stratum-specific estimate when modeling effect modification may involve more than one beta coefficient, there are other contrasts that may also involve multiple coefficients.

For example, suppose this model with no interaction term is run:

proc logistic order=formatted; model outcome = smoking medrisk age35;run;

Page 58: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

58

Custom Estimates

One might want to ask: What is the OR comparing a high risk smoker to a low risk nonsmoker, among women age 35 and older?

21

3210

3210

bb

135ageb0medriskb0smokebb

135ageb1medriskb1smokebb

eee

Page 59: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

59

Custom Estimates

In the model with 2 dummy variables for race / ethnicity, we can compare whichever groups we are interested in —we are not constrained by the coded reference group:

proc logistic order=formatted; model outcome = af_am hisp;run;

The OR comparing African-Americans to Hispanics is: 21

210

210

bb

)1hisp(b)0am_(afbb

)0hisp(b)1am_(afbb

eee

Page 60: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

60

Custom Estimates

Whenever multiple beta coefficients are involved in an estimate, the confidence interval is calculated as follows:

For dichotomous variables coded 1 and 0, the formula for the confidence interval reduces to:

Page 61: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

61

Custom Estimates

A confidence interval when more than one beta coefficient is part of the estimate, requires the variance of each beta and the covariance(s) between betas

/*no product term in model */proc logistic order=formatted; model lbw = smoking late_no_pnc / covb;run;

Estimated Covariance Matrix late_no_Parameter Intercept smoking pncIntercept 0.00031 -0.00024 -0.00025smoking -0.00024 0.001462 -0.00013late_no_pnc -0.00025 -0.00013 0.001307

Page 62: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

62

Custom Estimates

OR for LBW, comparing smokers who had late/no pnc to nonsmokers who entered pnc early:

Standard error of this complex estimate (on the log scale):

sqrt(0.001462+0.001307+(2*-0.00013))=0.0501

95% CI : = e0.8803 +/- 1.96*0.0501

= e0.8803 +/- 0.0982

= 2.19-2.68

Why is this joint effect different than the one on slide 26?

4116.2e

e

eee

8803.0

2739.06064.0

012739.0016064.0

02739.006064.08622.2

12739.016064.08622.2

Page 63: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

63

Custom Estimates

Automating Custom Contrasts

/*no product term in model */proc logistic order=formatted;model lbw = smoking late_no_pnc / covb;contrast 'sm & late_no_pnc v. neither'

smoking 1 late_no_pnc 1 / estimate = exp; run;

Contrast Estimate Standard Error

Confidence Limits Wald Chi-Square

Pr > ChiSq

sm & late_no_pnc v. neither 2.4117 0.1210 2.1857 2.6610 307.6292 <.0001

Page 64: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

64

Custom Estimates

/*product term in model */proc logistic order=formatted data=one; model lbw = smoking late_no_pnc smoking*late_no_pnc; contrast 'sm & late_no_pnc'

smoking 1 late_no_pnc 1 smoking*late_no_pnc 1 / estimate=exp;

contrast 'smoking, given late/no pnc' smoking 1 smoking*late_no_pnc 1 / estimate=exp;

contrast 'smoking, given early pnc' smoking 1 / estimate = exp;

run;

Why is the joint effect from this model different than from the previous one?

Contrast Estimate Standard Error

Confidence Limits Wald Chi-Square

Pr > ChiSq

sm & late_no_pnc v. neither 2.5344 0.1659 2.2292 2.8814 201.8238 <.0001 smoking, late/no pnc 1.9710 0.1441 1.7079 2.2748 86.1073 <.0001 smoking, early pnc 1.7858 0.0803 1.6351 1.9503 166.2930 <.0001

Page 65: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

65

Custom Estimates

Log-binomial Regression for LBWModel Considering Effect Modification and

Adjustment for Confounding by Other Covariates

proc genmod order=formatted;class ageparity/ param=ref;model lbw = smoking late_no_pnc ageparity

matrisk smoking*ageparity /dist=bin link=log;

run;“Ageparity” is an index variable considering levels

of maternal age and parity simultaneously

Page 66: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

66

Custom Estimates

Log-binomial Regression for LBWModel Considering Effect Modification and

Adjustment for Confounding by Other Covariates

LR Statistics For Type 3 Analysis Chi- Source DF Square Pr > ChiSq smoking 1 1.61 0.2038 late_no_pnc 1 29.74 <.0001 ageparity 3 258.69 <.0001 matrisk 1 605.98 <.0001 smoking*ageparity 3 47.42 <.0001

Page 67: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

67

Custom Estimates

Log-binomial Regression for LBWModel Considering Effect Modification and

Adjustment for Confounding by Other CovariatesAnalysis Of Parameter Estimates Standard Wald 95% Chi- Parameter DF Estimate Error Confidence Limits Square Pr > ChiSq Intercept 1 -2.7162 0.0437 -2.8019 -2.6306 3860.09 <.0001 smoking 1 0.1227 0.0949 -0.0633 0.3087 1.67 0.1961 late_no_pnc 1 0.1922 0.0345 0.1245 0.2599 30.96 <.0001 ageparity 20+ multips 1 -0.6834 0.0461 -0.7738 -0.5930 219.39 <.0001 ageparity 20+ primips 1 -0.2916 0.0480 -0.3857 -0.1975 36.89 <.0001 ageparity Teen multips 1 -0.3247 0.1073 -0.5349 -0.1145 9.17 0.0025 matrisk 1 0.6951 0.0278 0.6406 0.7497 623.89 <.0001 smoking*ageparity 20+ multips 1 0.6012 0.1054 0.3947 0.8078 32.55 <.0001 smoking*ageparity 20+ primips 1 0.2081 0.1218 -0.0306 0.4469 2.92 0.0876 smoking*ageparity Teen multips 1 0.1231 0.2127 -0.2939 0.5400 0.33 0.5629 Scale 0 1.0000 0.0000 1.0000 1.0000

Page 68: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

68

Custom Estimates

Log-binomial Model Considering Effect Modification and Adjustment for Confounding by Other Covariates

Yields Adjusted, Stratum-Specific Relative Prevalences

392.1e

ee

)1(2081.0)1(1227.0

)0(2081.0)1(2916.0)0(1227.07162.2

)1(2081.0)1(2916.0)1(1227.07162.2

Smoking v. Nonsmoking, Among 20+ Primips

Smoking v. Nonsmoking, Among 20+ Multips

062.2e

ee

)1(6012.0)1(1227.0

)0(6012.0)1(6834.0)0(1227.07162.2

)1(6012.0)1(6834.0)1(1227.07162.2

Adjusted for prenatal care and maternal medical risk

Page 69: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

69

Custom Estimates

Log-binomial Model Considering Effect Modification and Adjustment for Confounding by Other Covariates

Yields Adjusted, Stratum-Specific Relative Prevalences

131.1e

ee

)1(1227.0

)0(1227.07162.2

)1(1227.07162.2

Smoking v. Nonsmoking, Among Teen Primips

Smoking v. Nonsmoking, Among Teen Multips

279.1e

ee

)1(1231.0)1(1227.0

)0(1231.0)1(3247.0)0(1227.07162.2

)1(1231.0)1(3247.0)1(1227.07162.2

Adjusted for prenatal care and maternal medical risk

Page 70: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

70

Custom Estimates withBinomial Regression

Automating Custom Estimatesproc genmod order=formatted; class ageparity/ param=ref; model lbw = smoking late_no_pnc ageparity

matrisk smoking*ageparity /dist=bin link=log;

estimate ‘sm 20+ mu’smoking 1 smoking*ageparity 1 0 0 / exp;

estimate ‘sm 20+ pr’smoking 1 smoking*ageparity 0 1 0 / exp;

estimate ‘sm tn mu’smoking 1 smoking*ageparity 0 0 1 / exp;

estimate ‘sm tn pr’smoking 1 smoking*ageparity 0 0 0 / exp;

run;

The estimate statements mirror the equations on the previous 2 slides. They estimate the effect of smoking, stratified by the ageparity index, adjusting for or regardless of the values of late_no_pnc and matrisk.

Page 71: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

71

Model Building Strategies

Use the results of stratified analysis to inform the initial model-building phase.

Continue exploring confounding and effect modification with more variables.

Decide on "rules" for inclusion of variables in a model, including components of interactions: statistical testing, conceptual rationale, etc.

Page 72: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

72

Some approaches to organizing variables and reducing the analysis burden:

Group according to domain: Sociodemographic Behavioral Medical risk Health care system

Group according to evidence from previous studies Examine correlations and identify proxy variables Subset analysis

Model Building Strategies

Page 73: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

73

Model Building Strategies

Coding Issues Level of measurement

– Dummy variable coding—how many? what’s the reference group?

– Ordinal—how refined?– Continuous

Conceptual issues– risk factors that only apply to some subjects, e.g.

history of …– causal pathway– combining variables; index construction

Page 74: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

74

Modeling Building Strategies

Interaction can get even more complicated when variables are sets of dummy variables. There will be (k-1) x (k-1) product terms in a model. Suppose X1 has 4 categories and X2 has 3 categories—there will be 3 x 2 = 6 beta coefficients for interaction in the model.

A Type 3 test with 6 df will provide an overall test of whether there is any interaction between X1 and X2; each of the 6 beta coefficients will provide a test of interaction between particular combination of X1 and X2

Page 75: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

75

Modeling Effect Modification

When models become complicated due to multiple product terms—terms that involve sets of dummy variables, then interpretation becomes very difficult. In this situation, stratified modeling can be useful.

While stratified models with zero or very few product terms are easier to interpret, though, it is not possible to directly test for effect modification across the strata.

Page 76: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

76

Model Building Strategies

Model Selection Routines

Stepwise Forward Backward Best Subsets Chunkwise

... model building can involve manually mimicking these approaches ...

Model selection methods can guide and inform the model building process, but they should never be a substitute for applying epidemiologic and other substantive information to choosing a final model.

Page 77: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

77

Model Building Strategies

Model building is an iterative process in which many different models may be compared before deciding on a final model

• Sometimes nested models are compared—e.g. compare a model with 5 variables to one with 7 variables, including the original 5

• Sometimes models of the same size are compared—e.g. compare two 7 variable models with somewhat different variables

Page 78: Linear Models II Wednesday, May 30, 10:15-12:00

t

-3 -2 -1 0 1 2 3

0.00.1

0.20.3

0.4

Density of Student's t with 10 d.f.

x

0 5 10 15

0.00.1

0.20.3

0.40.5

0.6

Chi-Square Densities

1 d.f.

2 d.f.

3 d.f.5 d.f.

8 d.f.

Disease or Other Health Outcome Yes No

Yes

a

b

a + b (n1)

Exposure or Person, Place,

or Time Variable No c d

c + d (n2)

a + c (m1)

b + d (m2)

a + b + c + d N

78

Model Building Strategies

Model building proceeds according to the nature of the research questions—the hypotheses being tested

Sometimes, regression modeling is carried out to assess one association; other variables are included to adjust for confounding or account for effect modification. Here, the focus is on obtaining the ‘best’ estimate of the single association.

Sometimes, regression modeling is carried out in order to assess multiple, competing exposures, or to identify a set of variables that together predict the outcome.