
6. Multiple regression - PROC GLM

Karl B Christensen
http://192.38.117.59/~kach/SAS

Contents

Analysis of covariance (ANCOVA)

The general linear model

Interaction

Multiple regression

Automatic variable selection


Data example: lung capacity

Data from 32 patients subject to a heart/lung transplantation. TLC (Total Lung Capacity) is determined from whole-body plethysmography. Are men and women different with respect to total lung capacity?

OBS  SEX  AGE  HEIGHT   TLC
  1  F     35     149  3.40
  2  F     11     138  3.41
  3  M     12     148  3.80
  .  .      .       .     .
  .  .      .       .     .
 29  F     20     162  8.05
 30  M     25     180  8.10
 31  M     22     173  8.70
 32  M     25     171  9.45


Box plots for comparison of sex groups

PROC GPLOT DATA=TLCdata;
  PLOT tlc*sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=3) VALUE=(H=2) OFFSET=(6,6)CM;
  AXIS2 LABEL=(H=3 A=90) VALUE=(H=2);
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

[Figure: box plots of TLC by sex]

Group comparisons

Using t-tests [1]

PROC TTEST DATA=TLCdata;
  CLASS sex;
  VAR tlc height;
RUN;

Output

T-Tests

Variable  Method         Variances    DF  t Value  Pr > |t|
TLC       Pooled         Equal        30    -3.67    0.0009
TLC       Satterthwaite  Unequal    29.7    -3.67    0.0009
Height    Pooled         Equal        30    -3.73    0.0008
Height    Satterthwaite  Unequal    29.5    -3.73    0.0008

Obvious sex difference for TLC as well as for Height

[1] Note that we can specify more than one variable in the VAR statement.

Confounding when comparing groups

Occurs if the distributions of some other relevant explanatory variables differ between the groups. Here “relevant” means things we would have liked to be the same (or at least very similar) for everybody, because we think of it as noise or distortion.

Can be reduced by performing a regression analysis with the relevant variables as covariates.

Confounding could be a problem in the current example if we intended to compare the lung function between men and women of similar height.


Relation between TLC and HEIGHT

PROC GPLOT DATA=TLCdata;
  PLOT tlc*height=sex / HAXIS=AXIS1 VAXIS=AXIS2;
  AXIS1 LABEL=(H=4) VALUE=(H=3) MINOR=NONE;
  AXIS2 LABEL=(A=90 H=4) VALUE=(H=3) ORDER=(3 TO 10) MINOR=NONE;
  SYMBOL1 C=RED V=DOT H=2 I=SM75S L=1 W=3 MODE=INCLUDE;
  SYMBOL2 C=BLUE V=CIRCLE H=2 I=SM75S L=41 W=3 MODE=INCLUDE;
  LEGEND1 LABEL=(H=2.5) VALUE=(H=2 JUSTIFY=LEFT);
RUN; QUIT;

[Figure: TLC plotted against height for each sex, with a fitted line per sex (plotted using I=RL)]

Analysis of covariance

Comparison of parallel regression lines

Model: $y_{gi} = \alpha_g + \beta x_{gi} + \varepsilon_{gi}$, $g = 1, 2$; $i = 1, \dots, n_g$

Here $\alpha_2 - \alpha_1$ is the expected difference in the response between the two groups for a fixed value of the covariate, that is, when comparing any two subjects who have the same value of (match on) the covariate $x$ (“adjusted for $x$”).


But what if the lines are not parallel?

More general model: $y_{gi} = \alpha_g + \beta_g x_{gi} + \varepsilon_{gi}$

If $\beta_1 \neq \beta_2$ there is an interaction between Height and Sex


Interaction

Interaction between Height and Sex

The effect of height depends on sex

The difference between men and women depends on height


Model with interaction

PROC REG only works for linear covariates. Group variables can be handled directly in PROC GLM by specifying the group variable as a CLASS variable.

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;

The option SOLUTION is needed if we want to see the regression parameter estimates.


PROC GLM output

The GLM Procedure

Class Level Information

Class   Levels   Values
Sex          2   F M

Number of Observations Read   32
Number of Observations Used   32

Dependent Variable: TLC

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              3      42.81845030   14.27281677     10.28   <.0001
Error             28      38.89354970    1.38905535
Corrected Total   31      81.71200000

R-Square   Coeff Var   Root MSE   TLC Mean
0.524017    19.36069   1.178582   6.087500


PROC GLM output

Source       DF     Type I SS   Mean Square   F Value   Pr > F
Sex           1   25.31161250   25.31161250     18.22   0.0002
Height        1   17.48233164   17.48233164     12.59   0.0014
Height*Sex    1    0.02450616    0.02450616      0.02   0.8953

Source       DF   Type III SS   Mean Square   F Value   Pr > F
Sex           1    0.07951043    0.07951043      0.06   0.8127
Height        1   17.36061701   17.36061701     12.50   0.0014
Height*Sex    1    0.02450616    0.02450616      0.02   0.8953

The interaction is not significant.

The Type III p-values for the two main effects should never be used for anything in a model including the interaction!


PROC GLM output

Parameter            Estimate      Standard Error   t Value   Pr > |t|
Intercept        -5.827971333 B        4.97706299     -1.17     0.2515
Sex F            -1.727664141 B        7.22116113     -0.24     0.8127
Sex M             0.000000000 B         .               .        .
Height            0.073564647 B        0.02854339      2.58     0.0155
Height*Sex F      0.005743619 B        0.04324220      0.13     0.8953
Height*Sex M      0.000000000 B         .               .        .

These are the regression parameters


Where are the two lines in the output?

Line for males (the reference group):

TLC = −5.828 + 0.07356 × Height

Line for females:

TLC = −5.828 + (−1.728) + (0.07356 + 0.00574) × Height

= −7.556 + 0.0793 × Height
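The hand calculation above can also be left to PROC GLM. The following is a minimal sketch, assuming the data set TLCdata and the variables tlc, sex and height used on the slides; the ESTIMATE labels are only illustrative. Because the CLASS levels are sorted as F, M, each pair of coefficients after sex or height*sex is read as (F M).

```sas
/* Sketch: ask PROC GLM for the intercept and slope of each line. */
/* Assumes TLCdata with variables tlc, sex (F/M) and height.      */
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc = sex height height*sex / SOLUTION;
  /* Coefficients for sex and height*sex are given in the order F, M */
  ESTIMATE 'Intercept, males'   INTERCEPT 1 sex 0 1;
  ESTIMATE 'Intercept, females' INTERCEPT 1 sex 1 0;
  ESTIMATE 'Slope, males'       height 1 height*sex 0 1;
  ESTIMATE 'Slope, females'     height 1 height*sex 1 0;
RUN; QUIT;
```

Each ESTIMATE row comes with a standard error and a t-test, which the sums computed by hand do not provide.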


Same model, new parameterization

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;

Output (edited)

Parameter            Estimate    Standard Error   t Value   Pr > |t|
Sex F            -7.555635475        5.23201797     -1.44     0.1598
Sex M            -5.827971333        4.97706299     -1.17     0.2515
Height*Sex F      0.079308266        0.03248326      2.44     0.0212
Height*Sex M      0.073564647        0.02854339      2.58     0.0155


Two different parameterizations

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
RUN; QUIT;

The (extrapolated) level at Height=0 for the reference group

The (extrapolated) difference between the groups at Height=0

The effect of Height (the slope) for the reference group

The difference between the slopes for the two sexes

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex sex*height / NOINT SOLUTION;
RUN; QUIT;

The (extrapolated) level at Height=0 for each group

The effect of Height (the slope) for each group


Model without interaction

There is no indication of interaction, so we omit the term

PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc=sex height / SOLUTION CLPARM;
RUN; QUIT;

Are there also other possible parameterizations in this model? (And which one should we use?)
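Two alternative parameterizations of the additive model are sketched below; they are illustrations, not the author's answer to the question. The first drops the intercept so that each sex gets its own (extrapolated) level at Height=0. The second centers height at a reference value, so that the intercept becomes the expected TLC for a man of that height. The data set name TLCcentered, the variable height170 and the choice of 170 cm are assumptions made for this example.

```sas
/* Sketch 1: one (extrapolated) level per sex, common slope */
PROC GLM DATA=TLCdata;
  CLASS sex;
  MODEL tlc = sex height / NOINT SOLUTION CLPARM;
RUN; QUIT;

/* Sketch 2: center height so the intercept refers to a realistic height */
DATA TLCcentered;               /* hypothetical data set name */
  SET TLCdata;
  height170 = height - 170;     /* 170 cm is an arbitrary reference value */
RUN;

PROC GLM DATA=TLCcentered;
  CLASS sex;
  MODEL tlc = sex height170 / SOLUTION CLPARM;
RUN; QUIT;
```

All versions describe the same two parallel lines. The version with an intercept keeps the adjusted sex difference as a single parameter with its own confidence interval, which is usually the quantity of interest here.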


Source    DF     Type I SS   Mean Square   F Value   Pr > F
Sex        1   25.31161250   25.31161250     18.86   0.0002
Height     1   17.48233164   17.48233164     13.03   0.0011

Source    DF   Type III SS   Mean Square   F Value   Pr > F
Sex        1    3.24523555    3.24523555      2.42   0.1308
Height     1   17.48233164   17.48233164     13.03   0.0011

Note: The effect of sex seen in the group comparison has disappeared!


Model without interaction - results

Parameter        Estimate      Standard Error   t Value   Pr > |t|
Intercept    -6.263569903 B        3.67983781     -1.70     0.0994
Sex F        -0.770859760 B        0.49571132     -1.56     0.1308
Sex M         0.000000000 B         .               .        .
Height        0.076067188          0.02107532      3.61     0.0011

Parameter      95% Confidence Limits
Intercept    -13.78968328    1.262543472
Sex F        -1.784703236    0.242983716
Sex M          .              .
Height        0.032963311    0.119171065


Confounding?

In this example it seems that

1 The observed difference in lung capacity between men and women can be explained by height differences

2 However, there may still be a sex difference for persons of the same height (women vs. men), estimated as −0.77 ± 2 × 0.50, that is, approximately (−1.78, 0.24)
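The "± 2 standard errors" rule is a rounding of the exact calculation. As a worked check, using the estimate and standard error from the output and the 97.5% quantile of the t-distribution with 32 − 3 = 29 degrees of freedom (about 2.045):

```latex
\[
-0.771 \pm t_{0.975,\,29} \times 0.496
  = -0.771 \pm 2.045 \times 0.496
  = (-1.785,\; 0.243)
\]
```

This agrees with the limits that the CLPARM option reports for Sex F.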


But. . .

what if we did not have the two very short men to pull the line for the men? Let us look at the subjects above 152 cm (using the statement WHERE height>152; in PROC GLM). Test of interaction:

Source       DF     Type I SS   Mean Square   F Value   Pr > F
Sex           1   22.10748067   22.10748067     19.92   0.0002
Height        1    0.25519165    0.25519165      0.23   0.6361
Height*Sex    1    2.76108429    2.76108429      2.49   0.1284

Estimated additive effects

Parameter       Estimate       95% Confidence Limits    Pr > |t|
Intercept    10.53318707 B   -3.47974795   24.54612210    0.1339
Sex F        -2.04829071 B   -3.40956535   -0.68701607    0.0048
Sex M         0.00000000 B     .             .             .
Height       -0.01778053     -0.09665451    0.06109345    0.6459

A somewhat different conclusion. . .
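The restricted analysis described above could be run along the following lines. This is a sketch that assumes the same data set and variable names as the earlier slides: the first step gives the test of interaction, the second the additive effects with confidence limits.

```sas
/* Sketch: repeat the analysis for subjects taller than 152 cm */
PROC GLM DATA=TLCdata;
  WHERE height > 152;           /* restricts the observations used by this step */
  CLASS sex;
  MODEL tlc = sex height sex*height / SOLUTION;
RUN; QUIT;

PROC GLM DATA=TLCdata;
  WHERE height > 152;
  CLASS sex;
  MODEL tlc = sex height / SOLUTION CLPARM;
RUN; QUIT;
```

The WHERE statement applies only to the step in which it appears, so the data set TLCdata itself is left unchanged.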


Plots for model checking in the HTML output:

ODS GRAPHICS ON;

PROC GLM DATA=TLCdata PLOTS=(DIAGNOSTICS RESIDUALS(SMOOTH));
  CLASS sex;
  MODEL tlc=sex height sex*height / SOLUTION;
  OUTPUT OUT=WithResid RSTUDENT=NormResidWithoutCurrent;
RUN; QUIT;

PROC GPLOT DATA=WithResid;
  PLOT NormResidWithoutCurrent * sex;
  SYMBOL1 V=CIRCLE H=2 I=BOX10TJ W=3;
RUN; QUIT;

In addition to the ODS GRAPHICS plots for PROC GLM, residuals should be plotted against each of the CLASS variables (here sex) in order to check variance homogeneity.


Exercise: Another look at the Juul data

1 Get the data into SAS using a libname statement.

2 Create a new data set including only individuals above 25 years, and make a new variable with log-transformed SIGF1.

3 Use PROC GPLOT to plot the relationship between age and log-transformed SIGF-I.

4 Make separate regression lines for men and women.

5 Do a regression analysis to explore if slopes are equal in men and women.

6 Give an estimate for the difference in slopes, with 95% confidence interval.


Multiple regression. General linear model (GLM).

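Data: n sets of observations, made on the same 'unit':

\[
\begin{array}{c|ccc|c}
\text{unit} & x_1 & \cdots & x_p & y \\ \hline
1 & x_{11} & \cdots & x_{1p} & y_1 \\
2 & x_{21} & \cdots & x_{2p} & y_2 \\
3 & x_{31} & \cdots & x_{3p} & y_3 \\
\vdots & \vdots & & \vdots & \vdots \\
n & x_{n1} & \cdots & x_{np} & y_n
\end{array}
\]

The linear regression model with p explanatory variables (covariates) is written:

$y = \beta_0 + \beta_1 x_1 + \cdots + \beta_p x_p + \varepsilon$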


Interpretation of regression coefficients

Model

$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \cdots + \beta_p X_{ip} + \varepsilon_i$

where $\varepsilon_i \sim N(0, \sigma^2)$. Consider two subjects:

A has covariate values $(X_1, X_2, \dots, X_p)$

B has covariate values $(X_1 + 1, X_2, \dots, X_p)$

Expected difference in the response (B − A):

$[\beta_0 + \beta_1 (X_1 + 1) + \beta_2 X_2 + \cdots] - [\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots] = \beta_1$

This means that $\beta_1$ is the effect of one unit's difference in $X_1$ for fixed levels of the other variables $(X_2, \dots, X_p)$


School-age obesity data

School-age obesity score versus height and weight measured at 1 year of age

Obs    Obesity   Height1   Weight1
  1   -0.06967        79     11.70
  2   -0.79982        72      9.55
  3    2.67337        76      9.95
  .      .             .       .
  .      .             .       .
196    0.47968        78     10.60
197   -0.61818        77     10.10


SAS code

PROC REG DATA=SchoolObesity;
  MODEL Obesity = Height1 Weight1 / CLB;
RUN; QUIT;

(part of the) output

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1              0.18668          1.16769      0.16     0.8731
Height1      1             -0.06644          0.02163     -3.07     0.0024
Weight1      1              0.47653          0.07097      6.71     <.0001

Parameter Estimates

Variable    DF   95% Confidence Limits
Intercept    1   -2.11631    2.48967
Height1      1   -0.10910   -0.02379
Weight1      1    0.33656    0.61650


Interpretation of regression parameters

Remember that $\beta_j$ is the effect of the j'th explanatory variable, corrected for the effect of the other explanatory variables, that is, when comparing any two subjects who match on all the other variables.

The effect of Height1 corrected for the effect of Weight1 is found to be $\hat\beta_1 = -0.066$ (95% CI: −0.109 to −0.024), p = 0.0024

In the univariate model without correction for Weight1 we got $\hat\beta_1 = +0.048$ (95% CI: +0.019 to +0.077), p = 0.0014
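The two estimates compared above come from two different models. A minimal sketch of how both could be fitted in one PROC REG step is given below, assuming the SchoolObesity data set from the earlier slide; the MODEL labels Unadjusted and Adjusted are only illustrative names.

```sas
/* Sketch: unadjusted and adjusted effect of Height1 side by side */
PROC REG DATA=SchoolObesity;
  Unadjusted: MODEL Obesity = Height1 / CLB;          /* height only          */
  Adjusted:   MODEL Obesity = Height1 Weight1 / CLB;  /* height given weight  */
RUN; QUIT;
```

Comparing the Height1 rows of the two parameter tables shows the change of sign that the slide describes.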


Interpretation of regression parameters

The parameter for height answers two different questions depending on whether or not it is adjusted for weight:

Unadj. 'Are big 1-year-old children generally fatter during school age?'

Adj. 'Are slim 1-year-old children generally slimmer during school age?'

Both questions are relevant and both answers are valid!


Relative effects and products or ratios of covariates

Both issues are solved by log-transforming the covariate(s)!

Example: $\mathrm{BMI} = \mathrm{Weight}/\mathrm{Height}^2$ is a ratio measure. Logarithmic rules give

$\log(\mathrm{BMI}) = \log(\mathrm{Weight}) - 2\log(\mathrm{Height})$

so $\beta\log(\mathrm{BMI}) = \beta\log(\mathrm{Weight}) - 2\beta\log(\mathrm{Height})$

Choice of log-transformation of covariates

Use of log10 means that the regression parameter shows the effect of two subjects differing by a factor 10. Do not use log10 unless it is likely for two subjects to differ by a factor 10!

Use log2 [SAS code: LOG2(·)] when doubling is likely.

Use a covariate calculated as

XX=LOG(·)/LOG(1.1)

if 10% differences are likely.
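The reason the division by LOG(1.1) works is that it turns the natural logarithm into a logarithm with base 1.1. As a short derivation, with x standing for the covariate on its original scale:

```latex
\[
XX = \frac{\log x}{\log 1.1} = \log_{1.1} x,
\qquad
\frac{\log(1.1\,x)}{\log 1.1}
  = \frac{\log 1.1 + \log x}{\log 1.1}
  = XX + 1 .
\]
```

A subject with a 10% higher x therefore has XX exactly one unit higher, so the regression parameter for XX is the effect of a 10% difference in x. The same argument with 2 or 10 in place of 1.1 gives the interpretations of log2 and log10 stated above.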


BMI at age 1 appropriate predictor for school-age obesity?

1 BMI is a ratio measure involving weight and height, so we should investigate log-transformed weight and height

2 Doubling is not a realistic difference, so we look at “per 10%”

DATA School1;
  SET SchoolObesity;
  HeightPer10pct = LOG(Height1)/LOG(1.1);
  WeightPer10pct = LOG(Weight1)/LOG(1.1);
RUN;

PROC REG DATA=School1;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  TestBMI: TEST HeightPer10pct = -2*WeightPer10pct;
RUN; QUIT;


Part of output

Parameter Estimates

Variable         DF   Parameter Estimate   Standard Error   t Value   Pr > |t|   95% Conf. Limits
Intercept         1              9.80370          5.95742      1.65     0.1015   -1.94593   21.55334
HeightPer10pct    1             -0.45993          0.15673     -2.93     0.0037   -0.76904   -0.15082
WeightPer10pct    1              0.45679          0.06714      6.80     <.0001    0.32437    0.58922

Test TestBMI Results for Dependent Variable Obesity

Source         DF   Mean Square   F Value   Pr > F
Numerator       1      15.72218     20.34   <.0001
Denominator   194       0.77312


Conclusion:

1 10% higher weight increases the expected school-age obesity score by 0.457 (95% CI: 0.324 to 0.589),

2 10% lower height increases the expected school-age obesity score by 0.460 (95% CI: 0.151 to 0.769),

3 BMI at age 1 year is not an appropriate choice (p < 0.0001).

4 Since the regression parameters for the log-transformed weight and height are of approximately the same size, but with opposite signs, an appropriate predictor would be the ratio weight/height
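Conclusion 4 can be examined in the same way as the BMI hypothesis. The sketch below is an illustration rather than part of the original analysis: the data set name School2, the variable RatioPer10pct and the labels TestRatio are hypothetical. Since log(Weight/Height) = log(Weight) − log(Height), the ratio is an adequate summary if the two parameters sum to zero.

```sas
/* Sketch: is weight/height an adequate single predictor? */
DATA School2;                                     /* hypothetical name */
  SET SchoolObesity;
  HeightPer10pct = LOG(Height1)/LOG(1.1);
  WeightPer10pct = LOG(Weight1)/LOG(1.1);
  RatioPer10pct  = LOG(Weight1/Height1)/LOG(1.1); /* per 10% higher ratio */
RUN;

/* Test the hypothesis beta(height) = -beta(weight) */
PROC REG DATA=School2;
  MODEL Obesity = HeightPer10pct WeightPer10pct / CLB;
  TestRatio: TEST HeightPer10pct + WeightPer10pct = 0;
RUN; QUIT;

/* Model using the ratio as the only covariate */
PROC REG DATA=School2;
  MODEL Obesity = RatioPer10pct / CLB;
RUN; QUIT;
```

A non-significant TestRatio would support replacing the two covariates by the ratio; a significant one would argue against it, just as TestBMI argued against BMI.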


Model selection

Lung function - 25 patients with cystic fibrosis [2]

[2] O'Neill et al (1983).

Which covariates have a univariate effect on the outcome PEmax?

Are these the variables to be included in the model?


Model with all covariates

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc;
RUN; QUIT;

Output

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            176.05821        225.89116      0.78     0.4479
age          1             -2.54196          4.80170     -0.53     0.6043
sex          1             -3.73678         15.45982     -0.24     0.8123
height       1             -0.44625          0.90335     -0.49     0.6285
weight       1              2.99282          2.00796      1.49     0.1568
bmp          1             -1.74494          1.15524     -1.51     0.1517
fev1         1              1.08070          1.08095      1.00     0.3333
rv           1              0.19697          0.19621      1.00     0.3314
frc          1             -0.30843          0.49239     -0.63     0.5405
tlc          1              0.18860          0.49974      0.38     0.7112

No significant effects...


Automatic variable selection: Forward selection

Start with no covariates. In every step, add the most significant variable

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc
        / SELECTION=FORWARD;
RUN; QUIT;

Final model: Weight BMP FEV1


Automatic variable selection: Backward elimination

Start with all covariates. At each step, omit the least significant variable

PROC REG DATA=pemax;
  MODEL pemax=age sex height weight bmp fev1 rv frc tlc
        / SELECTION=BACKWARD;
RUN; QUIT;

Final model: Weight BMP FEV1


But. . .

There is no guarantee that these automatic methods will give us the same result:

Had observation no. 25 not been in the data set, backward elimination would have excluded Height as the first variable, while forward selection would have included Height as the first variable!

A 'best' automatic method has not been identified, but backward elimination is often recommended over forward selection.

WARNING: Output from the selected model does not take model selection uncertainty into account: The output (regression coefficients and p-values) is identical to what would have been obtained had we fitted the final model without doing any model selection. The importance of the selected covariates is over-estimated!
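The warning can be made concrete by fitting the selected model directly, with no selection step. This is a minimal sketch assuming the pemax data set and the final model reported on the previous slides:

```sas
/* Sketch: fit the selected covariates as if they had been chosen in advance */
PROC REG DATA=pemax;
  MODEL pemax = weight bmp fev1 / CLB;
RUN; QUIT;
```

The estimates, standard errors and p-values from this step are the ones shown for the final model in the selection output, so nothing in them reflects that six other covariates were tried and discarded.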
