Multiple Logistic Regression RSQUARE, LACKFIT, SELECTION, and interactions

Upload: stephany-chastity-barker

Post on 18-Dec-2015

Page 1: Multiple Logistic Regression RSQUARE, LACKFIT, SELECTION, and interactions

Multiple Logistic Regression

RSQUARE, LACKFIT, SELECTION, and interactions

Page 2

Introduction

Just as with linear regression, logistic regression allows you to look at the effect of multiple predictors on an outcome.

Consider the following example: 15- and 16-year-old adolescents were asked if they have ever had sexual intercourse. The outcome of interest is intercourse. The predictors are race (white and black) and gender (male and female).

Example from Agresti, A. Categorical Data Analysis, 2nd ed. 2002.

Page 3

Here is a table of the data:

                   Intercourse
Race    Gender     Yes     No
White   Male        43    134
White   Female      26    149
Black   Male        29     23
Black   Female      22     36
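As a quick descriptive check before any modelling, the observed odds in each cell can be read straight off the table. The short Python sketch below is an illustration added here, not part of the original slides; the names `cells`, `odds`, and `observed_odds` are hypothetical.

```python
# Observed counts from the table: (race, gender) -> (yes, no)
cells = {
    ("white", "male"): (43, 134),
    ("white", "female"): (26, 149),
    ("black", "male"): (29, 23),
    ("black", "female"): (22, 36),
}


def odds(yes, no):
    """Observed odds of 'yes' in one cell."""
    return yes / no


observed_odds = {key: odds(*counts) for key, counts in cells.items()}
total = sum(yes + no for yes, no in cells.values())

for (race, gender), value in observed_odds.items():
    print(f"{race:5s} {gender:6s} odds = {value:.3f}")
print("total adolescents:", total)
```

These are raw, unadjusted odds; the regression that follows estimates the race and gender effects jointly.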

Page 4

Entering the Data in SAS

The data set intercourse is created with the variables “white” (1 if white, 0 if black), “male” (1 if male, 0 if female), “intercourse” (1 if yes, 0 if no), and “count” (the number of adolescents in each combination of race, gender, and outcome). We want to model the odds of having had intercourse with race and gender as predictors.

Enter the code on the next slide into SAS.

Page 5

Creating the Data Set Intercourse

DATA intercourse;
  INPUT white male intercourse count;
  DATALINES;
1 1 1 43
1 1 0 134
1 0 1 26
1 0 0 149
0 1 1 29
0 1 0 23
0 0 1 22
0 0 0 36
;
RUN;

Page 6

Multiple Logistic Regression: Main Effects

First look at the effect of race and gender with no interaction. The SAS code is similar to that of simple logistic regression; one more independent variable has been added to the model statement.

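Page 7

Enter the following code into SAS:

PROC LOGISTIC DATA = intercourse descending;
  weight count;
  MODEL intercourse = white male / rsquare lackfit;
RUN;

• “descending” models the probability that intercourse = 1 (yes) rather than = 0 (no).

• “weight count” applies the cell counts to the eight summary rows. For grouped counts like these, “freq count” is the more usual statement: the coefficient estimates are the same, but with WEIGHT the sample size stays at 8 rows instead of 462 adolescents, which changes statistics that depend on n, such as the R2 value.

• “rsquare” requests a generalized R2 value from SAS; it is a likelihood-based analogue of the R2 from linear regression rather than a literal proportion of variance explained.

• “lackfit” requests the Hosmer and Lemeshow Goodness-of-Fit Test. This tells you whether there is evidence that the model fits the data poorly.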

Page 8

SAS Output: R2

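Page 9

Interpreting the R2 value

The reported R2 value is 0.9907. Read literally, this would say that 99.07% of the variability in our outcome (intercourse) is explained by gender and race, but that overstates the fit. The generalized R2 is 1 - exp(-LR/n), where LR is the likelihood-ratio chi-square and n is the sample size. Because the code uses “weight count”, n is taken as the 8 data rows rather than the 462 adolescents, which pushes the value toward 1; with n = 462 the same formula gives roughly 0.08.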
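How strongly this R2 depends on the sample size n can be checked numerically. The Python sketch below is an added illustration, not part of the slides: it assumes the generalized R2 formula 1 - exp(-LR/n), takes the main-effects coefficients as reported in the SAS output, and evaluates the formula once with n = 8 (data rows) and once with n = 462 (adolescents). The helper names are hypothetical.

```python
import math

# Grouped data: (white, male, intercourse, count), as in the DATA step
rows = [
    (1, 1, 1, 43), (1, 1, 0, 134),
    (1, 0, 1, 26), (1, 0, 0, 149),
    (0, 1, 1, 29), (0, 1, 0, 23),
    (0, 0, 1, 22), (0, 0, 0, 36),
]

# Main-effects coefficients as reported on the slides
b0, b1, b2 = -0.4555, -1.3135, 0.6478


def log_lik(prob_fn):
    """Count-weighted Bernoulli log-likelihood for a probability function."""
    total = 0.0
    for white, male, y, count in rows:
        p = prob_fn(white, male)
        total += count * math.log(p if y == 1 else 1.0 - p)
    return total


n_subjects = sum(row[3] for row in rows)
n_yes = sum(row[3] for row in rows if row[2] == 1)

# Intercept-only model: everyone gets the overall proportion of "yes"
ll_null = log_lik(lambda w, m: n_yes / n_subjects)
ll_model = log_lik(lambda w, m: 1.0 / (1.0 + math.exp(-(b0 + b1 * w + b2 * m))))
lr = 2.0 * (ll_model - ll_null)  # likelihood-ratio chi-square


def generalized_r2(lr_stat, n):
    """Generalized R2 = 1 - exp(-LR / n)."""
    return 1.0 - math.exp(-lr_stat / n)


r2_rows = generalized_r2(lr, len(rows))       # n = 8 data rows
r2_subjects = generalized_r2(lr, n_subjects)  # n = 462 adolescents

print(f"LR chi-square = {lr:.2f}")
print(f"R2 with n = 8:   {r2_rows:.4f}")
print(f"R2 with n = 462: {r2_subjects:.4f}")
```

The n = 8 result lands on the value SAS printed, which suggests the near-perfect R2 reflects how the counts were supplied rather than the strength of the predictors.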

Page 10

PROC LOGISTIC Output

Page 11

Interpreting Output

Notice that the race and gender terms are both statistically significant (p < 0.0001 and p = 0.0040, respectively).

The logistic regression model is:

log(odds) = β0 + β1(white) + β2(male)
log(odds) = -0.4555 - 1.3135(white) + 0.6478(male)

Holding gender constant, the odds of having had intercourse are 73.1% lower for whites than for blacks (odds ratio = exp(-1.3135) = 0.269, and 1 - 0.269 = 0.731).

Holding race constant, the odds of having had intercourse for males are 1.911 times the odds for females (exp(0.6478) = 1.911).
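The odds ratios quoted above are just the exponentiated coefficients, and the same coefficients give a predicted probability for each group. A minimal Python sketch of that arithmetic follows; it is an added illustration that takes the coefficients as reported on the slides, and the function name `predicted_probability` is hypothetical.

```python
import math

# Coefficients reported in the PROC LOGISTIC output on the slides
b0, b1, b2 = -0.4555, -1.3135, 0.6478

or_white = math.exp(b1)  # whites vs. blacks, same gender
or_male = math.exp(b2)   # males vs. females, same race


def predicted_probability(white, male):
    """Inverse logit of the fitted linear predictor."""
    log_odds = b0 + b1 * white + b2 * male
    return 1.0 / (1.0 + math.exp(-log_odds))


print(f"OR white vs. black: {or_white:.3f}")
print(f"OR male vs. female: {or_male:.3f}")
groups = [(1, 1, "white male"), (1, 0, "white female"),
          (0, 1, "black male"), (0, 0, "black female")]
for white, male, label in groups:
    print(f"{label:13s} P(intercourse) = {predicted_probability(white, male):.3f}")
```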

Page 12

Suppose you wanted to compare the odds of intercourse for black males versus white females:

log(odds) for black males = β0 + β1(0) + β2(1) = β0 + β2

log(odds) for white females = β0 + β1(1) + β2(0) = β0 + β1

log(OR) = (β0 + β2) - (β0 + β1) = β2 - β1

log(OR) = 0.6478 - (-1.3135) = 1.9613

OR = exp(1.9613) = 7.11

The odds of having had intercourse for black males are 7.11 times the odds for white females.
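The same comparison in a few lines of Python, added here as an illustrative check rather than as part of the slides. It also computes the crude odds ratio directly from the two relevant cells of the data table, which the main-effects model smooths slightly.

```python
import math

b1, b2 = -1.3135, 0.6478  # white and male coefficients from the slides

log_or = b2 - b1               # the intercept cancels
odds_ratio = math.exp(log_or)  # black males vs. white females

# Crude (unadjusted) odds ratio from the raw table, for comparison:
# black males 29 yes / 23 no, white females 26 yes / 149 no
crude_or = (29 / 23) / (26 / 149)

print(f"log(OR) = {log_or:.4f}")
print(f"model-based OR = {odds_ratio:.2f}")
print(f"crude OR from the table = {crude_or:.2f}")
```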

Page 13

Hosmer and Lemeshow GOF Test

Page 14

Interpreting the H-L GOF Test

The Hosmer and Lemeshow Goodness-of-Fit Test tests the hypotheses:

H0: the model is a good fit, vs.
Ha: the model is NOT a good fit

With this test, we want to FAIL to reject the null hypothesis, because that means we have no evidence that the model fits poorly (this is different from most of the hypothesis testing you have seen).

Look for a p-value > 0.10 in the H-L GOF test. This indicates no significant lack of fit.

In this case, the p-value = 0.2419, so we do NOT reject the null hypothesis, and we conclude the model fits the data adequately.
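The idea behind the test, comparing observed counts with the counts the model expects, can be sketched with a plain Pearson chi-square over the four race-by-gender groups. This is a simplified stand-in added for illustration, not the Hosmer and Lemeshow statistic that SAS prints (SAS forms its own groups from the predicted probabilities), so its p-value need not match the 0.2419 reported above. The coefficients are taken from the slides.

```python
import math

# (white, male) -> (yes, no) from the data table
cells = {(1, 1): (43, 134), (1, 0): (26, 149), (0, 1): (29, 23), (0, 0): (22, 36)}
b0, b1, b2 = -0.4555, -1.3135, 0.6478  # coefficients reported on the slides

chi_square = 0.0
for (white, male), (yes, no) in cells.items():
    n = yes + no
    p = 1.0 / (1.0 + math.exp(-(b0 + b1 * white + b2 * male)))
    expected_yes = n * p
    # Pearson contribution for one binomial group
    chi_square += (yes - expected_yes) ** 2 / (n * p * (1.0 - p))
    print(f"white={white} male={male}: observed {yes}, expected {expected_yes:.1f}")

# 4 groups - 3 estimated parameters = 1 degree of freedom.
# For 1 df, the chi-square upper-tail probability is erfc(sqrt(x / 2)).
p_value = math.erfc(math.sqrt(chi_square / 2.0))

print(f"Pearson chi-square = {chi_square:.4f}, p = {p_value:.3f}")
```

As with the H-L test, a large p-value here means the observed and expected counts agree closely, not that the model has been proven correct.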

Page 15

Let’s consider an interaction between race and gender:

PROC LOGISTIC DATA = intercourse descending;
  weight count;
  MODEL intercourse = white male white*male / rsquare lackfit;
RUN;

We have added a third term to the model: the interaction between race and gender (“white*male”). We did not need to create this variable in the data set.

Page 16

The new R2 value is 0.9908, barely higher than the 0.9907 from the model with only the main effects. Adding the interaction explains essentially no additional variability in the outcome.

Page 17

Logistic Regression Output

Page 18

The interaction is not significant (p = 0.8092). We probably will not want to include it in our model. If it were significant, the model would be:

log(odds) = β0 + β1(white) + β2(male) + β3(white*male)

log(odds) = -0.4925 - 1.2534(white) + 0.7243(male) - 0.1151(white*male)
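With four race-by-gender groups and four parameters, the interaction model is saturated: it reproduces the observed odds in every cell exactly, so its coefficients can be recovered from the data table without fitting anything. The Python sketch below is an added illustration of that arithmetic; the variable names are hypothetical.

```python
import math

# Observed log-odds in each race-by-gender cell (yes / no counts from the table)
logit_bf = math.log(22 / 36)    # black female (reference cell)
logit_wf = math.log(26 / 149)   # white female
logit_bm = math.log(29 / 23)    # black male
logit_wm = math.log(43 / 134)   # white male

b0 = logit_bf                                        # intercept
b1 = logit_wf - logit_bf                             # white
b2 = logit_bm - logit_bf                             # male
b3 = (logit_wm - logit_wf) - (logit_bm - logit_bf)   # white*male

print(f"b0 = {b0:.4f}, b1 = {b1:.4f}, b2 = {b2:.4f}, b3 = {b3:.4f}")
```

The interaction coefficient is the difference between the male-versus-female log odds ratio among whites and the same log odds ratio among blacks, which is why a value near zero means the gender effect is similar in both races.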

Page 19

H-L Goodness-of-Fit

Page 20

The p-value of the Hosmer and Lemeshow GOF Test is 0.2439, essentially the same as for the previous model without the interaction (0.2419). H-L p-values are not a tool for choosing between models, though; the stronger evidence is that the interaction term itself is not significant (p = 0.8092). We therefore conclude that the model with just race and gender, without the interaction, is sufficient.

Page 21

Model Selection in SAS

When you have multiple predictors and interactions as candidates for your model, SAS can systematically select significant predictors using forward selection, backward selection, or stepwise selection.

In forward selection, SAS starts with no predictors in the model. It adds the candidate predictor with the smallest p-value, then adds the remaining candidate with the smallest p-value given the predictors already in the model, and so on. It stops when no remaining candidate has a p-value less than 0.05.

In backward selection, SAS starts with all of the predictors in the model and eliminates the non-significant predictors one at a time, refitting the model after each elimination. It stops once all the predictors remaining in the model are statistically significant.

Stepwise selection combines the two: predictors are added as in forward selection, but after each addition SAS checks whether any predictor already in the model has become non-significant and removes it.
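The forward-selection loop can be sketched in plain Python on the same data. This is an illustration under stated assumptions, not a reproduction of what SAS does internally: it uses likelihood-ratio tests with 1 degree of freedom to decide entry (SAS, to my understanding, uses score tests), fits each model with a small hand-written Newton-Raphson routine, and does not enforce any hierarchy between main effects and the interaction. All function names are hypothetical.

```python
import math

# Grouped data: (white, male, yes, no)
cells = [(1, 1, 43, 134), (1, 0, 26, 149), (0, 1, 29, 23), (0, 0, 22, 36)]

# Candidate terms as functions of (white, male)
candidates = {
    "white": lambda w, m: w,
    "male": lambda w, m: m,
    "white*male": lambda w, m: w * m,
}


def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    aug = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [x - factor * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def fit_log_lik(terms, iterations=25):
    """Fit intercept + terms by Newton-Raphson; return the maximized log-likelihood."""
    k = len(terms) + 1
    design = [[1.0] + [candidates[t](w, m) for t in terms] for w, m, _, _ in cells]
    beta = [0.0] * k
    for _ in range(iterations):
        grad = [0.0] * k
        hess = [[0.0] * k for _ in range(k)]
        for x, (_, _, yes, no) in zip(design, cells):
            n = yes + no
            p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
            for i in range(k):
                grad[i] += x[i] * (yes - n * p)
                for j in range(k):
                    hess[i][j] += x[i] * x[j] * n * p * (1.0 - p)
        beta = [b + s for b, s in zip(beta, solve(hess, grad))]
    log_lik = 0.0
    for x, (_, _, yes, no) in zip(design, cells):
        p = 1.0 / (1.0 + math.exp(-sum(b * xi for b, xi in zip(beta, x))))
        log_lik += yes * math.log(p) + no * math.log(1.0 - p)
    return log_lik


def forward_select(alpha=0.05):
    """Add the candidate with the smallest p-value until none is below alpha."""
    selected = []
    current = fit_log_lik(selected)
    while True:
        best = None
        for term in candidates:
            if term in selected:
                continue
            lr = 2.0 * (fit_log_lik(selected + [term]) - current)
            p_value = math.erfc(math.sqrt(max(lr, 0.0) / 2.0))  # 1-df chi-square tail
            if best is None or p_value < best[1]:
                best = (term, p_value)
        if best is None or best[1] >= alpha:
            return selected
        selected.append(best[0])
        current = fit_log_lik(selected)


chosen = forward_select()
print("selected terms, in order of entry:", chosen)
```

On these data the sketch enters the race term first and the gender term second, then stops before the interaction.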

Page 22

Forward Selection in SAS

We will let SAS select a model for us out of the three predictors: white, male, white*male. Type the following code into SAS:

PROC LOGISTIC DATA = intercourse descending;
  weight count;
  MODEL intercourse = white male white*male / selection = forward lackfit;
RUN;

Page 23

Output from Forward Selection: “white” is added to the model

Page 24

“male” is added to the model

Page 25

No more predictors are found to be statistically significant

Page 26

The Final Model:

Page 27

Hosmer and Lemeshow GOF Test: The model is a good fit

Page 28

You are now familiar with multiple logistic regression and model selection in SAS. If given multiple predictors, you have the tools to find an appropriate model that explains the outcome of interest.