Faculty Development Program
Clinical Epidemiology and Clinical Research
TOPIC: Biostatistics 5: Multivariable and Logistic Regression
DATE: May (1:30PM-5:00PM)
LEADERS: Art Evans
OBJECTIVES:
1. Interpret a logistic regression equation.
2. Calculate OR and RR from logistic regression for continuous and categorical
predictor variables.
3. Describe the main assumptions of the logistic model.
4. Describe the main errors in studies that analyze data with logistic regression.
5. Check for interactions in linear and logistic models.
REQUIRED READINGS:
Norman and Streiner. PDQ Statistics. 2nd Ed. Pages 65-69 (ANCOVA); 116-117 (logistic
regression).
Norman and Streiner. Biostatistics: The Bare Essentials. Pages: 119-127.
Concato and Feinstein. The risk of determining risk with multivariable models. Ann
Intern Med. 1993;201-210.
PROBLEMS:
A randomized trial was performed testing a new treatment against placebo with mortality
at one-year as the outcome of interest.
A logistic regression model was used to assess the treatment effect while adjusting for
potential confounders.
Questions:
1. According to the logistic regression model, is there evidence of interaction?
2. Based on the first model (no confounding or interaction considered), what is the
OR that describes the treatment effect? Verify by calculating the OR in the raw
data.
3. What is the treatment OR after adjusting for the potential confounder? Is there
evidence of confounding? How do you make that decision?
Model without the confounder

                          beta coefficient    P value
intercept                       -0.2
treatment (1=Tx, 0=C)           -1.0
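As a reminder of the mechanics behind Question 2 (not a full answer key), the OR for a 0/1 treatment indicator in a logistic model is e^beta. A minimal Python sketch using the treatment beta from the table above (the P values are not reproduced in the handout):

    import math

    # beta coefficient for treatment (1=Tx, 0=C) taken from the table above
    beta_treatment = -1.0

    # In a logistic model, the OR for a one-unit change in X is e^beta.
    odds_ratio = math.exp(beta_treatment)
    print(f"Treatment OR = exp({beta_treatment}) = {odds_ratio:.2f}")  # about 0.37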
Multivariable Linear Regression
1. Confusion: multivariable vs. multivariate
Multivariable means that you are simultaneously considering more than one predictor
variable (independent; X), eg, Y = X1+X2+X3+X4
Multivariate usually means that you are simultaneously considering more than one
outcome variable (dependent; Y), eg, Y1+Y2+Y3 = X1+X2+X3+X4
2. Sample Size: Rough rule of thumb
Linear regression: 10 subjects for every potential predictor variable;
Logistic regression: 10 subjects in the smallest group of the dichotomous outcome
variable for every potential predictor variable;
Multivariate linear regression: 10 subjects for every potential variable, including
the multiple dependent (Y) variables.
Note: this is 10 subjects for every potential predictor, not 10 for every significant
predictor in the final model! (This assumes you are interested in describing a
prediction model, rather than simply being interested in controlling for lots of
potential confounders while examining a specific exposure-disease relationship. If
the latter is true, then the combination of potential confounders counts as 1 variable.)
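To make the rule of thumb concrete, here is a small illustrative calculation in Python. The sample sizes are hypothetical, and this is a rough screening rule, not a formal power calculation:

    # Hypothetical study sizes used only to illustrate the 10-per-variable rule.
    n_subjects = 400              # total sample for a linear regression
    events_in_smaller_group = 60  # rarer outcome count for a logistic regression

    max_predictors_linear = n_subjects // 10                  # 10 subjects per candidate predictor
    max_predictors_logistic = events_in_smaller_group // 10   # 10 events per candidate predictor
    print(max_predictors_linear, max_predictors_logistic)     # 40 and 6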
3. Interpretation of beta coefficients:
The importance of a beta coefficient must be interpreted in light of the units of the
particular X variable. For example, the beta coefficient for height measured in yards
would be 36 times bigger than the beta coefficient for height measured in inches,
because one yard equals 36 inches. Despite the 36-fold difference between these two
beta coefficients, their importance is identical. Therefore, it is impossible to judge a
beta coefficient without knowing the units of measurement.
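A small simulation makes the point about units. This sketch uses made-up heights and weights and fits the same relationship twice, once with height in inches and once in yards; the beta coefficient changes 36-fold, but the fitted relationship is identical:

    import numpy as np

    rng = np.random.default_rng(0)
    height_in = rng.uniform(60, 75, 200)              # heights in inches (simulated)
    weight = 2.0 * height_in + rng.normal(0, 5, 200)  # weight depends on height
    height_yd = height_in / 36.0                      # the same heights in yards

    slope_inches = np.polyfit(height_in, weight, 1)[0]
    slope_yards = np.polyfit(height_yd, weight, 1)[0]
    print(slope_inches, slope_yards, slope_yards / slope_inches)  # ratio is 36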
4. Interactions:
Always look for interactions among predictor variables. Two X variables may appear to
have no relationship with the outcome variable until an interaction term is also
considered in the model.
Interaction terms are the best method to test for important differences among
subgroups.
Very bad method: testing within each subgroup separately and then declaring
interaction present if one subgroup demonstrates a significant difference whereas in the
other subgroup there is no significant difference.
Always test for interaction before trying to simplify the model.
Always check for interaction before checking for confounding. (Remember: Adjusting
for confounding is similar to taking the average among the subgroups. Taking the
average is bad if there is important interaction, ie, the effect is markedly different among
subgroups.)
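For readers working outside SPSS, here is a hedged sketch of how an interaction term is typically tested with Python's statsmodels (variable names and data are hypothetical). The interaction is entered as a product term in one full model, rather than by fitting the subgroups separately:

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated data in which the effect of x1 on y differs by group (a true interaction).
    rng = np.random.default_rng(1)
    n = 300
    df = pd.DataFrame({"x1": rng.normal(size=n), "group": rng.integers(0, 2, n)})
    df["y"] = 1.0 * df["x1"] + 2.0 * df["x1"] * df["group"] + rng.normal(size=n)

    # 'x1 * group' expands to x1 + group + x1:group, so the interaction term is
    # tested within the full model.
    fit = smf.ols("y ~ x1 * group", data=df).fit()
    print(fit.summary().tables[1])  # inspect the x1:group row and its P value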
5. Confounding:
If the primary goal is to estimate the effect of one X variable on Y, while adjusting for
possible confounding from other X variables, then see if the beta coefficient for the main
X changes when all the other Xs are added to the model. If it does, then there is some
confounding. If it changes a lot, then there is a lot of confounding.
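A minimal sketch of that comparison in Python with statsmodels, using simulated data in which age confounds a hypothetical exposure-outcome relationship (the variable names are illustrative):

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated data: age is associated with both the exposure and the outcome.
    rng = np.random.default_rng(2)
    n = 500
    age = rng.uniform(30, 80, n)
    exposure = (age / 100 + rng.normal(0, 0.3, n) > 0.7).astype(int)
    y = 0.5 * exposure + 0.1 * age + rng.normal(0, 1, n)
    df = pd.DataFrame({"y": y, "exposure": exposure, "age": age})

    crude = smf.ols("y ~ exposure", data=df).fit()
    adjusted = smf.ols("y ~ exposure + age", data=df).fit()
    print(crude.params["exposure"], adjusted.params["exposure"])
    # A large shift between the crude and adjusted betas indicates confounding by age.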
6. Test linearity assumption:
For multivariable linear regression (single Y; multiple Xs), the assumptions are:
linear relationship between Y and Xs;
for all possible combinations of X, the distribution of Y is normal with a constant variance.
Eyeball test: SPSS: Graphs: Scatter: Matrix: enter all Xs and Y: look at the row in the
matrix that compares Y to each of the Xs: ask yourself: Is there really a linear
relationship?
Do NOT test the linearity assumption by looking at a table of correlation coefficients
between Y and each of the Xs. Instead, look at the scatterplots.
Check all partial regression plots to see if they are linear:
SPSS: Analyze: Regression: Linear: Plots: select Produce all partial plots
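Outside SPSS, roughly equivalent eyeball checks can be produced with pandas and statsmodels. This is a sketch on made-up data, not the SPSS output itself:

    import numpy as np
    import pandas as pd
    import statsmodels.api as sm
    import matplotlib.pyplot as plt

    # Simulated predictors; x3 deliberately has a curved (non-linear) relationship with y.
    rng = np.random.default_rng(3)
    df = pd.DataFrame(rng.normal(size=(200, 3)), columns=["x1", "x2", "x3"])
    df["y"] = 2 * df["x1"] - df["x2"] + 0.5 * df["x3"] ** 2 + rng.normal(size=200)

    pd.plotting.scatter_matrix(df, figsize=(8, 8))   # scatterplot matrix: Y vs each X

    fit = sm.OLS(df["y"], sm.add_constant(df[["x1", "x2", "x3"]])).fit()
    sm.graphics.plot_partregress_grid(fit)           # all partial regression plots
    plt.show()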
7. Test for collinearity (multicollinearity):
It's okay for the X variables to be correlated, but it's not okay if they are nearly identical,
with correlations near 1.0 (completely redundant).
Check that the tolerance is > 0.1 for each X variable (collinearity diagnostics). (A tolerance
of < 0.1 is bad and means that something has to be done.)
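Many packages report the variance inflation factor (VIF) rather than tolerance; tolerance is simply 1/VIF, so tolerance < 0.1 corresponds to VIF > 10. A sketch with statsmodels on made-up data, where x2 is deliberately almost identical to x1:

    import numpy as np
    import pandas as pd
    from statsmodels.stats.outliers_influence import variance_inflation_factor
    from statsmodels.tools import add_constant

    rng = np.random.default_rng(4)
    x1 = rng.normal(size=300)
    x2 = x1 + rng.normal(0, 0.05, 300)   # nearly redundant with x1 (collinear on purpose)
    x3 = rng.normal(size=300)
    X = add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

    for i, name in enumerate(X.columns):
        if name == "const":
            continue
        vif = variance_inflation_factor(X.values, i)
        print(f"{name}: VIF = {vif:.1f}, tolerance = {1 / vif:.3f}")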
8. Test for outliers: unusual values of Y for combinations of Xs
Do NOT plot residuals against the observed values of Y (that plot will always have a positive
slope = 1 - R²).
Instead, plot residuals against the expected (fitted) values of Y.
Cook's distance tells you how much the beta coefficients will change if a particular case
(outlier) is removed. If Cook's distance is > 1, then it's a case with a particularly big
influence and should be double checked to make sure there is no measurement error.
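A short sketch of the same checks with statsmodels, using toy data with one planted influential point:

    import numpy as np
    import statsmodels.api as sm

    # Toy data with one planted influential case (high leverage, large residual).
    rng = np.random.default_rng(5)
    x = rng.normal(size=100)
    y = 3 * x + rng.normal(size=100)
    x[0], y[0] = 4.0, -20.0

    fit = sm.OLS(y, sm.add_constant(x)).fit()
    cooks_d = fit.get_influence().cooks_distance[0]   # one value per case
    print(np.where(cooks_d > 1)[0])                   # cases worth double-checking

    # Plot residuals against the fitted (expected) values, not the observed Y:
    # import matplotlib.pyplot as plt; plt.scatter(fit.fittedvalues, fit.resid)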
9. Choosing the best model (for prediction, rather than explanation):
Among the different methods (forward; backward; stepwise; best subset), backward
elimination is often the best (start with all Xs in the model, take out the most
nonsignificant predictor, and repeat until only significant predictors are left). There
are better ways, but a good rule of thumb is: do it several ways, and if you get a
different answer, then be cautious and get more help.
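As an illustration only, here is a crude backward-elimination loop in Python/statsmodels. The function and argument names are hypothetical, it assumes continuous or 0/1 predictors, and automated selection has well-known pitfalls, which is why the text advises trying several methods:

    import statsmodels.formula.api as smf

    def backward_eliminate(df, outcome, candidates, alpha=0.05):
        """Repeatedly drop the least significant predictor until all remaining
        predictors have P <= alpha. Illustrative sketch, not a recommendation."""
        predictors = list(candidates)
        while predictors:
            fit = smf.ols(f"{outcome} ~ " + " + ".join(predictors), data=df).fit()
            pvals = fit.pvalues.drop("Intercept")
            worst = pvals.idxmax()                 # most nonsignificant predictor
            if pvals[worst] <= alpha:
                return fit                         # everything left is significant
            predictors.remove(worst)
        return None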
10. Multiple Linear Regression in SPSS:
SPSS: Analyze: Regression: Linear only allows continuous variables as predictors.
SPSS: Analyze: General Linear Model: Univariate allows different kinds of predictors.
Logistic Regression
1. Logistic regression models: 2 common goals (test associations vs. make predictions)
If the outcome (Y) variable is dichotomous, then logistic regression allows you to assess
the association between Y and any type of X variable (nominal, ordinal, or interval),
while controlling for other variables (other Xs).
Logistic regression models also allow you to make predictions: for any combination of
predictor (X) variables, what is the probability that Y=1?
2. Logistic equation:
natural log of (odds that Y=1) = b0 + b1X1 + b2X2 + b3X3
3. Beta coefficients in logistic model:
For each of the X variables, there will be a beta coefficient. There will also be a Y
intercept term (except for case control studies).
ln(odds Y=1) = b0 + b1X1 + b2X2 + b3X3
Odds(Y=1) = e^(b0 + b1X1 + b2X2 + b3X3)
If there are no interaction terms, then the odds ratio (OR) for the relationship between Y
and any X is simply e^b, where b is the beta coefficient for that particular X. This odds
ratio is adjusted for all the other Xs in the model.
If X is an ordinal or interval variable, then the odds ratio (e^b) measures the relative
change in odds for every one-unit change in the X variable.
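A minimal statsmodels sketch of these mechanics on simulated data (the variable names and effect sizes are made up): fit a logistic model, exponentiate the beta coefficients to get adjusted ORs with confidence intervals, and use the fitted model for the prediction goal mentioned in item 1.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # Simulated cohort: death depends on a 0/1 treatment and on age.
    rng = np.random.default_rng(6)
    n = 600
    df = pd.DataFrame({"treatment": rng.integers(0, 2, n), "age": rng.uniform(40, 80, n)})
    logit_p = -0.2 - 1.0 * df["treatment"] + 0.02 * (df["age"] - 60)
    df["death"] = (rng.random(n) < 1 / (1 + np.exp(-logit_p))).astype(int)

    fit = smf.logit("death ~ treatment + age", data=df).fit()
    odds_ratios = np.exp(fit.params)    # adjusted OR per one-unit change in each X
    conf_int = np.exp(fit.conf_int())   # 95% CI on the OR scale
    print(pd.concat([odds_ratios, conf_int], axis=1))

    # Prediction goal: probability that Y=1 for a given combination of Xs.
    print(fit.predict(pd.DataFrame({"treatment": [1], "age": [65]})))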
4. Interactions:
As with other regression models, you must force the computer to look for interactions.
If there are two X variables in the model, then the relationship between Y and X1
(measured as an OR) is adjusted for the average value of X2. However, if the relationship
between Y and X1 (OR) is different for different values of X2, then there is interaction.
5. Interaction is good to find. It means there are important differences among subgroups
of patients (subgroups defined by X2).
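When an interaction term is present, the OR for X1 is no longer a single number; it depends on the value of X2: OR for X1 given X2 = e^(b1 + b3*X2), where b3 is the beta for the interaction term. A tiny worked example with made-up betas:

    import math

    # Hypothetical betas: b1 for X1, b3 for the X1*X2 interaction term.
    b1, b3 = 0.4, 0.9
    for x2 in (0, 1):
        print(f"OR for X1 when X2 = {x2}: {math.exp(b1 + b3 * x2):.2f}")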
6. Sample Size:
You should consider only one X variable for every 10 events in the smaller of the two
subgroups of Y. Again, this rule applies to the total number of potential predictor
variables being considered, not the final number of significant predictors.
However, if the goal is just measuring the association between one main X variable and
Y, while adjusting for several possible confounders, then all the potential confounders
(all the other Xs) can be considered together as the equivalent of one other variable. In
this situation, you would need at least 20-30 events in the smallest subgroup of Y.
7. Ordinal or Interval Predictor Variables: Do they meet the assumption of the model?
There is a linearity assumption for logistic regression models just as there is for linear
regression models. The assumption is that for any change of 1 unit in the X variable, the
OR will be the same (ie, the OR for X=1 compared to X=2 will be the same as the OR for
X=4 compared to X=5). This is the same as saying: there is a straight-line relationship when you plot the X variable on the horizontal axis and the ln(odds Y=1) on the vertical
axis. If this assumption is violated, then the conclusions of the model will be misleading.
Unfortunately, there is no easy test for this assumption. Ideally, you need to visually
inspect the plot, which you must create yourself.
For dichotomous X variables, there is no problem, since this assumption is
automatically satisfied.
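One way to build such a plot yourself, sketched in Python on simulated data: bin the X variable, compute the observed log-odds of Y=1 within each bin, and check by eye whether the points fall roughly on a straight line.

    import numpy as np
    import pandas as pd
    import matplotlib.pyplot as plt

    # Simulated data where the linearity assumption actually holds.
    rng = np.random.default_rng(7)
    x = rng.uniform(0, 10, 2000)
    y = (rng.random(2000) < 1 / (1 + np.exp(-(-3 + 0.6 * x)))).astype(int)

    df = pd.DataFrame({"x": x, "y": y})
    df["bin"] = pd.qcut(df["x"], 10)                       # deciles of X
    grouped = df.groupby("bin", observed=True).agg(x_mid=("x", "mean"), p=("y", "mean"))
    grouped["logit"] = np.log(grouped["p"] / (1 - grouped["p"]))   # observed ln(odds Y=1)

    plt.plot(grouped["x_mid"], grouped["logit"], "o-")
    plt.xlabel("X")
    plt.ylabel("ln(odds Y=1)")
    plt.show()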
8. Goodness of Fit Test:
Always inspect the goodness of fit test.
It is a test for logistic regression models that compares the expected to the observed
percentages of Y=1 for different combinations of Xs. If it is significant (small P value),
that's bad. It means the model doesn't fit the data well. In that case, look for
important interactions or look for problems with ordinal or interval X variables that
might not be satisfying the linearity assumption. Another reason might be too few
outcome events in one of the subgroups of Y.
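The observed-vs-expected comparison described here is commonly implemented as the Hosmer-Lemeshow test. Below is a rough hand-rolled sketch in Python (pandas and scipy); the function name, the decile grouping, and the `fit`/`df` objects in the usage comment are illustrative assumptions, not a validated routine.

    import numpy as np
    import pandas as pd
    from scipy.stats import chi2

    def hosmer_lemeshow(y_true, p_pred, groups=10):
        """Decile-of-risk goodness-of-fit check (Hosmer-Lemeshow style): compare
        observed vs. expected numbers of Y=1 within groups defined by the model's
        predicted probabilities."""
        d = pd.DataFrame({"y": np.asarray(y_true), "p": np.asarray(p_pred)})
        d["group"] = pd.qcut(d["p"], groups, duplicates="drop")
        g = d.groupby("group", observed=True)
        obs = g["y"].sum()    # observed events per group
        exp = g["p"].sum()    # expected events per group
        n = g.size()
        stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
        dof = len(obs) - 2
        return stat, chi2.sf(stat, dof)

    # Usage with a fitted statsmodels logit result named `fit` (hypothetical):
    # stat, p_value = hosmer_lemeshow(df["death"], fit.predict(df))
    # A small P value is bad: it suggests the model does not fit the data well.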