
    Faculty Development Program

    Clinical Epidemiology and Clinical Research

TOPIC: Biostatistics 5: Multivariable and Logistic Regression

DATE: May (1:30PM-5:00PM)

    LEADERS: Art Evans

    OBJECTIVES:

    1. Interpret a logistic regression equation.

2. Calculate OR and RR from logistic regression for continuous and categorical predictor variables.

    3. Describe the main assumptions of the logistic model.

    4. Describe the main errors in studies that analyze data with logistic regression.

    5. Check for interactions in linear and logistic models.

    REQUIRED READINGS:

Norman and Streiner. PDQ Statistics. 2nd Ed. Pages 65-69 (ANCOVA); 116-117 (logistic regression).

Norman and Streiner. Biostatistics: The Bare Essentials. Pages 119-127.

Concato and Feinstein. The risk of determining risk with multivariable models. Ann Intern Med. 1993;118:201-210.

    PROBLEMS:

A randomized trial was performed testing a new treatment against placebo, with mortality at one year as the outcome of interest. A logistic regression model was used to assess the treatment effect while adjusting for potential confounders.

    Questions:

    1. According to the logistic regression model, is there evidence of interaction?

2. Based on the first model (no confounding or interaction considered), what is the OR that describes the treatment effect? Verify by calculating the OR in the raw data.

3. What is the treatment OR after adjusting for the potential confounder? Is there evidence of confounding? How do you make that decision?


Model without the confounder

Term                     Beta coefficient    P value
Intercept                -0.2
Treatment (1=Tx, 0=C)    -1.0
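
For orientation on question 2: with no interaction terms, the treatment OR comes straight from the treatment coefficient as e^b. A minimal sketch in Python (the coefficient is the one tabulated above; verifying against the raw data would use the trial's 2x2 table, which is not reproduced here):

```python
import math

b_treatment = -1.0  # beta coefficient for treatment from the model above

# With no interaction terms, OR = e^b for a one-unit change in the predictor.
or_treatment = math.exp(b_treatment)
print(f"Treatment OR = {or_treatment:.2f}")  # ~0.37: lower odds of death on treatment
```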


    Multivariable Linear Regression

    1. Confusion: multivariable vs. multivariate

Multivariable means that you are simultaneously considering more than one predictor variable (independent; X), eg, Y = X1 + X2 + X3 + X4.

Multivariate usually means that you are simultaneously considering more than one outcome variable (dependent; Y), eg, Y1 + Y2 + Y3 = X1 + X2 + X3 + X4.

    2. Sample Size: Rough rule of thumb

Linear regression: 10 subjects for every potential predictor variable.

Logistic regression: 10 subjects in the smallest group of the dichotomous outcome variable for every potential predictor variable.

Multivariate linear regression: 10 subjects for every potential variable, including the multiple dependent (Y) variables.

Note: this is 10 subjects for every potential predictor, not 10 for every significant predictor in the final model! (This assumes you are interested in describing a prediction model, rather than simply being interested in controlling for lots of potential confounders while examining a specific exposure-disease relationship. If the latter is true, then the combination of potential confounders counts as 1 variable.)
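
To make the logistic rule concrete, a minimal sketch in Python (the cohort numbers are invented for illustration):

```python
# Rule of thumb: candidate predictors <= events in the smaller outcome group / 10.
# Hypothetical cohort: 400 subjects, of whom 60 died within one year.
n_deaths, n_survivors = 60, 340

smaller_group = min(n_deaths, n_survivors)
max_predictors = smaller_group // 10
print(f"At most {max_predictors} candidate predictor variables")  # 60 // 10 = 6
```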

    3. Interpretation of beta coefficients:

The importance of a beta coefficient must be interpreted in light of the units of the particular X variable. For example, the beta coefficient for height measured in yards would be 36 times bigger than the beta coefficient for height measured in inches, because a change of one yard is a change of 36 inches. Despite the 36-fold difference between these two beta coefficients, their importance is identical. Therefore, it is impossible to judge a beta coefficient without knowing the units of measurement.
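
The unit dependence is easy to demonstrate with simulated data; a minimal sketch (all names and numbers invented):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: height (inches) predicting weight.
height_in = rng.normal(68, 4, size=200)
weight = 2.0 * height_in + rng.normal(0, 10, size=200)

def slope(x, y):
    """Ordinary least-squares slope of y on x (with an intercept)."""
    X = np.column_stack([np.ones_like(x), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

b_inches = slope(height_in, weight)
b_yards = slope(height_in / 36, weight)  # identical heights, expressed in yards
print(b_yards / b_inches)  # ~36: same relationship, different units
```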

    4. Interactions:

Always look for interactions among predictor variables. Two X variables may appear to have no relationship with the outcome variable until an interaction term is also considered in the model.

Interaction terms are the best method to test for important differences among subgroups.

Very bad method: testing within each subgroup separately and then declaring interaction present if one subgroup demonstrates a significant difference whereas in the other subgroup there is no significant difference.

Always test for interaction before trying to simplify the model.
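
A minimal sketch of fitting an interaction term with the statsmodels formula API (variable names and data are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)

# Hypothetical data: the effect of x1 on y differs by the level of x2.
n = 300
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.integers(0, 2, size=n)})
df["y"] = 1.0 * df.x1 + 0.5 * df.x2 + 2.0 * df.x1 * df.x2 + rng.normal(size=n)

# "x1 * x2" expands to x1 + x2 + x1:x2; the x1:x2 row tests the interaction.
model = smf.ols("y ~ x1 * x2", data=df).fit()
print(model.summary().tables[1])
```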


Always check for interaction before checking for confounding. (Remember: Adjusting for confounding is similar to taking the average among the subgroups. Taking the average is bad if there is important interaction, ie, the effect is markedly different among subgroups.)

5. Confounding:

If the primary goal is to estimate the effect of one X variable on Y, while adjusting for possible confounding from other X variables, then see if the beta coefficient for the main X changes when all the other Xs are added to the model. If it does, then there is some confounding. If it changes a lot, then there is a lot of confounding.
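
A minimal sketch of this check with simulated data (all names invented; the same comparison works with logistic models):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)

# Hypothetical data: c is associated with both the exposure x and the outcome y.
n = 500
c = rng.normal(size=n)                      # potential confounder
x = c + rng.normal(size=n)                  # exposure of interest
y = 0.5 * x + 1.5 * c + rng.normal(size=n)  # outcome
df = pd.DataFrame({"x": x, "c": c, "y": y})

crude = smf.ols("y ~ x", data=df).fit().params["x"]
adjusted = smf.ols("y ~ x + c", data=df).fit().params["x"]
print(f"crude beta = {crude:.2f}, adjusted beta = {adjusted:.2f}")
# A large change between the two betas suggests substantial confounding by c.
```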

    6. Test linearity assumption:

For multivariable linear regression (single Y; multiple Xs), the assumptions are:

linear relationship between Y and the Xs;

for all possible combinations of X, the distribution of Y is normal with a constant variance.

Eyeball test: SPSS: Graphs: Scatter: Matrix: enter all Xs and Y: look at the row in the matrix that compares Y to each of the Xs, and ask yourself: Is there really a linear relationship?

Do NOT test the linearity assumption by looking at a table of correlation coefficients between Y and each of the Xs. Instead, look at the scatterplots.

Check all partial regression plots to see if they are linear:

SPSS: Analyze: Regression: Linear: Plots: select Produce all partial plots
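
Outside SPSS, the same eyeball test can be run with a scatterplot matrix; a minimal sketch with pandas and matplotlib (column names and data invented):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(3)

# Hypothetical data frame: one outcome (y) and three predictors.
n = 200
df = pd.DataFrame({"x1": rng.normal(size=n),
                   "x2": rng.normal(size=n),
                   "x3": rng.normal(size=n)})
df["y"] = df.x1 + np.sqrt(np.abs(df.x2)) + rng.normal(size=n)  # x2 is nonlinear

# Inspect the row of the matrix that compares y with each X.
pd.plotting.scatter_matrix(df, figsize=(8, 8))
plt.show()
```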

    7. Test for collinearity (multicollinearity):

It's okay for the X variables to be correlated, but it's not okay if they are nearly identical, with correlations near 1.0 (completely redundant).

Check that the tolerance is > 0.1 for each X variable (collinearity diagnostics). (A tolerance of < 0.1 is bad and means that something has to be done.)
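
Tolerance for a given X is 1 minus the R² from regressing that X on all the other Xs (equivalently, 1/VIF). A minimal sketch with statsmodels (data invented; x3 is built to be nearly redundant with x1):

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(4)

n = 200
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
x3 = x1 + rng.normal(scale=0.05, size=n)  # nearly identical to x1
X = sm.add_constant(pd.DataFrame({"x1": x1, "x2": x2, "x3": x3}))

# Tolerance = 1 / VIF; flag any predictor with tolerance < 0.1.
for i, name in enumerate(X.columns):
    if name != "const":
        print(f"{name}: tolerance = {1.0 / variance_inflation_factor(X.values, i):.3f}")
```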

    8. Test for outliers: unusual values of Y for combinations of Xs

Do NOT plot residuals against the observed values of Y (it will always have a positive slope equal to 1 - R²). Instead, plot residuals against the expected (predicted) values of Y.

Cook's distance tells you how much the beta coefficients will change if a particular case (outlier) is removed. If Cook's distance is > 1, then it's a case with a particularly big influence and should be double-checked to make sure there is no measurement error.
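
A minimal sketch of screening cases by Cook's distance with statsmodels (data invented; one case is deliberately corrupted):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)

# Hypothetical data with one corrupted, high-influence observation.
n = 100
x = rng.normal(size=n)
y = 2.0 * x + rng.normal(size=n)
x[0], y[0] = 4.0, -20.0  # simulated measurement error

model = sm.OLS(y, sm.add_constant(x)).fit()

# Cook's distance per case; values > 1 deserve a second look at the record.
cooks_d = model.get_influence().cooks_distance[0]
for i in np.where(cooks_d > 1)[0]:
    print(f"case {i}: Cook's distance = {cooks_d[i]:.2f} -- check for measurement error")
```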


    9. Choosing the best model (for prediction, rather than explanation):

Among the different methods (forward; backward; stepwise; best subset), backward is often the best (start with all Xs in the model, take out the most nonsignificant predictor, and repeat until only significant predictors are left). There are better ways. But a good rule of thumb: do it several ways, and if you get different answers, then be cautious and get more help.
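
A minimal sketch of the backward procedure described above (statsmodels formula API; data, names, and the 0.05 cutoff are invented for illustration):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def backward_eliminate(df, outcome, candidates, alpha=0.05):
    """Repeatedly drop the least significant X until all remaining Ps < alpha."""
    predictors = list(candidates)
    while predictors:
        model = smf.ols(f"{outcome} ~ {' + '.join(predictors)}", data=df).fit()
        pvals = model.pvalues.drop("Intercept")
        worst = pvals.idxmax()
        if pvals[worst] < alpha:
            return model
        predictors.remove(worst)
    return None

rng = np.random.default_rng(6)
n = 200
df = pd.DataFrame(rng.normal(size=(n, 4)), columns=["x1", "x2", "x3", "x4"])
df["y"] = 2 * df.x1 + df.x2 + rng.normal(size=n)  # x3 and x4 are pure noise

final = backward_eliminate(df, "y", ["x1", "x2", "x3", "x4"])
print(final.params)
```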

    10. Multiple Linear Regression in SPSS:

SPSS: Analyze: Regression: Linear only allows continuous variables as predictors.

SPSS: Analyze: General Linear Model: Univariate allows different kinds of predictors.


    Logistic Regression

1. Logistic regression models: 2 common goals, testing associations vs. making predictions

If the outcome (Y) variable is dichotomous, then logistic regression allows you to assess the association between Y and any type of X variable (nominal, ordinal, or interval), while controlling for other variables (other Xs).

Logistic regression models also allow you to make predictions: for any combination of predictor (X) variables, what is the probability that Y=1?

    2. Logistic equation:

natural log of (odds that Y=1) = b0 + b1X1 + b2X2 + b3X3

    3. Beta coefficients in logistic model:

For each of the X variables, there will be a beta coefficient. There will also be a Y-intercept term (except for case-control studies).

ln(odds Y=1) = b0 + b1X1 + b2X2 + b3X3

odds(Y=1) = e^(b0 + b1X1 + b2X2 + b3X3)

If there are no interaction terms, then the odds ratio (OR) for the relationship between Y and any X is simply e^b, where b is the beta coefficient for that particular X. This odds ratio is adjusted for all the other Xs in the model.

If X is an ordinal or interval variable, then the odds ratio (e^b) measures the relative change in odds for every one-unit change in the X variable.
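
A minimal sketch of fitting such a model and recovering the adjusted ORs (statsmodels; the data and variable names are invented):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)

# Hypothetical data: dichotomous outcome, one binary and one interval predictor.
n = 500
df = pd.DataFrame({"tx": rng.integers(0, 2, size=n),
                   "age": rng.normal(60, 10, size=n)})
logit = -1 + 0.7 * df.tx + 0.03 * (df.age - 60)
df["died"] = (rng.random(n) < 1 / (1 + np.exp(-logit))).astype(int)

model = smf.logit("died ~ tx + age", data=df).fit()
print(np.exp(model.params))  # e^b: adjusted OR for each X (per one-unit change)
```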

    4. Interactions:

As with other regression models, you must force the computer to look for interactions. If there are two X variables in the model, then the relationship between Y and X1 (measured as an OR) is adjusted for the average value of X2. However, if the relationship between Y and X1 (OR) is different for different values of X2, then there is interaction.

5. Interaction is good to find. It means there are important differences among subgroups of patients (subgroups defined by X2).
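
In equations: with an interaction term, ln(odds Y=1) = b0 + b1X1 + b2X2 + b3X1X2, so the OR for a one-unit change in X1 at a fixed value of X2 is e^(b1 + b3X2). A minimal sketch (coefficients invented):

```python
import math

# Hypothetical fitted coefficients from a model with an X1*X2 interaction term.
b1 = -0.5  # main effect of X1
b3 = 0.8   # interaction X1:X2

# OR for X1 within each subgroup defined by a dichotomous X2.
for x2 in (0, 1):
    print(f"X2={x2}: OR for X1 = {math.exp(b1 + b3 * x2):.2f}")
# Markedly different ORs across the subgroups is exactly what interaction means.
```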

    6. Sample Size:

You should consider only one X variable for every 10 events in the smaller of the two subgroups of Y. Again, this rule applies to the total number of potential predictor variables being considered, not the final number of significant predictors.

However, if the goal is just measuring the association between one main X variable and Y, while adjusting for several possible confounders, then all the potential confounders (all the other Xs) can be considered together as the equivalent of one other variable. In this situation, you would need at least 20-30 events in the smallest subgroup of Y.


    7. Ordinal or Interval Predictor Variables: Do they meet the assumption of the model?

There is a linearity assumption for logistic regression models just as there is for linear regression models. The assumption is that for any change of 1 unit in the X variable, the OR will be the same (ie, the OR for X=1 compared to X=2 will be the same as the OR for X=4 compared to X=5). This is the same as saying: there is a straight-line relationship when you plot the X variable on the horizontal axis and ln(odds Y=1) on the vertical axis. If this assumption is violated, then the conclusions of the model will be misleading.

Unfortunately, there is no easy test for this assumption. Ideally, you need to visually inspect the plot, which you must create yourself (a sketch follows below).

For dichotomous X variables, there is no problem, since this assumption is automatically satisfied.
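
One common way to create that plot is to bin the X variable (eg, into quintiles) and plot the empirical log-odds per bin; a minimal sketch with simulated data (names and numbers invented):

```python
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(8)

# Hypothetical data: interval predictor x, dichotomous outcome y.
n = 2000
x = rng.normal(size=n)
y = (rng.random(n) < 1 / (1 + np.exp(-0.8 * x))).astype(int)
df = pd.DataFrame({"x": x, "y": y})

# Empirical ln(odds Y=1) per quintile of x; a roughly straight line
# supports the linearity assumption.
df["bin"] = pd.qcut(df.x, 5)
g = df.groupby("bin", observed=True)
p = g.y.mean()
plt.plot(g.x.mean(), np.log(p / (1 - p)), marker="o")
plt.xlabel("x (bin mean)")
plt.ylabel("ln(odds Y=1)")
plt.show()
```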

    8. Goodness of Fit Test:

Always inspect the goodness-of-fit test. It is a test for logistic regression models that compares the expected to the observed percentages of Y=1 for different combinations of Xs. If it is significant (small P value), that's bad: it means the model doesn't fit the data well. In that case, look for important interactions, or look for problems with ordinal or interval X variables that might not be satisfying the linearity assumption. Another reason might be too few outcome events in one of the subgroups of Y.
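
The test described here matches the Hosmer-Lemeshow approach (group cases by predicted risk, then compare observed with expected events per group). A minimal hand-rolled sketch, assuming you already have a model's predicted probabilities (the data below are simulated and well calibrated, so a large P is expected):

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2

def hosmer_lemeshow(y, p, groups=10):
    """Chi-square comparing observed vs expected events across deciles of risk."""
    df = pd.DataFrame({"y": y, "p": p})
    df["decile"] = pd.qcut(df.p, groups, duplicates="drop")
    g = df.groupby("decile", observed=True)
    obs, exp, n = g.y.sum(), g.p.sum(), g.y.count()
    stat = (((obs - exp) ** 2) / (exp * (1 - exp / n))).sum()
    return stat, chi2.sf(stat, len(obs) - 2)

rng = np.random.default_rng(9)
p = rng.uniform(0.05, 0.95, size=1000)  # predicted P(Y=1) from some model
y = (rng.random(1000) < p).astype(int)  # observed outcomes
stat, pval = hosmer_lemeshow(y, p)
print(f"HL statistic = {stat:.2f}, P = {pval:.2f}")  # a small P signals poor fit
```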
