Logistic Regression Multivariate Analysis

Post on 15-Jan-2016

Page 1: Logistic Regression Multivariate Analysis. What is a log and an exponent? Log is the power to which a base of 10 must be raised to produce a given number

Logistic Regression

Multivariate Analysis

What is a log and an exponent?

A log is the power to which a base must be raised to produce a given number. With a base of 10, the log of 1000 is 3 because 10^3 = 1000.

The log of an odds ratio of 1.0 is 0 because 10^0 = 1.

Logistic regression uses natural logs, whose base is e (approximately 2.718). Raising e to a given power is the antilog of that power. Thus, exp(β) = antilog of β.

The antilog of log odds of 0 is 2.718^0 = 1.

Exponential increases are curvilinear.
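
As a quick check of these definitions, here is a minimal Python sketch using only the standard math module. The variable names are illustrative and are not part of the slides.

```python
import math

# Common (base-10) log: the power to which 10 must be raised to give 1000.
log10_of_1000 = math.log10(1000)

# Natural log (base e) of an odds ratio of 1.0, and its antilog.
log_odds = math.log(1.0)      # log of an odds ratio of 1.0
antilog = math.exp(log_odds)  # e raised to that power

print(log10_of_1000, log_odds, antilog)
```

The same exp() call is what turns a logistic regression coefficient back into an odds ratio.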

Main questions with logistic regression

How do the odds of a successful outcome depend upon or change based on each explanatory variable (X)?

How does the probability (π) that a successful outcome occurs depend upon or change based on each explanatory variable (X)?

Logistic regression

A single binary response variable is predicted by categorical and interval variables.

Maximum likelihood model: the coefficients that make the sample observations most likely are reported in the final model.

The response follows a binomial distribution, and the probability of success is modeled with a sigmoid (non-linear) curve.

The probability of success falls between 0 and 1 for all possible values of X (the s-curve bends).
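
To illustrate the bounded s-curve, the sketch below evaluates a logistic curve at a few X values. The coefficients ALPHA and BETA are hypothetical, chosen only so that the curve crosses .5 at X = 4; they do not come from the slides.

```python
import math

def logistic(alpha, beta, x):
    """Probability of success for a given x under a logistic curve."""
    z = alpha + beta * x
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients, chosen only for illustration.
ALPHA, BETA = -2.0, 0.5

xs = [-20, -5, 0, 4, 10, 20]
probs = [logistic(ALPHA, BETA, x) for x in xs]
for x, p in zip(xs, probs):
    print(f"x = {x:>3}: probability = {p:.4f}")
```

However far X moves in either direction, the printed probabilities stay strictly between 0 and 1.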

Sigmoid curve for logistic regression

Response variable

Code Y as 0 and 1 (dummy coding).

0 and 1 are usually termed failure and success of an outcome (by convention, success is category 1).

The sample mean of Y is the number of successes divided by the sample size (the proportion of successes).

Odds ratios in logistic regression

Can be thought of as the impact of each predictor in the model on the likelihood or odds of success.

For interval variables: the odds of success for those who are a unit apart in X, net of other predictors.

For dummy coefficients: the odds of success for those in the category of X coded 1 compared with those in the omitted category (0).

Every unit increase in X has an exponential (multiplicative) effect on the odds of success: the odds are multiplied by exp(β), so an odds ratio can be greater than 1 (odds increase) or less than 1 (odds decrease).

Odds ratio

π / (1 - π) is the odds of success; an odds ratio compares two such odds.

When the probability of success, π, is ½ (50-50), the odds of success equal .5 / (1 - .5) = 1.0. This means that success is as likely as failure.

Thus, a predicted probability of .5 and an odds ratio of 1.0 are our points of comparison when making inferences.
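
The conversion from probability to odds can be sketched in a few lines of Python. The function name odds and the example probabilities are illustrative choices, not part of the slides.

```python
def odds(p):
    """Odds of success, p / (1 - p), for a probability strictly between 0 and 1."""
    if not 0.0 < p < 1.0:
        raise ValueError("p must be strictly between 0 and 1")
    return p / (1.0 - p)

for p in (0.1, 0.5, 0.75):
    print(f"probability {p:.2f} -> odds {odds(p):.3f}")
```

Probabilities below .5 give odds below 1.0, and probabilities above .5 give odds above 1.0, which is why 1.0 serves as the point of comparison.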

Logistic transformation of the odds

To model dichotomous outcomes, SPSS takes the logistic transformation (the log) of the odds:

log(π / (1 - π)) = α + β1X1 + β2X2 + …

To interpret, we take the exponent of the beta coefficient for each predictor (can do for all in the model).

The odds of success are:

π / (1 - π) = e^(α + βX) = e^α (e^β)^X

Here e^β is the exponent value of β.
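
The algebra on this slide can be checked numerically. The sketch below assumes a single predictor and uses hypothetical values for alpha and beta (they are not estimates from the slides). It confirms that exponentiating the logit gives the same odds as the factored form, and that a one-unit increase in X multiplies the odds by e^β.

```python
import math

# Hypothetical intercept and slope, used only to illustrate the algebra.
alpha, beta = -1.5, 0.4

def odds_of_success(x):
    """Odds implied by the logit model: exp(alpha + beta * x)."""
    return math.exp(alpha + beta * x)

x = 3
factored = math.exp(alpha) * math.exp(beta) ** x          # e^alpha * (e^beta)^x
unit_ratio = odds_of_success(x + 1) / odds_of_success(x)  # odds ratio per unit of X
print(odds_of_success(x), factored, unit_ratio, math.exp(beta))
```

The ratio does not depend on the starting value of x, which is what allows a single odds ratio to be reported per predictor.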

We can also talk about the percentage change in odds for interval and dummy variables

Thus, the exponentiated beta value in the SPSS output can be converted into a percentage by 100(exp(β) - 1), which is the percentage change in odds for each unit increase in the independent variable.

We don’t really talk about the intercept here … betas for each predictor are our concern

We can also talk about the probability of success or π

We can calculate point estimates by substituting specific X values, so it is good for forecasting given respondent characteristics.

The impact of X on π is interactive (non-constant).

π is the probability of success; it varies as X changes and is expressed as a proportion (ranges 0-1).

π = e^(α + βX) / (1 + e^(α + βX)), or odds / (1 + odds)
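
The non-constant impact of X on π can be seen by computing the change in π for the same one-unit step at different starting points. This is a sketch with hypothetical coefficients ALPHA and BETA, not values from the slides.

```python
import math

def probability(alpha, beta, x):
    """pi = odds / (1 + odds), with odds = exp(alpha + beta * x)."""
    odds = math.exp(alpha + beta * x)
    return odds / (1.0 + odds)

# Hypothetical coefficients for illustration only.
ALPHA, BETA = -3.0, 0.6

# Change in pi for a one-unit increase in X, at several starting points.
changes = {x: probability(ALPHA, BETA, x + 1) - probability(ALPHA, BETA, x)
           for x in (0, 4, 9)}
for x, delta in changes.items():
    print(f"X from {x} to {x + 1}: pi changes by {delta:.4f}")
```

With these values the step near the middle of the curve moves π the most, while the same step in either tail moves it much less.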

Slope in logistic regression models (FYI)

Like the slope of a straight line, β indicates whether the sigmoid curve (π or prob. of success) increases (β > 0) or decreases (β < 0) as the values of an interval variable increase or as we move from 0 to 1 for a dummy.

The steepness of the s-curve increases as the absolute value of β increases.

The rate at which the curve climbs or descends changes according to the value of the independent variable, so the effect of β depends on X.

Slope in logistic regression models (FYI)

When β = 0, π does not change as X increases (X has no bearing on the probability or odds of success), so the curve is a flat, straight line.

For β > 0, π increases as X increases (the probability of success increases, so the curve rises).

For β < 0, π decreases as X increases (the probability of success decreases, so the curve falls).

Mention .5 bit on next slide

Slope in logistic regression (FYI)

Null hypothesis for predictors

H0: β = 0 for log(π / (1 - π)) = α + β1X1 + … + βiXi

X has no effect on the likelihood that an outcome will occur [Y = 1].

Y is independent of X, so, for example, the likelihood of being successful is the same for all income groups.

Wald Statistic

The null hypothesis test statistic for each predictor in your model.

The Wald statistic is the significance test for each parameter in the model.

The null is that each β = 0.

It has df = 1 and a chi-square distribution.

It is the square of the z statistic, which equals β / standard error.

-2 log likelihood as test of null hypothesis for entire model

A test of significance for the model, like the F-ratio; chi-square distribution; df = p(α + β) - p(α), that is, the number of parameters in the model with the intercept and predictors minus the number in the model with the intercept alone.

Does the observed likelihood or odds of success differ from 1?

Compares the model with the intercept alone to the model with the intercept and predictors. Do your predictors add to the predictive power of the model?

Tests whether the difference is 0; the difference is referred to as the model chi-square.

Goodness of Fit Statistic – null for residuals (FYI)

Compares observed probabilities or what you observed in the sample to the predicted probabilities of an outcome occurring based on model parameters in your equation

Examines residuals – do the predictor coefficients significantly minimize their squared distances?

Chi-square distribution; df = p

Should be non-significant (NS), as observed and predicted values are anticipated to be quite similar.

Mean of our response variable attending self-help group (FYI)

The sample mean of Y is the sum of the number of successes (yes to attend) divided by the sample size, n

The sample mean is the proportion of successful outcomes

Here, 44 said yes and n = 400, so the mean proportion of yes is 44 / 400 = .11, or 11%.
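
The same calculation as a minimal Python sketch. The list y is rebuilt from the two counts on the slide (44 yes out of 400); the individual responses themselves are not in the deck.

```python
# Dummy-coded responses reconstructed from the counts on the slide:
# 44 respondents said yes (1) and the remaining 356 of 400 said no (0).
n = 400
successes = 44
y = [1] * successes + [0] * (n - successes)

mean_y = sum(y) / len(y)
print(f"mean of Y = {mean_y:.2f}")
```

Because Y only takes the values 0 and 1, its mean is the proportion of successes.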

Odds ratio and % change in odds by age

For age, β = -.0586 and p < .01 (beta is negative). Thus, the log odds of attending a self-help group decrease as a person gets older.

Exp(β) = .9431 … the odds ratio [exp < 1], thus the odds decrease.

The % change (in this case a reduction) in the odds of attending for each additional year of age is

100(exp(β) - 1) = 100(.9431 - 1) = -5.69%

The odds of attending fall by 5.69% with each year one ages.
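
The age figures can be reproduced from the reported coefficient with a short Python sketch. The function name percent_change_in_odds is an illustrative choice; only the value -.0586 comes from the slide.

```python
import math

def percent_change_in_odds(beta):
    """Percentage change in odds per unit increase: 100 * (exp(beta) - 1)."""
    return 100.0 * (math.exp(beta) - 1.0)

beta_age = -0.0586  # coefficient for age reported on the slide
odds_ratio_age = math.exp(beta_age)
pct_age = percent_change_in_odds(beta_age)
print(f"odds ratio = {odds_ratio_age:.4f}, % change = {pct_age:.2f}")
```

A negative beta always gives an odds ratio below 1 and therefore a negative percentage change.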

Predicted probability of attending by age

When the predicted probability is below .5, the probability of attending declines and we would see a downward dip in the sigmoid curve with increasing values of X (keeping in mind probability ranges from 0-1).

More meaningful with all predictors; however, a point estimate for age 80 would be:

π = e^((-.0586)(80)) / (1 + e^((-.0586)(80))) = .009

The probability of those 80 years of age attending a group is about 1%.

Odds ratio and % change in odds by gender

For gender, β = 1.2540 and p < .05. Thus, the odds of attending a self-help group are greater among females (female is the category coded 1 and beta is positive).

Exp(β) = 3.5045 … the odds of attending are 3.5 times as large for females as they are for males [exp > 1].

The % change (in this case an increase) in the odds of attending when a person is female is 100(exp(β) - 1) = 100(3.50 - 1) = 250%.

Predicted probability of attending by gender

π = e^((1.254)(1)) / (1 + e^((1.254)(1))) = .78

Thus, the probability of attending among females is 78%.

When the predicted probability is above .5, the probability of attending increases and we would see an upward trend in the sigmoid curve with increasing values of X on the horizontal axis (keeping in mind probability ranges from 0-1).
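
The point estimates for gender and for age 80 can be recomputed with the sketch below. It follows the slides in leaving out the intercept and the other predictors, so the results are rough illustrations rather than full model predictions. The function name is an illustrative choice.

```python
import math

def probability_from_logit(logit):
    """Convert log odds to a probability: e^logit / (1 + e^logit)."""
    odds = math.exp(logit)
    return odds / (1.0 + odds)

# Coefficients reported on the slides; as on the slides, the intercept and
# the other predictors are left out, so these are rough point estimates.
BETA_AGE = -0.0586
BETA_GENDER = 1.254

p_age_80 = probability_from_logit(BETA_AGE * 80)
p_female = probability_from_logit(BETA_GENDER * 1)
print(f"age 80: {p_age_80:.3f}")
print(f"female: {p_female:.3f}")
```

A logit of 0 maps to a probability of exactly .5, so a positive logit lands above .5 and a negative logit below it.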

Wald statistic

Null: the coefficient for each independent variable is 0.

Tells us which variables significantly predict the likelihood of attending a self-help group.

Age = 7.2298** Gender = 5.7723* (* p < .05, ** p < .01)
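
The significance levels attached to these Wald statistics can be checked without a statistics package. The sketch below relies on the fact that a chi-square variable with 1 df is a squared standard normal z, so its upper-tail probability can be written with the standard library's math.erfc. The function and dictionary names are illustrative.

```python
import math

def chi_square_df1_p_value(wald):
    """Upper-tail p-value of a chi-square statistic with 1 df.

    A chi-square variable with 1 df is a squared standard normal z, so
    P(chi2 > w) = P(|z| > sqrt(w)) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald / 2.0))

wald_stats = {"age": 7.2298, "gender": 5.7723}  # values from the slide
p_values = {name: chi_square_df1_p_value(w) for name, w in wald_stats.items()}
for name, p in p_values.items():
    print(f"{name}: Wald = {wald_stats[name]}, p = {p:.4f}")
```

Age falls below the .01 threshold and gender falls between .01 and .05, matching the p-values reported with each coefficient.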

Likelihood statistic for the model

Under the null, the likelihood or odds are 1.0 and the predicted probability is .5.

Constant alone minus constant and all predictors: 281.36838 - 240.518 = 40.85*

Null: all of our predictor variables have β = 0.

With 12 df, the model chi-square of 40.85 has p < .0001.

The predictors in the model significantly add to our capacity to predict attendance.
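
The model chi-square and its p-value can be reproduced as follows. This sketch assumes the two numbers on the slide are the -2 log likelihoods of the constant-only and full models. It uses the closed form of the chi-square upper tail that holds for even degrees of freedom, so it needs only the standard library; the function name is an illustrative choice.

```python
import math

def chi_square_sf_even_df(x, df):
    """Upper-tail probability of a chi-square statistic when df is even.

    For df = 2k the tail has the closed form
    exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    terms = (half ** i / math.factorial(i) for i in range(df // 2))
    return math.exp(-half) * sum(terms)

neg2ll_constant_only = 281.36838  # -2 log likelihood, constant alone
neg2ll_full_model = 240.518       # -2 log likelihood, constant and predictors
model_chi_square = neg2ll_constant_only - neg2ll_full_model
p_value = chi_square_sf_even_df(model_chi_square, df=12)
print(f"model chi-square = {model_chi_square:.2f}, p = {p_value:.6f}")
```

For odd degrees of freedom this shortcut does not apply, and a statistics library would be the more practical route.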

Goodness of Fit (FYI)

371.093, 12 df, ns

Our model parameters minimize the squared distances [residuals] between the actual sample observations of attendance and what the logistic regression equation predicts (odds and probabilities).
