Post on 04-Feb-2021
-
22s:152 Applied Linear Regression
Ch. 14 (sec. 1) and Ch. 15 (sec. 1 & 4): Logistic Regression
————————————————————
Logistic Regression
•When the response variable is a binary variable, such as
– 0 or 1
– live or die
– fail or succeed
then we approach our modeling a little differently.
•What is wrong with our previous modeling?
– Let’s look at an example...
1
-
• Example: Study on lead levels in children
Binary response called highbld:
Either high lead blood level (1) or low lead blood level (0)
One continuous predictor variable called soil:
Measure of the level of lead in the soil in the subject’s backyard
> sl=read.csv("soillead.csv")
> attach(sl)
> head(sl)
highbld soil
1 1 1290
2 0 90
3 1 894
4 0 193
5 1 1410
6 1 410
2
-
The dependent variable plotted against the independent variable:
[Figure: highbld vs. soil, two panels (“Original” and “Jittered”); x-axis soil (0–5000), y-axis highbld (0–1).]
3
-
Consider the usual regression model: Yi = β0 + β1xi + εi
The fitted model and residuals: Ŷ = 0.3178 + 0.0003x
[Figure: three panels — the fitted line over the jittered points (highbld vs. soil), residuals vs. fitted values (lm.out$residuals vs. lm.out$fitted.values), and a normal Q-Q plot of the residuals.]
All sorts of violations here...
4
-
Violations:
- Our usual εi ∼ N(0, σ²) isn’t reasonable.
The errors are not normal, and they don’t have the same variability across all x-values.
- Predictions for observations can be outside [0,1].
For x = 3000, Ŷ = 1.14, and this is not possibly an average of the 0s and 1s at x = 3000 (like a conditional mean given x).
Ŷx=3000 = 1.14, which is not in [0,1].
5
-
•What is a better way to model a 0-1 response using regression?
• At each x value, there’s a certain chance of a 0 and a certain chance of a 1.
[Figure: jittered highbld vs. soil; x-axis soil (0–5000), y-axis highbld (0–1).]
• For example, in this data set, it looks more likely to get a 1 at higher values of x.
• Or, P (Y = 1) changes with the x-value.
6
-
• Conditioning on a given x, we have P (Y = 1), and P (Y = 1) ∈ (0, 1).
•We will consider this probability of getting a 1 given the x-value(s) in our modeling...
πi = P (Yi = 1|Xi)
The previous plot shows that P (Y = 1) depends on the value of x.
• It is actually a transformation of this probability (πi) that we will use as our response in the regression model.
7
-
Odds of an Event
• First, let’s discuss the odds of an event.
When two fair coins are flipped,
P (two heads) = 1/4
P (not two heads) = 3/4
The odds in favor of getting two heads is:
odds = P (2 heads) / P (not 2 heads) = (1/4) / (3/4) = 1/3
or sometimes referred to as 1 to 3 odds.
You’re 3 times as likely to not get 2 heads as you are to get 2 heads.
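The arithmetic above is easy to check numerically. A minimal sketch (in Python here, though the course code is in R) of odds as a function of probability:

```python
# Odds in favor of an event, given its probability p.
def odds(p):
    return p / (1 - p)

p_two_heads = 1 / 4          # P(two heads) with two fair coins
print(odds(p_two_heads))     # (1/4)/(3/4) = 1/3
```

Note that odds of 1/3 correspond to "1 to 3": the event is a third as likely to happen as to not happen.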
8
-
• For a binary variable Y (2 possible outcomes),
odds in favor of Y = 1 is P (Y = 1) / P (Y = 0) = P (Y = 1) / [1 − P (Y = 1)]
• For example, if P (heart attack) = 0.0018, then the odds of a heart attack is
0.0018/0.9982 = 0.0018/(1 − 0.0018) = 0.001803
• The ratio of the odds for two different groups is also a quantity of interest.
For example, consider heart attacks for “male nonsmoker vs. male smoker”.
Suppose P (heart attack) = 0.0036 for a male smoker, and P (heart attack) = 0.0018 for a male nonsmoker.
9
-
Then, the odds ratio (O.R.) for a heart attack in nonsmoker vs. smoker is
O.R. = (odds of a heart attack for non-smoker) / (odds of a heart attack for smoker)
= [Pr(heart attack|non-smoker) / (1 − Pr(heart attack|non-smoker))] / [Pr(heart attack|smoker) / (1 − Pr(heart attack|smoker))]
= (0.0018/0.9982) / (0.0036/0.9964) = 0.4991
• Interpretation of the odds ratio for binary 0-1
– O.R. ≥ 0
– If O.R. = 1.0, then P (Y = 1) is the same in both samples
– If O.R. < 1.0, then P (Y = 1) is less in the numerator group than in the denominator group
– O.R. = 0 if and only if P (Y = 1) = 0 in numerator sample
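The 0.4991 above can be reproduced directly from the two probabilities (a quick Python check; the course uses R, but the arithmetic is identical):

```python
# Odds in favor of an event, given its probability p.
def odds(p):
    return p / (1 - p)

p_nonsmoker = 0.0018  # P(heart attack | male nonsmoker)
p_smoker = 0.0036     # P(heart attack | male smoker)

# O.R. for nonsmoker vs. smoker: numerator group's odds over denominator group's.
odds_ratio = odds(p_nonsmoker) / odds(p_smoker)
print(round(odds_ratio, 4))  # 0.4991
```

Since O.R. < 1, P (Y = 1) is smaller for nonsmokers (the numerator group), as the interpretation rules above say.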
10
-
Back to Logistic Regression...
• The response variable we will model is a transformation of P (Yi = 1) for a given Xi.
• The transformation is the logit transformation
logit(a) = ln[a / (1 − a)]
• The response variable we will use:
logit[P (Yi = 1)] = ln[P (Yi = 1) / (1 − P (Yi = 1))]
This is the loge of the odds that Yi = 1.
Notice: P (Yi = 1) ∈ (0, 1)
P (Yi = 1) / (1 − P (Yi = 1)) ∈ (0, ∞)
and −∞ < ln[P (Yi = 1) / (1 − P (Yi = 1))] < ∞
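These range facts are easy to see numerically: logit maps (0, 1) onto the whole real line, with logit(0.5) = 0 and symmetric values on either side. A small sketch (Python, with arbitrarily chosen probabilities for illustration):

```python
import math

def logit(a):
    # ln(a / (1 - a)), defined for a in (0, 1)
    return math.log(a / (1 - a))

print(logit(0.5))   # 0.0 : even odds
print(logit(0.9))   # positive: probabilities above 0.5 give positive log-odds
print(logit(0.1))   # negative, the mirror image of logit(0.9)
```

As a approaches 1 the log-odds grow without bound, and as a approaches 0 they go to −∞.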
-
• The logistic regression model:
ln[P (Yi = 1) / (1 − P (Yi = 1))] = β0 + β1X1i + · · · + βkXki
– This response on the left isn’t ‘bounded’ by [0,1] even though the Y-values themselves are (having the response bounded by [0,1] was a problem before).
– The response on the left can feasibly be any positive or negative quantity.
– This is a nice characteristic because the right side of the equation can ‘potentially’ give any possible predicted value, −∞ to ∞.
• The logistic regression model is a GENERALIZED LINEAR MODEL. Linear model on the right, something other than the usual continuous Y on the left.
12
-
• Let’s look at the fitted logistic regression model for the lead level data with one X covariate.
ln[P (Yi = 1) / (1 − P (Yi = 1))] = β̂0 + β̂1Xi = −1.5160 + 0.0027 Xi
[Figure: fitted S-shaped curve over the jittered data; x-axis soil (0–5000), y-axis jitter(highbld, factor = 0.12).]
The curve represents the E(Yi|xi).
And...
E(Yi|xi) = 0 · P (Yi = 0|xi) + 1 · P (Yi = 1|xi) = P (Yi = 1|xi)
13
-
•We can manipulate the regression model to put it in terms of P (Yi = 1) = E(Yi)
• If we take this regression model
ln[P (Yi = 1) / (1 − P (Yi = 1))] = −1.5160 + 0.0027 Xi
and solve for P (Yi = 1) we get...
P (Yi = 1) = exp(−1.5160 + 0.0027 Xi) / [1 + exp(−1.5160 + 0.0027 Xi)]
– The value on the right is bounded to [0,1].
– Because our β1 is positive,
∗ As X gets larger, P (Yi = 1) goes to 1.
∗ As X gets smaller, P (Yi = 1) goes to 0.
– The fitted curve, i.e. the function of Xi on the right, is S-shaped (sigmoidal)
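A quick numeric sketch of this inverse transformation (Python, using the fitted coefficients from the slide) confirms the output is always a valid probability and increases with X:

```python
import math

b0, b1 = -1.5160, 0.0027  # fitted coefficients from the lead-level example

def p_hat(x):
    # inverse logit: exp(b0 + b1*x) / (1 + exp(b0 + b1*x)),
    # algebraically the same as 1 / (1 + exp(-(b0 + b1*x)))
    eta = b0 + b1 * x
    return math.exp(eta) / (1 + math.exp(eta))

for x in (0, 500, 1000, 3000, 5000):
    p = p_hat(x)
    assert 0 < p < 1          # bounded, unlike the straight-line fit earlier
    print(x, round(p, 4))
```

Unlike the ordinary linear fit (which gave Ŷ = 1.14 at x = 3000), these fitted values can never escape (0, 1).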
14
-
• In the general model with one covariate:
ln[P (Yi = 1) / (1 − P (Yi = 1))] = β0 + β1Xi
which means
P (Yi = 1) = exp(β0 + β1Xi) / [1 + exp(β0 + β1Xi)] = 1 / (1 + exp[−(β0 + β1Xi)])
• If β1 is positive, we get the S-shape that goes to 1 as Xi goes to ∞ and goes to 0 as Xi goes to −∞ (as in the lead example).
• If β1 is negative, we get the opposite S-shape that goes to 0 as Xi goes to ∞ and goes to 1 as Xi goes to −∞.
15
-
• Another way to think of logistic regression is that we’re modeling a Bernoulli random variable occurring for each Xi, and the Bernoulli parameter πi depends on the covariate value.
Then, Yi|Xi ∼ Bernoulli(πi)
where πi represents P (Yi = 1|Xi)
E(Yi|Xi) = πi and V (Yi|Xi) = πi(1 − πi)
We’re thinking in terms of the conditional distribution of Y |X (we don’t have constant variance across the X values; mean and variance are tied together).
Writing πi as a function of Xi,
πi = exp(β0 + β1Xi) / [1 + exp(β0 + β1Xi)]
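The mean–variance tie-in is concrete: for a Bernoulli(π) response the variance is π(1 − π), so as π changes with X, so does the variance. A small illustration (Python):

```python
# Bernoulli variance as a function of the success probability pi
def bern_var(pi):
    return pi * (1 - pi)

for pi in (0.1, 0.3, 0.5, 0.7, 0.9):
    print(pi, bern_var(pi))  # largest at pi = 0.5, shrinking toward 0 and 1
```

This is exactly why the constant-variance assumption of ordinary regression fails for a 0-1 response.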
16
-
Interpretation of parameters
For the lead levels example.
• Intercept β0:
When Xi = 0, there is no lead in the soil in the backyard. Then,
ln[P (Yi = 1) / (1 − P (Yi = 1))] = β0
So, β0 is the log-odds that a randomly selected child with no lead in their backyard has a high lead blood level.
Or, πi = e^β0 / (1 + e^β0) is the chance that a kid with no lead in their backyard has a high lead blood level.
For this data, β̂0 = −1.5160 or π̂i = 0.1800
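Plugging the fitted intercept into e^β0 / (1 + e^β0) reproduces the 0.1800 (a Python check, using the full-precision estimate from the R output shown later):

```python
import math

b0 = -1.5160568  # fitted intercept from the glm output
pi0 = math.exp(b0) / (1 + math.exp(b0))
print(round(pi0, 4))  # ≈ 0.18, the estimated P(high blood lead) when soil lead is 0
```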
17
-
Interpretation of parameters
For the lead levels example.
• Coefficient β1:
Consider the loge of the odds ratio (O.R.) of having high lead blood levels for the following 2 groups...
1) those with exposure level of x
2) those with exposure level of x + 1
loge{[π2/(1 − π2)] / [π1/(1 − π1)]} = loge[e^β0 e^(β1(x+1)) / (e^β0 e^(β1x))] = loge(e^β1) = β1
β1 is the loge of the O.R. for a 1 unit increase in x.
It compares the groups with x exposure and (x + 1) exposure.
18
-
Or, un-doing the log,
e^β1 is the O.R. comparing the two groups.
For this data, β̂1 = 0.0027 or e^0.0027 = 1.0027
A 1 unit increase in X increases the odds of having a high lead blood level by a factor of 1.0027.
Though this value is small, the range of the X values is large (40 to 5255), so it can have a substantial impact when you consider the full spectrum of x values.
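The “small coefficient, big range” point can be made numerically: per-unit odds ratios multiply, so across the full soil range the factor is e^(β1·Δx). A Python sketch:

```python
import math

b1 = 0.0027  # fitted slope from the lead-level example

# O.R. for a 1-unit increase in soil lead
print(round(math.exp(b1), 4))        # 1.0027

# O.R. comparing the highest (5255) to the lowest (40) observed soil value
x_lo, x_hi = 40, 5255
print(math.exp(b1 * (x_hi - x_lo)))  # a very large factor: tiny per-unit effects compound
```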
19
-
Prediction
•What is the predicted probability of high lead blood level for a child with X = 500?
P (Yi = 1) = exp(−1.5160 + 0.0027 Xi) / [1 + exp(−1.5160 + 0.0027 Xi)]
= 1 / (1 + exp[−(−1.5160 + 0.0027 Xi)])
= 1 / (1 + exp[1.5160 − 0.0027 × 500]) = 0.4586
•What is the predicted probability of high lead blood level for a child with X = 4000?
P (Yi = 1) = 1 / (1 + exp[1.5160 − 0.0027 × 4000]) = 0.9999
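Both predictions are easy to verify (Python, using the rounded coefficients from the slide):

```python
import math

def p_hat(x, b0=-1.5160, b1=0.0027):
    # fitted P(Y = 1 | soil = x) from the logistic model
    return 1 / (1 + math.exp(-(b0 + b1 * x)))

print(round(p_hat(500), 4))   # 0.4586
print(round(p_hat(4000), 4))  # 0.9999
```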
[Figure: fitted logistic curve over the jittered data; x-axis soil (0–5000), y-axis jitter(highbld, factor = 0.12).]
20
-
•What is the predicted probability of high lead blood level for a child with X = 0?
P (Yi = 1) = 1 / (1 + exp[1.5160]) = 0.1800
So, there’s still a chance of having high lead blood level even when the backyard doesn’t have any lead. This is why we don’t see the low-end of our fitted S-curve go to zero.
Testing Significance of Covariate
> glm.out=glm(highbld~soil,family=binomial(logit))
> summary(glm.out)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -1.5160568 0.3380483 -4.485 7.30e-06 ***
soil 0.0027202 0.0005385 5.051 4.39e-07 ***
---
Signif. codes: 0 *** 0.001 ** 0.01 * 0.05 . 0.1 1
Soil is a significant predictor in the logistic regression model.
21
-
A couple things...
•Method of Maximum Likelihood is used to fit the model.
Estimates β̂j are asymptotically normal, so R uses Z-tests (or Wald tests) for covariate significance in the output. [NOTE: squaring a z-statistic gets you a chi-squared statistic with 1 df.]
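For instance, the soil line in the output above is a Wald test: z = estimate/SE, and squaring it gives the equivalent 1-df chi-squared statistic (a Python check):

```python
est, se = 0.0027202, 0.0005385  # soil estimate and std. error from the glm output

z = est / se
print(round(z, 3))    # 5.051, matching the "z value" column
print(round(z**2, 2)) # the same test as a chi-squared statistic with 1 df
```

Since z² far exceeds 3.84 (the 0.05 critical value of a 1-df chi-squared), soil is significant either way you state the test.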
• General concepts easily extended to logistic regression with many covariates.
22