Economics Revision Guide II

  • Chapter 11 Regression with a Binary Dependent Variable

    So far the dependent variable Y has been continuous:

    traffic fatality rate

    cigarette consumption

    test scores

    What if Y is binary?

    whether a person gets into college, or not

    whether a person smokes, or not

    whether a mortgage application is denied or accepted

    1

  • Example: Mortgage Denial and Race, The Boston Fed HMDA Dataset

    Individual applications for single-family mortgages made in 1990 in the greater Boston area

    2,380 observations collected under the Home Mortgage Disclosure Act (HMDA)

    Variables

    Dependent variable:

    Is the mortgage denied or accepted?

    Independent variables:

    income, wealth, employment status

    other loan, property characteristics

    race of applicant

    2

  • Example: linear probability model, HMDA data

    [Figure: scatter plot of mortgage denial vs. the ratio of debt payments to income (P/I ratio), with the fitted linear regression line, for a subset of the data set (n = 127)]

    3

  • Section 11.1 Binary Dependent Variables and the Linear Probability Model

    The regression line plots the predicted value of deny as a linear function of P/I ratio

    For example, when P/I ratio = 0.3, the predicted value of deny is 0.2

    But what exactly does it mean for the predicted value of a binary variable to be 0.2?

    When Y is binary,

    E(Y | X) = 1 × Pr(Y = 1 | X) + 0 × Pr(Y = 0 | X)

    so

    E(Y | X) = Pr(Y = 1 | X)

    That is, when Y is binary, the predicted value Ŷ is the probability that Y = 1 given X = x:

    Ŷ = Pr(Y = 1 | X = x) = E(Y | X = x)

    4

    For the linear regression model, given the OLS assumption that E(u | X) = 0:

    Ŷ = Pr(Y = 1 | X = x) = E(Y | X) = E(β0 + β1X + u | X) = β0 + β1X

    This model is called the linear probability model

    It is simply the linear regression model with a binary dependent variable

    Back to our example: when P/I ratio = 0.3, the predicted probability of deny is 0.2:

    Pr(Deny = 1 | P/I ratio = 0.3) = β0 + β1 × 0.3 = 0.2

    In other words, if there were many applications with P/I ratio = 0.3, then 20% of them would

    be denied

    Note that β1 is the change in the predicted probability that Y = 1 for a unit increase in X

    5
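
    In Stata, the linear probability model is just OLS with heteroskedasticity-robust standard errors (the LPM error is heteroskedastic by construction). A minimal sketch, using the deny and p_irat variable names that appear in the HMDA examples later in these slides:

    . * LPM: OLS with robust SEs
    . regress deny p_irat, r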

  • Ex: full HMDA data set

    Deny-hat = −.080 + .604 P/I ratio   (SEs: .032, .098)

    Measuring the effect of increasing the P/I ratio by 1 doesn't make much sense

    Instead, what is the effect of increasing P/I ratio from .3 to .4?

    The predicted value for P/I ratio = .3 is

    Pr(Deny = 1 | P/I ratio = .3) = −.080 + .604 × .3 = 0.101

    The predicted value for P/I ratio = .4 is

    Pr(Deny = 1 | P/I ratio = .4) = −.080 + .604 × .4 = 0.162

    Thus, the effect of increasing the P/I ratio from .3 to .4 is to increase the probability of denial

    by 0.061, that is, by 6.1 percentage points

    More simply, we can calculate the effect as β1 × 0.1 = .604 × 0.1 = 0.0604

    6
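
    These numbers can be checked with Stata's display calculator (a sketch; the coefficients are simply typed in from the estimated equation above):

    . display -.080 + .604*.3     // .1012
    . display -.080 + .604*.4     // .1616
    . display .604*(.4 - .3)      // .0604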

  • Linear probability model: HMDA data, ctd.

    Next include black as a regressor:

    Deny-hat = −.091 + .559 P/I ratio + .177 black   (SEs: .029, .089, .025)

    What is the difference in the probability of denial for a black person versus a white person?

    For a black applicant with P/I ratio = .3:

    Pr(Deny = 1) = −.091 + .559 × .3 + .177 × 1 = .254

    For a white applicant with P/I ratio = .3:

    Pr(Deny = 1) = −.091 + .559 × .3 + .177 × 0 = .077

    The difference = 0.177 = 17.7 percentage points (the value of β2)

    7

  • The linear probability model, ctd.

    The linear probability model is easy to estimate and to interpret

    But the LPM says that the change in the predicted probability for a given change in X is the

    same for all values of X

    Is this reasonable?

    Further, the predicted probabilities of the LPM can be < 0 or > 1!

    To overcome these shortcomings, people use the nonlinear probability models probit and logit

    8

  • Section 11.2 Probit and Logit Regression

    The probit and logit models satisfy the following conditions:

    The effect of X on Pr(Y = 1 | X) is nonlinear

    0 ≤ Pr(Y = 1 | X) ≤ 1 for all X

    Pr(Y = 1 | X) is increasing in X (for β1 > 0)

    9

  • The probit regression models the probability that Y = 1 using the cumulative standard

    normal distribution function, Φ(z), evaluated at z = β0 + β1X:

    Pr(Y = 1 | X) = Φ(β0 + β1X)

    where Φ is the cumulative standard normal distribution function and z = β0 + β1X is the z-value

    Ex. Suppose β0 = −2, β1 = 3, X = .4:

    Pr(Y = 1 | X = .4) = Φ(−2 + 3 × .4) = Φ(−0.8) = 0.2119

    10
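
    Φ is built into Stata as the normal() function, so this calculation can be checked directly:

    . display normal(-2 + 3*.4)      // .2119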

  • STATA Example: HMDA data

    . probit deny p_irat, r;

    Iteration 0:  log likelihood =  -872.0853      <- we'll discuss this later
    Iteration 1:  log likelihood =  -835.6633
    Iteration 2:  log likelihood = -831.80534
    Iteration 3:  log likelihood = -831.79234

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(1)   =   40.68
                                                    Prob > chi2    =  0.0000
    Log likelihood = -831.79234                     Pseudo R2      =  0.0462

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
           _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
    ------------------------------------------------------------------------------

    Pr(Deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 P/I ratio)   (SEs: .16, .47)

    11

  • STATA Example: HMDA data, ctd.

    Pr(Deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 P/I ratio)   (SEs: .16, .47)

    Positive coefficient: does this make sense?

    Standard errors have the usual interpretation

    Predicted probabilities:

    Pr(Deny = 1 | P/I ratio = .3) = Φ(−2.19 + 2.97 × .3) = Φ(−1.30) = .097

    Pr(Deny = 1 | P/I ratio = .4) = Φ(−2.19 + 2.97 × .4) = Φ(−1.00) = .159

    The effect of increasing the P/I ratio from 0.3 to 0.4 on the probability of denial is .159 − .097 = 0.062 (≠ β1 × 0.1!)

    12
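
    The same effect can be computed from the stored, unrounded estimates; a sketch using Stata's _b[] coefficient syntax (explained on a later slide), omitting the ; line delimiter used in the logs above:

    . quietly probit deny p_irat, r
    . display normal(_b[_cons] + _b[p_irat]*.4) - normal(_b[_cons] + _b[p_irat]*.3)   // about .06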

  • STATA Example: HMDA data, multiple regressors

    . probit deny p_irat black, r;

    Iteration 0:  log likelihood =  -872.0853
    Iteration 1:  log likelihood = -800.88504
    Iteration 2:  log likelihood =  -797.1478
    Iteration 3:  log likelihood = -797.13604

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(2)   =  118.18
                                                    Prob > chi2    =  0.0000
    Log likelihood = -797.13604                     Pseudo R2      =  0.0859

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
           black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
           _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
    ------------------------------------------------------------------------------

    We'll go through the estimation details later

    13

  • STATA Example, ctd.: Predicted probit probabilities

    . probit deny p_irat black, r;

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(2)   =  118.18
                                                    Prob > chi2    =  0.0000
    Log likelihood = -797.13604                     Pseudo R2      =  0.0859

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
           black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
           _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
    ------------------------------------------------------------------------------

    . sca z1 = _b[_cons] + _b[p_irat]*.3 + _b[black]*0;
    . display "Pred prob, p_irat=.3, white: " normprob(z1);
    Pred prob, p_irat=.3, white: .07546603

    NOTE:
    _b[_cons] is the estimated intercept (-2.258738)
    _b[p_irat] is the coefficient on p_irat (2.741637)
    sca creates a new scalar which is the result of a calculation
    display prints the indicated information to the screen

    14

  • STATA Example, ctd.

    Pr(Deny = 1 | P/I ratio, black) = Φ(−2.26 + 2.74 P/I ratio + .71 black)   (SEs: .16, .44, .08)

    Is the coefficient on black statistically significant?

    Predicted probabilities:

    Pr(Deny = 1 | P/I ratio = .3, black = 1) = Φ(−2.26 + 2.74 × .3 + .71 × 1) = .233

    Pr(Deny = 1 | P/I ratio = .3, black = 0) = Φ(−2.26 + 2.74 × .3 + .71 × 0) = .075

    Difference in rejection probabilities is 0.158 (15.8 percentage points)

    15
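
    A sketch of the same comparison using the sca and _b[] tools from the previous slides, which reproduces the difference in rejection probabilities without rounding the coefficients:

    . sca zb = _b[_cons] + _b[p_irat]*.3 + _b[black]*1
    . sca zw = _b[_cons] + _b[p_irat]*.3 + _b[black]*0
    . display normprob(zb) - normprob(zw)             // about .158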

  • Logit Regression

    Logit regression models the probability of Y = 1 given X using the logistic cumulative distribution

    function, F, evaluated at z = β0 + β1X:

    Pr(Y = 1 | X) = F(β0 + β1X)

    The logistic distribution function is:

    F(β0 + β1X) = 1 / (1 + e^(−(β0 + β1X)))

    Ex. β0 = −3, β1 = 2, X = .4

    Pr(Y = 1 | X = .4) = 1 / (1 + e^(2.2)) = .0998

    16
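
    Stata's invlogit() function evaluates the logistic CDF, so this example can also be checked directly:

    . display invlogit(-3 + 2*.4)    // = 1/(1 + exp(2.2)) = .0998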

  • Why bother with logit if we have probit?

    The main reason is historical: logit is computationally faster and easier, but this doesn't matter

    so much nowadays

    In practice, logit and probit are very similar - since empirical results typically don't hinge on

    the logit/probit choice, both tend to be used in practice

    In more complicated situations, though, extensions of the logit model work better than

    extensions of the probit model

    17

  • The predicted probabilities from the probit and logit models are very close in these HMDA

    regressions (as is usual)

    [Figure: predicted probabilities of denial from the estimated probit and logit models]

    18

  • STATA Example: HMDA data

    . logit deny p_irat black, r;

    Iteration 0:  log likelihood =  -872.0853      <- more on this later
    Iteration 1:  log likelihood =  -806.3571
    Iteration 2:  log likelihood = -795.74477
    Iteration 3:  log likelihood = -795.69521
    Iteration 4:  log likelihood = -795.69521

    Logit estimates                                 Number of obs  =    2380
                                                    Wald chi2(2)   =  117.75
                                                    Prob > chi2    =  0.0000
    Log likelihood = -795.69521                     Pseudo R2      =  0.0876

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
           black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
           _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
    ------------------------------------------------------------------------------

    . dis "Pred prob, p_irat=.3, white: "
    >     1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
    Pred prob, p_irat=.3, white: .07485143

    NOTE: the probit predicted probability is .07546603

    The predicted probability from the probit model was 0.075

    19

  • Section 11.3 Estimation and Inference in the Logit and Probit Models

    Probit estimation by nonlinear least squares

    Nonlinear least squares extends the idea of OLS to models in which the parameters enter

    nonlinearly:

    min over (b0, b1) of  Σ_{i=1}^n [Yi − Φ(b0 + b1Xi)]²

    How can we solve this minimization problem?

    There is no explicit solution - we can't write the estimators as a function of the sample data

    The estimators are found by solving the problem numerically on a computer (using specialized

    minimization algorithms)

    The estimators are consistent and asymptotically normally distributed

    In practice, nonlinear least squares isn't used since a more efficient estimator (smaller variance)

    exists

    20
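
    For illustration only (as noted above, NLS is not how probit is estimated in practice), Stata's nl command can carry out this numerical minimization; {b0} and {b1} mark the parameters to be estimated:

    . * nonlinear least squares probit (illustration only)
    . nl (deny = normal({b0} + {b1}*p_irat))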

  • The Maximum Likelihood Estimator

    The likelihood function is the conditional density of Y1, . . . , Yn given X1, . . . , Xn, treated as a

    function of the unknown parameters (β0 and β1 in the probit model)

    The maximum likelihood estimator (MLE) of the probit model is the value of (β0, β1) that

    maximizes the likelihood function

    That is, the MLE is the value of (β0, β1) that best describes the distribution of the sample data

    In large samples, the MLE is:

    consistent

    normally distributed

    efficient (has the smallest variance of all estimators)

    Inference is as usual: hypothesis testing via the t-statistic, confidence intervals as estimate ± 1.96 SE

    21

  • MLE for a binary dependent variable (no X)

    Y = 1 with probability p, and Y = 0 with probability 1 − p

    That is, Y has a Bernoulli distribution. The goal is to estimate the unknown parameter p.

    Data: Y1, . . . , Yn, i.i.d.

    Let's start by deriving the density function of Y1:

    Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p

    so

    Pr(Y1 = y1) = p^(y1) (1 − p)^(1−y1)

    22

  • Now let's find the joint density of (Y1, Y2). Because Y1 and Y2 are independent:

    Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) Pr(Y2 = y2)

    = [p^(y1) (1 − p)^(1−y1)] × [p^(y2) (1 − p)^(1−y2)]

    = p^(y1+y2) (1 − p)^(2−y1−y2)

    Generally, the joint density of (Y1, Y2, . . . , Yn) is:

    Pr(Y1 = y1, Y2 = y2, . . . , Yn = yn) = Pr(Y1 = y1) Pr(Y2 = y2) · · · Pr(Yn = yn)

    = [p^(y1) (1 − p)^(1−y1)] × · · · × [p^(yn) (1 − p)^(1−yn)]

    = p^(Σ_{i=1}^n yi) (1 − p)^(n − Σ_{i=1}^n yi)

    23

    The likelihood function is the joint density, treated as a function of the unknown parameter,

    which here is p:

    f(p; Y1, Y2, . . . , Yn) = p^(Σ_{i=1}^n yi) (1 − p)^(n − Σ_{i=1}^n yi)

    The MLE maximizes this likelihood function.

    In practice, it's easier to work with the logarithm of the likelihood, ln f(p; Y1, Y2, . . . , Yn):

    ln f(p; Y1, Y2, . . . , Yn) = (Σ_{i=1}^n yi) ln(p) + (n − Σ_{i=1}^n yi) ln(1 − p)

    Maximize the likelihood function by setting the derivative with respect to p equal to 0:

    d ln f(p; Y1, Y2, . . . , Yn) / dp = (1/p) Σ_{i=1}^n yi − (1/(1 − p)) (n − Σ_{i=1}^n yi) = 0

    Solving for p yields the MLE, p̂_MLE

    24

  • (1/p̂_MLE) Σ_{i=1}^n yi − (1/(1 − p̂_MLE)) (n − Σ_{i=1}^n yi) = 0

    or, dividing through by n and rearranging,

    Ȳ / (1 − Ȳ) = p̂_MLE / (1 − p̂_MLE)

    So

    p̂_MLE = Ȳ = the fraction of observations with Y = 1

    Whew...a lot of work to get back to the first thing you might think of using...but the nice thing

    is that this whole approach generalizes to more complicated models.

    Now we apply MLE to probit

    25
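
    A quick Stata check of this result, as a sketch: a probit with no regressors estimates only an intercept, and Φ(β̂0) equals the sample fraction Ȳ, so the two display commands below print the same number:

    . quietly probit deny
    . display normal(_b[_cons])      // Phi(b0-hat)
    . quietly summarize deny
    . display r(mean)                // Ybar = fraction of denials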

  • The Probit and Logit MLE

    The derivation starts with the density of Y1 given X1:

    Pr(Y1 = 1 | X1) = Φ(β0 + β1X1) and Pr(Y1 = 0 | X1) = 1 − Φ(β0 + β1X1)

    so

    Pr(Y1 = y1 | X1) = Φ(β0 + β1X1)^(y1) [1 − Φ(β0 + β1X1)]^(1−y1)

    The probit likelihood function is the joint density of Y1, . . . , Yn given X1, . . . , Xn:

    f(β0, β1; Y1, . . . , Yn | X1, . . . , Xn) =

    Φ(β0 + β1X1)^(y1) [1 − Φ(β0 + β1X1)]^(1−y1) × · · · × Φ(β0 + β1Xn)^(yn) [1 − Φ(β0 + β1Xn)]^(1−yn)

    β̂0_MLE and β̂1_MLE maximize this likelihood function

    But we can't solve for the estimators explicitly...the likelihood must be maximized using numerical methods

    To find the logit MLE, simply take the probit likelihood function and replace Φ with the logistic CDF F

    26
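
    For the curious, a minimal sketch of coding this maximization by hand with Stata's ml command; the built-in probit command does exactly this, with better numerics. (Note that 1 − Φ(z) = Φ(−z).)

    . program define myprobit
          args lnf xb
          quietly replace `lnf' = ln(normal(`xb'))  if $ML_y1 == 1
          quietly replace `lnf' = ln(normal(-`xb')) if $ML_y1 == 0
      end
    . ml model lf myprobit (deny = p_irat black)
    . ml maximize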

  • Measures of Fit for Logit and Probit

    R² doesn't work well in binary dependent variable models, as it tells us very little about how

    well the model explains behavior

    Reason: Yi can take on only 0 or 1, but Ŷi is continuous, so Ŷi is likely very different from Yi

    Two other measures that are used:

    1. The fraction correctly predicted equals the fraction of Yis for which the predicted

    probability is > 50% when Yi = 1 or is < 50% when Yi = 0

    2. The pseudo-R² measures the improvement in the value of the log likelihood relative to

    the Bernoulli log likelihood (i.e., no X's):

    pseudo-R² = 1 − ln(f_probit^max) / ln(f_Bernoulli^max),

    where ln(f_probit^max) is the maximized value of the probit log likelihood and ln(f_Bernoulli^max)

    is the maximized value of the Bernoulli log likelihood

    27
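
    Both pieces are stored after estimation in Stata (e(ll) is the maximized log likelihood, e(ll_0) the constant-only, i.e. Bernoulli, log likelihood), so the pseudo-R² in the probit output above can be reproduced by hand:

    . quietly probit deny p_irat black, r
    . display 1 - e(ll)/e(ll_0)      // matches the reported Pseudo R2 = 0.0859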

  • Ex. fraction correctly predicted

    obs    Yi    p̂i     correctly predicted?
      1     0    0.40    yes
      2     1    0.72    yes
      3     0    0.55    no
      4     1    0.44    no
      5     1    0.55    yes

    number correctly predicted: 3
    number of observations: 5
    fraction correctly predicted: 3/5 = 0.6

    28
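
    A sketch of the same calculation for the full sample, using predict to obtain fitted probabilities (phat and correct are illustrative variable names); the mean of correct is the fraction correctly predicted:

    . quietly probit deny p_irat black, r
    . predict phat
    . generate correct = (phat > .5 & deny == 1) | (phat < .5 & deny == 0)
    . summarize correct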

  • pseudo-R²

    It's a bit hard to see how the pseudo-R² works, so let's rewrite the formula in a slightly different

    way

    Note that ln(f_probit^max) < 0 and ln(f_Bernoulli^max) < 0

    Thus, we can rewrite the pseudo-R² as

    pseudo-R² = 1 − ln(f_probit^max) / ln(f_Bernoulli^max) = 1 − |ln(f_probit^max)| / |ln(f_Bernoulli^max)|

    The better the fit of the probit model, the smaller |ln(f_probit^max)| is relative to |ln(f_Bernoulli^max)|, and the closer the pseudo-R² is to 1

    29

  • Section 11.4 Application to the Boston HMDA Data

    Mortgages (home loans) are an essential part of buying a home

    Question: is it harder for a black person to get a loan than a white person?

    Specifically: if two otherwise identical individuals, one white and one black, applied for a home

    loan, is there a difference in the probability of denial?

    The mortgage application process in the US circa 1990-1991:

    Go to a bank or mortgage company

    Fill out an application (personal and financial info)

    Meet with the loan officer

    Then the loan officer decides - by law, without considering race. Presumably, the bank wants

    to make profitable loans, and the loan officer doesn't want to originate loans that default

    30

  • The Loan Officer's Decision

    The loan officer uses key financial variables:

    P/I ratio

    housing expense-to-income ratio

    loan-to-value ratio

    personal credit history

    The decision rule is nonlinear:

    loan-to-value ratio > 80%

    loan-to-value ratio > 95% (what happens in default?)

    credit score

    31

  • Regression Specifications

    Pr(deny = 1 | black, other X's) = . . .

    linear probability model

    probit

    logit

    Main problem with the regressions so far: potential omitted variable bias. The following

    variables (i) enter the loan officer's decision and (ii) are correlated with race:

    wealth, type of employment

    credit history

    family status

    Fortunately, the HMDA data set is very rich, containing data on individual characteristics,

    property characteristics, and loan denial/acceptance

    32

  • [Table 11.2: regression models of mortgage denial using the Boston HMDA data]

    33

  • [Table 11.2, ctd.]

    34

  • [Table 11.2, ctd.]

    35

  • [Table 11.2, ctd.]

    36

  • [Table 11.2, ctd.]

    37

  • Summary of Empirical Results

    Coefficients on the financial variables make sense

    Black is statistically significant in all specifications

    Race-financial variable interactions aren't significant

    Including the covariates sharply reduces the effect of race on denial probability

    LPM, probit, logit: similar estimates of effect of race on the probability of denial

    Estimated effects are large in a real-world sense

    38