Economics Revision Guide II

  • Chapter 11 Regression with a Binary Dependent Variable

    So far the dependent variable Y has been continuous:

    traffic fatality rate

    cigarette consumption

    test scores

    What if Y is binary?

    whether a person gets into college, or not

    whether a person smokes, or not

    whether a mortgage application is denied or accepted

    1

  • Example: Mortgage Denial and Race, The Boston Fed HMDA Dataset

    Individual applications for single-family mortgages made in 1990 in the greater Boston area

    2,380 observations collected under the Home Mortgage Disclosure Act (HMDA)

    Variables

    Dependent variable:

    Is the mortgage denied or accepted?

    Independent variables:

    income, wealth, employment status

    other loan, property characteristics

    race of applicant

    2

  • Example: linear probability model, HMDA data

    [Figure: scatter plot of mortgage denial vs. the ratio of debt payments to income (P/I ratio), with the fitted linear regression line, for a subset of the data set (n = 127)]

    3

  • Section 11.1 Binary Dependent Variables and the Linear Probability Model

    The regression line plots the predicted value of deny as a linear function of P/I ratio

    For example, when P/I ratio = 0.3, the predicted value of deny is 0.2

    But what exactly does it mean for the predicted value of a binary variable to be 0.2?

    When Y is binary,

    E(Y | X) = 1 × Pr(Y = 1 | X) + 0 × Pr(Y = 0 | X)

    so

    E(Y | X) = Pr(Y = 1 | X)

    That is, when Y is binary, the predicted value Ŷ is the probability that Y = 1 given X = x:

    Ŷ = Pr(Y = 1 | X = x) = E(Y | X = x)

    4

    For the linear regression model, given the OLS assumption that E(u | X) = 0:

    Ŷ = Pr(Y = 1 | X = x) = E(Y | X) = E(β0 + β1X + u | X) = β0 + β1X

    This model is called the linear probability model

    It is simply the linear regression model with a binary dependent variable

    Back to our example: when P/I ratio = 0.3, the predicted probability of deny is 0.2:

    Pr(Deny = 1 | P/I ratio = 0.3) = β0 + β1 × 0.3 = 0.2

    In other words, if there were many applications with P/I ratio = 0.3, then 20% of them would

    be denied

    Note that β1 is the change in the predicted probability that Y = 1 for a unit increase in X

    5
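
    In Stata, the linear probability model is just OLS with heteroskedasticity-robust standard errors (the LPM error is heteroskedastic by construction). A minimal sketch, using the deny and p_irat variable names that appear in the HMDA examples later in these slides:

    . * LPM: OLS with robust SEs
    . regress deny p_irat, r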

  • Ex: full HMDA data set

    Deny-hat = −.080 + .604 P/I ratio   (SEs: .032, .098)

    Measuring the effect of increasing the P/I ratio by 1 doesn't make much sense

    Instead, what is the effect of increasing P/I ratio from .3 to .4?

    The predicted value for P/I ratio = .3 is

    Pr(Deny = 1 | P/I ratio = .3) = −.080 + .604 × .3 = 0.101

    The predicted value for P/I ratio = .4 is

    Pr(Deny = 1 | P/I ratio = .4) = −.080 + .604 × .4 = 0.162

    Thus, the effect of increasing the P/I ratio from .3 to .4 is to increase the probability of denial

    by 0.061, that is, by 6.1 percentage points

    More simply, we can calculate the effect as β1 × 0.1 = .604 × 0.1 = 0.0604

    6
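
    These numbers can be checked with Stata's display calculator (a sketch; the coefficients are simply typed in from the estimated equation above):

    . display -.080 + .604*.3     // .1012
    . display -.080 + .604*.4     // .1616
    . display .604*(.4 - .3)      // .0604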

  • Linear probability model: HMDA data, ctd.

    Next include black as a regressor:

    Deny-hat = −.091 + .559 P/I ratio + .177 black   (SEs: .029, .089, .025)

    What is the difference in the probability of denial for a black person versus a white person?

    For a black applicant with P/I ratio = .3:

    Pr(Deny = 1) = −.091 + .559 × .3 + .177 × 1 = .254

    For a white applicant with P/I ratio = .3:

    Pr(Deny = 1) = −.091 + .559 × .3 + .177 × 0 = .077

    The difference = 0.177 = 17.7 percentage points (the value of β2)

    7

  • The linear probability model, ctd.

    The linear probability model is easy to estimate and to interpret

    But the LPM says that the change in the predicted probability for a given change in X is the

    same for all values of X

    Is this reasonable?

    Further, the predicted probabilities of the LPM can be < 0 or > 1!

    To overcome these shortcomings, people use the nonlinear probability models probit and logit

    8

  • Section 11.2 Probit and Logit Regression

    The probit and logit models satisfy the following conditions:

    The effect of X on Pr(Y = 1 | X) is nonlinear

    0 ≤ Pr(Y = 1 | X) ≤ 1 for all X

    Pr(Y = 1 | X) is increasing in X (for β1 > 0)

    9

  • The probit regression models the probability that Y = 1 using the cumulative standard

    normal distribution function, Φ(z), evaluated at z = β0 + β1X:

    Pr(Y = 1 | X) = Φ(β0 + β1X)

    where Φ is the cumulative standard normal distribution function and z = β0 + β1X is the z-value

    Ex. Suppose β0 = −2, β1 = 3, X = .4:

    Pr(Y = 1 | X = .4) = Φ(−2 + 3 × .4) = Φ(−0.8) = 0.2119

    10
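
    Φ is built into Stata as the normal() function, so this calculation can be checked directly:

    . display normal(-2 + 3*.4)      // .2119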

  • STATA Example: HMDA data

    . probit deny p_irat, r;

    Iteration 0:  log likelihood =  -872.0853      <- we'll discuss this later
    Iteration 1:  log likelihood =  -835.6633
    Iteration 2:  log likelihood = -831.80534
    Iteration 3:  log likelihood = -831.79234

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(1)   =   40.68
                                                    Prob > chi2    =  0.0000
    Log likelihood = -831.79234                     Pseudo R2      =  0.0462

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.967908   .4653114     6.38   0.000     2.055914    3.879901
           _cons |  -2.194159   .1649721   -13.30   0.000    -2.517499    -1.87082
    ------------------------------------------------------------------------------

    Pr(Deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 P/I ratio)   (SEs: .16, .47)

    11

  • STATA Example: HMDA data, ctd.

    Pr(Deny = 1 | P/I ratio) = Φ(−2.19 + 2.97 P/I ratio)   (SEs: .16, .47)

    Positive coefficient: does this make sense?

    Standard errors have the usual interpretation

    Predicted probabilities:

    Pr(Deny = 1 | P/I ratio = .3) = Φ(−2.19 + 2.97 × .3) = Φ(−1.30) = .097

    Pr(Deny = 1 | P/I ratio = .4) = Φ(−2.19 + 2.97 × .4) = Φ(−1.00) = .159

    The effect of increasing the P/I ratio from 0.3 to 0.4 on the probability of denial is .159 − .097 = 0.062 (≠ β1 × 0.1!)

    12
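
    The same effect can be computed from the stored, unrounded estimates; a sketch using Stata's _b[] coefficient syntax (explained on a later slide), omitting the ; line delimiter used in the logs above:

    . quietly probit deny p_irat, r
    . display normal(_b[_cons] + _b[p_irat]*.4) - normal(_b[_cons] + _b[p_irat]*.3)   // about .06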

  • STATA Example: HMDA data, multiple regressors

    . probit deny p_irat black, r;

    Iteration 0:  log likelihood =  -872.0853
    Iteration 1:  log likelihood = -800.88504
    Iteration 2:  log likelihood =  -797.1478
    Iteration 3:  log likelihood = -797.13604

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(2)   =  118.18
                                                    Prob > chi2    =  0.0000
    Log likelihood = -797.13604                     Pseudo R2      =  0.0859

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
           black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
           _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
    ------------------------------------------------------------------------------

    We'll go through the estimation details later

    13

  • STATA Example, ctd.: Predicted probit probabilities

    . probit deny p_irat black, r;

    Probit estimates                                Number of obs  =    2380
                                                    Wald chi2(2)   =  118.18
                                                    Prob > chi2    =  0.0000
    Log likelihood = -797.13604                     Pseudo R2      =  0.0859

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   2.741637   .4441633     6.17   0.000     1.871092    3.612181
           black |   .7081579   .0831877     8.51   0.000      .545113    .8712028
           _cons |  -2.258738   .1588168   -14.22   0.000    -2.570013   -1.947463
    ------------------------------------------------------------------------------

    . sca z1 = _b[_cons] + _b[p_irat]*.3 + _b[black]*0;
    . display "Pred prob, p_irat=.3, white: " normprob(z1);
    Pred prob, p_irat=.3, white: .07546603

    NOTE:
    _b[_cons] is the estimated intercept (-2.258738)
    _b[p_irat] is the coefficient on p_irat (2.741637)
    sca creates a new scalar which is the result of a calculation
    display prints the indicated information to the screen

    14

  • STATA Example, ctd.

    Pr(Deny = 1 | P/I ratio, black) = Φ(−2.26 + 2.74 P/I ratio + .71 black)   (SEs: .16, .44, .08)

    Is the coefficient on black statistically significant?

    Predicted probabilities:

    Pr(Deny = 1 | P/I ratio = .3, black = 1) = Φ(−2.26 + 2.74 × .3 + .71 × 1) = .233

    Pr(Deny = 1 | P/I ratio = .3, black = 0) = Φ(−2.26 + 2.74 × .3 + .71 × 0) = .075

    Difference in rejection probabilities is 0.158 (15.8 percentage points)

    15
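
    A sketch of the same comparison using the sca and _b[] tools from the previous slides, which reproduces the difference in rejection probabilities without rounding the coefficients:

    . sca zb = _b[_cons] + _b[p_irat]*.3 + _b[black]*1
    . sca zw = _b[_cons] + _b[p_irat]*.3 + _b[black]*0
    . display normprob(zb) - normprob(zw)             // about .158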

  • Logit Regression

    Logit regression models the probability of Y = 1 given X using the logistic cumulative distribution

    function, F, evaluated at z = β0 + β1X:

    Pr(Y = 1 | X) = F(β0 + β1X)

    The logistic distribution function is:

    F(β0 + β1X) = 1 / (1 + e^(−(β0 + β1X)))

    Ex. β0 = −3, β1 = 2, X = .4

    Pr(Y = 1 | X = .4) = 1 / (1 + e^(2.2)) = .0998

    16
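
    Stata's invlogit() function evaluates the logistic CDF, so this example can also be checked directly:

    . display invlogit(-3 + 2*.4)    // = 1/(1 + exp(2.2)) = .0998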

  • Why bother with logit if we have probit?

    The main reason is historical: logit is computationally faster and easier, but this doesn't matter

    so much nowadays

    In practice, logit and probit are very similar - since empirical results typically don't hinge on

    the logit/probit choice, both tend to be used in practice

    In more complicated situations, though, extensions of the logit model work better than

    extensions of the probit model

    17

  • The predicted probabilities from the probit and logit models are very close in these HMDA

    regressions (as is usual)

    [Figure: predicted probabilities of denial from the estimated probit and logit models]

    18

  • STATA Example: HMDA data

    . logit deny p_irat black, r;

    Iteration 0:  log likelihood =  -872.0853      <- more on this later
    Iteration 1:  log likelihood =  -806.3571
    Iteration 2:  log likelihood = -795.74477
    Iteration 3:  log likelihood = -795.69521
    Iteration 4:  log likelihood = -795.69521

    Logit estimates                                 Number of obs  =    2380
                                                    Wald chi2(2)   =  117.75
                                                    Prob > chi2    =  0.0000
    Log likelihood = -795.69521                     Pseudo R2      =  0.0876

    ------------------------------------------------------------------------------
                 |               Robust
            deny |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
    -------------+----------------------------------------------------------------
          p_irat |   5.370362   .9633435     5.57   0.000     3.482244    7.258481
           black |   1.272782   .1460986     8.71   0.000     .9864339     1.55913
           _cons |  -4.125558    .345825   -11.93   0.000    -4.803362   -3.447753
    ------------------------------------------------------------------------------

    . dis "Pred prob, p_irat=.3, white: "
    >     1/(1+exp(-(_b[_cons]+_b[p_irat]*.3+_b[black]*0)));
    Pred prob, p_irat=.3, white: .07485143

    NOTE: the probit predicted probability is .07546603

    The predicted probability from the probit model was 0.075

    19

  • Section 11.3 Estimation and Inference in the Logit and Probit Models

    Probit estimation by nonlinear least squares

    Nonlinear least squares extends the idea of OLS to models in which the parameters enter

    nonlinearly:

    min over (b0, b1) of  Σ_{i=1}^n [Yi − Φ(b0 + b1Xi)]²

    How can we solve this minimization problem?

    There is no explicit solution - we can't write the estimators as a function of the sample data

    The estimators are found by solving the problem numerically on a computer (using specialized

    minimization algorithms)

    The estimators are consistent and asymptotically normally distributed

    In practice, nonlinear least squares isn't used since a more efficient estimator (smaller variance)

    exists

    20
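
    For illustration only (as noted above, NLS is not how probit is estimated in practice), Stata's nl command can carry out this numerical minimization; {b0} and {b1} mark the parameters to be estimated:

    . * nonlinear least squares probit (illustration only)
    . nl (deny = normal({b0} + {b1}*p_irat))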

  • The Maximum Likelihood Estimator

    The likelihood function is the conditional density of Y1, . . . , Yn given X1, . . . , Xn, treated as a

    function of the unknown parameters (β0 and β1 in the probit model)

    The maximum likelihood estimator (MLE) of the probit model is the value of (β0, β1) that

    maximizes the likelihood function

    That is, the MLE is the value of (β0, β1) that best describes the distribution of the sample data

    In large samples, the MLE is:

    consistent

    normally distributed

    efficient (has the smallest variance of all estimators)

    Inference is as usual: hypothesis testing via the t-statistic, confidence intervals as estimate ± 1.96 SE

    21

  • MLE for a binary dependent variable (no X)

    Y = 1 with probability p, and Y = 0 with probability 1 − p

    That is, Y has a Bernoulli distribution. The goal is to estimate the unknown parameter p.

    Data: Y1, . . . , Yn, i.i.d.

    Let's start by deriving the density function of Y1:

    Pr(Y1 = 1) = p and Pr(Y1 = 0) = 1 − p

    so

    Pr(Y1 = y1) = p^(y1) (1 − p)^(1−y1)

    22

  • Now let's find the joint density of (Y1, Y2). Because Y1 and Y2 are independent:

    Pr(Y1 = y1, Y2 = y2) = Pr(Y1 = y1) Pr(Y2 = y2)

    = [p^(y1) (1 − p)^(1−y1)] × [p^(y2) (1 − p)^(1−y2)]

    = p^(y1+y2) (1 − p)^(2−y1−y2)

    Generally, the joint density of (Y1, Y2, . . . , Yn) is:

    Pr(Y1 = y1, Y2 = y2, . . . , Yn = yn) = Pr(Y1 = y1) Pr(Y2 = y2) · · · Pr(Yn = yn)

    = [p^(y1) (1 − p)^(1−y1)] × · · · × [p^(yn) (1 − p)^(1−yn)]

    = p^(Σ_{i=1}^n yi) (1 − p)^(n − Σ_{i=1}^n yi)

    23

    The likelihood function is the joint density, treated as a function of the unknown parameter,

    which here is p:

    f(p; Y1, Y2, . . . , Yn) = p^(Σ_{i=1}^n yi) (1 − p)^(n − Σ_{i=1}^n yi)

    The MLE maximizes this likelihood function.

    In practice, it's easier to work with the logarithm of the likelihood, ln f(p; Y1, Y2, . . . , Yn):

    ln f(p; Y1, Y2, . . . , Yn) = (Σ_{i=1}^n yi) ln(p) + (n − Σ_{i=1}^n yi) ln(1 − p)

    Maximize the likelihood function by setting the derivative with respect to p equal to 0:

    d ln f(p; Y1, Y2, . . . , Yn) / dp = (1/p) Σ_{i=1}^n yi − (1/(1 − p)) (n − Σ_{i=1}^n yi) = 0

    Solving for p yields the MLE, p̂_MLE

    24

  • (1/p̂_MLE) Σ_{i=1}^n yi − (1/(1 − p̂_MLE)) (n − Σ_{i=1}^n yi) = 0

    or, dividing through by n and rearranging,

    Ȳ / (1 − Ȳ) = p̂_MLE / (1 − p̂_MLE)

    So

    p̂_MLE = Ȳ = the fraction of observations with Y = 1

    Whew...a lot of work to get back to the first thing you might think of using...but the nice thing

    is that this whole approach generalizes to more complicated models.

    Now we apply MLE to probit

    25
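
    A quick Stata check of this result, as a sketch: a probit with no regressors estimates only an intercept, and Φ(β̂0) equals the sample fraction Ȳ, so the two display commands below print the same number:

    . quietly probit deny
    . display normal(_b[_cons])      // Phi(b0-hat)
    . quietly summarize deny
    . display r(mean)                // Ybar = fraction of denials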

  • The Probit and Logit MLE

    The derivation starts with the density of Y1 given X1:

    Pr(Y1 = 1 | X1) = Φ(β0 + β1X1) and Pr(Y1 = 0 | X1) = 1 − Φ(β0 + β1X1)

    so

    Pr(Y1 = y1 | X1) = Φ(β0 + β1X1)^(y1) [1 − Φ(β0 + β1X1)]^(1−y1)

    The probit likelihood function is the joint density of Y1, . . . , Yn given X1, . . . , Xn:

    f(β0, β1; Y1, . . . , Yn | X1, . . . , Xn) =

    Φ(β0 + β1X1)^(y1) [1 − Φ(β0 + β1X1)]^(1−y1) × · · · × Φ(β0 + β1Xn)^(yn) [1 − Φ(β0 + β1Xn)]^(1−yn)

    β̂0_MLE and β̂1_MLE maximize this likelihood function

    But we can't solve for the estimators explicitly...the likelihood must be maximized using numerical methods

    To find the logit MLE, simply take the probit likelihood function and replace Φ with the logistic CDF F

    26
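
    For the curious, a minimal sketch of coding this maximization by hand with Stata's ml command; the built-in probit command does exactly this, with better numerics. (Note that 1 − Φ(z) = Φ(−z).)

    . program define myprobit
          args lnf xb
          quietly replace `lnf' = ln(normal(`xb'))  if $ML_y1 == 1
          quietly replace `lnf' = ln(normal(-`xb')) if $ML_y1 == 0
      end
    . ml model lf myprobit (deny = p_irat black)
    . ml maximize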

  • Measures of Fit for Logit and Probit

    R² doesn't work well in binary dependent variable models, as it tells us very little about how

    well the model explains behavior

    Reason: Yi can take on only 0 or 1, but Ŷi is continuous, so Ŷi is likely very different from Yi

    Two other measures that are used:

    1. The fraction correctly predicted equals the fraction of Yis for which the predicted

    probability is > 50% when Yi = 1 or is < 50% when Yi = 0

    2. The pseudo-R² measures the improvement in the value of the log likelihood relative to

    the Bernoulli log likelihood (i.e., no X's):

    pseudo-R² = 1 − ln(f_probit^max) / ln(f_Bernoulli^max),

    where ln(f_probit^max) is the maximized value of the probit log likelihood and ln(f_Bernoulli^max)

    is the maximized value of the Bernoulli log likelihood

    27
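
    Both pieces are stored after estimation in Stata (e(ll) is the maximized log likelihood, e(ll_0) the constant-only, i.e. Bernoulli, log likelihood), so the pseudo-R² in the probit output above can be reproduced by hand:

    . quietly probit deny p_irat black, r
    . display 1 - e(ll)/e(ll_0)      // matches the reported Pseudo R2 = 0.0859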

  • Ex. fraction correctly predicted

    obs    Yi    p̂i     correctly predicted?
      1     0    0.40    yes
      2     1    0.72    yes
      3     0    0.55    no
      4     1    0.44    no
      5     1    0.55    yes

    number correctly predicted: 3
    number of observations: 5
    fraction correctly predicted: 3/5 = 0.6

    28
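
    A sketch of the same calculation for the full sample, using predict to obtain fitted probabilities (phat and correct are illustrative variable names); the mean of correct is the fraction correctly predicted:

    . quietly probit deny p_irat black, r
    . predict phat
    . generate correct = (phat > .5 & deny == 1) | (phat < .5 & deny == 0)
    . summarize correct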

  • pseudo-R²

    It's a bit hard to see how the pseudo-R² works, so let's rewrite the formula in a slightly different

    way

    Note that ln(f_probit^max) < 0 and ln(f_Bernoulli^max) < 0

    Thus, we can rewrite the pseudo-R² as

    pseudo-R² = 1 − ln(f_probit^max) / ln(f_Bernoulli^max) = 1 − |ln(f_probit^max)| / |ln(f_Bernoulli^max)|

    The better the fit of the probit model, the smaller |ln(f_probit^max)| is relative to |ln(f_Bernoulli^max)|, and the closer the pseudo-R² is to 1

    29

  • Section 11.4 Application to the Boston HMDA Data

    Mortgages (home loans) are an essential part of buying a home

    Question: is it harder for a black person to get a loan than a white person?

    Specifically: if two otherwise identical individuals, one white and one black, applied for a home

    loan, is there a difference in the probability of denial?

    The mortgage application process in the US circa 1990-1991:

    Go to a bank or mortgage company

    Fill out an application (personal and financial info)

    Meet with the loan officer

    Then the loan officer decides - by law, without considering race. Presumably, the bank wants

    to make profitable loans, and the loan officer doesn't want to originate loans that default

    30

  • The Loan Officer's Decision

    The loan officer uses key financial variables:

    P/I ratio

    housing expense-to-income ratio

    loan-to-value ratio

    personal credit history

    The decision rule is nonlinear:

    loan-to-value ratio > 80%

    loan-to-value ratio > 95% (what happens in default?)

    credit score

    31

  • Regression Specifications

    Pr(deny = 1 | black, other X's) = . . .

    linear probability model

    probit

    logit

    Main problem with the regressions so far: potential omitted variable bias. The following

    variables (i) enter the loan officer's decision and (ii) are correlated with race:

    wealth, type of employment

    credit history

    family status

    Fortunately, the HMDA data set is very rich, containing data on individual characteristics,

    property characteristics, and loan denial/acceptance

    32

  • [Table 11.2: regression models of mortgage denial using the Boston HMDA data]

    33

  • [Table 11.2, ctd.]

    34

  • [Table 11.2, ctd.]

    35

  • [Table 11.2, ctd.]

    36

  • [Table 11.2, ctd.]

    37

  • Summary of Empirical Results

    Coefficients on the financial variables make sense

    Black is statistically significant in all specifications

    Race-financial variable interactions aren't significant

    Including the covariates sharply reduces the effect of race on denial probability

    LPM, probit, logit: similar estimates of effect of race on the probability of denial

    Estimated effects are large in a real-world sense

    38