1 topic 1 binary logit models. 2 often variables in social sciences are dichotomous: employed vs....

1 Topic 1 Binary Logit Models

Upload: claribel-greene

Post on 25-Dec-2015


TRANSCRIPT


Topic 1

Binary Logit Models


Variables in the social sciences are often dichotomous: employed vs. unemployed; married vs. unmarried; guilty vs. innocent; voted vs. didn’t vote


Social scientists frequently wish to estimate regression models with a dichotomous dependent variable

Most researchers are aware that there is something wrong with using OLS for a dichotomous dependent variable, but they do not know
• what makes a dichotomous dependent variable problematic in standard linear regression; and
• what other methods are superior


The focus of this topic is on logit analysis (or logistic regression analysis) for a dichotomous dependent variable

Logit models have many similarities to OLS regression models

We first examine why OLS regression runs into problems when the dependent variable is dichotomous


Example

Dataset: penalty.txt
• Comprises 147 penalty cases in the state of New Jersey
• In all cases the defendant was convicted of first-degree murder with a recommendation by the prosecutor that a death sentence be imposed
• A penalty trial is conducted to determine whether the defendant should receive a death penalty or life imprisonment


The dataset comprises the following variables:

DEATH: 1 for a death sentence, 0 for a life sentence

BLACKD: 1 if the defendant was black, 0 otherwise

WHITVIC: 1 if the victim was white, 0 otherwise

SERIOUS: an average rating of the seriousness of the crime evaluated by a panel of judges, ranging from 1 (least serious) to 15 (most serious)


DATA PENALTY;

INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';

INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;

PROC REG;

MODEL DEATH=BLACKD WHITVIC SERIOUS;

RUN;


Remarks on OLS regression output:
• The coefficient for SERIOUS is positive and very significant
• Neither of the two racial variables is significantly different from zero
• R² is very low
• The F-test indicates overall significance of the model
• Should we trust these results?


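Assumptions of the linear regression model:
1. $y_i = \alpha + \beta x_i + \varepsilon_i$
2. $E(\varepsilon_i) = 0$
3. $\mathrm{var}(\varepsilon_i) = \sigma^2$ (homoscedasticity)
4. $\mathrm{cov}(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$ (absence of autocorrelation)
5. The $x_i$'s are treated as fixed
6. $\varepsilon_i \sim$ Normal

If assumptions 1-5 are satisfied, then the OLS estimators of $\alpha$ and $\beta$ are B.L.U. (best linear unbiased).

If all assumptions are satisfied, then the OLS estimators of $\alpha$ and $\beta$ are M.V.U. (minimum variance unbiased).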


Now, what if y is a dichotomy with possible values of 1 or 0?

It is still possible to claim that assumptions 1, 2, 4 and 5 are true

But if 1 and 2 are true then 3 and 6 are necessarily false!!


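Consider assumption 6. Note that

$$\varepsilon_i = y_i - \alpha - \beta x_i$$

If $y_i = 1$, then $\varepsilon_i = 1 - \alpha - \beta x_i$; if $y_i = 0$, then $\varepsilon_i = -\alpha - \beta x_i$.

It is obvious that $\varepsilon_i$ cannot be normally distributed. In fact, it takes only two values and so follows a binomial (two-point) distribution.

So $\hat{\alpha}$ and $\hat{\beta}$ are also not normally distributed. Standard inference procedures are no longer valid as a consequence.

But in large samples, the binomial distribution tends towards the normal distribution.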


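Consider assumption 3. Note that

$$E(y_i) = 1 \cdot \Pr(y_i = 1) + 0 \cdot \Pr(y_i = 0) = 1 \cdot p_i + 0 \cdot (1 - p_i) = p_i$$

But from assumptions 1 and 2,

$$E(y_i) = E(\alpha + \beta x_i + \varepsilon_i) = \alpha + \beta x_i + E(\varepsilon_i) = \alpha + \beta x_i$$

Therefore,

$$p_i = \alpha + \beta x_i$$

This is the linear probability model (LPM).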


Accordingly, from our previous output, a 1-point increase in the SERIOUS scale is associated with a 0.038 increase in the probability of a death sentence; the probability of a death sentence for black defendants is 0.12 higher than for non-black defendants, ceteris paribus


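$$\mathrm{var}(\varepsilon_i) = E\left[(\varepsilon_i - E(\varepsilon_i))^2\right] = E(\varepsilon_i^2)$$

$$= (1 - \alpha - \beta x_i)^2 \, p_i + (-\alpha - \beta x_i)^2 \, (1 - p_i)$$

$$= (1 - \alpha - \beta x_i)^2 (\alpha + \beta x_i) + (\alpha + \beta x_i)^2 (1 - \alpha - \beta x_i)$$

$$= (\alpha + \beta x_i)(1 - \alpha - \beta x_i)$$

$$= p_i (1 - p_i)$$

So, $\varepsilon_i$ must be heteroscedastic: its variance depends on $x_i$ through $p_i$. The disturbance variance is at a maximum when $p_i = 0.5$.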
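As a quick numerical check of the derivation above, the following Python sketch evaluates the two-point variance formula on a grid of probabilities. Python is used here only for convenience (the course code is in SAS), and the function name is illustrative rather than taken from the slides.

```python
def disturbance_variance(p):
    """Variance of the LPM disturbance when Pr(y = 1) = p.

    The disturbance equals 1 - p with probability p and -p with
    probability 1 - p, so E(eps^2) = (1 - p)^2 * p + p^2 * (1 - p).
    """
    return (1 - p) ** 2 * p + p ** 2 * (1 - p)


grid = [i / 10 for i in range(1, 10)]  # p = 0.1, ..., 0.9
for p in grid:
    # The two-point formula collapses to p * (1 - p)
    assert abs(disturbance_variance(p) - p * (1 - p)) < 1e-12
    print(f"p = {p:.1f}   var = {disturbance_variance(p):.2f}")

# Probability on the grid at which the variance is largest
p_at_max = max(grid, key=disturbance_variance)
print("variance is largest at p =", p_at_max)
```

Because the variance changes with $p_i$, and $p_i$ changes with $x_i$, no single $\sigma^2$ can describe all observations.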


So, what are the consequences?
• Violation of assumptions 3 and 6 does not lead to biased estimation by OLS (only assumptions 1 and 2 are required for OLS to yield unbiased estimators).
• If the sample size is large enough, the estimators will be approximately normal even when the $\varepsilon_i$'s are not normally distributed.
• Violation of the homoscedasticity assumption makes the OLS estimators no longer efficient. In addition, the estimated standard errors are biased.


Also, the model

$$p_i = \alpha + \beta x_i$$

is implausible, because $p_i$ is a linear function of $x_i$ and therefore has no upper or lower bound. But it is impossible for the true values (which are probabilities) to be greater than 1 or less than 0!


Odds of an event: the ratio of the expected number of times that an event will occur to the expected number of times it will not occur (e.g. an odds of 4 means we expect 4 times as many occurrences as non-occurrences; an odds of 5/2 (or 5 to 2) means we expect 5 occurrences to 2 non-occurrences).


Let $p$ be the probability of an event and $o$ the odds of the event, then

$$o = \frac{p}{1 - p}$$

or

$$p = \frac{o}{1 + o}$$


Relationship between Odds and Probability

Probability   Odds
0.1           0.11
0.2           0.25
0.3           0.43
0.4           0.67
0.5           1.00
0.6           1.50
0.7           2.33
0.8           4.00
0.9           9.00

• o < 1 => p < 0.5
• o > 1 => p > 0.5
• 0 < o < ∞
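The conversion between probability and odds can be sketched in a few lines of Python (used here for convenience instead of SAS; the function names are illustrative). The loop reproduces the probability and odds columns of the table above.

```python
def odds_from_prob(p):
    """Odds o = p / (1 - p) for a probability 0 <= p < 1."""
    return p / (1 - p)


def prob_from_odds(o):
    """Probability p = o / (1 + o) for odds o >= 0."""
    return o / (1 + o)


print("Probability  Odds")
for i in range(1, 10):
    p = i / 10
    print(f"{p:11.1f}  {odds_from_prob(p):4.2f}")
```

Applying `prob_from_odds` to the result of `odds_from_prob` returns the original probability, which is one way to confirm that the two formulas are inverses of each other.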


Death Sentence by Race of Defendant for 147 Penalty Trials.

         Blacks   Non-blacks   Total
Death      28         22         50
Life       45         52         97
Total      73         74        147

$$O_D = \frac{50}{97} = 0.52, \qquad O_{D|B} = \frac{28}{45} = 0.62, \qquad O_{D|NB} = \frac{22}{52} = 0.42$$

∴ The ratio of the black odds to the non-black odds is

$$\frac{0.62}{0.42} = 1.476$$

=> The odds of a death sentence for blacks are 47.6% higher than for non-blacks, or the odds of a death sentence for non-blacks are about 0.68 times the corresponding odds for blacks.
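The odds and the odds ratio can be recomputed directly from the four cell counts. The short Python sketch below (variable names are illustrative) works with unrounded odds, so its ratio comes out at about 1.47, slightly below the 1.476 obtained from the odds rounded to two decimals.

```python
# Cell counts from the 2x2 table of sentence by race of defendant
death_black, death_nonblack = 28, 22
life_black, life_nonblack = 45, 52

# Odds of a death sentence within each group
odds_black = death_black / life_black            # 28/45
odds_nonblack = death_nonblack / life_nonblack   # 22/52

# Odds ratio: black odds relative to non-black odds
odds_ratio = odds_black / odds_nonblack

print(f"odds (black)     = {odds_black:.2f}")
print(f"odds (non-black) = {odds_nonblack:.2f}")
print(f"odds ratio       = {odds_ratio:.3f}")
print(f"inverse ratio    = {1 / odds_ratio:.2f}")
```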


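Logit model:

$$p_i = \frac{1}{1 + e^{-(\alpha + \beta x_i)}}$$

which is the cumulative logistic distribution function. Let $Z_i = \alpha + \beta x_i$, then

$$p_i = \frac{1}{1 + e^{-Z_i}} = F(\alpha + \beta x_i)$$

Notes:
• As $Z_i$ ranges from $-\infty$ to $+\infty$, $p_i$ ranges between 0 and 1;
• $p_i$ is non-linearly related to $Z_i$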


Also,

$$\frac{p_i}{1 - p_i} = e^{Z_i}$$

(the odds of the event)

Let

$$L_i = \ln\left(\frac{p_i}{1 - p_i}\right) = Z_i = \alpha + \beta x_i$$

Although $L_i$ is linear in $x_i$, the probabilities themselves are not. This is in contrast to the LPM.


[Figure: graph of the logit model for a single explanatory variable, drawn with $\alpha = 0$ and $\beta = 1$. Horizontal axis: $x_i$ from -4 to 4; vertical axis: $p_i$ from 0 to 1.]
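The points on this curve can be tabulated with a short Python sketch (Python is used for convenience; `logistic` and `logit` are illustrative names). It uses $\alpha = 0$ and $\beta = 1$ as in the graph and also checks that the logit transformation recovers $Z_i$ from $p_i$.

```python
import math


def logistic(z):
    """Cumulative logistic distribution function F(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    """Log-odds ln(p / (1 - p)), the inverse of the logistic function."""
    return math.log(p / (1.0 - p))


alpha, beta = 0.0, 1.0  # values used for the graph
for x in range(-4, 5):
    p = logistic(alpha + beta * x)
    # logit(p) returns the linear index alpha + beta * x
    print(f"x = {x:2d}   p = {p:.3f}   logit(p) = {logit(p):6.3f}")
```

The printed probabilities stay strictly between 0 and 1 and change fastest near $x = 0$, which is the S-shape the graph is meant to show.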


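Now

$$p_i = F(\alpha + \beta x_i)$$

$$\frac{\partial p_i}{\partial x_i} = \frac{\partial F(\alpha + \beta x_i)}{\partial x_i} = F'(\alpha + \beta x_i)\,\beta = f(\alpha + \beta x_i)\,\beta$$

As $f(\alpha + \beta x_i)$ is always positive, the sign of $\beta$ indicates the direction of the relationship between $p_i$ and $x_i$.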


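For the LOGIT model,

$$f(\alpha + \beta x_i) = \frac{e^{-Z_i}}{(1 + e^{-Z_i})^2} = F(\alpha + \beta x_i)\,[1 - F(\alpha + \beta x_i)] = p_i (1 - p_i)$$

$$\therefore \frac{\partial p_i}{\partial x_i} = \beta \, p_i (1 - p_i)$$

In other words, a 1-unit change in $x_i$ does not produce a constant effect on $p_i$.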
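To see that the effect of $x_i$ on $p_i$ is not constant, the Python sketch below compares the analytic marginal effect $\beta p_i (1 - p_i)$ with a central finite-difference slope at a few values of $x$. The parameter values $\alpha = 0$, $\beta = 1$ and the function names are assumptions chosen for illustration.

```python
import math


def logistic(z):
    """Cumulative logistic distribution function."""
    return 1.0 / (1.0 + math.exp(-z))


def marginal_effect(alpha, beta, x):
    """Analytic derivative dp/dx = beta * p * (1 - p) for the logit model."""
    p = logistic(alpha + beta * x)
    return beta * p * (1.0 - p)


def numeric_slope(alpha, beta, x, h=1e-6):
    """Central finite-difference approximation to dp/dx."""
    upper = logistic(alpha + beta * (x + h))
    lower = logistic(alpha + beta * (x - h))
    return (upper - lower) / (2.0 * h)


alpha, beta = 0.0, 1.0  # illustrative values
for x in (-3, -1, 0, 1, 3):
    print(f"x = {x:2d}   analytic = {marginal_effect(alpha, beta, x):.4f}   "
          f"numeric = {numeric_slope(alpha, beta, x):.4f}")
```

The effect is largest where $p_i = 0.5$ (here at $x = 0$) and shrinks as $p_i$ approaches 0 or 1.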


Note that $y_i$ only takes on the values 0 and 1, so the observed logit $L_i = \ln[y_i / (1 - y_i)]$ is not defined for individual observations. Therefore, OLS is not an appropriate estimation technique. Maximum Likelihood (ML) estimation is usually undertaken.

ML basic principle: to choose as estimates those parameter values which would maximize the probability of what we have in fact observed.


Steps:
1. Write down an expression for the probability of the data as a function of the unknown parameters [construction of the likelihood function].
2. Find the values of the unknown parameters that make the value of this expression as large as possible.
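The two steps can be illustrated with a deliberately crude Python sketch: it writes the logit log-likelihood $\log L = \sum_i [y_i Z_i - \log(1 + e^{Z_i})]$ with $Z_i = \alpha + \beta x_i$, and then searches a coarse grid of parameter values for the largest value. The eight observations are made up for illustration, and the grid search is a stand-in for the iterative methods used in practice, not the method any package uses.

```python
import math

# Made-up data for illustration only (not from penalty.txt)
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0, 0, 1, 0, 1, 0, 1, 1]


def log_likelihood(alpha, beta, x, y):
    """Step 1: log-likelihood of a one-regressor logit model."""
    total = 0.0
    for xi, yi in zip(x, y):
        z = alpha + beta * xi
        total += yi * z - math.log(1.0 + math.exp(z))
    return total


# Step 2: pick the candidate (alpha, beta) with the largest log-likelihood.
# alpha runs from -4.0 to 0.0 in steps of 0.1, beta from 0.0 to 1.5 in steps of 0.05.
candidates = [(-4 + 0.1 * i, 0.05 * j) for i in range(41) for j in range(31)]
best_alpha, best_beta = max(
    candidates, key=lambda ab: log_likelihood(ab[0], ab[1], X, Y)
)
best_value = log_likelihood(best_alpha, best_beta, X, Y)

print(f"log L at (0, 0)      = {log_likelihood(0.0, 0.0, X, Y):.4f}")
print(f"best grid point      = ({best_alpha:.2f}, {best_beta:.2f})")
print(f"log L at best point  = {best_value:.4f}")
```

A grid search scales poorly with the number of parameters, which is why derivative-based iterative methods are preferred.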


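$$\log L = \sum_i y_i \log\left(\frac{p_i}{1 - p_i}\right) + \sum_i \log(1 - p_i)$$

$$= \sum_i y_i \log\left(e^{\alpha + \beta x_i}\right) + \sum_i \log\left(\frac{1}{1 + e^{\alpha + \beta x_i}}\right)$$

$$= \alpha \sum_i y_i + \beta \sum_i x_i y_i - \sum_i \log\left(1 + e^{\alpha + \beta x_i}\right)$$

Taking the derivatives of log L and setting them to zero gives:

$$\frac{\partial \log L}{\partial \alpha} = \sum_i y_i - \sum_i \frac{1}{1 + e^{-(\hat{\alpha} + \hat{\beta} x_i)}} = \sum_i y_i - \sum_i \hat{y}_i = 0$$

where $\hat{y}_i = 1 / (1 + e^{-(\hat{\alpha} + \hat{\beta} x_i)})$ is the fitted probability.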


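$$\frac{\partial \log L}{\partial \beta} = \sum_i x_i y_i - \sum_i \frac{x_i}{1 + e^{-(\hat{\alpha} + \hat{\beta} x_i)}} = \sum_i x_i y_i - \sum_i x_i \hat{y}_i = 0$$

The first order conditions are non-linear in $\hat{\alpha}$ and $\hat{\beta}$, so solutions are typically obtained by iterative methods.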


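Newton-Raphson algorithm

Let $U(\alpha, \beta)$ be the vector of first derivatives of log L with respect to $\alpha$ and $\beta$, and let $H(\alpha, \beta)$ be the corresponding matrix of second derivatives.

i.e.

$$U(\alpha, \beta) = \begin{pmatrix} \dfrac{\partial \log L}{\partial \alpha} \\[2ex] \dfrac{\partial \log L}{\partial \beta} \end{pmatrix} = \begin{pmatrix} \sum_i y_i - \sum_i \hat{y}_i \\[1ex] \sum_i x_i y_i - \sum_i x_i \hat{y}_i \end{pmatrix}$$

(gradient or score vector)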


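$$H(\alpha, \beta) = \begin{pmatrix} \dfrac{\partial^2 \log L}{\partial \alpha^2} & \dfrac{\partial^2 \log L}{\partial \alpha \, \partial \beta} \\[2ex] \dfrac{\partial^2 \log L}{\partial \beta \, \partial \alpha} & \dfrac{\partial^2 \log L}{\partial \beta^2} \end{pmatrix} = - \begin{pmatrix} \sum_i \hat{y}_i (1 - \hat{y}_i) & \sum_i x_i \hat{y}_i (1 - \hat{y}_i) \\[1ex] \sum_i x_i \hat{y}_i (1 - \hat{y}_i) & \sum_i x_i^2 \hat{y}_i (1 - \hat{y}_i) \end{pmatrix}$$

(Hessian matrix)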


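The Newton-Raphson algorithm derives new estimates based on

$$\begin{pmatrix} \hat{\alpha}_{j+1} \\ \hat{\beta}_{j+1} \end{pmatrix} = \begin{pmatrix} \hat{\alpha}_j \\ \hat{\beta}_j \end{pmatrix} - H^{-1}(\hat{\alpha}_j, \hat{\beta}_j) \, U(\hat{\alpha}_j, \hat{\beta}_j)$$

where $H^{-1}(\alpha, \beta)$ is the inverse of $H(\alpha, \beta)$.

In practice, we need a set of starting values.

[PROC LOGISTIC starts with all coefficients equal to zero]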


The process is repeated until the maximum change in each parameter estimate from one step to the next is less than some criterion $u$.

i.e.

$$|\hat{\alpha}_{j+1} - \hat{\alpha}_j| < u$$

and

$$|\hat{\beta}_{j+1} - \hat{\beta}_j| < u$$
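The iteration can be sketched in plain Python for the one-regressor case. This is a minimal illustration under stated assumptions, not PROC LOGISTIC's implementation: the eight observations are made up, the tolerance and iteration cap are arbitrary choices, and the 2x2 Hessian is inverted by hand. Because the Hessian of log L is the negative of the weighted sums matrix, the step $-H^{-1}U$ becomes the inverse of that matrix times $U$.

```python
import math

# Made-up data for illustration only (not from penalty.txt)
X = [1, 2, 3, 4, 5, 6, 7, 8]
Y = [0, 0, 1, 0, 1, 0, 1, 1]


def fit_logit_newton(x, y, tol=1e-8, max_iter=50):
    """Newton-Raphson for a one-regressor logit model.

    Returns (alpha, beta, iterations). Starts from zero coefficients.
    """
    alpha, beta = 0.0, 0.0
    for iteration in range(1, max_iter + 1):
        p = [1.0 / (1.0 + math.exp(-(alpha + beta * xi))) for xi in x]
        # Score vector U = (sum(y - p), sum(x * (y - p)))
        u0 = sum(yi - pi for yi, pi in zip(y, p))
        u1 = sum(xi * (yi - pi) for xi, yi, pi in zip(x, y, p))
        # Hessian H = -[[a, b], [b, c]] with weights p * (1 - p)
        w = [pi * (1.0 - pi) for pi in p]
        a = sum(w)
        b = sum(xi * wi for xi, wi in zip(x, w))
        c = sum(xi * xi * wi for xi, wi in zip(x, w))
        det = a * c - b * b
        # Newton step: -H^{-1} U = [[a, b], [b, c]]^{-1} U
        step_alpha = (c * u0 - b * u1) / det
        step_beta = (a * u1 - b * u0) / det
        alpha += step_alpha
        beta += step_beta
        # Stop when the largest parameter change is below the criterion
        if max(abs(step_alpha), abs(step_beta)) < tol:
            return alpha, beta, iteration
    raise RuntimeError("Newton-Raphson did not converge")


alpha_hat, beta_hat, n_iter = fit_logit_newton(X, Y)
print(f"alpha = {alpha_hat:.4f}, beta = {beta_hat:.4f}, iterations = {n_iter}")
```

At convergence the score vector is numerically zero, which is exactly the pair of first-order conditions. With perfectly separated data the estimates would diverge, so a real implementation needs additional safeguards.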


Note that

$$\mathrm{cov}(\hat{\alpha}, \hat{\beta}) = -H^{-1}(\hat{\alpha}, \hat{\beta})$$

This variance-covariance matrix can be obtained using the COVB option in the MODEL statement in SAS.


SAS Program

DATA PENALTY;
INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';
INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
/* DESCENDING makes the procedure model Pr(DEATH = 1) */
PROC LOGISTIC DATA=PENALTY DESCENDING;
MODEL DEATH=BLACKD WHITVIC SERIOUS;
RUN;


Interpretation of results

Rather than a t-statistic, SAS reports a Wald chi-square value, which is the square of the usual t-statistic.

Reason: the t-statistic is valid only asymptotically and has an asymptotic N(0,1) distribution under the null hypothesis. The square of an N(0,1) variable is a chi-square random variable with one degree of freedom.
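As a small arithmetic illustration in Python, the Wald chi-square for a single coefficient is the squared ratio of the estimate to its standard error. The estimate and standard error below are hypothetical numbers, not values from the SAS output; 3.841 is the 5% critical value of a chi-square distribution with one degree of freedom.

```python
def wald_chi_square(estimate, std_error):
    """Wald chi-square for one coefficient: (estimate / std_error) squared."""
    z = estimate / std_error
    return z * z


# Hypothetical coefficient and standard error, for illustration only
estimate, std_error = 0.5, 0.25
statistic = wald_chi_square(estimate, std_error)

CRITICAL_5_PERCENT_1_DF = 3.841  # chi-square(1) critical value at the 5% level
print(f"Wald chi-square = {statistic:.3f}")
print("reject H0 at 5%:", statistic > CRITICAL_5_PERCENT_1_DF)
```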


Test of overall significance

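$$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$$

$$H_1: \text{otherwise}$$

Here R and UR denote the restricted and unrestricted models, $U$ is the score vector and $H$ is the Hessian of log L.

1. Likelihood-Ratio test:

$$LR = -2\{\ln L(\hat{\beta}_R) - \ln L(\hat{\beta}_{UR})\} \sim \chi^2_k$$

2. Score (Lagrange-Multiplier) test:

$$LM = [U(\hat{\beta}_R)]' \, [-H(\hat{\beta}_R)]^{-1} \, [U(\hat{\beta}_R)] \sim \chi^2_k$$

3. Wald test:

$$W = \hat{\beta}_{UR}' \, [-H(\hat{\beta}_{UR})] \, \hat{\beta}_{UR} \sim \chi^2_k$$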


Model Selection Criteria

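1. Akaike’s Information Criterion

AIC = -2 ln L + 2(k+1)

2. Schwarz Criterion

SC = -2 ln L + (k+1) ln(n)

3. Generalized R²

$$R^2 = 1 - \exp\left(-\frac{LR}{n}\right)$$

analogous to the conventional R² used in linear regression, where LR is the likelihood-ratio statistic for overall significance and n is the sample size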
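The three criteria are simple functions of the maximized log-likelihood. The Python sketch below codes them directly from the formulas above; the log-likelihood values are hypothetical and chosen only to make the arithmetic easy to follow, while k = 3 and n = 147 match the penalty example.

```python
import math


def aic(log_l, k):
    """Akaike's Information Criterion: -2 ln L + 2(k + 1)."""
    return -2.0 * log_l + 2.0 * (k + 1)


def schwarz_criterion(log_l, k, n):
    """Schwarz Criterion: -2 ln L + (k + 1) ln(n)."""
    return -2.0 * log_l + (k + 1) * math.log(n)


def lr_statistic(log_l_restricted, log_l_unrestricted):
    """Likelihood-ratio statistic: -2 {ln L_R - ln L_UR}."""
    return -2.0 * (log_l_restricted - log_l_unrestricted)


def generalized_r2(lr, n):
    """Generalized R-squared: 1 - exp(-LR / n)."""
    return 1.0 - math.exp(-lr / n)


# Hypothetical log-likelihoods (NOT taken from the SAS output)
log_l_intercept_only = -94.0   # restricted model
log_l_full = -90.0             # unrestricted model with k = 3 regressors
k, n = 3, 147

lr = lr_statistic(log_l_intercept_only, log_l_full)
print(f"AIC            = {aic(log_l_full, k):.2f}")
print(f"SC             = {schwarz_criterion(log_l_full, k, n):.2f}")
print(f"LR             = {lr:.2f}")
print(f"generalized R2 = {generalized_r2(lr, n):.4f}")
```

Smaller AIC and SC values indicate a preferred model; SC penalizes extra parameters more heavily than AIC whenever ln(n) exceeds 2.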


Optimization Technique

Fisher's scoring (iteratively reweighted least squares); for the logit model this is equivalent to the Newton-Raphson algorithm.


Odds ratio = $e^{\hat{\beta}}$

The (predicted) odds ratio of 1.813 indicates that the odds of a death sentence for black defendants are 81% higher than the odds for other defendants

The (predicted) odds of death are about 29% higher when the victim is white. (But note that the coefficient is insignificant)


A 1-unit increase in the SERIOUS scale is associated with a 21% increase in the predicted odds of a death sentence.
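The percentage statements above follow from the identity: percent change in odds = $100 \, (e^{\hat{\beta}} - 1)$. The Python sketch below applies it to the reported odds ratio of 1.813; the coefficient is backed out as ln(1.813), about 0.595, rather than read from the SAS output, and the function name is illustrative.

```python
import math


def percent_change_in_odds(beta):
    """Percent change in the odds for a 1-unit increase in the regressor."""
    return (math.exp(beta) - 1.0) * 100.0


# Coefficient implied by the reported odds ratio of 1.813
beta_blackd = math.log(1.813)

print(f"implied coefficient = {beta_blackd:.3f}")
print(f"odds ratio          = {math.exp(beta_blackd):.3f}")
print(f"change in odds      = {percent_change_in_odds(beta_blackd):.1f}%")
```

A negative coefficient gives an odds ratio below 1 and therefore a percentage decrease in the odds.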


Association of predicted probabilities and observed responses

Example: For the 147 observations in the sample, there are 147C2 = 10731 ways to pair them up (without pairing an observation with itself). Of these, 5881 pairs have either both 1’s on the dependent variable or both 0’s. These we ignore, leaving 4850 pairs in which one case has a 1 and the other case has a 0. For each pair, we ask the question “Does the case with a 1 have a higher predicted value (based on the model) than the case with a 0?”


If yes, we call that pair concordant; if no, we call that pair discordant; if the two cases have the same predicted value, we call it a tie.

Let
C = number of concordant pairs;
D = number of discordant pairs;
T = number of ties;
N = total number of pairs (before eliminating any)


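$$\text{Tau-a} = \frac{C - D}{N}$$

$$\text{Gamma} = \frac{C - D}{C + D}$$

$$\text{Somers' } D = \frac{C - D}{C + D + T}$$

$$c = 0.5\,(1 + \text{Somers' } D)$$

All 4 measures vary between 0 and 1 with large values corresponding to stronger associations between the predicted and observed values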
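The pair counting and the four measures can be sketched in Python as follows. The six outcomes and predicted probabilities are made up for illustration, and the function name is hypothetical; the sketch compares raw predicted values, so its tie handling may differ from what a statistical package does internally.

```python
def association_measures(y, p):
    """Count concordant, discordant and tied pairs; return the four measures."""
    n = len(y)
    total_pairs = n * (n - 1) // 2  # N: all pairs, before eliminating any
    concordant = discordant = tied = 0
    for y_event, p_event in zip(y, p):
        if y_event != 1:
            continue
        for y_other, p_other in zip(y, p):
            if y_other != 0:
                continue
            # Only pairs with one 1 and one 0 are compared
            if p_event > p_other:
                concordant += 1
            elif p_event < p_other:
                discordant += 1
            else:
                tied += 1
    diff = concordant - discordant
    somers_d = diff / (concordant + discordant + tied)
    return {
        "C": concordant,
        "D": discordant,
        "T": tied,
        "tau_a": diff / total_pairs,
        "gamma": diff / (concordant + discordant),
        "somers_d": somers_d,
        "c": 0.5 * (1.0 + somers_d),
    }


# Made-up outcomes and predicted probabilities, for illustration only
y_obs = [1, 1, 0, 0, 1, 0]
p_hat = [0.8, 0.6, 0.6, 0.3, 0.4, 0.5]
measures = association_measures(y_obs, p_hat)
for name, value in measures.items():
    print(f"{name:9s} = {value}")
```

In this toy example there are 3 x 3 = 9 usable pairs out of 15, mirroring how 50 x 97 = 4850 of the 10731 pairs are used in the penalty data.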


An Illustrative example of LOGIT model

Table 12.4 of Ramanathan (1995) presents information on the acceptance or rejection to medical school for a sample of 60 applicants, along with a number of their characteristics. The variables are as follows:

ACCEPT = 1 if granted an acceptance, 0 otherwise;

GPA = cumulative undergraduate grade point average;

BIO = score in the biology portion of the Medical College Admission Test (MCAT);

CHEM = score in the chemistry portion of the MCAT;

PHY = score in the physics portion of the MCAT;

RED = score in the reading portion of the MCAT;


PRB = score in the problem portion of the MCAT;

QNT = score in quantitative portion of the MCAT;

AGE = age of applicant;

GENDER = 1 if male, 0 if female;

1. Estimate a LOGIT model for the probability of acceptance into medical school

2. Predict the probability of success of an individual with the following characteristics:

GPA = 2.96, BIO = 7, CHEM = 7, PHY = 8, RED = 5, PRB = 7, QNT = 5, AGE = 25, GENDER = 0

3. Calculate Cragg and Uhler’s pseudo R2 for the above model. How well does the model appear to fit the data?

4. AGE and GENDER represent personal characteristics. Test the hypothesis that AGE and GENDER jointly have no impact on the probability of success.


DATA UNI;
INFILE 'D:\TEACHING\MS4225\MEDICAL.TXT';
INPUT ACCEPT GPA BIO CHEM PHY RED PRB QNT AGE GENDER;
/* Unrestricted model: all regressors */
PROC LOGISTIC DATA=UNI DESCENDING;
MODEL ACCEPT=GPA BIO CHEM PHY RED PRB QNT AGE GENDER;
RUN;
/* Restricted model: AGE and GENDER dropped, for the joint test in question 4 */
PROC LOGISTIC DATA=UNI DESCENDING;
MODEL ACCEPT=GPA BIO CHEM PHY RED PRB QNT;
RUN;


LOGIT alternative estimation technique

DATA PENALTY;
INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';
INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
/* D=B requests the binomial distribution */
PROC GENMOD DATA=PENALTY DESCENDING;
MODEL DEATH=BLACKD WHITVIC SERIOUS/D=B;
RUN;


Advantages of PROC GENMOD

Class variable

DATA PENALTY;

INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';

INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;

PROC GENMOD DATA=PENALTY DESCENDING;

CLASS CULP;

MODEL DEATH=BLACKD WHITVIC CULP/D=B;

RUN;


Variable CULP takes the integer values 1 to 5 (5 denotes high culpability and 1 denotes low culpability)

The CLASS statement treats this variable as a set of categories by creating 4 dummy variables, one for each of the values 1 through 4 (the default in GENMOD is to take the highest value as the omitted category)


Multiplicative terms in the MODEL statement (to capture interaction effects)

For example, some people may argue that black defendants who kill white victims may be especially likely to receive a death sentence. We can test this hypothesis by including the term BLACKD*WHITVIC in the MODEL statement


DATA PENALTY;
INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';
INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;
PROC GENMOD DATA=PENALTY DESCENDING;
/* BLACKD*WHITVIC adds the interaction between defendant and victim race */
MODEL DEATH=BLACKD WHITVIC CULP BLACKD*WHITVIC/D=B;
RUN;


Other features of PROC GENMOD

Deviance = -2 ln L (for individual data)

The deviance involves a comparison between the model of interest and the maximal (or saturated) model, which always fits the data better. The question is whether the difference in fit is statistically significant.

With individual data, the saturated model has one parameter for every predicted probability and therefore gives a perfect fit and a likelihood value of 1.


Unfortunately, with individual level data the deviance does not have a chi-square distribution because the number of parameters increases with sample size, thereby violating a condition of asymptotic theory.

SCALE variable (can be ignored for binary regression models unless one is working with grouped data and wants to allow for overdispersion)

Pearson Chi-square test (to be considered at a later stage)


Disadvantages of PROC GENMOD

• Does not provide the odds ratio estimates
• Does not report a global test for the overall significance of the model


Hosmer-Lemeshow (HL) statistic

DATA PENALTY;

INFILE 'D:\TEACHING\MS4225\PENALTY.TXT';

INPUT DEATH BLACKD WHITVIC SERIOUS CULP SERIOUS2;

PROC LOGISTIC DATA=PENALTY DESCENDING;

MODEL DEATH=BLACKD WHITVIC CULP/LACKFIT;

RUN;


The HL statistic is calculated in the following way:
• Based on the estimated model, predicted probabilities are generated for all observations.
• These are sorted by size, then grouped into approximately 10 intervals.
• Within each interval, the expected frequency is obtained by adding up the predicted probabilities.
• Expected frequencies are compared with the observed frequencies by the conventional Pearson chi-square statistic.
• The degrees of freedom equal the number of intervals minus 2.
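The calculation described above can be sketched in Python. This is a simplified illustration, not the LACKFIT implementation: the grouping rule (equal-sized groups after sorting) is an assumption, the function name is hypothetical, and the four observations in the usage example are made up. Predicted probabilities are assumed to lie strictly between 0 and 1.

```python
def hosmer_lemeshow(y, p, groups=10):
    """Return (HL statistic, degrees of freedom).

    Observations are sorted by predicted probability and split into
    `groups` intervals of roughly equal size. In each interval the
    Pearson contributions for events and non-events are added up.
    """
    order = sorted(range(len(y)), key=lambda i: p[i])
    n = len(order)
    statistic = 0.0
    used_groups = 0
    for g in range(groups):
        members = order[g * n // groups:(g + 1) * n // groups]
        if not members:
            continue  # can happen when there are fewer observations than groups
        used_groups += 1
        expected_events = sum(p[i] for i in members)
        observed_events = sum(y[i] for i in members)
        expected_nonevents = len(members) - expected_events
        observed_nonevents = len(members) - observed_events
        statistic += (observed_events - expected_events) ** 2 / expected_events
        statistic += (observed_nonevents - expected_nonevents) ** 2 / expected_nonevents
    return statistic, used_groups - 2


# Made-up example with two groups of two observations each
y_obs = [0, 1, 1, 1]
p_hat = [0.2, 0.2, 0.8, 0.8]
hl_stat, hl_df = hosmer_lemeshow(y_obs, p_hat, groups=2)
print(f"HL statistic = {hl_stat:.3f}, d.o.f. = {hl_df}")
```

With real data one would use about 10 groups and compare the statistic with a chi-square distribution on (groups - 2) degrees of freedom; a large value signals lack of fit.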