Logit model, logistic regression, and log-linear model: A comparison

Upload: victor-mcdowell

Post on 30-Dec-2015


Page 1: Logit model, logistic regression, and log-linear model A comparison

Logit model, logistic regression, and log-linear model

A comparison

Page 2: Logit model, logistic regression, and log-linear model A comparison

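Models of counts: log-linear model

Row variable A (index i), column variable B (index j); $\lambda_{ij}$ is the expected count in cell (i, j)

$\ln \lambda_{ij} = u + u_i^A + u_j^B + u_{ij}^{AB}$

or

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

or

$\ln \lambda = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_3 + \dots$

with A = TIME [early = 0; late = 1] and B = SEX [female = 0; male = 1]

EARLY is the reference category

Leaving home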

Page 3: Logit model, logistic regression, and log-linear model A comparison

Model 1: null model

$\mu = 4.887$; $\lambda_{ij} = \exp[4.887] = 132.5$ for all i and j (= 530/4)

Model 2: + TIME

$\mu = 4.649$

$\mu_2^A = 0.4291$

$\lambda_{ij} = \exp[4.649 + 0.4291\,t]$: 104.5 for 'early' (t = 0) and 160.5 for 'late' (t = 1)

or

$\lambda_{1j} = \exp[4.649] = 104.5$ for early

$\lambda_{2j} = \exp[4.649 + 0.4291] = 160.5$ for late

Leaving home

Page 4: Logit model, logistic regression, and log-linear model A comparison

Model 3: TIME AND SEX

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$

$\mu = 4.697$; $\mu_2^A = 0.4291$; $\mu_2^B = -0.0982$

Reference categories: 'early' [$\mu_1^A = 0$] and 'females' [$\mu_1^B = 0$]

Table: Predicted number of young adults leaving home by age and sex (unsaturated log-linear model)

Age      Females   Males   Total
< 20     109.6     99.4    209
≥ 20     168.4     152.6   321
Total    278       252     530

Leaving home

Page 5: Logit model, logistic regression, and log-linear model A comparison

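Model 3: Time and Sex (unsaturated log-linear model)

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$

$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B]$

$\lambda_{11} = \exp[4.697] = 109.6$

$\lambda_{21} = \exp[4.697 + 0.4291] = 168.4$

$\lambda_{12} = \exp[4.697 - 0.0982] = 99.4$

$\lambda_{22} = \exp[4.697 + 0.4291 - 0.0982] = 152.6$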
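The parameters of Models 1 to 3 can be recovered by hand from the marginal totals of the leaving-home table. The sketch below is only an illustration: the variable names are made up here, and it relies on the closed-form fitted values of these simple models (equal cells, equal split of each timing total, and row total times column total divided by n) rather than on an iterative fitting routine.

```python
import math

# Observed counts of young adults leaving home (from the slides).
# Rows: timing (early, late); columns: sex (females, males).
counts = [[135, 74], [143, 178]]

n = sum(sum(r) for r in counts)                # 530
row_totals = [sum(r) for r in counts]          # timing margin: 209, 321
col_totals = [sum(c) for c in zip(*counts)]    # sex margin: 278, 252

# Model 1 (null): every cell gets the same expected count n/4.
mu_null = math.log(n / 4)

# Model 2 (+ TIME): each timing total is split evenly over the two sexes.
mu_time_model = math.log(row_totals[0] / 2)
time_effect = math.log(row_totals[1] / row_totals[0])

# Model 3 (TIME and SEX, no interaction):
# expected count = row total * column total / n.
fitted = [[row_totals[i] * col_totals[j] / n for j in range(2)]
          for i in range(2)]
mu = math.log(fitted[0][0])                    # reference cell: early, females
sex_effect = math.log(col_totals[1] / col_totals[0])

print(round(mu_null, 3), round(mu_time_model, 3), round(time_effect, 4))
print(round(mu, 3), round(sex_effect, 4))
print([[round(v, 1) for v in r] for r in fitted])
```

The printed values can be compared with the estimates quoted on the slides for Models 1, 2 and 3.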

Leaving home

Page 6: Logit model, logistic regression, and log-linear model A comparison

Model 4: TIME AND SEX AND TIME*SEX interaction (saturated log-linear model)

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

$\mu = 4.905$ (overall effect); $\mu_2^A = 0.0576$ (TIME); $\mu_2^B = -0.6012$ (GENDER); $\mu_{22}^{AB} = 0.8201$ (TIME*GENDER)

or

$\ln \lambda_{ij} = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \beta_3 x_{3i}$

$x_{1i} = 0$ for < 20; $x_{1i} = 1$ for ≥ 20

$x_{2i} = 0$ for females; $x_{2i} = 1$ for males

$x_{3i} = 0$ for < 20 and females; $x_{3i} = 0$ for < 20 and males; $x_{3i} = 0$ for ≥ 20 and females; $x_{3i} = 1$ for ≥ 20 and males

The saturated model predicts perfectly

Leaving home

Page 7: Logit model, logistic regression, and log-linear model A comparison

Model 4: TIME AND SEX AND TIME*SEX interaction

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

$\mu = 4.905$ (overall effect); $\mu_2^A = 0.05757$ (TIME(2)); $\mu_2^B = -0.6012$ (SEX(2)); $\mu_{22}^{AB} = 0.8201$ (TIME(2)*SEX(2))

Table: Predicted number of young adults leaving home by age and sex (saturated log-linear model)

Age      Females   Males   Total
< 20     135       74      209
≥ 20     143       178     321
Total    278       252     530

Leaving home

Page 8: Logit model, logistic regression, and log-linear model A comparison

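Model 4: TIME AND SEX AND TIME*SEX interaction

$\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

$\lambda_{ij} = \exp[\mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}]$

$\lambda_{11} = \exp[4.905] = 135$

$\lambda_{21} = \exp[4.905 + 0.0576] = 143$

$\lambda_{12} = \exp[4.905 - 0.6012] = 74$

$\lambda_{22} = \exp[4.905 + 0.0576 - 0.6012 + 0.8201] = 178$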
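Because the saturated model reproduces the observed counts exactly, its four parameters are simple functions of the four cells. The following sketch (names are illustrative, with 'early' and 'females' as reference categories as on the slides) computes them directly and rebuilds the last cell from the parameters.

```python
import math

# Observed counts: timing (early/late) by sex (females/males).
early_f, early_m = 135, 74
late_f, late_m = 143, 178

# Reference cell (early, females) gives the overall effect.
mu = math.log(early_f)
# Main effects are log ratios relative to the reference cell.
time_effect = math.log(late_f / early_f)      # late vs early, among females
sex_effect = math.log(early_m / early_f)      # males vs females, among early
# The interaction is the log of the cross-product (odds) ratio.
interaction = math.log((late_m * early_f) / (late_f * early_m))

# Rebuild the (late, males) cell from the four parameters.
rebuilt_late_m = math.exp(mu + time_effect + sex_effect + interaction)

print(round(mu, 3), round(time_effect, 4),
      round(sex_effect, 4), round(interaction, 4))
print(round(rebuilt_late_m, 1))
```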

Leaving home

Page 9: Logit model, logistic regression, and log-linear model A comparison

Log-linear and logit model

Page 10: Logit model, logistic regression, and log-linear model A comparison

Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

Select one variable as the dependent (response) variable, e.g. does voting behaviour differ by sex?

Are females more likely to vote conservative than males?

Logit model: $\ln \frac{\lambda_{1j}}{\lambda_{2j}} = \gamma + \gamma_j^B$

Political attitudes

Page 11: Logit model, logistic regression, and log-linear model A comparison

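A = Party; B = Sex

Males voting conservative rather than labour:

$\ln \frac{\lambda_{11}}{\lambda_{21}} = (\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}) - (\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB})$

Females voting conservative rather than labour:

$\ln \frac{\lambda_{12}}{\lambda_{22}} = (\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}) - (\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB})$

Are females more likely to vote conservative than males?

Log-odds = logit

Effect coding (1): the parameters of each effect sum to zero, so $\mu_2^A = -\mu_1^A$, $\mu_{21}^{AB} = -\mu_{11}^{AB}$ and $\mu_{22}^{AB} = -\mu_{12}^{AB}$

$\ln \frac{\lambda_{11}}{\lambda_{21}} = \mu_1^A - \mu_2^A + \mu_{11}^{AB} - \mu_{21}^{AB} = 2\mu_1^A + 2\mu_{11}^{AB} = \gamma + \gamma_1^B = \ln \theta_1^B$

$\ln \frac{\lambda_{12}}{\lambda_{22}} = \mu_1^A - \mu_2^A + \mu_{12}^{AB} - \mu_{22}^{AB} = 2\mu_1^A + 2\mu_{12}^{AB} = \gamma + \gamma_2^B = \ln \theta_2^B$

with $\theta_j^B$ the odds of voting conservative rather than labour for sex j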

Political attitudes

Page 12: Logit model, logistic regression, and log-linear model A comparison

Are women more conservative than men? Do women vote more conservative than men? The odds ratio.

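$\ln \frac{\theta_2^B}{\theta_1^B} = (\gamma + \gamma_2^B) - (\gamma + \gamma_1^B) = \gamma_2^B - \gamma_1^B$

If the log odds ratio is positive (odds ratio greater than 1), then the odds of voting conservative rather than labour are larger for women than for men. In that case, women vote more conservative than men.

$\ln \theta_1^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 0$

$\ln \theta_2^B = \gamma + \gamma_1^B + (\gamma_2^B - \gamma_1^B) \cdot 1$

Logit model: $\eta = \text{logit}(p) = \ln \frac{p}{1-p} = \ln \frac{p_1}{p_2} = a + b x$

with $a = \gamma + \gamma_1^B$: log odds of reference category (males)

and $b = \gamma_2^B - \gamma_1^B$: log odds ratio (odds females / odds males)

with x = 0, 1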

Political attitudes

Page 13: Logit model, logistic regression, and log-linear model A comparison

The logit model as a regression model

Page 14: Logit model, logistic regression, and log-linear model A comparison

• Select a response variable: a proportion

• Dependent variable of logit model is the log of (odds of) being in one category rather than in another.

• Number of observations in each subpopulation (males, females) is assumed to be fixed.

• Intercept (a) = log odds of reference category

• Slope (b) = log odds ratio

Page 15: Logit model, logistic regression, and log-linear model A comparison

DATA

                      Sex
Party          Male   Female   Total
Conservative   279    352      631
Labour         335    291      626
Total          614    643      1257

Logit model: descriptive statistics
Counts in terms of odds and odds ratio

Reference categories: Labour; Males

Odds of voting conservative rather than labour, by sex:

        Male     Female   Total
Odds    0.8328   1.2096   1.0080
Odds ratio (ref. cat.: males): 1.4524

Odds of being female rather than male, by party:

Party          Odds     Odds ratio
Conservative   1.2616
Labour         0.8687
Total          1.0472   1.4524

F11 = 279

F21 = 335 = 279 * 335/279 = 279 / 0.8328

F12 = 352 = 279 * 352/279 = 279 * 1.2616

F22 = 291 = 279 * 352/279 * 291/352 = 279 * 1.2616 * [1/1.2096]

Political attitudes

Page 16: Logit model, logistic regression, and log-linear model A comparison

LOGIT MODEL

DATA

                      Sex
Party          Male   Female   Total
Conservative   279    352      631
Labour         335    291      626
Total          614    643      1257

Proportion voting conservative, and odds of voting conservative rather than labour:

               Proportion         Odds
Party          Male    Female     Males    Females
Conservative   0.454   0.547      0.8328   1.2096

Are females more likely to vote conservative than males?
Logit model: logit(p) = a + bX (males reference category)

         v = ln(odds)   exp(v) = odds          p
a        -0.18292       0.8328                 0.454   Males = 0.8328/(1 + 0.8328)
b         0.37323       1.4524 (odds ratio)
a + b     0.19031       1.2096                 0.547   Females = 1.2096/(1 + 1.2096)

logit(p) = -0.18292 + 0.37323 X (with X = 0 for males and X = 1 for females)

If the number of males and the number of females are known, the counts can be calculated.
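The intercept and slope of this logit model follow directly from the 2x2 table of counts. The sketch below is an illustration only (the dictionary layout and function names are assumptions made here): it computes the odds per sex, the log odds ratio, and then maps the logit back to a probability and to a count, using the fixed number of females.

```python
import math

# Voting counts by sex and party (from the DATA table).
table = {
    "male": {"conservative": 279, "labour": 335},
    "female": {"conservative": 352, "labour": 291},
}

def odds(sex):
    """Odds of voting conservative rather than labour for one sex."""
    return table[sex]["conservative"] / table[sex]["labour"]

# Males are the reference category (X = 0), females have X = 1.
a = math.log(odds("male"))                    # log odds of reference category
b = math.log(odds("female") / odds("male"))   # log odds ratio

def prob_conservative(x):
    """Inverse logit: probability of voting conservative for X = x."""
    return 1 / (1 + math.exp(-(a + b * x)))

# With the number of females fixed, the expected count follows from p.
n_female = sum(table["female"].values())
expected_cons_female = prob_conservative(1) * n_female

print(round(a, 5), round(b, 5))
print(round(prob_conservative(0), 3), round(prob_conservative(1), 3))
print(round(expected_cons_female))
```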

Political attitudes

Page 17: Logit model, logistic regression, and log-linear model A comparison

Logistic regression SPSS

Reference category: females (X = 1 for males and X = 0 for females); p = probability of voting labour

Variable   Param    S.E.    Exp(param)
SEX(1)     .3732    .1133   1.4524
Constant   -.1903   .0792

Females voting labour: 1/[1+exp[-(-0.1903)]] = 45% (= 291/643; females ref. cat.)
Males voting labour: 1/[1+exp[-(-0.1903+0.3732)]] = 55% (= 335/614)

Different parameter coding: X = -0.5 for males and X = 0.5 for females

Variable   Param    S.E.    Exp(param)
SEX(1)     -.3732   .1133   0.6885
Constant   -.0037   .0567

Females voting labour: 1/[1+exp[-(-0.0037 + 0.5*(-0.3732))]] = 45% (= 291/643)
Males voting labour: 1/[1+exp[-(-0.0037 - 0.5 * (-0.3732))]] = 55% (= 335/614)
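The two codings above describe the same fitted probabilities; only the meaning of the constant and the sign of the slope change. A small check, using the rounded estimates quoted on the slide (the helper name `logistic` is an assumption for this sketch):

```python
import math

def logistic(eta):
    """Inverse of the logit: maps a log odds to a probability."""
    return 1 / (1 + math.exp(-eta))

# Coding 1: X = 0 for females (reference), X = 1 for males.
const_dummy, slope_dummy = -0.1903, 0.3732
p_f_dummy = logistic(const_dummy + slope_dummy * 0)
p_m_dummy = logistic(const_dummy + slope_dummy * 1)

# Coding 2: X = 0.5 for females, X = -0.5 for males.
const_centered, slope_centered = -0.0037, -0.3732
p_f_centered = logistic(const_centered + slope_centered * 0.5)
p_m_centered = logistic(const_centered + slope_centered * -0.5)

# Observed proportions voting labour, within each sex.
obs_f = 291 / 643
obs_m = 335 / 614

print(round(p_f_dummy, 3), round(p_f_centered, 3), round(obs_f, 3))
print(round(p_m_dummy, 3), round(p_m_centered, 3), round(obs_m, 3))
```

Under the second coding the constant is the average of the two logits, which is why it is close to zero here.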

Political attitudes

Page 18: Logit model, logistic regression, and log-linear model A comparison

The logit model and the logistic regression

Observation from a binomial distribution with parameter p and index m

Leaving parental home

Page 19: Logit model, logistic regression, and log-linear model A comparison

Logit model and logistic regression

Number of young adults leaving home early: 209
Total number of young adults leaving home: 530
Probability of leaving home early: 209/530 = 0.394

REFERENCE CATEGORY: leaving home late (late = 0; early = 1)

ODDS of leaving home early versus late: 209/(530-209) = 0.6511
Logit of leaving home early: ln 0.6511 = -0.4291

Specify a model:

Logit model

$\text{Logit}(p) = \ln \frac{p}{1-p} = \ln \frac{0.394}{1-0.394} = -0.4291$

Leaving home

Page 20: Logit model, logistic regression, and log-linear model A comparison

Logistic regression

$p = \frac{1}{1+\exp[-(-0.4291)]} = 0.394$

Standard error:

$\sqrt{\frac{1}{209} + \frac{1}{321}} = 0.0889$

Confidence interval: -0.4291 ± 1.96 * 0.0889 = (-0.603, -0.255) ON LOGIT SCALE

and

$\left( \frac{1}{1+\exp[-(-0.6033)]},\ \frac{1}{1+\exp[-(-0.2549)]} \right) = (0.3536, 0.4366)$ ON PROBABILITY SCALE
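The interval above is built on the logit scale and then transformed back to the probability scale. A minimal sketch of that calculation, assuming the usual large-sample standard error of a log odds (square root of the sum of reciprocal counts) and a multiplier of 1.96; the helper `inv_logit` is a name chosen for this example:

```python
import math

early, late = 209, 321            # young adults leaving home early / late

def inv_logit(x):
    """Transform a logit back to a probability."""
    return 1 / (1 + math.exp(-x))

logit_early = math.log(early / late)        # log odds of leaving early
se = math.sqrt(1 / early + 1 / late)        # standard error on the logit scale

z = 1.96                                    # 95% normal quantile
lower, upper = logit_early - z * se, logit_early + z * se

# The logistic transform is monotone, so the end points map directly.
p_lower, p_upper = inv_logit(lower), inv_logit(upper)

print(round(logit_early, 4), round(se, 4))
print(round(lower, 3), round(upper, 3))
print(round(p_lower, 4), round(p_upper, 4))
```

The interval on the probability scale is not symmetric around 0.394, because the transformation is non-linear.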

Leaving home

Page 21: Logit model, logistic regression, and log-linear model A comparison

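Relation logit and log-linear model
The unsaturated model

Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B$

with $\mu_i^A$ the effect of timing and $\mu_j^B$ the effect of sex

Odds of leaving parental home late rather than early: females:

$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{168.4}{109.6} = 1.536$

$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\mu + \mu_2^A + \mu_1^B]}{\exp[\mu + \mu_1^A + \mu_1^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$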

Leaving home

Page 22: Logit model, logistic regression, and log-linear model A comparison

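Relation logit and log-linear model
The unsaturated model

Odds of leaving parental home late rather than early: males:

$\text{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{152.6}{99.4} = 1.536$

$\text{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\mu + \mu_2^A + \mu_2^B]}{\exp[\mu + \mu_1^A + \mu_2^B]} = \exp[\mu_2^A - \mu_1^A] = \exp[0.4291 - 0] = 1.536$

Output of the logit model gives the same result (s.e. 0.0889):

$\text{Logit}(p) = \ln \frac{p_{\text{late}}}{p_{\text{early}}} = 0.4291$ for females and males.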

Leaving home

Page 23: Logit model, logistic regression, and log-linear model A comparison

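Relation logit and log-linear model
The saturated model

Log-linear model: $\ln \lambda_{ij} = \mu + \mu_i^A + \mu_j^B + \mu_{ij}^{AB}$

with $\mu_i^A$ the effect of timing, $\mu_j^B$ the effect of sex and $\mu_{ij}^{AB}$ the effect of interaction between timing and sex

Odds of leaving parental home late rather than early: females (ref):

$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{143}{135} = 1.059$

$\text{ODDS}_{21} = \frac{\lambda_{21}}{\lambda_{11}} = \frac{\exp[\mu + \mu_2^A + \mu_1^B + \mu_{21}^{AB}]}{\exp[\mu + \mu_1^A + \mu_1^B + \mu_{11}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{21}^{AB} - \mu_{11}^{AB})] = \exp[(0.0576 - 0) + (0 - 0)] = 1.059$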

Leaving home

Page 24: Logit model, logistic regression, and log-linear model A comparison

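Relation logit and log-linear model
The saturated model

Odds of leaving parental home late rather than early: males:

$\text{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{178}{74} = 2.405$

$\text{ODDS}_{22} = \frac{\lambda_{22}}{\lambda_{12}} = \frac{\exp[\mu + \mu_2^A + \mu_2^B + \mu_{22}^{AB}]}{\exp[\mu + \mu_1^A + \mu_2^B + \mu_{12}^{AB}]} = \exp[(\mu_2^A - \mu_1^A) + (\mu_{22}^{AB} - \mu_{12}^{AB})] = \exp[(0.0576 - 0) + (0.8201 - 0)] = 2.405$

ln 1.059 = 0.0573 is the overall effect of the logit model (females ref. cat.)

ln 2.405 = 0.8775 is the log odds for males

0.8775 - 0.0573 ≈ 0.8201 is the log ODDS RATIO (odds males / odds females [ref])

Logit model: logit(p) = 0.0573 + 0.8201 X (with X = 0 for females and 1 for males)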
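The link between the two models can be checked numerically: the timing effect of the saturated log-linear model is the intercept of the logit model, and the interaction term is its slope. The sketch below uses the rounded estimates from the slides, with 'early' and 'females' as reference categories; the variable names are illustrative.

```python
import math

# Saturated log-linear estimates (reference categories: early, females).
mu_time = 0.0576          # late vs early
mu_interaction = 0.8201   # late * males

# Log odds of leaving late rather than early, by sex:
# the overall and sex main effects cancel in the ratio of two cells.
log_odds_females = mu_time
log_odds_males = mu_time + mu_interaction

odds_females = math.exp(log_odds_females)
odds_males = math.exp(log_odds_males)

# Logit model with X = 0 for females and X = 1 for males.
intercept = log_odds_females
slope = log_odds_males - log_odds_females   # log odds ratio

print(round(odds_females, 3), round(odds_males, 3))
print(round(intercept, 4), round(slope, 4))
```

The small difference between 0.0576 here and 0.0573 on the slide comes from taking the logarithm of the rounded odds 1.059.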

Leaving home

Page 25: Logit model, logistic regression, and log-linear model A comparison

Logit model:

$\text{Logit}(p) = \ln \frac{p}{1-p} = 0.8777 - 0.8201 X$

X = 0 for males

X = 1 for females

Logistic regression: probability of leaving home late

males: $p = \frac{1}{1+\exp[-(0.8777)]} = 0.706 = \frac{178}{252}$

females: $p = \frac{1}{1+\exp[-(0.8777 - 0.8201)]} = 0.514 = \frac{143}{278}$

Leaving home

Page 26: Logit model, logistic regression, and log-linear model A comparison

Table: Number of young adults leaving home by age and sex

Age      Females   Males   Total
< 20     135       74      209
≥ 20     143       178     321
Total    278       252     530

Are males more likely to leave home early than females?

Dummy coding: reference category: (i) females; (ii) leaving home late

Logit model: $\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\, x_i$

$x_i$ is 0 for females and 1 for males

LOGIT(p) is -0.05757 for females and -0.05757 - 0.8201 = -0.8777 for males

ODDS
Females (reference): exp[-0.05757] = 0.9440 = 135/143
Males: exp[-0.8777] = 0.4157 = 74/178

ODDS RATIO
ODDS males / ODDS females = exp[-0.8201] = 0.4404 = 0.4157/0.9440

Leaving home

Page 27: Logit model, logistic regression, and log-linear model A comparison

Logistic regression

Dummy coding: ref. cat.: females, late

$\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.05757 - 0.8201\, x_i$

$p_f = \frac{1}{1+\exp[-(-0.05757)]} = 0.486$

$p_m = \frac{1}{1+\exp[-(-0.05757 - 0.8201)]} = 0.294$

Effect coding or marginal coding: females +1; males -1

$\text{Logit}(p_i) = \ln \frac{p_i}{1-p_i} = \beta_0 + \beta_1 x_i = -0.4676 + 0.4101\, x_i$

$x_i$ is 1 for females and -1 for males

Logit(p) is -0.4676 + 0.4101 = -0.0576 for females and -0.4676 + 0.4101 * (-1) = -0.8777 for males
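Dummy coding and effect coding are two ways of writing the same pair of logits. As a rough illustration (variable names are assumptions made for this sketch), the parameters of both codings can be derived from the two group logits, with p the probability of leaving home early:

```python
import math

# Logits of leaving home early rather than late, from the observed counts.
logit_f = math.log(135 / 143)     # females
logit_m = math.log(74 / 178)      # males

# Dummy coding: x = 0 for females (reference), x = 1 for males.
b0_dummy = logit_f
b1_dummy = logit_m - logit_f

# Effect coding: x = +1 for females, x = -1 for males.
b0_effect = (logit_f + logit_m) / 2     # average of the two logits
b1_effect = (logit_f - logit_m) / 2     # half the difference

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Both codings give the same probabilities.
p_f = inv_logit(b0_effect + b1_effect * 1)
p_m = inv_logit(b0_effect + b1_effect * -1)

print(round(b0_dummy, 5), round(b1_dummy, 4))
print(round(b0_effect, 4), round(b1_effect, 4))
print(round(p_f, 3), round(p_m, 3))
```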

Leaving home

Page 28: Logit model, logistic regression, and log-linear model A comparison

The logistic regression in SPSS

Micro data and tabulated data

Page 29: Logit model, logistic regression, and log-linear model A comparison

SPSS: Micro-data

• Micro-data: age at leaving home in months

• Crosstabs: Number leaving home by reason (row) and sex (column)

• Create variable: Age in years
• Age = TRUNC[(month-1)/12]

• Create variable: TIMING2 based on MONTH:
• TIMING2 = 1 (early) if month ≤ 240 & reason < 4

• TIMING2 = 2 (late) if month > 240 & reason < 4

• For analysis: select cases that are NOT censored: SELECT CASES with reason < 4
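The derived variables described above can be sketched outside SPSS as well. The Python functions below are hypothetical helpers, not part of the original analysis; they assume that `month` is the age at leaving home in months, that `reason` codes of 4 and above mark censored cases, and that a month of exactly 240 counts as early.

```python
import math

def age_in_years(month):
    """Age in completed years: TRUNC[(month - 1) / 12]."""
    return math.trunc((month - 1) / 12)

def timing2(month, reason):
    """Return 1 (early) or 2 (late); None for censored cases (reason >= 4)."""
    if reason >= 4:
        return None            # censored: excluded from the analysis
    return 1 if month <= 240 else 2

# A few illustrative records: (month, reason).
records = [(228, 1), (240, 2), (241, 3), (300, 4)]
selected = [(m, r) for m, r in records if timing2(m, r) is not None]
timings = [timing2(m, r) for m, r in selected]

print([age_in_years(m) for m, _ in records])
print(timings)
```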

Page 30: Logit model, logistic regression, and log-linear model A comparison

SPSS: tabulated data

• Number of observations: WEIGHT cases (in data)

• No difference between the model for tabulated data and the model for micro-data

Page 31: Logit model, logistic regression, and log-linear model A comparison

The logistic regression in SPSS

SPSS: regression/logistic
Note: Dependent variable: TIMING2 (p = probability of leaving home LATE)

Covariate: sex (CATEGORICAL)

ln[p/(1-p)] = 0.8777 - 0.8201 X with males as reference category
Males coded 0; hence X is 1 for females

OUTPUT SPSS:

---------------------- Variables in the Equation -----------

Variable   B        S.E.    Wald      df   Sig     R        Exp(B)
SEX(1)     -.8201   .1831   20.0598   1    .0000   -.1594   .4404
Constant   .8777    .1383   40.2681   1    .0000
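The columns of this output can be reproduced from the 2x2 table. The sketch below assumes the usual large-sample formulas (standard error of a log odds as the square root of the sum of reciprocal counts, Wald statistic as the squared ratio of estimate to standard error); it is an illustration, not a description of how SPSS computes its output.

```python
import math

# Observed counts: timing by sex.
early_f, early_m = 135, 74
late_f, late_m = 143, 178

# p = probability of leaving home LATE; males are the reference (X = 0).
constant = math.log(late_m / early_m)               # log odds for males
b_sex = math.log(late_f / early_f) - constant       # log odds ratio, females vs males

# Standard errors from reciprocal cell counts.
se_constant = math.sqrt(1 / late_m + 1 / early_m)
se_sex = math.sqrt(1 / early_f + 1 / late_f + 1 / early_m + 1 / late_m)

# Wald statistics and the odds ratio Exp(B).
wald_constant = (constant / se_constant) ** 2
wald_sex = (b_sex / se_sex) ** 2
exp_b = math.exp(b_sex)

print(round(b_sex, 4), round(se_sex, 4), round(wald_sex, 2), round(exp_b, 4))
print(round(constant, 4), round(se_constant, 4), round(wald_constant, 2))
```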

Leaving home

Page 32: Logit model, logistic regression, and log-linear model A comparison

Related models

• Poisson distribution: counts have Poisson distribution (total number not fixed)

• Poisson regression

• Log-linear model: model of count data (log of counts)

• Binomial and multinomial distributions: counts follow multinomial distribution (total number is fixed)

• Logit model: model of proportions [and odds (log of odds)]

• Logistic regression

• Log-rate model: log-linear model with OFFSET (constant term)

Parameters of these models are related

Page 33: Logit model, logistic regression, and log-linear model A comparison

Construct your own logistic regression model

Page 34: Logit model, logistic regression, and log-linear model A comparison

Specify the logistic regression for this observation

• School-leavers: 50% are males and 50% are females

• 70% of school-leavers find a job within a year

• 60% of those who find a job are females

Page 35: Logit model, logistic regression, and log-linear model A comparison

1. Construct table

Table: Duration of job search among school-leavers, by sex

                          Sex
Duration of search   Females   Males   Total
Less than 1 year     42        28      70
1 year and more      8         22      30
Total                50        50      100

84% of females find a job within a year against 56% of males

Page 36: Logit model, logistic regression, and log-linear model A comparison

2. Determine reference categories

• Duration of job search: One year or more

• Sex: Males

Page 37: Logit model, logistic regression, and log-linear model A comparison

3. Odds ratios

• Males (ref. cat.): 28/22 = 1.273

• Females: 42/8 = 5.250

• Odds ratio: 5.250/1.278 = 4.125

Page 38: Logit model, logistic regression, and log-linear model A comparison

Logit model

• p = probability of finding a job within a year

• Logit(p) = ln[p/(1-p)] = a + b x
• with x Sex (0 for males and 1 for females)

– a = ln 1.273 = 0.241
– b = ln 4.125 = 1.417

• Logit model for these data:

logit(p) = 0.241 + 1.417 x
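The whole exercise, from the three percentages to the logit model, can be retraced in a few lines. This is a sketch under the assumption of a notional population of 100 school-leavers, as in the table; the variable names are invented for the example.

```python
import math

n = 100                               # notional number of school-leavers
n_females = n_males = n // 2          # 50% males, 50% females
found = 70 * n // 100                 # 70% find a job within a year
found_females = 60 * found // 100     # 60% of those are females
found_males = found - found_females

# Odds of finding a job within a year rather than later.
odds_males = found_males / (n_males - found_males)
odds_females = found_females / (n_females - found_females)

# Logit model with x = 0 for males (reference) and x = 1 for females.
a = math.log(odds_males)
b = math.log(odds_females / odds_males)

def prob(x):
    """Probability of finding a job within a year for x = 0 or 1."""
    return 1 / (1 + math.exp(-(a + b * x)))

print(found_females, found_males)
print(round(odds_males, 3), round(odds_females, 3),
      round(odds_females / odds_males, 3))
print(round(a, 3), round(b, 3))
print(round(prob(0), 2), round(prob(1), 2))
```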

Page 39: Logit model, logistic regression, and log-linear model A comparison

Logistic regression

• For males: $p = \frac{1}{1+\exp[-(0.241 + 1.417 \cdot 0)]} = 0.56$

• For females: $p = \frac{1}{1+\exp[-(0.241 + 1.417 \cdot 1)]} = 0.84$

• 84% of females find a job within a year against 56% of males

Page 40: Logit model, logistic regression, and log-linear model A comparison

Confidence interval

• S.e. saturated model:

– s.e. of a [0.2412] = $\sqrt{\frac{1}{28} + \frac{1}{22}} = 0.2849$

– s.e. of b [1.417] = $\sqrt{\frac{1}{28} + \frac{1}{22} + \frac{1}{42} + \frac{1}{8}} = 0.4796$

Page 41: Logit model, logistic regression, and log-linear model A comparison

Confidence interval

• S.e. null model:

– s.e. of ln[0.7/(1-0.7)] = s.e. of 0.8473 = $\sqrt{\frac{1}{70} + \frac{1}{30}} = 0.2182$

• Conf. interval: 0.8473 +/- 1.96 * 0.2182

(0.420, 1.275) on logit scale

or (0.603, 0.782) on probability scale

• The p for males and females are significantly different
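The standard errors and the interval for the school-leaver example follow the same recipe as for the leaving-home data. A short sketch, assuming the reciprocal-count formula for the standard error of a log odds and a 1.96 multiplier (the helper `inv_logit` is a name chosen here):

```python
import math

def inv_logit(x):
    return 1 / (1 + math.exp(-x))

# Saturated model: a = log odds for males, b = log odds ratio.
se_a = math.sqrt(1 / 28 + 1 / 22)
se_b = math.sqrt(1 / 28 + 1 / 22 + 1 / 42 + 1 / 8)

# Null model: one logit for all school-leavers (70 found a job, 30 did not).
logit_null = math.log(70 / 30)
se_null = math.sqrt(1 / 70 + 1 / 30)

lower = logit_null - 1.96 * se_null
upper = logit_null + 1.96 * se_null
p_lower, p_upper = inv_logit(lower), inv_logit(upper)

print(round(se_a, 4), round(se_b, 4))
print(round(logit_null, 4), round(se_null, 4))
print(round(lower, 3), round(upper, 3))
print(round(p_lower, 3), round(p_upper, 3))
```

The group proportions 0.56 (males) and 0.84 (females) can be compared with the interval on the probability scale.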

Page 42: Logit model, logistic regression, and log-linear model A comparison

SPSS output: logistic regression

Parameters of logistic regression

p = probability that duration of search is more than one year

Simple coding (SPSS): reference categories:

• Dependent variable: timing: early

• Factor: sex: males

Parameters

Variable   B         S.E.     Wald     df   Sig (p-value)   R
SEX(1)     -1.4168   0.4795   8.7297   1    0.0031          -0.2347
Constant   -0.2412   0.2849   0.7165   1    0.3973

Page 43: Logit model, logistic regression, and log-linear model A comparison

SPSS output: logistic regression

Parameters of logistic regression

p = probability that duration of search is more than one year

Deviation coding (SPSS):

• Dependent variable: timing: early

• Factor: females (+1); males (-1)

Parameters

Variable   B         S.E.     Wald      df   Sig (p-value)   R
SEX(1)     -0.7084   0.2398   8.7297    1    0.0031          -0.2347
Constant   -0.9496   0.2398   15.6849   1    0.0001

Page 44: Logit model, logistic regression, and log-linear model A comparison

SPSS and GLIM: a comparison

TIMING2 * SEX Crosstabulation

Count

                 SEX
TIMING2   Females   Males   Total
Early     135       74      209
Late      143       178     321
Total     278       252     530

Page 45: Logit model, logistic regression, and log-linear model A comparison

SPSS: UNSATURATED LOG-LINEAR MODEL: Parameter Estimates

                                              Asymptotic 95% CI
Parameter     Estimate   SE      Z-value   Lower   Upper
          1   5.0280     .0721   69.75     4.89    5.17
SEX(1)    2   .0982      .0870   1.13      -.07    .27
          3   .0000      .       .         .       .
TIMI(1)   4   -.4291     .0889   -4.83     -.60    -.25
          5   .0000      .       .         .       .

GLIM: UNSATURATED LOG-LINEAR MODEL

      estimate   s.e.      parameter
[o] 1  4.697     0.08058   1
[o] 2  0.4291    0.08887   TIMI(2)
[o] 3  -0.09819  0.08697   SEX(2)
[o] scale parameter taken as 1.000
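SPSS and GLIM report different numbers because they use different reference categories, yet both describe the same fitted table. The sketch below assumes that the SPSS estimate -0.4291 is the contrast 'early versus late' and 0.0982 the contrast 'females versus males' (last categories as reference), which is what the marginal totals imply, while GLIM takes the first categories as reference; the function names are illustrative.

```python
import math

# SPSS estimates: reference categories are late and males (assumed).
spss_const, spss_early, spss_female = 5.0280, -0.4291, 0.0982

# GLIM estimates: reference categories are early and females.
glim_const, glim_late, glim_male = 4.697, 0.4291, -0.09819

def spss_fit(early, female):
    """Expected count under the SPSS parametrisation (0/1 indicators)."""
    return math.exp(spss_const + spss_early * early + spss_female * female)

def glim_fit(late, male):
    """Expected count under the GLIM parametrisation (0/1 indicators)."""
    return math.exp(glim_const + glim_late * late + glim_male * male)

# Compare the four cells: (early, female) indicators.
cells = [(1, 1), (1, 0), (0, 1), (0, 0)]
pairs = [(spss_fit(e, f), glim_fit(1 - e, 1 - f)) for e, f in cells]

for (e, f), (s, g) in zip(cells, pairs):
    print(e, f, round(s, 1), round(g, 1))
```

The constants differ because SPSS reports the log of the expected count in the (late, males) cell and GLIM the log of the expected count in the (early, females) cell.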

Page 46: Logit model, logistic regression, and log-linear model A comparison

SPSS: SATURATED MODEL

                                               Asymptotic 95% CI
Parameter      Estimate   SE      Z-value   Lower   Upper
           1   5.1846     .0748   69.27     5.04    5.33
SEX(1)     2   -.2183     .1121   -1.95     -.44    1.497E-03
           3   .0000      .       .         .       .
TIMI(1)    4   -.8738     .1379   -6.33     -1.14   -.60
           5   .0000      .       .         .       .
TIMI*SEX   6   .8164      .1827   4.47      .46     1.17
           7   .0000      .       .         .       .
           8   .0000      .       .         .       .
           9   .0000      .       .         .

GLIM: SATURATED MODEL

d e$
      estimate   s.e.      parameter
[o] 1  4.905     0.08607   1
[o] 2  0.05757   0.1200    TIMI(2)
[o] 3  -0.6012   0.1446    SEX(2)
[o] 4  0.8201    0.1831    TIMI(2).SEX(2)
[o] scale parameter taken as 1.000