multivariate probit

1

Multivariate Probit

An Analysis of Access to Amenities in Delhi’s Slums

2

Example: Coalmining and Respiratory Symptoms

Ashford and Sowden, Biometrics, 1970

Is there a relationship? Best model?

Standard approach: two probit equations

1. Wheezing and years in mine (age)

2. Breathlessness and years in mine (age)

3. Does this approach overlook anything?

3

Coalmining and Respiratory Symptoms

Each physiological system has a certain tolerance: tolerance vector

Ashford and Sowden: estimating the equations separately ignores important information

Model in the spirit of Seemingly Unrelated Regression (SUR)

4

Multivariate Probit (MVP)

Extension of the univariate probit (UVP) model

Allows for:

1. Simultaneous estimation of multiple probit equations

2. Correlated disturbances across equations

5

MVP vs. UVP

How is it better?

Does not ignore information across equations

Better prediction of conditional and joint probabilities

More consistent estimation

More efficient estimation

6

Why MVP?

Access to amenities in Delhi slums

Correlation between access to sanitation services and access to drainage for a given household

Separate UVP estimation would ignore this

7

Where we’re going…

I. Univariate Probit

II. Bivariate Probit (i = 1, 2)

III. Multivariate Probit (i = 1, …, T)

IV. Delhi Slum Dwellers’ Access to Amenities

V. Published Applications and Extensions

8

Univariate Probit

Dichotomous dependent variable

Takes on a value of either 0 or 1

Estimate with OLS?

9

Linear Probability Model

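$$\Pr(Y = 1 \mid \mathbf{x}) = F(\mathbf{x}, \boldsymbol{\beta})$$

$$\Pr(Y = 0 \mid \mathbf{x}) = 1 - F(\mathbf{x}, \boldsymbol{\beta})$$

$$F(\mathbf{x}, \boldsymbol{\beta}) = \mathbf{x}'\boldsymbol{\beta}$$

$$Y = \mathbf{x}'\boldsymbol{\beta} + \varepsilon$$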

10

Linear Probability Model

Shortcomings

Cannot constrain probabilities to the 0-1 interval

Negative variances

Heteroscedasticity of ε that depends on X

Logically not attractive
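A small numerical illustration of the first shortcoming may help. The sketch below fits the linear probability model by OLS to made-up data (the `income` and `bought` values and the helper `ols_line` are hypothetical, not taken from the slides) and shows that the fitted "probabilities" at the extremes fall outside the 0-1 interval.

```python
# Hypothetical data: income (in thousands) and a 0/1 purchase indicator.
income = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
bought = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1]


def ols_line(x, y):
    """Return (intercept, slope) of the least-squares line y = a + b * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


intercept, slope = ols_line(income, bought)
fitted = [intercept + slope * xi for xi in income]

# The lowest fitted value is below 0 and the highest is above 1,
# so neither can be read as a probability.
print(min(fitted), max(fitted))
```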

Solutions?

11

Normal Cumulative Distribution Function

Properties

Bounded by 0 and 1

Nonlinear relationship between P and X

12

Univariate Probit Model

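$$\Pr(y^* > 0 \mid \mathbf{x}) = \Pr(Y = 1 \mid \mathbf{x}) = \int_{-\infty}^{\mathbf{x}'\boldsymbol{\beta}} \phi(t)\,dt = \Phi(\mathbf{x}'\boldsymbol{\beta})$$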

y*: latent dependent variable

Y: binary dependent variable

x: vector of explanatory variables

t: standardized normal variable

φ: normal pdf

β: measures impact of changes in x

Φ: normal cdf
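As a rough illustration of these definitions, the sketch below evaluates Φ(x'β) and the implied marginal effects φ(x'β)β using Python's standard library. The covariate vector and coefficients are hypothetical values chosen so that the index x'β is zero; the function names are illustrative only.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def probit_probability(x, beta):
    """Pr(Y = 1 | x) = Phi(x'beta)."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    return STD_NORMAL.cdf(index)


def marginal_effects(x, beta):
    """Derivative of Phi(x'beta) with respect to each x_k: phi(x'beta) * beta_k."""
    index = sum(xk * bk for xk, bk in zip(x, beta))
    density = STD_NORMAL.pdf(index)
    return [density * bk for bk in beta]


# Hypothetical values: a constant and one regressor, so x'beta = -1 + 0.5 * 2 = 0.
x = [1.0, 2.0]
beta = [-1.0, 0.5]

prob = probit_probability(x, beta)
effects = marginal_effects(x, beta)
print(prob, effects)
```

Because the effect of x on the probability is scaled by φ(x'β), β by itself gives the direction of the impact but not its size.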

13

UVP Example

Y = 1: individual purchased refrigerator in last year

Y = 0: individual has not purchased refrigerator in last year

X = individual’s income per annum

$$\Pr(Y = 1 \mid X) = \Phi(\beta_1 + \beta_2 X)$$

14

UVP: Mechanics

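1. Group the data by the RHS variable (income)

2. Calculate $\hat{P}_i$, the observed share with Y = 1, for each income group i

3. For each $\hat{P}_i$, use the inverse of the standard normal cdf to find $I_i = \Phi^{-1}(\hat{P}_i)$

4. Add 5 to each $I_i$ (the shifted values are the probits)

5. Use OLS to estimate β1, β2 in:

$$I_i = \beta_1 + \beta_2 X_i + u_i$$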
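The five steps can be sketched in a few lines of Python. The grouped data below are hypothetical (seven income groups of 1,000 individuals each), and `grouped_probit` is an illustrative name rather than anything from the slides. Note that regressing the shifted values (I + 5) on income leaves the slope unchanged and only raises the intercept by 5.

```python
from statistics import NormalDist


def grouped_probit(income, group_size, buyers):
    """Grouped-data probit via OLS; returns (intercept, slope) on the probit (I + 5) scale."""
    std_normal = NormalDist()
    # Step 2: relative frequency of Y = 1 in each income group.
    p_hat = [b / n for b, n in zip(buyers, group_size)]
    # Step 3: invert the standard normal cdf (requires 0 < p_hat < 1).
    index = [std_normal.inv_cdf(p) for p in p_hat]
    # Step 4: add 5 to obtain the probits.
    probits = [v + 5 for v in index]
    # Step 5: OLS of the probits on income.
    n = len(income)
    mean_x = sum(income) / n
    mean_y = sum(probits) / n
    sxx = sum((x - mean_x) ** 2 for x in income)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(income, probits))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Hypothetical grouped data (step 1 is assumed already done).
income = [10, 15, 20, 25, 30, 35, 40]
group_size = [1000] * 7
buyers = [159, 309, 500, 691, 841, 933, 977]

intercept, slope = grouped_probit(income, group_size, buyers)
print(intercept - 5, slope)  # subtracting 5 recovers the intercept on the index scale
```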

15

Two probit equations

Y1 = 1: individual purchased refrigerator in last year, 0 otherwise

Y2 = 1: individual purchased dishwasher in last year, 0 otherwise

X = individual’s income per annum

17

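BVP: Estimation

Maximum Likelihood

Bivariate Normal cdf:

$$\Pr(X_1 < x_1, X_2 < x_2) = \int_{-\infty}^{x_2} \int_{-\infty}^{x_1} \phi_2(z_1, z_2, \rho)\, dz_1\, dz_2$$

where φ2 represents the bivariate normal pdf and:

$$\phi_2(x_1, x_2, \rho) = \frac{\exp\left[-\tfrac{1}{2}\left(x_1^2 + x_2^2 - 2\rho x_1 x_2\right)/(1 - \rho^2)\right]}{2\pi (1 - \rho^2)^{1/2}}$$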

18

BVP: Estimation

Probabilities that enter the likelihood function:

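$$\Pr(Y_1 = y_{i1}, Y_2 = y_{i2} \mid \mathbf{x}_1, \mathbf{x}_2) = \Phi_2(w_{i1}, w_{i2}, \rho_i^*)$$

Φ2: bivariate normal cdf

$$q_{ij} = 2y_{ij} - 1, \qquad w_{ij} = q_{ij}\,\mathbf{x}_{ij}'\boldsymbol{\beta}_j, \quad j = 1, 2, \qquad \rho_i^* = q_{i1} q_{i2}\,\rho$$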

19

BVP: Estimation

Function to be maximized:

$$\log L = \sum_{i=1}^{n} \ln \Phi_2(w_{i1}, w_{i2}, \rho_i^*)$$
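To make the log-likelihood concrete, here is a minimal sketch that evaluates it for given index values x'β and a given ρ, using q = 2y - 1, w = q·x'β and ρ* = q1·q2·ρ. It is not how estimation software computes Φ2: as an assumption for illustration, the bivariate normal cdf is obtained by Simpson's rule on the one-dimensional integral of φ(z)·Φ((b - ρz)/√(1 - ρ²)) over z ≤ a. The names `bvn_cdf` and `bvp_loglik` are hypothetical.

```python
from math import exp, log, sqrt
from statistics import NormalDist

STD_NORMAL = NormalDist()


def bvn_cdf(a, b, rho, steps=2000):
    """Phi_2(a, b, rho) for |rho| < 1, by Simpson's rule (steps must be even)."""
    lower = -8.0  # the standard normal mass below -8 is negligible
    if a <= lower:
        return 0.0
    scale = sqrt(1.0 - rho * rho)

    def integrand(z):
        return STD_NORMAL.pdf(z) * STD_NORMAL.cdf((b - rho * z) / scale)

    h = (a - lower) / steps
    total = integrand(lower) + integrand(a)
    for k in range(1, steps):
        total += (4 if k % 2 else 2) * integrand(lower + k * h)
    return total * h / 3.0


def bvp_loglik(y1, y2, xb1, xb2, rho):
    """Sum over observations of ln Phi_2(w_i1, w_i2, rho_i*)."""
    loglik = 0.0
    for a, b, index1, index2 in zip(y1, y2, xb1, xb2):
        q1, q2 = 2 * a - 1, 2 * b - 1
        loglik += log(bvn_cdf(q1 * index1, q2 * index2, q1 * q2 * rho))
    return loglik


# Hypothetical single observation: probability of (Y1 = 1, Y2 = 0).
print(exp(bvp_loglik([1], [0], [0.3], [-0.2], 0.4)))
```

A useful sanity check on the sign conventions is that the four outcome probabilities for one observation sum to one.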

20

BVP: Estimation in Practice

Simulated Maximum Likelihood

1. Markov chain Monte Carlo

2. GHK simulator

Geweke-Hajivassiliou-Keane smooth recursive conditioning simulator

Greene (2003) discusses this in Appendix E

Cappellari and Jenkins (2003)
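The recursive conditioning idea behind the GHK simulator can be sketched for the two-equation case. This is a simplified illustration under stated assumptions, not the implementation in Greene (2003) or Cappellari and Jenkins (2003): it simulates Pr(ε1 < a1, ε2 < a2) for standard normal errors with correlation ρ by drawing the first error from its truncated distribution and averaging the conditional probability of the second. The function name, draw count and seed are hypothetical.

```python
import random
from math import sqrt
from statistics import NormalDist


def ghk_bivariate(a1, a2, rho, draws=20000, seed=0):
    """Simulate Pr(e1 < a1, e2 < a2) for standard bivariate normal errors, |rho| < 1."""
    std_normal = NormalDist()
    rng = random.Random(seed)
    p1 = std_normal.cdf(a1)          # Pr(e1 < a1), computed exactly
    scale = sqrt(1.0 - rho * rho)    # second diagonal element of the Cholesky factor
    total = 0.0
    for _ in range(draws):
        # Draw e1 from N(0, 1) truncated to (-inf, a1) by inverting the cdf.
        u = max(rng.random(), 1e-12)  # guard against a zero draw
        e1 = std_normal.inv_cdf(u * p1)
        # Conditional on e1, e2 < a2 requires the independent shock to lie
        # below (a2 - rho * e1) / scale.
        total += p1 * std_normal.cdf((a2 - rho * e1) / scale)
    return total / draws


print(ghk_bivariate(0.0, 0.0, 0.5))
```

Each simulated term is a product of probabilities, so the simulated value is a smooth function of the parameters, which is what makes the approach convenient inside a maximum likelihood routine.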

21

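BVP: Is it Necessary?

H0: ρ = 0 (estimate independent probit equations separately)

Test statistic (Kiefer 1982):

$$LM = \frac{\left[\sum_{i=1}^{n} q_{i1} q_{i2} \dfrac{\phi(w_{i1})\,\phi(w_{i2})}{\Phi(w_{i1})\,\Phi(w_{i2})}\right]^2}{\sum_{i=1}^{n} \dfrac{\left[\phi(w_{i1})\,\phi(w_{i2})\right]^2}{\Phi(w_{i1})\,\Phi(-w_{i1})\,\Phi(w_{i2})\,\Phi(-w_{i2})}}$$

LM ~ χ2 with d.f. = (T)(T-1)/2 where T = # of equations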
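A minimal sketch of the LM statistic for the two-equation case follows, assuming the index values x'β come from the two separately estimated univariate probits (the function name and the example inputs are hypothetical). It uses q = 2y - 1 and w = q·x'β, with the squared sum of q1·q2·φφ/ΦΦ in the numerator and the sum of (φφ)²/(Φ(w1)Φ(-w1)Φ(w2)Φ(-w2)) in the denominator.

```python
from statistics import NormalDist


def kiefer_lm(y1, y2, xb1, xb2):
    """LM statistic for H0: rho = 0 in a bivariate probit."""
    std_normal = NormalDist()
    numerator_sum = 0.0
    denominator = 0.0
    for a, b, index1, index2 in zip(y1, y2, xb1, xb2):
        q1, q2 = 2 * a - 1, 2 * b - 1
        w1, w2 = q1 * index1, q2 * index2
        pdf_product = std_normal.pdf(w1) * std_normal.pdf(w2)
        numerator_sum += q1 * q2 * pdf_product / (std_normal.cdf(w1) * std_normal.cdf(w2))
        denominator += pdf_product ** 2 / (
            std_normal.cdf(w1) * std_normal.cdf(-w1)
            * std_normal.cdf(w2) * std_normal.cdf(-w2)
        )
    return numerator_sum ** 2 / denominator


# Hypothetical outcomes and fitted index values for four observations.
lm = kiefer_lm([1, 0, 1, 1], [1, 0, 0, 1], [0.4, -0.3, 0.1, 0.8], [0.2, -0.5, 0.3, 0.6])
print(lm)  # compare with the chi-squared(1) critical value, 3.84 at the 5% level
```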

22

BVP: More Test Statistics

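z-statistic:

$$z = \frac{\hat{\rho}}{s_{\hat{\rho}}}$$

Under H0: ρ = 0, z is asymptotically standard normal, so z² ~ χ²(1)

Likelihood ratio:

$$LR = 2\left(\ln \hat{L}_U - \ln \hat{L}_R\right) \sim \chi^2(j)$$

j = number of restrictions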

23

BVP: Properties of the Estimator

Considers unobservable heterogeneity

Random components of one equation are allowed to be freely correlated with the random components of the other

Takes into account unobservable characteristics that might affect both dependent variables

More efficient and consistent than separate ML estimation of UVP models

UVP does not account for the correlation between error terms: it assumes exogeneity of the covariates of the dependent variables, so it does not give consistent estimates of the parameters (Maddala 1983)

24

BVP: Measure of Goodness of Fit

McFadden’s likelihood ratio index (LRI):

$$LRI = 1 - \frac{\ln L}{\ln L_0}$$

lnL: maximized value of log-likelihood function for specification at hand

lnL0: maximized value of log-likelihood function calculated with only a constant term

Bounded by 0 and 1; increases as fit improves

25

Multivariate Probit

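$$y_i^* = \mathbf{x}'\boldsymbol{\beta}_i + \varepsilon_i$$

yi = 1 if yi* > 0, 0 otherwise, i = 1,…,T

$$(\varepsilon_1, \ldots, \varepsilon_T)' \sim MVN[\mathbf{0}, \Sigma]$$

where

$$\Sigma = \begin{pmatrix} \rho_{11} & \cdots & \rho_{1T} \\ \vdots & \ddots & \vdots \\ \rho_{T1} & \cdots & \rho_{TT} \end{pmatrix}$$

and ρii = 1, ρij = ρji for i, j = 1, …, T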
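One way to see what this structure implies is to simulate from it. The sketch below draws correlated standard normal errors through a Cholesky factor of the correlation matrix and maps each latent y* to a 0/1 outcome. The correlation values, index values and function names are all hypothetical; this is a data-generating illustration, not an estimator.

```python
import random
from math import sqrt


def cholesky(matrix):
    """Lower-triangular factor of a symmetric positive definite matrix."""
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            partial = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = sqrt(matrix[i][i] - partial)
            else:
                lower[i][j] = (matrix[i][j] - partial) / lower[j][j]
    return lower


def simulate_mvp(indices, corr, n_draws, seed=0):
    """Draw binary outcome vectors y_i = 1[x'beta_i + eps_i > 0] with correlated eps."""
    rng = random.Random(seed)
    lower = cholesky(corr)
    t = len(indices)
    outcomes = []
    for _ in range(n_draws):
        z = [rng.gauss(0.0, 1.0) for _ in range(t)]
        eps = [sum(lower[i][k] * z[k] for k in range(i + 1)) for i in range(t)]
        outcomes.append([1 if indices[i] + eps[i] > 0 else 0 for i in range(t)])
    return outcomes


# Hypothetical three-equation example with unit diagonal and symmetric correlations.
corr = [[1.0, 0.5, 0.2],
        [0.5, 1.0, 0.3],
        [0.2, 0.3, 1.0]]
draws = simulate_mvp([0.0, 0.0, 0.0], corr, 20000, seed=1)
share_both = sum(1 for d in draws if d[0] == 1 and d[1] == 1) / len(draws)
print(share_both)  # above 0.25, the value implied by independent equations
```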

26

MVP Application: Delhi Slums

Delhi slum dwellers’ access to amenities

27

Delhi Slums

Tiebout sorting (Charles Tiebout 1956)

Individuals sort themselves into communities based on their preferences for the provision of public goods

Assumptions

1. Unlimited mobility

2. Unrestricted number of communities

Implication: Heterogeneous preferences

28

Delhi Slums

Heterogeneity in community composition

Impact on economic outcomes

1. Reduced participation to secure community grants in US (Vigdor 2004)

2. Decreased maintenance of infrastructure projects in Pakistan (Khwaja 2001)

3. Less spending on education, sewers, roads in US (Alesina et al 1999)

4. Slower growth in Sub-Saharan Africa (Easterly and Levine 1997)

Channels?

29

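Delhi Slums: Model (Alesina et al 1999)

Model:

$$g^* = \left[\alpha\left(1 - \hat{l}_i^m\right)\right]^{1/(1-\alpha)}$$

g*: amount of public good provided in equilibrium

$\hat{l}_i^m$: median distance from the type of public good most preferred by the median voter

α: parameter from individual’s utility function (0<α<1)

Punchline: g* and $\hat{l}_i^m$ are inversely related:

$$\frac{\partial g^*}{\partial \hat{l}_i^m} = -\frac{\alpha}{1-\alpha}\left[\alpha\left(1 - \hat{l}_i^m\right)\right]^{\alpha/(1-\alpha)} < 0$$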

30

Delhi Slums: H0 and H1

H0: Public goods provision is not affected by the degree to which preferences are polarized

H1: Public goods provision is negatively affected by polarization of preferences

31

Delhi Slums: Data

TABLE I
Summary Statistics of Slum Data

Variable name                             Mean    Median  Min.   Max.   Std. dev.
Religious fractionalization               0.22    0.24    0      0.43   0.11
Caste fractionalization                   0.68    0.69    0.44   0.73   0.04
Access to medical facility (DV)           0.55    1       0      1      0.50
Access to sanitation services (DV)        0.70    1       0      1      0.46
Access to drainage system (DV)            0.79    1       0      1      0.41
Household PC income (monthly in Rupees)   841     686     186    5947   616
Participation in political process (DV)   0.78    1       0      1      0.42
Years in community of head of HH          16      16      0      84     0.42
Mean to median income ratio               1.14    1.14    0.98   1.24   0.08

32

Delhi Slums: Public Goods

Provision of public goods is a latent variable

Proxy with access to public goods

1. Medical facilities (MED)

2. Sanitation services (SAN)

3. Drainage (DRA)

33

Delhi Slums: Fractionalization

Proxies for Fractionalization

Religion

1. Hindu

2. Muslim

3. Sikh

Caste

1. Backward castes and tribes

2. Scheduled castes and tribes

3. General Hindu

4. Muslim, Sikh, other
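The slides do not state how the fractionalization measures are computed from these groups. A common choice, and the one assumed in this sketch, is the index used by Alesina et al. (1999): one minus the sum of squared group shares, which is the probability that two randomly drawn members of a community belong to different groups. The function name and the example community are hypothetical.

```python
from collections import Counter


def fractionalization(group_labels):
    """1 - sum of squared group shares (assumed index; not stated in the slides)."""
    counts = Counter(group_labels)
    total = len(group_labels)
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical community of ten households classified by religion.
community = ["Hindu"] * 7 + ["Muslim"] * 2 + ["Sikh"]
print(fractionalization(community))
```

Under this assumed definition the index is 0 for a homogeneous community and approaches 1 - 1/K when K groups are equally sized, which would be in line with the maxima reported in Table I for three religious and four caste groups.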

34

Delhi Slums: Econometric Model

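$$y_a^* = \mathbf{X}'\boldsymbol{\beta}_a + \varepsilon_a, \quad a = M, S, D$$

(M: medical facilities, S: sanitation services, D: drainage)

$y_{ai}^*$: amount of public good a (latent) accessible by slum dweller i

Map to the observed realizations: 1 represents access, 0 otherwise

$y_{ai} = 1$ if $y_{ai}^* > 0$, 0 otherwise

Lose information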

35

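Delhi Slums: Econometric Model

Assumption:

$$(\varepsilon_M, \varepsilon_S, \varepsilon_D)' \sim MVN[\mathbf{0}, \Sigma]$$

where

$$\Sigma = \begin{pmatrix} 1 & \rho_{MS} & \rho_{MD} \\ \rho_{MS} & 1 & \rho_{SD} \\ \rho_{MD} & \rho_{SD} & 1 \end{pmatrix}$$

UVP:

$$\rho_{MS} = \rho_{MD} = \rho_{SD} = 0$$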

36

Delhi Slums: Econometric Model

X vector:

Religious fractionalization (frd)

Caste fractionalization (fcd)

Per capita household income (pcinc)

Education dummies (edu1, edu2, edu3)

Mean-to-median income ratio (mminrat)

Poverty dummy (poor)

Political participation dummy (political)

Years in community (yrincomm)

Proxies for $\hat{l}_i^m$

37

TABLE II
Dependent Variables are Access to the Specified Amenity

               Medical Facilities    Sanitation Services   Drainage
RHS variables  MVP        UVP        MVP        UVP        MVP        UVP
REL            -1.795     -1.712     -5.637     -5.03      -1.596     -1.464
               (1.38)     (1.28)     (3.45)**   (3.11)**   (1.15)     (0.96)
CASTE          -5.615     -5.546     16.345     15.702     9.837      9.104
               (1.87)     (1.80)     (3.49)**   (3.36)**   (3.36)**   (2.81)**
PCINC          0          0          0          0          0          0
               (0.34)     (0.67)     (0.11)     (0.16)     (0.33)     (0.20)
EDU1           0.206      0.238      0.225      0.278      0.186      0.158
               (0.89)     (1.02)     (0.97)     (1.19)     (0.81)     (0.65)
EDU2           -0.044     -0.034     0.835      0.891      0.818      0.707
               (0.15)     (0.12)     (2.38)*    (2.50)*    (2.19)*    (1.86)
EDU3           0.221      0.256      0.273      0.2        3.66       n/a†
               (0.43)     (0.49)     (0.51)     (0.37)     (0.03)
MMINRAT        4.487      4.545      2.558      2.456      0.297      0.487
               (3.47)**   (3.47)**   (1.65)     (1.57)     (0.20)     (0.31)
POOR           -0.246     -0.241     0.322      0.336      0.563      0.453
               (1.13)     (1.09)     (1.34)     (1.36)     (2.10)*    (1.65)
POLITICAL      0.537      0.532      0.258      0.219      0.186      0.317
               (2.30)*    (2.26)*    (1.10)     (0.93)     (0.77)     (1.27)
YRINCOMM       0.011      0.016      0.01       0.011      0.004      -0.001
               (1.04)     (1.41)     (1.01)     (1.07)     (0.35)     (0.07)
Constant       -1.391     -1.69      -13.068    -12.607    -6.416     -6.167
               (0.55)     (0.64)     (3.19)**   (3.09)**   (2.37)*    (2.09)*

Correlation Coefficients
               Coeff.     z-stat.
ρ MS           0.42       (3.81)**
ρ MD           -0.25      (1.94)
ρ SD           0.52       (4.92)**

Absolute value of z-statistics in parentheses
* significant at 5%; ** significant at 1%
† edu3 dropped b/c it predicted success perfectly

38

Delhi Slums: MVP vs. UVP

Is MVP necessary?

H0: ρMS = ρMD = ρSD = 0?

Stata reports the LR test statistic = 37.78 ~ χ2(3), so reject H0

Yes, MVP is an improvement on UVP

Does not ignore information contained in covariance matrix

Goodness of fit:

$$LRI = 1 - \frac{-302.32}{-350.33} = 0.137$$
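The arithmetic on this slide can be checked with a short sketch. It takes the reported LR statistic (37.78, 3 degrees of freedom) and the two log-likelihood values from the goodness-of-fit calculation (302.32 and 350.33 in absolute value, entered here as negative numbers since log-likelihoods of a discrete model cannot be positive). The closed-form survival function below is specific to 3 degrees of freedom; the function name is illustrative.

```python
from math import erfc, exp, pi, sqrt


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-squared variable with 3 degrees of freedom."""
    return erfc(sqrt(x / 2.0)) + sqrt(2.0 * x / pi) * exp(-x / 2.0)


lr_stat = 37.78                    # LR statistic reported by Stata
p_value = chi2_sf_3df(lr_stat)     # far below 0.01, so H0 is rejected

log_lik = -302.32                  # ln L for the fitted specification
log_lik_constant_only = -350.33    # ln L0 with only a constant term
lri = 1.0 - log_lik / log_lik_constant_only

print(p_value, round(lri, 3))
```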

39

Future Research

Dependent variable

More direct measure for spending

Panel data

Changes in income v changes in fractionalization

Semiparametric and nonparametric techniques

Horowitz and Savin (2001)

1. Single-index modeling

2. Median regression approach

40

MVP in Practice

Ashford and Sowden (1970)

Zhao, X. and M. Harris (2004) “Demand for Marijuana, Alcohol and Tobacco: Participation, Levels of Consumption and Cross-equation Correlations,” The Economic Record, 80(251): 394-410.

Greene, W. (1998) “Gender Economics Courses in Liberal Arts Colleges: Further Results,” The Journal of Economic Education, 29(4): 291-300.

Christofides, L., T. Stengos, and R. Swidinsky (1997) “Welfare Participation and Labour Market Behavior in Canada,” The Canadian Journal of Economics, 30(3): 595-621.

41

References

Alesina, A., R. Baqir, and W. Easterly (1999) “Public Goods and Ethnic Divisions,” The Quarterly Journal of Economics, 114(4): 1243-1284.

Ashford, J.R. and R.R. Sowden (1970) “Multi-variate Probit Analysis,” Biometrics, September: 535-546.

Cappellari, L. and S.P. Jenkins (2003) “Multivariate probit regression using simulated maximum likelihood,” Stata Journal, 3(3): 221-235.

Easterly, W. and R. Levine (1997) “Africa’s Growth Tragedy: Policies and Ethnic Divisions,” The Quarterly Journal of Economics, 112(4): 1203-1250.

Greene, W. (2003) Econometric Analysis (Fifth Edition), Delhi: Pearson Education.

42

References (cont.)

Horowitz, J. and N.E. Savin (2001) “Binary Response Models: Logits, Probits and Semiparametrics,” Journal of Economic Perspectives, 15(4): 43-56.

Kiefer, N. (1982) “Testing for Dependence in Multivariate Probit Models,” Biometrika, 69(1): 161-166.

Khwaja, A.I. (2001) “Can Good Projects Succeed in Bad Communities? Collective Action in the Himalayas,” John F. Kennedy School of Government Faculty Research Working Paper Series RWP01-043. URL: http://ssrn.com/abstract=295571

Maddala, G.S. (1983) Limited Dependent Variables in Econometrics, Cambridge: Cambridge University Press.

43

References (cont.)

Tiebout, C. (1956) “A Pure Theory of Local Expenditures,” Journal of Political Economy, 64(5): 416-424.

Vigdor, J. (2004) “Community Composition and Collective Action: Analyzing Initial Mail Response to the 2000 Census,” The Review of Economics and Statistics, 86(1): 303-312.
