Non-linear panel data modeling
Laura Magazzini
University of Verona
[email protected]
dse.univr.it/magazzini
May 2010
Laura Magazzini (@univr.it) Non-linear panel data modeling May 2010 1 / 29
Binary models...
In many economic studies, the dependent variable is discrete
. car purchase, labor force participation, default on a loan, ...
Binary choice modeling: $y_{it} = 1$ if the event happens for individual (household, firm, ...) $i$ at time $t$, 0 otherwise
$p_{it} = \Pr(y_{it} = 1) = E(y_{it} \mid x_{it}) = F(x_{it}'\beta)$
For estimation: LPM, logit, probit
... in panel data
The presence of individual effects complicates matters significantly
. LPM also implies $-x_{it}'\beta \le c_i \le 1 - x_{it}'\beta$
In a latent variable framework ($i = 1, ..., N$; $t = 1, ..., T$)
$y^*_{it} = x_{it}'\beta + c_i + u_{it}$
with $y_{it} = 1$ if $y^*_{it} > 0$, $y_{it} = 0$ otherwise
Therefore:
$\Pr(y_{it} = 1) = \Pr(y^*_{it} > 0) = \Pr(u_{it} > -x_{it}'\beta - c_i)$
$= F(x_{it}'\beta + c_i)$ due to the symmetry of the logit and probit error distributions
FE and RE approach
Pr(yit = 1) = F (x ′itβ + ci )
RE approach: $c_i$ is assumed to be unrelated to $x_{it}$. This is a stronger assumption than in the linear case: it also places restrictions on the form of heterogeneity
FE approach: no assumption about the relationship between $c_i$ and $x_{it}$
Modeling framework fraught with difficulties and unconventional estimation problems
The incidental parameter problem
Neyman and Scott (1948)
$\Pr(y_{it} = 1) = F(x_{it}'\beta + c_i)$
If you want to treat $c_i$ as a fixed parameter, then as $N \to \infty$, for fixed $T$, the number of parameters $c_i$ increases with $N$
This means that $c_i$ cannot be consistently estimated for fixed $T$
In the linear case the problem is solved using the within-transformation
. In the linear case the MLE of $\beta$ and $c_i$ are asymptotically independent (Hsiao, 2003)
This is not possible in the non-linear case!
The inconsistency of the estimate of $c_i$ is transmitted to $\beta$ within a FE framework
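The incidental parameter problem
A simple example
Suppose $y_{it} \sim N(c_i, \sigma^2)$
MLE yields:
$\hat{c}_i = \bar{y}_i$ and $\hat{\sigma}^2 = \frac{\sum_{i=1}^{N} \sum_{t=1}^{T} (y_{it} - \bar{y}_i)^2}{NT}$
$E[\hat{\sigma}^2] = \sigma^2 (T - 1)/T$, so $\hat{\sigma}^2$ is inconsistent as $N \to \infty$ for fixed $T$
. With $T = 2$, $\hat{\sigma}^2 \to 0.5\sigma^2$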
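The bias in this example can be checked with a small simulation. The sketch below is only illustrative: the function name, the number of units, the seed and the distribution used to draw the unit means are assumptions, not part of the slides.

```python
import random

def neyman_scott_variance(n_units, n_periods, sigma, seed=0):
    """MLE of sigma^2 when every unit mean c_i is estimated by the unit average."""
    rng = random.Random(seed)
    total_ss = 0.0
    for _ in range(n_units):
        c_i = rng.gauss(0.0, 3.0)  # arbitrary unit-specific mean (assumed for illustration)
        y = [rng.gauss(c_i, sigma) for _ in range(n_periods)]
        y_bar = sum(y) / n_periods  # MLE of c_i
        total_ss += sum((v - y_bar) ** 2 for v in y)
    # Divide by NT, as the MLE does, rather than by N(T - 1).
    return total_ss / (n_units * n_periods)

sigma2_hat = neyman_scott_variance(n_units=20000, n_periods=2, sigma=1.0)
print(sigma2_hat)  # should be close to 0.5 rather than to the true value 1.0
```

Increasing n_units does not move the estimate towards 1: only a larger n_periods does, since the expected value is $\sigma^2 (T - 1)/T$.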
Road map
Pooled probit
Random effect approach
Fixed effect approach
. How serious is the bias?
Alternative approach: max score estimator
Pooled probit or logit (1)
Partial likelihood methods
Max Lik estimation: we assume that the parametric model for the density of $y$ given $x$ is correctly specified
Inference is made under the assumption that observations are i.i.d., i.e. in the case of a panel dataset, the likelihood should be written as
$L(\theta \mid y, x) = \prod_{i=1}^{N} \prod_{t=1}^{T} f(y_{it} \mid x_{it}; \theta)$
This is not suited to the panel data case!
Pooled probit or logit (2)
Partial likelihood methods
Suppose we have correctly specified the density of $y_{it}$ given $x_{it}$: $f_t(y_{it} \mid x_{it}; \theta)$
Define the partial log likelihood of each observation as
$l_i(\theta) = \sum_{t=1}^{T} \ln f_t(y_{it} \mid x_{it}; \theta)$
The partial maximum likelihood estimator (PMLE) solves
$\max_{\theta \in \Theta} \sum_{i=1}^{N} l_i(\theta) = \max_{\theta \in \Theta} \sum_{i=1}^{N} \sum_{t=1}^{T} \ln f_t(y_{it} \mid x_{it}; \theta)$
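For a probit, the partial log likelihood above is a double sum of Bernoulli log densities. The following is a minimal sketch: the function names, the data layout (a list of units, each a list of (y, x) pairs) and the toy numbers are hypothetical.

```python
import math

def norm_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def pooled_probit_loglik(beta, panel):
    """Sum over i and t of ln f(y_it | x_it; beta) for a probit."""
    total = 0.0
    for unit in panel:            # loop over i
        for y_it, x_it in unit:   # loop over t
            index = sum(b * x for b, x in zip(beta, x_it))
            p = norm_cdf(index)
            total += math.log(p) if y_it == 1 else math.log(1.0 - p)
    return total

# Toy panel with N = 2, T = 2; the first regressor is a constant.
panel = [
    [(1, [1.0, 0.5]), (0, [1.0, -0.2])],
    [(0, [1.0, -1.0]), (1, [1.0, 1.5])],
]
ll_zero = pooled_probit_loglik([0.0, 0.0], panel)
ll_slope = pooled_probit_loglik([0.0, 1.0], panel)
```

The PMLE maximizes this function over beta. The point estimate does not require independence over $t$, but the usual standard errors do need an adjustment unless the model is dynamically complete.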
Dynamic completeness
A model is dynamically complete if, once $x_t$ is conditioned on, neither past lags of $y_t$ nor elements of $x$ from any other time period (past or future) appear in the conditional density of $y_t$ given $x_t$
Quite a strong assumption: strict exogeneity + absence of dynamics
$\Pr(y_{it} = 1 \mid x_{it}, y_{it-1}, x_{it-1}, ...) = \Pr(y_{it} = 1 \mid x_{it})$
Inference is considerably easier: all the usual statistics from a probit or logit that pools observations and treats the sample as a long independent cross section of size $NT$ are valid, including likelihood ratio statistics
We are not assuming independence across $t$
. For example, $x_{it}$ can contain lagged dependent variables
DC implies that the scores are serially uncorrelated across $t$ (the key condition for the standard inference procedures to be valid)
Testing dynamic completeness
(1) Add the lagged dependent variable and possibly lagged explanatory variables
(2) Chi-square statistic:
Define $u_{it} = y_{it} - F(x_{it}'\beta)$
Under DC, for each $t$: $E[u_{it} \mid x_{it}, y_{it-1}, x_{it-1}, ...] = 0$, i.e. $u_{it}$ is uncorrelated with any function of the variables $(x_{it}, y_{it-1}, x_{it-1}, ...)$, including $u_{it-1}$
Let $\hat{u}_{it} = y_{it} - F(x_{it}'\hat{\beta})$. A simple test is available by using pooled data to estimate the artificial model
$\Pr(y_{it} = 1 \mid x_{it}, \hat{u}_{it-1}) = F(x_{it}'\beta + \gamma \hat{u}_{it-1})$
The null hypothesis is $H_0$: $\gamma = 0$
Random effect probit approach (1)
$y^*_{it} = x_{it}'\beta + \varepsilon_{it}$ with $y_{it} = 1\{y^*_{it} > 0\}$
We let $\varepsilon_{it} = c_i + u_{it}$ and assume:
. Strict exogeneity assumption: $\Pr(y_{it} = 1 \mid x_i, c_i) = \Pr(y_{it} = 1 \mid x_{it}, c_i)$
. Independence between $c_i$ and $x_{it}$
. Normally distributed error components: $c_i \sim N(0, \sigma_c^2)$ and $u_{it} \sim N(0, \sigma_u^2)$
Since $E(\varepsilon_{it}\varepsilon_{is}) = \sigma_c^2$ for $t \ne s$, the joint likelihood of $(y_{i1}, ..., y_{iT})$ cannot be written as the product of the marginal likelihoods of the $y_{it}$
This complicates derivation of the max lik, which now involves $T$-dimensional integrals
$L_i = \Pr(y_{i1}, ..., y_{iT} \mid x) = \int \cdots \int f(\varepsilon_{i1}, ..., \varepsilon_{iT}) \, d\varepsilon_{i1} \cdots d\varepsilon_{iT}$
. The range of integration for $\varepsilon_{it}$ is $(-\infty, -x_{it}'\beta)$ if $y_{it} = 0$ and $(-x_{it}'\beta, +\infty)$ if $y_{it} = 1$
Random effect probit approach (2)
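Computation of the likelihood function is simplified if we consider the joint density of $\varepsilon_i$ and $c_i$ and then obtain the marginal density of $\varepsilon_i$ by integrating out the individual effect:
$f(\varepsilon_{i1}, ..., \varepsilon_{iT}, c_i) = f(\varepsilon_{i1}, ..., \varepsilon_{iT} \mid c_i) f(c_i)$
Therefore: $f(\varepsilon_{i1}, ..., \varepsilon_{iT}) = \int f(\varepsilon_{i1}, ..., \varepsilon_{iT} \mid c_i) f(c_i) \, dc_i$
Conditional on $c_i$, the $\varepsilon_{it}$ are independent
$f(\varepsilon_{i1}, ..., \varepsilon_{iT}) = \int \prod_{t=1}^{T} f(\varepsilon_{it} \mid c_i) f(c_i) \, dc_i$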
Random effect probit approach (3)
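This simplifies the computation of the likelihood
. Key: lack of autocorrelation over time in $u_{it}$
. Allowing autocorrelation in $u_{it}$: hallmark of simulation methods (Hajivassiliou, 1984)
$L_i = \Pr(y_{i1}, ..., y_{iT} \mid x) = \int \cdots \int f(\varepsilon_{i1}, ..., \varepsilon_{iT}) \, d\varepsilon_{i1} \cdots d\varepsilon_{iT}$
$= \int \cdots \int \left[ \int_{-\infty}^{+\infty} \prod_{t=1}^{T} f(\varepsilon_{it} \mid c_i) f(c_i) \, dc_i \right] d\varepsilon_{i1} \cdots d\varepsilon_{iT}$
Ranges of integration are independent: exchange the order of integration
$= \int_{-\infty}^{+\infty} \left[ \int \cdots \int \prod_{t=1}^{T} f(\varepsilon_{it} \mid c_i) \, d\varepsilon_{i1} \cdots d\varepsilon_{iT} \right] f(c_i) \, dc_i$
Conditional on $c_i$, the error terms are independent
$= \int_{-\infty}^{+\infty} \left[ \prod_{t=1}^{T} \int f(\varepsilon_{it} \mid c_i) \, d\varepsilon_{it} \right] f(c_i) \, dc_i$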
Random effect probit approach (4)
$L_i = \int_{-\infty}^{+\infty} \left[ \prod_{t=1}^{T} \int f(\varepsilon_{it} \mid c_i) \, d\varepsilon_{it} \right] f(c_i) \, dc_i$
$= \int_{-\infty}^{+\infty} \left[ \prod_{t=1}^{T} \Pr(Y_{it} = y_{it} \mid x_{it}'\beta + c_i) \right] f(c_i) \, dc_i$
We are left with a one-dimensional integral!
$\Pr(Y_{it} = y_{it} \mid x_{it}'\beta + c_i) = \Phi(q_{it}(x_{it}'\beta + c_i))$ with $q_{it} = 2y_{it} - 1$
Butler and Moffitt (1982) propose a procedure to approximate the integral under normality of $c_i$ (Gaussian quadrature)
Alternatively, simulated maximum likelihood methods
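The one-dimensional integral can be approximated numerically. The sketch below deliberately uses a plain midpoint rule on a truncated grid, not the Gauss-Hermite quadrature of Butler and Moffitt, because it is easier to read. It assumes $\sigma_u = 1$, and the function names, grid size and truncation width are illustrative choices.

```python
import math

def norm_cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def re_probit_unit_likelihood(y, xb, sigma_c, n_grid=2001, width=8.0):
    """Approximate L_i = int prod_t Phi(q_it (x_it'b + c)) f(c) dc for c ~ N(0, sigma_c^2).

    y  : list of 0/1 outcomes for one unit
    xb : list of index values x_it'beta for the same unit
    The integral is computed with a midpoint rule on [-width*sigma_c, width*sigma_c].
    """
    step = 2.0 * width * sigma_c / n_grid
    total = 0.0
    for k in range(n_grid):
        c = -width * sigma_c + (k + 0.5) * step
        prod = 1.0
        for y_t, xb_t in zip(y, xb):
            q = 2 * y_t - 1
            prod *= norm_cdf(q * (xb_t + c))
        # Density of c at the grid point times the cell width.
        total += prod * norm_pdf(c / sigma_c) / sigma_c * step
    return total

# With T = 1 the integral has a closed form, which gives a simple check.
approx = re_probit_unit_likelihood([1], [0.3], sigma_c=1.0)
exact = norm_cdf(0.3 / math.sqrt(1.0 + 1.0 ** 2))
```

The sample log likelihood is the sum of the logs of these unit contributions. A quadrature rule reaches comparable accuracy with far fewer evaluation points, which is why it is preferred in practice.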
RE – What are we estimating?
$\Pr(y_{it} = 1 \mid x_i, c_i) = \Pr(y_{it} = 1 \mid x_{it}, c_i) = \Phi(x_{it}'\beta + c_i)$
The interest is in the average partial effect
$E\left[ \frac{\partial \Pr(y_{it} = 1 \mid x_i, c_i)}{\partial x_{it(j)}} \right] = \frac{\beta_j}{\sqrt{1 + \sigma_c^2}} \, \phi\left( \frac{x_{it}'\beta}{\sqrt{1 + \sigma_c^2}} \right)$
The traditional random effect probit model assumes $c_i \mid x_i \sim N(0, \sigma_c^2)$. As a result the composite error term of the latent equation has variance $1 + \sigma_c^2$
. Recall APE in the case of neglected heterogeneity (for a continuous $x_{it(j)}$)
. Therefore by pooled probit we can estimate $\beta_c = \beta/(1 + \sigma_c^2)^{1/2}$ and the APE
If we further assume independence of $(y_{i1}, ..., y_{iT})$ conditional on $(x_i, c_i)$ we can separately estimate $\beta$ and $\sigma_c^2$
. $\rho = \sigma_c^2/(1 + \sigma_c^2)$: relative importance of the unobserved effect
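The scaling relations above are simple enough to write down directly. The helper names and the numerical values below are made up for illustration, and the sketch assumes the normalization $\sigma_u^2 = 1$ used on the slide.

```python
import math

def norm_pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def scaled_coefficients(beta, sigma_c2):
    """beta_c = beta / sqrt(1 + sigma_c^2): what pooled probit estimates."""
    scale = math.sqrt(1.0 + sigma_c2)
    return [b / scale for b in beta]

def average_partial_effect(beta, sigma_c2, x, j):
    """APE of continuous regressor j evaluated at covariate vector x."""
    beta_c = scaled_coefficients(beta, sigma_c2)
    index = sum(b * v for b, v in zip(beta_c, x))
    return beta_c[j] * norm_pdf(index)

def rho(sigma_c2):
    """Share of the composite error variance due to the unobserved effect."""
    return sigma_c2 / (1.0 + sigma_c2)

beta_c = scaled_coefficients([2.0], 3.0)              # sqrt(1 + 3) = 2, so beta_c = [1.0]
ape = average_partial_effect([2.0], 3.0, [0.0], j=0)  # 1.0 * phi(0)
```

The APE depends on $\beta$ and $\sigma_c^2$ only through $\beta_c$, which is why pooled probit is enough to recover it.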
RE – allowing for correlation between $c_i$ and $x_i$
Chamberlain (1980)
Chamberlain (1980) allowed for correlation between $c_i$ and $x_i$ under the assumption of a conditional normal distribution with linear expectation and constant variance:
$c_i \mid x_i \sim N(\psi + \bar{x}_i'\xi, \sigma_\alpha^2)$
. The approach allows some dependence of $c_i$ on $x_i$
. In its original formulation, all elements of $x_i$ are included in the conditional distribution
. The proposed formulation, based on the time averages $\bar{x}_i$, is more conservative on parameters
. Known as Chamberlain’s random effect probit model
Chamberlain’s random effect probit model
$c_i \mid x_i \sim N(\psi + \bar{x}_i'\xi, \sigma_\alpha^2)$
We can write
$y^*_{it} = x_{it}'\beta + c_i + u_{it} = x_{it}'\beta + \psi + \bar{x}_i'\xi + a_i + u_{it}$
where $a_i \mid x_i \sim N(0, \sigma_\alpha^2)$ is the part of $c_i$ not explained by $\bar{x}_i$
Estimation is straightforward: we include $\bar{x}_i$ among the regressors of a RE probit model
As in the linear case, it is not possible to estimate the effect of time-invariant variables
Intuitively, we are adding $\bar{x}_i$ as a control for unobserved heterogeneity
A test of the usual RE probit model is easily obtained as a test of $H_0$: $\xi = 0$
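In practice the extra regressors are built before estimation. The sketch below assumes that the variables entering the conditional mean of $c_i$ are the unit-level time averages of the covariates. The function name and the dictionary layout are hypothetical.

```python
def add_unit_means(panel_x):
    """Append each unit's time averages to every row of its covariates.

    panel_x: dict mapping a unit id to a list (over t) of covariate lists x_it.
    Returns a dict with the same keys whose rows are x_it followed by the unit means.
    """
    augmented = {}
    for unit_id, rows in panel_x.items():
        n_periods = len(rows)
        n_vars = len(rows[0])
        means = [sum(row[k] for row in rows) / n_periods for k in range(n_vars)]
        augmented[unit_id] = [list(row) + means for row in rows]
    return augmented

panel_x = {"a": [[1.0, 2.0], [3.0, 6.0]], "b": [[0.0, 1.0], [0.0, 5.0]]}
augmented = add_unit_means(panel_x)
```

The augmented rows would then be passed to a RE probit routine. A time-invariant regressor equals its own time average, so its column is duplicated exactly, which is the collinearity behind the remark on time-invariant variables.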
RE & the strict exogeneity assumption
Wooldridge, 2003 (p. 490)
RE estimation relies on the strict exogeneity assumption
Correcting for an explanatory variable that is not strictly exogenous is quite difficult in nonlinear models (see Wooldridge, 2000)
It is however possible to test for strict exo:
. Let $w_{it}$ denote a variable suspected of failing the strict exogeneity requirement (subset of $x_{it}$)
. A simple test adds $w_{it+1}$ as an additional set of covariates
. If strict exo holds, $w_{it+1}$ should be insignificant
. If the test does not reject, it provides at least some justification for the strict exo assumption
Fixed Effect approach
FE in non-linear model: unsolved problem in econometrics
Incidental parameter problem
If you force estimation (by including dummies), how serious is the bias?
. Consider a logit model with $T = 2$ and one regressor with $x_{i1} = 0$ and $x_{i2} = 1$: $\mathrm{plim}\,\hat{\beta}_{MLE} = 2\beta$ (Hsiao, 2003)
. Simulation experiment by Greene (2004): MLE is biased even for large $T$, although the bias shrinks as $T$ increases. The bias is 100% with $T = 2$, 16% with $T = 10$ and 6.9% with $T = 20$ ($N = 1000$)
. Simulation experiment by Heckman and MaCurdy (1981): the bias is about 10% ($N = 100$; $T = 8$)
. Trade-off between the virtue of FE and the incidental parameter problem (Arellano, 2001)
The problem can be solved in the logit (and Poisson) models
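The $T = 2$ logit result can be reproduced with a short simulation. In this design, concentrating each dummy out of the likelihood gives $\hat{c}_i = -\hat{\beta}/2$ for units whose outcome changes, so the dummy-variable MLE has the closed form $2 \ln(n_{01}/n_{10})$, while the conditional estimator is $\ln(n_{01}/n_{10})$, where $n_{01}$ and $n_{10}$ count the (0, 1) and (1, 0) patterns. The sketch assumes $c_i \sim N(0, 1)$ for the data generating process, and the function name and sample size are illustrative.

```python
import math
import random

def simulate_switchers(n_units, beta, seed=0):
    """Count (0,1) and (1,0) outcome patterns in a T = 2 logit with x_i1 = 0, x_i2 = 1."""
    rng = random.Random(seed)
    n01 = n10 = 0
    for _ in range(n_units):
        c_i = rng.gauss(0.0, 1.0)  # assumed distribution of the individual effect
        p1 = 1.0 / (1.0 + math.exp(-c_i))           # Pr(y_i1 = 1), since x_i1 = 0
        p2 = 1.0 / (1.0 + math.exp(-(beta + c_i)))  # Pr(y_i2 = 1), since x_i2 = 1
        y1 = 1 if rng.random() < p1 else 0
        y2 = 1 if rng.random() < p2 else 0
        if (y1, y2) == (0, 1):
            n01 += 1
        elif (y1, y2) == (1, 0):
            n10 += 1
    return n01, n10

n01, n10 = simulate_switchers(n_units=50000, beta=1.0)
beta_conditional = math.log(n01 / n10)      # consistent for beta
beta_dummies = 2.0 * math.log(n01 / n10)    # MLE with one dummy per unit
print(beta_conditional, beta_dummies)
```

With a true $\beta$ of 1, the first number should be close to 1 and the second close to 2, however large N is.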
Conditional maximum likelihood estimation (1)
For the logit model, Chamberlain (1980) finds that $\sum_{t=1}^{T} y_{it}$ is a minimal sufficient statistic for $c_i$
. Put differently, conditional on $n_i = \sum_{t=1}^{T} y_{it}$, the log-lik does not contain $c_i$, solving the incidental parameter problem
Consider the case $T = 2$
The conditional likelihood can be computed by looking at $L_c = \prod_{i=1}^{N} \Pr(y_{i1}, y_{i2} \mid \sum_{t=1}^{2} y_{it})$
The sum $y_{i1} + y_{i2}$ can be 0, 1, 2
. If $y_{i1} + y_{i2} = 0$, then $y_{i1} = y_{i2} = 0$: $\Pr(0, 0 \mid \text{sum} = 0) = 1$
. If $y_{i1} + y_{i2} = 2$, then $y_{i1} = y_{i2} = 1$: $\Pr(1, 1 \mid \text{sum} = 2) = 1$
. Only units where $y_{i1} + y_{i2} = 1$ will contribute to the log-lik
Conditional maximum likelihood estimation (2)
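$\Pr(0, 1 \mid \text{sum} = 1) = \frac{\Pr(0, 1, \text{sum} = 1)}{\Pr(\text{sum} = 1)} = \frac{\Pr(0, 1)}{\Pr(0, 1) + \Pr(1, 0)}$
Therefore the conditional probability can be written in a form that does not contain $c_i$:
$\Pr(0, 1 \mid 1) = \frac{\Pr_1(0) \times \Pr_2(1)}{\Pr_1(0) \times \Pr_2(1) + \Pr_1(1) \times \Pr_2(0)}$
$= \frac{\frac{1}{1 + \exp(x_{i1}'\beta + c_i)} \cdot \frac{\exp(x_{i2}'\beta + c_i)}{1 + \exp(x_{i2}'\beta + c_i)}}{\frac{1}{1 + \exp(x_{i1}'\beta + c_i)} \cdot \frac{\exp(x_{i2}'\beta + c_i)}{1 + \exp(x_{i2}'\beta + c_i)} + \frac{\exp(x_{i1}'\beta + c_i)}{1 + \exp(x_{i1}'\beta + c_i)} \cdot \frac{1}{1 + \exp(x_{i2}'\beta + c_i)}}$
$= \frac{\exp(x_{i2}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)} = \frac{\exp[(x_{i2} - x_{i1})'\beta]}{1 + \exp[(x_{i2} - x_{i1})'\beta]}$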
Conditional maximum likelihood estimation (3)
Analogously:
$\Pr(1, 0 \mid 1) = \frac{\exp(x_{i1}'\beta)}{\exp(x_{i1}'\beta) + \exp(x_{i2}'\beta)} = \frac{1}{1 + \exp[(x_{i2} - x_{i1})'\beta]}$
A standard logit package can be used for estimation
Only observations where $y_{i1} + y_{i2} = 1$ contribute to the likelihood
Easily generalized to $T > 2$
Test for individual heterogeneity by Hausman’s test comparing the conditional MLE and the usual MLE ignoring the effects
. Conditional lik approach not available with probit
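For $T = 2$ the conditional likelihood is that of an ordinary logit of the indicator "pattern is (0, 1)" on $x_{i2} - x_{i1}$, without a constant and using only units whose outcome changes. The sketch below handles a single scalar regressor with Newton steps. The function name, the tuple layout and the iteration count are assumptions made for illustration.

```python
import math

def conditional_logit_t2(panel, n_iter=25):
    """Conditional ML for a T = 2 logit with one scalar regressor.

    panel: list of tuples (y1, y2, x1, x2). Units with y1 == y2 are dropped
    because their conditional probability equals 1 and carries no information.
    """
    # d = 1 when the pattern is (0, 1); the regressor is the difference x2 - x1.
    data = [(1 if (y1, y2) == (0, 1) else 0, x2 - x1)
            for y1, y2, x1, x2 in panel if y1 + y2 == 1]
    beta = 0.0
    for _ in range(n_iter):
        grad = 0.0
        hess = 0.0
        for d, dx in data:
            p = 1.0 / (1.0 + math.exp(-dx * beta))
            grad += (d - p) * dx             # score of the logit log-lik
            hess -= p * (1.0 - p) * dx * dx  # second derivative (negative)
        beta -= grad / hess                  # Newton step
    return beta

# Three (0,1) switchers, one (1,0) switcher and two non-switchers, all with x2 - x1 = 1.
panel = [(0, 1, 0.0, 1.0)] * 3 + [(1, 0, 0.0, 1.0)] + [(1, 1, 0.0, 1.0), (0, 0, 0.0, 1.0)]
beta_hat = conditional_logit_t2(panel)
```

When every difference equals 1 the estimate reduces to the log odds of the two switching patterns, here ln(3), and the two non-switching units play no role.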
Max score estimator (MSE)
Manski (1975, 1987)
It is possible to relax the logit assumption by generalizing the MSE to panel data
In cross-section let $q_i = 2y_i - 1$ and $\alpha$ a preset quantile
MSE is based on the fitting rule
$\max_{\beta} S(\beta) = \frac{1}{N} \sum_{i=1}^{N} [q_i - (1 - 2\alpha)] \, \mathrm{sgn}(x_i'\beta)$
If $\alpha = 1/2$ then $(1 - 2\alpha) = 0$ and the MSE is computed as
$\max_{\beta} S(\beta) = \frac{1}{N} \sum_{i=1}^{N} q_i \, \mathrm{sgn}(x_i'\beta)$
It maximizes the number of times the predictor $x_i'\beta$ has the same sign as $q_i$ (i.e. it maximizes the number of correct predictions)
Identification condition: $\beta'\beta = 1$
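The score function for $\alpha = 1/2$ is a step function of $\beta$, so gradient methods do not apply. The sketch below evaluates it and, for two regressors only, searches a grid of unit-length vectors $\beta = (\cos a, \sin a)$, which enforces $\beta'\beta = 1$. The grid search is a stand-in chosen for readability rather than a recommended algorithm, and all names and data are hypothetical.

```python
import math

def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)

def max_score_objective(beta, data):
    """S(beta) for alpha = 1/2: average of q_i * sgn(x_i'beta), with q_i = 2 y_i - 1."""
    total = 0
    for y_i, x_i in data:
        q_i = 2 * y_i - 1
        total += q_i * sign(sum(b * x for b, x in zip(beta, x_i)))
    return total / len(data)

def max_score_grid(data, n_angles=360):
    """Grid search over unit-length beta = (cos a, sin a) for two regressors."""
    best = None
    for k in range(n_angles):
        a = 2.0 * math.pi * k / n_angles
        beta = (math.cos(a), math.sin(a))
        s = max_score_objective(beta, data)
        if best is None or s > best[0]:
            best = (s, beta)
    return best

# Toy cross-section: a constant and one regressor, perfectly classified by its sign.
data = [(1, [1.0, 2.0]), (1, [1.0, 0.5]), (0, [1.0, -1.0]), (0, [1.0, -3.0])]
best_score, best_beta = max_score_grid(data)
```

Many values of $\beta$ reach the same score, so the maximizer is typically a set rather than a point in finite samples.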
Manski estimator with panel data
Manski allows for a strictly increasing distribution function which differs across individuals, but not over time for the same individual
Strict exo is still needed (lagged dep vars are ruled out)
For $T = 2$, the identification of $\beta$ is based on the fact that (under regularity conditions on the distribution of exogenous variables)
$\mathrm{sgn}[\Pr(y_{i2} = 1 \mid x_i, c_i) - \Pr(y_{i1} = 1 \mid x_i, c_i)] = \mathrm{sgn}[(x_{i2} - x_{i1})'\beta]$
For panel, MSE can be applied to the differences $\Delta y_{it}$ on $\Delta x_{it}$
Exploit only the observations where $y_{i1} \ne y_{i2}$
Note that there is no likelihood, no information matrix, no s.e.: bootstrap can be employed for computing s.e.
No functional form for $\Pr(y_{it} = 1)$, therefore no marginal effects
Overview of STATA commands
For probit, xtprobit only allows the re approach
. There is no command for a conditional FE model, as there does not exist a sufficient statistic allowing the fixed effects to be conditioned out of the likelihood
. Estimation is slow because the likelihood function is calculated by adaptive Gauss-Hermite quadrature
. Computation time is roughly proportional to the number of points used for the quadrature; the default is intpoints(12)
. Use quadchk to check sensitivity of the quadrature approximation
In the case of xtlogit, both re and fe options can be considered
. fe is conditional fixed-effect (also obtained by clogit)
. re estimates are obtained under the assumption of normality of $c_i$
MSE not implemented (feasible up to 15 coeffs and 1,500-2,000 observations)
Censored regression model
Unobserved effect Tobit model
$y^*_{it} = x_{it}'\beta + c_i + u_{it}$
$y_{it} = \max(0, y^*_{it})$
$u_{it} \mid x_i, c_i \sim N(0, \sigma_u^2)$
Analogous treatment to the probit case
FE approach provides inconsistent estimates
RE – Need to specify the distribution of $c_i \mid x_i$: $c_i \mid x_i \sim N(0, \sigma_c^2)$
Approximation is needed for solving the integral in the “probit” part
Count data and panel data models
Leading ref: Hausman, Hall, Griliches (1984) – developed fixed and random effect models under full distributional assumptions
Pooled Poisson QMLE
Conditional estimation of fixed effect models
. Sufficient statistic: $n_i = \sum_{t=1}^{T} y_{it}$
Random effect approach (Gamma distribution assumed for $c_i$)
. Recent advances in simulation methods allow $c_i \sim N$
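For the conditional fixed effect Poisson model, conditioning on $n_i$ turns the counts of a unit into a multinomial draw with probabilities $\exp(x_{it}'\beta) / \sum_{s} \exp(x_{is}'\beta)$, so a multiplicative individual effect drops out. The sketch below evaluates the resulting log likelihood, up to a constant that does not depend on $\beta$, for a single scalar regressor. The function name and the data layout are illustrative.

```python
import math

def conditional_poisson_loglik(beta, panel):
    """Conditional (on n_i) Poisson log likelihood, up to a constant free of beta.

    panel: list of units; each unit is a list of (y_it, x_it) with scalar x_it.
    """
    total = 0.0
    for unit in panel:
        denom = sum(math.exp(beta * x) for _, x in unit)
        for y, x in unit:
            # y_it times the log of the multinomial probability for period t.
            total += y * (beta * x - math.log(denom))
    return total

# Shifting x by a constant within a unit acts like an individual effect.
unit_a = [(2, 0.0), (5, 1.0)]
unit_b = [(2, 3.0), (5, 4.0)]
ll_a = conditional_poisson_loglik(0.7, [unit_a])
ll_b = conditional_poisson_loglik(0.7, [unit_b])
```

The two values coincide, which shows that anything constant within a unit, including $c_i$, is not identified from, and does not contaminate, the conditional likelihood.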
Main references
Arellano M, Honoré B (2001): “Panel Data Models: Some Recent Developments”, Handbook of Econometrics
Chamberlain G (1980): “Analysis of Covariance with Qualitative Data”, Review of Economic Studies 47, 225–238
Chamberlain G (1984): “Panel Data”, in Griliches, Intriligator (eds) Handbook of Econometrics, 1247–1318
Greene WH (2003): Econometric Analysis, ch. 21
Baltagi BH (2008): Econometric Analysis of Panel Data (4th ed.), ch. 11
Hajivassiliou VA (1984): “Estimation by Simulation of External Debt Repayment Problems”, Journal of Applied Econometrics 9, 109–132
Hsiao C (2003): Analysis of Panel Data
Wooldridge JM (2002): Econometric Analysis of Cross Section and Panel Data, ch. 15