
• Microeconometrics MECT2

  Richard Blundell, UCL

  Lecture 7: Censored Data Models

  March 2009

• (i) Censored Data Models

  ▶ Censored and truncated data

  Examples:

  earnings

  hours of work (mroz.dta is a good data set to play with)

  top coding of wealth

  expenditure on cars (this was James Tobin's original example, which became
  known as Tobin's probit model, or the Tobit model)

  ▶ Typical definitions:

  Censored data includes the censoring points

  Truncated data excludes the censoring points

• ▶ A mixture of discrete and continuous processes. In general we should model
  the process of censoring or truncation as a separate discrete mechanism. This is
  what is known as a `selectivity' model. To begin with, we have a model in which the
  two processes are generated from the same underlying continuous latent variable
  model, e.g. corner solution models in economics:

  y_i^* = x_i'\beta + u_i

  with

  y_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}

• or

  y_i = \begin{cases} y_i^* & \text{if } u_i > -x_i'\beta \\ 0 & \text{otherwise} \end{cases}

  ▶ Sometimes define D_i,

  D_i = \begin{cases} 1 & \text{if } y_i^* > 0 \\ 0 & \text{otherwise} \end{cases}

  as in the probit model.

• The general specification for the censored regression model is

  y_i^* = x_i'\beta + u_i

  y_i = \max\{0, y_i^*\}

  where y^* is the unobservable underlying process (similar to what was used with
  discrete choice models) and y is the data observation.

  ▶ When u is normally distributed, u|x \sim N(0, \sigma^2), the model is called Tobin's
  probit or the Tobit model.

  ▶ Note that

  P(y > 0 | x) = P(u > -x'\beta | x) = \Phi(x'\beta/\sigma)
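A minimal simulation sketch (my illustration, not lecture code; the coefficient values and variable names are arbitrary assumptions): generate data from the Tobit model and check that the empirical censoring probability matches \Phi(x'\beta/\sigma).

```python
# Simulate y* = x'beta + u, y = max(0, y*), and compare the empirical
# probability of a positive observation with Phi(x'beta / sigma).
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta, sigma = 100_000, np.array([0.5, 1.0]), 1.5

x = np.column_stack([np.ones(n), rng.normal(size=n)])  # intercept + one regressor
u = rng.normal(scale=sigma, size=n)
y = np.maximum(0.0, x @ beta + u)                      # censoring at zero

print("empirical   P(y>0):", (y > 0).mean())
print("theoretical P(y>0):", norm.cdf(x @ beta / sigma).mean())
```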

• ▶ Consider the moments of the truncated normal.

  ▶ Assume w \sim N(0, \sigma^2). Then w | w > c, where c is an arbitrary constant, is a
  truncated normal.

  ▶ The density function for the truncated normal is

  f(w | w > c) = \frac{f(w)}{1 - F(c)} = \frac{\frac{1}{\sigma}\phi\left(\frac{w}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}

  where f is the density function of w and F is the cumulative distribution function of w.

• ▶ We can now write

  E(w | w > c) = \int_c^\infty w f(w | w > c)\, dw = \frac{\sigma\,\phi\left(\frac{c}{\sigma}\right)}{1 - \Phi\left(\frac{c}{\sigma}\right)}

  ▶ Applying this result to the regression model yields

  E(y | x, y > 0) = x'\beta + E(u | u > -x'\beta) = x'\beta + \sigma\,\frac{\phi(x'\beta/\sigma)}{\Phi(x'\beta/\sigma)}

  where \phi(w)/\Phi(w) is called the inverse Mills ratio, usually represented by \lambda(w).

  ▶ Notice that, contrary to the discrete choice models, the variance of the residual
  plays a central role here: it determines the size of the partial effects.
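A quick numerical check (my sketch, reusing the simulation design above; `xb`, `mills` and the parameter values are my own names and assumptions): the truncated mean E(y | x, y > 0) should equal x'\beta plus \sigma times the inverse Mills ratio.

```python
# Verify E(y | x, y>0) = x'beta + sigma * phi(x'beta/sigma) / Phi(x'beta/sigma)
# on simulated Tobit data.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)
n, b0, b1, sigma = 200_000, 0.5, 1.0, 1.5
xb = b0 + b1 * rng.normal(size=n)                     # the index x'beta
y = np.maximum(0.0, xb + rng.normal(scale=sigma, size=n))

mills = norm.pdf(xb / sigma) / norm.cdf(xb / sigma)   # lambda(x'beta/sigma)
pos = y > 0
print("empirical   E(y | y>0):", y[pos].mean())
print("theoretical E(y | y>0):", (xb + sigma * mills)[pos].mean())
```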

• OLS Bias

  ▶ Truncated Data:

  ▶ Suppose one uses only the positive observations to estimate the model and the
  unobservables are normally distributed. Then, we have seen that

  E(y | x, y > 0) = x'\beta + \sigma\lambda\left(\frac{x'\beta}{\sigma}\right)

  where the last term is E(u | x, u > -x'\beta), which is generally non-zero.

  ▶ A model of the form

  y = x'\beta + \sigma\lambda + v

  would have E(v | x, y > 0) = 0.

  ▶ This implies the inconsistency of OLS: an omitted variable problem. Omitting the
  \sigma\lambda term leaves an error that is correlated with x.

• Censored Data:

  ▶ Now suppose we use all observations, both positive and zero.

  ▶ Under normality of the residual, we obtain

  E(y | x) = \Phi\left(\frac{x'\beta}{\sigma}\right) x'\beta + \sigma\,\phi\left(\frac{x'\beta}{\sigma}\right)

  ▶ Thus, once again, the OLS estimates will be biased and inconsistent.
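To see the bias in numbers, a small illustration (my own, not from the notes): run OLS on the truncated subsample and on the full censored sample and compare the slope with the true value.

```python
# OLS is inconsistent for beta under both truncation and censoring:
# both estimated slopes are attenuated relative to the true slope.
import numpy as np

rng = np.random.default_rng(2)
n, b0, b1, sigma = 100_000, 0.5, 1.0, 1.5
x = rng.normal(size=n)
y = np.maximum(0.0, b0 + b1 * x + rng.normal(scale=sigma, size=n))

def ols_slope(xs, ys):
    X = np.column_stack([np.ones_like(xs), xs])
    return np.linalg.lstsq(X, ys, rcond=None)[0][1]

pos = y > 0
print("true slope:               ", b1)
print("OLS, truncated (y>0 only):", ols_slope(x[pos], y[pos]))  # biased toward 0
print("OLS, censored (all obs):  ", ols_slope(x, y))            # also biased
```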

• The Tobit Maximum Likelihood Estimator

  ▶ Let \{(y_i, x_i);\ i = 1, \ldots, N\} be a random sample of data on the model.

  The contribution to the likelihood of a zero observation is determined by

  P(y_i = 0 | x_i) = 1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)

  The contribution to the likelihood of a non-zero observation is determined by

  f(y_i | x_i) = \frac{1}{\sigma}\,\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)

  which is not invariant to \sigma.

  Thus, the overall contribution of observation i to the loglikelihood function is

  \ln l_i(x_i; \beta, \sigma) = 1(D_i = 0) \ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + 1(D_i = 1) \ln\left[\frac{1}{\sigma}\,\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right)\right]

• and the sample loglikelihood is

  \ln L_N(\beta, \sigma) = \sum_{i=1}^N \left\{ (1 - D_i) \ln\left[1 - \Phi\left(\frac{x_i'\beta}{\sigma}\right)\right] + D_i \left[\ln\phi\left(\frac{y_i - x_i'\beta}{\sigma}\right) - \ln\sigma\right] \right\}

  where D_i equals one when y_i > 0 and equals zero otherwise.

  ▶ Notice that both \beta and \sigma are separately identified. Moreover, if D_i = 1 for all i,
  the ML and the OLS estimators will be the same.

  ▶ FOC

  \frac{\partial \ln L}{\partial \beta} = \sum_{i=1}^N \frac{1}{\sigma^2} \left\{ D_i (y_i - x_i'\beta) x_i - (1 - D_i)\, \frac{\sigma\,\phi(x_i'\beta/\sigma)}{1 - \Phi(x_i'\beta/\sigma)}\, x_i \right\}

  \frac{\partial \ln L}{\partial \sigma^2} = \sum_{i=1}^N \left\{ (1 - D_i)\, \frac{x_i'\beta\,\phi(x_i'\beta/\sigma)}{2\sigma^3 \left[1 - \Phi(x_i'\beta/\sigma)\right]} + D_i \left[ \frac{(y_i - x_i'\beta)^2}{2\sigma^4} - \frac{1}{2\sigma^2} \right] \right\}
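A hedged sketch of the Tobit MLE (my own implementation of the log-likelihood above, not lecture code; `tobit_fit` and the starting values are my choices): maximise the censored-data log-likelihood numerically, parameterising \log\sigma to keep \sigma positive.

```python
# Tobit maximum likelihood via scipy: censored observations contribute
# log(1 - Phi(x'beta/sigma)) and positive observations contribute
# log phi((y - x'beta)/sigma) - log sigma.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

def tobit_negloglik(params, X, y):
    beta, log_sigma = params[:-1], params[-1]
    sigma = np.exp(log_sigma)
    xb = X @ beta
    d = y > 0
    ll_zero = norm.logcdf(-xb[~d] / sigma).sum()   # 1 - Phi(z) = Phi(-z)
    ll_pos = (norm.logpdf((y[d] - xb[d]) / sigma) - log_sigma).sum()
    return -(ll_zero + ll_pos)

def tobit_fit(X, y):
    k = X.shape[1]
    start = np.r_[np.linalg.lstsq(X, y, rcond=None)[0], 0.0]  # OLS start, log sigma = 0
    res = minimize(tobit_negloglik, start, args=(X, y), method="BFGS")
    return res.x[:k], np.exp(res.x[-1])

# usage on simulated data as in the earlier sketches:
rng = np.random.default_rng(3)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(0.0, X @ np.array([0.5, 1.0]) + rng.normal(scale=1.5, size=n))
beta_hat, sigma_hat = tobit_fit(X, y)
print("beta_hat:", beta_hat, " sigma_hat:", sigma_hat)
```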

• Or write as:

  (1)  \frac{\partial \ln L}{\partial \beta} = -\frac{1}{\sigma^2} \sum_{i \in 0} \frac{\sigma\,\phi_i}{1 - \Phi_i}\, x_i + \frac{1}{\sigma^2} \sum_{i \in +} (y_i - x_i'\beta)\, x_i

  (2)  \frac{\partial \ln L}{\partial \sigma^2} = \frac{1}{2\sigma^3} \sum_{i \in 0} \frac{x_i'\beta\,\phi_i}{1 - \Phi_i} + \frac{1}{2\sigma^4} \sum_{i \in +} (y_i - x_i'\beta)^2 - \frac{N_+}{2\sigma^2}

  where \phi_i and \Phi_i are evaluated at x_i'\beta/\sigma, i \in 0 indexes the censored and i \in + the
  positive observations, and N_+ is the number of positive observations.

  Note that \frac{\beta'}{2\sigma^2} \times (1) + (2) = 0 gives

  \hat{\sigma}^2 = \frac{1}{N_+} \sum_{i \in +} (y_i - x_i'\hat{\beta})\, y_i

  that is, only the positive observations contribute to the estimation of \sigma.

  ▶ Also, if we define m_i \equiv E(y_i^* | y_i), then we can write (1) as

  \frac{\partial \ln L}{\partial \beta} = c \sum_{i=1}^N x_i (m_i - x_i'\beta)

• or

  \sum_{i=1}^N x_i m_i = \sum_{i=1}^N x_i x_i' \beta

  which defines an EM algorithm for the Tobit model. Note also that

  m_i = \begin{cases} y_i^* & \text{if } y_i^* > 0 \\ x_i'\beta - \frac{\sigma\,\phi_i}{1 - \Phi_i} & \text{otherwise} \end{cases}

  again replacing y^* with its best guess, given y, when it is unobserved.

  ▶ Using Theorems 1 and 2 from Lecture 6, the MLE of \beta and \sigma^2 is consistent and
  asymptotically normally distributed.
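A minimal sketch of the EM-style iteration just defined (my implementation; for simplicity \sigma is held fixed at its true value rather than re-estimated at each step, which a full EM would also do):

```python
# E-step: replace y* by m_i = E(y* | y) for censored observations;
# M-step: solve sum x_i m_i = sum x_i x_i' beta for beta.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
n, sigma = 5_000, 1.5
X = np.column_stack([np.ones(n), rng.normal(size=n)])
beta_true = np.array([0.5, 1.0])
y = np.maximum(0.0, X @ beta_true + rng.normal(scale=sigma, size=n))

beta = np.linalg.lstsq(X, y, rcond=None)[0]          # start from (inconsistent) OLS
XtX_inv = np.linalg.inv(X.T @ X)
for _ in range(200):
    z = X @ beta / sigma
    lam = norm.pdf(z) / norm.sf(z)                   # phi_i / (1 - Phi_i)
    m = np.where(y > 0, y, X @ beta - sigma * lam)   # best guess of y* given y
    beta = XtX_inv @ (X.T @ m)

print("EM beta:", beta, " true:", beta_true)
```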

• ▶ The asymptotic covariance matrix is derived from the expected values of the
  second partial derivatives of \ln L,

  \begin{bmatrix} E\frac{\partial^2 \ln L}{\partial\beta\,\partial\beta'} & E\frac{\partial^2 \ln L}{\partial\beta\,\partial\sigma^2} \\ \cdot & E\frac{\partial^2 \ln L}{\partial(\sigma^2)^2} \end{bmatrix} = \begin{bmatrix} \sum_{i=1}^N a_i x_i x_i' & \sum_{i=1}^N b_i x_i \\ \cdot & \sum_{i=1}^N c_i \end{bmatrix}

  where

  a_i = -\frac{1}{\sigma^2}\left( z_i\phi_i - \frac{\phi_i^2}{1 - \Phi_i} - \Phi_i \right)

  b_i = -\frac{1}{2\sigma^3}\left( z_i^2\phi_i - \frac{z_i\phi_i^2}{1 - \Phi_i} + \phi_i \right)

  c_i = -\frac{1}{4\sigma^4}\left( z_i^3\phi_i + z_i\phi_i - \frac{z_i^2\phi_i^2}{1 - \Phi_i} - 2\Phi_i \right)

  and \phi_i and \Phi_i are evaluated at z_i = x_i'\beta/\sigma.

• LM or Score Test

  ▶ Let the log likelihood be written

  \ln L(\theta_1, \theta_2)

  where \theta_1 is the set of parameters that are unrestricted under the null hypothesis and
  \theta_2 are the k_2 parameters restricted under H_0:

  H_0: \theta_2 = 0
  H_1: \theta_2 \neq 0

  ▶ e.g.

  y_i^* = x_{1i}'\beta_1 + x_{2i}'\beta_2 + u_i \quad \text{with} \quad u_i \sim N(0, \sigma^2)

  where \theta_1 = (\beta_1', \sigma^2)' and \theta_2 = \beta_2.

• \frac{\partial \ln L(\theta_1, \theta_2)}{\partial \theta} = \sum_i \frac{\partial \ln l_i(\theta_1, \theta_2)}{\partial \theta}

  or

  S(\theta) = \sum_i S_i(\theta)

  ▶ Let \hat{\theta} be the MLE under H_0. Then

  \frac{1}{\sqrt{N}}\, S(\hat{\theta}) \sim_a N(0, H)

  therefore

  \frac{1}{N}\, S(\hat{\theta})' H^{-1} S(\hat{\theta}) \sim_a \chi^2(k_2)
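A generic sketch of the statistic (my illustration): estimate H by the outer product of the per-observation scores evaluated at the restricted MLE, a standard OPG form of the LM test.

```python
# LM statistic S' H^{-1} S with H estimated by the outer product of the
# individual scores S_i(theta_hat); with this estimate the 1/N factors cancel.
import numpy as np

def lm_statistic(scores):
    """scores: (N, k) array of per-observation scores at the restricted MLE.

    The blocks of S corresponding to theta_1 are zero by the restricted FOC,
    so the statistic is asymptotically chi2(k2) under H0."""
    S = scores.sum(axis=0)               # total score
    H = scores.T @ scores                # OPG estimate of N * H
    return S @ np.linalg.solve(H, S)
```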

• In the Tobit model, consider the case of H_0: \beta_2 = 0.

  \frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2} \sum_i D_i (y_i - x_i'\beta)\, x_{2i} - \frac{1}{\sigma^2} \sum_i (1 - D_i)\, \frac{\sigma\,\phi_i}{1 - \Phi_i}\, x_{2i}

  \frac{\partial \ln L}{\partial \beta_2} = \frac{1}{\sigma^2} \sum_i e_i^{(1)} x_{2i}

  where

  e_i^{(1)} = D_i (y_i - x_i'\beta) + (1 - D_i)\left( -\frac{\sigma\,\phi_i}{1 - \Phi_i} \right)

  is known as the first-order `generalised' residual, which reduces to u_i = y_i - x_i'\beta
  in the general linear model case.
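A short helper (my sketch; `beta` and `sigma` are taken to be the restricted Tobit estimates, e.g. from the `tobit_fit` sketch above) computing the first-order generalised residuals that enter this score:

```python
# e1_i = D_i (y_i - x_i'beta) - (1 - D_i) * sigma * phi_i / (1 - Phi_i),
# evaluated at z_i = x_i'beta / sigma.
import numpy as np
from scipy.stats import norm

def generalised_residuals(X, y, beta, sigma):
    z = X @ beta / sigma
    lam = norm.pdf(z) / norm.sf(z)               # phi_i / (1 - Phi_i)
    return np.where(y > 0, y - X @ beta, -sigma * lam)
```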

• This kind of Score or LM test can be extended to specification tests for heteroskedasticity
  and for non-normality. Notice that estimation under the alternative is avoided,
  at least in terms of the test statistic. If H_0 is rejected, then estimation under H_a is
  unavoidable.

  ▶ Consider the normal distribution

  f(u_i) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left( -\frac{1}{2}\frac{u_i^2}{\sigma^2} \right)

  which can be written in terms of log scores

  \frac{\partial \ln f(u_i)}{\partial u_i} = -\frac{u_i}{\sigma^2}.

• ▶ A popular generalisation (the Pearson family of distributions) is

  \frac{\partial \ln f(u_i)}{\partial u_i} = \frac{-(u_i + c_1)}{\sigma_i^2 - c_1 u_i + c_2 u_i^2}

  where the skedastic function is \sigma_i^2 = h(\gamma_0 + \gamma_1' z_i), with z_i observable determinants of
  heteroskedasticity.

  c_1 \neq 0 \rightarrow skewness
  c_2 \neq 0 \rightarrow kurtosis
  c_1 = c_2 = 0 \rightarrow normal
  \gamma_1 = 0 \rightarrow homoskedastic

• ▶ One can write out the loglikelihood with the Pearson family and take derivatives
  with respect to the c and \gamma parameters to find the LM or Score test, e.g.

  \frac{\partial \ln L}{\partial \gamma_1} = -\sum_i e_i^{(2)} z_i

  where e_i^{(2)} is the second-order generalised residual.

  ▶ Also

  \frac{\partial \ln L}{\partial c_2} = \frac{1}{4\sigma^2} \sum_i D_i \left( u_i^4 - \int_{-x_i'\beta}^{\infty} t^4 f\, dt \right)

  which is the fourth-order generalised residual.

• What if normality is rejected, or is not a good prior assumption anyway?

  Suppose we just assume symmetry.

  We can write the model as

  y_i^* = x_i'\beta + u_i, \quad \text{or}

  y_i = x_i'\beta + u_i^*, \quad \text{where}

  u_i^* = \max\{u_i, -x_i'\beta\}

  We can define the new residuals

  u_i^{**} = \min\{u_i^*, x_i'\beta\}

  where the x_i'\beta reflects `upper' trimming. Drop observations where x_i'\beta \leq 0, as no
  symmetric trimming is possible there.

• ▶ Adapt least squares by replacing y by

  y_i^* = \min\{y_i, 2x_i'\beta\}

  \rightarrow symmetrically censored least squares. Applying OLS to all i: x_i'\beta > 0 yields
  consistent and asymptotically normal estimates: the error term now satisfies
  E(u^{**} | x) = 0.

  ▶ Requires a symmetric distribution of the error term, but no normality or
  homoskedasticity.

  ▶ Estimation requires an iterative procedure, but we can adapt the EM algorithm
  (see the sketch below):

  \hat{\beta} = \left( \sum x_i x_i' \right)^{-1} \sum x_i m_i \quad \text{with} \quad m_i = \min\{y_i, 2x_i'\beta\}

  ▶ Monte-Carlo results.
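A sketch of that iterative scheme (my implementation of the update above; the fixed iteration count and Laplace errors are my choices): at each step, keep only observations with x_i'\beta > 0 and replace y by \min\{y, 2x'\beta\}.

```python
# Symmetrically censored least squares: iterate the EM-style update
# beta = (X'X)^{-1} X'm on the subsample with x'beta > 0.
import numpy as np

def scls(X, y, n_iter=200):
    beta = np.linalg.lstsq(X, y, rcond=None)[0]   # start from OLS
    for _ in range(n_iter):
        xb = X @ beta
        keep = xb > 0                  # no symmetric trimming possible otherwise
        m = np.minimum(y[keep], 2.0 * xb[keep])
        Xk = X[keep]
        beta = np.linalg.solve(Xk.T @ Xk, Xk.T @ m)
    return beta

# usage: symmetric but non-normal (Laplace) errors, where Tobit MLE is misspecified
rng = np.random.default_rng(5)
n = 10_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(0.0, X @ np.array([0.5, 1.0]) + rng.laplace(scale=1.0, size=n))
print("SCLS beta:", scls(X, y))
```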

• Censored Least Absolute Deviations

  Assume the conditional median of u_i is zero \rightarrow the median of y_i is

  x_i'\beta \cdot 1(x_i'\beta > 0)

  CLAD minimises the absolute distance of y_i from its median:

  \hat{\beta}_{CLAD} = \arg\min_\beta \sum |y_i - x_i'\beta \cdot 1(x_i'\beta > 0)|

  Powell (1984) shows that \hat{\beta}_{CLAD} is \sqrt{N}-consistent and asymptotically normal.

  ▶ Can you think of a LAD estimator for the binary response case?

  ▶ How does it relate to Maximum Score?
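A sketch of the estimator (my illustration, not Powell's algorithm): minimise the CLAD objective directly with a derivative-free optimiser, using x'\beta \cdot 1(x'\beta > 0) = \max(0, x'\beta); the objective is non-smooth, so Nelder-Mead is a pragmatic choice for this small example.

```python
# CLAD: argmin_b sum |y_i - max(0, x_i'b)|, needing only a median-zero error.
import numpy as np
from scipy.optimize import minimize

def clad(X, y):
    obj = lambda b: np.abs(y - np.maximum(0.0, X @ b)).sum()
    start = np.linalg.lstsq(X, y, rcond=None)[0]   # OLS starting values
    return minimize(obj, start, method="Nelder-Mead").x

# usage: heavy-tailed, median-zero errors where normality fails
rng = np.random.default_rng(6)
n = 5_000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(0.0, X @ np.array([0.5, 1.0]) + rng.standard_t(df=3, size=n))
print("CLAD beta:", clad(X, y))
```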

• Endogenous Variables

  Consider the following (triangular) model

  y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \quad (1)

  y_{2i} = z_i'\pi_2 + v_{2i} \quad (2)

  where y_{1i} = y_{1i}^* \cdot 1(y_{1i}^* > 0) and z_i' = (x_{1i}', x_{2i}'). The x_{2i} are the excluded `instruments'
  from the equation for y_1. The first equation is the `structural' equation of interest and
  the second equation is the `reduced form' for y_2.

  ▶ y_2 is endogenous if u_1 and v_2 are correlated. If y_1 were fully observed, we could
  use IV.

• Use the following orthogonal decomposition for u_1:

  u_{1i} = \rho v_{2i} + \eta_{1i}

  where E(\eta_{1i} | v_{2i}) = 0.

  ▶ Note that y_2 is uncorrelated with u_{1i} conditional on v_2. The variable v_2 is
  sometimes known as a control function.

  ▶ Under the assumption that u_1 and v_2 are jointly normally distributed, v_2 and \eta are
  uncorrelated by definition and \eta also follows a normal distribution.

• Use this to define the augmented model

  y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + \rho v_{2i} + \eta_{1i}

  y_{2i} = z_i'\pi_2 + v_{2i}

  2-step Estimator:

  ▶ Step 1: Estimate \pi_2 by OLS and predict v_2,

  \hat{v}_{2i} = y_{2i} - \hat{\pi}_2' z_i

  ▶ Step 2: Use \hat{v}_{2i} as a `control function' in the model for y_1^* above and estimate by
  Tobit or another consistent method.
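A sketch of the two-step control-function estimator (my implementation; `tobit_fit` is the hypothetical helper from the Tobit MLE sketch earlier, passed in as an argument):

```python
# Step 1: OLS reduced form for y2 gives v2_hat; step 2: Tobit of y1 on
# (x1, y2, v2_hat). The coefficient on v2_hat estimates rho.
import numpy as np

def smith_blundell_2step(X1, y2, Z, y1, tobit_fit):
    pi2 = np.linalg.lstsq(Z, y2, rcond=None)[0]   # reduced-form coefficients
    v2_hat = y2 - Z @ pi2                          # control function
    W = np.column_stack([X1, y2, v2_hat])          # augmented regressor set
    beta_aug, sigma = tobit_fit(W, y1)
    rho_hat = beta_aug[-1]                         # H0 (exogeneity): rho = 0
    return beta_aug, sigma, rho_hat
```

A t-test on `rho_hat` (its standard error should account for the generated regressor v2_hat) then delivers the exogeneity test described on the next slide.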

• An Exogeneity Test

  The null of exogeneity in this model is analogous to

  H_0: \rho = 0

  A test of this null can be performed using a t-test.

  ▶ Smith and Blundell (1986, Econometrica).

  ▶ Specifically for the censored regression model (Tobit model).

• ▶ All this discussion of endogenous regressors carries over to the binary choice
  model (see Lecture 6 Notes) and other related models.

  For example, consider the following (triangular) binary choice model

  y_{1i}^* = x_{1i}'\beta + \alpha y_{2i} + u_{1i} \quad (3)

  y_{2i} = z_i'\pi_2 + v_{2i} \quad (4)

  where y_{1i} = 1(y_{1i}^* > 0) and z_i' = (x_{1i}', x_{2i}'). The x_{2i} are the excluded `instruments'
  from the equation for y_1. The first equation is the `structural' equation of interest and
  the second equation is the `reduced form' for y_2.

  ▶ y_2 is endogenous if u_1 and v_2 are correlated.

  ▶ If y_1 were fully observed, we could use IV.

  ▶ Blundell and Powell (2004) extend this to the semiparametric case.