
  • Truncation and Censoring

    Laura Magazzini

    [email protected]


  • Truncation and censoring

    Truncation and censoring

    Truncation: sample data are drawn from a subset of a larger population of interest

    . Characteristic of the distribution from which the sample data are drawn

    . Example: studies of income based on incomes above or below the poverty line (of limited usefulness for inference about the whole population)

    Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value

    . Defect in the sample data

    . Example: in studies of income, people below the poverty line are reported at the poverty line

    Truncation and censoring introduce similar distortion into conventional statistical results


  • Truncation and censoring Truncation

    Truncation

    Aim: infer the characteristics of a full population from a sample drawn from a restricted population

    . Example: characteristics of people with income above $100,000

    Let y be a continuous random variable with pdf f(y). The conditional distribution of y given y > a, where a is a constant, is:

    f(y | y > a) = f(y) / Pr(y > a)

    If y is normally distributed:

    f(y | y > a) = [ (1/σ) φ((y − µ)/σ) ] / [1 − Φ(α)]

    where α = (a − µ)/σ


  • Truncation and censoring Truncation

    Moments of truncated distributions

    E(y | y < a) < E(y)
    E(y | y > a) > E(y)
    V(y | truncation) < V(y)


  • Truncation and censoring Truncation

    Moments of the truncated normal distribution

    Let y ∼ N(µ, σ²) and let a be a constant

    E(y | truncation) = µ + σλ(α)
    Var(y | truncation) = σ²[1 − δ(α)]

    . α = (a − µ)/σ

    . φ(α) is the standard normal density

    . λ(α) is called the inverse Mills ratio:

      λ(α) = φ(α)/[1 − Φ(α)]   if truncation is y > a
      λ(α) = −φ(α)/Φ(α)        if truncation is y < a

    . δ(α) = λ(α)[λ(α)− α], where 0 < δ(α) < 1 for any α
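    A quick numerical check of these formulas (added here, not part of the original slides): the sketch computes λ(α) and δ(α) with scipy and compares the implied moments with scipy's built-in truncated normal; the values of µ, σ, and a are arbitrary.

```python
# Added check of the truncated-normal moment formulas (illustrative values).
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a = 0.0, 1.0, 0.5                     # truncation from below: y > a
alpha = (a - mu) / sigma

lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))    # inverse Mills ratio lambda(alpha)
delta = lam * (lam - alpha)                      # delta(alpha), between 0 and 1

mean_formula = mu + sigma * lam                  # E(y | truncation)
var_formula = sigma**2 * (1 - delta)             # Var(y | truncation)

# scipy's truncnorm takes the truncation points in standardized units
tn = truncnorm(a=alpha, b=np.inf, loc=mu, scale=sigma)
print(mean_formula, tn.mean())                   # the two should agree
print(var_formula, tn.var())                     # the two should agree
```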


  • Truncation and censoring Truncation

    Example: a truncated log-normal income distribution

    From the New York Post (1987): “The typical upper affluent American... makes $142,000 per year... The people surveyed had household income of at least $100,000”. Does this tell us anything about the typical American?

    “... only 2 percent of Americans make the grade”

    . Degree of truncation in the sample: 98%

    . The $142,000 figure is probably quite far from the mean in the full population

    Assuming income is lognormally distributed in the population (the log of income has a normal distribution), this information can be used to deduce the population mean income

    Let x = income and y = ln x

    E[y | y > ln 100] = µ + σ φ(α)/[1 − Φ(α)]

    Substituting into E[x] = E[e^y] = e^(µ + σ²/2), we get E[x] ≈ $22,087

    . The 1987 Statistical Abstract of the US listed average household income of about $25,000 (a relatively good estimate based on little information!)
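    The slide does not show the intermediate steps; the added sketch below reproduces a figure close to $22,087 under the assumptions that logs are natural, that $142,000 is treated as the mean of the truncated distribution, and that the truncation probability is exactly 2%.

```python
# Added back-of-the-envelope reconstruction: recover (mu, sigma) of log-income
# from the two published facts and deduce the population mean income.
# Amounts are in thousands of dollars.
import numpy as np
from scipy.stats import norm

trunc_point = np.log(100)      # only incomes above $100,000 were surveyed
trunc_mean = np.log(142)       # mean log-income in the truncated sample (approx.)
p_above = 0.02                 # "only 2 percent make the grade"

alpha = norm.ppf(1 - p_above)                  # (ln 100 - mu) / sigma
lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))  # inverse Mills ratio

# Two equations: mu + sigma*lam = ln 142 and mu + sigma*alpha = ln 100
sigma = (trunc_mean - trunc_point) / (lam - alpha)
mu = trunc_point - sigma * alpha

mean_income = np.exp(mu + sigma**2 / 2)        # E[x] for a lognormal
print(round(mean_income, 1))                   # roughly 22, i.e. about $22,000
```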


  • Truncation and censoring Truncation

    The truncated regression model

    y*_i = x_i'β + ε_i,   with ε_i | x_i ∼ N(0, σ²)

    Unit i is observed only if y*_i crosses a threshold a:

    y_i = n.a.    if y*_i ≤ a
    y_i = y*_i    if y*_i > a

    E[y_i | y*_i > a] = x_i'β + σλ(α_i),   with α_i = (a − x_i'β)/σ


  • Truncation and censoring Truncation

    OLS estimation

    OLS of y on x leads to inconsistent estimates

    . The model is y_i | y*_i > a = E(y_i | y*_i > a) + υ_i = x_i'β + σλ(α_i) + υ_i

    . By construction, the error term υ_i is heteroskedastic

    . Omitted variable bias (λ_i is not included in the regression)

    . In applications, the OLS estimates are usually found to be biased toward zero: the marginal effect in the subpopulation is

      ∂E[y_i | y*_i > a]/∂x_i = β + σ (dλ(α_i)/dα_i) ∂α_i/∂x_i = ... = β[1 − δ(α_i)]

    – Since 0 < δ(α_i) < 1, the marginal effect in the subpopulation is less than the corresponding coefficient


  • Truncation and censoring Truncation

    Maximum likelihood estimation

    Under the normality assumption, MLE provides a consistent estimator

    . For each observation:

      f(y_i | y*_i > a) = [ (1/σ) φ((y_i − x_i'β)/σ) ] / [1 − Φ(α_i)],   with α_i = (a − x_i'β)/σ

    . The log-likelihood can be written as

      log L = Σ_{i=1}^N log[ (1/σ) φ((y_i − x_i'β)/σ) ] − Σ_{i=1}^N log[ 1 − Φ((a − x_i'β)/σ) ]
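    The simulated example on the following slides uses Stata's truncreg; as an added illustration, the sketch below maximizes the same log-likelihood with scipy on simulated data and contrasts the result with OLS on the truncated sample (all parameter values are arbitrary).

```python
# Added sketch: ML estimation of a truncated regression on simulated data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta_true, sigma_true, a = 5000, np.array([1.0, 2.0]), 1.0, 1.5
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ beta_true + sigma_true * rng.normal(size=n)
keep = y_star > a                         # truncation: units below a are never sampled
y, X = y_star[keep], x[keep]

def neg_loglik(theta):
    beta, log_sigma = theta[:-1], theta[-1]
    sigma = np.exp(log_sigma)             # keep sigma positive
    resid = (y - X @ beta) / sigma
    # log f(y_i | y*_i > a) = log[phi(resid)/sigma] - log[1 - Phi((a - x_i'b)/sigma)]
    ll = norm.logpdf(resid) - np.log(sigma) - norm.logsf((a - X @ beta) / sigma)
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[-1]))            # close to beta_true and sigma_true
print(np.linalg.lstsq(X, y, rcond=None)[0])    # OLS on the truncated sample: biased
```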


  • Truncation and censoring Truncation

    Example: simulated data
    If y* is fully observed, OLS can be applied


  • Truncation and censoring Truncation

    Example: simulated data
    However, only y* > a is included in the sample


  • Truncation and censoring Truncation

    Example: simulated data
    OLS on the observed sample is biased


  • Truncation and censoring Truncation

    Example: simulated data
    MLE (truncreg) yields a consistent estimate of β


  • Truncation and censoring Censored data

    Censored data

    Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points

    Assume there is a variable with quantitative meaning, y*, and we are interested in E[y*|x]

    If y* and x were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied

    In the case of censored data, y* is not observable for part of the population

    . Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations

    . Top coding / corner solution outcome


  • Truncation and censoring Censored data

    Top coding: example
    Data generating process

    Let wealth* denote actual family wealth, measured in thousands of dollars

    Suppose that wealth* follows the linear regression model E[wealth* | x] = x'β

    Censored data: we observe wealth* only when wealth* > 200

    . When wealth* is smaller than 200, we only know that it is below 200, not its actual value

    Therefore observed wealth can be written as

    wealth = max(wealth∗, 200)
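    A tiny numpy illustration of this censoring rule (added here, not part of the slides; the data-generating process is hypothetical):

```python
# Hypothetical illustration of the censoring rule above: actual wealth below
# 200 (thousand dollars) is recorded at 200, larger values are fully observed.
import numpy as np

rng = np.random.default_rng(1)
wealth_star = rng.lognormal(mean=5.0, sigma=0.6, size=10)   # assumed "true" wealth
wealth = np.maximum(wealth_star, 200)                       # observed, censored variable
print(np.round(wealth_star), np.round(wealth))
```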


  • Truncation and censoring Censored data

    Top coding: example
    Estimation of β

    We assume that wealth* given x has a homoskedastic normal distribution

    wealth* = x'β + ε,   ε | x ∼ N(0, σ²)

    Recorded wealth is: wealth = max(wealth*, 200) = max(x'β + ε, 200)

    β is estimated via maximum likelihood using a mixture of discrete and continuous distributions (details later...)


  • Truncation and censoring Censored data

    Example: seats demanded and tickets sold


  • Truncation and censoring Censored data

    The censored normal distribution

    y* ∼ N(µ, σ²)

    Observed data are censored at a = 0:

    y = 0     if y* ≤ 0
    y = y*    if y* > 0

    The distribution is a mixture of a discrete and a continuous distribution

    . If y* ≤ 0: f(y) = Pr(y = 0) = Pr(y* ≤ 0) = Φ(−µ/σ) = 1 − Φ(µ/σ)

    . If y* > 0: f(y) = (1/σ) φ((y − µ)/σ)

    E[y] = 0 × Pr(y = 0) + E[y | y > 0] × Pr(y > 0) = (µ + σλ)[1 − Φ(−µ/σ)] = (µ + σλ) Φ(µ/σ),   with λ = φ(µ/σ)/Φ(µ/σ)
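    As an added numerical check (with arbitrary µ and σ), the mean formula can be verified against a Monte Carlo sample of the censored variable:

```python
# Added check of the censored-normal mean:
# E[y] = (mu + sigma*lambda) * Phi(mu/sigma), lambda = phi(mu/sigma)/Phi(mu/sigma).
import numpy as np
from scipy.stats import norm

mu, sigma = 0.8, 1.3                       # arbitrary illustrative values
lam = norm.pdf(mu / sigma) / norm.cdf(mu / sigma)
mean_formula = (mu + sigma * lam) * norm.cdf(mu / sigma)

rng = np.random.default_rng(2)
y = np.maximum(rng.normal(mu, sigma, size=1_000_000), 0)   # censor at zero
print(mean_formula, y.mean())              # the two should be close
```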


  • Truncation and censoring Censored data

    The censored regression model
    Tobit model (Tobin, 1958)

    Let y* be a continuous (latent) variable:

    y*_i = x_i'β + ε_i,   where ε_i | x_i ∼ N(0, σ²)

    The observed data y are

    y_i = max(0, y*_i) = 0 if y*_i ≤ 0;  y*_i if y*_i > 0

    Why not OLS?

    Why not OLS on positive y∗?


  • Truncation and censoring Censored data

    MLE estimation

    As we assume ε_i | x_i ∼ N(0, σ²), the likelihood function can be written down

    The distribution is a mixture of a discrete and a continuous distribution

    . A positive probability is assigned to the observations y_i = 0:

      Pr(y_i = 0 | x_i) = Pr(y*_i ≤ 0 | x_i) = Pr(x_i'β + ε_i ≤ 0) = Pr(ε_i ≤ −x_i'β) = 1 − Pr(ε_i < x_i'β) = 1 − Φ(x_i'β/σ)

    . For y*_i > 0: f(y_i) = (1/σ) φ((y_i − x_i'β)/σ)


  • Truncation and censoring Censored data

    MLE estimation

    The likelihood can be written as:

    L(β, σ² | y) = ∏_{y_i = 0} [1 − Φ(x_i'β/σ)] ∏_{y_i > 0} (1/σ) φ((y_i − x_i'β)/σ)

                 = ∏_{y_i = 0} [1 − Φ(x_i'β/σ)] ∏_{y_i > 0} (1/√(2πσ²)) exp( −(1/2) ((y_i − x_i'β)/σ)² )

    In the case of censored data, the β estimated from the Tobit model can be employed to study the effect of x on E[y*|x]
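    The simulated example on the following slides uses Stata's tobit; the added sketch below codes the same censored and uncensored likelihood contributions with scipy on simulated data (arbitrary parameter values).

```python
# Added sketch: Tobit maximum likelihood on simulated data censored at zero.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n, beta_true, sigma_true = 5000, np.array([-1.0, 2.0]), 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(X @ beta_true + sigma_true * rng.normal(size=n), 0)   # censored at 0
cens = y == 0

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = X @ beta
    ll = np.where(cens,
                  norm.logcdf(-xb / sigma),                            # Pr(y_i = 0)
                  norm.logpdf((y - xb) / sigma) - np.log(sigma))       # density for y_i > 0
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[-1]))            # close to beta_true and sigma_true
print(np.linalg.lstsq(X, y, rcond=None)[0])    # OLS on censored y: biased toward zero
```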


  • Truncation and censoring Censored data

    Example: simulated data
    If y* is fully observed, OLS can be applied


  • Truncation and censoring Censored data

    Example: simulated data
    However, if y* ≤ a, data are recorded as a


  • Truncation and censoring Censored data

    Example: simulated data
    OLS on the observed sample is biased


  • Truncation and censoring Censored data

    Example: simulated data
    MLE (tobit) yields a consistent estimate of β


  • Truncation and censoring Censored data

    Corner solution outcomes

    Still labeled “censored regression models”

    Pioneering work by Tobin (1958): household purchases of durable goods

    Let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values

    . Examples: amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development

    . We can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, y = 0

    . The issue here is not data observability, but rather individual behaviour

    . We are interested in features of the distribution of y given x, such as E[y|x] and Pr(y = 0|x)


  • Truncation and censoring Censored data

    Marginal effect in the tobit model

    In the case of a corner solution outcome, the estimated β is not sufficient, since E[y|x] and E[y|x, y > 0] depend on β in a non-linear way

    ∂E[y_i | x_i]/∂x_i = Φ(x_i'β/σ) β

    ∂E[y_i | x_i]/∂x_i = Pr(y_i > 0) ∂E[y_i | x_i, y_i > 0]/∂x_i + E[y_i | x_i, y_i > 0] ∂Pr(y_i > 0)/∂x_i

    A change in x_i has two effects:

    (1) It affects the conditional mean of y*_i in the positive part of the distribution

    (2) It affects the probability that the observation will fall in the positive part of the distribution
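    An added illustration of the first formula, Φ(x'β/σ)β, at a hypothetical parameter vector and evaluation point, checked against a finite-difference derivative of E[y|x]:

```python
# Added check: Tobit marginal effect on E[y|x] for censoring at zero.
import numpy as np
from scipy.stats import norm

beta, sigma = np.array([-1.0, 2.0]), 1.0      # assumed (e.g. estimated) parameters
x = np.array([1.0, 0.7])                      # evaluation point (constant, regressor)

def e_y(xv):
    xb = xv @ beta
    lam = norm.pdf(xb / sigma) / norm.cdf(xb / sigma)
    return norm.cdf(xb / sigma) * (xb + sigma * lam)   # E[y|x]

marg_formula = norm.cdf(x @ beta / sigma) * beta[1]    # d E[y|x] / d x_1
eps = 1e-6
marg_numeric = (e_y(x + np.array([0.0, eps])) - e_y(x)) / eps
print(marg_formula, marg_numeric)                      # should agree
```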


  • Truncation and censoring Censored data

    Some issues in specification

    Heteroskedasticity

    . MLE is inconsistent

    . However, the problem can be approached directly, with σ_i in the likelihood function instead of σ. Specifying a particular model for σ_i provides the empirical model for estimation

    Misspecification of Pr(y* < 0)

    . In the Tobit model, a variable that increases the probability of an observation being a non-limit observation also increases the mean of the variable

    - Example: loss due to fire in buildings

    . A more general model has been devised, involving a decision equation and a regression equation for nonlimit observations

    Non-normality

    . MLE is inconsistent

    . Research is ongoing both on alternative estimators and on methods for testing this type of misspecification


  • Truncation and censoring Sample selection

    Sample selection

    What if observation is driven by a different process?

    (1) Data observability

    . Saving function (in the population):
      saving = β0 + β1 income + β2 age + β3 married + β4 kids + u

    . Survey data only includes families whose household head was 45 years of age or older

    (2) Individual behaviour (Boyes, Hoffman, Low, 1989; Greene, 1992)

    . y1 = 1 if individual i defaults on a loan/credit card, 0 otherwise

    . y2 = 1 if individual i is granted a loan/credit card, 0 otherwise

    . For a given individual, y1 is not observed unless y2 equals 1


  • Truncation and censoring Sample selection

    Sample selection / incidental truncation

    Let y and z have a bivariate distribution with correlation ρ

    We are interested in the distribution of y given that another variable z exceeds a particular value

    . Intuition: if y and z are positively correlated, then the truncation of z should push the distribution of y to the right

    The truncated joint distribution is

    f(y, z | z > a) = f(y, z) / Pr(z > a)

    To obtain the incidentally truncated marginal density of y, we should integrate z out of this expression


  • Truncation and censoring Sample selection

    Moments of the incidentally truncated bivariate normal distribution

    Let y and z have a bivariate normal distribution with means µ_y and µ_z, standard deviations σ_y and σ_z, and correlation ρ

    E[y | z > a] = µ_y + ρ σ_y λ(α_z)
    V[y | z > a] = σ_y²[1 − ρ² δ(α_z)]

    . α_z = (a − µ_z)/σ_z

    . λ(α_z) = φ(α_z)/[1 − Φ(α_z)]

    . δ(α_z) = λ(α_z)[λ(α_z) − α_z]

    If the truncation is z < a, then λ(αz) = −φ(αz)/Φ(αz)
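    An added Monte Carlo check of these two formulas, with arbitrary illustrative parameter values:

```python
# Added check of the incidental-truncation moment formulas.
import numpy as np
from scipy.stats import norm

mu_y, mu_z, s_y, s_z, rho, a = 1.0, 0.0, 2.0, 1.0, 0.6, 0.5
alpha_z = (a - mu_z) / s_z
lam = norm.pdf(alpha_z) / (1 - norm.cdf(alpha_z))
delta = lam * (lam - alpha_z)

rng = np.random.default_rng(4)
z = mu_z + s_z * rng.normal(size=2_000_000)
y = mu_y + rho * s_y / s_z * (z - mu_z) + s_y * np.sqrt(1 - rho**2) * rng.normal(size=z.size)
y_sel = y[z > a]                                      # incidental truncation on z

print(mu_y + rho * s_y * lam, y_sel.mean())           # E[y | z > a]
print(s_y**2 * (1 - rho**2 * delta), y_sel.var())     # V[y | z > a]
```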


  • Truncation and censoring Sample selection

    Example: A model of labor supply

    Consider a population of women where only a subsample is engaged in market employment

    We are interested in identifying the determinants of the labor supply of all women

    A simple model of female labor supply consists of 2 equations

    (1) Wage equation: the difference between a person’s market wage and her reservation wage, as a function of characteristics such as age, education, number of children, ... plus unobservables

    (2) Hours equation: the desired number of labor hours supplied depends on the wage, home characteristics (e.g. presence of small children), marital status, ... plus unobservables

    Truncation: Equation 2 describes desired hours, but an actual figure is observed only if the individual is working, i.e. when the market wage exceeds the reservation wage

    The hours variable is incidentally truncated


  • Truncation and censoring Sample selection

    Example: A model of labor supply
    When is OLS on the working sample appropriate?

    Assume working women are chosen randomly

    If the working subsample has endowments of characteristics (both observed and unobserved) similar to the nonworking sample, OLS is an option

    BUT the decision to work is not random: the working and nonworking samples potentially have different characteristics

    . When the relationship runs purely through observables, appropriate conditioning variables can be included in the relevant equation

    . If unobservable characteristics affecting the work decision are correlated with the unobservable characteristics affecting the wage, then a relationship is created that cannot be tackled by including appropriate controls

    . A bias is induced due to “sample selection”


  • Truncation and censoring Sample selection

    Regression in a model of selection (1)

    Equation that determines sample selection

    z*_i = w_i'γ + u_i

    The equation of primary interest is

    y_i = x_i'β + ε_i

    where y_i is observed only when z*_i is greater than zero (otherwise data are not available)

    . This model is closely related to the Tobit model, although it is less restrictive: the parameters explaining the censoring are not constrained to equal those explaining the variation in the observed dependent variable. For this reason the model is also known as the Tobit type two model.


  • Truncation and censoring Sample selection

    Regression in a model of selection (2)

    If u_i and ε_i have a bivariate normal distribution with zero means and correlation ρ,

    E[y_i | y_i is observed] = E[y_i | z*_i > 0]
                             = E[y_i | u_i > −w_i'γ]
                             = x_i'β + E[ε_i | u_i > −w_i'γ]
                             = x_i'β + ρ σ_ε λ_i(α_u)

    where α_u = −w_i'γ/σ_u and λ(α_u) = φ(α_u)/[1 − Φ(α_u)] = φ(w_i'γ/σ_u)/Φ(w_i'γ/σ_u)

    So, the regression model can be written as

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i


  • Truncation and censoring Sample selection

    Regression in a model of selection (3)

    E[y_i | z*_i > 0] = x_i'β + ρ σ_ε λ_i(α_u)

    OLS regression using the observed data leads to inconsistent estimates (omitted variable bias)

    The marginal effect of the regressors on y_i in the observed sample consists of two components:

    . Direct effect on the mean of y_i (β)

    . In addition, if the variable appears in the probability that z*_i is positive, it also influences y_i through its presence in λ_i:

      ∂E[y_i | z*_i > 0]/∂x_ik = β_k − γ_k (ρ σ_ε / σ_u) δ_i(α_u)

    Most often z*_i is not observed; rather, we can infer its sign but not its magnitude

    . Since there is no information on the scale of z*, the disturbance variance in the selection equation cannot be estimated (we set σ_u² = 1)


  • Truncation and censoring Sample selection

    Regression in a model of selection (4)

    Selection mechanisms

    z*_i = w_i'γ + u_i,

    where we observe z_i = 1 if z*_i > 0 and 0 otherwise

    . Pr(z_i = 1 | w_i) = Φ(w_i'γ)

    . Pr(z_i = 0 | w_i) = 1 − Φ(w_i'γ)

    Regression model

    y_i = x_i'β + ε_i,

    where y_i is observed only when z_i is equal to one (otherwise data are not available)

    . (u_i, ε_i) ∼ bivariate normal[0, 0, 1, σ_ε, ρ]


  • Truncation and censoring Sample selection

    Estimation

    Least squares using the observed data produces inconsistent estimates of β (omitted variable)

    Least squares regression of y on x and λ would be a consistent estimator

    . However, even if λ_i were observed, OLS would be inefficient: the υ_i are heteroskedastic

    Maximum likelihood estimation can be applied

    Heckman (1979) proposed a two-step procedure


  • Truncation and censoring Sample selection

    Maximum likelihood estimation

    The log-likelihood for observation i, log L_i = l_i, can be written as:

    . If y_i is not observed:

      l_i = log Φ(−w_i'γ)

    . If y_i is observed:

      l_i = log Φ( [w_i'γ + (y_i − x_i'β)ρ/σ_ε] / √(1 − ρ²) ) − (1/2) ((y_i − x_i'β)/σ_ε)² − log(√(2π) σ_ε)

    σ_ε and ρ are not estimated directly (σ_ε must be positive and ρ must lie between −1 and 1)

    Instead, log σ_ε and atanh ρ are estimated:

    atanh ρ = (1/2) log[(1 + ρ)/(1 − ρ)]

    Estimation would be simplified if ρ = 0


  • Truncation and censoring Sample selection

    Two-step procedureHeckman (1979)

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i

    1 Estimate the probit equation by MLE to obtain estimates of γ. For each observation in the selected sample, compute λ̂_i (the inverse Mills ratio)

    2 Estimate β and β_λ = ρ σ_ε by least squares regression of y on x and λ̂
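    An added sketch of the two steps on simulated data (hypothetical variable names and parameters, not the author's code), using statsmodels' Probit and OLS:

```python
# Added sketch of the Heckman two-step estimator on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 5000
w = sm.add_constant(rng.normal(size=(n, 2)))   # selection regressors: const, w1, w2
x = w[:, :2]                                   # outcome regressors: const, w1 (w2 excluded)
gamma, beta = np.array([0.2, 1.0, 1.0]), np.array([1.0, 2.0])
rho, sigma_e = 0.7, 1.0
u = rng.normal(size=n)
eps = rho * sigma_e * u + sigma_e * np.sqrt(1 - rho**2) * rng.normal(size=n)
z = (w @ gamma + u > 0)                        # selection indicator z_i
y = x @ beta + eps                             # y_i used only when z_i = 1

# Step 1: probit of z on w; inverse Mills ratio evaluated at w_i'gamma_hat
probit = sm.Probit(z.astype(int), w).fit(disp=0)
mills = norm.pdf(w @ probit.params) / norm.cdf(w @ probit.params)

# Step 2: OLS of y on x and the estimated inverse Mills ratio, selected sample only
ols = sm.OLS(y[z], np.column_stack([x[z], mills[z]])).fit()
print(ols.params)                              # ~ [1.0, 2.0, rho*sigma_e = 0.7]
```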


  • Truncation and censoring Sample selection

    Estimators of the variance and standard errors

    Second-step standard errors need to be adjusted to account for the first-step estimation

    The estimate of σ_ε needs to be adjusted:

    . At each observation, the true conditional variance of the disturbance is σ_i² = σ_ε²(1 − ρ² δ_i)

    . A consistent estimator of σ_ε² is given by

      σ̂_ε² = (1/n) e'e + δ̄ b_λ²

      where e is the vector of second-step residuals, δ̄ is the sample average of the estimated δ_i, and b_λ is the estimated coefficient on λ̂

    To test hypotheses, an estimate of the asymptotic covariance matrix of the coefficients (including β_λ) is needed

    . Two problems arise: (1) the disturbance term υ_i is heteroskedastic; (2) there are unknown parameters in λ_i

    . The formulas are rather cumbersome, but they can be calculated using the matrix of independent variables, the sample estimates of σ_ε² and ρ, and the values of λ_i and δ_i (treated as known)


  • Truncation and censoring Sample selection

    Two-step procedure
    Discussion

    Identification: exclusion restriction

    . Although the inverse Mills ratio is nonlinear in the single index w_i'γ, the function mapping this index into the inverse Mills ratio is approximately linear over certain ranges of the index

    . Accordingly, the inclusion of additional variables in w_i in the first step can be important for identification of the second-step estimates

    . In the real world, there are few candidates for simultaneous inclusion in w_i and exclusion from x_i

    The inclusion of the inverse Mills ratio in the equation of interest is driven by the normality assumption

    . Recent research includes specific attempts to move away from the normality assumption:

      y_i | z*_i > 0 = x_i'β + µ(w_i'γ) + υ_i

      where µ(w_i'γ) is called the “selectivity correction”


  • Truncation and censoring Sample selection

    Selection in qualitative response models

    The problem of sample selection has been modeled in other settings besides the linear regression model

    Binary choice models have been considered, as well as count data models

    For example, in the case of the Poisson model:

    . y_i | ε_i ∼ Poisson(λ_i)

    . log λ_i = x_i'β + ε_i

    . (y_i, x_i) are observed only when z_i = 1, where z*_i = w_i'γ + u_i and z_i = 1 if z*_i > 0, 0 otherwise

    . Assume that (ε_i, u_i) have a bivariate normal distribution with nonzero correlation

    . Selection affects the mean (and the variance) of y_i and, in the observed data, y_i no longer has a Poisson distribution
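    An added simulation sketch (hypothetical parameter values) showing that, with correlated (ε_i, u_i), the mean of y_i in the observed sample is shifted relative to the full population:

```python
# Added illustration: selection correlated with eps shifts the observed mean of y.
import numpy as np

rng = np.random.default_rng(6)
n, beta, gamma, rho = 1_000_000, 0.5, 1.0, 0.8
x, w, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
eps = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = rng.poisson(np.exp(x * beta + eps))
z = w * gamma + u > 0                  # z_i = 1: (y_i, x_i) observed

print(y.mean(), y[z].mean())           # the selected-sample mean is shifted upward
```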


    Truncation and censoring
      Truncation
      Censored data
      Sample selection