
  • Truncation and Censoring

    Laura Magazzini

    [email protected]


  • Truncation and censoring

    Truncation and censoring

    Truncation: sample data are drawn from a subset of a larger population of interest

    . Characteristic of the distribution from which the sample data are drawn

    . Example: studies of income based on incomes above or below the poverty line (of limited usefulness for inference about the whole population)

    Censoring: values of the dependent variable in a certain range are all transformed to (or reported at) a single value

    . Defect in the sample data

    . Example: in studies of income, people below the poverty line are reported at the poverty line

    Truncation and censoring introduce similar distortion into conventional statistical results


  • Truncation and censoring Truncation

    Truncation

    Aim: infer the characteristics of a full population from a sample drawn from a restricted population

    . Example: characteristics of people with income above $100,000

    Let y be a continuous random variable with pdf f(y). The conditional distribution of y given y > a, where a is a constant, is:

    f(y | y > a) = f(y) / Pr(y > a)

    If y is normally distributed:

    f(y | y > a) = [ (1/σ) φ((y − µ)/σ) ] / [1 − Φ(α)]

    where α = (a − µ)/σ


  • Truncation and censoring Truncation

    Moments of truncated distributions

    E(y | y < a) < E(y)
    E(y | y > a) > E(y)
    V(y | truncation) < V(y)


  • Truncation and censoring Truncation

    Moments of the truncated normal distribution

    Let y ∼ N(µ, σ²) and let a be a constant

    E(y | truncation) = µ + σλ(α)
    Var(y | truncation) = σ²[1 − δ(α)]

    . α = (a − µ)/σ

    . φ(α) is the standard normal density

    . λ(α) is called the inverse Mills ratio:

      λ(α) = φ(α)/[1 − Φ(α)]   if truncation is y > a
      λ(α) = −φ(α)/Φ(α)        if truncation is y < a

    . δ(α) = λ(α)[λ(α)− α], where 0 < δ(α) < 1 for any α
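    A quick numerical check of these formulas (added here, not part of the original slides): the sketch computes λ(α) and δ(α) with scipy and compares the implied moments with scipy's built-in truncated normal; the values of µ, σ, and a are arbitrary.

```python
# Added check of the truncated-normal moment formulas (illustrative values).
import numpy as np
from scipy.stats import norm, truncnorm

mu, sigma, a = 0.0, 1.0, 0.5                     # truncation from below: y > a
alpha = (a - mu) / sigma

lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))    # inverse Mills ratio lambda(alpha)
delta = lam * (lam - alpha)                      # delta(alpha), between 0 and 1

mean_formula = mu + sigma * lam                  # E(y | truncation)
var_formula = sigma**2 * (1 - delta)             # Var(y | truncation)

# scipy's truncnorm takes the truncation points in standardized units
tn = truncnorm(a=alpha, b=np.inf, loc=mu, scale=sigma)
print(mean_formula, tn.mean())                   # the two should agree
print(var_formula, tn.var())                     # the two should agree
```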


  • Truncation and censoring Truncation

    Example: a truncated log-normal income distribution

    From the New York Post (1987): “The typical upper affluent American... makes $142,000 per year... The people surveyed had household income of at least $100,000”. Does this tell us anything about the typical American?

    “... only 2 percent of Americans make the grade”

    . Degree of truncation in the sample: 98%

    . The $142,000 figure is probably quite far from the mean in the full population

    Assuming income is lognormally distributed in the population (the log of income has a normal distribution), this information can be used to deduce the population mean income

    Let x = income and y = ln x

    E[y | y > ln 100] = µ + σ φ(α)/[1 − Φ(α)]

    Substituting into E[x] = E[e^y] = e^(µ + σ²/2), we get E[x] ≈ $22,087

    . The 1987 Statistical Abstract of the US listed average household income of about $25,000 (a relatively good estimate based on little information!)
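    The slide does not show the intermediate steps; the added sketch below reproduces a figure close to $22,087 under the assumptions that logs are natural, that $142,000 is treated as the mean of the truncated distribution, and that the truncation probability is exactly 2%.

```python
# Added back-of-the-envelope reconstruction: recover (mu, sigma) of log-income
# from the two published facts and deduce the population mean income.
# Amounts are in thousands of dollars.
import numpy as np
from scipy.stats import norm

trunc_point = np.log(100)      # only incomes above $100,000 were surveyed
trunc_mean = np.log(142)       # mean log-income in the truncated sample (approx.)
p_above = 0.02                 # "only 2 percent make the grade"

alpha = norm.ppf(1 - p_above)                  # (ln 100 - mu) / sigma
lam = norm.pdf(alpha) / (1 - norm.cdf(alpha))  # inverse Mills ratio

# Two equations: mu + sigma*lam = ln 142 and mu + sigma*alpha = ln 100
sigma = (trunc_mean - trunc_point) / (lam - alpha)
mu = trunc_point - sigma * alpha

mean_income = np.exp(mu + sigma**2 / 2)        # E[x] for a lognormal
print(round(mean_income, 1))                   # roughly 22, i.e. about $22,000
```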


  • Truncation and censoring Truncation

    The truncated regression model

    y*_i = x_i'β + ε_i,   with ε_i | x_i ∼ N(0, σ²)

    Unit i is observed only if y*_i crosses a threshold a:

    y_i = n.a.    if y*_i ≤ a
    y_i = y*_i    if y*_i > a

    E[y_i | y*_i > a] = x_i'β + σλ(α_i),   with α_i = (a − x_i'β)/σ


  • Truncation and censoring Truncation

    OLS estimation

    OLS of y on x leads to inconsistent estimates

    . The model is y_i | y*_i > a = E(y_i | y*_i > a) + υ_i = x_i'β + σλ(α_i) + υ_i

    . By construction, the error term υ_i is heteroskedastic

    . Omitted variable bias (λ_i is not included in the regression)

    . In applications, the OLS estimates are usually found to be biased toward zero: the marginal effect in the subpopulation is

      ∂E[y_i | y*_i > a]/∂x_i = β + σ (dλ(α_i)/dα_i) ∂α_i/∂x_i = ... = β[1 − δ(α_i)]

    – Since 0 < δ(α_i) < 1, the marginal effect in the subpopulation is less than the corresponding coefficient


  • Truncation and censoring Truncation

    Maximum likelihood estimation

    Under the normality assumption, MLE provides a consistent estimator

    . For each observation:

      f(y_i | y*_i > a) = [ (1/σ) φ((y_i − x_i'β)/σ) ] / [1 − Φ(α_i)],   with α_i = (a − x_i'β)/σ

    . The log-likelihood can be written as

      log L = Σ_{i=1}^N log[ (1/σ) φ((y_i − x_i'β)/σ) ] − Σ_{i=1}^N log[ 1 − Φ((a − x_i'β)/σ) ]
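    The simulated example on the following slides uses Stata's truncreg; as an added illustration, the sketch below maximizes the same log-likelihood with scipy on simulated data and contrasts the result with OLS on the truncated sample (all parameter values are arbitrary).

```python
# Added sketch: ML estimation of a truncated regression on simulated data.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
n, beta_true, sigma_true, a = 5000, np.array([1.0, 2.0]), 1.0, 1.5
x = np.column_stack([np.ones(n), rng.normal(size=n)])
y_star = x @ beta_true + sigma_true * rng.normal(size=n)
keep = y_star > a                         # truncation: units below a are never sampled
y, X = y_star[keep], x[keep]

def neg_loglik(theta):
    beta, log_sigma = theta[:-1], theta[-1]
    sigma = np.exp(log_sigma)             # keep sigma positive
    resid = (y - X @ beta) / sigma
    # log f(y_i | y*_i > a) = log[phi(resid)/sigma] - log[1 - Phi((a - x_i'b)/sigma)]
    ll = norm.logpdf(resid) - np.log(sigma) - norm.logsf((a - X @ beta) / sigma)
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[-1]))            # close to beta_true and sigma_true
print(np.linalg.lstsq(X, y, rcond=None)[0])    # OLS on the truncated sample: biased
```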


  • Truncation and censoring Truncation

    Example: simulated data
    If y* is fully observed, OLS can be applied


  • Truncation and censoring Truncation

    Example: simulated data
    However, only y* > a is included in the sample


  • Truncation and censoring Truncation

    Example: simulated data
    OLS on the observed sample is biased


  • Truncation and censoring Truncation

    Example: simulated data
    MLE (truncreg) yields a consistent estimate of β


  • Truncation and censoring Censored data

    Censored data

    Censored regression models generally apply when the variable to be explained is partly continuous but has positive probability mass at one or more points

    Assume there is a variable with quantitative meaning, y*, and we are interested in E[y*|x]

    If y* and x were observed for everyone in the population, standard regression methods (ordinary or nonlinear least squares) could be applied

    In the case of censored data, y* is not observable for part of the population

    . Conventional regression methods fail to account for the qualitative difference between limit (censored) and nonlimit (continuous) observations

    . Top coding / corner solution outcome


  • Truncation and censoring Censored data

    Top coding: example
    Data generating process

    Let wealth* denote actual family wealth, measured in thousands of dollars

    Suppose that wealth* follows the linear regression model E[wealth* | x] = x'β

    Censored data: we observe wealth* only when wealth* > 200

    . When wealth* is smaller than 200, we only know that it is below 200, not its actual value

    Therefore observed wealth can be written as

    wealth = max(wealth∗, 200)
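    A tiny numpy illustration of this censoring rule (added here, not part of the slides; the data-generating process is hypothetical):

```python
# Hypothetical illustration of the censoring rule above: actual wealth below
# 200 (thousand dollars) is recorded at 200, larger values are fully observed.
import numpy as np

rng = np.random.default_rng(1)
wealth_star = rng.lognormal(mean=5.0, sigma=0.6, size=10)   # assumed "true" wealth
wealth = np.maximum(wealth_star, 200)                       # observed, censored variable
print(np.round(wealth_star), np.round(wealth))
```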


  • Truncation and censoring Censored data

    Top coding: example
    Estimation of β

    We assume that wealth* given x has a homoskedastic normal distribution

    wealth* = x'β + ε,   ε | x ∼ N(0, σ²)

    Recorded wealth is: wealth = max(wealth*, 200) = max(x'β + ε, 200)

    β is estimated via maximum likelihood using a mixture of discrete and continuous distributions (details later...)


  • Truncation and censoring Censored data

    Example: seats demanded and tickets sold


  • Truncation and censoring Censored data

    The censored normal distribution

    y* ∼ N(µ, σ²)

    Observed data are censored at a = 0:

    y = 0     if y* ≤ 0
    y = y*    if y* > 0

    The distribution is a mixture of a discrete and a continuous distribution

    . If y* ≤ 0: f(y) = Pr(y = 0) = Pr(y* ≤ 0) = Φ(−µ/σ) = 1 − Φ(µ/σ)

    . If y* > 0: f(y) = (1/σ) φ((y − µ)/σ)

    E[y] = 0 × Pr(y = 0) + E[y | y > 0] × Pr(y > 0) = (µ + σλ)[1 − Φ(−µ/σ)] = (µ + σλ) Φ(µ/σ),   with λ = φ(µ/σ)/Φ(µ/σ)
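    As an added numerical check (with arbitrary µ and σ), the mean formula can be verified against a Monte Carlo sample of the censored variable:

```python
# Added check of the censored-normal mean:
# E[y] = (mu + sigma*lambda) * Phi(mu/sigma), lambda = phi(mu/sigma)/Phi(mu/sigma).
import numpy as np
from scipy.stats import norm

mu, sigma = 0.8, 1.3                       # arbitrary illustrative values
lam = norm.pdf(mu / sigma) / norm.cdf(mu / sigma)
mean_formula = (mu + sigma * lam) * norm.cdf(mu / sigma)

rng = np.random.default_rng(2)
y = np.maximum(rng.normal(mu, sigma, size=1_000_000), 0)   # censor at zero
print(mean_formula, y.mean())              # the two should be close
```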


  • Truncation and censoring Censored data

    The censored regression model
    Tobit model (Tobin, 1958)

    Let y* be a continuous (latent) variable:

    y*_i = x_i'β + ε_i,   where ε_i | x_i ∼ N(0, σ²)

    The observed data y are

    y_i = max(0, y*_i) = 0 if y*_i ≤ 0;  y*_i if y*_i > 0

    Why not OLS?

    Why not OLS on positive y∗?


  • Truncation and censoring Censored data

    MLE estimation

    As we assume ε_i | x_i ∼ N(0, σ²), the likelihood function can be written down

    The distribution is a mixture of a discrete and a continuous distribution

    . A positive probability is assigned to the observations y_i = 0:

      Pr(y_i = 0 | x_i) = Pr(y*_i ≤ 0 | x_i) = Pr(x_i'β + ε_i ≤ 0) = Pr(ε_i ≤ −x_i'β) = 1 − Pr(ε_i < x_i'β) = 1 − Φ(x_i'β/σ)

    . For y*_i > 0: f(y_i) = (1/σ) φ((y_i − x_i'β)/σ)


  • Truncation and censoring Censored data

    MLE estimation

    The likelihood can be written as:

    L(β, σ² | y) = ∏_{y_i = 0} [1 − Φ(x_i'β/σ)] ∏_{y_i > 0} (1/σ) φ((y_i − x_i'β)/σ)

                 = ∏_{y_i = 0} [1 − Φ(x_i'β/σ)] ∏_{y_i > 0} (1/√(2πσ²)) exp( −(1/2) ((y_i − x_i'β)/σ)² )

    In the case of censored data, the β estimated from the Tobit model can be employed to study the effect of x on E[y*|x]
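    The simulated example on the following slides uses Stata's tobit; the added sketch below codes the same censored and uncensored likelihood contributions with scipy on simulated data (arbitrary parameter values).

```python
# Added sketch: Tobit maximum likelihood on simulated data censored at zero.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(3)
n, beta_true, sigma_true = 5000, np.array([-1.0, 2.0]), 1.0
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = np.maximum(X @ beta_true + sigma_true * rng.normal(size=n), 0)   # censored at 0
cens = y == 0

def neg_loglik(theta):
    beta, sigma = theta[:-1], np.exp(theta[-1])
    xb = X @ beta
    ll = np.where(cens,
                  norm.logcdf(-xb / sigma),                            # Pr(y_i = 0)
                  norm.logpdf((y - xb) / sigma) - np.log(sigma))       # density for y_i > 0
    return -ll.sum()

res = minimize(neg_loglik, x0=np.zeros(3), method="BFGS")
print(res.x[:2], np.exp(res.x[-1]))            # close to beta_true and sigma_true
print(np.linalg.lstsq(X, y, rcond=None)[0])    # OLS on censored y: biased toward zero
```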


  • Truncation and censoring Censored data

    Example: simulated data
    If y* is fully observed, OLS can be applied


  • Truncation and censoring Censored data

    Example: simulated data
    However, if y* ≤ a, data are recorded as a


  • Truncation and censoring Censored data

    Example: simulated data
    OLS on the observed sample is biased


  • Truncation and censoring Censored data

    Example: simulated data
    MLE (tobit) yields a consistent estimate of β


  • Truncation and censoring Censored data

    Corner solution outcomes

    Still labeled “censored regression models”

    Pioneering work by Tobin (1958): household purchases of durable goods

    Let y be an observable choice or outcome describing some economic agent, such as an individual or a firm, with the following characteristics: y takes on the value zero with positive probability but is a continuous random variable over strictly positive values

    . Examples: amount of life insurance coverage chosen by an individual, family contributions to an individual retirement account, and firm expenditures on research and development

    . We can imagine economic agents solving an optimization problem, and for some agents the optimal choice will be the corner solution, y = 0

    . The issue here is not data observability, but rather individual behaviour

    . We are interested in features of the distribution of y given x, such as E[y|x] and Pr(y = 0|x)


  • Truncation and censoring Censored data

    Marginal effect in the tobit model

    In the case of a corner solution outcome, the estimated β is not sufficient, since E[y|x] and E[y|x, y > 0] depend on β in a non-linear way

    ∂E[y_i | x_i]/∂x_i = Φ(x_i'β/σ) β

    ∂E[y_i | x_i]/∂x_i = Pr(y_i > 0) ∂E[y_i | x_i, y_i > 0]/∂x_i + E[y_i | x_i, y_i > 0] ∂Pr(y_i > 0)/∂x_i

    A change in x_i has two effects:

    (1) It affects the conditional mean of y*_i in the positive part of the distribution

    (2) It affects the probability that the observation will fall in the positive part of the distribution
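    An added illustration of the first formula, Φ(x'β/σ)β, at a hypothetical parameter vector and evaluation point, checked against a finite-difference derivative of E[y|x]:

```python
# Added check: Tobit marginal effect on E[y|x] for censoring at zero.
import numpy as np
from scipy.stats import norm

beta, sigma = np.array([-1.0, 2.0]), 1.0      # assumed (e.g. estimated) parameters
x = np.array([1.0, 0.7])                      # evaluation point (constant, regressor)

def e_y(xv):
    xb = xv @ beta
    lam = norm.pdf(xb / sigma) / norm.cdf(xb / sigma)
    return norm.cdf(xb / sigma) * (xb + sigma * lam)   # E[y|x]

marg_formula = norm.cdf(x @ beta / sigma) * beta[1]    # d E[y|x] / d x_1
eps = 1e-6
marg_numeric = (e_y(x + np.array([0.0, eps])) - e_y(x)) / eps
print(marg_formula, marg_numeric)                      # should agree
```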


  • Truncation and censoring Censored data

    Some issues in specification

    Heteroskedasticity

    . MLE is inconsistent

    . However, the problem can be approached directly, with σ_i in the likelihood function instead of σ. Specifying a particular model for σ_i provides the empirical model for estimation

    Misspecification of Pr(y* < 0)

    . In the Tobit model, a variable that increases the probability of an observation being a non-limit observation also increases the mean of the variable

    - Example: loss due to fire in buildings

    . A more general model has been devised, involving a decision equation and a regression equation for nonlimit observations

    Non-normality

    . MLE is inconsistent

    . Research is ongoing both on alternative estimators and on methods for testing this type of misspecification


  • Truncation and censoring Sample selection

    Sample selection

    What if observation is driven by a different process?

    (1) Data observability

    . Saving function (in the population):
      saving = β0 + β1 income + β2 age + β3 married + β4 kids + u

    . Survey data only includes families whose household head was 45 years of age or older

    (2) Individual behaviour (Boyes, Hoffman, Low, 1989; Greene, 1992)

    . y1 = 1 if individual i defaults on a loan/credit card, 0 otherwise

    . y2 = 1 if individual i is granted a loan/credit card, 0 otherwise

    . For a given individual, y1 is not observed unless y2 equals 1


  • Truncation and censoring Sample selection

    Sample selection / incidental truncation

    Let y and z have a bivariate distribution with correlation ρ

    We are interested in the distribution of y given that another variable z exceeds a particular value

    . Intuition: if y and z are positively correlated, then the truncation of z should push the distribution of y to the right

    The truncated joint distribution is

    f(y, z | z > a) = f(y, z) / Pr(z > a)

    To obtain the incidentally truncated marginal density of y, we should integrate z out of this expression


  • Truncation and censoring Sample selection

    Moments of the incidentally truncated bivariate normal distribution

    Let y and z have a bivariate normal distribution with means µ_y and µ_z, standard deviations σ_y and σ_z, and correlation ρ

    E[y | z > a] = µ_y + ρ σ_y λ(α_z)
    V[y | z > a] = σ_y²[1 − ρ² δ(α_z)]

    . α_z = (a − µ_z)/σ_z

    . λ(α_z) = φ(α_z)/[1 − Φ(α_z)]

    . δ(α_z) = λ(α_z)[λ(α_z) − α_z]

    If the truncation is z < a, then λ(αz) = −φ(αz)/Φ(αz)
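    An added Monte Carlo check of these two formulas, with arbitrary illustrative parameter values:

```python
# Added check of the incidental-truncation moment formulas.
import numpy as np
from scipy.stats import norm

mu_y, mu_z, s_y, s_z, rho, a = 1.0, 0.0, 2.0, 1.0, 0.6, 0.5
alpha_z = (a - mu_z) / s_z
lam = norm.pdf(alpha_z) / (1 - norm.cdf(alpha_z))
delta = lam * (lam - alpha_z)

rng = np.random.default_rng(4)
z = mu_z + s_z * rng.normal(size=2_000_000)
y = mu_y + rho * s_y / s_z * (z - mu_z) + s_y * np.sqrt(1 - rho**2) * rng.normal(size=z.size)
y_sel = y[z > a]                                      # incidental truncation on z

print(mu_y + rho * s_y * lam, y_sel.mean())           # E[y | z > a]
print(s_y**2 * (1 - rho**2 * delta), y_sel.var())     # V[y | z > a]
```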


  • Truncation and censoring Sample selection

    Example: A model of labor supply

    Consider a population of women where only a subsample is engaged in market employment

    We are interested in identifying the determinants of the labor supply of all women

    A simple model of female labor supply consists of 2 equations

    (1) Wage equation: the difference between a person’s market wage and her reservation wage, as a function of characteristics such as age, education, number of children, ... plus unobservables

    (2) Hours equation: the desired number of labor hours supplied depends on the wage, home characteristics (e.g. presence of small children), marital status, ... plus unobservables

    Truncation: Equation 2 describes desired hours, but an actual figure is observed only if the individual is working, i.e. when the market wage exceeds the reservation wage

    The hours variable is incidentally truncated


  • Truncation and censoring Sample selection

    Example: A model of labor supply
    When is OLS on the working sample appropriate?

    Assume working women are chosen randomly

    If the working subsample has endowments of characteristics (both observed and unobserved) similar to the nonworking sample, OLS is an option

    BUT the decision to work is not random: the working and nonworking samples potentially have different characteristics

    . When the relationship runs purely through observables, appropriate conditioning variables can be included in the relevant equation

    . If unobservable characteristics affecting the work decision are correlated with the unobservable characteristics affecting the wage, then a relationship is created that cannot be tackled by including appropriate controls

    . A bias is induced due to “sample selection”


  • Truncation and censoring Sample selection

    Regression in a model of selection (1)

    Equation that determines sample selection

    z*_i = w_i'γ + u_i

    The equation of primary interest is

    y_i = x_i'β + ε_i

    where y_i is observed only when z*_i is greater than zero (otherwise data are not available)

    . This model is closely related to the Tobit model, although it is less restrictive: the parameters explaining the censoring are not constrained to equal those explaining the variation in the observed dependent variable. For this reason the model is also known as the Tobit type two model.


  • Truncation and censoring Sample selection

    Regression in a model of selection (2)

    If u_i and ε_i have a bivariate normal distribution with zero means and correlation ρ,

    E[y_i | y_i is observed] = E[y_i | z*_i > 0]
                             = E[y_i | u_i > −w_i'γ]
                             = x_i'β + E[ε_i | u_i > −w_i'γ]
                             = x_i'β + ρ σ_ε λ_i(α_u)

    where α_u = −w_i'γ/σ_u and λ(α_u) = φ(α_u)/[1 − Φ(α_u)] = φ(w_i'γ/σ_u)/Φ(w_i'γ/σ_u)

    So, the regression model can be written as

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i


  • Truncation and censoring Sample selection

    Regression in a model of selection (3)

    E[y_i | z*_i > 0] = x_i'β + ρ σ_ε λ_i(α_u)

    OLS regression using the observed data leads to inconsistent estimates (omitted variable bias)

    The marginal effect of the regressors on y_i in the observed sample consists of two components:

    . Direct effect on the mean of y_i (β)

    . In addition, if the variable appears in the probability that z*_i is positive, it also influences y_i through its presence in λ_i:

      ∂E[y_i | z*_i > 0]/∂x_ik = β_k − γ_k (ρ σ_ε / σ_u) δ_i(α_u)

    Most often z*_i is not observed; rather, we can infer its sign but not its magnitude

    . Since there is no information on the scale of z*, the disturbance variance in the selection equation cannot be estimated (we set σ_u² = 1)


  • Truncation and censoring Sample selection

    Regression in a model of selection (4)

    Selection mechanisms

    z*_i = w_i'γ + u_i,

    where we observe z_i = 1 if z*_i > 0 and 0 otherwise

    . Pr(z_i = 1 | w_i) = Φ(w_i'γ)

    . Pr(z_i = 0 | w_i) = 1 − Φ(w_i'γ)

    Regression model

    y_i = x_i'β + ε_i,

    where y_i is observed only when z_i is equal to one (otherwise data are not available)

    . (u_i, ε_i) ∼ bivariate normal[0, 0, 1, σ_ε, ρ]


  • Truncation and censoring Sample selection

    Estimation

    Least squares using the observed data produces inconsistent estimates of β (omitted variable)

    Least squares regression of y on x and λ would be a consistent estimator

    . However, even if λ_i were observed, OLS would be inefficient: the υ_i are heteroskedastic

    Maximum likelihood estimation can be applied

    Heckman (1979) proposed a two-step procedure


  • Truncation and censoring Sample selection

    Maximum likelihood estimation

    The log-likelihood for observation i, log L_i = l_i, can be written as:

    . If y_i is not observed:

      l_i = log Φ(−w_i'γ)

    . If y_i is observed:

      l_i = log Φ( [w_i'γ + (y_i − x_i'β)ρ/σ_ε] / √(1 − ρ²) ) − (1/2) ((y_i − x_i'β)/σ_ε)² − log(√(2π) σ_ε)

    σ_ε and ρ are not estimated directly (σ_ε must be positive and ρ must lie between −1 and 1)

    Instead, log σ_ε and atanh ρ are estimated:

    atanh ρ = (1/2) log[(1 + ρ)/(1 − ρ)]

    Estimation would be simplified if ρ = 0


  • Truncation and censoring Sample selection

    Two-step procedureHeckman (1979)

    y_i | z*_i > 0 = E[y_i | z*_i > 0] + υ_i = x_i'β + ρ σ_ε λ_i(α_u) + υ_i

    1 Estimate the probit equation by MLE to obtain estimates of γ. For each observation in the selected sample, compute λ̂_i (the inverse Mills ratio)

    2 Estimate β and β_λ = ρ σ_ε by least squares regression of y on x and λ̂
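    An added sketch of the two steps on simulated data (hypothetical variable names and parameters, not the author's code), using statsmodels' Probit and OLS:

```python
# Added sketch of the Heckman two-step estimator on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(5)
n = 5000
w = sm.add_constant(rng.normal(size=(n, 2)))   # selection regressors: const, w1, w2
x = w[:, :2]                                   # outcome regressors: const, w1 (w2 excluded)
gamma, beta = np.array([0.2, 1.0, 1.0]), np.array([1.0, 2.0])
rho, sigma_e = 0.7, 1.0
u = rng.normal(size=n)
eps = rho * sigma_e * u + sigma_e * np.sqrt(1 - rho**2) * rng.normal(size=n)
z = (w @ gamma + u > 0)                        # selection indicator z_i
y = x @ beta + eps                             # y_i used only when z_i = 1

# Step 1: probit of z on w; inverse Mills ratio evaluated at w_i'gamma_hat
probit = sm.Probit(z.astype(int), w).fit(disp=0)
mills = norm.pdf(w @ probit.params) / norm.cdf(w @ probit.params)

# Step 2: OLS of y on x and the estimated inverse Mills ratio, selected sample only
ols = sm.OLS(y[z], np.column_stack([x[z], mills[z]])).fit()
print(ols.params)                              # ~ [1.0, 2.0, rho*sigma_e = 0.7]
```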


  • Truncation and censoring Sample selection

    Estimators of the variance and standard errors

    Second-step standard errors need to be adjusted to account for the first-step estimation

    The estimate of σ_ε needs to be adjusted:

    . At each observation, the true conditional variance of the disturbance is σ_i² = σ_ε²(1 − ρ² δ_i)

    . A consistent estimator of σ_ε² is given by

      σ̂_ε² = (1/n) e'e + δ̄ b_λ²

      where e is the vector of second-step residuals, δ̄ is the sample average of the estimated δ_i, and b_λ is the estimated coefficient on λ̂

    To test hypotheses, an estimate of the asymptotic covariance matrix of the coefficients (including β_λ) is needed

    . Two problems arise: (1) the disturbance term υ_i is heteroskedastic; (2) there are unknown parameters in λ_i

    . The formulas are rather cumbersome, but they can be calculated using the matrix of independent variables, the sample estimates of σ_ε² and ρ, and the values of λ_i and δ_i (treated as known)


  • Truncation and censoring Sample selection

    Two-step procedure
    Discussion

    Identification: exclusion restriction

    . Although the inverse Mills ratio is nonlinear in the single index w_i'γ, the function mapping this index into the inverse Mills ratio is approximately linear over certain ranges of the index

    . Accordingly, the inclusion of additional variables in w_i in the first step can be important for identification of the second-step estimates

    . In the real world, there are few candidates for simultaneous inclusion in w_i and exclusion from x_i

    The inclusion of the inverse Mills ratio in the equation of interest is driven by the normality assumption

    . Recent research includes specific attempts to move away from the normality assumption:

      y_i | z*_i > 0 = x_i'β + µ(w_i'γ) + υ_i

      where µ(w_i'γ) is called the “selectivity correction”


  • Truncation and censoring Sample selection

    Selection in qualitative response models

    The problem of sample selection has been modeled in other settings besides the linear regression model

    Binary choice models have been considered, as well as count data models

    For example, in the case of the Poisson model:

    . y_i | ε_i ∼ Poisson(λ_i)

    . log λ_i = x_i'β + ε_i

    . (y_i, x_i) are observed only when z_i = 1, where z*_i = w_i'γ + u_i and z_i = 1 if z*_i > 0, 0 otherwise

    . Assume that (ε_i, u_i) have a bivariate normal distribution with nonzero correlation

    . Selection affects the mean (and the variance) of y_i and, in the observed data, y_i no longer has a Poisson distribution
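    An added simulation sketch (hypothetical parameter values) showing that, with correlated (ε_i, u_i), the mean of y_i in the observed sample is shifted relative to the full population:

```python
# Added illustration: selection correlated with eps shifts the observed mean of y.
import numpy as np

rng = np.random.default_rng(6)
n, beta, gamma, rho = 1_000_000, 0.5, 1.0, 0.8
x, w, u = rng.normal(size=n), rng.normal(size=n), rng.normal(size=n)
eps = rho * u + np.sqrt(1 - rho**2) * rng.normal(size=n)
y = rng.poisson(np.exp(x * beta + eps))
z = w * gamma + u > 0                  # z_i = 1: (y_i, x_i) observed

print(y.mean(), y[z].mean())           # the selected-sample mean is shifted upward
```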


    Truncation and censoring
      Truncation
      Censored data
      Sample selection