ECO 7377: Microeconometrics

Daniel L. Millimet

Southern Methodist University

Fall 2011

DL Millimet (SMU) ECO 7377 Fall 2011 1 / 407


Introduction

Applied research in economics can be loosely classified into two types:

1. Descriptive analysis
2. Causal analysis

While the first is important and useful, the second is of primary interest.

Causal analysis is needed to predict the impact of changing circumstances or policies, or to evaluate existing policies or interventions.

Questions that need to be answered before conducting, or when reviewing, causal analyses:

1. What is the causal relationship of interest? [Is it economically interesting?]
2. What is the identification strategy?
3. What is the method of statistical inference?


Several statistical issues are confronted when answering these questions in economic research:

Specification of the causal relationship of interest entails more than just defining x and y ... lots of parameters could be estimated

- Heterogeneous vs. homogeneous effects
- Know what you are estimating
- To whom does it apply?
- What question does it answer?

Statistical inference is often difficult and overlooked

- Spherical vs. non-spherical errors
- Derivation/computation of estimated asymptotic variances of estimators


Identification of the causal relationship of interest frequently encounters

- Selection issues
  - Self-selection (endogeneity)
  - Sample selection (missing data, attrition)
- Measurement issues
  - Classical vs. non-classical error
  - Dependent vs. independent variable
  - Continuous vs. discrete variables
- Modeling issues
  - Functional form (P, SNP, NP)
  - Role of space (spillovers, spatial correlation)
  - Consistency with theory


Dissertation considerations (applied work):

What's the question? Is it economically interesting?

What's the identification strategy (if the question is causal)?

- Selection on observables vs. unobservables
- Parameter of interest

What's the data requirement? Is it feasible?

Has it been done? Is there value added?

- Tension between "hot" topics and ability to contribute


Dissertation Writing Advice

Be organized

- Outline the paper before writing
- Most papers have a common structure
  - Abstract: Very important. Be concise. No abbreviations or notation. Include the motivation and the punchline.
  - Intro: Outline the question. Explain why we care, and what is new in the paper. Give a slightly longer summary than the abstract of what is done in the paper, and emphasize the major findings.
  - Lit review (may be incorporated in the intro if short)
  - Theoretical model: Be only as complicated as necessary. Understand the ramifications of assumptions. If the innovation is in the empirics, theory is only needed if it adds something not well understood.
  - Empirical model: Be clear. Understand where identification comes from. Consider relevant specification tests. Acknowledge deficiencies and the circumstances under which estimates are inconsistent.
  - Data: Explain the sample selection criteria and variables used. If building on an existing literature, note any differences between the sample selection criteria and those used in existing papers.
  - Results: Be sure to spend enough time discussing the actual results. If results differ from the existing literature, try to pin down the reason(s) why.
  - Conclusion: Emphasize the importance of new findings, as well as shortcomings of the current paper. Discuss potential future work still to be done. End on a positive note.
- Put discussions in relevant sections
  - Avoid discussing the same point in multiple locations
  - Discuss data in the data section; discuss results in the results section; most econometric issues belong in the empirical model section


Be considerate to your readers

- Invest the time to proofread the paper many times; if you are unwilling to go through your paper carefully, why should others invest their time?
  - Pascal: "The letter I have written today is longer than usual because I lacked the time to make it shorter."
  - Quintilian: "One should aim not at being possible to understand, but at being impossible to misunderstand."
- Spell check, grammar check, check formatting issues, check spacing, check indenting, etc.
- Define notation, abbreviations, etc.
- Avoid redundant notation, excessive notation, awkward notation, etc.
- Avoid overly critical remarks about other papers; other authors are not idiots, and may be your referees
- Tables should be easy to read and self-explanatory (the need to refer back to the text should be kept to a minimum); include notes under the tables to explain things; avoid using abbreviations for variable names unless necessary
- References should be double-checked; be sure they are accurate and all are included in the bibliography


Be professional (this is not a term paper)

- Avoid unsubstantiated claims, sweeping or grand statements, and generalizations
- Be upfront; do not hide assumptions/restrictions hoping they will be overlooked, and justify their use
- Do not be unnecessarily complex in order to feel smart or show off (see Siegfried 1970)
  - Da Vinci: "Simplicity is the ultimate sophistication."
  - Einstein: "Any fool can make things bigger, more complex, and more violent. It takes a touch of genius, and a lot of courage, to move in the opposite direction."
  - Fowler: "Any one who wishes to become a good writer should endeavour, before he allows himself to be tempted by the more showy qualities, to be direct, simple, brief, vigorous, and lucid."
  - Mingus: "Making the simple complicated is commonplace; making the complicated simple, awesomely simple, that's creativity."
  - Jefferson: "The most valuable of all talents is that of never using two words when one will do."
- Avoid contractions
- Be consistent with the use of "I" or "we" if the paper uses first person, and consistent with present vs. past tense


Plagiarism

- Be careful, be ethical!
- Give credit where credit is due; cite others' ideas (in parentheses, not footnotes)
  - Milton: "Copy from one, it's plagiarism; copy from two, it's research."
  - Donatus: "Perish those who said our good things before we did."
  - Kuralt: "I could tell you which writer's rhythms I am imitating. It's not exactly plagiarism, it's falling in love with good language and trying to imitate it."
- Any statement in a paper should fit one of the following categories: (i) factual (agreeable to any reader), or (ii) debatable (but then references should be given in support, or it should be supported by the work done in the paper itself, or it should be written in the appropriate language: "If one believes X, then Y.")
  - But any statement should be in your own words, or should be in quotations


What to include?

- Dissertation chapters can/should be longer than papers submitted for publication
- Chapters may include greater detail on:
  - Literature review
  - Data construction
  - Empirical methodology


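Bootstrap: Introduction

General structure of estimation

$$\begin{array}{lcl} \text{population} & \Rightarrow & \theta \\ \quad\downarrow & & \\ \text{random sample} & \Rightarrow & \hat\theta \end{array}$$

Problem: $\hat\theta$ is an estimate; need to assess its distribution (dbn) for proper inference

Solutions

- Asymptotic theory
- Simulation methods $\Rightarrow$ bootstrap

Stata: -bootstrap-, -bsample-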


Idea

- Re-sample (with replacement) from the random sample multiple times and assess the dbn of the estimates

$$\begin{array}{lcl} \text{population} & \Rightarrow & \theta \\ \quad\downarrow & & \\ \text{random sample} & \Rightarrow & \hat\theta \\ \quad\downarrow & & \\ \text{bootstrap sample} & \Rightarrow & \hat\theta^* \end{array}$$

- Results in a vector of estimates, $\hat\theta^*_b$, $b = 1, \ldots, B$, where $B$ is the number of bootstrap repetitions

Many different bootstrap methods

- Parametric vs. nonparametric
- Resampling algorithms
  - iid
  - Block/cluster
  - Sub-sampling (M/N)
- Imposing the null or not imposing


Bootstrap: Confidence Intervals

Consider a regression model

$$y_i = x_i\beta + \varepsilon_i$$

Problem: given sample estimates, $\hat\beta$, need to obtain std errors or confidence intervals


There are two common sampling methods

1. Resampling the data
2. Resampling the errors

Data

- Resample (with replacement) observations $(y_i, x_i)$ $\Rightarrow$ $\{y^*_i, x^*_i\}_{i=1}^N$
- Estimate the original model (OLS) on the re-sampled data set $\Rightarrow$ $\hat\beta^*$
- Repeat $B$ times $\Rightarrow$ $\hat\beta^*_b$, $b = 1, \ldots, B$
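The data-resampling steps can be sketched in a few lines of Python. This is only an illustration under assumed inputs: the simulated data, the helper names `ols` and `pairs_bootstrap`, and the choice B = 500 are not part of the notes.

```python
import numpy as np

def ols(X, y):
    """OLS coefficients for y = X @ beta + e."""
    return np.linalg.lstsq(X, y, rcond=None)[0]

def pairs_bootstrap(X, y, B=500, seed=0):
    """Resample (y_i, x_i) pairs with replacement and re-estimate OLS B times."""
    rng = np.random.default_rng(seed)
    n = len(y)
    betas = np.empty((B, X.shape[1]))
    for b in range(B):
        idx = rng.integers(0, n, size=n)   # N row indices drawn with replacement
        betas[b] = ols(X[idx], y[idx])     # same model, re-sampled data
    return betas

# Simulated data (illustrative assumption): y = 1 + 2x + e
rng = np.random.default_rng(42)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat = ols(X, y)                       # original-sample estimate
beta_star = pairs_bootstrap(X, y, B=500, seed=1)   # B x K array of bootstrap estimates
```

Each row of `beta_star` plays the role of one bootstrap estimate; rows are kept together when resampling so that any dependence between the regressors and the error is preserved.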


Residuals

- Given $\hat\beta$ from OLS on the original sample, obtain residuals $\Rightarrow$ $\hat\varepsilon_i$, $i = 1, \ldots, N$
- Resample (with replacement) a vector of $N$ residuals $\Rightarrow$ $\hat\varepsilon^*_i$, $i = 1, \ldots, N$
  - This represents a random draw from the (nonparametric) empirical dbn of the residuals
- Alternative (parametric):
  - Estimate $\hat\sigma^2 = \frac{1}{N-K} \sum_i \hat\varepsilon_i^2$
  - Draw $N$ random numbers, $\hat\varepsilon^*_i$, $i = 1, \ldots, N$, from $N(0, \hat\sigma^2)$
- Generate $y^*_i = x_i\hat\beta + \hat\varepsilon^*_i$ (which imposes $\beta = \hat\beta$)
- Regress $y^*$ on $x$ by OLS $\Rightarrow$ $\hat\beta^*$
- Repeat $B$ times $\Rightarrow$ $\hat\beta^*_b$, $b = 1, \ldots, B$

Resampling data is typically preferred since it is less model dependent
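A minimal sketch of the residual bootstrap, covering both the nonparametric and the parametric variants. The function name `residual_bootstrap`, the `parametric` flag, and the simulated data are assumptions made for illustration.

```python
import numpy as np

def residual_bootstrap(X, y, B=500, parametric=False, seed=0):
    """Residual bootstrap for OLS; returns (beta_hat, B x K bootstrap estimates)."""
    rng = np.random.default_rng(seed)
    n, k = X.shape
    beta_hat = np.linalg.lstsq(X, y, rcond=None)[0]
    resid = y - X @ beta_hat
    sigma2 = resid @ resid / (n - k)               # sigma^2 estimate with N - K
    betas = np.empty((B, k))
    for b in range(B):
        if parametric:
            e_star = rng.normal(0.0, np.sqrt(sigma2), size=n)   # draw from N(0, sigma2)
        else:
            e_star = rng.choice(resid, size=n, replace=True)    # empirical dbn of residuals
        y_star = X @ beta_hat + e_star             # imposes beta = beta_hat
        betas[b] = np.linalg.lstsq(X, y_star, rcond=None)[0]
    return beta_hat, betas

# Simulated data (illustrative assumption)
rng = np.random.default_rng(7)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

beta_hat, boot_np = residual_bootstrap(X, y, parametric=False, seed=1)
_, boot_p = residual_bootstrap(X, y, parametric=True, seed=2)
```

Note that the regressors are held fixed across replications here, which is one reason this scheme leans more heavily on the model being correctly specified.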


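What to do with $\hat\beta^*_b$, $b = 1, \ldots, B$? Several options...

Obtain the std error for the original sample estimate, $\hat\beta$, given by

$$se(\hat\beta) = \sqrt{\frac{1}{B-1} \sum_b \left(\hat\beta^*_b - \bar\beta^*\right)^2}$$

where $\bar\beta^* = \frac{1}{B}\sum_b \hat\beta^*_b$ is the mean of the bootstrap estimates

Obtain a symmetric CI using the normal approximation

$$\beta \in \left\{ \hat\beta \pm t_{1-\alpha/2,\,B-1}\, se(\hat\beta) \right\}$$

Obtain an asymmetric CI using the percentile method

$$\beta \in \left\{ \hat\beta^*_{\alpha/2},\ \hat\beta^*_{1-\alpha/2} \right\}$$

where the subscript refers to the quantile of the empirical dbn of $\hat\beta^*$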
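The three summaries above might be computed as follows for the sample mean. This is a sketch under stated assumptions: the helper names are hypothetical, the data are simulated, and the standard normal quantile is used in place of the $t_{1-\alpha/2,\,B-1}$ quantile (the two are very close for large B).

```python
import numpy as np
from statistics import NormalDist

def bootstrap_se(theta_star):
    """Std deviation of the bootstrap estimates, with B - 1 in the denominator."""
    return np.std(theta_star, ddof=1)

def normal_ci(theta_hat, theta_star, alpha=0.05):
    """Symmetric CI: theta_hat +/- z * se (normal quantile assumed instead of t)."""
    z = NormalDist().inv_cdf(1 - alpha / 2)
    se = bootstrap_se(theta_star)
    return theta_hat - z * se, theta_hat + z * se

def percentile_ci(theta_star, alpha=0.05):
    """Asymmetric CI from the alpha/2 and 1 - alpha/2 quantiles of the bootstrap dbn."""
    lo, hi = np.quantile(theta_star, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Simulated example (illustrative): sample mean of x ~ N(0, 1), N = 1000
rng = np.random.default_rng(0)
x = rng.normal(size=1000)
theta_hat = x.mean()
theta_star = np.array([rng.choice(x, size=x.size).mean() for _ in range(1000)])

se = bootstrap_se(theta_star)
ci_normal = normal_ci(theta_hat, theta_star)
ci_pct = percentile_ci(theta_star)
```

With N = 1000 the asymptotic standard error of the mean is about $\sqrt{0.001} \approx 0.032$, which gives a rough benchmark for `se`.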


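Obtain asymmetric bias-corrected and accelerated CIs (BCa)

- Calculate

$$z_0 = \Phi^{-1}\left( \frac{1}{B} \sum_b I\left(\hat\beta^*_b \le \hat\beta\right) \right) \quad \text{(median bias)}$$

$$a = \frac{\sum_i \left(\bar\beta_J - \hat\beta_{J(i)}\right)^3}{6\left[\sum_i \left(\bar\beta_J - \hat\beta_{J(i)}\right)^2\right]^{3/2}} \quad \text{(acceleration parameter)}$$

  where $\hat\beta_{J(i)}$ is the jackknife estimate (omitting obs $i$ from the original sample) and $\bar\beta_J$ is the mean of the jackknife estimates
- Calculate lower and upper quantiles

$$p_1 = \Phi\left[ z_0 + \frac{z_0 - z_{1-\alpha/2}}{1 - a\left(z_0 - z_{1-\alpha/2}\right)} \right]; \quad p_2 = \Phi\left[ z_0 + \frac{z_0 + z_{1-\alpha/2}}{1 - a\left(z_0 + z_{1-\alpha/2}\right)} \right]$$

  where $z_{1-\alpha/2}$ is the $(1-\alpha/2)$th quantile of the std normal distribution
- CI given by $\beta \in \left\{ \hat\beta^*_{p_1}, \hat\beta^*_{p_2} \right\}$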


Notes:

- BC CI obtained by setting $a = 0$
- BCa requires $B > 1000$
- $z_0 = 0$ when $\hat\beta$ = median of $\hat\beta^*$
- $a$ reflects the rate of change of the standard error of $\hat\beta$ with respect to the true value, $\beta$
  - The standard normal approximation assumes that the standard error is invariant with respect to the true value
  - The acceleration parameter corrects for deviations in practice
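The BCa recipe (median-bias correction $z_0$, jackknife-based acceleration $a$, adjusted quantiles $p_1$ and $p_2$) can be sketched as below. The function name `bca_ci`, the exponential test data, and B = 2000 are illustrative assumptions; setting `a = 0` inside the function would give the BC interval.

```python
import numpy as np
from statistics import NormalDist

def bca_ci(x, stat, B=2000, alpha=0.05, seed=0):
    """BCa interval for stat(x); returns (lower, upper, z0, a)."""
    rng = np.random.default_rng(seed)
    nd = NormalDist()
    n = len(x)
    theta_hat = stat(x)
    theta_star = np.array([stat(rng.choice(x, size=n)) for _ in range(B)])

    # Median-bias correction: share of bootstrap estimates at or below theta_hat
    z0 = nd.inv_cdf(float(np.mean(theta_star <= theta_hat)))

    # Acceleration from leave-one-out (jackknife) estimates
    theta_jack = np.array([stat(np.delete(x, i)) for i in range(n)])
    d = theta_jack.mean() - theta_jack
    a = np.sum(d ** 3) / (6.0 * np.sum(d ** 2) ** 1.5)

    # Adjusted lower and upper quantile levels
    z = nd.inv_cdf(1 - alpha / 2)
    p1 = nd.cdf(z0 + (z0 - z) / (1 - a * (z0 - z)))
    p2 = nd.cdf(z0 + (z0 + z) / (1 - a * (z0 + z)))
    lo, hi = np.quantile(theta_star, [p1, p2])
    return lo, hi, z0, a

# Right-skewed simulated data (illustrative assumption)
rng = np.random.default_rng(5)
x = rng.exponential(size=200)
theta_hat = x.mean()
lo, hi, z0, a = bca_ci(x, np.mean, B=2000, seed=1)
```

For the sample mean, the jackknife differences are proportional to $x_i - \bar{x}$, so `a` is positive whenever the sample is right-skewed, and the interval shifts to the right relative to the plain percentile interval.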


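Obtain an asymmetric CI using bootstrap-t

- When estimating the model on the re-sampled data, collect the t-statistics obtained from testing $H_0: \beta = \hat\beta$

$$t^* = \frac{\hat\beta^* - \hat\beta}{se(\hat\beta^*)}$$

- Yields $t^*_b$, $b = 1, \ldots, B$
- Define $t^*_\alpha$ such that

$$\frac{1}{B} \sum_b I\left(t^*_b \le t^*_\alpha\right) = \alpha$$

  $\Rightarrow$ $t^*_\alpha$ is the $\alpha$th quantile of the empirical dbn of $t^*$
- CI given by

$$\beta \in \left\{ \hat\beta - t^*_{1-\alpha/2}\, se(\hat\beta),\ \hat\beta - t^*_{\alpha/2}\, se(\hat\beta) \right\}$$

  where the upper endpoint lies above $\hat\beta$ because $t^*_{\alpha/2}$ is typically negative
- Notes
  - Method assumes $se(\hat\beta)$ is known based on asymptotic theory
  - If unknown, then use double bootstrap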
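A sketch of the bootstrap-t interval for an OLS slope, using the usual homoskedastic OLS standard error as the "known" asymptotic $se$ in each replication. The helper names, the pairs resampling scheme, and the simulated data are assumptions for illustration.

```python
import numpy as np

def ols_with_se(X, y):
    """OLS coefficients and conventional (homoskedastic) standard errors."""
    n, k = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - k)
    return beta, np.sqrt(sigma2 * np.diag(XtX_inv))

def bootstrap_t_ci(X, y, j=1, B=999, alpha=0.05, seed=0):
    """Bootstrap-t CI for coefficient j, resampling (y_i, x_i) pairs."""
    rng = np.random.default_rng(seed)
    n = len(y)
    beta_hat, se_hat = ols_with_se(X, y)
    t_star = np.empty(B)
    for b in range(B):
        idx = rng.integers(0, n, size=n)
        beta_b, se_b = ols_with_se(X[idx], y[idx])
        t_star[b] = (beta_b[j] - beta_hat[j]) / se_b[j]   # tests H0: beta = beta_hat
    t_lo, t_hi = np.quantile(t_star, [alpha / 2, 1 - alpha / 2])
    # Upper quantile sets the lower endpoint, and vice versa
    return beta_hat[j] - t_hi * se_hat[j], beta_hat[j] - t_lo * se_hat[j]

# Simulated data (illustrative assumption)
rng = np.random.default_rng(11)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)

slope_hat = ols_with_se(X, y)[0][1]
ci_lo, ci_hi = bootstrap_t_ci(X, y, j=1, B=999, seed=1)
```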


Obtain an asymmetric CI using bootstrap-t with double bootstrap

- Estimate the original model by OLS $\Rightarrow$ $\hat\beta$
- Obtain bootstrap samples, estimate by OLS, and form $t^*$ given by

$$t^* = \frac{\hat\beta^* - \hat\beta}{se(\hat\beta^*)}$$

- Since the denominator is not known, resample from the bootstrap sample $B_2$ times $\Rightarrow$ $\hat\beta^{**}_b$, $b = 1, \ldots, B_2$
- Obtain the estimated std error of $\hat\beta^*$ as the std deviation of the $B_2$ estimates
- Repeat the process $B_1$ times
- Obtain the CI as above, but with $se(\hat\beta^*)$ replaced by the std deviation of the $B_2$ estimates $\hat\beta^{**}_b$


Example: $x \sim N(0, 1)$, $N = 1000$, $\bar{x} \overset{a}{\sim} N(0, 0.001)$

[Figure: four panels comparing the bootstrap and asymptotic distributions of the sample mean over roughly (-0.2, 0.2), for Reps = 20, 100, 500, and 1000.]


Bootstrap: Imposing the Null

Goal: estimate the model, derive some estimate or test statistic, and test whether the true value of the parameter is equal to some value, or derive a p-value associated with the test statistic

Strategy

- When re-sampling the data, generate new data sets where the null is true (imposed)
- Estimate the original model on the re-sampled data
- Compare the value of the test statistics obtained from the re-sampled data sets with the value of the test statistic from the original sample
- If the test statistic from the original sample is very different (statistically), then it is unlikely the null is true in the original sample


Regression example

Model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma^2)$$

Hypothesis of interest:

$$H_0: \beta_1 = 0$$

$$H_1: \beta_1 \neq 0$$


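Algorithm

- Estimate the model on the original data $\Rightarrow$ $\hat\beta_0$, $\hat\beta_1$ $\Rightarrow$ $t_{\beta_1}$ (t-statistic for $\beta_1$)
- Obtain the residuals $\Rightarrow$ $\hat\varepsilon_i$, $i = 1, \ldots, N$
- Resample (with replacement) a vector of $N$ residuals $\Rightarrow$ $\hat\varepsilon^*_i$, $i = 1, \ldots, N$
  - This represents a random draw from the (nonparametric) empirical dbn of the residuals
- Alternative (parametric):
  - Estimate $\hat\sigma^2 = \frac{1}{N-K} \sum_i \hat\varepsilon_i^2$
  - Draw $N$ random numbers, $\hat\varepsilon^*_i$, $i = 1, \ldots, N$, from $N(0, \hat\sigma^2)$
- Generate $y^*_i = \hat\beta_0 + 0 \cdot x_i + \hat\varepsilon^*_i = \hat\beta_0 + \hat\varepsilon^*_i$ (which imposes $\beta_1 = 0$)
- Regress $y^*$ on $x$ by OLS $\Rightarrow$ $t^*_{\beta_1}$
- Repeat $B$ times $\Rightarrow$ $t^*_{\beta_1,b}$, $b = 1, \ldots, B$
- Obtain the p-value as

$$\text{p-value} = \frac{1}{B} \sum_b I\left(|t^*_{\beta_1,b}| > |t_{\beta_1}|\right)$$

- Reject the null if $p < \alpha < 0.5$, where $\alpha$ is the significance level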
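The algorithm can be sketched as follows, using the nonparametric residual draw. The helper names and the simulated data (with a true slope of 1, so the null is false) are assumptions made for illustration.

```python
import numpy as np

def slope_t_stat(x, y):
    """OLS of y on (1, x): returns coefficients, residuals, and the t-stat for the slope."""
    n = len(y)
    X = np.column_stack([np.ones(n), x])
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ X.T @ y
    resid = y - X @ beta
    sigma2 = resid @ resid / (n - 2)
    return beta, resid, beta[1] / np.sqrt(sigma2 * XtX_inv[1, 1])

def null_bootstrap_pvalue(x, y, B=499, seed=0):
    """Bootstrap p-value for H0: beta1 = 0, generating data with the null imposed."""
    rng = np.random.default_rng(seed)
    beta_hat, resid, t_obs = slope_t_stat(x, y)
    t_star = np.empty(B)
    for b in range(B):
        e_star = rng.choice(resid, size=len(y), replace=True)
        y_star = beta_hat[0] + e_star            # y* = b0 + 0*x + e*, so beta1 = 0 holds
        t_star[b] = slope_t_stat(x, y_star)[2]
    return np.mean(np.abs(t_star) > abs(t_obs))

# Simulated data (illustrative assumption): true slope is 1
rng = np.random.default_rng(3)
n = 200
x = rng.normal(size=n)
y_alt = 1.0 + 1.0 * x + rng.normal(size=n)
p_alt = null_bootstrap_pvalue(x, y_alt, B=499, seed=1)
```

Because the bootstrap samples are generated under the null, the $t^*$ values trace out the null distribution of the statistic, and the p-value is simply the share of them that are more extreme than the original t-statistic.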


Distributional example

Want to test equality of the CDFs of two random variables (e.g., wages of job training participants and non-participants)

Data sample

- $x_i$, $i = 1, \ldots, N$, is a random sample of one variable (participants), with CDF $F(x)$
- $y_i$, $i = 1, \ldots, M$, is a random sample of another variable (non-participants), with CDF $G(y)$

Hypothesis of interest:

$$H_0: F = G$$

$$H_1: F \neq G$$


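Algorithm

- Estimate the empirical CDF in each sample: $\hat{F}(x)$ and $\hat{G}(y)$
- Compute the test statistic

$$d = \sqrt{\frac{NM}{N+M}} \max_{z \in Supp(X,Y)} \left\{ \hat{F}(z) - \hat{G}(z) \right\}$$

- Pool the data, re-sample (with replacement), sample size $= N + M$ $\Rightarrow$ $q_1, \ldots, q_{N+M}$
- Split the sample: denote the first $N$ obs as from $F$ and the final $M$ obs as from $G$ (imposes $F = G$)
- Compute $d^*$
- Repeat $B$ times $\Rightarrow$ $d^*_b$, $b = 1, \ldots, B$
- Obtain the p-value as

$$\text{p-value} = \frac{1}{B} \sum_b I\left(d^*_b > d\right)$$

- Reject the null if $p < \alpha < 0.5$, where $\alpha$ is the significance level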
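A sketch of the pooled-resampling test. The function names and simulated samples are hypothetical. One labeled departure from the statistic as written: the `two_sided` flag (default `True`) takes the absolute difference of the empirical CDFs, which is a common choice for the alternative $F \neq G$; setting it to `False` uses the signed difference exactly as in the formula above.

```python
import numpy as np

def ks_stat(x, y, two_sided=True):
    """Scaled maximum gap between the empirical CDFs of x and y."""
    n, m = len(x), len(y)
    z = np.sort(np.concatenate([x, y]))                  # evaluate on the pooled support
    F = np.searchsorted(np.sort(x), z, side="right") / n
    G = np.searchsorted(np.sort(y), z, side="right") / m
    diff = F - G
    gap = np.max(np.abs(diff)) if two_sided else np.max(diff)
    return np.sqrt(n * m / (n + m)) * gap

def ks_bootstrap_pvalue(x, y, B=499, seed=0, two_sided=True):
    """Bootstrap p-value imposing F = G by resampling from the pooled data."""
    rng = np.random.default_rng(seed)
    n, m = len(x), len(y)
    pooled = np.concatenate([x, y])
    d_obs = ks_stat(x, y, two_sided)
    d_star = np.empty(B)
    for b in range(B):
        q = rng.choice(pooled, size=n + m, replace=True)
        d_star[b] = ks_stat(q[:n], q[n:], two_sided)     # first N as "F", last M as "G"
    return d_obs, np.mean(d_star > d_obs)

# Simulated samples (illustrative assumption): the two distributions differ in mean
rng = np.random.default_rng(9)
x = rng.normal(0.0, 1.0, size=300)
y = rng.normal(1.0, 1.0, size=300)
d_obs, p_val = ks_bootstrap_pvalue(x, y, B=499, seed=1)
```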


Bootstrap: Other Issues

Non-iid data

All previous discussion assumes iid data since re-sampling occurs without regard to any dependence across observations

If there exists some sort of dependence in the data, then resample blocks or clusters of data

Example #1: Time series data with serial correlation

- Model

$$y_t = x_t\beta + \varepsilon_t, \quad t = 1, \ldots, T$$

- Resample blocks of length $l$ by drawing obs randomly from $t = 1, \ldots, T - l$
- If obs $t'$ is chosen for the bootstrap sample, also include obs $t = t' + 1, \ldots, t' + (l - 1)$
- Draw $T/l$ obs so the final bootstrap sample size remains $T$
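One way to generate the index set for a single block-bootstrap sample is sketched below. The function name is hypothetical, indices are 0-based, and start positions range over every value that keeps the block inside the sample (an assumption that allows one more start position than the range stated above).

```python
import numpy as np

def moving_block_indices(T, l, rng):
    """Indices for one block-bootstrap sample of length T built from blocks of length l."""
    n_blocks = int(np.ceil(T / l))
    starts = rng.integers(0, T - l + 1, size=n_blocks)     # random block start positions
    idx = (starts[:, None] + np.arange(l)[None, :]).ravel()   # start, start+1, ..., start+l-1
    return idx[:T]                                          # trim so the sample size stays T

rng = np.random.default_rng(0)
T, l = 100, 5
idx = moving_block_indices(T, l, rng)
# A bootstrap sample would then be y[idx], X[idx] for arrays y and X of length T.
```

Within each block the original time ordering is preserved, so short-run serial correlation survives the resampling; dependence across block boundaries does not.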


Example #2: Panel data

- For example, individuals within households, or employees within firms, or individuals over time
- Model

$$y_{if} = x_{if}\beta + \varepsilon_{if}, \quad i = 1, \ldots, N$$

  where $i$ represents individuals and $f$ represents firms
- Several individuals are sampled from each of $F < N$ firms
- Generate bootstrap samples by resampling (with replacement) the $F$ firms
- If firm $f$ is chosen for the bootstrap sample, include all employees $i$ from that firm
- If an identical number of employees from each firm is in the sample, then bootstrap samples are still of size $N$

Blocks/clusters are chosen such that data are iid across blocks
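The cluster resampling step can be sketched as below, assuming a hypothetical integer array `cluster_ids` that records the firm of each employee; the balanced design of 20 firms with 5 employees each is made up for illustration.

```python
import numpy as np

def cluster_bootstrap_indices(cluster_ids, rng):
    """Draw clusters with replacement and return the row indices of all their members."""
    ids = np.unique(cluster_ids)
    drawn = rng.choice(ids, size=len(ids), replace=True)      # resample the F clusters
    return np.concatenate([np.flatnonzero(cluster_ids == g) for g in drawn])

# Balanced illustrative design: 20 firms, 5 employees each (N = 100)
rng = np.random.default_rng(0)
cluster_ids = np.repeat(np.arange(20), 5)
idx = cluster_bootstrap_indices(cluster_ids, rng)
# A bootstrap sample would then be y[idx], X[idx].
```

A firm drawn twice contributes all of its employees twice, so within-firm dependence is carried into every bootstrap sample.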


Sub-sampling (Politis and Romano 1992, 1994)

M of N re-sampling with or without replacement

Evaluate a statistic of interest at subsamples of the data

Use these subsampled values to build up an estimated sampling distribution

The consistency properties of this sampling distribution hold for dependent data under very weak assumptions and even in situations where the bootstrap collapses


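Jackknife estimation

Leave-one-out estimation

Algorithm

- Estimate the model using the original sample $\Rightarrow$ $\hat\beta$ (if OLS model, say)
- Omit obs $i$ and re-estimate the model on the sample of $N - 1$ obs $\Rightarrow$ $\hat\beta_{(i)}$
- Repeat, omitting each $i$ once (implies $N$ estimations)
- Standard error obtained as

$$se(\hat\beta) = \sqrt{\frac{N-1}{N} \sum_i \left(\hat\beta_{(i)} - \bar\beta_{(\cdot)}\right)^2}$$

  where $\bar\beta_{(\cdot)}$ is the mean of the $N$ leave-one-out estimates

In some situations, the delete-d jackknife achieves superior performance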
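A minimal sketch of the leave-one-out standard error. The function name and the simulated data are assumptions; the sample mean is used because its jackknife standard error coincides exactly with the textbook $s/\sqrt{N}$, which gives a convenient check.

```python
import numpy as np

def jackknife_se(x, stat):
    """Leave-one-out jackknife standard error of stat(x)."""
    n = len(x)
    theta_i = np.array([stat(np.delete(x, i)) for i in range(n)])   # N re-estimations
    theta_bar = theta_i.mean()
    return np.sqrt((n - 1) / n * np.sum((theta_i - theta_bar) ** 2))

rng = np.random.default_rng(0)
x = rng.normal(size=50)
se_jack = jackknife_se(x, np.mean)
se_classic = x.std(ddof=1) / np.sqrt(len(x))   # equals se_jack for the sample mean
```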


Failure of the bootstrap or jackknife ...

Resampling methods are not guaranteed to work; theoretical justification is needed

The most common case of failure occurs when the parameter of interest is a non-smooth function of the data (e.g., median vs. mean)


Example: $x \sim N(0, 1)$, $N = 1000$, $x_{med} \overset{a}{\sim} N(0, 0.00157)$

[Figure: four panels comparing the bootstrap and asymptotic distributions of the sample median over roughly (-0.1, 0.1), for Reps = 20, 100, 500, and 1000.]


How to choose B?

Andrews & Buchinsky

Davidson & MacKinnon


Causation: Introduction

The general goal of most (applied) econometrics exercises is to distinguish between causation and correlation

Many empirical questions of concern to economists and/or policymakers pertain to the causal effect of a program or policy

The statistical and econometric literature analyzing causation has seen tremendous growth over the past several decades

The central problem concerns evaluation of the causal effect of exposure to a treatment or program by a set of units on some outcome

- In economics, these units are economic agents such as individuals, households, firms, geographical areas, etc.
- The effect of an exposure is only well-defined if the comparison is also defined; typically the comparison is defined as "not exposed," but sometimes it is not obvious (particularly with non-binary treatments)


Philosophy of causality...

- Rich literature in analytic philosophy on causality
- Two main approaches to defining causality:
  - Regularity approaches: Hume: "We may define a cause to be an object followed by another, and where all the objects, similar to the first, are followed by objects similar to the second." (from An Enquiry Concerning Human Understanding, section VII)
  - Counterfactual approaches: Hume: "Or, in other words, where, if the first object had not been, the second never had existed." (from An Enquiry Concerning Human Understanding, section VII)


Regularity approach: a minimal constant conjunction between the two objects (Suppes: a probabilistic association between the two objects, which cannot be explained away by other factors)

- Basic idea behind Granger causality
- Difficulty: what are the "other factors"? Limiting to only observable factors is unsatisfying... if some factors are unobservable, then what?
- Example...
  - $C$ is a potential cause of $E$ if $\Pr(E|C) > \Pr(E|\text{not } C)$
  - May be spurious if there exists some factor $B$ s.t. $\Pr(E|C) > \Pr(E|\text{not } C)$ and $\Pr(E|C,B) = \Pr(E|\text{not } C,B)$ (e.g., $E$ = wages, $C$ = educ, $B$ = ability)
  - May also be a spurious zero correlation if there exists some factor $B$ s.t. $\Pr(E|C) = \Pr(E|\text{not } C)$ and $\Pr(E|C,B) > \Pr(E|\text{not } C,B)$ (e.g., $E$ = wages, $C$ = training, $B$ = shock)
  - $B$ is known as a confounder or confounding variable
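The spurious-association case can be illustrated with a small simulation. All probabilities below are made-up assumptions: the confounder `B` raises the chance of both `C` and `E`, while `C` has no effect on `E` once `B` is held fixed.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical data-generating process: B drives both C and E; C does not affect E
B = rng.random(n) < 0.5
C = rng.random(n) < np.where(B, 0.8, 0.2)
E = rng.random(n) < np.where(B, 0.7, 0.3)

def gap(E, C, mask=None):
    """Pr(E | C) - Pr(E | not C), optionally within the subsample selected by mask."""
    if mask is None:
        mask = np.ones_like(E, dtype=bool)
    return E[mask & C].mean() - E[mask & ~C].mean()

marginal_gap = gap(E, C)      # clearly positive: C looks like a cause of E
gap_B1 = gap(E, C, B)         # approximately zero once B is held fixed
gap_B0 = gap(E, C, ~B)        # approximately zero once B is held fixed
```

Under these assumed probabilities the population marginal gap is 0.62 - 0.38 = 0.24, while both conditional gaps are zero by construction.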


Be wary: correlation does not imply causation, as things are not always as they seem ...

and the truth may be difficult to see ...


Counterfactual approach: Lewis (1973) proposes to imagine a range of possible worlds

- Holland (1986, 2003): a treatment (cause) is a potential manipulation that one can imagine
  - "NO CAUSATION WITHOUT MANIPULATION"
  - Gender, race are not treatments?!? (see Greiner and Rubin 2011)
- Imbens and Wooldridge (2009):
  - "A CRITICAL FEATURE IS THAT, IN PRINCIPLE, EACH UNIT CAN BE EXPOSED TO MULTIPLE LEVELS OF THE TREATMENT."
- Angrist and Pischke (2009): a treatment should be manipulatable conditional on other factors $\Rightarrow$ $\Pr(C|B), \Pr(\text{not } C|B) \in (0, 1)$
  - "NO FUNDAMENTALLY UNIDENTIFIED QUESTIONS"
  - Example: school start age = biological age - time in school; if $B$ = {bio age, time in school}, then school start age is not an identifiable treatment


Microeconometrics today emphasizes the counterfactual view

- Greiner & Rubin (2011): "For analysts from a variety of fields, the intensely practical goal of causal inference is to discover what would happen if we changed the world in some way."

Econometric methods are categorized by the type of selection involved

Selection types

- Selection on observables: all potential $B$s are observed
- Selection on unobservables: some potential $B$s are unobserved


Causation: Potential Outcomes Model

Most causal research is couched in the potential outcomes framework

Typically referred to as the Rubin Causal Model (RCM); attributed to Neyman (1923, 1935), Fisher (1935), Roy (1951), Quandt (1972, 1988), Rubin (1974)

Notation

- $y_{1i}$ = outcome of observation $i$ with treatment
- $y_{0i}$ = outcome of observation $i$ without treatment
- $D_i$ = treatment indicator ...

$$D_i = \begin{cases} 1 & \text{treated} \\ 0 & \text{untreated} \end{cases}$$

$\{y_{1i}, y_{0i}, D_i\}$ is a draw from the population of interest

$\{y_1, y_0, D\}$ is a sample from the population of interest


Notes

- Key insight is to model not just the observed outcome for each unit $i$, but also the unobserved potential outcomes
- Implicit in this representation is the Stable Unit Treatment Value Assumption (SUTVA, Rubin 1978), which assumes that the outcome of obs $i$ with and without the treatment does not vary depending on the treatment assignment of all other agents (rules out general equilibrium or indirect effects)
  - Allows one to write potential outcomes solely as a function of own treatment assignment

$$y_{0i} \equiv y_i(D_1, D_2, \ldots, D_{i-1}, 0, D_{i+1}, \ldots, D_N) = y_i(0)$$

$$y_{1i} \equiv y_i(D_1, D_2, \ldots, D_{i-1}, 1, D_{i+1}, \ldots, D_N) = y_i(1)$$

  - Imbens & Wooldridge (2009) provide some references to papers looking at GE effects; see also Ferracci et al. (2009), Heckman et al. (1999), Lewis (1963)
- Also implicit, and sometimes lumped into SUTVA, is the assumption that the mechanism for assigning treatments does not affect potential outcomes (rules out Hawthorne effects, whereby agents may act differently if they know they are being observed)


Parameters of interest

$\Delta_i = y_{1i} - y_{0i}$ = treatment effect for obs $i$

- This is a random variable as it is obs-specific
- Can summarize the distribution of this variable by focusing on different aspects

$$\Delta^{ATE} = E[\Delta_i] = E[y_1 - y_0]$$

$$\Delta^{ATT} = E[\Delta_i | D = 1] = E[y_1 - y_0 | D = 1]$$

$$\Delta^{ATU} = E[\Delta_i | D = 0] = E[y_1 - y_0 | D = 0]$$

Notes: Different parameters answer different questions, may be useful for different policy conclusions, and may require different assumptions to identify


Three other parameters that often appear

1. Local Average Treatment Effect (Imbens & Angrist 1994, Angrist et al. 1996)
   - Defined as $\Delta^{LATE} = E[y_1 - y_0 | i \in \Omega]$, where $\Omega$ refers to some specified subpopulation
2. Marginal Treatment Effect (Heckman & Vytlacil 1999, 2001, 2005, 2007)
   - Defined later
3. Policy Relevant Treatment Effect (Heckman & Vytlacil 2001)
   - Defined as $\Delta^{PRTE} = E[y^P - y^{NP}]$, where $P$ ($NP$) refers to the state where the program is fully (not) implemented
   - With the program, all agents have access to the program, but may choose not to participate
   - Implies

$$\begin{aligned} \Delta^{PRTE} &= E[y_1^P | D^P = 1]\Pr(D^P = 1) + E[y_0^P | D^P = 0]\Pr(D^P = 0) - E[y^{NP}] \\ &= E[y_1 - y_0 | D^P = 1]\Pr(D^P = 1) \end{aligned}$$

     where $y_0^P$, $y_1^P$, and $y^{NP}$ are the three potential outcomes, $D^P$ is the treatment indicator in the world with the program, and the second line follows if one assumes policy invariance (i.e., potential outcomes are unaffected by the existence of the program)


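Relationship among the parameters

- Let

$$y_{1i} = E[y_1] + \upsilon_{i1}$$

$$y_{0i} = E[y_0] + \upsilon_{i0}$$

- This implies

$$\begin{aligned} \Delta_i &= y_{1i} - y_{0i} \\ &= E[y_1 - y_0] + \upsilon_{i1} - \upsilon_{i0} \\ &= \Delta^{ATE} + \upsilon_{i1} - \upsilon_{i0} \end{aligned}$$

  and

$$\Delta^{ATT} = \Delta^{ATE} + E[\upsilon_{i1} - \upsilon_{i0} | D = 1]$$

$$\Delta^{ATU} = \Delta^{ATE} + E[\upsilon_{i1} - \upsilon_{i0} | D = 0]$$

  where $E[\upsilon_{i1} - \upsilon_{i0} | D = j]$ is the average, obs-specific gain from treatment for group $j$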


Can re-define any of the above parameters for a sub-population defined on the basis of attributes, $x$

$$\Delta^{ATE}(x) = E[y_1 - y_0 | x]$$

$$\Delta^{ATT}(x) = E[y_1 | x, D = 1] - E[y_0 | x, D = 1]$$

$$\Delta^{ATU}(x) = E[y_1 | x, D = 0] - E[y_0 | x, D = 0]$$

The previous unconditional parameters are obtained by integrating over the dbn of $x$ in the relevant population

$$\Delta^{ATE} = \int \Delta^{ATE}(x) f(x)\,dx$$

$$\Delta^{ATT} = \int \Delta^{ATT}(x) f(x | D = 1)\,dx$$

$$\Delta^{ATU} = \int \Delta^{ATU}(x) f(x | D = 0)\,dx$$


Aside

While the preceding parameters, based on differences in expectations, are the near universal focus in economics, this need not be the case

Can also define treatment effects based on ratios

$$\Delta^{RATE} = E[y_1] / E[y_0]$$

$$\Delta^{RATT} = E[y_1 | D = 1] / E[y_0 | D = 1]$$

$$\Delta^{RATU} = E[y_1 | D = 0] / E[y_0 | D = 0]$$

These are referred to as relative treatment effects (and the prior parameters are referred to as absolute or differenced treatment effects)

Note, however, that relative effects lack a bit of intuitive appeal since if we define $\Delta_i = y_{1i}/y_{0i}$, then $E[\Delta_i] = E[y_{1i}/y_{0i}] \neq E[y_1]/E[y_0]$, and the same holds for RATT and RATU
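A two-observation example, with made-up potential outcomes, makes the last point concrete: the mean of the unit-level ratios differs from the ratio of the means.

```python
import numpy as np

# Hypothetical potential outcomes for two units
y0 = np.array([1.0, 2.0])
y1 = np.array([2.0, 2.0])

mean_of_ratios = np.mean(y1 / y0)          # (2/1 + 2/2) / 2 = 1.5
ratio_of_means = y1.mean() / y0.mean()     # 2.0 / 1.5 = 1.333...
```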


Evaluation Problem

Only observe one potential outcome at a point in time for any observation

Implies...

- Attributes of $i$: $\{y_{1i}, y_{0i}, D_i\}$
- Observed for $i$: $\{y_i, D_i\}$

where $y_i = D_i y_{1i} + (1 - D_i) y_{0i}$ = observed outcome for observation $i$

The missing potential outcome is the missing counterfactual

- Holland (1986) refers to this as the fundamental problem of causal inference
- Because of this, the central issue in the RCM is the relationship between treatment assignment and potential outcomes
  - Typically referred to as the treatment assignment rule
  - Growing literature on assignment rules (Manski 2000, 2004; Pepper 2002, 2003; Dehejia 2005; Lechner & Smith 2007)


Example #1... ATT

- Consider estimating $\Delta^{ATT} = E[y_1 | D = 1] - E[y_0 | D = 1]$
- $E[y_1 | D = 1]$ can be estimated from the data, but one does not observe $E[y_0 | D = 1]$
- If one uses outcomes of the untreated, we can define

$$\tilde\Delta^{ATT} = E[y_1 | D = 1] - E[y_0 | D = 0]$$

- Which implies selection bias equal to

$$\Delta^{ATT} = E[y_1 | D = 1] - E[y_0 | D = 0] + E[y_0 | D = 0] - E[y_0 | D = 1]$$

$$\Rightarrow \text{bias} = \tilde\Delta^{ATT} - \Delta^{ATT} = E[y_0 | D = 1] - E[y_0 | D = 0]$$

- This is generally non-zero, and may be decomposed into 3 components (Heckman et al. 1996, 1998):
  1. Self-selection into treatment in a manner related to the outcome in the untreated state
  2. Observables, $x$, impacting the outcome may not overlap at certain values across the treatment and control groups
  3. Even with overlap, the distribution of $x$ may vary across the treatment and control groups


Example #2... ATE

- Consider estimating $\Delta^{ATE} = E[y_1] - E[y_0]$
- Neither unconditional expectation can be estimated from the data
- If one uses conditional expectations, we can define

$$\tilde\Delta^{ATE} = E[y_1 | D = 1] - E[y_0 | D = 0]$$

- Which implies selection bias equal to

$$\tilde\Delta^{ATE} - \Delta^{ATE} = E[y_1 | D = 1] - E[y_0 | D = 0] - \left(E[y_1] - E[y_0]\right)$$

$$\Rightarrow \text{bias} = \left(E[y_1 | D = 1] - E[y_1 | D = 0]\right)\left[1 - \Pr(D = 1)\right] + \left(E[y_0 | D = 1] - E[y_0 | D = 0]\right)\Pr(D = 1)$$

  which is a weighted average of the selection bias for the ATT and ATU

Question: How does one circumvent the missing counterfactual problem to estimate $\Delta^{ATE}$, $\Delta^{ATT}$, $\Delta^{ATU}$, or any other summary statistic of the distribution of $\Delta$?
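Both bias expressions can be checked numerically in a simulation where, unlike in real data, both potential outcomes are visible. The data-generating process below (bivariate normal potential outcomes and a selection rule based on $y_0$ plus noise) is a made-up assumption for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical potential outcomes and a selection rule related to y0
cov = [[1.0, 0.7], [0.7, 1.0]]
y0, y1 = rng.multivariate_normal([0.0, 1.0], cov, size=n).T
D = (y0 + rng.normal(size=n)) > 0
p = D.mean()                                   # sample analogue of Pr(D = 1)

ate = (y1 - y0).mean()
att = (y1 - y0)[D].mean()
atu = (y1 - y0)[~D].mean()

# Naive comparison of treated and untreated means (what the data alone deliver)
naive = y1[D].mean() - y0[~D].mean()

# Selection bias terms from the two examples above
bias_att = y0[D].mean() - y0[~D].mean()
bias_ate = ((y1[D].mean() - y1[~D].mean()) * (1 - p)
            + (y0[D].mean() - y0[~D].mean()) * p)
```

With sample means in place of expectations the two decompositions hold exactly, so `naive - att` matches `bias_att` and `naive - ate` matches `bias_ate` up to floating-point error.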


Early Example of Potential Outcomes: Roy Model (Roy 1951)

As noted previously, at the heart of the RCM is the interplay between assignment of treatments, potential outcomes, and observed outcomes

The problem is one of self-selection; highlighted in a very clever fashion in Roy (1951)

The specific issue in Roy (1951) was occupational choice

- Individuals have potential earnings associated with different occupation choices
- Realized earnings reflect the chosen occupation

Example

Suppose

$$\begin{bmatrix} y_0 \\ y_1 \end{bmatrix} \sim N\left( \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \Sigma \right)$$


Unconditional outcome distributions look like

[Figure: Unconditional Distributions of Potential Outcomes. Kernel densities of y0 and y1 over the support (-4, 6). Simulated data, 1000 obs, rho=0.7.]


Conditional distributions

Depends on

- Who selects into the treatment or control group, and
- Correlation of potential outcomes

Positive correlation in the above example ($\rho \approx 0.7$)


Positive selection: Assume those above the mean in the $y_1$ distribution select into treatment

[Figure: Conditional Distributions of Potential Outcomes. Kernel densities of observed y0 and y1 over the support (-4, 6). Simulated data, 1000 obs, rho=0.7; positive selection into treatment.]


Negative selection: Assume those below the mean in the $y_1$ distribution select into treatment

[Figure: Conditional Distributions of Potential Outcomes. Kernel densities of observed y0 and y1 over the support (-4, 4). Simulated data, 1000 obs, rho=0.7; negative selection into treatment.]


Random assignment:

[Figure: Conditional Distributions of Potential Outcomes. Kernel densities (kdensity) of yy0 and yy1 plotted against the support. Simulated data, 1000 obs, rho=0.7; random assignment into treatment.]

Lesson to be learned: observed distributions are not, in general, the unconditional distributions
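The selection patterns on the preceding slides can be reproduced with a short simulation. What follows is a minimal sketch, not the code behind the slides: it assumes unit variances for both potential outcomes (the slides report only the means and rho = 0.7), and the names `simulate_roy` and `observed_gap` are illustrative.

```python
import math
import random
from statistics import mean


def simulate_roy(n=1000, rho=0.7, seed=0):
    """Draw n pairs (y0, y1) with means (0, 1), unit variances (assumed), corr rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        z0, z1 = rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)
        y0 = z0
        y1 = 1.0 + rho * z0 + math.sqrt(1.0 - rho ** 2) * z1
        pairs.append((y0, y1))
    return pairs


def observed_gap(pairs, treated):
    """Mean of y1 among the treated minus mean of y0 among the untreated."""
    y1_obs = [y1 for (_, y1), d in zip(pairs, treated) if d]
    y0_obs = [y0 for (y0, _), d in zip(pairs, treated) if not d]
    return mean(y1_obs) - mean(y0_obs)


pairs = simulate_roy()
sample_ate = mean(y1 - y0 for y0, y1 in pairs)

assign_rng = random.Random(1)
rules = {
    # Those above the mean of the y1 distribution (1 by construction) select in
    "positive": [y1 > 1.0 for _, y1 in pairs],
    # Those below the mean of the y1 distribution select in
    "negative": [y1 < 1.0 for _, y1 in pairs],
    # Coin-flip assignment, independent of (y0, y1)
    "random": [assign_rng.random() < 0.5 for _ in pairs],
}
gaps = {name: observed_gap(pairs, d) for name, d in rules.items()}

for name, gap in gaps.items():
    print(f"{name:>8}: observed gap = {gap:.3f}, sample ATE = {sample_ate:.3f}")
```

Under positive selection the observed difference in means should overstate the sample ATE, under negative selection it should understate it, and under random assignment the two should be close.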

Page 57: Microeconometrics Lecture Notes

Roy Model

Two occupations: hunter, fisher

Potential incomes
\[ y_d = g_d(x) + \upsilon_d, \quad d = 0 \text{ (h)}, 1 \text{ (f)} \]

Decision rule
\[ D = I(y_1 - y_0 > 0) = I(g_1(x) - g_0(x) + \upsilon_1 - \upsilon_0 > 0) \]

Observed income
\[ y = D y_1 + (1 - D) y_0 \]

Treatment assignment depends on observables, $x$, and unobservables, $\upsilon_1 - \upsilon_0$

Notes:
1. $\mathrm{Cov}(D, \upsilon_1 - \upsilon_0) \neq 0$ referred to as essential heterogeneity (Heckman et al. 2006)
2. $\mathrm{Cov}(D, \upsilon_1 - \upsilon_0) \neq 0 \Rightarrow \mathrm{Cov}(D, D(\upsilon_1 - \upsilon_0)) \neq 0$

Page 58: Microeconometrics Lecture Notes

Generalized Roy Model

Replace income maximization decision rule with a more general rule

Decision rule
\[ D = I(h(x) - u > 0) \]

When D is a voluntary program (e.g., job training), u may reflect (i) costs of participation and (ii) foregone earnings (opportunity costs)

Implies that treatment assignment depends on observables, x, and unobservables, u
- Essential heterogeneity implies $\mathrm{Corr}(u, \upsilon_d) \neq 0 \ \forall d$

Page 59: Microeconometrics Lecture Notes

Moving Forward

Guided by the potential outcomes framework, figure out conditions under which different estimators may provide consistent estimates of the ATE, ATT, ATU, etc.

Key points:
- Given the missing counterfactual problem, any estimator of the causal effects of a treatment must rely on some assumptions
- Different estimators rely on different assumptions and thus should not be expected to yield similar estimates unless the identifying assumptions of each hold in the data
- While extraneous assumptions may be testable (overidentifying restrictions), not all assumptions can be tested
- Different estimators estimate different aspects of the distribution of $\Delta$ and thus answer different questions

Page 60: Microeconometrics Lecture Notes

Causation
Random Experiments

First solution is to randomize treatment assignment

Generally speaking, randomization is the preferred solution; often called the gold standard

Reason: randomization ensures that treatment assignment is independent of potential outcomes in expectation

Freedman (2006): "Experiments offer more reliable evidence on causation than observational studies."

Imbens (2009): "More generally, and this is the key point, in a situation where one has control over the assignment mechanism, there is little to gain, and much to lose, by giving that up through allowing individuals to choose their own treatment regime. Randomization ensures exogeneity of key variables, where in a corresponding observational study one would have to worry about their endogeneity."

Page 61: Microeconometrics Lecture Notes

That said, not everyone is convinced by experiments (without doing some more mental work)

"Much of the criticism about experiments is about the difficulty of generalizing from the evaluation of one particular program to predicting what would happen to this program in a different context. Clearly, without theory to guide us on why a result extends from a context to another, it is difficult to jump directly to a policy conclusion. However, when experiments are motivated by a theory, the results of experiments (not only on the final outcomes, but on the entire chain of intermediate outcomes that led to the endpoint of interest) serve as a test of some of the implications of that theory. The combination of data points then eventually provides sufficient evidence to make policy recommendations."

Duflo (2010), http://www.aeaweb.org/econwhitepapers/white_papers/Esther_Duflo.pdf

Page 62: Microeconometrics Lecture Notes

"From an ex post evaluation standpoint, a carefully planned experiment using random assignment of program status represents the ideal scenario, delivering highly credible causal inference. But from an ex ante evaluation standpoint, the causal inferences from a randomized experiment may be a poor forecast of what were to happen if the program were to be scaled up."

DiNardo & Lee (2011)

Ex post evaluation answers the question: What happened? (descriptive)

Ex ante evaluation answers the question: What would happen? (predictive)

Page 63: Microeconometrics Lecture Notes

Randomization may occur at different stages
1. Population-level: randomize among agents in the population; typically not feasible since it would entail "compelling" treatment by some
2. Eligibility-level: randomize among the population of eligibles by randomly denying eligibility to a subset
3. Application-level: randomize among the population of program applicants by randomly accepting/rejecting a subset

Stage at which randomization occurs generally affects what can be learned unless additional assumptions are made

Page 64: Microeconometrics Lecture Notes

Assumptions (with population-level randomization)
(A.i) $\{y, D\}$ is iid sample from the population
(A.ii) $y_0, y_1 \perp D$
(A.iii) $\Pr(D = 1) \in (0, 1)$

Notes
- (A.i) implies SUTVA
- (A.ii) implies $E[y_1 \mid D = 1] = E[y_1 \mid D = 0] = E[y_1]$; similarly for $E[y_0]$
- (A.ii) also implies $\Delta_{ATE} = \Delta_{ATT} = \Delta_{ATU}$ since
\[ \underbrace{E[y_1 - y_0]}_{ATE} = \underbrace{E[y_1 - y_0 \mid D = 1]}_{ATT} = \underbrace{E[y_1 - y_0 \mid D = 0]}_{ATU} \]
- (A.ii) relies on perfect compliance; imperfect compliance may invalidate the assumption if such non-compliance is related to potential outcomes
  - Difference in experimental means based on initial assignment still yields estimate of intent to treat under imperfect compliance; may actually be more policy relevant
- (A.iii) ensures all agents have some probability of receiving and not receiving the treatment
- Population-level randomization is feasible if compensation is offered to ensure compliance and this compensation does not affect $y_0$ and $y_1$

Page 65: Microeconometrics Lecture Notes

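Estimation
\[
\begin{aligned}
\hat{\Delta}_{ATE} &= \hat{E}[y_i \mid D = 1] - \hat{E}[y_i \mid D = 0] \\
&= \frac{\sum_{i=1}^N y_i I[D_i = 1]}{\sum_{i=1}^N I[D_i = 1]} - \frac{\sum_{i=1}^N y_i I[D_i = 0]}{\sum_{i=1}^N I[D_i = 0]} \\
&\xrightarrow{p} E[y_i \mid D = 1] - E[y_i \mid D = 0] \\
&= E[D_i y_{1i} + (1 - D_i) y_{0i} \mid D = 1] - E[D_i y_{1i} + (1 - D_i) y_{0i} \mid D = 0] \\
&= E[y_{1i} \mid D = 1] - E[y_{0i} \mid D = 0] \\
&= E[y_{1i}] - E[y_{0i}] \\
&= \Delta_{ATE}
\end{aligned}
\]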
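The difference-in-means estimator above is simple enough to write out directly. The following is a minimal sketch on made-up data; the function name `diff_in_means` and the numbers are illustrative only.

```python
def diff_in_means(y, d):
    """Mean outcome among D = 1 minus mean outcome among D = 0."""
    treated = [yi for yi, di in zip(y, d) if di == 1]
    control = [yi for yi, di in zip(y, d) if di == 0]
    if not treated or not control:
        raise ValueError("need at least one treated and one untreated observation")
    return sum(treated) / len(treated) - sum(control) / len(control)


# Hypothetical experimental sample: observed outcomes y and assignment d
y = [3.0, 5.0, 4.0, 1.0, 2.0, 3.0]
d = [1, 1, 1, 0, 0, 0]
ate_hat = diff_in_means(y, d)  # (3 + 5 + 4)/3 - (1 + 2 + 3)/3
print(ate_hat)
```

The estimate has a causal interpretation only under (A.i) to (A.iii); the arithmetic itself is the same in observational data.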

Page 66: Microeconometrics Lecture Notes

Properties
- Unbiased
- Consistent
- Asymptotically normal
- Nonparametrically identified: no parametric or functional form assumptions needed

Notes
- (A.ii) may be replaced by a mean independence assumption ...
\[ E[y_j \mid D = j] = E[y_j], \quad j = 0, 1 \]
- Randomization succeeds by balancing (in expectation) both observable and unobservable attributes of participants in the treatment and control group
- Randomization can be assessed by testing for differences in the joint distribution of predetermined attributes across the treatment and control groups
- Randomization at the eligibility or application stage only yields an estimate of the ATT, which does not equal the ATE unless (i) treatment effects are homogeneous or (ii) agents do not become eligible or apply due to unobserved, observation-specific gains to the treatment, $\upsilon_1 - \upsilon_0$

Page 67: Microeconometrics Lecture Notes

Selection on Observables

Randomization is typically not feasible in economics

Applied economists typically must rely on observational (ornon-experimental) data

Data structure is now given by...

attributes of i: $\{y_{1i}, y_{0i}, D_i, x_i\}$
observed for i: $\{y_i, D_i, x_i\}$

where $x_i$ is a vector of observable attributes of i

Page 68: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability

Assumptions

(A.i) iid sample: $\{y, D, x\}$ is iid sample from the population
(A.ii) Conditional independence or unconfoundedness: $y_0, y_1 \perp D \mid x$
(A.iii) Common support or overlap: $\Pr(D = 1 \mid x) \in (0, 1)$

Note: The conditional independence assumption (CIA) is sometimes referred to as the selection on observables (or observed variables) assumption because if D is a deterministic function of x, then CIA will hold. However, the CIA is broader than this case; D may also depend on unobservables as long as these unobservables are not correlated with potential outcomes.

Page 69: Microeconometrics Lecture Notes

Notes...
- (A.i) implies SUTVA
- (A.ii) implies
\[ \Pr(D_i = 1 \mid x_i, y_{1i}, y_{0i}) = \Pr(D_i = 1 \mid x_i) \]
- (A.iii) ensures one observes agents with a particular x in both the treatment and control groups
- (A.ii), (A.iii) $\Rightarrow$ strong ignorability (Rosenbaum & Rubin 1983)
  - xs must be pre-determined (i.e., unaffected by treatment assignment); if some xs are directly affected by D or by the anticipation of D, then conditioning on them will mask (at least) some of the effect of the treatment
  - Implies estimation under strong ignorability requires that an instrument exist such that, conditional on x, D is random rather than deterministic; the instrument is not required to be observed (or even known)
  - There may not exist any vector x in a particular data set for a particular treatment such that strong ignorability holds
  - There is some tension between (A.ii) and (A.iii); some xs may perfectly predict treatment assignment (invalidating common support, CS), but their omission may invalidate CIA... hence, the need for the implicit IV

Page 70: Microeconometrics Lecture Notes

Nonparametric identification

Estimation
\[
\begin{aligned}
\hat{\Delta}_{ATE}(x) &= \hat{E}[y_i \mid x_i = x, D = 1] - \hat{E}[y_i \mid x_i = x, D = 0] \\
&= \frac{\sum_{i=1}^N y_i I[x_i = x, D_i = 1]}{\sum_{i=1}^N I[x_i = x, D_i = 1]} - \frac{\sum_{i=1}^N y_i I[x_i = x, D_i = 0]}{\sum_{i=1}^N I[x_i = x, D_i = 0]} \\
&\xrightarrow{p} E[y_i \mid x_i = x, D = 1] - E[y_i \mid x_i = x, D = 0] \\
&= E[y_{1i} \mid x_i = x, D = 1] - E[y_{0i} \mid x_i = x, D = 0] \\
&= E[y_{1i} \mid x_i = x] - E[y_{0i} \mid x_i = x]
\end{aligned}
\]
and then
\[ \hat{\Delta}_{ATE} = E[\hat{\Delta}_{ATE}(x)] = \int \hat{\Delta}_{ATE}(x) f(x) \, dx = \frac{1}{N} \sum_i \hat{\Delta}_{ATE}(x_i) \]
Similar story for other parameters, except final step uses $f(x \mid D = 1)$ or $f(x \mid D = 0)$
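For a discrete x, the two steps above (cell-by-cell differences in means, then averaging over the sample distribution of x) can be sketched as follows. The data are invented for illustration and the function name `cell_ate` is hypothetical; the sketch simply refuses to proceed when a cell lacks overlap.

```python
from collections import defaultdict


def cell_ate(y, d, x):
    """Return (ATE(x) for each distinct x, ATE averaged over the sample x's)."""
    # Per cell: [sum of y | D=0, count D=0, sum of y | D=1, count D=1]
    cells = defaultdict(lambda: [0.0, 0, 0.0, 0])
    for yi, di, xi in zip(y, d, x):
        cells[xi][2 * di] += yi
        cells[xi][2 * di + 1] += 1
    ate_x = {}
    for xi, (s0, n0, s1, n1) in cells.items():
        if n0 == 0 or n1 == 0:
            raise ValueError(f"common support fails in cell x={xi!r}")
        ate_x[xi] = s1 / n1 - s0 / n0
    # Average ATE(x_i) over all observations, i.e. weight cells by f(x)
    ate = sum(ate_x[xi] for xi in x) / len(x)
    return ate_x, ate


y = [1.0, 3.0, 4.0, 2.0, 6.0, 8.0, 7.0]
d = [0, 0, 1, 0, 1, 1, 1]
x = [0, 0, 0, 1, 1, 1, 1]
ate_x, ate = cell_ate(y, d, x)
print(ate_x, ate)
```

In this toy sample the treated are concentrated in the x = 1 cell, so the raw difference in means (6.25 - 2 = 4.25) differs from the cell-weighted estimate.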

Page 71: Microeconometrics Lecture Notes

Caveats
- If x takes on many values (even if still discrete), there may be small sample size for any particular value, x, leading to high variance for $\hat{\Delta}_{ATE}(x)$
- If x is continuous, then this estimator cannot be used since the probability of observing more than one obs with the same value of x is zero
- Possible solution: functional form assumptions

Page 72: Microeconometrics Lecture Notes

Final Note

CIA is not testable except by conducting random experiments for comparison

One common "test" employed entails testing for differences in pre-treatment outcomes conditional on x between the to-be-treated and the controls
- Intuition: if D is uncorrelated with unobservables related to the outcome conditional on x, then pre-treatment outcomes should be unrelated to (future) D conditional on x
- Heckman et al. (1999) refer to this as the alignment fallacy
- In particular, test based on outcomes more than one period in the past is misleading if shocks are serially correlated and agents self-select into the treatment group due to an adverse shock in the period directly before treatment
- In general, test is useful if it rejects the independence of D and y conditional on x in periods prior to treatment; if it fails to reject, then the test is ambiguous

Page 73: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Regression

Previous results showed that
\[
\begin{aligned}
\Delta_{ATE}(x) &= E[y_{1i} \mid x_i = x] - E[y_{0i} \mid x_i = x] \\
&= E[y_i \mid x_i = x, D = 1] - E[y_i \mid x_i = x, D = 0]
\end{aligned}
\]
Implies key is to estimate the regression function $E[y_i \mid x_i, D_i]$

Page 74: Microeconometrics Lecture Notes

Assumptions

(A.iv) Separability:
\[ y_{0i} = \mu_0(x_i) + \upsilon_{0i} \]
\[ y_{1i} = \mu_1(x_i) + \upsilon_{1i} \]
where $E[\upsilon_1 \mid x] = E[\upsilon_0 \mid x] = E[\upsilon_1 - \upsilon_0 \mid x] = 0$

(A.v) Functional forms:

(A.va) Constant treatment effect
\[ \mu_0(x_i) = \alpha_0 + x_i \beta \]
\[ \mu_1(x_i) = \alpha_1 + x_i \beta \]

(A.vb) Heterogeneous treatment effects
\[ \mu_0(x_i) = \alpha_0 + x_i \beta_0 \]
\[ \mu_1(x_i) = \alpha_1 + x_i \beta_1 \]

Page 75: Microeconometrics Lecture Notes

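Implications...

Given (A.i), (A.ii), (A.iv), and (A.va) ...
\[ E[y_i \mid x_i, D = 0] = \alpha_0 + x_i \beta + E[\upsilon_{0i} \mid x_i, D = 0] \]
\[ E[y_i \mid x_i, D = 1] = \alpha_1 + x_i \beta + E[\upsilon_{1i} \mid x_i, D = 1] \]
where the conditional means of the errors are zero by (A.ii) and (A.iv), which implies
\[
\begin{aligned}
\Delta_{ATE}(x) &= E[y_i \mid x_i = x, D = 1] - E[y_i \mid x_i = x, D = 0] \\
&= \alpha_1 - \alpha_0 \\
&= \Delta_{ATE} = \Delta_{ATT} = \Delta_{ATU}
\end{aligned}
\]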

Page 76: Microeconometrics Lecture Notes

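Given (A.i), (A.ii), (A.iv), and (A.vb) ...
\[ E[y_i \mid x_i, D = 0] = \alpha_0 + x_i \beta_0 + E[\upsilon_{0i} \mid x_i, D = 0] \]
\[ E[y_i \mid x_i, D = 1] = \alpha_1 + x_i \beta_1 + E[\upsilon_{1i} \mid x_i, D = 1] \]
implies
\[
\begin{aligned}
\Delta_{ATE}(x) &= E[y_i \mid x_i = x, D = 1] - E[y_i \mid x_i = x, D = 0] \\
&= (\alpha_1 - \alpha_0) + x (\beta_1 - \beta_0)
\end{aligned}
\]
and
\[ \Delta_{ATE} = \int \Delta_{ATE}(x) f(x) \, dx = (\alpha_1 - \alpha_0) + E[x](\beta_1 - \beta_0) \]
\[ \Delta_{ATT} = \int \Delta_{ATE}(x) f(x \mid D = 1) \, dx = (\alpha_1 - \alpha_0) + E[x \mid D = 1](\beta_1 - \beta_0) \]
\[ \Delta_{ATU} = \int \Delta_{ATE}(x) f(x \mid D = 0) \, dx = (\alpha_1 - \alpha_0) + E[x \mid D = 0](\beta_1 - \beta_0) \]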

Page 77: Microeconometrics Lecture Notes

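Estimation... Given (A.i), (A.ii), (A.iv), and (A.va)

Estimate via OLS
\[
\begin{aligned}
y_i &\equiv y_{0i} + D_i (y_{1i} - y_{0i}) \\
&= \alpha_0 + x_i \beta + \upsilon_{0i} + D_i (\alpha_1 + x_i \beta + \upsilon_{1i} - \alpha_0 - x_i \beta - \upsilon_{0i}) \\
&= \alpha_0 + x_i \beta + (\alpha_1 - \alpha_0) D_i + [\upsilon_{0i} + D_i (\upsilon_{1i} - \upsilon_{0i})] \\
&= \alpha_0 + x_i \beta + \Delta_{ATE} D_i + \tilde{\upsilon}_i
\end{aligned}
\]
Coefficient on D is an unbiased estimate of the causal parameter, and
\[ \Delta_{ATE} = \Delta_{ATT} = \Delta_{ATU} \]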

Page 78: Microeconometrics Lecture Notes

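Estimation... Given (A.i), (A.ii), (A.iv), and (A.vb) ...

Estimate via OLS
\[
\begin{aligned}
y_i &= \alpha_0 + x_i \beta_0 + (\alpha_1 - \alpha_0) D_i + x_i D_i (\beta_1 - \beta_0) + [\upsilon_{0i} + D_i (\upsilon_{1i} - \upsilon_{0i})] \\
&= \alpha_0 + x_i \beta_0 + \tilde{\alpha}_1 D_i + x_i D_i \tilde{\beta}_1 + \tilde{\upsilon}_i
\end{aligned}
\]
Estimates given by
\[ \hat{\Delta}_{ATE}(x) = \hat{\tilde{\alpha}}_1 + x \hat{\tilde{\beta}}_1 \]
\[ \hat{\Delta}_{ATE} = \hat{\tilde{\alpha}}_1 + \bar{x} \hat{\tilde{\beta}}_1 \]
\[ \hat{\Delta}_{ATT} = \hat{\tilde{\alpha}}_1 + \bar{x}_1 \hat{\tilde{\beta}}_1 \]
\[ \hat{\Delta}_{ATU} = \hat{\tilde{\alpha}}_1 + \bar{x}_0 \hat{\tilde{\beta}}_1 \]
where $\bar{x}$ is the full-sample mean of x and $\bar{x}_j = \sum_i x_i I[D_i = j] / \sum_i I[D_i = j]$, $j = 0, 1$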

Page 79: Microeconometrics Lecture Notes

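Alternatively, estimate via OLS
\[
\begin{aligned}
y_i &= \alpha_0 + (x_i - \bar{x}) \beta_0 + (\alpha_1 - \alpha_0) D_i + (x_i - \bar{x}) D_i (\beta_1 - \beta_0) + [\upsilon_{0i} + D_i (\upsilon_{1i} - \upsilon_{0i})] \\
&= \alpha_0 + (x_i - \bar{x}) \beta_0 + \tilde{\alpha}_1 D_i + (x_i - \bar{x}) D_i \tilde{\beta}_1 + \tilde{\upsilon}_i
\end{aligned}
\]
Estimates given by
\[ \hat{\Delta}_{ATE}(x) = \hat{\tilde{\alpha}}_1 + (x - \bar{x}) \hat{\tilde{\beta}}_1 \]
\[ \hat{\Delta}_{ATE} = \hat{\tilde{\alpha}}_1 \]
\[ \hat{\Delta}_{ATT} = \hat{\tilde{\alpha}}_1 + \bar{x}_1 \hat{\tilde{\beta}}_1 \]
\[ \hat{\Delta}_{ATU} = \hat{\tilde{\alpha}}_1 + \bar{x}_0 \hat{\tilde{\beta}}_1 \]
where $\bar{x}_j = \sum_i (x_i - \bar{x}) I[D_i = j] / \sum_i I[D_i = j]$, $j = 0, 1$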
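The regression estimates under (A.vb) can be illustrated with a single covariate. This sketch relies on the fact that a fully interacted OLS regression yields the same coefficients as separate regressions on the treated and untreated samples, so it fits two lines and differences them rather than building the interacted design matrix. The helper names and the noise-free toy data are assumptions made for illustration.

```python
from statistics import mean


def ols_line(x, y):
    """Intercept and slope from a bivariate OLS fit of y on x."""
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope


def regression_adjustment(y, d, x):
    """ATE, ATT, ATU under linear, group-specific regression functions."""
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    y0 = [yi for yi, di in zip(y, d) if di == 0]
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    y1 = [yi for yi, di in zip(y, d) if di == 1]
    a0, b0 = ols_line(x0, y0)
    a1, b1 = ols_line(x1, y1)
    alpha_tilde = a1 - a0  # coefficient on D in the interacted regression
    beta_tilde = b1 - b0   # coefficient on x * D in the interacted regression
    return {
        "ATE": alpha_tilde + mean(x) * beta_tilde,
        "ATT": alpha_tilde + mean(x1) * beta_tilde,
        "ATU": alpha_tilde + mean(x0) * beta_tilde,
    }


# Noise-free toy data: y0 = 1 + x for the untreated, y1 = 1.5 + 2.5x for the treated
x = [0.2, 0.4, 0.6, 0.4, 0.6, 0.8, 1.0]
d = [0, 0, 0, 1, 1, 1, 1]
y = [1.0 + xi if di == 0 else 1.5 + 2.5 * xi for xi, di in zip(x, d)]
effects = regression_adjustment(y, d, x)
print(effects)
```

Because the treated have larger x on average and the effect rises with x, the ATT exceeds the ATE, which in turn exceeds the ATU. Standard errors are not computed here.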

Page 80: Microeconometrics Lecture Notes

Notes
- Inclusion of $\bar{x}$ on the RHS of the regression introduces the problem of a generated regressor; OLS std errors are incorrect, but effect is generally minor
- Standard errors of estimators obtained via delta method or bootstrap
- Prior to implementing regression approach, it is useful to examine the normalized differences in x across the treatment and control groups
  - Normalized difference for a particular x is given by
\[ \Delta_x = \frac{\bar{x}_1 - \bar{x}_0}{\sqrt{\sigma^2_{x_1} + \sigma^2_{x_0}}} \]
  - If $|\Delta_x| > 0.25$, regression results are sensitive to functional form assumptions in (A.va) and (A.vb); see Imbens & Wooldridge (2009)
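The normalized difference is a one-line calculation. A minimal sketch follows, assuming sample variances (n - 1 divisor) for the two groups; the function name and data are illustrative.

```python
import math
from statistics import mean, variance


def normalized_difference(x, d):
    """(mean of x among D=1 minus mean among D=0) / sqrt(sum of group variances)."""
    x1 = [xi for xi, di in zip(x, d) if di == 1]
    x0 = [xi for xi, di in zip(x, d) if di == 0]
    return (mean(x1) - mean(x0)) / math.sqrt(variance(x1) + variance(x0))


x = [2.0, 4.0, 6.0, 1.0, 2.0, 3.0]
d = [1, 1, 1, 0, 0, 0]
nd = normalized_difference(x, d)  # (4 - 2) / sqrt(4 + 1)
print(round(nd, 3), abs(nd) > 0.25)
```

Here the value is well above the 0.25 rule of thumb cited above, so a linear regression adjustment on these data would lean heavily on functional form.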

Page 81: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching

Preliminaries
- Matching methods were quite popular, and still are to a large extent
- (Incorrectly) viewed by many as a "magic bullet" for the estimation of treatment effects, as a way to "mimic" randomized experiments
- In practice, only as good as the underlying assumptions
- Matching when identifying assumptions are violated may yield worse estimate than without matching

Assumptions required: (A.i), (A.ii), and (A.iii)
- Technicality #1: only need $y_0 \perp D \mid x$ to estimate ATT; $y_1 \perp D \mid x$ to estimate ATU
- Technicality #2: (really) only need $E[y_j \mid x, D = j] = E[y_j \mid x, D = j']$, $j, j' = 0, 1$ to estimate ATE; $E[y_0 \mid x, D = 1] = E[y_0 \mid x, D = 0]$ to estimate ATT; $E[y_1 \mid x, D = 0] = E[y_1 \mid x, D = 1]$ to estimate ATU

Page 82: Microeconometrics Lecture Notes

Comparison to regression approach
- No functional form assumptions: if CIA holds, but (A.va) or (A.vb) do not, then matching will be consistent and OLS will not
- Matching weights observations differently, giving more weight to those deemed most similar
- Matching requires, and thus highlights problems due to, CS

[Figure: scatter of y against x (x from .2 to 1) showing untreated units and treated units, each with its own regression line. E[y|x,D=0]=1+1x; E[y|x,D=1]=1.5+2.5x; sigma = 0.25]

  - CS is violated, but OLS simply extrapolates from each group to estimate the missing counterfactual at a particular value of x
  - If linear regression specification is not globally accurate, then regression may yield severe bias (see earlier discussion on normalized differences)

Page 83: Microeconometrics Lecture Notes

The fallacy (perhaps!) of extrapolation


Page 84: Microeconometrics Lecture Notes

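Estimation

Parameters
\[ \Delta_{ATE} = E[y_1 - y_0] \]
\[ \Delta_{ATT} = E[y_1 - y_0 \mid D = 1] \]
\[ \Delta_{ATU} = E[y_1 - y_0 \mid D = 0] \]

Infeasible estimators
\[ \hat{\Delta}_{ATE} = \frac{1}{N} \sum_i (y_{1i} - y_{0i}) \]
\[ \hat{\Delta}_{ATT} = \frac{1}{\sum_i I[D_i = 1]} \sum_i (y_{1i} - y_{0i}) I[D_i = 1] \]
\[ \hat{\Delta}_{ATU} = \frac{1}{\sum_i I[D_i = 0]} \sum_i (y_{1i} - y_{0i}) I[D_i = 0] \]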

Page 85: Microeconometrics Lecture Notes

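Feasible estimators
\[ \hat{\Delta}_{ATT} = \frac{1}{\sum_i I[D_i = 1]} \sum_i (y_{1i} - \hat{y}_{0i}) I[D_i = 1] \]
\[ \hat{\Delta}_{ATU} = \frac{1}{\sum_i I[D_i = 0]} \sum_i (\hat{y}_{1i} - y_{0i}) I[D_i = 0] \]
\[ \hat{\Delta}_{ATE} = \frac{\sum_i I[D_i = 1]}{N} \hat{\Delta}_{ATT} + \frac{\sum_i I[D_i = 0]}{N} \hat{\Delta}_{ATU} \]
where $\hat{y}_{0i}$, $\hat{y}_{1i}$ are estimates of the missing counterfactuals, obtained as
\[ \hat{y}_{0i} = \frac{1}{\sum_{l \in \{D_l = 0\}} \omega_{il}} \sum_{l \in \{D_l = 0\}} \omega_{il} y_{0l} \]
\[ \hat{y}_{1i} = \frac{1}{\sum_{l \in \{D_l = 1\}} \omega_{il}} \sum_{l \in \{D_l = 1\}} \omega_{il} y_{1l} \]
where $\omega_{il}$ = weight given to observation l by observation i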

Page 86: Microeconometrics Lecture Notes

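Feasible estimation accomplished by replacing the missing counterfactual with a weighted average of outcomes from the corresponding group

Formally, all matching estimators take the form
\[ \hat{\Delta}_{ATT} = \frac{1}{N_1} \sum_{i \in \{D_i = 1\}} \left( y_{1i} - \frac{1}{\sum_{l \in \{D_l = 0\}} \omega_{il}} \sum_{l \in \{D_l = 0\}} \omega_{il} y_{0l} \right) \]
\[ \hat{\Delta}_{ATU} = \frac{1}{N_0} \sum_{i \in \{D_i = 0\}} \left( \frac{1}{\sum_{l \in \{D_l = 1\}} \omega_{il}} \sum_{l \in \{D_l = 1\}} \omega_{il} y_{1l} - y_{0i} \right) \]
\[ \hat{\Delta}_{ATE} = \frac{N_1}{N} \hat{\Delta}_{ATT} + \frac{N_0}{N} \hat{\Delta}_{ATU} \]
where
\[ N_j = \sum_i I[D_i = j], \quad j = 0, 1 \]
Matching estimators differ in terms of how the weights are specified and what exactly is matched on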

Page 87: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Weighting Schemes)

Exact matching or cell matching

Assuming x contains only discrete variables, assign positive weight only to observations with identical values of x

Let there be K distinct values (or combinations) of xs indexed by k = 1, ..., K (i.e., K cells)

$N_{0k}$, $N_{1k}$ = the number of untreated, treated obs in cell k

Estimators given by
\[ \hat{\Delta}_{ATT} = \sum_k \frac{N_{1k}}{N_1} \left( \sum_{i \in k \cap \{D_i = 1\}} \frac{y_{1i}}{N_{1k}} - \sum_{l \in k \cap \{D_l = 0\}} \frac{y_{0l}}{N_{0k}} \right) \]
\[ \hat{\Delta}_{ATU} = \sum_k \frac{N_{0k}}{N_0} \left( \sum_{l \in k \cap \{D_l = 1\}} \frac{y_{1l}}{N_{1k}} - \sum_{i \in k \cap \{D_i = 0\}} \frac{y_{0i}}{N_{0k}} \right) \]
which reflect different weighted averages of the average treatment effect within the K cells

Page 88: Microeconometrics Lecture Notes

Estimator is subject to curse of dimensionality

With high dimensional x, or if x contains continuous variables, inexact matching algorithms are useful

Asymptotically, all inexact matching estimators are equivalent since the "inexactness" disappears as $N \to \infty$

In finite samples, different inexact matching algorithms may yield quite different estimates

A newly proposed middle ground between exact and inexact matching is known as coarsened exact matching (CEM)
- Intuition: "round" x to fewer distinct values, then match exactly on the coarsened data
- Developed by King et al.
- See -cem- in Stata

Page 89: Microeconometrics Lecture Notes

Inexact matching

Requires a measure of distance between any two observations, i and l
- Euclidean-type distance metrics are of the form
\[ d_{il} = (x_i - x_l)' W (x_i - x_l) \]
where common choices for W are
  1. $W = I$ (identity matrix)
  2. $W = \Sigma^{-1}$, where $\Sigma$ is the sample variance-covariance matrix of x (Mahalanobis metric)
  3. W is a diagonal matrix with the inverse of the variance of each x along the diagonal, zeros on the off-diagonal (Abadie & Imbens 2002, 2006)
  4. Zhao (2004) proposes other alternatives
- Propensity score methods compute the distance based on differences in the probability of being in the treatment group given x
\[ p(x) = \Pr(D = 1 \mid x) \in [0, 1] \]
where distance between two observations is
\[ d_{il} = |p(x_i) - p(x_l)| \]
- $y_0, y_1 \perp D \mid x \Rightarrow y_0, y_1 \perp D \mid p(x)$, which follows from the fact that $D \perp x \mid p(x)$ (Rosenbaum & Rubin 1983)

Page 90: Microeconometrics Lecture Notes

Euclidean-type distance metrics and the propensity score are both a means to circumvent dimensionality as d is a scalar

No one method is superior; goal is to balance the xs ... discussed later (Ho et al. 2007)
- In this sense, matching is not an estimator per se, but can be viewed as a way of pre-processing the data prior to applying some estimator
- Similar to a type of outlier analysis

Given $d_{il}$, several weighting schemes are frequently used
- Let $C(0)$ represent a neighborhood around 0 for each i
- Observations given positive weight by i are those included in the set $A_i$ where
\[ A_i = \{ l \mid D_l \neq D_i, \ d_{il} \in C(0) \} \]

Focusing on propensity score estimators, we can re-write this as
\[ A_i = \{ l \mid D_l \neq D_i, \ p(x_l) \in C(p(x_i)) \} \]
where $C(p(x_i))$ represents a neighborhood around $p(x_i)$

Page 91: Microeconometrics Lecture Notes

Single nearest neighbor matching

Sets
\[ C(p(x_i)) = \min_l |d_{il}| \]
$\Rightarrow$
\[ \omega_{il} = \begin{cases} 1 & \text{if } l \in A_i \\ 0 & \text{otherwise} \end{cases} \]
Intuition: l has the closest propensity score to i, but with different treatment assignment

Page 92: Microeconometrics Lecture Notes

k-nearest neighbor matching

Sets
\[ C(p(x_i)) = k\text{-}\min_l |d_{il}| \]
$\Rightarrow$
\[ \omega_{il} = \begin{cases} 1/k & \text{if } l \in A_i \\ 0 & \text{otherwise} \end{cases} \]
Intuition: compute the average of the k closest obs to i in terms of propensity score, but with different treatment assignment than i
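Single and k-nearest neighbor matching differ only in k, so one sketch covers both. It assumes the propensity scores are already estimated and supplied as a list `p`, matches with replacement, and estimates the ATT only; the function name `knn_att` and the data are hypothetical.

```python
def knn_att(y, d, p, k=1):
    """ATT from k-nearest-neighbor matching (with replacement) on a scalar score p."""
    controls = [(pl, yl) for yl, dl, pl in zip(y, d, p) if dl == 0]
    if len(controls) < k:
        raise ValueError("fewer than k untreated observations")
    gaps = []
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        # The k untreated obs closest to i in propensity score, each weighted 1/k
        nearest = sorted(controls, key=lambda c: abs(c[0] - pi))[:k]
        y0_hat = sum(yl for _, yl in nearest) / k
        gaps.append(yi - y0_hat)
    if not gaps:
        raise ValueError("no treated observations")
    return sum(gaps) / len(gaps)


y = [5.0, 7.0, 1.0, 2.0, 4.0, 3.0]
d = [1, 1, 0, 0, 0, 0]
p = [0.6, 0.8, 0.2, 0.55, 0.75, 0.95]
att_1nn = knn_att(y, d, p, k=1)
att_2nn = knn_att(y, d, p, k=2)
print(att_1nn, att_2nn)
```

Ties in distance are broken here by the order of the data, which is an arbitrary choice; the two estimates differ because the second neighbor is a poorer match than the first.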

Page 93: Microeconometrics Lecture Notes

Caliper or radius matching (Cochran & Rubin 1973)

Sets
\[ C(p(x_i)) = \{ p(x_l) \mid |d_{il}| < \varepsilon \} \]
for a specified value of $\varepsilon$
$\Rightarrow$
\[ \omega_{il} = \begin{cases} 1/k_i & \text{if } l \in A_i \\ 0 & \text{otherwise} \end{cases} \]
Intuition: compute the average over all $k_i$ obs that differ from i in terms of propensity score by less than $\varepsilon$, but with different treatment assignment than i

Page 94: Microeconometrics Lecture Notes

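Kernel matching (Smith & Todd 2005)

Sets
\[ C(p(x_i)) = \left\{ \left| \frac{p(x_l) - p(x_i)}{a_N} \right| \leq \varepsilon \right\} \]
$\Rightarrow$
\[ \omega_{il} = \begin{cases} \dfrac{G\left( \frac{p(x_l) - p(x_i)}{a_N} \right)}{\sum_{l' \in \{D_{l'} = 0\}} G\left( \frac{p(x_{l'}) - p(x_i)}{a_N} \right)} & \text{if } l \in A_i \\ 0 & \text{otherwise} \end{cases} \]
where $G(\cdot)$ is the kernel function and $a_N$ is the bandwidth

Intuition: compute a weighted average over all $k_i$ obs that receive positive weight given the choice of $G(\cdot)$ and $a_N$, but with different treatment assignment than i
- $G(\cdot)$ must integrate to one, $a_N \to 0$ as $N \to \infty$, and $a_N N \to \infty$
- Ex: quartic kernel ($\varepsilon = 1$)
\[ G(s) = \begin{cases} \frac{15}{16} (1 - s^2)^2 & \text{if } |s| \leq 1 \\ 0 & \text{otherwise} \end{cases} \]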
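A sketch of kernel matching for the ATT with the quartic kernel follows. The propensity scores are taken as given, the bandwidth is chosen by hand, and treated observations with no untreated neighbor inside the bandwidth are skipped; that last rule is an assumption of this sketch rather than something the slides prescribe.

```python
def quartic(s):
    """Quartic (biweight) kernel: 15/16 * (1 - s^2)^2 on [-1, 1], zero elsewhere."""
    return 15.0 / 16.0 * (1.0 - s * s) ** 2 if abs(s) <= 1.0 else 0.0


def kernel_att(y, d, p, bandwidth):
    """ATT from kernel matching on the propensity score p."""
    controls = [(pl, yl) for yl, dl, pl in zip(y, d, p) if dl == 0]
    gaps = []
    for yi, di, pi in zip(y, d, p):
        if di != 1:
            continue
        weights = [quartic((pl - pi) / bandwidth) for pl, _ in controls]
        total = sum(weights)
        if total == 0.0:
            continue  # no untreated obs within the bandwidth; skipped by assumption
        y0_hat = sum(w * yl for w, (_, yl) in zip(weights, controls)) / total
        gaps.append(yi - y0_hat)
    if not gaps:
        raise ValueError("no treated observation has a match within the bandwidth")
    return sum(gaps) / len(gaps)


y = [10.0, 2.0, 4.0, 100.0]
d = [1, 0, 0, 0]
p = [0.5, 0.4, 0.6, 0.9]
att_kernel = kernel_att(y, d, p, bandwidth=0.2)
print(att_kernel)
```

The untreated observation at p = 0.9 lies two bandwidths away and receives zero weight, so the counterfactual is the equally weighted average of the two nearby untreated outcomes.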

Page 95: Microeconometrics Lecture Notes

Local linear matching (Smith & Todd 2005)

Sets
\[ C(p(x_i)) = \left\{ \left| \frac{p(x_l) - p(x_i)}{a_N} \right| \leq \varepsilon \right\} \]
$\Rightarrow$
\[
\omega_{il} = \begin{cases}
\dfrac{G_{il} \sum_{l' \in \{D_{l'} = 0\}} G_{il'} (p_{l'} - p_i)^2 - [G_{il} (p_l - p_i)] \left[ \sum_{l' \in \{D_{l'} = 0\}} G_{il'} (p_{l'} - p_i) \right]}{\sum_{l \in \{D_l = 0\}} G_{il} \sum_{l' \in \{D_{l'} = 0\}} G_{il'} (p_{l'} - p_i)^2 - \left[ \sum_{l' \in \{D_{l'} = 0\}} G_{il'} (p_{l'} - p_i) \right]^2} & \text{if } l \in A_i \\
0 & \text{otherwise}
\end{cases}
\]
where $G_{il} = G\left( \frac{p_l - p_i}{a_N} \right)$

Intuition: similar to kernel matching, but differs in handling of weights assigned to obs when obs are distributed asymmetrically around i or when there are gaps in the distribution of the propensity score

Page 96: Microeconometrics Lecture Notes

Stratification or interval matching

Differs from above schemes (although it can be written as a matching estimator)

Unit interval is divided into K intervals, the average outcome of treated and untreated is computed within each interval, and $\hat{\Delta}_{ATE}(k)$ is computed within each interval k

Finally
\[ \hat{\Delta}_{ATT} = \sum_k \frac{N_{1k}}{N_1} \hat{\Delta}_{ATE}(k) \]
\[ \hat{\Delta}_{ATU} = \sum_k \frac{N_{0k}}{N_0} \hat{\Delta}_{ATE}(k) \]
\[ \hat{\Delta}_{ATE} = \frac{\sum_i I[D_i = 1]}{N} \hat{\Delta}_{ATT} + \frac{\sum_i I[D_i = 0]}{N} \hat{\Delta}_{ATU} \]

Stata: -psmatch2- or -nnmatch-
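The stratification estimator can be sketched as below. Equal-width strata on the unit interval are an assumption of this sketch (the slides do not say how the intervals are chosen), as is the decision to raise an error when a stratum lacks treated or untreated observations. The function name and data are illustrative, and the propensity scores are taken as given.

```python
from statistics import mean


def stratified_effects(y, d, p, n_strata=5):
    """ATT, ATU, ATE from equal-width strata of a propensity score p in [0, 1]."""
    strata = {}
    for yi, di, pi in zip(y, d, p):
        k = min(int(pi * n_strata), n_strata - 1)  # p = 1 falls in the top stratum
        strata.setdefault(k, ([], []))[di].append(yi)
    n = len(y)
    n1 = sum(d)
    n0 = n - n1
    att = atu = 0.0
    for k, (y0k, y1k) in sorted(strata.items()):
        if not y0k or not y1k:
            raise ValueError(f"stratum {k} lacks treated or untreated observations")
        delta_k = mean(y1k) - mean(y0k)   # within-stratum difference in means
        att += len(y1k) / n1 * delta_k    # weight N_1k / N_1
        atu += len(y0k) / n0 * delta_k    # weight N_0k / N_0
    ate = n1 / n * att + n0 / n * atu
    return att, atu, ate


y = [1.0, 2.0, 3.0, 4.0, 5.0, 9.0, 11.0]
d = [0, 0, 0, 1, 0, 1, 1]
p = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.9]
att_s, atu_s, ate_s = stratified_effects(y, d, p, n_strata=2)
print(att_s, atu_s, ate_s)
```

With two strata the within-stratum effects are 2 and 5; the ATT weights the upper stratum more heavily because most treated observations fall there.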

Page 97: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Comparison of Matching Methods)

Asymptotically, all methods are consistent if assumptions hold and bandwidth satisfies the requisite criteria

In finite samples, choice may matter

Single nearest neighbor matching minimizes bias since it only uses the closest match; however, Frölich's (2004) Monte Carlo (MC) analysis shows it fares poorly in practice

If sample size is large and the propensity score is evenly dispersed across the unit interval, k-nearest neighbor matching may be ideal

If sample size is large and the propensity score is asymmetrically distributed, kernel matching may be ideal (weights obs according to closeness)

If many obs have a propensity score close to the boundary (zero or one), local linear (LL) matching may be ideal

Stratification methods face problem of arbitrarily choosing K

Page 98: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Regression Adjustment)

Various methods combine matching estimators with regression methods

Regression then matching (Smith & Todd 2005)
- Regress $y_i$ on (some) $x_i$ for treated and untreated samples, obtain residuals, and use residuals to compute matching estimators

Matching then regression (Ho et al. 2007)
- Match to obtain missing counterfactual for each obs, then regress $y_i$ on $D_i$ and (some) $x_i$ using matched sample
- Standard errors are an issue here, as the usual OLS SEs are incorrect (more below)

Page 99: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching in Practice

Several practical issues are confronted when implementing matching estimators
1. Restriction to the common support
2. Does inexact matching balance the covariates, x?
3. Which variables belong in x?
4. Inference
5. Failure of CIA

Page 100: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Common Support)

Defined as
\[ S_p = \{ p(x) : f(p \mid D = 1) > 0 \text{ and } f(p \mid D = 0) > 0 \} \]
Matching estimates are only defined at values of $p(x) \in S_p$

In practice, may want to exclude obs outside $S_p$

To do so requires an estimate
\[ \hat{S}_p = \{ p(x) : \hat{f}(p \mid D = 1) > 0 \text{ and } \hat{f}(p \mid D = 0) > 0 \} \]
Smith & Todd (2005) recommend using nonparametric (NP) density estimators to estimate $f(\cdot)$
\[ \hat{f}(p \mid D = j) = \sum_{i \in \{D_i = j\}} G\left( \frac{p(x_i) - p}{a_N} \right), \quad j = 0, 1 \]
- See -kdensity- in Stata

Page 101: Microeconometrics Lecture Notes

Imprecise alternative
\[ \hat{S}_p = \left\{ p(x) : p \in \left[ \max\left\{ \min_{i \in \{D_i = 0\}} p(x_i), \ \min_{i \in \{D_i = 1\}} p(x_i) \right\}, \ \min\left\{ \max_{i \in \{D_i = 0\}} p(x_i), \ \max_{i \in \{D_i = 1\}} p(x_i) \right\} \right] \right\} \]
- Simpler alternative
- Excludes obs just outside the CS for whom close matches exist
- Does not address "holes" in the interior of the distribution

Note: imposing the CS changes interpretation of the parameters being estimated (e.g., $\hat{\Delta}_{ATE}$ becomes the ATE for treated individuals with a propensity score in a particular region)

Trimming: Smith & Todd (2005) recommend reducing the CS to
\[ \hat{S}_p = \{ p(x) : \hat{f}(p \mid D = 1) > q \text{ and } \hat{f}(p \mid D = 0) > q \}, \quad q \in (0, 1) \]

Dealing with limited overlap; see Crump et al. (2009)
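The min-max rule above is easy to state in code. The following minimal sketch takes estimated propensity scores as given; the function names and numbers are illustrative.

```python
def minmax_support(p, d):
    """Bounds [lower, upper] of the min-max common support of propensity scores."""
    p1 = [pi for pi, di in zip(p, d) if di == 1]
    p0 = [pi for pi, di in zip(p, d) if di == 0]
    lower = max(min(p0), min(p1))
    upper = min(max(p0), max(p1))
    return lower, upper


def on_support(p, d):
    """Flag observations whose propensity score lies inside the min-max bounds."""
    lower, upper = minmax_support(p, d)
    return [lower <= pi <= upper for pi in p]


p = [0.05, 0.2, 0.4, 0.7, 0.3, 0.5, 0.8, 0.95]
d = [0, 0, 0, 0, 1, 1, 1, 1]
bounds = minmax_support(p, d)
keep = on_support(p, d)
print(bounds, keep)
```

As the slide notes, this rule looks only at the extremes: the untreated observation at 0.2 is dropped even though a treated observation at 0.3 is close, and gaps inside the bounds go undetected.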

Page 102: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Balancing)

Matching mimics a randomized experiment in that conditioning on p(x) should balance x across the treated and untreated groups

Equivalently, the problem is reduced to a series of quasi-random experiments at each value of p(x)... hence, an IV exists which exogenously determines treatment assignment conditional on p(x)

Rosenbaum & Rubin (1983) prove that
\[ x \perp D \mid p(x) \]
which implies
\[ E[x \mid p(x), D = 0] = E[x \mid p(x), D = 1] \]
This holds regardless of whether CIA holds

Balancing tests seek to gauge this

Note: this highlights that p(x) is simply a means to balance the xs; the goal of p(x) is not to "model" treatment choice (more below)

Page 103: Microeconometrics Lecture Notes

Stratification tests (e.g., Dehejia & Wahba 1999, 2002)
- Estimate the propensity score
- Divide the data into K intervals based on $\hat{p}(x)$
- Test for equal means (or other moments) of each x across the treated and control group within each stratum
  - See -ttest- in Stata
- Test xs individually or jointly using Hotelling $T^2$ test
  - See -hotel- in Stata
- Add higher order or interaction terms of xs failing the test, and repeat
- Problem: how to choose K?
  - Too small $\to$ typically always reject equality
  - Too large $\to$ rarely reject equality

Page 104: Microeconometrics Lecture Notes

Standardized differences
- Average difference in each x, where weights from matching are used, normalized by the pooled SD of x in the full sample
- Example: $\Delta_{ATT}$
\[ SDIFF(x_m) = 100 \cdot \frac{\frac{1}{N_1} \sum_{i \in \{D_i = 1\}} \left( x_{mi} - \sum_{l \in \{D_l = 0\}} \omega_{il} x_{ml} \right)}{\sqrt{\dfrac{\mathrm{Var}_{i \in \{D_i = 1\}}(x_{mi}) + \mathrm{Var}_{l \in \{D_l = 0\}}(x_{ml})}{2}}} \]
- Problem: how large is too large? Rosenbaum & Rubin (1985) suggest 20 is large
- Perhaps criteria should be more strict for variables thought to be more important in particular application
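The standardized difference for one covariate can be computed as sketched below. The sketch assumes the matching weights are supplied as a matrix whose rows (one per treated observation) sum to one, and it uses sample variances; both are choices made for illustration, and the function name and data are hypothetical.

```python
import math
from statistics import mean, variance


def standardized_difference(x_treated, x_control, weights):
    """SDIFF for one covariate; weights[i][l] is the weight treated i puts on control l."""
    gaps = [
        xi - sum(w * xl for w, xl in zip(row, x_control))
        for xi, row in zip(x_treated, weights)
    ]
    pooled_sd = math.sqrt((variance(x_treated) + variance(x_control)) / 2.0)
    return 100.0 * mean(gaps) / pooled_sd


x_treated = [2.0, 4.0, 6.0]
x_control = [1.0, 3.0, 5.0]

# Before matching: every treated obs is compared with the plain untreated mean
unmatched = [[1.0 / 3.0] * 3 for _ in x_treated]
# After matching: hypothetical weights concentrating on nearby untreated obs
matched = [[0.5, 0.5, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

sdiff_before = standardized_difference(x_treated, x_control, unmatched)
sdiff_after = standardized_difference(x_treated, x_control, matched)
print(round(sdiff_before, 1), round(sdiff_after, 1))
```

In this toy example matching lowers the standardized difference but leaves it above the threshold of 20 mentioned above, which would suggest revisiting the matching specification.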

Page 105: Microeconometrics Lecture Notes

Hotelling $T^2$ test
- Test joint null of equal (weighted) means across treatment and control group
- Example: $\Delta_{ATT}$
\[ T^2 = (\bar{x}_1 - \bar{x}_0)' \Sigma^{-1} (\bar{x}_1 - \bar{x}_0) \]
where $\bar{x}_1$ = vector of (unweighted) means from treatment group and $\bar{x}_0$ = vector of weighted means from untreated group, weighted by $\omega_{il}$
- Test may be conservative since estimation of weights is not accounted for

Regression-based test
- Estimate propensity score
- Regress each x on a polynomial of p(x), D, and D interacted with the same polynomial of p(x)...
\[ x_i = \phi_0 + \sum_{s=1}^S \phi_s p(x_i)^s + \pi_0 D_i + \sum_{s=1}^S \pi_s D_i p(x_i)^s + \eta_i \]
and test $H_0: \pi_0 = \pi_1 = \cdots = \pi_S = 0$
- Regression may be unweighted or weighted, assigning weight
\[ \omega_l = \sum_{i \in \{D_i = 1\}} \omega_{il} \]
to each untreated obs (when focus is on $\Delta_{ATT}$)

Page 106: Microeconometrics Lecture Notes

Selection on Observables
Strong Ignorability: Matching (Variable Selection)

CIA is a strong assumption that places great demands on the data

Two issues
- What variables to include in x?
- What functional form to use; should x include higher order, interaction terms of the variables?

CIA will certainly hold if x includes all variables that determine both outcomes and participation, but is this required?

Rubin and Thomas (1996) favor including variables in the propensity score model unless there is consensus that they do not belong

HIT (1997), HIST (1998), Heckman and Smith (1999), Lechner (2002), Smith & Todd (2005)
- Estimators are sensitive to variables included in x
- Bias likely to result if x is too crude

Page 107: Microeconometrics Lecture Notes

Brookhart et al. (2006)
- Variables related to outcomes should always be included
- Variables weakly related to the outcome, even if strongly related to treatment assignment, should be excluded as their inclusion results in higher mean squared error of the treatment effect estimate

Zhao (2007)
- Including irrelevant variables $\nRightarrow$ biased estimates
- Over-fitting the propensity score model may be counterproductive

Wooldridge (2009), Pearl (2009)
- Consider classes of variables whose inclusion leads to bias
- Primary example is of instrumental variables

Hirano et al. (2003)
- Using the true propensity score is inefficient even when it is known
- May imply that over-fitting the propensity score model may have little negative consequence in practice

Note: goal of the propensity score (PS) model is not to find the best predictor of D
- Generally, variables that impact participation and not outcomes should be excluded; inclusion will exacerbate the CS problem
- Pseudo-$R^2$ criteria should not be used to judge the PS model

Page 108: Microeconometrics Lecture Notes

Millimet & Tchernis (2009)
- MC analysis of matching and weighting estimators (discussed later)
- Estimate propensity score using a series logit estimator (SLE)

$$\Pr(D = 1) = \frac{\exp\left(\theta_0 + \sum_{s=1}^{S} \theta_s x^s\right)}{1 + \exp\left(\theta_0 + \sum_{s=1}^{S} \theta_s x^s\right)}$$

  where, for sufficiently large S and appropriate coefficients, θ, any participation function may be approximated
- SLE ⇒ $\hat{\theta}$ estimated via ML
- Assess impact of
  - Including irrelevant and excluding relevant higher-order terms of variables that impact outcomes and participation
  - Including irrelevant and excluding relevant higher-order terms of variables that impact outcomes only
  - Including irrelevant and excluding relevant higher-order terms of variables that impact participation only
- Little impact to over-fitting
  - Asymptotic variance of nonparametric estimators is dominated by bias terms (Ichimura & Linton 2005)
  - Over-fitting minimizes the bias
  - Also, normalized weighting estimator is preferable (discussed later)


DiNardo & Lee (2011) criticize us and show instances where adding x may exacerbate bias
- Their examples are instances where the CIA does not hold, but one applies an estimator that requires the CIA (such as matching)
- Thus, the matching estimator is already biased
- In this case, adding an additional covariate may increase or decrease the bias even if x belongs in the model
- That said, this is not the case examined in our work; we assume CIA holds

Shaikh et al. (2009) propose a specification test of the propensity score model
- Informal test based on an eyeball comparison of the distribution of p(x) in the treatment and control groups
- Formal test procedure also provided


Selection on Observables
Strong Ignorability: Matching (Standard Errors)

Non-smooth matching estimators
- Correct standard errors are not feasible in this case
- Usual t-test for the difference in mean outcomes across the matched treated and untreated groups ignores estimation of the propensity score and the nature of the matching
- Problem due to estimation of the propensity score disappears asymptotically
- Eichler & Lechner (2001) suggest that N must be in the 1000s before this bias disappears

Bootstrap methods are feasible for smooth matching estimators (e.g., kernel matching), but there is no formal evidence

Abadie & Imbens (2006) provide asymptotic standard errors for non-propensity score matching estimators; work in progress focuses on propensity score matching estimators

Must be careful when bootstrapping data with choice-based sampling


Selection on Observables
Strong Ignorability: Matching (Misc. Implementation Issues)

Replacement?
- Single, k-nearest neighbor matching may be done with or without replacement
- Without replacement implies results are sensitive to the sort order of the data
- With replacement reduces bias (by improving match quality), but is less efficient (by using less of the data)

Estimation of propensity score
- Typically probit or logit is used ⇒ semiparametric estimator
- NP methods are available as well


Bandwidth Selection
- In NP work, bandwidth choice is typically much more important than choice of kernel function
- Methods generally fall into three categories
  1. Ad hoc choice combined with sensitivity analysis
  2. Rule-of-thumb approaches (Silverman 1986)

$$a_N \approx 1.06\,\sigma N^{-1/5}$$

  3. Data-driven methods (e.g., cross-validation)
- Leave-one-out cross-validation (e.g., $\Delta^{ATT}$)
  - Perform a NP regression of y on p(x) using all untreated obs except l and a candidate bandwidth, $a_b$
  - Predict $\hat{y}_l$
  - Repeat for all l, $l = 1, ..., N_0$
  - Calculate MSE

$$MSE(a_b) = \frac{1}{N_0} \sum_{l \in \{D_l = 0\}} (y_l - \hat{y}_l)^2$$

  - Repeat for all candidate bandwidths $a_b$, $b = 1, ..., B$
  - Choose $a_b$ to minimize $MSE(a_b)$
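The leave-one-out steps above can be sketched in a few lines of Python. This is a minimal illustration, not the notes' own implementation: the Gaussian kernel, the Nadaraya-Watson form of the NP regression, the simulated data, and the function names (`loo_prediction`, `loo_mse`, `choose_bandwidth`) are all assumptions made for the example.

```python
import math
import random


def gaussian_kernel(z):
    """Standard normal density, used here as the kernel function (an assumption)."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def loo_prediction(p, y, l, bandwidth):
    """Kernel (Nadaraya-Watson) prediction of y[l] at p[l], leaving obs l out."""
    num = 0.0
    den = 0.0
    for m in range(len(p)):
        if m == l:
            continue
        w = gaussian_kernel((p[m] - p[l]) / bandwidth)
        num += w * y[m]
        den += w
    if den == 0.0:
        # All kernel weights underflowed: fall back to the leave-one-out mean.
        return (sum(y) - y[l]) / (len(y) - 1)
    return num / den


def loo_mse(p, y, bandwidth):
    """MSE(a_b): average squared leave-one-out prediction error."""
    n0 = len(p)
    return sum((y[l] - loo_prediction(p, y, l, bandwidth)) ** 2
               for l in range(n0)) / n0


def choose_bandwidth(p, y, candidates):
    """Return the candidate bandwidth with the smallest leave-one-out MSE."""
    return min(candidates, key=lambda a: loo_mse(p, y, a))


# Hypothetical untreated sample: propensity scores p0 and outcomes y0.
rng = random.Random(0)
p0 = [rng.uniform(0.05, 0.95) for _ in range(200)]
y0 = [math.sin(3.0 * v) + rng.gauss(0.0, 0.3) for v in p0]
grid = [0.01, 0.02, 0.05, 0.1, 0.2, 0.5]
best = choose_bandwidth(p0, y0, grid)
print("chosen bandwidth:", best)
```

Only the untreated observations enter the search because, for the ATT, the kernel regression is used to predict the untreated outcome of each treated observation.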


Selection on Observables
Strong Ignorability: Matching (Sensitivity to Unobservables)

CIA is not testable

Applied literature does/should assess the robustness of matching estimators

Several currently available techniques
- Rosenbaum bounds
- Simulation methods (Ichino et al. 2008)
- Minimum bias approach (Millimet & Tchernis 2011)
- Difference-in-differences matching
- Assuming SOO = SOU (Altonji et al. 2005; discussed later)
- Bayesian sensitivity analysis (de Luna & Lundin 2009)


Rosenbaum Bounds

Method of assessing sensitivity of matching estimator to an unobserved confounder (Rosenbaum 2002)

Assume

$$p(x_i) = F(x_i\beta + \gamma u_i) = \frac{\exp(x_i\beta + \gamma u_i)}{1 + \exp(x_i\beta + \gamma u_i)}$$

where u is an unobserved binary variable and F is the logistic CDF

Implications
- Odds of treatment for obs i are

$$\frac{p(x_i)}{1 - p(x_i)} = \exp(x_i\beta + \gamma u_i)$$

- Odds ratio for obs i relative to obs i′

$$\frac{p(x_i)/[1 - p(x_i)]}{p(x_{i'})/[1 - p(x_{i'})]} = \frac{\exp(x_i\beta + \gamma u_i)}{\exp(x_{i'}\beta + \gamma u_{i'})} = \exp\{\gamma(u_i - u_{i'})\} \quad \text{if } x_i = x_{i'}$$

- Thus, two observationally identical obs have different probabilities of being treated if $\gamma \neq 0$ and $u_i \neq u_{i'}$


How does inference regarding the treatment effect parameters change as γ and $u_i - u_{i'}$ change?
- Since u is binary, $u_i - u_{i'} \in \{-1, 0, 1\}$
- Implies

$$\frac{1}{\exp\{\gamma\}} \leq \frac{p(x_i)/[1 - p(x_i)]}{p(x_{i'})/[1 - p(x_{i'})]} \leq \exp\{\gamma\}$$

  where
  - $\exp\{\gamma\} = 1$ ⇒ no selection bias
  - $\exp\{\gamma\} \to \infty$ ⇒ greater selection bias
- Rosenbaum bounds compute bounds on the significance level of the matching estimate as $\exp\{\gamma\}$ changes values
  - If matching estimate is statistically insignificant even when $\exp\{\gamma\} \approx 1$, then treatment effect is not robust
  - If matching estimate is statistically significant even when $\exp\{\gamma\}$ is large, then treatment effect is not sensitive to hidden bias

Stata: -rbounds-, -mhbounds-
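To make the idea of bounding the significance level concrete, the sketch below uses the simplest test statistic for matched pairs, the sign test. It is an illustration under stated assumptions, not a reimplementation of the Stata commands: the pair counts are hypothetical, and the function names are invented for the example. With hidden bias of magnitude exp{γ}, the probability that the treated unit in a pair has the higher outcome under the null lies between 1/(1 + exp{γ}) and exp{γ}/(1 + exp{γ}), which bounds the one-sided p-value by two binomial tail probabilities.

```python
from math import comb


def binom_upper_tail(n, t, prob):
    """P(X >= t) for X ~ Binomial(n, prob)."""
    return sum(comb(n, k) * prob ** k * (1.0 - prob) ** (n - k)
               for k in range(t, n + 1))


def sign_test_bounds(n_pairs, n_positive, gamma_odds):
    """Lower and upper bounds on the one-sided sign-test p-value.

    n_pairs    : number of matched pairs with a nonzero outcome difference
    n_positive : pairs in which the treated outcome exceeds the control outcome
    gamma_odds : exp(gamma), the assumed bound on the odds ratio of treatment
    """
    p_lo = 1.0 / (1.0 + gamma_odds)
    p_hi = gamma_odds / (1.0 + gamma_odds)
    return (binom_upper_tail(n_pairs, n_positive, p_lo),
            binom_upper_tail(n_pairs, n_positive, p_hi))


# Hypothetical example: 100 matched pairs, treated outcome higher in 65 of them.
for gamma_odds in (1.0, 1.5, 2.0):
    lower, upper = sign_test_bounds(100, 65, gamma_odds)
    print(f"exp(gamma) = {gamma_odds}: p-value in [{lower:.4f}, {upper:.4f}]")
```

At exp{γ} = 1 the two bounds coincide with the usual sign-test p-value; the upper bound then rises with exp{γ}, and the value at which it crosses the chosen significance level indicates how much hidden bias the finding can tolerate.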


Ichino et al. (2008) Approach

Nannicini (2007) and Ichino et al. (2008) propose an alternative method of assessing the robustness of ATT estimates obtained under CIA

The sensitivity analysis is performed by comparing the baseline matching estimate to estimates obtained after additionally conditioning upon a simulated confounder

The distribution of the simulated variable can be constructed to capture different hypotheses regarding the nature of potential confounders


Setup
- The parameter of interest is $\Delta^{ATT} \equiv E[y_1 - y_0 \mid D = 1]$
- Accordingly, $y_0 \perp D \mid x$ denotes the required CIA
- Suppose that this condition is not met, but if an unobservable, U, is added then a stronger CIA holds

$$y_0 \perp D \mid x, U$$

- Implies

$$E[y_0 \mid D = 1, x] \neq E[y_0 \mid D = 0, x]$$
$$E[y_0 \mid D = 1, x, U] = E[y_0 \mid D = 0, x, U]$$


Solution
- Simulate the potential confounder and use it as a matching covariate
  - For simplicity, the potential outcomes and the confounding variable are assumed to be binary
  - Conditional independence of U and x is also assumed
  - Hence, the distribution of U is fully characterized by the choice of the following four parameters

$$p_{ij} \equiv \Pr(U = 1 \mid D = i, y = j) = \Pr(U = 1 \mid D = i, y = j, x)$$

    with $i, j \in \{0, 1\}$
  - Given the parameters $p_{ij}$, a value of U is simulated for each observation depending on D, y
- $\Delta^{ATT}$ is then estimated with U as an additional matching covariate

For a given set of the parameters $p_{ij}$, many simulations are performed, $\Delta^{ATT}$ is computed for each simulation, and the mean/sd of the estimates is reported
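The simulation loop described above can be sketched as follows. This is a toy illustration, not the -sensatt- implementation: the data are simulated, the $p_{ij}$ values are arbitrary, and `att_exact_match_on_u` is a hypothetical stand-in that matches exactly on U alone, whereas an actual application would match on x and U.

```python
import random
import statistics


def simulate_confounder(D, y, p, rng):
    """Draw U_i ~ Bernoulli(p[(D_i, y_i)]) independently for each observation."""
    return [1 if rng.random() < p[(d, yi)] else 0 for d, yi in zip(D, y)]


def att_exact_match_on_u(D, y, U):
    """Toy ATT estimator: exact matching on the binary U only.

    Cells of U without both treated and untreated observations are skipped.
    """
    n_treated = sum(D)
    att = 0.0
    for u in (0, 1):
        treated = [yi for d, yi, ui in zip(D, y, U) if d == 1 and ui == u]
        control = [yi for d, yi, ui in zip(D, y, U) if d == 0 and ui == u]
        if treated and control:
            share = len(treated) / n_treated
            att += share * (statistics.mean(treated) - statistics.mean(control))
    return att


def sensitivity(D, y, p, reps, seed=0):
    """Mean and sd of the ATT estimate across simulated draws of U."""
    rng = random.Random(seed)
    draws = [att_exact_match_on_u(D, y, simulate_confounder(D, y, p, rng))
             for _ in range(reps)]
    return statistics.mean(draws), statistics.stdev(draws)


# Hypothetical binary treatment and outcome data.
rng = random.Random(1)
D = [1 if rng.random() < 0.4 else 0 for _ in range(2000)]
y = [1 if rng.random() < (0.6 if d else 0.4) else 0 for d in D]

# p[(i, j)] plays the role of p_ij = Pr(U = 1 | D = i, y = j).
p = {(0, 0): 0.3, (0, 1): 0.6, (1, 0): 0.5, (1, 1): 0.8}
mean_att, sd_att = sensitivity(D, y, p, reps=100)
print("mean ATT:", mean_att, "sd:", sd_att)
```

Comparing `mean_att` with the baseline estimate that ignores U shows how much a confounder with the chosen $p_{ij}$ would move the result.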


Choosing $p_{ij}$...
- It is essential to consider useful potential confounders
- Calibrated confounders: choose $p_{ij}$ to make the distribution of U similar to the empirical distribution of observable binary covariates
- Killer confounders: search over different $p_{ij}$ for the existence of a U which makes $\Delta^{ATT} = 0$
- One can also simulate other meaningful confounders by setting the parameters $p_{ij}$ and $p_{i\cdot}$, where $p_{i\cdot}$ can be computed as

$$p_{i\cdot} \equiv \Pr(U = 1 \mid D = i) = \sum_{j=0}^{1} p_{ij} \Pr(y = j \mid D = i)$$

  with $i \in \{0, 1\}$


Common case
- Typical scenario in applied work has $\hat{\Delta}^{ATT} > 0$ in baseline model
- Thus, concern centers on a potential confounder that has a positive effect on both the untreated outcome and the selection into treatment
- Ichino et al. prove that
  1. $p_{01} > p_{00} \Rightarrow \Pr(y_0 = 1 \mid D = 0, U = 1, x) > \Pr(y_0 = 1 \mid D = 0, U = 0, x)$, where $p_{01} \equiv \Pr(U = 1 \mid D = 0, y = 1)$ and $p_{00} \equiv \Pr(U = 1 \mid D = 0, y = 0)$
  2. $p_{1\cdot} > p_{0\cdot} \Rightarrow \Pr(D = 1 \mid U = 1, x) > \Pr(D = 1 \mid U = 0, x)$, where $p_{1\cdot} \equiv \Pr(U = 1 \mid D = 1)$ and $p_{0\cdot} \equiv \Pr(U = 1 \mid D = 0)$
- Accordingly, by choosing $p_{01} > p_{00}$ and setting $p_{1\cdot} > p_{0\cdot}$, a confounder is simulated such that it has a positive effect on both $y_0$ and D even after conditioning on x


What do these p's represent?
- The differences

$$d = p_{01} - p_{00}, \qquad s = p_{1\cdot} - p_{0\cdot}$$

  only depict the sign of U's outcome and selection effects
- The size of these effects must be evaluated after conditioning on x to account for the association between U and x that shows up in the data
- Thus, at every iteration, logit models for $\Pr(y = 1 \mid D = 0, U, x)$ and $\Pr(D = 1 \mid U, x)$ are estimated
  - The average odds ratio of U is reported as the outcome and selection effects of the simulated confounder

$$\Gamma \equiv \frac{\Pr(y = 1 \mid D = 0, U = 1, x) / \Pr(y = 0 \mid D = 0, U = 1, x)}{\Pr(y = 1 \mid D = 0, U = 0, x) / \Pr(y = 0 \mid D = 0, U = 0, x)}$$

$$\Lambda \equiv \frac{\Pr(D = 1 \mid U = 1, x) / \Pr(D = 0 \mid U = 1, x)}{\Pr(D = 1 \mid U = 0, x) / \Pr(D = 0 \mid U = 0, x)}$$

  - Γ and Λ reflect the strength of U

Stata: -sensatt-


Minimum Bias Approach

Intuition: Trim the sample on the basis of p(x) to minimize the bias from a failure of CIA

Assume (A.iv) plus unobservables are trivariate normal: $(\upsilon_0, \upsilon_1, u) \sim N_3(0, \Sigma)$, where

$$\Sigma = \begin{bmatrix} \sigma_0^2 & \rho_{01}\sigma_0\sigma_1 & \rho_{0u}\sigma_0 \\ & \sigma_1^2 & \rho_{1u}\sigma_1 \\ & & 1 \end{bmatrix}$$

and u is the error from the treatment assignment equation

$$D_i^* = h(x_i) - u_i$$

where $D^*$ is latent treatment assignment


The bias of the ATT at some value of the propensity score, p(x), is given by

$$B_{ATT}[p(x)] = \hat{\tau}_{ATT}[p(x)] - \tau_{ATT}[p(x)] = -\rho_{0u}\sigma_0 \, \frac{\phi(\Phi^{-1}(p(x)))}{p(x)[1 - p(x)]}$$

where
- $\rho_{0u}$ = selection on unobservables affecting outcome in untreated state
- φ and Φ are standard normal PDF and CDF
- $\hat{\tau}_{ATT}$ is some propensity score based estimator

The magnitude of $B_{ATT}[p(x)]$ is minimized at p(x) = 0.5
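A quick numerical check of the last claim: the ATT bias depends on p(x) only through the factor φ(Φ⁻¹(p))/(p[1 − p]), so evaluating that factor on a grid shows where its magnitude is smallest. The sketch below uses only the Python standard library; the function name `bias_scale` and the grid are choices made for the illustration.

```python
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal: mean 0, sd 1


def bias_scale(p):
    """phi(Phi^{-1}(p)) / (p (1 - p)), the p-dependent factor of the ATT bias."""
    z = _std_normal.inv_cdf(p)
    return _std_normal.pdf(z) / (p * (1.0 - p))


# Evaluate on a grid of propensity scores strictly inside (0, 1).
grid = [k / 100 for k in range(1, 100)]
p_min = min(grid, key=bias_scale)
print("bias factor is smallest at p =", p_min)
print("value at 0.5:", bias_scale(0.5), "value at 0.1:", bias_scale(0.1))
```

The factor is symmetric around 0.5 and grows toward both tails, which is the motivation for trimming the sample to a neighborhood of 0.5 when the ATT is the target.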


For the ATE,

$$B_{ATE}[p(x)] = -\left\{\rho_{0u}\sigma_0 + [1 - p(x)]\rho_{\delta u}\sigma_\delta\right\} \frac{\phi(\Phi^{-1}(p(x)))}{p(x)[1 - p(x)]}$$

where
- $\delta = \upsilon_1 - \upsilon_0$ = unobserved, individual-specific gain from treatment
- $\rho_{\delta u}$ = selection on unobserved, individual-specific gains

⇒ The bias-minimizing propensity score, $p^*(x)$, depends on the error correlation structure

Similar results in Black & Smith (2004), Heckman and Navarro-Lozano (2004)


Minimum-biased (MB) estimation technique
- Stage 1: Estimate the propensity score (e.g., probit model)
- Stage 2: Retain only those observations with a propensity score, $\hat{p}(x_i)$, within a fixed neighborhood around $p^*(x)$, the bias-minimizing propensity score
- Stage 3: Estimate the ATE or ATT using any propensity-score based estimator that relies on CI using this sub-sample

Notes:
- Estimator is biased, but it minimizes the bias
- For ATT... this is straightforward as we know that $p^*(x) = 0.5$
- For ATE... $p^*(x)$ is unknown, depends on error correlations
- If treatment effect is heterogeneous, then interpretation changes; may not be economically interesting


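For ATE, add Stage 1.5: Estimate the error correlations
- Feasible if one also imposes (A.va) or (A.vb)
- Estimate via OLS (discussed in more detail later)

$$y_i = \alpha_0 + (\alpha_1 - \alpha_0)D_i + x_i\beta_0 + x_iD_i(\beta_1 - \beta_0) + \beta_{\lambda 0}(1 - D_i)\left[\frac{\phi(x_i\gamma)}{1 - \Phi(x_i\gamma)}\right] + \beta_{\lambda 1}D_i\left[-\frac{\phi(x_i\gamma)}{\Phi(x_i\gamma)}\right] + \eta_i$$

  where φ(·)/Φ(·) is the inverse Mills' ratio and

$$\beta_{\lambda 0} = \rho_{0u}\sigma_0$$
$$\beta_{\lambda 1} = \rho_{0u}\sigma_0 + \rho_{\delta u}\sigma_\delta$$

- Replacing γ with $\hat{\gamma}$ from the first-stage probit yields consistent estimates of $\rho_{0u}\sigma_0$ and $\rho_{\delta u}\sigma_\delta$

Millimet & Tchernis (2009) find that trimming is inefficient when CIA holds, but is more robust to (some) mis-specifications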


Difference-in-Differences Matching

All matching estimators are biased if unobservables invalidate the CIA

Formally (e.g., $\Delta^{ATT}$)

$$\Delta^{ATT}(p(x)) = E[y_1 \mid p(x), D = 1] - E[y_0 \mid p(x), D = 0] + E[y_0 \mid p(x), D = 0] - E[y_0 \mid p(x), D = 1]$$

where matching estimators are based on

$$\tilde{\Delta}^{ATT}(p(x)) = E[y_1 \mid p(x), D = 1] - E[y_0 \mid p(x), D = 0]$$

which implies

$$\text{bias} = \tilde{\Delta}^{ATT}(p(x)) - \Delta^{ATT}(p(x)) = \underbrace{E[y_0 \mid p(x), D = 1]}_{\text{Counterfactual}} - \underbrace{E[y_0 \mid p(x), D = 0]}_{\text{Observed}}$$

which is zero under CIA


Rearranging terms yields

$$\Delta^{ATT}(p(x)) = \tilde{\Delta}^{ATT}(p(x)) - \text{bias}$$

This suggests a bias-corrected estimator is feasible if the bias can be consistently estimated

Might assume the bias equals the difference in mean outcomes prior to treatment

$$\text{bias} = E[y_{0t} \mid p(x), D = 1] - E[y_{0t} \mid p(x), D = 0] \overset{?}{=} E[y_{0t'} \mid p(x), D = 1] - E[y_{0t'} \mid p(x), D = 0]$$

where $t' < t$, $t'$ precedes the treatment, and t is post-treatment


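Implies

$$\begin{aligned}
\tilde{\tilde{\Delta}}^{ATT}(p(x)) &= \tilde{\Delta}^{ATT}(p(x)) - \text{bias} \\
&= E[y_{1t} \mid p(x), D = 1] - E[y_{0t} \mid p(x), D = 0] - \left\{E[y_{0t'} \mid p(x), D = 1] - E[y_{0t'} \mid p(x), D = 0]\right\} \\
&= E[y_{1t} - y_{0t'} \mid p(x), D = 1] - E[y_{0t} - y_{0t'} \mid p(x), D = 0]
\end{aligned}$$

and $\tilde{\tilde{\Delta}}^{ATT}(p(x)) = \Delta^{ATT}(p(x))$ requires

$$E[y_{0t} - y_{0t'} \mid p(x), D = 1] = E[y_{0t} - y_{0t'} \mid p(x), D = 0]$$

which is different from the original CIA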


Implementation: difference the data ∀i, then match

DID matching requires the original CIA be replaced with

$$\Delta y_0, \Delta y_1 \perp D \mid p(x)$$

Intuition:
- DID matching requires the change in potential outcomes to be independent of treatment assignment given the PS
- Equivalently, there are no time-varying unobservables correlated with both outcomes and treatment assignment given x

Smith & Todd (2005) find DID matching to be more robust, but conclusions are application-specific


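Selection on Observables
Strong Ignorability: Inverse Propensity Score Weighting (IPW) Estimators

Alternative to matching estimators, but still rely on knowing/estimating the propensity score

Identities

$$\begin{aligned}
E\left[\frac{Dy}{p(x)}\right] &= E\left[\frac{Dy_1}{p(x)}\right] = E\left[E\left[\frac{Dy_1}{p(x)} \,\Big|\, x\right]\right] = E\left[\frac{1}{p(x)} E[Dy_1 \mid x]\right] \\
&\overset{CIA}{=} E\left[\frac{1}{p(x)} E[D \mid x]\,E[y_1 \mid x]\right] = E\left[\frac{p(x)}{p(x)} E[y_1 \mid x]\right] \\
&= E\left[E[y_1 \mid x]\right] = E[y_1]
\end{aligned}$$

and, similarly,

$$E\left[\frac{(1 - D)y}{1 - p(x)}\right] = E[y_0]$$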


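Parameters of interest (Horvitz & Thompson 1952)

$$\Delta^{ATE} = E\left[\frac{Dy}{p(x)} - \frac{(1 - D)y}{1 - p(x)}\right] = E\left[\frac{D - p(x)}{p(x)[1 - p(x)]}\,y\right]$$

$$\Delta^{ATT} = \frac{1}{E[p(x)]} E\left[p(x)\left\{\frac{Dy}{p(x)} - \frac{(1 - D)y}{1 - p(x)}\right\}\right] = \frac{1}{E[p(x)]} E\left[\frac{D - p(x)}{1 - p(x)}\,y\right]$$

$$\Delta^{ATU} = \frac{1}{E[1 - p(x)]} E\left[[1 - p(x)]\left\{\frac{Dy}{p(x)} - \frac{(1 - D)y}{1 - p(x)}\right\}\right] = \frac{1}{E[1 - p(x)]} E\left[\frac{D - p(x)}{p(x)}\,y\right]$$

Proof: Wooldridge (2002, p. 613)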


Estimation

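Unnormalized estimators

$$\hat{\Delta}^{ATE} = \frac{1}{N}\sum_i \left[\frac{D_i y_i}{\hat{p}(x_i)} - \frac{(1 - D_i)y_i}{1 - \hat{p}(x_i)}\right] = \frac{1}{N}\sum_i \left\{\frac{[D_i - \hat{p}(x_i)]\,y_i}{\hat{p}(x_i)[1 - \hat{p}(x_i)]}\right\}$$

$$\hat{\Delta}^{ATT} = \frac{1}{\frac{1}{N}\sum_i \hat{p}(x_i)} \cdot \frac{1}{N}\sum_i \hat{p}(x_i)\left[\frac{D_i y_i}{\hat{p}(x_i)} - \frac{(1 - D_i)y_i}{1 - \hat{p}(x_i)}\right] = \frac{1}{\frac{1}{N}\sum_i \hat{p}(x_i)} \cdot \frac{1}{N}\sum_i \left\{\frac{[D_i - \hat{p}(x_i)]\,y_i}{1 - \hat{p}(x_i)}\right\}$$

$$\hat{\Delta}^{ATU} = \frac{1}{\frac{1}{N}\sum_i [1 - \hat{p}(x_i)]} \cdot \frac{1}{N}\sum_i [1 - \hat{p}(x_i)]\left[\frac{D_i y_i}{\hat{p}(x_i)} - \frac{(1 - D_i)y_i}{1 - \hat{p}(x_i)}\right] = \frac{1}{\frac{1}{N}\sum_i [1 - \hat{p}(x_i)]} \cdot \frac{1}{N}\sum_i \left\{\frac{[D_i - \hat{p}(x_i)]\,y_i}{\hat{p}(x_i)}\right\}$$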


Normalized estimators (Hirano and Imbens 2001)
- $\hat{\Delta}^{ATE}$ is the difference in two weighted averages, where weights are

$$\frac{D_i}{N\hat{p}(x_i)} \quad \text{and} \quad \frac{1 - D_i}{N[1 - \hat{p}(x_i)]}$$

- Problem: weights may not sum to unity
- HI assign weights normalized by the sum of propensity scores for treated and untreated groups
- Unnormalized estimator assigns equal weights of 1/N to each observation
- Normalized estimator (e.g., $\hat{\Delta}^{ATE}$)

$$\hat{\Delta}^{ATE} = \left[\sum_i \frac{D_i y_i}{\hat{p}(x_i)} \Big/ \sum_i \frac{D_i}{\hat{p}(x_i)}\right] - \left[\sum_i \frac{(1 - D_i)y_i}{1 - \hat{p}(x_i)} \Big/ \sum_i \frac{1 - D_i}{1 - \hat{p}(x_i)}\right]$$

- Tends to be more stable in practice as it restricts the weights to sum to 1; Millimet & Tchernis (2009), Busso et al. (2011) find it performs better

Standard errors obtained via bootstrap
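A minimal sketch of the unnormalized and normalized ATE estimators follows. It is an illustration under stated assumptions: the data are simulated with a true ATE of 1, the true propensity score is plugged in where an application would use an estimate, and the function names are invented for the example.

```python
import random


def ipw_ate(D, y, p):
    """Unnormalized IPW estimate of the ATE."""
    n = len(y)
    return sum(d * yi / pi - (1 - d) * yi / (1 - pi)
               for d, yi, pi in zip(D, y, p)) / n


def ipw_ate_normalized(D, y, p):
    """Normalized IPW estimate: weights within each group sum to one."""
    w1 = [d / pi for d, pi in zip(D, p)]
    w0 = [(1 - d) / (1 - pi) for d, pi in zip(D, p)]
    mean1 = sum(w * yi for w, yi in zip(w1, y)) / sum(w1)
    mean0 = sum(w * yi for w, yi in zip(w0, y)) / sum(w0)
    return mean1 - mean0


# Simulated data (hypothetical): selection on x only, true ATE = 1.
rng = random.Random(42)
n = 5000
x = [rng.random() for _ in range(n)]
p = [0.2 + 0.6 * xi for xi in x]                      # propensity score, taken as known
D = [1 if rng.random() < pi else 0 for pi in p]
y = [1.0 + 2.0 * xi + 1.0 * d + rng.gauss(0.0, 1.0) for xi, d in zip(x, D)]

ate_unnormalized = ipw_ate(D, y, p)
ate_normalized = ipw_ate_normalized(D, y, p)
print("unnormalized:", ate_unnormalized, "normalized:", ate_normalized)

# Adding a constant to every outcome leaves the normalized estimate unchanged,
# but shifts the unnormalized one whenever the weights do not sum to one.
y_shifted = [yi + 10.0 for yi in y]
print("shifted, unnormalized:", ipw_ate(D, y_shifted, p))
print("shifted, normalized:", ipw_ate_normalized(D, y_shifted, p))
```

The shift experiment is one way to see the practical consequence of weights that do not sum to unity.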


Selection on Observables
Strong Ignorability: Regression (Again)

Use propensity score as control variable in regression

Assumptions

(A.vi) $E[y_1 - y_0 \mid x]$ is uncorrelated with $Var(D \mid x) = p(x)[1 - p(x)]$
(A.vii) $E[y_1 \mid p(x)]$, $E[y_0 \mid p(x)]$ are linear in p(x)

(A.vi) has no good interpretation

(A.vii) replaces the functional form assumptions discussed in the previous regression approach


Estimation

Given (A.ii) and (A.vi)...
- Estimate via OLS

$$y_i = \alpha_0 + \tilde{\alpha}_1 D_i + \gamma\,\hat{p}(x_i) + \varepsilon_i$$

- Estimates given by

$$\hat{\Delta}^{ATE} = \hat{\Delta}^{ATT} = \hat{\Delta}^{ATU} = \hat{\tilde{\alpha}}_1$$

  which is consistent and asymptotically normal if $\hat{p}(x_i)$ is consistent and asymptotically normal
- Proof: See Wooldridge (2002)


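Given (A.ii) and (A.vii)...
- Estimate via OLS

$$y_i = \alpha_0 + \tilde{\alpha}_1 D_i + \gamma_0\,\hat{p}(x_i) + \gamma_1\left[\hat{p}(x_i) - \hat{\mu}_p\right]D_i + \tilde{\varepsilon}_i$$

  where $\hat{\mu}_p = \frac{1}{N}\sum_i \hat{p}(x_i)$
- Estimates given by

$$\hat{\Delta}^{ATE}(x) = \hat{\tilde{\alpha}}_1 + \hat{\gamma}_1\left[\hat{p}(x) - \hat{\mu}_p\right]$$
$$\hat{\Delta}^{ATE} = \hat{\tilde{\alpha}}_1$$
$$\hat{\Delta}^{ATT} = \hat{\tilde{\alpha}}_1 + \hat{\gamma}_1\bar{x}_1$$
$$\hat{\Delta}^{ATU} = \hat{\tilde{\alpha}}_1 + \hat{\gamma}_1\bar{x}_0$$

  where $\bar{x}_j = \sum_i \left[\hat{p}(x_i) - \hat{\mu}_p\right] I[D_i = j] \big/ \sum_i I[D_i = j]$, j = 0, 1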


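Given (A.ii) and a weaker version of (A.vii)...
- Estimate via OLS

$$y_i = \alpha_0 + \tilde{\alpha}_1 D_i + \sum_{k=1}^{K} \gamma_{0k}\,\hat{p}(x_i)^k + \sum_{k=1}^{K} \gamma_{1k}\left[\hat{p}(x_i)^k - \hat{\mu}_p^k\right]D_i + \tilde{\varepsilon}_i$$

  where $\hat{\mu}_p^k = \frac{1}{N}\sum_i \hat{p}(x_i)^k$, k = 1, ..., K, and K is a low-order number
- Estimates given by

$$\hat{\Delta}^{ATE}(x) = \hat{\tilde{\alpha}}_1 + \sum_{k=1}^{K} \hat{\gamma}_{1k}\left[\hat{p}(x)^k - \hat{\mu}_p^k\right]$$
$$\hat{\Delta}^{ATE} = \hat{\tilde{\alpha}}_1$$
$$\hat{\Delta}^{ATT} = \hat{\tilde{\alpha}}_1 + \sum_{k=1}^{K} \hat{\gamma}_{1k}\,\bar{x}_1^k$$
$$\hat{\Delta}^{ATU} = \hat{\tilde{\alpha}}_1 + \sum_{k=1}^{K} \hat{\gamma}_{1k}\,\bar{x}_0^k$$

  where $\bar{x}_j^k = \sum_i \left[\hat{p}(x_i)^k - \hat{\mu}_p^k\right] I[D_i = j] \big/ \sum_i I[D_i = j]$, j = 0, 1; k = 1, ..., K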


Selection on Observables
Strong Ignorability: Double-Robust Estimators

Robins and Rotnitzky (1995), Lunceford and Davidian (2004), and others discuss DR estimators

DR estimators combine regression and weighting estimators and are double robust because they are consistent as long as either the regression specification for the outcome or the propensity score specification is correct


Estimation

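OLS estimation

$$y_i = \alpha_0 + x_i\beta + \tilde{\alpha}_1 D_i + \theta_0\frac{D_i}{\hat{p}(x_i)} + \theta_1\frac{1 - D_i}{1 - \hat{p}(x_i)} + \tilde{\varepsilon}_i$$

$$\hat{\Delta}^{ATE} = \hat{\tilde{\alpha}}_1 + \frac{1}{N}\sum_i \left[\hat{\theta}_0\frac{D_i}{\hat{p}(x_i)} - \hat{\theta}_1\frac{1 - D_i}{1 - \hat{p}(x_i)}\right]$$

$$\hat{\Delta}^{ATT} = \hat{\tilde{\alpha}}_1 + \frac{1}{N_1}\sum_{i: D_i = 1} \left[\hat{\theta}_0\frac{D_i}{\hat{p}(x_i)} - \hat{\theta}_1\frac{1 - D_i}{1 - \hat{p}(x_i)}\right]$$

$$\hat{\Delta}^{ATU} = \hat{\tilde{\alpha}}_1 + \frac{1}{N_0}\sum_{i: D_i = 0} \left[\hat{\theta}_0\frac{D_i}{\hat{p}(x_i)} - \hat{\theta}_1\frac{1 - D_i}{1 - \hat{p}(x_i)}\right]$$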


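WLS estimation: ATE

$$y_i = \alpha_0 + x_i\beta + \tilde{\alpha}_1 D_i + \tilde{\upsilon}_i$$

where weights are

$$\lambda_i = \sqrt{\frac{D_i}{\hat{p}(x_i)} + \frac{1 - D_i}{1 - \hat{p}(x_i)}}$$

and different weights are used for ATT, ATU (given above)

Augmented IPW: ATE (Lunceford and Davidian 2004; Glynn and Quinn 2010)

$$\hat{\Delta}^{ATE} = \frac{1}{N}\sum_i \left[\frac{D_i y_i - (D_i - \hat{p}(x_i))\,g_1(x_i)}{\hat{p}(x_i)} - \frac{(1 - D_i)y_i + (D_i - \hat{p}(x_i))\,g_0(x_i)}{1 - \hat{p}(x_i)}\right]$$

where $g_0(x_i)$ and $g_1(x_i)$ are estimated via separate OLS regressions of y on x
- See -dr- in Stata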
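The augmented IPW formula can be illustrated with a short sketch. Everything beyond the formula itself is an assumption made for the example: the simulated data (true ATE of 1), the single regressor, the helper `fit_line`, and the deliberately wrong constant propensity score used to show the double-robustness property when the outcome regressions are correctly specified.

```python
import random


def fit_line(x, y):
    """OLS intercept and slope for a single regressor."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def aipw_ate(D, y, p, g0, g1):
    """Augmented IPW estimate of the ATE given fitted values g0(x_i), g1(x_i)."""
    total = 0.0
    for d, yi, pi, m0, m1 in zip(D, y, p, g0, g1):
        total += (d * yi - (d - pi) * m1) / pi
        total -= ((1 - d) * yi + (d - pi) * m0) / (1 - pi)
    return total / len(y)


# Simulated data (hypothetical): selection on x only, true ATE = 1.
rng = random.Random(7)
n = 4000
x = [rng.random() for _ in range(n)]
p_true = [0.2 + 0.6 * xi for xi in x]
D = [1 if rng.random() < pi else 0 for pi in p_true]
y = [1.0 + 2.0 * xi + 1.0 * d + rng.gauss(0.0, 1.0) for xi, d in zip(x, D)]

# Separate OLS regressions of y on x for the treated and the untreated.
a1, b1 = fit_line([xi for xi, d in zip(x, D) if d == 1],
                  [yi for yi, d in zip(y, D) if d == 1])
a0, b0 = fit_line([xi for xi, d in zip(x, D) if d == 0],
                  [yi for yi, d in zip(y, D) if d == 0])
g1 = [a1 + b1 * xi for xi in x]
g0 = [a0 + b0 * xi for xi in x]

# Misspecified propensity score (constant 0.5) with a correct outcome model.
p_wrong = [0.5] * n
ate_dr = aipw_ate(D, y, p_wrong, g0, g1)
print("AIPW with wrong propensity score:", ate_dr)
```

Setting `g0` and `g1` to zero reduces the expression to the unnormalized IPW estimator, which makes the role of the augmentation terms easy to see.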


Selection on Observables
Strong Ignorability: Decomposition of Treatment Effects

Flores & Flores-Lagunes (2009) provide a framework to decompose $\Delta^k$ into a direct effect of D and an indirect effect that operates through some causal mechanism, S

Setup
- $S \in \{0, 1\}$ is a post-treatment, mechanism variable
- $S_0$, $S_1$ are potential values of S associated with D = 0 and D = 1, respectively
- $S = DS_1 + (1 - D)S_0$ is the realized value of S

Example: D = 1 if student i attends a private HS, 0 otherwise; S = 1 if student i obtains a college degree, 0 otherwise; y = earnings as an adult


Composite potential outcomes for y are defined as $y(D, S_{D'})$, $D, D' \in \{0, 1\}$
- $y(1, S_1)$ = potential outcome associated with D = 1 and $S_1$, the realized value of the mechanism variable, S, when D = 1
- $y(0, S_0)$ = potential outcome associated with D = 0 and $S_0$, the realized value of the mechanism variable, S, when D = 0
- $y(1, S_0)$ = potential outcome associated with D = 1 and $S_0$, the realized value of the mechanism variable, S, when D = 0


Decomposing $\Delta^{ATE}$

$$\Delta^{ATE} = E[y(1, S_1)] - E[y(0, S_0)] = \underbrace{\{E[y(1, S_1)] - E[y(1, S_0)]\}}_{A} + \underbrace{\{E[y(1, S_0)] - E[y(0, S_0)]\}}_{B}$$

where A represents the indirect effect of D on y operating through S and B represents the direct effect of D on y, fixing S at the non-treatment value

Authors refer to
- A as the individual causal mechanism effect
- B as the net average treatment effect

Note, B still reflects two effects of D on y
1. Effects of D on y operating independently of S
2. Effects of D on y operating through a change in the return to S (i.e., even though the level of S is held fixed, the effect of S on y may change due to D)


Assumptions

(DTE.i) Independence of Treatment: $\{y(1, S_1), y(0, S_0), y(1, S_0), S_0, S_1\} \perp D$
(DTE.ii) Conditional Independence of Potential Mechanisms: $\{y(1, S_1), y(0, S_0), y(1, S_0)\} \perp \{S_0, S_1\} \mid x$
(DTE.iii) Constant Functional Form: If $E[y(1, S_1) \mid S_1 = s_1, x] = f_1(s_1, x)$, then $E[y(1, S_0) \mid S_0 = s_0, x] = f_1(s_0, x)$

(DTE.iii) implies that the functional form relating S and x to y when D = 1 is the same regardless of whether $S = S_1$ or $S = S_0$

Under (DTE.i)-(DTE.iii), $\Delta^{ATE}$ and B can be estimated, and then A can be backed out

Extension to the case where (DTE.i) only holds conditional on x is also presented


Selection on Observables
Non-Binary Treatments: Multi-Valued Treatments

Suppose the treatment can take on many discrete values

$$D \in \Omega = \{d_0, d_1, d_2, ..., d_J\}$$

⇒ e.g., years of education

$y_j$ = potential outcome for treatment j = 0, 1, ..., J

Parameters of interest

$$\Delta^{ATE}_{j,j'} = E[y_j - y_{j'}], \quad j, j' \in \Omega$$
$$\tilde{\Delta}^{ATE}_{j,j'} = E[y_j - y_{j'} \mid D \in \{j, j'\}], \quad j, j' \in \Omega$$
$$\Delta^{ATT}_{j,j'} = E[y_j - y_{j'} \mid D = j], \quad j, j' \in \Omega$$

Dose-response function reflects the unconditional expectation of potential outcomes at each dose

$$E[y_j] \quad \forall j \in \Omega$$


Now, there are J missing counterfactuals
- $D_{ji}$ = indicator if obs i receives treatment j

$$D_{ji} = \begin{cases} 1 & \text{if } D_i = j \\ 0 & \text{otherwise} \end{cases}$$

- $y_i$ = observed outcome for i

$$y_i = \sum_{j=0}^{J} y_{ji} D_{ji}$$


Identification of the dose-response function
- Unconditional independence

$$\{y_j\}_{j \in \Omega} \perp D$$

- Strong unconfoundedness (Rosenbaum & Rubin 1983)

$$\{y_j\}_{j \in \Omega} \perp D \mid x$$

  ⇒ treatment assignment is conditionally independent of all potential outcomes
- Weak unconfoundedness (Imbens 2000)

$$y_j \perp D_j \mid x \quad \forall j \in \Omega$$

  ⇒ assignment to any particular treatment is conditionally independent of that treatment's potential outcome


Implication of weak unconfoundedness

$$E[y_j \mid x] = E[y \mid D_j = 1, x] = E[y \mid D = j, x]$$

⇒

$$E[y_j] = E\left[E[y \mid D = j, x]\right]$$

⇒ one may estimate the conditional dose-response function by estimating the mean outcome given treatment assignment and x, and then obtain the population dose-response function by averaging over the distribution of x

⇒

$$E[y_j - y_{j'}] = E\left[E[y_j - y_{j'} \mid x]\right] = E\left[E[y \mid D = j, x] - E[y \mid D = j', x]\right]$$


Example
- Let x = gender (M, F)
- Ω = years of schooling (0, 1, ..., 21)
- $E[y_j]$ obtained by
  - Computing average value of y for sub-sample with $D_{ji} = 1$ and x = M ⇒ $\bar{y}_{Mj}$
  - Computing average value of y for sub-sample with $D_{ji} = 1$ and x = F ⇒ $\bar{y}_{Fj}$
  - Obtaining proportion of M and F in entire sample ⇒ $p_M$, $p_F$
  - Computing $p_M\bar{y}_{Mj} + p_F\bar{y}_{Fj}$
- Obtain $E[y_{j'}]$ similarly
- Compute the difference
- Other parameters can be estimated by using the proportions of M and F in various sub-samples (e.g., D = j, j′ only)
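The example above amounts to averaging cell means using the full-sample shares of x. A minimal sketch with a tiny made-up data set (eight observations, two schooling levels) is given below; the function name `dose_response` and all numbers are hypothetical.

```python
from collections import defaultdict


def dose_response(D, x, y, j):
    """Estimate E[y_j] as sum over cells c of x of Pr(x = c) * mean(y | D = j, x = c)."""
    n = len(y)
    cell_size = defaultdict(int)     # number of obs in each cell of x (full sample)
    cell_sum = defaultdict(float)    # sum of y among obs with D = j, by cell
    cell_count = defaultdict(int)    # number of obs with D = j, by cell
    for d, xi, yi in zip(D, x, y):
        cell_size[xi] += 1
        if d == j:
            cell_sum[xi] += yi
            cell_count[xi] += 1
    total = 0.0
    for c, size in cell_size.items():
        if cell_count[c] == 0:
            raise ValueError(f"no observations with D = {j} in cell x = {c}")
        total += (size / n) * (cell_sum[c] / cell_count[c])
    return total


# Made-up data: years of schooling, gender, and an outcome.
D = [12, 12, 16, 12, 16, 16, 16, 12]
x = ["M", "M", "M", "F", "F", "F", "F", "M"]
y = [20.0, 22.0, 30.0, 18.0, 28.0, 26.0, 30.0, 24.0]

e12 = dose_response(D, x, y, 12)   # 0.5 * 22 + 0.5 * 18 = 20
e16 = dose_response(D, x, y, 16)   # 0.5 * 30 + 0.5 * 28 = 29
print("E[y_12] =", e12, "E[y_16] =", e16, "difference =", e16 - e12)

# Naive difference in raw means, for comparison: 28.5 - 21 = 7.5
naive = (sum(yi for d, yi in zip(D, y) if d == 16) / 4
         - sum(yi for d, yi in zip(D, y) if d == 12) / 4)
print("naive difference =", naive)
```

The weighted difference (9) and the naive difference (7.5) disagree here because the gender mix differs across schooling levels in the made-up sample.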


Generalized propensity score
- Definition

$$r(j, x) = \Pr(D = j \mid x) = E[D_j \mid x]$$

- r(j, x) may be estimated given data on D, x (MNL, MNP, ordered logit/probit)
- Imbens (2000) shows that weak unconfoundedness ⇒

$$y_j \perp D_j \mid r(j, x) \quad \forall j \in \Omega$$

  and

$$E[y_j \mid r(j, x)] = E[y \mid D_j = 1, r(j, x)] = E[y \mid D = j, r(j, x)]$$

  and

$$E[y_j] = E\left[E[y \mid D = j, r(j, x)]\right]$$

- The above result requires r(j, x) > 0 along the entire support of x


Estimation
- Given weak unconfoundedness and assuming r(j, x) > 0 for the entire support of x, then

$$E\left[\frac{D_j y}{r(j, x)}\right] = E[y_j]$$

- Estimator

$$\widehat{E[y_j]} = \frac{1}{N}\sum_i \left[\frac{D_{ji}\,y_i}{\hat{r}(j, x_i)}\right]$$

  which is analogous to the weighting estimator defined previously in the binary treatment case
- Analogous normalized weighting estimator given by

$$\widehat{E[y_j]} = \left[\sum_i \frac{D_{ji}\,y_i}{\hat{r}(j, x_i)}\right]\left[\sum_i \frac{D_{ji}}{\hat{r}(j, x_i)}\right]^{-1}$$


Selection on Observables
Non-Binary Treatments: Continuous Treatments

Suppose Ω is an interval $[\underline{d}, \overline{d}]$, and D has a continuous distribution on Ω
⇒ e.g., income

$y_j$ = potential outcome for treatment $j \in \Omega$

$D_j$ is not useful since j takes on an infinite number of values

Weak unconfoundedness can be re-stated as

$$y_j \perp D \mid x \quad \forall j \in \Omega$$

in contrast to strong unconfoundedness, which requires $\{y_j\}_{j \in \Omega}$, the full set of potential outcomes, to be conditionally independent


Generalized propensity score
- Now defined as the conditional density of D given x

$$r(j, x) = f(j \mid x)$$

- Implication (Hirano & Imbens 2004)

$$y_j \perp D \mid r(j, x) \quad \forall j \in \Omega$$

- Estimation based on

$$E[y_j] = E\left[E[y \mid D = j, r(j, x)]\right]$$

- Since D is continuous, estimation entails
  - Estimation of r(j, x)
  - Estimating $E[y \mid D = j, r(j, x)]$ by regressing y on D and $\hat{r}(j, x)$
  - Averaging $\hat{E}[y \mid D = j, r(j, x)]$ over the distribution of x (at a fixed value of j)

Weighting estimator version: see Robins (1998), Hernan et al. (2000)

See -doseresponse- in Stata


Stratification estimator version (Imai & van Dyk 2004)
- Regress D on x via OLS ⇒ $\hat{\theta} = \hat{E}[D \mid x] = x\hat{\beta}$
- Split sample in K strata of equal size based on $\hat{\theta}$
- Within each stratum, model y as a function of D (and perhaps x to further control for differences in x)
  - y continuous: regress y on D and x
  - y binary: probit/logit
  - y ordered: oprobit/ologit
  - y count: poisson, NB

  ⇒ $\hat{\Delta}^{ATE}_k$ given by coefficient on D
- Obtain overall $\Delta^{ATE}$ as

$$\hat{\Delta}^{ATE} = \sum_k \frac{N_k}{N}\,\hat{\Delta}^{ATE}_k$$

- Generalizable to multiple treatment case (e.g., two continuous treatments: income, educ)


Selection on Observables
Dynamic Matching

Pertains to situations where agents receive an initial treatment or not, and then have the option of receiving a second treatment if they receive the first treatment

Many employment or job training programs, or treatments within schools, operate in this manner

Need to carefully consider the parameter of interest in these applications, as well as CIA at different stages of the problem

See work by Lechner (2009, JBES), Lechner and Miquel (2010, EE), Cooley et al. (2010), or Behrman et al. (2004, ReStat)


Selection on Observables
Regression Discontinuity

This estimator returns us to the class of binary treatments

First introduced in Thistlethwaite & Campbell (1960)

Two classes of models: sharp, fuzzy

Sharp RD is a selection on observables estimator, but is not based on strong ignorability (in fact, it precludes it)

Fuzzy RD is a selection on unobservables estimator (discussed later in the course)

Note: Recent work also on Regression Kink Design (Card, Lee, & Pei 2009)


RD setup
- Agents self-select into treatment group
- Selection done at least in part on the basis of an observed continuous variable, s
  - s is referred to as the score, running variable, or forcing variable
- s may directly impact potential outcomes as well
- There exists a discrete jump in Pr(D = 1) at a known value, $\bar{s}$

Thus, s and $\bar{s}$ are both known to the econometrician


Sharp RD model

(SRD.i) Treatment assignment is a deterministic function of s (with a known threshold, $\bar{s}$)

$$D_i = D(s_i) = \begin{cases} 1 & \text{if } s_i > \bar{s} \\ 0 & \text{otherwise} \end{cases}$$

(SRD.ii) Positive density at the threshold: $f_S(\bar{s}) > 0$
(SRD.iii) Outcomes are continuous in s at least around $\bar{s}$
(SRD.iv) For each agent, the distribution of s is continuous at least around $\bar{s}$

Notes
- (SRD.ii) implies we see agents near $\bar{s}$
- (SRD.iii) precludes discontinuities in y at $\bar{s}$ due to other reasons besides changes in D
- (SRD.iv) implies that agents cannot perfectly manipulate s to ensure $s \gtrless \bar{s}$
  - This is crucial to give the setup the interpretation of a random experiment in the neighborhood of $\bar{s}$


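Notes (cont.)
- $y_0, y_1 \perp D \mid s$ follows from (SRD.i)
- All RD estimators require existence of the following limits

$$D^+ = \lim_{s \downarrow \bar{s}} \Pr(D = 1 \mid s), \qquad D^- = \lim_{s \uparrow \bar{s}} \Pr(D = 1 \mid s)$$

  and $D^+ \neq D^-$
  - (SRD.i) implies $D^+ = 1$ and $D^- = 0$
- Common support condition is necessarily violated since

$$\Pr(D = 1 \mid s) = \begin{cases} 1 & \text{if } s > \bar{s} \\ 0 & \text{otherwise} \end{cases}$$

  which implies that $\Pr(D = 1 \mid s) \notin (0, 1)$ ∀s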


Parameter of interest

$$\Delta^{ATE}(\bar{s}) = E[y_1 - y_0 \mid s = \bar{s}] = \lim_{s \downarrow \bar{s}} E[y \mid s] - \lim_{s \uparrow \bar{s}} E[y \mid s]$$

DiNardo & Lee (2011) advocate a different interpretation
- Argue that RD estimates a weighted average of $\Delta_i$, where the weights are proportional to the probability that an agent's $s_i$ is in the neighborhood of $\bar{s}$


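Estimation

Use only sub-sample with $s_i \in [\bar{s} - \delta, \bar{s} + \delta]$ for small δ
- Similar s ⇒ similar observations
- Compute mean difference in outcomes across treatment groups

$$\begin{aligned}
\hat{\Delta}^{ATE}(\bar{s}) &= \hat{E}[y_i \mid s_i \in [\bar{s}, \bar{s} + \delta], D = 1] - \hat{E}[y_i \mid s_i \in [\bar{s} - \delta, \bar{s}], D = 0] \\
&= \frac{\sum_{i=1}^{N} y_i\, I[s_i \in [\bar{s}, \bar{s} + \delta], D_i = 1]}{\sum_{i=1}^{N} I[s_i \in [\bar{s}, \bar{s} + \delta], D_i = 1]} - \frac{\sum_{i=1}^{N} y_i\, I[s_i \in [\bar{s} - \delta, \bar{s}], D_i = 0]}{\sum_{i=1}^{N} I[s_i \in [\bar{s} - \delta, \bar{s}], D_i = 0]} \\
&\overset{p}{\to} E[y_i \mid s_i \in [\bar{s}, \bar{s} + \delta], D = 1] - E[y_i \mid s_i \in [\bar{s} - \delta, \bar{s}], D = 0] \\
&= E[y_{1i} \mid s_i \in [\bar{s}, \bar{s} + \delta], D = 1] - E[y_{0i} \mid s_i \in [\bar{s} - \delta, \bar{s}], D = 0] \\
&= E[y_{1i} \mid s_i \in [\bar{s}, \bar{s} + \delta]] - E[y_{0i} \mid s_i \in [\bar{s} - \delta, \bar{s}]] \\
&\neq \lim_{s \downarrow \bar{s}} E[y \mid s] - \lim_{s \uparrow \bar{s}} E[y \mid s] \quad \text{for fixed } \delta > 0
\end{aligned}$$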


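This is essentially a kernel estimator with a uniform kernel over the interval $[\bar{s}, \bar{s} + \delta]$ or $[\bar{s} - \delta, \bar{s}]$, which entails a non-negligible bias for δ > 0

Example: If y is increasing in s, then
- $\hat{E}[y_i \mid s_i \in [\bar{s}, \bar{s} + \delta], D = 1]$ will overestimate $\lim_{s \downarrow \bar{s}} E[y \mid s]$
- $\hat{E}[y_i \mid s_i \in [\bar{s} - \delta, \bar{s}], D = 0]$ will underestimate $\lim_{s \uparrow \bar{s}} E[y \mid s]$

⇒ $\hat{\Delta}^{ATE}(\bar{s})$ will be biased up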


Regression approach
- Model

$$y_i = \Delta D_i + \varepsilon_i$$

  where D = treatment indicator, Δ = parameter of interest
- Model is not estimable via OLS since $Cov(D, \varepsilon) \neq 0$
- However, $E[\varepsilon \mid D, s] = E[\varepsilon \mid s]$
- Implies Δ is estimable if the model is augmented with a sufficiently flexible function of s to proxy for $E[\varepsilon \mid s]$

$$y_i = \Delta D_i + k(s_i) + \eta_i$$

  where $Cov(D, \eta) = 0$
- What is k(s)?
  - Linear: k(s) = s (Goldberger 1972; Cain 1975)
  - Quadratic: $k(s) = \theta_1 s + \theta_2 s^2$ (Berk & Rauma 1983; van der Klaauw 2000)
  - Semiparametric: $k(s) = \sum_{m=1}^{M} \theta_m s^m$, with M chosen by cross-validation (Trochim 1984; van der Klaauw 2000)

DL Millimet (SMU) ECO 7377 Fall 2011 165 / 407

Page 166: Microeconometrics Lecture Notes

Example:

[Figure: outcome plotted against score (0 to 1), with fitted values from OLS of y on D and fitted values from OLS of y on s & D]

Note: S~U(0,1); D(s)=I(s>0.5); y=s+D+e; delta = 1
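The two fitted lines in this example can be reproduced numerically. The sketch below uses the design stated in the note; the sample size, seed, noise s.d. of 0.3, and helper names are assumptions. The coefficient on D in the regression with s is obtained by partialling s out of both y and D (Frisch-Waugh), which avoids any matrix library.

```python
import random

def mean(v):
    return sum(v) / len(v)

def slope(x, y):
    """OLS slope from a regression of y on x and a constant."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

def residuals(x, y):
    """Residuals from a regression of y on x and a constant."""
    b = slope(x, y)
    a = mean(y) - b * mean(x)
    return [yi - a - b * xi for xi, yi in zip(x, y)]

rng = random.Random(1)                       # fixed seed for reproducibility
n = 5000                                     # assumed sample size
s = [rng.random() for _ in range(n)]         # S ~ U(0,1)
d = [1.0 if si > 0.5 else 0.0 for si in s]   # D(s) = I(s > 0.5)
y = [si + di + rng.gauss(0, 0.3) for si, di in zip(s, d)]  # y = s + D + e

naive = slope(d, y)                                    # OLS of y on D only
controlled = slope(residuals(s, d), residuals(s, y))   # OLS of y on s & D
print(round(naive, 3), round(controlled, 3))
```

In this design the regression on D alone should land near 1.5, since it also picks up the 0.5 gap in mean score between the two groups, while controlling for s should land near the true value of 1.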


Notes
- Testing of some of the underlying assumptions is feasible
  - Examine the density of $s$ to look for evidence of a discontinuity at $\bar{s}$, suggesting manipulation by agents (McCrary 2008)
  - Look for existence of discontinuities in predetermined variables at $\bar{s}$ (similar to assessing balancing of predetermined variables in randomized experiments)
- If the treatment effect is heterogeneous, then RD estimates a unique parameter (discussed above) that may be uninteresting
  - This is an example of a local average treatment effect (LATE)
  - May be a policy relevant parameter if the question is the impact of a marginal change in an eligibility cut-off, $\bar{s}$
- Applications: financial aid, GED, Clean Air Act attainment status
- See -rd- in Stata

Selection on Observables
Distributional Approaches

Analysis to this point has focused on mean effects of treatments

Averages may mask a lot of heterogeneity

Distributional methods seek to assess the effects of treatments on other quantities

Traditional approach is quantile regression (QR)

More recent approaches have been couched in the potential outcomes framework and focus on quantile treatment effects (QTE)

Selection on Observables
Distributional Approaches: Quantile Regression

Motivation
- QR provides a convenient linear framework for assessing the impact of changes in a vector of covariates on the quantiles of the dependent variable
- Equivalently, QR allows estimation of linear conditional quantile functions
- Analogous to linear regression, which estimates the conditional mean function
- Common applications
  - Studies of wage determination
  - Studies of student achievement

Notation
- $F(y)$ = CDF of $y$
- $Q_\theta(y)$ = $\theta$th quantile of the random variable, $y$, given by
\[ Q_\theta(y) = \inf\{y : F(y) > \theta\} \]

(Unconditional) quantiles as a minimization problem
- Prior to discussing QR, it is useful to view unconditional quantiles as a solution to a minimization problem
- Example: median
\[ Q_{0.5}(y) = \arg\min_b \sum_i |y_i - b| \]
  - Solution depends on the sign of the residuals, not the magnitude
  - $y = \{99, 100, 101\} \Rightarrow Q_{0.5}(y) = 100$; $y = \{99, 100, 150\} \Rightarrow Q_{0.5}(y) = 100$, as increasing $b$ closer to 150 reduces that residual, but increases the sum of the other two residuals by twice as much
  - Implies median is less sensitive to outliers than the mean
- General formula for any quantile $\theta \in (0, 1)$
\[ Q_\theta(y) = \arg\min_b \left( \sum_{i: y_i > b} \theta |y_i - b| + \sum_{i: y_i < b} (1 - \theta) |y_i - b| \right) \]
  - Quantiles other than the median are defined as the arg min of a weighted sum of the absolute residuals
  - Intuition: say $\theta = 0.75$ and $b$ = median, then the problem puts more weight on residuals above $b$, which pushes the solution to the minimization problem above the median
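The minimization view is easy to check by brute force. The sketch below evaluates the weighted absolute-residual objective at each observed value and returns the minimizer; the function names are hypothetical, and restricting the search to observed values is an assumption that works because the objective is piecewise linear in b.

```python
def check_loss(b, y, theta):
    """Weighted sum of absolute residuals: theta above b, (1 - theta) below b."""
    return sum(theta * (yi - b) if yi > b else (1 - theta) * (b - yi) for yi in y)

def quantile_by_minimization(y, theta):
    """Search the observed values for the minimizer of the check loss."""
    return min(sorted(y), key=lambda b: check_loss(b, y, theta))

median_a = quantile_by_minimization([99, 100, 101], 0.5)
median_b = quantile_by_minimization([99, 100, 150], 0.5)   # outlier, same median
upper_q = quantile_by_minimization([1, 2, 3, 4, 5], 0.75)
print(median_a, median_b, upper_q)  # 100 100 4
```

Replacing 101 with 150 leaves the minimizer at 100, which is the outlier robustness noted above, and setting theta = 0.75 moves the solution above the median.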


QR model (Koenker & Bassett 1978)
- Replace $b$ in the previous problem with a linear function of covariates
\[ \hat{\beta}_\theta = \arg\min_\beta \frac{1}{N} \left( \sum_{i: y_i > x_i\beta} \theta |y_i - x_i\beta| + \sum_{i: y_i < x_i\beta} (1 - \theta) |y_i - x_i\beta| \right) \]
  which may be rewritten as
\[ \hat{\beta}_\theta = \arg\min_\beta \frac{1}{N} \left\{ \sum_i \rho_\theta(\varepsilon_{\theta i}) \right\} \]
  where $\rho_\theta(\varepsilon_{\theta i})$ is known as the check function, defined as
\[ \rho_\theta(\varepsilon_{\theta i}) = [\theta - I(\varepsilon_{\theta i} < 0)] \, \varepsilon_{\theta i} \]
  and $\varepsilon_{\theta i}$ is the residual for $i$ and $\theta$
- Preceding objective fn is equivalent (after some algebra) to
\[ \hat{\beta}_\theta = \arg\min_\beta \frac{1}{N} \sum_i \left[ \theta - \frac{1}{2} + \frac{1}{2} \, \mathrm{sgn}(y_i - x_i\beta) \right] (y_i - x_i\beta) \]
- Error distribution
  - Key assumption: $Q_\theta(\varepsilon_\theta \mid x) = 0$
  - No other assumption about the distribution

Estimation

The objective fn is not differentiable $\Rightarrow$ standard optimization methods are not viable

Solved using linear programming methods

GMM estimation is also feasible (Buchinsky 1998)

Special case: median regression
- Corresponds to QR model with $\theta = 0.5$; $\hat{\beta}$ obtained from
\[ \hat{\beta}_{0.5} = \arg\min_\beta \frac{1}{N} \left\{ \sum_i |y_i - x_i\beta| \right\} \]
- Analogous to OLS, but $\hat{\beta}$ minimizes the sum of absolute errors instead of the sum of squared errors
- Also known as the LAD (Least Absolute Deviations) estimator
- Useful alternative to OLS, particularly when the distribution of the error term is symmetric (so the conditional mean and median are equal), yet outliers are a concern
- Also useful when $y$ is imputed for some obs

Inference

Using a GMM framework, can show
\[ \sqrt{N}(\hat{\beta}_\theta - \beta_\theta) \rightarrow N(0, \Lambda_\theta) \]
where
\[ \Lambda_\theta = \omega^2(\theta)(x'x)^{-1}, \qquad \omega^2(\theta) = \frac{\theta(1 - \theta)}{f^2(F^{-1}(\theta))} \]
and $f(F^{-1}(\theta))$ denotes the density of the error distribution evaluated at the $\theta$th quantile

Intuition:
- Estimation of the $\theta$th conditional quantile uses only obs near the $\theta$th quantile
- Asymptotically, obs are added in this range in a manner proportional to $f(F^{-1}(\theta))$, assuming iid errors

Utilizing the asymptotic formula for inference is difficult in practice

Bootstrap methods provide a simpler alternative (Buchinsky 1998)

Results

Parameters of interest are the partial derivatives of the conditional quantile fn w.r.t. $x$
\[ \frac{\partial E[Q_\theta(y \mid x)]}{\partial x_k} \]
which equals $\beta_{\theta k}$ if $x$ enters linearly

Presentation of results
- Difficult as there are a large number of results that are possible to obtain (i.e., $\beta_{\theta k}$, $k = 1, ..., K$ and $\theta \in (0, 1)$)
- Possibilities
  - Typical table of coefficient estimates at several quantiles (typically $\theta$ = 0.10, 0.25, 0.50, 0.75, and 0.90)
  - Graph the conditional quantile fns against $x_k$ if there is one $x$ that is the focus of the paper (again, typically for a few quantiles)
  - Graph $\hat{\beta}_{\theta k}$ vs. $\theta$ for several different $x$s on one graph (only works if $x_k$ enters linearly)

Sequential estimation
- In practice, one typically wishes to estimate $\hat{\beta}_\theta$ for multiple values of $\theta$
- Estimates are not independent since they are obtained from the same data
- Estimation one equation at a time, however, is efficient unless there are cross-equation restrictions (e.g., one might wish for a type of "smooth coefficient" model)

Stata: -qreg-, -bsqreg-, -sqreg-, -grqreg- (for graphing), -qcount- (for count data models), -lqreg- (for logistic models)

Selection on Observables
Distributional Approaches: Quantile Treatment Effects

Notation
- $y_{1i}, y_{0i}$ = potential outcomes for $i$
- $D_i$ = binary indicator of treatment assignment
- $F_j(y) \equiv \Pr[y_{ji} < y]$, $j = 0, 1$ = CDFs of potential outcomes
- $y_j^\theta = \inf\{y : F_j(y) > \theta\}$ = quantiles of potential outcome dbns

Parameters of interest
\[ \Delta^{QTE}_\theta = E[y_1^\theta - y_0^\theta], \quad \theta \in (0, 1) \]
\[ \Delta^{QTT}_\theta = E[y_1^\theta - y_0^\theta \mid D = 1], \quad \theta \in (0, 1) \]
\[ \Delta^{QTU}_\theta = E[y_1^\theta - y_0^\theta \mid D = 0], \quad \theta \in (0, 1) \]

Interpretation

Constant treatment effect assumption
- $y_{1i} = y_{0i} + \Delta \;\; \forall i$
- Implies $F_1^{-1}(\theta) = F_0^{-1}(\theta) + \Delta$

[Figure: CDFs F(y) of y1 and y0. NOTE: y0~N(0,1); y1=y0+1]

- $\Delta^{QTE}_\theta = \Delta^{QTT}_\theta = \Delta^{QTU}_\theta = \Delta \;\; \forall \theta \in (0, 1)$

Heterogeneous treatment effects
- $y_{1i} = y_{0i} + \Delta_i$
- Perfect rank correlation (Heckman et al. 1997)
  - Definition: $F_1(y_{1i}) = F_0(y_{0i}) \;\; \forall i$
  - Intuition: each observation lies in the identical quantile in both potential outcome dbns, which implies that $y_1$ is a monotone transformation of $y_0$
  - Implication: $\Delta^{QTE}_\theta = E[y_1^\theta - y_0^\theta] = Q_\theta(\Delta)$, which is the $\theta$th quantile of the dbn of $\Delta$, which implies that QTEs identify the distribution of the treatment effect, BUT this requires a strong assumption about the joint dbn of potential outcomes
- No perfect rank correlation
  - No assumption about the joint dbn of potential outcomes
  - Implication: $\Delta^{QTE}_\theta = E[y_1^\theta - y_0^\theta] \neq Q_\theta(\Delta)$, which implies that QTEs identify the difference in the two marginal dbns of the potential outcomes, BUT say nothing about the dbn of actual treatment effects ... QTEs reflect the effect of $D$ on quantiles of the potential outcome dbns, NOT on observations at particular quantiles.

Example #1...

ID   y0   y1   Δ
1    1    2    1
2    2    4    2
3    3    6    3
4    4    8    4
5    5    10   5

Rank preservation holds; $\Delta_i$ varies

CDFs of $y_0$, $y_1$ are not identical $\Rightarrow$ $\Delta^{QTE}_\theta$ varies with $\theta$

$\Delta^{QTE}_\theta = Q_\theta(\Delta)$

Example #2...

ID   y0   y1   Δ
1    1    1    0
2    2    4    2
3    3    3    0
4    4    2    -2
5    5    5    0

Rank preservation is violated; $\Delta_i$ varies

CDFs of $y_0$, $y_1$ are identical $\Rightarrow$ $\Delta^{QTE}_\theta = 0 \;\; \forall \theta$

$\Delta^{QTE}_\theta \neq Q_\theta(\Delta)$
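Both examples can be verified directly from the two tables. The sketch below compares the rank-by-rank difference between the sorted potential outcomes (the QTEs) with the sorted individual effects (the quantiles of the treatment effect); the function names are hypothetical.

```python
def qte_by_rank(y0, y1):
    """Difference between the sorted potential outcomes, rank by rank."""
    return [b - a for a, b in zip(sorted(y0), sorted(y1))]

def effects_by_rank(y0, y1):
    """Sorted individual treatment effects, i.e. the quantiles of Delta."""
    return sorted(b - a for a, b in zip(y0, y1))

y0 = [1, 2, 3, 4, 5]
ex1_y1 = [2, 4, 6, 8, 10]   # Example #1: ranks preserved
ex2_y1 = [1, 4, 3, 2, 5]    # Example #2: ranks not preserved

print(qte_by_rank(y0, ex1_y1), effects_by_rank(y0, ex1_y1))
# [1, 2, 3, 4, 5] [1, 2, 3, 4, 5]
print(qte_by_rank(y0, ex2_y1), effects_by_rank(y0, ex2_y1))
# [0, 0, 0, 0, 0] [-2, 0, 0, 0, 2]
```

In Example #2 every QTE is zero even though two of the five units have nonzero effects, which is why QTEs cannot be read as effects on particular observations without rank preservation.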


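Estimation

Identification assumptions: strong ignorability (CIA, CS)

$y_i = D_i y_{1i} + (1 - D_i) y_{0i}$ = observed outcome

$\hat{\Delta}_\theta$ obtained using sample analogues of $y_1^\theta$ and $y_0^\theta$

Obtain $\hat{F}_j(y)$, $j = 0, 1$
\[ \hat{F}_j(y) = \frac{1}{\sum_i I(D_i = j)} \sum_i I(D_i = j) \, I(y_i \leq y) \quad \text{(unconditional)} \]
\[ \hat{F}_j(y) = \frac{\sum_{i \in j} \hat{\omega}_i \, I(y_i \leq y)}{\sum_{i \in j} \hat{\omega}_i} \quad \text{(covariates)} \]
\[ \hat{\omega}_i = \frac{D_i}{\hat{p}(x_i)} + \frac{1 - D_i}{1 - \hat{p}(x_i)} \quad \text{(QTE)} \]
\[ \hat{\omega}_i = D_i + \frac{\hat{p}(x_i)(1 - D_i)}{1 - \hat{p}(x_i)} \quad \text{(QTT)} \]
\[ \hat{\omega}_i = \frac{[1 - \hat{p}(x_i)] D_i}{\hat{p}(x_i)} + 1 - D_i \quad \text{(QTU)} \]
where $\hat{p}(x_i)$ is the propensity score and $x$ is the vector such that CIA holds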

$\hat{y}_1^\theta = \inf\{y : \hat{F}_1(y) > \theta\}$; similarly for $\hat{y}_0^\theta$

Implies $\hat{\Delta}^{QT}_\theta = \hat{y}_1^\theta - \hat{y}_0^\theta$

Inference based on bootstrap

Test of equal CDFs (Abadie 2002)
- Equivalent to test for $H_o : \Delta_\theta = 0 \;\; \forall \theta \in (0, 1)$
- Utilize Kolmogorov-Smirnov statistic
\[ d_{eq} = \sqrt{\frac{N}{2}} \, \sup |F_1(y) - F_0(y)| \]
- Compute
\[ \hat{d}_{eq} = \sqrt{\frac{N}{2}} \, \max_k \left\{ |\hat{F}_1(y_k) - \hat{F}_0(y_k)| \right\} \]
  for a grid of points, $k = 1, ..., K$, in the support of $y_i$
- Inference for test of equality using bootstrap

Stata: -dbn- (my code)
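A minimal sketch of the sample statistic follows, using unweighted empirical CDFs and the pooled sample values as the grid. Both of those choices, the function names, and the reading of the scaling factor as the square root of N/2 with N the pooled sample size are assumptions; bootstrap inference is not shown.

```python
import math

def ecdf(sample, y):
    """Share of the sample at or below y."""
    return sum(v <= y for v in sample) / len(sample)

def ks_statistic(y1, y0):
    """sqrt(N/2) times the largest CDF gap over the pooled sample points."""
    n = len(y1) + len(y0)
    grid = sorted(set(y1) | set(y0))
    gap = max(abs(ecdf(y1, y) - ecdf(y0, y)) for y in grid)
    return math.sqrt(n / 2) * gap

# Illustrative samples only (assumed values)
d_diff = ks_statistic([2, 4, 6, 8, 10], [1, 2, 3, 4, 5])
d_same = ks_statistic([1, 4, 3, 2, 5], [1, 2, 3, 4, 5])
print(round(d_diff, 4), d_same)
```

The second call returns zero because the two samples have identical empirical CDFs, even though the values are paired differently across units.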


Selection on Observables
Distributional Approaches: Stochastic Dominance

In the event the QTEs differ in sign or significance across the dbn, may be interested in "ranking" dbns

Definitions
- First Order Stochastic Dominance: $Y_1$ FSD $Y_0$ iff
\[ F_1(y) \leq F_0(y) \quad \forall y \in \mathcal{Y} \]
  with strict inequality for some $y$ (where $\mathcal{Y}$ is the union of the supports for $Y_1$ and $Y_0$), or
\[ y_1^\theta \geq y_0^\theta \quad \forall \theta \in [0, 1] \]
  with strict inequality for some $\theta$
- Second Order Stochastic Dominance: $Y_1$ SSD $Y_0$ iff
\[ \int_{-\infty}^{y} F_1(t) \, dt \leq \int_{-\infty}^{y} F_0(t) \, dt \quad \forall y \in \mathcal{Y}, \text{ or} \]
\[ \int_0^\theta y_1^t \, dt \geq \int_0^\theta y_0^t \, dt \quad \forall \theta \in [0, 1] \]
  with strict inequality for some $y$ or $\theta$

Example: FSD... ($y_1 \sim N(1, 1)$; $y_0 \sim N(0, 1)$)

[Figure: left panel, CDFs F(x) of Control and Treatment over the support; right panel, quantile treatment effect (Treatment - Control) by quantile]

Example: SSD... ($y_1 \sim N(0.25, 0.25)$; $y_0 \sim N(0, 1)$)

[Figure: left panel, CDFs F(x) of Control and Treatment over the support; right panel, quantile treatment effect (Treatment - Control) by quantile]

FSD $\Rightarrow$ SSD

Third and higher order rankings exist

Any two dbns can be ranked at some order of SD

Implications
- Notation
  - $W_1$ = class of social welfare fns that are increasing in $y$
  - $W_2$ = sub-class of $W_1$ that includes all social welfare fns that are also concave in $y$
- X FSD Y $\Rightarrow$ X is at least as preferred by all welfare functions in $W_1$, with strict inequality holding for some welfare function in the class
- X SSD Y $\Rightarrow$ X is at least as preferred by all welfare functions in $W_2$, with strict inequality holding for some welfare function in the class

Test statistics
\[ d = \min \sup_{z \in \mathcal{Y}} [F(z) - G(z)] \]
\[ s = \min \sup_{z \in \mathcal{Y}} \int_{-\infty}^{z} [F(t) - G(t)] \, dt \]
where min is taken over $F - G$ and $G - F$

Tests are based on estimates of $d$ and $s$ using the empirical CDFs
- Unconditional, or
- Inverse propensity score weighted

Inference using bootstrap (simple and/or more complex methods)

Selection on Unobservables

When all $x$s required for CIA to hold are not observed, then one enters into the "selection on unobservables" world

Implies unobservable attributes of obs $i$ are correlated with both potential outcomes and treatment assignment of obs $i$

In general, this implies
\[ E[y_j \mid x, D = j] \neq E[y_j \mid x, D = j'], \quad j, j' = 0, 1 \]

In a regression framework, with functional form assumptions, this implies
\[
\begin{aligned}
y_i &= D_i y_{1i} + (1 - D_i) y_{0i} \\
&= \alpha_0 + x_i \beta_0 + (\alpha_1 - \alpha_0) D_i + x_i D_i (\beta_1 - \beta_0) + [\upsilon_{0i} + D_i(\upsilon_{1i} - \upsilon_{0i})]
\end{aligned}
\]
where SOU results if
- $Cov(D, \upsilon_0) \neq 0$ $\Rightarrow$ selection on unobservables impacting outcome in untreated state, or
- $Cov(D, \upsilon_1 - \upsilon_0) \neq 0$ $\Rightarrow$ presence of and selection on unobserved, obs-specific gains from treatment

Possible solutions
1. Bound treatment effects ("set identification" as opposed to "point identification") under minimal assumptions
2. Utilize panel data
3. Utilize exclusion restrictions (i.e., instrumental variables)
4. Model dependence between treatment and unobservables $\Rightarrow$ control function approach
5. Other methods that "find" identification elsewhere

Selection on Unobservables
Bounding Treatment Effects

Recall, the ATE
\[
\begin{aligned}
\Delta_{ATE}(x) &= E[y_1 - y_0 \mid x] = E[y_1 \mid x] - E[y_0 \mid x] \\
&= \{E[y_1 \mid x, D = 1] \Pr(D = 1 \mid x) + E[y_1 \mid x, D = 0] \Pr(D = 0 \mid x)\} \\
&\quad - \{E[y_0 \mid x, D = 1] \Pr(D = 1 \mid x) + E[y_0 \mid x, D = 0] \Pr(D = 0 \mid x)\} \\
&= \{g_1(x) - E[y_0 \mid x, D = 1]\} \, p(x) + \{E[y_1 \mid x, D = 0] - g_0(x)\} \, [1 - p(x)]
\end{aligned}
\]
where $p(x)$, the propensity score, and $g_j(x)$, $j = 0, 1$, are all observable from the data

Similar derivation for other two primary mean treatment effect parameters
\[ \Delta_{ATT}(x) = g_1(x) - E[y_0 \mid x, D = 1] \]
\[ \Delta_{ATU}(x) = E[y_1 \mid x, D = 0] - g_0(x) \]

Thus, without additional information, no parameter is identified

Early bounding approach outlined in Smith and Welch (1986)
- Objective was to estimate the average wage for blacks accounting for selection into LF
\[ E[w] = E[w \mid LF = 1] \Pr(LF = 1) + E[w \mid LF = 0] \Pr(LF = 0) \]
  where $E[w \mid LF = 0]$ is not observed
- Solution: $E[w \mid LF = 0] = \gamma E[w \mid LF = 1]$, $\gamma \in [0.5, 1]$
- In treatment effects context, can specify $E[y_d \mid D = d'] = \gamma E[y_d \mid D = d]$ for different values of $\gamma$, where $d \neq d'$
- Rosenbaum (2002) summarizes other papers that bound causal effects by varying the unobserved parameters

More recent approaches focus on adding assumptions to tighten the bounds on the parameter of interest

Notation (Lechner 1999; Manski 1990)
- $L_1, L_0$ = lower bounds of the support of $y_1, y_0$, respectively
- $U_1, U_0$ = upper bounds of the support of $y_1, y_0$, respectively
- $B^L_k, B^U_k$ = lower, upper bounds, respectively, of treatment effect $k$ ($k$ = ATE, ATT, or ATU)
- $w_k = B^U_k - B^L_k$ = width of bounds for treatment effect $k$

Trivial case
- No additional information
\[
\begin{aligned}
B^L_k &= L_1 - U_0 \\
B^U_k &= U_1 - L_0 \\
w_k &= (U_1 - L_0) - (L_1 - U_0) = (U_1 - L_1) + (U_0 - L_0)
\end{aligned}
\]
- Example: $y$ is binary (e.g., employment after job training program)
\[ L_1 = L_0 = 0, \quad U_1 = U_0 = 1, \quad B^L_k = -1, \quad B^U_k = 1, \quad w_k = 2 \]

Tightening bounds with data

Use sample data
- $p(x)$, $g_0(x)$, $g_1(x)$ may be consistently estimated from the data by
  - Sample means
  - Nonparametric smoothing methods
  - Parametric methods

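New bounds with sample data
- $\Delta_{ATE}(x)$
\[
\begin{aligned}
B^L_{ATE} &= \{\hat{g}_1(x) - U_0\} \hat{p}(x) + \{L_1 - \hat{g}_0(x)\} [1 - \hat{p}(x)] \\
B^U_{ATE} &= \{\hat{g}_1(x) - L_0\} \hat{p}(x) + \{U_1 - \hat{g}_0(x)\} [1 - \hat{p}(x)] \\
w_{ATE} &= (U_1 - L_1) [1 - \hat{p}(x)] + (U_0 - L_0) \hat{p}(x)
\end{aligned}
\]
- $\Delta_{ATT}(x)$
\[ B^L_{ATT} = \hat{g}_1(x) - U_0, \quad B^U_{ATT} = \hat{g}_1(x) - L_0, \quad w_{ATT} = U_0 - L_0 \]
- $\Delta_{ATU}(x)$
\[ B^L_{ATU} = L_1 - \hat{g}_0(x), \quad B^U_{ATU} = U_1 - \hat{g}_0(x), \quad w_{ATU} = U_1 - L_1 \]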

Example: $y$ is binary $\Rightarrow$ $w_k = 1 \;\; \forall k$ (sample data cuts width in half)

Note: Bounds necessarily include zero
- Cannot rule out zero average treatment effect
- Can exclude some extreme values
- Full characterization of the bounds should also account for uncertainty in the variables belonging in $x$ and the model used to estimate $g_0(x)$, $g_1(x)$, and $p(x)$ (Heckman et al. 1999)
  - While bounds conditional on $x$ and a model, $m$, all have width one, the exact bounds are affected
- Kreider, Pepper, and co-authors incorporate measurement error in $D$ into the bounds (discussed later)
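The binary-outcome case can be worked through numerically. The sketch below computes the ATE bounds from estimates of the treated and untreated mean outcomes and the propensity score; the function name and the input values (0.7, 0.4, 0.5) are made up for illustration, and both potential outcomes are assumed to share the same support.

```python
def ate_bounds(g1, g0, p, lower=0.0, upper=1.0):
    """Bounds on the ATE when both outcomes share the support [lower, upper].

    g1, g0: mean observed outcome among treated and untreated
    p:      share treated (propensity score)
    """
    b_low = (g1 - upper) * p + (lower - g0) * (1 - p)
    b_high = (g1 - lower) * p + (upper - g0) * (1 - p)
    return b_low, b_high

# Hypothetical estimates for a binary outcome
low, high = ate_bounds(g1=0.7, g0=0.4, p=0.5)
print(round(low, 2), round(high, 2), round(high - low, 2))  # -0.35 0.65 1.0
```

The interval has width one and contains zero, as noted above, but it does rule out effects below -0.35 or above 0.65 for these inputs.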


Tightening bounds with assumptions

Assume $\Delta_{ATT}(x) = \Delta_{ATU}(x)$
- Calculate bounds for $\Delta_{ATT}(x)$ and $\Delta_{ATU}(x)$
- New bounds include only the intersection of the two bounds
- Example
\[ \Delta_{ATT}(x) \in [-0.25, 0.75], \quad \Delta_{ATU}(x) \in [-0.75, 0.25] \]
  then new bounds are $[-0.25, 0.25]$
- Note: still necessarily include zero since bounds on $\Delta_{ATT}(x)$, $\Delta_{ATU}(x)$ both include zero

Level-set restrictions: treatment effects are constant $\forall x \in X_0 \subset X$ (the support of $x$)
- Calculate bounds for $\Delta_k(x) \;\; \forall x \in X_0$
- New bounds include only the intersection of these bounds
- Example ($\Delta_{ATE}$)
\[ \Delta_{ATE}(x_a) \in [-0.25, 0.75], \quad \Delta_{ATE}(x_b) \in [-0.75, 0.25] \]
  where $x_a, x_b \in X_0$, then new bounds are $[-0.25, 0.25]$
- Note: still necessarily include zero since bounds on $\Delta_k(x)$ include zero $\forall x$
- Formally
\[
\begin{aligned}
B^L_k(X_0) &= \sup_{x \in X_0} B^L_k(x) \\
B^U_k(X_0) &= \inf_{x \in X_0} B^U_k(x) \\
w_k(X_0) &= B^U_k(X_0) - B^L_k(X_0)
\end{aligned}
\]
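Taking the largest lower bound and the smallest upper bound is a one-line computation. The sketch below applies it to the two intervals in the example; the function name is hypothetical.

```python
def intersect_bounds(bounds):
    """Largest lower bound and smallest upper bound across (low, high) pairs."""
    low = max(b[0] for b in bounds)
    high = min(b[1] for b in bounds)
    return low, high

combined = intersect_bounds([(-0.25, 0.75), (-0.75, 0.25)])
print(combined)  # (-0.25, 0.25)
```

The same operation applies whenever several sets of bounds are assumed to hold for a common parameter.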


Level-set restrictions: expected outcomes are constant $\forall x \in X_{0,1} \subset X$ (for $y_1$) and $\forall x \in X_{0,0} \subset X$ (for $y_0$)
- Implies
  - $E[y_1 \mid x]$ is constant $\forall x \in X_{0,1}$
  - $E[y_0 \mid x]$ is constant $\forall x \in X_{0,0}$

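$\Rightarrow$ Bounds become
\[
\begin{aligned}
B^L_{ATE}(x_0) &= \sup_{x \in X_{0,1}} \{\hat{g}_1(x) \hat{p}(x) + L_1 [1 - \hat{p}(x)]\} - \inf_{x \in X_{0,0}} \{\hat{g}_0(x) [1 - \hat{p}(x)] + U_0 \hat{p}(x)\} \\
B^U_{ATE}(x_0) &= \inf_{x \in X_{0,1}} \{\hat{g}_1(x) \hat{p}(x) + U_1 [1 - \hat{p}(x)]\} - \sup_{x \in X_{0,0}} \{\hat{g}_0(x) [1 - \hat{p}(x)] + L_0 \hat{p}(x)\} \\
B^L_{ATT}(x_0) &= \sup_{x \in X_{0,1}} \{\hat{g}_1(x)\} - \inf_{x \in X_{0,0}} \{U_0\} \\
B^U_{ATT}(x_0) &= \inf_{x \in X_{0,1}} \{\hat{g}_1(x)\} - \sup_{x \in X_{0,0}} \{L_0\} \\
B^L_{ATU}(x_0) &= \sup_{x \in X_{0,1}} \{L_1\} - \inf_{x \in X_{0,0}} \{\hat{g}_0(x)\} \\
B^U_{ATU}(x_0) &= \inf_{x \in X_{0,1}} \{U_1\} - \sup_{x \in X_{0,0}} \{\hat{g}_0(x)\}
\end{aligned}
\]
where $x_0 \in X_{0,1} \cap X_{0,0}$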

Assumption: positive selection
- Implies
\[ E[y_1 \mid x, D = 1] > E[y_0 \mid x, D = 1] \]
  which means that the treated only join the treatment group if there are non-negative gains on average
- Bounds become
\[
\begin{aligned}
B^L_{ATE} &= \{L_1 - \hat{g}_0(x)\} [1 - \hat{p}(x)] \\
B^U_{ATE} &= \{\hat{g}_1(x) - L_0\} \hat{p}(x) + \{U_1 - \hat{g}_0(x)\} [1 - \hat{p}(x)] \\
B^L_{ATT} &= 0 \\
B^U_{ATT} &= \hat{g}_1(x) - L_0
\end{aligned}
\]
- Does not affect bounds on $\Delta_{ATU}(x)$

Combining assumptions, restrictions
\[ B^L_{k,combine} = \max_{p \in \Psi} \{B^L_{k,p}\} \]
\[ B^U_{k,combine} = \min_{p \in \Psi} \{B^U_{k,p}\} \]
where $\Psi$ is the set of restrictions being combined

Inference via bootstrap
- Yields confidence intervals for the bounds, not the treatment effect
- For example, a 90% CI implies that the probability that the true bounds lie in the CI is 90%; the probability that the true treatment effect lies in the CI is even higher (see also Imbens & Manski (2004))

Tightening bounds (again)

Manski (1990), Manski & Pepper (2000) consider additional assumptions
1. Instrument
\[ E[y_j \mid z] = E[y_j], \quad j = 0, 1 \]
2. Monotone Instrument
\[ z_1 \leq z \leq z_2 \Rightarrow E[y_j \mid Z = z_1] \leq E[y_j \mid Z = z] \leq E[y_j \mid Z = z_2], \quad j = 0, 1 \]
3. Monotone Treatment Selection
\[ E[y_j \mid D = 1] \geq E[y_j \mid D = 0], \quad j = 0, 1 \]
4. Monotone Treatment Response
\[ y_0 \leq y_1 \Rightarrow E[y_0] \leq E[y_1] \]
where $x$ is omitted for notational convenience

Use of an instrument
- $E[y_j \mid z] = E[y_j]$, $j = 0, 1$, implies
\[
\begin{aligned}
E[y_j] \in \Big[ &\sup_z \{E[y \mid D = j, Z = z] \Pr(D = j \mid Z = z) + L_j \Pr(D \neq j \mid Z = z)\}, \\
&\inf_z \{E[y \mid D = j, Z = z] \Pr(D = j \mid Z = z) + U_j \Pr(D \neq j \mid Z = z)\} \Big]
\end{aligned}
\]
- Bounds for $\Delta_{ATE}$ become
\[
\begin{aligned}
B^L_{ATE} &= \sup_z \{\hat{g}_1(z) \hat{p}(z) + L_1 [1 - \hat{p}(z)]\} - \inf_z \{\hat{g}_0(z) [1 - \hat{p}(z)] + U_0 \hat{p}(z)\} \\
B^U_{ATE} &= \inf_z \{\hat{g}_1(z) \hat{p}(z) + U_1 [1 - \hat{p}(z)]\} - \sup_z \{\hat{g}_0(z) [1 - \hat{p}(z)] + L_0 \hat{p}(z)\}
\end{aligned}
\]
- Bounds are tighter than worst case bounds if $p(z) \neq \Pr(D = 1)$; i.e., $z$ is correlated with treatment assignment

Use of a monotone instrument (MIV)
- $z_1 \leq z \leq z_2 \Rightarrow E[y_j \mid Z = z_1] \leq E[y_j \mid Z = z] \leq E[y_j \mid Z = z_2]$, $j = 0, 1$
  - Weaker assumption than the prior, mean independence assumption
  - Implies that potential outcomes are non-decreasing in $z$
- Implies
\[
\begin{aligned}
E[y_j] \in \Bigg[ &\sum_{z \in Z} \Pr(Z = z) \left( \sup_{z_1 \leq z} \{E[y \mid D = j, Z = z_1] \Pr(D = j \mid Z = z_1) + L_j \Pr(D \neq j \mid Z = z_1)\} \right), \\
&\sum_{z \in Z} \Pr(Z = z) \left( \inf_{z_2 \geq z} \{E[y \mid D = j, Z = z_2] \Pr(D = j \mid Z = z_2) + U_j \Pr(D \neq j \mid Z = z_2)\} \right) \Bigg]
\end{aligned}
\]
- Bounds derived based on this

Monotone treatment selection (MTS)
- $E[y_j \mid D = 1] \geq E[y_j \mid D = 0]$, $j = 0, 1$, implies that the treated group has weakly higher potential outcomes in all treatment states
- Plausible in certain cases when one does not condition on $x$ and $x$ is correlated with both $D$ and $y_j$ in the same direction
- Implies
\[ E[y_j] \in \big[ E[y \mid D = j] \Pr(D \geq j) + L_j \Pr(D < j), \;\; E[y \mid D = j] \Pr(D \leq j) + U_j \Pr(D > j) \big] \]

Monotone treatment response (MTR)
- $y_0 \leq y_1 \Rightarrow E[y_0] \leq E[y_1]$ implies we know the sign of the treatment effect (inclusive of zero)
- Implies $\Delta_{ATE} \geq 0$
- Stronger than the positive selection assumption previously as that only applied to the sub-sample with $D = 1$

MIV can be combined with MTS, MTR

Methodology can also be combined with assumptions concerning measurement error (discussed later)

Stata: -bpbounds- (related)

Selection on Unobservables
Altonji et al. Approach

Altonji et al. (2005) offer two approaches to assess the sensitivity of estimates obtained under the SOO assumption when this assumption is false

Approach #1 is applicable to the case of a binary outcome

Approach #2 is applicable regardless of type of outcome

Krauth (2011) attempts to extend the approach

Approach #1: Bivariate probit model

Model
\[ y_i^* = x_i \beta + \tau D_i + \varepsilon_i \]
\[ D_i^* = x_i \gamma + \mu_i \]
where $\varepsilon, \mu \sim N(0, 0, 1, 1, \rho)$ and
\[ y = \begin{cases} 1 & \text{if } y^* > 0 \\ 0 & \text{otherwise} \end{cases} \qquad D = \begin{cases} 1 & \text{if } D^* > 0 \\ 0 & \text{otherwise} \end{cases} \]

Estimation by ML
\[
\begin{aligned}
\ln L = &\sum_{i: \{y=1, D=1\}} \ln[\Phi_2(x_i \beta + \tau, \; x_i \gamma, \; \rho)] \\
+ &\sum_{i: \{y=1, D=0\}} \ln[\Phi_2(x_i \beta, \; -x_i \gamma, \; -\rho)] \\
+ &\sum_{i: \{y=0, D=1\}} \ln[\Phi_2(-x_i \beta - \tau, \; x_i \gamma, \; -\rho)] \\
+ &\sum_{i: \{y=0, D=0\}} \ln[\Phi_2(-x_i \beta, \; -x_i \gamma, \; \rho)]
\end{aligned}
\]

Model is technically identified with no exclusion restriction, but treat $\rho$ as unidentified

Assessing treatment effect as $\rho$ varies provides evidence of sensitivity to selection on unobservables

Constrain $\rho > 0 \Rightarrow$ positive selection; $\rho < 0 \Rightarrow$ negative selection

Approach #2: SOU relative to SOO

Intuition is to assess how much SOU, relative to the amount of SOO, is needed to fully explain the observed positive association between $D$ and $y$

If
- (AET.i) Random observables: $x$ is a random subset of all factors, $w$, influencing $y$
- (AET.ii) Equally important factors: the number of elements in $w$ is large and no single factor has an undue influence on $y$
- (AET.iii) Relationship between $x$ and unobservables: slightly weaker technical assumption than independence between $x$ and remaining elements of $w$

then one should expect the amount of selection controlled for by $x$ to equal the amount of selection on unobservables

Implies that if the amount of SOU needed to explain the observed association is less than the amount of SOO, the estimated treatment effect should not be viewed as robust

Model for outcome
\[ y_i = x_i \beta + \tau D_i + \varepsilon_i \]

The (normalized) amount of SOU is given by
\[ \frac{E[\varepsilon \mid D = 1] - E[\varepsilon \mid D = 0]}{Var(\varepsilon)} \]

The (normalized) amount of SOO, ignoring the impact of $D$, is given by
\[ \frac{E[x\beta \mid D = 1] - E[x\beta \mid D = 0]}{Var(x\beta)} \]

The goal is to assess how large SOU must be relative to SOO to fully account for the positive treatment effect estimated under exogeneity

Express actual treatment participation as
\[ D_i = x_i \gamma + \mu_i \]

plim of OLS estimator of $\tau$ is
\[
\begin{aligned}
\text{plim} \, \hat{\tau} &= \tau + \frac{Cov(\mu, \varepsilon)}{Var(\mu)} \\
&= \tau + \frac{Var(D)}{Var(\mu)} \{E[\varepsilon \mid D = 1] - E[\varepsilon \mid D = 0]\}
\end{aligned}
\]

Under the assumption that SOO = SOU, the asymptotic bias term is
\[ \frac{Cov(\mu, \varepsilon)}{Var(\mu)} = \frac{Var(D)}{Var(\mu)} \left\{ \frac{E[x\beta \mid D = 1] - E[x\beta \mid D = 0]}{Var(x\beta)} \right\} Var(\varepsilon) \]

This bias can be consistently estimated under $H_o : \tau = 0$

The ratio $\hat{\tau} / \widehat{bias}$ indicates how much larger SOU needs to be relative to SOO to entirely explain the treatment effect

A small ratio $\Rightarrow$ treatment effect is highly sensitive to selection on unobservables; a ratio $>> 1$ implies treatment effect is robust

Algorithm:
1. Estimate $Var(D)$ from sample
2. Estimate treatment eqtn via LPM $\Rightarrow$ $\widehat{Var}(\mu)$
3. Estimate outcome eqtn via OLS restricting $\tau = 0$ $\Rightarrow$ $x\hat{\beta}$, $\widehat{Var}(x\hat{\beta})$, $\widehat{Var}(\varepsilon)$
4. Obtain sample means of $x\hat{\beta}$ in treatment and control groups $\Rightarrow$ $\hat{E}[x\hat{\beta} \mid D = 1]$, $\hat{E}[x\hat{\beta} \mid D = 0]$
5. Estimate outcome eqtn via OLS $\Rightarrow$ $\hat{\tau}$
6. Compute ratio of $\hat{\tau} / \widehat{bias}$
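The six steps can be sketched for the simplest case of one covariate plus a constant. Everything about the setup is an assumption made for illustration: the simulated data (with no actual selection on unobservables and a true effect of 0.5), the helper names, and the use of partialling out (Frisch-Waugh) in step 5 in place of a multiple regression routine.

```python
import random

def mean(v):
    return sum(v) / len(v)

def var(v):
    m = mean(v)
    return sum((a - m) ** 2 for a in v) / len(v)

def cov(u, v):
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / len(u)

def fit(x, y):
    """Intercept and slope from OLS of y on a constant and x."""
    b = cov(x, y) / var(x)
    return mean(y) - b * mean(x), b

def aet_ratio(x, d, y):
    var_d = var(d)                                        # step 1
    a, g = fit(x, d)                                      # step 2: LPM for D
    mu = [di - a - g * xi for xi, di in zip(x, d)]
    a0, b0 = fit(x, y)                                    # step 3: tau = 0
    xb = [a0 + b0 * xi for xi in x]
    eps = [yi - f for yi, f in zip(y, xb)]
    xb1 = mean([f for f, di in zip(xb, d) if di == 1])    # step 4
    xb0 = mean([f for f, di in zip(xb, d) if di == 0])
    tau = cov(mu, eps) / var(mu)                          # step 5 (Frisch-Waugh)
    bias = var_d / var(mu) * (xb1 - xb0) / var(xb) * var(eps)
    return tau, bias, tau / bias                          # step 6

rng = random.Random(2)                                    # fixed seed
x = [rng.gauss(0, 1) for _ in range(4000)]
d = [1 if xi + rng.gauss(0, 1) > 0 else 0 for xi in x]    # selection on x only
y = [xi + 0.5 * di + rng.gauss(0, 1) for xi, di in zip(x, d)]
tau_hat, bias_hat, ratio = aet_ratio(x, d, y)
print(round(tau_hat, 3), round(bias_hat, 3), round(ratio, 3))
```

Because x predicts treatment strongly in this simulation, the implied bias is large relative to the estimated effect, so the ratio comes out well below one even though the data contain no selection on unobservables; the ratio measures sensitivity, not the presence of bias.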


Notes:
- If $y$ is binary, estimate outcome eqtn via probit perhaps in step 3 $\Rightarrow$ $Var(\varepsilon) = 1$
- AET methods have relatively little to say about economic significance of treatment effect unless one makes assumptions about amount of SOU

Selection on Unobservables
Panel Data

Refer to ECO 6375 for panel data refresher...

Panel data is useful for addressing selection on unobservables that are invariant along a certain dimension

Thus, panel data methods provide a solution to selection on unobservables in only certain situations

Notation
- Population regression fn given by $E[y \mid x_1, ..., x_K, c]$
- $x_k$, $k = 1, ..., K$, are observable (to the econometrician)
- $c$ is an unobservable (to the econometrician) variable

Assuming linearity: $E[y \mid x_1, ..., x_K, c] = \beta_0 + x\beta + c$

Error form of the model
\[ y = \beta_0 + x\beta + c + \varepsilon \]
where $c$ is the unobserved effect and $\varepsilon$ is the idiosyncratic error

Time-series or cross-section models are forced to include $c$ in the error term (referred to as the composite error)
\[ y_i = \beta_0 + x_i \beta + \tilde{\varepsilon}_i, \quad \tilde{\varepsilon}_i = c_i + \varepsilon_i \]
\[ y_t = \beta_0 + x_t \beta + \tilde{\varepsilon}_t, \quad \tilde{\varepsilon}_t = c_t + \varepsilon_t \]

Model
\[ y_{it} = \beta_0 + x_{it} \beta + c_i + \varepsilon_{it} \]
- Unobserved effect is assumed to be time invariant (assuming a traditional panel where $t$ represents time)
- $x$ may include time dummies or time trend, etc.

Problem: given presence of $c_i$, how can we recover consistent estimates of $\beta_0$, $\beta$?

Estimation techniques
- Assuming $Cov(x, c) = 0$
  - Pooled OLS (POLS)
  - Random effects (RE)
- Assuming $Cov(x, c) \neq 0$
  - Least squares dummy variable model (LSDV)
  - Fixed effects (FE)
  - First-differencing (FD)

Selection on Unobservables
Panel Data: Treatment Effects Models

Structural model
\[ y_{it} = c_i + \lambda_t + x_{it} \beta + \tau D_{it} + \varepsilon_{it}, \quad i = 1, ..., N; \; t = 1, ..., T \]
where $\lambda_t$ are time dummies

Special case
- Setup
  - $T = 2$
  - $D_{i1} = 0 \;\; \forall i$
  - $D_{i2} \in \{0, 1\} \;\; \forall i$
  - Assume no $x$s
- FE or FD estimation $\Rightarrow$
\[ \tau = E[\Delta y \mid D_2 = 1] - E[\Delta y \mid D_2 = 0] \]
- Known as difference-in-differences estimator

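Visual representation of special case
\[ y_{it} = c_i + \lambda_t + x_{it} \beta + \tau D_{it} + \varepsilon_{it} \]
- Expected outcomes by period and treatment status

         t = 1        t = 2
D = 0    c0 + λ1      c0 + λ2
D = 1    c1 + λ1      c1 + λ2 + τ

- Implies
\[ E[\Delta y \mid D_2 = 1] = (c_1 + \lambda_2 + \tau) - (c_1 + \lambda_1) = \tau + \lambda_2 - \lambda_1 \]
\[ E[\Delta y \mid D_2 = 0] = (c_0 + \lambda_2) - (c_0 + \lambda_1) = \lambda_2 - \lambda_1 \]
  which implies
\[ \tau = E[\Delta y \mid D_2 = 1] - E[\Delta y \mid D_2 = 0] \]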
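The algebra can be checked with made-up numbers. In the sketch below the group effects, period effects, and treatment effect are hypothetical values plugged into the four expected outcomes; the before-after and cross-section comparisons are included only to show what each one picks up.

```python
def did(treated_pre, treated_post, control_pre, control_post):
    """Change for the treated minus change for the controls."""
    return (treated_post - treated_pre) - (control_post - control_pre)

c0, c1 = 1.0, 2.0       # group effects (hypothetical values)
lam1, lam2 = 0.0, 0.5   # period effects (hypothetical values)
tau = 1.0               # treatment effect (hypothetical value)

treated_pre, treated_post = c1 + lam1, c1 + lam2 + tau
control_pre, control_post = c0 + lam1, c0 + lam2

before_after = treated_post - treated_pre    # tau + (lam2 - lam1) = 1.5
cross_section = treated_post - control_post  # tau + (c1 - c0)     = 2.0
did_estimate = did(treated_pre, treated_post, control_pre, control_post)
print(before_after, cross_section, did_estimate)  # 1.5 2.0 1.0
```

Only the double difference removes both the common time effect and the group effect, recovering the assumed value of tau.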


[Figure: Y0 and Y1 plotted by period (-1, 0, 1), marking the Before-After Estimator, the Cross-Section Estimator, and DID]

Note: Illustration of Three Common Estimators.

Beyond the special case
- Special case is useful to gain the intuition, not required
- In general, as long as $D_{it}$ is time-varying for some units $i$, then $\tau$ can be estimated by any panel data method, provided the required assumptions are met
- If selection into treatment is only on observables (not $c_i$), then POLS or RE may be consistent and efficient
- If selection into treatment is also on time invariant unobservables ($c_i$), then POLS and RE are inconsistent, but FE or FD are consistent if other assumptions are met
- Important to remember: FE/FD is not a magic bullet (Duflo et al. 2004)
  - FE and FD require strict exogeneity; rules out Ashenfelter's Dip $\Rightarrow$ $Cov(D_{it}, \varepsilon_{it-1}) \neq 0$
  - Rules out selection on contemporaneous shocks $\Rightarrow$ $Cov(D_{it}, \varepsilon_{it}) \neq 0$
  - Key: requires treated and untreated to follow same time trend in absence of treatment
  - Diff-in-diff-in-diff may be an option
- With heterogeneous treatment effects, FE identifies the ATT

Timing issues (LaPorte & Windmeijer 2005)

Previous model restricts $D$ to a one-time intercept shift, $\tau$

In certain applications, agent may anticipate treatment and alter behavior prior to actual treatment; or, response may occur with a lag; or, some combination of both

Examples: policy changes announced, but not implemented until future date; or, lags in adjustment to policy changes

General structural model
\[ y_{it} = c_i + \lambda_t + x_{it} \beta + \sum_{l=1}^{L_0} \delta_{-l} D^{-l}_{it} + \delta_0 D_{it} + \sum_{l=1}^{L_1} \delta_l D^{l}_{it} + \varepsilon_{it} \]
where
- $D^{-l}_{it} = D_{it+l}$ (treatment assignment $l$ periods in future)
- $D^{l}_{it} = D_{it-l}$ (treatment assignment $l$ periods in past)
- $\delta_{-l}$ reflects anticipatory effects of treatment
- $\delta_l$ reflects lagged effects of treatment
- $\delta_0$ reflects instantaneous effects of treatment

Specification test

If anticipatory and/or lagged effects occur, but a "simple" model of one-time effect is estimated, then FE and FD will yield (statistically) different estimates
\[ E[\hat{\delta}_{FD}] = \delta_0 - \delta_{-1} \]
\[ E[\hat{\delta}_{FE}] = \sum_t \omega_t (\delta_{0+} - \delta_{-}) \]
where
- $\delta_{0+}$ = average of $\delta_0, \delta_1, ..., \delta_{L_1}$
- $\delta_{-}$ = average of $\delta_{-1}, ..., \delta_{-L_0}$

and $\omega_t$ are weights

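$H_o: \delta_{FD} = \delta_{FE} \iff H_o: \phi = 0$

$$\begin{bmatrix} y_{it} - y_{it-1} \\ y_{it} - \bar y_i \end{bmatrix} = \begin{bmatrix} x_{it} - x_{it-1} \\ x_{it} - \bar x_i \end{bmatrix}\beta + \begin{bmatrix} D_{it} - D_{it-1} \\ D_{it} - \bar D_i \end{bmatrix}\delta + \begin{bmatrix} 0 \\ D_{it} - \bar D_i \end{bmatrix}\phi + \begin{bmatrix} \eta_{it} \\ \tilde\eta_{it} \end{bmatrix}$$

Estimate via OLS, look at confidence interval on $\hat\phi$

Lee and Huang (2011) extend the existing literature on dynamic treatment effects to allow for anticipatory behavior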


Page 232: Microeconometrics Lecture Notes

Autoregressive Model

Fixed effects models require $D_{it}$ to be time-varying for some $i$

If $D$ is time invariant $\forall i$, it is still possible to identify the effect of the program under the common treatment effect assumption

Structural model

$$y_{it} = \lambda_t + x_{it}\beta + \tau D_i + \varepsilon_{it}$$

$$\varepsilon_{it} = \rho\varepsilon_{it-1} + \eta_{it}$$

where $\eta_{it}$ is iid with mean zero and $\tau$ is the homogeneous treatment effect

Quasi-FD yields

$$y_{it} = \tilde\lambda_t + (x_{it} - \rho x_{it-1})\beta + (1 - \rho)\tau D_i + \rho y_{it-1} + \eta_{it}$$

OLS is consistent if (i) $x$ are strictly exogenous and (ii) $D$ is uncorrelated with $\eta$ (e.g., post-treatment shocks are not forecastable and therefore do not affect past treatment decision)
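
The quasi-differencing step can be spelled out as a short derivation. This is only a sketch: the definition $\tilde\lambda_t \equiv \lambda_t - \rho\lambda_{t-1}$ is inferred from the algebra rather than stated on the slide.

```latex
\begin{aligned}
y_{it} - \rho y_{it-1}
  &= (\lambda_t - \rho\lambda_{t-1}) + (x_{it} - \rho x_{it-1})\beta
     + (\tau D_i - \rho\tau D_i) + (\varepsilon_{it} - \rho\varepsilon_{it-1}) \\
  &= \tilde\lambda_t + (x_{it} - \rho x_{it-1})\beta + (1-\rho)\tau D_i + \eta_{it} \\
\Rightarrow\quad y_{it}
  &= \tilde\lambda_t + (x_{it} - \rho x_{it-1})\beta + (1-\rho)\tau D_i + \rho y_{it-1} + \eta_{it}
\end{aligned}
```

Since the coefficient on $D_i$ is $(1-\rho)\tau$, recovering $\tau$ requires dividing by $1 - \rho$ (using the coefficient on $y_{it-1}$), which presumes $\rho \neq 1$.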


Page 233: Microeconometrics Lecture Notes

Comparative Case Study Approach

Provides an alternative to DD when
I Treatment occurs at an aggregate level
I Typically only a single observation is treated and a lengthy history of pre-treatment data is available for the treated and the pool of controls

Examples:
I Mariel Cuban Boat Lift (Card 1990)
I State minimum wage (Card & Krueger 1994)

Solution
I Construct a 'synthetic' control, which is a weighted average of the available controls, to estimate the missing counterfactual in post-treatment period(s)
I Weights are chosen by matching pre-treatment covariates and outcomes
I Allows for differential time trends in treatment and control observations
F By matching pre-treatment outcomes, one is implicitly matching on the time-invariant unobserved effect
F Thus, it does not matter if the unobserved effect has differential effects over time if the time-specific effect is a common factor


Page 234: Microeconometrics Lecture Notes

Model
I $y_{it}$ is observed outcome for obs $i$, $i = 1, \dots, J+1$, in period $t = 1, \dots, T_o, \dots, T$
I Obs 1 is treated; remaining $2, \dots, J+1$ are never treated
I Timing of treatment effects
1 No Anticipatory Effects: $T_o$ is period prior to obs 1 being treated
2 Anticipatory Effects: $T_o$ is period prior to any anticipatory effects for obs 1 beginning
I Outcomes in the absence of treatment

$$y_{it} = y^N_{it} = \delta_t + \theta_t Z_i + \lambda_t u_i + \varepsilon_{it}$$

I Outcomes with treatment

$$y_{it} = y^I_{it} = y^N_{it} + \alpha_{it}$$


Page 235: Microeconometrics Lecture Notes

Synthetic control is defined as

$$\sum_{j=2}^{J+1} \omega_j y_{jt} = \sum_{j=2}^{J+1} \omega_j (\delta_t + \theta_t Z_j + \lambda_t u_j + \varepsilon_{jt})$$

where $\omega_j$ is the weight given to control $j$ and
I $\sum_{j=2}^{J+1} \omega_j = 1$
I $\omega_j \geq 0 \ \forall j$

Conditional on choice of weights, $\omega_j^*$, the period-specific treatment effect is estimated as

$$\hat\alpha_{1t} = y_{1t} - \sum_{j=2}^{J+1} \omega_j^* y_{jt}$$

Requires a SUTVA-type assumption that the treatment does not impact outcomes in the control pool


Page 236: Microeconometrics Lecture Notes

Weights are chosen to match moments of the data in periods $t \leq T_o$
I Define

$$y^K_i = \sum_{s=1}^{T_o} k_s y_{is}$$

where $K = (k_1, \dots, k_{T_o})$ is a vector of weights and thus $y^K_i$ represents a particular linear combination of pre-treatment outcomes for obs $i$
I Given $M$ unique linear combinations, define the vector of pre-treatment outcomes for obs 1 as

$$X_1 = (Z_1', y^{K_1}_1, \dots, y^{K_M}_1)$$

with dimension $R \times 1$
I Define the $R \times J$ matrix of variables for the remaining obs $i$, $i = 2, \dots, J+1$, as $X_0$, where column $j$ is given by

$$(Z_{j+1}', y^{K_1}_{j+1}, \dots, y^{K_M}_{j+1})$$

I Weights are chosen to minimize some distance function

$$\|X_1 - X_0 W\|_V = \sqrt{(X_1 - X_0 W)' V (X_1 - X_0 W)}$$

where $V$ is an $R \times R$ symmetric, positive semidefinite matrix
I In practice, $V$ is chosen to minimize the MSE of the pre-intervention predictions
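
As a rough illustration of the weighting step, the sketch below picks weights for a pool of only two controls by grid search. It is a deliberately simplified stand-in for the general problem: $V$ is taken to be the identity, only pre-treatment outcomes are matched (no $Z$), and the data, the function name `synthetic_weights`, and the treatment effect of 1.5 are all hypothetical.

```python
def synthetic_weights(treated_pre, control_a_pre, control_b_pre, steps=100):
    """Grid search over w in [0, 1]: control A gets w, control B gets 1 - w.

    Returns the (w_a, w_b) pair minimising the pre-treatment mean squared
    error, so the weights are non-negative and sum to one by construction.
    """
    best_w, best_mse = 0.0, float("inf")
    for k in range(steps + 1):
        w = k / steps
        mse = sum(
            (y - (w * ya + (1 - w) * yb)) ** 2
            for y, ya, yb in zip(treated_pre, control_a_pre, control_b_pre)
        ) / len(treated_pre)
        if mse < best_mse:
            best_w, best_mse = w, mse
    return best_w, 1 - best_w


# Hypothetical outcomes for two never-treated controls over six periods.
control_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
control_b = [2.0, 2.5, 2.0, 3.5, 3.0, 4.0]
T0 = 4            # last pre-treatment period
effect = 1.5      # hypothetical treatment effect in post-treatment periods

# Treated unit: a 0.3 / 0.7 mix of the controls, plus the effect after T0.
treated = [0.3 * ya + 0.7 * yb for ya, yb in zip(control_a, control_b)]
treated = [y + (effect if t >= T0 else 0.0) for t, y in enumerate(treated)]

w_a, w_b = synthetic_weights(treated[:T0], control_a[:T0], control_b[:T0])

# Period-specific effect estimates: treated outcome minus synthetic control.
gaps = [
    treated[t] - (w_a * control_a[t] + w_b * control_b[t])
    for t in range(T0, len(treated))
]
print(w_a, w_b, gaps)
```

With more than two controls, the grid search would be replaced by a constrained optimizer over the simplex; the structure of the calculation is otherwise the same.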


Page 237: Microeconometrics Lecture Notes

Inference is handled by
I Re-doing the analysis, treating obs $i$, $i = 2, \dots, J+1$, as 'treated' after period $T_o$ and the remaining obs as the pool of potential controls
I This yields a dbn of treatment effect estimates under $H_o$ of no treatment effect
I If actual estimates of $\hat\alpha_{1t}$ look very different, this is evidence of a statistically meaningful treatment effect

Code is available in Stata at http://www.mit.edu/~jhainm/synthpage.html.
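
One simple way to summarize the placebo exercise is the share of units whose estimated gap is at least as large (in absolute value) as that of the actually treated unit. The sketch below assumes the gaps have already been computed; the function name `placebo_p_value` and the numbers are hypothetical, and this is not a description of what the Stata code does.

```python
def placebo_p_value(treated_gap, placebo_gaps):
    """Share of units (treated plus placebos) with |gap| >= |treated gap|."""
    all_gaps = [treated_gap] + list(placebo_gaps)
    extreme = sum(1 for g in all_gaps if abs(g) >= abs(treated_gap))
    return extreme / len(all_gaps)


# Hypothetical post-treatment gaps: one treated unit, nine placebo runs.
p = placebo_p_value(1.5, [0.2, -0.4, 0.1, -1.7, 0.3, 0.6, -0.2, 0.5, 0.0])
print(p)
```

With $J$ controls the smallest attainable value is $1/(J+1)$, so a small control pool limits how strong the evidence can be.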


Page 238: Microeconometrics Lecture Notes

Example: Abadie et al. (2010)


Page 239: Microeconometrics Lecture Notes


Page 240: Microeconometrics Lecture Notes

Selection on Unobservables
Instrumental Variables

Refer to ECO 6374 for refresher on basics...

Terminology
I 'Structural' model

$$y_i = \beta_0 + \beta_1 x_i + \varepsilon_i$$

I First-stage model

$$x_i = \pi_0 + \pi_1 z_i + u_i$$

I Reduced form model

$$y_i = (\beta_0 + \beta_1\pi_0) + \beta_1\pi_1 z_i + (\varepsilon_i + \beta_1 u_i) = \tilde\pi_0 + \tilde\pi_1 z_i + \tilde\varepsilon_i$$


Page 241: Microeconometrics Lecture Notes

Goal: devise alternative estimation technique to obtain consistent estimates when $E[\varepsilon|x] \neq 0$
I Solution: identify $\beta$ from exogenous variation in $x$ isolated using instruments, $z$
I $z$ is a valid IV for $x$ iff

(IV.i) First-stage: $E[z'x] \neq 0$
(IV.ii) Exogeneity: $E[z'\varepsilon] = 0$
(IV.iii) Exclusion: $E[y|x, z] = E[y|x]$

where $z$ and $x$ are both $N \times K$ matrices
I Exogenous $x$s serve as instruments for themselves
I Need unique instrument for each endogenous var

Stata: -ivreg2-, -xtivreg2-


Page 242: Microeconometrics Lecture Notes


Page 243: Microeconometrics Lecture Notes

Several issues remain under scrutiny in the literature
1 Choice of estimation technique
2 Properties and inference with weak IVs $\Rightarrow E[z'x] \approx 0$
3 Properties and inference with endogenous IVs $\Rightarrow E[z'\varepsilon] \neq 0$


Page 244: Microeconometrics Lecture Notes

Selection on Unobservables
Estimators

1 IV
2 Two-Stage Least Squares (TSLS or 2SLS)
3 Nagar
4 Split-sample or Two-Sample IV (data set #1: $\{x, z\}_{i=1}^{N_1}$; data set #2: $\{y, z\}_{i=1}^{N_2}$)
5 JIVE
6 LIML
7 Fuller (modified LIML)
8 GMM


Page 245: Microeconometrics Lecture Notes

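Selection on Unobservables
Estimators: IV Estimator

Estimator is given by

$$y = x\beta + \varepsilon$$

$$\Rightarrow z'y = z'x\beta + z'\varepsilon \rightarrow \beta = (z'x)^{-1}z'y \ \text{ if } z'\varepsilon = 0$$

$$\Rightarrow \hat\beta_{IV} = (z'x)^{-1}z'y$$

Estimated asymptotic variance is given by

$$\mathrm{Var}(\hat\beta_{IV}) = \hat\sigma^2 (z'x)^{-1}(z'z)(x'z)^{-1}; \quad \hat\sigma^2 = \frac{1}{N - K}\sum_i \hat\varepsilon_i^2$$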
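
To see the estimator at work, here is a small simulation sketch for the scalar case with an intercept, where $(z'x)^{-1}z'y$ reduces to a ratio of sample covariances. The data-generating process (true $\beta = 2$, error correlation via the coefficient 0.8, seed, sample size) is entirely hypothetical and chosen only so that OLS is visibly inconsistent.

```python
import random


def cov(a, b):
    """Sample covariance (divides by n; the divisor cancels in the ratios)."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b)) / len(a)


def simulate(n=20000, beta=2.0, seed=0):
    """Draw (z, x, y) where x is endogenous and z is a valid instrument."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                 # instrument
        ui = rng.gauss(0, 1)                 # first-stage error
        ei = 0.8 * ui + rng.gauss(0, 1)      # structural error, correlated with u
        xi = zi + ui                         # first stage: x = z + u
        z.append(zi)
        x.append(xi)
        y.append(1.0 + beta * xi + ei)       # structural model
    return z, x, y


z, x, y = simulate()
beta_ols = cov(x, y) / cov(x, x)   # inconsistent: plim is beta + 0.8 / 2 here
beta_iv = cov(z, y) / cov(z, x)    # scalar version of (z'x)^{-1} z'y
print(round(beta_ols, 3), round(beta_iv, 3))
```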


Page 246: Microeconometrics Lecture Notes

Selection on Unobservables
Estimators: Two-Stage Least Squares

IV estimator requires 1 instrument per endogenous variable; otherwise $z'x$ is an $L \times K$ matrix ($L > K$) with rank $= K$, and the inverse does not exist

Discarding additional IVs is probably inefficient

TSLS is an alternative estimator that does not face this problem

In multivariate regression, this is formalized as
I First-stage

$$\hat x = z(z'z)^{-1}z'x$$

and replacing $z$ with $\hat x$ in the IV estimator
I Estimator now given by

$$\hat\beta_{TSLS} = (\hat x'\hat x)^{-1}\hat x'y = [x'z(z'z)^{-1}z'x]^{-1}x'z(z'z)^{-1}z'y$$
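
A quick check, added here as a worked step, that TSLS collapses to the IV estimator in the exactly identified case ($L = K$, so that $z'x$ is square and assumed invertible):

```latex
\begin{aligned}
\hat\beta_{TSLS}
  &= [x'z(z'z)^{-1}z'x]^{-1}\, x'z(z'z)^{-1}z'y \\
  &= (z'x)^{-1}(z'z)(x'z)^{-1}\, x'z\,(z'z)^{-1}z'y \\
  &= (z'x)^{-1}z'y = \hat\beta_{IV}
\end{aligned}
```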

Page 247: Microeconometrics Lecture Notes

Notes ...

In a multiple regression...
I With multiple endogenous vars, need at least as many IVs as endogenous $x$s; do not interpret 'this IV for this $x$, that IV for that $x$'
I Where the second-stage contains other exogenous vars, these vars must be included in the first-stage

If strictly more IVs than endogenous vars, then
I Model is overidentified (as opposed to exactly identified)
I Enables additional tests for instrument validity

Estimators are CAN, but biased
I Intuition behind the bias is that the first-stage OLS estimates, $\hat\theta$, are correlated with the error term from the structural model, $\varepsilon$, which implies that the fitted values, $\hat x$, are also correlated with $\varepsilon$

Incorrectly treating other covariates in the model as exogenous $\Rightarrow$ inconsistent estimates if instrument(s) are correlated with these covariates


Page 248: Microeconometrics Lecture Notes

Selection on Unobservables
Estimators: JIVE, SSIV, Nagar

Breaking the correlation between $\hat\theta$ and $\varepsilon$ is the motivation behind JIVE and SSIV

SSIV (Angrist & Krueger 1992, 1995)
I Approach
F Divide sample into two groups: $i = 1, \dots, N_1$ and $i = N_1 + 1, \dots, N$
F Estimate first-stage using $N_2$ obs, $i = N_1 + 1, \dots, N$
F Predict $\hat x$ out-of-sample for first $N_1$ obs
F Estimate second-stage using first $N_1$ obs
I Estimators

$$\hat\beta_{SSIV} = (\hat x_{21}'\hat x_{21})^{-1}\hat x_{21}'y_1$$

$$\hat\beta_{USSIV} = (\hat x_{21}'x_1)^{-1}\hat x_{21}'y_1$$

where $\hat x_{21} = z_1(z_2'z_2)^{-1}z_2'x_2$ and subscript 1 (2) refers to estimation on $i = 1, \dots, N_1$ ($i = N_1 + 1, \dots, N$)
I SSIV uses OLS in the second-stage; USSIV stands for Unbiased SSIV and uses IV in the second-stage


Page 249: Microeconometrics Lecture Notes

JIVE
I Approach
F Estimate first-stage using $N - 1$ obs
F Predict $\hat x$ out-of-sample for the excluded obs
F Repeat for all $N$ obs and estimate second-stage using all $N$ obs
I Estimators

$$\hat\beta_{JIVE} = (\hat x_{-i}'\hat x_{-i})^{-1}\hat x_{-i}'y$$

$$\hat\beta_{UJIVE} = (\hat x_{-i}'x)^{-1}\hat x_{-i}'y = (x'C_J'x)^{-1}x'C_J'y$$

where $\hat x_{-i}$ is the matrix whose $i$th row is $z_i\pi_{-i}$, $\pi_{-i}$ is the vector of first-stage coeffs with obs $i$ removed, and $C_J = (I - D_{P_z})^{-1}(P_z - D_{P_z})$, $D_{P_z} = \mathrm{diag}(P_z)$, and $P_z = z(z'z)^{-1}z'$
I JIVE uses OLS in the second-stage; UJIVE stands for Unbiased JIVE and uses IV in the second-stage
I Stata: -jive-


Page 250: Microeconometrics Lecture Notes

Nagar estimator is a bias-corrected TSLS estimator
I Nagar (1959), Hahn & Hausman (2002)
I Estimator given by

$$\hat\beta_N = \left[x'\left(P_z - \frac{K}{N}I_N\right)x\right]^{-1} x'\left(P_z - \frac{K}{N}I_N\right)y$$

where $K$ = # IVs and $P_z = z(z'z)^{-1}z'$
I Hahn & Hausman (2002) discuss the poor performance of the Nagar estimator when the model is close to being unidentified


Page 251: Microeconometrics Lecture Notes

Selection on Unobservables
Estimators: LIML, Fuller, and k-Class Estimators

k-class estimators can all be written as

$$\hat\beta_k = [x'(I_N - kM_z)x]^{-1}x'(I_N - kM_z)y$$

for different values of $k$, where $M_z = I_N - z(z'z)^{-1}z'$

$k = 0 \Rightarrow$ OLS

$k = 1 \Rightarrow$ TSLS

$k = \lambda \Rightarrow$ LIML

$k = \lambda - \frac{\alpha}{N - L} \Rightarrow$ Fuller

$k = 1 + \frac{L - K}{N} \Rightarrow$ Nagar

For LIML, $\lambda$ is a minimum eigenvalue

For Fuller, $\alpha$ is user-specified (typically 1) and $L$ = # included + excluded instruments

For Nagar, $L - K$ = # over-identifying restrictions
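
The first two cases can be verified directly from the k-class formula, using $I_N - M_z = P_z = z(z'z)^{-1}z'$ (a worked step added for clarity):

```latex
\begin{aligned}
k = 0:&\quad \hat\beta_0 = [x'I_N x]^{-1}x'I_N y = (x'x)^{-1}x'y = \hat\beta_{OLS} \\
k = 1:&\quad \hat\beta_1 = [x'(I_N - M_z)x]^{-1}x'(I_N - M_z)y
        = [x'P_z x]^{-1}x'P_z y = \hat\beta_{TSLS}
\end{aligned}
```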

Page 252: Microeconometrics Lecture Notes

Selection on Unobservables
IV: Specification Tests

Much specification testing is required when utilizing IV in applied research

Types of tests available
I Tests of endogeneity: $E[x'\varepsilon] \overset{?}{=} 0$
I Tests of instrument relevance: $E[z'x] \overset{?}{=} 0$
I Tests of overidentification: $E[z'\varepsilon] \overset{?}{=} 0$ (partial test only)
I Tests for weak instruments: $E[z'x] \approx 0$

Covered in ECO 6374

With weak IVs, some recommend LIML, others Fuller, others UJIVE,others TSLS (which tends to have a larger bias, similar RMSE)


Page 253: Microeconometrics Lecture Notes

Selection on Unobservables
IV: Imperfect Instruments

Recent work has explored what can be learned if $z$ is an imperfect instrumental variable (IIV)

Two possible imperfections:
1 $z$ is also endogenous
2 $z$ is not excludable from the second-stage

Nevo & Rosen (2010) and Ashley (2009) address endogeneity

Conley et al. (2010) address excludability

Note: These are intimately related since if $z$ is incorrectly treated as excludable, then it will be correlated with the second-stage composite error that now includes the error and $z$


Page 254: Microeconometrics Lecture Notes

Nevo & Rosen (2010) ...

Setup
I Model given by

$$y_i = \beta x_i + w_i\delta + \varepsilon_i$$

where $x$ is a single endogenous regressor, $w$ is exogenous (or alternatively is endogenous with valid instruments), and $z$ is a $1 \times k_z$ vector of imperfect instruments for $x$
I $z$ is an imperfect IV (IIV) in the sense that it is also correlated with $\varepsilon$
I Assumptions:

(IIV.i) Sign of correlation: $\rho_{x\varepsilon}\rho_{z_j\varepsilon} \geq 0$, $j = 1, \dots, k_z$
(IIV.ii) Degree of endogeneity: $|\rho_{x\varepsilon}| \geq |\rho_{z_j\varepsilon}|$, $j = 1, \dots, k_z$
(IIV.iii) True model: $y_i = \beta x_i + w_i\delta + \varepsilon_i$

(IIV.ii) contrasts with the classical IV assumption that $\rho_{z_j\varepsilon} = 0$


Page 255: Microeconometrics Lecture Notes

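Define

$$\lambda_j = \frac{\rho_{z_j\varepsilon}}{\rho_{x\varepsilon}}$$

which is in the unit interval under (IIV.i), (IIV.ii)

If $\lambda_j$ were known, then a valid IV for $x$ is

$$V_j(\lambda_j) = \sigma_x z_j - \lambda_j\sigma_{z_j}x$$

However, $\Lambda = [\lambda_1 \cdots \lambda_{k_z}]$ is unknown, but lies in the unit cube in $\mathbb{R}^{k_z}$-space

Intuitively, searching over feasible values of $\Lambda$, one may bound $\beta$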


Page 256: Microeconometrics Lecture Notes

Consider $k_z = 1$
I Partial out the effects of $w$ by defining

$$\tilde y_i = y_i - w_i[(w'w)^{-1}w'y]$$

$$\tilde x_i = x_i - w_i[(w'w)^{-1}w'x]$$

(Note: If $w$ is endogenous with valid IVs, then the OLS coeffs are replaced by IV coeffs.)

I Under (IIV.i) (IIV.iii) and assuming without loss of generality thatρx ε 0, obtain the following bounds:

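F Case I. $(\sigma_{z\tilde x}\sigma_x - \sigma_{x\tilde x}\sigma_z)\sigma_{z\tilde x} > 0$

$$\beta \in \begin{cases} [\beta^{IV}_{V(1)}, \beta^{IV}_z] & \text{if } \sigma_{z\tilde x} < 0 \\ [\beta^{IV}_z, \beta^{IV}_{V(1)}] & \text{if } \sigma_{z\tilde x} > 0 \end{cases}$$

F Case II. $(\sigma_{z\tilde x}\sigma_x - \sigma_{x\tilde x}\sigma_z)\sigma_{z\tilde x} \leq 0$

$$\beta \in \begin{cases} [\max\{\beta^{IV}_z, \beta^{IV}_{V(1)}\}, \infty) & \text{if } \sigma_{z\tilde x} < 0 \\ (-\infty, \min\{\beta^{IV}_z, \beta^{IV}_{V(1)}\}] & \text{if } \sigma_{z\tilde x} > 0 \end{cases}$$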


Page 257: Microeconometrics Lecture Notes

Additional work to bound δ is also possible

Extension to $k_z > 1$
I Bounds can be tightened by obtaining bounds for each $z$ individually and then computing the final bounds as the intersection of the $k_z$ bounds
I Formally
F For each $z_j$, obtain $B_j = [\beta^l_j, \beta^u_j]$
F Final bounds given by

$$\beta \in \left[\max_j\{\beta^l_j\}, \min_j\{\beta^u_j\}\right]$$

F In Case II, these bounds are one-sided; one trick may be to try to define a new IV that is a weighted average of two of the IVs such that $(\sigma_{q\tilde x}\sigma_x - \sigma_{x\tilde x}\sigma_q)\sigma_{q\tilde x} > 0$, where $q_i = \gamma z_{ji} + (1 - \gamma)z_{j'i}$
I Need to be careful, though, and make sure different $z$s estimate the same parameter (discussed later)


Page 258: Microeconometrics Lecture Notes

Conley et al. (2010) ...

Setup

$$y_i = x_i\beta + z_i\gamma + \varepsilon_i$$

$$x_i = z_i\pi + u_i$$

where $x$ is a $k_x$-dimensional vector of endogenous regressors, $z$ is a $k_z$-dimensional vector of instruments, $k_z \geq k_x$, and $E[z'\varepsilon] = 0$

Classical IV requires the assumption that $\gamma = 0$
I With $k_x = k_z = 1$, we have

$$\mathrm{plim}\ \hat\beta_{IV} = \beta + \frac{\sigma_{z\tilde\varepsilon}}{\sigma_{xz}} = \beta + \frac{\gamma\sigma_z^2}{\pi\sigma_z^2} = \beta + \frac{\gamma}{\pi}$$

where $\tilde\varepsilon_i = z_i\gamma + \varepsilon_i$ is the composite error
I Thus, IV is asymptotically biased when $\gamma \neq 0$ and the bias is decreasing in $\pi$ and increasing in $\gamma$
I Authors refer to deviations from $\gamma = 0$ as 'plausible exogeneity'

Approach
I Track estimates $\hat\beta(\gamma) = \hat\beta_{IV} - \gamma/\hat\pi$ for different values of $\gamma$
I Estimates will be more sensitive to $\gamma$ the weaker the first-stage relationship
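
The tracking exercise is simple arithmetic in the $k_x = k_z = 1$ case. The sketch below uses hypothetical values for $\hat\beta_{IV}$, $\hat\pi$, and the $\gamma$ grid (none come from the notes), and compares a strong and a weak first stage.

```python
def beta_given_gamma(beta_iv, pi_hat, gamma):
    """Adjusted estimate beta_hat(gamma) = beta_IV - gamma / pi_hat."""
    return beta_iv - gamma / pi_hat


beta_iv = 0.50                    # hypothetical IV point estimate
gammas = [-0.05, 0.0, 0.05]       # hypothetical direct effects of z on y

# Strong first stage (pi_hat = 1.0) versus weak first stage (pi_hat = 0.1).
strong = [beta_given_gamma(beta_iv, 1.0, g) for g in gammas]
weak = [beta_given_gamma(beta_iv, 0.1, g) for g in gammas]

print("strong first stage:", strong)
print("weak first stage:  ", weak)
```

The same violation of the exclusion restriction moves the estimate ten times further when $\hat\pi$ is ten times smaller, which is the sensitivity noted above.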


Page 259: Microeconometrics Lecture Notes

Authors present several possible methods of inference, only some presented here

Method #1. Union of CIs with $\gamma$ Support Assumption
I Suppose the true value of $\gamma = \gamma_0 \in G_{k_z}$, with known bounds
I If $\gamma_0$ were known, then IV/TSLS applied to

$$y_i - z_i\gamma_0 = x_i\beta + \varepsilon_i$$

using $z$ as instruments is consistent for $\beta$
I With $\gamma_0$ unknown, but contained in $G_{k_z}$, one can
F Apply IV/TSLS to a grid of values for $\gamma$ from $G_{k_z}$
F For each value, $\gamma_s$, $s = 1, \dots, S$, obtain the $(1 - \alpha)$% CI for $\beta$
F Compute a final CI as the union of these $S$ CIs

$$CI(1 - \alpha) = \bigcup_{\gamma \in G_{k_z}} CI(1 - \alpha, \gamma)$$

which has an asymptotic coverage probability $\geq 1 - \alpha$
F If some prior info, may want to weight different $\gamma$s differently
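
The union-of-CIs steps can be sketched for the just-identified scalar case as follows. Everything specific here is an assumption made for illustration: the simulated data (true $\beta = 0.5$, true $\gamma = 0.1$), the grid for $\gamma$, the normal critical value 1.96, and the helper names `iv_fit` and `union_ci`. Reporting the smallest lower bound and largest upper bound gives the convex hull of the union, which coincides with the union when neighbouring intervals overlap.

```python
import random


def iv_fit(z, x, y):
    """Just-identified IV with an intercept: returns (beta_hat, std_error)."""
    n = len(y)
    mz, mx, my = sum(z) / n, sum(x) / n, sum(y) / n
    zd = [v - mz for v in z]
    xd = [v - mx for v in x]
    yd = [v - my for v in y]
    szx = sum(a * b for a, b in zip(zd, xd))
    szy = sum(a * b for a, b in zip(zd, yd))
    szz = sum(a * a for a in zd)
    beta = szy / szx
    resid = [b - beta * a for a, b in zip(xd, yd)]
    sigma2 = sum(r * r for r in resid) / (n - 2)
    # Scalar version of sigma^2 (z'x)^{-1} (z'z) (x'z)^{-1}
    se = (sigma2 * szz) ** 0.5 / abs(szx)
    return beta, se


def union_ci(z, x, y, gamma_grid, crit=1.96):
    """For each gamma, run IV on y - z*gamma; return hull of the CIs."""
    lows, highs = [], []
    for g in gamma_grid:
        y_adj = [yi - g * zi for yi, zi in zip(y, z)]
        b, se = iv_fit(z, x, y_adj)
        lows.append(b - crit * se)
        highs.append(b + crit * se)
    return min(lows), max(highs)


# Hypothetical data: z has a small direct effect on y (gamma = 0.1).
rng = random.Random(1)
n = 5000
z = [rng.gauss(0, 1) for _ in range(n)]
u = [rng.gauss(0, 1) for _ in range(n)]
x = [zi + ui for zi, ui in zip(z, u)]
y = [1.0 + 0.5 * xi + 0.1 * zi + 0.5 * ui + rng.gauss(0, 1)
     for xi, zi, ui in zip(x, z, u)]

grid = [k * 0.05 for k in range(5)]     # assumed support: gamma in [0, 0.2]
lo, hi = union_ci(z, x, y, grid)
b0, se0 = iv_fit(z, x, y)               # conventional IV, i.e. gamma = 0
print((lo, hi), (b0 - 1.96 * se0, b0 + 1.96 * se0))
```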


Page 260: Microeconometrics Lecture Notes

Method #2. $\gamma$ Local-to-Zero Approximation
I $\gamma$ is treated as unknown, but coming from a known dbn

$$\gamma = \frac{\eta}{\sqrt{N}}, \quad \eta \sim G$$

where prior info on $\gamma$ translates to knowing the dbn $G$
I The normalization by $\sqrt{N}$ ensures that uncertainty about $z$ being a valid instrument and sampling error are of the same order and so both factor into the asymptotic dbn of $\hat\beta$
I Assuming $\gamma \sim N(\mu_\gamma, \Omega_\gamma)$ leads to the following approximate dbn

$$\hat\beta \overset{approx}{\sim} N(\beta + A\mu_\gamma, V_{IV} + A\Omega_\gamma A')$$

where $A = (x'z(z'z)^{-1}z'x)^{-1}x'z$
I If $\mu_\gamma = 0$, then this approach simply leads to a revised variance for the IV/TSLS estimator

Stata ado files available on Conley's website


Page 261: Microeconometrics Lecture Notes

Selection on Unobservables
IV: Heterogeneous Treatment Effects

Assume a binary endogenous regressor, $D$, and a binary instrument, $z$

Motivation arises from the fact that the treatment effect may vary across $i$ and agents may act on observation-specific gains when making the treatment decision

Admitting this possibility implies that one must think more carefully about what parameter one is estimating


Page 262: Microeconometrics Lecture Notes

Linear model

Setup (from earlier potential outcomes framework)

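$$\begin{aligned}
y_i &\equiv y_{0i} + D_i(y_{1i} - y_{0i}) \\
&= \alpha_0 + \tilde x_i\beta + \upsilon_{0i} + D_i(\alpha_1 + \tilde x_i\beta + \upsilon_{1i} - \alpha_0 - \tilde x_i\beta - \upsilon_{0i}) \\
&= \alpha_0 + \tilde x_i\beta + (\alpha_1 - \alpha_0 + \upsilon_{1i} - \upsilon_{0i})D_i + \upsilon_{0i} \\
&\equiv x_i\beta + \Delta_i D_i + \varepsilon_i
\end{aligned}$$

Define $\Delta_i = (\alpha_1 - \alpha_0) + (\upsilon_{1i} - \upsilon_{0i}) \equiv \Delta + \Delta_i^*$

Substitution implies

$$y_i = x_i\beta + \Delta D_i + (\Delta_i^* D_i + \varepsilon_i)$$

where $\Delta_i^* D_i + \varepsilon_i$ is the composite error term, which differs from the usual error term for the treated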


Page 263: Microeconometrics Lecture Notes

A valid IV in the homogeneous treatment effects setup requires

$$E[\varepsilon_i|x_i, D_i, z_i] = E[\varepsilon_i|x_i, D_i]$$

but now

$$E[\Delta_i^* D_i + \varepsilon_i|x_i, D_i, z_i] = E[\Delta_i^* D_i + \varepsilon_i|x_i, D_i]$$

is required

Thus, $z$ must be
I Correlated with $D_i$ (as usual)
I Uncorrelated with the error term from the structural model and individual-specific gains (or losses) from treatment
F Not possible unless (i) $\Delta_i^* = 0 \ \forall i$ (implying a constant treatment effect) or (ii) $\Delta_i^* \perp D_i | x_i$ (implying that agents either do not know or do not act on specific gains ... no essential heterogeneity)
F Model with $\Delta_i^*$ and $D_i$ correlated known as Correlated Random Coefficients (CRC) model


Page 264: Microeconometrics Lecture Notes

Much more restrictive requirement
I Example: if $z$ is an exogenous variable representing the cost of participation in the treatment (e.g., distance to job training center), then high $z$ will lead to no participation unless the benefit from participation, $\Delta_i$, is very high; if $z$ is low, one will participate if $\Delta_i$ is low or high $\Rightarrow$ positive correlation between $z$ and $\Delta_i$ conditional on $D_i$

If $z$ is uncorrelated with $\varepsilon$, but correlated with $\Delta_i$, then IV estimates are still useful, but identify a different parameter

Parameter known as local average treatment effect (LATE)


Page 265: Microeconometrics Lecture Notes

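Formally, given the model (ignoring $x$)

$$y_i = \alpha + \Delta D_i + (\Delta_i^* D_i + \varepsilon_i)$$

and an instrument, $z$, we have

$$\mathrm{plim}\ \hat\Delta_{OLS} = \frac{\mathrm{Cov}(y, D)}{\mathrm{Var}(D)} = \Delta + \frac{\mathrm{Cov}(\varepsilon, D) + \mathrm{Cov}(\Delta^* D, D)}{\mathrm{Var}(D)} \neq \Delta$$

$$\mathrm{plim}\ \hat\Delta_{IV} = \frac{\mathrm{Cov}(y, z)}{\mathrm{Cov}(D, z)} = \Delta + \frac{\mathrm{Cov}(\varepsilon, z) + \mathrm{Cov}(\Delta^* D, z)}{\mathrm{Cov}(D, z)} = \Delta + \frac{\mathrm{Cov}(\Delta^* D, z)}{\mathrm{Cov}(D, z)} \neq \Delta$$

where the last inequality holds unless (i) $\Delta_i^* = 0 \ \forall i$ or (ii) $\Delta_i^* \perp D_i | x_i$ (as stated above)

How do we interpret $\hat\Delta_{IV}$?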

Page 266: Microeconometrics Lecture Notes

LATE

Assume a binary endogenous regressor, D, and a binary instrument,z , and no other covariates (for simplicity)

Four potential subpopulations

                      z = 0    z = 1
Never Takers (NT)     D = 0    D = 0
Defiers (DF)          D = 1    D = 0
Compliers (C)         D = 0    D = 1
Always Takers (AT)    D = 1    D = 1

Compliers are the key, as their treatment status varies with theinstrument


Page 267: Microeconometrics Lecture Notes

Recall, the Wald estimator

$$\hat\Delta_{IV} = \frac{E[y|z = 1] - E[y|z = 0]}{\Pr(D = 1|z = 1) - \Pr(D = 1|z = 0)}$$

Numerator terms may be expressed as

$$E[y|z = j] = E[y_1|AT]\Pr(AT) + E[y_j|C]\Pr(C) + E[y_{(1-j)}|DF]\Pr(DF) + E[y_0|NT]\Pr(NT), \quad j = 0, 1$$


Page 268: Microeconometrics Lecture Notes

Denominator terms may be expressed as

$$\begin{aligned}
\Pr[D = 1|z = j] &= \Pr[D = 1|z = j, AT]\Pr(AT) + \Pr[D = 1|z = j, C]\Pr(C) \\
&\quad + \Pr[D = 1|z = j, DF]\Pr(DF) + \Pr[D = 1|z = j, NT]\Pr(NT), \quad j = 0, 1 \\
&= \begin{cases} \Pr(AT) + \Pr(C) & \text{if } j = 1 \\ \Pr(AT) + \Pr(DF) & \text{if } j = 0 \end{cases}
\end{aligned}$$


Page 269: Microeconometrics Lecture Notes

Wald estimator reduces to

$$\hat\Delta_{IV} = \frac{\{E[y_1|C]\Pr(C) + E[y_0|DF]\Pr(DF)\} - \{E[y_0|C]\Pr(C) + E[y_1|DF]\Pr(DF)\}}{\Pr(C) - \Pr(DF)}$$

which is a weighted average of the treatment effect for compliers and the negative of the treatment effect for defiers

Assumptions

(LATE.i) Independence: $\{y_0, y_1, D_0, D_1\} \perp z$, where $D_j$, $j = 0, 1$, are potential treatment assignments
(LATE.ii) Exclusion: $E[y_0|z] = E[y_0]$; $E[y_1|z] = E[y_1]$
(LATE.iii) First-Stage/Compliers: $\Pr(C) > 0 \Rightarrow \Pr(D = 1|z)$ is a non-trivial function of $z$
(LATE.iv) Monotonicity: $\Pr(D_i = 1|z_i = 1) \geq \Pr(D_i = 1|z_i = 0) \ \forall i \Rightarrow \Pr(DF) = 0$


Page 270: Microeconometrics Lecture Notes

Imposing these assumptions $\Rightarrow$

$$\hat\Delta_{IV} = \hat\Delta_{LATE} = E[y_1 - y_0|C]$$

which is a parameter defined with respect to a particular instrument

Comments
I LATE is a well-defined economic parameter
I Whether it is an interesting parameter is a different matter
I Not possible to know who are the compliers in the data
I Interpretation is similar, but derivation more complex, if $D$ or $z$ is continuous
F Continuous $z$ estimates the local instrumental variable (LIV) parameter (Heckman and Vytlacil 1999)
I With multiple instruments, things become thorny ... different instruments, even if all valid, potentially identify different parameters!
F No reason why different IV estimates should be the same
F Using multiple IVs yields a weighted average of different LATEs
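
The LATE result can be checked on a small constructed population. In the sketch below, the type shares and potential outcomes are hypothetical, there are no defiers (monotonicity holds), and each type appears equally often under $z = 0$ and $z = 1$ so that the instrument is independent of type by construction.

```python
def wald(units):
    """Wald estimate from a list of (z, D, y) tuples."""
    def mean(vals):
        return sum(vals) / len(vals)

    y1 = mean([y for z, d, y in units if z == 1])
    y0 = mean([y for z, d, y in units if z == 0])
    d1 = mean([d for z, d, y in units if z == 1])
    d0 = mean([d for z, d, y in units if z == 0])
    return (y1 - y0) / (d1 - d0)


# Hypothetical types: (count, D when z=0, D when z=1, y0, y1)
types = {
    "NT": (30, 0, 0, 1.0, 4.0),   # effect 3.0, never treated
    "C":  (50, 0, 1, 2.0, 3.0),   # effect 1.0, treatment follows z
    "AT": (20, 1, 1, 5.0, 5.5),   # effect 0.5, always treated
}

units = []
for count, d_z0, d_z1, y0, y1 in types.values():
    for z in (0, 1):
        d = d_z1 if z == 1 else d_z0
        y = y1 if d == 1 else y0          # observed outcome
        units += [(z, d, y)] * count

late = wald(units)

# Population ATE for comparison: share-weighted average of all effects.
total = sum(t[0] for t in types.values())
ate = sum(t[0] * (t[4] - t[3]) for t in types.values()) / total
print(late, ate)
```

The Wald estimate recovers the complier effect, not the ATE, even though the instrument is valid.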


Page 271: Microeconometrics Lecture Notes

DiNardo & Lee (2011) provide an alternative interpretation of the IV estimand
I They replace the monotonicity assumption with what they call a probabilistic monotonicity assumption
I The result is that $\hat\Delta_{IV}$ is shown to be a weighted average of $\Delta_i$ where the weights are proportional to the increase in $\Pr(D_i = 1|z_i = 1) - \Pr(D_i = 1|z_i = 0)$
F Under the monotonicity assumption,

$$\Pr(D_i = 1|z_i = 1) - \Pr(D_i = 1|z_i = 0) = \begin{cases} 0 & \text{if type} = AT, NT \\ 1 & \text{if type} = C \end{cases}$$

so that only compliers receive positive weight
F This follows from the assumption that $D$ is a deterministic fn of $z$
F Probabilistic monotonicity relaxes this assumption and allows $D$ to be a nondecreasing fn of $z$ (conditional on type)


Page 272: Microeconometrics Lecture Notes

Not possible to infer anything about $\Delta_{ATE}$, $\Delta_{ATT}$, or $\Delta_{ATU}$ without additional assumptions about how compliers compare to the rest of the population
I Vytlacil et al. (2009) working on when one can learn the sign of $\Delta_{ATE}$
I DiNardo & Lee (2011) discuss extrapolating to the $\Delta_{ATE}$
I Heckman et al. (2010) propose two tests of the CRC assumption

$$H_o: \Delta_i \perp D_i | x_i$$

F Test #1 based on comparison of different (valid) IV estimates; under $H_o$ different IVs provide consistent estimates of the same parameter even if they lead to different sub-populations of compliers
F Test #2 based on testing for a linear relationship between $y$ and the estimated propensity score conditional on $x$


Page 273: Microeconometrics Lecture Notes

Selection on Unobservables
IV: Finding Instruments

Economic theory ... what determines participation, but not outcomes?

Exogenous variation in program availability (across space or over time) ... must be exogenous

Natural experiments ... twins, sex composition, miscarriages, Mariel Cuban boatlift, Russian immigration to Israel

Randomized experiments (even if imperfect compliance) ... Project STAR


Page 274: Microeconometrics Lecture Notes

Fuzzy regression discontinuity design

Recall from sharp RD case that we require the existence of the following limits

$$D^+ = \lim_{s \downarrow \bar s} \Pr(D = 1|s)$$

$$D^- = \lim_{s \uparrow \bar s} \Pr(D = 1|s)$$

and $D^+ \neq D^-$
I Sharp RD setup implies $D^+ = 1$ and $D^- = 0$
I Fuzzy RD setup implies $1 \geq D^+ > D^- \geq 0$


Page 275: Microeconometrics Lecture Notes

Formally

(FRD.i) Treatment assignment is a discontinuous function of $s$ (with a known threshold, $\bar s$)

$$D_i = D(s_i, \upsilon_i)$$

where

$$\Pr(D = 1) = \Pr(D = 1|s \geq \bar s)\Pr(s \geq \bar s) + \Pr(D = 1|s < \bar s)\Pr(s < \bar s)$$

(FRD.ii) Positive density at the threshold: $f_S(\bar s) > 0$
(FRD.iii) Outcomes are continuous in $s$ at least around $\bar s$ and do not depend on whether $s \gtrless \bar s$
(FRD.iv) For each agent, the dbn of $s$ is continuous at least around $\bar s$


Page 276: Microeconometrics Lecture Notes

Notes
I Endogenous treatment variable, $D$, depends on observed score variable, $s$, and stochastic element
I Discrete jump in $\Pr(D = 1)$ at $\bar s$
I Example: $\Pr(D = 1) = \max\{0, 0.5s + 0.25 \cdot I(s > 0.5) + \upsilon\}$

[Figure: Pr(D=1) plotted against x over the range 0 to 1]

Implies $D_i = E[D|s_i] + \upsilon_i$, where $\mathrm{Cov}(\varepsilon, \upsilon) \neq 0$

Page 277: Microeconometrics Lecture Notes

OLS estimation of

$$y_i = x_i\beta + \Delta D_i + f(s_i) + \varepsilon_i$$

where $x$ is a vector of exogenous controls, is biased, even with a flexible function of $s$ included

Solution
I Estimate propensity score, where $f(s)$ is included along with the indicator $I(s > \bar s)$ $\Rightarrow \widehat{p(D)}$
I Estimate by OLS

$$y_i = x_i\beta + \Delta\widehat{p(D_i)} + f(s_i) + \varepsilon_i$$

I Equivalent to TSLS, with $I(s > \bar s)$ as the instrument, when $f(s)$ is chosen parametrically
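
A stripped-down version of the fuzzy RD idea: with no covariates and $f(s)$ flat near the cutoff, TSLS with $I(s > \bar s)$ as the instrument is just a local Wald ratio, the jump in mean $y$ divided by the jump in mean $D$. The simulation below is hypothetical throughout (treatment probabilities of 0.2 and 0.7 on either side of a cutoff at 0.5, true $\Delta = 2$, bandwidth 0.1, helper name `fuzzy_rd_wald`), and local means stand in for a properly specified $f(s)$.

```python
import random


def fuzzy_rd_wald(s, d, y, cutoff, h):
    """Jump in mean y over jump in mean D, using obs within h of the cutoff."""
    above = [i for i in range(len(s)) if cutoff < s[i] <= cutoff + h]
    below = [i for i in range(len(s)) if cutoff - h <= s[i] <= cutoff]

    def mean(v, idx):
        return sum(v[i] for i in idx) / len(idx)

    jump_y = mean(y, above) - mean(y, below)
    jump_d = mean(d, above) - mean(d, below)
    return jump_y / jump_d


rng = random.Random(2)
n, cutoff, delta = 20000, 0.5, 2.0
s, d, y = [], [], []
for _ in range(n):
    si = rng.random()                        # score
    vi = rng.random()                        # unobservable driving take-up
    p = 0.2 + 0.5 * (si > cutoff)            # Pr(D=1) jumps from 0.2 to 0.7
    di = 1 if vi < p else 0
    ei = (0.5 - vi) + rng.gauss(0, 0.5)      # error correlated with take-up
    s.append(si)
    d.append(di)
    y.append(1.0 + delta * di + ei)

delta_rd = fuzzy_rd_wald(s, d, y, cutoff, h=0.1)

# Naive treated-minus-untreated comparison, biased by selection on v.
treated = [yi for yi, di in zip(y, d) if di == 1]
untreated = [yi for yi, di in zip(y, d) if di == 0]
delta_naive = sum(treated) / len(treated) - sum(untreated) / len(untreated)
print(round(delta_naive, 3), round(delta_rd, 3))
```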


Page 278: Microeconometrics Lecture Notes

Interpretation
I Typical interpretation: RD identifies the LATE at $\bar s$
I DiNardo & Lee (2011) interpret the estimated parameter as a weighted average of $\Delta_i$ where the weights are proportional to (i) the probability of $s_i$ being in the neighborhood of $\bar s$ and (ii) the influence of crossing the threshold, $\bar s$, on the probability of receiving the treatment


Page 279: Microeconometrics Lecture Notes

Selection on Unobservables
Methods Not Requiring Exclusion Restrictions

Several methods exist that do not rely on a typical exclusion restriction for identification
1 Heckman bivariate normal selection model
2 Millimet & Tchernis (2011) bias-corrected estimator
3 Higher moments
4 Covariance restrictions

All such methods must replace the assumption concerning an exclusion restriction with some other identifying assumption (there is no such thing as a free lunch)


Page 280: Microeconometrics Lecture Notes

Selection on Unobservables
Heckman Bivariate Normal Selection Model

Requires fairly strong parametric assumptions to circumvent theselection on unobservables problem

Also useful to solve problems of non-random sample selection(discussed later)


Page 281: Microeconometrics Lecture Notes

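Treatment effects model with common effect

Setup

$$y_{0i} = x_i\beta_0 + \varepsilon_i$$

$$y_{1i} = x_i\beta_1 + \varepsilon_i$$

$$y_i = D_i y_{1i} + (1 - D_i)y_{0i}$$

$$D_i^* = z_i\gamma + u_i$$

$$D_i = \begin{cases} 1 & \text{if } D_i^* > 0 \\ 0 & \text{if } D_i^* \leq 0 \end{cases}$$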


Page 282: Microeconometrics Lecture Notes

Notes
I $\varepsilon_i$ = common error component (or common effect) in both potential outcome equations
I $\beta$s allowed to differ across outcome equations
I $D_i^*$ = latent indicator of treatment status
I Model rules out selection on observables assumption since unobservables associated with treatment status, $u$, are correlated with unobservables affecting outcomes conditional on $x$

Assumptions

(BVN.i) $\varepsilon, u \sim N_2(0, 0, \sigma^2_\varepsilon, \sigma^2_u, \rho)$
(BVN.ii) $\varepsilon, u \perp x, z$
(BVN.iii) $\sigma^2_u = 1$


Page 283: Microeconometrics Lecture Notes

Parameters of interest
I Given the setup, individual-specific treatment effect is given by

$$\Delta_i = y_{1i} - y_{0i} = x_i(\beta_1 - \beta_0)$$

I Average treatment effects are

$$\Delta_{ATE} = E[\Delta_i] = E[X_i](\beta_1 - \beta_0)$$

$$\Delta_{ATT} = E[\Delta_i|D_i = 1] = E[X_i|D_i = 1](\beta_1 - \beta_0)$$

$$\Delta_{ATU} = E[\Delta_i|D_i = 0] = E[X_i|D_i = 0](\beta_1 - \beta_0)$$

I Implies consistent estimates of all three parameters require consistent estimates of $\beta_0, \beta_1$
I Two naïve options:
F Split sample into $D = 1$ and $D = 0$, and regress $y$ on $x$ via OLS in each sub-sample
F Pool sample, regress $y$ on $x$, $Dx$
I Under selection on unobservables, neither option produces consistent estimates


Page 284: Microeconometrics Lecture Notes

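Conditional expectations (following from the properties of conditional normal random variables)
I Of the outcome in the treated state for the treated

$$\begin{aligned}
E[y_i|D_i = 1, x_i, z_i] &= x_i\beta_1 + E[\varepsilon_i|u_i > -z_i\gamma] \\
&= x_i\beta_1 + \rho\sigma_\varepsilon\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right] \\
&= x_i\beta_1 + \rho\sigma_\varepsilon[\lambda(z_i\gamma)]
\end{aligned}$$

where $\lambda(\cdot)$ is known as the Inverse Mills' Ratio
I Of the outcome in the untreated state for the untreated

$$\begin{aligned}
E[y_i|D_i = 0, x_i, z_i] &= x_i\beta_0 + E[\varepsilon_i|u_i \leq -z_i\gamma] \\
&= x_i\beta_0 + \rho\sigma_\varepsilon\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right]
\end{aligned}$$

I Given $\mathrm{Corr}(\varepsilon, u) \neq 0$, error term is no longer well-behaved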
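
The two selection terms are easy to compute from the standard normal pdf and cdf. The sketch below uses only the standard library; the function names are hypothetical, and the index value $z_i\gamma = 0.3$ is arbitrary.

```python
import math


def phi(c):
    """Standard normal pdf."""
    return math.exp(-0.5 * c * c) / math.sqrt(2 * math.pi)


def Phi(c):
    """Standard normal cdf, via the error function."""
    return 0.5 * (1 + math.erf(c / math.sqrt(2)))


def selection_term_treated(c):
    """phi(c) / Phi(c): the Inverse Mills' Ratio, used when D = 1."""
    return phi(c) / Phi(c)


def selection_term_untreated(c):
    """-phi(c) / (1 - Phi(c)): the analogous term when D = 0."""
    return -phi(c) / (1 - Phi(c))


c = 0.3   # hypothetical value of the index z_i * gamma
lam1 = selection_term_treated(c)
lam0 = selection_term_untreated(c)

# The terms average to zero over treatment status, as E[u] = 0 requires:
# Pr(D=1) * lam1 + Pr(D=0) * lam0 = phi(c) - phi(c) = 0.
check = Phi(c) * lam1 + (1 - Phi(c)) * lam0
print(lam1, lam0, check)
```

The treated term is always positive and the untreated term always negative, so under $\rho > 0$ the treated have above-average $\varepsilon$ and the untreated below-average $\varepsilon$.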


Page 285: Microeconometrics Lecture Notes

Estimation: Method #1

Estimate the outcome equation for the treated and the untreatedseparately via OLS

Consistent estimates of β0, β1 require inclusion of the selection terms

Selection terms are estimable by
1 Estimating a probit model for treatment assignment $\Rightarrow \hat\gamma$
2 Estimating the selection terms

$$\frac{\phi(z_i\hat\gamma)}{\Phi(z_i\hat\gamma)} \quad \text{and} \quad \frac{-\phi(z_i\hat\gamma)}{1 - \Phi(z_i\hat\gamma)}$$

3 Including these as additional covariates in each second-stage regression


Page 286: Microeconometrics Lecture Notes

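Upon estimation of $\hat\beta_0, \hat\beta_1$ ...
I Predict $\hat y_{1i}, \hat y_{0i} \ \forall i$
I Estimate treatment effect parameters

$$\hat\Delta_{ATE} = \overline{\hat y_{1i} - \hat y_{0i}}$$

$$\hat\Delta_{ATT} = \overline{\hat y_{1i} - \hat y_{0i}}$$

$$\hat\Delta_{ATU} = \overline{\hat y_{1i} - \hat y_{0i}}$$

where ATE computes the mean for the entire sample, and the latter two compute means using only the treated and untreated, respectively
I Equivalently,

$$\hat\Delta_{ATE} = \bar x(\hat\beta_1 - \hat\beta_0)$$

$$\hat\Delta_{ATT} = \bar x_1(\hat\beta_1 - \hat\beta_0)$$

$$\hat\Delta_{ATU} = \bar x_0(\hat\beta_1 - \hat\beta_0)$$

where $\bar x$ is the sample mean, and $\bar x_k$, $k = 0, 1$, is the sample mean in the sub-sample with $D = k$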
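
Once $\hat\beta_0$ and $\hat\beta_1$ are in hand, the three parameters are just averages of $x_i(\hat\beta_1 - \hat\beta_0)$ over different sub-samples. The sketch below takes the coefficient estimates as given; the numbers and the function name `treatment_effects` are hypothetical.

```python
def treatment_effects(x, d, beta0, beta1):
    """Return (ATE, ATT, ATU) from covariate rows x and coefficient vectors.

    Each row of x includes a leading 1 for the intercept.
    """
    effects = [
        sum(xk * (b1 - b0) for xk, b0, b1 in zip(row, beta0, beta1))
        for row in x
    ]

    def mean(vals):
        return sum(vals) / len(vals)

    ate = mean(effects)                                         # full sample
    att = mean([e for e, di in zip(effects, d) if di == 1])     # treated only
    atu = mean([e for e, di in zip(effects, d) if di == 0])     # untreated only
    return ate, att, atu


# Hypothetical data: intercept plus one covariate, four observations.
x = [[1, 0.0], [1, 1.0], [1, 2.0], [1, 3.0]]
d = [0, 0, 1, 1]
beta0_hat = [1.0, 0.5]   # hypothetical selection-corrected estimates
beta1_hat = [1.5, 1.0]

ate, att, atu = treatment_effects(x, d, beta0_hat, beta1_hat)
print(ate, att, atu)
```

Here the treated have larger $x$ and the gain rises with $x$, so ATT exceeds ATE, which exceeds ATU.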


Page 287: Microeconometrics Lecture Notes

Estimation: Method #2

Estimate a single outcome equation with no restriction

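$$y_i = x_i\beta_0 + x_iD_i(\beta_1 - \beta_0) + \beta_{\lambda 1}D_i\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right] + \beta_{\lambda 0}(1 - D_i)\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right] + \eta_i$$

This does not impose the restriction that the coefficient on both selection terms should be the same: $\rho\sigma_\varepsilon$

Thus, testing $H_o: \beta_{\lambda 0} = \beta_{\lambda 1}$ constitutes a specification test of the underlying model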


Page 288: Microeconometrics Lecture Notes

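Note

$$\begin{aligned}
\eta_i &= \varepsilon_i - \beta_{\lambda 1}D_i\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right] - \beta_{\lambda 0}(1 - D_i)\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right] \\
&= \varepsilon_i - D_iE[\varepsilon_i|D_i = 1] - (1 - D_i)E[\varepsilon_i|D_i = 0]
\end{aligned}$$

which is a well-behaved error term since the portion of the error term that is correlated with treatment assignment now appears in the model in the form of the selection correction terms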


Page 289: Microeconometrics Lecture Notes

Estimation: Method #3

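Estimate a single outcome equation imposing the restriction that $\beta_{\lambda 0} = \beta_{\lambda 1}$

$$y_i = x_i\beta_0 + x_iD_i(\beta_1 - \beta_0) + \beta_\lambda\left\{D_i\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right] + (1 - D_i)\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right]\right\} + \eta_i$$

Efficiency gain if, in fact, the restriction is true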


Page 290: Microeconometrics Lecture Notes

Estimation: Method #4

Maximum likelihood estimation of the system of three equations

Above estimators are known as the 'control function' approach since selection terms control for selection on unobservables

ML is not a control function approach, but rather directly incorporates the covariance structure of the errors into the estimation by jointly estimating the system of equations

Benefits: yields an estimate of $\rho$ along with a std error, more efficient if parametric assumptions are true

Cost: results are less robust if parametric assumptions of the modelare violated


Page 291: Microeconometrics Lecture Notes

Comments

There is no 'instrument' or exclusion restriction required for identification
I Identification arises from the non-linearity of the selection correction terms, which in turn arises from the assumption of bivariate normality
I Exclusion restrictions (a variable in $z$ not in $x$) would be nice

Semi-parametric versions exist
I Relax dependence on bivariate normality
I Require exclusion restrictions
I One version includes a polynomial of the propensity score in the regression model; motivation is to include a flexible functional form to capture the selection terms without reliance on bivariate normality

Bivariate probit treatment effects model
I Similar to above models, except outcome of interest is binary (e.g., employment following a job training program)
I Similar estimation to above by ML, except likelihood is based on a bivariate probit model (same as in Altonji et al. (2005) unconstrained bivariate probit model)

DL Millimet (SMU) ECO 7377 Fall 2011 291 / 407

Page 292: Microeconometrics Lecture Notes

Aside:

Typical IV estimator can also be implemented using a control function approach
- TSLS estimator of the model

$$y_i = \beta_1 x_{1i} + x_{2i}\beta_2 + \varepsilon_i$$

$$x_{1i} = z_i\pi_1 + x_{2i}\pi_2 + u_i$$

  is equivalent to OLS estimation of

$$y_i = \beta_1 x_{1i} + x_{2i}\beta_2 + u_i + \tilde\varepsilon_i$$

  where $u_i$ is replaced with the OLS estimate of the first-stage residual
- Since $\hat u_i = x_{1i} - z_i\hat\pi_1 - x_{2i}\hat\pi_2$, this is not linearly independent of $x_1$ and $x_2$ unless $\pi_1 \neq 0$
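The equivalence can be checked numerically. The sketch below is illustrative only: the data-generating values, variable names, and the `ols` helper are assumptions, not part of the notes. It adds the first-stage residual as an extra regressor (with its own freely estimated coefficient) and compares the coefficients on $x_1$, $x_2$ with TSLS.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5_000

# Simulated data; every numeric value here is an illustrative assumption.
z = rng.normal(size=n)                                   # excluded instrument
x2 = np.column_stack([np.ones(n), rng.normal(size=n)])   # exogenous regressors (with intercept)
u = rng.normal(size=n)
eps = 0.8 * u + rng.normal(size=n)                       # Cov(eps, u) != 0, so x1 is endogenous
x1 = 1.0 * z + x2 @ np.array([0.5, -0.3]) + u
y = 2.0 * x1 + x2 @ np.array([1.0, 0.7]) + eps


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# First stage: regress x1 on (z, x2), keep fitted values and residuals.
W = np.column_stack([z, x2])
x1_hat = W @ ols(W, x1)
u_hat = x1 - x1_hat

# TSLS: second stage uses the fitted values in place of x1.
b_tsls = ols(np.column_stack([x1_hat, x2]), y)

# Control function: keep x1 itself and add the first-stage residual.
b_cf = ols(np.column_stack([x1, x2, u_hat]), y)

print("TSLS coefficients:            ", b_tsls)
print("Control function (x1, x2):    ", b_cf[:3])
print("Coefficient on the residual:  ", b_cf[3])
```

The t-statistic on the residual term in the second regression is the usual regression-based test of exogeneity of $x_1$.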


Page 293: Microeconometrics Lecture Notes

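Treatment effects model without the common effect assumption

Relaxation of the common effect assumption allows for heterogeneous effects of the treatment even conditional on x

Setup

$$\begin{aligned}
y_{0i} &= x_i\beta_0 + \varepsilon_{0i} \\
y_{1i} &= x_i\beta_1 + \varepsilon_{1i} \\
&= x_i\beta_1 + [(\varepsilon_{1i} - \varepsilon_{0i}) + \varepsilon_{0i}] \\
&= x_i\beta_1 + [\delta_i + \varepsilon_{0i}] \\
y_i &= D_i y_{1i} + (1 - D_i) y_{0i} \\
D_i^* &= z_i\gamma + u_i \\
D_i &= \begin{cases} 1 & \text{if } D_i^* > 0 \\ 0 & \text{if } D_i^* \le 0 \end{cases}
\end{aligned}$$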

Page 294: Microeconometrics Lecture Notes

Notes
- $\delta_i$ = obs-specific gain to treatment (conditional on x)
- $\Delta_i = y_{1i} - y_{0i} = x_i(\beta_1 - \beta_0) + \delta_i$ (heterogeneous treatment effects given x)
- Selection into treatment may depend on either $\varepsilon_{0i}$ (untreated outcome level given x) or $\delta_i$ (obs-specific gains given x)
- Otherwise, intuition is identical to common effect version

Assumptions (replaces (BVN.i))

(BVN.i) $\varepsilon_0, \varepsilon_1, u \sim N(0, \Sigma)$, where

$$\Sigma = \begin{bmatrix} \sigma^2_{\varepsilon_0} & \rho_{01} & \rho_{0u} \\ & \sigma^2_{\varepsilon_1} & \rho_{1u} \\ & & 1 \end{bmatrix}$$

Page 295: Microeconometrics Lecture Notes

Conditional expectations

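$$E[\varepsilon_{0i} \mid D_i = 1, x_i, z_i] = \rho_{0u}\sigma_{\varepsilon_0}\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right]$$

$$E[\delta_i \mid D_i = 1, x_i, z_i] = \rho_{\delta u}\sigma_{\delta}\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right]$$

$$E[\varepsilon_{0i} \mid D_i = 0, x_i, z_i] = \rho_{0u}\sigma_{\varepsilon_0}\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right]$$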

Page 296: Microeconometrics Lecture Notes

Estimation

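Generalization of the previous two-step approach in the common effect model

Estimating equation

$$y_i = x_i\beta_0 + x_i D_i(\beta_1 - \beta_0) + \tilde\beta_{\lambda 1} D_i\left[\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right] + \beta_{\lambda 0}(1 - D_i)\left[\frac{-\phi(z_i\gamma)}{1 - \Phi(z_i\gamma)}\right] + \zeta_i$$

where

$$\tilde\beta_{\lambda 1} = \rho_{0u}\sigma_{\varepsilon_0} + \rho_{\delta u}\sigma_{\delta}, \qquad \beta_{\lambda 0} = \rho_{0u}\sigma_{\varepsilon_0}$$

Selection terms are obtained by estimating the first-stage probit model for D

ML estimation of the entire model is feasible, but it requires estimation of a trivariate normal dbn (computationally difficult)

$\rho_{01}$ is not identified since we never observe $y_1$ and $y_0$ for the same i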

Page 297: Microeconometrics Lecture Notes

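Upon estimation of $\hat\beta_0$, $\hat\beta_1$ ...
- Predict $\hat y_{1i}$, $\hat y_{0i}$ $\forall i$
- Estimate $\hat\Delta_{ATE}$

$$\hat\Delta_{ATE} = \overline{\hat y_{1i} - \hat y_{0i}} = \bar x(\hat\beta_1 - \hat\beta_0)$$

  where ATE computes the mean for the entire sample
- ATT is given by

$$\begin{aligned}
\Delta_{ATT} &= E_{x_i \mid D_i = 1}[x_i(\beta_1 - \beta_0)] + E_{\delta_i \mid D_i = 1}[\delta_i] \\
&= E_{x_i \mid D_i = 1}[x_i(\beta_1 - \beta_0)] + E_{z_i \mid D_i = 1}\left[\rho_{\delta u}\sigma_{\delta}\,\frac{\phi(z_i\gamma)}{\Phi(z_i\gamma)}\right]
\end{aligned}$$

  - If there is no selection on unobservable gains, then $\rho_{\delta u} = 0$ $\Rightarrow$ common effect model
  - $\tilde\beta_{\lambda 1} - \beta_{\lambda 0} = \rho_{\delta u}\sigma_{\delta}$ $\Rightarrow$ $\widehat{\rho_{\delta u}\sigma_{\delta}} = \hat{\tilde\beta}_{\lambda 1} - \hat\beta_{\lambda 0}$, which gives the sign of the selection on gains (which one expects to be positive if obs know their unobservable gains)
  - Estimate obtained by replacing expectations with sample averages within the treatment group
- ATU obtained in similar fashion, but average over x, z in control group

Stata: -treatreg-, -biprobit-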

Page 298: Microeconometrics Lecture Notes

Selection on Unobservables
Millimet & Tchernis (2011)

Builds on the minimum biased approach (discussed earlier) by offering a bias-corrected procedure

Recall, under certain assumptions the bias of the ATT, ATE at some value of the propensity score, p(x), is given by

$$B_{ATT}[p(x)] = \rho_{0u}\sigma_0\,\frac{\phi(\Phi^{-1}(p(x)))}{p(x)[1 - p(x)]}$$

$$B_{ATE}[p(x)] = \{\rho_{0u}\sigma_0 + [1 - p(x)]\rho_{\delta u}\sigma_{\delta}\}\,\frac{\phi(\Phi^{-1}(p(x)))}{p(x)[1 - p(x)]}$$

where
- $\rho_{0u}$ = selection on unobservables affecting outcome in untreated state
- $\rho_{\delta u}$ = selection on unobserved, individual-specific gains

$B_{ATT}[p(x)]$ is minimized at p(x) = 0.5; $B_{ATE}[p(x)]$ does not have a unique minimum
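As a quick numerical illustration of the claim about $B_{ATT}$, the sketch below evaluates only the propensity-score factor $\phi(\Phi^{-1}(p))/\{p(1-p)\}$ on a grid; $\rho_{0u}\sigma_0$ is a constant scale and is left out. The function name `bias_shape` and the grid are assumptions made for the example.

```python
from statistics import NormalDist

_std_normal = NormalDist()  # standard normal, mean 0 and sd 1


def bias_shape(p):
    """phi(Phi^{-1}(p)) / (p * (1 - p)): the part of B_ATT that varies with p."""
    return _std_normal.pdf(_std_normal.inv_cdf(p)) / (p * (1.0 - p))


# Evaluate on p = 0.01, 0.02, ..., 0.99 and locate the smallest value.
grid = [k / 100 for k in range(1, 100)]
p_min = min(grid, key=bias_shape)

print("bias-minimizing p on the grid:", p_min)
print("factor at p = 0.5:", round(bias_shape(0.5), 4))
print("factor at p = 0.1:", round(bias_shape(0.1), 4))
```

The factor is symmetric around 0.5 and grows toward both tails, which is why the minimum-biased approach keeps observations with propensity scores near 0.5 when the target is the ATT.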


Page 299: Microeconometrics Lecture Notes

Minimum-biased (MB) estimation technique
- Stage 1: Estimate the propensity score (e.g., probit model)
- Stage 2: Retain only those observations with a propensity score, $\hat p(x_i)$, within a fixed neighborhood around $p^*(x)$, the bias-minimizing propensity score
- Stage 3: Estimate the ATE or ATT using any propensity-score based estimator that relies on CIA using this sub-sample

For ATE, add Stage 1.5: Estimate the error correlations using Heckman BVN model

BC estimator amends the previous MB estimator by removing the estimated bias

$$\hat\Delta^{k}_{BC} = \hat\Delta^{k} - \int \widehat{B}_k[p(x)]\, f_k(x)\,dx, \qquad k = ATE, ATT, ATU$$

where $f_k(x)$ is the appropriate dbn needed to estimate parameter k

Millimet & Tchernis (2011) find some benefit to this estimator, particularly in large samples, using MC

Page 300: Microeconometrics Lecture Notes

Selection on Unobservables
Higher Moments: Lewbel (2010) approach

Originally proposed as a solution to measurement error, but potentially applicable to more general dependence between x and $\varepsilon$ (Lewbel 1997, 2010)

Setup
- "Structural" model

$$y_i = \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

- First-stage model

$$D_i = x_i\pi + u_i$$

  where
  - x includes the intercept
  - $Cov(\varepsilon, u) \neq 0$

D may be discrete or continuous

Page 301: Microeconometrics Lecture Notes

Potential instruments for D include $(z_i - \bar z)u_i$, where $z \subseteq x$

Estimation requires consistently estimating the first-stage and replacing u with $\hat u$

Validity of the IVs requires

(HM.i) $E[z'u^2] \neq 0$
(HM.ii) $E[z'\varepsilon u] = 0$

Restrictions are satisfied if, say,

$$\varepsilon_i = \theta_i + \tilde\varepsilon_i$$

$$u_i = \theta_i + \tilde u_i$$

where $\theta_i$ is a homoskedastic common factor and the sole source of correlation between $\varepsilon$ and u, and $\tilde u$ is heteroskedastic with variance depending on z
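A minimal simulation sketch of the heteroskedasticity-based instrument follows. It builds data that satisfy the common-factor example above (with $z = x$), forms $(z_i - \bar z)\hat u_i$, and uses it as the excluded instrument. All numeric values and helper names are assumptions chosen for illustration; this is not the author's code.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 200_000

# Common-factor design: theta drives Cov(eps, u); u is heteroskedastic in z = x.
x = rng.normal(size=n)
theta = rng.normal(size=n)
u = theta + np.exp(0.5 * x) * rng.normal(size=n)
eps = theta + rng.normal(size=n)

D = 1.0 + 1.0 * x + u                   # first-stage model
y = 2.0 * D + 1.0 + 1.0 * x + eps       # structural model, beta_1 = 2 (assumed truth)

X1 = np.column_stack([np.ones(n), x])   # exogenous regressors incl. intercept


def ols(X, y):
    """OLS coefficients via least squares."""
    return np.linalg.lstsq(X, y, rcond=None)[0]


# Step 1: estimate the first stage by OLS and keep the residual.
u_hat = D - X1 @ ols(X1, D)

# Step 2: generated instrument (z - zbar) * u_hat.
iv = (x - x.mean()) * u_hat

# Step 3: just-identified IV, solving (Z'R) b = Z'y.
R = np.column_stack([D, X1])            # regressors: D, intercept, x
Zm = np.column_stack([iv, X1])          # instruments: generated IV, intercept, x
b_iv = np.linalg.solve(Zm.T @ R, Zm.T @ y)
b_ols = ols(R, y)

print("OLS estimate of beta_1:", b_ols[0])
print("IV  estimate of beta_1:", b_iv[0])
```

In this design OLS is biased upward because $Cov(\varepsilon, u) > 0$, while the generated instrument is relevant only because $Var(u \mid z)$ varies with z; with homoskedastic u the instrument would have no power.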


Page 302: Microeconometrics Lecture Notes

Selection on Unobservables
Higher Moments: Klein & Vella (2009, 2010); Farré et al. (2010)

Setup as in the prior model
- "Structural" model

$$y_i = \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

- First-stage model

$$D_i = x_i\pi + u_i$$

  where
  - x includes the intercept
  - $Cov(\varepsilon, u) \neq 0$

D may be discrete or continuous

Page 303: Microeconometrics Lecture Notes

Identification assumptions

(KV.i) $\varepsilon_i = S_\varepsilon(z_i)\varepsilon_i^*$ and/or $u_i = S_u(z_i)u_i^*$, where $z \subseteq x$, such that $S_\varepsilon(z_i)/S_u(z_i)$ varies across i
(KV.ii) $E[\varepsilon_i^* u_i^*] = \rho$, which is constant

Under (KV.i) and (KV.ii), the structural model may be re-written as

$$y_i = \beta_1 D_i + x_i\beta_2 + \rho\left[\frac{S_\varepsilon(z_i)}{S_u(z_i)}\,u_i\right] + \tilde\varepsilon_i$$

where $\tilde\varepsilon_i$ is now a well-behaved error term

The term in brackets acts as a control function since it controls for selection bias such that, conditional on this term and x, D is no longer correlated with the error term

Klein & Vella (2009) propose a semiparametric estimator of the model

Farré et al. (2010) outline a parametric estimator
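A short sketch of why the bracketed term has this form, under two added assumptions that are not stated on the slide: $\varepsilon_i^*$ and $u_i^*$ have unit variance, and $E[\varepsilon_i^* \mid u_i^*]$ is linear (e.g., under joint normality).

$$\begin{aligned}
E[\varepsilon_i \mid u_i, z_i] &= S_\varepsilon(z_i)\,E[\varepsilon_i^* \mid u_i^*] \\
&= S_\varepsilon(z_i)\,\rho\,u_i^* \\
&= \rho\,\frac{S_\varepsilon(z_i)}{S_u(z_i)}\,u_i
\end{aligned}$$

The last step uses $u_i^* = u_i / S_u(z_i)$. If the ratio $S_\varepsilon(z_i)/S_u(z_i)$ were constant, the control function would be proportional to $u_i = D_i - x_i\pi$ and hence collinear with D and x, which is why (KV.i) requires the ratio to vary across i.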


Page 304: Microeconometrics Lecture Notes

Parametric Estimation

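Assuming

$$S_\varepsilon(z_i) = \sqrt{\exp(z_i\theta_\varepsilon)}$$

$$S_u(z_i) = \sqrt{\exp(z_i\theta_u)}$$

the structural model becomes

$$y_i = \beta_1 D_i + x_i\beta_2 + \rho\left[\frac{\sqrt{\exp(z_i\theta_\varepsilon)}}{\sqrt{\exp(z_i\theta_u)}}\,u_i\right] + \tilde\varepsilon_i$$

Estimate the first-stage by OLS $\Rightarrow$ $\hat u$

Estimate by OLS

$$\ln(\hat u_i^2) = z_i\theta_u + \tilde u_i$$

and form $\hat S_u(z_i) = \sqrt{\exp(z_i\hat\theta_u)}$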

Page 305: Microeconometrics Lecture Notes

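Substitute $\hat u$ and $\hat S_u(z_i)$ into the structural model and estimate the remaining parameters by NLS

$$y_i = \beta_1 D_i + x_i\beta_2 + \rho\left[\frac{\sqrt{\exp(z_i\theta_\varepsilon)}}{\hat S_u(z_i)}\,\hat u_i\right] + \tilde\varepsilon_i$$

While one could stop here, performance is perhaps improved by adding additional steps
- Given NLS estimates of $\beta_1$ and $\beta_2$ $\Rightarrow$ $\hat\varepsilon$
- Estimate by OLS

$$\ln(\hat\varepsilon_i^2) = z_i\theta_\varepsilon + \tilde{\tilde\varepsilon}_i$$

  and form $\hat S_\varepsilon(z_i) = \sqrt{\exp(z_i\hat\theta_\varepsilon)}$
- Estimate by OLS

$$y_i = \beta_1 D_i + x_i\beta_2 + \rho\left[\frac{\hat S_\varepsilon(z_i)}{\hat S_u(z_i)}\,\hat u_i\right] + \tilde\varepsilon_i$$

Obtain std errors via bootstrap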

Page 306: Microeconometrics Lecture Notes

Selection on Unobservables
Higher Moments: Klein & Vella (2009)

Setup
- "Structural" model

$$y_i = \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

- First-stage model

$$D_i = x_i\pi_1 + u_i$$

  where x contains an intercept

When D is binary, one may estimate the first-stage via probit and form an instrument using the propensity score, $\widehat{p(x)}$

Even with no exclusion restriction, $\widehat{p(x)}$ is correlated with D and linearly independent of x (since $\widehat{p(x)} = \Phi(x\hat\pi)$)

However, most of this non-linearity occurs in the tails

Page 307: Microeconometrics Lecture Notes

Additional non-linearity of the IV may be induced if one uses a heteroskedastic probit to form the IV
- $\sigma_u$ is modeled as $\exp(x\delta)$
- $\widehat{p(x)} = \Phi(x\hat\pi / \exp(x\hat\delta))$
- Additional non-linearity is roughly equivalent to using higher-order terms of x as exclusion restrictions

Klein & Vella (2009) also propose a semiparametric version

Page 308: Microeconometrics Lecture Notes

Selection on Unobservables
Higher Moments: Vella & Verbeek (1997); Rummery et al. (1999)

Vella and Verbeek (1997) propose an alternative IV strategy that may also be valid with heteroskedastic errors

Known as Rank Order IV

Setup as in the prior models

$$y_i = \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

$$D_i = x_i\pi + u_i$$

where
- x includes the intercept
- $Cov(\varepsilon, u) \neq 0$

D may be discrete or continuous

Page 309: Microeconometrics Lecture Notes

Identification assumptions

(ROIV.i) An agent's level of unobserved heterogeneity responsible for $Cov(\varepsilon, u) \neq 0$ does not impact y; rather, only the agent's relative position or rank order matters

(ROIV.ii) Data can be partitioned into subsets such that agents may be paired across subsets in a manner leading to pairs with identical ranks in their respective subsets but different levels of D

For example, if y is wages, D is participation in a training program, and endogeneity is due to unobserved work ethic, then
- (ROIV.i) implies that the level of one's work ethic does not impact wages, but only the fraction of workers whose work ethic one's own exceeds
  - I.e., one's level of work ethic is irrelevant; only one's percentile in the dbn of work ethic matters
- (ROIV.ii) implies we can divide the data (say, by region) such that across regions individuals at the same percentile of the dbn of work ethic within their region have different values of D

Page 310: Microeconometrics Lecture Notes

To proceed, partition the data into mutually exclusive groups, s = 1, ..., S, on the basis of some attribute, $q_i$ (which may be a subset of x)

Notation
- Define $F(\cdot \mid q_i)$ as the CDF of u given q
- Let $c_i = F(u_i \mid q_i)$ be the rank order of obs i in its partition

(ROIV.i) may be expressed formally as

$$E[\varepsilon_i \mid x_i, D_i, u_i, q_i] = E[\varepsilon_i \mid u_i, q_i] = E[\varepsilon_i \mid c_i] = m(c_i)$$

where $m(\cdot)$ is some fn mapping c to y
- This condition states that $E[\varepsilon_i \mid u_i, q_i]$ depends on u and q only through the rank order, c
- Vella & Verbeek (1997) refer to this as the order restriction

The order restriction is useful for identifying the model since it implies that agents from different partitions, $q_i \neq q_j$, but with identical rank orders, $c_i = c_j$, are identical along the unobserved dimension responsible for the endogeneity

To be useful, however, this requires an additional assumption, (ROIV.ii), such that these comparable pairs of agents have different values of D

Page 311: Microeconometrics Lecture Notes

Estimation

Re-write the structural model as

$$y_i = \beta_1 D_i + x_i\beta_2 + m(c_i) + \tilde\varepsilon_i$$

where $\tilde\varepsilon$ is now a well-behaved error term; m(c) is another example of a control function, but c and $m(\cdot)$ are unknown

Estimate $c_i$ by
- Estimating the first-stage model via OLS $\Rightarrow$ $\hat u_i$
- Estimating $\hat c_i$ nonparametrically using the empirical CDF within each of the S partitions based on q

Approximate m(c) using a finite-order polynomial in $\hat c$

Alternatively, one may estimate the original structural model

$$y_i = \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

by IV with the instrument given by the residual, $\hat\eta$, obtained after OLS estimation of the model

$$D_i = \theta_0 + \theta_1 \hat c_i + \eta_i$$
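The rank-order step can be sketched as follows: given first-stage residuals and a partition label for each observation, compute each residual's empirical CDF value within its own partition. The function name `within_group_rank` and the toy inputs are assumptions for illustration; ties are broken by order of appearance.

```python
import numpy as np


def within_group_rank(u_hat, q):
    """Empirical CDF value of each residual within its own partition q."""
    u_hat = np.asarray(u_hat, dtype=float)
    q = np.asarray(q)
    c = np.empty_like(u_hat)
    for g in np.unique(q):
        idx = np.flatnonzero(q == g)                    # members of partition g
        order = np.argsort(u_hat[idx], kind="stable")   # ascending residuals
        ranks = np.empty(len(idx))
        ranks[order] = np.arange(1, len(idx) + 1)       # rank 1 = smallest
        c[idx] = ranks / len(idx)                       # share of group at or below u_i
    return c


# Toy example: two partitions with very different residual levels.
u_hat = np.array([0.3, -1.2, 0.9, 5.0, 2.0, 4.0])
q = np.array([0, 0, 0, 1, 1, 1])
c_hat = within_group_rank(u_hat, q)

# Polynomial terms in c_hat that could proxy for m(c) in the outcome equation.
m_terms = np.column_stack([c_hat, c_hat**2, c_hat**3])
print(c_hat)
```

Observations 0.9 (partition 0) and 5.0 (partition 1) receive the same rank order even though their residual levels differ, which is exactly the comparison the order restriction relies on.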


Page 312: Microeconometrics Lecture Notes

Selection on Unobservables
Covariance Restrictions

Setup
- "Structural" model

$$y_i = \beta_0 + \beta_1 D_i + x_i\beta_2 + \varepsilon_i$$

- First-stage model

$$D_i = \pi_0 + x_i\pi_1 + u_i$$

- Reduced form model

$$\begin{aligned}
y_i &= (\beta_0 + \beta_1\pi_0) + x_i(\beta_1\pi_1 + \beta_2) + (\varepsilon_i + \beta_1 u_i) \\
&= \tilde\beta_0 + \tilde\beta_1 x_i + \tilde\upsilon_i
\end{aligned}$$

With no IV, estimable quantities include: $\pi_0, \pi_1, \tilde\beta_0, \tilde\beta_1$
- These four quantities are functions of five structural parameters: $\pi_0, \pi_1, \beta_0, \beta_1, \beta_2$
- Thus, the model is under-identified

Page 313: Microeconometrics Lecture Notes

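What about the covariance matrix of the system of reduced form eqtns? $\beta_1$ also shows up there

$$y_i = \tilde\beta_0 + \tilde\beta_1 x_i + (\varepsilon_i + \beta_1 u_i)$$

$$D_i = \pi_0 + x_i\pi_1 + u_i$$

Assume $\varepsilon, u \sim N(0, 0, \sigma_\varepsilon, \sigma_u, \rho)$, then $\tilde\upsilon, u$ are also mean zero with covariance matrix

$$\Sigma = \begin{bmatrix} \sigma^2_\varepsilon + \beta_1^2\sigma^2_u + 2\beta_1\rho\sigma_\varepsilon\sigma_u & \rho\sigma_\varepsilon\sigma_u + \beta_1\sigma^2_u \\ & \sigma^2_u \end{bmatrix} = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} \\ & \Sigma_{22} \end{bmatrix}$$

Three quantities are estimable based on MLE of the system: $\Sigma_{11}, \Sigma_{12}, \Sigma_{22}$
- These 3 quantities are functions of 4 structural parameters: $\beta_1, \sigma_\varepsilon, \sigma_u, \rho$
- Thus, the model remains under-identified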

Page 314: Microeconometrics Lecture Notes

Intuition: place restrictions on other parameters in $\Sigma$ in order to identify $\beta_1$ from the cov matrix; intercept and slope parameters are all identified then as well

Model is then estimated via ML

$$\ln L = \sum_i \left\{ \frac{1}{2}\ln|\Sigma^{-1}| - \frac{1}{2}\varepsilon_i'\Sigma^{-1}\varepsilon_i \right\}$$

where $\varepsilon_i$ is the vector of errors for obs i

Note: If D is instead modelled as a LDV, then the likelihood must be factored appropriately to account for the fact that one eqtn has a discrete outcome

Page 315: Microeconometrics Lecture Notes

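Realistic restrictions may be easier to devise if one adds additional outcomes that also depend on the same endogenous regressor
- Ex: K = 2

$$y_{1i} = \tilde\beta_{10} + \tilde\beta_{11} x_i + (\varepsilon_{1i} + \beta_{11} u_i)$$

$$y_{2i} = \tilde\beta_{20} + \tilde\beta_{21} x_i + (\varepsilon_{2i} + \beta_{21} u_i)$$

$$D_i = \pi_0 + x_i\pi_1 + u_i$$

  which entails

$$\Sigma = \begin{bmatrix} \Sigma_{11} & \Sigma_{12} & \Sigma_{13} \\ & \Sigma_{22} & \Sigma_{23} \\ & & \Sigma_{33} \end{bmatrix}$$

  with

$$\begin{aligned}
\Sigma_{11} &= \sigma^2_{\varepsilon_1} + \beta_{11}^2\sigma^2_u + 2\beta_{11}\rho_1\sigma_{\varepsilon_1}\sigma_u \\
\Sigma_{12} &= \rho_{12}\sigma_{\varepsilon_1}\sigma_{\varepsilon_2} + \beta_{11}\rho_2\sigma_{\varepsilon_2}\sigma_u + \beta_{21}\rho_1\sigma_{\varepsilon_1}\sigma_u + \beta_{11}\beta_{21}\sigma^2_u \\
\Sigma_{13} &= \rho_1\sigma_{\varepsilon_1}\sigma_u + \beta_{11}\sigma^2_u \\
\Sigma_{22} &= \sigma^2_{\varepsilon_2} + \beta_{21}^2\sigma^2_u + 2\beta_{21}\rho_2\sigma_{\varepsilon_2}\sigma_u \\
\Sigma_{23} &= \rho_2\sigma_{\varepsilon_2}\sigma_u + \beta_{21}\sigma^2_u \\
\Sigma_{33} &= \sigma^2_u
\end{aligned}$$

  where $\rho_k$ is the correlation between $\varepsilon_k$ and u, and $\rho_{12}$ is the correlation between $\varepsilon_1$ and $\varepsilon_2$
- If $y_1, y_2$ are similar (e.g., two anthropometric measures), might impose $\rho_1 = \rho_2$ and might have a strong prior for $\rho_{12}$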

Page 316: Microeconometrics Lecture Notes

Types of restrictions

Altonji et al. (2005)-type restrictions: impose values for $\rho$ and track estimates of $\beta_1$

Factor Structure
- Add additional outcomes
- Decompose errors as

$$\varepsilon_{ki} = \lambda_k\mu_i + \eta_{ki}, \qquad k = 1, ..., K$$

$$u_i = \lambda_u\mu_i + \xi_i$$

  where $\mu$ has unit var (normalization, not an assumption), $\eta, \xi, \mu$ are assumed to be independent, and $\lambda$ are known as factor loadings
- Factor structure assumes all cross-eqtn correlation is through $\mu$
- Parameters to be estimated from $\Sigma$: $\sigma_{\eta_k}, \lambda_k, \beta_{1k}, \lambda_u, \sigma_\xi$
  - This is 3K + 2 parameters in total
  - Estimable quantities from $\Sigma$ is (K + 1)K/2
  - $(K + 1)K/2 \ge 3K + 2 \Rightarrow K \ge 6$

Hogan and Rigobon (2003), Rigobon (2003) propose an "Identification through Heteroskedasticity" estimator that is very similar

Page 317: Microeconometrics Lecture Notes

Selection on Unobservables
Distributional Approaches

Relatively recent work has begun to address endogeneity in the context of distributional models

Other estimators not discussed here
1. Fixed effect QR models (Koenker 2004)
2. Nonparametric bounds applied to QR models (Giustinelli 2011)

Page 318: Microeconometrics Lecture Notes

Selection on Unobservables
Distributional Approaches: Changes-in-Changes

Recall, standard DID strategy
- Assume treatment group observed pre- and post-intervention
- Assume control group observed in same time periods
- Assume treatment and control groups follow same time trend absent treatment
- Estimate treatment effect by the additional change over time in the treatment group relative to the control group

Idea is extendable beyond just average treatment effects

Model does require panel data or repeated cross-sections

Page 319: Microeconometrics Lecture Notes

Setup (Athey & Imbens 2005)

Notation
- Individual i belongs to a group $G_i \in \{0, 1\}$, where G = 1 is the treatment group
- Individual i observed at time $T_i \in \{0, 1\}$
- $y_i^N$, $y_i^I$ = potential outcomes in non-treated (N), treated (intervention, I) states
- $y_i = (1 - I_i)y_i^N + I_i y_i^I$ = observed outcome, where $I_i$ = treatment (intervention) indicator
- $I_i = G_i T_i$

Page 320: Microeconometrics Lecture Notes

Standard DID
- Untreated outcome

$$y_i^N = \alpha + \beta T_i + \gamma G_i + \varepsilon_i$$

- Constant treatment effect assumption

$$\tau = y_i^I - y_i^N$$

- Combining the above two assumptions yields

$$y_i = \alpha + \beta T_i + \gamma G_i + \tau I_i + \varepsilon_i$$

  where
  - $\tau$ = ATE with constant treatment effect assumption
  - $\tau$ = ATT with heterogeneous treatment effect assumption
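A minimal sketch of the standard DID regression on simulated repeated cross-sections. The parameter values and variable names are illustrative assumptions. Because the regression is saturated in (G, T), the OLS coefficient on $I_i = G_i T_i$ coincides with the difference of the group-specific changes in means.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20_000

G = rng.integers(0, 2, size=n)          # group indicator (1 = treatment group)
T = rng.integers(0, 2, size=n)          # period indicator (1 = post)
I = G * T                               # treatment indicator

alpha, beta, gamma, tau = 1.0, 0.5, -0.3, 2.0   # assumed true values
y = alpha + beta * T + gamma * G + tau * I + rng.normal(size=n)

# OLS of y on (1, T, G, I); the last coefficient is the DID estimate.
X = np.column_stack([np.ones(n), T, G, I])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
tau_ols = coef[3]


def cell_mean(g, t):
    """Mean outcome in the (G = g, T = t) cell."""
    return y[(G == g) & (T == t)].mean()


# Same estimate from the four cell means.
tau_means = (cell_mean(1, 1) - cell_mean(1, 0)) - (cell_mean(0, 1) - cell_mean(0, 0))

print("DID via OLS:       ", tau_ols)
print("DID via cell means:", tau_means)
```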


Page 321: Microeconometrics Lecture Notes

Generalizing the standard model
- Untreated outcome

$$y_i^N = h(U_i, T_i)$$

  where
  - h(u, t) is increasing in u
  - $u_i$ = unobservable attribute of i
  - $y^N$ is identical across individuals within a time period with identical u, irrespective of G
- Dbn of u may vary by G, but not over time within G: $u_i \perp T_i \mid G_i$
- In the absence of treatment...
  - Any differences in outcomes across groups are entirely due to diffs in the dbn of u across groups
  - Any changes in outcomes within groups over time are due to diffs in h(u, 0) and h(u, 1) [i.e., since unobservables do not change over time, the effect of unobservables on the untreated outcome must change over time]
- Treated outcome

$$y_i^I = h^I(U_i, T_i)$$

  where $h^I(u, t)$ is increasing in u

Page 322: Microeconometrics Lecture Notes

Changes-in-changes model

Notation
- Conditional dbns

$$\begin{aligned}
y^N_{gt} &\sim y^N \mid G = g, T = t \\
y^I_{gt} &\sim y^I \mid G = g, T = t \\
y_{gt} &\sim y \mid G = g, T = t \\
U_g &\sim U \mid G = g
\end{aligned}$$

- Inverse CDFs

$$F_Y^{-1}(q) = \inf\{y : F_Y(y) \ge q\}$$

Goal
- Devise a set of assumptions to identify the dbn of $y^N_{11}$, $F_{y^N,11}$, which is (one of) the distributions of missing counterfactuals
- Observable dbns include: $F_{y^N,10}$, $F_{y^I,11}$, $F_{y^N,00}$, and $F_{y^N,01}$

Page 323: Microeconometrics Lecture Notes

Assumptions

(CIC.i) Model: $y^N = h(U, T)$
(CIC.ii) Strict monotonicity: h(u, t) is strictly increasing in u for $t \in \{0, 1\}$
(CIC.iii) Time invariance within groups: $U \perp T \mid G$
(CIC.iv) Support: $U_1 \subseteq U_0$

Page 324: Microeconometrics Lecture Notes

Estimator

Counterfactual CDF

$$\hat F_{y^N,11}(y) = F_{y,10}(F^{-1}_{y,00}(F_{y,01}(y)))$$

which is estimable using empirical CDFs

Treatment effect estimate

$$\tau^{CIC}_q = F^{-1}_{y^I,11}(q) - \hat F^{-1}_{y^N,11}(q)$$

Note, $\tau^{CIC}_q$ is the difference in two QTE (Firpo 2007) estimates

$$\tau^{CIC}_q = \Delta^{QTE}_{q,1} - \Delta^{QTE}_{q',0}$$

where
- $\Delta^{QTE}_{q,1}$ is the change over time in y at quantile q for the G = 1 group
- $\Delta^{QTE}_{q',0}$ is the change over time in y at quantile q' for the G = 0 group, where q' is the quantile in the G = 0, T = 0 dbn corresponding to the value of y associated with quantile q in the G = 1, T = 0 dbn
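The estimator can be sketched with empirical CDFs as follows. Inverting the counterfactual CDF gives $\hat F^{-1}_{y^N,11}(q) = F^{-1}_{y,01}(F_{y,00}(F^{-1}_{y,10}(q)))$, which is what the code computes. Function names and the simulated design (h(u, 0) = u, h(u, 1) = u + 1, constant effect of 2) are assumptions made for illustration.

```python
import numpy as np


def ecdf(sample, y):
    """Empirical CDF of `sample` evaluated at the scalar y."""
    return np.mean(np.asarray(sample) <= y)


def inv_ecdf(sample, q):
    """Generalized inverse of the empirical CDF: inf{y : F(y) >= q}."""
    s = np.sort(np.asarray(sample))
    k = int(np.ceil(q * len(s))) - 1
    return s[min(max(k, 0), len(s) - 1)]


def cic_effect(y00, y01, y10, y11, q):
    """Changes-in-changes effect at quantile q for the treated group."""
    y_pre = inv_ecdf(y10, q)                  # quantile q in G=1, T=0
    q_prime = ecdf(y00, y_pre)                # rank of that value in G=0, T=0
    counterfactual = inv_ecdf(y01, q_prime)   # quantile q' in G=0, T=1
    return inv_ecdf(y11, q) - counterfactual


# Simulated repeated cross-sections (illustrative values only).
rng = np.random.default_rng(7)
n = 50_000
y00 = rng.normal(0.0, 1.0, n)                 # control, pre:  h(u, 0) = u
y01 = rng.normal(0.0, 1.0, n) + 1.0           # control, post: h(u, 1) = u + 1
y10 = rng.normal(1.0, 1.0, n)                 # treated, pre (U shifted up by 1)
y11 = rng.normal(1.0, 1.0, n) + 1.0 + 2.0     # treated, post, with effect 2

tau_median = cic_effect(y00, y01, y10, y11, 0.5)
print("CIC effect at the median:", tau_median)
```

Here q' is roughly 0.84 for q = 0.5, since the median of the treated group's pre-period outcome sits one standard deviation above the control group's pre-period mean.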


Page 325: Microeconometrics Lecture Notes


Page 326: Microeconometrics Lecture Notes

Alternative estimator
- QDID treatment effect estimator

$$\tau^{QDID}_q = F^{-1}_{y^I,11}(q) - \hat F^{-1}_{y^N,11}(q)$$

  where

$$\hat F^{-1}_{y^N,11}(q) = F^{-1}_{y,10}(q) + [F^{-1}_{y,01}(q) - F^{-1}_{y,00}(q)]$$

  which corresponds to

$$\tau^{QDID}_q = \Delta^{QTE}_{q,1} - \Delta^{QTE}_{q,0}$$

  where $\Delta^{QTE}_{q,1}$, $\Delta^{QTE}_{q,0}$ is the change over time in y at quantile q for G = 1, 0, respectively
- Relies on (perhaps) unrealistic assumptions

Page 327: Microeconometrics Lecture Notes

Counterfactual CDF for control group

$$F_{y^I,01}(y) = F_{y,00}(F^{-1}_{y,10}(F_{y,11}(y)))$$

Treatment effect estimate

$$\tau^{CIC}_{q,0} = F^{-1}_{y^I,01}(q) - F^{-1}_{y^N,01}(q)$$

Page 328: Microeconometrics Lecture Notes

Notes

Athey & Imbens (2006) discuss extensions to
- Discrete outcomes
- Multiple groups and multiple time periods
- Incorporating covariates
  - Semiparametric specification of potential outcomes

$$y^N = h(u, t) + x\beta$$

$$y^I = h^I(u, t) + x\beta$$

    where $U \perp (T, X) \mid G$
  - OLS estimation of outcomes

$$y_i = D_i\delta + x_i\beta + \varepsilon_i$$

    where $D = [\,GT \;\; (1 - G)T \;\; G(1 - T) \;\; (1 - G)(1 - T)\,]$
  - Perform CIC estimation on

$$\hat y_i = y_i - x_i\hat\beta = D_i\hat\delta + \hat\varepsilon_i$$

  - Inverse propensity score weighting alternative?

Page 329: Microeconometrics Lecture Notes

Panel data allows additional flexibility, but repeated cross sections are sufficient

Inference
- Athey & Imbens (2006) prove asymptotic normality, and derive the asymptotic variance
- Bootstrap alternative?

Page 330: Microeconometrics Lecture Notes

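Selection on Unobservables
Distributional Approaches: IV Quantile Regression

Recall, QR model (Koenker & Bassett 1978)
- Assuming linear conditional quantiles, estimation is

$$(\hat\beta_\theta, \hat\Delta_\theta) = \arg\min_{\beta,\Delta} \frac{1}{N}\left\{ \sum_{i:\, y_i \ge \Delta D_i + x_i\beta} \theta\,|y_i - \Delta D_i - x_i\beta| + \sum_{i:\, y_i < \Delta D_i + x_i\beta} (1 - \theta)\,|y_i - \Delta D_i - x_i\beta| \right\}$$

- May be rewritten as

$$(\hat\beta_\theta, \hat\Delta_\theta) = \arg\min_{\beta,\Delta} \frac{1}{N}\left\{ \sum_i \rho_\theta(\varepsilon_{\theta i}) \right\}$$

  where $\rho_\theta(\varepsilon_{\theta i})$ is the check function, defined as

$$\rho_\theta(\varepsilon_{\theta i}) = [\theta - I(\varepsilon_{\theta i} < 0)]\,\varepsilon_{\theta i}$$

  and $\varepsilon_{\theta i}$ is the residual for i and $\theta$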
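A small sketch of the check function in action: with no covariates, minimizing the average check loss over a constant recovers the sample $\theta$-quantile. The data, grid, and function name are illustrative assumptions; a real QR routine would solve a linear program rather than search a grid.

```python
import numpy as np


def check_loss(resid, theta):
    """rho_theta(e) = [theta - 1(e < 0)] * e, applied elementwise."""
    resid = np.asarray(resid, dtype=float)
    return (theta - (resid < 0)) * resid


# Five observations with one large outlier; the sample median is 3.
y = np.array([1.0, 2.0, 3.0, 4.0, 100.0])

# Grid search over a constant b for theta = 0.5 (median regression on an intercept).
grid = np.linspace(0.0, 10.0, 1001)
avg_loss = [check_loss(y - b, 0.5).mean() for b in grid]
b_star = grid[int(np.argmin(avg_loss))]

print("minimizer of the average check loss:", b_star)
print("sample mean for comparison:", y.mean())
```

The minimizer stays at the median despite the outlier, whereas the mean (the least-squares analogue) is pulled up to 22.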


Page 331: Microeconometrics Lecture Notes

Parameters of interest are the partial derivatives of the conditional quantile fn w.r.t. x

$$\frac{\partial E[Q_\theta(y \mid x, D)]}{\partial x_k}$$

which equals $\beta_{\theta k}$ if x enters linearly

For discrete regressors, parameters give the expected change in the conditional quantile fn

$$\Delta_\theta = E[Q_\theta(y \mid x, D = 1)] - E[Q_\theta(y \mid x, D = 0)]$$

Page 332: Microeconometrics Lecture Notes

QR model is biased and inconsistent if D is endogenous

Recall, potential outcomes setup
- $y_d$, d = 0, 1, are potential outcomes associated with D = 0, 1, respectively
- $q(d, x, \theta)$ = conditional quantile fn of potential outcomes
- $\Delta_\theta = q(1, x, \theta) - q(0, x, \theta)$ = QTE (parameter of interest)

Page 333: Microeconometrics Lecture Notes

IV-QR model (Chernozhukov & Hansen 2005, 2006)

Express conditional quantile fn as

$$y_d = q(d, x, u_d), \qquad u_d \sim U[0, 1]$$

where $q(d, x, \theta)$ is the conditional $\theta$th-quantile of potential outcome, $y_d$

Linear (in parameters) conditional quantile fn implies

$$q(D_i, x_i, \theta) = \Delta_\theta D_i + x_i\beta_\theta$$

Page 334: Microeconometrics Lecture Notes

Assumptions

(IV-QR.i) Potential outcomes: given X = x, for each d, $y_d = q(d, x, u_d)$, where $u_d \sim U[0, 1]$ and $q(d, x, \theta)$ is strictly increasing in $\theta$
(IV-QR.ii) Independence: given X = x, $\{u_d\} \perp Z$
(IV-QR.iii) Selection: given X = x, Z = z, $D \equiv \delta(z, x, \upsilon)$ for unknown fn $\delta(\cdot)$ and random vector, $\upsilon$
(IV-QR.iv) Rank similarity: given X = x, Z = z, $u_d \sim u_{d'}$ $\forall d, d'$
(IV-QR.v) Observed data: $y \equiv q(D, x, u_D)$, $D \equiv \delta(z, x, \upsilon)$, x, and z

Note: rank similarity is a bit weaker than rank invariance (whereby $U_d = U_{d'}$ $\forall d, d'$), and requires only that $U_d$ and $U_{d'}$ be equal in expectation (thus, they may be considered equal ex ante, but are allowed to differ ex post)

Page 335: Microeconometrics Lecture Notes

Estimation

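Consider the objective fn

$$\frac{1}{N}\left\{ \sum_i \rho_\theta(\varepsilon_{\theta i}) \right\}$$

where

$$\rho_\theta(\varepsilon_{\theta i}) = [\theta - I(\varepsilon_{\theta i} < 0)]\,\varepsilon_{\theta i}$$

$$\varepsilon_{\theta i} = y_i - \Delta_\theta D_i - x_i\beta_\theta - \hat\Phi_i\gamma_\theta$$

and $\hat\Phi_i$ is the predicted value from the first-stage regression of D on x, z

Given a correctly specified "structural" model, $\gamma_\theta$ should equal zero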

Page 336: Microeconometrics Lecture Notes

Algorithm
1. Define a grid of possible values of $\Delta$, $\{\Delta_j, j = 1, ..., J\}$
2. For each $\theta$ and each $\Delta_j$, estimate a QR model with $y_i - \Delta_j D_i$ as the dependent variable and x, $\hat\Phi_i$ as covariates
3. Obtain estimates $\hat\beta_{\theta j}$, $\hat\gamma_{\theta j}$, j = 1, ..., J
4. Choose $\hat\Delta_\theta = \Delta_j$ and $\hat\beta_\theta = \hat\beta_{\theta j}$ to minimize $|\hat\gamma_{\theta j}|$

Inference via sub-sampling or typical, nonparametric iid bootstrap, as in QR model

Can test interesting hypotheses ($\Delta_\theta = 0$, $\Delta_\theta$ constant $\forall\theta$, SD, exogeneity)

Easily extendable to multiple endogenous variables, but the grid search grows exponentially with the number of endogenous variables

Page 337: Microeconometrics Lecture Notes

Selection on Unobservables
Distributional Approaches: Stochastic Dominance

Recall, previous definitions for stochastic dominance
- First Order Stochastic Dominance: $Y_1$ FSD $Y_0$ iff

$$F_1(y) \le F_0(y) \quad \forall y \in \mathcal{X}$$

  with strict inequality for some y (where $\mathcal{X}$ is the union of the supports for $Y_1$ and $Y_0$), or

$$y_1^\theta \ge y_0^\theta \quad \forall\theta \in [0, 1]$$

  with strict inequality for some $\theta$
- Second Order Stochastic Dominance: $Y_1$ SSD $Y_0$ iff

$$\int_{-\infty}^{y} F_1(t)\,dt \le \int_{-\infty}^{y} F_0(t)\,dt \quad \forall y \in \mathcal{X}$$

  with strict inequality for some y, or

$$\int_0^\theta y_1^t\,dt \ge \int_0^\theta y_0^t\,dt \quad \forall\theta \in [0, 1]$$

  with strict inequality for some $\theta$

Page 338: Microeconometrics Lecture Notes

Recall, previous tests for stochastic dominance
- Test statistics

$$d = \min \sup_{z \in \mathcal{X}} [F(z) - G(z)]$$

$$s = \min \sup_{z \in \mathcal{X}} \int_{-\infty}^{z} [F(t) - G(t)]\,dt$$

  where min is taken over F − G and G − F
- Tests are based on estimates of d and s using the empirical CDFs
  - Unconditional, or
  - Inverse propensity score weighted
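The sample analogue of d can be sketched with unweighted empirical CDFs evaluated at the pooled data points. The function name and toy samples are assumptions; the sketch computes the statistic only and does not cover its sampling distribution or the propensity-score-weighted version.

```python
import numpy as np


def sd_first_order_stat(a, b):
    """d = min{ sup_z [F(z) - G(z)], sup_z [G(z) - F(z)] } from empirical CDFs.

    F is the empirical CDF of sample `a`, G that of sample `b`.
    """
    a, b = np.sort(a), np.sort(b)
    z = np.concatenate([a, b])                         # pooled evaluation points
    F = np.searchsorted(a, z, side="right") / len(a)   # share of a at or below z
    G = np.searchsorted(b, z, side="right") / len(b)   # share of b at or below z
    return min(np.max(F - G), np.max(G - F))


# Sample `a` lies weakly to the right of `b` everywhere: d = 0 (dominance in sample).
d_dominance = sd_first_order_stat([2.0, 3.0, 4.0], [1.0, 2.0, 3.0])

# The CDFs cross: neither sample dominates, so d > 0.
d_crossing = sd_first_order_stat([1.0, 4.0], [2.0, 3.0])

print(d_dominance, d_crossing)
```

Since both empirical CDFs equal one at the largest pooled point, each supremum is at least zero, so a sample value of d = 0 means one empirical CDF never lies above the other.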

Previous methods assume selection on observables

Failure of this assumption invalidates causal conclusions


Page 339: Microeconometrics Lecture Notes

Solution (Abadie 2002; Imbens & Rubin 1997)

With a binary IV, Z, the potential distributions of the outcome variable are identified for the subpopulation of compliers

$Z_i$ satisfies the following three assumptions:
- Independence: $\{y_{0i}, y_{1i}, D_{0i}, D_{1i}\} \perp Z_i$
- Correlation: $\Pr(Z_i = 1) \in (0, 1)$ and $\Pr(D_{0i} = 1) < \Pr(D_{1i} = 1)$
- Monotonicity: $\Pr(D_{0i} \le D_{1i}) = 1$

  where:
  - $y_0, y_1$ are potential outcomes (subscripts refer to treatment status)
  - $D_0, D_1$ are potential treatments (subscripts refer to instrument status)

SD tests comparing the distribution of outcomes across the samples with Z = 0 and Z = 1 identify the causal effect of D on y for compliers

Page 340: Microeconometrics Lecture Notes

Define the empirical CDF of potential outcomes for compliers as

$$\hat F^C_1(y) = E[I(Y_{1i} \le y) \mid D_{1i} = 1, D_{0i} = 0]$$

$$\hat F^C_0(y) = E[I(Y_{0i} \le y) \mid D_{1i} = 1, D_{0i} = 0]$$

Abadie (2002) shows

$$\hat F^C_1(y) - \hat F^C_0(y) = K[\hat F_1(y) - \hat F_0(y)]$$

where
- $\hat F_1(y)$, $\hat F_0(y)$ are empirical CDFs for the Z = 1, Z = 0 samples
- $K = 1/(E[D \mid Z = 1] - E[D \mid Z = 0]) < \infty$

Implies SD tests on $\hat F_1(y)$, $\hat F_0(y)$ yield valid inference for the SD rankings of $\hat F^C_1(y)$, $\hat F^C_0(y)$

Different Zs yield different results if the treatment effect varies across the population

Page 341: Microeconometrics Lecture Notes

Data Issues

Data issues are a fact of life

Frequently encountered are problems pertaining to missing or contaminated data

Sample selection concerns missing data on the dependent variable

Contaminated data refers to a scenario where one is interested in the marginal distribution of a potentially mismeasured variable

Measurement error more generally refers to mismeasured dependent or independent variables

Page 342: Microeconometrics Lecture Notes

Data Issues
Sample Selection

Population model

$$y_i = x_i\beta + \varepsilon_i, \qquad \varepsilon_i \sim N(0, \sigma^2)$$

Given a random sample, $\{y_i, x_i\}_{i=1}^N$, OLS is consistent and efficient if the usual assumptions are satisfied

Problem arises when data on y is only available for a non-random sample
- Let $S_i = 1$ if $y_i$ is observed; $S_i = 0$ if $y_i$ is unobserved

Note: While the exposition uses a cross-section, a common source of (non-random) selection is attrition in panel data; this is particularly important in firm-level studies, where attrition may be due to firms exiting the market

Page 343: Microeconometrics Lecture Notes

Example: Certain subpopulations may not be representative of the population

Page 344: Microeconometrics Lecture Notes

Implies following data structure
- Have data on a random sample, $\{y_i, x_i, S_i\}_{i=1}^N$, but $y_i = .$ if $S_i = 0$
- Can only use $M \equiv \sum_i S_i$ observations to estimate any model
- Examples
  - Wages only observed for workers
  - Firm profits only observed for firms that remain in business
  - Test scores only observed for test takers
  - House prices only observed for houses on the market (sold?)

Issue
- Is OLS still unbiased and consistent?
- Answer: depends

Page 345: Microeconometrics Lecture Notes

Heckman Model (Heckman 1979)

Setup

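$$\begin{aligned}
y_i &= x_i\beta + \varepsilon_i \\
S_i^* &= z_i\gamma + u_i \\
S_i &= \begin{cases} 1 & \text{if } S_i^* > 0 \\ 0 & \text{if } S_i^* \le 0 \end{cases} \\
y_i &= . \;\text{ if } S_i = 0 \\
\varepsilon_i, u_i &\sim N_2(0, 0, \sigma^2_\varepsilon, 1, \rho)
\end{aligned}$$

x, z are exogenous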

Page 346: Microeconometrics Lecture Notes

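Problem
- $E[y \mid x] = x\beta$, but

$$\begin{aligned}
E[y \mid x, z, S = 1] &= x\beta + E[\varepsilon \mid x, z, S = 1] \\
&= x\beta + E[\varepsilon \mid u > -z\gamma] \\
&= x\beta + \rho\sigma_\varepsilon\,\frac{\phi(z\gamma)}{\Phi(z\gamma)}
\end{aligned}$$

  where $\phi(z\gamma)/\Phi(z\gamma)$ is the Inverse Mills' Ratio (IMR) from before
- Implies that $E[y \mid x, S = 1] = x\beta$ iff $\rho = 0$
- OLS estimation of

$$y_i = x_i\beta + \tilde\varepsilon_i$$

  using only M observations omits the IMR term, which implies that

$$\tilde\varepsilon_i = \rho\sigma_\varepsilon\,\phi(z_i\gamma)/\Phi(z_i\gamma) + \varepsilon_i$$

  which is not mean zero, and is not independent of x, unless $\rho = 0$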

Page 347: Microeconometrics Lecture Notes

Solution
- Estimate IMR (using i = 1, ..., N)
  - Estimate probit model, where S is the dependent variable and z are the covariates $\Rightarrow$ $\hat\gamma$
  - Obtain

$$IMR_i = \frac{\phi(z_i\hat\gamma)}{\Phi(z_i\hat\gamma)}$$

- Regress $y_i$ on $x_i$, $IMR_i$ via OLS (using i = 1, ..., M)
- Known as Heckman two-step method
- Test of endogenous selection

$$H_0: \beta_\lambda = 0$$

$$H_a: \beta_\lambda \neq 0$$

  where $\beta_\lambda$ is the coefficient on the IMR
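The two steps can be sketched in a few lines. This is a bare-bones illustration under assumed simulated data (true $\beta = (1, 2)$, $\rho\sigma_\varepsilon = 0.7$, one excluded variable `w`); the hand-rolled probit and all names are assumptions standing in for a proper library routine, and the second-step OLS standard errors are not computed.

```python
import math
import numpy as np

_erf = np.vectorize(math.erf)


def norm_pdf(a):
    return np.exp(-0.5 * a**2) / math.sqrt(2.0 * math.pi)


def norm_cdf(a):
    return 0.5 * (1.0 + _erf(a / math.sqrt(2.0)))


def probit_fit(S, Z, n_iter=30):
    """Probit coefficients by Fisher scoring (minimal, no convergence checks)."""
    g = np.zeros(Z.shape[1])
    for _ in range(n_iter):
        idx = Z @ g
        p = np.clip(norm_cdf(idx), 1e-10, 1 - 1e-10)
        f = norm_pdf(idx)
        score = Z.T @ (f * (S - p) / (p * (1 - p)))
        info = (Z * (f**2 / (p * (1 - p)))[:, None]).T @ Z
        g = g + np.linalg.solve(info, score)
    return g


def heckman_two_step(y, X, Z, S):
    """Return OLS coefficients on (X, IMR) using the selected observations."""
    gamma_hat = probit_fit(S, Z)                 # step 1: probit on all N obs
    index = Z @ gamma_hat
    imr = norm_pdf(index) / norm_cdf(index)      # inverse Mills ratio
    sel = S == 1
    W = np.column_stack([X[sel], imr[sel]])      # step 2: OLS on the M selected obs
    return np.linalg.lstsq(W, y[sel], rcond=None)[0]


# Simulated data (illustrative values only).
rng = np.random.default_rng(3)
n = 20_000
x = rng.normal(size=n)
w = rng.normal(size=n)                           # in z but not in x
u = rng.normal(size=n)
eps = 0.7 * u + math.sqrt(1 - 0.7**2) * rng.normal(size=n)   # rho = 0.7, sigma_eps = 1
S = (0.5 + x + w + u > 0).astype(float)
y = 1.0 + 2.0 * x + eps                          # only y[S == 1] is used below
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), x, w])

coef = heckman_two_step(y, X, Z, S)
print("intercept, slope, IMR coefficient:", coef)
```

The last coefficient estimates $\beta_\lambda = \rho\sigma_\varepsilon$; testing whether it is zero is the test of endogenous selection described above, provided the standard errors account for the estimated $\hat\gamma$.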


Page 348: Microeconometrics Lecture Notes

Notes
- Usual OLS standard errors are incorrect since IMR is predicted; must account for additional uncertainty due to estimation of $\gamma$
- Other complications in derivation of standard errors
- Need an exclusion restriction(s)
  - A variable in z not in x
  - Otherwise model is identified from non-linearity of IMR, which arises solely from the assumption of joint normality
  - However, even though technically identified from the non-linearity, substantial collinearity in practice makes identification questionable
- Model can be estimated in one-step by ML
  - More efficient if model assumptions are valid
  - Less robust in general since more dependent on functional form assumptions

Stata: -heckman-, -heckman2-

Page 349: Microeconometrics Lecture Notes

QR alternative

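Assume the latent outcome is

$$y_i^* = x_i\beta + u_i$$

$y^*$ is unobserved; instead observe

$$y_i = \begin{cases} y_i^* & \text{if observed} \\ . & \text{otherwise} \end{cases}$$

QR model estimated using data on $\{\tilde y_i, x_i\}$, where

$$\tilde y_i = \begin{cases} y_i & \text{if observed} \\ \min\{y_i\} & \text{otherwise} \end{cases}$$

yields

$$\hat\beta_\theta = \arg\min_\beta \frac{1}{N}\left\{ \sum_i \rho_\theta(\tilde y_i - x_i\beta) \right\}$$

which is consistent as long as all missing values of $y_i^* \le Q_\theta(y^* \mid x)$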

Page 350: Microeconometrics Lecture Notes

More generally, QR model estimated using data on $\{\tilde y_i, x_i\}$, where

$$\tilde y_i = \begin{cases} y_i & \text{if observed} \\ \text{imputed value} & \text{otherwise} \end{cases}$$

yields

$$\hat\beta_\theta = \arg\min_\beta \frac{1}{N}\left\{ \sum_i \rho_\theta(\tilde y_i - x_i\beta) \right\}$$

which is consistent as long as imputed values lie on the correct side of $Q_\theta(y \mid x)$

Page 351: Microeconometrics Lecture Notes

Example:

[Figure: scatter of ystar against x (x from 0 to 1) with four fitted lines: 'true' OLS fitted line; 'true' LAD fitted line; OLS fitted line, y>0 only; LAD fitted line]

NOTE: x~U[0,1]; ystar = -0.25 + x + e; e~N(0, 0.25^2); y = ystar if ystar > 0. LAD fitted line obtained by first replacing y = 10 if ystar > true LAD line, -10 otherwise.

Page 352: Microeconometrics Lecture Notes

Multiple selection criteria

Setup

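$$\begin{aligned}
y_i &= x_i\beta + \varepsilon_i \\
S_{1i}^* &= z_{1i}\gamma_1 + u_{1i} \\
S_{1i} &= \begin{cases} 1 & \text{if } S_{1i}^* > 0 \\ 0 & \text{if } S_{1i}^* \le 0 \end{cases} \\
S_{2i}^* &= z_{2i}\gamma_2 + u_{2i} \\
S_{2i} &= \begin{cases} 1 & \text{if } S_{2i}^* > 0 \\ 0 & \text{if } S_{2i}^* \le 0 \end{cases} \\
y_i &= . \;\text{ if } S_{1i}S_{2i} \neq 1 \\
\varepsilon_i, u_{1i}, u_{2i} &\sim N_3(0, 0, 0, \sigma^2_\varepsilon, 1, 1, \rho_{\varepsilon 1}, \rho_{\varepsilon 2}, \rho_{12})
\end{aligned}$$

x, z are exogenous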

Page 353: Microeconometrics Lecture Notes

Estimation
- Same as above, except with two IMR terms

$$IMR_{1i} = \frac{\phi(z_{1i}\hat\gamma_1)}{\Phi(z_{1i}\hat\gamma_1)}; \qquad IMR_{2i} = \frac{\phi(z_{2i}\hat\gamma_2)}{\Phi(z_{2i}\hat\gamma_2)}$$

- Coefficients on each IMR term are $\rho_{\varepsilon 1}\sigma_\varepsilon$ and $\rho_{\varepsilon 2}\sigma_\varepsilon$

Examples
- Grameen Bank: only observe outcome of credit amount if village contains a bank, and income makes one eligible
- Child care: only observe price paid for child care if work and use market-based day care

Page 354: Microeconometrics Lecture Notes

Regime switching models

Setup

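$$S_i^* = z_i\gamma + u_i$$

$$S_i = \begin{cases} 1 & \text{if } S_i^* > 0 \\ 0 & \text{if } S_i^* \le 0 \end{cases}$$

$$y_i = \begin{cases} x_i\beta_1 + \varepsilon_{1i} & \text{if } S_i = 1 \\ x_i\beta_0 + \varepsilon_{0i} & \text{if } S_i = 0 \end{cases}$$

which is the previous model for treatment effects

Applicable to any situation where one thinks determinants of the outcome (i.e., $\beta$) differ across groups or regimes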

Page 355: Microeconometrics Lecture Notes

May be extended to multiple regimes

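$$S_i^* = z_i\gamma + u_i$$

$$S_i = \begin{cases} 0 & \text{if } S_i^* \le 0 \\ 1 & \text{if } S_i^* \in (0, \alpha_1] \\ 2 & \text{if } S_i^* \in (\alpha_1, \alpha_2] \\ \vdots & \\ K & \text{if } S_i^* > \alpha_{K-1} \end{cases}$$

$$y_i = \begin{cases} x_i\beta_0 + \varepsilon_{0i} & \text{if } S_i = 0 \\ x_i\beta_1 + \varepsilon_{1i} & \text{if } S_i = 1 \\ \vdots & \\ x_i\beta_K + \varepsilon_{Ki} & \text{if } S_i = K \end{cases}$$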

Page 356: Microeconometrics Lecture Notes

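Estimate each regime separately

$$y_i = x_i\beta_k + \rho_{u\varepsilon_k}\sigma_{\varepsilon_k}\widehat{IMR}_{ki} + \eta_{ki}$$

where

$$\widehat{IMR}_{ki} = \begin{cases} \dfrac{-\phi(z_i\hat{\gamma})}{1 - \Phi(z_i\hat{\gamma})} & \text{if } S_i = 0 \\[2ex] \dfrac{\phi(\alpha_{k-1} - z_i\hat{\gamma}) - \phi(\alpha_k - z_i\hat{\gamma})}{\Phi(\alpha_k - z_i\hat{\gamma}) - \Phi(\alpha_{k-1} - z_i\hat{\gamma})} & \text{if } S_i = k \in \{1, 2, ..., K-1\} \\[2ex] \dfrac{\phi(\alpha_{K-1} - z_i\hat{\gamma})}{1 - \Phi(\alpha_{K-1} - z_i\hat{\gamma})} & \text{if } S_i = K \end{cases}$$

and $\alpha_0 = 0$ and $\gamma$ is estimated via ordered probit

Examples
- Wages by firm size (Main & Reilly 1993)
- Various outcomes by education or household size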

Page 357: Microeconometrics Lecture Notes

Regime switching models with unknown switch point

Setup

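$$S_i^* = z_i\gamma + u_i$$

$$S_i = \begin{cases} 1 & \text{if } S_i^* > c \\ 0 & \text{if } S_i^* \le c \end{cases}$$

$$y_i = \begin{cases} x_i\beta_1 + \varepsilon_{1i} & \text{if } S_i = 1 \\ x_i\beta_0 + \varepsilon_{0i} & \text{if } S_i = 0 \end{cases}$$

where $S^*$ is observed, but $c$ is unknown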

Page 358: Microeconometrics Lecture Notes

Estimation
- ML, where $c$ is an unknown parameter
- Grid search:
  - Estimate model for several plausible values of $c$
  - $\hat{c}$ and resulting estimates $\hat{\beta}$ are those that minimize total SSE
- Examples
  - Wages of PT vs. FT (Hotchkiss 1991)
  - Outcomes of DCs vs. LDCs
  - Stock market performance of large vs. small firms

Separate literature on selection models with panel data
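The grid search can be sketched as follows. This is a minimal illustration under simplifying assumptions that are not in the notes: each regime has only an intercept (so the regime "regression" is a regime mean), the data are simulated, and the helper name `sse_for_cutoff` is hypothetical.

```python
import random

def sse_for_cutoff(s, y, c):
    """Total SSE from fitting a separate mean to each regime defined by s > c."""
    total = 0.0
    for regime in (True, False):
        ys = [yi for si, yi in zip(s, y) if (si > c) == regime]
        if len(ys) < 5:  # require a few observations in each regime
            return float("inf")
        mean = sum(ys) / len(ys)
        total += sum((yi - mean) ** 2 for yi in ys)
    return total

random.seed(1)
true_c = 0.6  # switch point, unknown to the researcher
s = [random.random() for _ in range(2000)]
y = [(2.0 if si > true_c else 0.0) + random.gauss(0, 0.5) for si in s]

# Evaluate the model on a grid of plausible cutoffs and keep the SSE minimizer.
grid = [k / 100 for k in range(5, 96)]
c_hat = min(grid, key=lambda c: sse_for_cutoff(s, y, c))
print(c_hat)
```

With covariates, the regime means would be replaced by regime-specific regressions; the search over $c$ is unchanged.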

Page 359: Microeconometrics Lecture Notes

Bounding distributions (Blundell et al. 2007)

Notation
- $W^*$ = latent outcome variable
- $E$ = selection indicator
- $W$ = outcome variable, where

$$W = \begin{cases} W^* & \text{if } E = 1 \\ . & \text{otherwise} \end{cases}$$

- $X$ = covariate vector

Goal: bound CDF $F(w|x)$ given observable CDF $F(w|x, E=1)$

Examples:
- Dbn of wages under full employment
- Dbn of child health under full HI coverage
- Dbn of student achievement under universal attendance at public schools
- Dbn of test scores on college entrance exams with full participation

Page 360: Microeconometrics Lecture Notes

Worst case bounds

Identity

$$F(w|x) = F(w|x, E=1)p(x) + F(w|x, E=0)[1 - p(x)]$$

where $p(x) \equiv \Pr(E=1|x)$

$F(w|x, E=0)$ is unknown, but must lie in the unit interval

Replacing $F(w|x, E=0)$ with zero and one yields

$$F(w|x, E=1)p(x) \le F(w|x) \le F(w|x, E=1)p(x) + [1 - p(x)]$$

Example (ignoring $x$):
- $F(10|E=1) = 0.4$
- $\Pr(E=1) = 0.9$
$\Rightarrow F(10) \in [0.36, 0.46]$
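The worst-case bounds are simple arithmetic, sketched below. The function name `worst_case_cdf_bounds` is a hypothetical helper; the inputs are the two quantities from the example above.

```python
def worst_case_cdf_bounds(f_selected, p):
    """Worst-case bounds on F(w) given F(w|E=1) and p = Pr(E=1)."""
    if not (0.0 <= f_selected <= 1.0 and 0.0 <= p <= 1.0):
        raise ValueError("inputs must be probabilities")
    lower = f_selected * p              # unknown F(w|E=0) set to zero
    upper = f_selected * p + (1.0 - p)  # unknown F(w|E=0) set to one
    return lower, upper

lower, upper = worst_case_cdf_bounds(0.4, 0.9)
print(round(lower, 2), round(upper, 2))
```

The width of the interval is always $1 - p$, so the bounds are informative only when the selection probability is reasonably high.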

Page 361: Microeconometrics Lecture Notes

Can be rewritten in terms of bounds on quantiles

$$w^{q,l}(x) \le w^q(x) \le w^{q,u}(x)$$

where
- $w^q(x)$ = $q$th quantile of $F(w|x)$
- $w^{q,l}(x)$ is the value of $w$ that solves

$$q = F(w|x, E=1)p(x) + [1 - p(x)] \iff w = F^{-1}\left(\frac{q - [1 - p(x)]}{p(x)} \,\Big|\, x, E=1\right)$$

- $w^{q,u}(x)$ is the value of $w$ that solves

$$q = F(w|x, E=1)p(x) \iff w = F^{-1}\left(\frac{q}{p(x)} \,\Big|\, x, E=1\right)$$

Page 362: Microeconometrics Lecture Notes

Example
- $q = 0.5$, $p(x) = 0.9$
- $w^{q,l}(x) = F^{-1}(q''|x, E=1)$, where $q'' = (0.5 - 0.1)/0.9 = 0.4/0.9 \approx 0.44$
- $w^{q,u}(x) = F^{-1}(q'|x, E=1)$, where $q' = 0.5/0.9 \approx 0.56$
$\Rightarrow$ bounds on the median are given by the values of the observed conditional dbn at (approximately) the 44th and 56th percentiles

Notes
- Bounds cannot be used to determine if selection is non-random; they only assess the possible consequences
- Bounds only estimable for $q \in [1 - p(x), p(x)]$
- Bounds converge to point estimates as $p(x) \to 1$
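A minimal sketch of the quantile bounds, assuming a simple order-statistic definition of the empirical quantile and a made-up sample of observed outcomes (the integers 1 to 100). Both helper names are hypothetical.

```python
import math

def quantile_bound_ranks(q, p):
    """Ranks of the observed (E=1) distribution that bound the q-th latent quantile."""
    if not (0.0 < p <= 1.0):
        raise ValueError("p must be in (0, 1]")
    if q < 1.0 - p or q > p:
        raise ValueError("bounds only available for q in [1 - p, p]")
    lower_rank = (q - (1.0 - p)) / p
    upper_rank = q / p
    return lower_rank, upper_rank

def empirical_quantile(values, rank):
    """Smallest order statistic whose empirical CDF is at least `rank`."""
    ordered = sorted(values)
    index = max(0, math.ceil(rank * len(ordered)) - 1)
    return ordered[index]

observed_w = list(range(1, 101))  # hypothetical observed outcomes
lo_rank, hi_rank = quantile_bound_ranks(0.5, 0.9)
median_bounds = (empirical_quantile(observed_w, lo_rank),
                 empirical_quantile(observed_w, hi_rank))
print(lo_rank, hi_rank, median_bounds)
```

Requesting a quantile outside $[1 - p, p]$ raises an error, mirroring the note that the bounds are not estimable there.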

Page 363: Microeconometrics Lecture Notes

Positive selection

Stochastic dominance
- One characterization of positive selection is to assume that

$$F(w|x, E=1) \text{ FSD } F(w|x, E=0) \iff F(w|x, E=1) \le F(w|x, E=0) \ \forall w, \forall x$$

- Equivalent to $\Pr(E=1|W^* \le w, x) \le \Pr(E=1|W^* > w, x)$
- Bounds on $F(w|x)$ become

$$F(w|x, E=1) \le F(w|x) \le F(w|x, E=1)p(x) + [1 - p(x)]$$

since the missing term, $F(w|x, E=0)$, is now bounded from below at $F(w|x, E=1)$

Example (ignoring $x$):
- $F(10|E=1) = 0.4$
- $\Pr(E=1) = 0.9$
$\Rightarrow F(10) \in [0.4, 0.46]$ whereas the worst-case bounds were $[0.36, 0.46]$

Page 364: Microeconometrics Lecture Notes

Median restriction
- Weaker characterization is to assume (conditional on $x$) that $w^{0.5(E=1)} \ge w^{0.5(E=0)}$
- Equivalent to $\Pr(E=1|W^* \le w^{0.5(E=1)}, x) \le \Pr(E=1|W^* > w^{0.5(E=1)}, x)$
- Bounds on $F(w|x)$ become

$$F(w|x, E=1)p(x) \le F(w|x) \le F(w|x, E=1)p(x) + [1 - p(x)] \quad \text{if } w < w^{0.5(E=1)}$$

$$F(w|x, E=1)p(x) + 0.5[1 - p(x)] \le F(w|x) \le F(w|x, E=1)p(x) + [1 - p(x)] \quad \text{if } w \ge w^{0.5(E=1)}$$

- Bounds are tightened (relative to worst case) only above the median since the missing term, $F(w|x, E=0)$, is now bounded from below at 0.5 for $w \ge w^{0.5(E=1)}$ (instead of zero)

Page 365: Microeconometrics Lecture Notes

Exclusion restriction

Conditional independence
- Assume $z$ satisfies

$$F(w|x, z) = F(w|x) \ \forall w, x, z$$

- Bounds on $F(w|x)$ become

$$\max_z \{F(w|x, z, E=1)p(x, z)\} \le F(w|x) \le \min_z \{F(w|x, z, E=1)p(x, z) + [1 - p(x, z)]\}$$

- If conditional independence is not true, bounds may cross; failure of bounds to cross does not prove conditional independence holds
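The intersection of worst-case bounds across values of $z$ can be sketched as below. The three cells are made-up numbers for illustration, and `exclusion_restriction_bounds` is a hypothetical helper; the third return value flags whether the bounds cross, which would be evidence against the exclusion restriction.

```python
def exclusion_restriction_bounds(cells):
    """cells: iterable of (F(w|x,z,E=1), p(x,z)) pairs, one per value of z."""
    cells = list(cells)
    lower = max(f * p for f, p in cells)              # tightest lower bound
    upper = min(f * p + (1.0 - p) for f, p in cells)  # tightest upper bound
    return lower, upper, lower > upper

# Hypothetical values of F(w|z,E=1) and p(z) at three values of z.
cells = [(0.40, 0.90), (0.42, 0.70), (0.41, 0.95)]
lower, upper, crossed = exclusion_restriction_bounds(cells)
print(lower, upper, crossed)
```

Here the cell with the highest selection probability drives both bounds, which is typical: values of $z$ that push $p(x, z)$ toward one are the most informative.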

Page 366: Microeconometrics Lecture Notes

Monotonicity
- Higher values of $z$ improve the dbn in a FSD sense

$$F(w|x, z') \le F(w|x, z'') \ \forall w, x, z', z'' \text{ s.t. } z' > z''$$

- Bounds on $F(w|x, z_1)$ become

$$\max_{z \ge z_1} \{F(w|x, z, E=1)p(x, z)\} \le F(w|x, z_1) \le \min_{z \le z_1} \{F(w|x, z, E=1)p(x, z) + [1 - p(x, z)]\}$$

- Bounds on $F(w|x)$ obtained by integrating over the dbn of $z$; entails computing the weighted average of the upper and lower bounds across the different values, $z_1$, where the weights are the sample proportions, $\Pr(z = z_1|x)$

Page 367: Microeconometrics Lecture Notes

Bounding differences in QTEs across groups accounting for non-random selection

Notation
- $D \in \{0, 1\}$ indexes groups
- $T \in \{0, 1\}$ indexes time period

Bounds on QTEs across groups in a given time period

$$w^{q,l}(1, T) - w^{q,u}(0, T) \le w^q(1, T) - w^q(0, T) \le w^{q,u}(1, T) - w^{q,l}(0, T)$$

Bounds on QTEs across time for a given group

$$w^{q,l}(D, 1) - w^{q,u}(D, 0) \le w^q(D, 1) - w^q(D, 0) \le w^{q,u}(D, 1) - w^{q,l}(D, 0)$$

Page 368: Microeconometrics Lecture Notes

Bounds on diff-QTEs across groups

$$[w^q(1, 1) - w^q(0, 1)] - [w^q(1, 0) - w^q(0, 0)] \in [LB, UB]$$

where

$$LB = [w^{q,l}(1, 1) - w^{q,u}(0, 1)] - [w^{q,u}(1, 0) - w^{q,l}(0, 0)]$$

$$UB = [w^{q,u}(1, 1) - w^{q,l}(0, 1)] - [w^{q,l}(1, 0) - w^{q,u}(0, 0)]$$

- Example: Change in median wage gap across males and females over period $T = 0$ to $T = 1$
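The bounds on the diff-QTE combine the four sets of quantile bounds mechanically, as sketched below. The dictionary layout, keyed by (group, period), and the numbers in it are assumptions made for illustration.

```python
def diff_qte_bounds(b):
    """b[(d, t)] = (lower, upper) bounds on the q-th quantile for group d in period t."""
    lb = (b[1, 1][0] - b[0, 1][1]) - (b[1, 0][1] - b[0, 0][0])
    ub = (b[1, 1][1] - b[0, 1][0]) - (b[1, 0][0] - b[0, 0][1])
    return lb, ub

# Hypothetical bounds on median wages by group (D) and period (T).
quantile_bounds = {
    (1, 1): (20, 22),
    (0, 1): (15, 16),
    (1, 0): (18, 19),
    (0, 0): (12, 14),
}
lb, ub = diff_qte_bounds(quantile_bounds)
print(lb, ub)
```

Because four separate intervals are combined, the resulting interval is wide even when each quantile is bounded fairly tightly, which motivates the additional restrictions that follow.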

Page 369: Microeconometrics Lecture Notes

Level set restrictions
- Assume diff-QTE, $[w^q(1, 1) - w^q(0, 1)] - [w^q(1, 0) - w^q(0, 0)]$, is constant across different values of some covariate $x \in X$
- Calculate $LB(x), UB(x) \ \forall x \in X$
- New $LB, UB$ given by

$$LB = \max_{x \in X} LB(x)$$

$$UB = \min_{x \in X} UB(x)$$

Test statistics derived in Blundell et al. for bounds crossings, and for whether the observed conditional distribution, $F(w|x, E=1)$, lies in the bounds

Inference via bootstrap

Page 370: Microeconometrics Lecture Notes

Bounding differences in average treatment effects across groups accounting for non-random selection

Lechner and Melly (2007)

Imai (2008)

Lee (2009)

Huber and Mellace (2011)

Page 371: Microeconometrics Lecture Notes

Data Issues
Contamination

Horowitz and Manski (1995); see also Chen et al. (JEL 2011)

Goal is to bound the marginal distribution of $y^*$, where

$$y_i = d_i y_i^* + (1 - d_i)\tilde{y}_i$$

where $y^*$ is the true value, $\tilde{y}$ is the mismeasured value, and $d = 1$ in the absence of contamination (0 otherwise)

Add more!

Page 372: Microeconometrics Lecture Notes

Data Issues
Measurement Error

Refer to ECO 6374 for refresher on basics...

Problem: sometimes (often!) data are measured imprecisely; see Bound et al. (2001), Millimet (2011)

Page 373: Microeconometrics Lecture Notes

Data Issues
ME: Classical Errors-in-Variables (CEV) model

Continuous dependent variable

$$\underbrace{y_i}_{\text{observed}} = \underbrace{y_i^*}_{\text{actual}} + \underbrace{\mu_i}_{\text{ME}}$$

- Assumptions

(CEV.i) True model: $y_i^* = \alpha + \beta x_i + \varepsilon_i$
(CEV.ii) Normality and Mean Zero: $\mu_i \sim N(0, \sigma_\mu^2)$
(CEV.iii) Independence: $Cov(x, \mu) = 0$

- Implications
  - OLS unbiased, consistent
  - Standard errors are correct
  - Lower $R^2$, higher standard errors due to extra noise in the data

Page 374: Microeconometrics Lecture Notes

Continuous independent variable

$$\underbrace{x_i}_{\text{observed}} = \underbrace{x_i^*}_{\text{actual}} + \underbrace{\mu_i}_{\text{ME}}$$

- Assumptions (in addition to previous assumptions)

(CEV.iv) Independence: $Cov(\mu, \varepsilon) = 0$

- Implications
  - OLS biased, inconsistent unless $\beta = 0$
  - $\hat{\beta}_{OLS}$ suffers from attenuation bias
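Attenuation bias under classical measurement error can be illustrated with a small simulation. The design is hypothetical: true slope of one, unit variances for the true regressor and the measurement error, so the standard CEV result $\text{plim}\,\hat{\beta}_{OLS} = \beta\,\sigma^2_{x^*}/(\sigma^2_{x^*} + \sigma^2_\mu)$ implies a probability limit of 0.5.

```python
import random

def ols_slope(x, y):
    """Slope from a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

random.seed(42)
n, beta = 20000, 1.0
x_true = [random.gauss(0, 1) for _ in range(n)]
y = [0.5 + beta * xt + random.gauss(0, 1) for xt in x_true]
x_obs = [xt + random.gauss(0, 1) for xt in x_true]  # classical ME, variance 1

slope_true = ols_slope(x_true, y)  # infeasible: uses the actual regressor
slope_obs = ols_slope(x_obs, y)    # feasible: uses the mismeasured regressor
print(slope_true, slope_obs)
```

The estimate based on the mismeasured regressor settles near half the true slope, in line with the reliability ratio of 1/2 built into the design.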

Page 375: Microeconometrics Lecture Notes

Data Issues
ME: Binary Dependent Variable (Hausman et al. 1998)

True model

$$D_i^* = x_i^*\beta + \varepsilon_i$$

where $*$ on a variable indicates correctly measured

Given a random sample $\{D_i^*, x_i^*\}_{i=1}^N$, assume logit model is consistent and efficient

- Logit probabilities

$$\Pr(D^* = 1|x^*) = \frac{\exp(x_i^*\beta)}{1 + \exp(x_i^*\beta)}$$

$$\Pr(D^* = 0|x^*) = \frac{1}{1 + \exp(x_i^*\beta)}$$

- Estimation by ML

$$\ln L = \sum_i \{I[D^* = 1]\ln[\Pr(D^* = 1|x^*)] + I[D^* = 0]\ln[\Pr(D^* = 0|x^*)]\}$$

Page 376: Microeconometrics Lecture Notes

With measurement error, do not observe $D_i^*$
- Instead one observes $D_i$
- Introduce following notation

$$\alpha_0 \equiv \Pr(D_i = 1|D_i^* = 0)$$

$$\alpha_1 \equiv \Pr(D_i = 0|D_i^* = 1)$$

- $\alpha_0, \alpha_1$ dependent on $D^*$, but not on $x_i^*$

Page 377: Microeconometrics Lecture Notes

Estimation
- Probabilities of observed responses

$$\Pr(D = 1|x^*) = \Pr(D_i = 1|D_i^* = 0)\Pr(D_i^* = 0|x^*) + \Pr(D_i = 1|D_i^* = 1)\Pr(D_i^* = 1|x^*)$$

$$= \alpha_0 + (1 - \alpha_0 - \alpha_1)\frac{\exp(x_i^*\beta)}{1 + \exp(x_i^*\beta)}$$

$$\Pr(D = 0|x^*) = 1 - \Pr(D = 1|x^*) = 1 - \alpha_0 - (1 - \alpha_0 - \alpha_1)\frac{\exp(x_i^*\beta)}{1 + \exp(x_i^*\beta)}$$

- Estimation by ML

$$\ln L = \sum_i \{I[D = 1]\ln[\Pr(D = 1|x^*)] + I[D = 0]\ln[\Pr(D = 0|x^*)]\}$$

- Extension to probit is trivial
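The observed-response probabilities and the log-likelihood can be written down directly. The sketch below only evaluates the likelihood at given parameter values; it does not maximize it, and the function names and the two-observation data set are hypothetical.

```python
import math

def logistic(index):
    """Logit probability exp(index) / (1 + exp(index))."""
    return 1.0 / (1.0 + math.exp(-index))

def prob_reported_one(index, alpha0, alpha1):
    """Pr(D=1|x) when the true response follows a logit with the given index."""
    return alpha0 + (1.0 - alpha0 - alpha1) * logistic(index)

def log_likelihood(beta, alpha0, alpha1, data):
    """data: list of (x_row, d), with x_row a list of covariates and d in {0, 1}."""
    total = 0.0
    for x_row, d in data:
        index = sum(b * x for b, x in zip(beta, x_row))
        p1 = prob_reported_one(index, alpha0, alpha1)
        total += math.log(p1) if d == 1 else math.log(1.0 - p1)
    return total

# Hypothetical data: first covariate is the constant.
data = [([1.0, 0.5], 1), ([1.0, -1.0], 0)]
p_example = prob_reported_one(0.0, 0.1, 0.2)
ll_example = log_likelihood([0.2, 1.0], 0.1, 0.2, data)
print(p_example, ll_example)
```

Setting both misclassification rates to zero returns the ordinary logit probability, so the standard logit is nested as a special case.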

Page 378: Microeconometrics Lecture Notes

Identification
- In linear probability model (LPM), conditional expectation given by

$$E[D|x^*] = E[D_i|D_i^* = 0]\Pr(D_i^* = 0|x^*) + E[D_i|D_i^* = 1]\Pr(D_i^* = 1|x^*)$$

$$= \alpha_0 + (1 - \alpha_0 - \alpha_1)(x_i^*\beta)$$

$$= \alpha_0 + (1 - \alpha_0 - \alpha_1)(\beta_0 + \tilde{x}_i\beta_1)$$

$$= [\alpha_0 + (1 - \alpha_0 - \alpha_1)\beta_0] + \tilde{x}_i(1 - \alpha_0 - \alpha_1)\beta_1$$

which makes clear that identification of $\alpha_0$, $\alpha_1$, and $\beta$ arises from non-linearity of probit/logit, in addition to ...

- Monotonicity assumption: $\alpha_0 + \alpha_1 < 1$
- Semiparametric alternatives available

Page 379: Microeconometrics Lecture Notes

Data Issues
ME: Binary Independent Variable

True model

$$y_i^* = \alpha + \beta D_i^* + \varepsilon_i, \quad \varepsilon_i \sim N(0, \sigma_{\varepsilon\varepsilon})$$

where $*$ on a variable indicates correctly measured

Given a random sample $\{y_i^*, D_i^*\}_{i=1}^N$, assume OLS is consistent and efficient

With measurement error, do not observe $D_i^*$

Instead one observes $D_i$ where

$$\underbrace{D_i}_{\text{observed}} = \underbrace{D_i^*}_{\text{true}} + \underbrace{\mu_i}_{\text{ME}}$$

which implies that $\mu \in \{0, 1\}$ if $D^* = 0$, and $\mu \in \{0, -1\}$ if $D^* = 1$

Thus, measurement error is
- Not normally distributed (violates CEV.ii)
- Negatively correlated with $D^*$ (violates CEV.iii)

Page 380: Microeconometrics Lecture Notes

Assumptions

(BME.i) Non-differential classification errors: $E[y|D^*] = E[y|D^*, D]$
(BME.ii) $D \perp \varepsilon$
(BME.iii) $Cov(D^*, D) > 0$
(BME.iv) $Cov(D^*, \mu) < 0$

Given (BME.i)-(BME.iv), asymptotic bias given by

$$\text{plim}\,\hat{\beta}_{OLS} = \left(\frac{\sigma_{D^*D^*} + \sigma_{D^*\mu}}{\sigma_{D^*D^*} + 2\sigma_{D^*\mu} + \sigma_{\mu\mu}}\right)\beta$$

Results in attenuation bias for $\beta$ if $\sigma_{D^*\mu} + \sigma_{\mu\mu} > 0$

Likely true for any mismeasured bounded variable
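The probability limit above can be checked by simulation. The sketch assumes a hypothetical error process in which each report is flipped with probability 0.1, independently of everything else, and compares the OLS slope on the reported treatment with the formula evaluated at sample moments.

```python
import random

def cov(a, b):
    """Sample covariance (divisor n)."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((u - ma) * (v - mb) for u, v in zip(a, b)) / n

random.seed(7)
n, beta = 20000, 2.0
d_true = [1 if random.random() < 0.5 else 0 for _ in range(n)]
# Hypothetical misclassification: flip each report with probability 0.1.
d_obs = [1 - d if random.random() < 0.1 else d for d in d_true]
mu = [o - t for o, t in zip(d_obs, d_true)]  # measurement error
y = [1.0 + beta * d + random.gauss(0, 1) for d in d_true]

slope = cov(y, d_obs) / cov(d_obs, d_obs)
s_dd, s_dmu, s_mumu = cov(d_true, d_true), cov(d_true, mu), cov(mu, mu)
predicted = beta * (s_dd + s_dmu) / (s_dd + 2 * s_dmu + s_mumu)
print(slope, predicted, s_dmu)
```

The covariance between the true treatment and the error is negative by construction, yet the slope is still attenuated because $\sigma_{D^*\mu} + \sigma_{\mu\mu} > 0$ in this design.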

Page 381: Microeconometrics Lecture Notes

Millimet (2011) conducts MC study comparing common treatment effect estimators ($\Delta = 1$)

Page 382: Microeconometrics Lecture Notes

Partial solutions (Aigner 1973; Bollinger 1996; Black et al. 2000)

Reverse regression
- Estimate via OLS

$$D_i = \pi_0 + \pi_1 y_i + \upsilon_i$$

- plim given by

$$\text{plim}\,\hat{\pi}_{1,OLS}^{-1} = \frac{\beta^2\sigma_{D^*D^*} + \sigma_{\varepsilon\varepsilon}}{\beta(\sigma_{D^*D^*} + \sigma_{D^*\mu})}$$

which is biased up in absolute value

- Implies $\hat{\beta}_{D^*,OLS} \in [\hat{\beta}_{OLS}, \hat{\pi}_{1,OLS}^{-1}]$, where $\hat{\beta}_{D^*,OLS}$ is the OLS estimate if $D^*$ were observed (Frisch bounds)
- If $R^2$ is low, then bounds obtained using reverse regression may be uninformative
- IV estimation also yields an upper bound (not a consistent estimate!) that may be more informative in many cases
- Inconsistency of IV results from the fact that any instrument correlated with $D^*$ will most likely be correlated with $\mu$ since $Cov(D^*, \mu) \ne 0$

Page 383: Microeconometrics Lecture Notes

Improved lower bound obtained by estimating

$$y_i^* = \alpha + \beta_0 I[D_i = 0, D_i' = 1] + \beta_1 I[D_i = 1, D_i' = 0] + \beta_2 I[D_i = 1, D_i' = 1] + \eta_i$$

where $D_i'$ is a second mis-measured indicator
- If the measurement errors are independent conditional on actual treatment assignment, $D_i^*$, then

$$0 < E[\hat{\beta}_{OLS}] < E[\hat{\beta}_{2,OLS}] < |\beta|$$

Bound $\hat{\beta}_{D^*,OLS}$ under various assumptions concerning severity of measurement error (papers by Kreider and Pepper)

Page 384: Microeconometrics Lecture Notes

Full Solutions

Point estimates possible using method-of-moments framework

Brachet (2008) proposes following algorithm
1. Estimate Hausman et al. misclassification probit, including an instrument $z$ in the first stage
2. Replace $D$ with $\widehat{\Pr}(D_i^* = 1|x, z)$ in second stage

McCarthy & Tchernis (2011) consider a similar approach in a Bayesian framework

Page 385: Microeconometrics Lecture Notes

Partial solutions (Kreider & Pepper 2007)

Utilize a non-regression approach to bound the effect of a mis-measured binary treatment

Authors do not wish to invoke (BME.i), which implies that mis-reporting is independent of outcomes conditional on the truth

Notation
- $y \in \{0, 1\}$ is a binary outcome (correctly measured)
- $D^* \in \{0, 1\}$ is the true binary treatment
- $D \in \{0, 1\}$ is the reported binary treatment
- $Z \in \{0, 1\}$, where $Z = 1$ if $D = D^*$ and 0 otherwise

Estimand of interest: $\Delta = \Pr(y = 1|D^* = 1) - \Pr(y = 1|D^* = 0)$

Data provides an estimate of $\Pr(y = 1|D)$

Page 386: Microeconometrics Lecture Notes

Manipulation yields

$$\Pr(y = 1|D^* = 1) = \frac{\Pr(y = 1, D^* = 1)}{\Pr(D^* = 1)} = \frac{\Pr(y = 1, D = 1) + \Pr(y = 1, D = 0, Z = 0) - \Pr(y = 1, D = 1, Z = 0)}{\Pr(D = 1) + \Pr(D = 0, Z = 0) - \Pr(D = 1, Z = 0)}$$

where $\Pr(D = 1, Z = 0)$ is a false positive and $\Pr(D = 0, Z = 0)$ is a false negative

Data provide estimates of $\Pr(y = 1, D = 1)$, $\Pr(D = 1)$

Other elements are unknown, but bounded by the unit interval

Page 387: Microeconometrics Lecture Notes

Lower-Bound Accurate Reporting Rate
- Assume $\Pr(Z = 1) \ge v$
- Can show that

$$\Pr(y = 1|D^* = 1) \in \left[\frac{\Pr(y = 1, D = 1) - \delta}{\Pr(D = 1) - 2\delta + (1 - v)}, \ \frac{\Pr(y = 1, D = 1) + \gamma}{\Pr(D = 1) + 2\gamma - (1 - v)}\right]$$

where

$$\delta = \begin{cases} \min\{(1 - v), \Pr(y = 1, D = 1)\} & \text{if } \Pr(y = 1, D = 1) - \Pr(y = 0, D = 1) - (1 - v) \le 0 \\ \max\{0, (1 - v) - \Pr(y = 0, D = 0)\} & \text{otherwise} \end{cases}$$

$$\gamma = \begin{cases} \min\{(1 - v), \Pr(y = 1, D = 0)\} & \text{if } \Pr(y = 1, D = 1) - \Pr(y = 0, D = 1) + (1 - v) \le 0 \\ \max\{0, (1 - v) - \Pr(y = 0, D = 1)\} & \text{otherwise} \end{cases}$$

- Bounds for $\Pr(y = 1|D^* = 0)$ are obtained by replacing $D$ with $1 - D$
- Bounds for each term obtained by replacing elements with sample analogs
- Bounds for $\Delta$ obtained using relevant upper and lower bounds for each term
- When $v = 1$, bounds collapse to a point estimate
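The bounds under a lower-bound accurate reporting rate can be computed from the four reported joint probabilities, as sketched below. The function name, argument names, and the joint probabilities in the example are hypothetical, and the sketch does not guard against a zero denominator.

```python
def bounds_treated_outcome(py1d1, py0d1, py1d0, py0d0, v):
    """Bounds on Pr(y=1|D*=1) given reported joint probabilities
    pyjdk = Pr(y=j, D=k) and a lower bound v on Pr(Z=1)."""
    slack = 1.0 - v            # maximum share of misreported observations
    pd1 = py1d1 + py0d1        # Pr(D=1)
    if py1d1 - py0d1 - slack <= 0:
        delta = min(slack, py1d1)
    else:
        delta = max(0.0, slack - py0d0)
    if py1d1 - py0d1 + slack <= 0:
        gamma = min(slack, py1d0)
    else:
        gamma = max(0.0, slack - py0d1)
    lower = (py1d1 - delta) / (pd1 - 2 * delta + slack)
    upper = (py1d1 + gamma) / (pd1 + 2 * gamma - slack)
    return lower, upper

# Hypothetical reported joint distribution of (y, D).
kp_lower, kp_upper = bounds_treated_outcome(0.25, 0.25, 0.10, 0.40, v=0.9)
kp_point = bounds_treated_outcome(0.25, 0.25, 0.10, 0.40, v=1.0)
print(kp_lower, kp_upper, kp_point)
```

In this example, allowing up to 10 percent of reports to be inaccurate widens a point estimate of 0.5 to the interval [0.375, 0.625]; with $v = 1$ the two bounds coincide.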

Page 388: Microeconometrics Lecture Notes

Partial Verification
- Might assume a lower bound for accuracy among some sub-group whose status is more certain, $W = 1$
- Assume $\Pr(Z = 1|W = 1) \ge v_w$
- Can show that

$$\Pr(y = 1|D^* = 1) \in \left[\frac{\Pr(y = 1, D = 1, W = 1) - \delta}{\Pr(D = 1, W = 1) + \Pr(y = 0, W = 0) - 2\delta + (1 - v_w)\Pr(W = 1)}, \ \frac{\Pr(y = 1, D = 1, W = 1) + \Pr(y = 1, W = 0) + \gamma}{\Pr(D = 1, W = 1) + \Pr(y = 1, W = 0) + 2\gamma - (1 - v_w)\Pr(W = 1)}\right]$$

where

$$\delta = \begin{cases} \min\{(1 - v_w)\Pr(W = 1), \Pr(y = 1, D = 1)\} & \text{if } \alpha \le 0 \\ \max\{0, (1 - v_w)\Pr(W = 1) - \Pr(y = 0, D = 0, W = 1)\} & \text{otherwise} \end{cases}$$

$$\gamma = \begin{cases} \min\{(1 - v_w)\Pr(W = 1), \Pr(y = 1, D = 0)\} & \text{if } \alpha' \le 0 \\ \max\{0, (1 - v_w)\Pr(W = 1) - \Pr(y = 0, D = 1, W = 1)\} & \text{otherwise} \end{cases}$$

$$\alpha = \Pr(y = 1, D = 1, W = 1) - \Pr(y = 0, D = 1, W = 1) - \Pr(y = 0, W = 0) - (1 - v_w)\Pr(W = 1)$$

$$\alpha' = \Pr(y = 1, D = 1, W = 1) - \Pr(y = 0, D = 1, W = 1) + \Pr(y = 1, W = 0) + (1 - v_w)\Pr(W = 1)$$

- If $v_w = 1$, then one has full verification for an observed sub-sample $\Rightarrow$ bounds are tightened

Page 389: Microeconometrics Lecture Notes

Combine the prior assumptions with a Monotone IV assumption to possibly further tighten the bounds

MIV Assumption
- $\exists\, x$ s.t.

$$x_0 \in [x_1, x_2] \Rightarrow \Pr(y = 1|D^*, x_0) \in [\Pr(y = 1|D^*, x_1), \Pr(y = 1|D^*, x_2)]$$

- Implies that $\Pr(y = 1|D^*, x)$ is weakly monotonically increasing in $x$
- Proceed by
  - Computing bounds conditional on different values of $x$
  - Obtaining unconditional bounds by integrating over the dbn of $x$

Kreider & Hill (2009), Kreider et al. (2011) combine this methodology on reporting errors with prior methods on bounding treatment effects under SOU

Imai & Yamamoto (2010) offer a similar analysis in poli sci

Page 390: Microeconometrics Lecture Notes

Partial solutions (Battistin & Sianesi 2009)

Consider ME of a binary or multi-valued treatment in the context of propensity score estimators

Setup

(MPS.i) CIA given no ME

$$y_0, y_1 \perp D^* | x$$

(MPS.ii) CS given no ME

$$p^*(x) = \Pr(D^* = 1|x) \in (0, 1) \ \forall x$$

- $D^*$ is not observed, instead $D$ is, where $D_i \ne D_i^*$ for at least some $i$

Estimation based on $D^*$ yields

$$\hat{\Delta}^*_{ATE} = E\{E[y|D^* = 1, x] - E[y|D^* = 0, x]\}$$

where the outer expectation is over $S^*$, where

$$S^* = \{x : p^*(x) = \Pr(D^* = 1|x) \in (0, 1)\}$$

In contrast, estimation based on $D \Rightarrow \hat{\Delta}_{ATE}$

Page 391: Microeconometrics Lecture Notes

Notation
- (Mis)classification probabilities given by

$$\lambda_{jj'}(x) = \Pr(D^* = j|D = j', x), \quad j, j' \in \{0, 1\}$$

  - $\lambda_{10}$ = proportion of incorrect reported zeros
  - $\lambda_{01}$ = proportion of incorrect reported ones

- Condensed notation for correct reporting rates

$$\lambda_{00}(x) = \lambda_0(x) = \Pr(D^* = 0|D = 0, x)$$

$$\lambda_{11}(x) = \lambda_1(x) = \Pr(D^* = 1|D = 1, x)$$

- Matrix of (mis)classification probabilities can be written in terms of $\lambda_0, \lambda_1$

$$\Lambda(x) = \begin{bmatrix} \lambda_0(x) & 1 - \lambda_0(x) \\ 1 - \lambda_1(x) & \lambda_1(x) \end{bmatrix}$$

Page 392: Microeconometrics Lecture Notes

Assumptions

(MPS.iii) Non-differential classification errors: $E[y|D^*, x] = E[y|D^*, D, x]$
(MPS.iv) Informative reported treatment status: $\lambda_0(x) + \lambda_1(x) - 1 \ne 0$

Outcomes conditional on $D$ can be written as a weighted average of outcomes conditional on $D^*$

$$\begin{bmatrix} E[y|D = 0, x] \\ E[y|D = 1, x] \end{bmatrix} = \Lambda(x) \begin{bmatrix} E[y|D^* = 0, x] \\ E[y|D^* = 1, x] \end{bmatrix} \Rightarrow \begin{bmatrix} E[y|D^* = 0, x] \\ E[y|D^* = 1, x] \end{bmatrix} = \Lambda^{-1}(x) \begin{bmatrix} E[y|D = 0, x] \\ E[y|D = 1, x] \end{bmatrix}$$

provided $\det[\Lambda(x)] = \lambda_0(x) + \lambda_1(x) - 1 \ne 0$

Two cases satisfy (MPS.iv)
- Minimal classification errors: $\lambda_0(x) + \lambda_1(x) > 1$
- Severe classification errors: $\lambda_0(x) + \lambda_1(x) < 1$

Page 393: Microeconometrics Lecture Notes

The bias when using $D$ is

$$\Delta_{ATE}(x) = [\lambda_0(x) + \lambda_1(x) - 1]\,\Delta^*_{ATE}(x)$$

Implications:
- $\Delta_{ATE}(x)$ is unbiased if $\lambda_0 = \lambda_1 = 1$
- $\Delta_{ATE}(x)$ suffers from attenuation bias if $\lambda_0(x) + \lambda_1(x) > 1$
- $\Delta_{ATE}(x)$ suffers from attenuation bias AND $\text{sgn}[\Delta_{ATE}(x)] \ne \text{sgn}[\Delta^*_{ATE}(x)]$ if $\lambda_0(x) + \lambda_1(x) < 1$
- $\Delta_{ATE}(x) = -\Delta^*_{ATE}(x)$ if $\lambda_0 = \lambda_1 = 0$
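The relationship between the effect based on reported status and the true effect, at a given $x$, is a one-line rescaling in either direction. The sketch below uses hypothetical function names and made-up correct-reporting rates to illustrate the implications listed above.

```python
def reported_effect(true_effect, lam0, lam1):
    """Effect obtained with reported status D, given the true effect at x."""
    return (lam0 + lam1 - 1.0) * true_effect

def corrected_effect(reported, lam0, lam1):
    """Invert the relationship; requires informative reports (lam0 + lam1 != 1)."""
    det = lam0 + lam1 - 1.0
    if det == 0.0:
        raise ValueError("reported status is uninformative: lam0 + lam1 = 1")
    return reported / det

attenuated = reported_effect(1.0, 0.9, 0.9)    # minimal errors: shrinks toward zero
sign_flipped = reported_effect(1.0, 0.4, 0.4)  # severe errors: sign reverses
mirrored = reported_effect(1.0, 0.0, 0.0)      # every report wrong: exact negative
recovered = corrected_effect(reported_effect(2.0, 0.9, 0.8), 0.9, 0.8)
print(attenuated, sign_flipped, mirrored, recovered)
```

Since the rescaling factor is the determinant of $\Lambda(x)$, the correction is undefined exactly when (MPS.iv) fails.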

Page 394: Microeconometrics Lecture Notes

The bias of the unconditional ATE, $\Delta_{ATE}$, also depends on the erroneous determination of the CS

- Can show that

$$p(x) = \frac{p^*(x) - [1 - \lambda_0(x)]}{\lambda_0(x) + \lambda_1(x) - 1}$$

- This implies that boundary values of $p(x)$ can be obtained even if $p^*(x) \in (0, 1)$ if

$$p(x) = 0 \iff \lambda_0(x) = 1 - p^*(x)$$

$$p(x) = 1 \iff \lambda_1(x) = p^*(x)$$

To ensure one does not utilize a different CS based on $D$, must assume

(MPS.v) $\lambda_0(x) \ne 1 - p^*(x)$ and $\lambda_1(x) \ne p^*(x)$

Page 395: Microeconometrics Lecture Notes

Estimation
- Under (MPS.i)-(MPS.v)

$$\Delta^*_{ATE} = \int_S \omega(x)\Delta_{ATE}(x)f(x)dx = \Delta_{ATE} + \int_S [\omega(x) - 1]\Delta_{ATE}(x)f(x)dx$$

where

$$\omega(x) = \frac{\Pr(D = 1)}{\Pr(D^* = 1)}\left[1 + \frac{1}{p(x)}\,\frac{1 - \lambda_0(x)}{\lambda_0(x) + \lambda_1(x) - 1}\right]$$

$$\Pr(D^* = 1) = \int_S [1 - \lambda_0(x)]f(x)dx + \int_S [\lambda_0(x) + \lambda_1(x) - 1]p(x)f(x)dx$$

- Shows that $\Delta^*_{ATE}$ can be obtained from an appropriately weighted average of $\Delta_{ATE}(x)$
- Weights depend on $\lambda_0(x)$, $\lambda_1(x)$

Page 396: Microeconometrics Lecture Notes

Notes
- Bounds obtained by computing $\hat{\Delta}^*_{ATE}(\lambda_0, \lambda_1)$ over a grid of values and obtaining the lower and upper bounds
  - Restrictions on possible values of $\lambda$s can be imposed based on prior info
  - $\hat{\Delta}^*_{ATE}(\lambda_0, \lambda_1)$ can be obtained using any propensity-score based estimator
  - In their paper, they use a (5 strata) stratification estimator and assume $(\lambda_0, \lambda_1)$ are stratum-specific
- Extension to multi-valued treatments provided as well
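A stripped-down sketch of the grid approach follows. It is not the authors' estimator: it assumes, purely for illustration, that the correct-reporting rates do not vary with $x$, so that the corrected effect is the reported effect divided by $\lambda_0 + \lambda_1 - 1$ and no reweighting is needed. The function name and the numbers are hypothetical.

```python
def ate_bounds_over_grid(reported_ate, lam_min, lam_max, steps=21):
    """Lower and upper bounds on the corrected effect over a grid of
    (lam0, lam1) values, assuming lambdas are constant in x."""
    grid = [lam_min + (lam_max - lam_min) * k / (steps - 1) for k in range(steps)]
    candidates = [
        reported_ate / (l0 + l1 - 1.0)
        for l0 in grid
        for l1 in grid
        if abs(l0 + l1 - 1.0) > 1e-12  # skip uninformative combinations
    ]
    return min(candidates), max(candidates)

# Prior information (hypothetical): at least 90% of reports in each arm are correct.
grid_lower, grid_upper = ate_bounds_over_grid(0.8, 0.9, 1.0)
print(grid_lower, grid_upper)
```

Tightening the admissible range of the $\lambda$s narrows the interval, which is how prior information on reporting accuracy enters.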

Page 397: Microeconometrics Lecture Notes

Data Issues
ME: Missing Binary Independent Variable

Molinari (2010) applies similar bounding approach to analyze the case where $D$ is missing, possibly non-randomly, due to subject non-response

- Examples:
  - Respondents refuse to answer questions concerning drug use, welfare use, etc.

Page 398: Microeconometrics Lecture Notes

Millimet (2011) MC study also compares common treatment effect estimators when $y$ or $x$ is measured with error (do not forget the rest of the data! ... $\Delta = 1$)

Page 399: Microeconometrics Lecture Notes


Page 400: Microeconometrics Lecture Notes

Data Issues
ME: Persistence of Treatment Effects

Often neglected in applied research is the question of whether treatment effects are persistent

Clearly relevant for policymakers; an investment that improves outcomes for one period only has different benefits than an investment that yields a permanent improvement in outcomes

Jacob et al. (2010) propose an interesting method to estimate the degree of persistence in a treatment effect (under certain circumstances)

Method relies on preceding analysis of measurement error

Page 401: Microeconometrics Lecture Notes

Setup

$$y_{it} = y^L_{it} + y^S_{it}$$

where $y$ is the outcome, which is decomposed into a LR component, $y^L$, and a SR component, $y^S$

- The two components are given by

$$y^S_{it} = \tau_S D_{it} + \varepsilon^S_{it}$$

$$y^L_{it} = \delta y^L_{it-1} + \tau_L D_{it} + \varepsilon^L_{it}$$

where $D$ is a treatment (binary, discrete, or continuous)
- Interpretation of parameters
  - $\delta$ = persistence of the LR component of $y$ (by definition, the SR component completely decays each period)
  - $\tau_S, \tau_L$ = the (common) treatment effect on $y^S, y^L$

Goal: say something about $\delta$, $\tau_S$, and $\tau_L$

Page 402: Microeconometrics Lecture Notes

Consider trying to estimate the LR component equation

$$y^L_{it} = \delta y^L_{it-1} + \tau_L D_{it} + \varepsilon^L_{it}$$

Problem: $y^L_{it}$, $y^L_{it-1}$ are unobserved; only $y$ is observed

Some algebra yields

$$y_{it} - y^S_{it} = \delta(y_{it-1} - y^S_{it-1}) + \tau_L D_{it} + \varepsilon^L_{it}$$

$$\Rightarrow y_{it} = \delta y_{it-1} + \tau_L D_{it} + [y^S_{it} - \delta y^S_{it-1} + \varepsilon^L_{it}]$$

Notes
- $Cov(y_{it-1}, y^S_{it-1}) \ne 0$ ... $y^S_{it-1}$ is analogous to ME in the desired covariate, $y^L_{it-1}$
- $Cov(D_{it}, y^S_{it}) \ne 0$ if $\tau_S \ne 0$

Circumvent this second issue by incorporating $D_{it}$ into the error term

$$y_{it} = \delta y_{it-1} + [\tau_L D_{it} + y^S_{it} - \delta y^S_{it-1} + \varepsilon^L_{it}] = \delta y_{it-1} + \upsilon_{it}$$

Page 403: Microeconometrics Lecture Notes

Comparison of estimators ...

OLS yields

$$\text{plim}\,\hat{\delta}_{OLS} = \delta\left(\frac{\sigma^2_{y^L}}{\sigma^2_{y^L} + \sigma^2_{y^S}}\right) < \delta$$

using the CEV formula discussed previously

IV using $y_{it-2}$ as an instrument

$$\text{plim}\,\hat{\delta}_{IV,1} = \delta$$

if $Cov(y_{it-2}, \varepsilon^L_{it}) = Cov(y_{it-2}, \varepsilon^S_{it}) = Cov(y_{it-2}, \varepsilon^S_{it-1}) = Cov(y_{it-2}, D_{it}) = 0$, implying that $y_{it-2}$ is predetermined and uncorrelated with future treatment status

Page 404: Microeconometrics Lecture Notes

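IV using $D_{it-1}$ as an instrument

$$\text{plim}\,\hat{\delta}_{IV,2} = \frac{Cov(y_{it}, D_{it-1})}{Cov(y_{it-1}, D_{it-1})} = \frac{Cov(\delta y_{it-1} + \tau_L D_{it} + y^S_{it} - \delta y^S_{it-1} + \varepsilon^L_{it}, D_{it-1})}{Cov(y_{it-1}, D_{it-1})}$$

$$= \delta + \frac{Cov(\tau_L D_{it} + y^S_{it} - \delta y^S_{it-1} + \varepsilon^L_{it}, D_{it-1})}{Cov(y_{it-1}, D_{it-1})}$$

- Assume $Cov(D_{it-1}, D_{it}) = Cov(D_{it-1}, \varepsilon^S_{it}) = Cov(D_{it-1}, \varepsilon^L_{it}) = 0$
- But, $Cov(D_{it-1}, y^S_{it-1}) \ne 0 \Rightarrow D_{it-1}$ is not a valid IV

$$\text{plim}\,\hat{\delta}_{IV,2} = \delta + \frac{Cov(-\delta y^S_{it-1}, D_{it-1})}{Cov(y_{it-1}, D_{it-1})} = \delta\left(1 - \frac{Cov(y^S_{it-1}, D_{it-1})}{Cov(y_{it-1}, D_{it-1})}\right)$$

$$= \delta\left(1 - \frac{\tau_S Var(D_{it-1})}{(\tau_S + \tau_L)Var(D_{it-1})}\right) = \delta\left(\frac{\tau_L}{\tau_S + \tau_L}\right)$$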

Page 405: Microeconometrics Lecture Notes

Notes:
- Combination of OLS and IV1 can estimate the relative contribution of $y^L$ to $y$
- Combination of IV1 and IV2 can estimate the relative contribution of $D$ to the LR component
- $x$s can be incorporated by redefining $\varepsilon^L_{it} = x_{it}\beta^L + \tilde{\varepsilon}^L_{it}$
- Model requires $Cov(D_{it-1}, D_{it}) = 0$, ruling out treatments which persist themselves (e.g., treaties)
  - Examples (perhaps): class size, R&D (?)
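The first two notes follow from taking ratios of the three probability limits, which can be sketched as below. The function name and the parameter values in the example are hypothetical; in practice the three inputs would be the OLS, IV1, and IV2 estimates.

```python
def persistence_decomposition(delta_ols, delta_iv1, delta_iv2):
    """Ratios implied by the plims of the OLS, IV1, and IV2 estimators."""
    # OLS / IV1 -> sigma2_yL / (sigma2_yL + sigma2_yS)
    long_run_variance_share = delta_ols / delta_iv1
    # IV2 / IV1 -> tau_L / (tau_S + tau_L)
    long_run_effect_share = delta_iv2 / delta_iv1
    return long_run_variance_share, long_run_effect_share

# Hypothetical population values: delta = 0.8, sigma2_yL = 3, sigma2_yS = 1,
# tau_L = 1, tau_S = 3, so plim OLS = 0.6, plim IV1 = 0.8, plim IV2 = 0.2.
variance_share, effect_share = persistence_decomposition(0.6, 0.8, 0.2)
print(variance_share, effect_share)
```

In this example three quarters of the variance in the outcome is long-run, but only one quarter of the treatment effect operates through the long-run component.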

Page 406: Microeconometrics Lecture Notes

In conclusion, listen to the words of Sims (2010):

"Natural, quasi-, and computational experiments, as well as regression discontinuity design (RDD), can all, when well applied, be useful, but none are panaceas... Because we are not an experimental science, we face difficult problems of inference. The same data generally are subject to multiple interpretations. It is not that we learn nothing from data, but that we have at best the ability to use data to narrow the range of substantive disagreement. We are always combining the objective information in the data with judgment, opinion and/or prejudice to reach conclusions...

Natural experiments, difference-in-difference, and regression discontinuity design are good ideas. They have not taken the con out of econometrics - in fact, as with any popular econometric technique, they in some cases have become the vector by which "con" is introduced into applied studies. Furthermore, over-enthusiasm about these methods, when it leads to claims that single-equation linear models with sandwiched errors are all we ever really need, can lead to our training applied economists who do not understand how to fully model a dataset."

Page 407: Microeconometrics Lecture Notes

In light of these sentiments, recall the points made at the start of thiscourse:

Prior to conducting, or when reviewing, causal analyses, questions that need to be answered:

1. What is the causal relationship of interest? [Is it economically interesting?]
2. What is the identification strategy?
3. What parameter are you actually estimating?
4. To whom does the parameter apply?
5. What question does the analysis answer?
6. What is the method of statistical inference?

While applied work is open to multiple interpretations, these interpretations and objections to research are lessened when one is precise in answering these questions.