8/10/2019 Wooldridge Session 5
Difference-in-Differences Estimation
Jeff Wooldridge, Michigan State University
Programme Evaluation for Policy Analysis, Institute for Fiscal Studies
June 2012
1. The Basic Methodology
2. How Should We View Uncertainty in DD Settings?
3. Inference with a Small Number of Groups
4. Multiple Groups and Time Periods
5. Individual-Level Panel Data
6. Semiparametric and Nonparametric Approaches
7. Synthetic Control Methods for Comparative Case Studies
1. The Basic Methodology
Standard case: outcomes are observed for two groups for two time periods. One of the groups is exposed to a treatment in the second period but not in the first period. The second group is not exposed to the treatment during either period. Structure can apply to repeated cross sections or panel data.
With repeated cross sections, let A be the control group and B the treatment group. Write

y = β₀ + β₁·dB + δ₀·d2 + δ₁·d2·dB + u, (1)

where y is the outcome of interest, dB = 1 for the treatment group, and d2 = 1 for the second time period.
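Equation (1) can be checked numerically: because the regression is saturated in the two dummies and their interaction, the OLS coefficient on d2·dB equals the difference of the four cell-mean differences. A minimal sketch with simulated repeated cross sections (the sample size, coefficients, and true effect of 1.5 are assumptions of the illustration, not from the slides):

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated repeated cross sections: dB = treatment-group dummy, d2 = second-period dummy.
n = 2000
dB = rng.integers(0, 2, n)
d2 = rng.integers(0, 2, n)
y = 0.5 + 0.3 * dB + 0.2 * d2 + 1.5 * d2 * dB + rng.normal(0, 1, n)  # true DD effect = 1.5

# OLS on equation (1): y = b0 + b1*dB + d0*d2 + d1*(d2*dB) + u
X = np.column_stack([np.ones(n), dB, d2, d2 * dB])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
dd_ols = coef[3]

# The same number as a "difference of differences" of the four cell means
cell = lambda g, t: y[(dB == g) & (d2 == t)].mean()
dd_means = (cell(1, 1) - cell(1, 0)) - (cell(0, 1) - cell(0, 0))
```

With the model saturated in the dummies, dd_ols and dd_means agree to machine precision.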
Can refine the definition of treatment and control groups. Example: change in state health care policy aimed at the elderly. Could use data only on people in the state with the policy change, both before and after the change, with the control group being people 55 to 65 (say) and the treatment group being people over 65. This DD analysis assumes that the paths of health outcomes for the younger and older groups would not be systematically different in the absence of intervention.
Instead, use the same two groups from another (untreated) state as an additional control. Let dE be a dummy equal to one for someone over 65 and dB be the dummy for living in the treatment state:

y = β₀ + β₁·dB + β₂·dE + β₃·dB·dE + δ₀·d2 + δ₁·d2·dB + δ₂·d2·dE + δ₃·d2·dB·dE + u. (3)
The OLS estimate δ̂₃ is

δ̂₃ = [(ȳ_{B,E,2} − ȳ_{B,E,1}) − (ȳ_{B,N,2} − ȳ_{B,N,1})] − [(ȳ_{A,E,2} − ȳ_{A,E,1}) − (ȳ_{A,N,2} − ȳ_{A,N,1})], (4)

where the A subscript means the state not implementing the policy and the N subscript represents the non-elderly. This is the difference-in-difference-in-differences (DDD) estimate.
Can add covariates to either the DD or DDD analysis to (hopefully) control for compositional changes. Even if the intervention is independent of observed covariates, adding those covariates may improve precision of the DD or DDD estimate.
2. How Should We View Uncertainty in DD Settings?
Standard approach: All uncertainty in inference enters through sampling error in estimating the means of each group/time period combination. Long history in analysis of variance.
Recently, different approaches have been suggested that focus on different kinds of uncertainty, perhaps in addition to sampling error in estimating means. Bertrand, Duflo, and Mullainathan (2004), Donald and Lang (2007), Hansen (2007a,b), and Abadie, Diamond, and Hainmueller (2007) argue for additional sources of uncertainty.
In fact, in the new view, the additional uncertainty swamps the
sampling error in estimating group/time period means.
One way to view the uncertainty introduced in the DL framework (a perspective explicitly taken by ADH) is that our analysis should better reflect the uncertainty in the quality of the control groups.
ADH show how to construct a synthetic control group (for California) using pre-treatment characteristics of other states (that were not subject to cigarette smoking restrictions) to choose the best weighted average of states in constructing the control.
Issue: In the standard DD and DDD cases, the policy effect is just identified in the sense that we do not have multiple treatment or control groups assumed to have the same mean responses. So, for example, the Donald and Lang approach does not allow inference in such cases.
Example from Meyer, Viscusi, and Durbin (1995) on estimating the effects of benefit generosity on the length of time a worker spends on workers' compensation. MVD have the standard DD before-after setting.
. reg ldurat afchnge highearn afhigh if ky, robust

Linear regression                             Number of obs =   5626
                                              F(3, 5622)    =  38.97
                                              Prob > F      = 0.0000
                                              R-squared     = 0.0207
                                              Root MSE      = 1.2692

------------------------------------------------------------------------------
             |               Robust
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0076573   .0440344     0.17   0.862     -.078667    .0939817
    highearn |   .2564785   .0473887     5.41   0.000     .1635785    .3493786
      afhigh |   .1906012    .068982     2.76   0.006     .0553699    .3258325
       _cons |   1.125615   .0296226    38.00   0.000     1.067544    1.183687
------------------------------------------------------------------------------
. reg ldurat afchnge highearn afhigh if mi, robust

Linear regression                             Number of obs =   1524
                                              F(3, 1520)    =   5.65
                                              Prob > F      = 0.0008
                                              R-squared     = 0.0118
                                              Root MSE      = 1.3765

------------------------------------------------------------------------------
             |               Robust
      ldurat |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
     afchnge |   .0973808   .0832583     1.17   0.242    -.0659325    .2606941
    highearn |   .1691388   .1070975     1.58   0.114    -.0409358    .3792133
      afhigh |   .1919906   .1579768     1.22   0.224     -.117885    .5018662
       _cons |   1.412737   .0556012    25.41   0.000     1.303674      1.5218
------------------------------------------------------------------------------
3. Inference with a Small Number of Groups
Suppose we have aggregated data on few groups (small G) and large group sizes (each M_g is large). Some of the groups are subject to a policy intervention.
How is the sampling done? With random sampling from a large population, no clustering is needed.
Sometimes we have random sampling within each segment (group) of the population. Except for the relative dimensions of G and M_g, the resulting data set is essentially indistinguishable from a data set obtained by sampling entire clusters.
The problem of proper inference when M_g is large relative to G (the Moulton (1990) problem) has been recently studied by Donald and Lang (2007).
DL treat the parameters associated with the different groups as outcomes of random draws.
Simplest case: a single regressor that varies only by group:

y_gm = α + β·x_g + c_g + u_gm (5)
     = θ_g + β·x_g + u_gm. (6)

(6) has a common slope, β, but an intercept, θ_g, that varies across g. Donald and Lang focus on (5), where c_g is assumed to be independent of x_g with zero mean. Define the composite error v_gm = c_g + u_gm.
Standard pooled OLS inference applied to (5) can be badly biased because it ignores the cluster correlation. And we cannot use fixed effects.
DL propose studying the regression in averages:

ȳ_g = α + β·x_g + v̄_g, g = 1, ..., G. (7)

If we add some strong assumptions, we can perform inference on (7) using standard methods. In particular, assume that M_g = M for all g, c_g|x_g ~ Normal(0, σ²_c), and u_gm|x_g, c_g ~ Normal(0, σ²_u). Then v̄_g is independent of x_g and v̄_g ~ Normal(0, σ²_c + σ²_u/M). Because we assume independence across g, (7) satisfies the classical linear model assumptions.
So, we can just use the "between" regression,

ȳ_g on 1, x_g, g = 1, ..., G. (8)

With the same group sizes, this is identical to pooled OLS across g and m.
Conditional on the x_g, β̂ inherits its distribution from {v̄_g : g = 1, ..., G}, the within-group averages of the composite errors.
We can use inference based on the t_{G−2} distribution to test hypotheses about β, provided G > 2.
If G is small, the requirements for a significant t statistic using the t_{G−2} distribution are much more stringent than if we use the t_{M₁+M₂+...+M_G−2} distribution (traditional approach).
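The DL procedure on (8) is easy to sketch: average the data to the group level, run OLS of ȳ_g on (1, x_g), and base the t statistic on G − 2 degrees of freedom. All numbers below (G, M, the variances, the true slope of 2) are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
G, M = 10, 200
beta = 2.0                                   # assumed true slope
xg = rng.normal(size=G)                      # regressor varying only by group
cg = rng.normal(scale=0.5, size=G)           # group effect c_g, independent of x_g
y = np.array([1.0 + beta * xg[g] + cg[g] + rng.normal(size=M) for g in range(G)])

# Donald-Lang: OLS of the group means on (1, x_g), inference with G - 2 df.
ybar = y.mean(axis=1)
X = np.column_stack([np.ones(G), xg])
coef, res_ss, *_ = np.linalg.lstsq(X, ybar, rcond=None)
beta_hat = coef[1]
sigma2 = res_ss[0] / (G - 2)                 # residual variance on G - 2 df
se_beta = np.sqrt(sigma2 * np.linalg.inv(X.T @ X)[1, 1])
t_stat = beta_hat / se_beta                  # compare to a t critical value with G - 2 df
```

The point is the degrees of freedom: the same t statistic judged against t₈ here, rather than t with roughly G·M degrees of freedom, is a much higher hurdle.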
Using the between regression is not the same as using cluster-robust standard errors for pooled OLS. Those are not justified and, anyway, we would use the wrong df in the t distribution.
So the DL method uses a standard error from the aggregated regression and degrees of freedom G − 2.
We can apply the DL method without normality of the u_gm if the group sizes are large, because Var(v̄_g) = σ²_c + σ²_u/M_g, so that ū_g is a negligible part of v̄_g. But we still need to assume c_g is normally distributed.
If z_gm appears in the model, then we can use the averaged equation

ȳ_g = α + β·x_g + z̄_g·γ + v̄_g, g = 1, ..., G, (9)

provided G > K + L + 1. If c_g is independent of (x_g, z̄_g) with a homoskedastic normal distribution, and the group sizes are large, inference can be carried out using the t_{G−K−L−1} distribution. Regressions like (9) are reasonably common, at least as a check on results using disaggregated data, but usually with larger G than just a handful.
If G = 2, should we give up? Suppose x_g is binary, indicating treatment and control (g = 2 is the treatment, g = 1 is the control). The DL estimate of β is the usual one: β̂ = ȳ₂ − ȳ₁. But in the DL setting, we cannot do inference (there are zero df). So, the DL setting rules out the standard comparison of means.
Can we still obtain inference on estimated policy effects using randomized or quasi-randomized interventions when the policy effects are just identified? Not according to the DL approach.
If y_gm = Δw_gm, the change of some variable over time, and x_g is binary, then application of the DL approach to

Δw_gm = α + β·x_g + c_g + u_gm

leads to a difference-in-differences estimate: β̂ = Δw̄₂ − Δw̄₁. But inference is not available no matter the sizes of M₁ and M₂.
β̂ = Δw̄₂ − Δw̄₁ has been a workhorse in the quasi-experimental literature, and obtaining inference in the traditional setting is straightforward [Card and Krueger (1994), for example].
Even when the DL approach can be applied, should we? Suppose G = 4 with two control groups (x₁ = x₂ = 0) and two treatment groups (x₃ = x₄ = 1). DL involves the OLS regression ȳ_g on 1, x_g, g = 1, ..., 4; inference is based on the t₂ distribution.
Can show the DL estimate is

β̂ = (ȳ₃ + ȳ₄)/2 − (ȳ₁ + ȳ₂)/2. (10)

With random sampling from each group, β̂ is approximately normal even with moderate group sizes M_g. In effect, the DL approach rejects usual inference based on means from large samples because it may not be the case that μ₁ = μ₂ and μ₃ = μ₄.
Why not tackle mean heterogeneity directly? Could just define the treatment effect as

(μ₃ + μ₄)/2 − (μ₁ + μ₂)/2,

or weight by population frequencies.
With large group sizes, and whether or not G is especially large, we can put the problem into an MD framework, as done by Loeb and Bound (1996), who had G = 36 cohort-division groups and many observations per group. For each group g, write

y_gm = δ_g + z_gm·γ_g + u_gm. (11)

Assume random sampling within group and independence across groups. OLS estimates within group are √M_g-asymptotically normal.
The presence of x_g can be viewed as putting restrictions on the intercepts:

δ_g = α + x_g·β, g = 1, ..., G, (12)

where we think of x_g as fixed, observed attributes of heterogeneous groups. With K attributes we must have G ≥ K + 1 to determine α and β. In the first stage, obtain the δ̂_g, either by group-specific regressions or by pooling to impose some common slope elements in the γ_g.
Let V̂ be the G × G estimated (asymptotic) variance of δ̂. Let X be the G × (K + 1) matrix with rows (1, x_g). The MD estimator is

(α̂, β̂)′ = (X′V̂⁻¹X)⁻¹X′V̂⁻¹δ̂. (13)

Asymptotics are as the M_g get large, and (α̂, β̂)′ has an asymptotic normal distribution; its estimated asymptotic variance is (X′V̂⁻¹X)⁻¹.
When separate group regressions are used, the δ̂_g are independent and V̂ is diagonal.
The estimator looks like GLS, but inference is with G (the number of rows in X) fixed and the M_g growing.
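Equation (13) is a weighted least squares problem in the first-stage intercepts. A minimal sketch (the function and example data are illustrative, not from Loeb and Bound):

```python
import numpy as np

def minimum_distance(gamma_hat, V_hat, X):
    """Equation (13): MD estimator of (alpha, beta) from first-stage group
    intercepts gamma_hat with estimated asymptotic variance V_hat.
    Rows of X are (1, x_g); V_hat is diagonal when the first stage runs a
    separate regression for each group."""
    Vinv = np.linalg.inv(V_hat)
    A = X.T @ Vinv @ X
    est = np.linalg.solve(A, X.T @ Vinv @ gamma_hat)
    avar = np.linalg.inv(A)  # estimated asymptotic variance (X'V^-1 X)^-1
    return est, avar

# Example: when gamma_hat is exactly linear in x_g, the restrictions hold
# exactly and MD recovers (alpha, beta) regardless of the weighting.
G = 5
xg = np.arange(G, dtype=float)
X = np.column_stack([np.ones(G), xg])
est, avar = minimum_distance(0.5 + 1.5 * xg, 0.1 * np.eye(G), X)
```

With G > K + 1 the restrictions are overidentified, and the weighted residual from this fit is the basis for the overidentification test mentioned on the next slide.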
Can test the overidentification restrictions. If we reject, we can go back to the DL approach, applied to the δ̂_g. With large group sizes, can analyze

δ̂_g = α + x_g·β + c_g, g = 1, ..., G (14)

as a classical linear model because δ̂_g = δ_g + O_p(M_g^{−1/2}), provided c_g is homoskedastic, normally distributed, and independent of x_g.
Alternatively, can define the parameters of interest in terms of the δ_g, as in the treatment effects case.
4. Multiple Groups and Time Periods

With many time periods and groups, the setup in BDM (2004) and Hansen (2007a) is useful. At the individual level,

y_igt = η_t + α_g + x_gt·β + z_igt·γ_gt + v_gt + u_igt, i = 1, ..., M_gt, (15)

where i indexes individual, g indexes group, and t indexes time. Full set of time effects, η_t; full set of group effects, α_g; group/time period covariates (policy variables), x_gt; individual-specific covariates, z_igt; unobserved group/time effects, v_gt; and individual-specific errors, u_igt. Interested in β.
As in cluster sample cases, can write

y_igt = δ_gt + z_igt·γ_gt + u_igt, i = 1, ..., M_gt; (16)

a model at the individual level where intercepts and slopes are allowed to differ across all (g, t) pairs. Then, think of δ_gt as

δ_gt = η_t + α_g + x_gt·β + v_gt. (17)

Think of (17) as a model at the group/time period level.
As discussed by BDM, a common way to estimate and perform inference in the individual-level equation

y_igt = η_t + α_g + x_gt·β + z_igt·γ + v_gt + u_igt

is to ignore v_gt, so the individual-level observations are treated as independent. When v_gt is present, the resulting inference can be very misleading.
BDM and Hansen (2007b) allow serial correlation in {v_gt : t = 1, 2, ..., T} but assume independence across g.
We cannot replace η_t + α_g with a full set of group/time interactions because that would eliminate x_gt.
If we view β in δ_gt = η_t + α_g + x_gt·β + v_gt as ultimately of interest, which is usually the case because x_gt contains the aggregate policy variables, there are simple ways to proceed. We observe x_gt, η_t is handled with year dummies, and α_g just represents group dummies. The problem, then, is that we do not observe δ_gt.
But we can use OLS on the individual-level data to estimate the δ_gt in

y_igt = δ_gt + z_igt·γ_gt + u_igt, i = 1, ..., M_gt,

assuming E(z_igt′u_igt) = 0 and the group/time period sample sizes, M_gt, are reasonably large.
Sometimes one wishes to impose some homogeneity in the slopes, say, γ_gt = γ_g or even γ_gt = γ, in which case pooling across groups and/or time can be used to impose the restrictions.
However we obtain the δ̂_gt, proceed as if the M_gt are large enough to ignore the estimation error in the δ̂_gt; instead, the uncertainty comes through v_gt in δ_gt = η_t + α_g + x_gt·β + v_gt.
The minimum distance (MD) approach effectively drops v_gt and views δ_gt = η_t + α_g + x_gt·β as a set of deterministic restrictions to be imposed on δ_gt. Inference using the efficient MD estimator uses only sampling variation in the δ̂_gt.
Here, proceed ignoring estimation error, and act as if

δ̂_gt = η_t + α_g + x_gt·β + v_gt. (18)

We can apply the BDM findings and Hansen (2007a) results directly to this equation. Namely, if we estimate (18) by OLS, which means full year and group effects, along with x_gt, then the OLS estimator has satisfying large-sample properties as G and T both increase, provided {v_gt : t = 1, 2, ..., T} is a weakly dependent time series for all g.
Simulations in BDM and Hansen (2007a) indicate cluster-robust inference works reasonably well when v_gt follows a stable AR(1) model and G is moderately large.
Hansen (2007b) shows how to improve efficiency by using feasible GLS, modeling v_gt as, say, an AR(1) process.
Naive estimators of the serial correlation parameters are seriously biased due to the panel structure with group fixed effects. Can remove much of the bias and improve FGLS.
Important practical point: FGLS estimators that exploit serial correlation require strict exogeneity of the covariates, even with large T. Policy assignment might depend on past shocks.
5. Individual-Level Panel Data

Let w_it be a binary indicator, which is unity if unit i participates in the program at time t. Consider

y_it = δ₀·d2_t + τ·w_it + c_i + u_it, t = 1, 2, (19)

where d2_t = 1 if t = 2 and zero otherwise, c_i is an unobserved effect, and τ is the treatment effect. Remove c_i by first differencing:

y_i2 − y_i1 = δ₀ + τ·(w_i2 − w_i1) + (u_i2 − u_i1) (20)
Δy_i = δ₀ + τ·Δw_i + Δu_i. (21)

If E(Δw_i·Δu_i) = 0, OLS applied to (21) is consistent.
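A quick numerical check of (20)-(21): first differencing removes c_i even when selection into treatment depends on it. The data-generating numbers (the effect of 1.2, the selection rule) are assumptions of the sketch:

```python
import numpy as np

rng = np.random.default_rng(3)
N = 1000
tau = 1.2                                        # assumed true treatment effect
c = rng.normal(size=N)                           # unobserved effect
w2 = (c + rng.normal(size=N) > 0).astype(float)  # treatment in period 2 depends on c
y1 = 1.0 + c + rng.normal(size=N)                # period 1: nobody treated (w_i1 = 0)
y2 = 1.5 + tau * w2 + c + rng.normal(size=N)     # period 2

# Equation (21): dy_i = delta0 + tau*dw_i + du_i; with w_i1 = 0, dw_i = w_i2,
# so the OLS estimate is the difference in mean changes, equation (22).
dy = y2 - y1
tau_fd = dy[w2 == 1].mean() - dy[w2 == 0].mean()
```

Even though w2 is strongly correlated with c, the estimate is consistent because c drops out of the differenced equation.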
If w_i1 = 0 for all i, the OLS estimate is

τ̂_FD = Δȳ_{treat} − Δȳ_{control}, (22)

which is a DD estimate except that we difference the means of the same units over time.
It is not more general to regress y_i2 on 1, w_i2, y_i1, i = 1, ..., N, even though this appears to free up the coefficient on y_i1. Why? Under (19) with w_i1 = 0 we can write

y_i2 = δ₀ + τ·w_i2 + y_i1 + u_i2 − u_i1. (23)

Now, if E(u_i2|w_i2, c_i, u_i1) = 0, then u_i2 is uncorrelated with y_i1, but y_i1 and u_i1 are correlated. So y_i1 is correlated with u_i2 − u_i1 = Δu_i.
In fact, if we add the standard no serial correlation assumption, E(u_i1·u_i2|w_i2, c_i) = 0, and write the linear projection w_i2 = π₀ + π₁·y_i1 + r_i2, then can show that

plim τ̂_LDV = τ + π₁·σ²_{u1}/σ²_{r2},

where

π₁ = Cov(c_i, w_i2)/(σ²_c + σ²_{u1}).

For example, if w_i2 indicates a job training program and less productive workers are more likely to participate (π₁ < 0), then the regression y_i2 (or Δy_i2) on 1, w_i2, y_i1 underestimates the effect.
If more productive workers participate, regressing y_i2 (or Δy_i2) on 1, w_i2, y_i1 overestimates the effect of job training.
Following Angrist and Pischke (2009), suppose we use the FD estimator when, in fact, unconfoundedness of treatment holds conditional on y_i1 (and the treatment effect is constant). Then we can write

y_i2 = α + τ·w_i2 + ρ·y_i1 + e_i2,
E(e_i2) = 0, Cov(w_i2, e_i2) = Cov(y_i1, e_i2) = 0.
Write the equation as

Δy_i2 = α + τ·w_i2 + (ρ − 1)·y_i1 + e_i2
      = α + τ·w_i2 + θ·y_i1 + e_i2.

Then, of course, the FD estimator generally suffers from omitted variable bias if ρ ≠ 1. We have

plim τ̂_FD = τ + θ·Cov(w_i2, y_i1)/Var(w_i2).

If θ < 0 (ρ < 1) and Cov(w_i2, y_i1) < 0 (workers observed with low first-period earnings are more likely to participate), then plim τ̂_FD > τ, and so FD overestimates the effect.
We might expect ρ to be close to unity for processes such as earnings, which tend to be persistent. (ρ measures persistence without conditioning on unobserved heterogeneity.)
As an algebraic fact, if θ̂ < 0 (as it usually will be, even if ρ = 1) and w_i2 and y_i1 are negatively correlated in the sample, then τ̂_FD > τ̂_LDV. But this does not tell us which estimator is consistent.
If either θ̂ is close to zero or w_i2 and y_i1 are weakly correlated, adding y_i1 can have a small effect on the estimate of τ.
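The FD-versus-LDV comparison can be simulated. In the sketch below the LDV model is the true one (unconfoundedness holds given y_i1, with ρ = 0.5), treatment is more likely after a low y_i1, and FD overestimates the effect exactly as the omitted-variable formula predicts. All numbers are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(7)
N = 20000
tau, rho = 1.0, 0.5               # assumed true effect and persistence (rho < 1)

y1 = rng.normal(size=N)
# Treatment more likely after a low y_i1, so Cov(w2, y1) < 0.
w2 = (-y1 + 0.5 * rng.normal(size=N) > 0).astype(float)
# Unconfoundedness holds conditional on y1: the LDV regression is correct here.
y2 = 0.2 + tau * w2 + rho * y1 + rng.normal(size=N)

def ols(X, y):
    return np.linalg.lstsq(X, y, rcond=None)[0]

tau_ldv = ols(np.column_stack([np.ones(N), w2, y1]), y2)[1]   # consistent here
tau_fd = ols(np.column_stack([np.ones(N), w2]), y2 - y1)[1]   # biased upward:
# plim tau_fd = tau + (rho - 1) * Cov(w2, y1) / Var(w2) > tau in this design.
```

Reversing the thought experiment (making (19) the truth) would flip which estimator is biased, which is the bracketing point made in the text.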
With many time periods and arbitrary treatment patterns, we can use

y_it = η_t + τ·w_it + x_it·β + c_i + u_it, t = 1, ..., T, (24)

which accounts for aggregate time effects and allows for controls, x_it. Estimation by FE or FD to remove c_i is standard, provided the policy indicator, w_it, is strictly exogenous: correlation between w_it and u_ir for any t and r causes inconsistency in both estimators (with FE having advantages for larger T if u_it is weakly dependent).
What if designation is correlated with unit-specific trends? Correlated random trend model:

y_it = c_i + g_i·t + η_t + τ·w_it + x_it·β + u_it, (25)

where g_i is the trend for unit i. A general analysis allows arbitrary correlation between (c_i, g_i) and w_it, which requires at least T ≥ 3. If we first difference, we get, for t = 2, ..., T,

Δy_it = g_i + Δη_t + τ·Δw_it + Δx_it·β + Δu_it. (26)

Can difference again or estimate (26) by FE.
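Differencing (26) a second time removes both c_i and the unit-specific trend g_i, which matters when treatment timing is correlated with g_i. A minimal sketch (the effect of 0.7 and the timing rule are assumptions):

```python
import numpy as np

rng = np.random.default_rng(6)
N, T = 500, 5
beta = 0.7                                   # assumed true effect
c = rng.normal(size=N)                       # unit effects
g = rng.normal(scale=0.3, size=N)            # unit-specific trends
start = np.where(g > 0, 2, 4)                # steeper-trend units treated earlier
t_idx = np.arange(T)
w = (t_idx[None, :] >= start[:, None]).astype(float)       # treatment paths (N, T)
y = c[:, None] + g[:, None] * t_idx[None, :] + beta * w + rng.normal(0, 1, (N, T))

# First difference removes c_i (leaving g_i, as in (26));
# a second difference removes g_i as well.
ddy = np.diff(np.diff(y, axis=1), axis=1).ravel()
ddw = np.diff(np.diff(w, axis=1), axis=1).ravel()
X = np.column_stack([np.ones(ddy.size), ddw])
beta_dd = np.linalg.lstsq(X, ddy, rcond=None)[0][1]
```

A single difference would leave g_i in the error, and here g_i is correlated with treatment timing; the double difference eliminates that source of bias.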
Can derive panel data approaches using the counterfactual framework from the treatment effects literature.
For each (i, t), let y_it(1) and y_it(0) denote the counterfactual outcomes, and assume there are no covariates. Unconfoundedness, conditional on unobserved heterogeneity, can be stated as

E[y_it(0)|w_i, c_i] = E[y_it(0)|c_i] (27)
E[y_it(1)|w_i, c_i] = E[y_it(1)|c_i], (28)

where w_i = (w_i1, ..., w_iT) is the time sequence of all treatments.
Suppose the gain from treatment only depends on t:

E[y_it(1)|c_i] = E[y_it(0)|c_i] + τ_t. (29)
Then

E[y_it|w_i, c_i] = E[y_it(0)|c_i] + τ_t·w_it, (30)

where y_it = (1 − w_it)·y_it(0) + w_it·y_it(1). If we assume

E[y_it(0)|c_i] = η_{t0} + c_{i0}, (31)

then

E[y_it|w_i, c_i] = η_{t0} + c_{i0} + τ_t·w_it, (32)

an estimating equation that leads to FE or FD (often with τ_t = τ).
If we add strictly exogenous covariates and allow the gain from treatment to depend on x_it and an additive unobserved effect a_i, we get

E[y_it|w_i, x_i, c_i] = η_{t0} + τ_t·w_it + x_it·β₀ + w_it·x_it·λ_t + c_{i0} + a_i·w_it, (33)

a correlated random coefficient model because the coefficient on w_it is τ_t + a_i. Can eliminate a_i (and c_{i0}). Or, with τ_t = τ, can estimate the τ_i = τ + a_i and then use

τ̂ = N⁻¹ Σ_{i=1}^{N} τ̂_i. (34)
With T ≥ 3, can also get to a random trend model, where g_i·t is added to (25). Then, can difference followed by a second difference, or apply fixed effects estimation to the first differences. With τ_t = τ,

Δy_it = Δη_t + τ·Δw_it + Δx_it·β₀ + Δ(w_it·x_it)·λ_t + a_i·Δw_it + g_i + Δu_it. (35)

Might ignore a_i·Δw_it, using the results on the robustness of the FE estimator in the presence of certain kinds of random coefficients, or, again, estimate τ_i = τ + a_i for each i and form (34).
As in the simple T = 2 case, using unconfoundedness conditional on unobserved heterogeneity and strictly exogenous covariates leads to different strategies than assuming unconfoundedness conditional on past responses and outcomes of other covariates.
In the latter case, we might estimate propensity scores, for each t, as

P(w_it = 1|y_{i,t−1}, ..., y_{i1}, w_{i,t−1}, ..., w_{i1}, x_it).
6. Semiparametric and Nonparametric Approaches

Consider the setup of Heckman, Ichimura, Smith, and Todd (1997) and Abadie (2005), with two time periods. No units treated in the first time period. Without an i subscript, Y_t(w) is the counterfactual outcome for treatment level w, w = 0, 1, at time t. Parameter: the average treatment effect on the treated,

τ_att = E[Y₁(1) − Y₁(0) | W = 1]. (36)

W = 1 means treatment in the second time period.
Along with Y₀(1) = Y₀(0) (no counterfactual in time period zero), the key unconfoundedness assumption is

E[Y₁(0) − Y₀(0) | X, W] = E[Y₁(0) − Y₀(0) | X]. (37)

Also, the (partial) overlap assumption is critical for τ_att:

P(W = 1 | X) < 1, (38)

or the full overlap assumption for τ_ate = E[Y₁(1) − Y₁(0)]:

0 < P(W = 1 | X) < 1.
Under (37) and (38),

τ_att = E[ (W − p(X))·(Y₁ − Y₀) / (ρ·(1 − p(X))) ], (39)

where Y_t, t = 0, 1, are the observed outcomes (for the same unit), ρ = P(W = 1) is the unconditional probability of treatment, and p(X) = P(W = 1|X) is the propensity score.
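The weighting expression in (39) has a direct sample analog. The sketch below uses the true propensity score, which is known by construction in a simulation (in practice p(X) would be estimated); the effect of 2.0 and the logistic score are assumptions of the illustration:

```python
import numpy as np

rng = np.random.default_rng(4)
N = 20000
X = rng.uniform(-1, 1, N)
p = 1.0 / (1.0 + np.exp(-X))                 # true propensity score p(X)
W = (rng.uniform(size=N) < p).astype(float)
tau_att = 2.0                                # assumed constant treatment effect
# dY = Y1 - Y0; the untreated trend depends on X only, so (37) holds.
dY = 0.5 + X + tau_att * W + rng.normal(size=N)

rho = W.mean()                               # estimate of P(W = 1)
att_hat = np.mean((W - p) * dY / (rho * (1.0 - p)))   # sample analog of (39)
```

Control observations enter with negative weight (W − p(X)) / (ρ(1 − p(X))), which is how the common X-specific trend is netted out.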
If we add

E[Y₁(1) − Y₀(1) | X, W] = E[Y₁(1) − Y₀(1) | X], (41)

a similar approach works for τ_ate:

τ̂_ate = N⁻¹ Σ_{i=1}^{N} (W_i − p̂(X_i))·ΔY_i / [p̂(X_i)·(1 − p̂(X_i))]. (42)
7. Synthetic Control Methods for Comparative Case Studies

Abadie, Diamond, and Hainmueller (2007) argue that in policy analysis at the aggregate level, there is little or no estimation uncertainty: the goal is to determine the effect of a policy on an entire population, and the aggregate is measured without error (or very little error). Application: the effect of California's tobacco control program on state-wide smoking rates.
ADH focus on the uncertainty in choosing a suitable control for California among other states (that did not implement comparable policies over the same period).
ADH suggest using many potential control groups (38 states or so) to create a single synthetic control group.
Two time periods: one before the policy and one after. Let y_it be the outcome for unit i in time t, with i = 1 the treated unit. Suppose there are J possible control units, and index these as 2, ..., J + 1. Let x_i be observed covariates for unit i that are not (or would not be) affected by the policy; x_i may contain period t = 2 covariates provided they are not affected by the policy.
Generally, we can estimate the effect of the policy as

y_12 − Σ_{j=2}^{J+1} ŵ_j·y_j2, (43)

where the ŵ_j are nonnegative weights that add up to one. How to choose the weights to best estimate the intervention effect?
ADH propose choosing the weights so as to minimize the distance between (y_11, x_1) and Σ_{j=2}^{J+1} ŵ_j·(y_j1, x_j), say. That is, they match on functions of the pre-treatment outcomes and the predictors of post-treatment outcomes.
ADH propose permutation methods for inference, which require estimating a placebo treatment effect for each region, using the same synthetic control method as for the region that underwent the intervention.