csdid: difference-in-differences with multiple time

47
csdid: Difference-in-Differences with Multiple Time Periods in Stata Fernando Rios-Avila Levy Economics Institute Brantly Callaway University of Georgia Pedro H. C. Sant’Anna Microsoft and Vanderbilt University Stata Conference, August 2021

Upload: others

Post on 24-Jun-2022

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: csdid: Difference-in-Differences with Multiple Time

csdid: Difference-in-Differences with Multiple TimePeriods in Stata

Fernando Rios-AvilaLevy Economics Institute

Brantly CallawayUniversity of Georgia

Pedro H. C. Sant’AnnaMicrosoft and Vanderbilt University

Stata Conference, August 2021

Page 2: csdid: Difference-in-Differences with Multiple Time

Big shout-out

• This project would not have reach its current point without the help and pushof many.

• Special thanks goes to

• Austin Nichols (Abt Associates)

• Enrique Pinzón (Stata Corp)

• Asjad Naqvi (International Institute for Applied Systems Analysis)

1

Page 3: csdid: Difference-in-Differences with Multiple Time

Big Picture

Page 4: csdid: Difference-in-Differences with Multiple Time

Big Picture: Problems of common practice - I

• Consider a setup with variation in treatment timing and heterogeneoustreatment effects.

• Researchers routinely interpret βTWFE associated with the TWFEspecification

Yi,t = αi + αt + βTWFE Di,t + ε i,t ,

as “a causal parameter of interest”.

• However, βTWFE is not guaranteed to recover an interpretable causalparameter (Borusyak and Jaravel, 2017; de Chaisemartin and D’Haultfœuille, 2020;

Goodman-Bacon, 2021).

2

Page 5: csdid: Difference-in-Differences with Multiple Time

Big Picture: Problems of common practice - II

• Researchers also routinely consider “dynamic” variations of the TWFEspecification,

Yi,t = αi + αt + γ−Kk D<−K

i,t +−2

∑k=−K

γleadk Dk

i,t +L

∑k=0

γlagsk Dk

i,t + γL+k D>L

i,t + ε i,t

with the event study dummies Dki,t = 1 {t −Gi = k}, where Gi indicates the

period unit i is first treated (Group).

• Dki,t is an indicator for unit i being k periods away from initial treatment at time

t .

• Sun and Abraham (2020) demonstrated the the γ’s cannot be rigorouslyinterpreted as reliable measures of “dynamic treatment effects”. 3

Page 6: csdid: Difference-in-Differences with Multiple Time

The heart of the drawbacks

• The heart of the these problems with these TWFE specifications is that OLSis “variational hungry”.

• OLS attempts to compare all cohorts with each other, as long as there is“variation in treatment status” in that given time-window.

• It doesn’t care about “treatment” and “comparison” groups.

• It is all about minimizing MSE.

• Causal inference is about only exploiting the good variation, i.e., thosethat respect our assumptions.

4

Page 7: csdid: Difference-in-Differences with Multiple Time

How to tackle the problems?

• With this insight in mind, it is clear what we need to do.

• We need to enforce that our estimation and inference procedure use thevariations that we want it use.

• Callaway and Sant’Anna (2020) propose a transparent way to proceed withthis insight in DiD setups with multiple time periods.

• Today’s talk is all about how to implement it with our Stata command, csdid.

5

Page 8: csdid: Difference-in-Differences with Multiple Time

How to tackle the problems?

• With this insight in mind, it is clear what we need to do.

• We need to enforce that our estimation and inference procedure use thevariations that we want it use.

• Callaway and Sant’Anna (2020) propose a transparent way to proceed withthis insight in DiD setups with multiple time periods.

• Today’s talk is all about how to implement it with our Stata command, csdid.

5

Page 9: csdid: Difference-in-Differences with Multiple Time

Framework and Assumptions

Page 10: csdid: Difference-in-Differences with Multiple Time

Framework

• csdid accommodates both panel data and repeated cross section data.

• For simplicity, I’ll focus on the panel data case.

• Consider a random sample

{(Yi,1,Yi,2, . . . ,Yi,T ,Di,1,Di,2, . . . ,Di,T ,Xi)}ni=1

where Di,t = 1 if unit i is treated in period t , and 0 otherwise

• Gi,g = 1 if unit i is first treated at time g , and zero otherwise (“Treatmentstarting-time / Cohort dummies” )

• C = 1 is a “never-treated” comparison group (not required, though)

• Staggered treatment adoption: Di,t = 1 =⇒ Di,t+1 = 1, for t = 1,2, . . . , T .6

Page 11: csdid: Difference-in-Differences with Multiple Time

Framework (cont.)

• Limited Treatment Anticipation: There is a known δ ≥ 0 s.t.

E[Yt (g)|X ,Gg = 1] = E[Yt (0)|X ,Gg = 1] a.s..

for all g ∈ G, t ∈ 1, . . . , T such that t < g − δ︸ ︷︷ ︸“before effective starting date”

.

• For simplicity, let’s take δ = 0, which is arguably the norm in the literature.

• Generalized propensity score uniformly bounded away from 1:

pg,t (X ) = P (Gg = 1|X ,Gg + (1−Dt )(1−Gg) = 1) ≤ 1− ε a.s.

7

Page 12: csdid: Difference-in-Differences with Multiple Time

Parameter of interest (or the building block of the analysis)

• Parameter of interest:

ATT (g, t) = E [Yt (g)− Yt (0) |Gg = 1] , for t ≥ g.

Average treatment effect for the group of units first treated at time period g, incalendar time t .

8

Page 13: csdid: Difference-in-Differences with Multiple Time

Parallel trend assumption based on a “never treated” group

Assumption (Conditional Parallel Trends based on a “never-treated” group)For each t ∈ {2, . . . , T }, g ∈ G such that t ≥ g,

E[Yt (0)− Yt−1(0)|X ,Gg = 1] = E[Yt (0)− Yt−1(0)|X ,C = 1] a.s..

9

Page 14: csdid: Difference-in-Differences with Multiple Time

Parallel Trends based on not-yet treated groups

Assumption (Conditional Parallel Trends based on “Not-Yet-Treated”Groups)

For each (s, t) ∈ {2, . . . , T } × {2, . . . , T }, g ∈ G such that t ≥ g, s ≥ t

E[Yt (0)− Yt−1(0)|X ,Gg = 1] = E[Yt (0)− Yt−1(0)|X ,Ds = 0,Gg = 0] a.s..

10

Page 15: csdid: Difference-in-Differences with Multiple Time

Recovering the ATT(g,t)’s

Page 16: csdid: Difference-in-Differences with Multiple Time

What if the identifying assumptions hold unconditionally?

• In the case where covariates do not play a major role into the DiDidentification analysis, and one is comfortable using the “never treated” ascomparison group,

ATT nevunc (g, t) = E[Yt − Yg−1|Gg = 1]−E[Yt − Yg−1|C = 1].

• If one prefers to use the “not-yet treated” as comparison groups,

ATT nyunc(g, t) = E[Yt − Yg−1|Gg = 1]−E[Yt − Yg−1|Dt = 0,Gg = 0].

• Estimation: use the analogy principle!

• Inference: many comparisons of means!

11

Page 17: csdid: Difference-in-Differences with Multiple Time

What if the identifying assumptions hold unconditionally?

• In the case where covariates do not play a major role into the DiDidentification analysis, and one is comfortable using the “never treated” ascomparison group,

ATT nevunc (g, t) = E[Yt − Yg−1|Gg = 1]−E[Yt − Yg−1|C = 1].

• If one prefers to use the “not-yet treated” as comparison groups,

ATT nyunc(g, t) = E[Yt − Yg−1|Gg = 1]−E[Yt − Yg−1|Dt = 0,Gg = 0].

• Estimation: use the analogy principle!

• Inference: many comparisons of means!11

Page 18: csdid: Difference-in-Differences with Multiple Time

Identification results - never treated as comparison group

• When covariates play an important role and we use the “never treated” unitsas comparison group, Callaway and Sant’Anna (2020) show you can usethree estimation methods: OR, IPW or DR (AIPW).

• Here we show the AIPW/DR estimand:

ATT nevdr (g, t) = E

Gg

E [Gg ]−

pg (X )C1− pg (X )

E

[pg (X )C

1− pg (X )

](Yt − Yg−1 −mnev

g,t (X )) .

where mnevg,t (X ) = E

[Yt − Yg−1|X ,C = 1

].

• Extends Heckman, Ichimura and Todd (1997); Abadie (2005); Sant’Anna andZhao (2020) 12

Page 19: csdid: Difference-in-Differences with Multiple Time

Identification results - never treated as comparison group

• When covariates play an important role and we use the “never treated” unitsas comparison group, Callaway and Sant’Anna (2020) show you can usethree estimation methods: OR, IPW or DR (AIPW).

• Here we show the AIPW/DR estimand:

ATT nevdr (g, t) = E

Gg

E [Gg ]−

pg (X )C1− pg (X )

E

[pg (X )C

1− pg (X )

](Yt − Yg−1 −mnev

g,t (X )) .

where mnevg,t (X ) = E

[Yt − Yg−1|X ,C = 1

].

• Extends Heckman et al. (1997); Abadie (2005); Sant’Anna and Zhao (2020)

12

Page 20: csdid: Difference-in-Differences with Multiple Time

Identification results - never treated as comparison group

• When covariates play an important role and we use the “never treated” unitsas comparison group, Callaway and Sant’Anna (2020) show you can usethree estimation methods: OR, IPW or DR (AIPW).

• Here we show the AIPW/DR estimand:

ATT nevdr (g, t) = E

Gg

E [Gg ]−

pg (X )C1− pg (X )

E

[pg (X )C

1− pg (X )

](Yt − Yg−1 −mnev

g,t (X )) .

where mnevg,t (X ) = E

[Yt − Yg−1|X ,C = 1

].

• Extends Heckman et al. (1997); Abadie (2005); Sant’Anna and Zhao (2020)

12

Page 21: csdid: Difference-in-Differences with Multiple Time

Identification results - never treated as comparison group

• When covariates play an important role and we use the “never treated” unitsas comparison group, Callaway and Sant’Anna (2020) show you can usethree estimation methods: OR, IPW or DR (AIPW).

• Here we show the AIPW/DR estimand:

ATT nevdr (g, t) = E

Gg

E [Gg ]−

pg (X )C1− pg (X )

E

[pg (X )C

1− pg (X )

](Yt − Yg−1 −mnev

g,t (X )) .

where mnevg,t (X ) = E

[Yt − Yg−1|X ,C = 1

].

• Extends Heckman et al. (1997); Abadie (2005); Sant’Anna and Zhao (2020)

12

Page 22: csdid: Difference-in-Differences with Multiple Time

Identification results - not-yet treated as comparison group

• Callaway and Sant’Anna (2020) show you can get analogous results whenusing “not-yet treated” units as the comparison group.

• Here we show the AIPW/DR estimand:

ATT nydr (g, t) = E

Gg

E [Gg ]−

pg,t (X ) (1−Dt )

1− pg,t (X )

E

[pg,t (X ) (1−Dt )

1− pg,t (X )

](Yt − Yg−1 −mny

g,t (X )) .

where mnyg,t (X ) = E

[Yt − Yg−1|X ,Dt = 0,Gg = 0

]. .

• Extends Heckman et al. (1997); Abadie (2005); Sant’Anna and Zhao (2020),too.

13

Page 23: csdid: Difference-in-Differences with Multiple Time

Stata Implementation

Page 24: csdid: Difference-in-Differences with Multiple Time

Let’s get start with the csdid package in Stata

We first need to install csdid and its sister package, drdid, that implementsSant’Anna and Zhao (2020); see Rios-Avila, Naqvi and Sant’Anna (2021)

* Let's first install drdidssc install drdid, all replace

* Now let's install csdidssc install csdid, all replace

I strongly recommend that you take a look at our help files:

* Help file for csdidhelp csdid

* Help file for Post-estimation procedures associated with csdidhelp csdid_postestimation

14

Page 25: csdid: Difference-in-Differences with Multiple Time

Let’s get start with the csdid package in Stata

We first need to install csdid and its sister package, drdid, that implementsSant’Anna and Zhao (2020); see Rios-Avila et al. (2021)

* Let's first install drdidssc install drdid, all replace

* Now let's install csdidssc install csdid, all replace

I strongly recommend that you take a look at our help files:

* Help file for csdidhelp csdid

* Help file for Post-estimation procedures associated with csdidhelp csdid_postestimation

14

Page 26: csdid: Difference-in-Differences with Multiple Time

csdid syntax

csdid depvar [indepvars] [if] [in] [weight], [ivar(varname)] time(varname) gvar(varname) [ options ]

• depvar: Outcome of interest

• indepvars: Optional vector of covariates

• weight: Optional vector of (sampling) weights

• ivar: Cross-sectional identifier

• time: time-series identifier

• gvar: Treatment-group (cohort) identifier (0 for never-treated)

• options: where a lot of action takes place - important for choice of comparison group,estimation method and type of inference procedure

15

Page 27: csdid: Difference-in-Differences with Multiple Time

csdid syntax - some additional details inside option

• notyet: Use not-yet-treated units as comparison group. If not set, we will usenever-treated (if any).

• method(method): Select the estimation method to be used (only relevant ifthere are covariates). Current options are

• drimp (default): Implement improved doubly robust DiD estimator based oninverse probability of tilting and weighted least squares (Sant’Anna and Zhao,

2020).• dripw: Implement doubly robust DiD estimator based on IPW and OLS.

(Sant’Anna and Zhao, 2020; Callaway and Sant’Anna, 2020)

• reg: Implement outcome regression DiD estimator based on OLS (Heckman et

al., 1997; Callaway and Sant’Anna, 2020).• ipw: Implement (stabilized) IPW DiD estimator (Abadie, 2005; Callaway and

Sant’Anna, 2020). 16

Page 28: csdid: Difference-in-Differences with Multiple Time

What about Post-Estimation?

• csdid_plot: Command for plotting results from csdid.

• Need to specify the group you want to plot the effects;

• style(styleoption): Allows you to change the style of the plot.The options are rspike (default), rarea, rcap and rbar.

• csdid_stats pretrend or estat pretrend: estimates the chi2 statistic of the nullhypothesis that all pretreatment ATT (g, t)’s are equal to zero.

17

Page 29: csdid: Difference-in-Differences with Multiple Time

Illustration

Page 30: csdid: Difference-in-Differences with Multiple Time

Example using subset of data from CS2020

In this illustration, we will use a subset of the Callaway and Sant’Anna (2020)dataset.

This serves purely for syntax illustration!

18

Page 31: csdid: Difference-in-Differences with Multiple Time

Unconditional DiD with never-treated as comparison group

19

Page 32: csdid: Difference-in-Differences with Multiple Time

Unconditional DiD with never-treated as comparison group

20

Page 33: csdid: Difference-in-Differences with Multiple Time

Conditional IPW-based DiD with not-yet-treated as comp. group

21

Page 34: csdid: Difference-in-Differences with Multiple Time

Conditional IPW-based DiD with not-yet-treated as comp. group

22

Page 35: csdid: Difference-in-Differences with Multiple Time

Aggregating the ATT(g,t)’s

Page 36: csdid: Difference-in-Differences with Multiple Time

Summarizing

• Since we have been “sub-setting the data” to get ATT (g, t)’s, you may bewondering: “Are we throwing away information?”

• Alternatively, you may be wondering how to better communicate the results,specially in setups with many groups/period.

• Aggregation of causal effects is something empiricist commonly pursue:• Run a TWFE “static” regression and focus on the β associated with the

treatment.

• Run a TWFE event-study regression and focus on β associated with thetreatment leads and lags.

• Collapse data into a 2 x 2 Design (average pre and post treatment periods).23

Page 37: csdid: Difference-in-Differences with Multiple Time

Summarizing Causal Effects

• Callaway and Sant’Anna (2020) propose taking weighted averages of theATT (g, t) of the form:

T∑

g=2

T∑t=2

1{g ≤ t}wgtATT (g, t)

• Name-of-the-game: we must choose “reasonable” weights such that theaggregated causal effect is easy-to-interpret.

24

Page 38: csdid: Difference-in-Differences with Multiple Time

Summarizing Causal Effects

• Callaway and Sant’Anna (2020) suggest some arguably intuitive weightingschemes, including

• Simple weighted-average of all ATT (g, t)’s:

θsimpleW :=

T∑g=2

T∑t=2

1{g ≤ t}ATT (g, t)P(G = g|C 6= 1) (1)

• Average effect of participating in the treatment for the group of units that havebeen exposed to the treatment for exactly e time periods

θeventD (e) =

T∑g=2

1{g + e ≤ T }ATT (g,g + e)P(G = g|G + e ≤ T ,C 6= 1)

• Implement in Stata via: estat all or csdid_stats all 25

Page 39: csdid: Difference-in-Differences with Multiple Time

Conditional DR-based DiD with never-treated as comp. group

26

Page 40: csdid: Difference-in-Differences with Multiple Time

Conditional DR-based DiD with never-treated as comp. group

27

Page 41: csdid: Difference-in-Differences with Multiple Time

Conditional DR-based DiD with never-treated as comp. group

28

Page 42: csdid: Difference-in-Differences with Multiple Time

Conditional DR-based DiD with never-treated as comp. group

29

Page 43: csdid: Difference-in-Differences with Multiple Time

Conclusion

Page 44: csdid: Difference-in-Differences with Multiple Time

Conclusion

• Callaway and Sant’Anna (2020) proposes semi-parametric DiD estimatorswhen there are multiple time-periods and variation in treatment timing.

• These tools are attractive because they are transparent and avoid weightingproblems associated with TWFE specifications.

• csdid provide a native Stata implementation of these methods.

• Embrace TE heterogeneity in the same way as teffects does in cross-sectionsetups.

30

Page 45: csdid: Difference-in-Differences with Multiple Time

References

Page 46: csdid: Difference-in-Differences with Multiple Time

Abadie, Alberto, “Semiparametric Difference-in-Differences Estimators,” TheReview of Economic Studies, 2005, 72 (1), 1–19.

Borusyak, Kirill and Xavier Jaravel, “Revisiting Event Study Designs,” SSRNScholarly Paper ID 2826228, Social Science Research Network, Rochester, NYAugust 2017.

Callaway, Brantly and Pedro H. C. Sant’Anna, “Difference-in-Differences withMultiple Time Periods,” Journal of Econometrics, 2020, (Forthcoming).

de Chaisemartin, Clément and Xavier D’Haultfœuille, “Two-Way Fixed EffectsEstimators with Heterogeneous Treatment Effects,” American EconomicReview, 2020, 110 (9), 2964–2996.

Goodman-Bacon, Andrew, “Difference-in-Differences with Variation in TreatmentTiming,” Journal of Econometrics, 2021, (Forthcoming).

30

Page 47: csdid: Difference-in-Differences with Multiple Time

Heckman, James J., Hidehiko Ichimura, and Petra E. Todd, “Matching As AnEconometric Evaluation Estimator: Evidence from Evaluating a Job TrainingProgramme,” The Review of Economic Studies, October 1997, 64 (4), 605–654.

Rios-Avila, Fernando, Asjad Naqvi, and Pedro H. C. Sant’Anna, “drdid: Doublyrobust difference-in-differences estimators in Stata,” Working Paper, 2021.

Sant’Anna, Pedro H. C. and Jun Zhao, “Doubly robust difference-in-differencesestimators,” Journal of Econometrics, November 2020, 219 (1), 101–122.

Sun, Liyan and Sarah Abraham, “Estimating Dynamic Treatment Effects inEvent Studies with Heterogeneous Treatment Effects,” Journal of Econometrics,2020, (Forthcoming).

30