8/10/2019 PSM in Stata Using Teffects
12/12/2014 Propensity Score Matching in Stata using teffects
http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm#StandardErrors
Propensity Score Matching in Stata using teffects
For many years, the standard tool for propensity score matching in Stata has been the psmatch2 command, written by Edwin Leuven and Barbara Sianesi. However, Stata 13 introduced a new teffects command for estimating treatment effects in a variety of ways, including propensity score matching. The teffects psmatch command has one very important advantage over psmatch2: it takes into account the fact that propensity scores are estimated rather than known when calculating standard errors. This often turns out to make a significant difference, and sometimes in surprising ways. We thus strongly recommend switching from psmatch2 to teffects psmatch, and this article will help you make the transition.
An Example of Propensity Score Matching
Run the following command in Stata to load an example data set:
use http://ssc.wisc.edu/sscc/pubs/files/psm
It consists of four variables: a treatment indicator t, covariates x1 and x2, and an outcome y. This is constructed data, and the effect of the treatment is in fact a one unit increase in y. However, the probability of treatment is positively correlated with x1 and x2, and both x1 and x2 are positively correlated with y. Thus simply comparing the mean value of y for the treated and untreated groups badly overestimates the effect of treatment:
ttest y, by(t)
(Regressing y on t, x1, and x2 will give you a pretty good picture of the situation.)
The psmatch2 command will give you a much better estimate of the treatment effect:
psmatch2 t x1 x2, out(y)
------------------------------------------------------------------------------------------
        Variable     Sample |      Treated     Controls   Difference         S.E.   T-stat
----------------------------+-------------------------------------------------------------
               y  Unmatched |    1.8910736  -.423243358   2.31431696   .109094342     21.2
                        ATT |    1.8910736   .871388246   1.01968536   .173034999      5.8
----------------------------+-------------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The teffects Command
You can carry out the same estimation with teffects. The basic syntax of the teffects command when used for propensity score matching is:
teffects psmatch (outcome) (treatment covariates)
In this case the basic command would be:
teffects psmatch (y) (t x1 x2)
However, the default behavior of teffects is not the same as psmatch2, so we'll need to use some options to get the same results. First, psmatch2 by default reports the average treatment effect on the treated (which it refers to as ATT). The teffects command by default reports the average treatment effect (ATE) but will calculate the average treatment effect on the treated (which it refers to as ATET) if given the atet option. Second, psmatch2 by default uses a probit model for the probability of treatment. The teffects command uses a logit model by default, but will use probit if the probit option is applied to the treatment equation. So to run the same model using teffects, type:
teffects psmatch (y) (t x1 x2, probit), atet
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: probit                                        max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   1.019685   .1227801     8.30   0.000     .7790407     1.26033
------------------------------------------------------------------------------
The average treatment effect on the treated is identical, other than being rounded at a different place. But note that teffects reports a very different standard error (we'll discuss why that is shortly), plus a Z-statistic, p-value, and 95% confidence interval rather than just a T-statistic.
Running teffects with the default options gives the following:
teffects psmatch (y) (t x1 x2)
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATE          |
           t |
   (1 vs 0)  |   1.019367   .1164694     8.75   0.000     .7910912    1.247643
------------------------------------------------------------------------------
This is equivalent to:
psmatch2 t x1 x2, out(y) logit ate
------------------------------------------------------------------------------------------
        Variable     Sample |      Treated     Controls   Difference         S.E.   T-stat
----------------------------+-------------------------------------------------------------
               y  Unmatched |    1.8910736  -.423243358   2.31431696   .109094342     21.2
                        ATT |    1.8910736   .930722886   .960350715   .168252917      5.7
                        ATU |  -.423243358   .625587554   1.04883091            .
                        ATE |                             1.01936701            .
----------------------------+-------------------------------------------------------------
Note: S.E. does not take into account that the propensity score is estimated.
The ATE from this model is very similar to the ATT/ATET from the previous model. But note that psmatch2 is reporting a somewhat different ATT in this model. The teffects command reports the same ATET if asked:
teffects psmatch (y) (t x1 x2), atet
Treatment-effects estimation                    Number of obs      =      1000
Estimator      : propensity-score matching      Matches: requested =         1
Outcome model  : matching                                      min =         1
Treatment model: logit                                         max =         1
------------------------------------------------------------------------------
             |              AI Robust
           y |      Coef.   Std. Err.      z    P>|z|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
ATET         |
           t |
   (1 vs 0)  |   .9603507   .1204748     7.97   0.000     .7242245    1.196477
------------------------------------------------------------------------------
Standard Errors
The output of psmatch2 includes the following caveat:
Note: S.E. does not take into account that the propensity score is estimated.
A recent paper by Abadie and Imbens (2012. Matching on the estimated propensity score. Harvard University and National Bureau of Economic Research, http://www.hks.harvard.edu/fs/aabadie/pscore.pdf) established how to take into account that propensity scores are estimated, and teffects psmatch relies on their work. Interestingly, the adjustment for ATE is always negative, leading to smaller standard errors: matching based on estimated propensity scores turns out to be more efficient than matching based on true propensity scores. However, for ATET the adjustment can be positive or negative, so the standard errors reported by psmatch2 may be too large or too small.
Handling Ties
Thus far we've used psmatch2 and teffects psmatch to do simple nearest-neighbor matching with one neighbor (and no caliper). However, this raises the question of what to do when two observations have the same propensity score and are thus tied for "nearest neighbor." Ties are common if the covariates in the treatment model are categorical or even integers.
The psmatch2 command by default matches with one of the tied observations, but with the ties option it matches with all tied observations. The teffects psmatch command always matches with all ties. If your data set has multiple observations with the same propensity score, you won't get exactly the same results from teffects psmatch as you were getting from psmatch2 unless you go back and add the ties option to your psmatch2 commands. (At this time we are not aware of any clear guidance as to whether it is better to match with ties or not.)
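One way to see whether ties are likely in your own data is to look for observations that share the same covariate values, since they necessarily share the same estimated propensity score. The following is a minimal sketch using the example data set. It assumes psmatch2 is installed (for example with ssc install psmatch2), and the variable name ntied is made up for illustration.

```stata
* Sketch only: look for potential ties before matching
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* Observations with identical covariates get identical propensity scores
duplicates report x1 x2

* ntied (hypothetical name) = number of other observations sharing x1 and x2
duplicates tag x1 x2, generate(ntied)
count if ntied > 0

* To mirror teffects psmatch, ask psmatch2 to match with all tied observations
psmatch2 t x1 x2, out(y) logit ties
```

If the count is zero, the ties option should make no difference to the psmatch2 results for that model.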
Matching With Multiple Neighbors
By default teffects psmatch matches each observation with one other observation. You can change this with the nneighbor() (or just nn()) option. For example, you could match each observation with its three nearest neighbors with:
teffects psmatch (y) (t x1 x2), nn(3)
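To see which observations were actually used as matches, nn() can be combined with the gen() option described under Postestimation. This is a brief sketch using the example data set. The stem match is just an illustrative name.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* Match each observation with its three nearest neighbors and record the matches
teffects psmatch (y) (t x1 x2), nn(3) gen(match)

* gen() creates one variable per match (more than three if there are ties)
describe match*
list t match1 match2 match3 in 1/5
```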
Postestimation
By default teffects psmatch does not add any new variables to the data set. However, there are a variety of useful variables that can be created with options and post-estimation predict commands. The following table lists the 1st and 467th observations of the example data set after some of these variables have been created. We'll refer to it as we explain the commands that created the new variables. Reviewing these variables is also a good way to make sure you understand exactly how propensity score matching works.
     +-------------------------------------------------------------------------------------------------------+
     |        x1          x2   t          y   match1        ps0        ps1          y0         y1         te |
     |-------------------------------------------------------------------------------------------------------|
  1. |  .0152526   -1.793022   0   -1.79457      467   .9081651   .0918349    -1.79457   2.231719   4.026289 |
467. | -2.057838    .5360286   1   2.231719      781    .907606    .092394   -.6012772   2.231719   2.832996 |
     +-------------------------------------------------------------------------------------------------------+
Start with a clean slate by typing:
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
The gen() option tells teffects psmatch to create a new variable (or variables). For each observation, this new variable will contain the number of the observation that observation was matched with. If there are ties or you told teffects psmatch to use multiple neighbors, then gen() will need to create multiple variables. Thus you supply the stem of the variable name, and teffects psmatch will add suffixes as needed.
teffects psmatch (y) (t x1 x2), gen(match)
In this case each observation is only matched with one other, so gen(match) only creates match1. Referring to the example output, the match of observation 1 is observation 467 (which is why those two are listed).
Note that these observation numbers are only valid in the current sort order, so make sure you can recreate that order if needed. If necessary, run:
gen ob=_n
and then:
sort ob
to restore the current sort order.
The predict command with the ps option creates two variables containing the propensity scores, or that observation's predicted probability of being in either the control group or the treated group:
predict ps0 ps1, ps
Here ps0 is the predicted probability of being in the control group (t=0) and ps1 is the predicted probability of being in the treated group (t=1). Observations 1 and 467 were matched because their propensity scores are very similar.
The po option creates variables containing the potential outcomes for each observation:
predict y0 y1, po
Because observation 1 is in the control group, y0 contains its observed value of y. y1 is the observed value of y for observation 1's match, observation 467. The propensity score matching estimator assumes that if observation 1 had been in the treated group its value of y would have been that of the observation in the treated group most similar to it (where "similarity" is measured by the difference in their propensity scores).
Observation 467 is in the treated group, so its value for y1 is its observed value of y while its value for y0 is the observed value of y for its match, observation 781.
Running the predict command with no options gives the treatment effect itself:
predict te
The treatment effect is simply the difference between y1 and y0. You could calculate the ATE yourself (but emphatically not its standard error) with:
sum te
and the ATET with:
sum te if t
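The relationship between the predicted variables and the reported estimate can be checked directly. The following sketch uses the example data set and mirrors the commands above. The names ate_b and te_check are hypothetical, chosen here for the saved coefficient and the hand-computed difference. It reproduces the point estimates only, not the standard errors.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
matrix ate_b = e(b)              // the estimated ATE is the only coefficient

predict y0 y1, po
predict te

* te should equal y1 - y0 up to rounding
generate te_check = y1 - y0
summarize te te_check

* The mean of te over all observations should reproduce the reported ATE
summarize te
display "mean of te = " r(mean) "   teffects ATE = " ate_b[1,1]

* The mean of te over the treated group gives the ATET point estimate
summarize te if t
```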
Regression on the "Matched Sample"
Another way to conceptualize propensity score matching is to think of it as choosing a sample from the control group that "matches" the
treatment group. Any differences between the treatment and matched control groups are then assumed to be a result of the treatment.
Note that this gives the average treatment effect on the treated; to calculate the ATE you'd create a sample of the treated group that matches the controls. Mathematically this is all equivalent to using matching to estimate what an observation's outcome would have been if it had been in the other group, as described above.
Sometimes researchers then want to run regressions on the "matched sample," defined as the observations in the treated group plus the
observations in the control group which were matched to them. We will discuss how this can be done without passing judgement on the
appropriateness or usefulness of the technique.
psmatch2 makes this easy by creating a _weight variable automatically. For observations in the treated group, _weight is 1. For observations in the control group it is the number of observations from the treated group for which the observation is a match. If the observation is not a match, _weight is missing. _weight thus acts as a frequency weight (fweight) and can be used with Stata's standard weighting syntax. For example (starting with a clean slate again):
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
Observations with a missing value for _weight are omitted from the regression, so it is automatically limited to the matched sample.
teffects psmatch does not create a _weight variable, but it is possible to create one based on the match1 variable. Here is example code, with comments:
gen ob=_n // store the observation numbers for future use
save fulldata, replace // save the complete data set
teffects psmatch (y) (t x1 x2), gen(match) // create match1 (the observation number of each match)
keep if t // keep just the treated group
keep match1 // keep just the match1 variable (the observation numbers of their matches)
bysort match1: gen weight=_N // count how many times each control observation is a match
by match1: keep if _n==1 // keep just one row per control observation
ren match1 ob // rename for merging purposes
merge 1:m ob using fulldata // merge back into the full data
replace weight=1 if t // set weight to 1 for treated observations
The resulting weight variable will be identical to the _weight variable created by psmatch2, as can be verified with:
assert weight==_weight
It is used in the same way and will give exactly the same results:
reg y x1 x2 t [fweight=weight]
Obviously this is a good bit more work than using psmatch2. If your propensity score matching model can be done using both teffects psmatch and psmatch2, you may want to run teffects psmatch to get the correct standard error and then psmatch2 if you need a _weight variable.
This regression has an N of 666, 333 from the treated group and 333 from the control group. However, it only uses 189 different observations from the control group. About 1/3 of them are the matches for more than one observation from the treated group and are thus duplicated in the regression (run tab weight if !t for details). Researchers sometimes use the norepl (no replacement) option in psmatch2 to ensure each observation is used just once, even though this generally makes the matching worse. To the best of our knowledge there is no equivalent with teffects psmatch.
The results of this regression leave somewhat to be desired:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |    1.11891   .0440323    25.41   0.000      1.03245    1.205369
          x2 |    1.05594   .0417253    25.31   0.000       .97401     1.13787
           t |   .9563751   .0802273    11.92   0.000     .7988445    1.113906
       _cons |   .0180986   .0632538     0.29   0.775    -.1061036    .1423008
------------------------------------------------------------------------------
By construction all the coefficients should be 1. Regression using all the observations (reg y x1 x2 t rather than reg y x1 x2 t [fweight=weight]) does better in this case:
------------------------------------------------------------------------------
           y |      Coef.   Std. Err.      t    P>|t|     [95% Conf. Interval]
-------------+----------------------------------------------------------------
          x1 |   1.031167   .0346941    29.72   0.000     .9630853    1.099249
          x2 |   .9927759   .0333297    29.79   0.000     .9273715     1.05818
           t |   .9791484   .0769067    12.73   0.000     .8282306    1.130066
       _cons |   .0591595   .0416008     1.42   0.155    -.0224758    .1407948
------------------------------------------------------------------------------
Other Methods of Estimating Treatment Effects
While propensity score matching is the most common method of estimating treatment effects at the SSCC, teffects also implements Regression Adjustment (teffects ra), Inverse Probability Weighting (teffects ipw), Augmented Inverse Probability Weighting (teffects aipw), Inverse Probability Weighted Regression Adjustment (teffects ipwra), and Nearest Neighbor Matching (teffects nnmatch). The syntax is similar, though it varies whether you need to specify variables for the outcome model, the treatment model, or both:
teffects ra (y x1 x2) (t)
teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
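These estimators let you choose the estimand in much the same way as teffects psmatch. As a sketch using the example data set, the following requests the ATET from regression adjustment and the potential-outcome means from inverse probability weighting. The atet and pomeans options are assumed to be available for these subcommands in your version of Stata, so check help teffects before relying on them.

```stata
use http://ssc.wisc.edu/sscc/pubs/files/psm, clear

* ATET rather than the default ATE
teffects ra (y x1 x2) (t), atet

* Estimated mean outcome under each treatment level
teffects ipw (y) (t x1 x2), pomeans
```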
Complete Example Code
The following is the complete code for the examples in this article.
clear all
use http://www.ssc.wisc.edu/sscc/pubs/files/psm
ttest y, by(t)
reg y x1 x2 t
psmatch2 t x1 x2, out(y)
teffects psmatch (y) (t x1 x2, probit), atet
teffects psmatch (y) (t x1 x2)
psmatch2 t x1 x2, out(y) logit ate
teffects psmatch (y) (t x1 x2), atet
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
teffects psmatch (y) (t x1 x2), gen(match)
predict ps0 ps1, ps
predict y0 y1, po
predict te
l if _n==1 | _n==467
use http://www.ssc.wisc.edu/sscc/pubs/files/psm, clear
psmatch2 t x1 x2, out(y) logit
reg y x1 x2 t [fweight=_weight]
gen ob=_n
save fulldata,replace
teffects psmatch (y) (t x1 x2), gen(match)
keep if t
keep match1
bysort match1: gen weight=_N
by match1: keep if _n==1
ren match1 ob
merge 1:m ob using fulldata
replace weight=1 if t
assert weight==_weight
reg y x1 x2 t [fweight=weight]
reg y x1 x2 t
teffects ra (y x1 x2) (t)
teffects ipw (y) (t x1 x2)
teffects aipw (y x1 x2) (t x1 x2)
teffects ipwra (y x1 x2) (t x1 x2)
teffects nnmatch (y x1 x2) (t)
Last Revised: 11/13/2013
2009-2014 UW Board of Regents, University of Wisconsin - Madison
http://www.wisc.edu/