Propensity Score Matching for Causal Inference: Possibilities, Limitations, and an Example
sean f. reardon
MAPSS colloquium, March 6, 2007
Overview
- the counterfactual model of causality
- matching estimators for causal inference
  - conceptual logic
  - assumptions
- propensity score matching
  - advantages and limitations
- an example: what is the effect of attending Catholic school in elementary school on math and reading skills?
Definition of an “Effect”
The effect, $\delta_i$, [on some outcome Y] [for some unit i] [of some treatment condition t relative to some other condition c] is defined as the difference between the value of Y that would be observed if unit i were exposed to treatment t and the value of Y that would be observed if unit i were exposed to treatment c.
More formally, we define the effect of t relative to c on Y for unit i as:
$\delta_i = Y_i^t - Y_i^c$
We define the average effect of t relative to c in a population P as:
$\delta_P = \bar{Y}_P^t - \bar{Y}_P^c$
The average effect is population specific
- the average effect of t relative to c in a population P (ATE or ATP):
  $\delta_P = ATP = \bar{Y}_P^t - \bar{Y}_P^c$
- the average effect of t relative to c in the subpopulation $T \subset P$ who receive/choose the treatment (ATT):
  $\delta_T = ATT = \bar{Y}_T^t - \bar{Y}_T^c$
- the average effect of t relative to c in the subpopulation $C \subset P$ who receive/choose the control condition (ATC):
  $\delta_C = ATC = \bar{Y}_C^t - \bar{Y}_C^c$
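A minimal simulation sketch of why ATP, ATT, and ATC can differ. Everything here is hypothetical: the variable `skill`, the effect size, and the selection rule are made up for illustration. Both potential outcomes are recorded for every unit, which only a simulation allows.

```python
import random

random.seed(0)

# Hypothetical population. A simulation can record BOTH potential outcomes
# for every unit, which real data never can.
units = []
for _ in range(10_000):
    skill = random.gauss(0.0, 1.0)
    y_c = skill + random.gauss(0.0, 1.0)            # outcome under condition c
    y_t = y_c + 1.0 + 0.5 * skill                   # outcome under t; effect grows with skill
    treated = skill + random.gauss(0.0, 1.0) > 0.0  # higher-skill units tend to choose t
    units.append((y_t, y_c, treated))


def mean(values):
    values = list(values)
    return sum(values) / len(values)


atp = mean(y_t - y_c for y_t, y_c, _ in units)           # whole population P
att = mean(y_t - y_c for y_t, y_c, d in units if d)      # treated subpopulation T
atc = mean(y_t - y_c for y_t, y_c, d in units if not d)  # control subpopulation C

print(f"ATP = {atp:.2f}, ATT = {att:.2f}, ATC = {atc:.2f}")
```

Because units with larger effects are more likely to select into treatment in this made-up setup, the ATT exceeds the ATP, which exceeds the ATC.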
The “Fundamental Problem of Causal Inference” (Holland, 1986)
- Although both $Y_i^t$ and $Y_i^c$ are defined in principle, it is impossible to observe both of them for the same unit (because any given unit can be exposed to only one of t or c).
- Thus, the causal effect $\delta_i$ cannot be observed.
- The problem of causal inference is thus a problem of missing data. The outcome $Y_i$ under its “counterfactual” condition is never observed.
- How can we construct unbiased estimates of the average potential outcomes $\bar{Y}_C^t$ and $\bar{Y}_T^c$ under the counterfactual conditions?
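The missing counterfactuals
- We can never observe the counterfactual quantities $\bar{Y}_C^t$ and $\bar{Y}_T^c$.
- So we can never directly observe the quantities we need to compute the ATP, ATT, or ATC:
$$\begin{aligned}
\delta_P &= \bar{Y}_P^t - \bar{Y}_P^c \\
&= \left[\pi_T \bar{Y}_T^t + (1-\pi_T)\,\bar{Y}_C^t\right] - \left[\pi_T \bar{Y}_T^c + (1-\pi_T)\,\bar{Y}_C^c\right] \\
&= \pi_T\left(\bar{Y}_T^t - \bar{Y}_T^c\right) + (1-\pi_T)\left(\bar{Y}_C^t - \bar{Y}_C^c\right) \\
&= \pi_T\,\delta_T + (1-\pi_T)\,\delta_C
\end{aligned}$$
where $\pi_T$ is the proportion of P in the treated subpopulation T.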
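A small worked example may help here. The five units below are invented, and both potential outcomes are written down only because the data are made up. The sketch checks that the ATP is the treated-share-weighted average of ATT and ATC, and shows how far the observable difference in group means can sit from all three.

```python
# Hypothetical five-unit population: (Y under t, Y under c, treated?).
# Both potential outcomes are listed only because the data are made up.
units = [
    (10.0, 6.0, True),
    (8.0, 5.0, True),
    (9.0, 7.0, True),
    (5.0, 4.0, False),
    (4.0, 4.0, False),
]


def mean(values):
    values = list(values)
    return sum(values) / len(values)


pi_t = mean(1.0 if d else 0.0 for _, _, d in units)  # share of units treated
att = mean(yt - yc for yt, yc, d in units if d)
atc = mean(yt - yc for yt, yc, d in units if not d)
atp = mean(yt - yc for yt, yc, _ in units)

# What an analyst actually sees: Y under t for T, and Y under c for C.
observed_diff = (mean(yt for yt, _, d in units if d)
                 - mean(yc for _, yc, d in units if not d))

print("ATP:", atp)
print("share-weighted ATT and ATC:", pi_t * att + (1 - pi_t) * atc)
print("observed difference in means:", observed_diff)
```

In this toy population the observed difference in means overstates every one of the three average effects, because the treated units would have had higher outcomes even under c.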
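Estimating the missing counterfactuals
Under random assignment to t and c, we estimate:
$\hat{\bar{Y}}_T^c = \bar{Y}_C^c$ and $\hat{\bar{Y}}_C^t = \bar{Y}_T^t$
assumes:
- randomization
Using OLS, we estimate:
$\hat{\bar{Y}}_T^c = \frac{1}{N_T}\sum_{i \in T} X_i \hat{\beta}_C$ and $\hat{\bar{Y}}_C^t = \frac{1}{N_C}\sum_{i \in C} X_i \hat{\beta}_T$
(here $\hat{\beta}_C$ and $\hat{\beta}_T$ denote coefficients fit to the control and treated cases, respectively)
assumes:
- correct functional form
- valid extrapolation
- no confounding (treatment assignment is ignorable, conditional on X)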
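The OLS approach can be sketched as follows, assuming a single covariate and a linear outcome model that is correct by construction. The data-generating process, sample size, and variable names are all hypothetical. The regression is fit on control cases only, and its fitted values are averaged over the treated cases to impute their mean outcome under c.

```python
import random

random.seed(1)

# Hypothetical data: one covariate x; units with higher x tend to be treated.
# y_c is recorded for everyone only because this is a simulation.
rows = []
for _ in range(4_000):
    x = random.gauss(0.0, 1.0)
    treated = x + random.gauss(0.0, 1.0) > 0.0
    y_c = 2.0 + 1.5 * x + random.gauss(0.0, 1.0)
    rows.append((x, y_c, treated))

controls = [(x, y) for x, y, d in rows if not d]
treated_x = [x for x, _, d in rows if d]

# Simple OLS of y on x among control cases (closed form for one regressor).
n_c = len(controls)
mean_x = sum(x for x, _ in controls) / n_c
mean_y = sum(y for _, y in controls) / n_c
slope = (sum((x - mean_x) * (y - mean_y) for x, y in controls)
         / sum((x - mean_x) ** 2 for x, _ in controls))
intercept = mean_y - slope * mean_x

# Impute the counterfactual mean for T: average the fitted values over treated x.
y_c_for_treated_hat = sum(intercept + slope * x for x in treated_x) / len(treated_x)

# Benchmarks: the true (normally unobservable) mean, and the naive control mean.
y_c_for_treated_true = sum(y for _, y, d in rows if d) / len(treated_x)
naive = mean_y

print(f"OLS-imputed: {y_c_for_treated_hat:.2f}  "
      f"true: {y_c_for_treated_true:.2f}  naive: {naive:.2f}")
```

The imputation works here only because the simulated outcome really is linear in x; with a misspecified functional form, or treated cases far outside the range of the controls, the same code would extrapolate badly.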
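Estimating the missing counterfactuals
Using matching, we assume that the potential outcomes are independent of treatment assignment, conditional on a vector of covariates X. This means:
$\bar{Y}_T^c \mid x = \bar{Y}_C^c \mid x$ and $\bar{Y}_C^t \mid x = \bar{Y}_T^t \mid x$
We can then estimate the counterfactuals as:
$\hat{\bar{Y}}_C^t = \sum_x \left(\hat{\bar{Y}}_C^t \mid x\right) f_C(x) = \sum_x \left(\bar{Y}_T^t \mid x\right) f_C(x)$
and
$\hat{\bar{Y}}_T^c = \sum_x \left(\hat{\bar{Y}}_T^c \mid x\right) f_T(x) = \sum_x \left(\bar{Y}_C^c \mid x\right) f_T(x)$
where $f_C(x)$ and $f_T(x)$ are the proportions of control and treated cases with X = x.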
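A minimal sketch of exact matching on one discrete covariate, under assumptions made up for the purpose: x takes three values, the treatment probabilities in `P_TREAT` are hypothetical, and the effect is constant. The counterfactual mean for the treated is built from control-group cell means, averaged over the distribution of x among treated cases.

```python
import random
from collections import defaultdict

random.seed(2)

TRUE_EFFECT = 2.0                   # assumed constant effect in the simulation
P_TREAT = {0: 0.2, 1: 0.5, 2: 0.8}  # hypothetical Pr(treated | x)

# Each row holds only what an analyst would observe: (x, treated?, observed y).
rows = []
for _ in range(6_000):
    x = random.choice([0, 1, 2])
    d = random.random() < P_TREAT[x]
    y = x + random.gauss(0.0, 1.0) + (TRUE_EFFECT if d else 0.0)
    rows.append((x, d, y))

treated_y = [y for _, d, y in rows if d]
control_y = [y for _, d, y in rows if not d]

# Control-group mean outcome within each cell of x.
cell = defaultdict(list)
for x, d, y in rows:
    if not d:
        cell[x].append(y)
control_mean_at = {x: sum(ys) / len(ys) for x, ys in cell.items()}

# Proportion of treated cases in each cell of x.
n_t = len(treated_y)
share_treated_at = {x: sum(1 for xi, d, _ in rows if d and xi == x) / n_t
                    for x in control_mean_at}

# Counterfactual mean for T: control cell means averaged over the treated x distribution.
y_c_for_treated_hat = sum(control_mean_at[x] * share_treated_at[x]
                          for x in control_mean_at)

att_hat = sum(treated_y) / n_t - y_c_for_treated_hat
naive = sum(treated_y) / n_t - sum(control_y) / len(control_y)

print(f"matching ATT estimate: {att_hat:.2f}   naive difference: {naive:.2f}")
```

The naive difference is inflated because treated cases are concentrated at high x; averaging control means over the treated distribution of x removes that imbalance.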
[Figure: Earnings by High School Graduation Status, by Cognitive Skills. y-axis: Earnings; x-axis: Cognitive Skill. Series: Observed Mean Earnings, HS Grads; Observed Mean Earnings, HS Dropouts.]
[Figure: Earnings by High School Graduation Status, by Cognitive Skills. y-axis: Earnings; x-axis: Cognitive Skill. Series: Observed Mean Earnings, HS Grads; Observed Mean Earnings, HS Dropouts; Imputed Mean Earnings, HS Grads; Imputed Mean Earnings, HS Dropouts.]
[Figure: Earnings by High School Graduation Status, by Cognitive Skills. y-axis: Earnings; x-axis: Cognitive Skill. Series: Observed Mean Earnings, HS Grads; Observed Mean Earnings, HS Dropouts. Marked: Region of Common Support.]
The conditional independence assumptions
$\bar{Y}_T^c \mid x = \bar{Y}_C^c \mid x$ and $\bar{Y}_C^t \mid x = \bar{Y}_T^t \mid x$
- Conditional on x, treatment assignment is ignorable.
- So we can obtain an unbiased estimate of $\delta_x$ at each x, and then average these over the population distribution of X to obtain $\hat{\delta}_P$.
How well does matching work in practice?
- Compare experimental estimates of a treatment effect to matching estimates of the same treatment effect (Lalonde, 1986).
- Matching works well when X includes theoretically relevant covariates, and when matches are drawn locally (from a population that is similar to the experimental population).
[Diagram labels: T_exp, C_exp, C_matched]
The “curse of dimensionality”
- We can’t match exactly on a large vector of covariates without a really large sample
  - K variables, each with m values, yield $m^K$ cells
- Rosenbaum & Rubin (1983) show that matching on the propensity score
  - is equivalent to matching on the full vector X
  - reduces the dimensionality of the matching
  - if treatment assignment is strongly ignorable at a given value of p, then comparison of the treatment and control means at p is an unbiased estimate of the treatment effect at p.
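To make the cell-count arithmetic concrete, here is a short sketch. The choice of m = 5 values per covariate and the sample size of 10,000 are hypothetical, picked only to show how quickly exact-matching cells outnumber the available units.

```python
def n_cells(m: int, k: int) -> int:
    """Number of exact-matching cells for k covariates with m values each."""
    return m ** k


SAMPLE_SIZE = 10_000  # hypothetical sample size, chosen only for illustration

for k in (2, 4, 6, 8):
    cells = n_cells(5, k)
    print(f"K={k}: {cells:>7,} cells, "
          f"{SAMPLE_SIZE / cells:10.3f} units per cell on average")
```

With six five-valued covariates there are already more cells than units, so most cells cannot contain both a treated and a control case.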
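Matching as weighting
- The ATT can be written as a weighted average of $\delta_{T \mid x}$ (the treatment effect when X=x), weighted by the distribution of X among treated cases, $f_T(x)$:
$$\begin{aligned}
\delta_T &= \int \delta_{T \mid x}\, f_T(x)\, dx \\
&= \int \left[\left(\bar{Y}_T^t \mid x\right) - \left(\bar{Y}_T^c \mid x\right)\right] f_T(x)\, dx \\
&= \int \left(\bar{Y}_T^t \mid x\right) f_T(x)\, dx - \int \left(\bar{Y}_T^c \mid x\right) f_T(x)\, dx \\
&= \bar{Y}_T^t - \int \left(\bar{Y}_C^c \mid x\right) f_T(x)\, dx
\end{aligned}$$
- This leads to the following: under the conditional independence assumption, the counterfactual outcome $\bar{Y}_T^c$ is estimated by re-weighting the control cases according to the distribution of treatment cases.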
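The re-weighting idea can be illustrated with a tiny invented data set and a single discrete covariate. In this sketch, which is one possible way to implement the weighting, each control case gets a weight equal to the share of treated cases at its x divided by the share of control cases at its x. The weighted control mean then matches the cell-mean calculation.

```python
# Hypothetical data, grouped by a single discrete covariate x.
control_y = {0: [1.0, 2.0, 3.0], 1: [4.0, 6.0]}  # observed Y under c, by x
treated_y = {0: [5.0], 1: [8.0, 9.0, 10.0]}      # observed Y under t, by x

n_t = sum(len(ys) for ys in treated_y.values())
n_c = sum(len(ys) for ys in control_y.values())

# Each control case at x gets weight Pr(x | T) / Pr(x | C).
weight_at = {x: (len(treated_y[x]) / n_t) / (len(control_y[x]) / n_c)
             for x in control_y}

weighted_sum = sum(weight_at[x] * y for x, ys in control_y.items() for y in ys)
total_weight = sum(weight_at[x] * len(ys) for x, ys in control_y.items())
y_c_for_treated_hat = weighted_sum / total_weight  # re-weighted control mean

treated_mean = sum(y for ys in treated_y.values() for y in ys) / n_t
att_hat = treated_mean - y_c_for_treated_hat

# Same number via cell means: sum over x of (control mean at x) * Pr(x | T).
by_cells = sum((sum(ys) / len(ys)) * (len(treated_y[x]) / n_t)
               for x, ys in control_y.items())

print("re-weighted control mean:", y_c_for_treated_hat)
print("cell-mean version:", by_cells)
print("ATT estimate:", att_hat)
```

Controls at x = 1 are up-weighted because treated cases are concentrated there, which is what re-weighting to the treated distribution means in practice.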
What’s so great about matching (over regression/covariate adjustment)?
- explicitly clarifies the region of common support
- does not rely on functional form and extrapolation
- (in principle) allows the researcher to design the study while blind to the outcomes (avoid model fishing)
- allows (partial) checks of the conditional independence assumptions
- allows estimates of ATT & ATC as well as ATP
Limitations of matching estimators
- conditional independence assumptions are not fully verifiable
- not all relevant pre-treatment covariates are always available
- limits the population of inference (region of common support)
- larger standard errors than covariate adjustment
Propensity score matching in practice
1. hide the outcome data
2. fit logit/probit model to predict $\hat{p}_i = \Pr(t \mid x_i)$
   - X is a vector of pre-treatment covariates correlated with both t and Y
   - X may include higher-order & interaction terms as needed
   - X should include no instruments
3. check balance after matching; verify: $x \perp t \mid \hat{p}$
   - refit model if inadequate balance
   - identify region of common support and balance
4. estimate $E\left[(Y \mid \hat{p}, t) - (Y \mid \hat{p}, c)\right]$
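The four steps can be sketched end to end on simulated data. This is an illustration under stated assumptions, not the procedure used in the study: there is one covariate, the logit model is fit with a hand-written Newton-Raphson loop, matching is nearest-neighbor with replacement, and the caliper of 0.05 is a hypothetical choice standing in for the common-support check.

```python
import math
import random
import statistics

random.seed(3)
TRUE_EFFECT = 1.0  # assumed effect in the simulated data
CALIPER = 0.05     # hypothetical maximum allowed distance in propensity score


def logistic(z):
    return 1.0 / (1.0 + math.exp(-z))


# Simulated units: covariate x, treatment d, outcome y (step 1: y is not used until step 4).
xs, ds, ys = [], [], []
for _ in range(2_000):
    x = random.gauss(0.0, 1.0)
    d = random.random() < logistic(-1.0 + x)
    xs.append(x)
    ds.append(d)
    ys.append(x + (TRUE_EFFECT if d else 0.0) + random.gauss(0.0, 1.0))

# Step 2: logit model of d on x, fit by Newton-Raphson (intercept b0, slope b1).
b0 = b1 = 0.0
for _ in range(25):
    g0 = g1 = h00 = h01 = h11 = 0.0
    for x, d in zip(xs, ds):
        p = logistic(b0 + b1 * x)
        w = p * (1.0 - p)
        g0 += d - p
        g1 += (d - p) * x
        h00 += w
        h01 += w * x
        h11 += w * x * x
    det = h00 * h11 - h01 * h01
    b0, b1 = b0 + (h11 * g0 - h01 * g1) / det, b1 + (h00 * g1 - h01 * g0) / det
p_hat = [logistic(b0 + b1 * x) for x in xs]

# Nearest-neighbor matching with replacement on p_hat, within the caliper.
treated = [i for i, d in enumerate(ds) if d]
controls = [i for i, d in enumerate(ds) if not d]
pairs = []
for i in treated:
    j = min(controls, key=lambda c: abs(p_hat[c] - p_hat[i]))
    if abs(p_hat[j] - p_hat[i]) <= CALIPER:
        pairs.append((i, j))  # treated units without a close control are dropped


def std_diff(a, b):
    """Difference in means, in units of the first group's standard deviation."""
    return (statistics.mean(a) - statistics.mean(b)) / statistics.stdev(a)


# Step 3: balance on x before and after matching.
smd_before = std_diff([xs[i] for i in treated], [xs[j] for j in controls])
smd_after = std_diff([xs[i] for i, _ in pairs], [xs[j] for _, j in pairs])

# Step 4: only now bring in the outcomes.
att_hat = statistics.mean(ys[i] - ys[j] for i, j in pairs)
naive = (statistics.mean(ys[i] for i in treated)
         - statistics.mean(ys[j] for j in controls))

print(f"slope={b1:.2f}  SMD before={smd_before:.2f}  after={smd_after:.2f}")
print(f"ATT estimate={att_hat:.2f}  naive difference={naive:.2f}")
```

In applied work the logit fit and the matching would normally come from an established statistics package; the hand-rolled versions are used here only to keep the sketch self-contained.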
What is the effect of Catholic schooling on elementary school student achievement?
- Early Childhood Longitudinal Study-Kindergarten Cohort (ECLS-K)
  - observational longitudinal study
  - 21,260 kindergarten students in 1,001 US schools in Fall 1998
- Subsample: 6,364 students
  - first-time kindergarten students
  - urban or suburban areas
  - remained in study for 6 years (K-5)
  - English proficient in Fall of kindergarten year
  - data available on covariates and outcomes
  - enrolled in either public (n=5,320) or Catholic (n=1,044) schools
- Tests in math and reading in Fall K and Spring K, 1, 3, & 5
Catholic-Public Matching
- Fit propensity score model using vector of covariates potentially related to selection of public vs. Catholic schooling
  - primarily measures of socioeconomic status, income, and parental preferences for education (as measured by child’s preschool & childcare exposure):
  - income, mother & father education, mother and father occupation, race, poverty status, welfare and public assistance receipt, birthdate, birthweight, childcare and preschool experience (type of child care, age began childcare, time in childcare, etc.)
- (initially) do not match on Fall kindergarten scores
[Figure: Distribution of Catholic, Public, and Matched Samples. y-axis: Sample Frequency; x-axis: Estimated Propensity Score (0.00 to 0.80). Groups: catholic students (n=1,041); public students (n=5,320); public students used in matching (n=3,391).]
[Figure: Distribution of Catholic, Public, and Matched Samples. y-axis: Sample Frequency; x-axis: Estimated Propensity Score (0.00 to 0.80). Groups: catholic students (n=1,041); public students (n=5,320); re-weighted matched public sample.]
[Figure: Estimated Catholic-Public Mean Math Score Difference, by Grade and Propensity Score. y-axis: Average Math Score Difference (Standard Deviations); x-axis: Estimated Propensity Score (0.00 to 0.50). Series: Fall K; Spring 5.]
[Figure: Estimated Catholic-Public Effect on Math Score Gain, by Propensity Score. y-axis: Average Math Score Effect (Standard Deviations); x-axis: Estimated Propensity Score (0.00 to 0.50).]
[Figure: Estimated Catholic-Public Mean Reading Score Difference, by Grade and Propensity Score. y-axis: Average Reading Score Difference (Standard Deviations); x-axis: Estimated Propensity Score (0.00 to 0.50). Series: Fall K; Spring 5.]
[Figure: Estimated Catholic-Public Effect on Reading Score Gain, by Propensity Score. y-axis: Average Reading Score Effect (Standard Deviations); x-axis: Estimated Propensity Score (0.00 to 0.50).]