introducing the overlap weights in causal inferencefl35/ow/jasa_talk.pdf · 2018. 11. 11. ·...

Post on 23-Jan-2021

1 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Introducing the Overlap Weights inCausal Inference

Fan Li

Department of Statistical ScienceDuke University

November 12, 2018

Introduction

I Population-based observational data increasingly used forcausal inference

I Essential for causal comparisons: Balancing covariatedistributions across groups to remove confounding

I One common approach is weighting

I Main idea: weigh the treatment and control groups tocreate a pseudo-population—the target population—wherethe two groups are balanced, in expectation

I The dominant weighting approach: inverse probabilityweighting (IPW), originated from the Horvitz-Thompsonestimator in survey

Example: Framingham Heart Study(Thomas, Lorenzi, et al. 2018)

I Goal: evaluate the effect of statins on health outcomes

I Patients: cross-sectional population from the offspringcohort with a visit 6 (1995-1998)

I Treatment: statin use at visit 6 vs. no statin use

I Outcomes: CV death, myocardial infarction (MI), stroke

I Confounders: sex, age, body mass index, diabetes,history of MI, history of PAD, history of stroke...

I Significant imbalance between treatment and controlgroups in covariates

Standard Setup

I Data: a random sample of N units from a population.I Treatment status Zi(= 0,1) and covariates Xi = (Xi1, ...Xip)

are observed.I For each unit i , two potential outcomes (Yi(0),Yi(1)), but

only Yi(Zi) is observed.

I Estimand: Average Treatment Effect (ATE)

τ ATE = E[Y (1)− Y (0)]

I Assuming strong ignorability:(i) Pr(Z = 1|Y (1),Y (0),X ) = Pr(Z = 1|X )(ii)0 < Pr(Z = 1|X ) < 1 for all units

I Then ATE is identified from the observed data:E(Y (z)|X ) = E(Y |X ,Z = z) for z = 0,1

Inverse Probability Weights (IPW)

I The propensity score: e(X ) = Pr(Z = 1 | X )

I Base of IPW

E[

ZYe(X )

− (1− Z )Y1− e(X )

]= τ ATE.

I Inverse probability weights:{w1(Xi) = 1

e(Xi ), for Zi = 1

w0(Xi) = 11−e(Xi )

, for Zi = 0.

I IPW balances, in expectation, the weighted distribution ofcovariates in the two groups

I An unbiased nonparametric estimator of ATE is thedifference in the mean of the weighted outcomes betweengroups

IPW: Conceptual Challenges(Thomas, Li, Pencina, 2018)

I Target population of IPW: the “whole” population – thecombined treatment and control groups

I Key but often forgotten question: what population does thestudy sample is representative of?

I In observational studies, the study sample is often aconvenience sample – does not represent any naturalpopulation of scientific interest

I Applying IPW to such a sample does NOT lead to an ATEon any meaningful population

I IPW (equivalently ATE) may correspond to the effect of aninfeasible intervention

IPW: Operational Challenges

I Prone to adverse finite-sample consequences due toextreme probabilities (0 or 1) – Basu’s elephant: severebias and variance

I Normalization of weights helps, but not a lot

I Common remedy is trimming (remove extremepropensities): ad hoc, sensitive to cut off points,ambiguous target population

I Core problem: lack of overlap in the tail of the propensitydistribution – causal comparisons of these units are highlyuncertain

Weighting Beyond IPW

I Two main points of this talk1. Provide a unified framework—the balancing weights—to

allow different user-specified target populations

2. Propose a new weighting scheme—the overlapweighting—to capture the “overlapped population” andpossess statistical optimality

A General Framework: Defining Estimands(Li, Morgan, Zaslavsky, 2018)

I Define conditional average treatment effect (CATE)

τ(x) ≡ E(Y (1)|X = x)− E(Y (0)|X = x).

I Assume density of the covariates of the sample, f (x),exists wrt a base measure µ

I Target population density: g(x) = f (x)h(x), withpre-specified h(·)

I Estimand: average τ(x) over the target population g(x)

τh ≡∫τ(x)f (x)h(x)µ(dx)∫

f (x)h(x)µ(dx). (1)

I τh represents a general class of weighted ATE (WATE)estimands.

Balancing weights

I Let fz(x) = Pr(X = x |Z = z), we have

f1(x) ∝ f (x)e(x), f0(x) ∝ f (x)(1− e(x))

I For a given h(x), to estimate τh, we can weight fz(x) to thetarget population using weights w1(x) ∝ f (x)h(x)

f1(x)= f (x)h(x)

f (x)e(x) = h(x)e(x) ,

w0(x) ∝ f (x)h(x)f0(x)

= f (x)h(x)f (x)(1−e(x)) = h(x)

1−e(x) .(2)

I We call the class of weights (w0,w1) balancing weights:they balance the distributions of the weighted covariatesbetween comparison groups.

Examples: target population (h) and balancing weights

I Choice of h(x) determines the target population, estimand,weights.

I Statistical, scientific and policy considerations all come intoplay in specifying h(x).

target population h(x) estimand weight (w1,w0)

combined 1 ATE(

1e(x) ,

11−e(x)

)[IPW]

treated e(x) ATT(

1, e(x)1−e(x)

)control 1− e(x) ATC

(1−e(x)

e(x) , 1)

overlap e(x)(1− e(x)) ATO (1− e(x), e(x))trunc combined 1(α < e(x) < 1− α)

(1(α<e(x)<1−α)

e(x) ,1(α<e(x)<1−α)

1−e(x)

)matching min{e(x), 1− e(x)}

(min{e(x),1−e(x)}

e(x) ,min{e(x),1−e(x)}

1−e(x)

)

Large-sample properties

I Sample estimator of WATE

τh =

∑i w1(xi)ZiYi∑

i w1(xi)Zi−∑

i w0(xi)(1− Zi)Yi∑i w0(xi)(1− Zi)

(3)

I Theorem 1. τh is a consistent estimator of τh.

Large-sample properties

Theorem 2. As n→∞, the expectation (over possible samplesof covariate values) of the conditional variance of the estimatorτh given the sample X = {x1, . . . , xn} converges:

n · Ex V[τh | X ] →∫

f (x)h(x)2[

v1(x)

e(x)+

v0(x)

1− e(x)

]µ(dx)

/C2

h ,

where vz(x) = V[Y (z) | X ] and Ch =∫

h(x)f (x)dµ(x) is anormalizing constant.

Corollary 1. The function h(x) ∝ e(x)(1− e(x)) gives thesmallest asymptotic variance for the weighted estimator τhamong all h’s under homoscedasticity, and as n→∞,

n ·minh{V[τh]} → v/∫

f (x)e(x)(1− e(x))µ(dx).

Overlap Weights

I Based on Corollary 1, we propose a new type of weights,the overlap weights, by letting h(x) = e(x)(1− e(x)),{

w1(x) ∝ 1− e(x), for Z = 1,w0(x) ∝ e(x), for Z = 0.

I Each unit is weighted by its probability of being assigned tothe opposite group

I Target population f (x)e(x)(1− e(x)) is defined by overlapof covariates

I Target population: the units whose characteristics couldappear with substantial probability in either treatmentgroup (most overlap)

Overlap Weights: Exact Balance

Theorem 3. When the propensity scores are estimated bymaximum likelihood under a logistic regression model,

logit{e(xi)} = β0 + x ′i β,

the overlap weights lead to exact balance in the means of anyincluded covariate between treatment and control groups:∑

i xijZi(1− ei)∑i Zi(1− ei)

=

∑i xij(1− Zi)ei∑

i(1− Zi)ei, for j = 1, . . . ,p, (4)

where ei = {1 + exp[−(β0 + x ′i β)]}−1 and β = (β1, ..., βj) is theMLE for the regression coefficients.

I Remark: the exact balance property applies to anyincluded covariate and derived covariate, including highorder terms and interaction terms of the covariates

Overlap Weights: Exact Balance in Subgroups(Lorenzi, Thomas, Li, 2018)

Corollary 2. If the postulated propensity score model includesany interaction term of a binary covariate, then the overlapweights lead to exact balance in the means in the subgroupsdefined by that binary covariate.

Remarks:I If the true PS model has interaction terms, then overlap

weights using PS estimated from any model that nests thetrue model gives exact balance in the subgroups definedby the interaction terms

Overlap Weights: Statistical Advantages

I Minimum variance of the nonparametric estimator amongall balancing weights

I Exact balance for means of included covariates in logisticpropensity score model

I Weights are bounded (unlike IPW)

I Avoids ad hoc eliminating cases: continuously down-weighunits in the tail

Overlap Weights: Statistical Advantages

I Overlap weights are adaptive, ATO approximate

I ATE: if treatment and control groups are nearly balanced insize and distribution (for e(x) ≈ 1/2,(1− e(x),e(x)) ≈

(.25e(x) ,

.251−e(x)

))

I ATT: if propensity to treatment is always small (for e(x) ≈ 0,(1− e(x),e(x)) ≈

(1, e(x)

1−e(x)

))

I ATC: if propensity to control is small

Overlap Weights: Scientific Relevance

I Overlap weights focus on the (sub)population closest tothe population in a randomized clinical trial

I Overlap weights put emphasis on internal validity

I The overlap population is of intrinsic substantive interest,for example

I In medicine, patients in clinical equipoise

I In policy, units whose treatment assignment would be mostresponsive to a policy shift as new information is obtained

I Better transportability than ATE: always focus on the mostoverlapped population regardless of what the study sampleis representative of

Variance Estimator(Li, Thomas, Li, 2018)

I A sandwich variance estimator for τh using the OW whenthe PS is estimated from a logistic regression

V(τow ) =

∑ni=1 ψ

2i{∑n

i=1 ei(1− ei)}2 ,

where

ψi = Zi(Yi − τ1)(1− ei)− (1− Zi)(Yi − τ0)ei − (Zi − ei)H ′βE−1ββ xi ,

and

Hβ = n−1n∑

i=1

{Zi(Yi − τ1)− (1− Zi)(Yi − τ0)}ei(1− ei)xi ,

and E−1ββ is the information matrix.

Variance Estimator and Finite Sample Performance(Li, Thomas, Li, 2018)

I Based on M-estimation, account for the uncertainty inestimating the propensity scores

I The last subtraction in ψi is an orthogonal projection termthat accounts for the uncertainty in estimating thepropensity scores, i.e., ψi = ψi − Π(ψi |Λ).

I Finite sample performance: OW consistently beats IPWand IPW with trimming (Crump et al. 2009; Sturmer et al.2010) across a wide range of simulation scenarios

Connection to Matching

I Matching: link “similar" cases in two samples, discardunmatched cases (bottom-up approach)

I Weighting: apply weights to entire samples, designed tocreate global balance (top-down approach)

I Intrinsic connection: Overlap weighting approachesmany-to-many matching as the propensity score modelbecomes increasingly complex.

I The limit is a saturated model with a fixed effect for eachdesign point.

I The nonparametric weighted estimate τh using the overlapweights is the same as that from a LS model for theoutcome with a fixed effect for each design point.

Combine Matching and Weighting: The Tudor Solution

A hybrid approach to combine the benefits of matching andoverlap weighting

1. Obtain a matched sample using any preferred approach(e.g., Mahalanobis distance)

2. Estimate the propensity scores a logistic regression with allmain effects within the matched sample

3. Apply the overlap weights to the matched sample toestimate the treatment effect

Retain the nearness of matched cases in multivariate space,and adjust for residual imbalance in matching via overlapweighting

A Simulated ExampleI Simulate n0 = n1 = 1000 units.I A single covariate: Xi ∼ N(0,1) + 2Zi .I Outcome model: Yi(z) ∼ N(Xi ,1) + τz, and τ = 1.I Use the nonparametric estimator τh with different weights

Figure: Original covariate distributions within each treatment group,and weighted covariate distributions with overlap, HT, ATT weights.

Z=0Z=1

Unweighted

X

Den

sity

−2 0 2 4

Overlap

X

Den

sity

−2 0 2 4

HT

X

Den

sity

−2 0 2 4

ATT

X

Den

sity

−2 0 2 4

Unweighted Overlap IPW ATTτ 2.945 1.000 0.581 0.640

SE(τ) 0.054 0.038 0.386 0.402

Framingham revisited

Weighted Distribution

Weighted Distribution

Results: CV death

Figure: IPW 1: No trimming; IPW 2: trimming ps between (.10, 0.90);IPW 3: asymmetric trimming 5th% ps of trt, 95th% of ps for control

Results: composite of non-death endpoints

Figure: IPW 1: No trimming; IPW 2: trimming ps between (.10, 0.90);IPW 3: asymmetric trimming 5th% ps of trt, 95th% of ps for control

Conclusion and extension

I We proposed a unified framework of balancing weights tobalance covariates for any target population

I We proposed the overlap weights: efficiency and exactbalance

I Takeaways: (1) Important to consider scientific appropriatetarget population in practice; (2) should not automaticallyfocus on IPW (ATE)

I ExtensionsI multiple treatments/groups (Li and Li, 2018)I time-varying treatmentsI variance reduction in randomized experimentsI clustering structure

Acknowledgements

I Alan Zaslavsky (Harvard)I Laine Thomas (Duke)I Frank Li (Duke)I Elizabeth Lorenzi (Duke)I Michael Pencina (Duke)I Kari Lock Morgan (Penn State)

References

1. Li, F, Morgan, LK, and Zaslavsky, AM. (2018). Balancing covariates via propensityscore weighting. Journal of the American Statistical Association. 113(521), 390-400.

2. Li, F, Thomas, LE, and Li, F. (2018). Addressing extreme propensity scores via theoverlap weights. American Journal of Epidemiology. Forthcoming.

3. Thomas, LE, Li, F, Pencina, M. (2018). Propensity score weighting methods incomparative effectiveness research. Journal of American Medical Association. underrevision.

4. Li, F, Li, F. (2018). Propensity score weighting for causal inference with multipletreatments. arXiv: 1808.05339

5. Lorenzi, E, Thomas, LE, Li, F. (2018). Balance covariates in subgroups via theoverlap weights. In preparation.

6. Thomas, LE, Lorenzi, L, Navar, AM, Pencina, M, Petersen, E, and Li, F. (2018).Overlap Weights for Estimating Treatment Effects from Observational Studies: Effectsof Statins for Cardiac Patients from Framingham Heart Study.

top related