Sukyeong Pi Larry Featherston Employment and Disability Institute Cornell University Feb. 21, 2009 www.edi.cornell.edu Causal Inference Using Observational Data

Page 1: Sukyeong Pi  Larry Featherston Employment and Disability Institute Cornell University

Sukyeong Pi Larry Featherston

Employment and Disability Institute, Cornell University

Feb. 21, 2009

www.edi.cornell.edu

Causal Inference Using Observational Data

Page 2

Agenda

• Randomized Controlled Trial
• Observational Studies
• Propensity Score Matching
• Example
• Limitations of PSM

Page 3

Randomized Controlled Trial (RCT)

• A research study in which participants are randomly assigned to groups so that different interventions can be compared objectively

• RCT is recognized as a sound scientific method: Gold Standard for making causal inferences and making policy decisions

• Control for subject selection bias: Minimize subject differences between groups

Page 4

Limitations of RCT

• Philosophical/Ethical Issue: Random assignment runs against the obligation to offer each student the optimal treatment

• Strategic Issues: Requires time and specialized expertise, Generalizability Issue

• Tactical Issues: Issues of treatment fidelity and integrity

• Logistical Issues: Difficulty finding adequate numbers of subjects; expensive, requiring substantial resources

Page 5

Advantages of Observational Studies

• Address the chief criticism of RCTs: Generalizability

• Availability, Cost, Time

• Serve as a rich source of descriptive information

• Examine exposure in real life: policy decisions possible

• Large sample sizes permit investigation of exposures with smaller effect sizes

Page 6

Observational Studies

• Selection Bias: No control for group assignment (Ignorability of treatment assignment)

• Baseline characteristics of comparison groups are different in ways that affect the outcome due to observed or unobserved confounders.

• One approach to remove the bias in nonrandomized experiments is propensity score matching.

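[Figure: diagram of groups A and B, comparing the outcome (DV) under treatment (tx) with the unknown outcome (DV?) under control (ctl)]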

Page 7

Propensity Score Matching

• Definition: The conditional probability (0 to 1) of receiving a given exposure (treatment) given a vector of measured or observed covariates.

• Assumption of RCT: the probability of being assigned to the treatment group is 0.5

• PS reduces baseline information to a single composite summary of the covariates, thus minimizing differences and improving comparability between two groups in observational research
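In symbols, the definition above can be written as follows. The notation is an assumption for illustration and does not come from the slides: $T_i$ is the 0/1 treatment indicator and $X_i$ the vector of observed covariates for person $i$.

```latex
\mathrm{PS}(x) = \Pr(T_i = 1 \mid X_i = x), \qquad 0 \le \mathrm{PS}(x) \le 1
% In an RCT with equal allocation, every participant has the same score:
\mathrm{PS}(x) = 0.5 \quad \text{for all } x
```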

Page 8

Procedures of Propensity Score Analysis

1. Estimate the propensity for treatment given the covariates using logistic regression, and save the predicted value

$\text{logit}(x_i) = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_n X_{ni}$

$\text{Propensity Score} = e^{\text{logit}(x_i)} / \{1 + e^{\text{logit}(x_i)}\}$

2. Balance check: Compare propensity scores between Tx and Ctl groups

3. Estimate the effect of treatment on the outcome using the PS
a. Regression Model
b. Stratification
c. Matching
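Step 1 of the procedure can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the presenters' procedure: it fits a logistic regression by plain gradient ascent on made-up data with two hypothetical binary covariates, then saves the predicted probabilities as propensity scores. The function names (`fit_logistic`, `propensity_scores`) and the toy coefficients are invented for the example.

```python
import math
import random


def sigmoid(z):
    """Logistic function: maps a linear predictor to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.5, epochs=1000):
    """Fit logistic regression by batch gradient ascent on the log-likelihood.

    X is a list of covariate rows, t a list of 0/1 treatment flags.
    Returns [intercept, b1, ..., bk].
    """
    n, k = len(X), len(X[0])
    beta = [0.0] * (k + 1)
    for _ in range(epochs):
        grad = [0.0] * (k + 1)
        for row, ti in zip(X, t):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            err = ti - p
            grad[0] += err
            for j, x in enumerate(row):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def propensity_scores(X, beta):
    """Predicted probability of treatment for each row of X."""
    return [sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], row)))
            for row in X]


# Hypothetical toy data: two binary covariates, treatment more likely when x1 = 1.
random.seed(0)
X = [[random.randint(0, 1), random.randint(0, 1)] for _ in range(500)]
t = [1 if random.random() < sigmoid(-1.0 + 1.5 * x1 - 0.5 * x2) else 0
     for x1, x2 in X]

beta = fit_logistic(X, t)
ps = propensity_scores(X, beta)  # the saved predicted values
```

In practice a statistical package would do the fitting; the point of the sketch is only that the saved predicted value is the propensity score used in steps 2 and 3.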

Page 9

EXAMPLE

• Research Question: What is the effect of VR services? (Logistic regression found the top three services related to a successful VR outcome: On-the-Job Support, Rehab Tech, Job Placement)

• Data Source: 2006 RSA 911 data (including consumers closed after IPE developed; N=352,138)

• IVs: Gender, Race/Ethnicity, Level of Education, Work Status at Application, Primary Source of Support, SSI/DI, Type of Disability

• Intervention (tx): Types of Services

• Outcome: Type of Closure

Page 10

Step 0: Data Set-up

• Variable Selection by crosstabulation of covariates and type of closure (outcome)

• Covariates for this example (dummy var.)
- Gender (2)
- Race/Ethnicity (3): White non-Hispanic, African American, Others
- Education (3): <12 yrs, 12 yrs (incl. SE cert), >12 yrs
- Work Status at App (3): Emp wo sup, Other emp, No emp
- Source of Support (2): Personal Income, Others
- SSI/DI (2): Y/N
- Disability (5): Sensory, All Mental with SA, LD/ADHD, MR/Autism, Others

Page 11

Step 1: Propensity Scores

• Goal: to include all variables that play a role in the selection process, including interactions and other nonlinear terms and variables that show weak relations to outcome (e.g., p<.10 or p<.25) (Rosenbaum & Rubin, 1984)

“Unless a variable can be excluded because there is a consensus that it is unrelated to outcome or is not a proper covariate, it is advisable to include it in the propensity score model even if it is not statistically significant.” (Rubin & Thomas, 1984)

• In the example, all variables were included for PS computation

Page 12

Step 1: PS by Stepwise LR

Job Placement Services           B        S.E.    Wald       Sig.     Exp(B)
White                            0.084    0.012   53.172     0.0000   1.088
African Am                       0.204    0.013   247.599    0.0000   1.227
HS Diploma                       0.084    0.009   88.717     0.0000   1.087
College+                         0.05     0.011   22.45      0.0000   1.051
Employment wo Support at app     -0.486   0.014   1189.201   0.0000   0.615
All other employment at app      -0.287   0.021   184.183    0.0000   0.751
SSD/I                            0.16     0.008   362.081    0.0000   1.173
Personal Income at app           -0.182   0.015   153.565    0.0000   0.833
Sensory Disab                    -0.362   0.014   707.399    0.0000   0.696
Mental Disab                     0.323    0.009   1210.793   0.0000   1.381
LD/ADHD                          0.324    0.012   678.563    0.0000   1.383
MR/Autism                        0.56     0.013   1837.416   0.0000   1.751
Gender_Male                      0.066    0.007   81.003     0.0000   1.069
Constant                         -1.011   0.014   4874.301   0.0000   0.364
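To show how the coefficients in the Job Placement Services table turn into a propensity score, the sketch below sums the B values for the dummies that equal 1 and applies the logistic transformation. The B values are copied from the table; the function name `propensity` and the example profile are hypothetical.

```python
import math

# B coefficients copied from the Job Placement Services table.
COEF = {
    "White": 0.084,
    "African Am": 0.204,
    "HS Diploma": 0.084,
    "College+": 0.05,
    "Employment wo Support at app": -0.486,
    "All other employment at app": -0.287,
    "SSD/I": 0.16,
    "Personal Income at app": -0.182,
    "Sensory Disab": -0.362,
    "Mental Disab": 0.323,
    "LD/ADHD": 0.324,
    "MR/Autism": 0.56,
    "Gender_Male": 0.066,
}
CONSTANT = -1.011


def propensity(active_dummies):
    """Propensity score for a consumer whose listed dummies equal 1."""
    logit = CONSTANT + sum(COEF[name] for name in active_dummies)
    odds = math.exp(logit)  # Exp(B) terms multiply on the odds scale
    return odds / (1.0 + odds)


# Reference profile: every dummy is 0, so only the constant applies.
baseline = propensity([])

# Hypothetical profile chosen for illustration only.
example = propensity(["Gender_Male", "White", "HS Diploma", "MR/Autism"])
```

The Exp(B) column is simply `math.exp(B)`, which is why a negative B such as -0.486 corresponds to an odds ratio below 1.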

Page 13

Step 2: Balance Check

• Compare two groups in their distributions using descriptive statistics and t-tests

• Box plot graph illustrates some overlaps (similar characteristic band of propensity scores) between two groups

• No overlap indicates that the differences in outcome arose from group differences (selection bias), not from the service effect (e.g., rehab tech services)?
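The overlap inspection described above can be made concrete with a small sketch. It uses one common convention, the min/max rule for the region of common support, which is an assumption here and not necessarily the rule the presenters applied; the scores are made up.

```python
def common_support(ps_ctl, ps_tx):
    """Return (low, high), the propensity score range covered by both groups.

    If low > high, the two groups do not overlap at all.
    """
    low = max(min(ps_ctl), min(ps_tx))
    high = min(max(ps_ctl), max(ps_tx))
    return low, high


def trim(ps_values, low, high):
    """Keep only the scores that fall inside the common-support range."""
    return [p for p in ps_values if low <= p <= high]


# Hypothetical propensity scores for illustration.
ps_ctl = [0.05, 0.10, 0.20, 0.30, 0.40]
ps_tx = [0.15, 0.25, 0.35, 0.45, 0.60]

low, high = common_support(ps_ctl, ps_tx)
kept_ctl = trim(ps_ctl, low, high)
kept_tx = trim(ps_tx, low, high)
```

Cases outside the range have no comparable counterpart in the other group, so any outcome difference involving them cannot be separated from selection.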

Page 14

Step 2: Check Distribution/Balance

Propensity Score    Ctl       Tx
N                   236731    115407
Mean                0.316     0.351
Median              0.724     0.243
Mode                0.332     0.365
Std. Deviation      0.315     0.388
Minimum             0.092     0.076
Maximum             0.115     0.115
Quartiles 25        0.252     0.301
          50        0.332     0.365
          75        0.388     0.403

Page 15

Step 2: Check Distribution/Balance

                    Pre adjustment         After Adjust.
                    Ctl       Tx           Ctl      Tx
N                   236731    115407       82040    44379
Mean                0.316     0.351        0.348    0.350
Median              0.724     0.243        0.353    0.353
Mode                0.332     0.365        0.315    0.315
Std. Deviation      0.315     0.388        0.023    0.023
Minimum             0.092     0.076        0.301    0.301
Maximum             0.115     0.115        0.388    0.388
Quartiles 25        0.252     0.301        0.328    0.331
          50        0.332     0.365        0.353    0.353
          75        0.388     0.403        0.369    0.369

Page 16

Step 2: Balance Check

Job Placement Services Propensity Scores

                               Pre Means           Means After Adj.    T-test
Covariate                      No Svcs   Svcs      No Svcs   Svcs      Pre        Post
dum_gender                     0.533     0.557     0.552     0.542     -13.777*   3.429*
dum_white                      0.664     0.633     0.670     0.658     18.326*    4.096*
dum_black                      0.209     0.252     0.178     0.194     -28.049*   -6.875*
dum_hsdiploma_12 year ed       0.428     0.452     0.369     0.363     -13.297*   2.090
dum_college+                   0.289     0.250     0.302     0.296     24.656*    2.388*
dum_emp wo support at app      0.217     0.111     0.021     0.028     84.518*    -7.793*
dum_other employment at app    0.041     0.030     0.022     0.028     16.061*    -6.925*
dum_ssi or ssdi                0.266     0.324     0.246     0.246     -35.110*   0.031
dum_persona income at app      0.201     0.105     0.042     0.051     78.297*    -6.973*
dummy_sensory disab            0.173     0.085     0.000     0.000     78.064*    N/A
dummy_mental disab             0.299     0.365     0.357     0.381     -38.424*   -8.447*
dummy_LD_ADHD                  0.126     0.144     0.204     0.202     -14.897*   0.982
dummy_MR_Autism                0.088     0.143     0.019     0.030     -46.195*   -11.147*
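The t-tests in this balance table can be approximated from the reported means alone, because the variance of a 0/1 dummy with mean p is p(1 - p). The sketch below does this for dum_gender. It assumes an unequal-variance t statistic and assumes that the group sizes from the distribution tables (236731 and 115407 before adjustment, 82040 and 44379 after) apply here; the slides state neither, and the rounded means make the result only approximate.

```python
import math


def t_stat_binary(p_ctl, n_ctl, p_tx, n_tx):
    """Unequal-variance two-sample t statistic for a 0/1 covariate.

    Uses p * (1 - p) as the variance of each group's dummy variable and
    returns (control mean - treatment mean) / standard error.
    """
    var_ctl = p_ctl * (1.0 - p_ctl)
    var_tx = p_tx * (1.0 - p_tx)
    se = math.sqrt(var_ctl / n_ctl + var_tx / n_tx)
    return (p_ctl - p_tx) / se


# dum_gender means from the table; group sizes assumed from earlier slides.
t_pre = t_stat_binary(0.533, 236731, 0.557, 115407)
t_post = t_stat_binary(0.552, 82040, 0.542, 44379)
```

With samples this large, even small differences in means produce large t values, which is one reason several covariates remain flagged after adjustment.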

Page 17

Step 2: Check Distribution/Balance

Pre-Adj
                Employment outcome    Services initiated, not employed
Not received    125728 (53.1%)        111003 (46.9%)
Received        80063 (69.4%)         35344 (30.6%)
Total           205791 (58.4%)        146347 (41.6%)

After Adj.
                Employment outcome    Services initiated, not employed
Not received    37355 (45.5%)         44685 (54.5%)
Received        30439 (68.6%)         13940 (31.4%)
Total           67794 (53.6%)         58625 (46.4%)

Page 18

Step 2: Check Distribution/Balance

Propensity Score    Ctl       Tx
N                   321402    30736
Mean                0.072     0.243
Median              0.027     0.256
Mode                0.063     0.452
Std. Deviation      .101      .157
Minimum             .005      .005
Maximum             .630      .630
Quartiles 25        0.016     0.094
          50        0.027     0.256
          75        0.080     0.363

Page 19

Step 3: Analysis with PS

• Three techniques are commonly used to reduce selection bias and increase precision with PS

- Regression (covariance) adjustment

- Stratification

- Matching

Page 20

Step 3: Analysis I - Regression

• Treat the PS as an additional covariate in a multivariable regression model

• As a composite of confounders, PS can reduce bias in the estimate of the treatment effect by adjusting for the pattern of observed confounders.

• The treatment effect estimate appears more efficient when the PS is used as a covariate within the strata after stratification

Page 21

Step 3: Analysis II - Stratification

• Solution for the problem of dimensionality in making two groups comparable (2^k subclasses needed for k binary covariates)

• Because the PS is a scalar summary of all the observed background covariates, stratification on the PS can balance the distributions of the covariates

• Five strata based on the PS will remove over 90% of the bias in each of the covariates (Cochran, 1968)

Page 22

Step 3: Analysis II - Stratification

Job Placement Assistance Services

                                 WO JOB PLCT         W JOB PLCT
Quintiles  Type of Closure       Freq      %         Freq      %         Totals
1          Employment outcome    44607     77.2      10075     79.1      70543 (20.8%)
           WO Emp outcome        13205     22.8      2656      20.9
2          Employment outcome    23232     51.7      14299     70.9      65084 (19.2%)
           WO Emp outcome        21685     48.3      5868      29.1
3          Employment outcome    20131     43.5      16839     67.3      71322 (21.1%)
           WO Emp outcome        26177     56.5      8175      32.7
4          Employment outcome    18779     47.2      17216     69.0      64740 (19.1%)
           WO Emp outcome        21018     52.8      7727      31.0
5          Employment outcome    14986     38.5      18773     66.7      67079 (19.8%)
           WO Emp outcome        23943     61.5      9377      33.3
Totals                           227763 (67.2%)      111005 (32.8%)      338768
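One way to summarize this quintile table is a single stratified estimate: take the difference in employment rates within each quintile and average the five differences, weighting each by the share of cases in that quintile. The counts below are copied from the table; weighting by stratum size is an assumption made for this sketch, since the slides do not say how (or whether) the strata were combined.

```python
# (employment outcome, without employment outcome) counts per PS quintile.
WITHOUT_JP = [(44607, 13205), (23232, 21685), (20131, 26177),
              (18779, 21018), (14986, 23943)]
WITH_JP = [(10075, 2656), (14299, 5868), (16839, 8175),
           (17216, 7727), (18773, 9377)]


def rate(employed, not_employed):
    """Share of closures with an employment outcome."""
    return employed / (employed + not_employed)


def stratified_difference(without, with_):
    """Size-weighted average of within-stratum rate differences.

    Returns (overall estimate, list of per-stratum differences).
    """
    total = sum(a + b for a, b in without) + sum(a + b for a, b in with_)
    effect = 0.0
    diffs = []
    for (e0, n0), (e1, n1) in zip(without, with_):
        diff = rate(e1, n1) - rate(e0, n0)
        weight = (e0 + n0 + e1 + n1) / total
        diffs.append(diff)
        effect += weight * diff
    return effect, diffs


effect, diffs = stratified_difference(WITHOUT_JP, WITH_JP)
```

The per-stratum differences in `diffs` are also worth reading on their own: the gap is small in the first quintile and much larger in the others, which a single pooled number hides.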

Page 23

Step 3: Analysis II - Stratification

Page 24

Step 3: Analysis – Stratification (26 closures)

[Figure: bar chart of Percentage of Successful Closure (0 to 80) by Quintiles of Propensity Score, W/O Job Placement vs. W/ Job Placement. Quintile 1: 77.2 vs. 79.1; Quintile 2: 51.7 vs. 70.9; Quintile 3: 43.5 vs. 67.3; Quintile 4: 47.2 vs. 69.0; Quintile 5: 38.5 vs. 66.7]

Page 25

Step 3: Analysis III - Matching

• Nearest available matching on the estimated PS

• Mahalanobis metric matching including the PS:

- An equal percent bias reducing technique (mean for the treated minus the mean for the control)

- Add PS to other covariates in the calculation of the Mahalanobis distance

• Nearest available Mahalanobis metric matching within calipers defined by the PS (caliper width of ¼ of the standard deviation of the propensity score)
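The caliper idea can be illustrated with a deliberately simplified sketch. It performs greedy 1:1 nearest-available matching on the PS alone, without replacement, and rejects any pair further apart than ¼ of the standard deviation of the PS. It does not compute a Mahalanobis distance, and the scores, the pooled standard deviation, and the function name `caliper_match` are all assumptions for illustration.

```python
import random
import statistics


def caliper_match(ps_tx, ps_ctl, caliper):
    """Greedy 1:1 nearest-available matching on the PS, without replacement.

    Returns (tx_index, ctl_index) pairs whose scores differ by at most
    `caliper`; treated cases with no control inside the caliper stay unmatched.
    """
    available = set(range(len(ps_ctl)))
    pairs = []
    for i, p in enumerate(ps_tx):
        if not available:
            break
        j = min(available, key=lambda c: abs(ps_ctl[c] - p))
        if abs(ps_ctl[j] - p) <= caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


# Hypothetical propensity scores for illustration.
random.seed(1)
ps_tx = [random.uniform(0.2, 0.6) for _ in range(30)]
ps_ctl = [random.uniform(0.05, 0.5) for _ in range(60)]

# Caliper: one quarter of the standard deviation of the pooled scores.
caliper = 0.25 * statistics.pstdev(ps_tx + ps_ctl)
pairs = caliper_match(ps_tx, ps_ctl, caliper)
```

Greedy matching depends on the order in which treated cases are processed, so a different ordering can give a different matched set.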

Page 26

Step 3: Analysis III - Matching

Using the key variable of PS, matching was conducted (based on the same PS). Matched cases N=114,790

                                   Employment outcome    No Employment outcome    Total
No Job Placement Services          72543 (63.2%)         42247 (36.8%)            114790 (100.0%)
Job Placement Services Received    79808 (69.5%)         34982 (30.5%)            114790 (100.0%)
Total                              152351 (66.4%)        77229 (33.6%)            229580 (100.0%)
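As a worked piece of arithmetic, the matched table can be compared with the unadjusted cross-tabulation shown earlier. The sketch below computes the gap in employment rates before adjustment and after matching. It assumes the pre-adjustment counts (125728 of 236731 and 80063 of 115407) refer to job placement services, which the group sizes suggest but the slides do not state.

```python
def employment_rate(employed, total):
    """Share of cases closed with an employment outcome."""
    return employed / total


# Unadjusted counts (assumed to be the job placement comparison).
raw_no_jp = employment_rate(125728, 236731)
raw_jp = employment_rate(80063, 115407)
raw_gap = raw_jp - raw_no_jp

# Matched sample: 114,790 cases in each group.
matched_no_jp = employment_rate(72543, 114790)
matched_jp = employment_rate(79808, 114790)
matched_gap = matched_jp - matched_no_jp

print(f"Unadjusted gap: {raw_gap:.1%}")
print(f"Matched gap:    {matched_gap:.1%}")
```

The matched gap is the smaller of the two, which bears directly on the interpretation question the presenters pose next.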

Page 27

Interpretation

What do you think?

Do you think PS gives better ideas to make a causal inference?

Page 28

Limitations of PSM

• Adjusts only for observed covariates; no control for unobserved confounders (e.g., age in this example)

• Inspect the overlap between conditions before matching or applying other techniques: group overlap must be substantial (e.g., rehab tech svcs)

• Best with large samples