Analyzing Observational Data: Focus on Propensity Scores
Arlene Ash. QMC - Third Tuesday, September 21, 2010 (as amended, Sept 23).
Arlene Ash
QMC - Third Tuesday, September 21, 2010
(as amended, Sept 23)
Analyzing Observational Data: Focus on Propensity Scores
The Problem
• Those with the intervention and those without have markedly different values for important measured risk factors &
• Outcome is related to the risk factors that are imbalanced between the groups &
• It is not clear how the risk factors and outcome are related
• Why may standard analyses be misleading?
[Figure: True and Modeled Relationship Between Risk and Outcome. Horizontal axis: Risk (0 to 2.0); vertical axis: Outcome (0 to 1.0).]
Is Imbalance in Risk a Problem?
• If we correctly model the relationship between risk factors and outcome, we correctly estimate the effect of the intervention
• With many risk factors, hard to know if the relationship between risk factors and outcome is correctly modeled
• Propensity score - a way to reduce the effect of imbalance in measured risk when models may be inadequate
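The concern in the bullets above can be illustrated with a small deterministic sketch. Everything in it is assumed for illustration and none of it comes from the talk: the outcome depends on risk through a curve (risk squared), the intervention has no effect at all, and the treated group is smaller and sits at higher risk. A straight-line regression that adjusts for risk still reports a sizeable "effect".

```python
def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def ols(rows, y):
    """Least-squares coefficients via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve_linear(xtx, xty)


# Hypothetical data: 301 untreated with risk in [0, 1], 101 treated with risk in [1, 2].
untreated_risk = [i / 300 for i in range(301)]
treated_risk = [1 + i / 100 for i in range(101)]

# Design matrix columns: intercept, risk, treatment indicator.
rows = [[1.0, r, 0.0] for r in untreated_risk] + [[1.0, r, 1.0] for r in treated_risk]
# The true outcome is curved in risk and does not depend on treatment.
y = [r[1] ** 2 for r in rows]

intercept, slope, treatment_effect = ols(rows, y)
print(f"estimated treatment effect: {treatment_effect:.2f} (true effect is 0)")
```

Under these assumptions the fitted treatment coefficient comes out at roughly 0.5 even though the true effect is zero, because the straight line cannot follow the curve across two groups that occupy different risk ranges.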
Propensity Score Method (Key Idea)
• The propensity score (PS) for an observation is the probability that the observation is “exposed” or “got the intervention”
• Use the PS model in pre-processing the data
– To draw a sub-sample where the exposed and non-exposed groups are fairly balanced on risk factors. Then
– Use standard techniques to analyze the sub-sample
Simple Propensity Score Approach
• Estimate a model to predict the “probability of intervention/exposure”
– This is “the propensity score”
• Divide the population into PS quintiles
• Create a subsample by taking equal numbers of exposed and unexposed observations from each quintile
• Use a subsequent regression model to estimate the effect of the intervention in the subsample
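The sampling steps above can be sketched as follows, assuming the first step has already produced a propensity score for each observation (for example, from a logistic regression). The function name, the record layout, and the seeded random draw are illustrative assumptions, not details given in the talk.

```python
import random


def ps_quintile_subsample(records, seed=0):
    """records: list of (propensity_score, exposed) pairs, exposed being True/False.

    Returns a subsample with equal numbers of exposed and unexposed
    observations drawn from each propensity-score quintile.
    """
    rng = random.Random(seed)
    ranked = sorted(records, key=lambda rec: rec[0])
    n = len(ranked)
    subsample = []
    for q in range(5):
        # Slice the ranked list into five (nearly) equal-sized groups.
        stratum = ranked[q * n // 5:(q + 1) * n // 5]
        exposed = [rec for rec in stratum if rec[1]]
        unexposed = [rec for rec in stratum if not rec[1]]
        # Keep the smaller group in full and draw as many from the larger one.
        k = min(len(exposed), len(unexposed))
        subsample += rng.sample(exposed, k) + rng.sample(unexposed, k)
    return subsample


# Hypothetical scores: exposure becomes more common as the score rises.
data_rng = random.Random(1)
records = []
for _ in range(1000):
    ps = data_rng.random()
    records.append((ps, data_rng.random() < ps))

sub = ps_quintile_subsample(records)
n_exposed = sum(1 for _, exposed in sub if exposed)
print(len(sub), n_exposed, len(sub) - n_exposed)
```

The returned subsample is what the subsequent regression model would be fit to; that last step is not shown here.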
Propensity Score Sampling Example
PS Quintile # Cases # Controls # Sampled
Lowest 12 81 24
2nd 30 67 60
Middle 44 38 76
4th 53 15 30
Highest 78 8 16
Total 217 209 206
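The "# Sampled" column is consistent with a simple rule: in each quintile the smaller of the two groups is kept in full and an equal number is drawn from the larger group. A short check using the counts from the table (variable names are illustrative):

```python
# (quintile, cases, controls) as shown in the table above
quintiles = [
    ("Lowest", 12, 81),
    ("2nd", 30, 67),
    ("Middle", 44, 38),
    ("4th", 53, 15),
    ("Highest", 78, 8),
]

# Twice the smaller group: all of it plus an equal draw from the larger group.
sampled = {name: 2 * min(cases, controls) for name, cases, controls in quintiles}
total_sampled = sum(sampled.values())

for name, n in sampled.items():
    print(f"{name:8s}{n:4d}")
print(f"{'Total':8s}{total_sampled:4d}")  # 206, matching the table
```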
Propensity Score Sampling Example: Treatments for Drug Abusers
• Patients seeking substance abuse detoxification in Boston receive either
– Residential detoxification: lasts ~ one week + encouragement for post-detox treatment, or
– Acupuncture: acute (daily) detox + 3-6 months of maintenance with acupuncture and motivational counseling
Data
• From Boston’s publicly-funded substance abuse treatment system
• All cases discharged from residential detox or acupuncture between 1/93 and 9/94
• Client classified (only once) as residential or acupuncture based on the modality of first discharge
Outcome
• Is client re-admitted to detox within 6 months? (Y/N)
• Study question: Are acupuncture clients more likely to be re-admitted than residential detox clients?
– Exposure = assigned to acupuncture
Client Characteristics Available At Time Of Admission
• Gender
• Race/ethnicity
• Age
• Education
• Employment status
• Income
• Health insurance status
• Living situation
• Prior mental health treatment
• Primary drug
• Substance abuse treatment history
Residential Detox & Acupuncture Cases: % with Various Characteristics
Characteristic              Residential (n = 6,907)   Acupuncture (n = 1,104)
Gender: female              29                        33
Race/ethnicity: black       46                        46
  Hispanic                  12                        10
  White                     41                        43
Education: HS grad          56                        59
  College graduate          4                         13
Characteristics of Residential Detox & Acupuncture Clients (2)
Characteristic              Residential (n = 6,907)   Acupuncture (n = 1,104)
Employment: unemployed      86.8                      43.2
Insurance: uninsured        65.4                      52.3
  Medicaid                  28.2                      21.2
  Private insurance         3.0                       15.4
Lives: with child           9.5                       19.3
  In shelter                30.3                      2.9
Characteristics of Residential Detox & Acupuncture Clients (3)
Characteristic                  Residential (n = 6,907)   Acupuncture (n = 1,104)
Prior mental health treatment   12.3                      27.8
Primary drug: alcohol           42.3                      32.4
  Cocaine                       16.2                      16.6
  Crack                         15.9                      20.2
  Heroin                        24.6                      19.0
Characteristics of Residential Detox & Acupuncture Clients (4)
Characteristic                            Residential (n = 6,907)   Acupuncture (n = 1,104)
Substance abuse admits in the last year
  Residential detox: 0                    56.7                      81.0
  Residential detox: 1                    20.2                      12.1
  Residential detox: 2+                   23.1                      7.0
  Short-term residential: 0               76.2                      94.8
  Long-term residential: 0                80.5                      93.5
  Outpatient: None                        80.6                      54.3
  Acupuncture: None                       95.9                      90.1
Results Of Standard Analysis
Percentage of clients re-admitted to detox within 6 months
• Among 1,104 acupuncture cases, 18% re-admitted
• Among 6,907 residential detox cases, 36% re-admitted
• Raw odds ratio = 0.40
From a multivariable stepwise logistic regression model:
• Odds ratio for acupuncture: 0.71 (CI = 0.53-0.95)
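The raw odds ratio can be recomputed from the two re-admission percentages. Because the percentages shown are rounded, the result (about 0.39) differs slightly from the reported 0.40. Variable names are illustrative.

```python
def odds(p):
    """Convert a probability to odds."""
    return p / (1 - p)


p_acupuncture = 0.18  # re-admitted among 1,104 acupuncture cases
p_residential = 0.36  # re-admitted among 6,907 residential detox cases

raw_odds_ratio = odds(p_acupuncture) / odds(p_residential)
print(round(raw_odds_ratio, 2))  # 0.39
```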
What’s the Worry? How Do We Address It?
• Given how different the two groups are, can we trust a model to correctly estimate the effect of acupuncture?
• PS methods generalize (long-standing) matching-within-strata methods that work well with 1 or 2 predictors
• PS can address imbalances in many important predictors simultaneously
• Both traditional and PS matching allow for
– A pooled estimate (across all strata) or
– When N is large enough, stratum-specific estimates
Propensity Score Application
• Use stepwise logistic regression to build a model to predict whether a client “is exposed” (i.e., receives acupuncture)
• Select sub-samples of exposed and non-exposed with similar distributions of the “propensity score” (predicted probability of being exposed)
• Model (as before) on the sub-sample
Sampling Results
• Able to match 740 who received acupuncture (out of 1,104) with 740 people who did not (out of 6,907)
• The risk factors in this subsample of 1,480 are much more balanced between the two groups
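For scale, the matched subsample uses about two thirds of the acupuncture clients and roughly one in ten residential clients. A quick check of that arithmetic, with illustrative variable names:

```python
matched = 740
acupuncture_total = 1104
residential_total = 6907

share_acupuncture = matched / acupuncture_total
share_residential = matched / residential_total

print(f"{share_acupuncture:.0%} of acupuncture clients matched")  # 67%
print(f"{share_residential:.0%} of residential clients matched")  # 11%
print(f"subsample size: {2 * matched}")                           # 1480
```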
Characteristics of Clients in Subsample (vs. Full Sample)
Characteristic              Residential   Acupuncture
College graduate            7% (4%)       7% (13%)
Employed                    41% (13%)     42% (57%)
Private insurance           9% (3%)       6% (15%)
Lives with child or adult   72% (55%)     77% (76%)
Lives in shelter            5% (30%)      4% (3%)
Prior mental health Rx      21% (12%)     21% (28%)
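How much the imbalance shrinks can be summarized as the gap, in percentage points, between the two groups before and after sampling. The sketch below reads the flattened columns of the table above in order (subsample value first, full-sample value in parentheses); that reading is an assumption, and the gap measure itself is a simple illustration, not a diagnostic used in the talk.

```python
# characteristic: (residential %, acupuncture %)
full_sample = {
    "College graduate": (4, 13),
    "Employed": (13, 57),
    "Private insurance": (3, 15),
    "Lives with child or adult": (55, 76),
    "Lives in shelter": (30, 3),
    "Prior mental health Rx": (12, 28),
}
subsample = {
    "College graduate": (7, 7),
    "Employed": (41, 42),
    "Private insurance": (9, 6),
    "Lives with child or adult": (72, 77),
    "Lives in shelter": (5, 4),
    "Prior mental health Rx": (21, 21),
}


def gaps(table):
    """Absolute residential-vs-acupuncture gap, in percentage points."""
    return {name: abs(res - acu) for name, (res, acu) in table.items()}


full_gaps = gaps(full_sample)
sub_gaps = gaps(subsample)
for name in full_sample:
    print(f"{name:27s}{full_gaps[name]:3d} -> {sub_gaps[name]:2d}")

print("largest gap:", max(full_gaps.values()), "->", max(sub_gaps.values()))  # 44 -> 5
```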
Comparing Standard and Propensity Score Findings
From the multivariable model fit to all cases:
Odds ratio for acupuncture: 0.71
95% confidence interval: 0.53-0.95
From multivariable model fit to more comparable sub-sample:
OR for acupuncture: 0.61
95% CI: 0.39-0.94
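One way to compare the precision of the two estimates is to back out an approximate standard error of the log odds ratio from each interval. This sketch assumes the intervals are symmetric on the log scale (Wald-type), which the slides do not state; the geometric midpoint of each interval is printed as a rough check on that assumption.

```python
import math


def log_or_se(lower, upper, z=1.96):
    """Approximate SE of log(OR) from a 95% CI, assuming a Wald interval."""
    return (math.log(upper) - math.log(lower)) / (2 * z)


def geometric_midpoint(lower, upper):
    """Point estimate implied by an interval that is symmetric on the log scale."""
    return math.sqrt(lower * upper)


se_full = log_or_se(0.53, 0.95)  # model fit to all 8,011 cases
se_sub = log_or_se(0.39, 0.94)   # model fit to the 1,480-case subsample

print(f"full sample: SE {se_full:.2f}, midpoint {geometric_midpoint(0.53, 0.95):.2f}")
print(f"subsample:   SE {se_sub:.2f}, midpoint {geometric_midpoint(0.39, 0.94):.2f}")
```

The midpoints (about 0.71 and 0.61) agree with the reported odds ratios, and the larger standard error in the subsample is what one would expect from fitting the model to far fewer cases.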
Summary
• In this case, results were similar - Why? Original model was very good (C-statistic = 0.96)
• What we learned from the PS analysis:
– Could find a subset of (about 10% of) patients who got residential detox who look very similar to those who got acupuncture
– Skeptics were more receptive to findings from the PS analysis
Which X’s Belong in the PS Model?
The goal is to estimate the effect of exposure E on outcome Y
• Confounders (Brookhart’s X1 variables)?
– Directly affect both E and Y
• Simple predictors (X2s)?
– Affect Y but not E
• Simple selectors (X3s)?
– Affect E but not Y
Example
The goal is to estimate the effect of E = CABG surgery on Y = 30-day mortality following admission for a heart attack
– Confounder (e.g., disease severity)
– Simple predictors (e.g., home support)
– Simple selectors, aka “instrumental variables” (e.g., random assignment)
Variable type     Directly affects             Belongs in which model
                  Outcome (Y)   Exposure (E)   PS    Subsequent regression
X1 Confounder     1             1              Yes   Yes
X2 Predictor      1             0              ?     Yes
X3 Selector       0             1              No    ?
? = inclusion should neither harm nor help
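The table above can be written as a small lookup. This is only a restatement of the table in code; the function name and the "Optional" label (standing in for the table's "?") are illustrative choices.

```python
def model_membership(affects_outcome, affects_exposure):
    """Return (in_ps_model, in_subsequent_regression) per the table above.

    "Optional" stands for the table's "?" (inclusion should neither harm nor help).
    """
    if affects_outcome and affects_exposure:   # X1: confounder
        return ("Yes", "Yes")
    if affects_outcome:                        # X2: simple predictor
        return ("Optional", "Yes")
    if affects_exposure:                       # X3: simple selector
        return ("No", "Optional")
    raise ValueError("variable affects neither outcome nor exposure; not covered by the table")


# The CABG examples given earlier: (affects outcome Y, affects exposure E)
examples = {
    "disease severity": (True, True),
    "home support": (True, False),
    "random assignment": (False, True),
}
for name, flags in examples.items():
    print(name, model_membership(*flags))
```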
Discussion
• The “pre-processing” that occurs when sub-sampling to create “PS-balanced” comparison groups protects against bias from confounding variables
• Putting selector variables in the PS model will hurt accuracy (by reducing the numbers of good matches) without making the groups more comparable
• Subsequent regression improves accuracy