BLENDING PROPENSITY SCORE MATCHING AND LOGISTIC REGRESSION IN SUPPORT SERVICE EVALUATIONS TERRENCE WILLETT, CRAIG HAYWARD, AND NATHAN PELLEGRIN CAIR CONFERENCE SAN DIEGO, NOVEMBER 2014


BLENDING PROPENSITY SCORE MATCHING AND LOGISTIC REGRESSION IN SUPPORT SERVICE EVALUATIONS

TERRENCE WILLETT, CRAIG HAYWARD, AND NATHAN PELLEGRIN

CAIR CONFERENCE

SAN DIEGO, NOVEMBER 2014


OUTCOMES

• Describe purpose of regression and propensity score matching (PSM)

• Explain data requirements and procedures for regression and PSM

• Compare and contrast regression and PSM

• Identify additional resources for further exploration


WHY CAUSAL INFERENCE

• If you need to use statistics, then you should design a better experiment. –attributed to Rutherford

• Most education research is observational/correlational, not experimental.


COMMON SCENARIO

• Students self-selected to participate and/or were recruited to participate.

• Differences in outcomes between participants and non-participants can reasonably be attributed to differences in background variables or motivation.

• Can we determine if the participation caused a change in outcomes?

• No, but…

Research question: Did participation in “intervention X” result in better outcomes for students than would have happened had they not participated?


REGRESSION …+

• Classic correlational technique

• Covariates used in model to attempt to control for differences in background variables or motivation

• Background variables can include measures of or proxies for skill level, social capital, or socio-economic status

• Measures of self-motivation often unavailable

• Models are imperfect and generally must be combined with other evidence to more completely describe the possible influence of an intervention, program or strategy


PROPENSITY SCORE MATCHING

• One of several ways to create a matched comparison group of non-participants intended to be similar to participant group for a valid comparison

• Logistic regression or other techniques are used to create a score indicating the likelihood that each individual would have been a participant, given background characteristics; non-participants are then matched to participants with similar scores

• Resolves issue of matching on many dimensions
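
As a minimal sketch in standard notation (the symbols e, X, T and β are assumptions, not taken from the slides), the propensity score estimated by logistic regression is each person's modeled probability of participating, given background covariates:

```latex
e(x) = \Pr(T = 1 \mid X = x) = \frac{1}{1 + \exp\!\big(-(\beta_0 + \beta^{\top} x)\big)}
```

Matching on the single number e(x) stands in for matching on every covariate in X at once, which is how the score eases the many-dimensions problem.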


THE COUNTERFACTUAL (POTENTIAL OUTCOMES) FRAMEWORK FOR PSM

• Based on counterfactual theories of causation, which are a set of conceptual tools for analyzing causal claims

• Originated in the early 1970s

• Introduced by philosopher David Lewis

• Subsequently taken up by scientists and extended (statistician Donald Rubin at Harvard in the 1980s, and many others since)

In the following slides I will:

1) Introduce some of the notation developed in this area

2) Explain the logic of PSM

3) Present alternative criteria for deciding between PSM and regression


TO START….

What does it mean for a program to make a difference in someone’s outcome?

Easy: it is the difference $Y_i(t) - Y_i(c)$ between that person's two potential outcomes.

$Y_i(t)$ is the potential outcome under treatment.

(Treatment = participation = choosing to do x = ….)

$Y_i(c)$ is the potential outcome under control (non-treatment).

However, there is a problem here….

For any one individual, only one of the two potential outcomes is observed; the other is counterfactual. It is not observed in our universe*.

*Recently published evidence for inflationary theories of the cosmos suggests we may be living in a multiverse (Alan Guth, 2014). Counterfactual conditions may obtain in alternate universes. I am not going to address the possibility of information transfer between alternate realities, but that possibility is being taken seriously by some.
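
To make the missing-data nature of the problem concrete, here is a small sketch with invented students. Both potential outcomes are listed only because the data are made up; the field names (`y_t`, `y_c`, `T`) are hypothetical.

```python
# Hypothetical students. Real data never contain both y_t and y_c for
# the same person; they appear together here only for illustration.
students = [
    {"id": 1, "T": 1, "y_t": 1, "y_c": 0},
    {"id": 2, "T": 1, "y_t": 1, "y_c": 1},
    {"id": 3, "T": 0, "y_t": 1, "y_c": 0},
    {"id": 4, "T": 0, "y_t": 0, "y_c": 0},
]


def observed_outcome(s):
    """Return the one potential outcome that is actually observed."""
    return s["y_t"] if s["T"] == 1 else s["y_c"]


def individual_effect(s):
    """Y(t) - Y(c): computable here only because the data are invented."""
    return s["y_t"] - s["y_c"]


observed = [observed_outcome(s) for s in students]
effects = [individual_effect(s) for s in students]
```

An analyst sees only `observed` and `T`; the `effects` list is exactly what cannot be computed from real records.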


THE POTENTIAL OUTCOMES MATRIX

Actual Treatment Status (T) | Potential Outcome Y(t) | Potential Outcome Y(c)
1 (participant)             | observable             | not observable
0 (non-participant)         | not observable         | observable


CAUSAL EFFECTS AT THE PROGRAM AND POPULATION LEVELS

• Our focus is usually not on individuals, but just about always on the aggregate effect – the average effect of a program on the outcomes of groups of individuals or populations.


AVERAGE TREATMENT EFFECT (ATE)

• BRUTE FACTS:

• Participants and non-participants differ systematically (w.r.t. demographics, trajectories, risk profiles, self-selection, etc.)

• Different people respond differently to treatment (differential response)

• These facts must be taken into account when modeling or computing treatment effects.

• This means all four cells of the matrix must be estimated in order to obtain an average treatment effect!

• How?

• Here come the assumptions….


CONDITIONAL INDEPENDENCE (PERFECT STRATIFICATION) (SELECTION ON OBSERVABLES)

• If our observations include information on every one of the variables influencing likelihood of participation or differential responses*, then it is possible to avoid omitted variable bias and so achieve conditional independence (CI), also called perfect stratification (PS) or selection on observables (SOO). And if we have CI, then among individuals with the same values of those variables, this is like randomized assignment ….

* How often do we encounter such datasets in institutional research?
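
In standard notation (the covariate vector X is an assumed symbol, not one defined on the slides), the conditional independence assumption can be sketched as:

```latex
\big(Y(t),\, Y(c)\big) \;\perp\; T \;\mid\; X
```

In words: among people with the same values of X, who participates is unrelated to either potential outcome, as it would be under random assignment.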


SYMBOLIC DERIVATION OF AVERAGE EFFECTS

ATE

ATT

ATU
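
As a hedged sketch in standard potential-outcomes notation (textbook definitions, not necessarily the presenters' exact equations), the three averages named above can be written as follows, where π is an assumed symbol for the share of the population that participates:

```latex
\begin{aligned}
\text{ATE} &= E\big[\,Y(t) - Y(c)\,\big] \\
\text{ATT} &= E\big[\,Y(t) - Y(c) \mid T = 1\,\big] \\
\text{ATU} &= E\big[\,Y(t) - Y(c) \mid T = 0\,\big] \\
\text{ATE} &= \pi\,\text{ATT} + (1 - \pi)\,\text{ATU}, \qquad \pi = \Pr(T = 1)
\end{aligned}
```

The last line shows why the effect on the treated (ATT) and the effect on the untreated (ATU) must both be estimated before an overall average can be reported.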


AND NOW WE PLAY THE MATCHING GAME. ALSO KNOWN AS….

[Figure: the potential outcomes matrix, with columns Y(t) and Y(c), rows for actual treatment status (T), and cells marked observable or not observable]

May I please borrow your outcome?


ESTIMATING TREATMENT EFFECTS

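Actual Treatment Status (T) | Potential Outcome Y(t) | Potential Outcome Y(c)
1                           | A (observable)         | B (not observable)
0                           | C (not observable)     | D (observable)

With A through D read as cell means and p as the proportion of the population that participated:

ATT = A - B

ATU = C - D

ATE = p(A - B) + (1 - p)(C - D)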
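
The arithmetic can be sketched in a few lines, assuming A and D are observed group means, B and C are estimates borrowed from matched cases, and the overall average weights the two groups by the share of participants. The function name, the parameter `p_treated`, and the numbers are all hypothetical.

```python
def treatment_effects(a, b, c, d, p_treated):
    """Combine the four cell means of the potential outcomes matrix.

    a: mean Y(t) for participants (observed)
    b: mean Y(c) for participants (estimated, e.g. from matches)
    c: mean Y(t) for non-participants (estimated)
    d: mean Y(c) for non-participants (observed)
    p_treated: share of the population that participated
    """
    att = a - b  # average effect on the treated
    atu = c - d  # average effect on the untreated
    ate = p_treated * att + (1 - p_treated) * atu
    return {"ATT": att, "ATU": atu, "ATE": ate}


# Invented success rates: ATT is about 0.10, ATU about 0.05, ATE about 0.0625.
effects = treatment_effects(a=0.70, b=0.60, c=0.55, d=0.50, p_treated=0.25)
```

Because only a quarter of this invented population participated, the overall average sits closer to the effect for non-participants.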


AVERAGE TREATMENT EFFECT (ATE)

• PSM offers a way to plug values into (5), (6) and (7) to obtain unbiased estimates.

• However, if we have CI, then why not just fit something like a regression of the outcome on participation and the covariates (and many other regression techniques)?

There is no hard and fast answer to this question. Decisions are based on pragmatic considerations.

However, we may revisit this question later….
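
One example of the kind of regression alluded to here, assuming a binary outcome and using illustrative symbols (α, δ, β and the covariate vector X are not taken from the slides), is a logistic model with a participation indicator:

```latex
\operatorname{logit}\,\Pr(Y_i = 1 \mid T_i, X_i) = \alpha + \delta\,T_i + \beta^{\top} X_i
```

Here δ is the covariate-adjusted difference in log-odds associated with participation.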


EXAMPLE


PROS AND CONS

• Regressions can be “easier” to run but harder to explain to a general audience

• PSM can be more time consuming to conduct but easier to explain to a general audience

• Regressions tend to perform better with large data sets while PSM tends to perform better with few observations provided the non-participant group has sufficient numbers of individuals with the key confounding variables

• Regressions have been used for many years and are well described mathematically with broad consensus on proper error terms

• PSM is newer and there is no consensus on optimal matching procedures or proper error terms

• Regression will use all cases with non-missing data while PSM may use only a subset of cases from the pool of non-participants

• All analytic methods suffer if key variables are not available

• Conclusions can often be the same with either method


HOW TO RUN PSM

• Create data file (95% of effort)

• Match participants and non-participants on a set of control variables to create a comparison group with similar proportions on all characteristics (i.e. comparison group would have a similar percent female, Hispanic, low income, etc. as compared to the participant group)

• This step is referred to as “balancing” and generally must be repeated several times to obtain balance on all variables of interest either by adjusting matching criteria or removing variables

• Run comparative analyses, which can include simple t-tests, post-PSM regressions, or other techniques

• Major packages that conduct PSM include Stata, R, and SAS

• psmatch2 is a user-written Stata command (installed from SSC) and is the usual choice in Stata 12 and older; Stata 13 adds the built-in teffects psmatch

• http://www.ssc.wisc.edu/sscc/pubs/stata_psmatch.htm compares the two commands

• Note SPSS/PASW does not do PSM directly but there is an R plugin for SPSS

• http://arxiv.org/ftp/arxiv/papers/1201/1201.6385.pdf
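
The steps above can be sketched end to end in plain Python. This is a simplified illustration under stated assumptions, not the algorithm used by psmatch2, teffects psmatch, or any R package: the records are invented, the propensity model is a hand-rolled logistic regression, and matching is greedy 1:1 nearest neighbor without replacement with an arbitrary caliper of 0.2.

```python
import math
from statistics import mean, pvariance

# Hypothetical records: (prior_gpa, low_income), participated, completed.
ROWS = [
    ((2.0, 1), 1, 1), ((2.4, 1), 1, 0), ((2.8, 0), 1, 1), ((3.0, 1), 1, 1),
    ((2.1, 1), 0, 0), ((2.5, 0), 0, 0), ((2.9, 0), 0, 1), ((3.1, 1), 0, 1),
    ((3.5, 0), 0, 1), ((3.8, 0), 0, 1), ((3.6, 0), 0, 1), ((3.3, 0), 0, 0),
]
X = [r[0] for r in ROWS]
T = [r[1] for r in ROWS]
Y = [r[2] for r in ROWS]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, t, lr=0.1, steps=5000):
    """Logistic regression by gradient ascent; returns [intercept, coefs...]."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(steps):
        grad = [0.0] * len(w)
        for xi, ti in zip(X, t):
            err = ti - sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))
            grad[0] += err
            for j, xj in enumerate(xi, start=1):
                grad[j] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def match_nearest(scores, t, caliper=0.2):
    """Greedy 1:1 nearest-neighbor matching on the score, without replacement."""
    pool = [i for i, ti in enumerate(t) if ti == 0]
    pairs = []
    for i in [i for i, ti in enumerate(t) if ti == 1]:
        if not pool:
            break
        j = min(pool, key=lambda c: abs(scores[c] - scores[i]))
        if abs(scores[j] - scores[i]) <= caliper:
            pairs.append((i, j))
            pool.remove(j)
    return pairs


def std_mean_diff(a, b):
    """Standardized mean difference; values near 0 suggest balance."""
    pooled_sd = math.sqrt((pvariance(a) + pvariance(b)) / 2)
    return (mean(a) - mean(b)) / pooled_sd if pooled_sd > 0 else 0.0


# Step 1: estimate propensity scores. Step 2: match on them.
w = fit_logistic(X, T)
scores = [sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))) for xi in X]
pairs = match_nearest(scores, T)

# Step 3: check balance on each covariate within the matched sample.
balance = [
    std_mean_diff([X[i][k] for i, _ in pairs], [X[j][k] for _, j in pairs])
    for k in range(len(X[0]))
] if pairs else []

# Step 4: a simple matched comparison of outcomes (an ATT-style estimate).
att_estimate = mean([Y[i] - Y[j] for i, j in pairs]) if pairs else None
```

If any entry of `balance` is far from zero, the usual response is the one described above: adjust the matching criteria (for example the caliper) or the covariate set and rerun.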


ALTERNATIVE PERSPECTIVES

• A criterion that can be applied to regression and PSM: how do they perform at predicting new observations? (false positives, false negatives, etc.)

• Regression and PSM methods can both be used as tools of discovery

• SO: CHOOSE THE METHOD/MODEL WHICH YIELDS THE SMALLEST PREDICTION ERROR.

• That may decide a battle in a particular setting (or occasion), but the war between methods will go on…
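
One way to make the prediction-error criterion concrete is to score each method's predictions against held-out outcomes. The sketch below assumes binary outcomes and uses invented predictions from two hypothetical models; it illustrates the bookkeeping only, not how either method would generate its predictions.

```python
def confusion_counts(actual, predicted):
    """Count true/false positives and negatives for binary outcomes."""
    counts = {"tp": 0, "fp": 0, "tn": 0, "fn": 0}
    for a, p in zip(actual, predicted):
        if p == 1:
            counts["tp" if a == 1 else "fp"] += 1
        else:
            counts["fn" if a == 1 else "tn"] += 1
    return counts


def error_rate(actual, predicted):
    """Share of held-out cases that were predicted incorrectly."""
    c = confusion_counts(actual, predicted)
    return (c["fp"] + c["fn"]) / len(actual)


# Invented held-out outcomes and predictions from two hypothetical models.
held_out = [1, 0, 1, 1, 0, 0, 1, 0]
pred_a = [1, 0, 0, 1, 0, 1, 1, 0]
pred_b = [1, 1, 0, 1, 0, 1, 0, 0]

rate_a = error_rate(held_out, pred_a)  # 2 of 8 wrong
rate_b = error_rate(held_out, pred_b)  # 4 of 8 wrong
preferred = "a" if rate_a <= rate_b else "b"
```

Under this criterion the first model would be preferred for this particular holdout set, which says nothing about how the comparison would come out in another setting.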


FURTHER READING

• Angrist, J. D., & Pischke, J. (2008). Mostly Harmless Econometrics: An Empiricist’s Companion.

• Morgan, S., & Harding, D. (2006). Matching Estimators of Causal Effects: From Stratification and Weighting to Practical Data Analysis Routines.

• Caliendo, M., & Kopeinig, S. (2005). Practical Guide for PSM.

• www.caliendo.de/Papers/practical_revised_DP.pdf

• Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70, 41-55.

• Padgett, R.; Salisbury, M.; An, B.; & Pascarella, E. (2010). Required, Practical, or Unnecessary? An Examination and Demonstration of Propensity Score Matching Using Longitudinal Secondary Data. New Directions for Institutional Research – Assessment Supplement (pp. 29-42). San Francisco, CA: Jossey-Bass.

• Soledad Cepeda, M.; Boston, R.; Farrar, J., & Strom, B. (2003). Comparison of Logistic Regression versus Propensity Score When the Number of Events Is Low and There Are Multiple Confounders. American Journal of Epidemiology, 158, 280-287.

• http://www.biostat.jhsph.edu/~estuart/propensityscoresoftware.html


THANK YOU

Terrence Willett, Director of Planning, Research, and Knowledge Systems, Cabrillo [email protected]

Craig Hayward, Director of Planning, Research, and Accreditation, Irvine Valley [email protected]

Nathan Pellegrin, Director of Institutional Research, Peralta [email protected]