When Is a Program Ready for Rigorous Impact Analysis?
Abt Associates |pg 1
[G]overnment should be seeking out creative, results-oriented programs like the ones here today and helping them replicate their efforts across America.
President Barack Obama, 6/30/2009
http://www.nationalservice.gov/about/newsroom/statements_detail.asp?tbl_pr_id=1828
Abt Associates |pg 2
When Is a Program Ready for Rigorous Impact Analysis?
Diana Epstein (CAP/Center for American Progress) and Jacob Alex Klerman (Abt Associates)
APPAM/HSE Conference “Improving the Quality of Public Services”, Moscow, June 2011
Abt Associates |pg 3
Outline
• The Basic Argument
• On Logic Models
• Some Examples
• Some Broader Implications
• Discussion
Abt Associates |pg 4
The Goal
• Identify program ideas that can successfully address pressing social problems
• Roll them out nationally
[Diagram: Program Idea → Broad Rollout]
Abt Associates |pg 5
Require Rigorous Impact Evaluation
• Many apparently plausible programs “don’t work”
• Many that work in one site don’t work in another site
• So, require an Impact Evaluation “tollgate”
  – Usually random assignment
  – Saving money
• This is the “New Orthodoxy”
  – Coalition for Effective Policy
  – OMB (2009)
[Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 6
Many Programs Fail RA, So Pilot
• We argue that a rush to “random assignment evaluation” has two problems
  1. Some programs clearly will not pass the Impact Evaluation “tollgate”
  2. Some of those programs would pass the Impact Evaluation with more “development”
• A “pilot” would help with both problems
  – i.e., run the program for a while
  – Then, if the program is promising …
  – Start the Impact Analysis
[Diagram: Program Idea → Pilot → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 8
Formative Evaluation/Process Evaluation
• Formative Evaluation to improve the program
• Process Evaluation to screen out programs that are unlikely to show impact
• But, how do you do that?
  – New Orthodoxy: Only random assignment can reliably detect impact
  – So, how can a Process Evaluation screen?
[Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 12
“Falsifiable Logic Models” Can Screen
[Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
• Require Falsifiable Logic Model
• Revise program and Falsifiable Logic Model
• Only proceed if program satisfies its own Falsifiable Logic Model
Abt Associates |pg 14
Why Might this Work?
• Logic Models explicate the path from resources to impacts
• All but the “impact” step occur
  – In the treatment group
  – During (or at the end of) treatment
• So, verifying the logic model does not require
  – Random assignment
  – Or even a control group
  – Long program follow-up
  – And expensive post-program survey tracking efforts
[Diagram: Program Idea → Formative Evaluation → Process Evaluation → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 16
But Will this Screen? Need Examples of …
• Intermediate benchmarks
  – Resources/inputs, Activities, Outputs, Outcomes
• … that were (should have been/could have been) specified in a Falsifiable Logic Model
• And that could be detected
  – Using only the treatment group
  – Without an expensive follow-up survey
  – Before (or perhaps shortly after) the end of treatment
Here goes …
Abt Associates |pg 17
Forms of Logic Model Failures: 1-3
1. Acquire Resources: Form partnerships, acquire and retain staff (with target qualifications)
  • Salem ERA: Very high staff turnover
2. Recruit Cases: Fill the program
  • Portland ERA: Recruited only a third of target enrollees
3. Sustain Participation
  • Rural WTW Strategies Evaluation; SC Moving Up ERA; Cleveland Achieve ERA
Abt Associates |pg 18
Forms of Logic Model Failures: 4-5
4. Implement with Fidelity
  • Mathematica Supplemental Reading Evaluation; Abt Mentoring Evaluation
5. Pre/Post Progress
  • MDRC NEWWS HCD program’s academic testing (but see GED)
Abt Associates |pg 22
Inducing “Truth Telling”
• Currently, program developers have an incentive to over-promise
  – More likely to be funded
  – But, underpowered Impact Evaluations and null results
• Process Evaluation tollgate gives an incentive to under-promise
  – More likely to pass the Process Evaluation tollgate, but
  – Pilot is less likely to be funded
  – And, Impact Evaluation is less likely to be funded
• And if developing a Falsifiable Logic Model forces program developers to think through their program models more thoroughly and realistically, that’s good too!
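The link between over-promising and underpowered Impact Evaluations can be illustrated with a standard sample-size calculation. This is only a sketch under assumptions that are not in the slides: a two-arm trial with equal allocation, a standardized effect size, a two-sided 5% test, the normal approximation, and made-up effect sizes (0.4 promised, 0.2 true). The function names are hypothetical.

```python
import math
from statistics import NormalDist

_STD_NORMAL = NormalDist()


def n_per_arm(effect_size, alpha=0.05, power=0.80):
    """Sample size per arm to detect a standardized effect (normal approximation)."""
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    z_power = _STD_NORMAL.inv_cdf(power)
    return math.ceil(2 * ((z_alpha + z_power) / effect_size) ** 2)


def achieved_power(effect_size, n, alpha=0.05):
    """Approximate power of a two-sided test with n cases per arm.

    Ignores the negligible chance of rejecting in the wrong direction.
    """
    z_alpha = _STD_NORMAL.inv_cdf(1 - alpha / 2)
    return _STD_NORMAL.cdf(effect_size * math.sqrt(n / 2) - z_alpha)


promised_effect = 0.4  # what the developer claimed (hypothetical)
true_effect = 0.2      # what the program actually delivers (hypothetical)

n = n_per_arm(promised_effect)          # trial sized for the promised effect
power = achieved_power(true_effect, n)  # power against the true effect
print(n, round(power, 2))
```

With these assumed numbers the trial is sized at 99 cases per arm and has power of roughly 0.29 against the true effect, so a null result is the most likely outcome even though the program has some impact.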
Abt Associates |pg 24
Key Innovation: Separate Contracts
• For Program Operator
  – Otherwise, an implicit expectation of proceeding
  – E.g., ED i3, CNCS SIF, Orszag (2009)
• For Evaluator
  – And probably different contractors
  – Otherwise, an implicit expectation of proceeding
  – And contractual considerations lean towards doing so
• Current practice often runs Process Evaluation simultaneously with Impact Evaluation
Abt Associates |pg 26
Approach Seems Infeasible: Timeline
• Evaluation timelines are already long
• Inconsistent with
  – Pressing problems
  – Short-term attention to (and funding for) specific problems
• This approach would make evaluation timelines much longer
  – Additional piloting
  – Additional contracting between the steps
Abt Associates |pg 27
Approach Seems Infeasible: Willingness
• Implicit Assumption: Programs are willing to subject themselves to
  – Long and burdensome evaluation
  – Possibility (likelihood) of failure
• Plausible if
  – Program’s goal is broad-scale rollout
  – Rigorous evaluation is the only way to get there
  – Programs are confident of “passing”
• Some positive examples (Nurse-Family Partnership; Teen Pregnancy Prevention Program; Orszag, 2009)
• But they are the exception rather than the rule
Abt Associates |pg 29
When Is a Program Ready for Rigorous Impact Analysis?
When Is a Program Ready for Rigorous Impact Evaluation?
Diana Epstein (CAP/Center for American Progress) and Jacob Alex Klerman (Abt Associates)
APPAM/HSE Conference “Improving the Quality of Public Services”, Moscow, June 2011
Abt Associates |pg 30
The Need for Impact Evaluation
• Many apparently plausible programs “don’t work”
• So, require an Impact Evaluation “tollgate”
  – Usually random assignment
  – Saving money
[Diagram: Program Idea → Random Assignment Trial → Broad Rollout]
Abt Associates |pg 31
Efficacy Trial/Replication/Effectiveness Trial
• Some programs that work in one site don’t work in other sites
• So:
  – Efficacy Evaluation (small trial, ideal conditions)
  – Replicate to other (and more) sites
  – Effectiveness trial at the replicated sites (larger trial, real-world conditions)
[Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 33
This Is Hardly New
• It’s the “New Orthodoxy”
  – Coalition for Effective Policy
  – OMB (2009)
• And, we think that’s a problem
[Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 35
Random Assignment Has Lots of Problems
• Random assignment fits Winston Churchill’s description of “democracy”
  – “The worst form of government [evaluation], except for all the others that have been tried from time to time.”
• Random assignment
  – Is expensive
  – Has long timelines
  – Subjects people to programs that don’t work
• Can we do better?
  – Avoid evaluating programs with no impact
  – Improve programs so that they will have impact
[Diagram: Program Idea → Efficacy Trial → Replication → Effectiveness Trial → Broad Rollout]
Abt Associates |pg 37
Thus, Tollgate Is Implementable
1. Acquire Resources
2. Recruit Cases
3. Sustain Participation
4. Implement with Fidelity
5. Pre/Post Progress
• Falsifiable and specifiable in Logic Model
• Measured in Treatment Group only
• No expensive follow-up survey needed
• Occurs during or shortly after program activities
… with a Pilot Implementation and a Process Evaluation
Abt Associates |pg 39
Logic Model Definition
“The program logic model is defined as a picture of how your organization does its work – the theory and assumptions underlying the program. A program logic model links outcomes (both short- and long-term) with program activities/processes and the theoretical assumptions/principles of the program.”
Source: W.K. Kellogg Foundation Logic Model Guide http://www.wkkf.org/~/media/6E35F79692704AA0ADCC8C3017200208.ashx
Abt Associates |pg 40
Logic Model: Your Planned Work …
YOUR PLANNED WORK describes what resources you think you need to implement your program and what you intend to do.
  – 1. Resources include the human, financial, organizational, and community resources a program has available to direct toward doing the work. Sometimes this component is referred to as Inputs.
  – 2. Program Activities are what the program does with the resources. Activities are the processes, tools, events, technology, and actions that are an intentional part of the program implementation. These interventions are used to bring about the intended program changes or results.
YOUR INTENDED RESULTS include all of the program’s desired results (outputs, outcomes, and impact).
Abt Associates |pg 41
Logic Model: Your Intended Results …
YOUR PLANNED WORK describes what resources you think you need to implement your program and what you intend to do.
YOUR INTENDED RESULTS include all of the program’s desired results (outputs, outcomes, and impact).
  – 3. Outputs are the direct products of program activities and may include types, levels and targets of services to be delivered by the program.
  – 4. Outcomes are the specific changes in program participants’ behavior, knowledge, skills, status and level of functioning. Short-term outcomes should be attainable within 1 to 3 years, while longer-term outcomes should be achievable within a 4 to 6 year timeframe.
  – 5. Impact is the fundamental intended or unintended change occurring in organizations, communities or systems as a result of program activities within 7 to 10 years.
Abt Associates |pg 42
If you don’t know where you’re going, how are you gonna’ know when you get there?
Yogi Berra, New York Yankees Player and Manager, 1925-
Happy families are all alike; every unhappy family is unhappy in its own way.
Anna Karenina, Chapter 1, first line
Leo Tolstoy, Russian mystic & novelist (1828 – 1910)
Abt Associates |pg 43
Paper Status
• Paper is nights-and-weekends work
  – In reaction to evaluation experience, positive and negative
• Has been presented and read internally (Abt JASG) and externally (N. Campbell/ACF, Burt Barnow, Demetra Nightingale; we hope soon B. Kelly/ACF)
  – Probably going to try to present at ACF
• Your comments (on presentation, on paper, and on ideas) much appreciated
  – In particular, more and better examples
• (We hope) to a journal “soon”
Abt Associates |pg 44
Goal: Effective Programs
[Diagram: Program Idea → Efficacy Evaluation → Replication → Effectiveness Evaluation → Broad-scale Rollout]
Abt Associates |pg 45
Question: How to Get There?
[Diagram: Program Idea, Efficacy Evaluation, Replication, Effectiveness Evaluation, Broad-scale Rollout, marked with a “?”]
Abt Associates |pg 46
Random Assignment is Necessary
• Most rigorously evaluated programs “fail”
  – Even programs that pass an initial efficacy trial often “fail” the follow-on effectiveness trial
• And the more rigorous the evaluation, the more likely is “failure”
=> Evaluate before roll-out
  – Otherwise implement ineffective programs
Abt Associates |pg 47
Suggests a Random Assignment “Tollgate”
[Diagram: Program Idea → Efficacy Evaluation → Replication → Effectiveness Evaluation → Broad-scale Rollout]
Abt Associates |pg 49
Random Assignment is Necessary, but Expensive
• Most rigorously evaluated programs “fail”
  – Even programs that pass an initial efficacy trial often “fail” the follow-on effectiveness trial
• And the more rigorous the evaluation, the more likely is “failure”
=> Evaluate before roll-out
  – Otherwise implement ineffective programs
• Random assignment is expensive
  – In dollars
  – In calendar time
  – In the lives of clients/participants who waste time in programs that don’t work
=> Don’t evaluate programs that will fail. Duh!
Abt Associates |pg 51
Hold a Program to Its Own Logic Model
• It seems reasonable to require a Logic Model as a condition of funding
  – If a program can’t describe its “planned work” and its “intended results”, it is not ready for (implementation) funding
  – And that Logic Model should be detailed and falsifiable: “When?” and “How many?/What fraction?” (though this is unconventional)
• According to its own Logic Model, a program that cannot “satisfy” the explicit falsifiable goals will not succeed
=> Don’t proceed to rigorous impact evaluation (i.e., random assignment) until you have checked the Logic Model in a pilot
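One way to read “detailed and falsifiable” is that each Logic Model step carries a numeric target (“How many?/What fraction?”) and a deadline (“When?”) that pilot data can confirm or refute. The following is a minimal Python sketch under that assumption; the `Benchmark` class, the `failed_steps` function, and all numbers are hypothetical illustrations, not taken from the paper or from any of the evaluations cited.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Benchmark:
    """One falsifiable claim from a program's own Logic Model (hypothetical structure)."""
    step: str            # Logic Model step, e.g. "Recruit Cases"
    target: float        # "How many? / What fraction?"
    deadline_month: int  # "When?" (months after pilot start)


def failed_steps(benchmarks, observed):
    """Return the steps whose target was not reached by the deadline.

    `observed` maps a step name to (value_reached, month_reached).
    A step with no pilot data counts as failed: the claim was not verified.
    """
    failures = []
    for b in benchmarks:
        value, month = observed.get(b.step, (0.0, None))
        met = month is not None and month <= b.deadline_month and value >= b.target
        if not met:
            failures.append(b.step)
    return failures


# Hypothetical Logic Model targets (illustrative numbers only).
logic_model = [
    Benchmark("Acquire Resources", target=10, deadline_month=3),         # staff hired
    Benchmark("Recruit Cases", target=300, deadline_month=6),            # enrollees
    Benchmark("Sustain Participation", target=0.70, deadline_month=12),  # completion rate
]

# Hypothetical pilot results: recruitment reaches only a third of its target.
pilot = {
    "Acquire Resources": (10, 3),
    "Recruit Cases": (100, 6),
    "Sustain Participation": (0.75, 12),
}

failures = failed_steps(logic_model, pilot)
ready_for_impact_evaluation = not failures
print(failures, ready_for_impact_evaluation)
```

Everything in this check uses treatment-group data collected during the pilot; no control group or follow-up survey is involved. In this made-up case the program would not proceed to random assignment until the recruitment shortfall is addressed.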
Abt Associates |pg 52
… And this Idea Is Not Vacuous
• Only the last step, “Impact”, requires a control/comparison group
• The first three (and often the fourth) steps occur during the program, so no need for expensive follow-up surveys
i.e., a conventional Process Evaluation
Abt Associates |pg 53
Possibility of Program Refinement
• Programs that fail don’t (necessarily) need to be discarded
  – Program refinement, sometimes called “Formative Evaluation”, is an option
• Two caveats
  – What is simply “moving the goalposts”? Consider whether we are interested in the program with more modest expectations
  – When and how many times to refine? Not sure; refinement comes at the expense of developing other (potentially) beneficial programs
Abt Associates |pg 54
Recall the Earlier Program Development Model …
[Diagram: Program Idea → Efficacy Evaluation → Replication → Effectiveness Evaluation → Broad-scale Rollout]
Abt Associates |pg 55
Suggest Adding a Process Evaluation “Tollgate”