Evaluation Methods
![Page 1: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/1.jpg)
Why Randomize?
![Page 2: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/2.jpg)
Course Overview
1. What is evaluation?
2. Measuring impacts (outcomes, indicators)
3. Why randomize?
4. How to randomize?
5. Sampling and sample size
6. Threats and Analysis
7. Cost-Effectiveness Analysis
8. Project from Start to Finish
![Page 3: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/3.jpg)
What is the most convincing argument you have heard against RCTs?
A. Too expensive
B. Takes too long
C. Unethical
D. Too difficult to design/implement
E. Not externally valid (Not generalizable)
F. Can tell us whether there is impact, and the magnitude of that impact, but not why or how (it is a black box)
![Page 4: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/4.jpg)
Impact: What is it?
A. Positive
B. Negative
C. No impact
D. Don’t Know
[Figure: Primary Outcome plotted against Time, with the Intervention marked]
![Page 5: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/5.jpg)
Impact: What is it?
[Figure: Primary Outcome plotted against Time, with the Intervention marked; Impact is shown relative to the Counterfactual]
![Page 6: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/6.jpg)
Impact: What is it?
A. Positive
B. Negative
C. No impact
D. Don’t Know
[Figure: Primary Outcome plotted against Time, with the Intervention and the Counterfactual marked]
![Page 7: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/7.jpg)
Impact: What is it?
[Figure: Primary Outcome plotted against Time, with the Intervention, the Counterfactual, and the Impact marked]
![Page 8: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/8.jpg)
Impact: What is it?
[Figure: Primary Outcome plotted against Time, with the Intervention, the Counterfactual, and the Impact marked]
![Page 9: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/9.jpg)
How to Measure Impact?
Impact is defined as a comparison between:
• The outcome some time after the program has been introduced
• The outcome at that same point in time had the program not been introduced
The second of these is known as the “Counterfactual”
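In symbols, the comparison that defines impact can be written as follows. The notation is an assumption introduced here for illustration and does not come from the slides: $Y_t(1)$ is the outcome at time $t$ with the program, and $Y_t(0)$ is the outcome at the same time had the program not been introduced.

```latex
\[
\text{Impact}_t = Y_t(1) - Y_t(0)
\]
```

Only one of the two terms can ever be observed for the same participants, which is why the second term has to be estimated.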
![Page 10: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/10.jpg)
Counterfactual
The Counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program)
Problem: Counterfactual cannot be observed
Solution: We need to “mimic” or construct the counterfactual
![Page 11: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/11.jpg)
IMPACT EVALUATION METHODS
![Page 12: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/12.jpg)
Impact Evaluation Methods
1. Randomized Experiments
Also known as:
Random Assignment Studies
Randomized Field Trials
Social Experiments
Randomized Controlled Trials (RCTs)
Randomized Controlled Experiments
![Page 13: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/13.jpg)
Impact Evaluation Methods
2. Non- or Quasi-Experimental Methods
Pre-Post
Simple Difference
Differences-in-Differences
Multivariate Regression
Statistical Matching
Interrupted Time Series
Instrumental Variables
Regression Discontinuity
![Page 14: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/14.jpg)
WHAT IS A RANDOMIZED EXPERIMENT?
![Page 15: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/15.jpg)
The Basics
Start with a simple case:
• Take a sample of program applicants
• Randomly assign them to either:
  • Treatment Group: is offered treatment
  • Control Group: not allowed to receive treatment (during the evaluation period)
![Page 16: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/16.jpg)
Key Advantage
Because members of the groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors.
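A small simulation can make this advantage concrete. The sketch below is only an illustration: the baseline scores are simulated, and the distribution (mean 40, standard deviation 10) and sample size are assumptions, not data from the study. It randomly splits the sample in two and reports how far apart the groups' average baseline scores are.

```python
import random
import statistics

def baseline_gap(n=10_000, seed=0):
    """Randomly split n simulated applicants and return the gap in mean baseline scores."""
    rng = random.Random(seed)
    # Hypothetical baseline scores; the distribution is an assumption for illustration.
    scores = [rng.gauss(40, 10) for _ in range(n)]
    # Random assignment: shuffle the indices and offer treatment to the first half.
    order = list(range(n))
    rng.shuffle(order)
    treatment = [scores[i] for i in order[: n // 2]]
    control = [scores[i] for i in order[n // 2 :]]
    return statistics.mean(treatment) - statistics.mean(control)

gap = baseline_gap()
print(f"Baseline gap between treatment and control: {gap:.3f}")
```

With a sample this large the gap should be small relative to the spread of the scores, which is the sense in which the groups "do not differ systematically" before the program starts.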
![Page 17: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/17.jpg)
WHY RANDOMIZE?
![Page 18: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/18.jpg)
Example: Pratham’s Balsakhi Program
Case 2: Remedial Education in India
Evaluating the Balsakhi Program
Incorporating random assignment into the program
![Page 19: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/19.jpg)
What was the Problem?
Many children in 3rd and 4th standard were not even at the 1st standard level of competency
Class sizes were large
Social distance between teacher and many of the students was large
![Page 20: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/20.jpg)
Context and Partner
124 Municipal Schools in Vadodara (Western India)
2002 & 2003: Two academic years
~ 17,000 children
“Every child in school and learning well”
Works with most states in India reaching millions of children
![Page 21: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/21.jpg)
Proposed Solution
Hire local women (Balsakhis)
From the community
Train them to teach remedial competencies
• Basic literacy, numeracy
Identify lowest performing 3rd and 4th standard students
• Take these students out of class (2 hours/day)
• Balsakhi teaches them basic competencies
![Page 22: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/22.jpg)
Possible Outcomes
Pros
• Reduced social distance
• Reduced class size
• Teaching at appropriate level
• Improved learning for lower-performing students
• Improved learning for higher-performers
Cons
• Less qualified
• Teacher resentment
• Reduced interaction with higher-performing peers
• Increased gap in learning
• Reduced test scores for all kids
What is the Impact?
![Page 23: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/23.jpg)
J-PAL Conducts a Test at the End
Balsakhi students score an average of 51%
What can we conclude?
![Page 24: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/24.jpg)
1. Pre-post (Before vs. After)
Look at average change in test scores over the school year for the balsakhi children
Average change in the outcome of interest before and after the programme
![Page 25: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/25.jpg)
Method 1: Pre vs Post (Before vs. After)
[Figure: Average test scores of Balsakhi students. Start of program: 24.80; end of program: 51.22; difference: 26.42]

| | Average score |
|---|---|
| Average post-test score for children with a Balsakhi | 51.22 |
| Average pretest score for children with a Balsakhi | 24.80 |
| Difference | 26.42 |
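
The pre-post arithmetic can be reproduced in a couple of lines. This is just a sketch using the two averages reported on the slide; the variable names are made up for illustration.

```python
# Averages reported on the slide for children with a Balsakhi.
pre_test = 24.80
post_test = 51.22

# Pre-post estimate: average change for participants over the school year.
pre_post = round(post_test - pre_test, 2)
print(pre_post)
```

Note that the calculation uses participants only, so anything else that changed scores over the year is folded into the estimate.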
![Page 26: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/26.jpg)
Pre-Post
Limitations of the method
• No comparison group, doesn’t take time trend into account
What else can we do to estimate impact?
![Page 27: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/27.jpg)
Method 2: Simple Difference
Divide the population into two groups:
• One group enrolled in Balsakhi program (Treatment)
• One group not enrolled in Balsakhi program (Control)
Compare the test scores of these two groups at the end of the program.
Measure difference between program participants and non-participants after the program is completed
![Page 28: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/28.jpg)
Method 2: Simple Difference
[Figure: Average test scores at end of program. Not enrolled in program: 56.27; enrolled in program: 51.22; difference: -5.05]

| | Average score |
|---|---|
| Average score for children with a Balsakhi | 51.22 |
| Average score for children without a Balsakhi | 56.27 |
| Difference | -5.05 |

QUESTION: Under what conditions can the difference of -5.05 be interpreted as the impact of the Balsakhi program?
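One way to approach the question is to check whether the two groups already differed before the program. The sketch below uses the pretest averages that the deck reports for the same two groups (24.80 with a Balsakhi, 36.67 without); the variable names are illustrative.

```python
# Post-test averages used by the simple difference.
post_with, post_without = 51.22, 56.27
# Pretest averages for the same two groups, as reported elsewhere in the deck.
pre_with, pre_without = 24.80, 36.67

simple_difference = round(post_with - post_without, 2)
gap_before_program = round(pre_with - pre_without, 2)

print(simple_difference)   # gap between the groups at the end of the program
print(gap_before_program)  # gap between the groups before the program began
```

The groups were already 11.87 points apart at the pretest, so the end-of-program gap of -5.05 cannot be read as the effect of the program on its own.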
![Page 29: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/29.jpg)
Method 3: Difference-in-difference
Divide the population into two groups:
• One group enrolled in Balsakhi program (Treatment)
• One group not enrolled in Balsakhi program (Control)
Compare the change in test scores between Treatment and Control
• i.e., difference in differences in test scores
Same thing: compare difference in test scores at post-test with difference in test scores at pretest
Measure improvement (change) over time of participants relative to the improvement (change) over time of non-participants
![Page 30: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/30.jpg)
Method 3: Difference-in-difference
[Figure: Average test scores at start and end of program. Enrolled in Balsakhi program: 24.80 to 51.22. Not enrolled in Balsakhi program: 36.67 to 56.27]

| | Pretest | Post-test | Difference |
|---|---|---|---|
| Average score for children with a Balsakhi | 24.80 | 51.22 | 26.42 |
![Page 31: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/31.jpg)
Method 3: Difference-in-differences
What would have Happened without Balsakhi?
[Figure: Test scores (axis from 0 to 75) in 2002 and 2003, with the 26.42-point change marked]
![Page 32: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/32.jpg)
Method 3: Difference-in-differences
| | Pretest | Post-test | Difference |
|---|---|---|---|
| Average score for children with a Balsakhi | 24.80 | 51.22 | 26.42 |
| Average score for children without a Balsakhi | 36.67 | 56.27 | 19.60 |
![Page 33: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/33.jpg)
Method 3: Difference-in-differences
What would have Happened without Balsakhi?
[Figure: Test scores (axis from 0 to 75) in 2002 and 2003, with changes of 26.42 and 19.60 marked]
26.42 - 19.60 = 6.82 points?
![Page 34: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/34.jpg)
Method 3: Difference-in-Differences
QUESTION: Under what conditions can 6.82 be interpreted as the impact of the balsakhi program?
Issues:
• Failure of the “parallel trends assumption”, i.e. the impact of time on the two groups is not similar

| | Pretest | Post-test | Difference |
|---|---|---|---|
| Average score for children with a Balsakhi | 24.80 | 51.22 | 26.42 |
| Average score for children without a Balsakhi | 36.67 | 56.27 | 19.60 |
| Difference | | | 6.82 |
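
The difference-in-differences figure can be reached by two routes that give the same number. The sketch below works both from the four reported averages; the variable names are illustrative.

```python
# Reported average scores for each group: (pretest, post-test).
with_balsakhi = (24.80, 51.22)
without_balsakhi = (36.67, 56.27)

# Route 1: change over time for each group, then the difference of those changes.
change_with = with_balsakhi[1] - with_balsakhi[0]
change_without = without_balsakhi[1] - without_balsakhi[0]
did_by_change = round(change_with - change_without, 2)

# Route 2: gap between the groups at each test, then the difference of those gaps.
gap_post = with_balsakhi[1] - without_balsakhi[1]
gap_pre = with_balsakhi[0] - without_balsakhi[0]
did_by_gap = round(gap_post - gap_pre, 2)

print(did_by_change, did_by_gap)
```

Either way, the estimate treats the comparison group's change of 19.60 as what would have happened to participants without the program, which is exactly the parallel trends assumption.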
![Page 35: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/35.jpg)
Method 4: Regression Analysis
Divide the population into two groups:
• One group enrolled in Balsakhi program
• One group not enrolled in Balsakhi program
Compare the test scores of these two groups at the start and at the end of the program.
Control for additional variables like gender, class-size
Post-test =
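The right-hand side of the equation did not survive in this transcript. A generic specification consistent with the description above might look like the following. The variable names and the inclusion of the pretest score are assumptions made for illustration, not the authors' stated model.

```latex
\[
\text{Post-test}_i = \beta_0 + \beta_1\,\text{Balsakhi}_i + \beta_2\,\text{Pretest}_i
  + \beta_3\,\text{Gender}_i + \beta_4\,\text{ClassSize}_i + \varepsilon_i
\]
```

In a specification of this kind, $\beta_1$ would be the coefficient read as the estimated program effect, conditional on the control variables.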
![Page 36: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/36.jpg)
Method 4: Regression Analysis
[Figure: Test Score (at Post Test) plotted against Income, with series post_tot_noB and post_tot_B and a linear fit for each; the value 1.92 is marked]
QUESTION: Under what conditions can the coefficient of 1.92 be interpreted as the impact of the Balsakhi program?
![Page 37: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/37.jpg)
Impact of Balsakhi Program
[Figure: Chart of the four impact estimates: 26.42, -5.05, 6.82, 1.92]
Method Impact Estimate
(1) Pre-post 26.42*
(2) Simple Difference -5.05*
(3) Difference-in-Difference 6.82*
(4) Regression with controls 1.92
* Significant at 5% level
![Page 38: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/38.jpg)
Constructing the Counterfactual
Counterfactual is often constructed by selecting a group not affected by the program
Non-randomized:
• Argue that a certain excluded group mimics the counterfactual.
Randomized:
• Use random assignment of the program to create a control group which mimics the counterfactual.
![Page 39: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/39.jpg)
Randomised Evaluations
Individuals, villages, or districts are randomly selected to receive the treatment, while the others serve as a comparison
[Diagram: Treatment Group and Comparison Group, illustrated with Village 1 and Village 2 and an equals sign between them]
Groups are Statistically Identical before the Program
Any Difference at the Endline can be Attributed to the Program
Two groups continue to be identical, except for treatment. Later, compare outcomes (health, test scores) between the two groups. Any differences between the groups can be attributed to the program.
![Page 40: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/40.jpg)
Basic Set-up of a Randomized Evaluation
[Diagram: Total Population, Target Population, Evaluation Sample (the rest is not in the evaluation), Random Assignment, Treatment Group and Control Group]
![Page 41: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/41.jpg)
Random Sampling and Random Assignment
Randomly sample from area of interest
![Page 42: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/42.jpg)
Random Sampling and Random Assignment
Randomly sample from area of interest
Randomly assign to treatment and control
Randomly sample from both treatment and control
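The three random steps on this slide can be sketched as below. Everything concrete here is an assumption for illustration: the population of 1,000 unit IDs, the sample sizes, and the reading of the last step as choosing whom to survey.

```python
import random

rng = random.Random(42)  # fixed seed so the sketch is reproducible

# Hypothetical area of interest: 1,000 unit IDs (an assumption for illustration).
population = list(range(1000))

# Step 1: randomly sample the evaluation units from the area of interest.
evaluation_sample = rng.sample(population, 200)

# Step 2: randomly assign the sampled units to treatment and control.
shuffled = evaluation_sample[:]
rng.shuffle(shuffled)
treatment, control = shuffled[:100], shuffled[100:]

# Step 3: randomly sample units from both groups, for example to decide whom to survey.
surveyed_treatment = rng.sample(treatment, 30)
surveyed_control = rng.sample(control, 30)

print(len(treatment), len(control), len(surveyed_treatment), len(surveyed_control))
```

Random sampling (steps 1 and 3) is about who is studied; random assignment (step 2) is what makes the treatment and control groups comparable.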
![Page 43: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/43.jpg)
Randomization Design
Population = all schools in case villages
Target population: weakest students in all of these schools
Stratify on three criteria:
• Pre-test scores
• Gender
• Language
Give 50% of them the Balsakhi program
![Page 44: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/44.jpg)
Impact of Balsakhi - Summary
Method Impact Estimate
(1) Pre-post 26.42*
(2) Simple Difference -5.05*
(3) Difference-in-Difference 6.82*
(4) Regression 1.92
*: Statistically significant at the 5% level
![Page 45: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/45.jpg)
Which of these methods do you think is closest to the truth?
A. Pre-post
B. Simple difference
C. Difference-in-Difference
D. Regression
E. Don’t know
Method Impact Estimate
(1) Pre-post 26.42*
(2) Simple Difference -5.05*
(3) Difference-in-Difference 6.82*
(4) Regression 1.92
*: Statistically significant at the 5% level
![Page 46: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/46.jpg)
Impact of Balsakhi - Summary
Method Impact Estimate
(1) Pre-post 26.42*
(2) Simple Difference -5.05*
(3) Difference-in-Difference 6.82*
(4) Regression 1.92
(5) Randomized Experiment 5.87*
*: Statistically significant at the 5% level
![Page 47: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/47.jpg)
Example #2 - Pratham’s Read India Program
*: Statistically significant at the 5% level
Method Impact
(1) Pre-Post 0.60*
(2) Simple Difference -0.90*
(3) Difference-in-Differences 0.31*
(4) Regression 0.06
![Page 48: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/48.jpg)
Which of these methods do you think is closest to the truth?
A. Pre-post
B. Simple difference
C. Difference-in-Difference
D. Regression
E. Don’t know
*: Statistically significant at the 5% level
Method Impact
(1) Pre-Post 0.60*
(2) Simple Difference -0.90*
(3) Difference-in-Differences 0.31*
(4) Regression 0.06
![Page 49: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/49.jpg)
Example #2 – Pratham’s Read India Program
Method Impact
(1) Pre-Post 0.60*
(2) Simple Difference -0.90*
(3) Difference-in-Differences 0.31*
(4) Regression 0.06
(5) Randomized Experiment 0.88*
*: Statistically significant at the 5% level
![Page 50: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/50.jpg)
Summary of Methods

| Method | Comparison | Works only if… |
|---|---|---|
| Pre-Post | Program participants before program | Nothing else was affecting outcome |
| Simple Difference | Individuals who did not participate (data collected after program) | Non-participants are exactly equal to participants |
| Differences-in-Difference | Same as above + data collected before and after | If two groups have exactly the same trajectory over time |
| Regression | Same as above + additional “explanatory” variables | Omitted variables do not affect results |
| Randomized Evaluation | Participants randomly assigned to control group | The two groups are statistically identical on observed and unobserved characteristics |
![Page 51: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/51.jpg)
Conditions Required

| Method | Comparison Group | Works if… |
|---|---|---|
| Pre-Post | Program participants before program | The program was the only factor influencing any changes in the measured outcome over time |
| Simple Difference | Individuals who did not participate (data collected after program) | Non-participants are identical to participants except for program participation, and were equally likely to enter program before it started. |
| Differences in Differences | Same as above, plus: data collected before and after | If the program didn’t exist, the two groups would have had identical trajectories over this period. |
| Multivariate Regression | Same as above, plus: also have additional “explanatory” variables | Omitted (because not measured or not observed) variables do not bias the results because they are either uncorrelated with the outcome, or do not differ between participants and non-participants |
| Propensity Score Matching | Non-participants who have mix of characteristics which predict that they would be as likely to participate as participants | Same as above |
| Randomized Evaluation | Participants randomly assigned to control group | Randomization “works”: the two groups are statistically identical on observed and unobserved characteristics |
![Page 52: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/52.jpg)
Other Methods
There are more sophisticated non-experimental methods to estimate program impacts:
• Regression
• Matching
• Instrumental Variables
• Regression Discontinuity
These methods rely on being able to “mimic” the counterfactual under certain assumptions
Problem: Assumptions are not testable
![Page 53: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/53.jpg)
Conclusions: Why Randomize?
There are many ways to estimate a program’s impact
This course argues in favor of one: randomized experiments
• Conceptual argument: If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program
• Empirical argument: Different methods can generate different impact estimates
![Page 54: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/54.jpg)
Key Steps in Conducting an Experiment
1. Design the study carefully
2. Randomly assign people to treatment or control
3. Collect baseline data
4. Verify that assignment looks random
5. Monitor process so that integrity of experiment is not compromised
6. Collect follow-up data for both the treatment and control groups
7. Estimate program impacts by comparing mean outcomes of treatment group vs. mean outcomes of control group.
8. Assess whether program impacts are statistically significant and practically significant.
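Steps 7 and 8 can be sketched with the standard library alone. The scores below are made up for illustration, and the unequal-variance (Welch) standard error is one common choice assumed here, not necessarily the one used in the course.

```python
import math
import statistics

def difference_in_means(treatment, control):
    """Return (estimate, standard_error, t_statistic) for two independent samples."""
    estimate = statistics.mean(treatment) - statistics.mean(control)
    # Unequal-variance (Welch) standard error of the difference in means.
    se = math.sqrt(
        statistics.variance(treatment) / len(treatment)
        + statistics.variance(control) / len(control)
    )
    return estimate, se, estimate / se

# Made-up follow-up scores, for illustration only.
treatment_scores = [52, 58, 61, 49, 55, 60, 57, 54]
control_scores = [50, 47, 53, 49, 51, 46, 52, 48]

estimate, se, t_stat = difference_in_means(treatment_scores, control_scores)
print(f"estimate={estimate:.2f}, se={se:.2f}, t={t_stat:.2f}")
```

In large samples, an absolute t-statistic above roughly 2 is the usual rough signal of significance at the 5% level; with samples as small as this one, a t-distribution critical value would be needed. Practical significance is a separate judgement about whether an effect of that size matters for the decision at hand.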
![Page 55: Evaluation Methods](https://reader038.vdocuments.us/reader038/viewer/2022110120/55844b6fd8b42a6a6d8b49aa/html5/thumbnails/55.jpg)
THANK YOU