TRANSCRIPT
#ieGovern Impact Evaluation Workshop
Istanbul, Turkey, January 27-30, 2015
Measuring Impact
1. Non-experimental methods
2. Experiments

Vincenzo Di Maro, Development Impact Evaluation
How can we evaluate this? Regression Discontinuity Design
Case: a pay reward offered on the basis of an exam score
• A new scheme under which all tax inspectors have to take a compulsory written exam.
• Grades for this exam range from 0 to 100, where 0 is the worst outcome and 100 is the best.
• At the end of the exam, all tax inspectors who achieve a minimum score (of 50) will be offered entry into the pay reward scheme.
• Idea: compare tax inspectors with scores a bit below 50 (and so unable to choose the reward scheme) with inspectors with scores a bit above 50 (and so eligible for the scheme).
[Figure: Regression Discontinuity — revenues plotted against exam score (0 to 100). Inspectors scoring close to 50 (roughly between 40 and 60) have very similar characteristics; the discontinuity in revenues appears at the cutoff of 50.]
Regression discontinuity

Method                      Treated    Control    Difference    %
Regression Discontinuity    $80,215    $69,753    $10,463       15%
Problem: the impact is valid only for subjects close to the cut-off point, that is, only for tax inspectors with an exam score close to 50. Is this the group you want to know about?
• A powerful method if you have:
  – A continuous eligibility index
  – A clearly defined eligibility cut-off
• It gives a causal impact, but with a local interpretation
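The comparison just around the cutoff can be sketched in a few lines. This is a minimal simulation with hypothetical data (the seed, sample size, true effect of 10,000, and bandwidth of 2 points are all assumptions, not figures from the case study): revenues rise smoothly with the exam score, plus a jump for inspectors at or above 50.

```python
import random

random.seed(0)

# Hypothetical simulated data: exam scores and later revenue collections for
# 5,000 tax inspectors, with an assumed true reward effect of +10,000 for
# inspectors scoring at or above the cutoff of 50.
scores = [random.uniform(0, 100) for _ in range(5000)]
revenues = [
    60000 + 200 * s + (10000 if s >= 50 else 0) + random.gauss(0, 2000)
    for s in scores
]

def rd_estimate(scores, revenues, cutoff=50.0, bandwidth=2.0):
    """Naive RD: mean outcome just above the cutoff minus mean just below."""
    above = [y for s, y in zip(scores, revenues) if cutoff <= s < cutoff + bandwidth]
    below = [y for s, y in zip(scores, revenues) if cutoff - bandwidth <= s < cutoff]
    return sum(above) / len(above) - sum(below) / len(below)

impact = rd_estimate(scores, revenues)
print(round(impact))  # close to the assumed true effect of 10,000
```

Note the trade-off the bandwidth creates: a narrower window makes the two groups more comparable but leaves fewer observations, which is one reason RD estimates are local to the cutoff.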
Summary of impacts so far

Method                             Treated             Control/Comparison    Difference    %
Participants - Non-participants    $93,827             $70,800               $23,027       33%
Before - After                     $93,827             $72,175               $21,653       30%
Difference-in-differences 1        (P1-NP1) $23,027    (P0-NP0) $3,347       $19,590       29%
Difference-in-differences 2        (P1-P0) $21,652     (NP1-NP0) $2,062      $19,590       29%
Regression Discontinuity (RD)      $80,215             $69,753               $10,463       15%
• Weak methods can lead to very misleading results
• The RD (causal) impact is only around half of the impact estimated with the other, weaker methods
• An impact evaluation gives valid results only if you use rigorous methods
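The difference-in-differences row can be verified with simple arithmetic. This sketch uses the rounded figures from the summary table (the non-participants' baseline of 68,738 is backed out from NP1 = 70,800 and the NP change of 2,062):

```python
# Difference-in-differences: the change over time for participants minus the
# change for non-participants, using the workshop's rounded figures.
p_after, p_before = 93827, 72175      # participants: P1, P0
np_after, np_before = 70800, 68738    # non-participants: NP1, NP0

change_p = p_after - p_before         # 21,652
change_np = np_after - np_before      # 2,062
did = change_p - change_np
print(did)  # 19590, matching the table
```

Subtracting the non-participants' change strips out the time trend that a simple before-after comparison would wrongly attribute to the program.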
Experiments
• Other names: Randomized Control Trials (RCTs) or randomization
• Assignment to Treatment and Control is based on chance; it is random (like flipping a coin)
• Treatment and Control groups will have, on average, the same characteristics (they are balanced) at baseline
• The only difference is that the treatment group receives the intervention and the control group does not
Experiments: plan
• Design of experiments
• How to implement RCTs
• One treatment and many treatments
• Encouragement design
Random assignment
1. Population
2. Evaluation sample (drawing it from the population governs external validity)
3. Randomize treatment into Treatment and Comparison groups (this governs internal validity)
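The three steps can be sketched directly. This is a minimal illustration using the case-study counts (482 inspectors, 218 treated); the inspector labels and seed are assumptions:

```python
import random

random.seed(1)

# Step 1: population — a hypothetical roster of the 482 tax inspectors.
population = [f"inspector_{i}" for i in range(1, 483)]

# Step 2: evaluation sample — here, everyone is included.
sample = list(population)

# Step 3: randomize treatment, like flipping a coin for each unit.
random.shuffle(sample)
treatment = sample[:218]   # 218 treated, as in the case study
control = sample[218:]     # the remaining 264 are controls

print(len(treatment), len(control))  # 218 264
```

Shuffling and slicing guarantees the intended group sizes exactly, which independent coin flips would only hit on average.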
Unit of randomization
Choose according to the type of program:
o Individual/Household
o School/Health clinic/Catchment area/Government agency
o Block/Village/Community
o Ward/District/Region
Keep in mind:
o You need a "sufficiently large" number of units to detect the minimum desired impact: power
o Spillovers/contamination
o Operational and survey costs
As a rule of thumb, randomize at the smallest viable unit of implementation.
Implementation
Pure randomization might not be feasible because some eligible subjects would be excluded from benefits. Usually, though, there are constraints within project implementation that still allow randomization:
• Budget constraints: lottery
  – There are not enough treatment slots for all eligible subjects
  – A lottery is a fair, transparent, and ethical way to assign benefits
• Limited capacity: randomized phase-in
  – It is not possible to treat all the units in the first phase
  – Randomize which group of units serves as control (they will be treated at a later stage, say after 1 year)
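A randomized phase-in can be sketched in a few lines. The unit names, counts, and seed here are hypothetical:

```python
import random

random.seed(4)

# Randomized phase-in: every unit is eventually treated, but the order is
# random. Units drawn for phase 2 serve as the control group while the
# phase-1 units are being evaluated (say, during the first year).
units = [f"village_{i}" for i in range(1, 21)]  # hypothetical 20 villages
random.shuffle(units)
phase1, phase2 = units[:10], units[10:]

print(len(phase1), len(phase2))  # 10 10
```

Because no unit is permanently excluded, the design sidesteps the fairness objection while still providing a valid control group during the first phase.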
Multiple Treatments
• Different levels of benefits
  – Randomly assign people to different intensities of the treatment (e.g., a 20% vs. a 30% reward)
• No evidence on which alternative is best: test variations in treatment
  – Randomly assign subjects to different interventions
  – Compare one to another
  – Assess complementarities
Multiple Treatments: 2x2 design
• Assess complementarities
• Overall reward effect

                             Intervention 2: Control    Intervention 2: Treatment
Intervention 1: Control      X (no reward)              Social recognition reward
Intervention 1: Treatment    Monetary reward            Both rewards
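Assignment to the four cells of a 2x2 design can be sketched as follows. The roster of 400 inspectors, the arm labels, and the seed are hypothetical:

```python
import itertools
import random

random.seed(2)

# 2x2 cross-randomization: crossing a monetary reward (yes/no) with a social
# recognition reward (yes/no) yields four cells, including a pure control.
arms = list(itertools.product(["no_money", "money"],
                              ["no_recognition", "recognition"]))

inspectors = [f"inspector_{i}" for i in range(400)]
random.shuffle(inspectors)

# Slice the shuffled roster into four equal cells of 100.
cells = {arm: inspectors[k * 100:(k + 1) * 100] for k, arm in enumerate(arms)}

for arm, group in cells.items():
    print(arm, len(group))
```

Comparing the "both rewards" cell against the two single-reward cells and the control is what lets you assess whether the rewards are complements.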
Encouragement design
• It is not always possible to randomly assign units to a control group:
  – Political and ethical reasons
  – Participation is voluntary and everyone is eligible
• Randomized promotion/encouragement:
  o The program is available to everyone
  o But additional promotion, encouragement, or incentives are provided to a random sub-sample:
    – Additional information
    – Incentives (a small gift or prize)
    – Transport (bus fare)
Encouragement design
Randomize an incentive to participate (e.g., small gifts):
• Encouraged group: high participation (e.g., 80%)
• Not encouraged group: low participation (e.g., 10%)
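The arithmetic behind an encouragement design can be sketched with a Wald (instrumental-variables) calculation. The participation rates come from the slide's example; the outcome means are assumed for illustration only:

```python
# Wald estimator sketch for an encouragement design. Encouragement is random,
# so the outcome gap between encouraged and non-encouraged groups, scaled by
# the gap in take-up, recovers the effect of participation itself.
take_up_encouraged = 0.80   # 80% participation when encouraged (slide example)
take_up_not = 0.10          # 10% participation otherwise (slide example)

mean_outcome_encouraged = 74000.0   # assumed average revenue, encouraged group
mean_outcome_not = 70500.0          # assumed average revenue, other group

itt = mean_outcome_encouraged - mean_outcome_not    # intention-to-treat: 3,500
effect = itt / (take_up_encouraged - take_up_not)   # 3,500 / 0.7
print(round(effect))  # 5000
```

The key requirement is that the encouragement itself affects outcomes only through participation; the wider the gap in take-up it creates, the more precise the estimate.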
How can we evaluate this? Randomized Control Trials
Case: the pay scheme is offered to a subset of inspectors selected randomly.
• Out of the 482 inspectors, 218 were randomly assigned to the treatment group and the rest (264) to the control group
• There is no pre-treatment difference between control and treatment, because the only thing that explains assignment to one of the groups is chance
• Comparing the treatment and control groups gives a causal impact: the only difference is that one group receives the treatment and the other does not
Treatment and control group balance
• All key variables are balanced at baseline
• That is: the difference between control and treatment is zero before the intervention starts
• This happens because of randomization
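A balance check can be simulated directly. The baseline revenue distribution and seed here are assumptions; only the group sizes (218 and 264) come from the case study:

```python
import random
import statistics

random.seed(3)

# Simulated baseline revenues for 482 inspectors, followed by random
# assignment into 218 treated and 264 controls, as in the case study.
baseline = [random.gauss(68000, 5000) for _ in range(482)]
order = list(range(482))
random.shuffle(order)
treated, controls = order[:218], order[218:]

mean_t = statistics.mean(baseline[i] for i in treated)
mean_c = statistics.mean(baseline[i] for i in controls)

# Because assignment is random, the baseline gap is pure sampling noise:
# small relative to the revenue scale, and shrinking as the sample grows.
print(round(mean_t - mean_c))
```

In practice this check is run for every key baseline variable; differences should be statistically indistinguishable from zero before the intervention starts.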
RCT causal impact

Method    Treated    Control    Difference    %
RCT       $75,611    $68,738    $6,874        10%

Problems:
- Implementation of experiments
- External validity

• The impact can be attributed to the intervention
• The RCT serves as the benchmark for assessing the other methods
Summary of impacts so far

Method                             Treated             Control/Comparison    Difference    %
Participants - Non-participants    $93,827             $70,800               $23,027       33%
Before - After                     $93,827             $72,175               $21,653       30%
Difference-in-differences 1        (P1-NP1) $23,027    (P0-NP0) $3,347       $19,590       29%
Difference-in-differences 2        (P1-P0) $21,652     (NP1-NP0) $2,062      $19,590       29%
Regression Discontinuity (RD)      $80,215             $69,753               $10,463       15%
RCT                                $75,611             $68,738               $6,874        10%

- Different methods give quite different results
- The RCT is the benchmark
- Other methods can be vastly wrong
- RD comes close to the RCT
Testing other schemes

Method                     Treatment    Control    Difference    %
RCT "Revenue Incentive"    $75,611      $68,738    $6,874        10%
RCT "Revenue Plus"         $72,174      $68,738    $3,437        5%
RCT "Flexible Bonus"       $69,425      $68,738    $687          1%

Three versions of the performance pay incentive were tested:
(1) The "Revenue" scheme provided incentives based solely on revenue collected above a benchmark predicted from historical data.
(2) Under "Revenue Plus", rewards were adjusted according to whether teams ranked in the top, middle, or bottom third of an independent survey of taxpayers.
(3) Under "Flexible Bonus", rewards were based both on pre-specified criteria set by the tax department and on subjective period-end adjustments reflecting managers' overall assessment of the tax units.
Experiments
If experiments are not possible, choose methods that are still valid:
• Before-After: X (not valid)
• Participants - Non-participants: X (not valid)
• Difference-in-Differences
• Regression Discontinuity (RD)
• Multiple treatments