TRANSCRIPT
#ieGovern Impact Evaluation Workshop
Istanbul, Turkey, January 27-30, 2015
Measuring Impact
1. Non-experimental methods
2. Experiments

Vincenzo Di Maro, Development Impact Evaluation
How can we evaluate this? Regression Discontinuity Design
Case: a pay reward offered on the basis of an exam score
• A new scheme under which all tax inspectors have to take a compulsory written exam.
• Grades for this exam range from 0 to 100, where 0 is the worst outcome and 100 is the best.
• At the end of the exam, all tax inspectors who achieve a minimum score (of 50) will be offered entry into the pay reward scheme.
• Idea: compare tax inspectors with scores a bit below 50 (and so unable to choose the reward scheme) with inspectors with scores a bit above 50 (and so eligible for the scheme).
[Figure: Regression Discontinuity — revenues plotted against exam score (0 to 100). Inspectors scoring close to 50 (roughly between 40 and 60) have very similar characteristics; the discontinuity in revenues appears at the cutoff of 50.]
Regression discontinuity

Method                      Treated    Control    Difference    %
Regression Discontinuity    $80,215    $69,753    $10,463       15%
Problem: the impact is valid only for subjects close to the cut-off point, that is, only for tax inspectors with an exam score close to 50. Is this the group you want to know about?
• A powerful method if you have:
  – A continuous eligibility index
  – A clearly defined eligibility cut-off
• It gives a causal impact, but with a local interpretation
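The comparison just around the cutoff can be sketched in a few lines. This is a minimal simulation with hypothetical data (the seed, sample size, true effect of 10,000, and bandwidth of 2 points are all assumptions, not figures from the case study): revenues rise smoothly with the exam score, plus a jump for inspectors at or above 50.

```python
import random

random.seed(0)

# Hypothetical simulated data: exam scores and later revenue collections for
# 5,000 tax inspectors, with an assumed true reward effect of +10,000 for
# inspectors scoring at or above the cutoff of 50.
scores = [random.uniform(0, 100) for _ in range(5000)]
revenues = [
    60000 + 200 * s + (10000 if s >= 50 else 0) + random.gauss(0, 2000)
    for s in scores
]

def rd_estimate(scores, revenues, cutoff=50.0, bandwidth=2.0):
    """Naive RD: mean outcome just above the cutoff minus mean just below."""
    above = [y for s, y in zip(scores, revenues) if cutoff <= s < cutoff + bandwidth]
    below = [y for s, y in zip(scores, revenues) if cutoff - bandwidth <= s < cutoff]
    return sum(above) / len(above) - sum(below) / len(below)

impact = rd_estimate(scores, revenues)
print(round(impact))  # close to the assumed true effect of 10,000
```

Note the trade-off the bandwidth creates: a narrower window makes the two groups more comparable but leaves fewer observations, which is one reason RD estimates are local to the cutoff.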
Summary of impacts so far

Method                             Treated             Control/Comparison    Difference    %
Participants - Non-participants    $93,827             $70,800               $23,027       33%
Before - After                     $93,827             $72,175               $21,653       30%
Difference-in-differences 1        (P1-NP1) $23,027    (P0-NP0) $3,347       $19,590       29%
Difference-in-differences 2        (P1-P0) $21,652     (NP1-NP0) $2,062      $19,590       29%
Regression Discontinuity (RD)      $80,215             $69,753               $10,463       15%
• Weak methods can lead to very misleading results
• The RD (causal) impact is only around half of the impact estimated with the other, weaker methods
• An impact evaluation gives valid results only if you use rigorous methods
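The difference-in-differences row can be verified with simple arithmetic. This sketch uses the rounded figures from the summary table (the non-participants' baseline of 68,738 is backed out from NP1 = 70,800 and the NP change of 2,062):

```python
# Difference-in-differences: the change over time for participants minus the
# change for non-participants, using the workshop's rounded figures.
p_after, p_before = 93827, 72175      # participants: P1, P0
np_after, np_before = 70800, 68738    # non-participants: NP1, NP0

change_p = p_after - p_before         # 21,652
change_np = np_after - np_before      # 2,062
did = change_p - change_np
print(did)  # 19590, matching the table
```

Subtracting the non-participants' change strips out the time trend that a simple before-after comparison would wrongly attribute to the program.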
Experiments
• Other names: Randomized Control Trials (RCTs) or randomization
• Assignment to Treatment and Control is based on chance; it is random (like flipping a coin)
• Treatment and Control groups will have, on average, the same characteristics (they are balanced) at baseline
• The only difference is that the treatment group receives the intervention and the control group does not
Experiments: plan
• Design of experiments
• How to implement RCTs
• One treatment and many treatments
• Encouragement design
Random assignment
1. Population
2. Evaluation sample (drawing it from the population governs external validity)
3. Randomize treatment into Treatment and Comparison groups (this governs internal validity)
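The three steps can be sketched directly. This is a minimal illustration using the case-study counts (482 inspectors, 218 treated); the inspector labels and seed are assumptions:

```python
import random

random.seed(1)

# Step 1: population — a hypothetical roster of the 482 tax inspectors.
population = [f"inspector_{i}" for i in range(1, 483)]

# Step 2: evaluation sample — here, everyone is included.
sample = list(population)

# Step 3: randomize treatment, like flipping a coin for each unit.
random.shuffle(sample)
treatment = sample[:218]   # 218 treated, as in the case study
control = sample[218:]     # the remaining 264 are controls

print(len(treatment), len(control))  # 218 264
```

Shuffling and slicing guarantees the intended group sizes exactly, which independent coin flips would only hit on average.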
Unit of randomization
Choose according to the type of program:
o Individual/Household
o School/Health clinic/Catchment area/Government agency
o Block/Village/Community
o Ward/District/Region
Keep in mind:
o You need a "sufficiently large" number of units to detect the minimum desired impact: power
o Spillovers/contamination
o Operational and survey costs
As a rule of thumb, randomize at the smallest viable unit of implementation.
Implementation
Pure randomization might not be feasible because some eligible subjects would be excluded from benefits. Usually, though, there are constraints within project implementation that still allow randomization:
• Budget constraints: lottery
  – There are not enough treatment slots for all eligible subjects
  – A lottery is a fair, transparent, and ethical way to assign benefits
• Limited capacity: randomized phase-in
  – It is not possible to treat all the units in the first phase
  – Randomize which group of units serves as control (they will be treated at a later stage, say after 1 year)
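A randomized phase-in can be sketched in a few lines. The unit names, counts, and seed here are hypothetical:

```python
import random

random.seed(4)

# Randomized phase-in: every unit is eventually treated, but the order is
# random. Units drawn for phase 2 serve as the control group while the
# phase-1 units are being evaluated (say, during the first year).
units = [f"village_{i}" for i in range(1, 21)]  # hypothetical 20 villages
random.shuffle(units)
phase1, phase2 = units[:10], units[10:]

print(len(phase1), len(phase2))  # 10 10
```

Because no unit is permanently excluded, the design sidesteps the fairness objection while still providing a valid control group during the first phase.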
Multiple Treatments
• Different levels of benefits
  – Randomly assign people to different intensities of the treatment (e.g., a 20% vs. a 30% reward)
• No evidence on which alternative is best: test variations in treatment
  – Randomly assign subjects to different interventions
  – Compare one to another
  – Assess complementarities
Multiple Treatments: 2x2 design
• Assess complementarities
• Overall reward effect

                             Intervention 2: Control    Intervention 2: Treatment
Intervention 1: Control      X (no reward)              Social recognition reward
Intervention 1: Treatment    Monetary reward            Both rewards
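Assignment to the four cells of a 2x2 design can be sketched as follows. The roster of 400 inspectors, the arm labels, and the seed are hypothetical:

```python
import itertools
import random

random.seed(2)

# 2x2 cross-randomization: crossing a monetary reward (yes/no) with a social
# recognition reward (yes/no) yields four cells, including a pure control.
arms = list(itertools.product(["no_money", "money"],
                              ["no_recognition", "recognition"]))

inspectors = [f"inspector_{i}" for i in range(400)]
random.shuffle(inspectors)

# Slice the shuffled roster into four equal cells of 100.
cells = {arm: inspectors[k * 100:(k + 1) * 100] for k, arm in enumerate(arms)}

for arm, group in cells.items():
    print(arm, len(group))
```

Comparing the "both rewards" cell against the two single-reward cells and the control is what lets you assess whether the rewards are complements.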
Encouragement design
• It is not always possible to randomly assign units to a control group:
  – Political and ethical reasons
  – Participation is voluntary and everyone is eligible
• Randomized promotion/encouragement:
  o The program is available to everyone
  o But additional promotion, encouragement, or incentives are provided to a random sub-sample:
    – Additional information
    – Incentives (a small gift or prize)
    – Transport (bus fare)
Encouragement design
Randomize an incentive to participate (e.g., small gifts):
• Encouraged group: high participation (e.g., 80%)
• Not encouraged group: low participation (e.g., 10%)
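The arithmetic behind an encouragement design can be sketched with a Wald (instrumental-variables) calculation. The participation rates come from the slide's example; the outcome means are assumed for illustration only:

```python
# Wald estimator sketch for an encouragement design. Encouragement is random,
# so the outcome gap between encouraged and non-encouraged groups, scaled by
# the gap in take-up, recovers the effect of participation itself.
take_up_encouraged = 0.80   # 80% participation when encouraged (slide example)
take_up_not = 0.10          # 10% participation otherwise (slide example)

mean_outcome_encouraged = 74000.0   # assumed average revenue, encouraged group
mean_outcome_not = 70500.0          # assumed average revenue, other group

itt = mean_outcome_encouraged - mean_outcome_not    # intention-to-treat: 3,500
effect = itt / (take_up_encouraged - take_up_not)   # 3,500 / 0.7
print(round(effect))  # 5000
```

The key requirement is that the encouragement itself affects outcomes only through participation; the wider the gap in take-up it creates, the more precise the estimate.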
How can we evaluate this? Randomized Control Trials
Case: the pay scheme is offered to a subset of inspectors selected randomly.
• Out of the 482 inspectors, 218 were randomly assigned to the treatment group and the rest (264) to the control group
• There is no pre-treatment difference between control and treatment, because the only thing that explains assignment to one of the groups is chance
• Comparing the treatment and control groups gives a causal impact: the only difference is that one group receives the treatment and the other does not
Treatment and control group balance
• All key variables are balanced at baseline
• That is: the difference between control and treatment is zero before the intervention starts
• This happens because of randomization
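A balance check can be simulated directly. The baseline revenue distribution and seed here are assumptions; only the group sizes (218 and 264) come from the case study:

```python
import random
import statistics

random.seed(3)

# Simulated baseline revenues for 482 inspectors, followed by random
# assignment into 218 treated and 264 controls, as in the case study.
baseline = [random.gauss(68000, 5000) for _ in range(482)]
order = list(range(482))
random.shuffle(order)
treated, controls = order[:218], order[218:]

mean_t = statistics.mean(baseline[i] for i in treated)
mean_c = statistics.mean(baseline[i] for i in controls)

# Because assignment is random, the baseline gap is pure sampling noise:
# small relative to the revenue scale, and shrinking as the sample grows.
print(round(mean_t - mean_c))
```

In practice this check is run for every key baseline variable; differences should be statistically indistinguishable from zero before the intervention starts.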
RCT causal impact

Method    Treated    Control    Difference    %
RCT       $75,611    $68,738    $6,874        10%

Problems:
- Implementation of experiments
- External validity

• The impact can be attributed to the intervention
• The RCT serves as the benchmark for assessing the other methods
Summary of impacts so far

Method                             Treated             Control/Comparison    Difference    %
Participants - Non-participants    $93,827             $70,800               $23,027       33%
Before - After                     $93,827             $72,175               $21,653       30%
Difference-in-differences 1        (P1-NP1) $23,027    (P0-NP0) $3,347       $19,590       29%
Difference-in-differences 2        (P1-P0) $21,652     (NP1-NP0) $2,062      $19,590       29%
Regression Discontinuity (RD)      $80,215             $69,753               $10,463       15%
RCT                                $75,611             $68,738               $6,874        10%

- Different methods give quite different results
- The RCT is the benchmark
- Other methods can be vastly wrong
- RD comes close to the RCT
Testing other schemes

Method                     Treatment    Control    Difference    %
RCT "Revenue Incentive"    $75,611      $68,738    $6,874        10%
RCT "Revenue Plus"         $72,174      $68,738    $3,437        5%
RCT "Flexible Bonus"       $69,425      $68,738    $687          1%

Three versions of the performance pay incentive were tested:
(1) The "Revenue" scheme provided incentives based solely on revenue collected above a benchmark predicted from historical data.
(2) Under "Revenue Plus", rewards were adjusted according to whether teams ranked in the top, middle, or bottom third of an independent survey of taxpayers.
(3) Under "Flexible Bonus", rewards were based both on pre-specified criteria set by the tax department and on subjective period-end adjustments reflecting managers' overall assessment of the tax units.
Experiments
If experiments are not possible, choose methods that are still valid:
• Before-After: X (not valid)
• Participants - Non-participants: X (not valid)
• Difference-in-Differences
• Regression Discontinuity (RD)
• Multiple treatments