TRANSCRIPT
11th of October, 2010
University of Cape Town
Kamilla Gumede
Martin Abel
Introduction to Randomised Evaluations
• New research programme within SALDRU
• Regional office of a global network
• Specialise in RANDOMISED IMPACT EVALUATIONS
• Do 3 things:
– Run evaluations
– Disseminate results – a public good
– Train others to run evaluations
J-PAL Africa
Fight poverty
• I. Why do we evaluate social programmes?
• II. What is an IMPACT?
• III. Impact evaluation methodologies
• IV. How to run an RCT:
– Advantages of randomised evaluations
– Theory of Change
– Randomisation Design
– External vs. Internal Validity
Overview
• Surprisingly little hard evidence on what works
• Need #1: With better evidence, we can do more with a given budget.
• Need #2: If people knew money was going to programs that worked, it could help increase the pot for anti-poverty programs.
• Instead of asking “do aid/development programs work?”, we should be asking:
– Which work best, why, and when?
– How can we scale up what works?
Evidence-based policy making
Example Aid: Optimists
“I have identified the specific investments that are needed [to end poverty]; found ways to plan and implement them; [and] shown that they can be affordable.”
Jeffrey Sachs End of Poverty
“After $2.3 trillion over 5 decades, why are the desperate needs of the world's poor still so tragically unmet?
Isn't it finally time for an end to the impunity of foreign aid?”
Bill Easterly The White Man’s Burden
Example Aid: Pessimists
• Accountability
• Lesson learning
– Program
– Organization
– Beneficiaries
– World
• So that we can reduce poverty through more effective programs
• Different types of evaluation contribute to these different objectives of evaluation
Objective of evaluation
The different types of evaluation

[Diagram: nested scope, from broadest to narrowest – Evaluation (M&E) > Program Evaluation > Impact Evaluation > Randomized Evaluation]
Evaluating Social Programmes
• What is the outcome after the programme?
• What would have happened in the absence of the programme?
• Take the difference:
what happened (with the program)
– what would have happened (without the program)
= IMPACT of the program
How to measure impact? (I)
Impact is defined as a comparison between:
1. the outcome some time after the program has been introduced
2. the outcome at that same point in time had the program not been introduced (the ”counterfactual”)
How to measure impact? (II)
Impact: What is it?

[Figure: primary outcome plotted over time. After the intervention, the observed outcome diverges from the counterfactual; the gap between the two curves is the impact]
• The counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program)
• Problem: Counterfactual cannot be observed
• Solution: We need to “mimic” or construct the counterfactual
Counterfactual
• The counterfactual is often constructed by selecting a group not affected by the program
• Randomized:
– Use random assignment of the program to create a control group which mimics the counterfactual.
• Non-randomized:
– Argue that a certain excluded group mimics the counterfactual.
Constructing the counterfactual
• Experimental:
– Randomized Evaluations
• Quasi-experimental:
– Instrumental Variables
– Regression Discontinuity Design
• Non-experimental:
– Pre-post
– Difference in differences
– Cross-Sectional Regression
– Fixed Effects Analysis
– Statistical Matching
Methodologies in impact evaluation
Effect of the South African Old Age Pension (OAP) on labour supply
Non-experimental evaluations – Cross Sectional Regression
Bertrand et al. (2003) Posel et al. (2006)
• We can control for observable differences (age, gender, education, ...)
• There are also unobservable characteristics we cannot control for (motivation, etc.)
What kinds of people does a household with a pension attract?
Non-experimental evaluations – Panel Data Analysis with Fixed Effects
Ardington et al. (2009)
• Fixed effects analysis limits the sample to households that changed pension status over time
• We can control for unobservable characteristics that do not change
• Unobservable characteristics may change over time
• Data requirements: panel data, a sizeable proportion of households switching
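The within-household (fixed effects) logic above can be sketched on simulated data. Everything here – sample size, a pension effect of 1.5, the noise levels – is illustrative, not taken from the studies cited:

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated two-period panel: 500 households, some of which gain pension
# status between periods (all numbers are illustrative assumptions).
n = 500
hh_effect = rng.normal(0, 2, n)                    # unobserved, time-invariant heterogeneity
pension_t1 = rng.random(n) < 0.2
pension_t2 = pension_t1 | (rng.random(n) < 0.3)    # some households switch into pension status
true_effect = 1.5                                  # assumed effect on the outcome

y1 = hh_effect + true_effect * pension_t1 + rng.normal(0, 1, n)
y2 = hh_effect + true_effect * pension_t2 + rng.normal(0, 1, n)

# Within (fixed-effects) estimator: for a two-period panel this is just
# first-differencing, which removes hh_effect. Only households that
# switched pension status contribute to identification.
dy = y2 - y1
dx = pension_t2.astype(float) - pension_t1.astype(float)
fe_estimate = np.sum(dx * dy) / np.sum(dx * dx)
print(round(fe_estimate, 2))   # close to the assumed true effect of 1.5
```

Note that the time-invariant `hh_effect` drops out entirely, but any unobservable that changed between the two periods would still bias the estimate, which is the caveat stated in the slide.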
How to randomise
A. The basics
• Randomly assign them to either:
– Treatment Group – is offered the treatment
– Control Group – not allowed to receive the treatment (during the evaluation period)
[Diagram: Target Population → Evaluation Sample (rest not in evaluation) → Random Assignment → Treatment group / Control group]
A. Why randomize? – Conceptual Argument
• If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program.
• Because members of the treatment and control groups do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors.
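A minimal simulation of this argument (all names and numbers are illustrative): random assignment balances the two groups, so a simple difference in mean outcomes recovers the program's effect:

```python
import numpy as np

rng = np.random.default_rng(42)

# Evaluation sample of 1,000 people; assume a true program effect of 2.0
# on some outcome (both numbers are illustrative).
n = 1000
baseline = rng.normal(10, 3, n)

# Random assignment: shuffle indices, first half treatment, rest control.
idx = rng.permutation(n)
treated = np.zeros(n, dtype=bool)
treated[idx[: n // 2]] = True

true_effect = 2.0
outcome = baseline + true_effect * treated + rng.normal(0, 1, n)

# Because assignment is random, the control group mimics the counterfactual,
# so the difference in means estimates the program's impact.
impact = outcome[treated].mean() - outcome[~treated].mean()
print(round(impact, 2))   # close to the assumed effect of 2.0
```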
Example: Primary vs Secondary
Returns to Secondary Education (?)
• Standard way to measure this:
– Equation
• But are people who complete school “the same” as those who don’t?
– They may be more patient and ambitious, come from better-resourced families, and have lower immediate economic opportunities.
• 1,200 teens, qualified but cannot afford secondary school:
– 300 boys and 300 girls get a 4-year scholarship
– Followed for 10 years
In class test
BREAK
Basic setup of a randomized evaluation
[Diagram: Target Population → Evaluation Sample (rest not in evaluation) → Baseline survey → Random Assignment → Treatment group / Control group → Endline survey]
Roadmap to Randomized Evaluations

1. Environment / Context: willing partner; sufficient time; interesting policy question / theory; sufficient resources
2. Theory of Change: mechanism of change (log frame); state assumptions; identify research hypothesis; identify target population; identify indicators; identify threats to validity
3. Randomization Design:
– Intervention: competing interventions; simple program; packages
– Unit of randomization: individual; cluster design; block randomization
– Randomization mechanism: encouragement; gradual rollout; simple lottery; rotation design
4. Sufficient Sample Size: statistical validity; cluster correlation
5. Strategy to Manage Threats: spillovers; discouragement; attrition; political interference
Check and revise at each step.
• Willing partner
• Sufficient time
• Interesting policy question / theory
• Sufficient resources
B. Environment / Context
[Diagram: Programs / Policies are shaped by:
• Knowledge – evidence; experience (personal, collective)
• Ideology – own; external
• Support – budget; political; capacity]
II. Evaluations: Providing evidence for policymaking
• What are the possible chains of outcomes in the case of the intervention?
• What are the assumptions underlying each chain of causation?
• What are the critical intermediary steps needed to obtain the final results?
• What variables should we try to obtain at every step of the way to discriminate between various models?
C. Theory of Change (I)
30
C. Theory of Change (II) – SA Pension System
31
Bertrand et al. (2003) Posel et al. (2006)
Different theories of change determine what indicators we measure and whom we include in our evaluation.
• Based on the Theory of Change, we identify indicators to test the different lines of causation and measure outcomes
...room for creativity…
• How to measure women’s empowerment?
– Measure the fraction of time they speak during village council meetings
• How to measure corruption in infrastructure projects?
– Drill holes in the asphalt of newly built roads and measure the difference between actual and official thickness
C. Indicators
Roadmap to Randomized Evaluations

1. Environment / Context: willing partner; sufficient time; interesting policy question / theory; sufficient resources
2. Theory of Change: mechanism of change (log frame); state assumptions; identify research hypothesis; identify target population; identify indicators; identify threats to validity
3. Randomization Design:
– Intervention: competing interventions; simple program; packages
– Unit of randomization: individual; cluster design; block randomization
– Randomization mechanism: encouragement; gradual rollout; simple lottery; rotation design
4. Sufficient Sample Size: statistical validity; cluster correlation
5. Strategy to Manage Threats: spillovers; discouragement; attrition; political interference
Check and revise at each step.
D. Basic setup of a randomized evaluation
[Diagram: Target Population → Evaluation Sample (rest not in evaluation) → Random Assignment → Treatment group / Control group]
• Evidence on the effectiveness of providing microfinance loans to the poor has been mixed. Some argue that financial literacy training is more effective, while others propose that both loans and training need to be provided to alleviate poverty.
How can you design a randomised evaluation to assess which of these claims is true?
Case Study: Microfinance and/or Financial Literacy Training
D. Forms of Intervention

• Simple Treatment / Control: random assignment to Microfinance vs. Control group
• Multiple Treatment: random assignment to Microfinance vs. Financial Literacy vs. Control group
• Cross-cutting Design: random assignment to Microfinance only, Financial Literacy only, Financial Literacy AND Microfinance, or Control group
• Varying levels of Treatment: random assignment to 6-month Financial Literacy vs. 1-month Financial Literacy vs. Control group
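A sketch of how the cross-cutting (2x2 factorial) design for the microfinance / financial literacy case study might be assigned in practice. The sample of 400 people and all labels are illustrative:

```python
import itertools
import random
from collections import Counter

random.seed(1)

# Cross-cutting design: each person is assigned on two dimensions at once
# (microfinance yes/no, financial literacy yes/no), giving four cells.
people = [f"person_{i}" for i in range(400)]
random.shuffle(people)

cells = list(itertools.product(["microfinance", "no_microfinance"],
                               ["fin_literacy", "no_fin_literacy"]))

# Cycle through the four cells over the shuffled list: because the order
# is random, this is a random assignment with exactly balanced cell sizes.
assignment = {p: cells[i % 4] for i, p in enumerate(people)}

counts = Counter(assignment.values())
print(counts)   # 100 people in each of the four cells
```

The appeal of this design, as the slide suggests, is that one experiment can estimate the effect of each program alone and of the two combined.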
• Individual
• Cluster (Class room, school, district,…)
• Generally, best to randomize at the level at which the treatment is administered.
• Ethical and practical concerns
E. Unit of Randomization
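A sketch of cluster-level assignment for a case like the extra-teachers program: because the treatment is administered at the school level, we randomize schools, and every pupil inherits their school's status. School names and counts are hypothetical:

```python
import random

random.seed(7)

# 40 hypothetical schools (clusters); randomize half into treatment.
schools = [f"school_{i}" for i in range(40)]
random.shuffle(schools)

treatment_schools = set(schools[:20])
control_schools = set(schools[20:])

def pupil_status(school):
    """A pupil's status is determined entirely by their school (cluster)."""
    return "treatment" if school in treatment_schools else "control"

print(len(treatment_schools), len(control_schools))   # 20 20
```

Randomizing at the cluster level also helps with the ethical and practical concerns noted above (e.g. it avoids treating some pupils but not their classmates), at the cost of a larger required sample size.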
• Confronted with overcrowded schools and a shortage of teachers, in 2005 the NGO ICS offered to provide funds to hire 140 extra teachers each year.
What is the best unit of randomisation for our RCT?
Case Study: Extra Teachers in Kenya
• Lottery
• Pull out of a hat/bucket
• Use a random number generator in a spreadsheet or Stata

• Phase-in design
• Rotation design
• Encouragement design
F. Method of Randomization
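The spreadsheet/Stata approach mentioned above can be sketched in Python: give every unit a uniform random draw, rank the draws, and treat the top half. Fixing the seed makes the assignment reproducible and auditable (all names and sizes are illustrative):

```python
import random

random.seed(2025)

# 100 hypothetical unit IDs; each gets an independent uniform draw.
ids = list(range(1, 101))
draws = {i: random.random() for i in ids}

# Rank units by their random draw; the first 50 form the treatment group.
ranked = sorted(ids, key=lambda i: draws[i])
treatment = set(ranked[:50])
control = set(ranked[50:])

print(len(treatment), len(control))   # 50 50
```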
Random assignment through lottery
[Figure, 2006: income per person per month, in rupees – Treatment group: 1457; Comparison group: 1442]
Alternative Mechanism: Phase-in design
[Figure: groups labelled 1, 2, and 3 are phased into treatment over successive rounds]
Round 1 – Treatment: 1/3, Control: 2/3
Round 2 – Treatment: 2/3, Control: 1/3
Round 3 – Treatment: 3/3, Control: 0 (randomized evaluation ends)
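A sketch of how phase-in assignment could be coded, with illustrative group sizes (90 units in three equal groups). Group g enters treatment in round g, so by round 3 no control group remains:

```python
import random

random.seed(3)

# Randomly split 90 hypothetical units into three equal phase-in groups.
units = list(range(90))
random.shuffle(units)
groups = {1: units[:30], 2: units[30:60], 3: units[60:]}

def treated_in_round(round_no):
    """Units treated in a given round: group g is phased in from round g on."""
    return {u for g, members in groups.items() if g <= round_no for u in members}

print(len(treated_in_round(1)),
      len(treated_in_round(2)),
      len(treated_in_round(3)))   # 30 60 90
```

As the figure shows, valid treatment–control comparisons are only available in rounds 1 and 2; once everyone is treated, the randomized evaluation ends.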
Roadmap to Randomized Evaluations

1. Environment / Context: willing partner; sufficient time; interesting policy question / theory; sufficient resources
2. Theory of Change: mechanism of change (log frame); state assumptions; identify research hypothesis; identify target population; identify indicators; identify threats to validity
3. Randomization Design:
– Intervention: competing interventions; simple program; packages
– Unit of randomization: individual; cluster design; block randomization
– Randomization mechanism: encouragement; gradual rollout; simple lottery; rotation design
4. Sufficient Sample Size: statistical validity; cluster correlation
5. Strategy to Manage Threats: spillovers; sample bias; attrition
Check and revise at each step.
• Internal Validity: Can we estimate the treatment effect for our particular sample?
– Fails when there are differences between the two groups (other than the treatment itself) that affect the outcome
• External Validity: Can we extrapolate our estimates to other populations?
– Fails when, outside our evaluation environment, the treatment has a different effect
G. Internal vs. External Validity
• Threats to Internal Validity: the control group is different from the counterfactual
– Spillovers
– Sample Selection Bias
– Attrition
• Examples:
– Individuals assigned to the comparison group could attempt to move into the treatment group (cross-over), and vice versa
– Individuals assigned to the treatment group could drop out of the program (attrition)

G. Threats to Internal Validity
Depends on three factors:
• Program Implementation: can it be replicated at a large scale?
• Study Sample: is it representative?
– Does de-worming have the same effects in Kenya and South Africa?
• Sensitivity of results: would a similar, but slightly different, program have the same impact?
G. External Validity: Generalisability of results
Interested? Become part of the J-PAL research team!
“You get to spend a year in Siberia, while I have to stay here in Hawaii, to apply for grants to extend your research time there.”