evaluation methods

Why Randomize?

Course Overview

1. What is evaluation?

2. Measuring impacts (outcomes, indicators)

3. Why randomize?

4. How to randomize?

5. Sampling and sample size

6. Threats and Analysis

7. Cost-Effectiveness Analysis

8. Project from Start to Finish

What is the most convincing argument you have heard against RCTs?

A. Too expensive

B. Takes too long

C. Unethical

D. Too difficult to design/implement

E. Not externally valid (Not generalizable)

F. Can tell us whether there is impact, and the magnitude of that impact, but not why or how (it is a black box)


Impact: What is it?

A. Positive

B. Negative

C. No impact

D. Don’t Know

[Figure: primary outcome over time, with the intervention marked]

Impact: What is it?

[Figure: primary outcome over time, with the intervention, the counterfactual, and the impact marked]

Impact: What is it?

A. Positive

B. Negative

C. No impact

D. Don’t Know


[Figure: primary outcome over time, with the intervention and the counterfactual marked]

Impact: What is it?

[Figure: primary outcome over time, with the intervention, the counterfactual, and the impact marked]

Impact is defined as a comparison between:

The outcome some time after the program has been introduced

The outcome at that same point in time had the program not been introduced

This is known as the “Counterfactual”

How to Measure Impact?

Counterfactual

The Counterfactual represents the state of the world that program participants would have experienced in the absence of the program (i.e. had they not participated in the program)

Problem: Counterfactual cannot be observed

Solution: We need to “mimic” or construct the counterfactual
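The definition can be written compactly in potential-outcomes notation. The symbols are not from the slides and are introduced only for illustration: $Y_i(1)$ is participant $i$'s outcome with the program, and $Y_i(0)$ is the same participant's outcome at the same point in time without it.

```latex
\[
\text{Impact}_i = Y_i(1) - Y_i(0)
\]
```

For a participant only $Y_i(1)$ is ever observed, so $Y_i(0)$ has to be estimated from a comparison group.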

IMPACT EVALUATION METHODS

Impact Evaluation Methods

1. Randomized Experiments

Also known as:

Random Assignment Studies

Randomized Field Trials

Social Experiments

Randomized Controlled Trials (RCTs)

Randomized Controlled Experiments

Impact Evaluation Methods

2. Non- or Quasi-Experimental Methods

Pre-Post

Simple Difference

Differences-in-Differences

Multivariate Regression

Statistical Matching

Interrupted Time Series

Instrumental Variables

Regression Discontinuity

WHAT IS A RANDOMIZED EXPERIMENT?

The Basics

Start with simple case:

Take a sample of program applicants

• Randomly assign them to either:

• Treatment Group – is offered treatment

• Control Group - not allowed to receive treatment (during the evaluation period)

Key Advantage

Because members of the two groups (treatment and control) do not differ systematically at the outset of the experiment, any difference that subsequently arises between them can be attributed to the program rather than to other factors.
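A minimal sketch of the simple case in Python, using made-up applicants (the ids, baseline scores, and seeds are all hypothetical). It shuffles the applicant list, splits it in half, and checks that the two groups have similar baseline averages:

```python
import random
import statistics

def assign_randomly(applicants, seed=0):
    """Shuffle applicants and split them in half: (treatment, control)."""
    rng = random.Random(seed)
    shuffled = list(applicants)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

# Hypothetical applicants as (id, baseline score). Not data from the slides.
data_rng = random.Random(42)
applicants = [(i, data_rng.gauss(30, 10)) for i in range(2000)]

treatment, control = assign_randomly(applicants, seed=1)

# With random assignment the baseline means should be close to each other.
mean_t = statistics.mean(score for _, score in treatment)
mean_c = statistics.mean(score for _, score in control)
print(f"baseline mean, treatment: {mean_t:.2f}")
print(f"baseline mean, control:   {mean_c:.2f}")
```

The same balance should hold, in expectation, for characteristics that were never measured, which is what the non-experimental methods cannot guarantee.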

WHY RANDOMIZE?

Example: Pratham’s Balsakhi Program

Case 2: Remedial Education in India: Evaluating the Balsakhi Program

Incorporating random assignment into the program


What was the Problem?

Many children in 3rd and 4th standard were not even at the 1st standard level of competency

Class sizes were large

Social distance between teacher and many of the students was large

Context and Partner

124 Municipal Schools in Vadodara (Western India)

2002 & 2003: two academic years

~ 17,000 children

“Every child in school and learning well”

Works with most states in India reaching millions of children

Proposed Solution

Hire local women (Balsakhis)

From the community

Train them to teach remedial competencies

• Basic literacy, numeracy

Identify lowest performing 3rd and 4th standard students

• Take these students out of class (2 hours/day)

• Balsakhi teaches them basic competencies

Possible Outcomes

Pros

Reduced social distance

Reduced class size

Teaching at appropriate level

Improved learning for lower-performing students

Improved learning for higher-performers

Cons

Less qualified

Teacher resentment

Reduced interaction with higher-performing peers

Increased gap in learning

Reduced test scores for all kids

What is the Impact?

J-PAL Conducts a Test at the End

Balsakhi students score an average of 51%

What can we conclude?

1. Pre-post (Before vs. After)

Look at average change in test scores over the school year for the balsakhi children

Average change in the outcome of interest before and after the programme

[Figure: Average test scores of Balsakhi students: 24.80 at the start of the program, 51.22 at the end of the program, a difference of 26.42]

Method 1: Pre vs Post (Before vs. After)

Average post-test score for children with a Balsakhi    51.22
Average pretest score for children with a Balsakhi      24.80
Difference                                              26.42
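The arithmetic behind the pre-post estimate, as a small sketch (the function name is made up for illustration; the two averages are the ones on the slide):

```python
def pre_post_estimate(pre_mean, post_mean):
    """Change in the participants' own average outcome, before vs. after."""
    return post_mean - pre_mean

estimate = pre_post_estimate(pre_mean=24.80, post_mean=51.22)
print(f"{estimate:.2f}")  # 26.42
```

The number equals the program's impact only if nothing else changed test scores over the school year.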

Pre-Post

Limitations of the method

• No comparison group, doesn’t take time trend into account

What else can we do to estimate impact?

Method 2: Simple Difference

Divide the population into two groups:

One group enrolled in Balsakhi program (Treatment)

One group not enrolled in Balsakhi program (Control)

Compare the test scores of these two groups at the end of the program.

Measure difference between program participants and non-participants after the program is completed

[Figure: Average test scores at the end of the program: 56.27 for children not enrolled in the program, 51.22 for children enrolled in the program, a difference of -5.05]

Method 2: Simple Difference

Average score for children with a Balsakhi       51.22
Average score for children without a Balsakhi    56.27
Difference                                       -5.05

QUESTION: Under what conditions can the difference of -5.05 be interpreted as the impact of the Balsakhi program?
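One way to see why the conditions matter is a small simulation. Everything in it is hypothetical: the scores are generated, the true effect is set to +5 points by assumption, and the weaker half of the students is placed in the program, as a remedial program would do. The simple difference still comes out negative:

```python
import random
import statistics

rng = random.Random(0)
TRUE_EFFECT = 5.0  # assumed for the simulation, not an estimate from the slides

# Hypothetical students, each with a score they would get without the program.
scores_without_program = [rng.gauss(40, 10) for _ in range(5000)]
cutoff = statistics.median(scores_without_program)

enrolled, not_enrolled = [], []
for score in scores_without_program:
    if score < cutoff:
        # Weaker students are the ones placed in the program.
        enrolled.append(score + TRUE_EFFECT)
    else:
        not_enrolled.append(score)

simple_difference = statistics.mean(enrolled) - statistics.mean(not_enrolled)
print(f"true effect:       {TRUE_EFFECT:+.2f}")
print(f"simple difference: {simple_difference:+.2f}")
```

The comparison mixes the program's effect with the pre-existing gap between the groups, so it recovers the impact only when non-participants are otherwise identical to participants.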

Method 3: Difference-in-difference

Divide the population into two groups:

• One group enrolled in Balsakhi program (Treatment)

• One group not enrolled in Balsakhi program (Control)

Compare the change in test scores between Treatment and Control

• i.e., difference in differences in test scores

Same thing: compare difference in test scores at post-test with difference in test scores at pretest

Measure improvement (change) over time of participants relative to the improvement (change) over time of non-participants

[Figure: Average test scores at the start and end of the program. Enrolled in Balsakhi program: 24.80 to 51.22. Not enrolled in Balsakhi program: 36.67 to 56.27]

Method 3: Difference-in-difference

Average score                  Pretest   Post-test   Difference
Children with a Balsakhi       24.80     51.22       26.42

Method 3: Difference-in-differences

Method 3: What would have Happened without Balsakhi?

[Figure: scores from 2002 to 2003 on a 0 to 75 scale, with the 26.42-point gain of Balsakhi students marked]


Average score                  Pretest   Post-test   Difference
Children with a Balsakhi       24.80     51.22       26.42
Children without a Balsakhi    36.67     56.27       19.60


Method 3: What would have Happened without Balsakhi?

[Figure: scores from 2002 to 2003 on a 0 to 75 scale, with gains of 26.42 and 19.60 marked. Is the impact 6.82 points?]

Method 3: Difference-in-Differences

QUESTION: Under what conditions can 6.82 be interpreted as the impact of the balsakhi program?

Issues:

• Failure of the “parallel trends” assumption, i.e. without the program the two groups would not have followed similar trajectories over time

Average score                  Pretest   Post-test   Difference
Children with a Balsakhi       24.80     51.22       26.42
Children without a Balsakhi    36.67     56.27       19.60
Difference                                           6.82
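The table's arithmetic can be checked in a few lines. The four averages are the ones reported above and the variable names are only illustrative. Both ways of computing the estimate (change minus change, or post-test gap minus pretest gap) give the same number:

```python
# Average scores from the table above.
treated_pre, treated_post = 24.80, 51.22   # children with a Balsakhi
control_pre, control_post = 36.67, 56.27   # children without a Balsakhi

# Change over time within each group.
treated_change = treated_post - treated_pre    # about 26.42
control_change = control_post - control_pre    # about 19.60
did_estimate = treated_change - control_change

# Equivalent form: gap between groups at post-test minus gap at pretest.
gap_post = treated_post - control_post         # about -5.05
gap_pre = treated_pre - control_pre            # about -11.87
did_alternative = gap_post - gap_pre

print(f"{did_estimate:.2f} {did_alternative:.2f}")  # 6.82 6.82
```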

Method 4: Regression Analysis

Divide the population into two groups:

• One group enrolled in Balsakhi program

• One group not enrolled in Balsakhi program

Compare the test scores of these two groups at the start and at the end of the program.

Control for additional variables like gender and class size

Post-test =
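The specification itself is not spelled out here. One common form, given purely as an assumption, regresses the post-test score on a program dummy plus controls: post_test = b0 + b1*balsakhi + b2*pretest + b3*female + b4*class_size + error. A minimal sketch on made-up, noise-free data (all variable names and numbers are hypothetical), using only the standard library:

```python
import random

def ols(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y,
    solved by Gauss-Jordan elimination. Adequate for a few regressors."""
    k = len(X[0])
    # Augmented matrix [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))]
         for i in range(k)]
    for col in range(k):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(k):
            if r != col:
                factor = A[r][col] / A[col][col]
                A[r] = [a - factor * b for a, b in zip(A[r], A[col])]
    return [A[i][k] / A[i][i] for i in range(k)]

rng = random.Random(0)
rows, post_test = [], []
for _ in range(200):
    balsakhi = rng.randint(0, 1)
    pretest = rng.uniform(10, 60)
    female = rng.randint(0, 1)
    class_size = rng.randint(30, 80)
    rows.append([1.0, balsakhi, pretest, female, class_size])
    # Noise-free outcome with an assumed program coefficient of 2.0.
    post_test.append(10 + 2.0 * balsakhi + 0.9 * pretest
                     + 1.0 * female - 0.1 * class_size)

coefficients = ols(rows, post_test)
print(f"coefficient on the Balsakhi dummy: {coefficients[1]:.2f}")
```

The coefficient is recovered here only because every variable that drives the outcome is included. With real data, anything omitted that differs between participants and non-participants would bias it.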

Method 4: Regression Analysis

[Figure: test score at post-test plotted against income, with linear trend lines for post_tot_B and post_tot_noB]

QUESTION: Under what conditions can the coefficient of 1.92 be interpreted as the impact of the Balsakhi program?

Impact of Balsakhi Program

Method                          Impact Estimate
(1) Pre-post                    26.42*
(2) Simple Difference           -5.05*
(3) Difference-in-Difference    6.82*
(4) Regression with controls    1.92

* Significant at 5% level

Constructing the Counterfactual

Counterfactual is often constructed by selecting a group not affected by the program

Non-randomized:

• Argue that a certain excluded group mimics the counterfactual.

Randomized:

• Use random assignment of the program to create a control group which mimics the counterfactual.

Randomised Evaluations

Individuals, villages, or districts are randomly selected to receive the treatment, while the others serve as a comparison

[Figure: Treatment Group and Comparison Group, illustrated by Village 1 and Village 2, shown as equal (=)]

Groups are Statistically Identical before the Program

Any Difference at the Endline can be Attributed to the Program

Two groups continue to be identical, except for treatment. Later, compare outcomes (health, test scores) between the two groups. Any differences between the groups can be attributed to the program.

Basic Set-up of a Randomized Evaluation

[Diagram: Total Population; Target Population; Evaluation Sample (the remainder is not in the evaluation); Random Assignment; Treatment Group and Control Group]

Random Sampling and Random Assignment

Randomly sample from area of interest

Randomly assign to treatment and control

Randomly sample from both treatment and control

Randomization Design

Population = all schools in case villages

Target population: weakest students in all of these schools

Stratify on three criteria:

• Pre-test scores

• Gender

• Language

Give 50% of them the Balsakhi program
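A sketch of stratified assignment along these lines, under stated assumptions: students are the unit of assignment, the strata values are invented, and every field name is hypothetical. Within each stratum (pre-test band, gender, language) half of the students are randomly given the program:

```python
import random
from collections import defaultdict

def stratified_assignment(students, strata_keys, seed=0):
    """Within each stratum, randomly assign half of the students to treatment."""
    rng = random.Random(seed)
    strata = defaultdict(list)
    for student in students:
        strata[tuple(student[key] for key in strata_keys)].append(student)

    assignment = {}
    for stratum in sorted(strata):  # fixed order keeps the result reproducible
        members = strata[stratum]
        rng.shuffle(members)
        half = len(members) // 2
        for position, student in enumerate(members):
            arm = "treatment" if position < half else "control"
            assignment[student["id"]] = arm
    return assignment

# Hypothetical students. Not data from the study.
data_rng = random.Random(7)
students = [
    {
        "id": i,
        "pretest_band": data_rng.choice(["low", "middle", "high"]),
        "gender": data_rng.choice(["girl", "boy"]),
        "language": data_rng.choice(["language_a", "language_b"]),
    }
    for i in range(600)
]

assignment = stratified_assignment(
    students, strata_keys=("pretest_band", "gender", "language"), seed=1
)
n_treatment = sum(1 for arm in assignment.values() if arm == "treatment")
print(f"treatment: {n_treatment}, control: {len(assignment) - n_treatment}")
```

Stratifying guarantees balance on the chosen criteria by construction, instead of relying on chance alone.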

Impact of Balsakhi - Summary


(1) Pre-post                     26.42*
(2) Simple Difference            -5.05*
(3) Difference-in-Difference     6.82*
(4) Regression                   1.92

*: Statistically significant at the 5% level

Which of these methods do you think is closest to the truth?

A. Pre-post

B. Simple difference

C. Difference-in-Difference

D. Regression

E. Don’t know




Impact of Balsakhi - Summary


(1) Pre-post                     26.42*
(2) Simple Difference            -5.05*
(3) Difference-in-Difference     6.82*
(4) Regression                   1.92
(5) Randomized Experiment        5.87*


Example #2 - Pratham’s Read India Program


Method                           Impact
(1) Pre-Post                     0.60*
(3) Difference-in-Differences    0.31*
(4) Regression                   0.06

Which of these methods do you think is closest to the truth?

A. Pre-post

B. Simple difference

C. Difference-in-Difference

D. Regression

E. Don’t know



Example #2 – Pratham’s Read India Program

Method                           Impact
(1) Pre-Post                     0.60*
(3) Difference-in-Differences    0.31*
(4) Regression                   0.06
(5) Randomized Experiment        0.88*


Method: Pre-Post
Comparison: Program participants before program
Works only if: Nothing else was affecting outcome

Method: Simple Difference
Comparison: Individuals who did not participate (data collected after program)
Works only if: Non-participants are exactly equal to participants

Method: Differences-in-Difference
Comparison: Same as above + data collected before and after
Works only if: The two groups have exactly the same trajectory over time

Method: Regression
Comparison: Same as above + additional “explanatory” variables
Works only if: Omitted variables do not affect results

Method: Randomized Evaluation
Comparison: Participants randomly assigned to control group
Works only if: The two groups are statistically identical on observed and unobserved characteristics

Summary of Methods

Conditions Required

Method: Pre-Post
Comparison group: Program participants before program
Works if: The program was the only factor influencing any changes in the measured outcome over time

Method: Simple Difference
Comparison group: Individuals who did not participate (data collected after program)
Works if: Non-participants are identical to participants except for program participation, and were equally likely to enter program before it started.

Method: Differences in Differences
Comparison group: Same as above, plus: data collected before and after
Works if: If the program didn’t exist, the two groups would have had identical trajectories over this period.

Method: Multivariate Regression
Comparison group: Same as above, plus: also have additional “explanatory” variables
Works if: Omitted (because not measured or not observed) variables do not bias the results because they are either uncorrelated with the outcome, or do not differ between participants and non-participants

Method: Propensity Score Matching
Comparison group: Non-participants who have mix of characteristics which predict that they would be as likely to participate as participants
Works if: Same as above

Method: Randomized Evaluation
Comparison group: Participants randomly assigned to control group
Works if: Randomization “works” – the two groups are statistically identical on observed and unobserved characteristics

Other Methods

There are more sophisticated non-experimental methods to estimate program impacts:

• Regression

• Matching

• Instrumental Variables

• Regression Discontinuity

These methods rely on being able to “mimic” the counterfactual under certain assumptions

Problem: Assumptions are not testable

Conclusions: Why Randomize?

There are many ways to estimate a program’s impact

This course argues in favor of one: randomized experiments

• Conceptual argument: If properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program

• Empirical argument: Different methods can generate different impact estimates

Key Steps in Conducting an Experiment

1. Design the study carefully

2. Randomly assign people to treatment or control

3. Collect baseline data

4. Verify that assignment looks random

5. Monitor process so that integrity of experiment is not compromised

6. Collect follow-up data for both the treatment and control groups

7. Estimate program impacts by comparing mean outcomes of treatment group vs. mean outcomes of control group.

8. Assess whether program impacts are statistically significant and practically significant.
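Steps 2, 4, 6, 7 and 8 can be sketched on simulated data. The scores, the assumed effect of 5 points, and the function name are all hypothetical, and a Welch t statistic is used here as one common way to judge statistical significance:

```python
import math
import random
import statistics

def compare_means(treatment, control):
    """Return (difference in means, Welch t statistic)."""
    diff = statistics.mean(treatment) - statistics.mean(control)
    se = math.sqrt(statistics.variance(treatment) / len(treatment)
                   + statistics.variance(control) / len(control))
    return diff, diff / se

rng = random.Random(3)
ASSUMED_EFFECT = 5.0  # chosen for the simulation, not a study result

# Step 2: randomly assign hypothetical students (baseline scores) to two groups.
baseline = [rng.gauss(30, 10) for _ in range(2000)]
rng.shuffle(baseline)
base_t, base_c = baseline[:1000], baseline[1000:]

# Step 4: verify that assignment looks random (baseline means should be close).
balance_diff, balance_t = compare_means(base_t, base_c)

# Step 6: follow-up data for both groups (everyone gains 20 points over time).
end_t = [b + 20 + ASSUMED_EFFECT + rng.gauss(0, 5) for b in base_t]
end_c = [b + 20 + rng.gauss(0, 5) for b in base_c]

# Steps 7-8: impact estimate and its t statistic.
impact, impact_t = compare_means(end_t, end_c)
print(f"baseline difference: {balance_diff:.2f} (t = {balance_t:.2f})")
print(f"estimated impact:    {impact:.2f} (t = {impact_t:.2f})")
```

In large samples, an absolute t statistic above roughly 2 corresponds to significance at about the 5% level. Whether an impact of that size matters in practice is a separate judgment.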

THANK YOU
