
How to Build a Better Systematic Review in Education

David K. Evans (presenting co-authored work with Anna Popova), World Bank

April 21, 2016 Building Evidence in Education (BE2) – Washington DC, USA

Divergent Findings in Systematic Reviews + A Few Proposals

Motivation

• Recent years have seen an explosion in evidence on learning
• Six reviews over the last two years address the same topic: how to improve learning outcomes for children in low- and middle-income countries
• Although the reviews are not identical, they share a common goal but often reach different conclusions

[Figure: Cumulative learning studies by year, 1980 to 2015. Labeled totals: 32 total studies and 227 total studies.]

The reviews & their recommendations to improve learning (2013-2014)

Conn
• Pedagogical interventions
• Student incentives

Glewwe et al.
• Desks, tables, & chairs
• Teacher subject knowledge
• Teacher presence

Kremer et al.
• Pedagogical interventions to match teaching to student learning
• Accountability
• Incentives

Krishnaratne et al.
• Materials

McEwan
• Computers or instructional technology
• Teacher training
• Smaller classes or ability grouping

Murnane & Ganimian
• Provide info about school quality & returns to schooling
• Teacher incentives (in low performance settings)
• Specific guidance to low-skilled teachers

Since we did our analysis, several more have come out! (2015 alone)

Asim et al. (South Asia only)
• Teachers & schools (not households & communities)
• Provide resources
• Provide incentives

Masino & Niño-Zarazúa
• Combine 2 or 3 of:
  • Material and human resources
  • Behaviors and intertemporal choices of teachers and students
  • Participatory and community management reforms

Glewwe & Muralidharan
• Pedagogy
• School governance
• Teacher accountability

Snilstveit et al. (900 pages!)
• Structured pedagogy

Not all reviews are systematic… What is a systematic review anyway?

Ask the Campbell Collaboration:

“A systematic review must have:

1. Clear inclusion / exclusion criteria

2. An explicit search strategy

3. Systematic coding and analysis of included studies

4. Meta-analysis (where possible)”

What is meta-analysis anyway? Meta-analysis actually combines the estimates from different studies to see their size and significance when taken together.
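To make "combining the estimates" concrete, here is a minimal sketch of one common pooling rule, the inverse-variance (fixed-effect) average. This is an illustration under stated assumptions, not necessarily the method used by any of the reviews discussed here; the function name `pool_fixed_effect` and the input numbers are hypothetical.

```python
import math


def pool_fixed_effect(effects, std_errors):
    """Inverse-variance (fixed-effect) pooled estimate and its standard error."""
    # More precise studies (smaller standard error) get larger weights.
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1.0 / sum(weights))
    return pooled, pooled_se


# Hypothetical standardized effects (in SD units) and their standard errors.
effects = [0.10, 0.20, 0.15]
std_errors = [0.10, 0.10, 0.10]

pooled, pooled_se = pool_fixed_effect(effects, std_errors)
print(f"pooled effect = {pooled:.3f} SD, SE = {pooled_se:.3f}, z = {pooled / pooled_se:.2f}")
```

With equal standard errors the pooled effect is the simple mean of the three estimates, while the pooled standard error is smaller than any single study's, which is the source of the gain in statistical power.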


What types of review are there?

Meta-analysis (Conn; McEwan; Krishnaratne et al.)
Pros
• Increase statistical power
• Objective weighting of the evidence
Cons
• Excludes good studies without particular data reported
• Average out bimodal outcomes
• Can tend to over-aggregation

Vote counting (Glewwe et al.)
Pros
• Includes all quantitative studies
Cons
• Ignores sample size & effect size
• Misleading if studies are underpowered

Narrative (Kremer, Brannen, & Glennerster; Murnane & Ganimian)
Pros
• Discussion of mechanisms
• Can include every study
Cons
• Subjective weighting of the evidence
• Not transparent
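The point that vote counting can mislead when studies are underpowered can be illustrated with a small sketch. It assumes a two-sided 5% significance threshold and inverse-variance (fixed-effect) pooling; the helper names `vote_count` and `pooled_z` and the five study results are hypothetical.

```python
import math


def vote_count(effects, std_errors, z_crit=1.96):
    """Count studies whose own estimate is positive and significant at the 5% level."""
    return sum(1 for d, se in zip(effects, std_errors) if d / se > z_crit)


def pooled_z(effects, std_errors):
    """z-statistic of the inverse-variance (fixed-effect) pooled estimate."""
    weights = [1.0 / se ** 2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    return pooled / math.sqrt(1.0 / sum(weights))


# Five hypothetical small studies, each estimating 0.15 SD with a standard error of 0.10.
effects = [0.15] * 5
std_errors = [0.10] * 5

votes = vote_count(effects, std_errors)
z = pooled_z(effects, std_errors)
print(f"significant positive studies: {votes} of {len(effects)}")
print(f"pooled z-statistic: {z:.2f}")
```

No single study clears the significance threshold, so a vote count reports zero positive findings, yet the pooled estimate is clearly distinguishable from zero.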

Characteristics of a systematic review (Campbell Collaboration)

[Figure: bar chart of the proportion of reviews (0% to 100%) meeting each characteristic fully, partially, or not at all]

• Clear inclusion/exclusion criteria
• Explicit search strategy
• Systematic coding and analysis of included studies
• Meta-analysis (where possible)

What drives different conclusions?

Why do six reviews that ostensibly cover largely the same literature reach different conclusions?

Differing compositions

Out of 227 total studies with learning outcomes, how many are included in most systematic reviews about improving learning?

[Figure: bar chart. x-axis: number of reviews in which a study is included; y-axis: number of individual education studies]

Number of reviews in which a study is included: number of studies
• 1 review: 159
• 2 reviews: 32
• 3 reviews: 23
• 4 reviews: 6
• 5 reviews: 4
• 6 reviews: 3

• Only 3/227 studies are included in all 6 reviews
• Only 13 studies are included in the majority of reviews
• 159 of the studies are included in only 1 of the 6 reviews
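The headline figures on study overlap follow directly from the bar heights in the chart. The short sketch below reproduces that arithmetic; it assumes that "majority of reviews" means at least 4 of the 6, and the variable names are illustrative.

```python
# Number of studies by how many of the six reviews include them (bar heights from the chart).
studies_by_review_count = {1: 159, 2: 32, 3: 23, 4: 6, 5: 4, 6: 3}

total = sum(studies_by_review_count.values())
# Assumption: "majority" means included in at least 4 of the 6 reviews.
in_majority = sum(n for k, n in studies_by_review_count.items() if k >= 4)
in_all = studies_by_review_count[6]
in_one_only = studies_by_review_count[1]

print(f"total studies: {total}")
print(f"in a majority of reviews: {in_majority}")
print(f"in all six reviews: {in_all}")
print(f"share appearing in only one review: {in_one_only / total:.0%}")
```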

Differing categorizations

How are the most frequently cited papers categorized in different reviews?

Categorization across the 6 systematic reviews

Kremer et al. (2009), total citations: 6
• Conn 2014: Student incentives
• Glewwe et al. 2014: Merit-based scholarships
• Kremer et al. 2013: Merit scholarships
• Krishnaratne et al. 2013: School fees
• McEwan 2014: Performance incentives
• Murnane & Ganimian 2014: Cash transfers

Banerjee et al. (2007), total citations: 5
• Conn 2014: -
• Glewwe et al. 2014: Computers & electronic games
• Kremer et al. 2013: Reducing class size / Computer-assisted learning / Contract teachers
• Krishnaratne et al. 2013: Materials
• McEwan 2014: Instructional materials / Computers or technology / Teacher training / Class size or composition / Contract or volunteer teachers
• Murnane & Ganimian 2014: Computer-assisted learning

So, what drives the different conclusions?

How many of the studies in one review’s recommended category of intervention are included in other reviews?

Percentage of driving studies included in other reviews
(columns, in order: Conn 2014 | Glewwe et al. 2014 | Kremer et al. 2013 | Krishnaratne et al. 2013 | McEwan 2014 | Murnane & Ganimian 2014)

• Conn 2014, Pedagogical interventions: -- | 6% | 0% | 6% | 6% | 18%
• Kremer et al. 2013, Matching teaching to students’ learning: 50% | 50% | -- | 50% | 100% | 50%
• Krishnaratne et al. 2013, Materials provision: 17% | 67% | 50% | -- | 100% | 67%
• McEwan 2014, Computers or instructional technology: 0% | 30% | 30% | 40% | -- | 70%
• Murnane & Ganimian 2014, Information provision: 11% | 0% | 11% | 33% | 33% | --
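Each percentage of this kind is a set overlap: of the studies that drive one review's recommendation, what fraction also appears in another review's list of included studies. A minimal sketch follows, assuming each review is represented as a set of study identifiers; the function `share_included`, the review names, and the identifiers are hypothetical.

```python
def share_included(driving_studies, other_review_studies):
    """Fraction of one review's driving studies that another review also includes."""
    driving = set(driving_studies)
    if not driving:
        raise ValueError("driving_studies must not be empty")
    return len(driving & set(other_review_studies)) / len(driving)


# Hypothetical study identifiers, for illustration only.
driving = {"study_a", "study_b", "study_c", "study_d"}
other_reviews = {
    "Review X": {"study_a", "study_b", "study_z"},
    "Review Y": {"study_q"},
}

shares = {name: share_included(driving, studies) for name, studies in other_reviews.items()}
for name, share in shares.items():
    print(f"{name}: {share:.0%} of driving studies included")
```

A low share means the other review could not have reached the same recommendation from the same evidence, because it never looked at most of that evidence.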

Variation within intervention categories


How can we conduct better individual systematic reviews?

1. Conduct an exhaustive search, and maybe replicate it

• About 50 studies should be in 5-6 reviews, but only 8 are

2. Combine methods

• Overcome the fact that meta-analysis excludes too many but narratives aren’t systematic enough

3. Maintain low aggregation of intervention categories so that the categories can actually be useful

• Variation within intervention type means “computers work” is less useful than “computer software that adapts to student learning levels works”

What works?

Tally of reviews recommending each intervention, out of six: Conn (2014); Glewwe et al. (2014); Kremer, Brannen, & Glennerster (2013); Krishnaratne, White, & Carpenter (2013); McEwan (2014); Murnane & Ganimian (2014)

• Pedagogical interventions that match teaching to students’ learning: 4
• Individualized teacher training: 4
• Teacher incentives: 3
• Materials: 3
• Student incentives: 2
• Accountability: 1
• Contract or volunteer teachers: 1
• Providing information about school quality and returns to schooling: 1
• Smaller classes or ability grouping: 1
• Teacher presence: 1

What works: (1) Pedagogical interventions that match teaching to individual student learning levels

- In Kenya, assign students to separate classes based on initial ability so that teachers can focus instruction at their students’ actual level of learning (Duflo, Dupas & Kremer 2011)

- Use math software to help students learn at their own pace (Banerjee et al. 2007)

• But just giving out laptops or desktop computers won’t guarantee the gains


What works: (2) Individualized, repeated teacher training, associated with a specific method or task

- Train teachers and provide them with regular mentoring to implement early grade reading instruction in local language in Kenya (Lucas et al. 2014)

- Help teachers learn to use storybooks and flash cards in India (He et al. 2009)

• As opposed to a similar (not identical) program introduced without teacher preparation (He et al. 2008)

And this is already out of date!

The 2015 studies

• Teacher performance pay in Pakistan

• Public-private partnerships in Uganda

• Cash transfers in Honduras

• Malaria control + literacy instruction in Kenya

• Literacy innovations in Kenya (RTI)

• Teacher training + text messages in Kenya

• Teacher training for literacy in Uganda

• Teacher performance pay in China


Systematic reporting of results to permit aggregation

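$$\text{Effect} = \frac{\bar{Y}_T - \bar{Y}_C}{s_{\text{pooled}}}$$

$$\text{Standard error} = \sqrt{\frac{n_T + n_C}{n_T n_C} + \frac{\text{Effect}^2}{2(n_T + n_C)}}$$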

Statistics wanted!

• Sample standard deviation across treatment and control

• Student sample size in treatment group

• Student sample size in control group

Sources: McEwan 2015; Borenstein 2009
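The three requested statistics, together with the group means, are enough to compute a standardized effect size and its standard error from the formulas on this slide. The sketch below shows that calculation; the function name `standardized_effect` and the example inputs are hypothetical.

```python
import math


def standardized_effect(mean_t, mean_c, sd_pooled, n_t, n_c):
    """Standardized mean difference and its standard error.

    mean_t, mean_c: outcome means in the treatment and control groups
    sd_pooled:      sample standard deviation across treatment and control
    n_t, n_c:       student sample sizes in the treatment and control groups
    """
    effect = (mean_t - mean_c) / sd_pooled
    std_error = math.sqrt(
        (n_t + n_c) / (n_t * n_c) + effect ** 2 / (2 * (n_t + n_c))
    )
    return effect, std_error


# Hypothetical study: test scores of 52 vs. 50, pooled SD of 10, 200 students per arm.
effect, std_error = standardized_effect(52.0, 50.0, 10.0, 200, 200)
print(f"effect = {effect:.2f} SD, standard error = {std_error:.4f}")
```

If a paper omits any one of these inputs, the effect cannot be standardized and the study drops out of a meta-analysis, which is why consistent reporting matters.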


Systematic reporting on programs

Each intervention is unique.

For 24 teacher training evaluations, we sought evidence on 43 potential indicators.

[Figure: reported indicators, on a scale from 0% to 100%]

We have developed an instrument to improve systematic reporting on interventions with teacher training.

Similar instruments are needed for other interventions.

Database of results (Do we really need another database?)

Don’t we already have lots of databases?

Database: # of studies
• IE2 – Impact Evaluations in Education: 288
• 3ie Impact Evaluation database (Education): 773
• AEA RCT Registry: 55*
• Evans-Popova collection: 322
• Others…

* Completed only, both developing & rich countries

What would make it worthwhile?
• Real-time updating
• Systematic reporting of standardized effect sizes
• Systematic reporting of implementation details

What would it allow?
• Just-in-time queries
• Auto-updating meta-analysis
• What do the 10 most effective pedagogical interventions look like?
• What do the 50 most effective programs overall look like?
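As a rough sketch of what a just-in-time query could look like, suppose each database entry stored a standardized effect size and an intervention category. The record layout (`Result`), the helper `top_k`, and every entry below are hypothetical; no existing database is assumed to expose this interface.

```python
from dataclasses import dataclass


@dataclass
class Result:
    study: str
    category: str
    effect: float     # standardized effect size, in SD units
    std_error: float


def top_k(results, k, category=None):
    """Return the k largest effects, optionally restricted to one category."""
    pool = [r for r in results if category is None or r.category == category]
    return sorted(pool, key=lambda r: r.effect, reverse=True)[:k]


# Hypothetical entries, for illustration only.
results = [
    Result("study_a", "pedagogy", 0.30, 0.08),
    Result("study_b", "pedagogy", 0.10, 0.05),
    Result("study_c", "materials", 0.05, 0.04),
    Result("study_d", "pedagogy", 0.22, 0.10),
]

for r in top_k(results, k=2, category="pedagogy"):
    print(f"{r.study}: {r.effect:.2f} SD (SE {r.std_error:.2f})")
```

Ranking by point estimate alone ignores precision; a real query would probably also filter or weight by standard error, which is one more reason the database would need systematically reported effect sizes.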

Conclusions

• Systematic reviews in education could be much better
• They need more exhaustive searches, combined methodologies, and de-aggregation
• A coordinating body could help with systematic cataloguing

Sources

This presentation is based largely on

Evans, David, and Anna Popova. 2015. “What really works to improve learning in developing countries? An analysis of divergent findings in systematic reviews,” World Bank Policy Research Working Paper 7203. http://documents.worldbank.org/curated/en/2015/02/24060240/really-works-improve-learning-developing-countries-analysis-divergent-findings-systematic-reviews

It also draws on work from the ongoing project:

Popova, Anna, David Evans, and Violeta Arancibia. 2016. “Inside In-Service Teacher Training: What Works and How Do We Measure It?” Work in progress. World Bank.

The database of learning studies on which the analysis is based is available at https://sites.google.com/site/davidkevans/database-of-education-studies

Review references

Conn, K. (2014). “Identifying Effective Education Interventions in Sub-Saharan Africa: A meta-analysis of rigorous impact evaluations.” Unpublished manuscript, Columbia University, New York, NY.

Glewwe, P. W., Hanushek, E. A., Humpage, S. D., & Ravina, R. (2014). “School resources and educational outcomes in developing countries: a review of the literature from 1990 to 2010.” in Education Policy in Developing Countries, ed. Glewwe, P. University of Chicago Press: Chicago and London. https://test.equaleducation.org.za/attachment/download/2013-07-31-Glewwe-Hanushek-Humpage-Ravina-2011-NBER-w17554-1.pdf

Kremer, M., Brannen, C., & Glennerster, R. (2013). “The challenge of education and learning in the developing world.” Science, 340(6130), 297-300. http://www.schoolsandhealth.org/Shared%20Documents/The%20Challenge%20of%20Education%20and%20Learning%20in%20the%20Developing%20World.pdf

Krishnaratne, S., White, H., & Carpenter, E. (2013). “Quality education for all children? What works in education in developing countries.” 3ie Working Paper 20, International Initiative for Impact Evaluation. http://www.3ieimpact.org/media/filer_public/2013/09/10/wp_20.pdf

McEwan, P. (2014). “Improving Learning in Primary Schools of Developing Countries: A Meta-Analysis of Randomized Experiments.” Review of Educational Research. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.404.9086&rep=rep1&type=pdf

Murnane, R. J., & Ganimian, A.J. (2014). “Improving Educational Outcomes in Developing Countries: Lessons from Rigorous Evaluations.” Unpublished manuscript. http://www.nber.org/papers/w20284