TRANSCRIPT
EUROPEAN WEEK OF REGIONS AND CITIES OPEN DAYS
October 11, 2011
CAPTURING THE IMPACT OF EU FUNDING
The potential and limits of experimental methods
Robert Picciotto, European Evaluation Society
“The state of the empirical evidence on the performance of cohesion policy is very unsatisfactory” (Fabrizio Barca)
Citizens want to know whether EU programs actually work
• The great global economic contraction currently underway has created an extraordinarily tight fiscal environment
• EU cohesion funding (€350 billion for 2007–13) will only be sustained if it generates verifiable results.
• Limited progress over the past 20 years towards understanding “what works” and “why”
The new authorizing environment has raised the bar for evaluation quality
Evaluation quality implies trustworthiness through compelling narratives, logical reasoning and sound methods. According to Guba and Lincoln this calls for evaluation credibility (or accuracy); dependability (reliability); transferability (external validity); and confirmability (verifiability).
The role of impact evaluation
EU cohesion programs will have to be designed and implemented so that they are evaluable. Towards intended policy goals they should:
• Include relevant dimensions of expected program costs and benefits captured through relevant indicators;
• Measure changes in these indicators with acceptable accuracy throughout the monitoring and evaluation cycle; and
• Ascertain the extent to which observed changes can legitimately be ascribed to the intervention – the distinctive role of impact evaluation.
“Impact” means different things to different people
First definition: impact is the difference between results with the intervention (Y1) and without it (Y2), i.e. I = Y1 – Y2, measured at any time in the project cycle.
This definition focuses on attribution, i.e. on assigning causality to effects observed by comparing them to the counterfactual
Second definition: “the positive and negative, primary and secondary long-term effects produced by a development intervention, directly or indirectly, intended or unintended”.
This definition distinguishes between medium term outcomes and long term impacts
Different definitions imply different evaluation questions
• The first definition asks whether an intervention works
• The second definition is concerned with whether an intervention worked over time, why, and who was responsible
• The first definition focuses on attribution
• The second definition implies focused attention on accountability and sustainability
RCTs are specifically designed to ascertain attribution
• Before-after comparisons are the traditional way of assessing the impact of public interventions, and they are appropriate where no rival explanation for the changes exists
• But where influences other than the intervention could have affected the results the difference between the ex-ante and ex-post situation is not a reliable measure of the effects of an intervention
• Overcoming this indeterminacy is the basic rationale of with-without methods that compare observed outcomes with those of a counterfactual (what would have happened without the intervention)
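The indeterminacy of before-after comparisons can be illustrated with a small simulation (a hypothetical sketch, not part of the presentation; all numbers are invented): outcomes drift upward over time regardless of the program, so the before-after estimate absorbs the trend, while a with-without comparison against a comparable untreated group recovers the true effect.

```python
import random
import statistics

random.seed(1)

n = 5_000
trend = 1.5        # background improvement affecting everyone (a rival explanation)
true_effect = 0.5  # the intervention's real contribution

# Funded regions: observed before and after the intervention.
before = [random.gauss(10.0, 2.0) for _ in range(n)]
after_treated = [y + trend + true_effect for y in before]

# Comparable unfunded regions observed at the same end point.
after_untreated = [random.gauss(10.0, 2.0) + trend for _ in range(n)]

before_after = statistics.mean(after_treated) - statistics.mean(before)
with_without = statistics.mean(after_treated) - statistics.mean(after_untreated)

print(f"before-after estimate: {before_after:.2f}")  # absorbs the trend: trend + effect
print(f"with-without estimate: {with_without:.2f}")  # close to the true 0.5
```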
Hence, RCTs are an integral part of the evaluation tool kit
• RCTs establish causality by comparing observed treatment effects with the counterfactual
• Random assignment ensures that the impact of the intervention is reliably ascertained since all the other contributing factors are identical except for stochastic errors
• RCTs eliminate the selection bias that arises from the programmatic choice of intervention groups or from participants’ self-selection
• RCTs enjoy the additional advantage of allowing evaluators to establish a measure of statistical significance of likely program impacts
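As a minimal sketch of this logic (hypothetical, not part of the presentation), the simulation below randomly assigns units to treatment and control; because assignment is independent of everything else, the difference in group means estimates the treatment effect, and its standard error supports a statement of statistical significance.

```python
import random
import statistics

random.seed(42)

n = 10_000
true_effect = 2.0

treated, control = [], []
for _ in range(n):
    baseline = random.gauss(10.0, 3.0)  # all other contributing factors
    if random.random() < 0.5:           # random assignment, not self-selection
        treated.append(baseline + true_effect)
    else:
        control.append(baseline)

effect = statistics.mean(treated) - statistics.mean(control)
# Standard error of a difference in means (two independent groups).
se = (statistics.variance(treated) / len(treated)
      + statistics.variance(control) / len(control)) ** 0.5

print(f"estimated effect: {effect:.2f} ± {1.96 * se:.2f} (95% CI); true effect: 2.0")
```

With stochastic error only, the confidence interval covers the true effect; a systematically selected comparison group would instead shift the estimate by the selection bias.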
Experimental evaluations aim to replicate medical research successes
“Creating a culture in which rigorous randomised evaluations are promoted, encouraged and financed has the potential to revolutionize social policy during the 21st century just as randomised trials revolutionized medicine during the 20th” Esther Duflo
While the European Evaluation Society acknowledges that randomized controlled trials represent a useful approach in certain circumstances, its 2007 policy statement rejects the notion that experimental methods are a gold standard
(http://www.europeanevaluation.org/images/file/Member_section/EES%20Statement%20on%20Methodological%20Diversity/EES%20Statement.pdf)
RCTs are rarely feasible or relevant...
• RCTs are not fit for complex, changing or adaptable programs
• RCTs are not feasible when no untreated target group can be identified
• RCTs’ “black boxes” do not distinguish between design and implementation issues
• RCTs cost a lot, require large studies and call on exceedingly scarce skills
• RCTs encourage simplistic interventions
• Ethical standards may prevent experimentation
... and it is very hard to achieve rigour in RCTs
Credibility (accuracy)
• “Indicators of interest” may leave out unintended, indirect and secondary effects
• Securing adequate group sizes can be problematic
Dependability (reliability)
• For programs with modest effects (high noise levels) RCTs can lead to the erroneous conclusion that “nothing works”
Transferability (external validity)
• RCT conclusions cannot be generalised since the groups affected and the circumstances in which an intervention takes place vary
Confirmability (verifiability)
• Observed behaviour may be affected by the experiment itself
Alternatives to RCTs’ estimations of the counterfactual do exist
• Regression and factor analysis
• Quasi-experimental designs
• Multivariate statistical modelling
• Qualitative approaches
• Surveys and sampling
• General elimination methodology
• Expert panels
• Benchmarking
Tools are just tools
• Since doctrinal differences cannot be reconciled it is best to focus on real-world solutions to real-world problems
• Understanding the limits of tools is key to evaluation quality
• Making these limits explicit is an ethical imperative
• All guidelines give equal weight and credence to qualitative and quantitative approaches
• Overinvestment in a particular technique is a threat to quality
Conclusion: RCTs have an important but limited role in evaluation
Where adequate resources and skills are available and ethical dilemmas can be resolved, rigorously designed and independently implemented RCTs are the best way to assess attribution – but only for relatively simple interventions whose effects are realised within a short period and are large relative to other potential influences.
“The only gold standard is appropriateness” (Patton)
1. In most real world situations mixed methods are the way to ascertain what works and what doesn’t; why interventions succeed or fail; whether design or implementation problems need to be addressed; and who among partners is responsible for particular outcomes
2. An explicit focus on the political and organizational constraints is equally critical
3. Methodological fundamentalism has no place in the evaluation community: a tolerant and pragmatic approach ought to prevail
THANK YOU FOR YOUR ATTENTION!
References
Michael Bamberger, Vijayendra Rao and Michael Woolcock, Using Mixed Methods in Monitoring and Evaluation: Experiences from International Development, Brooks World Poverty Institute Working Paper 107, University of Manchester
Ernest R. House, 2004, Democracy and Evaluation, Paper presented to the European Evaluation Society, Berlin, October, IMA publications (http://www.informat.org/publications/ernest-r-house.html)
Ernest R. House, 2008, Blowback: Consequences of evaluation for evaluation, American Journal of Evaluation, 29:416, December.
John P.A. Ioannidis, 2005a, Why most published research findings are false, PLoS Medicine 2 (8)
John P.A. Ioannidis, 2005b, Contradicted and initially stronger effects in highly cited clinical research, Journal of the American Medical Association, 294 (2)
F. Leeuw and J. Vaessen, 2009, Impact evaluations and development: NONIE guidance on impact evaluation, World Bank, Washington D.C.
H. White, 2009a, Some Reflections on Current Debates in Impact Evaluation, International Initiative for Impact Evaluation, Working Paper 1, New Delhi
H. White, 2009b, Theory based Impact Evaluation: Principles and Practice, International Initiative for Impact Evaluation, Working Paper 3, New Delhi