Programme Evaluation for Policy Analysis


Page 1: Programme Evaluation for Policy Analysis

PEPA is based at the IFS and CEMMAP
© Institute for Fiscal Studies

Programme Evaluation for Policy Analysis
Mike Brewer, 4 October 2011
www.pepa.ac.uk

Page 2: Programme Evaluation for Policy Analysis

Outline

• Who we are

• Overview and aims

• The 5 projects

• Training and capacity building


Page 3: Programme Evaluation for Policy Analysis

Who we are: PI and co-Is
• Richard Blundell, UCL & IFS
• Mike Brewer, University of Essex & IFS
• Andrew Chesher, UCL & IFS
• Monica Costa Dias, IFS
• Thomas Crossley, Cambridge & IFS
• Lorraine Dearden, IoE & IFS
• Hamish Low, Cambridge & IFS
• Costas Meghir, Yale & IFS
• Imran Rasul, UCL & IFS
• Adam Rosen, UCL
• Barbara Sianesi, IFS
• DWP is a “partner”


Page 4: Programme Evaluation for Policy Analysis

Programme Evaluation for Policy Analysis: overview
• PEPA is about ways to do, and ways to get the most out of, “programme evaluation”
  – that is, “estimating the causal impact of government policies” (although the methods can often generalise)

Page 5: Programme Evaluation for Policy Analysis

Programme Evaluation for Policy Analysis: overview
• PEPA is about ways to do, and ways to get the most out of, “programme evaluation”
• Aims
  – To stimulate a step change in the conduct of programme evaluation in the United Kingdom (and around the world)
  – To maximise the value of programme evaluation by improving the design of evaluations, and improving the way that such evaluations add to the knowledge base
• Beneficiaries
  – those who do programme evaluation
  – those who commission, design and make decisions based on the results of evaluations
  – those interested in the impact of labour market, education and health policies

Page 6: Programme Evaluation for Policy Analysis

More on our aims: three challenges for programme evaluation

1. We know the outcomes for participants on a training programme. But what was the counterfactual?

2. Given the counterfactual, we can estimate the programme’s impact. But how certain are we?

3. Given that the evaluation has been done, how can we get the most value from it?
  – How can we generalise what we learn from this evaluation to other training programmes?
  – How should we synthesise the lessons learned from multiple studies of different training programmes?


Page 7: Programme Evaluation for Policy Analysis

PEPA: overview

1. Are RCTs worth it? – Barbara Sianesi, Jeremy Lise
2. Inference – Thomas Crossley, Mike Brewer, Marcos Hernandez, John Ham
3. Control functions and evidence synthesis – Richard Blundell, Adam Rosen, Monica Costa Dias, Andrew Chesher
4. Structural dynamic models – Hamish Low, Monica Costa Dias, Costas Meghir
5. Social networks – Imran Rasul, Marcos Hernandez
0. Core programme evaluation skills

Page 8: Programme Evaluation for Policy Analysis


1. Making the most of RCTs: reassessing ERA (Sianesi & Lise)
• The Employment, Retention and Advancement demonstration (2003–2007)
  – the first large-scale RCT in social policy in the UK (over 16,000 people)
  – has been evaluated experimentally (Hendra et al., 2011)
• Aim: maximise the value of the ERA experiment
  – Improve the design of non-experimental evaluations
  – Improve the way such evaluations add to the knowledge base
• “Gold standard” randomisation is still rare
  – costly, impractical or politically infeasible → Project 1a
  – lack of external validity and ex ante analysis → Project 1b

Page 9: Programme Evaluation for Policy Analysis


1a. Lessons for non-experimental methods (Sianesi)
• Non-experimental evaluation methods have been assessed against an experimental benchmark in only a small number of US studies from the 1970s and 1980s
• Exploit a recent, UK-based randomised experiment to learn about – and possibly improve upon – the performance of non-experimental methods routinely used in UK evaluations:
  – pilot–control areas
  – individual matching
  – difference-in-differences
• The experimental estimates will be compared against the best alternative that can be devised with the available data (a stylised sketch of this benchmarking exercise follows)
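To make the benchmarking exercise concrete, here is a minimal Python sketch on simulated data (not the ERA microdata): an experimental difference in means serves as the benchmark for a naïve non-experimental contrast and a one-nearest-neighbour propensity-score match. The covariates, coefficients and selection rule are all invented for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5000
age = rng.normal(40, 10, n)
unemp_dur = rng.exponential(12, n)
x = sm.add_constant(np.column_stack([age, unemp_dur]))

# Experimental benchmark: randomised treatment, so a difference in means is unbiased.
d_rct = rng.integers(0, 2, n)
y_rct = 100 + 2.0 * d_rct + 0.5 * age - 0.3 * unemp_dur + rng.normal(0, 5, n)
benchmark = y_rct[d_rct == 1].mean() - y_rct[d_rct == 0].mean()

# Non-experimental data: selection into treatment depends on observables.
p = 1 / (1 + np.exp(-(-2.0 + 0.04 * age + 0.02 * unemp_dur)))
d = rng.binomial(1, p)
y = 100 + 2.0 * d + 0.5 * age - 0.3 * unemp_dur + rng.normal(0, 5, n)
naive = y[d == 1].mean() - y[d == 0].mean()  # biased: treated differ in age/duration

# Propensity-score matching: one nearest neighbour on the estimated score.
ps = sm.Logit(d, x).fit(disp=0).predict(x)
treated, controls = np.where(d == 1)[0], np.where(d == 0)[0]
nn = controls[np.abs(ps[treated][:, None] - ps[controls][None, :]).argmin(axis=1)]
att = (y[treated] - y[nn]).mean()

print(f"experimental benchmark {benchmark:.2f}, naive {naive:.2f}, matched {att:.2f}")
```

Because selection here is on observables only, matching recovers something close to the benchmark; the project asks how far this carries over when that assumption is doubtful.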

Page 10: Programme Evaluation for Policy Analysis

1b. A reassessment of the ERA (Lise)

• Can experimental data be combined with behavioural models of labour market behaviour to lead to better ex ante evaluations?
• Methodology (a stylised sketch follows)
  – take a typical search and matching model, and calibrate it to match the data on the ERA comparison group
  – simulate the ERA policy within the model
  – check whether simulated outcomes match the observed data for ERA participants
• Experimental variation allows testing of the theoretical model
• If simulated outcomes match ERA participants’ outcomes, then:
  – can use simulations to evaluate alternative ERA policies ex ante
  – can see how the estimate of the policy impact changes once interactions with the wider labour market are taken into account
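A stylised sketch of the calibrate-then-simulate step, using a simple McCall job-search model as a stand-in for the richer search and matching model the project will use. The discount factor, benefit level, target job-finding moment and bonus size are all made up, and the ERA-style earnings supplement is treated as permanent for simplicity.

```python
import numpy as np

beta, b = 0.99, 0.4                    # discount factor, income while unemployed
wages = np.linspace(0.5, 1.5, 101)     # support of the wage-offer distribution
f = np.ones_like(wages) / wages.size   # uniform offer probabilities

def job_finding_rate(lam, bonus=0.0):
    """Solve the accept/reject problem by value iteration; return P(exit unemployment)."""
    W = (wages + bonus) / (1 - beta)   # value of a job (plus any in-work supplement)
    U = 0.0
    for _ in range(5000):
        U_new = b + beta * ((1 - lam) * U + lam * np.sum(np.maximum(W, U) * f))
        if abs(U_new - U) < 1e-10:
            break
        U = U_new
    return lam * np.sum((W >= U) * f)  # P(offer arrives and is accepted)

# "Calibrate": choose the offer arrival rate that matches the comparison group's
# observed monthly job-finding rate (0.10 here, an invented moment).
grid = np.linspace(0.05, 0.9, 200)
lam_hat = grid[np.argmin([abs(job_finding_rate(l) - 0.10) for l in grid])]

# "Simulate the policy": an ERA-style in-work supplement raises the value of work,
# lowers the reservation wage, and so raises the simulated exit rate.
print(f"baseline {job_finding_rate(lam_hat):.3f}, "
      f"with supplement {job_finding_rate(lam_hat, bonus=0.1):.3f}")
```

The project's test is then whether the simulated treated outcomes line up with the experimental ERA estimates, which this toy model cannot speak to.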


Page 11: Programme Evaluation for Policy Analysis

2. Improving inference for policy evaluation (Crossley, Brewer, Hernandez, Ham)
• Critical to characterise the uncertainty of estimates (and thus perform inference correctly)
• This can be hard when
  – data have a multi-level structure and there is serial correlation in the treatment and in group-level shocks
  – the estimated policy impacts are complex and discontinuous functions of estimated parameters
• Similarly, it can be hard to perform power calculations in all but the simplest RCTs
• Aims
  – Review, disseminate and (hopefully) develop techniques
  – Provide resources
  – Substantive applications: the impact of labour market or welfare-to-work programmes

Page 12: Programme Evaluation for Policy Analysis

2a. Inference and power in diff-in-diff (Crossley, Brewer, Hernandez)
• A common evaluation technique is to use diff-in-diff over areas and time
• Serially-correlated errors and the group-level structure of the data mean naïve inference is often incorrect (standard errors “too small”; Bertrand et al., 2004)
  – But most solutions work only for a “large” number of groups, and the literature is evolving much faster than practice
• Aims
  – Demonstrate the problems for inference caused by serially-correlated and multi-level data, and the practicality and relevance of a range of suggested solutions, providing resources where appropriate
  – Develop new tools for inference (see the sketch after this list)
    • randomisation/permutation tests
    • serial correlation in the non-linear DiD
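A minimal sketch of the problem and two standard responses, on simulated area-by-period data with AR(1) group-level shocks and a true effect of zero: naïve OLS standard errors versus standard errors clustered on the area, plus a simple permutation test. The design (50 areas, 10 periods, ρ = 0.8) is illustrative.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(1)
n_areas, n_periods = 50, 10
area = np.repeat(np.arange(n_areas), n_periods)
t = np.tile(np.arange(n_periods), n_areas)
d = ((area < n_areas // 2) & (t >= n_periods // 2)).astype(int)  # policy on in half the areas

# AR(1) area-level shocks generate the serial correlation that breaks naive SEs.
e = np.zeros((n_areas, n_periods))
for s in range(1, n_periods):
    e[:, s] = 0.8 * e[:, s - 1] + rng.normal(0, 1, n_areas)
df = pd.DataFrame({"y": e[area, t], "d": d, "area": area, "t": t})  # true effect = 0

fit = smf.ols("y ~ d + C(area) + C(t)", data=df).fit()
fit_cl = smf.ols("y ~ d + C(area) + C(t)", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["area"]})
print(f"naive SE {fit.bse['d']:.3f} vs area-clustered SE {fit_cl.bse['d']:.3f}")

# Randomisation inference: re-assign "treated" areas at random and compare the
# actual coefficient with the permutation distribution of placebo coefficients.
perm = []
for _ in range(200):
    fake = rng.choice(n_areas, n_areas // 2, replace=False)
    df["dp"] = (np.isin(area, fake) & (t >= n_periods // 2)).astype(int)
    perm.append(smf.ols("y ~ dp + C(area) + C(t)", data=df).fit().params["dp"])
print(f"permutation p-value {np.mean(np.abs(perm) >= abs(fit.params['d'])):.2f}")
```

With 50 clusters the cluster-robust correction works well; the harder cases the project targets are those with few groups, where permutation-style inference is one of the candidate tools.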


Page 13: Programme Evaluation for Policy Analysis

2a. Inference and power in diff-in-diff (Crossley, Brewer, Hernandez)
• The flip side of inference is a power calculation
• Will produce resources to carry out power calculations for non-experimental designs (a simulation-based sketch follows):
  – difference-in-differences
  – instrumental variables
  – regression discontinuity
• Power calculations will reflect:
  – Cluster effects: observations from different agents are not independent of each other
  – Monte Carlo methods to deal with a small number of clusters
  – Different patterns of time-series correlation
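A sketch of a simulation-based power calculation for this kind of design: simulate many datasets under a hypothesised effect size, estimate the DiD model with clustered standard errors each time, and record the rejection rate. All design parameters below are placeholders.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

def did_power(effect, n_areas=20, n_periods=8, rho=0.8, sims=300, alpha=0.05, seed=0):
    """Monte Carlo power: share of simulated datasets in which the effect is detected."""
    rng = np.random.default_rng(seed)
    area = np.repeat(np.arange(n_areas), n_periods)
    t = np.tile(np.arange(n_periods), n_areas)
    d = ((area < n_areas // 2) & (t >= n_periods // 2)).astype(int)
    reject = 0
    for _ in range(sims):
        e = np.zeros((n_areas, n_periods))          # AR(1) shocks within each area
        for s in range(1, n_periods):
            e[:, s] = rho * e[:, s - 1] + rng.normal(0, 1, n_areas)
        df = pd.DataFrame({"y": effect * d + e[area, t], "d": d, "area": area, "t": t})
        fit = smf.ols("y ~ d + C(area) + C(t)", data=df).fit(
            cov_type="cluster", cov_kwds={"groups": df["area"]})
        reject += fit.pvalues["d"] < alpha
    return reject / sims

# Trace power against effect size to find the minimum detectable effect.
for eff in (0.25, 0.5, 1.0):
    print(eff, did_power(eff))
```

The same simulate-estimate-reject loop extends naturally to IV and regression-discontinuity designs by swapping the data-generating process and estimator.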


Page 14: Programme Evaluation for Policy Analysis

2b. Inference in duration analysis (Brewer, Ham)
• Duration/survival or transition models are natural tools for programme evaluation when the outcomes of interest are spells or transitions
  – Estimated policy impacts are often complex, discontinuous functions of the estimated parameters of the statistical model
• Will establish how best to use event-history models to provide policy-makers with
  – estimates of the impact of a policy on the hazard rate
  – expected time spent in various states
  – correct confidence intervals around both (see the sketch below)
• Will build on Eberwein, Ham and Lalonde (2002), Ham and Woutersen (2009) and Ham, Li and Sheppard (2010)
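A minimal sketch of the chain from estimated hazard-model parameters to the quantities policy-makers ask about, using simulated exponential durations: the treatment effect on the hazard, the implied change in expected time in the state, and a bootstrap confidence interval for that non-linear transform (a case where the delta method can be awkward). Data and parameter values are invented.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 4000
d = rng.integers(0, 2, n)
hazard = np.exp(-2.0 + 0.4 * d)        # treatment raises the exit hazard
dur = rng.exponential(1 / hazard)      # spell lengths (no censoring, for simplicity)

def estimates(dur, d):
    # Exponential MLE: group hazard = number of exits / total time at risk.
    h0 = (d == 0).sum() / dur[d == 0].sum()
    h1 = (d == 1).sum() / dur[d == 1].sum()
    # Policy-relevant transforms: hazard ratio and change in expected duration.
    return h1 / h0, 1 / h1 - 1 / h0

hr, ddur = estimates(dur, d)

# Nonparametric bootstrap CI for the expected-duration effect.
boots = []
for _ in range(999):
    idx = rng.integers(0, n, n)
    boots.append(estimates(dur[idx], d[idx])[1])
lo, hi = np.percentile(boots, [2.5, 97.5])
print(f"hazard ratio {hr:.2f}; change in expected duration {ddur:.2f} [{lo:.2f}, {hi:.2f}]")
```

Real applications add censoring, covariates and duration dependence, which is exactly where the papers cited above come in.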


Page 15: Programme Evaluation for Policy Analysis

3. Control functions in policy evaluation (Blundell, Costa Dias, Rosen, Chesher, Kitagawa)
• Choice among alternative evaluation methods is driven by three concerns
  – Question to be answered
  – Type and quality of data available
  – Assignment rule (the mechanism that allocates individuals to the programme)
• This project focuses on the last
• Idea
  – The ideal assignment rule comes from an RCT
  – But if we know something about the assignment rule, then the control function approach allows us to account for, and correct for, the endogenous selection into treatment

Page 16: Programme Evaluation for Policy Analysis

3. The control function approach: example
• Interested in the impact of university education on subsequent labour market earnings (the “returns to university education”)
• Unobservable determinants of earnings, e.g. underlying ability, will be correlated with the decision to attend university, so a simple regression will provide a biased view of the returns to university
• By modelling key features of the decision to attend university – the “assignment rule” to university – the control function approach can correctly recover the average return to university among those who took up a place

Page 17: Programme Evaluation for Policy Analysis

3. The control function approach: example (continued)
• These key features will ideally be factors that determine assignment to university but do not directly determine final earnings in the labour market
  – Family socio-economic background, the level of university fees, distance to university, availability of university places (if rationed)
• If we can write down an equation modelling the way these factors determine university attendance, we can construct an index (or “control function”) that can then be included in the earnings regression alongside the indicator for attending university (a stylised two-step sketch follows)
  – An extension of the “Heckman” selection approach that controls for endogenous selection into treatment
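A stylised two-step implementation of the university example under joint normality (the Heckman-style case). The instrument (distance to university), coefficients and data are all invented; the point is only that the generalised residual from the attendance probit, added to the earnings regression, absorbs the ability-driven selection that biases OLS.

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import norm

rng = np.random.default_rng(3)
n = 5000
dist = rng.normal(0, 1, n)            # distance to university: excluded from earnings
ability = rng.normal(0, 1, n)         # unobserved; drives both attendance and earnings
u = rng.normal(0, 1, n)
d = (1.0 - 0.8 * dist + ability > 0).astype(int)   # "assignment rule" to university
logw = 2.0 + 0.3 * d + 0.5 * ability + u           # true return is 0.30

ols = sm.OLS(logw, sm.add_constant(d)).fit()       # biased upward by ability

# Step 1: probit for attendance using the assignment-rule variables.
z = sm.add_constant(dist)
pr = sm.Probit(d, z).fit(disp=0)
xb = z @ pr.params                                 # estimated probit index

# Step 2: the generalised residual is the control function; including it
# alongside the attendance dummy soaks up the selection term.
lam = np.where(d == 1, norm.pdf(xb) / norm.cdf(xb),
               -norm.pdf(xb) / (1 - norm.cdf(xb)))
cf = sm.OLS(logw, sm.add_constant(np.column_stack([d, lam]))).fit()
print(f"OLS {ols.params[1]:.2f} vs control function {cf.params[1]:.2f} (true 0.30)")
```

The project's questions start where this textbook case ends: what survives when normality fails, and what bounds are available when only part of the assignment rule is known.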

Page 18: Programme Evaluation for Policy Analysis

3. The control function approach: our research
• Research questions:
  – Under what circumstances does the use of a control function compare favourably to matching and instrumental variables? What are the key trade-offs?
  – How does a control function approach map into a behavioural model? What can a control function approach tell us about structural parameters of interest?
  – Can we weaken the control function approach by incorporating partial knowledge of the assignment rule to produce bounds?
• Will study various education and labour market policies

Page 19: Programme Evaluation for Policy Analysis

4. Dynamic behavioural models for policy evaluation (Low, Dias, Shaw, Meghir, Pistaferri)

• Classical ex post empirical evaluation methods often fail to explain the nature of the estimated effect
  – Cannot disentangle the impact of the programme on incentives from how incentives affect individual decisions
  – Cannot account for dynamic responses (anticipation, or changes now that affect decisions in the future)
  – Studies often rely on different sets of behavioural assumptions
    • Difficult to understand, as these are not explicitly stated
    • Complicates the task of synthesising information from different studies
  – Cannot be used for counterfactual analysis
    • Results are specific to the policy, time and environment

Page 20: Programme Evaluation for Policy Analysis

4. Dynamic behavioural models for policy evaluation

• Aim: to address these weaknesses using a structural (dynamic behavioural) approach
  – Explicitly formalises incentives and decisions
  – But relies on a heavy set of (explicit) behavioural assumptions
• Will study ways to make minimal and transparent assumptions
  – Use quasi-experimental data to estimate and validate models of behaviour
  – Explore the use of optimality conditions – independent of the full structure of the model – to estimate some parameters
  – Use robust estimates of bounds on treatment effects to bound structural parameters


Page 21: Programme Evaluation for Policy Analysis

4. Dynamic behavioural models for policy evaluation: applications

• Impact of welfare time limits
  – Develop a dynamic model to study how time limits on welfare eligibility may affect claiming decisions at different stages of life (a stylised sketch follows)
  – Use the US programme Temporary Assistance for Needy Families (TANF) as the empirical application
  – Our model will replicate, and then generalise, previous empirical results
• Impact of welfare-to-work on education
  – Use a structural behavioural model of education and labour supply choices to evaluate how future welfare-to-work programmes affect the ex ante value of education
  – Use evaluation studies to validate the behavioural assumptions
  – Use partial identification to provide bounds for structural parameters
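A stylised sketch of the kind of model involved: a finite-horizon claiming problem in which welfare can be received for at most K periods over the life cycle, so claiming today uses up an option for the future. All parameters are illustrative and the model is far simpler than the project's.

```python
import numpy as np

T, K = 40, 5                           # periods of life; lifetime limit on receipt
benefit, beta = 1.0, 0.95
wage_grid = np.linspace(0.2, 2.0, 21)  # iid wage draw each period, uniform weights

# State: (t, k) with k periods of eligibility left.
# Choice each period: claim the benefit (k falls by one) or work at the drawn wage.
V = np.zeros((T + 1, K + 1))           # terminal value is zero
claim_rate = np.zeros((T, K + 1))
for t in range(T - 1, -1, -1):
    for k in range(K + 1):
        v_claim = benefit + beta * V[t + 1, max(k - 1, 0)] if k > 0 else -np.inf
        v_work = wage_grid + beta * V[t + 1, k]        # one value per wage draw
        claim = v_claim > v_work
        claim_rate[t, k] = claim.mean()                # P(claim) over wage draws
        V[t, k] = np.mean(np.where(claim, v_claim, v_work))

# With a binding limit, eligibility has option value: claiming is rationed early
# in life and rises towards T - the forward-looking behaviour that purely ex post
# estimators cannot disentangle.
print(claim_rate[[0, 20, 39], K])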

Page 22: Programme Evaluation for Policy Analysis

5. Social networks and programme evaluation (Rasul, Fitzsimons, Hernandez, Malde)

• To understand individuals’ or households’ behaviour, we must recognise that individuals are embedded within social networks
• In developing countries, networks play various roles:
  – substitute for missing markets
  – key source of insurance and other resources for their members
• Will seek to understand how networks interact with policy interventions
• Will combine developments in theories of network formation and behaviour within networks with empirical methods for programme evaluation with social interactions

Page 23: Programme Evaluation for Policy Analysis

5. Social networks and programme evaluation: the example of Progresa

• Progresa is a village-level intervention in rural Mexico. Previous research has shown that:
  – 1 in 5 households are “isolated” (none of their extended family resides within the same village)
  – On some margins, only non-isolated households responded to Progresa (a stylised sketch of this contrast follows)
• Was this because poor families needed assistance and encouragement to join the programme?
• Or was it because of the nature of the Progresa intervention, part of which was to encourage teenage girls to stay in school?
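A minimal sketch of the isolated-versus-connected contrast on simulated data: interact the treatment indicator with household isolation and test whether the response differs. The outcome, effect size and sampling scheme are invented; in the real Progresa application treatment is assigned at the village level, so standard errors would be clustered on the village.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(5)
n = 6000
df = pd.DataFrame({
    "treat": rng.integers(0, 2, n),       # household in a Progresa village
    "isolated": rng.binomial(1, 0.2, n),  # ~1 in 5 with no extended family in village
})
# Suppose only connected (non-isolated) households respond, with effect 0.3;
# these numbers are made up purely for illustration.
df["outcome"] = 0.3 * df["treat"] * (1 - df["isolated"]) + rng.normal(0, 1, n)

fit = smf.ols("outcome ~ treat * isolated", data=df).fit()
# 'treat' is the effect for connected households; 'treat:isolated' tests whether
# isolated households respond differently (here it offsets the effect entirely).
print(fit.params[["treat", "treat:isolated"]])
```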

Page 24: Programme Evaluation for Policy Analysis

5. Social networks and programme evaluation

• Substantive research questions
  – How are the benefits of programme interventions dissipated within communities once social networks are accounted for?
  – How do such spillovers (from beneficiary to non-beneficiary households) affect the cost–benefit analysis of programmes, and how we think about targeting?
  – Why and how are social networks formed? (We can investigate this by studying particular interventions)
• Methodological research questions
  – How best to measure whether and how households are socially tied (blood ties, resource flows)?

Page 25: Programme Evaluation for Policy Analysis

PEPA: research questions

1. Are RCTs worth it?
  – Can non-experimental methods replicate the results of RCTs?
  – How can we combine results from RCTs with models of labour market behaviour?
  – How do GE effects alter the estimated impact of training programmes?

2. Inference
  – Correct inference and power calculations where data have a multi-level structure and serially-correlated shocks?
  – Correct inference when policy impacts are complex functions of estimated parameters?
  – Impact of time-limited in-work benefits on job retention?

3. Control functions and evidence synthesis
  – Can we weaken the control function approach to estimate bounds?
  – Link between control functions and structural or behavioural models?
  – How are lessons from multiple evaluations best synthesised?

4. Structural dynamic models
  – How best to use ex post evaluations in ex ante analysis?
  – How are education decisions affected by welfare-to-work programmes?
  – How do life-cycle time limits on welfare receipt affect behaviour?

5. Social networks
  – How best to collect data on social networks?
  – How is the impact of policy affected by the social networks within and between treated and control groups?
  – Can social networks explain heterogeneity in the impact of a health intervention?


Page 26: Programme Evaluation for Policy Analysis

Training and capacity building
• Mixture of courses, masterclasses, workshops and resources (how-to manuals, software)
• All projects have their own TCB programme
• Plus a core TCB offering in general programme evaluation skills
  – 4 “standard” courses/year and 1 “advanced” course/year
  – 1 course/year for those designing or commissioning evaluations


Page 27: Programme Evaluation for Policy Analysis

PEPA: training and capacity building

0. Core programme evaluation skills
  – Core course in evaluation methods
  – Courses in programme evaluation for designers and users of evaluations
  – How-to guide for PS matching

1. Are RCTs worth it?
  – Course on estimating “search models”
  – Workshop on using “search models”
  – Workshop on the value of RCTs

2. Inference
  – Course, manual and software tools on power calculations
  – Workshop, course, methods survey, manual and software tools on correct inference in evaluations
  – Workshop and courses on using survival models for policy evaluation

3. Control functions and evidence synthesis
  – Course and workshop on control functions in policy evaluation
  – Workshop on evidence synthesis
  – Course on bounds in policy evaluation

4. Structural dynamic models
  – Courses, manual and software tools on building dynamic behavioural models
  – Workshop on dynamic behavioural models and policy evaluation

5. Social networks
  – Methods survey, course and workshop on collecting and using data on social networks


Page 28: Programme Evaluation for Policy Analysis

PEPA management and administration team
• Director
  – Now until October 2012: Mike Brewer
  – April 2012 onwards: Lorraine Dearden
• Co-director: Monica Costa Dias
• Administrator: Kylie Groves
• IT: Andrew Reynolds

• DWP is a partner organisation, in the hope that this eases access to their data. In practice, we are very reliant on a key contact (Mike Daly)
