
Evaluating Global Health

Margaret McConnell

(Asst. Prof. of Global Health Economics)

September 29, 2014

Program Evaluation

Essential Question: How effective is this program?

Who wants to know: NGOs, governments, donors, media, program recipients

Program evaluations (in particular randomized trials) can be costly, complex, and time-consuming -- so why bother?

Motivating Example

“Throwing the Baby out with the Drinking Water: Unintended Consequences of Arsenic Mitigation Efforts in Bangladesh”

Field, Glennerster and Hussam (2011)

Background

In 1998, geological tests revealed that the drinking water of more than 20 million Bangladeshis was contaminated with arsenic

Some evidence that arsenic is linked to negative health outcomes

Wells had been built with international support and heavily promoted

“Obvious” solution to arsenic contamination

Replace backyard shallow wells with remote deep wells

Massive public health campaign

Hugely successful – considered a public health triumph!

Estimated 23% of individuals changed their water source

What does program evaluation say?

Households living near contaminated water sources had 27% higher infant and child mortality rates after the successful public health campaign

Why?

Increased use of secondary water sources

Contamination of stored water

Large increase in diarrheal deaths

Lessons

Unintended Consequences

Biological pathways are not deterministic: human behavior plays an important role

Many examples of evaluations that no one wanted to fund (the health effects were assumed to be obvious) but that ultimately showed no health effects

What kinds of questions can evaluations answer?

• Was the program effective?
• Why was the program effective?
• Which components made the program effective?
• Were there unintended consequences?
• Who benefited most? Who was hurt most?
• How cost effective was the program?
• How does the program compare to other programs with the same objectives?

Evaluation Steps

• Needs assessment
• Program theory assessment
• Goals, outcomes and measurement
• Process evaluation
• Impact evaluation
• Cost-benefit + cost-effectiveness assessment

Needs Assessment

Needs: What problem is the program trying to solve?

Example: With deep water wells, problem was that local wells exposed households to arsenic

Program Theory

• Program theory: What mechanism will the program use to solve the problem?
• Theory of Change
– Should include both clinical and behavioral understanding
– Ideally you should know these before undertaking an evaluation
– Will inform goals, outcomes and measurement

Theory of Change

Ex: Arsenic

Deep water wells built → households use deep water wells exclusively for water → households not exposed to arsenic

What’s missing? Understanding of household behavior

Process evaluation

• Are services and goals in line?
• Are services delivered as intended?
• How well is service delivery organized?
• How well is the program managed?
• Monitoring and data collection is essential
– If you do an evaluation without understanding processes, you won’t learn much
– Program can be highly effective on process but have no positive health impacts: how?
• Ex: has the wrong theory of change

Impact Evaluation

• What impact does the program have on
– Behaviors
– Outcomes of interest (preferably health outcomes)
– Example:

• Not enough to measure whether a public health campaign convinced people to change drinking water sources

• Impact evaluation would also measure the effect of this change on child deaths

• Optimal learning will have broad consideration of outcomes

Evaluation Process

Impact Evaluation

• Impact evaluations are intended to identify causal relationships

What is Causality?

Cause (kôz): n. 1. a. The producer of an effect, result or consequence. b. The one, such as a person, event, or a condition, that is responsible for an action or a result.

v. 1. To be the cause of or reason for; result in. 2. To bring about or compel by authority or force.

Isolating the Causal Channel

If we change X, what is the effect on Y?

[Diagrams: X → Y directly; and X ← Z → Y, where a confounder Z drives both X and Y]

Isolating the Causal Channel

Correlation between X and Y includes:

– X (flu vaccine) may cause Y (lower risk of mortality)

– Z (income) is a confounder

Role of impact evaluation is to isolate this causal channel

Is X causing health improvements?
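To make the confounding story concrete, here is a minimal simulation sketch (not from the lecture; the variable names and magnitudes are invented) in which the vaccine has zero true effect, yet a naive comparison suggests it saves lives, because income drives both vaccination and mortality:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Z: confounder (income), standardized
income = rng.normal(size=n)

# X: higher-income people are more likely to vaccinate
vaccinated = (income + rng.normal(size=n) > 0).astype(int)

# Y: a mortality-risk index that falls with income; the vaccine has NO true effect here
mortality_risk = 0.10 - 0.03 * income + rng.normal(scale=0.01, size=n)

# Naive comparison of vaccinated vs. unvaccinated picks up the income effect
naive = mortality_risk[vaccinated == 1].mean() - mortality_risk[vaccinated == 0].mean()
print(f"naive 'vaccine effect': {naive:.4f}")  # clearly negative despite a true effect of zero
```

The negative number is pure confounding: it reflects who chooses to vaccinate, not what vaccination does.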

Influenza Vaccinations

Good topic for discussion of causality and impact evaluation. Why?

1) No definitive evidence of impact of flu vaccine on mortality

2) Despite this, widespread belief that flu vaccine is effective

3) Many confounders complicating interpretation of causality

4) Lots of $ is spent vaccinating people against seasonal flu

Many issues like this in global health. Can we be sure that this is the best use of funds?

Is there a higher-impact intervention to prevent mortality from flu?

Are there higher-impact interventions to prevent mortality period?

Causality and Confounders

• Does vaccination reduce mortality? (By how much?)
– That is, want to know the causal effect of X on Y

• Sometimes can control for the confounder in the regression
– Problem is when confounders are unknown or unmeasurable

• Evidence suggests omitted variables are important

– Various factors (economic, psycho-social, etc.) are likely to affect selection into the program and also to influence outcomes

– Also called “selection effects”

In most cases, throwing in a bunch of controls doesn’t solve the problem
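Continuing the same illustrative setup, a sketch of why controls only help for confounders you can measure: here income is observed but a "caution" trait is not, so an OLS regression that controls for income still returns a biased coefficient on vaccination (the true effect is zero; all names and numbers are invented):

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
income = rng.normal(size=n)    # observed confounder
caution = rng.normal(size=n)   # unobserved confounder

x = (income + caution + rng.normal(size=n) > 0).astype(float)
y = 0.10 - 0.03 * income - 0.02 * caution + rng.normal(scale=0.01, size=n)

def coef_on_x(controls):
    """OLS coefficient on x, with an intercept and the given control columns."""
    X = np.column_stack([np.ones(n), x] + controls)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print(coef_on_x([]))         # badly biased: picks up income AND caution
print(coef_on_x([income]))   # better, but still biased: caution is unmeasured
```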

Selection Problems

• Why can’t you compare those who choose to vaccinate and those who don’t?

• Individuals who choose to vaccinate could be different for other reasons
– Greater access to health care
– May be sicker
– May be more cautious (unobservable)

Counterfactuals

• How can we isolate the true causal effect of an intervention?

• Ideally want to take the same person, under the same circumstances, and observe what happens if we give him the intervention and if we don’t

– What would have happened to him in the absence of the intervention?

– This counterfactual is not observable

• The goal of impact evaluation is to construct or mimic the counterfactual as closely as possible
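In the potential-outcomes notation standard in this literature (a restatement of the slide, not a new result), the counterfactual for a treated person is the unobservable Y_i(0), and a naive comparison mixes the treatment effect with selection bias:

```latex
\underbrace{E[Y_i \mid X_i = 1] - E[Y_i \mid X_i = 0]}_{\text{observed difference}}
= \underbrace{E[Y_i(1) - Y_i(0) \mid X_i = 1]}_{\text{effect on the treated}}
+ \underbrace{E[Y_i(0) \mid X_i = 1] - E[Y_i(0) \mid X_i = 0]}_{\text{selection bias}}
```

Randomization makes X independent of the potential outcomes, so the selection-bias term is zero.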

Constructing the Counterfactual

• Usually we construct counterfactual by comparing group of people who received program to a group who did not

• How is that group selected?

The control group can be selected:

• Randomly: Use random assignment of the program to create a control group that mimics the counterfactual

• Non-randomly: Argue that a certain excluded group mimics the counterfactual
– Challenge: they could look like the program group on observables (ex: income) but not unobservables (ex: patience, tendency toward caution)

Advantages to Randomization

• Key advantage to random assignment: removes selection bias

• If assignment is truly random, then:

– Those who get the program and those who don’t will not differ systematically by observable or unobservable characteristics (i.e. they are statistically identical)

Implies any differences in outcomes between treatment and control must have been caused by the program

In our vaccine example, randomization ensures that Cov(X,Z) = 0, which means bias = 0

i.e., program participants don’t systematically differ when it comes to income (or anything else); see the sketch below
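A sketch of the same illustrative simulation under random assignment: the coin flip is independent of income, the sample covariance is approximately zero, and the simple difference in means recovers the true (zero) effect:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 100_000
income = rng.normal(size=n)

# Random assignment: a coin flip, independent of income (and everything else)
x = rng.integers(0, 2, size=n)

# Same outcome model as before: the true vaccine effect is zero
y = 0.10 - 0.03 * income + rng.normal(scale=0.01, size=n)

print(np.cov(x, income)[0, 1])              # ~0: Cov(X, Z) = 0 by design
print(y[x == 1].mean() - y[x == 0].mean())  # ~0: unbiased estimate of the true effect
```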

Types of Evaluations

Types of Impact Evaluation

Methods of impact evaluation that do not use random assignment to construct counterfactual:

1. Pre-post

2. Difference-in-differences

3. Statistical Matching

4. Instrumental Variables

5. Regression Discontinuity

6. Interrupted Time Series

Diff-in-Diff Evaluation Example: Wells in Bangladesh

• How to evaluate the impact of a public health campaign encouraging individuals not to use contaminated wells?

• Program already occurred – can’t be randomized.

• Make use of natural variation
– Arsenic concentration taken to be quasi-random
– Show it is uncorrelated with other measures of land quality

• Compare changes in health outcomes across contaminated and uncontaminated areas (see the sketch below)
– First difference: time (before & after)
– Second difference: contaminated vs uncontaminated
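A minimal arithmetic sketch of the double difference; the mortality numbers are invented for illustration and are not from Field, Glennerster and Hussam:

```python
# Child mortality per 1,000, by arsenic status and period (hypothetical numbers)
before = {"contaminated": 80.0, "uncontaminated": 70.0}
after  = {"contaminated": 95.0, "uncontaminated": 72.0}

# First difference: change over time within each group
d_cont   = after["contaminated"] - before["contaminated"]       # 15.0
d_uncont = after["uncontaminated"] - before["uncontaminated"]   # 2.0

# Second difference: nets out time shocks common to both groups
did = d_cont - d_uncont
print(f"diff-in-diff estimate: {did:.1f} extra deaths per 1,000")  # 13.0
```

The design identifies the campaign's effect only under the parallel-trends assumption: absent the campaign, contaminated and uncontaminated areas would have trended alike.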

Diff-in-Diff Evaluation Example: Wells in Bangladesh

• Caveat 1: arsenic could be correlated with other characteristics of land or wealth

– Robustness check: Were there changes in outcomes that varied with arsenic level above the “contamination threshold”?

– If our instrument is good there should not be

• Caveat 2: individuals who knew they were contaminated could have moved away – introduces selection

• Caveat 3: this kind of natural source of variation not always available

Why Use Randomization?

Each method of impact evaluation can be compared along two key dimensions:

• Internal validity: ability to draw causal inference (i.e. impact estimates can be attributed to the program and nothing else)

• External validity: relates to the ability to generalize impact estimates to other settings of interest

When it comes to internal validity, randomization is the gold standard:

– Other methods can reduce but not eliminate bias

Types of Randomization

Simplest version of random assignment: take program applicants & randomly assign who gets offered the program

Other options:

• Random assignment to units other than individuals
– Schools
– Communities
– Health centers

• Multiple treatment groups (random assignment to different program variations)

• Encouragement design: offer everyone the program but randomly assign an incentive for take-up (see the sketch below)
• Ex: voucher for contraception
• Compare those offered voucher to those not offered voucher
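A sketch of how an encouragement design is analyzed, with invented take-up rates and effect size: compare outcomes by random offer (intent-to-treat), then scale by the difference in take-up to estimate the effect on those induced to participate (the Wald/IV estimate):

```python
import numpy as np

rng = np.random.default_rng(3)
n = 50_000

# Randomly assign the encouragement (e.g., a voucher); anyone may still take up
voucher = rng.integers(0, 2, size=n)

# Take-up is voluntary but more likely with the voucher (rates are invented)
takeup = (rng.random(n) < 0.2 + 0.4 * voucher).astype(float)

# Outcome improves by 1.0 for those who actually take up the program
y = 1.0 * takeup + rng.normal(size=n)

# Intent-to-treat: compare by random OFFER, never by voluntary take-up
itt = y[voucher == 1].mean() - y[voucher == 0].mean()

# Wald/IV: scale the ITT by the offer-induced difference in take-up
wald = itt / (takeup[voucher == 1].mean() - takeup[voucher == 0].mean())
print(f"ITT ~ {itt:.2f}, Wald ~ {wald:.2f}")  # roughly 0.40 and 1.00
```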

Other Benefits to Randomization

1) Results are transparent
– In simplest case, just compare avg outcomes across treatment and control
– Provides policy-makers, NGOs etc. with straightforward way to gauge what works

2) Methodology is replicable: impact of same program can be tested in multiple populations, countries, etc.

3) Fair method for allocating scarce resources or determining who gets a phased-in program first

4) Used in multiple ways to look at impact:
• Why did the program work?
• Certain parameters/behavior that seem to matter?
• What is the most effective route to achieve a certain outcome (e.g. education, malaria control, etc)?

Benefits to Randomization (Cont.)

• Can help channel resources to highest impact programs

• Without a credible impact estimate, resources often allocated based on emotion, ideology, favoritism…

• In private sector, market can guide resources to most valued and productive use

• Don’t have this for public policies and private non-profit programs:
• Very limited feedback mechanism for funders and policy makers to learn benefits and costs or the preferences of beneficiaries

• Randomized trials offer a sort of second-best feedback and accountability mechanism

Limitations to Randomization: External Validity

RCTs done on specific populations under specific conditions: How do we know they will apply in other settings?

Culture, religion, corruption etc. can affect impact

• Ideal for RCTs to test a behavioral model with variables we think are relevant to program impact

• Simple RCTs are something of a black box since we can see if something works but not why

Limits to Randomization: Scalability

• One typically must control the environment carefully & keep scale small in order to conduct a field experiment

• But many differences between an NGO-run pilot and a government-run national program

– Admin capacity/corruption

– Supply chain failures

– “General equilibrium effects” (returns on investment are different at large scale)
• Ex: price subsidies to certain drugs may affect local distribution channels

Limits to Randomization: Maintaining Integrity of Design

Series of problems one can encounter threatening the validity of the study, rendering causal interpretation of true impact difficult:

• Hawthorne effects
– People behave differently because they know they are being studied

• Sample attrition

• Spillover

Biggest concern is that control group gets “treated” in other ways:
• Do local gov’ts and NGOs target extra resources to control areas because they don’t get the intervention?

RCTs are simple to interpret but complex to undertake

• Drawbacks not insurmountable, but RCTs require a ton of forethought and ingenuity!

• Can be designed to improve:
– External validity: choose the population and setting most representative & policy-relevant
– Scalability: choose simple designs that could be adopted on a larger scale; analyze supply and demand sides to interventions
– Validity: spend extra resources on follow-up, auditing, power; be creative about reducing Hawthorne effects

Impact Evaluation

• What is an example of an evaluation of a program you were involved with?
– Was the evaluation designed to measure causal impact?
– If so: describe how the program measured causal impact
– If not: describe how you might have re-designed the evaluation to measure causal impact

• What is an example of a program you have been involved with that you think should be evaluated?
– How would you design an evaluation that would measure causal impact?

RCT Example 1

“Should Aid Reward Performance? Evidence from a field experiment on health and education in Indonesia”

Olken, Onishi and Wong

Motivation

• Health systems in the developing world are hampered by absenteeism, “leakage” and corruption

• Increasingly, donors attempt to increase accountability by instituting “pay for performance”

• However, no rigorous evidence exists about the effectiveness of pay for performance

– Rewarding one dimension might lead to less effort on other dimensions

– “Crowding out” of intrinsic motivation

Experimental Design

• 3,100 villages were randomly assigned into one of three treatment categories

– Control: No grant

– Non-conditional grant: Block grant that communities could spend on any activity supporting 12 health and education outcomes (8 maternal and child health outcomes).

– Conditional grant: Block grant that communities could spend on any activity supporting 12 health and education outcomes. Next year’s grant is conditional on performance relative to other communities

• Block grants allocated based on poverty levels. Performance pay determined by performance above a minimum predicted level: 70 percent of the average achievement level for villages with similar levels of access to health and education providers and numbers of beneficiaries.

• In 2007 the average block grant was USD 112,300 per subdistrict; in 2008, the average block grant was raised to USD 200,000 per subdistrict. A subdistrict contains roughly between 15,000 and 50,000 individuals and 10 to 20 villages.

Incentive Design

• Block grants allocated based on poverty levels.

– Year 1: pool divided based on number of beneficiaries

– Year 2 (treatment group): 80% of pool divided based on # of beneficiaries and 20% based on performance

• Performance pay: weighted sum of performance above a minimum predicted level across all indicators.

• Predicted level: 70 percent of average of other districts with similar levels of access to health and education providers and number of beneficiaries

• Weights: reflect difficulties of achieving improvements in each indicator (a worked sketch follows below).
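A minimal worked example of the incentive formula as described on these slides; the indicator levels, weights, comparison averages, and village count are all invented, and the real scheme pooled performance across many indicators and communities:

```python
# Each indicator: (village level, average of comparable villages, weight) -- invented values
indicators = {
    "prenatal_visits": (0.65, 0.80, 2.0),
    "immunization":    (0.90, 0.85, 1.5),
    "enrollment":      (0.95, 0.95, 1.0),
}

score = 0.0
for level, comparable_avg, weight in indicators.values():
    target = 0.70 * comparable_avg               # minimum predicted level: 70% of comparable average
    score += weight * max(0.0, level - target)   # only performance above the minimum earns credit

# Year 2 split: 80% of the pool by beneficiaries (equal shares here), 20% by relative performance
pool = 200_000.0                                 # average 2008 subdistrict grant, per the slides
other_scores = [0.25] * 9                        # the other 9 villages' scores (invented)
perf_share = score / (score + sum(other_scores))
grant = pool * (0.8 / 10 + 0.2 * perf_share)
print(f"this village's grant: USD {grant:,.0f}")
```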

Results

• Substantial improvements for health but not education.

• Biggest effects
– 5% increase in pre-natal visits

– 3% increase in immunization

• Heterogeneity
– Largest effects where baseline level of delivery low

• Larger increase for incentivized health measures than non-incentivized. Still see increases in non-incentivized measures.

Why?

• Changes in composition of spending?
– Incentives move spending away from education (16% decrease) and toward health (6% increase)
– But no fewer services are received (communities bought cheaper uniforms and school supplies)

• Increased worker effort?

• Community Effort?

• Decrease in capture?

Why?

• Changes in composition of spending?

• Increased worker effort?
– 6% increase in labor of midwives
– No change in teacher labor
– Context: midwives are paid per hour, teachers are not

• Community Effort?

• Decrease in capture?

Why?

• Changes in composition of spending?

• Increased worker effort?

• Community Effort?
– No evidence of increased community involvement

• Decrease in capture?
– No evidence of a decrease in funds going to providers

Downsides

• Few measures of health outcomes
– Percent malnourished is the exception

• Use estimates from other studies to calculate cost-effectiveness

Study conclusions

• Incentives improved service delivery on target measures without compromising other measures

• Evidence suggests they may be as cost-effective as conditional cash transfer programs.

• Potentially not as cost-effective as direct targeting of programs (e.g. deworming)