
Page 1: Co-ordination of multi-site evaluations: design, support for execution, QA and synthesis in the Paris Declaration Evaluation

Bernard Wood and Julia Betts, Core Evaluation Team, PDE

February 2012

Page 2: Design (1)

Core Team roles: To move from the approach paper and other preparatory stages to an operational design, framework and matrix for the overall evaluation and country studies.

What worked well?

a. International governance arrangements, ‘culture’, participation and support reflected vast experience and global best practice in joint evaluation.

b. Phase 1 lessons could be applied to the Phase 2 framework and country studies.

c. Identifying the self-defined ‘intended outcomes’ of the Paris Declaration, the implicit ‘programme theory’ and the centrality of context.

d. Introducing the three-question sequence and contribution analysis as the best available way to handle the difficult link to development results.

e. Support of the International Management Group and IRG for these design steps.

f. Wide participation in fleshing out the evaluation framework and matrix strengthened ownership and trust by stakeholders.

g. Designing the synthesis approach from the start and gearing all tools, analytical processes etc. to the same framework.


Page 3: Design (2)

Challenges (what did not work well?) and responses

a. Much progress was needed from the approach paper and earlier theoretical exploration. All (esp. IMG) tacitly accepted the need for a fresh start.

b. Too many evaluation questions: the 11 intended outcomes had compelling legitimacy, but were still very wide, and participation added yet more questions. There was no full solution. Using some big questions for conclusions helped, and the ability to add country-specific questions also helped contain the problem, but did not fully overcome it. The result was some spotty coverage and fewer “hard” quantifiable findings. But the parallel monitoring survey fed the appetite for “hard” indicators, while also exposing their limits.

c. Most teams did not focus enough on context chapters to get full value, esp. in the Busan era (e.g. on forces beyond ‘aid’ and non-traditional providers). The synthesis pushed contextual discussion to the limits of the evidence, but there were no resources to supplement it fully.

d. Too much information and candour were expected from country studies on the performance of traditional and non-traditional donors. The synthesis pushed to the limits of evidence, including from other solid sources.


Page 4: Support to execution by country and donor teams

Core Team roles: to support teams in applying the framework, solving issues, and helping safeguard professional independence

What worked well?

a. Regional workshops (good but expensive) and individual video and other support (mainly request-driven). The intranet tool for managing guidance and information was vital, but not used by all.

b. Standard framework of questions, sub-questions and suggested indicators and sources.

c. Flexible, multi-lingual core team resources.

d. Tracking progress at milestones and following up.

Challenges (what did not work well?) and responses

a. Staggered starting points (esp. in contracting teams) made support more expensive and less effective. Extra support resources, sessions and follow-up were added, which were of some help.

b. Donor studies were contracted before the framework was set, and support provisions were unclear. Not overcome: only limited support to donor studies was possible, and more was needed.

c. Different understandings of independence and QA roles. International scrutiny, and some interventions and clarifications of independence (with the Secretariat), helped.

Page 5: Quality assurance

Core Team roles: As part of overall QA strategy, to assess the quality of draft country and donor reports & suggest strengthening, then validate and gauge the reliability of evidence from each final report for the synthesis.

What worked well?

a. Systematic check by at least two CET members of each main finding for strength of evidence, and of conclusions and recommendations for clarity of argument.

b. The Emerging Findings workshop, as a forum for a transparent focus on quality, examples of good practice, constructive peer pressure and support opportunities for lagging cases.

Challenges (what did not work well?) and responses

a. The number of late and incomplete drafts limited the scope for a solid emerging findings report and for rigorous overall checks at the Bali workshop. The solid evidence on hand was used, but intensive extra work was needed post-Bali to extract evidence from final drafts and reports.

b. Some workshop participants focused on their own opinions and experiences rather than evaluation findings. These views were listened to and reported back faithfully, but filtered to keep solid evaluation evidence as the base for the synthesis.

c. Double checks of both drafts and final reports imposed heavy demands in a very short time. The response was to work harder and not compromise on rigour.


Page 6: Synthesis process

Core Team roles: Systematically assemble and reflect key findings and conclusions from the body of evaluations and studies, and distil policy-relevant overall findings, conclusions and recommendations, calibrated to the strength of the synthesized evidence.

What worked well?

a. Assembling validated evidence and following the evaluation framework.

b. Finding the balance: enough detail to reflect key evidence, but a focus on strategic findings and conclusions.

c. Making the leap to policy-relevant findings, conclusions and recommendations, which requires policy grasp as well as evaluation rigour. Level and language were geared to dissemination and use.

d. The validation process (and the rules applied) for the first draft synthesis, steered by the Management Group.

Challenges (what did not work well?) and responses

a. Some differing expectations for the synthesis product: accessibility and policy relevance vs. detail and methodology. Opted for accessibility with essential details and a well-signposted technical annex.

b. Uneven engagement by the IRG: thus a fuller discussion was needed at the final validation workshop, while protecting the agreed process.

c. Time pressures, as throughout: work harder.

Page 7: Some key lessons

1. Campaign for evaluable frameworks of intended outcomes in the up-front design of programmes and policies, but remain wary of crude and over-simplified indicators. Accept and embrace the need for rigorous qualitative evaluations of complex realities.

2. Aim for governance arrangements, ‘culture’, participation and support that reflect experience and best practice in joint evaluation.

3. Keep the working language as clear and non-technocratic as possible, minimizing jargon – especially, but not only, in multi-lingual and multi-cultural evaluation processes. Carry this through to reports to maximize ultimate dissemination and use.

4. Recognize genuinely participatory design and validation as not just desirable but integral to the necessary ownership of the process and the ultimate quality and utility of the evaluation. Build in the “careful planning, structure, execution, and facilitation” implied (MQP).

5. Recruit highly competent teams early to play a major role in design, together with evaluation managers and stakeholders.

6. (Perhaps) be prepared to impose selectivity, even among vital questions, in order to have a manageable challenge across the body of cases.


Page 8: Some key lessons (2)

7. Prepare for complex and uneven processes in multi-site evaluations, but set and keep the deadlines necessary to maintain momentum and deliver timely results.

8. While working to strengthen them, expect uneven capacities and delivery among varied teams. Be ready to reinforce, but if necessary abandon or sideline results where they are found weak against transparent standards. Ensure in advance that an adequate base will remain after ‘dropouts’ for reasonable overall validity.

9. Recognize that written component and synthesis reports are only part of the contribution of the evaluation, alongside benefits from the process and building a community of shared understanding and trust.

10. Set and consistently apply rules to protect teams’ independence within agreed evaluation frameworks and arrangements for quality assurance and validation.

11. Be realistic about the candour to be expected in assessments of other actors’ performance, as well as in self-assessments.

12. Calibrate the strength of particular synthesis findings, conclusions and recommendations according to the relative strength of evidence in the body of cases.