
UNICEF GEROS Meta-Analysis 2017

An independent review of UNICEF evaluation report quality and selected trends in 2017

July 2018 – v. 2

Principal Author: Tom Orrell

Evaluation Manager: Ada Ocampo


Acronyms

COs Country Offices

EAPRO East Asia and Pacific Regional Office

ECARO Europe and Central Asia Regional Office

EMOPs Office of Emergency Programmes

EO Evaluation Office

ESARO Eastern and Southern Africa Regional Office

EQA Evaluation Quality Assessment

GEROS Global Evaluation Report Oversight System

HQ Headquarters

HRBAP Human Rights Based Approach to Programming

LACRO Latin America and Caribbean Regional Office

M&E Monitoring and Evaluation

MENARO Middle East and North Africa Regional Office

SP Strategic Plan

N/A Not Applicable

OECD/DAC Organization for Economic Co‐operation and Development/Development Assistance Committee

RBM Results‐based Management

ROs Regional Offices

ROSA Regional Office for South Asia

RTE Real‐time evaluation

SPO Strategic Plan Outcome

ToC Theory of Change

TORs Terms of Reference

UN United Nations

UNDAF United Nations Development Assistance Framework

UNEG United Nations Evaluation Group

UNICEF United Nations Children’s Fund

UN-SWAP UN System‐wide Action Plan for gender equality and empowerment of women

WASH Water, Sanitation and Hygiene

WCARO West and Central Africa Regional Office

Cover photo: © UNICEF/UN0162292/Tremeau


Executive Summary

Introduction

This review is a meta-analysis of the quality of the evaluation reports submitted to UNICEF’s Global Evaluation Reports Oversight System (GEROS)1 during 2017. It synthesizes the results of 88 evaluation reports, reviewed for quality by an independent team against UNICEF and UN-SWAP standards, and shares findings at the global level as well as highlighting trends across regions, sectors and quality assessment criteria. This report contributes to a wider body of knowledge and integrates the reporting requirements for the UN-SWAP evaluation performance indicator.

The purpose of the meta-analysis is to contribute to achieving the three overall objectives of GEROS (particularly objective 1), of which the meta-analysis is only one part2:

Objective 1: Enabling environment for senior managers and executive board to make informed decisions based on a clear understanding of the quality of evaluation evidence and usefulness of evaluation reports;

Objective 2: Feedback leads to stronger evaluation capacity of UNICEF and partners;

Objective 3: UNICEF and partners are more knowledgeable about what works, where and for whom.

GEROS is underpinned by United Nations Evaluation Group (UNEG) norms and standards, the UN System-Wide Action Plan on gender equality (UN-SWAP) and other UNICEF-adapted standards, including equity and human rights-based approaches. The system consists of rating evaluation reports commissioned by UNICEF Country Offices, Regional Offices and HQ divisions.

All reports and the results of their quality assessment are made available in the UNICEF Global Evaluation and Research Database (ERDB) and are published on the UNICEF external website. GEROS is an organization-wide system.

In total, 88 of the 89 evaluation reports from the 2017 cycle were reviewed, slightly fewer than the 101 evaluation reports submitted in 2016. For 2017, 100% of evaluation reports submitted to the Evaluation and Research Database were reviewed.

Findings

Overall, the proportion of reports meeting UNICEF standards has been maintained since the previous year: the majority of reports (72%) fully met UNICEF evaluation report standards. Of these, 15% were rated as highly satisfactory, a substantial improvement over the 6% achieving this standard in the previous year, and 57% were rated as satisfactory. No report was rated unsatisfactory. The remaining 28% of reports were rated as ‘fair’, meaning that they can be used with caution, taking account of their limitations.

The evaluation quality assurance tool used in 2016 and 2017 allows for additional levels of disaggregation beyond the five main classifications (highly satisfactory, satisfactory, fair, unsatisfactory, missing). Deeper analysis of ratings (see Figure 3) reveals that the majority of evaluation reports reaching UNICEF standards sit in the lower band of the satisfactory rating. Furthermore, 15% of reports, up from 9% in the previous year, were rated in the lower band of ‘fair’, which indicates that the pattern of increasing quality is not assured and that the evaluation function needs continuous strengthening.

1 https://icon.unicef.org/apps02/cop/edb/SitePages/Home.aspx
2 The objectives for GEROS were revised in 2016 and now differ slightly from the original Terms of Reference for the meta-analysis.


Seven evaluation reports were rated at the very upper end of the ‘satisfactory’ range, with a mean score x in the range 3.3 ≤ x < 3.5 out of a maximum of 4 points (reports scoring 3.5 or above are rated ‘highly satisfactory’). With improvements in one or two sections, these reports would likely have been rated ‘highly satisfactory’.
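As an illustration, the sketch below classifies a report’s mean EQA score into these bands. Only the 3.5 cut-off for ‘highly satisfactory’ is stated in this report; the lower band boundaries used here are illustrative assumptions, not the actual GEROS thresholds.

```python
def rating_band(mean_score: float) -> str:
    """Classify a report's mean EQA score (0-4 scale) into a rating band.

    Only the 3.5 cut-off for 'highly satisfactory' is stated in the
    report; the lower boundaries below are illustrative assumptions.
    """
    if mean_score >= 3.5:
        return "highly satisfactory"
    elif mean_score >= 2.5:   # assumed boundary
        return "satisfactory"
    elif mean_score >= 1.5:   # assumed boundary
        return "fair"
    else:
        return "unsatisfactory"

# The seven 'near-miss' reports all fall in the range 3.3 <= x < 3.5:
print(rating_band(3.4))   # satisfactory (just below the 3.5 cut-off)
```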

The strongest aspects of evaluation reports in 2017 were ‘purpose, objectives and scope’, ‘structure’ and ‘recommendations’. This is consistent with previous years. ‘Findings’ were also rated relatively strongly. These elements are all key contributors to the utility of evaluations. By contrast, evaluation principles (HRBAP and gender equality), lessons learned and methods sections received the lowest ratings. These results indicate that evaluations are focusing on utility for primary intended users; more attention needs to be directed at strengthening credibility and learning. The same need was evident in the 2015 and 2016 evaluation trends.

Comparison of evaluation reports across the UNICEF regions reveals a reduction in the number of reports from ESAR (normally the largest region for evaluation), but an improved average quality in the reports submitted. ROSA and EAPR also produced slightly fewer reports, but improved in terms of the proportion of reports rated satisfactory. By comparison, ECAR saw a large increase in the number of reports, with 7 rated ‘highly satisfactory’ (against 1 in 2016), although it also produced 3 ‘fair’ reports, unlike in 2016. WCAR increased its number of reports, but the ‘additional’ evaluations were rated ‘fair’. LACR also saw an increase in the number and percentage of ‘fair’ evaluation reports, while MENA submitted one more report than in 2016, which was also rated ‘highly satisfactory’. The main difference relates to HQ, with a significant reduction in the number of corporate evaluations completed in 2017.

The largest body of evaluative knowledge was generated for health and education, followed by child protection and social inclusion (see Figure 17). These same areas were also most covered in 2016 evaluation reports. The priority areas for action to improve all reports are similar, with weaknesses in the articulation of human rights-based approaches (HRBAP), gender equality, ethics, and lessons learned. As expected, evaluations that successfully mainstreamed gender equality as a cross-cutting theme were also strongest regarding HRBAP and equity.

The aggregated average UN-SWAP score for integration of gender equality in 2017 was 6.15, which is classified as Approaching Requirements. This is almost the same as in the 2016 and 2015 cycles, suggesting that fully mainstreaming gender equality within the evaluation system remains a challenge. The priority action for improving UN-SWAP performance remains ensuring that gender analysis is used to inform evaluation findings, conclusions and recommendations.
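For illustration, the portfolio-level UN-SWAP figure can be thought of as the mean of per-report gender-integration scores, as in the sketch below. The per-report scores and the numeric band thresholds are hypothetical placeholders; only the ‘Approaching Requirements’ label for a portfolio score of 6.15 comes from this report.

```python
# Hypothetical per-report UN-SWAP gender-integration scores (0-9 scale assumed).
report_scores = [6.0, 7.5, 5.0, 6.1]

portfolio_score = sum(report_scores) / len(report_scores)  # 6.15 here

def unswap_band(score: float) -> str:
    """Band labels follow the report; the numeric thresholds are assumptions."""
    if score >= 8.0:
        return "Exceeds Requirements"
    elif score >= 7.0:
        return "Meets Requirements"
    elif score >= 4.0:
        return "Approaching Requirements"
    else:
        return "Misses Requirements"

print(round(portfolio_score, 2), unswap_band(portfolio_score))
# -> 6.15 Approaching Requirements
```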

Most evaluations (82%) are managed directly by UNICEF; of these, 76% were rated as fully meeting UNICEF standards. These patterns are almost identical to 2016. Once again, in 2017 there were no purely quantitative evaluations; most evaluations used mixed methods, with 17% purely qualitative (19% in 2016). Unlike in previous years, however, there was no difference in the quality of reports between these methodological approaches.

2017 saw a large increase in the number of quasi-experimental evaluations. Project evaluations improved in both quality and number, as did strategy evaluations. Country programme and joint programme evaluations both fell in number but retained the same level of quality as in 2016. By comparison, programme and pilot/innovation evaluations declined in both number and quality.

Conclusions and Recommendations

Conclusion 1: UNICEF evaluation reports in 2017 maintained the quality and coverage of the previous year, while being fewer in number and less strategic in scope due to an increase in the number and proportion of project evaluations.

The overall picture of GEROS data for 2017 reveals consistent performance by comparison with 2016. The portfolio of evaluations has changed in some ways, but the key indicators of performance have remained constant. The percentage of reports meeting UNICEF standards is similar to 2016, as is the overall UN-SWAP score and the number of countries covered.

Evaluations rated ‘unsatisfactory’ were eliminated in 2017, and the number of reports rated ‘highly satisfactory’ increased. However, most of this improvement took place within bands containing reports that already met UNICEF standards, and some stubborn weaknesses remain in the overall portfolio. These relate to the integration of UNICEF principles (human rights-based approaches, gender equality and equity), the articulation of evaluation designs and limitations, the analysis of unexpected/unintended results, and the inclusion of lessons. These factors appear to be preventing further improvement across the overall universe of evaluation reports.

Recommendation 1: In aligning the evaluation function to the UNICEF Strategic Plan 2018-2021, incentivize and support the use of more strategic evaluations by re-focusing away from project and output-level evaluations.

EVALUATION OFFICE

A longitudinal review of UNICEF meta-analyses reveals some norms and standards that have been consistently achieved for a prolonged period of time. These include indicators around predominantly structural and narrative elements, such as report structure, the completeness of findings, and the inclusion of basic information. Consistently delivering on these has meant that evaluation reports, including project-level evaluations, are no longer rated ‘unsatisfactory’.

However, other indicators are either inconsistent or persistently weak across time and reports. Inconsistent indicators include those that broadly relate to the more analytical and critical elements of the GEROS framework: the specification of an evaluation design, theories of change, detailed information on sampling approaches and sources, the inclusion of ethics, and the elaboration of limitations. Persistent weaknesses include the use of human rights, equity and gender analyses, the exploration of unexpected effects, the elaboration of lessons learned, and the description of the process for developing recommendations.

In combination, these two trends mean that GEROS has largely achieved its original aim of ensuring that UNEG/UNICEF evaluation report standards are consistently applied, but that UNICEF business areas are not being incentivized by the GEROS system to undertake more complex and strategic evaluations, such as country-led, thematic or multi-country evaluations. With increasingly systematic application of basic reporting standards even for low-budget evaluations, project evaluations are more frequently attaining a ‘satisfactory’ rating than more complex evaluations – incentivizing lower-level evaluations.

Incentivizing more complex evaluations also requires addressing current weaknesses in terms of clear evaluation designs that genuinely apply mixed methods for both data collection and analysis. In particular, reports could better explain how different methods were sequenced and combined to achieve triangulation, and how specific methods were applied to examine criteria such as efficiency. The Evaluation Office can review the current guidance and determine whether refreshed or new material on designs for strategic evaluations, including evaluations of innovations, is required to support this.

Conclusion 2: While the integration of human rights based approaches and gender equality commitments continues to improve over time, the pace of this change is insufficient to meet UNICEF targets, including for UN-SWAP.

As with the previous meta-analysis, the UN-SWAP evaluation performance indicator remained static in 2017. Other indicators, and trend analysis over time, do indicate that incremental improvements are being made, but not at the pace required to fully meet UNICEF commitments. Accelerating the integration of gender-responsive and human rights-based evaluation designs and analysis remains a challenge across nearly all regions, although lessons may be drawn from the improvement demonstrated by ECAR in 2017.


Recommendation 2: To ensure that no child is left behind and to deliver on the UNICEF equity agenda, initiate urgent action to overcome persistent bottlenecks and to strengthen the full integration of HRBAP, equity and UN-SWAP requirements in all evaluations using UN Evaluation Group guidance and good practices.

EVALUATION OFFICE, AND REGIONAL EVALUATION AND GENDER ADVISORS

The achievement of UNICEF evaluations in consistently meeting or exceeding standards for purpose and objectives demonstrates the capacity of the evaluation function to improve final report quality through improved evaluation management. For example, terms of reference can be used to ‘set evaluations up’ to better meet UNICEF requirements. The same level of success has not been achieved in integrating gender equality, human rights and equity. Equity analysis is especially inconsistent, which is a particular concern for UNICEF. The meta-analysis strongly recommends that the Evaluation Office and regional advisors work with all evaluation managers to apply current UN Evaluation Group guidance3 on integrating human rights and gender equality across the full evaluation cycle: from planning, to ToR, to recruiting evaluators sufficiently experienced in the required standards, to quality assurance of evaluation processes and products.

Recommendation 3: Reassess the integration of gender, human-rights and equity indicators within the GEROS assessment tool, with a view to generating more detailed insights on the bottlenecks to delivering UNICEF commitments.

EVALUATION OFFICE

While the current UN-SWAP indicators, and the dedicated gender, human rights and equity questions within the GEROS assessment tool, provide an indication of overall performance and trends, the challenge of improving this performance may require more disaggregated analysis of bottlenecks than is currently available. It may help, therefore, to revisit the current set of indicators on principles, with a view to providing more nuanced insight into where in the ‘evaluative chain’ the biggest opportunities for improvement lie.

Conclusion 3: Inconsistency in the inclusion and quality of lessons learned has important implications for both the quality assessment of reports and the utility of evaluations.

Comparison of GEROS data for 2016-2017 (where lessons are included in the same section as conclusions) and 2012-2015 (where lessons were included in the same section as recommendations) reveals that the inclusion of properly formulated lessons learned is the most inconsistent element of UNICEF evaluation reports. Without high-quality, generalizable lessons learned, evaluation evidence is useful mainly for the individual intervention being evaluated and as part of an overall picture of UNICEF effectiveness.

Given the increasing consistency of other elements within evaluation reports, lessons learned thus have an important influence on the final rating. The current assumption within the evaluation standards (which form the basis for GEROS) that blanket inclusion of lessons learned is desirable may not be appropriate. However, there is currently no guidance on which types and levels of evaluation should include lessons learned, other than whether they are included as a requirement in the Terms of Reference (ToR). ToRs are also inconsistent with regard to lessons, so there is a gap in ensuring that evaluations contribute to the wider knowledge management function.

3 A growing body of material is available on concrete approaches to implementing human rights and gender equality in evaluations. This includes the normative (theory-based) Integrating Human Rights and Gender Equality in Evaluation – Towards UNEG Guidance, a UNEG handbook, the revised UNICEF evaluation report standards, a guidance manual from UN Women on managing gender-responsive evaluations, and recent good practice guidance from across the UN system compiled by UNEG. More broadly, Better Evaluation provides a list of materials on gender analysis: https://www.betterevaluation.org/en/search/site/gender%20equality.

Recommendation 4: Clarify UNICEF standards regarding which types of evaluations are required to include lessons learned, and facilitate knowledge exchange to better support the development and sharing of lessons.

EVALUATION OFFICE

Building on a similar recommendation from the previous meta-analysis, it is recommended that the Evaluation Office review the standards on including lessons in all evaluation reports. The determination of which evaluations are required to include lessons learned (in both ToRs and final reports) should be accompanied by clear guidance that is useful to evaluation managers and can be incorporated into the GEROS standards (and reflected in the advice provided by regional helpdesks).

In addition to incorporating these revised standards on lessons into GEROS, further knowledge exchange on good practices, and improved tools, guidance or templates, may be developed to address current inconsistencies in the understanding of what constitutes a lesson where lessons learned are required. Moreover, the Evaluation Office should consider ways to enhance the dissemination of good practices and lessons from evaluations, to demonstrate the value of this evaluation purpose.


Contents

Acronyms
Executive Summary
  Introduction
  Findings
  Conclusions and Recommendations
Introduction
  Purpose, Scope and Objective of GEROS
Methodology
  Overview of evaluation reports included in the meta-analysis
Findings
  Overall findings on evaluation report quality
  Areas of progress and priorities for improvement
  Overall regional performance
  Overall thematic performance
  UN-SWAP performance and trends
  Other observable patterns
Conclusions and Recommendations
Appendices
  Annex 1. Terms of Reference
  Annex 2. GEROS evaluation quality assessment indicators
  Annex 3. List of reports quality assessed
  Annex 4: UN-SWAP Calculations

Tables

Table 1: Number of reports included in GEROS, 2011-2017
Table 2: Priority areas for action to strengthen evaluation identified in meta-evaluations, 2012-2017
Table 3: Heat map of average rating of indicators from 2017 evaluation reports for Section A
Table 4: Heat map of average rating of indicators from 2017 evaluation reports for Section B
Table 5: Heat map of average rating of indicators from 2017 evaluation reports for Section C
Table 6: Heat map of average rating of indicators from 2017 evaluation reports for Section D
Table 7: Heat map of average rating of indicators from 2017 evaluation reports for Section E
Table 8: Heat map of average rating of indicators from 2017 evaluation reports for Section F
Table 9: Heat map of average rating of indicators from 2017 evaluation reports for Section G
Table 10: Heat map of average rating of indicators from 2017 evaluation reports for Section H
Table 11: Heat map of average rating of indicators from 2017 evaluation reports for Section I
Table 12: Performance according to UN-SWAP evaluation criteria, 2017
Table 13: Regional variations in report performance according to UN-SWAP criteria, 2017
Table 14: The number of different types of evaluation in 2017 and 2016, and the percentage of reports meeting UNICEF standards
Table 15: Number of reports meeting UNICEF standards for different levels of evaluation in 2017

Figures

Figure 1: Number of reports submitted to GEROS per region and HQ
Figure 2: Coverage of countries in 2017 evaluation reports
Figure 3: Overall distribution of quality ratings for 88 reports from 2017 and 101 evaluation reports from 2016
Figure 4: Numbers and percentage of evaluation reports meeting UNICEF standards 2009-2017
Figure 5: Performance of quality assessment sections 2017
Figure 6: Quality ratings for Section A (object and context) 2017
Figure 7: Quality ratings for Section B (purpose, objectives and scope) 2017
Figure 8: Quality ratings for Section C (methods) 2017
Figure 9: Quality ratings for Section D (findings) 2017
Figure 10: Quality ratings for Section E (conclusions and lessons) 2017
Figure 11: Quality ratings for Section F (recommendations) 2017
Figure 12: Quality ratings for Section G (structure, logic and clarity) 2017
Figure 13: Quality ratings for Section H (evaluation principles) 2017
Figure 14: Long-term patterns in the inclusion of human rights based approaches and gender & equity (reports rated as satisfactory or highly satisfactory)
Figure 15: Quality ratings for Section I (executive summary) 2017
Figure 16: Distribution of 2017 evaluation report quality across the UNICEF regions
Figure 17: Number of evaluations assessing different evaluation objects by region, 2017
Figure 18: Distribution of 2017 evaluation report coverage across the UNICEF thematic areas
Figure 19: Performance of UNICEF reports across the UN-SWAP overall and individual UN-SWAP criteria
Figure 20: Proportion and number of evaluations using different methodological designs across UNICEF thematic areas, 2017
Figure 21: The number of different designs of evaluation in 2017 and the percentage of reports meeting UNICEF standards


Introduction

This review is a meta-analysis of the quality of the evaluation reports submitted to UNICEF’s Global Evaluation Reports Oversight System (GEROS) during 2017. It synthesizes the results of 88 evaluation reports and shares findings at the global level, as well as highlighting trends across regions, sectors and quality assessment criteria. This report contributes to a wider body of knowledge built from the similar GEROS meta-analysis reports produced each year since GEROS began in 2010.

The report was commissioned by the UNICEF Evaluation Office. The key audiences are the Global Evaluation Committee, UNICEF senior management (at global, regional and country levels), the UNICEF Evaluation Office, regional evaluation officers, and evaluation managers. It will be used to monitor progress, analyse strengths and identify current challenges in order to improve evaluations.

The UNICEF evaluation system is decentralized, reflecting the decentralized nature of the organization. The Evaluation Office and regional offices collaborate to strengthen the organization’s evaluation function. While the decentralized nature of the evaluation function ensures that the evidence generated is relevant to the local context, it poses the challenge of setting up a consistent system to ensure good quality and credibility. The Global Evaluation Reports Oversight System (GEROS) was established in 2010 to assess the quality of evaluation reports and to inform the further development of the organization’s evaluation function.

The GEROS process includes a review of the quality of the evaluation reports and a meta-analysis each year by an external independent team. UNICEF-adapted UNEG standards are used as the criteria for quality assessment (Annex 2). Annual meta-analysis reports have been produced each year since 2009.

Purpose, Scope and Objective of GEROS

The purpose of GEROS is to support the strengthening of the evaluation function to meet UNEG standards, ensure accountability, and promote the use of robust evaluative evidence.

The purpose of this meta-analysis is to support the three objectives of GEROS (particularly objective 1):

Objective 1: Enabling environment for senior managers and executive board to make informed decisions based on a clear understanding of the quality of evaluation evidence and usefulness of evaluation reports

Objective 2: Feedback leads to stronger evaluation capacity of UNICEF and partners

Objective 3: UNICEF and partners are more knowledgeable about what works, where and for whom.

GEROS is underpinned by United Nations Evaluation Group (UNEG) norms and standards (2016), the UN System-Wide Action Plan on gender equality (UN-SWAP) and other UNICEF-adapted standards, including equity and human rights-based approaches4. The system consists of rating evaluation reports commissioned by UNICEF Country Offices, Regional Offices and HQ divisions. All reports and the results of their quality assessment are made available in the UNICEF Global Evaluation and Research Database (ERDB) and are published on the UNICEF external website. GEROS is an organization-wide system.

4 These are based on the UN Evaluation Group technical note for the UN-SWAP EPI: http://www.unevaluation.org/document/detail/1452


Methodology

This meta-analysis was conducted in March-June 2018, once all of the evaluation reports for 2017 had been assessed, submitted to UNICEF EO and accepted. Quantitative data on the scores for different aspects of the reports were compiled using Excel. Analysis was carried out across multiple axes:

❖ Regional trends (regional and country levels)
❖ Trends by quality assessment criteria (including across time): object of the evaluation; evaluation purpose, objectives and scope; evaluation methodology; findings; conclusions and lessons learned; recommendations; evaluation principles (gender, human rights and equity); report structure, logic and clarity; executive summary
❖ Type of management arrangements for the evaluation
❖ Purpose
❖ Scope
❖ Results level
❖ Strategic Plan Objective Area correspondence
❖ UN-SWAP performance and trends

The comments made by reviewers on each evaluation quality assessment were filtered by section and overall rating, and then synthesized to identify common themes and explore any causal links between recurrent issues and particular ratings. In addition, the reviews were trawled to identify good evaluation practices from the reports. Quantitative and qualitative data were triangulated and compared with longitudinal data from the four previous years to map key trends and patterns.
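A minimal sketch of this kind of processing follows, assuming the EQA results were exported as a flat table with one row per report-section; the file name and column names are hypothetical, not the actual ERDB schema.

```python
import pandas as pd

# Hypothetical export: one row per report-section, with a numeric compliance
# score, the report's overall rating, and the reviewer's narrative comment.
eqa = pd.read_csv("geros_2017_eqa.csv")

# Average compliance per assessment section, mirroring the heat-map tables.
by_section = eqa.groupby("section")["score"].mean().sort_values()

# Collect reviewer comments for reports rated 'fair', grouped by section,
# ready for manual thematic coding and triangulation with the scores.
fair_comments = (
    eqa.loc[eqa["overall_rating"] == "fair"]
       .groupby("section")["comment"]
       .apply(list)
)

print(by_section)
```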

Overview of evaluation reports included in the meta-analysis

The list of 2017 evaluation reports to be reviewed was finalized by the UNICEF Evaluation Office, which involved monitoring the Evaluation and Research Database and undertaking an initial filtering to ensure that all reports were correctly classified as evaluations. In total, 88 of the 89 evaluation reports for 2017 were reviewed. This represents a reduction from the 101 evaluation reports assessed in 2016 (see Table 1).

Table 1: Number of reports included in GEROS, 2011-2017

Year Reports Reviewed Cumulative

2011 88 88

2012 85 173

2013 96 269

2014 69 338

2015 90 428

2016 101 529

2017 88 617

The number of reports submitted by each region varies, as shown in Figure 1 below. In 2017 the greatest number of reports was submitted by the West and Central Africa Region (17 reports) and the fewest by HQ (5 reports). 66 UNICEF offices commissioned evaluations at least once in 2017, the same as in 20165.

Figure 1: Number of reports submitted to GEROS per region and HQ

Figure 2: Coverage of countries in 2017 evaluation reports

5 This coverage excludes country case studies that were not published separately

[Figure 1 chart: reports submitted per region, 2011-2017. The 2017 values are ECAR 16, EAPR 8, ESAR 15, LACR 13, MENA 7, ROSA 7, WCAR 17, HQ 5.]

[Figure 2 map: number of 2017 evaluation reports covering each country (1-4); map imagery credits omitted.]


Findings

The following findings are the product of quantitative and qualitative analysis (including triangulation) of data from the GEROS evaluation quality assurance reviews of 88 UNICEF evaluations from 2017 (see Annex 3).

Overall findings on evaluation report quality

Overall, the 72% of reports fully meeting UNICEF standards is similar to the previous cycle (74% in 2016), which was itself an improvement on 2015 (53%) and the same as the level achieved in 2014. Of these, 15% of reports were rated as highly satisfactory, more than double the 6% recorded in 2016; and 57% were rated satisfactory. No report was rated unsatisfactory; the remaining 28% of reports were of sufficient quality to be used with caution, properly taking account of their limitations.

Deeper analysis of ratings (see Figure 3) reveals that the majority of evaluation reports reaching UNICEF standards sit in the lower band of the satisfactory rating. The quality of 2017 evaluations largely reproduced the ‘bell-shaped’ distribution identified in 2016. It also indicates that some of the ‘borderline’ reports along the boundary between the satisfactory and highly satisfactory bands were successfully elevated in quality, while the wider body of evaluation reports remained consistent with the previous year. Improvements have therefore not been spread evenly across the full portfolio; the main gains are better ratings within the band of reports that already met UNICEF standards.

The maximum score available to any report is 4, with reports scoring 3.5 or over rated as ‘highly satisfactory’. Seven evaluation reports were rated at the very upper end of the ‘satisfactory’ range, with a mean score x in the range 3.3 ≤ x < 3.5. With improvements in one or two sections, these reports would likely have been rated ‘highly satisfactory’; they include evaluations from ECAR, LACR, WCAR, MENA, and ESAR6.

Figure 3: Overall distribution of quality ratings for 88 reports from 2017 and 101 evaluation reports from 2016

6 Albania 2017/001 Evaluation of the “Breaking the cycle of exclusion for Roma children through Early Childhood Development and Education”, Serbia 2017/006 Summative Evaluation of Child Care Reform in Serbia, Swaziland 2017/001 Evaluation Of The Swaziland Child Friendly Schools (CFS) Programme, LACR 2017/006 Multi-Country Evaluation of Early Child Education Policies in Latin America and the Caribbean, State of Palestine 2017/001 Evaluation for Humanitarian Action for Children, Nigeria 2017/002 Impact Evaluation of UNICEF Nigeria Girls’ Education Project Phase 3 (GEP3) Cash Transfer Programme (CTP) in Niger and Sokoto States, Republic of Cameroon 2017/001 Evaluation Du Programme Wash UNICEF-Cameroun 2013-2016

Rating band            2017   2016
Highly satisfactory      13      6
Satisfactory (upper)     17     29
Satisfactory             34     40
Fair (upper)             18     17
Fair                      6      8
Unsatisfactory            0      1


Trends in report quality over time need to be interpreted with caution, given the transition to a revised rating system in 2016. However, both the near-term and long-term trends in the share of evaluations meeting UNICEF report standards are positive.

When analysed against previous years’ data (Figure 4), there was a steady increase in quality from 2009 to 2011, a sharp increase in 2012, and then a steady increase until 2014. 2015 saw a sharp year-on-year decrease from 74% to 53%; 2016 reversed this pattern and 2017 maintained those gains. While individual years show numerous variations, the most important pattern revealed by the data is the long-term trend of steady improvement in evaluation report quality since 2009, with 2017 reports sitting just below this 9-year linear trend line.
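The linear trend can be reproduced from the year-by-year percentages shown in Figure 4 (the series is listed in the figure placeholder below). A small sketch using numpy; the fitted values are computed here, not quoted in the report.

```python
import numpy as np

# Share of reports meeting UNICEF standards, 2009-2017 (from Figure 4).
years = np.arange(2009, 2018)
percent = np.array([36, 40, 42, 62, 69, 74, 53, 74, 72])

slope, intercept = np.polyfit(years, percent, 1)
fitted_2017 = slope * 2017 + intercept

# Roughly +4.7 percentage points per year; the 2017 actual (72%) sits
# just below the fitted value (~77%), as noted in the text.
print(f"{slope:.1f} points/year, fitted 2017 value: {fitted_2017:.0f}%")
```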

Figure 4: Numbers and percentage of evaluation reports meeting UNICEF standards 2009-2017

Areas of progress and priorities for improvement

Analysis of quality assessments relates to the nine sub-sections of the GEROS assessment tool, which are derived from UNICEF reporting standards. Each sub-section comprises a varying number of relevant questions, with an accompanying set of indicators and a rubric:

❖ Section A: Object of the evaluation
❖ Section B: Evaluation purpose, objectives and scope
❖ Section C: Evaluation methodology
❖ Section D: Findings
❖ Section E: Conclusions and lessons learned
❖ Section F: Recommendations
❖ Section G: Structure, logic and clarity of the report
❖ Section H: Evaluation principles, human rights, equity and gender equality
❖ Section I: Executive summary

Overall, Figure 5 shows that the strongest sections in 2017 were ‘Purpose, objectives and scope’ (Section B), ‘Structure’ (Section G) and ‘Recommendations’ (Section F). ‘Findings’ (Section D) and ‘Background’ (Section A) were also rated strongly. These patterns echo the results for 2016 and 2015 evaluations almost exactly. By contrast, the evaluation principles (HRBAP, gender equality, and equity), lessons learned and methods sections again received the lowest ratings. These results indicate that evaluations are continuing to focus on utility, but the underlying technical credibility of the overall portfolio remains an area requiring strengthening.

[Figure 4 chart: percentage of reports meeting UNICEF standards – 2009: 36%, 2010: 40%, 2011: 42%, 2012: 62%, 2013: 69%, 2014: 74%, 2015: 53%, 2016: 74%, 2017: 72% – with report counts (n) and a linear trend line.]


Figure 5: Performance of quality assessment sections 2017.

Common strengths

• In general, all reports are now complete in terms of the required elements (with the exception of lessons learned – see below). They are logically structured and present information systematically.

• Normally, the purpose, objectives, and the scope of the evaluations are clearly explained. These are derived from the terms of reference.

• Findings sections predominantly address all the evaluation criteria and questions.

• In general, evaluations present both a narrative explanation and a graphical description of the Theory of Change which is included in the body of the report and/or in the annexes.

• Recommendations in most cases are clearly derived from findings and conclusions, and are appropriate in number – contributing to utility.

Recurrent weaknesses

• There was a drop in the ratings for executive summaries, and renewed emphasis could be placed on this important feature of utility.

• Data collection and analysis methods are still commonly listed without reference to a coherent overall design – which does not support the credibility of findings. Similarly, methods sections list constraints faced by the evaluation, but less often identify the inherent limitations of the selected methods.

• More generally, unexpected effects are not always identified or analysed, and the examination of efficiency is mostly qualitative, without clear reference to a rubric.

• The integration of HRBAP, gender responsive evaluation designs, and analysis of equity is inconsistent – undermining the credibility of findings and recommendations to comprehensively advance the UNICEF mandate and principles.

• Lessons learned are frequently missing from the reports. In many cases when they are included, they tend to be misidentified and/or insufficiently generalizable (i.e. they do not draw lessons from the object of evaluation that can be applied in different contexts to contribute to general knowledge).

Table 2, below, summarizes the main areas requiring additional attention that have been highlighted in the meta-analysis reports over the last six years. In several cases specific progress was noted from one year to the next, but these areas continued to be identified as requiring improvement in 2017.

Table 2: Priority areas for action to strengthen evaluation identified in meta-evaluations, 2012-2017

• Insufficient justification for the selection of evaluation criteria
• Absent or weak description of the evaluated object’s theory of change
• Limited mention of ethical considerations and safeguards
• Weak cost analysis
• Insufficient identification of unexpected effects
• Inconsistent incorporation of gender and human rights from start to finish of an evaluation
• Insufficient mapping or analysis of stakeholder contributions
• Lack of clear identification of target stakeholders for recommendations, or of evidence of participation in developing recommendations
• Lessons learned missing, unclear or not generalizable
• Insufficiently robust analysis of contribution and causality

Object of the evaluation

Section A of the assessment tool covers the quality of the description of the object of the evaluation: the specific object (e.g. the project/programme/policy being evaluated), the context, the theory of change, and the identification of stakeholders. The section scored ‘highly satisfactory’ in 19% of reports (25% in 2016), ‘satisfactory’ in 57% (46% in 2016), ‘fair’ in 19% (29% in 2016) and ‘unsatisfactory’ in 5% (1% in 2016). This represents a mixed change in performance since 2016, with more satisfactory reports but also more unsatisfactory and fewer highly satisfactory reports on this measure.

Reports were strongest in describing the context, with improvement in the use of a theory of change for the object being evaluated. Stakeholder analysis was rated marginally weaker than the other areas. As in 2016, the median rating of reports was higher than the mean, indicating a larger body of good-quality reports, with a smaller number of poorer-quality reports pulling down the overall ratings.


Figure 6: Quality ratings for Section A (Object and context) 2017

At the decentralized level, description of the object of the evaluation and the context was a particular strength in MENA and ROSA; elaboration of the theory of change was a strength in HQ, EAPR and ESAR; and identification of stakeholders was a strength in ECAR and MENA.

Strong examples of background sections include: ECAR Albania 2017/001 Evaluation of the “Breaking the cycle of exclusion for Roma children through Early Childhood Development and Education” multi-country project in the Former Yugoslav Republic of Macedonia, Serbia and Albania; ESAR Angola 2017/007 Formative Evaluation of Angola Country Programme (2015-2019); LACRO 2017/006 Multi-Country Evaluation of Early Child Education Policies in Latin America and the Caribbean; ROSA Bangladesh 2017/001 Programme Evaluation of UNICEF Bangladesh Communication for Development (C4D) Programme from 2012 to 2016; WCAR Republic of Cameroon 2017/001 Evaluation Du Programme Wash UNICEF-Cameroun 2013-2016

Table 3: Heat map of average rating of indicators from 2017 evaluation reports for Section A.

Object Context ToC Stakeholders

Mean 78% 82% 78% 76%

Median 83% 89% 100% 83%

ECAR 80% 86% 87% 90%

EAPR 81% 83% 90% 81%

ESAR 71% 79% 90% 68%

LACR 81% 80% 69% 68%

MENA 86% 98% 52% 88%

ROSA 86% 89% 76% 81%

WCAR 77% 76% 67% 67%

HQ 62% 62% 93% 73%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100%=fully, 67%=mostly, 33%=partially, 0%=not.
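A minimal sketch of how these percentages can be computed from indicator ratings, using the four-point compliance scale quoted in the note above (the example section and ratings are illustrative):

```python
# Weights taken from the note above: fully=100%, mostly=67%,
# partially=33%, not=0%.
WEIGHTS = {"fully": 1.00, "mostly": 0.67, "partially": 0.33, "not": 0.00}

def section_compliance(indicator_ratings):
    """Average compliance of one section's indicators, as a percentage."""
    scores = [WEIGHTS[r] for r in indicator_ratings]
    return 100 * sum(scores) / len(scores)

# e.g. a Section A with indicators rated fully, mostly, mostly, partially:
print(round(section_compliance(["fully", "mostly", "mostly", "partially"])))  # 67
```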

Evaluation purpose, objectives and scope

Section B focuses on the evaluation’s purpose, objectives and scope. In 2017, 39% of reports were scored as ‘highly satisfactory’ (38% in 2016), 48% as ‘satisfactory’ (45% in 2016), 10% as ‘fair’ (14% in 2016) and 3% as ‘unsatisfactory’ (4% in 2016). This represents a small but important improvement in quality across the portfolio. According to evaluation theory, the purpose and scope have important implications for the utility of the evaluation, so the improvement in this area is an important result.

Figure 7: Quality ratings for Section B (purpose, objectives and scope) 2017

Across 2017 evaluations, the purpose and the scope (including objectives) were rated equally highly. A clear purpose section was a strength of reports from ECAR, MENA, and ESAR, whereas the strongest examples of a clearly defined scope came from ROSA, ECAR, WCAR, and LACR.

Some strong examples of clear purpose in evaluations include: ECAR Ukraine 2017/001 Evaluation of the Country Programme of Cooperation between the Government of Ukraine and UNICEF 2012-2016; EAPR Fiji (Pacific Islands) 2017/003 Final Evaluation Improving WASH in Solomon Islands (IWASH-SI) Project; ESARO 2017/010 Evaluation of ESAR Institutional Strengthening Support Initiative on Decentralized Programme Monitoring and Response; Evaluation Office 2017/003 Reducing Stunting in Children Under 5 Years of Age: A comprehensive evaluation of UNICEF’s strategies and programme performance – Global synthesis report; LACR Colombia 2017/007 Evaluación de adherencia y costo-efectividad del lineamiento para el manejo integrado de la desnutrición aguda en niños y niñas de 0 a 59 meses de edad, Departamento de La Guajira (2016-2017); MENA Jordan 2017/004 Evaluation of the Ma’an (Together) towards a Safe School Environment Programme 2009-2016; ROSA India 2017/002 Evaluation of UNICEF’s Community Based Disaster Risk Reduction and School Safety Programme, Bihar, India (2011-2016); WCAR Gabon 2017/001 Evaluation des interventions de la composante Politiques Sociales.

Table 4: Heat map of average rating of indicators from 2017 evaluation reports for Section B.

Purpose Scope

Mean 85% 86%

Median 100% 100%

ECAR 97% 93%

EAPR 83% 75%

ESAR 88% 82%

LACR 78% 87%

MENA 90% 83%

ROSA 83% 93%

WCAR 84% 88%

HQ 63% 73%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100%=fully, 67%=mostly, 33%=partially, 0%=not.



Evaluation methodology

Section C assesses a broad range of criteria about the evaluation methodology, including the evaluation criteria, methodological robustness (data collection and analysis) and ethics. At the top end, the methods section shows an improvement since 2015, with 22% of reports rated ‘highly satisfactory’ (12% in 2016) and 40% rated ‘satisfactory’ (46% in 2016). However, methodological quality remains an area where improvements are needed across the wider portfolio, with 38% of reports not yet fully meeting UNICEF standards.

While all reports list a set of data collection and analysis methods sufficient to meet UNICEF standards, far fewer explicitly describe the overall evaluation design (the logic by which methods are combined to derive evaluative findings and conclusions). As a result, many share the same ‘implicit’ design based on qualitative synthesis of evaluation evidence. A tendency was also observed for reports to describe the evaluation as mixed methods when it simply incorporated some quantitative data (such as logframe outputs) into an overall qualitative design, rather than combining and comparing quantitative and qualitative methods of analysis.

Figure 8: Quality ratings for Section C (methods) 2017

Overall, the most frequent strength of reports was regarding specifying and justifying relevant evaluative approaches and evaluation criteria; with significant weaknesses (with the exception of ECAR) in clearly specifying ethical considerations. Reports from ECAR, ROSA and MENA maintained the highest overall quality regarding methodologies.

Only ECAR was consistently strong with regard to ethics, which was among the weakest-rated indicators overall. This is similar to 2016, when MENA was the most consistent. Many evaluations could still be improved by ensuring that the ethical considerations taken are explicitly referenced and discussed in the final report, even where there is no clear risk of harm to participants. UN Evaluation Group ethical guidelines require that evaluations consider independence, impartiality, credibility, conflicts of interest, honesty and integrity, accountability, respect for dignity and diversity, rights, confidentiality, avoidance of harm, accuracy, completeness and reliability, and transparency.

Nineteen evaluations were rated as highly satisfactory with regard to methods, including: ECAR Kazakhstan 2017/001 Evaluation of ECE/ECD Systems in Kazakhstan; EAPR Indonesia 2017/014 Support to community sanitation in eastern Indonesia; ESAR Angola 2017/007 Formative Evaluation of Angola Country Programme (2015-2019); LACR Guatemala 2017/008 Evaluation of UNICEF support to the National Strategy on the Protection of Human Rights of Girls and Boys 2014-2016; and MENA State of Palestine 2017/001 Evaluation for Humanitarian Action for Children.



Table 5: Heat map of average rating of indicators from 2017 evaluation reports for Section C.

Criteria Methods Ethics

Mean 81% 86% 58%

Median 100% 92% 67%

ECAR 93% 93% 95%

EAPR 67% 92% 74%

ESAR 77% 84% 37%

LACR 74% 78% 66%

MENA 81% 88% 79%

ROSA 90% 93% 75%

WCAR 82% 79% 23%

HQ 73% 85% 27%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100%=fully, 67%=mostly, 33%=partially, 0%=not.

Findings

Section D covers the completeness and logic of findings, the quality of analysis, and the integration of results-based management (RBM). Of the 88 reports, 11% were rated ‘highly satisfactory’ (11% in 2016), 66% ‘satisfactory’ (60% in 2016), 17% ‘fair’ (28% in 2016) and 6% ‘unsatisfactory’ (1% in 2016).

Reports most consistently met UNICEF standards in terms of the completeness of findings – responding systematically to evaluation criteria and questions. However, the quality of evaluative analysis and the integration of RBM both require strengthening to more consistently meet UNICEF standards.

Figure 9: Quality ratings for Section D (findings) 2017

Reports from ROSA, EAPR, ESAR and ECAR were found to be strongest in terms of responding systematically to the evaluation matrix; evaluations from ESAR, ROSA and ECAR had the highest-quality evaluative analysis on average; and EAPR was strongest in terms of integrating results-based management. Examples of strong findings sections include ECAR Macedonia 2017/003 Evaluation of UNICEF Roma Health Mediators Programme; EAPR Indonesia 2017/014 Support to community sanitation in eastern Indonesia; ESAR Ethiopia 2017/008 An Impact Evaluation of Alternative Basic Education in Ethiopia; and LACR Dominican Republic 2017/002 Evaluation of the initiative Timely Birth Registration in Prioritized Hospitals in the Dominican Republic.

Table 6: Heat map of average rating of indicators from 2017 evaluation reports for Section D.

        Findings   Analysis   RBM
Mean    83%        77%        74%
Median  100%       83%        83%
ECAR    86%        83%        80%
EAPR    87%        74%        85%
ESAR    86%        85%        71%
LACR    74%        70%        65%
MENA    79%        75%        76%
ROSA    88%        83%        79%
WCAR    81%        70%        76%
HQ      83%        67%        50%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Conclusions and lessons learned

Section E was found to be one of the weakest sections overall: conclusions were generally the stronger element, while the absence or misinterpretation of lessons often accounted for reports rating ‘fair’ in this section. 26% of reports were rated ‘highly satisfactory’ (24% in 2016) and 28% ‘satisfactory’ (32% in 2016); in both cases these reports included relevant conclusions and lessons. 39% of reports were rated ‘fair’ (40% in 2016), either because lessons were missing or because conclusions did not add substantive analysis beyond the findings. 7% of reports were rated ‘unsatisfactory’ (5% in 2016).

Figure 10: Quality ratings for Section E (conclusions and lessons) 2017

Conclusions were found to be a strength of reports from ECAR, WCAR, HQ and EAPR. While lessons learned were weaker overall, the most consistent quality was found in evaluations from ECAR. Some examples of strong reports are ECAR Moldova 2017/001 Evaluation of the Government of Moldova - UNICEF 2013-2017 Country Programme of Cooperation; EAPR Cambodia 2017/012 Evaluation of the UNDAF Cycles 2011-2015 and 2016-2018 in Cambodia; LACR Honduras 2017/001 Evaluación de implementación estrategia Retorno de la Alegría para la niñez migrante en Honduras; and WCAR Nigeria 2017/002 Impact Evaluation of UNICEF Nigeria Girls’ Education Project Phase 3 (GEP3) Cash Transfer Programme (CTP) in Niger and Sokoto States.




Table 7: Heat map of average rating of indicators from 2017 evaluation reports for Section E.

        Conclusions   Lessons
Mean    81%           57%
Median  89%           67%
ECAR    86%           85%
EAPR    81%           54%
ESAR    75%           38%
LACR    79%           67%
MENA    79%           52%
ROSA    79%           57%
WCAR    84%           49%
HQ      82%           40%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Recommendations

Section F was adapted in 2016 to assess recommendations separately from lessons learned, which makes comparison with earlier years unreliable. 22% of reports were rated ‘highly satisfactory’ (16% in 2016), 60% ‘satisfactory’ (67% in 2016), 18% ‘fair’ (15% in 2016), and 0% ‘unsatisfactory’ (2% in 2016). Overall, a similar percentage of reports (82% in 2017, 83% in 2016) included recommendations that meet UNICEF standards, but the quality within these sets (meet/not meet) has improved. There were no unsatisfactory recommendations – an important contributing factor to utility.

Figure 11: Quality ratings for Section F (recommendations) 2017

Recommendations remained stronger in the presentational elements required to support utility (targeting, prioritization) than in their analytical grounding in the findings and conclusions. Recommendations from ECAR were found to be the best quality in terms of both analysis and presentational factors. Recommendations from EAPR, WCAR and ESAR were well structured; recommendations from MENA were strong in terms of analysis.




Examples of strong individual reports from across the regions and HQ include: ECAR Republic of Montenegro 2017/001 Evaluation of the Programme “Montenegro – Investment case on Early Childhood Development”; EAPR Indonesia 2017/014 Support to community sanitation in eastern Indonesia; ESAR Burundi 2017/005 Evaluation finale du projet de lutte contre la malnutrition dans la province de Ngozi au Burundi (mai 2013- décembre 2016); Evaluation Office 2017/010 Endline evaluation of the H4+ Joint Programme Canada and Sweden (Sida) 2011-2016; LACR Guatemala 2017/008 Evaluation of UNICEF support to the National Strategy on the Protection of Human Rights of Girls and Boys 2014-2016.

Table 8: Heat map of average rating of indicators from 2017 evaluation reports for Section F.

        Analysis   Presentation
Mean    74%        80%
Median  72%        100%
ECAR    85%        97%
EAPR    79%        81%
ESAR    75%        80%
LACR    74%        78%
MENA    83%        74%
ROSA    68%        62%
WCAR    65%        81%
HQ      58%        63%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Report structure, logic and clarity

Section G focuses on the structure, logic and clarity of the report, including style and presentation. The ratings show that reports improved again on this parameter, with 87% of 2017 reports meeting UNICEF standards compared to 74% in 2016. 31% of reports were found to be ‘highly satisfactory’, 56% ‘satisfactory’, 12% ‘fair’, and 1% ‘unsatisfactory’.

Figure 12: Quality ratings for Section G (structure, logic and clarity) 2017



Once again, the structure of reports was found to be a consistent, high-quality feature, and the majority of reports were also complete to UNICEF standards. Comprehensive reports were a particular strength in ECAR, MENA and HQ, whereas the logic and style of reports was a strength of ECAR, MENA, WCAR and ESAR evaluations. Some examples of well-structured reports include ECAR Serbia 2017/005 Summative evaluation to strengthen implementation of justice for children system in the Republic of Serbia (2010-2017); EAPR Philippines 2017/001 Evaluation of the UNICEF Philippines Country Office ‘Early Childhood Care and Development’ and ‘Basic Education’ components of the 7th GPH-UNICEF Country Programme 2012-2016; ESAR Somalia 2017/002 Real time evaluation of UNICEF SCO humanitarian response to the pre-famine crisis; and WCAR Guinea Bissau 2017/003 Final Evaluation of the Community Health Component of the “Programme for reducing Maternal and Infant Mortality (PIMI)” in Guinea-Bissau.

Table 9: Heat map of average rating of indicators from 2017 evaluation reports for Section G.

        Completeness   Structure
Mean    83%            91%
Median  83%            100%
ECAR    95%            95%
EAPR    85%            85%
ESAR    83%            92%
LACR    71%            87%
MENA    86%            95%
ROSA    71%            88%
WCAR    81%            94%
HQ      87%            90%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Evaluation Principles (HRBAP, gender equality and equity)

Section H brings together the UNICEF commitments to human rights-based approaches to programming (HRBAP), equity and gender equality. Gender equality is assessed through integration of the UN-SWAP evaluation performance indicator, including four assessment criteria specified by the UN Evaluation Group. UN-SWAP performance and trends are also analysed in a standalone sub-section of the findings later in the report.

7% of reports were rated as ‘highly satisfactory’ (9% in 2016), 38% of reports were rated ‘satisfactory’ (30% in 2016), 42% ‘fair’ (38% in 2016), and 14% ‘unsatisfactory’ (24% in 2016). Thus, while 56% of reports do not currently meet UNICEF requirements for integration of human rights and gender equality, this is a small but important improvement since 2016 (62% of reports) – especially in terms of the 10 percentage point reduction in unsatisfactory reports.


Figure 13: Quality ratings for Section H (evaluation principles) 2017

Although integration of evaluation principles requires improvement, there were some areas of strength. The integration of both human rights-based approaches and gender equality was strongest in ECAR evaluation reports, while the integration of gender equality and equity (analysing the differentiated effects of interventions on various socially defined groups) was strongest in MENA, as was also the case in 2016.

The six reports rated ‘highly satisfactory’ were: ECAR Moldova 2017/001 Evaluation of the Government of Moldova - UNICEF 2013-2017 Country Programme of Cooperation; ECAR Republic of Montenegro 2017/001 Evaluation of the Programme “Montenegro – Investment case on Early Childhood Development”; ECAR Ukraine 2017/001 Evaluation of the Country Programme of Cooperation between the Government of Ukraine and UNICEF 2012-2016; LACR Ecuador 2017/001 Evaluación sumativa de los servicios de desarrollo infantil de los Centros Infantiles del Buen Vivir (CIBV) y Creciendo con Nuestros Hijos (CNH); MENA Jordan 2017/004 Evaluation of the Ma’an (Together) towards a Safe School Environment Programme 2009-2016; and ROSA Bangladesh 2017/001 Programme Evaluation of UNICEF Bangladesh Communication for Development (C4D) Programme from 2012 to 2016.

Table 10: Heat map of average rating of indicators from 2017 evaluation reports for Section H.

        HRBAP   Gender and Equity
Mean    66%     71%
Median  67%     79%
ECAR    91%     86%
EAPR    67%     75%
ESAR    53%     57%
LACR    65%     72%
MENA    71%     89%
ROSA    65%     70%
WCAR    58%     65%
HQ      51%     57%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Overall, evaluation principles continue to represent the weakest performing area of evaluation reports, which is consistent with previous years. However, the long-term trend of incremental improvement has also been maintained (see Figure 14). As with 2016, while the average performance of reports is similar in terms of integrating HRBAP and gender & equity, there is greater consistency in the responsiveness of evaluations to gender.



Figure 14: Long term patterns in the inclusion of human rights based approaches and gender & equity (reports rated as satisfactory or highly satisfactory). [Chart data – Human rights-based approach: 2010 19%, 2011 34%, 2012 44%, 2013 50%, 2014 57%, 2015 33%, 2016 67%, 2017 66%. Gender equality & equity: 2010 20%, 2011 34%, 2012 46%, 2013 52%, 2014 51%, 2015 33%, 2016 66%, 2017 71%. Linear trend lines shown for both series.]

Executive summary

The final section of analysis, Section I, gives additional focus to the executive summary as a key element of utility and the main point of interaction between many evaluations and senior decision makers. 65% of reports had executive summaries that met UNICEF standards, a decline from 81% in 2016 and an issue of concern. 25% of reports were rated ‘highly satisfactory’ (24% in 2016), 40% ‘satisfactory’ (57% in 2016), 32% ‘fair’ (18% in 2016) and 2% ‘unsatisfactory’ (1% in 2016).

Figure 15: Quality ratings for Section I (executive summary) 2017

Good quality executive summaries were a particular feature of evaluation reports from ECAR and MENA, as was the case in 2016. Executive summaries from ESAR were also above average quality, which represents an improvement for 2017. In general, the main ways to improve executive summaries are to ensure all key elements are covered (such as limitations or unexpected results) whilst keeping them as concise as possible.

Some individual strong examples of executive summaries are: ECAR Azerbaijan 2017/003 Summative Evaluation of the Access to Justice for Children project; ESAR Zambia 2017/002 Mid-Term Evaluation of the Millennium Development Goal Initiative Accelerating Progress towards maternal, neonatal and child mortality reduction in Zambia; LACR El Salvador 2017/001 Evaluacion Del Programa Triple E: Educacion y Desarrollo Integral De Primera Infancia, Empoderamiento De Familias y Jovenes y Entorino Protector Comunitario En Comunidades Seleccionadas;



MENA Egypt 2017/001 The Evaluation of Meshwary Project Phase II; ROSA Afghanistan 2017/003 Evaluation of WASH in Schools; and WCAR Togo 2017/008 Evaluation d’impact du projet pilote des transferts monétaires au Togo.

Table 11: Heat map of average rating of indicators from 2017 evaluation reports for Section I

        Executive summary
Mean    77%
Median  78%
ECAR    85%
EAPR    74%
ESAR    81%
LACR    69%
MENA    84%
ROSA    75%
WCAR    75%
HQ      67%

Percentages represent the degree of compliance with UNICEF standards of all indicators in that section (100% = all indicators fully comply with UNICEF standards). Compliance is based on 100% = fully, 67% = mostly, 33% = partially, 0% = not.

Overall regional performance

Comparison of evaluation reports across the UNICEF regions reveals a reduction in the number of reports from ESAR (normally the largest region for evaluation), but an improvement in the average quality of the reports the region submitted. ROSA and EAPR also produced slightly fewer reports, but improved in the proportion of reports reaching at least a satisfactory level. By comparison, there was a large increase in the number of reports from ECAR, with 7 of these rated ‘highly satisfactory’ (1 in 2016), although – unlike in 2016 – there were also 3 ‘fair’ reports. WCAR also increased its number of reports, but the ‘additional’ evaluations were rated ‘fair’; LACR saw an increase in the number and percentage of ‘fair’ evaluation reports; and MENA submitted one additional report, which was rated ‘highly satisfactory’. The main difference relates to the Evaluation Office, with a significant reduction in the number of corporate evaluations completed in 2017.

No particular indicator in the evaluation quality assessment was a predictor of the overall rating of a report. However, all reports rated ‘highly satisfactory’ overall shared two attributes: they were rated ‘highly satisfactory’ both for the inclusion of a comprehensive theory of change and for the elaboration of a clear purpose.


Figure 16: Distribution of 2017 evaluation report quality across the UNICEF regions

An analysis of the object of each evaluation indicates that all regions included either a country programme evaluation or a thematic evaluation within a country programme. However, the distribution of more strategic evaluations is uneven: some regions have a predominance of project and programme evaluations, whilst others – such as ROSA – have fewer evaluations overall but a higher proportion of strategic evaluations.

Figure 17: Number of evaluations assessing different evaluation objects by region, 2017

Overall thematic performance

Quantitative analysis of alignment with Strategic Plan Outcomes (including cross-cutting issues) found that most evaluations cover multiple thematic areas. Similar distributions of quality were found across all thematic areas.



The largest body of evaluative knowledge was generated for health and education, followed by child protection and social inclusion (see Figure 18). These same areas were also the most covered in 2016 evaluation reports. The priority areas for action to improve all reports are similar, with weaknesses in the articulation of human rights-based approaches (HRBAP), gender equality, ethics, and lessons learned. As expected, evaluations that successfully mainstreamed gender equality as a cross-cutting theme were also strongest regarding HRBAP and equity.

Figure 18: Distribution of 2017 evaluation report coverage across the UNICEF thematic areas

UN-SWAP performance and trends

UN-SWAP evaluation performance indicator (EPI) criteria were assessed according to the standards and scoring system established by UNEG.7 The criteria cover the integration of gender equality and the empowerment of women (GEEW) within: the evaluation scope of analysis and indicators (Q1); evaluation criteria and questions (Q2); methodology, methods and tools (Q3); and findings, conclusions and recommendations (Q4).

Each UN-SWAP EPI criterion was rated 0-3, providing an overall evaluation score of 0-12. The mean score of all evaluation reports assessed for UN-SWAP is used to calculate the overall performance of UNICEF evaluations according to the following scale:

❖ 0–3.5 points = Missing requirements
❖ 3.51–7.5 points = Approaches requirements
❖ 7.51–10.5 points = Meets requirements
❖ 10.51–12 points = Exceeds requirements
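Expressed as a short sketch (the function and the example scores below are illustrative only, and not part of the GEROS or UN-SWAP tooling), the classification logic is:

    def unswap_classification(criterion_scores):
        """Classify an evaluation from its four 0-3 UN-SWAP EPI criterion scores."""
        assert len(criterion_scores) == 4
        total = sum(criterion_scores)  # overall score, 0-12
        if total <= 3.5:
            return total, "Missing requirements"
        if total <= 7.5:
            return total, "Approaches requirements"
        if total <= 10.5:
            return total, "Meets requirements"
        return total, "Exceeds requirements"

    # Example: criterion scores of 2, 1, 1, 2 give a total of 6, which falls
    # in the 'Approaches requirements' band - close to the 2017 mean of 6.15.
    print(unswap_classification([2, 1, 1, 2]))  # (6, 'Approaches requirements')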

The aggregated average score for 2017 was 6.15, which is classified as ‘approaches requirements’. This is the same classification as in the 2016 and 2015 cycles, indicating that advancing the inclusion of gender equality within UNICEF evaluations remains a challenge. The full UN-SWAP calculations table is included in Annex 4.

Reports were slightly stronger with regard to integrating gender in the scope, indicators, criteria and questions of evaluations. The priority for action to improve UN-SWAP performance remains to ensure that gender analysis is used to inform evaluation findings, conclusions and recommendations.

7 Two UNICEF evaluation reports were included in a 2016 independent assessment by UNEG of the application of the UN-SWAP EPI criteria across all UN entities; and were found to have been rated in accordance with the required standards.


Reports from MENA and ROSA represent good practice in this regard, rating significantly above the organizational average for UN-SWAP overall (see Table 13).

Table 12: Performance according to UN-SWAP evaluation criteria, 2017

Indicator                 n    Average score   Classification
Scope and indicators      88   1.50            Satisfactorily integrated
Criteria and questions    88   1.65            Satisfactorily integrated
Methods and tools         88   1.45            Partially integrated
Gender analysis           88   1.55            Satisfactorily integrated
Overall                   88   6.15            Approaches requirements

Figure 19: Performance of UNICEF reports across the UN-SWAP overall and individual UN-SWAP criteria. [Chart data – number of reports per rating, by criterion (Scope and indicators / Criteria and questions / Methods / Analysis): fully integrated (3 points) 14 / 19 / 8 / 13; satisfactorily integrated (2 points) 23 / 31 / 28 / 29; partially integrated (1 point) 44 / 26 / 48 / 39; not at all integrated (0 points) 7 / 12 / 4 / 7.]
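As a cross-check, the per-criterion averages in Table 12 above can be recomputed from these rating counts. This is a worked illustration only; the counts are as reconstructed from the chart.

    # criterion: [reports rated 3, rated 2, rated 1, rated 0]
    counts = {
        "Scope and indicators":   [14, 23, 44, 7],
        "Criteria and questions": [19, 31, 26, 12],
        "Methods and tools":      [8, 28, 48, 4],
        "Gender analysis":        [13, 29, 39, 7],
    }

    for criterion, (n3, n2, n1, n0) in counts.items():
        n = n3 + n2 + n1 + n0                 # 88 reports per criterion
        mean = (3 * n3 + 2 * n2 + 1 * n1) / n
        print(f"{criterion}: mean = {mean:.2f}")

    # Prints 1.50, 1.65, 1.45 and 1.55 - matching Table 12 and summing
    # to the overall 2017 average of 6.15.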

In 2017, four evaluations were rated as fully integrating UN-SWAP requirements for all criteria: Moldova 2017/001, Evaluation of the Government of Moldova - UNICEF 2013-2017 Country Programme of Cooperation; Republic of Montenegro 2017/001, Evaluation of the Programme “Montenegro – Investment case on Early Childhood Development”; Azerbaijan 2017/003, Summative Evaluation of the Access to Justice for Children project; and Ukraine 2017/001, Evaluation of the Country Programme of Cooperation between the Government of Ukraine and UNICEF 2012-2016.

Table 13: Regional variations in report performance according to UN-SWAP criteria, 2017

        UN-SWAP score   Classification   Variation from average
ECAR    7.63            Meeting          +1.48 (above)
EAPR    4.88            Approaching      -1.27 (below)
ESAR    5.27            Approaching      -0.88 (below)
LACR    5.62            Approaching      -0.53 (below)
MENA    8.86            Meeting          +2.71 (above)
ROSA    6.43            Approaching      +0.28 (above)
WCAR    5.24            Approaching      -0.91 (below)
HQ      6.40            Approaching      +0.25 (above)



Other observable patterns

This section explores associations between evaluation report quality and other characteristics of evaluations. Several axes have been considered, including geographic scope, management of the evaluation, purpose, level of results, and evaluation design, based on the draft UNICEF evaluation taxonomy. The classification of reports is derived from the quality assessments.

Most evaluations (82%) are managed directly by UNICEF; of these, 76% were rated as fully meeting UNICEF standards. These patterns are almost identical to 2016. The remaining evaluations either had unclear management arrangements (7%), were managed jointly with another development partner (5%), or were managed jointly with the government (6%). Once again, in 2017 there were no purely quantitative evaluations; most evaluations used mixed methods, with 17% purely qualitative (19% in 2016). Unlike in previous years, however, there was no difference in the quality of reports between these methodological approaches.

Figure 20: Proportion and number of evaluations using different methodological designs across UNICEF thematic areas, 2017. [Chart data – qualitative / mixed-methods counts by thematic area: WASH 3/20; Nutrition 4/26; Health 9/39; Social inclusion 7/29; Child protection 8/30; HIV/AIDS 3/10; Education 9/30; Gender equality (cross-cutting) 7/17; Humanitarian (cross-cutting) 5/9.]

The type of evaluation (Table 14) was strongly associated with the quality of evaluation reports, though the patterns varied from previous years. Project evaluations improved in both quality and number, as did strategy evaluations. Country programme and joint programme evaluations both declined in number but retained exactly the same level of quality as in 2016. By comparison, programme and pilot/innovation evaluations declined in both number and quality.

Table 14: The number of different types of evaluation in 2017 and 2016, and the percentage of reports meeting UNICEF standards

Type                Number 2017   Change since 2016   Percent meeting standards   Change since 2016
Programme           24            -5                  67%                         -7%
Project             15            +7                  75%                         +25%
Country Programme   10            -5                  83%                         0%
Strategy            8             +5                  89%                         +29%
Pilot/innovation    2             -2                  50%                         -50%
Thematic area       3             +1                  75%                         +8%
Joint programme     1             -4                  100%                        0%
Organisational      0             -1                  –                           –
System              0             -4                  –                           –
Policy/norms        0             -2                  –                           –




2017 saw a large increase in the number of quasi-experimental evaluations, which were correlated mostly with project-level evaluations. By comparison, the number of participatory evaluations declined, but their quality was maintained. The main improvement in quality related to evaluations with an action-research-based design, while the quality of case study evaluations dropped to 59% meeting UNICEF standards (71% in 2016). Case study designs are correlated with programme evaluations, which were displaced by project evaluations in 2017.8

Figure 21: The number of different designs of evaluation in 2017 and the percentage of reports meeting UNICEF standards

Across the portfolio, only 3 evaluations were confined to assessing results at the output level (6 in 2016), with two of the three meeting UNICEF standards (Table 15). By comparison, 42 evaluations attempted to assess results at the outcome level (50 in 2016) and 43 included the impact criterion (41 in 2016). The overall quality of evaluations at different levels was similar.

Table 15: Number of reports meeting UNICEF standards for different levels of evaluation in 2017

                      Output   Outcome   Impact
Highly Satisfactory   1        5         7
Satisfactory          1        26        23
Fair                  1        11        13
Unsatisfactory        0        0         0
Sum                   3        42        43

8 This is partly explained by the smaller number of corporate evaluations in 2017, which historically have tended to use case study designs and to achieve highly satisfactory ratings.




Most evaluations, by a large margin, were scoped at the national level: 77 reports were national-level evaluations, and 72% of these fully met UNICEF standards (77% in 2016). There were fewer multi-country/sub-regional evaluations than in previous years (4 in 2017), but all of them met UNICEF standards. Only two global corporate evaluations were submitted for rating in 2017; both were rated ‘satisfactory’. The reduced number of corporate evaluations (7 were published in 2016) has implications for the overall statistics on evaluation report quality, because corporate evaluations have historically been of high quality. Two evaluations were published by Programme Division – one global and one multi-country – and one by Supply Division, adding an important dimension to the evaluative knowledge available to UNICEF. Only one report, a programme evaluation, was rated ‘highly satisfactory’ across all GEROS criteria: Republic of Montenegro 2017/001, Evaluation of the Programme “Montenegro – Investment case on Early Childhood Development”.


Conclusions and Recommendations

Conclusion 1: UNICEF evaluation reports in 2017 maintained the quality and coverage of the previous year, while being fewer in number and less strategic in scope, due to an increase in the number and proportion of project evaluations.

The GEROS system, including the UN-SWAP assessment, was applied to 100% of evaluation reports uploaded to the Evaluation and Research Database9. The overall picture of GEROS data for 2017 reveals consistent performance compared with 2016. The portfolio of evaluations has changed in some ways – there are more quasi-experimental and project evaluations than previously, and fewer corporate evaluations – but the key indicators of performance have remained constant. The percentage of reports meeting UNICEF standards is similar to 2016, as are the overall UN-SWAP score and the number of countries covered.

Evaluations rated ‘unsatisfactory’ were eliminated in 2017, and there was an increase in the number of reports rated ‘highly satisfactory’. However, most of this improvement is within the band of reports that already met UNICEF standards. There was an increase in the number of lower-level project evaluations applying high-quality methods (such as quasi-experimental designs), but this came at the cost of fewer strategic evaluations (such as country-led or thematic evaluations). In addition, some consistent weaknesses remain across the portfolio: the integration of UNICEF principles (human rights-based approaches, gender equality, and equity), the articulation of evaluation designs and limitations, the analysis of unexpected/unintended results, and the inclusion of lessons. These key factors appear to be preventing further improvement across the overall universe of evaluation reports.

Recommendation 1: In aligning the evaluation function to the UNICEF Strategic Plan 2018-2021, incentivize and support the use of more strategic evaluations by re-focusing away from project and output-level evaluations.

EVALUATION OFFICE

Longitudinal review of UNICEF meta-analyses reveals some norms and standards that have been consistently achieved for a prolonged period. These include indicators around predominantly structural and narrative elements, such as report structure, the completeness of findings, and the inclusion of basic information. Consistently delivering on these has meant that evaluation reports, including project-level evaluations, no longer rate as ‘unsatisfactory’.

However, other indicators are either inconsistent or represent persistent weaknesses across time and reports. The inconsistent indicators broadly relate to the more analytical and critical elements of the GEROS framework: the specification of the evaluation design, theories of change, detailed information on sampling approaches and sources, the inclusion of ethics, and the elaboration of limitations. Persistent weaknesses include the use of human rights, equity and gender analyses, the exploration of unexpected effects, the elaboration of lessons learned, and the description of the process for developing recommendations.

In combination, these two trends mean that GEROS has largely achieved its original aim of ensuring that UNEG/UNICEF evaluation report standards are consistently applied, but that the GEROS system is not incentivizing UNICEF business areas to undertake more complex and strategic evaluations, such as country-led, thematic, or multi-country evaluations. With the increasingly systematic application of basic reporting standards even for low-budget evaluations, project evaluations are more frequently attaining a ‘satisfactory’ rating than more complex evaluations – incentivizing lower-level evaluations.

9 Reports are classified as evaluations or research by submitters, and checked (and reclassified if necessary) by the Evaluation Office.



Incentivizing more complex evaluations also requires addressing current weaknesses in terms of clear evaluation designs that genuinely apply mixed methods for both data collection and analysis. In particular, reports can better explain how different methods were sequenced and combined to achieve triangulation, and how specific methods were applied to examine criteria such as efficiency. The Evaluation Office can review the current guidance and determine whether refreshed or new material on designs for strategic evaluations, including evaluations of innovations, may be required to support this.

Conclusion 2: While the integration of human rights based approaches and gender equality commitments continues to improve over time, the pace of this change is insufficient to meet UNICEF targets, including for UN-SWAP.

As with the previous meta-analysis, the UN-SWAP evaluation performance indicator has remained static for 2017. Other indicators, and trend analysis over time, do indicate that incremental improvements are being made – but these are not at the pace required to fully meet UNICEF commitments. Accelerating the integration of gender-responsive and human rights-based evaluation designs and analysis remains a challenge across nearly all regions; although lessons may be available from the improvement demonstrated by ECAR in 2017.

Recommendation 2: To ensure that no child is left behind and to deliver on the UNICEF equity agenda, initiate urgent action to overcome persistent bottlenecks and to strengthen the full integration of HRBAP, equity and UN-SWAP requirements in all evaluations using UN Evaluation Group guidance and good practices.

EVALUATION OFFICE, AND REGIONAL EVALUATION AND GENDER ADVISORS

The achievements of UNICEF evaluations in consistently meeting or exceeding standards for purpose and objectives demonstrate the capacity of the evaluation function to improve final report quality through improved evaluation management. For example, terms of reference can be used to ‘set evaluations up’ to better meet UNICEF requirements. The same level of success has not been achieved in integrating gender equality, human rights and equity. Equity analysis is especially inconsistent, which is a particular concern for UNICEF. The meta-analysis strongly recommends that the Evaluation Office and regional advisors work with all evaluation managers to apply current UN Evaluation Group guidance10 on integrating human rights and gender equality across the full evaluation cycle: from planning, to the ToR, to recruiting evaluators sufficiently experienced in the required standards, to quality assurance of evaluation processes and products.

Recommendation 3: Reassess the integration of gender, human-rights and equity indicators within the GEROS assessment tool, with a view to generating more detailed insights on the bottlenecks to delivering UNICEF commitments.

EVALUATION OFFICE

While the current UN-SWAP indicators, and the dedicated gender, human rights and equity questions within the GEROS assessment tool, provide an indication of overall performance and trends, the challenge of improving this performance may require more disaggregated analysis of bottlenecks than is currently available. It may help, therefore, to revisit the current set of indicators on principles, with a view to providing more nuanced insights into where in the ‘evaluative chain’ the biggest opportunities for improvement lie.

10 A growing body of material is available on concrete approaches to implementing human rights and gender equality in evaluations. This includes the normative (theory-based) guidance Integrating Human Rights and Gender Equality in Evaluation – Towards UNEG Guidance; a UNEG handbook; the revised UNICEF evaluation report standards; a guidance manual from UN Women on managing gender-responsive evaluations; and recent good practice guidance from across the UN system compiled by UNEG. More broadly, Better Evaluation provides a list of materials on gender analysis: https://www.betterevaluation.org/en/search/site/gender%20equality.



Conclusion 3: Inconsistency in the inclusion and quality of lessons learned has important implications for both the quality assessment of reports and the utility of evaluations.

Comparison of GEROS data for 2016-2017 (where lessons are included in the same section as conclusions) and 2012-2015 (where lessons were included in the same section as recommendations) reveals that the inclusion of properly formulated lessons learned is the most inconsistent element of UNICEF evaluation reports. Without high quality generalizable lessons learned, evaluation evidence is useful mostly for the individual intervention being evaluated and as part of an overall picture of UNICEF effectiveness.

Not all evaluations have lessons, and not all terms of reference call for evaluation reports to provide them – but the current UNICEF standards include the expectation that lessons will be provided. Where lessons are present, they frequently focus on the intervention and its immediate context – reflecting the limited view of the evaluation team rather than the wider knowledge needs of UNICEF as an institution.

The exclusion of lessons limits the maximum rating of Section E in the quality assessment template to ‘fair’; and Section E carries a weighting of 15% of the final rating for the report. Given the increasing consistency in the presence of other elements within evaluation reports, lessons learned thus have an important influence on the final rating. Not all evaluations have designs that are well suited to generating lessons, and some have a scope within which it may not be feasible or useful to develop ‘original knowledge’. Evaluations submitted to GEROS are also unclear on whether reported lessons have been ‘identified’ or actually ‘learned’ (i.e. incorporated into programming and management).
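To illustrate the arithmetic, the sketch below applies the section weights listed in Annex 2 to a hypothetical report. The 0-3 numeric mapping of rating levels and the 5% weight assumed for Section I (the executive summary, whose weight is not stated in the annex) are assumptions made for illustration only.

    # Section weights from Annex 2 (A-H); the 5% for Section I is assumed.
    WEIGHTS = {"A": 0.05, "B": 0.05, "C": 0.15, "D": 0.20, "E": 0.15,
               "F": 0.15, "G": 0.05, "H": 0.15, "I": 0.05}
    # Assumed numeric mapping of the four GEROS rating levels.
    VALUE = {"highly satisfactory": 3, "satisfactory": 2,
             "fair": 1, "unsatisfactory": 0}

    def overall_score(section_ratings):
        """Weighted mean of section ratings on the assumed 0-3 scale."""
        return sum(WEIGHTS[s] * VALUE[r] for s, r in section_ratings.items())

    ratings = {s: "satisfactory" for s in WEIGHTS}
    print(round(overall_score(ratings), 2))  # 2.0 - satisfactory throughout
    ratings["E"] = "fair"                    # missing lessons cap Section E
    print(round(overall_score(ratings), 2))  # 1.85 - a 0.15 drop from E alone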

The current assumption within the evaluation standards (which form the basis for GEROS) that the blanket inclusion of lessons learned is desirable may not, therefore, be appropriate. However, there is currently no guidance on which types and levels of evaluation should include lessons learned, other than whether they are included as a requirement in the Terms of Reference (ToR). ToRs are also inconsistent with regard to lessons, and so there is a gap in ensuring that evaluations contribute to the wider knowledge management function.

Recommendation 4: Clarify UNICEF standards regarding which types of evaluations are required to include lessons learned, and facilitate knowledge exchange to better support the development and sharing of lessons.

EVALUATION OFFICE

Building on a similar recommendation from the previous meta-analysis, it is recommended that the Evaluation Office review the standards on including lessons in all evaluation reports. The determination of which evaluations are required to include lessons learned (in both ToRs and final reports) should be accompanied by clear guidance that is useful to evaluation managers and can be incorporated into the GEROS standards (and reflected in the advice provided by regional helpdesks).

In addition to incorporating these reviewed standards on lessons into GEROS, where lessons learned are required, further knowledge exchange on good practices – and improved tools, guidance or templates – may be developed to address current inconsistencies in the understanding of what constitutes a lesson. Moreover, the Evaluation Office should consider ways to enhance the dissemination of good practices and lessons from evaluations, to demonstrate the value of this evaluation purpose.


Appendices

Annex 1. Terms of Reference UNICEF Evaluation Office

Terms of Reference: Long Term Agreement (LTA) for the implementation of the Global Evaluation Reports Oversight System (GEROS). Period 2015-2018

1) Background and Introduction

The evaluation function seeks to strategically contribute to UNICEF’s performance by providing good-quality evidence for learning, decision making, policy advocacy, as well as accountability purposes. Reflecting the decentralized nature of UNICEF, the majority of evaluations supported by UNICEF are managed at a decentralized level. While the decentralized nature of the evaluation function ensures that evidence generated is relevant to the local context and therefore more likely to inform national policies for children, it poses the challenge of setting up a consistent corporate system to ensure good quality and credibility.

UNICEF’s Evaluation Office (EO) has had the Global Evaluation Reports Oversight System (GEROS) in place since 2010. GEROS is aimed at monitoring the impact of efforts to strengthen the UNICEF evaluation function globally. The system consists of rating evaluation reports commissioned by UNICEF Country Offices, Regional Offices and HQ divisions against the UNICEF/UNEG Evaluation Report Standards. All reports and the results of their quality assessment are made available in the UNICEF Global Evaluation and Research Database (EDB), as well as publicly on the UNICEF external website. GEROS is an organization-wide system (annex 1).

The Global Evaluation Reports Oversight System (GEROS) has four main objectives:

1. Provide senior managers with a clear and short independent assessment of the quality and usefulness of individual evaluation reports, including those commissioned by their own offices;

2. Strengthen internal evaluation capacity by providing commissioning offices with feedback and practical recommendations on how to improve future evaluations. Commissioning offices can also use the feedback received to better assess the performance of external consultants to be hired for future evaluations;

3. Report on the quality of evaluation reports, by reviewing and assessing the quality of final evaluation reports commissioned by UNICEF offices. The quality of evaluation reports is reported to senior management mainly through three channels: a) the annual report of the EO Director to the Executive Board; b) the Global Evaluation Dashboard; and c) inclusion of this information in the Global Evaluation Database;

4. Contribute to corporate knowledge management and organizational learning, by identifying evaluation reports of satisfactory quality to be used in meta-analyses shared within the organization, as well as by facilitating internal and external sharing of satisfactory evaluation reports.

The GEROS will be subject to an external assessment which will take place from June to September this year. The assessment will aim to: analyze the extent to which the GEROS has achieved its expected objectives; determine the conceptual clarity of the GEROS approach and the adequacy of its methodology and tools; and identify constraining and enabling factors for an effective system implementation. The assessment is also expected to propose recommendations and lessons to inform the forthcoming cycles of GEROS implementation.

Against this backdrop, the Evaluation Office is looking for an institution to:

i) Based on the findings, recommendations and lessons proceeding from the external assessment referred to earlier, review and adjust the GEROS approach, methodology and tools, in close collaboration with the Systemic Strengthening Unit of the Evaluation Office.



ii) Guided by the revised GEROS approach, review and rate the quality of final evaluation reports supported by UNICEF globally, at country offices, regional offices and HQ divisions; and provide feedback aimed at increasing the quality of future evaluation reports.

iii) Further rate each evaluation report in terms of its compliance with the UN-SWAP assessment tool.

iv) Review the terms of reference and inception reports produced by the regional and country offices, upon demand, as part of the regional technical assistance facilities.

2. Expected deliverables

The selected institution is expected to deliver the following:

i) Adjusted methodological approach and tools for GEROS, based on the findings and recommendations of the external assessment.

ii) Reviews and ratings of all evaluation reports submitted within a one-year timeframe, up to a maximum of 150 per year, in English, French and Spanish. The ability to rate evaluation reports in Arabic and Russian is desirable.

iii) Ratings of all final evaluation reports using the UN-SWAP assessment and meta-evaluation tool.

iv) Executive feedback for each of the evaluations reviewed, including the rating of each report using the UN-SWAP tool.

v) An annual global analysis of trends, key weaknesses and strengths informing whether UNICEF is meeting the requirements and criteria as set out within the UN-SWAP;

vi) A meta-analysis on the quality of evaluation reports for each GEROS one-year cycle. The meta-analysis is expected to include a global analysis of trends, key weaknesses and strengths of the reports reviewed, including a sectoral/thematic analysis, lessons learned and good practices on evaluation reports; and actionable recommendations to improve the GEROS system as well as the quality of evaluation reports.

vii) Reviews of and feedback on terms of reference and inception reports upon demand by Regional Offices, as part of the regional facilities.

3. Management of the system

The Senior Evaluation Specialist, Systemic Strengthening, with support from the Knowledge Management Specialist of UNICEF’s Evaluation Office, will have responsibility for the overall management of GEROS.

The selected institution is expected to appoint a project manager and to establish an explicit internal quality assurance system to ensure consistency of rating, quality and timely delivery of expected products; and overall coordination with UNICEF Evaluation Office. The project manager is expected to provide monthly updates including a tracking matrix highlighting the status of reviews, rating and executive feedback. At the end of each review cycle there will be a formal review and feedback process with both the Evaluation Office and the institution as part of the oversight and accountability role of such a global system.

4. Qualifications

Excellent and proven knowledge of evaluation methodologies and approaches

Proven experience in the design and implementation of quality assurance systems, preferably with UN agencies

Proven experience in designing and conducting major evaluations

Excellent analytical and writing skills in English, French and Spanish; Arabic and Russian are desirable

Familiarity with UNEG/UNICEF evaluation standards is an asset

Sectorial knowledge of UNICEF areas of intervention (Child Protection, HIV-AIDS, WASH, Education, Nutrition, Health, Social Inclusion, Gender Equality and Humanitarian Action) is an asset.

5. Duration of contract

The contract is expected to start on 1 October 2015 and to expire on 30 September 2018.

6. Bidding documentation is as follows:

Cover letter explaining the value added of the proposed institution

Presentation of the institution, and CVs of the project manager to be nominated for this contract as well as of the specialists who will do the rating (English, French and Spanish are mandatory; Arabic and Russian are desirable)

Description of the internal quality assurance system.

Technical proposal describing how the institution will equip itself to ensure the deliverables as described in item 2 above.

Financial proposal, with the following details:

o Unit cost of reviewing individual reports using the GEROS assessment tool including executive feedbacks.

o Unit cost of reviewing individual reports using the UN-SWAP assessment tool

9. Payment

Cost of producing the yearly meta-analyses of trends, key weaknesses and strengths of reports reviewed, for both GEROS and the UN-SWAP (two separate reports).

Unit cost of reviewing individual terms of reference for regional and country offices

Unit cost of reviewing individual inception reports for regional and country offices.

The total fee will depend on the actual number of completed quality reviews of evaluations multiplied by the unit cost per report, plus the cost of the meta-analyses. This is due to the uncertainty over the total number of reports that will be received. All billing will be based on actual work done, so as to minimize cost if expected levels of activity do not materialize.


Annex 2. GEROS evaluation quality assessment indicators

SECTION A: BACKGROUND (weight 5%)

Question 1. Is the object of the evaluation clearly described?
- Clear and relevant description of the intervention, including: location(s), timelines, cost/budget, and implementation status
- Clear and relevant description of intended beneficiaries by type (i.e., institutions/organizations; communities; individuals…), by geographic location(s) (i.e., urban, rural, particular neighbourhoods, towns/cities, sub-regions…) and in terms of numbers reached (as appropriate to the purpose of the evaluation)
- Description of the relative importance of the object to UNICEF (e.g. in terms of size, influence, or positioning)

Question 2. Is the context of the intervention clearly described?
- Clear and relevant description of the context of the intervention (policy, socio-economic, political, institutional and international factors relevant to the implementation of the intervention)
- Clear and relevant description (where appropriate) of the status and needs of the target groups for the intervention
- Explanation of how the context relates to the implementation of the intervention

Question 3. Is the results chain or logic well articulated?
- Clear and complete description of the intervention's intended results
- Intervention logic presented as a coherent theory of change, logic chain or logic framework

Question 4. Are key stakeholders and their contributions clearly identified?
- Identification of implementing agency(ies), development partners, primary duty bearers, secondary duty bearers, and rights holders
- Identification of the specific contributions and roles of key stakeholders (financial or otherwise), including UNICEF

SECTION B: EVALUATION PURPOSE, OBJECTIVES AND SCOPE (weight 5%)

Question 5. Is the purpose of the evaluation clearly described?
- Specific identification of how the evaluation is intended to be used and what this use is expected to achieve
- Identification of appropriate primary intended users of the evaluation

Question 6. Are the objectives and scope of the evaluation clear and realistic?
- Clear and complete description of what the evaluation seeks to achieve by the end of the process, with reference to any changes made to the objectives included in the ToR
- Clear and relevant description of the scope of the evaluation: what will and will not be covered (thematically, chronologically and geographically, with key terms defined), as well as the reasons for this scope (e.g., specifications by the ToR, lack of access to particular geographic areas for political or safety reasons at the time of the evaluation, lack of data/evidence on particular elements of the intervention)

SECTION C: EVALUATION METHODOLOGY (weight 15%)

Question 7. Does the evaluation provide a relevant list of evaluation criteria that are explicitly justified as appropriate for the purpose of the evaluation? (UNICEF evaluation standards refer to the OECD/DAC criteria. Not all OECD/DAC criteria are relevant to all evaluation objectives and scopes. The standard OECD/DAC criteria are relevance, effectiveness, efficiency, sustainability and impact. Evaluations should also consider equity, gender and human rights, which can be mainstreamed into other criteria. Humanitarian evaluations should consider coverage, connectedness, coordination, protection and security.)
- Clear and relevant presentation of the evaluation framework, including clear evaluation questions used to guide the evaluation
- If the framework is OTHER than the UNICEF standard criteria, or if not all standard criteria of the chosen framework are included, the reasons for this are clearly explained and the chosen framework is clearly described

Question 8. Does the report specify methods for data collection, analysis, and sampling?
- Clear and complete description of a relevant design and set of methods that are suitable for the evaluation's purpose, objectives and scope
- Clear and complete description of the data sources, the rationale for their selection, and the sampling strategy, including a description of how diverse perspectives are captured (or, if not, the reasons for this), how accuracy is ensured, and the extent to which data limitations are mitigated
- Clear and complete description of the methods of analysis, including triangulation of multiple lines and levels of evidence (if relevant)
- Clear and complete description of limitations and constraints faced by the evaluation, including gaps in the evidence generated and the mitigation of bias

Question 9. Are ethical issues and considerations described? (The evaluation should be guided by the UNEG ethical standards for evaluation.) As such, the evaluation report should include:
- Explicit reference to the obligations of evaluators (independence, impartiality, credibility, conflicts of interest, accountability)
- Description of ethical safeguards for participants appropriate for the issues described (respect for dignity and diversity, right to self-determination, fair representation, compliance with codes for vulnerable groups, confidentiality, and avoidance of harm)
- ONLY FOR THOSE CASES WHERE THE EVALUATION INVOLVES INTERVIEWING CHILDREN: explicit reference is made to the UNICEF procedures for Ethical Research Involving Children


SECTION D: EVALUATION FINDINGS (weight 20%)

Question 10. Do the findings clearly address all evaluation objectives and scope?
- Findings marshal sufficient levels of evidence to systematically address all of the evaluation's questions and criteria
- Reference to the intervention's results framework in the formulation of the findings

Question 11. Are evaluation findings derived from the conscientious, explicit and judicious use of the best available, objective, reliable and valid data, and from accurate quantitative and qualitative analysis of evidence?
- The evaluation clearly presents multiple lines (including multiple time series) and levels (output, outcome, and appropriate disaggregation) of credible evidence
- Findings are clearly supported by and respond to the evidence presented, both positive and negative, and are based on clear performance indicators, standards, benchmarks, or other means of comparison
- Unexpected effects (positive and negative) are identified and analysed
- The causal factors (contextual, organizational, managerial, etc.) leading to achievement or non-achievement of results are clearly identified; for theory-based evaluations, findings analyse the logical chain (progression, or not, from implementation to results)

Question 12. Does the evaluation assess and use the intervention's results-based management elements?
- Clear and comprehensive assessment of the intervention's monitoring system (including the completeness and appropriateness of the results/performance framework – with its vertical and horizontal logic – and M&E tools and their usage)
- Clear and complete assessment of the use of monitoring data in decision making

SECTION E: EVALUATION CONCLUSIONS & LESSONS LEARNED (weight 15%)

Question 13. Do the conclusions present an objective overall assessment of the intervention?
- Clear and complete description of the strengths and weaknesses of the intervention that adds insight and analysis beyond the findings
- Description of the foreseeable implications of the findings for the future of the intervention (if a formative evaluation, or if implementation is expected to continue or to have an additional phase)
- The conclusions are derived appropriately from the findings

Question 14.

Are lessons learned correctly identified?

Lessons are correctly identified, stem logically from the findings, include an analysis of how they can be applied to different contexts and/or different sectors, and take into account evidential limitations such as generalizing from single-point observations.

SECTION F: RECOMMENDATIONS (weight 15%)

Question 15.

Are recommendations well grounded in the evaluation?

Recommendations are logically derived from the findings and/or conclusions

Recommendations are useful to primary intended users and uses (relevant to the intervention and providing a realistic description of how they can be made operational in the context of the evaluation)

Clear description of the process for developing recommendations, including a relevant explanation if the level of participation of stakeholders at this stage is not in proportion with the level of participation in the intervention and/or in the conduct of the evaluation

Question 16.

Are recommendations clearly presented?

Clear identification of target group for action for each recommendation (or clearly clustered group of recommendations)

Clear prioritization and/or classification of recommendations to support use

SECTION G: EVALUATION STRUCTURE/PRESENTATION (weight 5%)

Question 17.

Does the evaluation report include all relevant information?

Opening pages include: name of the evaluated object, timeframe of the evaluation, date of the report, location of the evaluated object, names and/or organization(s) of the evaluator(s), name of the organization commissioning the evaluation, table of contents (including, as relevant, tables, graphs, figures and annexes), list of acronyms/abbreviations, and page numbers

Annexes should include, when not present in the body of the report: Terms of Reference, evaluation matrix, list of interviewees, list of site visits, data collection instruments (such as survey or interview questionnaires), and list of documentary evidence. Other appropriate annexes could include: additional details on methodology, a copy of the results chain, and information about the evaluator(s)

Question 18.

Is the report logically structured?

The structure is easy to identify and navigate (for instance, with numbered sections and clear titles and sub-titles)

Context, purpose and methodology would normally precede findings, which would normally be followed by conclusions, lessons learned and recommendations

SECTION H: EVALUATION PRINCIPLES (weight 15%)

Question 19.

Did the evaluation design and style consider incorporating the UN and UNICEF commitment to a human rights-based approach to programming, to gender equality, and to equity?

Reference to and use of a rights-based framework, the CRC, the CCC, CEDAW and/or other rights-related benchmarks in the design of the evaluation

Clear description of the level of participation of key stakeholders in the conduct of the evaluation, and description of the rationale for the chosen level of participation (for example, a reference group is established, stakeholders are involved as informants or in data gathering)

Stylistic evidence of the inclusion of these considerations can include: using human-rights language; gender-sensitive and child-sensitive writing; disaggregating data by gender, age and disability groups; disaggregating data by socially excluded groups.

Question 20.

Does the evaluation assess the extent to which the implementation of the intervention addressed gender, equity & child rights?

Identification and assessment of the presence or absence of equity considerations in the design and implementation of the intervention

Identification and assessment of the presence or absence of gender considerations in the design and implementation of the intervention

Explicit analysis of the involvement in the evaluated object of rights holders, duty bearers and socially marginalized groups, and of the differential benefits received by different groups of children

Clear proportionality between the level of participation in the intervention and in the evaluation, or clear explanation of deviation from this principle (this may be related to specifications of the ToRs, inaccessibility of stakeholders at the time of the evaluation, budgetary constraints, etc.)

Question 21.

Does the evaluation meet UN-SWAP evaluation performance indicators?

Note: this question will be rated according to UN-SWAP standards

GEEW is integrated in the Evaluation Scope of analysis and Indicators are designed in a way that ensures GEEW-related data will be collected

Evaluation Criteria and Evaluation Questions specifically address how GEEW has been integrated into the design, planning, implementation of the intervention and the results achieved.

A gender-responsive Evaluation Methodology, Methods and tools, and Data Analysis Techniques are selected.

The evaluation Findings, Conclusions and Recommendations reflect a gender analysis

SECTION I: EXECUTIVE SUMMARY (weight 5%)

Question 22.

Can the executive summary inform decision-making?

An executive summary is provided, with the conciseness and depth appropriate for primary intended users

Includes all necessary elements (overview of the intervention, evaluation purpose, objectives and intended audience, evaluation methodology, key findings, key conclusions, key recommendations)

Includes all the necessary information to understand the intervention and the evaluation AND does not contain information not already included in the rest of the report
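The section weights above imply a straightforward weighted aggregation of section-level scores into an overall report score. The fragment below is a minimal illustrative sketch of that arithmetic only, not the official GEROS rating algorithm: the weights for sections D-I are taken from the headings above, while the 25% assigned to the earlier sections of the tool (A-C), the 0-1 section score scale, the example values and the names used are assumptions for illustration.

# Illustrative sketch of weighted EQA aggregation (NOT the official GEROS
# algorithm). Weights for sections D-I come from the headings above; the
# remaining 25% for the earlier sections (A-C) is an assumption.
SECTION_WEIGHTS = {
    "A-C (earlier sections, assumed)": 0.25,
    "D Findings": 0.20,
    "E Conclusions & lessons learned": 0.15,
    "F Recommendations": 0.15,
    "G Structure/presentation": 0.05,
    "H Evaluation principles": 0.15,
    "I Executive summary": 0.05,
}

def overall_score(section_scores):
    """Weighted sum of per-section scores, each normalized to a 0-1 scale."""
    assert abs(sum(SECTION_WEIGHTS.values()) - 1.0) < 1e-9  # weights total 100%
    return sum(SECTION_WEIGHTS[s] * section_scores[s] for s in SECTION_WEIGHTS)

# Hypothetical report: strong findings, weaker on evaluation principles.
example = {
    "A-C (earlier sections, assumed)": 0.75,
    "D Findings": 0.90,
    "E Conclusions & lessons learned": 0.80,
    "F Recommendations": 0.70,
    "G Structure/presentation": 1.00,
    "H Evaluation principles": 0.50,
    "I Executive summary": 0.80,
}
print(f"{overall_score(example):.2f}")  # approximately 0.76 on a 0-1 scale

Because the weights differ, a weak executive summary or presentation (5% each) moves the overall score far less than comparably weak findings (20%), which is the intent of the weighting scheme.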

Annex 3. List of reports quality assessed

Country | Seq # | Evaluation Report Title | GEROS Rating

ECAR
Albania | 2017/001 | Evaluation of the “Breaking the cycle of exclusion for Roma children through Early Childhood Development and Education” multi-country project in the Former Yugoslav Republic of Macedonia, Serbia and Albania | Satisfactory
Armenia | 2017/001 | Final Evaluation of Mitigating Social Consequences of the Labour Migration and Maximising the Migrants’ Involvement in Local Development Project | Fair
Bosnia and Herzegovina | 2017/001 | Final Evaluation of the Justice for Every Child Project, December 2013-November 2017, Bosnia and Herzegovina | Highly Satisfactory
Croatia | 2017/001 | Strengthening justice system in matters involving child victims and witnesses in criminal proceedings in Croatia | Satisfactory
Kazakhstan | 2017/001 | Evaluation of ECE/ECD Systems in Kazakhstan | Satisfactory
Macedonia | 2017/001 | Evaluation of the training programme for continuous professional development of social protection staff | Satisfactory
Macedonia | 2017/002 | Evaluation of the Early Literacy and Numeracy Programme | Fair
Macedonia | 2017/003 | Evaluation of UNICEF Roma Health Mediators Programme | Highly Satisfactory
Moldova | 2017/001 | Evaluation of the Government of Moldova - UNICEF 2013-2017 Country Programme of Cooperation | Highly Satisfactory
Republic of Montenegro | 2017/001 | Evaluation of the Programme “Montenegro – Investment case on Early Childhood Development” | Highly Satisfactory
Azerbaijan | 2017/003 | Summative Evaluation of the Access to Justice for Children project | Highly Satisfactory
Ukraine | 2017/001 | Ukraine Country Programme evaluation report: Evaluation of the Country Programme of Cooperation between the Government of Ukraine and UNICEF 2012-2016 | Highly Satisfactory
Bosnia and Herzegovina | 2017/002 | Evaluation of the UNICEF-supported Component of the Project “Support for Durable Solutions of the Revised Strategy for Implementation of Annex VII of the Dayton Peace Agreement” in Bosnia and Herzegovina | Highly Satisfactory
Republic of Turkmenistan | 2017/001 | Evaluation of Turkmenistan's National Nutrition Programme | Fair
Serbia | 2017/005 | Summative evaluation to strengthen implementation of justice for children system in the Republic of Serbia (2010-2017) | Satisfactory
Serbia | 2017/006 | Summative Evaluation of Child Care Reform in Serbia | Satisfactory

EAPR
Cambodia | 2017/005 | Reducing Stunting in Children Under Five Years of Age: A Comprehensive Evaluation of UNICEF’s Strategies and Programme Performance – Cambodia Country Case Study | Satisfactory
Cambodia | 2017/012 | Evaluation of the UNDAF Cycles 2011-2015 and 2016-2018 in Cambodia | Satisfactory
Fiji (Pacific Islands) | 2017/003 (previously 2017/113) | Final Evaluation Improving WASH in Solomon Islands (IWASH-SI) Project | Satisfactory
Indonesia | 2017/014 | Support to community sanitation in eastern Indonesia | Highly Satisfactory
Indonesia | 2017/005 | Monitoring and Evaluation of PKH Prestasi Pilot Project | Fair
Philippines | 2017/001 | Evaluation of the UNICEF Philippine Country Office 'Early Childhood Care and Development' and 'Basic Education' components of the 7th GPH-UNICEF Country Programme 2012-2016 | Satisfactory
Philippines | 2017/002 | Formative Evaluation of the UNICEF 7th Country Programme 2012-2018 in the Philippines | Satisfactory
Timor-Leste | 2017/004 | End of project review UNICEF/H & M Foundation ‘Alternative Pre-Schools and Parenting Education Project’ | Satisfactory

ESAR
Angola | 2017/007 | Formative Evaluation of Angola Country Programme (2015-2019) | Highly Satisfactory
Burundi | 2017/005 | Evaluation finale du projet de lutte contre la malnutrition dans la province de Ngozi au Burundi (mai 2013 - décembre 2016) | Satisfactory
ESARO | 2017/010 | Evaluation of ESAR Institutional Strengthening Support Initiative on Decentralized Programme Monitoring and Response | Satisfactory
Ethiopia | 2017/008 | An Impact Evaluation of Alternative Basic Education in Ethiopia | Satisfactory
Madagascar | 2017/001 | Evaluation de la promotion des PFE - SFCG | Fair
Malawi | 2017/001 | Evaluation Report: UNICEF Malawi’s Child-Friendly Schools Construction Component | Satisfactory
Namibia | 2017/002 | Evaluation of Namibia’s Community Health Extension Workers Programme | Satisfactory
Republic of Mozambique | 2017/001 | Reducing Stunting in Children Under Five Years of Age: A Comprehensive Evaluation of UNICEF’s Strategies and Programme Performance – Republic of Mozambique Country Case Study | Satisfactory
Rwanda | 2017/002 | Reducing Stunting in Children Under Five Years of Age: A Comprehensive Evaluation of UNICEF’s Strategies and Programme Performance – Rwanda Country Case Study | Satisfactory
Somalia | 2017/001 | Evaluation of Social Mobilization Network (SMNet) | Fair
Uganda | 2017/020 | End of project Evaluation Enhanced Resilience Karamoja Program (ERKP) | Fair
Zambia | 2017/002 | Mid-Term Evaluation of the Millennium Development Goal Initiative Accelerating Progress towards maternal, neonatal and child mortality reduction in Zambia | Satisfactory
Somalia | 2017/002 | Real time evaluation of UNICEF SCO humanitarian response to the pre-famine crisis | Satisfactory
Swaziland | 2017/001 | Evaluation of the Swaziland Child Friendly Schools (CFS) Programme | Satisfactory
Zambia | 2017/005 | Impact evaluation of hygiene and sanitation scaling up project | Fair

EO
Evaluation Office | 2017/003 | Reducing Stunting in Children Under 5 Years of Age: A comprehensive evaluation of UNICEF’s strategies and programme performance – Global synthesis report | Satisfactory
Evaluation Office | 2017/010 | Endline evaluation of the H4+ Joint Programme Canada and Sweden (Sida) 2011-2016 | Satisfactory

HQ
Supply Division | 2017/001 | Evaluation of Supply Division's Supply Community Strategy | Fair
Programme Division | 2017/003 (previously 2017/103) | External Evaluation of UNICEF’s Scaling Up Nutrition and Immunization implemented in 13 sub-Saharan African countries over the course of 2013-2016 | Satisfactory
Programme Division | 2017/010 | Evaluation of RMNCH Trust Fund Activities | Fair

LACR
Colombia | 2017/007 | Evaluación de adherencia y costo-efectividad del lineamiento para el manejo integrado de la desnutrición aguda en niños y niñas de 0 a 59 meses de edad, Departamento de La Guajira (2016-2017) | Satisfactory
Dominican Republic | 2017/002 | Evaluation of the initiative Timely Birth Registration in Prioritized Hospitals in the Dominican Republic | Satisfactory
Bolivia | 2017/001 | Evaluación de la Preparación y Respuesta del Plan de Acciones Inmediatas ante el Fenómeno El Niño 2015-2016 | Fair
Bolivia | 2017/002 | Evaluacion De La Implementacion De Proyectos Demostrativos De Saneamiento En El Area Rural De Bolivia | Fair
LACR | 2017/006 | Multi-Country Evaluation of Early Child Education Policies in Latin America and the Caribbean | Satisfactory
Nicaragua | 2017/001 | Evaluation of UNICEF and GRACCS innovation pilot projects | Satisfactory
El Salvador | 2017/001 | Evaluacion Del Programa Triple E: Educacion y Desarrollo Integral De Primera Infancia, Empoderamiento De Familias y Jovenes y Entorno Protector Comunitario En Comunidades Seleccionadas | Satisfactory
Ecuador | 2017/001 | Evaluación sumativa de los servicios de desarrollo infantil de los Centros Infantiles del Buen Vivir (CIBV) y Creciendo con Nuestros Hijos (CNH) | Fair
Guatemala | 2017/008 | Evaluation of UNICEF support to the National Strategy on the Protection of Human Rights of Girls and Boys 2014-2016 | Highly Satisfactory
Haiti | 2017/001 | Reducing Stunting in Children Under Five Years of Age - Haiti Case Study | Satisfactory
Honduras | 2017/001 | Evaluación de implementación estrategia Retorno de la Alegría para la niñez migrante en Honduras | Satisfactory
Jamaica | 2017/001 | Evaluation of the “I Am Alive” Programme for Adolescent Girls Living with HIV | Fair
Peru | 2017/001 | Evaluacion "Mejorando La Educacion Basica De Ninas y Ninos De La Amazonia y Sur Andino Del Peru" | Fair

MENA
Iraq | 2017/001 | Adolescent Development Program, Iraq: Participation of Adolescents and Youth for Social Cohesion | Fair
Egypt | 2017/001 | The Evaluation of Meshwary Project Phase II | Satisfactory
Jordan | 2017/004 | Evaluation of the Ma’an (Together) towards a Safe School Environment Programme 2009-2016 - Jordan | Highly Satisfactory
Lebanon | 2017/001 | Evaluation of the Water, Sanitation and Hygiene (WASH) Programme within the UNICEF Country Programme in Lebanon (2013-2016) | Satisfactory
Lebanon | 2017/002 | Evaluation of the UNICEF Child Protection Programme for Vulnerable Children and Women in Lebanon 2013-2016 | Satisfactory
State of Palestine | 2017/001 | Evaluation for Humanitarian Action for Children | Satisfactory
Tunisia | 2017/001 | Evaluation de la composante programme santé du programme de coopération Tunisie-UNICEF 2015-2019 | Satisfactory

ROSA
Afghanistan | 2017/006 | Evaluation of Street Working Children's Project | Satisfactory
Afghanistan | 2017/003 | Evaluation of WASH in Schools | Satisfactory
Afghanistan | 2017/001 | Evaluation of Child Protection Action Network | Fair
Bangladesh | 2017/001 | Programme Evaluation of UNICEF Bangladesh Communication for Development (C4D) Programme from 2012 to 2016 | Highly Satisfactory
India | 2017/001 | Reducing Stunting in Children Under Five Years of Age: a comprehensive evaluation of UNICEF’s strategies and programme performance – India Country Case Study | Satisfactory
India | 2017/002 | Evaluation of UNICEF’s Community Based Disaster Risk Reduction and School Safety Programme, Bihar, India (2011-2016) | Highly Satisfactory
Nepal | 2017/001 | Evaluation of the Nepal Emergency Cash Transfer Programme through Social Assistance, Final Report | Fair

WCAR
Ghana | 2017/018 | End line Evaluation of the Project for Improving Access to Quality Health and Education Services in the Northern and Upper East Regions of Ghana | Fair
Ghana | 2017/024 | Evaluation Report of the UNICEF Ghana Education Programme (2012–2017): A Capacity Building Perspective | Satisfactory
Guinea | 2017/003 | Evaluation de la Composante Survie et Développement de l’enfant (CSD) du programme de coopération, UNICEF-GUINEE 2013-2017 | Fair
Guinea Bissau | 2017/002 | Summative Evaluation of the UNICEF-EU Project on protecting Women and girls’ rights in Guinea Bissau | Fair
Niger | 2017/001 | Evaluation du Programme d’appui à la protection judiciaire juvénile (2013-2016) | Fair
Nigeria | 2017/001 | Independent Evaluation of UNICEF Nigeria Training Investments | Fair
Burkina Faso | 2017/001 | Evaluation de la mise en œuvre des activités génératrices de revenus par les associations de mères éducatrices | Fair
Congo | 2017/001 | Évaluation des interventions des nations unies en faveur des refugiés au Congo | Satisfactory
Gabon | 2017/001 | Evaluation des interventions de la composante Politiques Sociales | Satisfactory
Guinea Bissau | 2017/003 | Final Evaluation of the Community Health Component of the “Programme for reducing Maternal and Infant Mortality (PIMI)” in Guinea-Bissau | Satisfactory
Mali | 2017/001 | Evaluation du projet sur la promotion d'EAH dans les structures de soin au Mali | Satisfactory
Niger | 2017/002 (previously 2017/044) | Reducing stunting in children under five years of age: a comprehensive evaluation of UNICEF’s strategies and programme performance - Niger Country Case Study | Satisfactory
Togo | 2017/008 | Evaluation d’impact du projet pilote des transferts monétaires au Togo | Satisfactory
Nigeria | 2017/002 | Impact Evaluation of UNICEF Nigeria Girls’ Education Project Phase 3 (GEP3) Cash Transfer Programme (CTP) in Niger and Sokoto States | Satisfactory
Republic of Cameroon | 2017/001 | Evaluation Du Programme Wash UNICEF-Cameroun 2013-2016 | Satisfactory
Sierra Leone | 2017/001 | External Evaluation of Effectiveness of UNICEF Nutrition Accelerated Reduction of Child and Maternal Under-Nutrition in Seven Districts of Sierra Leone | Fair
Togo | 2017/005 | Evaluation Conjointe A mi Parcours Du Programme GSF Au Togo | Satisfactory
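Read as a dataset, Annex 3 supports simple tallies of the rating distribution by rating level and by region. The fragment below is an illustrative sketch only, assuming the table has been transcribed into (region, country, sequence, rating) tuples; the first three real rows are shown, and ROWS is a hypothetical name.

from collections import Counter

# Illustrative tally of the GEROS ratings in Annex 3, assuming the table has
# been transcribed as (region, country, sequence, rating) tuples.
ROWS = [
    ("ECAR", "Albania", "2017/001", "Satisfactory"),
    ("ECAR", "Armenia", "2017/001", "Fair"),
    ("ECAR", "Bosnia and Herzegovina", "2017/001", "Highly Satisfactory"),
    # ... remaining Annex 3 rows ...
]

print(Counter(rating for _, _, _, rating in ROWS))  # reports per rating level
print(Counter(region for region, _, _, _ in ROWS))  # reports per region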

Annex 4: UN-SWAP Calculations

Country | Sequence Number | Scope of analysis and Indicators | Evaluation Criteria and Questions | Evaluation Methodology | Findings, Conclusions and Recommendations | Total
Moldova | 2017/001 | 3 | 3 | 3 | 3 | 12
Republic of Montenegro | 2017/001 | 3 | 3 | 3 | 3 | 12
Ukraine | 2017/001 | 3 | 3 | 3 | 3 | 12
Jamaica | 2017/001 | 3 | 3 | 3 | 3 | 12
Jordan | 2017/004 | 3 | 3 | 3 | 3 | 12
Bangladesh | 2017/001 | 3 | 3 | 3 | 3 | 12
India | 2017/002 | 3 | 3 | 3 | 3 | 12
Lebanon | 2017/001 | 3 | 3 | 2 | 3 | 11
Nigeria | 2017/002 | 3 | 3 | 2 | 3 | 11
Republic of Cameroon | 2017/001 | 3 | 3 | 2 | 3 | 11
Macedonia | 2017/001 | 3 | 3 | 2 | 2 | 10
Malawi | 2017/001 | 2 | 2 | 3 | 3 | 10
Swaziland | 2017/001 | 3 | 3 | 2 | 2 | 10
Albania | 2017/001 | 2 | 3 | 2 | 2 | 9
Azerbaijan | 2017/003 | 3 | 3 | 1 | 2 | 9
Evaluation Office | 2017/010 | 2 | 3 | 2 | 2 | 9
Iraq | 2017/001 | 2 | 2 | 2 | 3 | 9
Bosnia and Herzegovina | 2017/001 | 2 | 2 | 2 | 2 | 8
Croatia | 2017/001 | 2 | 2 | 2 | 2 | 8
Bosnia and Herzegovina | 2017/002 | 2 | 2 | 2 | 2 | 8
Angola | 2017/007 | 1 | 3 | 1 | 3 | 8
Ethiopia | 2017/008 | 3 | 2 | 2 | 1 | 8
Uganda | 2017/020 | 2 | 2 | 2 | 2 | 8
Programme Division | 2017/003 (previously 2017/103) | 2 | 2 | 2 | 2 | 8
Ecuador | 2017/001 | 2 | 2 | 2 | 2 | 8
Egypt | 2017/001 | 2 | 2 | 2 | 2 | 8
Lebanon | 2017/002 | 2 | 2 | 2 | 2 | 8
Niger | 2017/002 (previously 2017/044) | 2 | 2 | 2 | 2 | 8
Serbia | 2017/006 | 2 | 3 | 1 | 1 | 7
Fiji (Pacific Islands) | 2017/003 (previously 2017/113) | 1 | 3 | 1 | 2 | 7
Indonesia | 2017/014 | 2 | 2 | 1 | 2 | 7
Philippines | 2017/001 | 2 | 2 | 2 | 1 | 7
State of Palestine | 2017/001 | 2 | 2 | 2 | 1 | 7
Tunisia | 2017/001 | 1 | 2 | 2 | 2 | 7
Guinea Bissau | 2017/002 | 2 | 2 | 1 | 2 | 7
Macedonia | 2017/002 | 2 | 2 | 1 | 1 | 6
Evaluation Office | 2017/003 | 1 | 2 | 2 | 1 | 6
Programme Division | 2017/010 | 1 | 2 | 1 | 2 | 6
Dominican Republic | 2017/002 | 1 | 1 | 2 | 2 | 6
Bolivia | 2017/001 | 2 | 2 | 1 | 1 | 6
LACR | 2017/006 | 1 | 2 | 1 | 2 | 6
Guatemala | 2017/008 | 1 | 2 | 1 | 2 | 6
Afghanistan | 2017/006 | 2 | 1 | 2 | 1 | 6
Ghana | 2017/024 | 1 | 2 | 1 | 2 | 6
Guinea | 2017/003 | 1 | 3 | 1 | 1 | 6
Guinea Bissau | 2017/003 | 1 | 2 | 1 | 2 | 6
Togo | 2017/008 | 2 | 1 | 1 | 2 | 6
Turkmenistan | 2017/001 | 1 | 1 | 1 | 2 | 5
Serbia | 2017/005 | 1 | 1 | 1 | 2 | 5
Cambodia | 2017/012 | 1 | 2 | 1 | 1 | 5
Madagascar | 2017/001 | 1 | 1 | 2 | 1 | 5
Zambia | 2017/002 | 2 | 1 | 1 | 1 | 5
Zambia | 2017/005 | 1 | 1 | 1 | 2 | 5
El Salvador | 2017/001 | 1 | 2 | 1 | 1 | 5
Peru | 2017/001 | 1 | 2 | 1 | 1 | 5
Afghanistan | 2017/001 | 0 | 1 | 2 | 2 | 5
Congo | 2017/001 | 1 | 2 | 1 | 1 | 5
Gabon | 2017/001 | 1 | 2 | 1 | 1 | 5
Mali | 2017/001 | 1 | 2 | 1 | 1 | 5
Armenia | 2017/001 | 1 | 1 | 1 | 1 | 4
Macedonia | 2017/003 | 1 | 0 | 2 | 1 | 4
Cambodia | 2017/005 | 1 | 1 | 1 | 1 | 4
Timor-Leste | 2017/004 | 1 | 1 | 1 | 1 | 4
Burundi | 2017/005 | 1 | 1 | 1 | 1 | 4
Mozambique | 2017/001 | 1 | 1 | 1 | 1 | 4
Rwanda | 2017/002 | 1 | 1 | 1 | 1 | 4
Bolivia | 2017/002 | 1 | 1 | 1 | 1 | 4
Nicaragua | 2017/001 | 1 | 1 | 1 | 1 | 4
Haiti | 2017/001 | 1 | 1 | 1 | 1 | 4
Honduras | 2017/001 | 1 | 1 | 1 | 1 | 4
Afghanistan | 2017/003 | 1 | 1 | 1 | 1 | 4
India | 2017/001 | 1 | 1 | 1 | 1 | 4
Kazakhstan | 2017/001 | 1 | 1 | 1 | 0 | 3
Philippines | 2017/002 | 1 | 0 | 1 | 1 | 3
Namibia | 2017/002 | 1 | 0 | 1 | 1 | 3
Somalia | 2017/001 | 1 | 0 | 1 | 1 | 3
Supply Division | 2017/001 | 1 | 0 | 1 | 1 | 3
Colombia | 2017/007 | 1 | 1 | 0 | 1 | 3
Ghana | 2017/018 | 0 | 1 | 1 | 1 | 3
Nigeria | 2017/001 | 1 | 1 | 1 | 0 | 3
Burkina Faso | 2017/001 | 1 | 1 | 1 | 0 | 3
Sierra Leone | 2017/001 | 1 | 0 | 2 | 0 | 3
Indonesia | 2017/005 | 1 | 0 | 1 | 0 | 2
Somalia | 2017/002 | 0 | 0 | 1 | 1 | 2
Nepal | 2017/001 | 0 | 0 | 1 | 1 | 2
Togo | 2017/005 | 0 | 0 | 0 | 1 | 1
ESARO | 2017/010 | 0 | 0 | 0 | 0 | 0
Niger | 2017/001 | 0 | 0 | 0 | 0 | 0
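Each row of Annex 4 rates the four UN-SWAP criteria from Question 21 on a 0-3 scale and sums the criterion scores into a total out of 12. The fragment below is a minimal sketch of that arithmetic, using three real rows from the table; UN_SWAP_SCORES is a hypothetical name, and the UN-SWAP mapping from totals to performance categories is not reproduced here.

# Minimal sketch of the Annex 4 arithmetic: four UN-SWAP criterion scores
# (each 0-3) summed to a total out of 12. Rows are taken from the table above.
UN_SWAP_SCORES = {
    # (scope & indicators, criteria & questions, methodology,
    #  findings/conclusions/recommendations)
    "Moldova 2017/001": (3, 3, 3, 3),
    "Lebanon 2017/001": (3, 3, 2, 3),
    "Niger 2017/001": (0, 0, 0, 0),
}

for report, scores in UN_SWAP_SCORES.items():
    assert all(0 <= s <= 3 for s in scores)  # each criterion is rated 0-3
    print(f"{report}: {sum(scores)}/12")
# Moldova 2017/001: 12/12
# Lebanon 2017/001: 11/12
# Niger 2017/001: 0/12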