Commission européenne, B-1049 Bruxelles / Europese Commissie, B-1049 Brussel - Belgium. Telephone: (32-2) 299 11 11. Office: 05/45. Telephone: direct line (32-2) 2999659. Commission européenne, L-2920 Luxembourg. Telephone: (352) 43 01-1. EUROSTAT Business Case ESS.VIP.BUS VALIDATION Date: 09/01/2015 Doc. Version: 1.8 PM² Template v2.1.2 (Dec. 2013)




Document Control Information

Settings Value

Document Title: Business Case

Project Title: ESS.VIP.BUS VALIDATION

Document Author: Ángel Simón

Project Owner: Marcel Jortay

Project Manager: Ángel Simón

Doc. Version: 1.8

Sensitivity: <Public, Basic, High>

Date: 09/01/2015

Document Approver(s) and Reviewer(s):

NOTE: All Approvers are required. Records of each approver must be maintained. All Reviewers in the list are considered required unless explicitly listed as Optional.

Name Role Action Date

<Approve / Review>

Document history:

The Document Author is authorized to make the following types of changes to the document without requiring that the document be re-approved:

• Editorial, formatting, and spelling

• Clarification

To request a change to this document, contact the Document Author or Owner.

Changes to this document are summarized in the following table in reverse chronological order (latest version first).

Revision Date Created by Short Description of Changes

Configuration Management: Document Location

The latest version of this controlled document is stored in \\esapplnt\ESTAT-ALL\VIP_Validation\PM2


TABLE OF CONTENTS

1 PROJECT INITIATION REQUEST INFORMATION
2 CONTEXT
2.1 Situation Description and Urgency
2.1.1 Problem Statement
2.1.2 Input from previous work
2.2 Situation Impact
2.2.1 Impact on Processes and the Organization
2.2.2 Impact on Stakeholders and Users
2.3 Interrelations and Interdependencies
3 EXPECTED OUTCOMES
4 POSSIBLE ALTERNATIVES
4.1 Alternative A: Do nothing
4.2 Alternative B: Data Validation IT Solutions
4.3 Alternative C: Methodological developments
4.4 Alternative D: ESS.VIP VALIDATION
5 SOLUTION DESCRIPTION
5.1 Legal Basis
5.2 Benefits
5.3 Success Criteria
5.4 Scope
5.5 Solution Impact
5.6 Deliverables
5.7 Assumptions
5.8 Constraints
5.9 Risks
5.10 Costs, Effort and Funding Source
5.11 Roadmap
5.12 Synergies and Interdependencies
5.13 Enablers
6 GOVERNANCE
6.1 Project Owner (PO)
6.2 Solution Provider (SP)
6.3 Approving Authority
APPENDIX 1: STAKEHOLDERS ANALYSIS


1 PROJECT INITIATION REQUEST INFORMATION

Project Title: ESS.VIP.VALIDATION – Common Data Validation Policy

Initiator: Pedro Diaz Munoz

DG / Unit: ESTAT.E

Date of Request: 10.10.2012

Target Delivery Date: 30/11/2015

Type of Delivery: ☐ In-house ☐ Outsourced ☒ Mix ☐ Not-known

2 CONTEXT

2.1 Situation Description and Urgency

The ESS.VIP VALIDATION project started in January 2013. The original proposal was based on a stakeholder, feasibility and cost-benefit analysis; the project proposal was presented to and endorsed by the ESSC in November 2012.

This Business Case adapts the objectives and deliverables expressed in the original ex-ante evaluation to make them compliant with the ESS Vision 2020. This document presents the business case for the project. It focuses on the period between February 2015 (expected date for approval of the revised proposal by the ESSC) and November 2015 (original end date of the project and a milestone for the updated project). References to deliverables or activities before or after this period are included where necessary to frame the context of the project.

In the ESS.VIP programme, the VALIDATION project belongs to the category of business projects. However, due to its horizontal nature, the project could be reclassified as a cross-cutting project, building up common infrastructure for sharing information and services.

The ESS.VIP VALIDATION contributes to the implementation of the ESS Vision 2020 by:

• Creating a more efficient production chain with clearly attributed responsibilities to the different actors.

• Developing several standards for the description of the validation step, the description of the validation language, the development of functional specifications and the use of tools to be potentially shared within the ESS.

• Achieving these targets through collaboration with Member States and stakeholders.

• Providing a solution adapted to the principles of the ESS Enterprise Architecture.

The ultimate aim is that data of the highest possible quality is available, with radical gains in the resources and time needed for the processing and improvement of the quality of the information disseminated.

2.1.1 Problem Statement

Data validation in the ESS faces a number of problems mainly derived from the specificities of the production process: the data disseminated by Eurostat is the result of joint production by the ESS involving a variety of partners. The production process is normally carried out in separate steps: data collection, processing and possible first stages of further compilation are done by Member States; another round of processing and final dissemination at European level are in the hands of Eurostat.


Summary of problem statement

The specific problems are:

• Validation process: two steps

Traditionally, validation of European statistics is subdivided into two steps: the first one is performed by Member States before data transmission and the second one by Eurostat before data dissemination. Past experience shows that the data which Member States transmit to Eurostat may not be fit for dissemination as European statistics. In particular, when Member States do not publish the data nationally themselves, or publish only a part of it, the data often require further processing by Eurostat rather than being ready for publication. Hence, the role of Eurostat is critical for spotting any errors in the figures transmitted. In many instances Eurostat and MSs perform similar validation checks ("double work"); in other cases essential basic quality checks are performed by neither the MSs nor Eurostat ("validation gaps") because of a lack of coordination.
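The two coordination failures described above can be illustrated with a small sketch. It assumes, purely for illustration, that the checks agreed for a domain and the checks each partner actually performs can be listed as sets of rule identifiers; the rule names and the function `coordination_report` are hypothetical and are not taken from the project.

```python
# Hypothetical rule identifiers; not taken from any real ESS domain.
agreed_rules = {"format", "code_lists", "totals_match", "time_series_break"}
member_state_checks = {"format", "code_lists", "totals_match"}
eurostat_checks = {"format", "code_lists"}


def coordination_report(agreed, ms, estat):
    """Split the agreed rules into double work and validation gaps."""
    # Rules checked by both partners: candidate "double work".
    double_work = sorted(ms & estat)
    # Agreed rules checked by neither partner: "validation gaps".
    gaps = sorted(agreed - (ms | estat))
    return {"double_work": double_work, "validation_gaps": gaps}


report = coordination_report(agreed_rules, member_state_checks, eurostat_checks)
print(report)
```

In this toy setting an explicit allocation of each rule to one partner would remove both lists, which is the kind of distribution of validation tasks the project aims to agree on.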

• Lack of harmonisation (documentation, coordination, formalisation)

Validation procedures performed by Eurostat (and by MSs) are neither harmonised nor systematically documented. Validation rules are not always documented and formally agreed with the corresponding thematic Working Group. Even when documented, validation rules are described using specific languages developed by individual process managers independently from each other, instead of using a common harmonised syntax (see footnote 1). The lack of coordination and the non-formalisation of the validation process result in a sub-optimal validation process: several cycles of quality check reports from Eurostat to MSs followed by revisions of data by MSs, the so-called "validation ping pong". In some instances, even the format requirements for data provision are not respected, which prevents any automated further processing of the information received.

• Lack of harmonisation of technical (software) solutions

Moving towards an architected environment based on standard and shared validation services would allow easier integration of technical solutions. This transformation would also make it possible to reduce IT development and maintenance costs at ESS level. The first step towards this harmonised approach is the use of a common registry for Validation Rules; the second step is the development and deployment of tools for validation services.

1 The word "syntax" is systematically used in this paper to mean a language for expressing validation rules, understandable at least by a specific category of human beings: the statisticians.

Figure content (Summary of problem statement):

• Two-step validation process: double validation work; possibility of validation gaps

• Lack of harmonisation: documentation; coordination; formalisation; solutions

• Subjective assessment of data quality: risk of non-comparability of results in the compliance monitoring across statistical domains

• Compliance monitoring: subjective assessment of data quality

The transmission of poorly validated data raises an issue of non-compliance: legal acts prescribe, at least implicitly, that data sent to Eurostat must be of adequate quality. Today, standard compliance monitoring is limited to verifying data completeness and punctuality, while the data quality assessment is less structured and prone to subjective interpretation. Quality problems of this type are not systematically reported to management to trigger preventive or corrective actions.
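As an illustration of what a less subjective assessment could look like, the sketch below computes completeness and punctuality together with a rule-based error rate for one data transmission. The `Transmission` fields, the indicator names and the sample figures are assumptions made for this example; the document does not define such indicators.

```python
from dataclasses import dataclass


@dataclass
class Transmission:
    """One data delivery from a Member State (hypothetical structure)."""
    expected_records: int
    received_records: int
    days_late: int            # 0 or negative means on time
    failed_rule_checks: int   # rule evaluations that failed
    total_rule_checks: int    # rule evaluations performed


def compliance_indicators(t: Transmission) -> dict:
    """Return completeness, punctuality and a rule-based error rate."""
    completeness = t.received_records / t.expected_records
    punctual = t.days_late <= 0
    # Share of rule evaluations that failed; 0.0 when no rule was evaluated.
    error_rate = (
        t.failed_rule_checks / t.total_rule_checks if t.total_rule_checks else 0.0
    )
    return {
        "completeness": completeness,
        "punctual": punctual,
        "error_rate": error_rate,
    }


sample = Transmission(
    expected_records=1000,
    received_records=950,
    days_late=0,
    failed_rule_checks=30,
    total_rule_checks=1200,
)
indicators = compliance_indicators(sample)
print(indicators)
```

Because the error rate is derived from agreed rules rather than from individual judgement, the same figure can be compared across statistical domains, provided the rule sets themselves are comparable.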

• Risk of publishing low quality results

Shared statistical production entails the risk that data are published, or used for publication, by Eurostat without quality problems being detected. Recent experience shows that, although infrequent, there have been incidents with an impact in the media. Experience in this area has also been gained through the cooperation with Google (the dissemination of Eurostat datasets by Google made it possible to detect errors in data). Regardless of where in the production chain of European statistics the responsibility for releasing bad data formally falls, users will attach this responsibility to the publishing organisation, and ultimately to the ESS.

• Particular validation requirements for the processing of micro data

Particular requirements for an efficient validation come from the processing of micro data. The growing use and importance of statistics derived from them has led to frequent revisions which have to be validated and processed in parallel to new data. Some of them have to be processed under very tight time limitations. Microdata files are often big in volume and each record might contain a high number of variables, two aspects which pose particular performance challenges. Validation of microdata cannot be restricted to checking the consistency between the different variables in each individual record but has to look into aggregated data as well.
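The distinction between record-level and aggregate-level checks can be sketched as follows. The variables, the minimum working age of 15, the previous employment rate and the 10 percentage point tolerance are all invented for this example and do not describe any actual survey or rule.

```python
# Toy microdata file; every value is invented for illustration.
records = [
    {"id": 1, "age": 34, "employed": True, "hours_worked": 40},
    {"id": 2, "age": 12, "employed": True, "hours_worked": 35},
    {"id": 3, "age": 51, "employed": False, "hours_worked": 0},
]


def record_errors(rec):
    """Consistency checks between variables of a single record."""
    errors = []
    if rec["employed"] and rec["age"] < 15:  # assumed minimum working age
        errors.append("employed_below_min_age")
    if not rec["employed"] and rec["hours_worked"] > 0:
        errors.append("hours_without_employment")
    return errors


def aggregate_errors(recs, previous_rate, tolerance=0.10):
    """Plausibility check on an aggregate derived from the whole file."""
    rate = sum(r["employed"] for r in recs) / len(recs)
    if abs(rate - previous_rate) > tolerance:
        return ["employment_rate_jump"]
    return []


# Keep only the records that fail at least one record-level check.
record_level = {
    r["id"]: errs for r in records if (errs := record_errors(r))
}
aggregate_level = aggregate_errors(records, previous_rate=0.50)
print(record_level, aggregate_level)
```

A file can pass every record-level check and still fail at aggregate level, which is why both kinds of check are needed; on large files the record-level pass is also where the performance challenges mentioned above would arise.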

2.1.2 Input from previous work

Eurostat internal VIP-Validation (2010 – May 2013)

Eurostat internal VIP on Validation was concluded in 2013. While completing this first phase, the project was ready to move to the next phase in which it focused on the generalised implementation in statistical domains, the adaptation to IT tools, the specific focus on micro-data surveys and relation with ESS and international activities.

The maturity of the project justified the transition to a new phase (from a Eurostat internal project to an ESS.VIP), supported by a favourable reaction of MSs in the DIME/ITDG and later by the ESSC, as well as in many thematic Working Groups where the project was presented. The Eurostat VIP Validation produced a substantial number of deliverables. Among them are:

• A set of guidelines for documenting in detail the validation steps of the statistical production processes.

• A structured inventory of validation rules used in Eurostat.

• A template to deal with error messages according to category.

• A syntax for validation rules. This syntax has been developed under the name of VALS and it contributed to the development of the Validation and Transformation Language (VTL), an initiative of the SDMX community.

ESS.VIP Validation (January 2013 – November 2014)

After the Eurostat VIP on Validation, the current ESS.VIP Validation project has already delivered:

• Contribution to VTL

• Participation in domain specific task forces and working groups to raise awareness about the importance of a harmonised and systematic approach to data validation

• Assessment of validation practices in Eurostat

• Regular reporting to the ESSC

2.2 Situation Impact

2.2.1 Impact on Processes and the Organization

The project has a clear ESS orientation, with Member States playing a key role in the project. The participation of Member States has been articulated at several levels:

• Management: the ESSC has been regularly informed about the progress of the project.

• Coordination: a Task Force with Member States was established in February 2014. The Task Force leads the project in its ESS dimension. The Task Force is studying and following up the design and implementation of data validation practices, methodologies and solutions in the ESS. The work of the Task Force will be continued and expanded by the ESSnet.

• Execution: an ESSnet is to be launched by the end of 2014 to analyse and deliver solutions on possible extensions to more sophisticated validation approaches, and to critically analyse the documents produced from the perspective of Member States' national practices. The work of the ESSnet will enable the creation of synergies in the field of data validation inside the ESS.

Therefore, the main stakeholders for this project are:

• ESSC

• DIME-ITDG

• Thematic areas

• Member States participating in the Task Force or ESSnet

• Member States acting as data providers

• International organisations:
  o Acting as data providers or users
  o Methodology or standard producers: e.g. SDMX, DDI, etc.

• Eurostat:
  o Solutions provider's units (IT, services and architecture)
  o Business units
  o Directors meeting

2.2.2 Impact on Stakeholders and Users

Member States are and will be consulted during the project in order to produce an inventory of data validation approaches in the ESS. The results of the project will impact Member States as key stakeholders of the solutions provided, which rest on three pillars:

• Efficiency gains in the validation process: optimal placement of validation tasks

• Communication during the statistical production chain within the ESS

• Validation-oriented solutions

The active participation of Member States in domain-specific Working Groups enables the assessment and the translation of domain-specific validation rules according to common guidelines and methodologies.

IT solutions will be offered to Member States under the principle of subsidiarity, and NSIs will be free to implement them in their production process fully, partially or not at all. In any case, better communication of the validation rules to be applied fosters a mutual understanding of the tasks to be performed by each partner in the production chain, and this knowledge will produce benefits independently of the solution used.

International organisations will also benefit from an increased quality of data based on a common and agreed approach and a better communication of validation rules.

The SDMX community plays an active role in the project. The SDMX Task Force dedicated to developing VTL relied on the work performed by the ESS.VIP Validation. Once VTL is approved as an official standard, SDMX will have an important and active partner ready to start developments and use the language.

2.3 Interrelations and Interdependencies

Due to its horizontal nature, the VALIDATION project is interrelated with many other ESS.VIPs, statistical domains and modernisation initiatives. The interdependencies with other projects and activities are:

Business level

• The implementation of methodological developments under the ESS.VIP.VALIDATION project will be tested in several statistical domains (e.g. via domain-specific Task Forces or workshops): Waste Statistics, Animal Production and National Accounts. The Animal Production Statistics Task Force was launched in the fourth quarter of 2014, while the Task Force on National Accounts validation will be launched in 2015. For Waste Statistics, workshops and an ESTP training course were held during 2013 and 2014 with the aim of implementing ESS.VIP VALIDATION principles to improve efficiency and quality. Improvements will be measured during 2015.

ESS.VIP business projects

• ESS.VIP.SIMSTAT validation rule sets will be translated into the validation syntax VTL. This exercise will serve as a proof of concept for the completeness of the validation syntax VTL.

• ESS.VIP.ADMIN (e.g. validation of statistical outcomes based entirely or partially on administrative data), ESS.VIP.SIMSTAT (e.g. validation specific to the production of Single Market Statistics) and ESS.VIP.ESBRs (e.g. validation specific to Business Registries) will benefit from the definition of a forthcoming Validation Architecture to be drafted by the ESS.VIP VALIDATION.

Cross-cutting projects

• The ESS.VIP.VALIDATION will closely collaborate with ESS.VIP.SERV with a view to producing deliverables that fit the Service Oriented Architecture (to be developed within ESS.VIP.SERV). A coherent validation service (under ESS.VIP.SERV responsibility) requires a harmonised and coherent validation policy (under ESS.VIP.VALIDATION responsibility). Thus, all architecture and software development in the ESS.VIP.VALIDATION project will follow the service-oriented guidelines defined by the ESS.VIP.SERV project.

• The ESS.VIP.VALIDATION is participating in the development of VTL (Validation and Transformation Language) in the framework of the ESS.VIP.IMS project. The ESS.VIP.IMS project is also working on standards in the field of data validation.

Enterprise architecture

• ESS.VIP.VALIDATION will work closely with the Task Force on Enterprise Architecture with a view to developing an ESS data validation architecture aligned with the ESS Reference Architecture. The ESS.VIP VALIDATION will identify the correlation between building blocks fitting together in the ESS Reference Architecture. The ESS Reference Architecture is due by the second quarter of 2015.

3 EXPECTED OUTCOMES

The aims of the ESS.VIP VALIDATION are:

1. Deploy a coherent validation policy in the different statistical domains, in cooperation with MSs, and achieve its sustainability and flexibility in time. This policy will include in particular a distribution of validation tasks along the MS-Eurostat production chain for European statistics.

2. Establish standard definitions, guidelines and validation syntax.

3. Develop some common technical solutions to be shared within the ESS and used by ESS partners on a voluntary basis.

4. Envisage solutions for more sophisticated validation actions ensuring the coherence between data files, between Member States and the integrity of the data held in the ESS.

4 POSSIBLE ALTERNATIVES

4.1 Alternative A: Do nothing

General Description

No change in the current situation

SWOT Analysis

Strengths:

- No resources needed in the short term for developments

- No changes can be required on legal basis

Weaknesses:

- Risk related to quality in the final data

- Subjective assessment of data quality in compliance monitoring

Opportunities:

- SDMX VTL can be produced by SDMX community and would enable future adaptations to IT solutions

Threats:

- Duplicity of works in data validation: inefficiencies in the production chain

- No harmonised solutions and approaches between domains and partners

Qualitative Assessment

This alternative would lead to the risks and issues described in section 2.1.1 (Problem Statement) of the current document.

4.2 Alternative B: Data Validation IT Solutions

General Description

This alternative would imply the development and dissemination of a common validation tool (software) to be used by members of the ESS.

SWOT Analysis

Strengths:

- Focus on IT technical solution

Weaknesses:

- No changes in validation methodology: uncoordinated approach by thematic area

- No enhancement of a commonly agreed validation flow (roles and responsibilities)

Opportunities:

- No need for extensive consultation of Member States on validation approaches

- Web services would allow an easy-to-implement approach

Threats:

- Communication of validation rules and common understanding of them are insufficient.

- No business perspective

Qualitative Assessment

This alternative would imply the dissemination of one or several validation tools (software) to be used by thematic area actors without taking users' requirements into consideration. It would not address the lack of coherence in the validation approaches in the ESS.

4.3 Alternative C: Methodological developments

General Description

This alternative would imply the purely methodological development of a common data validation approach.

SWOT Analysis

Strengths:

- Adds coherence to data validation approaches

Weaknesses:

- Only the theoretical perspective is taken into account.

Opportunities:

- All partners would benefit from a better understanding of the validation process as a whole

Threats:

- Difficult implementation of a real communication chain of validation rules.

Qualitative Assessment

This alternative would fail to achieve a real implementation of the methodologies and guidelines developed, as some of them would require IT solutions or, at least, the definition of a technical architecture to be implemented.

4.4 Alternative D: ESS.VIP VALIDATION

General Description

This solution takes into account the differences in data validation approaches in different domains and Member States and tries to draw up a common methodological, architectural and communicational framework for data validation to be used by different domains and different partners in the statistical production chain of European statistics.

SWOT Analysis

Strengths:

- Inclusive solution covering both technical and methodological aspects of data validation

- Quality and efficiency gains

Weaknesses:

- Difficult coordination of the project due to its horizontality in functions and links

Opportunities:

- Technical solutions can be adopted by Member States on a voluntary basis

- Synergies with international organisations through the cooperation with SDMX on VTL developments

Threats:

- A weak collaboration between solutions providers and business users can lead to poor results

Qualitative Assessment

This approach addresses the different problems linked to a poor or non-harmonised validation process definition. This approach takes into account the needs of different partners and end-users of the project: business users in different statistical domains and users in the statistical production chain.

Based on the above analysis of the potential alternatives, the chosen solution has been the approach corresponding to the ESS.VIP VALIDATION.

5 SOLUTION DESCRIPTION

5.1 Legal Basis

No new legal basis is needed for this project. Some existing legal bases in individual thematic areas could require adaptation for a full adoption of the methodologies and guidelines.

5.2 Benefits

Harmonisation and standardisation of statistical validation methodologies and of IT tools: the project seeks to produce harmonised methodologies and generic IT solutions encompassing data validation in all statistical domains.

Efficiency gain: the ultimate goal of the project is a gain in efficiency through a more coordinated production chain, without redundancy of operations and with data quality improvements.

Resource reduction: The optimal allocation of validation tasks along the production chain of European statistics will reduce, in the medium term, the resources requirements in all steps of the chain.

Reinforced integration: A better integration of validation tasks in the ESS and possibility of using common tools and procedures.

Better compliance: agreement on a ‘minimum data quality standard’ by the thematic area Working Groups will enhance transparency in compliance monitoring and lead to efficiency gains.

5.3 Success Criteria

The success criteria for the project are:

- Explicit and harmonised documentation of validation rules in 50% of the domains covering European statistics by 2020.

- Agreement on validation rules and attribution of responsibilities for validation tasks along the production chain in 50% of the domains covering European statistics by 2020.

- Use of the central registry of validation rules and rulesets by 50% of domains by 2020.

- Regular treatment of validation aspects and critical assessment of existing validation rules in 75% of the domains by 2017.

- Deployment of the components of the validation architecture (validation rules registry, adaptation of validation tools/services, …) by 2017.

5.4 Scope

The ESS.VIP Validation aims at developing the architectural and methodological framework for data validation in the production chain of European statistics.

This framework will be based on three pillars: methodology, communication and solutions.

Summary of elements in the scope of the project for each of the pillars

Based on the data validation framework defined by ESS.VIP.Validation, the Working Groups of specific domains will reach agreements on validation procedures. Each Working Group should evaluate the need for any changes to the legal basis, but ESS.VIP.Validation itself does not produce any legal change. Also, Member States or other organisations in the statistical production chain will be free either to use the IS/IT solutions provided or to adapt their own information systems to support the agreements on validation standards reached in the thematic area Working Groups. Thus, the ESS.VIP Validation will not impose any IT solution on Member States and will not provide supporting tools for ad-hoc information systems.

This data validation framework will serve as input for the creation of a validation service within the framework of ESS.VIP.SERV project. Thus, the validation service is out of the scope.

The project aims at designing IS/IT solutions to be shared and used on a voluntary basis by Member States. Thus, a Graphical User Interface (GUI) will be designed to store, use and maintain the validation rules registry so as to ensure the linkage with the data transmission engine (Edamis). However, any addition, adaptation or change to the transmission engine that may prove necessary is out of the scope. On the other hand, a Data Structure needs to be defined for those data files not compliant with SDMX (see microdata package).

The project will provide a common syntax or language to enable an effective communication of validation rules, errors and metrics among stakeholders. All outcomes of the project will be ready to be used by all Member States and different partners of the statistical production chain in the ESS.


The scope of IT developments until end 2015 will be limited to developing a prototype that demonstrates the feasibility of managing validation rules in a rule registry for datasets that comply with the SDMX information model and are equipped with an SDMX-DSD describing their structural metadata. Thus, the scope of the validation service will be limited to structural validation.
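As an illustration of what structural validation against a data structure definition can mean in practice, the following minimal Python sketch checks one record against a heavily simplified, hypothetical stand-in for an SDMX-DSD. The class SimpleDSD, the function structural_errors and the dimension and code names are assumptions made for this example only; they are not part of the SDMX standard or of any project deliverable.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List


@dataclass(frozen=True)
class SimpleDSD:
    """Hypothetical, heavily simplified stand-in for an SDMX-DSD."""
    dimensions: Dict[str, FrozenSet[str]]  # dimension id -> allowed codes
    measure: str = "OBS_VALUE"             # name of the numeric measure


def structural_errors(dsd: SimpleDSD, record: dict) -> List[str]:
    """Return the structural problems found in one record (empty if none)."""
    errors = []
    # Every dimension must be present and carry a code from its code list.
    for dim, codes in dsd.dimensions.items():
        if dim not in record:
            errors.append(f"missing dimension {dim}")
        elif record[dim] not in codes:
            errors.append(f"invalid code {record[dim]!r} for {dim}")
    # No field outside the declared structure is allowed.
    expected = set(dsd.dimensions) | {dsd.measure}
    for key in record:
        if key not in expected:
            errors.append(f"unexpected field {key}")
    # The measure must be convertible to a number.
    value = record.get(dsd.measure)
    try:
        float(value)
    except (TypeError, ValueError):
        errors.append(f"non-numeric {dsd.measure}: {value!r}")
    return errors


# Illustrative structure with two dimensions (codes chosen for the example).
dsd = SimpleDSD(dimensions={"FREQ": frozenset({"A", "Q"}),
                            "GEO": frozenset({"BE", "LU"})})
print(structural_errors(dsd, {"FREQ": "A", "GEO": "LU", "OBS_VALUE": "12.5"}))
print(structural_errors(dsd, {"FREQ": "M", "OBS_VALUE": "n/a", "EXTRA": 1}))
```

The check deliberately looks only at structure (presence of dimensions, code lists, data type) and not at content rules such as plausibility or consistency between observations.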

Illustration of ESS.VIP.Validation's scope for IT solutions

5.5 Solution Impact

Process | Solution Impact Description

ESS strategy | Improvements in coordination of activities of different stakeholders with efficiency and quality gains

Statistical processes | Improvements in data quality; processes can be redesigned based on effective communication of validation processes

International organisations | The contribution to the development of VTL can facilitate the establishment of a standard to share data, metadata and validation rules

5.6 Deliverables

The project is organised in four specific packages, corresponding to different layers of the ESS Enterprise Architecture (business, information, solution and infrastructure).

These packages and their deliverables are described below.

Package 1 – Implementation (Business layer)

Develop a roadmap for a sustainable implementation of the ESS.VIP results in the different statistical domains, in cooperation with MS (Business and Solution layers of the Enterprise Architecture).

This is the Eurostat office-wide implementation of the previous methodological developments of the project (standard validation process description, documentation and selection of validation rules ensuring the minimum data quality standard, standard description-syntax of the rules and attribution of validation tasks between partners, procedures for handling error messages), i.e. their implementation by all domains/WGs.


This package will take care of an implementation calendar including all domains (and WGs) monitoring the actions and reporting to the ESSC. Training needs form an essential part of the implementation plans to be developed in the package. The training need should be analysed from several perspectives:

o Business needs
o Target audience
o Technological approach
o Methodological approach

Foreseen training materials can include: seminars, presentations, workbooks, self-study tutorials, etc. For each of the training needs identified, this package should propose the best set of materials.

The part of this package referring to the information layer will take care of the full use by all relevant domains of existing standards in terms of SEP (Single Entry Point) for data transmission and SDMX as a data model (in principle SDMX should fit the needs of at least most of the macro-data collections).

The following deliverables are expected under this package:

1.1 Codification in VTL of a set of validation rules of two domains of different nature (e.g. micro data vs. macro data).

1.2 Assessment of validation practices in Eurostat: document with an overview of validation practices, tools, and documentation by statistical domains (at least 75% of statistical domains) and a procedure for a periodical launch of the assessment.

1.3 Proposals for implementation of principles derived from the ESS.VIP Validation in the different domains (covering at least 50% of domains).

1.4 Validation handbook with methodologies, guidelines and templates to assist statistical domains in the application of a harmonised approach to data validation.

1.5 Proposal of a Validation architecture compatible and in line with the ESS Enterprise Architecture. This Validation business architecture should pay special attention to distribution of validation tasks within the statistical production chain.

Package 2 – Microdata (Business and Information layers)

This package focuses on the validation of microdata.

The validation of microdata poses particular requirements in terms of functionality and performance compared with the validation of aggregated statistical tables. The SDMX-ML format (recommended format for the exchange of statistical data in the form of aggregated tables) is not suitable for the transmission of microdata. However, the SDMX approach can be used to describe the metadata of different types of microdata. This would allow building a uniform metadata repository covering aggregated data and microdata. Such a uniform repository of metadata describing the records of a microdata file is the prerequisite of a uniform tool to maintain validation rules and to apply them to the data files.

This approach will be evaluated by using microdata collections currently managed in the domain of social statistics. Once microdata have been described in a SDMX repository their description can be accessed by a system which manages validation rules and which relies on a description of the data to be validated.

This package constitutes the prerequisites so that work under the other packages of the ESS.VIP Validation project becomes useful for microdata processing and validation. The following deliverables are expected from this package:


2.1 Incorporation of metadata describing several microdata collections in the Euro SDMX registry.

2.2 Assessment of the functionality and the completeness of such descriptions compared with the processing requirements for microdata.

Package 3 – Solutions (Solution layer)

Cooperation of business and IT functions to enable a technical solution to support the methodological approaches to validation (Solution layer and eventually infrastructure layer of the Enterprise Architecture). The following deliverables are expected from this package:

3.1 Definition of an information system architecture (conceptual level) for data validation based on the ESS Enterprise Reference Architecture

3.2 Functional specification of a Registry of validation rules and rulesets; and its associated graphical web interface (Validation Registry and GUI)

3.3 Technical specification of Validation Registry and GUI

3.4 Development of Validation Registry and GUI Prototype

3.5 User review of the Validation Registry and GUI Prototype

3.6 Definition of Structural Validation Service – version 1

3.7 Development of Structural Validation Service – version 1

The Validation Registry and GUI will enable users to create, view and manage validation rules and rulesets in a central repository. It will be possible to express the validation rules in VTL syntax. The GUI will be Web-based and will be available for both EUROSTAT and Member State users.

The prototype will not manage specific access rights for the domains, and will not include the execution of validation rules. The purpose of the prototype is to demonstrate the feasibility of a central validation registry in a format that allows user interaction. The prototype can be used by the reviewers to provide adequate feedback and to shape future developments.
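The registry behaviour described above (create, view and manage rules and rulesets, without executing them) can be pictured with the following minimal in-memory Python sketch. All names (ValidationRule, RuleRegistry and their methods) are hypothetical and chosen for illustration; the rule expression is stored as opaque text and the string used in the example is a placeholder, not actual VTL.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ValidationRule:
    rule_id: str
    domain: str
    expression: str        # rule text (e.g. in VTL); stored, never executed here
    description: str = ""


@dataclass
class RuleRegistry:
    """Hypothetical central store of rules and named rulesets."""
    rules: Dict[str, ValidationRule] = field(default_factory=dict)
    rulesets: Dict[str, List[str]] = field(default_factory=dict)

    def add_rule(self, rule: ValidationRule) -> None:
        if rule.rule_id in self.rules:
            raise ValueError(f"rule {rule.rule_id} already registered")
        self.rules[rule.rule_id] = rule

    def add_to_ruleset(self, ruleset: str, rule_id: str) -> None:
        if rule_id not in self.rules:
            raise KeyError(rule_id)
        members = self.rulesets.setdefault(ruleset, [])
        if rule_id not in members:
            members.append(rule_id)

    def get_ruleset(self, ruleset: str) -> List[ValidationRule]:
        return [self.rules[rid] for rid in self.rulesets.get(ruleset, [])]

    def remove_rule(self, rule_id: str) -> None:
        del self.rules[rule_id]
        # Keep rulesets consistent with the rule store.
        for members in self.rulesets.values():
            if rule_id in members:
                members.remove(rule_id)


registry = RuleRegistry()
registry.add_rule(ValidationRule("R1", "transport", "OBS_VALUE >= 0",
                                 "placeholder expression, not actual VTL"))
registry.add_to_ruleset("transport-basic", "R1")
print([rule.rule_id for rule in registry.get_ruleset("transport-basic")])
```

Consistent with the prototype scope, the sketch has no access-rights handling and no rule execution; it only shows how rules and rulesets could be kept consistent in one place.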

A separate deliverable, the Structural Validation Service, will provide the functionality of validating data sets against their SDMX-DSD, and provide this functionality according to the service-oriented guidelines set up by the ESS.VIP.SERV project.

This package will be in charge of drafting the governance body for versioning and adapting the developed tools and supporting policy for the tools. This governance body and supporting policy should take into account in particular:

- Multidisciplinarity of users: users from different statistical domains sharing the same tools

- User roles: system administrators, special-rights users, read-only users, etc.

- Multi-organisational users: users of the systems can be part of different organisations, like Eurostat, National Statistical Institutes, other international organisations, etc.

- No tools can be imposed on users (subsidiarity principle).

Package 4 – General coordination & ESS involvement

This package promotes: the general coordination of the project within Eurostat and the ESS; the liaison with other office/ESS wide activities related to data validation to seek coherence of approaches; coordination of the activities of all work packages in the project with the activities performed by the ESSnet on Data Validation, the potential integration of more sophisticated validation solutions such as longitudinal validation and mirror checks, based on ESS wide shared warehouses, in order to ensure data integrity (all layers of the Enterprise Architecture). This package provides the overall coordination of the project, seeks coherence of wide validation approaches and contributes to the overall coordinated development of cross cutting issues within the ESS.VIP programme. It also aligns to international activities.
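To make the idea of a mirror check more concrete: in flow statistics such as trade, the exports that one country reports to a partner can be compared with the imports that the partner reports from that country. The Python sketch below flags flows where the two figures diverge by more than a tolerance. The function name, the data layout and the 5% tolerance are assumptions for illustration only and do not describe any method specified by the project.

```python
from typing import Dict, List, Tuple

Flow = Tuple[str, str]  # (reporting country, partner country)


def mirror_discrepancies(exports: Dict[Flow, float],
                         imports: Dict[Flow, float],
                         tolerance: float = 0.05) -> List[Tuple[str, str, float]]:
    """Return (exporter, importer, relative gap) for flows above the tolerance."""
    flagged = []
    for (exporter, importer), exported in sorted(exports.items()):
        # The mirror figure is what the importer reports from the exporter.
        mirrored = imports.get((importer, exporter))
        if mirrored is None:
            continue  # no mirror figure reported, nothing to compare
        base = max(abs(exported), abs(mirrored))
        if base == 0:
            continue  # both figures are zero, no discrepancy
        gap = abs(exported - mirrored) / base
        if gap > tolerance:
            flagged.append((exporter, importer, round(gap, 4)))
    return flagged


# Invented figures, for illustration only.
exports = {("BE", "LU"): 100.0, ("LU", "BE"): 50.0}
imports = {("LU", "BE"): 80.0, ("BE", "LU"): 49.0}
print(mirror_discrepancies(exports, imports))
```

Such a check needs the data of both partners in one place, which is why the text links it to ESS wide shared warehouses.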

The package will contribute to the VTL development. The active contribution to the VTL development is a key element of the project representing a benefit for both the SDMX community and the ESS.

The following deliverables are expected from this package:

4.1 Periodic monitoring and reporting of the project.

4.2 Two workshops and an international conference on data validation.

4.3 Creation and maintenance of a wiki page dedicated to internal users in Eurostat.

4.4 Creation and maintenance of a web page dedicated to general public and especially to ESS users.

4.5 Active participation with presentation of project deliverables and outputs in at least three meetings at the level of Working Group.

5.7 Assumptions

All dates and deliverables are subject to the proper allocation of resources in terms of budget and staff to be dedicated to the project.

It is assumed that the SDMX community will formally adopt VTL as the standard language for communication of validation and transformation rules (target December 2014).

The assumptions for the IT developments will need to be further elaborated.

5.8 Constraints

The ESS Enterprise Architecture Reference Framework is currently under elaboration and is expected to be finalised in the first half of 2015. Under this constraint, the ESS.VIP Validation needs to work on the assumption that the proposed solutions are compliant with the final version of this architecture.

Eurostat and Member States have their own validation approaches and tools; therefore there might be difficulties in fitting them into the proposed validation architecture.

Collaboration and communication between business units and solution provider units in Eurostat as well as in Member States need to be fluent.

Micro data domains are not making an extensive use of SDMX for defining the data structures. This may lead to the selection of alternative standards (e.g. DDI) or definition of a metadata environment for the data structures.

5.9 Risks

1. Changing requirements for IT tools and services.

Possible risk: Certain areas – like rule registry and rule engine – are not specified yet. The specification of the GUI is not available either.

Action: Active involvement of business users in functional specifications. Reserve budget for change requests.

2. VTL adoption by SDMX community is delayed.

Possible risk: the SDMX community will delay the adoption of VTL.

Action: Close monitoring of VTL development.

Corrective action: an alternative would be to use the validation syntax VALS defined in the project.

3. Slow adoption by Member States of the validation principles.

Possible Risk: The main risk will be linked to the acceptance by domains of the proposed approach to validation.

Corrective actions: The project will focus on specific domains where there is sufficient maturity in validation. A study has already identified those more likely to succeed (Animal Production, Waste, Transport, External Trade, SBS, National Accounts). These spearheads will be used to develop good practices and as an example for other domains.

4. Separate development of the rules registry and validation tools.

Possible risk: A separate development of both systems may result in incompatibilities.

Action: Active involvement of business units and solutions suppliers units with a very good communication of actions and interdependencies.

These risks are presented in a schematic way in the following table:

Risk No. | Risk | S | P | M | Mitigation: Proposals to eliminate / minimise the risk | Contingency

1 | Changing requirements for IT tools and services | 2 | 2 | 4 | Active involvement of business users in functional specifications | Reserve budget for change requests.

2 | VTL adoption by SDMX community is delayed | 3 | 2 | 6 | Cooperation with SDMX for its development | The development of VALS as Eurostat's Validation syntax may be adopted in case VTL is delayed. A plan for a later adoption of VTL to be approved.

3 | Slow adoption by Member States or statistical domains | 1 | 3 | 3 | Communication with Member States and domains will be a key factor for the solutions implementation (methodological and IT) | In the extension of the project, procedures can be launched for the adoption of the common approaches

4 | Separate development of the rules registry and validation engine | 1 | 3 | 3 | Well established communication channels to allow a good cooperation that could anticipate the requirements for future changes and developments for existing solutions | The ESS.VIP Validation will provide the functional requirements for each element included in the validation architecture for their future development and implementation
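The document does not define the S, P and M columns. In every row of the table, however, the M value equals the product of S and P (for example 3 × 2 = 6 for risk 2). Under that assumption, the following short Python sketch recomputes M and ranks the risks by it; the class and function names are hypothetical.

```python
from typing import List, NamedTuple


class Risk(NamedTuple):
    number: int
    name: str
    s: int  # S column of the risk table
    p: int  # P column of the risk table

    @property
    def m(self) -> int:
        # Assumption: M is the product of S and P, as the table values suggest.
        return self.s * self.p


RISKS = [
    Risk(1, "Changing requirements for IT tools and services", 2, 2),
    Risk(2, "VTL adoption by SDMX community is delayed", 3, 2),
    Risk(3, "Slow adoption by Member States or statistical domains", 1, 3),
    Risk(4, "Separate development of the rules registry and validation engine", 1, 3),
]


def ranked(risks: List[Risk]) -> List[Risk]:
    """Order risks by decreasing M, breaking ties by risk number."""
    return sorted(risks, key=lambda r: (-r.m, r.number))


for risk in ranked(RISKS):
    print(risk.number, risk.m, risk.name)
```

Ranked this way, the delayed adoption of VTL comes first, followed by changing IT requirements.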

5.10 Costs, Effort and Funding Source

The project has a strategic value for the production of statistics between partners in the ESS. This value will be reflected in efficiency gains (tangible) and quality gains (intangible) in the final data disseminated. The cost of the project is clearly below the possible advantages of its implementation at multi-national scale. Project costs are mainly linked to development activities, internal resources for steering and building project documentation, communicating and reporting and Member State resources in the ESSnet and in adapting VIP solutions at different depths.


While costs are mainly on centralised actions, benefits spread throughout the whole range of statistical domains and are tangible (more efficient production chains) and intangible (increased data quality).

Estimated effort for Member States:

During this first phase of the project (until end 2015), Member States are expected to contribute by participating in the ESSnet project, the VALIDATION task force and technical workshops.

Funding source:

The project is performed on Eurostat’s resources, with a contribution in workforce from Member States through task forces, consultations and other events. EUROSTAT will contribute 90% of the budget of an ESSnet.

5.11 Roadmap

Deadline | Main events | Deliverables

October 2014
Main events:
• Evaluation of the ESSnet proposals
• Set up the Task Force on Data Validation for Animal Production Statistics following the ESS.VIP Validation deliverables

November 2014
Main events:
• Progress report for the ESSC
• Finalisation of VTL with inclusion of comments from general public and meeting of SDMX Technical Working Group to decide on VTL
Deliverables:
• Wiki for internal users (4.3)

December 2014
Main events:
• ESSnet to start its activities
• Dissemination of deliverables, guidelines and principles from ESS.VIP Validation to general public
Deliverables:
• ESSnet meeting (4.2)
• Functional Specification for the Validation Registry and GUI (3.2)
• Definition of structural validation service (3.6)

January 2015
Main events:
• Starting the codification of validation rules from the first pilot statistical domain (SIMSTAT) in the VTL Language (start).
Deliverables:
• Web page for general public (4.4)
• Technical Specification for the Validation Registry and GUI (3.3)

February 2015
Main events:
• ESSnet report for the TF
• Final meeting of the ESS VIP Validation TF
• Proposal of the procedure to assess validation practices in Eurostat statistical domains in a periodical way
• Working Groups Handbook on Validation first version
Deliverables:
• Validation handbook (1.4)
• Assessment of validation practices in Eurostat (1.2)
• Specification of three microdata collections within Eurostat’s metadata repository (2.1)

March 2015
Main events:
• Implementation plan of the Working Groups Handbook
• Analysis of the survey on validation practices in the ESS (from ESSnet)

April 2015
Main events:
• Codification of validation rules from the first pilot statistical domain (SIMSTAT) in the VTL (end).
• Documentation transferred to the ESSnet as input for the assessment work
Deliverables:
• Codification of Rulesets in VTL (1.1)
• ESSnet Workshop (4.2)
• Definition of information system architecture (conceptual level) for data validation based on the ESS Enterprise Reference Architecture (3.1)
• Analysis of functionality of the microdata descriptions within Eurostat’s metadata repository with respect to their validation and their processing. Proposal how to enhance the SDMX standard to handle metadata for microdata (2.2)

May 2015
Main events:
• ESSnet interim report
• ESSC - Progress Report
• Start Design of Validation Registry and GUI Prototype
Deliverables:
• ESSC report (4.1)

June 2015
Deliverables:
• Proposal of Validation architecture (1.5)

July 2015
Main events:
• Workshop on data validation II (to be organised by the ESSnet)
• Start Development of Validation Registry and GUI Prototype
Deliverables:
• ESSnet Workshop (4.2)

September 2015
Deliverables:
• Proposals for implementation of ESS.VIP Validation principles (1.3)

October 2015
Main events:
• Summary of actions and proposals for the future
• Project conclusions
Deliverables:
• ESSnet conference on validation (4.2)
• Validation Registry and GUI Prototype (3.4)

November 2015
Main events:
• End of ESSnet
• Closing of the current mandate of the project and decision on extensions and follow-up of the project
Deliverables:
• User review of the Validation Registry and GUI Prototype (3.5)
• Development of version 1 of the structural validation service (3.7)
• ESSC report (4.1)
• Awareness actions (4.5)

5.12 Synergies and Interdependencies

ESS.VIP Validation will need to build and maintain synergies with:

a) International organisations:

The SDMX development of VTL is an example of how the cooperation of the ESS.VIP Validation can contribute to a methodological development for SDMX; this development is of special interest to be analysed and used in the ESS. Several international organisations are cooperating on the development of VTL, among them: Banca d'Italia, OECD, UNESCO, CBS-Netherlands, ECB, DDI and SDMX experts, BIS, ILO, etc.

b) Cross-cutting, business and modernisation projects as described in point 2.3 of the present document

c) Enterprise architecture team developing an ESS architecture framework under which the validation architecture can be defined.

d) Actors involved in the statistical production chain: domains in Eurostat and Member States:

• ESS.VIP Validation will have contacts with statistical domains and national delegates present in domains Working Groups.

• Extensive contact with National Statistical Institutes will be achieved as part of the activities of an ESSnet on Data Validation launched in December 2014.

• Participation and support to the activities of ad-hoc Task Forces on Data Validation launched in statistical domains.

5.13 Enablers

Enabler | Yes/No | Reference | If No, briefly explain the reason

PM² � http://www.cc.cec/wikis/display/PM2

BPM � http://www.cc.cec/wikis/display/bpmatec

Other

IT Related

RUP@EC � http://www.cc.cec/RUPatEC

CEAF � http://www.cc.cec/wikis/display/CEAF

SMP@EC � http://www.cc.cec/wikis/display/SMPAtEC/What+is+SMP@EC

VAST � http://ec.europa.eu/dgs/informatics/vast

CMMI

Other

6 GOVERNANCE

6.1 Project Owner (PO)

Project owner is Marcel JORTAY (Director E)

6.2 Solution Provider (SP)

Solution Provider is Mariana KOTZEVA (Acting Director B), co-owner of the project

6.3 Approving Authority

Approving roadmap includes:

• Internal Steering Committee

• Eurostat Directors Meeting

• DIME/ITDG

• European Statistical System Committee

Signature of the approving authority …………………………… Date ………


APPENDIX 1: STAKEHOLDERS ANALYSIS

Stakeholder Needs

Column groups: Partners; Eurostat Internal Users

Columns: ESSC | MS in Task Force | ESSnet | MS data providers | International organisations | Statistical Domains | ESS.VIP Validation Steering Committee | IT solutions provider

Efficiency Validation: X X X X X X
Common language: X X X X X X X X
Validation architecture (definition): X X X X X
Efficient rules definition: X X X X X X
Attribution of validation tasks: X X X X X X

Stakeholder Future Expectations

Column groups: Partners; Eurostat Internal Users

Columns: ESSC | MS in Task Force | ESSnet | MS data providers | International organisations | Statistical Domains | ESS.VIP Validation Steering Committee | IT solutions provider

Sophisticated validation methods: X X X X X X X
Common approach in ESS: X X X X X X
Validation architecture (implementation): X X X X X X X