
Subtitle “Promoting the usage of administrative data in Statistics Estonia by describing and harmonising metadata”

Final grant report


Table of Contents

Executive Summary

List of acronyms

Introduction

1. Obtaining knowledge about best practices of administrative data and metadata management system from another Member State (study visit)

1.1. Summary and difficulties encountered

2. Analysing and compiling data about current agreements, data sources and data structure descriptions

3. Analysing the questionnaires and finding variables that could be replaced by administrative data

3.1. Summary and encountered difficulties

4. Mapping management processes of administrative data and metadata in Statistics Estonia

5. Creating vision document on how to give feedback to the data owners about data transmission deadlines and agreed data structures

5.1. Summary and encountered difficulties

6. Describing metadata for the data sources whose cooperation agreements are renewed in the metadata system

7. Renewing cooperation agreements made with data owners before the year 2010

References


Executive Summary

According to Statistics Estonia’s strategy, our goal is to produce high-quality statistics with the lowest possible administrative burden and the highest possible efficiency. To achieve this, we need to improve the use of administrative data and describe the related metadata in our metadata management system.

At the moment, Statistics Estonia uses over 100 different administrative data sources (state registries) in the statistical production process. Managing, describing and improving the related information and metadata of those sources is a challenging and ongoing process.

In this project we have described and standardised metadata for the data sources whose cooperation agreements needed updating. During the process we also had the chance to develop and strengthen our partnership with the data owners, which is the key to using data from administrative sources.

Our project started by learning from Statistics Austria’s experience. We were then able to analyse and work through all of our information related to administrative data management so that we can start managing it more efficiently.

Efficient data management is only possible with optimised management processes. During the project we mapped the as-is and to-be processes of administrative data and metadata management.

The volume of administrative data and metadata is growing fast, so it is now clear that we need to move towards more automated processes. For that reason we have created a vision document for developing a new information system, Administrative Data Gate, which will allow us to send automated feedback and reminders to the data owners and also to automate the data checking processes.

The grant project enabled us to analyse our current questionnaires domain by domain and to suggest additional administrative data sources that could lower the response burden. This analysis was a new approach for us, because the statisticians are usually responsible for their own statistical activities. This time we analysed the questionnaires centrally and had the opportunity to give the statisticians new ideas on which sources to use, improving the use of administrative data in our organisation.


Statistics Estonia is very grateful for being able to carry out the activities of this grant project. It was very helpful to have a temporary employee who worked through a large amount of information, and we were able to start moving towards more automated processes for managing administrative data.


List of acronyms

ATAO – Statistics Design Department

BORA – The Beneficial Owners Register Act

EAS – Enterprise Estonia

ECAA – Estonian Civil Aviation Authority

EMDE – Electronic Maritime Information System

GSBPM – Generic Statistical Business Process Model

MUIS – System of (Estonian) Museums

RIHA – Administration system for the state information system

SE – Statistics Estonia

sDWH – Statistical Datawarehouse

TÖR – Working register


Introduction

The main objective of this grant project is to improve the use of administrative data sources in Statistics Estonia. According to Statistics Estonia’s strategy, our goal is to produce high-quality statistics with the lowest possible administrative burden and the highest possible efficiency.

Producing high-quality statistics is possible only with standardised metadata and efficient production processes. Harmonised metadata is usable across all statistical domains: if one data source is used in several statistical activities, its metadata needs to be described only once. The described metadata will be available directly to users and to systems in the live production environment.

Statistics Estonia has set the goal of reducing the administrative and response burden on respondents. This is possible only if we use more administrative sources and either stop using some questionnaires or prefill some values on them to help respondents answer.

Improving the use of administrative data in statistical production has one key element: close cooperation and partnership with the data owners. At the moment Statistics Estonia uses over 100 different administrative sources, and our goal is to build closer cooperation with the data owners in order to ensure efficient negotiations and high-quality data delivery. An important part of this cooperation is valid and up-to-date data delivery contracts. Both parties need to know their responsibilities, and the data owners need to know why the data is needed and that it is securely processed and stored in Statistics Estonia.

We planned several activities during the project that help us re-use information, produce statistics more efficiently, and reduce the administrative and response burden.

To perform the tasks of the grant project, we held weekly project team meetings where we discussed and agreed on the tasks for the upcoming week. We used a web application (JIRA) for assigning and monitoring planned activities. All team members reported weekly on the progress of their tasks and any difficulties, to ensure that the project was carried out on schedule.


An overview of the progress on the tasks of the grant project is given below. The overview is presented task by task, and the level of progress is evaluated for each. For every task, the difficulties encountered and an overall summary are described briefly.

1. Obtaining knowledge about best practices of administrative data and metadata management system from another Member State (study visit)

In order to use best practices available in other European statistical offices, we planned a study visit at the beginning of the project. To choose possible destinations, we gathered information from colleagues who have attended different Eurostat working groups. We learned that Austria and Finland both have advanced systems for managing metadata and administrative data. Both countries also conduct register-based population and household censuses.

Statistics Austria was able to welcome us in October to share their knowledge and experience. As Austria is considered one of the leaders in the European Statistical System in the use of administrative data, we were happy to plan a 1.5-day agenda for the study visit.

The study visit took place from 17 to 19 October, with a very full agenda on 18 and 19 October. Four people from Statistics Estonia attended the visit:

• two Leading Methodologists from the Statistics Design Department (responsible for negotiations with data owners, describing the metadata of administrative data and preparing the contracts);

• a Developer from the Data Service Department (responsible for the data warehouse and for developing the new IT system for administrative data management and automated controls);

• the Head of Data Description from the Statistics Design Department (responsible for the processes of managing administrative data and describing metadata in Statistics Estonia).

The agenda of the study visit was full of very useful and interesting topics. An overview of the study visit by agenda topic is given below.

• Coordination and guidelines for administrative data


The first topic was an introduction to how Statistics Austria manages its administrative data and related information. Statistics Austria uses over 500 different data sources, more than 50 of which are used for the register-based census. They do not have data delivery contracts with all the sources, because managing the contracts would be too burdensome and their national statistics law states that Statistics Austria can obtain the data from the data owners free of charge.

Statistics Austria currently has a separate metadata database for administrative data: an Access database in use since 2008. Its aim is to provide an overview of all administrative data available in Statistics Austria, a list of the projects that use administrative data, and search functionality for choosing from the available data. The database also stores information about external and internal contact persons and organisation details.

Statistics Austria plans to integrate the current metadata database into its new centralised data and metadata management system, the Statistical Datawarehouse (sDWH). The metadata for administrative data will then be extended, and data structures, attributes, classification lists, quality indicators, data formats, statistical units, reference dates, keywords and legal bases will also be available.

• Statistical Datawarehouse (sDWH) – Motivation

Statistics Austria has developed a statistical data warehouse to provide an internal, house-wide, easily accessible data and metadata platform. The sDWH project was established in 2014 and the technical solution was fixed in 2015. In April 2017 Statistics Austria started the implementation phase, role by role and department by department.

Metadata is described in the sDWH and has to be defined before a new dataset can be incorporated. This supports the house-wide harmonisation of concepts and other metadata.

The sDWH has different user roles, which help to manage the data and metadata workflow. For example, the Administrator can define represented variables and data sets and load the data, but the Quality Manager has to approve or reject the represented variables and data sets.
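The split between defining and approving can be pictured as a small approval workflow. The sketch below is only an illustration under our own assumptions, not Statistics Austria's implementation: the `Role`, `Status` and `DatasetDefinition` names and the three status values are hypothetical. It encodes the rule described above, that an object defined by an Administrator cannot be used until a Quality Manager has approved it.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    QUALITY_MANAGER = "quality_manager"


class Status(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"


@dataclass
class DatasetDefinition:
    """Hypothetical record for a represented variable or a data set."""
    name: str
    status: Status = Status.DRAFT

    def define(self, role: Role) -> None:
        # Only an Administrator may (re)define the object; redefining resets it to draft.
        if role is not Role.ADMINISTRATOR:
            raise PermissionError("only an Administrator can define objects")
        self.status = Status.DRAFT

    def review(self, role: Role, approve: bool) -> None:
        # Only a Quality Manager may approve or reject, and only while the object is a draft.
        if role is not Role.QUALITY_MANAGER:
            raise PermissionError("only a Quality Manager can review objects")
        if self.status is not Status.DRAFT:
            raise ValueError("only draft objects can be reviewed")
        self.status = Status.APPROVED if approve else Status.REJECTED

    @property
    def usable(self) -> bool:
        # Approved objects are the only ones released for use.
        return self.status is Status.APPROVED


if __name__ == "__main__":
    ds = DatasetDefinition("hypothetical_dataset")
    ds.define(Role.ADMINISTRATOR)
    ds.review(Role.QUALITY_MANAGER, approve=True)
    print(ds.name, ds.status.value, ds.usable)
```

Keeping the role check inside each transition means the rule cannot be bypassed by calling the methods in a different order.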

• Statistical Datawarehouse (sDWH) – Application (handling of metadata)

Statistics Austria also gave us a live demo of their sDWH. What impressed us most was that data and metadata are stored in one application, and that different datasets can be linked in the system and the results visualised. All links are also visualised in the sDWH; for example, it can be seen which data set is used in which project. The system also shows possible joining options, and the data descriptions at the variable level are only one click away.

The sDWH makes it possible to mark some variables and data sets as protected, in which case the in-house data owner has to grant permission to use the data set. Requesting and granting permission is also part of the sDWH: all permissions and explanations are stored in one system, so users do not have to send separate e-mails.

• Register-based Census - an Overview

The last traditional census in Austria was conducted in 2001 and cost 72 million euros. The register-based census in 2011 cost only 10 million euros.
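For scale, the two reported cost figures imply the following relative saving. This is our own arithmetic on the figures above, not a number quoted by Statistics Austria:

$$
\frac{72 - 10}{72} = \frac{62}{72} \approx 0.86
$$

In other words, the 2011 register-based census cost roughly 14% of the 2001 traditional census, a reduction of about 86%.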

Statistics Austria uses more than 50 data sources for the register-based census. In 2006 they also carried out a register-based census test, in which methods, data procedures and the use of registers were successfully tested.

• Workflow of a Register-based Census

As Statistics Estonia is also planning to conduct a register-based census in 2021, it was interesting to hear about the workflow of Statistics Austria’s register-based census team. The team has 13–15 permanent members, and for the 2011 census it also had about 20 temporary members responsible for different tasks.

Each team member is responsible for capturing some of the data sources and reminds the data owner one month in advance that the data needs to be delivered.
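A reminder rule of this kind is easy to automate. The sketch below assumes that "one month in advance" means the same calendar day one month before the delivery deadline, moved to the last day of the month when that day does not exist (31 March becomes 28 or 29 February). The function names and example register names are hypothetical, and the sketch does not track which reminders have already been sent.

```python
import calendar
from datetime import date


def one_month_before(deadline: date) -> date:
    """Same day one calendar month earlier, clamped to the last day of that month."""
    if deadline.month == 1:
        year, month = deadline.year - 1, 12
    else:
        year, month = deadline.year, deadline.month - 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(deadline.day, last_day))


def reminders_due(deadlines: dict[str, date], today: date) -> list[str]:
    """Data sources whose reminder window (reminder date up to the deadline) includes today."""
    return sorted(
        source
        for source, deadline in deadlines.items()
        if one_month_before(deadline) <= today < deadline
    )


if __name__ == "__main__":
    deadlines = {
        "hypothetical_register_a": date(2019, 3, 31),
        "hypothetical_register_b": date(2019, 6, 15),
    }
    print(reminders_due(deadlines, today=date(2019, 3, 1)))
```

Run as shown, this prints `['hypothetical_register_a']`: the reminder date for a 31 March 2019 deadline is 28 February 2019, while the second register's reminder window only opens on 15 May.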

Statistics Austria has a process documentation system for management reporting, timetables and production schedules.

The process documentation is kept in the ADAM/EVA database; for example, it holds the timetable/calendar for executing monthly, quarterly and yearly processes.

• ADAM/EVA database and documentation (handling of metadata)

The census data is currently stored in the ADAM/EVA database, but the plan is to incorporate it into the sDWH in the future. The ADAM/EVA database is also used for metadata documentation and has a search option for tables, variables, attributes and variable values.

• Other projects based on ADAM/EVA database

The ADAM/EVA database is also used for other projects in Statistics Austria, for example the labour force survey, national accounts, the rich frame for social statistics, monitoring of education-related employment, tracking of graduates and register-based labour market careers.

The rich frame is used for calibration/post-stratification, non-response analysis and substituting survey questions with administrative data.

• Statistical Datawarehouse (sDWH) – Future plans

Statistics Austria shared their future plans for the sDWH with us. They plan to integrate all administrative data and metadata into the warehouse, which will give them a fully integrated and harmonised metadata management system.

Statistics Austria is also planning to create a GeoWizard for automatically generating working maps for internal and external use; all the necessary data, metadata and information will be stored in a standardised way in the sDWH.

For visualising statistical information, Statistics Austria is currently evaluating the Tableau software. Internally, visualisation is important for heads of departments to create reports about data usage and availability. Externally, the plan is to develop dashboards for disseminating statistics in a more user-friendly format.

• Quality assessment for Register-based statistics / metadata of administrative data

Statistics Austria has developed a three-stage quality evaluation system for administrative data. Data quality is first evaluated at the raw data stage, when the registers provide the data. In the next stage the data is combined and linked in the central database, and a second evaluation takes place. After combinations and imputations, the data is available in the final data pool, and its quality is evaluated again.
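The three evaluation points can be sketched as a small pipeline. This is a minimal illustration under our own assumptions, not Statistics Austria's system: the checks (missing identifiers at the raw stage, the share of records linked to a reference register after linking, the share of imputed values in the final pool), the thresholds and the median imputation of a hypothetical `age` field are all example choices.

```python
from statistics import median


def check_raw(records):
    """Stage 1, raw data: every record delivered by a register needs an id."""
    return [f"record {i}: missing id" for i, r in enumerate(records) if not r.get("id")]


def check_linked(records, reference_ids, min_link_rate=0.95):
    """Stage 2, after linking: share of ids found in a (hypothetical) reference register."""
    ids = [r["id"] for r in records if r.get("id")]
    rate = sum(i in reference_ids for i in ids) / len(ids) if ids else 0.0
    return [] if rate >= min_link_rate else [f"link rate {rate:.2f} < {min_link_rate}"]


def impute_age(records):
    """Fill missing 'age' values with the median of observed ages and flag them as imputed."""
    observed = [r["age"] for r in records if r.get("age") is not None]
    fill = median(observed) if observed else None
    return [
        {**r, "age": fill, "imputed": True} if r.get("age") is None else {**r, "imputed": False}
        for r in records
    ]


def check_final(records, max_imputed_share=0.10):
    """Stage 3, final data pool: limit the share of imputed values."""
    share = sum(r["imputed"] for r in records) / len(records) if records else 0.0
    return [] if share <= max_imputed_share else [f"imputed share {share:.2f} > {max_imputed_share}"]


def evaluate(records, reference_ids):
    """Run the three evaluation stages and return the issues found at each one."""
    report = {"raw": check_raw(records), "linked": check_linked(records, reference_ids)}
    report["final"] = check_final(impute_age(records))
    return report
```

Returning the issues per stage, rather than stopping at the first failure, mirrors the idea that each stage is evaluated and documented separately.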

• Census - Analysis of Residence

Statistics Austria explained how they avoid overcoverage of residents. If a person has only one record in the Central Persons Register, they have to confirm their residence by replying to an official letter. Last time, about 69 thousand letters were sent out to confirm residence. If residence is not confirmed by replying to the letter, the person becomes a candidate for deletion. However, local authorities can oppose the deletions by proving that the person is still a resident of their municipality.
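The confirmation procedure amounts to a short decision rule. The sketch below encodes it as we understood it from the presentation; the `Person` fields and status labels are hypothetical, and the exact criterion for sending a letter (described above only as having a single record in the Central Persons Register) may be more detailed in practice.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    register_records: int                  # records in the Central Persons Register
    letter_answered: bool = False          # residence confirmed by replying to the letter
    municipality_objection: bool = False   # local authority proved continued residence


def residence_status(p: Person) -> str:
    """Classify a person according to the confirmation procedure described above."""
    if p.register_records != 1:
        # Only persons with exactly one register record are asked to confirm.
        return "no confirmation needed"
    if p.letter_answered:
        return "confirmed"
    if p.municipality_objection:
        return "kept after objection"
    return "candidate for deletion"
```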

To conduct the census successfully, Statistics Austria carries out an annual quality evaluation of the residence data, in which all sources and outputs are analysed and evaluated.

• Business Register for Administrative Purposes and Beneficial Owner Register

The last topic on the agenda was an introduction to two Austrian registers.

Every entity taking part in e-Government processes needs to be registered in one of the state registers. The Business Register for Administrative Purposes combines different registers and is the basis for the statistical registers.

The frequency of automatic data transmission varies: some registers transmit data to the business register weekly, while others have an online connection, so their data is always up to date.

The Austrian Beneficial Owners Register Act (BORA) obliges legal entities to register their beneficial owners. This is intended to give financial supervisors a tool to fight money laundering and terrorist financing.

Because it maintains the Business Register for Administrative Purposes, Statistics Austria is an ideal partner to implement the Beneficial Owner Register technically for the Austrian Ministry of Finance.

The Beneficial Owner Register is a strong business case, and BORA explicitly allows Statistics Austria to use the data for statistical purposes.

1.1. Summary and difficulties encountered

In conclusion, the study visit was very successful: we had the opportunity to learn from Statistics Austria’s experience and best practices. Although we are at a different stage of using and managing administrative data than Statistics Austria, we gained new ideas about how to optimise the processes of documenting metadata for administrative data.

Firstly, we were surprised to hear that Statistics Austria does not have formal written contracts with all the data owners. As our national statistics law also states that data from registries can be obtained free of charge for the purposes of official statistics, we are considering how to make data transmission agreements more flexible. At present we mostly have written contracts with the data owners, or, when data is transmitted to pilot its use, we send a data request to obtain the data. We are currently developing a form of data transmission agreement that would describe the required data structures and deadlines but would be flexible and less burdensome to change and keep up to date.

Secondly, we really appreciated the workflow management in the sDWH. In our current metadata management system, iMeta, metadata can be described only by metadata team members, and we ask the statistical departments for input on the correct metadata via Excel forms. However, we are currently piloting a new metadata information system, Colectica, which has an integrated workflow service. In the future we therefore want to implement a system similar to Statistics Austria’s, in which analysts can also enter metadata, but an administrator from the metadata team needs to approve it before it is published for use.

Thirdly, after the visit we are convinced that metadata and administrative data should in future be integrated into one information system in order to use the data more efficiently in the statistical production process. The Data Service Department has started piloting the data virtualisation tool Denodo, in which data catalogues can be created that integrate data and metadata into one system. Implementing this application would be most useful for the statistical departments, because they would no longer have to link data and metadata themselves.

The main difficulty in performing this task was finding a time for the study visit that suited both Statistics Estonia and Statistics Austria. We wanted the study visit to take place at the beginning of the project so that we could use the knowledge gained in our further actions.

When we planned and attended the study visit in October, we did not yet know whether Eurostat would approve the grant project application, so there was a risk of not being reimbursed for the study visit.

2. Analysing and compiling data about current agreements, data sources and data structure descriptions

To be able to start the tasks of renewing cooperation agreements made before 2010 and analysing questionnaires to find variables that can be substituted with administrative data, we began analysing and compiling information about current agreements, data sources, etc.

As Statistics Estonia currently struggles to manage the information related to administrative data, we started systematising and visualising the information we need to manage. This was the first task of our temporary employee.

An introductory task for the new employee was to create an overview Excel table of the data that Statistics Estonia captures from administrative sources. The information in the table is organised by the data sets of the different data sources. For each data set, the table contains the data structure, the transmission channel, the format and the deadline for transmitting the data. It also provides a brief description of the contract or data request and the purpose of using the data in Statistics Estonia. This task helped our new employee understand what kind of data Statistics Estonia receives from different data sources.

The basis for the overview table had already been created and consisted of a list of all the registries from which Statistics Estonia obtains data. The first task was to add information about the data structure, the transmission channel, the format, the deadlines for transmitting the data, a brief description of the contract or data request, and the purpose of using the data.

All the necessary information was collected by searching through different documents and information systems. The information was stored in the document management system, the metadata management system, shared computer folders, an Outlook mailbox and JIRA tasks. Because it had not been systematically stored or managed, it was difficult for the new employee to find and compile everything needed. Stored contracts and data requests had not always been correctly marked as valid or invalid, so the hardest part of the task was determining which contracts and data requests are still valid. We have a new annex for every dataset we capture from the data owners, and a new annex very often, but not always, invalidates the previous one. It was therefore challenging to go through all the annexes and find the currently valid ones.
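The validity rule we had to apply by hand can be stated more precisely: an annex stays valid unless another annex explicitly invalidates it or it has been terminated, because a newer annex does not always replace the previous one. The sketch below assumes, hypothetically, that each annex record lists the annexes it supersedes; in practice this information had to be read from the documents themselves. The `Annex` fields and dataset names are illustrative only.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Annex:
    annex_id: str
    dataset: str
    signed: date
    supersedes: set[str] = field(default_factory=set)  # ids of annexes this one invalidates
    terminated: bool = False                           # explicitly marked as no longer valid


def valid_annexes(annexes: list[Annex]) -> dict[str, list[str]]:
    """Group currently valid annex ids by dataset, oldest first.

    Validity is not inferred from the signing date alone: an annex is valid
    unless some other annex explicitly supersedes it or it was terminated.
    """
    superseded = set().union(*(a.supersedes for a in annexes))
    result: dict[str, list[str]] = {}
    for a in sorted(annexes, key=lambda a: a.signed):
        if a.annex_id not in superseded and not a.terminated:
            result.setdefault(a.dataset, []).append(a.annex_id)
    return result
```

Recording the supersession explicitly, instead of assuming that the latest annex wins, covers the case where a new annex adds a dataset without invalidating the earlier one.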

Statistics Estonia uses the web platform Confluence to manage internal information and make it accessible to colleagues. Every team has its own space or page in Confluence, where overviews and guidelines can be stored and shared. We decided that the overview table of data sources also needed better visualisation, and that became the next task for the new employee.

A summary table of contracts and data requests for administrative data was compiled in Confluence. The overview is on the Metadata team page, where a sub-page for administrative data was created. The table lists the institutions and their registries with which Statistics Estonia has a contract or from which data is obtained through data requests. For contracts, the dates of signing and completion of the contract are recorded. In addition, each data source lists the institution’s contact persons to whom data transfer issues can be addressed. Compiling the summary table showed which existing contracts had been signed before 2010 and should be updated.

The previous task with the Excel table helped to get this task started. The list of institutions and their registries was taken from the Excel table and added to the Confluence table. Our employee started collecting information about contracts and data requests from local disks and the document management system Livelink. As in the previous task, the most important part was to determine which contracts and data requests are still valid and which are the latest. Statistics Estonia keeps all documents for each data source, even those that are no longer valid. Because the contracts and data requests were stored in different places and were not in order, this task was time-consuming. All the information about the contracts and data requests had to be taken from the documents themselves, so the employee had to read through every contract or data request file she found in order to obtain the right information for the table.

After compiling the overview table in Confluence, we decided that we needed a sub-page for every data source. The main reason was that Statistics Estonia captures many different data sets with different deadlines from one data source or registry. There can also be different contact persons for different data sets, as well as different users within Statistics Estonia. Our new employee therefore linked new sub-pages to the Confluence overview table, giving users more detailed information about each data source. Each sub-page contains a description of the captured data set, deadlines, contact person information, user information and a link to the metadata management system where the metadata of the data source is stored. In the future we also plan to link information about the data warehouse tables where the administrative data is stored and made accessible to the analysts of Statistics Estonia. This would give any colleague in Statistics Estonia full information about each dataset available for use.

Performing this task was crucial for gaining a better overview of the administrative data related information that Statistics Estonia needs to manage. It also gave our temporary employee the knowledge of available data needed to analyse the questionnaires.

The main difficulties were described above, but it is important to highlight the large amount of information that our temporary employee had to work through and systematise. This was quite time-consuming, because for historical reasons documents had been stored in different places, and all this information had to be compiled to visualise the existing situation.

Completing this task is a big step forward for Statistics Estonia, because we now understand what we need from an information system for managing administrative data related information.

3. Analysing the questionnaires and finding variables that could be replaced by administrative data

Statistics Estonia has 127 different questionnaires that respondents have to fill in so that statistics can be produced. Our aim is to reduce the administrative and response burden by improving the use of administrative sources. Although Statistics Estonia already uses about a hundred different data sources, we were convinced that the questionnaires still contain variables that can be replaced by administrative data. We already prefill some variables on about 35 questionnaires to make answering more convenient and less time-consuming for respondents. When the grant application was written, we chose some domains to be analysed during the grant project, and our temporary employee started the analysis as soon as she had gained an overview of our questionnaires and the available administrative data.

Between October 2018 and March 2019 we found new data sources for our agricultural statistics domain, which is a very important domain in Estonia, so we decided to include it in our grant project and analyse it more thoroughly.

The two domains whose analysis we had finished by the submission of the intermediate report are culture and agriculture.

The first step of the analysis was to get an overview of the questionnaires and the variables collected in the statistical activities of the culture and agriculture domains, as well as an overview of the administrative data in use.


The second step was to compare the questionnaire variables with the administrative data already in use, to find possible new sources for replacing questionnaire variables. An Excel table was created to store the new information and give a better overview of which questionnaire variables can be replaced with administrative data. The table contains the questionnaire code, the number of the statistical activity, the name of the statistical activity, and the specific questions in the questionnaire together with suggestions for replacing them with administrative data.
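The columns of this table map naturally onto a simple record type. The sketch below assumes the Excel sheet has been exported to CSV with the column names shown (our own, hypothetical names) and counts how many questions each questionnaire could replace and which administrative sources are proposed most often. The questionnaire codes and activity numbers in the example are invented; MUIS and TÖR are sources named in this report.

```python
import csv
import io
from collections import Counter
from dataclasses import dataclass


@dataclass
class Suggestion:
    questionnaire_code: str
    activity_number: str
    activity_name: str
    question: str
    proposed_source: str


def load_suggestions(csv_text: str) -> list[Suggestion]:
    """Parse CSV rows whose header matches the field names above into Suggestion records."""
    return [Suggestion(**row) for row in csv.DictReader(io.StringIO(csv_text))]


def summarise(rows: list[Suggestion]) -> tuple[Counter, Counter]:
    """Count replaceable questions per questionnaire and suggestions per source."""
    per_questionnaire = Counter(r.questionnaire_code for r in rows)
    per_source = Counter(r.proposed_source for r in rows)
    return per_questionnaire, per_source


EXAMPLE = """questionnaire_code,activity_number,activity_name,question,proposed_source
Q-A,00001,Museum,Number of museum objects,MUIS
Q-A,00001,Museum,Number of employees,TÖR
Q-B,00002,Radio,Employees by job title,TÖR
"""

if __name__ == "__main__":
    by_questionnaire, by_source = summarise(load_suggestions(EXAMPLE))
    print(by_questionnaire.most_common())
    print(by_source.most_common())
```

For the example data this prints `[('Q-A', 2), ('Q-B', 1)]` and `[('TÖR', 2), ('MUIS', 1)]`. Counting by source in this way would show which data owners are worth approaching first.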

In Estonia we have a state-level administration system for the state information system, called RIHA. Every state information system has to be registered in RIHA, so RIHA is effectively a catalogue of the state’s information systems. It stores information about which data are collected and processed in which information systems, which services (including X-Road services) are provided, and who uses them.

X-Road is the backbone of e-Estonia: it is the data exchange layer that allows various public and private sector e-service information systems to link up and function in harmony. X-Road has developed into a tool that can write to multiple information systems, transmit large datasets and perform searches across several information systems simultaneously. Today, X-Road is implemented in Finland, Kyrgyzstan, Namibia, the Faroe Islands, Iceland, Ukraine and other countries (e-Estonia, 2019).

The next logical step in finding new data sources was to search RIHA. If the information system owner has registered the system and entered all the necessary information in RIHA, it is a very good source of information for Statistics Estonia. Unfortunately, a fairly large part of the information in RIHA is currently outdated, because the data owners have to update it manually. Planned developments will hopefully resolve this problem, so that keeping the information in RIHA up to date can be automated in the future.

Additionally, we searched the Internet for data that is already public and could be used through web scraping or other methods.

The third step was to propose replacing questionnaire variables with administrative data. This step included face-to-face meetings with colleagues working on culture and agriculture statistics in Statistics Estonia.


The last step was planning future activities based on the meetings held with the analysts of cultural and agricultural statistics. In some cases we also managed to negotiate and meet with the data owners to agree on new data deliveries.

In the domain of culture we have six different questionnaires, which are divided into the following statistical activities: Movie, Museum, Music, Radio and Television. We proposed substituting some variables in five of these questionnaires with data from new sources. Our proposals and their outcomes are compiled in the table below.

1. Suggestion: Data about all Estonian movies (movie type, name, duration) from the Estonian Film Database.

Outcome: This suggestion was accepted, and the next step is to negotiate with the manager of the Estonian Film Database.

2. Suggestion: The number of museum objects (museals) in each Estonian museum from the Information System of (Estonian) Museums (MUIS).

3. Suggestion: The number of employees in Estonian museums from the Working register (TÖR).

Outcome (suggestions 2 and 3): These suggestions were accepted, but there is a plan to rearrange some parts of the museum questionnaires, so there is as yet no full overview of what data will be needed after the questionnaires have been redesigned.

4. Suggestion: Music event names, types, number of concerts, number of tickets sold, ticket sales revenue and number of visitors from sites that officially sell tickets online in Estonia (for example Piletimaailm and Piletilevi).

Outcome: This suggestion was partly accepted, because there are multiple sites selling tickets online. In addition to these online sellers, there are unofficial sellers, and tickets can also be bought at the venue. Therefore there is no accurate overview of how many people attended a concert or how much ticket sales revenue was generated. However, we have signed a contract with one of the sellers, Piletimaailm, and will soon receive the first dataset. Our analysts can then pilot the usability of the data. Information about music event names, types and number of concerts can be found on the site http://kultuur.info.

5. Suggestion: The number of employees, with their job titles, in radio broadcasting stations from the Working register (TÖR).

Outcome: This suggestion was accepted. As we already capture data from the Working register, the analysts only have to take the data into use.

6. Suggestion: The number of employees, with their job titles, in television broadcasting stations from the Working register (TÖR).

Outcome: This suggestion was accepted. As we already capture data from the Working register, the analysts only have to take the data into use.

In the domain of agricultural statistics we have 14 different questionnaires, which are divided into the following statistical activities: Sown area of field crops, Purchase of livestock and poultry, Livestock farming and meat production, Quarterly statistics of livestock farming, Purchase and use of milk, Economic accounts for agriculture, Farm Structure Survey, Agricultural products, Yields, Crop farming, Cereals, Dairy products, Organic farming, Supply balance sheets of agricultural products and Agricultural products. Agricultural statistics is one of the most important statistical domains in Estonia and in Europe, but collecting data by questionnaires has always been burdensome for respondents in this field. That is why we decided to include agriculture as one of the domains in our grant project. We started analysing the domain in the autumn, and our initial analysis showed that there are still some data sources that Statistics Estonia is not yet capturing and using for agricultural statistics.

We started negotiations with the Veterinary and Food Board, and as a first step we asked them to send us some datasets for piloting. The datasets covered slaughtered animals, honey production and the number of pigs slaughtered at home. Our analysts piloted the usability of the data, and we compiled our data needs as the basis for the negotiations with the Veterinary and Food Board.

Statistics Estonia's data needs were broad: we wanted to capture several datasets with different delivery deadlines, and this involved different analysts on our side and different departments on the side of the Veterinary and Food Board. To keep the discussions effective, we held several meetings to agree on the composition of each dataset and the data delivery deadlines.

We managed to agree on all the datasets, and we now receive monthly and yearly datasets about slaughtered animals. The monthly dataset was immediately used for prefilling the questionnaires. We also now receive yearly datasets about honey production and the number of pigs slaughtered at home.

We also had meetings with two other data owners, the Estonian Land Board and the Agricultural Board. Both sources are already in use at Statistics Estonia, but our data needs have widened and the composition of the data in those registers has changed, so we need to work on new agreements and on getting access to the available data.

Our proposals for agricultural statistics and their outcomes are compiled in the table below.


1. Suggestion: The number of slaughtered animals and the weight of edible/inedible meat from the Veterinary and Food Board.

Outcome: This suggestion was accepted, and the questionnaires are prefilled with data from the monthly dataset.

2. Suggestion: Information about honey production in Estonia from the Veterinary and Food Board.

Outcome: This suggestion was accepted, and we have received the yearly dataset for 2018, which was used for prefilling the questionnaire. The quality of the data is very good, and from next year these data will no longer be asked in the questionnaire: honey production statistics will be based on administrative data only.

3. Suggestion: The number of pigs slaughtered at home from the Veterinary and Food Board.

Outcome: This suggestion was accepted, and we have already received the yearly dataset for 2018, which was used as an additional source for validating questionnaire data. In the future the data will be used to substitute the collected variables.

4. Suggestion: The number of people employed in agriculture, with their job titles, from the Working register (TÖR).

Outcome: This suggestion was accepted but needs further methodological analysis. Data from the Working register are captured monthly, so if the analysis shows that the data are compatible, they can be used for prefilling the questionnaires.

5. Suggestion: Land prices from the Estonian Land Board, according to the new methodology.

Outcome: Negotiations with the Estonian Land Board to receive land price data are still ongoing. The Land Board has promised to carry out a spatial analysis that takes into account land use data from the Estonian Agricultural Registers and Information Board. We are now waiting for this new spatial analysis to see whether it is sufficient for our data needs.

6. Suggestion: Organic farming data from the Agricultural Board.

Outcome: Negotiations are still ongoing; the Agricultural Board is a very important data source for organic farming statistics. Its information system is in development, and we have had several meetings to explain Statistics Estonia's expanding data needs. We need more detailed data about organic farming and are negotiating to have our data needs considered in the new information system.

7. Suggestion: The number of fur animals, number of animals slaughtered for fur, number of skins sold, etc. from the Veterinary and Food Board.

Outcome: We recently learned that the Estonian Veterinary and Food Board will start collecting information about fur animals. Negotiations are now under way to learn about the composition of the data and the possibilities of getting access to them.

Obligations:

REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL (number of bovine animals, pigs, sheep, goats and poultry slaughtered in slaughterhouses)

COUNCIL (carcass weight of bovine animals, pigs, sheep, goats and poultry slaughtered in slaughterhouses)

COUNCIL (Production account: Other animal products: others)

COUNCIL (slaughtering carried out other than in slaughterhouses: pigs)

ESS Agreement on statistics of agricultural land prices and rents

COUNCIL REGULATION (EC) No 834/2007 of 28 June 2007 on organic production and labelling of organic products and repealing Regulation (EEC) No 2092/91 and Commission Regulation (EC) No 889/2008 of 5 September 2008 laying down detailed rules for the implementation of Council Regulation (EC) No 834/2007 on organic production and labelling of organic products with regard to organic production, labelling and control

COUNCIL (Production account: Other animal products: others)

In the domain of accommodation statistics we have two different questionnaires, which are divided into the following statistical activities: Tourism and Accommodation activities. We proposed substituting some variables with data from new sources in only one questionnaire, because our Tourism questionnaire consists only of personal questions that cannot be replaced by administrative data. Our proposals and their outcomes are compiled in the table below.


1. Suggestion: The number of beds in accommodation facilities from Enterprise Estonia (EAS).

2. Suggestion: Wheelchair access in accommodation facilities from Enterprise Estonia (EAS).

Outcome (suggestions 1 and 2): Our next step is to check the definition of "the number of beds" used in the EAS database: does it mean the total number of beds or the number of beds that have been used? Another important step is to find out how EAS manages its database. The main question is whether enterprises add the information to the database themselves, on a voluntary basis.

Obligations:

REGULATION (EU) No 692/2011 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 6 July 2011 concerning European statistics on tourism and repealing Council Directive 95/57/EC

Commission Implementing Regulation (EU) No 1051/2011 of 20 October 2011 implementing Regulation (EU) No 692/2011 of the European Parliament and of the Council concerning European statistics on tourism, as regards the structure of the quality reports and the transmission of the data

In the domain of energy statistics we have four different questionnaires, which are divided into the following statistical activities: Electric power stations; Energy; Energy production, sales and fuel consumption; Consumption of fuel and energy. We proposed substituting some variables with data from new sources in only one questionnaire, "Energy". Our proposals and their outcomes are compiled in the table below.


1. Suggestion: Data on electricity produced, purchased and sold in Estonia from Elering.

Outcome: Statistics Estonia already receives some data from Elering. Our next step is to check whether Elering can provide the necessary data monthly.

2. Suggestion: Data on the fuel used for freight transport from the Estonian Road Administration.

Outcome: SE already uses some data from the Estonian Road Administration. The next step is to check whether we could also use the data from the annual vehicle inspections, which would enable us to find out the fuel usage of freight transport.

Obligation:

Regulation (EC) No 1099/2008 of the European Parliament and of the Council of 22 October 2008 on energy statistics

In the domain of transportation statistics we have 23 different questionnaires, which are divided into the following statistical activities: Gas pipelines, Freight transport through ports, Freight transport on the road, Ships in the harbor, Ship traffic, Ship-based economic and social indicators, Ship registers, Marine accidents, Shipping-unloading, Air traffic, Flight accidents, Traffic Register, Road transport, Sea transportation, International travel through ports, Railway and rolling stock, Rail transport, Inland waterway transport, Vehicle registration, Tram-troll, Tram and trolley transport, Aircraft Register, Air transport. We proposed substituting some variables with data from new sources in three questionnaires. Our proposals and their outcomes are compiled in the table below.



1. Suggestion: The number of air passengers and the amount of goods and mail transported by air, from the Tallinn Airport website "Air Traffic Review".

Outcome: Our next step is to check whether Tallinn Airport is willing to give us microdata about passengers, goods and mail.

2. Suggestion: The number of civil aircraft from the website of the Estonian Civil Aviation Authority (ECAA).

Outcome: Our next step is to find out who updates the website and how, and how to ensure that the website contains relevant data.

3. Suggestion: Data about trucks (total weight, number of axles, type of bodywork, type of engine) from the Estonian Road Administration.

Outcome: SE already uses some data from the Estonian Road Administration. The next step is to check whether we could also use the data from the annual vehicle inspections.

Obligations:

Regulation (EU) No 70/2012 of the European Parliament and of the Council of 18 January 2012 on statistical returns in respect of the carriage of goods by road

Commission Regulation (EU) No 202/2010 of 10 March 2010 amending Regulation (EC) No 6/2003 concerning the dissemination of statistics on the carriage of goods by road

Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No 91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with respect to the establishment of NST 2007 as the unique classification for transported goods in certain transport modes

Commission Regulation (EC) No 833/2007 of 16 July 2007 ending the transitional period provided for in Council Regulation (EC) No 1172/98 on statistical returns in respect of the carriage of goods by road

Commission Regulation (EC) No 642/2004 of 6 April 2004 on precision requirements for data collected in accordance with Council Regulation (EC) No 1172/98 on statistical returns in respect of the carriage of goods by road

Commission Regulation (EC) No 6/2003 of 30 December 2002 concerning the dissemination of statistics on the carriage of goods by road

Commission Regulation (EC) No 2163/2001 of 7 November 2001 concerning the technical arrangements for data transmission for statistics on the carriage of goods by road

Commission Regulation (EU) No 520/2010 of 16 June 2010 amending Regulation (EC) No 831/2002 concerning access to confidential data for scientific purposes as regards the available surveys and statistical data sources

Directive 2009/42/EC of the European Parliament and of the Council of 6 May 2009 on statistical returns in respect of carriage of goods and passengers by sea (Recast)

Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No 91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with respect to the establishment of NST 2007 as the unique classification for transported goods in certain transport modes

2010/216/: Commission Decision of 14 April 2010 amending Directive 2009/42/EC of the European Parliament and of the Council on statistical returns in respect of carriage of goods and passengers by sea

Commission delegated decision of 3 February 2012 amending Directive 2009/42/EC of the European Parliament and of the Council on statistical returns in respect of carriage of goods and passengers by sea

February 2003 on statistical returns in respect of the carriage of passengers, freight and mail by air

Commission Regulation (EC) No 158/2007 of 16 February 2007 amending Commission Regulation (EC) No 1358/2003 as regards the list of Community airports

UNECE, ITF and Eurostat Common Questionnaire for Transport Statistics Gentlemen's Agreement

Commission Regulation (EC) No 546/2005 of 8 April 2005 adapting Regulation (EC) No 437/2003 of the European Parliament and of the Council as regards the allocation of reporting-country codes and amending Commission Regulation (EC) No 1358/2003 as regards the updating of the list of Community airports

Commission Regulation (EC) No 1358/2003 of 31 July 2003 implementing Regulation (EC) No 437/2003 of the European Parliament and of the Council on statistical returns in respect of the carriage of passengers, freight and mail by air and amending Annexes I and II thereto

In the domain of IT, research and development statistics we have five different questionnaires, which are divided into the following statistical activities: IT in the company, IT in the household, Business Innovation Survey, Research and development, Research and Development (in the company). We proposed substituting some variables with data from new sources in two questionnaires. Our proposals and their outcomes are compiled in the table below.


1. Suggestion: The number of employees in research and development, with their scientific field, age and gender, from the Working Register (TÖR).

Outcome: This suggestion was partly accepted. The information in TÖR about employees in research and development does not match the definitions used in the specific questionnaires. However, TÖR can be used to check the data collected by questionnaire.

2. Suggestion: The number of information and communication technology specialists in a company from the Working Register (TÖR).

Outcome: This suggestion was partly accepted. Initially, TÖR could be used to check the data collected by questionnaire, and if the quality of TÖR improves, we may be able to use it fully.

Obligations:

April 2004 concerning Community statistics on the information society

Commission Regulation (EC) No 753/2004 of 22 April 2004 implementing Decision No 1608/2003/EC of the European Parliament and of the Council as regards statistics on science and technology

Commission Implementing Regulation (EU) No 995/2012 of 26 October 2012 laying down detailed rules for the implementation of Decision No 1608/2003/EC of the European Parliament and of the Council concerning the production and development of Community statistics on science and technology


3.1. Summary and encountered difficulties

Completing this task was very challenging for us, because our temporary employee had to work through a large amount of information. However, we managed to analyse the questionnaires and available data sources for culture, agriculture, accommodation, energy, transportation, and IT, research and development statistics, and we now have an overview of the step-by-step process needed to find new sources, or new uses for administrative data already in use. Some of our proposals were easy to apply, but others need further analysis by the statistical domain experts.

In the field of culture we had six proposals. Proposals 2 and 3 are waiting for the redesign of the Museum questionnaire, which will not be finished before 2021. Regarding proposal 1, to use data from the Estonian Film Database, we have already started negotiations and drafted a cooperation agreement. We hope it will be signed this year, so that we can start using the data next year.

Proposal 4 is already partly in production. We are currently receiving data from Piletimaailm, but this company is not the only seller of tickets for cultural events in Estonia. To obtain more complete data, we have started negotiations with another company, Piletilevi. However, negotiations with private sector companies are time-consuming, and we are not sure when we will be able to receive data from Piletilevi.

Proposals 5 and 6 concern the use of Working register (TÖR) data. As the Working register is quite new in Estonia, its data on job titles are still rather incomplete. However, we expect completeness to improve by the end of this year, after which it will be possible to use the data across all statistical domains.

In the field of agriculture we had seven proposals. Proposals 1, 2 and 3 are already in production. Proposal 4 also concerned the Working register and has to wait for better data completeness and analysis by the statistical domain experts.

Regarding proposal 5, we have already received the first dataset from the Estonian Land Board according to the new methodology, but its usability has to be analysed further, and we may still need to process the data more before they can be used directly in our statistical production process.


Proposal 6, to receive further information on organic farming, is still at the draft agreement stage. We have compiled our data needs and explained them to the Agricultural Board, but as their information system is still in development, we have not yet been able to receive the data or sign the new agreement. We hope to sign the agreement and receive the first datasets at the beginning of 2020.

Proposal 7 is not in production yet, because we have not received confirmation from the Veterinary and Food Board that they have data about fur animals. Our next step is to arrange a meeting with the data owner and clarify our data needs.

In the field of accommodation statistics we had two proposals to start using data from Enterprise Estonia. Our next step is to find out how reliable the information in this database is. According to our current information, enterprises enter the information themselves on a voluntary basis, which means that data completeness may not be very good.

In the field of energy statistics we also had two proposals. Proposal 1 is about using monthly data from Elering. We currently receive data from Elering once a year, and since Elering is a private sector company, negotiations on more frequent data capture will take time. We plan to meet with them to discuss whether it would be possible to start capturing monthly data in an automated way, for example using X-Road.

Proposal 2 is about using more data from the Estonian Road Administration. We are negotiating to renew our data delivery agreement and to automate data capture from the Estonian Road Administration. However, the negotiations are taking some time, because the information systems of the Estonian Road Administration are being developed. We are finalising the draft agreement with our data needs and hope to renew the agreement next year.

In the field of transport statistics we had three proposals. Proposal 1 involves getting microdata from Tallinn Airport. Unfortunately, their first answer was negative, because they consider giving microdata to third parties a security risk. It is still unclear whether we will be able to justify our data needs legally and demonstrate that our data protection rules make it safe to send data to Statistics Estonia.

Proposal 2 was about using data from the website of the Estonian Civil Aviation Authority. Our next step is to find out how the updating of the website is organised. For that we have to contact the authority responsible for the website; we hope to get some answers by the end of this year and can then decide whether the proposal can be implemented in the production process.

Proposal 3 was also about using additional data from the Estonian Road Administration, and it will have to wait until the negotiations on renewing the data delivery agreement have been completed.

In the field of IT, research and development we had two proposals, both about using data from the Working register. We will wait until the end of this year to analyse the completeness and quality of the register data, and then decide how different statistical domains can use the data in their statistical activities.

The main difficulty in performing this task was going through a huge amount of information and trying to find new solutions and sources for questionnaire-based statistics. Statistics Estonia aims to use more administrative data, and analysing questionnaires domain by domain is an innovative approach for us that had not been tried before because of a lack of human resources.

4. Mapping management processes of administrative data and metadata in Statistics Estonia

Statistics Estonia's goal is to produce high-quality statistics as efficiently as possible. Efficient production is possible if we improve and widen our use of administrative data. Wider use of administrative data also reduces administrative and response burden. Statistics Estonia already uses over one hundred administrative sources. However, it has become challenging to manage all the information related to administrative data sources, for example information about cooperation agreements, deadlines and process phases.

During the project we started to analyse and map the processes of managing administrative data and metadata in Statistics Estonia. The first task was to map the “as is” process. The result of mapping the “as is” process is shown below.

Figure 1. As-is process of managing administrative data and metadata in Statistics Estonia

This process map covers the process of using and managing administrative data from the first phase, where the data need is identified, to the actual use of the data in statistical production. The process map involves five different departments of Statistics Estonia, and the process goes through the GSBPM phases Specify Needs, Design, Build, Collect, Process and Metadata Management/Quality Management.

The Statistics Design Department (ATAO) plays the central role in this process map. The department was created in 2017 and has since had the central role in managing administrative data and metadata. Metadata management had been centralised in the Methodology Department of Statistics Estonia since 2004, and managing and capturing administrative data was formerly the responsibility of the Data Warehouse Department. However, as Statistics Estonia has started using more administrative data and aims to build closer partnerships with data owners, it was decided to centralise the management of metadata and administrative data in the Metadata team of the Statistics Design Department.

The process map above describes the processes after the creation of the Statistics Design Department. We are working on optimising the processes of managing administrative data, which means that we want to provide the data to statistical production more efficiently and in a more standardised way.

The results of mapping the “to be” processes are shown below. For better understanding, we split the administrative data management process into two parts. Our aim is to simplify the use and analysis of administrative data for the statistical departments and to shorten the time needed to get access to new data. Figure 2 shows the process of managing a new or changed data need. The Metadata team has the central role in this process, which goes through the Specify Needs (1) and Design (2) phases of GSBPM. After receiving input from the analysts, the Methodologists in the Metadata team carry out the Design phase. For administrative data, the Design phase includes defining the variables that need to be captured from the data source, compiling information for the data request or contract, and preparing the data requests and contracts. Most of the communication and negotiation with data owners takes place in this phase. The administrative data manager's role here is similar to that of an intermediary or a “translator”: it is important to define the data needs as clearly as possible.

Administrative data management in the Design phase includes describing metadata for administrative data centrally, in cooperation with the owners of registers and the statistical domain departments.


The wide use of administrative data in SE has produced a lot of information related to data sources, for example information about cooperation agreements, data requests, data delivery deadlines, data structures, formats, additional information about the data, communication with data owners, process phases, etc.

The deadlines for data transmission in SE are currently managed and visualised in the web application JIRA. JIRA makes it possible to monitor data deliveries, data loading, processing, etc. There are separate tasks for every data delivery, and every task and subtask can be assigned to a different person. Whenever problems or obstacles arise in a process phase, the questions and answers are entered in JIRA as comments. This provides an overview of the workflow related to a specific dataset.

Figure 2. To-be process of agreements with data owners and managing administrative data and metadata

Figure 3 shows the data capturing process, which ends with making the data available to analysts. This process goes through the Build (3) and Collect (4) phases of GSBPM.

The Build and Collect phases for administrative data are the responsibility of the Data Service Department. In these phases, administrative data managers pre-process the data and make them available to the NSI's in-house applications. These procedures ensure that there are no duplicate data and that the data are ready for statistical analysis.

Administrative data are captured through different channels:

1) encrypted .csv or .xls(x) files by e-mail, FTP or cloud services;

2) X-Road services, which are divided into:

• pull services – the data owner has developed an X-Road service whose content is suitable for SE, and the data are pulled to SE through the X-Road service;

• push services to xGate – the data are pushed to SE through our xGate service. This is the preferred channel for data capture, because SE validates the received data against an XSD schema and the data delivery process is controlled by SE.
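Python's standard library has no XSD validator, so the following is only a minimal sketch of the kind of structural check that push deliveries make possible: it parses a pushed XML document and reports records that lack required elements. The element names (`record`, `reg_code`, `period`, `value`) and the required set are hypothetical assumptions, not the actual xGate or iMeta structures; a production check would validate against the real XSD.

```python
import xml.etree.ElementTree as ET

# Hypothetical required child elements of every <record> in a pushed file.
# (A real check would use the XSD generated from the metadata descriptions.)
REQUIRED_ELEMENTS = ("reg_code", "period", "value")


def check_push_delivery(xml_text: str) -> list[str]:
    """Return problems found in a pushed XML delivery (empty list = OK).

    This is a simplified structural check, not full XSD validation:
    it only tests well-formedness and the presence of required elements.
    """
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as exc:
        return [f"document is not well-formed XML: {exc}"]

    problems = []
    for index, record in enumerate(root.iter("record"), start=1):
        for name in REQUIRED_ELEMENTS:
            element = record.find(name)
            # A missing element and an empty element are both reported.
            if element is None or not (element.text or "").strip():
                problems.append(f"record {index}: missing <{name}>")
    return problems


sample = """<delivery>
  <record><reg_code>10000001</reg_code><period>2019-06</period><value>12</value></record>
  <record><reg_code>10000002</reg_code><period>2019-06</period></record>
</delivery>"""
print(check_push_delivery(sample))  # ['record 2: missing <value>']
```

Because the receiver runs the check on arrival, problems can be reported to the data owner immediately, which is one reason the push channel gives SE more control over the delivery process.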

Once administrative data have been captured through these channels, the loading processes begin. The first step is loading the data into the Initial Observation Registry (IOR). When the data are sent as .csv or .xls(x) files, they are loaded into the Oracle database as received. Loading and processing data sent as files is time-consuming for us, because there are constant problems with the agreed data structures and wrong data formats.

When data are captured through X-Road pull services, the XML file is parsed into the IOR with Oracle tools. When data are captured through xGate, the file is parsed and validated against the XSD file generated in the iMeta system. After the data have been loaded into the IOR, it is possible to give the first feedback about the received data. The captured data cannot be loaded if the formats are incorrect or variables are missing.
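The first feedback on a file delivery could be generated along the following lines. This is a minimal sketch, assuming a semicolon-separated file and a hypothetical agreed structure (`AGREED_COLUMNS`, mapping each variable to a parser for its agreed format); it reports the two problems named above, missing variables and wrongly formatted values, and does not reflect the actual Oracle loading tools.

```python
import csv
import io
from datetime import date

# Hypothetical agreed data structure: variable name -> parser that accepts
# only the agreed format and raises ValueError/TypeError otherwise.
AGREED_COLUMNS = {
    "reg_code": int,
    "event_date": date.fromisoformat,  # agreed format: YYYY-MM-DD
    "tickets_sold": int,
}


def first_feedback(csv_text: str, delimiter: str = ";") -> list[str]:
    """List the problems that would make a file delivery unloadable."""
    reader = csv.DictReader(io.StringIO(csv_text), delimiter=delimiter)
    header = reader.fieldnames or []
    missing = [name for name in AGREED_COLUMNS if name not in header]
    if missing:
        # Without every agreed variable the file cannot be loaded at all,
        # so value-level checks are skipped.
        return [f"missing variable: {name}" for name in missing]

    problems = []
    for row in reader:
        for name, parse in AGREED_COLUMNS.items():
            value = row[name]  # None if the row has too few fields
            try:
                parse(value)
            except (TypeError, ValueError):
                # reader.line_num = number of source lines read so far
                problems.append(
                    f"line {reader.line_num}: {name}={value!r} has wrong format")
    return problems


delivery = "reg_code;event_date;tickets_sold\n10000001;01.06.2019;12x\n"
for problem in first_feedback(delivery):
    print(problem)
```

Returning early when variables are missing reflects the rule in the text: such a file cannot be loaded, so listing every bad value would only add noise to the feedback.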

The next step is the Data Staging Area (DSA), where data structure checks and conversions to the correct formats take place. These checks and conversions are done according to the metadata descriptions in iMeta. More contextual checks can also be developed, but this requires input on the rules from the statistical domain departments. After the DSA step, a quality report about the delivered dataset can be generated automatically.
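A quality report of the kind produced after the DSA step might summarise, per variable, how many values were empty and how many converted successfully to the target format. The sketch below assumes each metadata description can be reduced to a converter function; the names `METADATA`, `stage_and_report` and `format_report` and the report fields are hypothetical and are not the iMeta data model.

```python
from dataclasses import dataclass
from decimal import Decimal, InvalidOperation


@dataclass
class VariableStats:
    total: int = 0
    empty: int = 0
    converted: int = 0

    @property
    def conversion_rate(self) -> float:
        non_empty = self.total - self.empty
        return self.converted / non_empty if non_empty else 1.0


def to_decimal(value: str) -> Decimal:
    # Accept a decimal comma as well as a decimal point.
    return Decimal(value.replace(",", "."))


# Hypothetical metadata descriptions reduced to "variable -> converter".
METADATA = {"reg_code": int, "revenue_eur": to_decimal}


def stage_and_report(rows):
    """Convert raw string values to target formats and collect statistics."""
    stats = {name: VariableStats() for name in METADATA}
    staged = []
    for row in rows:
        out = {}
        for name, convert in METADATA.items():
            s = stats[name]
            s.total += 1
            raw = (row.get(name) or "").strip()
            out[name] = None
            if not raw:
                s.empty += 1
                continue
            try:
                out[name] = convert(raw)
                s.converted += 1
            except (ValueError, InvalidOperation):
                pass  # value stays None; counted as a failed conversion
        staged.append(out)
    return staged, stats


def format_report(stats) -> str:
    return "\n".join(
        f"{name}: {s.total} values, {s.empty} empty, "
        f"conversion rate {s.conversion_rate:.0%}"
        for name, s in stats.items()
    )


rows = [{"reg_code": "1", "revenue_eur": "12,5"},
        {"reg_code": "x", "revenue_eur": ""},
        {"reg_code": "3"}]
staged, stats = stage_and_report(rows)
print(format_report(stats))
```

Counting empty values separately from failed conversions keeps two different problems apart: missing data is a completeness issue, while failed conversions point to a mismatch with the agreed structure.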


The last step is to make the data available to users, which means that the data are loaded into the Final Observation Registry (FOR) and pseudonymised if they include personal data. Pseudonymisation involves removing personal identification numbers (PINs), names and contact details from the data. PINs are replaced with unique identifiers that allow the data to be joined. These in-house identifiers are not derived from PINs, which means that they cannot be converted mathematically back to PINs.
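One way to obtain identifiers that are not derived from PINs is to draw random tokens and keep the PIN-to-identifier mapping in a separately protected table. This minimal sketch uses Python's `secrets` module; the field names and the in-memory dictionary standing in for the protected mapping table are assumptions for illustration, not SE's implementation.

```python
import secrets


class Pseudonymiser:
    """Replace PINs with random in-house identifiers (illustrative sketch).

    The identifiers are random tokens, not a function of the PIN, so they
    cannot be converted back to PINs; only the mapping table links the two.
    """

    # Hypothetical names of direct identifiers that are removed.
    DIRECT_IDENTIFIERS = ("pin", "name", "email", "phone")

    def __init__(self) -> None:
        self._mapping: dict[str, str] = {}  # PIN -> identifier; keep protected
        self._issued: set[str] = set()

    def identifier_for(self, pin: str) -> str:
        # The same PIN always gets the same identifier, so datasets
        # pseudonymised with the same mapping can still be joined.
        if pin not in self._mapping:
            token = secrets.token_hex(8)
            while token in self._issued:  # guard against a (rare) collision
                token = secrets.token_hex(8)
            self._issued.add(token)
            self._mapping[pin] = token
        return self._mapping[pin]

    def pseudonymise(self, record: dict) -> dict:
        out = {k: v for k, v in record.items() if k not in self.DIRECT_IDENTIFIERS}
        out["person_id"] = self.identifier_for(record["pin"])
        return out


p = Pseudonymiser()
a = p.pseudonymise({"pin": "PIN-0001", "name": "Example", "salary": 1500})
b = p.pseudonymise({"pin": "PIN-0001", "email": "x@example.org", "job": "analyst"})
print(a["person_id"] == b["person_id"], "name" in a)  # True False
```

Because the mapping table is the only link back to PINs, access to it would need much tighter control than access to the pseudonymised data. A keyed hash is a common alternative when no stored mapping is wanted, but its identifiers are derived from the PIN, which is the property the text says SE avoids.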

The data are stored and versioned in Oracle databases, which the statistical domain departments can use through SAS or R.

Figure 3. To-be process of data capturing and making it available to users


The process of managing administrative data has already changed since the Statistics Design Department was created. However, our goal is to redesign the processes so that administrative data can be provided to the statistical departments more efficiently and in a standardised way.

The main difficulty in mapping the current process was that many departments are involved in it. This also makes optimising the process challenging, because every step has to be analysed thoroughly to find ways to simplify the process and shorten the time spent on each step.

To better understand how to make our administrative data management more efficient, it was very helpful to read the document “Good practices in accessing, using and contributing to the management of administrative data” (Eurostat, 2018). Its main advantage is that it compiles the experiences of different NSIs. It is reassuring to know that other statistical offices are on the same path and that we are all moving towards better partnerships and administrative data management processes. The document also indicates which countries we could learn from and ask for guidance.

The to-be processes were mapped in as much detail as possible, which enables us to monitor them and make changes if necessary.

Our next step is to describe each process step and document what is done, by whom and how at every stage. The goal is to produce written instructions that make the workflow smoother and help new team members learn what to do more easily.

5. Creating vision document on how to give feedback to the data owners about data transmission deadlines and agreed data structures

One part of optimising and standardising the management of administrative data and related information is automating the various notifications and feedback.

Currently, we send reminder e-mails manually before data transmission deadlines, and only to those data owners who tend to forget their data deliveries.


At the moment, Statistics Estonia does not have an information system for automated data structure checks or for monitoring the data transmission deadlines of administrative sources. We are in the process of drafting a vision document on how to give feedback to the data owners about data transmission deadlines and agreed data structures.

We have analysed what types of information we need to manage in the information system: the deadlines of data deliveries, related contacts and contract information, and information about data structures, formats and metadata.
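One way to picture this information is as a small data model. The sketch below uses Python dataclasses; every class and field name (`DatasetAgreement`, `contract_id`, `delivery_deadline` and so on) is a hypothetical illustration of the categories listed above, namely deadlines, contacts, contract information and data structure metadata, and not a design taken from the vision document.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Contact:
    name: str
    email: str


@dataclass
class Variable:
    name: str
    data_type: str            # e.g. "string", "decimal", "date"
    required: bool = True


@dataclass
class DatasetAgreement:
    dataset: str
    data_owner: str
    contract_id: str          # reference to the signed data delivery agreement
    delivery_deadline: date   # next agreed transmission deadline
    file_format: str          # e.g. "xml", "csv"
    contacts: list[Contact] = field(default_factory=list)
    variables: list[Variable] = field(default_factory=list)

    def required_variables(self) -> set[str]:
        """Names of the variables that every delivery must contain."""
        return {v.name for v in self.variables if v.required}


agreement = DatasetAgreement(
    dataset="annual_income",
    data_owner="Hypothetical data owner",
    contract_id="C-0001",
    delivery_deadline=date(2019, 6, 30),
    file_format="xml",
    contacts=[Contact("Contact person", "contact@example.org")],
    variables=[Variable("PIN", "string"), Variable("income", "decimal"),
               Variable("comment", "string", required=False)],
)
print(sorted(agreement.required_variables()))  # ['PIN', 'income']
```

Keeping deadlines, contacts and variable lists in one record means that reminders and structure checks could both be driven from the same source of truth.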

We have also analysed the information systems already used in our statistical production process; some of them could be developed further to provide part of the functionality needed for managing this information and sending automatic notifications.

If compliance with the agreed data structures and metadata were checked automatically, we could also generate a quality report to send as feedback to the data owners.

The analysis of our current information systems showed that we would need to develop a new information system to enable automated checks and feedback.

SE has created a vision document for developing a new information system, the Administrative Data Gate, which will help automate administrative data management in the Design, Build and Collect process phases.

The main functionalities of the Administrative Data Gate are:

• Monitoring data deliveries and sending automated feedback and reminders to data holders.

• Reading metadata from SE’s metadata management system and checking delivered data against the agreed structures and content.

• Converting data to the formats or structures needed by statistical domain departments.

• Logging and monitoring every procedure performed on a specific dataset.

• A dashboard that shows users the main operations.
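The first functionality, monitoring deliveries and sending reminders, can be sketched as a daily check that compares each dataset's next deadline with today's date. The reminder lead time (`REMINDER_DAYS`), the status labels, the dataset names and the `delivered` flag are assumptions for illustration; actually sending e-mails and writing entries to the Gate's log are left out.

```python
from datetime import date, timedelta

REMINDER_DAYS = 5  # hypothetical lead time before a deadline


def delivery_status(deadline, delivered, today):
    """Classify a dataset delivery for the monitoring dashboard."""
    if delivered:
        return "received"
    if today > deadline:
        return "overdue"        # candidate for a follow-up to the data holder
    if deadline - today <= timedelta(days=REMINDER_DAYS):
        return "reminder due"   # candidate for an automated reminder
    return "waiting"


def notifications(schedule, today):
    """Return (dataset, status) pairs that need a message to the data holder."""
    result = []
    for dataset, deadline, delivered in schedule:
        status = delivery_status(deadline, delivered, today)
        if status in ("overdue", "reminder due"):
            result.append((dataset, status))
    return result


schedule = [
    ("dataset_a", date(2019, 6, 10), False),
    ("dataset_b", date(2019, 6, 3), False),
    ("dataset_c", date(2019, 6, 1), True),
    ("dataset_d", date(2019, 7, 1), False),
]
print(notifications(schedule, today=date(2019, 6, 7)))
# [('dataset_a', 'reminder due'), ('dataset_b', 'overdue')]
```

Unlike the current manual practice, a check like this treats every data owner the same way, not only those who tend to forget their deliveries.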

The Administrative Data Gate would become the single channel through which all administrative data pass, as shown in Figure 4. Input data can arrive in different formats (csv, txt, xls, ods, xml, json) and through different channels (X-Road push/pull services, e-mail), but all data are routed through the Administrative Data Gate, where automated data checks and corrections are performed.
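The idea of a single channel can be sketched as a dispatcher that reads each incoming file according to its format and returns records in one common shape, so that the same checks can run on every delivery. Only csv, json and xml are handled here, using the Python standard library; txt, xls and ods would need additional readers and are deliberately omitted. The record layout expected in each format and the choice of file extension as the format signal are assumptions.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET


def read_csv(text):
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def read_json(text):
    return json.loads(text)  # assumed to be a list of flat objects


def read_xml(text):
    root = ET.fromstring(text)  # assumed <delivery><record>...</record></delivery>
    return [{child.tag: child.text for child in record} for record in root]


READERS = {"csv": read_csv, "json": read_json, "xml": read_xml}


def ingest(filename, text):
    """Route a delivery through one entry point, choosing a reader by extension."""
    extension = filename.rsplit(".", 1)[-1].lower()
    if extension not in READERS:
        raise ValueError(f"unsupported format: {extension}")
    records = READERS[extension](text)
    # Normalise all values to strings so that downstream checks see one shape.
    return [{k: "" if v is None else str(v) for k, v in r.items()} for r in records]


same = [
    ingest("a.csv", "PIN,income\n1,900\n"),
    ingest("b.json", '[{"PIN": "1", "income": 900}]'),
    ingest("c.xml", "<delivery><record><PIN>1</PIN><income>900</income></record></delivery>"),
]
print(same[0] == same[1] == same[2])  # True
```

Normalising everything to one record shape at the entry point is what would let the structure checks, conversions and quality reports be written once rather than per channel.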


After the data have been checked, a quality feedback report is generated and sent to the data owner. The content of this report has not been decided yet, but it will certainly contain information on compliance with the agreed data structures and data formats.
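Since the report's content is still open, the following is only a sketch of how structure and format compliance might be summarised in plain text for the data owner. It assumes the checks have already produced a list of `(row, variable, problem)` tuples; the function name `feedback_report`, the wording and the layout are hypothetical.

```python
from collections import Counter


def feedback_report(dataset, total_rows, problems):
    """Summarise structure and format compliance for the data owner."""
    failed_rows = {row for row, _, _ in problems}
    lines = [
        f"Quality feedback for dataset: {dataset}",
        f"Rows received: {total_rows}",
        "Rows compliant with the agreed structure and formats: "
        f"{total_rows - len(failed_rows)}",
    ]
    # Count how often each (variable, problem) combination occurs.
    by_variable = Counter((variable, problem) for _, variable, problem in problems)
    for (variable, problem), count in sorted(by_variable.items()):
        lines.append(f"  {variable}: {problem} ({count} row(s))")
    return "\n".join(lines)


report = feedback_report("dataset_a", 100, [(3, "income", "not a valid decimal"),
                                            (7, "income", "not a valid decimal"),
                                            (7, "reg_date", "missing")])
print(report)
```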

Figure 4. Dataflow through Administrative Data Gate

5.1. Summary and encountered difficulties

We have analysed our needs and have an overview of the functionality needed to manage administrative data and related information efficiently and to run automated controls on delivered datasets.

However, it was difficult to decide whether we needed to develop a new information system to provide the required functionality or whether some of our existing applications could be developed further to meet these needs. The analysis showed that we need to develop a new information system.

The challenge now is to find the financial and human resources to start developing the Administrative Data Gate. Statistics Estonia has already applied for financial support from the SF funds, but has not yet received a response to the application, so the development timeline is still unknown.


6. Describing metadata for the data sources whose cooperation agreements are renewed in the metadata system

Statistics Estonia uses about one hundred different administrative data sources in its statistical production process. Describing and harmonising the metadata for administrative data is time-consuming, because several metadata objects in our metadata management system iMeta have to be defined in order to fully document the captured data.

We are in the process of describing all the metadata for the administrative data we receive, but during this grant project we will concentrate on describing and standardising the metadata of those data sources whose cooperation agreements were signed before 2010.

We have made preparations for renewing the data delivery agreements, and some of the metadata have already been described in our metadata management system.

The metadata description process also involves the data owners and analysts from the statistical departments. The steps for describing the metadata of administrative data are as follows:

• analysing already received data and adding variable descriptions, classifications and code lists to our metadata management system;

• describing the rest of the metadata related to the first sub-task according to the Neuchâtel terminology model (conceptual variables, statistical characteristics, statistical unit types);

• cooperating with the leaders of the statistical activities to describe and harmonise metadata efficiently;

• describing metadata in the metadata system for additional data needs and providing input for the renewal of cooperation agreements.

The Neuchâtel terminology model (Neuchâtel Group, 2004) has been used for describing the variables in our metadata management system. In this model, variables are described at three levels: conceptual variable, statistical characteristic (object variable) and contextual variable. A statistical unit type is an entity for which information is sought and for which statistics are ultimately compiled. A statistical characteristic is a characteristic of a statistical unit type. A conceptual variable (concept) provides a general description of the meaning of the statistical characteristic without explicit reference to any particular statistical unit type. A contextual variable describes the variable in the context of a statistical activity. Contextual variables can be defined as register variables or cube variables.
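The relationships between these levels can be sketched as linked records: a statistical characteristic combines a concept with a unit type, and a contextual variable places that characteristic in a statistical activity. The class names follow the terms above, but the fields, the example income variable and the `kind` values are hypothetical illustrations of how such descriptions might be stored, not the iMeta schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConceptualVariable:
    name: str
    definition: str              # meaning without reference to any unit type


@dataclass(frozen=True)
class StatisticalUnitType:
    name: str                    # entity for which statistics are compiled


@dataclass(frozen=True)
class StatisticalCharacteristic:
    concept: ConceptualVariable
    unit_type: StatisticalUnitType   # the concept applied to one unit type


@dataclass(frozen=True)
class ContextualVariable:
    name: str
    characteristic: StatisticalCharacteristic
    statistical_activity: str
    kind: str                    # "register" or "cube"

    def __post_init__(self):
        if self.kind not in ("register", "cube"):
            raise ValueError(f"unknown contextual variable kind: {self.kind}")

    def describe(self):
        c = self.characteristic
        return (f"{self.name} ({self.kind} variable in {self.statistical_activity}): "
                f"{c.concept.name} of {c.unit_type.name}")


income = ConceptualVariable("income", "Money received over a reference period")
person = StatisticalUnitType("person")
person_income = StatisticalCharacteristic(income, person)
declared = ContextualVariable("declared_income", person_income,
                              "income statistics", "register")
print(declared.describe())
# declared_income (register variable in income statistics): income of person
```

Because one conceptual variable can be reused by several characteristics and contextual variables, describing it once and linking to it is what makes harmonisation across sources possible.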


Our goal was to describe and harmonise all the metadata of the administrative data sources whose cooperation agreements were signed before 2010.

We therefore started by describing and harmonising all necessary metadata objects for:

Estonian Tax and Customs Board

National Institute for Health Development

Estonian Land Board

Agricultural Board

Agricultural Research Center

The Estonian Tax and Customs Board is a very important data source for us. It owns several state registers, and SE captures 80 different datasets from it every year. The frequency of data capture varies from once a day to once a year. For this source, we had to describe and harmonise 483 different contextual variables and all the corresponding metadata objects. Many variables had to be clarified with the data owners, because the tax and customs declaration forms change constantly and, for the contextual description of metadata, we had to be sure that we understood each variable thoroughly.

The National Institute for Health Development is the source of death and birth statistics for SE. From this source, we capture 147 different variables. The content of these variables was quite clear to us, and describing them in our metadata management system was not too troublesome. Unfortunately, we found out that the National Institute for Health Development is starting major developments in its information systems in order to merge several smaller registers into one large register. This means that we have to be ready for changes in data content and revise our metadata descriptions once the development has taken place.

The Estonian Land Board has always been a good cooperation partner for Statistics Estonia. It owns the Address Data System, which enables all registers to exchange address data in a harmonised way. For this source, we had to describe 162 variables and the corresponding metadata objects. When we started to prepare the new data delivery contract and review our current data needs, we also discovered that, owing to changes in Estonian legislation, the Estonian Land Board has stopped collecting, as of the start of 2019, some of the variables needed in our statistical production process. This means that our analysts have to change the methodology of their statistical activities.


For agricultural statistics, one of the most important sources is the Agricultural Board. At the moment, we capture data once a year, but once the new data delivery agreement is signed, we would like to start capturing data twice a year. We described 88 variables and the corresponding metadata objects for the Agricultural Board; some of these variables will remain in draft until the negotiations to renew the agreement are finalised. However, we have had several very useful meetings with the source and have also been able to incorporate the available data more efficiently into our statistical production process.

The Agricultural Research Center is also an important source for agricultural statistics. We hope to start receiving twenty datasets and 63 variables from this source. As the negotiations for the new data delivery agreement are still in progress, the metadata description is also still in draft. We are ready to change or supplement our current metadata descriptions when the agreement is finalised.


The main difficulty in performing this task is understanding the conceptual meaning of the data correctly. To standardise and harmonise the metadata of administrative data and document it in our metadata management system iMeta, we needed to involve both the data owners and the data users from our statistical departments.

The metadata descriptions of two sources, the Agricultural Board and the Agricultural Research Center, are still partly in draft. This means that we have made all the necessary preparations for describing them, but the descriptions have not yet been published in our metadata management system. We are waiting for the negotiations to renew the data delivery agreements to be finalised, after which we can also publish the metadata descriptions.

Although the data descriptions are produced and managed centrally at Statistics Estonia, there are other parties to the process whose knowledge has to be taken into account. As a result, the process is time-consuming and meetings have to be held to agree on data definitions.


7. Renewing cooperation agreements made with data owners before the year 2010

During the grant project, we plan to renew the cooperation agreements that are in force and were signed before 2010. This is important because before 2010 Statistics Estonia used a different contract format, which did not specify, for example, the structure of the delivered data. As we are moving towards automated data capture and control systems, it is essential to agree on specific data structures, formats and metadata.

Our analysis of data delivery contracts showed that we need to renew our contracts with five different institutions, almost all of which own several registers from which Statistics Estonia captures different datasets.

We have started preparing new agreements with:

Estonian Tax and Customs Board

National Institute for Health Development

Estonian Land Board

Agricultural Board

Agricultural Research Center

It is important to use the new contract format, in which both the main part of the contract and the annex specifying the detailed data composition are updated. The main part of the new contract format consists of:

1. General information (details of the parties and the purpose of the contract);

2. List of contract documents (annexes to the agreement, if any);

3. Object of the contract (content of the contract, explanation of the concept “data” and the method of transmission);

4. Rights and obligations of the parties (a list of rights and obligations that all parties must follow);

5. Confidentiality (the confidentiality obligation of the parties);

6. Contract performance obligations (data are transmitted free of charge, and each party bears the costs of performing the contract from its own budget);

7. Force majeure (a list of situations which obstruct the continuation or lawful existence of the contract between the parties);


8. Modification, completion and termination of the contract (the rules for modifying, completing and terminating the contract that apply to all parties);

9. Settlement of disputes (how disputes arising from the performance of the contract are resolved);

10. Other terms;

11. Contact information.

The new annex(es) specify the composition of the data at variable level and the contact persons for data transmission.

The renewal process included describing metadata for the captured datasets, because the annexes always define the data composition in detail.
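Because the annex defines the data composition at variable level, the same variable list can in principle drive the technical checks: the description of the data capture process notes that an XSD file is generated in iMeta. The sketch below shows the general idea of deriving an XSD from a variable list with Python's standard library. The variable list, the `delivery`/`record` element names and the type mapping are hypothetical, and iMeta's actual generator is not shown here.

```python
import xml.etree.ElementTree as ET

XS = "http://www.w3.org/2001/XMLSchema"
ET.register_namespace("xs", XS)  # serialise schema elements with the xs: prefix

# Hypothetical mapping from metadata types to XML Schema built-in types.
XSD_TYPES = {"string": "xs:string", "decimal": "xs:decimal", "date": "xs:date"}


def annex_to_xsd(variables):
    """Build an XSD for <delivery><record>...</record></delivery>.

    variables is a list of (name, data_type, required) tuples, in the order
    the annex lists them; xs:sequence makes that order binding.
    """
    schema = ET.Element(f"{{{XS}}}schema")
    delivery = ET.SubElement(schema, f"{{{XS}}}element", name="delivery")
    outer = ET.SubElement(ET.SubElement(delivery, f"{{{XS}}}complexType"),
                          f"{{{XS}}}sequence")
    record = ET.SubElement(outer, f"{{{XS}}}element", name="record",
                           maxOccurs="unbounded")
    inner = ET.SubElement(ET.SubElement(record, f"{{{XS}}}complexType"),
                          f"{{{XS}}}sequence")
    for name, data_type, required in variables:
        ET.SubElement(inner, f"{{{XS}}}element", name=name,
                      type=XSD_TYPES[data_type],
                      minOccurs="1" if required else "0")
    return ET.tostring(schema, encoding="unicode")


xsd = annex_to_xsd([("PIN", "string", True), ("income", "decimal", True),
                    ("reg_date", "date", False)])
print(xsd)
```

Generating the schema from the same variable list that appears in the signed annex would keep the contract and the automated checks from drifting apart.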

The Estonian Tax and Customs Board is a major data source for us. Our data delivery agreement with it has been in force since 2007. Since then, SE’s data needs have grown and the registers of the Tax and Customs Board have changed considerably, so it was absolutely essential to renew the cooperation agreement. We began the preparations by mapping SE’s actual data needs. For every dataset, we met the analysts who need the data and specified the data content. From these meetings, we gathered the questions and information that needed to be negotiated with the data source.

Some of the negotiations with the Tax and Customs Board took place by e-mail and phone. However, it is always more efficient to have the necessary people around one table when agreeing on something.

The data content negotiations required the involvement of subject-matter experts from both sides. As Statistics Estonia uses 80 different datasets from the Tax and Customs Board, we had to arrange several meetings to specify the data content.

There were also separate meetings with the lawyers of both parties. Statistics Estonia has developed a standard data delivery agreement, but the Tax and Customs Board has its own standard agreements for data exchange, so it was necessary to address legal issues and work out an agreement that suits both institutions. The legal negotiations were also successful, and we signed the new data delivery agreement with the Tax and Customs Board in May 2019.

The National Institute for Health Development is an important source for population and social statistics. We have two separate data delivery agreements with this institution, one for each register. Our goal is to have a single agreement with the National Institute for Health Development that covers both the birth and death data. At the mo
