Subtitle “Promoting the usage of administrative data in Statistics
Estonia by describing and harmonising metadata”
Final grant report
Table of Contents
Executive Summary
List of acronyms
Introduction
1. Obtaining knowledge about best practices of administrative data and metadata management
system from another Member State (study visit)
1.1. Summary and difficulties encountered
2. Analysing and compiling data about current agreements, data sources and data structure
descriptions
2.1. Summary and difficulties encountered
3. Analysing the questionnaires and finding variables that could be replaced by administrative
data
3.1. Summary and encountered difficulties
4. Mapping management processes of administrative data and metadata in Statistics Estonia
4.1. Summary and difficulties encountered
5. Creating vision document on how to give feedback to the data owners about data transmission
deadlines and agreed data structures
5.1. Summary and encountered difficulties
6. Describing metadata for the data sources whose cooperation agreements are renewed in the
metadata system
6.1. Summary and difficulties encountered
7. Renewing cooperation agreements made with data owners before the year 2010
7.1. Summary and difficulties encountered
References
Executive Summary
According to Statistics Estonia’s strategy, our goal is to produce high quality statistics with as
low administrative burden and as high efficiency as possible. In order to achieve this, we need
to improve the use of administrative data and describe the related metadata in our metadata
management system.
At the moment, Statistics Estonia uses over 100 different administrative data sources (state
registries) in the statistical production process. Managing, describing and improving the
related information and metadata of those sources is a challenging and ongoing process.
In this project we have described and standardised metadata for the data sources whose
cooperation agreement needed updating. During the process we also had the chance to
develop and strengthen the partnership with the data owners, which is the key element of
using the data of administrative sources.
Our project started with learning from Statistics Austria’s experience, and we were able to
analyse and work through all our information related to administrative data management in
order to start managing it more efficiently.
Efficient data management is only possible if we have optimized management processes.
During the project we were able to map the as-is and to-be processes of administrative data
and metadata management.
The volume of administrative data and metadata is growing fast, so it is now clear that we
need to move towards more automated processes. For that reason we have created the vision
document for developing the new information system Administrative Data Gate, which will
allow us to send automated feedback and reminders to the data owners and also to automate
the data checking processes.
The grant project enabled us to analyse our current questionnaires domain by domain and
make suggestions to use additional administrative data sources to lower the response burden.
This analysis was a new approach for us, because usually the statisticians are responsible for
their own statistical activities. This time we analysed different questionnaires together
centrally and had the opportunity to give the statisticians new ideas about which sources to
use, and so to improve the usage of administrative data in our organisation.
Statistics Estonia is very grateful for being able to realise the activities in this grant project.
It was very helpful to have a temporary employee who worked through a large amount of
information, which allowed us to start moving towards more automated processes of managing
administrative data.
List of acronyms
ATAO – Statistics Design Department
BORA – The Beneficial Owners Register Act
EAS – Enterprise Estonia
ECAA – Estonian Civil Aviation Authority
EMDE – Electronic Maritime Information System
GSBPM – Generic Statistical Business Process Model
MUIS – System of (Estonian) Museums
RIHA – Administration system for the state information system
SE – Statistics Estonia
sDWH – Statistical Datawarehouse
TÖR – Working register
Introduction
The main objective of this grant project is to improve the use of administrative data sources in
Statistics Estonia. According to Statistics Estonia’s strategy our goal is to produce high quality
statistics with as low administrative burden and as high efficiency as possible.
Producing high quality statistics is possible only when we have standardised metadata and
efficient production processes. Harmonised metadata is usable across all statistical domains,
which means if one data source is used in different statistical activities the metadata will be
described only once. The described metadata will be available directly for the users and for the
systems in the live production environment.
Statistics Estonia has set the goal to reduce administrative and response burden for the
respondents. This is possible only if we use more administrative sources and either stop using
some of the questionnaires or prefill some values on the questionnaire to help the respondent
to answer.
Improving the use of administrative data for the statistical production has one key element –
close cooperation and partnership with the data owners. At the moment Statistics Estonia uses
over 100 different administrative sources and our goal is to build closer cooperation with the
data owners in order to ensure efficient negotiations and high-quality data delivery. One
important part of the cooperation is valid and up-to-date data delivery contracts: it is important
that both sides of the contract know their responsibilities and that data owners know why the
data is needed and that it is securely processed and stored in Statistics Estonia.
We have planned several activities during the project that will help to re-use information,
produce statistics more efficiently, and reduce administrative and response burden.
To perform the tasks of the grant project we have conducted weekly project team meetings,
where we discuss and agree on the tasks of the upcoming week. We are using a web application
(JIRA) for assigning and monitoring planned activities. All team members need to report
weekly on the progress of their tasks and on possible difficulties, to ensure that the project
is executed on schedule.
An overview of the progress on the tasks of the grant project follows. The overview is written
task by task and the level of progress is also evaluated. For every task, the difficulties
encountered and an overall summary are described briefly.
1. Obtaining knowledge about best practices of administrative data
and metadata management system from another Member State
(study visit)
To use best practices available in other European statistical offices, we planned to have
a study visit at the beginning of the project. To choose possible destinations, we gathered
information from our colleagues who have attended different Eurostat working groups. We
received information that Austria and Finland both have advanced systems for managing
metadata and administrative data. Both countries also conduct a register-based population
and household census.
Statistics Austria was able to welcome us in October to share their knowledge and experience.
As Austria is considered one of the leaders in the European statistical system concerning the
use of administrative data, we were happy to plan the 1.5-day agenda for the study visit.
The study visit took place from 17 to 19 October and we had a very full agenda for 18 and
19 October. Four people from Statistics Estonia attended the visit:
• two Leading Methodologists from the Statistics Design Department (responsible for
negotiations with data owners, describing metadata of administrative data and preparing
the contracts);
• a Developer from the Data Service Department (responsible for the data warehouse and
developing the new IT system for administrative data management and automated
controls);
• the Head of Data Description from the Statistics Design Department (responsible for the
processes of managing administrative data and describing metadata in Statistics
Estonia).
The agenda of the study visit was full of very useful and interesting topics for us. The overview
of the study visit by agenda topics is given below.
• Coordination and guideline for administrative data
The first topic was an introduction of how Statistics Austria manages their administrative data
and related information. Statistics Austria uses over 500 different data sources and over 50
sources are used for the register based census. They do not have data delivery contracts with all
the sources, because the management of the contracts would be too burdensome and their
national statistical law says that Statistics Austria can have the data for free from the data
owners.
Statistics Austria currently has a separate metadata database for administrative data. It is an
Access database that has been in use since 2008. The aim of this database is to give an overview
of all administrative data available in Statistics Austria, to list the projects that use
administrative data and to offer a search functionality for choosing from the available data.
The database also stores information about external and internal contact persons and
organisation details.
Statistics Austria plans to integrate the current metadata database into their new centralised
data and metadata management system, the Statistical Datawarehouse (sDWH). The metadata for
the administrative data will then be extended, and data structures, attributes, classification
lists, quality indicators, data formats, statistical units, reference dates, key words and the
legal basis will also be available.
• Statistical Datawarehouse (sDWH) – Motivation
Statistics Austria has developed the statistical data warehouse to provide an internal,
house-wide, easily accessible data and metadata platform. The sDWH project was established in
2014 and the technical solution was fixed in 2015. In April 2017 Statistics Austria started the
implementation phase, which proceeds role by role and department by department.
Metadata is described in the sDWH and has to be defined before it is possible to incorporate a
new dataset. This supports the housewide harmonisation of concepts and other metadata.
There are different roles for the sDWH users, which help to manage the workflow of data and
metadata management. For example, the Administrator can define represented variables and data
sets and load the data, but the Quality Manager has to approve or disapprove the represented
variables and data sets.
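As an illustration only, the split of responsibilities between the two roles can be pictured as a small approval workflow. The sketch below is not the sDWH implementation: the class, role and status names are assumptions made for this example.

```python
from dataclasses import dataclass
from enum import Enum


class Role(Enum):
    ADMINISTRATOR = "administrator"
    QUALITY_MANAGER = "quality manager"


class Status(Enum):
    DEFINED = "defined"          # created, not yet reviewed
    APPROVED = "approved"
    DISAPPROVED = "disapproved"


@dataclass
class RepresentedVariable:
    name: str
    status: Status = Status.DEFINED


def define_variable(role: Role, name: str) -> RepresentedVariable:
    """Only the Administrator may define a represented variable."""
    if role is not Role.ADMINISTRATOR:
        raise PermissionError("only the Administrator can define variables")
    return RepresentedVariable(name)


def review_variable(role: Role, variable: RepresentedVariable,
                    approve: bool) -> RepresentedVariable:
    """Only the Quality Manager may approve or disapprove a definition."""
    if role is not Role.QUALITY_MANAGER:
        raise PermissionError("only the Quality Manager can review variables")
    variable.status = Status.APPROVED if approve else Status.DISAPPROVED
    return variable


# Hypothetical variable name, used only to show the two-step flow.
variable = define_variable(Role.ADMINISTRATOR, "age_group")
review_variable(Role.QUALITY_MANAGER, variable, approve=True)
print(variable.status.value)  # prints "approved"
```

Keeping definition and approval in separate functions makes it explicit that a variable cannot be approved by the same role that created it.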
• Statistical Datawarehouse (sDWH) – Application (handling of metadata)
Statistics Austria also gave a live demonstration of their sDWH. For us it was most impressive
to see that data and metadata are stored in one application and that it is also possible to link
different datasets in the system and visualise the results. In the sDWH all links are also
visualised, for example it can be seen which data set is used in which project. The system also
shows possible joining options and the data descriptions at the variable level are only one click
away.
The sDWH makes it possible to mark some variables and data sets as protected; the in-house
data owner then has to grant permission for using the data set. The process of asking for and
granting the permission is also part of the sDWH: all the permissions and explanations are
stored in one system, and the users do not have to send separate e-mails for that.
• Register-based Census - an Overview
The last traditional census in Austria was conducted in 2001, and its cost was 72 million euros.
In 2011 the register-based census cost only 10 million euros.
Statistics Austria has more than 50 data sources for the register-based census. In 2006 they
also ran a register-based test census, in which the methods, data procedures and use of
registers were successfully tested.
• Workflow of a Register-based Census
As Statistics Estonia is also planning to conduct a register-based census in 2021, it was
interesting to hear about the workflow of Statistics Austria’s register-based census team. They
have 13 to 15 persons permanently on the census team, and for the census in 2011 they also had
about 20 temporary team members responsible for different tasks.
Every team member is responsible for capturing some of the data sources and they remind the
data owner one month in advance about the need to deliver the data.
Statistics Austria has a process documentation system for management reporting, the timetable
and the production schedule.
The process documentation is available in the ADAM/EVA database; for example, the
timetable/calendar for the execution of monthly, quarterly and yearly processes is kept there.
• ADAM/EVA database and documentation (handling of metadata)
The data for the census is currently stored in the ADAM/EVA database, but the future plan is to
incorporate the data into the sDWH. The ADAM/EVA database is also used for metadata
documentation. There is a search option for tables, variables, attributes and variable values.
• Other projects based on ADAM/EVA database
The ADAM/EVA database is also used for other projects in Statistics Austria, for example the
labour force survey, national accounts, the rich frame for social statistics, monitoring of
education-related employment, tracking of graduates and register-based labour market careers.
Rich frame is used for calibration/post-stratification, non-response analysis and substituting
survey questions with administrative data.
• Statistical Datawarehouse (sDWH) – Future plans
Statistics Austria shared with us the future plans for the sDWH. They are planning to integrate
all administrative data and metadata in the warehouse. They will then have a fully integrated
and harmonised metadata management system.
Statistics Austria is also planning to create GeoWizard for the automatic creation of working
maps for internal and external use; all the necessary data, metadata and information will be
stored in a standardised way in the sDWH.
For visualisation of statistical information, Statistics Austria is currently evaluating the
Tableau software. Visualisation is important internally for the heads of departments to create
reports about data usage and availability. Externally it is planned to develop dashboards for
disseminating statistics in a more user-friendly format.
• Quality assessment for Register-based statistics / metadata of administrative data
Statistics Austria has developed a three-stage quality evaluation system for administrative
data. The data quality is first evaluated at the raw data phase, when the registers provide the
data. The next phase is combining and linking the data in the central database, after which the
second evaluation takes place. After combinations and imputations the data is available in the
final data pool and the quality of data is evaluated again.
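The three evaluation points can be sketched as a simple pipeline that measures quality after each phase. This is only an illustration under stated assumptions: the quality indicator used here (share of non-missing values), the function names and the toy linking and imputation steps are ours and do not describe Statistics Austria's actual indicators or procedures.

```python
from typing import Callable, Dict, List, Optional

Record = Dict[str, Optional[str]]
Step = Callable[[List[Record]], List[Record]]


def completeness(records: List[Record]) -> float:
    """Share of non-missing values; a stand-in quality indicator (assumption)."""
    values = [value for record in records for value in record.values()]
    if not values:
        return 0.0
    return sum(value is not None for value in values) / len(values)


def evaluate_three_stages(raw: List[Record], link: Step, impute: Step) -> Dict[str, float]:
    """Evaluate quality at the raw, linked and final stages."""
    report = {"raw": completeness(raw)}          # stage 1: data as delivered
    linked = link(raw)
    report["linked"] = completeness(linked)      # stage 2: after combining/linking
    final = impute(linked)
    report["final"] = completeness(final)        # stage 3: final data pool
    return report


def drop_duplicates(records: List[Record]) -> List[Record]:
    """Toy linking step: keep the first record for each id."""
    seen, unique = set(), []
    for record in records:
        if record["id"] not in seen:
            seen.add(record["id"])
            unique.append(record)
    return unique


def fill_missing(records: List[Record]) -> List[Record]:
    """Toy imputation step: replace missing values with a placeholder."""
    return [{k: ("unknown" if v is None else v) for k, v in r.items()} for r in records]


raw_data = [{"id": "1", "sex": None}, {"id": "1", "sex": None}, {"id": "2", "sex": "F"}]
print(evaluate_three_stages(raw_data, drop_duplicates, fill_missing))
```

Reporting the same indicator at every stage makes it possible to see which phase changes the quality of the data.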
• Census - Analysis of Residence
Statistics Austria explained to us how they avoid overcoverage of residents. Under their system,
a person who has only one record in the Central Persons Register has to confirm the
residence by answering the official letter. About 69 thousand letters were sent out last time to
confirm the residence. If the residence is not confirmed by answering the official letter the
person is a candidate for deletion. However, the local authorities have the opportunity to oppose
the deletions by proving that the person is still the resident of their municipality.
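The residence check described above can be written as a small decision rule. The sketch below is a simplified reading of the procedure: the field names, the status labels and the assumption that a successful objection by the municipality keeps the person as a resident are ours, not Statistics Austria's specification.

```python
from dataclasses import dataclass


@dataclass
class Person:
    person_id: str
    register_records: int                  # records found for the person (assumed field)
    confirmed_by_letter: bool = False      # answered the official letter
    municipality_objection: bool = False   # municipality proved residence


def residence_status(person: Person) -> str:
    """Classify a person as 'resident' or 'candidate_for_deletion'."""
    if person.register_records > 1:
        return "resident"                  # no confirmation letter is needed
    if person.confirmed_by_letter:
        return "resident"                  # residence confirmed by the person
    if person.municipality_objection:
        return "resident"                  # deletion opposed by the local authority
    return "candidate_for_deletion"


print(residence_status(Person("A", register_records=1)))
# prints "candidate_for_deletion"
```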
To conduct the census successfully, Statistics Austria carries out an annual quality evaluation
of the residence data, in which all sources and outputs are analysed and evaluated.
• Business Register for Administrative Purposes and Beneficial Owner Register
The last topic on the agenda was the introduction of two Austrian registers.
Every entity taking part in e-government processes needs to be registered in one of the
state registers. The business register combines different registers and is the basis for
statistical registers.
Automatic data transmission frequencies differ: some registers transmit data to the business
register weekly, while others have an online connection, so their data is always up to date.
The Austrian Beneficial Owner Register Act (BORA) obliges legal entities to register their
owners. This should equip financial supervisors with a tool to fight money laundering and
terrorism financing.
Because of the business register (BR) for administrative purposes, Statistics Austria is the
optimal partner to implement that register technically for the Austrian Ministry of Finance.
The beneficial owner (BO) register is a great business case, and the BORA explicitly allows
Statistics Austria to use the data for statistical purposes.
1.1. Summary and difficulties encountered
In conclusion, the study visit was very successful for us: we had the opportunity to learn from
Statistics Austria’s experiences and best practices. Although we are at a different stage of
using and managing administrative data than Statistics Austria, we got new ideas about how to
optimise the processes of documenting metadata of administrative data.
Firstly, we were surprised to hear that Statistics Austria does not have formal written contracts
with all the data owners. As our national statistical law also says that data from the
registries can be obtained for free for the purposes of official statistics, we are considering
how to make the data transmission agreements more flexible. Right now we mostly have written
contracts with the data owners or, if the data transmission is done for piloting the data
usage, we send a data request to get the data. We are currently developing a form of data
transmission agreement that would describe the needed data structures and deadlines, but
would be flexible and less burdensome to change and keep up to date.
Secondly, we really appreciated the workflow management of the sDWH. In our current metadata
management system iMeta the metadata can be described only by the metadata team members,
and to get correct metadata we ask for input from the statistical departments through Excel
forms. However, we are currently piloting a new metadata information system, Colectica, in
which a workflow service is also integrated. In the future we want to implement a system
similar to that of Statistics Austria, in which analysts can also enter metadata, but an
administrator from the metadata team needs to approve the metadata before it is published
for use.
Thirdly, after the visit we are convinced that in the future the metadata and administrative
data should be integrated into one information system in order to use the data more efficiently
in the statistical production process. The Data Service Department has started piloting the
data virtualisation tool Denodo, in which data catalogues can be created that integrate data
and metadata into one system. Implementing this application would be most useful for the
statistical departments, because they would no longer have to link data and metadata themselves.
The main difficulty in performing this task was finding a time for the study visit that suited
both Statistics Estonia and Statistics Austria. It was in our interest to have the study visit
at the beginning of the project to be able to use the gained knowledge in our further actions.
When we planned and attended the study visit in October, we were not sure whether Eurostat
would approve the grant project application. So there was a risk of not being refunded for the
study visit.
2. Analysing and compiling data about current agreements, data
sources and data structure descriptions
In order to be able to start with the tasks of renewing cooperation agreements made before 2010
and analysing questionnaires to find variables that can be substituted with administrative data,
we started the process of analysing and compiling information about current agreements, data
sources etc.
As Statistics Estonia is currently struggling to manage the information related to administrative
data, we started the process of systemising and visualising the information we needed to
manage. It was the first task of our temporary staff.
An introductory task for the new employee was to create an overview Excel table of the data
that Statistics Estonia captures from administrative sources. The information in the table is
presented by data sets of different data sources. Each data set contains information about the
data structure, the transmission channel, the format, and the deadline for the data to be
transmitted. In addition, a brief description of the contract or data request has been provided
and also the purpose of using the data in Statistics Estonia. This task helped our new employee
to understand and see what kind of data Statistics Estonia receives from different data sources.
The basis for the overview table had already been created and consisted of the list of all the
registries from which Statistics Estonia gets data. The first task was to add the information
about the data structure, the transmission channel, the format, the deadlines for the data to be
transmitted, a brief description of the contract or data request and the purpose of using the data.
All the necessary information was collected by searching through different documents and
information systems. The information was stored in the document management system, the
metadata management system, shared computer folders, the Outlook mailbox and JIRA tasks. The
information had not been systematically stored or managed, which made it difficult for the
new employee to find and compile all the necessary information. The stored contracts and data
requests had not always been marked correctly as valid or invalid, so the hardest part of the
task was to establish which of the contracts and data requests are still valid. We have new
annexes for every dataset we capture from the data owners, and a new annex very often
invalidates the former annex, but not always. So it was challenging to go through all the
annexes and find the currently valid ones.
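Because a new annex does not always invalidate the previous one, the valid annexes cannot be found simply by taking the latest annex per data set. One possible way to manage this, sketched below, is to record explicitly which annex each new annex supersedes. The record layout, field names and identifiers are hypothetical and were not part of our systems at the time of the project.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass
class Annex:
    annex_id: str
    dataset: str
    signed: date
    supersedes: Optional[str] = None  # id of the annex this one invalidates, if any


def valid_annexes(annexes: List[Annex]) -> List[Annex]:
    """Return the annexes that no other annex supersedes, oldest first."""
    superseded = {a.supersedes for a in annexes if a.supersedes is not None}
    still_valid = [a for a in annexes if a.annex_id not in superseded]
    return sorted(still_valid, key=lambda a: a.signed)


# Hypothetical annexes: A2 replaces A1, while A3 adds a data set without replacing anything.
annexes = [
    Annex("A1", "dataset_x", date(2008, 5, 1)),
    Annex("A2", "dataset_x", date(2012, 3, 15), supersedes="A1"),
    Annex("A3", "dataset_y", date(2015, 9, 30)),
]
print([a.annex_id for a in valid_annexes(annexes)])  # prints ['A2', 'A3']
```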
In Statistics Estonia the web platform called Confluence is used to manage internal information
and to make it accessible to other colleagues. Every team has its own space or page in
Confluence, and different overviews and guidelines can be stored and shared that way. We
decided that the overview table of different data sources also had to be visualised better, and
that was the next task for the new employee.
A summary table of contracts and requests for administrative metadata was compiled in
Confluence. The overview is on the Metadata team page, where a sub-page for administrative
data was created. The table contains a list of institutions and their registries with which
Statistics Estonia has a contract or from which data is obtained through data requests. In case
of a contract, the dates of signing and completion of the contract are included. In addition,
each data source has information about the contact persons of the institution to whom it is
possible to turn with data transfer issues. The compilation of the summary table gave an
overview of which existing contracts were signed before 2010 and should be updated.
The previous task with the Excel table helped to get started with this task. The list of
institutions and their registries was taken from the Excel table and added to the Confluence
table. Our employee started collecting information about contracts and data requests from local
disks and the document management system called Livelink. As in the previous task, the most
important part of this task was to establish which contracts and data requests are still valid,
and also which ones are the latest. Statistics Estonia keeps all the documents about each data
source, even the ones that are no longer valid. The fact that the contracts and data requests
were stored in different places and were not in order made this task time-consuming. All the
information about the contracts and data requests had to be taken from the documents themselves,
so the employee had to read through every contract or data request file she found in order to
find the right information for the table.
After compiling the overview table in Confluence, we decided that we needed sub-pages for
every data source. The main reason was that Statistics Estonia captures many different
data sets with different deadlines from one data source or registry. There can also be different
contact persons for different data sets, and there are also different users in Statistics Estonia.
So our new employee linked new sub-pages to the Confluence overview table, and these sub-pages
give the users more detailed information about the data source. Each sub-page has the
description of the captured data set, deadlines, contact persons’ information, user information
and a link to the metadata management system, where the metadata of the data source is stored.
In the future we also plan to add links to the data warehouse tables where the administrative
data is stored and accessible to the analysts of Statistics Estonia. This would give any
colleague in Statistics Estonia full information about each dataset that is available for use.
2.1. Summary and difficulties encountered
Performing this task was crucial for having a better overview of the information related to
administrative data that Statistics Estonia needs to manage. It also gave our temporary employee
the knowledge about the data available for use that she needed to perform the analysis of
questionnaires.
The main difficulties were already described above, but it is important to highlight the large
amount of information that our temporary employee had to work through and systemise. It was
quite time-consuming, because different documents have been stored in different places for
historical reasons, and now all this information had to be compiled to visualise the existing
situation.
Completing this task is a big step ahead for Statistics Estonia, because now we can understand
our needs for an information management system for administrative data.
3. Analysing the questionnaires and finding variables that could
be replaced by administrative data
Statistics Estonia has 127 different questionnaires that the respondents have to fill out in order
to produce statistics. Our aim is to reduce the administrative and response burden by improving
the use of administrative sources. Although Statistics Estonia already uses about a hundred
different data sources, we were still convinced that there are variables on the questionnaires
that can be replaced by administrative data. We already have about 35 questionnaires where we
prefill some variables for the respondents in order to make answering more convenient and
less time-consuming. When the grant application was written we chose some of the domains to
be analysed during the grant project, and our temporary employee started the process as soon as
she had an overview of our questionnaires and the available administrative data.
Between October 2018 and March 2019 we found new data sources for our agricultural statistics
domain, which is a really important domain in Estonia, so we decided to include the domain in
our grant project and analyse it more thoroughly.
The two domains that we had finished analysing by the submission of the intermediate report
are culture and agriculture.
The first step of the analysis was to get an overview of the questionnaires and collected
variables of the culture and agriculture domains of statistical activities, and also to get an
overview of the administrative data in use.
The second step was to compare the variables of the questionnaires with the administrative data
already in use and to find possible new sources to replace questionnaire variables. To store the
new information and to give a better overview of which variables collected by questionnaires
can be replaced with administrative data, an Excel table was created. The Excel table contains
the questionnaire code, the specific number of the statistical activity, the name of the
statistical activity and then certain questions in the questionnaire with suggestions for
replacing them with administrative data.
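The structure of this comparison table can be sketched as follows. The sketch is only an illustration: in practice the comparison relied on expert judgement and meetings with statisticians, whereas the example matches variables by normalised name only. The function and field names and the questionnaire and activity codes are hypothetical.

```python
from typing import Dict, List, NamedTuple, Set


class Suggestion(NamedTuple):
    questionnaire_code: str
    activity_number: str
    activity_name: str
    question: str
    suggested_source: str


def normalise(name: str) -> str:
    """Lower-case a variable name and collapse whitespace before comparing."""
    return " ".join(name.lower().split())


def suggest_replacements(questionnaire_code: str, activity_number: str, activity_name: str,
                         questions: List[str],
                         sources: Dict[str, Set[str]]) -> List[Suggestion]:
    """List questions whose variable also exists in an administrative source."""
    rows = []
    for question in questions:
        for source, variables in sorted(sources.items()):
            if normalise(question) in {normalise(v) for v in variables}:
                rows.append(Suggestion(questionnaire_code, activity_number,
                                       activity_name, question, source))
    return rows


# Hypothetical codes; the sources and variables are simplified examples.
rows = suggest_replacements(
    "Q-001", "A-001", "Museum",
    ["Number of employees", "Number of visitors"],
    {"TÖR": {"number of employees"}, "MUIS": {"number of museum objects"}},
)
for row in rows:
    print(row.question, "->", row.suggested_source)
```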
In Estonia we have a state-level administration system for the state information system, called
RIHA. Every state information system needs to be registered in RIHA. RIHA is therefore the
catalogue of the state’s information system: it stores information about which data are
collected and processed and in which information systems, and also about which services,
including X-Road services, are provided and who is using them.
X-Road is the backbone of e-Estonia: it is the data exchange layer that allows various public
and private sector e-service information systems to link up and function in harmony. X-Road
has developed into a tool that can write to multiple information systems, transmit large datasets
and perform searches across several information systems simultaneously. Today, X-Road is
implemented in Finland, Kyrgyzstan, Namibia, Faroe Islands, Iceland, Ukraine and other
countries. (e-Estonia, 2019)
The next logical step in finding new data sources was to search RIHA. If the information system
owner has registered and entered all the necessary information in RIHA, it is a very good source
of information for Statistics Estonia. Unfortunately, at the moment quite a large part of the
information in RIHA is outdated, because it needs to be updated manually by the data owners.
There are development plans that will hopefully resolve this problem, so that keeping the
information in RIHA up to date can be automated in the future.
Additionally, we searched the Internet to find data that is already public and could be
collected by web scraping or other methods.
The third step was proposing to replace variables collected by questionnaires with
administrative data. This step included face-to-face meetings with colleagues who work in the
fields of culture and agriculture in Statistics Estonia.
The last step was to plan future activities based on the meetings held with the analysts of cultural and agricultural statistics. In some cases we also managed to hold negotiations and meetings with the data owners to agree on new data deliveries.
In the domain of culture we have 6 different questionnaires, which are divided into the following statistical activities: Movie, Museum, Music, Radio and Television. We proposed substituting some variables with new data sources in five different questionnaires. Our proposals and the results are compiled in the table below.
Suggestions and outcomes
Suggestion 1. Data about all Estonian movies (movie type, name, duration) from the Estonian Film Database.
Outcome: This suggestion was accepted and the next step is to negotiate with the manager of the Estonian Film Database.
Suggestion 2. The number of museals (museum objects) in each Estonian museum from the Information System of (Estonian) Museums (MUIS).
Suggestion 3. The number of employees in Estonian museums from the Working register (TÖR).
Outcome (suggestions 2 and 3): These suggestions were accepted, but there is a plan to rearrange some parts of the museum questionnaires, so there is currently no full overview of what kind of data will be needed after the questionnaires are redesigned.
Suggestion 4. Music event names, types, number of concerts, number of tickets sold, ticket sales revenue and number of visitors from sites that officially sell tickets online in Estonia (for example Piletimaailm and Piletilevi).
Outcome: This suggestion was partly accepted, because there are multiple sites that sell tickets online. In addition to these online sellers, there are non-official sellers, and concert tickets can also be bought on site. So there is no accurate overview of how many people visited a concert and how much ticket sales revenue it generated. However, we have signed a contract with one of the sellers, Piletimaailm, and will receive the first dataset soon. Our analysts can then pilot the usability of the data. Information about music event names, types and number of concerts can be found on the site http://kultuur.info.
Suggestion 5. The number of employees with their job titles in radio broadcasting stations from the Working register (TÖR).
Outcome: This suggestion was accepted and, as we are already capturing data from the Working register, the analysts just have to take the data into use.
Suggestion 6. The number of employees with their job titles in television broadcasting stations from the Working register (TÖR).
Outcome: This suggestion was accepted and, as we are already capturing data from the Working register, the analysts just have to take the data into use.
In the domain of agricultural statistics we have 14 different questionnaires, which are divided into the following statistical activities: Sown area of field crops, Purchase of livestock and poultry, Livestock farming and meat production, Quarterly statistics of livestock farming, Purchase and use of milk, Economic accounts for agriculture, Farm Structure Survey, Agricultural products, Yields, Crop farming, Cereals, Dairy products, Organic farming, Supply balance sheets of agricultural products and Agricultural products. Agricultural statistics is one of the most important statistical domains in Estonia and in Europe, but collecting data by questionnaires has always been burdensome to respondents in that field. That is why we decided to include agriculture as one of the domains in our grant project. We started analysing the domain in the fall, and our initial analysis showed that there are still some data sources that Statistics Estonia is not capturing and using for agricultural statistics.
The Veterinary and Food Board was the first data owner we approached. As a first step we asked them to send us some data sets for piloting the data usage. The data sets were about slaughtered animals, production of honey and the number of pigs slaughtered at home. Our analysts piloted the usability of the data and we compiled our data needs in order to start negotiations with the Veterinary and Food Board.
Statistics Estonia’s data need was broad: we wanted to capture several data sets with different data delivery deadlines, which involved different analysts on our side and different departments on the Veterinary and Food Board side. To keep the discussions effective, we held several meetings to agree on the composition of the different data sets and on the data delivery deadlines.
We managed to agree on all the data sets, and we now receive monthly and yearly data sets about slaughtered animals. The monthly data set was immediately used for prefilling the questionnaires. We also now receive yearly data sets about the production of honey and the number of pigs slaughtered at home.
We also had meetings with two other data owners, the Estonian Land Board and the Agricultural Board. Both sources are already in use in Statistics Estonia, but our data needs have widened and the composition of data in those registries has changed, so we need to work on new agreements and on getting access to the available data.
Our proposals for agricultural statistics and the outcomes are compiled in the table below.
Suggestions and outcomes
Suggestion 1. The number of slaughtered animals and the weight of edible/inedible meat from the Veterinary and Food Board.
Outcome: This suggestion was accepted and the questionnaires are prefilled with data from the monthly data set.
Suggestion 2. Information about honey production in Estonia from the Veterinary and Food Board.
Outcome: This suggestion was accepted and we have received the yearly data set for 2018, which was used for pre-filling the questionnaire. The quality of the data is very good, and next year the data will no longer be requested with the questionnaire: the statistics of honey production will be based on administrative data only.
Suggestion 3. The number of pigs slaughtered at home from the Veterinary and Food Board.
Outcome: This suggestion was accepted and we have already received the yearly data set for 2018, which was used as an additional data source for validating questionnaire data. In the future the data will be used to substitute the collected variables.
Suggestion 4. Number of people employed in the agriculture field with their job titles from the Working register (TÖR).
Outcome: This suggestion was accepted, but needs further methodological analysis. The data from the Working register are captured monthly, so if the analysis shows that the data are compatible, they can be used for pre-filling the questionnaires.
Suggestion 5. The prices of land from the Estonian Land Board, according to the new methodology.
Outcome: Negotiations with the Estonian Land Board to receive the land price data are still ongoing. They have promised to carry out a spatial analysis that takes into account the land use data from the Estonian Agricultural Registers and Information Board. We are now waiting for the new spatial analysis from the Estonian Land Board to see whether it is sufficient for our data needs.
Suggestion 6. Organic farming data from the Agricultural Board.
Outcome: The negotiations are still ongoing. The Agricultural Board is a very important data source for organic farming statistics. The information system of the Agricultural Board is in development and we have had several meetings to explain Statistics Estonia’s expanding data needs. We need more detailed data about organic farming and we are negotiating to have our data needs considered in the new information system.
Suggestion 7. The number of fur animals, number of animals slaughtered for fur, number of skins sold etc. from the Veterinary and Food Board.
Outcome: We recently learned that the Estonian Veterinary and Food Board will start collecting information about fur animals. The negotiations are now at the stage of getting to know the data composition and the possibilities for getting access to the data.
Obligations:
REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE
COUNCIL (number of bovine animals, pigs, sheep, goats and poultry slaughtered in
slaughterhouses)
REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE
COUNCIL (carcass weight of bovine animals, pigs, sheep, goats and poultry slaughtered in
slaughterhouses)
REGULATION (EC) No 138/2004 OF THE EUROPEAN PARLIAMENT AND OF THE
COUNCIL (Production account: Other animal products: others)
REGULATION (EC) No 1165/2008 OF THE EUROPEAN PARLIAMENT AND OF THE
COUNCIL (slaughtering carried out other than in slaughterhouses: pigs)
ESS Agreement on statistics of agricultural land prices and rents
COUNCIL REGULATION (EC) No 834/2007 of 28 June 2007 on organic production and
labelling of organic products and repealing Regulation (EEC) No 2092/91 and Commission
Regulation (EC) No 889/2008 of 5 September 2008 laying down detailed rules for the
implementation of Council Regulation (EC) No 834/2007 on organic production and
labelling of organic products with regard to organic production, labelling and control
REGULATION (EC) No 138/2004 OF THE EUROPEAN PARLIAMENT AND OF THE
COUNCIL (Production account: Other animal products: others)
In the domain of accommodation statistics we have 2 different questionnaires, which are divided into the following statistical activities: Tourism and Accommodation activities. We proposed substituting some variables with new data sources in only one questionnaire, because our Tourism questionnaire consists only of personal questions that cannot be replaced by administrative data. Our proposals and the results are compiled in the table below.
Suggestions and outcomes
Suggestion 1. The number of beds in accommodation facilities from Enterprise Estonia (EAS).
Outcome: The next step for us was to check the definition of “the number of beds” that is used in the EAS database: is it the total number of beds, or the number of beds that had been used? Another important step for us was to find out how EAS manages its database. The main question is: do enterprises themselves add information to the database voluntarily?
Suggestion 2. Wheelchair access in accommodation facilities from Enterprise Estonia (EAS).
Obligations:
REGULATION (EU) No 692/2011 OF THE EUROPEAN PARLIAMENT AND OF
THE COUNCIL of 6 July 2011 concerning European statistics on tourism and repealing
Council Directive 95/57/EC
Commission Implementing Regulation (EU) No 1051/2011 of 20 October 2011
implementing Regulation (EU) No 692/2011 of the European Parliament and of the
Council concerning European statistics on tourism, as regards the structure of the quality
reports and the transmission of the data
In the domain of energy statistics we have 4 different questionnaires, which are divided into the following statistical activities: Electric power stations; Energy; Energy production, sales and fuel consumption; Consumption of fuel and energy. We proposed substituting some variables with new data sources in only one questionnaire, which is “Energy”. Our proposals and the results are compiled in the table below.
Suggestions and outcomes
Suggestion 1. Data on electricity produced, purchased and sold in Estonia from Elering.
Outcome: Statistics Estonia is already receiving some data from Elering. Our next step is to check whether Elering can give us the necessary data monthly.
Suggestion 2. Data on the fuel used for freight transport from the Estonian Road Administration.
Outcome: SE is already using some of the data from the Estonian Road Administration. The next step is to check whether we could also use the data from the yearly car reviews. That would enable us to find out the fuel usage of freight transport.
Obligation:
Regulation (EC) No 1099/2008 of the European Parliament and of the Council of 22
October 2008 on energy statistics
In the domain of transportation statistics we have 23 different questionnaires, which are divided
into the following statistical activities: Gas pipelines, Freight transport through ports, Freight
transport on the road, Ships in the harbor, Ship traffic, Ship-based economic and social
indicators, Ship registers, Marine accidents, Shipping-unloading, Air traffic, Flight accidents,
Traffic Register, Road transport, Sea transportation, International travel through ports, Railway
and rolling stock, Rail transport, Inland waterway transport, Vehicle registration, Tram-troll,
Tram and trolley transport, Aircraft Register, Air transport. We proposed substituting
some variables with new data sources in 3 questionnaires. Our proposals and the results are
compiled in the table below.
Suggestions and outcomes
Suggestion 1. Number of air passengers, goods and mail transported by air, from the Tallinn Airport website "Air Traffic Review".
Outcome: Our next step is to check whether Tallinn Airport is willing to give us microdata about the passengers, goods and mail.
Suggestion 2. The number of civil aircraft from the Estonian Civil Aviation Authority’s (ECAA) website.
Outcome: Our next step is to find out how and by whom the website is updated, and how to ensure that the website has relevant data.
Suggestion 3. Data about trucks (total weight, number of axles, type of bodywork, type of engine) from the Estonian Road Administration.
Outcome: SE is already using some of the data from the Estonian Road Administration. The next step is to check whether we could also use the data from the yearly car reviews.
Obligations:
Regulation (EU) No 70/2012 of the European Parliament and of the Council of 18
January 2012 on statistical returns in respect of the carriage of goods by road
Commission Regulation (EU) No 202/2010 of 10 March 2010 amending Regulation
(EC) No 6/2003 concerning the dissemination of statistics on the carriage of goods by
road
Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council
Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No
91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with
respect to the establishment of NST 2007 as the unique classification for transported
goods in certain transport modes
Commission Regulation (EC) No 833/2007 of 16 July 2007 ending the transitional
period provided for in Council Regulation (EC) No 1172/98 on statistical returns in
respect of the carriage of goods by road
Commission Regulation (EC) No 642/2004 of 6 April 2004 on precision requirements
for data collected in accordance with Council Regulation (EC) No 1172/98 on statistical
returns in respect of the carriage of goods by road
Commission Regulation (EC) No 6/2003 of 30 December 2002 concerning the
dissemination of statistics on the carriage of goods by road
Commission Regulation (EC) No 2163/2001 of 7 November 2001 concerning the
technical arrangements for data transmission for statistics on the carriage of goods by
road
Commission Regulation (EU) No 520/2010 of 16 June 2010 amending Regulation (EC)
No 831/2002 concerning access to confidential data for scientific purposes as regards
the available surveys and statistical data sources
Directive 2009/42/EC of the European Parliament and of the Council of 6 May 2009 on
statistical returns in respect of carriage of goods and passengers by sea (Recast)
Commission Regulation (EC) No 1304/2007 of 7 November 2007 amending Council
Directive 95/64/EC, Council Regulation (EC) No 1172/98, Regulations (EC) No
91/2003 and (EC) No 1365/2006 of the European Parliament and of the Council with
respect to the establishment of NST 2007 as the unique classification for transported
goods in certain transport modes
2010/216/: Commission Decision of 14 April 2010 amending Directive 2009/42/EC of
the European Parliament and of the Council on statistical returns in respect of carriage
of goods and passengers by sea
Commission delegated decision of 3 February 2012 amending Directive 2009/42/EC of
the European Parliament and of the Council on statistical returns in respect of carriage
of goods and passengers by sea
Regulation (EC) No 437/2003 of the European Parliament and of the Council of 27
February 2003 on statistical returns in respect of the carriage of passengers, freight and
mail by air
Commission Regulation (EC) No 158/2007 of 16 February 2007 amending Commission
Regulation (EC) No 1358/2003 as regards the list of Community airports
UNECE, ITF and Eurostat Common Questionnaire for Transport Statistics Gentlemen's
Agreement
Commission Regulation (EC) No 546/2005 of 8 April 2005 adapting Regulation (EC)
No 437/2003 of the European Parliament and of the Council as regards the allocation of
reporting-country codes and amending Commission Regulation (EC) No 1358/2003 as
regards the updating of the list of Community airports
Commission Regulation (EC) No 1358/2003 of 31 July 2003 implementing Regulation
(EC) No 437/2003 of the European Parliament and of the Council on statistical returns
in respect of the carriage of passengers, freight and mail by air and amending Annexes
I and II thereto
In the domain of IT, research and development statistics we have 5 different questionnaires, which are divided into the following statistical activities: IT in the company, IT in the household, Business Innovation Survey, Research and development, Research and Development (in the company). We proposed substituting some variables with new data sources in 2 questionnaires. Our proposals and the results are compiled in the table below.
Suggestions and outcomes
Suggestion 1. The number of employees in the research and development field with their scientific field, age and gender from the Working Register (TÖR).
Outcome: This suggestion was partly accepted. The information that TÖR holds about employees in the research and development field does not match the definitions used in the specific questionnaires. However, TÖR can be used for checking the data collected by questionnaire.
Suggestion 2. The number of Information and Communication Technology specialists in a company from the Working Register (TÖR).
Outcome: This suggestion was partly accepted. Initially, TÖR could be used for checking the data collected by questionnaire, and if the quality of TÖR improves, we might be able to use it fully.
Obligations:
Regulation (EC) No 808/2004 of the European Parliament and of the Council of 21
April 2004 concerning Community statistics on the information society
Commission Regulation (EC) No 753/2004 of 22 April 2004 implementing Decision
No 1608/2003/EC of the European Parliament and of the Council as regards statistics
on science and technology
Commission Implementing Regulation (EU) No 995/2012 of 26 October 2012 laying
down detailed rules for the implementation of Decision No 1608/2003/EC of the
European Parliament and of the Council concerning the production and development of
Community statistics on science and technology
3.1. Summary and encountered difficulties
Completing this task was really challenging for us, because our temporary employee had to work through a lot of information. However, we managed to analyse the questionnaires and available data sources of culture, agriculture, accommodation, transportation, and IT, research and development statistics, and we now have an overview of the step-by-step process needed to find new sources or new use cases for the administrative data already in use. Some of our proposals were easy to apply, but others need further analysis by the statistical domain experts.
In the field of culture we had six proposals. Proposals 2 and 3 are waiting for the redesign of the Museum questionnaire, and the redesign process will not be finished before 2021. Regarding proposal 1, to use data from the Estonian Film Database, we have already started the negotiation process and drawn up a draft cooperation agreement. Hopefully it will be signed this year and we can start using the data next year.
Proposal 4 is already partly in production. We are currently receiving data from Piletimaailm, but this company is not the only seller of tickets for cultural events in Estonia. So, for more complete data, we have started negotiations with the other company, Piletilevi. However, negotiations with private sector companies are time-consuming and we are not sure when we will be able to receive data from Piletilevi.
Proposals 5 and 6 are about using the Working register (TÖR) data. As the Working register is quite a new register in Estonia, the data are still rather incomplete as regards job titles. However, we expect the completeness to improve by the end of this year, and it will then be possible to use the data across all statistical domains.
In the field of agriculture we had seven proposals. Proposals 1, 2 and 3 are already in production. Proposal 4 was also about using the Working register, and it has to wait for better data completeness and for analysis from statistical domain experts.
For proposal 5, we have already received the first dataset from the Estonian Land Board according to the new methodology, but the usability has to be analysed further and we may still need to process the data more before it can be used directly in our statistical production process.
Proposal 6, to receive further information on organic farming, is still at the draft agreement stage. We have compiled our data needs and explained them to the Agricultural Board, but as their information system is still in development, we have not yet been able to receive the data or sign the new agreement. Hopefully we will be able to sign the agreement and get the first datasets at the beginning of 2020.
Proposal 7 is not in production yet, because we have not received confirmation from the Veterinary and Food Board that they have data about fur animals. Our next step is to arrange a meeting with the data owner and clarify our data needs.
In the field of accommodation statistics we had two proposals to start using data from Enterprise Estonia. Our next step is to find out how reliable the information in this database is. According to our current information, enterprises enter the information there themselves, voluntarily, which means that the data completeness may not be very good.
In the field of energy statistics we also had two proposals. Proposal number 1 is about using monthly data from Elering. Currently we are receiving data from Elering once a year, and since Elering is a private sector company, the negotiations for more frequent data capturing will take time. We have planned a meeting with them to discuss whether it would be possible to start capturing monthly data in an automated way, for example using X-Road.
Proposal 2 is about using more data from the Estonian Road Administration. We are negotiating to renew our data delivery agreement and to automate data capturing from the Estonian Road Administration. However, the negotiations are taking some time, because the information systems of the Estonian Road Administration are under development. We are finalizing the draft agreement with our data needs and we hope to renew the agreement during next year.
In the field of transport statistics we had three proposals. Proposal number 1 involves getting microdata from Tallinn Airport. Unfortunately, their first answer was negative, because they consider giving microdata to third parties a security risk. At the moment it is still unclear whether we will be able to justify our data needs legally and prove that our data protection rules ensure that it is safe to send data to Statistics Estonia.
Proposal number 2 was about using data from the Estonian Civil Aviation Authority’s website. Our next step is to find out how the updating of the website is organised. For that we have to contact the authority responsible for the website. Hopefully we will get some answers by the end of this year and can then decide whether the proposal can be realised in the production process.
Proposal number 3 was also about using additional data from Estonian Road Administration
and that will have to wait until the negotiations and renewal of the data delivery agreement have
been finished.
In the field of IT, research and development we had two proposals, both about using data from the Working register. We will wait until the end of this year to analyse the completeness and quality of the register data, and then decide how different statistical domains can use the data in their statistical activities.
The main difficulty in performing this task was going through a huge amount of information while trying to find new solutions and sources for the questionnaire-based statistics. Statistics Estonia is aiming to use more administrative data, and analysing questionnaires domain by domain is an innovative approach for us that had not been taken before because of a lack of human resources.
4. Mapping management processes of administrative data and
metadata in Statistics Estonia
Statistics Estonia’s goal is to produce high quality statistics as efficiently as possible. Efficient
production is possible if we improve and widen our administrative data use. Wider use of
administrative data also reduces administrative and response burden. Statistics Estonia is
already using over one hundred administrative sources. However, it has become challenging to
manage all the information related to administrative data sources, for example information
about cooperation agreements, deadlines, process phases etc.
During the project we have started to analyse and map the processes of managing administrative
data and metadata in Statistics Estonia. The first task was to map the “as is” process. Below is
the result of the mapping of “as is” processes.
-
Figure 1. As-is process of managing administrative data and metadata in Statistics Estonia
-
This process map covers the use and management of administrative data, from the first phase, where the data need is identified, to the actual use of the data in statistical production. The process map involves five different departments of Statistics Estonia, and the process goes through the GSBPM phases Specify Needs, Design, Build, Collect and Process, as well as Metadata Management/Quality Management.
The Statistics Design Department (ATAO) has the central role in this process map. The Statistics Design Department was created in 2017 and has since then had the central role in managing administrative data and metadata. Metadata management had been centralised in the Methodology Department of Statistics Estonia since 2004, and managing and capturing administrative data was formerly the responsibility of the Data Warehouse Department. But as Statistics Estonia has started using administrative data more and aims to create and develop closer partnerships with the data owners, it was decided to centralise the management of metadata and administrative data in the Metadata team of the Statistics Design Department.
The process map above describes the processes after the creation of the Statistics Design Department. We are working on optimizing the processes of managing administrative data, which means that we want to provide the data for statistical production more efficiently and in a more standardized way.
Below is the result of mapping the “to be” processes. For better understanding, we split the administrative data management process into parts. Our aim is to simplify the usage and analysis of administrative data for the statistical departments and also to shorten the time needed to get access to new data. Figure 2 shows the process of managing a new or changed data need. The Metadata team has the central role in this process, and the process goes through the Specify Needs (1) and Design (2) phases of GSBPM. After getting input from Analysts, the Design phase is carried out by the Methodologists in the Metadata team. The Design phase for administrative data includes defining the variables that need to be captured from the data source, compiling information for the data request or contract, and preparing the data requests and contracts. Most of the communication and negotiation with data owners takes place in this phase. The administrative data manager’s role in this phase is similar to that of an intermediary or a “translator”: it is important to define the data needs as clearly as possible.
Administrative data management in the Design phase includes describing metadata for
administrative data centrally, in cooperation with the owners of registers and statistical domain
departments.
The wide use of administrative data in SE has produced a lot of information related to data
sources. For example, information about cooperation agreements, data requests, data delivery
deadlines, data structures, formats, additional information about data, communication with data
owners, process phases, etc.
The deadlines for data transmission in SE are currently managed and visualised in the web application JIRA. JIRA makes it possible to monitor the process of data deliveries, data loading, processing, etc. There are different tasks for every data delivery, and every task and subtask can be assigned to a different person. Whenever problems or obstacles arise in some process phase, the questions and answers are entered in JIRA as comments. This gives an overview of the workflow related to the specific dataset.
-
Figure 2. To-be process of agreements with data owners and managing administrative data and metadata
-
Figure 3 shows the data capturing process, which ends with making the data available to analysts. This process goes through the Build (3) and Collect (4) phases of GSBPM.
The Build and Collect phases for administrative data are the responsibility of the Data Service Department. In these phases, the role of administrative data managers is to pre-process the data and make them available to the NSI’s in-house applications. These procedures ensure that there are no duplicate data and that the data are ready for statistical analysis.
Administrative data are captured through different channels:
1) encrypted .csv or .xls(x) files by e-mail, FTP or cloud services;
2) X-Road services that are divided into:
• pull services – the data owner has developed an X-Road service the content of which is
suitable for SE. The data are pulled to SE through the X-Road service.
• push services to xGate – the data are pushed to SE through our xGate service. This is
the preferred channel for data capture, because SE validates the received data against XSD, and
the data delivery process is controlled by SE.
Once administrative data have been captured through the different channels, the loading processes begin. The first step is loading the data to the Initial Observation Registry (IOR). When the data are sent as .csv or .xls(x) files, they are loaded to the Oracle database as they arrive. Loading and processing data that have been sent as files is time-consuming for us, because there are constant problems with deviations from the agreed data structures and with wrong data formats.
When data are captured by X-Road pull services, the XML file is parsed to the IOR by Oracle tools. When data are captured by xGate, the file is parsed and validated against the XSD file generated in the iMeta system. After the data have been loaded to the IOR, it is possible to give the first feedback about the received data. The captured data cannot be loaded if the formats are incorrect or variables are missing.
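The kind of first feedback described above can be illustrated with a minimal Python sketch that checks a file delivery for missing variables and wrong formats. It is only an illustration under stated assumptions: the variable names, the agreed structure and the function check_delivery are hypothetical and do not describe SE's actual loading procedures, which run on Oracle tools.

```python
import csv
import io

# Hypothetical agreed structure for one delivery: variable name -> expected type.
AGREED_STRUCTURE = {"holding_id": int, "slaughter_date": str, "animals": int}


def check_delivery(csv_text, agreed=AGREED_STRUCTURE):
    """Return a list of problems that would make the delivery unloadable."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    problems = [f"missing variable: {name}" for name in agreed if name not in header]
    if problems:
        return problems  # formats cannot be checked without the agreed columns
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        for name, expected_type in agreed.items():
            try:
                expected_type(row[name])
            except (TypeError, ValueError):
                problems.append(
                    f"line {line_no}: {name}={row[name]!r} is not {expected_type.__name__}"
                )
    return problems
```

An empty list means the file matches the agreed structure; any other result could be sent back to the data owner as first feedback.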
The next step is the Data Staging Area (DSA), where data structure checks and conversions to correct formats take place. These checks and conversions are done according to the metadata descriptions in iMeta. It is also possible to develop more contextual checks, but this requires input for the rules from the statistical domain departments. After the DSA, it is possible to automatically generate a quality report about the delivered dataset.
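A metadata-driven conversion step with a simple quality report could look roughly like the following Python sketch. The metadata dictionary, the variable names and the report layout are assumptions made for illustration; they are not taken from iMeta or from the actual DSA implementation.

```python
from datetime import date

# Hypothetical metadata description: variable name -> converter to the target format.
METADATA = {
    "animals": int,
    "slaughter_date": date.fromisoformat,
}


def stage(rows, metadata=METADATA):
    """Convert raw string values per the metadata and collect a simple quality report."""
    staged = []
    report = {"rows": 0, "converted": 0, "errors": {name: 0 for name in metadata}}
    for raw in rows:
        report["rows"] += 1
        converted, ok = {}, True
        for name, convert in metadata.items():
            try:
                converted[name] = convert(raw[name])
            except (KeyError, TypeError, ValueError):
                report["errors"][name] += 1  # count failures per variable
                ok = False
        if ok:
            staged.append(converted)
            report["converted"] += 1
    return staged, report
```

Driving the conversions from one metadata description means a change in the agreed structure only has to be recorded in one place. The per-variable error counts give a starting point for a report to the data owner.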
The last step is to make the data available to users, which means that the data are loaded to the Final Observation Registry (FOR) and are pseudonymised if they include personal data. Pseudonymisation involves removing personal identification numbers (PINs), names and contact details from the data. PINs are replaced with unique identifiers that allow the data to be joined. The unique in-house identifiers are not derived from PINs, which means that it is not possible to convert the unique identifiers mathematically back to PINs.
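The principle of identifiers that are not derived from PINs can be sketched in Python as follows. This is a hedged illustration only: the class Pseudonymiser, the field names and the in-memory lookup table are hypothetical, and a real system would keep the lookup table in protected storage.

```python
import secrets

DIRECT_IDENTIFIERS = ("name", "contact")  # hypothetical field names


class Pseudonymiser:
    def __init__(self):
        # PIN -> in-house identifier; in practice this table would be stored securely.
        self._lookup = {}

    def identifier_for(self, pin):
        """Return the stable in-house identifier for a PIN, creating it on first use."""
        if pin not in self._lookup:
            # Random value: it is not computed from the PIN, so it cannot be inverted.
            self._lookup[pin] = secrets.token_hex(16)
        return self._lookup[pin]

    def pseudonymise(self, record, pin_field="pin"):
        """Drop direct identifiers and replace the PIN with the in-house identifier."""
        out = {
            key: value
            for key, value in record.items()
            if key not in DIRECT_IDENTIFIERS and key != pin_field
        }
        out["person_id"] = self.identifier_for(record[pin_field])
        return out
```

Because the same PIN always maps to the same identifier, records from different sources can still be joined on person_id. The link back to the PIN exists only in the lookup table.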
The data are stored and versioned in Oracle databases, which the statistical domain
departments can access through SAS or R.
Figure 3. To-be process of data capturing and making it available to users
4.1. Summary and difficulties encountered
The process of managing administrative data had already changed after the Statistics Design
Department was created. However, our goal is to redesign the processes so that administrative
data are provided to statistical departments more efficiently and in a standardised way.
The main difficulty in mapping the current process was that many departments are involved in
it. This also makes optimising the processes challenging, because every step has to be analysed
thoroughly in order to find ways to simplify the process and shorten the time spent on its
individual steps.
To better understand how to make our administrative data management more
efficient, it was very helpful to read the document “Good practices in accessing, using and
contributing to the management of administrative data” (Eurostat, 2018). The main advantage
of this document is that it compiles the experiences of different NSIs. It is reassuring to know
that other statistical offices are on the same path and that we are all moving towards better
partnerships and administrative data management processes. The document also indicates which
countries we could learn from and ask for guidance.
The to-be processes were mapped in as much detail as possible. This enables us to monitor the
processes and make changes if necessary.
Our next step is to describe each process step and document what is done, how and by whom at
every stage of the process. The goal is to create written instructions that make the workflow
smoother and help new team members learn what to do.
5. Creating vision document on how to give feedback to the data
owners about data transmission deadlines and agreed data
structures
One part of optimising and standardising the management of information related to
administrative data is the automation of notifications and feedback.
Currently we send e-mails manually before data transmission deadlines, and only to
those data owners who tend to forget their data deliveries.
At the moment Statistics Estonia does not have an information system for automated data
structure checks and for monitoring data transmission deadlines of administrative sources. We
are in the process of working out the vision document on how to give feedback to the data
owners about data transmission deadlines and agreed data structures.
We have analysed what type of information we need to manage in the information system – this
includes the deadlines of data deliveries, related contacts and contract information and also the
information about data structures, formats and metadata.
We have also analysed the information systems that are already in use in our statistical
production process. Some of them could be developed further to provide part of the
functionality needed for managing this information and sending out automatic notifications.
If compliance with the agreed data structures and metadata were checked automatically,
we could also generate a quality report to send as feedback to the data owners.
The analysis of our current information systems showed that we would need to develop a new
information system to enable automated checks and feedback.
SE has created a vision document for developing a new information system, the Administrative
Data Gate. It will help automate administrative data management in the Design, Build and
Collect process phases.
The main functionalities of the Administrative Data Gate are:
• Monitoring data deliveries and sending automated feedback and reminders to data
holders.
• Reading metadata from SE’s metadata management system and checking delivered data
against the agreed structures and content.
• Functionality to convert data to formats or structures needed by statistical domain
departments.
• Logging and monitoring of every procedure that is carried out on a specific
dataset.
• Dashboard with main operations visible for users.
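The first functionality in the list, monitoring deliveries and sending reminders, could be
sketched roughly as below. The Delivery fields, the seven-day reminder window and the message
texts are assumptions for illustration; the vision document does not specify them here.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Delivery:
    """One expected data delivery (hypothetical structure)."""
    source: str
    dataset: str
    deadline: date
    contact_email: str
    received: bool = False


def pending_notifications(deliveries, today, remind_days_before=7):
    """Return (e-mail address, message) pairs for upcoming and overdue deliveries."""
    notes = []
    for d in deliveries:
        if d.received:
            continue  # nothing to send once the data have arrived
        days_left = (d.deadline - today).days
        if days_left < 0:
            notes.append(
                (d.contact_email, f"{d.dataset}: delivery is {-days_left} day(s) overdue")
            )
        elif days_left <= remind_days_before:
            notes.append(
                (d.contact_email, f"{d.dataset}: delivery is due in {days_left} day(s)")
            )
    return notes
```

Running such a function once a day over all agreed deliveries would replace the manual
reminders and would reach every data owner, not only those known to forget their deliveries.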
The Administrative Data Gate would become the single channel through which all
administrative data pass, as shown in Figure 4. The input data can come in different
formats (csv, txt, xls, ods, xml, json) or from different channels (X-Road push/pull services,
e-mails), but all the data are guided through the Administrative Data Gate, where automated data
checking and corrections are done.
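The idea of one entry point for several input formats can be illustrated with a small
dispatcher. Only csv and json are covered, and the function names and the common row
representation (a list of dictionaries) are assumptions made for this sketch.

```python
import csv
import io
import json


def read_csv(text):
    # The first line is taken as the header; every value stays a string.
    return list(csv.DictReader(io.StringIO(text)))


def read_json(text):
    data = json.loads(text)
    if not isinstance(data, list):
        raise ValueError("expected a JSON array of records")
    return data


# Format label -> reader. Further formats (txt, xls, ods, xml) would be added here.
READERS = {"csv": read_csv, "json": read_json}


def read_delivery(text, fmt):
    """Parse a delivered file into a common list-of-dictionaries form."""
    try:
        reader = READERS[fmt]
    except KeyError:
        raise ValueError(f"unsupported format: {fmt}") from None
    return reader(text)
```

Once every delivery has the same in-memory form, the same checks and the same quality feedback
can be applied regardless of the channel or format in which the data arrived.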
After the data have been checked, the quality feedback report is generated and sent to the data
owner. The content of the quality feedback report has not been decided yet, but it will contain
information about compliance with the agreed data structures and data formats.
Figure 4. Dataflow through Administrative Data Gate
5.1. Summary and encountered difficulties
We have analysed our needs and have an overview of the functionality needed to manage
information related to administrative data efficiently and to run automated controls on
delivered data sets.
However, it has been difficult to decide whether we need to develop a new information system
to provide the needed functionality or whether some of our existing applications could be
developed further to meet the needs. The analysis showed that we need to develop a new
information system.
Now the challenge is to find the financial and human resources to start developing
the Administrative Data Gate. Statistics Estonia has already applied for financial support from
the SF funds, but we have not yet received feedback on the application, so the timeline for the
development process is still unknown.
6. Describing metadata for the data sources whose cooperation
agreements are renewed in the metadata system
Statistics Estonia uses about one hundred different administrative data sources in its
statistical production process. Describing and harmonising the metadata for administrative data
is time-consuming, because several metaobjects in our metadata management
system iMeta have to be defined in order to fully document the captured data.
We are in the process of describing all the metadata for received administrative data, but during
this grant project we concentrate on describing and standardising the metadata of those
data sources whose cooperation agreements were signed before 2010.
We have made preparations for renewing the data delivery agreements, and some of the metadata
are already described in our metadata management system.
The metadata description process also involves the data owners and analysts from the statistical
departments. The steps for describing the metadata for administrative data are as follows:
• analysing already received data and adding variable descriptions, classifications and
code lists to our metadata management system;
• describing the rest of the metadata related to the first sub-task according to the Neuchâtel
terminology model (conceptual variables, statistical characteristics, statistical unit types);
• cooperating with the leaders of the statistical activities to describe and harmonise
metadata efficiently;
• describing metadata in the metadata system for additional data needs and providing
input for the process of renewing the cooperation agreements.
The Neuchâtel terminology model (Neuchâtel Group, 2004) has been used for describing the
variables in our metadata management system. In this model, variables are described at
three levels: conceptual variable, statistical characteristic (object variable) and contextual
variable. A statistical unit type is an entity for which information is sought and for which
statistics are ultimately compiled. A statistical characteristic is a characteristic of a
statistical unit type. A conceptual variable (concept) provides a general description of the
meaning of the statistical characteristic without explicit reference to any particular
statistical unit type. A contextual variable describes the variable in the context of a
statistical activity. Contextual variables can
be defined as register variables or cube variables.
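The relationships between these metadata objects can be made concrete with a small data
structure sketch. The class and field names below are chosen for illustration and do not
reproduce the iMeta data model; the example values in the usage note are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StatisticalUnitType:
    """Entity for which information is sought, e.g. a person or an enterprise."""
    name: str


@dataclass(frozen=True)
class ConceptualVariable:
    """General meaning of a variable, independent of any unit type."""
    name: str
    definition: str


@dataclass(frozen=True)
class StatisticalCharacteristic:
    """A conceptual variable applied to one statistical unit type."""
    concept: ConceptualVariable
    unit_type: StatisticalUnitType


@dataclass(frozen=True)
class ContextualVariable:
    """A statistical characteristic as used in one statistical activity."""
    characteristic: StatisticalCharacteristic
    statistical_activity: str
    kind: str          # "register" or "cube"
    source_name: str   # hypothetical: the column name used by the data source

    def __post_init__(self):
        if self.kind not in ("register", "cube"):
            raise ValueError("kind must be 'register' or 'cube'")
```

For example, a concept "date of birth" combined with the unit type "person" gives one statistical
characteristic, which can then be reused by several contextual variables, one for each
statistical activity or data source in which it appears.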
Our goal was to describe and harmonise all the metadata of those administrative data sources
whose cooperation agreements were signed before 2010.
We therefore started by describing and harmonising all necessary metadata objects for:
• Estonian Tax and Customs Board
• National Institute for Health Development
• Estonian Land Board
• Agricultural Board
• Agricultural Research Center
The Estonian Tax and Customs Board is a very important data source for us. It owns
several state registers, and SE captures 80 different datasets from it every year.
The frequency of data capture varies from once a day to once a year. For this source we had to
describe and harmonise 483 different contextual variables and all the corresponding
metadata objects. Quite a few variables had to be clarified with the data owner,
because the forms of tax and customs declarations are constantly changing and, for the
contextual description of metadata, we had to be sure that we understood each variable
thoroughly.
The National Institute for Health Development is the source for death and birth statistics for
SE. From this source we capture 147 different variables. The content of those variables was
quite clear to us, and describing them in our metadata management system was straightforward.
However, we found out that the National Institute for Health Development is starting
major developments in its information systems in order to merge several smaller registers
into one large register. This means we have to be ready for changes in data content and revise
our metadata descriptions once the development has taken place.
The Estonian Land Board has always been a good cooperation partner for Statistics Estonia. It
owns the Address Data System, which enables all registers to exchange address data
in a harmonised way. For this source we had to describe 162 variables and the corresponding
metadata objects. As we started to prepare the new data delivery contract and review our current
data needs, we also discovered that, due to changes in Estonian legislation, from the start of
2019 the Estonian Land Board no longer collects some of the variables that are needed in our
statistical production process. This means our analysts have to change the methodology of
their statistical activities.
For agricultural statistics, one of the most important sources is the Agricultural Board. At the
moment we capture data once a year, but once the new data delivery agreement is signed we
would like to start capturing data twice a year. We described 88 variables and the corresponding
metadata objects for the Agricultural Board; some of those variables remain in draft version
until the negotiations to renew the agreement are finalised. However, we have had several very
useful meetings with the source and were also able to incorporate the available data more
efficiently in our statistical production process.
The Agricultural Research Center is also an important source for agricultural statistics. We
hope to start receiving twenty data sets and 63 variables from that source. As the negotiations
for the new data delivery agreement are still in progress, the metadata description is also in
draft version. We are ready to change or supplement our current metadata descriptions when the
agreement is finalised.
6.1. Summary and difficulties encountered
The main difficulty in performing this task is understanding the conceptual meaning of the data
correctly. To standardise and harmonise the metadata of administrative data and document
them in our metadata management system iMeta, we needed to involve the data owners and
the data users from our statistical departments.
There are two sources, the Agricultural Board and the Agricultural Research Center, whose
metadata descriptions are partly in draft version. This means that we have made all necessary
preparations for describing them, but they are not yet published in our metadata management
system. We are waiting for the negotiations to renew the data delivery agreements to be
finalised, after which we can also publish the metadata descriptions.
Although the data descriptions are produced and managed centrally in Statistics Estonia, other
parties are involved in the process and their knowledge has to be taken into account. This makes
the process time-consuming, and meetings have to be held to agree on data definitions.
7. Renewing cooperation agreements made with data owners
before the year 2010
During the grant project we plan to renew the cooperation agreements that are in force and
were signed before 2010. This is important because before 2010 Statistics Estonia used a
different contract format, which did not specify, for example, the structure of the delivered
data. We are now moving towards automated data capturing and controlling systems, so it is
essential to agree on specific data structures, formats and metadata.
Our analysis of data delivery contracts showed that we need to renew our contracts with five
institutions. Almost all of these institutions own several registers from which
Statistics Estonia captures different data sets.
We have started preparing new agreements with:
• Estonian Tax and Customs Board
• National Institute for Health Development
• Estonian Land Board
• Agricultural Board
• Agricultural Research Center
It is important to use the new contract format, in which both the main part of the contract and
the annex describing the detailed data composition are updated. The main part of the new
contract format consists of:
1. General information (details of the parties and the purpose of the contract);
2. List of contract’s documents (annexes to the agreement are mentioned if any);
3. Object of the contract (content of the contract, explanation of the concept “data” and the
method of transmission);
4. Rights and obligations for the parties (a list of rights and obligations that all parties need
to follow);
5. Confidentiality (the confidentiality obligation for the parties is stated);
6. Contract performance obligations (states that data transmission is free of charge and
that each party bears its own costs of performing the contract from
its budget);
7. Force majeure (a list of situations which obstruct the continuation or lawful existence
of a contract between the parties);
8. Modification, completion and termination of the contract (the rules for modification,
completion and termination of the contract that apply to all parties);
9. Resolving disputes (how disputes arising from the performance of the contract shall be
resolved);
10. Other terms;
11. Contact information.
The new annex(es) include the composition of the data at the variable level and the contact
persons for the transmission of data.
The renewal process included describing metadata for the captured datasets, because in the
annexes we always define the data composition at a detailed level.
The Estonian Tax and Customs Board is a major data source for us. The data delivery agreement
with it had been in force since 2007. Since then SE’s data needs have grown and quite a few
changes have taken place in the registers of the Tax and Customs Board, so it was essential
to renew the cooperation agreement. We started the preparations by mapping the
actual data needs of SE. For every data set we met with the analysts who need the data
and specified the data content. From those meetings we gathered questions and information that
needed to be negotiated with the data source.
Some of the negotiations with the Tax and Customs Board took place via e-mail and phone
calls. However, it is generally more efficient to bring the necessary people around one table
to reach an agreement.
The data content negotiations required the involvement of subject matter experts from both
sides. As Statistics Estonia uses 80 different data sets from the Tax and Customs Board, we had
to arrange several meetings to specify the data content.
There were also separate meetings with the lawyers of both parties. Statistics Estonia has
worked out a standard data delivery agreement, but the Tax and Customs Board has its own
standard agreements for data exchange. It was therefore necessary to address legal issues and
work out an agreement that suits both institutions. The legal negotiations were also successful
and we managed to sign the new data delivery agreement with the Tax and Customs Board in May
2019.
The National Institute for Health Development is an important source for population and social
statistics. With that institution we have two separate data delivery agreements – one for each
register. Our goal is to have only one agreement with the National Institute for Health
Development that cover the birth and death data. At the mo