ESSnet Big Data
Specific Grant Agreement No 1 (SGA-1)
https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata
http://www.cros-portal.eu/.........
Framework Partnership Agreement Number 11104.2015.006-2015.720
Specific Grant Agreement Number 11104.2015.007-2016.085
Work Package 4
AIS data
Milestone 4.1
Progress and technical report of first internal WP-meeting
Version 2016-07-13
ESSnet co-ordinator:
Peter Struijs (CBS, Netherlands)
telephone : +31 45 570 7441
mobile phone : +31 6 5248 7775
Prepared by:
Anke Consten (CBS, The Netherlands)
Olav Grøndal (SD, Denmark)
Maarten Pouwels (CBS, Netherlands)
Marco Puts (CBS, Netherlands)
Christina Pierrakou (ELSTAT, Greece)
Øyvind Langsrud (SSB, Norway)
Dominik Rozkrut (GUS, Poland)
Index
Participants first internal WP-meeting
Monday the 9th of May
1. Big data at Statistics Netherlands
2. Experiences on analysing AIS data on national level for Inland Waterway transport
3. Methodology on the AIS based statistics in Norway
4. Introduction to Sandbox and SPARK cluster
5. Introduction of AISLIB
6. Alternatives for getting AIS data on European level
7. Data Handling
Tuesday the 10th of May (till lunch)
Project matters
Concrete actions from this meeting
Decisions made so far
ANNEX 1: use and integration of administrative records
Participants first internal WP-meeting
Christina Pierrakou √ Maarten Pouwels √
Dominik Rozkrut √ Øyvind Langsrud √
Marco Puts √ Olav Grøndal √
Anke Consten (Report) √
Monday the 9th of May
1. Big data at Statistics Netherlands
By Marco Puts (data scientist at Statistics Netherlands)
Marco gave an introduction to the experiences with Big Data¹ at Statistics Netherlands. First he explained what big
data sources are and what the benefits and challenges of using them are. He then described the
pilot on using mobile phone data for statistics about tourism and daytime population, and the production of
traffic intensity statistics from road sensor data.
2. Experiences on analysing AIS data on national level for Inland Waterway transport
By Maarten Pouwels (expert on Maritime statistics at Statistics Netherlands)
Maarten shared his experiences with using Automatic Identification System (AIS) data at national level for the
statistics of Inland Waterway transport. AIS is a system with which ships electronically
broadcast ship data at regular intervals, including vessel identification, position, course and speed. The
consistency and wealth of AIS data provide a unique opportunity for traffic statistics, leading to significantly
more refined data on vessel activities and better estimates of vessel emissions. It is difficult to determine how long
vessels lie still. Olav recognises this problem for ferries and oil tankers in Denmark as well. It would be good for all
participants in WP4 to share experiences on the algorithms used. Maarten's results show that AIS data at
national level can be used for statistics of Inland Waterway transport. The data can be used for following ships on
inland waterways, calculating the number/intensity of ships and visualising variations in time. We are pleased with
the results so far, but the algorithms still need some fine-tuning. Maarten's experiences with
AIS data at national level are very useful input for using AIS data at European level, for example on:
- Tools for visualising variations in time
- Learning which information is available in which AIS message and which of that information can be used.
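The difficulty of determining how long a vessel lies still can be illustrated with a minimal sketch. This is not the algorithm used at Statistics Netherlands. It assumes that the position reports of one vessel have already been reduced to (timestamp in seconds, speed over ground in knots) pairs, and that a vessel counts as lying still below a hypothetical threshold of 0.5 knots. The record layout, the function names and the threshold are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical threshold: at or below this speed (knots) a vessel is treated as lying still.
STILL_SPEED_KNOTS = 0.5


def stationary_periods(
    reports: List[Tuple[int, float]],
    max_speed: float = STILL_SPEED_KNOTS,
) -> List[Tuple[int, int]]:
    """Return (start, end) timestamps of consecutive runs of slow reports.

    `reports` holds (timestamp_seconds, speed_knots) pairs for ONE vessel,
    sorted by timestamp. A run needs at least two reports to have a duration.
    """
    periods = []
    run_start = None  # timestamp of the first slow report in the current run
    last_slow = None  # timestamp of the most recent slow report
    for timestamp, speed in reports:
        if speed <= max_speed:
            if run_start is None:
                run_start = timestamp
            last_slow = timestamp
        else:
            # A fast report closes the current run, if there is one.
            if run_start is not None and last_slow > run_start:
                periods.append((run_start, last_slow))
            run_start = None
    # Close a run that is still open at the end of the data.
    if run_start is not None and last_slow > run_start:
        periods.append((run_start, last_slow))
    return periods


def total_still_seconds(reports: List[Tuple[int, float]]) -> int:
    """Sum the durations of all stationary periods of one vessel."""
    return sum(end - start for start, end in stationary_periods(reports))
```

In real data, noisy speed values and gaps between received messages would probably need extra handling, which is where the fine-tuning mentioned above comes in.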
1 According to outcomes of the UNECE Project on Using Big Data of Official Statistics, the term Big Data describes large
volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.
3. Methodology on the AIS based statistics in Norway
By Øyvind Langsrud (Maritime expert at Statistics Norway)
Øyvind told us what he has learned so far about the use of AIS data at national level at a governmental institution
called Kystverket (The Norwegian Coastal Administration). He showed us the website www.havbase.no,
which is built and maintained by Kystverket. The website offers many options. For example, you
can choose:
- A line, type of vessel and period in time, and you will see a graph and table of the number of vessels passing the selected line by type of vessel.
- A harbour and period in time, and you will see a graph and table of the number and types of vessels by terminal for the chosen harbour.
- A part of the sea and period in time, and you will see the emissions in the selected area and period.
- Routes for different vessels.
The data of Kystverket can be used later in this project for validating our results on calculating the number of
ships at certain coordinates (task 3 traffic analyses) and estimating emissions (SGA-2). It is also very useful for WP4
to learn about the methodology used by Kystverket.
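Counting the vessels that pass a selected line can be sketched as follows. This is not Kystverket's methodology, which was not described in the meeting. The sketch assumes that each vessel track is a time-ordered list of (x, y) points, treats coordinates as planar (a simplification that only holds over small areas), and uses hypothetical function names.

```python
from typing import Dict, List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (longitude, latitude) treated as planar


def _orientation(a: Point, b: Point, c: Point) -> float:
    """Cross product: >0 if c lies left of a->b, <0 if right, 0 if collinear."""
    return (b[0] - a[0]) * (c[1] - a[1]) - (b[1] - a[1]) * (c[0] - a[0])


def segments_cross(p1: Point, p2: Point, q1: Point, q2: Point) -> bool:
    """True if segment p1-p2 properly crosses segment q1-q2 (touching is ignored)."""
    d1 = _orientation(q1, q2, p1)
    d2 = _orientation(q1, q2, p2)
    d3 = _orientation(p1, p2, q1)
    d4 = _orientation(p1, p2, q2)
    return d1 * d2 < 0 and d3 * d4 < 0


def count_crossings(
    tracks: Dict[str, List[Point]],
    line: Tuple[Point, Point],
) -> Dict[str, int]:
    """Count, per vessel, how many consecutive track segments cross the line."""
    counts = {}
    for vessel_id, track in tracks.items():
        counts[vessel_id] = sum(
            segments_cross(a, b, line[0], line[1])
            for a, b in zip(track, track[1:])
        )
    return counts
```

Grouping the resulting counts by vessel type and period would give a table of the kind shown on the website.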
Statistics Norway wants to use the data of Kystverket to improve its statistics on fuel consumption and
emissions. The website (www.havbase.no) will be improved by adding an English version. Improvements will
also be made in order to report statistics about emissions to the United Nations.
4. Introduction to Sandbox and SPARK cluster
By Marco Puts (data scientist at Statistics Netherlands)
Marco gave an introduction to SPARK and HADOOP. The problem with Big Data is that very
large files have to be processed with the limited processing power of individual computers. The Hadoop Distributed File System (HDFS) is a distributed file
system designed to run on commodity hardware. Because files are replicated across servers, all files remain available when one server goes down.
After each processing step Hadoop writes the results to the hard disk, which takes time. The advantage of SPARK is that
it keeps the processing in the computer's memory for as long as possible and only writes the final results
to the hard disk, which saves time. This introduction to SPARK and HADOOP is very useful, because we have to use
one of these tools for processing and analysing the AIS data at European level in the UNECE Sandbox.
[Figure: Big Data Sandbox]
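The split, process and combine idea behind both tools can be sketched in plain Python. This is a stand-in and not SPARK or HADOOP code: no cluster API is used, the partitions are ordinary lists, and the record layout (MMSI, timestamp) is an assumption. It only shows partial results being kept in memory and merged at the end, rather than written to disk after each step.

```python
from collections import Counter
from functools import reduce
from typing import Iterable, List, Tuple

Record = Tuple[int, int]  # (mmsi, timestamp_seconds): assumed layout


def split_into_partitions(records: List[Record], n: int) -> List[List[Record]]:
    """Deal the records out round-robin, mimicking a file spread over n nodes."""
    return [records[i::n] for i in range(n)]


def map_partition(partition: Iterable[Record]) -> Counter:
    """'Map' step: count messages per vessel within one partition."""
    return Counter(mmsi for mmsi, _ in partition)


def merge(left: Counter, right: Counter) -> Counter:
    """'Reduce' step: combine two partial counts."""
    return left + right


def messages_per_vessel(records: List[Record], n_partitions: int = 3) -> Counter:
    """Count messages per vessel by merging in-memory partial counts."""
    partials = map(map_partition, split_into_partitions(records, n_partitions))
    return reduce(merge, partials, Counter())
```

The result does not depend on the number of partitions, which is what makes this kind of computation easy to spread over several machines.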
5. Introduction of AISLIB
By Olav Grøndal (Statistics Denmark)
Olav gave a presentation about AISLIB and how Statistics Denmark structures its data. AISLIB is an AIS library
that is maintained by the Danish Maritime Authority and used for decoding AIS messages at national level in
Denmark. The library is called from Java code, which decodes the AIS data and writes the results into a database. It is also possible to write the
decoded results to a zip file. The code also removes part of the duplicated records in the Danish AIS data.
This presentation helps us choose the tooling we are going to use in WP4 for decoding
the AIS data at European level.
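To make concrete what decoding an AIS message involves, here is a minimal Python sketch. It does not use AISLIB, which is a Java library, and it is not how Statistics Denmark processes its data. It assumes a single-fragment AIVDM sentence carrying a class A position report (message type 1, 2 or 3) and follows the publicly documented bit layout for those messages. The function names are hypothetical, and the sample sentence in the usage note is a commonly cited example, not a record from the Dirkzwager data.

```python
from typing import Dict


def payload_to_bits(payload: str) -> str:
    """Undo the 6-bit ASCII armouring of an AIVDM payload."""
    bits = []
    for char in payload:
        value = ord(char) - 48
        if value > 40:
            value -= 8
        bits.append(format(value, "06b"))
    return "".join(bits)


def _unsigned(bits: str, start: int, length: int) -> int:
    """Read an unsigned integer field from the bit string."""
    return int(bits[start:start + length], 2)


def _signed(bits: str, start: int, length: int) -> int:
    """Read a two's complement signed integer field from the bit string."""
    value = _unsigned(bits, start, length)
    if bits[start] == "1":
        value -= 1 << length
    return value


def decode_position_report(sentence: str) -> Dict[str, float]:
    """Decode a single-fragment AIVDM sentence with message type 1, 2 or 3."""
    fields = sentence.split(",")
    bits = payload_to_bits(fields[5])  # the sixth field holds the payload
    msg_type = _unsigned(bits, 0, 6)
    if msg_type not in (1, 2, 3):
        raise ValueError(f"not a class A position report: type {msg_type}")
    return {
        "type": msg_type,
        "mmsi": _unsigned(bits, 8, 30),
        # Speed is stored in tenths of a knot.
        "speed_knots": _unsigned(bits, 50, 10) / 10.0,
        # Positions are stored in 1/10000 of a minute, i.e. 1/600000 of a degree.
        "longitude": _signed(bits, 61, 28) / 600000.0,
        "latitude": _signed(bits, 89, 27) / 600000.0,
    }
```

For the sample sentence `!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C` the function returns message type 1 and MMSI 477553000. A production decoder would also have to verify the checksum, reassemble multi-fragment messages and handle the "not available" values of each field, none of which is attempted here.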
6. Alternatives for getting AIS data on European level
We all discussed the possibilities of getting AIS data on European level. According to article 17a of the Reg. (EC)
No 223/2009 on European statistics, as amended by Regulation (EU) 2015/759, NSIs, ONAs and Eurostat have
the right to access and use, promptly and free of charge, all administrative records and to integrate those
administrative records with statistics, to the extent necessary for the development, production and dissemination
of European statistics, which are determined in the European Statistical Programme in accordance with Article 1.
Moreover, the NSIs and Eurostat should be consulted on, and involved in, the initial design, subsequent
development and discontinuation of administrative records built up and maintained by others, thus facilitating
the further use of those records for the purpose of producing European statistics. They should be involved in the
standardisation activities concerning administrative records that are relevant for the production of European
statistics.
For the pilot in WP4 we have already received 6 months (8 October 2015 till 12 April 2016) of AIS data from
Dirkzwager. These data will be stored and analysed in the SANDBOX of the UNECE, so they will be accessible to all the
members of WP4. For the future we are also looking into other possibilities of getting AIS data at European level,
in line with the above-mentioned Regulation. We decided to explore the following sources and compare them on
availability and costs:
- European Maritime Safety Agency (EMSA): These data are free, but we expect that delivery of AIS data by EMSA will take a while. There is also a difficulty: EMSA stipulates that we are not allowed to keep the data, so after using the data we have to delete them. We hope to obtain these free data in the future. Eurostat can play an important role in this.
- Other alternatives: We are also investigating the possibilities and costs of getting AIS data at European level on a regular basis
from Dirkzwager, the Hellenic Coast Guard, Kystverket (The Norwegian Coastal Administration) and
Marine Traffic.
Note that we already have an answer from Kystverket. They have discussed the topic with
their legal experts and found that sharing data from other European countries goes
beyond their authorisation. They believe that each NSI must go to the AIS owner in its own
country or buy AIS data commercially from a market player such as DNV, Marine Traffic or
similar.
- National data: If European data are too costly or too hard to obtain, we will also investigate how national datasets can be obtained. For the Netherlands, Norway, Denmark, Greece and Poland national datasets are available or can be obtained.
The results of investigating these alternative sources will be described in deliverable 1.
7. Data Handling
We discussed which programming language and environment we should use for transforming the 6 months
of AIS data at European level that we received from Dirkzwager for this pilot. We also decided that we will store, process
and analyse the data in the SANDBOX of the UNECE, so they will be accessible to all the members of WP4.
The outcome of our discussion about the programming language and environment is summarised in the scheme
below. In deliverable 1 we will describe this outcome in more detail and explain
why we chose a specific tool. We will also describe the possibilities of creating a JDBC connection in deliverable 1.
Tuesday the 10th of May (till lunch)
Project matters
We spent the last half day of our meeting on arranging some project matters.
We decided that the second face-to-face meeting of WP4 will take place in Copenhagen on the 29th and 30th of
September 2016.
We talked about the project plan. An important question from Christina was what we mean in the project plan by “AIS
data on a European level”. We had defined it as AIS data on ships in European waters, but a better definition for this
pilot is: “AIS data from whatever base stations in Europe catch.”
We also developed and implemented a structure for the WIKI-page of WP4. The structure is ready now, but
we still have to upload all the important documents and reports.
We also divided the work within SGA-1 of WP4. The results are summarised in the table below.
Num | Task | Action | Deliverable² | Who | Deadline
1. | Progress and technical report of first internal WP-meeting | | Milestone 1 | Anke | 31-05-2016
2. | Data access | Investigate the possibility of obtaining raw and processed AIS Data at a European level (EMSA, Marine traffic, Dirkzwager, Kystverket, Hellenic Coast Guard) | 4.1 | Paul Esser (Statistics Netherlands), Anke, Christina, Øyvind | 01-07-2016
3. | Data access | Make an overview of already public available AIS-based statistics in Norway | 4.1 | Øyvind, Olav, Maarten, Marco and Christina | 01-07-2016
4. | Data handling | Which programming language and environment should be used | 4.1 | Olav and Marco | 01-07-2016
5. | Data handling | Where will the data be processed | 4.1 | Olav and Marco | 01-07-2016
6. | Methodology and techniques | Build a reference frame of ships in European water (based on AIS data) | 4.2 | Olav and Marco/Maarten | 01-01-2017
7. | Methodology and techniques | Find out how data from maritime statistics can be linked to AIS data | 4.2 | ALL | 01-01-2017
8. | Methodology and techniques | Check whether information improves the quality of current statistical outputs and provides more information about the origin/destination of the cargo | 4.2 | ALL | 01-01-2017
9. | Traffic analyses | Calculate the number of ships at certain coordinates | 4.3 | On the agenda for meeting 29th and 30th September | 01-07-2017
10. | Traffic analyses | Visualise the results to analyse variations in time | 4.3 | On the agenda for meeting 29th and 30th September | 01-07-2017

2 Deliverables SGA-1:
4.1 Report on creating a database with AIS-data for official statistics: possibilities and pitfalls
4.2 Report about deriving harbour visits and linking data from port authorities with AIS data
4.3 Report about sea traffic analyses using AIS data
Finally we discussed what we would like to do in SGA-2, because the tender for SGA-2 will be at the end of June
2016. At this time we suggest the following tasks for SGA-2:
- Estimate emissions (already envisaged under SGA-2): The aim of this task is 1) to infer journeys from AIS data, 2) to visualise the results, 3) to combine these journeys with a model to calculate emissions and 4) to estimate the impact of carrying out these calculations at the European level on the quality of emissions calculations. This task also includes an overview of the methodology used for calculating emissions by, for example, Kystverket and JRC, and in other literature.
- Future perspectives (already envisaged under SGA-2): The aim of this task is to summarise the project results and perform a qualitative cost-benefit analysis of using AIS data for official statistics. These analyses should include aspects like the sustainability of the data source, possibilities of improving international comparability, possibilities of data sharing (at micro or aggregated level), quality improvement of current statistics and a sketch of a possible statistical process and the infrastructure needed.
- New statistical output: The aim of this task is to investigate what new statistical output we can produce using AIS data.
Concrete actions from this meeting
Nr | Who | What | When | Status
1. | Marco | Provide GitBucket for WP4 | Asap |
2. | Marco | Provide AIS data on UNECE SANDBOX | Asap |
3. | Anke | Arrange rights for using SANDBOX to members WP4 | 6-6-2016 |
4. | Marco | Send link of experiences on using R-packages for visualising data | 6-6-2016 |
5. | All | Upload documents to WIKI-page | 6-6-2016 |
6. | Øyvind | Ask for specific references for estimating emissions at Kystverket | 6-6-2016 |
7. | Anke | Update the projectplan | 6-6-2016 |
Decisions made so far
Nr What
20160404.01 Generally every four weeks there will be a WebEx meeting on Tuesday afternoon
20160404.02 The next meeting will take place on the 9th and 10th of May in Heerlen.
20160404.04 Øyvind puts more effort in task 3 and 4 than in task 1 and 2.
20160404.05 We use the UNECE Sandbox for saving and sharing the AIS data within WP
20160404.06 We use GitBucket for sharing setups and codes.
20160404.07 We use Wiki for internal communication and file sharing
20160404.08 We use the AIS data on European level from Dirkzwager in this project
20160510.01 Second face to face meeting will take place in Denmark on the 29th and 30th of September 2016
20160510.02 The definition of AIS data on European level for this pilot is: “AIS data from whatever base stations in Europe catch”
20160510.03 In SGA-1 we investigate whether AIS data improves the quality of current statistical outputs and provides more information about the origin/destination of the cargo. In SGA-2 we propose to investigate which new statistics we can make based on AIS-data
ANNEX 1: use and integration of administrative records
The use and integration of administrative records are established in Article 17a of REGULATION (EC) No
223/2009 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 2009 on European statistics, as
amended by Regulation (EU) 2015/759 of the European Parliament and of the Council of 29 April 2015.
“...Article 17a:
1. In order to reduce the burden on respondents, the NSIs, other national authorities as referred to in Article 4,
and the Commission (Eurostat) shall have the right to access and use, promptly and free of charge, all
administrative records and to integrate those administrative records with statistics, to the extent necessary for
the development, production and dissemination of European statistics, which are determined in the European
Statistical Programme in accordance with Article 1.
2. The NSIs and the Commission (Eurostat) shall be consulted on, and involved in, the initial design, subsequent
development and discontinuation of administrative records built up and maintained by other bodies, thus
facilitating the further use of those records for the purpose of producing European statistics. They shall be
involved in the standardisation activities concerning administrative records that are relevant for the production of
European statistics.
3. Access by, and involvement of the NSIs, other national authorities and the Commission (Eurostat) pursuant to
paragraphs 1 and 2 shall be limited to administrative records within their own respective public administrative
system.
4. Administrative records made available by their owners to the NSIs, other national authorities and the
Commission (Eurostat) in order to be used for the production of European statistics shall be accompanied by
relevant metadata.
5. The NSIs and owners of administrative records shall establish the necessary cooperation mechanisms. ...”
http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02009R0223-20150608&qid=1435820363605&from=EN