

ESSnet Big Data

Specific Grant Agreement No 1 (SGA-1)

https://webgate.ec.europa.eu/fpfis/mwikis/essnetbigdata

http://www.cros-portal.eu/ .........

Framework Partnership Agreement Number 11104.2015.006-2015.720

Specific Grant Agreement Number 11104.2015.007-2016.085

Work Package 4

AIS data

Milestone 4.1

Progress and technical report of first internal WP-meeting

Version 2016-07-13

ESSnet co-ordinator:

Peter Struijs (CBS, Netherlands)

[email protected]

telephone : +31 45 570 7441

mobile phone : +31 6 5248 7775

Prepared by:

Anke Consten (CBS, Netherlands)
Olav Grøndal (SD, Denmark)
Maarten Pouwels (CBS, Netherlands)
Marco Puts (CBS, Netherlands)
Christina Pierrakou (ELSTAT, Greece)
Øyvind Langsrud (SSB, Norway)
Dominik Rozkrut (GUS, Poland)


Index

Participants first internal WP-meeting
Monday the 9th of May
    1. Big data at Statistics Netherlands
    2. Experiences on analysing AIS data on national level for Inland Waterway transport
    3. Methodology on the AIS based statistics in Norway
    4. Introduction to Sandbox and SPARK cluster
    5. Introduction of AISLIB
    6. Alternatives for getting AIS data on European level
    7. Data Handling
Tuesday the 10th of May (till lunch)
    Project matters
Concrete actions from this meeting
Decisions made so far
ANNEX 1: use and integration of administrative records


Participants first internal WP-meeting

Christina Pierrakou √
Maarten Pouwels √
Dominik Rozkrut √
Øyvind Langsrud √
Marco Puts √
Olav Grøndal √
Anke Consten (Report) √

Monday the 9th of May

1. Big data at Statistics Netherlands

By Marco Puts (data scientist at Statistics Netherlands)

Marco gave an introduction to the experiences with Big Data¹ at Statistics Netherlands. He first explained what big data sources are and what the benefits and challenges of using them are. He also described the pilot on the possibilities of using mobile phone data for statistics about tourism and daytime population, and the production of traffic intensity statistics from road sensor data.

2. Experiences on analysing AIS data on national level for Inland Waterway transport

By Maarten Pouwels (expert on Maritime statistics at Statistics Netherlands)

Maarten shared his experiences with using Automatic Identification System (AIS) data on national level for the statistics of Inland Waterway transport. AIS is a system through which ships electronically broadcast ship data at regular intervals, including vessel identification, position, course and speed. The consistency and wealth of AIS data provide a unique opportunity for traffic statistics, leading to significantly more refined data on vessel activities and estimates of vessel emissions.

One difficulty is determining how long vessels lie still. Olav recognises this problem for ferries and oil tankers in Denmark as well. It is therefore good for all participants in WP4 to share experiences on the algorithms used.

Maarten's results show that AIS data on national level can be used for statistics of Inland Waterway transport. The data can be used for following ships on inland waterways, calculating the number/intensity of ships and visualising variations in time. We are pleased with the results so far, but the algorithms used still need some fine-tuning. Maarten's experiences with AIS data on national level are very useful input for using AIS data on European level, for example on:

- Tools for visualising variations in time
- Learning which information is available in which AIS-message and which information in AIS-messages can be used.
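The difficulty of determining how long a vessel lies still can be illustrated with a minimal sketch. It assumes that the position reports of one vessel have already been decoded into (timestamp, speed over ground in knots) pairs, and it treats the vessel as stationary while its reported speed stays at or below a threshold. The 0.5 knot threshold, the 10 minute minimum duration and all function names are illustrative assumptions, not the algorithm used at Statistics Netherlands.

```python
from datetime import datetime, timedelta


def stationary_periods(reports, max_speed_knots=0.5,
                       min_duration=timedelta(minutes=10)):
    """Return (start, end) pairs during which the vessel stayed at or below max_speed_knots.

    reports: iterable of (timestamp, speed_over_ground_knots), sorted by time.
    Periods shorter than min_duration are ignored.
    """
    periods = []
    start = last = None
    for timestamp, speed in reports:
        if speed <= max_speed_knots:
            if start is None:
                start = timestamp  # first slow report opens a candidate period
            last = timestamp
        else:
            if start is not None and last - start >= min_duration:
                periods.append((start, last))
            start = last = None
    # Close a period that is still open at the end of the track.
    if start is not None and last - start >= min_duration:
        periods.append((start, last))
    return periods


# Hypothetical track: one report every five minutes.
t0 = datetime(2016, 5, 9, 8, 0)
speeds = [6.1, 5.8, 0.2, 0.0, 0.1, 0.0, 4.9, 0.3, 5.5]
track = [(t0 + timedelta(minutes=5 * i), sog) for i, sog in enumerate(speeds)]
periods = stationary_periods(track)
```

In this made-up track only the slow stretch from 08:10 to 08:25 is long enough to count as a stop; the single slow report at 08:35 is discarded. Choosing the two thresholds is exactly the kind of fine-tuning the participants may want to compare between countries.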

¹ According to outcomes of the UNECE Project on Using Big Data of Official Statistics, the term Big Data describes large volumes of high velocity, complex and variable data that require advanced techniques and technologies to enable the capture, storage, distribution, management, and analysis of the information.


3. Methodology on the AIS based statistics in Norway

By Øyvind Langsrud (Maritime expert at Statistics Norway)

Øyvind told us what he knows so far about the use of AIS data on national level at a governmental institution called Kystverket (The Norwegian Coastal Administration). He showed us a website, www.havbase.no, which is made and maintained by Kystverket. The website offers many possibilities. For example, you can:

- Choose a line, a type of vessel and a period in time to see a graph and table of the number of vessels passing the selected line, by type of vessel.
- Choose a harbour and a period in time to see a graph and table of the number and types of vessels by terminal for the chosen harbour.
- Choose a part of the sea and a period in time to see the emissions in the selected area and period.
- Show routes for different vessels.
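Counting vessels that pass a selected line, by vessel type, can be sketched as a segment intersection test between the line and each pair of consecutive positions of a vessel. The sketch below is only an illustration under stated assumptions: it treats longitude and latitude as planar coordinates (reasonable only for short lines), the track structure and all names are hypothetical, and it is not the methodology used by Kystverket.

```python
from collections import Counter


def _orientation(p, q, r):
    """Sign of the cross product (q - p) x (r - p): 1, -1 or 0 if collinear."""
    value = (q[0] - p[0]) * (r[1] - p[1]) - (q[1] - p[1]) * (r[0] - p[0])
    return (value > 0) - (value < 0)


def segments_cross(a, b, c, d):
    """True if segment a-b properly crosses segment c-d (touching end points are ignored)."""
    return (_orientation(a, b, c) * _orientation(a, b, d) < 0
            and _orientation(c, d, a) * _orientation(c, d, b) < 0)


def count_passages(tracks, line_start, line_end):
    """Count crossings of the line per vessel type.

    tracks: dict mapping a vessel id to (vessel_type, [(lon, lat), ...] in time order).
    """
    counts = Counter()
    for vessel_type, positions in tracks.values():
        for previous, current in zip(positions, positions[1:]):
            if segments_cross(previous, current, line_start, line_end):
                counts[vessel_type] += 1
    return counts


# Hypothetical tracks and a north-south counting line at longitude 5.15.
tracks = {
    "vessel-A": ("cargo", [(5.00, 60.00), (5.10, 60.05), (5.20, 60.10)]),
    "vessel-B": ("ferry", [(5.20, 60.02), (5.05, 60.03), (5.20, 60.04)]),
    "vessel-C": ("cargo", [(5.00, 60.20), (5.05, 60.21)]),
}
passages = count_passages(tracks, line_start=(5.15, 59.90), line_end=(5.15, 60.30))
```

Here the ferry crosses the line twice (out and back), one cargo vessel crosses once and the other never reaches the line. A production version would also need to restrict the positions to the chosen period and handle gaps in the track.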

The data of Kystverket can be used later in this project for validating our results on calculating the number of ships at certain coordinates (task 3, traffic analyses) and estimating emissions (SGA-2). It is also very useful for WP4 to learn about the methodology used by Kystverket.

Statistics Norway wants to use the data of Kystverket to improve its statistics about fuel consumption and emissions. The website (www.havbase.no) will be improved by adding an English version. Improvements will also be made in order to report statistics about emissions to the United Nations.

4. Introduction to Sandbox and SPARK cluster

By Marco Puts (data scientist at Statistics Netherlands)

Marco gave an introduction to SPARK and HADOOP. The problem with Big Data is that very large files have to be processed on computers with comparatively limited processing capacity. The Hadoop Distributed File System (HDFS) is a distributed file system designed to run on commodity hardware. Because files are stored redundantly across servers, all files remain available when one server goes down.

After each step Hadoop writes the results to the hard disk, which takes time. The advantage of SPARK is that it keeps the processing in memory for as long as possible and only writes the final results to the hard disk, which saves time. This introduction to SPARK and HADOOP is very useful, because we have to use one of these tools for processing and analysing the AIS data on European level in the UNECE Sandbox.

[Figure: Big Data Sandbox]
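The difference between writing every intermediate result to disk and keeping the processing in memory can be illustrated with a toy example in plain Python. This is only an analogy under stated assumptions: it uses neither Hadoop nor Spark, the three processing steps and the "mmsi,speed" input format are made up for illustration, and the number of disk writes stands in for the time cost described above.

```python
import json
import os
import tempfile


def parse(lines):
    # Step 1: turn "mmsi,speed" text lines into typed records.
    for line in lines:
        mmsi, speed = line.split(",")
        yield {"mmsi": int(mmsi), "speed": float(speed)}


def keep_moving(records, min_speed=0.5):
    # Step 2: drop reports of vessels that are (nearly) lying still.
    return (r for r in records if r["speed"] >= min_speed)


def count_per_vessel(records):
    # Step 3: count the remaining reports per vessel.
    counts = {}
    for r in records:
        counts[r["mmsi"]] = counts.get(r["mmsi"], 0) + 1
    return counts


def run_with_disk_between_steps(lines, workdir):
    """Write each intermediate result to a file and read it back before the next step."""
    data = list(lines)
    disk_writes = 0
    for number, step in enumerate((parse, keep_moving), start=1):
        path = os.path.join(workdir, f"step{number}.json")
        with open(path, "w") as handle:
            json.dump(list(step(data)), handle)
        disk_writes += 1
        with open(path) as handle:
            data = json.load(handle)
    counts = count_per_vessel(data)
    with open(os.path.join(workdir, "result.json"), "w") as handle:
        json.dump(counts, handle)
    return counts, disk_writes + 1


def run_in_memory(lines, workdir):
    """Chain the steps lazily in memory and write only the final result."""
    counts = count_per_vessel(keep_moving(parse(lines)))
    with open(os.path.join(workdir, "result.json"), "w") as handle:
        json.dump(counts, handle)
    return counts, 1


lines = ["244660000,7.2", "244660000,0.1", "219000001,11.0", "244660000,6.9"]
with tempfile.TemporaryDirectory() as workdir:
    disk_result, disk_writes = run_with_disk_between_steps(lines, workdir)
    memory_result, memory_writes = run_in_memory(lines, workdir)
```

Both variants produce the same counts, but the first touches the disk three times and the second only once. With very large AIS files and many processing steps this difference is what makes the in-memory approach faster.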

5. Introduction of AISLIB

By Olav Grøndal (Statistics Denmark)

Olav gave a presentation about AISLIB and how Statistics Denmark structures its data. AISLIB is an AIS library which is maintained by the Danish Maritime Authority and used for decoding AIS messages on national level in Denmark. The library is called from Java code, which decodes the AIS data and writes it to a database. It is also possible to write the results of decoding to a zip-file. The code also deletes part of the duplicated records in the Danish AIS data. This presentation is very useful for making a good choice of the tooling we are going to use in WP4 for decoding the AIS data on European level.
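To give an idea of what decoding an AIS message involves, the sketch below decodes a single-fragment AIVDM sentence carrying a class A position report (message types 1 to 3) and removes exact duplicates. It is written in Python for illustration and is not AISLIB code or its API. The bit offsets follow the publicly documented layout of these message types; the sample sentence is a widely circulated example rather than project data; the checksum is not verified, and special "not available" values are not handled. Treating identical payloads as duplicates is an assumption about what duplicate removal could mean.

```python
def payload_to_bits(payload):
    """Undo the 6-bit ASCII armouring of an AIVDM payload and return a bit string."""
    bits = []
    for char in payload:
        value = ord(char) - 48
        if value > 40:
            value -= 8
        bits.append(format(value, "06b"))
    return "".join(bits)


def _unsigned(bits, start, length):
    return int(bits[start:start + length], 2)


def _signed(bits, start, length):
    # Two's complement: a leading 1 marks a negative value.
    value = _unsigned(bits, start, length)
    return value - (1 << length) if bits[start] == "1" else value


def decode_position_report(sentence):
    """Decode selected fields of a message type 1, 2 or 3 position report."""
    payload = sentence.split(",")[5]
    bits = payload_to_bits(payload)
    message_type = _unsigned(bits, 0, 6)
    if message_type not in (1, 2, 3):
        raise ValueError(f"not a class A position report: type {message_type}")
    return {
        "mmsi": _unsigned(bits, 8, 30),               # vessel identification
        "nav_status": _unsigned(bits, 38, 4),         # e.g. 5 = moored
        "speed_knots": _unsigned(bits, 50, 10) / 10,  # speed over ground
        "lon": _signed(bits, 61, 28) / 600000,        # degrees east
        "lat": _signed(bits, 89, 27) / 600000,        # degrees north
        "course_deg": _unsigned(bits, 116, 12) / 10,  # course over ground
    }


def drop_duplicates(sentences):
    """Keep the first occurrence of each identical payload, preserving order."""
    seen = set()
    unique = []
    for sentence in sentences:
        payload = sentence.split(",")[5]
        if payload not in seen:
            seen.add(payload)
            unique.append(sentence)
    return unique


sample = "!AIVDM,1,1,,B,177KQJ5000G?tO`K>RA1wUbN0TKH,0*5C"
report = decode_position_report(sample)
unique_sentences = drop_duplicates([sample, sample])
```

The decoded report contains the vessel identification (MMSI), position, course and speed mentioned earlier in this report. A library such as AISLIB covers many more message types, multi-fragment messages and validation, which is why the choice of tooling matters.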


6. Alternatives for getting AIS data on European level

We all discussed the possibilities of getting AIS data on European level. According to Article 17a of Regulation (EC) No 223/2009 on European statistics, as amended by Regulation (EU) 2015/759, NSIs, ONAs and Eurostat have the right to access and use, promptly and free of charge, all administrative records and to integrate those administrative records with statistics, to the extent necessary for the development, production and dissemination of European statistics, which are determined in the European Statistical Programme in accordance with Article 1. Moreover, the NSIs and Eurostat should be consulted on, and involved in, the initial design, subsequent development and discontinuation of administrative records built up and maintained by others, thus facilitating the further use of those records for the purpose of producing European statistics. They should be involved in the standardisation activities concerning administrative records that are relevant for the production of European statistics.

For the pilot in WP4 we have already received 6 months (8 October 2015 till 12 April 2016) of AIS data from Dirkzwager. This data will be stored and analysed in the SANDBOX of the UNECE, so it will be accessible to all members of WP4. For the future we are also looking for other possibilities of getting AIS data on European level, in line with the above-mentioned Regulation. We decided to explore the following sources and compare them on availability and costs:

- European Maritime Safety Agency (EMSA): This data is free, but we expect that the delivery of AIS data by EMSA will take a while. A further difficulty is that EMSA does not allow us to keep the data: after using the data we have to delete it. We hope to obtain this free data in the future. Eurostat can play an important role in this.

- Other alternatives: We are also investigating the possibilities and costs of getting AIS data on European level on a regular basis from Dirkzwager, the Hellenic Coast Guard, Kystverket (The Norwegian Coastal Administration) and Marine Traffic. Note that we already have an answer from Kystverket. They have discussed the topic with their legal experts and found that sharing data from other European countries goes beyond their authorisation. They believe that each NSI must go to the AIS owner in its own country or buy AIS data commercially from a market player such as DNV, Marine Traffic or similar.

- National data: If European data are too costly or too hard to obtain, we will also investigate how national datasets can be obtained. For the Netherlands, Norway, Denmark, Greece and Poland national datasets are available or can be obtained.

The results of investigating these alternative sources will be described in deliverable 1.

7. Data Handling

We discussed which programming language and environment we should use for transforming the 6 months of AIS data on European level that we received from Dirkzwager for this pilot. We also decided that we will store, process and analyse the obtained data in the SANDBOX of the UNECE, so it will be accessible to all members of WP4. The outcome of our discussion about the programming language and environment is summarised in the scheme below. In deliverable 1 we will describe the outcome of this discussion in more detail and explain why we chose a specific tool. Deliverable 1 will also describe the possibilities of creating a JDBC connection.


Tuesday the 10th of May (till lunch)

Project matters

We spent the last half day of our meeting on arranging some project matters.

We decided that the second face-to-face meeting of WP4 will take place in Copenhagen on the 29th and 30th of September 2016.

We talked about the project plan. An important question from Christina was what we mean in the project plan by “AIS data on a European level”. We had defined it as AIS data on ships in European waters, but a better definition for this pilot is: “AIS data from whatever base stations in Europe catch.”

We also developed and implemented a structure for the WIKI-page of WP4. The structure is ready now, but we still have to upload all the important documents and reports.

We also divided the work within SGA-1 of WP4. The results are summarised in the table below.

| Num | Task | Action | Deliverable² | Who | Deadline |
|---|---|---|---|---|---|
| 1. | | Progress and technical report of first internal WP-meeting | Milestone 1 | Anke | 31-05-2016 |
| 2. | Data access | Investigate the possibility of obtaining raw and processed AIS data at a European level (EMSA, Marine Traffic, Dirkzwager, Kystverket, Hellenic Coast Guard) | 4.1 | Paul Esser (Statistics Netherlands), Anke, Christina, Øyvind | 01-07-2016 |
| 3. | Data access | Make an overview of already publicly available AIS-based statistics in Norway | 4.1 | Øyvind, Olav, Maarten, Marco and Christina | 01-07-2016 |
| 4. | Data handling | Which programming language and environment should be used | 4.1 | Olav and Marco | 01-07-2016 |
| 5. | Data handling | Where will the data be processed | 4.1 | Olav and Marco | 01-07-2016 |
| 6. | Methodology and techniques | Build a reference frame of ships in European water (based on AIS data) | 4.2 | Olav and Marco/Maarten | 01-01-2017 |
| 7. | Methodology and techniques | Find out how data from maritime statistics can be linked to AIS data | 4.2 | ALL | 01-01-2017 |
| 8. | Methodology and techniques | Check whether AIS data improves the quality of current statistical outputs and provides more information about the origin/destination of the cargo | 4.2 | ALL | 01-01-2017 |
| 9. | Traffic analyses | Calculate the number of ships at certain coordinates | 4.3 | On the agenda for meeting 29th and 30th September | 01-07-2017 |
| 10. | Traffic analyses | Visualise the results to analyse variations in time | 4.3 | On the agenda for meeting 29th and 30th September | 01-07-2017 |

² Deliverables SGA-1:
4.1 Report on creating a database with AIS-data for official statistics: possibilities and pitfalls
4.2 Report about deriving harbour visits and linking data from port authorities with AIS data
4.3 Report about sea traffic analyses using AIS data

Finally we discussed what we would like to do in SGA-2, because the tender for SGA-2 will take place at the end of June 2016. At this point we suggest the following for SGA-2:

- Estimate emissions (already envisaged under SGA-2): The aim of this task is 1) to infer journeys from AIS-data, 2) to visualise the results, 3) to combine these journeys with a model to calculate emissions and 4) to estimate the impact of carrying out these calculations at the European level on the quality of emissions calculations. This task also includes an overview of the methodology used for calculating emissions by, for example, Kystverket and JRC, and in other literature.

- Future perspectives (already envisaged under SGA-2): The aim of this task is to summarise the project results and perform a qualitative cost-benefit analysis of using AIS-data for official statistics. These analyses should include aspects like sustainability of the data source, possibilities of improving international comparability, possibilities of data sharing (at micro- or aggregated level), quality improvement of current statistics and a sketch of a possible statistical process and the infrastructure needed.

- New statistical output: The aim of this task is to investigate what new statistical output we can make by using AIS data.
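The first step of the emissions task, inferring journeys from AIS-data, could start from something as simple as the sketch below. It assumes decoded reports of one vessel as (timestamp, latitude, longitude) tuples, starts a new journey whenever the gap between two consecutive reports exceeds a threshold, and measures the sailed distance with the haversine formula. The two-hour gap, the spherical Earth radius and all names are illustrative assumptions; no emission model is included because the meeting did not specify one.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean radius, spherical approximation


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def split_into_journeys(reports, max_gap=timedelta(hours=2)):
    """Split time-ordered (timestamp, lat, lon) reports wherever the time gap exceeds max_gap."""
    journeys = []
    current = []
    for report in reports:
        if current and report[0] - current[-1][0] > max_gap:
            journeys.append(current)
            current = []
        current.append(report)
    if current:
        journeys.append(current)
    return journeys


def journey_length_km(journey):
    """Sum of the distances between consecutive positions of one journey."""
    return sum(haversine_km(a[1], a[2], b[1], b[2])
               for a, b in zip(journey, journey[1:]))


# Hypothetical reports of a single vessel.
t0 = datetime(2016, 5, 9, 6, 0)
reports = [
    (t0, 51.90, 4.40),
    (t0 + timedelta(minutes=30), 51.95, 4.20),
    (t0 + timedelta(minutes=60), 52.00, 4.00),
    (t0 + timedelta(hours=8), 52.40, 4.50),
    (t0 + timedelta(hours=8, minutes=30), 52.45, 4.60),
]
journeys = split_into_journeys(reports)
lengths_km = [journey_length_km(journey) for journey in journeys]
```

A time gap is only a crude proxy for the end of a journey. Splitting on harbour visits or on periods in which the vessel lies still would probably be closer to what the task intends, and the resulting distances per journey could then be fed into whichever emission model is selected.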

Concrete actions from this meeting

| Nr | Who | What | When | Status |
|---|---|---|---|---|
| 1. | Marco | Provide GitBucket for WP4 | Asap | |
| 2. | Marco | Provide AIS data on UNECE SANDBOX | Asap | |
| 3. | Anke | Arrange rights for using SANDBOX to members WP4 | 6-6-2016 | |
| 4. | Marco | Send link of experiences on using R-packages for visualising data | 6-6-2016 | |
| 5. | All | Upload documents to WIKI-page | 6-6-2016 | |
| 6. | Øyvind | Ask for specific references for estimating emissions at Kystverket | 6-6-2016 | |
| 7. | Anke | Update the project plan | 6-6-2016 | |

Decisions made so far

| Nr | What |
|---|---|
| 20160404.01 | Generally every four weeks there will be a WebEx meeting on Tuesday afternoon |
| 20160404.02 | The next meeting will take place on the 9th and 10th of May in Heerlen. |
| 20160404.04 | Øyvind puts more effort in task 3 and 4 than in task 1 and 2. |
| 20160404.05 | We use the UNECE Sandbox for saving and sharing the AIS data within WP |
| 20160404.06 | We use GitBucket for sharing setups and codes. |
| 20160404.07 | We use Wiki for internal communication and file sharing |
| 20160404.08 | We use the AIS data on European level from Dirkzwager in this project |
| 20160510.01 | Second face to face meeting will take place in Denmark on the 29th and 30th of September 2016 |
| 20160510.02 | The definition of AIS data on European level for this pilot is: “AIS data from whatever base stations in Europe catch” |
| 20160510.03 | In SGA-1 we investigate whether AIS data improves the quality of current statistical outputs and provides more information about the origin/destination of the cargo. In SGA-2 we propose to investigate which new statistics we can make based on AIS-data |


ANNEX 1: use and integration of administrative records

The use and integration of administrative records are established in Article 17a of REGULATION (EC) No 223/2009 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 11 March 2009 on European statistics, as amended by Regulation (EU) 2015/759 of the European Parliament and of the Council of 29 April 2015.

“...Article 17a:

1. In order to reduce the burden on respondents, the NSIs, other national authorities as referred to in Article 4, and the Commission (Eurostat) shall have the right to access and use, promptly and free of charge, all administrative records and to integrate those administrative records with statistics, to the extent necessary for the development, production and dissemination of European statistics, which are determined in the European Statistical Programme in accordance with Article 1.

2. The NSIs and the Commission (Eurostat) shall be consulted on, and involved in, the initial design, subsequent development and discontinuation of administrative records built up and maintained by other bodies, thus facilitating the further use of those records for the purpose of producing European statistics. They shall be involved in the standardisation activities concerning administrative records that are relevant for the production of European statistics.

3. Access by, and involvement of the NSIs, other national authorities and the Commission (Eurostat) pursuant to paragraphs 1 and 2 shall be limited to administrative records within their own respective public administrative system.

4. Administrative records made available by their owners to the NSIs, other national authorities and the Commission (Eurostat) in order to be used for the production of European statistics shall be accompanied by relevant metadata.

5. The NSIs and owners of administrative records shall establish the necessary cooperation mechanisms. ...”

http://eur-lex.europa.eu/legal-content/EN/TXT/PDF/?uri=CELEX:02009R0223-20150608&qid=1435820363605&from=EN