
Big data REference Architecture and Layers Work Package

Roma Meeting, 31st May 2019

Reporting of the design workshop for WP D & WP E

Request was to produce (at least starting to reason about):

For information architecture: a data model for input data/working data/validated data, a metadata model for input data/working data/validated data

For application architecture: a building block model for acquisition and recording and maybe ‘data wrangling’ and ‘data representation’

Group D+E participants: Remco, Peter, Frederick, Sonia

Data layers

After the explanation from the previous day, the group started by trying to understand the types of data that the data model had to account for. The two work packages were compared with regard to their data.

Input data

WP D (Smart Meters)                 WP E (AIS Ships data)
Signal data from the Smart Meter    Signal data from the Ship
Subscription information            Ship Register
GIS Data to derive Geo code         GIS Data to derive ports

WP D – Smart Meter
o Signal data from the Smart Meter
    Smart Meter ID
    Volume Used
    Timestamp
o Subscription Information
    Address
    Name
    VAT
o GIS Data to derive Geo Code

WP E – Ships Data
o Signal data from the Ship
    Ship id
    Antenna id -> Position
    Timestamp
o Ship Register
    Ship id
    Ship specifications
    Ship nationality
o GIS Data to derive Ports
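The input data elements listed above can be sketched as simple record types. This is only an illustrative sketch: the field names follow the lists in this report, but the Python types, the unit of the volume, and the use of the Smart Meter ID as the link key on the subscription record are assumptions, not something agreed in the workshop.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class SmartMeterSignal:
    """WP D signal data from the Smart Meter."""
    smart_meter_id: str
    volume_used: float  # unit is not stated in the report
    timestamp: datetime


@dataclass(frozen=True)
class Subscription:
    """WP D subscription information."""
    smart_meter_id: str  # assumed link key to the signal data
    address: str
    name: str
    vat: str


@dataclass(frozen=True)
class ShipSignal:
    """WP E signal data from the Ship."""
    ship_id: str
    antenna_id: str  # the position is derived from the antenna
    timestamp: datetime


@dataclass(frozen=True)
class ShipRegisterEntry:
    """WP E Ship Register entry."""
    ship_id: str
    specifications: str
    nationality: str


# Hypothetical example values, for illustration only.
signal = SmartMeterSignal("SM-001", 12.5, datetime(2019, 5, 31, 10, 0))
entry = ShipRegisterEntry("SHIP-42", "hypothetical specification text", "IT")
```

The GIS data used to derive geo codes and ports is left out of the sketch because the report does not list its elements.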

The major difference encountered concerned the information that complements the signal data. The subscription information for WP D is provided by the same source as the signal data and is therefore directly linkable, while the Ship register comes from a different data source, most probably each NSI. In both cases the metadata is available in detailed technical specifications.

The group discussed whether this input data was the collected data. It was not clear whether data would be collected as is from the source or whether there was an opportunity for pushing computation out.

The Reference Methodological Framework being followed in WP I tries to identify, in the (wild) mobile phone data ecosystem, a specific (possibly elaborated, not directly raw) data set or type of data such that:

this data set is the output of any preprocessing that data holders need to carry out;

any statistical analysis carried out later by NSIs can be made using these data alone, without any piece of information that precedes the elaboration of this data set.

This way of achieving functional modularity could be applied in WP D and WP E to avoid a strong dependency on the technology that generates the data, while reinforcing the principle ‘collect once, use many’.

From this perspective the focus shifted to the working and validated data that would be necessary to shape any statistical product.

In both cases it was considered that this data is where NSI-specific processing could be required, and where:

(i) computation would not be pushed out further
(ii) all the necessary data elements are present to create the statistical products

Working/Validated data

WP D – Smart Meter
o Statistical Units
    Household/Business/Dwelling
    Volume Used
    Timestamp
    Duration (day, month,…)
o Aggregated Data
o Outputs
    Household consumption
    Business Consumption
    Vacant dwelling

WP E – AIS Ships Data
o Statistical Units (Mapped to Register)
    Ships
    Ports
    Routes
o Aggregated Data
o Output
    Visits to Ports
    Duration of a trip
    Route
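As an illustration of how the WP D statistical-unit data above could feed the aggregated data, the following minimal sketch sums the volume used per unit and calendar month. The function name, the tuple layout of a reading, and the choice of a calendar month as the duration are assumptions made for the example only.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_monthly(readings):
    """Sum the volume used per (unit_id, year, month).

    Each reading is a (unit_id, volume_used, timestamp) tuple.
    """
    totals = defaultdict(float)
    for unit_id, volume_used, timestamp in readings:
        totals[(unit_id, timestamp.year, timestamp.month)] += volume_used
    return dict(totals)


# Hypothetical readings for one household and one business.
readings = [
    ("household-1", 2.5, datetime(2019, 5, 1, 8, 0)),
    ("household-1", 1.5, datetime(2019, 5, 2, 8, 0)),
    ("household-1", 3.0, datetime(2019, 6, 1, 8, 0)),
    ("business-7", 10.0, datetime(2019, 5, 1, 9, 0)),
]
monthly = aggregate_monthly(readings)
```

Outputs such as household or business consumption would then be derived from these totals by the type of statistical unit.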

Convergence data

The data in the convergence layer should already be processed, and in principle its computation could be pushed out to the data producers/holders.

The data elements in this layer should be impervious to technology changes and should also make it possible to create all the statistical products.

WP D – Smart Meter
o Mapped to Register Data
    Identification removed… (not trivial as there are relations 1:N and N:1)
o Validated Data
o Combined Data
    Data already classified

WP E – AIS Ships Data
o Statistical Units (Mapped to Register)
    Ships
    Position
    Cargo/Goods/Freight
    Number of Passengers
    Vessel classification
    Ship nationality
    Timestamp

To derive these data elements while pushing computation out, it may be necessary to provide registers to the data producers/holders, as in the case of WP E AIS Ships Data, where the Ships register is not held by the holders of the signal data. Other auxiliary data may also have to be provided by the NSIs, but the computation does not have to be carried out on premises.
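The idea of providing a register to the data holder can be sketched as follows for WP E. This is a hypothetical illustration, not a design agreed in the workshop: it assumes the signal already carries a derived position, that the register is a dictionary keyed by ship id, and that signals without a register match are reported back rather than dropped silently.

```python
def to_convergence_records(signals, ship_register):
    """Enrich positioned ship signals with register attributes.

    Returns (records, unmatched_ship_ids). The register is assumed to be
    supplied by the NSI to whoever runs this computation.
    """
    records, unmatched = [], []
    for signal in signals:
        entry = ship_register.get(signal["ship_id"])
        if entry is None:
            unmatched.append(signal["ship_id"])
            continue
        records.append({
            "ship_id": signal["ship_id"],
            "position": signal["position"],
            "timestamp": signal["timestamp"],
            "vessel_classification": entry["vessel_classification"],
            "ship_nationality": entry["ship_nationality"],
        })
    return records, unmatched


# Hypothetical register and signals, for illustration only.
register = {
    "SHIP-42": {"vessel_classification": "cargo", "ship_nationality": "IT"},
}
signals = [
    {"ship_id": "SHIP-42", "position": (41.9, 12.5), "timestamp": "2019-05-31T10:00"},
    {"ship_id": "SHIP-99", "position": (38.7, -9.1), "timestamp": "2019-05-31T10:05"},
]
records, unmatched = to_convergence_records(signals, register)
```

Because the function only needs the signals and the register as inputs, it could run either at the data holder or at the NSI.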

Business Functions

The business functions from BREAL were mapped to the 3 layers considered:

Source Layer
o Acquisition and Recording

Convergence Layer
o Data Integration
o Data Wrangling
o Part of Data Representation

Statistical Layer
o Data Representation
o Modelling and Interpretation
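The mapping above can also be written down as a small lookup structure, which makes it easy to see that Data Representation spans two layers. The dictionary layout, the `partial` flag and the helper name are assumptions introduced for this sketch.

```python
# Layer -> list of (business function, partial) pairs, as mapped in the workshop.
# partial=True marks a function that only partly belongs to the layer.
LAYER_FUNCTIONS = {
    "Source Layer": [("Acquisition and Recording", False)],
    "Convergence Layer": [
        ("Data Integration", False),
        ("Data Wrangling", False),
        ("Data Representation", True),
    ],
    "Statistical Layer": [
        ("Data Representation", False),
        ("Modelling and Interpretation", False),
    ],
}


def layers_for(function_name):
    """Return the layers in which a business function appears."""
    return [
        layer
        for layer, functions in LAYER_FUNCTIONS.items()
        if any(name == function_name for name, _partial in functions)
    ]


representation_layers = layers_for("Data Representation")
```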

A major result of the discussion was that validation plays an important role in the convergence layer: not yet validation of the outputs, but validation more complex than what can be tackled at the source layer.

Whenever we are dealing with big data, the validation business function assumes a more important role. It should not only deal with missing data or signal errors, but should also confront the data with other data the NSIs may possess.
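The three kinds of check mentioned above (missing data, signal errors, confrontation with other NSI data) can be sketched for a smart meter reading as follows. The concrete rules are assumptions chosen for illustration: a negative volume stands in for a signal error, and a set of meter ids known to the NSI stands in for the other data the NSI may possess.

```python
def validate_reading(reading, known_meter_ids):
    """Return a list of validation issues for one reading (empty if none)."""
    issues = []
    # Missing data: a required value is absent.
    if reading.get("volume_used") is None or reading.get("timestamp") is None:
        issues.append("missing data")
    # Signal error (assumed rule): a negative volume cannot be a real reading.
    elif reading["volume_used"] < 0:
        issues.append("signal error: negative volume")
    # Confrontation with NSI data: the meter should be known to the NSI.
    if reading.get("smart_meter_id") not in known_meter_ids:
        issues.append("not found in NSI register")
    return issues


# Hypothetical data, for illustration only.
known = {"SM-001", "SM-002"}
ok = validate_reading(
    {"smart_meter_id": "SM-001", "volume_used": 3.2, "timestamp": "2019-05-31T10:00"},
    known,
)
bad = validate_reading(
    {"smart_meter_id": "SM-009", "volume_used": -1.0, "timestamp": "2019-05-31T10:00"},
    known,
)
```

The first two checks could already run at the source layer, while the last one needs data that only the NSI (or a holder supplied with it) can provide.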
