Tomas Knap: UnifiedViews in COMSODE pilot projects


Upload: semantic-web-company

Post on 13-Apr-2017


The COMSODE project has received funding from the Seventh Framework Programme

of the European Union in the grant agreement number 611358.

UnifiedViews in COMSODE pilot projects

Tomas Knap1,2, Jakub Klimek2

1EEA s.r.o.,

http://www.eea.sk/

2Charles University in Prague,

Department of Software Engineering,

XML and Web Engineering Research Group

Agenda

UnifiedViews

UnifiedViews in Open Data Node

Pilot Applications

Slovak Environmental Agency

Czech Trade Inspection Authority

UnifiedViews

UnifiedViews

A tool for management of RDF data processing tasks

Task = progression of data processing units (DPUs)

Sample task:

Extract data from SPARQL Endpoint A

Extract data from CSV file B

Refine data with SPARQL queries X, Y, Z

Deduplicate data using Linker L

Publish data to SPARQL Endpoint B
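The sample task above can be sketched as a chain of functions, each standing in for one DPU that consumes the previous DPU's output. This is a hypothetical Python model for illustration only, not the actual UnifiedViews plugin API: the function names, IRIs and labels are made up, triples are plain string tuples, and the "endpoint" is an in-memory dictionary.

```python
from typing import Callable, Dict, List, Set, Tuple

# A triple is (subject, predicate, object); plain strings keep the sketch dependency-free.
Triple = Tuple[str, str, str]
Graph = Set[Triple]
DPU = Callable[[Graph], Graph]  # hypothetical: a DPU maps an input graph to an output graph

# In-memory stand-in for the target store "SPARQL Endpoint B".
ENDPOINT_B: Dict[str, Graph] = {}

def extract_sparql_a(graph: Graph) -> Graph:
    # Would query SPARQL Endpoint A; here it adds a fixed triple.
    return graph | {("ex:site1", "rdfs:label", "Site One")}

def extract_csv_b(graph: Graph) -> Graph:
    # Would parse CSV file B; here it adds a fixed triple.
    return graph | {("ex:site2", "rdfs:label", "Site One")}

def refine(graph: Graph) -> Graph:
    # Would run refinement queries X, Y, Z; here it types every labelled subject.
    return graph | {(s, "rdf:type", "ex:Site") for (s, p, _) in graph if p == "rdfs:label"}

def link_duplicates(graph: Graph) -> Graph:
    # Would call Linker L; here subjects with identical labels get owl:sameAs links.
    by_label: Dict[str, List[str]] = {}
    for s, p, o in graph:
        if p == "rdfs:label":
            by_label.setdefault(o, []).append(s)
    links: Graph = set()
    for subjects in by_label.values():
        subjects.sort()
        for other in subjects[1:]:
            links.add((subjects[0], "owl:sameAs", other))
    return graph | links

def publish_b(graph: Graph) -> Graph:
    # Would load the graph into SPARQL Endpoint B; here it stores a copy.
    ENDPOINT_B["default"] = set(graph)
    return graph

def run_task(dpus: List[DPU]) -> Graph:
    # A task is a progression of DPUs: each one receives the previous one's output.
    graph: Graph = set()
    for dpu in dpus:
        graph = dpu(graph)
    return graph

result = run_task([extract_sparql_a, extract_csv_b, refine, link_duplicates, publish_b])
```

Modelling each DPU as a graph-to-graph function keeps the units independent, so a task can be rearranged or extended by editing the list passed to `run_task`.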

RDF Data Processing Task

UnifiedViews

UnifiedViews allows users to define, execute,

monitor, debug, schedule, and share tasks

UnifiedViews is an ETL tool for RDF data

It differs from other ETL tools by natively

supporting RDF data

UnifiedViews provides a set of plugins (DPUs) for working with RDF data, and new custom plugins can easily be created

Open source, http://unifiedviews.eu

UnifiedViews Team

http://unifiedviews.eu

UnifiedViews Demo

UnifiedViews in Open Data Node

Open Data Node

Publication platform for (Linked) Open Data

Open Source

Developed in the COMSODE project (2013-2015)

Open Data Node

http://opendatanode.org/

Pilot Application

Slovak Environmental Agency

Mission of the Slovak Environmental Agency

(SEA)

Policy support

Design and Implementation

Data provider/integrator

LandCover, Environmental burden, waste dumps

Infrastructure provider

Data Services (DB servers)

Consultancy provider

Analysis and design of environmental information systems

Initial Situation/Motivation

SEA publishes various geospatial data from the

environmental domain

SEA wanted to explore the potential to increase reuse of their data if published as Linked Data

Goals

Publish the following datasets as Linked Data:

Protected sites, species distribution, bio-geographical regions, land cover, contaminated sites registered as environmental burdens

Harvest and convert source data to RDF

Source data is available in the Geography Markup Language (GML) through a Web Feature Service (WFS) interface, typically following the INSPIRE schemas

Initial barrier: the vocabularies mapping the INSPIRE XML schemas to

RDF were not available

Interlink with relevant RDF/Linked data resources

Provide visualizations, interface for querying
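Harvesting the GML source data starts with a WFS GetFeature request. The sketch below only builds such a request URL with the Python standard library; the endpoint is hypothetical (the slides do not give the SEA data service address), and the feature type name `ps:ProtectedSite` and page size are assumptions. The parameter names follow WFS 2.0.0, which typically returns GML by default.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical endpoint: the slides do not give the SEA data service URL.
WFS_ENDPOINT = "https://example.org/geoserver/wfs"

def get_feature_url(endpoint: str, type_names: str, count: int = 100) -> str:
    """Build a WFS 2.0.0 GetFeature request URL for one feature type."""
    params = {
        "service": "WFS",
        "version": "2.0.0",
        "request": "GetFeature",
        "typeNames": type_names,  # WFS 2.0 name; WFS 1.x servers expect "typeName"
        "count": count,           # WFS 2.0 page size; WFS 1.x used "maxFeatures"
    }
    return f"{endpoint}?{urlencode(params)}"

# Assumed INSPIRE feature type name for protected sites.
url = get_feature_url(WFS_ENDPOINT, "ps:ProtectedSite")
query = parse_qs(urlparse(url).query)  # decode the query again to inspect the parameters
```

In a harvesting pipeline, the response to this URL would be the GML document passed on to the transformation step.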

Approach and IT solution

Successfully deployed ODN with UnifiedViews on

remote cloud infrastructure of SEA

For each dataset, we built a data processing pipeline in UnifiedViews that harvested the data from the data service and converted it to RDF via XSL transformations.

We also created pipelines for enriching the datasets

with links to external datasets

We associated these pipelines with the datasets in the catalog

Approach and IT solution

Data Transformation

Since GML is an XML format, we converted it to RDF via XSL transformations.

We extended the XSL transformations developed by the GeoKnow project (http://geoknow.eu)

The target vocabularies produced by the

transformations were derived from the INSPIRE

schemas and were simplified and adjusted to match

linked data conventions

Done in cooperation with the SmartOpenData project (http://www.w3.org/2015/03/inspire)
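The pilot performed the GML-to-RDF mapping with XSL transformations. Because the Python standard library has no XSLT processor, the sketch below swaps in `xml.etree.ElementTree` to show the same idea: each GML feature becomes an RDF resource named after its `gml:id`, and selected child elements become properties, written out as N-Triples. Only the GML 3.2, RDF and RDFS namespaces are real; the feature namespace, target vocabulary, base IRI and sample data are hypothetical, and this is not the GeoKnow or SmartOpenData mapping.

```python
import xml.etree.ElementTree as ET

# GML 3.2 namespace is real; the "ps" namespace and target vocabulary are hypothetical.
NS = {"gml": "http://www.opengis.net/gml/3.2", "ps": "http://example.org/ps"}
VOCAB = "http://example.org/def/ps#"
BASE = "http://example.org/resource/site/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"

SAMPLE_GML = """
<collection xmlns:gml="http://www.opengis.net/gml/3.2" xmlns:ps="http://example.org/ps">
  <ps:ProtectedSite gml:id="PS.1"><ps:siteName>Example Reserve</ps:siteName></ps:ProtectedSite>
  <ps:ProtectedSite gml:id="PS.2"><ps:siteName>Other Reserve</ps:siteName></ps:ProtectedSite>
</collection>
"""

def literal(text: str) -> str:
    # Escape backslashes, quotes and newlines for an N-Triples string literal.
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'

def gml_to_ntriples(xml_text: str) -> list:
    root = ET.fromstring(xml_text.strip())
    gml_id = "{%s}id" % NS["gml"]  # namespaced attributes use Clark notation in ElementTree
    lines = []
    for site in root.findall("ps:ProtectedSite", NS):
        subject = f"<{BASE}{site.get(gml_id)}>"
        lines.append(f"{subject} <{RDF_TYPE}> <{VOCAB}ProtectedSite> .")
        name = site.findtext("ps:siteName", default="", namespaces=NS)
        if name:
            lines.append(f"{subject} <{RDFS_LABEL}> {literal(name)} .")
    return lines

ntriples = gml_to_ntriples(SAMPLE_GML)
```

An XSLT stylesheet expresses the same element-to-triple rules declaratively, which is one reason existing transformations from other projects could be extended rather than rewritten.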

Approach and IT solution

Data Enrichment

We link the datasets to external datasets, including Geonames.org and datasets from the European Environment Agency:

Biogeographical regions 2011

Natura 2000

EUNIS
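The slides do not say how the links to these external datasets were computed. As a hedged illustration only, the sketch below swaps in a naive matcher: a local site and an external place are linked with `owl:sameAs` when their names match after removing diacritics and case, and their points lie within a distance threshold (haversine formula). All IRIs, names, coordinates and the 10 km threshold are hypothetical.

```python
import math
import unicodedata
from typing import Dict, List, Tuple

OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
Point = Tuple[float, float]  # (latitude, longitude) in degrees

def normalize(name: str) -> str:
    # Drop diacritics and case so that e.g. "Záhorie" and "ZAHORIE" compare equal.
    decomposed = unicodedata.normalize("NFKD", name)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold().strip()

def haversine_km(a: Point, b: Point) -> float:
    # Great-circle distance on a sphere with mean Earth radius 6371 km.
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = math.sin((lat2 - lat1) / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * math.asin(math.sqrt(h))

def link(local: Dict[str, Tuple[str, Point]], external: Dict[str, Tuple[str, Point]],
         max_km: float = 10.0) -> List[Tuple[str, str, str]]:
    links = []
    for l_iri, (l_name, l_pt) in local.items():
        for e_iri, (e_name, e_pt) in external.items():
            if normalize(l_name) == normalize(e_name) and haversine_km(l_pt, e_pt) <= max_km:
                links.append((l_iri, OWL_SAMEAS, e_iri))
    return links

# Hypothetical data: one local site, two external places with the same name.
local_sites = {"http://example.org/resource/site/PS.1": ("Záhorie", (48.5, 17.0))}
external_places = {
    "http://example.org/geonames/1": ("Zahorie", (48.52, 17.03)),  # about 3 km away
    "http://example.org/geonames/2": ("Zahorie", (40.0, 20.0)),    # far away, not linked
}
links = link(local_sites, external_places)
```

The distance check keeps same-named but unrelated places from being linked; dedicated link-discovery tools offer far richer similarity measures than this pairwise loop.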

Benefits of the Semantic Solution

A key benefit of the RDF version of the SEA

datasets is that it is straightforward to combine it

with third-party datasets

We linked the data to the GeoNames, Natura 2000 and EUNIS datasets

Lessons Learned

Open Data Node (and UnifiedViews) was able

to transform, enrich and publish RDF data in a

simple way, allowing easy maintenance for the

future

Making the data you publish adhere to common standards, such as the INSPIRE schemas, makes it more reusable

XSL transformations from other projects can be reused

Next Steps

Linking more third-party datasets and extending

the coverage of the source data included in the

RDF version

Data visualizations are being designed

Developed as extensions of LDVMi (http://ldvm.net).

Demo

Pilot Application

Czech Trade Inspection Authority

Mission of Czech Trade Inspection

Authority (CTIA)

Monitors and inspects businesses and

individuals who

Supply goods

Sell goods

Provide services

Provide consumer credit

Operate marketplaces

Initial Situation

CTIA publishes CSV data about

Inspections

Penalties

Bans

Motivation

CTIA wanted to publish their data

To be used by third-party applications

Instead of building their own map

visualizations

Goals

CTIA wanted to be (and became) the first Czech public administration institution to publish data in RDF (LOD)

CTIA wanted to publish additional anonymized

datasets

Approach and IT solution

UnifiedViews was successfully deployed, and pipelines were prepared to publish the source data as Linked Open Data
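CTIA's source data is CSV, so a pipeline essentially turns each row into a set of RDF statements. The sketch below shows that mapping with the standard `csv` module, emitting N-Triples with typed literals. The column names, base IRI, property names and sample rows are all hypothetical; the actual CTIA schema and the UnifiedViews DPUs used are not described in the slides.

```python
import csv
import io

XSD = "http://www.w3.org/2001/XMLSchema#"
EX = "http://example.org/ctia/"  # hypothetical base IRI and vocabulary

# Hypothetical CSV layout; the second inspection has no penalty recorded.
SAMPLE_CSV = """inspection_id,date,company_id,penalty_czk
1001,2015-03-02,12345678,5000
1002,2015-03-05,87654321,
"""

def row_to_ntriples(row: dict) -> list:
    s = f"<{EX}inspection/{row['inspection_id']}>"
    out = [
        f'{s} <{EX}date> "{row["date"]}"^^<{XSD}date> .',
        f"{s} <{EX}inspectedEntity> <{EX}entity/{row['company_id']}> .",
    ]
    if row["penalty_czk"]:  # empty cell: no penalty, so no triple is emitted
        out.append(f'{s} <{EX}penaltyCZK> "{row["penalty_czk"]}"^^<{XSD}decimal> .')
    return out

def csv_to_ntriples(text: str) -> list:
    reader = csv.DictReader(io.StringIO(text))
    return [triple for row in reader for triple in row_to_ntriples(row)]

triples = csv_to_ntriples(SAMPLE_CSV)
```

Skipping empty cells instead of emitting empty literals keeps "no penalty" distinguishable from a penalty of zero.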

Benefits of the Semantic Solution

A map application emerged

Uses RDF data combined with other datasets

Registry of Business Entities

Google Maps
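The map application combines the inspection data with a business registry that supplies locations. A minimal sketch of that join step, assuming hypothetical inputs (inspection to company identifier, and company identifier to coordinates already geocoded from the registry), aggregates inspections per company into map markers; rendering them on Google Maps is out of scope, and this is not the actual application's code.

```python
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical inputs: inspection ID -> company ID (from the CTIA RDF data),
# and company ID -> (lat, lon) geocoded from a business registry.
inspections: Dict[str, str] = {"1001": "12345678", "1002": "12345678",
                               "1003": "87654321", "1004": "00000000"}
registry: Dict[str, Tuple[float, float]] = {"12345678": (50.08, 14.42),
                                            "87654321": (49.19, 16.61)}

def map_markers(inspections: Dict[str, str],
                registry: Dict[str, Tuple[float, float]]) -> List[dict]:
    counts = Counter(inspections.values())  # number of inspections per company
    markers = []
    for company_id, n in sorted(counts.items()):
        if company_id not in registry:
            continue  # no known location, so the company cannot be placed on the map
        lat, lon = registry[company_id]
        markers.append({"company_id": company_id, "lat": lat, "lon": lon, "inspections": n})
    return markers

markers = map_markers(inspections, registry)
```

Because the CTIA data only needs a shared company identifier to be joined with the registry, the map could be built by a third party without CTIA building its own visualization.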

Lessons Learned and Next Steps

Publishing data as LOD pays off

Publishing data as LOD is not difficult

All you need to start is a spare PC

CTIA is in the process of implementing the COMSODE methodology for publishing open data

Demo

Resulting data published:

http://www.coi.cz/cz/spotrebitel/open-data-databaze-kontrol-sankci-a-zakazu/

(in Czech)

Conclusions

Conclusions

UnifiedViews

http://unifiedviews.eu

Open Data Node

http://opendatanode.org

Pilots:

Slovak Environmental Agency

Czech Trade Inspection Authority
