Methodology for the publication of Linked Open Data from small and medium size DMO



ENTER 2015 Research Track Slide Number 1

Methodology for the publication of Linked Open Data from small and medium size DMO

Ander García, Maria Teresa Linaza, Javier Franco and Miriam Juaristi

Vicomtech-IK4, Spain

http://www.vicomtech.org

ENTER 2015 Research Track Slide Number 2

Outline

• Introduction
• Methodology
• Application to a small DMO
• Conclusions

ENTER 2015 Research Track Slide Number 3

Introduction

• Linked Open Data (LOD) = OD + LD:
  – non-privacy-restricted and non-confidential data produced with public money
  – made available without any restrictions on its usage or distribution
  – published on the Web
  – machine readable
  – its meaning is explicitly defined
  – it is linked to other external datasets

ENTER 2015 Research Track Slide Number 4

Introduction

• Publishing LOD involves 3 basic steps:
  – assign URIs to the entities described by the dataset and dereference these URIs over the HTTP protocol into RDF representations
  – set RDF links to other data sources on the Web
  – provide metadata about published data

• Researchers have proposed different methodologies to publish LOD for a variety of domains

ENTER 2015 Research Track Slide Number 5

Introduction

• Although previous examples could serve as guidelines for DMOs, the differences between domains require specific methodologies and examples for tourism LOD

• Benefits for DMOs:
  – Syntactic interoperability
  – Reduction of the costs of applications
  – Provision of innovative added-value services

ENTER 2015 Research Track Slide Number 6

Introduction

• Tourism OD:
  – Several DMOs publish OD, for example Open Data Euskadi from the Basque Country, which has published (since 2010):
    • POIs
    • Stays
    • Restaurants
    • Cultural entities
    • …

ENTER 2015 Research Track Slide Number 7

Introduction

• Tourism LOD:
  – Few examples
  – No linking to other datasets
  – No reuse of ontologies
  – No active URIs

ENTER 2015 Research Track Slide Number 8

Introduction

• Objective: Publication of 5-star LOD by DMOs
• Benefits for DMOs:
  – Syntactic interoperability
  – Reduction of the costs of applications
  – Provision of innovative added-value services

ENTER 2015 Research Track Slide Number 9

Methodology

• Oriented to small and medium size DMOs
• Based on Open Source tools
• Main steps:
  – Configuration
  – Pre-processing
  – Triplification
  – Publication

ENTER 2015 Research Track Slide Number 10

Configuration

• Non-technical issues:
  – Selection and categorization of data
  – Publication license, main options:
    • Public domain: free to share, create and adapt
    • Attribution: requires mentioning the source data
    • Share-alike: requires public reusers of your data to share back changes (and attribute)
    • Keep-open: if the data or an adaptation of it is redistributed, a free version must also be redistributed

ENTER 2015 Research Track Slide Number 11

Configuration

• Technical issues:
  – Design of URIs
  – Multilingual data publication patterns:
    • All languages associated with the same/different URI
    • Use labels
    • …

ENTER 2015 Research Track Slide Number 12

Pre-processing

• Extract, clean, and normalise data:
  – Format for strings and numbers
  – Format for multilingual values
  – Storage of multivalued fields
  – Detection of errors and non-existent values
• LOD Refine software to transform the original data with formulas expressed in GREL

ENTER 2015 Research Track Slide Number 13

Triplification

• Analyse the domain: ontologies, datasets, vocabularies
• Create new ontologies and/or vocabularies if required
• Express data as RDF triples
• Link data to external datasets
• LOD Refine software to generate RDF triples

ENTER 2015 Research Track Slide Number 14

Publication

• Both as OD (repository) and LOD (triple store)

• Include metadata adhering to the Data Catalog Vocabulary (DCAT)

• DKAN (OD) and Virtuoso (LOD) software
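To illustrate the DCAT metadata mentioned above, the following minimal sketch builds a JSON-LD description of a dataset with the Python standard library. The class and property names (dcat:Dataset, dcat:Distribution, dcat:distribution, dcat:downloadURL, dcat:mediaType, dct:title, dct:license) belong to the DCAT and Dublin Core vocabularies. The example.org URLs, the title and the helper name build_dcat_record are assumptions made for this example and do not come from the slides.

```python
import json

DCAT_CONTEXT = {
    "dcat": "http://www.w3.org/ns/dcat#",
    "dct": "http://purl.org/dc/terms/",
}

def build_dcat_record(dataset_uri, title, license_uri, downloads):
    """Return a JSON-LD dict describing one dataset and its distributions.

    `downloads` maps a media type (e.g. "text/csv") to a download URL.
    """
    distributions = [
        {
            "@type": "dcat:Distribution",
            "dcat:downloadURL": {"@id": url},
            "dcat:mediaType": media_type,
        }
        for media_type, url in sorted(downloads.items())
    ]
    return {
        "@context": DCAT_CONTEXT,
        "@id": dataset_uri,
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:license": {"@id": license_uri},
        "dcat:distribution": distributions,
    }

# Hypothetical values: none of these URLs appear in the slides.
record = build_dcat_record(
    "http://example.org/dataset/pois",
    "Points of interest",
    "http://example.org/licenses/open",
    {
        "text/csv": "http://example.org/files/pois.csv",
        "application/rdf+xml": "http://example.org/files/pois.rdf",
    },
)
print(json.dumps(record, indent=2))
```

One distribution per downloadable file keeps the OD repository view and the LOD view described by the same record.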

ENTER 2015 Research Track Slide Number 15

Methodology

Step            | Task                                                 | Tool
Configuration   | Select data                                          | -
Configuration   | Select the license to publish the data               | -
Configuration   | Design the URI scheme                                | -
Configuration   | Select a multilingual data publication pattern       | -
Pre-processing  | Clean and normalise the data                         | LOD Refine
Triplification  | Select existing ontologies, vocabularies and LOD     | -
Triplification  | Define new ontologies and vocabularies (if required) | Protégé
Triplification  | Triplification                                       | LOD Refine
Triplification  | Link to external LOD                                 | LOD Refine
Publication     | Upload the RDF file to a triple store                | Virtuoso
Publication     | Create the dataset and add metadata                  | DKAN
Publication     | Upload the resources of the datasets                 | DKAN

ENTER 2015 Research Track Slide Number 16

Application to a small DMO

• Dataset: 143 POIs of a regional DMO, Urola Kosta, in four languages (Spanish, Basque, English, French) and five categories

• Configuration:
  – PDDL license, no restrictions
  – URI: /data/tourism/BASQUE_NAME
    • Ñ replaced by ‘in’ and spaces by ‘_’
  – Labels for multilingual data
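The URI rule above can be sketched in a few lines of Python. This is only an illustration: the base address http://example.org, the function name mint_uri and the sample name are assumptions, as is treating lower-case ñ the same way as Ñ. Only the /data/tourism/BASQUE_NAME pattern and the two replacements come from the slide.

```python
from urllib.parse import quote

# Hypothetical base; the slides only give the path pattern /data/tourism/BASQUE_NAME.
BASE_URI = "http://example.org/data/tourism/"

def mint_uri(basque_name, base=BASE_URI):
    """Build a POI URI from its Basque name using the two replacement rules."""
    slug = basque_name.strip()
    # Rule 1: Ñ becomes 'in' (handling lower-case ñ the same way is an assumption).
    slug = slug.replace("Ñ", "in").replace("ñ", "in")
    # Rule 2: spaces become '_'.
    slug = slug.replace(" ", "_")
    # Extra precaution, not from the slides: percent-encode any other unsafe character.
    return base + quote(slug, safe="_")

# Illustrative name only.
print(mint_uri("Iñurritza biotopoa"))
```

Keeping the rule in one function means every POI gets its URI the same way each time the data is republished.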

ENTER 2015 Research Track Slide Number 17

Application to a small DMO

• Pre-processing
  – Names to title case, from ERREXIl to Errexil
  – Prefix added to telephone numbers: (+34)943309230
  – Secondary mobile numbers stored in a new column
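The presentation performs these clean-up operations inside LOD Refine. As a rough Python stand-in (not the expressions actually used), the sketch below applies the same three operations to one record. The function names, the nine-digit number assumption, the "/" separator and the second phone number are all made up for the example.

```python
import re

def normalise_name(raw):
    """Convert a name to title case, e.g. 'ERREXIl' becomes 'Errexil'."""
    return raw.strip().title()

def split_phones(raw, prefix="(+34)"):
    """Return (primary, secondary) phone numbers with the country prefix added.

    Assumes nine-digit national numbers; a missing number is returned as None.
    """
    compact = re.sub(r"[ .\-]", "", raw)         # drop spacing inside numbers
    numbers = re.findall(r"\d{9}", compact)      # pick out nine-digit runs
    prefixed = [prefix + n for n in numbers[:2]]
    prefixed += [None] * (2 - len(prefixed))     # pad when a number is missing
    return tuple(prefixed)

# Made-up input row; the second number is invented for the example.
row = {"name": "ERREXIl", "phone": "943 30 92 30 / 600 000 000"}
primary, secondary = split_phones(row["phone"])
cleaned = {
    "name": normalise_name(row["name"]),
    "phone": primary,
    "phone_secondary": secondary,  # the new column for secondary numbers
}
print(cleaned)
```

Note that str.title() capitalises after every non-letter character, so names containing apostrophes or hyphens may need a more careful rule.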

ENTER 2015 Research Track Slide Number 18

Application to a small DMO

• Triplification
  – Ontologies:
    • vCard: Contact information
    • Dublin Core: Metadata about the resource
  – Linked datasets:
    • GeoNames: Locations
    • DBpedia: Categories and locations
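A minimal sketch of what the resulting triples might look like, written as N-Triples with plain Python. The slides name the vocabularies but not the individual properties, so the choice of dct:title, vcard:hasTelephone and dct:spatial is an assumption, as are the POI, its labels, the example.org URI and the DBpedia resource used as the link target.

```python
VCARD = "http://www.w3.org/2006/vcard/ns#"
DCT = "http://purl.org/dc/terms/"

def literal(text, lang=None):
    """Serialise a plain or language-tagged literal in N-Triples syntax."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"' + (f"@{lang}" if lang else "")

def triple(subject, predicate, obj):
    """Format one N-Triples line; `obj` is an already serialised literal or <IRI>."""
    return f"<{subject}> <{predicate}> {obj} ."

def poi_triples(uri, titles, phone, place_uri):
    """Describe one POI: multilingual titles, a phone number and a location link."""
    lines = [
        triple(uri, DCT + "title", literal(text, lang))
        for lang, text in sorted(titles.items())
    ]
    lines.append(triple(uri, VCARD + "hasTelephone", f"<tel:{phone}>"))
    lines.append(triple(uri, DCT + "spatial", f"<{place_uri}>"))  # link to an external dataset
    return lines

# Illustrative values only; none of them are taken from the published dataset.
lines = poi_triples(
    "http://example.org/data/tourism/Zarauzko_hondartza",
    {"eu": "Zarauzko hondartza", "en": "Zarautz beach"},
    "+34943309230",
    "http://dbpedia.org/resource/Zarautz",
)
print("\n".join(lines))
```

Language tags on the literals correspond to the "labels" pattern for multilingual data: one URI per POI, one tagged label per language.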

ENTER 2015 Research Track Slide Number 19

Application to a small DMO

• Triplification

ENTER 2015 Research Track Slide Number 20

Application to a small DMO

• Publication (OD and LOD)

ENTER 2015 Research Track Slide Number 21

Application to a small DMO

• Mobile application:

ENTER 2015 Research Track Slide Number 22

Application to a small DMO

• Mobile application:
  – Data available through three channels:
    • Direct download (CSV, JSON, RDF, …)
    • DKAN Datastore API
    • Virtuoso SPARQL endpoint
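As a sketch of the SPARQL channel, the code below prepares a query as an HTTP GET request following the SPARQL 1.1 Protocol and flattens a response in the standard SPARQL JSON results format. The endpoint address, the query, the dct:title property and the sample response are all assumptions; the slides do not show the queries used by the mobile application.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; the slides do not give the real address.
ENDPOINT = "http://example.org/sparql"

QUERY = """
PREFIX dct: <http://purl.org/dc/terms/>
SELECT ?poi ?title WHERE {
  ?poi dct:title ?title .
  FILTER (lang(?title) = "eu")
}
LIMIT 10
"""

def build_request(endpoint, query):
    """Prepare a SPARQL query as an HTTP GET asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of plain dicts."""
    rows = json.loads(payload)["results"]["bindings"]
    return [{var: cell["value"] for var, cell in row.items()} for row in rows]

# A made-up response in the standard SPARQL JSON results layout.
sample = json.dumps({
    "head": {"vars": ["poi", "title"]},
    "results": {"bindings": [{
        "poi": {"type": "uri", "value": "http://example.org/data/tourism/Errexil"},
        "title": {"type": "literal", "xml:lang": "eu", "value": "Errexil"},
    }]},
})

request = build_request(ENDPOINT, QUERY)  # nothing is sent at this point
print(parse_bindings(sample))
```

Against a live endpoint, the prepared request would be passed to urllib.request.urlopen and the response body handed to parse_bindings.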

ENTER 2015 Research Track Slide Number 23

Conclusions

• We have presented a methodology and Open Source tools for DMOs to publish five-star tourism LOD

• The pre-processing and triplification steps are the hardest, but they are defined only once for each type of data to be published; the same operations can then be reapplied directly

ENTER 2015 Research Track Slide Number 24

Conclusions

• We have shown an example of a mobile application based on the published data:
  – Data accessible through different channels: direct download, DKAN API, SPARQL
• Tourism LOD can provide multiple benefits for DMOs and society, but DMOs need more tools, best practices and standards

ENTER 2015 Research Track Slide Number 25

Future work

• More publication examples:
  – Statistical data, based on the RDF Data Cube vocabulary
  – Data stored in relational databases
• Integrate standards such as UNE 178301:2015