Methodology for the publication of Linked Open Data from small and medium size DMO
Upload: international-federation-for-information-technologies-in-travel-and-tourism-ifitt
Post on 15-Jul-2015
ENTER 2015 Research Track Slide Number 1
Methodology for the publication of Linked Open Data from small and medium size DMO
Ander García, Maria Teresa Linaza, Javier Franco and Miriam Juaristi
Vicomtech-IK4, [email protected]
http://www.vicomtech.org
Outline
• Introduction
• Methodology
• Application to a small DMO
• Conclusions
Introduction
• Linked Open Data (LOD) = OD + LD:
  – non-privacy-restricted and non-confidential data produced with public money
  – made available without any restrictions on its usage or distribution
  – published on the Web
  – machine readable
  – its meaning is explicitly defined
  – it is linked to other external datasets
Introduction
• Publishing LOD involves 3 basic steps:
  – assign URIs to the entities described by the dataset and dereference these URIs over the HTTP protocol into RDF representations
  – set RDF links to other data sources on the Web
  – provide metadata about published data
• Researchers have proposed different methodologies to publish LOD for a variety of domains
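The first step, dereferencing an entity URI into an RDF representation, is commonly handled with HTTP content negotiation and a 303 redirect. The slides do not say how this was implemented, so the following is only a minimal sketch: the document suffixes (.ttl, .rdf, .html) and the example.org host are assumptions, and the Accept parsing is deliberately simplified.

```python
# Hypothetical suffixes for the documents that describe an entity.
SUFFIXES = {
    "text/turtle": ".ttl",
    "application/rdf+xml": ".rdf",
    "text/html": ".html",
}


def parse_accept(header):
    """Return the media types of an Accept header, highest q-value first."""
    entries = []
    for position, part in enumerate(header.split(",")):
        fields = [field.strip() for field in part.split(";")]
        media_type = fields[0].lower()
        quality = 1.0
        for field in fields[1:]:
            if field.lower().startswith("q="):
                try:
                    quality = float(field[2:])
                except ValueError:
                    quality = 0.0
        # Types with q=0 are explicitly refused by the client, so skip them.
        if media_type and quality > 0:
            entries.append((-quality, position, media_type))
    return [media_type for _, _, media_type in sorted(entries)]


def dereference(entity_uri, accept_header):
    """Return (status, location, media_type) for a request to an entity URI."""
    for media_type in parse_accept(accept_header):
        if media_type == "*/*":
            media_type = "text/turtle"  # assumed default representation
        if media_type in SUFFIXES:
            # 303 See Other: the entity is described by a separate document.
            return 303, entity_uri + SUFFIXES[media_type], media_type
    return 406, None, None


print(dereference("http://example.org/data/tourism/Errexil",
                  "text/html;q=0.8, text/turtle"))
```

A client asking for Turtle is redirected to the RDF document, a browser asking for HTML is redirected to a human-readable page, and unsupported types receive 406.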
Introduction
• Although previous examples could serve as guidelines for DMOs, the differences between domains require specific methodologies and examples for tourism LOD
• Benefits for DMOs:
  – Syntactic interoperability
  – Reduction of the costs of applications
  – Provision of innovative added-value services
Introduction
• Tourism OD:
  – Several DMOs publish OD, for example Open Data Euskadi from the Basque Country publishes (since 2010):
    • POIs
    • Stays
    • Restaurants
    • Cultural entities
    • …
Introduction
• Tourism LOD:
  – Few examples
  – No linking to other datasets
  – No reuse of ontologies
  – No active URIs
Introduction
• Objective: Publication of 5 star LOD by DMOs
• Benefits for DMOs:
  – Syntactic interoperability
  – Reduction of the costs of applications
  – Provision of innovative added-value services
Methodology
• Oriented to small and medium size DMOs
• Based on Open Source tools
• Main steps:
  – Configuration
  – Pre-processing
  – Triplification
  – Publication
Configuration
• Non-technical issues:
  – Selection and categorization of data
  – Publication license, main options:
    • Public domain: free to share, create and adapt
    • Attribution: requires mentioning the source of the data
    • Share-alike: requires public reusers of your data to share back changes (and attribute)
    • Keep-open: if the data or an adaptation of it is redistributed, a free version must also be redistributed
Configuration
• Technical issues:
  – Design of URIs
  – Multilingual data publication patterns:
    • All languages associated to the same/different URI
    • Use labels
    • …
Pre-processing
• Extract, clean, and normalise data:
  – Format for strings and numbers
  – Format for multilingual values
  – Storage of multivalued fields
  – Detection of errors and non-existent values
• LOD Refine software to transform the original data using formulas expressed in GREL
Triplification
• Analyse the domain: ontologies, datasets, vocabularies
• Create new ontologies and/or vocabularies if required
• Express data as RDF triples
• Link data to external datasets
• LOD Refine software to generate RDF triples
Publication
• Both as OD (repository) and LOD (triple store)
• Include metadata adhering to the Data Catalog Vocabulary (DCAT)
• DKAN (OD) and Virtuoso (LOD) software
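As an illustration of what DCAT metadata for a published dataset might look like, the sketch below builds a small JSON-LD description using terms from DCAT and Dublin Core. The title, description, license URI and download URLs are all placeholders rather than values from the presentation, and a real DKAN catalogue would generate its own, richer record.

```python
import json


def dcat_dataset(title, description, license_uri, distributions):
    """Build a JSON-LD description of a dataset with DCAT and Dublin Core terms.

    `distributions` is a list of (download_url, media_type) pairs.
    """
    return {
        "@context": {
            "dcat": "http://www.w3.org/ns/dcat#",
            "dct": "http://purl.org/dc/terms/",
        },
        "@type": "dcat:Dataset",
        "dct:title": title,
        "dct:description": description,
        "dct:license": {"@id": license_uri},
        "dcat:distribution": [
            {
                "@type": "dcat:Distribution",
                "dcat:downloadURL": {"@id": url},
                "dcat:mediaType": media_type,
            }
            for url, media_type in distributions
        ],
    }


# All values below are hypothetical placeholders.
metadata = dcat_dataset(
    title="Points of interest",
    description="Tourism points of interest published by a DMO.",
    license_uri="http://example.org/licenses/public-domain",
    distributions=[
        ("http://example.org/files/pois.csv", "text/csv"),
        ("http://example.org/files/pois.ttl", "text/turtle"),
    ],
)
print(json.dumps(metadata, indent=2))
```

Each downloadable file becomes one dcat:Distribution, so the same dataset record can point at both the OD files and the RDF dump.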
Methodology
Step           | Task                                                 | Tool
Configuration  | Select data                                          | -
Configuration  | Select the license to publish the data               | -
Configuration  | Design the URI scheme                                | -
Configuration  | Select a multilingual data publication pattern       | -
Pre-processing | Clean and normalise the data                         | LOD Refine
Triplification | Select existing ontologies, vocabularies and LOD     | -
Triplification | Define new ontologies and vocabularies (if required) | Protégé
Triplification | Triplification                                       | LOD Refine
Triplification | Link to external LOD                                 | LOD Refine
Publication    | Upload the RDF file to a triple store                | Virtuoso
Publication    | Create the dataset and add metadata                  | DKAN
Publication    | Upload the resources of the datasets                 | DKAN
Application to a small DMO
• Dataset: 143 POIs of a regional DMO, Urola Kosta, in four languages (Spanish, Basque, English, French) and five categories
• Configuration:
  – PDDL license, no restrictions
  – URI: /data/tourism/BASQUE_NAME
    • Ñ replaced by ‘in’ and spaces by ‘_’
  – Labels for multilingual data
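The URI scheme and the label pattern above could be sketched as follows. The host name, the handling of lowercase ñ, and the sample names are assumptions made for illustration. Only the /data/tourism/BASQUE_NAME path and the two replacement rules come from the slide.

```python
BASE_URI = "http://example.org"  # hypothetical host, not given in the slides
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"


def mint_uri(basque_name):
    """Build the entity URI from the Basque name of a POI."""
    slug = basque_name.strip()
    # Rule from the slide: Ñ becomes 'in'. Treating lowercase ñ the same
    # way is an assumption.
    slug = slug.replace("Ñ", "in").replace("ñ", "in")
    # Rule from the slide: spaces become underscores.
    slug = slug.replace(" ", "_")
    return BASE_URI + "/data/tourism/" + slug


def escape_literal(text):
    """Escape backslashes and double quotes for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def label_triples(uri, labels):
    """One rdfs:label per language, all attached to the same URI."""
    return [
        '<%s> <%s> "%s"@%s .' % (uri, RDFS_LABEL, escape_literal(text), lang)
        for lang, text in sorted(labels.items())
    ]


# Made-up POI names, used only to exercise the two replacement rules.
uri = mint_uri("Peña Handia")
for triple in label_triples(uri, {"eu": "Peña Handia", "en": "Big Rock"}):
    print(triple)
```

With the labels pattern, every language version shares one URI and clients pick the label whose language tag (es, eu, en, fr) they need.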
Application to a small DMO
• Pre-processing
  – Names to title case, from ERREXIL to Errexil
  – Prefix added to telephone numbers: (+34)943309230
  – Secondary mobile numbers stored in a new column
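The presentation performed these clean-up operations in LOD Refine. As a stand-in, the same three transformations can be sketched in plain Python. The raw phone format (several numbers in one field, separated by "/", ";" or ",") and the nine-digit national number length are assumptions about the source data.

```python
import re


def title_case(name):
    """Capitalise each word: 'ERREXIL' becomes 'Errexil'."""
    return " ".join(word.capitalize() for word in name.split())


def normalise_phone(number, prefix="(+34)"):
    """Keep only the digits and add the country prefix."""
    digits = re.sub(r"\D", "", number)
    # Assumption: an 11-digit value starting with 34 already carries the
    # country code in front of a nine-digit national number.
    if len(digits) == 11 and digits.startswith("34"):
        digits = digits[2:]
    return prefix + digits if digits else ""


def split_phones(raw):
    """Split a multivalued phone field into (main, secondary) columns."""
    parts = [part for part in re.split(r"\s*[/;,]\s*", raw.strip()) if part]
    main = normalise_phone(parts[0]) if parts else ""
    secondary = normalise_phone(parts[1]) if len(parts) > 1 else ""
    return main, secondary


print(title_case("ERREXIL"))
# The second number is a made-up placeholder.
print(split_phones("943309230 / 600000000"))
```

Because the functions are pure, they can be reapplied unchanged whenever a new export of the same type of data arrives.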
Application to a small DMO
• Triplification
  – Ontologies:
    • vCard: Contact information
    • Dublin Core: Metadata about the resource
  – Linked datasets:
    • GeoNames: Locations
    • DBpedia: Categories and locations
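To make the mapping concrete, the sketch below serialises one POI record as Turtle using vCard for contact data, Dublin Core for the category and location links, and rdfs:label for multilingual names. The exact properties the authors chose are not given in the slides, so the property selection here is an assumption, and the record itself (including the GeoNames and DBpedia URIs) is a placeholder.

```python
PREFIXES = {
    "vcard": "http://www.w3.org/2006/vcard/ns#",
    "dct": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def quote(text, lang=None):
    """Return a Turtle string literal, optionally language-tagged."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    literal = '"' + escaped + '"'
    return literal + "@" + lang if lang else literal


def tel_uri(phone):
    """Turn '(+34)943309230' into 'tel:+34943309230'."""
    return "tel:+" + "".join(ch for ch in phone if ch.isdigit())


def poi_to_turtle(poi):
    """Serialise one POI dictionary as a Turtle document."""
    header = ["@prefix %s: <%s> ." % (p, ns) for p, ns in PREFIXES.items()]
    statements = ["vcard:fn " + quote(poi["name"])]
    for lang, label in sorted(poi.get("labels", {}).items()):
        statements.append("rdfs:label " + quote(label, lang))
    if poi.get("phone"):
        statements.append(
            "vcard:hasTelephone [ vcard:hasValue <%s> ]" % tel_uri(poi["phone"])
        )
    if poi.get("category"):  # link to an external category, e.g. in DBpedia
        statements.append("dct:subject <%s>" % poi["category"])
    if poi.get("location"):  # link to an external place, e.g. in GeoNames
        statements.append("dct:spatial <%s>" % poi["location"])
    body = "<%s>\n    %s ." % (poi["uri"], " ;\n    ".join(statements))
    return "\n".join(header) + "\n\n" + body + "\n"


# Illustrative record: the external URIs are placeholders, not real links.
example = {
    "uri": "http://example.org/data/tourism/Errexil",
    "name": "Errexil",
    "labels": {"eu": "Errexil"},
    "phone": "(+34)943309230",
    "category": "http://dbpedia.org/resource/Category:Example",
    "location": "http://sws.geonames.org/0000000/",
}
print(poi_to_turtle(example))
```

The output can be loaded into a triple store as is. Swapping in other vocabularies only means changing the prefix table and the property names.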
Application to a small DMO
• Mobile application:
  – Data available through three channels:
    • Direct download (CSV, JSON, RDF,…)
    • DKAN Datastore API
    • Virtuoso SPARQL
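For the SPARQL channel, a client such as the mobile application would send a query to the endpoint over HTTP. The sketch below only builds the request, following the standard SPARQL protocol (a GET with a `query` parameter). The endpoint URL and the query are assumptions, since the slides do not show either.

```python
from urllib.parse import urlencode
from urllib.request import Request

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint URL

# Assumed query: English labels of the first ten labelled resources.
QUERY = """
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
SELECT ?poi ?label WHERE {
  ?poi rdfs:label ?label .
  FILTER (lang(?label) = "en")
}
LIMIT 10
"""


def build_request(endpoint, query):
    """Prepare a SPARQL protocol GET request that asks for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
# Passing `request` to urllib.request.urlopen would actually send it;
# that call is left out because the endpoint above is not real.
```

Changing the language tag in the FILTER is enough to retrieve the Spanish, Basque or French labels instead.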
Conclusions
• We have presented a methodology and Open Source tools for DMOs to publish five star tourism LOD
• Pre-processing and triplification are the hardest steps, but they are only done once for each type of data to be published; the same operations can then be reapplied directly
Conclusions
• We have shown an example of a mobile application based on the published data:
  – Data accessible through different channels: direct download, DKAN API, SPARQL
• Tourism LOD can provide multiple benefits for DMOs and society, but DMOs need more tools, best practices and standards