ESWC SS 2013 - Tuesday Tutorial 1, Maribel Acosta and Barry Norton: Providing Linked Data
Providing Linked Data
Presented by: Barry Norton
Maribel Acosta
Motivation: Music!
[Figure: architecture diagram. Labels: Musical Content, Metadata, Streaming providers, Downloads, Other content, LD Data set, Data acquisition, Physical Wrapper, LD Wrapper, R2R Transf., Integrated Dataset, Interlinking, Cleansing, Vocabulary Mapping, Access, SPARQL Endpoint, Publishing, RDF/XML, RDFa, Application, Analysis & Mining Module, Visualization Module.]
LINKED DATA LIFECYCLE
EUCLID -‐ Querying Linked Data 3
Linked Data Principles
1. Use URIs as names for things.
2. Use HTTP URIs so that users can look up those names.
3. When someone looks up a URI, provide useful information, using the standards (RDF*, SPARQL).
4. Include links to other URIs, so that users can discover more things.
EUCLID - Providing Linked Data
CH 1
Linked Data Lifecycle
Source: Sören Auer. “The Semantic Data Web” (slides)
Source: José M. Alvarez. “My Linked Data Lifecycle”
Source: Michael Hausenblas. “Linked Data lifecycle”
Core Tasks for Providing Linked Data
Based on the proposed LD lifecycles and the LD principles, we can identify 3 main tasks for providing LD:
① Creating: includes data extraction, creation of HTTP URIs, and vocabulary selection. (LD principles 1 & 2)
② Interlinking: involves the creation of (RDF) links to external data sets. (LD principle 4)
③ Publishing: consists of creating the metadata and making the data set accessible. (LD principle 3)
Agenda
1. Creating Linked Data
2. Interlinking Linked Data
3. Publishing Linked Data
4. Linked Data publishing checklist
CREATING LINKED DATA
Extracting the Data
• The data of interest may be stored in a wide range of formats: spreadsheets or tabular data, databases, text
• Several tools support the process of mining data from different repositories, for example: R2RML
Using the RDF Data Model
• The RDF data model is used to represent the extracted information
• The nodes represent the concepts/entities within the data. A node corresponds to a URI, a blank node or a literal (literals only in the object position)
• The relationships between the concepts/entities are modeled as arcs
[Figure: a Subject node connected to an Object node by an arc labeled Predicate]
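The node and arc rules of the RDF data model can be sketched in a few lines of Python. This is only an illustration: the `Term` class, the `is_valid_triple` helper and the example.org URI are assumptions made for the example and do not come from the slides or from any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Term:
    """An RDF term; kind is 'uri', 'bnode' or 'literal' (hypothetical helper)."""
    kind: str
    value: str


def is_valid_triple(subject: Term, predicate: Term, obj: Term) -> bool:
    """Check which kind of term is allowed in each position of a triple."""
    return (
        subject.kind in ("uri", "bnode")              # subject: URI or blank node
        and predicate.kind == "uri"                   # predicate (arc): URI only
        and obj.kind in ("uri", "bnode", "literal")   # object: any kind of term
    )


artist = Term("uri", "http://example.org/artist/1")   # illustrative URI
name = Term("uri", "http://xmlns.com/foaf/0.1/name")
label = Term("literal", "The Beatles")

print(is_valid_triple(artist, name, label))  # True: literal in object position
print(is_valid_triple(label, name, artist))  # False: literal in subject position
```

A real application would normally rely on an RDF library for this; the sketch only makes the position rules explicit.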
Naming Things: URIs
• All the things or distinct entities within the data must be named
• According to the Linked Data principles, the standard mechanism to name entities is the URI
• Designing Cool URIs:
  – Leave out information about the data regarding: author, technologies, status, access mechanisms, …
  – Simplicity: short, mnemonic URIs
  – Stability: maintain the URIs as long as possible
  – Manageability: issue the URIs in a way that you can manage
Source: http://www.w3.org/TR/cooluris/
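One possible way to issue short, mnemonic URIs is sketched below. The `mint_uri` function, the `/{kind}/{slug}` layout and the example.org base are hypothetical choices for illustration; the Cool URIs note does not prescribe any particular scheme.

```python
import re


def mint_uri(base: str, kind: str, label: str) -> str:
    """Build a short, readable URI from a label (hypothetical naming scheme)."""
    # Lowercase the label and collapse every run of other characters into "-".
    slug = re.sub(r"[^a-z0-9]+", "-", label.lower()).strip("-")
    # No file extension, technology name or status appears in the result.
    return f"{base.rstrip('/')}/{kind}/{slug}"


print(mint_uri("http://example.org/", "artist", "The Beatles"))
# http://example.org/artist/the-beatles
```

Labels can change or collide, so a production scheme would more likely derive the last path segment from a stable internal identifier; the slug is used here only because it is easy to read.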
Selecting Vocabularies
• Vocabularies model the concepts and the relationships between them in a knowledge domain
• Terms from well-known vocabularies should be reused wherever possible
• New terms should be defined only if you cannot find the required terms in existing vocabularies
• A large number of vocabularies in RDF are openly available, e.g., Linked Open Vocabularies (LOV)
Selecting Vocabularies (2)
Linked Open Vocabularies: 322 vocabularies classified by domain
Source: http://lov.okfn.org/dataset/lov/
Selecting Vocabularies (3)
Linked Open Vocabularies: Analyzing MusicOntology
Source: http://lov.okfn.org/dataset/lov/details/vocabulary_mo.html
Selecting Vocabularies (4)
Other lists of well-known vocabularies are maintained by:
• W3C SWEO Linking Open Data community project
  http://www.w3.org/wiki/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• Library Linked Data Incubator Group: Vocabularies in the library domain
  http://www.w3.org/2005/Incubator/lld/XGR-lld-vocabdataset-20111025
INTERLINKING LINKED DATA
Interlinking Data Sets
• It’s one of the Linked Data principles!
  Principle 4: Include links to other URIs, so that users can discover more things.
• Involves the creation of RDF links between two different RDF data sets:
  – Links at instance level (rdfs:seeAlso, owl:sameAs)
  – Links at schema level (RDFS subclass/subproperty, OWL equivalent class/property, SKOS mapping properties)
• Appropriate links are detected via link discovery
Interlinking Data Sets (2)
Challenges for link discovery
• Linked Data sets are heterogeneous in terms of vocabularies, formats and data representation
• Large range of knowledge domains
• Scalability: LD is composed of a large number of data sets and RDF triples, hence it is not possible to compare every possible entity pair
Source: Robert Isele. “LOD2 Webinar Series: Silk”
Interlinking Data Sets (3)
Challenges for link discovery
• It corresponds to the entity resolution problem: deciding whether two entities correspond to the same object in the real world
• Name ambiguities: typos, misspellings, different languages, homonyms
• Structural ambiguities: same concepts/entities with different structures. Requires the application of ontology and schema matching techniques
Interlinking Data Sets (4)
RDF data sets can be interlinked:
Manually
• Involves the manual exploration of LD data sets and their RDF resources to identify linking targets
• May not be feasible when the number of entities within the data set is very large
Automatically
• Using tools that perform link discovery based on linkage rules, for example: Silk, Limes and xCurator
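To give a feel for what a linkage rule does, here is a toy link discovery sketch. It is not how Silk, Limes or xCurator work internally: the rule (normalized label similarity above a threshold), the blocking key (first character of the label) and all URIs are assumptions chosen for illustration. Blocking is included because comparing every entity pair does not scale.

```python
from collections import defaultdict
from difflib import SequenceMatcher


def normalize(label: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join(label.lower().split())


def discover_links(source: dict, target: dict, threshold: float = 0.9) -> list:
    """Return owl:sameAs link candidates between two {uri: label} mappings."""
    # Blocking: group target entities by the first character of their label,
    # so each source entity is compared only against a small candidate set.
    blocks = defaultdict(list)
    for uri, label in target.items():
        norm = normalize(label)
        blocks[norm[:1]].append((uri, norm))

    links = []
    for s_uri, s_label in source.items():
        s_norm = normalize(s_label)
        for t_uri, t_norm in blocks.get(s_norm[:1], []):
            # Linkage rule: string similarity of the labels above a threshold.
            if SequenceMatcher(None, s_norm, t_norm).ratio() >= threshold:
                links.append((s_uri, "owl:sameAs", t_uri))
    return links


source = {"http://example.org/a/1": "The Beatles"}
target = {
    "http://example.org/b/x": "the  beatles",
    "http://example.org/b/y": "The Rolling Stones",
}
print(discover_links(source, target))
```

Only the first target entity is linked here. Real tools combine several similarity measures over several properties and use more robust blocking than a single character.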
owl:sameAs & rdfs:seeAlso
• owl:sameAs
  – Creates links between individuals
  – States that two URIs refer to the same individual
• rdfs:seeAlso
  – States that a resource may provide additional information about the subject resource
• Links in MusicBrainz:
  – owl:sameAs is used for music artists
  – rdfs:seeAlso is used for albums
SKOS
• Simple Knowledge Organization System
  – http://www.w3.org/TR/skos-reference/
• Data model for knowledge organization systems (thesauri, classification schemes, taxonomies)
• SKOS data is expressed as RDF triples
• Allows the creation of RDF links between different data sets with the usage of mapping properties
SKOS: Mapping Properties
These properties are used to link SKOS concepts (particularly instances) in different schemes:
• skos:closeMatch: links two concepts that are sufficiently similar (sometimes can be used interchangeably)
• skos:exactMatch: indicates that the two concepts can be used interchangeably.
  – Axiom: It is a transitive property
• skos:relatedMatch: states an associative mapping link between two concepts
SKOS: Mapping Properties (2)
Example of SKOS exact match
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .
mo:MusicArtist skos:exactMatch dbpedia-ont:MusicalArtist .
mo:MusicGroup skos:exactMatch schema:MusicGroup .
mo:MusicGroup skos:exactMatch dbpedia-ont:Band .
SKOS: Mapping Properties (3)
Example of SKOS close match
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
@prefix dbpedia-ont: <http://dbpedia.org/ontology/> .
@prefix schema: <http://schema.org/> .
mo:SignalGroup skos:closeMatch schema:MusicAlbum .
mo:SignalGroup skos:closeMatch dbpedia-ont:Album .
SKOS: Mapping Properties (4)
Integrity conditions
• Guarantee consistency and avoid contradictions in the relationships between SKOS concepts
[Figure: partial mapping relation diagram with integrity conditions. Nodes: skos:mappingRelation, skos:closeMatch, skos:exactMatch, skos:relatedMatch. Annotations: skos:exactMatch is symmetric and transitive; skos:closeMatch and skos:relatedMatch are symmetric; skos:exactMatch is disjoint with skos:relatedMatch.]
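Because skos:exactMatch is symmetric and transitive, a set of exact-match statements partitions concepts into groups of interchangeable concepts. The sketch below computes those groups with a small union-find and then flags skos:relatedMatch pairs that fall inside one group, which would conflict with the disjointness condition. The function names are hypothetical, and the related-match statement in the usage example is invented to trigger the check.

```python
def exact_match_groups(exact_pairs):
    """Group concepts connected by skos:exactMatch (symmetric, transitive)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in exact_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_a] = root_b

    groups = {}
    for concept in list(parent):
        groups.setdefault(find(concept), set()).add(concept)
    return list(groups.values())


def disjointness_violations(exact_pairs, related_pairs):
    """Return relatedMatch pairs whose concepts are also (inferred) exact matches."""
    index = {}
    for i, group in enumerate(exact_match_groups(exact_pairs)):
        for concept in group:
            index[concept] = i
    return [(a, b) for a, b in related_pairs
            if a in index and b in index and index[a] == index[b]]


exact = [("mo:MusicGroup", "schema:MusicGroup"),
         ("mo:MusicGroup", "dbpedia-ont:Band")]
related = [("schema:MusicGroup", "dbpedia-ont:Band")]  # invented for the check

print(exact_match_groups(exact))
print(disjointness_violations(exact, related))
```

The two exact-match statements are the ones from the exact match example; by transitivity schema:MusicGroup and dbpedia-ont:Band end up in the same group, so the invented related-match pair is reported.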
PUBLISHING LINKED DATA
Publishing Linked Data
Once the RDF data set has been created and interlinked, the publishing process involves the following tasks:
1. Metadata creation for describing the data set
2. Making the data set accessible
3. Exposing the data set in Linked Data repositories
4. Validating the data set
Describing RDF Data Sets
• Consists of providing (machine-readable) metadata of RDF data sets which can be processed by engines
• This information allows for:
  – Efficient and effective search of data sets
  – Selection of appropriate data sets (for consumption or interlinking)
  – Getting general statistics of the data sets
Describing RDF Data Sets (2)
• The common language for describing RDF data sets is VoID (Vocabulary of Interlinked Datasets)
• Defines an RDF data set with the class void:Dataset
• Covers 4 types of metadata:
  – General metadata
  – Structural metadata
  – Descriptions of linksets
  – Access metadata
VoID: General Metadata
• General metadata is used by users to identify appropriate data sets.
• Specifies information about the description of the data set, contact person/organization, the license of the data set, data subject and some technical features.
• VoID (re)uses predicates from the Dublin Core Metadata [1] and FOAF [2] vocabularies.
[1] http://dublincore.org/documents/2010/10/11/dcmi-terms/
[2] http://xmlns.com/foaf/spec/
VoID: General Metadata (2)
General Information: contains information about the creation of the data set
| Predicate | Range | Description |
|---|---|---|
| dcterms:title | Literal | Name of the data set. |
| dcterms:description | Literal | Description of the data set. |
| dcterms:source | RDF resource | Source from which the data set was derived. |
| dcterms:creator | RDF resource | Entity primarily responsible for creating the data set. |
| dcterms:date | xsd:date | Time associated with an event in the life-cycle of the resource. |
| dcterms:created | xsd:date | Date of creation of the data set. |
| dcterms:issued | xsd:date | Date of publication of the data set. |
| dcterms:modified | xsd:date | Date on which the data set was changed. |
| foaf:homepage | RDF resource | Homepage of the data set. |
| dcterms:publisher | RDF resource | Entity responsible for making the data set available. |
| dcterms:contributor | RDF resource | Entity responsible for making contributions to the data set. |
Source: http://www.w3.org/TR/void/#metadata
VoID: General Metadata (3)
Other Information
• License of the data set: specifies the usage conditions of the data. The license can be indicated with the property dcterms:license
• Category of the data set: to specify the topics or domains covered by the data set, the property dcterms:subject can be used
• Technical features: the property void:feature can be used to express technical properties of the data (e.g. RDF serialization formats)
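As a rough illustration of how the general metadata properties fit together, the following sketch renders a small VoID description as Turtle text. The `void_general_metadata` function is hypothetical, and every value passed to it below (dataset URI, license and subject URIs) is a placeholder rather than real MusicBrainz metadata.

```python
def void_general_metadata(dataset_uri, title, homepage, license_uri, subject_uri):
    """Render general VoID metadata for one data set as a Turtle string."""
    # Escape backslashes and double quotes so the title is a valid Turtle literal.
    safe_title = title.replace("\\", "\\\\").replace('"', '\\"')
    return "\n".join([
        "@prefix void: <http://rdfs.org/ns/void#> .",
        "@prefix dcterms: <http://purl.org/dc/terms/> .",
        "@prefix foaf: <http://xmlns.com/foaf/0.1/> .",
        "",
        f"<{dataset_uri}> a void:Dataset ;",
        f'    dcterms:title "{safe_title}" ;',
        f"    foaf:homepage <{homepage}> ;",
        f"    dcterms:license <{license_uri}> ;",
        f"    dcterms:subject <{subject_uri}> .",
    ])


# All URIs except the homepage are placeholders for illustration.
turtle = void_general_metadata(
    dataset_uri="http://example.org/void#MusicBrainz",
    title="MusicBrainz",
    homepage="http://musicbrainz.org/",
    license_uri="http://example.org/license",
    subject_uri="http://example.org/subject/music",
)
print(turtle)
```

String templating is enough for a sketch; generating the description with an RDF library would avoid manual escaping.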
VoID: Structural Metadata
• Provides high-level information about the internal structure of the data set
• This metadata is useful when exploring or querying the data set
• Includes information about resources, vocabularies used in the data set, statistics and examples of resources in the data set
VoID: Structural Metadata (2)
Information about resources
• Example resources: allow users to get an impression of the kind of resources included in the data set. Examples can be shown with the property void:exampleResource
:MusicBrainz a void:Dataset ;
    void:exampleResource <http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d> .
• Pattern for resource URIs: the void:uriSpace property can be used to state that all the entity URIs in a data set start with a given string
:MusicBrainz a void:Dataset ;
    void:uriSpace "http://musicbrainz.org/" .
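A consumer can use void:uriSpace values to guess which data set a given URI belongs to. The sketch below does a longest-prefix match over a small mapping; the `dataset_for` helper and the DBpedia entry are assumptions added for illustration, while the MusicBrainz value is the one shown above.

```python
def dataset_for(uri, uri_spaces):
    """Return the data set whose void:uriSpace is the longest prefix of uri."""
    matches = [(len(space), name)
               for name, space in uri_spaces.items()
               if uri.startswith(space)]
    return max(matches)[1] if matches else None


uri_spaces = {
    ":MusicBrainz": "http://musicbrainz.org/",
    ":DBpedia": "http://dbpedia.org/resource/",  # assumed value, for illustration
}

print(dataset_for(
    "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
    uri_spaces,
))
print(dataset_for("http://example.org/unknown", uri_spaces))
```

The first call resolves to :MusicBrainz and the second returns None because no declared URI space matches.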
VoID: Structural Metadata (3)
Vocabularies used in the data set
• The void:vocabulary property identifies the vocabulary or ontology that is used in a data set
• Typically, only the most relevant vocabularies are listed
• This property can only be used for entire vocabularies. It cannot be used to express that a subset of the vocabulary occurs in the data set.
:MusicBrainz a void:Dataset ;
    void:vocabulary <http://purl.org/ontology/mo/> .
VoID: Structural Metadata (4)
Statistics about a data set
Express numeric statistics about a data set:
| Predicate | Range | Description |
|---|---|---|
| void:triples | Number | Total number of triples contained in the data set. |
| void:entities | Number | Total number of entities that are described in the data set. An entity must have a URI, and match the void:uriRegexPattern |
| void:classes | Number | Total number of distinct classes in the data set. |
| void:properties | Number | Total number of distinct properties in the data set. |
| void:distinctSubjects | Number | Total number of distinct subjects in the data set. |
| void:distinctObjects | Number | Total number of distinct objects in the data set. |
| void:documents | Number | Total number of documents, in case that the data set is published as a set of individual documents. |
Source: http://www.w3.org/TR/void/#metadata
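Most of these statistics can be computed with simple set operations over the triples. The sketch below assumes the data set is available in memory as (subject, predicate, object) string tuples and counts classes as the distinct objects of rdf:type triples; the `void_statistics` function and the sample triples are illustrative only.

```python
RDF_TYPE = "rdf:type"  # abbreviated for readability in this sketch


def void_statistics(triples):
    """Compute a few VoID statistics for a collection of (s, p, o) tuples."""
    triples = set(triples)  # ignore duplicate statements
    return {
        "void:triples": len(triples),
        "void:classes": len({o for s, p, o in triples if p == RDF_TYPE}),
        "void:properties": len({p for s, p, o in triples}),
        "void:distinctSubjects": len({s for s, p, o in triples}),
        "void:distinctObjects": len({o for s, p, o in triples}),
    }


triples = [
    ("ex:beatles", RDF_TYPE, "mo:MusicGroup"),
    ("ex:beatles", "foaf:name", "The Beatles"),
    ("ex:beatles", "mo:member", "ex:lennon"),
    ("ex:lennon", RDF_TYPE, "mo:MusicArtist"),
]

for predicate, value in void_statistics(triples).items():
    print(predicate, value)
```

For this sample the result is 4 triples, 2 classes, 3 properties, 2 distinct subjects and 4 distinct objects. void:entities and void:documents are left out because they depend on the URI pattern and on how the data set is published.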
VoID: Structural Metadata (5)
Partitioned data sets
• The void:subset property provides descriptions of parts of a data set
:MusicBrainz a void:Dataset ;
    void:subset :MusicBrainzArtists .
• Data sets can be partitioned based on classes or properties:
  – void:classPartition contains only instances of a particular class
  – void:propertyPartition contains only triples with a particular predicate
:MusicBrainz a void:Dataset ;
    void:classPartition [ void:class mo:Release ] ;
    void:propertyPartition [ void:property mo:member ] .
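The two kinds of partition can be read as filters over the triples of a data set. In this sketch a class partition is taken to be all triples whose subject is an instance of the class, and a property partition all triples with the given predicate; the function names and sample data are assumptions for illustration.

```python
RDF_TYPE = "rdf:type"  # abbreviated for readability in this sketch


def class_partition(triples, cls):
    """Triples describing the instances of one class."""
    instances = {s for s, p, o in triples if p == RDF_TYPE and o == cls}
    return [t for t in triples if t[0] in instances]


def property_partition(triples, prop):
    """Triples that use one particular predicate."""
    return [t for t in triples if t[1] == prop]


triples = [
    ("ex:beatles", RDF_TYPE, "mo:MusicGroup"),
    ("ex:beatles", "foaf:name", "The Beatles"),
    ("ex:beatles", "mo:member", "ex:lennon"),
    ("ex:lennon", RDF_TYPE, "mo:MusicArtist"),
]

print(class_partition(triples, "mo:MusicGroup"))   # the three ex:beatles triples
print(property_partition(triples, "mo:member"))    # the single mo:member triple
```

Counting the triples of each partition gives the per-class and per-property figures that a VoID description can attach to the partition.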
VoID: Describing Linksets
• Linkset: collection of RDF links between two RDF data sets
[Figure: data sets :DS1 and :DS2 connected by linksets :LS1 and :LS2 with owl:sameAs links. Image based on http://semanticweb.org/wiki/File:Void-linkset-conceptual.png]
@prefix void: <http://rdfs.org/ns/void#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
:DS1 a void:Dataset .
:DS2 a void:Dataset .
:DS1 void:subset :LS1 .
:LS1 a void:Linkset ;
    void:linkPredicate owl:sameAs ;
    void:target :DS1, :DS2 .
VoID: Describing Linksets (2)
Example
@prefix void: <http://rdfs.org/ns/void#> .
@prefix skos: <http://www.w3.org/2004/02/skos/core#> .
@prefix mo: <http://purl.org/ontology/mo/> .
:MusicBrainz a void:Dataset .
:DBpedia a void:Dataset .
:MusicBrainz void:classPartition :MBArtists .
:MBArtists void:class mo:MusicArtist .
:MBArtists a void:Linkset ;
    void:linkPredicate skos:exactMatch ;
    void:target :MusicBrainz, :DBpedia .
VoID: Access Metadata
The access metadata describes the methods of accessing the actual RDF data set
| Method | Predicate | Description |
|---|---|---|
| URI lookup endpoint | void:uriLookupEndpoint | Specifies the URI of a service for accessing the data set (different from the SPARQL protocol) |
| Root resource | void:rootResource | URI of the top concepts (only for data sets structured as trees) |
| SPARQL endpoint | void:sparqlEndpoint | Provides access to the data set via the SPARQL protocol.* |
| RDF data dumps | void:dataDump | Specifies the location of the dump file. If the data set is split into multiple files, then several values of this property are provided. |
* This assumes that the default graph of the SPARQL endpoint contains the data set. VoID cannot express that a data set is contained in a specific named graph. This can be specified with SPARQL 1.1 Service Description.
CH 5
Providing Access to the Data Set
The data set can be accessed via different mechanisms:
• Dereferencing HTTP URIs
• RDFa
• RDF dump
• SPARQL endpoint
Dereferencing HTTP URIs
• Allows for easily exploring certain resources contained in the data set
• What to return for a URI?
  – Immediate description: triples where the URI is the subject.
  – Backlinks: triples where the URI is the object.
  – Related descriptions: information of interest in typical usage scenarios.
  – Metadata: information such as author and licensing information.
  – Syntax: RDF descriptions as RDF/XML and human-readable formats.
• Applications (e.g. LD browsers) render the retrieved information so it can be perceived by a user.
Source: How to Publish Linked Data on The Web - Chris Bizer, Richard Cyganiak, Tom Heath.
CH 1
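The first two parts of the answer to "what to return for a URI?" are simple selections over the data set. The sketch below assumes the triples are held in memory as (subject, predicate, object) tuples; the `describe` function and the sample data are hypothetical, and related descriptions and metadata are left out because they depend on the application.

```python
def describe(uri, triples):
    """Select the immediate description and the backlinks for one URI."""
    return {
        "immediate": [t for t in triples if t[0] == uri],  # URI as subject
        "backlinks": [t for t in triples if t[2] == uri],  # URI as object
    }


triples = [
    ("ex:beatles", "rdf:type", "mo:MusicGroup"),
    ("ex:beatles", "mo:member", "ex:lennon"),
    ("ex:lennon", "rdf:type", "mo:MusicArtist"),
]

description = describe("ex:lennon", triples)
print(description["immediate"])  # the rdf:type triple about ex:lennon
print(description["backlinks"])  # the mo:member triple pointing to ex:lennon
```

A server would serialize the selected triples in the format negotiated with the client, for example RDF/XML for applications and HTML for people.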
Dereferencing HTTP URIs (2)
Example: Dereferencing
RDFa
• RDFa = “RDF in attributes”
• Extension to HTML5 for embedding RDF within HTML pages:
  – The HTML is processed by the browser; the (human) consumer doesn't see the RDF data
  – The RDF triples within the page are consumed by APIs to extract the (semi-)structured data
• It is considered as the bridge between the Web of Data and the Web of Documents
• It is a complete serialization of RDF
RDFa: Attributes
| Attribute role | Attribute | Description |
|---|---|---|
| Syntax | prefix | List of prefix-name IRI pairs |
| Syntax | vocab | IRI that specifies the vocabulary where the concept is defined |
| Subject | about | Specifies the subject of the relationship |
| Predicate | property | Expresses the relationship between the subject and the value |
| Predicate | rel | Defines a relation between the subject and a URL |
| Predicate | rev | Expresses reverse relationships between two resources |
| Resource | href | Specifies an object URI for the rel and rev attributes |
| Resource | resource | Same as href (used when href is not present) |
| Resource | src | Specifies the object of a relationship when the element embeds a resource (e.g. an image) |
| Literal | datatype | Expresses the datatype of the object of the property attribute |
| Literal | content | Supplies machine-readable content for a literal |
| Literal | xml:lang, lang | Specifies the language of the literal |
| Macro | typeof | Indicates the RDF type(s) to associate with a subject |
| Macro | inlist | An object is added to the list of a predicate. |
RDFa: Example
Extracting RDF from HTML
HTML (+RDFa):
<div class="artistheader" about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_" typeof="http://purl.org/ontology/mo/MusicGroup"> … </div>
RDF:
<http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://purl.org/ontology/mo/MusicGroup> .
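The extraction step in this example can be imitated with Python's standard `html.parser` module. The sketch below is deliberately narrow: it only handles elements that carry both `about` and `typeof` with full IRIs, and it is not a conforming RDFa processor (no prefixes, vocab, chaining or other attributes). The `TypeofExtractor` class name is hypothetical.

```python
from html.parser import HTMLParser

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


class TypeofExtractor(HTMLParser):
    """Collect rdf:type triples from elements with 'about' and 'typeof'."""

    def __init__(self):
        super().__init__()
        self.triples = []

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        subject = attributes.get("about")
        types = attributes.get("typeof")
        if subject and types:
            # typeof may list several types separated by whitespace.
            for rdf_type in types.split():
                self.triples.append((subject, RDF_TYPE, rdf_type))


html = (
    '<div class="artistheader" '
    'about="http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d#_" '
    'typeof="http://purl.org/ontology/mo/MusicGroup"> ... </div>'
)

extractor = TypeofExtractor()
extractor.feed(html)
for s, p, o in extractor.triples:
    print(f"<{s}> <{p}> <{o}> .")
```

The printed line is the N-Triples form of the single triple derived from the `div` element. For real pages a full RDFa processor such as pyRdfa is the appropriate tool.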
RDFa: Example (2)
Extracting RDF from MusicBrainz.org
http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d
Source: http://www.w3.org/2007/08/pyRdfa/
http://www.w3.org/2007/08/pyRdfa/extract?uri=http%3A%2F%2Fmusicbrainz.org%2Fartist%2Fb10bbbfc-cf9e-42e0-be17-e2c3e1d2600d&format=nt
Watch the EUCLID screencast: http://vimeo.com/euclidproject
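The extraction request shown on the slide is just the page URI passed, percent-encoded, as the `uri` parameter together with `format=nt`. The following sketch rebuilds that request URL with the standard library; the `extraction_url` helper is a hypothetical name, and the sketch only constructs the URL without contacting the service.

```python
from urllib.parse import urlencode

PYRDFA_EXTRACT = "http://www.w3.org/2007/08/pyRdfa/extract"


def extraction_url(page_uri, output_format="nt"):
    """Build a pyRdfa extraction URL using the parameters seen on the slide."""
    # urlencode percent-encodes ":" and "/" inside the page URI.
    return f"{PYRDFA_EXTRACT}?{urlencode({'uri': page_uri, 'format': output_format})}"


page = "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"
print(extraction_url(page))
```

The printed URL matches the one on the slide character for character, which is a quick way to check that the encoding is right.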
RDF Dump
• An RDF dump refers to a file which contains (part of) a data set specified in an RDF format (RDF/XML, N-Triples, N-Quads)
• The data set can be split into several RDF dumps
• A list of data sets available as RDF dumps can be found at:
  – http://www.w3.org/wiki/DataSetRDFDumps
SPARQL Endpoint
• The SPARQL endpoint refers to the URI of the listener of the SPARQL protocol service, which handles requests for SPARQL protocol operations
• The user submits SPARQL queries to the SPARQL endpoint in order to retrieve only a desired subset of the RDF data set
• List of available SPARQL endpoints:
  – http://www.w3.org/wiki/SparqlEndpoints
  – http://labs.mondeca.com/sparqlEndpointsStatus/
CH 2
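In the SPARQL protocol, a query can be sent with HTTP GET by passing the query text in the `query` parameter of the endpoint URI. The sketch below only builds such a request URL; the endpoint address `http://example.org/sparql` and the `sparql_get_url` helper are placeholders, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def sparql_get_url(endpoint, query):
    """Build the GET request URL for a SPARQL query (protocol 'query' parameter)."""
    return f"{endpoint}?{urlencode({'query': query})}"


query = (
    "SELECT ?artist WHERE { "
    "?artist a <http://purl.org/ontology/mo/MusicGroup> "
    "} LIMIT 10"
)
url = sparql_get_url("http://example.org/sparql", query)  # placeholder endpoint
print(url)

# Decoding the URL again yields the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```

A client would then issue the GET request with an Accept header for the desired result format; long queries are usually sent with POST instead.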
![Page 55: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/55.jpg)
Using Linked Data Catalogs
• Data catalogs, markets or repositories are platforms dedicated to providing access to a wide range of data sets from different domains
• They allow data consumers to easily find and use the data
• The catalogs usually offer relevant metadata about the creation of the data set
![Page 56: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/56.jpg)
Using Linked Data Catalogs (2)
How to publish an RDF data set into a catalog?
Create your own data catalog
• Recommended for big organizations/institutions aiming at providing a large number of data sets
• Use a data management system, for example:
Upload your data set into an existing catalog
• Allows data consumers to easily find new data sets
• Common LD catalogs are:
– The Linking Open Data Cloud
![Page 57: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/57.jpg)
Validating Data Sets
There are different ways to validate the published RDF data set:
Accessibility
• Vapour – Performs two types of tests: without content negotiation and requesting RDF/XML content
http://validator.linkeddata.org/vapour
• URI Debugger – Retrieves the HTTP responses obtained when accessing a URI
http://linkeddata.informatik.hu-berlin.de/uridbg/
Parsing & Syntax
• RDF Triple-Checker – Dereferences the namespaces associated with the resources used in the document
http://graphite.ecs.soton.ac.uk/checker/
• W3C RDF/XML Validation Service – Evaluates the syntax of RDF/XML documents and displays the RDF triples they contain
http://www.w3.org/RDF/Validator/
• W3C Markup Validation Service – Checks the syntactic correctness of web documents with RDFa markup
http://validator.w3.org/
General validators
• RDF:ALERTS – Validates syntax, undefined resources, datatypes and other types of errors
http://swse.deri.org/RDFAlerts/
![Page 58: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/58.jpg)
Validating Data Sets (2)
Example: Validating URIs with Vapour
Source: http://idi.fundacionctic.org/vapour
![Page 59: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/59.jpg)
Validating Data Sets (3)
Example: Validating URIs with Vapour
Source: http://idi.fundacionctic.org/vapour
![Page 60: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/60.jpg)
Validating Data Sets (4)
Example: Validating URIs with Vapour
Source: http://idi.fundacionctic.org/vapour
HTML content: http://dbpedia.org/page/The_Beatles
RDF document: http://dbpedia.org/data/The_Beatles.xml
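The two kinds of test Vapour performs (no content negotiation, and RDF/XML requested) can be mimicked with two HTTP requests that differ only in their `Accept` header. A rough sketch, not Vapour's actual implementation: the resource URI `http://dbpedia.org/resource/The_Beatles` is an assumption (the slide only shows the page and data URLs), and both helper names are hypothetical. No request is sent.

```python
from urllib.request import Request


def vapour_style_probes(uri):
    """Return two unsent requests: one plain, one asking for RDF/XML."""
    plain = Request(uri)
    rdf = Request(uri, headers={"Accept": "application/rdf+xml"})
    return plain, rdf


def describe_hop(status, content_type="", location=""):
    """Describe one response in a dereferencing chain in words."""
    if status in (301, 302, 303, 307) and location:
        return "redirect to " + location
    if status == 200 and content_type.startswith("application/rdf+xml"):
        return "RDF document"
    if status == 200 and content_type.startswith("text/html"):
        return "HTML content"
    return "unexpected response (status %d)" % status


plain, rdf = vapour_style_probes("http://dbpedia.org/resource/The_Beatles")
print(describe_hop(303, location="http://dbpedia.org/data/The_Beatles.xml"))
print(describe_hop(200, content_type="application/rdf+xml; charset=UTF-8"))
```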
![Page 61: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/61.jpg)
PROVIDING LINKED DATA: CHECKLIST
![Page 62: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/62.jpg)
Providing Linked Data: Checklist (1)
Creating Linked Data
o Were all the relevant entities/concepts effectively extracted from the raw data?
o Are all the created URIs dereferenceable?
o Are you reusing terms from widely accepted vocabularies?
![Page 63: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/63.jpg)
Providing Linked Data: Checklist (2)
Interlinking Linked Data
o Is the data set linked to other RDF data sets?
o Are the created vocabulary terms linked to other vocabularies?
![Page 64: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/64.jpg)
Providing Linked Data: Checklist (3)
Publishing Linked Data
o Do you provide data set metadata?
o Do you provide information about licensing?
o Do you provide additional access methods?
o Is the data set available in LD catalogs?
o Did the data set pass the validation tests?
![Page 65: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/65.jpg)
Summary
In this chapter we studied:
• The Linked Data lifecycle:
– 3 core tasks: creating, interlinking and publishing
• Creation of Linked Data:
– Extracting relevant data, using URIs to name entities, selecting vocabularies and expressing the data using the RDF data model
• Interlinking Linked Data:
– Challenges of link discovery, using Silk to create links between two data sets and using SKOS links
• Publishing Linked Data:
– Creation of data set metadata; publishing the data set via RDF dumps, SPARQL endpoints or RDFa; using RDFa and schema.org to enrich search results; and uploading the data set to an LD catalog
![Page 66: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/66.jpg)
The Web & Linked Data
• Linked Data catalogs
• Applications
![Page 67: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/67.jpg)
CKAN
• CKAN is an open source platform for developing data set catalogs
• It implements useful tools that support data publishers in:
– Data harvesting
– Creation of metadata
– Access mechanisms to the data set
– Updating the data set
– Monitoring the access to the data set
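Registering a data set in a CKAN catalog can also be done through CKAN's HTTP action API. The sketch below prepares, but does not send, a `package_create` call. The site address `http://ckan.example.org`, the API key and the dataset values are placeholders, and `ckan_package_create_request` is a hypothetical helper name; check the API documentation of the target CKAN instance before relying on it.

```python
import json
from urllib.request import Request


def ckan_package_create_request(site_url, api_key, dataset):
    """Build an unsent POST request for CKAN's package_create action."""
    body = json.dumps(dataset).encode("utf-8")
    return Request(
        site_url.rstrip("/") + "/api/3/action/package_create",
        data=body,
        headers={"Authorization": api_key, "Content-Type": "application/json"},
        method="POST",
    )


# Placeholder metadata for an RDF dump; every value here is made up.
dataset = {
    "name": "example-music-charts",
    "title": "Example music chart data",
    "notes": "Chart positions published as Linked Data.",
    "resources": [
        {"url": "http://example.org/dumps/charts.nt", "format": "N-Triples"}
    ],
}

request = ckan_package_create_request("http://ckan.example.org/", "MY-API-KEY", dataset)
print(request.get_method(), request.full_url)
```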
![Page 68: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/68.jpg)
CKAN (2)
Source: http://ckan.org
![Page 69: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/69.jpg)
CKAN (3)
Source: http://ckan.org
![Page 70: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/70.jpg)
The Data Hub
• The Data Hub is a community-run data catalog which contains more than 5,000 data sets [1]
• “(…) is an openly editable open data catalogue, in the style of Wikipedia”. [2]
• It is implemented on top of the CKAN platform
• Allows the creation of groups:
– The Linking Open Data Cloud group exclusively contains Linked Data sets
[1] According to the information presented in the portal in March 2013
[2] Source: http://datahub.io/about
![Page 71: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/71.jpg)
The Data Hub (2)
Source: http://datahub.io/
![Page 72: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/72.jpg)
The Data Hub (3)
Source: http://datahub.io/
![Page 73: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/73.jpg)
The Linking Open Data Cloud
September 2011
Source: Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch
![Page 74: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/74.jpg)
The Linking Open Data Cloud
How to publish an RDF data set in this cloud?
1. The data set must follow the Linked Data principles
2. The data set must contain at least 1,000 RDF triples
3. The data set must contain at least 50 RDF links to a data set that is already in the diagram
4. Access to the data set must be provided
Once these criteria are met, the data publisher must add the data set to the Data Hub catalog, and contact the administrators of the Linking Open Data Cloud group
Source: http://lod-cloud.net/
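The numeric criteria above are easy to check before contacting the group administrators. A small sketch that encodes the four criteria as listed on this slide; the function name, its parameters and the sample numbers are illustrative assumptions, and the two non-numeric criteria are simply passed in as booleans.

```python
def lod_cloud_issues(triple_count, links_per_target, accessible, follows_principles):
    """Return a list of unmet LOD Cloud criteria (empty if all are met).

    links_per_target maps each already-listed target data set to the
    number of RDF links pointing at it.
    """
    issues = []
    if not follows_principles:
        issues.append("data set does not follow the Linked Data principles")
    if triple_count < 1000:
        issues.append("fewer than 1,000 RDF triples")
    # At least one target data set must receive 50 or more links.
    if max(links_per_target.values(), default=0) < 50:
        issues.append("fewer than 50 RDF links to any data set in the diagram")
    if not accessible:
        issues.append("no access to the data set is provided")
    return issues


print(lod_cloud_issues(5000, {"dbpedia": 120}, True, True))
print(lod_cloud_issues(800, {"dbpedia": 10, "musicbrainz": 20}, True, True))
```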
![Page 75: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/75.jpg)
Linked Data & Search Engines
• Search engines collect information about web resources in order to produce richer search results and improve how those results are displayed
• This is only possible if the search engines are able to understand the content of the web pages
• The HTML pages must be annotated with machine-readable content that describes them, using:
– A markup format
– A vocabulary
![Page 76: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/76.jpg)
RDFa for marking up data
• RDFa is used to provide (semi-)structured Linked Data embedded in web content
• Examples:
– Some search engines use RDFa, e.g., Google, Yahoo! and Bing
– Facebook’s Open Graph is based on RDFa
![Page 77: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/77.jpg)
Google Rich Snippets
• Embedding semantics via RDFa (or microformats/microdata) enhances search results:
![Page 78: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/78.jpg)
Google Rich Snippets (2)
![Page 79: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/79.jpg)
Schema.org
• Collection of schemas/vocabularies for marking up HTML pages
• It is recognized by Bing, Google, Yahoo! and Yandex
• Covers a wide range of knowledge domains
• It also offers an extension mechanism in case the publisher is interested in adding new concepts to the vocabularies
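To make the combination of markup format and vocabulary concrete, the sketch below generates a small HTML fragment annotated with RDFa Lite attributes (`vocab`, `typeof`, `property`) and schema.org terms. The choice of the `MusicGroup` type with the `name` and `url` properties is ours for illustration, and `music_group_rdfa` is a hypothetical helper.

```python
from html import escape


def music_group_rdfa(name, page_url):
    """Return an HTML fragment describing a band with RDFa Lite + schema.org."""
    return (
        '<div vocab="http://schema.org/" typeof="MusicGroup">\n'
        '  <span property="name">%s</span>\n'
        '  <a property="url" href="%s">Artist page</a>\n'
        "</div>"
    ) % (escape(name), escape(page_url, quote=True))


snippet = music_group_rdfa(
    "The Beatles",
    "http://musicbrainz.org/artist/b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
)
print(snippet)
```

Escaping the values keeps the fragment well-formed even when a name contains characters such as `&` or `<`.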
![Page 80: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/80.jpg)
Schema.org (2)
The vocabularies cover the following topics:
Source: http://schema.org/docs/schemas.html
“The world is too rich, complex and interesting for a single schema to describe fully on its own. With schema.org we aim to find a balance, by providing a core schema that covers lots of situations, alongside extension mechanisms for extra detail.” (Dan Brickley, schema.org)
![Page 81: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/81.jpg)
Schema.org (3)
Integrates (or aligns with) existing vocabularies where appropriate, e.g. rNews
Source: http://schema.org/Article
![Page 82: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/82.jpg)
Google Knowledge Graph
• Users can find answers to their queries without browsing pages
• Provides detailed information
![Page 83: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/83.jpg)
Google Knowledge Graph (2)
• Google Search results include structured data from Freebase
• Might disambiguate search terms
![Page 84: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/84.jpg)
Freebase
• Knowledge base of structured data
• Data is stored as a graph
• Describes data from different domains
![Page 85: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/85.jpg)
Bing Snapshot
• Provides structured data related to the search term
• Includes a significant number of entities from more domains
• Connects data from LinkedIn
• It is powered by the graph engine Trinity.RDF
![Page 86: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/86.jpg)
Bing Snapshot (2)
![Page 87: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/87.jpg)
Open Graph Protocol
• It was originally created by Facebook
• Allows describing web content as graph objects, establishing connections between people and objects
• The descriptions are embedded in the web page as RDFa data
• Supports the description of several domains: basic metadata, music, video, articles, books, websites and user profiles
Source: http://ogp.me/
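Open Graph descriptions are written as `<meta>` elements whose `property` attribute names an `og:` term. The sketch below renders such elements from a dictionary and insists on the four basic properties the protocol lists as required (`og:title`, `og:type`, `og:image`, `og:url`). The album values and example.org addresses are made up, and `open_graph_meta` is a hypothetical helper.

```python
from html import escape

REQUIRED = ("og:title", "og:type", "og:image", "og:url")


def open_graph_meta(properties):
    """Render Open Graph properties as HTML <meta> elements."""
    missing = [name for name in REQUIRED if name not in properties]
    if missing:
        raise ValueError("missing required properties: " + ", ".join(missing))
    return "\n".join(
        '<meta property="%s" content="%s" />' % (escape(key), escape(value, quote=True))
        for key, value in properties.items()
    )


album = {
    "og:title": "Abbey Road",
    "og:type": "music.album",
    "og:image": "http://example.org/covers/abbey-road.jpg",
    "og:url": "http://example.org/albums/abbey-road",
}
print(open_graph_meta(album))
```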
![Page 88: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/88.jpg)
Open Graph Protocol (2)
Who is using Open Graph protocol?
Consumers: Google, Mixi
Publishers: IMDb, Microsoft, NHL, Posterous, Rotten Tomatoes, TIME
Source: http://ogp.me/
![Page 89: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/89.jpg)
Open Graph Protocol (3)
• Facebook expands the vocabulary of relationships beyond “friendship” and “like”: more actions!
Source: https://developers.facebook.com/docs/opengraph/
![Page 90: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/90.jpg)
Open Graph Protocol & Facebook
List of domains and actions
General: Like, Recommend, Follow
Music: Listen, Create a playlist
Movies & TV: Watch, Rate, Wants to watch
Games: Achieve, High score
Fitness: Bike, Run, Walk
Book: Rate, Read, Quote, Wants to read
Source: https://developers.facebook.com/docs/opengraph/
How can we exploit these links and relationships?
![Page 91: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/91.jpg)
Facebook Graph Search
• Focuses on people and their interests, exploiting how everything is related to everything else
• Queries are specified using natural language
• Takes advantage of context and suggests possible queries
• Allows for building more complex (expressive) queries that are not possible with normal search:
– For example, “music liked by me and friends who live in my city”
![Page 92: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/92.jpg)
Facebook Graph Search (2)
Context (information from profile):
Graph search suggestions:
![Page 93: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/93.jpg)
Facebook Graph Search (3)
Results
![Page 94: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/94.jpg)
Facebook Graph Search (4)
Observations
• Allows for conjunctive queries (applying a filter over intermediate results = “apply operator”)
• Disjunctive queries are not supported:
– For example: “My friends who like SemanticWeb.com OR ReadWrite”
• Post search is not supported:
– It is not possible to search in post content submitted to the timeline
• User privacy settings affect the results
![Page 95: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/95.jpg)
Tools for providing Linked Data
• Extracting data from spreadsheets: OpenRefine
• Extracting data from RDBMS: R2RML
• Extracting data from text: Zemanta, OpenCalais, GATE
• Interlinking data sets: Silk
![Page 96: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/96.jpg)
EXTRACTING DATA FROM SPREADSHEETS WITH OPENREFINE
![Page 97: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/97.jpg)
Integrate Chart Data
• Task: Integrate the latest chart information into your RDF database.
• Data may be available in non-RDF formats:
– Plain text
– CSV, TSV, separator-based files
– HTML tables
– Spreadsheets (OpenDocument, Excel, …)
– XML
– JSON
– …
[Diagram labels: Data acquisition (CSV/TSV, HTML, Spreadsheets, JSON); Integrated Data Set (Interlinking, Cleansing, Vocabulary Mapping); Publishing; LD Data set; Access (SPARQL Endpoint)]
![Page 98: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/98.jpg)
Example Data
CSV
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million

HTML tables
Source: http://en.wikipedia.org/wiki/List_of_best-selling_music_artists

| Artist | Country of origin | Period active | Release year of first charted record | Total certified units (from available markets) |
| --- | --- | --- | --- | --- |
| The Beatles | United Kingdom | 1960–1970 | 1962 | 250 million |
| Elvis Presley | United States | 1954–1977 | 1954 | 203.3 million |
| Michael Jackson | United States | 1964–2009 | 1971 | 157.4 million |
| Madonna | United States | 1979–present | 1982 | 160.1 million |
| Led Zeppelin | United Kingdom | 1968–1980 | 1969 | 135.5 million |
| Queen | United Kingdom | 1971–present | 1973 | 90.5 million |

JSON
{
  "artist": { "class": "artist", "name": "The Beatles" },
  "rank": 1,
  "value": "250 million"
}, …
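Before the CSV sample can be loaded into an RDF store, values such as "203.3 million" have to be turned into plain integers. A minimal sketch of that cleaning step in Python (OpenRefine itself would do this with its own transformations); the helper name `parse_units` is hypothetical, and `Decimal` is used to avoid floating-point rounding.

```python
import csv
import io
from decimal import Decimal

CSV_TEXT = """The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million"""


def parse_units(text):
    """Convert a string like '203.3 million' into an integer unit count."""
    number, unit = text.strip().split()
    if unit != "million":
        raise ValueError("unexpected unit: " + unit)
    return int(Decimal(number) * 1_000_000)


rows = [
    (artist.strip(), parse_units(units))
    for artist, units in csv.reader(io.StringIO(CSV_TEXT))
]
for artist, units in rows:
    print(artist, units)
```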
![Page 99: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/99.jpg)
OpenRefine
Quick Facts
• Transforms and cleans messy input data sets
• Is an open-source successor of Google Refine
• Allows for entity reconciliation against SPARQL endpoints or RDF data
• Is extended with plugins that enhance its functionality, e.g. for RDF support
![Page 100: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/100.jpg)
Use of OpenRefine
1. Messy input data is imported, transformed into a table representation and cleaned.
2. Entity reconciliation is applied to allow for interlinking with existing data sets.
3. The structure of the RDF output is defined.
4. The data is exported into some RDF syntax.
CSV
The Beatles, 250 million
Elvis Presley, 203.3 million
Michael Jackson, 157.4 million
Madonna, 160.1 million
Led Zeppelin, 135.5 million
Queen, 90.5 million
RDF
musicbrainz:b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d :totalSales "250000000"^^xsd:int .
musicbrainz:01809552-4f87-45b0-afff-2c6f0730a3be :totalSales "203300000"^^xsd:int .
musicbrainz:f27ec8db-af05-4f36-916e-3d57f91ecf5e :totalSales "157400000"^^xsd:int .
musicbrainz:79239441-bfd5-4981-a70c-55c3f15c1287 :totalSales "160100000"^^xsd:int .
musicbrainz:678d88b2-87b0-403b-b63d-5da7465aecc3 :totalSales "135500000"^^xsd:int .
musicbrainz:0383dadf-2a4e-4d10-a46a-e9e041da8eb3 :totalSales "90500000"^^xsd:int .
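The export step can be pictured as filling one triple pattern per table row. The sketch below writes N-Triples for rows that are already cleaned and reconciled. The property IRI `http://example.org/vocab#totalSales` is a placeholder for the slide's `:totalSales` (whose namespace is not shown), the subject pattern follows the `http://musicbrainz.org/artist/{gid}#_` form used elsewhere in this deck, and `sales_triple` is a hypothetical helper.

```python
XSD_INT = "http://www.w3.org/2001/XMLSchema#int"
# Placeholder namespace: the slides do not show what ":" expands to.
TOTAL_SALES = "http://example.org/vocab#totalSales"


def sales_triple(mbid, units):
    """Return one N-Triples statement linking an artist to a unit count."""
    # xsd:int is a 32-bit type, so reject values it cannot represent.
    if not -2147483648 <= units <= 2147483647:
        raise ValueError("value does not fit xsd:int: %d" % units)
    subject = "http://musicbrainz.org/artist/%s#_" % mbid
    return '<%s> <%s> "%d"^^<%s> .' % (subject, TOTAL_SALES, units, XSD_INT)


reconciled_rows = [
    ("b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d", 250000000),  # The Beatles
    ("0383dadf-2a4e-4d10-a46a-e9e041da8eb3", 90500000),   # Queen
]
for mbid, units in reconciled_rows:
    print(sales_triple(mbid, units))
```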
![Page 101: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/101.jpg)
Data Transformation
Typical steps:
• Group and explore data items
• Delete columns or rows based on a filter condition
• Split columns into several columns based on a condition
• Modify messy data items with GREL, a powerful expression language
• Replay steps from a previous Refine project
![Page 102: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/102.jpg)
How to Generate RDF?
• Additional problem: data needs to be interlinked with existing MusicBrainz data
• This is the point where plugins come into play:
– RDF Refine: developed by DERI
– An extension of OpenRefine to support RDF
![Page 103: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/103.jpg)
Core Capabilities
• Interlinking of data by entity reconciliation
– Against SPARQL endpoints, RDF dumps
– Discovery of relevant RDF data sets
• RDF export with the help of RDF skeletons
– Define the vocabulary and graph structure of the RDF serialization
– In Turtle, RDF/XML
![Page 104: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/104.jpg)
Entity Reconciliation
Typical steps:
• Define a reconciliation service
• Select specific types to reconcile against
• Start reconciling a column against the service
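At its core, reconciling a column means matching each cell against the labels of known entities and keeping the identifier of the best match. The toy sketch below does this with fuzzy string matching from Python's standard library. It is not how RDF Refine or any real reconciliation service is implemented; the candidate table is a hand-made stand-in for such a service, using artist identifiers that appear in this deck, and the 0.8 cutoff is an arbitrary assumption.

```python
import difflib

# Stand-in for a reconciliation service: label -> MusicBrainz artist id.
CANDIDATES = {
    "The Beatles": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d",
    "Elvis Presley": "01809552-4f87-45b0-afff-2c6f0730a3be",
    "Queen": "0383dadf-2a4e-4d10-a46a-e9e041da8eb3",
}


def reconcile(cell, candidates, cutoff=0.8):
    """Return the id of the closest-matching label, or None if none is close."""
    by_key = {label.casefold(): label for label in candidates}
    matches = difflib.get_close_matches(
        cell.strip().casefold(), list(by_key), n=1, cutoff=cutoff
    )
    if not matches:
        return None
    return candidates[by_key[matches[0]]]


for cell in ["the beatles ", "Elvis Presly", "Nirvana"]:
    print(repr(cell), "->", reconcile(cell, CANDIDATES))
```

Cells that return `None` are the ones a person would have to review by hand.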
![Page 105: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/105.jpg)
Define RDF Skeletons
• An RDF skeleton defines the structure of the RDF triples that are exported
![Page 106: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/106.jpg)
RDF Skeletons
![Page 107: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/107.jpg)
EXTRACTING DATA FROM RDBMS WITH R2RML
![Page 108: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/108.jpg)
W3C RDB2RDF
• Task: Integrate data from relational DBMS with Linked Data
• Approach: map from relational schema to semantic vocabulary with R2RML
• Publishing: two alternatives
– Translate SPARQL into SQL on the fly
– Batch transform data into RDF, index and provide SPARQL access in a triplestore
[Diagram labels: Relational DBMS; R2RML Engine; Data acquisition; Integrated Data in Triplestore (Interlinking, Cleansing, Vocabulary Mapping); Publishing; LD Data set; Access (SPARQL Endpoint)]
![Page 109: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/109.jpg)
W3C RDB2RDF
• In 2012 the W3C published two Recommendations for mapping between relational databases and RDF:
– Direct Mapping exposes the relational data directly as RDF
  • No allowance for vocabulary mapping
  • No allowance for interlinking (unless URIs are used in the relational data)
  • Not appropriate for this topic
– R2RML, the RDB to RDF Mapping Language
  • Allows vocabulary mapping (subject, predicate and object maps with class options)
  • Allows interlinking: URIs can be constructed
  • A means to provide MusicBrainz RDF/SPARQL itself
http://www.w3.org/2001/sw/rdb2rdf/
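Why Direct Mapping leaves no room for vocabulary mapping becomes visible once a row is translated: every IRI is derived mechanically from table and column names. The sketch below is a heavily simplified reading of the Direct Mapping idea (it ignores foreign keys, datatypes, tables without primary keys and the exact encoding rules). The base IRI `http://example.org/db/` and the function name are assumptions.

```python
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def direct_map_row(base, table, primary_key, row):
    """Translate one table row into (subject, predicate, object) tuples."""
    table_iri = base + quote(table)
    # The row IRI is built from the table name and the primary key value.
    subject = "%s/%s=%s" % (table_iri, quote(primary_key), quote(str(row[primary_key])))
    triples = [(subject, RDF_TYPE, table_iri)]
    for column, value in row.items():
        if value is not None:  # NULL columns produce no triple
            triples.append((subject, "%s#%s" % (table_iri, quote(column)), value))
    return triples


row = {"id": 1, "gid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d", "comment": None}
for triple in direct_map_row("http://example.org/db/", "artist", "id", row):
    print(triple)
```

The predicates come out as `…/artist#gid` rather than terms from the Music Ontology, which is the limitation R2RML addresses.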
![Page 110: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/110.jpg)
MusicBrainz Next Gen Schema
• Artist: as pre-NGS, but with further attributes
• Artist Credit: allows joint credit
• Release Group: cf. ‘album’, versus:
• Release
• Medium
• Track
• Track List
• Work
• Recording
Source: https://wiki.musicbrainz.org/Next_Generation_Schema
![Page 111: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/111.jpg)
Music Ontology
• OWL ontology with the following core concepts (classes) and relationships (properties):
Source: http://musicontology.com
![Page 112: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/112.jpg)
R2RML Class Mapping
• Mapping tables to classes is ‘easy’:
# One mo:MusicArtist per row of the "artist" table
lb:Artist a rr:TriplesMap ;
  rr:logicalTable [rr:tableName "artist"] ;
  rr:subjectMap [rr:class mo:MusicArtist ;
    rr:template "http://musicbrainz.org/artist/{gid}#_"] ;
  rr:predicateObjectMap [rr:predicate mo:musicbrainz_guid ;
    rr:objectMap [rr:column "gid" ; rr:datatype xsd:string]] .
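What an R2RML engine does with this triples map for a single row can be imitated in a few lines: expand the subject template with the row's `gid`, then emit the class triple and the `mo:musicbrainz_guid` triple. This is a sketch of the idea, not an R2RML processor. The percent-encoding via `quote` only approximates R2RML's IRI-safe encoding, and both function names are hypothetical.

```python
import re
from urllib.parse import quote

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MO = "http://purl.org/ontology/mo/"
ARTIST_TEMPLATE = "http://musicbrainz.org/artist/{gid}#_"


def expand_template(template, row):
    """Replace each {column} placeholder with the encoded column value."""
    return re.sub(
        r"\{(\w+)\}",
        lambda match: quote(str(row[match.group(1)]), safe=""),
        template,
    )


def map_artist_row(row):
    """Emit the triples the class mapping describes for one artist row."""
    subject = expand_template(ARTIST_TEMPLATE, row)
    return [
        (subject, RDF_TYPE, MO + "MusicArtist"),
        (subject, MO + "musicbrainz_guid", row["gid"]),
    ]


for triple in map_artist_row({"gid": "b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d"}):
    print(triple)
```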
![Page 113: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/113.jpg)
R2RML Property Mapping
• Mapping columns to properties can be easy:
# Artist names live in a separate table, so the logical table is a join
lb:artist_name a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT artist.gid, artist_name.name
       FROM artist
         INNER JOIN artist_name ON artist.name = artist_name.id"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap [rr:predicate foaf:name ;
    rr:objectMap [rr:column "name"]] .
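The SQL query in this mapping can be tried out on a toy database. The sketch below recreates just the two columns per table that the query touches in an in-memory SQLite database, runs the same join, and turns each result row into a `foaf:name` triple. The table contents are made up, the `ORDER BY` is added only to make the output deterministic, and the subject pattern assumes that `lb:sm_artist` (not shown on the slide) uses the same template as the class mapping.

```python
import sqlite3

FOAF_NAME = "http://xmlns.com/foaf/0.1/name"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist_name (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT, name INTEGER);
    INSERT INTO artist_name VALUES (1, 'The Beatles'), (2, 'Queen');
    INSERT INTO artist VALUES
        (10, 'b10bbbfc-cf9e-42e0-be17-e2c3e1d2600d', 1),
        (11, '0383dadf-2a4e-4d10-a46a-e9e041da8eb3', 2);
""")

# Same join as in the mapping; ORDER BY added for a stable result order.
QUERY = """
    SELECT artist.gid, artist_name.name
    FROM artist
      INNER JOIN artist_name ON artist.name = artist_name.id
    ORDER BY artist.id
"""

triples = [
    ("http://musicbrainz.org/artist/%s#_" % gid, FOAF_NAME, name)
    for gid, name in conn.execute(QUERY)
]
for triple in triples:
    print(triple)
```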
![Page 114: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/114.jpg)
NGS Advanced Relations
• Major entities (Artist, Release Group, Track, etc.) plus URL are paired (l_artist_artist)
• Each pairing of instances refers to a Link
• Links have types (cf. RDF properties) and attributes
Source: http://wiki.musicbrainz.org/Advanced_Relationship
![Page 115: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/115.jpg)
R2RML Advanced Mapping
• Mapping advanced relationships (SQL joins):
# Band membership: artist-to-artist links of the given type that have not ended
lb:artist_member a rr:TriplesMap ;
  rr:logicalTable [rr:sqlQuery
    """SELECT a1.gid, a2.gid AS band
       FROM artist a1
         INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
         INNER JOIN link ON l_artist_artist.link = link.id
         INNER JOIN link_type ON link.link_type = link_type.id
         INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
       WHERE link_type.gid = '5be4c609-9afa-4ea0-910b-12ffb71e3821'
         AND link.ended = FALSE"""] ;
  rr:subjectMap lb:sm_artist ;
  rr:predicateObjectMap [rr:predicate mo:member_of ;
    rr:objectMap [rr:template "http://musicbrainz.org/artist/{band}#_" ;
      rr:termType rr:IRI]] .
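The four-way join is easier to follow on a toy database. The sketch below builds minimal versions of the tables involved in an in-memory SQLite database, with only the columns the query needs, and derives `mo:member_of` triples from links that have not ended. All rows and the `member-…`/`band-…` identifiers are made up, `ended` is stored as 0/1 for SQLite, and the subject pattern is assumed to match the template used for the object.

```python
import sqlite3

MO_MEMBER_OF = "http://purl.org/ontology/mo/member_of"
MEMBER_OF_BAND = "5be4c609-9afa-4ea0-910b-12ffb71e3821"  # link type gid from the slide

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE artist (id INTEGER PRIMARY KEY, gid TEXT);
    CREATE TABLE link_type (id INTEGER PRIMARY KEY, gid TEXT);
    CREATE TABLE link (id INTEGER PRIMARY KEY, link_type INTEGER, ended INTEGER);
    CREATE TABLE l_artist_artist (
        id INTEGER PRIMARY KEY, link INTEGER, entity0 INTEGER, entity1 INTEGER);
    INSERT INTO artist VALUES (1, 'member-1'), (2, 'member-2'), (3, 'band-1');
    INSERT INTO link_type VALUES (1, '5be4c609-9afa-4ea0-910b-12ffb71e3821');
    INSERT INTO link VALUES (1, 1, 0), (2, 1, 1);  -- link 2 has ended
    INSERT INTO l_artist_artist VALUES (1, 1, 1, 3), (2, 2, 2, 3);
""")

QUERY = """
    SELECT a1.gid, a2.gid AS band
    FROM artist a1
      INNER JOIN l_artist_artist ON a1.id = l_artist_artist.entity0
      INNER JOIN link ON l_artist_artist.link = link.id
      INNER JOIN link_type ON link.link_type = link_type.id
      INNER JOIN artist a2 ON l_artist_artist.entity1 = a2.id
    WHERE link_type.gid = ? AND link.ended = 0
"""


def artist_iri(gid):
    """Expand the artist IRI template for one gid."""
    return "http://musicbrainz.org/artist/%s#_" % gid


triples = [
    (artist_iri(member), MO_MEMBER_OF, artist_iri(band))
    for member, band in conn.execute(QUERY, (MEMBER_OF_BAND,))
]
print(triples)
```

Only the first member yields a triple, because the second membership link is marked as ended.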
![Page 116: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/116.jpg)
EXTRACTING DATA FROM TEXT
![Page 117: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/117.jpg)
OpenCalais
• Not easily customised/extended
• Domain-specific coverage varies
Source: http://viewer.opencalais.com/
![Page 118: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/118.jpg)
DBpedia Spotlight
• Not easily customised/extended
• Is currently only available for English
Source: http://dbpedia-spotlight.github.com/demo/
http://dbpedia.org/page/Slowcore
http://dbpedia.org/page/Dorothy_Parker
![Page 119: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/119.jpg)
Zemanta
• Common problem with general-purpose, open-domain semantic annotation tools
• Best results require bespoke customisation
Source: http://www.zemanta.com/demo/
![Page 120: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/120.jpg)
GATE
• General Architecture for Text Engineering
• Free open-source (LGPL) framework and development environment
• Started 1996, large developer community
• Used worldwide by many organisations to build bespoke solutions, e.g. Press Association and the National Archive
• Information Extraction in many languages
http://www.gate.ac.uk/
![Page 121: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/121.jpg)
GATE Example - LODIE
• Increases recall over DBpedia by deriving new lexicalisations for URIs from link anchor texts, disambiguation pages, and redirect pages
![Page 122: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/122.jpg)
Precision and Recall
• Generic services typically have very low recall
• Combination is one solution
• Other solution is custom extraction

Values are given as precision / recall:

| Tool            | PER         | LOC         | ORG         | TOTAL       |
|-----------------|-------------|-------------|-------------|-------------|
| DB Spotlight    | 0.97 / 0.40 | 0.82 / 0.46 | 0.86 / 0.31 | 0.85 / 0.39 |
| Zemanta         | 0.96 / 0.84 | 0.89 / 0.62 | 0.82 / 0.57 | 0.90 / 0.68 |
| LODIE           | 0.81 / 0.82 | 0.73 / 0.76 | 0.56 / 0.59 | 0.71 / 0.74 |
| Zemanta ∩ LODIE | 1.00 / 0.74 | 0.95 / 0.45 | 0.97 / 0.42 | 0.97 / 0.54 |
| Zemanta ∪ LODIE | 0.94 / 0.93 | 0.77 / 0.76 | 0.72 / 0.71 | 0.82 / 0.81 |
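The effect of combining two annotators can be sketched in Python with toy data. The gold standard and the two tool outputs below are hypothetical and are not the evaluation data behind the figures on this slide. The sketch only illustrates why taking the intersection of two outputs tends to raise precision, while taking the union tends to raise recall.

```python
def precision_recall(predicted, gold):
    """Return (precision, recall) of a set of predicted annotations."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations: letters a-d are correct, x and y are spurious.
gold = {"a", "b", "c", "d"}
tool_a = {"a", "b", "x"}
tool_b = {"b", "c", "y"}

for name, predicted in [
    ("tool A", tool_a),
    ("tool B", tool_b),
    ("A ∩ B", tool_a & tool_b),
    ("A ∪ B", tool_a | tool_b),
]:
    precision, recall = precision_recall(predicted, gold)
    print(f"{name}: precision={precision:.2f} recall={recall:.2f}")
```

In this toy case the intersection keeps only the annotation both tools agree on, so nothing spurious survives but most of the gold set is missed; the union recovers more of the gold set at the cost of keeping both tools' errors.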
![Page 123: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/123.jpg)
Custom GATE Gazetteer
• Retrieve MusicBrainz entity/label/class with SPARQL query
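As a rough illustration of the step from query results to a gazetteer list, the Python sketch below turns entity/label/class rows into one text line per entry. Everything here is an assumption rather than the tutorial's actual pipeline: the rows are hypothetical, the feature names `inst` and `class` are made up for illustration, and the line layout (label followed by separator-delimited name=value features) should be checked against the GATE gazetteer documentation before use.

```python
def gazetteer_lines(rows, separator="\t"):
    """Format (entity, label, class) rows as gazetteer list entries.

    Assumed layout: <label><sep>inst=<entity IRI><sep>class=<class IRI>
    """
    lines = []
    for row in rows:
        label = row["label"].strip()
        if not label:
            # Skip entities without a usable label; they cannot be matched in text.
            continue
        lines.append(
            f"{label}{separator}inst={row['entity']}{separator}class={row['class']}"
        )
    return lines


# Hypothetical rows, shaped like the bindings of an entity/label/class query.
example_rows = [
    {
        "entity": "http://example.org/artist/1",
        "label": "Low",
        "class": "http://purl.org/ontology/mo/MusicGroup",
    },
    {"entity": "http://example.org/artist/2", "label": "  ", "class": "http://example.org/Unknown"},
]

print("\n".join(gazetteer_lines(example_rows)))
```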
![Page 124: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/124.jpg)
GATECloud
• Custom GATE pipelines (e.g. based around a custom gazetteer) can be executed on the cloud
![Page 125: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/125.jpg)
INTERLINKING DATA SETS WITH SILK
![Page 126: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/126.jpg)
Interlinking with Silk
• Task: Create links between the data set and external Linked Data sources.
• Approach: Creation of specified links by querying the target data sets
• Alternatives:
  – Manual creation of linkage rules by the user
  – Automatic learning of linkage rules by submitting predefined SPARQL queries
[Figure: Linked Data lifecycle diagram with the stages Data acquisition (CSV/TSV, HTML, Spreadsheets, JSON), Integrated Data Set (Interlinking, Cleansing, Vocabulary Mapping), SPARQL Endpoint, Publishing, LD Data set, Access]
![Page 127: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/127.jpg)
Link Discovery with Silk
• Open source tool for discovering RDF links between data items within different Linked Data sources
• It is based on the Silk Link Specification Language (Silk-LSL) for expressing linkage rules
• It accesses the target RDF data sets via SPARQL endpoints to generate RDF links
Source: Robert Isele. “LOD2 Webinar Series: Silk”
![Page 128: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/128.jpg)
Silk Variants
• Silk Single Machine
  – Generates RDF links on a single machine
  – Data sets can reside either locally or on remote machines
  – Provides multithreading and caching
• Silk MapReduce
  – Uses a cluster composed of multiple machines
  – Based on Hadoop and designed to scale to big data sets
• Silk Server
  – Used within applications that consume Linked Data from the Web while keeping track of known entities
  – Provides an HTTP API for matching entities from an incoming stream
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
![Page 129: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/129.jpg)
Silk Workflow
Select LD data sets
• Identify suitable data sets in LD catalogs*
• Select the two data sets to link
Specify LD data sets
• Specify the access method to the data set (RDF dump, SPARQL endpoint)*
• Specify the entity types to be linked
Write linkage rule
• Specifies how to compare the resources
• Use Silk-LSL
• The rules can also be learnt
Generate RDF links
• Output links can be stored in a file or a triple store
• Can discover SKOS links
Silk framework
* See section “Publishing Linked Data”
Source: Silk workflow is partially based on “LOD2 Webinar Series: Silk - (Simplified) Linking Workflow” by Robert Isele.
![Page 130: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/130.jpg)
Linkage Rule Components
• Linkage rules define the conditions to create the links between the data sets. These rules are composed of:
1. RDF Paths
  • Describe the elements to be compared
  • Example: ?a/rdfs:label
2. Transformations
  • Apply transformations to the result set of an RDF path
  • Examples: LowerCase, Concatenate, Replace, …
3. Comparators
  • Compute the similarity of two inputs
  • Examples: String similarity metrics, Date similarity, …
4. Aggregations
  • Compute an aggregated value from multiple comparators
  • Examples: Min, Max, Avg, various means, Euclidean distance, …
Source: http://wifo5-03.informatik.uni-mannheim.de/bizer/silk/
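How the four components fit together can be sketched in plain Python. This is not Silk-LSL and does not use any Silk API: entities are hypothetical dictionaries, the "path" is a dictionary lookup, and the two comparators (exact match and a simple year similarity) are stand-ins chosen for brevity. Only the Min, Max and Avg aggregations from the list above are shown.

```python
from statistics import mean


def lower_case(value):
    """Transformation: normalise case before comparison."""
    return value.lower()


def equality(a, b):
    """Comparator: 1.0 for identical values, 0.0 otherwise."""
    return 1.0 if a == b else 0.0


def year_similarity(a, b, max_difference=10):
    """Comparator: 1.0 for the same year, falling linearly to 0.0."""
    return max(0.0, 1.0 - abs(a - b) / max_difference)


# Aggregations combine the scores of several comparators into one value.
AGGREGATIONS = {"min": min, "max": max, "avg": mean}


def score(source, target, aggregation="avg"):
    # "Paths": pick the values to compare from each entity.
    label_score = equality(
        lower_case(source["rdfs:label"]), lower_case(target["rdfs:label"])
    )
    year_score = year_similarity(source["year"], target["year"])
    return AGGREGATIONS[aggregation]([label_score, year_score])


# Hypothetical entities from two data sets.
source_entity = {"rdfs:label": "Low", "year": 1993}
target_entity = {"rdfs:label": "low", "year": 1995}

for aggregation_name in AGGREGATIONS:
    print(aggregation_name, score(source_entity, target_entity, aggregation_name))
```

Choosing Min makes the rule as strict as its weakest comparator, Max as lenient as its strongest, and Avg sits in between.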
![Page 131: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/131.jpg)
Silk Workbench
• Web application built on top of Silk, which allows the creation of projects to manage the creation of links between RDF data sets
• The data sets can be stored locally or accessed remotely by specifying the SPARQL endpoint
• The user is able to create customized linking tasks:
  – The tool offers a graphical editor to create linkage rules by combining the linkage rule components via drag & drop
  – Includes support for (automatic) learning of linkage rules
![Page 132: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/132.jpg)
Project configuration
Silk Workbench (2)
1. Project: name and components (data sources, linking tasks and output tasks)
2. Data sources: specification of the data sets to be interlinked
3. Linking task: specification of the linkage rules and type of links to be created
4. Output task: mechanism to store the results from the linking process
![Page 133: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/133.jpg)
Editing a linking task
Silk Workbench (3)
1. Linkage rule components
2. Graphical editor: the items from (1) are dragged & dropped in this area, and connected to compose the linkage rules
3. Generate links: based on the linkage rules defined in (2), the data sets are accessed to discover possible links
4. Learn: automatic learning of linkage rules
![Page 134: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/134.jpg)
Adding a linkage rule
Silk Workbench (4)
The previous linkage rule states:
1. Retrieve the foaf:name values from MusicBrainz and the rdfs:label from DBpedia
2. Apply lower case transformation to the output of (1)
3. Compare the output from (2) using the metric “Levenshtein distance”. If this distance is greater than 0.90, then create a link.
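The three steps of this rule can be sketched in Python. This is not how Silk implements the comparison. In particular, the sketch assumes that the 0.90 threshold applies to a normalised similarity score (1 minus the edit distance divided by the length of the longer string), since a raw edit distance is an integer; the function names and example strings are hypothetical.

```python
def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution or match
                )
            )
        previous = current
    return previous[-1]


def similarity(a, b):
    """Assumed normalisation: 1.0 for identical strings, 0.0 for no overlap."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def should_link(foaf_name, rdfs_label, threshold=0.90):
    # Step 2: lower-case both values; step 3: compare and apply the threshold.
    return similarity(foaf_name.lower(), rdfs_label.lower()) > threshold


print(should_link("The Beatles", "the beatles"))
print(should_link("Low", "Love"))
```

Because of the lower-case transformation, differences in capitalisation alone never prevent a link, while short names that differ by even one or two characters fall well below the threshold.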
![Page 135: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/135.jpg)
Generate Links
Silk Workbench (5)
![Page 136: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/136.jpg)
Learn Rules
Silk Workbench (6)
![Page 137: ESWC SS 2013 - Tuesday Tutorial 1 Maribel Acosta and Barry Norton: Providing Linked Data](https://reader035.vdocuments.us/reader035/viewer/2022081403/55508310b4c9051e5b8b47c5/html5/thumbnails/137.jpg)
For exercises, quiz and further material visit our website: http://www.euclid-project.eu
eBook Course
Other channels: @euclid_project, euclidproject, euclidproject