data in rdf
TRANSCRIPT
Applied Semantic WebTimely. Practical. Reliable.http://applied-semantic-web.org
Emanuele Della [email protected]://emanueledellavalle.org
Data in RDF
Emanuele Della Valle - http://applied-semantic-web.org
Share, Remix, Reuse — Legally
This work is licensed under the Creative Commons Attribution 3.0 Unported License.
Your are free:
• to Share — to copy, distribute and transmit the work
• to Remix — to adapt the work
Under the following conditions
• Attribution — You must attribute the work by inserting– “© applied-semantic-web.org” at the end of each reused slide– a credits slide stating: these slides are partially based on
“Data in RDF” by Emanuele Della Valle http://applied-semantic-web.org/2010/03/02_RDF.ppt
To view a copy of this license, visit http://creativecommons.org/licenses/by/3.0/
2
Emanuele Della Valle - http://applied-semantic-web.org 3
Data Interchange: RDF
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Looking for a flexible data model
Why• Application are always changing
(competitive environment) • People are always adding more features• Graceful evolution is important
Optimal: relational model• Relational model is remarkably flexible• Supports graceful evolution
– Change => Add another table– Existing queries are unaffected
• Easily accommodates new data – Without affecting existing queries
• Allows data to be easily combined ("joined") in new ways• 25+ years of relational database experience
4
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Resource Description Framework
The adaptation of the relational model to the Web give rise to RDF
From tuples to triples
Any relational data can be represented as triples• Row Key --> Subject• Column --> Property• Value --> Value
5
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Representing relational data in RDF (almost)
E.g., drug data
Represented in RDF (almost)
Drug Category Formula
BB.2 Anxiolytics
C16H21NO2
Drug Name
BB.2 Propranolol
BB.2 Propranololo
BB.2 プロプラノロール
BB.2
Anxiolytics C16H21NO2 Propranolol Propranololo プロプラノロール
CategoryFormula
Is a Drug
Legend
resource
literal
Name
6
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Representing relational data in RDF (almost)
Two important problems• Once out of the database internal ID (e.g., BB.2) becomes
useless• Once out of the database internal names of schema element
(e.g., Category) becomes useless as well
RDF solves it by using URI• Internal ID should be replaced by URI• Internal schema names should be replaced by URI• Values do (always) not need to be URI-fied
http://dbpedia.org/resource/Propranolol
http://dbpedia.org/resource/Category:Anxiolytics
C16H21NO2
http://www.w3.org/2004/02/skos/core#subject
http://dbpedia.org/ontology/formula
http://www.w3.org/2000/01/rdf-schema#label
http://dbpedia.org/ontology/Drug
http://www.w3.org/1999/02/22-rdf-syntax-ns#type
Legend
resource
literal
7
Propranolol
Propranololo
プロプラノロール
Emanuele Della Valle - http://applied-semantic-web.org
Which URI should we use?• Popular ones! Data merge will take place automatically!
RDF in a nutshell
Representing data in RDF Q/A 1/4
http://dbpedia.org/resource/Propranolol
http://dbpedia.org/resource/Category:Anxiolytics
http://www.w3.org/2004/02/skos/core#subject
+http://dbpedia.org/resource/Propranolol
http://dbpedia.org/ontology/knownFor
http://dbpedia.org/resource/Propranolol
http://dbpedia.org/resource/Category:Anxiolytics
http://www.w3.org/2004/02/skos/core#subject
=http://dbpedia.org/ontology/knownFor
8
http://dbpedia.org/resource/James_W._Black
http://dbpedia.org/resource/James_W._Black
Emanuele Della Valle - http://applied-semantic-web.org
Where do I find popular URIs?• A difficult question with no clear answer• The best place to keep an eye on is
– the Linking Open Data Project http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
– and in particular the following pages of the Wiki- Data Sets
http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/DataSets
- Semantic Web Search Engines http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/SemanticWebSearchEngines
- Common Vocabularies http://esw.w3.org/topic/TaskForces/CommunityProjects/LinkingOpenData/CommonVocabularies
• I normally use– Sindice API through sameas.org
- e.g. http://sameas.org/html?q=propranolol&x=0&y=0
RDF in a nutshell
Representing data in RDF Q/A 2/4
9
Emanuele Della Valle - http://applied-semantic-web.org
What is a value? When shall we URI-fy a value? • Literals cannot be used to merge different data set• E.g., what if we what to merge two resources based on their
labels?
– BRCA may refer to different thing on the Webe.g., try http://en.wikipedia.org/wiki/BRCA
• URI-fy any value that can be eventually used to merge different dataset and leave the other values as literals
RDF in a nutshell
Representing data in RDF Q/A 3/4
BRCA
http://www.w3.org/2000/01/rdf-schema#label
BRCA
http://www.w3.org/2000/01/rdf-schema#label
+ = ?
10
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Other data structure in RDF
Trees can be represented in RDF
Anything can be represented in RDF
11
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF
Three alternatives to write triples, in1. RDF/XML: Standard serialization in XML
<Description about=”subject”>
<property>value</property>
</Description>– e.g., http://dbpedia.org/data/Propranolol.rdf – Check-out the triples using http://www.w3.org/RDF/Validator/
2. NTriples: Simple (verbose) reference serialization (for specifications only)
<http://...subject> <http://...predicate> “value” .– http://www.w3.org/RDF/Validator/ARPServlet?URI=http%3A%2F%2Fdbpe
dia.org%2Fresource%2FPropranolol&PARSE=Parse+URI%3A+&TRIPLES_AND_GRAPH=PRINT_TRIPLES&FORMAT=PNG_EMBED
• N3 and Turtle: Developer-friendly serializations :subject :property “value” .– e.g. http://dbpedia.org/data/Propranolol.n3
12
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in Turtle - namespaces
URI terms can be abbreviated using namespaces@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .@prefix dbpedia: <http://dbpedia.org/resource/> .@prefix dbpedia-owl: <http://dbpedia.org/ontology/> .
dbpedia:Propranolol rdf:type dbpedia-owl:Drug .
<http://www.w3.org/1999/ 02/22-rdf-syntax-ns#type> = 'a'
dbpedia:Propranolol a dbpedia-owl:Drug .
13
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in Turtle - Literals
Literals: "Propranolol"• Literals with language tags: " プロプラノロール "@jp
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
dbpedia:Propranolol rdfs:label "Propranolol"@en . dbpedia:Propranolol rdfs:label "Propranololo"@it . dbpedia:Propranolol rdfs:label " プロプラノロール "@jp .
• Typed literals: "3.14"^^xsd:float
dbpedia:Propranolol dbpprop:molecularWeight "259.34"^^xsd:float .
14
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in Turtle - Convience Syntax
Abbreviating repeated subjects:dbpedia:Propranolol rdfs:label "Propranololo"@it . dbpedia:Propranolol dbpprop:molecularWeight "259.34"^^xsd:float .
... is the same as ...dbpedia:Propranolol rdfs:label "Propranololo"@it ; dbpprop:molecularWeight "259.34"^^xsd:float .
Abbreviating repeated subject/predicate pairs:dbpedia:Propranolol rdfs:label "Propranolol"@en . dbpedia:Propranolol rdfs:label "Propranololo"@it . dbpedia:Propranolol rdfs:label " プロプラノロール "@cn .
... is the same as ...dbpedia:Propranolol rdfs:label "Propranolol"@en , "Propranololo"@it , "プロプラノロール "@cn .
15
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML - basics
W3C standardized an RDF/XML syntax [1]
The basic idea is to insert an XML element for each node (sobject and value) and arch (predicate)
Es.
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#” xmlns:skos="http://www.w3.org/2004/02/skos/core#" <rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol">
<skos:subject> <rdf:Description
rdf:about="http://dbpedia.org/resource/Category:Anxiolytics"/> </skos:subject> </rdf:Description></rdf:RDF>
[1] RDF/XML Syntax Specification available at http://www.w3.org/TR/rdf-syntax-grammar/
dbpedia:Propranolol category:Anxiolytics
skos:subject
propertyelement
Root tag
16
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML - abbreviation
A possible abbreviation is using rdf:resource in the property elements
Es.<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#” xmlns:skos="http://www.w3.org/2004/02/skos/core#" <rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol">
<skos:subject rdf:resource="http://dbpedia.org/resource/Category:Anxiolytics"/> </rdf:Description></rdf:RDF>
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML - abbreviation
Encoding Literals• datatype
<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"><dbpprop:molecularWeight
rdf:datatype="http://www.w3.org/2001/XMLSchema#float" >259.34</dbpprop:molecularWeight></rdf:Description>
• Language tag
<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"><rdfs:label xml:lang="it">Propranololo</rdfs:label>
</rdf:Description>
18
dbpedia:Propranolol
dbpprop:molecularWeight
dbpedia:Propranolol
rdfs:label
“Propranololo”@it
"259.34"^^xsd:float
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML – rdf:type abbreviation
It’s possible to abbreviate rdf:type• Es.
• Long form <rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol">
<rdf:type rdf:resource="http://dbpedia.org/ontology/Drug"/></rdf:Description>
• Abbreviated form<dbpedia-owl:Drug rdf:about="http://dbpedia.org/resource/Propranolol" />
dbpedia:Propranolol
http://dbpedia.org/ontology/Drug
rdf:type
19
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML – more statements 1/3
E.g.
can be• listed
<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"><rdf:type rdf:resource="http://dbpedia.org/ontology/Drug"/>
</rdf:Description> <rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol">
<skos:subject rdf:resource="http://dbpedia.org/resource/Category:Anxiolytics"/> </rdf:Description>• nested
<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"> <rdf:type rdf:resource="http://dbpedia.org/ontology/Drug"/> <skos:subject rdf:resource="http://dbpedia.org/resource/Category:Anxiolytics"/></rdf:Description>
• shortened to a minimum<dbpedia-owl:Drug rdf:about="http://dbpedia.org/resource/Propranolol" > <skos:subject rdf:resource="http://dbpedia.org/resource/Category:Anxiolytics"/></dbpedia-owl:Drug>
20
dbpedia:Propranolol category:Anxiolytics
skos:subject
http://dbpedia.org/ontology/Drug
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML – more statements 2/3 E.g.
can be• listed
<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"> <skos:subject rdf:resource="http://dbpedia.org/resource/Category:Anxiolytics"/></rdf:Description><rdf:Description rdf:about="http://dbpedia.org/resource/Category:Anxiolytics"> <skos:broader rdf:resource="http://dbpedia.org/resource/Category:Psycholeptics"/>
</rdf:Description>
• nested<rdf:Description rdf:about="http://dbpedia.org/resource/Propranolol"> <skos:subject> <rdf:Description
rdf:about="http://dbpedia.org/resource/Category:Anxiolytics"> <skos:broader rdf:resource="http://dbpedia.org/resource/Category:Psycholeptics"/> </rdf:Description> </skos:subject></rdf:Description>
21
dbpedia:Propranolol category:Anxiolytics
skos:subject
Category:Psycholeptics
skos:broader
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Serializing RDF in XML – more statements 3/3
E.g.
Applying all abbreviations becomes<dbpedia-owl:Drug rdf:about="http://dbpedia.org/resource/Propranolol" > <rdfs:label xml:lang="it">Propranololo</rdfs:label> <skos:subject> <skos:concept rdf:about="http://dbpedia.org/resource/Category:Anxiolytics"> <skos:broader> <skos:concept rdf:about="http://dbpedia.org/resource/Category:Psycholeptics"/> </skos:broader> </skos:concept> </skos:subject></dbpedia-owl:Drug>
22
dbpedia:Propranolol category:Anxiolytics
skos:subject
Category:Psycholeptics
skos:broader
http://dbpedia.org/ontology/Drugskos:concept
“Propranololo”@itrdfs:label
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
Merging XML files (representing an RDF model)
Suppose you have to merge the two following XML
Merging the XML trees is difficult, but being RDF …
<Drug rdf:about="Propranolol">
<skos:subject>
<skos:concept rdf:about="Anxiolytics"/>
</skos:subject >
<excretion>
<AnatomicalStructure rdf:about="Kidney"/>
</excretion>
</Drug>
<skos:concept rdf:about="Anxiolytics"
skos:isSubjectOf="Propranolol">
<canDamage>
< AnatomicalStructure
rdf:about="Kidney"/>
</canDamage>
</skos:concept>
Propranolol
Anxiolytics
Drug
rdf: type
rdf: type
skos:subject
Concept
Kidney
rdf: type
AnatomicalStructure
excretion
Propranolol
Anxiolytics
rdf:type
Concept
Kidney
rdf:type
AnatomicalStructure
canDamage
skos:isSubjectOf
23
Emanuele Della Valle - http://applied-semantic-web.org
Propranolol
Anxiolytics
rdf: type
Concept
Kidney
rdf: type
AnatomicalStructure
canDamage
skos:isSubjectOf
RDF in a nutshell
Merging XML files 2/2
It’s (just) a matter to merge the two RDF graphs
NOTE: It works out nicely because both RDF/XML documents refer to the same resources and use the same vocabularies.
U
Propranolol
Anxiolytics
Drug
rdf: type
rdf: type
skos:subject
Concept
Kidney
AnatomicalStructure
canDamage
excretion
skos:isSubjectOf
rdf: type
24
Propranolol
Anxiolytics
Drug
rdf : type
rdf : type
skos:subject
Concept
Kidney
rdf : type
AnatomicalStructure
excretion
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Principles
1. Things must be identified with dereferenceable HTTP URIs.
2. If such a URI is dereferenced asking for the MIME-type application/rdf+xml, a data source must return an RDF/XML description of the identified resource.
– Information Resources
– Non-Information Resources
3. Besides RDF links to resources within the same data source, RDF descriptions should also contain RDF links to resources provided by other data sources
25
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
More on Content Negotiation
Approach
E.g. • Resource identifier
– http://dbpedia.org/resource/Propranolol • RDF representation describing Propranolol
– http://dbpedia.org/data/Propranolol.rdf • HTML representation describing Propranolol
– http://dbpedia.org/page/Propranolol
Test it! http://idi.fundacionctic.org/vapour
26
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipes for Serving Information as Linked Data
Recipes [1]
1. Serving static RDF files
2. Serving information stored in a DB as (virtual) RDF
3. Using a RDF storage with a Linked Data Interface
4. Wrap API to serve (virtual) RDF
[1] http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
27
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 1 - Serving static RDF files 1/3
IDEA: negotiate content using Apache mod_rewrite by writing a .htaccess file
Directory layout• apachedocumentroot/
– examples/- .htaccess- content/
- file.html- file.rdf
Expected behavior• Requesting http://localhost/examples/file• a HTML browser is redirected to
http://localhost/examples/content/file.html• a Linked Data application is redirected to
http://localhost/examples/content/file.rdf
28
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 1 - Serving static RDF files 2/3# Turn off MultiViews
Options –MultiViews
# Rewrite engine setup
RewriteEngine On
RewriteBase /examples
# Rewrite rule to serve HTML content from the vocabulary URI if requested
RewriteCond %{HTTP_ACCEPT} !application/rdf\+xml.*(text/html|application/xhtml\+xml)
RewriteCond %{HTTP_ACCEPT} text/html [OR]
RewriteCond %{HTTP_ACCEPT} application/xhtml\+xml [OR]
RewriteCond %{HTTP_USER_AGENT} ^Mozilla/.*
RewriteRule ^example/$ content/file.html [R=303]
# Rewrite rule to serve RDF/XML content if requested
RewriteCond %{HTTP_ACCEPT} application/rdf\+xml
RewriteRule ^example/ content/file.rdf [R=303]
# Rewrite rule to serve RDF/XML content by default
RewriteRule ^example/ content/file.rdf [R=303]
[source: http://www.w3.org/TR/swbp-vocab-pub/#recipe4example ]
29
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 1 - Serving static RDF files 3/3
Testing the configuration
30
http://localhost/examples/file
http://localhost/examples/content/file.html
http://localhost/examples/content/file.html
http://localhost/examples/file
http://localhost/examples/content/file.rdf
http://localhost/examples/content/file.rdf
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 2 - DB as (virtual) RDF
General Idea
Ready to use solutions• D2R Server
– http://sites.wiwiss.fu-berlin.de/suhl/bizer/d2r-server/index.html#quickstart
• OpenLink Virtuoso – http://virtuoso.openlinksw.com/Whitepapers/pdf/Virtuoso_SQL_to_RDF_Mapping.pdf – http://virtuoso.openlinksw.com/Whitepapers/pdf/Deploying_Linked_Data_Final1.pdf
• Triplify– http://triplify.org/Overview
31
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 3 - RDF storage + Linked Data Interface
General Idea
Ready to use solutions• Pubby: a Linked Data interfaces to SPARQL endpoints.
– http://www4.wiwiss.fu-berlin.de/pubby/
32
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Recipe 4 - Wrap API to serve (virtual) RDF
General Idea
No “general” solutions, only ad hoc ones• Virtuoso Sponger: a framework for developing Linked Data
wrappers (called cartridges) around different types of data sources– http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/VirtSponger– http://virtuoso.openlinksw.com/dataspace/dav/wiki/Main/
VirtSpongerCartridgeSupportedDataSources • SIOC Exporters
– http://sioc-project.org/exporters
33
[source http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/Bizer-ESWC2007-RDFbookmashup.pdf ]
Emanuele Della Valle - http://applied-semantic-web.org
Publishing RDF as Linked Data
Which recipe should I use?
How much data do you want to serve? • several hundred RDF triples: recipe 1• Larger dataset: recipe 2, 3 and 4
How is your data currently stored? • In a relational database: recipe 2• Available through an API: recipe 4• As RDF dump: recipe 3
How often does your data change? • Frequently: recipe 2 and 4• Rarely: recipe 1 and 3
34
Emanuele Della Valle - http://applied-semantic-web.org
RDF in a nutshell
RDF Resources
RDF at the W3C - primer and specifications• http://www.w3.org/RDF/
Semantic Web tools - community maintained list; includes triple store, programming environments, tool sets, and more• http://esw.w3.org/topic/SemanticWebTools
302 Semantic Web Videos and Podcasts - includes a section specifically on RDF videos• http://www.semanticfocus.com/blog/entry/title/302-semantic-
web-videos-and-podcasts/
35
Emanuele Della Valle - http://applied-semantic-web.org
credits
Some slides are derived from• “How to Publish Linked Data on the Web” by Chris Bizer,
Richard Cyganiak and Tom Heath http://sites.wiwiss.fu-berlin.de/suhl/bizer/pub/LinkedDataTutorial/
36