rdf for developers paul groth thanks to eyal oren, stefan schlobach for slides
TRANSCRIPT
RDF FOR DEVELOPERSPaul Groth
Thanks to Eyal Oren, Stefan Schlobach for slides
Contents
• The Web of Data• A Data Model• A Query Language
• Query Time…• Building Apps• Exposing Data
Why can’t we query all the information on the Web as a database?
Data Interoperability on the Web• Different formats, different structures, different
vocabularies, different concepts, different meaning
• Data should be structured (not ASCII)• Structure should be data-oriented (not HTML)• Meaning of data should be clear (not XML)• Data should have standard APIs (not Flickr)• Reusable mappings between data are needed
(not XSLT)
What is the Semantic Web?“There is lots of data we all use every day, and it’s not
part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.”
Tim Berners-Lee
The Web of Data
Enrich your data
Standard Data Model• RDF: basic data format (triples form hypergraph)
– john likes mary .– isbn:10993 dc:title "The Pelican Brief" .
• RDFS : simple schema language (subclass, subproperty)– dc:title rdfs:subPropertyOf rdfs:label .– Jeep isa Car .
• OWL: rich schema language (constraints, relations)– likes isa owl:symmetricProperty .
• RDF allows interlinked graphs of arbitrary structure • RDFS and OWL allow inferencing of implicit information
Standard API: SPARQL
• Query language for RDF– Based on existing ideas (e.g. SQL)– Supported widely
• Standard query protocol– How to send a query over HTTP– How to respond over HTTP
RDF BASICS
RDF Data Model: Summary
• Resources • Triples
– Statements: typed links between triples– May involve literals as property values
• Graph– Set of Statements– Interlinked by reusing same URIs acrossed triples
RDF Syntax
• Several Different Formats– Standard RDF/XML– NTriples– Turtle
SPARQL
SQL?
Formulate a query on the relational model students(name, age, address)
Structured Query Language (SQL)SELECT name data neededFROM student data sourceWHERE age>20 data constraint
name age address
Alice 21 Amsterdam
SPARQL
Standard RDF query language based on existing ideas standardised by W3C widely supported
Standard RDF query protocol how to send a query over HTTP how to respond over HTTP
SPARQL Query Syntax
• SPARQL uses a select-from-where inspired syntax (like SQL):
select: the entities (variables) you want to returnSELECT ?city
from: the data source (RDF dataset)FROM <http://example.org/geo.rdf>
where: the (sub)graph you want to get the information fromWHERE {?city geo:areacode “010” .}
Including additional constraints on objects, using operatorsWHERE {?city geo:areacode ?c. FILTER (?c > 010)}
prologue: namespace informationPREFIX geo: <http://example.org/geo.rdf#>
SPARQL Query Syntax
PREFIX geo: <http://example.org/geo/>
SELECT ?cityFROM <http://example.org/geoData.rdf>WHERE {?city geo:areacode ?c .
FILTER (?c > 010)}
SPARQL Graph Patterns
• The core of SPARQL WHERE clause specifies graph pattern
pattern should be matched pattern can match more than once
Graph pattern: an RDF graph with some nodes/edges as variables
hasCapital? ?
type
EuropeanCountry “020”^^xsd:integer
?
Basis: triple patterns Triples with one/more variables Turtle syntax
?X geo:hasCapital geo:Amsterdam ?X geo:hasCapital ?Y ?X geo:areacode "020" ?X ?P ?Y
All of them match this graph:
hasCapitalNetherlands Amsterdam “020”
areacode
Basis: triple pattern
• A very basic query
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geoData.rdf>WHERE { ?X geo:hasCapital ?Y .}
Conjunctions: several patterns
• A pattern with several graphs, all must match
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geoData.rdf>WHERE { {?X geo:hasCapital ?Y }
{?Y geo:areacode "020" } }
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geoData.rdf>WHERE { ?X geo:hasCapital ?Y .
?Y geo:areacode "020" . }
equivalent to
Conjunctions: several patterns
• A pattern with several graphs, all must match
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geoData.rdf>WHERE { {?X geo:hasCapital ?Y }
{?Y geo:areacode "020" } }
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geoData.rdf>WHERE { ?X geo:hasCapital [ geo:areacode "020" ].}
equivalent to
Note: Turtle syntax again
?X geo:name ?Y ; geo:areacode ?Z . ?X geo:name ?Y . ?X geo:areacode ?Z . ?country geo:capital [ geo:name
“Amsterdam” ] .
Alternative Graphs: UNION
• A pattern with several graphs, at least one should match
PREFIX geo: <http://example.org/geo/>SELECT ?cityWHERE { { ?city geo:name "Parijs"@nl } UNION { ?city geo:name "Paris"@fr .}}
Optional Graphs
RDF is semi-structured Even when the schema says some object can have a particular
property, it may not always be present in the data Example: persons can have names and email addresses, but Frank is
a person without a known email address
nameperson001
“Antoine”
nameperson002 “Frank”
Optional Graphs (2)
“Give me all people with first names, and if known their email address”
An OPTIONAL graph expression is needed
PREFIX : <http://example.org/my#>SELECT ?person ?name ?emailWHERE { ?person :name ?name . OPTIONAL { ?person :email ?email }}
Testing values of nodes• Tests in FILTER clause have to be validated
for matching subgraphs
RDF model-related operators isLiteral(?aNode) isURI(?aNode) STR(?aResource)
Interest of STR?SELECT ?X ?NWHERE { ?X ?P ?N .
FILTER (STR(?P)="areacode") } For resources with names only partly known For literals with unknown language tags
Testing values of nodes
• Tests in FILTER clause Comparison:
?X <= ?Y, ?Z < 20, ?Z = ?Y, etc. Arithmetic operators
?X + ?Y, etc. String matching using regular expressions
REGEX (?X,"netherlands","i") matches "The Netherlands"
PREFIX geo: <http://example.org/geo/>SELECT ?X ?NWHERE { ?X geo:name ?N .
FILTER REGEX(STR(?N),"dam") }
Filtering results
Tests in FILTER clause Boolean combination of these test expressions
&& (and), || (or), ! (not) (?Y > 10 && ?Y < 30)
|| !REGEX(?Z,"Rott")
PREFIX geo: <http://example.org/geo/>SELECT ?XFROM <http://example.org/geo.rdf>WHERE {?X geo:areacode ?Y ;
geo:name ?Z . FILTER ((?Y > 10 && ?Y < 30) || !REGEX(STR(?X),"Rott")) }
Boolean comparisons and datatypes
Reminder: RDF has basic datatypes for literals XML Schema datatypes:xsd:integer, xsd:float,
xsd:string, etc. Datatypes can be used in value
comparison X < “21”^^xsd:integer
and be obtained from literals DATATYPE(?aLiteral)
Solution modifiers
ORDER BYSELECT ?dog ?age
WHERE { ?dog a Dog ; ?dog :age ?age . } ORDER BY DESC(?age)
LIMITSELECT ?dog ?ageWHERE { ?dog a Dog ; ?dog :age ?age . }ORDER BY ?dogLIMIT 10
SELECT Query Results
SPARQL SELECT queries return solutions that consist of variable bindings For each variable in the query, it gives a value
(or a list of values). The result is a table, where each column
represents a variable and each row a combination of variable bindings
Query result: example Query: “return all countries with the cities
they contain, and their areacodes, if known”
Result (as table of bindings):
X Y Z
Netherlands Amsterdam “020”
Netherlands DenHaag “070”
PREFIX geo: <http://example.org/geo/>SELECT ?X ?Y ?ZWHERE { ?X geo:containsCity ?Y.
OPTIONAL {?Y geo:areacode ?Z} }
SELECT Query results: format Query: return all capital cities
Results as an XML document :
PREFIX geo: <http://example.org/geo/>SELECT ?X ?YWHERE { ?X geo:name ?Y .}
<sparql xmlns="http://www.w3.org/2005/sparql-results#"> <head> <variable name="X"/> <variable name="Y"/> </head> <results> <result> <binding name="X"><uri>http://example.org/Paris</uri></binding> <binding name="Y"><literal>Paris</literal></binding> </result> <result> <binding name="X"><uri>http://example.org/Paris</uri></binding> <binding name="Y"><literal xml:lang="nl">Parijs</literal></binding> </result> ... </results></sparql>
Header
Results
Result formats
• RDF/XML• N3• JSON• TEXT
<sparql xmlns="http://www.w3.org/2005/sparql-results#"> <head>
<variable name="x"/> <variable name="hpage"/>
</head> <results>
<result><binding name=”x">
<literal datatype=”…/XMLSchema#integer">30</literal>
</binding><binding name="hpage">
<uri>http://work.example.org/bob/</uri>
</binding>
</result></results></sparql>
Query Result forms
SELECT queries return variable bindings Do we need something else?
Statements from RDF original graph Data extraction
New statements derived from original data according to a specific need Data conversion, views over data
SPARQL CONSTRUCT queries
Construct-queries return RDF statements The query result is either a subgraph of the original graph, or a
transformed graph
PREFIX geo: <http://example.org/geo/>CONSTRUCT {?X geo:hasCapital ?Y }WHERE { ?X geo:hasCapital ?Y .
?Y geo:name "Amsterdam" }
Subgraph query:
hasCapitalNetherlands Amsterdam
SPARQL CONSTRUCT queries
Construct-queries return RDF statements The query result is either a subgraph of the original graph, or a
transformed graph
PREFIX geo: <http://example.org/geo/>PREFIX my: <http://example.org/myNS/>CONSTRUCT {?Y my:inCountry ?X}WHERE { ?X geo:hasCapital ?Y}
inCountry
Transformation query:
Netherlands Amsterdam
SPARQL queries
SELECT: table (variable bindings)select ?x where { … }
CONSTRUCT: graphconstruct { … } where { … }
ASK: yes/noask { … }
DESCRIBE: graphdescribe dbpedia:Amsterdam
Conclusion
• RDF is the data model• SPARQL is the query language• URIs enable data integration and information
enrichment
SPARQL Exercise
• http://github.com/tialaramex/sparql-query• Try it on any sparql query endpoint
RDF Conclusions revisited
• Make explicit statements about resources on the web
• Machine readable data– Machines know that these are statements– Machines know how statements relate resources– Machines can compare values– Machines can dereference URLs
Building Apps that use the Web of Data
A Simple View of a Web App
• Template with variables– <body> <p> $x </p></body>
• Fill in the variables from a data source– $x = $db->get(“name”)
• Parameters to query supplied by the user interface
• Hard part: building the right query, and using the right data structure
A Programmatic Approach
• LAMP stack• Use mySQL for data storage and query• Allows for procedural modification of data• Allows for integration of user input
What do we gain by this substitution?
APIs
• PHP: RAP – RDF– http://www.seasr.org/wp-content/plugins/
meandre/rdfapi-php/doc/• Python: RDFLib
– http://www.rdflib.net/• C: Redland
– http://librdf.org/• Java: Jena
– http://jena.sourceforge.net/
APIs
• PHP: RAP – RDF– http://www.seasr.org/wp-content/plugins/
meandre/rdfapi-php/doc/• Python: RDFLib
– http://www.rdflib.net/• C: Redland
– http://librdf.org/• Java: Jena
– http://jena.sourceforge.net/
SPARQL API
Graph API
SPARQL Revisited
How do we parse these results?
SPARQL APIs
• Build a string that contains the query• Execute the query on an endpoint• Results are returned as an array or iterator
over key-value pairs• Usually some sort of basic type mapping, e.g.
uri’s are turned into URI objects• Can access the variable names from the query
Example in PHP$url = “http://example.net:2020/sparql”;$query_str = “select * where {?s a ?o} limit 10;$client = ModelFactory::getSparqlClient(url);$query_obj = new ClientQuery();$query_obj->query($query_str);$results = $client->query($query_obj)
foreach($results as $line){ $value = $line[’?s']; if($value != "") echo $value->toString()."<br>";}
Retrieving Resources
• What if a resource is described outside of the database?
• What if you want to work with all the data about a resource?
• Solution:– Download (curl, urlget) the RDF graph of the
resource– Then what…
…dbowl:Country a rdfs:Class ; rdfs:label "countries" .
dbowl:capital a rdf:Property ; rdfs:label "countries capital" .… <http://localhost:2020/countries/USA> a dbowl:Country ; rdfs:label "countries #USA" ; dbowl:capital "Washington D.C." ,
<http://www.dbpedia.org/resource/Washington D.C.> ; vocab:hasName "USA" .
<http://localhost:2020/countries/Netherlands> a dbowl:Country ; rdfs:label "countries #Netherlands" ; dbowl:capital "Amsterdam" , <http://www.dbpedia.org/resource/Amsterdam> ; vocab:hasName "Netherlands" .
rdfs:label a rdf:Property .
<http://localhost:2020/countries/Germany> a dbowl:Country ; rdfs:label "countries #Germany" ; dbowl:capital "Berlin" , <http://www.dbpedia.org/resource/Berlin> ; vocab:hasName "Germany" .
Do we work with this directly?
RDF is a graph…so let’s work with graphs
Netherlands
“Netherlands”
Country
Db:Amsterda
m
Db:Belgium
Db-owl: City
Db-owl: PopulatedPlace/capitalrdfs:label
a
a
neighbours
a
@prefix Db: http://dbpedia.org/resource/@prefix Db-owl: http://dbpedia.org/ontology/
Graph Model API
• Model = load(file)• Model.getResource(URI)• Resource Objects
– StmtIter = listProperties()– StmtIter = getProperty(“property”)
• Statement– Resource = Statement.getSubject()– Resource = Statement.getPredicate()– Resource = Statement.getObject()
• Using these we can traverse the graph…
Note on Graphs
• Remember objects may be literals• They also support building your own RDF
model programmatically– Model.createResource(URL)– Resource.addProperty(value)
• Many APIs also have “sparqlesque” languages for querying your data
• This may be a way of mashing up your data
PHP Example
// Get the netherlands resource// don’t let create resource fool you$netherlands = $model->createResource($netherlandsURL);
// Retrieve the value of the neighbor property$statement = $netherlands->getProperty($neighbors);$value = $statement->getObject();
// List the Neighbors of the Netherlandsecho '<b>Neighbors of '.$netherlands->getLabel().':</b><BR>';foreach ($netherlands->listProperties($neighbors) as $country) { echo $country->getLabelObject().'<BR>';};
What about ontologies?
• PopulatedPlace– City
• Amsterdam
– Country• Netherlands• Belgium
Ontology Models
• Provides methods for querying using a specific vocabulary
• Generally, RDFS or OWL• Changes the perspective of a particular RDF
Graph• You can use models both approaches
simultaneously
Ontology API
• listNamedClassess()• listSubClass(Class)• getInstances(Class)• listProperties()• getDomain(Property)• getRange(Property)• getSubProperties(Property)
A note on ontology models
• The model allow for navigation and creation not reasoning
• Other inference- models can be used to perform reasoning– These may be expensive– Better to reasoning in a specialized store
When would you use an ontology model?When would you use a graph model?
Basic APIs
• Performing Queries– sparql
• Choosing the right data structure– Graph– Ontology
• But..– You still have to build the template– You still have to interact with the user– You still have to create the queries
Web Frameworks and RDF
• ActiveRDF– Ruby on Rails and RDF– http://www.activerdf.org/
• Django and RDF– http://semanticdjango.org/
Object RDF Mapping
• Equivalent to Object relational mapping– Multiple inheritance– Focus on properties
• Python– RDF Alchemy– Surfrdf
• Java– Elmo– jenabean
Other APIs
• Sindice.com– Semantic web search engine– API for finding thins on the semantic web
• Sameas.org– Expanding searches– Eliminating duplicates
• Yahoo Boss– Live search engine– Find web pages with RDFa descriptions
• OpenCalais– Entity extraction on text to linked data
Other cool things
• Javascript RDF parser– http://www.jibbering.com/rdf-parser/
• PHP RDF Classes– http://arc.semsol.org/
• Or just see…– http://esw.w3.org/topic/SemanticWebTools
Exposing Data on the Semantic Web
The Web of Data
MySource
Databases
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
We want RDF…
@prefix : <http://example.org/countries#> . :Netherlands a :Country; rdfs:label "TheNetherlands" ; :capital :Amsterdam ; :neighbours :Germany, :Belgium .
Relational Terminology
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
Country
Relation/Table Attribute
Tuple
Classes
• Table is a class• But instances may have classes…• Don’t forget about keys…
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
Country
Relation/Table Attribute
Tuple
Properties
• Attributes = properties• Watch out for naming (country? or label?)• What are the domain and range?
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
Country
Relation/Table Attribute
Tuple
Namespaces
• What is the domain of your application?• What is the scope of your application?• http://example.org/countries#
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
Country
Relation/Table Attribute
Tuple
URIs and instances
• Tuple = instance• Objects = instances
Country Capital Remark Neighbor
Netherlands
Amsterdam “The Hague…
Belgium
…. … …. …
Country
Relation/Table Attribute
Tuple
An RDF Model
Netherlands
“Netherlands”
Country
Amsterdam
Belgium
City
has_capitalrdfs:label
a
a
neighbours
a
What have we gained?
• Clear definition of types• Explicit definitions of relations• Graph queries with SPARQL• But… we are still not connected to the linked
data cloud…
MySource
Linked data principles
• Use URIs as names of things• Use HTTP URIs so people can lookup stuff• Provide useful descriptions at your HTTP URIs• Include links to other URIs• See linkeddata.org
Use External Concepts and Properties
Netherlands
“Netherlands”
Country
Db:Amsterda
m
Db:Belgium
Db-owl: City
Db-owl: PopulatedPlace/capitalrdfs:label
a
a
neighbours
a
@prefix Db: http://dbpedia.org/resource/@prefix Db-owl: http://dbpedia.org/ontology/
Finding External Ontologies
From http://semanticweb.org/wiki/OntologyAlso see prefix.cc
What have we gained? (2)
• Clear definition of types• Explicit definitions of relations• Graph queries with SPARQL• An enriched data set!
Practically: Use a mapping tool
• Triplify – http://triplify.org• Squirrel Rdf -
http://jena.sourceforge.net/SquirrelRDF/• Virtuso RDF View• D2RQ -
http://www4.wiwiss.fu-berlin.de/bizer/d2rq/
D2RQ Architecture
Getting started with D2RQ
• Download and install D2RQ server– http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
• d2r-server-0.7/doc/example• Go through the quick start• Load a database into Mysql
– Make one..– mysql < test.spl
A Simple Database
Country Capital
Netherlands Amsterdam
USA Washington D.C.
Germany Berlin
Countries
Creating a Mapping File
> generate-mapping -o mapping.n3 –u db-user -p db-password jdbc:mysql://localhost/test
> d2r-server mapping.n3
Browsing
RDF Output
Defines a lot fo
r us
Mapping File (Preamble)
@prefix map: <file:/Users/pgroth/Dropbox/develop/d2r-server-0.7/mapping.n3#> .@prefix db: <> .@prefix vocab: <http://localhost:2020/vocab/resource/> .@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .@prefix jdbc: <http://d2rq.org/terms/jdbc/> .
map:database a d2rq:Database;d2rq:jdbcDriver "com.mysql.jdbc.Driver";d2rq:jdbcDSN "jdbc:mysql://localhost/test";d2rq:username "root";d2rq:password “****";jdbc:autoReconnect "true";jdbc:zeroDateTimeBehavior "convertToNull";.
Defining Classes
• ClassMap defines classes• Attached to a particular database• Has a set of property bridges
# Table countriesmap:countries a d2rq:ClassMap;
d2rq:dataStorage map:database;d2rq:uriPattern "countries/@@countries.country|urlify@@";d2rq:class vocab:countries;d2rq:classDefinitionLabel "countries";.
Defining Properties
• Map database columns to RDF properties• Example: Attaches a property capital to all
countriesmap:countries_country a d2rq:PropertyBridge;
d2rq:belongsToClassMap map:countries;d2rq:property vocab:countries_country;d2rq:propertyDefinitionLabel "countries country";d2rq:column "countries.country";.
map:countries_capital a d2rq:PropertyBridge;d2rq:belongsToClassMap map:countries;d2rq:property vocab:countries_capital;d2rq:propertyDefinitionLabel "countries capital";d2rq:column "countries.capital";
Assigning identifiers
• countries/@@countries.country|urlify@@• 4 Mechanisms to assign identifiers to
instances in a database1. URI Pattern
– http://example.org/countries/@@countries.country@@
– @@ identifies the column to use – |urlify converts special characters for use in a URL
Assigning Identifiers (2)
2. Relative URIs- countries/@@countries.country|urlify@@- relative to the base URL
3. URI Columns- if the column already has a uri - use d2rq:uriColumns
4. Blank Nodes- d2rq:bNodeIdColumns- blank node per distinct set of values
Updating a Mapping File
• Mapping files do not automatically link to linked data or external ontologies
• Notice: @prefix vocab: <http://localhost:2020/vocab/resource/>
• See http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/– Many ways of making your mapping more
expressive
Ways to Connect
• Refer to external ontologies – @prefix dbowl: http://dbpedia.org/ontology/– d2rq:property dbowl:capital– d2rq:class dbowl:Country;
• Refer to external items– d2rq:uriPattern
http://www.dbpedia.org/resource/@@countries.capital@@;
Use SameAs
• Say that your concept is the “same as” another concept using owl:sameas
• See http://sameas.org/• Has a special meaning so that systems no that
your concept can be considered identical to another.
Results
Accessing the data
• Sparql Queries (on the fly)
• Dumping the data to RDF– ./dump-rdf –m mapping.n3 -f N3 -b
http://localhost:2020/ > test.nt
…dbowl:Country a rdfs:Class ; rdfs:label "countries" .
dbowl:capital a rdf:Property ; rdfs:label "countries capital" .… <http://localhost:2020/countries/USA> a dbowl:Country ; rdfs:label "countries #USA" ; dbowl:capital "Washington D.C." ,
<http://www.dbpedia.org/resource/Washington D.C.> ; vocab:hasName "USA" .
<http://localhost:2020/countries/Netherlands> a dbowl:Country ; rdfs:label "countries #Netherlands" ; dbowl:capital "Amsterdam" , <http://www.dbpedia.org/resource/Amsterdam> ; vocab:hasName "Netherlands" .
rdfs:label a rdf:Property .
<http://localhost:2020/countries/Germany> a dbowl:Country ; rdfs:label "countries #Germany" ; dbowl:capital "Berlin" , <http://www.dbpedia.org/resource/Berlin> ; vocab:hasName "Germany" .
Databases -> RDF
• Allows us to integrate existing databases with the Semantic Web
• Relational Model maps fairly directly to RDF– Caveats: No unique identifiers, lack of explicit
ontological structure• Mapping Systems work well• Experiment with D2RQ!
– It will be used in your final assignment