rdf for developers paul groth thanks to eyal oren, stefan schlobach for slides

Post on 26-Dec-2015


RDF FOR DEVELOPERS

Paul Groth

Thanks to Eyal Oren, Stefan Schlobach for slides

Contents

• The Web of Data
• A Data Model
• A Query Language
• Query Time…
• Building Apps
• Exposing Data

Why can’t we query all the information on the Web as a database?

Data Interoperability on the Web

• Different formats, different structures, different vocabularies, different concepts, different meaning
• Data should be structured (not ASCII)
• Structure should be data-oriented (not HTML)
• Meaning of data should be clear (not XML)
• Data should have standard APIs (not Flickr)
• Reusable mappings between data are needed (not XSLT)

What is the Semantic Web?

“There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.”

Tim Berners-Lee

The Web of Data

Enrich your data

Standard Data Model

• RDF: basic data format (triples form a directed, labelled graph)
  – john likes mary .
  – isbn:10993 dc:title "The Pelican Brief" .
• RDFS: simple schema language (subclass, subproperty)
  – dc:title rdfs:subPropertyOf rdfs:label .
  – Jeep isa Car .
• OWL: rich schema language (constraints, relations)
  – likes isa owl:SymmetricProperty .
• RDF allows interlinked graphs of arbitrary structure
• RDFS and OWL allow inferencing of implicit information
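To make "inferencing of implicit information" concrete, here is a minimal sketch in plain Python. It assumes triples are held as tuples of strings and applies only the rdfs:subPropertyOf rule; it is a toy illustration, not how an RDFS reasoner is actually implemented.

```python
# Toy RDFS-style inference over triples stored as Python tuples.
# Prefixed names are plain strings here; no RDF library is involved.
triples = {
    ("isbn:10993", "dc:title", "The Pelican Brief"),
    ("dc:title", "rdfs:subPropertyOf", "rdfs:label"),
}


def infer_subproperties(graph):
    """Return graph plus the triples implied by rdfs:subPropertyOf."""
    inferred = set(graph)
    changed = True
    while changed:  # repeat until no new triple can be derived
        changed = False
        sub_props = {(s, o) for s, p, o in inferred if p == "rdfs:subPropertyOf"}
        for s, p, o in list(inferred):
            for sub, sup in sub_props:
                if p == sub and (s, sup, o) not in inferred:
                    inferred.add((s, sup, o))
                    changed = True
    return inferred


closure = infer_subproperties(triples)
# The book now also has an rdfs:label, although nobody stated it explicitly.
print(("isbn:10993", "rdfs:label", "The Pelican Brief") in closure)
```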

Standard API: SPARQL

• Query language for RDF
  – Based on existing ideas (e.g. SQL)
  – Supported widely
• Standard query protocol
  – How to send a query over HTTP
  – How to respond over HTTP
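As a rough sketch of "how to send a query over HTTP": the SPARQL protocol lets a client pass the query text in a `query` parameter of a GET request and name the wanted result format in the Accept header. The endpoint URL below is a placeholder, and the request is only built, not sent.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Placeholder endpoint; substitute a real SPARQL endpoint to try it out.
ENDPOINT = "http://example.org/sparql"


def build_sparql_get_url(endpoint, query):
    """Encode the query text into the 'query' parameter of a GET URL."""
    return endpoint + "?" + urlencode({"query": query})


query = 'SELECT ?city WHERE { ?city <http://example.org/geo/areacode> "010" . }'
url = build_sparql_get_url(ENDPOINT, query)

# The Accept header asks for the standard XML result format.
headers = {"Accept": "application/sparql-results+xml"}

print(url)
# Decoding the URL again gives back the original query text.
print(parse_qs(urlsplit(url).query)["query"][0] == query)
```

The URL and headers can then be handed to any HTTP client, for example `urllib.request`.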

RDF BASICS

RDF Data Model: Summary

• Resources
• Triples
  – Statements: typed links between resources
  – May involve literals as property values
• Graph
  – Set of statements
  – Interlinked by reusing the same URIs across triples

RDF Syntax

• Several different formats
  – Standard RDF/XML
  – N-Triples
  – Turtle

SPARQL

SQL?

Formulate a query on the relational model: students(name, age, address)

Structured Query Language (SQL)

SELECT name        (data needed)
FROM students      (data source)
WHERE age > 20     (data constraint)

name    age   address
Alice   21    Amsterdam
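For comparison with the SPARQL queries that follow, the SQL query can be run as-is with Python's built-in sqlite3 module. This is a small sketch: the second row (Bob) is invented here so that the WHERE clause has something to exclude.

```python
import sqlite3

# In-memory database holding the students relation from the slide.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (name TEXT, age INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [
        ("Alice", 21, "Amsterdam"),
        ("Bob", 19, "Utrecht"),  # illustrative row, not from the slides
    ],
)

# data needed / data source / data constraint
rows = conn.execute("SELECT name FROM students WHERE age > 20").fetchall()
print(rows)
conn.close()
```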

SPARQL

• Standard RDF query language
  – based on existing ideas
  – standardised by W3C
  – widely supported
• Standard RDF query protocol
  – how to send a query over HTTP
  – how to respond over HTTP

SPARQL Query Syntax

• SPARQL uses a select-from-where inspired syntax (like SQL):

select: the entities (variables) you want to return
SELECT ?city

from: the data source (RDF dataset)
FROM <http://example.org/geo.rdf>

where: the (sub)graph you want to get the information from
WHERE { ?city geo:areacode "010" . }

Including additional constraints on objects, using operators
WHERE { ?city geo:areacode ?c . FILTER (?c > 010) }

prologue: namespace information
PREFIX geo: <http://example.org/geo.rdf#>

SPARQL Query Syntax

PREFIX geo: <http://example.org/geo/>

SELECT ?city
FROM <http://example.org/geoData.rdf>
WHERE { ?city geo:areacode ?c .
        FILTER (?c > 010) }

SPARQL Graph Patterns

• The core of SPARQL
  – The WHERE clause specifies a graph pattern
  – The pattern should be matched
  – The pattern can match more than once
• Graph pattern: an RDF graph with some nodes/edges as variables

[Figure: example graph pattern with variable nodes (?), edges hasCapital and type, and the fixed nodes EuropeanCountry and "020"^^xsd:integer]

Basis: triple patterns
• Triples with one/more variables
• Turtle syntax

?X geo:hasCapital geo:Amsterdam
?X geo:hasCapital ?Y
?X geo:areacode "020"
?X ?P ?Y

All of them match this graph:

[Figure: Netherlands hasCapital Amsterdam; Amsterdam areacode "020"]
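The matching of triple patterns against a graph can be sketched in a few lines of Python. The assumptions here: terms are plain strings, variables start with `?`, and the literal keeps its quotes to set it apart from resources. A real SPARQL engine does far more (datatypes, indexes, joins).

```python
# The example graph: two triples.
graph = {
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", '"020"'),
}


def is_var(term):
    return term.startswith("?")


def match_pattern(pattern, graph):
    """Return one variable binding (a dict) per triple matching the pattern."""
    solutions = []
    for triple in graph:
        binding = {}
        for p_term, t_term in zip(pattern, triple):
            if is_var(p_term):
                # A variable must bind to the same value everywhere it occurs.
                if binding.setdefault(p_term, t_term) != t_term:
                    break
            elif p_term != t_term:
                break
        else:  # no break: every position matched
            solutions.append(binding)
    return solutions


patterns = [
    ("?X", "geo:hasCapital", "geo:Amsterdam"),
    ("?X", "geo:hasCapital", "?Y"),
    ("?X", "geo:areacode", '"020"'),
    ("?X", "?P", "?Y"),
]
for pattern in patterns:
    print(pattern, "->", len(match_pattern(pattern, graph)), "solution(s)")
```

The last pattern, with three variables, matches every triple, which is why it yields two solutions while the others yield one.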

Basis: triple pattern

• A very basic query

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geoData.rdf>
WHERE { ?X geo:hasCapital ?Y . }

Conjunctions: several patterns

• A pattern with several graphs, all must match

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geoData.rdf>
WHERE { { ?X geo:hasCapital ?Y }
        { ?Y geo:areacode "020" } }

equivalent to

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geoData.rdf>
WHERE { ?X geo:hasCapital ?Y .
        ?Y geo:areacode "020" . }

Conjunctions: several patterns

• A pattern with several graphs, all must match

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geoData.rdf>
WHERE { { ?X geo:hasCapital ?Y }
        { ?Y geo:areacode "020" } }

equivalent to

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geoData.rdf>
WHERE { ?X geo:hasCapital [ geo:areacode "020" ] . }

Note: Turtle syntax again

?X geo:name ?Y ; geo:areacode ?Z .

?X geo:name ?Y . ?X geo:areacode ?Z .

?country geo:capital [ geo:name "Amsterdam" ] .

Alternative Graphs: UNION

• A pattern with several graphs, at least one should match

PREFIX geo: <http://example.org/geo/>
SELECT ?city
WHERE { { ?city geo:name "Parijs"@nl }
        UNION
        { ?city geo:name "Paris"@fr . } }

Optional Graphs

RDF is semi-structured
• Even when the schema says some object can have a particular property, it may not always be present in the data
• Example: persons can have names and email addresses, but Frank is a person without a known email address

[Figure: person001 has name "Antoine" and email "aisaac@few.vu.nl"; person002 has name "Frank" and no email]

Optional Graphs (2)

"Give me all people with first names, and if known their email address"

An OPTIONAL graph expression is needed

PREFIX : <http://example.org/my#>
SELECT ?person ?name ?email
WHERE { ?person :name ?name .
        OPTIONAL { ?person :email ?email } }
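The effect of OPTIONAL resembles a left outer join: a solution is kept even when the optional part finds nothing. The Python sketch below mimics that for the two persons in the example; storing the data in dictionaries (one email at most per person) is a simplification made for illustration.

```python
# Data from the example: both persons have a name, only one has an email.
names = {"person001": "Antoine", "person002": "Frank"}
emails = {"person001": "aisaac@few.vu.nl"}

solutions = []
for person, name in names.items():
    row = {"person": person, "name": name}
    if person in emails:
        # The OPTIONAL part matched: extend the solution with ?email.
        row["email"] = emails[person]
    # The row is kept either way; for Frank, ?email simply stays unbound.
    solutions.append(row)

for row in solutions:
    print(row["person"], row["name"], row.get("email", "(unbound)"))
```

Without OPTIONAL, i.e. with both patterns required, only the person001 row would survive.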

Testing values of nodes
• Tests in the FILTER clause must hold for a matching subgraph to be returned

RDF model-related operators
  isLiteral(?aNode)
  isURI(?aNode)
  STR(?aResource)

Interest of STR?

SELECT ?X ?N
WHERE { ?X ?P ?N .
        FILTER (REGEX(STR(?P), "areacode")) }

• For resources with names only partly known
• For literals with unknown language tags

Testing values of nodes

• Tests in FILTER clause
  – Comparison: ?X <= ?Y, ?Z < 20, ?Z = ?Y, etc.
  – Arithmetic operators: ?X + ?Y, etc.
  – String matching using regular expressions: REGEX(?X, "netherlands", "i") matches "The Netherlands"

PREFIX geo: <http://example.org/geo/>
SELECT ?X ?N
WHERE { ?X geo:name ?N .
        FILTER REGEX(STR(?N), "dam") }
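REGEX succeeds when the pattern matches anywhere in the string, and the "i" flag makes the match case-insensitive. A rough Python analogue is shown below; it assumes Python's `re` syntax, which overlaps with but is not identical to the XPath regular expressions that SPARQL uses, and `sparql_regex` is a helper name made up for this sketch.

```python
import re


def sparql_regex(text, pattern, flags=""):
    """Approximate SPARQL REGEX: true if pattern matches anywhere in text."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    return re.search(pattern, text, re_flags) is not None


print(sparql_regex("The Netherlands", "netherlands", "i"))
print(sparql_regex("The Netherlands", "netherlands"))
print(sparql_regex("Amsterdam", "dam"))
```

The second call shows why the slide passes "i": without it, the lower-case pattern does not match the capitalised name.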

Filtering results

• Tests in FILTER clause
  – Boolean combination of these test expressions: && (and), || (or), ! (not)
  – (?Y > 10 && ?Y < 30) || !REGEX(?Z, "Rott")

PREFIX geo: <http://example.org/geo/>
SELECT ?X
FROM <http://example.org/geo.rdf>
WHERE { ?X geo:areacode ?Y ;
           geo:name ?Z .
        FILTER ((?Y > 10 && ?Y < 30) || !REGEX(STR(?X), "Rott")) }

Boolean comparisons and datatypes

• Reminder: RDF has basic datatypes for literals
  – XML Schema datatypes: xsd:integer, xsd:float, xsd:string, etc.
• Datatypes can be used in value comparison
  – ?X < "21"^^xsd:integer
• and be obtained from literals
  – DATATYPE(?aLiteral)

Solution modifiers

ORDER BY

SELECT ?dog ?age
WHERE { ?dog a :Dog ; :age ?age . }
ORDER BY DESC(?age)

LIMIT

SELECT ?dog ?age
WHERE { ?dog a :Dog ; :age ?age . }
ORDER BY ?dog
LIMIT 10

SELECT Query Results

• SPARQL SELECT queries return solutions that consist of variable bindings
  – For each variable in the query, it gives a value (or a list of values).
• The result is a table, where each column represents a variable and each row a combination of variable bindings

Query result: example

Query: "return all countries with the cities they contain, and their areacodes, if known"

PREFIX geo: <http://example.org/geo/>
SELECT ?X ?Y ?Z
WHERE { ?X geo:containsCity ?Y .
        OPTIONAL { ?Y geo:areacode ?Z } }

Result (as table of bindings):

X             Y           Z
Netherlands   Amsterdam   "020"
Netherlands   DenHaag     "070"

SELECT Query results: format

Query: return all capital cities

PREFIX geo: <http://example.org/geo/>
SELECT ?X ?Y
WHERE { ?X geo:name ?Y . }

Results as an XML document:

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <!-- Header -->
  <head>
    <variable name="X"/>
    <variable name="Y"/>
  </head>
  <!-- Results -->
  <results>
    <result>
      <binding name="X"><uri>http://example.org/Paris</uri></binding>
      <binding name="Y"><literal>Paris</literal></binding>
    </result>
    <result>
      <binding name="X"><uri>http://example.org/Paris</uri></binding>
      <binding name="Y"><literal xml:lang="nl">Parijs</literal></binding>
    </result>
    ...
  </results>
</sparql>
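A result document like this can be read with any XML parser. The following sketch uses Python's standard ElementTree on a copy of the example (without the elided rows); `parse_results` is a helper name chosen here, and language tags and datatypes are ignored for brevity.

```python
import xml.etree.ElementTree as ET

# Namespace of the SPARQL Query Results XML Format.
NS = {"sr": "http://www.w3.org/2005/sparql-results#"}

RESULT_XML = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head><variable name="X"/><variable name="Y"/></head>
  <results>
    <result>
      <binding name="X"><uri>http://example.org/Paris</uri></binding>
      <binding name="Y"><literal>Paris</literal></binding>
    </result>
    <result>
      <binding name="X"><uri>http://example.org/Paris</uri></binding>
      <binding name="Y"><literal xml:lang="nl">Parijs</literal></binding>
    </result>
  </results>
</sparql>"""


def parse_results(xml_text):
    """Return (variable names, list of {variable: value} rows)."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name") for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            value = binding[0]  # the <uri>, <literal> or <bnode> child
            row[binding.get("name")] = value.text
        rows.append(row)
    return variables, rows


variables, rows = parse_results(RESULT_XML)
print(variables)
for row in rows:
    print(row)
```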

Result formats

• RDF/XML
• N3
• JSON
• TEXT

<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="x"/>
    <variable name="hpage"/>
  </head>
  <results>
    <result>
      <binding name="x">
        <literal datatype="…/XMLSchema#integer">30</literal>
      </binding>
      <binding name="hpage">
        <uri>http://work.example.org/bob/</uri>
      </binding>
    </result>
  </results>
</sparql>
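The same kind of result is often easier to consume in the JSON result format. The sketch below assumes a response shaped like the SPARQL JSON results format (a `head` with `vars` and a `results` object with `bindings`) and performs the "basic type mapping" by converting xsd:integer literals to Python ints; `to_python` is a hypothetical helper.

```python
import json

XSD_INTEGER = "http://www.w3.org/2001/XMLSchema#integer"

# JSON rendering of the one-row result: x = 30, hpage = Bob's home page.
RESULT_JSON = """
{
  "head": {"vars": ["x", "hpage"]},
  "results": {
    "bindings": [
      {
        "x": {"type": "literal",
              "datatype": "http://www.w3.org/2001/XMLSchema#integer",
              "value": "30"},
        "hpage": {"type": "uri", "value": "http://work.example.org/bob/"}
      }
    ]
  }
}
"""


def to_python(term):
    """Map an RDF term from the JSON results to a plain Python value."""
    if term["type"] == "literal" and term.get("datatype") == XSD_INTEGER:
        return int(term["value"])
    return term["value"]


data = json.loads(RESULT_JSON)
rows = [
    {var: to_python(term) for var, term in binding.items()}
    for binding in data["results"]["bindings"]
]
print(data["head"]["vars"])
print(rows)
```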

Query Result forms

• SELECT queries return variable bindings
• Do we need something else?
  – Statements from the original RDF graph: data extraction
  – New statements derived from original data according to a specific need: data conversion, views over data

SPARQL CONSTRUCT queries

• Construct-queries return RDF statements
• The query result is either a subgraph of the original graph, or a transformed graph

Subgraph query:

PREFIX geo: <http://example.org/geo/>
CONSTRUCT { ?X geo:hasCapital ?Y }
WHERE { ?X geo:hasCapital ?Y .
        ?Y geo:name "Amsterdam" }

[Figure: result graph, Netherlands hasCapital Amsterdam]

SPARQL CONSTRUCT queries (2)

Transformation query:

PREFIX geo: <http://example.org/geo/>
PREFIX my: <http://example.org/myNS/>
CONSTRUCT { ?Y my:inCountry ?X }
WHERE { ?X geo:hasCapital ?Y }

[Figure: result graph, Amsterdam inCountry Netherlands]
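What a transformation query does can be imitated with a set comprehension: every match of the WHERE pattern is turned into a new triple built from the CONSTRUCT template. This is a simplified sketch over tuples of strings, not an actual query engine.

```python
# Source graph: the single triple from the example.
graph = {("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam")}

# WHERE { ?X geo:hasCapital ?Y }  ->  CONSTRUCT { ?Y my:inCountry ?X }
constructed = {
    (capital, "my:inCountry", country)
    for country, predicate, capital in graph
    if predicate == "geo:hasCapital"
}

print(constructed)
```

The result is itself a set of triples, so it can be queried or merged with other graphs in the same way as the original.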

SPARQL queries

SELECT: table (variable bindings)
  select ?x where { … }

CONSTRUCT: graph
  construct { … } where { … }

ASK: yes/no
  ask { … }

DESCRIBE: graph
  describe dbpedia:Amsterdam

Conclusion

• RDF is the data model
• SPARQL is the query language
• URIs enable data integration and information enrichment

SPARQL Exercise

• http://github.com/tialaramex/sparql-query
• Try it on any SPARQL query endpoint

RDF Conclusions revisited

• Make explicit statements about resources on the web
• Machine readable data
  – Machines know that these are statements
  – Machines know how statements relate resources
  – Machines can compare values
  – Machines can dereference URLs

Building Apps that use the Web of Data

A Simple View of a Web App

• Template with variables
  – <body> <p> $x </p></body>
• Fill in the variables from a data source
  – $x = $db->get("name")
• Parameters to query supplied by the user interface
• Hard part: building the right query, and using the right data structure

A Programmatic Approach

• LAMP stack
• Use MySQL for data storage and query
• Allows for procedural modification of data
• Allows for integration of user input

What do we gain by this substitution?

APIs

• PHP: RAP – RDF
  – http://www.seasr.org/wp-content/plugins/meandre/rdfapi-php/doc/
• Python: RDFLib
  – http://www.rdflib.net/
• C: Redland
  – http://librdf.org/
• Java: Jena
  – http://jena.sourceforge.net/


SPARQL API

Graph API

SPARQL Revisited

How do we parse these results?

SPARQL APIs

• Build a string that contains the query
• Execute the query on an endpoint
• Results are returned as an array or iterator over key-value pairs
• Usually some sort of basic type mapping, e.g. URIs are turned into URI objects
• Can access the variable names from the query

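Example in PHP

// Assumes the RAP library has already been included.
$url = "http://example.net:2020/sparql";
$query_str = "select * where {?s a ?o} limit 10";

// Create a client for the remote SPARQL endpoint
$client = ModelFactory::getSparqlClient($url);

// Wrap the query string and run it
$query_obj = new ClientQuery();
$query_obj->query($query_str);
$results = $client->query($query_obj);

// Each result row maps variable names to RDF nodes
foreach ($results as $line) {
    $value = $line['?s'];
    if ($value != "")
        echo $value->toString()."<br>";
}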

Retrieving Resources

• What if a resource is described outside of the database?
• What if you want to work with all the data about a resource?
• Solution:
  – Download (curl, urlget) the RDF graph of the resource
  – Then what…

…
dbowl:Country a rdfs:Class ;
    rdfs:label "countries" .

dbowl:capital a rdf:Property ;
    rdfs:label "countries capital" .
…
<http://localhost:2020/countries/USA> a dbowl:Country ;
    rdfs:label "countries #USA" ;
    dbowl:capital "Washington D.C." ,
        <http://www.dbpedia.org/resource/Washington D.C.> ;
    vocab:hasName "USA" .

<http://localhost:2020/countries/Netherlands> a dbowl:Country ;
    rdfs:label "countries #Netherlands" ;
    dbowl:capital "Amsterdam" ,
        <http://www.dbpedia.org/resource/Amsterdam> ;
    vocab:hasName "Netherlands" .

rdfs:label a rdf:Property .

<http://localhost:2020/countries/Germany> a dbowl:Country ;
    rdfs:label "countries #Germany" ;
    dbowl:capital "Berlin" ,
        <http://www.dbpedia.org/resource/Berlin> ;
    vocab:hasName "Germany" .

Do we work with this directly?

RDF is a graph…so let’s work with graphs

[Figure: RDF graph. Nodes: Netherlands, "Netherlands", Country, Db:Amsterdam, Db:Belgium, Db-owl:City. Edge labels: Db-owl:PopulatedPlace/capital, rdfs:label, neighbours, a (three times).]

@prefix Db: <http://dbpedia.org/resource/> .
@prefix Db-owl: <http://dbpedia.org/ontology/> .

Graph Model API

• Model = load(file)
• Model.getResource(URI)
• Resource Objects
  – StmtIter = listProperties()
  – StmtIter = getProperty("property")
• Statement
  – Resource = Statement.getSubject()
  – Resource = Statement.getPredicate()
  – Resource = Statement.getObject()
• Using these we can traverse the graph…
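To illustrate how such an API supports traversal, here is a toy model in Python. The `Model` and `Resource` classes and their method names are written for this sketch and only mimic the shape of the calls listed above; they are not the API of RAP, Jena, or RDFLib, and the label for Belgium is invented.

```python
class Model:
    """A bag of (subject, predicate, object) statements."""

    def __init__(self, triples):
        self.triples = list(triples)

    def get_resource(self, uri):
        return Resource(self, uri)


class Resource:
    """A view on all statements that have this URI as subject."""

    def __init__(self, model, uri):
        self.model = model
        self.uri = uri

    def list_properties(self, predicate=None):
        return [
            t for t in self.model.triples
            if t[0] == self.uri and (predicate is None or t[1] == predicate)
        ]


model = Model([
    ("ex:Netherlands", "rdfs:label", '"Netherlands"'),
    ("ex:Netherlands", "ex:neighbours", "db:Belgium"),
    ("db:Belgium", "rdfs:label", '"Belgium"'),  # illustrative triple
])

# Traverse: Netherlands -> neighbours -> label of each neighbour.
netherlands = model.get_resource("ex:Netherlands")
labels = []
for _, _, neighbour in netherlands.list_properties("ex:neighbours"):
    for _, _, label in model.get_resource(neighbour).list_properties("rdfs:label"):
        labels.append(label)
print(labels)
```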

Note on Graphs

• Remember objects may be literals
• They also support building your own RDF model programmatically
  – Model.createResource(URL)
  – Resource.addProperty(value)
• Many APIs also have "sparqlesque" languages for querying your data
• This may be a way of mashing up your data

PHP Example

// Get the Netherlands resource
// (don't let createResource fool you)
$netherlands = $model->createResource($netherlandsURL);

// Retrieve the value of the neighbor property
$statement = $netherlands->getProperty($neighbors);
$value = $statement->getObject();

// List the neighbors of the Netherlands
echo '<b>Neighbors of '.$netherlands->getLabel().':</b><BR>';
foreach ($netherlands->listProperties($neighbors) as $country) {
    echo $country->getLabelObject().'<BR>';
}

What about ontologies?

• PopulatedPlace
  – City
    • Amsterdam
  – Country
    • Netherlands
    • Belgium

Ontology Models

• Provides methods for querying using a specific vocabulary
• Generally, RDFS or OWL
• Changes the perspective on a particular RDF graph
• You can use both approaches simultaneously

Ontology API

• listNamedClasses()
• listSubClass(Class)
• getInstances(Class)
• listProperties()
• getDomain(Property)
• getRange(Property)
• getSubProperties(Property)
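The flavour of an ontology API can be shown on the PopulatedPlace hierarchy. In this sketch the hierarchy is hard-coded in two dictionaries and the function names loosely echo the list above; none of it is taken from a specific library. It also shows why direct instance lookup differs from lookup that follows subclasses.

```python
# Class hierarchy and instance typing from the PopulatedPlace example.
SUBCLASS_OF = {"City": "PopulatedPlace", "Country": "PopulatedPlace"}
INSTANCE_OF = {"Amsterdam": "City", "Netherlands": "Country", "Belgium": "Country"}


def list_subclasses(cls):
    """Direct subclasses of cls."""
    return sorted(c for c, parent in SUBCLASS_OF.items() if parent == cls)


def get_instances(cls, include_subclasses=False):
    """Instances typed with cls, optionally also with any of its subclasses."""
    classes = {cls}
    if include_subclasses:
        frontier = [cls]
        while frontier:  # walk down the hierarchy
            for sub in list_subclasses(frontier.pop()):
                if sub not in classes:
                    classes.add(sub)
                    frontier.append(sub)
    return sorted(i for i, c in INSTANCE_OF.items() if c in classes)


print(list_subclasses("PopulatedPlace"))
print(get_instances("Country"))
print(get_instances("PopulatedPlace"))
print(get_instances("PopulatedPlace", include_subclasses=True))
```

Asking for direct instances of PopulatedPlace returns nothing, because no resource is typed with it explicitly; only the subclass-aware lookup finds the three places.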

A note on ontology models

• The model allows for navigation and creation, not reasoning
• Other inference models can be used to perform reasoning
  – These may be expensive
  – Better to do reasoning in a specialized store

When would you use an ontology model?
When would you use a graph model?

Basic APIs

• Performing queries
  – SPARQL
• Choosing the right data structure
  – Graph
  – Ontology
• But..
  – You still have to build the template
  – You still have to interact with the user
  – You still have to create the queries

Web Frameworks and RDF

• ActiveRDF
  – Ruby on Rails and RDF
  – http://www.activerdf.org/
• Django and RDF
  – http://semanticdjango.org/

Object RDF Mapping

• Equivalent to object-relational mapping
  – Multiple inheritance
  – Focus on properties
• Python
  – RDF Alchemy
  – Surfrdf
• Java
  – Elmo
  – jenabean

Other APIs

• Sindice.com
  – Semantic web search engine
  – API for finding things on the semantic web
• Sameas.org
  – Expanding searches
  – Eliminating duplicates
• Yahoo Boss
  – Live search engine
  – Find web pages with RDFa descriptions
• OpenCalais
  – Entity extraction on text to linked data

Other cool things

• Javascript RDF parser
  – http://www.jibbering.com/rdf-parser/
• PHP RDF Classes
  – http://arc.semsol.org/
• Or just see…
  – http://esw.w3.org/topic/SemanticWebTools

Exposing Data on the Semantic Web

The Web of Data

[Figure: MySource, backed by databases]

Country       Capital     Remark         Neighbor
Netherlands   Amsterdam   "The Hague…    Belgium
…             …           …              …

We want RDF…

@prefix : <http://example.org/countries#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

:Netherlands a :Country ;
    rdfs:label "The Netherlands" ;
    :capital :Amsterdam ;
    :neighbours :Germany, :Belgium .
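Going from a table row to triples like these is mostly string assembly. The sketch below assumes the row arrives as a dictionary keyed by column name and writes N-Triples-style lines; `row_to_triples` and the choice of property names are illustrative, and the cell values are assumed to be safe to paste into a URI.

```python
BASE = "http://example.org/countries#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# One row of the Country table, keyed by column (attribute) name.
row = {"Country": "Netherlands", "Capital": "Amsterdam", "Neighbor": "Belgium"}


def row_to_triples(row):
    """Turn one tuple of the relation into (subject, predicate, object) URIs."""
    subject = BASE + row["Country"]  # the key column names the instance
    return [
        (subject, RDF_TYPE, BASE + "Country"),  # the table becomes the class
        (subject, BASE + "capital", BASE + row["Capital"]),
        (subject, BASE + "neighbours", BASE + row["Neighbor"]),
    ]


triples = row_to_triples(row)
for s, p, o in triples:
    print(f"<{s}> <{p}> <{o}> .")
```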

Relational Terminology

Country       Capital     Remark         Neighbor
Netherlands   Amsterdam   "The Hague…    Belgium
…             …           …              …

Labels in the figure: Country = Relation/Table; a column = Attribute; a row = Tuple

Classes

• Table is a class
• But instances may have classes…
• Don’t forget about keys…


Properties

• Attributes = properties
• Watch out for naming (country? or label?)
• What are the domain and range?


Namespaces

• What is the domain of your application?
• What is the scope of your application?
• http://example.org/countries#


URIs and instances

• Tuple = instance
• Objects = instances


An RDF Model

[Figure: RDF graph. Nodes: Netherlands, "Netherlands", Country, Amsterdam, Belgium, City. Edge labels: has_capital, rdfs:label, neighbours, a (three times).]

What have we gained?

• Clear definition of types
• Explicit definitions of relations
• Graph queries with SPARQL
• But… we are still not connected to the linked data cloud…

Linked data principles

• Use URIs as names of things
• Use HTTP URIs so people can look up stuff
• Provide useful descriptions at your HTTP URIs
• Include links to other URIs
• See linkeddata.org

Use External Concepts and Properties

[Figure: RDF graph. Nodes: Netherlands, "Netherlands", Country, Db:Amsterdam, Db:Belgium, Db-owl:City. Edge labels: Db-owl:PopulatedPlace/capital, rdfs:label, neighbours, a (three times).]

@prefix Db: <http://dbpedia.org/resource/> .
@prefix Db-owl: <http://dbpedia.org/ontology/> .

Finding External Ontologies

From http://semanticweb.org/wiki/Ontology

Also see prefix.cc

What have we gained? (2)

• Clear definition of types
• Explicit definitions of relations
• Graph queries with SPARQL
• An enriched data set!

Practically: Use a mapping tool

• Triplify – http://triplify.org
• SquirrelRDF – http://jena.sourceforge.net/SquirrelRDF/
• Virtuoso RDF Views
• D2RQ – http://www4.wiwiss.fu-berlin.de/bizer/d2rq/

D2RQ Architecture

Getting started with D2RQ

• Download and install D2RQ server
  – http://www4.wiwiss.fu-berlin.de/bizer/d2r-server/
• d2r-server-0.7/doc/example
• Go through the quick start
• Load a database into MySQL
  – Make one..
  – mysql < test.sql

A Simple Database

Table: Countries

Country       Capital
Netherlands   Amsterdam
USA           Washington D.C.
Germany       Berlin

Creating a Mapping File

> generate-mapping -o mapping.n3 -u db-user -p db-password jdbc:mysql://localhost/test

> d2r-server mapping.n3

Browsing

RDF Output

Defines a lot for us

Mapping File (Preamble)

@prefix map: <file:/Users/pgroth/Dropbox/develop/d2r-server-0.7/mapping.n3#> .
@prefix db: <> .
@prefix vocab: <http://localhost:2020/vocab/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix d2rq: <http://www.wiwiss.fu-berlin.de/suhl/bizer/D2RQ/0.1#> .
@prefix jdbc: <http://d2rq.org/terms/jdbc/> .

map:database a d2rq:Database;
    d2rq:jdbcDriver "com.mysql.jdbc.Driver";
    d2rq:jdbcDSN "jdbc:mysql://localhost/test";
    d2rq:username "root";
    d2rq:password "****";
    jdbc:autoReconnect "true";
    jdbc:zeroDateTimeBehavior "convertToNull";
    .

Defining Classes

• ClassMap defines classes
• Attached to a particular database
• Has a set of property bridges

# Table countries
map:countries a d2rq:ClassMap;
    d2rq:dataStorage map:database;
    d2rq:uriPattern "countries/@@countries.country|urlify@@";
    d2rq:class vocab:countries;
    d2rq:classDefinitionLabel "countries";
    .

Defining Properties

• Map database columns to RDF properties
• Example: attaches a property capital to all countries

map:countries_country a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:countries;
    d2rq:property vocab:countries_country;
    d2rq:propertyDefinitionLabel "countries country";
    d2rq:column "countries.country";
    .

map:countries_capital a d2rq:PropertyBridge;
    d2rq:belongsToClassMap map:countries;
    d2rq:property vocab:countries_capital;
    d2rq:propertyDefinitionLabel "countries capital";
    d2rq:column "countries.capital";
    .

Assigning identifiers

• countries/@@countries.country|urlify@@
• 4 mechanisms to assign identifiers to instances in a database

1. URI Pattern
  – http://example.org/countries/@@countries.country@@
  – @@ identifies the column to use
  – |urlify converts special characters for use in a URL

Assigning Identifiers (2)

2. Relative URIs
  – countries/@@countries.country|urlify@@
  – relative to the base URL

3. URI Columns
  – if the column already has a URI
  – use d2rq:uriColumn

4. Blank Nodes
  – d2rq:bNodeIdColumns
  – blank node per distinct set of values
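How a URI pattern turns a row into an identifier can be approximated as follows. This is a sketch of the idea only: `expand_uri_pattern` is a hypothetical helper, the base URL is the D2R Server address used in these slides, and the `urlify` step (spaces to underscores, then percent-encoding) is an approximation of what D2RQ does.

```python
import re
from urllib.parse import quote


def expand_uri_pattern(pattern, row, base="http://localhost:2020/"):
    """Fill @@table.column@@ placeholders with values from a database row."""

    def substitute(match):
        column, _, func = match.group(1).partition("|")
        value = str(row[column])
        if func == "urlify":
            # Approximation: underscores for spaces, then percent-encoding.
            value = quote(value.replace(" ", "_"), safe="")
        return value

    uri = re.sub(r"@@(.+?)@@", substitute, pattern)
    # Relative patterns are resolved against the server's base URL.
    return uri if "://" in uri else base + uri


row = {"countries.country": "USA", "countries.capital": "Washington D.C."}

print(expand_uri_pattern("countries/@@countries.country|urlify@@", row))
print(expand_uri_pattern("http://example.org/countries/@@countries.country@@", row))
print(expand_uri_pattern("capitals/@@countries.capital|urlify@@", row))
```

The third call shows the point of `urlify`: without it, the space in "Washington D.C." would end up inside the URI.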

Updating a Mapping File

• Mapping files do not automatically link to linked data or external ontologies

• Notice: @prefix vocab: <http://localhost:2020/vocab/resource/>

• See http://www4.wiwiss.fu-berlin.de/bizer/d2rq/spec/
  – Many ways of making your mapping more expressive

Ways to Connect

• Refer to external ontologies
  – @prefix dbowl: <http://dbpedia.org/ontology/> .
  – d2rq:property dbowl:capital
  – d2rq:class dbowl:Country;
• Refer to external items
  – d2rq:uriPattern "http://www.dbpedia.org/resource/@@countries.capital@@";

Use SameAs

• Say that your concept is the "same as" another concept using owl:sameAs
• See http://sameas.org/
• Has a special meaning so that systems know that your concept can be considered identical to another.

Results

Accessing the data

• SPARQL queries (on the fly)
• Dumping the data to RDF
  – ./dump-rdf -m mapping.n3 -f N3 -b http://localhost:2020/ > test.nt

…
dbowl:Country a rdfs:Class ;
    rdfs:label "countries" .

dbowl:capital a rdf:Property ;
    rdfs:label "countries capital" .
…
<http://localhost:2020/countries/USA> a dbowl:Country ;
    rdfs:label "countries #USA" ;
    dbowl:capital "Washington D.C." ,
        <http://www.dbpedia.org/resource/Washington D.C.> ;
    vocab:hasName "USA" .

<http://localhost:2020/countries/Netherlands> a dbowl:Country ;
    rdfs:label "countries #Netherlands" ;
    dbowl:capital "Amsterdam" ,
        <http://www.dbpedia.org/resource/Amsterdam> ;
    vocab:hasName "Netherlands" .

rdfs:label a rdf:Property .

<http://localhost:2020/countries/Germany> a dbowl:Country ;
    rdfs:label "countries #Germany" ;
    dbowl:capital "Berlin" ,
        <http://www.dbpedia.org/resource/Berlin> ;
    vocab:hasName "Germany" .

Databases -> RDF

• Allows us to integrate existing databases with the Semantic Web

• Relational model maps fairly directly to RDF
  – Caveats: no unique identifiers, lack of explicit ontological structure
• Mapping systems work well
• Experiment with D2RQ!
  – It will be used in your final assignment
