SPARQLing Constraints for RDF

Michael Schmidt, 20.03.2008
Joint work with Prof. Georg Lausen, Michael Meier




About… Michael Schmidt

2001-2006: Studies of Applied Computer Science in Saarbrücken

2006: Started my PhD in Saarbrücken with Prof. Christoph Koch (focus on XML, XQuery, streams)

Since 2007: at Freiburg University with Prof. Georg Lausen (focus on SPARQL, RDF)


Table of Contents

SPARQLing Constraints for RDF

Constraints for RDF
• Types of constraints
• Encoding of constraints in RDF
• Satisfiability

SPARQL in the context of constraints
• Extracting constraints with SPARQL
• Checking constraints with SPARQL
• Exploiting constraints: Semantic Query Optimization

SP2Bench: A SPARQL Performance Benchmark


PART I

SPARQLing Constraints for RDF


SPARQLing Constraints for RDF

RDF Data Format
• Machine-readable information
• Established in the Semantic Web

SPARQL Query Language (based on the RDF data format)
• W3C Recommendation since January 2008

Constraints
• Primary and Foreign Keys
• Cardinality Constraints, …


Why Constraints?

• Restricting the state space of the database
• Maintenance of data consistency (e.g. when data is updated)
• Semantic Query Optimization
• Better understanding of the data
• In our scenario: translation of relational schemata to RDF without loss of information


Our Contribution

Extension of RDF by constraints
• Key constraints, cardinality constraints, …
• Seamless integration into the RDF framework

Study of the role of SPARQL in this context
• Checking constraints with SPARQL
• Specification of user-defined constraints
• Optimization of SPARQL queries under constraints (Semantic Query Optimization)


The RDF Data Format

Three types of elements
• URIs (U): represent physical or logical resources
• Blank nodes (B): resources without fixed URI
• Literals (L): represent values

RDF triples: (subject, predicate, object) with
• subject ∈ U ∪ B
• predicate ∈ U
• object ∈ U ∪ B ∪ L
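The typing rule for triples (subject a URI or blank node, predicate a URI, object a URI, blank node or literal) can be sketched in a few lines of Python. The class and function names below are illustrative assumptions, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check subject in U ∪ B, predicate in U, object in U ∪ B ∪ L."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


# A literal may appear as object, but not as subject.
ok = is_valid_triple(URI("Person1"), URI("name"), Literal("Joe"))
bad = is_valid_triple(Literal("Joe"), URI("name"), URI("Person1"))
```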


Example RDF Triple

RDF triple:

Subject    Predicate   Object
Person1    name        "Joe"

Person1 is a URI; "Joe" is a literal.

Graph representation: Person1 --name--> "Joe"


RDF Databases

RDF Databases are Collections of Triples

Currently no support for specification of primary/foreign key constraints

[Figure: example RDF graph. Nodes: Person1, Person2, Student, Person. Literals: "Joe", "Pete", "1234", "2345". Edge labels: name, knows, ssn, rdf:type, rdfs:subClassOf.]


Mapping Relational Data to RDF

Teachers
name    faculty
Joe     CS
Fred    CS

Students
matric  name
11111   John
22222   Ed

Courses
taught_by  name
Joe        DB
Fred       Web

Participants
c_id    s_id
Fred    11111
Fred    22222

+ NOT NULL constraint


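A Naive Translation Approach

[Figure: RDF graph of the naive translation. Each tuple becomes a node linked by rdf:type to its table: t1, t2 (Teachers), s1, s2 (Students), c1, c2 (Courses), p1, p2 (Participants). Every attribute value is attached as a string literal: name "Joe"/"Fred" and faculty "CS" for teachers; matric "11111"/"22222" and name "John"/"Ed" for students; name "DB"/"Web" and taught_by "Joe"/"Fred" for courses; c_id "Fred" and s_id "11111"/"22222" for participants.]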
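A minimal sketch of such a naive translation, assuming rows are given as Python dictionaries and triples are plain tuples of strings. The function name `naive_translate` and the node naming scheme (prefix plus row number) are hypothetical choices made for illustration.

```python
def naive_translate(table, prefix, rows):
    """Turn each row into a typed node with one literal per attribute."""
    triples = []
    for i, row in enumerate(rows, start=1):
        node = f"{prefix}{i}"
        triples.append((node, "rdf:type", table))
        for column, value in row.items():
            # Every value becomes a quoted string literal, so references
            # between tables are not visible in the graph structure.
            triples.append((node, column, f'"{value}"'))
    return triples


teachers = [
    {"name": "Joe", "faculty": "CS"},
    {"name": "Fred", "faculty": "CS"},
]
triples = naive_translate("Teachers", "t", teachers)
```

Because all values are copied as literals, the link between a course's taught_by value and a teacher's name exists only as string equality, which motivates the improved translation.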


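Improving the Translation

[Figure: RDF graph of the improved translation. Nodes t1, t2 (Teachers), s1, s2 (Students), c1, c2 (Courses), p1, p2 (Participants) are linked by rdf:type to their tables as before. The key values Joe, Fred, 11111 and 22222 now appear once each as shared nodes instead of repeated string literals, so the taught_by, c_id and s_id edges point to the same nodes as the name and matric edges. "CS", "John", "Ed", "DB" and "Web" remain literals.]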


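Encoding Primary Key Constraints

• Encoding of constraints in the schema layer
• New namespace "rdfc"
• RDF Bags

[Figure: the Teachers graph (t1: name Joe, faculty "CS"; t2: name Fred, faculty "CS") extended by a key node T_Key. Edges: Teachers rdfc:Key T_Key; T_Key rdf:type rdfc:Key; T_Key rdf:type rdf:Bag; T_Key rdf:_1 name.]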


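[Figure: encoding of a foreign key constraint. The Courses graph (c1: name "DB"; c2: name "Web"; each with a taught_by edge) and the Teachers graph with its key node T_Key are extended by a foreign key node C_FKey. Edges: Courses rdfc:FKey C_FKey; C_FKey rdf:type rdfc:FKey; C_FKey rdf:type rdf:Bag; C_FKey rdf:_1 taught_by; C_FKey rdfc:ref T_Key.]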


Other Types of Constraints

Let C, C1, C2 be classes and Qi, Ri properties.

Primary Keys
Key(C,[Q1,…,Qn])

Foreign Keys
FKey(C1,[Q1,…,Qn],C2,[R1,…,Rn])

Cardinality Constraints
Min(C,n,R), Max(C,n,R) for n ∈ ℕ

Functionality/Totality Constraints
Func(C,Q), Total(C,Q)

Singleton Constraints
Single(C)
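As a rough illustration of the cardinality family, the sketch below checks Min and Max on a graph given as a list of string tuples. The semantics used here are assumptions, since the slides do not spell them out: Min(C,n,R) is read as "every instance of C has at least n values for R", Max(C,n,R) as "at most n", Func(C,Q) as Max(C,1,Q), and Total(C,Q) as Min(C,1,Q). All function names are hypothetical.

```python
def value_counts(triples, cls, prop):
    """Map each instance of cls to its number of distinct prop values."""
    graph = set(triples)
    counts = {s: 0 for s, p, o in graph if p == "rdf:type" and o == cls}
    for s, p, o in graph:
        if p == prop and s in counts:
            counts[s] += 1
    return counts


def holds_min(triples, cls, n, prop):
    return all(c >= n for c in value_counts(triples, cls, prop).values())


def holds_max(triples, cls, n, prop):
    return all(c <= n for c in value_counts(triples, cls, prop).values())


def holds_func(triples, cls, prop):
    # Assumed reading: at most one value per instance.
    return holds_max(triples, cls, 1, prop)


def holds_total(triples, cls, prop):
    # Assumed reading: at least one value per instance.
    return holds_min(triples, cls, 1, prop)


graph = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
    ("t2", "title", '"Professor"'),
]
```

On this graph, name is total and functional on Teachers, while title is functional but not total because t1 has no title.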


RDFS Constraints

Let Ci denote classes and Qi denote properties.

Subclass Constraint
SubC(C1,C2)

Subproperty Constraint
SubP(Q1,Q2)

Property Domain/Range
PropD(Q,C), PropR(Q,C)

These constraints restrict the state space of the database.

They are not "axioms" that are used for inferencing.


Satisfiability

Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?

In general: undecidable.

Always satisfiable for the fragment
• Primary keys + Foreign keys
• Singleton
• Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range


Satisfiability

Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?

In general: undecidable.

Undecidable for the fragment
• Primary keys + Foreign keys
• Singleton
• Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range
• Min-Cardinality


Satisfiability

Given an RDF vocabulary and a set of constraints: is there a non-empty RDF graph that satisfies the constraints?

In general: undecidable.

Decidable in ExpTime for the fragment
• Unary primary keys
• Unary foreign keys
• Min-Cardinality + Max-Cardinality
• Subclass + Subproperty
• Property Domain + Property Range


The SPARQL Query Language

Operator AND (".")

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
}

[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS").]

Result:

?name   ?faculty
Joe     "CS"
Fred    "CS"
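To make the AND semantics concrete, here is a small Python sketch that evaluates a conjunction of triple patterns by extending variable bindings one pattern at a time. It is a naive nested-loop evaluation written for illustration, not how a SPARQL engine is implemented; `match` and `evaluate` are hypothetical helper names, and terms are plain strings where a leading "?" marks a variable.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, graph):
    """Join all patterns: each one filters and extends the bindings."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [
            extended
            for b in bindings
            for t in graph
            if (extended := match(pattern, t, b)) is not None
        ]
    return bindings


graph = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t1", "faculty", '"CS"'),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
    ("t2", "faculty", '"CS"'),
]
query = [
    ("?teacher", "rdf:type", "Teachers"),
    ("?teacher", "name", "?name"),
    ("?teacher", "faculty", "?faculty"),
]
rows = sorted((b["?name"], b["?faculty"]) for b in evaluate(query, graph))
```

The shared variable ?teacher is what joins the three patterns: a binding survives only if all patterns agree on it.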


The SPARQL Query Language

Operator FILTER

SELECT ?name ?faculty
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  FILTER (?name="Joe")
}

[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS").]

Result:

?name   ?faculty
Joe     "CS"


The SPARQL Query Language

Operator OPTIONAL

SELECT ?name ?faculty ?title
WHERE {
  ?teacher rdf:type Teachers.
  ?teacher name ?name.
  ?teacher faculty ?faculty.
  OPTIONAL { ?teacher title ?title. }
}

[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS"); one teacher additionally has a title edge to "Professor".]

Result:

?name   ?faculty   ?title
Joe     "CS"
Fred    "CS"       "Professor"
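OPTIONAL behaves like a left outer join on solution mappings: a mapping from the required part is extended when a compatible optional mapping exists and is kept unchanged otherwise. The sketch below illustrates this on hand-written mappings; the helper names and the assignment of Joe to t1 and Fred to t2 are assumptions made for the example.

```python
def compatible(a, b):
    """Two mappings are compatible if they agree on shared variables."""
    return all(b[var] == val for var, val in a.items() if var in b)


def left_join(left, right):
    """Keep every left mapping, extended by right mappings where possible."""
    result = []
    for a in left:
        extended = [{**a, **b} for b in right if compatible(a, b)]
        # No compatible optional match: the left mapping survives as is.
        result.extend(extended or [a])
    return result


required = [
    {"?teacher": "t1", "?name": "Joe", "?faculty": '"CS"'},
    {"?teacher": "t2", "?name": "Fred", "?faculty": '"CS"'},
]
optional = [{"?teacher": "t2", "?title": '"Professor"'}]

rows = left_join(required, optional)
```

The first row has no ?title binding, which corresponds to the empty cell in the result table.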


Extracting Primary Key Constraints

SELECT ?keyname ?class ?keyatt
WHERE {
  ?class rdfc:Key ?keyname.
  ?keyname rdf:type rdfc:Key.
  ?keyname ?bagrel ?keyatt.
  FILTER (?bagrel!=rdf:type)
}

[Figure: key encoding for Teachers. Edges: Teachers rdfc:Key T_Key; T_Key rdf:type rdfc:Key; T_Key rdf:type rdf:Bag; T_Key rdf:_1 name.]

Result:

?keyname   ?class     ?keyatt
T_Key      Teachers   name
…          …


Extracting Foreign Key Constraints

SELECT ?keyname ?class ?keyatt ?ref
WHERE {
  ?class rdfc:FKey ?keyname.
  ?keyname rdf:type rdfc:FKey.
  ?keyname ?bagrel ?keyatt.
  ?keyname rdfc:ref ?ref.
  FILTER (?bagrel!=rdf:type && ?bagrel!=rdfc:ref)
}
ORDER BY ?keyname

[Figure: foreign key encoding. Edges: Courses rdfc:FKey C_FKey; C_FKey rdf:type rdfc:FKey; C_FKey rdf:type rdf:Bag; C_FKey rdf:_1 taught_by; C_FKey rdfc:ref T_Key; together with the key node T_Key of Teachers (rdf:_1 name).]

Result:

?keyname   ?class    ?keyatt     ?ref
C_FKey     Courses   taught_by   T_Key


Checking Constraints with SPARQL

A SPARQL query checks a constraint C if it returns "yes" for each graph that violates C and "no" otherwise.

• Use the SPARQL ASK query form (returns "yes" exactly if the query has at least one result, "no" otherwise)
• Constraint checks are possible for many natural constraints: primary keys and foreign keys, cardinality constraints, …


Checking Constraints with SPARQL

Checking primary key constraints

Key(C,[p1,…,pn])

ASK {
  ?x rdf:type C.
  ?y rdf:type C.
  ?x p1 ?p1; [...]; pn ?pn.
  ?y p1 ?p1; [...]; pn ?pn.
  FILTER (?x!=?y)
}

Returns "yes" exactly if the constraint is violated.
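The logic of this ASK query can be mirrored in plain Python: look for two distinct instances of the class that share a value for every key property. This is a sketch over a graph of string tuples, with the hypothetical function name `violates_key`; it is meant to illustrate the check, not to replace the SPARQL formulation.

```python
from itertools import combinations


def violates_key(triples, cls, key_props):
    """True if two distinct instances of cls agree on all key properties."""
    graph = set(triples)
    instances = sorted(s for s, p, o in graph if p == "rdf:type" and o == cls)

    def values(node, prop):
        return {o for s, p, o in graph if s == node and p == prop}

    for x, y in combinations(instances, 2):
        # Like the query, require a common value for each key property.
        if all(values(x, q) & values(y, q) for q in key_props):
            return True
    return False


teachers = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
]
duplicate = teachers + [("t3", "rdf:type", "Teachers"), ("t3", "name", "Joe")]

ok_graph_violates = violates_key(teachers, "Teachers", ["name"])
bad_graph_violates = violates_key(duplicate, "Teachers", ["name"])
```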


Checking Constraints with SPARQL

Checking primary key constraints (example)

ASK {
  ?x rdf:type Teachers.
  ?y rdf:type Teachers.
  ?x name ?name.
  ?y name ?name.
  FILTER (?x!=?y)
}

[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS").]

Returns "no" (i.e., the constraint holds).


Checking Constraints with SPARQL

Checking foreign key constraints

FKey(C,[p1,…,pn],D,[q1,…,qn])

ASK {
  ?x rdf:type C; p1 ?p1; [...]; pn ?pn.
  OPTIONAL { ?y rdf:type D; q1 ?p1; [...]; qn ?pn. }
  FILTER (!bound(?y))
}

Returns "yes" exactly if the constraint is violated.
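The OPTIONAL plus !bound pattern expresses "there is a referencing instance for which no referenced instance exists". A Python sketch of the same check over string tuples follows; `violates_fkey` is a hypothetical name, and the teacher name "Ann" is made-up test data.

```python
from itertools import product


def violates_fkey(triples, c, props, d, ref_props):
    """True if some instance of c has key values with no match in d."""
    graph = set(triples)

    def instances(cls):
        return {s for s, p, o in graph if p == "rdf:type" and o == cls}

    def values(node, prop):
        return {o for s, p, o in graph if s == node and p == prop}

    targets = instances(d)
    for x in instances(c):
        # Each combination of values corresponds to one binding of ?p1..?pn.
        for combo in product(*(values(x, p) for p in props)):
            referenced = any(
                all(v in values(y, q) for v, q in zip(combo, ref_props))
                for y in targets
            )
            if not referenced:
                return True
    return False


graph = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
    ("c1", "rdf:type", "Courses"), ("c1", "taught_by", "Joe"),
    ("c2", "rdf:type", "Courses"), ("c2", "taught_by", "Fred"),
]
dangling = graph + [("c3", "rdf:type", "Courses"), ("c3", "taught_by", "Ann")]

holds = not violates_fkey(graph, "Courses", ["taught_by"], "Teachers", ["name"])
broken = violates_fkey(dangling, "Courses", ["taught_by"], "Teachers", ["name"])
```

As in the query, an instance that lacks a value for one of the properties produces no binding and therefore no violation.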


Semantic Query Optimization

Idea: use constraint knowledge to find a more efficient query execution plan

Has been studied in the context of relational and datalog databases…

… and might now be applicable in the context of RDF and SPARQL


Semantic Query Optimization

SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  OPTIONAL {
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
}


A Solution Candidate Subgraph

[Figure: the improved translation graph with nodes t1, t2 (Teachers), s1, s2 (Students), c1, c2 (Courses), p1, p2 (Participants), values Joe, Fred, 11111, 22222, "CS", "John", "Ed", "DB", "Web" and edges name, faculty, matric, taught_by, c_id, s_id; one candidate solution subgraph for the query is highlighted.]


Semantic Query Optimization

SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  OPTIONAL {
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
}

Constraints:

Key(Students,[matric])

FKey(Participants,[s_id],Students,[matric])

Total(Students,name)

Under these constraints the OPTIONAL part always finds a match, so OPTIONAL can be replaced by a plain join.


Semantic Query Optimization

SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?teacher rdf:type Teachers;
           name ?teachername.
  ?student rdf:type Students;
           matric ?studentmatric;
           name ?studentname.
}

Constraints:

Key(Teachers,[name])

FKey(Courses,[taught_by],Teachers,[name])

Under these constraints every taught_by value refers to exactly one teacher, so the ?teacher pattern is redundant.


Semantic Query Optimization

SELECT ?teachername ?coursename ?studentname
WHERE {
  ?course rdf:type Courses;
          taught_by ?teachername;
          name ?coursename.
  ?participant rdf:type Participants;
               c_id ?teachername;
               s_id ?studentmatric.
  ?student rdf:type Students;
           matric ?studentmatric;
           name ?studentname.
}

Other optimizations possible:
• Rewriting of filter expressions
• Elimination of redundant rdf:type specifications
• …
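The effect of dropping the ?teacher pattern can be checked on the running example with a small Python experiment. The sketch below reuses a naive pattern-matching evaluator (hypothetical helpers `match` and `evaluate`) and compares the results of the query with and without the teacher pattern on a graph that satisfies the key and foreign key constraints. Agreement on one graph only illustrates the rewriting; it does not prove the equivalence in general.

```python
def match(pattern, triple, binding):
    """Extend binding so that pattern equals triple, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def evaluate(patterns, graph):
    bindings = [{}]
    for pattern in patterns:
        bindings = [e for b in bindings for t in graph
                    if (e := match(pattern, t, b)) is not None]
    return bindings


graph = [
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
    ("s1", "rdf:type", "Students"), ("s1", "matric", "11111"),
    ("s1", "name", '"John"'),
    ("s2", "rdf:type", "Students"), ("s2", "matric", "22222"),
    ("s2", "name", '"Ed"'),
    ("c1", "rdf:type", "Courses"), ("c1", "taught_by", "Joe"),
    ("c1", "name", '"DB"'),
    ("c2", "rdf:type", "Courses"), ("c2", "taught_by", "Fred"),
    ("c2", "name", '"Web"'),
    ("p1", "rdf:type", "Participants"), ("p1", "c_id", "Fred"),
    ("p1", "s_id", "11111"),
    ("p2", "rdf:type", "Participants"), ("p2", "c_id", "Fred"),
    ("p2", "s_id", "22222"),
]
full = [
    ("?course", "rdf:type", "Courses"),
    ("?course", "taught_by", "?teachername"),
    ("?course", "name", "?coursename"),
    ("?participant", "rdf:type", "Participants"),
    ("?participant", "c_id", "?teachername"),
    ("?participant", "s_id", "?studentmatric"),
    ("?teacher", "rdf:type", "Teachers"),
    ("?teacher", "name", "?teachername"),
    ("?student", "rdf:type", "Students"),
    ("?student", "matric", "?studentmatric"),
    ("?student", "name", "?studentname"),
]
# The optimized query simply omits the two ?teacher patterns.
reduced = [p for p in full if p[0] != "?teacher"]


def project(patterns):
    return sorted((b["?teachername"], b["?coursename"], b["?studentname"])
                  for b in evaluate(patterns, graph))


same = project(full) == project(reduced)
```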


Future Work

Study of other types of constraints and the interaction between constraints

Development of a schematic approach to Semantic Query Optimization
• Mapping to SQL/Datalog?
• SPARQL-specific semantic optimizations?

Efficient constraint checking algorithms


PART II

SP2B – A SPARQL Performance Benchmark


PART II: SP2Bench

To date, no benchmark for SPARQL has been proposed
• LUBM: focus on OWL and reasoning
• Loose collection of benchmark queries for LUBM

SP2B fills this gap
• Set in the DBLP scenario
• Data generator for creating arbitrarily large datasets, plus 16 benchmark queries

Currently submitted for publication; will be made available online soon


The SP2Bench Data Generator

Creates bibliography documents similar to DBLP

Mirrors vital key characteristics found in original DBLP data
• Structure of entities (Articles, Journals, Books, …)
• Relations between authors
• Quantity of entities (development over time)
• Citation system

Combines the benefits of both a real-world scenario and the possibility to generate arbitrarily large documents.


The DBLP RDF Schema

[Figure: diagram of the DBLP RDF schema, with edges labeled "sc".]


The SP2Bench Queries

Operate on top of the characteristics that are mirrored by the data generator

Designed to test
• typical SPARQL operators and combinations
• SPARQL solution modifiers
• existing (but also obvious future) optimizations
• RDF data access patterns
• the impact of indices on data
• many other characteristics such as result size, different graph patterns, etc.


Benchmark Queries

Q1

SELECT ?yr
WHERE {
  ?proc rdf:type bench:Journal.
  ?proc dc:title "Journal 1 (1940)"^^xsd:string.
  ?proc dcterms:issued ?yr.
}

• Simple
• Constant result size (exactly 1 result)
• Might be answered very fast with an index


Benchmark Queries

Q5a

SELECT DISTINCT ?person ?name
WHERE {
  ?article rdf:type bench:Article.
  ?article dc:creator ?person.
  ?inproc rdf:type bench:Inproceedings.
  ?inproc dc:creator ?person2.
  ?person foaf:name ?name.
  ?person2 foaf:name ?name2.
  FILTER (?name=?name2)
}

Q5b

SELECT DISTINCT ?person ?name
WHERE {
  ?article rdf:type bench:Article.
  ?article dc:creator ?person.
  ?inproc rdf:type bench:Inproceedings.
  ?inproc dc:creator ?person.
  ?person foaf:name ?name.
}

• Q5a and Q5b are equivalent in our scenario
• Tests implicit vs. explicit joins
• We found that Q5a is much more challenging for current engines


Benchmark Queries

Q7: Double Closed-World Negation

SELECT DISTINCT ?title
WHERE {
  ?class rdfs:subClassOf foaf:Document.
  ?doc rdf:type ?class.
  ?doc dc:title ?title.
  ?bag2 ?member2 ?doc.
  ?doc2 dcterms:references ?bag2.
  OPTIONAL {
    ?class3 rdfs:subClassOf foaf:Document.
    ?doc3 rdf:type ?class3.
    ?doc3 dcterms:references ?bag3.
    ?bag3 ?member3 ?doc.
    OPTIONAL {
      ?class4 rdfs:subClassOf foaf:Document.
      ?doc4 rdf:type ?class4.
      ?doc4 dcterms:references ?bag4.
      ?bag4 ?member4 ?doc3.
    }
    FILTER (!bound(?doc4))
  }
  FILTER (!bound(?doc3))
}

Returns all publications that are cited at least once, but only cited by cited publications


Benchmark Results

We tested several SPARQL engines
• ARQ
• Sesame
• Virtuoso
• …

Results demonstrate that
• there are differences between engines
• there is still room for improvement in current implementations
• there is poor support for several SPARQL specifics


Thank you for your attention!


PART III

Additional Resources


The SPARQL Query Language

Operator UNION

SELECT ?name ?faculty
WHERE {
  {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Joe")
  }
  UNION
  {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Fred")
  }
}

[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS").]

Result:

?name   ?faculty
Joe     "CS"
Fred    "CS"