SPARQLing Constraints for RDF
Michael Schmidt, 20.03.2008
Joint work with Prof. Georg Lausen, Michael Meier
About… Michael Schmidt
2001-2006: Studies of Applied Computer Science in Saarbrücken
2006: Started my PhD in Saarbrücken with Prof. Christoph Koch (focus on XML, XQuery, streams)
Since 2007: at Freiburg University with Prof. Georg Lausen (focus on SPARQL, RDF)
Table of Contents
• SPARQLing Constraints for RDF
  - Constraints for RDF
    - Types of constraints
    - Encoding of constraints in RDF
    - Satisfiability
  - SPARQL in the context of constraints
    - Extracting constraints with SPARQL
    - Checking constraints with SPARQL
    - Exploiting constraints: Semantic Query Optimization
• SP2Bench: A SPARQL Performance Benchmark
PART I
SPARQLing Constraints for RDF
SPARQLing Constraints for RDF
RDF Data Format
• Machine-readable information
• Established in the Semantic Web
SPARQL Query Language
• W3C Recommendation since January 2008
Constraints
• Primary and Foreign Keys
• Cardinality Constraints, …
[Diagram label: "based on"]
Why Constraints?
• Restricting the state space of the database
• Maintenance of data consistency (e.g. when data is updated)
• Semantic Query Optimization
• Better understanding of the data
• In our scenario: translation of relational schemata to RDF without loss of information
Our Contribution
• Extension of RDF by constraints
  - Key constraints, cardinality constraints, …
  - Seamless integration into the RDF framework
• Study of the role of SPARQL in this context
  - Checking constraints with SPARQL
  - Specification of user-defined constraints
  - Optimization of SPARQL queries under constraints (Semantic Query Optimization)
The RDF Data Format
Three Types of Elements
• URIs (U): represent physical or logical resources
• Blank nodes (B): resources without fixed URI
• Literals (L): represent values
RDF Triples: (subject, predicate, object) with subject ∈ U ∪ B, predicate ∈ U, object ∈ U ∪ B ∪ L
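The typing rule for triples can be sketched in a few lines of Python. This is only an illustration: the classes `URI`, `BlankNode`, and `Literal` and the function `is_valid_triple` are hypothetical names introduced here, not part of any RDF library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


def is_valid_triple(subject, predicate, obj) -> bool:
    """Check subject in U ∪ B, predicate in U, object in U ∪ B ∪ L."""
    return (
        isinstance(subject, (URI, BlankNode))
        and isinstance(predicate, URI)
        and isinstance(obj, (URI, BlankNode, Literal))
    )


# The example triple from the slides: Person1 --name--> "Joe"
ok = is_valid_triple(URI("Person1"), URI("name"), Literal("Joe"))
# A literal is not allowed in subject position
bad = is_valid_triple(Literal("Joe"), URI("name"), URI("Person1"))
print(ok, bad)
```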
Example RDF Triple
Subject: Person1 (URI)
Predicate: name
Object: "Joe" (Literal)
Graph Representation: Person1 --name--> "Joe"
RDF Databases
RDF Databases are Collections of Triples
Currently no support for specification of primary/foreign key constraints
[Figure: example RDF graph over the nodes Person1, Person2, Student, and Person and the literals "Joe", "Pete", "1234", and "2345", with edges labeled name, knows, ssn, rdf:type, and rdfs:subClassOf.]
Mapping Relational Data to RDF
Teachers
name | faculty
Joe  | CS
Fred | CS

Students
matric | name
11111  | John
22222  | Ed

Courses
taught_by | name
Joe       | DB
Fred      | Web

Participants
c_id | s_id
Fred | 11111
Fred | 22222

+ NOT NULL constraint
A Naive Translation Approach
[Figure: RDF graph of the naive translation. Each tuple becomes a node (t1, t2, s1, s2, c1, c2, p1, p2) linked via rdf:type to its relation (Teachers, Students, Courses, Participants). Every attribute value is attached as a separate literal through an edge named after its column (name, faculty, matric, taught_by, c_id, s_id), so values such as "Fred" and "11111" occur several times.]
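A minimal sketch of the naive translation, assuming triples are modeled as plain Python tuples. The function name `naive_translate` and the node naming scheme (prefix plus row number) are illustrative assumptions chosen to match the node labels in the figure.

```python
def naive_translate(relation, prefix, rows):
    """Turn every row into a typed node with one literal-valued triple per column."""
    triples = set()
    for i, row in enumerate(rows, start=1):
        node = f"{prefix}{i}"                      # e.g. t1, t2, p1, p2
        triples.add((node, "rdf:type", relation))
        for column, value in row.items():
            # Naive approach: every value becomes a quoted literal,
            # even if it is a key referenced from another relation.
            triples.add((node, column, f'"{value}"'))
    return triples


teachers = [{"name": "Joe", "faculty": "CS"}, {"name": "Fred", "faculty": "CS"}]
participants = [{"c_id": "Fred", "s_id": "11111"}, {"c_id": "Fred", "s_id": "22222"}]

graph = naive_translate("Teachers", "t", teachers) | naive_translate(
    "Participants", "p", participants
)
for triple in sorted(graph):
    print(triple)
```

Note that the link between a participant and the referenced teacher exists only as two equal strings; nothing in the graph records that c_id is meant as a reference.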
Improving the Translation
[Figure: RDF graph of the improved translation. Same tuple nodes (t1, t2, s1, s2, c1, c2, p1, p2) and rdf:type edges as before, but the key values Joe, Fred, 11111, and 22222 appear unquoted and only once; the taught_by, c_id, and s_id edges point to these nodes. Non-key values ("CS", "John", "Ed", "DB", "Web") remain literals.]
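Encoding Primary Key Constraints
• Encoding of constraints in the schema layer
• New namespace "rdfc"
• RDF Bags
[Figure: Teachers graph (t1: name Joe, faculty "CS"; t2: name Fred, faculty "CS") extended by a key node T_Key. Edge labels: rdfc:Key (Teachers to T_Key), rdf:type (T_Key to rdfc:Key and to rdf:Bag), rdf:_1 (T_Key to name).]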
[Figure: Courses graph (c1: name "DB"; c2: name "Web"; both with taught_by edges) next to the Teachers graph and its key node T_Key. A foreign key node C_FKey is attached to Courses via rdfc:FKey; it is typed rdfc:FKey and rdf:Bag, lists taught_by as rdf:_1, and points to T_Key via rdfc:ref.]
Other Types of Constraints
Let C, C1, C2 be classes and Qi, Ri properties
• Primary Keys: Key(C,[Q1,…,Qn])
• Foreign Keys: FKey(C1,[Q1,…,Qn],C2,[R1,…,Rn])
• Cardinality Constraints: Min(C,n,R), Max(C,n,R) for n ∈ ℕ
• Functionality/Totality Constraints: Func(C,Q), Total(C,Q)
• Singleton Constraints: Single(C)
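The slides do not spell out the semantics of these constraints. The sketch below assumes the usual reading: Min(C,n,R) and Max(C,n,R) bound the number of R-values per instance of C, Func(C,Q) is Max(C,1,Q), and Total(C,Q) is Min(C,1,Q). All function names are hypothetical, and triples are modeled as plain tuples.

```python
def instances(graph, cls):
    """All nodes x with (x, rdf:type, cls) in the graph."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def values(graph, node, prop):
    """All objects o with (node, prop, o) in the graph."""
    return {o for (s, p, o) in graph if s == node and p == prop}


def holds_min(graph, cls, n, prop):
    """Assumed reading of Min(C,n,R): each instance has at least n values."""
    return all(len(values(graph, x, prop)) >= n for x in instances(graph, cls))


def holds_max(graph, cls, n, prop):
    """Assumed reading of Max(C,n,R): each instance has at most n values."""
    return all(len(values(graph, x, prop)) <= n for x in instances(graph, cls))


def holds_func(graph, cls, prop):
    return holds_max(graph, cls, 1, prop)


def holds_total(graph, cls, prop):
    return holds_min(graph, cls, 1, prop)


graph = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"), ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"), ("t2", "faculty", "CS"),
}
print(holds_total(graph, "Teachers", "faculty"))  # every teacher has a faculty
print(holds_func(graph, "Teachers", "name"))      # no teacher has two names
```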
RDFS Constraints
Let Ci denote classes, Qi denote properties
• Subclass Constraint: SubC(C1,C2)
• Subproperty Constraint: SubP(Q1,Q2)
• Property Domain/Range: PropD(Q,C), PropR(Q,C)
• Restrict the state space of the database
• Not "axioms" that are used for inferencing
Satisfiability
Given an RDF vocabulary and a set of constraints, is there a non-empty RDF graph that satisfies the constraints?
• In general: undecidable
• Always satisfiable for the fragment:
  - Primary keys + Foreign Keys
  - Singleton
  - Max-Cardinality
  - Subclass + Subproperty
  - Property Domain + Property Range
Satisfiability
Given an RDF vocabulary and a set of constraints, is there a non-empty RDF graph that satisfies the constraints?
• In general: undecidable
• Undecidable for the fragment:
  - Primary keys + Foreign Keys
  - Singleton
  - Max-Cardinality
  - Subclass + Subproperty
  - Property Domain + Property Range
  - Min-Cardinality
Satisfiability
Given an RDF vocabulary and a set of constraints, is there a non-empty RDF graph that satisfies the constraints?
• In general: undecidable
• Decidable in ExpTime for the fragment:
  - Unary primary keys
  - Unary foreign keys
  - Min-Cardinality + Max-Cardinality
  - Subclass + Subproperty
  - Property Domain + Property Range
The SPARQL Query Language
Operator AND (".")
  SELECT ?name ?faculty
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
  }
[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS")]
Result:
?name | ?faculty
Joe   | "CS"
Fred  | "CS"
The SPARQL Query Language
Operator FILTER
  SELECT ?name ?faculty
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    FILTER (?name="Joe")
  }
[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS")]
Result:
?name | ?faculty
Joe   | "CS"
The SPARQL Query Language
Operator OPTIONAL
  SELECT ?name ?faculty ?title
  WHERE {
    ?teacher rdf:type Teachers.
    ?teacher name ?name.
    ?teacher faculty ?faculty.
    OPTIONAL { ?teacher title ?title. }
  }
[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS", title "Professor")]
Result:
?name | ?faculty | ?title
Joe   | "CS"     |
Fred  | "CS"     | "Professor"
Extracting Primary Key Constraints
  SELECT ?keyname ?class ?keyatt
  WHERE {
    ?class rdfc:Key ?keyname.
    ?keyname rdf:type rdfc:Key.
    ?keyname ?bagrel ?keyatt.
    FILTER (?bagrel!=rdf:type)
  }
[Figure: key encoding for Teachers: Teachers --rdfc:Key--> T_Key; T_Key typed rdfc:Key and rdf:Bag; T_Key --rdf:_1--> name]
Result:
?keyname | ?class   | ?keyatt
T_Key    | Teachers | name
…        | …        |
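To make the join structure of the extraction query explicit, here is a sketch that evaluates the same pattern over a set of tuples. The function name `extract_keys` is hypothetical, and the four triples are an assumed reading of the key-encoding figure.

```python
def extract_keys(graph):
    """Mirror of the extraction query: return (keyname, class, keyatt) rows."""
    rows = []
    for (cls, pred, key) in graph:
        if pred != "rdfc:Key":
            continue                                   # ?class rdfc:Key ?keyname
        if (key, "rdf:type", "rdfc:Key") not in graph:
            continue                                   # ?keyname rdf:type rdfc:Key
        for (subj, bagrel, att) in graph:
            if subj == key and bagrel != "rdf:type":   # FILTER (?bagrel != rdf:type)
                rows.append((key, cls, att))
    return sorted(rows)


# Assumed encoding of Key(Teachers,[name]), read off the figure
schema = {
    ("Teachers", "rdfc:Key", "T_Key"),
    ("T_Key", "rdf:type", "rdfc:Key"),
    ("T_Key", "rdf:type", "rdf:Bag"),
    ("T_Key", "rdf:_1", "name"),
}
print(extract_keys(schema))
```

A composite key would add rdf:_2, rdf:_3, … members to the bag, each producing one more result row for the same key name.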
Extracting Foreign Key Constraints
  SELECT ?keyname ?class ?keyatt ?ref
  WHERE {
    ?class rdfc:FKey ?keyname.
    ?keyname rdf:type rdfc:FKey.
    ?keyname ?bagrel ?keyatt.
    ?keyname rdfc:ref ?ref.
    FILTER (?bagrel!=rdf:type && ?bagrel!=rdfc:ref)
  }
  ORDER BY ?keyname
[Figure: foreign key encoding: Courses --rdfc:FKey--> C_FKey; C_FKey typed rdfc:FKey and rdf:Bag; C_FKey --rdf:_1--> taught_by; C_FKey --rdfc:ref--> T_Key, the key node of Teachers]
Result:
?keyname | ?class  | ?keyatt   | ?ref
C_FKey   | Courses | taught_by | T_Key
…        | …       |           |
Checking Constraints with SPARQL
• Use the SPARQL ASK query form (returns "yes" exactly if the query has at least one result, "no" otherwise)
• Constraint checks are possible for many natural constraints: primary keys + foreign keys, cardinality constraints, …
• A SPARQL query checks a constraint C if it returns "yes" for each graph that violates C and "no" otherwise.
Checking Constraints with SPARQL
Checking primary key constraints: Key(C,[p1,…,pn])
  ASK {
    ?x rdf:type C.
    ?y rdf:type C.
    ?x p1 ?p1; [...]; pn ?pn.
    ?y p1 ?p1; [...]; pn ?pn.
    FILTER (?x!=?y)
  }
Returns "yes" exactly if the constraint is violated.
Checking Constraints with SPARQL
Checking primary key constraints (example)
  ASK {
    ?x rdf:type Teachers.
    ?y rdf:type Teachers.
    ?x name ?name.
    ?y name ?name.
    FILTER (?x!=?y)
  }
Returns "no" (i.e., the constraint holds)
[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS")]
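The logic of the ASK query can be replayed outside a SPARQL engine. The following sketch, with hypothetical helper names and triples modeled as tuples, returns True exactly when two distinct instances agree on a value for every key property.

```python
def instances(graph, cls):
    """All nodes x with (x, rdf:type, cls) in the graph."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def values(graph, node, prop):
    """All objects o with (node, prop, o) in the graph."""
    return {o for (s, p, o) in graph if s == node and p == prop}


def violates_key(graph, cls, props):
    """True iff two distinct instances of cls share a value for every property in props."""
    members = instances(graph, cls)
    return any(
        x != y and all(values(graph, x, p) & values(graph, y, p) for p in props)
        for x in members
        for y in members
    )


graph = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"), ("t1", "faculty", "CS"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"), ("t2", "faculty", "CS"),
}
print(violates_key(graph, "Teachers", ["name"]))     # False: Key(Teachers,[name]) holds
print(violates_key(graph, "Teachers", ["faculty"]))  # True: both teachers share "CS"
```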
Checking Constraints with SPARQL
Checking foreign key constraints: FKey(C,[p1,…,pn],D,[q1,…,qn])
  ASK {
    ?x rdf:type C; p1 ?p1; [...]; pn ?pn.
    OPTIONAL { ?y rdf:type D; q1 ?p1; [...]; qn ?pn. }
    FILTER (!bound(?y))
  }
Returns "yes" exactly if the constraint is violated.
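The OPTIONAL plus !bound pattern encodes "there is a referencing value combination without a matching target". A sketch of the same test in Python, using hypothetical helper names and tuple-based triples:

```python
from itertools import product


def instances(graph, cls):
    """All nodes x with (x, rdf:type, cls) in the graph."""
    return {s for (s, p, o) in graph if p == "rdf:type" and o == cls}


def values(graph, node, prop):
    """All objects o with (node, prop, o) in the graph."""
    return {o for (s, p, o) in graph if s == node and p == prop}


def violates_fkey(graph, c, ps, d, qs):
    """True iff some instance of c has values for ps that no instance of d matches on qs."""
    targets = instances(graph, d)
    for x in instances(graph, c):
        # Each combination of values corresponds to one binding of ?p1 ... ?pn
        for combo in product(*(values(graph, x, p) for p in ps)):
            matched = any(
                all(v in values(graph, y, q) for v, q in zip(combo, qs))
                for y in targets
            )
            if not matched:        # OPTIONAL part failed, so ?y stays unbound
                return True
    return False


graph = {
    ("t1", "rdf:type", "Teachers"), ("t1", "name", "Joe"),
    ("t2", "rdf:type", "Teachers"), ("t2", "name", "Fred"),
    ("c1", "rdf:type", "Courses"), ("c1", "taught_by", "Joe"),
    ("c2", "rdf:type", "Courses"), ("c2", "taught_by", "Fred"),
}
print(violates_fkey(graph, "Courses", ["taught_by"], "Teachers", ["name"]))  # False
```

As in the ASK query, an instance that lacks a value for one of the referencing properties never triggers a violation; that case would be covered by a separate totality constraint.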
Semantic Query Optimization
Idea: use constraint knowledge to find a more efficient query execution plan
Has been studied in the context of relational and Datalog databases, and might now be applicable in the context of RDF and SPARQL
Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    OPTIONAL {
      ?student rdf:type Students;
               matric ?studentmatric;
               name ?studentname.
    }
  }
[Figure: graph of the improved translation (Teachers, Students, Courses, Participants with tuple nodes t1, t2, s1, s2, c1, c2, p1, p2), with one solution candidate subgraph highlighted]
Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    OPTIONAL {
      ?student rdf:type Students;
               matric ?studentmatric;
               name ?studentname.
    }
  }
Constraints that guarantee the OPTIONAL block always matches, so OPTIONAL can be dropped:
• Key(Students,[matric])
• FKey(Participants,[s_id],Students,[matric])
• Total(Students,name)
Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?teacher rdf:type Teachers;
             name ?teachername.
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
Constraints that make the ?teacher pattern redundant:
• Key(Teachers,[name])
• FKey(Courses,[taught_by],Teachers,[name])
Semantic Query Optimization
  SELECT ?teachername ?coursename ?studentname
  WHERE {
    ?course rdf:type Courses;
            taught_by ?teachername;
            name ?coursename.
    ?participant rdf:type Participants;
                 c_id ?teachername;
                 s_id ?studentmatric.
    ?student rdf:type Students;
             matric ?studentmatric;
             name ?studentname.
  }
Other optimizations possible:
• Rewriting of filter expressions
• Elimination of redundant rdf:type specifications
• …
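Why the OPTIONAL can be dropped is easy to see on a toy model. In the sketch below, which is an illustration and not the authors' optimizer, a dict from matric to name stands in for Key(Students,[matric]) plus Total(Students,name), and the two hypothetical functions compare a left outer join with a plain join.

```python
def with_optional(participants, students):
    """Left outer join: keep every participant; the name may stay unbound (None)."""
    return [(c_id, students.get(s_id)) for (c_id, s_id) in participants]


def without_optional(participants, students):
    """Plain join: keep only participants whose s_id matches a student."""
    return [(c_id, students[s_id]) for (c_id, s_id) in participants if s_id in students]


# matric -> name; a dict models Key(Students,[matric]) and Total(Students,name)
students = {"11111": "John", "22222": "Ed"}
participants = [("Fred", "11111"), ("Fred", "22222")]   # (c_id, s_id) rows

# FKey(Participants,[s_id],Students,[matric]) holds: both forms agree
same = with_optional(participants, students) == without_optional(participants, students)

# With a dangling s_id the foreign key is violated and the results differ
broken = participants + [("Fred", "33333")]
differ = with_optional(broken, students) != without_optional(broken, students)
print(same, differ)
```

The rewrite is therefore only sound while the constraints are enforced, which is why constraint checking and optimization belong together.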
Future Work
Study of other types of constraints and the interaction between constraints
Development of a schematic approach to Semantic Query Optimization
• Mapping to SQL/Datalog?
• SPARQL-specific semantic optimizations?
Efficient constraint checking algorithms
PART II
SP2B – A SPARQL Performance Benchmark
PART II: SP2Bench
• To date, no benchmark for SPARQL has been proposed
  - LUBM: focus on OWL and reasoning
  - Loose collection of benchmark queries for LUBM
• SP2B fills this gap
  - Set in the DBLP scenario
  - Data generator for creating arbitrarily large datasets + 16 benchmark queries
• Currently submitted for publication; will be made available online soon
The SP2Bench Data Generator
• Creates bibliography documents similar to DBLP
• Mirrors key characteristics found in the original DBLP data
  - Structure of entities (Articles, Journals, Books, …)
  - Relations between authors
  - Quantity of entities (development over time)
  - Citation system
• Combines the benefits of a real-world scenario with the possibility to generate arbitrarily large documents
The DBLP RDF Schema
[Figure: DBLP RDF schema diagram; only the edge labels "sc" (presumably rdfs:subClassOf) survive in this transcript.]
The SP2Bench Queries
Operate on top of the characteristics that are mirrored by the data generator
Designed to test …
• … typical SPARQL operators and combinations
• … SPARQL solution modifiers
• … existing (but also obvious future) optimizations
• … RDF data access patterns
• … the impact of indices on data
• … and many other characteristics such as result size, different graph patterns, etc.
Benchmark Queries
Q1
  SELECT ?yr
  WHERE {
    ?proc rdf:type bench:Journal.
    ?proc dc:title "Journal 1 (1940)"^^xsd:string.
    ?proc dcterms:issued ?yr.
  }
• Simple
• Constant result size (exactly 1 result)
• Might be answered very fast with an index
Benchmark Queries
Q5a
  SELECT DISTINCT ?person ?name
  WHERE {
    ?article rdf:type bench:Article.
    ?article dc:creator ?person.
    ?inproc rdf:type bench:Inproceedings.
    ?inproc dc:creator ?person2.
    ?person foaf:name ?name.
    ?person2 foaf:name ?name2.
    FILTER(?name=?name2).
  }
Q5b
  SELECT DISTINCT ?person ?name
  WHERE {
    ?article rdf:type bench:Article.
    ?article dc:creator ?person.
    ?inproc rdf:type bench:Inproceedings.
    ?inproc dc:creator ?person.
    ?person foaf:name ?name.
  }
• Equivalent in our scenario
• Tests implicit vs. explicit joins
• We found that Q5a is much more challenging for current engines
Benchmark Queries
Q7: Double Closed-World Negation
  SELECT DISTINCT ?title
  WHERE {
    ?class rdfs:subClassOf foaf:Document.
    ?doc rdf:type ?class.
    ?doc dc:title ?title.
    ?bag2 ?member2 ?doc.
    ?doc2 dcterms:references ?bag2.
    OPTIONAL {
      ?class3 rdfs:subClassOf foaf:Document.
      ?doc3 rdf:type ?class3.
      ?doc3 dcterms:references ?bag3.
      ?bag3 ?member3 ?doc.
      OPTIONAL {
        ?class4 rdfs:subClassOf foaf:Document.
        ?doc4 rdf:type ?class4.
        ?doc4 dcterms:references ?bag4.
        ?bag4 ?member4 ?doc3.
      }
      FILTER (!bound(?doc4)).
    }
    FILTER (!bound(?doc3)).
  }
Returns all publications that are cited at least once, but only cited by cited publications
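The double negation is easier to follow on a plain citation relation. This sketch assumes citations are given as (citing, cited) pairs, ignores the bag indirection and class checks of the real query, and returns document identifiers instead of titles; the name `q7` is only borrowed for illustration.

```python
def q7(cites):
    """Documents cited at least once whose citing documents are all cited themselves."""
    cited = {target for (_, target) in cites}
    result = set()
    for doc in cited:
        citers = {source for (source, target) in cites if target == doc}
        # Inner negation: a citer that is never cited disqualifies doc.
        # Outer negation: doc qualifies only if no such citer exists.
        if all(citer in cited for citer in citers):
            result.add(doc)
    return result


# a cites b, b cites c, c cites b
cites = {("a", "b"), ("b", "c"), ("c", "b")}
print(q7(cites))  # {'c'}: b is also cited by a, and a itself is never cited
```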
Benchmark Results
We tested several SPARQL engines: ARQ, Sesame, Virtuoso, …
Results demonstrate that …
• … there are differences between engines
• … there is still room for improvement in current implementations
• … there is poor support for several SPARQL specifics
Thank you for your attention!
PART III
Additional Resources
The SPARQL Query Language
Operator UNION
  SELECT ?name ?faculty
  WHERE {
    {
      ?teacher rdf:type Teachers.
      ?teacher name ?name.
      ?teacher faculty ?faculty.
      FILTER (?name="Joe").
    } UNION {
      ?teacher rdf:type Teachers.
      ?teacher name ?name.
      ?teacher faculty ?faculty.
      FILTER (?name="Fred").
    }
  }
[Figure: Teachers graph with t1 (name Joe, faculty "CS") and t2 (name Fred, faculty "CS")]
Result:
?name | ?faculty
Joe   | "CS"
Fred  | "CS"