
rdfextras Documentation
Release 0.1a

Original contributors

May 02, 2015


Contents

1 Plug-ins Overview
   1.1 SPARQL query processor
   1.2 Stores
   1.3 Tools
   1.4 Utils

2 Introduction to basic tasks in rdflib
   2.1 Parsing RDF into rdflib graphs
   2.2 Using SPARQL to query an rdflib 3 graph
   2.3 Using MySQL as a triple store with rdflib/rdfextras
   2.4 Transitive Traversal
   2.5 Working with RDFLib and RDFExtras, the basics

3 Techniques
   3.1 Extending SPARQL Basic Graph Matching

4 Epydoc API docs

5 Indices and tables

Python Module Index


RDFExtras is a collection of packages and plug-ins that provide extra functionality based on RDFLib 3. The common denominator is “non-core-rdflib”.

The main RDFExtras project acts as a focal point for RDFLib-associated packages and plug-ins with distinct uses, such as SPARQL query processors (numbering one, thus far), command-line tools, serializers/parsers, experimental or unmaintained stores and similar.

Warning: The rdfextras packages are to be considered unstable in general. Useful, sometimes near core, but not currently guaranteed never to be renamed, refactored, reshuffled or redesigned.


CHAPTER 1

Plug-ins Overview

The current set of RDFLib and RDFExtras plug-ins includes RDF parsers, serializers, stores and the “sparql-p” SPARQL query processor:

1.1 SPARQL query processor

The pure-Python no-sql SPARQL implementation bits that were in the RDFLib development trunk are now in rdfextras.sparql.

This “default” SPARQL implementation has been developed from the original sparql-p implementation (by Ivan Herman, Dan Krech and Michel Pelletier) and over time has evolved into a full implementation of the W3C SPARQL Algebra, providing coverage for the full SPARQL grammar including all combinations of GRAPH. The implementation includes unit tests and has been run against the new DAWG test suite.

1.1.1 “sparql-p” (default) SPARQL implementation

Originally, on Wednesday 24 August, 2005:

rdflib and SPARQL by Michel Pelletier:

As some of you know, rdflib has been slowly growing SPARQL support. It started when Ivan Herman from the W3C implemented the SPARQL query logic for rdflib and contributed it back to us. The bulk of his work is in the rdflib.sparql module. While not a complete SPARQL implementation, because it lacked a parser, it represented the bulk of the work necessary to implement a SPARQL query language, i.e., the actual query logic.

On the parser front I have made some progress. You can find it in rdflib.sparql.grammar in the current SVN. It depends on the excellent pyparsing library to parse SPARQL queries into a structured token object from which all of the relevant bits of data about a particular SPARQL query can be extracted. The grammar is still young and being tested and it doesn’t work for all queries, but it’s a start. I’ve written a script that applies the grammar to all of the standard SPARQL tests, so that over time I can keep working on it until all the tests pass. Once we have a working parser that parses all the known SPARQL test queries then we can implement the last piece, the thin glue layer between the parser and Ivan’s query logic. I’m hoping that by rdflib 2.4 or 2.5 we can brag about having full SPARQL support as well as being able to successfully run and prove all of the standard tests. This would be a huge milestone for us as it would drive more developers to rdflib, if only because they want a framework against which to test and verify the spec: it encourages the existing SPARQL gurus out there to come our way because of the amazingly low barrier of entry rdflib provides by being pure Python.

Subsequently, on 10 Oct 2005:

SPARQL in RDFLib (Version 2.1) by Ivan Herman

This is a short overview of the query facilities added to RDFLib. These are based on the July 2005 version of the SPARQL draft worked on at the W3C. For lack of a better word, I refer to this implementation as sparql-p.

Thanks to the work of Daniel Krech and mainly Michel Pelletier, sparql-p is now fully integrated with the newer versions of RDFLib (version 2.2.2 or later), whereas earlier versions were distributed as separate packages. This integration has led to some minor adjustments in class naming and structure compared to earlier versions. If you are looking for the documentation of the separate package, please refer to an earlier version of this document. Be warned, though, that the earlier versions are now deprecated in favour of RDFLib 2.2.2 or later.

The SPARQL draft describes its facilities in terms of a query language. A full SPARQL implementation should include a parser of that language mapping onto the underlying implementation. sparql-p does not include such a parser yet, only the underlying SPARQL engine and its API. The description below shows how the mapping works. This also means that the API is not the full implementation of SPARQL: some of the features should be left to the parser that could use this API. This is the case, for example, of the named Graphs facilities that could be mapped using RDFLib Graph instances: all query is performed on such an instance in the first place! In any case, the implementation of sparql-p covers (I believe) the most frequently used cases of SPARQL. — Intro

Later still, on May 19 2006:

SPARQL BisonGen Parser Checked in to RDFLib, a blog post by Chimezie:


I just checked in the most recent version of what had been an experimental, generated (see: http://copia.ogbuji.net/blog/2005-04-27/Of_BisonGe) parser for the full SPARQL syntax, which I had been working on to hook up with sparql-p. It parses a SPARQL query into a set of Python objects representing the components of the grammar:

http://svn.rdflib.net/trunk/rdflib/sparql/bison/

The parser itself is a Python/C extension, so the setup.py had to be modified in order to compile it into a Python module.

I also checked in a test harness that’s meant to work with the DAWG test cases:

http://svn.rdflib.net/trunk/test/BisonSPARQLParser

I’m currently stuck on this test case, but working through it:

http://www.w3.org/2001/sw/DataAccess/tests/#optional-outer-filter-with-bound

The test harness only checks for parsing; it doesn’t evaluate the parsed query against the corresponding set of test data, but it can easily be extended to do so.

I’m not sure about the state of those test cases, some have been ‘accepted’ and some haven’t. I came across a couple that were illegal according to the most recent SPARQL grammar (the bad tests are noted in the test harness). Currently the parser is stand-alone, it doesn’t invoke sparql-p, for a few reasons:

• I wanted to get it through parsing the queries in the test case first

• Our integrated version of sparql-p is outdated as there is a more recent version that Ivan has been working on with some improvements we should consider integrating

• Some of the more complex combinations of Graph Patterns don’t seem solvable without re-working / extending the expansion tree solver. I have some ideas about how this could be done (to handle things like nested UNIONs and OPTIONALs) but wanted to get a working parser in first

And later yet, on Sun, 01 Apr 2007:

SPARQL Algebra, Reductions, Forms and Mappings for Implementations, a post to public-sparql-dev by Chimezie:

I’ve been gearing up for an attempt at implementing the Compositional SPARQL semantics expressed in both the ‘Semantics of SPARQL’ and ‘Semantics and Complexity of SPARQL’ papers, with the goal of reusing the existing sparql-p which already implements much of the evaluation semantics. Some intermediate goals were necessary for the first attempt at such a design [1]:

• Incorporate rewrite rules outlined in the current DAWG SPARQL WD

• Incorporate reduction to Disjunctive Normal Form outlined in Semantics and Complexity of SPARQL

• Formalize a mapping from the DAWG algebra notation to that outlined in Semantics of SPARQL

• Formalize a mapping from the compositional semantics to sparql-p methods

In attempting to formalize the above mappings I noticed some interesting parallels that I thought you and Ivan might be interested in (given the amount of independent effort that was put into both the formal semantics and the implementations). In particular:

The proposed disjunctive normal form of SPARQL patterns coincides directly with the ‘query’ API of sparql-p [2] which essentially implements evaluation of SPARQL patterns of the form:

((((P1 UNION P2 UNION ... UNION PN) OPT A) OPT B) ... OPT C)

I.e., DNF extended with OPTIONAL patterns.


In addition, I had suggested [3] to the DAWG that they consider formalizing a function symbol which relates a set of triples to the IRIs of the graphs in which they are contained. As Richard Newman points out, this is implemented [4] by most RDF stores and in RDFLib in particular by the ConjunctiveGraph.contexts method:

contexts((s,p,o)) -> uri1,uri2,...
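For illustration, a minimal sketch of how this looks against rdflib’s actual ConjunctiveGraph API (the graph names and the triple are made up):

from rdflib import ConjunctiveGraph, URIRef

g = ConjunctiveGraph()
s, p, o = (URIRef("http://example.org/s"),
           URIRef("http://example.org/p"),
           URIRef("http://example.org/o"))

# Add the same triple to two named graphs (contexts)
g.get_context(URIRef("http://example.org/g1")).add((s, p, o))
g.get_context(URIRef("http://example.org/g2")).add((s, p, o))

# contexts((s, p, o)) yields every named graph containing the triple
for ctx in g.contexts((s, p, o)):
    print(ctx.identifier)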

I had asked their thoughts on the performance impact of evaluating GRAPH patterns declaratively instead of imperatively (the way they are defined in both the DAWG semantics and the Jorge P. et al. papers) and I’m curious on your thoughts on this as well.

Finally, an attempt at a formal mapping from DAWG algebra evaluation operators to the operators outlined in the Jorge P. et al. papers is below:

merge(μ1, μ2) = μ1 ∪ μ2
Join(Ω1, Ω2) = Ω1 ⋈ Ω2
Filter(R, Ω) = [[(P FILTER R)]](D,G)
Diff(Ω1, Ω2, R) = (Ω1 \ Ω2) ∪ { μ | μ in Ω1 ⋈ Ω2 and *not* μ |= R }
Union(Ω1, Ω2) = Ω1 ∪ Ω2

Related documents


sparql - “Compositional Semantics” SPARQL engine

This module implements (with sparql-p) compositional forms and semantics as outlined in the pair of Jorge Pérez et al. papers:

• Semantics of SPARQL

• Semantics and Complexity of SPARQL

It also implements the rewrite rules expressed in the SPARQL Algebra.

Compositional Semantics (Jorge P. et al. syntax)

Definition 3.5 (Compatible Mappings) Two mappings μ1 : V → T and μ2 : V → T are compatible if for every ?X ∈ dom(μ1) ∩ dom(μ2) it is the case that μ1(?X) = μ2(?X), i.e. when μ1 ∪ μ2 is also a mapping.

Definition 3.7 (Set of Mappings and Operations) Ω1 and Ω2 are sets of mappings:

I.   Ω1 ⋈ Ω2 = { μ1 ∪ μ2 | μ1 ∈ Ω1, μ2 ∈ Ω2 are compatible mappings }
II.  Ω1 ∪ Ω2 = { μ | μ ∈ Ω1 or μ ∈ Ω2 }
III. Ω1 \ Ω2 = { μ1 ∈ Ω1 | for all μ2 ∈ Ω2, μ1 and μ2 are not compatible }
IV.  LeftJoin1(Ω1, Ω2) = (Ω1 ⋈ Ω2) ∪ (Ω1 \ Ω2)
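As a small illustrative example (not in the original papers): if μ1 = {?X → a} and μ2 = {?X → a, ?Y → b}, the two mappings agree on the shared variable ?X, so they are compatible and μ1 ∪ μ2 = {?X → a, ?Y → b} belongs to Ω1 ⋈ Ω2; had μ2 mapped ?X to c instead, the pair would clash and contribute nothing to the join.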

NOTE: sparql-p implements the notion of compatible mappings with the ‘clash’ attribute defined on instances of _SPARQLNode (in the evaluation expansion tree).

An RDF dataset is a set D = { G0, <u1,G1>, ..., <un,Gn> }

where G0, ..., Gn are RDF graphs, u1, ..., un are IRIs, and n ≥ 0.

NOTE: A SPARQL RDF dataset is equivalent to an RDFLib ConjunctiveGraph, so we introduce a function rdflibDS(D) which returns the ConjunctiveGraph instance associated with the dataset D.

Every dataset D is equipped with a function dD such that dD(u) = Gi if <u,Gi> ∈ D and dD(u) = ∅ otherwise.


Let D be an RDF dataset and G an RDF graph in D

Definition 3.9 (Graph Pattern Evaluation) [[.]](D,G) is the notation used to indicate the evaluation of a graph pattern.

I.   [[(P1 AND P2)]](D,G) = [[P1]](D,G) ⋈ [[P2]](D,G)
II.  [[(P1 UNION P2)]](D,G) = [[P1]](D,G) ∪ [[P2]](D,G)
III. [[(P1 OPT P2)]](D,G) = LeftJoin1([[P1]](D,G), [[P2]](D,G))
IV.  If u ∈ I, then [[(u GRAPH P)]](D,G) = [[P]](D, dD(u))
     If ?X ∈ V, then [[(?X GRAPH P)]](D,G) = [[P]](D,G) with ?X -> rdflibDS(D).contexts(P)
V.   [[(P FILTER R)]](D,G) = { μ ∈ [[P]](D,G) | μ |= R }

Note: RDFLib’s ConjunctiveGraph.contexts method is used to append bindings for GRAPH variables. The FILTER semantics are implemented ‘natively’ in sparql-p by Python functions

(http://dev.w3.org/cvsweb/2004/PythonLib-IH/Doc/sparqlDesc.html?rev=1.11#Constraini)

Equivalence with SPARQL Algebra (from DAWG SPARQL Algebra to Jorge P. et al. forms):

merge(μ1, μ2) = μ1 ∪ μ2
Join(Ω1, Ω2) = Ω1 ⋈ Ω2
Filter(R, Ω) = [[(P FILTER R)]](D,G)
Diff(Ω1, Ω2, R) = (Ω1 \ Ω2) ∪ { μ | μ in Ω1 ⋈ Ω2 and *not* μ |= R }
Union(Ω1, Ω2) = Ω1 ∪ Ω2

LeftJoin(Ω1, Ω2, R) = Filter(R, Join(Ω1, Ω2)) ∪ Diff(Ω1, Ω2, R)

Graph Pattern rewrites and reductions

[[{t1, t2, ..., tn}]]D = [[(t1 AND t2 AND ... AND tn)]]D

Proposition 3.13 The above proposition implies that it is equivalent to consider basic graph patterns or triple patterns as the base case when defining SPARQL general graph patterns.

BGP reduction and Disjunctive Normal Forms or Union-Free BGP

Step 5 of http://www.w3.org/TR/rdf-sparql-query/#convertGraphPattern:

Replace Join(∅, A) by A. Replace Join(A, ∅) by A.

Disjunctive Normal Form of SPARQL Patterns

See: http://en.wikipedia.org/wiki/Disjunctive_normal_form

From Proposition 1 of ‘Semantics and Complexity of SPARQL’

I.   (P1 AND (P2 UNION P3)) ≡ ((P1 AND P2) UNION (P1 AND P3))
II.  (P1 OPT (P2 UNION P3)) ≡ ((P1 OPT P2) UNION (P1 OPT P3))
III. ((P1 UNION P2) OPT P3) ≡ ((P1 OPT P3) UNION (P2 OPT P3))
IV.  ((P1 UNION P2) FILTER R) ≡ ((P1 FILTER R) UNION (P2 FILTER R))

The application of the above equivalences permits the translation of any graph pattern into an equivalent one of the form:


P1 UNION P2 UNION P3 UNION ... UNION Pn

NOTE: the sparql-p SPARQL.query API is geared for evaluation of SPARQL patterns already in DNF.


Comprehensive RDF Query API

Original post by Chimezie Ogbuji (edited for contemporary use by Graham Higgins <[email protected]>)

RDFLib’s support for SPARQL has come full circle and I wasn’t planning on blogging on the developments until they had settled some – and they have. In particular, the last piece was finalizing a set of APIs for querying and result processing that fit well within the framework of RDFLib’s various Graph APIs. The other issue was for the query APIs to accommodate eventual support for other querying languages that are capable of picking up the slack where SPARQL is wanting (transitive closures, for instance – try composing a concise SPARQL query for calculating the transitive closure of a given node along the rdfs:subClassOf property and you’ll immediately see what I mean).

Querying Every Graph instance has a query method through which RDF queries can be dispatched:

def query(self, strOrQuery, initBindings={}, initNs={}, DEBUG=False,
          processor="sparql"):
    """
    Executes a SPARQL query (eventually will support Versa queries with
    the same method) against this Conjunctive Graph.

    :Params:
    :strOrQuery: Either a string consisting of the SPARQL query or an
        instance of rdflib.sparql.bison.Query.Query
    :initBindings: A mapping from variable name to an RDFLib term (used
        as initial bindings for the SPARQL query)
    :initNs: A mapping from a namespace prefix to an instance of
        rdflib.Namespace (used for the SPARQL query)
    :DEBUG: A boolean flag passed on to the SPARQL parser and
        evaluation engine
    :processor: The kind of RDF query. Choose 'sparql' to use the
        pure-Python "no-sql" SPARQL processor, choose 'sparql2sql' to
        use the pure-Python "SPARQL2SQL" SPARQL processor.
    """

The first positional argument strOrQuery is either a query string or a pre-compiled query object (compiled using the appropriate BisonGen mechanism for the target query language). Pre-compilation can be useful for avoiding redundant parsing overhead for queries that need to be evaluated repeatedly:

from rdfextras.sparql2sql.bison import Parse
queryObject = Parse(sparqlString)
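A brief sketch of why pre-compilation pays off (the collection of graphs is hypothetical): the query string is parsed once, and the resulting object is handed to query() on each evaluation instead of the raw string:

queryObject = Parse(sparqlString)
for g in graphs:                     # hypothetical collection of Graph instances
    results = g.query(queryObject)   # no re-parsing per evaluation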

The initBindings keyword argument is a dictionary that maps variables to their values. The dictionary is expected to be a mapping from variables to RDFLib terms. This is passed on to the SPARQL processor as initial variable bindings.


initNs is yet another top-level parameter for the query processor: a namespace mapping from prefixes to namespace URIs.

The DEBUG flag is pretty self-explanatory. When set to True, it will cause additional print statements to appear for the parsing of the query (when the sparql2sql processor is selected) as well as for the patterns and constraints passed on to the processor (for SPARQL queries).

Finally, the processor keyword specifies which kind of processor to use to evaluate the query: sparql or sparql2sql.
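Putting the parameters together, a minimal sketch of a call (the FOAF namespace is real, but the data file, the person URI and the variable names are made up for illustration):

from rdflib import ConjunctiveGraph, Namespace, URIRef

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
graph = ConjunctiveGraph()
graph.parse("foaf-data.rdf")   # hypothetical input file

# Bind ?person up front instead of splicing the URI into the query text
results = graph.query(
    "SELECT ?name WHERE { ?person foaf:name ?name }",
    initBindings={"?person": URIRef("http://example.org/people#bob")},
    initNs={"foaf": FOAF},
    processor="sparql")

for row in results:
    print(row)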

Result formats SPARQL has two result formats (JSON and XML). Thanks to Ivan Herman’s recent contribution, the SPARQL processor now supports both formats. The query method (above) returns instances of QueryResult, a common class for RDF query results which defines the following method:

def serialize(self, format='xml'):
    # real code required ...
    pass

The format argument determines which result format to use. For SPARQL queries, the allowable values are: graph – for CONSTRUCT / DESCRIBE queries (in which case a resulting Graph object is returned), json, or xml. The resulting object also acts as an iterator over the bindings to allow for manipulation in the host language (Python).
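So, as a hedged sketch of the two ways to consume a result (graph and queryString are assumed to be a populated Graph and a SELECT query, as above):

results = graph.query(queryString)
print(results.serialize(format='json'))   # serialized result document

for row in results:                       # or iterate over the bindings
    print(row)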


SPARQL in RDFLib (Version 2.1)

author Ivan Herman [email protected]

date 2005/10/10 15:40:35

Introduction This is a short overview of the query facilities added to RDFLib. These are based on the July 2005 version of the SPARQL draft worked on at the W3C. For lack of a better word, I refer to this implementation as sparql-p.

Thanks to the work of Daniel Krech and mainly Michel Pelletier, sparql-p is now fully integrated with the newer versions of RDFLib (version 2.2.2 or later), whereas earlier versions were distributed as separate packages. This integration has led to some minor adjustments in class naming and structure compared to earlier versions. If you are looking for the documentation of the separate package, please refer to an earlier version of this document. Be warned, though, that the earlier versions are now deprecated in favour of RDFLib 2.2.2 or later.

The SPARQL draft describes its facilities in terms of a query language. A full SPARQL implementation should include a parser of that language mapping onto the underlying implementation. sparql-p does not include such a parser yet, only the underlying SPARQL engine and its API. The description below shows how the mapping works. This also means that the API is not the full implementation of SPARQL: some of the features should be left to the parser that could use this API. This is the case, for example, of the named Graphs facilities that could be mapped using RDFLib Graph instances: all query is performed on such an instance in the first place! In any case, the implementation of sparql-p covers (I believe) the most frequently used cases of SPARQL.

The SPARQL facilities are based on a wrapper class called SPARQLGraph around the basic Graph object defined by RDFLib. I.e., all programs using sparql-p should be of the form:

from rdfextras.sparql import sparqlGraph
sparqlGr = sparqlGraph.SPARQLGraph()

The sparqlGr object thus created has the same methods as a Graph type object would have, extended with the sparql-p facilities. An alternative way of creating the sparql-p graph is to use an existing Graph instance:


sparqlGr = sparqlGraph.SPARQLGraph(graph=myExistingGraph)

Basic SPARQL The basic SPARQL construct is as follows (using the query syntax of the SPARQL document):

SELECT ?a ?b ?c
WHERE { ?a P ?x .
        Q ?b ?a .
        ?x R ?c }

The meaning of this construction is simple: the ‘?a’, ‘?b’, etc., symbols (referred to as ‘unbound’ symbols) are queried with the constraint that the tuples listed in the WHERE clause are ‘true’, i.e., part of the triple store. This functionality is translated into a Python method as:

from rdfextras.sparql import GraphPattern
select = ("?a", "?b", "?c")
where = GraphPattern([("?a", P, "?x"), (Q, "?b", "?a"), ("?x", R, "?c")])
result = sparqlGr.query(select, where)

where result is a list of tuples, each giving possible binding combinations for "?a", "?b", and "?c", respectively. P, Q, R, etc., must be the rdflib incarnations of RDF resources, i.e., URIRef, Literal, etc. The object of each pattern can also be one of the following Python types:

• integer

• long

• float

• string

• unicode

• datetime.date

• datetime.time

• datetime.datetime

These are transformed into a Literal with the corresponding XML Schema datatype on the fly. This allows coding in the form:

select = ("?a","?b","?c")where = GraphPattern([("?a",P,"?x"),(Q,"?b","?a"),("?x",R,"?c"),

("?x",S,"Some Literal Here"),("?x",R,43)])result = sparqlGr.query(select,where)

Note that the SPARQL draft mandates datetime only, not separate date and time, but it was obvious to add this to the Python implementation (and useful in practice). See also the note above on literals, as well as the additional section on datatypes.

As a further convenience to the user, if select consists of a single entry, it is not necessary to use a tuple and just giving the string value will do. Similarly, if the where consists of one single tuple, the array construction may be skipped, and the single tuple is accepted as an argument. Finally, if select consists of one entry, result is a list of the values rather than tuples of (single) values.

The GraphPattern class instance can be built up gradually via the addPattern() and addPatterns() methods (the former takes one tuple, the latter a list of tuples), as sketched below.
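A minimal sketch of that incremental style, reusing the same illustrative P, Q, R resources as above:

where = GraphPattern()
where.addPattern(("?a", P, "?x"))                      # one tuple at a time
where.addPatterns([(Q, "?b", "?a"), ("?x", R, "?c")])  # or a list of tuples
result = sparqlGr.query(("?a", "?b", "?c"), where)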


The draft describes nested patterns, too, but it also draws attention to the fact that nested patterns can be turned into regular patterns by possibly repeating some patterns. In other words, nested patterns can be handled by a parser and are therefore not implemented on this API level.

The SPARQLError exception is raised (beyond the possible exceptions raised by rdflib) if there are inconsistencies in the select or where clauses (e.g., the tuples do not have the correct length or there are incorrect data in the tuples themselves).

Constraining Values SPARQL makes it possible to constrain values through operators, like:

SELECT ?a, ?b, ?c
WHERE { ?a P ?x .
        Q ?b ?a .
        ?x R ?c .
        FILTER ?x < 10 }

...

The draft also refers to the fact that application-specific functions can also be used in the ‘FILTER’ part. There are two ways to translate this feature into sparql-p (see below for a further discussion).

Global Constraint This version is based on constraints that refer to the whole binding of the pattern and is therefore executed against the full binding once available. Here is how it looks in sparql-p:

select = ("?a","?b","?c")where = GraphPattern([("?a",P,"?x"),(Q,"?b","?a"),

("?x",R,"?c"),("?x",S,"Some Literal Here")])where.addConstraint(func1)where.addConstraints([func2,func3,...])result = sparqlGr.query(select,where)

Each function in the constraints is of the form:

def func1(bindingDir):
    # ....
    return True  # or False

where bindingDir is a dictionary of the possible binding, i.e., of the form:

"?a" : Resource1, "?b" : Resource2, ...

Adding several constraints (in a list or via a series of addConstraint() calls) is equivalent to a logical conjunction of the individual constraints.

As an extra help to operator writers, the bindingDir also includes a special entry referring to the SPARQLGraph instance in use via a special key:

from rdfextras.sparql import graphKey
graph = bindingDir[graphKey]

This construction, i.e., the global constraint, is the faithful representation of the SPARQL spec. Note that a number of operator methods are available to make the construction of the global constraints easier; see the separate section on that.

Per Pattern constraint This version is based on a constraint that can be imposed on one specific (bound) pattern only. This is achieved by adding a fourth element to the tuple representing the pattern, e.g.:


select = ("?a","?b","?c")where = GraphPattern([("?a",P,"?x",func),(Q,"?b","?a"),

("?x",R,"?c"),("?x",S,"Some Literal Here")])result = sparqlGr.query(select,where)

where func is a function with three arguments (the bound versions of the ?a, P, ?x in the example).

Why Two Constraints? Functionally, the global constraint is a ‘superset’ of the per-pattern constraint; in other words, anything that can be expressed by per-pattern constraints can be achieved by global constraints. E.g., the method above can be expressed in two different ways:

select = ("?a","?b","?c")where = GraphPattern([("?a",P,"?x"),(Q,"?b","?a"),

("?x",R,"?c"),("?x",S,"Some Literal Here")])where.addConstraint(lambda binding: int(binding["?x"]) &lt; 10)result = sparqlGr.query(select,where)

or:

select = ("?a","?b","?c")where = GraphPattern([("?a",P,"?x",lambda a,P,x: int(x) &lt; 10),

(Q,"?b","?a"),("?x",R,"?c"),("?x",S,"Some Literal Here")])result = sparqlGr.query(select,where)

However, the second version may be much more efficient. The search is ‘cut’ by the constraint, i.e., the binding tree is not (unnecessarily) expanded further, whereas a full binding tree must be generated for a global constraint (see the notes on the implementation below).

For large triple stores and/or large patterns this may make a significant difference. A parser may optimize by generating per-pattern constraints in some cases to make use of this optimization, hence this alternative.

‘Or’-d Patterns A slight variation of the basic scheme could be described as:

SELECT ?a, ?b, ?c
WHERE { ?a P ?x . { Q ?b ?a } UNION { S ?b ?a } . ?x R ?c } ...

(I hope my understanding is correct that...) the meaning is a logical ‘or’ on one of the clauses. This is expressed in sparql-p by allowing the query method to accept a list of graph patterns, too, instead of single patterns only:

select = ("?a","?b","?c")where1 = GraphPattern([("?a",P,"?x"),(Q,"?b","?a")])where1 = GraphPattern([(S,"?b","?a"),("?x",R,"?c")])result = sparqlGr.query(select,[where1,where2])

The two queries are evaluated separately, and the concatenation of the results is returned.

Optional Matching Another variation on the basic query is the usage of ‘optional’ clauses:

SELECT ?a, ?b, ?c, ?d
WHERE { ?a P ?x .
        Q ?b ?a .
        ?x R ?c .
        OPTIONAL { ?x S ?d } } ...


What this means is that if the fourth tuple (with ?x already bound) is not in the triple store, that should not invalidate the possible bindings of ?a, ?b, and ?c; instead, the ?d unbound variable should be set to a null value, but the remaining bindings should be returned. In other words, first the following query is performed:

SELECT ?a, ?b, ?c
WHERE { ?a P ?x .
        Q ?b ?a .
        ?x R ?c }

then, for each possible binding, a second query is done:

SELECT ?d
WHERE { X S ?d }

where X stands for a possible binding of ?x.

The sparql-p expression of this facility is based on the creation of a separate graph pattern for the optional clause:

select = ("?a","?b","?c","?d")where = GraphPattern([("?a",P,"?x"),[(Q,"?b","?a"),("?x",R,"?c")])opt = GraphPattern([("?x",S,"?d")])result = sparqlGr.query(select,where,opt)

and the (possibly) unbound ?d is set to None in the return value. Just as for the ‘main’ pattern, the third argument of the call can be a list of graph patterns (for several OPTIONAL clauses) evaluated separately. Each of the OPTIONAL clauses can have its own global constraints.

Query Forms The SPARQL draft includes several Query forms, which is a term to control how the query results are returned to the caller. In the case of sparql-p this is implemented via a separate Python class, called Query. All query results yield, in fact, an instance of that class, and various methods on that class are defined corresponding to the SPARQL Query Forms. The queryObject() method can be invoked instead of query() to return an instance of such an object. (In fact, the SPARQLGraph.query method, used in all previous examples, is simply a convenient shorthand; see below.)

SELECT Forms The SELECT SPARQL query forms are used to retrieve the query results. Corresponding to the draft, the Query class has a select() method, with two (keyword) arguments: distinct (with possible values True and False) and limit (which is either a positive integer or None). For example:

select = ("?a","?b","?c","?d")where = GraphPattern([("?a",P,"?x"),[(Q,"?b","?a"),("?x",R,"?c")])opt = GraphPattern([("?x",S,"?d")])resultObject = sparqlGr.queryObject(where,opt)result = resultObject.select(select,distinct=True,limit=5)

returns the first 5 query results, all distinct from one another. The default for distinct is True and for limit it is None. I.e., query() is, in fact, a shorthand for queryObject(where,...).select(select) (it is probably the most widespread use of select, hence this shorthand method).

Note that it is possible to use the same class instance returned by queryObject() to run different selections (though the SPARQL draft does not make this distinction); in other words, running the select() method does not change any internal variable of the class.
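A small sketch of that reuse, with the same illustrative pattern as above; the expansion is computed once by queryObject() and then viewed in different ways:

resultObject = sparqlGr.queryObject(where)
allRows = resultObject.select(("?a", "?b", "?c"))    # every binding
firstFive = resultObject.select(("?a",), limit=5)    # same expansion, new view
uniqueRows = resultObject.select(("?b",), distinct=True)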

CONSTRUCT Forms The construct method can be invoked either with an explicit Graph Pattern or without (the latter corresponds to the CONSTRUCT * of the draft, the former to the case when a separate CONSTRUCT pattern is defined). In both cases, a separate SPARQLGraph instance is returned containing the constructed triples. For example, the construction in the draft:


CONSTRUCT { <http://example.org/person#Alice> FN ?name } WHERE { ?x nm ?name }

corresponds to the sparql-p construction:

where = GraphPattern([("?x",nm,"?name"])constructPattern = GraphPattern([(URIRef(

"http://example.org/person#Alice"),FN,"?name")])resultObject = sparqlGr.queryObject(where)result = resultObject.construct(constructPattern)

whereas the example:

CONSTRUCT * WHERE (?x N ?name)

corresponds to:

where = GraphPattern([("?x",N,"?name"])resultObject = sparqlGr.queryObject(where)result = resultObject.construct() # or resultObject.construct(None)

DESCRIPTION Forms The current draft is pretty vague as to what this facility is (and leaves it to the implementor). What SPARQLGraph implements is a form of clustering. The describe() method has a seed argument (to serve as a seed for clustering) and two keyword arguments, forward and backward, each a boolean. What it means:

• forward=True and backward=False generates a triple store with a transitive closure for each result of the query and the seed: take, recursively, all the properties and objects that start at a specific resource.

• forward=False and backward=True: the same as forward but in the ‘other direction’.

• forward=True and backward=True combines the two into one triple store.

• forward=False and backward=False returns an empty triple store.

ASK Forms The SPARQL draft refers to an ASK query form, which simply says whether the set of patterns represents a non-empty subgraph. This is done by:

resultObject = sparqlGr.queryObject(where)
result = resultObject.ask()

The ask() method returns False or True (depending on whether the resulting subgraph is empty or not, respectively).

Datatype lexical representations The current implementation does not (yet) do a full implementation of all the datatypes with the precise lexical representation as defined in the XML Schema Datatype document (and referred to in the SPARQL document). In theory, these should be taken care of by the underlying RDFLib layer when parsing strings into datatypes, but that does not happen yet. sparql-p does a partial conversion to have the vast majority of queries running properly, but there are some restrictions:

• string: Implemented and coded in UTF-8

• integer, float, long: Implemented as required

• double: As Python does not know doubles, it is mapped to floats

• decimal: As Python does not know general decimals, mapped to integers

• date: The format is YYYY-MM-DD. The optional timezone character (allowed by the XML Schema document) is not implemented when interpreting Literal-s as date.


• time: The format is HH:MM:SS. The optional microsecond and timezone characters (allowed by the XML Schema document) are not implemented when interpreting Literal-s as time.

• dateTime: The format is YYYY-MM-DDTHH:MM:SS (i.e., the combination of date and time with a separator ‘T’). No microseconds or timezone characters are implemented when interpreting a Literal as a dateTime.

These mappings are used when a typed literal value is specified in a Graph pattern and a Literal instance is generated on the fly: the Literal instance uses these lexical representations, and the corresponding XML Schema datatypes are stored. When comparing values coming from RDF data and parsed by RDFLib, these lexical representations are presupposed when comparing Literal instances.
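To make the on-the-fly conversion concrete, a minimal sketch using rdflib directly (the specific date is made up; rdflib attaches the XML Schema datatype shown in the comments):

import datetime
from rdflib import Literal

lit = Literal(datetime.date(2005, 10, 10))
print(lit)           # 2005-10-10
print(lit.datatype)  # http://www.w3.org/2001/XMLSchema#date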

Operators SPARQL defines a number of possible operators for the AND clause. It is not obvious at this point which of those should be left to a parser and which should be implemented by the engine. sparql-p provides a number of methods that can be used to create an elementary operator and that can also be used in the AND clause. More complex constructions can be done using Python’s lambda functions, for example.

The available binary operator functions are: lt() (for less than), le() (for less than or equal), gt() (for greater than), ge() (for greater than or equal), and eq() (for equal). Each of these operator methods takes two parameters, which are both either a query string or a typed value, and each of these operators returns a function that can be plugged into a global constraint. (All these methods should be imported from the sparqlOperators module.) For example, to add the constraint:

FILTER ?m &lt; 42

one can use:

constraints = [ lt("?m",42) ]

For the more complex case of the form:

FILTER ?m &lt; 42 || ?n &gt; 56

the lambda construction can be used:

constraints = [ lambda binding: lt("?m",42)(binding) or gt("?n",56)(binding) ]

The complicated case of how values of different types compare is left completely to Python for the time being. If a comparison does not make sense, the return value is False. When the Working Group reaches an equilibrium point on this issue, this should be compared to what Python does, but this is currently a matter of debate in the group, too.

The module also offers a special operator called isOnCollection() that can be used as a global constraint to check whether a resource is on a collection or not.

The SPARQL document also defines a number of special operators. The following of those operators are implemented: bound(), isURI(), isBlank(), isLiteral(), str(), lang(), datatype(). For example:

pattern.addConstraint(isURI("?mbox"))

adds a constraint that the value bound to ?mbox must be a real URI (as opposed to a literal), or

pattern.addConstraint(lambda binding: datatype("?d")(binding) == \
    "http://www.myexampledatatype.org")

checks whether the datatype of a bound resource is of a specific URI.

Whether this set of elementary operators is enough for the complete implementation of SPARQL is not yet clear. I presume the final answer will come when somebody writes a parser for the query language...

The sparqlOperators module in the package includes some methods that might be useful in creating more complex constraint methods, such as getLiteralValue() (to return the value of a Literal, possibly making an on-the-fly conversion for the known datatypes), or getValue() (to create a ‘retrieval’ method to return either the original Resource or a bound resource in case of a query string parameter). Look at the detailed method description for details.

Implementation The implementation of SPARQL is based on an expansion tree. Each layer in the tree takes care of a statement in the WHERE clause, starting with the first statement for the first layer, then the second statement for the second layer, etc. Once the full expansion is done, the results for SELECT are collected by visiting the leaves. In more detail:

The root of the tree is created by handing over the full list of statements and a dictionary with the variable bindings. Initially, this dictionary looks like:

"?x": None, "?a" : None, ...

The node picks the first tuple in the where clause, replaces all unbound variables by None and makes an RDFLib query to the triple store. The result is all tuples in the triple store that conform to the pattern expressed by the first where pattern tuple. For each of those, a child node is created by handing over the rest of the triples in the where clause, and a binding where some of the None values are replaced by ‘real’ RDF resources. The children follow the same process recursively. There are several ways for the recursion to stop:

• though there is still a where pattern to handle, no tuples are found in the triple store in the process. This means that the corresponding branch does not produce valid results. (In the implementation, such a node is marked as ‘clashed’.) The same happens if, though a tuple is found, that tuple is rejected by the constraint function assigned to this tuple (the “per-tuple” constraint).

• though there are no statements to process any more in the where clause, there are still unbound variables

• all variables are bound and there are no more patterns to process. Unless one of the global constraints rejects this binding, this yields ‘successful’ leaves.

The third type of leaf contains a valid, possible query result for the unbound variables. Once the expansion is complete, the collection of the results becomes obvious: successful leaves are visited to return their results as the binding for the variables appearing in the select clause; the non-leaf nodes simply collect and combine the results of their children.

The implementation of the ‘optional’ feature follows the semantic description. A pre-processing step separates the ‘regular’ and ‘optional’ select and where clauses. First a regular expansion is done; then, separate optional expansions (one for each optional clause) are attached to each successful leaf node (obviously, by binding all variables that can be bound at that level). The collection of the result follows the same mechanism, except that if the optional expansion tree yields no results, the real result tuples are padded by the necessary number of None-s.
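To make the expansion idea concrete, here is a toy, self-contained sketch, deliberately independent of sparql-p’s real classes (the store is a plain list of 3-tuples and variables are strings starting with ‘?’):

def matches(pattern, triple, binding):
    # Try to unify one WHERE pattern with a concrete triple; return the
    # extended binding, or None on a clash (the 'clashed' node case).
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if isinstance(p, str) and p.startswith('?'):
            if new.get(p) is None:
                new[p] = t          # bind the unbound variable
            elif new[p] != t:
                return None         # conflicting binding: branch dies
        elif p != t:
            return None             # ground term does not match
    return new

def expand(patterns, binding, store):
    # One tree layer per WHERE pattern; successful leaves yield bindings.
    if not patterns:
        yield binding               # all patterns consumed: successful leaf
        return
    for triple in store:            # one child node per matching triple
        child = matches(patterns[0], triple, binding)
        if child is not None:
            for result in expand(patterns[1:], child, store):
                yield result

store = [("alice", "knows", "bob"), ("bob", "knows", "carol")]
for b in expand([("?a", "knows", "?x"), ("?x", "knows", "?c")], {}, store):
    print(b)   # {'?a': 'alice', '?x': 'bob', '?c': 'carol'}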

author Ivan Herman [email protected]

date 2005/10/10 15:40:35

This software is available for use under the W3C Software License.

Modules


sparql - SPARQL main API

TODO: merge this first bit from sparql.sparql.py into rest of doc... updating all along the way.

SPARQL implementation on top of RDFLib

Implementation of the W3C SPARQL language (version April 2005). The basic class here is supposed to be a superclass of rdfextras.sparql.graph; it has been separated only for better maintainability.

There is a separate description for the functionalities.


For a general description of the SPARQL API, see the separate, more complete description.

Variables, Imports The top-level (__init__.py) module of the Package imports the important classes. In other words, the user may choose to use the following imports only:

from rdflibUtils import myTripleStore
from rdflibUtils import retrieveRDFFiles
from rdflibUtils import SPARQLError
from rdflibUtils import GraphPattern

The module imports and/or creates some frequently used Namespaces, and these can then be imported by the user like:

from rdflibUtils import ns_rdf

Finally, the package also has a set of convenience string defines for XML Schema datatypes (i.e., the URIs of the datatypes); i.e., one can use:

from rdflibUtils import type_string
from rdflibUtils import type_integer
from rdflibUtils import type_long
from rdflibUtils import type_double
from rdflibUtils import type_float
from rdflibUtils import type_decimal
from rdflibUtils import type_dateTime
from rdflibUtils import type_date
from rdflibUtils import type_time
from rdflibUtils import type_duration

These are used, for example, in the sparql-p implementation.

The three most important classes in RDFLib for the average user are Namespace, URIRef and Literal; these are also imported, so the user can also use, e.g.:

from rdflib import Namespace, URIRef, Literal

History

• Version 1.0: based on an earlier version of SPARQL, first released implementation

• Version 2.0: version based on the March 2005 SPARQL document, also a major change of the core code (introduction of the separate GraphPattern rdflibUtils.graph.GraphPattern class, etc.)

• Version 2.01: minor changes only: switch to epydoc as a documentation tool, it gives a much better overview of the classes; addition of the SELECT * feature to sparql-p

• Version 2.02: added some methods to myTripleStore rdflibUtils.myTripleStore.myTripleStore to handle Alt and Bag the same way as Seq; also added methods to add() collections and containers to the triple store, not only retrieve them

• Version 2.1: adapted to the inclusion of the code into rdflib, thanks to Michel Pelletier

• Version 2.2: added the sorting possibilities; introduced the Unbound class and a better interface to patterns using this (in the BasicGraphPattern class)

@author: Ivan Herman

@license: This software is available for use under the W3C Software License

@contact: Ivan Herman, [email protected]

@version: 2.2


class rdfextras.sparql.SPARQLError(msg)
A SPARQL error has been detected.


algebra - SPARQL Algebra

An implementation of the W3C SPARQL Algebra on top of sparql-p’s expansion trees

See: http://www.w3.org/TR/rdf-sparql-query/#sparqlAlgebra

For each symbol in a SPARQL abstract query, we define an operator for evaluation. The SPARQL algebra operators of the same name are used to evaluate SPARQL abstract query nodes as described in the section “Evaluation Semantics”.

We define eval(D(G), graph pattern) as the evaluation of a graph pattern with respect to a dataset D having active graph G. The active graph is initially the default graph.

class rdfextras.sparql.algebra.AlgebraExpression
For each symbol in a SPARQL abstract query, we define an operator for evaluation. The SPARQL algebra operators of the same name are used to evaluate SPARQL abstract query nodes as described in the section “Evaluation Semantics”.

evaluate(tripleStore, initialBindings, prolog)
12.5 Evaluation Semantics

We define eval(D(G), graph pattern) as the evaluation of a graph pattern with respect to a dataset D having active graph G. The active graph is initially the default graph.

class rdfextras.sparql.algebra.EmptyGraphPatternExpression
A placeholder for evaluating empty graph patterns - which should result in an empty multiset of solution bindings.

class rdfextras.sparql.algebra.NonSymmetricBinaryOperator

class rdfextras.sparql.algebra.Join(BGP1, BGP2)

[[(P1 AND P2)]](D,G) = [[P1]](D,G) ⋈ [[P2]](D,G)

Join(Ω1, Ω2) = { merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible }

Pseudocode implementation:

Evaluate BGP1. Traverse to the leaves (expand and expandOption leaves) of BGP1, set ‘rest’ to the triple patterns in BGP2 (filling out bindings). Trigger another round of expand / expandOptions (from the leaves).

class rdfextras.sparql.algebra.LeftJoin(BGP1, BGP2, expr=None)

Let Ω1 and Ω2 be multisets of solution mappings and F a filter. We define:

LeftJoin(Ω1, Ω2, expr) = Filter(expr, Join(Ω1, Ω2)) set-union Diff(Ω1, Ω2, expr)

LeftJoin(Ω1, Ω2, expr) =

{ merge(μ1, μ2) | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible, and expr(merge(μ1, μ2)) is true }

set-union { μ1 | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are not compatible }

set-union { μ1 | μ1 in Ω1 and μ2 in Ω2, and μ1 and μ2 are compatible and expr(merge(μ1, μ2)) is false }

class rdfextras.sparql.algebra.Union(BGP1, BGP2)

[[(P1 UNION P2)]](D,G) = [[P1]](D,G) ∪ [[P2]](D,G)

Union(Ω1, Ω2) = { μ | μ in Ω1 or μ in Ω2 }

class rdfextras.sparql.algebra.GraphExpression(iriOrVar, GGP)

[24] GraphGraphPattern ::= ’GRAPH’ VarOrIRIref GroupGraphPattern

eval(D(G), Graph(IRI,P)) = eval(D(D[i]), P)

eval(D(G), Graph(var,P)) = multiset-union over IRI i in D : Join( eval(D(D[i]), P), Ω(?v->i) )

evaluate(tripleStore, initialBindings, prolog)
The GRAPH keyword is used to make the active graph one of all of the named graphs in the dataset for part of the query.

rdfextras.sparql.algebra.ReduceGraphPattern(graphPattern, prolog)
Takes a parsed graph pattern and converts it into a BGP operator.

Replace all basic graph patterns by BGP(list of triple patterns)

rdfextras.sparql.algebra.ReduceToAlgebra(left, right)
Converts a parsed Group Graph Pattern into an expression in the algebra by recursive folding / reduction (via functional programming) of the GGP as a list of Basic Triple Patterns or “Graph Pattern Blocks”.

12.2.1 Converting Graph Patterns

[20] GroupGraphPattern ::= ’{’ TriplesBlock? ( ( GraphPatternNotTriples | Filter ) ’.’? TriplesBlock? )* ’}’

[22] GraphPatternNotTriples ::= OptionalGraphPattern | GroupOrUnionGraphPattern | GraphGraphPattern

[26] Filter ::= ’FILTER’ Constraint
[27] Constraint ::= BrackettedExpression | BuiltInCall | FunctionCall
[56] BrackettedExpression ::= ’(’ ConditionalOrExpression ’)’

( GraphPatternNotTriples | Filter ) ’.’? TriplesBlock?
nonTripleGraphPattern filter triples

rdfextras.sparql.algebra.RenderSPARQLAlgebra(parsedSPARQL, nsMappings=None)

rdfextras.sparql.algebra.LoadGraph(dtSet, dataSetBase, graph)

rdfextras.sparql.algebra.TopEvaluate(query, dataset, passedBindings=None, DEBUG=False, exportTree=False, dataSetBase=None, extensionFunctions={}, dSCompliance=False, loadContexts=False)

The outcome of executing a SPARQL query is defined by a series of steps, starting from the SPARQL query as a string, turning that string into an abstract syntax form, then turning the abstract syntax into a SPARQL abstract query comprising operators from the SPARQL algebra. This abstract query is then evaluated on an RDF dataset.
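As a hedged sketch of that pipeline (the parser import location is an assumption about the rdfextras layout, and the query string is made up):

from rdflib import ConjunctiveGraph
from rdfextras.sparql.parser import parse            # assumed module path
from rdfextras.sparql.algebra import RenderSPARQLAlgebra, TopEvaluate

graph = ConjunctiveGraph()
parsedSPARQL = parse("SELECT ?s WHERE { ?s ?p ?o }")  # string -> abstract syntax
print(RenderSPARQLAlgebra(parsedSPARQL))              # inspect the algebra form
result = TopEvaluate(parsedSPARQL, graph)             # evaluate on the dataset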

rdfextras.sparql.algebra.fetchUnionBranchesRoots(node)

rdfextras.sparql.algebra.fetchChildren(node)

rdfextras.sparql.algebra.walktree(top, depthfirst=True, leavesOnly=True, optProxies=False)

rdfextras.sparql.algebra.print_tree(node, padding=' ')


components - SPARQL components

class rdfextras.sparql.components.ListRedirect
A utility class for lists of items joined by an operator. ListRedirects with length 1 are a special case and are considered equivalent to the item instead of a list containing it. The reduce function is used for normalizing a ListRedirect to the single item (and calling reduce on it recursively).

class rdfextras.sparql.components.PrefixDeclaration(qName, iriRef)
PrefixDecl ::= ‘PREFIX’ QNAME_NS Q_IRI_REF. See: http://www.w3.org/TR/rdf-sparql-query/#rPrefixDecl

class rdfextras.sparql.components.BaseDeclaration
BaseDecl ::= ‘BASE’ Q_IRI_REF. See: http://www.w3.org/TR/rdf-sparql-query/#rBaseDecl

class rdfextras.sparql.components.ParsedConditionalAndExpressionList(conditionalAndExprList)
A list of ConditionalAndExpressions, joined by ‘||’

class rdfextras.sparql.components.ParsedRelationalExpressionList(relationalExprList)
A list of RelationalExpressions, joined by ‘&&’s

class rdfextras.sparql.components.ParsedPrefixedMultiplicativeExpressionList(prefix, mulExprList)

A ParsedMultiplicativeExpressionList led by a ‘+’ or ‘-’

class rdfextras.sparql.components.ParsedMultiplicativeExpressionList(unaryExprList)
A list of UnaryExpressions, joined by ‘/’ or ‘*’s

class rdfextras.sparql.components.ParsedAdditiveExpressionList(multiplicativeExprList)
A list of MultiplicativeExpressions, joined by ‘+’ or ‘-’s

class rdfextras.sparql.components.ParsedString

class rdfextras.sparql.components.ParsedDatatypedLiteral(value, dType)
Placeholder for datatyped literals. This is necessary (instead of instantiating Literals directly) when datatype IRIRefs are QNames (in which case the prefix needs to be resolved at some point).

class rdfextras.sparql.components.ParsedFilter(filter)

class rdfextras.sparql.components.ParsedExpressionFilter(filter)

class rdfextras.sparql.components.ParsedFunctionFilter(filter)

class rdfextras.sparql.components.FunctionCall(name, arguments=None)

class rdfextras.sparql.components.ParsedArgumentList(arguments)

class rdfextras.sparql.components.ParsedREGEXInvocation(arg1, arg2, arg3=None)

class rdfextras.sparql.components.BuiltinFunctionCall(name, arg1, arg2=None)

class rdfextras.sparql.components.ParsedGroupGraphPattern(triples, graphPatterns)
See: http://www.w3.org/TR/rdf-sparql-query/#GroupPatterns. A group graph pattern GP is a set of graph patterns, GPi. This class is defined to behave (literally) like a set of GraphPattern instances.

class rdfextras.sparql.components.BlockOfTriples(statementList)
A Basic Graph Pattern is a set of Triple Patterns.

class rdfextras.sparql.components.GraphPattern(nonTripleGraphPattern=None, filter=None, triples=None)

Complex graph patterns can be made by combining simpler graph patterns. The ways of creating graph patterns are:

• Basic Graph Patterns, where a set of triple patterns must match

• Group Graph Pattern, where a set of graph patterns must all match using the same variable substitution

• Value constraints, which restrict RDF terms in a solution

• Optional Graph patterns, where additional patterns may extend the solution

• Alternative Graph Pattern, where two or more possible patterns are tried

• Patterns on Named Graphs, where patterns are matched against named graphs

( GraphPatternNotTriples | Filter ) ‘.’? TriplesBlock?

class rdfextras.sparql.components.ParsedOptionalGraphPattern(groupGraphPattern)
An optional graph pattern is a combination of a pair of graph patterns. The second pattern modifies pattern solutions of the first pattern but does not fail matching of the overall optional graph pattern.

class rdfextras.sparql.components.ParsedAlternativeGraphPattern(alternativePatterns)
A union graph pattern is a set of group graph patterns GPi. A union graph pattern matches a graph G with solution S if there is some GPi such that GPi matches G with solution S.

class rdfextras.sparql.components.ParsedGraphGraphPattern(graphName, groupGraphPattern)

Patterns on Named Graphs, where patterns are matched against named graphs

class rdfextras.sparql.components.IRIRef

class rdfextras.sparql.components.RemoteGraph

class rdfextras.sparql.components.NamedGraph

class rdfextras.sparql.components.BinaryOperator(left, right)

class rdfextras.sparql.components.EqualityOperator(left, right)

class rdfextras.sparql.components.NotEqualOperator(left, right)

class rdfextras.sparql.components.LessThanOperator(left, right)

class rdfextras.sparql.components.LessThanOrEqualOperator(left, right)

class rdfextras.sparql.components.GreaterThanOperator(left, right)

class rdfextras.sparql.components.GreaterThanOrEqualOperator(left, right)

class rdfextras.sparql.components.UnaryOperator(argument)

class rdfextras.sparql.components.LogicalNegation(argument)

class rdfextras.sparql.components.NumericPositive(argument)

class rdfextras.sparql.components.NumericNegative(argument)

class rdfextras.sparql.components.QName

class rdfextras.sparql.components.QNamePrefix(prefix)

class rdfextras.sparql.components.Query(prolog, query)

Query ::= Prolog ( SelectQuery | ConstructQuery | DescribeQuery | AskQuery )

See: http://www.w3.org/TR/rdf-sparql-query/#rQuery

class rdfextras.sparql.components.WhereClause(parsedGraphPattern)

The "where" clause is essentially a wrapper for an instance of a ParsedGraphPattern.

class rdfextras.sparql.components.RecurClause(maps, parsedGraphPattern)

class rdfextras.sparql.components.SelectQuery(variables, dataSetList, whereClause, recurClause, solutionModifier, distinct=None)


SelectQuery ::= 'SELECT' 'DISTINCT'? ( Var+ | '*' ) DatasetClause* WhereClause RecurClause? SolutionModifier

See: http://www.w3.org/TR/rdf-sparql-query/#rSelectQuery

class rdfextras.sparql.components.AskQuery(dataSetList, whereClause)

AskQuery ::= 'ASK' DatasetClause* WhereClause

See: http://www.w3.org/TR/rdf-sparql-query/#rAskQuery

class rdfextras.sparql.components.ConstructQuery(triples, dataSetList, whereClause, solutionModifier)

ConstructQuery ::= ‘CONSTRUCT’ ConstructTemplate DatasetClause* WhereClause SolutionModifier

See: http://www.w3.org/TR/rdf-sparql-query/#rConstructQuery

class rdfextras.sparql.components.DescribeQuery(variables, dataSetList, whereClause, solutionModifier)

DescribeQuery ::= ‘DESCRIBE’ ( VarOrIRIref+ | ‘*’ ) DatasetClause* WhereClause? SolutionModifier

See: http://www.w3.org/TR/rdf-sparql-query/#rDescribeQuery

class rdfextras.sparql.components.Prolog(baseDeclaration, prefixDeclarations)

Prolog ::= BaseDecl? PrefixDecl*

See: http://www.w3.org/TR/rdf-sparql-query/#rProlog

class rdfextras.sparql.components.RDFTerm

Common class for RDF terms

class rdfextras.sparql.components.Resource(identifier=None, propertyValueList=None)

Represents a single resource in a triple pattern. It consists of an identifier (URIRef or BNode) and a list of PropertyValue instances.

class rdfextras.sparql.components.TwiceReferencedBlankNode(props1, props2)

Represents a BNode in triple patterns in this form: [ :prop1 :val1 ] :prop2 :val2

class rdfextras.sparql.components.ParsedCollection(graphNodeList=None)

An RDF Collection

class rdfextras.sparql.components.SolutionModifier(orderClause=None, limitClause=None, offsetClause=None)

class rdfextras.sparql.components.ParsedOrderConditionExpression(expression, order)

A list of OrderConditions

OrderCondition ::= ( ('ASC' | 'DESC') BrackettedExpression ) | ( FunctionCall | Var | BrackettedExpression )

class rdfextras.sparql.components.PropertyValue(property, objects)

class rdfextras.sparql.components.ParsedConstrainedTriples(triples, constraint)

A list of Resources associated with a constraint

rdfextras.sparql.components.ListPrepend(item, list)


rdfextras.sparql.evaluate - SPARQL Evaluate

class rdfextras.sparql.evaluate.Resolver

class rdfextras.sparql.evaluate.BNodeRef

An explicit reference to a persistent BNode in the data set. This use of the syntax "_:x" to reference a named BNode is technically in violation of the SPARQL spec, but is also very useful. If an undistinguished variable is desired, then an actual variable can be used as a trivial workaround.


Support for these can be disabled by disabling the 'EVAL_OPTION_ALLOW_BNODE_REF' evaluation option.

Also known as special ‘session’ BNodes. I.e., BNodes at the query side which refer to BNodes in persistence
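For instance, a minimal sketch (the file name and the BNode label 'x' are hypothetical; this assumes the rdfextras SPARQL plugins have been registered, e.g. via rdfextras.registerplugins()):

import rdfextras
rdfextras.registerplugins()

from rdflib.graph import Graph

g = Graph()
g.parse("data.n3", format="n3")  # some data containing a persistent BNode 'x'

# "_:x" is resolved as a BNodeRef to the persistent BNode, rather than
# being treated as an undistinguished variable
results = g.query('SELECT ?p ?o WHERE { _:x ?p ?o }')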

rdfextras.sparql.evaluate.convertTerm(term, queryProlog)

Utility function for converting parsed Triple components into Unbound

rdfextras.sparql.evaluate.unRollCollection(collection, queryProlog)

rdfextras.sparql.evaluate.unRollRDFTerm(item, queryProlog)

rdfextras.sparql.evaluate.unRollTripleItems(items, queryProlog)

Takes a list of Triples (nested lists or ParsedConstrainedTriples) and (recursively) returns a generator over all the contained triple patterns

rdfextras.sparql.evaluate.mapToOperator(expr, prolog, combinationArg=None, constraint=False)

Reduces certain expressions (operator expressions, function calls, terms, and combinator expressions) into strings of their Python equivalent

rdfextras.sparql.evaluate.createSPARQLPConstraint(filter, prolog)

Takes an instance of either ParsedExpressionFilter or ParsedFunctionFilter and converts it to a sparql-p operator by composing a python string of lambda functions and SPARQL operators. This string is then evaluated to return the actual function for sparql-p

rdfextras.sparql.evaluate.isTriplePattern(nestedTriples)

Determines (recursively) if the BasicGraphPattern contains any Triple Patterns, returning a boolean flag indicating if it does or not

rdfextras.sparql.graph - SPARQL Graph

class rdfextras.sparql.graph.SPARQLGraph(graph, graphVariable=None, dSCompliance=False)

A subclass of Graph with a few extra SPARQL bits.

cluster(seed)

Cluster up and down, by summing up the forward and backward clustering

Parameters seed – RDFLib Resource

Returns The SPARQLGraph triple store containing the cluster

clusterBackward(seed, Cluster=None)

Cluster the triple store: from a seed, transitively get all properties and objects 'backward', ie, following the link back in the graph.

Parameters

• seed – RDFLib Resource

• Cluster – another sparqlGraph instance; if None, a new one will be created. The subgraph will be added to this graph.

Returns The SPARQLGraph triple store containing the cluster

clusterForward(seed, Cluster=None)

Cluster the triple store: from a seed, transitively get all properties and objects in direction of the arcs.

Parameters

• seed – RDFLib Resource


• Cluster – another sparqlGraph instance; if None, a new one will be created. The subgraph will be added to this graph.

Returns The SPARQLGraph triple store containing the cluster
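A minimal usage sketch of the clustering methods (the graph g and the seed URI are hypothetical):

from rdflib.term import URIRef
from rdfextras.sparql.graph import SPARQLGraph

sparqlGr = SPARQLGraph(g)                     # wrap an existing rdflib Graph
seed = URIRef("http://example.org/resource")  # a resource known to occur in g

forward = sparqlGr.clusterForward(seed)       # subgraph reachable from seed
backward = sparqlGr.clusterBackward(seed)     # subgraph leading to seed
both = sparqlGr.cluster(seed)                 # the two combined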

class rdfextras.sparql.graph.GraphPattern(patterns=[])

Storage of one Graph Pattern, ie, the pattern tuples and the possible (functional) constraints (filters)

addConstraint(func)

Add a global filter constraint to the graph pattern. 'func' must be a method with a single input parameter (a dictionary) returning a boolean. This method is added to previously added methods, ie, all methods must return True to accept a binding.

Parameters func – filter function

addConstraints(lst)

Add a list of global filter constraints to the graph pattern. Each function in the list must be a method with a single input parameter (a dictionary) returning a boolean. These methods are added to previously added methods, ie, all methods must return True to accept a binding.

Parameters lst – list of functions

addPattern(tupl)

Append a tuple to the local patterns. Possible type literals are converted to real literals on the fly. Each tuple should contain either 3 elements (for an RDF Triplet pattern) or four, where the fourth element is a per-pattern constraint (filter). (The general constraint of SPARQL can be optimized by assigning a constraint to a specific pattern; because it stops the graph expansion, its usage might be much more optimal than the 'global' constraint).

Parameters tupl – either a three- or four-element tuple

addPatterns(lst)

Append a list of tuples to the local patterns. Possible type literals are converted to real literals on the fly. Each tuple should contain either three elements (for an RDF Triplet pattern) or four, where the fourth element is a per-pattern constraint. (The general constraint of SPARQL can be optimized by assigning a constraint to a specific pattern; because it stops the graph expansion, its usage might be much more optimal than the 'global' constraint).

Parameters lst – list consisting of either three- or four-element tuples

construct(tripleStore, bindings)

Add triples to a tripleStore based on the variable bindings of the patterns stored locally. The triples are patterned by the current Graph Pattern. The method is used to construct a graph after a successful query.

Parameters

• tripleStore – an (rdflib) Triple Store

• bindings – dictionary

insertPattern(tupl)

Insert a tuple at the start of the local patterns. Possible type literals are converted to real literals on the fly. Each tuple should contain either 3 elements (for an RDF Triplet pattern) or four, where the fourth element is a per-pattern constraint (filter). (The general constraint of SPARQL can be optimized by assigning a constraint to a specific pattern; because it stops the graph expansion, its usage might be much more optimal than the 'global' constraint).

Semantically, the behaviour induced by a graphPattern does not depend on the order of the patterns. However, due to the behaviour of the expansion algorithm, users may control the speed somewhat by adding patterns that would 'cut' the expansion tree soon (ie, patterns that reduce the available triplets significantly). API users may be able to do that, hence this additional method.


Parameters tupl – either a three- or four-element tuple

insertPatterns(lst)

Insert a list of tuples at the start of the local patterns. Possible type literals are converted to real literals on the fly. Each tuple should contain either three elements (for an RDF Triplet pattern) or four, where the fourth element is a per-pattern constraint. (The general constraint of SPARQL can be optimized by assigning a constraint to a specific pattern; because it stops the graph expansion, its usage might be much more optimal than the 'global' constraint).

Semantically, the behaviour induced by a graphPattern does not depend on the order of the patterns. However, due to the behaviour of the expansion algorithm, users may control the speed somewhat by adding patterns that would 'cut' the expansion tree soon (ie, patterns that reduce the available triplets significantly). API users may be able to do that, hence this additional method.

Parameters lst – list consisting of either three- or four-element tuples

isEmpty()

Is the pattern empty?

Returns Boolean

class rdfextras.sparql.graph.BasicGraphPattern(patterns=[], prolog=None)

One justified problem with the current definition of GraphPattern is that it makes it difficult for users to use a literal of the type ?XXX, because any string beginning with ? will be considered to be an unbound variable. The only way of doing this is for the user to explicitly create a rdflib.term.Literal object and use that as part of the pattern.

This class is a superclass of GraphPattern which does not do this, but requires the usage of a separate variable class instance
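A minimal construction sketch (the FOAF property and variable names are illustrative, and it assumes the variable class is importable as rdfextras.sparql.Unbound; per the addPattern() documentation above, a fourth tuple element would be a per-pattern filter):

from rdflib.term import URIRef
from rdfextras.sparql import Unbound  # assumed location of the variable class
from rdfextras.sparql.graph import BasicGraphPattern

FOAF = "http://xmlns.com/foaf/0.1/"

pattern = BasicGraphPattern()
# explicit variable instances instead of "?"-prefixed strings
pattern.addPattern((Unbound("person"), URIRef(FOAF + "name"), Unbound("name")))
# global constraint: accept a binding only when ?name is bound
pattern.addConstraint(lambda bindings: bindings["?name"] is not None)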

rdfextras.sparql.operators - SPARQL Operators


API for the SPARQL operators. The operators (eg, 'lt') return a function that can be added to the AND clause of a query. The parameters are either regular values or query strings.

The resulting function has one parameter (the binding dictionary); it can be combined with others or be plugged into an array of constraints.

For example:

constraints = [lt("?m", 42)]

for checking whether "?m" is smaller than the (integer) value 42. It can be combined using a lambda function, for example:

constraints = [lambda b: lt("?m", 42)(b) or lt("?n", 134)(b)]

is the expression for:

AND ?m < 42 || ?n < 134

(Clearly, the relative complexity is only on the API level; a SPARQL language parser that starts with a SPARQL expression can map onto this API).


rdfextras.sparql.operators.queryString(v)

Boolean test whether this is a query string or not :param v: the value to be checked :returns: True if it is a query string

rdfextras.sparql.operators.getLiteralValue(v)

Return the value in a literal, making on the fly conversion on datatype (using the datatypes that are implemented) :param v: the Literal to be converted :returns: the result of the conversion.

rdfextras.sparql.operators.getValue(param)

Returns a value retrieval function. The return value can be plugged in a query; it would return the value of param directly if param is a real value, and the run-time value if param is a query string of the type "?xxx". If no binding is defined at the time of call, the return value is None.

Parameters param – query string, Unbound instance, or real value

Returns a function taking one parameter (the binding dictionary)

rdfextras.sparql.operators.lt(a, b)

Operator for '<' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.le(a, b)

Operator for '<=' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.gt(a, b)

Operator for '>' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.ge(a, b)

Operator for '>=' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.eq(a, b)

Operator for '=' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.neq(a, b)

Operator for '!=' :param a: value or query string :param b: value or query string :returns: comparison method

rdfextras.sparql.operators.bound(a)

Is the variable bound :param a: value or query string :returns: check method

rdfextras.sparql.operators.isURI(a)

Is the variable bound to a URIRef :param a: value or query string :returns: check method

rdfextras.sparql.operators.isIRI(a)

Is the variable bound to an IRIRef (this is just an alias for URIRef) :param a: value or query string :returns: check method

rdfextras.sparql.operators.isBlank(a)

Is the variable bound to a Blank Node :param a: value or query string :returns: check method

rdfextras.sparql.operators.isLiteral(a)

Is the variable bound to a Literal :param a: value or query string :returns: check method

rdfextras.sparql.operators.str(a)

Return the string version of a resource :param a: value or query string :returns: check method

rdfextras.sparql.operators.lang(a)

Return the lang value of a literal :param a: value or query string :returns: check method

rdfextras.sparql.operators.datatype(a)

Return the datatype URI of a literal :param a: value or query string :returns: check method

rdfextras.sparql.operators.isOnCollection(collection, item, triplets)

Generate a method that can be used as a global constraint in sparql to check whether the 'item' is an element of the 'collection' (a.k.a. list). Both collection and item can be a real resource or a query string. Furthermore, item might be a plain string, which is then turned into a literal at run-time. The method returns an adapted method.

Is a resource on a collection?

The operator can be used to check whether the 'item' is an element of the 'collection' (a.k.a. list). Both collection and item can be a real resource or a query string.

Parameters

• collection – is either a query string (that has to be bound by the query) or an RDFLib Resource representing the collection

• item – is either a query string (that has to be bound by the query), an RDFLib Resource, or a data type value that is turned into a corresponding Literal (with possible datatype) that must be tested to be part of the collection

Returns a function

rdfextras.sparql.operators.addOperator(args, combinationArg)

SPARQL numeric + operator implemented via Python

rdfextras.sparql.operators.XSDCast(source, target=None)

XSD Casting/Construction Support. For now (this may be an issue since Literal doesn't override comparisons) it simply creates a Literal with the target datatype using the 'lexical' value of the source

rdfextras.sparql.operators.regex(item, pattern, flag=None)

Invokes the XPath fn:matches function to match text against a regular expression pattern. The regular expression language is defined in XQuery 1.0 and XPath 2.0 Functions and Operators, section 7.6.1, Regular Expression Syntax.

rdfextras.sparql.operators.EBV(a)

• If the argument is a typed literal with a datatype of xsd:boolean, the EBV is the value of that argument.

• If the argument is a plain literal or a typed literal with a datatype of xsd:string, the EBV is false if the operand value has zero length; otherwise the EBV is true.

• If the argument is a numeric type or a typed literal with a datatype derived from a numeric type, the EBV is false if the operand value is NaN or is numerically equal to zero; otherwise the EBV is true.

• All other arguments, including unbound arguments, produce a type error.
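The rules above can be sketched in Python as follows (a sketch of the logic only, not the module's actual implementation; it assumes rdflib Literal terms):

from rdflib.term import Literal
from rdflib.namespace import XSD

def effective_boolean_value(term):
    if not isinstance(term, Literal):
        # unbound or non-literal arguments produce a type error
        raise TypeError("EBV type error: %r" % (term,))
    if term.datatype == XSD.boolean:
        return term.toPython() is True
    if term.datatype is None or term.datatype == XSD.string:
        # plain literal or xsd:string: false iff zero length
        return len(term) > 0
    value = term.toPython()
    if isinstance(value, (int, long, float)):
        # numeric: false if NaN or numerically zero
        return not (value != value or value == 0)
    raise TypeError("EBV type error: %r" % (term,))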

rdfextras.sparql.parser - SPARQL parser

rdfextras.sparql.parser.composition(callables)

rdfextras.sparql.parser.composition2(callables)

rdfextras.sparql.parser.regex_group(regex)

rdfextras.sparql.parser.as_empty(results)

rdfextras.sparql.parser.setPropertyValueList(results)

rdfextras.sparql.parser.refer_component(component, initial_args=None, projection=None, **kwargs)

Create a function to forward parsing results to the appropriate constructor.

The pyparsing library allows us to modify the token stream that is returned by a particular expression with the setParseAction() method. This method sets a handler function that should take a single ParseResults instance as an argument, and then return a new token or list of tokens. Mainly, we want to pass lower level tokens to SPARQL parse tree objects; the constructors for these objects take a number of positional arguments, so this function builds a new function that will forward the pyparsing results to the positional arguments of the appropriate constructor.

This function provides a bit more functionality with its additional arguments:

• initial_args: static list of initial arguments to add to the beginning of the arguments list before additional processing

• projection: list of integers that reorders the initial arguments based on the indices that it contains.

Finally, any additional keyword arguments passed to this function are passed along to the handler that is constructed.

Note that we always convert pyparsing results to a list with the asList() method before using those results; this works, but we may only need this for testing. To be safe, we include it here, but we might want to investigate further whether or not it could be moved only to testing code. Also, we might want to investigate whether a list-only parsing mode could be added to pyparsing.

rdfextras.sparql.parser.parse(stuff)
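A minimal sketch of invoking the parser directly (the query string is illustrative; the return value is presumably the parse-tree built from the rdfextras.sparql.components classes documented above):

from rdfextras.sparql.parser import parse

query_tree = parse("""
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT ?name WHERE { ?person foaf:name ?name }
""")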

rdfextras.sparql.processor - SPARQL processor

class rdfextras.sparql.processor.Processor(graph)

rdfextras.sparql.query - SPARQL Query

class rdfextras.sparql.query.SessionBNode

Special 'session' BNodes. I.e., BNodes at the query side which refer to BNodes in persistence

class rdfextras.sparql.query.EnoughAnswers

Raised within expand when the specified LIMIT has been reached

class rdfextras.sparql.query._SPARQLNode(parent, bindings, statements, tripleStore, expr=None)

The SPARQL implementation is based on the creation of a tree, each level for each statement in the 'where' clause of SPARQL.

Each node maintains a 'binding' dictionary, with the variable names and either a None if not yet bound, or the binding itself. The method 'expand' tries to make one more step of binding by looking at the next statement: it takes the statement of the current node, binds the variables if there is already a binding, and looks at the triple store for the possibilities. If it finds valid new triplets, those will bind some more variables, and children will be created with the next statement in the 'where' array with a new level of bindings. This is done for each triplet found in the store, thereby branching off the tree. If all variables are already bound but the statement, with the bound variables, is not 'true' (ie, there is no such triple in the store), the node is marked as 'clash' and no more expansion is made; this node will then be thrown away by the parent. If all children of a node are a clash, then it is marked as a clash itself.

At the end of the process, the leaves of the tree are searched; if a leaf is such that:

• all variables are bound

• there is no clash

then the bindings are returned as possible answers to the query.

The optional clauses are treated separately: each 'valid' leaf is assigned an array of expansion trees that contain the optional clauses (that may have some unbound variables bound at the leaf, though).

Variables

• parent – parent in the tree, a _SPARQLNode

• children – the children (in an array of _SPARQLNodes)

• bindings – copy of the bindings dictionary locally

• statement – the current statement, a (s, p, o, f) tuple ('f' is the local filter or None)

• rest – the rest of the statements (an array)

• clash – Boolean, initialized to False

• bound – Boolean True or False depending on whether all variables are bound in self.binding

• optionalTrees – array of _SPARQLNode instances forming expansion trees for optional statements

expandAtClient(constraints)

The expansion itself. See class comments for details.

Parameters constraints – array of global constraining (filter) methods

expandOptions(bindings, statements, constraints)

Managing optional statements. These affect leaf nodes only, if they contain 'real' results. A separate expansion tree is appended to such a node, one for each optional call.

Parameters

• bindings – current bindings dictionary

• statements – array of statements from the 'where' clause. The first element is for the current node, the rest for the children. If empty, then no expansion occurs (ie, the node is a leaf). The bindings at this node are taken into account (replacing the unbound variables with the real resources) before expansion

• constraints – array of constraint (filter) methods

expandSubgraph(subTriples, pattern)

Method used to collect the results. There are two ways to invoke the method:

1. if the pattern argument is not None, then this means the construction of a separate triple store with the results. This means taking the bindings in the node, and constructing the graph via the construct() method. This happens on the valid leaves; intermediate nodes call the same method recursively

2. otherwise, a leaf returns an array of the bindings, and intermediate methods aggregate those.

In both cases, leaf nodes may successively expand the optional trees that they may have.

Parameters

• subTriples – the triples so far as a rdfextras.sparql.graph.SPARQLGraph

• pattern – a GraphPattern used to construct a graph

Returns if pattern is not None, an array of binding dictionaries

returnResult(select)

Collect the result by searching the leaves of the tree. The variables in the select are exchanged against their bound equivalent (if applicable). This action is done on the valid leaf nodes only; the intermediate nodes only gather the children's results and combine them in one array.

Parameters select – the array of unbound variables in the original select that do not appear in any of the optionals. If None, the full binding should be considered (this is the case for the SELECT * feature of SPARQL)

Returns an array of dictionaries with non-None bindings.

class rdfextras.sparql.query.Query(sparqlnode, triples, parent1=None, parent2=None)

Result of a SPARQL query. It stores the top of the query tree, and allows some subsequent inquiries on the expanded tree. This class should not be instantiated by the user; it is done by the queryObject() function.

ask()

Whether a specific pattern has a solution or not. :rtype: Boolean

cluster(selection)

cluster: a combination of clusterBackward() and clusterForward().

Parameters selection – a selection to define the seeds for clustering via the selection; the result of select used for the clustering seed

clusterBackward(selection)

Backward clustering, using all the results of the query as seeds (when appropriate). It is based on the usage of the rdfextras.sparql.graph.SPARQLGraph.clusterBackward() method for the triple store.

Parameters selection – a selection to define the seeds for clustering via the selection; the result of select used for the clustering seed

Returns a new triple store of type rdfextras.sparql.graph.SPARQLGraph

clusterForward(selection)

Forward clustering, using all the results of the query as seeds (when appropriate). It is based on the usage of the rdfextras.sparql.graph.SPARQLGraph.clusterForward() method for the triple store.

Parameters selection – a selection to define the seeds for clustering via the selection; the result of select used for the clustering seed

Returns a new triple store of type rdfextras.sparql.graph.SPARQLGraph

construct(pattern=None)

Expand the subgraph based on the pattern or, if None, the internal bindings.

In the former case the binding is used to instantiate the triplets in the patterns; in the latter, the original statements are used as patterns.

The result is a separate triple store containing the subgraph.

Parameters pattern – a rdfextras.sparql.graph.GraphPattern instance or None

Returns a new triple store of type rdfextras.sparql.graph.SPARQLGraph

describe(selection, forward=True, backward=True)

The DESCRIBE Form in the SPARQL draft is still in a state of flux, so this is just a temporary method, in fact. It may not correspond to what the final version of describe will be (if it stays in the draft at all, that is). At present, it is simply a wrapper around cluster().

Parameters

• selection – a selection to define the seeds for clustering via the selection; the result of select used for the clustering seed


• forward – cluster forward Boolean, yes or no

• backward – cluster backward Boolean yes or no

select(selection, distinct=True, limit=None, orderBy=None, orderAscend=None, offset=0)

Run a selection on the query.

Parameters

• selection – Either a single query string, or an array or tuple of query strings.

• distinct – Boolean - if True, identical results are filtered out.

• limit – if set to a (non-negative) integer value, the first 'limit' number of results are returned, otherwise all the results are returned.

• orderBy – either a function or a list of strings (corresponding to variables in the query). If None, no sorting occurs on the results. If the parameter is a function, it must take two dictionary arguments (the binding dictionaries), and return -1, 0, and 1, corresponding to smaller, equal, and greater, respectively.

• orderAscend – if not None, then an array of booleans of the same length as orderBy, True for ascending and False for descending. If None, an ascending order is used.

• offset – the starting point of return values in the array of results. This parameter is only relevant when some sort of order is defined.

Returns selection results as a list of tuples

Raises SPARQLError – invalid selection argument
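A hedged usage sketch, reusing the hypothetical graph g and pattern from the BasicGraphPattern example above:

from rdfextras.sparql.query import queryObject

queryResult = queryObject(g, [pattern])
# the first ten distinct ?name bindings, ordered ascending on ?name
bindings = queryResult.select("?name", distinct=True, limit=10,
                              orderBy=["?name"])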

class rdfextras.sparql.query.SPARQLQueryResult(qResult)

Query result class for SPARQL

Returns, variously:

• xml - as an XML string conforming to the SPARQL XML result format.

• python - as Python objects

• json - as JSON

• graph - as an RDFLib Graph, for CONSTRUCT and DESCRIBE queries

selectionF

Access the 'selectionF' attribute; deprecated and provided only for backwards compatibility

rdfextras.sparql.query.isGroundQuad(quad)

rdfextras.sparql.query.query(graph, selection, patterns, optionalPatterns=[], initialBindings={}, dSCompliance=False, loadContexts=False)

A shorthand for the creation of a Query instance, returning the result of a select() right away. Good for most of the usage, when no more action (clustering, etc) is required.

Parameters

• selection – a list or tuple with the selection criteria, or a single string. Each entry is a string that begins with "?".

• patterns – either a GraphPattern instance or a list of GraphPattern instances. Each pattern in the list represents an 'OR' (or 'UNION') branch in SPARQL.

• optionalPatterns – either a GraphPattern instance or a list of GraphPattern instances. Each of the elements in the 'patterns' parameter is combined with each of the optional patterns and the results are concatenated. The list may be empty.

Returns list of query results as a list of tuples
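A minimal sketch of the shorthand, again reusing the hypothetical g and pattern from above (roughly equivalent to queryObject(g, [pattern]).select(...)):

from rdfextras.sparql.query import query

results = query(g, ("?person", "?name"), pattern)
for person, name in results:
    print person, name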

rdfextras.sparql.query.queryObject(graph, patterns, optionalPatterns=[], initialBindings=None, dSCompliance=False, loadContexts=False)

Creation of a Query instance.

Parameters


• patterns – either a GraphPattern instance or a list of GraphPattern instances. Each pattern in the list represents an 'OR' (or 'UNION') branch in SPARQL.

• optionalPatterns – either a GraphPattern instance or a list of GraphPattern instances. Each of the elements in the 'patterns' parameter is combined with each of the optional patterns and the results are concatenated. The list may be empty.

Returns a Query object

(The bison-parsing SPARQL2SQL implementation contributed by Chimezie Ogbuji et al. has been moved to a separate archival branch).

1.2 Stores

A new RDFLib Store plugin has been added to the RDFExtras package - the SPARQL Store uses Ivan Herman et al.'s SPARQL service wrapper SPARQLWrapper to make a SPARQL endpoint behave programmatically like a read-only RDFLib store.

Warning: The SPARQL Store API does not support the “initNS” keyword arg.
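A hedged sketch of pointing a graph at a remote endpoint (this assumes the plugin is registered under the name "SPARQLStore" and that SPARQLWrapper is installed; the endpoint URL and subject are merely illustrative):

from rdflib.graph import Graph
from rdflib.term import URIRef

g = Graph(store="SPARQLStore")      # assumed plugin name
g.open("http://dbpedia.org/sparql")

# the graph can now be read (but not written) against the remote endpoint
subj = URIRef("http://dbpedia.org/resource/Berlin")
for s, p, o in g.triples((subj, None, None)):
    print s, p, o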

The other back-end stores (except for IOMemory and SleepyCat) have been migrated out of RDFLib core and into separate plug-ins.

Extensive tests have been added in support of the development effort. Contributions in this area are especially welcome.

Acknowledging a longstanding requirement, some ad hoc documentation has been rustled up to address some of the more egregious lacunae.


1.2.1 RDFLib Stores

The basic task: creating a non-native RDF store

The basic task is to achieve an efficient and proper translation of an RDF graph into one or other of the wide range of currently-available data store models: relational, key-data, document, etc. Triplestore counts head off into the millions very quickly, so considered choices amongst the speed/space/structure tradeoffs in both storage and retrieval will be crucial to the success of any non-trivial attempt. Because data storage and retrieval is a highly technical field, those considerations can be complex (a typical paper in the field: An Efficient SQL-based RDF Querying Scheme) and wide-ranging, as indicated in the W3C deliverable Mapping Semantic Web Data with RDBMSes report (well worth a quick dekko and a leisurely revisit later).

answers.semanticweb.com, the semantic web "correlate" of stackoverflow, has some highly informative answers to questions about RDF storage and contemporary non-native RDF stores:

Storing RDF data into HBase?

RDF storages vs. other NoSQL storages

The answers are an excellent tour d'horizon of the principles in play and provide accessible and highly-relevant background support to the RDFLib-specific topics that are covered in this document.

Other preliminary reading that would most likely make this document more useful:

• RDF meets NoSQL, Sandro Hawke (the comments are useful).


Types of RDF Store

The domain being modelled is that of RDF graphs and (minimally) statements of the form subject, predicate, object (aka triples), desirably augmented with the facility to handle statements about statements (quoted statements) and references to groups of statements (contexts), hence the following broad divisions of RDF store, all of which have an impact on the modelling:

Context-aware: An RDF store capable of storing statements within contexts is considered context-aware. Essentially, such a store is able to partition the RDF model it represents into individual, named, and addressable sub-graphs.

Formula-aware: An RDF store capable of distinguishing between statements that are asserted and statements that are quoted is considered formula-aware.

Conjunctive Graph: This refers to the 'top-level' Graph. It is the aggregation of all the contexts within it and is also the appropriate, absolute boundary for closed world assumptions / models.

For the sake of persistence, Conjunctive Graphs must be distinguished by identifiers (which may not necessarily be RDF identifiers or may be an RDF identifier normalized - SHA1/MD5 perhaps - for database naming purposes).

The Notation3 reference has relevant information regarding formulae, quoted statements and such.

"An RDF document parses to a set of statements, or graph. However RDF itself has no datatype allowing a graph as a literal value. N3 extends RDF to allow a graph itself to be referred to within the language, where it is known as a formula."

For a more detailed discussion, see Chimezie's blog post "Patterns and Optimizations for RDF Queries over Named Graph Aggregates"

The RDFLib Store API

A “pseudocode” distillation of the API calls

All Stores subclass the main RDFLib Store class which presents the following triple- and namespace-oriented API:

class Store(object):
    """"""
    context_aware = False
    formula_aware = False
    transaction_aware = False
    batch_unification = False

    def __init__(self, configuration=None, identifier=None):
        """ """
        pass

    # Basic store management
    def create(self, configuration):
        """ """
        pass

    def open(self, configuration, create=False):
        """ """
        pass

    def close(self, commit_pending_transaction=False):
        """ """
        pass

    def destroy(self, configuration):
        """ """
        pass

    def gc(self):
        """ """
        pass

    # The RDF API
    def add(self, (subject, predicate, object), context, quoted=False):
        """
        Adds the given statement to a specific context or to the model. The
        quoted argument is interpreted by formula-aware stores to indicate
        this statement is quoted/hypothetical. It should be an error to not
        specify a context and have the quoted argument be True. It should
        also be an error for the quoted argument to be True when the store
        is not formula-aware.
        """
        pass

    def addN(self, quads):
        """
        Adds each item in the list of statements to a specific context. The
        quoted argument is interpreted by formula-aware stores to indicate
        this statement is quoted/hypothetical. Note that the default
        implementation is a redirect to add
        """
        pass

    def remove(self, (subject, predicate, object), context=None):
        """Remove the set of triples matching the pattern from the store"""
        pass

    def triples_choices(self, (subject, predicate, object_), context=None):
        """
        A variant of triples that can take a list of terms instead of a
        single term in any slot. Stores can implement this to optimize the
        response time from the default 'fallback' implementation, which will
        iterate over each term in the list and dispatch to triples.
        """
        pass

    def triples(self, (subject, predicate, object), context=None):
        """
        A generator over all the triples matching the pattern. Pattern can
        include any objects used for comparing against nodes in the store,
        for example, REGEXTerm, URIRef, Literal, BNode, Variable, Graph,
        QuotedGraph, Date? DateRange?

        A conjunctive query can be indicated by either providing a value of
        None for the context or the identifier associated with the
        Conjunctive Graph (if it is context-aware).
        """
        pass

    def __len__(self, context=None):
        """
        Number of statements in the store. This should only account for
        non-quoted (asserted) statements if the context is not specified,
        otherwise it should return the number of statements in the formula
        or context given.
        """
        pass

    def contexts(self, triple=None):
        """
        Generator over all contexts in the graph. If triple is specified, a
        generator over all contexts the triple is in.
        """
        pass

    # Optional Namespace methods
    def bind(self, prefix, namespace):
        """ """
        pass

    def prefix(self, namespace):
        """ """
        pass

    def namespace(self, prefix):
        """ """
        pass

    def namespaces(self):
        """ """
        pass

    # Optional Transactional methods
    def commit(self):
        """ """
        pass

    def rollback(self):
        """ """
        pass
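For completeness, a hedged sketch of how such a Store subclass is typically made available and used (the plugin name "MyStore" and the module path "mystores.mystore" are hypothetical):

from rdflib import plugin
from rdflib.graph import Graph
from rdflib.store import Store

# map a plugin name to the module and class implementing the store
plugin.register("MyStore", Store, "mystores.mystore", "MyStore")

g = Graph(store="MyStore", identifier="example")
g.open("some configuration string", create=True)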

Approaches to modelling RDF

RDF Modelling: a Relational Model for FOL Persistence

The FOPLRelationModel module implements a relational model for Notation 3 abstract syntax. Contributor: Chimezie Ogbuji.

In essence, this is an open-source, maximally efficient RDBM upon which large volume RDF can be persisted, within named graphs, with the ability to persist Notation 3 formulae in a separate manner (consistent with Notation 3 semantics). The module is called the "FOPLRelationModel" because although it is specifically a relational model for Notation 3 syntax, it covers much of the requirement for the syntactic representation of First Order Logic in general.

Discussion and design rationale

Original post by Chimezie Ogbuji


A short while ago I was rather engaged in investigating the most efficient way to persist RDF on Relational Database Management Systems. One of the outcomes of this effort that I have yet to write about is a relational model for Notation 3 abstract syntax and a fully functioning implementation - which is now part of RDFLib's MySQL drivers.

It's written with Soft4Science's SciWriter and seems to render natively in Firefox alone (haven't tried any other browser).

Originally, I kept coming at it from a pure Computer Science approach (programming and datastructures) but eventually had to roll up my sleeves and get down to the formal logic level (i.e., the Deconstructionist, Computer Engineer approach).

Partitioning the KR Space

The first method with the most impact was separating Assertional Box statements (statements of class membership) from the rest of the Knowledge Base. When I say Knowledge Base, I mean a 'named' aggregation of all the named graphs in an RDF database. Partitioning the Table space has a universal effect on shortening indices and reducing the average number of rows needed to be scanned for even the worst-case scenario for a SQL optimizer. The nature of RDF data (at the syntactic level) is a major factor. RDF is a Description Logic-oriented representation and thus relies heavily on statements of class membership.

The relational model is all about representing everything as specific relations and the 'instantiation' relationship is a perfect candidate for a database table.

Eventually, it made sense to create additional table partitions for:

• RDF statements between resources (where the object is not an RDF Literal).

• RDF’s equivalent to EAV statements (where the object is a value or RDF Literal).

• Matching Triple Patterns against these partitions can be expressed using a decision tree which accommodates every combination of RDF terms. For example, a triple pattern:

?entity foaf:name "Ikenna"

Would only require a scan through the indices for the EAV-type RDF statements (or the whole table if necessary - but that decision is up to the underlying SQL optimizer).

Using Term Type Enumerations

The second method involves the use of the enumeration of all the term types as an additional column whose indices are also available for a SQL query optimizer. That is:

ANY_TERM = ['U','B','F','V','L']

The terms can be partitioned into the exact allowable set for certain kinds of RDF terms:

ANY_TERM = ['U','B','F','V','L']
CONTEXT_TERMS = ['U','B','F']
IDENTIFIER_TERMS = ['U','B']
GROUND_IDENTIFIERS = ['U']
NON_LITERALS = ['U','B','F','V']
CLASS_TERMS = ['U','B','V']
PREDICATE_NAMES = ['U','V']

NAMED_BINARY_RELATION_PREDICATES = GROUND_IDENTIFIERS
NAMED_BINARY_RELATION_OBJECTS = ['U','B','L']

NAMED_LITERAL_PREDICATES = GROUND_IDENTIFIERS
NAMED_LITERAL_OBJECTS = ['L']

ASSOCIATIVE_BOX_CLASSES = GROUND_IDENTIFIERS


For example, the Object term of an EAV-type RDF statement doesn't need an associated column for the kind of term it is (the relation is explicitly defined as those RDF statements where the Object is a Literal - L)

Efficient Skolemization with Hashing

Finally, thanks to Benjamin Nowack's related efforts with ARC - a PHP-based implementation of an RDF / SPARQL storage system, Mark Nottingham's suggestion, and an earlier paper by Stephen Harris and Nicholas Gibbins: 3store: Efficient Bulk RDF Storage, a final method of using a half-hash (MD5 hash) of the RDF identifiers in the 'statement' tables was employed instead. The statement tables each used an unsigned MySQL BIGint to encode the half-hash in base 10 and use as foreign keys to two separate tables:

• A table for identifiers (with a column that enumerated the kind of identifier it was)

• A table for literal values

The key to both tables was the 8-byte (64-bit) unsigned integer which represented the half-hash

This of course introduces a possibility of collision (due to the reduced hash size), but by hashing the identifier along with the term type, this further dilutes the lexical space and reduces this collision risk. This latter part is still a theory I haven't formally proven (or disproven) but hope to. At the maximum volume (around 20 million RDF assertions) I can resolve a single triple pattern in 8 seconds on an SGI machine and there is no collision - the implementation includes (disabled by default) a collision detection mechanism.
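A sketch of the half-md5 normalization described here (an illustration of the idea only - compare QuadSlot's normalizeNode, documented below - not the module's exact implementation):

from hashlib import md5

def half_md5_hash(lexical_form, term_type):
    # hash the term type together with the lexical form, then keep half of
    # the 128-bit digest: 16 hex chars = 64 bits, which fits an unsigned
    # MySQL BIGINT when expressed in base 10
    digest = md5(term_type + lexical_form).hexdigest()
    return int(digest[:16], 16)

key = half_md5_hash("http://example.org/resource", "U")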

The implementation includes all the magic needed to generate SQL statements to create, query, and manage indices for the tables in the relational model. It does this from a Python model that encapsulates the relational model and methods to carry out the various SQL-level actions needed by the underlying DBMS.

For me, it has satisfied my needs for an open-source maximally efficient RDBM upon which large volume RDF can be persisted, within named graphs, with the ability to persist Notation 3 formulae in a separate manner (consistent with Notation 3 semantics).

I called the Python module FOPLRelationModel because although it is specifically a relational model for Notation 3 syntax it covers much of the requirements for the syntactic representation of First Order Logic in general.

Contents:

BinaryRelationPartition

The set of classes used to model the 3 'partitions' for N3 assertions.

There is a top level class which implements operations common to all partitions as well as a class for each partition. These classes are meant to allow the underlying SQL schema to be completely configurable as well as to automate the generation of SQL queries for adding, updating, removing and resolving triples to/from/in the partitions.

These classes work in tandem with the RelationHashes to automate all (or most) of the SQL processing associated with this FOPL Relational Model

Note: The use of foreign keys (which - unfortunately - bumps the minimum MySQL version to 5.0) allows for the efficient removal of all statements about a particular resource using cascade on delete (although this is currently not used).

This is the common ancestor of the three partitions for assertions. It implements behavior common to all 3. Each subclass is expected to define the following:

nameSuffix - The suffix appended to the name of the table

termEnumerations - a 4-item list (for each quad 'slot') of lists (or None) which enumerate the allowable term types for each quad slot (one of U - URIs, V - Variable, L - Literals, B - BNodes, F - Formulae)

columnNames - a list of column names for each quad slot (can be of additional length where each item is a 3-item tuple of: column name, column type, index)


columnIntersectionList - a list of 2-item tuples (the quad index and a boolean indicating whether or not the associated term is an identifier); this list (the order of which is very important) is used for generating intersections between the partition and the identifier / value hash

hardCodedResultFields - a dictionary mapping quad slot indices to their hardcoded value (for partitions - such as ABOX - which have a hardcoded value for a particular quad slot)

hardCodedResultTermsTypes - a dictionary mapping quad slot indices to their hardcoded term type (for partitions - such as Literal properties - which have hardcoded values for a particular quad slot's term type)

QuadSlot

Utility functions associated with RDF terms:

• normalizing (to 64 bit integers via half-md5-hashes)

• escaping literals for SQL persistence

RelationalHash

This module implements two hash tables for identifiers and values that facilitate maximal index lookups and minimal redundancy (since identifiers and values are stored once only and referred to by integer half-md5-hashes).

The identifier hash uses the half-md5-hash (converted by base conversion to an integer) to key on the identifier's full lexical form (for partial matching by REGEX) and their term types.

The use of a half-hash introduces a collision risk that is currently not accounted for.

The volume at which the risk becomes significant is calculable, though, via the birthday paradox.

The value hash is keyed off the half-md5-hash (as an integer also) and stores the identifier's full lexical representation (for partial matching by REGEX)

These classes are meant to automate the creation, management, linking and insertion of these hashes (by SQL).

MySQLMassLoader - Bulk loading

If you need to load a large number of RDF statements into an empty database, RDFLib provides a module that can be run as a script to help you with this task. You can run this module with the command

$ python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader [options] <DB Type>

Note that several of the options are very important.

Let’s start with an example.

If you wanted to load the RDF/XML file profiles.rdf and the N-Triples file targets.nt into an empty MySQL database named 'plan' located at host 'bubastis', accessible to user 'ozymandias' with password 'ramsesIII', you could use the following command:

$ python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader \
    -c db=plan,host=bubastis,user=ozymandias,password=ramsesIII \
    -i plan \
    -x profiles.rdf --nt=targets.nt \
    MySQL

Here, we're connecting to a MySQL database, but this script can also utilize a PostgreSQL database with the 'PostgreSQL' keyword.

The -c option allows you to specify the connection details for the target database; it is a comma-separated string of variable assignments, as in the example above. As in that example, it can specify the database with 'db', the name of the target machine with 'host', the username with 'user', and the password for that user with 'password'.


Also, you can specify the port on the target machine with 'port'. A single database can support multiple RDF stores; each such store has an additional store "identifier", which you must provide with the -i option.

Once we have connected, we can load data from files that can be in various formats. This script supports identifying RDF/XML files to load with the -x option, TriX files with the -t option, N3 files with the -n option, N-Triples files with the --nt option, and RDFa files with the -a option.

In addition, you can load all the files in a directory, assuming that they all have the same format. To do this, use the --directory option to identify the directory containing the files, and the --format option to specify the format of the files in that directory. For example:
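$ python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader \
    -c db=plan,host=bubastis,user=ozymandias,password=ramsesIII \
    -i plan \
    --directory=data --format=n3 \
    MySQL

(This reuses the connection settings from the earlier example; the directory name data and the choice of N3 are merely illustrative.)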

There are a few advanced options available for this script; you can use the -h option to get a summary of all the available options.

Typical Usage

Typical usage is via module import in order to support the development of an implementation of an RDFLib Store, such as in the MySQL Store, from which the following illustration is drawn:

from FOPLRelationalModel.BinaryRelationPartition import AssociativeBox
from FOPLRelationalModel.BinaryRelationPartition import NamedLiteralProperties
from FOPLRelationalModel.BinaryRelationPartition import NamedBinaryRelations
from FOPLRelationalModel.BinaryRelationPartition import BinaryRelationPartitionCoverage
from FOPLRelationalModel.BinaryRelationPartition import PatternResolution
from FOPLRelationalModel.QuadSlot import genQuadSlots
from FOPLRelationalModel.QuadSlot import normalizeNode
from FOPLRelationalModel.RelationalHash import IdentifierHash
from FOPLRelationalModel.RelationalHash import LiteralHash
from FOPLRelationalModel.RelationalHash import GarbageCollectionQUERY

class SQL(Store):"""Abstract SQL implementation of the FOPL Relational Model as an RDFLibStore."""context_aware = Trueformula_aware = Truetransaction_aware = Trueregex_matching = NATIVE_REGEXbatch_unification = Truedef __init__(

self, identifier=None, configuration=None,debug=False, engine="ENGINE=InnoDB",useSignedInts=False, hashFieldType=’BIGINT unsigned’,declareEnums=False, perfLog=False,optimizations=None,scanForDatatypes=False):

self.dataTypes=self.scanForDatatypes=scanForDatatypesself.optimizations=optimizationsself.debug = debugif debug:

self.timestamp = TimeStamp()

#BE: performance loggingself.perfLog = perfLogif self.perfLog:

self.resetPerfLog()

self.identifier = identifier and identifier or ’hardcoded’

1.2. Stores 39

Page 44: Release 0.1a Original contributors - Read the Docsmedia.readthedocs.org/pdf/rdfextras/latest/rdfextras.pdfscript that applies the grammar to all of the standard SPARQL tests, so that

rdfextras Documentation, Release 0.1a

#Use only the first 10 bytes of the digestself._internedId = INTERNED_PREFIX + sha1(self.identifier).hexdigest()[:10]

self.engine = engineself.showDBsCommand = ’SHOW DATABASES’self.findTablesCommand = "SHOW TABLES LIKE ’%s’"self.findViewsCommand = "SHOW TABLES LIKE ’%s’"# TODO: Note, the following three members are MySQL-specific, and# must be overridden for other databases.self.defaultDB = ’mysql’self.default_port = 3306self.select_modifier = ’straight_join’self.can_cast_bigint = False

self.INDEX_NS_BINDS_TABLE = \’CREATE INDEX uri_index on %s_namespace_binds (uri(100))’

#Setup FOPL RelationalModel objectsself.useSignedInts = useSignedInts# TODO: derive this from ‘self.useSignedInts‘?self.hashFieldType = hashFieldTypeself.idHash = IdentifierHash(self._internedId,

self.useSignedInts, self.hashFieldType, self.engine, declareEnums)self.valueHash = LiteralHash(self._internedId,

self.useSignedInts, self.hashFieldType, self.engine, declareEnums)self.binaryRelations = NamedBinaryRelations(

self._internedId, self.idHash, self.valueHash, self,self.useSignedInts, self.hashFieldType, self.engine, declareEnums)

self.literalProperties = NamedLiteralProperties(self._internedId, self.idHash, self.valueHash, self,self.useSignedInts, self.hashFieldType, self.engine, declareEnums)

self.aboxAssertions = AssociativeBox(self._internedId, self.idHash, self.valueHash, self,self.useSignedInts, self.hashFieldType, self.engine, declareEnums)

self.tables = [self.binaryRelations,self.literalProperties,self.aboxAssertions,self.idHash,self.valueHash]

self.createTables = [self.idHash,self.valueHash,self.binaryRelations,self.literalProperties,self.aboxAssertions]

self.hashes = [self.idHash,self.valueHash]self.partitions = [self.literalProperties,self.binaryRelations,self.aboxAssertions,]

#This is a dictionary which caputures the relationships between#the each view, it’s prefix, the arguments to viewUnionSelectExpression#and the tables involvedself.viewCreationDict=

’_all’ : (False,self.partitions),’_URI_or_literal_object’ : (False,[self.literalProperties,

40 Chapter 1. Plug-ins Overview

Page 45: Release 0.1a Original contributors - Read the Docsmedia.readthedocs.org/pdf/rdfextras/latest/rdfextras.pdfscript that applies the grammar to all of the standard SPARQL tests, so that

rdfextras Documentation, Release 0.1a

self.binaryRelations]),’_relation_or_associativeBox’: (True,[self.binaryRelations,

self.aboxAssertions]),’_all_objects’ : (False,self.hashes)

#This parameter controls how exclusively the literal table is searched.
#If true, the Literal partition is searched *exclusively* if the object
#term in a triple pattern is a Literal or a REGEXTerm. Note, the latter
#case prevents the matching of URIRef nodes as the objects of a triple
#in the store. If the object term is a wildcard (None), then the Literal
#partition is searched in addition to the others. If this parameter is
#false, the literal partition is searched regardless of what the object
#of the triple pattern is.
self.STRONGLY_TYPED_TERMS = False
self._db = None
if configuration is not None:
    #self.open(configuration)
    self._set_connection_parameters(configuration=configuration)

self.cacheHits = 0
self.cacheMisses = 0

self.literalCache = {}
self.uriCache = {}
self.bnodeCache = {}
self.otherCache = {}

self.literal_properties = set()
'''set of URIRefs of those RDF properties which are known to range
over literals.'''
self.resource_properties = set()
'''set of URIRefs of those RDF properties which are known to range
over resources.'''

#update the two sets above with defaults
if False: # TODO: Update this to reflect the new namespace layout
    self.literal_properties.update(OWL.literalProperties)
    self.literal_properties.update(RDF.literalProperties)
    self.literal_properties.update(RDFS.literalProperties)
    self.resource_properties.update(OWL.resourceProperties)
    self.resource_properties.update(RDF.resourceProperties)
    self.resource_properties.update(RDFS.resourceProperties)

self.length = None
# [ ... ]

class MySQL(SQL):
    """MySQL implementation of FOPL Relational Model as an rdflib Store"""
    # node_pickler = None
    # __node_pickler = None
    _Store__node_pickler = None
    try:
        import MySQLdb

        def _connect(self, db=None):
            if db is None:
                db = self.config['db']
            return MySQL.MySQLdb.connect(user=self.config['user'],
                passwd=self.config['password'], db=db,
                port=self.config['port'], host=self.config['host'])

    except ImportError:
        def _connect(self, db=None):
            raise NotImplementedError(
                'We need the MySQLdb module to connect to MySQL databases.')

    def _createViews(self, cursor):
        for suffix, (relations_only, tables) in self.viewCreationDict.items():
            query = ('CREATE SQL SECURITY INVOKER VIEW %s%s AS %s' %
                (self._internedId, suffix, ' UNION ALL '.join(
                    [t.viewUnionSelectExpression(relations_only)
                     for t in tables])))
            if self.debug:
                print >> sys.stderr, "## Creating View ##\n", query
            self.executeSQL(cursor, query)

Modules and contents

BinaryRelationPartition

The set of classes used to model the 3 'partitions' for N3 assertions. There is a top-level class which implements operations common to all partitions, as well as a class for each partition. These classes are meant to allow the underlying SQL schema to be completely configurable and to automate the generation of SQL queries for adding, updating, removing and resolving triples from the partitions. These classes work in tandem with the RelationalHashes to automate all (or most) of the SQL processing associated with this FOPL Relational Model.

NOTE: The use of foreign keys (which - unfortunately - bumps the minimum MySQL version to 5.0) allows for the efficient removal of all statements about a particular resource using cascade on delete (currently not used).

see: http://dev.mysql.com/doc/refman/5.0/en/ansi-diff-foreign-keys.html

class rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.BinaryRelationPartition(identifier, idHash, valueHash, store, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)


The common ancestor of the three partitions for assertions. Implements behavior common to all 3. Each subclass is expected to define the following:

Variables

• nameSuffix –

– The suffix appended to the name of the table

• termEnumerations –

– a 4 item list (for each quad 'slot') of lists (or None) which enumerate the allowable term types for each quad slot (one of 'U' - rdflib.term.URIRef, 'V' - rdflib.term.Variable, 'L' - rdflib.term.Literal, 'B' - rdflib.term.BNode, 'F' - rdflib.term.Formula)

• columnNames –

– a list of column names for each quad slot (can be of additional length where each item is a 3-item tuple of: column name, column type, index)

• columnIntersectionList –

– a list of 2 item tuples (the quad index and a boolean indicating whether or not the associated term is an identifier). This list (the order of which is very important) is used for generating intersections between the partition and the identifier / value hash

• hardCodedResultFields –

– a dictionary mapping quad slot indices to their hardcoded value (for partitions - such asABOX - which have a hardcoded value for a particular quad slot)

• hardCodedResultTermsTypes –

– a dictionary mapping quad slot indices to their hardcoded term type (for partitions - suchas Literal properties - which have hardcoded values for a particular quad slot’s term type)

createStatements()
Generates a CREATE TABLE statement which creates a SQL table used for persisting assertions associated with this partition.

flushInsertions(db)
Adds the pending identifiers / values and assertions (using executemany for maximum efficiency), and resets the queue.

foreignKeySQL(slot)
Generates foreign key expressions for relating a particular quad term with the identifier hash.

generateHashIntersections()
Generates the SQL JOINs (INNER and LEFT) used to intersect the identifier and value hashes with this partition. This relies on each partition setting up an ordered list of intersections (ordered with optimization in mind). For instance, the ABOX partition would want to intersect on classes first (since this will have a lower cardinality than any other field), whereas the Literal Properties partition would want to intersect on datatypes first. The partitions and hashes are joined on the integer half-MD5-hash of the URI (or literal) as well as the 'Term Type'.

generateWhereClause(queryPattern)
Takes a query pattern (a list of quad terms - subject, predicate, object, context) and generates a SQL WHERE clause which works in conjunction with the intersections to filter the result set by partial matching (by REGEX), full matching (by integer half-hash), and term types, yielding maximally efficient SELECT queries.


insertRelations(quadSlots)
Takes a list of QuadSlot objects and queues the new identifiers / values to insert, as well as the assertions (so they can be added in a batch for maximum efficiency).

insertRelationsSQLCMD()
Generates a SQL command with parameter references (%s) in order to facilitate efficient batch insertion of multiple assertions by Python DB implementations (such as MySQLdb).

selectContextFields(first)
Generates a list of column aliases for the SELECT SQL command used in order to fetch contexts from each partition.

selectFields(first=False)
Returns a list of column aliases for the SELECT SQL command used to fetch quads from a partition.

viewUnionSelectExpression(relations_only=False)
Return a SQL statement which creates a view of all the RDF statements from all the contributing partitions.

class rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.AssociativeBox(identifier, idHash, valueHash, store, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

The partition associated with assertions of class membership (formally known - in Description Logics - as an Associative Box). This partition is for all assertions where the property is rdf:type. See: http://en.wikipedia.org/wiki/Description_Logic#Modelling_in_Description_Logics


class rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.NamedLiteralProperties(identifier, idHash, valueHash, store, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

The partition associated with assertions where the object is a Literal.

extractIdentifiers(quadSlots)
Test literal data type extraction:

>>> from rdflib.namespace import RDF
>>> from rdfextras.store.FOPLRelationalModel.QuadSlot import genQuadSlots
>>> class DummyClass:
...     def __init__(self, test=False):
...         self.test = test
...     def updateIdentifierQueue(self, stuff):
...         if self.test:
...             term, termType = stuff[-1]
...             assert termType == 'U', "Datatypes are URIs!"
>>> class Tester(NamedLiteralProperties):
...     def __init__(self):
...         self.idHash = DummyClass(True)
...         self.valueHash = DummyClass()
>>> c = Tester()
>>> slots = genQuadSlots([BNode(), RDF.first, Literal(1), BNode()])
>>> c.extractIdentifiers(slots)

class rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.NamedBinaryRelations(identifier, idHash, valueHash, store, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

Partition associated with assertions where the predicate isn’t rdf:type and the object isn’t a literal
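Taken together, the three partition classes amount to a routing rule over the predicate and the object of each assertion. A minimal sketch of that rule, assuming a store instance laid out as in the SQL/MySQL code above (the function name is illustrative, not part of the module's API):

from rdflib import Literal
from rdflib.namespace import RDF

def choose_partition(store, subject, predicate, object_):
    # rdf:type assertions go to the associative box (class membership)
    if predicate == RDF.type:
        return store.aboxAssertions
    # literal-valued assertions go to the literal properties partition
    if isinstance(object_, Literal):
        return store.literalProperties
    # everything else is a named binary relation
    return store.binaryRelations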


rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.BinaryRelationPartitionCoverage((subject, predicate, object_, context), BRPs)

This function takes a quad pattern (where any term is one of: URIRef, BNode, Literal, None, or REGEXTerm) and a list of 3 live partitions, and returns a list of only those partitions that need to be searched in order to resolve the pattern. This function relies on the BRPQueryDecisionMap dictionary to determine which partitions to use. Note that the dictionary, as currently constituted, requires that for a REGEXTerm in the object slot both the binary relation partition and the literal properties partition are searched, when this search could be limited to the literal properties alone (for more efficient REGEX evaluation of literal values). Given the nature of the REGEX function in SPARQL and the way Versa matches by REGEX, this separation couldn't be done.

rdfextras.store.FOPLRelationalModel.BinaryRelationPartition.PatternResolution(quad, cursor, BRPs, orderByTriple=True, fetchall=True, fetchContexts=False, select_modifier='')

This function implements query pattern resolution against a list of partition objects, with 3 parameters specifying whether to sort the result set (in order to group identical triples by the contexts in which they appear), whether to fetch the entire result set or one row at a time, and whether to fetch the matching contexts only or the assertions. This function uses BinaryRelationPartitionCoverage to whittle out the partitions that don't need to be searched, and generateHashIntersections / generateWhereClause to generate the SQL query and the parameter fill-ins, creating a single UNION query against the relevant partitions.

Note: the use of UNION syntax requires that the literal properties partition comes first (since the first SELECT determines the column types for the resulting rows of the subsequent SELECT queries).

see: http://dev.mysql.com/doc/refman/5.0/en/union.html

QuadSlot

Utility functions associated with RDF terms:

• normalizing (to 64 bit integers via half-md5-hashes; see the sketch below)

• escaping literals for SQL persistence
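For orientation, the half-md5 normalization can be sketched as follows. This is a simplification of the normalizeValue / makeSigned pair listed below; the exact digest input and signing rules are implementation details of the module:

import hashlib

def half_md5(lexical_form, termType, useSignedInts=False):
    # hash the term's lexical form together with its term type and keep
    # only the first half (8 bytes / 16 hex digits) of the MD5 digest
    digest = hashlib.md5((lexical_form + termType).encode('utf-8')).hexdigest()
    unsigned = int(digest[:16], 16)           # 64-bit unsigned integer
    if useSignedInts and unsigned > 2**63 - 1:
        return unsigned - 2**64               # fold into signed BIGINT range
    return unsigned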

class rdfextras.store.FOPLRelationalModel.QuadSlot.QuadSlot(position, term, useSignedInts=False)

rdfextras.store.FOPLRelationalModel.QuadSlot.EscapeQuotes(qstr)

rdfextras.store.FOPLRelationalModel.QuadSlot.dereferenceQuad(index, quad)

rdfextras.store.FOPLRelationalModel.QuadSlot.genQuadSlots(quads, useSignedInts=False)


rdfextras.store.FOPLRelationalModel.QuadSlot.normalizeValue(value, termType, useSignedInts=False)

rdfextras.store.FOPLRelationalModel.QuadSlot.makeSigned(bigint)

rdfextras.store.FOPLRelationalModel.QuadSlot.normalizeNode(node, useSignedInts=False)

RelationalHash

This module implements two hash tables for identifiers and values that facilitate maximal index lookups and minimal redundancy (since identifiers and values are stored once only and referred to by integer half-md5-hashes). The identifier hash uses the half-md5-hash (converted by base conversion to an integer) to key on the identifier's full lexical form (for partial matching by REGEX) and its term type. The use of a half-hash introduces a collision risk that is currently not accounted for; the volume at which the risk becomes significant is calculable, though, via the 'birthday paradox'.

The value hash is keyed off the half-md5-hash (as an integer also) and stores the identifier's full lexical representation (for partial matching by REGEX).

These classes are meant to automate the creation, management, linking and insertion of these hashes (by SQL).

see: http://en.wikipedia.org/wiki/Birthday_Paradox
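A quick back-of-the-envelope check of that risk, using the standard birthday-paradox approximation p ≈ 1 - exp(-n²/2N) with N = 2**64 possible half-hash values:

import math

def collision_probability(n, bits=64):
    # birthday-paradox approximation for n identifiers in a 2**bits space
    return 1 - math.exp(-float(n) * n / (2.0 * 2**bits))

for n in (10**6, 10**8, 10**9):
    print('%.0e identifiers -> p(collision) ~ %.2e' % (n, collision_probability(n)))

On these figures the risk only becomes appreciable somewhere in the region of a billion distinct identifiers, well beyond the store sizes discussed elsewhere in this documentation.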

class rdfextras.store.FOPLRelationalModel.RelationalHash.Table

createStatements()
Returns a list of SQL statements that, when executed, will create this table (and any other critical data structures).

defaultStatements()
Returns a list of SQL statements that, when executed, will provide an initial set of rows for this table.

foreignKeyStatements()
Returns a list of SQL statements that, when executed, will initialize appropriate foreign key references for this table.

get_name()
Returns the name of this table in the backing SQL database.

indexingStatements()
Returns a list of SQL statements that, when executed, will create appropriate indices for this table.

removeForeignKeyStatements()
Returns a list of SQL statements that, when executed, will remove all the foreign key references corresponding to foreignKeyStatements.

removeIndexingStatements()
Returns a list of SQL statements that, when executed, will remove all of the indices corresponding to indexingStatements.


class rdfextras.store.FOPLRelationalModel.RelationalHash.RelationalHash(identifier, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

class rdfextras.store.FOPLRelationalModel.RelationalHash.IdentifierHash(identifier, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

defaultStatements()
Since rdf:type is modeled explicitly (in the ABOX partition) it must be inserted as a 'default' identifier.

class rdfextras.store.FOPLRelationalModel.RelationalHash.LiteralHash(identifier, useSignedInts=False, hashFieldType='BIGINT unsigned', engine='ENGINE=InnoDB', declareEnums=False)

rdfextras.store.FOPLRelationalModel.RelationalHash.GarbageCollectionQUERY(idHash, valueHash, aBoxPart, binRelPart, litPart)

Performs garbage collection on interned identifiers and their references. Joins the given KB partitions against the identifiers and values and removes the 'danglers'. This must be performed after every removal of an assertion and so becomes a primary bottleneck.


AbstractSQLStore :: a SQL-92 formula-aware Store

A SQL-92 formula-aware implementation of an RDFLib Store, contributed by Chimezie Ogbuji. It stores its triples in the following partitions:

1. Asserted non rdf:type statements

2. Asserted literal statements,

3. Asserted rdf:type statements (in a table which models Class membership). The motivation for this partition is primarily query speed and scalability, as most graphs will always have more rdf:type statements than others

4. All Quoted statements.

Namespace mappings are persisted in a separate table.

Chimezie explains the design rationale for this implementation in his blog post of October 28 2005, Addressing the RDF Scalability bottleneck, shamelessly reproduced below.

Addressing the RDF Scalability bottleneck

I've been building RDF persistence stores for some time (it's gone from something of a hobby to the primary responsibility in my current work capacity) and have come to the conclusion that RDF stores will almost always be susceptible to the physical limitations of database scalability.

I recall when I was at the Semantic Technology Conference this spring and asked one of the presenters there what he thought about this problem that all RDF databases face and the reason why most don't function effectively beyond 5-10 million triples. I liked his answer:

“It’s an engineering problem.”

Consider the amount of information an adult has stored (by whatever mechanism the human brain uses to persist information) in his or her noggin. We often take it for granted - as we do all other aspects of biology we know very little about - but it's worth considering when thinking about why scalability is a ubiquitous hurdle for RDF databases.

Some basic Graph theory is relevant to this point:

The size of a graph is the number of edges and the order of a graph is the number of nodes within the graph. RDF is a Resource Description Framework (where what we know about resources is key - not so much the resources themselves) so it's not surprising that RDF graphs will almost always have a much larger size than order. It's also not surprising that most performance analysis made across RDF implementations (such as LargeTripleStores for instance) focuses mostly on triple size.

I've been working on a SQL-based persistence schema for RDF content (for rdflib) that is a bit different from the standard approaches taken by most RDBMS implementations of RDF stores I'm familiar with (including those I've written). Each of the tables is prefixed with a SHA1-hashed digest of the identifier associated with the 'localized universe' (AKA, the boundary for a closed world assumption). The schema is below:

CREATE TABLE %s_asserted_statements (
    subject      text not NULL,
    predicate    text not NULL,
    object       text,
    context      text not NULL,
    termComb     tinyint unsigned not NULL,
    objLanguage  varchar(3),
    objDatatype  text,
    INDEX termComb_index (termComb),
    INDEX spoc_index (subject(100),predicate(100),object(50),context(50)),
    INDEX poc_index (predicate(100),object(50),context(50)),
    INDEX csp_index (context(50),subject(100),predicate(100)),
    INDEX cp_index (context(50),predicate(100))) TYPE=InnoDB

CREATE TABLE %s_type_statements (
    member       text not NULL,
    klass        text not NULL,
    context      text not NULL,
    termComb     tinyint unsigned not NULL,
    INDEX termComb_index (termComb),
    INDEX memberC_index (member(100),klass(100),context(50)),
    INDEX klassC_index (klass(100),context(50)),
    INDEX c_index (context(10))) TYPE=InnoDB

CREATE TABLE %s_quoted_statements (
    subject      text not NULL,
    predicate    text not NULL,
    object       text,
    context      text not NULL,
    termComb     tinyint unsigned not NULL,
    objLanguage  varchar(3),
    objDatatype  text,
    INDEX termComb_index (termComb),
    INDEX spoc_index (subject(100),predicate(100),object(50),context(50)),
    INDEX poc_index (predicate(100),object(50),context(50)),
    INDEX csp_index (context(50),subject(100),predicate(100)),
    INDEX cp_index (context(50),predicate(100))) TYPE=InnoDB

The first thing to note is that statements are partitioned into logical groupings:

Asserted non rdf:type statements: where all asserted RDF statements where the predicate isn't rdf:type are stored

Asserted rdf:type statements: where all asserted rdf:type statements are stored

Quoted statements: where all quoted/hypothetical statements are stored

Statement quoting is a Notation 3 concept and an extension of the RDF model for this purpose. The most significant partition is the rdf:type grouping. The idea is to have class membership modeled at the store level instead of at a level above it. RDF graphs are as different as the applications that use them, but the primary motivating factor for making this separation was the assumption that in most RDF graphs a majority of the statements (or a significant portion) would consist of rdf:type statements (statements of class membership).

Class membership can be considered an unstated RDF modelling best practice since it allows an author to say a lot about a resource simply by associating it with a class that has its semantics completely spelled out in a separate, supporting ontology.

The rdf:type table models class membership explicitly with two columns: klass and member. This results in a savings of 43 characters per rdf:type statement. The implementation takes note of the predicate submitted in the triple-matching pattern and determines which tables to search.

Consider the following triple pattern:

http://metacognition.info ?predicate ?object

The persistence layer would know it needs to check against the table that persists non rdf:type statements as well as the class membership table. However, patterns that match against a specific predicate (other than rdf:type) or class membership queries only need to check within one partition (or table):

http://metacognition.info rdf:type ?klass
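In code, the dispatch described above comes down to something like the following sketch (illustrative only; the table names follow the schema above, and the function is not part of the store's API):

from rdflib.namespace import RDF

def tables_to_search(predicate, internedId):
    if predicate == RDF.type:
        # class membership queries hit only the type partition
        return ['%s_type_statements' % internedId]
    elif predicate is not None:
        # a specific non-rdf:type predicate pins the search to one table
        return ['%s_asserted_statements' % internedId]
    # an unbound predicate means both asserted partitions must be checked
    return ['%s_asserted_statements' % internedId,
            '%s_type_statements' % internedId]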


In general, I've noticed that being able to partition your SQL search space (searching within a named graph / context or searching within a single table) goes a long way in query response.

The other thing worth noting is the termComb column, which is an integer value representing the 40 unique ways the following RDF terms could appear in a triple:

• URI Ref

• Blank Node

• Formula

• Literal

• Variable

I'm certain there are many other possible optimizations that can be made in a SQL schema for RDF triple persistence (there isn't much precedent in this regard - and Oracle has only recently joined the fray) but partitioning rdf:type statements separately is one such thought I've recently had.

[Chimezie Ogbuji]

Typical Usage

Typical usage is via subclassing to provide an RDFLib Store API in support of persistence-specific implementations of RDFLib Store, e.g. the SQLite Store, from which the following illustration is drawn:

from sqlite3 import dbapi2

class SQLite(AbstractSQLStore):
    """
    SQLite store formula-aware implementation. It stores its triples in the
    following partitions:

    - Asserted non rdf:type statements
    - Asserted rdf:type statements (in a table which models Class
      membership). The motivation for this partition is primarily query
      speed and scalability as most graphs will always have more rdf:type
      statements than others
    - All Quoted statements

    In addition it persists namespace mappings in a separate table
    """
    context_aware = True
    formula_aware = True
    transaction_aware = True
    regex_matching = PYTHON_REGEX
    autocommit_default = False
    _Store__node_pickler = None

    def open(self, db_path, create=True):
        """
        Opens the store specified by the configuration string. If
        create is True a store will be created if it does not already
        exist. If create is False and a store does not already exist
        an exception is raised. An exception is also raised if a store
        exists, but there is insufficient permissions to open the
        store.
        """
        if create:
            db = dbapi2.connect(db_path)
            c = db.cursor()


            # Only create tables if they don't already exist. If the first
            # exists, assume they all do.
            try:
                c.execute(CREATE_ASSERTED_STATEMENTS_TABLE % self._internedId)
            except dbapi2.OperationalError, e:
                # Raise any error aside from existing table.
                if (str(e) != 'table %s_asserted_statements already exists'
                        % self._internedId):
                    raise dbapi2.OperationalError, e
            else:
                c.execute(CREATE_ASSERTED_TYPE_STATEMENTS_TABLE %
                          self._internedId)
                c.execute(CREATE_QUOTED_STATEMENTS_TABLE % self._internedId)
                c.execute(CREATE_NS_BINDS_TABLE % self._internedId)
                c.execute(CREATE_LITERAL_STATEMENTS_TABLE % self._internedId)
                for tblName, indices in [
                    (
                        "%s_asserted_statements",
                        [
                            ("%s_A_termComb_index", ('termComb',)),
                            ("%s_A_s_index", ('subject',)),
                            ("%s_A_p_index", ('predicate',)),
                            ("%s_A_o_index", ('object',)),
                            ("%s_A_c_index", ('context',)),
                        ],
                    ),
                    (
                        "%s_type_statements",
                        [
                            ("%s_T_termComb_index", ('termComb',)),
                            ("%s_member_index", ('member',)),
                            ("%s_klass_index", ('klass',)),
                            ("%s_c_index", ('context',)),
                        ],
                    ),
                    (
                        "%s_literal_statements",
                        [
                            ("%s_L_termComb_index", ('termComb',)),
                            ("%s_L_s_index", ('subject',)),
                            ("%s_L_p_index", ('predicate',)),
                            ("%s_L_c_index", ('context',)),
                        ],
                    ),
                    (
                        "%s_quoted_statements",
                        [
                            ("%s_Q_termComb_index", ('termComb',)),
                            ("%s_Q_s_index", ('subject',)),
                            ("%s_Q_p_index", ('predicate',)),
                            ("%s_Q_o_index", ('object',)),
                            ("%s_Q_c_index", ('context',)),
                        ],
                    ),
                    (
                        "%s_namespace_binds",
                        [
                            ("%s_uri_index", ('uri',)),


                        ],
                    )]:
                    for indexName, columns in indices:
                        c.execute("CREATE INDEX %s on %s (%s)" %
                                  (indexName % self._internedId,
                                   tblName % self._internedId,
                                   ','.join(columns)))
                c.close()
                db.commit()
                db.close()

        self._db = dbapi2.connect(db_path)
        self._db.create_function("regexp", 2, regexp)

        if os.path.exists(db_path):
            c = self._db.cursor()
            c.execute("SELECT * FROM sqlite_master WHERE type='table'")
            tbls = [rt[1] for rt in c.fetchall()]
            c.close()

            missing = 0
            for tn in [tbl % (self._internedId) for tbl in table_name_prefixes]:
                if tn not in tbls:
                    missing += 1

            if missing == len(table_name_prefixes):
                return NO_STORE
            elif missing > 0:
                return CORRUPTED_STORE
            else:
                return VALID_STORE

        # The database doesn't exist - nothing is there
        return NO_STORE

Anatomy of the RDFLib Sleepycat key-value non-nested btree Store

BerkeleyDB/Sleepycat underpinning

At base, we have get(key) and put(key, data) as provided by the Sleepycat/BerkeleyDB core API:

get(key, default=None, txn=None, flags=0, ...)
Returns the data object associated with key.

put(key, data, txn=None, flags=0, ...)
Stores the key/data pair in the database.

Python's bsddb module

From the documentation for Python's (now deprecated) bsddb module:

The bsddb module provides an interface to the Berkeley DB library. Users can create hash, btree or record based library files using the appropriate open() call. Bsddb objects behave generally like dictionaries. Keys and values must be strings, however, so to use other objects as keys or to store other kinds of objects the user must serialize them somehow, typically using marshal.dumps() or pickle.dumps().

The two main points of interest here are i) the choice of hash, btree or record-based storage techniques typically provided by key-data stores and ii) the requirement for serialization of Python objects - which, for the case in point, are RDFLib objects: BNode, Literal, URIRef, Namespace, Graph, QuotedGraph, etc.
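A toy demonstration of both points, using the dict-like bsddb interface and pickled RDFLib terms (Python 2; the file name is arbitrary):

import bsddb
from pickle import dumps, loads
from rdflib import URIRef

db = bsddb.btopen('example.db', 'c')   # a btree file, created if missing
db[dumps(URIRef('http://example.org/s'))] = dumps(URIRef('http://example.org/o'))
key, data = db.first()
print(loads(data))                     # -> http://example.org/o
db.close()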


Modelling an RDF store using serialized key-data pairs

To illustrate (sketchily) how this basic principle of serialized key-data pairs is used to model an RDF store, here is a sort-of-pseudocode distillation of RDFLib's Sleepycat Store implementation (which uses non-nested btrees, specified via a relevant db flag). It shows the creation of the indices and the main key-data tables: contexts, namespace, prefix, k2i and i2k (the latter two being "key-to-index" and "index-to-key" respectively) and then, broadly, how a subject, predicate, object triple is serialized into keys and indices which are then put into the underlying key-data store:

def open(self, config):

    # creating and opening the DBs

    # Create the indices ...

    self.__indices = [None,] * 3
    self.__indices_info = [None,] * 3
    for i in xrange(0, 3):
        index_name = to_key_func(i)(("s", "p", "o"), "c")
        index = db.DB(db_env)
        index.open(index_name, dbopenflags)
        self.__indices[i] = index
        self.__indices_info[i] = \
            (index, to_key_func(i), from_key_func(i))

    # [ ... ]

    # Create the required key-data stores

    self.__contexts = db.DB(db_env)
    self.__contexts.open("contexts", dbopenflags)

    self.__namespace = db.DB(db_env)
    self.__namespace.open("namespace", dbopenflags)

    self.__prefix = db.DB(db_env)
    self.__prefix.open("prefix", dbopenflags)

    self.__k2i = db.DB(db_env)
    self.__k2i.open("k2i", dbopenflags)

    self.__i2k = db.DB(db_env)
    self.__i2k.open("i2k", dbopenflags)

# [ ... ]

def add(self, (subject, predicate, object), context=None, txn=None):

    # Serializing the subject, predicate, object and context

    s = _to_string(subject, txn=txn)
    p = _to_string(predicate, txn=txn)
    o = _to_string(object, txn=txn)
    c = _to_string(context, txn=txn)

    # Storing the serialized data (protected by a transaction
    # object, if provided)

    cspo, cpos, cosp = self.__indices


    value = cspo.get("%s^%s^%s^%s^" % (c, s, p, o), txn=txn)
    if value is None:
        self.__contexts.put(c, "", txn=txn)

        contexts_value = cspo.get(
            "%s^%s^%s^%s^" % ("", s, p, o), txn=txn) or ""
        contexts = set(contexts_value.split("^"))
        contexts.add(c)
        contexts_value = "^".join(contexts)
        assert contexts_value != None

        cspo.put("%s^%s^%s^%s^" % (c, s, p, o), "", txn=txn)
        cpos.put("%s^%s^%s^%s^" % (c, p, o, s), "", txn=txn)
        cosp.put("%s^%s^%s^%s^" % (c, o, s, p), "", txn=txn)

        if not quoted:
            cspo.put("%s^%s^%s^%s^" % ("", s, p, o), contexts_value, txn=txn)
            cpos.put("%s^%s^%s^%s^" % ("", p, o, s), contexts_value, txn=txn)
            cosp.put("%s^%s^%s^%s^" % ("", o, s, p), contexts_value, txn=txn)

A corresponding get method reconstructs (de-serializes) the triple from the indices and keys.
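Since every key is just the '^'-joined serialization shown above, the de-serialization side is string splitting followed by an i2k lookup; schematically (an illustrative helper, not the store's exact code):

def split_key(key):
    # a "c^s^p^o^" key splits back into its four serialized terms;
    # the trailing "^" leaves an empty final element, which is dropped
    c, s, p, o = key.split("^")[:-1]
    return c, s, p, o

Each of the four parts is then mapped back to a full RDFLib node via the i2k table and the store's node pickler.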

Indexing and storage issues

Returning to the issue of the choice of hash, btree or record-based storage, some of the issues that might usefully be taken into consideration are outlined in the Sleepycat DB manual:

Choosing between BTree and Hash

For small working datasets that fit entirely in memory, there is no difference between BTree and Hash. Both will perform just as well as the other. In this situation, you might just as well use BTree, if for no other reason than the majority of DB applications use BTree.

Note that the main concern here is your working dataset, not your entire dataset. Many applications maintain large amounts of information but only need to access some small portion of that data with any frequency. So what you want to consider is the data that you will routinely use, not the sum total of all the data managed by your application.

However, as your working dataset grows to the point where you cannot fit it all into memory, then you need to take more care when choosing your access method. Specifically, choose:

BTree if your keys have some locality of reference. That is, if they sort well and you can expect that a query for a given key will likely be followed by a query for one of its neighbors.

Hash if your dataset is extremely large. For any given access method, DB must maintain a certain amount of internal information. However, the amount of information that DB must maintain for BTree is much greater than for Hash. The result is that as your dataset grows, this internal information can dominate the cache to the point where there is relatively little space left for application data. As a result, BTree can be forced to perform disk I/O much more frequently than would Hash given the same amount of data.

Moreover, if your dataset becomes so large that DB will almost certainly have to perform disk I/O to satisfy a random request, then Hash will definitely outperform BTree because it has fewer internal records to search through than does BTree.

And, in addition, there is the usual raft of cryptic XXXTHISNTHAT flags for tweaking the inevitable variety of database speed/space/structure knobs.

Adapting the key-data approach to different back-ends

The design of the RDFLib Store facilitates the exploration of the above-mentioned tradeoffs, as shown in Drew Perttula's experiment with replacing the BerkeleyDB key-data database with the Tokyo Cabinet key-data database, using the pytc Python bindings.


Firstly, the Sleepycat Store is adapted by swapping out bsddb’s BDB (btree) API in favour of pytc’s HDB (hash) API ...

class BdbApi(pytc.HDB):
    """Make HDB's API look more like BerkeleyDB so we can share
    the Sleepycat code."""

    def get(self, key, txn=None):
        try:
            return pytc.HDB.get(self, key)
        except KeyError:
            return None

    def put(self, key, data, txn=None):
        try:
            return pytc.HDB.set(self, key, data)
        except KeyError:
            return None

    def delete(self, key, txn=None):
        try:
            return pytc.HDB.out(self, key)
        except KeyError:
            return None

The next step is to create a wrapper to substitute for the standard bsddb open() call, returning a BdbApi object instead of a bsddb object ...

def dbOpen(name):
    return BdbApi(name, pytc.HDBOWRITER | pytc.HDBOCREAT)

This can be slotted into place with minimal disturbance to the re-use of the (substantial amount of) remaining Sleepycat-based code ...

# Create the required key-data stores

# These 3 were BTree mode in Sleepycat, but currently I'm using TC hash
self.__contexts = dbOpen("contexts")
self.__namespace = dbOpen("namespace")
self.__prefix = dbOpen("prefix")

self.__k2i = dbOpen("k2i")
self.__i2k = dbOpen("i2k")

The pytc HashDB API unfortunately does not provide a cursor() object, whereas Sleepycat/BerkeleyDB does, and key parts of the functionality of the RDFLib Sleepycat Store implementation rely on the availability of that cursor. The consequent necessity of mimicking a cursor in Python, rather than being able to use the library's fast, C-coded version, rendered the exploration much less promising.

However, Tokyo Cabinet has subsequently given way to its anagrammatic successor Kyoto Cabinet, which offers a much richer API, including (crucially) a cursor object for the HashDB, and so the exploration recovers its promise; an RDFLib KyotoCabinet key-value Store is now undergoing performance trials.


Technical discussions and support

Comparative performance of Stores and back-ends

Assessing comparative performance is problematic - it's a "how long is a piece of string?" problem, i.e. one that is subject to multiply-interacting, often subtle, information-theoretic factors.

Also, RDFLib is a Python library for programming Pythonically with RDF and not necessarily an industrial-grade solution to a set of specialised technical domain problems.

And bulk RDF storage is a specialised technical domain problem.

So, the aftergoing is basically "colour supplement" material - mildly diverting but not what you'd call "a serious approach".

For info, the test platforms are i) a Dell Inspiron (32-bit x86) laptop with 2GB RAM running Ubuntu natty narwhal, used for running simple storage timings, and ii) a commodity-build 64-bit x86 desktop, again with 2GB of RAM running natty, that was drafted in to do the relative heavy lifting of 50,000 triples.

Neither machine is configured to be a suitable platform for this kind of work. Existing background processes such as cron jobs and running services such as anti-UCE measures will have introduced significant distortions compared to a result set obtained from running the tests on a dedicated machine with an OS tuned to just this task.

The test suite is part of the repository branch distribution and anyone with an interest in seeing results specific to their local installation can easily replicate the tests, e.g.:

$ python run_tests.py -s -q test/test_store/store_load_and_query_performance.py

So ... YMMV as they say, but on the other hand this isn't an untypical working set-up, so the timings will be in the ball-park for similar general-purpose installations.

Storage of 500 - 10k triples

500 - 25k triples in Notation 3 form, from a dataset generated by the sp2b test suite and generator.

The basic test is: read the data into a Graph, iterate over the triples in the graph, adding each to a store-backed Graph, thus testing just the time taken to write to permanent store.

store = Graph(store="MySQL/SQLite/Sleepycat/etc")
input = Graph()
input.parse(location="<inputlocn>", format="n3")
start_timer()
for triple in input:
    store.add(triple)
end_timer()

It was immediately clear that a test set of 25k triples was too limited to enable the BDBOptimized Store to demonstrate the expected speed advantage accruing from the optimized indexing system that it employs.

Similarly, it was evident from the figures turned in by BerkeleyDB (a transaction-adding Store layered on top of the Sleepycat Store) that transactions impose a heavy toll in terms of time taken writing to store, as one might well expect. Both Stores returned storage times an order of magnitude slower than any of the other Stores.

At triple counts of 20k and higher, the BDBOptimized Store hints at a query speed advantage. This is a topic for further exploration (see below) but for the current exercise, both BDB Stores must be considered to be oranges amongst the apples and so have been excluded from this reporting.

For some inexplicable reason, MySQL ran like a slug in these tests. This apparent poor performance is no doubt due to DBA ignorance on my part but nevertheless, the view is clearer when the distorting results are excluded, and so MySQL was sent to join the BerkeleyDB Stores in the naughty corner for making a nonsense of the vertical scale.


[Figure: time to store triples (_static/store/time_to_store_triples.png)]

Load and query 50k triples

50,000 triples of statements generated by the sp2b data generator, using the (arbitrarily-selected) first SPARQL query listed in the sp2b set:

PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX dcterms: <http://purl.org/dc/terms/>
PREFIX bench: <http://localhost/vocabulary/bench/>
PREFIX xsd: <http://www.w3.org/2001/XMLSchema#>

SELECT ?yr
WHERE {
    ?journal rdf:type bench:Journal .
    ?journal dc:title "Journal 1 (1940)"^^xsd:string .
    ?journal dcterms:issued ?yr
}

For this test, the BDB Stores and MySQL returned to the arena.

Extended storage times are the norm for RDF in bulk.

[Figure: storing 50k triples (_static/store/store_50ktriples.png)]

Query times will play a significant role in UI.

[Figure: query answer times (_static/store/answer_query.png)]

The increased number of triples enabled the BDBOptimized Store to show the anticipated advantages of its optimized indexing.


Patterns and Optimizations for RDF Queries over Named Graph Aggregates

Original post by Chimezie Ogbuji

In a previous post I used the term 'Conjunctive Query' to refer to a kind of RDF query pattern over an aggregation of named graphs. However, the term (apparently) has already-established roots in database querying and has a different meaning than what I intended. It's a pattern I have come across often and is, for me, a major requirement for an RDF query language, so I'll try to explain by example.


Consider two characters, King (Wen) and his heir / son (Wu) of the Zhou Dynasty. Let's say they each have a FOAF graph about themselves and the people they know, within a larger database which holds the FOAF graphs of every historical character in literature.

The FOAF graphs for both Wen and Wu are (preceded by the name of each graph):

<urn:literature:characters:KingWen>

@prefix : <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://purl.org/vocab/relationship/> .

<http://en.wikipedia.org/wiki/King_Wen_of_Zhou> a :Person ;
    :name "King Wen" ;
    :mbox <mailto:[email protected]> ;
    rel:parentOf [ a :Person; :mbox <mailto:[email protected]> ] .

<urn:literature:characters:KingWu>

@prefix : <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://purl.org/vocab/relationship/> .

<http://en.wikipedia.org/wiki/King_Wu_of_Zhou> a :Person ;
    :name "King Wu" ;
    :mbox <mailto:[email protected]> ;
    rel:childOf [ a :Person; :mbox <mailto:[email protected]> ] .

In each case, Wikipedia URLs are used as identifiers for each historical character. There are better ways for using Wikipedia URLs within RDF, but we'll table that for another conversation.

Now let's say a third party read a few stories about "King Wen" and finds out he has a son; however, he/she doesn't know the son's name or the URL of either King Wen or his son. If this person wants to use the database to find out about King Wen's son by querying it with a reasonable response time, he/she has a few things going for him/her:

foaf:mbox is an owl:InverseFunctionalProperty

and so can be used for uniquely identifying people in the database.

The database is organized such that all the out-going relationships (between foaf:Persons – foaf:knows, rel:childOf, rel:parentOf, etc.) of the same person are asserted in the FOAF graph associated with that person and nowhere else.

So, the relationship between King Wen and his son, expressed with the term rel:parentOf, will only be asserted in

urn:literature:characters:KingWen.

Yes, the idea of a character from an ancient civilization with an email address is a bit cheeky, but foaf:mbox is the only inverse functional property in FOAF to use with this example, so bear with me.

Now, both Versa and SPARQL support restricting queries with the explicit name of a graph, but there are no constructsfor determining all the contexts of an RDF triple or:

The names of all the graphs in which a particular statement (or statements matching a specific pattern) are asserted.

This is necessary for a query plan that wishes to take advantage of [2]. Once we know the name of the graph in which all statements about King Wen are asserted, we can limit all subsequent queries about King Wen to that same graph without having to query across the entire database.

Similarly, once we know the email of King Wen's son we can locate the other graphs with assertions about this same email address (knowing they refer to the same person [1]) and query within them for the URL and name of King Wen's son. This is a significant optimization opportunity and key to this query pattern.


I can't speak for other RDF implementations, but RDFLib has a mechanism for this at the API level: a method called quads() which takes a tuple of three terms ((subject, predicate, object)) and returns tuples of size 4, which correspond to all the triples (across the database) that match the pattern, along with the graph that the triples are asserted in:

for s, p, o, containingGraph in aConjunctiveGraph.quads((s, p, o)):
    do_something_with(containingGraph)

It's likely that most other QuadStores have similar mechanisms, and given the great value in optimizing queries across large aggregations of named RDF graphs, it's a strong indication that RDF query languages should provide the means to express such a mechanism.
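Applied to the King Wen example, the two-step plan sketches out as follows (vocabulary bindings as in the graphs above; this assumes a context-aware store behind the ConjunctiveGraph):

from rdflib import ConjunctiveGraph, Literal, Namespace

FOAF = Namespace('http://xmlns.com/foaf/0.1/')
REL = Namespace('http://purl.org/vocab/relationship/')

db = ConjunctiveGraph()  # in practice, backed by a persistent quad store

# Step 1: one database-wide scan finds the graph asserting King Wen's name
for s, p, o, wenGraph in db.quads((None, FOAF.name, Literal('King Wen'))):
    # Step 2: every further King Wen lookup stays inside wenGraph
    for son in wenGraph.objects(s, REL.parentOf):
        son_mbox = wenGraph.value(son, FOAF.mbox)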

Most of what is needed is already there (in both Versa and SPARQL). Consider a SPARQL extension function which returns a boolean indicating whether the given triple pattern is asserted in a graph with the given name:

rdfg:AssertedIn(?subj,?pred,?obj,?graphIdentifier)

We can then get the email of King Wen’s son efficiently with:

BASE <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>

SELECT ?mbox
WHERE {
    GRAPH ?foafGraph {
        ?kingWen :name "King Wen";
                 rel:parentOf [ a :Person; :mbox ?mbox ] .
    }
    FILTER (rdfg:AssertedIn(?kingWen, :name, "King Wen", ?foafGraph)) .
}

Now, it is worth noting that this mechanism can be supported explicitly by asserting provenance statements associating the people the graphs are about with the graph identifiers themselves, such as:

<urn:literature:characters:KingWen> :primaryTopic <http://en.wikipedia.org/wiki/King_Wen_of_Zhou> .

However, I think that the relationship between an RDF triple and the graph in which it is asserted, although currently outside the scope of the RDF model, should have its semantics outlined in the RDF abstract syntax instead of being expressed with terms in an RDF vocabulary. The demonstrated value in RDF query optimization makes for a strong argument:

BASE <http://xmlns.com/foaf/0.1/>
PREFIX rel: <http://purl.org/vocab/relationship/>
PREFIX rdfg: <http://www.w3.org/2004/03/trix/rdfg-1/>

SELECT ?kingWu, ?sonName
WHERE {
    GRAPH ?wenGraph {
        ?kingWen :name "King Wen";
                 :mbox ?wenMbox;
                 rel:parentOf [ a :Person; :mbox ?wuMbox ].
    }
    FILTER (rdfg:AssertedIn(?kingWen, :name, "King Wen", ?wenGraph)).
    GRAPH ?wuGraph {
        ?kingWu :name ?sonName;
                :mbox ?wuMbox;
                rel:childOf [ a :Person; :mbox ?wenMbox ].
    }


    FILTER (rdfg:AssertedIn(?kingWu, :name, ?sonName, ?wuGraph)).
}

Generally, this pattern is any two-part RDF query across a database (a collection of multiple named graphs) where the first part of the query is scoped to the entire database and identifies terms that are local to a specific named graph, and the second part of the query is scoped to that named graph.


Summary

Overview of using MySQL or PostgreSQL as a triple store.

Introduction

The RDFLib 3 plugin interface supports using either a MySQL or a PostgreSQL database to store and query your RDF graphs. This document describes how to use these backends, from loading large datasets into them to taking advantage of their query capabilities.

Bulk loading

If you need to load a large number of RDF statements into an empty database, RDFLib provides a module that can be run as a script to help you with this task. You can run this module with the command

$ python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader [options] <DB Type>

Note that several of the options are very important. Let's start with an example.

If you wanted to load the RDF/XML file profiles.rdf and the N-Triples file targets.nt into an empty MySQL database named plan, located at host bubastis, accessible to user ozymandias with password ramsesIII, you could use the following command:

$ python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader \
    -c db=plan,host=bubastis,user=ozymandias,password=ramsesIII \
    -i plan \
    -x profiles.rdf --nt=targets.nt \
    MySQL

Here, we're connecting to a MySQL database, but this script can also utilize a PostgreSQL database with the PostgreSQL keyword. The -c option allows you to specify the connection details for the target database; it is a comma-separated string of variable assignments, as in the example above. As in that example, it can specify the database with db, the name of the target machine with host, the username with user, and the password for that user with password. Also, you can specify the port on the target machine with port.
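Parsing that connection string amounts to splitting on commas and then on equals signs; the loader does something equivalent to this sketch (a hypothetical helper, shown only to pin down the format):

def parse_connection_string(config):
    # 'db=plan,host=bubastis,user=ozymandias,password=ramsesIII'
    # -> {'db': 'plan', 'host': 'bubastis', ...}
    return dict(item.split('=', 1) for item in config.split(','))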

A single database can support multiple RDF stores; each such store has an additional store "identifier", which you must provide with the -i option.

Once we have connected, we can load data from files that can be in various formats. This script supports identifying RDF/XML files to load with the -x option, TriX files with the -t option, N3 files with the -n option, N-Triples files with the --nt option, and RDFa files with the -a option. In addition, you can load all the files in a directory, assuming that they all have the same format. To do this, use the --directory option to identify the directory containing the files, and the --format option to specify the format of the files in that directory.

There are a few advanced options available for this script; you can use the -h option to get a summary of all the available options. You may also want to see the "Benchmarking" section, below, for specific examples that you can generalize.

Query

The RDFLib SPARQL implementation allows you to use the SPARQL language to query your RDF stores. The default implementation works entirely in memory; with a SQL backend, two different RDFLib components offer separate approaches to utilizing that backend to optimize the query. This section will eventually provide generic instructions for how to use the different query options, but until I get around to writing it, see the "Benchmarking" section, below, for specific examples that you can generalize.
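In the meantime, the generic programmatic route is rdflib's ordinary query API; a sketch, assuming a ConjunctiveGraph g opened over one of these backends as shown in the "API" section below:

results = g.query("""
    PREFIX p: <http://dbpedia.org/property/>
    SELECT ?film
    WHERE { ?film p:starring <http://dbpedia.org/resource/Kevin_Bacon> }
    """)
for row in results:
    print(row)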


Benchmarking

When working on the various SQL backends, I found it useful to compare the results of the RDFLib store with the results obtained in Christian Becker's RDF Store Benchmarks with DBpedia.

Walking through this process serves both as a good example of how to load and query large RDF datasets with an SQL backend and as a way to judge the RDFLib backend against other options. Indeed, the DBpedia data set is interesting in its own right; loading and querying DBpedia may be a reasonably common use case on its own. For our benchmarking, we will compare both the MySQL and the PostgreSQL backends.

I obtained a set of results for this benchmark dataset on a dual core 1.86 GHz machine with 3.5 GB of RAM, running Ubuntu GNU/Linux 8.10. These specs do not completely align with Becker's configuration, so the results are only roughly comparable. Also, note that I used MySQL version 5.0.67, and, importantly, PostgreSQL 8.4beta1.

Version 8.4 of PostgreSQL contains a large performance enhancement over previous versions, so if you want the best performance (and if you want to reproduce the results in this report), you will need to install your own PostgreSQL server until the next stable version makes it out into the wild.

Loading

To begin, we first need to load our data. To do this, we need to first create both a MySQL and a PostgreSQL database which will receive the data; these examples assume that this database is named 'Becker_dbpedia'. This load process also assumes that we have downloaded and extracted the benchmark datasets to a data directory relative to the current directory. Once we have created a database, we can load that database (and time the load) with the following command:

$ time python -m rdfextras.store.FOPLRelationalModel.MySQLMassLoader \
    -c db=Becker_dbpedia,host=localhost,user=username,password=password \
    -i Becker_dbpedia \
    --nt=data/infoboxes-fixed.nt --nt=data/geocoordinates-fixed.nt \
    --nt=data/homepages-fixed.nt \
    MySQL

Note that the name MySQLMassLoader is a misnomer; it started life targeting MySQL, but now supports both MySQL and PostgreSQL through its first positional parameter. As such, we can load the data into PostgreSQL by changing the argument from MySQL to PostgreSQL (in addition to changing any relevant connection details in the connection string).

The results for the bulk load times are listed below. Note that in addition to the hardware differences listed above, we are also doing a bulk load of all the pieces at once, instead of loading the three pieces in stages.

Backend                  Load time (seconds)
MySQL                    28,612
PostgreSQL (8.4beta1)    7,812

Note: the PostgreSQL and MySQL load strategies are very different, which may account for the dramatic difference. Interestingly, it was a missing feature (the IGNORE keyword on the delimited load statement) that led to the construction of a different load mechanism for PostgreSQL, but it may turn out that the alternate load mechanism works better on MySQL as well. I will continue to experiment with that.

Queries

Becker's benchmark set includes five amusing queries; we can currently run the first three of these queries, but the last two use SPARQL features that are not currently supported by the RDFLib SPARQL processor. To run these queries, we will use the rdfextras.tools.sparqler script.

For both backends, we will run each query in up to four different ways. The RDFLib SPARQL processor has a new component that can completely translate SPARQL queries to equivalent SQL queries directly against the backend, so we will run each query using that component, and again without it. Also, for each component run, we may also provide range metadata to the processor as an optimization.


All available information about a specific subject

We run this query using the SPARQL to SQL translator, using the sparqler.py command line below.

$ time python /home/john/development/rdfextras/tools/sparqler.py -s MySQL \
    db=Becker_dbpedia,host=localhost,user=username,password=password \
    Becker_dbpedia \
    'SELECT ?p ?o WHERE {
       <http://dbpedia.org/resource/Metropolitan_Museum_of_Art> ?p ?o }' > results

We run this query using the original SPARQL implementation using the command line below.

$ time python /home/john/development/rdfextras/tools/sparqler.py \
    --originalSPARQL -s MySQL \
    db=Becker_dbpedia,host=localhost,user=username,password=password \
    Becker_dbpedia \
    'SELECT ?p ?o WHERE {
       <http://dbpedia.org/resource/Metropolitan_Museum_of_Art> ?p ?o }' > results

We must simply change 'MySQL' to 'PostgreSQL' in the above commands (and change connection parameters as necessary) to run the same queries against the PostgreSQL backend.

The results for this query are listed below. All times are in seconds. For this query, we do not add any range information, because we don't know anything about the properties that may be involved.

Backend                  SPARQL to SQL translator    Original implementation
MySQL                    2.063                       2.013
PostgreSQL (8.4beta1)    1.993                       2.002

Two degrees of separation from Kevin Bacon To run this query, we can replace the query in the above commandswith the new query:

PREFIX p: <http://dbpedia.org/property/>

SELECT ?film1 ?actor1 ?film2 ?actor2
WHERE {
    ?film1 p:starring <http://dbpedia.org/resource/Kevin_Bacon> .
    ?film1 p:starring ?actor1 .
    ?film2 p:starring ?actor1 .
    ?film2 p:starring ?actor2 .
}

The results for this query are listed below. All times are in seconds. This time, we will also run the query with the range optimization; we know the http://dbpedia.org/property/starring property only ranges over resources, so we can add -r http://dbpedia.org/property/starring to the query command line to provide this hint to the query processor.

Backend                  Translator    Original    Translator with hint    Original with hint
MySQL                    843           645         23.58                   25.216
PostgreSQL (8.4beta1)    68.36         82.64       23.38                   80.45

Unconstrained query for artworks, artists, museums and their directors

To run this query, we can replace the query in the above commands with the new query:

PREFIX p: <http://dbpedia.org/property/>

SELECT ?artist ?artwork ?museum ?director
WHERE {
    ?artwork p:artist ?artist .
    ?artwork p:museum ?museum .
    ?museum p:director ?director
}

The results for this query are listed below. All times are in seconds. We will not use any range optimizations for this query.

Backend                  SPARQL to SQL translator   Original implementation
MySQL                    1026                       336
PostgreSQL (8.4beta1)    98                         5.074

API

This section describes how to use the RDFLib API to use either a MySQL or PostgreSQL backend as a ConjunctiveGraph. This section assumes that you have MySQL or PostgreSQL installed and configured correctly (particularly permissions), as well as either the MySQLdb, the pgdb, the postgresql or the psycopg Python modules installed.

Setting up the database server is outside the scope of this document, as is installing the modules.

Here’s an example:

import rdflib
from rdflib import plugin, term, graph, namespace

db_type = 'PostgreSQL'  # Use 'MySQL' instead, if that's what you have
store = plugin.get(db_type, rdflib.store.Store)(
    identifier='some_ident',
    configuration='user=u,password=p,host=h,db=d')

store.open(create=True)  # only True when opening a store for the first time

g = graph.ConjunctiveGraph(store)
sg = graph.Graph(store, identifier=term.URIRef(
    'tag:[email protected],2009-08-20:bookmarks'))
sg.add((term.URIRef('http://www.google.com/'),
        namespace.RDFS.label,
        term.Literal('Google home page')))
sg.add((term.URIRef('http://wikipedia.org/'),
        namespace.RDFS.label,
        term.Literal('Wikipedia home page')))

Other general Graph/ConjunctiveGraph API uses follow the same pattern.
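For instance, a minimal sketch of follow-on API usage, assuming the store, graphs and triples created above (nothing here beyond standard rdflib calls):

# read back what was stored; g is the ConjunctiveGraph, sg the bookmarks graph
print(len(g))                     # triples visible through the store
print(sg.serialize(format="n3"))  # serialize just the named subgraph

# triple patterns work as usual; None acts as a wildcard
for s, p, o in sg.triples((None, namespace.RDFS.label, None)):
    print("%s -> %s" % (s, o))

store.close()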


BNode Drama for your Mama

Blog posting by Chimezie on 23 Sept 2005

The context, the question...

You know you are a geek when it's 5am and you are wrestling with existential quantification and its value in querying. This was triggered originally by the ongoing effort to extend an already expressive pattern-based RDF querying language to cover more use cases. The motivation being that such patterns should be expressive beyond just the level of triple-matching, since the core RDF model consists of a level of granularity below statements (you have literals, resources, and bnodes, ...). I asked myself if there was a justifiable reason why Versa at its core does not include BNodes:


Blank nodes

Blank nodes are treated as simply indicating the existence of a thing, without using, or saying anything about, the name of that thing. (This is not the same as assuming that the blank node indicates an 'unknown' URI reference; for example, it does not assume that there is any URI reference which refers to the thing. The discussion of Skolemization in appendix A is relevant to this point.)

I don't remember the original motivation for leaving BNodes out of the core query data types, but in retrospect I think it was a good decision, and not only because the SPARQL specification does something similar (in interpreting BNodes as an open-ended variable). But it's worth noting that the section on blank nodes appearing in a query, as opposed to appearing in a query result (or existing in the underlying knowledge base), is quite short:

A blank node can appear in a query pattern. It behaves as a variable; a blank node in a query pattern may match any RDF term.

Anyway, at the time I noticed this lack of BNodes in query languages, I had a misconception about BNodes. I thought they represented individual things we want to make statements about but whose identification we don't know, or don't want to have to worry about assigning (this is probably 90% of the way BNodes are used in reality). This confusion came from the practical way BNodes are almost always handled by RDF data stores (Skolemization):

Skolemization is a syntactic transformation routinely used in automatic inference systems in which existential variables are replaced by 'new' functions - function names not used elsewhere - applied to any enclosing universal variables. In RDF, Skolemization amounts to replacing every blank node in a graph by a 'new' name, i.e. a URI reference which is guaranteed to not occur anywhere else. In effect, it gives 'arbitrary' names to the anonymous entities whose existence was asserted by the use of blank nodes: the arbitrariness of the names ensures that nothing can be inferred that would not follow from the bare assertion of existence represented by the blank node.
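To make the transformation concrete, here is a minimal sketch of Skolemization over an rdflib graph; the skolemize function and the urn:uuid naming scheme are illustrative choices for the example, not rdflib API:

import uuid
from rdflib import Graph, URIRef
from rdflib.term import BNode

def skolemize(graph):
    """Replace every blank node with a 'new' URI reference that is
    (with overwhelming probability) guaranteed not to occur elsewhere."""
    new_names = {}  # one fresh name per distinct blank node

    def skolem(node):
        if isinstance(node, BNode):
            if node not in new_names:
                new_names[node] = URIRef('urn:uuid:%s' % uuid.uuid4())
            return new_names[node]
        return node

    out = Graph()
    for s, p, o in graph:
        out.add((skolem(s), skolem(p), skolem(o)))
    return out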

This misconception was clarified when Bijan Parsia (scope(PyChinko) => scope(FuXi)) expressed that he had issue with my assertion(s) that there are some compromising redundancies with BNodes, Literals, and simple entailment with regards to building programmatic APIs for them.

Then the light bulb went off: the semantics of BNodes are (as he put it) much stronger than the way they are most often used. Most people who use BNodes don't mean to state that there is a class of things which have the asserted set of statements made about them. Consider the difference between:

• Who are all the people Chime knows?

• There is someone Chime knows, but I just don’t know his/her name right now

• Chime knows someone (dudn’t madder who)

The first scenario is the basic use case for variable resolution in an RDF query and is asking for the resolution of the variable ?knownByChime in:

<http://metacognition.info/profile/webwho.xrdf#chime> foaf:knows ?knownByChime .

Which can be [expressed] in Versa (currently) as:

resource('http://metacognition.info/profile/webwho.xrdf#chime')-foaf:knows->*

Or eventually (hopefully) as:

foaf:knows(<http://metacognition.info/profile/webwho.xrdf#chime>)

And in SPARQL as:

select ?knownByChime
where {
    <http://metacognition.info/profile/webwho.xrdf#chime> foaf:knows ?knownByChime
}

The second case is the most common way people use BNodes. You want to say Chime knows someone but don't know a permanent identifier for this person, or care to at the time you make the assertion:

<http://metacognition.info/profile/webwho.xrdf#chime> foaf:knows _:knownByChime

"The proper use for BNodes is as scoped existentials within ontological assertions"

But RDF-MT specifically states that BNodes are not meant to be interpreted in this way only. Their semantics are much stronger. In fact, as Bijan pointed out to me, the proper use for BNodes is as scoped existentials within ontological assertions, for example owl:Restrictions, which allow you to say things like:

"The named class KnowsChime consists of everybody who knows Chime", and express that as:

@prefix mc: <http://metacognition.info/profile/webwho.xrdf#> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
:KnowsChime a owl:Class;
    rdfs:subClassOf [
        a owl:Restriction;
        owl:onProperty foaf:knows;
        owl:hasValue mc:chime
    ];
    rdfs:label "KnowsChime";
    rdfs:comment "Everybody who knows Chime" .

The fact that BNodes aren't meant to be used in the way they often are leads to some suggested modifications to allow BNodes to be used as 'temporary identifiers' in order to simplify query resolution. But as clarified in the same thread, BNodes in a query don't make much sense - which is the conclusion I'm coming around to: there is no use case for asserting an existential quantification while querying for information against a knowledge base. Using a variable (in the way SPARQL does) should be sufficient. In fact, all RDF querying use cases (and languages) seem to be reducible to variable resolution.

This last part is worth noting because it suggests that if you have a library that handles variable resolution (such as rdflib's most recent addition) you can map any query language (Versa/SPARQL/RDFQueryLanguage_X) to it by reducing it to a set of triple patterns with the variables you wish to resolve.
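For instance, a minimal sketch of that reduction with rdflib, where a single triple pattern uses None for the variable slot (the URIs are the ones from the examples above, and the graph is assumed to be already populated):

from rdflib import Graph, URIRef

FOAF_KNOWS = URIRef('http://xmlns.com/foaf/0.1/knows')
chime = URIRef('http://metacognition.info/profile/webwho.xrdf#chime')

g = Graph()  # assume this has been loaded with FOAF data

# "who does Chime know?" reduced to one triple pattern,
# with None standing in for the variable ?knownByChime
for s, p, known_by_chime in g.triples((chime, FOAF_KNOWS, None)):
    print(known_by_chime)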

Conclusions

So my conclusions?:

Blank Nodes are a necessary component in the model (and any persistence API) that unfortunately have much stronger semantics (existential quantification) than their most common use (as temporary identifiers) would suggest.

The distinction between the way BNodes are most often used (as a syntactic shorthand for a single resource for which there is no known identity - at the time) and the formal definition of BNodes is very important to note - especially for those who are very much wed to their BNodes, as Shelley Powers has shown to be :).

Finally, BNodes emphatically do not make sense in the context of a query - since they become infinitely resolvable variables, which is not very useful. This confusion is further proof that (once again), for the sake of minimizing said confusion and misinterpretation of some very complicated axioms, there is plenty of value in parenthetically (if not logically) divorcing (pun intended) RDF model theoretics from the nuts and bolts of the underlying model.

Additional Reading:

• How RDF Databases Differ from Other NoSQL Solutions

• RDF on Cloud Number Nine


• Versa: Path-Based RDF Query Language

• Versa 2.0

• A Semantic Graph Query Language

• Experimenting with MongoDB as an RDF Store

• SHARD

Resources

• SHARD-3STORE

• Berlin SPARQL Benchmark (BSBM)

• BSBM Tools - Java-implemented test dataset generator

• DAWG SPARQL Testcases

• Thea SWIProlog-OWL

• RDFS and OWL2 RL

Scamped Notes

Rob Vesse

The one I settled on is essentially to have a single simple document which represents the existence of the Graph:

{
    "name" : "some-name",
    "uri" : "http://example.org/graph"
}

And then to have a document for each individual triple:

{
    "subject" : "<http://example.org/subject>",
    "predicate" : "<http://example.org/predicate>",
    "object" : "<http://example.org/object>",
    "graphuri" : "http://example.org/graph"
}

I took advantage of MongoDB's indexing capabilities to generate indexes on Subject, Predicate, Object and Graph URI, and then used these to apply SPARQL queries over MongoDB, and it worked reasonably well. Though as I note in my blog post, it isn't going to replace dedicated triple stores but does work well for small scale stores - actual performance would vary depending on your data and how you use it in your application.
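A minimal sketch of the approach Rob describes, using pymongo (the collection and field names follow his documents; the database name and the lookup are illustrative):

from pymongo import MongoClient

db = MongoClient().rdfstore

# one document for the graph, one document per triple, as described above
db.graphs.insert_one({"name": "some-name",
                      "uri": "http://example.org/graph"})
db.triples.insert_one({"subject": "<http://example.org/subject>",
                       "predicate": "<http://example.org/predicate>",
                       "object": "<http://example.org/object>",
                       "graphuri": "http://example.org/graph"})

# indexes on each slot support basic triple-pattern lookups
for field in ("subject", "predicate", "object", "graphuri"):
    db.triples.create_index(field)

# resolve the pattern (<http://example.org/subject>, ?p, ?o)
for doc in db.triples.find({"subject": "<http://example.org/subject>"}):
    print(doc["predicate"], doc["object"])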

Vasiliy Faronov

Suppose I am building a Linked Data client app based on Python and RDFLib, and I want to do some reasoning. Most likely I have a few vocabularies that are dear to my heart, and want to do RDFS reasoning with them, i.e. materialize superclass membership, superproperty values etc. I also want to handle owl:sameAs in instance data. Support for the rest of OWL is welcome but not essential.

The graphs I will be working with are rather small, let's say on the order of 10,000 triples (all stored in memory), but I need to reason in real-time (e.g. my client is an end-user app that works with Linked Data) and so delays should be small.

But most importantly, the solution has to be as easy to use as possible. Ideally:

import reasoner
reasoner.infer_all(my_rdflib_graph)

What are my best options?
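For the superclass-membership part of that wish list, a minimal sketch of what such a reasoner module might contain; infer_all mirrors the ideal API above, covers only the one RDFS rule shown, and is not a full RDFS or owl:sameAs implementation:

from rdflib.namespace import RDF, RDFS

def infer_all(graph):
    """Naively materialize rdfs:subClassOf-based type inference:
    (x rdf:type C), (C rdfs:subClassOf D) => (x rdf:type D).
    Loops until no new triple is added (a fixpoint)."""
    added = True
    while added:
        added = False
        for klass, _, superklass in list(graph.triples((None, RDFS.subClassOf, None))):
            for instance in list(graph.subjects(RDF.type, klass)):
                if (instance, RDF.type, superklass) not in graph:
                    graph.add((instance, RDF.type, superklass))
                    added = True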

author    Graham Higgins
contact   Graham Higgins, [email protected]
version   0.1

1.3 Tools

The rdflib "tools" directory and its small collection of tools have been removed from rdflib core and placed into rdfextras.tools.

The collection has shrunk slightly with the removal of tools that fail to build, are otherwise obsolete, and/or no longer supported by the author.


1.3.1 Tools and utilities

Contents:


CSV2RDF

csv2rdf.py -b <instance-base> -p <property-base> [-c <classname>] [-i <identity column(s)>] [-l <label columns>] [-s <N>] [-o <output>] [-f configfile] [--col<N> <colspec>] [--prop<N> <property>] [-d <delim>] [-C] [files...]

Reads csv files from stdin or given files

if -d is given, use this delimiter

if -s is given, skips N lines at the start

Creates a URI from the columns given to -i, or automatically by numbering if none is given

Outputs RDFS labels from the columns given to -l

if -c is given adds a type triple with the given classname

if -C is given, the class is defined as rdfs:Class

Outputs one RDF triple per column in each row.

Output is in n3 format.

Output is stdout, unless -o is specified


Long options also supported: --base, --propbase, --ident, --class, --label, --out, --defineclass

Long options --col0, --col1, ... can be used to specify conversion for columns. Conversions can be: float(), int(), split(sep, [more]), uri(base, [class]), date(format)

Long options --prop0, --prop1, ... can be used to use specific properties, rather than ones auto-generated from the headers

-f says to read config from a .ini/config file - the file must contain one section called csv2rdf, with keys like the long options, i.e.:

[csv2rdf]
out=output.n3
base=http://example.org/
col0=split(";")
col1=split(";", uri("http://example.org/things/",
     "http://xmlns.com/foaf/0.1/Person"))
col2=float()
col3=int()
col4=date("%Y-%b-%d %H:%M:%S")

class rdfextras.tools.csv2rdf.CSV2RDF


describer

class rdfextras.tools.describer.Describer(graph=None, about=None, base=None)

about(subject, **kws)
Sets the current subject. Will convert the given object into a URIRef if it's not an Identifier.

Usage:

>>> d = Describer()
>>> d._current()
rdflib.term.BNode(...)
>>> d.about("http://example.org/")
>>> d._current()
rdflib.term.URIRef(u'http://example.org/')

rdftype(t)
Shorthand for setting rdf:type of the current subject.

Usage:

>>> from rdflib import URIRef
>>> from rdflib.namespace import RDF, RDFS
>>> d = Describer(about="http://example.org/")
>>> d.rdftype(RDFS.Resource)
>>> (URIRef('http://example.org/'), RDF.type, RDFS.Resource) in d.graph
True

rel(p, o=None, **kws)
Set an object for the given property. Will convert the given object into a URIRef if it's not an Identifier. If none is given, a new BNode is used.

Returns a context manager for use in a with block, within which the given object is used as current subject.

Usage:


>>> from __future__ import with_statement
>>> from rdflib import URIRef
>>> from rdflib.namespace import RDF, RDFS
>>> d = Describer(about="/", base="http://example.org/")
>>> _ctxt = d.rel(RDFS.seeAlso, "/about")
>>> d.graph.value(URIRef('http://example.org/'), RDFS.seeAlso)
rdflib.term.URIRef(u'http://example.org/about')

>>> with d.rel(RDFS.seeAlso, "/more"):
...     d.value(RDFS.label, "More")
>>> (URIRef('http://example.org/'), RDFS.seeAlso,
...  URIRef('http://example.org/more')) in d.graph
True
>>> d.graph.value(URIRef('http://example.org/more'), RDFS.label)
rdflib.term.Literal(u'More')

rev(p, s=None, **kws)
Same as rel, but uses the current subject as the object of the relation. The given resource is still used as subject in the returned context manager.

Usage:

>>> from __future__ import with_statement
>>> from rdflib import URIRef
>>> from rdflib.namespace import RDF, RDFS
>>> d = Describer(about="http://example.org/")
>>> with d.rev(RDFS.seeAlso, "http://example.net/"):
...     d.value(RDFS.label, "Net")
>>> (URIRef('http://example.net/'), RDFS.seeAlso,
...  URIRef('http://example.org/')) in d.graph
True
>>> d.graph.value(URIRef('http://example.net/'), RDFS.label)
rdflib.term.Literal(u'Net')

value(p, v, **kws)
Set a literal value for the given property. Will cast the value to a Literal if a plain literal is given.

Usage:

>>> from rdflib import URIRef
>>> from rdflib.namespace import RDF, RDFS
>>> d = Describer(about="http://example.org/")
>>> d.value(RDFS.label, "Example")
>>> d.graph.value(URIRef('http://example.org/'), RDFS.label)
rdflib.term.Literal(u'Example')
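Putting the methods together, a small sketch of a typical Describer session (the URIs and the FOAF terms are illustrative):

from rdflib import Namespace
from rdflib.namespace import RDFS
from rdfextras.tools.describer import Describer

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

d = Describer(base="http://example.org/")
d.about("/doc")                       # set the current subject
d.rdftype(FOAF.Document)              # add an rdf:type triple
d.value(RDFS.label, "A document")     # add a literal value
with d.rel(FOAF.maker, "/people/someone"):
    d.value(FOAF.name, "Someone")     # describe the related resource

print(d.graph.serialize(format="n3"))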


rdfpipe

A commandline tool for parsing RDF in different formats and serializing the resulting graph to a chosen format.

rdfextras.tools.rdfpipe.guess_format(fpath, fmap=None)
Guess RDF serialization based on file suffix. Uses SUFFIX_FORMAT_MAP unless fmap is provided. Examples:

>>> guess_format('path/to/file.rdf')
'xml'
>>> guess_format('path/to/file.owl')
'xml'
>>> guess_format('path/to/file.ttl')
'n3'
>>> guess_format('path/to/file.xhtml')
'rdfa'
>>> guess_format('path/to/file.svg')
'rdfa'
>>> guess_format('path/to/file.xhtml', {'xhtml': 'grddl'})
'grddl'

This also works with just the suffixes, with or without leading dot, and regardless of letter case:

>>> guess_format('.rdf')
'xml'
>>> guess_format('rdf')
'xml'
>>> guess_format('RDF')
'xml'

rdfextras.tools.rdfpipe.parse_and_serialize(input_files, input_format, guess, outfile, output_format, ns_bindings, store_conn='', store_type='IOMemory')

rdfextras.tools.rdfpipe._format_and_kws(fmt)

>>> _format_and_kws("fmt")
('fmt', {})
>>> _format_and_kws("fmt:+a")
('fmt', {'a': True})
>>> _format_and_kws("fmt:a")
('fmt', {'a': True})
>>> _format_and_kws("fmt:+a,-b")
('fmt', {'a': True, 'b': False})
>>> _format_and_kws("fmt:c=d")
('fmt', {'c': 'd'})

rdfextras.tools.rdfpipe.make_option_parser()

rdfextras.tools.rdfpipe.main()

1.4 Utils

rdfextras.utils contains collections of utility functions.


1.4.1 Utilities


Contents:

termutils

Convenience functions for working with Terms and Graphs.


normalizeGraph()

rdfextras.utils.termutils.normalizeGraph(graph)
Takes an instance of a Graph and returns the instance's identifier and type.

Types are U for a Graph, F for a QuotedGraph and B for a ConjunctiveGraph

>>> from rdflib import plugin
>>> from rdflib.graph import Graph, ConjunctiveGraph, QuotedGraph
>>> from rdflib.store import Store
>>> from rdflib import URIRef, Namespace
>>> from rdfextras.utils.termutils import normalizeGraph
>>> memstore = plugin.get('IOMemory', Store)()
>>> g = Graph(memstore, URIRef("http://purl.org/net/bel-epa/gjh"))
>>> normalizeGraph(g)
(rdflib.term.URIRef(u'http://purl.org/net/bel-epa/gjh'), 'U')
>>> g = ConjunctiveGraph(
...     memstore, URIRef("http://purl.org/net/bel-epa/gjh"))
>>> normalizeGraph(g)
(rdflib.term.URIRef(u'http://purl.org/net/bel-epa/gjh'), 'U')
>>> g = QuotedGraph(memstore, Namespace("http://purl.org/net/bel-epa/gjh"))
>>> normalizeGraph(g)
(Namespace(u'http://purl.org/net/bel-epa/gjh'), 'F')

term2Letter()

rdfextras.utils.termutils.term2Letter(term)
Relate a given term to one of several key types:

• BNode
• Literal
• Statement
• URIRef
• Variable
• Graph
• QuotedGraph

>>> import rdflib
>>> from rdflib import plugin
>>> from rdflib import URIRef, Namespace
>>> from rdflib.term import BNode, Literal, Variable, Statement
>>> from rdflib.graph import Graph, ConjunctiveGraph, QuotedGraph
>>> from rdflib.store import Store
>>> from rdfextras.utils.termutils import term2Letter
>>> term2Letter(URIRef('http://purl.org/net/bel-epa.com/'))
'U'
>>> term2Letter(BNode())
'B'
>>> term2Letter(Literal(u''))
'L'
>>> term2Letter(Variable(u'x'))
'V'
>>> term2Letter(Graph())
'B'
>>> term2Letter(QuotedGraph("IOMemory", None))
'F'
>>> term2Letter(None)
'L'
>>> term2Letter(Statement((None, None, None), None))
's'

constructGraph()

rdfextras.utils.termutils.constructGraph(key)
Given a key (one of 'F', 'U' or 'B'), returns a tuple containing a Graph class and an appropriate referent class.

>>> from rdfextras.utils.termutils import constructGraph
>>> constructGraph('F')
(<class 'rdflib.graph.QuotedGraph'>, <class 'rdflib.term.URIRef'>)
>>> constructGraph('U')
(<class 'rdflib.graph.Graph'>, <class 'rdflib.term.URIRef'>)
>>> constructGraph('B')
(<class 'rdflib.graph.Graph'>, <class 'rdflib.term.BNode'>)

triplePattern2termCombinations()

rdfextras.utils.termutils.triplePattern2termCombinations((s, p, o))
Maps a triple pattern to term combinations (non-functioning)

type2TermCombination()

rdfextras.utils.termutils.type2TermCombination(member, klass, context)
Maps a type to a TermCombo

statement2TermCombination()

rdfextras.utils.termutils.statement2TermCombination(subject, predicate, obj, context)
Maps a statement to a Term Combo


graphutils

find_roots()

rdfextras.utils.graphutils.find_roots(graph, prop, roots=None)
Find the roots in some sort of transitive hierarchy.

find_roots(graph, rdflib.RDFS.subClassOf) will return a set of all roots of the sub-class hierarchy.

Assumes triples of the form (child, prop, parent), i.e. the direction of RDFS.subClassOf or SKOS.broader.


get_tree()

rdfextras.utils.graphutils.get_tree(graph, root, prop, mapper=<function <lambda>>, sortkey=None, done=None, dir='down')

Return a nested list/tuple structure representing the tree built by the transitive property given, starting from the root given, e.g.:

get_tree(graph, rdflib.URIRef("http://xmlns.com/foaf/0.1/Person"), rdflib.RDFS.subClassOf)

will return the structure for the subClass tree below Person.

dir='down' assumes triples of the form (child, prop, parent), i.e. the direction of RDFS.subClassOf or SKOS.broader. Any other dir traverses in the other direction.
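A short usage sketch covering both functions (the graph contents are illustrative):

from rdflib import Graph, URIRef
from rdflib.namespace import RDFS
from rdfextras.utils.graphutils import find_roots, get_tree

g = Graph()
animal = URIRef("http://example.org/Animal")
dog = URIRef("http://example.org/Dog")
puppy = URIRef("http://example.org/Puppy")
g.add((dog, RDFS.subClassOf, animal))    # triples run (child, prop, parent)
g.add((puppy, RDFS.subClassOf, dog))

print(find_roots(g, RDFS.subClassOf))    # only Animal has no parent
print(get_tree(g, animal, RDFS.subClassOf))  # nested structure below Animal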


cmdlineutils

main()

rdfextras.utils.cmdlineutils.main(target, _help=<function _help>, options='', stdin=True)

A main function for tools that read RDF from files given on the commandline or from STDIN (if the stdin parameter is true).


pathutils

RDF- and RDFlib-centric file and URL path utilities.

uri_leaf()

rdfextras.utils.pathutils.uri_leaf(uri)
Get the "leaf" - fragment id or last segment - of a URI. Useful e.g. for getting a term from a "namespace like" URI. Examples:

>>> uri_leaf('http://example.org/ns/things#item')
'item'
>>> uri_leaf('http://example.org/ns/stuff/item')
'item'
>>> uri_leaf('http://example.org/ns/stuff/')
>>>
>>> uri_leaf('urn:example.org:stuff')
'stuff'
>>> uri_leaf('example.org')
>>>


guess_format()

rdfextras.utils.pathutils.guess_format(fpath, fmap=None)
Guess RDF serialization based on file suffix. Uses SUFFIX_FORMAT_MAP unless fmap is provided. Examples:

>>> guess_format('path/to/file.rdf')
'xml'
>>> guess_format('path/to/file.owl')
'xml'
>>> guess_format('path/to/file.ttl')
'n3'
>>> guess_format('path/to/file.xhtml')
'rdfa'
>>> guess_format('path/to/file.svg')
'rdfa'
>>> guess_format('path/to/file.xhtml', {'xhtml': 'grddl'})
'grddl'

This also works with just the suffixes, with or without leading dot, and regardless of letter case:

>>> guess_format('.rdf')
'xml'
>>> guess_format('rdf')
'xml'
>>> guess_format('RDF')
'xml'


CHAPTER 2

Introduction to basic tasks in rdflib

rdflib wiki articles transcluded here for convenience.

2.1 Parsing RDF into rdflib graphs

2.1.1 Reading an NT file

RDF data comes in various syntaxes (xml, n3, ntriples, trix, etc.) that you might want to read. The simplest format is ntriples. Create the file demo.nt in the current directory with these two lines:

<http://bigasterisk.com/foaf.rdf#drewp> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://xmlns.com/foaf/0.1/Person> .
<http://bigasterisk.com/foaf.rdf#drewp> <http://example.com/says> "Hello world" .

In an interactive python interpreter, try this:

>>> from rdflib.graph import Graph

>>> g = Graph()

>>> g.parse("demo.nt", format="nt")
<Graph identifier=... (<class 'rdflib.Graph.Graph'>)>
>>> len(g)
2
>>> for stmt in g:
...     print stmt
...
(rdflib.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.URIRef('http://example.com/says'),
 rdflib.Literal('Hello world', language=None, datatype=None))
(rdflib.URIRef('http://bigasterisk.com/foaf.rdf#drewp'),
 rdflib.URIRef('http://www.w3.org/1999/02/22-rdf-syntax-ns#type'),
 rdflib.URIRef('http://xmlns.com/foaf/0.1/Person'))

The final lines show how RDFLib represents the two statements in the file. The statements themselves are just length-3 tuples, and the subjects, predicates, and objects are all rdflib types.
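Since statements are plain 3-tuples, they unpack directly; a minimal sketch, assuming the graph g parsed above:

# each term is a URIRef, Literal or BNode
for subj, pred, obj in g:
    print subj, pred, obj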


2.1.2 Reading remote graphs

Reading graphs from the net is just as straightforward:

>>> g.parse("http://bigasterisk.com/foaf.rdf")
>>> len(g)
42

The format defaults to xml, which is the common format for .rdf files you’ll find on the net.

See also the Graph.parse() method docs and the other parsers supported by rdflib 3.

2.2 Using SPARQL to query an rdflib 3 graph

2.2.1 Get the SPARQL RDFLib plugin

SPARQL is no longer shipped with core RDFLib; instead it is now a part of rdfextras.

Assuming you have rdfextras installed with setuptools (highly recommended), you can use SPARQL with RDFLib 3.X out of the box.

If you only have distutils, you have to add these lines somewhere at the top of your program:

import rdfextras
rdfextras.registerplugins()

2.2.2 Create an RDFLib Graph

You might parse some files into a new graph or open an on-disk RDFLib store.

from rdflib.graph import Graph
g = Graph()
g.parse("http://www.w3.org/People/Berners-Lee/card.rdf")

2.2.3 Run a Query

querystr = """SELECT ?aname ?bname
WHERE {
    ?a foaf:knows ?b .
    ?a foaf:name ?aname .
    ?b foaf:name ?bname .
}"""
for row in g.query(
        querystr,
        initNs=dict(foaf=Namespace("http://xmlns.com/foaf/0.1/"))):
    print("%s knows %s" % row)

The results are tuples of values in the same order as your SELECT arguments.


Timothy Berners-Lee knows Edd Dumbill
Timothy Berners-Lee knows Jennifer Golbeck
Timothy Berners-Lee knows Nicholas Gibbins
Timothy Berners-Lee knows Nigel Shadbolt
Dan Brickley knows binzac
Timothy Berners-Lee knows Eric Miller
Drew Perttula knows David McClosky
Timothy Berners-Lee knows Dan Connolly
...
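Because each row is a tuple in SELECT order, it can also be unpacked positionally; a minimal sketch reusing querystr and Namespace from above:

for aname, bname in g.query(
        querystr,
        initNs=dict(foaf=Namespace("http://xmlns.com/foaf/0.1/"))):
    print("%s knows %s" % (aname, bname))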

2.2.4 Namespaces

The initNs argument to query() is a dictionary of namespaces to be expanded in the query string. In a large program, it's common to use the same dict for every single query. You might even hack your graph instance so that the initNs arg is already filled in.
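One minimal sketch of that hack - not rdflib API - rebinding the instance's query method so the shared namespace dict is always supplied:

import functools
from rdflib import Namespace

# a shared namespace dict, reused for every query in the program
COMMON_NS = dict(foaf=Namespace("http://xmlns.com/foaf/0.1/"),
                 dc=Namespace("http://purl.org/dc/elements/1.1/"))

def use_common_ns(graph):
    """Rebind graph.query so initNs defaults to COMMON_NS."""
    original_query = graph.query

    @functools.wraps(original_query)
    def query(querystr, **kwargs):
        kwargs.setdefault('initNs', COMMON_NS)
        return original_query(querystr, **kwargs)

    graph.query = query
    return graph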

If someone knows how to use the empty prefix (e.g. "?a :knows ?b"), please write about it here and in the Graph.query docs.

2.2.5 Bindings

As with conventional SQL queries, it's common to run the same query many times with only a few terms changing. rdflib calls this initBindings:

FOAF = Namespace("http://xmlns.com/foaf/0.1/")
ns = dict(foaf=FOAF)
drew = URIRef('http://bigasterisk.com/foaf.rdf#drewp')
for row in g.query('SELECT ?name WHERE { ?p foaf:name ?name }',
                   initNs=ns,
                   initBindings={'p': drew}):
    print(row)

Output:

(rdflib.Literal('Drew Perttula', language=None, datatype=None),)

See also the rdflib.graph.Graph.query() API docs.

2.3 Using MySQL as a triple store with rdflib/rdfextras

Example code to create a MySQL triple store, add some triples, and serialize the resulting graph.

import rdflib
from rdflib.graph import ConjunctiveGraph as Graph
from rdflib import plugin
from rdflib.store import Store
from rdflib.store import NO_STORE
from rdflib.store import VALID_STORE
from rdflib import Literal
from rdflib import Namespace
from rdflib import URIRef

default_graph_uri = "http://rdflib.net/rdfstore"
configString = "host=localhost,user=username,password=password,db=rdfstore"


# Get the mysql plugin. You may have to install the python mysql libraries
store = plugin.get('MySQL', Store)('rdfstore')

# Open previously created store, or create it if it doesn't exist yet
rt = store.open(configString, create=False)
if rt == NO_STORE:
    # There is no underlying MySQL infrastructure, create it
    store.open(configString, create=True)
else:
    assert rt == VALID_STORE, "The underlying store is corrupted"

# There is a store, use it
graph = Graph(store, identifier=URIRef(default_graph_uri))

print("Triples in graph before add: %s" % len(graph))

# Now we'll add some triples to the graph & commit the changes
# (named rdflib_ns here to avoid shadowing the imported rdflib module)
rdflib_ns = Namespace('http://rdflib.net/test/')
graph.add((rdflib_ns['pic:1'], rdflib_ns['name'], Literal('Jane & Bob')))
graph.add((rdflib_ns['pic:2'], rdflib_ns['name'], Literal('Squirrel in Tree')))
graph.commit()

print("Triples in graph after add: %s" % len(graph))

# display the graph in RDF/XML
print(graph.serialize())

2.4 Transitive Traversal

How to use the transitive_objects and transitive_subjects graph methods.

2.4.1 Formal definition

The transitive_objects method finds all nodes such that there is a path from the subject to one of those nodes using only the predicate property in the triples. The transitive_subjects method is similar; it finds all nodes such that there is a path from the node to the object using only the predicate property.

2.4.2 Informal description, with an example

In brief, transitive_objects walks forward in a graph using a particular property, and transitive_subjects walks backward.

A good example uses a property ex:parent, the semantics of which are biological parentage.

The transitive_objects method would get all the ancestors of a particular person (all nodes such that there is a parent path between the person and the object).

The transitive_subjects method would get all the descendants of a particular person (all nodes such that there is a parent path between the node and the person).

So, say that your URI is ex:person. The following code would get all of your (known) ancestors, and then get all the (known) descendants of your maternal grandmother:


from rdflib import ConjunctiveGraph, URIRef

person = URIRef('ex:person')
dad = URIRef('ex:d')
mom = URIRef('ex:m')
momOfDad = URIRef('ex:gm0')
momOfMom = URIRef('ex:gm1')
dadOfDad = URIRef('ex:gf0')
dadOfMom = URIRef('ex:gf1')

parent = URIRef('ex:parent')

g = ConjunctiveGraph()
g.add((person, parent, dad))
g.add((person, parent, mom))
g.add((dad, parent, momOfDad))
g.add((dad, parent, dadOfDad))
g.add((mom, parent, momOfMom))
g.add((mom, parent, dadOfMom))

print("Parents, forward from `ex:person`:")
for i in g.transitive_objects(person, parent):
    print(i)

print("Parents, *backward* from `ex:gm1`:")
for i in g.transitive_subjects(parent, momOfMom):
    print(i)

Warning: The transitive_objects method has the start node as the first argument, but the transitive_subjects method has the start node as the second argument.

2.5 Working with RDFLib and RDFExtras, the basics

2.5.1 Working with Graphs

The RDFLib Graph is one of the main workhorses for working with RDF.

The most direct way to create a Graph is simply:

>>> from rdflib import Graph
>>> g = Graph()
>>> g
<Graph identifier=aGwNIAoQ0 (<class 'rdflib.graph.Graph'>)>

A BNode is automatically created as the graph's default identifier. A specific identifier can be supplied at creation time:

>>> g = Graph(identifier="mygraph")
>>> g
<Graph identifier=mygraph (<class 'rdflib.graph.Graph'>)>

By default a Graph is persisted in an integer-key-optimized, context-aware, in-memory Store:

>>> g.store
<rdflib.plugins.memory.IOMemory object at 0x8c881ac>


A different store can be specified at creation time by using the identifying string registered for the store, e.g. for a Sleepycat store:

>>> g = Graph('Sleepycat', identifier="mygraph")
>>> g.store
<rdflib.plugins.sleepycat.Sleepycat object at 0x8c8836c>

Note that an identifier for the Graph object is required. The Sleepycat Store affords the storage of multiple graphs, so having an identifier is necessary for any subsequent retrieval by identifier.

RDFLib Stores can be created separately and can subsequently be bound to Graph.store:

>>> from rdflib_sqlalchemy.SQLAlchemy import SQLAlchemy
>>> store = SQLAlchemy(configuration="postgresql://localhost/test")
>>> g = Graph(store, identifier="mygraph")
>>> g.store
<Partitioned SQL N3 Store: 0 contexts, 0 classification assertions,
 0 quoted statements, 0 property/value assertions, and 0 other assertions>

See the RDFLib documentation for further details of the RDFLib Graph API.

For a list of other available RDFLib plugin Stores see the RDFLib Github project page.

2.5.2 Working with ConjunctiveGraphs

The ConjunctiveGraph is the 'top-level' Graph. It is the aggregation of all the contexts (sub-graphs) within it and it is also the appropriate, absolute boundary for closed world assumptions / models.

For the sake of persistence, ConjunctiveGraphs must be distinguished by identifiers. If an identifier is not supplied at creation time, then one will be automatically assigned:

>>> from rdflib import ConjunctiveGraph, URIRef
>>> g = ConjunctiveGraph(store)
>>> g
<Graph identifier=JAxWBSXY0 (<class 'rdflib.graph.ConjunctiveGraph'>)>

Contexts are sub-graphs; they are identified by an identifier which may be an RDFLib Literal or a URIRef:

>>> c1 = URIRef("http://example.org/mygraph1")
>>> c2 = URIRef("http://example.org/mygraph2")

Statements can be added to / retrieved from specific contexts:

>>> bob = URIRef(u'urn:bob')
>>> likes = URIRef(u'urn:likes')
>>> pizza = URIRef(u'urn:pizza')
>>> g.get_context(c1).add((bob, likes, pizza))
>>> g.get_context(c2).add((bob, likes, pizza))

The ConjunctiveGraph is the aggregation of all the contexts within it:

>>> list(g.contexts())
[<Graph identifier=http://example.org/mygraph2 (<class 'rdflib.graph.Graph'>)>,
 <Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>]

The contexts / sub-graphs are instances of RDFLib Graph:

>>> gc1 = g.get_context(c1)
>>> gc1
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>


>>> len(gc1)
1
>>> gc2 = g.get_context(c2)
>>> len(gc2)
1
>>> len(g)
2

Changes to the contexts are also changes to the embracing aggregate ConjunctiveGraph:

>>> tom = URIRef(u'urn:tom')
>>> gc1.add((tom, likes, pizza))
>>> len(g)
3

2.5.3 Working with namespaces

A small selection of frequently-used namespaces are directly importable:

>>> from rdflib import OWL, RDFS
>>> OWL
Namespace(u'http://www.w3.org/2002/07/owl#')
>>> RDFS
rdf.namespace.ClosedNamespace('http://www.w3.org/2000/01/rdf-schema#')

Otherwise, namespaces are defined using the Namespace class, which takes as its argument the base URI of the namespace:

>>> from rdflib import Namespace
>>> FOAF = Namespace("http://xmlns.com/foaf/0.1/")
>>> FOAF
Namespace(u'http://xmlns.com/foaf/0.1/')

Namespace instances can be accessed attribute-style or dictionary key-style:

>>> RDFS.label
rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label')
>>> RDFS['label']
rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label')

Typical use:

>>> g = Graph()
>>> s = BNode('someone')
>>> g.add((s, RDF.type, FOAF.Person))

Instances of the Namespace class can be bound to Graphs:

>>> g.bind("foaf", FOAF)

As a programming convenience, a namespace binding is automatically created when URIRef predicates are added to the graph:

>>> g = Graph()
>>> g.add((URIRef("http://example0.com/foo"),
...        URIRef("http://example1.com/bar"),
...        URIRef("http://example2.com/baz")))
>>> print(g.serialize(format="n3"))
@prefix ns1: <http://example1.com/> .

<http://example0.com/foo> ns1:bar <http://example2.com/baz> .

2.5.4 Working with statements

Working with statements as Python strings

An example of hand-drawn statements in Notation3:

n3data = """\
@prefix : <http://www.snee.com/ns/demo#> .

:Jane :hasParent :Gene .
:Gene :hasParent :Pat ;
    :gender :female .
:Joan :hasParent :Pat ;
    :gender :female .
:Pat :gender :male .
:Mike :hasParent :Joan ."""

These can be added to a Graph via the parse() method:

>>> gc1.parse(data=n3data, format="n3")
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>
>>> len(gc1)
7

Working with external bulk data

Alternatively, an external source of bulk data can be used (unless specified otherwise, the format defaults to RDF/XML):

>>> data_url = "http://www.w3.org/2000/10/swap/test/gedcom/gedcom-facts.n3"
>>> gc1.parse(data_url, format="n3")
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>
>>> len(gc1)
74
>>> print(gc1.serialize(format="n3"))
@prefix default5: <http://www.w3.org/2000/10/swap/test/gedcom/gedcom-relations.n3#> .
@prefix gc: <http://www.daml.org/2001/01/gedcom/gedcom#> .

default5:Ann gc:childIn default5:gd;
    default5:gender default5:F .

default5:Ann_Sophie gc:childIn default5:dv;
    default5:gender default5:F .

default5:Bart gc:childIn default5:gd;
    default5:gender default5:M .

...


Working with web pages containing RDFa

RDFLib provides a built-in version of Ivan Herman's RDFa Distiller, so "external bulk data" also means "web pages containing RDFa markup":

>>> url = "http://www.oettl.it/"
>>> gc1.parse(location=url, format="rdfa", lax=True)
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>
>>> len(gc1)
68
>>> print(gc1.serialize(format="n3"))
@prefix commerce: <http://search.yahoo.com/searchmonkey/commerce/> .
@prefix eco: <http://www.ebusiness-unibw.org/ontologies/eclass/5.1.4/#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix gr: <http://purl.org/goodrelations/v1#> .
@prefix media: <http://search.yahoo.com/searchmonkey/media/> .
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix vcard: <http://www.w3.org/2006/vcard/ns#> .
@prefix xhv: <http://www.w3.org/1999/xhtml/vocab#> .

<http://www.oettl.it/#BusinessEntity> a gr:BusinessEntity,
        commerce:Business,
        vcard:VCard;
    gr:hasPOS <http://www.oettl.it/#LOSOSP_1>;
    gr:offers <http://www.oettl.it/#Offering_1>;
    commerce:hoursOfOperation "Mon-Fri 8.00-12.00 and 13.00-18.00, Sat 8.00-12.00 [Yahoo commerce]"@NULL;
    media:image <http://www.oettl.it/img/karl_foto.jpg>;
    rdfs:isDefinedBy <http://www.oettl.it/>;
    rdfs:seeAlso <http://www.oettl.it/>;
    vcard:adr <http://www.oettl.it/#address>;
    vcard:url <http://www.oettl.it/>;
    foaf:depiction <http://www.oettl.it/img/karl_foto.jpg> .

...

The GoodRelations wiki lists some other sources of RDFa-enabled web pages.

The RDFLib Graph API presents full details of args and kwargs for Graph.parse.

Also see the "working with Graphs" section (http://rdflib.readthedocs.org/en/latest/modules/graphs/index.html) of the RDFLib documentation.

Working with individual statements

Individual statements can be added, removed, etc.:

>>> gc1.remove((tom, likes, pizza))

>>> from rdflib import RDFS, Literal
>>> gc1.bind("rdfs", RDFS.uri)
>>> graham = URIRef(u'urn:graham')
>>> gc1.add((graham, likes, pizza))
>>> gc1.add((graham, RDFS.label, Literal("Graham")))
>>> print(gc1.serialize(format="n3"))
@prefix ns4: <urn:> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

ns4:graham rdfs:label "Graham";
    ns4:likes ns4:pizza .

As before, see the RDFLib documentation of the Graph API for a range of useful operations on Graphs, e.g.

>>> [o for o in gc1.objects(subject=graham, predicate=likes)]
[rdflib.term.URIRef(u'urn:pizza')]

>>> [o for o in gc1.predicate_objects(subject=graham)]  # output prettified by hand here
[(rdflib.term.URIRef(u'urn:likes'), rdflib.term.URIRef(u'urn:pizza')),
 (rdflib.term.URIRef(u'http://www.w3.org/2000/01/rdf-schema#label'),
  rdflib.term.Literal(u'Graham'))]

>>> gc1.value(subject=graham, predicate=likes)
rdflib.term.URIRef(u'urn:pizza')

2.5.5 Working with nodes

Literal and URIRef are the two most commonly-used nodes in an RDF graph.

Working with URIRefs is quite straightforward:

>>> uri = URIRef("http://example.com")
>>> uri
rdflib.term.URIRef(u'http://example.com')
>>> str(uri)
'http://example.com'

The options for working with Literals are amply illustrated in the Literal node docs. Also see the appropriate section in the RDF specs:

>>> graham = Literal(u'Graham', lang="en")
>>> graham
rdflib.term.Literal(u'Graham', lang='en')
>>> from rdflib.namespace import XSD
>>> graham = Literal(u'Graham', datatype=XSD.string)
>>> graham
rdflib.term.Literal(u'Graham', datatype=rdflib.term.URIRef(u'http://www.w3.org/2001/XMLSchema#string'))

Literals are permitted to have only one of the attributes datatype or lang:

>>> graham = Literal(u'Graham', datatype=XSD.string, lang="en")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File ".../rdflib/term.py", line 337, in __new__
    raise TypeError("A Literal can only have one of lang or datatype, "
TypeError: A Literal can only have one of lang or datatype,
per http://www.w3.org/TR/rdf-concepts/#section-Graph-Literal

2.5.6 Working with SPARQL

Assuming that RDFExtras was installed with setuptools (highly recommended), SPARQL can be used out of the box with RDFLib 3.X.

Note: If only distutils is available, then these lines need to be included somewhere near the top of the program code:


import rdfextras
rdfextras.registerplugins()

"SPARQL can be used out of the box" translates as: RDFLib Graph gets a 'query' method that accepts a SPARQL query string:

>>> results = gc1.query("""SELECT ?s ?p ?o WHERE { ?s ?p ?o . }""")

The 'query' method API offers keywords to set namespace bindings - initNs (the RDF, RDFS and OWL namespaces are pre-installed as a convenience to programmers, but see the example below for usage), variable bindings - initBindings (also see example below), and a boolean debug flag - DEBUG (ditto):

>>> FOAF = Namespace("http://xmlns.com/foaf/0.1/")
>>> ns = dict(foaf=FOAF)
>>> drew = URIRef('http://bigasterisk.com/foaf.rdf#drewp')
>>> for row in g.query(
...         """SELECT ?name WHERE { ?p foaf:name ?name }""",
...         initNs=ns,
...         initBindings={'p': drew},
...         DEBUG=True):
...     print(row)

Note: When graph.store is an instance of SPARQLStore or SPARQLUpdateStore, the API is reduced to just the query string arg, i.e. the 'initNs', 'initBindings' and 'DEBUG' keywords are not recognized.

Using the following set of statements:

>>> n3data = """\
@prefix : <http://www.snee.com/ns/demo#> .

:Jane :hasParent :Gene .
:Gene :hasParent :Pat ;
    :gender :female .
:Joan :hasParent :Pat ;
    :gender :female .
:Pat :gender :male .
:Mike :hasParent :Joan ."""

And the following SPARQL CONSTRUCT query:

>>> cq = """\
CONSTRUCT { ?p :hasGrandfather ?g . }
WHERE {
    ?p :hasParent ?parent .
    ?parent :hasParent ?g .
    ?g :gender :male .
}"""

Executing the query returns a SPARQLQueryResult, the serialization of which can be passed directly to Graph.parse:

>>> gc1.parse(data=n3data, format="n3")
>>> nsdict = {'': "http://www.snee.com/ns/demo#"}
>>> result_graph = gc1.query(cq, initNs=nsdict)
>>> newg = Graph().parse(data=result_graph.serialize(format='xml'))
>>> print(newg.serialize(format="n3"))
@prefix ns3: <http://www.snee.com/ns/demo#> .

ns3:Jane ns3:hasGrandfather ns3:Pat .

ns3:Mike ns3:hasGrandfather ns3:Pat .

The RDFExtras test suite contains many examples of SPARQL queries, and a companion document provides further details of working with basic SPARQL in RDFLib.

2.5.7 Working with SPARQL query results

Query results can be iterated over in a straightforward fashion. Row bindings are positional:

>>> gc1.parse("http://bel-epa.com/gjh/foaf.rdf", format="xml")
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>
>>> query = """\
... SELECT ?aname ?bname
... WHERE {
...     ?a foaf:knows ?b .
...     ?a foaf:name ?aname .
...     ?b foaf:name ?bname .
... }"""
>>> nses = dict(foaf=Namespace("http://xmlns.com/foaf/0.1/"))
>>> for row in gc1.query(query, initNs=nses):
...     print(repr(row))
...
(rdflib.term.Literal(u'Graham Higgins'), rdflib.term.Literal(u'Ngaio Macfarlane'))

A more detailed view of the returned SPARQLResult:

>>> gc1.parse("http://bel-epa.com/gjh/foaf.rdf", format="xml")
<Graph identifier=http://example.org/mygraph1 (<class 'rdflib.graph.Graph'>)>
>>> query = """\
... SELECT ?aname ?bname
... WHERE {
...     ?a :knows ?b .
...     ?a :name ?aname .
...     ?b :name ?bname .
... }"""
>>> foaf = Namespace("http://xmlns.com/foaf/0.1/")
>>> rows = gc1.query(query, initNs={'': foaf})
>>> for i in ['askAnswer', 'bindings', 'graph',
...           'selectionF', 'type', 'vars']:
...     v = getattr(rows, i)
...     print(i, type(v), v, repr(v))
...
('askAnswer', <type 'NoneType'>, None, 'None')
('bindings', <type 'list'>, [{?bname: rdflib.term.Literal(u'Ngaio Macfarlane'),
 ?aname: rdflib.term.Literal(u'Graham Higgins')}], "...")
('graph', <type 'NoneType'>, None, 'None')
('selectionF', <type 'list'>, [?aname, ?bname], '[?aname, ?bname]')
('type', <type 'str'>, 'SELECT', "'SELECT'")
('vars', <type 'list'>, [?aname, ?bname], '[?aname, ?bname]')

>>> x = rows.vars[0]
>>> print(type(x), repr(x), str(x), x)
(<class 'rdflib.term.Variable'>, '?aname', 'aname', ?aname)
>>> for row in rows.bindings[4:5]:
...     print("Row", type(row), row)
...     for col in row:
...         print("Col", type(col), repr(col), str(col), col, row[col])
...
('Row', <type 'dict'>, {?bname: rdflib.term.Literal(u'Ngaio Macfarlane'),
 ?aname: rdflib.term.Literal(u'Graham Higgins')})
('Col', <class 'rdflib.term.Variable'>, '?bname', 'bname', ?bname,
 rdflib.term.Literal(u'Ngaio Macfarlane'))
('Col', <class 'rdflib.term.Variable'>, '?aname', 'aname', ?aname,
 rdflib.term.Literal(u'Graham Higgins'))

Note the unusual __repr__() result for the SPARQL variables, i.e. ?aname. The actual value is aname; the question mark is added for the __repr__() result. Iterating over the bindings behaves as expected:

>>> for row in rows.bindings:
...     for col in row:
...         print(col, row[col])
...

and so does iteration driven by the vars:

>>> for row in rows.bindings:
...     for col in rows.vars:
...         print(col, row[col])
...

But when using the keys directly, discard the ‘?’ prefix:

>>> for row in rows.bindings:
...     knowee = row['bname']

SPARQL query result objects can be serialized as XML or JSON:

>>> print("json", rows.serialize(format="json"))
('json', '{"head": {"vars": ["aname", "bname"]},
 "results": {"bindings": [
   {"bname": {"type": "literal", "value": "Ngaio Macfarlane"},
    "aname": {"type": "literal", "value": "Graham Higgins"}}]}}')


CHAPTER 3

Techniques


3.1 Extending SPARQL Basic Graph Matching

Robbed from the W3C’s SPARQL Query Language for RDF

The overall SPARQL design can be used for queries which assume a more elaborate form of entailment than simple entailment, by re-writing the matching conditions for basic graph patterns. Since it is an open research problem to state such conditions in a single general form which applies to all forms of entailment and optimally eliminates needless or inappropriate redundancy, this document only gives necessary conditions which any such solution should satisfy. These will need to be extended to full definitions for each particular case.

Basic graph patterns stand in the same relation to triple patterns that RDF graphs do to RDF triples, and much of the same terminology can be applied to them. In particular, two basic graph patterns are said to be equivalent if there is a bijection M between the terms of the triple patterns that maps blank nodes to blank nodes and maps variables, literals and IRIs to themselves, such that a triple ( s, p, o ) is in the first pattern if and only if the triple ( M(s), M(p), M(o) ) is in the second. This definition extends that for RDF graph equivalence to basic graph patterns by preserving variable names across equivalent patterns.

An entailment regime specifies:

• a subset of RDF graphs called well-formed for the regime

• an entailment relation between subsets of well-formed graphs and well-formed graphs.

Examples of entailment regimes include simple entailment, RDF entailment, RDFS entailment, D-entailment and OWL-DL entailment. Of these, only OWL-DL entailment restricts the set of well-formed graphs. If E is an entailment regime then we will refer to E-entailment, E-consistency, etc., following this naming convention.

Some entailment regimes can categorize some RDF graphs as inconsistent. For example, the RDF graph:

_:x rdf:type xsd:string .
_:x rdf:type xsd:decimal .

is D-inconsistent when D contains the XSD datatypes. The effect of a query on an inconsistent graph is not covered by this specification, but must be specified by the particular SPARQL extension.

A SPARQL extension to E-entailment must satisfy the following conditions.

1. The scoping graph, SG, corresponding to any consistent active graph AG is uniquely specified and is E-equivalent to AG.

2. For any basic graph pattern BGP and pattern solution mapping P, P(BGP) is well-formed for E.


3. For any scoping graph SG and answer set P1 ... Pn for a basic graph pattern BGP, and where BGP1 ... BGPn is a set of basic graph patterns all equivalent to BGP, none of which share any blank nodes with any other or with SG, SG E-entails (SG union P1(BGP1) union ... union Pn(BGPn)).

These conditions do not fully determine the set of possible answers, since RDF allows unlimited amounts of redundancy. In addition, therefore, the following must hold.

4. Each SPARQL extension must provide conditions on answer sets which guarantee that every BGP and AG has a finite set of answers which is unique up to RDF graph equivalence.


CHAPTER 4

Epydoc API docs

rdfextras epydoc API docs



Python Module Index

• rdfextras.sparql
• rdfextras.sparql.algebra
• rdfextras.sparql.components
• rdfextras.sparql.evaluate
• rdfextras.sparql.graph
• rdfextras.sparql.operators
• rdfextras.sparql.parser
• rdfextras.sparql.processor
• rdfextras.sparql.query
• rdfextras.store.FOPLRelationalModel.BinaryRelationPartition
• rdfextras.store.FOPLRelationalModel.QuadSlot
• rdfextras.store.FOPLRelationalModel.RelationalHash
• rdfextras.tools.csv2rdf
• rdfextras.tools.describer
• rdfextras.tools.rdfpipe
• rdfextras.utils
• rdfextras.utils.cmdlineutils
• rdfextras.utils.graphutils
• rdfextras.utils.pathutils
• rdfextras.utils.termutils

