Sesame: An Architecture for Storing and Querying RDF Data and Schema Information
Yasser Ganji Saffar
When they were out of sight Ali Baba came down, and, going up to the rock, said, "Open, Sesame."
--Tales of 1001 Nights
Outline
- Querying Levels
- Sesame’s Architecture
- Sesame’s Modules
- Storing RDF Data in RDBMSs
Querying Levels
RDF documents can be considered at three different levels of abstraction:
1. At the syntactic level they are XML documents.
2. At the structure level they consist of a set of triples.
3. At the semantic level they constitute one or more graphs with partially predefined semantics.
Which level is the best one to query at?
Querying at the Syntactic Level
- At this level we just have an XML document, so we can query RDF using an XML query language (e.g. XQuery).
- But RDF is not just an XML dialect:
  - XML has a tree-structured data model in which only nodes are labeled.
  - RDF has a graph-structured data model in which both edges (properties) and nodes (subjects/objects) are labeled.
- The same information can be encoded in XML in different ways, so a query written against one encoding can miss the others.
Querying at the Structure Level
- At this level an RDF document represents a set of triples:
    (type, Book, Class)
    (subClassOf, FamousWriter, Writer)
    (hasWritten, twain/mark, ISBN00023423442)
    (type, twain/mark, FamousWriter)
- Advantage: independent of the specific XML syntax.
- A successful query: SELECT ?x FROM … WHERE (type ?x FamousWriter)
- An unsuccessful query: SELECT ?x FROM … WHERE (type ?x Writer)
  It returns nothing because no triple states (type, twain/mark, Writer) explicitly; that fact only follows from the subClassOf triple, which plain triple matching ignores.
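The difference between the two queries can be illustrated with a minimal Python sketch. It is not Sesame code: the function names are hypothetical, and the triples are simply the four from the slide, kept in the slide's (predicate, subject, object) order. The first function does plain triple matching; the second adds the subClassOf reasoning that a schema-aware query language would apply.

```python
# Triples in the slide's order: (predicate, subject, object).
TRIPLES = {
    ("type", "Book", "Class"),
    ("subClassOf", "FamousWriter", "Writer"),
    ("hasWritten", "twain/mark", "ISBN00023423442"),
    ("type", "twain/mark", "FamousWriter"),
}


def instances_structural(triples, cls):
    """Structure level: match the pattern (type ?x cls) literally."""
    return {s for (p, s, o) in triples if p == "type" and o == cls}


def subclasses_of(triples, cls):
    """Return cls plus every class that reaches cls through subClassOf."""
    found = {cls}
    frontier = [cls]
    while frontier:
        current = frontier.pop()
        for (p, s, o) in triples:
            if p == "subClassOf" and o == current and s not in found:
                found.add(s)
                frontier.append(s)
    return found


def instances_semantic(triples, cls):
    """Schema-aware variant: instances of cls or of any of its subclasses."""
    result = set()
    for c in subclasses_of(triples, cls):
        result |= instances_structural(triples, c)
    return result


famous = instances_structural(TRIPLES, "FamousWriter")  # finds twain/mark
plain = instances_structural(TRIPLES, "Writer")         # finds nothing
aware = instances_semantic(TRIPLES, "Writer")           # finds twain/mark
```

Only the schema-aware variant finds twain/mark when asked for Writer, because it follows the subClassOf triple before matching.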
Querying at the Semantic Level
We need a query language that is sensitive to the RDF Schema primitives, e.g. Class, subClassOf, Property, …
RQL (RDF Query Language)
- The first proposal for a declarative query language for RDF and RDF Schema.
- Adopts the syntax of OQL.
- The output of a query is again legal RDF Schema code, which can be used as input to another query.
- A sample query:
    SELECT Y FROM FamousWriter {X}. hasWritten {Y}
Sesame’s History
- The European On-To-Knowledge project kicked off in Feb. 2000.
- The project aims at developing ontology-driven knowledge management tools.
- In this project Sesame fulfills the role of storage and retrieval middleware for ontologies and metadata expressed in RDF and RDF Schema.
On-To-Knowledge Project
- Sesame is positioned as a central tool in this project.
- OntoExtract: extracts ontological conceptual structures from natural-language documents.
- OntoEdit: an ontology editor.
- RDF Ferret: a user front end that provides search and query.
[Figure: the On-To-Knowledge tools RDF Ferret, OntoExtract and OntoEdit arranged around Sesame.]
What is Sesame?
- Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema.
- It can be used as:
  - Standalone server: a database for RDF and RDF Schema.
  - Java library: for applications that need to work with RDF internally.
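Sesame’s Architecture
[Figure: Sesame’s layered architecture. Clients reach Sesame over HTTP or SOAP through the HTTP Protocol Handler and the SOAP Protocol Handler. Behind the handlers sit the Admin Module, Query Module and Export Module, which access the Repository through the Repository Abstraction Layer (RAL).]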
The Repository
- DBMSs: currently, Sesame is able to use PostgreSQL, MySQL, Oracle (9i or newer) and SQL Server.
- Existing RDF stores
- RDF flat files
- RDF network services
  - Multiple Sesame servers can be used to retrieve results for queries.
  - This opens up the possibility of a highly distributed architecture for RDF(S) storing and querying.
Repository Abstraction Layer (RAL)
- The RAL offers a stable, high-level interface for talking to repositories.
- It is defined by an API that offers these functionalities:
  - Add data
  - Retrieve data
  - Delete data
- Data is returned in streams (scalability):
  - Only a small amount of data is kept in memory.
  - Suitable for use in highly constrained environments such as portable devices.
- Caching data (performance):
  - E.g. caching RDF Schema data, which is needed very frequently.
Stacking Abstraction Layers
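One way to picture stacking is a layer that implements the same add/retrieve/delete interface as the layer beneath it and adds caching on top. The Python sketch below is only an illustration under stated assumptions: the class and method names are hypothetical and are not Sesame's actual Java API, statements are (subject, predicate, object) tuples, and the cache covers subClassOf statements only. Results are produced by generators, so callers receive them as a stream.

```python
from abc import ABC, abstractmethod


class RepositoryLayer(ABC):
    """Hypothetical stand-in for the RAL interface: add, retrieve, delete."""

    @abstractmethod
    def add(self, stmt): ...

    @abstractmethod
    def delete(self, stmt): ...

    @abstractmethod
    def retrieve(self, s=None, p=None, o=None): ...


class InMemoryRepository(RepositoryLayer):
    """Bottom layer; a real deployment would talk to a DBMS here."""

    def __init__(self):
        self._statements = set()
        self.retrieve_calls = 0  # counts retrievals that were actually consumed

    def add(self, stmt):
        self._statements.add(stmt)

    def delete(self, stmt):
        self._statements.discard(stmt)

    def retrieve(self, s=None, p=None, o=None):
        self.retrieve_calls += 1
        for stmt in sorted(self._statements):
            if all(w is None or w == g for w, g in zip((s, p, o), stmt)):
                yield stmt  # streamed one statement at a time


class CachingLayer(RepositoryLayer):
    """Stacked on another layer; keeps subClassOf statements in memory."""

    def __init__(self, below):
        self._below = below
        self._schema_cache = None

    def add(self, stmt):
        self._schema_cache = None  # any write invalidates the cache
        self._below.add(stmt)

    def delete(self, stmt):
        self._schema_cache = None
        self._below.delete(stmt)

    def retrieve(self, s=None, p=None, o=None):
        if p != "subClassOf":
            yield from self._below.retrieve(s, p, o)
            return
        if self._schema_cache is None:
            self._schema_cache = list(
                self._below.retrieve(None, "subClassOf", None))
        for stmt in self._schema_cache:
            if (s is None or s == stmt[0]) and (o is None or o == stmt[2]):
                yield stmt


base = InMemoryRepository()
repo = CachingLayer(base)
repo.add(("FamousWriter", "subClassOf", "Writer"))
repo.add(("twain/mark", "type", "FamousWriter"))
```

Because both classes share one interface, the modules above the stack do not need to know whether they are talking to the cache or directly to the store.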
Admin Module
- Allows incrementally inserting RDF data into, or deleting it from, a repository.
- Retrieves its information from an RDF(S) source.
- Parses it using an RDF parser.
- Checks each (S, P, O) statement it gets from the parser for consistency with the information already present in the repository, and infers implied information if necessary. For instance:
  - If P equals type, it infers that O must be a class.
  - If P equals subClassOf, it infers that S and O must be classes.
  - If P equals subPropertyOf, it infers that both S and O must be properties.
  - If P equals domain or range, it infers that S must be a property and O must be a class.
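The four inference rules above can be written down directly. The following Python sketch is an illustration, not the Admin Module's implementation: the function names are hypothetical, the repository is modeled as a plain set of (S, P, O) tuples, only one round of inference is applied per statement, and the consistency check mentioned on the slide is left out.

```python
def infer_implied(statement):
    """Return the statements implied by one (S, P, O) statement."""
    s, p, o = statement
    if p == "type":
        return {(o, "type", "Class")}
    if p == "subClassOf":
        return {(s, "type", "Class"), (o, "type", "Class")}
    if p == "subPropertyOf":
        return {(s, "type", "Property"), (o, "type", "Property")}
    if p in ("domain", "range"):
        return {(s, "type", "Property"), (o, "type", "Class")}
    return set()


def add_with_inference(repository, statement):
    """Add the statement together with what it directly implies."""
    repository.add(statement)
    repository.update(infer_implied(statement))
    return repository


repository = add_with_inference(set(), ("hasWritten", "domain", "Writer"))
```

After the call, the repository holds the domain statement itself plus the facts that hasWritten is a property and Writer is a class.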
Query Module
- Evaluates RQL queries posed by the user.
- It is independent of the underlying repository, so it cannot use the optimizations and query evaluation offered by specific DBMSs.
- RQL queries are translated into a set of calls to the RAL.
  - E.g. when a query contains a join operation over two subqueries, each of the subqueries is evaluated, and the join operation is then executed by the query engine on the results.
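To make the join example concrete, here is a small Python sketch of a join carried out in the query engine rather than in the database. Everything in it is an assumption made for illustration: `match` stands in for a retrieval call to the RAL, patterns are (subject, predicate, object) tuples in which names starting with `?` are variables, and the join is a simple nested loop. It is not how Sesame's RQL engine is actually written.

```python
def match(statements, pattern):
    """Evaluate one subquery: yield a variable binding per matching statement."""
    for stmt in statements:
        binding = {}
        for want, got in zip(pattern, stmt):
            if want.startswith("?"):
                # A repeated variable must bind to the same value each time.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                break
        else:
            yield binding


def join(left, right):
    """Nested-loop join of two binding streams on their shared variables."""
    right = list(right)  # the inner side is scanned once per outer binding
    for lhs in left:
        for rhs in right:
            if all(lhs[v] == rhs[v] for v in lhs.keys() & rhs.keys()):
                yield {**lhs, **rhs}


DATA = [
    ("twain/mark", "type", "FamousWriter"),
    ("twain/mark", "hasWritten", "ISBN00023423442"),
    ("ISBN00023423442", "type", "Book"),
]

# Roughly the sample RQL query: books written by famous writers.
writers = match(DATA, ("?x", "type", "FamousWriter"))
written = match(DATA, ("?x", "hasWritten", "?y"))
results = list(join(writers, written))
```

Each subquery is evaluated on its own, and the engine combines the two result streams. This mirrors the trade-off on the slide: the approach works with any repository, but the DBMS never gets the chance to optimize the join.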
RDF Export Module
- This module allows for the extraction of the complete schema and/or data from a model in RDF format.
- It supplies the basis for using Sesame with other RDF tools.
Important Features of Sesame
- Powerful query language
- Portability: it is written completely in Java.
- Repository independence
- Extensibility: other functional modules can be created and plugged in.
- Flexible communication by using protocol handlers: the architecture separates the communication details from the actual functionality through the use of protocol handlers.
SeRQL (Sesame RDF Query Language)
- It combines the best features of other query languages and notations: RQL, RDQL, N-Triples, N3.
- Some of the built-in predicates:
    {X} serql:directSubClassOf {Y}
    {X} serql:directSubPropertyOf {Y}
    {X} serql:directType {Y}
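The "direct" predicates distinguish an immediate relation from one that only holds through transitivity. The Python sketch below illustrates the idea for directSubClassOf under an assumed definition: X is a direct subclass of Y when X is a subclass of Y, the two differ, and no third class lies between them. The function names are hypothetical and the code does not come from Sesame.

```python
def transitive_closure(edges):
    """All (sub, super) pairs that follow from the given subClassOf pairs."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        for (a, b) in list(closure):
            for (c, d) in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def direct_subclass_pairs(edges):
    """Pairs (X, Y) with X below Y and no other class strictly in between."""
    closure = transitive_closure(edges)
    classes = {c for pair in closure for c in pair}
    return {
        (a, b)
        for (a, b) in closure
        if a != b and not any(
            z not in (a, b) and (a, z) in closure and (z, b) in closure
            for z in classes
        )
    }


SUBCLASS_OF = {
    ("FamousWriter", "Writer"),
    ("Writer", "Resource"),
    ("Book", "Resource"),
    ("FamousWriter", "Resource"),  # implied by the first two pairs
}
direct = direct_subclass_pairs(SUBCLASS_OF)
```

In this data FamousWriter is a subclass of Resource, but not a direct one, because Writer sits between the two.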
Using PostgreSQL as Repository
- PostgreSQL is an open-source object-relational DBMS.
- It supports subtable relations between its tables.
- Subtable relations are also transitive.
- These relations can be used to model the subsumption reasoning of RDF Schema.
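In PostgreSQL, a subtable relation is declared with the `INHERITS` clause of `CREATE TABLE`, and a query on a parent table also returns the rows of its subtables unless `ONLY` is used. The Python sketch below generates such DDL for a class hierarchy. The single `uri text` column and the helper names are assumptions made for illustration; they are not Sesame's actual schema or code.

```python
def quote_ident(name):
    """Quote a name as an SQL identifier, doubling embedded double quotes."""
    return '"' + name.replace('"', '""') + '"'


def class_table_ddl(class_name, superclasses=()):
    """Return a CREATE TABLE statement for one RDF class.

    A class without superclasses gets its own uri column; a subclass
    becomes a subtable that inherits that column from its parents.
    """
    table = quote_ident(class_name)
    if not superclasses:
        return "CREATE TABLE {} (uri text);".format(table)
    parents = ", ".join(quote_ident(s) for s in superclasses)
    return "CREATE TABLE {} () INHERITS ({});".format(table, parents)


ddl = [
    class_table_ddl("Resource"),
    class_table_ddl("Writer", ["Resource"]),
    class_table_ddl("FamousWriter", ["Writer"]),
    class_table_ddl("Book", ["Resource"]),
]
```

With tables created this way, a row inserted into "FamousWriter" would also be returned by a SELECT on "Writer" or "Resource", which is the transitive subclass behavior described above.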
Example RDF Schema & Data
[Figure: example RDF graph. Schema: FamousWriter is a subClassOf Writer; hasWritten has domain Writer and range Book. Data: …/twain/mark has type FamousWriter and hasWritten …/ISBN00023423442, which has type Book.]
Storing Schema (in an RDBMS)
Class (uri): Resource, Writer, FamousWriter, Book
Property (uri): hasWritten
SubClassOf (source, target): (Writer, Resource), (FamousWriter, Writer), (Book, Resource)
SubPropertyOf (source, target): (empty)
Domain (source, target): (hasWritten, Writer)
Range (source, target): (hasWritten, Book)
Storing Data (PostgreSQL)
Resource (uri): (empty)
Writer (uri): (empty)
Book (uri): …/ISBN00023423442
FamousWriter (uri): …/twain/mark
hasWritten (source, target): (…/twain/mark, …/ISBN00023423442)
To decrease the database size, another table, called resources, is added to the database; it maps resource descriptions to unique IDs.
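The space saving comes from storing each long resource description once and referring to it by a short ID everywhere else. A minimal Python sketch of such a mapping follows. The class and method names are hypothetical, and an in-memory dictionary stands in for the resources table.

```python
class ResourceTable:
    """Maps resource descriptions (URIs) to small integer IDs and back."""

    def __init__(self):
        self._id_by_uri = {}
        self._uri_by_id = []

    def id_for(self, uri):
        """Return the ID of uri, assigning the next free ID on first use."""
        if uri not in self._id_by_uri:
            self._id_by_uri[uri] = len(self._uri_by_id)
            self._uri_by_id.append(uri)
        return self._id_by_uri[uri]

    def uri_for(self, resource_id):
        """Look the original description up again."""
        return self._uri_by_id[resource_id]


resources = ResourceTable()
# A hasWritten row now stores two integers instead of two long strings.
has_written_row = (
    resources.id_for("…/twain/mark"),
    resources.id_for("…/ISBN00023423442"),
)
```

Every other table can then hold the integer IDs, and the full descriptions are looked up only when results are returned.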
RDF Schema Ambiguities
- There are many ambiguities in RDFS:
  - RDF Schema is defined in natural language; no formal description of its semantics is given.
  - E.g. about subClassOf it only says that it is a property with class as its domain and range.
- RDF Schema is self-describing:
  - The definition of its terms is itself done in RDF Schema.
  - As a result it contains some inconsistencies.
  - Circular dependencies in term definitions: Class is both a subclass of and an instance of Resource, while Resource is an instance of Class.
Scalability Issues
- An experiment using Sesame: uploading and querying a collection of nouns from Wordnet (http://www.semanticweb.org/library).
  - The collection consists of about 400,000 RDF statements.
  - A desktop computer was used (Sun UltraSPARC 5 workstation, 256MB RAM).
  - Uploading the Wordnet nouns took 94 minutes, i.e. roughly 70 statements per second.
  - Querying was quite slow, because the data is distributed over multiple tables and retrieving it requires many joins.
Thanks for your attention.
References
- User Guide for Sesame, http://openrdf.org/doc/users/userguide.html
- Broekstra J., Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, ISWC 2002
- http://sesame.aidministrator.nl
- http://www.openRDF.org