Sesame: An Architecture for Storing and Querying RDF Data and Schema Information

Yasser Ganji Saffar
ganji@ce.sharif.edu

When they were out of sight Ali Baba came down, and, going up to the rock, said, "Open, Sesame."
--Tales of 1001 Nights



Outline
- Querying Levels
- Sesame’s Architecture
- Sesame’s Modules
- Storing RDF Data in RDBMSs


Querying Levels
RDF documents can be considered at three different levels of abstraction:
1. At the syntactic level they are XML documents.
2. At the structure level they consist of a set of triples.
3. At the semantic level they constitute one or more graphs with partially predefined semantics.

Which of these levels is best for querying?


Querying at the Syntactic Level
- At this level we just have an XML document.
- So we can query RDF using an XML query language (e.g. XQuery).
- But RDF is not just an XML dialect.
  - XML:
    - Has a tree-structured data model.
    - Only nodes are labeled.
  - RDF:
    - Has a graph-structured data model.
    - Both edges (properties) and nodes (subjects/objects) are labeled.
- Different ways of encoding the same information in XML are possible.
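
The last point can be made concrete with a small sketch in Python. It assumes two RDF/XML encodings of one and the same statement (that twain/mark has type FamousWriter); the `http://example.org/` namespace and the function name are hypothetical, chosen only for illustration. A path-style query written against the first encoding finds nothing in the second, even though both carry the same information.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
NS = {"rdf": RDF_NS}

# Encoding A: the type is written as an explicit rdf:type property element.
# (http://example.org/ is a hypothetical namespace.)
DOC_A = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/twain/mark">
    <rdf:type rdf:resource="http://example.org/FamousWriter"/>
  </rdf:Description>
</rdf:RDF>
"""

# Encoding B: the same statement, abbreviated as a typed node element.
DOC_B = """
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <ex:FamousWriter rdf:about="http://example.org/twain/mark"/>
</rdf:RDF>
"""


def typed_subjects_via_xml_path(document):
    """Syntactic-level 'query': subjects of rdf:Description elements
    that contain an rdf:type child element."""
    root = ET.fromstring(document)
    about = f"{{{RDF_NS}}}about"
    hits = []
    for description in root.findall("rdf:Description", NS):
        if description.find("rdf:type", NS) is not None:
            hits.append(description.get(about))
    return hits


result_a = typed_subjects_via_xml_path(DOC_A)  # finds the subject
result_b = typed_subjects_via_xml_path(DOC_B)  # finds nothing
print(result_a)
print(result_b)
```

A query that is correct for one serialization silently misses the other, which is the motivation for moving up to the triple level.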


Querying at the Structure Level
- At this level an RDF document represents a set of triples:
    (type, Book, Class)
    (subClassOf, FamousWriter, Writer)
    (hasWritten, twain/mark, ISBN00023423442)
    (type, twain/mark, FamousWriter)
- Advantage: independent of the specific XML syntax.
- A successful query:
    SELECT ?x FROM … WHERE (type ?x FamousWriter)
- An unsuccessful query (no triple states directly that twain/mark is a Writer):
    SELECT ?x FROM … WHERE (type ?x Writer)
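
A minimal sketch of structure-level matching, assuming the four triples from this slide in their (predicate, subject, object) order. The `match` function is a hypothetical helper, not part of any query language: it only compares a pattern against stored triples and knows nothing about what subClassOf means.

```python
# Triples from the slide, in (predicate, subject, object) order.
TRIPLES = [
    ("type", "Book", "Class"),
    ("subClassOf", "FamousWriter", "Writer"),
    ("hasWritten", "twain/mark", "ISBN00023423442"),
    ("type", "twain/mark", "FamousWriter"),
]


def match(pattern, triples=TRIPLES):
    """Return one variable binding per triple that matches the pattern.

    Terms starting with '?' are variables; all other terms must be equal
    to the corresponding term of the triple.
    """
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A repeated variable must bind to the same value.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


famous = match(("type", "?x", "FamousWriter"))
writers = match(("type", "?x", "Writer"))
print(famous)
print(writers)
```

The second query comes back empty because pure pattern matching never follows the subClassOf triple from FamousWriter to Writer.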


Querying at the Semantic Level
- We need a query language that is sensitive to the RDF Schema primitives, e.g. Class, subClassOf, Property, …
- RQL (RDF Query Language)
  - The first proposal for a declarative query language for RDF and RDF Schema.
  - Adopts the syntax of OQL.
  - Output of queries is again legal RDF Schema code, which can be used as input of another query.
  - A sample query:
      SELECT Y FROM FamousWriter{X}.hasWritten{Y}
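
What "sensitive to the RDF Schema primitives" buys can be sketched as follows, assuming the same example triples in (predicate, subject, object) order. This is not an RQL implementation; the function names are hypothetical, and the sketch only shows that following subClassOf transitively makes the query for instances of Writer succeed.

```python
TRIPLES = [
    ("type", "Book", "Class"),
    ("subClassOf", "FamousWriter", "Writer"),
    ("hasWritten", "twain/mark", "ISBN00023423442"),
    ("type", "twain/mark", "FamousWriter"),
]


def superclasses(cls, triples):
    """All classes reachable from cls via subClassOf, including cls itself."""
    seen = {cls}
    stack = [cls]
    while stack:
        current = stack.pop()
        for p, s, o in triples:
            if p == "subClassOf" and s == current and o not in seen:
                seen.add(o)
                stack.append(o)
    return seen


def instances_of(cls, triples):
    """Resources whose stated type is cls or any subclass of cls."""
    return {
        s for p, s, o in triples
        if p == "type" and cls in superclasses(o, triples)
    }


def written_by_instances_of(cls, triples):
    """Roughly the sample query: objects of hasWritten whose subject is a cls."""
    authors = instances_of(cls, triples)
    return {o for p, s, o in triples if p == "hasWritten" and s in authors}


writers = instances_of("Writer", triples=TRIPLES)
books = written_by_instances_of("FamousWriter", TRIPLES)
print(writers)
print(books)
```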


Sesame’s History
- The European On-To-Knowledge project kicked off in Feb. 2000:
  - This project aims at developing ontology-driven knowledge management tools.
  - In this project Sesame fulfills the role of storage and retrieval middleware for ontologies and metadata expressed in RDF and RDF Schema.


On-To-Knowledge Project
- Sesame is positioned as a central tool in this project.
- OntoExtract: extracts ontological conceptual structures from natural-language documents.
- OntoEdit: an ontology editor.
- RDF Ferret: a user front-end that provides search and query.

[Figure: the On-To-Knowledge tools RDF Ferret, OntoExtract, Sesame, and OntoEdit]


What is Sesame?
- Sesame is an open source Java framework for storing, querying and reasoning with RDF and RDF Schema.
- It can be used as:
  - Standalone server: a database for RDF and RDF Schema.
  - Java library: for applications that need to work with RDF internally.


Sesame’s Architecture

[Figure: Sesame's architecture, showing the Repository, the Repository Abstraction Layer (RAL), the Admin, Query, and Export Modules, and the HTTP and SOAP Protocol Handlers through which Sesame is reached over HTTP and SOAP]


The Repository
- DBMSs
  - Currently, Sesame is able to use:
    - PostgreSQL
    - MySQL
    - Oracle (9i or newer)
    - SQL Server
- Existing RDF stores
- RDF flat files
- RDF network services
  - Using multiple Sesame servers to retrieve results for queries.
  - This opens up the possibility of a highly distributed architecture for RDF(S) storing and querying.


Repository Abstraction Layer (RAL)
- The RAL offers a stable, high-level interface for talking to repositories.
- It is defined by an API that offers these functionalities:
  - Add data
  - Retrieve data
  - Delete data
- Data is returned in streams. (Scalability)
  - Only a small amount of data is kept in memory.
  - Suitable for use in highly constrained environments such as portable devices.
- Caching data (Performance)
  - E.g. caching RDF Schema data, which is needed very frequently.
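
The ideas on this slide can be sketched in Python under loud assumptions: the class and method names below are hypothetical and are not Sesame's actual Java API. The sketch shows an add/retrieve/delete interface, retrieval as a stream (a generator that yields one statement at a time), and a caching layer placed on top of another layer so that frequently needed schema lookups reach the store only once.

```python
from abc import ABC, abstractmethod


class RepositoryLayer(ABC):
    """Hypothetical stand-in for the RAL interface.

    Statements are (subject, predicate, object) tuples.
    """

    @abstractmethod
    def add(self, statement): ...

    @abstractmethod
    def delete(self, statement): ...

    @abstractmethod
    def retrieve(self, s=None, p=None, o=None): ...


class InMemoryRepository(RepositoryLayer):
    def __init__(self):
        self._statements = []

    def add(self, statement):
        if statement not in self._statements:
            self._statements.append(statement)

    def delete(self, statement):
        if statement in self._statements:
            self._statements.remove(statement)

    def retrieve(self, s=None, p=None, o=None):
        # Generator: matching statements are streamed one at a time.
        for st in self._statements:
            if all(w is None or w == g for w, g in zip((s, p, o), st)):
                yield st


class SchemaCachingLayer(RepositoryLayer):
    """Layer stacked on another layer; caches subClassOf lookups only."""

    def __init__(self, below):
        self._below = below
        self._cache = {}
        self.backend_lookups = 0  # counts subClassOf queries sent downwards

    def add(self, statement):
        self._cache.clear()  # simplest possible invalidation
        self._below.add(statement)

    def delete(self, statement):
        self._cache.clear()
        self._below.delete(statement)

    def retrieve(self, s=None, p=None, o=None):
        if p != "subClassOf":
            yield from self._below.retrieve(s, p, o)
            return
        key = (s, p, o)
        if key not in self._cache:
            self.backend_lookups += 1
            self._cache[key] = list(self._below.retrieve(s, p, o))
        yield from self._cache[key]


repo = SchemaCachingLayer(InMemoryRepository())
repo.add(("FamousWriter", "subClassOf", "Writer"))
repo.add(("twain/mark", "type", "FamousWriter"))
first = list(repo.retrieve(p="subClassOf"))
second = list(repo.retrieve(p="subClassOf"))
print(first, repo.backend_lookups)
```

Because both classes expose the same three operations, callers do not need to know whether they talk to the store directly or to a cache in front of it.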


Stacking Abstraction Layers


Admin Module
- Allows incrementally inserting or deleting RDF data in/from the repository.
- Retrieves its information from an RDF(S) source.
- Parses it using an RDF parser.
- Checks each (S, P, O) statement it gets from the parser for consistency with the information already present in the repository and infers implied information if necessary, for instance:
  - If P equals type, it infers that O must be a class.
  - If P equals subClassOf, it infers that S and O must be classes.
  - If P equals subPropertyOf, it infers that both S and O must be properties.
  - If P equals domain or range, it infers that S must be a property and O must be a class.
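
The four inference rules listed above can be sketched as follows. The function names and the use of a plain Python set as the repository are assumptions made for illustration; the consistency check mentioned on the slide is left out, and only the inference step is shown.

```python
def implied_statements(s, p, o):
    """Statements implied by one (S, P, O) statement under the slide's rules."""
    if p == "type":
        return [(o, "type", "Class")]
    if p == "subClassOf":
        return [(s, "type", "Class"), (o, "type", "Class")]
    if p == "subPropertyOf":
        return [(s, "type", "Property"), (o, "type", "Property")]
    if p in ("domain", "range"):
        return [(s, "type", "Property"), (o, "type", "Class")]
    return []


def add_with_inference(repository, statement):
    """Add a statement plus everything it implies.

    `repository` is a set of (S, P, O) tuples. Returns the statements
    that were newly added, in the order they were added.
    """
    added = []
    pending = [statement]
    while pending:
        current = pending.pop()
        if current in repository:
            continue  # already known, nothing new to infer
        repository.add(current)
        added.append(current)
        pending.extend(implied_statements(*current))
    return added


repository = set()
add_with_inference(repository, ("hasWritten", "domain", "Writer"))
add_with_inference(repository, ("FamousWriter", "subClassOf", "Writer"))
for statement in sorted(repository):
    print(statement)
```

Applying the first rule to an inferred statement such as (Writer, type, Class) also yields (Class, type, Class); the loop stops once no new statements appear.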


Query Module
- Evaluates RQL queries posed by the user.
- It is independent of the underlying repository, so it cannot use optimizations and query evaluations offered by specific DBMSs.
- RQL queries are translated into a set of calls to the RAL.
  - E.g. when a query contains a join operation over two subqueries, each of the subqueries is evaluated, and the join operation is then executed by the query engine on the results.
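
The join example can be sketched like this. The `retrieve` function is a hypothetical stand-in for a RAL call, and the nested-loop join is just one simple way a query engine could combine the two result sets itself instead of pushing the join down to a DBMS.

```python
STATEMENTS = [
    ("twain/mark", "type", "FamousWriter"),
    ("twain/mark", "hasWritten", "ISBN00023423442"),
    ("Book", "type", "Class"),
]


def retrieve(statements, s=None, p=None, o=None):
    """Stand-in for a RAL call: stream (subject, predicate, object) matches."""
    for st in statements:
        if all(w is None or w == g for w, g in zip((s, p, o), st)):
            yield st


def join(left, right, variable):
    """Nested-loop join of two streams of bindings on a shared variable."""
    right_rows = list(right)  # the inner side is scanned repeatedly
    for l_row in left:
        for r_row in right_rows:
            if l_row[variable] == r_row[variable]:
                yield {**l_row, **r_row}


# Subquery 1: X is a FamousWriter. Subquery 2: X hasWritten Y.
sub1 = ({"X": s} for s, _, _ in retrieve(STATEMENTS, p="type", o="FamousWriter"))
sub2 = ({"X": s, "Y": o} for s, _, o in retrieve(STATEMENTS, p="hasWritten"))

rows = list(join(sub1, sub2, "X"))
print(rows)
```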


RDF Export Module
- This module allows for the extraction of the complete schema and/or data from a model in RDF format.
- It supplies the basis for using Sesame with other RDF tools.


Important Features of Sesame
- Powerful query language
- Portability
  - It is written completely in Java.
- Repository independence
- Extensibility
  - Other functional modules can be created and plugged into it.
- Flexible communication by using protocol handlers
  - The architecture separates the communication details from the actual functionality through the use of protocol handlers.


SeRQL (Sesame RDF Query Language)
- It combines the best features of other query languages: RQL, RDQL, N-Triples, N3.
- Some of the built-in predicates:
    {X} serql:directSubClassOf {Y}
    {X} serql:directSubPropertyOf {Y}
    {X} serql:directType {Y}
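
What a "direct" predicate adds over the plain transitive relation can be sketched as below. The reading used here is an assumption stated for illustration: X is a direct subclass of Y when X is a subclass of Y, the two differ, and no third class lies strictly between them. The function names and the sample class hierarchy are hypothetical.

```python
def subclass_closure(stated_pairs):
    """Transitive closure of (subclass, superclass) pairs."""
    closure = set(stated_pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def direct_subclass_of(closure):
    """Pairs (x, y) from the closure with no class strictly between x and y."""
    classes = {cls for pair in closure for cls in pair}
    return {
        (x, y)
        for (x, y) in closure
        if x != y
        and not any(
            z not in (x, y) and (x, z) in closure and (z, y) in closure
            for z in classes
        )
    }


STATED = {
    ("FamousWriter", "Writer"),
    ("Writer", "Resource"),
    ("Book", "Resource"),
}
closure = subclass_closure(STATED)
direct = direct_subclass_of(closure)
print(sorted(closure))
print(sorted(direct))
```

The closure contains the derived pair (FamousWriter, Resource); the direct relation filters it out again because Writer sits in between.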


Using PostgreSQL as Repository
- PostgreSQL is an open-source object-relational DBMS.
- It supports subtable relations between its tables.
- Subtable relations are also transitive.
- These relations can be used to model the subsumption reasoning of RDF Schema.
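
How transitive subtable relations give subsumption for free can be imitated in a few lines of Python. This is a simulation, not SQL and not PostgreSQL itself: the `Table` class is hypothetical and only mimics the behaviour that reading a table also returns the rows of all its subtables.

```python
class Table:
    """Toy table with subtables; reading it includes all subtable rows."""

    def __init__(self, name, parent=None):
        self.name = name
        self.rows = []
        self.subtables = []
        if parent is not None:
            parent.subtables.append(self)

    def insert(self, uri):
        self.rows.append(uri)

    def select_all(self):
        """Rows of this table and, transitively, of every subtable."""
        result = list(self.rows)
        for sub in self.subtables:
            result.extend(sub.select_all())
        return result


# One table per class; subClassOf becomes a subtable relation.
resource = Table("Resource")
writer = Table("Writer", parent=resource)
famous_writer = Table("FamousWriter", parent=writer)
book = Table("Book", parent=resource)

famous_writer.insert("twain/mark")
book.insert("ISBN00023423442")

print(writer.select_all())
print(sorted(resource.select_all()))
```

The row is stored once, in the FamousWriter table, yet it shows up when Writer or Resource is read, which is exactly the subclass semantics the schema asks for.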


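Example RDF Schema & Data

[Figure: example RDF graph. Schema: FamousWriter subClassOf Writer; hasWritten has domain Writer and range Book. Data: …/twain/mark has type FamousWriter, …/ISBN00023423442 has type Book, and …/twain/mark hasWritten …/ISBN00023423442.]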


Storing Schema (in an RDBMS)

Class (uri): Resource, Writer, FamousWriter, Book
Property (uri): hasWritten
SubClassOf (source, target): (Writer, Resource), (FamousWriter, Writer), (Book, Resource)
SubPropertyOf (source, target): no rows
Domain (source, target): (hasWritten, Writer)
Range (source, target): (hasWritten, Book)


Storing Data (PostgreSQL)

[Figure: one table per class and per property, as far as it can be recovered]
Resource (uri): no rows shown
Writer (uri): no rows shown
Book (uri): …/ISBN00023423442
FamousWriter (uri): …/twain/mark
hasWritten (source, target): (…/twain/mark, …/ISBN00023423442)

In order to decrease the database size, another table, called resources, is added to the database; it maps resource descriptions to unique IDs.
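
The resources table idea can be sketched with Python's built-in sqlite3 module. Note the assumptions: SQLite stands in for PostgreSQL, the table and column names other than resources are hypothetical, and the `http://example.org/` prefix and the second book are made up, since the slide elides the real URI prefix. Each URI is stored once as text; statement tables hold only integer IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE resources (
    id  INTEGER PRIMARY KEY,
    uri TEXT UNIQUE NOT NULL
);
CREATE TABLE has_written (
    source INTEGER NOT NULL REFERENCES resources(id),
    target INTEGER NOT NULL REFERENCES resources(id)
);
""")


def resource_id(uri):
    """Return the ID for a URI, inserting it into resources if needed."""
    conn.execute("INSERT OR IGNORE INTO resources (uri) VALUES (?)", (uri,))
    row = conn.execute("SELECT id FROM resources WHERE uri = ?", (uri,)).fetchone()
    return row[0]


def add_has_written(writer_uri, book_uri):
    conn.execute(
        "INSERT INTO has_written (source, target) VALUES (?, ?)",
        (resource_id(writer_uri), resource_id(book_uri)),
    )


# Hypothetical URIs for illustration only.
add_has_written("http://example.org/twain/mark", "http://example.org/ISBN00023423442")
add_has_written("http://example.org/twain/mark", "http://example.org/ISBN-hypothetical-2")

# Reading the statements back requires joining against resources twice.
rows = conn.execute("""
    SELECT w.uri, b.uri
    FROM has_written AS h
    JOIN resources AS w ON w.id = h.source
    JOIN resources AS b ON b.id = h.target
""").fetchall()
resource_count = conn.execute("SELECT COUNT(*) FROM resources").fetchone()[0]
print(rows)
print(resource_count)
```

The writer's URI is stored only once although it appears in two statements. The price is that every lookup of a readable URI needs extra joins.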


RDF Schema Ambiguities
- There are many ambiguities in RDFS:
  - RDF Schema is defined in natural language.
    - No formal description of its semantics is given.
    - E.g. about subClassOf it only says that it is a property with class as its domain and range.
  - RDF Schema is self-describing:
    - The definition of its terms is itself done in RDF Schema.
    - As a result it contains some inconsistencies.
    - Circular dependencies in term definitions:
      - Class is both a subclass of and an instance of Resource.
      - Resource is an instance of Class.


Scalability Issues
- An experiment using Sesame:
  - Uploading and querying a collection of nouns from Wordnet (http://www.semanticweb.org/library)
  - Consisting of about 400,000 RDF statements.
  - Using a desktop computer (Sun UltraSPARC 5 workstation, 256MB RAM)
  - Uploading the Wordnet nouns took 94 minutes.
  - Querying was quite slow.
    - Because data is distributed over multiple tables, retrieving it requires many joins across tables.


Thanks for your attention.

