lecture semantic dataaccess_presentation


Upload: iks-project

Post on 11-May-2015


Co-funded by the European Union

Semantic CMS Community

Semantic Data Access

Copyright IKS Consortium

Lecturer
Organization

Date of presentation


www.iks-project.eu


Part I: Foundations (lectures 1-2)
  • Introduction of Content Management
  • Foundations of Semantic Web Technologies

Part II: Semantic Content Management (lectures 3-6)
  • Storing and Accessing Semantic Data
  • Knowledge Interaction and Presentation
  • Knowledge Representation and Reasoning
  • Semantic Lifting

Part III: Methodologies (lectures 7-10)
  • Designing Interactive Ubiquitous IS
  • Requirements Engineering for Semantic CMS
  • Designing Semantic CMS
  • Semantifying your CMS


What is this Lecture about?

We have learned ...
• ... which languages can be used to model knowledge.
• ... how to extract knowledge from content in an automatic way (semantic lifting).

We need a way ...
• ... to store the extracted knowledge technically in an accessible way.


Outline

• Semantic Data
  • Semantic Web
  • RDF
• Semantic Data Storage
  • Triple Stores
• Semantic Data Access
  • SPARQL
  • RQL
  • API Calls


Semantic Data

• Stands for machine-understandable information
• Allows computers to interpret the data without user intervention
• Allows computers to act intelligently without being programmed for each task


• Provides infrastructure to get practical results
  • Applications derive further information by following existing relations (e.g. Eiffel Tower -> Paris -> France)
• Allows reasoning capabilities
  • Enables extraction of related information that is not directly linked
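The Eiffel Tower -> Paris -> France chain can be sketched as a small transitive lookup. This is only an illustration in plain Java, not a reasoner: the class `LocationChain`, the `LOCATED_IN` map and the method `containers` are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: follows "located in" relations transitively.
class LocationChain {
    // Direct relations only: thing -> place that directly contains it.
    static final Map<String, String> LOCATED_IN = new LinkedHashMap<>();
    static {
        LOCATED_IN.put("Eiffel Tower", "Paris");
        LOCATED_IN.put("Paris", "France");
    }

    // Returns every place that contains `thing`, directly or indirectly.
    static List<String> containers(String thing) {
        List<String> result = new ArrayList<>();
        String current = LOCATED_IN.get(thing);
        // Stop at the end of the chain, or when a place repeats (cycle guard).
        while (current != null && !result.contains(current)) {
            result.add(current);
            current = LOCATED_IN.get(current);
        }
        return result;
    }

    public static void main(String[] args) {
        // "France" is never linked to "Eiffel Tower" directly, yet it is found.
        System.out.println(containers("Eiffel Tower")); // prints [Paris, France]
    }
}
```

A real semantic store would derive such facts from declared property semantics (for example a transitive property) rather than from a hard-coded loop.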


Semantic Web

• A classical generic description: "Web of data"
• Extends the World Wide Web by encouraging:
  • A common language for representing data, transformable to and from disparate sources such as relational databases, XML, etc. (RDF)
  • Common reusable data models to represent data from different domains in common terms (RDFS, OWL, etc.)
  • Rules that enable applications to reason over the information (SWRL)


Semantic Web Layer Cake

Semantic Web Layer Cake, Image source: http://www.w3.org/2007/03/layerCake.svg


Semantic Web

• Many organizations publish their data in different domains
  • Media
  • Geographic
  • Government
  • …
• The whole set contains approximately 30 billion triples
• One of the largest collections is DBpedia
  • Semantified version of Wikipedia
• Example: obtain the cities of China that have a population over 20 million
• Needs efficient storage and querying of semantic data


Representation of Semantic Data

• RDF
  • The common data format
  • An abstract model with several serialization formats
  • Consists of statements, referred to as triples, of the form (subject, predicate, object), where
    • Subject: any resource identifier
    • Predicate: the resource identifier of a property
    • Object: either a resource identifier or a literal value
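The triple form can be made concrete with a few plain Java types. This is a minimal sketch under stated assumptions: `RdfTerms`, `Node`, `Resource`, `Literal`, `Triple` and `hasLiteralObject` are hypothetical names, the example.org URIs are placeholders, and records require Java 16 or later.

```java
// Hypothetical sketch of the (subject, predicate, object) model.
class RdfTerms {
    // Marker for anything that may appear in the object position.
    interface Node {}

    // A resource is identified by a URI.
    record Resource(String uri) implements Node {}

    // A literal is a plain value, kept here as text.
    record Literal(String value) implements Node {}

    // Subject and predicate are resources; the object may be either kind.
    record Triple(Resource subject, Resource predicate, Node object) {}

    // True when the statement ends in a literal value rather than a resource.
    static boolean hasLiteralObject(Triple t) {
        return t.object() instanceof Literal;
    }

    public static void main(String[] args) {
        Resource tower = new Resource("http://example.org/EiffelTower");
        Triple link = new Triple(tower,
                new Resource("http://example.org/locatedIn"),
                new Resource("http://example.org/Paris"));
        Triple label = new Triple(tower,
                new Resource("http://purl.org/dc/elements/1.1/title"),
                new Literal("Eiffel Tower"));
        System.out.println(hasLiteralObject(link));  // prints false
        System.out.println(hasLiteralObject(label)); // prints true
    }
}
```

Typing the object as `Node` while keeping subject and predicate as `Resource` mirrors the rule on the slide that only the object position may hold a literal.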


Storing Semantic Data

• Specialized designs are needed for triple collections
• Two modalities:
  • Relational databases
  • Triple stores
    • Most widely used for storage
    • Many implementations
    • They can also be RDB-based


Triple Store

• A purpose-built database for the storage and retrieval of RDF data
• Optimized for adding, removing and querying triples
• Each triple in the triple store has the form (subject, predicate, object)
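The three operations named above (add, remove, query) can be sketched with an in-memory set. This is an illustrative toy, not how any particular store is implemented: `TinyTripleStore` and its methods are hypothetical names, and `null` is assumed to act as a wildcard in `match`.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical in-memory triple store (needs Java 16+ for records).
class TinyTripleStore {
    record Triple(String subject, String predicate, String object) {}

    // A set, so adding the same statement twice has no effect.
    private final Set<Triple> triples = new LinkedHashSet<>();

    boolean add(String s, String p, String o) {
        return triples.add(new Triple(s, p, o));
    }

    boolean remove(String s, String p, String o) {
        return triples.remove(new Triple(s, p, o));
    }

    // Pattern query: a null argument matches any value in that position.
    List<Triple> match(String s, String p, String o) {
        return triples.stream()
                .filter(t -> (s == null || s.equals(t.subject()))
                        && (p == null || p.equals(t.predicate()))
                        && (o == null || o.equals(t.object())))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        TinyTripleStore store = new TinyTripleStore();
        store.add("EiffelTower", "locatedIn", "Paris");
        store.add("Paris", "locatedIn", "France");
        store.add("Paris", "type", "City");
        System.out.println(store.match(null, "locatedIn", null).size()); // prints 2
    }
}
```

The linear scan in `match` is only acceptable for a sketch; purpose-built stores maintain indexes so that such patterns do not require visiting every triple.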


Considering XML Databases

• XML databases are existing storage systems for semi-structured data
• Idea: transform RDF to XML and store it in XML databases
• Yet the XML data model is not exactly the same as the model of semantic data
  • The XML data model is a tree-like structure
  • RDF data is represented as a graph without a hierarchy


• XML databases are not suitable for storing and querying RDF
  • Only simple manipulations can be handled through XML query languages
  • RDF Schema processing and inference are not possible
  • The standard RDF/XML mapping is unsuitable


Monolithic approach for DB Based Triple Stores

• Generic representation for all RDF schemas
• Only two tables are used
  • Resources table
  • Triples table


Triples table:

predid  subid  objid  objvalue
6       2      1      -
5       3      7      -
5       1      8      -
5       9      2      -
3       9      -      Sunscale

Resources table:

id  uri
1   http://www.iks.og/topics.rdfs#Hotel
2   http://www.iks.og/topics.rdfs#HotelDirections
3   http://www.oclc.org/dublincore.rdfs#title
4   http://www.iks.og/schema.rdf#Ext.Resource
5   http://www.w3.org/1999/02/22-rdf-syntax-ns#type
6   http://www.w3.org/2000/01/rdf-schema#subClassOf
7   http://www.w3.org/1999/02/22-rdf-syntax-ns#Property
8   http://www.w3.org/2000/01/rdf-schema#Class
9   rl
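How the two tables work together can be sketched by joining a triples row against the resources table. The code below is an illustration only: it holds both tables in memory instead of a relational database, `MonolithicTables`, `Row` and `resolve` are hypothetical names, the assumption is that a row carries either an `objid` or an `objvalue`, and the entry for id 9 is kept as it appears in the table.

```java
import java.util.List;
import java.util.Map;

// Hypothetical in-memory version of the two-table layout (Java 16+).
class MonolithicTables {
    // Resources table: id -> uri.
    static final Map<Integer, String> RESOURCES = Map.of(
            1, "http://www.iks.og/topics.rdfs#Hotel",
            2, "http://www.iks.og/topics.rdfs#HotelDirections",
            3, "http://www.oclc.org/dublincore.rdfs#title",
            4, "http://www.iks.og/schema.rdf#Ext.Resource",
            5, "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
            6, "http://www.w3.org/2000/01/rdf-schema#subClassOf",
            7, "http://www.w3.org/1999/02/22-rdf-syntax-ns#Property",
            8, "http://www.w3.org/2000/01/rdf-schema#Class",
            9, "rl");

    // Triples table row: objId is null for literals, objValue is null for resources.
    record Row(int predId, int subId, Integer objId, String objValue) {}

    static final List<Row> TRIPLES = List.of(
            new Row(6, 2, 1, null),
            new Row(5, 3, 7, null),
            new Row(5, 1, 8, null),
            new Row(5, 9, 2, null),
            new Row(3, 9, null, "Sunscale"));

    // Looks up each id in the resources table and prints (subject, predicate, object).
    static String resolve(Row r) {
        String object = (r.objId() != null)
                ? RESOURCES.get(r.objId())
                : "\"" + r.objValue() + "\"";
        return "(" + RESOURCES.get(r.subId()) + ", "
                + RESOURCES.get(r.predId()) + ", " + object + ")";
    }

    public static void main(String[] args) {
        TRIPLES.forEach(row -> System.out.println(resolve(row)));
    }
}
```

Each call to `resolve` stands in for the joins a relational query would need, which is the main cost of this generic layout.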


Triple Stores

• Can be divided into three categories:
  • In-memory triple stores
    • Used for certain operations such as benchmarking, caching, etc.
  • Native triple stores
    • Provide their own storage implementations (Virtuoso, Mulgara, AllegroGraph, …)
  • Non-memory, non-native triple stores
    • Built on third-party databases (Jena SDB, Kaon, …)


Functionalities provided by Triple Stores

• RDBMS support
• General RDF model access
• Query language support in the store, such as RQL and SPARQL
• Some stores provide:
  • Provenance: tracking of who said what
  • APIs for accessing the triple store over the network
• Very few stores provide:
  • Full-text search
  • Inference and rule languages


Example Triple Store implementations

• RDFSuite
  • Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
  • Based on an ORDBMS model
• Sesame
  • http://www.openrdf.org/
  • Relational databases (MySQL, PostgreSQL, Oracle)
• Jena
  • http://www.hpl.hp.com/semweb/jena2.htm
  • Relational databases (MySQL, PostgreSQL, Oracle)
• Virtuoso
  • http://virtuoso.openlinksw.com/
  • Native RDF quad storage (physical quads)


RDFSuite (ICS-Forth)*

* IST-1999-13479 C-Web, IST-2000-26074 Mesmuses


How triples are stored and accessed in RDF Suite

• Separate tables are created to store resources
  • Properties, subClasses, subProperties and instances
• Indices on attributes such as URI, source and target
• Querying is possible through RQL


[Figure from *]

* Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001


Sesame Architecture

• DBMS-independent API for accessing triple repositories
• SAIL API
  • A set of Java interfaces between the other modules and the repository
  • Abstracts from the actual storage mechanism
• Query module
  • RQL support
• Different ways to communicate with clients
  • Through protocol handlers


*Jeen Broekstra and Arjohn Kampman and Frank van Harmelen, Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002


SAIL API over PostgreSQL

• PostgreSQL
  • Object-relational DBMS
  • Supports sub-table relations between its tables, which are used to provide RDF Schema class and property subsumption
• Individuals are represented in separate tables created for resources
• Adding tables is difficult


SAIL API over MySQL

• MySQL
  • The database schema does not change when the RDFS changes
  • This is an advantage where the RDFS is unstable


Jena2 Architecture


*Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds: Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The first International Workshop on Semantic Web and Databases


Jena2

• Denormalized schema
  • Avoids unnecessary joins by storing URIs and literals directly in the statement table
• Multiple statement tables
  • Better locality and caching
• Property tables


Normalized vs Denormalized Tables


Property Tables


Triple Store Only

Subject  Property   Object
person1  name       Alice
person1  age        32
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  name       Bob
person2  age        35
person2  adopteeOf  person6
person2  friendOf   person8
person2  gender     male

Triple Store

Subject  Property   Object
person1  twinOf     person2
person1  faxPhone   x1234
person1  adminPh    x5678
person2  adopteeOf  person6
person2  friendOf   person8

Person Property Table

ID  name   age  gender
p1  Alice  32   -
p2  Bob    35   male
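The split shown in the tables can be sketched as a routing rule applied when a statement is stored. This is a hypothetical illustration in plain Java, not Jena2's implementation: `PropertyTableSplit`, `CLUSTERED`, `personTable` and `tripleTable` are names introduced here, and the choice of `name`, `age` and `gender` as clustered properties is read off the example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of a property table next to a generic triple table (Java 16+).
class PropertyTableSplit {
    record Triple(String subject, String property, String object) {}

    // Properties clustered into the person property table, as in the example.
    static final Set<String> CLUSTERED = Set.of("name", "age", "gender");

    // One entry per row of the property table: subject -> (column -> value).
    final Map<String, Map<String, String>> personTable = new LinkedHashMap<>();

    // All remaining statements stay in the generic triple table.
    final List<Triple> tripleTable = new ArrayList<>();

    void add(Triple t) {
        if (CLUSTERED.contains(t.property())) {
            personTable.computeIfAbsent(t.subject(), k -> new LinkedHashMap<>())
                    .put(t.property(), t.object());
        } else {
            tripleTable.add(t);
        }
    }

    public static void main(String[] args) {
        PropertyTableSplit store = new PropertyTableSplit();
        store.add(new Triple("person1", "name", "Alice"));
        store.add(new Triple("person1", "age", "32"));
        store.add(new Triple("person1", "twinOf", "person2"));
        System.out.println(store.personTable.get("person1")); // prints {name=Alice, age=32}
        System.out.println(store.tripleTable.size());          // prints 1
    }
}
```

Reading the name, age and gender of one person then touches a single row instead of three separate statements, which is the locality benefit the property table is meant to give.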


Jena Persistence Options

• SDB
  • Scalable storage and query for RDF
  • Specifically designed for SPARQL support
  • Supports MySQL, PostgreSQL, Oracle 11g, Microsoft SQL Server and IBM DB2
  • Scales to graphs of 100 million triples


• TDB
  • Provides large-scale storage and query of RDF datasets using a pure Java engine
  • Supports SPARQL
  • A non-transactional, faster database solution for use by a single system
  • Scales well beyond SDB and is simpler to set up


Virtuoso

• General-purpose RDBMS with extensive RDF adaptations
• RDF data is stored as RDF quads, i.e. (graph, subject, predicate, object) tuples, so RDF with named graphs is supported
  • The columns are G for graph, S for subject, P for predicate and O for object
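What the extra graph column buys can be sketched with a quad list that is filtered by graph name. This is a toy illustration and says nothing about Virtuoso's actual storage: `QuadTable`, `Quad` and `inGraph` are hypothetical names and the example.org graph URIs are placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical quad table with columns G, S, P, O (Java 16+).
class QuadTable {
    record Quad(String g, String s, String p, String o) {}

    final List<Quad> rows = new ArrayList<>();

    void add(String g, String s, String p, String o) {
        rows.add(new Quad(g, s, p, o));
    }

    // Returns the statements that belong to one named graph.
    List<Quad> inGraph(String graph) {
        return rows.stream()
                .filter(q -> q.g().equals(graph))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        QuadTable table = new QuadTable();
        table.add("http://example.org/graphs/geo", "EiffelTower", "locatedIn", "Paris");
        table.add("http://example.org/graphs/geo", "Paris", "locatedIn", "France");
        table.add("http://example.org/graphs/people", "person1", "name", "Alice");
        System.out.println(table.inGraph("http://example.org/graphs/geo").size()); // prints 2
    }
}
```

Keeping the graph name on every row lets statements from different sources live in one table while still being selectable, or removable, per source.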


Querying Semantic Data

• Semantic data can be queried from triple stores by:
  • Various query languages
    • SPARQL (different endpoints provided)
    • RQL
    • RDQL
    • SeRQL
    • …
  • API calls
    • Through the proprietary APIs of different projects
  • Linked Data


SPARQL

• Is an RDF query language
• Standardized by the W3C
• Plays a role similar to that of SQL for databases
  • Syntactically resembles SQL
  • Queries RDF graphs instead of databases


SPARQL Endpoints

• Provide functionality to query the knowledge base via the SPARQL language
• Accept queries and return results over HTTP
• Query results can be in different formats, such as RDF, XML, HTML, JSON and CSV
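Querying an endpoint over HTTP can be sketched with the JDK's own HTTP client (Java 11+). The sketch only builds the request: the query text travels in the `query` parameter of a GET request and the `Accept` header asks for a result format. `SparqlRequestSketch` and `build` are hypothetical names, and `http://dbpedia.org/sparql` is used as the example endpoint.

```java
import java.net.URI;
import java.net.URLEncoder;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper that prepares a SPARQL query as an HTTP GET request.
class SparqlRequestSketch {
    static final String QUERY = "SELECT ?s WHERE { ?s ?p ?o } LIMIT 5";

    static HttpRequest build(String endpoint, String sparql) {
        // The query text must be URL-encoded before it is put into the URL.
        String url = endpoint + "?query="
                + URLEncoder.encode(sparql, StandardCharsets.UTF_8);
        return HttpRequest.newBuilder(URI.create(url))
                // Ask for JSON results; endpoints may offer other formats too.
                .header("Accept", "application/sparql-results+json")
                .GET()
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = build("http://dbpedia.org/sparql", QUERY);
        System.out.println(request.method() + " " + request.uri());
    }
}
```

To actually run the query, the request could be passed to `HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())`; whether a given endpoint honours the requested format is up to that endpoint.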


Semantic Data Access With API Calls

• Open source projects provide APIs to manipulate RDF data
  • Jena
  • Apache Clerezza
  • Sesame
  • JRDF


Jena

• Jena provides a rich API to manipulate the RDF stored in the underlying triple store
  • Model to represent graphs
  • CRUD methods for triples
  • Querying methods for existing resources

See the next slide for the code snippet…


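Jena Code Snippet

    // Package names as of Apache Jena 3 and later; Jena 2 used com.hp.hpl.jena.* instead.
    import org.apache.jena.rdf.model.Model;
    import org.apache.jena.rdf.model.ModelFactory;
    import org.apache.jena.rdf.model.Resource;
    import org.apache.jena.vocabulary.VCARD;

    // The statements below belong inside a method body.
    String personURI = "http://somewhere/JohnSmith";
    String givenName = "John";
    String familyName = "Smith";
    String fullName = givenName + " " + familyName;

    // create an empty Model which represents an RDF graph
    Model model = ModelFactory.createDefaultModel();

    // create the resource which will produce the triples in the next slide
    Resource johnSmith = model.createResource(personURI)
        .addProperty(VCARD.FN, fullName)
        .addProperty(VCARD.N,
            model.createResource() // blank node holding the structured name
                .addProperty(VCARD.Given, givenName)
                .addProperty(VCARD.Family, familyName));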


Jena

Triples created by the code snippet on the previous slide:

(<http://somewhere/JohnSmith>, VCARD.FN, "John Smith")
(<http://somewhere/JohnSmith>, VCARD.N, _)
(_, VCARD.Given, "John")
(_, VCARD.Family, "Smith")

• Note that the _ symbol represents a blank node


Apache Clerezza

• Provides an API that is independent of the different triple stores it supports
• Its API provides a model to represent RDF graphs and to manipulate those graphs
• Also provides a SPARQL endpoint to query the stored knowledge


Apache Clerezza Code Snippet

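Simple code snippet adding two triples to the graph:

    // Imports assume the org.apache.clerezza.rdf.core packages that provide MGraph and UriRef.
    import org.apache.clerezza.rdf.core.LiteralFactory;
    import org.apache.clerezza.rdf.core.MGraph;
    import org.apache.clerezza.rdf.core.UriRef;
    import org.apache.clerezza.rdf.core.impl.SimpleMGraph;
    import org.apache.clerezza.rdf.core.impl.TripleImpl;

    String base = "http://www.example.org#";
    MGraph g = new SimpleMGraph();

    // (JohnSmith, rdf:type, foaf:Person), with the prefixes written out as full URIs
    g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef("http://www.w3.org/1999/02/22-rdf-syntax-ns#type"),
        new UriRef("http://xmlns.com/foaf/0.1/Person")));

    // (JohnSmith, vcard:FN, "John")
    g.add(new TripleImpl(
        new UriRef(base + "JohnSmith"),
        new UriRef("http://www.w3.org/2001/vcard-rdf/3.0#FN"),
        LiteralFactory.getInstance().createTypedLiteral("John")));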


Linked Data

• Interrelated datasets on the Web that computers can explore
• Has a standard format through which it is accessed and managed
• Provides integration and reasoning over a huge amount of data on the Web


• Four famous principles of linked data formulated by Tim Berners-Lee:
  • Use URIs as names for things
  • Use HTTP URIs so that people can dereference them
  • When a URI is dereferenced, provide useful information in a standard format (RDF, SPARQL)
  • Provide links to other URIs to make discovery of related data possible


Linking Open Data Project

• Is a W3C SWEO project
• Aims to make data freely available to everyone
• Aims to publish open datasets as RDF and to set semantic relationships between them
  • Serves information in a machine-readable format
  • Enriches content
  • Reduces duplication
• Linked datasets are increasing rapidly
  • A large number of datasets are already linked


Linked Datasets As of October 2008


Linked Datasets As of September 2010


Linked Datasets As of 2011


Access Data In The Cloud

• Follow the RDF links representing the "things"
• SPARQL endpoints
• Ready-to-use software to discover linked data (see the next slide)


Linked Data Applications

• Many applications are built on top of linked data
  • Tabulator
  • Marbles
  • OpenLink RDF Browser
  • …
• Just google
  • RDF crawlers
  • RDF browsers
• Also see the following link, which lists a number of linked data applications: http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData/Applications


Available SPARQL Endpoints

• http://dbpedia.org/sparql
• http://www4.wiwiss.fu-berlin.de/dblp/
• To find SPARQL endpoints that provide data about a certain URI, see http://void.rkbexplorer.com/endpoint-search/
• See also a list of live SPARQL endpoints: http://www.w3.org/wiki/SparqlEndpoints


References

• http://www.w3.org/TR/rdf-sparql-query
• http://jena.sourceforge.net/tutorial/RDF_API/index.html
• http://www.slideshare.net/ldodds/sparql-tutorial
• http://www.slideshare.net/shamod/a-hands-on-overview-of-the-semantic-web?src=related_normal&rel=1702851
• http://www.cambridgesemantics.com/2008/09/sparql-by-example
• http://linkeddata-specs.info/
• http://www.w3.org/wiki/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
• http://www.bioontology.org/wiki/images/6/6a/Triple_Stores.pdf
• Sofia Alexaki, Vassilis Christophides, Gregory Karvounarakis, Dimitris Plexousakis, Karsten Tolle. The ICS-FORTH RDFSuite: Managing Voluminous RDF Description Bases, SemWeb, 2001
• Jeen Broekstra, Arjohn Kampman and Frank van Harmelen. Sesame: A Generic Architecture for Storing and Querying RDF and RDF Schema, Proceedings of the First International Semantic Web Conference, 2002
• Kevin Wilkinson, Craig Sayers, Harumi A. Kuno, Dave Reynolds. Efficient RDF Storage and Retrieval in Jena2, Proceedings of SWDB'03, The First International Workshop on Semantic Web and Databases
• http://jena.sourceforge.net/DB/index.html
• http://virtuoso.openlinksw.com/
