Authors: Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, Shunsuke Uemura


Upload: vian

Post on 07-Jan-2016


A Path-based Relational RDF Database

TRANSCRIPT

Page 1:

Authors: Akiyoshi Matono, Toshiyuki Amagasa, Masatoshi Yoshikawa, Shunsuke Uemura

Page 2:

Semantic Web

• As the World Wide Web grows ever larger and more complex, the Semantic Web has emerged as a vision of the next generation of the Web. Compared with the current Web, the Semantic Web makes human-to-machine and machine-to-machine interactions more intelligent by providing metadata of good quality and quantity about Web resources.

Page 3:

RDF

• The Resource Description Framework (RDF), the core of the Semantic Web, describes its metadata and semantics. As the Semantic Web becomes more widely used, the storage and retrieval of RDF data become correspondingly important.

• RDF is commonly used for large data sets, such as ontologies or dictionaries. If we use conventional RDF databases to process such large data, some problems may emerge.

Page 4:

RDF

• RDF Schema is a specification for defining schematic information of RDF data. It lets developers define a particular vocabulary for RDF data and specify the kinds of objects.

• RDF data can be decomposed into statements, so it can also be modeled as a directed graph, where nodes and arcs represent resources and relationships respectively. It is composed of RDF meta-schema data, RDF schema data and RDF data, and each group is an instance of the preceding one.
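The statement-to-graph view described on this slide can be sketched in a few lines of Python. This is only an illustration: the resource names (`#picasso`, `#guernica`) and the helper `to_graph` are hypothetical and do not come from the slides.

```python
from collections import defaultdict

# Hypothetical statements as (subject, predicate, object) triples.
statements = [
    ("#picasso", "#paints", "#guernica"),
    ("#guernica", "#title", "Guernica"),
    ("#picasso", "#name", "Pablo Picasso"),
]


def to_graph(triples):
    """Build an adjacency map: node -> list of (arc label, target node)."""
    graph = defaultdict(list)
    for subject, predicate, obj in triples:
        # Each statement becomes one labeled arc from subject to object.
        graph[subject].append((predicate, obj))
    return dict(graph)


graph = to_graph(statements)
```

Each subject and object becomes a node, and each predicate becomes the label of an arc between them.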

Page 5:

The conventional approach

• Flat storage

• Problems? Any query that involves RDF schema information will not be handled properly.

Page 6:

The conventional approach

• Creates relational tables for classes and properties, storing resources according to their classes.

• Problems? It doesn't make any distinction between schema and data, so problems arise when you perform a schema query rather than an RDF data query.

Page 7:

The conventional approach

• Stores the subject, predicate and object as keys in three tables. Using these keys, we can retrieve the corresponding statements.

• Problems?
– Poor performance when processing path-based queries.
– Join operations make the query string longer.
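To see why joins become a concern for path-based queries, consider the following simplified sketch. It assumes a single hypothetical `statement` table rather than the three key tables mentioned on the slide, and the data rows are made up. Every additional step in the path adds one more self-join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Simplified, hypothetical layout: one row per RDF statement.
conn.execute("CREATE TABLE statement (subject TEXT, predicate TEXT, object TEXT)")
conn.executemany(
    "INSERT INTO statement VALUES (?, ?, ?)",
    [
        ("#picasso", "#paints", "#guernica"),
        ("#guernica", "#title", "Guernica"),
        ("#picasso", "#name", "Pablo Picasso"),
    ],
)

# A two-step path (#paints followed by #title) already needs one self-join;
# a path of n steps would need n - 1 of them.
two_step = """
    SELECT s2.object
    FROM statement AS s1
    JOIN statement AS s2 ON s1.object = s2.subject
    WHERE s1.predicate = '#paints' AND s2.predicate = '#title'
"""
titles = [row[0] for row in conn.execute(two_step)]
```

The query text grows with the path length, which is the second problem the slide lists.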

Page 8:

Sub graphs

• Graph CI, inheritance relationships between classes
• Graph PI, inheritance relationships between properties
• Graph T, a single-labeled directed acyclic graph
• Graph DR, domain (rdfs:domain) or range (rdfs:range) of each property
• Graph G, consisting of all the remaining statements not included in the above sub graphs
• Separates RDF schema information and RDF instance data
• Simpler structure, easier to store
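A minimal sketch of how statements might be routed into the five sub graphs by their predicate. The mapping of CI and PI to `rdfs:subClassOf` and `rdfs:subPropertyOf`, and of T to `rdf:type`, is an assumption made for this illustration; the slide only names `rdfs:domain` and `rdfs:range` explicitly. The function `split_subgraphs` and the sample triples are hypothetical.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"

# Assumed predicate-to-sub-graph mapping (only DR is stated on the slide).
PREDICATE_TO_GRAPH = {
    RDFS + "subClassOf": "CI",
    RDFS + "subPropertyOf": "PI",
    RDF + "type": "T",
    RDFS + "domain": "DR",
    RDFS + "range": "DR",
}


def split_subgraphs(triples):
    """Partition (subject, predicate, object) triples into CI, PI, T, DR and G."""
    graphs = {name: [] for name in ("CI", "PI", "T", "DR", "G")}
    for triple in triples:
        # Anything that is not a recognized schema predicate falls into G.
        graphs[PREDICATE_TO_GRAPH.get(triple[1], "G")].append(triple)
    return graphs


triples = [
    ("#Painter", RDFS + "subClassOf", "#Artist"),
    ("#paints", RDFS + "domain", "#Painter"),
    ("#picasso", RDF + "type", "#Painter"),
    ("#picasso", "#paints", "#guernica"),
]
subgraphs = split_subgraphs(triples)
```

Because every statement lands in exactly one sub graph, schema information and instance data end up separated, as the slide describes.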

Page 9:

Path expression

Store the arc paths of the graphs in a path table in the relational database.
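The following sketch shows one way arc paths could be enumerated before being written to a path table. The textual format (arc labels joined by `<`, most recent arc first) is inferred from the `'#title<#paints'` literal used on the query processing slide and should be read as an assumption; the function `path_expressions` and the sample graph are hypothetical.

```python
def path_expressions(graph, roots):
    """Return {node: sorted path expressions} for an acyclic graph.

    graph maps node -> list of (arc label, target node).
    """
    paths = {}

    def visit(node, labels):
        if labels:
            # Assumed format: most recent arc first, joined by '<'.
            paths.setdefault(node, set()).add("<".join(reversed(labels)))
        for label, target in graph.get(node, []):
            visit(target, labels + [label])

    for root in roots:
        visit(root, [])
    return {node: sorted(exprs) for node, exprs in paths.items()}


graph = {
    "#picasso": [("#paints", "#guernica")],
    "#guernica": [("#title", "Guernica")],
}
paths = path_expressions(graph, ["#picasso"])
```

Each distinct expression would get one row in the path table, and each resource would reference the row for the path that reaches it. The recursion terminates only on acyclic input, which matches the acyclicity assumption noted in the summary slides.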

Page 10:

Extended interval numbering scheme

• Add a virtual root if the graph has more than one root node
• Add new node(s) for any node that is reachable through multiple paths
• Each node is assigned (preorder, postorder, depth)
• v is an ancestor of u: pre(v) < pre(u) ∧ post(v) > post(u), where v and u are nodes in the graph.
• v is a parent of u: v is an ancestor of u, and depth(u) – depth(v) = 1
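The numbering and the two tests above can be sketched as follows. This assumes the graph has already been turned into a tree (virtual root added, multiply-reachable nodes duplicated), so it covers only the plain interval numbering step. The class names and helper functions are hypothetical.

```python
def number_tree(children, root):
    """Assign (pre, post, depth) to every node of a tree by depth-first search."""
    labels = {}
    counters = {"pre": 0, "post": 0}

    def visit(node, depth):
        counters["pre"] += 1
        pre = counters["pre"]          # preorder rank: assigned on entry
        for child in children.get(node, []):
            visit(child, depth + 1)
        counters["post"] += 1          # postorder rank: assigned on exit
        labels[node] = (pre, counters["post"], depth)

    visit(root, 0)
    return labels


def is_ancestor(labels, v, u):
    """v is an ancestor of u: pre(v) < pre(u) and post(v) > post(u)."""
    return labels[v][0] < labels[u][0] and labels[v][1] > labels[u][1]


def is_parent(labels, v, u):
    """v is a parent of u: an ancestor exactly one level above."""
    return is_ancestor(labels, v, u) and labels[u][2] - labels[v][2] == 1


children = {"Resource": ["Artist", "Artifact"], "Artist": ["Painter"]}
labels = number_tree(children, "Resource")
```

With these labels, ancestor and parent checks reduce to integer comparisons, which is what makes them expressible as simple SQL predicates.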

Page 11:

Algorithm

Page 12:

Page 13:

Relational database schema

Page 14:

Query processing

Path query - Find the title of something painted by someone:

SELECT r.resourceName
FROM path AS p, resource AS r
WHERE p.pathID = r.pathID
AND p.pathexp = '#title<#paints'

Schema query - Find the names of the classes whose direct super class is http://www.w3.org/2000/01/rdf-schema#Resource:

SELECT c1.className
FROM class AS c, class AS c1
WHERE c.pre < c1.pre
AND c.post > c1.post
AND c.depth = c1.depth - 1
AND c.className = 'http://www.w3.org/2000/01/rdf-schema#Resource'
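Both queries can be tried against a toy database. In the sketch below, the table and column names are taken from the queries on this slide, but the column types and all data rows are assumptions made up for illustration; the actual schema from the presentation is not reproduced in the transcript.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE path (pathID INTEGER PRIMARY KEY, pathexp TEXT);
    CREATE TABLE resource (resourceName TEXT, pathID INTEGER);
    CREATE TABLE class (className TEXT, pre INTEGER, post INTEGER, depth INTEGER);
""")

RESOURCE = "http://www.w3.org/2000/01/rdf-schema#Resource"

# Hypothetical rows; the (pre, post, depth) values follow a depth-first numbering.
conn.executemany("INSERT INTO path VALUES (?, ?)",
                 [(1, "#paints"), (2, "#title<#paints")])
conn.executemany("INSERT INTO resource VALUES (?, ?)",
                 [("#guernica", 1), ("Guernica", 2)])
conn.executemany("INSERT INTO class VALUES (?, ?, ?, ?)",
                 [(RESOURCE, 1, 4, 0), ("#Artist", 2, 2, 1),
                  ("#Painter", 3, 1, 2), ("#Artifact", 4, 3, 1)])

# Path query: a single join, regardless of the path length.
titles = [row[0] for row in conn.execute("""
    SELECT r.resourceName
    FROM path AS p, resource AS r
    WHERE p.pathID = r.pathID
    AND p.pathexp = '#title<#paints'
""")]

# Schema query: rows that are children of the Resource row in the numbering.
children = sorted(row[0] for row in conn.execute("""
    SELECT c1.className
    FROM class AS c, class AS c1
    WHERE c.pre < c1.pre
    AND c.post > c1.post
    AND c.depth = c1.depth - 1
    AND c.className = ?
""", (RESOURCE,)))
```

Note that the path query needs only one join between `path` and `resource` even though the path has two steps, in contrast to the self-joins a statement table would require.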

Page 15:

Summary & Conclusion

• The main motivation for the study is to improve performance when retrieving RDF data. Path-based querying of relational RDF data is efficient because it reduces the number of joins. The approach also covers both RDF data without a schema and RDF data with a schema. The paper assumes that most RDF data is acyclic. Another point to note is the extraction of the data into 5 sub graphs.

Page 16:

• Data is stored based on the 5 sub graphs. The extended interval numbering scheme is used to detect parent-child relationships, resulting in fast retrieval of super classes and sub classes.

• The paper notes that most queries on RDF data are queries that detect sub graphs matching a given graph. They are also, in general, queries that detect the set of nodes reachable via a given path expression. So RDF data can be handled more efficiently using path-based queries.

Page 17:

Why Relational RDF…

• Because the flat and hash approaches do not make any distinction between schema information and resource descriptions.

• The schema approach is able to process RDF-based queries, but what about schema-less RDF data? There is also a big overhead in maintaining the schema as it evolves.

• Hence, use a relational DB and store the RDF data and the schema in separate tables.

Page 18:

Conclusions:

As both RDF schema and RDF instance data are stored in distinct relational tables, we

1. Can handle schema-less RDF data.

2. Can process schema-based queries (using the extended interval numbering scheme).

3. Can process path-based expressions, as the RDF data is stored in the relational DB based on path expressions.

Page 19:

• Also, the performance improvement becomes dramatic as the length of the path expression increases. Refer to the graph on Page 6.

• Problems: sub graphing; the assumption of acyclic data; no mention of ETL if we want to convert from a conventional store; not easy to query (compared with SQL).