ontological conjunctive query answering over large, semi-structured knowledge bases

Bruno Paiva Lima da Silva, GraphIK Research Team, LIRMM. FOSDEM 2012 - February 5th.

Upload: graphdevroom

Post on 21-Jun-2015

DESCRIPTION

Ontological Conjunctive Query Answering is enjoying renewed interest in knowledge systems that allow for expressive inferences. In the Semantic Web domain in particular, this problem is known as Ontology-Based Data Access. Given a knowledge base made of factual knowledge (very often a relational database) and universal knowledge (an ontology), the problem consists in checking whether a conjunctive query has an answer in the knowledge base. This problem has been studied successfully in the past; however, the emergence of large, semi-structured knowledge bases and the increasing interest in non-relational databases have slightly changed its nature. This presentation highlights the following aspects. First, we introduce the problem and the way we have chosen to address it. We then discuss how the size of the knowledge base impacts our approach. Next, we introduce the ALASKA platform, a framework for performing knowledge representation and reasoning operations over heterogeneously stored data. Finally, we present preliminary results obtained by comparing the efficiency of existing storage systems when storing knowledge bases of different sizes on disk, and discuss future implications.

TRANSCRIPT

Page 1: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases

Introduction Research Problem ALASKA platform Tests & Results Current & Future work Questions

Ontological Conjunctive Query Answering over Large, Semi-Structured Knowledge Bases

Bruno Paiva Lima da Silva

GraphIK Research Team, LIRMM

FOSDEM 2012 - February 5th

Ontological Conjunctive Query Answering over Large, Semi-Structured Knowledge Bases

PAIVA LIMA DA SILVA Bruno ([email protected]) 1 / 32


1 Introduction

2 Research Problem

3 ALASKA platform

4 Tests & Results

5 Current & Future work

6 Questions


About me

Bruno PAIVA LIMA DA SILVA

2nd year PhD Student @GraphIK Research Team

(http://www2.lirmm.fr/graphik)

GraphIK team is located at LIRMM, Montpellier, France.

Research topics: Knowledge representation (interrogation of knowledge bases), record linkage & argumentation problems


Ontological Conjunctive Query Answering

Problem:

Ontological Conjunctive Query Answering (OCQA) [also known as Ontology-Based Data Access (OBDA)]

Given:

Knowledge base (KB)

Factual knowledge
Ontology (universal knowledge)

(Boolean) conjunctive query

OCQA consists in verifying whether there is an answer to the query in the KB (that is, whether the query can be deduced from the KB).


Example

Let us describe the problem through a quick example:

Factual knowledge:
Alice and Bob are animals.
Alice is a clownfish.
Bob is a parrot.

Ontology:
“A clownfish is a fish.”
“A fish swims.”
“A parrot is a bird.”
“A bird flies.”

Query #1: Is there a clownfish?
Yes, Alice.


Example

Query #2: Is there an animal who flies?

Factual knowledge:
Alice and Bob are animals.
Alice is a clownfish.
Bob is a parrot.

Facts added by applying the ontology rules:
Alice is a fish.
Alice swims.
Bob is a bird.
Bob flies.

Ontology:
“A clownfish is a fish.”
“A fish swims.”
“A parrot is a bird.”
“A bird flies.”

Answer: Yes, Bob.
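The way the rules enrich the facts before Query #2 is answered can be sketched in Java. This is only an illustration under stated assumptions, not code from the talk: the class name `ForwardChainingSketch`, the encoding of an atom as a list of strings (predicate first), and the restriction to unary rules of the form p(x) → q(x) are all choices made for this example.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: forward chaining over unary rules p(x) -> q(x).
class ForwardChainingSketch {

    // Applies every rule {body, head} until no new atom is produced (saturation).
    static Set<List<String>> saturate(Set<List<String>> facts, List<String[]> rules) {
        Set<List<String>> result = new HashSet<>(facts);
        boolean changed = true;
        while (changed) {
            changed = false;
            // Iterate over a snapshot so that new atoms can be added safely.
            for (List<String> fact : new ArrayList<>(result)) {
                for (String[] rule : rules) {
                    if (fact.size() == 2 && fact.get(0).equals(rule[0])) {
                        // add() returns true only when the derived atom is new.
                        changed |= result.add(List.of(rule[1], fact.get(1)));
                    }
                }
            }
        }
        return result;
    }

    // Boolean query: is there an x such that animal(x) and flies(x)?
    static boolean existsFlyingAnimal(Set<List<String>> facts) {
        for (List<String> fact : facts) {
            if (fact.get(0).equals("animal")
                    && facts.contains(List.of("flies", fact.get(1)))) {
                return true;
            }
        }
        return false;
    }

    // The four initial facts of the example.
    static Set<List<String>> exampleFacts() {
        return new HashSet<>(List.of(
                List.of("animal", "Alice"), List.of("animal", "Bob"),
                List.of("clownfish", "Alice"), List.of("parrot", "Bob")));
    }

    // The four rules of the example, each written as {body, head}.
    static List<String[]> exampleRules() {
        return List.of(
                new String[] {"clownfish", "fish"}, new String[] {"fish", "swims"},
                new String[] {"parrot", "bird"}, new String[] {"bird", "flies"});
    }
}
```

On the initial facts alone the query has no answer; it only succeeds once the facts have been saturated with the rules.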


First-Order Logic

We use a decidable subset of First-Order Logic (FOL) to represent the problem:

Definitions:

Terms: Alice, Bob
Predicates: flies(x), swims(x), friend(x,y), between(x,y,z)
Atoms: parrot(Bob), friend(Alice,Bob)
Rules: ∀x [hypothesis] bird(x) → [conclusion] flies(x)

According to this formalism, we have:

Factual knowledge: conjunctions of atoms

Ontology: set of rules

Conjunctive Query: conjunctions of atoms
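As an illustration, the animal example can be written in this formalism as follows. The predicate names are assumptions chosen to mirror the English sentences, and the query is read as existentially closed (a Boolean query).

```latex
\begin{align*}
F &= \mathit{animal}(\mathit{Alice}) \land \mathit{animal}(\mathit{Bob})
     \land \mathit{clownfish}(\mathit{Alice}) \land \mathit{parrot}(\mathit{Bob}) \\
O &= \{\, \forall x\,(\mathit{clownfish}(x) \rightarrow \mathit{fish}(x)),\;
         \forall x\,(\mathit{fish}(x) \rightarrow \mathit{swims}(x)), \\
  &\qquad \forall x\,(\mathit{parrot}(x) \rightarrow \mathit{bird}(x)),\;
         \forall x\,(\mathit{bird}(x) \rightarrow \mathit{flies}(x)) \,\} \\
Q &= \exists x\,(\mathit{animal}(x) \land \mathit{flies}(x))
\end{align*}
```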


Equivalences

Depending on the chosen set of rules O, our problem is semantically equivalent to other problems that have been, or are being, studied in the literature.

If O is empty, our problem becomes equivalent to the entailment problem in the RDF language.

If O is a set of ∀-rules, we enter the scope of RDFS, Datalog and Conceptual Graphs (CGs).
[“if x has a car, then x has a driving licence”]

If O is a set of ∀∃-rules, we obtain an equivalence to the problems found in Datalog± and CGs with rules.
[“if x is a human, there exists y, another human, who is its parent”]
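The two bracketed examples can be written as formulas to make the difference between the rule classes explicit. The predicate names are assumptions chosen for illustration; the point is that the second rule introduces an existentially quantified variable in its conclusion.

```latex
\begin{align*}
\text{$\forall$-rule:} \quad
  & \forall x\,\bigl(\mathit{hasCar}(x) \rightarrow \mathit{hasDrivingLicence}(x)\bigr) \\
\text{$\forall\exists$-rule:} \quad
  & \forall x\,\bigl(\mathit{human}(x) \rightarrow
      \exists y\,(\mathit{human}(y) \land \mathit{parent}(y, x))\bigr)
\end{align*}
```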


Deduction

F |= Q
... iff there is a substitution S that maps every term of the query to a term in the facts, such that every atom of Q is mapped to an atom of F.

Problem: Finding substitutions
(Also known as ENTAILMENT)

{F, O} |= Q
... iff, after F has been enriched by O, there is such a substitution S from the query into the enriched facts.

Problem: Applying rules, finding substitutions
(Also known as RULE-ENTAILMENT)
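The search for a substitution can be sketched as a simple backtracking procedure. This is a minimal illustration, not the algorithm used by the authors: the class name `HomomorphismSketch`, the list-of-strings encoding of atoms (predicate first), and the convention that variables start with `?` are assumptions made for this example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: decides F |= Q by searching for a homomorphism from Q to F.
class HomomorphismSketch {

    // Convention assumed here: terms starting with '?' are variables.
    static boolean isVariable(String term) {
        return term.startsWith("?");
    }

    static boolean entails(List<List<String>> facts, List<List<String>> query) {
        return extend(facts, query, 0, new HashMap<>());
    }

    // Tries to map query atom number i, and all following atoms, into the facts.
    private static boolean extend(List<List<String>> facts, List<List<String>> query,
                                  int i, Map<String, String> substitution) {
        if (i == query.size()) {
            return true; // every query atom has an image in the facts
        }
        for (List<String> fact : facts) {
            Map<String, String> candidate = unify(query.get(i), fact, substitution);
            if (candidate != null && extend(facts, query, i + 1, candidate)) {
                return true;
            }
        }
        return false; // nothing fits under the current substitution: backtrack
    }

    // Returns the substitution extended so that q maps onto f, or null if impossible.
    private static Map<String, String> unify(List<String> q, List<String> f,
                                             Map<String, String> substitution) {
        if (q.size() != f.size() || !q.get(0).equals(f.get(0))) {
            return null; // different predicate or arity
        }
        Map<String, String> extended = new HashMap<>(substitution);
        for (int k = 1; k < q.size(); k++) {
            String term = q.get(k);
            String image = isVariable(term) ? extended.get(term) : term;
            if (image == null) {
                extended.put(term, f.get(k)); // unbound variable: bind it
            } else if (!image.equals(f.get(k))) {
                return null; // clash with a constant or an earlier binding
            }
        }
        return extended;
    }
}
```

Each recursive step performs the elementary lookups (finding candidate atoms for a predicate, checking a binding), which is why their cost dominates on large fact bases.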


Rule application

There are two distinct methods for applying rules:

Forward chaining (seen in the example):
The knowledge base is enriched by rule application. Queries are evaluated against the facts (homomorphism computation) once no more information can be added (the base is saturated).

Backward chaining:
The initial query is decomposed/rewritten according to the rules of the ontology. The resulting queries are then evaluated against the knowledge base, which is left unmodified.
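Backward chaining can be illustrated with a small query-rewriting sketch. It is deliberately restricted to unary rules p(x) → q(x), written as `{body, head}` pairs; the class name `BackwardChainingSketch` and the list-of-strings encoding of atoms are assumptions for this example, and real rewriting algorithms for richer rules are considerably more involved.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch: rewrites a conjunctive query using unary rules p(x) -> q(x).
class BackwardChainingSketch {

    // Returns the original query plus every query obtained by replacing
    // an atom head(t) with body(t) for some rule {body, head}.
    static Set<List<List<String>>> rewrite(List<List<String>> query, List<String[]> rules) {
        Set<List<List<String>>> seen = new LinkedHashSet<>();
        Deque<List<List<String>>> todo = new ArrayDeque<>();
        seen.add(query);
        todo.add(query);
        while (!todo.isEmpty()) {
            List<List<String>> current = todo.poll();
            for (int i = 0; i < current.size(); i++) {
                List<String> atom = current.get(i);
                for (String[] rule : rules) {
                    if (atom.size() == 2 && atom.get(0).equals(rule[1])) {
                        List<List<String>> rewritten = new ArrayList<>(current);
                        rewritten.set(i, List.of(rule[0], atom.get(1)));
                        // Only explore rewritings that have not been produced before.
                        if (seen.add(rewritten)) {
                            todo.add(rewritten);
                        }
                    }
                }
            }
        }
        return seen;
    }
}
```

For the query "animal(?x), flies(?x)" and the rules of the example, the rewritings replace flies by bird and then by parrot; each rewritten query is then evaluated against the unmodified facts, and the answer is yes if any of them succeeds.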


Elementary operations

The efficiency of the substitution-finding and rule-application steps depends on the efficiency of some elementary operations:

Finding substitutions (homomorphism):

Retrieving a term in the knowledge base.

Retrieving the adjacent terms (neighbourhood) of a given term.

Checking the existence of an atom with given terms.

Rule application:

Finding substitutions.

Inserting new pieces of information incrementally (and not all at once).
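The elementary operations listed above can be made concrete with a small in-memory structure. This is a hedged sketch only: the class `InMemoryFactSketch` and its method names are hypothetical, and an on-disk system would implement the same four operations with very different costs.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch of the elementary operations over an in-memory fact base.
class InMemoryFactSketch {
    private final Set<List<String>> atoms = new HashSet<>();
    private final Map<String, Set<String>> neighbours = new HashMap<>();

    // Incremental insertion of one atom (predicate followed by its terms).
    void add(List<String> atom) {
        if (!atoms.add(atom)) {
            return; // already stored
        }
        List<String> terms = atom.subList(1, atom.size());
        for (String term : terms) {
            Set<String> adjacent = neighbours.computeIfAbsent(term, k -> new HashSet<>());
            for (String other : terms) {
                if (!other.equals(term)) {
                    adjacent.add(other);
                }
            }
        }
    }

    // Retrieving a term in the knowledge base.
    boolean containsTerm(String term) {
        return neighbours.containsKey(term);
    }

    // Retrieving the terms that share at least one atom with the given term.
    Set<String> neighbourhood(String term) {
        return neighbours.getOrDefault(term, Collections.emptySet());
    }

    // Checking the existence of an atom with given terms.
    boolean containsAtom(List<String> atom) {
        return atoms.contains(atom);
    }
}
```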


Overview

Until very recently...

Factual knowledge = RDBMS

However, several new factors have appeared, changing the nature of the problem...


New factors

Semi-structured data (Abiteboul, 1997)
Knowledge bases with: “irregular, partial or implicit structure”, “very large schema”, “schema is ignored”, “schema evolving rapidly”, “difficult distinction between schema and data”, etc.

Emergence of semi-structured knowledge bases over the web.

KBs can now be very large (see the Semantic Web).

For our work: Large → Does not fit in main memory.


State of the art

What we already know about the subject...

RDBs handle data stored in secondary memory very well. However:

Using SQL for querying is not the best solution, as it relies on joins, which become very costly on larger queries.
Homomorphism algorithms use SQL statements for elementary operations: their complexity also depends on the size of the tables.

Graph homomorphism algorithms work very well with graphs stored in memory. They have not yet been tested on graph databases (GDBs).


Objectives

Three different approaches to this problem exist in the literature:

1 Approximate and probabilistic algorithms.

2 Algorithm optimization.

3 Analysis of storage methods.

We try to show that items 2 and 3 are tightly correlated. How?

Investigating different storage models (RDBs, GDBs & Triple Stores) and their internal data structures.

Using an abstract architecture to compare their efficiency on elementary operations.

Writing an efficient algorithm for deduction.


ALASKA platform

Abstract Logic-based Architecture Storage systems & Knowledge base Analysis

Its goal is to make it possible to perform OCQA in a logical, generic manner over existing, heterogeneous storage systems.

Graph to RDB, RDB to Graph, all using an intermediary translation into logic.


Features & details

Multi-layered architecture: the program goes from higher-level operations down to disk I/O functions.

Classes and interfaces ensure that all connected storage systems expose the same methods, using a common data type (based on FOL).

Written in Java: very easy to plug several pieces of code in, but with a significant loss in speed and efficiency.

Systems already connected: TSs (Jena, Sesame), RDBs (MySQL, SQLite), GDBs (DEX, Neo4j) - non-definitive list.

All layers below the application layer serve as the lower-level part for the computation of OCQA (and other KR problems).
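The idea of giving every connected storage system the same methods can be sketched with one interface and two toy backends. Everything here is hypothetical: `FactStoreSketch`, `TableBackedStore`, `EdgeBackedStore` and `StoreClientSketch` are names invented for this example and do not reproduce the actual ALASKA interfaces, whose signatures are not given in the slides.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical common contract; the real ALASKA interfaces may differ.
interface FactStoreSketch {
    void store(List<String> atom);        // predicate followed by its terms
    boolean contains(List<String> atom);
}

// Relational-style backend: one "table" (list of rows) per predicate.
class TableBackedStore implements FactStoreSketch {
    private final Map<String, List<List<String>>> tables = new HashMap<>();

    public void store(List<String> atom) {
        tables.computeIfAbsent(atom.get(0), p -> new ArrayList<>())
              .add(atom.subList(1, atom.size()));
    }

    public boolean contains(List<String> atom) {
        List<List<String>> rows = tables.get(atom.get(0));
        return rows != null && rows.contains(atom.subList(1, atom.size()));
    }
}

// Graph-style backend: atoms kept as labelled hyperedges over term nodes.
class EdgeBackedStore implements FactStoreSketch {
    private final Set<String> nodes = new HashSet<>();
    private final Set<List<String>> edges = new HashSet<>();

    public void store(List<String> atom) {
        nodes.addAll(atom.subList(1, atom.size()));
        edges.add(atom);
    }

    public boolean contains(List<String> atom) {
        return edges.contains(atom);
    }
}

class StoreClientSketch {
    // Application-level code depends only on the interface, not on the backend.
    static int loadAll(FactStoreSketch store, List<List<String>> atoms) {
        int stored = 0;
        for (List<String> atom : atoms) {
            store.store(atom);
            stored++;
        }
        return stored;
    }
}
```

Because `loadAll` only sees the interface, the same loading and querying code can be timed against each backend, which is the kind of comparison the platform is meant to support.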


Class diagram

Figure: Class diagram of ALASKA architecture. The diagram shows four layers: Application layer (1), Abstract layer (2), Translation layer (3) and Data layer (4). Elements shown: KRR operations; the IFact, IAtom and ITerm interfaces; the Predicate, Term and Atom classes; the GDB, RDB and TS connectors; and the GDB, RDB and TS storage systems.


ALASKA for OCQA

Our current goal is to use ALASKA to measure the efficiency of the connected systems on elementary operations:

Storage tests:

Measuring the time taken and the size on disk when storing smaller, then larger, knowledge bases.

Querying tests:

Measuring the time that each system takes to answer a set of queries using different algorithms/query engines.

Once both tests are done, there will be a result analysis stage:

Is the best system for storage also the best for querying?

Is there a system that performs excellently on a certain task?


Storage algorithm

input = getInputManager();

fact = new XFact(DB location);   // A fact is created or loaded. X ∈ {DEX, Sqlite, Neo4j, MySQL, etc.}

atoms = input.parse(content);    // Content is parsed; an atom iterator is returned.

fact.store(atoms);               // Atoms are added to the fact according to the storage type.


Storage algorithm

Case for a graph database:

Algorithm 1: KB to Hypergraph
Input: A, an atom iterator
Output: a boolean value

begin
    g ← empty graph;
    foreach Atom a in A do
        foreach Term t in a.terms do
            if !exists node with label t then
                if t is a constant term then t ← c:t;
                else t ← v:t;
        add hyperedge (t1,...,tn) with label a.predicate to g;
    return true;
end
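One possible Java reading of Algorithm 1 is sketched below. It is an interpretation, not the platform's code: the class `HypergraphStoreSketch` is hypothetical, atoms are encoded as lists of strings (predicate first), variables are assumed to start with `?`, and the hypergraph is kept in plain in-memory collections rather than in a graph database.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Algorithm 1: atoms become labelled hyperedges.
class HypergraphStoreSketch {
    final Set<String> nodes = new LinkedHashSet<>();
    final List<List<String>> hyperedges = new ArrayList<>();

    // Prefixes a term with "c:" (constant) or "v:" (variable).
    static String label(String term) {
        return (term.startsWith("?") ? "v:" : "c:") + term;
    }

    boolean store(Iterator<List<String>> atoms) {
        while (atoms.hasNext()) {
            List<String> atom = atoms.next();
            List<String> edge = new ArrayList<>();
            edge.add(atom.get(0)); // the edge label is the predicate
            for (String term : atom.subList(1, atom.size())) {
                String node = label(term);
                nodes.add(node);   // a Set ignores labels that already exist
                edge.add(node);
            }
            hyperedges.add(edge);
        }
        return true;
    }
}
```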


Storage algorithm

Case for a relational database:

Algorithm 2: KB to RDB
Input: A, an atom iterator
Output: a boolean value

begin
    foreach Atom a in A do
        p ← a.predicate;
        if !exists table with label p then
            create table with name p;
        foreach Term t in a.terms do
            if t is a constant term then t ← c:t;
            else t ← v:t;
        insert (t1,...,tn) into table p;
    return true;
end
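Algorithm 2 can be illustrated by generating the SQL statements it implies. This sketch makes several assumptions that are not in the slides: the class name `RelationalStoreSketch`, the column names `t1..tn`, the `TEXT` column type, and the `?` prefix for variables. It only builds strings; a real connector would execute them through JDBC with prepared statements and validated table names.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.Iterator;
import java.util.List;
import java.util.Set;

// Hypothetical sketch of Algorithm 2: one table per predicate, one row per atom.
class RelationalStoreSketch {

    static List<String> toSql(Iterator<List<String>> atoms) {
        List<String> statements = new ArrayList<>();
        Set<String> knownTables = new HashSet<>();
        while (atoms.hasNext()) {
            List<String> atom = atoms.next();
            String table = atom.get(0);
            int arity = atom.size() - 1;
            if (knownTables.add(table)) { // first atom seen for this predicate
                StringBuilder columns = new StringBuilder();
                for (int i = 1; i <= arity; i++) {
                    columns.append(i > 1 ? ", " : "").append("t").append(i).append(" TEXT");
                }
                statements.add("CREATE TABLE " + table + " (" + columns + ")");
            }
            StringBuilder values = new StringBuilder();
            for (int i = 1; i <= arity; i++) {
                String term = atom.get(i);
                // Constants are prefixed with "c:", variables with "v:".
                String label = (term.startsWith("?") ? "v:" : "c:") + term;
                values.append(i > 1 ? ", " : "")
                      .append("'").append(label.replace("'", "''")).append("'");
            }
            statements.add("INSERT INTO " + table + " VALUES (" + values + ")");
        }
        return statements;
    }
}
```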


Workflow

Figure: Testing protocol workflow for storing a knowledge base in RDF. Elements shown, arranged across Layers (1) to (4): RDF File, Input Manager, RDF Parser, IFact Manager, IFact to GDB Translation, IFact to RDB Translation, Graph DB, Relational DB, Triple Store.


Input elements

For our tests, we use knowledge bases from the SP2B project:

Presented in 2008 at ISWC.

Initially a SPARQL benchmark.

Defines a set of queries that covers all SPARQL specifications.

Also features a knowledge base generator, inspired by the DBLP structure.

The generator is able to create bases of any size while maintaining the same structure.


Preliminary results

Using our platform, we have evaluated the insertion efficiency of different storage systems:

Figure: a Knowledge Base is transformed into an IFact, which is then stored in a Relational Database, a Graph Database and a Triples Store.


Issues

Results have shown that our method was not really appropriate for the size of the knowledge bases we aim to work with:

Parsing issues:

The amount of memory used by our program varies with the parsing method used.

Higher memory consumption at parsing time = less memory available for the storage system.

Transaction sizes:

Beyond a certain size, it is impossible to store all the information at once (most systems started swapping).

Creation of an atom buffer: information is processed in pieces, parsed and then stored in smaller transactions.

Garbage collection:

GC overhead limit errors occurred on almost every storage system for bases beyond 20M triples.

Recycling Java objects became mandatory: setting/resetting object attributes instead of creating/destroying objects.


Changes & improvements

The algorithm was then changed to the following version:

input = getInputManager();
fact = new XFact(DB location);
input.store(fact, content);

Calling the store method makes the parser create the atom buffer (array). An event is thrown when the parser finishes parsing a statement.

Event handling method:

if (buffer is full) { fact.store(buffer); position = 0; }   // The fact now only stores N (buffer size) atoms at a time.
buffer[position].setPredicate(stmtPredicate);
buffer[position].setTerms([stmtSubject, stmtObject]);       // Atoms in buffer are now recycled (number of atoms created/destroyed = buffer size).
position++;
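The buffering and recycling scheme can be sketched in runnable form as follows. The class `BufferedLoaderSketch`, its `MutableAtom` type and the `flushedBatchSizes` list (which stands in for the call to `fact.store(buffer)`) are assumptions made for this illustration. The explicit final `flush()` is also an addition: the slide does not show how the last, partially filled buffer is stored.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: fixed-size atom buffer whose objects are reused.
class BufferedLoaderSketch {

    // Mutable atom, so instances are overwritten instead of reallocated.
    static final class MutableAtom {
        String predicate;
        String subject;
        String object;
    }

    private final MutableAtom[] buffer;
    private int position = 0;

    // Records the size of each stored batch; stands in for fact.store(buffer).
    final List<Integer> flushedBatchSizes = new ArrayList<>();

    BufferedLoaderSketch(int bufferSize) {
        buffer = new MutableAtom[bufferSize];
        for (int i = 0; i < bufferSize; i++) {
            buffer[i] = new MutableAtom(); // the only atom allocations
        }
    }

    // Called once per parsed statement (the parser event of the slide).
    void onStatement(String subject, String predicate, String object) {
        if (position == buffer.length) {
            flush(); // buffer is full: store it in one small transaction
        }
        MutableAtom atom = buffer[position];
        atom.predicate = predicate;
        atom.subject = subject;
        atom.object = object;
        position++;
    }

    // Stores whatever is currently buffered; call it once more at end of input.
    void flush() {
        if (position > 0) {
            flushedBatchSizes.add(position);
        }
        position = 0;
    }
}
```

With a buffer of size N, memory use at parsing time stays bounded by N atoms regardless of the size of the knowledge base, which addresses both the transaction-size and the garbage-collection issues described earlier.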


Page 45: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases


New results


Page 46: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases


Interrogation tests

The next step in the project will be to perform interrogation tests.

Using the platform and a Datalog-to-SQL algorithm, we aim to evaluate the query performance of the selected storage systems:

For GDBs: comparing the efficiency of each system using the same algorithm.

For RDBs: comparing the efficiency of our algorithm against the native SQL interface.
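To give an idea of what a translation from a conjunctive query to SQL can look like, here is a minimal sketch. It is not the platform's Datalog-to-SQL algorithm. It assumes a single hypothetical table `triples(subject, predicate, object)`, binary atoms only, variables written with a leading `?`, and a dialect that accepts `LIMIT`. Each atom becomes one alias of the table, and each repeated variable becomes a join condition.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class QueryToSql {
    /**
     * Translates a Boolean conjunctive query into one SQL string.
     * Each atom is {predicate, subject, object}; terms starting with '?' are variables.
     * The table and column names are assumptions made for this sketch.
     */
    static String toSql(String[][] atoms) {
        if (atoms.length == 0) throw new IllegalArgumentException("empty query");
        List<String> from = new ArrayList<>();
        List<String> where = new ArrayList<>();
        Map<String, String> firstColumn = new HashMap<>(); // variable -> first column bound to it

        for (int i = 0; i < atoms.length; i++) {
            String alias = "t" + i;
            from.add("triples " + alias);
            where.add(alias + ".predicate = " + quote(atoms[i][0]));
            bind(atoms[i][1], alias + ".subject", firstColumn, where);
            bind(atoms[i][2], alias + ".object", firstColumn, where);
        }
        // LIMIT 1: only the existence of an answer matters for a Boolean query.
        return "SELECT 1 FROM " + String.join(", ", from)
             + " WHERE " + String.join(" AND ", where) + " LIMIT 1";
    }

    /** Constants become equality filters; a variable seen before becomes a join condition. */
    private static void bind(String term, String column,
                             Map<String, String> firstColumn, List<String> where) {
        if (term.startsWith("?")) {
            String previous = firstColumn.putIfAbsent(term, column);
            if (previous != null) where.add(previous + " = " + column);
        } else {
            where.add(column + " = " + quote(term));
        }
    }

    /** Naive SQL string literal; real code should use prepared-statement parameters instead. */
    private static String quote(String s) {
        return "'" + s.replace("'", "''") + "'";
    }

    public static void main(String[] args) {
        // Q = knows(?x, ?y) AND worksAt(?y, LIRMM)
        String[][] q = {{"knows", "?x", "?y"}, {"worksAt", "?y", "LIRMM"}};
        System.out.println(toSql(q));
    }
}
```

For the example query, the generated statement joins two aliases of `triples` on `t0.object = t1.subject`. A non-empty result means the query has an answer in the stored facts.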


Page 51: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases


Workflow

The workflow of the tests is detailed below:

[Figure: workflow diagram with the labels F |= Q, Abstract Architecture, Query, Q → SQL, Q → Graph, Relational DB and Graph DB, and two test-results tables.]

Test results
−       Query size   Time
BT      ... terms    ... s
SQL     ... terms    ... s

Test results
−       Query size   Time
BT      ... terms    ... s
Graph   ... terms    ... s


Page 52: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases


Current & Future work

Currently, we focus on finding (really) difficult queries to check the algorithms' behaviour on such cases.

But some questions in this field are still open:

Traversal queries: can they enhance homomorphism computation? How?

Real world KBs vs. generated KBs

Can we integrate a constraint solving program for computing homomorphisms?


Page 53: Ontological Conjunctive Query Answering over large, semi-structured knowledge bases


Questions

Thank you!

Questions & comments...
