CliqueSquare processing


Uploaded by inria-oak on 23 June 2015.


Description: CliqueSquare RDF platform, query processing overview.


Page 1: CliqueSquare processing

CliqueSquare Query Processing

OAK Code Review, September 16-18, 2014

Elham Akbari Azirani

Elham Akbari, Oak team, Inria Saclay, Orsay, Fall 2014

Page 2

Acknowledgement

❖ I do NOT own any part of CliqueSquare.

❖ I am only reviewing the code.

❖ This presentation was prepared with the help of Stamatis Zampetakis and Benjamin Djahandideh.

Page 3

What are we presenting?

❖ We are going to present part of the code of the CliqueSquare project.

❖ CliqueSquare?

❖ A distributed RDF data management platform that works on top of Hadoop.

❖ Parts?

❖ Data Partitioning: stores RDF data in a distributed file system.

❖ Query Processing: answers queries.

❖ Finds an optimal plan for executing each query.

❖ Executes the query by passing it to Hadoop.


Page 5

Where is the code?

❖ The whole project: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/?root=xmlinthecloud

❖ Trunk: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/trunk/?root=xmlinthecloud

❖ Query processing (package fr.inria.oak.cliquesquare.query.*): https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/trunk/src/main/java/fr/inria/oak/cliquesquare/query/?root=xmlinthecloud

❖ First release (yet to come): https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/tags/?root=xmlinthecloud

❖ Branches: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/branches/?root=xmlinthecloud (currently contains a version of CliqueSquare that works with file indexing)

Page 6

People

Contributed in the past:

Jorge Quiane

Zoi Kaoudi

Ioana Manolescu

François Goasdoué

Stamatis Zampetakis

Benjamin Djahandideh

Currently working on the code:

Stamatis Zampetakis

Benjamin Djahandideh

Currently using the code:

Zoi Kaoudi

Stamatis Zampetakis

Benjamin Djahandideh

Page 7

Code Structure: Packages

cliquesquare

• algorithms (4)

• simple (4)

• skewed (6)

• partitioner

• utils (7)

• query (80)

• experiments (4)

• exceptions (2)

query

• engine (33)

• graph (20)

• pop (10)

• mrop (11)

• pred (1)

• translators (3)

Page 8

What the code does: Layers & I/O

• Parser: SPARQL query → graph of the query.

• Logical Optimizer: query graph → execution plan of logical operators.

• Physical Optimizer: logical plan → execution plan of physical operators.

• Job Translator: physical plan → execution plan of map/reduce operators.

• Hadoop: map/reduce plan → query answer.
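The layer chain above can be pictured as a composition of typed functions. Each layer is one function, and each layer's output type is the next layer's input type. The following is a minimal sketch under that assumption. The record types (QueryGraph, LogicalPlan, PhysicalPlan, MapReducePlan) and the string-wrapping stage bodies are hypothetical placeholders, not CliqueSquare classes.

```java
import java.util.function.Function;

class PipelineSketch {
    // Hypothetical intermediate representations, one per layer boundary.
    record QueryGraph(String trace) {}
    record LogicalPlan(String trace) {}
    record PhysicalPlan(String trace) {}
    record MapReducePlan(String trace) {}

    // Each layer is a function whose input type is the previous layer's output type.
    static Function<String, String> pipeline() {
        Function<String, QueryGraph> parser = sparql -> new QueryGraph("Parser(" + sparql + ")");
        Function<QueryGraph, LogicalPlan> logicalOptimizer =
                g -> new LogicalPlan("LogicalOptimizer(" + g.trace() + ")");
        Function<LogicalPlan, PhysicalPlan> physicalOptimizer =
                l -> new PhysicalPlan("PhysicalOptimizer(" + l.trace() + ")");
        Function<PhysicalPlan, MapReducePlan> jobTranslator =
                p -> new MapReducePlan("JobTranslator(" + p.trace() + ")");
        Function<MapReducePlan, String> hadoop = mr -> "Hadoop(" + mr.trace() + ")";
        // andThen chains the layers in the order shown on the slide.
        return parser.andThen(logicalOptimizer)
                .andThen(physicalOptimizer)
                .andThen(jobTranslator)
                .andThen(hadoop);
    }

    public static void main(String[] args) {
        // Prints Hadoop(JobTranslator(PhysicalOptimizer(LogicalOptimizer(Parser(q)))))
        System.out.println(pipeline().apply("q"));
    }
}
```

Typing the stages this way lets the compiler reject a pipeline whose layers are wired in the wrong order.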

Page 9

Layers: Parser

• Code in the “conjunctivequery” project, class CQParser.

• Used in cliquesquare/algorithms/*.

Page 10

Layers: Parser

Builds the query graph:

• Each triple: a node.

• Each variable shared by two triples: an edge between them.

• Example: q :- ?s :p1 o1, ?s ?p2 o2, ?s ?p2 o3 (triples t1, t2, t3). Here every pair of triples shares ?s, and t2 and t3 also share ?p2.

[Figure: query graph of q, with nodes t1, t2, t3.]
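To illustrate the graph-building rule, here is a minimal sketch that builds the labeled edges for the example query q. It makes two assumptions: terms starting with `?` are variables, and each pair of triples gets one edge per shared variable. The Triple and Edge records and the buildEdges method are hypothetical; they are not the CQParser API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

class QueryGraphSketch {
    // Hypothetical triple pattern; by assumption, terms starting with '?' are variables.
    record Triple(String name, String s, String p, String o) {
        Set<String> variables() {
            Set<String> vars = new LinkedHashSet<>();
            for (String term : List.of(s, p, o)) {
                if (term.startsWith("?")) {
                    vars.add(term);
                }
            }
            return vars; // a fresh set on every call, so callers may modify it
        }
    }

    // An edge between two triples, labeled with one variable they share.
    record Edge(String from, String to, String variable) {}

    // The triples are the nodes; for each pair, add one edge per shared variable.
    static List<Edge> buildEdges(List<Triple> triples) {
        List<Edge> edges = new ArrayList<>();
        for (int i = 0; i < triples.size(); i++) {
            for (int j = i + 1; j < triples.size(); j++) {
                Set<String> shared = triples.get(i).variables();
                shared.retainAll(triples.get(j).variables());
                for (String v : shared) {
                    edges.add(new Edge(triples.get(i).name(), triples.get(j).name(), v));
                }
            }
        }
        return edges;
    }

    public static void main(String[] args) {
        List<Triple> q = List.of(
                new Triple("t1", "?s", ":p1", "o1"),
                new Triple("t2", "?s", "?p2", "o2"),
                new Triple("t3", "?s", "?p2", "o3"));
        for (Edge e : buildEdges(q)) {
            System.out.println(e.from() + " -" + e.variable() + "- " + e.to());
        }
    }
}
```

For q this yields four edges: ?s links every pair of triples, and ?p2 additionally links t2 and t3.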

Page 11

Layers: Logical Optimizer

CliqueSquare algorithm:

• The query graph is the initial state.

• From each state (i), which is a graph, choose a decomposition and reduce it to obtain the next state.

• The result is a sequence of states <state0, state1, state2, ..., state(n)>.
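The following is a much simplified sketch of the decompose-and-reduce loop. It rests on two assumptions. First, each node is represented only by the triples merged into it and its set of variables. Second, the hypothetical `chooseVariable` step greedily picks the variable shared by the most nodes, and the reduction merges all nodes containing it into one. The slides indicate that the logical optimizer hands a set of plans to the Physical Optimizer. This sketch therefore follows only one path through the sequence of states and is not the actual CliqueSquare decomposition code.

```java
import java.util.*;

class ReduceSketch {
    // A node of the current state: the triples merged into it and the variables it contains.
    record Node(String label, Set<String> vars) {}

    // Hypothetical decomposition choice: the variable shared by the most nodes (at least two).
    static Optional<String> chooseVariable(List<Node> state) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (Node n : state) {
            for (String v : n.vars()) {
                counts.merge(v, 1, Integer::sum);
            }
        }
        String best = null;
        int bestCount = 1; // a variable seen in only one node cannot connect anything
        for (Map.Entry<String, Integer> e : counts.entrySet()) {
            if (e.getValue() > bestCount) {
                best = e.getKey();
                bestCount = e.getValue();
            }
        }
        return Optional.ofNullable(best);
    }

    // Reduction: replace every node containing v by a single merged node.
    static List<Node> reduce(List<Node> state, String v) {
        List<Node> next = new ArrayList<>();
        List<String> labels = new ArrayList<>();
        Set<String> vars = new LinkedHashSet<>();
        for (Node n : state) {
            if (n.vars().contains(v)) {
                labels.add(n.label());
                vars.addAll(n.vars());
            } else {
                next.add(n);
            }
        }
        next.add(0, new Node(String.join("+", labels), vars));
        return next;
    }

    // Produces <state0, state1, ..., state(n)>, starting from the query graph.
    static List<List<Node>> run(List<Node> initial) {
        List<List<Node>> states = new ArrayList<>();
        List<Node> current = initial;
        states.add(current);
        while (current.size() > 1) {
            Optional<String> v = chooseVariable(current);
            if (v.isEmpty()) {
                break; // no variable is shared any more
            }
            current = reduce(current, v.get());
            states.add(current);
        }
        return states;
    }

    public static void main(String[] args) {
        List<Node> q = List.of(
                new Node("t1", Set.of("?s")),
                new Node("t2", Set.of("?s", "?p2")),
                new Node("t3", Set.of("?s", "?p2")));
        for (List<Node> state : run(q)) {
            System.out.println(state.stream().map(Node::label).toList());
        }
    }
}
```

On the example query, ?s appears in all three nodes, so a single reduction takes state0 = [t1, t2, t3] to state1 = [t1+t2+t3].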

Page 12

Layers: Logical Optimizer

• Code in cliquesquare/query/graph/* (contains the description of the graph and the decomposition algorithms).

[Figure: package tree of query (engine (33), graph (20), pop, mrop, pred, translators), with graph marked as the code for the decomposition part of the CliqueSquare algorithm and the description of the graph.]

Page 13

Layers: Physical Optimizer

• Input: set of high-level query execution plans.

• Output: best query execution plan in terms of physical operators.

• Packages:

• pop: declaration of physical operators.

• Translator: class Logical2Physical.java.

• Estimator project (to compare costs).
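The Physical Optimizer's final choice, keeping the best of several candidate plans, can be sketched as a minimum over an estimated cost. The PhysicalPlan record and its fields are hypothetical. The cost function is also a placeholder: a fixed overhead per job plus shuffled tuples. In CliqueSquare the costs come from the Estimator project, whose API is not shown in these slides.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.ToDoubleFunction;

class PhysicalChoiceSketch {
    // Hypothetical summary of a candidate physical plan (not a CliqueSquare class).
    record PhysicalPlan(String name, int jobs, long estimatedShuffledTuples) {}

    // Returns the candidate with the lowest estimated cost under the given cost model.
    static PhysicalPlan best(List<PhysicalPlan> candidates, ToDoubleFunction<PhysicalPlan> cost) {
        return candidates.stream()
                .min(Comparator.comparingDouble(cost))
                .orElseThrow(() -> new IllegalArgumentException("no candidate plans"));
    }

    public static void main(String[] args) {
        List<PhysicalPlan> plans = List.of(
                new PhysicalPlan("planA", 2, 5_000),
                new PhysicalPlan("planB", 3, 4_500));
        // Hypothetical cost model: fixed overhead per MapReduce job plus shuffled data volume.
        ToDoubleFunction<PhysicalPlan> cost = p -> p.jobs() * 1_000.0 + p.estimatedShuffledTuples();
        System.out.println(best(plans, cost).name()); // planA: 7000 vs 7500
    }
}
```

Passing the cost model in as a parameter keeps the selection logic independent of the estimator, so a real cost model could be plugged in without touching `best`.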

Page 14

Layers: Job Translator

• Input: query execution plan in terms of physical operators.

• Output: query execution plan in terms of map/reduce operators (suitable for distributed data).

• Packages:

• mrop: description of map/reduce operators.

• Translator: classes Physical2MR.java and MR2Jobs.java.
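One way to picture how operators end up in Hadoop jobs is a level-based grouping. In the following hypothetical sketch, scans are level 0 and a join is one level above its deepest input. All joins at the same level go into one job, and the plan is assumed to be a tree. This grouping rule is an illustrative assumption; the slides do not describe how Physical2MR and MR2Jobs actually split the plan.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.SortedMap;
import java.util.TreeMap;

class JobGroupingSketch {
    // Hypothetical operator tree: leaves are scans, inner nodes are joins.
    record Op(String name, List<Op> inputs) {
        static Op scan(String name) { return new Op(name, List.of()); }
        static Op join(String name, Op... inputs) { return new Op(name, List.of(inputs)); }
    }

    // Level of an operator: 0 for scans, 1 + max(input levels) for joins.
    // Each join is recorded under its level (assumed to be its job number).
    static int level(Op op, Map<Integer, List<String>> jobs) {
        if (op.inputs().isEmpty()) {
            return 0;
        }
        int max = 0;
        for (Op in : op.inputs()) {
            max = Math.max(max, level(in, jobs));
        }
        int lvl = max + 1;
        jobs.computeIfAbsent(lvl, k -> new ArrayList<>()).add(op.name());
        return lvl;
    }

    static SortedMap<Integer, List<String>> groupIntoJobs(Op root) {
        SortedMap<Integer, List<String>> jobs = new TreeMap<>();
        level(root, jobs);
        return jobs;
    }

    public static void main(String[] args) {
        Op plan = Op.join("J2",
                Op.join("J1a", Op.scan("t1"), Op.scan("t2")),
                Op.join("J1b", Op.scan("t3"), Op.scan("t4")));
        // Prints "job 1: [J1a, J1b]" then "job 2: [J2]".
        groupIntoJobs(plan).forEach((job, ops) -> System.out.println("job " + job + ": " + ops));
    }
}
```

Under this rule the number of jobs equals the height of the join tree, which is why flatter plans would need fewer jobs.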

Page 15

Layers: Hadoop

• Input: query execution plan in terms of map/reduce operators.

• Output: answer to the query.

• Packages:

• engine: exploits Hadoop to execute the query.

• Translator: class MR2Jobs.java.

Page 16

In a Nutshell

query

• engine (33): query execution engine; imports Hadoop.

• graph (20): logical plans and graph decompositions.

• pop (10): physical operators’ declarations.

• mrop (11): map/reduce operators’ declarations.

• pred (1)

• translators (3): translate plans between levels of abstraction.

Page 17

Libraries and imported code

❖ Oak projects imported:

❖ Conjunctive query

❖ Estimator

❖ Software imported:

❖ Maven

❖ JUnit

❖ Hadoop

Page 18

Code Size

❖ CliqueSquare: 129 classes, 10117 lines of code (excluding comments and blank lines).

❖ Query processing: ~102 classes, 8095 lines of code.

Page 19

Known Bugs

❖ Memory overhead exceptions when processing queries.

❖ Datasets too large for the cluster configuration.

Page 20

Thanks for listening :)