cliquesquare processing
CliqueSquare RDF platform, query processing overview
CliqueSquare Query Processing
OAK Code Review, September 16-18, 2014
Elham Akbari Azirani
Elham Akbari, Oak team, Inria Saclay, Orsay, Fall 2014
Acknowledgement
❖ I do NOT own anything about CliqueSquare.
❖ Just reviewing the code.
❖ This presentation was prepared with the help of Stamatis Zampetakis and Benjamin Djahandideh.
What are we presenting?
❖ We are going to present part of the code of the CliqueSquare project.
❖ CliqueSquare?
❖ A distributed RDF data management platform, working on top of Hadoop.
❖ Parts?
❖ Data Partitioning: stores RDF data in a distributed file system.
❖ Query Processing: answers queries.
❖ Finds an optimal plan for executing each query.
❖ Executes the query by passing it to Hadoop.
Where is the code?
❖ The whole project: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/?root=xmlinthecloud
❖ Trunk: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/trunk/?root=xmlinthecloud
❖ Query Processing (under package fr.inria.oak.cliquesquare.query.*): https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/trunk/src/main/java/fr/inria/oak/cliquesquare/query/?root=xmlinthecloud
❖ First release (yet to come) will be here: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/tags/?root=xmlinthecloud
❖ Branches: https://gforge.inria.fr/scm/viewvc.php/hadoop/cliquesquare/branches/?root=xmlinthecloud (currently contains a version of CliqueSquare working with file indexing)
People
Contributed in the past:
Jorge Quiane
Zoi Kaoudi
Ioana Manolescu
François Goasdoué
Stamatis Zampetakis
Benjamin Djahandideh
Currently working on the code:
Stamatis Zampetakis
Benjamin Djahandideh
Currently using:
Zoi Kaoudi
Stamatis Zampetakis
Benjamin Djahandideh
Code Structure: Packages
cliquesquare: algorithms (4), simple (4), skewed (6), partitioner, utils (7), query (80), experiments (4), exceptions (2)
query: engine (33), graph (20), pop (10), mrop (11), pred (1), translators (3)
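What the code does: Layers & I/O
• Parser: input SPARQL query, output graph of query
• Logical Optimizer: input graph of query, output execution plan in logical operators
• Physical Optimizer: input execution plan in logical operators, output execution plan in physical operators
• Job Translator: input execution plan in physical operators, output execution plan in map reduce operators
• Hadoop: input execution plan in map reduce operators, output query answer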
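The layering on this slide can be written down as data, as a minimal sketch. The stage names and the input/output labels come from the slide; the helper functions `check_pipeline` and `trace` are hypothetical and only check that each layer consumes what the previous one produces. None of this is CliqueSquare code.

```python
from typing import List, Tuple

# (layer, input, output) as listed on the slide.
STAGES: List[Tuple[str, str, str]] = [
    ("Parser", "SPARQL query", "graph of query"),
    ("Logical Optimizer", "graph of query", "execution plan: logical operators"),
    ("Physical Optimizer", "execution plan: logical operators",
     "execution plan: physical operators"),
    ("Job Translator", "execution plan: physical operators",
     "execution plan: map reduce operators"),
    ("Hadoop", "execution plan: map reduce operators", "query answer"),
]


def check_pipeline(stages: List[Tuple[str, str, str]]) -> bool:
    """Return True if every layer's input is the previous layer's output."""
    return all(stages[i][2] == stages[i + 1][1] for i in range(len(stages) - 1))


def trace(stages: List[Tuple[str, str, str]]) -> List[str]:
    """Return the artifacts flowing through the layers, first input included."""
    return [stages[0][1]] + [output for _, _, output in stages]


if __name__ == "__main__":
    print(check_pipeline(STAGES))
    for artifact in trace(STAGES):
        print(artifact)
```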
Layers: Parser
• Code in the “conjunctivequery” project, class CQParser.
• Used in cliquesquare/algorithms/*
Layers: Parser
Builds Query Graph:
• Each triple: a node
• Each common variable: an edge
• Example: q:- ?s :p1 o1, ?s ?p2 o2, ?s ?p2 o3
Figure: query graph with nodes t1, t2, t3
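The graph construction described above (one node per triple, one edge per common variable) can be sketched in a few lines. This is an illustration only, not the CQParser implementation: the function names `variables` and `build_query_graph` and the tuple representation of triples are assumptions.

```python
from itertools import combinations


def variables(triple):
    """Return the variables (terms starting with '?') of a triple pattern."""
    return {term for term in triple if term.startswith("?")}


def build_query_graph(triples):
    """Build the query graph: one node per triple, one edge per shared variable.

    `triples` maps a node name to a (subject, predicate, object) tuple.
    Returns a list of (node_a, node_b, variable) edges.
    """
    edges = []
    for (a, triple_a), (b, triple_b) in combinations(sorted(triples.items()), 2):
        for var in sorted(variables(triple_a) & variables(triple_b)):
            edges.append((a, b, var))
    return edges


# The slide's example: q:- ?s :p1 o1, ?s ?p2 o2, ?s ?p2 o3
query = {
    "t1": ("?s", ":p1", "o1"),
    "t2": ("?s", "?p2", "o2"),
    "t3": ("?s", "?p2", "o3"),
}
edges = build_query_graph(query)

if __name__ == "__main__":
    for edge in edges:
        print(edge)
```

In this example all three triples share ?s, and t2 and t3 additionally share ?p2, so the pair (t2, t3) is connected by two edges.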
Layers: Logical Optimizer
CliqueSquare Algorithm
• Query graph as initial state
• state(i) (graph): choose a decomposition and reduce
• <state0, state1, state2, ..., state(n)>
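The loop on this slide (start from the query graph, choose a decomposition, reduce, repeat) can be sketched as follows. This is a simplified illustration under stated assumptions, not the code in cliquesquare/query/graph: a clique is taken to be the set of nodes sharing one variable, and the decomposition is chosen greedily (largest cliques first), whereas the slide only says that a decomposition is chosen. All function names are hypothetical.

```python
def variable_cliques(nodes):
    """Group node ids by shared variable.

    `nodes` maps a node id (frozenset of triple names) to its set of variables.
    Nodes sharing a variable are pairwise connected, so each group is a clique.
    """
    cliques = {}
    for node, node_vars in nodes.items():
        for var in node_vars:
            cliques.setdefault(var, set()).add(node)
    # Only groups of two or more nodes correspond to edges in the graph.
    return [group for group in cliques.values() if len(group) > 1]


def choose_decomposition(nodes):
    """Greedy choice (assumption): largest cliques first until all nodes are covered."""
    covered, chosen = set(), []
    for clique in sorted(variable_cliques(nodes), key=len, reverse=True):
        if not clique <= covered:
            chosen.append(clique)
            covered |= clique
    # Nodes that share no variable with any other node stay as singletons.
    chosen += [{node} for node in nodes if node not in covered]
    return chosen


def reduce_graph(nodes, decomposition):
    """Merge every chosen clique into one node of the next state."""
    next_nodes = {}
    for clique in decomposition:
        merged = frozenset().union(*clique)
        next_nodes[merged] = set().union(*(nodes[n] for n in clique))
    return next_nodes


def cliquesquare_states(nodes):
    """Return <state0, ..., state(n)>; stop at one node or when nothing changes."""
    states = [nodes]
    while len(states[-1]) > 1:
        nxt = reduce_graph(states[-1], choose_decomposition(states[-1]))
        if nxt.keys() == states[-1].keys():
            break
        states.append(nxt)
    return states


# Query graph of q:- ?s :p1 o1, ?s ?p2 o2, ?s ?p2 o3 as the initial state.
state0 = {
    frozenset({"t1"}): {"?s"},
    frozenset({"t2"}): {"?s", "?p2"},
    frozenset({"t3"}): {"?s", "?p2"},
}
states = cliquesquare_states(state0)

if __name__ == "__main__":
    for i, state in enumerate(states):
        print(i, [sorted(node) for node in state])
```

For the example query, the clique on ?s already covers t1, t2 and t3, so a single reduction step yields a one-node final state.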
Layers: Logical Optimizer
Code in cliquesquare/query/graph/* (graph, 20): the description of the graph and the decomposition part of the CliqueSquare algorithm.
Layers: Physical Optimizer
• Input: set of high-level query execution plans
• Output: best query execution plan in terms of physical operators
• Packages:
• pop: declaration of physical operators
• Translator: class Logical2Physical.java
• Estimator project (to compare costs)
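The input/output contract above (several candidate plans in, the cheapest physical plan out) can be sketched as below. Everything in it is a stand-in: the operator names, the translation table and the cost weights are invented for illustration and do not reflect Logical2Physical.java or the Estimator project.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PhysicalPlan:
    """Hypothetical stand-in for a plan expressed in physical operators."""
    operators: List[str] = field(default_factory=list)


def to_physical(logical_plan: List[str]) -> PhysicalPlan:
    """Placeholder translation: map each logical operator name to a physical one."""
    mapping = {"match": "scan", "join": "physical_join", "project": "physical_project"}
    return PhysicalPlan([mapping.get(op, op) for op in logical_plan])


def estimate_cost(plan: PhysicalPlan) -> float:
    """Toy cost model (assumption): joins cost more than scans or projections."""
    weights = {"scan": 1.0, "physical_join": 5.0, "physical_project": 0.5}
    return sum(weights.get(op, 1.0) for op in plan.operators)


def best_physical_plan(logical_plans: List[List[str]]) -> PhysicalPlan:
    """Translate every candidate plan and keep the one with the lowest estimated cost."""
    return min((to_physical(plan) for plan in logical_plans), key=estimate_cost)


candidates = [
    ["match", "match", "match", "join", "join", "project"],  # two joins
    ["match", "match", "match", "join", "project"],          # a single join
]
best = best_physical_plan(candidates)

if __name__ == "__main__":
    print(best.operators, estimate_cost(best))
```

With these toy weights the second candidate wins because it contains one join fewer.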
Layers: Job Translator
• Input: query execution plan in terms of physical operators
• Output: query execution plan in terms of map/reduce operators (suitable for distributed data)
• Packages:
• mrop: description of map/reduce operators
• Translator: classes Physical2MR.java, MR2Jobs.java
Layers: Hadoop
• Input: query execution plan in terms of map/reduce operators
• Output: answer to the query
• Packages:
• engine: uses Hadoop to execute the query
• Translator: class MR2Jobs.java
In a Nutshell
query
• engine (33): query execution engine. Imports Hadoop.
• graph (20): logical plans and graph decompositions
• pop (10): physical operators’ declarations
• mrop (11): map reduce operators’ declarations
• pred (1)
• translators (3): translate plans to different levels of abstraction
Libraries and imported code
❖ Oak projects imported:
❖ Conjunctive query
❖ Estimator
❖ Software imported:
❖ Maven
❖ JUnit
❖ Hadoop
Code Size
❖ CliqueSquare: 129 classes, 10117 lines of code (apart from comments and blank lines)
❖ Query processing: ~102 classes, 8095 lines of code.
Known Bugs
❖ Memory overhead exceptions when dealing with queries.
❖ Datasets oversized for the cluster configuration.
Thanks for listening :)