parallel datalog reasoning in rdfox presentation
Post on 09-Jul-2015
185 Views
Preview:
DESCRIPTION
TRANSCRIPT
PARALLEL DATALOG REASONING IN RDFOX
Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks, Dan Olteanu
University of Oxford
September 25, 2014
TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 0/9
Introduction
TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 0/9
Introduction
CURRENT TRENDS IN KNOWLEDGE/DATABASE SYSTEMS
Materialisation is computationally intensive ⇒ natural to parallelisemid-range laptops have 4 cores, servers with 16 cores are routine
Price of RAM keeps falling128 GB is routine, systems with 1 TB are emergingin-memory databases: SAP’s HANA, Oracle’s TimesTen, YarcData’s Urika
Motik et al. Parallel Datalog Reasoning in RDFox 1/9
Introduction
RDFOX SUMMARY
RDFox: a new RDF store and reasoner
http://www.cs.ox.ac.uk/isg/tools/RDFox/
store data in RAM
effectively use modern multi-core architectures
Core focus: materialisation reasoning (recursive) datalog rules
e.g., 〈x , rdf :type, y〉 ∧ 〈y , rdfs:subClassOf , z〉 → 〈x , rdf :type, z〉handle OWL 2 RL and the Semantic Web Rule Language (SWRL)
explicitly store all implied triples so that queries can be evaluated directly
used in a number of systems: Oracle’s database, OWLIM, WebPIE
Used in projects for reasoning beyond OWL 2 RL
‘combined approach’ for reasoning with OWL 2 EL
PAGOaA reasoner for OWL 2 DL comdining RDFox with HermiT
Motik et al. Parallel Datalog Reasoning in RDFox 2/9
Our Approach to Parallel Materialisation
TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 2/9
Our Approach to Parallel Materialisation
EXISTING APPROACHES TO PARALLEL MATERIALISATION
Interquery parallelism: run independent rules in parallel
degree of parallelism limited by the number of independent rules
⇒ does not distribute workload to cores evenly
Intraquery parallelism: partition rule instantiations to N threads
e.g., constrain the body of rules evaluated by thread i to (x mod N = i)
⇒ static partitioning may not distribute workload well due to data skew
⇒ dynamic partitioning may incur an overhead due to load balancing
Goal: distribute workload to threads evenly and with minimum overhead
Motik et al. Parallel Datalog Reasoning in RDFox 3/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery:
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
⇒ R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(a)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)⇒ R(a,c)
R(b,d)R(b,e)A(a)R(c,f)R(c,g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(a)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)
⇒ R(b,d)R(b,e)A(a)R(c,f)R(c,g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(b)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)
⇒ R(b,e)A(a)R(c,f)R(c,g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(b)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)
⇒ A(a)R(c,f)R(c,g)A(b)A(c)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(a,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)
⇒ R(c,f)R(c,g)A(b)A(c)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(c)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)
⇒ R(c,g)A(b)A(c)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: A(c)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)
⇒ A(b)A(c)A(d)A(e)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(b,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)A(b)
⇒ A(c)A(d)A(e)A(f)A(g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(c,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)A(b)A(c)
⇒ A(d)A(e)A(f)A(g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(d,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)A(b)A(c)A(d)
⇒ A(e)A(f)A(g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(e,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)A(b)A(c)A(d)A(e)
⇒ A(f)A(g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(f,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
SOLUTION PART I: ALGORITHM
R(a,b)R(a,c)R(b,d)R(b,e)A(a)R(c,f)R(c,g)A(b)A(c)A(d)A(e)A(f)
⇒ A(g)
A(x) ∧ R(x , y) → A(y)
For each fact:match the fact to all body atoms to obtain subqueriesevaluate subqueries w.r.t. all previous factsadd results to the table
Current subquery: R(g,y)
Motik et al. Parallel Datalog Reasoning in RDFox 4/9
Our Approach to Parallel Materialisation
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of factsensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ∧ R(x , y) → A(y)
R(a, b), R(b, c), R(c, d), R(d , e), A(a)
, A(b), A(c), A(d), A(e)
Motik et al. Parallel Datalog Reasoning in RDFox 5/9
Our Approach to Parallel Materialisation
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of factsensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ∧ R(x , y) → A(y)
R(a, b), R(b, c), R(c, d), R(d , e), A(a), A(b)
, A(c), A(d), A(e)
Motik et al. Parallel Datalog Reasoning in RDFox 5/9
Our Approach to Parallel Materialisation
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of factsensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ∧ R(x , y) → A(y)
R(a, b), R(b, c), R(c, d), R(d , e), A(a), A(b), A(c)
, A(d), A(e)
Motik et al. Parallel Datalog Reasoning in RDFox 5/9
Our Approach to Parallel Materialisation
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of factsensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ∧ R(x , y) → A(y)
R(a, b), R(b, c), R(c, d), R(d , e), A(a), A(b), A(c), A(d)
, A(e)
Motik et al. Parallel Datalog Reasoning in RDFox 5/9
Our Approach to Parallel Materialisation
PARALLELISING COMPUTATION
Each thread extracts facts and evaluates subqueries independently
The number of subqueries is determined by the number of factsensures in practice that threads are equally loaded
Requires no thread synchronisation
⇒ We partition rule instances dynamically and with little overhead
EXAMPLE WHEN PARALLELISATION FAILS
A(x) ∧ R(x , y) → A(y)
R(a, b), R(b, c), R(c, d), R(d , e), A(a), A(b), A(c), A(d), A(e)
Motik et al. Parallel Datalog Reasoning in RDFox 5/9
Our Approach to Parallel Materialisation
SOLUTION PART II: INDEXING RDF DATA IN MAIN MEMORY
The critically algorithm depends on:
matching atoms 〈t1, t2, t3〉 with ti a constant or variable
continuous concurrent updates
Our RDF storage data structure:
hash-based indexes ⇒ naturally parallel data structure‘mostly’ lock-free: at least one thread makes progress at most of the time
compare: if a thread acquire a lock and dies, other threads are blockedmain benefit: performance is less susceptible to scheduling decisions
Main technical challenge: reduce thread interference
when A writes to a location cached by B, the cache of B is invalidated
our updates ensure that threads (typically) write to different locations
Motik et al. Parallel Datalog Reasoning in RDFox 6/9
Evaluation
TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 6/9
Evaluation
CONCURRENCY OVERHEAD AND PARALLELISATION SPEEDUP
RDFox: an RDF store developed at Oxford University
http://www.cs.ox.ac.uk/isg/tools/RDFox/
8 16 24 32
2
4
6
8
10
12
14
16
18
20ClarosL
ClarosLEDBpediaL
DBpediaLELUBML01KLUBMU 01K
8 16 24 32
2
4
6
8
10
12
14
16
18
20UOBML01KUOBMU 010LUBMLE 01KLUBML05K
LUBMLE 05KLUBMU 05K
Small concurrency overhead; parallelisation pays off already with two threads
Speedup continues to increase after we exhaust all physical cores
⇒ hyperthreading and parallelism can compensate CPU cache misses
Motik et al. Parallel Datalog Reasoning in RDFox 7/9
Evaluation
COMPARISON WITH THE STATE OF THE ART
8 16 24 32
2
4
6
8
10
12
14
16
18
20ClarosL
ClarosLEDBpediaL
DBpediaLELUBML01KLUBMU 01K
(a) Scalability and Concurrency Overhead of SysX
ClarosL ClarosLE DBpediaL DBpediaLELUBML
01KLUBMLE
01KLUBMU
01KLUBML
05KLUBMLE
05KLUBMU
05KUOBMU
010UOBML
01KTime Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd Time Spd
seq 1907 1.3 4128 1.2 158 1.0 8457 1.1 68 1.1 913 1.0 122 1.1 422 1.0 4464 1.1 580 1.1 2381 1.2 476 1.11 2477 1.0 4989 1.0 161 1.0 9075 1.0 73 1.0 947 1.0 135 1.0 442 1.0 4859 1.0 635 1.0 2738 1.0 532 1.02 1177 2.1 2543 2.0 84 1.9 4699 1.9 35 2.1 514 1.8 61 2.2 221 2.0 2574 1.9 310 2.1 1400 2.0 247 2.28 333 7.4 773 6.5 28 5.8 1453 6.2 14 5.2 155 6.1 20 6.8 65 6.8 745 6.5 96 6.6 451 6.1 72 7.4
16 179 13.9 415 12.0 26 6.1 828 11.0 8 8.7 88 10.8 13 10.6 42 10.6 Mem 60.7 10.5 256 10.7 42 12.624 139 17.8 313 15.9 25 6.4 695 13.1 7 10.9 77 12.4 11 12.7 39 11.3 Mem 54.3 11.7 188 14.6 34 15.532 127 19.5 285 17.5 24 6.6 602 15.1 7 10.1 71 13.4 10 13.4 Mem Mem Mem 168 16.3 31 17.1
Seq. imp. 61 57 331 335 421 408 410 2610 2587 2643 6 798Par. imp. 58 -4.9% 58 1.7% 334 0.9% 367 9.5% 415 1.4% 415 1.7% 415 1.2% 2710 3.8% 3553 37% 2733 3.4% 6 0.0% 783 -1.9%Triples 95.5 M 555.1 M 118.3 M 1529.7 M 182.4 M 332.6 M 219.0 M 911.2 M 1661.0 M 1094.0 M 35.6 M 429.6 M
Memory 4.2 GB 18.0 GB 6.1 GB 51.9 GB 9.3 GB 13.8 GB 11.1 GB 49.0 GB 75.5 GB 53.3 GB 1.1 GB 20.2 GBActive 93.3% 98.8% 28.1% 94.5% 66.5% 90.3% 85.3% 66.5% 90.3% 85.3% 99.7% 99.1%
8 16 24 32
2
4
6
8
10
12
14
16
18
20UOBML01KUOBMU 010
LUBMLE 01KLUBML05K
LUBMLE 05KLUBMU 05K
(b) Comparison of SysX with DBRDF and OWLIM-LiteSysX PG-VP MDB-VP PG-TT MDB-TT OWLIM-Lite
Import Materialise Import Materialise Import Materialise Import Import Import MaterialiseT B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t T B/t
ClarosL 48 892062 60
1138 16525026 97
883 58Mem
1174 217 896 94 300 5414293 28
ClarosLE 4218 44 Mem Mem TimeDBpediaL 274 69
143 675844 148
Mem15499 51
354 495968 174 4492 92 1735 171
14855 164DBpediaLE 9538 49 Mem Mem TimeLUBML01K
332 7471 65
7736 144948 127
6136 5627 41
7947 187 5606 104 2409 40316 36
LUBMLE 01K 765 54 15632 112 Mem TimeLUBMU 01K 113 67 Time 138 44 TimeUOBMU 010 5 68 2501 43 116 154 Time 96 50 Mem 120 203 96 85 32 69 11311 24UOBML01K 632 64 467 63 13864 119 6708 107 11063 41 358 33 14901 176 10892 101 4778 38 Time
Figure 2: Results of our Empirical Evaluation
complete, and we report averages over three runs. OWLIM-Lite combines materialisation and import, so we loaded eachRDF graph once with the test datalog program and once withno program, and we subtracted the two times; Ontotext con-firmed that this yields a good materialisation time estimate.
The graphs and Table (a) in Figure 2 show the speedupof SysX with the number of used threads. The middle partof Table (a) shows the sequential and parallel import times,and the percentage slowdown for the parallel version. Thelower part of Table (a) shows the number of triples, the mem-ory consumption, and the percentage of active triples (i.e.,triples to which a rule was applied during materialisation).
With all 16 physical cores, materialisation is up to 13.9times (on ClarosL) faster than with just one core. Speedupincreases further up to 19.5 with 32 cores (on ClarosL), sug-gesting that hyperthreading and high degree of parallelismcan compensate for CPU stalls due to random memory ac-cess. Memory exhaustion in some tests was caused by ‘pri-vate’ per-thread indexes described at the end of Section 4.These indexes, however, proved very effective at reducingthread interference, and the main remaining source of inter-ference is in retrieving triples from factsI . When the per-centage of active triples is low and the number of threads ishigh, the calls to factsI .next are more likely to overlap, assuggested by the correlation of 0.9 between the speedup for32 threads and the percentage of active triples; this explainsthe comparatively low speedup on DBpediaL.
We further analyse these results using the Karp-Flatt met-ric: if N threads achieve a speedup of s(N), the fraction ofthe parallelised work is p(N) = (1/s(N) � 1)(1/N � 1).In our case, p(32) ranges from 88% to 98%; thus, computa-tion is parallelised exceptionally well. Solving this equationfor s(N) and letting N approach infinity yields the max-
imum speedup as s(1) = 1/(1 � p(1)); assuming thatp(1) = p(32), we get s(1) between 8.3 and 50. With 32threads we thus already approach this maximum, which ex-plains the flattening of the curves in Figure 2.
Table (b) in Figure 2 compares SysX with DBRDF onPostgreSQL (PG) and MonetDB (MDB) with VP or TTscheme, and with OWLIM-Lite. Columns T show the timesin seconds, and columns B/t show the number of bytes pertriple. Import in DBRDF is about 20 times slower than inSysX; we found out that half of the import time is used inthe slow Jena RDF parser. Furthermore, VP it is more mem-ory efficient than TT by 33% as it does not store triples’spredicates. MDB-VP can be more memory-efficient thanSysX; however, MDB-TT is not, which is surprising sinceSysX does not compress data. For OWLIM-Lite, we do notknow how the dictionary is split between the disk and RAM,so the RAM consumption reported in Table (b) is a ‘best-case’ estimate. On materialisation tests, MDB-VP comfort-ably outperformed PG-VP, and it outperformed SysX onLUBML01K and UOBML01K, which is probably due tobetter query planning. However, both MDB-TT and PG-TTran out of time in all cases, apart from MDB-TT which com-pleted DBpediaL in 11,958 seconds: as Abadi et al. (2009)have already observed, self-joins on the triple table are no-toriously difficult for RDBMSs. In contrast, although it im-plements TT, SysX could successfully complete all tests.
6 Conclusion & OutlookWe presented a novel and very efficient approach to parallelmaterialisation of datalog in centralised, multi-core, main-memory RDF systems. Our main challenge is to adapt theRDF indexing scheme to secondary storage, the main diffi-culty of which is to reduce random access.
DBRDF: an implementation using RDBMS and known techniques
PostgreSQL (row store) or MonetDB (column store) as underlying engine
vertical partitioning (VP) or triple table (TT) storage model
OWLIM-Lite: a commercial RDF store by OntoText
Motik et al. Parallel Datalog Reasoning in RDFox 8/9
Conclusion
TABLE OF CONTENTS
1 INTRODUCTION
2 OUR APPROACH TO PARALLEL MATERIALISATION
3 EVALUATION
4 CONCLUSION
Motik et al. Parallel Datalog Reasoning in RDFox 8/9
Conclusion
RESEARCH DIRECTIONS
Under development/already finished:Deal with owl:sameAs via rewriting
Incremental deletions/additions
Add a data/query/reasoning distribution layer
Future work:Investigate potential for data compression
Improve join cardinality estimation
Improve query planning
Motik et al. Parallel Datalog Reasoning in RDFox 9/9
top related