
A Datalog Engine for GPUs

Carlos A. Martínez Angeles

camartinez@cinvestav.mx

Computer Science Department

Centre for Research & Postgraduate Studies

National Polytechnic Institute

Cinvestav-IPN

September 13, 2013

Carlos A. Martínez Angeles Cinvestav-IPN A Datalog Engine for GPUs 1/25


Contents

1. GPUs

2. Datalog

3. Our Datalog engine

4. Experimental evaluation

5. Related work

6. Conclusions


GPUs

- Graphics Processing Units (GPUs) are high-performance many-core processors capable of very high computation and data throughput.

- GPUs can be seen as SIMD machines: they consist of many processing elements that all run the same function, but on distinct data items.

- GPUs are used in a wide array of applications, including gaming, bioinformatics, chemistry and finance.

- CUDA is a software platform used to program GPUs. It extends C by allowing the definition of functions, called kernels, that are executed in parallel.


GPU architecture


Datalog

- Datalog is a language based on first-order logic that has been used as a data model for relational databases.

- A Datalog program consists of facts about a subject of interest and rules to deduce new facts.

- Facts and rules are specified using atomic formulas, which consist of predicate symbols with arguments, e.g.:

FACTS

father(harry, john).

father(john, david).

RULE

grandfather(Z, X) :- father(Y, X), father(Z, Y).


Evaluation of Datalog programs

- Datalog programs can be evaluated through a top-down approach or a bottom-up approach.

- The bottom-up approach applies the rules to the given facts, deriving new facts, and repeats this process with the new facts until no more facts are derivable.

- The query is considered only at the end, when the facts matching the query are selected.

- Benefits of this approach include the fact that rules can be evaluated in any order and in a highly parallel manner.


Evaluation based on relational algebra operators

- Datalog rules can be evaluated using the typical relational algebra operators selection, join and projection.
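As an illustration, the grandfather rule from the earlier slide can be evaluated with a join followed by a projection. This is a minimal sequential sketch in Python, not the GPU implementation; the helper names `select`, `join` and `project` and the tuple-per-fact layout are assumptions made for the example.

```python
# Facts from the Datalog slide, one tuple per fact:
# father(harry, john). father(john, david).
father = [("harry", "john"), ("john", "david")]

def select(rows, column, constant):
    """Selection: keep the rows whose given column equals a constant."""
    return [row for row in rows if row[column] == constant]

def join(left, right, left_col, right_col):
    """Join: concatenate every pair of rows that agree on the join columns."""
    return [l + r for l in left for r in right if l[left_col] == r[right_col]]

def project(rows, columns):
    """Projection: keep only the requested columns, in the requested order."""
    return [tuple(row[c] for c in columns) for row in rows]

# grandfather(Z, X) :- father(Y, X), father(Z, Y).
# A joined row has the layout (Y, X, Z, Y): the first atom supplies (Y, X),
# the second supplies (Z, Y), and the join condition equates both Y columns.
joined = join(father, father, left_col=0, right_col=1)
grandfather = project(joined, [2, 1])  # keep (Z, X)
print(grandfather)  # [('harry', 'david')]
```

A query with a constant, such as father(john, X)?, would correspond to `select(father, 0, "john")`.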


Our Datalog engine for GPUs


Parsing

- We translate facts and rules to numbers, keeping their corresponding strings in a hashed dictionary.

- Each unique string is assigned a unique id; equal strings are assigned the same id.

- The dictionary is used at the very end, when the final results are to be displayed.

- The idea is to capitalise on the GPU's capacity to process numbers and to have a constant processing time for each tuple.
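The translation step can be pictured with the following sketch. It assumes a Python dict as the hashed dictionary and consecutive integers as ids; the class name `SymbolTable` and its methods are hypothetical and only serve the example.

```python
class SymbolTable:
    """Maps each distinct string to a small integer id and back."""

    def __init__(self):
        self.ids = {}      # string -> id (hashed dictionary)
        self.strings = []  # id -> string, used when displaying results

    def encode(self, text):
        # Equal strings get the same id; a new string gets the next free id.
        if text not in self.ids:
            self.ids[text] = len(self.strings)
            self.strings.append(text)
        return self.ids[text]

    def decode(self, ident):
        return self.strings[ident]

table = SymbolTable()
facts = [("harry", "john"), ("john", "david")]

# Numeric form of the facts, suitable for fixed-width processing.
encoded = [tuple(table.encode(s) for s in fact) for fact in facts]
print(encoded)  # [(0, 1), (1, 2)]

# The dictionary is only consulted again to display the final results.
decoded = [tuple(table.decode(i) for i in row) for row in encoded]
print(decoded)  # [('harry', 'john'), ('john', 'david')]
```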


Preprocessing

- For each rule, we specify which operations to perform and with which arguments they should be performed.

- To do so, we create small arrays for each operation, e.g.:

fact1(A,X,Y,Z), fact2(Z,X,B,C,Y).

-> [1, 1, 2, 4, 3, 0]

- These arrays are loaded into the shared memory of the GPU.

- The idea is to allow each thread to work with the correct arguments without having to calculate them.
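One reading of the example array, offered here as an assumption since the slide does not spell out the encoding, is that it lists pairs of column positions for each variable shared by the two atoms: X at (1, 1), Y at (2, 4) and Z at (3, 0). The hypothetical helper `join_columns` below reproduces the array under that reading.

```python
def join_columns(left_args, right_args):
    """Return a flat list of (left position, right position) pairs,
    one pair for every variable of the left atom that also occurs
    in the right atom. Assumed encoding, inferred from the slide."""
    pairs = []
    for left_pos, variable in enumerate(left_args):
        if variable in right_args:
            pairs.extend([left_pos, right_args.index(variable)])
    return pairs

# fact1(A,X,Y,Z), fact2(Z,X,B,C,Y).
array = join_columns(["A", "X", "Y", "Z"], ["Z", "X", "B", "C", "Y"])
print(array)  # [1, 1, 2, 4, 3, 0]
```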


Memory management

- We minimize transfers between GPU memory and host memory by keeping facts and rule results in GPU memory for as long as possible.

- To do so, we maintain a list with information about each fact and rule result resident in GPU memory.

- This scheme is similar to the Least Recently Used (LRU) page replacement algorithm.
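A generic LRU-style residency list could look like the sketch below. It is not the engine's data structure: the class name `ResidencyList`, the capacity measured in rows and the eviction log are all assumptions made to illustrate the idea of evicting the least recently used table when space runs out.

```python
from collections import OrderedDict

class ResidencyList:
    """Tracks which tables are resident, least recently used first."""

    def __init__(self, capacity_rows):
        self.capacity_rows = capacity_rows
        self.resident = OrderedDict()  # table name -> number of rows
        self.evicted = []              # names moved back to host memory

    def used(self):
        return sum(self.resident.values())

    def touch(self, name, rows):
        """Record a use of a table, loading it (and evicting) if needed."""
        if name in self.resident:
            # Already resident: mark as most recently used, no transfer.
            self.resident.move_to_end(name)
            return
        # Evict least recently used tables until the new one fits.
        while self.resident and self.used() + rows > self.capacity_rows:
            victim, _ = self.resident.popitem(last=False)
            self.evicted.append(victim)
        self.resident[name] = rows

memory = ResidencyList(capacity_rows=10)
memory.touch("edge", 4)
memory.touch("path", 4)
memory.touch("edge", 4)   # edge becomes the most recently used table
memory.touch("delta", 4)  # does not fit: path is evicted
print(list(memory.resident), memory.evicted)
```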


Selection

- The size of the result of a selection is not known beforehand.

- Our selection uses three different kernel executions.

  - The first kernel marks all the rows that satisfy the selection arguments with a value of one.

  - The second kernel performs a prefix sum on the marks to determine the size of the results buffer and the location where each GPU thread must write its results.

  - The last kernel writes the results.

Input numbers 1 1 0 0 1 0 ...

Prefix sum 1 2 2 2 3 3 ...
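The three steps can be traced with a sequential Python sketch that reuses the marks from the example above. On the GPU each loop would be a kernel with one thread per row; the function name `select_rows` and the sample rows are assumptions for illustration.

```python
from itertools import accumulate

def select_rows(rows, column, constant):
    # Step 1 (mark): 1 for every row that satisfies the selection, else 0.
    marks = [1 if row[column] == constant else 0 for row in rows]

    # Step 2 (inclusive prefix sum): the last entry is the result size and
    # prefix[i] - 1 is the output position of marked row i.
    prefix = list(accumulate(marks))
    size = prefix[-1] if prefix else 0

    # Step 3 (write): each marked row is copied to its own position, so
    # no two writers ever target the same slot.
    result = [None] * size
    for i, row in enumerate(rows):
        if marks[i]:
            result[prefix[i] - 1] = row
    return marks, prefix, result

rows = [(4, 0), (4, 1), (7, 2), (9, 3), (4, 4), (8, 5)]
marks, prefix, result = select_rows(rows, column=0, constant=4)
print(marks)   # [1, 1, 0, 0, 1, 0]
print(prefix)  # [1, 2, 2, 2, 3, 3]
print(result)  # [(4, 0), (4, 1), (4, 4)]
```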


Join

- We use a modified version of the Indexed Nested Loop Join:

- A CSS-Tree (Cache Sensitive Search Tree) is built on one of the columns to join.

- CSS-Trees can be constructed in parallel, and their traversal is performed via address arithmetic.

- Using the tree, a preliminary join is made to obtain an array similar to that of the selection.

- A second join writes the results.
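The two-pass structure (count first, then write) can be sketched as follows. Note the substitution: instead of a CSS-Tree, this example indexes the inner table with a sorted key array and binary search from Python's `bisect` module, which plays the same lookup role but is not the structure used by the engine. The function name `indexed_join` is hypothetical.

```python
from bisect import bisect_left, bisect_right
from itertools import accumulate

def indexed_join(outer, inner, outer_col, inner_col):
    # Index: inner rows sorted by the join column (stand-in for a CSS-Tree).
    index = sorted(inner, key=lambda row: row[inner_col])
    keys = [row[inner_col] for row in index]

    # Pass 1 (preliminary join): find the matching range for each outer row
    # and count the matches, without writing any result.
    ranges = [(bisect_left(keys, row[outer_col]),
               bisect_right(keys, row[outer_col])) for row in outer]
    counts = [hi - lo for lo, hi in ranges]

    # Exclusive prefix sum: offsets[i] is where outer row i starts writing,
    # and the last entry is the total result size.
    offsets = [0] + list(accumulate(counts))
    result = [None] * offsets[-1]

    # Pass 2 (second join): write the joined rows at the computed offsets.
    for i, row in enumerate(outer):
        lo, hi = ranges[i]
        for k in range(lo, hi):
            result[offsets[i] + (k - lo)] = row + index[k]
    return result

# edge(X,Y), edge(Y,Z): join column 1 of the outer with column 0 of the inner.
edge = [(1, 2), (2, 3), (2, 4)]
print(indexed_join(edge, edge, outer_col=1, inner_col=0))
# [(1, 2, 2, 3), (1, 2, 2, 4)]
```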


Other operations

- Projection simply involves taking all the elements of each required column and storing them in a new memory location.

- Multijoin is similar to the join. The difference is that, when performing the first join, we also compare any additional columns involved.

- Selfjoin is similar to the selection. The difference is that, instead of checking constant values, it checks the values of the columns affected.


Optimisations

- Duplicate elimination. When a rule iteration is finished, its result is sorted and the duplicates are removed.

- Optimising projections. Executing a projection at the end of each join allows us to discard unnecessary columns earlier in the computation.

- Fusing operations. Operations are applied together to a data set in a single read of the data set, as opposed to one operation per read of the data set.
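Sort-based duplicate elimination, as described in the first item, can be sketched in a few lines: after sorting, equal tuples are adjacent, so a row is kept only if it differs from its predecessor. The function name `eliminate_duplicates` is an assumption, and the sequential code only mirrors the idea, not the GPU kernels.

```python
def eliminate_duplicates(rows):
    """Sort the rows, then keep each row that differs from the previous one."""
    ordered = sorted(rows)
    return [row for i, row in enumerate(ordered)
            if i == 0 or row != ordered[i - 1]]

result = eliminate_duplicates([(2, 3), (1, 2), (2, 3), (1, 2), (1, 4)])
print(result)  # [(1, 2), (1, 4), (2, 3)]
```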


Experimental Evaluation

- Hardware. Host platform: Intel Core 2 Quad CPU Q9400 at 2.66 GHz (4 cores in total), Kingston DDR2 RAM, 6 GB, 800 MHz. GPU platform: Fermi GeForce GTX 580, 512 cores, 1536 MB GDDR5 memory.

- Software. Ubuntu 12.04.1 LTS 64 bits. CUDA 5.0 Production Release, gcc 4.5, g++ 4.5. YAP 6.3.3 Development Version, Datalog 2.4, XSB 3.4.0.

- We compared our engine against XSB, YAP and MITRE Datalog.

- At this stage, we are interested in the performance benefit of using GPUs for the evaluation of Datalog queries, as opposed to using a CPU only.


Join over four big tables

join(X,Z) :- table1(X), table2(X,4,Y), table3(Y,Z,Z),
             table4(Y,Z).

join(X,Z)?

- Four tables, all with the same number of rows and filled with random numbers, are joined together.

- This test was used to exercise all the different operations of our engine.

- Our engine is roughly 200 times faster than YAP.

- Joins were the most costly operations, with the Multijoin alone taking more than 70% of the total execution time.


Join over four big tables results

[Figure: execution time in seconds (logarithmic axis, 1 to 1000) against the size of each table in millions of rows (1 to 5), for GPUDatalog and YAP.]


Transitive closure of a graph

path(X,Y) :- edge(X,Y).

path(X,Z) :- edge(X,Y), path(Y,Z).

path(X,Y)?

- The edges of a graph are represented by a table with two columns filled with random numbers.

- The idea is to find all the nodes that can be reached if we start from a particular node.

- Our engine is 40 times faster than YAP.

- At first, duplicate elimination was the most costly operation, but as the number of rows to process in each iteration decreased, the join became the most costly operation.
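The iterative, bottom-up evaluation of this program can be sketched as below. The sketch assumes that each iteration joins `edge` only with the facts derived in the previous iteration, which is one way the number of rows per iteration can shrink; the slides do not state which strategy the engine uses, and the function name `transitive_closure` is hypothetical.

```python
def transitive_closure(edge):
    """Bottom-up evaluation of:
         path(X,Y) :- edge(X,Y).
         path(X,Z) :- edge(X,Y), path(Y,Z).
    """
    path = set(edge)   # first rule: every edge is a path
    delta = set(edge)  # facts that were new in the last iteration
    while delta:
        # Second rule: join edge(X,Y) with the new path(Y,Z) facts,
        # then project onto (X,Z).
        derived = {(x, z) for (x, y) in edge for (y2, z) in delta if y == y2}
        # Duplicate elimination: keep only facts not derived before.
        delta = derived - path
        path |= delta
    return path

closure = transitive_closure({(1, 2), (2, 3), (3, 4)})
print(sorted(closure))
# [(1, 2), (1, 3), (1, 4), (2, 3), (2, 4), (3, 4)]
```

The loop stops when an iteration derives no new facts, which also guarantees termination on cyclic graphs.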


Transitive closure of a graph results

[Figure: execution time in seconds (logarithmic axis, 0.1 to 100) against the size of each table in millions of rows (1 to 10), for GPUDatalog and YAP.]


Same-Generation program

sg(X,Y) :- flat(X,Y).

sg(X,Y) :- up(X,X1), sg(X1,Y1), down(Y1,Y).

sg(a,Y)?

- Three tables are created with the following equations:

\[ \mathit{up} = \{(a, b_i) \mid i \in [1, n]\} \cup \{(b_i, c_j) \mid i, j \in [1, n]\} \tag{1} \]

\[ \mathit{flat} = \{(c_i, d_j) \mid i, j \in [1, n]\} \tag{2} \]

\[ \mathit{down} = \{(d_i, e_j) \mid i, j \in [1, n]\} \cup \{(e_i, f) \mid i \in [1, n]\} \tag{3} \]

- The gain in performance is very small, and our engine fails for n > 90 due to lack of memory.
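To make the table sizes concrete, the three definitions can be generated directly. The string naming scheme (`b1`, `c2`, ...) and the function name `same_generation_tables` are assumptions for the example; the sizes follow from the equations: up and down each hold n + n² tuples, and flat holds n².

```python
def same_generation_tables(n):
    """Build up, flat and down following equations (1) to (3)."""
    idx = range(1, n + 1)
    up = ({("a", f"b{i}") for i in idx}
          | {(f"b{i}", f"c{j}") for i in idx for j in idx})
    flat = {(f"c{i}", f"d{j}") for i in idx for j in idx}
    down = ({(f"d{i}", f"e{j}") for i in idx for j in idx}
            | {(f"e{i}", "f") for i in idx})
    return up, flat, down

up, flat, down = same_generation_tables(3)
print(len(up), len(flat), len(down))  # 12 9 12
```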


Same-Generation program results.

- Duplicate elimination takes more than 80% of the total time and is the cause of the memory problem.

- The reason is that the rule creates a very large number of tuples, most of which are duplicates.

[Figure: execution time in milliseconds (0 to 1800) against problem size n (25 to 75), for GPUDatalog and YAP.]


Related Work

- He et al. created GDB, a relational processing system for both CPUs and GPUs. GDB has primitive operations, and the relational algebra (RA) operators are built upon them.

- We modified the Indexed Nested Loop Join (INLJ) of GDB for our joins. We added multijoin, and we fused joins with projections.

- Diamos et al. developed relational operators for GPUs which partition and process data in blocks using algorithmic skeletons.

- Their join algorithm was compared to that of GDB, showing a 1.69x performance improvement.


Conclusions

- Our engine performed very well but can be further improved by:

- Extending the syntax to accept built-in predicates and negation.

- Adding optimisations based on tabling or magic sets methods.

- Mixing the processing of rules between the GPU and the host multicore.

- Improving the join operations to eliminate duplicates before writing the final results.


Thank you
