vertica relational database graph analytics using malú ...ssalihog/courses/cs848... · malú...
TRANSCRIPT
![Page 1: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/1.jpg)
Graph Analytics using Vertica Relational Database
Alekh Jindal - Samuel Madden, Malú Castellanos - Meichun Hsu
1
![Page 2: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/2.jpg)
Introduction● High demand for graph analytics
● Popularity of distributed graph computing systems○ Vertex-centric systems: Pregel, Giraph, GraphLab
● Question:
Are traditional relational database systems not good enough for graph analytics?
2
![Page 3: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/3.jpg)
IntroductionLimitations of distributed graph computing systems:
● Data is initially collected and stored in a relational database● Graph processing is slow for very large graphs
○ Users have to choose a subgraph to run the algorithm ● Preparation might include operations that relational databases are
optimized for.○ Pre-processing or post-processing
● Some graph algorithms compute aggregates over a large neighbourhood ○ Hard to express in vertex-centric systems
3
![Page 4: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/4.jpg)
Goal● Show how vertex-centric graph processing can be translated,
optimized and run on Vertica ○ SSSP, PageRank, Connected Components
● Compare Performance with two vertex-centric distributed systems for graph analysis (Giraph and GraphLab)
● Vertica → Enterprise column-store database management system○ Supports parallel processing
4
![Page 5: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/5.jpg)
Vertex-Centric Model● The user provides a vertex.compute function (UDF):
○ The UDF will be executed at each node.
○ Will update the node’s state.
○ And communicate the changes to the neighbours.
5
![Page 6: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/6.jpg)
Giraph Physical Plan1. Input Superstep: Workers reading
the data, building “Server Data stores”
2. Intermediate step: Run UDF, shuffle messages, wait for everyone, synchronize.
3. Output Superstep: Produce the output.
6
![Page 7: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/7.jpg)
Giraph Logical PlanSame query plan but in relational logic:
1. V join E2. (V join E) join M: messages from
previous superstep3. Run UDF4. Produce new state for vertex (V’)
and messages for the next superstep (M’).
7
![Page 8: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/8.jpg)
Overview● Translation to SQL queries
● Query Optimization
● Query Execution
● Extending Vertica
8
![Page 9: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/9.jpg)
Translation to SQL1) Eliminate the message table
9
![Page 10: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/10.jpg)
Translation to SQL2) Translate vertex compute function
Logical plan
10
![Page 11: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/11.jpg)
Query Optimizations1) Update Vs. Replace
11
![Page 12: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/12.jpg)
Query Optimizations2) Incremental Evaluation
12
![Page 13: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/13.jpg)
Query Optimizations2) Join Elimination
Join Elimination in PageRank
13
![Page 14: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/14.jpg)
Query Execution● Physical Design
○ Encoding and compression, sort orders, multiple table projections
● Join Optimization○ Join directly over compressed data, choose from hash join and merge join
● Query Pipelining○ Avoids materializing intermediate output and repeated access to disk
● Intra-query Parallelism○ Process subgraphs in parallel across cpu cores using GroupBy
14
![Page 15: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/15.jpg)
Query Execution Plan of SSSPDifferent from Giraph execution pipeline:
1. Filter unnecessary tuples as early as possible.
2. Fully pipelines the execution flow.
3. Picks the best join execution strategy.
15
![Page 16: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/16.jpg)
Extending Vertica● Running unmodified vertex programs
○ As table UDFs without translating to relational operators
16
![Page 17: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/17.jpg)
Extending Vertica● Avoiding Intermediate Disk I/O
○ Load and store graph in shared memory, higher memory footprint
17
![Page 18: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/18.jpg)
ExperimentsSetup:
● Cluster of 4 machines● 48 GB memory● 1.4 TB Disk
Dataset:
18
![Page 19: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/19.jpg)
ExperimentsData Preparation:
Runtime:
19
![Page 20: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/20.jpg)
ExperimentsMemory Usage (PageRank):
20
![Page 21: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/21.jpg)
ExperimentsIn memory Graph Analysis:
21
![Page 22: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/22.jpg)
ExperimentsMixed Graph and Relational Analysis :
More Complicated Graph Processing:
22
![Page 23: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/23.jpg)
Conclusion● Vertica can be tuned to offer good end-to-end performance on graph
queries (because it is optimized for scans, joins and aggregates).
● Users can trade memory with reduced I/O cost in iterative graph analysis.
● Relational databases can combine graph processing with relational analysis as pre-processing or post-processing steps.
● Features of relational databases can be combined with graph processing systems and it might be a good idea to stitch these systems together.
23
![Page 24: Vertica Relational Database Graph Analytics using Malú ...ssalihog/courses/cs848... · Malú Castellanos - Meichun Hsu 1. Introduction High demand for graph analytics Popularity](https://reader033.vdocuments.us/reader033/viewer/2022060321/5f0d27e87e708231d438f31d/html5/thumbnails/24.jpg)
Thank you for your attention.
24