firehose,pagerank and nvgraph - clsac · 2019-11-03 · firehose,pagerank and nvgraph gpu...

41
FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D.

Upload: others

Post on 14-Jul-2020

6 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

FIREHOSE,PAGERANK AND NVGRAPHGPU ACCELERATED ANALYTICSCHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016Joe Eaton Ph.D.

Page 2: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

2

Agenda

Accelerated Computing

FireHose

PageRank end-to-end

nvGRAPH

Coming Soon

Conclusion

Page 3: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

3CPU

GPU Accelerator

ACCELERATED COMPUTING10x Performance & 5x Energy Efficiency

# GPU Developers

Page 4: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

4

PERFORMANCE GAP CONTINUES TO GROW

0

100

200

300

400

500

600

700

800

2008 2011 2012 2014 2016

Peak Memory Bandwidth

NVIDIA GPU x86 CPU

M2090

M1060

K20

K80

Pascal

GB/s

0.0

0.5

1.0

1.5

2.0

2.5

3.0

3.5

4.0

4.5

5.0

5.5

2008 2010 2012 2014 2016

Peak Double Precision FLOPS

NVIDIA GPU x86 CPU

M1060

K20

GFLOPS

K80

Pascal

M2090

Page 5: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

5

FIREHOSE BENCHMARKGPU Processing of Streaming Data

Volume and Velocity of some big data tasks do not allow for store & analyze.

Strong need to analyze on-the-fly, continuous stream of data, without trip to disk first.

Applications in Network monitoring - Social and Cyber – are growing in importance

FirehoseSuite of tasks measuring best-effort processingof UDP packets at high data rates.

Compare processing software and hardwareQuantitive & Qualitative

Page 6: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

6

FIREHOSE BENCHMARKGPU Processing of Streaming Data

Firehose PartsGenerator streams UDP packets– not throttled by Analyzer- only a few ops per datum

Analyzer may not be able to keep up, measure success rate

3 Firehose Benchmarks

Ø Power-law anomaly detectionØ Active power-law anomaly detectionØ Two-level anomaly detection

Page 7: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

7

FIREHOSE CUDA IMPLEMENTATIONAnalytics are implemented as pThreads + CUDA apps

N worker threads for each UDP socket

Threads read data and insert into double-buffered queue.

When buffer filled it is sent to one of K GPUs for processing

Page 8: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

8

FIREHOSE PROCESSING RATE

0.0

10.0

20.0

30.0

40.0

50.0

60.0

Anomaly 1 Anomaly 2 Anomaly 3CPU 10.0 3.4 1.54xMaxwell 58 44 331xPascal 21 18 15

M p

acke

ts/s

econ

d

Processing Rate

6X

CPU: Dual Xeon, 16 core X5690

13X

22X

Page 9: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

9

PAGERANK PIPELINE BENCHMARKGraph Analytics Benchmark

Proposed by MIT LL.

Apply supercomputing benchmarking methods to create scalable benchmark for big data workloads.

Four different phases that focus on data ingest and analytic processing.

Reference code for serial implementations available on GitHub.https://github.com/NVIDIA/PRBench

Page 10: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

10

PAGERANK PIPELINE BENCHMARK4 Stage Graph Analytics Benchmark

Stage 1 – Generate graph (not timed)

Stage 2 – Read graph from disk, sort edges, write back to disk

Stage 3 – Read sorted edge list, generate normalized adjacency matrix for graph

Stage 4 – Run 20 iterations of Pagerank algorithm (power method)

Stage 2 tests I/O

Stage 3 tests I/O + compute

Stage 4 tests compute

Page 11: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

11

SPEEDUP VS REFERENCE C++

Scale K0(99%) K1(90%) K2(80%) K3(0%)16 2.8x 1.1x 4.6x 5.9x17 2.6x 1.3x 5.0x 10.3x18 2.9x 1.3x 6.3x 14.1x19 3.2x 1.5x 7.6x 14.0x20 3.3x 1.5x 8.7x 11.8x21 3.2x 1.5x 9.2x 9.8x22 3.4x 1.5x 9.9x 9.8x

Page 12: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

12

PAGERANK PIPELINE RESULTSSingle P100 GPU

STAGE 2 STAGE 3 STAGE 4

GPU Speed Up

GPU

GPU: Tesla P100

CPU: Intel Xeon E5 2690 3.0 GHz

Speed up versus MIT LL reference C++ implementation

1.8X

11X

44X

Page 13: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

13

PAGERANK PIPELINE RESULTSComparing Across GPUs

2.45 3.23

9.88

37.9

0

1000

2000

3000

4000

5000

6000

7000

0

5

10

15

20

25

30

35

40

K40 M40 P100 DGX-1

Axi

s Ti

tle

Axi

s Ti

tle

Axis TitleGTEPS Aggregate Memory BW

Stage 4 – Pagerank computationMemory bandwidth

most important

GTEPS = billions edges/sec

# edges = E * 2S

E = 16 S = 21

CPU

Page 14: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

14

1.00E+06

1.00E+07

1.00E+08

1.00E+09

1.00E+10

1.00E+11

21 22 23 24 25 26 27 28 29

Edges/second

DataScale

PageRankKernelWeakScaling

K0K1K2K3

1GPU128GPU

256GPU512GPU

8GPU

Page 15: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

DSSD+NVIDIAPageRankResults

KeyTakeaways• Outofthebox(nochangetotestcode)D5BLKspeediscomparabletoRAMDisk

• UsingAPI(minimalcodechange)D5is2.7xfasterthanRAMDisk• D5Advantages:

• Device speed• Direct-to-device API• Direct transfer to GPU

• Shared between machines• High Capacity

Page 16: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

DSSD+NVIDIAPageRankResults

21.6036

6.6846

1.9719

1.5913

4.2278

0 5 10 15 20 25

HDD

DSSDBLKDEV

DSSDAPI

DSSDAPIGPUDirect

RAMDisk

CompleteRuntime

timeinseconds

Date: 4Aug201618:32GMTHost: shadedDSSD/NVIDIAmachine

Test:mpirun -np1--bind-tonone./prbench -S21-E16–no-rcache

2.1x fasterthanRAMDisk,11x fasterthanHDD

3.2x fasterthanHDD

2.7x fasterthanRAMDisk,13.6x fasterthanHDD

Page 17: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

DSSD+NVIDIAPageRankResults

timeinseconds

Date: 4Aug201618:32GMTHost: shadedDSSD/NVIDIAmachine

Test:mpirun -np1--bind-tonone./prbench -S21-E16–no-rcache

1.2x fasterthanRAMDisk,11.5x fasterthanHDD

2.9x fasterthanHDD

2.3x fasterthanRAMDisk,22.9x fasterthanHDD

Page 18: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

DSSD+NVIDIAPageRankResults

timeinseconds

Date: 4Aug201618:32GMTHost: shadedDSSD/NVIDIAmachine

Test:mpirun -np1--bind-tonone./prbench -S21-E16–no-rcache

4.2x fasterthanRAMDisk,18.2x fasterthanHDD

3.9x fasterthanHDD,and90% speedofRAMDisk

4.2x fasterthanRAMDisk,18.2x fasterthanHDD

Page 19: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

19

GRAPHS ARE FUNDAMENTALTight connection between data and graphs

Data View Graph ViewData Element/ Entity Graph VertexEntity Attributes Vertex labelsBinary Relation (1 to 1) Graph EdgeN-ary Relation (many to 1) Hypergraph edgeRelation Attributes Edge labelsGroup of relations over entities Sets of Vertices and Edges

Page 20: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

20

NVGRAPHEasy Onramp to GPU Accelerated Graph Analytics

GPU Optimized Algorithms

Reduced cost & Increased performance

Standard formats and primitivesSemi-rings, load-balancing

Performance Constantly Improving

Page 21: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

21

nvGRAPHAccelerated Graph Analytics

nvGRAPH for high performance graph analytics

Deliver results up to 3x faster than CPU-only

Solve graphs with up to 2 Billion edges on a single GPU (M40)

Accelerates a wide range of graph analytics applications:

developer.nvidia.com/nvgraph

PageRank Single Source Shortest Path

Single Source Widest Path

Search Robotic Path Planning IP Routing

Recommendation Engines Power Network Planning Chip Design / EDA

Social Ad Placement Logistics & Supply Chain Planning

Traffic sensitive routing

0

1

2

3

Iter

atio

ns/s

nvGRAPH: 3.4x Speedup

2x12 Core Xeon E5 v2

nvGRAPH on P100

PageRank on Twitter 1.5B edge dataset

nvGraph on P100

GraphMat on 2 socket 12-core Xeon E5-2697 v2 CPU,@ 2.70 GHz

Page 22: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

22

Motivating examplePower law graph: wiki2003.bin

Cusparse csrmv time: 8.05 ms Merge Path csrmv time: 1.08 ms

455,436 vertices (n)

2,033,173 edges (nnz)

sparsity = 4.464234

~7.45x faster!

PSG Cluster, K40

Page 23: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

23

SEMI-RINGSDefinition / Axioms

Set R with two binary operators: + and * that satisfy:

1. (R, +) is associative, commutative with additive identity 0

(0 + a = a)

2. (R, *) is associative with multiplicative identity 1

(1 * a = a)

3. Left and Right multiplication is distributive over addition

4. Additive identity 0 = multiplicative null operator

(0 * a = a * 0 = 0)

Page 24: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

24

SEMI-RINGSExamples

SEMIRING SET PLUS TIMES 0 1

Real ℝ + * 0 1

MinPlus ℝ ∪ {−∞,∞} min + ∞ 0

MaxMin ℝ ∪ {−∞,∞} max min -∞ ∞

Boolean {0, 1} ∨ ∧ 0 1

Page 25: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

25

APPLICATIONSPagerank (+, *)

//sw/gpgpu/naga/src/pagerank.cpp

• Ideal application: runs on web and social graphs

• Each iteration involves computing: 𝑦 = 𝐴𝑥

• Standard csrmv

• PlusTimes Semiring

• α = 1.0 (multiplicative identity)

• β = 0.0 (multiplicative nullity)

Page 26: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

26

APPLICATIONSSingle Source Shortest Path (min, +)

Common Usage Examples:

• Path-finding algorithms:• Navigation• Modeling• Communications Network

• Breadth first search building block

• Graph 500 Benchmark

0

11

2

1

1

2

22

2

1

3

2

3

2

2 3

2

Page 27: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

27

APPLICATIONSWidest Path (max,min)

Common Usage Examples:

• Maximum bipartite graph matching

• Graph partitioning

• Minimum cut

• Common application areas:• power grids• chip circuits

Page 28: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

28

PROPERTY GRAPHSMany simple graphs overlaid

Page 29: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

29

SUBGRAPH EXTRACTIONFocus on a specific area

Page 30: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

30

COMING SOON

Partitioning

Clustering

BFS

Graph Contraction

Features in next release

Page 31: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

31

PARTITIONING AND CLUSTERING

Spectral Min Edge Cut Partition

Page 32: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

32

0

5

10

15

20

25

20 21 22 23

GTEP

S

Scale

cuMPI NVSHMEM

BREADTH FIRST SEARCHKeysubroutineinseveralgraphalgorithms,naturallyleadstorandomaccess

MPIVersionimplementations:packoruseabitmaptoexchangefrontieratendofeachstep

NVSHMEMversion:directlyupdatesthefrontiermapattargetusingatomics

Benefitswithsmallergraphs(likelybehaviorwithstrongscaling)4 P100 GPUs alltoall connected with NVLink

1x4 Process Grid 2x2 Process Grid

0

5

10

15

20

25

20 21 22 23

GTEP

SScale

cuMPI NVSHMEM

41%33%

20% 13%

25%18%

8%

Page 33: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

33

Graph Contraction

Page 34: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

34

Graph Contraction

Page 35: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

35

DYNAMIC GRAPHScuSTINGER brings STINGER to GPUs

Oded Green presented at HPEC 2016

cuSTINGER: Supporting Dynamic Graph Algorithms for GPUs

https://www.researchgate.net/publication/308174457

Page 36: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

36

GRAPHBLAS

nvGRAPH is working toward a full GraphBLAS implementation

Semi-rings are a start

Page 37: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

37

TOWARD REAL TIME BIG DATA ANALYTICSGPUs enable the next generation of in-memory processing

DUAL BROADWELLSERVER

NVIDIA DGX-1 SERVER

GPU PERFORMANCE INCREASE

Aggregate Memory Bandwidth 150 GB/s 5760 GB/s 38 X

Aggregate SP FLOPS 4 TF 85 TF 21 X

Single DGX-1 server provides the computecapability of dozens of dual-cpu servers

Page 38: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

38

ACCELERATED DATABASE TECHNOLOGYBig data ISVs moving to the accelerated model

SQL

No SQL

Graph

Page 39: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

39

SUMMARYGPUs for High Performance Data Analytics

GPU computing is not just for scientists anymore!

GPUs excel at in-memory analytics.Streaming – FirehoseGraph – Pagerank PipelineAnalytics - nvGRAPH

Savings in infrastructure size & cost by using GPU servers versus standard dual-CPU server.Database In-Memory performance 20-100x at practical scale

Page 40: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda

40

REFERENCES

GPU Processing of Streaming Data: A CUDA implementation of the Firehose benchmarkMauro Bisson, Massimo Bernaschi, Massimiliano Fatica IEEE HPEC 2016NVIDIA corporation & Italian Research Council

A CUDA Implementation of the Pagerank Pipeline BenchmarkMauro Bisson, Everett Phillips, Massimiliano Fatica IEEE HPEC 2016NVIDIA corporation

Page 41: FIREHOSE,PAGERANK AND NVGRAPH - CLSAC · 2019-11-03 · FIREHOSE,PAGERANK AND NVGRAPH GPU ACCELERATED ANALYTICS CHESAPEAKE LARGE SCALE DATA ANALYTICS, OCT 2016 Joe Eaton Ph.D. 2 Agenda