All-Pairs-Shortest-Paths for Large Graphs on the GPU
Gary J Katz (1,2), Joe Kider (1)
(1) University of Pennsylvania, (2) Lockheed Martin IS&GS
What Will We Cover?
• Quick overview of Transitive Closure and All-Pairs Shortest Path
• Uses for Transitive Closure and All-Pairs
• GPUs: what are they and why do we care?
• The GPU problem with performing Transitive Closure and All-Pairs
• Solution: the Block Processing Method
• Memory formatting in global and shared memory
• Results
Previous Work
• “A Blocked All-Pairs Shortest-Paths Algorithm”
Venkataraman et al.
• “Parallel FPGA-based All-Pairs Shortest-Paths in a Directed Graph”
Bondhugula et al.
• “Accelerating large graph algorithms on the GPU using CUDA”
Harish
NVIDIA GPU Architecture
Issues
• No access to main memory
• Programmer needs to explicitly reference L1 shared cache
• Cannot synchronize multiprocessors
• Compute cores are not as smart as CPUs and do not handle if statements well
Background
•Some graph G with vertices V and edges E
•G= (V,E)
•For every pair of vertices u, v in V, find a shortest path from u to v, where the weight of a path is the sum of the weights of its edges
Adjacency Matrix
Quick Overview of Transitive Closure
The Transitive Closure of G is defined as the graph G* = (V, E*), where E* = {(i, j) : there is a path from vertex i to vertex j in G}
-Introduction to Algorithms, T. Cormen
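Simply Stated: The Transitive Closure of a graph is the set of edges (i, j) for every pair of vertices where i can reach j
[Figure: example graph on vertices 1-8]
Edges: (1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6)
[Figure: the same graph with transitive edges added]
Edges: (1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6), (2, 5), (8, 3), (7, 6), (7, 3)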
Design and Analysis of Algorithms - Chapter 8
Warshall's algorithm: transitive closure
• Computes the transitive closure of a relation
• (Alternatively: all paths in a directed graph)
• Example of transitive closure:
[Figure: directed graph on vertices 1-4, before and after closure]
Adjacency matrix:
0 0 1 0
1 0 0 1
0 0 0 0
0 1 0 0
Transitive closure:
0 0 1 0
1 1 1 1
0 0 0 0
1 1 1 1
Warshall's algorithm
• Main idea: a path exists between two vertices i, j, iff
  • there is an edge from i to j; or
  • there is a path from i to j going through vertex 1; or
  • there is a path from i to j going through vertex 1 and/or 2; or
  • …
  • there is a path from i to j going through vertex 1, 2, … and/or k; or
  • …
  • there is a path from i to j going through any of the other vertices
Warshall's algorithm
Idea: dynamic programming
• Let $V = \{1, \dots, n\}$ and, for $k \le n$, $V_k = \{1, \dots, k\}$
• For any pair of vertices $i, j \in V$, identify all paths from i to j whose intermediate vertices are all drawn from $V_k$: $P_{ij}^{k} = \{p_1, p_2, \dots\}$; if $P_{ij}^{k} \ne \emptyset$ then $R^{k}[i, j] = 1$
• For any pair of vertices i, j, the final answer is $R^{n}[i, j]$; that is, the transitive closure is $R^{n}$
• Starting with $R^{0} = A$, the adjacency matrix, how do we get $R^{1} \Rightarrow \dots \Rightarrow R^{k-1} \Rightarrow R^{k} \Rightarrow \dots \Rightarrow R^{n}$?
[Figure: paths p1 and p2 from i to j passing through the vertex set V_k]
Warshall's algorithm
Idea: dynamic programming
• $p \in P_{ij}^{k}$: p is a path from i to j with all intermediate vertices in $V_k$
• If k is not on p, then p is also a path from i to j with all intermediate vertices in $V_{k-1}$: $p \in P_{ij}^{k-1}$
[Figure: path p from i to j inside V_{k-1}, with vertex k of V_k off the path]
Warshall's algorithm
Idea: dynamic programming
• $p \in P_{ij}^{k}$: p is a path from i to j with all intermediate vertices in $V_k$
• If k is on p, then we break down p into $p_1$ and $p_2$ where
  – $p_1$ is a path from i to k with all intermediate vertices in $V_{k-1}$
  – $p_2$ is a path from k to j with all intermediate vertices in $V_{k-1}$
[Figure: path p from i to j split at vertex k into p1 and p2, both inside V_{k-1}]
Warshall's algorithm
• In the kth stage, determine if a path exists between two vertices i, j using just vertices among 1, …, k

$$R^{(k)}[i,j] = R^{(k-1)}[i,j] \;\text{ or }\; \left( R^{(k-1)}[i,k] \;\text{ and }\; R^{(k-1)}[k,j] \right)$$

The first term is a path using just 1, …, k-1; the second is a path from i to k and from k to j using just 1, …, k-1.
[Figure: kth stage, showing cells (i, j), (i, k) and (k, j) of the matrix]
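The kth-stage rule can be sketched in a few lines of Python. This is a minimal illustration rather than the presenters' code: the function name `warshall` is an assumption, vertices are 0-indexed, and the input is the 4-vertex adjacency matrix from the transitive closure example.

```python
def warshall(adj):
    """Return the transitive closure of a 0/1 adjacency matrix (list of lists)."""
    n = len(adj)
    r = [row[:] for row in adj]  # R^(0) = A, copied so the input is untouched
    for k in range(n):           # stage k allows vertex k as an intermediate
        for i in range(n):
            for j in range(n):
                # R^(k)[i,j] = R^(k-1)[i,j] or (R^(k-1)[i,k] and R^(k-1)[k,j])
                r[i][j] = r[i][j] or (r[i][k] and r[k][j])
    return r


# 4-vertex example: edges 1->3, 2->1, 2->4, 4->2 (stored 0-indexed).
A = [
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
closure = warshall(A)
for row in closure:
    print(row)
```

Updating the matrix in place is safe here because row k and column k do not change during stage k.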
Quick Overview of All-Pairs-Shortest-Path
Simply Stated: The All-Pairs-Shortest-Path of a graph is the shortest sequence of vertices connecting each pair of vertices where one can reach the other
[Figure: example graph on vertices 1-8]
Paths:
1 → 5
2 → 1
4 → 2
4 → 3
6 → 3
8 → 6
2 → 1 → 5
8 → 6 → 3
7 → 8 → 6
7 → 8 → 6 → 3
The All-Pairs Shortest-Path of G is defined for every pair of vertices u, v ∈ V as the shortest (least weight) path from u to v, where the weight of a path is the sum of the weights of its constituent edges.
-Introduction to Algorithms, T. Cormen
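For weighted graphs, the same triple loop works with "or/and" replaced by "min/plus". The sketch below is an illustration under assumptions: the function name `floyd_warshall` and the 4-vertex weight matrix `W` are hypothetical and do not come from the slides; missing edges are represented as infinity.

```python
INF = float("inf")


def floyd_warshall(w):
    """Return the matrix of shortest-path weights for weight matrix w."""
    n = len(w)
    d = [row[:] for row in w]  # start from direct edge weights
    for k in range(n):
        for i in range(n):
            for j in range(n):
                # Keep the shorter of the current path and the path through k.
                through_k = d[i][k] + d[k][j]
                if through_k < d[i][j]:
                    d[i][j] = through_k
    return d


# Hypothetical example graph (not from the slides); INF means "no edge".
W = [
    [0, 3, INF, 7],
    [8, 0, 2, INF],
    [5, INF, 0, 1],
    [2, INF, INF, 0],
]
D = floyd_warshall(W)
for row in D:
    print(row)
```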
Uses for Transitive Closure and All-Pairs
Floyd-Warshall Algorithm
[Figure: 8 × 8 adjacency matrix of the example graph on vertices 1-8, updated pass by pass]
Pass 1: Finds all connections that go through vertex 1
Pass 6: Finds all connections that go through vertex 6
Pass 8: Finds all connections that go through vertex 8
Running Time = $O(V^3)$
Parallel Floyd-Warshall
There’s a shortcoming to this algorithm, though…
Each Processing Element needs global access to memory
This can be an issue for GPUs
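To make the memory-access pattern concrete, here is a small Python sketch of one parallel pass. It is an illustration only, with assumed names (`pass_k_snapshot`, `closure_by_passes`). Within a pass, every cell (i, j) can be computed independently, but each one reads a cell from column k and a cell from row k, which may lie anywhere in the matrix.

```python
def pass_k_snapshot(w, k):
    """Compute pass k from a read-only snapshot, one independent task per cell."""
    n = len(w)
    # Cell (i, j) reads w[i][k] (column k) and w[k][j] (row k), so every
    # worker needs access to row k and column k of the whole matrix.
    return [[w[i][j] | (w[i][k] & w[k][j]) for j in range(n)] for i in range(n)]


def closure_by_passes(adj):
    """Transitive closure as a sequence of n fully parallel passes."""
    w = [row[:] for row in adj]
    for k in range(len(w)):
        w = pass_k_snapshot(w, k)
    return w


A = [
    [0, 0, 1, 0],
    [1, 0, 0, 1],
    [0, 0, 0, 0],
    [0, 1, 0, 0],
]
result = closure_by_passes(A)
```

The list comprehension stands in for the parallel workers; the passes themselves still have to run one after another.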
The Question
How do we calculate the transitive closure on the GPU to:
1. Take advantage of shared memory
2. Accommodate data sizes that do not fit in memory
Can we perform partial processing of the data?
Block Processing of Floyd-Warshall
[Figure: a GPU containing several multi-core processors, each with its own shared memory, next to the data matrix]
Organizational structure for block processing?
Block Processing of Floyd-Warshall
[Figure: 8 × 8 data matrix, rows and columns labeled 1-8]
Block Processing of Floyd-Warshall
[Figure: 4 × 4 block of the data matrix, rows and columns labeled 1-4]
N = 4
Block Processing of Floyd-Warshall
[Figure: 8 × 8 data matrix, rows and columns labeled 1-8]
W[i,j] = W[i,j] | (W[i,k] && W[k,j])
K = 1
[i,j] -> [i,k] & [k,j]
(5,1) -> (5,1) & (1,1)
(8,1) -> (8,1) & (1,1)
(5,4) -> (5,1) & (1,4)
(8,4) -> (8,1) & (1,4)
K = 4
[i,j] -> [i,k] & [k,j]
(5,1) -> (5,4) & (4,1)
(8,1) -> (8,4) & (4,1)
(5,4) -> (5,4) & (4,4)
(8,4) -> (8,4) & (4,4)
For each pass k, the cells retrieved, W[i,k] and W[k,j], must already be processed to at least k-1
Block Processing of Floyd-Warshall
[Figure: 8 × 8 data matrix, rows and columns labeled 1-8]
W[i,j] = W[i,j] | (W[i,k] && W[k,j])
Putting it all Together
Processing K = [1-4]
Pass 1: i = [1-4], j = [1-4]
Pass 2: i = [5-8], j = [1-4] and i = [1-4], j = [5-8]
Pass 3: i = [5-8], j = [5-8]
Block Processing of Floyd-Warshall
[Figure: 4 × 4 block of the data matrix, rows and columns labeled 5-8]
N = 8
Computing k = [5-8]
Range: i = [5,8], j = [5,8], k = [5,8]
Block Processing of Floyd-Warshall
[Figure: 8 × 8 data matrix, rows and columns labeled 1-8]
W[i,j] = W[i,j] | (W[i,k] && W[k,j])
Putting it all Together
Processing K = [5-8]
Pass 1: i = [5-8], j = [5-8]
Pass 2: i = [5-8], j = [1-4] and i = [1-4], j = [5-8]
Pass 3: i = [1-4], j = [1-4]
Transitive Closure is complete for k = [1-8]
Increasing the Number of Blocks
Pass 1
Primary blocks are along the diagonal
Secondary blocks are the blocks in the same row and column as the primary block
Tertiary blocks are all remaining blocks
Pass 2 through Pass 8 use the same primary, secondary and tertiary block roles.
In Total:
• N Passes
• 3 sub-passes per pass
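The pass and sub-pass structure can be sketched sequentially in Python. This is a minimal CPU-side illustration of the blocked scheme, not the presenters' CUDA kernel: the names `update_block`, `blocked_closure` and `plain_closure`, the block size `b`, and the requirement that the matrix size be a multiple of `b` are all assumptions. The test graph uses the edge list from the 8-vertex transitive closure example.

```python
def update_block(w, bi, bj, bk, b):
    """Apply passes k in diagonal block bk to the cells of block (bi, bj), in place."""
    for k in range(bk * b, (bk + 1) * b):
        for i in range(bi * b, (bi + 1) * b):
            for j in range(bj * b, (bj + 1) * b):
                w[i][j] = w[i][j] | (w[i][k] & w[k][j])


def blocked_closure(adj, b):
    """Transitive closure using b x b blocks; len(adj) must be a multiple of b."""
    n = len(adj)
    assert n % b == 0, "this sketch assumes the matrix divides evenly into blocks"
    nb = n // b
    w = [row[:] for row in adj]
    for p in range(nb):  # one pass per diagonal block
        # Sub-pass 1: the primary (diagonal) block depends only on itself.
        update_block(w, p, p, p, b)
        # Sub-pass 2: secondary blocks share a row or column with the primary.
        for q in range(nb):
            if q != p:
                update_block(w, p, q, p, b)
                update_block(w, q, p, p, b)
        # Sub-pass 3: tertiary blocks read only finished secondary blocks.
        for bi in range(nb):
            for bj in range(nb):
                if bi != p and bj != p:
                    update_block(w, bi, bj, p, b)
    return w


def plain_closure(adj):
    """Unblocked reference version used to check the blocked result."""
    w = [row[:] for row in adj]
    n = len(w)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                w[i][j] = w[i][j] | (w[i][k] & w[k][j])
    return w


# Edge list from the 8-vertex example, given 1-indexed and stored 0-indexed.
EDGES = [(1, 5), (2, 1), (4, 2), (4, 3), (6, 3), (8, 6)]
A8 = [[0] * 8 for _ in range(8)]
for u, v in EDGES:
    A8[u - 1][v - 1] = 1

print(blocked_closure(A8, 4) == plain_closure(A8))
```

Within a sub-pass the blocks are independent of one another, which is what lets each block be handed to a separate multiprocessor and staged in its shared memory.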
Running it on the GPU
• Using CUDA
• Written by NVIDIA to access GPU as a parallel processor
• Do not need to use graphics API
• Memory Indexing
• CUDA Provides
• Grid Dimension
• Block Dimension
• Block Id
• Thread Id
[Figure: a grid of blocks and the threads inside one block, labeled with Grid Dimension, Block Dimension, Block Id and Thread Id]
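As a rough illustration of memory indexing, the Python sketch below emulates how a thread could derive its matrix cell from the four values listed above. In CUDA these correspond to the built-in variables `gridDim`, `blockDim`, `blockIdx` and `threadIdx`; the helper name `thread_cell` and the 2 × 2 grid of 4 × 4 blocks are assumptions chosen to cover an 8 × 8 matrix.

```python
from itertools import product


def thread_cell(block_id, thread_id, block_dim):
    """Map (block id, thread id) to the (row, col) of the matrix cell it owns."""
    bx, by = block_id
    tx, ty = thread_id
    dx, dy = block_dim
    row = by * dy + ty  # same arithmetic as blockIdx.y * blockDim.y + threadIdx.y
    col = bx * dx + tx  # same arithmetic as blockIdx.x * blockDim.x + threadIdx.x
    return row, col


grid_dim = (2, 2)   # assumed: 2 x 2 blocks
block_dim = (4, 4)  # assumed: 4 x 4 threads per block

# Enumerate every thread of every block, as the hardware would launch them.
cells = [
    thread_cell((bx, by), (tx, ty), block_dim)
    for bx, by in product(range(grid_dim[0]), range(grid_dim[1]))
    for tx, ty in product(range(block_dim[0]), range(block_dim[1]))
]
print(len(cells), thread_cell((1, 0), (2, 3), block_dim))
```

Each of the 64 threads lands on a distinct cell, so the grid covers the 8 × 8 matrix exactly once.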
Partial Memory Indexing
[Figure: index ranges from 0 to N - 1 for the three sub-passes SP1, SP2 and SP3]
Memory Format for All-Pairs Solution
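All-Pairs requires twice the memory footprint of Transitive Closure: each cell holds a Connecting Node and a Distance, so the N × N matrix becomes 2N × N
[Figure: 8 × 8 matrix for the example graph on vertices 1-8 in which each cell stores a connecting node and a distance, with one Shortest Path highlighted on the graph]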
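One way to read the connecting-node and distance layout is sketched below in Python. This is an assumption-laden illustration, not the presenters' memory format: it keeps the two values in separate matrices (`dist` and `via`) rather than interleaving them, treats the connecting node as an intermediate vertex on the path, and uses a hypothetical 4-vertex chain with unit weights.

```python
INF = float("inf")


def apsp_with_nodes(w):
    """Return (dist, via): shortest distances and one connecting node per pair."""
    n = len(w)
    dist = [row[:] for row in w]
    via = [[None] * n for _ in range(n)]  # None means "direct edge or no path"
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if dist[i][k] + dist[k][j] < dist[i][j]:
                    dist[i][j] = dist[i][k] + dist[k][j]
                    via[i][j] = k  # remember the vertex the path goes through
    return dist, via


def path(dist, via, i, j):
    """Rebuild the vertex sequence from i to j, or None if j is unreachable."""
    if dist[i][j] == INF:
        return None
    if i == j:
        return [i]
    k = via[i][j]
    if k is None:
        return [i, j]  # direct edge
    # Join the two halves, dropping the duplicated connecting node.
    return path(dist, via, i, k)[:-1] + path(dist, via, k, j)


# Hypothetical chain 0 -> 1 -> 2 -> 3 with unit weights.
W = [
    [0, 1, INF, INF],
    [INF, 0, 1, INF],
    [INF, INF, 0, 1],
    [INF, INF, INF, 0],
]
dist, via = apsp_with_nodes(W)
print(dist[0][3], path(dist, via, 0, 3))
```

Storing one extra value per cell is what doubles the footprint relative to the 0/1 matrix used for transitive closure.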
Results
SM cache efficient GPU implementation compared to standard GPU implementation
Results
SM cache efficient GPU implementation compared to standard CPU implementation and cache-efficient CPU implementation
Results
SM cache efficient GPU implementation compared to best variant of Han et al.’s tuned code
Conclusion
•Advantages of Algorithm
• Relatively Easy to Implement
• Cheap Hardware
• Much Faster than standard CPU version
• Can work for any data size
Special thanks to NVIDIA for supporting our research
Backup
CUDA
•Compute Unified Device Architecture
•Extension of C
•Automatically creates thousands of threads to run on a graphics card
•Used to create non-graphical applications
•Pros:
• Allows user to design algorithms that will run in parallel
• Easy to learn, extension of C
• Has CPU version, implemented by kicking off threads
•Cons:
• Low level, C like language
• Requires understanding of GPU architecture to fully exploit
[Figure: CUDA compilation flow. Labels: Integrated source (foo.cu); cudacc (EDG C/C++ frontend, Open64 Global Optimizer); GPU Assembly (foo.s); OCG; G80 SASS (foo.sass); CPU Host Code (foo.cpp); gcc / cl]