difficult problems in high-performance computation on large graphs

22
Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 1 Difficult Problems in High-Performance Computation on Large Graphs John R. Gilbert University of California at Santa Barbara DOE / DOD Workshop on Emerging High Performance Architectures and Applications November 29, 2007

Upload: others

Post on 03-Feb-2022

0 views

Category:

Documents


0 download

TRANSCRIPT

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 1

Difficult Problems in High-Performance Computation on Large Graphs

John R. GilbertUniversity of California at Santa Barbara

DOE / DOD Workshop on Emerging High Performance Architectures and Applications

November 29, 2007

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 2

Graphs

• Very large graphs appear in many HPC applications.

• Indeed, large graph applications are rapidly becoming more and more common.

• Computational biology, informatics and analytics, web search, network theory, dynamical systems, sparse matrix computation, geometric modeling, ….

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 3

Graph view and matrix view

1 52 3 4 6 71

5

234

67

1 2

6

3 4

7

5

1

5

2

3

4

1

5

2

3

4

7

6

7

6

Adjacency matrix

Bipartite graphDirected

graph

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 4

Kernel: Sort permuted triangular matrix

• Used in sparse linear solvers (e.g. Matlab’s)

• Simple kernel abstracts many other graph operations (see next)

• Sequential: linear time; greedy topological sort; no locality

• Parallel: very unbalanced; one DAG level per step; possible long sequential dependencies

Original matrix Permuted to upper triangular form

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 5

Graph k-core

• Delete all vertices of degree less than k

• Repeat until no such vertices remain

• Used (originally in biological applications) to find “essential” or “strongly related” subgraphs of a graph

• Triangular matrix algorithm is 2-core of a bipartite graph

• k-core has similar issues in parallel

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 6

Matching in bipartite graph

• Perfect matching: set of edges that hits each vertex exactly once

• Matrix permutation to put nonzeros on diagonal

• Variant: Maximum-weight matching

1 52 3 41

5

234

A

1

5

2

3

4

1

5

2

3

4

1 52 3 44

2

531

PA

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 7

1 52 4 7 3 61

5

247

36

Strongly connected components

• Symmetric permutation to block triangular form

• Diagonal blocks are strong Hall (irreducible / strongly connected)

• Sequential: linear time by depth-first search [Tarjan]

• Parallel: divide & conquer algorithm, performance depends on input

[Fleischer, Hendrickson, Pinar]

1 2

3

4 7

6

5

PAPT G(A)

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 8

Strongly connected components

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 9

Dulmage-Mendelsohn decomposition

1

5

234

678

12

91011

1 52 3 4 6 7 8 9 10 111

2

5

3

4

7

6

10

8

9

12

11

1

2

3

5

4

7

6

9

8

11

10

HR

SR

VR

HC

SC

VC

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 10

Applications of D-M decomposition

• Permutation to block triangular form for Ax=b

• Connected components of undirected graphs

• Strongly connected components of directed graphs

• Minimum-size vertex cover of bipartite graphs

• Extracting vertex separators from edge cuts for arbitrary graphs

• For strong Hall matrices, several upper bounds in nonzero structure prediction are best possible

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 11

Graph partitioning

• Graph partitioning heuristics have been studied for many years, often motivated by partitioning for parallel computation.

• Best results (and best theory) for graphs from PDE problems.

• Some approaches:– Iterative swapping (Kernighan-Lin, Fiduccia-Matheysses)– Spectral partitioning (eigenvectors of graph Laplacian)– Geometric partitioning (for meshes with coordinates)– Breadth-first search (fast but poor performance)

• Modern codes (Metis, Chaco) use multilevel iterative swapping.

• Parallel versions exist (e.g. ParMetis) but don’t work as well.

• Partitioning for non-PDE problems is poorly understood in general.

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 12

Multilevel partitioning sketch

(N+,N- ) = Multilevel_Partition( N, E )… recursive partitioning routine returns N+ and N- where N = N+ U N-if |N| is small

(1) Partition G = (N,E) directly to get N = N+ U N-Return (N+, N- )

else(2) Coarsen G to get an approximation Gc = (Nc, Ec)(3) (Nc+ , Nc- ) = Multilevel_Partition( Nc, Ec )(4) Expand (Nc+ , Nc- ) to a partition (N+ , N- ) of N(5) Improve the partition ( N+ , N- )

Return ( N+ , N- )endif

“V - cycle:”(2,3)

(2,3)

(2,3)

(1)

(4)

(4)

(4)

(5)

(5)

(5)

Slide courtesy of Kathy Yelick

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 13

+

Symbolic factorization for LU

• Add fill edge a -> b if there is a path from a to b through lower-numbered vertices.

1 2

3

4 7

6

5

A G (A) L+U

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 14

Some other key graph kernels

• Graph contraction

• Connected components

• s-t connectivity

• Shortest paths

• Subgraph isomorphism

Many studied by Berry, Hendrickson, and others on MTA architecture

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 15

Sparse Matrix times Sparse Matrix

• A primitive in many array-based graph algorithms:– Parallel breadth-first search

– Shortest paths

– Graph contraction

– Subgraph / submatrix indexing

– Etc.

• Graphs are often not mesh-like, i.e. geometric locality and good separators.

• Do not want to optimize for one repeated operation, as in matvec for iterative methods

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 16

* =I

J

A(I,K)

K

K

B(K,J)

C(I,J)

ParSpGEMM

C(I,J) += A(I,K)*B(K,J) • Based on SUMMA

• Simple for non-square matrices, etc.

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 17

Toolbox for Graph Analysis and Pattern Discovery

Layer 1: Graph Theoretic Tools

• Graph operations

• Global structure of graphs

• Graph partitioning and clustering

• Graph generators

• Visualization and graphics

• Scan and combining operations

• Utilities

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 18

Application stack for combinatorics + numerics

Distributed Sparse ArraysArithmetic, matrix multiplication, indexing, solvers (\, eigs)

Graph Analysis & PD Toolbox

Graph querying & manipulation, connectivity, spanning trees,

geometric partitioning, nested dissection, NNMF, . . .

Preconditioned Iterative Methods

CG, BiCGStab, etc. + combinatorial preconditioners (AMG, Vaidya)

ApplicationsComputational ecology, CFD, data exploration

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 19

Landscape connectivity modeling

• Habitat quality, gene flow, corridor identification, conservation planning

• Pumas in southern California: 12 million nodes, < 1 hour

• Targeting larger problems: Yellowstone-to-Yukon corridor

Figures courtesy of Brad McRae, NCEAS

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 20

Issues in (many) large graph applications

• Multiple simultaneous queries to same graph– Graph may be fixed, or slowly changing– Throughput and response time both important

• Dynamic subsetting– User needs to solve problem on “their own” version

of the main graph– E.g. landscape data masked by geographic location,

filtered by obstruction type, resolved by species of interest

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 21

Productivity

Raw performance isn’t always the only criterion. Other factors include:

• Seamless scaling from desktop to HPC

• Interactive response for data exploration and viz

• Rapid prototyping

• Just plain programmability

Emerging HP Architectures & Applications – 29 Nov 2007 -- Gilbert -- 22

Some approaches to HPC graph programming

• Visitor-based multithreaded – MTGL + XMT + search templates natural for many algorithms+ relatively simple load balancing– complex thread interactions, race conditions

• Array-based data parallel – GAPDT + parallel Matlab+ relatively simple control structure+ user-friendly interface– some algorithms hard to express naturally– load balancing not as simple

• Scan-based vectorized – NESL: something of a wild card

• We don’t really know the right set of primitives yet!