Sparse Matrices and Combinatorial Algorithms in Star-P
Status and Plans
April 8, 2005
Matlab’s sparse matrix design principles
All operations should give the same results for sparse and full matrices (almost all)
Sparse matrices are never created automatically, but once created they propagate
Performance is important -- but usability, simplicity, completeness, and robustness are more important
Storage for a sparse matrix should be O(nonzeros)
Time for a sparse operation should be O(flops) (as nearly as possible)
Matlab’s sparse matrix design principles
Star-P dsparse matrices: same principles, some different tradeoffs
Sparse matrix operations
dsparse layout, same semantics as ddense
For now, only row distribution
Matrix operators: +, -, max, etc.
Matrix indexing and concatenation: A(1:3, [4 5 2]) = [B(:, 7) C];
A \ b by direct methods
Star-P sparse data structure
• Full:
• 2-dimensional array of real or complex numbers
• (nrows*ncols) memory

    31  0 53
     0 59  0
    41 26  0

• Sparse:
• compressed row storage
• about (2*nzs + nrows) memory

    values:          31 53 59 41 26
    column indices:   1  3  2  1  2
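As an aside, the same compressed-row layout can be inspected directly in SciPy, whose csr_matrix follows this design; a minimal sketch using the 3x3 example above (SciPy indices are 0-based):

```python
import numpy as np
from scipy.sparse import csr_matrix

# The 3x3 example matrix from the slide.
A = np.array([[31,  0, 53],
              [ 0, 59,  0],
              [41, 26,  0]])
S = csr_matrix(A)

print(S.data)     # nonzero values, row by row: [31 53 59 41 26]
print(S.indices)  # 0-based column indices:     [0 2 1 0 1]
print(S.indptr)   # row pointers:               [0 2 3 5]
# Storage is just these three arrays: about 2*nnz + nrows words.
```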
Star-P distributed sparse data structure

[Figure: the matrix's rows are partitioned into contiguous blocks, one per processor P0, P1, P2, ..., Pn]

Each processor stores:
• # of local nonzeros
• range of local rows
• nonzeros in CSR form
Sorting in Star-P
[V, perm] = sort (V)
Useful primitive for some sparse array algorithms, e.g. the sparse constructor
Star-P uses a parallel sample sort
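A sample sort picks splitters from a random sample of the data, routes each element to a per-processor bucket by splitter range, and sorts each bucket locally. A minimal serial simulation in Python (illustrative only; `sample_sort` and its parameters are not Star-P names):

```python
import random

def sample_sort(values, p=4):
    """Sort `values` as p 'processors' would: sample, pick splitters,
    partition into p buckets, then sort each bucket locally."""
    # 1. Sample the data and choose p-1 splitters from the sorted sample.
    sample = sorted(random.sample(values, min(len(values), 8 * p)))
    splitters = [sample[i * len(sample) // p] for i in range(1, p)]
    # 2. Route each element to the bucket ("processor") it belongs to.
    buckets = [[] for _ in range(p)]
    for v in values:
        b = sum(v >= s for s in splitters)   # number of splitters v passes
        buckets[b].append(v)
    # 3. Each processor sorts its bucket; concatenation is globally sorted.
    return [v for bucket in buckets for v in sorted(bucket)]

data = [random.randrange(1000) for _ in range(200)]
assert sample_sort(data) == sorted(data)
```

In the parallel setting step 2 is the only all-to-all communication; the splitters guarantee each bucket holds a contiguous range of the sorted output.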
The sparse( ) constructor
A = sparse (I, J, V, nr, nc)
Input: ddense vectors I, J, V, and dimensions nr, nc
A(I(k), J(k)) = V(k) for each k
Sort triples (i, j, v) by (i, j)
Adjust for desired row distribution
Locally convert to compressed row indices
Sum values with duplicate indices
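The steps above can be sketched serially in Python (pure-Python and 0-based; the row-distribution step is omitted, and the function name is illustrative):

```python
def sparse_from_triples(I, J, V, nr, nc):
    """Build CSR arrays (data, indices, indptr) from (i, j, v) triples."""
    triples = sorted(zip(I, J, V))            # sort triples by (i, j)
    data, indices = [], []
    counts = [0] * nr                         # nonzeros per row
    last = None
    for (i, j, v) in triples:
        if (i, j) == last:
            data[-1] += v                     # sum values with duplicate indices
        else:
            data.append(v)
            indices.append(j)
            counts[i] += 1
            last = (i, j)
    indptr = [0]                              # convert to compressed row pointers
    for c in counts:
        indptr.append(indptr[-1] + c)
    return data, indices, indptr

# The duplicate entry at (2, 1) is summed: 13 + 13 = 26.
d, ix, p = sparse_from_triples([0, 0, 1, 2, 2, 2], [0, 2, 1, 0, 1, 1],
                               [31, 53, 59, 41, 13, 13], 3, 3)
assert (d, ix, p) == ([31, 53, 59, 41, 26], [0, 2, 1, 0, 1], [0, 2, 3, 5])
```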
Sparse matrix times dense vector
y = A * x
The first call to matvec caches a communication schedule for matrix A. Later calls to multiply any vector by A use the cached schedule.
Communication and computation overlap.
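A serial simulation of the schedule caching (an illustrative class, not Star-P internals): the first multiply records, per row block, which entries of x live on other processors; later multiplies reuse that list.

```python
import numpy as np
from scipy.sparse import random as sprandom

class DistMatvec:
    def __init__(self, row_blocks, x_owner):
        self.blocks = row_blocks      # per-processor row blocks of A (CSR)
        self.x_owner = x_owner        # x_owner[j] = processor that owns x[j]
        self.schedule = None          # communication schedule, cached lazily

    def matvec(self, x):
        if self.schedule is None:
            # First call: record which remote x-entries each block touches.
            self.schedule = [
                sorted({int(j) for j in blk.indices if self.x_owner[j] != p})
                for p, blk in enumerate(self.blocks)
            ]
        # In MPI these index lists drive the sends/receives; serially we
        # just compute each block's local product and stack the results.
        return np.concatenate([blk @ x for blk in self.blocks])

A = sprandom(8, 8, density=0.4, format="csr", random_state=0)
x = np.arange(8, dtype=float)
dm = DistMatvec([A[:4], A[4:]], x_owner=[0] * 4 + [1] * 4)
assert np.allclose(dm.matvec(x), A @ x)   # first call builds the schedule
assert np.allclose(dm.matvec(x), A @ x)   # later calls reuse it
```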
Sparse linear systems
x = A \ b
Matrix division uses MPI-based direct solvers:
• SuperLU_dist: nonsymmetric, static pivoting
• MUMPS: nonsymmetric multifrontal
• PSPASES: Cholesky
ppsetoption('SparseDirectSolver', 'SUPERLU')
Iterative solvers implemented in Star-P
Ongoing work on preconditioners
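For reference, the serial SciPy analogue of x = A \ b on a sparse system is spsolve, which is backed by SuperLU (the serial cousin of the SuperLU_dist solver named above); a minimal sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.linalg import spsolve

A = csr_matrix(np.array([[4.0, 1.0, 0.0],
                         [1.0, 3.0, 1.0],
                         [0.0, 1.0, 2.0]]))
b = np.array([1.0, 2.0, 3.0])
x = spsolve(A, b)               # sparse direct solve via SuperLU
assert np.allclose(A @ x, b)    # residual check
```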
Sparse eigenvalue solvers
[V, D] = eigs(A); etc.
Implemented via MPI-based PARPACK [Lehoucq, Maschhoff, Sorensen, Yang]
Work in progress
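SciPy's eigsh/eigs wrap ARPACK, the serial library that PARPACK parallelizes, so a serial sketch of the eigs call looks like this (the 1-D Laplacian test matrix is chosen here because its eigenvalues are known in closed form):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import eigsh

# 1-D Laplacian; its eigenvalues are 2 - 2*cos(k*pi/(n+1)), k = 1..n.
n = 50
L = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
vals, vecs = eigsh(L, k=3, sigma=0, which="LM")   # 3 eigenvalues nearest 0
exact = 2 - 2 * np.cos(np.arange(1, 4) * np.pi / (n + 1))
assert np.allclose(np.sort(vals), exact)
```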
Combinatorial Scientific Computing
Sparse matrix methods; machine learning; search and information retrieval; adaptive multilevel modeling; geometric meshing; computational biology; . . .
How will combinatorial methods be used by people who don’t understand them in complete detail?
Analogy: Matrix division in Matlab
x = A \ b;
Works for either full or sparse A
Is A square? no => use QR to solve the least squares problem
Is A triangular or permuted triangular? yes => sparse triangular solve
Is A symmetric with positive diagonal elements? yes => attempt Cholesky after symmetric minimum degree
Otherwise => use LU on A(:, colamd(A))
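The decision tree above can be sketched as a dispatch function; a toy dense Python analogue (the minimum-degree and colamd reorderings are omitted, and the predicates are simplified):

```python
import numpy as np
import scipy.linalg as la

def mldivide(A, b):
    """Toy dense analogue of Matlab's x = A \\ b polyalgorithm."""
    m, n = A.shape
    if m != n:                                   # not square: least squares via QR
        return np.linalg.lstsq(A, b, rcond=None)[0]
    upper = np.allclose(A, np.triu(A))
    lower = np.allclose(A, np.tril(A))
    if upper or lower:                           # triangular: substitution
        return la.solve_triangular(A, b, lower=lower)
    if np.allclose(A, A.T) and np.all(np.diag(A) > 0):
        try:                                     # attempt Cholesky
            return la.cho_solve(la.cho_factor(A), b)
        except np.linalg.LinAlgError:
            pass                                 # not positive definite: fall through
    return la.lu_solve(la.lu_factor(A), b)       # general LU with pivoting

b = np.array([1.0, 2.0])
spd = np.array([[4.0, 1.0], [1.0, 3.0]])         # Cholesky branch
tri = np.array([[2.0, 1.0], [0.0, 3.0]])         # triangular branch
gen = np.array([[0.0, 2.0], [1.0, 1.0]])         # LU branch
for M in (spd, tri, gen):
    assert np.allclose(M @ mldivide(M, b), b)
```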
Combinatorics in Star-P
A sparse matrix language is a good start on primitives for combinatorial computing.
Random-access indexing: A(i,j)
Neighbor sequencing: find (A(i,:))
Sparse table construction: sparse (I, J, V)
Other primitives we are building: sorting, searching, pointer-jumping, connectivity and paths, . . .
Matching and depth-first search in Matlab
dmperm: Dulmage-Mendelsohn decomposition
Square, full rank A: [p, q, r] = dmperm(A)
• A(p,q) is block upper triangular with nonzero diagonal
• also: strongly connected components of a directed graph
• also: connected components of an undirected graph

Arbitrary A: [p, q, r, s] = dmperm(A)
• maximum-size matching in a bipartite graph
• minimum-size vertex cover in a bipartite graph
• decomposition into strong Hall blocks
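There is no SciPy dmperm, but two of the by-products listed above are available directly in scipy.sparse.csgraph; a minimal sketch:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import maximum_bipartite_matching, connected_components

# Bipartite graph as a sparse matrix: rows = one side, columns = other.
B = csr_matrix(np.array([[1, 1, 0],
                         [0, 1, 0],
                         [0, 0, 1]]))
match = maximum_bipartite_matching(B, perm_type="column")
assert (match >= 0).sum() == 3        # a perfect matching exists

# Strongly connected components of a directed graph (0 <-> 1, 2 isolated).
G = csr_matrix(np.array([[0, 1, 0],
                         [1, 0, 0],
                         [0, 0, 0]]))
ncomp, labels = connected_components(G, directed=True, connection="strong")
assert ncomp == 2
```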
Connected components (work in progress)
Sequential Matlab uses depth-first search (dmperm), which doesn’t parallelize well
Shiloach-Vishkin algorithm:
  repeat
    Link every (super)vertex to a random neighbor
    Shrink each tree to a supervertex by pointer jumping
  until no further change

Hybrid SV / search method under construction

Other potential graph kernels:
• Breadth-first search (after Husbands, LBNL)
• Bipartite matching (after Riedy, UCB)
• Strongly connected components (after Pinar, LBNL)
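The Shiloach-Vishkin repeat/until loop can be simulated serially; a minimal sketch (deterministic smallest-root links stand in for the random neighbor choice, and the function name is illustrative):

```python
def sv_components(n, edges):
    """Connected-component labels for an undirected graph on n vertices."""
    parent = list(range(n))
    changed = True
    while changed:
        changed = False
        # Link: hook the larger of two neighboring roots onto the smaller.
        for (u, v) in edges:
            ru, rv = parent[u], parent[v]
            if parent[ru] != parent[rv]:
                hi, lo = max(parent[ru], parent[rv]), min(parent[ru], parent[rv])
                parent[hi] = lo
                changed = True
        # Shrink: pointer jumping flattens each tree to its root.
        for v in range(n):
            while parent[v] != parent[parent[v]]:
                parent[v] = parent[parent[v]]
    return parent

labels = sv_components(6, [(0, 1), (1, 2), (3, 4)])
assert labels == [0, 0, 0, 3, 3, 5]   # components {0,1,2}, {3,4}, {5}
```

The parallel appeal is that both the link step and the pointer-jumping step are data-parallel over edges and vertices, which is what a depth-first-search formulation lacks.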
MIT Lincoln Laboratory
SSCA #2 -- Overview
SSCA #2: Directed weighted multigraph with no self-loops
• Scalable data generator
• Kernel 1 — Graph Construction
• Kernel 2 — Classify Large Sets (Search)
• Kernel 3 — Subgraph Extraction
• Kernel 4 — Graph Clustering / Partitioning

• Randomly sized "cliques"
• Most intra-clique edges present
• Random inter-clique edges, gradually thinning with distance
• Integer and character string edge labels
• Randomized vertex numbers
HPCS Benchmark Spectrum: SSCA #2

[Figure: chart placing SSCA #2 in the HPCS benchmark spectrum. HPCchallenge micro & kernel benchmarks (local: DGEMM, STREAM, RandomAccess, 1D FFT; global: Linpack, PTRANS, RandomAccess, 1D FFT) give execution performance bounds. The Scalable Synthetic Compact Applications (1. Optimal Pattern Matching; 2. Graph Analysis; 3. Simulation, NWCHEM; 4. Simulation, NAS PB AU; 5. Simulation, Multi-Physics; 6. Signal Processing / Knowledge Formation) form an HPCS spanning set of kernels covering discrete math, graph analysis, linear solvers, signal processing, simulation, and I/O, and give execution performance indicators. Mission partner application benchmarks (current: UM2000, GAMESS, OVERFLOW, LBMHD, RFCTH, HYCOM; near-future: NWChem, ALEGRA, CCSM) give execution and development performance indicators for existing, emerging, and future applications. SSCA #2 itself comprises a data generator and four kernels: graph construction, sort large sets, graph extraction, graph clustering.]
Scalable Data Generator
• Produce edge tuples containing the start vertex, end vertex, and weight for each directed edge
• Hierarchical nature, with random-sized cliques connected by edges between some of the cliques
• Interclique edges are assigned using a random distribution that represents a hierarchical thinning as vertices are further apart
• Weights are either random integer values or randomly selected character strings
• Random permutations to avoid induced locality:
  vertex numbers — local processor memory locality
  parallel — global memory locality
• Data sets must be developed in their entirety before graph construction
• May be parallelized — data tuple vertex naming schemes must be globally consistent
• Data generator will not be timed
• Statistics will be saved to be used in later verification stages
Kernel 1 — Graph Construction
Construct the multigraph in a format usable by all subsequent kernels
No subsequent modifications will be permitted to benefit specific kernels
Graph construction will be timed
Kernel 2 — Sort on Selected Edge Weights
Sort on edge weights to determine the vertex pairs:
• with the largest integer weights
• with a desired string weight

The two vertex pair lists will be saved for use in the subsequent kernel:
• start set SI for integer weights
• start set SC for character string weights

Sorting weights will be timed
Kernel 3 — Extract Subgraphs
Extract a series of subgraphs formed by following paths of length k from a start set of initial vertices
The set of initial vertices will be determined from the previous kernel
Produce a series of subgraphs consisting of vertices and edges on paths of length k from the vertex pairs start sets SI and SC
A possible computational kernel for graph extraction is Breadth First Search
Extracting graphs will be timed
Kernel 4 — Partition Graph Using a Clustering Algorithm
Apply a graph clustering algorithm to partition the vertices of the graph into subgraphs no larger than a maximum size, so as to minimize the number of edges that need be "cut", e.g.:
• Kernighan and Lin
• Sangiovanni-Vincentelli, Chen, and Chua
The output of this kernel will be a description of the clusters and the edges that form the interconnecting “network”
This step should identify the structure in the graph specified in the data generation phase
It is important that the implementation of this kernel does not utilize a priori knowledge of the details in the data generator or the statistics collected in the graph generation process
Identifying the clusters and the interconnecting “network” will be timed
Concise SSCA#2
A serial Matlab implementation of SSCA#2
Coding style is simple and “data parallel”
Our starting point for Star-P implementation
Edge labels are only integers, not strings yet
Data generation
Using the Lincoln Labs serial Matlab generator at present
Data-parallel Star-P generator under construction
Generate “edge triples” as distributed dense vectors
Kernel 1: Graph construction
cSSCA#2 K1: ~30 lines of executable serial Matlab
Multigraph is a cell array of simple directed graphs
Graphs are dsparse matrices, created by sparse( )
Kernel 2: Search in large sets
cSSCA#2 K2: ~12 lines of executable serial Matlab
Uses max( ) and find( ) to locate edges of max weight
ng = length(G.edgeWeights);   % number of graphs
maxwt = 0;
for g = 1:ng
    maxwt = max(maxwt, max(max(G.edgeWeights{g})));
end
intedges = zeros(0, 2);
for g = 1:ng
    [intstart, intend] = find(G.edgeWeights{g} == maxwt);
    intedges = [intedges ; intstart intend];
end
Kernel 3: Subgraph extraction
cSSCA#2 K3: ~25 lines of executable serial Matlab
Returns subgraphs consisting of vertices and edges within specified distance of specified starting vertices
Sparse matrix-vector product for breadth-first search
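The idea is that the frontier at each level is an indicator vector, and one sparse matvec with the adjacency transpose advances it one level; a serial SciPy sketch (the function name is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix

def bfs_levels(A, start, k):
    """Vertices within distance k of `start` in directed graph A (A[i,j]=1 for edge i->j)."""
    n = A.shape[0]
    frontier = np.zeros(n)
    frontier[start] = 1.0
    reached = frontier.copy()
    for _ in range(k):
        frontier = A.T @ frontier                    # out-neighbors of the frontier
        frontier = ((frontier > 0) & (reached == 0)).astype(float)  # drop seen vertices
        reached += frontier
    return np.flatnonzero(reached)

# Path graph 0 -> 1 -> 2 -> 3.
A = csr_matrix(np.array([[0, 1, 0, 0],
                         [0, 0, 1, 0],
                         [0, 0, 0, 1],
                         [0, 0, 0, 0]]))
assert list(bfs_levels(A, 0, 2)) == [0, 1, 2]
```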
Kernel 4: Partitioning / clustering
cSSCA#2 K4: serial prototype exists, not yet parallel
Different algorithm from Lincoln Labs executable spec, which uses seed-growing breadth-first search
cSSCA#2 uses spectral methods for clustering, based on eigenvectors (Fiedler vectors) of the graph
One version seeks small balanced edge cuts (graph partitioning); another seeks clusters with low isoperimetric number (clustering)
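A minimal sketch of Fiedler-vector partitioning (dense eigh stands in here for the sparse eigensolver a real implementation would use, and the function name is illustrative):

```python
import numpy as np
from scipy.sparse import csr_matrix, diags

def fiedler_partition(A):
    """Two-way split of an undirected 0/1-adjacency graph by the Fiedler vector."""
    d = np.asarray(A.sum(axis=1)).ravel()
    L = (diags(d) - A).toarray()          # graph Laplacian (dense, for the sketch)
    vals, vecs = np.linalg.eigh(L)        # eigh returns ascending eigenvalues
    fiedler = vecs[:, 1]                  # eigenvector of 2nd-smallest eigenvalue
    return fiedler >= np.median(fiedler)  # True/False marks the two halves

# Two triangles joined by a single edge: the cut should separate them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
A = np.zeros((6, 6))
for u, v in edges:
    A[u, v] = A[v, u] = 1.0
side = fiedler_partition(csr_matrix(A))
assert set(np.flatnonzero(side)) in ({0, 1, 2}, {3, 4, 5})
```

Splitting at the median gives balanced halves (graph partitioning); thresholding at a sweep point that minimizes the isoperimetric ratio instead gives the clustering variant mentioned above.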