Challenges in Combinatorial Scientific Computing
John R. Gilbert
University of California, Santa Barbara
SIAM Annual Meeting
July 10, 2009
Support: DOE Office of Science, NSF, DARPA, SGI
Combinatorial Scientific Computing
“I observed that most of the coefficients in our matrices were zero; i.e., the nonzeros were ‘sparse’ in the matrix, and that typically the triangular matrices associated with the forward and back solution provided by Gaussian elimination would remain sparse if pivot elements were chosen with care”
- Harry Markowitz, describing the 1950s work on portfolio theory that won the 1990 Nobel Prize for Economics
2002 SIAM Annual Meeting
IC10: Mathematical Analysis of Hyperlinks in the World-Wide Web
IP3: Discrete Mathematics and Theoretical Computer Science
VNL: The Human Genome and Beyond
IC18: Finding Provably Near-Optimal Solutions to Discrete Optimization Problems
CP6: Computer Science: Discrete Algorithms
MS54: New Approaches for Scalable Sparse Linear System Solution
MS71: Special Session on IR Algorithms and Software
MS73: Reverse Engineering Gene Networks
MS86/110: Combinatorial Algorithms in Scientific Computing
2007 SIAM Annual Meeting
IC 7: Combinatorics Inspired by Biology
IC 12: Parallel Network Analysis
IC 14: On the Complexity of Game, Market, and Network Equilibria
MS 1/8: Adaptive Algebraic Multigrid Methods
MS 4/11/26: Mathematical Challenges in Cyber Security
MS 36/43/55: High Performance Computing on Massive Real-World Graphs
MS 47: Optimization and Graph Algorithms
MS 79: Enumerative and Geometric Combinatorics
MS 71: Modeling of Large-Scale Metabolic Networks
MS 75: Combinatorial Scientific Computing
An analogy?
As the “middleware” of scientific computing, linear algebra has supplied or enabled:
• Mathematical tools
• “Impedance match” to computer operations
• High-level primitives
• High-quality software libraries
• Ways to extract performance from computer architecture
• Interactive environments
[Diagram, three stacked layers: Continuous physical modeling; Linear algebra; Computers]
An analogy?
[Diagram, two stacks side by side. Left: Continuous physical modeling; Linear algebra; Computers. Right: Discrete structure analysis; Graph theory; Computers]
An analogy? Well, we’re not there yet…
[Diagram: Discrete structure analysis; Graph theory; Computers]
• Mathematical tools
• “Impedance match” to computer operations
• High-level primitives
• High-quality software libs
• Ways to extract performance from computer architecture
• Interactive environments
Five Challenges in Combinatorial Scientific Computing
Challenges I’m not going to talk about
• How do we know we’re solving the right problem?
– Harder in graph analytics than in numerical linear algebra
– I’ll pretend this is a math modeling question, not a CSC question
• How will applications keep us honest in the future?
– Tony Chan, 1991: T(A, B) = elapsed time before field A would notice the disappearance of field B
– $\min_A T(A, \mathrm{CSC})$ ?
#1: The Architecture & Algorithms Challenge
LANL / IBM Roadrunner: > 1 PFLOPS
Two Nvidia 8800 GPUs: > 1 TFLOPS
Intel 80-core chip: > 1 TFLOPS
Parallelism is no longer optional…
… in every part of a computation.
High-Performance Architecture
Most high-performance computer designs allocate resources to optimize Gaussian elimination on large, dense matrices.
Originally, because linear algebra is the middleware of scientific computing.
Nowadays, largely for bragging rights.
[Figure: $PA = LU$]
The Memory Wall Blues
Most of memory is hundreds or thousands of cycles away from the processor that wants it.
You can buy more bandwidth, but you can’t buy less latency. (Speed of light, for one thing.)
You can hide latency with either locality or (parallelism + bandwidth).
You can hide lack of bandwidth with locality, but not with parallelism.
Most emerging graph problems have lousy locality.
Thus the algorithms need even more parallelism!
Architectural impact on algorithms
Naïve 3-loop matrix multiply [Alpern et al., 1992]:
[Plot: log cycles/flop vs. log problem size, with fit $T = N^{4.7}$]
Naïve algorithm is $O(N^5)$ time under UMH model. BLAS-3 DGEMM and recursive blocked algorithms are $O(N^3)$.
Size 2000 took 5 days
12000 would take 1095 years
Slide from Larry Carter
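The contrast between the naïve triple loop and a blocked algorithm can be sketched as follows. This is an illustrative pure-Python version, not the code measured on the slide; the function names and the block size `bs` are assumptions chosen for the example, and in Python the point is the loop structure rather than the timing.

```python
def matmul_naive(A, B):
    """Textbook i-j-k triple loop; strides through B column-wise."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_blocked(A, B, bs=2):
    """Same arithmetic, reordered so each bs-by-bs tile is reused
    while it is (ideally) still in fast memory. bs is a hypothetical
    tuning parameter."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, bs):
        for kk in range(0, m, bs):
            for jj in range(0, p, bs):
                for i in range(ii, min(ii + bs, n)):
                    for k in range(kk, min(kk + bs, m)):
                        a = A[i][k]
                        for j in range(jj, min(jj + bs, p)):
                            C[i][j] += a * B[k][j]
    return C
```

Both functions perform the same multiply-adds; only the order of memory accesses differs, which is what a memory-hierarchy cost model penalizes or rewards.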
The parallel computing challenge
Efficient sequential algorithms for graph-theoretic problems often follow long chains of dependencies
Computation per edge traversal is often small
Little locality or opportunity for reuse
A big opportunity exists for architecture to influence combinatorial algorithms.
Maybe even vice versa.
Strongly connected components
• Symmetric permutation to block triangular form
• Diagonal blocks are strong Hall (irreducible / strongly connected)
• Sequential: linear time by depth-first search [Tarjan]
• Parallel: divide & conquer, work and span depend on input [Fleischer, Hendrickson, Pinar]
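[Figure: the permuted matrix $PAP^T$ in block triangular form, with rows and columns ordered 1, 5, 2, 4, 7, 3, 6, alongside the directed graph G(A) on vertices 1 to 7]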
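For the sequential case, a minimal sketch of Tarjan's depth-first-search algorithm is shown here. The adjacency-dictionary input format and the function name are assumptions made for the example. The recursive form is fine for small graphs; a large graph would need an explicit stack to stay within Python's recursion limit.

```python
def strongly_connected_components(adj):
    """Tarjan's algorithm. adj maps each vertex to an iterable of
    successors. Returns a list of components (lists of vertices) in
    reverse topological order of the condensation."""
    index = {}          # DFS discovery number of each vertex
    low = {}            # smallest index reachable via the DFS subtree
    stack = []          # vertices of components not yet completed
    on_stack = set()
    components = []
    counter = 0

    def visit(v):
        nonlocal counter
        index[v] = low[v] = counter
        counter += 1
        stack.append(v)
        on_stack.add(v)
        for w in adj.get(v, ()):
            if w not in index:
                visit(w)
                low[v] = min(low[v], low[w])
            elif w in on_stack:
                low[v] = min(low[v], index[w])
        if low[v] == index[v]:      # v is the root of a component
            comp = []
            while True:
                w = stack.pop()
                on_stack.discard(w)
                comp.append(w)
                if w == v:
                    break
            components.append(comp)

    for v in adj:
        if v not in index:
            visit(v)
    return components
```

Listing the vertices component by component gives a symmetric permutation under which the matrix is block triangular, with one diagonal block per strong component.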
Strong components of 1M-vertex RMAT graph
One architectural approach: Cray MTA / XMT
• Hide latency by massive multithreading
• Per-tick context switching
• Uniform (sort of) memory access time
• But the economic case for the machine is less than completely clear.
#2: The Data Size Challenge
“Can we understand anything interesting about our data when we do not even have time to read all of it?”
- Ronitt Rubinfeld, 2002
Approximating min spanning tree weight in sublinear time [Chazelle, Rubinfeld, Trevisan]
• Key subroutine: estimate number of connected components of a graph, in time depending on relative error but not on size of graph
• Idea: for each vertex v define f(v) = 1/(size of component containing v)
• Then $\sum_v f(v)$ = number of connected components
• Estimate $\sum_v f(v)$ by breadth-first search from a few vertices
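The sampling idea above can be sketched as below. This is a simplified illustration, not the published algorithm: the fixed search cap `cap`, the choice to count oversized components as contributing 0, and all function names are assumptions made for the example.

```python
import random
from collections import deque


def component_size_capped(adj, v, cap):
    """BFS from v; return the size of v's component, or None if the
    search would visit more than cap vertices."""
    seen = {v}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                if len(seen) > cap:
                    return None
                queue.append(w)
    return len(seen)


def estimate_components(adj, samples, cap, rng=random):
    """Estimate the number of connected components of an undirected
    graph as n times the sample mean of f(v) = 1/|component(v)|.
    Components larger than cap are treated as f(v) = 0, which biases
    the estimate downward by less than n/cap."""
    vertices = list(adj)
    total = 0.0
    for _ in range(samples):
        v = rng.choice(vertices)
        size = component_size_capped(adj, v, cap)
        if size is not None:
            total += 1.0 / size
    return len(vertices) * total / samples
```

The work per sample is bounded in terms of `cap` rather than the graph size, which is where the independence from the size of the graph comes from.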
Features of (many) large graph applications
• “Feasible” means O(n), or even less.
• You can’t scan all the data.
– you want to poke at it in various ways
– maybe interactively.
• Multiple simultaneous queries to the same graph
– maybe to differently filtered subgraphs
– graph may be fixed, or slowly changing
– throughput and response time are both important.
• Benchmark data sets are a big challenge!
• Think of data not as a finite object but as a statistical sample of an infinite process . . .
#3: The Uncertainty Challenge
“You could not step twice into the same river; for other waters are ever flowing on to you.”
- Heraclitus
Change and uncertainty
• Statistical perspectives; robustness to fluctuations; streaming, noisy data; dynamic structure; analysis of sensitivity and propagation of uncertainty
• K-betweenness centrality for robustness [Bader]
• Stochastic centrality for infection vulnerability analysis [Vullikanti]
• Dynamic community updating [Bhowmick]
• Detection theory [Kepner]: “You should never worry about implementation; you should worry about whether you can correctly model the background and foreground.”
Horizontal-vertical decomposition [Mezic et al.]
• Strongly connected components, ordered by levels of DAG
• Linear-time sequentially; no work/span efficient parallel algorithms known
• … but we don’t want an exact algorithm anyway; edges are probabilities, and the application is a kind of statistical model reduction …
[Figure: a 9-vertex directed graph with its strongly connected components arranged in levels 1 to 4, alongside the correspondingly permuted matrix]
Model reduction and graph decomposition
Approach:
1. Decompose networks
2. Propagate uncertainty through components
3. Iteratively aggregate component uncertainty
Spectral graph decomposition technique combined with dynamical systems analysis leads to deconstruction of a possibly unknown network into inputs, outputs, forward and feedback loops and allows identification of a minimal functional unit (MFU) of a system.
[Figure: H-V decomposition of a network (node 4 and several connections pruned, with no loss of performance). Labels: Input, initiator; Forward, production unit; Feedback loops; Output, execution; Additional functional requirements; Level of output for MFU; Level of output with feedback loops]
Trim the network, preserve dynamics!
Minimal functional units: sensitive edges (leading to lack of production) easily identifiable
Allows identification of roles of different feedback loops
Mezic group, UCSB
#4: The Primitives Challenge
• By analogy to numerical scientific computing. . .
• What should the combinatorial BLAS look like?
[Figure: Basic Linear Algebra Subroutines (BLAS): Speed (MFlops) vs. Matrix Size (n), with curves for C = A*B, y = A*x, and μ = $x^T y$]
Primitives should…
• Supply a common notation to express computations
• Have broad scope but fit into a concise framework
• Allow programming at the appropriate level of abstraction and granularity
• Scale seamlessly from desktop to supercomputer
• Hide architecture-specific details from users
Frameworks for graph primitives
Many possibilities; none completely satisfactory; relatively little work on common frameworks or interoperability.
• Visitor-based, distributed-memory: PBGL
• Visitor-based, multithreaded: MTGL
• Heterogeneous, tuned kernels: SNAP
• Sparse array-based: Matlab dialects, CBLAS
• Scan-based vectorized: NESL
• Map-reduce: lots of visibility, utility unclear
Distributed-memory parallel sparse matrix-matrix multiplication
[Figure: 2D block layout; outer product formulation, $C_{ij} \mathrel{+}= A_{ik} B_{kj}$; sequential “hypersparse” kernel]
• Asynchronous MPI-2 implementation
• Experiments: TACC Lonestar cluster
• Good scaling to 256 processors
[Plot: Time vs. number of cores, 1M-vertex RMAT]
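The outer product formulation can be illustrated sequentially as follows. This sketch shows only the arithmetic pattern, not the distributed 2D layout or the hypersparse kernel from the slide; the dictionary-based storage (A by columns, B by rows) and the function name are assumptions chosen for the example.

```python
from collections import defaultdict


def spgemm_outer(A_cols, B_rows):
    """Sparse C = A * B as a sum of outer products.

    A_cols[k] is column k of A as {row i: value};
    B_rows[k] is row k of B as {column j: value}.
    For each k, column k of A times row k of B is accumulated into C,
    i.e. C[i][j] += A[i][k] * B[k][j].
    Returns C as {row i: {column j: value}}."""
    C = defaultdict(dict)
    for k, col in A_cols.items():
        row = B_rows.get(k)
        if not row:
            continue            # empty row of B contributes nothing
        for i, a in col.items():
            Ci = C[i]
            for j, b in row.items():
                Ci[j] = Ci.get(j, 0) + a * b
    return dict(C)
```

Only indices k where both column k of A and row k of B are nonempty do any work, which matters when most rows and columns of a local block are empty.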
Recursive All-Pairs Shortest Paths
[Figure: matrix partitioned into blocks A, B (top row) and C, D (bottom row)]
A = A*; % recursive call
B = AB; C = CA;
D = D + CB;
D = D*; % recursive call
B = BD; C = DC;
A = A + BC;
+ is “min”, × is “add”
Based on R-Kleene algorithm
Well suited for GPU architecture:
• Fast matrix-multiply kernel
• In-place computation => low memory bandwidth
• Few, large MatMul calls => low GPU dispatch overhead
• Recursion stack on host CPU, not on multicore GPU
• Careful tuning of GPU code
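A CPU-only sketch of the recursive scheme, using the same eight block updates over the (min, +) semiring, might look like this. It is an illustration in pure Python, not the tuned GPU implementation; the helper names, the list-of-lists storage, and the assumption that the graph has no negative cycles are all choices made for the example.

```python
INF = float("inf")


def block(M, rows, cols):
    """Copy the sub-block of M given by index ranges rows x cols."""
    return [[M[i][j] for j in cols] for i in rows]


def min_plus(X, Y):
    """(min, +) product: Z[i][j] = min over k of X[i][k] + Y[k][j]."""
    return [[min(x + y for x, y in zip(row, col)) for col in zip(*Y)]
            for row in X]


def store(M, rows, cols, Z, combine=False):
    """Write Z into M[rows, cols]; with combine=True take the
    elementwise min with the existing entries (semiring '+')."""
    for i, zrow in zip(rows, Z):
        for j, z in zip(cols, zrow):
            M[i][j] = min(M[i][j], z) if combine else z


def r_kleene(M, lo, hi):
    """In-place closure of the diagonal block M[lo:hi, lo:hi]."""
    if hi - lo == 1:
        M[lo][lo] = min(M[lo][lo], 0)   # assumes no negative cycles
        return
    mid = (lo + hi) // 2
    P, Q = range(lo, mid), range(mid, hi)
    r_kleene(M, lo, mid)                                        # A = A*
    store(M, P, Q, min_plus(block(M, P, P), block(M, P, Q)))    # B = AB
    store(M, Q, P, min_plus(block(M, Q, P), block(M, P, P)))    # C = CA
    store(M, Q, Q, min_plus(block(M, Q, P), block(M, P, Q)),
          combine=True)                                         # D = D + CB
    r_kleene(M, mid, hi)                                        # D = D*
    store(M, P, Q, min_plus(block(M, P, Q), block(M, Q, Q)))    # B = BD
    store(M, Q, P, min_plus(block(M, Q, Q), block(M, Q, P)))    # C = DC
    store(M, P, P, min_plus(block(M, P, Q), block(M, Q, P)),
          combine=True)                                         # A = A + BC


def all_pairs_shortest_paths(W):
    """W[i][j] is the edge weight from i to j, INF if absent.
    Returns a new matrix of shortest-path distances."""
    D = [list(row) for row in W]
    if D:
        r_kleene(D, 0, len(D))
    return D
```

For example, `all_pairs_shortest_paths([[0, 3, INF, 7], [8, 0, 2, INF], [5, INF, 0, 1], [2, INF, INF, 0]])` yields the distance matrix for that 4-vertex graph. Nearly all the work lands in the `min_plus` calls, which is why a fast matrix-multiply kernel dominates performance.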
APSP: Experiments and observations
128-core Nvidia 8800 GPU
Speedup relative to. . .
1-core CPU: 120x – 480x
16-core CPU: 17x – 45x
Iterative, 128-core GPU: 40x – 680x
MSSSP, 128-core GPU: ~3x
Conclusions:
• High performance is achievable but not simple
• Carefully chosen and optimized primitives will be key
[Plot: Time vs. Matrix Dimension]
#5: The Education Challenge
Or, the interdisciplinary challenge.
Where do you go to take courses in
Graph algorithms …
… on massive data sets …
… in the presence of uncertainty …
… analyzed on parallel computers …
… applied to a domain science?
CSC needs to be both coherent and interdisciplinary!
Five Challenges In CSC
1. Architecture & algorithms
2. Data size & management
3. Uncertainty
4. Primitives & software
5. Education
Morals
• Things are clearer if you look at them from multiple perspectives
• Combinatorial algorithms are pervasive in scientific computing and will become more so
• This is a great time to be doing combinatorial scientific computing!
Thanks …
David Bader, Jon Berry, Nadya Bliss, Aydin Buluc, Larry Carter, Alan Edelman, John Feo, Bruce Hendrickson, Jeremy Kepner, Jure Leskovec, Kamesh Madduri, Michael Mahoney, Igor Mezic, Cleve Moler, Steve Reinhardt, Eric Robinson, Ronitt Rubinfeld, Rob Schreiber, Viral Shah, Sivan Toledo, Anil Vullikanti