TRANSCRIPT of graphics.stanford.edu/sss/barth_partitioning.pdf
Partitioning and Partitioning Tools
Tim BarthNASA Ames Research Center
Moffett Field, California 94035-1000 USA
1
Graph/Mesh Partitioning
• Why do it?
• The graph bisection problem
• What are the standard heuristic algorithms?
• What tools are available?
2
Why do it?
• Efficient utilization of distributed computational resources
– Equidistribution of workload among processors (load balancing)
– Minimized time spent in interprocessor communication
∗ Communication takes time and it is not always possible to hide this latency in data transfer
∗ The cost of communication is often modeled by the linear relationship for n messages: Cost = ∑ₙ (α + β mₙ), where α is the per-message startup latency and β the cost per unit of message length mₙ
Figure 1: (a) Mesh partitioning with a minimized number of messages, (b) Mesh partitioning with minimized message length.
3
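The latency-bandwidth cost model above can be made concrete in a few lines of Python. The message sizes and the values of α and β below are purely illustrative; the point is that for a fixed data volume, the per-message latency term rewards sending fewer, larger messages:

```python
# Linear communication cost model: Cost = sum over messages of (alpha + beta * m).
# alpha: per-message startup latency, beta: cost per unit of message length.
# All numbers here are illustrative, not measured values.

def comm_cost(message_sizes, alpha=1.0, beta=0.01):
    """Total cost of sending the given list of message lengths."""
    return sum(alpha + beta * m for m in message_sizes)

# Same total data volume (4000 units) sent two ways:
many_small = comm_cost([100] * 40)   # 40 messages of 100 units
few_large  = comm_cost([2000] * 2)   # 2 messages of 2000 units

print(many_small)  # 40*1.0 + 0.01*4000 = 80.0 (latency term dominates)
print(few_large)   # 2*1.0 + 0.01*4000 = 42.0
```

This is why Figure 1 distinguishes minimizing the number of messages from minimizing message length: which matters more depends on the α/β ratio of the machine.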
Why do it?
• As a strategy for reducing the overall arithmetic complexity of an algorithm
– Overlapping Schwarz methods
– “Divide and Conquer” methods, e.g. nested dissection of a matrix, Schur complement substructuring
– Multiscale methods, e.g. agglomeration multigrid
4
Why do it?
• Overlapping Schwarz methods
[Figure: global residual norm (10⁰ down to 10⁻¹⁵) vs. Schwarz iterations (0 to 80) for a 2nd-order scheme, comparing 2 and 8 partitions with overlap 1, 2, and 3]
5
Why do it?
• Overlapping Schwarz methods with subdomain size H, mesh cell size h, and overlap δ
Let A be the discretization matrix and M_as the additive Schwarz preconditioner. There exists a constant C independent of H and h such that the condition number κ satisfies
κ(M_as⁻¹ A) ≤ C H⁻² (1 + (H/δ)²).   (1)
With a 2-level coarse space correction: there exists a constant C independent of H and h such that
κ(M_as⁻¹ A) ≤ C (1 + H/δ).   (2)
6
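A quick numerical illustration of the two bounds, with the constant C set to 1 and the overlap kept at a fixed fraction of the subdomain size (both choices purely illustrative): the one-level bound (1) grows like H⁻² as the subdomains shrink, while the two-level bound (2) stays constant.

```python
# Condition number bounds for additive Schwarz, with C = 1 for illustration.

def one_level_bound(H, delta):
    # kappa <= C * H^-2 * (1 + (H/delta)^2), bound (1)
    return H**-2 * (1 + (H / delta)**2)

def two_level_bound(H, delta):
    # kappa <= C * (1 + H/delta), bound (2) with coarse-space correction
    return 1 + H / delta

for H in [0.5, 0.25, 0.125]:
    delta = H / 4          # keep overlap a fixed fraction of subdomain size
    print(H, one_level_bound(H, delta), two_level_bound(H, delta))
# one-level bound: 68.0, 272.0, 1088.0 (grows like H^-2); two-level bound: 5.0 throughout
```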
Why do it?
• Substructuring
[ A1  A2 ] [ x1 ]   [ b1 ]
[ A3  A4 ] [ x2 ] = [ b2 ],   A⁻¹ = [ C1  C2 ]
                                    [ C3  C4 ]
with S = A4 − A3 A1⁻¹ A2,   C1 = A1⁻¹ + A1⁻¹ A2 S⁻¹ A3 A1⁻¹,
C2 = −A1⁻¹ A2 S⁻¹,   C3 = −S⁻¹ A3 A1⁻¹,   C4 = S⁻¹.
κ(M_Schur⁻¹ A) = C (1 + log(H/δ))
7
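The block-inverse formulas above can be checked directly. A minimal sketch using 1×1 (scalar) blocks, so that each Aᵢ is just a number and the block products reduce to ordinary arithmetic:

```python
# Verify the Schur-complement block inverse for scalar blocks A1..A4.
A1, A2, A3, A4 = 4.0, 1.0, 2.0, 3.0   # arbitrary illustrative values with A1 != 0

S  = A4 - A3 * (1 / A1) * A2          # Schur complement of A1
C1 = 1 / A1 + (1 / A1) * A2 * (1 / S) * A3 * (1 / A1)
C2 = -(1 / A1) * A2 * (1 / S)
C3 = -(1 / S) * A3 * (1 / A1)
C4 = 1 / S

# Multiply A by its claimed inverse C blockwise; the result should be the identity.
I11 = A1 * C1 + A2 * C3
I12 = A1 * C2 + A2 * C4
I21 = A3 * C1 + A4 * C3
I22 = A3 * C2 + A4 * C4
print(I11, I12, I21, I22)  # 1, 0, 0, 1 up to rounding
```

In substructuring, solving with S on the interface unknowns is what couples the subdomain solves, which is where the κ(M_Schur⁻¹ A) = C(1 + log(H/δ)) bound comes in.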
Graph Bisection (NP-hard)
Define a partitioning vector p ∈ Zⁿ which 2-colors the vertices of a graph:
p = [+1, −1, −1, +1, +1, ..., +1, −1]ᵀ   (3)
[Figure: graph with vertices labeled +1 or −1 according to their partition]
• Minimize the cut-weight of the weighted graph
• Produce balanced partitions
8
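The cut-weight objective is easy to make concrete: for p ∈ {+1, −1}ⁿ, each cut edge contributes (pᵢ − pⱼ)² = 4 to pᵀLp and each uncut edge contributes 0, so (1/4)pᵀLp counts cut edges (for unit edge weights). A pure-Python check on a small hypothetical graph:

```python
# Check W_c = (1/4) p^T L p on an unweighted graph.
# For the graph Laplacian L, p^T L p = sum over edges of (p_i - p_j)^2.

edges = [(0, 1), (1, 2), (2, 0), (2, 3), (3, 4), (4, 5), (5, 3)]  # two triangles joined by edge (2, 3)
p = [+1, +1, +1, -1, -1, -1]   # 2-coloring: first triangle vs. second

quad_form = sum((p[i] - p[j])**2 for i, j in edges)   # p^T L p
cut_weight = sum(1 for i, j in edges if p[i] != p[j])

print(quad_form / 4, cut_weight)  # both equal 1: only edge (2, 3) is cut
```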
Heuristic Graph Partitioning
Three commonly used partitioning techniques
• Recursive coordinate bisection
• Recursive Cuthill-McKee
• Recursive spectral bisection
9
Recursive Coordinate Bisection
• Spatial coordinates are sorted along alternating horizontal and vertical directions
• Divisors are found to balance partitions
10
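Recursive coordinate bisection is straightforward to sketch: sort along the current axis, split at the median so the halves balance, and recurse with the axis alternated. A minimal 2-D version (the point data and partition count are illustrative):

```python
def rcb(points, parts, axis=0):
    """Recursively bisect 2-D points into `parts` balanced groups (parts a power of 2)."""
    if parts == 1:
        return [points]
    pts = sorted(points, key=lambda p: p[axis])   # sort along the current direction
    mid = len(pts) // 2                           # median divisor balances the halves
    nxt = 1 - axis                                # alternate horizontal / vertical
    return rcb(pts[:mid], parts // 2, nxt) + rcb(pts[mid:], parts // 2, nxt)

grid = [(x, y) for x in range(4) for y in range(4)]   # 16 points on a 4x4 grid
partitions = rcb(grid, 4)
print([len(p) for p in partitions])  # [4, 4, 4, 4]: balanced partitions
```

Note the method looks only at coordinates, not graph edges, so it is cheap but makes no attempt to minimize cut-weight.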
Graph Ordering Cuthill-McKee
Algorithm: Graph ordering, Cuthill-McKee.
Step 1. Find vertex with lowest degree. This is the root vertex.
Step 2. Find all neighboring vertices connecting to the root by incident
edges. Order them by increasing vertex degree. This forms level 1.
Step 3. Form level k by finding all neighboring vertices of level k − 1
which have not been previously ordered. Order these new vertices by
increasing vertex degree.
Step 4. If vertices remain, go to step 3.
11
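The four steps above translate almost directly into a breadth-first traversal that visits neighbors in order of increasing degree. A minimal sketch, assuming the graph is given as adjacency lists (the example graph is hypothetical):

```python
from collections import deque

def cuthill_mckee(adj):
    """Order the vertices of a graph given as adjacency lists (Steps 1-4 above)."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    visited = [False] * n
    order = []
    # Step 1: root at the lowest-degree vertex (the loop also handles disconnected graphs).
    for root in sorted(range(n), key=lambda v: degree[v]):
        if visited[root]:
            continue
        visited[root] = True
        queue = deque([root])
        while queue:
            v = queue.popleft()
            order.append(v)
            # Steps 2-3: enqueue unvisited neighbors by increasing vertex degree.
            for w in sorted(adj[v], key=lambda u: degree[u]):
                if not visited[w]:
                    visited[w] = True
                    queue.append(w)
    return order

# Small example: vertex 0 has high degree, vertex 4 hangs off vertex 3.
adj = [[1, 2, 3], [0], [0], [0, 4], [3]]
print(cuthill_mckee(adj))  # [1, 0, 2, 3, 4]: starts from a degree-1 vertex
```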
Graph Ordering Cuthill-McKee
Matrix nonzero pattern
Figure 2: Natural Ordering (left) and Cuthill-McKee ordering (right)
12
Recursive Cuthill-McKee
• The level structure computed in the Cuthill-McKee ordering is utilized
• Divisors are found to balance partitions
13
Recursive Spectral Bisection
Motivated by the observation that the cut-weight of a graph is precisely
W_c = (1/4) pᵀ L p
Algorithm: Spectral Graph Bisection.
Step 1. Calculate the matrix L associated with the Laplacian of the graph.
Step 2. Calculate the eigenvalues and eigenvectors of L.
Step 3. Order the eigenvalues by magnitude, λ₁ ≤ λ₂ ≤ λ₃ ≤ ... ≤ λₙ.
Step 4. Determine the smallest nonzero eigenvalue λ_f and its associated eigenvector x_f (the Fiedler vector).
Step 5. Sort the elements of the Fiedler vector.
Step 6. Choose a divisor at the median of the sorted list and 2-color the vertices of the graph according to whether the corresponding element of the Fiedler vector is less than or greater than the median value.
14
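A self-contained sketch of the algorithm: since the Fiedler vector is the eigenvector for the second-smallest eigenvalue of L, power iteration on cI − L (with c above the largest eigenvalue of L) converges to it once the constant eigenvector is projected out. In practice a proper eigensolver would be used; the graph and iteration count here are illustrative.

```python
def fiedler_bisect(adj, iters=500):
    """2-color vertices by the sign of an approximate Fiedler vector."""
    n = len(adj)
    degree = [len(nbrs) for nbrs in adj]
    c = 2 * max(degree) + 1              # shift so every eigenvalue of c*I - L is positive
    x = [float(i + 1) for i in range(n)] # deterministic nonzero start vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]      # project out the constant eigenvector of L
        # y = (c*I - L) x, using L x = D x - A x for the graph Laplacian
        y = [(c - degree[i]) * x[i] + sum(x[j] for j in adj[i]) for i in range(n)]
        norm = max(abs(v) for v in y)
        x = [v / norm for v in y]
    med = sorted(x)[n // 2]              # divisor at the median of the sorted Fiedler values
    return [+1 if xi >= med else -1 for xi in x]

# Two triangles joined by the single edge (2, 3): the bisection should cut that edge.
adj = [[1, 2], [0, 2], [0, 1, 3], [2, 4, 5], [3, 5], [3, 4]]
colors = fiedler_bisect(adj)
print(colors[:3], colors[3:])  # the two triangles receive opposite colors
```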
Recursive Spectral Bisection
15
Multilevel k-way Partitioning
• Utilize successive k-way graph contraction to coarsen the graph
• Perform high quality partitioning on the coarsened graph
• Prolongate to finer graphs with local interface optimization to improve cut-weight
16
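The contraction step can be sketched with heavy-edge matching: pair the endpoints of the heaviest remaining edge whose endpoints are both unmatched, then collapse matched pairs into coarse vertices, summing the weights of parallel edges. The graph data below is illustrative, and real multilevel partitioners such as Metis add refinement passes during prolongation.

```python
def coarsen(n, edges):
    """One level of heavy-edge-matching contraction.
    edges: dict mapping (i, j) with i < j to the edge weight."""
    coarse_of = {}
    next_id = 0
    # Match endpoints of heavy edges first.
    for (i, j), w in sorted(edges.items(), key=lambda kv: -kv[1]):
        if i not in coarse_of and j not in coarse_of:
            coarse_of[i] = coarse_of[j] = next_id
            next_id += 1
    # Unmatched vertices survive as singleton coarse vertices.
    for v in range(n):
        if v not in coarse_of:
            coarse_of[v] = next_id
            next_id += 1
    # Sum weights of parallel edges in the contracted graph.
    coarse_edges = {}
    for (i, j), w in edges.items():
        ci, cj = coarse_of[i], coarse_of[j]
        if ci != cj:
            key = (min(ci, cj), max(ci, cj))
            coarse_edges[key] = coarse_edges.get(key, 0) + w
    return coarse_of, coarse_edges, next_id

# 4-cycle with one heavy edge: (0, 1) is matched first, then (2, 3).
coarse_of, coarse_edges, n_coarse = coarsen(4, {(0, 1): 5, (1, 2): 1, (2, 3): 1, (0, 3): 1})
print(n_coarse, coarse_edges)  # 2 {(0, 1): 2}: the two weight-1 side edges merge
```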
17
18
Metis, ParMetis
• Extremely fast
• Parallel implementation (requires some initial partitioning)
• Supports weighted graphs by vertices or edges
• Supports incremental load balancing (repartitioning) with minimized data migration
19
20
Zoltan
• Relatively new package under development at Sandia under the GPL
• Interfaces with Metis or Jostle
• Documentation suggests that the package will contain most of the commonly needed services for parallel scientific codes: partitioning, repartitioning, data migration, etc.
21
Partitioning Tools for SSS?
• Domain specific languages?
– Language for finite element methods
– Language for molecular dynamics
– <Insert your favorite problem domain here>
• Partial or full data dependency specification (analogous to scene graph specification in Java3D)
• Automatic tools for performance enhancement
– Use hardware performance statistics (memory access patterns) from previous executions in subsequent compilations
– Runtime data migration
22