TRANSCRIPT
Reducing Complexity in Algebraic Multigrid
Hans De Sterck, Department of Applied Mathematics, University of Colorado at Boulder
Ulrike Meier Yang, Center for Applied Scientific Computing, Lawrence Livermore National Laboratory
Outline
introduction: AMG
complexity growth when using classical coarsenings
Parallel Modified Independent Set (PMIS) coarsening
scaling results
conclusions and future work
Introduction
solve $Au = f$
$A$ from a 3D PDE – sparse!
large problems ($10^9$ dof) – parallel
unstructured grid problems
AMG building blocks
Setup Phase:
— Select coarse "grids"
— Define interpolation, $P^{(m)}, \; m = 1, 2, \ldots$
— Define restriction and coarse-grid operators: $R^{(m)} = P^{(m)T}$, $A^{(m+1)} = P^{(m)T} A^{(m)} P^{(m)}$
Solve Phase:
Relax $A^{(m)} u^m = f^m$
Compute $r^m = f^m - A^{(m)} u^m$
Restrict $r^{m+1} = R^{(m)} r^m$
Solve $A^{(m+1)} e^{m+1} = r^{m+1}$
Interpolate $e^m = P^{(m)} e^{m+1}$
Correct $u^m \leftarrow u^m + e^m$
Relax $A^{(m)} u^m = f^m$
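A minimal NumPy sketch of the solve phase above, using dense matrices and weighted Jacobi relaxation for readability; the function names are illustrative, not hypre's API:

```python
import numpy as np

def relax(A, u, f, sweeps=1, omega=2/3):
    """Weighted Jacobi relaxation on A u = f (one common smoother choice)."""
    Dinv = 1.0 / np.diag(A)
    for _ in range(sweeps):
        u = u + omega * Dinv * (f - A @ u)
    return u

def v_cycle(m, A, P, u, f):
    """One V-cycle, following the solve-phase steps on the slide.
    A[m]: level-m operator; P[m]: interpolation from level m+1 to level m;
    the setup phase builds A[m+1] = P[m].T @ A[m] @ P[m] (Galerkin product)."""
    if m == len(P):                                   # coarsest level: solve directly
        return np.linalg.solve(A[m], f)
    u = relax(A[m], u, f)                             # Relax       A^(m) u^m = f^m
    r = f - A[m] @ u                                  # Compute     r^m = f^m - A^(m) u^m
    rc = P[m].T @ r                                   # Restrict    r^(m+1) = R^(m) r^m, R^(m) = P^(m)T
    ec = v_cycle(m + 1, A, P, np.zeros_like(rc), rc)  # Solve       A^(m+1) e^(m+1) = r^(m+1)
    u = u + P[m] @ ec                                 # Interpolate and Correct: u^m <- u^m + e^m
    return relax(A[m], u, f)                          # Relax again (post-smoothing)
```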
AMG complexity - scalability
Operator complexity $C_{op} = \dfrac{\sum_i \mathrm{nonzeros}(A_i)}{\mathrm{nonzeros}(A_0)}$
e.g., 3D: $C_{op} = 1 + 1/8 + 1/64 + \ldots < 8/7$
measure of memory use, and of work in the solve phase
scalable algorithm:
O(n) operations per V-cycle ($C_{op}$ bounded)
AND number of V-cycles independent of n (convergence factor of AMG independent of n)
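In code, $C_{op}$ is just this ratio; a one-line sketch, assuming a list of sparse level operators (e.g. scipy.sparse matrices) produced by a setup phase like the one above:

```python
def operator_complexity(levels):
    """C_op = sum_i nonzeros(A_i) / nonzeros(A_0) for a hierarchy of sparse
    matrices; ideal full coarsening in 3D gives 1 + 1/8 + 1/64 + ... < 8/7."""
    return sum(A.nnz for A in levels) / levels[0].nnz
```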
AMG Interpolation
after relaxation: $Ae \approx 0$ (relative to $e$)
heuristic: error after interpolation should also satisfy this relation approximately
derive interpolation from:
$a_{ii} e_i + \sum_{j \in C_i} a_{ij} e_j + \sum_{j \in F_i} a_{ij} e_j = 0 \qquad \forall i \in F$
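Solving this relation for $e_i$ makes the structure of the interpolation problem explicit (a standard reasoning step in classical AMG, spelled out here for clarity):

$e_i = -\dfrac{1}{a_{ii}} \Bigl( \sum_{j \in C_i} a_{ij} e_j + \sum_{j \in F_i} a_{ij} e_j \Bigr), \qquad i \in F$

The sum over $C_i$ can be used directly for interpolation; the sum over $F_i$ must first be rewritten in terms of C-points shared with $i$, which is what the coarsening conditions on the next slides are meant to guarantee.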
AMG interpolation
"large" $a_{ij}$ should be taken into account accurately
"strong connections": $i$ strongly depends on $j$ (and $j$ strongly influences $i$) if
$-a_{ij} \ge \theta \max_{k \ne i} \{-a_{ik}\}, \qquad 0 < \theta \le 1$
with strong threshold $\theta$
$a_{ii} e_i + \sum_{j \in C_i} a_{ij} e_j + \sum_{j \in F_i} a_{ij} e_j = 0 \qquad \forall i \in F$
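A sketch of this strength test in Python/SciPy, assuming negative off-diagonals as in an M-matrix; the function name is illustrative, not hypre's API:

```python
import numpy as np
import scipy.sparse as sp

def strong_connections(A, theta=0.25):
    """S[i] = {j : -a_ij >= theta * max_{k != i}(-a_ik)}, the classical strength test."""
    A = sp.csr_matrix(A)
    n = A.shape[0]
    S = [set() for _ in range(n)]
    for i in range(n):
        lo, hi = A.indptr[i], A.indptr[i + 1]
        cols, vals = A.indices[lo:hi], A.data[lo:hi]
        off = cols != i                    # exclude the diagonal
        if not off.any():
            continue
        max_neg = (-vals[off]).max()       # max_{k != i} (-a_ik)
        if max_neg <= 0:
            continue                       # no negative off-diagonals: no strong dependencies
        for j, a in zip(cols[off], vals[off]):
            if -a >= theta * max_neg:
                S[i].add(int(j))           # i strongly depends on j
    return S
```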
AMG coarsening
(C1) Maximal Independent Set:
Independent: no two C-points are connected
Maximal: if one more C-point is added, the independence is lost
(C2) All F-F connections require connections to a common C-point (for good interpolation)
F-points have to be changed into C-points to ensure (C2), so (C1) is violated
more C-points, higher complexity
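Condition (C2) can be checked mechanically. A small sketch, assuming the strength sets `S` from the previous sketch and treating the strength graph as symmetric (an assumption made here for brevity):

```python
def ff_without_common_c(S, C):
    """List strong F-F pairs (i, j) that share no strongly connected C-point,
    i.e. the violations of condition (C2).  Assumes a symmetric strength graph."""
    C = set(C)
    violations = []
    for i, Si in enumerate(S):
        if i in C:
            continue                       # i must be an F-point
        for j in Si:
            if j in C or j <= i:
                continue                   # F-F pairs only, each pair counted once
            if not (Si & S[j] & C):        # no common strongly connected C-point
                violations.append((i, j))
    return violations
```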
Classical Coarsenings
1. Ruge-Stueben (RS)
— two passes: 2nd pass ensures that F-F pairs have a common C-point
— disadvantage: highly sequential
2. CLJP
— based on parallel independent set algorithms developed by Luby and later by Jones & Plassmann
— also ensures that F-F pairs have a common C-point
3. hybrid RS-CLJP ("Falgout")
— RS in processor interiors, CLJP at interprocessor boundaries
Classical coarsenings: complexity growth
example: hybrid RS-CLJP (Falgout), 7-point finite difference Laplacian in 3D, θ = 0.25

proc   dof     C_op   max nonzeros per row   t_setup   t_solve   iter   AMG conv. factor
1      32^3    3.76   161 (5/8)              2.91      1.81      5      0.035
8      64^3    4.69   463 (6/11)             8.32      4.37      8      0.098
64     128^3   5.72   1223 (7/16)            36.28     8.59      10     0.172

increased memory use, long setup times, long solution times: loss of scalability
our approach to reduce complexity
do not add C-points for strong F-F connections that do not have a common C-point
fewer C-points, reduced complexity, but worse convergence factors expected
can something be gained?
PMIS coarsening (De Sterck, Yang)
Parallel Modified Independent Set (PMIS)
do not enforce condition (C2)
weighted independent set algorithm:
points i that strongly influence many other points (large measure) are good candidates for C-points
add a random number between 0 and 1 to each measure to break ties
PMIS coarsening
pick C-points with maximal measures (as in CLJP), then make all their neighbors fine (as in RS)
proceed until all points are either coarse or fine
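A serial sketch of the PMIS loop just described (the parallel version applies the same comparisons across processor boundaries); the helper names are illustrative, not the hypre/BoomerAMG implementation:

```python
import random

def pmis(S):
    """Serial sketch of PMIS coarsening.
    S[i] = set of points that i strongly depends on.  Returns (C, F) index sets."""
    n = len(S)
    # influences[i] = points that strongly depend on i (transpose of S)
    influences = [set() for _ in range(n)]
    for i in range(n):
        for j in S[i]:
            influences[j].add(i)
    # measure = number of points i strongly influences + random tie-breaker in (0,1)
    w = [len(influences[i]) + random.random() for i in range(n)]
    # points that influence nobody (w < 1) become F-points right away
    F = {i for i in range(n) if w[i] < 1}
    C = set()
    undecided = set(range(n)) - F
    while undecided:
        # select points whose measure is a local maximum among undecided
        # neighbors (like CLJP); the random component makes maxima unique
        new_C = {i for i in undecided
                 if all(w[i] > w[j]
                        for j in (S[i] | influences[i]) & undecided if j != i)}
        C |= new_C
        undecided -= new_C
        # make all their undecided neighbors F-points (like RS)
        new_F = {j for i in new_C
                 for j in (S[i] | influences[i]) if j in undecided}
        F |= new_F
        undecided -= new_F
    return C, F
```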
PMIS: select 1
select C-pts with maximal measure locally
make neighbors F-pts
remove neighbor edges
[figure: PMIS measures on a 7×7 grid with 9-point connectivity; each value = number of strong influences + random tie-breaker in (0,1)]
3.7  5.3  5.0  5.9  5.4  5.3  3.4
5.2  8.0  8.5  8.2  8.6  8.9  5.1
5.9  8.1  8.8  8.9  8.4  8.2  5.9
5.7  8.6  8.3  8.8  8.3  8.1  5.0
5.3  8.7  8.3  8.4  8.3  8.8  5.9
5.0  8.8  8.5  8.6  8.7  8.9  5.3
3.2  5.6  5.8  5.6  5.9  5.9  3.0
PMIS: remove and update 1
select C-pts with maximal measure locally
make neighbors F-pts
remove neighbor edges
[figure: grid after the first select/remove step; chosen C-points and their new F-point neighbors removed, remaining measures shown]
PMIS: select 2
select C-pts with maximal measure locally
make neighbors F-pts
remove neighbor edges
[figure: second local selection of C-points among the remaining undecided points]
PMIS: remove and update 2
select C-pts with maximal measure locally
make neighbors F-pts
remove neighbor edges
[figure: remaining measures after the second select/remove step]
PMIS: final grid
select C-pts with maximal measure locally
make neighbors F-pts
remove neighbor edges
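As a usage example, the walkthrough above can be reproduced with the `pmis` sketch: a 7×7 grid with 9-point connectivity gives exactly the measures shown in the figures (interior ≈ 8 + rand, edges ≈ 5 + rand, corners ≈ 3 + rand). The grid-building helper is an assumption for illustration, and `pmis` is the sketch above:

```python
# 7x7 grid with 9-point connectivity (all 8 surrounding neighbors), as in the figures
n = 7
def idx(r, c):
    return r * n + c

S = [set() for _ in range(n * n)]
for r in range(n):
    for c in range(n):
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                if (dr, dc) != (0, 0) and 0 <= r + dr < n and 0 <= c + dc < n:
                    S[idx(r, c)].add(idx(r + dr, c + dc))

C, F = pmis(S)   # pmis from the sketch above
print(f"{len(C)} C-points, {len(F)} F-points out of {n * n}")
```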
Preliminary results: 7-point 3D Laplacian on an unstructured grid (n = 76,527), serial, θ = 0.5, Gauss-Seidel relaxation; stand-alone AMG vs. AMG-GMRES(10)

Coarsening   C_op   AMG (GS) # its   AMG (GS) time   AMG-GMRES(10) # its   AMG-GMRES(10) time
RS           4.43   12               5.74            8                     5.37
CLJP         3.83   12               5.39            8                     5.15
PMIS         1.56   58               8.59            17                    4.53
Implementation in CASC/LLNL’s Hypre/BoomerAMG library (Falgout, Yang, Henson, Jones, …)
PMIS results: 27-point finite element Laplacian in 3D, 40^3 dof per proc (IBM Blue)
Falgout (θ = 0.5) and PMIS-GMRES(10) (θ = 0.25)

Falgout:
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      1.62   124      12       8.39      6.48      7      14.87
8      2.19   446      16       16.65     11.20     8      27.85
64     2.43   697      17       37.95     16.89     9      54.84
256    2.51   804      19       48.36     18.42     9      66.78
384    2.55   857      19       57.70     21.66     10     79.36

PMIS-GMRES(10):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      1.11   36       6        4.60      8.42      10     13.02
8      1.11   42       7        5.81      13.27     13     19.08
64     1.11   48       8        6.84      16.83     15     23.67
256    1.12   51       9        7.23      18.50     16     25.73
384    1.12   52       9        7.56      19.62     17     27.18
PMIS results: 7-point finite difference Laplacian in 3D, 40^3 dof per proc (IBM Blue)
Falgout (θ = 0.5) and PMIS-GMRES(10) (θ = 0.25)

Falgout:
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      3.62   104      11       4.91      4.02      6      8.99
8      4.45   292      14       10.52     6.95      7      17.47
64     5.07   450      16       21.82     11.89     9      33.71
256    5.28   521      19       26.72     13.18     9      39.90
384    5.35   549      19       30.41     14.13     10     44.54

PMIS-GMRES(10):
proc   C_op   max nz   levels   t_setup   t_solve   iter   t_total
1      2.32   56       7        2.28      8.05      13     10.36
8      2.37   65       8        3.78      13.30     17     17.08
64     2.39   71       9        5.19      19.44     21     24.63
256    2.40   73       10       5.59      20.47     22     26.06
384    2.41   74       10       5.97      22.56     24     28.53
Conclusions
PMIS leads to reduced, scalable complexities for large problems on parallel computers
using PMIS-GMRES, large problems can be solved efficiently, with good scalability and without requiring much memory (important for Blue Gene/L)
Future work
parallel aggressive coarsening and multi-pass interpolation to further reduce complexity (using one-pass RS, PMIS, or a hybrid)
improved interpolation formulas for more aggressively coarsened grids (Jacobi improvement, …), to reduce the need for GMRES
parallel First-Order System Least-Squares (FOSLS) AMG code for large-scale PDE problems
Blue Gene/L applications