Memory-Aware Scheduling for Sparse Direct Methods

Emmanuel AGULLO, ICL - University of Tennessee
Jean-Yves L'EXCELLENT, LIP - INRIA
Abdou GUERMOUCHE, LaBRI, Université de Bordeaux

MS31 Parallel Sparse Matrix Computations and Enabling Algorithms
SIAM CSE 2009, Miami, FL, March 2-6, 2009
AGULLO - GUERMOUCHE - L’EXCELLENT Memory-Aware Scheduling for Sparse Direct Methods 1
Context

Solving sparse linear systems
Ax = b ⇒ Direct methods: A = LU

Typical matrix: BRGM matrix
- 3.7 × 10^6 variables
- 156 × 10^6 non-zeros in A
- 4.5 × 10^9 non-zeros in LU
- 26.5 × 10^12 flops

Hardware paradigm
- Many-core architecture;
- Large global amount of memory;
- Limited memory per core.

Software challenge
→ Need for algorithms whose memory usage scales with the number of processors.
- Case study: MUMPS
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
1. MUMPS
MUMPS: a MUltifrontal Massively Parallel sparse direct Solver

Solution of large sparse linear systems with:
- Symmetric positive definite matrices;
- General symmetric matrices;
- General unsymmetric matrices.

Implementation
- Distributed multifrontal solver (F90, MPI based);
- Dynamic distributed scheduling;
- Use of BLAS, BLACS, ScaLAPACK.

Interfaces
- Fortran, C, Matlab, Scilab, Visual Studio.
The multifrontal method (Duff, Reid '83)

[Figure: a 5 × 5 sparse matrix A and its factors L + U - I, showing the original non-zeros and the fill-in, together with the corresponding elimination tree (factors and contribution block at each node).]

Storage divided into two parts:
- Factors systematically written to disk;
- Active storage kept in memory.

Active storage = active frontal matrix + stack of contribution blocks.
Memory behaviour (serial postorder traversal)

[Figure: evolution of the active storage (stack of contribution blocks and active frontal matrix) as an elimination tree with nodes 1, 2, 3 is processed in postorder.]
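The stack behaviour illustrated above can be modelled in a few lines. This is an illustrative sketch, not MUMPS code: `tree` (children of each node), `front` (size of a node's frontal matrix) and `cb` (size of the contribution block a node pushes for its parent) are hypothetical inputs.

```python
def peak_active_storage(tree, front, cb, root):
    """Peak of (stack of contribution blocks + active frontal matrix)
    over a serial postorder traversal of the subtree rooted at `root`."""
    def visit(node):
        peak = 0      # largest active storage seen in this subtree
        stacked = 0   # contribution blocks of already-processed children
        for child in tree.get(node, []):
            # while processing a child, earlier siblings' blocks stay stacked
            peak = max(peak, stacked + visit(child))
            stacked += cb[child]
        # assembly: all children's blocks and this frontal matrix coexist
        return max(peak, stacked + front[node])
    return visit(root)
```

Because the running peak depends on the order in which children are stacked, the traversal order chosen for the tree directly changes this value, which is the point of the slides that follow.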
Memory efficiency of MUMPS

Definition: memory efficiency on p processors (or cores)

e(p) = Sseq / (p × Smax(p)), where Sseq is the serial storage and Smax(p) is the maximum storage on a single processor.

Results: memory efficiency of MUMPS (with factors on disk)

Number p of processors |  16  |  32  |  64  | 128
AUDI_KW_1              | 0.16 | 0.12 | 0.13 | 0.10
CONESHL_MOD            | 0.28 | 0.28 | 0.22 | 0.19
CONV3D64               | 0.42 | 0.40 | 0.41 | 0.37
QIMONDA07              | 0.30 | 0.18 | 0.11 |  -
ULTRASOUND80           | 0.32 | 0.31 | 0.30 | 0.26
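The definition is easy to check numerically: e(p) = 1 means each of the p processors holds exactly 1/p of the serial storage, and e(p) drops as the most loaded processor holds more than its share. The numbers below are illustrative, not measurements from the table.

```python
def memory_efficiency(s_seq, p, s_max):
    """e(p) = Sseq / (p * Smax(p)).
    s_seq: serial storage; s_max: largest storage on any of the p procs."""
    return s_seq / (p * s_max)

# Perfect scaling: each of 16 processors holds 1/16 of the serial storage.
print(memory_efficiency(16e9, 16, 1e9))               # 1.0
# One processor holds 6.25x its fair share: efficiency degrades to 0.16.
print(round(memory_efficiency(16e9, 16, 6.25e9), 2))  # 0.16
```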
2. Limits to memory scalability
Parallel multifrontal scheme

- Type 1: nodes processed on a single processor;
- Type 2: nodes processed with a parallel 1D blocked factorization;
- Type 3: parallel 2D cyclic factorization (root node).

[Figure: mapping of the tree onto processors P0-P3 over time: static subtrees at the bottom, 1D pipelined factorizations above (static master, e.g. P2; slaves such as P3 and P0 chosen by the master at runtime, dynamically), and a static 2D decomposition at the root.]
Limits to memory scalability

[Figure: the same task-mapping diagram over time on P0-P3, highlighting where memory usage concentrates.]

- Many simultaneous active tasks;
- Large master tasks;
- Large subtrees;
- Proportional mapping.
Proportional mapping vs. postorder traversal (1/2)

[Figure: elimination tree over depths d = 0 to 4; the 512 processors at the root are recursively split among the subtrees: 256 + 256 at d = 1, then 128 per subtree at d = 2, and so on.]

Mapping
- Initially: all processors on the root node;
- Recursively split the set of processors among the child subtrees.

Advantages and drawbacks
+ Fine-grain and coarse-grain parallelism;
- Bad memory efficiency.
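The recursive split can be sketched as follows. This is an illustrative model of proportional mapping, not the MUMPS implementation: `tree` (children of each node) and `work` (workload of each subtree) are hypothetical inputs.

```python
def proportional_mapping(tree, work, node, nprocs, out=None):
    """Assign `nprocs` processors to `node`, then split them among the
    children in proportion to each child subtree's workload."""
    if out is None:
        out = {}
    out[node] = nprocs
    children = tree.get(node, [])
    total = sum(work[c] for c in children)
    for c in children:
        # each subtree keeps at least one processor
        share = max(1, round(nprocs * work[c] / total))
        proportional_mapping(tree, work, c, share, out)
    return out
```

On a balanced binary tree with 512 processors at the root, this reproduces the figure's 512 → 256 + 256 → 128 + 128 + 128 + 128 split; the memory drawback is that all these subtrees are then active at the same time.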
Proportional mapping vs. postorder traversal (2/2)

[Figure: the same elimination tree, processed node by node in postorder, with all 512 processors working on every node.]

Traversal
- Postorder traversal, node by node;
- All processors on each node.

Advantages and drawbacks
- Only fine-grain parallelism;
+ High memory efficiency.
3. A new memory-aware algorithm
Memory-aware mapping algorithm

[Figure: the same elimination tree; the set of processors is split as in proportional mapping (e.g. 512 into 256 + 256) only where memory allows, while subtrees that cannot be split are processed one after another, postorder-style, by the whole group.]

Mapping
- Initially: all processors on the root node;
- Recursively split the set of processors among the child subtrees if memory allows for it.
![Page 52: Memory-Aware Scheduling for Sparse Direct Methodseagullo/thesis/siam_cse_2009_sparse-mem… · Solving sparse linear systems Ax = b)Direct methods: A = LU Typical matrix: BRGM matrix](https://reader030.vdocuments.us/reader030/viewer/2022041114/5f23679a20dee77f6c609b32/html5/thumbnails/52.jpg)
Memory-aware mapping algorithm

Memory-aware mapping:

[Figure: the elimination tree (depths d = 0 to d = 4) annotated with the processor counts assigned by the mapping; the 512 processors are split into two groups of 256 where memory allows.]

Advantages
- (+) Robust: completion is guaranteed (if the memory per processor satisfies M0 >= Sseq/p);
- (+) Efficient: available memory provides coarse-grain parallelism.
Preliminary results
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
Preliminary results

- Excellent memory scalability:
  - memory efficiency close to 1.
- Competitive (time) efficiency:
  - close to proportional mapping (if enough memory);
  - memory provides coarse-grain parallelism.

[Figure: average number of processors per node (normalized) versus distance to the root node (depth), for proportional mapping and for memory-aware mapping with M0 = 1/32, 2/32, 5/32, and 8/32 of the sequential peak.]
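The memory efficiency referred to above can be stated as e(p) = Sseq / (p * max_i M_i), where M_i is the peak memory of processor i; e(p) = 1 means the aggregate memory of p processors is no larger than the sequential peak. A minimal sketch, assuming this definition (the function name is ours):

```python
def memory_efficiency(seq_peak, per_proc_peaks):
    """e(p) = Sseq / (p * max_i M_i); 1.0 means perfectly scalable memory."""
    p = len(per_proc_peaks)
    return seq_peak / (p * max(per_proc_peaks))
```

For example, a run on 32 processors whose per-processor peaks all equal Sseq/32 has efficiency 1.0, while one processor peaking at twice that halves the efficiency.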
Conclusion
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
Conclusion

Prototype of a memory-aware algorithm
- Maximizes the amount of coarse-grain parallelism with respect to the amount of memory available per processor/core.
- A new static mapping was implemented, with constraints on the dynamic schedulers, and experimented with in the out-of-core (OOC) version of MUMPS.
- Very good memory scalability obtained.

On-going work
- Further tuning and validation.
- Generalization to the in-core case.
- Reinjecting dynamic information into the schedulers.