Memory-Aware Scheduling for Sparse Direct Methods

Emmanuel AGULLO, ICL - University of Tennessee
Jean-Yves L'EXCELLENT, LIP - INRIA
Abdou GUERMOUCHE, LaBRI, Université de Bordeaux

MS31 Parallel Sparse Matrix Computations and Enabling Algorithms
SIAM CSE 2009, Miami, FL, March 2-6, 2009
AGULLO - GUERMOUCHE - L’EXCELLENT Memory-Aware Scheduling for Sparse Direct Methods 1
Context

Solving sparse linear systems
Ax = b ⇒ Direct methods: A = LU

Typical matrix: BRGM matrix
- 3.7 × 10^6 variables
- 156 × 10^6 non-zeros in A
- 4.5 × 10^9 non-zeros in LU
- 26.5 × 10^12 flops

Hardware paradigm
- Many-core architecture;
- Large global amount of memory;
- Limited memory per core.

Software challenge
→ Need for algorithms whose memory usage scales with the number of processors.
- Case study: MUMPS
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
1. MUMPS
MUMPS: a MUltifrontal Massively Parallel sparse direct Solver

Solution of large sparse linear systems with:
- Symmetric positive definite matrices;
- General symmetric matrices;
- General unsymmetric matrices.

Implementation
- Distributed multifrontal solver (F90, MPI based);
- Dynamic distributed scheduling;
- Use of BLAS, BLACS, ScaLAPACK.

Interfaces
- Fortran, C, Matlab, Scilab, Visual Studio.
The multifrontal method (Duff, Reid '83)

[Figure: a 5 × 5 sparse matrix A and its factors L + U - I, showing the original non-zeros and the fill-in, together with the corresponding elimination tree (factors and contribution block at each node).]

Storage divided into two parts:
- Factors systematically written to disk;
- Active storage kept in memory.

Active storage = active frontal matrix + stack of contribution blocks.
Memory behaviour (serial postorder traversal)

[Figure: evolution of the active storage (stack of contribution blocks and active frontal matrix) as an elimination tree with nodes 1, 2, 3 is processed in postorder.]
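The stack behaviour illustrated above can be modelled in a few lines. This is an illustrative sketch, not MUMPS code: `tree` (children of each node), `front` (size of a node's frontal matrix) and `cb` (size of the contribution block a node pushes for its parent) are hypothetical inputs.

```python
def peak_active_storage(tree, front, cb, root):
    """Peak of (stack of contribution blocks + active frontal matrix)
    over a serial postorder traversal of the subtree rooted at `root`."""
    def visit(node):
        peak = 0      # largest active storage seen in this subtree
        stacked = 0   # contribution blocks of already-processed children
        for child in tree.get(node, []):
            # while processing a child, earlier siblings' blocks stay stacked
            peak = max(peak, stacked + visit(child))
            stacked += cb[child]
        # assembly: all children's blocks and this frontal matrix coexist
        return max(peak, stacked + front[node])
    return visit(root)
```

Because the running peak depends on the order in which children are stacked, the traversal order chosen for the tree directly changes this value, which is the point of the slides that follow.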
Memory efficiency of MUMPS

Definition: memory efficiency on p processors (or cores)

e(p) = Sseq / (p × Smax(p)), where Sseq is the serial storage and Smax(p) is the maximum storage on a single processor.

Results: memory efficiency of MUMPS (with factors on disk)

Number p of processors |  16  |  32  |  64  | 128
AUDI_KW_1              | 0.16 | 0.12 | 0.13 | 0.10
CONESHL_MOD            | 0.28 | 0.28 | 0.22 | 0.19
CONV3D64               | 0.42 | 0.40 | 0.41 | 0.37
QIMONDA07              | 0.30 | 0.18 | 0.11 |  -
ULTRASOUND80           | 0.32 | 0.31 | 0.30 | 0.26
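The definition is easy to check numerically: e(p) = 1 means each of the p processors holds exactly 1/p of the serial storage, and e(p) drops as the most loaded processor holds more than its share. The numbers below are illustrative, not measurements from the table.

```python
def memory_efficiency(s_seq, p, s_max):
    """e(p) = Sseq / (p * Smax(p)).
    s_seq: serial storage; s_max: largest storage on any of the p procs."""
    return s_seq / (p * s_max)

# Perfect scaling: each of 16 processors holds 1/16 of the serial storage.
print(memory_efficiency(16e9, 16, 1e9))               # 1.0
# One processor holds 6.25x its fair share: efficiency degrades to 0.16.
print(round(memory_efficiency(16e9, 16, 6.25e9), 2))  # 0.16
```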
2. Limits to memory scalability
Parallel multifrontal scheme

- Type 1: nodes processed on a single processor;
- Type 2: nodes processed with a parallel 1D blocked factorization;
- Type 3: parallel 2D cyclic factorization (root node).

[Figure: mapping of the tree onto processors P0-P3 over time: static subtrees at the bottom, 1D pipelined factorizations above (static master, e.g. P2; slaves such as P3 and P0 chosen by the master at runtime, dynamically), and a static 2D decomposition at the root.]
Limits to memory scalability

[Figure: the same task-mapping diagram over time on P0-P3, highlighting where memory usage concentrates.]

- Many simultaneous active tasks;
- Large master tasks;
- Large subtrees;
- Proportional mapping.
Proportional mapping vs. postorder traversal (1/2)

[Figure: elimination tree over depths d = 0 to 4; the 512 processors at the root are recursively split among the subtrees: 256 + 256 at d = 1, then 128 per subtree at d = 2, and so on.]

Mapping
- Initially: all processors on the root node;
- Recursively split the set of processors among the child subtrees.

Advantages and drawbacks
+ Fine-grain and coarse-grain parallelism;
- Bad memory efficiency.
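The recursive split can be sketched as follows. This is an illustrative model of proportional mapping, not the MUMPS implementation: `tree` (children of each node) and `work` (workload of each subtree) are hypothetical inputs.

```python
def proportional_mapping(tree, work, node, nprocs, out=None):
    """Assign `nprocs` processors to `node`, then split them among the
    children in proportion to each child subtree's workload."""
    if out is None:
        out = {}
    out[node] = nprocs
    children = tree.get(node, [])
    total = sum(work[c] for c in children)
    for c in children:
        # each subtree keeps at least one processor
        share = max(1, round(nprocs * work[c] / total))
        proportional_mapping(tree, work, c, share, out)
    return out
```

On a balanced binary tree with 512 processors at the root, this reproduces the figure's 512 → 256 + 256 → 128 + 128 + 128 + 128 split; the memory drawback is that all these subtrees are then active at the same time.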
Proportional mapping vs. postorder traversal (2/2)

[Figure: the same elimination tree, processed node by node in postorder, with all 512 processors working on every node.]

Traversal
- Postorder traversal, node by node;
- All processors on each node.

Advantages and drawbacks
- Only fine-grain parallelism;
+ High memory efficiency.
3. A new memory-aware algorithm
Memory-aware mapping algorithm

[Figure: the same elimination tree; the set of processors is split as in proportional mapping (e.g. 512 into 256 + 256) only where memory allows, while subtrees that cannot be split are processed one after another, postorder-style, by the whole group.]

Mapping
- Initially: all processors on the root node;
- Recursively split the set of processors among the child subtrees if memory allows for it.
![Page 52: Memory-Aware Scheduling for Sparse Direct Methodseagullo/thesis/siam_cse_2009_sparse-mem… · Solving sparse linear systems Ax = b)Direct methods: A = LU Typical matrix: BRGM matrix](https://reader030.vdocuments.us/reader030/viewer/2022041114/5f23679a20dee77f6c609b32/html5/thumbnails/52.jpg)
Memory-aware mapping algorithm

Memory-aware mapping:

[Figure: the elimination tree (depths d = 0 to d = 4) annotated with the processor counts assigned by the mapping; the 512 processors are split into two groups of 256 where memory allows.]

Advantages
- (+) Robust: completion is guaranteed (if the memory per processor satisfies M0 >= Sseq/p);
- (+) Efficient: available memory provides coarse-grain parallelism.
Preliminary results
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
Preliminary results

- Excellent memory scalability:
  - memory efficiency close to 1.
- Competitive (time) efficiency:
  - close to proportional mapping (if enough memory);
  - memory provides coarse-grain parallelism.

[Figure: average number of processors per node (normalized) versus distance to the root node (depth), for proportional mapping and for memory-aware mapping with M0 = 1/32, 2/32, 5/32, and 8/32 of the sequential peak.]
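The memory efficiency referred to above can be stated as e(p) = Sseq / (p * max_i M_i), where M_i is the peak memory of processor i; e(p) = 1 means the aggregate memory of p processors is no larger than the sequential peak. A minimal sketch, assuming this definition (the function name is ours):

```python
def memory_efficiency(seq_peak, per_proc_peaks):
    """e(p) = Sseq / (p * max_i M_i); 1.0 means perfectly scalable memory."""
    p = len(per_proc_peaks)
    return seq_peak / (p * max(per_proc_peaks))
```

For example, a run on 32 processors whose per-processor peaks all equal Sseq/32 has efficiency 1.0, while one processor peaking at twice that halves the efficiency.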
Conclusion
Outline
1. MUMPS
2. Limits to memory scalability
3. A new memory-aware algorithm
4. Preliminary results
5. Conclusion
Conclusion

Prototype of a memory-aware algorithm
- Maximizes the amount of coarse-grain parallelism with respect to the amount of memory available per processor/core.
- A new static mapping was implemented, with constraints on the dynamic schedulers, and experimented with in the out-of-core (OOC) version of MUMPS.
- Very good memory scalability obtained.

On-going work
- Further tuning and validation.
- Generalization to the in-core case.
- Reinjecting dynamic information into the schedulers.