Sparse Linear Algebra: Direct Methods
TRANSCRIPT
Sparse Linear Algebra: Direct Methods
P. Amestoy and A. Buttari (INPT-ENSEEIHT),
A. Guermouche (Univ. Bordeaux-LaBRI), J.-Y. L'Excellent and Bora Uçar (INRIA-CNRS/LIP-ENS Lyon), and
F.-H. Rouet (Lawrence Berkeley National Laboratory)
2012-2013
1 / 217
Outline
Introduction to Sparse Matrix Computations
  - Motivation and main issues
  - Sparse matrices
  - Gaussian elimination
  - Conclusion
A selection of references
Books
- Duff, Erisman and Reid, Direct Methods for Sparse Matrices, Clarendon Press, Oxford, 1986.
- Dongarra, Duff, Sorensen and van der Vorst, Solving Linear Systems on Vector and Shared Memory Computers, SIAM, 1991.
- Davis, Direct Methods for Sparse Linear Systems, SIAM, 2006.
- George, Liu and Ng, Computer Solution of Sparse Positive Definite Systems, book to appear.

Articles
- Gilbert and Liu, Elimination structures for unsymmetric sparse LU factors, SIMAX, 1993.
- Liu, The role of elimination trees in sparse factorization, SIMAX, 1990.
- Heath, Ng and Peyton, Parallel algorithms for sparse linear systems, SIAM Review, 1991.
Introduction to Sparse Matrix Computations
  - Motivation and main issues
  - Sparse matrices
  - Gaussian elimination
  - Conclusion
Motivations
- Solution of linear systems of equations: a key algorithmic kernel

  Continuous problem
  ↓
  Discretization
  ↓
  Solution of a linear system Ax = b

- Main parameters:
  - Numerical properties of the linear system (symmetry, positive definiteness, conditioning, ...)
  - Size and structure:
    - Large (order 10^7 to 10^8), square/rectangular
    - Dense or sparse (structured/unstructured)
  - Target computer (sequential/parallel/multicore)

→ Algorithmic choices are critical
Motivations for designing efficient algorithms
- Time-critical applications
- Solve larger problems
- Decrease elapsed time (code optimization, parallelism)
- Minimize the cost of computations (time, memory)
Difficulties
- Access to data:
  - Computer: complex memory hierarchy (registers, multilevel cache, main memory (shared or distributed), disk)
  - Sparse matrix: large, irregular, dynamic data structures.
  → Exploit the locality of references to data on the computer (design algorithms providing such locality)
- Efficiency (time and memory):
  - The number of operations and the memory usage depend strongly on the algorithm used and on the numerical and structural properties of the problem.
  - The algorithm depends on the target computer (vector, scalar, shared, distributed, clusters of Symmetric Multi-Processors (SMP), multicore).
  → Algorithmic choices are critical
Introduction to Sparse Matrix Computations
  - Motivation and main issues
  - Sparse matrices
  - Gaussian elimination
  - Conclusion
Sparse matrices
Example:
3 x1 + 2 x2        = 5
       2 x2 − 5 x3 = 1
2 x1        + 3 x3 = 0

can be represented as Ax = b, where

      [ 3  2   0 ]       [ x1 ]       [ 5 ]
  A = [ 0  2  −5 ],  x = [ x2 ],  b = [ 1 ]
      [ 2  0   3 ]       [ x3 ]       [ 0 ]
Sparse matrix: only nonzeros are stored.
Sparse matrix ?
[Spy plot of a 592 × 592 matrix, nz = 5104]
Matrix dwt_592.rua (N = 592, NZ = 5104): structural analysis of a submarine.
Sparse matrix ?
Matrix from Computational Fluid Dynamics (collaboration with Univ. Tel Aviv).
[Spy plot of a 7000 × 7000 matrix, nz = 43105]
“Saddle-point” problem
Preprocessing sparse matrices
Original matrix (A = lhr01) and preprocessed matrix (A′(lhr01)):
[Spy plots: original and preprocessed 1400 × 1400 matrix, both with nz = 18427]
Modified problem: A′x′ = b′ with A′ = Pn P Dr A Dc Q Pt Qn
Factorization process
Solution of Ax = b

- A is unsymmetric:
  - A is factorized as A = LU, where L is a lower triangular matrix and U is an upper triangular matrix.
  - Forward-backward substitution: Ly = b, then Ux = y.
- A is symmetric:
  - positive definite: A = LL^T
  - general: A = LDL^T
- A is rectangular, m × n with m ≥ n, and min_x ‖Ax − b‖_2 is sought:
  - A = QR, where Q is orthogonal (Q^(−1) = Q^T) and R is triangular.
  - Solve: y = Q^T b, then Rx = y.
Difficulties
- Only non-zero values are stored
- Factors L and U have far more nonzeros than A
- Data structures are complex
- Computations are only a small portion of the code (the rest is data manipulation)
- Memory size is a limiting factor (→ out-of-core solvers)
Key numbers:
1. Small size: a 500 MB matrix; factors = 5 GB; flops = 100 Gflops.
2. Example of a 2D problem (Lab. Geosciences Azur, Valbonne):
   - Complex 2D finite-difference matrix, n = 16 × 10^6, 150 × 10^6 nonzeros
   - Storage (single precision): 2 GB (12 GB with the factors)
   - Flops: 10 Teraflops
3. Example of a 3D problem (EDF, Code Aster, structural engineering):
   - Real finite-element matrix, n = 10^6, nz = 71 × 10^6 nonzeros
   - Storage: 3.5 × 10^9 entries (28 GB) for the factors, 35 GB total
   - Flops: 2.1 × 10^13
Example in structural mechanics:
Car body: 227,362 unknowns, 5,757,996 nonzeros (MSC.Software).
Size of factors: 51.1 million entries. Number of operations: 44.9 × 10^9.
Example in structural mechanics:
BMW crankshaft: 148,770 unknowns, 5,396,386 nonzeros (MSC.Software).
Size of factors: 97.2 million entries. Number of operations: 127.9 × 10^9.
Data structure for sparse matrices
- The storage scheme depends on the pattern of the matrix and on the type of access required:
  - band or variable-band matrices
  - "block bordered" or block tridiagonal matrices
  - general matrices
  - row, column or diagonal access
Data formats for a general sparse matrix A
What needs to be represented:

- Assembled matrices: M × N matrix A with NNZ nonzeros.
- Elemental (unassembled) matrices: M × N matrix A with NELT elements.
- Arithmetic: real (4 or 8 bytes) or complex (8 or 16 bytes).
- Symmetric (or Hermitian) → store only part of the data.
- Distributed format?
- Duplicate entries and/or out-of-range values?
Classical Data Formats for Assembled Matrices
I Example of a 3x3 matrix with NNZ=5 nonzeros
a31
a23a22
a11
a33
1 2 3
1
2
3
- Coordinate format:

    IRN[1:NNZ] = [ 1  3  2  2  3 ]
    JCN[1:NNZ] = [ 1  1  2  3  3 ]
    VAL[1:NNZ] = [ a11  a31  a22  a23  a33 ]

- Compressed Sparse Column (CSC) format:

    IRN[1:NNZ]    = [ 1  3  2  2  3 ]
    VAL[1:NNZ]    = [ a11  a31  a22  a23  a33 ]
    COLPTR[1:N+1] = [ 1  3  4  6 ]

  Column J is stored in IRN/VAL locations COLPTR(J) ... COLPTR(J+1)−1.

- Compressed Sparse Row (CSR) format: similar to CSC, but row by row.
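As an illustrative sketch (0-based Python indices, unlike the 1-based Fortran arrays above; placeholder numeric values stand in for a11, a22, ...), the same 3 × 3 example in coordinate and CSC form:

```python
# Coordinate (COO) format: three parallel arrays.
irn = [0, 2, 1, 1, 2]          # row indices
jcn = [0, 0, 1, 2, 2]          # column indices
val = [11.0, 31.0, 22.0, 23.0, 33.0]

# Compressed Sparse Column (CSC): entries sorted by column, plus column
# pointers: colptr[j] .. colptr[j+1]-1 spans column j.
csc_irn = [0, 2, 1, 1, 2]
csc_val = [11.0, 31.0, 22.0, 23.0, 33.0]
colptr = [0, 2, 3, 5]

def csc_to_dense(n, colptr, irn, val):
    """Expand a CSC matrix into a dense list of lists (for checking)."""
    a = [[0.0] * n for _ in range(n)]
    for j in range(n):
        for k in range(colptr[j], colptr[j + 1]):
            a[irn[k]][j] = val[k]
    return a

dense = csc_to_dense(3, colptr, csc_irn, csc_val)
```

Expanding back to dense form recovers the pattern of the slide: nonzeros at (1,1), (3,1), (2,2), (2,3), (3,3) in 1-based terms.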
Classical Data Formats for Assembled Matrices
- Example of the same 3 × 3 matrix with NNZ = 5 nonzeros:

        [ a11   0    0  ]
    A = [  0   a22  a23 ]
        [ a31   0   a33 ]
- Diagonal format (M = N):

    NDIAG = 3
    IDIAG = [ −2  0  1 ]

          [ na   a11   0  ]
    VAL = [ na   a22  a23 ]   (na: not accessed)
          [ a31  a33  na  ]

  VAL(i,j) corresponds to A(i, i+IDIAG(j)) (for 1 ≤ i + IDIAG(j) ≤ N).
Sparse Matrix-vector products Y ← AX
Algorithm depends on sparse matrix format:
- Coordinate format:

    Y(1:M) = 0
    DO k = 1, NNZ
      Y(IRN(k)) = Y(IRN(k)) + VAL(k) * X(JCN(k))
    ENDDO

- CSC format:

    Y(1:M) = 0
    DO J = 1, N
      Xj = X(J)
      DO k = COLPTR(J), COLPTR(J+1)-1
        Y(IRN(k)) = Y(IRN(k)) + VAL(k) * Xj
      ENDDO
    ENDDO

- CSR format:

    DO I = 1, M
      Yi = 0
      DO k = ROWPTR(I), ROWPTR(I+1)-1
        Yi = Yi + VAL(k) * X(JCN(k))
      ENDDO
      Y(I) = Yi
    ENDDO
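The CSR loop transcribes almost line for line; a 0-based Python sketch, exercised on the 3 × 3 example matrix (placeholder values 11, 22, ... for a11, a22, ...):

```python
def spmv_csr(m, rowptr, jcn, val, x):
    """y = A*x for A in CSR format (0-based); mirrors the CSR loop above."""
    y = [0.0] * m
    for i in range(m):
        yi = 0.0
        for k in range(rowptr[i], rowptr[i + 1]):
            yi += val[k] * x[jcn[k]]
        y[i] = yi
    return y

# The 3x3 example in CSR: rows (a11), (a22 a23), (a31 a33).
rowptr = [0, 1, 3, 5]
jcn = [0, 1, 2, 0, 2]
val = [11.0, 22.0, 23.0, 31.0, 33.0]
y = spmv_csr(3, rowptr, jcn, val, [1.0, 1.0, 1.0])
```

Note the access pattern: CSR accumulates into a scalar per row, while CSC and COO scatter into Y through IRN, which is harder on the memory system.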
Example of elemental matrix format
A1 (variables 1, 2, 3) and A2 (variables 3, 4, 5):

       [ −1  2  3 ]        [ 2  −1   3 ]
  A1 = [  2  1  1 ] , A2 = [ 1   2  −1 ]
       [  1  1  1 ]        [ 3   2   1 ]

      [ −1  2  3   0   0 ]
      [  2  1  1   0   0 ]
  A = [  1  1  3  −1   3 ] = A1 + A2
      [  0  0  1   2  −1 ]
      [  0  0  3   2   1 ]
- N = 5, NELT = 2, NVAR = 6, A = Σ_{i=1}^{NELT} Ai

    ELTPTR[1:NELT+1] = [ 1 4 7 ]
    ELTVAR[1:NVAR]   = [ 1 2 3 3 4 5 ]
    ELTVAL[1:NVAL]   = [ -1 2 1 2 1 1 3 1 1 2 1 3 -1 2 2 3 -1 1 ]

- Remarks:
  - NVAR = ELTPTR(NELT+1) − 1
  - Order of element i: Si = ELTPTR(i+1) − ELTPTR(i)
  - NVAL = Σ Si² (unsymmetric) or Σ Si(Si+1)/2 (symmetric)
  - Elements are stored in ELTVAL by columns.
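The assembly A = Σ Ai can be sketched directly from the ELTPTR/ELTVAR/ELTVAL arrays (0-based Python, column-major element storage as above):

```python
# Unassembled storage of the two elements above (0-based variable indices).
eltptr = [0, 3, 6]                      # eltvar[eltptr[e]:eltptr[e+1]] = vars of element e
eltvar = [0, 1, 2, 2, 3, 4]             # global variables 1 2 3 and 3 4 5, 0-based
eltval = [-1, 2, 1, 2, 1, 1, 3, 1, 1,   # A1, stored by columns
          2, 1, 3, -1, 2, 2, 3, -1, 1]  # A2, stored by columns

def assemble(n, eltptr, eltvar, eltval):
    """Scatter-add every element matrix into a dense n x n matrix."""
    a = [[0.0] * n for _ in range(n)]
    pos = 0
    for e in range(len(eltptr) - 1):
        var = eltvar[eltptr[e]:eltptr[e + 1]]
        s = len(var)                     # order of element e
        for j in range(s):               # column-major element storage
            for i in range(s):
                a[var[i]][var[j]] += eltval[pos]
                pos += 1
    return a

A = assemble(5, eltptr, eltvar, eltval)
```

Variable 3 (0-based index 2) is shared by both elements, so its diagonal entry accumulates 1 + 2 = 3, as in the assembled matrix on the slide.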
File storage: Rutherford-Boeing
- Standard ASCII file format
- Header + data (CSC format). Key xyz:
  - x = [rcp] (real, complex, pattern)
  - y = [suhzr] (symmetric, unsymmetric, Hermitian, skew-symmetric (A = −A^T), rectangular)
  - z = [ae] (assembled, elemental)
  - e.g. M_T1.RSA, SHIP003.RSE
- Supplementary files: right-hand sides, solution, permutations, ...
- A canonical format was introduced to guarantee a unique representation (order of entries in each column, no duplicates).
File storage: Rutherford-Boeing
DNV-Ex 1 : Tubular joint-1999-01-17 M_T1
1733710 9758 492558 1231394 0
rsa 97578 97578 4925574 0
(10I8) (10I8) (3e26.16)
1 49 96 142 187 231 274 346 417 487
556 624 691 763 834 904 973 1041 1108 1180
1251 1321 1390 1458 1525 1573 1620 1666 1711 1755
1798 1870 1941 2011 2080 2148 2215 2287 2358 2428
2497 2565 2632 2704 2775 2845 2914 2982 3049 3115
...
1 2 3 4 5 6 7 8 9 10
11 12 49 50 51 52 53 54 55 56
57 58 59 60 67 68 69 70 71 72
223 224 225 226 227 228 229 230 231 232
233 234 433 434 435 436 437 438 2 3
4 5 6 7 8 9 10 11 12 49
50 51 52 53 54 55 56 57 58 59
...
-0.2624989288237320E+10 0.6622960540857440E+09 0.2362753266740760E+11
0.3372081648690030E+08 -0.4851430162799610E+08 0.1573652896140010E+08
0.1704332388419270E+10 -0.7300763190874110E+09 -0.7113520995891850E+10
0.1813048723097540E+08 0.2955124446119170E+07 -0.2606931100955540E+07
0.1606040913919180E+07 -0.2377860366909130E+08 -0.1105180386670390E+09
0.1610636280324100E+08 0.4230082475435230E+07 -0.1951280618776270E+07
0.4498200951891750E+08 0.2066239484615530E+09 0.3792237438608430E+08
0.9819999042370710E+08 0.3881169368090200E+08 -0.4624480572242580E+08
File storage: Matrix-market
- Example:
%%MatrixMarket matrix coordinate real general
% Comments
5 5 8
1 1 1.000e+00
2 2 1.050e+01
3 3 1.500e-02
1 4 6.000e+00
4 2 2.505e+02
4 4 -2.800e+02
4 5 3.332e+01
5 5 1.200e+01
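A minimal, illustrative reader for this coordinate layout (not a full Matrix Market parser: it skips comment lines and assumes the `real general` case shown above):

```python
def read_matrix_market(lines):
    """Parse coordinate 'real general' data into 0-based (i, j, value) triples."""
    it = (ln for ln in lines if ln.strip() and not ln.startswith('%'))
    m, n, nnz = map(int, next(it).split())    # size line: M N NNZ
    entries = []
    for ln in it:
        i, j, v = ln.split()
        entries.append((int(i) - 1, int(j) - 1, float(v)))  # 1-based -> 0-based
    assert len(entries) == nnz
    return m, n, entries

text = """%%MatrixMarket matrix coordinate real general
% Comments
5 5 8
1 1 1.000e+00
2 2 1.050e+01
3 3 1.500e-02
1 4 6.000e+00
4 2 2.505e+02
4 4 -2.800e+02
4 5 3.332e+01
5 5 1.200e+01"""
m, n, entries = read_matrix_market(text.splitlines())
```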
Examples of sparse matrix collections
- The University of Florida Sparse Matrix Collection: http://www.cise.ufl.edu/research/sparse/matrices/
- Matrix Market: http://math.nist.gov/MatrixMarket/
- Rutherford-Boeing: http://www.cerfacs.fr/algor/Softs/RB/index.html
- TLSE: http://gridtlse.org/
Introduction to Sparse Matrix Computations
  - Motivation and main issues
  - Sparse matrices
  - Gaussian elimination
  - Conclusion
Gaussian elimination
A = A^(1), b = b^(1), A^(1) x = b^(1):

  [ a11  a12  a13 ] [x1]   [b1]
  [ a21  a22  a23 ] [x2] = [b2]
  [ a31  a32  a33 ] [x3]   [b3]

row 2 ← row 2 − row 1 × a21/a11
row 3 ← row 3 − row 1 × a31/a11

A^(2) x = b^(2):

  [ a11  a12      a13     ] [x1]   [ b1     ]
  [ 0    a22^(2)  a23^(2) ] [x2] = [ b2^(2) ]
  [ 0    a32^(2)  a33^(2) ] [x3]   [ b3^(2) ]

with b2^(2) = b2 − a21 b1 / a11, ...,  a32^(2) = a32 − a31 a12 / a11, ...

Finally, A^(3) x = b^(3):

  [ a11  a12      a13     ] [x1]   [ b1     ]
  [ 0    a22^(2)  a23^(2) ] [x2] = [ b2^(2) ]
  [ 0    0        a33^(3) ] [x3]   [ b3^(3) ]

with a33^(3) = a33^(2) − a32^(2) a23^(2) / a22^(2), ...

Typical Gaussian elimination step k:

  aij^(k+1) = aij^(k) − aik^(k) akj^(k) / akk^(k)
Relation with A = LU factorization
- One step of Gaussian elimination can be written A^(k+1) = L^(k) A^(k), with

          [ 1                        ]
          [    .                     ]
  L^(k) = [       1                  ]   and   lik = aik^(k) / akk^(k).
          [      −l_{k+1,k}  .       ]
          [      −l_{n,k}         1  ]

- Then A^(n) = U = L^(n−1) ... L^(1) A, which gives A = LU with L = [L^(1)]^(−1) ... [L^(n−1)]^(−1), the unit lower triangular matrix whose subdiagonal entries are the multipliers lij.
- In dense codes, the entries of L and U overwrite the entries of A.
- Furthermore, if A is symmetric, A = LDL^T with dkk = akk^(k): A = LU = A^T = U^T L^T implies U (L^T)^(−1) = L^(−1) U^T = D diagonal, hence U = D L^T, and thus A = L (D L^T) = L D L^T.
Dense LU factorization
- Step by step, the columns of A are zeroed and A is updated: L^(n−1) ... L^(1) A = U, leading to A = LU where L = [L^(1)]^(−1) ... [L^(n−1)]^(−1).
- In-place storage: the zeroed entries in each column of A can be replaced by the entries of L, and the row entries of U can be stored in the corresponding locations of A.

Algorithm 1: Dense LU factorization
  for k = 1 to n do
    L(k,k) = 1;  L(k+1:n, k) = A(k+1:n, k) / A(k,k)
    U(k, k:n) = A(k, k:n)
    for j = k+1 to n do
      for i = k+1 to n do
        A(i,j) = A(i,j) − L(i,k) × U(k,j)
      end for
    end for
  end for

When |A(k,k)| is relatively too small, numerical pivoting is required.
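Algorithm 1 transcribes almost line for line; a 0-based Python sketch without pivoting, applied to the 3 × 3 system from the introduction:

```python
def dense_lu(a):
    """Dense LU without pivoting: returns unit lower triangular L and upper U."""
    n = len(a)
    a = [row[:] for row in a]            # work on a copy
    L = [[0.0] * n for _ in range(n)]
    U = [[0.0] * n for _ in range(n)]
    for k in range(n):
        L[k][k] = 1.0
        for i in range(k + 1, n):
            L[i][k] = a[i][k] / a[k][k]  # multipliers (column k of L)
        for j in range(k, n):
            U[k][j] = a[k][j]            # row k of U
        for j in range(k + 1, n):        # update the trailing submatrix
            for i in range(k + 1, n):
                a[i][j] -= L[i][k] * U[k][j]
    return L, U

A = [[3.0, 2.0, 0.0], [0.0, 2.0, -5.0], [2.0, 0.0, 3.0]]
L, U = dense_lu(A)
```

Multiplying L and U back together recovers A, which is the defining property of the factorization; with A(2,2) = 0 a pivot of zero would occur if the first two rows were swapped, which is exactly why pivoting is needed.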
Gaussian elimination and sparsity
Step k of the LU factorization (akk pivot):

- For i > k, compute lik = aik / akk (= a′ik).
- For i > k, j > k, update the remaining rows/columns of the matrix:

    a′ij = aij − (aik × akj) / akk = aij − lik × akj

- If aik ≠ 0 and akj ≠ 0, then a′ij ≠ 0.
- If aij was zero, its new non-zero value must be stored: this is fill-in.

[Figure: positions (k,k), (k,j), (i,k) nonzero; the update of position (i,j), initially zero, creates a fill-in.]
Gaussian elimination and sparsity
- The same holds for Cholesky:
  - For i > k, compute lik = aik / √akk (= a′ik).
  - For i > k, j > k, j ≤ i (lower triangular part), update:

      a′ij = aij − (aik × ajk) / akk = aij − lik × ljk
Gaussian elimination and sparsity
- Interest of permuting a matrix (exchanging variables 1 and 5 of an arrow matrix):

    [ X X X X X ]           [ X 0 0 0 X ]
    [ X X 0 0 0 ]           [ 0 X 0 0 X ]
    [ X 0 X 0 0 ]   1 ↔ 5   [ 0 0 X 0 X ]
    [ X 0 0 X 0 ]           [ 0 0 0 X X ]
    [ X 0 0 0 X ]           [ X X X X X ]

- Ordering the variables has a strong impact on:
  - fill-in
  - the number of operations
  - the shape of the dependency graph (tree) and parallelism
- Fill reduction is NP-hard in general [Yannakakis '81].
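The effect of ordering on fill-in can be demonstrated with a small symbolic-elimination sketch (pattern only, no numerical values); the `fill_in` and `arrowhead` helpers are ours, mirroring the 5 × 5 arrow matrix above:

```python
def fill_in(pattern):
    """Count fill-in from symbolic elimination; pattern: set of (i, j) nonzeros
    (symmetric, diagonal included)."""
    nz = set(pattern)
    n = 1 + max(i for i, _ in nz)
    fill = 0
    for k in range(n):                       # eliminate variable k
        rows = [i for (i, j) in nz if j == k and i > k]
        for i in rows:
            for j in rows:
                if (i, j) not in nz:         # a_ik != 0 and a_kj != 0 ...
                    nz.add((i, j))           # ... => fill at (i, j)
                    fill += 1
    return fill

def arrowhead(n, dense_first):
    """Arrow matrix pattern: dense row/column at index 0 or at index n-1."""
    c = 0 if dense_first else n - 1
    nz = {(i, i) for i in range(n)}
    nz |= {(c, j) for j in range(n)} | {(i, c) for i in range(n)}
    return nz

bad = fill_in(arrowhead(5, dense_first=True))    # dense variable first
good = fill_in(arrowhead(5, dense_first=False))  # dense variable last
```

Eliminating the dense variable first fills the entire trailing 4 × 4 block (12 new entries); eliminating it last produces no fill at all, which is the point of the 1 ↔ 5 exchange above.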
Illustration: Reverse Cuthill-McKee on matrix dwt 592.rua
Harwell-Boeing matrix dwt_592.rua, structural computing on a submarine. NZ(LU factors) = 58202.
[Spy plots: original matrix (nz = 5104) and factorized matrix (nz = 58202)]
Illustration: Reverse Cuthill-McKee on matrix dwt 592.rua
NZ(LU factors) = 16924

[Spy plots: RCM-permuted matrix (nz = 5104) and factorized permuted matrix (nz = 16924)]
Table: Benefits of sparsity on a matrix of order 2021 × 2021 with 7353 nonzeros (Dongarra et al., 1991).

  Procedure                    Total storage   Flops          Time (sec.) on CRAY J90
  Full system                  4084 Kwords     5503 × 10^6    34.5
  Sparse system                71 Kwords       1073 × 10^6    3.4
  Sparse system + reordering   14 Kwords       42 × 10^3      0.9
Sparse LU factorization : a (too) simple algorithm
Only nonzeros are stored and operated on.

Algorithm 2: Simple sparse LU factorization
  Permute matrix A to reduce fill-in and flops (NP-complete problem)
  for k = 1 to n do
    L(k,k) = 1;  for nonzeros in column k: L(k+1:n, k) = A(k+1:n, k) / A(k,k)
    U(k, k:n) = A(k, k:n)
    for j = k+1 to n, limited to nonzeros in row U(k,:) do
      for i = k+1 to n, limited to nonzeros in column L(:,k) do
        A(i,j) = A(i,j) − L(i,k) × U(k,j)
      end for
    end for
  end for

Questions:
- A dynamic data structure for A is needed to accommodate fill-in.
- Data access efficiency; can we predict the positions of fill-in?
- |A(k,k)| too small → numerical permutation needed!
Control of numerical stability: numerical pivoting
- In dense linear algebra, partial pivoting is commonly used (at each step the largest entry in the column is selected).
- In sparse linear algebra, flexibility to preserve sparsity is offered:
  - Partial threshold pivoting: eligible pivots are those not too small with respect to the maximum in the column.

      Set of eligible pivots = { r : |ark^(k)| ≥ u × max_i |aik^(k)| },  where 0 < u ≤ 1.

  - Among the eligible pivots, select one that better preserves sparsity.
  - u is called the threshold parameter (u = 1 → partial pivoting).
  - It restricts the maximum possible growth of aij = aij − (aik × akj) / akk to 1 + 1/u, which is sufficient to preserve numerical stability.
  - u ≈ 0.1 is often chosen in practice.
- For symmetric indefinite problems, 2 × 2 pivots (with threshold) are also used to preserve symmetry and sparsity.
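A minimal sketch of the eligible-pivot test (the candidate list and the helper name are illustrative, not from any particular solver):

```python
def eligible_pivots(col_k, u=0.1):
    """col_k: list of (row, value) pivot candidates in column k; 0 < u <= 1.
    Returns the rows whose entries are within a factor u of the column maximum."""
    amax = max(abs(v) for _, v in col_k)
    return [r for r, v in col_k if abs(v) >= u * amax]

# Hypothetical candidates in column k: (row index, value).
candidates = [(2, 0.05), (4, -3.0), (7, 0.5), (9, 1.0)]
rows = eligible_pivots(candidates, u=0.1)
```

With u = 0.1, any entry of magnitude at least 0.3 is eligible, and a sparsity criterion (e.g. fewest nonzeros in the row) picks among them; u = 1.0 degenerates to partial pivoting, leaving only the largest entry.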
Threshold pivoting and numerical accuracy
Table: Effect of variation in the threshold parameter u on a 541 × 541 matrix with 4285 nonzeros (Dongarra et al., 1991).

  u         Nonzeros in LU factors   Error
  1.0       16767                    3 × 10^−9
  0.25      14249                    6 × 10^−10
  0.1       13660                    4 × 10^−9
  0.01      15045                    1 × 10^−5
  10^−4     16198                    1 × 10^2
  10^−10    16553                    3 × 10^23
Difficulty: numerical pivoting implies dynamic data structures that cannot be forecast symbolically.
Three-phase scheme to solve Ax = b
1. Analysis step
   - Preprocessing of A (symmetric/unsymmetric orderings, scalings)
   - Build the dependency graph (elimination tree, eDAG, ...)
   - Apre = P Dr A Qc Dc P^T
2. Factorization (Apre = LU, LDL^T, LL^T, or QR), with numerical pivoting
3. Solution based on the factored matrices
   - Triangular solves: Ly = bpre, then U xpre = y
   - Improvement of the solution (iterative refinement), error analysis
Efficient implementation of sparse algorithms
- Indirect addressing is often used in sparse calculations, e.g. a sparse SAXPY:

    do i = 1, m
      A( ind(i) ) = A( ind(i) ) + alpha * w( i )
    enddo

- Even if manufacturers provide hardware support for indirect addressing, it penalizes performance.
- Identify dense blocks, or switch to dense calculations as soon as the matrix is not sparse enough.
Effect of switch to dense calculations
Matrix from a 5-point discretization of the Laplacian on a 50 × 50 grid (Dongarra et al., 1991):

  Density for switch   Order of         Millions   Time
  to full code         full submatrix   of flops   (seconds)
  No switch            0                7          21.8
  1.00                 74               7          21.4
  0.80                 190              8          15.0
  0.60                 235              11         12.5
  0.40                 305              21         9.0
  0.20                 422              50         5.5
  0.10                 531              100        3.7
  0.005                1420             1908       6.1

The sparse structure should be exploited if the density is below 10%.
Introduction to Sparse Matrix Computations
  - Motivation and main issues
  - Sparse matrices
  - Gaussian elimination
  - Conclusion
Complexity of sparse direct methods
Regular problems (nested dissection):

                                 2D: N × N grid   3D: N × N × N grid
  Nonzeros in original matrix    Θ(N^2)           Θ(N^3)
  Nonzeros in factors            Θ(N^2 log N)     Θ(N^4)
  Floating-point ops             Θ(N^3)           Θ(N^6)

3D example in earth science: acoustic wave propagation, 27-point finite-difference grid. Current goal [Operto and Virieux, http://seiscope.oca.eu/]: LU on the complete earth.

Extrapolation to a 1000 × 1000 × 1000 grid: 15 exaflops, 200 TBytes for the factors, 32 TBytes of working memory!
Summary – sparse matrices main issues
- Widely used in engineering and industry (critical for the performance of simulations)
- Irregular dynamic data structures of very large size (Tera/Petabytes)
- Strong relations between sparse matrices and graphs
- Which graph for which matrix (symmetric, unsymmetric, triangular)?
- Efficient algorithms needed for:
  - choosing the graph for each algorithm
  - reordering sparse matrices
  - predicting the data structures of the factor matrices
  - modeling the factorization and accommodating numerical issues
Outline
Graph definitions and relations to sparse matrices
  - Which graph for which matrix?
  - Trees and spanning trees
  - Ordering and tree traversal
  - Peripheral and pseudo-peripheral vertices
  - Graph matching and matrix permutation
  - Reducibility and connected components
  - Definition of hypergraphs

Postorderings and memory usage
Graph notations and definitions
A graph G = (V, E) consists of a finite set V, called the vertex set, and a finite binary relation E on V, called the edge set.

Three standard graph models:

- Undirected graph: the edges are unordered pairs of vertices, i.e., {u, v} ∈ E for some u, v ∈ V.
- Directed graph: the edges are ordered pairs of vertices, that is, (u, v) and (v, u) are two different edges.
- Bipartite graph: G = (U ∪ V, E) consists of two disjoint vertex sets U and V such that for each edge (u, v) ∈ E, u ∈ U and v ∈ V.

An ordering or labelling of G = (V, E) having n vertices, i.e., |V| = n, is a mapping of V onto {1, 2, ..., n}.
Matrices and graphs: Rectangular matrices
The rows/columns and nonzeros of a given sparse matrix correspond (withnatural labelling) to the vertices and edges, respectively, of a graph.
Rectangular matrices: the bipartite graph model

[Figure: a 3 × 4 matrix with two nonzeros per row, and its bipartite graph with row vertices r1, r2, r3 and column vertices c1, c2, c3, c4.]

The set of rows corresponds to one vertex set R and the set of columns corresponds to the other vertex set C, such that for each aij ≠ 0, (ri, cj) is an edge.
![Page 60: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/60.jpg)
Matrices and graphs: Square unsymmetric pattern
The rows/columns and nonzeros of a given sparse matrix correspond (withnatural labelling) to the vertices and edges, respectively, of a graph.
Square unsymmetric pattern matrices
[Figure: a 3×3 square unsymmetric-pattern matrix A, its bipartite graph as before, and its directed graph on vertices v1, v2, v3]
The set of rows/columns corresponds to the vertex set V such that for each aij ≠ 0, (vi, vj) is an edge. The transposed view is possible too, i.e., the edge (vi, vj) directed from column i to row j. Usually self-loops are omitted. 60 / 217
![Page 61: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/61.jpg)
Matrices and graphs: Square unsymmetric pattern
A special subclass
Directed acyclic graph (DAG): a directed graph with no cycles (except possibly self-loops).
DAGs
We can sort the vertices suchthat if (u, v) is an edge, then uappears before v in the ordering.
Question: What kind of matrices have a DAG structure?
61 / 217
![Page 62: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/62.jpg)
Matrices and graphs: Symmetric pattern
The rows/columns and nonzeros of a given sparse matrix correspond (withnatural labelling) to the vertices and edges, respectively, of a graph.
Square symmetricpattern matrices
[Figure: a 3×3 symmetric-pattern matrix A and its undirected graph on vertices v1, v2, v3]
The set of rows/columns corresponds to the vertex set V such that for each pair aij ≠ 0, aji ≠ 0, {vi, vj} is an edge. No self-loops; usually the main diagonal is assumed to be zero-free. 62 / 217
![Page 63: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/63.jpg)
Definitions: Edges, degrees, and paths
Many definitions for directed and undirected graphs are the same. We will use (u, v) to refer to an edge of an undirected or directed graph to avoid repeated definitions.
I An edge (u, v) is said to be incident on the vertices u and v.
I For any vertex u, the vertices in adj(u) = {v : (u, v) ∈ E} are called the neighbors of u; they are said to be adjacent to u.
I The degree of a vertex is the number of edges incident on it.
I A path p of length k is a sequence of vertices 〈v0, v1, . . . , vk〉where (vi−1, vi ) ∈ E for i = 1, . . . , k. The two end points v0 andvk are said to be connected by the path p, and the vertex vk issaid to be reachable from v0.
63 / 217
![Page 64: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/64.jpg)
Definitions: Components
I An undirected graph is said to be connected if every pair ofvertices is connected by a path.
I The connected components of an undirected graph are the equivalence classes of vertices under the “is reachable from” relation.
I A directed graph is said to be strongly connected if every vertex is reachable from every other vertex.
I The strongly connected components of a directed graph are theequivalence classes of vertices under the “are mutuallyreachable” relation.
64 / 217
![Page 65: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/65.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
65 / 217
![Page 66: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/66.jpg)
Definitions: Trees and spanning trees
A tree is a connected, acyclic, undirected graph. If an undirectedgraph is acyclic but disconnected, then it is a forest.
Properties of trees
I Any two vertices are connected by a unique path.
I |E | = |V | − 1
A rooted tree is a tree with a distinguished vertex r , called the root.
There is a unique path from the root r to every other vertex v . Anyvertex y in that path is called an ancestor of v . If y is an ancestor ofv , then v is a descendant of y .
The subtree rooted at v is the tree induced by the descendants of v ,rooted at v .
A spanning tree of a connected graph G = (V ,E ) is a treeT = (V ,F ), such that F ⊆ E .
66 / 217
![Page 67: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/67.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
67 / 217
![Page 68: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/68.jpg)
Ordering of the vertices of a rooted tree
I A topological ordering of a rooted tree is an ordering that numbers child vertices before their parents.
I A postorder is a topological ordering which numbers the verticesin any subtree consecutively.
[Figure: a connected graph G on vertices u, v, w, x, y, z; a rooted spanning tree with a topological ordering (1–6); and the same rooted spanning tree with a postordering]
68 / 217
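A postorder can be produced by a depth-first traversal that numbers a vertex only after all of its children; a minimal Python sketch (the example tree is an assumption, not the slides' figure):

```python
# Sketch of a postorder numbering of a rooted tree; the 'children' map and
# vertex names are illustrative assumptions.
def postorder(children, root):
    """Return vertices so that every subtree is consecutive and children
    come before their parent (a postorder is in particular topological)."""
    order = []
    def visit(v):
        for c in children.get(v, []):
            visit(c)
        order.append(v)   # number v only after its whole subtree
    visit(root)
    return order

children = {"w": ["u", "y"], "y": ["x", "z"], "z": ["v"]}
print(postorder(children, "w"))   # ['u', 'x', 'v', 'z', 'y', 'w']
```

Note how the vertices of each subtree (e.g. the subtree rooted at y) appear consecutively in the output, which is what makes postorderings useful for memory management later on.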
![Page 69: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/69.jpg)
How to explore a graph:
Two algorithms such that each edge is traversed exactly once in the forward and reverse direction and each vertex is visited. (Connected graphs are assumed in the description of the algorithms.)
I Depth first search: Starting from a given vertex (mark it), follow a path (marking each vertex on the path) until the current vertex is adjacent only to marked vertices. Return to the previous vertex and continue.
I Breadth first search: Select a vertex and put it into an initially empty queue of vertices to be visited. Repeatedly remove the vertex x at the head of the queue and place all neighbors of x that were not enqueued before into the queue.
69 / 217
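The two explorations can be sketched as follows; minimal Python versions assuming the graph is given as an adjacency dict (the example graph is illustrative):

```python
from collections import deque

def bfs(adj, start):
    visited, order, queue = {start}, [], deque([start])
    while queue:
        x = queue.popleft()          # head of the queue
        order.append(x)
        for y in adj[x]:
            if y not in visited:     # enqueue each vertex at most once
                visited.add(y)
                queue.append(y)
    return order

def dfs(adj, start):
    visited, order = set(), []
    def visit(u):                    # follow a path while possible,
        visited.add(u)               # then backtrack to the previous vertex
        order.append(u)
        for v in adj[u]:
            if v not in visited:
                visit(v)
    visit(start)
    return order

adj = {"v": ["z", "y"], "z": ["v", "y"], "y": ["v", "z", "x"], "x": ["y"]}
print(bfs(adj, "v"), dfs(adj, "v"))   # ['v', 'z', 'y', 'x'] ['v', 'z', 'y', 'x']
```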
![Page 70: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/70.jpg)
Illustration of DFS and BFS exploration
[Figure: a connected graph on vertices s, t, u, v, w, x, y, z; its breadth-first spanning tree rooted at v (vertices numbered 1–8); and its depth-first spanning tree rooted at v (vertices numbered 1–8)]
70 / 217
![Page 71: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/71.jpg)
Properties of DFS and BFS exploration
I If the graph G is not connected, the algorithm is restarted on each unvisited part; in fact it then detects the connected components of G.
I Each vertex is visited once.
I The edges visited by the DFS algorithm form a spanning forest of a general graph G, the so-called depth first spanning forest of G.
I The edges visited by the BFS algorithm form a spanning forest of a general graph G, the so-called breadth first spanning forest of G.
71 / 217
![Page 72: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/72.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
72 / 217
![Page 73: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/73.jpg)
Graph shapes and peripheral vertices
I The distance d(u, v) between two vertices of a graph is the length of a shortest path joining them.
I The eccentricity l(v) = max{d(v, w) : w ∈ V}
I The diameter δ(G) = max{l(v) : v ∈ V}
I v is a peripheral vertex if l(v) = δ(G)
I Example of a connected graph with δ(G) = 5 and peripheral vertices s, w, x:
[Figure: the example graph on vertices s, t, u, v, w, x, y, z]
73 / 217
![Page 74: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/74.jpg)
Pseudo-peripheral vertices
I The rooted level structure L(v) of G is a partitioning L(v) = {L0(v), . . . , Ll(v)(v)} such that L0(v) = {v} and Li(v) = Adj(Li−1(v)) \ Li−2(v).
I The eccentricity l(v) is also called the length of L(v). The width w(v) of L(v) is defined as w(v) = max{|Li(v)| : 0 ≤ i ≤ l(v)}.
I The aspect ratio of L(v) is defined as l(v)/w(v).
I A pseudo-peripheral vertex is a node with a large eccentricity.
I Algorithm to determine a pseudo-peripheral vertex: let r0 be an arbitrary node and nblevels = l(r0). Iteratively select the vertex ri+1 of minimum degree in the last level Ll(ri)(ri); while l(ri+1) > nblevels, set nblevels = l(ri+1) and continue, stopping (at ri) as soon as the eccentricity no longer increases.
74 / 217
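The pseudo-peripheral search above (George–Liu style) can be sketched as follows; a minimal Python version assuming the graph is an adjacency dict (the path-graph example is illustrative):

```python
# Sketch of the pseudo-peripheral vertex search described above; the input
# format and the example are assumptions, not the slides' graph.
def level_structure(adj, r):
    """Return the rooted level structure L(r) as a list of levels."""
    levels, seen, frontier = [], {r}, [r]
    while frontier:
        levels.append(frontier)
        nxt = []
        for u in frontier:
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    nxt.append(v)
        frontier = nxt
    return levels

def pseudo_peripheral(adj, r):
    levels = level_structure(adj, r)
    while True:
        # candidate of minimum degree in the last level
        cand = min(levels[-1], key=lambda v: len(adj[v]))
        cand_levels = level_structure(adj, cand)
        if len(cand_levels) <= len(levels):   # eccentricity no longer grows
            return r
        r, levels = cand, cand_levels

path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
print(pseudo_peripheral(path, "c"))   # a  (an endpoint of the path graph)
```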
![Page 75: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/75.jpg)
Pseudo-peripheral vertices
[Figure: first step, select v and build its level structure L(v) with levels L0(v)–L3; second step, select w of minimum degree in L3 and build L(w) with levels L0(w)–L5]
75 / 217
![Page 76: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/76.jpg)
Permutation matrices
A permutation matrix is a square (0, 1)-matrix where each row andcolumn has a single 1.
If P is a permutation matrix, PPT = I , i.e., it is an orthogonalmatrix. Let,
A is a 3×3 sparse matrix, and suppose we want to permute its columns into the order [2, 1, 3]. Define p2,1 = 1, p1,2 = 1, p3,3 = 1 (in general, if column j is to be at position i, set pji = 1), and B = AP.
[Figure: the patterns of A, B = AP, and the permutation matrix P]
76 / 217
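The rule pji = 1 (column j of A goes to position i) can be checked with a small dense sketch in Python; the 2×3 matrix is an assumed example:

```python
# Column permutation B = AP with p[j][i] = 1 when column j of A lands at
# position i; dense 0/1 lists stand in for the sparse patterns.
def perm_matrix(perm):
    """perm[i] = j means column j of A goes to position i (so p_{j,i} = 1)."""
    n = len(perm)
    P = [[0] * n for _ in range(n)]
    for i, j in enumerate(perm):
        P[j][i] = 1
    return P

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

A = [[1, 2, 3],
     [4, 5, 6]]
P = perm_matrix([1, 0, 2])      # columns reordered as [2, 1, 3] (1-based)
print(matmul(A, P))             # [[2, 1, 3], [5, 4, 6]]
```

Since P has a single 1 per row and column, PP^T = I, which is the orthogonality property stated above.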
![Page 77: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/77.jpg)
Symmetric matrices and graphs
I Remarks:
I Number of nonzeros in column j = |adjG(j)|
I Symmetric permutation ≡ renumbering the graph
[Figure: a 5×5 symmetric matrix and its corresponding graph on vertices 1–5]
77 / 217
![Page 78: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/78.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
78 / 217
![Page 79: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/79.jpg)
Matching in bipartite graphs and permutations
A matching in a graph is a set of edges no two of which share acommon vertex. We will be mostly dealing with matchings inbipartite graphs.
In matrix terms, a matching in the bipartite graph of a matrixcorresponds to a set of nonzero entries no two of which are in thesame row or column.
A vertex is said to be matched if there is an edge in the matchingincident on the vertex, and to be unmatched otherwise. In a perfectmatching, all vertices are matched.
The cardinality of a matching is the number of edges in it. Amaximum cardinality matching or a maximum matching is amatching of maximum cardinality. Solvable in polynomial time.
79 / 217
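A maximum matching in the bipartite graph of a matrix can be found by repeated augmenting-path searches; a minimal Python sketch of the classical O(V·E) scheme (the nonzero pattern below is an assumed example, and production codes use faster algorithms such as Hopcroft–Karp):

```python
# Maximum bipartite matching by augmenting paths: try to match each row in
# turn, re-matching earlier rows when that frees a column.
def max_matching(nonzeros, nrows, ncols):
    """nonzeros: set of (i, j) positions; returns {row: matched column}."""
    match_row, match_col = {}, {}          # row -> col, col -> row

    def augment(i, seen):
        for j in range(ncols):
            if (i, j) in nonzeros and j not in seen:
                seen.add(j)
                # j is free, or the row holding j can be re-matched elsewhere
                if j not in match_col or augment(match_col[j], seen):
                    match_row[i], match_col[j] = j, i
                    return True
        return False

    for i in range(nrows):
        augment(i, set())
    return match_row

nz = {(0, 0), (0, 1), (1, 0), (2, 0), (2, 2)}
print(max_matching(nz, 3, 3))   # {0: 1, 1: 0, 2: 2} -- a perfect matching
```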
![Page 80: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/80.jpg)
Matching in bipartite graphs and permutations
Given a square matrix whose bipartite graph has a perfect matching,such a matching can be used to permute the matrix such that thematching entries are along the main diagonal.
[Figure: the bipartite graph of a 3×3 matrix with a perfect matching, and the column permutation that puts the matching entries on the main diagonal]
80 / 217
![Page 81: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/81.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
81 / 217
![Page 82: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/82.jpg)
Definitions: Reducibility
Reducible matrix: An n × n square matrix A is reducible if there exists an n × n permutation matrix P such that

PAPT = ( A11  A12
          O   A22 ),

where A11 is an r × r submatrix and A22 is an (n − r) × (n − r) submatrix, with 1 ≤ r < n.
Irreducible matrix: no such permutation matrix exists.
Theorem: An n × n square matrix is irreducible iff its directed graphis strongly connected.
Proof: Follows by definition.
82 / 217
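The theorem can be checked mechanically: a pattern is irreducible iff its directed graph is strongly connected. A naive Python sketch (real codes use Tarjan's linear-time strongly-connected-components algorithm; the 2×2 patterns are assumed examples):

```python
# Irreducibility test: build the directed graph (edge i -> j for each
# a_ij != 0, i != j) and check that every vertex reaches every other.
def reachable(adj, s):
    seen, stack = {s}, [s]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v not in seen:
                seen.add(v)
                stack.append(v)
    return seen

def is_irreducible(pattern):
    """pattern: list of rows of 0/1 entries of a square matrix."""
    n = len(pattern)
    adj = {i: [j for j in range(n) if j != i and pattern[i][j]]
           for i in range(n)}
    return all(len(reachable(adj, s)) == n for s in range(n))

reducible = [[1, 1],
             [0, 1]]        # already block upper triangular
irreducible = [[1, 1],
               [1, 1]]
print(is_irreducible(reducible), is_irreducible(irreducible))  # False True
```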
![Page 83: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/83.jpg)
Graph definitions and relations to sparse matrices
Which graph for which matrix?
Trees and spanning trees
Ordering and tree traversal
Peripheral and pseudo-peripheral vertices
Graph matching and matrix permutation
Reducibility and connected components
Definition of hypergraphs
Postorderings and memory usage
83 / 217
![Page 84: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/84.jpg)
Definitions: Hypergraphs
Hypergraph: A hypergraph H = (V ,N) consists of a finite set V called thevertex set and a set of non-empty subsets of vertices N called the hyperedgeset or the net set. A generalization of graphs.
For a matrix A, define a hypergraph whose vertices correspond to the rowsand whose nets correspond to the columns such that vertex vi is in net nj iffaij 6= 0 (the column-net model).
A sample matrix and its column-net hypergraph model:
[Figure: a 3×4 matrix; vertices v1, v2, v3 correspond to the rows and nets n1, n2, n3, n4 to the columns]
84 / 217
![Page 85: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/85.jpg)
Outline
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
85 / 217
![Page 86: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/86.jpg)
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
86 / 217
![Page 87: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/87.jpg)
Fill-in and reordering
Step k of LU factorization (akk pivot):
I For i > k compute lik = aik/akk (= a′ik),
I For i > k, j > k: a′ij = aij − (aik × akj)/akk = aij − lik × akj
I If aik ≠ 0 and akj ≠ 0 then a′ij ≠ 0
I If aij was zero → the non-zero a′ij must be stored: fill-in
[Figure: at step k, nonzeros aik and akj combine so that entry a′ij (previously zero) becomes nonzero]
Interest of permuting a matrix:

X X X X X        X 0 0 0 X
X X 0 0 0        0 X 0 0 X
X 0 X 0 0        0 0 X 0 X
X 0 0 X 0        0 0 0 X X
X 0 0 0 X        X X X X X

(Left: a dense first row and column cause complete fill-in; right: permuting them last causes none.)
87 / 217
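The fill-in rule above can be simulated symbolically; a Python sketch reproducing the two arrow patterns of the slide (natural-order elimination, entries treated as 0/1):

```python
# Symbolic Gaussian elimination: a'_ij becomes nonzero whenever a_ik and
# a_kj are nonzero; count the entries that were zero in the original pattern.
def fill_in(pattern):
    n = len(pattern)
    a = [row[:] for row in pattern]
    fills = 0
    for k in range(n):
        for i in range(k + 1, n):
            for j in range(k + 1, n):
                if a[i][k] and a[k][j] and not a[i][j]:
                    a[i][j] = 1          # new nonzero: fill-in
                    fills += 1
    return fills

down_arrow = [[1 if i == 0 or j == 0 or i == j else 0 for j in range(5)]
              for i in range(5)]         # dense first row/column
up_arrow = [[1 if i == 4 or j == 4 or i == j else 0 for j in range(5)]
            for i in range(5)]           # dense last row/column
print(fill_in(down_arrow), fill_in(up_arrow))   # 12 0
```

Eliminating the dense first row/column turns the entire trailing 4×4 block dense (12 fill entries), while the permuted pattern factors with no fill at all.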
![Page 88: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/88.jpg)
Symmetric matrices and graphs
I Assumptions: A symmetric and pivots are chosen from the diagonal
I Structure of A symmetric, represented by the graph G = (V, E)
I Vertices are associated to columns: V = {1, . . . , n}
I Edges E are defined by: (i, j) ∈ E ↔ aij ≠ 0
I G undirected (symmetry of A)
88 / 217
![Page 89: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/89.jpg)
Symmetric matrices and graphs
I Remarks:
I Number of nonzeros in column j = |adjG(j)|
I Symmetric permutation ≡ renumbering the graph
[Figure: a 5×5 symmetric matrix and its corresponding graph on vertices 1–5]
89 / 217
![Page 90: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/90.jpg)
Breadth-First Search (BFS) of a graph to permute a matrix
Exercise 1 (BFS traversal to order a matrix)
Given a symmetric matrix A and its associated graph G = (V, E):
1. Derive from a BFS traversal of G an algorithm to reorder matrix A.
2. What is the structural property of the permuted matrix?
3. Explain why such an ordering can help reduce fill-in during LU factorization.
4. How can the proposed algorithm be used on an unsymmetric matrix?
90 / 217
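One possible answer to question 1, sketched in Python: a BFS that visits neighbors by increasing degree (the idea behind the Cuthill-McKee ordering), which gathers the nonzeros of the symmetrically permuted matrix into a band. The small example graph is an assumption:

```python
from collections import deque

# BFS ordering of the graph of a symmetric matrix; perm[k] gives the old
# index of new row/column k.
def bfs_order(adj, start):
    seen, order, queue = {start}, [], deque([start])
    while queue:
        u = queue.popleft()
        order.append(u)
        for v in sorted(adj[u], key=lambda w: len(adj[w])):  # low degree first
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return order

adj = {0: [4], 1: [2, 4], 2: [1, 3], 3: [2], 4: [0, 1]}
perm = bfs_order(adj, 3)
print(perm)                                  # [3, 2, 1, 4, 0]
# bandwidth of the permuted matrix: max |new(i) - new(j)| over edges
new = {old: k for k, old in enumerate(perm)}
print(max(abs(new[i] - new[j]) for i in adj for j in adj[i]))   # 1
```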
![Page 91: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/91.jpg)
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
91 / 217
![Page 92: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/92.jpg)
The elimination graph model for symmetric matrices
I Let A be a symmetric positive definite matrix of order n
I The LLT factorization can be described by the equation:
A = A0 = H0 = ( d1   v1^T
                v1   H1  )

  = ( √d1       0    ) ( 1   0  ) ( √d1   v1^T/√d1 )
    ( v1/√d1   In−1  ) ( 0   H̄1 ) ( 0     In−1     )

  = L1 A1 L1^T,  where  H̄1 = H1 − v1 v1^T / d1

I The basic step is then applied to H̄1, H̄2, . . . to obtain

A = (L1 L2 · · · Ln−1) In (Ln−1^T · · · L2^T L1^T) = L L^T
92 / 217
![Page 93: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/93.jpg)
The basic step: H̄1 = H1 − v1 v1^T / d1
What is v1 v1^T in terms of structure?
v1 is a column of A, hence corresponds to the neighbors of the associated vertex.
v1 v1^T results in a dense sub-block in H̄1.
If any of the nonzeros in this dense submatrix are not in A, then we have fill-ins.
93 / 217
![Page 94: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/94.jpg)
The elimination process in the graphs
G0 = (V, E) ← undirected graph of A
for k = 1 : n − 1 do
  V ← V − {k}   {remove vertex k}
  E ← E − {(k, ℓ) : ℓ ∈ adj(k)} ∪ {(x, y) : x ∈ adj(k) and y ∈ adj(k)}
  Gk ← (V, E)   {for definition}
end for
Gk are the so-called elimination graphs (Parter,’61).
[Figure: a 6×6 matrix H0 and its elimination graph G0 on vertices 1–6]
94 / 217
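One step of the loop above, sketched in Python on adjacency sets (the 4-vertex example is assumed, not the slides' graph):

```python
# One elimination-graph step: remove vertex k and make its neighbors
# pairwise adjacent (a clique); new edges are exactly the fill-in.
def eliminate(adj, k):
    nbrs = adj.pop(k)
    for x in nbrs:
        adj[x].discard(k)
        adj[x] |= nbrs - {x}     # fill edges among the neighbors of k
    return adj

adj = {1: {2, 3}, 2: {1, 4}, 3: {1, 4}, 4: {2, 3}}
eliminate(adj, 1)
print(adj)   # edge {2, 3} is fill-in
```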
![Page 95: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/95.jpg)
A sequence of elimination graphs
[Figure: the elimination graphs G0, G1, G2, G3 and the corresponding reduced matrices H0, H1, H2, H3]
95 / 217
![Page 96: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/96.jpg)
Fill-in and reordering
“Before permutation” (A(lhr01)) and permuted matrix (A′(lhr01)):
[Spy plots; nz = 18427 in both]
Factored matrix (LU(A′)):
[Spy plot; nz = 76105]
96 / 217
![Page 97: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/97.jpg)
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
97 / 217
![Page 98: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/98.jpg)
Fill-in characterization
Let A be a symmetric matrix (G(A) its associated graph) and L the matrix of factors, A = LL^T.
Fill path theorem (Rose, Tarjan, Lueker, 1976)
lij ≠ 0 iff there is a path in G(A) between i and j such that all nodes in the path have indices smaller than both i and j.
[Figure: a path i – p1 – p2 – · · · – pk – j through lower-numbered vertices]
98 / 217
![Page 99: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/99.jpg)
Fill-in characterization (proof intuition)
Let A be a symmetric matrix (G(A) its associated graph) and L the matrix of factors, A = LL^T.
Fill path theorem (Rose, Tarjan, Lueker, 1976)
lij ≠ 0 iff there is a path in G(A) between i and j such that all nodes in the path have indices smaller than both i and j.
[Figure: a path i – p1 – p2 – · · · – pk – j through lower-numbered vertices]
99 / 217
![Page 100: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/100.jpg)
Fill-in characterization (proof intuition)
Let A be a symmetric matrix (G(A) its associated graph) and L the matrix of factors, A = LL^T.
Fill path theorem (Rose, Tarjan, Lueker, 1976)
lij ≠ 0 iff there is a path in G(A) between i and j such that all nodes in the path have indices smaller than both i and j.
[Figure: eliminating intermediate path vertices shortens the path step by step]
100 / 217
![Page 101: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/101.jpg)
Fill-in characterization (proof intuition)
Let A be a symmetric matrix (G(A) its associated graph) and L the matrix of factors, A = LL^T.
Fill path theorem (Rose, Tarjan, Lueker, 1976)
lij ≠ 0 iff there is a path in G(A) between i and j such that all nodes in the path have indices smaller than both i and j.
[Figure: eliminating the lower-numbered path vertices creates the fill entry (i, j)]
101 / 217
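The theorem can be sanity-checked on a small graph by comparing direct symbolic elimination with a fill-path search; a Python sketch (the 4-cycle example and the dict format are assumptions):

```python
from itertools import combinations

# Symbolic elimination in natural order: at step k, connect all
# higher-numbered neighbors of k. Returns the set of nonzero edges of F.
def fill_by_elimination(adj, n):
    a = {v: set(adj[v]) for v in range(n)}
    nz = {frozenset((v, w)) for v in adj for w in adj[v]}
    for k in range(n):
        for i, j in combinations([v for v in a[k] if v > k], 2):
            a[i].add(j); a[j].add(i)
            nz.add(frozenset((i, j)))
    return nz

# Fill-path test: is there a path i .. j through vertices < min(i, j)?
def has_fill_path(adj, i, j):
    lim = min(i, j)
    seen, stack = {i}, [i]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v == j:
                return True
            if v < lim and v not in seen:
                seen.add(v); stack.append(v)
    return False

adj = {0: [1, 3], 1: [0, 2], 2: [1, 3], 3: [0, 2]}   # a 4-cycle
nz = fill_by_elimination(adj, 4)
for i in range(4):
    for j in range(i + 1, 4):
        assert (frozenset((i, j)) in nz) == has_fill_path(adj, i, j)
print(sorted(tuple(sorted(e)) for e in nz))
# [(0, 1), (0, 3), (1, 2), (1, 3), (2, 3)] -- (1, 3) is the fill entry
```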
![Page 102: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/102.jpg)
Introducing the filled graph G+(A)
I Let F = L + L^T be the filled matrix, and G(F) the filled graph of A, denoted by G+(A).
I Lemma (Parter 1961): (vi, vj) ∈ G+ if and only if (vi, vj) ∈ G or ∃k < min(i, j) such that (vi, vk) ∈ G+ and (vk, vj) ∈ G+.
[Figure: the filled graph G+(A) = G(F) and the filled matrix F = L + L^T on vertices 1–6]
102 / 217
![Page 103: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/103.jpg)
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
103 / 217
![Page 104: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/104.jpg)
Fill-reducing heuristics
Three main classes of methods for minimizing fill-in duringfactorization
I Global approach: The matrix is permuted into a matrix with a given pattern
I Fill-in is restricted to occur within that structure
I Cuthill-McKee (block tridiagonal matrix) (Remark: BFS traversal of the associated graph)
I Nested dissection (“block bordered” matrix) (Remark: interpretation using the fill-path theorem)
[Figure: graph partitioning with separators S1, S2, S3 and the permuted (block bordered) matrix]
104 / 217
![Page 105: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/105.jpg)
Fill-reducing heuristics
I Local heuristics: At each step of the factorization, selection ofthe pivot that is likely to minimize fill-in.
I Method is characterized by the way pivots are selected.I Markowitz criterion (for a general matrix).I Minimum degree or Minimum fill-in (for symmetric matrices).
I Hybrid approaches: Once the matrix is permuted to blockstructure, local heuristics are used within the blocks.
105 / 217
![Page 106: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/106.jpg)
Local heuristics to reduce fill-in during factorization
Let G(A) be the graph associated to a matrix A that we want to order using local heuristics, and let Metric be such that Metric(vi) < Metric(vj) implies vi is a better pivot candidate than vj.
Generic algorithm
Loop until all nodes are selected:
  Step 1: select the current node p (so-called pivot) with minimum metric value,
  Step 2: update the elimination graph,
  Step 3: update Metric(vj) for all non-selected nodes vj.
Step 3 should only be applied to nodes for which the Metric value might have changed.
106 / 217
![Page 107: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/107.jpg)
Reordering unsymmetric matrices: Markowitz criterion
I At step k of Gaussian elimination:
[Figure: the factors L and U computed so far and the active submatrix Ak]
I r_i^k = number of non-zeros in row i of Ak
I c_j^k = number of non-zeros in column j of Ak
I aij must be large enough and should minimize (r_i^k − 1) × (c_j^k − 1), ∀i, j > k
I Minimum degree: Markowitz criterion for symmetric diagonally dominant matrices
107 / 217
![Page 108: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/108.jpg)
Minimum degree algorithm
I Step 1:Select the vertex that possesses the smallest number ofneighbors in G 0.
[Figure: (a) a 10×10 sparse symmetric matrix and (b) its elimination graph on vertices 1–10]
The node/variable selected is 1 of degree 2.
108 / 217
![Page 109: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/109.jpg)
I Notation for the elimination graph
I Let Gk = (Vk, Ek) be the graph built at step k.
I Gk describes the structure of Ak after elimination of k pivots.
I Gk is non-oriented (Ak is symmetric).
I Fill-in in Ak ≡ adding edges in the graph.
109 / 217
![Page 110: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/110.jpg)
Illustration
Step 1: elimination of pivot 1
[Figure: (a) the elimination graph after eliminating pivot 1 and (b) the factors and active submatrix; legend: initial nonzeros, fill-in, nonzeros in factors]
![Page 111: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/111.jpg)
Minimum degree algorithm applied to the graph:
I Step k : Select the node with the smallest number of neighbors
I G k is built from G k−1 by suppressing the pivot and adding edgescorresponding to fill-in.
111 / 217
![Page 112: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/112.jpg)
Minimum degree algorithm based on elimination graphs
∀i ∈ [1 . . . n] : ti = |AdjG0(i)|
For k = 1 to n Do
  p = arg min i∈Vk−1 (ti)
  For each i ∈ AdjGk−1(p) Do
    AdjGk(i) = (AdjGk−1(i) ∪ AdjGk−1(p)) \ {i, p}
    ti = |AdjGk(i)|
  EndFor
  Vk = Vk−1 \ {p}
EndFor
112 / 217
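The loop above translates almost directly into code; a minimal Python sketch using plain adjacency sets rather than the quotient-graph data structures of real implementations (ties broken by vertex number; the 5-vertex example is assumed):

```python
# Minimum degree ordering on elimination graphs: repeatedly eliminate a
# vertex of smallest current degree, cliquing its neighbors.
def minimum_degree_order(adj):
    adj = {v: set(ns) for v, ns in adj.items()}    # work on a copy
    order = []
    while adj:
        p = min(adj, key=lambda v: (len(adj[v]), v))   # smallest degree t_i
        order.append(p)
        nbrs = adj.pop(p)
        for i in nbrs:
            adj[i] = (adj[i] | nbrs) - {i, p}          # update Adj and t_i
    return order

adj = {1: {2}, 2: {1, 3, 4}, 3: {2, 4}, 4: {2, 3, 5}, 5: {4}}
print(minimum_degree_order(adj))   # [1, 5, 2, 3, 4]
```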
![Page 113: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/113.jpg)
Illustration (cont’d)
Graphs G1,G2,G3 and corresponding reduced matrices.
[Figure: (a) the elimination graphs G1, G2, G3 and (b) the factors and active submatrices; legend: original nonzeros, modified original nonzeros, fill-in, nonzeros in factors]
113 / 217
![Page 114: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/114.jpg)
Minimum Degree does not always minimize fill-in !!!
Consider the following matrix and its corresponding elimination graph:
[Figure: a 9×9 matrix and its elimination graph on vertices 1–9]
Remark: using the initial ordering, there is no fill-in.
Step 1 of Minimum Degree: select pivot 5 (minimum degree = 2).
Updated graph: edge (4, 6) is added, i.e., fill-in.
114 / 217
![Page 115: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/115.jpg)
Reduce memory complexity
Decrease the size of the working space: using the elimination graph, the working space is of order O(#nonzeros in factors).
I Property: let pivot be the pivot at step k.
If i ∈ AdjGk−1(pivot) then AdjGk−1(pivot) ⊂ AdjGk(i)
I We can then use an implicit representation of fill-in by defining the notions of element (variable already eliminated) and quotient graph. A variable of the quotient graph is adjacent to variables and elements.
I We can show that ∀k ∈ [1 . . . n] the size of the quotient graph is O(|G0|).
115 / 217
![Page 116: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/116.jpg)
Influence on the structure of factors
Harwell-Boeing matrix dwt_592.rua, structural computations on a submarine. NZ(LU factors) = 58202.
[Spy plot; nz = 5104]
116 / 217
![Page 117: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/117.jpg)
Minimum fill-in heuristics
Recalling the generic algorithm: let G(A) be the graph associated to a matrix A that we want to order using local heuristics, and let Metric be such that Metric(vi) < Metric(vj) means vi is a better pivot candidate than vj.
Generic algorithm
Loop until all nodes are selected:
  Step 1: select the current node p (so-called pivot) with minimum metric value,
  Step 2: update the elimination (or quotient) graph,
  Step 3: update Metric(vj) for all non-selected nodes vj.
Step 3 should only be applied to nodes for which the Metric value might have changed.
117 / 217
![Page 118: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/118.jpg)
Minimum fill based algorithm
I Metric(vi ) is the amount of fill-in that vi would introduce if itwere selected as a pivot.
I Illustration: r has a degree d = 4 and a fill-in metric ofd × (d − 1)/2 = 6 whereas s has degree d = 5 but a fill-inmetric of d × (d − 1)/2− 9 = 1.
[Figure: vertex r of degree 4 with neighbors i1–i4 forming no clique, and vertex s of degree 5 with neighbors i1–i5 whose neighborhood is almost a clique; vertices j1–j4 are further neighbors]
118 / 217
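The Metric used by minimum fill can be sketched as the number of missing edges among a vertex's neighbors, i.e., the edges its elimination would create. A small Python illustration (the star and triangle graphs are assumed examples; real codes only approximate this count):

```python
from itertools import combinations

# Fill metric of v: count the pairs of neighbors of v that are not yet
# adjacent -- exactly the fill edges created by eliminating v.
def fill_metric(adj, v):
    return sum(1 for x, y in combinations(adj[v], 2) if y not in adj[x])

# Star on 4 leaves: eliminating the center creates d(d-1)/2 = 6 fill edges.
star = {0: {1, 2, 3, 4}, 1: {0}, 2: {0}, 3: {0}, 4: {0}}
print(fill_metric(star, 0))   # 6
# A vertex whose neighborhood is already a clique has metric 0.
tri = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
print(fill_metric(tri, 0))    # 0
```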
![Page 119: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/119.jpg)
Minimum fill-in properties
I The situation above typically occurs when {i1, i2, i3} and {i2, i3, i4, i5} were adjacent to two already selected nodes (here e2 and e1, previously selected pivots).
[Figure: the same configuration with the previously selected nodes e1 and e2 shown]
I The elimination of a node vk affects the degree of nodes adjacentto vk . The fill-in metric of Adj(Adj(vk)) is also affected.
I Illustration: selecting r affects the fill-in metric of i1 (the fill edge (j3, j4) must be deducted).
![Page 120: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/120.jpg)
Impact of fill-reducing heuristics
Number of operations (millions)

             METIS      SCOTCH     PORD       AMF        AMD
gupta2       2757.8     4510.7     4993.3     2790.3     2663.9
ship_003     83828.2    92614.0    112519.6   96445.2    155725.5
twotone      29120.3    27764.7    37167.4    29847.5    29552.9
wang3        4313.1     5801.7     5009.9     6318.0     10492.2
xenon2       99273.1    112213.4   126349.7   237451.3   298363.5
I METIS (Karypis and Kumar) and SCOTCH (Pellegrini) are global strategies (recursive nested dissection based orderings).
I PORD (Schulze, Paderborn Univ.) is a recursive dissection ordering using a bottom-up strategy to build the separator.
I AMD (Amestoy, Davis and Duff) is a local strategy based on Approximate Minimum Degree.
I AMF (Amestoy) is a local strategy based on Approximate Minimum Fill.
120 / 217
![Page 121: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/121.jpg)
Impact of fill-reducing heuristics
Time for factorization (seconds)
                  1p    16p    32p    64p   128p
coneshl  METIS   970     60     41     27     14
         PORD   1264    104     67     41     26
audi     METIS  2640    198    108     70     42
         PORD   1599    186    146     83     54
Matrices with quasi-dense rows:
impact on the analysis time (seconds) for the gupta2 matrix

            AMD   METIS   QAMD
Analysis    361      52     23
Total       379      76     59
I QAMD (Amestoy) is an Approximate Minimum Degree (local) strategy designed for matrices with quasi-dense rows.
121 / 217
![Page 122: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/122.jpg)
Reordering sparse matrices
Fill-in and reordering
The elimination graph model
Fill-in characterization
Fill-reducing heuristics
Reduction to Block Triangular Form
122 / 217
![Page 123: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/123.jpg)
Reduction to Block Triangular Form (BTF)
I Suppose that there exist permutation matrices P and Q such that

        | B11                        |
        | B21  B22                   |
  PAQ = | B31  B32  B33              |
        |  .    .    .    .          |
        | BN1  BN2  BN3  . . .  BNN  |

I If N > 1, A is said to be reducible (irreducible otherwise).
I Each Bii is supposed to be irreducible (otherwise a finer decomposition is possible).
I Advantage: to solve Ax = b, only the Bii need be factored:

  Bii yi = (Pb)i − ∑_{j=1}^{i−1} Bij yj ,   i = 1, . . . , N,   with y = Q^T x
123 / 217
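The block forward substitution can be sketched with numpy; the dense `np.linalg.solve` on each Bii stands in for a sparse factorization, and the toy storage format (list of diagonal blocks, dict of sub-diagonal blocks keyed by (i, j)) is an assumption of the sketch.

```python
import numpy as np

def btf_solve(blocks, offdiag, pb):
    """Solve the BTF system block by block:
    B_ii y_i = (Pb)_i - sum_{j<i} B_ij y_j, i = 1..N."""
    y = []
    for i, Bii in enumerate(blocks):
        rhs = pb[i].copy()
        for j in range(i):
            if (i, j) in offdiag:
                rhs -= offdiag[(i, j)] @ y[j]   # subtract already-solved blocks
        y.append(np.linalg.solve(Bii, rhs))     # only B_ii is factored
    return y
```

Only the diagonal blocks are ever factored; the off-diagonal blocks enter through matrix-vector products.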
![Page 124: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/124.jpg)
A two-stage approach to compute the reduction

I Stage (1): compute a (column) permutation matrix Q1 such that AQ1 has nonzeros on the diagonal (find a maximum transversal of A), then
I Stage (2): compute a (symmetric) permutation matrix P such that P(AQ1)P^T is BTF.
The diagonal blocks of the BTF are uniquely defined. Techniques exist to compute the BTF form directly, but they present no advantage over this two-stage approach.
124 / 217
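Both stages are available in scipy.sparse.csgraph, which makes it easy to experiment: `maximum_bipartite_matching` computes a maximum transversal, and `connected_components` with `connection='strong'` returns the strong components of the permuted digraph, which are the diagonal blocks (a topological sort of the component graph then gives the block triangular order). A sketch, assuming a structurally nonsingular input:

```python
import numpy as np
from scipy.sparse import csr_matrix
from scipy.sparse.csgraph import connected_components, maximum_bipartite_matching

def btf_stages(A):
    """Stage 1: column permutation q such that A[:, q] has a zero-free
    diagonal.  Stage 2: strong components of the digraph of A[:, q];
    each component is one diagonal block B_ii of the BTF."""
    A = csr_matrix(A)
    q = maximum_bipartite_matching(A, perm_type='column')   # stage 1
    AQ = A[:, q]                                            # zero-free diagonal
    n_blocks, labels = connected_components(AQ, directed=True,
                                            connection='strong')  # stage 2
    return q, n_blocks, labels
```

The number of blocks does not depend on which maximum transversal is found, since the diagonal blocks of the BTF are uniquely defined.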
![Page 125: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/125.jpg)
Main components of the algorithm
I Objective: assume AQ1 has nonzeros on the diagonal and compute P such that P(AQ1)P^T is BTF.
I Use the digraph (directed graph) associated with the matrix; self loops (diagonal entries) are not needed.
I Symmetric permutations on the digraph ≡ relabelling the nodes of the graph.
I If there is a closed path through all the nodes of the digraph, then the digraph cannot be subdivided into two parts (the matrix is irreducible).
I The strong components of a graph are the maximal sets of nodes that belong to a common closed path.
I The strong components of the graph are the diagonal blocks Bii of the BTF form.
125 / 217
![Page 126: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/126.jpg)
Main components of the algorithm
[Figure: digraph of A (nodes 1–6) and the corresponding BTF of PAP^T]
126 / 217
![Page 127: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/127.jpg)
Preamble : Finding the triangular form of a matrix
I Observations: a triangular matrix has a BTF form with blocks of size 1 (BTF will be considered a generalization of the triangular form where each diagonal entry becomes a strong component). Note that if the matrix has triangular form, the digraph has no cycle (i.e. it is acyclic).
I Sargent and Westerberg algorithm:
1. Select a starting node and trace a path until reaching a node from which no path leaves.
2. Number that last node first, eliminate it (and all edges pointing to it).
3. Continue from the previous node on the path (or choose a new starting node).
127 / 217
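On an acyclic digraph the three steps amount to numbering each node once every path leaving it has been exhausted. A minimal sketch, assuming the digraph is given as a dict mapping each node to the list of its successors:

```python
def sargent_westerberg(adj):
    """Trace paths; when the last node on the path has no remaining
    outgoing edge, number it next and back up (acyclic digraph)."""
    numbered = set()
    order = []                       # order[k] = node receiving number k+1
    for start in adj:
        if start in numbered:
            continue
        path = [start]
        while path:
            u = path[-1]
            nxt = next((v for v in adj[u] if v not in numbered), None)
            if nxt is None:          # no path leaves u: number it next
                numbered.add(u)
                order.append(u)
                path.pop()
            else:
                path.append(nxt)     # extend the current path
    return order
```

Giving node `order[k]` the number k + 1, every edge (u, v) has v numbered before u, so the permuted matrix is lower triangular.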
![Page 128: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/128.jpg)
Preamble : Finding the triangular form of a matrix
[Figure: digraph of A (nodes 1–6), the path traced at each step, and A permuted to triangular form (not unique)]
128 / 217
![Page 129: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/129.jpg)
Generalizing Sargent and Westerberg Algorithm tocompute BTF
A composite node is a set of nodes belonging to a closed path.
Start from any node and follow a path until:
(1) a closed path is found: collapse all the nodes of the closed path into a composite node; the path continues from the composite node;
(2) a node or composite node is reached from which no path leaves: that node or composite node is numbered next.
129 / 217
![Page 130: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/130.jpg)
Generalizing Sargent and Westerberg Algorithm tocompute BTF
[Figure: digraph of A (nodes 1–6) and the paths traced at each step of the generalized algorithm]
130 / 217
![Page 131: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/131.jpg)
Generalizing Sargent and Westerberg Algorithm tocompute BTF
[Figure: digraph of A (nodes 1–6), the paths traced at each step, and the resulting BTF of PAP^T]
131 / 217
![Page 132: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/132.jpg)
Concluding remarks
Summary:
Fill-in has been characterized; ordering strategies and the associated graphs have been introduced to reduce fill-in or to permute the matrix to a target structure (block tridiagonal, recursive block-bordered, block triangular form), with influence on performance (sequential and parallel).
Next step:
How to model the sparse factorization and predict the data structures needed to run the numerical factorization?
132 / 217
![Page 133: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/133.jpg)
Outline
Graphs to model factorizationThe elimination tree
133 / 217
![Page 134: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/134.jpg)
Graphs to model factorizationThe elimination tree
134 / 217
![Page 135: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/135.jpg)
The elimination tree for symmetric matrices
The elimination tree provides structural information relevant to describing both the Gaussian elimination of symmetric matrices and the orthogonal factorization. Its extension to unsymmetric matrices will also be illustrated.
135 / 217
![Page 136: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/136.jpg)
Set-up
Reminder
I A spanning tree of a connected graph G = (V, E) is a tree T = (V, F) such that F ⊆ E.
I A topological ordering of a rooted tree is an ordering that numbers children vertices before their parent.
I A postorder is a topological ordering which numbers the vertices of any subtree consecutively.
Let A be an n × n symmetric positive-definite and irreducible matrix, A = LL^T its Cholesky factorization, and G+(A) its filled graph (the graph of F = L + L^T).
136 / 217
![Page 137: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/137.jpg)
A first definition
Since A is irreducible, each of the first n − 1 columns of L has at least one off-diagonal nonzero (prove it?).
For each column j < n of L, remove all the nonzeros in column j except the first one below the diagonal.
Let Lt denote the remaining structure and consider the matrix Ft = Lt + Lt^T. The graph G(Ft) is a tree called the elimination tree.
[Figure: an example of matrix structures and the corresponding graphs G(A), G(F) and G(Ft) = T(A), reproduced from Liu, Figs. 2.1–2.2]
137 / 217
![Page 138: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/138.jpg)
A first definition
The elimination tree of A is a spanning tree of G+(A) satisfying the relation PARENT[j] = min{i > j : lij ≠ 0}.
[Figure: matrix A and its filled matrix F = L + L^T (fill-in entries marked), the graphs G(A) and G+(A) = G(F), and the elimination tree T(A) on nodes a1–j10]
138 / 217
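The first definition is easy to turn into code: given the nonzero structure of L (here a dense mask for simplicity), the parent of column j is the row index of its first off-diagonal nonzero, PARENT[j] = min{i > j : lij ≠ 0}.

```python
import numpy as np

def etree_parents(Lmask):
    """PARENT[j] = min{ i > j : l_ij != 0 }; the root gets parent -1."""
    n = Lmask.shape[0]
    parent = np.full(n, -1)
    for j in range(n - 1):
        below = np.flatnonzero(Lmask[j + 1:, j])   # off-diagonal nonzeros of column j
        if below.size:
            parent[j] = j + 1 + below[0]           # first one below the diagonal
    return parent
```

Keeping only that first nonzero per column is exactly the pruning from Lt above; the parent array is the usual compact storage of T(A).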
![Page 139: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/139.jpg)
A second definition: representing column dependencies
I Dependency between columns of L: column i > j depends on column j iff lij ≠ 0.
I Use a directed graph to express this dependency (edge from j to i if column i depends on column j).
I Simplify redundant dependencies (transitive reduction).
I The transitive reduction of the directed filled graph gives the elimination tree structure: remove a directed edge (j, i) if there is a path of length greater than one from j to i.
139 / 217
![Page 140: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/140.jpg)
Directed filled graph and its transitive reduction
[Figure: directed filled graph on nodes a1–j10 and its transitive reduction, the elimination tree T(A)]
140 / 217
![Page 141: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/141.jpg)
The elimination tree: a major tool to model factorization
The elimination tree and its generalizations to unsymmetric matrices are compact structures of major importance during sparse factorization.
I They express the order in which variables can be eliminated: because the elimination of a variable only affects the elimination of its ancestors, any topological order of the elimination tree leads to a correct result and to the same fill-in.
I They express concurrency: because variables in separate subtrees do not affect each other, they can be eliminated in parallel.
I They characterize the structure of the factors more efficiently than elimination graphs do.
141 / 217
![Page 142: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/142.jpg)
Outline
Efficiency of the solution phase
Sparsity in the right hand side and/or solution
142 / 217
![Page 143: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/143.jpg)
Efficiency of the solution phase
Sparsity in the right hand side and/or solution
143 / 217
![Page 144: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/144.jpg)
Exploit sparsity of the right-hand-side/solution
Applications
I Highly reducible matrices and/or sparse right-hand sides (linear programming, seismic processing)
I Null-space basis computation
I Partial computation of A−1
I Computing variances of the unknowns of a data-fitting problem = computing the diagonal of a so-called variance-covariance matrix.
I Computing short-circuit currents = computing blocks of a so-called impedance matrix.
I Approximation of the condition number of a SPD matrix.
Core idea
An efficient algorithm has to take advantage of the sparsity of A andof both the right-hand sides and the solution.
![Page 146: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/146.jpg)
Exploit sparsity in RHS: a quick insight into the main properties
solve y ← L \ b
[Figure: triangular solve y ← L \ b with a sparse right-hand side (entries 1–5); only part of L is needed]
I In all these application cases, only part of the factors needs to be loaded.
I Objectives with sparse RHS:
I Efficient use of the RHS sparsity
I Characterize the LU factors to be loaded from disk
I Efficiently load only the needed factors from disk
(1) Predicting structure of the solution vector,
Gilbert-Liu, ’93
146 / 217
![Page 151: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/151.jpg)
Application: elements in A−1
AA−1 = I, specific entry: a−1ij = (A−1 ej)i , where A−1 ej is column j of A−1.
Theorem: structure of x (based on Gilbert and Liu ’93)
For any matrix A such that A = LU, the structure of the solution x is given by the set of nodes reachable, by paths in the e-tree, from the nodes associated with the right-hand side entries.
Compute some elements of A−1.
[Figure: elimination tree on nodes 1–14; the right-hand side e2 reaches nodes 2, 3, 7, 14]
Which factors are needed to compute a−182 ? a−182 = (U−1(L−1 e2))8
We have to load:
the L factors associated with nodes 2, 3, 7, 14,
and the U factors associated with nodes 14, 13, 9, 8.
Note: only a part of the tree is concerned.
151 / 217
![Page 155: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/155.jpg)
Entries of the inverse: a single one
Notation for later use
P(i): denotes the nodes on the unique path from node i to the root node r (including i and r).
P(S): denotes ∪_{s∈S} P(s) for a set of nodes S.
Use the elimination tree
For each requested (diagonal) entry a−1ii :
(1) visit the nodes of the elimination tree from node i to the root; at each node access the necessary parts of L,
(2) visit the nodes from the root back down to node i; this time access the necessary parts of U.
155 / 217
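Collecting P(i) is a simple walk up the parent pointers of the elimination tree. In the sketch below the parent dict encodes only the two branches of the 14-node example tree that matter for a−182 (the node labels mirror the slides; the rest of the tree is not needed):

```python
def path_to_root(parent, i):
    """P(i): the nodes on the unique path from node i to the root,
    including both ends."""
    path = [i]
    while parent.get(i) is not None:
        i = parent[i]
        path.append(i)
    return path

# Branches of the example tree relevant to the entry (8, 2) of A^-1:
parent = {2: 3, 3: 7, 7: 14, 8: 9, 9: 13, 13: 14, 14: None}
l_nodes = path_to_root(parent, 2)   # L factors to load: [2, 3, 7, 14]
u_nodes = path_to_root(parent, 8)   # U factors, visited root-down: [8, 9, 13, 14]
```

The L factors are accessed in the order returned; the U factors are accessed walking the same kind of path from the root back down to the target node.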
![Page 156: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/156.jpg)
Experiments: interest of exploiting sparsity
Implementation
These ideas have been implemented in the MUMPS solver during Tz. Slavova's PhD.
Experiments: computation of the diagonal of the inverse of matrices from data fitting in Astrophysics (CESR, Toulouse)
Matrix size      Time (s)
               No ES        ES
 46,799         6,944        47
 72,358        27,728       408
148,286          >24h     1,391
Interest
Exploiting the sparsity of the right-hand sides reduces the number of accesses to the factors (in-core: number of flops; out-of-core: accesses to hard disks).
156 / 217
![Page 157: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/157.jpg)
Outline
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
157 / 217
![Page 158: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/158.jpg)
Factorization of sparse matrices
Outline
1. Introduction
2. Elimination tree and multifrontal method
3. Postorderings, equivalent orderings and memory usage
4. Concluding remarks
158 / 217
![Page 159: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/159.jpg)
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
159 / 217
![Page 160: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/160.jpg)
Recalling the Gaussian elimination
Step k of LU factorization (akk pivot):
I For i > k compute lik = aik/akk (= a′ik),
I For i > k, j > k such that aik and akj are nonzeros:

  a′ij = aij − (aik × akj) / akk
I If aik ≠ 0 and akj ≠ 0 then a′ij ≠ 0
I If aij was zero, its new nonzero value must be stored (fill-in)
I Orderings (minimum degree, Cuthill–McKee, ND) limit the fill-in and the number of operations, and modify the task graph
160 / 217
![Page 161: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/161.jpg)
Three-phase scheme to solve Ax = b
1. Analysis step
I Preprocessing of A (symmetric/unsymmetric orderings, scalings)
I Build the dependency graph (elimination tree, eDAG, . . . )
2. Factorization (A = LU, LDL^T, LL^T, QR) with numerical pivoting
3. Solution based on the factored matrices
I triangular solves: Ly = b, then Ux = y
I improvement of the solution (iterative refinement), error analysis
161 / 217
![Page 162: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/162.jpg)
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
162 / 217
![Page 163: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/163.jpg)
Elimination tree and Multifrontal approach
We recall that:
The elimination tree is the dependency graph of thefactorization.
Building the elimination tree (small matrices)
I Permute the matrix (to reduce fill-in): PAP^T.
I Build the filled matrix AF = L + L^T where PAP^T = LL^T.
I Apply the transitive reduction of the associated filled graph.
Multifrontal approach
→ Each column corresponds to a node of the tree.
→ (multifrontal) Each node k of the tree corresponds to the partial factorization of a frontal matrix whose row and column structure is that of column k of AF.
163 / 217
![Page 164: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/164.jpg)
Illustration of multifrontal factorization
We assume pivots are chosen down the diagonal in order.
[Figure: filled matrix (fill-in entries marked F) and the elimination graph/tree on nodes (1, 3, 4), (2, 3, 4), (3, 4), (4)]
Treatment at each node:
I Assembly of the frontal matrix using the contributions from the sons.
I Gaussian elimination on the frontal matrix.
164 / 217
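The assembly step is a scatter-add: global variable indices of the front are mapped to local positions, then the original entries of A and the children's contribution blocks are added in. A sketch, assuming the front's variable list, the relevant entries of A, and the contribution blocks (each with its own index list) are given explicitly:

```python
import numpy as np

def assemble_front(front_vars, a_entries, contribs):
    """Build the frontal matrix for `front_vars` by scatter-adding the
    original entries of A and the children's contribution blocks."""
    pos = {g: k for k, g in enumerate(front_vars)}   # global -> local index
    F = np.zeros((len(front_vars), len(front_vars)))
    for (i, j), v in a_entries.items():
        F[pos[i], pos[j]] += v
    for child_vars, C in contribs:
        loc = [pos[g] for g in child_vars]
        F[np.ix_(loc, loc)] += C                     # indirect addressing
    return F
```

The `pos` map is exactly the indirect addressing the slides refer to: each child block is scattered into the parent's local numbering before the dense elimination takes place.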
![Page 165: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/165.jpg)
I Elimination of variable 1 (a11 pivot)
I Assembly of the frontal matrix:

      1 3 4
    1 x x x
    3 x
    4 x

I Contributions: a(1)ij = −(ai1 × a1j)/a11 for i > 1, j > 1, on a33, a44, a34 and a43:

  a(1)33 = −(a31 × a13)/a11     a(1)34 = −(a31 × a14)/a11
  a(1)43 = −(a41 × a13)/a11     a(1)44 = −(a41 × a14)/a11

The terms −(ai1 × a1j)/a11 of the contribution matrix are stored for later updates.
165 / 217
![Page 166: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/166.jpg)
I Elimination of variable 2 (a22 pivot)
I Assembly of the frontal matrix: update of the elements of the pivot row and column using contributions from previous updates (none here)

      2 3 4
    2 x x x
    3 x
    4 x

I Contributions on a33, a34, a43, and a44:

  a(2)33 = −(a32 × a23)/a22     a(2)34 = −(a32 × a24)/a22
  a(2)43 = −(a42 × a23)/a22     a(2)44 = −(a42 × a24)/a22
166 / 217
![Page 167: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/167.jpg)
I Elimination of variable 3
I Assembly of the frontal matrix using the previous contribution matrices

  ( a(1)33  a(1)34 )        ( a(2)33  a(2)34 )
  ( a(1)43  a(1)44 )  and   ( a(2)43  a(2)44 ):

  a′33 = a33 + a(1)33 + a(2)33
  a′34 = a34 + a(1)34 + a(2)34,   (a34 = 0)
  a′43 = a43 + a(1)43 + a(2)43,   (a43 = 0)
  a′44 = a(1)44 + a(2)44

I Contribution on variable 4:
  a(3)44 = a′44 − (a′43 × a′34) / a′33
167 / 217
![Page 168: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/168.jpg)
      3 4
    3 x x
    4 x

I Contribution on a44 : a(3)44 = a′44 − (a′43 × a′34) / a′33
  Note that a44 is partially summed, since it still does not include the entry from the initial matrix.
I Elimination of variable 4
I The frontal matrix involves only a44 : a44 = a44 + a(3)44

      4
    4 x
168 / 217
![Page 169: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/169.jpg)
Supernodes/supervariables
Definition 1
A supernode (or supervariable) is a set of contiguous columns in the factors L that share essentially the same sparsity structure. Columns i1, i2, . . . , ip in the filled graph form a supernode when |L(ik+1)| = |L(ik)| − 1.
I All algorithms (ordering, symbolic factorization, factorization, solve) generalize to blocked versions.
I Use of efficient matrix-matrix kernels (improved cache usage).
I Same concept as supervariables for the elimination tree/minimum degree ordering.
I Supernodes and pivoting: pivoting inside a supernode does not increase fill-in.
169 / 217
![Page 170: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/170.jpg)
Amalgamation
I Goal
I Exploit a more regular structure in the original matrix
I Decrease the amount of indirect addressing
I Increase the size of the frontal matrices
I How?
I Relax the number of nonzeros of the matrix
I Amalgamation of nodes of the elimination tree
170 / 217
![Page 171: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/171.jpg)
I Consequences?
I Increase in the total amount of flops
I But decrease of indirect addressing
I And increase in performance
I Amalgamation of supernodes (same lower diagonal structure) is without fill-in
171 / 217
![Page 172: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/172.jpg)
Illustration of amalgamation
Original Matrix Elimination tree
[Figure: original 6×6 matrix (fill-in entries F3, F4) and its elimination tree with nodes (1, 4), (2, 4, 5), (3, 5, 6) (leaves), (4, 5, 6), (5, 6), (6) (root)]
Structure of node i = frontal matrix, noted i, i1, i2, . . . , if
172 / 217
![Page 173: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/173.jpg)
Illustration of amalgamation
Amalgamation
(WITHOUT fill-in)                         (WITH fill-in)
[Figure: amalgamated elimination trees; merging (3, 5, 6) and (4, 5, 6) into (3, 4, 5, 6) introduces fill-in entries (3, 4) and (4, 3)]
173 / 217
![Page 174: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/174.jpg)
Amalgamation and Supervariables
Amalgamation of supervariables does not cause fill-in.
Initial graph:
[Figure: graph on nodes 1–13]
Reordering: 1, 3, 4, 2, 6, 8, 10, 11, 5, 7, 9, 12, 13Supervariables: {1, 3, 4} ; {2, 6, 8} ; {10, 11} ; {5, 7, 9, 12, 13}
174 / 217
![Page 175: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/175.jpg)
Supervariables and multifrontal method
AT EACH NODE
F22 ← F22 − F12^T F11^−1 F12
Pivots can ONLY be chosen from the F11 block, since F22 is NOT fully summed.
175 / 217
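A numpy sketch of this dense partial factorization for a symmetric positive-definite front (actual solvers use LDL^T or LU with pivoting restricted to the fully-summed F11 block):

```python
import numpy as np

def partial_factor(F, k):
    """Eliminate the k fully-summed variables of a symmetric frontal
    matrix F: factor F11, compute the panel, and return the Schur
    complement (contribution block) sent to the parent."""
    F11, F12, F22 = F[:k, :k], F[:k, k:], F[k:, k:]
    L11 = np.linalg.cholesky(F11)          # dense factorization of F11
    W = np.linalg.solve(L11, F12)          # L11^{-1} F12 (TRSM-like step)
    contrib = F22 - W.T @ W                # F22 - F12^T F11^{-1} F12 (GEMM-like step)
    return L11, W, contrib
```

`contrib` is the partially-summed block stacked and later scatter-added into the parent's front.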
![Page 176: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/176.jpg)
Multifrontal method
From children to parent
I ASSEMBLY: scatter-add operations (indirect addressing)
I ELIMINATION: dense partial Gaussian elimination, Level 3 BLAS (TRSM, GEMM)
I CONTRIBUTION to parent
176 / 217
![Page 182: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/182.jpg)
The multifrontal method [Duff & Reid ’83]
[Figure: 5 × 5 sparsity patterns of A and of the filled factors L + U − I]
Storage is divided into two parts:
I Factors
I Active memory
[Figure: memory layout — factors, active frontal matrix, and stack of contribution blocks (the active storage) — together with the elimination tree of the example]
182 / 217
![Page 195: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/195.jpg)
The multifrontal method [Duff & Reid ’83]
I Factors are incompressible and usually scale fairly well; they can optionally be written to disk.
I In sequential, the traversal that minimizes active memory is known [Liu '86].
I In parallel, active memory becomes dominant. Example: share of active storage on the AUDI matrix using MUMPS 4.10.0:
1 processor: 11%; 256 processors: 59%
[Figure: elimination tree of the example, with the active memory highlighted]
195 / 217
![Page 196: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/196.jpg)
Postorderings and memory usage
I Assumptions:
I Tree processed from the leaves to the root
I Parents processed as soon as all children have completed (postorder of the tree)
I Each node produces and sends temporary data consumed by its father.
I Exercise: In which sense is a postordering-based tree traversal more interesting than a random topological ordering?
I Furthermore, memory usage also depends on the postordering chosen:
[Figure: two postorderings of the same tree, leaves at the bottom and root i at the top — Best (abcdefghi) and Worst (hfdbacegi)]
196 / 217
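As a hint for the exercise, here is a toy simulation (all names are hypothetical, and blocks are merely counted rather than sized): it compares how many contribution blocks are simultaneously alive under a postorder versus another topological order of the same tree. A postorder lets each parent consume its children's blocks as soon as possible.

```python
def peak_live_contributions(parents, order):
    """Maximum number of contribution blocks alive at once when nodes
    are processed in `order` (any order with children before parents).
    Each node's block coexists with its children's blocks while the
    node is processed, then the children's blocks are freed.
    """
    children = {}
    for c, p in parents.items():
        children.setdefault(p, []).append(c)
    live = peak = 0
    for node in order:
        live += 1                            # this node's storage
        peak = max(peak, live)
        live -= len(children.get(node, []))  # children's blocks consumed
    return peak

# Balanced binary tree: leaves a, b, c, d; e = parent(a, b),
# f = parent(c, d); root g = parent(e, f).
parents = {'a': 'e', 'b': 'e', 'c': 'f', 'd': 'f', 'e': 'g', 'f': 'g'}
postorder = ['a', 'b', 'e', 'c', 'd', 'f', 'g']
level_order = ['a', 'b', 'c', 'd', 'e', 'f', 'g']  # topological, not postorder
print(peak_live_contributions(parents, postorder))   # 4
print(peak_live_contributions(parents, level_order)) # 5
```

The level-by-level topological order keeps every leaf's block alive at once, while the postorder frees the left subtree before entering the right one, which is the intuition behind the stack mechanism.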
![Page 198: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/198.jpg)
Example 1: Processing a wide tree
1 2 3 6
4 5
7
![Page 199: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/199.jpg)
[Figure: step-by-step memory state while processing the wide tree of Example 1; legend: unused memory space, factor memory space, stack memory space, non-free memory space]
![Page 200: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/200.jpg)
Example 2: Processing a deep tree
[Figure: step-by-step memory state while processing a deep tree (nodes 1–4): allocation of 3, assembly step for 3, factorization step for 3, stack step for 3; legend as in Example 1]
![Page 201: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/201.jpg)
Modeling the problem
I Mi : memory peak for complete subtree rooted at i ,
I tempi : temporary memory produced by node i ,
I mparent : memory for storing the parent.
[Figure: parent node with three children; child j has subtree peak Mj and produces a contribution block of size tempj]

Mparent = max( max_{j=1..nbchildren} ( Mj + ∑_{k=1}^{j−1} tempk ),  mparent + ∑_{j=1}^{nbchildren} tempj )   (1)
Objective: order the children to minimize Mparent
201 / 217
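Formula (1) can be evaluated directly for a given child order. A minimal sketch (function and variable names are mine): the first term is the peak while processing child j on top of the already-stacked blocks; the second term is the parent front coexisting with all children's blocks.

```python
def m_parent(children, m_parent_storage):
    """Evaluate Formula (1): peak memory at a parent node, given its
    children in a fixed processing order.

    children         -- list of (M_j, temp_j) pairs, in processing order
    m_parent_storage -- memory needed to store the parent front
    """
    peak = 0
    stacked = 0  # sum of temp_k over already-processed children
    for M_j, temp_j in children:
        peak = max(peak, stacked + M_j)  # peak while processing child j
        stacked += temp_j                # child j's block joins the stack
    # All children done: parent front coexists with every temp block.
    return max(peak, m_parent_storage + stacked)

# The order matters: (M=8, temp=1) first is better than last.
print(m_parent([(8, 1), (4, 3)], 2))  # 8
print(m_parent([(4, 3), (8, 1)], 2))  # 11
```

The two calls show the objective of the next slides: reordering the same children changes Mparent from 8 to 11.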
![Page 203: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/203.jpg)
Memory-minimizing schedules
Theorem 1
[Liu '86] The minimum of max_j ( xj + ∑_{i=1}^{j−1} yi ) is obtained when the sequence (xi, yi) is sorted in decreasing order of xi − yi.
Corollary 1
An optimal child sequence is obtained by rearranging the children in decreasing order of Mi − tempi.

Interpretation: at each level of the tree, a child with a relatively large memory peak in its subtree (Mi large with respect to tempi) should be processed first.

⇒ Apply on the complete tree starting from the leaves (or from the root with a recursive approach)
![Page 204: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/204.jpg)
Optimal tree reordering
Objective: Minimize peak of stack memory
Tree_Reorder(T):
Begin
  for all i in the set of root nodes do
    Process_Node(i);
  end for
End

Process_Node(i):
if i is a leaf then
  Mi = mi
else
  for j = 1 to nbchildren do
    Process_Node(j-th child);
  end for
  Reorder the children of i in decreasing order of (Mj − tempj);
  Compute Mparent at node i using Formula (1);
end if
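The pseudocode above translates into a short runnable sketch (the data structures — a children dict and per-node m/temp dicts — are hypothetical, chosen only for illustration):

```python
def tree_reorder(children, m, temp, roots):
    """Reorder each node's children in decreasing order of M_j - temp_j
    (Liu's corollary) and compute the subtree peaks M_i via Formula (1).

    children -- dict: node -> list of child nodes (reordered in place)
    m[i]     -- memory needed to store the front of node i
    temp[i]  -- size of the contribution block produced by node i
    Returns the dict of subtree peaks M.
    """
    M = {}

    def process(i):
        if not children.get(i):      # leaf: peak is its own front
            M[i] = m[i]
            return
        for c in children[i]:
            process(c)
        # Liu's ordering: decreasing M_j - temp_j.
        children[i].sort(key=lambda j: M[j] - temp[j], reverse=True)
        peak, stacked = 0, 0
        for j in children[i]:        # Formula (1)
            peak = max(peak, stacked + M[j])
            stacked += temp[j]
        M[i] = max(peak, m[i] + stacked)

    for r in roots:
        process(r)
    return M

# Toy tree: parent p with children initially in the bad order (b, a).
children = {'p': ['b', 'a'], 'a': [], 'b': []}
M = tree_reorder(children, m={'p': 2, 'a': 8, 'b': 4},
                 temp={'a': 1, 'b': 3}, roots=['p'])
print(children['p'], M['p'])  # ['a', 'b'] 8
```

With the original order (b, a), Formula (1) would give max(4, 3 + 8, 2 + 4) = 11; the reorder recovers the peak of 8.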
![Page 205: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/205.jpg)
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
205 / 217
![Page 206: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/206.jpg)
Equivalent orderings of symmetric matrices
Let F be the filled matrix of a symmetric matrix A (that is, F = L + L^T, where A = LL^T).
G+(A) = G(F) is the associated filled graph.

Definition 2 (Equivalent orderings)

P and Q are said to be equivalent orderings iff G+(PAP^T) = G+(QAQ^T).
By extension, a permutation P is said to be an equivalent ordering of a matrix A iff G+(PAP^T) = G+(A).

It can be shown that an equivalent reordering also preserves the number of arithmetic operations of the sparse Cholesky factorization.
206 / 217
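Equivalent orderings can be checked experimentally with a symbolic elimination that builds the filled graph — a standard construction, though the code below is an illustrative sketch with made-up names: eliminating a vertex adds edges between its not-yet-eliminated neighbors, and two orderings are equivalent when they produce the same fill.

```python
def filled_graph(n, edges, order):
    """Symbolic elimination: return the edge set of the filled graph
    G+ obtained by eliminating the vertices of an undirected graph in
    `order`. Eliminating v connects its not-yet-eliminated neighbors."""
    adj = {v: set() for v in range(n)}
    for a, b in edges:
        adj[a].add(b); adj[b].add(a)
    pos = {v: i for i, v in enumerate(order)}
    fill = {frozenset(e) for e in edges}
    for v in order:
        later = [u for u in adj[v] if pos[u] > pos[v]]
        for i in range(len(later)):
            for j in range(i + 1, len(later)):
                a, b = later[i], later[j]
                adj[a].add(b); adj[b].add(a)   # fill edge
                fill.add(frozenset((a, b)))
    return fill

# A 4-cycle 0-1-2-3: eliminating 0 first fills edge {1, 3}, while
# eliminating 1 first fills edge {0, 2} -- not equivalent orderings.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
g1 = filled_graph(4, edges, [0, 1, 2, 3])
g2 = filled_graph(4, edges, [1, 0, 2, 3])
print(g1 == g2)  # False
```

The same routine run on two topological traversals of the same elimination tree would return identical filled graphs, illustrating the claim of the next slide.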
![Page 207: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/207.jpg)
Relation with elimination trees
I Let A be a reordered matrix, and G+(A) be its filled graph
I In the elimination tree, any topological tree traversal (one that processes children before parents) corresponds to an equivalent ordering P of A, and the elimination tree of PAP^T is identical to that of A.
[Figure: matrix A with fill entries marked F, its filled graph G+(A) on vertices u, v, w, x, y, z, and the elimination tree of A with a topological ordering]
207 / 217
![Page 208: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/208.jpg)
Tree rotations (Liu, 1988)
Definition 3
An ordering that does not introduce any fill is referred to as a perfect elimination ordering (PEO for short).

The natural ordering is a PEO of the filled matrix F.
Theorem 2
For any node x of G+(A) = G (F), there exists a PEO on G (F) suchthat x is numbered last.
I Essence of tree rotations:
I Nodes in the clique of x in F are numbered last
I Relative ordering of the other nodes is preserved.
208 / 217
![Page 209: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/209.jpg)
Example of equivalent orderings
On the right-hand side, a tree rotation is applied on w (the clique of w in F is {w, x}; for the other nodes, the relative ordering w.r.t. the tree on the left is preserved).

[Figure: left — filled matrix F and elimination tree with a postordering of u, v, w, x, y, z; right — matrix and tree after the rotation on w, with w and x numbered last]

Remark: Tree rotations can help reduce temporary memory usage!
![Page 211: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/211.jpg)
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
211 / 217
![Page 212: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/212.jpg)
Some parallel solvers
212 / 217
![Page 213: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/213.jpg)
Some (shared memory) sparse direct codes
| Code | Technique | Scope | Availability (www.) |
|------|-----------|-------|---------------------|
| BCSLIB | Multifrontal | SYM/UNS | Boeing → Access Analytics |
| HSL MA87 | Supernodal | SPD | cse.clrc.ac.uk/Activity/HSL |
| MA41 | Multifrontal | UNS | cse.clrc.ac.uk/Activity/HSL |
| MA49 | Multifr. QR | RECT | cse.clrc.ac.uk/Activity/HSL |
| PanelLLT | Left-looking | SPD | Ng |
| PARDISO | Left-right | SYM/UNS | Schenk |
| PSL† | Left-looking | SPD/UNS | SGI product |
| SuperLU MT | Left-looking | UNS | nersc.gov/∼xiaoye/SuperLU |
| SuiteSparseQR | Multifr. QR | RECT | cise.ufl.edu/research/sparse/SPQR |
| TAUCS | Left/Multifr. | SYM/UNS | tau.ac.il/∼stoledo/taucs |
| WSMP† | Multifrontal | SYM/UNS | IBM product |

† Only object code is available.
213 / 217
![Page 214: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/214.jpg)
Distributed-memory sparse direct codes
| Code | Technique | Scope | Availability (www.) |
|------|-----------|-------|---------------------|
| DSCPACK | Multifr./Fan-in | SPD | cse.psu.edu/∼raghavan/Dscpack |
| MUMPS | Multifrontal | SYM/UNS | graal.ens-lyon.fr/MUMPS, mumps.enseeiht.fr |
| PaStiX | Fan-in | SPD | labri.fr/perso/ramet/pastix |
| PSPASES | Multifrontal | SPD | cs.umn.edu/∼mjoshi/pspases |
| SPOOLES | Fan-in | SYM/UNS | netlib.org/linalg/spooles |
| SuperLU | Fan-out | UNS | nersc.gov/∼xiaoye/SuperLU |
| S+ | Fan-out† | UNS | cs.ucsb.edu/research/S+ |
| WSMP† | Multifrontal | SYM | IBM product |

† Only object code is available.
214 / 217
![Page 215: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/215.jpg)
Distributed-memory sparse direct codes
Case study: Comparison of MUMPS and SuperLU
215 / 217
![Page 216: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/216.jpg)
Factorization of sparse matrices
Introduction
Elimination tree and Multifrontal approach
Equivalent orderings and elimination trees
Some parallel solvers
Concluding remarks
216 / 217
![Page 217: Sparse Linear Algebra: Direct Methods](https://reader036.vdocuments.us/reader036/viewer/2022090904/613c7d56c957d930775e49f4/html5/thumbnails/217.jpg)
Concluding remarks
I Key parameters in selecting a method:
1. Functionalities of the solver
2. Characteristics of the matrix
   I Numerical properties and pivoting
   I Symmetric or general
   I Pattern and density
3. Preprocessing of the matrix
   I Scaling
   I Reordering for minimizing fill-in
4. Target computer (architecture)
I Substantial gains can be achieved with an adequate solver, in terms of numerical precision, computing time, and storage
I Good knowledge of matrices and solvers is needed
I Many challenging problems
I Active research area
217 / 217