TRANSCRIPT
2010-11-15 Slide 1
PRINCIPLES OF CIRCUIT SIMULATION
Lecture 9. Linear Solver: LU Solver and Sparse Matrix
Guoyong Shi, [email protected]
School of Microelectronics, Shanghai Jiao Tong University
Fall 2010
2010-11-15 Lecture 9 slide 2
Outline
Part 1:
• Gaussian Elimination
• LU Factorization
• Pivoting
• Doolittle Method and Crout's Method
• Summary
Part 2: Sparse Matrix
2010-11-15 Lecture 9 slide 3
Motivation
• In both Sparse Tableau Analysis (STA) and Modified Nodal Analysis (MNA), we have to solve a linear system of equations: Ax = b.

STA stacks KCL, KVL, and the branch equations:

$$\begin{pmatrix} A & 0 & 0 \\ 0 & I & -A^T \\ K_i & K_v & 0 \end{pmatrix} \begin{pmatrix} i \\ v \\ e \end{pmatrix} = \begin{pmatrix} 0 \\ 0 \\ S \end{pmatrix}$$

A two-node MNA example:

$$\begin{bmatrix} G_1 + \dfrac{1}{R_2} + \dfrac{1}{R_3} & -\dfrac{1}{R_3} \\ -\dfrac{1}{R_3} & \dfrac{1}{R_3} + \dfrac{1}{R_4} \end{bmatrix} \begin{pmatrix} e_1 \\ e_2 \end{pmatrix} = \begin{pmatrix} I_S \\ 0 \end{pmatrix}$$
2010-11-15 Lecture 9 slide 4
Motivation
• Even in nonlinear circuit analysis, after "linearization", one again has to solve a system of linear equations: Ax = b.
• Many other engineering problems require solving a system of linear equations.
• Typically, the matrix size ranges from thousands to millions.
• The system may need to be solved thousands to millions of times in one simulation.
• That's why we'd like to have very efficient linear solvers!
2010-11-15 Lecture 9 slide 5
Problem Description
Problem: Solve Ax = b, where A is n x n (real, non-singular), x is n x 1, b is n x 1.
Methods:
– Direct Methods (this lecture): Gaussian Elimination, LU Decomposition, Crout
– Indirect, Iterative Methods (another lecture): Gauss-Jacobi, Gauss-Seidel, Successive Over-Relaxation (SOR), Krylov
2010-11-15 Lecture 9 slide 6
Gaussian Elimination -- Example

$$\begin{cases} 2x + y = 5 \\ x + 2y = 4 \end{cases} \qquad \begin{pmatrix} 2 & 1 & 5 \\ 1 & 2 & 4 \end{pmatrix} \;\xrightarrow{\,-\frac{1}{2}\times\text{row 1}\,}\; \begin{pmatrix} 2 & 1 & 5 \\ 0 & \frac{3}{2} & \frac{3}{2} \end{pmatrix} \qquad \begin{cases} x = 2 \\ y = 1 \end{cases}$$

The same elimination yields the LU factorization:

$$\begin{pmatrix} 2 & 1 \\ 1 & 2 \end{pmatrix} = \begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix}$$
2010-11-15 Lecture 9 slide 7
Use of LU Factorization

$$\begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix}$$

Define

$$\begin{pmatrix} u \\ v \end{pmatrix} := \begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix}$$

Solving the L-system:

$$\begin{pmatrix} 1 & 0 \\ \frac{1}{2} & 1 \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ 4 \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \frac{3}{2} \end{pmatrix}$$

Triangular systems are easy to solve (by substitution).
2010-11-15 Lecture 9 slide 8
Use of LU Factorization

Solving the U-system:

$$\begin{pmatrix} 2 & 1 \\ 0 & \frac{3}{2} \end{pmatrix} \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} u \\ v \end{pmatrix} = \begin{pmatrix} 5 \\ \frac{3}{2} \end{pmatrix} \quad\Longrightarrow\quad \begin{pmatrix} x \\ y \end{pmatrix} = \begin{pmatrix} 2 \\ 1 \end{pmatrix}$$
2010-11-15 Lecture 9 slide 9
LU Factorization

$$Ax = (LU)x = L(Ux) = Ly = b$$

1. Let y = Ux.
2. Solve y from Ly = b.
3. Solve x from Ux = y.

The task of LU factorization is to find the elements of the matrices L and U such that A = LU.
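The two triangular solves in steps 2 and 3 can be sketched in C. This is a minimal illustration assuming dense storage, a unit diagonal on L, and a small fixed bound NMAX; the names are ours, not from any simulator:

```c
#include <assert.h>
#include <math.h>

#define NMAX 8

/* Solve Ly = b by forward substitution (L unit lower triangular),
 * then Ux = y by back substitution (U upper triangular). */
void lu_solve(int n, double L[NMAX][NMAX], double U[NMAX][NMAX],
              const double b[NMAX], double x[NMAX])
{
    double y[NMAX];
    /* forward: y[i] = b[i] - sum_{j<i} L[i][j]*y[j], since L[i][i] = 1 */
    for (int i = 0; i < n; i++) {
        y[i] = b[i];
        for (int j = 0; j < i; j++)
            y[i] -= L[i][j] * y[j];
    }
    /* backward: x[i] = (y[i] - sum_{j>i} U[i][j]*x[j]) / U[i][i] */
    for (int i = n - 1; i >= 0; i--) {
        x[i] = y[i];
        for (int j = i + 1; j < n; j++)
            x[i] -= U[i][j] * x[j];
        x[i] /= U[i][i];
    }
}
```

For the 2x2 example of the previous slides (l21 = 1/2, u22 = 3/2, b = (5, 4)) this reproduces x = 2, y = 1.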
2010-11-15 Lecture 9 slide 10
Advantages of LU Factorization
• When solving Ax = b for multiple right-hand sides b with the same A, we LU-factorize A only once.
• In circuit simulation, the entries of A may change, but the structure of A does not.
– This fact can be used to speed up repeated LU factorization.
– Implemented as symbolic factorization in the "sparse1.3" solver in Spice 3f4.
2010-11-15 Lecture 9 slide 11
Gaussian Elimination
• Gaussian elimination is a process of row transformation on Ax = b:

$$A = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ a_{21} & a_{22} & a_{23} & \cdots & a_{2n} \\ a_{31} & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ a_{n1} & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix} \qquad b = \begin{pmatrix} b_1 \\ b_2 \\ b_3 \\ \vdots \\ b_n \end{pmatrix}$$

Eliminate the lower triangular part.
2010-11-15 Lecture 9 slide 12
Gaussian Elimination
Assuming $a_{11} \neq 0$, add $-\frac{a_{i1}}{a_{11}}$ times row 1 to each row $i = 2, \ldots, n$. The first column below $a_{11}$ becomes zero, and the remaining entries of the matrix and of the right-hand side $b$ are updated:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22} & a_{23} & \cdots & a_{2n} \\ 0 & a_{32} & a_{33} & \cdots & a_{3n} \\ \vdots & & & & \vdots \\ 0 & a_{n2} & a_{n3} & \cdots & a_{nn} \end{bmatrix}$$

(entries updated)
2010-11-15 Lecture 9 slide 13
Eliminating the 1st Column
• Column elimination is equivalent to a row transformation $L_1^{-1} A = A^{(2)}$:

$$L_1^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ -\frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ -\frac{a_{31}}{a_{11}} & 0 & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ -\frac{a_{n1}}{a_{11}} & 0 & 0 & \cdots & 1 \end{bmatrix} \qquad A^{(2)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & a_{32}^{(2)} & a_{33}^{(2)} & \cdots & a_{3n}^{(2)} \\ \vdots & & & & \vdots \\ 0 & a_{n2}^{(2)} & a_{n3}^{(2)} & \cdots & a_{nn}^{(2)} \end{bmatrix}$$

The right-hand side transforms accordingly: $(b_1, b_2, \ldots, b_n)^T \to (b_1, b_2^{(2)}, \ldots, b_n^{(2)})^T$.
2010-11-15 Lecture 9 slide 14
Eliminating the 2nd Column
Assuming $a_{22}^{(2)} \neq 0$, apply $L_2^{-1} A^{(2)} = A^{(3)}$:

$$L_2^{-1} = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ 0 & 1 & 0 & \cdots & 0 \\ 0 & -\frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \\ 0 & -\frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & 0 & \cdots & 1 \end{bmatrix} \qquad A^{(3)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & & & & \vdots \\ 0 & 0 & a_{n3}^{(3)} & \cdots & a_{nn}^{(3)} \end{bmatrix}$$

The right-hand side becomes $(b_1, b_2^{(2)}, b_3^{(3)}, \ldots, b_n^{(3)})^T$.
2010-11-15 Lecture 9 slide 15
Continue on Elimination
• Suppose all diagonals (pivots) are nonzero. Repeating the column eliminations:

$$L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1} A = A^{(n)} = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix}$$

which is upper triangular; the system becomes $A^{(n)} x = b^{(n)}$ with $b^{(n)} = (b_1, b_2^{(2)}, b_3^{(3)}, \ldots, b_n^{(n)})^T$.
2010-11-15 Lecture 9 slide 16
Triangular System
Gaussian elimination ends up with the following upper triangular system of equations:

$$\begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ \vdots \\ x_n \end{pmatrix} = \begin{pmatrix} b_1 \\ b_2^{(2)} \\ b_3^{(3)} \\ \vdots \\ b_n^{(n)} \end{pmatrix}$$

Solve this system from the bottom up: $x_n, x_{n-1}, \ldots, x_1$.
2010-11-15 Lecture 9 slide 17
LU Factorization
• Gaussian elimination leads to LU factorization:

$$\left( L_{n-1}^{-1} \cdots L_2^{-1} L_1^{-1} \right) A = U \quad\Longrightarrow\quad A = (L_1 L_2 \cdots L_{n-1})\, U = LU$$

The product $L = L_1 L_2 \cdots L_{n-1}$ is unit lower triangular and simply collects the elimination multipliers:

$$L = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ \frac{a_{21}}{a_{11}} & 1 & 0 & \cdots & 0 \\ \frac{a_{31}}{a_{11}} & \frac{a_{32}^{(2)}}{a_{22}^{(2)}} & 1 & \cdots & 0 \\ \vdots & \vdots & & \ddots & \\ \frac{a_{n1}}{a_{11}} & \frac{a_{n2}^{(2)}}{a_{22}^{(2)}} & \cdots & * & 1 \end{bmatrix} \qquad U = \begin{bmatrix} a_{11} & a_{12} & a_{13} & \cdots & a_{1n} \\ 0 & a_{22}^{(2)} & a_{23}^{(2)} & \cdots & a_{2n}^{(2)} \\ 0 & 0 & a_{33}^{(3)} & \cdots & a_{3n}^{(3)} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & a_{nn}^{(n)} \end{bmatrix}$$
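The elimination that produces L and U can be sketched as in-place C code. This is a generic textbook formulation (dense storage, no pivoting), not any particular simulator's routine; U ends up in the upper triangle and the multipliers, i.e. the entries of L, in the strict lower triangle:

```c
#include <assert.h>
#include <math.h>

#define NMAX 8

/* Gaussian elimination without pivoting: overwrite A with its LU factors.
 * Returns 0 on success, -1 if a zero pivot is encountered. */
int lu_factor(int n, double A[NMAX][NMAX])
{
    for (int k = 0; k < n - 1; k++) {
        if (A[k][k] == 0.0) return -1;      /* would need pivoting */
        for (int i = k + 1; i < n; i++) {
            A[i][k] /= A[k][k];             /* multiplier l_ik, stored in place */
            for (int j = k + 1; j < n; j++)
                A[i][j] -= A[i][k] * A[k][j];
        }
    }
    return 0;
}
```

On the 2x2 example matrix [[2, 1], [1, 2]] this leaves u22 = 3/2 on the diagonal and the multiplier l21 = 1/2 below it.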
2010-11-15 Lecture 9 slide 18
Complexity of LU
Eliminating the first column (requires $a_{11} \neq 0$) uses the multipliers $-\frac{a_{21}}{a_{11}}, -\frac{a_{31}}{a_{11}}, \ldots, -\frac{a_{n1}}{a_{11}}$:

number of mul/div for the first column $= (n-1)\,n \approx n^2$

Summing over all elimination steps:

$$\sum_{i=1}^{n} i^2 = \frac{1}{6}\, n(n+1)(2n+1) \sim O(n^3), \qquad n \gg 1$$
2010-11-15 Lecture 9 slide 19
Cost of Back-Substitution
For the upper triangular system above:

$$x_n = \frac{b_n^{(n)}}{a_{nn}^{(n)}}, \qquad x_{n-1} = \frac{b_{n-1}^{(n-1)} - a_{n-1,n}^{(n-1)}\, x_n}{a_{n-1,n-1}^{(n-1)}}, \qquad \ldots$$

Total number of mul/div:

$$\sum_{i=1}^{n} i = \frac{1}{2}\, n(n+1) \sim O(n^2)$$
2010-11-15 Lecture 9 slide 20
Zero Diagonal
Example 1: A circuit matrix can contain structurally opposing entries (R and -R in symmetric positions). After two steps of Gaussian elimination on such a matrix, the cancellation of the R and -R contributions leaves an exact zero in the next pivot position.

Gaussian elimination cannot continue without exchanging rows or columns.
2010-11-15 Lecture 9 slide 21
Pivoting
Solution 1: Interchange rows to bring a non-zero element into position (k, k).
Solution 2: How about column exchange? Yes; but then the unknowns are re-ordered as well.
In general, both rows and columns can be exchanged!
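Row interchange (partial pivoting) fits naturally into the elimination loop. A sketch under the same dense-storage assumptions as before; the perm array, our own addition here, records the row exchanges so the right-hand side can be permuted consistently before the triangular solves:

```c
#include <assert.h>
#include <math.h>

#define NMAX 8

/* LU with partial pivoting: before eliminating column k, swap in the
 * row r >= k holding the largest |a_rk|. Returns -1 if the column is
 * entirely zero (matrix singular). */
int lu_factor_pp(int n, double A[NMAX][NMAX], int perm[NMAX])
{
    for (int i = 0; i < n; i++) perm[i] = i;
    for (int k = 0; k < n - 1; k++) {
        int r = k;
        for (int i = k + 1; i < n; i++)          /* search rows k..n-1 */
            if (fabs(A[i][k]) > fabs(A[r][k])) r = i;
        if (A[r][k] == 0.0) return -1;
        if (r != k) {                            /* row interchange */
            for (int j = 0; j < n; j++) {
                double t = A[k][j]; A[k][j] = A[r][j]; A[r][j] = t;
            }
            int t = perm[k]; perm[k] = perm[r]; perm[r] = t;
        }
        for (int i = k + 1; i < n; i++) {
            A[i][k] /= A[k][k];
            for (int j = k + 1; j < n; j++)
                A[i][j] -= A[i][k] * A[k][j];
        }
    }
    return 0;
}
```

A matrix with a zero diagonal such as [[0, 1], [1, 1]], which defeats plain elimination, factors cleanly once the rows are swapped.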
2010-11-15 Lecture 9 slide 22
Small Diagonal
Example 2:

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 12.5 & 12.5 \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 6.25 \\ 75 \end{pmatrix}$$

Assume finite arithmetic: 3-digit floating point. Pivoting on the small element $1.25 \times 10^{-4}$, the multiplier is $10^5$ and elimination gives

$$\begin{bmatrix} 1.25 \times 10^{-4} & 1.25 \\ 0 & -1.25 \times 10^{5} \end{bmatrix} \begin{pmatrix} x_1 \\ x_2 \end{pmatrix} = \begin{pmatrix} 6.25 \\ -6.25 \times 10^{5} \end{pmatrix}$$

where the original entry 12.5 is rounded off in $12.5 - 1.25 \times 10^5$. Back substitution then yields

$$x_2 = 5, \qquad (1.25 \times 10^{-4})\, x_1 + 1.25 \times 5 = 6.25 \;\Rightarrow\; x_1 = 0$$

Unfortunately, $(0, 5)$ is not the solution. Considering the 2nd equation: $12.5 \times 0 + 12.5 \times 5 = 62.5 \neq 75$.
2010-11-15 Lecture 9 slide 23
Accuracy Depends on Pivoting
Reason: $a_{11}$ (the pivot) is too small relative to the other entries!
Solution: Don't choose a small element for elimination. Pick a large element by row / column interchanges.
The correct solution to 5-digit accuracy is $x_1 = 1.0001$, $x_2 = 5.0000$.
2010-11-15 Lecture 9 slide 24
What causes the accuracy problem?
• Ill conditioning: the A matrix is close to singular.
• Round-off error: relative magnitudes are too big, resulting in numerical errors.

For example, the system $x + y = 1$, $x - y = 0$ is well conditioned, whereas a pair of nearly parallel equations such as $x - y = 0$ together with $10x - 10.01y = 0.01$ is ill-conditioned: small round-off in the coefficients produces large changes in the solution.
2010-11-15 Lecture 9 slide 25
Pivoting Strategies
1. Partial Pivoting
2. Complete Pivoting
3. Threshold Pivoting
2010-11-15 Lecture 9 slide 26
Pivoting Strategy 1
1. Partial Pivoting (row interchange only):
Choose r as the smallest integer such that

$$\left| a_{rk}^{(k)} \right| = \max_{j = k, \ldots, n} \left| a_{jk}^{(k)} \right|$$

Search area: rows k to n of column k in the reduced matrix.
2010-11-15 Lecture 9 slide 27
Pivoting Strategy 2
2. Complete Pivoting (row and column interchange):
Choose r and s as the smallest integers such that

$$\left| a_{rs}^{(k)} \right| = \max_{i,\, j = k, \ldots, n} \left| a_{ij}^{(k)} \right|$$

Search area: rows k to n and columns k to n of the reduced matrix.
2010-11-15 Lecture 9 slide 28
Pivoting Strategy 3
3. Threshold Pivoting (implemented in Spice 3f4; $\varepsilon$ is user-specified):
a. Apply partial pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon \left| a_{rk}^{(k)} \right|$, where $\left| a_{rk}^{(k)} \right| = \max_{j=k,\ldots,n} \left| a_{jk}^{(k)} \right|$
b. Apply complete pivoting only if $\left| a_{kk}^{(k)} \right| < \varepsilon \left| a_{rs}^{(k)} \right|$, where $\left| a_{rs}^{(k)} \right| = \max_{i,j=k,\ldots,n} \left| a_{ij}^{(k)} \right|$
2010-11-15 Lecture 9 slide 29
Variants of LU Factorization
• Doolittle Method
• Crout Method
• Motivated by directly filling in the L/U elements in the storage space of the original matrix A:

$$A = LU = \begin{bmatrix} 1 & 0 & 0 & \cdots & 0 \\ l_{21} & 1 & 0 & \cdots & 0 \\ l_{31} & l_{32} & 1 & \cdots & 0 \\ \vdots & & & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & \cdots & 1 \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & u_{22} & u_{23} & \cdots & u_{2n} \\ 0 & 0 & u_{33} & \cdots & u_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & u_{nn} \end{bmatrix}$$

Both factors reuse the storage of A (the unit diagonal of L is not stored).
2010-11-15 Lecture 9 slide 30
Variants of LU Factorization
Since L and U overwrite A in the same storage, we need a sequential method that processes the rows and columns of A in a certain order, so that processed rows / columns are not needed in the later processing.
2010-11-15 Lecture 9 slide 31
Doolittle Method -- 1
First solve the 1st row of U, i.e., U(1, :). Comparing the first row of A = LU (the first row of L is (1, 0, ..., 0)):

$$\left( u_{11}\;\; u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n} \right) = \left( a_{11}\;\; a_{12}\;\; a_{13}\;\; \cdots\;\; a_{1n} \right)$$

Keep this row.
2010-11-15 Lecture 9 slide 32
Doolittle Method -- 2
Then solve the 1st column of L, i.e., L(2:n, 1). Comparing the first column of A = LU, with $u_{11} = a_{11}$:

$$\begin{pmatrix} l_{21} \\ l_{31} \\ \vdots \\ l_{n1} \end{pmatrix} u_{11} = \begin{pmatrix} a_{21} \\ a_{31} \\ \vdots \\ a_{n1} \end{pmatrix} \quad\Longrightarrow\quad l_{i1} = \frac{a_{i1}}{u_{11}}$$
2010-11-15 Lecture 9 slide 33
Doolittle Method -- 3
Solve the 2nd row of U, i.e., U(2, 2:n). Comparing the second row of A = LU:

$$\left( a_{22}\;\; a_{23}\;\; \cdots\;\; a_{2n} \right) = l_{21} \left( u_{12}\;\; u_{13}\;\; \cdots\;\; u_{1n} \right) + \left( u_{22}\;\; u_{23}\;\; \cdots\;\; u_{2n} \right)$$

$$\Longrightarrow\quad u_{2j} = a_{2j} - l_{21} u_{1j}, \qquad j = 2, \ldots, n$$
2010-11-15 Lecture 9 slide 34
Doolittle Method -- 4
The computation order of the Doolittle Method alternates rows of U and columns of L: the odd steps (1, 3, 5, ...) fill successive rows of U and the even steps (2, 4, 6, ...) fill successive columns of L, so every element of A is read exactly once before being overwritten.
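The Doolittle recurrences (row k of U, then column k of L) can be written compactly as in-place C code; this is a generic textbook sketch over dense storage:

```c
#include <assert.h>
#include <math.h>

#define NMAX 8

/* Doolittle factorization: for each k, compute row k of U, then column k
 * of L, writing both into A (the unit diagonal of L is implicit). */
int doolittle(int n, double A[NMAX][NMAX])
{
    for (int k = 0; k < n; k++) {
        /* row k of U:  u_kj = a_kj - sum_{m<k} l_km * u_mj */
        for (int j = k; j < n; j++)
            for (int m = 0; m < k; m++)
                A[k][j] -= A[k][m] * A[m][j];
        if (A[k][k] == 0.0) return -1;       /* zero pivot */
        /* column k of L:  l_ik = (a_ik - sum_{m<k} l_im * u_mk) / u_kk */
        for (int i = k + 1; i < n; i++) {
            for (int m = 0; m < k; m++)
                A[i][k] -= A[i][m] * A[m][k];
            A[i][k] /= A[k][k];
        }
    }
    return 0;
}
```

The result coincides with what Gaussian elimination produces; only the order in which entries are computed differs.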
2010-11-15 Lecture 9 slide 35
Crout Method
• Similar to the Doolittle Method, but starts from the 1st column (Doolittle starts from the 1st row). The computation order alternates columns of L (steps 1, 3, 5, ...) and rows of U (steps 2, 4, 6, ...):

$$A = LU = \begin{bmatrix} l_{11} & 0 & 0 & \cdots & 0 \\ l_{21} & l_{22} & 0 & \cdots & 0 \\ l_{31} & l_{32} & l_{33} & \cdots & 0 \\ \vdots & & & \ddots & \\ l_{n1} & l_{n2} & l_{n3} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} 1 & u_{12} & u_{13} & \cdots & u_{1n} \\ 0 & 1 & u_{23} & \cdots & u_{2n} \\ 0 & 0 & 1 & \cdots & u_{3n} \\ \vdots & & & \ddots & \vdots \\ 0 & 0 & 0 & \cdots & 1 \end{bmatrix}$$

The diagonals of U are normalized!
2010-11-15 Lecture 9 slide 36
Storage of LU Factorization
• The partially factored matrices $A^{(3)}, A^{(4)}, \ldots$ accumulate the finished rows of U and columns of L in place: only one 2-dimensional array is used!
• In a sparse matrix implementation, this type of storage requires increasing memory space because of fill-ins during the factorization.
2010-11-15 Lecture 9 slide 37
Summary
• LU factorization has been used in virtually all circuit simulators
– Good for multiple RHS and sensitivity calculation
• Pivoting is required to handle zero diagonals and to improve numerical accuracy
– Partial pivoting (row exchange): a tradeoff between accuracy and efficiency
– The matrix condition number is used to analyze the effect of round-off errors and numerical stability
2010-11-15 Slide 38
PRINCIPLES OF CIRCUIT SIMULATION
Part 2. Programming Techniques for Sparse Matrices
2010-11-15 Lecture 9 slide 39
Outline
• Why Sparse Matrix Techniques?
• Sparse Matrix Data Structure
• Markowitz Pivoting
• Diagonal Pivoting for MNA Matrices
• Modified Markowitz Pivoting
• How to Handle Sparse RHS
• Summary
2010-11-15 Lecture 9 slide 40
Why Sparse Matrix?
Motivation:
– n = 10^3 equations
– Complexity of Gaussian elimination ~ O(n^3)
– n = 10^3 means ~ 10^9 flops: about 10 s on a 1 GHz computer; storage ~ 10^6 words
Exploiting Sparsity:
– An MNA matrix has ~3 nonzeros per row
– Gaussian elimination can then reach a complexity of ~ O(n^1.1) to O(n^1.5) (empirical)
2010-11-15 Lecture 9 slide 41
Sparse Matrix Programming
• Use a linked-list data structure
– to avoid storing zeros
– used to be hard before the 1980s: in Fortran!
• Avoid trivial operations: 0*x = 0, 0+x = x
• Two kinds of zero:
– Structural zeros: always 0, independent of numerical operations
– Numerical zeros: resulting from computation
• Avoid losing sparsity (very important!)
– sparsity changes with pivoting
2010-11-15 Lecture 9 slide 42
Non-zero Fill-ins
• Gaussian elimination causes nonzero fill-ins. Adding (-1) times row 1 to row 3:

before:                                 after:
3  4  5  0   6  0  0  1   2  8         3   4   5  0   6  0  0  1   2   8
5  0 -3  2   6  1  2  7   0  0         5   0  -3  2   6  1  2  7   0   0
3  0  0  0  -1  0  0  3  -2  4         0  -4  -5  0  -7  0  0  2  -4  -4
0  4  8  0  10  0  0  2   2  3         0   4   8  0  10  0  0  2   2   3
1  5  4  0   0  0  0  1   0  6         1   5   4  0   0  0  0  1   0   6

The entries -4 and -5 in columns 2 and 3 of the new row 3 are fill-ins: those positions were structural zeros before the elimination step.
2010-11-15 Lecture 9 slide 43
How to Maintain Sparsity
• One should choose appropriate pivoting (during Gaussian Elimination, G.E.) to avoid a large increase of fill-ins.

With a dense first row and first column, G.E. fills in the entire matrix:

x x x x x x        x x x x x x
x x . . . .        x x x x x x
x . x . . .   =>   x x x x x x
x . . x . .        x x x x x x
x . . . x .        x x x x x x
x . . . . x        x x x x x x
before GE          after GE

After row/column reordering that moves the dense row and column last, no fill-ins are introduced.
2010-11-15 Lecture 9 slide 44
Markowitz Criterion
• Markowitz criterion:
– at the kth pivot step, A^(k) is the reduced matrix
– NZ = nonzero
– The number of nonzeros in a row (column) of A^(k) is called the row (column) degree.
– The column degrees can be used for column ordering.
For each candidate pivot, count the nonzeros remaining in its row (row degree r_i) and in its column (column degree c_j) of the reduced matrix.
2010-11-15 Lecture 9 slide 45
Markowitz Product
• If Gaussian Elimination is to pivot on (i, j):
• Markowitz product = (r_i - 1)(c_j - 1)
= the maximum possible number of fill-ins when pivoting at (i, j)
• Recommendations (implemented in Sparse1.3):
– Best: largest magnitude of pivot element together with smallest Markowitz product
– Try a threshold test after choosing the smallest Markowitz product (M.P.)
– Break ties (equal M.P.) by choosing the element with the largest magnitude
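A sketch of Markowitz pivot selection over a boolean nonzero pattern. For clarity the degrees are recomputed from scratch each call; a real solver such as Sparse1.3 maintains them incrementally and combines the product with the magnitude and threshold tests described above:

```c
#include <assert.h>

#define NMAX 8

/* Pick the pivot in the reduced matrix (rows/cols k..n-1) with the
 * smallest Markowitz product (r_i - 1)(c_j - 1); nz[i][j] != 0 marks
 * a structural nonzero. */
void markowitz_pivot(int n, int k, int nz[NMAX][NMAX], int *pr, int *pc)
{
    int rdeg[NMAX] = {0}, cdeg[NMAX] = {0};
    for (int i = k; i < n; i++)
        for (int j = k; j < n; j++)
            if (nz[i][j]) { rdeg[i]++; cdeg[j]++; }

    int best = -1;
    for (int i = k; i < n; i++)
        for (int j = k; j < n; j++) {
            if (!nz[i][j]) continue;             /* pivot must be a nonzero */
            int mp = (rdeg[i] - 1) * (cdeg[j] - 1);
            if (best < 0 || mp < best) { best = mp; *pr = i; *pc = j; }
        }
}
```

On a 3x3 pattern with a dense first row and column, the routine avoids the corner element (Markowitz product 4) and picks a diagonal element with product 1, which creates no fill-in.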
2010-11-15 Lecture 9 slide 46
Sparse Matrix Data Structure
Example matrix (r \ c = row \ column):

      c1   c2   c3
r1   1.0  1.2   0
r2    0   1.5   0
r3   2.1   0   1.7

Matrix element structure:

    struct elem {
        double value;
        int row;
        int col;
        struct elem *next_in_row;
        struct elem *next_in_col;
    };
    typedef struct elem Element;
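A sketch of how such a row list can be searched and extended while stamping, using the element structure above (the field types and the helper name get_in_row are our assumptions for illustration; Sparse 1.3's actual internals differ):

```c
#include <assert.h>
#include <stdlib.h>

/* One matrix element, as in the slide's structure. */
typedef struct elem {
    double value;
    int row, col;
    struct elem *next_in_row;
    struct elem *next_in_col;
} Element;

/* Find, or insert, the element (row, col) in a row list kept sorted by
 * column index; returns the element so a stamp can be accumulated. */
Element *get_in_row(Element **first, int row, int col)
{
    Element **pp = first;
    while (*pp && (*pp)->col < col)
        pp = &(*pp)->next_in_row;
    if (*pp && (*pp)->col == col)
        return *pp;                        /* structural nonzero exists */
    Element *e = calloc(1, sizeof *e);     /* create a new nonzero */
    e->row = row;
    e->col = col;
    e->next_in_row = *pp;                  /* splice into the sorted list */
    *pp = e;
    return e;
}
```

Stamping the same position twice simply accumulates into the existing element, which is exactly what MNA device stamping needs.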
2010-11-15 Lecture 9 slide 47
Data Structure in Sparse 1.3
• Sparse 1.3 was written by Ken Kundert, 1985~1988, then a PhD student at Berkeley, later with Cadence Design Systems, Inc.
• Each element stores (value, row, col) and is linked both row-wise and column-wise: FirstInRow[k] and FirstInCol[k] point at the first element of row/column k, and diag[k] points directly at the kth diagonal element. For the example matrix:

FirstInRow[1] -> (1.0, 1, 1) -> (1.2, 1, 2)     diag[1] = (1.0, 1, 1)
FirstInRow[2] -> (1.5, 2, 2)                    diag[2] = (1.5, 2, 2)
FirstInRow[3] -> (2.1, 3, 1) -> (1.7, 3, 3)     diag[3] = (1.7, 3, 3)
FirstInCol[1], FirstInCol[2], FirstInCol[3] link the same elements column-wise.
2010-11-15 Lecture 9 slide 48
ASTAP Data Structure
• ASTAP is an IBM simulator using STA (Sparse Tableau Analysis).
• Values are stored row-wise: the Row Pointers point to the beginning of each row's Col Indices, and the nonzeros of a row are listed contiguously by column index. For the example matrix:

Row Pointers:   1   3   4   6
Col Indices:    1   2   2   1   3
Values:        1.0 1.2 1.5 2.1 1.7

• This row-wise layout (now commonly called compressed sparse row) is used by many iterative sparse solvers.
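A sketch of the basic operation this row-wise layout makes cheap, sparse matrix-vector multiplication. Indices here are 0-based, whereas the slide's arrays are 1-based:

```c
#include <assert.h>
#include <math.h>

/* y = A*x for a row-wise (CSR / ASTAP-style) matrix: entries of row i
 * live at positions rowptr[i] .. rowptr[i+1]-1 of colind[] and val[]. */
void csr_matvec(int n, const int *rowptr, const int *colind,
                const double *val, const double *x, double *y)
{
    for (int i = 0; i < n; i++) {
        y[i] = 0.0;
        for (int p = rowptr[i]; p < rowptr[i + 1]; p++)
            y[i] += val[p] * x[colind[p]];
    }
}
```

Only the stored nonzeros are touched, so the cost is proportional to the number of nonzeros rather than to n^2.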
2010-11-15 Lecture 9 slide 49
Key Loops in a SPICE Program
Transient analysis discretizes the circuit equations

$$C \frac{dx}{dt} = f(x, t)$$

e.g. by backward Euler,

$$C x_{n+1} = C x_n + h \cdot f(x_{n+1}, t_{n+1})$$

and solves each time point by Newton-Raphson iteration with the Jacobian

$$A = \frac{\partial f\!\left(x_{n+1}^{(k)}, t_{n+1}\right)}{\partial x}, \qquad \left[ C - h \frac{\partial f}{\partial x} \right] \Delta x^{(k)} = -\left[ C x_{n+1}^{(k)} - C x_n - h\, f\!\left(x_{n+1}^{(k)}, t_{n+1}\right) \right]$$

The loop structure:
Newton-Raphson (at point x) -> invoke linear solver -> x := x + Δx -> (on convergence) update stamps related to time -> t := t + Δt
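The two nested loops can be illustrated on a scalar toy problem. The nonlinearity f(x) = -x^2 is a made-up example, and the scalar division below stands in for the "invoke linear solver" step that, in a real simulator, solves the Jacobian matrix system:

```c
#include <assert.h>
#include <math.h>

/* One backward-Euler step of c*x' = f(x) with f(x) = -x*x (hypothetical),
 * solved by Newton-Raphson: find x with c*(x - xn) + h*x*x = 0. */
double be_step(double c, double xn, double h)
{
    double x = xn;                              /* initial Newton guess */
    for (int k = 0; k < 50; k++) {
        double g  = c * (x - xn) + h * x * x;   /* residual */
        double dg = c + 2.0 * h * x;            /* scalar "Jacobian" */
        double dx = -g / dg;                    /* linear "solve" */
        x += dx;                                /* x := x + dx */
        if (fabs(dx) < 1e-14) break;            /* convergence test */
    }
    return x;
}
```

The outer time loop would simply call be_step repeatedly while advancing t := t + h.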
2010-11-15 Lecture 9 slide 50
Linear Solves in Simulation
At each time point along the time axis, Ax = b has to be solved many times: Newton-Raphson (at point x) -> invoke linear solver -> x := x + Δx, iterated until convergence; then the time-dependent stamps are updated and the simulation advances, t := t + Δt.
2010-11-15 Lecture 9 slide 51
Structure of Matrix Stamps
• In circuit simulation, the matrix being solved repeatedly has the same structure;
• only some entries vary at different frequency or time points.
Typical matrix structure, with entries marked:
C = Constant
T = Time varying
X = Nonlinear (varying even at the same time point)
Nonlinear-device stamps (X) cluster in some rows/columns, time-varying stamps (T) in others, and the rest of the matrix stays constant (C).
2010-11-15 Lecture 9 slide 52
Strategies for Efficiency
• Utilizing this structural information can greatly improve the solving efficiency.
• Strategies:
– Weighted Markowitz Product
– Reuse of the LU factorization
– Iterative solver (by conditioning)
– ...
2010-11-15 Lecture 9 slide 53
A Good (Sparse) LU Solver
Properties of a good LU solver:
• Should have a good column ordering algorithm.
• With a good column ordering, partial (row) pivoting is enough!
• Should separate ordering from elimination:
– SuperLU does this,
– but Sparse1.3 doesn't.
2010-11-15 Lecture 9 slide 54
Optimal Ordering is NP-hard
• The ordering has a significant impact on the memory and computational requirements of the later stages.
• However, finding the optimal ordering for A (in the sense of minimizing fill-in) has been proven to be NP-complete.
• Heuristics must be used for all but simple (or specially structured) cases.
M.R. Garey and D.S. Johnson, Computers and Intractability: A Guide to the Theory of NP-Completeness, W.H. Freeman, New York, 1979.
2010-11-15 Lecture 9 slide 55
Column Ordering
Why important?
• A good column ordering greatly reduces the number of fill-ins, resulting in a vast speedup.
• However, searching for a minimum-degree pivot at each step (as in Sparse 1.3) is not efficient.
• It is best to compute a good ordering before elimination (as SuperLU does), but that is not easy!
2010-11-15 Lecture 9 slide 56
Available Ordering Algorithms
SuperLU uses the following algorithms:
• Multiple Minimum Degree (MMD) applied to the structure of A^T A. – Mostly good
• Multiple Minimum Degree (MMD) applied to the structure of A^T + A. – Mostly good
• Column Approximate Minimum Degree (COLAMD). – Mostly not good!
2010-11-15 Lecture 9 slide 57
Summary
• Exploiting sparsity reduces CPU time and memory.
• The Markowitz algorithm reflects a good tradeoff between overhead (computing the Markowitz products) and savings (fewer fill-ins).
• Use weighted Markowitz products to account for the different types of element stamps in nonlinear dynamic circuit simulation.
• Consider sparse RHS and selective unknowns for further speedup.
2010-11-15 Lecture 9 slide 58
No-turn-in Exercise
• Spice3f4 contains a solver called Sparse 1.3 (in src/lib/sparse).
• This is an independent solver that can be used outside Spice3f4.
• Download the sparse package from the course web page (sparse.tar.gz) (or ask the TA).
• Find the test program called "spTest.c".
• Modify this program if necessary so that you can run the solver.
• Create some test matrices to test the sparse solver.
• Compare the solved results to those from MATLAB.
2010-11-15 Lecture 9 slide 59
Software
• Sparse1.3 is in C and was programmed by Dr. Ken Kundert (fellow of Cadence; architect of Spectre). Source code is available from http://www.netlib.org/sparse/
• SparseLib++ is in C++ and comes from NIST. The authors are J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington. See "A Sparse Matrix Library in C++ for High Performance Architectures", Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994. The paper and the C++ source code are available from http://math.nist.gov/sparselib%2b%2b/
2010-11-15 Lecture 9 slide 60
References
1. G. Dahlquist and A. Bjorck, Numerical Methods (translated by N. Anderson), Prentice-Hall, Englewood Cliffs, New Jersey, 1974.
2. W. J. McCalla, Fundamentals of Computer-Aided Circuit Simulation, Kluwer Academic Publishers. Chapter 3, "Sparse Matrix Methods".
3. A. Ruehli (Ed.), Circuit Analysis, Simulation and Design, North-Holland, 1986. K. Kundert, "Sparse Matrix Techniques".
4. J. Dongarra, A. Lumsdaine, R. Pozo, and K. Remington, "A Sparse Matrix Library in C++ for High Performance Architectures", Proc. of the Second Object Oriented Numerics Conference, pp. 214-218, 1994.