methods to invert a matrix - cleveland state university info on web... · 2013-03-20 · lecture 3:...
TRANSCRIPT
Lecture 3: Determinants & Matrix Inversion
Methods to Invert a Matrix
The approaches available to find the inverse of a matrix are extensive and diverse. All methods seek to solve a linear system of equations that can be expressed in a matrix format as
\[ [A]\{x\} = \{b\} \]
for the unknowns contained in the vector {x}, i.e.,
\[ \{x\} = [A]^{-1}\{b\} \]
The methods used to accomplish this can be loosely grouped into the following three categories:
• methods that explicitly calculate {x},
• methods that implicitly calculate {x}, and
• iterative methods that calculate {x}.
Of course hybrid methods exist that are combinations of two or more methods in the categories listed above.
Consider the following list of methods which is not comprehensive:
1. Explicit methods for sparse matrices – includes Cramer’s rule, which is a specific case of using self adjoint matrices. Variations include:
   a. Gauss elimination
      - Row echelon form; all entries below a nonzero entry in the matrix are zero
      - Bareiss algorithm; every element computed is the determinant of a submatrix of [A]
      - Tri-diagonal matrix algorithm; special form of Gauss elimination
   b. Gauss-Jordan elimination
2. LDU decomposition – an implicit method that factors [A] into a product of lower and upper triangular matrices and a diagonal matrix. Variations include:
   a. LU reduction – a special parallelized version of an LDU decomposition algorithm
      - Crout matrix decomposition is a special type of LU decomposition
3. Cholesky LDL decomposition – an implicit method that decomposes [A], when it is a positive-definite matrix, into the product of a lower triangular matrix, a diagonal matrix and the conjugate transpose
   a. Frontal solvers used in finite element methods
   b. Nested dissection – for symmetric matrices, based on graph partitioning
   c. Minimum degree algorithm
   d. Symbolic Cholesky decomposition
5. Iterative methods:
   a. Gauss-Seidel methods
      • Successive over relaxation (SOR)
      • Back fit algorithms
   b. Conjugate gradient methods (CG) – used often in optimization problems
      • Nonlinear conjugate gradient method
      • Biconjugate gradient method (BiCG)
      • Biconjugate gradient stabilized method (BiCGSTAB)
      • Conjugate residual method
   c. Jacobi method
   d. Modified Richardson iteration
   e. Generalized minimal residual method (GMRES) – based on the Arnoldi iteration
   f. Chebyshev iteration – avoids inner products but needs bounds on the spectrum
   g. Stone's method (SIP: Strongly Implicit Procedure) – uses an incomplete LU decomposition
   h. Kaczmarz method
   i. Iterative refinement – procedure to turn an inaccurate solution into a more accurate one
6. Levinson recursion – for Toeplitz matrices
7. SPIKE algorithm – hybrid parallel solver for narrow-banded matrices
Details and derivations are presented for several of the methods in this section of the notes. Concepts associated with determinants, cofactors and minors are presented first.
The Determinant of a Square Matrix
A square matrix of order n (an n x n matrix), i.e.,
\[ [A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]
possesses a uniquely defined scalar that is designated as the determinant of the matrix, or merely the determinant
\[ \det[A] = |A| \]
Observe that only square matrices possess determinants.
Vertical lines and not brackets designate a determinant, and while det[A] is a number and has no elements, it is customary to represent it as an array of elements of the matrix
\[ \det[A] = |A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \]
A general procedure for finding the value of a determinant is sometimes called “expansion by minors.” We will discuss this method after going over some ground rules for operating with determinants.
Rules for Operating with Determinants
Rules pertaining to the manipulation of determinants are presented in this section without formal proof. Their validity is demonstrated through examples presented at the end of the section.
Rule #1: Interchanging any row (or column) of a determinant with its immediately adjacent row (or column) flips the sign of the determinant.
Rule #2: The multiplication of any single row (column) of a determinant by a scalar constant is equivalent to the multiplication of the determinant by the scalar.
Rule #3: If any two rows (columns) of a determinant are identical, the value of the determinant is zero and the matrix from which the determinant is derived is said to be singular.
Rule #4: If any row (column) of a determinant contains nothing but zeroes then the matrix from which the determinant is derived is singular.
Rule #5: If any two rows (two columns) of a determinant are proportional, i.e., the two rows (two columns) are linearly dependent, then the determinant is zero and the matrix from which the determinant is derived is singular.
Rule #6: If the elements of any row (column) of a determinant are added to or subtracted from the corresponding elements of another row (column) the value of the determinant is unchanged.
Rule #6a: If the elements of any row (column) of a determinant are multiplied by a constant and then added to or subtracted from the corresponding elements of another row (column), the value of the determinant is unchanged.
Rule #7: The value of the determinant of a diagonal matrix is equal to the product of the terms on the diagonal.
Rule #8: The value for the determinant of a matrix is equal to the value of the determinant of the transpose of the matrix.
Rule #9: The determinant of the product of two matrices is equal to the product of the determinants of the two matrices.
Rule #10: If the determinant of the product of two square matrices is zero, then at least one of the two matrices is singular.
Rule #11: If an m x n rectangular matrix A is post-multiplied by an n x m rectangular matrix B, the resulting square matrix [C] = [A][B] of order m will, in general, be singular if m > n.
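Several of the rules above can be spot-checked numerically. The following is a minimal sketch, not part of the original notes: the helper names `det`, `transpose` and `matmul` and the sample matrices are illustrative assumptions, and exact rational arithmetic is used so the comparisons are not affected by round-off.

```python
from fractions import Fraction

def det(m):
    """Determinant by Gaussian elimination in exact rational arithmetic."""
    a = [[Fraction(v) for v in row] for row in m]
    n, sign, result = len(a), 1, Fraction(1)
    for c in range(n):
        # Look for a row at or below row c with a nonzero entry in column c.
        p = next((r for r in range(c, n) if a[r][c] != 0), None)
        if p is None:
            return Fraction(0)           # no pivot available: singular matrix
        if p != c:
            a[c], a[p] = a[p], a[c]      # a row interchange flips the sign
            sign = -sign
        for r in range(c + 1, n):
            f = a[r][c] / a[c][c]
            # Rule #6a: subtracting a multiple of one row from another
            # leaves the determinant unchanged.
            a[r] = [x - f * y for x, y in zip(a[r], a[c])]
        result *= a[c][c]                # product of the diagonal terms
    return sign * result

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

# Hypothetical sample matrices chosen only for illustration.
A = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]
B = [[1, 2, 0], [3, 1, 1], [0, 2, 4]]

swapped = [A[1], A[0], A[2]]                    # Rule #1: adjacent rows swapped
scaled = [A[0], [7 * v for v in A[1]], A[2]]    # Rule #2: one row times 7
tall = [[1, 2], [3, 4], [5, 6]]                 # Rule #11: 3 x 2 times 2 x 3
wide = [[1, 0, 2], [0, 1, 3]]

print(det(A), det(swapped), det(scaled), det(transpose(A)))
print(det(matmul(A, B)) == det(A) * det(B), det(matmul(tall, wide)))
```

Each printed value corresponds to one rule: the sign flip (Rule #1), the factor of 7 (Rule #2), the transpose (Rule #8), the product (Rule #9) and the singular product of rectangular matrices (Rule #11).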
In Class Example
Minors and Cofactors
Consider the nth order determinant:
\[ \det[A] = |A| = \begin{vmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{vmatrix} \]
The mth order minor of the nth order matrix is the determinant formed by deleting (n – m) rows and (n – m) columns in the nth order determinant. For example, the minor $|M|_{ir}$ of the determinant |A| is formed by deleting the ith row and the rth column. Because |A| is an nth order determinant, the minor $|M|_{ir}$ is of order m = n – 1 and contains $m^2$ elements.
In general, a minor formed by deleting p rows and p columns in the nth order determinant |A| is an (n – p)th order minor. If p = n – 1, the minor is of first order and contains only a single element from |A|.
From this it is easy to see that the determinant |A| contains $n^2$ first order minors, each containing a single element.
When dealing with minors other than the (n – 1)th order, the designation of the eliminated rows and columns of the determinant |A| must be considered carefully. It is best to consider consecutive rows j, k, l, m … and consecutive columns r, s, t, u … so that the (n – 1)th, (n – 2)th, and (n – 3)th order minors would be designated, respectively, as $|M|_{j,r}$, $|M|_{jk,rs}$ and $|M|_{jkl,rst}$.
The complementary minor, or the complement of the minor, is designated as |N| (with subscripts). This minor is the determinant formed by placing the elements that lie at the intersections of the deleted rows and columns of the original determinant into a square array in the same order that they appear in the original determinant. For example, given the nth order determinant above, then
\[ |N|_{23} = a_{23} \qquad\qquad |N|_{23,31} = \begin{vmatrix} a_{21} & a_{23} \\ a_{31} & a_{33} \end{vmatrix} \]
The algebraic complement of the minor |M| is the “signed” complementary minor. If a minor is obtained by deleting rows i, k, l and columns r, s, t from the determinant |A| the minor is designated
\[ |M|_{ikl,rst} \]
the complementary minor is designated
\[ |N|_{ikl,rst} \]
and the algebraic complement is designated
\[ (-1)^{\,i+k+l+\cdots+r+s+t+\cdots}\,|N|_{ikl,rst} \]
The cofactor, designated with capital letters and subscripts, is the signed (n – 1)th minor formed from the nth order determinant. Suppose that the (n – 1)th order minor is formed by deleting the ith row and jth column from the determinant |A|. Then the corresponding cofactor is
\[ A_{ij} = (-1)^{i+j}\,|M|_{ij} \]
Observe the cofactor has no meaning for minors with orders smaller than (n – 1) unless the minor itself is being treated as a determinant of order one less than the determinant |A| from which it was derived.
Also observe that when the minor is order (n – 1), the product of the cofactor and the complement is equal to the product of the minor and the algebraic complement.
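We can assemble the cofactors of a square matrix of order n (an n x n matrix) into a square cofactor matrix, i.e.,
\[ [A^C] = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix} \]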
So when the elements of a matrix are denoted with capital letters the matrix represents a matrix of cofactors for another matrix.
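The definitions of the (n – 1)th order minor, the cofactor and the cofactor matrix translate directly into a few lines of code. This is a small sketch under stated assumptions: the function names and the sample matrix are hypothetical, indices are 0-based (the parity of i + j is the same as for 1-based indices), and the recursive determinant is only practical for small matrices.

```python
def submatrix(m, i, j):
    """Copy of m with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row; adequate for small n."""
    if not m:
        return 1                     # empty determinant, so the 1 x 1 case works
    return sum((-1) ** j * m[0][j] * det(submatrix(m, 0, j)) for j in range(len(m)))

def minor(m, i, j):
    """(n - 1)th order minor: delete row i and column j, take the determinant."""
    return det(submatrix(m, i, j))

def cofactor(m, i, j):
    """Signed minor; (i + j) has the same parity for 0- and 1-based indices."""
    return (-1) ** (i + j) * minor(m, i, j)

def cofactor_matrix(m):
    n = len(m)
    return [[cofactor(m, i, j) for j in range(n)] for i in range(n)]

# Hypothetical sample matrix chosen only for illustration.
A = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]
print(minor(A, 1, 2), cofactor(A, 1, 2))
print(cofactor_matrix(A))
```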
In Class Example
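Rules for Operations with Cofactors
The determinant for a three by three matrix can be computed via the expansion of the matrix by minors as follows:
\[ \det[A] = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}\begin{vmatrix} a_{22} & a_{23} \\ a_{32} & a_{33} \end{vmatrix} - a_{21}\begin{vmatrix} a_{12} & a_{13} \\ a_{32} & a_{33} \end{vmatrix} + a_{31}\begin{vmatrix} a_{12} & a_{13} \\ a_{22} & a_{23} \end{vmatrix} \]
This can be confirmed using the classic expansion technique for 3 x 3 determinants. This expression can be rewritten as:
\[ \det[A] = \begin{vmatrix} a_{11} & a_{12} & a_{13} \\ a_{21} & a_{22} & a_{23} \\ a_{31} & a_{32} & a_{33} \end{vmatrix} = a_{11}|M|_{11} - a_{21}|M|_{21} + a_{31}|M|_{31} \]
or using cofactor notation:
\[ \det[A] = |A| = a_{11}A_{11} + a_{21}A_{21} + a_{31}A_{31} \]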
Rule #12: A determinant may be evaluated by summing the products of every element in any row or column by the respective cofactor. This is known as Laplace’s expansion.
Rule #13: If all cofactors in a row or a column are zero, the determinant is zero and the matrix from which they are derived is singular.
Rule #14: If the elements in a row or a column of a determinant are multiplied by cofactors of the corresponding elements of a different row or column, the resulting sum of these products is zero.
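Rules #12 and #14 can be checked on a small example. The sketch below is illustrative only: the helper names (`expand_row`, `expand_col`, `mixed_rows`) and the sample matrix are assumptions, and indices are 0-based.

```python
def submatrix(m, i, j):
    """Copy of m with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row; adequate for small n."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(submatrix(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    return (-1) ** (i + j) * det(submatrix(m, i, j))

def expand_row(m, i):
    """Rule #12: Laplace expansion along row i."""
    return sum(m[i][j] * cofactor(m, i, j) for j in range(len(m)))

def expand_col(m, j):
    """Rule #12: Laplace expansion along column j."""
    return sum(m[i][j] * cofactor(m, i, j) for i in range(len(m)))

def mixed_rows(m, i, k):
    """Rule #14: elements of row i times the cofactors of a different row k."""
    return sum(m[i][j] * cofactor(m, k, j) for j in range(len(m)))

# Hypothetical sample matrix chosen only for illustration.
A = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]
print([expand_row(A, i) for i in range(3)], [expand_col(A, j) for j in range(3)])
print([mixed_rows(A, i, k) for i in range(3) for k in range(3) if i != k])
```

Every row and column expansion returns the same determinant, and every mixed sum vanishes, which is the property the direct inversion method relies on.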
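The Adjoint Matrix
The adjoint matrix is the matrix of transposed cofactors. If we have an nth order matrix
\[ [A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \]
this matrix possesses the following matrix of cofactors
\[ [A^C] = \begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix} \]
and the adjoint of the matrix is defined as the transpose of the cofactor matrix
\[ \operatorname{adj}[A] = [A^C]^T = \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} \]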
We will show in the next section that finding the inverse of a square matrix can be accomplished with the following expression:
\[ [A]^{-1} = \frac{\operatorname{adj}[A]}{|A|} \]
For a 3 x 3 matrix this is known as Cramer’s rule.
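Direct Inversion Method
Suppose an n x n matrix is post multiplied by its adjoint and the resulting n x n matrix is identified as [P]
\[ [P] = [A]\,\operatorname{adj}[A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} \begin{bmatrix} A_{11} & A_{21} & \cdots & A_{n1} \\ A_{12} & A_{22} & \cdots & A_{n2} \\ \vdots & \vdots & & \vdots \\ A_{1n} & A_{2n} & \cdots & A_{nn} \end{bmatrix} \]
The elements of matrix [P] are divided into two categories, i.e., elements that lie along the diagonal
\[ \begin{aligned} p_{11} &= a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n} \\ p_{22} &= a_{21}A_{21} + a_{22}A_{22} + \cdots + a_{2n}A_{2n} \\ &\;\;\vdots \\ p_{nn} &= a_{n1}A_{n1} + a_{n2}A_{n2} + \cdots + a_{nn}A_{nn} \end{aligned} \]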
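and those that do not
\[ \begin{aligned} p_{12} &= a_{11}A_{21} + a_{12}A_{22} + \cdots + a_{1n}A_{2n} \\ p_{13} &= a_{11}A_{31} + a_{12}A_{32} + \cdots + a_{1n}A_{3n} \\ &\;\;\vdots \\ p_{21} &= a_{21}A_{11} + a_{22}A_{12} + \cdots + a_{2n}A_{1n} \\ p_{32} &= a_{31}A_{21} + a_{32}A_{22} + \cdots + a_{3n}A_{2n} \\ &\;\;\vdots \\ p_{n3} &= a_{n1}A_{31} + a_{n2}A_{32} + \cdots + a_{nn}A_{3n} \end{aligned} \]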
The elements of [P] that lie on the diagonal are all equal to the determinant of [A] (see Rule #12 and recognize the Laplace expansion for each diagonal value). Note that the non-diagonal elements will be equal to zero since they involve the expansion of one row of matrix A with the cofactors of an entirely different row (see Rule #14).
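Thus
\[ \begin{aligned} p_{11} &= a_{11}A_{11} + a_{12}A_{12} + \cdots + a_{1n}A_{1n} = |A| \\ p_{22} &= a_{21}A_{21} + a_{22}A_{22} + \cdots + a_{2n}A_{2n} = |A| \\ &\;\;\vdots \\ p_{nn} &= a_{n1}A_{n1} + a_{n2}A_{n2} + \cdots + a_{nn}A_{nn} = |A| \end{aligned} \]
and
\[ \begin{aligned} p_{12} &= a_{11}A_{21} + a_{12}A_{22} + \cdots + a_{1n}A_{2n} = 0 \\ p_{13} &= a_{11}A_{31} + a_{12}A_{32} + \cdots + a_{1n}A_{3n} = 0 \\ &\;\;\vdots \\ p_{21} &= a_{21}A_{11} + a_{22}A_{12} + \cdots + a_{2n}A_{1n} = 0 \\ p_{32} &= a_{31}A_{21} + a_{32}A_{22} + \cdots + a_{3n}A_{2n} = 0 \\ &\;\;\vdots \\ p_{n3} &= a_{n1}A_{31} + a_{n2}A_{32} + \cdots + a_{nn}A_{3n} = 0 \end{aligned} \]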
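which leads to
\[ [P] = [A]\,\operatorname{adj}[A] = \begin{bmatrix} |A| & 0 & \cdots & 0 \\ 0 & |A| & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & |A| \end{bmatrix} = |A|\,[I] \]
or
\[ [I] = [A]\,\frac{\operatorname{adj}[A]}{|A|} \]
When this expression is compared to
\[ [I] = [A]\,[A]^{-1} \]
then it is evident that
\[ [A]^{-1} = \frac{\operatorname{adj}[A]}{|A|} \]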
The inverse exists only when the determinant of A is not zero, i.e., when A is not singular.
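The direct inversion formula can be exercised on a small matrix. The following is a hedged sketch rather than a production routine: the function names and the sample matrix are assumptions, the entries are assumed to be integers so that `fractions.Fraction` gives an exact inverse, and the recursive determinant limits it to small n.

```python
from fractions import Fraction

def submatrix(m, i, j):
    """Copy of m with row i and column j deleted (0-based indices)."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(m) if k != i]

def det(m):
    """Determinant by expansion along the first row; adequate for small n."""
    if not m:
        return 1
    return sum((-1) ** j * m[0][j] * det(submatrix(m, 0, j)) for j in range(len(m)))

def cofactor(m, i, j):
    return (-1) ** (i + j) * det(submatrix(m, i, j))

def adjugate(m):
    """Adjoint matrix: entry (i, j) is the cofactor A_ji (transposed cofactors)."""
    n = len(m)
    return [[cofactor(m, j, i) for j in range(n)] for i in range(n)]

def matmul(a, b):
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def inverse(m):
    """Inverse as adj[A] / |A|; assumes integer entries (exact arithmetic)."""
    d = det(m)
    if d == 0:
        raise ValueError("singular matrix: the determinant is zero")
    return [[Fraction(v, d) for v in row] for row in adjugate(m)]

# Hypothetical sample matrix chosen only for illustration.
A = [[2, 1, 3], [0, 4, 1], [5, 2, 6]]
print(matmul(A, adjugate(A)))     # |A| on the diagonal, zeros elsewhere
print(matmul(A, inverse(A)))      # the identity matrix
```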
In Class Example
The direct inversion method presented above is referred to as a “brute force” approach. From a computational standpoint the method is inefficient (but doable) when the matrix is quite large. There are more efficient methods for solving large systems of linear equations that do not involve finding the inverse.
Generally these approaches are divided into the following two categories:
• Direct Elimination (not inversion) Methods (LDU decomposition, Gauss elimination, Cholesky)
• Iterative Methods (Gauss-Seidel, Jacobi)
We will look at methods from both categories.
Direct Elimination Methods
Elimination methods factor the matrix [A] into products of triangular and diagonal matrices, i.e., the matrix can be expressed as
\[ [A] = [L][D][U] \]
where [L] and [U] are lower and upper triangular matrices with all diagonal entries equal to “1”. The matrix [D] is a diagonal matrix.
Variations of this decomposition are obtained if the matrix [D] is associated with either the matrix [L] or the matrix [U], i.e.,
\[ [A] = [L][U] \]
where [L] and [U] in this last expression are not necessarily the same as the matrices identified in the previous expression.
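In an expanded format
\[ [A] = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = \begin{bmatrix} l_{11} & 0 & \cdots & 0 \\ l_{21} & l_{22} & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ l_{n1} & l_{n2} & \cdots & l_{nn} \end{bmatrix} \begin{bmatrix} u_{11} & u_{12} & \cdots & u_{1n} \\ 0 & u_{22} & \cdots & u_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & u_{nn} \end{bmatrix} \]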
The matrices [L] and [U] in this decomposition are not unique. Differences in the many variations of elimination methods are simply differences in how these two matrices are constructed.
In solving a system of linear equations we can now write
\[ [A]\{x\} = \{b\} \]
as
\[ [A]\{x\} = [L][U]\{x\} = \{b\} \]
If we let
\[ [U]\{x\} = \{y\} \]
then
\[ [L][U]\{x\} = [L]\{y\} = \{b\} \]
which is an easier computation. Solving this last expression for each $y_i$ can be accomplished with the following expression
\[ y_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij}\,y_j}{l_{ii}}, \qquad i = 1, 2, \ldots, n \]
With the vector {y} known, the vector of unknowns {x} is computed from
\[ x_i = \frac{y_i - \sum_{j=i+1}^{n} u_{ij}\,x_j}{u_{ii}}, \qquad i = n, \ldots, 1 \]
The process for solving for the unknown vector quantities {x} can be completed without computing the inverse of [A].
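The two substitution sweeps can be sketched as follows. This is an illustration under stated assumptions: the function names and the small [L], [U], {b} values are hypothetical, [L] and [U] are assumed to be given with nonzero diagonals, and indices are 0-based.

```python
def forward_substitution(L, b):
    """Solve [L]{y} = {b} for a lower triangular [L], first row to last."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        s = sum(L[i][j] * y[j] for j in range(i))
        y[i] = (b[i] - s) / L[i][i]
    return y

def back_substitution(U, y):
    """Solve [U]{x} = {y} for an upper triangular [U], last row to first."""
    n = len(y)
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (y[i] - s) / U[i][i]
    return x

# Hypothetical factors; their product is [A] = [[2, 4], [1, 5]].
L = [[2.0, 0.0], [1.0, 3.0]]
U = [[1.0, 2.0], [0.0, 1.0]]
b = [10.0, 11.0]

y = forward_substitution(L, b)
x = back_substitution(U, y)
print(y, x)
```

Neither sweep forms the inverse of [A]; each unknown is obtained from values already computed in the same sweep.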
The Gauss Direct Elimination Method
The Gauss elimination method begins with a forward elimination process and finds unknowns by backward substitution.
Here the decomposition of the n x n matrix [A]
\[ [A] = [L][U] \]
is accomplished as follows. For each value of i (where i ranges from 1 to n) compute
\[ u_{ji} = \frac{a_{ji} - \sum_{k=1}^{j-1} l_{jk}\,u_{ki}}{l_{jj}}, \qquad j = 1, \ldots, i-1 \]
\[ u_{ii} = 1 \]
and
\[ l_{ji} = a_{ji} - \sum_{k=1}^{i-1} l_{jk}\,u_{ki}, \qquad j = i, \ldots, n \]
If at any stage in the elimination process the coefficient of the first equation, i.e., $a_{jj}$ (often referred to as the pivot point) or $l_{jj}$ becomes zero the method fails.
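A column-by-column factorization with a unit diagonal in [U] can be sketched as below. This is an assumption-laden illustration, not the notes' own code: the function name `crout`, the sample matrix and the choice to raise an error on a zero pivot are all hypothetical, no row interchanges (pivoting) are attempted, and indices are 0-based.

```python
def crout(A):
    """Factor [A] = [L][U] with u_ii = 1, working one column i at a time."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    for i in range(n):
        # Entries of [U] above the diagonal in column i.
        for j in range(i):
            s = sum(L[j][k] * U[k][i] for k in range(j))
            U[j][i] = (A[j][i] - s) / L[j][j]
        # Entries of [L] on and below the diagonal in column i.
        for j in range(i, n):
            s = sum(L[j][k] * U[k][i] for k in range(i))
            L[j][i] = A[j][i] - s
        if L[i][i] == 0:
            # The method fails here because l_ii is a divisor in later columns.
            raise ZeroDivisionError("zero pivot encountered")
    return L, U

# Hypothetical sample matrix chosen only for illustration.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
L, U = crout(A)
print(L)
print(U)
```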
In Class Example
Cholesky’s Decomposition – A Direct Elimination Method
In linear algebra, the Cholesky decomposition or Cholesky triangle is a decomposition of a Hermitian, positive-definite matrix into the product of a lower triangular matrix and its conjugate transpose. It was discovered by André-Louis Cholesky for real matrices (as opposed to matrices with elements that are complex). When it is applicable, the Cholesky decomposition is roughly twice as efficient as the LU decomposition for solving systems of linear equations.
The distinguishing feature of the Cholesky decomposition is that the matrix [A], which is symmetrical and positive-definite, can be decomposed into upper and lower triangular matrices that are the transpose of each other, i.e.,
\[ [A] = [L][L]^T \]
This can be loosely thought of as the matrix equivalent of taking the square root. Note that [A] is a positive definite matrix if for all non-zero vectors {z} the inner product
\[ \{z\}^T [A] \{z\} > 0 \]
is always greater than zero. This is guaranteed if all the eigenvalues of the matrix are positive.
Once the decomposition has been performed, the solution of the system of equations proceeds by forward and backwards substitution in the same manner as the Gauss elimination method.
Convenient recurrence relationships for the Cholesky decomposition are as follows for each successive column (ith index)
\[ l_{ii} = \sqrt{a_{ii} - \sum_{k=1}^{i-1} l_{ik}^{2}} \]
\[ l_{ji} = \frac{a_{ji} - \sum_{k=1}^{i-1} l_{jk}\,l_{ik}}{l_{ii}}, \qquad j = i+1, \ldots, n \]
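The Cholesky recurrences for each successive column can be sketched in a few lines. The function name, the sample matrix and the error raised for a non-positive diagonal term are assumptions made for illustration; a real symmetric [A] is assumed and indices are 0-based.

```python
import math

def cholesky(A):
    """Lower triangular [L] with [A] = [L][L]^T, built one column at a time."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        d = A[i][i] - sum(L[i][k] ** 2 for k in range(i))
        if d <= 0:
            raise ValueError("matrix is not positive definite")
        L[i][i] = math.sqrt(d)                      # diagonal term
        for j in range(i + 1, n):                   # terms below the diagonal
            s = sum(L[j][k] * L[i][k] for k in range(i))
            L[j][i] = (A[j][i] - s) / L[i][i]
    return L

# Hypothetical symmetric positive-definite matrix chosen for illustration.
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
L = cholesky(A)
print(L)
```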
These expressions can be modified so that they are free of needing to take a square root if the previous matrix expression is factored such that
\[ [A] = [L][D][L]^T \]
where again [D] is a diagonal matrix.
Recurrence relationships for the Cholesky LDL decomposition are expressed as follows for each successive column (ith index)
\[ d_{ii} = a_{ii} - \sum_{k=1}^{i-1} d_{kk}\,l_{ik}^{2} \]
\[ l_{ii} = 1 \]
\[ l_{ji} = \frac{a_{ji} - \sum_{k=1}^{i-1} d_{kk}\,l_{jk}\,l_{ik}}{d_{ii}}, \qquad j = i+1, \ldots, n \]
With [A] decomposed into a triple matrix product the solution to the system of equations proceeds with
\[ \{b\} = [A]\{x\} = [L][D][L]^T\{x\} = [L]\{y\} \]
Again
\[ y_i = \frac{b_i - \sum_{j=1}^{i-1} l_{ij}\,y_j}{l_{ii}}, \qquad i = 1, 2, \ldots, n \]
but now (verify for homework)
\[ x_i = \frac{y_i - d_{ii}\sum_{j=i+1}^{n} l_{ji}\,x_j}{d_{ii}\,l_{ii}}, \qquad i = n, \ldots, 1 \]
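The square-root-free variant and its two substitution sweeps can be sketched as follows. This is a minimal illustration assuming a real symmetric positive-definite [A]; the function names and sample data are hypothetical, the diagonal of [D] is stored as a list, $l_{ii} = 1$ is used throughout, and the back substitution is derived from {y} = [D][L]^T{x}.

```python
def ldl_decompose(A):
    """Return unit lower triangular [L] and the diagonal of [D], [A] = [L][D][L]^T."""
    n = len(A)
    L = [[1.0 if r == c else 0.0 for c in range(n)] for r in range(n)]
    d = [0.0] * n
    for i in range(n):
        d[i] = A[i][i] - sum(d[k] * L[i][k] ** 2 for k in range(i))
        for j in range(i + 1, n):
            s = sum(d[k] * L[j][k] * L[i][k] for k in range(i))
            L[j][i] = (A[j][i] - s) / d[i]
    return L, d

def ldl_solve(L, d, b):
    """Solve [L]{y} = {b}, then [D][L]^T{x} = {y}."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):                       # forward sweep, l_ii = 1
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in range(n - 1, -1, -1):           # backward sweep
        x[i] = y[i] / d[i] - sum(L[j][i] * x[j] for j in range(i + 1, n))
    return x

# Hypothetical data: {b} is [A] times the vector (1, 2, 3).
A = [[4.0, 2.0, 2.0], [2.0, 5.0, 3.0], [2.0, 3.0, 6.0]]
b = [14.0, 21.0, 26.0]
L, d = ldl_decompose(A)
x = ldl_solve(L, d, b)
print(d, L, x)
```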
In Class Example
Gauss-Seidel : An Iterative Method
For large systems of equations the explicit and implicit elimination methods (backwards and forwards) can be inadequate from a memory storage perspective, execution time and/or round-off error issues. Iterative solution strategies are used when these problems arise. Solutions found using iterative methods can be obtained with a foreknowledge of the error tolerance associated with the method.
The most common iterative scheme is the Gauss-Seidel method. Here an initial estimate is made for the solution vector {x}. Subsequent to this guess each equation in the system of linear equations is used to solve for (update) one of the unknowns from a previous calculation. For a linear system of three equations in three unknowns the updating procedure can be expressed by the following expressions
\[ x_1^{(m)} = \frac{b_1 - a_{12}\,x_2^{(m-1)} - a_{13}\,x_3^{(m-1)}}{a_{11}} \]
\[ x_2^{(m)} = \frac{b_2 - a_{21}\,x_1^{(m)} - a_{23}\,x_3^{(m-1)}}{a_{22}} \]
\[ x_3^{(m)} = \frac{b_3 - a_{31}\,x_1^{(m)} - a_{32}\,x_2^{(m)}}{a_{33}} \]
Note that m and (m-1) represent the current and previous iteration respectively.
In general the Gauss-Seidel iteration format can be expressed as
\[ x_i^{(m)} = \frac{1}{a_{ii}} \left( b_i - \sum_{k=1}^{i-1} a_{ik}\,x_k^{(m)} - \sum_{k=i+1}^{n} a_{ik}\,x_k^{(m-1)} \right) \]
The original equations must be of a form where the diagonal elements of [A] contain only nonzero values. For a stable structure, or a component with well-defined boundary conditions, this will always be possible.
For convenience a vector of zeroes is often used as an initial guess at the solution {x}. Iterations are repeated until the solution converges to “close enough” to the true solution. Since we very seldom know the true solution a priori (why do the problem if we know the answer) iterations continue until the next “answer” obtained is only fractionally different than the “answer” from the previous iteration. Mathematically this is expressed as
\[ \varepsilon_a = \max_{i=1,\ldots,n} \left| \frac{x_i^{(m)} - x_i^{(m-1)}}{x_i^{(m)}} \right| (100\%) \]
with
\[ \varepsilon_a < \varsigma \]
where
\[ \varsigma = \left( 0.5 \times 10^{\,2-q} \right)\% \]
and q is the desired number of accurate significant figures.
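The Gauss-Seidel update together with the percent-change stopping test can be sketched as follows. The function name, the default values of q and max_iter, the handling of a new value that is exactly zero, and the 2 x 2 test system are all assumptions made for illustration.

```python
def gauss_seidel(A, b, q=6, max_iter=500):
    """Gauss-Seidel iteration stopped when the largest percent change is small."""
    n = len(b)
    x = [0.0] * n                           # zero vector as the initial guess
    tolerance = 0.5 * 10.0 ** (2 - q)       # stopping threshold, in percent
    for m in range(1, max_iter + 1):
        eps_a = 0.0
        for i in range(n):
            # x[k] for k < i already holds iteration-m values, while x[k] for
            # k > i still holds iteration-(m-1) values.
            s = sum(A[i][k] * x[k] for k in range(n) if k != i)
            new = (b[i] - s) / A[i][i]
            if new != x[i]:
                # Percent change; treated as infinite when the new value is
                # exactly zero (an assumption of this sketch).
                rel = abs((new - x[i]) / new) * 100.0 if new != 0.0 else float("inf")
                eps_a = max(eps_a, rel)
            x[i] = new
        if eps_a < tolerance:
            return x, m
    raise RuntimeError("Gauss-Seidel did not converge")

# Hypothetical symmetric positive-definite test system with solution (2, -2).
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x, iterations = gauss_seidel(A, b)
print(x, iterations)
```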
Many techniques have been developed in order to improve convergence of the Gauss-Seidel method. The simplest and the most widely used is the successive over-relaxation method (SOR). In this approach a newly computed value $x_i^{(m)}$ is modified by taking a weighted average of the current and previous values in the following manner:
\[ x_i^{(m),\,\text{updated}} = \beta\,x_i^{(m)} + (1-\beta)\,x_i^{(m-1)} \]
Here β is the relaxation or weighting factor. For over-relaxation methods β ranges between 1 and 2. The assumption is that using the current value of $x_i^{(m)}$ will slowly converge to the exact solution whereas the modification of $x_i^{(m)}$ defined above will speed convergence along.
When β is between 0 and 1 the approach is termed a successive under-relaxation method (SUR). Choice of a β value is problem dependent.
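The weighted-average update can be added to a Gauss-Seidel sweep with one extra line. This is a sketch under stated assumptions: the function name, the absolute-change stopping test (used here instead of the percent-change test for brevity), the default tolerances and the test system are all illustrative choices.

```python
def sor(A, b, beta, tol=1e-10, max_iter=10_000):
    """Gauss-Seidel sweep with relaxation factor beta (beta = 1 is plain Gauss-Seidel)."""
    n = len(b)
    x = [0.0] * n
    for m in range(1, max_iter + 1):
        change = 0.0
        for i in range(n):
            s = sum(A[i][k] * x[k] for k in range(n) if k != i)
            gs_value = (b[i] - s) / A[i][i]                  # Gauss-Seidel value
            new = beta * gs_value + (1.0 - beta) * x[i]      # weighted average
            change = max(change, abs(new - x[i]))
            x[i] = new
        if change < tol:                                      # absolute-change test
            return x, m
    raise RuntimeError("SOR did not converge")

# Hypothetical symmetric positive-definite test system with solution (2, -2).
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
for beta in (0.8, 1.0, 1.2):
    x, iterations = sor(A, b, beta)
    print(beta, iterations, x)
```

Running the loop for a few β values shows how the iteration count depends on the weighting factor for a given problem.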
In Class Example
The Jacobi Iterative Method
The Jacobi method calculates all new values on the basis of the following iterative expression
\[ x_i^{(m)} = \frac{1}{a_{ii}} \left( b_i - \sum_{\substack{k=1 \\ k \neq i}}^{n} a_{ik}\,x_k^{(m-1)} \right) \]
The Gauss-Seidel method typically converges to solutions faster than the Jacobi method, but the Jacobi iteration scheme has certain advantages when solution of a system of equations is conducted with parallel processing (multiple computers solving the system at the same time).
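A Jacobi sweep differs from a Gauss-Seidel sweep only in that every new value is computed from the previous iterate. A minimal sketch follows; the function name, the absolute-change stopping test, the default tolerances and the test system are assumptions for illustration.

```python
def jacobi(A, b, tol=1e-10, max_iter=10_000):
    """Jacobi iteration: all new values come from the previous iterate only."""
    n = len(b)
    x = [0.0] * n
    for m in range(1, max_iter + 1):
        # Each entry of `new` depends only on `x`, so the n updates are
        # independent of one another and could be computed in parallel.
        new = [
            (b[i] - sum(A[i][k] * x[k] for k in range(n) if k != i)) / A[i][i]
            for i in range(n)
        ]
        change = max(abs(p - q) for p, q in zip(new, x))
        x = new
        if change < tol:
            return x, m
    raise RuntimeError("Jacobi iteration did not converge")

# Hypothetical symmetric positive-definite test system with solution (2, -2).
A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x, iterations = jacobi(A, b)
print(x, iterations)
```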
The Conjugate Gradient Iterative Method
Conjugate gradient methods are the most popular for solving large systems of linear equations of the form
\[ [A]\{x\} = \{b\} \]
Here {x} is an unknown vector, {b} is a known vector and [A] is a known, square, symmetric, positive-definite matrix. Conjugate gradient methods are well suited for sparse matrices. If [A] is dense the best solution strategy is factoring [A] and using back substitution.
As noted earlier, a matrix is positive definite if for every nonzero vector {z}
\[ \{z\}^T [A] \{z\} > 0 \]
This may mean little at this point because it is not a very intuitive idea. One has a hard time assembling a mental picture of a positive-definite matrix as opposed to a matrix that is not positive definite. A positive-definite matrix is by convention taken to be a Hermitian matrix (symmetric when its elements are real), and there are a number of advantageous qualities associated with this category of matrices.
If a matrix is a positive definite Hermitian matrix then:
• all eigenvalues of the matrix are positive;
• all the leading principal minors of the matrix are positive; and
• there exists a nonsingular square matrix [B] such that
\[ [A] = [B]^T [B] \]
Quadratic forms are introduced next and we will see how the concept of positive definite matrices affects quadratic forms. The following scalar function
\[ f(x) = ax^2 + bx + c \]
is said to have a quadratic form in scalar algebra. If we change the formulation slightly such that
\[ f(x) = \tfrac{1}{2}Ax^2 - bx + 0 \]
Extending this equation format to at least two variables leads to equations of the form
\[ f(x_1, x_2) = \tfrac{1}{2}\left( a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 \right) - \left( b_1x_1 + b_2x_2 \right) \]
and in general
\[ f(\{x\}) = \tfrac{1}{2}\left( \sum_{i=1}^{n} a_{ii}\,x_i^2 + \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} a_{ij}\,x_i x_j \right) - \sum_{k=1}^{n} b_k x_k \]
This last expression suggests that we can rewrite it in a matrix format as
\[ f(\{x\}) = \tfrac{1}{2}\{x\}^T [A] \{x\} - \{b\}^T \{x\} \]
where [A] is a known n x n matrix, the vector {b} of known quantities is an n x 1 vector as is the vector of unknowns {x}. Several observations can be made. First, in the last two equations both sides of the equal sign represent scalar quantities. As a quick check on the matrix expression let {x} and {b} represent 3 x 1 vectors, and [A] a 3 x 3 square matrix. If [A] is symmetric and positive-definite then the quadratic function f is minimized by the solution of
\[ [A]\{x\} = \{b\} \]
Let’s consider an example. If we have the following system of linear equations to solve
\[ \begin{aligned} 3x_1 + 2x_2 &= 2 \\ 2x_1 + 6x_2 &= -8 \end{aligned} \]
i.e., [A]{x} = {b} with
\[ [A] = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix} \qquad \{x\} = \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} \qquad \{b\} = \begin{Bmatrix} 2 \\ -8 \end{Bmatrix} \]
then the solution of this system, the quadratic form of the system, and the minimization of the quadratic form are depicted in the following three graphs
\[ f(\{x\}) = \tfrac{1}{2}\left( 3x_1^2 + 4x_1x_2 + 6x_2^2 \right) - 2x_1 + 8x_2 \]
Contour plot depicting level surfaces of the quadratic function f({x})
Surface plot of the quadratic function
\[ f(\{x\}) = \tfrac{1}{2}\{x\}^T [A] \{x\} - \{b\}^T \{x\} \]
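Taking the partial derivative of a scalar valued function that depends on more than one variable results in partial derivatives with respect to each variable. Returning to the system with two unknown variables
\[ f(x_1, x_2) = \tfrac{1}{2}\left( a_{11}x_1^2 + 2a_{12}x_1x_2 + a_{22}x_2^2 \right) - \left( b_1x_1 + b_2x_2 \right) \]
then
\[ \frac{\partial f(x_1, x_2)}{\partial x_1} = \tfrac{1}{2}\left[ 2a_{11}x_1 + 2a_{12}x_2 \right] - b_1 = a_{11}x_1 + a_{12}x_2 - b_1 \]
\[ \frac{\partial f(x_1, x_2)}{\partial x_2} = \tfrac{1}{2}\left[ 2a_{12}x_1 + 2a_{22}x_2 \right] - b_2 = a_{12}x_1 + a_{22}x_2 - b_2 \]
This suggests the following matrix format
\[ \begin{Bmatrix} \dfrac{\partial f(x_1, x_2)}{\partial x_1} \\[2ex] \dfrac{\partial f(x_1, x_2)}{\partial x_2} \end{Bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} - \begin{Bmatrix} b_1 \\ b_2 \end{Bmatrix} \]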
Now set the last result equal to the zero vector, i.e.,
\[ \begin{Bmatrix} 0 \\ 0 \end{Bmatrix} = \begin{bmatrix} a_{11} & a_{12} \\ a_{12} & a_{22} \end{bmatrix} \begin{Bmatrix} x_1 \\ x_2 \end{Bmatrix} - \begin{Bmatrix} b_1 \\ b_2 \end{Bmatrix} \]
Recall from calculus that taking derivatives and setting the resulting expression(s) equal to zero yields a local minimum or local maximum. We can in general restate this in a matrix format as
\[ \{0\} = [A]\{x\}^* - \{b\} \]
or
\[ [A]\{x\}^* = \{b\} \]
Thus the minimum/maximum point on the quadratic surface
\[ f(\{x\}) = \tfrac{1}{2}\{x\}^T [A] \{x\} - \{b\}^T \{x\} \]
occurs at {x}* which is coincidentally the solution of the system of linear equations we wish to solve.
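The link between the quadratic form and the linear system can be checked numerically for the 2 x 2 example. This is a small sketch, not part of the original notes: the function names are hypothetical, and the point (2, -2) is the solution of the example system, which can be confirmed by substitution.

```python
def quadratic_form(A, b, x):
    """f({x}) = 1/2 {x}^T [A] {x} - {b}^T {x}."""
    n = len(x)
    xAx = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return 0.5 * xAx - sum(bi * xi for bi, xi in zip(b, x))

def gradient(A, b, x):
    """Gradient of f for a symmetric [A]: [A]{x} - {b}."""
    return [sum(aij * xj for aij, xj in zip(row, x)) - bi for row, bi in zip(A, b)]

A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x_star = [2.0, -2.0]          # solves 3*x1 + 2*x2 = 2 and 2*x1 + 6*x2 = -8

print(gradient(A, b, x_star), quadratic_form(A, b, x_star))
# Values of f at a few nearby points, for comparison with f at x_star.
print([quadratic_form(A, b, [2.0 + dx, -2.0 + dy])
       for dx, dy in ((0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1))])
```

The gradient vanishes at the solution of the linear system, and the nearby values of f are all larger, which is consistent with a minimum for this positive-definite [A].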
We now seek a way to obtain the solution vector {x}* in an iterative fashion. Graphically the iterative solution would start at some arbitrary starting point (could be the origin, but in the figure below the iterative process starts at a point defined by a non-zero initial vector ${}^{0}\{x\}$).
Before establishing an iterative solution technique we define a residual vector ${}^{i}\{r\}$ and an error vector ${}^{i}\{e\}$ at the ith step. The error vector is expressed as
\[ {}^{i}\{e\} = {}^{i}\{x\} - \{x\} \]
and this is a vector that indicates how far we are from the solution. The residual vector is defined as
\[ {}^{i}\{r\} = \{b\} - [A]\,{}^{i}\{x\} \]
and this residual vector indicates how far the matrix product $[A]\,{}^{i}\{x\}$ is from the correct vector {b}. Substitution of the expression for the error vector into the expression for the residual vector leads to the following relationship between the two vectors
\[ \begin{aligned} {}^{i}\{r\} &= \{b\} - [A]\left( {}^{i}\{e\} + \{x\} \right) \\ &= \{b\} - [A]\,{}^{i}\{e\} - [A]\{x\} \\ &= \{b\} - [A]\,{}^{i}\{e\} - \{b\} \\ &= -[A]\,{}^{i}\{e\} \end{aligned} \]
More importantly, the residual vector should be thought of in terms of the following expression
\[ {}^{i}\{r\} = -\left( [A]\,{}^{i}\{x\} - \{b\} \right) = -\begin{Bmatrix} \dfrac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_1} \\ \vdots \\ \dfrac{\partial f(x_1, x_2, \ldots, x_n)}{\partial x_n} \end{Bmatrix}_{\{x\} = {}^{i}\{x\}} \]
Thought must be given to this expression. This represents the negative of the derivative of the quadratic scalar equation evaluated at the components of the iteration vector at the ith step. Thus the residual vector will always point in the direction of steepest descent, i.e., towards the solution point. This is extremely important in non-linear problems where iterative solution schemes easily bog down.
Now let’s pause and think about the solution vector {x} in general terms. A vector with n elements can be written in terms of a linear combination of basis vectors. When n is equal to 3 (i.e., Cartesian three space) we can express a vector as a linear combination of the three unit vectors (1,0,0), (0,1,0) and (0,0,1). But in general
\[ \{x\} = \sum_{i=1}^{n} \alpha_i \{s_i\} \]
For the Cartesian three space example
\[ \{x\} = \sum_{i=1}^{3} \alpha_i \{s_i\} = \alpha_1 \begin{Bmatrix} 1 \\ 0 \\ 0 \end{Bmatrix} + \alpha_2 \begin{Bmatrix} 0 \\ 1 \\ 0 \end{Bmatrix} + \alpha_3 \begin{Bmatrix} 0 \\ 0 \\ 1 \end{Bmatrix} = \begin{Bmatrix} \alpha_1 \\ \alpha_2 \\ \alpha_3 \end{Bmatrix} \]
Here we see indicated that the α’s are scaling factors. This is an important idea to latch onto. We will not treat them as a vector.
Next consider the solution expressed as the following vector sum
\[ \{x\} = {}^{0}\{x\} + \sum_{i=1}^{n} \alpha_i \{s_i\} \]
The fuchsia line represents
\[ \sum_{i=1}^{n} \alpha_i \{s_i\} \]
i.e., a linear combination of the blue vectors. Each blue vector corresponds to a specific value of i.
The last green vector is the solution and the first red vector represents the initial guess ${}^{0}\{x\}$.
For the first iterative step in the last figure
\[ {}^{1}\{x\} = {}^{0}\{x\} + \alpha_1 \{s_1\} \]
This expression can be generalized into the following iterative form
\[ {}^{i}\{x\} = {}^{i-1}\{x\} + \alpha_i \{s_i\} \]
The iteration can be continued until a solution is obtained that is “close enough.” The issues boil down to finding the α’s, i.e., the scaling factors, as well as the basis vectors, i.e., the $\{s_i\}$’s. Let’s see if we can use the concept of the direction of steepest descent to define the basis vectors and the scale factor at each step.
Return to the example and use the starting vector ${}^{0}\{x\}$ = (-2, -2). Our first step should be in the direction of steepest descent towards the minimum. That direction is the solid black line in the figure below.
The next issue is how far along the path of steepest descent should we travel? If we define the step as
\[ {}^{1}\{x\} = {}^{0}\{x\} + \alpha_0\,{}^{0}\{r\} \]
or in general
\[ {}^{i+1}\{x\} = {}^{i}\{x\} + \alpha_i\,{}^{i}\{r\} \]
then how big should the step be? A line search is a procedure that chooses the α to minimize f along a direction.
Also note that the residual vector is now being used as a basis vector in the expressions above.
The following figure illustrates what we are trying to do. The process is restricted to selecting a point along the intersection of the surface of the function f and the plane that defines the direction of steepest descent from the point (-2, -2).
As we traverse along the line of steepest descent we can plot the magnitude and the direction of the gradient to the surface at any point on the line. We stop searching along the line when the gradient (derivative) vector is orthogonal to the line of steepest descent. The scaling factor α is now defined.
The new (red) search direction ${}^{1}\{r\}$ is an updated direction of steepest descent and will be perpendicular to the old (blue) direction ${}^{0}\{r\}$.
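From the geometry of the previous figure we can see that the residual vector ${}^{0}\{r\}$ and the updated residual vector ${}^{1}\{r\}$ are orthogonal. Thus the numerical value for α can be obtained from the following dot product taken between these two vectors
\[ \begin{aligned} 0 &= {}^{1}\{r\}^T\,{}^{0}\{r\} \\ &= \left( \{b\} - [A]\,{}^{1}\{x\} \right)^T {}^{0}\{r\} \\ &= \left( \{b\} - [A]\left( {}^{0}\{x\} + \alpha_0\,{}^{0}\{r\} \right) \right)^T {}^{0}\{r\} \\ &= \left( \{b\} - [A]\,{}^{0}\{x\} \right)^T {}^{0}\{r\} - \alpha_0 \left( [A]\,{}^{0}\{r\} \right)^T {}^{0}\{r\} \end{aligned} \]
so that
\[ \begin{aligned} \left( \{b\} - [A]\,{}^{0}\{x\} \right)^T {}^{0}\{r\} &= \alpha_0 \left( [A]\,{}^{0}\{r\} \right)^T {}^{0}\{r\} \\ {}^{0}\{r\}^T\,{}^{0}\{r\} &= \alpha_0 \left( [A]\,{}^{0}\{r\} \right)^T {}^{0}\{r\} \end{aligned} \]
or, since [A] is symmetric,
\[ \alpha_0 = \frac{{}^{0}\{r\}^T\,{}^{0}\{r\}}{{}^{0}\{r\}^T [A]\,{}^{0}\{r\}} \]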
The iterations in their most basic format are as follows
\[ {}^{i}\{r\} = \{b\} - [A]\,{}^{i}\{x\} \]
\[ \alpha_i = \frac{{}^{i}\{r\}^T\,{}^{i}\{r\}}{{}^{i}\{r\}^T [A]\,{}^{i}\{r\}} \]
\[ {}^{i+1}\{x\} = {}^{i}\{x\} + \alpha_i\,{}^{i}\{r\} \]
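These three steps (residual, step length, update) can be sketched as a short steepest descent loop. The function names, the residual-norm stopping tolerance and the iteration cap are assumptions made for illustration; the matrix, right-hand side and starting point (-2, -2) are those of the example used in these notes, and [A] is assumed symmetric positive definite.

```python
def dot(u, v):
    return sum(p * q for p, q in zip(u, v))

def matvec(A, v):
    return [dot(row, v) for row in A]

def residual(A, b, x):
    """{r} = {b} - [A]{x}."""
    return [bi - axi for bi, axi in zip(b, matvec(A, x))]

def steepest_descent(A, b, x0, tol=1e-10, max_iter=10_000):
    """Repeat: residual, step length alpha, update, until the residual is small."""
    x = list(x0)
    for i in range(max_iter):
        r = residual(A, b, x)
        rr = dot(r, r)
        if rr ** 0.5 < tol:               # residual norm small enough: stop
            return x, i
        alpha = rr / dot(r, matvec(A, r))
        x = [xi + alpha * ri for xi, ri in zip(x, r)]
    raise RuntimeError("steepest descent did not converge")

A = [[3.0, 2.0], [2.0, 6.0]]
b = [2.0, -8.0]
x0 = [-2.0, -2.0]

# First step, to check that successive residuals are orthogonal.
r0 = residual(A, b, x0)
alpha0 = dot(r0, r0) / dot(r0, matvec(A, r0))
x1 = [xi + alpha0 * ri for xi, ri in zip(x0, r0)]
r1 = residual(A, b, x1)

x, steps = steepest_descent(A, b, x0)
print(r0, alpha0, dot(r0, r1))
print(x, steps)
```

The dot product of the first two residuals is zero to within round-off, matching the orthogonality argument used to derive the step length.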