Linear Algebra



I. REVIEW OF LINEAR ALGEBRA

A. Equivalence

Definition A1. If A and B are two m x n matrices, then A is equivalent to B if we can obtain B from A by a finite sequence of elementary row or elementary column operations.

Theorem A.1. If A is any nonzero m x n matrix, then A is equivalent to a partitioned matrix of the form

    [ Ik        Ok,n-k   ]
    [ Om-k,k    Om-k,n-k ]

where Ik is the k x k identity matrix and each O block is a zero matrix of the indicated size.

Theorem A.2. Two m x n matrices A and B are equivalent if and only if B = PAQ for some nonsingular matrices P and Q.

Theorem A.3. An n x n matrix A is nonsingular if and only if A is equivalent to In.

B. Rank

Definition B1. Let

        [ a11  a12  ...  a1n ]
    A = [ a21  a22  ...  a2n ]
        [  :    :          : ]
        [ am1  am2  ...  amn ]

be an m x n matrix. The rows of A, considered as vectors in Rn, span a subspace of Rn, called the row space of A. Similarly, the columns of A, considered as vectors in Rm, span a subspace of Rm, called the column space of A.

Definition B2. The dimension of the row (column) space of A is called the row (column) rank of A.



Theorem B.1. The row rank and column rank of the matrix A = [aij] are equal.

Since the row and column ranks of a matrix are equal, we shall now merely refer to the rank of a matrix. Note that rank In = n. Theorem A.2 states that A is equivalent to B if and only if there exist nonsingular matrices P and Q such that B = PAQ. If A is equivalent to B, then rank A = rank B, for rank B = rank(PAQ) = rank(PA) = rank A.

We also recall from Theorem A.1 in the previous section that if A is an m by n matrix, then A is equivalent to a matrix

    C = [ Ik  0 ]
        [ 0   0 ]

Now rank A = rank C = k.
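As a quick numerical check of these rank facts, here is a small NumPy sketch (my own illustration, not part of the original notes): it builds a rank-deficient matrix, confirms that row rank equals column rank (Thm. B.1), and confirms that equivalence transformations preserve rank.

```python
import numpy as np

# A 3x4 matrix whose third row is the sum of the first two, so rank A = 2.
A = np.array([[1., 2., 3., 4.],
              [0., 1., 1., 0.],
              [1., 3., 4., 4.]])

print(np.linalg.matrix_rank(A))      # 2: rank of A
print(np.linalg.matrix_rank(A.T))    # 2: row rank equals column rank (Thm. B.1)

# Equivalence preserves rank: B = P A Q for nonsingular P (3x3) and Q (4x4).
rng = np.random.default_rng(0)
P = rng.standard_normal((3, 3)) + 3 * np.eye(3)   # almost surely nonsingular
Q = rng.standard_normal((4, 4)) + 3 * np.eye(4)
B = P @ A @ Q
print(np.linalg.matrix_rank(B))      # 2: rank B = rank A
```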

C. Determinants

The minor |Mij| of the square (n x n) matrix A is the determinant of the (n-1) x (n-1) matrix Mij formed from A by deleting the ith row and jth column.

The cofactor of element aij is

    Cij = (-1)^(i+j) |Mij|.

The Laplace expansion of the determinant is

    |A| = Σ(i=1 to n) aij Cij    for constant j,

or

    |A| = Σ(j=1 to n) aij Cij    for constant i.

Theorem C.1. If A is a matrix, then |A| = |AT|.


Theorem C.2. If matrix B results from matrix A by interchanging two rows (columns) of A, then |B| = -|A|.

Theorem C.3. If two rows (columns) of A are equal, then |A| = 0.

Theorem C.4. If a row (column) of A consists entirely of zeros, then |A| = 0.

Theorem C.5. If B is obtained from A by multiplying a row (column) of A by a real number c, then |B| = c|A|.

Theorem C.6. If B = [bij] is obtained from A = [aij] by adding to each element of the rth row (column) c times the corresponding element of the sth row (column) (r ≠ s), then |B| = |A|.

Theorem C.7. If a matrix A = [aij] is upper (lower) triangular, then |A| = a11a22···ann; that is, the determinant of a triangular matrix is the product of the elements on the main diagonal.

Theorem C.8. If A is an n x n matrix, then A is nonsingular if and only if |A| ≠ 0.

Theorem C.9. If A and B are n x n matrices, then |AB| = |A|·|B|.
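The Laplace expansion above translates directly into a short (deliberately naive, O(n!)) recursive routine. The following Python/NumPy sketch is my own illustration, not part of the notes; it expands along the first row and checks the result against numpy.linalg.det and the product rule of Theorem C.9.

```python
import numpy as np

def det_laplace(A):
    """Determinant by cofactor (Laplace) expansion along the first row."""
    A = np.asarray(A, dtype=float)
    n = A.shape[0]
    if n == 1:
        return A[0, 0]
    total = 0.0
    for j in range(n):
        # Minor M_{0j}: delete row 0 and column j.
        minor = np.delete(np.delete(A, 0, axis=0), j, axis=1)
        cofactor = (-1) ** j * det_laplace(minor)   # C_{0j} = (-1)^(0+j) |M_{0j}|
        total += A[0, j] * cofactor
    return total

A = np.array([[2., 1., 0.], [1., 3., 1.], [0., 1., 4.]])
B = np.array([[1., 2., 0.], [0., 1., 1.], [2., 0., 1.]])
print(det_laplace(A), np.linalg.det(A))                       # should agree
print(det_laplace(A @ B), det_laplace(A) * det_laplace(B))    # Theorem C.9
```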

D. Special Matrices

(1) The adjoint of A is Adj(A) = [Cji]

(2) Inverse of A; A-1 = Adj(A)/|A|

(3) Symmetric matrix; A = AT

(4) Skew-symmetric matrix; A = -AT for square, real A.

(5) Associate matrix of A; (A*)T

(6) Hermitian matrix; A equals its associate

(7) Involutory matrix; AA = I

(8) Orthogonal Matrix; A-1 = AT



(9) Toeplitz matrix; a(i,j)=R(i-j)

(10) Autocorrelation matrix; A = E[x·xT] for n by 1 vector x. A is Toeplitz if x represents evenly spaced samples from a wide-sense stationary random process. (A numerical sketch follows this list.)

(11) Autocovariance matrix; A = E[(x-E[x])(x-E[x])T] for n by 1 vector x. A is Toeplitz if x represents evenly spaced samples from a wide-sense stationary random process.
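As an informal illustration of items (9) and (10), the sketch below (NumPy; the autocorrelation values R(k) are arbitrary, assumed numbers) builds the Toeplitz matrix a(i,j) = R(i-j) for a real process and checks that it is at least positive semidefinite (cf. Thm. E.5).

```python
import numpy as np

# Assumed autocorrelation sequence R(0..3) of some WSS process (illustrative values).
R = np.array([1.0, 0.8, 0.5, 0.2])
n = len(R)

# Toeplitz autocorrelation matrix: a(i, j) = R(i - j) = R(|i - j|) for a real process.
A = np.array([[R[abs(i - j)] for j in range(n)] for i in range(n)])

print(A)
print(np.linalg.eigvalsh(A) >= -1e-12)    # all True: at least positive semidefinite
```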

E. Positive Definite Matrices

An n x n matrix C with the property that xTCx > 0 for any nonzero vector x in Rn is called positive definite. Such a matrix is nonsingular, for if C is singular, then the homogeneous system Cx = 0 has a nontrivial solution xo. Then xoTCxo = 0, contradicting the requirement that xTCx > 0 for nonzero x.

Conversely, if C = [cij] is any n x n symmetric matrix that is positive definite (that is, xTCx > 0 if x is a nonzero vector in Rn), and α1, α2, ..., αn is a basis for an n-dimensional vector space V, then we define (α, β), for

α = a1α1 + a2α2 + ... + anαn and

β = b1α1 + b2α2 + ... + bnαn in V, by

    (α, β) = Σ(i=1 to n) Σ(j=1 to n) ai cij bj.

It is not difficult to show that this defines an inner product on V.

Theorem E.1. A real, symmetric matrix C is positive definite iff there exists a nonsingular matrix A such that C=ATA

Theorem E.2. ATA is positive definite for nonsingular A, since xT(ATA)x = (Ax)T(Ax) = ||Ax||² > 0 for every nonzero n by 1 vector x.

Theorem E.3. A matrix C is positive definite (semidefinite) iff its eigenvalues are all positive (nonnegative). Similarly, C is negative definite (semidefinite) iff its eigenvalues are all negative (nonpositive). C is indefinite if it has both positive and negative eigenvalues.

Theorem E.4. A matrix C is positive definite iff every upper left hand determinant (leading principal minor) of C is positive. This is called Sylvester's test.
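A hedged NumPy sketch of Sylvester's test alongside the eigenvalue test of Theorem E.3 (the helper name and the example matrix are mine):

```python
import numpy as np

def is_positive_definite(C):
    """Sylvester's test: every leading principal minor must be positive."""
    C = np.asarray(C, dtype=float)
    return all(np.linalg.det(C[:k, :k]) > 0 for k in range(1, C.shape[0] + 1))

C = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])          # symmetric

print(is_positive_definite(C))              # True by Sylvester's test
print(np.all(np.linalg.eigvalsh(C) > 0))    # True by Theorem E.3
```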


Theorem E.5. The Toeplitz matrix formed from an autocorrelation function is at least positive semidefinite.

Theorem E.6. The non-Toeplitz autocorrelation matrix A = E[x@xT] is also positive semidefinite.

F. Orthogonal Matrices

Definition F1. A square matrix A is called orthogonal if A-1 = AT. Of course, we can also say that A is orthogonal if ATA = In.

Theorem F.1 All the roots of the characteristic polynomial of a real symmetric matrix are real numbers.

Theorem F.2 If A is a symmetric n x n matrix, then the eigenvectors that belong to distinct eigenvalues of A are orthogonal.

Theorem F.3. The n x n matrix A is orthogonal if and only if the columns (and rows) of A form an orthonormal set.

Theorem F.4. If A is a symmetric n x n matrix, then there exists an orthogonal matrix P such that P-1AP = PTAP = D, a diagonal matrix. The eigenvalues of A lie on the main diagonal of D.
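Theorem F.4 is easy to check numerically; the sketch below (a NumPy illustration with a matrix of my own choosing) diagonalizes a symmetric matrix with an orthogonal P.

```python
import numpy as np

A = np.array([[2., 1., 0.],
              [1., 2., 1.],
              [0., 1., 2.]])            # symmetric

eigvals, P = np.linalg.eigh(A)          # columns of P are orthonormal eigenvectors
D = P.T @ A @ P                         # P^T A P should be diagonal

print(np.allclose(P.T @ P, np.eye(3)))          # P is orthogonal
print(np.allclose(D, np.diag(eigvals)))         # D holds the eigenvalues of A
```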

G. Characteristic Equation, Eigenvalues, And Eigenvectors

The characteristic equation for an n x n matrix A is

P(λ) = |λI - A| = 0, or

P(λ) = λ^n + a1λ^(n-1) + a2λ^(n-2) + ... + a(n-1)λ + an = 0

or

P(λ) = (λ - λ1)(λ - λ2)...(λ - λn)

Now


P(0) = |-A| = (-1)^n |A| and

P(0) = (-1)^n (λ1λ2λ3...λn), so

|A| = λ1λ2λ3...λn

For each eigenvalue λi, there exists an eigenvector xi which is a solution to

[λiI - A]xi = 0

Also, each xi can be found as any nonzero column of Adj[λiI - A].

The modal matrix is the matrix whose columns are the (scaled) eigenvectors kixi.
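Both the determinant relation |A| = λ1λ2...λn and the adjugate construction of eigenvectors can be verified numerically. The sketch below is my own NumPy illustration (the adj helper is a naive cofactor implementation, not an efficient one):

```python
import numpy as np

def adj(M):
    """Classical adjoint (adjugate): the transpose of the matrix of cofactors."""
    n = M.shape[0]
    C = np.empty((n, n), dtype=M.dtype)
    for i in range(n):
        for j in range(n):
            minor = np.delete(np.delete(M, i, axis=0), j, axis=1)
            C[i, j] = (-1) ** (i + j) * np.linalg.det(minor)
    return C.T

A = np.array([[4., 1., 0.],
              [1., 3., 1.],
              [0., 1., 2.]])
lam = np.linalg.eigvals(A)

# |A| equals the product of the eigenvalues.
print(np.isclose(np.linalg.det(A), np.prod(lam)))

# Any nonzero column of Adj(lambda_i I - A) is an eigenvector for lambda_i.
for li in lam:
    B = adj(li * np.eye(3) - A)
    x = B[:, np.argmax(np.abs(B).sum(axis=0))]      # pick a clearly nonzero column
    print(np.allclose((li * np.eye(3) - A) @ x, 0.0, atol=1e-8))
```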

Theorem G.1. The eigenvalues of a symmetric matrix A are real.

Theorem G.2. The eigenvectors, corresponding to distinct eigenvalues of a symmetric matrix A, form an orthogonal set (Same as Thm. F.2.).

Theorem G.3. If A has eigenvalues λi, then the matrix A^k has eigenvalues λi^k. This can be proved using Thm. F.4.

Theorem G.4. If the nonsingular matrix A has eigenvalues λi, then the matrix A-1 has eigenvalues 1/λi.

Theorem G.5. If A has eigenvalues λi, then the matrix I·k + A has eigenvalues λi + k.

Theorem G.6. Two similar matrices have the same eigenvalues.

If P is a nonsingular matrix and

A = P·S·P-1,

then A and S are similar, and they have the same eigenvalues: if λ is an eigenvalue of S with eigenvector x, then

S·x = λ·x,

so

P·S·P-1·(P·x) = λ·(P·x),

that is, A·(P·x) = λ·(P·x).


H. Spectral Decomposition

Expanding on Thm. F.4, any arbitrary complex-valued n by n matrix A with n linearly independent eigenvectors can be decomposed as

A = P·S·P-1

where

S = diag(λ1,...,λn),

P = [p1,...,pn],

λi is the ith eigenvalue of A and pi is the corresponding eigenvector. This is proved as follows. By the definition of eigenvectors and eigenvalues,

Api = λipi, so

AP = PS, from which we get

A = P·S·P-1

For the general case, the eigenvalues can be positive or negative. If A is symmetric, then S is real. If A is positive semidefinite, then the diagonal elements of S (eigenvalues of A) are nonnegative (from Thm. E.3).
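A minimal NumPy check of this decomposition (an illustration with a small matrix of my own choosing, not from the notes):

```python
import numpy as np

A = np.array([[3., 1.],
              [2., 2.]])                  # diagonalizable, not symmetric

lam, P = np.linalg.eig(A)                 # eigenvalues and eigenvector matrix
S = np.diag(lam)

print(np.allclose(A, P @ S @ np.linalg.inv(P)))   # A = P S P^{-1}
```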


II. The Singular Value Decomposition (SVD)

A. Orthogonality And The SVD

The singular value decomposition (the SVD) is a powerful computational tool for analyzing matrices and problems involving matrices which has applications in many fields. In the remaining sections, we shall define the SVD, describe some other applications, and present an algorithm for computing it. The algorithm is representative of algorithms currently used for various matrix eigenvalue problems and serves as an introduction to computational techniques for these problems as well.

Although it is still not widely known, the singular value decomposition has a fairly long history. Much of the fundamental work was done by Gene Golub and his colleagues W. Kahan, Peter Businger, and Christian Reinsch. Our discussion will be based largely on a paper by Golub and Reinsch (1971). The underlying matrix eigenvalue algorithms have been developed by J.G.F. Francis, H. Rutishauser, and J.H. Wilkinson and are presented in Wilkinson's book (1965). Recent books by Lawson and Hanson (1974) and Stewart (1973) discuss the SVD as well as many related topics.

In elementary linear algebra, a set of vectors is defined to be independent if none of them can be expressed as a linear combination of the others. In computational linear algebra, it is very useful to have a quantitative notion of the "amount" of independence. We would like to define a quantity that reflects the fact that, for example, (1, 0, 0), (0, 1, 0), and (0, 0, 1) are very independent, whereas (1.01, 1.00, 1.00), (1.00, 1.01, 1.00), and (1.00, 1.00, 1.01) are almost dependent.

Since two vectors are dependent if they are parallel, it is reasonable to regard them as very independent if they are perpendicular or orthogonal. Using a superscript T to denote the transpose of a vector or matrix, two vectors u and v are orthogonal if their inner product is zero, that is, if

uTv = 0.

Moreover, a vector u has length 1 if

uTu = 1.

A square matrix is called orthogonal (see Def. F1) if its columns are mutually orthogonal vectors each of length 1. Thus a matrix U is orthogonal if

UTU = I, the identity matrix.


Note that an orthogonal matrix is automatically nonsingular, since U-1 = UT. In fact, we shall soon make precise the idea that an orthogonal matrix is very nonsingular and that its columns are very independent.

The simplest examples of orthogonal matrices are planar rotations of the form

    U = [  cos θ   sin θ ]
        [ -sin θ   cos θ ]

If x is a vector in 2-space, then Ux is the same vector rotated through an angle θ. It is useful to associate orthogonal matrices with such rotations, even though in higher dimensions orthogonal matrices can be more complicated. For example,

                 [ 24   36   23 ]
    U = (1/49) · [ 41  -12  -24 ]
                 [ 12  -31   36 ]

is orthogonal but cannot be interpreted as a simple plane rotation.

Multiplication by orthogonal matrices does not change such important geometrical quantities as the length of a vector or the angle between two vectors. Orthogonal matrices also have highly desirable computational properties because they do not magnify errors. For any matrix A and any two orthogonal matrices U and V, consider the matrix Σ defined by

Σ = UTAV.

If ui and vj are the columns of U and V, respectively, then the individual components of Σ are

σij = uiTAvj.

The idea behind the singular value decomposition is that by proper choice of U and V it is possible to make most of the σij zero; in fact, it is possible to make Σ diagonal with nonnegative diagonal entries. Consequently, we make the following definition.

A singular value decomposition of an m-by-n real matrix A is any factorization of the form

A = UΣVT,

where U is an m-by-m orthogonal matrix, V is an n-by-n orthogonal matrix, and Σ is an m-by-n diagonal matrix with σij = 0 if i ≠ j and σii = σi ≥ 0. The quantities σi are called the singular values of A, and the columns of U and V are called the left and right singular vectors. Note the similarity to Thm. F.4 and section I.H.

Readers familiar with matrix eigenvalues should note that the matrices AAT and ATA have the same nonzero eigenvalues and that the singular values of A are the positive square roots of these eigenvalues. Moreover, the left and right singular vectors are particular choices of the eigenvectors of AAT and ATA, respectively. (See theorems E.1, E.2, and E.3.)

In the language of abstract linear algebra, the matrix A is the representation of some linear transformation in a particular coordinate system. By making one orthogonal change of coordinates in the domain of this transformation and a second orthogonal change of coordinates in the range, the representation becomes diagonal.
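The following NumPy sketch (illustrative; the matrix is my own example) computes an SVD and confirms the two facts just stated: A = UΣVT, and the singular values are the square roots of the eigenvalues of ATA.

```python
import numpy as np

A = np.array([[1., 2.],
              [3., 4.],
              [5., 6.]])                       # 3-by-2, full column rank

U, s, Vt = np.linalg.svd(A)                    # U: 3x3, s: singular values, Vt: V^T
Sigma = np.zeros(A.shape)
Sigma[:len(s), :len(s)] = np.diag(s)

print(np.allclose(A, U @ Sigma @ Vt))                          # A = U Sigma V^T
print(np.allclose(s**2, np.linalg.eigvalsh(A.T @ A)[::-1]))    # sigma_i^2 = eig(A^T A)
```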

B. Rank And Condition Number

The notion of the rank of a matrix is fundamental to much of linear algebra (see section I.B.). The usual definition is the maximum number of independent columns, or, equivalently, the order of the maximal nonzero subdeterminant in the matrix. Using such a definition, it is difficult to actually determine the rank of a general matrix in practice. However, if the matrix is diagonal, it is clear that its rank is the number of nonzero diagonal entries. If a set of independent vectors is multiplied by an orthogonal matrix, the resulting set is still independent. In other words, the rank of a general matrix A is equal to the rank of the diagonal matrix Σ in its SVD. Consequently, a practical definition of the rank of a matrix is the number of nonzero singular values. We shall use the letter k to denote rank.

An m-by-n matrix with m ≥ n is said to be of full rank if k = n or rank deficient if k < n. For square matrices, the more common terms nonsingular and singular are often used for full rank and rank deficient, respectively.

Since the rank of a matrix must always be an integer, it is necessarily a discontinuous function of the elements of the matrix. Arbitrarily small changes (such as roundoff errors) in a rank-deficient matrix can make all of its singular values nonzero and hence create a matrix which is technically of full rank. In practice, we work with the effective rank, the number of singular values greater than some prescribed tolerance which reflects the accuracy of the data. This is also a discontinuous function, but the discontinuities are much less numerous and troublesome than those of the theoretical rank.

The great advantage of the use of the SVD in determining the rank of a matrix is that decisions need be made only about the negligibility of single numbers, the small singular values, rather than about vectors or sets of vectors.



We can now precisely define the measure of independence mentioned earlier. The condition number of a matrix A of full rank is

cond(A) = σmax/σmin,

where σmax and σmin are the largest and smallest singular values of A. If A is rank deficient, then σmin = 0 and cond(A) is said to be infinite.

Clearly, cond(A) ≥ 1. If cond(A) is close to 1, then the columns of A are very independent. If cond(A) is large, then the columns of A are nearly dependent. If A is square, then terms like nearly singular or far from singular can be given fairly precise meanings. A matrix A is considered to be more singular than a matrix B if cond(A) > cond(B).

If A is orthogonal, then cond(A) = 1, and so the columns of an orthogonal matrix are as independent as possible. Conversely, if cond(A) = 1, then it turns out that A must be a scalar multiple of an orthogonal matrix.
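An illustrative NumPy sketch of effective rank and condition number, using the nearly dependent columns mentioned earlier (the tolerance value is an arbitrary assumption):

```python
import numpy as np

# Nearly dependent columns, as in the (1.01, 1.00, 1.00) example above.
A = np.array([[1.01, 1.00, 1.00],
              [1.00, 1.01, 1.00],
              [1.00, 1.00, 1.01]])

s = np.linalg.svd(A, compute_uv=False)        # singular values, descending
tol = 1e-6                                    # tolerance reflecting data accuracy
effective_rank = int(np.sum(s > tol))
cond = s[0] / s[-1]                           # sigma_max / sigma_min

print(s, effective_rank, cond)                # cond(A) is about 300, much larger than 1
```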

Two common norms for the m by n matrix A are

    ||A||F = ( Σ(i=1 to m) Σ(j=1 to n) |a(i,j)|² )^(1/2)

and

    ||A||2 = max(x ≠ 0) ||Ax||2/||x||2 = max(||x||2 = 1) ||Ax||2,

which are respectively the Euclidean matrix norm or Frobenius norm, and the spectral norm. Using the singular values σi, where σ1 is assumed to be the largest, the two norms are expressed respectively as

    ||A||F = ( Σ(i=1 to n) σi² )^(1/2)

and

    ||A||2 = σ1.


C. Evaluating Determinants and Finding Singular Matrices

Let A(z) be a square matrix which depends in a possibly complicated way on some parameter z. Consider finding a value, or values, of z for which A(z) is singular. Equivalently, find z for which the determinant of A(z) is zero.

Using det to denote determinant,

det(A) = det(U)·det(Σ)·det(VT).

The determinant of an orthogonal matrix is ±1, and the determinant of a diagonal matrix is simply the product of its elements, so

det(A) = ±σ1·σ2·...·σn.

Computing determinants can be tricky because the value can vary over a huge range, and floating-point underflows and overflows are frequent.

Many problems which are formulated in terms of determinants do not require the actual value of the determinant but simply some indication of when it is zero. It is theoretically possible to compute |det(A)| by taking the product of the singular values of A. But in most situations, it is sufficient to use only the smallest singular value and thereby avoid underflow/overflow and other numerical difficulties.
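As a sketch of this idea (the parameterized matrix A(z) is my own example): the smallest singular value of A(z) becomes small near the values of z where det A(z) = 0, with no risk of overflow or underflow.

```python
import numpy as np

def A(z):
    # A(z) is singular when (1 - z)(2 - z) - 1 = 0, i.e. z = (3 ± sqrt(5))/2.
    return np.array([[1.0 - z, 1.0],
                     [1.0,     2.0 - z]])

for z in np.linspace(0.0, 3.0, 13):
    sigma_min = np.linalg.svd(A(z), compute_uv=False)[-1]
    print(f"z = {z:4.2f}   smallest singular value = {sigma_min:.4f}")
```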

D. The Eigenvalue Problem

This is not really a practical application of the SVD but rather an indication of how the SVD is related to a larger class of important matrix problems.

Let A be a square matrix. The eigenvalues of A are the numbers λ for which

Ax = λx

has a nonzero solution x. Since this is equivalent to requiring

det(A - λI) = 0,

the eigenvalues could theoretically be computed by finding the roots of this polynomial. However, consider



    A = [ 1  e ]
        [ e  1 ]

for some small but not negligible number e. The true eigenvalues are λ1 = 1 + e and λ2 = 1 - e. The polynomial det(A - λI) is

λ² - 2λ + (1 - e²).

If e is small, then e² is much smaller, and it is necessary to have twice the precision in the coefficients of the polynomial as is present in either the matrix elements or the eigenvalues. This difficulty becomes even more pronounced for higher-order matrices. Consequently, modern methods for computing eigenvalues avoid the use of polynomials or polynomial root finders.
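The loss of accuracy is easy to reproduce. In the NumPy sketch below (my own illustration), the characteristic-polynomial route loses the e² term entirely, while a direct symmetric eigenvalue routine recovers 1 ± e.

```python
import numpy as np

e = 1e-9
A = np.array([[1.0, e],
              [e, 1.0]])

# Route 1: roots of the characteristic polynomial lambda^2 - 2 lambda + (1 - e^2).
poly_roots = np.roots([1.0, -2.0, 1.0 - e**2])

# Route 2: a direct symmetric eigenvalue method.
direct = np.linalg.eigvalsh(A)

print(poly_roots)    # the e^2 term is lost; the computed roots no longer carry 1 ± e
print(direct)        # approximately 1 - e and 1 + e
```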

The connection between the SVD and eigenvalues is simplest for matrices which are symmetric and positive semidefinite. Symmetry is easy to define and check: A is symmetric if aij = aji for all i and j. Positive semidefiniteness is much more elusive: positive semidefinite means that xTAx ≥ 0 for all x. Roughly, this means that the diagonal elements of A are fairly large compared to the off-diagonal elements. In fact, a sufficient condition is that for each i

    aii ≥ Σ(j ≠ i) |aij|,

but this is not necessary (see section I.E.).

It is not difficult to show that if A is real and symmetric, then all its eigenvalues are real, and if A is positive semidefinite, then all its eigenvalues are nonnegative. If A = UΣVT and A = AT, then

A² = ATA = VΣTUTUΣVT

= VΣ²VT

Thus,

A²V = VΣ².

Letting vj denote the columns of V and looking at the jth column of this equation, we find

A²vj = σj²vj

Since the eigenvalues of A² are the squares of the eigenvalues of A (from Thm. G.3) and since the eigenvalues of A are nonnegative, we conclude that the eigenvalues of a symmetric, positive-semidefinite matrix (see Thm. E.3) are equal to its singular values and that the eigenvectors are the columns of V. Thus, V is a modal matrix for A.

If A is symmetric but not positive semidefinite, then some of its eigenvalues are negative. It turns out that the absolute values of the eigenvalues are equal to the singular values and that the signs of the eigenvalues can be recovered by comparing the columns of U with those of V, but we shall not go into details. If A is not symmetric, there is no simple connection between its eigenvalues and its singular values.

This shows that the SVD algorithm could be used to compute the eigenvalues of symmetric matrices. However, this is not particularly efficient and is not recommended. Our only point is that the SVD is closely related to matrix eigenvalue problems. Careful study of the SVD algorithm will help in understanding eigenvalue algorithms as well.
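A quick NumPy check of this correspondence (illustrative; B is an arbitrary matrix used only to manufacture a symmetric positive-semidefinite A = BTB, as in Theorem E.2):

```python
import numpy as np

rng = np.random.default_rng(2)
B = rng.standard_normal((4, 3))
A = B.T @ B                                   # symmetric positive semidefinite

sing = np.linalg.svd(A, compute_uv=False)     # singular values, descending
eig = np.linalg.eigvalsh(A)[::-1]             # eigenvalues, sorted descending

print(np.allclose(sing, eig))                 # True: they coincide for this A
```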

E. Linear Sets of Equations

Let A be a given m by n matrix, where m ≥ n, and let b be a given m-vector. We want to find all n-vectors x which solve

A·x = b    (1)

Note that this includes the cases where A is square and singular, and square and nonsingular. Important questions include:

(1) Are the equations consistent?
(2) Do any solutions exist?
(3) Is the solution unique?
(4) Do any nonzero solutions exist?
(5) What is the general form of the solution?

There are many methods for answering these questions, but the SVD is the only reliable method.

Using the SVD of A, equation (1) becomes

UΣVTx = b    (2)



and hence

Σz = d    (3)

where z = VTx and d = UTb. The system of equations in (3) is diagonal and is therefore easily studied. It breaks up into as many as three sets, depending on the values of the dimensions m and n and the rank k, the number of nonzero singular values. The three sets are:

σjzj = dj, if j ≤ n and σj ≠ 0,    (4)

0·zj = dj, if j ≤ n and σj = 0,    (5)

0 = dj, if j > n.    (6)

Theorem E.1. The equations in (1) are consistent and a solution exists iff dj = 0 whenever σj = 0 or j > n.

Theorem E.2. If k < n, then the zj associated with a zero σj can be given an arbitrary value and still yield a solution to the system. When transformed back to the original coordinates by

x = V·z,    (7)

these arbitrary components of z serve to parameterize the space of all possible solutions x.
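The case analysis in (4)-(6) translates directly into code. The sketch below is my own NumPy illustration, with a rank-deficient A and a consistent b chosen for the example; it checks consistency (Theorem E.1) and builds one particular solution via equations (4) and (7).

```python
import numpy as np

A = np.array([[1., 2., 3.],
              [2., 4., 6.],
              [1., 1., 1.]])        # rank 2 (row 2 = 2 * row 1)
b = np.array([6., 12., 3.])         # consistent: b lies in the column space of A

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
k = int(np.sum(s > tol))            # numerical rank
d = U.T @ b

# Consistency (Theorem E.1): d_j must vanish wherever sigma_j is (numerically) zero.
consistent = np.allclose(d[k:], 0.0)

z = np.zeros(A.shape[1])
z[:k] = d[:k] / s[:k]               # equation (4); z_j for sigma_j = 0 left arbitrary (here 0)
x = Vt.T @ z                        # x = V z, equation (7)

print(consistent, np.allclose(A @ x, b))
```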

The Linear Least-Squares Problem

This is an extension of the previous problem, but we now seek n-vectors x for which Ax is only approximately equal to b in the sense that the length of the residual vector,

r = A·x - b,

is minimized. The problem is to pick an x which minimizes

    ||r||² = Σ(i=1 to m) ri².    (8)


If A has full rank, then the solution x is unique and may be computed more efficiently by methods other than the SVD, such as the Toeplitz recursion. However, the SVD also handles the rank-deficient case.

Since orthogonal matrices preserve the norm (see section II.A.), equation (8) can be rewritten as

||r|| = ||UT(AVVTx - b)|| = ||Σz - d||    (9)

using the SVD and the fact that minimizing the norm squared of r is equivalent to minimizing the norm of r. The vector z which minimizes ||r|| is given by

zj = dj/σj, if σj ≠ 0,

zj = anything, if σj = 0.

Therefore, if the problem is rank deficient, the solution which minimizes ||r|| is not unique. In this rank-deficient case, it is often desirable to pick the minimum norm solution for z and therefore for x. This gives us a unique solution. This solution is obtained by setting

zj = 0, if σj = 0.

Once z is determined, for either the full-rank or the rank-deficient case, we find the solution vector x using equation (7). The error ||r||² is calculated in either case as

||r||² = Σ dj²    (10)

for values of j where σj = 0 or j > n.
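A hedged NumPy sketch of the rank-deficient least-squares recipe just described (the matrix, right-hand side, and tolerance are my own illustrative choices); the result agrees with what numpy.linalg.pinv gives.

```python
import numpy as np

A = np.array([[1., 1.],
              [1., 1.],
              [1., 1.]])            # rank 1, so the least-squares problem is rank deficient
b = np.array([1., 2., 4.])

U, s, Vt = np.linalg.svd(A)
tol = 1e-10
d = U.T @ b

z = np.zeros(A.shape[1])
nz = s > tol
z[nz] = d[:len(s)][nz] / s[nz]      # z_j = d_j / sigma_j where sigma_j != 0, else z_j = 0
x = Vt.T @ z                        # minimum-norm least-squares solution

print(x)
print(np.allclose(x, np.linalg.pinv(A) @ b))    # matches the pseudoinverse solution
```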


F. The Discrete Karhunen-Loeve Transform (KLT)

In many statistical pattern recognition problems and maximum likelihood problems we are given a random vector x, of dimension n, which has a joint Gaussian density with parameters

E[x] = mx,

Cx = E[(x - mx)(x - mx)T]

mx and Cx are the mean vector and autocovariance matrix of x. Cx is Toeplitz if x represents evenly spaced samples from a stationary random process. (See (11) in section I.D.) We will assume that Cx is positive definite. The discrete Karhunen-Loeve Transform (KLT) is a transform,

z = A·x    (1)

such that;

(1) The covariance matrix Cz = E[(z - mz)(z - mz)T] is diagonal, so that A is an orthogonal matrix and the elements z(i) of z are independent Gaussian random variables.

(2) The sequence σii = Cz(i,i) is non-increasing.

(3) No other transform satisfies (2) with larger σii values.

Define the SVD of Cx as

Cx = U·Σ·VT.    (2)

Using equation (1) and

x = A-1z (3)

we get

Page 18: Linear Algebra

18

Cx = E[(A-1z - A-1mz)(A-1z - A-1mz)T]

= E[A-1·(z - mz)(z - mz)T·A]

= A-1·Cz·A    (4)

Since A is an orthogonal matrix and since the Cz(i,i) are nonnegative, equation (4) represents an SVD of Cx. Therefore, for the KLT, we take the SVD of Cx as in equation (2) and get

V = U,

A = UT = VT,

Cz = Σ,

mz = UT·mx

Thus the transformation matrix A = UT diagonalizes the covariance, so that the covariance matrix of z is diagonal. The rows of A form an orthonormal set of n basis vectors.
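An illustrative NumPy sketch of the KLT on synthetic data (the correlated zero-mean Gaussian model and its mixing matrix L are my own assumptions, chosen so that the behavior of Cz can be checked):

```python
import numpy as np

rng = np.random.default_rng(3)
n, N = 3, 100000

# Correlated zero-mean Gaussian samples with covariance Cx = L L^T.
L = np.array([[1.0, 0.0, 0.0],
              [0.7, 1.0, 0.0],
              [0.3, 0.5, 1.0]])
X = rng.standard_normal((N, n)) @ L.T          # rows are realizations of x

Cx = np.cov(X, rowvar=False)                   # sample covariance of x
U, sig, Vt = np.linalg.svd(Cx)                 # Cx = U · diag(sig) · U^T (symmetric, so V = U)

Z = X @ U                                      # z = U^T x, applied row-wise
Cz = np.cov(Z, rowvar=False)

print(np.round(Cz, 3))                         # approximately diagonal, entries close to sig
```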

G. Summary

(1) A singular value decomposition of an m-by-n real matrix A is any factorization of the form

A = UΣVT,

where U is an m-by-m orthogonal matrix, V is an n-by-n orthogonal matrix, and Σ is an m-by-n diagonal matrix with σij = 0 if i ≠ j and σii = σi ≥ 0. The quantities σi are called the singular values of A, and the columns of U and V are called the left and right singular vectors.

(2) The rank of A is the number of nonzero singular values of A.

(3) The condition number of a matrix A of full rank is

cond(A) = σmax/σmin,

where σmax and σmin are the largest and smallest singular values of A.


(4) det(A) = ±σ1·σ2·...·σn.

(5) The columns of U and V are respectively eigenvectors of AAT and ATA. These sets of columns are called the left singular vectors (for U) and the right singular vectors (for V).

(6) The singular values are the positive square roots of the eigenvalues of ATA.

(7) If A is symmetric and positive semidefinite, then the singular values of A are the eigenvalues of A, and the SVD is a spectral decomposition of A. (See section I part H.) If A is an autocorrelation matrix, it satisfies this condition. (See section I part E.)

(8) The SVD can be used to solve linear sets of equations, including those involving least-squares,even when the A matrix is rank deficient.

(9) The orthogonal KLT transformation matrix for a random vector x with covariance matrix Cx = U·Σ·VT is defined as

A = UT

The diagonal covariance matrix for the transformed vector z is

Cz = Σ