chapter 4 eigenvalue problem - purdue university

Chapter 4

EIGENVALUE PROBLEM

The German word eigen is cognate with the Old English word agen, which became owen inMiddle English and “own” in modern English.

Given A ∈ Rn×n

solve Ax = λxwhere λ ∈ C

and 0 6= x ∈ Cn.

4.1 Mathematics

4.2 Reduction to Upper Hessenberg Form

4.3 The Power Method

4.4 The QR Iteration

4.5 The Lanczos Method

4.6 Other Eigenvalue Algorithms

4.7 Computing the SVD

4.1 Mathematics

The adjacency matrix of a directed graph is given by

0 0 0 0 11 0 1 0 00 0 0 1 00 1 0 0 00 0 0 1 0

96

where aij is the number of arcs from i to j. The number of paths of length k joining i to j isgiven by (Ak)ij. Is there a closed form expression for Ak? I.e., a more direct way of formingAk?

4.1.1 Complex Values

The properties of vectors and matrices of real numbers extends to vectors x ∈ Cm andmatrices A ∈ Cm×n.

We define the conjugate transpose by

AH = AT.

A Hermitian matrix satisfies AH = A and a unitary matrix UH = U−1.Cn is a vector space.The inner product for x, y ∈ C

n is xHy.

4.1.2 Eigenvalues and eigenvectors

If Ax = λx where λ ∈ C and 0 6= x ∈ Cn then λ is an eigenvalue1 and x is an eigenvector.Try solving for x in

(A − λI)x = 0.

There exists a solution x 6= 0 only if det(A − λI) = 0. Hence the eigenvalues λ are roots ofthe characteristic polynomial

p(ζ) := det(ζI − A).

For example for A =

[0 11 1

]

det(ζI − A) = det

[ζ −1−1 ζ − 1

]= ζ2 − ζ − 1. (4.1)

An eigenvalue λ satisfies λ2 − λ − 1 = 0 ⇒ λ = 12±

√5

2.

In general the characteristic polynomial has n roots, counting multiplicities:

det(ζI − A) = (ζ − λ1)(ζ − λ2) · · · (ζ − λn) .

Hence there are n eigenvalues. Because A is real, p(ζ) has real coefficients and thus eacheigenvalue is either real or one of a complex conjugate pair.

1characteristic value

97

imaginary

real

Setting ζ = 0 in (4.1), we get

det(−A) = (−1)nλ1λ2 · · ·λn,

and since det(−A) = det((−1)A) = (−1)ndetA,we have

detA = λ1λ2 · · ·λn,

⇒ 0 is an eigenvalue of a singular matrix.Note that for an eigenpair (λ, x)

A2x = A(λx) = λAx = λ2x,A−1Ax = A−1λx ⇒ A−1x = λ−1x,Amx = λmx.

Moreover, if q(ζ) is a polynomial,q(A)x = q(λ)x.

For a block triangular matrix T with diagonal blocks T11, T22, . . . , Tpp we have

det(ζI − T ) = det(ζI − T11)det(ζI − T22) · · ·det(ζI − Tpp).

example

A =

1 2 4

0 2 00 1 2

det(ζI − A) = det(ζ − 1)det(ζI −[

2 01 2

]) = (ζ − 1)(ζ − 2)2 .

This property of determinants follows immediately if we can show

det

[A B0 C

]= detA · detC.

This is shown by considering two cases:

98

(i) A singular,

(ii) A nonsingular, in which case

[A B0 C

]=

[A 00 I

] [I A−1B0 C

].

4.1.3 Similarity transformations

Return to the problem of a simpler form for Ak.Definition If detX 6= 0, then X−1AX =: B is called a similarity transformation and we

say that B is similar to A.First a couple of easy-to-prove facts:

1. If B is similar to A and C is similar to B, then C is similar to A. Therefore similaritytransformations can have several steps.

2. Similar matrices have identical eigenvalues including multiplicities.

Note that Ak = (XBX−1)k = · · · = XBkX−1. Can we find a simple B which is similarto A?

For each of the eigenvalues λi we have

Axi = λixi for some xi 6= 0.

HenceAX = XΛ

whereX = (x1, x2, . . . , xn) and Λ = diag(λ1, λ2, . . . , λn).

If the eigenvectors can be chosen to be linearly independent, then X is nonsingular and

X−1AX = Λ,

in which case we say that A is diagonalizable. Otherwise A is defective.example

A =

1 0 0

0 1 20 0 1

Eigenvalues are 1, 1, 1. Eigenvectors must satisfy (A − I)x = 0. 0 0 0

0 0 20 0 0

x1

x2

x3

= 0 ⇒ x3 = 0

⇒ x =

0

x2

x3

99

where x2 and x3 are arbitrary but not both zero. Thus the eigenvectors corresponding to 1lie in a 2-dimensional subspace. We do not have three linearly independent eigenvectors, sothis matrix is defective.

Theorem: A matrix having distinct eigenvalues is diagonalizable.Proof for n = 3. Suppose

Axi = λixi, xi 6= 0, i = 1, 2, 3.

Supposeα1x1 + α2x2 + α3x3 = 0. (4.2)

We need to show α1 = α2 = α3 = 0. Without loss of generality it is enough to show α1 = 0.Multiply (4.2) by (A − λ2I)(A − λ3I),

(A − λ2I)(A − λ3I)xi = (A − λ2I)(λi − λ3)xi = (λi − λ2)(λi − λ3)xi,

and (4.2) becomesα1(λ1 − λ2)(λ1 − λ3)x1 = 0.

Distinctness ⇒ α1 = 0. 2

Example Recall that

[0 11 1

]has eigenvalues 1

2+

√5

2, 1

2−

√5

2. The corresponding eigen-

vectors are

[−1

2+

√5

2

1

],

[−1

2−

√5

2

1

]. So

[0 11 1

]= X

[12

+√

52

0

0 12−

√5

2

]X−1 where

X =

[−1

2+

√5

2−1

2−

√5

2

1 1

], so

[0 11 1

]k

=1√5

[(1+

√5

2)k−1 − (1−√

52

)k−1 (1+√

52

)k − (1−√5

2)k

(1+√

52

)k − (1−√5

2)k (1+

√5

2)k+1 − (1−√

52

)k+1

].

For large k, Ak is dominated by the largest eigenvalue(s). Hence the spectral radius

ρ(A) := max1≤i≤n

|λi(A)|.

Jordan canonical form. Any matrix can be transformed by means of a similarity trans-formation to the form diag(J1, J2, . . . , Jm) where each basic Jordan block Jk has the form

Jk =

λk 1

λk. . .. . . 1

λk

.

The Cayley-Hamilton Theorem asserts that any matrix A satisfies p(A) = 0 where p(ζ)is the characteristic polynomial of A.

100

4.1.4 Unitary similarity transformations

The X needed to diagonalize a matrix can be very ill-conditioned, e.g., for A =

[1 + ε 1

0 1

]we need X =

[1 −1/ε0 1

]to get X−1AX =

[1 + ε 0

0 1

]. (For ε = 0, A is defec-

tive.) However, to get eigenvalues (and eigenvectors) it is sufficient to have X−1AX =upper triangular. And it is possible to achieve this using an X with κ2(X) = 1. For a

unitary matrix U ∈ Cn×n, κ2(U) = ‖U−1‖2‖U‖2 = ‖UH‖2‖U‖2 = 1.THEOREM Schur decomposition If A ∈ Cn×n, then there exists a unitary matrix U ∈

Cn×n such that R = UHAU is upper triangular.Proof. By induction on n. True for n = 1. Assume true for n − 1. We will seek unitary

P such that

PHAP =

[λ bH

0 B

].

for some λ, bH, B. In other words we want

PHAPe1 = λe1

or equivalentlyA(Pe1) = λ(Pe1).

Let λ be an eigenvalue of A and x an eigenvector with ‖x‖2 = 1. We are done if we can chooseunitary P so that Pe1 = x. Equivalently we want PHx = e1. A generalized Householderreflection

PH = I − 2vvH

vHv, v = x − e1

can be shown to work. To guard against division by zero, choose x so that eT1 x ≤ 0. There

is unitary V such that V HBV =: R is upper triangular. Let

U = P

[1

V

].

then

UHAU =

[λ bH

0 R

]. 2

The construction in this proof is known as deflation: it depends on being able to findone eigenvalue of a matrix. If A is real and has only real eigenvalues then U and R may bechosen to be real. If A is real but has some imaginary eigenvalues (which must occur in pairsλ, λ), then neither U nor R can be real. Unfortunately complex arithmetic is expensive:double the storage, four times the computation. In the case of factoring polynomials we canavoid complex arithmetic by allowing the factors to be either linear or quadratic with twoimaginary roots λ and λ. Analogously real matrices have a real Schur decomposition:

101

THEOREM Real Schur decomposition If A ∈ Rn×n, then there exists an orthogonal

matrix Q ∈ Rn×n such that R = QTAQ is block upper triangular with diagonal blocks thateither are 1 by 1 or are 2 by 2 with imaginary eigenvalues.

4.1.5 Symmetric matrices

Theorem If A is real symmetric, then there exists a real orthogonal Q such that QTAQ =Λ = diag(λ1, λ2, . . . , λn), λi ∈ R.

Proof. Consider the real Schur decomposition

R = QTAQ.

ThenRT = QTATQ = QTAQ = R.

Hence R is symmetric block diagonal with blocks that either are 1 by 1 or are symmetricand 2 by 2 with imaginary eigenvalues. However, a 2 by 2 symmetric matrix cannot haveimaginary eigenvalues, so R must be diagonal. 2

AQ = QΛA(Qei) = (Qei)λi

Qei is an eigenvector, and λi is eigenvalue. These eigenvectors form an orthonormal set. Theeigenvalues are real. If A is s.p.d., the eigenvalues are positive.

4.1.6 Matrix norms

THEOREM ρ(A) ≤ ‖A‖ for a consistent matrix norm.Proof. There exists an x 6= 0 and |λ| = ρ(A) such that Ax = λx.

|λ|‖x‖ ≤ ‖A‖‖x‖. 2

THEOREM The spectral norm

‖A‖2 =√

ρ(ATA).

Proof.

‖A‖2 = maxx 6=0

√xTATAx√

xTx.

Since ATA is real symmetric, there exists a real orthogonal Q and real diagonal Λ such that

QT(ATA)Q = Λ.

So

‖A‖22 = max

x 6=0

xTQΛQTx

xTx.

102

Let y = QTx:

‖A‖22 = max

y 6=0

yTΛy

yTy

= max1≤i≤n

λi (exercise)

= ρ(ATA). 2

If A is symmetric‖A‖2 = ρ(A2)1/2 = ρ(A).

THEOREM‖AT‖2 = ‖A‖2.

Proof. It is enough to show that ρ(AAT) = ρ(ATA), and for this it is enough to showthat if λ 6= 0 then

λ is an eigenvalue of AAT ⇔ λ is an eigenvalue of ATA.

Thus let AATx = λx λ 6= 0, x 6= 0. Then ATA(ATx) = λ(ATx). λ 6= 0 ⇒ ATx 6= 0, whichmeans λ is an eigenvalue of ATA. Similarly, a nonzero eigenvalue of ATA is an eigenvalue ofAAT. 2

4.1.7 ? Gerschgorin disks

Useful information about eigenvalues is often provided byTHEOREM (Gerschgorin)

λ(A) ⊂ D1 ∪ D2 ∪ · · · ∪ Dn

whereDi = {z : |z − aii| ≤

∑j 6=i

|aij|} Gerschgorin disk .

Example

A =

1 0.1 0.1

0.1 1.1 0.10.1 0.1 2

D1 : |z − 1| ≤ 0.2

D2 : |z − 1.1| ≤ 0.2D3 : |z − 2| ≤ 0.2

21D UD 3D0

plane

complex

103

Proof.λ ∈ λ(A) ⇒ Ax = λx for some x 6= 0.

(λ − aii)xi =∑j 6=i

aijxj .

|λ − aii‖xi| ≤∑j 6=i

|aij‖xj | ≤ ‖x‖∞∑j 6=i

|aij |.

Choose i so that |xi| = ‖x‖∞, and divide by this. 2

THEOREM If k Gerschgorin disks of the matrix A are disjoint from the other disks, thenexactly k eigenvalues of A lie in the union of the k disks. 2

Example (continued). Two eigenvalues lie in D1 ∪ D2; one lies in D3. Since similaritytransformations do not change λ(A), consider D−1 = diag(1, 1, ε) and

D−1AD =

1 0.1 0.1/ε

0.1 1.1 0.1/ε0.1ε 0.1ε 2

.

D3 : |z − 2| ≤ 0.2ε

contains an eigenvalue as long as D3 and D1 ∪ D2 are disjoint:

2 − 0.2ε > 1.2 + 0.1/ε

e.g., ε = 0.2. Hence |λ − 2| ≤ 0.04 for some λ ∈ λ(A).

Review questions

1. For A ∈ Cm×n what is the meaning of AH?

2. What is a Hermitian matrix?

3. What is a unitary matrix?

4. What is the inner product of x and y for x, y ∈ Cn?

5. Define an eigenvalue and eigenvector.

6. What is the characteristic polynomial of a matrix?

7. Define the multiplicity of an eigenvalue.

8. What can we say about the eigenvalues of a real matrix?

9. What is the relationship between the determinant of a matrix and its eigenvalues?

104

10. Given eigenvalues and eigenvectors of a matrix A what can we say about those of f(A)where f(ζ) is a polynomial? where f(ζ) = ζ−1?

11. What can we say about the determinant of a block triangular matrix with diagonalblocks T11, T22, . . . , Tpp?

12. What can we say about the eigenvalues of a block triangular matrix with diagonalblocks T11, T22, . . . , Tpp?

13. Define a similarity transformation.

14. Show that being similar is an equivalence relation.

15. Show that similar matrices have the same eigenvalues including multiplicities.

16. What does it mean for a matrix to be diagonalizable?

17. What condition on the eigenvalues is sufficient for a matrix to be diagonalizable? Proveit.

18. What is a defective matrix? Give an example.

19. What condition on the eigenvalues of a matrix is sufficient for it to be diagonalizable?

20. How can one express the kth power of a matrix in terms of the kth power of scalars?

21. Define the spectral radius of a matrix?

22. What is a basic Jordan block?

23. What is a Jordan canonical form?

24. What can be proved about Jordan canonical forms?

25. What is the Cayley-Hamilton Theorem?

26. What is the condition number of a unitary matrix?

27. What is a Schur decomposition?

28. How many linearly independent eigenvectors must an n by n matrix have?

29. What is the name of the process called by which we “factor out” an eigenvalue andeigenvector of a matrix?

30. Under what condition can a Schur decomposition consist of real matrices?

31. To what extent can we factor a polynomial with real coefficients into lower degreepolynomials with real coefficients?

105

32. What is a real Schur decomposition of a real matrix?

33. What form does a real Schur decomposition of a real symmetric matrix take?

34. What can we say about the eigenvalues and eigenvectors of a real symmetric matrix?

35. What can be said about the eigenvalues and eigenvectors of a Hermitian matrix?

36. What is the relation between a consistent norm of a matrix and its eigenvalues?

37. Define the spectral norm and give a formula for it.

38. What is the spectral norm of a symmetric matrix?

39. What is the spectral norm of the transpose of a matrix?

Exercises

1. Find the eigenvectors of

1 2 4

0 2 00 1 2

.

2. Suppose A−2I is a singular matrix. What, in terms of the matrix A, is (special about)the null space of A − 2I?

3. (a) Show that if a real matrix A satisfies xHAx > 0 for all complex x 6= 0, it must besymmetric positive definite.

(b) Let A be a complex matrix. Show that if xHAx is real for all complex x, it mustbe Hermitian.

4. A matrix is idempotent if A2 = A. Show that if A is idempotent, then its eigenvaluesare zeros and ones.

5. Determine the eigenvalues of xyT where x and y are column vectors.

6. Show that

[0 10 0

]is not diagonalizable.

7. (a) Let A ∈ Rn×n have the nonreal eigenvalue λ. Show that there exists X ∈ R

n×2,M ∈ R2×2 satisfying AX = XM where X has linearly independent columns andλ(M) = {λ, λ}.

(b) Show that there exists an orthogonal matrix Q and a nonsingular matrix T suchthat

QTXT−1 =

(I2

0

).

106

(c) Show that QTAQ =

(B C0 E

)where λ(P ) = {λ, λ}. Hint: obtain the first two

columns by some matrix operation.

8. Show that if ρ(A) < 1 then I − A is nonsingular.

9. Show that a Hermitian matrix has real eigenvalues. You may use the fact that yHy ≥ 0for any complex vector y and that λ2 ≥ 0 implies λ is real.

10. Show that the eigenvalues of a real symmetric positive definite matrix are positive.

11. Show that for a real diagonal matrix D

maxx 6=0

xTDx

xTx= max

1≤i≤ndii.

12. Show that for a consistent matrix norm the condition number

κ(A) ≥ max |λ(A)|min |λ(A)| .

13. Why can we be sure that all the eigenvalues of the following matrix are real? Give acomplete answer.

2 0.5 0.2 0.10 4 0.4 −0.30.3 0.2 6 −0.10.1 −0.1 0.1 8

14. Show that the spectral radius of a matrix does not exceed the norm of the matrix forany matrix norm subordinate to some vector norm.

15. What is the spectral radius of

[0 02 0

]? What is the spectral norm of this matrix?

16. from BIT “(University of Uppsala, Second Course) The eigenvalues to the matrix

A(ε) =

3 0.5 ε 00.5 1 −ε 0ε −ε 2 −10 0 −1 2

are separated for small enough ε. Localize the eigenvalues and determine how large εcan be and still guarantee that the eigenvalues are separated.Answer: The eigenvalues are close to 0.882, 1, 3 and 3.118. They are separated as longas |ε| ≤ 0.04.” Hint: be willing to diagonalize 2 by 2 matrices.

107

17. What are the possible Jordan canonical forms for a real 4 by 4 matrix two of whoseeigenvalues are both equal to i?

18. Extend the construction of the Householder reflection to include any complex vectorx. That is, construct an elementary unitary matrix P such that Px is a scalar multipleof e1.

19. What are the possible Jordan canonical forms for a real 4 by 4 matrix two of whoseeigenvalues are both equal to i?

20. Give an example of a 2 by 2 matrix A such that ρ(A) = 0 but ‖A‖2 = 1.

21. Extend the construction of the Householder reflection to include any complex vectorx. That is, construct an elementary unitary matrix P such that Px is a scalar multipleof e1.

22. Let c, s ∈ R satisfy c2 + s2 = 1. The matrix

[c s−s c

]has eigenvalues λ1 = c + is and

λ2 = c − is. Calculate the eigenvector x1 corresponding to λ1.Calculate xT

1 x1.Calculate xH

1 x1.

23. Let A be a 5 by 3 matrix with singular values 2, 1, 0. For the following questions it isenough to state the answer. What are the eigenvalues of ATA?of AAT?

24. Find and correct the subtle but fatal flaw in the following proof:

Theorem The eigenvalues of an orthogonal matrix have modulus 1.Proof. Let λ be an eigenvalue of an orthogonal matrix Q. Then Qx = λxand xTQT = λxT, whence

xTQTQx = λ2xTx

xTx = λ2xTx

1 = λ2

1 = |λ|.

108

4.2 Reduction to Upper Hessenberg Form

In a finite number of arithmetic operations no algorithm can by similarity transformationstriangularize A. The best we can do is

QTAQ = H =

× × × · · · · · · ×× × × · · · · · · ×

× × · · · · · · ×. . .

. . ....

× × ×× ×

upperHessenberg

form

We can do this column by column.Suppose that elements in the first k−1 columns under the first subdiagonal have already

been eliminated so that the matrix has the form[Hk Bk

0 ak Ak

]where Hk is a k by k upper Hessenberg matrix and ak is an n − k by 1 column vector. Toleave Hk unchanged, we seek an orthogonal transformation of the form[

I 0

0 PTk

] [Hk Bk

0 ak Ak

] [I 0

0 Pk

]=

[Hk BkPk

0 PTk ak PT

k AkPk

].

Also, we want PTk ak to be all zero except for its first element. This can be accomplished

with a Householder reflection, for which PTk = Pk.

The cost of doing the kth column is as follows:

determine Pk = I − βkvkvTk n − k + 1 mults, 1 square root

form PkAk 2(n − k) mults per column times (n − k) columns

form

[Bk

Ak

]Pk 2(n − k) mults per row times n rows

n − k + 1 + 2(n − k)2 + 2n(n − k)

The total cost is 53

n3 multiplications.In sum with

Pk :=

[I 0

0 Pk

]we have

Pn−2(· · · (P2(P1︸︷︷︸QT

A P1)P2) · · · )Pn−2︸︷︷︸Q

=: H.

If A is symmetric, then H is symmetric.

symmetric + upper Hessenberg ⇒ tridiagonal

Cost: 23

n3 mults.

109

Review questions

1. What is an upper Hessenberg matrix?

2. How close can we come to upper triangularizing a matrix in a finite number of arith-metic operations?

3. Show how an orthogonal similarity transformation can be used to zero out all but thefirst two elements in the first column of a matrix. For the orthogonal matrix it issufficient to state a previously obtained result.

4. If a square matrix A is reduced by a similarity transformation to an upper Hessenbergmatrix H using Householder reflections P1, P2, . . . , what is the resulting factorizationof A?

5. What results if a symmetric matrix is reduced to upper Hessenberg form using orthog-onal similarity transformations?

6. How does the cost of reducing an n by n matrix to upper Hessenberg form using or-thogonal similarity transformations depend on n for a general matrix? for a symmetricmatrix?

Exercises

1. By means of an orthogonal similarity transformation make the following matrix tridi-agonal:

4 2 12 4 21 2 4

.

Note that the Householder reflection for a vector

[xy

]is given by

−sign(x)(x2 + y2)−1/2

[x yy −x

].

2. Apply an orthogonal similarity transformation to the following matrix so that the (3,1)and (4,1) elements of the transformed matrix are both zero:

1 1 3 2

−4 0 1 00 0 2 13 5 1 0

You may leave your answer as a product of 4 by 4 matrices.

110

3. Reduction of a full 5 by 5 matrix A to upper Hessenberg form H can be accomplishedby a similarity transformation

H = Pq · · ·P2P1AP1P2 · · ·Pq

where each Pk is a Householder reflection.

(a) How many reflections q are used?

(b) In what order are the 2q matrix products computed? (You may use parenthesesto show this.)

(c) Indicate the structure of P2 using the symbols 0, 1, × for elements of P2, where× represents an arbitrary value.

4. Determine an orthogonal similarity transformation that makes the following matrixtridiagonal:

1 3 43 −4 34 3 1

.

Hint: (x2 + y2)−1/2

[x yy −x

] [xy

]=

[(x2 + y2)1/2

0

].

4.3 The Power Method

Eigenvalues are the roots of the characteristic polynomial p(λ) := det(λI − A). Thus weexpect no direct method; any method for finding eigenvalues must have an iterative part.

The following algorithm is unstable:

form the characteristic polynomial;find the roots of the characteristic polynomial.

This is because finding the roots of the polynomial can be much more ill-conditioned thanfinding eigenvalues; i.e., the results are much more sensitive to errors in the coefficients. Onthe other hand the roots of λn + cn−1λ

n−1 + · · · + c1λ + c0 = 0 are the eigenvalues of itscompanion matrix

0 −c0

1 0 −c1

1. . . −c2

. . .. . .

.... . . 0 −cn−2

1 −cn−1

.

111

It is reasonable to solve a polynomial equation by computing the eigenvalues of its companionmatrix.

The power method is good for a few eigenvalues and eigenvectors. If A is diagonalizable,

X−1AX = Λ,A[x1, x2, . . . , xn] = [x1, x2, . . . , xn] diag(λ1, λ2, . . . , λn).

Assume there is a unique eigenvalue of maximum magnitude:

|λ1| > |λ2| ≥ |λ3| ≥ · · · ≥ |λn|.

(What can we say about λ1?) Let z0 = initial guess for x1. Algorithm:

z1 = Az0, z2 = Az1, . . . , zk+1 = Azk

zk = Akz0

= (XΛX−1)kz0

= XΛkX−1z0

= Xλk1diag

(1,

(λ2

λ1

)k

, . . . ,

(λn

λ1

)k)

X−1z0

= λk1X

(e1e

T1 + O

(∣∣∣∣λ2

λ1

∣∣∣∣k))

X−1z0

= λk1

(x1(e

T1 X−1z0) + O

(∣∣∣∣λ2

λ1

∣∣∣∣k))

assuming eT1 X−1z0 6= 0.

The Rayleigh quotient

zTk Azk

zTk zk

= λ1 + O

(∣∣∣∣λ2

λ1

∣∣∣∣k)

.

The Rayleigh quotient is the value λ that satisfies zkλ ≈2 Azk. If A is symmetric, then

zTk Azk

zTk zk

= λ1 + O

(∣∣∣∣λ2

λ1

∣∣∣∣2k)

.

The power method is good only for the largest eigenvalue in magnitude.Inverse iteration. Apply power method to A−1 to get smallest eigenvalue in magnitude

assuming |λn| < |λn−1|. More generally consider

(A − σI)−1, σ ∈ C,

112

which has eigenvalues1

λ1 − σ,

1

λ2 − σ, . . . ,

1

λn − σ.

The largest of these is1

λp − σwhere λp is closest to σ.

zk+1 = (A − σI)−1zk use LU factorization

Note that Azk+1 = σzk+1 + zk is available, so use

λp ≈zT

k+1Azk+1

zTk+1zk+1

= σ +zT

k+1zk

zTk+1zk+1

,

for which one can show the error is O(|λp − σ|k/|λq − σ|k) where λq is the second closesteigenvalue to σ.

To prevent overflow or underflow, form a normalized sequence:

yk = zk/‖zk‖2,

zk+1 = (A − σI)−1yk.

Then

λp ≈ σ +zT

k+1yk

zTk+1zk+1

.

4.3.1 Rayleigh quotient iteration

The idea is to use the latest Rayleigh quotient estimate to continually update the eigenvalueestimate λ∗ in inverse iteration.

zk+1 = (A − σkI)−1yk

Rayleigh quotient estimate for A

σk+1 = σk +zT

k+1yk

zTk+1zk+1

The order of convergence is quadratic in general

‖yk+1 − (±xp)‖ = O(‖yk − xp‖2)

and cubic for a symmetric matrix

‖yk+1 − (±xp)‖ = O(‖yk − xp‖3).

If eigenvalue λp < 0, the signs will alternate.

113

Review questions

1. Given a polynomial, how does one construct a matrix whose eigenvalues are the rootsof the polynomial? What do we call such a matrix?

2. What is the algorithm for the power method?

3. What limitation of floating-point arithmetic is a significant danger for the powermethod and how is this avoided?

4. Under what condition does the power method converge and what does it converge to?

5. Assuming the power method converges, how rapidly does it converge?

6. What is the Rayleigh quotient and what does it approximate?

7. How can we generalize the power method to estimate the eigenvalue closest to somegiven value σ?

8. Assuming inverse iteration with a (fixed) shift σ converges, how rapidly does it con-verge?

9. Explain in words the idea of Rayleigh quotient iteration. What is its order of conver-gence in general? for a symmetric matrix?

Exercises

1. Beginning with initial guess[

0 0 1]T

, do two iterations of the power method for 1 0.1 0.1

0.1 1.1 0.10.1 0.1 2

.

Also form the Rayleigh quotients for the zeroth and first iterates.

2. If we applied the power method to (A − σI)−1, we could change σ from one iterationto the next using at each iteration a new and better approximation to an eigenvalue ofA. There is a serious drawback, however. What is it? Secondly, what difficulty mightwe expect if σ is extremely close to an eigenvalue of A?

3. For the matrix

0 0 0 0 11 0 1 0 00 0 0 1 00 1 0 0 00 0 0 1 0

114

do two iterations of the power method beginning with [1, 1, 1, 1, 1]T. (Renormalizationis unnecessary.) Use the results of the two iterations to estimate the largest eigenvalueof the matrix.

4.4 The QR Iteration

What if we try to make A upper triangular rather than upper Hessenberg? Consider the2 × 2 case

A =

[a bc d

].

Then

v =

[a + sign(a)

√a2 + c2

c

] aftermuch

algebraP =

−sign(a)√a2 + c2

[a cc −a

].

For example,

A =

[2 −11 0

], P =

−1√5

[2 11 −2

]=: QT

1

ourorthogonal

matrix

QT1 A =

−1√5

[5 −20 −1

], but QT

1 AQ1 =

[85

95−1

525

]=: A1.

After postmultiplication it is no longer upper triangular, but the (2,1) element has beenreduced. Using A1

QT2 = − 1√

135

[85

−15−1

5−8

5

], A2 := QT

2 A1Q2 =

[1813

−2513

113

813

].

The (2,1) element is still smaller.

QT3 = − 1√

2513

[1813

113

113

−1813

], A3 := QT

3 A2Q3 =

[3225

4925− 1

251825

]

And still smaller. However convergence becomes very slow. (Eigenvalues are 1, 1.)For

A =

[0 11 0

], QT

1 =

[0 11 0

], and QT

1 AQ1 =

[0 11 0

]there is no convergence. (Eigenvalues are 1, −1.)

115

More generally for an n × n matrix

A0 := A,Ak+1 := (QT

k+1Ak)︸︷︷︸upper

triangular

Qk+1

where QTk+1 is a sequence of n − 1 Householder reflections. As k → ∞, Ak becomes a block

upper triangular matrix. The first block contains eigenvalues of largest magnitude, thesecond block contains eigenvalues of next to largest magnitude, . . ., the last block containseigenvalues of least magnitude. If |λ1| > |λ2| > · · · > |λn|,

Ak →

λ1 × · · · ×λ2 · · · ×

. . ....

λn

Schur

form

If A ∈ Rn×n, this is possible only if all eigenvalues are real. Rate of convergence depends oneigenvalue ratios, in particular (Ak)ij = O(|λi/λj|k) for j < i.

Convergence can be improved by a shifted QR iteration. For example, if λ(A) = {−1, 1, 1.1},then λ(A − 0.9I) = {−1.9, 0.1, 0.2}, which are much better separated. The iteration is

Ak+1 := (QTk+1(Ak − σk+1I))︸︷︷︸upper triangular

Qk+1 + σk+1I .

Note that as before Ak+1 = QTk+1AkQk+1, but a different Qk+1 is computed. For the shifts

we use approximations to eigenvalues, which can be obtained from diagonal elements of Ak.In particular, we seek for σk+1 an approximation to the smallest eigenvalue, for which the(n, n) entry is a typical choice.

From before,

A1 =

[85

95−1

525

].

The last diagonal element is an approximation to the smaller eigenvalue. So we use a shiftσ2 = 2

5:

A1 − σ2I =

[65

95−1

50

], QT

2 =−1√

3725

[65

−15−1

5−6

5

],

A2 = QT2 (A1 − σ2I)Q2 + σ2I =

[242185

−361185

9185

128185

],

which is better than without the shift. The cost of a QR iteration is 43n3 multiplications in

general, but only 4n2 multiplications for Ak = upper Hessenberg.The shifted QR iteration preserves an upper Hessenberg form. More specifically, the

following can be proved (Exercise 2):

116

Assume that Ak is upper Hessenberg with nonzero subdiagonal elements2 and thatAk − σk+1I = Qk+1Rk+1 where Qk+1 is orthogonal and Rk+1 is upper triangular.Then Ak+1 = Rk+1Qk+1 + σk+1I is upper Hessenberg.

Thus we should choose

A0 = (Pn−2(· · · ((P1A)P1) · · · ))Pn−2 upper Hessenberg.

If A is symmetric, then Ak is symmetric and therefore tridiagonal.

Review questions

1. What does one QR iteration do to the eigenvalues of a matrix?

2. In the QR iteration Ak+1 = QTk+1AkQk+1 what is special about Qk+1? Give a complete

answer.

3. What does one QR iteration Ak+1 = QTk+1AkQk+1 do to the eigenvectors of a matrix?

4. If Ak is a sequence of square matrices produced by a QR iteration, under what as-sumption do the elements below the diagonal converge to zero? Under this assumptionwhat do the diagonal elements converge to?

5. Give the formula for one iteration of a shifted QR iteration and explain how the Qmatrix is chosen.

6. What does one shifted QR iteration do to the eigenvalues of a matrix?

7. What preprocessing step is recommended before doing a QR iteration?

8. How does the cost of a QR iteration for an n by n matrix depend on n for a generalmatrix? for an upper Hessenberg matrix?

9. Name some computationally significant properties that are preserved by a QR iterationusing Householder reflections or Givens rotations.

Exercises

1. Show that a single QR iteration is enough to transform the square matrix xyT to aright triangular matrix. Assume x 6= 0 and y1 6= 0. Hint: what can you prove about arank one upper triangular matrix?

2. Prove that the shifted QR iteration preserves an upper Hessenberg form for a matrixwith nonzero subdiagonal elements. Hints:

2If a subdiagonal element is zero, the eigenvalue problem divides into two smaller problems.

117

(i) Do a (1, n−1) by (n−1, 1) partitioning of of Ak−σk+1I and Qk+1 and an (n−1, 1)by (n−1, 1) partitioning of Rk+1, and show that Qk+1 must be upper Hessenberg.

(ii) Do not use the fact that Qk+1 is orthogonal.

4.5 The Lanczos Method

The Lanczos method computes an approximation to the extreme eigenvalues of a symmetricmatrix. The generalization to a nonsymmetric matrix is known as the Arnoldi method.

The most direct way of presenting this method is to consider the transformation of asymmetric matrix A to tridiagonal form

AQ = QT (4.3)

where Q = [q1q2 · · · qn] is an orthogonal matrix and T is a symmetric tridiagonal matrix

T =

α1 β1

β1 α2. . .

. . .. . . βn−1

βn−1 αn

.

For large sparse matrices, most of whose elements are zeros, it is impractical to compute allof this. As a heuristic the Lanczos method computes only the upper left m by m block ofT , call it Tm,

T =

Tm0

βm

βm

0

. . .. . .

. . .. . .

. . .. . .

. . .

,

and computes its extreme eigenvalues as an approximation to those of T .The algorithm operates by satisfying eq. (4.3) column by column. Taking the first column

of this equation givesAq1 = α1q1 + β1q2.

Together with the condition that q1, q2 be an orthonormal multiset, this constitutes n + 3equations for 2n+2 unknowns. It happens that the factorization (4.3) is not unique, and weare free to choose q1 to be an arbitrary normalized vector. We are left with n + 2 equationsfor n + 2 unknowns. We obtain

q2 = (Aq1 − α1q1)/β1

118

where α1 is obtained by the orthogonality of q1, q2 and β1 by normalization of q2.Taking the kth column of eq. (4.3) gives

qk+1 = (Aqk − αkqk − βk−1qk−1)/βk.

Orthogonality of qk+1 with qj for j ≤ k − 1 is automatic, orthogonality with qk determinesαk, and normalization determines βk.

Review questions

1. What distinguishes the Lanczos method from the Arnoldi method?

2. Which eigenvalues of a matrix does the Lanczos method find?

Exercises

1. Show that if the Lanczos algorithm is carried to completion, the last column of eq. (4.3)can be satisfied.

2. Fill in the blanks for Lanczos method:

q1 = arbitrary normalized vector;p = Aq1;α1 = ;p = p − α1q1;β1 = ;q2 = ;for k = 2, . . . , n − 1 {

p = Aqk;αk = ;p = p − αkqk − βk−1qk−1;βk = ;qk+1 = ;

}αn = ;

4.6 Other Eigenvalue Algorithms

4.6.1 Jacobi

The Jacobi method is applicable to symmetric matrices. The idea is to use a Givens rotationin a similarity transformation to zero out a designated element of the matrix. To zero out

119

the (i, j)th element of a matrix, j < i, we choose a sine s and cosine c so that[c s−s c

] [aii aij

aij ajj

] [c −ss c

]=

[a′

ii 00 a′

jj

].

This rotation is then used to recombine the ith and jth rows and the ith and jth columnsof the full matrix A. For formulas for s and c see the book by Heath.

These transformations are applied in succession to all the off diagonal elements of thematrix. However, the annihilation of one pair of elements will cause previously annihilatedelements to reappear. Nonetheless, the process does converge fairly quickly if it is repeatedagain and again.

4.6.2 Bisection

Let A be a symmetric tridiagonal matrix without any nonzeros in its subdiagonal. (If thereis a zero, then the matrix is block diagonal and the eigenvalues of each block can be foundindependently.)

By constructing what is known as a Sturm sequence, it is possible to count the number ofeigenvalues of A that are less than any given real number σ. This information can be usedby the bisection method to isolate eigenvalues.

The method relies upon the fact that the leading principal submatrices of A have strictlyinterlacing eigenvalues; more specifically, (strictly) between any two successive eigenvaluesof the kth order leading principal submatrix Ak is exactly one eigenvalue of Ak−1.

A Sturm sequence d0, d1, . . . , dn is defined by setting d0 = 1 and dk = det(Ak − σI)for k ≥ 1. These determinants can be calculated efficiently. The eigenvalues of Ak − σI,k = 1, 2, . . . , n are strictly interlacing. We can use this fact together with the fact that dk isthe product of the eigenvalues of Ak − σI to count the number of negative eigenvalues νk ofAk − σI. Set ν0 = 0 and suppose that νk−1 is known. Then the interlacing property impliesthat νk is either νk−1 or νk−1 + 1. If dk−1 and dk are both positive or both negative, thenνk = νk−1. If they are of (strictly) opposite sign, then νk = νk−1 +1. The case of a zero valuein the Sturm sequence is left as an exercise. The result of such a calculation is a count νn ofthe number of negative eigenvalues of A − σI or equivalently the number of eigenvalues ofA which are less than σI.

Exercise

1. Explain in the cases where dk−1 and/or dk is zero whether or not to increment thecount of negative eigenvalues.

4.7 Computing the SVD

Consider the case A ∈ Rm×n, n ≤ m.

120

1. eigenvalues, eigenvectors of ATA, AAT:

(ATA)(V ei) = σ2i (V ei),

(AAT)(Uei) =

{σ2

i (Uei), i ≤ n,0, i ≥ n + 1.

2. Golub–Kahan algorithm

I. UT0 AV0 =

[B0

]B =

b11 b12 0

b22. . .. . . bn−1,n

0 bnn

II. UT1 BV1=Σ1

Therefore

A = U0

[U1

I

] [Σ1

0

]V T

1 V T0 .

Review question

1. Show how to transform the problem of computing a singular value decomposition intoproblems of diagonalizing matrices.

Exercise

1. Let A ∈ Rm×n, n < m. Using the SVD, relate the eigenvalues of AAT to those of ATA.

121

chapter 4 eigenvalue problem - purdue university

Documents