Parallel Numerical Algorithms
Chapter 5 – Eigenvalue Problems
Section 5.2 – Eigenvalue Computation

Michael T. Heath and Edgar Solomonik
Department of Computer Science
University of Illinois at Urbana-Champaign
CS 554 / CSE 512




    Outline

    1 Basics

    2 Power Iteration

    3 QR Iteration

    4 Krylov Methods

    5 Other Methods


    Eigenvalues and Eigenvectors

Given n × n matrix A, find scalar λ and nonzero vector x such that

A x = λ x

λ is eigenvalue and x is corresponding eigenvector

A always has n eigenvalues, but they may be neither real nor distinct

May need to compute only one or few eigenvalues, or all n eigenvalues

    May or may not need corresponding eigenvectors


    Problem Transformations

Shift: for scalar σ, eigenvalues of A − σI are eigenvalues of A shifted by σ: λ_i − σ

Inversion: for nonsingular A, eigenvalues of A^{-1} are reciprocals of eigenvalues of A: 1/λ_i

Powers: for integer k > 0, eigenvalues of A^k are kth powers of eigenvalues of A: λ_i^k

Polynomial: for polynomial p(t), eigenvalues of p(A) are values of p evaluated at eigenvalues of A: p(λ_i)
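These identities are easy to confirm numerically (a sketch using numpy; the symmetric 3 × 3 test matrix, shift, and power are arbitrary choices):

    import numpy as np

    A = np.array([[4.0, 1.0, 0.0],
                  [1.0, 3.0, 1.0],
                  [0.0, 1.0, 2.0]])   # symmetric positive definite test matrix
    sigma, k = 1.5, 3
    lam = np.linalg.eigvalsh(A)       # eigenvalues in ascending order

    # shift: eigenvalues of A - sigma*I are lam - sigma
    assert np.allclose(np.linalg.eigvalsh(A - sigma * np.eye(3)), lam - sigma)
    # inversion: eigenvalues of inv(A) are 1/lam
    assert np.allclose(np.linalg.eigvalsh(np.linalg.inv(A)), np.sort(1.0 / lam))
    # powers: eigenvalues of A^k are lam**k
    assert np.allclose(np.linalg.eigvalsh(np.linalg.matrix_power(A, k)),
                       np.sort(lam ** k))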


    Similarity Transformations

B is similar to A if there is nonsingular T such that

B = T^{-1} A T

Then

B y = λ y  ⇒  T^{-1} A T y = λ y  ⇒  A (T y) = λ (T y)

so A and B have same eigenvalues, and if y is eigenvector of B, then x = T y is eigenvector of A

Similarity transformations preserve eigenvalues, and eigenvectors are easily recovered


    Similarity Transformations

    Forms attainable by similarity transformation

A                      T             B
distinct eigenvalues   nonsingular   diagonal
real symmetric         orthogonal    real diagonal
complex Hermitian      unitary       real diagonal
normal                 unitary       diagonal
arbitrary real         orthogonal    real block triangular (real Schur)
arbitrary              unitary       upper triangular (Schur)
arbitrary              nonsingular   almost diagonal (Jordan)


    Preliminary Reduction

Eigenvalues easier to compute if matrix first reduced to simpler form by similarity transformation

Diagonal or triangular most desirable, but cannot always be reached in finite number of steps

Preliminary reduction usually to tridiagonal form (for symmetric matrix) or Hessenberg form (for nonsymmetric matrix)

Preliminary reduction usually done by orthogonal transformations, using algorithms similar to QR factorization, but transformations must be applied from both sides to maintain similarity


    Parallel Algorithms for Eigenvalues

Algorithms for computing eigenvalues and eigenvectors employ basic operations such as

    vector updates (saxpy)
    inner products
    matrix-vector and matrix-matrix multiplication
    solution of triangular systems
    orthogonal (QR) factorization

In many cases, parallel implementations will be based on parallel algorithms we have already seen for these basic operations, although there will sometimes be new sources of parallelism


    Power Iteration

Simplest method for computing one eigenvalue-eigenvector pair is power iteration, which in effect takes successively higher powers of matrix times initial starting vector

x_0 = arbitrary nonzero vector
for k = 1, 2, . . .
    y_k = A x_{k-1}
    x_k = y_k / ‖y_k‖_∞
end

If A has unique eigenvalue λ_1 of maximum modulus, then power iteration converges to eigenvector corresponding to dominant eigenvalue
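A minimal serial sketch of this iteration in Python/numpy (the Rayleigh-quotient eigenvalue estimate, tolerance, and iteration cap are illustrative choices, not part of the slide's algorithm):

    import numpy as np

    def power_iteration(A, tol=1e-10, max_iter=1000):
        """Power iteration with infinity-norm normalization."""
        rng = np.random.default_rng(0)
        x = rng.standard_normal(A.shape[0])    # arbitrary nonzero starting vector
        lam = 0.0
        for _ in range(max_iter):
            y = A @ x                          # y_k = A x_{k-1}
            lam_new = x @ y / (x @ x)          # Rayleigh-quotient estimate of lambda_1
            x = y / np.linalg.norm(y, np.inf)  # x_k = y_k / ||y_k||_inf
            if abs(lam_new - lam) < tol:
                break
            lam = lam_new
        return lam, x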


    Power Iteration

Convergence rate of power iteration depends on ratio |λ_2/λ_1|, where λ_2 is eigenvalue having second largest modulus

It may be possible to choose shift, A − σI, so that ratio is more favorable and yields more rapid convergence

Shift must then be added to result to obtain eigenvalue of original matrix


    Parallel Power Iteration

Power iteration requires repeated matrix-vector products, which are easily implemented in parallel for dense or sparse matrix, as we have seen

Additional communication may be required for normalization, shifts, convergence test, etc.

Though easily parallelized, power iteration is often too slow to be useful in this form


    Inverse Iteration

Inverse iteration is power iteration applied to A^{-1}, which converges to eigenvector corresponding to dominant eigenvalue of A^{-1}, which is reciprocal of smallest eigenvalue of A

Inverse of A is not computed explicitly; instead A is factored (only once), and the factorization is used to solve a system of linear equations at each iteration

x_0 = arbitrary nonzero vector
for k = 1, 2, . . .
    Solve A y_k = x_{k-1} for y_k
    x_k = y_k / ‖y_k‖_∞
end
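A corresponding sketch using SciPy's LU routines, factoring once and solving one system per iteration (the shift parameter, tolerance, and iteration cap are illustrative assumptions):

    import numpy as np
    from scipy.linalg import lu_factor, lu_solve

    def inverse_iteration(A, sigma=0.0, tol=1e-10, max_iter=100):
        """Inverse iteration with shift sigma: factor A - sigma*I once,
        then solve with the stored factorization at every iteration."""
        n = A.shape[0]
        lu, piv = lu_factor(A - sigma * np.eye(n))  # factorization computed only once
        rng = np.random.default_rng(0)
        x = rng.standard_normal(n)
        x /= np.linalg.norm(x, np.inf)
        for _ in range(max_iter):
            y = lu_solve((lu, piv), x)              # solve (A - sigma I) y_k = x_{k-1}
            x_new = y / np.linalg.norm(y, np.inf)
            converged = min(np.linalg.norm(x_new - x, np.inf),
                            np.linalg.norm(x_new + x, np.inf)) < tol
            x = x_new
            if converged:
                break
        lam = x @ (A @ x) / (x @ x)                 # eigenvalue of original matrix A
        return lam, x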


    Inverse Iteration

    Shifting strategy can greatly accelerate convergence

Inverse iteration is especially useful for computing eigenvector corresponding to approximate eigenvalue, since it converges rapidly when approximate eigenvalue is used as shift

Inverse iteration also useful for computing eigenvalue closest to any given value β, since if β is used as shift, then desired eigenvalue corresponds to smallest eigenvalue of shifted matrix


    Parallel Inverse Iteration

Inverse iteration requires initial factorization of matrix A and solution of triangular systems at each iteration, so it appears to be much less amenable to efficient parallel implementation than power iteration

However, inverse iteration is often used to compute eigenvector in situations where

    approximate eigenvalue is already known, so using it as shift yields very rapid convergence
    matrix has previously been reduced to simpler form (e.g., tridiagonal) for which linear system is easy to solve


    Simultaneous Iteration

To compute many eigenvalue-eigenvector pairs, could apply power iteration to several starting vectors simultaneously, giving simultaneous iteration

X_0 = arbitrary n × q matrix of rank q
for k = 1, 2, . . .
    X_k = A X_{k-1}
end

span(X_k) converges to invariant subspace determined by q largest eigenvalues of A, provided |λ_q| > |λ_{q+1}|

Normalization is needed at each iteration, and columns of X_k become increasingly ill-conditioned basis for span(X_k)


    Orthogonal Iteration

Both issues addressed by computing QR factorization at each iteration, giving orthogonal iteration

X_0 = arbitrary n × q matrix of rank q
for k = 1, 2, . . .
    Compute reduced QR factorization Q̂_k R_k = X_{k-1}
    X_k = A Q̂_k
end

Converges to block triangular form, and blocks are triangular where moduli of consecutive eigenvalues are distinct
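A dense-matrix sketch of orthogonal iteration in numpy (the fixed iteration count stands in for a proper convergence test, and real A is assumed):

    import numpy as np

    def orthogonal_iteration(A, q, num_iter=200):
        """Orthogonal iteration for the invariant subspace belonging to
        the q largest-modulus eigenvalues."""
        rng = np.random.default_rng(0)
        X = rng.standard_normal((A.shape[0], q))  # arbitrary n x q matrix of rank q
        for _ in range(num_iter):
            Q, R = np.linalg.qr(X)                # reduced QR: Q_k R_k = X_{k-1}
            X = A @ Q                             # X_k = A Q_k
        Q, _ = np.linalg.qr(X)
        H = Q.T @ A @ Q                           # projection onto computed subspace
        return np.linalg.eigvals(H), Q            # eigenvalue estimates, orthonormal basis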


    Parallel Orthogonal Iteration

Orthogonal iteration requires matrix-matrix multiplication and QR factorization at each iteration, both of which we know how to implement in parallel with reasonable efficiency


    QR Iteration

If we take X_0 = I, then orthogonal iteration should produce all eigenvalues and eigenvectors of A

Orthogonal iteration can be reorganized to avoid explicit formation and factorization of matrices X_k

Instead, sequence of unitarily similar matrices is generated by computing QR factorization at each iteration and then forming reverse product, giving QR iteration

A_0 = A
for k = 1, 2, . . .
    Compute QR factorization Q_k R_k = A_{k-1}
    A_k = R_k Q_k
end
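A sketch of the unshifted iteration for a real matrix with eigenvalues of distinct moduli (practical codes add the Hessenberg reduction and shifts discussed next; the iteration count is illustrative):

    import numpy as np

    def qr_iteration(A, num_iter=500):
        """Unshifted QR iteration: every A_k is unitarily similar to A."""
        Ak = np.array(A, dtype=float)
        for _ in range(num_iter):
            Q, R = np.linalg.qr(Ak)  # Q_k R_k = A_{k-1}
            Ak = R @ Q               # A_k = R_k Q_k
        return np.diag(Ak)           # approximate eigenvalues on the diagonal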


    QR Iteration

In simple form just given, each iteration of QR method requires Θ(n^3) work

Work per iteration is reduced to Θ(n^2) if matrix is in Hessenberg form, or Θ(n) if symmetric matrix is in tridiagonal form

Preliminary reduction is usually done by Householder or Givens transformations

In addition, number of iterations required is reduced by preliminary reduction of matrix

Convergence rate also enhanced by judicious choice of shifts


    Parallel QR Iteration

Preliminary reduction can be implemented efficiently in parallel, using algorithms analogous to parallel QR factorization for dense matrix

But subsequent QR iteration for reduced matrix is inherently serial, and permits little parallel speedup for this portion of algorithm

This may not be of great concern if iterative phase is relatively small portion of total time, but it does limit efficiency and scalability


    Reduction to Hessenberg/Tridiagonal

To reduce to Hessenberg form, can execute QR-style elimination on each column

If H_i is the Householder transformation used to annihilate ith subcolumn, perform similarity transformation

A^{(i+1)} = H_i A^{(i)} H_i^T

More generally, run QR on b subcolumns of A at a time to reduce to bandwidth b

B^{(i+1)} = Q_i B^{(i)} Q_i^T

To avoid fill, Q_i^T must not operate on the b columns which Q_i reduces

Once reduction to a given bandwidth is complete, subsequent eliminations introduce fill (bulges), but these can still be carried out (see next slide)
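A serial numpy sketch of the b = 1 (Hessenberg) case of this column-by-column scheme (the sign convention and in-place two-sided updates are standard choices, not prescribed by the slide):

    import numpy as np

    def hessenberg_reduce(A):
        """Reduce A to upper Hessenberg form by Householder similarity
        transformations A^(i+1) = H_i A^(i) H_i^T."""
        H = np.array(A, dtype=float)
        n = H.shape[0]
        for i in range(n - 2):
            x = H[i+1:, i].copy()              # i-th subcolumn to annihilate
            alpha = np.linalg.norm(x)
            if alpha == 0.0:
                continue                       # subcolumn already zero
            v = x
            v[0] += np.copysign(alpha, x[0])   # Householder vector, H_i = I - 2 v v^T
            v /= np.linalg.norm(v)
            # two-sided application preserves eigenvalues (similarity)
            H[i+1:, :] -= 2.0 * np.outer(v, v @ H[i+1:, :])   # H_i from the left
            H[:, i+1:] -= 2.0 * np.outer(H[:, i+1:] @ v, v)   # H_i^T from the right
        return H   # upper Hessenberg, orthogonally similar to A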


    Reduction from Banded to Tridiagonal

Two-sided successive orthogonal reductions generate bulges of fill when applied to a banded matrix

Bulges can be chased with pipelined parallelism


    Krylov Subspace Methods

Krylov subspace methods reduce matrix to Hessenberg (or tridiagonal) form using only matrix-vector multiplication

For arbitrary starting vector x_0, if

K_k = [ x_0  A x_0  · · ·  A^{k-1} x_0 ]

then

K_n^{-1} A K_n = C_n

where C_n is upper Hessenberg (in fact, companion matrix)


    Krylov Subspace Methods

To obtain better conditioned basis for span(K_n), compute QR factorization

Q_n R_n = K_n

so that

Q_n^H A Q_n = R_n C_n R_n^{-1} ≡ H

with H upper Hessenberg


    Krylov Subspace Methods

Equating kth columns on each side of equation A Q_n = Q_n H yields recurrence

A q_k = h_{1k} q_1 + · · · + h_{kk} q_k + h_{k+1,k} q_{k+1}

relating q_{k+1} to preceding vectors q_1, . . . , q_k

Premultiplying by q_j^H and using orthonormality,

h_{jk} = q_j^H A q_k,   j = 1, . . . , k

These relationships yield Arnoldi iteration, which produces upper Hessenberg matrix column by column using only matrix-vector multiplication by A and inner products of vectors


    Arnoldi Iteration

x_0 = arbitrary nonzero starting vector
q_1 = x_0 / ‖x_0‖_2
for k = 1, 2, . . .
    u_k = A q_k
    for j = 1 to k
        h_{jk} = q_j^H u_k
        u_k = u_k − h_{jk} q_j
    end
    h_{k+1,k} = ‖u_k‖_2
    if h_{k+1,k} = 0 then stop
    q_{k+1} = u_k / h_{k+1,k}
end
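The same iteration in numpy, for real A (the array storage layout and the Ritz-value comment at the end are illustrative):

    import numpy as np

    def arnoldi(A, x0, m):
        """m steps of Arnoldi: orthonormal columns Q and upper Hessenberg H
        with H[:m, :m] = Q[:, :m]^T A Q[:, :m]."""
        n = A.shape[0]
        Q = np.zeros((n, m + 1))
        H = np.zeros((m + 1, m))
        Q[:, 0] = x0 / np.linalg.norm(x0)
        for k in range(m):
            u = A @ Q[:, k]                        # u_k = A q_k
            for j in range(k + 1):
                H[j, k] = Q[:, j] @ u              # h_jk = q_j^T u_k
                u -= H[j, k] * Q[:, j]
            H[k + 1, k] = np.linalg.norm(u)
            if H[k + 1, k] == 0.0:                 # breakdown: stop early
                return Q[:, :k + 1], H[:k + 1, :k + 1]
            Q[:, k + 1] = u / H[k + 1, k]
        return Q, H

    # Ritz values are eigenvalues of the square part of H:
    # np.linalg.eigvals(H[:m, :m])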


    Arnoldi Iteration

If Q_k = [ q_1  · · ·  q_k ], then

H_k = Q_k^H A Q_k

is upper Hessenberg matrix

Eigenvalues of H_k, called Ritz values, are approximate eigenvalues of A, and Ritz vectors given by Q_k y, where y is eigenvector of H_k, are corresponding approximate eigenvectors of A

Eigenvalues of H_k must be computed by another method, such as QR iteration, but this is easier problem if k ≪ n


    Arnoldi Iteration

Arnoldi iteration expensive in work and storage because each new vector q_k must be orthogonalized against all previous columns of Q_k, which must be stored

So Arnoldi process usually restarted periodically with carefully chosen starting vector

Ritz values and vectors produced are often good approximations to eigenvalues and eigenvectors of A after relatively few iterations

Work and storage costs drop dramatically if matrix symmetric or Hermitian, since recurrence then has only three terms and H_k is tridiagonal (so usually denoted T_k), yielding Lanczos iteration


    Lanczos Iteration

q_0 = 0
β_0 = 0
x_0 = arbitrary nonzero starting vector
q_1 = x_0 / ‖x_0‖_2
for k = 1, 2, . . .
    u_k = A q_k
    α_k = q_k^H u_k
    u_k = u_k − β_{k-1} q_{k-1} − α_k q_k
    β_k = ‖u_k‖_2
    if β_k = 0 then stop
    q_{k+1} = u_k / β_k
end
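A numpy sketch for real symmetric A (no reorthogonalization, so orthogonality of the q_k degrades in finite precision, as discussed below):

    import numpy as np

    def lanczos(A, x0, m):
        """m steps of Lanczos: returns diagonal alpha and subdiagonal beta
        of the symmetric tridiagonal matrix T_m."""
        alpha = np.zeros(m)
        beta = np.zeros(m)
        q_prev = np.zeros(A.shape[0])
        q = x0 / np.linalg.norm(x0)
        for k in range(m):
            u = A @ q                      # u_k = A q_k
            alpha[k] = q @ u               # alpha_k = q_k^T u_k
            if k > 0:
                u -= beta[k - 1] * q_prev  # three-term recurrence
            u -= alpha[k] * q
            beta[k] = np.linalg.norm(u)
            if beta[k] == 0.0:             # invariant subspace found: stop
                return alpha[:k + 1], beta[:k]
            q_prev, q = q, u / beta[k]
        return alpha, beta[:m - 1]

    # T_m = np.diag(alpha) + np.diag(beta, 1) + np.diag(beta, -1)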


    Lanczos Iteration

α_k and β_k are diagonal and subdiagonal entries of symmetric tridiagonal matrix T_k

As with Arnoldi, Lanczos iteration does not produce eigenvalues and eigenvectors directly, but only tridiagonal matrix T_k, whose eigenvalues and eigenvectors must be computed by another method to obtain Ritz values and vectors

If β_k = 0, then algorithm appears to break down, but in that case invariant subspace has already been identified (i.e., Ritz values and vectors are already exact at that point)


    Lanczos Iteration

In principle, if Lanczos algorithm is run until k = n, resulting tridiagonal matrix is orthogonally similar to A

In practice, rounding error causes loss of orthogonality, invalidating this expectation

Problem can be overcome by reorthogonalizing vectors as needed, but expense can be substantial

Alternatively, can ignore problem, in which case algorithm still produces good eigenvalue approximations, but multiple copies of some eigenvalues may be generated


    Krylov Subspace Methods

Virtue of Arnoldi and Lanczos iterations is ability to produce good approximations to extreme eigenvalues for k ≪ n

Moreover, they require only one matrix-vector multiplication by A per step and little auxiliary storage, so are ideally suited to large sparse matrices

If eigenvalues are needed in middle of spectrum, say near σ, then algorithm can be applied to matrix (A − σI)^{-1}, assuming it is practical to solve systems of form (A − σI) x = y
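For example, SciPy's ARPACK interface applies exactly this shift-and-invert strategy (a sketch; the 1-D Laplacian test matrix and target σ = 2.0 are illustrative):

    import numpy as np
    from scipy.sparse import diags
    from scipy.sparse.linalg import eigsh

    # illustrative sparse symmetric matrix: 1-D Laplacian
    n = 1000
    A = diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")

    # five eigenvalues nearest sigma via shift-and-invert Lanczos (ARPACK);
    # a factorization of A - sigma*I is used internally to apply (A - sigma*I)^{-1}
    vals = eigsh(A, k=5, sigma=2.0, return_eigenvectors=False)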


    Parallel Krylov Subspace Methods

Krylov subspace methods composed of

    vector updates (saxpy)
    inner products
    matrix-vector multiplication
    computing eigenvalues/eigenvectors of tridiagonal matrices

Parallel implementation requires implementing each of these in parallel as before

For early iterations, Hessenberg or tridiagonal matrices generated are too small to benefit from parallel implementation, but Ritz values and vectors need not be computed until later


    Jacobi Method

Jacobi method for symmetric matrix starts with A_0 = A and computes sequence

A_{k+1} = J_k^T A_k J_k

where J_k is plane rotation that annihilates symmetric pair of off-diagonal entries in A_k

Plane rotations are applied repeatedly from both sides in systematic sweeps through matrix until magnitudes of all off-diagonal entries are reduced below tolerance

Resulting diagonal matrix is orthogonally similar to original matrix, so diagonal entries are eigenvalues, and eigenvectors given by product of plane rotations
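A serial cyclic-sweep sketch in numpy (the full-matrix rotation updates are for clarity rather than efficiency; tolerance and sweep cap are illustrative):

    import numpy as np

    def jacobi_eigen(A, tol=1e-12, max_sweeps=30):
        """Cyclic Jacobi method for symmetric A: sweep over all (p, q) pairs,
        annihilating A[p, q] and A[q, p] with a plane rotation J_k each time."""
        A = np.array(A, dtype=float)
        n = A.shape[0]
        V = np.eye(n)                      # accumulates product of plane rotations
        for _ in range(max_sweeps):
            if np.linalg.norm(np.tril(A, -1)) < tol:
                break                      # all off-diagonal entries below tolerance
            for p in range(n - 1):
                for q in range(p + 1, n):
                    if A[p, q] == 0.0:
                        continue
                    # angle chosen so that (J^T A J)[p, q] = 0
                    theta = 0.5 * np.arctan2(2.0 * A[p, q], A[q, q] - A[p, p])
                    c, s = np.cos(theta), np.sin(theta)
                    J = np.eye(n)
                    J[p, p] = J[q, q] = c
                    J[p, q], J[q, p] = s, -s
                    A = J.T @ A @ J        # A_{k+1} = J_k^T A_k J_k
                    V = V @ J
        return np.diag(A), V               # eigenvalues, eigenvectors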


    Parallel Jacobi Method

Jacobi method, though slower than QR iteration serially, parallelizes better

Parallel implementation of Jacobi method performs many annihilations simultaneously, at locations chosen so that rotations do not interfere with each other (analogous to parallel Givens QR factorization)

Computations are still rather fine grained, however, so effectiveness is limited on parallel computers with unfavorable ratio of communication to computation speed


    Bisection or Spectrum-Slicing

For real symmetric matrix, can determine how many eigenvalues are less than given real number σ

By systematically choosing various values for σ (slicing spectrum at σ) and monitoring resulting count, any eigenvalue can be isolated as accurately as desired

For example, by computing inertia using LDL^T factorization of A − σI for various σ, individual eigenvalues can be isolated using interval bisection technique


    Sturm Sequence

Another spectrum-slicing method for computing individual eigenvalues is based on Sturm sequence property of symmetric matrices

Let p_r(σ) denote determinant of leading principal minor of A − σI of order r

Zeros of p_r(σ) strictly separate those of p_{r-1}(σ), and number of agreements in sign of successive members of sequence p_r(σ), for r = 1, . . . , n, gives number of eigenvalues of A strictly greater than σ

Determinants p_r(σ) easy to compute if A transformed to tridiagonal form before applying Sturm sequence technique
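A sketch of the resulting count for a tridiagonal matrix with diagonal d and subdiagonal e, using the standard determinant recurrence p_r(σ) = (d_r − σ) p_{r-1}(σ) − e_{r-1}^2 p_{r-2}(σ); the over/underflow scaling and treatment of exact zeros that production codes handle carefully are omitted:

    import numpy as np

    def sturm_count_greater(d, e, sigma):
        """Number of eigenvalues strictly greater than sigma, counted as
        sign agreements between successive Sturm sequence members."""
        count = 0
        p_prev, p = 1.0, d[0] - sigma             # p_0 = 1, p_1 = d_1 - sigma
        if p > 0:
            count += 1                            # p_1 agrees in sign with p_0
        for r in range(1, len(d)):
            p_prev, p = p, (d[r] - sigma) * p - e[r - 1] ** 2 * p_prev
            if (p > 0) == (p_prev > 0):           # successive members agree in sign
                count += 1
        return count

    # bisection on an interval [lo, hi], comparing counts at the midpoint,
    # then isolates an individual eigenvalue to any desired accuracy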


    Parallel Multisection

Spectrum-slicing methods can be implemented in parallel by assigning disjoint intervals of real line to different tasks, and then each task carries out bisection process using serial spectrum-slicing method for its subinterval

Both for efficiency of spectrum-slicing technique and for storage scalability, matrix is first reduced (in parallel) to tridiagonal form, so that each task can store entire (reduced) matrix

Load imbalance is potential problem, since different subintervals may contain different numbers of eigenvalues


    Parallel Multisection

Clustering of eigenvalues cannot be anticipated in advance, so dynamic load balancing may be required to achieve reasonable efficiency

Eigenvectors must still be determined, usually by inverse iteration

Since each task holds entire (tridiagonal) matrix, each can in principle carry out inverse iteration for its eigenvalues without any communication among tasks

Orthogonality of resulting eigenvectors cannot be guaranteed, however, and thus communication is required to orthogonalize them


    Divide-and-Conquer Method

Another method for computing eigenvalues and eigenvectors of real symmetric tridiagonal matrix is based on divide-and-conquer

Express symmetric tridiagonal matrix T as

T = [ T_1  O  ]
    [ O   T_2 ] + β u u^T

Can now compute eigenvalues and eigenvectors of smaller matrices T_1 and T_2

To relate these back to eigenvalues and eigenvectors of original matrix requires solution of secular equation, which can be done reliably and efficiently
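A numpy sketch of the rank-one tearing, verified on a small example (the split point and test matrix are illustrative; u has ones in the two positions adjacent to the tear):

    import numpy as np

    def tear(T, k):
        """Split symmetric tridiagonal T as blockdiag(T1, T2) + beta*u*u^T,
        tearing between rows k-1 and k."""
        n = T.shape[0]
        beta = T[k - 1, k]                 # coupling entry removed from the band
        T1, T2 = T[:k, :k].copy(), T[k:, k:].copy()
        T1[-1, -1] -= beta                 # compensate diagonals for beta*u*u^T
        T2[0, 0] -= beta
        u = np.zeros(n)
        u[k - 1] = u[k] = 1.0
        return T1, T2, beta, u

    d, e = np.array([2.0, 3.0, 4.0, 5.0]), np.ones(3)
    T = np.diag(d) + np.diag(e, 1) + np.diag(e, -1)
    T1, T2, beta, u = tear(T, 2)
    B = np.block([[T1, np.zeros((2, 2))], [np.zeros((2, 2)), T2]])
    assert np.allclose(T, B + beta * np.outer(u, u))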


    Divide-and-Conquer Method

Applying this approach recursively yields divide-and-conquer algorithm that is naturally parallel

Parallelism in solving secular equations grows as parallelism in processing independent tridiagonal matrices shrinks, and vice versa

Algorithm is complicated to implement, and difficult questions of numerical stability, eigenvector orthogonality, and load balancing must be addressed


