idr --- a brief introductionmhg/talks/idrintro.pdf · 2009. 10. 29. · idr — a brief...

IDR — a brief introduction

Martin H. GutknechtETH Zurich, Seminar for Applied Mathematics

Minisymposium “Induced Dimension Reduction (IDR) methods”SIAM Conf. on Applied Linear Algebra, Monterey, CA, USA, Oct. 27, 2009

Prerequisites History IDR basics Case s=1 Case s>1 Conclusions

Outline

Prerequisites

History

IDR basics

Case s=1

Case s>1

Conclusions

M.H. Gutknecht SIAM-ALA09 p. 2


Prerequisites: Krylov (sub)space solvers

Given: linear system Ax = b, initial approx. x0 ∈ CN .

Construct: approximate solutions (“iterates”) xn andcorresponding residuals rn :≡ b − Axn with

xn ∈ x0 + Kn(A, r0) , rn ∈ r0 + AKn(A, r0)

where r0 :≡ b − Ax0 is the initial residual, and

Kn :≡ Kn(A, r0) :≡ span r0, Ar0, . . . , An−1r0

is the nth Krylov subspace generated by A from r0.

We can, e.g., construct xn such that ‖rn‖ is minimal. Conjugate Residual (CR) method (Stiefel, 1955), GCR and GMRES.



Some further Krylov space solvers are based on otherorthogonal or oblique projections:

Conjugate Gradient (CG) method (Hestenes/Stiefel,1952):

rn ∈ r0 + AKn , rn ⊥ Kn .

Biconjugate Gradient (BICG) method (Lanczos, 1952;Fletcher, 1976):

rn ∈ r0 + AKn , rn ⊥ Kn :≡ Kn(A⋆, r0) .

ML(s)BICG method (M.-C. Yeung and T. F. Chan, 1999):

rsj ∈ r0 + AKsj , rsj ⊥ Kj(A⋆, R0) :≡s∑

i=1

Kj(A⋆, r(i)0 ) .



Prerequisites: residual polynomials

rn ∈ r0 + AKn(A, r0) implies that

∃ρn ∈ Pn , ρn(0) = 1 : rn = ρn(A)r0 .

Means roughly: ‖rn‖ is small if |ρn(t)| is small at theeigenvalues of A.

Means also: “everything” can be formulated in terms ofresidual polynomials.



Special cases:

In CG the residual polynomials are orthogonal polynomials(OPs) w.r.t. a weight function determined by EVals of A(symmetric) and by r0.

In BICG the residual polynomials are formal orthogonalpolynomials (FOPs). Lanczos polynomials.

In (Bi)Conjugate Gradient Squared (CGS),

ρCGSn =

(ρ

BICG

n

)2.

in BICGSTAB, ρBiCGSTAB

n = ρBICG

n Ωn ,

where Ωn(t) :≡ (1 − ω1t) · · · (1 − ωnt). Here, at step n,ωn is chosen to minimize the residual on a straight line.



History of IDR: references

P. WESSELING AND P. SONNEVELD, Numericalexperiments with a multiple grid and a preconditionedLanczos type method, in Approximation Methods forNavier-Stokes Problems, R. Rautmann, ed., Lecture Notesin Mathematics, vol. 771, Springer, 1980, pp. 543–562.Introduce Induced Dimension Reduction (IDR) method,attributed to Sonneveld (“Public. in preparation”), on 31

2pages: a “Lanczos-type method” for nonsymmetric linearsystems which does not require AT.

P. SONNEVELD, CGS, a fast Lanczos-type solver fornonsymmetric linear systems, SIAM J. Sci. Statist.Comput., 10 (1989), pp. 36–52.Received Apr. 24, 1984. Introduces (Bi)CG Squared.



H. A. VAN DER VORST, Bi-CGSTAB: a fast and smoothlyconverging variant of Bi-CG for the solution ofnonsymmetric linear systems, SIAM J. Sci. Statist.Comput., 13 (1992), pp. 631–644.Received May 21, 1990. Presented at HouseholderTylosand. 1st preprint: “CGSTAB: A more smoothlyconverging variant of CG-S”, coauthored by P. Sonneveld.

M.-C. YEUNG AND T. F. CHAN, ML(k)BiCGSTAB: aBiCGSTAB variant based on multiple Lanczos startingvectors, SIAM J. Sci. Comput., 21 (1999), pp. 1263–1290.Received May 16, 1997. Introduce first ML(k)BiCG andthen ML(k)BiCGSTAB — versions of BICG andBICGSTAB, resp., with “multiple left (shadow) residuals”.Fundamental idea. Astonishing numerical results.Some details complicated and not so well explained.



P. SONNEVELD AND M. B. VAN GIJZEN, IDR(s): a family ofsimple and fast algorithms for solving large nonsymmetricsystems of linear equations,Report 07-07, Department of Applied MathematicalAnalysis, Delft University of Technology;SIAM J. Sci. Comput., 31 (2008), pp. 1035–1062.Generalizing IDR ≈ IDR(1) to IDR(s). A fundamentalgeneralization. Very good numerical results.Detailed description of method, connection to BICGSTAB.

G. SLEIJPEN, P. SONNEVELD, AND M. B. VAN GIJZEN,Bi-CGSTAB as an induced dimension reduction method,Report 08-07, Department of Applied MathematicalAnalysis, Delft University of Technology.Partly new view; explores connection to BiCGSTAB andML(k)BiCGSTAB from a partly different point of view.



M. B. VAN GIJZEN AND P. SONNEVELD, An elegant IDR(s)variant that efficiently exploits bi-orthogonality properties,Report 08-21, Department of Applied MathematicalAnalysis, Delft University of Technology.Uses the freedom in the choice of the “intermediate”residuals to come up with a new version of IDR(s) that isslightly more efficient and particularly ingenious.



This talk is based on:

M. H. GUTKNECHT, IDR explained. To appear in ETNA.

Further papers by:

KUNIYOSHI ABE AND GERARD SLEIJPEN,TIJMEN COLLIGNON,YUSUKE ONOUNE AND SEIJI FUJINO,VALERIA SIMONCINI AND DANIEL SZYLD,MASAAKI TANIO AND MASAAKI SUGIHARA MS38, Wes 2:45pm,MAN-CHUNG YEUNG,MHG AND JENS-PETER ZEMKE,...plus many more on applications.



IDR(s) basics: the setting

Given: linear system Ax = b ∈ CN , initial approx. x0. Let:

r0 :≡ b − Ax0 ,

Km :≡ Km(A, r0) :≡ span r0, Ar0, . . . , Am−1r0 ,

ν such that G0 :≡ Kν invariant ,

S ⊂ CN linear subspace of dimension N − s ,

for j = 1, 2, . . . : choose ωj 6= 0 and let

Gj :≡ (I − ωjA)(Gj−1 ∩ S) ,

for n = nj , . . . , nj+1 − 1 :

choose xn such that rn ∈ Gj ∩ (r0 + AKn︸︷︷︸⊂ Kn+1

) .

Note: Typically nj+1 := nj + s + 1 .



IDR(s) basics: the spaces Gj (case s = 1)G0 = R3

G0 ∩ S = S

I − ω1A

G1

G 1∩S

G2 ∩ S = 0

G2

I− ω2A

Gj :≡ (I − ωjA)(Gj−1 ∩ S)



IDR(s) basics: IDR theorem

Recall: Gj :≡ (I − ωjA)(Gj−1 ∩ S) , rn ∈ Gj ∩ (r0 + AKn) .

Genericness assumption: S ∩G0 contains no eigenvector of A .

THEOREM (IDR THEOREM (WES/SON80,SON/VGI07))

Gj ( Gj−1 unless Gj−1 = o .

Consequently: Gj = o for some j ≤ N.

IDR Thm. suggests: rn = o once rn ∈ GN , i.e., j = N .But typically rn ∈ Gj when n = j(s + 1) .However, normally rn = o once n = N .

Hence: IDR Thm. strongly underestimates convergence rate.M.H. Gutknecht SIAM-ALA09 p. 14


IDR(s) basics: what’s different?

Most currently used KSS (= Krylov subspace solvers) arebased on a different kind of “induced dimension reduction”:

rn ∈ L⊥n ∩ (r0 + AKn(A, r0)) ,

where, e.g.,

Ln = Kn(A, r0) (CG) ,

Ln = AKn(A, r0) (CR, GCR, GMRES) ,

Ln = Kn :≡ Kn(A⋆, r0) (BiCG) .

IDR: Gj is not an orthogonal complement of a Krylov subspace.

However, due to form of the recursion for Gj, Gj turns out to bethe image of an orthogonal complement of a Krylov subspace.



IDR(s) basics: recursions for rn

Recall: Gj :≡ (I − ωjA)(Gj−1 ∩ S) .

Wanted: rn+1 ∈ Gj ∩ (r0 + AKn+1) .

=⇒ rn+1 := (I − ωjA) vn , vn ∈ Gj−1 ∩ S ∩ (r0 + AKn) ,

=⇒ vn := rn −

ι(n)∑

i=1

γ(n)i ∆rn−i = rn − ∆Rn cn , (1)

where s ≤ι(n) ≤ n − nj−1 , ( ∆rn−i ∈ Gj−1)

∆rn :≡ rn+1 − rn ,

∆Rn :≡[

∆rn−1 . . . ∆rn−ι(n)

],

cn :≡[

γ(n)1 . . . γ

(n)ι(n)

]T.



Recall (1):

vn := rn −

ι(n)∑

i=1

γ(n)i ∆rn−i = rn − ∆Rn cn ∈ Gj−1 ∩ S .

Since dimS = N − s, there is P ∈ CN×s s.t. S⊥ = R(P) :

vn ∈ S ⇐⇒ vn ⊥ S⊥ = R(P) ⇐⇒ P⋆vn = o .

To achieve this, the term ∆Rn cn in (1) must be the obliqueprojection of rn into R(∆Rn) along S.

In order that this projection is uniquely defined, we needP⋆ ∆Rn to be nonsingular. Then

vn := rn − ∆Rn (P⋆ ∆Rn)−1 P⋆rn︸︷︷︸

≡:cn

= rn − ∆Rn cn . (2)

We need ι(n) = s to make P⋆ ∆Rn a square matrix.



IDR(s) basics: the first two steps (case s = 1)G0 = R

3

r2 = (I− ω1A)v1

r0

r1

v1

G0 ∩ S = S

I− ω1A

v2

r3

G1

G 1∩S



IDR(s) basics: all the residuals (case s = 1)G0 = R

3

r2 = (I− ω1A)v1

r0

r1

v1

G0 ∩ S = S

I− ω1A

v2

r3

G1

v3

G 1∩S

G2 ∩ S = 0 = v5

G2

I− ω2A

v4

r4

r5



IDR(s) basics: choice of ωj

Recall: rn+1 = (I − ωjA) vn (3)

Here, ωj is fixed for n + 1 = nj , . . . , nj+1 − 1.

So, only for n + 1 = nj , we may choose ωj s.t. ‖rn+1‖ is minimalamong all r of the form r = (I − ωjA) vn , i.e., r ⊥ Avn :

ωj :≡〈Avn, vn〉

‖Avn‖2 .



IDR(s) basics: recursions for xn

Note 1: vn ∈ r0 + AKn =⇒ ∃ x′n s.t. vn = b − Ax′

n ,i.e., vn is the residual of an “intermediate” iterate x′

n ∈ x0 + Kn .

Note 2: ∆rn = −A ∆xn, ∆Rn = −A ∆Xn,

Hence:

vn := rn − ∆Rn cn =⇒ x′n := xn − ∆Xn cn , (4)

rn+1 := (I − ωjA) vn =⇒ xn+1 := ωjvn + x′n . (5)

There are several ways to rearrange these four recursions andto combine them with the iterate-residual relationships; see[Sle/Son/vGi08].



The two formulas

rn+1 := (I − ωjA) vn , vn := rn − ∆Rn cn

can be combined into

rn+1 := rn −∆Rn cn − ωjA vn︸︷︷︸=∆rn

. (6)

Along with it:

xn+1 := xn −∆Xn cn + ωjvn︸︷︷︸=∆xn

. (7)

We may also combine the second formula and ∆rn = −A ∆xn.This is the choice in the “prototype algorithm” of [Son/vGi07].

So, there are many ways to implement IDR(s) — and more tocome!



IDR(s) basics: characterization by orthogonality

THEOREM (SON/VGI07, SLE/SON/VGI08)

Let Ω0(t) :≡ 1 , Ωj(t) :≡ (1 − ω1t) · · · (1 − ωj t) ∈ Pj , where

Pj :≡ polyns. of degree ≤ j that are 1 at 0. Then

Gj =

Ωj(A)w∣∣ w ⊥ Kj(A⋆, P)︸︷︷︸

≡: Kj

= Ωj(A)

[Kj(A⋆, P)

]⊥︸︷︷︸

= K⊥j

.

Note: Kj is the left-hand side (LHS) block Krylov space thatappears in the block Lanczos process with LHS block size s.

Note: We may have Lanczos breakdowns and a collapsingblock Krylov space (which requires deflation).



Since rn ∈ Gj ∩ (r0 + AKn) , and since the residual polynomialsmust have full degree, we have for n = nj , . . . , nj+1 − 1 :

rn = Ωj(A) wn , wn ∈ (r0 + AKn−j) ∩ K⊥j , wn 6∈ Kn−j .

(8)Generically, for n = nj = j (s + 1) , where wn ∈ (r0 + AKjs)

and wn ⊥ Kj with dim Kj = js , there is a unique wn satisf. (8).

But the s other vectors wn with nj ≤ n < nj+1 are not uniquelydetermined by (8).



s = 1: IDR(1) ∼ BICGSTAB

Every other set of vectors (wn, rn, vn−1, xn, . . . ) is uniquelydetermined — up to the choice of the parameters ωj .

Normally, the latter are chosen as in BICGSTAB, and thus

r2j = rBiCGSTAB

j , x2j = xBiCGSTAB

j , w2j = rBICG

j ,

where rBICG

j is the j th residual of BICG: rBICG

j = ρj(A)r0 .

Recursions (4) and (5), with γn :≡ γ(n)1 = 〈P, rn〉 / 〈P,∆rn−1〉 :

vn := (1 − γn)rn + γnrn−1 , x′n := (1 − γn)xn + γnxn−1 ,

rn+1 := (I − ωjA) vn , xn+1 := x′n + ωjvn .

(9)



s = 1: polynomial recursions

rn = Ωj(A)wn =

Ωj(A)ρj(A)r0 if n = 2j ,

Ωj(A)ρj+1(A)r0 if n = 2j + 1 ,

vn = Ωj−1(A)wn+1 =

Ωj−1(A)ρj(A)r0 if n = 2j − 1 ,

Ωj−1(A)ρj+1(A)r0 if n = 2j ,

Inserting these formulas into vn = (1 − γn)rn + γnrn−1 we get,after a short calculation, for n = 2j and n = 2j + 1, respectively,

ρj+1(t) := (1 − γ2j) (1 − ωj t) ρj (t) + γ2j ρj(t) ,ρj+1(t) := (1 − γ2j+1) ρj+1(t) + γ2j+1 ρj(t) .

(10)

Recall: w2j = ρj(A)r0 ⊥ Kj and w2j+1 = ρj(A)r0 ⊥ Kj .



BICG uses in its standard version coupled two-term recursions:

rBICG

j+1 := rBICG

j − αjAvBICG

j , vBICG

j+1 := rBICG

j+1 + βjvBICG

j .

The corresp. recursions for ρj and σj are

ρj+1(t)︸︷︷︸⊥Pj

:= ρj(t)︸︷︷︸⊥Pj−1

−αj tσj(t)︸︷︷︸⊥Pj−1

, σj+1(t)︸︷︷︸⊥t Pj

:= ρj+1(t)︸︷︷︸⊥Pj

+βj σj(t)︸︷︷︸⊥t Pj−1

.

In contrast, in IDR(1), by (10),

ρj+1(t)︸︷︷︸⊥Pj−1

:= (1 − γ2j) (1 − ωj t) ρj (t)︸︷︷︸⊥Pj−2

+γ2j ρj(t)︸︷︷︸⊥Pj−2

,

ρj+1(t)︸︷︷︸⊥Pj

:= (1 − γ2j+1) ρj+1(t)︸︷︷︸⊥Pj−1

+γ2j+1 ρj(t)︸︷︷︸⊥Pj−1

.



Comparing the recursions for (ρj , σj) with those for (ρj , ρj)we easily see:

(1 − γ2j+1)(ρj+1(t) − ρj(t)

)= −αj t σj(t) ,

or,

ρj+1(t) = ρj(t) −αj

1 − γ2j+1t σj(t) ,

or,

r2j+1 = r2j −αj

(1 − γ2j+1)A Ωj(A)v

BICG

j︸︷︷︸≡: s

BiCGSTAB

j

.

This formula expresses the odd indexed IDR(1) residuals interms of quantities from BICGSTAB and the coefficient γ2j+1.



ρj+1

ρj

tσj

−αj t σj

ρj+1

−α′j t σj

ρj+1(t) = ρj(t) −

≡: α′j︷︸︸︷

αj

1 − γ2j+1t σj(t) ,

BICG/BICGSTAB: ρj+1(t) := ρj(t) − αj tσj(t) ,

IDR(1): ρj+1(t) := (1 − γ2j+1) ρj+1(t) + γ2j+1 ρj(t) .



s = 1: Comments and conclusions

• IDR(1) can be viewed as a minor variation of BICGSTAB.

• It is not clear, why one or the other should be more stable.

• In fact, the existence of all even indexed IDR(1) residualsrequires (like the BIORES version of BICG) that noLanczos breakdowns and no pivot breakdowns occur.

• The smoothing step breakdown (ωj = 0) is also the sameand can be treated easily by choosing a non-optimal ωj .



s > 1: IDR(s) ∼ ML(s)BICGSTAB

• Relation IDR(1) ∼ BICGSTAB ⇐= BICG is matched byrelation IDR(s) ∼ ML(s)BICGSTAB ⇐= ML(s)BICG.

• If the parameters ωj were chosen the same in IDR(s) andML(s)BICGSTAB every (s + 1)th iterate were the same inboth methods — but normally the parameters are notchosen the same.

• ML(s)BICG and ML(s)BICGSTAB are due to Man-ChungYeung and Tony Chan ’97 /’99SISC. Connection to nonsym.block Lanczos [Aliaga/Boley/Freund/Hernández ’96/’99MC].

• ML(s)BICGSTAB requires a few more inner products andvector updates than IDR(s) and may be less stable.

• IDR(s) is algorithmically simpler and more flexible.



• The fundamental discovery due to Yeung and Chan is that,in the framework of Lanczos-type product methods,multiple left projections can both speed up theconvergence and reduce the MV count (per “degree” of wn,which is what is determined by orthogonality).Sonneveld and van Gijzen rediscovered this 10 years later.

• Yeung/Chan paper is ok, but the derivation of therecursions is complicated and “unusual” due to use of anon-biorthogonal basis for Kj and complex manipulationsfor reducing cost.Amazingly, their Matlab programhttp://www.uwyo.edu/mathmyeung/p12/mlbi gstab.txtis only 187 lines (incl. 30 lines of comments).

• The new paper of Yeung explains the method better andimproves it.



Conclusions

IDR(1) is as good as BICGSTAB.

IDR(s) is as good as ML(s)BICGSTAB.

The IDR(s) recurrence coefficients are simpler to computethan those of ML(s)BICGSTAB.

Typically, IDR(s) and ML(s)BICGSTAB outperform othermethods for a nonsymmetric problem.

The new IDR(s)BiO of van Gijzen and Sonneveld performseven slightly better.

IDR-like generalizations of BiCGStab2 and BiCGStab(ℓ)are even more promising, in particular if A is real, but has(strongly) non-real eigenvalues.



M. H. GUTKNECHT, IDR explained, Research Report 2009-14,Seminar for Applied Mathematics, ETH Zurich, March 2009.

G. L. G. SLEIJPEN, P. SONNEVELD, AND M. B. VAN GIJZEN,Bi-CGSTAB as an induced dimension reduction method, Report08-07, Department of Applied Mathematical Analysis, DelftUniversity of Technology, 2008.

P. SONNEVELD, CGS, a fast Lanczos-type solver fornonsymmetric linear systems, SIAM J. Sci. Statist. Comput., 10(1989), pp. 36–52.

Received Apr. 24, 1984.

P. SONNEVELD AND M. B. VAN GIJZEN, IDR(s): a family ofsimple and fast algorithms for solving large nonsymmetricsystems of linear equations, Report 07-07, Department ofApplied Mathematical Analysis, Delft University of Technology,2007.



P. SONNEVELD AND M. B. VAN GIJZEN, IDR(s): A family ofsimple and fast algorithms for solving large nonsymmetricsystems of linear equations, SIAM Journal on ScientificComputing, 31 (2008), pp. 1035–1062.

Received Mar. 20, 2007.

H. A. VAN DER VORST, Bi-CGSTAB: a fast and smoothlyconverging variant of Bi-CG for the solution of nonsymmetriclinear systems, SIAM J. Sci. Statist. Comput., 13 (1992),pp. 631–644.

Received May 21, 1990.

P. WESSELING AND P. SONNEVELD, Numerical experiments witha multiple grid and a preconditioned Lanczos type method, inApproximation methods for Navier-Stokes problems (Proc.Sympos., Univ. Paderborn, Paderborn, 1979), vol. 771 of LectureNotes in Math., Springer, Berlin, 1980, pp. 543–562.



M.-C. YEUNG AND T. F. CHAN, ML(k)BiCGSTAB: a BiCGSTABvariant based on multiple Lanczos starting vectors, SIAM J. Sci.Comput., 21 (1999), pp. 1263–1290.

Received May 16, 1997, electr. publ. Dec. 15, 1999.


idr --- a brief introductionmhg/talks/idrintro.pdf · 2009. 10. 29. · idr — a brief...

Documents