aspects of the recursive projection method applied to flow

Aspects of The Recursive Projection Method

applied to Flow Calculations

JOAKIM MÖLLER

Doctoral Thesis

Stockholm, Sweden 2005

TRITA-NA-0444ISSN 0348-2952ISRN KTH/NA/R--04/44--SEISBN 91-7283-940-6

KTH Numerisk analys och datalogiSE-100 44 Stockholm

SWEDEN

Akademisk avhandling som med tillstånd av Kungl Tekniska högskolan framläggestill offentlig granskning för avläggande av teknologie doktorsexamen torsdagen den20 januari 2005 kl. 10.15 i Seminarierum L1, Kungl. Tekniska Högskolan, DrottningKristinas väg 30, Stockholm.

c© Joakim Möller, Januari 2005

Tryck: Universitetsservice US AB

Abstract

In this thesis, we have investigated the Recursive Projection Method,RPM, as an accelerator for computations of both steady and unsteady flows,and as a stabilizer in a bifurcation analysis.

The criterion of basis extraction is discussed. It can be interpreted as atolerance for the accuracy of the eigenspace spanned by the identified basis,alternatively it can be viewed as a criterion when the approximative Krylovsequence becomes numerically rank deficient.

Steady state calculations were performed on two different turbulent test-cases; a 2D supersonic nozzle flow with the Spalart-Allmaras 1-equation modeland a 2D sub-sonic airfoil simulation using the k− ε model. RPM acceleratedthe test-cases with a factor between 2 and 5.

In multi-scale problems, it is often of interest to model the macro-scalebehavior, still retaining the essential features of the full systems. The “coarsetime stepper” is a heuristic approach for circumventing the analytical deriv-ation of models. The system studied here is a linear lattice of non-linearreaction sites coupled by diffusion. After reformulation of the time-evolutionequation as a fixed-point scheme, RPM coupled with arc-length continuationis used to calculate the bifurcation diagrams of the effective (but analyticallyunavailable) equation.

Within the frame-work of dual time-stepping, a common approach in un-steady CFD-simulation, RPM is used to accelerate the convergence. Twotest-cases were investigated; the von Karman vortex-street behind a cylin-der at Re = 100, and the periodic shock oscillation of a symmetric airfoil atM∞ = 0.76 with a Reynolds number Re = 11 × 10

6.It was believed that once a basis had been identified, it could be retained

for several steps. The simulations usually showed that the basis could onlybe retained for one step.

The need for updating the basis motivates the use of Krylov methods.The most common method is the (Block-) Arnoldi algorithm. As the itera-tion proceeds, Krylov methods become increasingly expensive and restart isrequired. Two different restart algorithm were tested. The first is that ofLehoucq and Maschhoff, which uses a shifted QR iteration, the second is ablock extension of the single-vector Arnoldi method due to Stewart. A flexiblehybrid algorithm is derived combining the best features of the two.

iii

Preface

This thesis consists of five papers and a summary of the work presented. The fivepapers are:

Paper I: J. Möller, The Recursive Projection Method Applied to Steady-state CFD Cal-

culations, Technical report, Royal Institute of Technology (KTH), Dept. ofNumerical Analysis and Computer Science, TRITA-NA-0445, 2004.

Paper II: J. Möller, O. Runborg, P. G. Kevrekidis, K. Lust, and I. G. Kevrekidis,Equation-free, effective computation for discrete systems: a time stepper based

approach, accepted in Int. J. Bifurcat. Chaos, 2004.

JM contributed to the implementation and performed the RPM simulations,the analysis in section 3.1., and (to a lesser degree) contributed to the write-up.

Paper III: S. Görtz and J. Möller, Recursive Projection Method for Efficient Unsteady

CFD Simulations, ECCOMAS 2004, 2004.

SG developed the EDGE-part of the code and generated the grids, JM de-veloped the RPM-part. The simulations and the write-up are joint work.

Paper IV: S. Görtz and J. Möller, Evaluation of the Recursive Projection Method for

Efficient Unsteady Turbulent CFD Simulation, ICASE 2004, 2004

The work was divided between the authors like for paper III.

Paper V: J. Möller, New Implementations of the Implicitly Restarted Block Arnoldi

Method., Technical report, Royal Institute of Technology (KTH), Dept. ofNumerical Analysis and Computer Science. TRITA-NA-0446, 2004.

v

Acknowledgments

I would sincerely like to thank Jesper Oppelstrup, my supervisor; for his importanthints and suggestions along the way. Without him, this thesis would not have beenwhat it is today. Axel Ruhe, for inspiring me to investigate Krylov-methods. MihaiDorobantu, for the introduction to RPM and for arranging the visit at UTRC, andIoannis Kevrekidis for the introduction to the Coarse Time Steppers and lettingme come and work with him at Princeton in sep. 2001. Richard Lehoucq, whoalways gracefully has answered my questions over the years and for providing theQR-algorithm used in paper V. I would also like to thank all my co-authors StefanGörtz, Olof Runborg, Kurt Lust, and Panayotis Kevrikidis. A special thank youto Gunilla Kreiss and Michael Hanke, for all the discussions, and the staff of PDCfor all help and support. Finally a big thank you to all my colleagues who makeNADA the pleasant place it is.

Financial support for the research is gratefully acknowledged from NADAPSCI(Parallel and Scientific Computing Institute), KTH, Princeton University, New Jer-sey, and UTRC (United Technology Research Center), Hartford, Connecticut.

vii

Contents

Contents ix

1 Introduction 1

2 The Recursive Projection Method Applied to Steady-state CFD

Calculations; Paper I 5

3 Equation-free, effective computation for discrete Systems: a

time stepper based approach; Paper II 9

4 Recursive Projection Method for Efficient Unsteady CFD Sim-

ulations; Paper III & Paper IV 11

5 New Implementations of the Implicitly Restarted Block Arnoldi

Method; Paper V 15

6 Conclusions 19

7 Outlook and Future Work. 21

Bibliography 23

ix

Chapter 1

Introduction

Many applications require us to compute steady state solutions of partial differ-ential equations (PDEs). The solution of the large non-linear system of algebraicequations that results from the spatial discretization is often obtained by integ-rating the corresponding system of ordinary differential equations (ODEs) in timeuntil the time derivatives become sufficiently small. Assuming that this system hasa steady-state y∗ and a fixed point iteration scheme of the type

yn+1 = F(yn), (1.1)

for the temporal evolution, the ultimate convergence of the iteration is determinedby the dominant eigenvalues of the Jacobian J = Fy evaluated at y∗. If all eigen-values lie strictly within the unit circle, the scheme is asymptotically convergent ina neighborhood of y∗ and the linear asymptotic convergence factor is the modulusof the largest eigenvalue of J.

The Recursive Projection Method (RPM), as presented by Shroff and Keller in1993, is an iterative procedure which can accelerate location of fixed points andstabilize unstable numerical procedures. It only requires the iterates obtained inthe fixed point-iteration. It adaptively seeks to identify the subspace correspondingto large eigenvalues of J, hence directions of slow or unstable time-evolution inphase space. This is accomplished by monitoring the successive residuals ∆yn =F(yn)−yn forming an approximate Krylov subspace. A projection P , constructedfrom the identified subspace, is used to split yn into pn = Pyn and qn = Qyn,where Q = I−P . In the p slow directions spanned by P , the fixed point iterationis replaced by (approximate) Newton iteration, whereas the fixed point iteration isused in the remaining directions. The overall algorithm takes the form

yn+1 = QF(yn) +(PJP − I

)−1

P(F(yn) − yn)

For the method to be efficient, the dimension of the slow/unstable subspaceshould be small, with a gap to the rest of the spectrum. The nature of many

1

transport PDEs encountered in engineering modeling (the action of viscosity, heatconduction, diffusion, and the resulting spectra) do indeed show a separation oftime-scales, which translates into a gap in the spectrum of J at the steady state.

The Jacobian is never formed explicitly since its actions can be approximatedvia finite-differences, making RPM a “matrix-free” method. The cost per step toadvance the algorithm is one function evaluation and the solution of an order p

linear system of equations. RPM still retains the simplicity of the fixed pointiteration, in the sense that no more information is needed than just the iterationfunction. It can be applied as a ’black box’ wrap-around to any code which definesan iteration function F.

The applications of RPM can be divided into two main groups: as a tool forbifurcation analysis, and as a convergence accelerator.

The first group is by far the largest. As long as the Jacobian (J − I) is non-singular, RPM as described above can be applied. At a turning point, the Jacobianbecomes singular. By incorporating pseudo-arclength continuation, as proposed byShroff and Keller [34], the RPM process can proceed past the turning point. Lustet al. [27] extended the idea of RPM to a hierarchy of different related methods byallowing more than one function evaluation per step and used the algorithms forbifurcation analysis of periodic solutions.

Since then, RPM and variants have been applied to a wide variety of problems.The different sections below contain a more complete list of references. In lateryears, much research has been conducted in the field of equation-free homogenized,or averaged, models and their properties. The method known as Coarse time step-

ping was introduced in [39]. Paper II is an early contribution to this fast growingfield of literature.

The other important use is RPM as convergence accelerator for computationalfluid dynamics computations. The applications can be further classified as steady-or unsteady- calculations. For steady flows, there are a few precursors to RPM.Bergman [3], attempted acceleration techniques for the transonic small perturbationequation. He assumed there was one real or a pair of complex conjugated dominanteigenvalues, and devised an extrapolation formula to eliminate the associated error.The formula has large coefficients and, not being a convergent iteration, it can onlybe applied a few times. The success of the scheme was limited.

Rizzi, Eriksson [31], applied a splitting technique for computing steady transonicEulerian flow. An explicit smoothing algorithm first reduces the short wavelengtherrors. Long wavelength components of the error are eliminated by projection of thesolution onto a low-dimensional space where Newton linearization can be employed.The acceleration obtained was quite small.

Dorobantu et al. [11] computed steady quasi-1D inviscid nozzle flows with goodresults. The work was used as a basis for [30], where RPM was successfully usedto accelerate 2D compressible viscous flow calculations. The work in paper I is anextension of these results.

The time-step that can be used in an explicit CFD-scheme is severely restrictedby stability. The interesting physical time scale can be several orders of magnitude

2

larger, which makes implicit time-stepping attractive. On the other hand, implicitschemes require the solution of large systems of non-linear equations in each step,usually by a pseudo-time marching scheme. The dual time-stepping approach dueto Jameson, [13], can be accelerated by RPM. Paper III describes RPM used toaccelerate time-accurate simulation of the vortex shedding in laminar flow around acylinder at Reynolds-number 100. The idea was extended to a 2D Explicit Algebraic

Reynolds Stress Model (EARSM) in paper IV for self-induced shock oscillations overan airfoil at Mach 0.76 and a chord Reynolds number of 11 million.

Since the time-steps are small enough to resolve important time scales, changesin the spectra of the linearized problems from step to step should be small. Onebelieves that the basis for the slow subspace P would be sufficiently accurate for anumber of successive time-steps. Numerical experiments suggest that this numberis very limited. Almost every step needs to either "restart" or apply some basisupdate technique. This motivates the study of matrix-free eigensolvers, e.g. Krylovmethods.

The most well-known Krylov method for computing a few eigenvalues of a non-symmetric matrix is based on the Arnoldi iteration. From a unitary vector v1 itconstructs a Krylov subspace, spanned at step m by the orthogonal basis Vm. TheArnoldi decomposition is

AVm = VmHm + fmeTm. (1.2)

where Hm is an upper Hessenberg matrix, i.e., hij = 0 for i > j + 1. As theiteration proceeds, Vm grows and with it the computational costs and memoryrequirements, necessitating some restarting technique. Sorensen [36] proposed thevery elegant ’Implicit Restart’ which produces a compressed Arnoldi-decompositionbased on the starting vector v+

1where the unwanted directions present in Vm have

been damped. Stewart [37] showed that the equivalent compressed decompositionis obtainable by applications of elementary unitary matrices.

Block Arnoldi is a generalization of (1.2). A block Krylov subspace is construc-ted from b orthogonal starting vectors forming the block-matrix Vb = [v1, . . . , vb].A block Arnoldi decomposition

AVmb = VmbHmb + FmETmb. (1.3)

results where Hmb is a banded upper Hessenberg matrix, i.e., a Hessenberg matrixwith an additional b − 1 sub-diagonals. Block methods handle clustered and/ormultiple eigenvalues more reliably than single vector methods do. Matrix-vectorproducts are replaced by matrix-matrix multiplications, which is favorable for highperformance computer architectures.

Lehoucq and Maschhoff [24] extended the implicit restarting techniques to BlockArnoldi. Paper V extends the Stewart restarting to block methods; for block-sizeb > 1 the two approaches are not equivalent.

The extremal eigenvalues and associated eigenvectors are the first to convergein the Arnoldi iteration. As this happens, the eigenvalues should be deflated, the

3

desired ones locked, and the unwanted ones purged, to speed convergence of otherparts of the spectrum. This procedure is done differently in Lehoucq’s and Stewart’sschemes.

Numerical experiments suggest that the Lehoucq block Arnoldi iteration is muchmore efficient than Stewart’s algorithm. We constructed a hybrid version of thetwo algorithms. Using the Lehoucq shift strategy to gain good convergence, andstrategies similar to Stewart’s to simplify deflation of converged eigenvalues, wearrive at a very flexible new version of the Implicitly Restarted Block Arnoldimethod.

In general, for fixed dimension of the Krylov space, single-vector methods con-verge faster than block-methods, because the matrix-polynomial is of higher degree.This suggests that single vector methods are the favorable tool also when basis up-dates should performed in RPM. Most of the test examples confirm this conjecture,but a few cases were found where block methods were competitive, indicating thatfurther investigation is needed to close the issue.

4

Chapter 2

The Recursive Projection Method

Applied to Steady-state CFD

Calculations; Paper I

In many applications, it is of interest to compute steady state solutions to partialdifferential equations. This can be accomplished by solving the non-linear systemof algebraic equations that results from the spatial discretization, e.g. by Newton’sMethod. For large scale problems this becomes prohibitively expensive, since eachstep requires the formation and “inverting” of a large matrix.

Another approach is to integrate the partial differential equation (PDE) in timeuntil the time derivatives become sufficiently small. The spatial discretization pro-duces a system of ordinary differential equations (ODEs)

dy

dt= f(y), y ∈ R

N . (2.1)

Note that many models, e.g. the incompressible Navier–Stokes equations, do notgive a system of ODEs , but rather a differential-algebraic system. In this case, itis necessary to first eliminate the algebraic constraints and transform the systemto an ODE system. From (2.1) we derive a fixed point iteration scheme of the type

yn+1 = F(yn), (2.2)

Even if initially the associated residual yn − F(yn) drops rapidly, the ultimateconvergence to a fixed point y∗, determined by the most dominant eigenvalue (λ1)of the Jacobian J = Fy, will be slow if the modulus of this eigenvalue is close toone:

‖yn+1 − y∗‖ ≈ |λ1|‖yn − y∗‖

for large n.

5

The Recursive Projection Method can be viewed as a compromise between thetwo approaches discussed above. Via an orthogonal decomposition, obtained by theprojection P , the solution is split in two parts,

yn = Pyn + (I − P)yn = Pyn + Qyn = pn + qn

P = VpVTp

Q = VqVTq = (I −VpV

Tp )

pn is the P projection of yn onto the maximal invariant subspace associated withthe p most dominant eigenvalues of J , where p � N . The space associated with Q

is the orthogonal complement, which is not invariant for non-symmetric Jacobians.Application of these projections to (2.2) splits the algorithm into two parts. The Q-part will now have better convergence properties since the influence of the dominanteigenvalues has been eliminated, but the slow modes are still present in the P-part.To fix this, the P-equation is replaced by an implicit relation,

pn+1 = PF(qn + pn+1). (2.3)

and applying one Newton step to (2.3), we arrive at RPM:

qn+1 = F(yn) −Vp

(VH

q F(yn)),

pn+1 = pn −Vp(VHp JnVp − I)−1VH

p

(F(yn) − yn

)

yn+1 = qn+1 + pn+1

(2.4)

Sufficiently close to the fixed point, the Newton part converges quadratically. Sincep � N , the cost of solving the linear system of equations is small.

The Jacobian Jn occurring in (2.4) is never formed explicitly since it only occursas a matrix-matrix product. The directional derivatives needed can be found byfinite difference approximation. Usually it is sufficient to use a first order scheme,since the error in general is smaller than the error associated with the approximateprojections.

According to Householder’s theorem [33], for any square matrix A, there is avector norm ‖ · ‖ such that ‖A‖ ≤ ρ(A) + ε = ρ̃, where ε > 0 is arbitrarily small.Let A = QJQ. Then, for sufficiently small ‖y0 − y∗‖, the scheme converges,asymptotically with rate

‖yn+1 − y∗‖

‖yk − y∗‖< ρ̃

for sufficiently large n.The basis Vp is constructed as follows. A matrix Kks

with fixed ks is formedfrom the successive q-iterates,

6

Kks=

[∆qn−ks+1 , . . . , ∆qn

],

∆qn = qn+1 − qn

The Kksform an approximate ks-dimensional Krylov space for the projected

Jacobian QJQ. Via a QR-factorization of Kks, a basis is extracted. If the diagonal

elements of R satisfy

∣∣∣∣rj,j

rj+1,j+1

∣∣∣∣ > ka, for j = i1, i2, ..., ir

the first ir columns out of the ks are added to the basis. ka is called the Krylovacceptance ratio. The method periodically checks the ka-criteria and thus decideshow large and when a basis should be formed. For a linear problem, it can be shownthat the obtained basis (in exact arithmetic) is equivalent to the one obtained fromArnoldi decomposition based on the normalized residuals ∆qn−ks+1. Furthermore,1

ka

is an upper bound for the eigenspace residual

‖QJQz − θz‖ ≤ri+1,i

ri,i

,

where the eigenvector z is spanned by the new basis vectors.ka can also be interpreted as a threshold for when Kks

should be considered rankdeficient, hence signaling when an approximate invariant subspace of dimension ir

is present. This motivates the use of QR-factorization with column pivoting tobetter determine the rank. The connection to the eigenspace residual is severed bypivoting, however.

The Newton part of the algorithm requires Vp to be computed with sufficientaccuracy. Assume that the approximative basis Vp satisfies

VpRp = X̃p + E

where X̃p is the matrix consisting of the p first eigenvectors and E is the error. If

‖R−1p ‖‖(EΛ̃p − JE))‖ < min

i|1 − λ̃i|, i = 1, ..., p

then the Newton algorithm behaves well. Λ̃p is a diagonal matrix containing the

first p eigenvalues λ̃i of the Jacobian J, see [30].In CFD applications it is important to take into account the magnitudes of the

different components of the state-vector y = [ρ, ρu, ρv, ρw, ρE]T

where ρ is thedensity, u, v and w are the velocity components and E is the total energy. Thecomponents may differ in magnitude by several decades, depending on the choice ofunits, or scaling, of the variables. If the state-vector is poorly scaled, the projectionsbecome easily perturbed which can affect the performance of RPM. [30] shows howthe robustness of RPM is improved by proper scaling.

7

In many models, some components, e.g. density, pressure, and turbulent kin-etic energy, must stay positive. This can be guaranteed for explicit time-steppingmethods, as long as the time-step is sufficiently small, and the inherent dynamicsof the model preserves positivity. For implicit methods with Newton linearizationsand long time-steps, there is no guarantee. Proper scaling of variables improvesrobustness in this respect also.

RPM was applied as a wrap-around to the CFD-solver NSMB [44]. NSMB wastransformed to a subroutine which, given a state-vector yn, updates boundaryconditions, turbulent source-terms, etc., and returns the state after ns iterations,i.e F(yn). RPM was implemented into MATLAB and linked to NSMB via a MEXinterface.

Two test-cases were investigated: A 2D supersonic nozzle flow with the Spalart-Allmaras turbulence model, and the subsonic flow around a 2D airfoil, using thek− ε turbulence model. RPM substantially accelerated the convergence, consistentwith the simulations in [30]. The extracted basis stayed small, in both cases ofthe size p = 26, compared to the size of the state-vector 40704 (nozzle) and 57344(airfoil), p � N indeed.

Preceeding the original Shroff and Keller RPM paper, the idea of splitting theiteration can be found in a series of papers by Jarausch and Mackens [16], [17],[18]. They refer to the method as a ’Condensed Newton’ method, but restrictthe treatment to symmetric problems. Later Jarausch considered the unsymmetriccase, [15], accomplishing the splitting via oblique projections based on singularsubspaces. As stated earlier, for RPM to be effective, the subspace associated withthe Newton part should be small. Davidson [10] considered a preconditioned RPMrequiring knowledge of the system Jacobian in order to be effective. By allowingseveral function evaluations per iteration, Lust [26] and Lust et al. [27] obtaineda hierarchy of different RPM related methods, known as ’Newton-Picard-Gauss-Seidel’ NPGS.

There are several applications of RPM to CFD as a tool for bifurcation analysis.Keller and von Sosen [20] considered the incompressible Navier–Stokes equations,a differential-algebraic system of higher index than one, thus requiring special care.Tiesinga, [40] used NPGS to analyze the incompressible driven cavity problem, andLove analyzed Kolmogorov and Taylor-Couette vortical flows, [25].

One of the first attempts to use RPM as an accelerator can be found in [7] and[6], where Burrage et al. applied RPM to linear systems of equations. RPM as aaccelerator has also been exploited by Scascighini and Troxler [1] in inverse designfor 2D and axi-symmetric inviscid flows. Campobasso and Giles, [8] and [9] usedRPM to stabilize a linear flow-solver.

The first successful case known to the author of RPM applied as a pure acceler-ator of a non-linear problem is due to Dorobantu et al, [11], who computed steadyquasi-1D inviscid nozzle flows. The experience drawn from their work was used asa basis to accelerate 2D-Compressible Navier–Stokes calculations, [30].

8

Chapter 3

Equation-free, effective computation

for discrete Systems: a time stepper

based approach; Paper II

Many interesting physical systems consist of a lattice of discrete interacting units,such as atoms in a crystal, neurons or other cells in tissue, or electronic functionalunits on a chip. We often like to model their macro-scale behavior by effectivecontinuum evolution equations, which, averaging over the micro-scales, still retainthe essential features of the discrete systems. As an example, the wave equationdescribes acoustics but averages over motions of individual atoms.

Recently, the “coarse time stepper” approach [39] has been developed as analternative to actual derivation of such effective equations. It uses suitably ini-tialized short time integrations of the detailed discrete model and extrapolates tomacro-scale time steps. It is plausible that the degrees of freedom in constructing acoarse time-stepper may be used to better approximate the qualitative features ofthe discrete system than does an effective continuum model produced by analyticalaveraging. However, a continuum effective evolution equation has the advantageof allowing mathematical manipulation and use of a well-developed set of tools forstudies of the model behavior in phase space. The current challenge, addressed ina rapidly growing number of papers, e.g. [39, 32, 21] is to carry out such stud-ies without formulation of an effective continuum model, using only a coarse timestepper and matrix-free analysis techniques.

The system studied in this paper is a linear lattice of non-linear reaction sitescoupled by diffusion over the inter-unit distance ∆x. The lattice supports travelingreaction fronts, with speeds related to ∆x. In particular, if ∆x is larger than acritical value, there are no traveling waves: the motion stops (“front pinning”),corresponding to a bifurcation in phase space. Note that the obvious reaction-diffusion continuum model does not have such a bifurcation, only a wave speedincreasing with diffusional coupling strength. The effective computational process

9

of the coarse time stepper smears the bifurcation into a continuum transition, anacceptable and possibly optimal way to represent the discrete bifurcation.

The paper presents a phase-plane analysis of a coarse time stepper continuummodel for a particular linear reacting lattice. It explains the construction of thecoarse time stepper, and the use of RPM with arc-length continuation to tracethe bifurcation diagram for the so defined effective (but analytically unavailable)equation. Initially, determination of the front propagation speed is reformulated asa fixed-point problem, amenable to RPM. Likewise, the “lifting” operator from acontinuum profile to a set of discrete solutions, taking into account the translationalinvariance of the continuum model, is defined to satisfy the necessary algebraicrelations with the grid sampling operator.

The relation of the type of bifurcation possessed by the discrete model, a Saddle-Node Infinite period, [22], to the smooth transition of the effective model is dis-cussed. In particular, the continuum is operationally defined by averaging a numberof shifted copies of the front profile sampled on the grid, and the effect of the numberof copies is investigated.

As already noted in the introduction, Ch. 1, the dominant application of RPMis in the field of bifurcation and continuation. Here we find the initial paper byShroff & Keller [34], where the method is introduced. The algorithm was furtherdeveloped by Lust [26] and Lust et al., [27], who defined two classes of relatedschemes, the Newton Picard Gauss Seidel (NPGS), and Newton Picard Jacobi(NPJ) schemes. The method is used to compute stable and unstable orbits in Love[25], and Luzyanina, Engelborghs, Lust and Roose, [28], used NPGS to computeperiodic solutions of delay-differential equations. Schwetlick et al. proposed anadaptive block elimination method based on RPM for solving the linear systemsof Newton iteration of non-linear parameter dependent equations. Javanosky andLiberda [14] consider a projected RPM method, applying the RPM-technique dir-ectly to the right hand side of the semi-discrete form u̇ = G(u, λ) of the partialdifferential equation. Restricting the treatment to symmetric operators, they usea Cayley transform to improve convergence of the basis updates. Their methodrequires explicit knowledge of the Jacobian Gu.

Noorden et al. apply variants of the Newton-Picard scheme in [42], [43], and [41]to computational chemistry applications. For RPM applied to bifurcation studiesof CFD-related topics, see Ch. 2.

10

Chapter 4

Recursive Projection Method for

Efficient Unsteady CFD Simulations;

Paper III & Paper IV

Most fluid flows are inherently unsteady and require time-dependent simulations.Unsteady computations are in general very CPU-intensive, so algorithmic savingsof computer time are necessary to pave the way for more wide-spread adoptionof unsteady CFD in design and analysis. Papers III and IV introduce RPM forunsteady flows and demonstrate worth-while savings.

A commonly used approach to solving the time-accurate Navier–Stokes equa-tions is dual time stepping [13], where a residual equation is solved iteratively toreach a steady-state in the inner loop of every physical (outer) time step. Anotherapproach is to solve the residual equation using Newton-Krylov methods, see e.g.[23], [29] and [5].

The starting point for dual time-stepping is the semi-discrete form:

dy

dt+ R (y) = 0 (4.1)

where y is the state vector of conserved variables, and R is the "residual", the fluximbalance of the various conserved quantities in the computational cells. A second-order accurate implicit backward difference formula (BDF) for the time derivativein Eq. (4.1) yields a fully discrete approximation:

3

2∆tyn+1 −

4

2∆tyn +

1

2∆tyn−1 + R

(yn+1

)= 0 (4.2)

where n − 1, n and n + 1 indicate the previous, present and future time level,respectively.

Because of the presence of R(yn+1

), Eq. (4.2) is a non-linear system of coupled

equations, which has to be solved iteratively at each time step. The desirable

11

stability properties of the BDF(2) scheme - it is both A- and L-stable - enablemuch larger time-steps than for explicit methods.

Recently some attention has been given to implicit Runge–Kutta schemes, [4]and [19]. These methods require solution of multiple (or larger) non-linear systems,but enjoy higher temporal accuracy and easier time-step adaption.

The idea behind dual time stepping is to transform Eq. (4.2) into a steady-stateproblem by adding a set of pseudo-time derivatives to the left-hand side,

dw

dτ+

[3w − 4yn + yn−1

2∆t+ R (w)

]= 0 (4.3)

and to advance w in pseudo-time τ by some suitable method until convergence. Onconvergence, the pseudo-temporal terms vanish, and we take

un+1 = limτ→∞

(w(τ))

To reduce the number of inner iterations, Hsu and Jameson [12] proposed animplicit-explicit hybrid scheme using dual time-stepping to correct the solutionobtained from an ADI-formulation of the equations.

In our work, the idea is to reduce the number of inner iterations by applyingRPM to (4.3). For the overall RPM procedure to be efficient, basis identificationmust be fast but still sufficiently accurate for RPM to behave well.

Assume that the solution at time step n has been computed and there is anapproximate basis V for the slow subspace. At the next step n + 1, there is achoice:

1. Restart: Before start, delete the old basis V.

2. Save the basis: Take V to approximate the new basis.

RPM continuously seeks to add vectors to the basis, which grows and even-tually must be either updated (without growth) or discarded (and startedanew).

In the numerical experiments, in most cases the restart procedure was used, seealso Ch. 1. A mix of the two approaches was also tested, with only slight increasein efficiency.

The papers present the method and its implementation into the compressibleflow solver EDGE for unstructured grids of arbitrary elements, jointly developed bythe Swedish Defence Research Agency (FOI) and KTH. RPM sends a vector y intothe flow solver, which returns F(y). A Matlab implementation of RPM was linkedto EDGE through an interface, using the FFA Matlab Toolbox [38], by reading andwriting files, see Fig.(4.1)

12

Figure 4.1: The link between the flow-solver and RPM.

13

In paper III, the implementation is evaluated by computing the periodic self-excited viscous flow over a cylinder for Re = 100. The computations could beaccelerated by a factor of 2.

Paper IV studies a periodic self-excited turbulent flow around an 18% thickcircular-arc airfoil in free flow at M∞ = 0.76, α = 0◦ and Re = 11 × 106. Thedemanding nature of this flow problem was illustrated by Wang [45].

These conditions give a periodic, 180◦ out-of-phase motion of the shocks over theupper and lower surface of the airfoil. The unsteadiness is driven by the interactionbetween the shocks, the boundary layer, and the vortex shedding in the wake.The Reynolds-averaged (RANS) Navier–Stokes equations were coupled with theEARSM turbulence model on a 2D hybrid grid with 12,000 nodes generated byICEM CFD.

RPM cut the compute time by about a factor of 2.5.It was believed that once a basis had been identified it could be retained for

several steps, but this was not the case. The basis should be kept for at most oneor two steps, hence restart was often the preferable strategy. In steps where thebasis indeed could be re-used, RPM accelerated the convergence by as much as 5times. This indicates that basis updating is a worthwhile issue to pursue.

14

Chapter 5

New Implementations of the

Implicitly Restarted Block Arnoldi

Method; Paper V

There is a large variety of applications where large-scale eigenvalue problems ap-pear. Typically one is only interested in computing a small number k of the N

eigenvalues. The problems of interest here are too large for direct diagonalizationmethods, like the QR-scheme. Krylov methods, on the other hand, are well suitedfor this purpose, needing only the action of the matrix on vector, and not the mat-rix itself. Via a starting block, a Krylov method generates an orthonormal basisVmb ∈ CN×mb, VH

mbVmb = Imb which spans the Block Krylov subspace

Kbm(A,V1) ≡ span

{V1,AV1, . . . , Am−1V1

}, (5.1)

where V1 ∈ CN×b, VH

1 V1 = Ib, is an orthonormal matrix. The Block Arnoldidecomposition is given by

AVmb = VmbHmb + FmETmb. (5.2)

where in the case of b = 1, the classical single vector Arnoldi decomposition isobtained. The Arnoldi algorithm yields a banded upper Hessenberg matrix Hmb,which can be viewed as a restriction of A to the Krylov subspace, [2].

With increasing m, storage and computational requirements increase and restartbecomes necessary. For the single vector case [36], later generalized by Lehoucq toBlock Arnoldi [24], Sorensen proposed an implicit restarting of the Arnoldi-method.A shifted QR-iteration on the Hessenberg matrix produces an improved startingblock V+

1 . It is equivalent to an (implicit) application of a polynomial Φ(A) =(A − µ1I) · · · (A − µmb−pI) on the starting matrix, where m = p − r, and r is thenumber of remaining blocks of vectors after restart. When shifts µ are chosen amongthe unwanted eigenvalues of Hmb, these very eigenvalues will be de-emphasized after

15

restart. For b > 1 only a subset of the unwanted eigenvalues can be used as shifts,rendering a whole family of possible restarted Arnoldi decompositions.

Stewart proposed a different restart method for single-vector Arnoldi. Usingthat the Arnoldi decomposition is uniquely determined by the starting vector, heshowed that the same compressed decomposition can be obtained via row-wiseapplications of Householder-transforms. For block-sizes larger than one, the twoapproaches are not equivalent.

When an eigenvalue has converged, it should be "deflated", i.e. tale no furtherpart in the iteration. There are two cases: if it belongs to the wanted set, it shouldbe locked, else it should be purged from the decomposition. Let the number ofeigenvalues to be deflated be l = ll + lp, where ll is the number to be locked,and lp the number to be purged. In the case of Stewart’s algorithm, (5.2) is firsttransformed to Schur form,

AUmb = UmbTmb + FmETmbWmb,

Umb = VmbWmb,(5.3)

Eigenvalues to be locked are moved to the top, whereas the ones to be purged aremoved to the bottom. Since the restart is accomplished by discarding the mb−rb−llcolumns, the lp directions are suppressed from the decomposition. By successiveapplication of row-wise Householder transforms, the decomposition is returned toHessenberg form and the Arnoldi process can be restarted.

We showed how the same deflation procedure can be used together with theLehoucq implicitly restarted Block Arnoldi iteration. Initially the process is thesame as the one described above. First transforming (5.2) to (5.3), the ll con-verged eigenvalues are moved to the top, and the lp to the bottom of the Schurmatrix. Then the decomposition is returned to Hessenberg form via Householdertransformations, giving an Arnoldi decomposition of order mb − lp.

After restart, we wish to have an Arnoldi decomposition of order rb + ll, wherethe active part is rb. If p shifts are to be applied, the Arnoldi decomposition must beextended before it can be restarted, see Fig.(5.1). The decomposition can then berestarted via shifted QR. If there is an odd number of complex eigenvalues amongthe shifts, one extra shift should be applied. In the case r = 1, this correspondsto a "full" restart, that is upon restarting the Arnoldi iteration again we only havethe new starting block V1.

The two different versions of Implicitly Restarted Arnoldi were tested on severalmatrices from the Harwell-Boeing collection. Numerical experiments showed thatrestart based on shifted QR showed better convergence than the restarted BlockArnoldi based on Householder transforms. Results obtained for the shifted QRrestart were similar to those of Lehoucq and Maschhoff in [24].

Measuring efficiency in the number matrix-vector products, almost all exper-iments suggested that the single-vector Arnoldi was more effective than blockArnoldi.

16

l

Hmb−l

ET

mbWll

Vmb−l mF+

l llTU T

l ,mb−ll

Tl ,l

lmb−l,l

l

Vl

l,mb−lHH

H

mF

Figure 5.1: The Arnoldi decomposition with ll locked eigenvectors after extendingthe order mb − lp decomposition with l vectors.

.

17

Chapter 6

Conclusions

The thesis investigates the Recursive Projection method applied to both CFD andCoarse Time Stepper models. It is shown how RPM can accelerate the solutionsof steady-state applications. RPM reduced the compute time by a factor between2 and 5. The importance of scaling the state-vector has been discussed. Scalingreduced the skewness of the projections and improved robustness of RPM.

By use of dual-time stepping, the same techniques could be used in unsteadysimulations successfully. Convergence in each time step was reached 2-3 times fasterthan without RPM. The hope that a basis could be retained for several steps wasthwarted by the results of numerical experiments. The basis could at most be re-used once. In the steps when the basis indeed could be re-used, large savings offunction evaluations were observed. This motivates the further study of differentbasis updating techniques.

We investigated the performance of Block Arnoldi methods, and derived a newflexible implementation of the restart procedure. It was found that in general it ispreferable to use the single vector method, but some applications suggested thatblock methods can compete, so further work is necessary to close the issue.

19

Chapter 7

Outlook and Future Work.

RPM has been demonstrated as an easily applicable technique for stabilization,convergence acceleration, and root following for problems which can be cast asfixed-point iterations. A number of questions have been raised in the course ofthe development reported here. The most important concerns effective methodsfor subspace identification. For unsteady flow calculations, the state changes onlylittle in each time-step, yet the basis identified in one step was, unexpectedly, notoften useful in the next. Various techniques have been tried to update the basis,ranging from formulation and approximate solution of a Riccati matrix equation forthe best basis update,[35], via simplified version with the corresponding Sylvesterequation to more heuristic low-rank update attempts. Since the update, when itis useful, really saves work, further investigation along these lines is indicated, forinstance to construct a method that automatically decides when a basis cab bere-used, when it should be updated, and when RPM should be restarted.

As the next point, the scaling of variables has been shown to be of importance.The scaling should be chosen to make all state vector components be of the sameorder of magnitude. Another goal is to decrease, by scaling, the spectral conditionnumber of the Jacobian to improve the basis sensitivity to perturbations. Thematter needs to be more analyzed in order to understand how the scaling affectsthe overall performance of the algorithm.

There is no guarantee that RPM preserves positivity variables. A positivityscheme that damps the RPM correction to ensure that e.g. k, ε > 0 can be asolution. This scheme must be constructed such that it is consistent with theprojections, is far as the author know such a scheme does not exists.

We have showed how RPM has successfully accelerated both steady and un-steady 1D and 2D CFD-applications in small test-problems. The next step is toapply RPM to a 3D-real world application problem.

The natural choice of updating the basis is the Block Arnoldi method, thoughthere are are other methods such as Jacobi-Davidson and two-sided Lanczos. Nu-merical experiments indicate that block Krylov methods can not compete with

21

single-vector algorithm if efficiency is measured in number of matrix-vector products.On the other hand, the convergence rate of the method depends crucially on thechoice of shifts, and further increase in efficiency is certainly to be expected if abetter theory can be derived. Furthermore, since block methods enables the use ofBLAS-3, measuring the efficiency in wall clock time, the efficiency picture might betotally different when performing parallel eigenvalue computations.

22

Bibliography

[1] A. Troxler A. Scascighini. Inverse design for 2D and axis-symmetric invis-cid flows . Technical report, Seminar for Applied Mathematics, Eidgenösslis-che Technische Hochschule Zürich, 2001. ECMI 2000, 25.10.2000-1.11.2000,Palermo.

[2] Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst editors.Templates for the solution of algebraic eigenvalue problems: a practical guide.SIAM, Philadelphia, 2000.

[3] M. Bergman. Accelerating an iterative process using eigenvalue extrapolation.Masters thesis, Dept. of Aeronautics, KTH, 1986.

[4] H. Bijl, M. Carpenter, and V. N. Vatsa. Time integration schemes for theunsteady Navier-Stokes equations. AIAA 2001-2612, 2001.

[5] H. Bijl and M.H. Carpenter. An assessment of iterative solution techniques forunsteady flow problems. ECCOMAS 2004, 2004.

[6] K. Burrage, J. Erhel, B. Pohl, and A. Williams. A deflation technique forlinear systems of equations. SIAM J. on Sc. Comp., 19(4):1245–1260, 1998.

[7] K. Burrage, A. Williams, j. Erhel, and B. Pohl. The implementation of ageneralized cross validation algorithm using deflation techniques for linear sys-tems . Technical report, Seminar for Applied Mathematics, EidgenösslischeTechnische Hochschule Zürich, 1994. report 94-02.

[8] M.S. Campobasso and M.B. Giles. Stabilization of a linearized Navier-Stokessolver for turbomachinery aeroelasticity. In P. Morgan S. Armfield andK. Srinivasan, editors, Computational Fluid Dynamics 2002, pages 343–348.Springer Verlag, Berlin, Germany, 2003.

[9] M.S. Campobasso and M.B. Giles. Stabilization of a linear flow solver for tur-bomachinery aeroelasticity using Recursive Projection Method. AIAA Journal,42(9):1765–1774, 2004.

[10] B. D. Davidson. Large-scale continuation and numerical bifurcation for partialdifferential equations. SIAM J. Numer. Anal., 34(5):2008–2027, 1997.

23

[11] M. Dorobantu, K. Lust, C. Jacobson, and A. Khibnik. The Recursive Projec-tion Method for acceleration of iterative methods. Technical report, UnitedTechnologies Research Center, 1999.

[12] J. M. Hsu and A. Jameson. An implicit-explicit hybrid scheme for calculatingcomplex unsteady flows. AIAA 2002-0714, 2002.

[13] A. Jameson. Time dependent calculations using multigrid, with application tounsteady flows past airfoils and wings. AIAA Paper 91-1596, 1991.

[14] V. Janovsky and O. Liberda. Continuation of invariant subspaces via theRecursive Projection Method. Appl. Math., 48:241–255, 2003.

[15] H. Jarausch. Analyzing stationary and periodic solutions of systems of para-bolic partial differential equations by using singular subspaces as reduced basis.Technical report, Inst. f. Geometrie u. Prakt. Mathematik, Aachen, Germany,1993. report 92.

[16] H. Jarausch and W. Mackens. Numerical treatment of bifurcation branchesby adaptive condensation. In T. Küpper, H.D. Mittelmann, and H. Weber,editors, Numerical methods for bifurcation problems, volume 70, pages 269–309. Birkhäuser, Basel, 1984.

[17] H. Jarausch and W. Mackens. Computing bifurcation diagrams for large non-linear variational problems. In P. Deufflhard and B. Engquist, editors, Large

scale scientific computing., Progress in Scientific Computing 7. Birkhäuser,Basel, 1987.

[18] H. Jarausch and W. Mackens. Solving large nonlinear systems of equationsby an adaptive condensation process. Numerische Mathematik., 50:633–653,1987.

[19] G. Jothiprasad, D. J. Mavriplis, and D. A. Caughey. Higher-order time in-tegration schemes for the unsteady Navier–Stokes equations on unstructuredmeshes. J. Comp. Phys, 191:542–566, 2003.

[20] B. Keller and H. von Sosen. New methods in cfd: DAE and RPM. In Proceed-

ings of the first asian cfd conference, pages 17–30, Hong Kong, 1995.

[21] I.G. Kevrekidis, C. W. Gear, J. M. Hyman, P. G. Kevrekidis, O. Runborg,and C. Theodoropoulos. Equation-free, coarse-grained multiscale computation:enabling microscopic simulations to perform system-level analysis. Comm.

Math. Sci., 1(4):715–762, 2003.

[22] P. G. Kevrekidis, I. G. Kevrekidis, and A. R. Bishop. Propagation failure,universal scaling and Goldstone modes. Phys. Lett. A, 279:361, 2001.

24

[23] D. A. Knoll and D.E. Keyes. Jacobian-free Newton-Krylov methods. a surveyof approaches and applications. J. Comp. Phys, 193:357–397, 2004.

[24] R. B. Lehoucq and K. J. Maschhoff. Implementation of an implicitly restartedBlock Arnoldi. Technical report, Argonne National Labratory, Argonne, IL,1997. Preprint MCS-P649-0297.

[25] P. Love. Numerical bifurcations in Kolmogorov and Taylor-vortex Flows. Ph.d.thesis, California Institute of Technology, Pasadena, California, 1999.

[26] K. Lust. Numerical bifurcation analysis of periodic solutions of partial differ-

ential equations. Ph.d. thesis, Katholieke Universiteit, Leuven, 1997.

[27] K. Lust and D. Roose. An adaptive Newton-Picard algorithm with subspaceiteration for computing periodic solutions. SIAM J. Sci. Comp., 19(4):1188–1209, 1998.

[28] T. Luzyanina, K. Engelborghs, K. Lust, and D. Roose. Computation, con-tinuation and bifurcation analysis of periodic solutions of delay differentialequations. International Journal of Bifurcation and Chaos, 7(11):2547–2560,1997.

[29] D.J. Mavriplis. An assessment of linear versus non-linear multigrid methodsfor unstructured mesh solvers. J. Comp. Phys, 175:302–335, 2002.

[30] J. Möller. Studies of Recursive Projection Methods for convergence accelera-

tion of steady state calculations. Licentiate thesis, Royal Institute of Techno-logy (KTH), Dept. of Numerical Analysis and Computer Science, Stockholm,Sweden, 2001. TRITA-NA-0119.

[31] A. Rizzi and L.-E Eriksson. Unigrid projection method to solve the Eulerequations for steady state transonic flow. In 8th Int’l Conf. on Num. Methods

in Fluid Dynamics, Lecture Notes in Physics. Springer-verlag, 1982.

[32] O. Runborg, C. Theodoropoulos, and I.G. Kevrikidis. Effective bifurcationanalysis: a time-stepper-based approach. Nonlinearity, 15:491–511, 2002.

[33] D. Serre. Matrices theory and applications. Springer-Verlag, New York, 2002.

[34] G. M. Shroff and H. B. Keller. Stabilization of unstable procedures: TheRecursive Projection Method. SIAM J. Numer. Anal., 30:1099–1120, August1993.

[35] V. Simoncini and M. Sadkane. Arnoldi-Riccati method for large eigenvalueproblems. BIT Numerical Mathematics, 36(3):579–594, 1996.

[36] D. C. Sorensen. Implicit application of polynomial filters in a k-Step Arnoldimethod. SIAM J. Matrix Anal. Appl., 13:357–385, 1992.

25

[37] G. W. Stewart. A Krylov–Schur algorithm for large eigenproblems. SIAM J.

Matrix Anal. Appl., 23(3):601–614, 2001.

[38] Swedish Defence Research Agency (FOI), Aeronautics Div. (FFA). FFA MatlabToolbox. http://www.edge.foi.se/matlab_tools/index.html, 2003. accessed03/2004.

[39] C. Theodoropoulos, Y.-H. Qian, and I. G. Kevrekidis. "Coarse" stability andbifurcation analysis using time-steppers: a reaction-diffusion example. P. Natl.

Acad. Sci, 97(18):9840–9843., 2000.

[40] G. Tiesinga. Multi-level ILU preconditioners and continuation methods in fluid

dynamics. Ph.d. thesis, Rijksuniversiteit, Groningen, 2000.

[41] T. L. van Noorden, S. M. Verduyn Lunel, and A. Bliek. A Broyden Rankp+1 Update Continuation Method with Subspace Iteration. SIAM Journal on

Scientific Computing, 25(6):921–1940, 2004.

[42] T.L. van Noorden. New algorithms for parameter-swing reactors. Ph.d. thesis,Wiskunde en Informaitca, Vrije Universiteit, Amsterdam, 2002.

[43] T.L. van Noorden, S.M.Verduyn Lunel, and A. Bliek. The efficient computationof periodic states of cyclically operated chemical processes. IMA J. Appl.Math,68(2):149–166, 2003.

[44] J. B. Vos, P. Leyland, P. A. Lindberg, V. van Kemenade, C. Gacherieu,N. Duquesne, P. Lötstedt, C. Weber, and A. Ytterström. NSMB handbook4.0. Technical report, Dept. of Aeronautics KTH, 1998.

[45] D. Wang, S. Wallin, M. Berggren, and P. Eliasson. A computational study ofunsteady turbulent buffet aerodynamics. AIAA Paper 2000-2657, 2000.

26

aspects of the recursive projection method applied to flow

Documents