
HIGHLY SCALABLE PARALLEL DOMAIN DECOMPOSITION METHODS WITH AN APPLICATION TO BIOMECHANICS [PREPRINT]

This is the pre-peer reviewed version of the following article: ZAMM Z. Angew. Math. Mech. 90, No. 1, 5-12 (2010) / http://dx.doi.org/10.1002/zamm.200900329

Axel Klawonn and Oliver Rheinbach
Fakultät für Mathematik, Universität Duisburg-Essen, Campus Essen, Universitätsstr. 2, D-45117 Essen, Germany. URL: http://www.numerik.uni-due.de. Email: {axel.klawonn, oliver.rheinbach}@uni-due.de

Abstract. Highly scalable parallel domain decomposition methods for elliptic partial differential equations are considered with a special emphasis on problems arising in elasticity. The focus of this survey article is on Finite Element Tearing and Interconnecting (FETI) methods, a family of nonoverlapping domain decomposition methods where the continuity between the subdomains, in principle, is enforced by the use of Lagrange multipliers. Exact onelevel and dual-primal FETI methods as well as related inexact dual-primal variants are described and theoretical convergence estimates are presented together with numerical results confirming the parallel scalability properties of these methods. New aspects such as a hybrid onelevel FETI/FETI-DP approach and the behavior of FETI-DP for anisotropic elasticity problems are presented. Parallel and numerical scalability of the methods for more than 65 000 processor cores of the JUGENE supercomputer is shown. An application of a dual-primal FETI method to a nontrivial biomechanical problem from nonlinear elasticity modeling arterial wall stress is given, showing the robustness of our domain decomposition methods for such problems.

1. Introduction. Domain decomposition methods are an efficient approach for the solution of elliptic partial differential equations on parallel computers. Here, we understand by domain decomposition methods preconditioned iterative algorithms for the solution of large linear systems of equations obtained, either directly or by linearization, from the discretization of partial differential equations. In such methods, the domain on which the partial differential equation has to be solved is decomposed into a number of overlapping or nonoverlapping subdomains. In each step of the iterative method and for each subdomain, a local problem is solved. This local problem is often an approximation of the partial differential equation restricted to the subdomain; here we neglect for the moment that the boundary conditions are usually different for the local problem and the problem on the original domain. Depending on the particular domain decomposition method, the local problem is itself solved approximately or exactly, using a direct algorithm, e.g., Gaussian elimination. We will mention below that a domain decomposition method for elliptic problems must additionally have a small global problem in order to exploit efficiently a growing number of processors of a parallel computer, i.e., to be parallel scalable. For an extensive introduction to different domain decomposition methods, we refer to the books by Smith, Bjørstad, and Gropp [50], Toselli and Widlund [52], and Quarteroni and Valli [48]. To implement a domain decomposition method on a parallel computer, each subdomain is associated with a processor core; in general, one processor core can be assigned more than one subdomain. To exploit the capabilities of a parallel computer efficiently, it is important to have a parallel scalable algorithm. There exist two different definitions of parallel scalability, which are sometimes also denoted by weak and strong (parallel) scalability.
An algorithm is strongly scalable if a given problem with a fixed number of unknowns is (ideally) solved twice as fast if twice as many processors are used; it is called weakly scalable if a problem of twice the size is solved in the same time if twice as many processors are used. Formally, weak and strong scalability are often measured using the speedup $S(P_1, P_2) := (T_1 N_2)/(T_2 N_1)$, where $T_1$ is the time that the implementation of an algorithm needs to solve a problem of size $N_1$ using $P_1$ processor cores and $T_2$ is the time that the same implementation needs for a problem of size $N_2$ using $P_2$ processor cores. The theoretically optimal case is given when the speedup equals $P_2/P_1$. In order to obtain a parallel scalable algorithm, it is important to develop numerically scalable algorithms, i.e., algorithms whose convergence is independent of, or only weakly dependent on, the number of subdomains. To develop such an algorithm for elliptic partial differential equations, it is important to solve, in addition to the $N$ local problems associated with the subdomains, a small coarse or global problem. Such a coarse problem guarantees, in each step of the iterative method, a global exchange of information between all subdomains. It usually introduces a certain coupling of the subdomains into the algorithm and thus needs special attention with respect to load balancing when implemented on a parallel computer.
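The speedup measure can be evaluated directly from timings. The following Python sketch is only an illustration: the function names and all timing values are hypothetical and not taken from the article.

```python
def speedup(t1, n1, t2, n2):
    """Speedup S(P1, P2) = (T1 * N2) / (T2 * N1) as defined in the text."""
    return (t1 * n2) / (t2 * n1)


def parallel_efficiency(t1, n1, p1, t2, n2, p2):
    """Ratio of the measured speedup to the optimal value P2 / P1."""
    return speedup(t1, n1, t2, n2) / (p2 / p1)


# Fixed problem size, twice the cores, half the time: optimal speedup of 2.
s_fixed = speedup(t1=100.0, n1=1_000_000, t2=50.0, n2=1_000_000)

# Twice the problem size, twice the cores, same time: again optimal.
s_scaled = speedup(t1=100.0, n1=1_000_000, t2=100.0, n2=2_000_000)

# Hypothetical non-ideal run: doubling the cores only reduces 100 s to 60 s.
eff = parallel_efficiency(100.0, 1_000_000, 64, 60.0, 1_000_000, 128)
```

Both idealized cases give a speedup equal to $P_2/P_1 = 2$; the efficiency of the third, made-up run is the measured speedup divided by that optimal value.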

In the present article, we will focus on different members of the family of Finite Element Tearing and Interconnecting (FETI) algorithms, a class of highly scalable, nonoverlapping domain decomposition methods. In all FETI algorithms considered in this article, the original domain is decomposed into $N$ nonoverlapping subdomains $\Omega_i$, $i = 1, \ldots, N$, and for each subdomain, a local finite element stiffness matrix $K^{(i)}$ and a local load vector $f^{(i)}$ are assembled; the local solution vectors are denoted by $u^{(i)}$. Gathering these local stiffness matrices in a block diagonal matrix $K$ and the local load vectors in a block vector $f$, we obtain a linear system with a block vector of unknowns $u$. We also have to impose continuity conditions on the interface between neighboring subdomains. Here, we will only consider finite element meshes with matching interface nodes; thus we can represent those continuity conditions as $Bu = 0$, where $B$ is a matrix with entries from $\{0, 1, -1\}$. Enforcing these continuity conditions by introducing Lagrange multipliers $\lambda$, our decomposed problem leads to the following linear system

\[
\begin{bmatrix} K & B^T \\ B & O \end{bmatrix}
\begin{bmatrix} u \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f \\ 0 \end{bmatrix}.
\tag{1.1}
\]

If the original finite element problem, i.e., the system obtained from the finite element discretization on the undecomposed domain $\Omega$, is solvable, then the decomposed system (1.1) is also solvable and leads to the same solution. The original idea of FETI domain decomposition methods is to eliminate the variable $u$ and to iterate with the resulting Schur complement system on the Lagrange multipliers $\lambda$. In general, the block diagonal matrix $K$ is not invertible, so this approach cannot be implemented directly. Therefore, certain conditions have to be introduced which, on the one hand, yield a regular linear system and, on the other hand, do not change the original solution. One possibility is to use the kernel vectors of $K$ to ensure that a consistent linear system is obtained from the first set of equations in (1.1); this is implemented using an appropriate projection and also introduces a coarse problem. Then, a suitable pseudoinverse of $K$ can be used to eliminate the variables $u$. This approach leads to the onelevel FETI method originally introduced by Farhat and Roux [10,11,16]. Another possibility is to subassemble the block diagonal matrix $K$ in a sufficient number of (primal) variables $u$, which leads to an invertible matrix $\widetilde K$ and allows us to eliminate the unknowns $u$; it also introduces a coarse problem. This approach leads to the dual-primal FETI method, which was first introduced by Farhat, Lesoinne, Le Tallec, Pierson, and Rixen [12]. It is sometimes important to replace or enhance the coarse problem of the dual-primal FETI method, e.g., in three space dimensions. One possibility is to introduce certain edge or face averages or edge first order moments, either in addition to or instead of the assembly in a selected number of primal variables $u$; see, e.g., Farhat, Lesoinne, and Pierson [13], Klawonn and Widlund [35,37,38], Klawonn, Widlund, and Dryja [39], and Klawonn and Rheinbach [24,26]. Since the elimination of the variables $u$ is carried out by Gaussian elimination, we will also refer to the onelevel and the dual-primal FETI methods as exact FETI methods. For a large number of subdomains and processors, the exact elimination of the variables $u$ can lead to a deterioration of the parallel scalability of the FETI domain decomposition methods due to the nonlinear complexity of direct Gaussian elimination. A remedy is provided by the inexact FETI methods introduced in Klawonn and Widlund [33,34] and Klawonn [21] for the onelevel FETI methods and by Klawonn and Rheinbach [25] for the dual-primal FETI methods.

The remainder of this article is organized as follows. In Section 2, we introduce our model problem of isotropic linear elasticity and its discretization using finite elements. In Section 3, we introduce the onelevel and the dual-primal FETI methods. Additionally, we present a hybrid FETI approach which contains both methods as limit cases. In Section 4, we introduce our different preconditioners for the FETI methods and in Section 5, we discuss the choice of primal constraints for FETI-DP in three dimensions. In Section 6, we collect several condition number estimates for the different FETI methods and also present some new results for irregular subdomains in two dimensions. In Section 7, we extend our FETI-DP method using exact subdomain solvers to inexact FETI-DP methods in order to extend the parallel scalability on massively parallel systems. Finally, in Section 8, we apply our dual-primal FETI method within a nonlinear solver environment to solve a nonlinear elasticity problem arising in the modeling of arterial wall mechanics.

2. Model problem. Throughout this paper, the system of linear elasticity will be our model problem for the presentation of the algorithms and the theory. Other elliptic partial differential equations could be treated as well using the methods provided in this paper. We will then apply the algorithms to a highly nonlinear, anisotropic, and almost incompressible problem from structural mechanics.

The equations of linear elasticity model the displacement of a linear elastic material under the action of external and internal forces. The elastic body occupies a domain $\Omega \subset \mathbb{R}^d$, $d = 2, 3$, which is assumed to be polygonal or polyhedral, respectively. We denote its boundary by $\partial\Omega$ and assume that one part of it, $\partial\Omega_D$, is clamped, i.e., with homogeneous Dirichlet boundary conditions, and that the rest, $\partial\Omega_N := \partial\Omega \setminus \partial\Omega_D$, is subject to a surface force $g$, i.e., a natural boundary condition. We can also introduce a body force $f$, e.g., gravity.

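The appropriate space for a variational formulation is the Sobolev space $H^1_0(\Omega, \partial\Omega_D) := \{v \in (H^1(\Omega))^d : v = 0 \text{ on } \partial\Omega_D\}$. The linear elasticity problem consists in finding the displacement $u \in H^1_0(\Omega, \partial\Omega_D)$ of the elastic body $\Omega$, such that

\[
\int_\Omega G(x)\, \varepsilon(u) : \varepsilon(v)\, dx + \int_\Omega G(x)\beta(x)\, \operatorname{div} u \, \operatorname{div} v \, dx = \langle F, v \rangle \quad \forall v \in H^1_0(\Omega, \partial\Omega_D).
\tag{2.1}
\]

Here $G$ and $\beta$ are material parameters which depend on the Young modulus $E > 0$ and the Poisson ratio $\nu \in (0, 1/2)$; we have $G = E/(1 + \nu)$ and $\beta = \nu/(1 - 2\nu)$. In this article, unless stated otherwise, we only consider the case of compressible linear elasticity, which means that the Poisson ratio $\nu$ is bounded away from $1/2$. Furthermore, $\varepsilon_{ij}(u) := \frac{1}{2}\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right)$ is the linearized strain tensor, and

\[
\varepsilon(u) : \varepsilon(v) = \sum_{i,j=1}^{3} \varepsilon_{ij}(u)\,\varepsilon_{ij}(v), \qquad
\langle F, v \rangle := \int_\Omega f^T v \, dx + \int_{\partial\Omega_N} g^T v \, d\sigma.
\]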
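The pointwise quantities entering (2.1) can be sketched in a few lines of Python/NumPy. This is an illustration only: the function names and the sample displacement gradients are assumptions, and the integration over $\Omega$ (i.e., the finite element quadrature) is deliberately left out.

```python
import numpy as np


def material_parameters(E, nu):
    """Return G = E / (1 + nu) and beta = nu / (1 - 2 nu) as given in the text."""
    if not (E > 0.0 and 0.0 < nu < 0.5):
        raise ValueError("need E > 0 and 0 < nu < 1/2")
    return E / (1.0 + nu), nu / (1.0 - 2.0 * nu)


def strain(grad_u):
    """Linearized strain tensor: symmetric part of the displacement gradient."""
    grad_u = np.asarray(grad_u, dtype=float)
    return 0.5 * (grad_u + grad_u.T)


def energy_density(grad_u, grad_v, E, nu):
    """Integrand of (2.1): G eps(u):eps(v) + G beta div(u) div(v) at one point."""
    grad_u = np.asarray(grad_u, dtype=float)
    grad_v = np.asarray(grad_v, dtype=float)
    G, beta = material_parameters(E, nu)
    contraction = np.sum(strain(grad_u) * strain(grad_v))   # eps(u) : eps(v)
    div_u, div_v = np.trace(grad_u), np.trace(grad_v)       # divergences
    return G * contraction + G * beta * div_u * div_v


# An infinitesimal rigid rotation produces no strain.
rotation_strain = strain([[0.0, -1.0], [1.0, 0.0]])

# Hypothetical displacement gradient and material data (E = 2.6, nu = 0.3).
density = energy_density([[1.0, 2.0], [0.0, 3.0]], [[1.0, 2.0], [0.0, 3.0]], 2.6, 0.3)
```

Note that $\beta$ grows without bound as $\nu \to 1/2$, which is why the text restricts itself to $\nu$ bounded away from $1/2$.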

For convenience, we also introduce the notation

\[
(\varepsilon(u), \varepsilon(v))_{L_2(\Omega)} := \int_\Omega \varepsilon(u) : \varepsilon(v)\, dx.
\]

The bilinear form associated with linear elasticity is then

\[
a(u, v) = (G\,\varepsilon(u), \varepsilon(v))_{L_2(\Omega)} + (G\beta \operatorname{div} u, \operatorname{div} v)_{L_2(\Omega)}.
\]

The wellposedness of the variational problem (2.1) follows immediately from the continuity and ellipticity of the bilinear form $a(\cdot, \cdot)$, where the former follows from elementary inequalities and the latter from Korn's first inequality; see, e.g., [7].

We will only consider compressible elastic materials. It is therefore sufficient to discretize our elliptic problem of linear elasticity (2.1) by low order, conforming finite elements, e.g., linear or trilinear elements. Let us assume that a triangulation $\tau_h$ of $\Omega$ is given which is shape regular and has a typical diameter of $h$. We denote by $W_h := W_h(\Omega)$ the corresponding conforming finite element space of finite element functions. The associated discrete problem is then to find $u_h \in W_h$, such that

\[
a(u_h, v_h) = \langle F, v_h \rangle \quad \forall v_h \in W_h.
\tag{2.2}
\]

When there is no risk of confusion, we will drop the subscript $h$.

In some cases, e.g., for some parallel scalability studies, we will also consider the following scalar, second order model problem: find $u \in H^1_0(\Omega, \partial\Omega_D) := \{v \in H^1(\Omega) : v = 0 \text{ on } \partial\Omega_D\}$, such that

\[
a(u, v) = f(v) \quad \forall v \in H^1_0(\Omega, \partial\Omega_D),
\tag{2.3}
\]

where

\[
a(u, v) := \int_\Omega G\, \nabla u \cdot \nabla v \, dx, \qquad
f(v) := \int_\Omega f v \, dx + \int_{\partial\Omega_N} g_N v \, ds,
\tag{2.4}
\]

and where $g_N$ are the Neumann boundary data defined on $\partial\Omega_N$; together with the volume load $f$, they provide the contributions to the load vector of the finite element problem. The coefficient $G = G(x)$ is positive for $x \in \Omega$. Here, we can again use low order, conforming finite elements as well as elements of higher order such as spectral elements and again obtain a discrete problem as in (2.2).

3. The FETI and FETI-DP saddle point formulations.

3.1. The FETI system. Let a domain $\Omega \subset \mathbb{R}^d$, $d = 2, 3$, be decomposed into $N$ nonoverlapping subdomains $\Omega_i$ of diameter $H$, each of which is the union of finite elements with matching finite element nodes on the boundaries of neighboring subdomains across the interface $\Gamma := \bigcup_{i \neq j} \partial\Omega_i \cap \partial\Omega_j$, where $\partial\Omega_i$, $\partial\Omega_j$ are the boundaries of $\Omega_i$, $\Omega_j$, respectively. The interface $\Gamma$ is the union of edges and vertices (in 2D) and faces, edges, and vertices (in 3D). Here, for simplicity, we regard edges in 2D and faces in 3D as open sets that are shared by two subdomains, edges in 3D as open sets that are shared by more than two subdomains, and vertices, in 2D and 3D, as endpoints of edges; see Fig. 3.1, and, e.g., Toselli and Widlund [52, Chapter 4.2]. For a more detailed definition of faces, edges, and vertices, see Klawonn and Widlund [38, Section 3] and Klawonn and Rheinbach [24,26, Section 2].

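For each subdomain $\Omega_i$, $i = 1, \ldots, N$, we assemble the local stiffness matrices $K^{(i)}$ and local load vectors $f^{(i)}$. We denote the unknowns on each subdomain, e.g., the displacements in the case of elasticity, by $u^{(i)}$, and obtain

\[
K = \begin{bmatrix} K^{(1)} & & 0 \\ & \ddots & \\ 0 & & K^{(N)} \end{bmatrix}, \qquad
u = \begin{bmatrix} u^{(1)} \\ \vdots \\ u^{(N)} \end{bmatrix}, \qquad
f = \begin{bmatrix} f^{(1)} \\ \vdots \\ f^{(N)} \end{bmatrix}.
\]

The original, globally assembled problem on $\Omega$ can be written as

\[
K_g u_g = f_g
\tag{3.1}
\]

using finite element assembly at the interface, i.e.,

\[
K_g = [R^{(1)T}, \ldots, R^{(N)T}]
\begin{bmatrix} K^{(1)} & & 0 \\ & \ddots & \\ 0 & & K^{(N)} \end{bmatrix}
\begin{bmatrix} R^{(1)} \\ \vdots \\ R^{(N)} \end{bmatrix}
= \sum_{i=1}^{N} R^{(i)T} K^{(i)} R^{(i)}
\]

and

\[
f_g = [R^{(1)T}, \ldots, R^{(N)T}]
\begin{bmatrix} f^{(1)} \\ \vdots \\ f^{(N)} \end{bmatrix}
= \sum_{i=1}^{N} R^{(i)T} f^{(i)}.
\]

Here, $R^{(i)T}$ and $R^{(i)}$, $i = 1, \ldots, N$, are the local-to-global prolongation operators and their transposes, respectively.

The discrete problem, with continuous displacements $u = [u^{(1)T}, \ldots, u^{(N)T}]^T$ across the interface, can also be formulated as a minimization problem with the interface continuity constraint $Bu = 0$, where $B = [B^{(1)}, \ldots, B^{(N)}]$ is a matrix with entries from $\{0, 1, -1\}$. The matrix $B$ is also referred to as the jump operator and $Bu = 0$ as the jump condition. Then, introducing Lagrange multipliers $\lambda$ to enforce the jump condition yields: Find $(u, \lambda)$, such that

\[
\begin{bmatrix} K & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} u \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f \\ 0 \end{bmatrix}.
\tag{3.2}
\]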


Solving this problem is equivalent to solving the original problem (3.1). The general idea underlying all FETI domain decomposition methods is to eliminate the variables $u$ in (3.2) and to solve the resulting Schur complement system iteratively with a suitable Krylov space method for the Lagrange multipliers $\lambda$. Since the block diagonal matrix $K$ is in general only positive semidefinite, the elimination process usually cannot be carried out directly. In the first attempt to circumvent this problem, i.e., the onelevel FETI method, here also denoted as FETI-1, the idea is to enforce consistency of the linear system and then to use a pseudoinverse. Here, we will give a brief description of FETI-1; see, e.g., [10,14,15,36,52] for more details.
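The equivalence of (3.1) and (3.2) can be checked on a very small example. The following Python/NumPy sketch is not taken from the article: the one-dimensional model problem $-u'' = 1$ on $(0,1)$ with $u(0) = u(1) = 0$, the decomposition into three subdomains, and all function names are assumptions made for illustration. The middle subdomain is floating, so the block diagonal $K$ is singular, while the saddle point matrix is not.

```python
import numpy as np

h = 1.0 / 6.0  # six linear elements on (0, 1); assumed mesh size


def chain_stiffness(n_nodes):
    """Stiffness matrix of -u'' on a chain of n_nodes nodes (no boundary conditions)."""
    K = np.zeros((n_nodes, n_nodes))
    for e in range(n_nodes - 1):
        K[e:e + 2, e:e + 2] += np.array([[1.0, -1.0], [-1.0, 1.0]]) / h
    return K


def chain_load(n_nodes):
    """Load vector for the body force f = 1 (each element adds h/2 to both nodes)."""
    f = np.zeros(n_nodes)
    for e in range(n_nodes - 1):
        f[e:e + 2] += h / 2.0
    return f


def restriction(global_ids, n_global):
    """Boolean matrix R^(i) that picks the subdomain unknowns out of u_g."""
    R = np.zeros((len(global_ids), n_global))
    R[np.arange(len(global_ids)), global_ids] = 1.0
    return R


# Three subdomains with two elements each; the clamped nodes 0 and 6 are removed.
Kc, fc = chain_stiffness(3), chain_load(3)
K_loc = [Kc[1:, 1:], Kc, Kc[:-1, :-1]]          # K^(1), K^(2) (floating), K^(3)
f_loc = [fc[1:], fc, fc[:-1]]
R_loc = [restriction(ids, 5) for ids in ([0, 1], [1, 2, 3], [3, 4])]

# Assembled problem (3.1): K_g = sum_i R^(i)T K^(i) R^(i).
Kg = sum(Ri.T @ Ki @ Ri for Ri, Ki in zip(R_loc, K_loc))
fg = sum(Ri.T @ fi for Ri, fi in zip(R_loc, f_loc))
ug = np.linalg.solve(Kg, fg)

# Torn problem: block diagonal K, stacked f, and jump operator B.
n = sum(Ki.shape[0] for Ki in K_loc)            # 7 local unknowns
K = np.zeros((n, n))
offset = 0
for Ki in K_loc:
    m = Ki.shape[0]
    K[offset:offset + m, offset:offset + m] = Ki
    offset += m
f = np.concatenate(f_loc)
B = np.zeros((2, n))
B[0, 1], B[0, 2] = 1.0, -1.0                    # node 2: subdomain 1 vs. 2
B[1, 4], B[1, 5] = 1.0, -1.0                    # node 4: subdomain 2 vs. 3

# Saddle point system (3.2), solved directly because the example is tiny.
A = np.block([[K, B.T], [B, np.zeros((2, 2))]])
sol = np.linalg.solve(A, np.concatenate([f, np.zeros(2)]))
u, lam = sol[:n], sol[n:]
```

In this sketch `u` coincides with the assembled solution `ug` copied to the local unknowns, and `B @ u` vanishes. A direct solve of the saddle point system is used only for checking; it is exactly what FETI methods avoid for large problems.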

For each $i = 1, \ldots, N$, let $R^{(i)}$ be a matrix with maximal rank and $\operatorname{range}(R^{(i)}) = \ker(K^{(i)})$, and let

\[
R = \begin{bmatrix} R^{(1)} & & 0 \\ & \ddots & \\ 0 & & R^{(N)} \end{bmatrix}
\]

be the block matrix collecting all the local $R^{(i)}$. If $\ker(K^{(i)}) = \{0\}$, e.g., as a result of essential boundary conditions, then $R^{(i)}$ is empty. We have $\operatorname{range}(R) = \ker(K)$.

Using a pseudoinverse $K^+$, the first block row of (3.2) is equivalent to

\[
u = K^+(f - B^T\lambda) + R\alpha,
\]

where $\alpha$ is a column vector and where an additional condition,

\[
R^T(f - B^T\lambda) = 0,
\]

enforces solvability. Note that the pseudoinverse $K^+$ need not be the Moore-Penrose inverse but can be chosen as a computationally cheaper alternative. Indeed, the computation of $K^+$ can be realized with essentially the same computational cost as a standard Cholesky decomposition. Using the second equation of (3.2), we obtain an equation for the Lagrange multipliers,

\[
-BK^+B^T\lambda = -BK^+f - BR\alpha
\tag{3.3}
\]

or, equivalently,

\[
F\lambda = d + G\alpha,
\tag{3.4}
\]

with the system matrix $F = BK^+B^T$, the right hand side $d = BK^+f$, and $G = BR$.

We can now solve this system using an iterative method like conjugate gradients. A projection $P = I - G(G^TG)^{-1}G^T$ within the conjugate gradient algorithm enforces orthogonality to the kernel and also provides FETI-1 with a coarse problem. Thus, we solve the positive semidefinite system

\[
P^TBK^+B^T\lambda = P^TBK^+f
\]

with $\lambda$ satisfying $G^T\lambda = R^Tf$ by projected conjugate gradients. An admissible initial $\lambda_0$ has to be chosen (for details, refer to Section 3.3), together with an appropriate preconditioner $M^{-1}$; see Section 4. The preconditioned system can then be written as

\[
PM^{-1}P^TF\lambda = PM^{-1}P^Td
\]

with the symmetric system matrix $F$ and the symmetric preconditioner $PM^{-1}P^T$.

As the continuity constraint $Bu = 0$ is enforced only weakly, the global solution $u$ in (3.2) is continuous only at convergence of the algorithm. Generally, the displacement variables associated with the FETI iterates are discontinuous across the interface whereas the normal derivatives of the displacements are continuous throughout the iteration.
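The FETI-1 steps above can be traced on a small torn problem. The following Python/NumPy sketch is an illustration under several assumptions that are not from the article: a one-dimensional problem $-u'' = f$ on $(0,1)$ with $u(0) = u(1) = 0$, three subdomains of two linear elements each, a piecewise constant load ($f = 1, 2, 3$ on the three subdomains), the Moore-Penrose inverse from `np.linalg.pinv` as $K^+$ (chosen only for brevity), no preconditioner ($M = I$), and the admissible start $\lambda_0 = G(G^TG)^{-1}R^Tf$ given in Section 3.3.

```python
import numpy as np

h = 1.0 / 6.0
# Block diagonal K: subdomains 1 and 3 are clamped at one end, subdomain 2 floats.
K = np.zeros((7, 7))
K[0:2, 0:2] = np.array([[2.0, -1.0], [-1.0, 1.0]]) / h
K[2:5, 2:5] = np.array([[1.0, -1.0, 0.0], [-1.0, 2.0, -1.0], [0.0, -1.0, 1.0]]) / h
K[5:7, 5:7] = np.array([[1.0, -1.0], [-1.0, 2.0]]) / h
f = h * np.array([1.0, 0.5, 1.0, 2.0, 1.0, 1.5, 3.0])    # loads 1, 2, 3 per subdomain
B = np.array([[0, 1, -1, 0, 0, 0, 0],
              [0, 0, 0, 0, 1, -1, 0]], dtype=float)      # jump operator
R = np.array([[0, 0, 1, 1, 1, 0, 0]], dtype=float).T     # basis of ker(K)

K_plus = np.linalg.pinv(K)       # assumption: Moore-Penrose inverse for brevity
F = B @ K_plus @ B.T             # dual system matrix
d = B @ K_plus @ f               # dual right hand side
G = B @ R
e = R.T @ f                      # solvability condition: G^T lam = R^T f

GtG_inv = np.linalg.inv(G.T @ G)
P = np.eye(2) - G @ GtG_inv @ G.T        # symmetric here, so P^T = P
lam = G @ GtG_inv @ e                    # admissible initial lambda_0

# Projected conjugate gradients on P^T F lam = P^T d (no preconditioner).
r = P @ (d - F @ lam)
p = r.copy()
for _ in range(len(d)):
    if np.linalg.norm(r) <= 1e-13 * np.linalg.norm(d):
        break
    Fp = P @ (F @ p)
    step = (r @ r) / (p @ Fp)
    lam = lam + step * p
    r_new = r - step * Fp
    p = r_new + ((r_new @ r_new) / (r @ r)) * p
    r = r_new

# Recover the kernel coefficients and the displacements.
alpha = GtG_inv @ G.T @ (F @ lam - d)
u = K_plus @ (f - B.T @ lam) + R @ alpha
```

Every iterate stays in $\lambda_0 + \operatorname{range}(P)$, so the condition $G^T\lambda = R^Tf$ holds throughout; the recovered `u` is continuous across the two interface nodes and agrees with the solution of the assembled problem.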


If material jumps across subdomain boundaries are present, then, in addition to the use of $\rho$-scaling, a different inner product has to be used when defining the projection $P^T$. For an appropriate symmetric scaling matrix $Q$, we then have $P^T = I - G(G^TQG)^{-1}G^TQ$. A possible choice for $Q$ is the Dirichlet preconditioner; see Section 4. Computationally cheaper alternatives exist; for more details, see Section 4 and, e.g., [3], or, for an even cheaper choice, i.e., a diagonal matrix $Q_\delta$, see [22,36] and Section 4. For work on scalar elliptic equations of second order with arbitrary interior coefficient jumps, see [46].
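As a brief check, and assuming that $G^TQG$ is invertible, the $Q$-weighted operator $P^T$ still annihilates $G$ and is still a projection:

```latex
\begin{align*}
P^T G &= G - G (G^T Q G)^{-1} (G^T Q G) = 0, \\
G^T Q P^T &= G^T Q - (G^T Q G)(G^T Q G)^{-1} G^T Q = 0, \\
P^T P^T &= P^T - G (G^T Q G)^{-1} \bigl(G^T Q P^T\bigr) = P^T .
\end{align*}
```

For $Q = I$, the expression reduces to the symmetric projection $I - G(G^TG)^{-1}G^T$.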

3.2. The FETI-DP system. Another approach to cope with the semidefiniteness of $K$ is represented by the dual-primal FETI method introduced in Farhat et al. [12]. The basic idea is to divide the variables $u$ into two groups, the primal and the dual unknowns. Then, continuity is enforced directly in the primal unknowns whereas continuity in the dual unknowns is still treated by Lagrange multipliers. If a sufficient number of primal unknowns is chosen, the variables $u$ can be eliminated without using a pseudoinverse and the Lagrange multipliers can be computed by iteratively solving the resulting Schur complement system. In this approach, the use of a pseudoinverse is thus avoided by introducing constraints that are fulfilled throughout the iteration. We will now describe in more detail how the dual-primal FETI method can be derived from (3.2).

Each nodal vector $u^{(i)}$ can be divided into a set of interior unknowns, $u_I^{(i)}$, associated with nodes in the interior of $\Omega_i$, and interface variables, $u_\Gamma^{(i)}$, associated with nodes on the interface $\Gamma$. On a subset of the interface variables, denoted by $u_\Pi^{(i)}$ or primal variables, we will enforce the continuity of the solution by global subassembly of the subdomain stiffness matrices $K^{(i)}$. For all other interface variables, denoted by $u_\Delta^{(i)}$ or dual displacement variables, we will introduce Lagrange multipliers to enforce continuity as in FETI-1 methods. We denote the variables that are not primal by $u_B^{(i)} = [u_I^{(i)T}, u_\Delta^{(i)T}]^T$ and partition the local stiffness matrices, for $i = 1, \ldots, N$, accordingly, which gives

\[
K^{(i)} = \begin{bmatrix} K_{BB}^{(i)} & K_{\Pi B}^{(i)T} \\ K_{\Pi B}^{(i)} & K_{\Pi\Pi}^{(i)} \end{bmatrix}, \qquad
K_{BB}^{(i)} = \begin{bmatrix} K_{II}^{(i)} & K_{\Delta I}^{(i)T} \\ K_{\Delta I}^{(i)} & K_{\Delta\Delta}^{(i)} \end{bmatrix}, \qquad \text{and} \qquad
K_{\Pi B}^{(i)} = [K_{\Pi I}^{(i)}, K_{\Pi\Delta}^{(i)}].
\]

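Next, we subassemble the primal variables $u_\Pi^{(i)}$, $i = 1, \ldots, N$, as in a standard finite element assembly process. Denoting by $R_\Pi^{(i)T}$ the standard prolongation matrices, which map from the local subdomain variables $u_\Pi^{(i)}$ to the global variables $\tilde u_\Pi$, we obtain

\begin{align}
\widetilde K_{\Pi\Pi} &= \sum_{i=1}^{N} R_\Pi^{(i)T} K_{\Pi\Pi}^{(i)} R_\Pi^{(i)} \tag{3.5} \\
&= [R_\Pi^{(1)T}, \ldots, R_\Pi^{(N)T}]
\begin{bmatrix} K_{\Pi\Pi}^{(1)} & & 0 \\ & \ddots & \\ 0 & & K_{\Pi\Pi}^{(N)} \end{bmatrix}
\begin{bmatrix} R_\Pi^{(1)} \\ \vdots \\ R_\Pi^{(N)} \end{bmatrix}
= R_\Pi^T K_{\Pi\Pi} R_\Pi \tag{3.6}
\end{align}

and

\[
\widetilde K_{\Pi B}^{(i)} = R_\Pi^{(i)T} K_{\Pi B}^{(i)}
\]

for $i = 1, \ldots, N$. Defining additional block matrices

\[
K_{BB} = \begin{bmatrix} K_{BB}^{(1)} & & 0 \\ & \ddots & \\ 0 & & K_{BB}^{(N)} \end{bmatrix}
\qquad \text{and} \qquad
\widetilde K_{\Pi B} = [\widetilde K_{\Pi B}^{(1)}, \ldots, \widetilde K_{\Pi B}^{(N)}],
\]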

we obtain the partially assembled matrix $\widetilde K$ and the corresponding right hand side $\tilde f$, i.e.,

\[
\widetilde K = \begin{bmatrix} K_{BB} & \widetilde K_{\Pi B}^T \\ \widetilde K_{\Pi B} & \widetilde K_{\Pi\Pi} \end{bmatrix}, \qquad
\tilde f = \begin{bmatrix} f_B \\ \tilde f_\Pi \end{bmatrix},
\]

where $f_B = [f_B^{(1)T}, \ldots, f_B^{(N)T}]^T$. We define

\[
u_B = [u_B^{(1)T}, \ldots, u_B^{(N)T}]^T
\]

accordingly.

Choosing a sufficient number of primal variables $u_\Pi^{(i)}$ to constrain the solutions on each subdomain results in a symmetric positive definite matrix $\widetilde K$; see, e.g., [12] or [38]. In this notation, the system matrix of the original, global problem (3.1) can be written as

\[
K_g = R^T K R = R_B^T \widetilde K R_B = R_B^T R_\Pi^T K R_\Pi R_B
\]

with suitable assembly operators $R_B^T$ and $R_\Pi^T$. Assuming a suitable numbering in $K_g$ and in $\widetilde K$, i.e., the variables $u_\Pi$ are numbered last in $K_g$ and the variables $\tilde u_\Pi$ are numbered last in $\widetilde K$, we can write

\[
R_B^T = \begin{bmatrix} R_B^{(1)T} & \cdots & R_B^{(N)T} & 0 \\ 0 & \cdots & 0 & I_\Pi \end{bmatrix}
\qquad \text{and} \qquad
R_\Pi^T = \begin{bmatrix} I_B & 0 & \cdots & 0 \\ 0 & R_\Pi^{(1)T} & \cdots & R_\Pi^{(N)T} \end{bmatrix},
\]

where $I_\Pi$ and $I_B$ are identity matrices.

In FETI-DP methods we will use the partially assembled matrix $\widetilde K = R_\Pi^T K R_\Pi$ and introduce a jump operator $B$ with entries $0$, $-1$, or $1$ and Lagrange multipliers $\lambda$ to enforce continuity on the remaining interface variables $u_\Delta^{(i)}$. For convenience, we always use the full set of (redundant) Lagrange multipliers wherever more than two subdomains share a single node.

We introduce the notation

\[
\tilde u = [u_B^T, \tilde u_\Pi^T]^T.
\]

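Now, we can formulate the FETI-DP saddle point problem,

\[
\begin{bmatrix} \widetilde K & B^T \\ B & 0 \end{bmatrix}
\begin{bmatrix} \tilde u \\ \lambda \end{bmatrix}
=
\begin{bmatrix} \tilde f \\ 0 \end{bmatrix}, \qquad \tilde u \in \mathbb{R}^n, \ \lambda \in \mathbb{R}^m,
\tag{3.7}
\]

from which the solution of the original finite element problem (2.2) can be obtained by identifying the solution $\tilde u$ in the interface variables. We will also use the notation

\[
\mathcal{A} x = \mathcal{F},
\]

where

\[
\mathcal{A} := \begin{bmatrix} \widetilde K & B^T \\ B & 0 \end{bmatrix}, \qquad
x := \begin{bmatrix} \tilde u \\ \lambda \end{bmatrix}, \qquad
\mathcal{F} := \begin{bmatrix} \tilde f \\ 0 \end{bmatrix}.
\]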

So far, we have not discussed how to choose the decomposition of the local interface vectors $u_\Gamma^{(i)}$ into primal variables $u_\Pi^{(i)}$ and remaining (dual) interface vectors $u_\Delta^{(i)}$. One immediate possibility is to choose selected vertices (or corners) of $\Omega_i$ as nodes where the variables $u_\Gamma^{(i)}$ are assembled; cf. Farhat et al. [12] or Mandel and Tezaur [45]. In combination with the Dirichlet preconditioner, this yields good convergence bounds of the order of $(1 + \log(H/h))^2$ in two dimensions, see Mandel and Tezaur [45] and Klawonn, Rheinbach, and Widlund [32], but not in three dimensions, see Klawonn, Widlund, and Dryja [39]; see also Section 6. Numerical evidence for this deterioration in three dimensions was first given in Farhat, Lesoinne, and Pierson [13]; cf. also Klawonn, Rheinbach, and Widlund [30, Table 5, Figure 2]. In Section 5, we will discuss how to choose the primal variables in order to obtain scalable algorithms in three dimensions.


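With the notation introduced before, we can rewrite (3.7) as

\[
\begin{bmatrix}
K_{BB} & \widetilde K_{\Pi B}^T & B_B^T \\
\widetilde K_{\Pi B} & \widetilde K_{\Pi\Pi} & 0 \\
B_B & 0 & 0
\end{bmatrix}
\begin{bmatrix} u_B \\ \tilde u_\Pi \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f_B \\ \tilde f_\Pi \\ 0 \end{bmatrix}.
\tag{3.8}
\]

Eliminating $u_B$ by one step of block Gaussian elimination, we obtain the reduced system

\[
\begin{bmatrix}
\widetilde S_{\Pi\Pi} & -\widetilde K_{\Pi B} K_{BB}^{-1} B_B^T \\
-B_B K_{BB}^{-1} \widetilde K_{\Pi B}^T & -B_B K_{BB}^{-1} B_B^T
\end{bmatrix}
\begin{bmatrix} \tilde u_\Pi \\ \lambda \end{bmatrix}
=
\begin{bmatrix}
\tilde f_\Pi - \widetilde K_{\Pi B} K_{BB}^{-1} f_B \\
-B_B K_{BB}^{-1} f_B
\end{bmatrix},
\tag{3.9}
\]

where $\widetilde S_{\Pi\Pi} = \widetilde K_{\Pi\Pi} - \widetilde K_{\Pi B} K_{BB}^{-1} \widetilde K_{\Pi B}^T$. Here, we will also use the notation

\[
\mathcal{A}_r x_r = \mathcal{F}_r,
\]

where

\[
\mathcal{A}_r := \begin{bmatrix}
\widetilde S_{\Pi\Pi} & -\widetilde K_{\Pi B} K_{BB}^{-1} B_B^T \\
-B_B K_{BB}^{-1} \widetilde K_{\Pi B}^T & -B_B K_{BB}^{-1} B_B^T
\end{bmatrix}, \qquad
x_r := \begin{bmatrix} \tilde u_\Pi \\ \lambda \end{bmatrix}, \qquad
\mathcal{F}_r := \begin{bmatrix}
\tilde f_\Pi - \widetilde K_{\Pi B} K_{BB}^{-1} f_B \\
-B_B K_{BB}^{-1} f_B
\end{bmatrix}.
\]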

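By also eliminating the primal variables $\tilde u_\Pi$, we obtain the reduced system

\[
F\lambda = d,
\tag{3.10}
\]

where

\begin{align*}
F &:= B_B K_{BB}^{-1} B_B^T + B_B K_{BB}^{-1} \widetilde K_{\Pi B}^T \widetilde S_{\Pi\Pi}^{-1} \widetilde K_{\Pi B} K_{BB}^{-1} B_B^T = B \widetilde K^{-1} B^T, \\
d &:= B_B K_{BB}^{-1} f_B - B_B K_{BB}^{-1} \widetilde K_{\Pi B}^T \widetilde S_{\Pi\Pi}^{-1} \bigl(\tilde f_\Pi - \widetilde K_{\Pi B} K_{BB}^{-1} f_B\bigr) = B \widetilde K^{-1} \tilde f.
\end{align*}

The linear system (3.10) is the standard, exact FETI-DP system, which is solved using preconditioned conjugate gradients and an appropriate preconditioner $M^{-1}$; cf. Section 7. In FETI-DP we do not need a projection and we can choose $\lambda_0 = 0$ as an initial solution for the cg iteration.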
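The two elimination steps leading from (3.8) to (3.10) are plain block Gaussian elimination and can be checked numerically. The sketch below is purely algebraic and makes strong simplifying assumptions: a random symmetric positive definite matrix stands in for the partially assembled $\widetilde K$, the jump operator is a simple full-rank pattern of $\pm 1$ entries, and the block sizes are arbitrary. None of this data comes from a finite element discretization.

```python
import numpy as np

rng = np.random.default_rng(0)
n_B, n_Pi, n_lam = 8, 3, 4            # hypothetical block sizes

# Stand-in for the partially assembled matrix: any SPD matrix with this layout.
M = rng.standard_normal((n_B + n_Pi, n_B + n_Pi))
K_tilde = M @ M.T + (n_B + n_Pi) * np.eye(n_B + n_Pi)
K_BB = K_tilde[:n_B, :n_B]
K_PiB = K_tilde[n_B:, :n_B]
K_PiPi = K_tilde[n_B:, n_B:]
f_B = rng.standard_normal(n_B)
f_Pi = rng.standard_normal(n_Pi)

# Stand-in jump operator acting only on the non-primal variables: B = [B_B, 0].
B_B = np.zeros((n_lam, n_B))
for k in range(n_lam):
    B_B[k, 2 * k], B_B[k, 2 * k + 1] = 1.0, -1.0
B = np.hstack([B_B, np.zeros((n_lam, n_Pi))])

# Step 1: eliminate u_B (solves with K_BB only), giving the blocks of (3.9).
X = np.linalg.solve(K_BB, np.column_stack([B_B.T, K_PiB.T, f_B]))
Kinv_Bt, Kinv_Kt, Kinv_f = X[:, :n_lam], X[:, n_lam:n_lam + n_Pi], X[:, -1]
S = K_PiPi - K_PiB @ Kinv_Kt          # primal Schur complement (coarse problem)

# Step 2: eliminate the primal variables, giving F and d of (3.10).
F = B_B @ Kinv_Bt + B_B @ Kinv_Kt @ np.linalg.solve(S, K_PiB @ Kinv_Bt)
d = B_B @ Kinv_f - B_B @ Kinv_Kt @ np.linalg.solve(S, f_Pi - K_PiB @ Kinv_f)

# Reference values computed from the compact formulas and from (3.7).
F_ref = B @ np.linalg.solve(K_tilde, B.T)
d_ref = B @ np.linalg.solve(K_tilde, np.concatenate([f_B, f_Pi]))
lam = np.linalg.solve(F, d)
A = np.block([[K_tilde, B.T], [B, np.zeros((n_lam, n_lam))]])
sol = np.linalg.solve(A, np.concatenate([f_B, f_Pi, np.zeros(n_lam)]))
```

In an actual FETI-DP implementation, $F$ is never formed; only its action on a vector is needed, which requires independent subdomain solves with the blocks of $K_{BB}$ and one solve with the small coarse matrix $\widetilde S_{\Pi\Pi}$.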

3.3. A Hybrid FETI System. In onelevel FETI methods the coarse problem or global coupling is introduced by a projection whereas in dual-primal FETI methods this is achieved by subassembling a selected number of primal variables. These approaches can be combined to create a hybrid FETI-1/FETI-DP method [28]. In order to do so, some subdomains are aggregated into subclusters and the related subdomain stiffness matrices are partially assembled, within each subcluster, in the primal variables. The resulting, partially assembled subcluster stiffness matrices themselves are not subassembled. Let $N_C$ denote the number of subclusters. To keep the presentation simple, let us assume that the number of subdomains in each subcluster is $N_S$ for all subclusters.

The problem then takes the form

\[
\begin{bmatrix}
K^{(1)} & & 0 & B^{(1)T} \\
 & \ddots & & \vdots \\
0 & & K^{(N_C)} & B^{(N_C)T} \\
B^{(1)} & \cdots & B^{(N_C)} & 0
\end{bmatrix}
\begin{bmatrix} u^{(1)} \\ \vdots \\ u^{(N_C)} \\ \lambda \end{bmatrix}
=
\begin{bmatrix} f^{(1)} \\ \vdots \\ f^{(N_C)} \\ 0 \end{bmatrix},
\]

where

\[
K^{(i)} =
\begin{bmatrix}
K_{BB}^{(1)} & & 0 & K_{\Pi B}^{(1)T} \\
 & \ddots & & \vdots \\
0 & & K_{BB}^{(N_S)} & K_{\Pi B}^{(N_S)T} \\
K_{\Pi B}^{(1)} & \cdots & K_{\Pi B}^{(N_S)} & K_{\Pi\Pi}
\end{bmatrix},
\quad
f^{(i)} =
\begin{bmatrix} f_B^{(1)} \\ \vdots \\ f_B^{(N_S)} \\ f_\Pi \end{bmatrix}
\]

are the partially assembled subcluster stiffness matrices and right hand sides. Note that the jump operator $B^{(i)}$ now enforces intra-subcluster continuity as well as inter-subcluster continuity.

As in FETI-1, a pseudoinverse and a projection are used to cope with those subclusters where $K^{(i)}$ is only semidefinite. We collect a basis of the kernel of

\[
K =
\begin{bmatrix}
K^{(1)} & & 0 \\
 & \ddots & \\
0 & & K^{(N_C)}
\end{bmatrix}
\]

in the matrix

\[
R =
\begin{bmatrix}
R^{(1)} & & 0 \\
 & \ddots & \\
0 & & R^{(N_C)}
\end{bmatrix},
\]

where $\mathrm{range}(R^{(i)}) = \ker(K^{(i)})$, $i = 1, \ldots, N_C$.

Reducing the system of equations to an equation in $\lambda$, using pseudoinverses where necessary, it remains to solve

\[
F \lambda = d,
\tag{3.11}
\]

where

\[
F = B K^{+} B^T.
\]

The admissible initial solution $\lambda_0$ for the cg iteration fulfilling $G^T \lambda_0 = R^T f$ is given by

\[
\lambda_0 = G (G^T G)^{-1} R^T f.
\]

Once a dual solution $\lambda$ of (3.11) has been found, the solution $u$ is determined uniquely by $u = K^{+}(f - B^T \lambda) + R\alpha$, where $\alpha = (G^T G)^{-1} G^T (F\lambda - d)$.
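That $\lambda_0 = G(G^TG)^{-1}R^Tf$ satisfies the constraint $G^T\lambda_0 = R^Tf$ follows directly from $G^T G (G^TG)^{-1} = I$. A minimal numerical sketch, assuming a matrix $G$ with full column rank; the random data and the function name are hypothetical placeholders:

```python
import numpy as np


def admissible_initial_lambda(G, e):
    """Return lambda_0 = G (G^T G)^{-1} e, where e stands for R^T f."""
    return G @ np.linalg.solve(G.T @ G, e)


rng = np.random.default_rng(1)
G = rng.standard_normal((6, 2))   # hypothetical G with full column rank
e = rng.standard_normal(2)        # hypothetical stand-in for R^T f
lam0 = admissible_initial_lambda(G, e)
print(np.allclose(G.T @ lam0, e))
```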

4. Preconditioners for FETI-1, FETI-DP, and hybrid three-level FETI. To define the preconditioner $M^{-1}$ for all FETI methods, we introduce a scaled jump operator $B_D$. It is defined by scaling the contributions of $B$ associated with the dual displacement variables from individual subdomains. The simplest choice is to scale the contributions from and to each subdomain by the inverse of the multiplicity of the node. This scaling is referred to as multiplicity scaling or m-scaling. The multiplicity of a node is defined as the number of subdomains it belongs to. In the case of only two subdomains the multiplicity of all interface nodes is 2 and we would therefore have $B_D = \frac{1}{2} B$.

If coefficient jumps across the interface are present we have to take into account the coefficients $G(x)$ of the different subdomains. Otherwise the convergence rate of the FETI methods will depend on the coefficient jumps. Traditionally, the material coefficients are often denoted by $\rho$ and therefore the corresponding scaling is often referred to as $\rho$-scaling. We will also use this terminology throughout this paper. We construct the block matrix $B_D$,

\[
B_D = [B_D^{(1)}, \ldots, B_D^{(N)}],
\]

from the subdomain matrices $B_D^{(i)}$, which are defined as follows: each row of $B^{(i)}$ with a nonzero entry corresponds to a Lagrange multiplier connecting the subdomain $\Omega_i$ with a neighboring subdomain $\Omega_j$ at a point $x \in \partial\Omega_{i,h} \cap \partial\Omega_{j,h}$. We obtain $B_D^{(i)}$ by multiplying each such row of $B^{(i)}$ with

\[
\delta_j^{\dagger}(x) := \frac{G_j(x)}{\sum_{k \in N_x} G_k(x)},
\tag{4.1}
\]

where $N_x$ is the set of indices of the subdomains which have $x$ on their boundary. If the diagonal matrices $D^{(i)}$ contain the scaling factors for each row of $B^{(i)}$ then we can write

\[
B_D = [D^{(1)} B^{(1)}, \ldots, D^{(N)} B^{(N)}].
\]

Note that the scaling of $B^{(i)}$ uses the coefficient of subdomain $\Omega_j$ in the numerator, i.e., the coefficient of the subdomain facing $\Omega_i$. This scaling can be defined using the local coefficient at every interface node. By means of the scaling, the operator $P_D = B_D^T B$ becomes a projection and is also referred to as jump operator. For a constant coefficient the $\rho$-scaling reduces to the multiplicity scaling.
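The weights (4.1) are simple to evaluate at a single interface node. The sketch below is illustrative only: the function name and the dictionary-based data layout are assumptions, not the data structures of the implementation used for the experiments. It shows that the weights sum to one and that a constant coefficient gives the inverse multiplicity.

```python
def rho_scaling_weights(coeffs):
    """Weights delta_j^dagger(x) = G_j(x) / sum_{k in N_x} G_k(x) at one node x.

    coeffs maps each subdomain index k in N_x to its coefficient G_k(x).
    """
    total = sum(coeffs.values())
    return {j: g / total for j, g in coeffs.items()}


# Constant coefficient at a node shared by four subdomains:
# every weight equals 1/multiplicity.
w_const = rho_scaling_weights({1: 2.0, 2: 2.0, 3: 2.0, 4: 2.0})

# Coefficient jump of six orders of magnitude across an edge node.
w_jump = rho_scaling_weights({1: 1.0, 2: 1.0e6})

print(w_const)
print(w_jump)
```

A row of $B^{(i)}$ belonging to a multiplier that connects $\Omega_i$ with $\Omega_j$ would then be multiplied by the weight stored under the key `j`.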

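We now partition the local stiffness matrices according to interior and interface variables, $u_I^{(i)}$ and $u_\Gamma^{(i)}$, and obtain

\[
K^{(i)} =
\begin{bmatrix}
K_{II}^{(i)} & K_{\Gamma I}^{(i)T} \\
K_{\Gamma I}^{(i)} & K_{\Gamma\Gamma}^{(i)}
\end{bmatrix}.
\]

We define the corresponding block matrices

\[
K_{II} =
\begin{bmatrix}
K_{II}^{(1)} & & 0 \\
 & \ddots & \\
0 & & K_{II}^{(N)}
\end{bmatrix},
\quad
K_{\Gamma\Gamma} =
\begin{bmatrix}
K_{\Gamma\Gamma}^{(1)} & & 0 \\
 & \ddots & \\
0 & & K_{\Gamma\Gamma}^{(N)}
\end{bmatrix},
\quad \text{and} \quad
K_{\Gamma I} =
\begin{bmatrix}
K_{\Gamma I}^{(1)} & & 0 \\
 & \ddots & \\
0 & & K_{\Gamma I}^{(N)}
\end{bmatrix},
\]

where the first two matrices are square and the last is rectangular. Let us define the block Schur complement

\[
S_{\Gamma\Gamma} = K_{\Gamma\Gamma} - K_{\Gamma I} K_{II}^{-1} K_{\Gamma I}^T
= \sum_{i=1}^{N} \left( K_{\Gamma\Gamma}^{(i)} - K_{\Gamma I}^{(i)} (K_{II}^{(i)})^{-1} K_{\Gamma I}^{(i)T} \right)
= \sum_{i=1}^{N} S_{\Gamma\Gamma}^{(i)},
\]

which can be computed completely in parallel.

Let us now define restriction matrices $R_\Gamma^{(i)}$ which restrict the degrees of freedom of a subdomain to the interface, i.e., $R_\Gamma^{(i)} = [0, I_\Gamma]$ if the interior variables in each subdomain are numbered first and the interface variables are numbered last.

The Dirichlet preconditioner is then given in matrix form by

\[
M^{-1} = B_D R_\Gamma^T S_{\Gamma\Gamma} R_\Gamma B_D^T
= \sum_{i=1}^{N} B_D^{(i)} R_\Gamma^{(i)T} S_{\Gamma\Gamma}^{(i)} R_\Gamma^{(i)} B_D^{(i)T}.
\tag{4.2}
\]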
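The sum in (4.2) can be sketched as a loop over subdomains, each contributing one local Schur complement application. The following dense NumPy version is a minimal illustration under stated assumptions: the function names and the list-based interface are hypothetical, interior variables are numbered first as above, and the Schur complement is formed explicitly only because the example is tiny.

```python
import numpy as np


def local_schur_complement(K, n_I):
    """S_GG = K_GG - K_GI K_II^{-1} K_GI^T, interior unknowns numbered first."""
    K_II = K[:n_I, :n_I]
    K_GI = K[n_I:, :n_I]
    K_GG = K[n_I:, n_I:]
    return K_GG - K_GI @ np.linalg.solve(K_II, K_GI.T)


def apply_dirichlet_preconditioner(lam, B_D_blocks, K_blocks, n_I_list):
    """Evaluate the sum in (4.2) applied to lam, one subdomain at a time."""
    out = np.zeros(lam.shape)
    for B_D, K, n_I in zip(B_D_blocks, K_blocks, n_I_list):
        S = local_schur_complement(K, n_I)
        w_G = (B_D.T @ lam)[n_I:]   # R_Gamma B_D^T lam
        z = np.zeros(K.shape[0])
        z[n_I:] = S @ w_G           # R_Gamma^T S_GG w_G
        out += B_D @ z
    return out


# One "subdomain" with two interior unknowns and one interface unknown.
K = np.array([[2.0, -1.0, 0.0],
              [-1.0, 2.0, -1.0],
              [0.0, -1.0, 2.0]])
S = local_schur_complement(K, n_I=2)
B_D = np.array([[0.0, 0.0, 0.5]])   # a single multiplier with weight 1/2
y = apply_dirichlet_preconditioner(np.array([3.0]), [B_D], [K], [2])
print(S, y)
```

Since the loop body touches only data of one subdomain, the contributions can be computed independently and summed. For larger subdomains one would typically not form $S_{\Gamma\Gamma}^{(i)}$ but apply it through a solve with $K_{II}^{(i)}$, i.e., a local Dirichlet problem.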

There is also a computationally less expensive alternative, the lumped preconditioner $M_L^{-1}$, which is defined as

\[
M_L^{-1} = B_D R_\Gamma^T K_{\Gamma\Gamma} R_\Gamma B_D^T
\]

but which does not give (quasi-) optimal condition number estimates: the condition number estimates for the lumped preconditioner contain a linear factor of $H/h$ as opposed to a polylogarithmic bound which can be proven for the Dirichlet preconditioner, see Section 6 and [38,39,45].

The inner product for the FETI projections has a significant influence on the convergence of problems with coefficient jumps across the subdomain interface. Possible choices are the FETI preconditioner

\[
Q_D = M^{-1},
\]

the lumped preconditioner

\[
Q_L = M_L^{-1},
\]

a superlumped version of it,

\[
Q_{SL} = \sum_{j} B_D^{(j)} R_\Gamma^{(j)T} \, \mathrm{diag}(K_{\Gamma\Gamma}^{(j)}) \, R_\Gamma^{(j)} B_D^{(j)T},
\]

see, e.g., [3], and the diagonal matrix $Q_\delta$, introduced in [36]. Its entries for a uniform grid are defined by

\[
q_{ii} = h \min(\rho_m, \rho_n),
\]

where the Lagrange multiplier $\lambda_i$ connects $\Omega_m$ with $\Omega_n$ and $\rho_m$, $\rho_n$ refer to the constant coefficient of their respective domains. As usual, $H$ is the diameter of the subdomain and $h$ the diameter of the elements. Additionally, the entries which belong to a subdomain face are scaled by the factor $(1 + \log(H/h)) \frac{h}{H}$. This scaling also applies to an unstructured grid and can be implemented in the same way. In numerical experiments equally good results were obtained even without the additional weights $(1 + \log(H/h)) \frac{h}{H}$ for the faces. In comparison with the other choices, for $Q = Q_\delta$ the resulting $G^T Q_\delta G$ remains very sparse, thus reducing the computational cost for the FETI coarse problem $(G^T Q_\delta G)^{-1}$. One of the polylogarithmic condition number estimates in [36] for the case of discontinuous coefficients relies on this choice of $Q = Q_\delta$, see also Figure 4.2.

For parallel scalability results of the FETI-DP method on the JUGENE supercomputer, see Table 4.1. JUGENE of the Jülich Supercomputing Center (JSC), Forschungszentrum Jülich, Germany, is currently Europe's fastest supercomputer (number 3 in the TOP500 list of 6/2009) and the first petaflop/s machine in Europe. Each Blue Gene/P node has a single PPC 450 processor with 850 MHz, 2 GBytes memory per node, and 4 cores per processor.

In the upper part of the table we show parallel scalability results for a scalar problem of medium size discretized using spectral elements with polynomial degree p = 32. The problem size is fixed to 4 190 209, the number of subdomains is fixed to N = 4 096, and H/h = 1, i.e., we have one spectral element per subdomain. For more details, see also [23], where the case of multi-element subdomains is also considered. The spectral element method shows exponential convergence in the polynomial degree where the solution is smooth and therefore a very high accuracy can be obtained. Due to the high polynomial degree the linear systems are much denser than those arising from low order finite elements. The iteration was stopped when a relative tolerance of rtol = 1e-10 was reached. The same spectral element problem has been solved on a 16 processor 2.2 GHz Opteron cluster with 4 GB RAM for each processor in approximately 256 seconds. Since the problem size is fixed we expect the time to decrease with a growing number of processors. Ideally, the problem will be solved in half the time if twice as many processors are used. The spectral element problem scales from 512 to 2 048 cores with an efficiency of 78%. This type of scalability is also referred to as strong scalability.

In the second part of the table a scalar problem, discretized by first order finite elements (p = 1), is scaled from 16 processor cores to 16 384 cores. We have one subdomain for each core and a fixed ratio H/h = 256. The problem size therefore grows from 1 050 625 to 1 073 807 361 degrees of freedom, i.e., by a factor of approximately $10^3$. The problem is highly heterogeneous with large coefficient jumps not aligned with the subdomain interface. The iteration was stopped when a relative tolerance of rtol = 1e-7 was reached. Since the problem size grows proportionally to the number of processors, ideally, we would expect to see a constant time for the solution phase as well as for the sum of the solution and assembly phase. In fact, scaling from 16 to 16 384 cores, the time grows from 48s to 56s for the solution phase and from 88s to 96s for the sum of the assembly and the solution phase. This results in a parallel efficiency of approximately 86% and 92%, respectively. Not surprisingly, the efficiency for the sum of assembly and solution phase is slightly better since the assembly phase exhibits perfect scalability. This type of scalability is also referred to as weak scalability.
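The efficiencies quoted for both parts of the table follow from the measured times alone. As a small sketch (the function names are illustrative and the timings are the rounded values quoted in the text, so the results agree with the reported percentages only to within about one percentage point):

```python
def strong_scaling_efficiency(cores_ref, time_ref, cores, time):
    """Fixed problem size: measured speedup divided by the ideal speedup."""
    speedup = time_ref / time
    ideal_speedup = cores / cores_ref
    return speedup / ideal_speedup


def weak_scaling_efficiency(time_ref, time):
    """Problem size grows with the core count: ideal is a constant time."""
    return time_ref / time


# Spectral element problem, 512 -> 2 048 cores.
strong = strong_scaling_efficiency(512, 13.3, 2048, 4.3)

# Finite element problem, 16 -> 16 384 cores.
weak_solve = weak_scaling_efficiency(48.0, 56.0)   # solution phase only
weak_total = weak_scaling_efficiency(88.0, 96.0)   # assembly + solution

print(f"{strong:.3f} {weak_solve:.3f} {weak_total:.3f}")
```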

We now have completely defined the FETI-1, FETI-DP, and hybrid FETI algorithms as preconditioned iterative methods. In this context, we can see FETI-1 and FETI-DP as special instances of hybrid FETI. Let $H_\Omega$ be the diameter of $\Omega$, $h$ the element diameter, $H_S$ the subdomain diameter, and $H_C$ the diameter of the subdomain clusters. Then hybrid FETI with $H_S = H_C$, i.e., every subdomain cluster contains only one subdomain, is FETI-1 with Dirichlet preconditioning, hybrid FETI with $H_C = H_\Omega$, i.e., there is only one subdomain cluster, is FETI-DP with Dirichlet preconditioning, whereas hybrid FETI with $h = H_S$ is FETI-1 with lumped preconditioning. In the last case subdomains are single elements.


5. Primal Constraints for FETI-DP in 3D. So far, we have not discussed how to choose the decomposition of the local interface vectors $u_\Gamma^{(i)}$ into primal variables $u_\Pi^{(i)}$ and remaining (dual) interface vectors $u_\Delta^{(i)}$. One immediate possibility is to choose selected vertices (or corners) of $\Omega_i$ as nodes where the variables $u_\Gamma^{(i)}$ are assembled; cf. Farhat et al. [12] or Mandel and Tezaur [45]. This yields good convergence bounds of the order of $(1 + \log(H/h))^2$ in two dimensions, see [45], but not in three dimensions, see Klawonn, Widlund, and Dryja [39]. Numerical evidence for this deterioration in three dimensions was first given in Farhat, Lesoinne, and Pierson [13]; cf. also Klawonn, Rheinbach, and Widlund [30, Table 5, Figure 2].

To obtain scalable algorithms with condition number bounds of the order of $(1 + \log(H/h))^2$ in three dimensions, Klawonn, Widlund, and Dryja [39], for scalar, second order elliptic equations, introduced averages over selected subdomain edges or faces to be continuous across the interface. For linear elasticity in three dimensions, Klawonn and Widlund [38] introduced averages and first order moments over selected edges as primal variables. For some very hard cases with large coefficient jumps, e.g., in the diffusion coefficient or the stiffness of the material, some vertices have to be selected as primal variables as well, in order to obtain a condition number bound of the order of $(1 + \log(H/h))^2$ which is robust with respect to the coefficient jumps. Numerical results with vertex and face average constraints for three dimensional elasticity problems were already presented in Farhat, Lesoinne, and Pierson [13]; see also the doctoral dissertation of Pierson [47]. There are two different possibilities to implement the average and moment constraints over faces or edges. The first is the introduction of optional Lagrange multipliers; see Farhat, Lesoinne, and Pierson [13], Pierson [47], or Klawonn and Widlund [38]. The second is to apply a transformation of basis,

\[
\overline{K}^{(i)} = T^T K^{(i)} T,
\]

introducing the averages and, possibly, moments explicitly as new variables. Then, the dual interface variables $u_\Delta^{(i)}$ have zero average or first order moment on the selected edges or faces; see Klawonn and Widlund [38] and Klawonn and Rheinbach [24]. The resulting algorithm satisfies a polylogarithmic condition number bound in 3D; see Fig. 5.2.

Laplace 2D SEM – FETI-DP (H/h = 1, p = 32)

#Cores       N           D.o.f.   It.    Time   Speedup   Ideal Speedup   Efficiency
Solve
   512    4 096       4 190 209    25   13.3s       1.0               1         100%
 1 024    4 096       4 190 209    25    7.1s       1.9               2          95%
 2 048    4 096       4 190 209    25    4.3s       3.1               4          78%

Laplace 2D FEM – FETI-DP (H/h = 256, p = 1)

#Cores        N           D.o.f.   It.   Time   Speedup   Ideal Speedup   Efficiency
Assembly + Solve
    16       16       1 050 625     7    88s       1.0               1         100%
    64       64       4 198 401    14    89s       4.0               4          99%
   256      256      16 785 409    20    91s      15.6              16          97%
 1 024    1 024      67 125 249    19    91s      61.9              64          97%
 4 096    4 096     268 468 225    18    91s     247.6             256          97%
16 384   16 384   1 073 807 361    17    96s     938.7           1 024          92%
Solve (N, D.o.f., and It. as above)
    16                                   48s       1.0               1         100%
    64                                   49s       3.9               4          98%
   256                                   51s      15.1              16          94%
 1 024                                   51s      60.2              64          94%
 4 096                                   51s     240.9             256          94%
16 384                                   56s     877.7           1 024          86%

TABLE 4.1
Scalability results on the JUGENE BG/P from 16 to 16 384 cores (PPC 450) of the Jülich Supercomputing Center (JSC), see also Section 7. Strong scalability (upper): more processors for a constant problem size. Weak scalability (lower): more processors and an increasing problem size. In all problems, the domain $\Omega$ is the unit square decomposed into N smaller squares with side length H as subdomains. Upper: FETI-DP for a 2D diffusion problem with diffusion coefficient G = 1, cf. (2.4), discretized using spectral elements of polynomial degree p = 32, see also [23], stopping criterion rtol = 1e-10, MUMPS 4.7.3 for all factorizations. Lower: FETI-DP for a 2D diffusion problem, cf. (2.4), with diffusion coefficient G. Every subdomain contains a square inclusion of side length H/2 where the diffusion coefficient G is about six orders of magnitude higher than in the surrounding frame; G is constant on each part, the inclusion and the frame, respectively; see also [29]. The problem is discretized using linear finite elements, the stopping criterion is rtol = 1e-7, UMFPACK 4.3 for all factorizations.

For a detailed algorithmic description of applying the transformation of basis, see [24] or [38]. The choice of a good coarse problem, i.e., the selection of vertex, edge, and face constraints, is of vital importance to the convergence and scalability of FETI-DP methods. For a more detailed description, see Section 6, and, e.g., [38] or [24]. For further details on the implementation and other algorithmic choices, see [39], [38], [24], or [13].

6. Theory. An important requirement for the parallel scalability of a domain decomposition algorithm is its numerical scalability. By numerical scalability, we understand that the condition number of the preconditioned system is only weakly dependent on the number of unknowns of each subdomain and independent of the number of subdomains. If we denote a typical subdomain diameter by $H$ and a typical finite element diameter by $h$, then we can measure the number of unknowns per subdomain approximately by $(H/h)^d$, where $d$ is the space dimension. By a weak dependence of the condition number of the preconditioned system on the number of unknowns of each subdomain we usually understand a polylogarithmic dependence on $H/h$. Good nonoverlapping domain decomposition methods, such as the FETI methods considered here, usually satisfy a condition number bound of the type

\[
\kappa(M^{-1}F) \le C \, (1 + \log(H/h))^2,
\]

where $\kappa(M^{-1}F) = \lambda_{\max}(M^{-1}F)/\lambda_{\min}(M^{-1}F)$ is the spectral condition number of the preconditioned system. If one uses the method of conjugate gradients as a Krylov space algorithm, the number of iterations grows at most like the square root of the condition number, which leads to an iteration count of the order

\[
O(1 + \log(H/h)),
\]

i.e., it only depends logarithmically on the size of the subproblems. In the theory presented here, we make the assumption that the values of $G(x)$, see (2.1), are only mildly varying within each subdomain but can have large jumps across the subdomain boundaries of neighboring subdomains. For simplicity, we assume in the theoretical results below that $G(x)$ is constant on each subdomain $\Omega_i$, denoting that value by $G_i$, but still allow for arbitrary jumps across the interface.
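To get a feeling for how slowly such a bound grows, one can tabulate $(1 + \log(H/h))^2$ and the iteration count suggested by the classical cg estimate $\|e_m\|_A \le 2\left((\sqrt{\kappa}-1)/(\sqrt{\kappa}+1)\right)^m \|e_0\|_A$. This is only a sketch: the constant $C = 1$ and the tolerance are arbitrary assumptions, the estimate bounds the energy-norm error rather than the relative residual used as stopping criterion in the experiments, and the function names are illustrative.

```python
import math


def polylog_bound(H_over_h, C=1.0):
    """Right-hand side C * (1 + log(H/h))^2; C = 1 is an arbitrary choice."""
    return C * (1.0 + math.log(H_over_h)) ** 2


def cg_iteration_estimate(kappa, tol):
    """Smallest m with 2 * ((sqrt(kappa)-1)/(sqrt(kappa)+1))^m <= tol."""
    if kappa <= 1.0:
        return 1
    s = math.sqrt(kappa)
    return math.ceil(math.log(2.0 / tol) / math.log((s + 1.0) / (s - 1.0)))


for ratio in (32, 64, 128, 256, 512):
    kappa = polylog_bound(ratio)
    print(ratio, round(kappa, 1), cg_iteration_estimate(kappa, 1e-7))
```

Increasing $H/h$ by a factor of 16 multiplies the number of unknowns per subdomain by $16^d$, while the bound grows by less than a factor of three.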

Furthermore, let us make the following standard geometric assumption; see also the second part of this section, where this assumption is significantly weakened.

ASSUMPTION 6.1 (Standard geometric assumption). Each subdomain is assumed to be the union of a uniformly bounded number of coarse, shape regular triangles (in 2D) or tetrahedra (in 3D) which themselves are triangulated by finite elements.

For the onelevel FETI method, we have the following estimate.

THEOREM 6.1. The condition number of the FETI method, with the Dirichlet preconditioner $M$, and with $Q = M^{-1}$ or $Q = Q_\delta$, satisfies

\[
\kappa(P M^{-1} P^T F) \le C \, (1 + \log(H/h))^2.
\]

Here, $C$ is independent of $h$, $H$, and the values of the $G_i$.

A proof of this result for a scalar diffusion equation in three dimensions can be found in Klawonn and Widlund [36]. In two dimensions, for a Dirichlet preconditioner with a different scaling, Mandel and Tezaur [44] have shown a condition number bound of $O((1 + \log(H/h))^3)$. An extension of the result in Theorem 6.1 to more general discontinuous coefficients can be found in Pechstein and Scheichl; see [46]. The result in Theorem 6.1 for isotropic linear elasticity can be directly obtained from the scalar result using appropriate Korn inequalities; see Klawonn and Widlund [34].

Whereas the onelevel FETI algorithm has the same condition number estimate in two and three dimensions, we will see that for FETI-DP methods different choices of the primal variables can lead to different condition number estimates in two and three dimensions. Under Assumption 6.1, we have the following estimate for the FETI-DP method where exclusively vertex constraints are chosen as primal variables. For simplicity, we assume that all vertices are chosen to be primal; see Klawonn, Widlund, and Dryja [39] and Klawonn and Widlund [38] for more general results where the concept of acceptable paths is introduced.

THEOREM 6.2. The condition number for FETI-DP with primal vertex constraints satisfies

1. in 2 dimensions: $\kappa(M^{-1}F) \le C \, (1 + \log(H/h))^2$,

2. in 3 dimensions: $\kappa(M^{-1}F) \le C \, (H/h) \, (1 + \log(H/h))^2$.

Here, $C$ is independent of $h$, $H$, and the values of the $G_i$. The result in two dimensions, for a scalar second-order elliptic equation without coefficient jumps across the interface, has been shown in Mandel and Tezaur [45]; see also Klawonn, Rheinbach, and Widlund [32] for a more general case in two dimensions, scalar second-order elliptic equations and linear elasticity, as well as problems with discontinuous coefficients. The estimate for the three-dimensional case and scalar second-order elliptic equations with discontinuous coefficients can be found in Klawonn, Widlund, and Dryja [39]. Clearly, using only vertex constraints as primal variables leads to a weaker bound for dual-primal FETI methods in three dimensions. This theoretical result can also be observed numerically; see Klawonn, Rheinbach, and Widlund [31]. Theoretical extensions for linear elasticity can be carried out along the lines of the theory given in Klawonn and Widlund [38].

If edge averages (or first-order moments) are chosen as primal variables, either in addition to or, in certain cases, instead of the primal vertex constraints, a quadratic-logarithmic bound can again be guaranteed.

THEOREM 6.3. The condition number of the FETI-DP method with a sufficient number of edge averages and first-order edge moments, and possibly some primal vertex constraints, in three dimensions, satisfies

\[
\kappa(M^{-1}F) \le C \, (1 + \log(H/h))^2.
\]

Here, $C$ is independent of $h$, $H$, and the values of the $G_i$. For a proof in the case of scalar second-order elliptic equations, see Klawonn, Widlund, and Dryja [39], and for the more involved case of the system of linear isotropic elasticity, see Klawonn and Widlund [38].

So far, the condition number estimates were made under the standard geometric assumption, cf. Assumption 6.1. To extend these results to subdomains which are obtained from mesh partitioners, see Figure 6.1, the following two definitions of classes of domains are used; see the references in Klawonn, Rheinbach, and Widlund [32] and Dohrmann, Klawonn, and Widlund [9].

DEFINITION 6.4 (John domain). A domain $\Omega \subset \mathbb{R}^n$ (an open, bounded, and connected set) is a John domain if there exists a constant $C_J \ge 1$ and a distinguished central point $x_0 \in \Omega$ such that each $x \in \Omega$ can be joined to it by a rectifiable curve $\gamma : [0,1] \to \Omega$ with $\gamma(0) = x_0$, $\gamma(1) = x$, and $|x - \gamma(t)| \le C_J \cdot \mathrm{distance}(\gamma(t), \partial\Omega)$ for all $t \in [0,1]$.

DEFINITION 6.5 (Jones domain). A domain $\Omega \subset \mathbb{R}^n$ is a Jones or uniform domain if there exists a constant $C_U$ such that any pair of points $x_1 \in \Omega$ and $x_2 \in \Omega$ can be joined by a rectifiable curve $\gamma(t) : [0,1] \to \Omega$ with $\gamma(0) = x_1$, $\gamma(1) = x_2$, and where the Euclidean arc length of $\gamma$ is at most $C_U |x_1 - x_2|$ and $\min_{i=1,2} |x_i - \gamma(t)| \le C_U \cdot \mathrm{distance}(\gamma(t), \partial\Omega)$ for all $t \in [0,1]$.

John and Jones domains can have a very irregular boundary; an example of such a domain is the interior of a von Koch curve; see Figure 6.2.

THEOREM 6.6. Let the domain $\Omega \subset \mathbb{R}^2$ be partitioned into subdomains $\Omega_i$, which are partitioned into shape regular elements and which have complements $C\Omega_i$ that are uniform in the sense of Definition 6.5. Then, with $M$ the Dirichlet preconditioner, $F$ the FETI-DP operator, $H_i$ the diameter of the subdomain $\Omega_i$, and with $h_i$ the smallest diameter of any element in $\Omega_i$, the condition number of the preconditioned operator satisfies

\[
\kappa(M^{-1}F) \le C \max_i \, (1 + \log(H_i/h_i))^2.
\]

Here $C$ is a constant which depends only on the parameters $C_J(\Omega_i)$ and $C_U(C\Omega_i)$ of Definitions 6.4 and 6.5, the Poincaré parameters $\gamma(\Omega_i, 2)$ of the subdomains, the maximum number of edges of any subdomain, and the shape regularity of the finite elements.

FETI-DP with an Irregular Domain Decomposition

                                   ρ-scaling       stiffness-scaling
H/h       D.o.f.      |Γ|       Cond.    It.        Cond.     It.
 32       33 282    4 392       41.69     69        65.72      61
 64      132 098    9 000       45.06     70       139.13      77
128      526 338   18 216       48.87     71       299.05     103
256    2 101 250   36 648       53.07     72       644.49     138
512    8 396 802   73 512       57.67     74     1 385.60     195

TABLE 6.1
Linear elasticity problem in 2D with N = 4 × 4 subdomains and a structured, yet highly ragged interface, see [32]; |Γ| is the number of d.o.f. on the interface. When using ρ-scaling the condition number remains polylogarithmic even for a growing H/h; see Figure 6.3. For stiffness scaling the condition number grows superlinearly in H/h.

The theoretical estimate in Theorem 6.6 relies on the use of the $\rho$-scaling, see (4.1) in Section 4. As an alternative, an approximation, the so-called stiffness scaling, is often used: approximate $G_j(x)$ by the diagonal of the local stiffness matrices, i.e.,

\[
G_j(x) \approx \sum_{T : \, x \in T; \, T \in \Omega_j} a_T(\Phi_x, \Phi_x).
\]

As opposed to the $\rho$-scaling, the stiffness scaling can be computed from the assembled stiffness matrix and often gives good results in applications. But in the case of a very ragged interface, combined with high coefficient jumps across the interface, the performance of stiffness scaling may deteriorate dramatically; see Table 6.1.
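The different growth of the two condition number columns in Table 6.1 can be quantified by the slope of $\log(\text{Cond.})$ against $\log(H/h)$ between consecutive rows. The short script below is an illustrative sketch using only the values printed in the table; the function name is hypothetical.

```python
import math

# Values copied from Table 6.1.
H_over_h = [32, 64, 128, 256, 512]
cond_rho = [41.69, 45.06, 48.87, 53.07, 57.67]
cond_stiffness = [65.72, 139.13, 299.05, 644.49, 1385.60]


def growth_exponents(x, y):
    """Slopes of log(y) against log(x) between consecutive data points."""
    return [math.log(y[i + 1] / y[i]) / math.log(x[i + 1] / x[i])
            for i in range(len(x) - 1)]


exp_rho = growth_exponents(H_over_h, cond_rho)
exp_stiffness = growth_exponents(H_over_h, cond_stiffness)
print([round(p, 2) for p in exp_rho])
print([round(p, 2) for p in exp_stiffness])
```

A slope above one corresponds to superlinear growth in $H/h$, whereas slopes close to zero are what one expects from a polylogarithmic bound.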

7. Extending the Scalability for Massively Parallel Systems: From Exact to Inexact FETI-DP Methods. For the FETI-1 method as well as for the exact FETI-DP method the coarse problem gives a condition number and an iteration count which are independent of the number of subdomains. Both methods are thus scalable with respect to the problem size. For the FETI-1 method the projection $P^T$ gives the coarse problem whereas for FETI-DP the coupling in the primal variables $u_\Pi$ represents the coarse problem. The expensive part of the coarse problem is the computation of $(G^TG)^{-1}$ for FETI-1. For FETI-DP it is the computation of $S_{\Pi\Pi}^{-1}$. In both methods the inverses are computed “exactly” by a Cholesky or LU decomposition and for both methods the size of the matrices to be inverted is $O(N)$, where $N$ is the number of subdomains. For many subdomains, i.e., when using today's supercomputers with $10^4$ to $10^6$ processor cores, the coarse problem becomes a bottleneck. This reflects the fact that FETI-1 and FETI-DP methods are essentially 2-level algorithms. The hybrid FETI method may be viewed as a 3-level algorithm. It shows (quasi-optimal) scalability with respect to the number of subclusters, and it has a condition number estimate which is linear in the number of subdomains in each subcluster. It remains to be tested if hybrid FETI methods are competitive on massively parallel machines.

The original or standard, exact FETI-DP method is the method of conjugate gradients applied to the symmetric positive definite system

\[
F \lambda = d
\]

with the preconditioners $M^{-1}$ or $M_L^{-1}$; see Section 3.2. Here, the term “exact” refers to the exact solution of the coarse problem given by $S_{\Pi\Pi}$ and the exact solution of the local Neumann subdomain problems $K_{BB}^{(i)}$. When the (exact) Dirichlet preconditioner is used, we of course also solve the local Dirichlet problems $K_{II}^{(i)}$ exactly.

Inexact extensions of FETI-1 methods are not straightforward since the projection $P^T$ has to be formed exactly; see, however, Klawonn and Widlund [34] and Klawonn [21] for inexact solvers of the Neumann problems which allow for larger subdomains, leading to a smaller number of subdomains and thus to a smaller coarse problem.


For FETI-DP we have a different situation since the coarse problem is built into the system matrix. Thus, an inexact solution in the elimination process of the primal variables would lead to a different system to be solved and thus to a perturbed solution, different from that of the original problem.

In order to build inexact or multilevel extensions of FETI-DP methods we have to reformulate the method. In this section, we present inexact FETI-DP methods where the saddle point problems (3.7) and (3.9) are solved iteratively, using block-triangular preconditioners and a suitable Krylov space method. These methods were first introduced in [25] for finite elements and then applied to spectral elements in [23]. In this paper we will show that using the framework of inexact FETI-DP methods we obtain a solver which is scalable to more than $10^4$ processor cores.

For the saddle point problems (3.7) and (3.9), we introduce the block-triangular preconditioners $B_L$ and $B_{r,L}$, respectively, as

\[
B_L^{-1} =
\begin{bmatrix}
\widehat{K}^{-1} & 0 \\
M^{-1} B \widehat{K}^{-1} & -M^{-1}
\end{bmatrix},
\quad
B_{r,L}^{-1} =
\begin{bmatrix}
\widehat{S}_{\Pi\Pi}^{-1} & 0 \\
-M^{-1} B_B K_{BB}^{-1} K_{\Pi B}^T \widehat{S}_{\Pi\Pi}^{-1} & -M^{-1}
\end{bmatrix},
\]

where $\widehat{K}^{-1}$ and $\widehat{S}_{\Pi\Pi}^{-1}$ are assumed to be spectrally equivalent preconditioners for $K$ and $S_{\Pi\Pi}$, respectively, with bounds independent of the discretization parameters $h$ and $H$. The matrix block $M^{-1}$ is assumed to be a good preconditioner for the FETI-DP system matrix $F$ and can be chosen as the Dirichlet or the lumped preconditioner, $M_D^{-1}$ and $M_L^{-1}$, respectively. We will denote the corresponding right preconditioners by the subscript $R$, i.e., we have $B_R = B_L^T$ and $B_{r,R} = B_{r,L}^T$.

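A simple choice for the preconditioner $\widehat{K}^{-1}$ may be defined using the following exact factorization of $K^{-1}$, i.e.,

\[
\begin{bmatrix}
K_{BB} & K_{\Pi B}^T \\
K_{\Pi B} & K_{\Pi\Pi}
\end{bmatrix}^{-1}
=
\begin{bmatrix}
I & -K_{BB}^{-1} K_{\Pi B}^T \\
0 & I
\end{bmatrix}
\begin{bmatrix}
K_{BB}^{-1} & 0 \\
0 & S_{\Pi\Pi}^{-1}
\end{bmatrix}
\begin{bmatrix}
I & 0 \\
-K_{\Pi B} K_{BB}^{-1} & I
\end{bmatrix}.
\tag{7.1}
\]

In this case $\check{K}_{\Pi B} := K_{\Pi B} K_{BB}^{-1}$ is built explicitly in a preprocessing step, since we need it to form $S_{\Pi\Pi}$. To obtain a preconditioner $\widehat{K}^{-1}$, we now replace $S_{\Pi\Pi}^{-1}$ by a good preconditioner $\widehat{S}_{\Pi\Pi}^{-1}$. This yields the preconditioner

\[
\widehat{K}^{-1} =
\begin{bmatrix}
I & -\check{K}_{\Pi B}^T \\
0 & I
\end{bmatrix}
\begin{bmatrix}
K_{BB}^{-1} & 0 \\
0 & \widehat{S}_{\Pi\Pi}^{-1}
\end{bmatrix}
\begin{bmatrix}
I & 0 \\
-\check{K}_{\Pi B} & I
\end{bmatrix}.
\tag{7.2}
\]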
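Applying the factorization (7.2) to a vector amounts to three block steps: a forward elimination, two block solves, and a back substitution. The following dense sketch is an illustration only; the function name, the callable `apply_S_hat_inv` (a hypothetical stand-in for any approximate coarse solver), and the random test data are assumptions. With the exact Schur complement solve it reproduces $K^{-1}$, which is the statement of (7.1).

```python
import numpy as np


def apply_K_hat_inverse(K_BB, K_PiB, apply_S_hat_inv, r_B, r_Pi):
    """Apply the three factors of (7.2), right to left, to the vector (r_B, r_Pi)."""
    y_B = np.linalg.solve(K_BB, r_B)                    # K_BB^{-1} r_B
    g = r_Pi - K_PiB @ y_B                              # lower triangular factor
    u_Pi = apply_S_hat_inv(g)                           # (approximate) coarse solve
    u_B = y_B - np.linalg.solve(K_BB, K_PiB.T @ u_Pi)   # upper triangular factor
    return u_B, u_Pi


# Hypothetical SPD test matrix, split into B and Pi blocks.
rng = np.random.default_rng(2)
n_B, n_Pi = 7, 3
X = rng.standard_normal((n_B + n_Pi, n_B + n_Pi))
K = X @ X.T + (n_B + n_Pi) * np.eye(n_B + n_Pi)
K_BB, K_PiB, K_PiPi = K[:n_B, :n_B], K[n_B:, :n_B], K[n_B:, n_B:]
S = K_PiPi - K_PiB @ np.linalg.solve(K_BB, K_PiB.T)

r = rng.standard_normal(n_B + n_Pi)
u_B, u_Pi = apply_K_hat_inverse(K_BB, K_PiB,
                                lambda g: np.linalg.solve(S, g),  # exact S^{-1}
                                r[:n_B], r[n_B:])
print(np.allclose(np.concatenate([u_B, u_Pi]), np.linalg.solve(K, r)))
```

Replacing the lambda by an approximate solver, for instance a few multigrid cycles, turns the same routine into an application of the inexact preconditioner, while the local solves with $K_{BB}$ remain exact.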

This preconditioner is closely related to the preconditioner $B_{r,L}$ for the reduced system. Such factorizations were also the basis for iterative substructuring methods with inexact Dirichlet solvers; see, e.g., Smith, Bjørstad, and Gropp [50, Chapter 4.4] or Toselli and Widlund [52, Chapter 4.3] and the references given therein.

Instead of using the factorization (7.2), a preconditioner can, of course, also be applied directly to $K$; see [23, Table 5]. This results in an algorithm where the subdomain solves as well as the coarse grid solve are inexact.

The inexact FETI-DP methods are thus given by using a Krylov space method for nonsymmetric systems, e.g., GMRES, to solve the preconditioned systems

\[
B_L^{-1} A x = B_L^{-1} F
\]

and

\[
B_{r,L}^{-1} A_r x_r = B_{r,L}^{-1} F_r,
\]

respectively.

Let us note that we can also use a positive definite reformulation of the two preconditioned systems, which allows the use of conjugate gradients. For this reformulation, a special inner product and a scaling of the preconditioners $\widehat{K}$ and $\widehat{S}_{\Pi\Pi}$ have to be used; for details, see Klawonn and Rheinbach [25]. In [23] a variant was tested where also the Dirichlet problems in the Dirichlet preconditioner $M^{-1}$, see Section 4, are solved inexactly.
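The structure of the second preconditioned system becomes transparent in the idealized case of an exact coarse solver. As a sketch under the assumption $\widehat{S}_{\Pi\Pi} = S_{\Pi\Pi}$, and with the abbreviation $C := B_B K_{BB}^{-1} K_{\Pi B}^T$ (introduced here only for brevity), multiplying the block-triangular preconditioner with the matrix of (3.9) gives

```latex
\begin{aligned}
B_{r,L}^{-1} A_r
&=
\begin{bmatrix}
S_{\Pi\Pi}^{-1} & 0 \\
-M^{-1} C S_{\Pi\Pi}^{-1} & -M^{-1}
\end{bmatrix}
\begin{bmatrix}
S_{\Pi\Pi} & -C^T \\
-C & -B_B K_{BB}^{-1} B_B^T
\end{bmatrix} \\
&=
\begin{bmatrix}
I & -S_{\Pi\Pi}^{-1} C^T \\
-M^{-1} C + M^{-1} C & M^{-1}\left( B_B K_{BB}^{-1} B_B^T + C S_{\Pi\Pi}^{-1} C^T \right)
\end{bmatrix}
=
\begin{bmatrix}
I & -S_{\Pi\Pi}^{-1} C^T \\
0 & M^{-1} F
\end{bmatrix}.
\end{aligned}
```

In this case the preconditioned matrix is block upper triangular, and its eigenvalues are 1 together with the eigenvalues of $M^{-1}F$, i.e., of the preconditioned exact FETI-DP operator of (3.10). With an inexact but spectrally equivalent $\widehat{S}_{\Pi\Pi}^{-1}$ this structure holds only approximately.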


For parallel scalability of the irFETI-DP method, see Table 7.2. The computations were performed on the JUGENE supercomputer of the Jülich Supercomputing Center at Forschungszentrum Jülich. Here, a 2D linear elasticity problem is scaled from 16 to 16 384 processor cores and subdomains, where H/h = 160, i.e., we have 77 763 degrees of freedom in each subdomain. Accordingly, the problem size grows by a factor of approximately $10^3$ from 821 762 degrees of freedom to 838 942 722 degrees of freedom. The results show a very good scalability up to four racks (16 384 cores). The standard FETI-DP method, cf. Table 7.1, scales well up to one rack (4 096 cores) and then fails to factor the coarse problem on four racks. The irFETI-DP implementation uses one iteration of the highly scalable BoomerAMG preconditioner [19] of the Lawrence Livermore National Laboratory to precondition the coarse problem; as a smoother, symmetric Gauß-Seidel/block-Jacobi is applied. As the coarse problem size is small compared to the total number of cores, BoomerAMG only runs on a subset of the cores. All local subdomain problems are solved by exact factorizations using the UMFPACK 4.3 multifrontal solver. PPC 450 optimized libraries were used whenever possible. The parallel applications are implemented using PETSc [1] of the Argonne National Laboratory and MPI.

For more details on inexact FETI-DP methods and for numerical scalability results of inexact FETI-DP methods in 3D for up to 27 000 subdomains, see [25]. For inexact FETI methods, see [21,34], for inexact BETI methods, see [40]. For the coupling of FETI and BETI methods, see, e.g., [41]. For a different hybrid approach, see [18]. For FETI methods for the incompressible Stokes equation, see [42]. For a primal counterpart of inexact FETI-DP methods, see [53,54], and for other related inexact domain decomposition methods, see [8] and [43].

Elasticity 2D – FETI-DP (H/h = 160)

#Cores        N         D.o.f.   It.   Time   Speedup   Ideal Speedup   Efficiency
Assembly + Solve
    16       16       821 762    14    35s       1.0               1         100%
    64       64     3 281 922    21    38s       3.7               4          92%
   256      256    13 174 442    22    38s      14.7              16          92%
 1 024    1 024    52 449 282    23    39s      57.4              64          90%
 4 096    4 096   209 756 162    24    41s     218.5             256          85%
16 384   16 384   838 942 722     ∞      ∞       0.0           1 024           0%
Solve (N, D.o.f., and It. as above)
    16                                 21s       1.0               1         100%
    64                                 24s       3.5               4          88%
   256                                 24s      14.0              16          88%
 1 024                                 25s      53.8              64          84%
 4 096                                 26s     206.8             256          81%
16 384                                   ∞       0.0           1 024           0%

TABLE 7.1
FETI-DP on JUGENE for 2D linear elasticity without coefficient jumps on $[0,1]^2$, discretized with linear finite elements. Stopping criterion rtol = 1e-6. Upper: assembly + solver time. Lower: solver time. The solution phase fails on 16 384 cores.

8. A Biomechanical Problem. In this section, we consider a FETI-DP algorithm for the solution of a nonlinear elasticity problem from biomechanics. More precisely, we solve a hyperelastic, anisotropic, and almost incompressible material model of an arterial wall and compute the arterial wall stresses using a Newton-Krylov-FETI-DP approach. Here, the nonlinear elasticity problem is linearized using a Newton method combined with a homotopy method, i.e., a load stepping strategy where the load is applied incrementally in small steps and each resulting nonlinear problem is solved with a Newton algorithm. In each Newton step, the linearized problem is solved using a FETI-DP algorithm, with GMRES chosen as the Krylov space method. The results are part of a collaborative effort with Jörg Schröder and Dominik Brands, Institut für Mechanik A, Universität Duisburg-Essen, and Raimund Erbel and Dirk Böse, Westdeutsches Herzzentrum, Klinik für Kardiologie, Universität Duisburg-Essen. Experiments for different material models and a mock-up artery have been presented in Brands et al. [6]. Here, we present numerical results from the simulation of an arterial wall using real geometric data and also for a larger

Elasticity 2D – irFETI-DP/GMRES (H/h = 160)

Assembly + Solve

| #Cores | N | D.o.f. | It. | Time | Speedup | Ideal Speedup | Efficiency |
|---|---|---|---|---|---|---|---|
| 16 | 16 | 821 762 | 17 | 37s | 1.0 | 1 | 100% |
| 64 | 64 | 3 281 922 | 19 | 38s | 3.9 | 4 | 97% |
| 256 | 256 | 13 174 442 | 19 | 38s | 15.6 | 16 | 97% |
| 1 024 | 1 024 | 52 449 282 | 18 | 38s | 62.3 | 64 | 97% |
| 4 096 | 4 096 | 209 756 162 | 18 | 38s | 249.3 | 256 | 97% |
| 16 384 | 16 384 | 838 942 722 | 17 | 39s | 971.5 | 1 024 | 95% |

Solve (N, D.o.f., and It. as above)

| #Cores | Time | Speedup | Ideal Speedup | Efficiency |
|---|---|---|---|---|
| 16 | 22s | 1.0 | 1 | 100% |
| 64 | 23s | 3.8 | 4 | 96% |
| 256 | 24s | 14.7 | 16 | 92% |
| 1 024 | 24s | 58.7 | 64 | 92% |
| 4 096 | 23s | 244.9 | 256 | 96% |
| 16 384 | 24s | 938.7 | 1 024 | 92% |

TABLE 7.2. irFETI-DP/GMRES on JUGENE for 2D linear elasticity without coefficient jumps on $[0, 1]^2$, discretized with linear finite elements. Stopping criterion rtol = 1e-6. Upper: Assembly + Solver Time. Lower: Solver Time. One iteration of BoomerAMG for the coarse grid (symmetric Gauß-Seidel/Jacobi).

Elasticity 2D – irFETI-DP/GMRES (H/h = 144)

Assembly + Solve

| #Cores | N | D.o.f. | It. | Time | Speedup | Ideal Speedup | Efficiency |
|---|---|---|---|---|---|---|---|
| 64 | 64 | 2.7M | 18 | 29.3s | 1.0 | 1 | 100% |
| 256 | 256 | 10.6M | 19 | 30.3s | 3.9 | 4 | 97% |
| 1 024 | 1 024 | 42.5M | 18 | 30.1s | 15.6 | 16 | 97% |
| 4 096 | 4 096 | 169.9M | 18 | 30.1s | 62.3 | 64 | 97% |
| 16 384 | 16 384 | 680.0M | 16 | 30.6s | 245.1 | 256 | 96% |
| 65 536 | 65 536 | 2 718.1M | 16 | 35.7s | 840.4 | 1 024 | 82% |

Solve (N, D.o.f., and It. as above)

| #Cores | Time | Speedup | Ideal Speedup | Efficiency |
|---|---|---|---|---|
| 64 | 17.7s | 1.0 | 1 | 100% |
| 256 | 18.4s | 3.8 | 4 | 96% |
| 1 024 | 18.3s | 15.5 | 16 | 97% |
| 4 096 | 18.2s | 62.2 | 64 | 97% |
| 16 384 | 18.7s | 242.3 | 256 | 95% |
| 65 536 | 24.0s | 755.2 | 1 024 | 74% |

TABLE 7.3. irFETI-DP/GMRES on JUGENE for 2D linear elasticity without coefficient jumps on $[0, 1]^2$, discretized with linear finite elements. Stopping criterion rtol = 1e-6. Upper: Assembly + Solver Time. Lower: Solver Time. One BoomerAMG iteration for the coarse grid (symmetric SOR/Jacobi).


model than in [6]. The arterial wall is modeled as a fiber-reinforced material with two sets of collagen fibers which are wound crosswise helically around the axis that is orthogonal to each cross-section of the artery. Since these fibers become exponentially stiffer with increasing load, we can expect large anisotropy ratios.

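8.1. The Mechanical Model and its Discretization. The deformation $\varphi : \Omega \to \mathbb{R}^3$ of a body $\Omega \subset \mathbb{R}^3$ under a given volume force $f$, a prescribed deformation on $\partial\Omega_D$, and a given traction $g$ on $\partial\Omega_N$ is given by

\[
\begin{aligned}
-\operatorname{Div} T(x, \nabla\varphi(x)) &= f(x, \varphi(x)), & x &\in \Omega, \\
\varphi(x) &= \varphi_0(x), & x &\in \partial\Omega_D \subset \partial\Omega, \\
T(x, \nabla\varphi(x)) \cdot n &= g(x, \varphi(x)), & x &\in \partial\Omega_N := \partial\Omega \setminus \partial\Omega_D.
\end{aligned}
\]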

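Here, $T : \Omega \times \mathbb{M}^3_+ \to \mathbb{M}^3$ is the response function of the first Piola-Kirchhoff stress tensor. We consider a hyperelastic material, i.e., there exists an energy functional $\Psi : \Omega \times \mathbb{M}^3_+ \to \mathbb{R}$ such that $T(x, F) = \frac{\partial \Psi}{\partial F}(x, F)$ for all $x \in \Omega$ and all $F \in \mathbb{M}^3_+$. Here, $\mathbb{M}^3_+$ is the set of real $3 \times 3$ matrices with positive determinant. Following Balzani et al. [2], see also Brands et al. [6], we use the following model for $\Psi$,

\[
\Psi = \Psi^{\mathrm{iso}} + \sum_{k=1}^{2} \Psi^{\mathrm{ti}}_{k}\bigl(I_1, J_4^{(k)}, J_5^{(k)}\bigr). \tag{8.1}
\]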

Here, the isotropic part is

\[
\Psi^{\mathrm{iso}} = c_1 \bigl( I_1 I_3^{-1/3} - 3 \bigr) + \varepsilon_1 \bigl( I_3^{\varepsilon_2} + I_3^{-\varepsilon_2} - 2 \bigr), \tag{8.2}
\]

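with $c_1, \varepsilon_1 > 0$, $\varepsilon_2 > 1$ and the invariants $I_1 = \operatorname{tr}(C)$, $I_2 = \operatorname{tr}(\operatorname{Cof}(C))$, $I_3 = \det(C)$. Here, $\operatorname{Cof}(C) = (\det(C))\, C^{-T}$ is the cofactor of $C$ and $\operatorname{tr}(C)$ is the trace of $C$. The transversely isotropic function for the fibers ($\alpha_1 > 0$, $\alpha_2 > 2$) is defined as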

\[
\Psi^{\mathrm{ti}}_{k} = \alpha_1 \Bigl\langle \bigl( I_1 J_4^{(k)} - J_5^{(k)} \bigr) - 2 \Bigr\rangle^{\alpha_2}, \tag{8.3}
\]

with the invariants $J_4 = \operatorname{tr}(CM)$ and $J_5 = \operatorname{tr}(C^2 M)$. For values of the parameters used in the present article, see Brands et al. [6, Table 1, Set 2]. Here, $\langle b \rangle$ denotes the Macaulay bracket defined by $\langle b \rangle := (|b| + b)/2$ for $b \in \mathbb{R}$. Furthermore, $C := F^T F$ is the right Cauchy-Green tensor with $F := \nabla\varphi$, and $M_{(a)} := A_{(a)} \otimes A_{(a)}$ are the structural tensors, where $A_{(1)}$ and $A_{(2)}$ are approximations of the collagen fibre bundle orientations in the artery. The arterial wall is modeled as a quasi-incompressible material. In order to avoid volumetric locking effects in the finite element solution, we apply a three-field formulation, also known as the $\bar F$-approach; see Simo [49, Section 45] and also below. For the discretization of the deformation field $\varphi$, 10-noded tetrahedral elements are used. For a comparison of different hyperelastic material models applied to the modeling and simulation of arterial walls, see Brands et al. [6]. We will now briefly describe the three-field formulation and the $\bar F$-approach; for more details, see, e.g., Simo [49, Section 45]. Let $J = J(\varphi) = \det(F)$; then we have

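\[
F = J^{1/3} \hat F, \qquad \hat F = J^{-1/3} F.
\]

Next, we introduce a new scalar variable $\theta$, such that $\theta = J$ is satisfied in a weak sense, and define

\[
\bar F := \theta^{1/3} \hat F, \qquad \bar C := \bar F^T \bar F
\]

with $\bar F = \bar F(\varphi, \theta)$, $\bar C = \bar C(\varphi, \theta)$. Then, we consider the following three-field Lagrangian

\[
L(\varphi, \theta, \pi) = \int_\Omega W(\bar C(\varphi, \theta)) + \pi \bigl( J(\varphi) - \theta \bigr) \, dx - V_{\mathrm{ext}}(\varphi),
\]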

where $V_{\mathrm{ext}}(\varphi)$ is the potential energy of external forces; for more details, see Simo [49, Section 45]. We use a $P_2 - P_0 - P_0$ mixed finite element discretization, i.e., piecewise quadratic elements for the deformation field $\varphi$ and piecewise constant elements for the scalar fields $\theta$ and $\pi$. Element-by-element static condensation of $\theta$ and $\pi$ leads to a reduced problem that we solve using a Newton-Krylov-FETI-DP approach.
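To make the structure of the energy (8.1)-(8.3) concrete, the following sketch evaluates it for a given deformation gradient. It is only an illustration under stated assumptions: the parameter values and the two fiber directions are placeholders chosen for this example and are not the values of Brands et al. [6, Table 1, Set 2], and the helper names are hypothetical.

```python
import math


def matmul(a, b):
    """Product of two 3x3 matrices stored as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def transpose(a):
    return [[a[j][i] for j in range(3)] for i in range(3)]


def trace(a):
    return a[0][0] + a[1][1] + a[2][2]


def det3(a):
    return (a[0][0] * (a[1][1] * a[2][2] - a[1][2] * a[2][1])
            - a[0][1] * (a[1][0] * a[2][2] - a[1][2] * a[2][0])
            + a[0][2] * (a[1][0] * a[2][1] - a[1][1] * a[2][0]))


def macaulay(b):
    """Macaulay bracket <b> = (|b| + b) / 2."""
    return (abs(b) + b) / 2.0


def energy(F, fibers, c1, eps1, eps2, alpha1, alpha2):
    """Energy (8.1): isotropic part (8.2) plus one term (8.3) per fiber."""
    C = matmul(transpose(F), F)          # right Cauchy-Green tensor
    C2 = matmul(C, C)
    I1, I3 = trace(C), det3(C)
    psi = c1 * (I1 * I3 ** (-1.0 / 3.0) - 3.0) \
        + eps1 * (I3 ** eps2 + I3 ** (-eps2) - 2.0)
    for A in fibers:
        M = [[A[i] * A[j] for j in range(3)] for i in range(3)]  # A (x) A
        J4 = trace(matmul(C, M))
        J5 = trace(matmul(C2, M))
        psi += alpha1 * macaulay(I1 * J4 - J5 - 2.0) ** alpha2
    return psi


# Placeholder parameters (assumptions, not the values used in the article).
params = dict(c1=1.0, eps1=1.0, eps2=2.0, alpha1=1.0, alpha2=3.0)
beta = math.radians(40.0)                # hypothetical fiber angle
fibers = [(math.cos(beta), math.sin(beta), 0.0),
          (math.cos(beta), -math.sin(beta), 0.0)]

identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
s = 1.2 ** -0.5                          # lateral contraction keeps det F = 1
stretch = [[1.2, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, s]]

psi_reference = energy(identity, fibers, **params)
psi_stretched = energy(stretch, fibers, **params)
print(psi_reference, psi_stretched)
```

In the undeformed configuration ($F = I$) both parts vanish, since $I_1 = 3$, $I_3 = 1$, and $I_1 J_4 - J_5 - 2 = 0$ for unit fiber vectors; the volume-preserving stretch gives a positive energy.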


8.2. An Anisotropic Model Problem. As described above, the arterial wall is modeled as a fiber-reinforced material. We can expect large anisotropies from the exponential stiffening of the fibers when increasing loads are applied. It is well known that large anisotropies can have a negative effect on the convergence of iterative solvers since the anisotropies influence the distribution of the eigenvalues of the stiffness matrix. In domain decomposition methods, for scalar problems, anisotropies can often be treated by an anisotropic decomposition of the computational domain. To computationally analyze these effects, let us now discuss the effect of anisotropies on the convergence of a FETI-DP domain decomposition method for a linear elasticity problem; see also Klawonn and Rheinbach [27]. In the FETI-DP domain decomposition method, we may consider three remedies for anisotropies: adaptation of the coarse problem, modification of the preconditioner, and adaptation of the domain decomposition. Experiments for a linear problem in 2D and for a single direction of the anisotropy show that enlarging the coarse problem, i.e., the number of primal constraints, can improve the convergence. In principle, when using a very large coarse space, independence of the anisotropy can be achieved. Unfortunately, these coarse spaces are too large to be efficient for 3D problems within the context of a standard FETI-DP algorithm [24]. An inexact FETI-DP variant [25] could be considered instead, but then a good preconditioner for the new coarse problem, which is then anisotropic itself, has to be constructed.

For the standard exact FETI-DP algorithm, we thus investigate the adaptation of the domain decomposition to the anisotropic problem. In geometric multigrid methods, semi-coarsening and line relaxation have classically been used to deal with anisotropies, and in algebraic multigrid methods, coarsening in the direction of strong coupling is often successful [51]. In all of these cases, the intention is to accelerate the transport of information in the direction in which the problem is strongly coupled. An adaptation of the domain decomposition may follow the same principle. Here, the subdomains are enlarged in the direction of the strong couplings.

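For our model problem, we leave the setting of anisotropic hyperelasticity of arterial walls and restrict ourselves to the problem of linear elasticity on the unit cube

\[
a(u, v) = f(v),
\]

with

\[
a(u, v) = \int_\Omega \sum_{i,j=1}^{3} \sigma_{ij}(u)\, \varepsilon_{ij}(v) \, dx = \int_\Omega \sigma(u) : \varepsilon(v) \, dx, \tag{8.4}
\]

\[
f(v) = \int_\Omega \sum_{i=1}^{3} f_i v_i \, dx + \int_{\partial\Omega_N} \sum_{i=1}^{3} g_i v_i \, ds = \int_\Omega \langle f, v \rangle \, dx + \int_{\partial\Omega_N} \langle g, v \rangle \, ds,
\]

in the space

\[
V = \mathbf{H}^1_0(\Omega, \partial\Omega_D) = \bigl( H^1_0(\Omega, \partial\Omega_D) \bigr)^3 \quad \text{with} \quad H^1_0(\Omega, \partial\Omega_D) := \{ v \in H^1(\Omega) : v = 0 \text{ on } \partial\Omega_D \},
\]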

where $\partial\Omega_D$ has a positive measure and $\partial\Omega_D \subset \partial\Omega$. We set zero Dirichlet boundary conditions on the upper face of the cube and apply a constant volume force. As primal constraints in our FETI-DP algorithm, we only use edge averages, which are enforced to be continuous on all edges and for all components $u_i$, $i = 1, 2, 3$. This algorithm is also denoted as Algorithm $D_E$; see [24]. We consider a homogeneous material and choose arbitrary material parameters, i.e., $E = 210$ and $\nu = 0.3$, with two embedded anisotropies, see Fig. 8.1, with a stiffness which is increased by a factor of $10^2$. We can think of the cube as a model for a small part of the arterial wall.

In the anisotropic case, for the stress-strain relation, we have

σ = Cε

with

\[
C = C_{\mathrm{iso}} + C_{\mathrm{aniso}}\bigl(\tfrac{1}{4}\pi\bigr) + C_{\mathrm{aniso}}\bigl(\tfrac{3}{4}\pi\bigr),
\]


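and

\[
C_{\mathrm{aniso}}(\alpha) = 10^2 \times E
\begin{pmatrix}
\cos^4(\alpha) & \cos^2(\alpha)\sin^2(\alpha) & 0 & \cos^3(\alpha)\sin(\alpha) & 0 & 0 \\
\cos^2(\alpha)\sin^2(\alpha) & \sin^4(\alpha) & 0 & \cos(\alpha)\sin^3(\alpha) & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
\cos^3(\alpha)\sin(\alpha) & \cos(\alpha)\sin^3(\alpha) & 0 & \cos^2(\alpha)\sin^2(\alpha) & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 & 0 & 0
\end{pmatrix}.
\]

The isotropic part is standard isotropic linear elasticity; we have $\lambda = \frac{E\nu}{(1+\nu)(1-2\nu)}$, $\mu = \frac{E}{2(1+\nu)}$, and

\[
C_{\mathrm{iso}} =
\begin{pmatrix}
\lambda + 2\mu & \lambda & \lambda & 0 & 0 & 0 \\
\lambda & \lambda + 2\mu & \lambda & 0 & 0 & 0 \\
\lambda & \lambda & \lambda + 2\mu & 0 & 0 & 0 \\
0 & 0 & 0 & \mu & 0 & 0 \\
0 & 0 & 0 & 0 & \mu & 0 \\
0 & 0 & 0 & 0 & 0 & \mu
\end{pmatrix}.
\]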
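The two matrices can be assembled in a few lines, which also makes the structure of the anisotropic part visible: its entries are the products $d_i d_j$ of the vector $d = (\cos^2\alpha, \sin^2\alpha, 0, \cos\alpha\sin\alpha, 0, 0)$, scaled by $10^2 E$. The sketch below uses $E = 210$ and $\nu = 0.3$ from the text; the function names and the angle passed in the usage lines are assumptions made for illustration.

```python
import math


def lame(E, nu):
    """Lame parameters from Young's modulus E and Poisson ratio nu."""
    lam = E * nu / ((1.0 + nu) * (1.0 - 2.0 * nu))
    mu = E / (2.0 * (1.0 + nu))
    return lam, mu


def c_iso(E, nu):
    """6x6 isotropic elasticity matrix in the ordering used above."""
    lam, mu = lame(E, nu)
    C = [[0.0] * 6 for _ in range(6)]
    for i in range(3):
        for j in range(3):
            C[i][j] = lam
        C[i][i] = lam + 2.0 * mu
        C[i + 3][i + 3] = mu
    return C


def c_aniso(alpha, E, factor=100.0):
    """Anisotropic part: factor * E * d d^T for a direction in the x-y plane."""
    c, s = math.cos(alpha), math.sin(alpha)
    d = [c * c, s * s, 0.0, c * s, 0.0, 0.0]
    return [[factor * E * d[i] * d[j] for j in range(6)] for i in range(6)]


def add(a, b):
    return [[a[i][j] + b[i][j] for j in range(6)] for i in range(6)]


E, nu = 210.0, 0.3
lam, mu = lame(E, nu)
angle = math.pi / 4.0                    # illustrative angle (assumption)
C_total = add(c_iso(E, nu), c_aniso(angle, E))
print(round(lam, 4), round(mu, 4), C_total[0][0])
```

Because the anisotropic part is a rank-one matrix, it stiffens the material only with respect to strain in the chosen direction and leaves the response in all other directions isotropic.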

We discretize the problem using linear tetrahedra and decompose it either regularly or using the ParMetis [20] graph partitioner. As a stopping criterion for the conjugate gradient iteration, we use a reduction of the residual by 10 orders of magnitude. In Figs. 8.1, 8.2, and 8.3, we see the results of the computations. It is evident that, for structured as well as for unstructured decompositions, the adaptation of the domain decomposition to the directions of the anisotropy results in faster convergence of the FETI-DP domain decomposition method. These results give rise to the hope that similar strategies may also be successful in the setting of finite elasticity.

8.3. Newton-Krylov-FETI-DP. For a computational simulation of a balloon angioplasty, we use the almost incompressible, nonlinear material model described in Section 8. The anisotropic parts of the energy function, see (8.3), which model the two collagen fibers, show an exponential stiffening behavior. The problem will therefore become increasingly anisotropic during the simulation. For more details on the material models and alternative choices, see [6], where the model used here was denoted as Model $\psi_A$.

The arterial geometry is obtained from the fusion of ultrasound images and X-ray data, see Fig. 8.4. For more details on the reconstruction, see von Birgelen, Böse, and Erbel [5] and Böse et al. [4]. The geometry is discretized with quadratic tetrahedral (P2) elements for the deformation combined with an $\bar F$-approach to avoid possible locking from the incompressibility constraint, see Section 8.1 and Simo [49]. The resulting finite element model has about 1.3 million degrees of freedom for the deformation and is partitioned into 224 subdomains using ParMetis [20]. A decomposition into 720 subdomains for a larger model is shown in Fig. 8.5. The problem is then solved using a Newton-Krylov-FETI-DP approach; see Fig. 8.6. The system is linearized, and a load stepping strategy is used to enforce quadratic convergence of the Newton method in each step. Each linear system that occurs is then solved using a FETI-DP method with GMRES as Krylov accelerator. A selective reuse of Krylov subspaces in the Newton iteration can improve the convergence of nonoverlapping domain decomposition methods; see [17]. For nonoverlapping domain decomposition methods such as FETI methods, such a strategy is feasible since the Lagrange multiplier space is small compared to the dimension of the overall problem. For our simulations, we have chosen not to reuse Krylov subspaces of previous Newton steps. As we can see from Fig. 8.6, between 3 and 4 Newton steps are needed in each load step, and between 139 and 153 FETI-DP iterations are necessary in each Newton step. In total, approximately 1200 linear systems have to be solved during the simulation. In Fig. 8.7, a resulting stress distribution is depicted for a short arterial segment.
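The interplay of load stepping and Newton's method can be sketched on a toy problem. The example below is not the arterial wall model: it uses a single scalar unknown with an exponentially stiffening internal force $a(e^{bu} - 1)$, chosen here only to mimic the stiffening behavior, and the tangent solve is a scalar division standing in for the GMRES/FETI-DP solve of the linearized system. All names and parameter values are assumptions made for illustration.

```python
import math


def internal_force(u, a=1.0, b=1.0):
    """Toy exponentially stiffening response (stand-in for the FE residual)."""
    return a * (math.exp(b * u) - 1.0)


def tangent(u, a=1.0, b=1.0):
    """Derivative of internal_force (stand-in for the tangent stiffness)."""
    return a * b * math.exp(b * u)


def newton_load_stepping(total_load, n_steps, tol=1e-10, max_newton=50):
    """Apply the load in n_steps increments; solve each step with Newton,
    starting from the converged solution of the previous load step."""
    u = 0.0
    newton_counts = []
    for step in range(1, n_steps + 1):
        load = total_load * step / n_steps
        iterations = 0
        residual = internal_force(u) - load
        while abs(residual) >= tol:
            if iterations == max_newton:
                raise RuntimeError(f"Newton failed in load step {step}")
            # In the article, this linear solve is done by FETI-DP with GMRES.
            u -= residual / tangent(u)
            iterations += 1
            residual = internal_force(u) - load
        newton_counts.append(iterations)
    return u, newton_counts


u_final, newton_counts = newton_load_stepping(total_load=10.0, n_steps=10)
print(u_final, newton_counts)
```

Starting each load step from the previous converged state keeps the initial guess close to the new solution, which is the reason for using small load increments.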

Acknowledgments. The authors would like to thank the Jülich Supercomputing Center (JSC) of the Forschungszentrum Jülich, Germany, and also gratefully acknowledge the support by the Deutsche Forschungsgemeinschaft under grant DFG-KL 2094/1.

REFERENCES


[1] Satish Balay, Kris Buschelman, William D. Gropp, Dinesh Kaushik, Matt Knepley, Lois Curfman McInnes, Barry F. Smith, and Hong Zhang. PETSc users manual. Technical Report ANL-95/11 - Revision 2.2.3, Argonne National Laboratory, 2007.

[2] D. Balzani, P. Neff, J. Schröder, and G.A. Holzapfel. A polyconvex framework for soft biological tissues. Adjustment to experimental data. Int. J. Solids Struct., 43(20):6052–6070, 2006.

[3] Manoj Bhardwaj, David Day, Charbel Farhat, Michel Lesoinne, Kendall Pierson, and Daniel Rixen. Application of the FETI method to ASCI problems - scalability results on one thousand processors and discussion of highly heterogeneous problems. Int. J. Numer. Meth. Engrg., 47:513–535, 2000.

[4] Dirk Böse, Dominik Brands, Raimund Erbel, Axel Klawonn, Oliver Rheinbach, and Jörg Schröder. Construction of finite element models from IVUS ultrasound and biplane angiography. 2009. In preparation.

[5] Dirk Böse, Clemens von Birgelen, and Raimund Erbel. Intravascular ultrasound for the evaluation of therapies targeting coronary atherosclerosis. Journal of the American College of Cardiology, 49(9):925–932, 2007.

[6] Dominik Brands, Axel Klawonn, Oliver Rheinbach, and Jörg Schröder. Modelling and convergence in arterial wall simulations using a parallel FETI solution strategy. Comput. Meth. Biomech. Biomed. Eng., 11(5):569–583, 2008.

[7] Philippe G. Ciarlet. Mathematical Elasticity Volume I: Three-Dimensional Elasticity. North-Holland, 1988.

[8] Clark R. Dohrmann. An approximate BDDC preconditioner. Numer. Linear Algebra Appl., 14(2):149–168, 2007.

[9] Clark R. Dohrmann, Axel Klawonn, and Olof B. Widlund. Domain decomposition for less regular subdomains: overlapping Schwarz in two dimensions. SIAM J. Numer. Anal., 46(4):2153–2168, 2008.

[10] C. Farhat and F. X. Roux. An unconventional domain decomposition method for an efficient parallel solution of large-scale finite element systems. SIAM J. Sc. Stat. Comput., 13:379–396, 1992.

[11] Charbel Farhat. A Lagrange multiplier based on divide and conquer finite element algorithm. J. Comput. System Engrg, 2:149–156, 1991.

[12] Charbel Farhat, Michel Lesoinne, Patrick LeTallec, Kendall Pierson, and Daniel Rixen. FETI-DP: A dual-primal unified FETI method - part I: A faster alternative to the two-level FETI method. Internat. J. Numer. Methods Engrg., 50:1523–1544, 2001.

[13] Charbel Farhat, Michel Lesoinne, and Kendall Pierson. A scalable dual-primal domain decomposition method. Numer. Lin. Alg. Appl., 7:687–714, 2000.

[14] Charbel Farhat, Jan Mandel, and François-Xavier Roux. Optimal convergence properties of the FETI domain decomposition method. Comput. Methods Appl. Mech. Engrg., 115:367–388, 1994.

[15] Charbel Farhat and François-Xavier Roux. Implicit parallel processing in structural mechanics. In J. Tinsley Oden, editor, Computational Mechanics Advances, volume 2 (1), pages 1–124. North-Holland, 1994.

[16] Charbel Farhat and François-Xavier Roux. A method of Finite Element Tearing and Interconnecting and its parallel solution algorithm. Int. J. Numer. Meth. Engrg., 32:1205–1227, 1991.

[17] Pierre Gosselet and Christian Rey. On a selective reuse of Krylov subspaces in Newton-Krylov approaches for nonlinear elasticity. In Domain decomposition methods in science and engineering, pages 419–426 (electronic). Natl. Auton. Univ. Mex., México, 2003.

[18] Pierre Gosselet and Christian Rey. Non-overlapping domain decomposition methods in structural mechanics. Arch. Comput. Methods Engrg., 13(4):515–572, 2006.

[19] Van E. Henson and Ulrike M. Yang. BoomerAMG: A parallel algebraic multigrid solver and preconditioner. Appl. Numer. Math., 41:155–177, 2002.

[20] George Karypis, Kirk Schloegel, and Vipin Kumar. ParMetis - parallel graph partitioning and sparse matrix ordering, version 3.1. Technical report, University of Minnesota, Department of Computer Science and Engineering, August 2003.

[21] Axel Klawonn. An iterative substructuring method with Lagrange multipliers for elasticity problems using approximate Neumann subdomain solvers. In A.-M. Sändig, W. Schiehlen, and W.L. Wendland, editors, Multifield Problems: State of the Art, pages 193–200. Springer-Verlag, Berlin, 2000.

[22] Axel Klawonn. FETI domain decomposition methods for second order elliptic partial differential equations. GAMM-Mitt., 29(2):319–341, 2006.

[23] Axel Klawonn, Luca F. Pavarino, and Oliver Rheinbach. Spectral element FETI-DP and BDDC preconditioners with multi-element subdomains. Comput. Meth. Appl. Mech. Engrg., 198(3-4):511–523, 2008.

[24] Axel Klawonn and Oliver Rheinbach. A parallel implementation of Dual-Primal FETI methods for three dimensional linear elasticity using a transformation of basis. SIAM J. Sci. Comput., 28:1886–1906, 2006.

[25] Axel Klawonn and Oliver Rheinbach. Inexact FETI-DP methods. Internat. J. Numer. Methods Engrg., 69(2):284–307, 2007.

[26] Axel Klawonn and Oliver Rheinbach. Robust FETI-DP methods for heterogeneous three dimensional elasticity problems. Comput. Methods Appl. Mech. Engrg., 196(8):1400–1414, 2007.

[27] Axel Klawonn and Oliver Rheinbach. FETI-DP for anisotropic problems. PAMM, pages 10189–10190, 2008. DOI: 10.1002/pamm.200810189, Special issue: 79th annual meeting of the International Association of Applied Mathematics and Mechanics (GAMM).

[28] Axel Klawonn and Oliver Rheinbach. A hybrid approach to 3-level FETI. PAMM, pages 10841–10843, 2008. DOI: 10.1002/pamm.200810841, Special issue: 79th annual meeting of the International Association of Applied Mathematics and Mechanics (GAMM).

[29] Axel Klawonn, Oliver Rheinbach, and Olof Widlund. Parallel FETI-DP for problems with generalized coefficient jumps. 2009. In preparation.

[30] Axel Klawonn, Oliver Rheinbach, and Olof B. Widlund. Some computational results for Dual-Primal FETI methods for three dimensional elliptic problems. In Ralf Kornhuber, Ronald H.W. Hoppe, Jacques Périaux, Olivier Pironneau, Olof B. Widlund, and Jinchao Xu, editors, Domain Decomposition Methods in Science and Engineering. Springer-Verlag, Lecture Notes in Computational Science and Engineering, 2004. Proceedings of the 15th International Conference on Domain Decomposition Methods, Berlin, July 21-25, 2003.

[31] Axel Klawonn, Oliver Rheinbach, and Olof B. Widlund. Some computational results for dual-primal FETI methods for elliptic problems in 3D. In Ralf Kornhuber, Ronald H. W. Hoppe, Jacques Périaux, Olivier Pironneau, Olof B. Widlund, and Jinchao Xu, editors, Proceedings of the 15th international domain decomposition conference, pages 361–368, Berlin, 2005. Springer LNCSE. Lect. Notes Comput. Sci. Eng.

[32] Axel Klawonn, Oliver Rheinbach, and Olof B. Widlund. An analysis of a FETI-DP algorithm on irregular subdomains in the plane. SIAM Journal on Numerical Analysis, 46(5):2484–2504, 2008.

[33] Axel Klawonn and Olof B. Widlund. A domain decomposition method with Lagrange multipliers for second order elasticity. In C.-H. Lai, P.E. Bjørstad, M. Cross, and O. Widlund, editors, Proceedings of the 11th International Conference on Domain Decomposition Methods, pages 49–56. DDM.org, 1999.

[34] Axel Klawonn and Olof B. Widlund. A domain decomposition method with Lagrange multipliers and inexact solvers for linear elasticity. SIAM J. Sci. Comput., 22(4):1199–1219, 2000.

[35] Axel Klawonn and Olof B. Widlund. Dual and dual-primal FETI methods for elliptic problems with discontinuous coefficients in three dimensions. In Domain Decomposition Methods, Proceedings of the 12th International Conference on Domain Decomposition Methods, Chiba, Japan, October 1999. DDM.org, 2001.

[36] Axel Klawonn and Olof B. Widlund. FETI and Neumann-Neumann iterative substructuring methods: connections and new results. Communications on Pure and Applied Mathematics, LIV:57–90, 2001.

[37] Axel Klawonn and Olof B. Widlund. Selecting constraints in Dual-Primal FETI methods for elasticity in three dimensions. In R. Kornhuber, R.H.W. Hoppe, D.E. Keyes, J. Périaux, O. Pironneau, O. Widlund, and J. Xu, editors, Domain Decomposition Methods in Science and Engineering, pages 67–81. Springer-Verlag, Lecture Notes in Computational Science and Engineering, 2005. Proceedings of the 15th International Conference on Domain Decomposition Methods, Berlin, July 21-25, 2003.

[38] Axel Klawonn and Olof B. Widlund. Dual-Primal FETI Methods for Linear Elasticity. Comm. Pure Appl. Math., 59:1523–1572, 2006.

[39] Axel Klawonn, Olof B. Widlund, and Maksymilian Dryja. Dual-Primal FETI methods for three-dimensional elliptic problems with heterogeneous coefficients. SIAM J. Numer. Anal., 40:159–179, 2002.

[40] U. Langer, G. Of, O. Steinbach, and W. Zulehner. Inexact data-sparse boundary element tearing and interconnecting methods. SIAM J. Sci. Comput., 29(1):290–314 (electronic), 2007.

[41] Ulrich Langer and Clemens Pechstein. Coupled finite and boundary element tearing and interconnecting solvers for nonlinear potential problems. ZAMM Z. Angew. Math. Mech., 86(12):915–931, 2006.

[42] Jing Li. A dual-primal FETI method for incompressible Stokes equations. Numer. Math., 102(2):257–275, 2005.

[43] Jing Li and Olof B. Widlund. On the use of inexact subdomain solvers for BDDC algorithms. Comput. Meth. Appl. Mech. Engrg., 196:1415–1428, 2007.

[44] Jan Mandel and Radek Tezaur. Convergence of a Substructuring Method with Lagrange Multipliers. Numer. Math., 73:473–487, 1996.

[45] Jan Mandel and Radek Tezaur. On the convergence of a dual-primal substructuring method. Numer. Math., 88:543–558, 2001.

[46] Clemens Pechstein and Robert Scheichl. Analysis of FETI methods for multiscale PDEs. Numer. Math., 111(2):293–333, 2008.

[47] Kendall H. Pierson. A family of domain decomposition methods for the massively parallel solution of computational mechanics problems. PhD thesis, University of Colorado at Boulder, Aerospace Engineering, 2000.

[48] Alfio Quarteroni and Alberto Valli. Domain Decomposition Methods for Partial Differential Equations. Oxford Science Publications, 1999.

[49] J.C. Simo. Numerical analysis and simulation of plasticity. In P.G. Ciarlet and J.L. Lions, editors, Handbook of numerical analysis, number 6. Elsevier Science, 1998.

[50] Barry F. Smith, Petter E. Bjørstad, and William Gropp. Domain Decomposition: Parallel Multilevel Methods for Elliptic Partial Differential Equations. Cambridge University Press, 1996.

[51] Klaus Stüben. An Introduction to Algebraic Multigrid. In Multigrid, pages 413–532. Academic Press, London, San Diego, 2001. Also available as GMD Report 70, November 1999.

[52] Andrea Toselli and Olof B. Widlund. Domain Decomposition Methods - Algorithms and Theory, volume 34 of Springer Series in Computational Mathematics. Springer-Verlag, Berlin Heidelberg New York, 2005.

[53] Xuemin Tu. Three-level BDDC in three dimensions. SIAM J. Sci. Comput., 29(4):1759–1780 (electronic), 2007.

[54] Xuemin Tu. Three-level BDDC in two dimensions. Internat. J. Numer. Methods Engrg., 69:33–59, 2007.


FIG. 3.1. Definition of faces, edges, and vertices of a tetrahedron in 3D.

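[Three panels, left to right:
FETI-1 ($H^{-1} = 4$) = hFETI($H_S^{-1} = 4$, $H_C^{-1} = 4$): system $B K^{+} B^T \lambda = d$, preconditioner $P B_D (R_\Gamma)^T S_{\Gamma\Gamma} R_\Gamma B_D^T P^T$.
FETI-DP ($H^{-1} = 4$) = hFETI($H_S^{-1} = 4$, $H_C^{-1} = 1$): system $B K^{-1} B^T \lambda = d$, preconditioner $B_D (R_\Gamma)^T S_{\Gamma\Gamma} R_\Gamma B_D^T$.
Hybrid method = hFETI($H_S^{-1} = 4$, $H_C^{-1} = 2$): system $B K^{+} B^T \lambda = d$, preconditioner $P B_D (R_\Gamma)^T S_{\Gamma\Gamma} R_\Gamma B_D^T P^T$.]

FIG. 3.2. Different FETI methods. Systems to be solved in FETI-1, FETI-DP, and hybrid FETI and the corresponding preconditioners.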


[Three plots of the condition number, left to right: versus $1/H_C = 2, \ldots, 16$ (with $H_S/h = 2$, $H_C/H_S = 2$); versus $H_S/h = 2, \ldots, 16$ (with $H_C/H_S = 2$, $1/H_C = 2$); versus $H_C/H_S = 2, \ldots, 16$ (with $H_S/h = 2$, $1/H_C = 2$).]

FIG. 4.1. Scalability properties of the preconditioned hybrid FETI method with respect to the number of subclusters, the size of the subdomains, and the number of subdomains per subcluster.

| H/h | $\lambda_{\max}$ | $\lambda_{\min}$ | Cond. | It. |
|---|---|---|---|---|
| 3 | 2.0223 | 1.0012 | 2.0199 | 9 |
| 4 | 2.6624 | 1.0040 | 2.6518 | 10 |
| 5 | 3.1885 | 1.0071 | 3.1660 | 12 |
| 6 | 3.6411 | 1.0087 | 3.6097 | 15 |
| 7 | 4.0390 | 1.0117 | 3.9923 | 15 |
| 8 | 4.3938 | 1.0074 | 4.3615 | 17 |
| 9 | 4.7141 | 1.0055 | 4.6883 | 18 |
| 10 | 5.0061 | 1.0051 | 4.9807 | 18 |
| 11 | 5.2744 | 1.0050 | 5.2482 | 18 |
| 12 | 5.5227 | 1.0048 | 5.4963 | 18 |
| 14 | 5.9702 | 1.0048 | 5.9417 | 18 |
| 16 | 6.3655 | 1.0047 | 6.3357 | 19 |

[Plot: condition number $\kappa$ versus H/h, for H/h from 2 to 16.]

FIG. 4.2. FETI method using $Q = Q_\delta$ and the Dirichlet preconditioner $M^{-1}$ for a 3D scalar problem with $64 = 4 \times 4 \times 4$ subdomains. The coefficient jumps of $10^6$ are aligned with the interface and arranged in layers. Left: Largest eigenvalue $\lambda_{\max}$, smallest eigenvalue $\lambda_{\min}$, condition number Cond. $= \lambda_{\max}/\lambda_{\min}$, and iteration count. Right: Least square logarithmic fit of a second order polynomial in $\log(H/h)$ shows a good agreement to the theory.
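A least squares fit of this kind can be sketched directly from the tabulated condition numbers. The code below fits $\kappa \approx c_0 + c_1 x + c_2 x^2$ with $x = \log(H/h)$ by solving the $3 \times 3$ normal equations; this is one straightforward way to do such a fit and is not necessarily the procedure used to produce the figure. The function name `fit_quadratic` is hypothetical.

```python
import math

# H/h and condition numbers from the table of Fig. 4.2
h_ratio = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 14, 16]
cond = [2.0199, 2.6518, 3.1660, 3.6097, 3.9923, 4.3615,
        4.6883, 4.9807, 5.2482, 5.4963, 5.9417, 6.3357]


def fit_quadratic(xs, ys):
    """Least squares coefficients (c0, c1, c2) of y ~ c0 + c1*x + c2*x**2,
    obtained from the normal equations by Gaussian elimination."""
    s = [sum(x ** k for x in xs) for k in range(5)]
    t = [sum(y * x ** k for x, y in zip(xs, ys)) for k in range(3)]
    A = [[s[i + j] for j in range(3)] + [t[i]] for i in range(3)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for r in range(col + 1, 3):
            factor = A[r][col] / A[col][col]
            for k in range(col, 4):
                A[r][k] -= factor * A[col][k]
    coef = [0.0, 0.0, 0.0]
    for i in reversed(range(3)):
        tail = sum(A[i][j] * coef[j] for j in range(i + 1, 3))
        coef[i] = (A[i][3] - tail) / A[i][i]
    return coef


x = [math.log(r) for r in h_ratio]
c0, c1, c2 = fit_quadratic(x, cond)
residuals = [c0 + c1 * xi + c2 * xi ** 2 - yi for xi, yi in zip(x, cond)]
max_residual = max(abs(r) for r in residuals)
print(c0, c1, c2, max_residual)
```

A small maximum residual together with a positive quadratic coefficient is what one would expect if the condition number grows like a second order polynomial in $\log(H/h)$.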

FIG. 5.1. Standard nodal finite element basis in 1D (left). Finite element basis including an average and additional basis functions with zero average (right).


[Plot: condition number $\kappa$ versus H/h, for H/h from 0 to 25.]

FIG. 5.2. Condition number of a FETI-DP algorithm for 3D linear elasticity and a least square fit of a second order polynomial in $\log(H/h)$. We have $64 = 4 \times 4 \times 4$ subdomains and H/h from 3 to 23. FETI-DP Algorithm $D_E$ is used, see [24], i.e., only edge averages are used as primal constraints. A local transformation of basis on each edge is applied.

FIG. 6.1. Subdomains from mesh partitioners [20] in general have ragged interfaces.


FIG. 6.2. A Jones domain with a von Koch curve boundary, recursion level 4, triangulated with linear finite elements, see [32, Figure 5.1].

[Plot: condition number $\kappa$ versus H/h, for H/h from 0 to 600.]

FIG. 6.3. Least square fit of a second order polynomial in $\log(H/h)$ to the data from Table 6.1 for the case of $\rho$-scaling.

[Panels, left to right: directions of anisotropy; 100 iterations; 189 iterations.]

FIG. 8.1. Left: Model problem of linear elasticity on the unit cube with two orthogonal anisotropy directions. The stiffness is $10^2$ times higher in the directions of the anisotropy. The model problem is then discretized using linear tetrahedral elements, resulting in $21^3 \times 3 = 27\,783$ degrees of freedom. Middle: Regular decomposition into $4 \times 4 \times 4$ subdomains and number of FETI-DP (Alg. $D_E$) iterations. Right: Unstructured decomposition using a mesh partitioner into 32 subdomains and number of FETI-DP (Alg. $D_E$) iterations. For Alg. $D_E$, see [24].


[Panels, left to right: 56 iterations; 172 iterations; 173 iterations.]

FIG. 8.2. Model problem, see Fig. 8.1. Regular decomposition into $2 \times 2 \times 8$, $2 \times 8 \times 2$, and $8 \times 2 \times 2$ subdomains. Each subdomain has $7^3 \times 3 = 1\,029$ degrees of freedom. The total problem has 24 843 degrees of freedom. The correct adaptation of the domain decomposition to the anisotropies improves the results, cf. Fig. 8.1.

[Panels, left to right: 79 iterations; > 250 iterations; > 250 iterations.]

FIG. 8.3. Model problem, see Fig. 8.1. Irregular decomposition into 32 subdomains, with three different orientations. The total problem has 24 843 degrees of freedom. The correct adaptation of the domain decomposition to the anisotropies clearly improves the results, cf. Fig. 8.1.

FIG. 8.4. Arterial segment reconstructed from ultrasound and X-ray data, see [4].


FIG. 8.5. Visualization of an arterial segment decomposed into 720 subdomains.

[Three plots: number of Newton iterations in each time step versus time (0 to 3); log10 of the absolute residual versus global Newton step (0 to 1200); number of FETI iterations versus global Newton step (0 to 1200).]

FIG. 8.6. Newton-Krylov-FETI-DP algorithm using only vertex constraints; computing the stress of an arterial wall, see Fig. 8.7.


FIG. 8.7. Von Mises equivalent stress in an atherosclerotic artery during a balloon angioplasty; values of high stress in red and of low stress in blue.
