Journal of Physics: Conference Series (Open Access)

To cite this article: Waad Subber and Abhijit Sarkar 2010 J. Phys.: Conf. Ser. 256 012001
Primal and dual-primal iterative substructuring
methods of stochastic PDEs
Waad Subber1 and Abhijit Sarkar2
Department of Civil and Environmental Engineering, Carleton University, Ottawa, Ontario, K1S 5B6, Canada
E-mail: 1 wsubber@connect.carleton.ca, 2 abhijit_sarkar@carleton.ca
Abstract. A novel non-overlapping domain decomposition method is proposed to solve the large-scale linear system arising from the finite element discretization of stochastic partial differential equations (SPDEs). The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes using the Polynomial Chaos expansion. The algorithm offers a direct approach to formulate a two-level scalable preconditioner. The proposed preconditioner strictly enforces the continuity condition on the corner nodes of the interface boundary, while weakly satisfying the continuity condition over the remaining interface nodes. This approach relates to a primal version of an iterative substructuring method. Next, a Lagrange multiplier based dual-primal domain decomposition method is introduced in the context of SPDEs. In the dual-primal method, the continuity condition on the corner nodes is strictly satisfied, while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. For numerical illustration, a two-dimensional elliptic SPDE with non-Gaussian random coefficients is considered. The numerical results demonstrate the scalability of these algorithms with respect to the mesh size, subdomain size, fixed problem size per subdomain, order of the Polynomial Chaos expansion, and level of uncertainty in the input parameters. The numerical experiments are performed on a Linux cluster using the MPI and PETSc libraries.
1 Introduction
A domain decomposition method of SPDEs is introduced in [1, 2] to quantify uncertainty in large-scale linear systems. The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes. A parallel preconditioned conjugate gradient method (PCGM) is adopted in [3] to solve the interface problem without explicitly constructing the Schur complement system. The parallel performance of the algorithms is demonstrated using a lumped preconditioner for non-Gaussian systems arising from a hydraulic problem having random soil permeability properties.
A one-level Neumann-Neumann domain decomposition preconditioner for SPDEs is introduced in [4] in order to enhance the performance of the parallel PCGM iterative solver in [3]. The implementation of the algorithm requires a local solve of a stochastic Dirichlet problem followed by a local solve of a stochastic Neumann problem in each iteration of the PCGM solver. The multilevel sparsity structure of the coefficient matrices of the stochastic system, namely (a) the sparsity structure due to the finite element discretization and (b) the block sparsity structure due to the Polynomial Chaos expansion, is exploited for computational efficiency. The one-level Neumann-Neumann preconditioner in [4] demonstrates good (strong and weak) scalability for the moderate range of CPUs considered.

High Performance Computing Symposium (HPCS2010), IOP Publishing. Journal of Physics: Conference Series 256 (2010) 012001, doi:10.1088/1742-6596/256/1/012001
© 2010 IOP Publishing Ltd
In this paper, we first describe a primal version of iterative substructuring methods for the solution of the large-scale linear system arising from the stochastic finite element method. The algorithm offers a straightforward approach to formulate a two-level scalable preconditioner. The continuity condition is strictly enforced on the corner nodes (nodes shared among more than two subdomains, including the nodes at the ends of the interface edges). For the remaining part of the interface boundary, the continuity condition is satisfied weakly (in an average sense). Note that the continuity of the solution field across the entire interface boundary is eventually satisfied at the convergence of the iterations. This approach naturally leads to a coarse grid which connects the subdomains globally via the corner nodes. The coarse grid provides a mechanism to propagate information globally, which makes the algorithm scalable with respect to the subdomain size. In the second part of the paper, a dual-primal iterative substructuring method is introduced for SPDEs, which may be viewed as an extension of the Dual-Primal Finite Element Tearing and Interconnecting method (FETI-DP) [5] to the context of SPDEs. In this approach, the continuity condition on the corner nodes is strictly satisfied by partial assembly, while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. A system of Lagrange multipliers (also called the dual variables) is solved iteratively using the PCGM equipped with a Dirichlet preconditioner. The PETSc [6] and MPI [7] libraries are used for efficient parallel implementation of the primal and dual-primal algorithms. The graph partitioning tool METIS [8] is used for optimal decomposition of the finite element mesh for load balancing and minimum interprocessor communication. The parallel performance of the algorithms is studied for a two-dimensional stochastic elliptic PDE with non-Gaussian random coefficients. The numerical experiments are performed using a Linux cluster.
2 Uncertainty representation by stochastic processes
This section provides a brief review of the theories of stochastic processes relevant to the subsequent developments of the paper [9, 1]. We assume the data induces a representation of the model parameters as random variables and processes which span the Hilbert space $H_G$. A set of basis functions $\xi_i$ is identified to characterize this space using the Karhunen-Loeve expansion. The state of the system resides in the Hilbert space $H_L$, with basis functions $\Psi_i$ being identified with the Polynomial Chaos (PC) expansion. The Karhunen-Loeve expansion of a stochastic process $\alpha(x, \theta)$ is based on the spectral expansion of its covariance function $R_{\alpha\alpha}(x, y)$. The expansion takes the following form:
$$\alpha(x, \theta) = \bar{\alpha}(x) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, \xi_i(\theta)\, \phi_i(x)$$
where $\bar{\alpha}(x)$ is the mean of the stochastic process, $\theta$ represents the random dimension, and $\xi_i(\theta)$ is a set of uncorrelated (but not generally independent for non-Gaussian processes) random variables. Here $\phi_i(x)$ are the eigenfunctions and $\lambda_i$ the eigenvalues of the covariance kernel, which can be obtained as the solution of the following integral equation:
$$\int_{D} R_{\alpha\alpha}(x, y)\, \phi_i(y)\, dy = \lambda_i\, \phi_i(x)$$
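The eigenpairs of this integral equation can be approximated numerically by discretizing the covariance kernel on a grid and solving the resulting symmetric eigenproblem. A minimal sketch in Python, assuming an exponential covariance $R(x,y) = \exp(-|x-y|/b)$ with a hypothetical correlation length $b = 0.5$ on the unit interval (both choices are illustrative, not from the paper):

```python
import numpy as np

# Discrete Karhunen-Loeve sketch: eigenpairs of an exponential covariance
# on a 1-D grid approximate (lambda_i, phi_i) of the integral equation.
n, b = 200, 0.5                               # grid size and correlation length (assumed)
x = np.linspace(0.0, 1.0, n)
R = np.exp(-np.abs(x[:, None] - x[None, :]) / b)
w = 1.0 / n                                   # uniform quadrature weight
lam, phi = np.linalg.eigh(w * R)              # symmetric eigenproblem
lam, phi = lam[::-1], phi[:, ::-1]            # eigh returns ascending order; sort descending

# Truncating the expansion after a few terms captures most of the variance:
print(lam[:10].sum() / lam.sum() > 0.9)       # True for this kernel
```

The rapid eigenvalue decay is what justifies truncating the Karhunen-Loeve series after a small number of terms.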
where $D$ denotes the spatial domain over which the process is defined. The covariance function of the solution process is not known a priori, and hence the Karhunen-Loeve expansion cannot be used to represent it. Therefore, a generic basis that is complete in the space of all second-order random variables is identified and used in the approximation process. Since the solution process is a function of the material properties, the nodal solution variables, denoted by $u(\theta)$, can be formally expressed as some nonlinear functional of the set $\xi_i(\theta)$ used to represent the material stochasticity. It has been shown that this functional dependence can be expanded in terms of polynomials in Gaussian random variables, namely the Polynomial Chaos [9], as
$$u(\theta) = \sum_{j=0}^{N} \Psi_j(\theta)\, u_j$$
These polynomials are orthogonal in the sense that their inner product $\langle \Psi_j \Psi_k \rangle$, defined as the statistical average of their product, is equal to zero for $j \neq k$.
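The orthogonality relation can be checked numerically in one Gaussian dimension, where the $\Psi_j$ reduce to probabilists' Hermite polynomials $He_j$ with $\langle He_j He_k \rangle = j!\,\delta_{jk}$. A small sketch using Gauss-Hermite quadrature (the quadrature order of 10 is an illustrative choice):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, HermiteE

# Probabilists' Hermite quadrature: weight exp(-x^2/2), weights sum to sqrt(2*pi).
nodes, weights = hermegauss(10)
weights = weights / np.sqrt(2 * np.pi)   # normalize to the standard Gaussian measure

def He(j, x):
    """Evaluate the j-th probabilists' Hermite polynomial at x."""
    c = np.zeros(j + 1)
    c[j] = 1.0
    return HermiteE(c)(x)

def inner(j, k):
    """Statistical average <He_j He_k> under the standard Gaussian."""
    return np.sum(weights * He(j, nodes) * He(k, nodes))

print(round(inner(2, 3), 10))  # 0.0, orthogonality for j != k
print(round(inner(3, 3), 10))  # 6.0, equal to 3!
```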
3 Review of Schur complement based domain decomposition method of SPDEs
Consider an elliptic stochastic PDE defined on a domain $\Omega$ with given boundary conditions on $\partial\Omega$. Finite element discretization of the stochastic PDE leads to the following linear system:
$$A(\theta)\, u(\theta) = f \qquad (1)$$
where $A(\theta)$ is the stiffness matrix with random coefficients, $u(\theta)$ is the stochastic process representing the response vector, and $f$ is the applied force. For large-scale systems, Eq. (1) can be solved efficiently using the domain decomposition method [1, 2].
In the domain decomposition method, the spatial domain $\Omega$ is partitioned into $n_s$ non-overlapping subdomains $\Omega_s$, $1 \le s \le n_s$, such that
$$\Omega = \bigcup_{s=1}^{n_s} \Omega_s, \qquad \Omega_s \cap \Omega_r = \emptyset, \;\; s \neq r, \qquad \Gamma = \bigcup_{s=1}^{n_s} \Gamma_s, \quad \text{where } \Gamma_s = \partial\Omega_s \setminus \partial\Omega$$
For a typical subdomain $\Omega_s$, the nodal vector $u^s(\theta)$ is partitioned into a set of interior unknowns $u_I^s(\theta)$, associated with nodes in the interior of $\Omega_s$, and interface unknowns $u_\Gamma^s(\theta)$, associated with nodes that are shared among two or more subdomains, as shown in Fig. (1).
Consequently, the subdomain equilibrium equation can be represented as
$$\begin{bmatrix} A_{II}^s(\theta) & A_{I\Gamma}^s(\theta) \\ A_{\Gamma I}^s(\theta) & A_{\Gamma\Gamma}^s(\theta) \end{bmatrix} \begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}$$
The Polynomial Chaos expansion can be used to represent the uncertainty in the model parameters as
$$\sum_{i=0}^{L} \Psi_i \begin{bmatrix} A_{II,i}^s & A_{I\Gamma,i}^s \\ A_{\Gamma I,i}^s & A_{\Gamma\Gamma,i}^s \end{bmatrix} \begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}$$
A Boolean restriction operator $R_s$ of size $(n_\Gamma^s \times n_\Gamma)$, which maps the global interface vector $u_\Gamma(\theta)$ to the local interface vector $u_\Gamma^s(\theta)$, is defined as
$$u_\Gamma^s(\theta) = R_s\, u_\Gamma(\theta)$$
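In code, such a Boolean restriction operator is simply an index selection, and its transpose a scatter-add; a toy sketch with a hypothetical local-to-global index map:

```python
import numpy as np

# A Boolean restriction operator R_s selects a subdomain's interface unknowns
# from the global interface vector. The index map below is illustrative.
idx_s = np.array([1, 3, 4])              # hypothetical local -> global map
n_gamma = 6
Rs = np.zeros((3, n_gamma))
Rs[np.arange(3), idx_s] = 1.0            # one 1 per row: Boolean restriction

u_gamma = np.arange(6.0)
print(np.allclose(Rs @ u_gamma, u_gamma[idx_s]))  # True
# In practice R_s is never formed: R_s u is fancy indexing, R_s^T v a scatter-add.
```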
Figure 1. Partitioning of the domain nodes into interior and interface nodes.
Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed in the following block linear system of equations:
$$\sum_{i=0}^{L} \Psi_i \begin{bmatrix} A_{II,i}^1 & \cdots & 0 & A_{I\Gamma,i}^1 R_1 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & A_{II,i}^{n_s} & A_{I\Gamma,i}^{n_s} R_{n_s} \\ R_1^T A_{\Gamma I,i}^1 & \cdots & R_{n_s}^T A_{\Gamma I,i}^{n_s} & \displaystyle\sum_{s=1}^{n_s} R_s^T A_{\Gamma\Gamma,i}^s R_s \end{bmatrix} \begin{Bmatrix} u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta) \end{Bmatrix} = \begin{Bmatrix} f_I^1 \\ \vdots \\ f_I^{n_s} \\ \displaystyle\sum_{s=1}^{n_s} R_s^T f_\Gamma^s \end{Bmatrix} \qquad (2)$$
The solution process can be expanded using the same Polynomial Chaos basis as
$$\begin{Bmatrix} u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u_{I,j}^1 \\ \vdots \\ u_{I,j}^{n_s} \\ u_{\Gamma,j} \end{Bmatrix} \qquad (3)$$
Substituting Eq. (3) into Eq. (2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained:
$$\begin{bmatrix} \mathcal{A}_{II}^1 & \cdots & 0 & \mathcal{A}_{I\Gamma}^1 \mathcal{R}_1 \\ \vdots & \ddots & \vdots & \vdots \\ 0 & \cdots & \mathcal{A}_{II}^{n_s} & \mathcal{A}_{I\Gamma}^{n_s} \mathcal{R}_{n_s} \\ \mathcal{R}_1^T \mathcal{A}_{\Gamma I}^1 & \cdots & \mathcal{R}_{n_s}^T \mathcal{A}_{\Gamma I}^{n_s} & \displaystyle\sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{A}_{\Gamma\Gamma}^s \mathcal{R}_s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_I^1 \\ \vdots \\ \mathcal{U}_I^{n_s} \\ \mathcal{U}_\Gamma \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_I^1 \\ \vdots \\ \mathcal{F}_I^{n_s} \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{F}_\Gamma^s \end{Bmatrix} \qquad (4)$$
where
$$[\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle\, A_{\alpha\beta,i}^s, \qquad \mathcal{F}_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle,$$
$$\mathcal{U}_I^m = (u_{I,0}^m, \ldots, u_{I,N}^m)^T, \qquad \mathcal{U}_\Gamma = (u_{\Gamma,0}, \ldots, u_{\Gamma,N})^T,$$
and the subscripts $\alpha$ and $\beta$ represent the indices $I$ and $\Gamma$. The coefficient matrix in Eq. (4) is of order $n(N+1) \times n(N+1)$, where $n$ and $(N+1)$ denote the total number of degrees of freedom and chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq. (4) takes the following form:
$$\mathcal{R}_s = \text{blockdiag}(R_s^0, \ldots, R_s^N)$$
where $(R_s^0, \ldots, R_s^N)$ are the deterministic restriction operators. In a parallel implementation, $\mathcal{R}_s$ acts as a scatter operator while $\mathcal{R}_s^T$ acts as a gather operator; they are not constructed explicitly.
A block Gaussian elimination reduces the system in Eq. (4) to the following extended Schur complement system for the interface variable $\mathcal{U}_\Gamma$:
$$\mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma \qquad (5)$$
where the global extended Schur complement matrix $\mathcal{S}$ is given by
$$\mathcal{S} = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{A}_{\Gamma\Gamma}^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{A}_{I\Gamma}^s \right] \mathcal{R}_s$$
and the corresponding right-hand-side vector $\mathcal{G}_\Gamma$ is
$$\mathcal{G}_\Gamma = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{F}_\Gamma^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{F}_I^s \right]$$
Once the interface unknowns $\mathcal{U}_\Gamma$ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain as
$$\mathcal{A}_{II}^s\, \mathcal{U}_I^s = \mathcal{F}_I^s - \mathcal{A}_{I\Gamma}^s \mathcal{R}_s\, \mathcal{U}_\Gamma$$
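The condensation above can be verified on a small dense system. A sketch for a single randomly generated SPD block system (the partition sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Small SPD system partitioned into interior (I) and interface (G) blocks;
# the Schur complement S = A_GG - A_GI A_II^{-1} A_IG condenses out u_I.
n_i, n_g = 5, 3
M = rng.standard_normal((n_i + n_g, n_i + n_g))
A = M @ M.T + (n_i + n_g) * np.eye(n_i + n_g)     # SPD by construction
AII, AIG = A[:n_i, :n_i], A[:n_i, n_i:]
AGI, AGG = A[n_i:, :n_i], A[n_i:, n_i:]
f = rng.standard_normal(n_i + n_g)
fI, fG = f[:n_i], f[n_i:]

S = AGG - AGI @ np.linalg.solve(AII, AIG)         # Schur complement
G = fG - AGI @ np.linalg.solve(AII, fI)           # condensed right-hand side
uG = np.linalg.solve(S, G)                        # interface solve
uI = np.linalg.solve(AII, fI - AIG @ uG)          # interior back-substitution

print(np.allclose(np.concatenate([uI, uG]), np.linalg.solve(A, f)))  # True
```

The interface solve plus back-substitution reproduces the monolithic solution, which is exactly what the domain decomposition exploits subdomain by subdomain.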
4 Solution methods for the extended Schur complement system
Solution methods for linear systems are broadly categorized into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination techniques and are popular for their robustness. However, they are expensive in computation time and memory requirements, and therefore cannot be applied to the solution of large-scale linear systems [10]. Iterative methods, on the other hand, generate sequences of approximate solutions which converge to the true solutions. In iterative methods, the main arithmetic operation is the matrix-vector multiplication. Therefore, the linear system itself need not be constructed explicitly; only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.
4.1 Preconditioned Conjugate Gradient Method (PCGM)
Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve the Schur complement system of the form [11]
$$\mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma$$
For a symmetric positive-definite system, such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. The rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e. a lower condition number $\kappa(\mathcal{S})$). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes
$$\mathcal{M}^{-1} \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{M}^{-1} \mathcal{G}_\Gamma$$
In general, $\kappa(\mathcal{M}^{-1}\mathcal{S})$ is much smaller than $\kappa(\mathcal{S})$, and the eigenvalues of $\mathcal{M}^{-1}\mathcal{S}$ are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of $\mathcal{M}^{-1}$ is not needed. Instead, for a given vector $r_\Gamma$, a system of the following form is solved:
$$\mathcal{M} Z = r_\Gamma$$
The PCGM algorithm to solve the Schur complement system proceeds as follows [10].

Algorithm 1: The PCGM Algorithm
1. Initialize $\mathcal{U}_{\Gamma_0} = 0$
2. Compute $r_{\Gamma_0} = \mathcal{G}_\Gamma - \mathcal{S}\, \mathcal{U}_{\Gamma_0}$
3. Precondition $Z_0 = \mathcal{M}^{-1} r_{\Gamma_0}$
4. First search direction $P_0 = Z_0$
5. Initialize $\rho_0 = (r_{\Gamma_0}, Z_0)$
6. For $j = 0, 1, \ldots$ until convergence, Do
7. $\quad Q_j = \mathcal{S} P_j$
8. $\quad \rho_{tmp_j} = (Q_j, P_j)$
9. $\quad \alpha_j = \rho_j / \rho_{tmp_j}$
10. $\quad \mathcal{U}_{\Gamma_{j+1}} = \mathcal{U}_{\Gamma_j} + \alpha_j P_j$
11. $\quad r_{\Gamma_{j+1}} = r_{\Gamma_j} - \alpha_j Q_j$
12. $\quad Z_{j+1} = \mathcal{M}^{-1} r_{\Gamma_{j+1}}$
13. $\quad \rho_{j+1} = (r_{\Gamma_{j+1}}, Z_{j+1})$
14. $\quad \beta_j = \rho_{j+1} / \rho_j$
15. $\quad P_{j+1} = Z_{j+1} + \beta_j P_j$
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are calculating the product $Q = \mathcal{S} P$ in step 7 and the preconditioned residual $Z = \mathcal{M}^{-1} r_\Gamma$ in step 12. These operations can be performed in parallel, as outlined next.
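Algorithm 1 can be sketched compactly in matrix-free form, where only the actions of $\mathcal{S}$ and $\mathcal{M}^{-1}$ on a vector are supplied as callables; the Jacobi preconditioner in the usage lines is an illustrative stand-in for the preconditioners discussed later:

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-10, maxit=200):
    """Preconditioned CG following Algorithm 1: only the actions of S and
    M^{-1} on vectors are needed, never the assembled matrices."""
    U = np.zeros_like(G)
    r = G - apply_S(U)
    Z = apply_Minv(r)
    P = Z.copy()
    rho = r @ Z
    for _ in range(maxit):
        Q = apply_S(P)
        alpha = rho / (Q @ P)
        U += alpha * P
        r -= alpha * Q
        if np.linalg.norm(r) < tol * np.linalg.norm(G):
            break
        Z = apply_Minv(r)
        rho, rho_old = r @ Z, rho
        P = Z + (rho / rho_old) * P
    return U

# Usage sketch on a small SPD system with a Jacobi (diagonal) preconditioner:
rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
S = B @ B.T + 8 * np.eye(8)                 # SPD test matrix
G = rng.standard_normal(8)
U = pcgm(lambda v: S @ v, lambda v: v / np.diag(S), G)
print(np.allclose(U, np.linalg.solve(S, G)))  # True
```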
Given the subdomain Schur complement matrices $\mathcal{S}^s$ and a global vector $P$, the matrix-vector product $Q = \mathcal{S} P$ can be calculated in parallel as
$$Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{S}^s \mathcal{R}_s P$$
where $n_s$ is the number of subdomains, and $\mathcal{R}_s$ and $\mathcal{R}_s^T$ are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).
Algorithm 2: Parallel Matrix-Vector Product Procedure
1. Input ($P$)
2. Scatter: $P^s = \mathcal{R}_s P$
3. Local operation: $Q^s = \mathcal{S}^s P^s$
4. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T Q^s$
5. Output ($Q$)
The working vectors $P^s$ and $Q^s$ are defined at the subdomain level.
Similarly, the effect of a parallel preconditioner on a residual vector, $Z = \mathcal{M}^{-1} r_\Gamma$, can be computed as
$$Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{M}_s^{-1} \mathcal{R}_s\, r_\Gamma$$
This procedure is outlined in the following algorithm.
Algorithm 3: Parallel Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r_\Gamma^s = \mathcal{R}_s r_\Gamma$
3. Local Solve: $\mathcal{M}_s Z^s = r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T Z^s$
5. Output ($Z$)
The local preconditioner $\mathcal{M}_s$ and the working vectors $r_\Gamma^s$ and $Z^s$ are defined at the subdomain level.
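The scatter, local-solve, gather pattern of Algorithm (3) can be sketched with index arrays standing in for $\mathcal{R}_s$; the two-subdomain layout and diagonal local blocks below are illustrative assumptions, not the paper's operators:

```python
import numpy as np

# One-level preconditioner action Z = sum_s R_s^T M_s^{-1} R_s r:
# each local solve is independent, hence trivially parallel.
r = np.array([1.0, 2.0, 3.0, 4.0])                    # global residual
idx = [np.array([0, 1]), np.array([1, 2, 3])]         # hypothetical subdomain maps (dof 1 shared)
Ms = [np.diag([2.0, 4.0]), np.diag([4.0, 2.0, 2.0])]  # hypothetical local blocks

Z = np.zeros_like(r)
for ids, M in zip(idx, Ms):
    Z[ids] += np.linalg.solve(M, r[ids])              # scatter, local solve, gather (add)
print(np.allclose(Z, [0.5, 1.0, 1.5, 2.0]))           # True: dof 1 sums both contributions
```

In the MPI implementation, the loop body runs concurrently on each process and the gather is a `VecScatter`-style communication rather than an in-place add.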
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain level preconditioner. Moreover, a coarse problem is required to propagate information globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u_i^s(\theta)$, corner unknowns $u_c^s(\theta)$, and remaining unknowns $u_r^s(\theta)$, as schematically shown in Fig. (2).
Figure 2. Partitioning of the domain nodes into interior, remaining, and corner nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as
$$\begin{bmatrix} A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\ A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\ A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta) \end{bmatrix} \begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}$$
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:
$$\sum_{l=0}^{L} \Psi_l \begin{bmatrix} A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\ A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\ A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s \end{bmatrix} \begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix} \qquad (6)$$
The solution process is expressed using the same Polynomial Chaos basis as
$$\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s \end{Bmatrix} \qquad (7)$$
Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:
$$\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \\ \mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \\ \mathcal{A}_{ci}^s & \mathcal{A}_{cr}^s & \mathcal{A}_{cc}^s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c^s \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \mathcal{F}_c^s \end{Bmatrix} \qquad (8)$$
where
$$[\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle\, A_{\alpha\beta,l}^s, \qquad \mathcal{F}_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, \qquad \mathcal{U}_\alpha^s = (u_{\alpha,0}^s, \cdots, u_{\alpha,N}^s)^T,$$
and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.
Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as
$$\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \mathcal{B}_r^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{ri}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rr}^s \mathcal{B}_r^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rc}^s \mathcal{B}_c^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s \mathcal{B}_r^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_i^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{F}_r^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \end{Bmatrix} \qquad (9)$$
where $\mathcal{B}_r^s$ is a Boolean rectangular matrix that maps the global remaining vector $\mathcal{U}_r$ to the local remaining vector $\mathcal{U}_r^s$ as
$$\mathcal{U}_r^s = \mathcal{B}_r^s\, \mathcal{U}_r \qquad (10)$$
Similarly, the restriction operator $\mathcal{B}_c^s$ is a Boolean rectangular matrix that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as
$$\mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c \qquad (11)$$
In a parallel implementation, both $\mathcal{B}_r^s$ and $\mathcal{B}_c^s$ act as scatter operators, while $\mathcal{B}_r^{sT}$ and $\mathcal{B}_c^{sT}$ act as gather operators.
The first block equation in Eq. (9) can be solved for $\mathcal{U}_i^s$ in parallel as
$$\mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} \left( \mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{B}_r^s \mathcal{U}_r - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c \right) \qquad (12)$$
Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5), but now with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig. (3):
$$\begin{bmatrix} \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rr}^s \mathcal{B}_r^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rc}^s \mathcal{B}_c^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix} = \begin{Bmatrix} \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{G}_r^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s \end{Bmatrix} \qquad (13)$$
where
$$\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad \mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s$$
Figure 3. The interface boundary nodes split into remaining and corner nodes.
The corner nodal vector $\mathcal{U}_c$ in Eq. (13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem:
$$\left( \mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr} \right) \mathcal{U}_r = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c \qquad (14)$$
where
$$\mathcal{F}_{\alpha\beta} = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{S}_{\alpha\beta}^s \mathcal{B}_\beta^s, \qquad d_\alpha = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{G}_\alpha^s,$$
and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $\mathcal{M}^{-1}$, defined in the next section.
6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that at the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig. (4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:
Figure 4. Partial assembly of the corner nodes.
$$\begin{bmatrix} \mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_r^s \\ \mathcal{U}_c \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_r^s \\ 0 \end{Bmatrix} \qquad (15)$$
where
$$\mathcal{F}_r^s = \mathcal{D}_r^s \mathcal{B}_r^s\, r_j,$$
$r_j$ is the residual at the $j$th iteration of the PCGM, and $\mathcal{D}_r^s$ represents a block diagonal weighting matrix which satisfies the following property:
$$\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s \mathcal{B}_r^s = I$$
The diagonal entries of each block of $\mathcal{D}_r^s$ are the reciprocals of the number of subdomains that share the corresponding interface boundary nodes.
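The partition-of-unity property of the weighting matrices can be illustrated on a toy configuration with two subdomains sharing a single remaining node (the operators below are illustrative, not taken from the paper's mesh):

```python
import numpy as np

# Two subdomains, three global remaining dofs; dof 1 is shared by both.
# Each D_r^s holds reciprocals of the sharing counts, so the weighted
# gather-scatter contributions sum to the identity (partition of unity).
B1 = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # subdomain 1 sees dofs 0, 1
B2 = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0]])   # subdomain 2 sees dofs 1, 2
D1 = np.diag([1.0, 0.5])                            # shared dof weighted by 1/2
D2 = np.diag([0.5, 1.0])

P = B1.T @ D1 @ B1 + B2.T @ D2 @ B2
print(np.allclose(P, np.eye(3)))                    # True
```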
The subdomain level remaining unknown vector $\mathcal{U}_r^s$ can be eliminated in parallel from Eq. (15) as
$$\mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} \left( \mathcal{F}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c \right) \qquad (16)$$
Substituting $\mathcal{U}_r^s$ into the second block of Eq. (15) leads to the following coarse problem:
$$\mathcal{F}_{cc}^\ast\, \mathcal{U}_c = d_c^\ast \qquad (17)$$
where
$$\mathcal{F}_{cc}^\ast = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s, \qquad d_c^\ast = -\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{F}_r^s$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as
$$\mathcal{U}_r = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s\, \mathcal{U}_r^s$$
After some algebraic manipulations, the preconditioner can be expressed as
$$\mathcal{M}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s + \mathcal{R}_0^T [\mathcal{F}_{cc}^\ast]^{-1} \mathcal{R}_0 \qquad (18)$$
where
$$\mathcal{R}_0 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s$$
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation:
$$\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\ \mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s \end{bmatrix} \begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \end{Bmatrix} \qquad (19)$$
where $\mathcal{B}_c^s$ is a Boolean restriction operator that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as
$$\mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c$$
Eq. (19) can be rewritten in compact form as
$$\mathcal{A}^s\, \mathcal{U}^s = \mathcal{F}^s \qquad (20)$$
Let $\mathcal{B}_r^s$ be a block diagonal signed Boolean continuity matrix defined such that
$$\sum_{s=1}^{n_s} \mathcal{B}_r^s\, \mathcal{U}_r^s = 0$$
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:
$$\frac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} \rightarrow \min, \qquad \text{subject to } \mathcal{B}\, \mathcal{U} = 0 \qquad (21)$$
where
$$\mathcal{A} = \begin{bmatrix} \mathcal{A}^1 & & \\ & \ddots & \\ & & \mathcal{A}^{n_s} \end{bmatrix}, \qquad \mathcal{U} = \begin{Bmatrix} \mathcal{U}^1 \\ \vdots \\ \mathcal{U}^{n_s} \end{Bmatrix}, \qquad \mathcal{F} = \begin{Bmatrix} \mathcal{F}^1 \\ \vdots \\ \mathcal{F}^{n_s} \end{Bmatrix},$$
$$\mathcal{B} = \left[ (0 \;\; \mathcal{B}_r^1 \;\; 0) \;\cdots\; (0 \;\; \mathcal{B}_r^s \;\; 0) \;\cdots\; (0 \;\; \mathcal{B}_r^{n_s} \;\; 0) \right]$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as
$$\mathcal{L}(\mathcal{U}, \Lambda) = \frac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda \qquad (22)$$
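The saddle point structure of Eq. (22) can be illustrated on a toy equality-constrained quadratic program, solving the resulting KKT system directly; the single constraint $u_0 = u_1$ below is a hypothetical continuity condition:

```python
import numpy as np

# Minimize 1/2 u^T A u - u^T f subject to B u = 0, via the KKT system
# [[A, B^T], [B, 0]] [u; lam] = [f; 0], mirroring Eq. (22)-(23).
rng = np.random.default_rng(2)
M = rng.standard_normal((4, 4))
A = M @ M.T + 4 * np.eye(4)              # SPD "stiffness"
f = rng.standard_normal(4)
B = np.array([[1.0, -1.0, 0.0, 0.0]])    # hypothetical constraint: u0 = u1

K = np.block([[A, B.T], [B, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([f, [0.0]]))
u, lam = sol[:4], sol[4:]
print(abs(u[0] - u[1]) < 1e-9)           # True: continuity enforced
```

The multiplier `lam` is the interface force needed to glue the constrained unknowns together, which is exactly the interpretation of $\Lambda$ in the dual-primal method.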
Minimizing Eq. (22) with respect to $\mathcal{U}$ and $\Lambda$ leads to the following equilibrium system:
$$\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s & 0 \\ \mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s & 0 \\ 0 & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} \mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \\ 0 \end{Bmatrix} \qquad (23)$$
where
$$\Lambda = \begin{Bmatrix} \lambda_0 \\ \vdots \\ \lambda_N \end{Bmatrix}$$
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. (5).
Eliminating the interior unknowns $\mathcal{U}_i^s$ from Eq. (23) gives
$$\mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} \left( \mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{U}_r^s - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c \right) \qquad (24)$$
Substituting Eq. (24) into Eq. (23) leads to
$$\begin{bmatrix} \mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s & 0 \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} \mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} \mathcal{G}_r^s \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s \\ 0 \end{Bmatrix} \qquad (25)$$
where
$$\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad \mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s$$
The subdomain level remaining unknown vector $\mathcal{U}_r^s$ can be obtained in parallel from Eq. (25) as
$$\mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} \left( \mathcal{G}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c - \mathcal{B}_r^{sT} \Lambda \right) \qquad (26)$$
Substituting Eq. (26) into Eq. (25) leads to
$$\begin{bmatrix} \mathcal{F}_{cc} & -\mathcal{F}_{cr} \\ \mathcal{F}_{rc} & \mathcal{F}_{rr} \end{bmatrix} \begin{Bmatrix} \mathcal{U}_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} d_c \\ d_r \end{Bmatrix} \qquad (27)$$
where
$$\mathcal{F}_{cc} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s, \qquad \mathcal{F}_{cr} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT},$$
$$\mathcal{F}_{rc} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s, \qquad \mathcal{F}_{rr} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT},$$
$$d_c = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{G}_c^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s \right), \qquad d_r = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s$$
Solving for $\mathcal{U}_c$ from Eq. (27) gives the following coarse problem:
$$\mathcal{F}_{cc}\, \mathcal{U}_c = d_c + \mathcal{F}_{cr} \Lambda \qquad (28)$$
Substituting $\mathcal{U}_c$ into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system:
$$\left( \mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr} \right) \Lambda = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c \qquad (29)$$
Eq. (29) is solved using PCGM with a Dirichlet preconditioner, defined as
$$\mathcal{M} = \sum_{s=1}^{n_s} \mathcal{B}_r^s \mathcal{D}_r^s \mathcal{S}_{rr}^s \mathcal{D}_r^s \mathcal{B}_r^{sT} \qquad (30)$$
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are the same, and can be expressed as
$$\mathcal{F}_{cc}^\ast = \mathcal{F}_{cc} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s$$
Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as
$$\mathcal{M}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s + \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s \left[ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \left( \mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \right) \mathcal{B}_c^s \right]^{-1} \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s,$$
which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation
In this section, we give an outline of the parallel implementation of PCGM to solve the primal, Eq. (14), and dual-primal, Eq. (29), interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain level problems (Dirichlet and Neumann) and a global level coarse problem.
9.1 Primal method
In this subsection, we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$Q_j = \left( \mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr} \right) P_j,$$
can be computed using the following algorithm.
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input ($P$)
2. Scatter: $P^s = \mathcal{B}_r^s P$
3. Compute: $v_1^s = \mathcal{S}_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_1^s$
5. Global Solve: $\mathcal{F}_{cc} V_2 = V_1$
6. Scatter: $v_2^s = \mathcal{B}_c^s V_2$
7. Compute: $v_3^s = \mathcal{S}_{rc}^s v_2^s$
8. Update: $Q^s = \mathcal{S}_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} Q^s$
10. Output ($Q$)
The multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem:
$$v_\alpha^s = \mathcal{S}_{\alpha\beta}^s v_\beta^s = \left( \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s \right) v_\beta^s$$
This procedure is outlined in the following algorithm.

Algorithm 5: Dirichlet Solver Procedure
1. Input ($v_\beta^s$)
2. Compute: $v_1^s = \mathcal{A}_{i\beta}^s v_\beta^s$
3. Solve: $\mathcal{A}_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = \mathcal{A}_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = \mathcal{A}_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output ($v_\alpha^s$)
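Algorithm 5 amounts to applying a Schur complement block without ever forming it; a dense toy sketch (block sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(4)

# Apply S_ab = A_ab - A_ai A_ii^{-1} A_ib to a vector via one interior
# (Dirichlet) solve, following the steps of Algorithm 5.
M = rng.standard_normal((6, 6))
A = M @ M.T + 6 * np.eye(6)                    # SPD test matrix
Aii, Aib = A[:4, :4], A[:4, 4:]                # interior block and couplings
Aai, Aab = A[4:, :4], A[4:, 4:]

def apply_S(v):
    v1 = Aib @ v                               # step 2
    v2 = np.linalg.solve(Aii, v1)              # step 3: interior solve
    return Aab @ v - Aai @ v2                  # steps 4-6

v = rng.standard_normal(2)
S = Aab - Aai @ np.linalg.solve(Aii, Aib)      # explicit S, for checking only
print(np.allclose(apply_S(v), S @ v))          # True
```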
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner:
$$\mathcal{M}_{cc}^{-1} \mathcal{F}_{cc} V_2 = \mathcal{M}_{cc}^{-1} V_1, \qquad \text{where } \mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $\mathcal{F}_r^s = \mathcal{D}_r^s \mathcal{B}_r^s r_\Gamma$
3. Local Solve: $\mathcal{S}_{rr}^s v_1^s = \mathcal{F}_r^s$
4. Compute: $d_c^s = \mathcal{S}_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} d_c^s$
6. Global Solve: $\mathcal{F}_{cc}^\ast Z_c = -d_c$
7. Scatter: $Z_c^s = \mathcal{B}_c^s Z_c$
8. Update: $v_2^s = \mathcal{F}_r^s + \mathcal{S}_{rc}^s Z_c^s$
9. Local Solve: $\mathcal{S}_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s Z_f^s$
11. Output ($Z$)
The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumann problem of the form $\mathcal{S}_{rr}^s \mathcal{U}_r^s = r_r^s$, which can be solved using the following algorithm.

Algorithm 7: Neumann Solver Procedure
1. Input ($r_r^s$)
2. Solve: $\begin{bmatrix} \mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \\ \mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s \end{bmatrix} \begin{Bmatrix} \mathcal{X}^s \\ \mathcal{U}_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}$
3. Output ($\mathcal{U}_r^s$)
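Algorithm 7 can be checked on a small dense system: solving the bordered system with a zero interior load indeed yields $[\mathcal{S}_{rr}^s]^{-1} r_r^s$ (sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(5)

# Neumann solve S_rr U_r = r_r via the bordered system of Algorithm 7:
# block elimination of the zero interior load leaves exactly S_rr U_r = r_r.
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                    # SPD test matrix
Aii, Air = A[:3, :3], A[:3, 3:]
Ari, Arr = A[3:, :3], A[3:, 3:]
rr = rng.standard_normal(2)

sol = np.linalg.solve(A, np.concatenate([np.zeros(3), rr]))
Ur = sol[3:]                                   # the Neumann-solve output
Srr = Arr - Ari @ np.linalg.solve(Aii, Air)    # explicit S_rr, for checking only
print(np.allclose(Srr @ Ur, rr))               # True
```

Factorizing the bordered matrix once and reusing it per iteration is what makes the local Neumann solves cheap in practice.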
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner:
$$\mathcal{M}_{cc}^{-1} \mathcal{F}_{cc}^\ast Z_c = -\mathcal{M}_{cc}^{-1} d_c, \qquad \text{where } \mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s$$
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the flow chart of Fig. (6).

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem, Eq. (29).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$Q_j = \left( \mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr} \right) P_j,$$
can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input ($P$)
2. Scatter: $P^s = \mathcal{B}_r^{sT} P$
3. Local Solve: $\mathcal{S}_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = \mathcal{S}_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_2^s$
6. Global Solve: $\mathcal{F}_{cc} V_3 = V_2$
7. Scatter: $v_3^s = \mathcal{B}_c^s V_3$
8. Compute: $v_4^s = \mathcal{S}_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local Solve: $\mathcal{S}_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^s Q^s$
12. Output ($Q$)
The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input (r_Γ)
2. Scatter: r^s_Γ = D^s_r B^sT_r r_Γ
3. Compute: Z^s = S^s_rr r^s_Γ
4. Gather: Z = sum_{s=1}^{ns} B^s_r D^s_r Z^s
5. Output (Z)
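Note that Algorithm 9 multiplies by the local Schur complements (a Dirichlet-type operation) rather than solving with them. A numpy sketch with hypothetical arguments; the check uses two fully overlapping subdomains so that each weight matrix is I/2:

```python
import numpy as np

def dirichlet_precond(r, S_rr_list, B_r, D_r):
    """Apply Z = sum_s B^s_r D^s_r S^s_rr D^s_r B^sT_r r (Algorithm 9)."""
    Z = np.zeros_like(r)
    for S_rr, Br, Dr in zip(S_rr_list, B_r, D_r):
        rs = Dr @ (Br.T @ r)          # step 2: weighted scatter
        Z += Br @ (Dr @ (S_rr @ rs))  # steps 3-4: local multiply, weighted gather
    return Z

# Two identical overlap patterns, weights 1/2 each: Z = (S1 + S2) r / 4
rng = np.random.default_rng(5)
S1 = rng.standard_normal((3, 3)); S1 = S1 + S1.T
S2 = rng.standard_normal((3, 3)); S2 = S2 + S2.T
r = rng.standard_normal(3)
Z = dirichlet_precond(r, [S1, S2], [np.eye(3)] * 2, [0.5 * np.eye(3)] * 2)
assert np.allclose(Z, 0.25 * (S1 + S2) @ r)
```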
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.
Figure 7: Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson's equation with randomly heterogeneous coefficients given as

∂/∂x [c_x(x, y, θ) ∂u(x, y, θ)/∂x] + ∂/∂y [c_y(x, y, θ) ∂u(x, y, θ)/∂y] = f(x, y) in Ω,
where the forcing term is f(x, y) = 1.0.
For simplicity, a homogeneous Dirichlet boundary condition is imposed: u(x, y, θ) = 0 on ∂Ω.
The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variables have mean 1.0 and standard deviation 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and the initial residual is less than 10^{-5}:

||G_Γ − S U^k_Γ||_2 / ||G^0_Γ||_2 ≤ 10^{-5}.
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11) respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23) the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8: A typical FEM mesh. Figure 9: Mesh partitioning using METIS.
Figure 10: The mean of the solution field. Figure 11: The standard deviation of the solution field.
Figure 12: Chaos coefficient u_0. Figure 13: Chaos coefficient u_1.
10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, using the first order Polynomial Chaos expansion leads to a total problem size of 3 × n, and using the third order Polynomial Chaos expansion leads to a total problem size of 10 × n.

Figure 14: Chaos coefficient u_2. Figure 15: Chaos coefficient u_3.
Figure 16: Chaos coefficient u_4. Figure 17: Chaos coefficient u_5.
Figure 18: Lagrange multipliers λ_0. Figure 19: Lagrange multipliers λ_1.
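The factors 3 × n and 10 × n are consistent with the standard dimension formula N + 1 = (d + p)! / (d! p!) for a chaos expansion of order p in d random dimensions, with d = 2 here (one underlying Gaussian per coefficient, an assumption on our part):

```python
from math import comb

def pc_terms(d, p):
    """Number of Polynomial Chaos coefficients N + 1 for d random
    dimensions and expansion order p: binomial(d + p, p)."""
    return comb(d + p, p)

assert pc_terms(2, 1) == 3    # first order: total system size 3 x n
assert pc_terms(2, 3) == 10   # third order: total system size 10 x n
```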
Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs.(24-26).

Figure 20: Lagrange multipliers λ_2. Figure 21: Lagrange multipliers λ_3.
Figure 22: Lagrange multipliers λ_4. Figure 23: Lagrange multipliers λ_5.

Table 1: Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
10,051       | 10  10  10       | 8   8   8
20,303       | 11  11  11       | 8   8   8
40,811       | 11  12  12       | 8   9   9
59,935       | 13  14  14       | 10  10  10
71,386       | 12  12  12       | 9   9   9
80,172       | 11  11  12       | 8   8   8
101,851      | 12  12  12       | 9   9   9
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding
Table 2: Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dofs.

CPUs | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
20   | 10  11  11       | 8  8  8
40   | 12  12  12       | 9  9  9
60   | 12  13  13       | 9  9  9
80   | 12  12  13       | 9  9  9
100  | 12  12  12       | 9  9  9
120  | 12  12  12       | 9  9  9
140  | 11  11  12       | 8  8  8
160  | 12  12  12       | 8  8  9
Table 3: Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
100        | 10  10  10       | 8  8  8
200        | 10  10  11       | 8  8  8
400        | 12  13  13       | 9  9  9
600        | 11  12  12       | 8  8  9
800        | 12  13  13       | 9  9  9
more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for the first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performance of the primal method and the dual-primal method demonstrates a similar trend; this fact points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24: The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25: The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26: The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4: Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101,851 dofs) and a fixed number of subdomains (100).

CoV  | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
0.05 | 10  10  10       | 8  8  8
0.10 | 10  10  10       | 8  8  8
0.15 | 10  10  10       | 8  8  8
0.20 | 10  10  10       | 8  8  8
0.25 | 10  10  10       | 8  8  8
0.30 | 10  10  11       | 8  8  8
0.35 | 10  10  11       | 8  8  8
0.40 | 10  11  11       | 8  8  9
0.45 | 10  11  12       | 8  8  9
0.50 | 10  11  12       | 8  8  9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250–271
Primal and dual-primal iterative substructuring
methods of stochastic PDEs
Waad Subber1 and Abhijit Sarkar2
Department of Civil and Environmental Engineering, Carleton University, Ottawa, Ontario K1S 5B6, Canada

E-mail: 1 wsubber@connect.carleton.ca, 2 abhijit_sarkar@carleton.ca
Abstract. A novel non-overlapping domain decomposition method is proposed to solve the large-scale linear system arising from the finite element discretization of stochastic partial differential equations (SPDEs). The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes using the Polynomial Chaos expansion. The algorithm offers a direct approach to formulate a two-level scalable preconditioner. The proposed preconditioner strictly enforces the continuity condition on the corner nodes of the interface boundary while weakly satisfying the continuity condition over the remaining interface nodes. This approach relates to a primal version of an iterative substructuring method. Next, a Lagrange multiplier based dual-primal domain decomposition method is introduced in the context of SPDEs. In the dual-primal method, the continuity condition on the corner nodes is strictly satisfied while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. For numerical illustrations, a two-dimensional elliptic SPDE with non-Gaussian random coefficients is considered. The numerical results demonstrate the scalability of these algorithms with respect to the mesh size, subdomain size, fixed problem size per subdomain, order of the Polynomial Chaos expansion and level of uncertainty in the input parameters. The numerical experiments are performed on a Linux cluster using MPI and PETSc libraries.
1 Introduction
A domain decomposition method for SPDEs is introduced in [1, 2] to quantify uncertainty in large-scale linear systems. The methodology is based on a Schur complement based geometric decomposition and an orthogonal decomposition and projection of the stochastic processes. A parallel preconditioned conjugate gradient method (PCGM) is adopted in [3] to solve the interface problem without explicitly constructing the Schur complement system. The parallel performance of the algorithms is demonstrated using a lumped preconditioner for non-Gaussian systems arising from a hydraulic problem having random soil permeability properties.
A one-level Neumann-Neumann domain decomposition preconditioner for SPDEs is introduced in [4] in order to enhance the performance of the parallel PCGM iterative solver in [3]. The implementation of the algorithm requires a local solve of a stochastic Dirichlet problem followed by a local solve of a stochastic Neumann problem in each iteration of the PCGM solver. The multilevel sparsity structure of the coefficient matrices of the stochastic system, namely (a) the sparsity structure due to the finite element discretization and (b) the block sparsity structure due to the Polynomial Chaos expansion, is exploited for computational efficiency. The one-level
Neumann-Neumann preconditioner in [4] demonstrates good (strong and weak) scalability for the moderate range of CPUs considered.
In this paper, we first describe a primal version of iterative substructuring methods for the solution of the large-scale linear system arising from the stochastic finite element method. The algorithm offers a straightforward approach to formulate a two-level scalable preconditioner. The continuity condition is strictly enforced on the corner nodes (nodes shared among more than two subdomains, including the nodes at the ends of the interface edges). For the remaining part of the interface boundary, the continuity condition is satisfied weakly (in an average sense). Note that the continuity of the solution field across the entire interface boundary is eventually satisfied at the convergence of the iterations. This approach naturally leads to a coarse grid which connects the subdomains globally via the corner nodes. The coarse grid provides a mechanism to propagate information globally, which makes the algorithm scalable with respect to the subdomain size. In the second part of the paper, a dual-primal iterative substructuring method is introduced for SPDEs, which may be viewed as an extension of the Dual-Primal Finite Element Tearing and Interconnecting method (FETI-DP) [5] to the context of SPDEs. In this approach, the continuity condition on the corner nodes is strictly satisfied by partial assembly, while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. The system of Lagrange multipliers (also called the dual variables) is solved iteratively using the PCGM equipped with a Dirichlet preconditioner. The PETSc [6] and MPI [7] libraries are used for efficient parallel implementation of the primal and dual-primal algorithms. The graph partitioning tool METIS [8] is used for optimal decomposition of the finite element mesh for load balancing and minimum interprocessor communication. The parallel performance of the algorithms is studied for a two-dimensional stochastic elliptic PDE with non-Gaussian random coefficients. The numerical experiments are performed on a Linux cluster.
2 Uncertainty representation by stochastic processes
This section provides a brief review of the theory of stochastic processes relevant to the subsequent developments of the paper [9, 1]. We assume the data induces a representation of the model parameters as random variables and processes which span the Hilbert space HG. A set of basis functions {ξi} is identified to characterize this space using the Karhunen-Loeve expansion. The state of the system resides in the Hilbert space HL, with basis functions {Ψi} being identified with the Polynomial Chaos (PC) expansion. The Karhunen-Loeve expansion of a stochastic process α(x, θ) is based on the spectral expansion of its covariance function Rαα(x, y). The expansion takes the following form:
α(x, θ) = ᾱ(x) + sum_{i=1}^{∞} sqrt(λ_i) ξ_i(θ) φ_i(x),
where ᾱ(x) is the mean of the stochastic process, θ represents the random dimension, and {ξi(θ)} is a set of uncorrelated (but not generally independent for non-Gaussian processes) random variables. φi(x) are the eigenfunctions and λi are the eigenvalues of the covariance kernel, which can be obtained as the solution to the following integral equation:
∫_D Rαα(x, y) φ_i(y) dy = λ_i φ_i(x),
where D denotes the spatial domain over which the process is defined. The covariance function of the solution process is not known a priori, and hence the Karhunen-Loeve expansion cannot be used to represent it. Therefore a generic basis that is complete in the space of all second-order random variables is identified and used in the approximation process. Since the solution
process is a function of the material properties, the nodal solution variables, denoted by u(θ), can be formally expressed as some nonlinear functional of the set {ξi(θ)} used to represent the material stochasticity. It has been shown that this functional dependence can be expanded in terms of polynomials in Gaussian random variables, namely the Polynomial Chaos [9], as
u(θ) = sum_{j=0}^{N} Ψ_j(θ) u_j.
These polynomials are orthogonal in the sense that their inner product ⟨Ψj Ψk⟩, defined as the statistical average of their product, is equal to zero for j ≠ k.
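For a one-dimensional Hermite chaos this orthogonality can be checked numerically with Gauss-Hermite quadrature. A small sketch using the probabilists' Hermite polynomials He_j, for which ⟨He_j He_k⟩ = j! δ_jk under the standard Gaussian measure:

```python
import numpy as np
from numpy.polynomial import hermite_e as He

# Quadrature rule for the weight exp(-x^2/2), renormalized to a probability measure
x, w = He.hermegauss(10)
w = w / np.sqrt(2 * np.pi)

def inner(j, k):
    """Statistical average <Psi_j Psi_k> for 1-D Hermite chaos."""
    cj = np.zeros(j + 1); cj[j] = 1.0
    ck = np.zeros(k + 1); ck[k] = 1.0
    return np.sum(w * He.hermeval(x, cj) * He.hermeval(x, ck))

assert abs(inner(2, 3)) < 1e-10        # orthogonal for j != k
assert np.isclose(inner(3, 3), 6.0)    # <He_3^2> = 3! = 6
```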
3 Review of Schur complement based domain decomposition method of SPDEs
Consider an elliptic stochastic PDE defined on a domain Ω with given boundary conditions on ∂Ω. Finite element discretization of the stochastic PDE leads to the following linear system:

A(θ) u(θ) = f,   (1)

where A(θ) is the stiffness matrix with random coefficients, u(θ) is the stochastic process representing the response vector and f is the applied force. For a large-scale system, Eq.(1) can be solved efficiently using a domain decomposition method [1, 2].
In the domain decomposition method, the spatial domain Ω is partitioned into ns non-overlapping subdomains Ωs, 1 ≤ s ≤ ns, such that

Ω = ∪_{s=1}^{ns} Ωs,   Ωs ∩ Ωr = ∅ for s ≠ r,

and

Γ = ∪_{s=1}^{ns} Γs, where Γs = ∂Ωs \ ∂Ω.
For a typical subdomain Ωs, the nodal vector us(θ) is partitioned into a set of interior unknowns u^s_I(θ) associated with nodes in the interior of Ωs, and interface unknowns u^s_Γ(θ) associated with nodes that are shared among two or more subdomains, as shown in Fig.(1).
Consequently, the subdomain equilibrium equation can be represented as

[ A^s_II(θ)  A^s_IΓ(θ) ] { u^s_I(θ) }   { f^s_I }
[ A^s_ΓI(θ)  A^s_ΓΓ(θ) ] { u^s_Γ(θ) } = { f^s_Γ }
The Polynomial Chaos expansion can be used to represent the uncertainty in the model parameters as

sum_{i=0}^{L} Ψ_i [ A^s_II,i  A^s_IΓ,i ] { u^s_I(θ) }   { f^s_I }
                  [ A^s_ΓI,i  A^s_ΓΓ,i ] { u^s_Γ(θ) } = { f^s_Γ }
A Boolean restriction operator Rs of size (n^s_Γ × n_Γ), which maps the global interface vector uΓ(θ) to the local interface vector u^s_Γ(θ), is defined as

u^s_Γ(θ) = Rs uΓ(θ).
Figure 1: Partitioning of the domain nodes into interior and interface nodes.
Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed as the following block linear system of equations:

sum_{i=0}^{L} Ψ_i
[ A^1_II,i          0               A^1_IΓ,i R_1   ] { u^1_I(θ)  }   { f^1_I  }
[      ...                              ...        ] {    ...    } = {  ...   }
[ 0                 A^ns_II,i       A^ns_IΓ,i R_ns ] { u^ns_I(θ) }   { f^ns_I }
[ R^T_1 A^1_ΓI,i  ...  R^T_ns A^ns_ΓI,i   sum_{s=1}^{ns} R^T_s A^s_ΓΓ,i R_s ] { uΓ(θ) }   { sum_{s=1}^{ns} R^T_s f^s_Γ }   (2)
The solution process can be expanded using the same Polynomial Chaos basis as

{ u^1_I(θ), ..., u^ns_I(θ), uΓ(θ) }^T = sum_{j=0}^{N} Ψ_j(θ) { u^1_I,j, ..., u^ns_I,j, uΓ,j }^T.   (3)
Substituting Eq.(3) into Eq.(2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained:

[ A^1_II          0             A^1_IΓ R_1   ] { U^1_I  }   { F^1_I  }
[     ...                           ...      ] {  ...   } = {  ...   }
[ 0               A^ns_II       A^ns_IΓ R_ns ] { U^ns_I }   { F^ns_I }
[ R^T_1 A^1_ΓI  ...  R^T_ns A^ns_ΓI   sum_{s=1}^{ns} R^T_s A^s_ΓΓ R_s ] { UΓ }   { sum_{s=1}^{ns} R^T_s F^s_Γ }   (4)
where

[A^s_αβ]_jk = sum_{i=0}^{L} ⟨Ψi Ψj Ψk⟩ A^s_αβ,i,   F^s_α,k = ⟨Ψk f^s_α⟩,

U^m_I = (u^m_I,0, ..., u^m_I,N)^T,   UΓ = (uΓ,0, ..., uΓ,N)^T,

and the subscripts α and β represent the indices I and Γ. The coefficient matrix in Eq.(4) is of order n(N+1) × n(N+1), where n and (N+1) denote the total number of degrees of freedom and chaos coefficients respectively. The stochastic counterpart of the restriction operator in Eq.(4) takes the following form:

Rs = blockdiag(R^0_s, ..., R^N_s),

where (R^0_s, ..., R^N_s) are the deterministic restriction operators. In the parallel implementation, Rs acts as a scatter operator while R^T_s acts as a gather operator, and they are not constructed explicitly.
A block Gaussian elimination reduces the system in Eq.(4) to the following extended Schur complement system for the interface variable UΓ:

S UΓ = GΓ,   (5)

where the global extended Schur complement matrix S is given by

S = sum_{s=1}^{ns} R^T_s [A^s_ΓΓ − A^s_ΓI (A^s_II)^{-1} A^s_IΓ] Rs,

and the corresponding right-hand-side vector GΓ is

GΓ = sum_{s=1}^{ns} R^T_s [F^s_Γ − A^s_ΓI (A^s_II)^{-1} F^s_I].

Once the interface unknowns UΓ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain:

A^s_II U^s_I = F^s_I − A^s_IΓ Rs UΓ.
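The reduction to Eq.(5) and the interior back-solve can be illustrated on a single dense "subdomain". A numpy sketch verifying that the condensed solve reproduces the solution of the full system:

```python
import numpy as np

# Schur complement reduction: eliminate interior unknowns, solve the
# interface system, then recover the interiors by a local back-solve.
rng = np.random.default_rng(2)
M = rng.standard_normal((7, 7))
A = M @ M.T + 7 * np.eye(7)                  # SPD stand-in stiffness matrix
ni = 4                                       # interior dofs; the rest are interface
A_II, A_IG = A[:ni, :ni], A[:ni, ni:]
A_GI, A_GG = A[ni:, :ni], A[ni:, ni:]
F_I, F_G = rng.standard_normal(ni), rng.standard_normal(3)

S = A_GG - A_GI @ np.linalg.solve(A_II, A_IG)    # Schur complement
G = F_G - A_GI @ np.linalg.solve(A_II, F_I)      # condensed right-hand side
U_G = np.linalg.solve(S, G)                      # interface solve
U_I = np.linalg.solve(A_II, F_I - A_IG @ U_G)    # interior back-solve

# (U_I, U_G) must solve the original full system
U = np.linalg.solve(A, np.concatenate([F_I, F_G]))
assert np.allclose(np.concatenate([U_I, U_G]), U)
```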
4 Solution methods for the extended Schur complement system
Solution methods for linear systems are broadly categorized into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination techniques and are popular for their robustness. However, they are expensive in computation time and memory requirements, and therefore cannot be applied to the solution of very large-scale linear systems [10]. On the other hand, iterative methods generate a sequence of approximate solutions which converge to the true solution. In iterative methods the main arithmetic operation is the matrix-vector multiplication; therefore the linear system itself need not be constructed explicitly, and only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.
4.1 Preconditioned Conjugate Gradient Method (PCGM)

Non-overlapping domain decomposition methods, or iterative substructuring, can be viewed as preconditioned iterative methods to solve the Schur complement system of the form [11]

S UΓ = GΓ.
For a symmetric positive-definite system such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. However, the rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e. a lower condition number κ(S)). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes

M^{-1} S UΓ = M^{-1} GΓ.

In general, κ(M^{-1}S) is much smaller than κ(S), and the eigenvalues of M^{-1}S are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of M^{-1} is not needed. Instead, for a given vector rΓ, a system of the following form is solved:

M Z = rΓ.
The PCGM algorithm to solve the Schur complement system proceeds as follows [10]:
Algorithm 1: The PCGM Algorithm
1. Initialize UΓ_0 = 0
2. Compute rΓ_0 = GΓ − S UΓ_0
3. Precondition Z_0 = M^{-1} rΓ_0
4. First search direction P_0 = Z_0
5. Initialize ρ_0 = (rΓ_0, Z_0)
6. For j = 0, 1, ... until convergence Do
7.   Q_j = S P_j
8.   ρtmp_j = (Q_j, P_j)
9.   α_j = ρ_j / ρtmp_j
10.  UΓ_{j+1} = UΓ_j + α_j P_j
11.  rΓ_{j+1} = rΓ_j − α_j Q_j
12.  Z_{j+1} = M^{-1} rΓ_{j+1}
13.  ρ_{j+1} = (rΓ_{j+1}, Z_{j+1})
14.  β_j = ρ_{j+1} / ρ_j
15.  P_{j+1} = Z_{j+1} + β_j P_j
16. EndDo
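A compact matrix-free transcription of Algorithm 1, with the operator S and the preconditioner M^{-1} passed as callables, as they effectively are in the parallel setting (a sketch; Jacobi is used here only as a stand-in preconditioner):

```python
import numpy as np

def pcgm(matvec, precond, G, tol=1e-5, maxit=200):
    """PCG on S U = G; matvec applies S, precond applies M^{-1}."""
    U = np.zeros_like(G)
    r = G - matvec(U)
    Z = precond(r)
    P = Z.copy()
    rho = r @ Z
    r0 = np.linalg.norm(r)
    for _ in range(maxit):
        Q = matvec(P)                   # step 7
        alpha = rho / (Q @ P)           # steps 8-9
        U += alpha * P                  # step 10
        r -= alpha * Q                  # step 11
        if np.linalg.norm(r) / r0 < tol:
            break
        Z = precond(r)                  # step 12
        rho_new = r @ Z                 # step 13
        P = Z + (rho_new / rho) * P     # steps 14-15
        rho = rho_new
    return U

# Small SPD check with a Jacobi stand-in preconditioner
rng = np.random.default_rng(3)
M = rng.standard_normal((8, 8))
S = M @ M.T + 8 * np.eye(8)
G = rng.standard_normal(8)
U = pcgm(lambda v: S @ v, lambda v: v / np.diag(S), G)
assert np.allclose(S @ U, G, atol=1e-3)
```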
The PCGM algorithm indicates that the main arithmetic operations are calculating the product Q = S P in step 7 and the preconditioned residual Z = M^{-1} rΓ in step 12. These operations can be performed in parallel, as outlined next.
Given the subdomain Schur complement matrices Ss and a global vector P, the matrix-vector product Q = S P can be calculated in parallel as

Q = sum_{s=1}^{ns} R^T_s Ss Rs P,

where ns is the number of subdomains, and Rs and R^T_s are the scatter and gather operators respectively. The parallel implementation of this procedure is summarized in Algorithm (2).
Algorithm 2: Parallel Matrix-Vector Product Procedure
1. Input (P)
2. Scatter: Ps = Rs P
3. Local operation: Qs = Ss Ps
4. Gather: Q = sum_{s=1}^{ns} R^T_s Qs
5. Output (Q)
The working vectors Ps and Qs are defined at the subdomain level.
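A serial numpy sketch of Algorithm 2, with explicit 0/1 restriction matrices standing in for the scatter/gather communication; on shared interface dofs the gathered product accumulates contributions from both subdomains:

```python
import numpy as np

def schur_matvec(P, S_list, R_list):
    """Q = sum_s R_s^T S^s R_s P without assembling the global S."""
    Q = np.zeros_like(P)
    for Ss, Rs in zip(S_list, R_list):
        Ps = Rs @ P              # step 2: scatter to subdomain s
        Q += Rs.T @ (Ss @ Ps)    # steps 3-4: local multiply, then gather
    return Q

# Two subdomains over a 4-dof interface, each seeing 3 of the dofs
R1 = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
R2 = np.array([[0., 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
S1 = np.diag([2., 2, 2])
S2 = np.diag([3., 3, 3])
Q = schur_matvec(np.ones(4), [S1, S2], [R1, R2])
assert np.allclose(Q, [2., 5, 5, 3])   # shared dofs sum both contributions
```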
Similarly, the effect of a parallel preconditioner on a residual vector, Z = M^{-1} rΓ, can be computed as

Z = sum_{s=1}^{ns} R^T_s M^{-1}_s Rs rΓ.
This procedure is outlined in the following algorithm:
Algorithm 3: Parallel Preconditioner Effect Procedure
1. Input (rΓ)
2. Scatter: r^s_Γ = Rs rΓ
3. Local Solve: Ms Zs = r^s_Γ
4. Gather: Z = sum_{s=1}^{ns} R^T_s Zs
5. Output (Z)
The local preconditioner Ms and the working vectors r^s_Γ and Zs are defined at the subdomain level.
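Algorithm 3 has the same scatter/gather pattern as Algorithm 2, but with a local solve in place of the local multiply. A numpy sketch (diagonal local preconditioners chosen only so the expected result is easy to verify by hand):

```python
import numpy as np

def precond_effect(r, M_list, R_list):
    """Z = sum_s R_s^T (M^s)^{-1} R_s r, applied through local solves only."""
    Z = np.zeros_like(r)
    for Ms, Rs in zip(M_list, R_list):
        rs = Rs @ r                          # step 2: scatter the residual
        Z += Rs.T @ np.linalg.solve(Ms, rs)  # steps 3-4: local solve, gather
    return Z

R1 = np.array([[1., 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]])
R2 = np.array([[0., 1, 0, 0], [0, 0, 1, 0], [0, 0, 0, 1]])
Z = precond_effect(np.ones(4), [2 * np.eye(3), 4 * np.eye(3)], [R1, R2])
assert np.allclose(Z, [0.5, 0.75, 0.75, 0.25])
```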
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information
globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector us(θ) into a set of interior unknowns u^s_i(θ), corner unknowns u^s_c(θ) and remaining unknowns u^s_r(θ), as schematically shown in Fig.(2).
Figure 2: Partitioning of the domain nodes into interior, remaining and corner nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as

[ A^s_ii(θ)  A^s_ir(θ)  A^s_ic(θ) ] { u^s_i(θ) }   { f^s_i }
[ A^s_ri(θ)  A^s_rr(θ)  A^s_rc(θ) ] { u^s_r(θ) } = { f^s_r }
[ A^s_ci(θ)  A^s_cr(θ)  A^s_cc(θ) ] { u^s_c(θ) }   { f^s_c }
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

sum_{l=0}^{L} Ψ_l [ A^s_ii,l  A^s_ir,l  A^s_ic,l ] { u^s_i(θ) }   { f^s_i }
                  [ A^s_ri,l  A^s_rr,l  A^s_rc,l ] { u^s_r(θ) } = { f^s_r }   (6)
                  [ A^s_ci,l  A^s_cr,l  A^s_cc,l ] { u^s_c(θ) }   { f^s_c }
The solution process is expressed using the same Polynomial Chaos basis as

{ u^s_i(θ), u^s_r(θ), u^s_c(θ) }^T = sum_{j=0}^{N} Ψ_j(θ) { u^s_i,j, u^s_r,j, u^s_c,j }^T.   (7)
Substituting Eq.(7) into Eq.(6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:

[ A^s_ii  A^s_ir  A^s_ic ] { U^s_i }   { F^s_i }
[ A^s_ri  A^s_rr  A^s_rc ] { U^s_r } = { F^s_r }   (8)
[ A^s_ci  A^s_cr  A^s_cc ] { U^s_c }   { F^s_c }
where

[A^s_αβ]_jk = sum_{l=0}^{L} ⟨Ψl Ψj Ψk⟩ A^s_αβ,l,   F^s_α,k = ⟨Ψk f^s_α⟩,   U^s_α = (u^s_α,0, ..., u^s_α,N)^T,

and the subscripts α and β represent the indices i, r and c.
Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as

[ A^s_ii                      A^s_ir B^s_r                        A^s_ic B^s_c                      ] { U^s_i }   { F^s_i }
[ sum_{s=1}^{ns} B^sT_r A^s_ri    sum_{s=1}^{ns} B^sT_r A^s_rr B^s_r    sum_{s=1}^{ns} B^sT_r A^s_rc B^s_c ] { U_r   } = { sum_{s=1}^{ns} B^sT_r F^s_r }   (9)
[ sum_{s=1}^{ns} B^sT_c A^s_ci    sum_{s=1}^{ns} B^sT_c A^s_cr B^s_r    sum_{s=1}^{ns} B^sT_c A^s_cc B^s_c ] { U_c   }   { sum_{s=1}^{ns} B^sT_c F^s_c }
where B^s_r is a Boolean rectangular matrix that maps the global remaining vector U_r to the local remaining vector U^s_r:

U^s_r = B^s_r U_r.   (10)
Similarly, the restriction operator B^s_c is a Boolean rectangular matrix that maps the global corner vector U_c to the local corner vector U^s_c:

U^s_c = B^s_c U_c.   (11)
In the parallel implementation, both B^s_r and B^s_c act as scatter operators, while B^sT_r and B^sT_c act as gather operators.
The first block equation in Eq.(9) can be solved for U^s_i in parallel as

U^s_i = [A^s_ii]^{-1} (F^s_i − A^s_ir B^s_r U_r − A^s_ic B^s_c U_c).   (12)
Substituting Eq.(12) into Eq.(9) leads to the following condensed system, which represents the Schur complement system in Eq.(5) but with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig.(3):
[ sum_{s=1}^{ns} B^sT_r S^s_rr B^s_r    sum_{s=1}^{ns} B^sT_r S^s_rc B^s_c ] { U_r }   { sum_{s=1}^{ns} B^sT_r G^s_r }
[ sum_{s=1}^{ns} B^sT_c S^s_cr B^s_r    sum_{s=1}^{ns} B^sT_c S^s_cc B^s_c ] { U_c } = { sum_{s=1}^{ns} B^sT_c G^s_c }   (13)
where

S^s_αβ = A^s_αβ − A^s_αi [A^s_ii]^{-1} A^s_iβ,   G^s_α = F^s_α − A^s_αi [A^s_ii]^{-1} F^s_i.
Figure 3: The interface boundary nodes split into remaining and corner nodes.
The corner nodal vector U_c in Eq.(13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem:

(F_rr − F_rc [F_cc]^{-1} F_cr) U_r = d_r − F_rc [F_cc]^{-1} d_c,   (14)
where

F_αβ = sum_{s=1}^{ns} B^sT_α S^s_αβ B^s_β,   d_α = sum_{s=1}^{ns} B^sT_α G^s_α,

and α and β denote the subscripts r and c. The above system can be solved using PCGM with an appropriate preconditioner M^{-1}, defined in the next section.
6.1 A two-level preconditioner

As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that at the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig.(4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:
Figure 4. Partial assembly of the corner nodes (•).
$$
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_r^s \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}_r^s \\ 0 \end{Bmatrix}
\quad (15)
$$
where
$$ \mathcal{F}_r^s = D_r^s B_r^s r_j, $$
$r_j$ is the residual at the $j$th iteration of PCGM, and $D_r^s$ represents a block diagonal weighting matrix which satisfies the property
$$ \sum_{s=1}^{n_s} B_r^{sT} D_r^s B_r^s = I. $$
The diagonal entries of each block of $D_r^s$ are the reciprocals of the number of subdomains sharing the corresponding interface boundary node.
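The partition-of-unity property of the weighting matrices can be illustrated with a small numpy sketch; the node-to-subdomain layout below is hypothetical:

```python
import numpy as np

# Hypothetical layout: 5 global remaining interface nodes; each subdomain
# lists the global indices of the nodes it touches.
subdomain_nodes = [[0, 1, 2], [1, 2, 3], [3, 4]]
n_gamma = 5

# Boolean restriction operators B_r^s (scatter: global -> local).
B = []
for idx in subdomain_nodes:
    Bs = np.zeros((len(idx), n_gamma))
    Bs[np.arange(len(idx)), idx] = 1.0
    B.append(Bs)

# Multiplicity of each global node = number of subdomains sharing it.
mult = sum(Bs.T @ Bs for Bs in B).diagonal()

# D_r^s: diagonal blocks holding the reciprocal multiplicities.
D = [np.diag(1.0 / mult[idx]) for idx in subdomain_nodes]

# Partition of unity: sum_s B^T D B should be the identity.
pou = sum(Bs.T @ Ds @ Bs for Bs, Ds in zip(B, D))
```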
The subdomain-level remaining unknown vector $U_r^s$ can be eliminated in parallel from Eq.(15) as
$$ U_r^s = [S_{rr}^s]^{-1}\left(\mathcal{F}_r^s - S_{rc}^s B_c^s U_c\right). \quad (16) $$
Substituting $U_r^s$ into the second block of Eq.(15) leads to the following coarse problem:
$$ F_{cc}^{*} U_c = d_c^{*} \quad (17) $$
where
$$ F_{cc}^{*} = \sum_{s=1}^{n_s} B_c^{sT}\left(S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s\right) B_c^s, \qquad d_c^{*} = -\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} \mathcal{F}_r^s. $$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as
$$ U_r = \sum_{s=1}^{n_s} B_r^{sT} D_r^s U_r^s. $$
After some algebraic manipulations, the preconditioner can be expressed as
$$ M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + R_0^T [F_{cc}^{*}]^{-1} R_0 \quad (18) $$
where
$$ R_0 = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s. $$
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. Here, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq.(8) leads to the following subdomain equilibrium equation:
$$
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s \\
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix}
\quad (19)
$$
where $B_c^s$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as
$$ U_c^s = B_c^s U_c. $$
Eq.(19) can be rewritten in compact form as
$$ \mathcal{A}^s\, \mathcal{U}^s = \mathcal{F}^s. \quad (20) $$
Let $\mathcal{B}_r^s$ be a block diagonal signed Boolean continuity matrix, defined such that
$$ \sum_{s=1}^{n_s} \mathcal{B}_r^s U_r^s = 0. $$
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:
$$ \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} \rightarrow \min \quad (21) $$
subject to $\mathcal{B}\,\mathcal{U} = 0$, where
$$ \mathcal{A} = \mathrm{blockdiag}\left(\mathcal{A}^1, \ldots, \mathcal{A}^s, \ldots, \mathcal{A}^{n_s}\right), \qquad \mathcal{U} = \left(\mathcal{U}^1, \ldots, \mathcal{U}^s, \ldots, \mathcal{U}^{n_s}\right)^T, $$
$$ \mathcal{F} = \left(\mathcal{F}^1, \ldots, \mathcal{F}^s, \ldots, \mathcal{F}^{n_s}\right)^T, \qquad \mathcal{B} = \left[\, (0\ \ \mathcal{B}_r^1\ \ 0)\ \cdots\ (0\ \ \mathcal{B}_r^s\ \ 0)\ \cdots\ (0\ \ \mathcal{B}_r^{n_s}\ \ 0) \,\right]. $$
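A minimal sketch of the signed Boolean matrix $\mathcal{B}$: with a two-subdomain layout (hypothetical), the product $\mathcal{B}\,\mathcal{U}$ vanishes exactly when the shared remaining dof carries the same value in both subdomains:

```python
import numpy as np

# Hypothetical layout: each subdomain has 2 remaining dofs; local dof 1 of
# subdomain 0 and local dof 0 of subdomain 1 are the same physical node.
B0 = np.array([[0.0, 1.0]])    # +1 selects the copy owned by subdomain 0
B1 = np.array([[-1.0, 0.0]])   # -1 selects the copy owned by subdomain 1

def jump(u0, u1):
    """B U = B0 u0 + B1 u1: the inter-subdomain jump at the shared node."""
    return B0 @ u0 + B1 @ u1
```

One Lagrange multiplier per such signed row then enforces the zero-jump constraint weakly, as in Eq.(22).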
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq.(21) can be expressed as
$$ \mathcal{L}(\mathcal{U}, \Lambda) = \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda. \quad (22) $$
Minimizing Eq.(22) with respect to $\mathcal{U}$ and $\Lambda$ leads to the following equilibrium system:
$$
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\
0 & \sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0 \end{Bmatrix}
\quad (23)
$$
where
$$ \Lambda = (\lambda_0, \ldots, \lambda_N)^T $$
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig.(5).
Eliminating the interior unknowns $U_i^s$ from Eq.(23) gives
$$ U_i^s = [A_{ii}^s]^{-1}\left(F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c\right). \quad (24) $$
Substituting Eq.(24) into Eq.(23) leads to
$$
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\
\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0 \end{Bmatrix}
\quad (25)
$$
where
$$ S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s, \qquad G_{\alpha}^s = F_{\alpha}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s. $$
The subdomain-level remaining unknown vector $U_r^s$ can be obtained in parallel from Eq.(25) as
$$ U_r^s = [S_{rr}^s]^{-1}\left(G_r^s - S_{rc}^s B_c^s U_c - \mathcal{B}_r^{sT} \Lambda\right). \quad (26) $$
Substituting Eq.(26) into Eq.(25) leads to
$$
\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix}
\begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix}
\quad (27)
$$
where
$$ F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\left(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\right)B_c^s, \qquad F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} \mathcal{B}_r^{sT}, $$
$$ F_{rc} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s, \qquad F_{rr} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} \mathcal{B}_r^{sT}, $$
$$ d_c = \sum_{s=1}^{n_s} B_c^{sT}\left(G_c^s - S_{cr}^s[S_{rr}^s]^{-1}G_r^s\right), \qquad d_r = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} G_r^s. $$
Solving for $U_c$ from Eq.(27) gives the following coarse problem:
$$ F_{cc} U_c = d_c + F_{cr} \Lambda. \quad (28) $$
Substituting $U_c$ into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system:
$$ \left(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\right) \Lambda = d_r - F_{rc}[F_{cc}]^{-1} d_c. \quad (29) $$
Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as
$$ M = \sum_{s=1}^{n_s} \mathcal{B}_r^s D_r^s S_{rr}^s D_r^s \mathcal{B}_r^{sT}. \quad (30) $$
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are the same and can be expressed as
$$ F_{cc}^{*} = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\left(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\right)B_c^s. $$
Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as
$$
M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s
+ \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s
\left[\sum_{s=1}^{n_s} B_c^{sT}\left(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\right)B_c^s\right]^{-1}
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s,
$$
which has the same form as the dual-primal operator in Eq.(29).
9 Parallel implementation
In this section we outline the parallel implementation of PCGM to solve the primal (Eq.(14)) and dual-primal (Eq.(29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
9.1 Primal method
In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq.(14).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$ Q_j = \left(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr}\right) P_j, $$
can be computed using the following algorithm.
Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1 Input (P)
2 Scatter: $P^s = B_r^s P$
3 Compute: $v_1^s = S_{cr}^s P^s$
4 Gather: $V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s$
5 Global Solve: $F_{cc} V_2 = V_1$
6 Scatter: $v_2^s = B_c^s V_2$
7 Compute: $v_3^s = S_{rc}^s v_2^s$
8 Update: $Q^s = S_{rr}^s P^s - v_3^s$
9 Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10 Output (Q)
The multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem, since
$$ v_\alpha^s = S_{\alpha\beta}^s v_\beta^s = \left(A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s\right) v_\beta^s. $$
This procedure is outlined in the following algorithm.
Algorithm 5 Dirichlet Solver Procedure
1 Input ($v_\beta^s$)
2 Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3 Solve: $A_{ii}^s v_2^s = v_1^s$
4 Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5 Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6 Compute: $v_\alpha^s = v_4^s - v_3^s$
7 Output ($v_\alpha^s$)
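Algorithm 5 translates directly into code. The numpy sketch below uses random stand-in blocks and applies the Schur complement product without ever forming $S_{\alpha\beta}^s$:

```python
import numpy as np

def schur_matvec(A_ii, A_ib, A_ai, A_ab, v_b):
    """Apply S = A_ab - A_ai A_ii^{-1} A_ib to v_b (Algorithm 5)."""
    v1 = A_ib @ v_b                  # step 2
    v2 = np.linalg.solve(A_ii, v1)   # step 3: interior (Dirichlet) solve
    v3 = A_ai @ v2                   # step 4
    v4 = A_ab @ v_b                  # step 5
    return v4 - v3                   # step 6

# Synthetic SPD blocks (3 interior + 2 boundary dofs, illustrative).
rng = np.random.default_rng(3)
K = rng.standard_normal((5, 5))
A = K @ K.T + 5.0 * np.eye(5)
A_ii, A_ib, A_ai, A_ab = A[:3, :3], A[:3, 3:], A[3:, :3], A[3:, 3:]
v_b = rng.standard_normal(2)
v_a = schur_matvec(A_ii, A_ib, A_ai, A_ab, v_b)
S_explicit = A_ab - A_ai @ np.linalg.solve(A_ii, A_ib)  # reference only
```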
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner:
$$ M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1, \qquad M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s. $$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6 Two-Level Preconditioner Effect Procedure
1 Input ($r_\Gamma$)
2 Scatter: $\mathcal{F}_r^s = D_r^s B_r^s r_\Gamma$
3 Local Solve: $S_{rr}^s v_1^s = \mathcal{F}_r^s$
4 Compute: $d_c^s = S_{cr}^s v_1^s$
5 Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6 Global Solve: $F_{cc}^{*} Z_c = -d_c$
7 Scatter: $Z_c^s = B_c^s Z_c$
8 Update: $v_2^s = \mathcal{F}_r^s + S_{rc}^s Z_c^s$
9 Local Solve: $S_{rr}^s Z_f^s = v_2^s$
10 Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11 Output (Z)
The local solves in steps 3 and 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7 Neumann-Solver Procedure
1 Input ($r_r^s$)
2 Solve:
$$ \begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix} $$
3 Output ($U_r^s$)
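Algorithm 7 amounts to one solve with the full subdomain (Neumann) matrix under a zero interior load; the interface part of the solution then satisfies $S_{rr}^s U_r^s = r_r^s$. A numpy sketch with synthetic SPD blocks:

```python
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Solve S_rr u_r = r_r via the augmented system of Algorithm 7."""
    n_i = A_ii.shape[0]
    A = np.block([[A_ii, A_ir], [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])   # zero interior load
    return np.linalg.solve(A, rhs)[n_i:]         # keep the interface part

rng = np.random.default_rng(4)
K = rng.standard_normal((6, 6))
A = K @ K.T + 6.0 * np.eye(6)
A_ii, A_ir, A_ri, A_rr = A[:4, :4], A[:4, 4:], A[4:, :4], A[4:, 4:]
r_r = rng.standard_normal(2)
u_r = neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r)
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)  # reference only
```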
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner:
$$ M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c, \qquad M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s. $$
Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2 Dual-primal method
In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem, Eq.(29).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$ Q_j = \left(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\right) P_j, $$
can be computed using the following algorithm.
Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1 Input (P)
2 Scatter: $P^s = \mathcal{B}_r^{sT} P$
3 Local Solve: $S_{rr}^s v_1^s = P^s$
4 Compute: $v_2^s = S_{cr}^s v_1^s$
5 Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6 Global Solve: $F_{cc} V_3 = V_2$
7 Scatter: $v_3^s = B_c^s V_3$
8 Compute: $v_4^s = S_{rc}^s v_3^s$
9 Update: $v_5^s = P^s + v_4^s$
10 Local Solve: $S_{rr}^s Q^s = v_5^s$
11 Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^s Q^s$
12 Output (Q)
The local solves in steps 3 and 10 of Algorithm (8) are computed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1 Input ($r_\Gamma$)
2 Scatter: $r_\Gamma^s = D_r^s \mathcal{B}_r^{sT} r_\Gamma$
3 Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4 Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^s D_r^s Z^s$
5 Output (Z)
We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:
$$ \frac{\partial}{\partial x}\left[c_x(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial y}\right] = f(x,y) \quad \text{in } \Omega, $$
where the forcing term is $f(x,y) = 1.0$. For simplicity, a homogeneous Dirichlet boundary condition is imposed:
$$ u(x,y,\theta) = 0 \quad \text{on } \partial\Omega. $$
The random coefficients $c_x(x,y,\theta)$ and $c_y(x,y,\theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has mean 1.0 and standard deviation 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals drops below $10^{-5}$:
$$ \frac{\| G_\Gamma - S\, U_\Gamma^k \|_2}{\| G_\Gamma \|_2} \leq 10^{-5}. $$
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1
10.2 Scalability study
First, we study the scalability of the algorithms with respect to the problem size: the number of subdomains is fixed at 100 while both the mesh resolution in the spatial dimension and the Polynomial Chaos order are increased, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods either. Note that for a given spatial problem size (n), a first order Polynomial Chaos expansion leads to a total problem size of (3 × n), and a third order Polynomial Chaos expansion leads to
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly, we fix the problem size in the spatial domain at 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains.
Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.
Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).
Problem size | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
10051 | 10 10 10 | 8 8 8
20303 | 11 11 11 | 8 8 8
40811 | 11 12 12 | 8 9 9
59935 | 13 14 14 | 10 10 10
71386 | 12 12 12 | 9 9 9
80172 | 11 11 12 | 8 8 8
101851 | 12 12 12 | 9 9 9
Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of the two methods is almost the same, as indicated in Figs.(24-26).
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.
CPUs | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
20 | 10 11 11 | 8 8 8
40 | 12 12 12 | 9 9 9
60 | 12 13 13 | 9 9 9
80 | 12 12 13 | 9 9 9
100 | 12 12 12 | 9 9 9
120 | 12 12 12 | 9 9 9
140 | 11 11 12 | 8 8 8
160 | 12 12 12 | 8 8 9
Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).
Subdomains | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
100 | 10 10 10 | 8 8 8
200 | 10 10 11 | 8 8 8
400 | 12 13 13 | 9 9 9
600 | 11 12 12 | 8 8 9
800 | 12 13 13 | 9 9 9
more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of the randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performance of the primal and the dual-primal methods demonstrates a similar trend; this points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. It is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).
CoV | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
0.05 | 10 10 10 | 8 8 8
0.10 | 10 10 10 | 8 8 8
0.15 | 10 10 10 | 8 8 8
0.20 | 10 10 10 | 8 8 8
0.25 | 10 10 10 | 8 8 8
0.30 | 10 10 11 | 8 8 8
0.35 | 10 10 11 | 8 8 8
0.40 | 10 11 11 | 8 8 9
0.45 | 10 11 12 | 8 8 9
0.50 | 10 11 12 | 8 8 9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
Neumann-Neumann preconditioner in [4] demonstrates good (strong and weak) scalability for the moderate range of CPUs considered.
In this paper, we first describe a primal version of iterative substructuring methods for the solution of the large-scale linear systems arising from the stochastic finite element method. The algorithm offers a straightforward approach to formulate a two-level scalable preconditioner. The continuity condition is strictly enforced on the corner nodes (nodes shared among more than two subdomains, including the nodes at the ends of the interface edges). For the remaining part of the interface boundary, the continuity condition is satisfied weakly (in an average sense). Note that the continuity of the solution field across the entire interface boundary is eventually satisfied at the convergence of the iterations. This approach naturally leads to a coarse grid which connects the subdomains globally via the corner nodes. The coarse grid provides a mechanism to propagate information globally, which makes the algorithm scalable with respect to the subdomain size. In the second part of the paper, a dual-primal iterative substructuring method is introduced for SPDEs, which may be viewed as an extension of the Dual-Primal Finite Element Tearing and Interconnecting method (FETI-DP) [5] to SPDEs. In this approach, the continuity condition on the corner nodes is strictly satisfied by partial assembly, while Lagrange multipliers are used to enforce continuity on the remaining part of the interface boundary. The system of Lagrange multipliers (also called the dual variables) is solved iteratively using PCGM equipped with a Dirichlet preconditioner. The PETSc [6] and MPI [7] libraries are used for an efficient parallel implementation of the primal and dual-primal algorithms. The graph partitioning tool METIS [8] is used for optimal decomposition of the finite element mesh, for load balancing and minimum interprocessor communication. The parallel performance of the algorithms is studied for a two dimensional stochastic elliptic PDE with non-Gaussian random coefficients. The numerical experiments are performed using a Linux cluster.
2 Uncertainty representation by stochastic processes
This section provides a brief review of the theory of stochastic processes relevant to the subsequent developments of the paper [9, 1]. We assume that the data induces a representation of the model parameters as random variables and processes which span the Hilbert space $H_G$. A set of basis functions $\{\xi_i\}$ is identified to characterize this space using the Karhunen-Loeve expansion. The state of the system resides in the Hilbert space $H_L$, with basis functions $\{\Psi_i\}$ identified with the Polynomial Chaos (PC) expansion. The Karhunen-Loeve expansion of a stochastic process $\alpha(x, \theta)$ is based on the spectral expansion of its covariance function $R_{\alpha\alpha}(x, y)$. The expansion takes the following form:
$$ \alpha(x, \theta) = \bar{\alpha}(x) + \sum_{i=1}^{\infty} \sqrt{\lambda_i}\, \xi_i(\theta)\, \phi_i(x), $$
where $\bar{\alpha}(x)$ is the mean of the stochastic process, $\theta$ represents the random dimension, and $\{\xi_i(\theta)\}$ is a set of uncorrelated (but, for non-Gaussian processes, not generally independent) random variables. Here $\phi_i(x)$ are the eigenfunctions and $\lambda_i$ the eigenvalues of the covariance kernel, obtained as the solution of the following integral equation:
$$ \int_D R_{\alpha\alpha}(x, y)\, \phi_i(y)\, dy = \lambda_i\, \phi_i(x), $$
where $D$ denotes the spatial domain over which the process is defined. The covariance function of the solution process is not known a priori, and hence the Karhunen-Loeve expansion cannot be used to represent it. Therefore a generic basis that is complete in the space of all second-order random variables is identified and used in the approximation process. Since the solution
process is a function of the material properties, the nodal solution variables, denoted by $u(\theta)$, can be formally expressed as some nonlinear functional of the set $\{\xi_i(\theta)\}$ used to represent the material stochasticity. It has been shown that this functional dependence can be expanded in terms of polynomials in Gaussian random variables, namely the Polynomial Chaos [9]:
$$ u(\theta) = \sum_{j=0}^{N} \Psi_j(\theta)\, u_j. $$
These polynomials are orthogonal in the sense that their inner product $\langle \Psi_j \Psi_k \rangle$, defined as the statistical average of their product, is equal to zero for $j \neq k$.
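On a discrete grid, the Karhunen-Loeve modes reduce to eigenpairs of the sampled covariance matrix. A minimal numpy sketch, assuming an exponential covariance kernel and a uniform grid (both illustrative choices, not taken from the paper):

```python
import numpy as np

# Sample an exponential covariance R(x, y) = exp(-|x - y| / b) on a grid.
n = 50
x = np.linspace(0.0, 1.0, n)
b = 0.3                                   # assumed correlation length
R = np.exp(-np.abs(x[:, None] - x[None, :]) / b)

# Discrete analogue of the KL integral eigenproblem (uniform quadrature
# weights omitted, so this only approximates the continuum modes).
lam, phi = np.linalg.eigh(R)
lam, phi = lam[::-1], phi[:, ::-1]        # sort eigenvalues descending

# Truncate the expansion at, e.g., 95% of the total variance.
energy = np.cumsum(lam) / np.sum(lam)
m = int(np.searchsorted(energy, 0.95)) + 1
```

The truncated sum $\bar\alpha + \sum_{i=1}^{m} \sqrt{\lambda_i}\,\xi_i\,\phi_i$ then approximates the process with $m$ random variables.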
3 Review of Schur complement based domain decomposition method of SPDEs
Consider an elliptic stochastic PDE defined on a domain $\Omega$ with given boundary conditions on $\partial\Omega$. Finite element discretization of the stochastic PDE leads to the following linear system:
$$ A(\theta)\, u(\theta) = f \quad (1) $$
where $A(\theta)$ is the stiffness matrix with random coefficients, $u(\theta)$ is the stochastic process representing the response vector, and $f$ is the applied force. For large-scale systems, Eq.(1) can be solved efficiently using domain decomposition methods [1, 2].
In the domain decomposition method, the spatial domain $\Omega$ is partitioned into $n_s$ non-overlapping subdomains $\Omega_s$, $1 \leq s \leq n_s$, such that
$$ \Omega = \bigcup_{s=1}^{n_s} \Omega_s, \qquad \Omega_s \cap \Omega_r = \emptyset,\ s \neq r, $$
and
$$ \Gamma = \bigcup_{s=1}^{n_s} \Gamma_s, \quad \text{where } \Gamma_s = \partial\Omega_s \setminus \partial\Omega. $$
For a typical subdomain $\Omega_s$, the nodal vector $u^s(\theta)$ is partitioned into a set of interior unknowns $u_I^s(\theta)$, associated with the nodes in the interior of $\Omega_s$, and interface unknowns $u_\Gamma^s(\theta)$, associated with the nodes that are shared among two or more subdomains, as shown in Fig.(1).
Consequently, the subdomain equilibrium equation can be represented as
$$ \begin{bmatrix} A_{II}^s(\theta) & A_{I\Gamma}^s(\theta) \\ A_{\Gamma I}^s(\theta) & A_{\Gamma\Gamma}^s(\theta) \end{bmatrix} \begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}. $$
The Polynomial Chaos expansion can be used to represent the uncertainty in the model parameters as
$$ \sum_{i=0}^{L} \Psi_i \begin{bmatrix} A_{II,i}^s & A_{I\Gamma,i}^s \\ A_{\Gamma I,i}^s & A_{\Gamma\Gamma,i}^s \end{bmatrix} \begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}. $$
A Boolean restriction operator $\mathcal{R}_s$ of size $(n_\Gamma^s \times n_\Gamma)$, which maps the global interface vector $u_\Gamma(\theta)$ to the local interface vector $u_\Gamma^s(\theta)$, is defined by
$$ u_\Gamma^s(\theta) = \mathcal{R}_s u_\Gamma(\theta). $$
Figure 1. Partitioning of the domain nodes into interior and interface nodes.
Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed as the following block linear system of equations:
$$
\sum_{i=0}^{L} \Psi_i
\begin{bmatrix}
A_{II,i}^1 & \cdots & 0 & A_{I\Gamma,i}^1 \mathcal{R}_1 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & A_{II,i}^{n_s} & A_{I\Gamma,i}^{n_s} \mathcal{R}_{n_s} \\
\mathcal{R}_1^T A_{\Gamma I,i}^1 & \cdots & \mathcal{R}_{n_s}^T A_{\Gamma I,i}^{n_s} & \sum_{s=1}^{n_s} \mathcal{R}_s^T A_{\Gamma\Gamma,i}^s \mathcal{R}_s
\end{bmatrix}
\begin{Bmatrix}
u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_I^1 \\ \vdots \\ f_I^{n_s} \\ \sum_{s=1}^{n_s} \mathcal{R}_s^T f_\Gamma^s
\end{Bmatrix}
\quad (2)
$$
The solution process can be expanded using the same Polynomial Chaos basis as
$$
\begin{Bmatrix}
u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta)
\end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix}
u_{I,j}^1 \\ \vdots \\ u_{I,j}^{n_s} \\ u_{\Gamma,j}
\end{Bmatrix}
\quad (3)
$$
Substituting Eq.(3) into Eq.(2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained:
$$
\begin{bmatrix}
\mathcal{A}_{II}^1 & \cdots & 0 & \mathcal{A}_{I\Gamma}^1 \mathcal{R}_1 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & \mathcal{A}_{II}^{n_s} & \mathcal{A}_{I\Gamma}^{n_s} \mathcal{R}_{n_s} \\
\mathcal{R}_1^T \mathcal{A}_{\Gamma I}^1 & \cdots & \mathcal{R}_{n_s}^T \mathcal{A}_{\Gamma I}^{n_s} & \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{A}_{\Gamma\Gamma}^s \mathcal{R}_s
\end{bmatrix}
\begin{Bmatrix} U_I^1 \\ \vdots \\ U_I^{n_s} \\ U_\Gamma \end{Bmatrix}
=
\begin{Bmatrix} F_I^1 \\ \vdots \\ F_I^{n_s} \\ \sum_{s=1}^{n_s} \mathcal{R}_s^T F_\Gamma^s \end{Bmatrix}
\quad (4)
$$
where
$$ [\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle\, A_{\alpha\beta,i}^s, \qquad F_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, $$
$$ U_I^m = (u_{I,0}^m, \ldots, u_{I,N}^m)^T, \qquad U_\Gamma = (u_{\Gamma,0}, \ldots, u_{\Gamma,N})^T, $$
and the subscripts $\alpha$ and $\beta$ represent the indices $I$ and $\Gamma$. The coefficient matrix in Eq.(4) is of order $n(N+1) \times n(N+1)$, where $n$ and $(N+1)$ denote the total number of degrees of freedom and of Chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq.(4) takes the following form:
$$ \mathcal{R}_s = \mathrm{blockdiag}(\mathcal{R}_s^0, \ldots, \mathcal{R}_s^N), $$
where $(\mathcal{R}_s^0, \ldots, \mathcal{R}_s^N)$ are the deterministic restriction operators. In the parallel implementation, $\mathcal{R}_s$ acts as a scatter operator while $\mathcal{R}_s^T$ acts as a gather operator; they are not constructed explicitly.
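For one-dimensional probabilists' Hermite polynomials, the Galerkin coefficients $\langle \Psi_i \Psi_j \Psi_k \rangle$ have a classical closed form (multidimensional chaos coefficients follow as products of one-dimensional factors); a short sketch, using $\langle He_i^2 \rangle = i!$:

```python
from math import factorial

def hermite_triple(i, j, k):
    """<He_i He_j He_k> under the standard Gaussian measure:
    i! j! k! / ((s-i)! (s-j)! (s-k)!) with s = (i+j+k)/2,
    zero if i+j+k is odd or the triangle condition fails."""
    total = i + j + k
    if total % 2:
        return 0
    s = total // 2
    if s < max(i, j, k):
        return 0
    return (factorial(i) * factorial(j) * factorial(k)
            // (factorial(s - i) * factorial(s - j) * factorial(s - k)))
```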
A block Gaussian elimination reduces the system in Eq.(4) to the following extended Schur complement system for the interface variable $U_\Gamma$:
$$ S\, U_\Gamma = G_\Gamma \quad (5) $$
where the global extended Schur complement matrix $S$ is given by
$$ S = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{A}_{\Gamma\Gamma}^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{A}_{I\Gamma}^s \right] \mathcal{R}_s,
$$
and the corresponding right-hand-side vector $G_\Gamma$ is
$$ G_\Gamma = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ F_\Gamma^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} F_I^s \right]. $$
Once the interface unknowns $U_\Gamma$ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain:
$$ \mathcal{A}_{II}^s U_I^s = F_I^s - \mathcal{A}_{I\Gamma}^s \mathcal{R}_s U_\Gamma. $$
4 Solution methods for the extended Schur complement system
Solution methods for linear systems are broadly categorized into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination and are popular for their robustness; however, they are expensive in computation time and memory, and therefore cannot be applied to the solution of very large-scale linear systems [10]. Iterative methods, on the other hand, generate a sequence of approximate solutions which converges to the true solution. In iterative methods the main arithmetic operation is the matrix-vector multiplication, so the linear system itself need not be constructed explicitly; only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.
41 Preconditioned Conjugate Gradient Method (PCGM)Non-overlapping domain decomposition method or iterative substructuring can be viewed as apreconditioned iterative method to solve the Schur complement system of the form [11]
S UΓ = GΓ
For symmetric positive-definite system such as Schur complement system the ConjugateGradient Method (CGM) is generally used The performance of CGM mainly depends on thespectrum of the coefficient matrix However the rate of convergence of the iterative methodcan generally be improved by transforming the original system into an equivalent system thathas better spectral properties (ie lower condition number κ(S)) of the coefficient matrix Thistransformation is called preconditioning and the matrix used in the transformation is called thepreconditioner In other words the transformed linear system becomes
Mminus1S UΓ = Mminus1GΓ
In general κ(Mminus1S) is much smaller than κ(S) and the eigenvalues of Mminus1S are clusterednear one This procedure known as Preconditioned Conjugate Gradient Method (PCGM) Inpractice the explicit construction of Mminus1 is not needed Instead for a given vector rΓ a systemof the the following form is solved
\[ M Z = r_\Gamma \]
The PCGM algorithm to solve the Schur complement system proceeds as follows [10]
Algorithm 1: The PCGM Algorithm
1. Initialize $U_{\Gamma,0} = 0$
2. Compute $r_{\Gamma,0} = G_\Gamma - S\,U_{\Gamma,0}$
3. Precondition $Z_0 = M^{-1} r_{\Gamma,0}$
4. First search direction $P_0 = Z_0$
5. Initialize $\rho_0 = (r_{\Gamma,0}, Z_0)$
6. For $j = 0, 1, \cdots$ until convergence Do
7.   $Q_j = S\,P_j$
8.   $\rho_{tmp,j} = (Q_j, P_j)$
9.   $\alpha_j = \rho_j / \rho_{tmp,j}$
10.  $U_{\Gamma,j+1} = U_{\Gamma,j} + \alpha_j P_j$
11.  $r_{\Gamma,j+1} = r_{\Gamma,j} - \alpha_j Q_j$
12.  $Z_{j+1} = M^{-1} r_{\Gamma,j+1}$
13.  $\rho_{j+1} = (r_{\Gamma,j+1}, Z_{j+1})$
14.  $\beta_j = \rho_{j+1} / \rho_j$
15.  $P_{j+1} = Z_{j+1} + \beta_j P_j$
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are the matrix-vector product $Q = S P$ in step 7 and the preconditioned residual $Z = M^{-1} r_\Gamma$ in step 12. These operations can be performed in parallel, as outlined next.
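As an illustrative sketch (not the authors' implementation), the PCGM iteration of Algorithm 1 can be written compactly in Python/NumPy; here `apply_S` and `apply_Minv` are hypothetical callables standing in for the matrix-vector product and preconditioner procedures:

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradients for S U = G (Algorithm 1)."""
    U = np.zeros_like(G)
    r = G - apply_S(U)                 # initial residual
    Z = apply_Minv(r)                  # preconditioned residual
    P = Z.copy()                       # first search direction
    rho = r @ Z
    r0_norm = np.linalg.norm(G)
    for _ in range(max_iter):
        Q = apply_S(P)
        alpha = rho / (Q @ P)
        U += alpha * P
        r -= alpha * Q
        if np.linalg.norm(r) / r0_norm <= tol:   # relative-residual test
            break
        Z = apply_Minv(r)
        rho_new = r @ Z
        P = Z + (rho_new / rho) * P
        rho = rho_new
    return U
```

Only the two operator applications touch the (distributed) system; everything else is vector arithmetic that parallelizes trivially.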
Given the subdomain Schur complement matrices $S^s$ and a global vector $P$, the matrix-vector product $Q = S P$ can be calculated in parallel as

\[ Q = \sum_{s=1}^{n_s} R_s^T S^s R_s P \]

where $n_s$ is the number of subdomains, and $R_s$ and $R_s^T$ are scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm 2.

Algorithm 2: Parallel Matrix-Vector Product Procedure
1. Input ($P$)
2. Scatter: $P^s = R_s P$
3. Local operation: $Q^s = S^s P^s$
4. Gather: $Q = \sum_{s=1}^{n_s} R_s^T Q^s$
5. Output ($Q$)

The working vectors $P^s$ and $Q^s$ are defined on the subdomain level.
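In code, the Boolean scatter/gather operators reduce to index arrays, so $R_s$ is never formed explicitly. A minimal serial emulation of Algorithm 2 (illustrative only; a real implementation would distribute the loop over MPI ranks):

```python
import numpy as np

def matvec_schur(P, S_list, idx_list):
    """Q = sum_s R_s^T S^s R_s P, with each R_s encoded as an index array."""
    Q = np.zeros_like(P)
    for S_s, idx in zip(S_list, idx_list):
        P_s = P[idx]            # scatter: restrict global vector to subdomain
        Q_s = S_s @ P_s         # local Schur complement times local vector
        np.add.at(Q, idx, Q_s)  # gather: accumulate into the global vector
    return Q
```

`np.add.at` accumulates contributions correctly at interface indices shared by several subdomains.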
Similarly, the effect of a parallel preconditioner on a residual vector, $Z = M^{-1} r_\Gamma$, can be computed as

\[ Z = \sum_{s=1}^{n_s} R_s^T M_s^{-1} R_s r_\Gamma \]

This procedure is outlined in the following algorithm.

Algorithm 3: Parallel Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r_\Gamma^s = R_s r_\Gamma$
3. Local Solve: $M_s Z^s = r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} R_s^T Z^s$
5. Output ($Z$)

The local preconditioner $M_s$ and the working vectors $r_\Gamma^s$ and $Z^s$ are defined on the subdomain level.
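The same index-array pattern gives the preconditioner action; the only change is that step 3 is a local solve rather than a multiply. A sketch, assuming each local $M_s$ has been Cholesky-factorized once up front (names hypothetical):

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve

def apply_preconditioner(r, M_factors, idx_list):
    """Z = sum_s R_s^T M_s^{-1} R_s r (Algorithm 3), M_s prefactorized."""
    Z = np.zeros_like(r)
    for factor, idx in zip(M_factors, idx_list):
        Z_s = cho_solve(factor, r[idx])   # local solve M_s Z^s = r_Gamma^s
        np.add.at(Z, idx, Z_s)            # gather subdomain contributions
    return Z
```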
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information
globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u_i^s(\theta)$, corner unknowns $u_c^s(\theta)$ and remaining unknowns $u_r^s(\theta)$, as schematically shown in Fig. 2.
Figure 2. Partitioning the domain nodes into interior, remaining and corner (•) nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as

\[
\begin{bmatrix}
A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\
A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\
A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta)
\end{bmatrix}
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}
\]
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation

\[
\sum_{l=0}^{L} \Psi_l
\begin{bmatrix}
A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\
A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\
A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s
\end{bmatrix}
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}
\tag{6}
\]
The solution process is expressed using the same Polynomial Chaos basis as

\[
\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix} u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s \end{Bmatrix}
\tag{7}
\]
Substituting Eq.(7) into Eq.(6) and performing a Galerkin projection leads to the following coupled deterministic systems of equations

\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s \\
A_{ri}^s & A_{rr}^s & A_{rc}^s \\
A_{ci}^s & A_{cr}^s & A_{cc}^s
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c^s \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ F_c^s \end{Bmatrix}
\tag{8}
\]
where

\[ [A_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A_{\alpha\beta,l}^s, \qquad
F_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, \qquad
U_\alpha^s = (u_{\alpha,0}^s, \cdots, u_{\alpha,N}^s)^T \]

and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.
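For a one-dimensional chaos basis, the triple products $\langle \Psi_l \Psi_j \Psi_k \rangle$ can be evaluated by Gauss-Hermite quadrature, after which each Galerkin block $[A]_{jk}$ is a weighted sum of the chaos coefficients of $A^s$. A small illustrative sketch (practical bases are multi-dimensional; function names are our own):

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

def triple_products(order, n_quad=20):
    """c[l,j,k] = <Psi_l Psi_j Psi_k> for probabilists' Hermite polynomials."""
    x, w = hermegauss(n_quad)        # nodes/weights for weight exp(-x^2/2)
    w = w / np.sqrt(2.0 * np.pi)     # normalize to a probability measure
    # rows: He_0, ..., He_order evaluated at the quadrature points
    P = np.array([hermeval(x, np.eye(order + 1)[n]) for n in range(order + 1)])
    return np.einsum('lq,jq,kq,q->ljk', P, P, P, w)

def galerkin_block(A_coeffs, c):
    """Assemble [A]_{jk} = sum_l <Psi_l Psi_j Psi_k> A_l as one block matrix."""
    L1, n, _ = A_coeffs.shape
    N1 = c.shape[1]
    A = np.zeros((N1 * n, N1 * n))
    for j in range(N1):
        for k in range(N1):
            A[j*n:(j+1)*n, k*n:(k+1)*n] = np.tensordot(c[:L1, j, k], A_coeffs, 1)
    return A
```

A mean-only expansion (all higher chaos coefficients of $A$ zero) produces a block-diagonal Galerkin matrix, which is a quick sanity check on the assembly.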
Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as

\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s B_r^s & A_{ic}^s B_c^s \\
\sum_{s=1}^{n_s} B_r^{sT} A_{ri}^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ \sum_{s=1}^{n_s} B_r^{sT} F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix}
\tag{9}
\]
where $B_r^s$ is a Boolean rectangular matrix that maps the global remaining vector $U_r$ to the local remaining vector $U_r^s$ as

\[ U_r^s = B_r^s U_r \tag{10} \]

Similarly, the restriction operator $B_c^s$ is a Boolean rectangular matrix that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as

\[ U_c^s = B_c^s U_c \tag{11} \]

In the parallel implementation, both $B_r^s$ and $B_c^s$ act as scatter operators, while $B_r^{sT}$ and $B_c^{sT}$ act as gather operators.
The first block equation in Eq.(9) can be solved for $U_i^s$ in parallel as

\[ U_i^s = [A_{ii}^s]^{-1} \big( F_i^s - A_{ir}^s B_r^s U_r - A_{ic}^s B_c^s U_c \big) \tag{12} \]
Substituting Eq.(12) into Eq.(9) leads to the following condensed system, which represents the Schur complement system in Eq.(5), but now with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig. 3.
\[
\begin{bmatrix}
\sum_{s=1}^{n_s} B_r^{sT} S_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} S_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} \sum_{s=1}^{n_s} B_r^{sT} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \end{Bmatrix}
\tag{13}
\]

where

\[ S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s, \qquad
G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s \]
Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes
The corner nodal vector $U_c$ in Eq.(13) is eliminated next to obtain the following symmetric positive-definite reduced interface problem

\[ \big( F_{rr} - F_{rc} [F_{cc}]^{-1} F_{cr} \big) U_r = d_r - F_{rc} [F_{cc}]^{-1} d_c \tag{14} \]

where

\[ F_{\alpha\beta} = \sum_{s=1}^{n_s} B_\alpha^{sT} S_{\alpha\beta}^s B_\beta^s, \qquad
d_\alpha = \sum_{s=1}^{n_s} B_\alpha^{sT} G_\alpha^s \]

and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $M^{-1}$, defined in the next section.
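As a structural sketch of Eq.(14) (illustrative only; small dense matrices stand in for the distributed operators), the reduced operator can be applied matrix-free inside a conjugate gradient solver, with the coarse block $F_{cc}$ factorized once:

```python
import numpy as np
from scipy.linalg import cho_factor, cho_solve
from scipy.sparse.linalg import LinearOperator, cg

def solve_reduced_interface(Frr, Frc, Fcr, Fcc, dr, dc):
    """Solve (Frr - Frc Fcc^{-1} Fcr) Ur = dr - Frc Fcc^{-1} dc (Eq. 14)."""
    fcc = cho_factor(Fcc)                    # coarse-problem factorization
    def apply_op(v):                         # matrix-free operator action
        return Frr @ v - Frc @ cho_solve(fcc, Fcr @ v)
    n = Frr.shape[0]
    rhs = dr - Frc @ cho_solve(fcc, dc)
    Ur, info = cg(LinearOperator((n, n), matvec=apply_op), rhs)
    assert info == 0                         # CG converged
    return Ur
```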
6.1. A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This is schematically illustrated in Fig. 4 [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system
Figure 4. Partial assembly of the corner nodes (•).
\[
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_r^s \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F_r^s \\ 0 \end{Bmatrix}
\tag{15}
\]
where

\[ F_r^s = D_r^s B_r^s r_j \]

and $r_j$ is the residual at the $j$-th iteration of the PCGM, and $D_r^s$ represents a block-diagonal weighting matrix which satisfies the following property

\[ \sum_{s=1}^{n_s} B_r^{sT} D_r^s B_r^s = I \]

The diagonal entries of each block of $D_r^s$ are the reciprocal of the number of subdomains that share the corresponding interface boundary node.
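The weighting matrices can be built directly from node multiplicities; a small sketch (index arrays encode $B_r^s$ as before, and the function name is our own):

```python
import numpy as np

def counting_weights(idx_list, n_global):
    """Diagonal of each D^s: reciprocal of the number of subdomains sharing
    each interface node, so that sum_s B^sT D^s B^s = I."""
    mult = np.zeros(n_global)
    for idx in idx_list:
        np.add.at(mult, idx, 1.0)     # count subdomains touching each node
    return [1.0 / mult[idx] for idx in idx_list]
```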
The subdomain-level remaining unknown vector $U_r^s$ can be eliminated in parallel from Eq.(15) as

\[ U_r^s = [S_{rr}^s]^{-1} \big( F_r^s - S_{rc}^s B_c^s U_c \big) \tag{16} \]
Substituting $U_r^s$ into the second block of Eq.(15) leads to the following coarse problem

\[ F_{cc}^* U_c = d_c^* \tag{17} \]

where

\[ F_{cc}^* = \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s, \qquad
d_c^* = -\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} F_r^s \]
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

\[ U_r = \sum_{s=1}^{n_s} B_r^{sT} D_r^s U_r^s \]

After some algebraic manipulations, the preconditioner can be expressed as

\[ M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + R_0^T [F_{cc}^*]^{-1} R_0 \tag{18} \]

where

\[ R_0 = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s \]
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq.(8) leads to the following subdomain equilibrium equation

\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s \\
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix}
\tag{19}
\]
where $B_c^s$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as

\[ U_c^s = B_c^s U_c \]
Eq.(19) can be rewritten in compact form as

\[ A^s U^s = F^s \tag{20} \]

Let $B_r^s$ be a block-diagonal signed Boolean continuity matrix, defined such that

\[ \sum_{s=1}^{n_s} B_r^s U_r^s = 0 \]
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem

\[ \tfrac{1}{2} U^T A U - U^T F \rightarrow \min \quad \text{subject to} \quad B U = 0 \tag{21} \]

where

\[ A = \mathrm{blockdiag}\big( A^1, \cdots, A^s, \cdots, A^{n_s} \big), \qquad
U = \big( U^1, \cdots, U^s, \cdots, U^{n_s} \big)^T, \qquad
F = \big( F^1, \cdots, F^s, \cdots, F^{n_s} \big)^T \]

\[ B = \big[ (0 \;\; B_r^1 \;\; 0) \;\; \cdots \;\; (0 \;\; B_r^s \;\; 0) \;\; \cdots \;\; (0 \;\; B_r^{n_s} \;\; 0) \big] \]
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq.(21) can be expressed as

\[ \mathcal{L}(U, \Lambda) = \tfrac{1}{2} U^T A U - U^T F + U^T B^T \Lambda \tag{22} \]
Minimizing Eq.(22) with respect to $U$ and $\Lambda$ leads to the following equilibrium system

\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & B_r^{sT} \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\
0 & \sum_{s=1}^{n_s} B_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0 \end{Bmatrix}
\tag{23}
\]
where

\[ \Lambda = \big( \lambda_0, \cdots, \lambda_N \big)^T \]
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. 5.

Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
Eliminating the interior unknowns $U_i^s$ from Eq.(23) gives

\[ U_i^s = [A_{ii}^s]^{-1} \big( F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c \big) \tag{24} \]
Substituting Eq.(24) into Eq.(23) leads to

\[
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s & B_r^{sT} \\
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\
\sum_{s=1}^{n_s} B_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0 \end{Bmatrix}
\tag{25}
\]

where

\[ S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s, \qquad
G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s \]
The subdomain-level remaining unknown vector $U_r^s$ can be obtained in parallel from Eq.(25) as

\[ U_r^s = [S_{rr}^s]^{-1} \big( G_r^s - S_{rc}^s B_c^s U_c - B_r^{sT} \Lambda \big) \tag{26} \]
Substituting Eq.(26) into Eq.(25) leads to

\[
\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix}
\begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix}
\tag{27}
\]
where

\[ F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s, \qquad
F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} B_r^{sT} \]

\[ F_{rc} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s, \qquad
F_{rr} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} B_r^{sT} \]

\[ d_c = \sum_{s=1}^{n_s} B_c^{sT} \big( G_c^s - S_{cr}^s [S_{rr}^s]^{-1} G_r^s \big), \qquad
d_r = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} G_r^s \]
Solving for $U_c$ from Eq.(27) gives the following coarse problem

\[ F_{cc} U_c = d_c + F_{cr} \Lambda \tag{28} \]

Substituting $U_c$ into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system

\[ \big( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \big) \Lambda = d_r - F_{rc} [F_{cc}]^{-1} d_c \tag{29} \]
Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as

\[ M = \sum_{s=1}^{n_s} B_r^s D_r^s S_{rr}^s D_r^s B_r^{sT} \tag{30} \]
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are the same, and can be expressed as

\[ F_{cc}^* = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s \]
Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as

\[
M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s
+ \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s
\Big[ \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s \Big]^{-1}
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s
\]

which has the same form as the dual-primal operator in Eq.(29).
9 Parallel implementation
In this section we give an outline of the parallel implementation of PCGM to solve the primal (Eq. 14) and dual-primal (Eq. 29) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
9.1. Primal method
In this subsection we give a brief description of the parallel implementation of Algorithm 1 to solve the primal interface problem in Eq.(14).
At the $j$-th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

\[ Q_j = \big( F_{rr} - F_{rc} [F_{cc}]^{-1} F_{cr} \big) P_j \]

can be computed using the following algorithm.
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input ($P$)
2. Scatter: $P^s = B_r^s P$
3. Compute: $v_1^s = S_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s$
5. Global Solve: $F_{cc} V_2 = V_1$
6. Scatter: $v_2^s = B_c^s V_2$
7. Compute: $v_3^s = S_{rc}^s v_2^s$
8. Update: $Q^s = S_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10. Output ($Q$)
The multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem:

\[ v_\alpha^s = S_{\alpha\beta}^s v_\beta^s = \big( A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s \big) v_\beta^s \]
This procedure is outlined in the following algorithm.

Algorithm 5: Dirichlet Solver Procedure
1. Input ($v_\beta^s$)
2. Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3. Solve: $A_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output ($v_\alpha^s$)
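A compact sketch of Algorithm 5 using a sparse LU factorization of $A_{ii}^s$ (illustrative; in practice the factorization is computed once per subdomain and reused across iterations):

```python
import numpy as np
from scipy.sparse import csc_matrix
from scipy.sparse.linalg import splu

def schur_matvec(A_ab, A_ai, A_ii_lu, A_ib, v_b):
    """v_a = (A_ab - A_ai A_ii^{-1} A_ib) v_b via one interior Dirichlet solve."""
    v1 = A_ib @ v_b                  # step 2
    v2 = A_ii_lu.solve(v1)           # step 3: interior (Dirichlet) solve
    return A_ab @ v_b - A_ai @ v2    # steps 4-6

# usage: A_ii_lu = splu(csc_matrix(A_ii)) is computed once per subdomain
```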
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner:

\[ M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1, \qquad
M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s \]
Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $F_r^s = D_r^s B_r^s r_\Gamma$
3. Local Solve: $S_{rr}^s v_1^s = F_r^s$
4. Compute: $d_c^s = S_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6. Global Solve: $F_{cc}^* Z_c = -d_c$
7. Scatter: $Z_c^s = B_c^s Z_c$
8. Update: $v_2^s = F_r^s + S_{rc}^s Z_c^s$
9. Local Solve: $S_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11. Output ($Z$)
The local solves in steps 3 and 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.

Algorithm 7: Neumann Solver Procedure
1. Input ($r_r^s$)
2. Solve:
\[
\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix}
\begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix}
=
\begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}
\]
3. Output ($U_r^s$)
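Algorithm 7 avoids forming $S_{rr}^s$ explicitly: solving the assembled subdomain problem with a zero interior load yields $U_r^s = [S_{rr}^s]^{-1} r_r^s$ directly. A dense sketch:

```python
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Solve S_rr U_r = r_r, with S_rr = A_rr - A_ri A_ii^{-1} A_ir,
    by solving the bordered subdomain (Neumann) problem instead."""
    n_i = A_ii.shape[0]
    K = np.block([[A_ii, A_ir], [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])
    return np.linalg.solve(K, rhs)[n_i:]   # discard the interior part X^s
```

Eliminating the first block row reproduces the Schur complement equation, which is why the two solves agree.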
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

\[ M_{cc}^{-1} F_{cc}^* Z_c = -M_{cc}^{-1} d_c, \qquad
M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s \]
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2. Dual-primal method
In this subsection we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem, Eq.(29).
At the $j$-th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

\[ Q_j = \big( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \big) P_j \]

can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input ($P$)
2. Scatter: $P^s = B_r^{sT} P$
3. Local Solve: $S_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = S_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6. Global Solve: $F_{cc} V_3 = V_2$
7. Scatter: $v_3^s = B_c^s V_3$
8. Compute: $v_4^s = S_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local Solve: $S_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} B_r^s Q^s$
12. Output ($Q$)
The local solves in steps 3 and 10 of Algorithm 8 are computed by solving a subdomain-level Neumann problem, as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.

Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma$
3. Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s$
5. Output ($Z$)
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as

\[
\frac{\partial}{\partial x}\Big[ c_x(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial x} \Big]
+ \frac{\partial}{\partial y}\Big[ c_y(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial y} \Big]
= f(x, y) \quad \text{in } \Omega
\]

where the forcing term is $f(x, y) = 10$. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

\[ u(x, y, \theta) = 0 \quad \text{on } \partial\Omega \]
The random coefficients $c_x(x, y, \theta)$ and $c_y(x, y, \theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
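For reference, a lognormal coefficient with these underlying Gaussian statistics can be sampled directly, and its exact mean is $\exp(\mu + \sigma^2/2)$. A generic sketch (illustrative sampling only, not the paper's Polynomial Chaos representation):

```python
import numpy as np

def lognormal_coefficient(mu=1.0, sigma=0.25, size=1, seed=None):
    """Sample c = exp(g) with g ~ N(mu, sigma^2); strictly positive."""
    rng = np.random.default_rng(seed)
    return np.exp(rng.normal(mu, sigma, size))

# exact lognormal mean: exp(mu + sigma**2 / 2)
```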
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals drops below $10^{-5}$:

\[ \frac{\| G_\Gamma - S\, U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \leq 10^{-5} \]
Numerical experiments are performed on a Linux cluster with an InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1. Stochastic features
Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23, the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficient u0. Figure 13. Chaos coefficient u1.
10.2. Scalability study
First, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size $n$, a first-order Polynomial Chaos expansion leads to
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of $3n$, and a third-order expansion leads to a total problem size of $10n$.
Secondly, we fix the problem size in the spatial domain at 71,386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2 for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. 24-26.

Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM              DP-DDM
               1st   2nd   3rd   1st   2nd   3rd
10051          10    10    10    8     8     8
20303          11    11    11    8     8     8
40811          11    12    12    8     9     9
59935          13    14    14    10    10    10
71386          12    12    12    9     9     9
80172          11    11    12    8     8     8
101851         12    12    12    9     9     9
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dof.

CPUs   PDDM              DP-DDM
       1st   2nd   3rd   1st   2nd   3rd
20     10    11    11    8     8     8
40     12    12    12    9     9     9
60     12    13    13    9     9     9
80     12    12    13    9     9     9
100    12    12    12    9     9     9
120    12    12    12    9     9     9
140    11    11    12    8     8     8
160    12    12    12    8     8     9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains   PDDM              DP-DDM
             1st   2nd   3rd   1st   2nd   3rd
100          10    10    10    8     8     8
200          10    10    11    8     8     8
400          12    13    13    9     9     9
600          11    12    12    8     8     9
800          12    13    13    9     9     9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and dual-primal methods for first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and dual-primal methods with respect to the strength of the randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performances of the primal and dual-primal methods demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101,851 dofs) and a fixed number of subdomains (100).

CoV    PDDM              DP-DDM
       1st   2nd   3rd   1st   2nd   3rd
0.05   10    10    10    8     8     8
0.10   10    10    10    8     8     8
0.15   10    10    10    8     8     8
0.20   10    10    10    8     8     8
0.25   10    10    10    8     8     8
0.30   10    10    11    8     8     8
0.35   10    10    11    8     8     8
0.40   10    11    11    8     8     9
0.45   10    11    12    8     8     9
0.50   10    11    12    8     8     9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
process is a function of the material properties, the nodal solution variables, denoted by $u(\theta)$, can be formally expressed as some nonlinear functional of the set $\xi_i(\theta)$ used to represent the material stochasticity. It has been shown that this functional dependence can be expanded in terms of polynomials in Gaussian random variables, namely the Polynomial Chaos [9], as
\[ u(\theta) = \sum_{j=0}^{N} \Psi_j(\theta) u_j \]
These polynomials are orthogonal in the sense that their inner product $\langle \Psi_j \Psi_k \rangle$, defined as the statistical average of their product, is equal to zero for $j \neq k$.
3 Review of Schur complement based domain decomposition method of SPDEs
Consider an elliptic stochastic PDE defined on a domain $\Omega$ with given boundary conditions on $\partial\Omega$. Finite element discretization of the stochastic PDE leads to the following linear system

\[ A(\theta) u(\theta) = f \tag{1} \]
where $A(\theta)$ is the stiffness matrix with random coefficients, $u(\theta)$ is the stochastic process representing the response vector, and $f$ is the applied force. For large-scale systems, Eq.(1) can be solved efficiently using the domain decomposition method [1, 2].
In the domain decomposition method, the spatial domain $\Omega$ is partitioned into $n_s$ non-overlapping subdomains $\Omega_s$, $1 \leq s \leq n_s$, such that

\[ \Omega = \bigcup_{s=1}^{n_s} \Omega_s, \qquad \Omega_s \cap \Omega_r = \emptyset, \; s \neq r \]

and

\[ \Gamma = \bigcup_{s=1}^{n_s} \Gamma_s, \quad \text{where } \Gamma_s = \partial\Omega_s \setminus \partial\Omega \]
For a typical subdomain $\Omega_s$, the nodal vector $u^s(\theta)$ is partitioned into a set of interior unknowns $u_I^s(\theta)$, associated with nodes in the interior of $\Omega_s$, and interface unknowns $u_\Gamma^s(\theta)$, associated with nodes that are shared among two or more subdomains, as shown in Fig. 1.
Consequently, the subdomain equilibrium equation can be represented as

\[
\begin{bmatrix} A_{II}^s(\theta) & A_{I\Gamma}^s(\theta) \\ A_{\Gamma I}^s(\theta) & A_{\Gamma\Gamma}^s(\theta) \end{bmatrix}
\begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}
\]
The Polynomial Chaos expansion can be used to represent the uncertainty in the model parameters as

\[
\sum_{i=0}^{L} \Psi_i
\begin{bmatrix} A_{II,i}^s & A_{I\Gamma,i}^s \\ A_{\Gamma I,i}^s & A_{\Gamma\Gamma,i}^s \end{bmatrix}
\begin{Bmatrix} u_I^s(\theta) \\ u_\Gamma^s(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_I^s \\ f_\Gamma^s \end{Bmatrix}
\]
A Boolean restriction operator $R_s$ of size $(n_\Gamma^s \times n_\Gamma)$, which maps the global interface vector $u_\Gamma(\theta)$ to the local interface vector $u_\Gamma^s(\theta)$, is defined by

\[ u_\Gamma^s(\theta) = R_s u_\Gamma(\theta) \]
Figure 1 Partitioning domain nodes into interior () and interface ()
Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed as the following block linear system of equations

\[
\sum_{i=0}^{L} \Psi_i
\begin{bmatrix}
A_{II,i}^1 & & 0 & A_{I\Gamma,i}^1 R_1 \\
& \ddots & & \vdots \\
0 & & A_{II,i}^{n_s} & A_{I\Gamma,i}^{n_s} R_{n_s} \\
R_1^T A_{\Gamma I,i}^1 & \cdots & R_{n_s}^T A_{\Gamma I,i}^{n_s} & \sum_{s=1}^{n_s} R_s^T A_{\Gamma\Gamma,i}^s R_s
\end{bmatrix}
\begin{Bmatrix} u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f_I^1 \\ \vdots \\ f_I^{n_s} \\ \sum_{s=1}^{n_s} R_s^T f_\Gamma^s \end{Bmatrix}
\tag{2}
\]
The solution process can be expanded using the same Polynomial Chaos basis as

\[
\begin{Bmatrix} u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta) \end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix} u_{I,j}^1 \\ \vdots \\ u_{I,j}^{n_s} \\ u_{\Gamma,j} \end{Bmatrix}
\tag{3}
\]
Substituting Eq.(3) into Eq.(2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained

\[
\begin{bmatrix}
A_{II}^1 & & 0 & A_{I\Gamma}^1 R_1 \\
& \ddots & & \vdots \\
0 & & A_{II}^{n_s} & A_{I\Gamma}^{n_s} R_{n_s} \\
R_1^T A_{\Gamma I}^1 & \cdots & R_{n_s}^T A_{\Gamma I}^{n_s} & \sum_{s=1}^{n_s} R_s^T A_{\Gamma\Gamma}^s R_s
\end{bmatrix}
\begin{Bmatrix} U_I^1 \\ \vdots \\ U_I^{n_s} \\ U_\Gamma \end{Bmatrix}
=
\begin{Bmatrix} F_I^1 \\ \vdots \\ F_I^{n_s} \\ \sum_{s=1}^{n_s} R_s^T F_\Gamma^s \end{Bmatrix}
\tag{4}
\]
where

$$
[\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle A_{\alpha\beta,i}^s, \qquad
\mathcal{F}_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle,
$$

$$
\mathcal{U}_I^m = (u_{I,0}^m, \ldots, u_{I,N}^m)^T, \qquad
\mathcal{U}_\Gamma = (u_{\Gamma,0}, \ldots, u_{\Gamma,N})^T,
$$
and the subscripts $\alpha$ and $\beta$ represent the indices $I$ and $\Gamma$. The coefficient matrix in Eq. (4) is of order $n(N+1) \times n(N+1)$, where $n$ and $(N+1)$ denote the total number of degrees of freedom and chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq. (4) takes the following form:
$$
\mathcal{R}_s = \mathrm{blockdiag}(R_s^0, \ldots, R_s^N),
$$

where $(R_s^0, \ldots, R_s^N)$ are the deterministic restriction operators. In a parallel implementation, $\mathcal{R}_s$ acts as a scatter operator while $\mathcal{R}_s^T$ acts as a gather operator, and they are not constructed explicitly.
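When every chaos block uses the same deterministic restriction operator, the block-diagonal stochastic operator is simply a Kronecker product with an identity. A minimal NumPy sketch (sizes and the operator below are made up for illustration, not taken from the paper):

```python
import numpy as np

# Stochastic restriction operator: blockdiag(R_s, ..., R_s) with N+1 chaos
# blocks, written as a Kronecker product I_{N+1} (x) R_s.
N_plus_1 = 3                                   # N + 1 chaos coefficients
R_det = np.array([[1., 0., 0., 0.],
                  [0., 0., 1., 0.]])           # maps 4 global to 2 local dofs
R_stoch = np.kron(np.eye(N_plus_1), R_det)     # blockdiag of N+1 copies

u_gamma = np.arange(4 * N_plus_1, dtype=float) # chaos-ordered interface vector
u_local = R_stoch @ u_gamma                    # scatter to the subdomain
assert u_local.shape == (2 * N_plus_1,)
# Each chaos block is restricted independently of the others.
assert np.allclose(u_local[:2], R_det @ u_gamma[:4])
```

In practice the operator is never formed; the Kronecker structure just shows that the deterministic scatter/gather is applied once per chaos coefficient.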
A block Gaussian elimination reduces the system in Eq. (4) to the following extended Schur complement system for the interface variable $\mathcal{U}_\Gamma$:

$$
\mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma, \tag{5}
$$

where the global extended Schur complement matrix $\mathcal{S}$ is given by

$$
\mathcal{S} = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{A}_{\Gamma\Gamma}^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{A}_{I\Gamma}^s \right] \mathcal{R}_s,
$$

and the corresponding right-hand-side vector $\mathcal{G}_\Gamma$ is

$$
\mathcal{G}_\Gamma = \sum_{s=1}^{n_s} \mathcal{R}_s^T \left[ \mathcal{F}_\Gamma^s - \mathcal{A}_{\Gamma I}^s (\mathcal{A}_{II}^s)^{-1} \mathcal{F}_I^s \right].
$$
Once the interface unknowns $\mathcal{U}_\Gamma$ are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain:

$$
\mathcal{A}_{II}^s\, \mathcal{U}_I^s = \mathcal{F}_I^s - \mathcal{A}_{I\Gamma}^s \mathcal{R}_s\, \mathcal{U}_\Gamma.
$$
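The condensation in Eqs. (5) and the interior back-substitution can be sketched on a small dense two-block system. This is an illustrative NumPy example under a single "subdomain" (so the restriction operators drop out), not the paper's parallel implementation:

```python
import numpy as np

# Reduce [A_II A_IG; A_GI A_GG] {u_I; u_G} = {f_I; f_G} to the Schur
# complement system S u_G = g, then back-substitute for the interior u_I.
rng = np.random.default_rng(0)
n_i, n_g = 6, 3
M = rng.standard_normal((n_i + n_g, n_i + n_g))
A = M @ M.T + (n_i + n_g) * np.eye(n_i + n_g)   # SPD test matrix
A_II, A_IG = A[:n_i, :n_i], A[:n_i, n_i:]
A_GI, A_GG = A[n_i:, :n_i], A[n_i:, n_i:]
f = rng.standard_normal(n_i + n_g)
f_I, f_G = f[:n_i], f[n_i:]

S = A_GG - A_GI @ np.linalg.solve(A_II, A_IG)   # Schur complement
g = f_G - A_GI @ np.linalg.solve(A_II, f_I)     # condensed right-hand side
u_G = np.linalg.solve(S, g)                     # interface solve, cf. Eq. (5)
u_I = np.linalg.solve(A_II, f_I - A_IG @ u_G)   # interior back-substitution

# The condensed solution matches a direct solve of the full system.
u_full = np.linalg.solve(A, f)
assert np.allclose(np.concatenate([u_I, u_G]), u_full)
```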
4. Solution methods for the extended Schur complement system

Solution methods for linear systems are broadly categorized into direct and iterative methods. Direct methods are generally based on sparse Gaussian elimination and are popular for their robustness. However, they are expensive in computation time and memory requirements, and therefore cannot be applied to the solution of large-scale linear systems [10]. Iterative methods, on the other hand, generate a sequence of approximate solutions that converges to the true solution. In iterative methods the main arithmetic operation is the matrix-vector multiplication; the linear system itself need not be constructed explicitly, and only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.
4.1. Preconditioned Conjugate Gradient Method (PCGM)

Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve a Schur complement system of the form [11]

$$
\mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma.
$$

For a symmetric positive-definite system such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. The rate of convergence can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e., a lower condition number $\kappa(\mathcal{S})$). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. The transformed linear system becomes

$$
\mathcal{M}^{-1} \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{M}^{-1} \mathcal{G}_\Gamma.
$$

In general, $\kappa(\mathcal{M}^{-1}\mathcal{S})$ is much smaller than $\kappa(\mathcal{S})$, and the eigenvalues of $\mathcal{M}^{-1}\mathcal{S}$ are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of $\mathcal{M}^{-1}$ is not needed; instead, for a given vector $r_\Gamma$, a system of the following form is solved:

$$
\mathcal{M} Z = r_\Gamma.
$$
The PCGM algorithm to solve the Schur complement system proceeds as follows [10].

Algorithm 1: The PCGM Algorithm
1. Initialize: $\mathcal{U}_{\Gamma_0} = 0$
2. Compute: $r_{\Gamma_0} = \mathcal{G}_\Gamma - \mathcal{S}\, \mathcal{U}_{\Gamma_0}$
3. Precondition: $Z_0 = \mathcal{M}^{-1} r_{\Gamma_0}$
4. First search direction: $P_0 = Z_0$
5. Initialize: $\rho_0 = (r_{\Gamma_0}, Z_0)$
6. For $j = 0, 1, \ldots$ until convergence Do
7.   $Q_j = \mathcal{S} P_j$
8.   $\rho_{tmp_j} = (Q_j, P_j)$
9.   $\alpha_j = \rho_j / \rho_{tmp_j}$
10.  $\mathcal{U}_{\Gamma_{j+1}} = \mathcal{U}_{\Gamma_j} + \alpha_j P_j$
11.  $r_{\Gamma_{j+1}} = r_{\Gamma_j} - \alpha_j Q_j$
12.  $Z_{j+1} = \mathcal{M}^{-1} r_{\Gamma_{j+1}}$
13.  $\rho_{j+1} = (r_{\Gamma_{j+1}}, Z_{j+1})$
14.  $\beta_j = \rho_{j+1} / \rho_j$
15.  $P_{j+1} = Z_{j+1} + \beta_j P_j$
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are the matrix-vector product $Q = \mathcal{S} P$ in step 7 and the preconditioned residual $Z = \mathcal{M}^{-1} r_\Gamma$ in step 12. These operations can be performed in parallel as outlined next.
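Algorithm 1 can be sketched compactly by passing the two operators as callables, so neither $\mathcal{S}$ nor $\mathcal{M}^{-1}$ is ever formed explicitly. The sketch below is illustrative (the Jacobi preconditioner in the usage example is an assumption made only to keep the test self-contained; the paper uses domain decomposition preconditioners instead):

```python
import numpy as np

def pcgm(S, G, M_inv, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients following Algorithm 1.

    S and M_inv are callables returning matrix-vector products, so only
    their action on a vector is required."""
    U = np.zeros_like(G)          # step 1
    r = G - S(U)                  # step 2
    Z = M_inv(r)                  # step 3
    P = Z.copy()                  # step 4
    rho = r @ Z                   # step 5
    for _ in range(max_iter):
        Q = S(P)                  # step 7
        alpha = rho / (Q @ P)     # steps 8-9
        U += alpha * P            # step 10
        r -= alpha * Q            # step 11
        if np.linalg.norm(r) <= tol * np.linalg.norm(G):
            break
        Z = M_inv(r)              # step 12
        rho_new = r @ Z           # step 13
        P = Z + (rho_new / rho) * P   # steps 14-15
        rho = rho_new
    return U

# Usage on a small SPD system with a diagonal (Jacobi) preconditioner.
rng = np.random.default_rng(1)
B = rng.standard_normal((8, 8))
S_mat = B @ B.T + 8 * np.eye(8)
G = rng.standard_normal(8)
U = pcgm(lambda v: S_mat @ v, G, lambda v: v / np.diag(S_mat))
assert np.allclose(S_mat @ U, G)
```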
Given the subdomain Schur complement matrices $\mathcal{S}_s$ and a global vector $P$, the matrix-vector product $Q = \mathcal{S} P$ can be calculated in parallel as

$$
Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{S}_s \mathcal{R}_s P,
$$
where $n_s$ is the number of subdomains, and $\mathcal{R}_s$ and $\mathcal{R}_s^T$ are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm 2.
Algorithm 2: Parallel Matrix-Vector Product Procedure
1. Input: $P$
2. Scatter: $P^s = \mathcal{R}_s P$
3. Local operation: $Q^s = \mathcal{S}_s P^s$
4. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{R}_s^T Q^s$
5. Output: $Q$
The working vectors $P^s$ and $Q^s$ are defined at the subdomain level.
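A serial sketch of Algorithm 2: storing each Boolean restriction operator as a local-to-global index list makes "scatter" a fancy-indexing read and "gather" an indexed accumulation (the role MPI messages play in the parallel code). The subdomain matrices below are arbitrary SPD stand-ins, not from the paper:

```python
import numpy as np

rng = np.random.default_rng(2)
n_gamma = 5                                    # global interface unknowns
subdomain_dofs = [[0, 1, 2], [2, 3, 4]]        # local -> global maps (dof 2 shared)
S_local = []
for dofs in subdomain_dofs:
    B = rng.standard_normal((len(dofs), len(dofs)))
    S_local.append(B @ B.T + len(dofs) * np.eye(len(dofs)))

def matvec(P):
    """Q = sum_s R_s^T S_s R_s P without assembling the global operator."""
    Q = np.zeros(n_gamma)
    for dofs, Ss in zip(subdomain_dofs, S_local):
        Ps = P[dofs]          # scatter: P^s = R_s P
        Qs = Ss @ Ps          # local operation: Q^s = S_s P^s
        Q[dofs] += Qs         # gather: Q += R_s^T Q^s
    return Q

# Cross-check against the explicitly assembled S = sum_s R_s^T S_s R_s.
S = np.zeros((n_gamma, n_gamma))
for dofs, Ss in zip(subdomain_dofs, S_local):
    S[np.ix_(dofs, dofs)] += Ss
P = rng.standard_normal(n_gamma)
assert np.allclose(matvec(P), S @ P)
```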
Similarly, the effect of a parallel preconditioner on a residual vector, $Z = \mathcal{M}^{-1} r_\Gamma$, can be computed as

$$
Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T \mathcal{M}_s^{-1} \mathcal{R}_s\, r_\Gamma.
$$
This procedure is outlined in the following algorithm.

Algorithm 3: Parallel Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r_\Gamma^s = \mathcal{R}_s r_\Gamma$
3. Local solve: $\mathcal{M}_s Z^s = r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{R}_s^T Z^s$
5. Output: $Z$
The local preconditioner $\mathcal{M}_s$ and the working vectors $r_\Gamma^s$ and $Z^s$ are defined at the subdomain level.
5. Iterative substructuring methods of SPDEs

The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel; these local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information
globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6. A primal iterative substructuring method of SPDEs

In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u_i^s(\theta)$, corner unknowns $u_c^s(\theta)$, and remaining unknowns $u_r^s(\theta)$, as schematically shown in Fig. 2.
Figure 2. Partitioning of domain nodes into interior, remaining and corner (•) nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as

$$
\begin{bmatrix}
A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\
A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\
A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta)
\end{bmatrix}
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_i^s \\ f_r^s \\ f_c^s
\end{Bmatrix}.
$$
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

$$
\sum_{l=0}^{L} \Psi_l(\theta)
\begin{bmatrix}
A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\
A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\
A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s
\end{bmatrix}
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_i^s \\ f_r^s \\ f_c^s
\end{Bmatrix}
\tag{6}
$$
The solution process is expressed using the same Polynomial Chaos basis as
$$
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
=
\sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix}
u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s
\end{Bmatrix}
\tag{7}
$$
Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:
$$
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \\
\mathcal{A}_{ci}^s & \mathcal{A}_{cr}^s & \mathcal{A}_{cc}^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c^s
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \mathcal{F}_c^s
\end{Bmatrix}
\tag{8}
$$
where

$$
[\mathcal{A}_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A_{\alpha\beta,l}^s, \qquad
\mathcal{F}_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, \qquad
\mathcal{U}_\alpha^s = (u_{\alpha,0}^s, \cdots, u_{\alpha,N}^s)^T,
$$

and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.
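The Galerkin coefficient blocks $[\mathcal{A}^s_{\alpha\beta}]_{jk}$ can be sketched for a single Gaussian germ with one-dimensional Hermite chaos, where the triple products $\langle \Psi_l \Psi_j \Psi_k \rangle$ have a classical closed form (cf. Ghanem and Spanos [9]). The coefficient matrices below are arbitrary stand-ins, and the one-dimensional setting is an assumption made only to keep the example small:

```python
import numpy as np
from math import factorial

def hermite_triple(l, j, k):
    """<He_l He_j He_k> for probabilists' Hermite polynomials of one variable."""
    s = l + j + k
    if s % 2 or l > j + k or j > l + k or k > l + j:
        return 0.0
    h = s // 2
    return (factorial(l) * factorial(j) * factorial(k)
            / (factorial(h - l) * factorial(h - j) * factorial(h - k)))

def galerkin_blocks(A_chaos, N):
    """Assemble the (N+1)x(N+1) grid of blocks sum_l <Psi_l Psi_j Psi_k> A_l."""
    n = A_chaos[0].shape[0]
    A = np.zeros(((N + 1) * n, (N + 1) * n))
    for j in range(N + 1):
        for k in range(N + 1):
            for l, Al in enumerate(A_chaos):
                c = hermite_triple(l, j, k)
                if c:
                    A[j*n:(j+1)*n, k*n:(k+1)*n] += c * Al
    return A

# Usage: mean block A0 plus one fluctuation block A1, first-order chaos N = 1.
A0, A1 = np.eye(2) * 4.0, np.eye(2) * 0.5
A = galerkin_blocks([A0, A1], N=1)
# <Psi_0 Psi_0 Psi_0> = 1 and <Psi_1 Psi_0 Psi_1> = 1, while <Psi_1 Psi_1 Psi_1> = 0,
# so the diagonal blocks carry A0 and the off-diagonal coupling carries A1.
assert np.allclose(A[:2, :2], A0)
assert np.allclose(A[:2, 2:], A1)
```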
Enforcing the transmission conditions along the interface boundaries, the subdomain equilibrium equations can be expressed as
$$
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \mathcal{B}_r^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{ri}^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{A}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_i^s \\ \mathcal{U}_r \\ \mathcal{U}_c
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{F}_i^s \\ \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s
\end{Bmatrix}
\tag{9}
$$
where $\mathcal{B}_r^s$ is a Boolean rectangular matrix that maps the global remaining vector $\mathcal{U}_r$ to the local remaining vector $\mathcal{U}_r^s$ as

$$
\mathcal{U}_r^s = \mathcal{B}_r^s\, \mathcal{U}_r. \tag{10}
$$

Similarly, the restriction operator $\mathcal{B}_c^s$ is a Boolean rectangular matrix that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as

$$
\mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c. \tag{11}
$$

In a parallel implementation, both $\mathcal{B}_r^s$ and $\mathcal{B}_c^s$ act as scatter operators, while $\mathcal{B}_r^{sT}$ and $\mathcal{B}_c^{sT}$ act as gather operators.
The first block equation in Eq. (9) can be solved for $\mathcal{U}_i^s$ in parallel as

$$
\mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} (\mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{B}_r^s \mathcal{U}_r - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c). \tag{12}
$$
Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5), but now with the interface boundary nodes split into
remaining and corner nodes, as shown schematically in Fig. 3:
$$
\begin{bmatrix}
\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{S}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_r \\ \mathcal{U}_c
\end{Bmatrix}
=
\begin{Bmatrix}
\sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{G}_r^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s
\end{Bmatrix}
\tag{13}
$$
where

$$
\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad
\mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s.
$$
Figure 3. The interface boundary nodes split into remaining and corner (•) nodes.
The corner nodal vector $\mathcal{U}_c$ in Eq. (13) is eliminated next to obtain the following symmetric positive-definite reduced interface problem:

$$
(\mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr})\, \mathcal{U}_r = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c, \tag{14}
$$
where

$$
\mathcal{F}_{\alpha\beta} = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{S}_{\alpha\beta}^s \mathcal{B}_\beta^s, \qquad
d_\alpha = \sum_{s=1}^{n_s} \mathcal{B}_\alpha^{sT} \mathcal{G}_\alpha^s,
$$

and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $\mathcal{M}^{-1}$, defined in the next section.
6.1. A two-level preconditioner

As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This is schematically illustrated in Fig. 4 [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:
Figure 4. Partial assembly of corner nodes (•).
$$
\begin{bmatrix}
\mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s \mathcal{B}_r^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_r^s \\ \mathcal{U}_c
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{F}_r^s \\ 0
\end{Bmatrix}
\tag{15}
$$
(15)
where
Fsr = Ds
rBsrrj
and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal
weighting matrix which satisfies the following property
nssum
s=1
BsrTDs
rBsr = I
The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that
share the interface boundary nodes
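The partition-of-unity property of the weighting matrices can be checked directly on a toy configuration; the index maps below are illustrative, not from the paper:

```python
import numpy as np

# Two "subdomains" sharing interface dofs 1 and 2 of a 4-dof remaining set.
n_r = 4
subdomain_dofs = [[0, 1, 2], [1, 2, 3]]

# Count how many subdomains touch each global dof.
share_count = np.zeros(n_r)
for dofs in subdomain_dofs:
    share_count[np.array(dofs)] += 1

# D_r^s holds the reciprocal of the sharing count; sum_s B^T D B should be I.
I_check = np.zeros((n_r, n_r))
for dofs in subdomain_dofs:
    B = np.zeros((len(dofs), n_r))
    B[np.arange(len(dofs)), dofs] = 1.0          # Boolean restriction B_r^s
    D = np.diag(1.0 / share_count[dofs])         # weighting matrix D_r^s
    I_check += B.T @ D @ B

assert np.allclose(I_check, np.eye(n_r))         # the stated property holds
```

This is what makes the averaging step for the remaining unknowns consistent: local copies of a shared dof are combined with weights that sum to one.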
The subdomain-level remaining unknown vector $\mathcal{U}_r^s$ can be eliminated in parallel from Eq. (15) as

$$
\mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} (\mathcal{F}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c). \tag{16}
$$
Substituting $\mathcal{U}_r^s$ into the second block of Eq. (15) leads to the following coarse problem:

$$
\mathcal{F}_{cc}^*\, \mathcal{U}_c = d_c^*, \tag{17}
$$

where

$$
\mathcal{F}_{cc}^* = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} (\mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s) \mathcal{B}_c^s, \qquad
d_c^* = -\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{F}_r^s.
$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

$$
\mathcal{U}_r = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s\, \mathcal{U}_r^s.
$$
After some algebraic manipulation, the preconditioner can be expressed as

$$
\mathcal{M}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s + \mathcal{R}_0^T [\mathcal{F}_{cc}^*]^{-1} \mathcal{R}_0, \tag{18}
$$

where

$$
\mathcal{R}_0 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s.
$$
7. A dual-primal iterative substructuring method of SPDEs

In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation:

$$
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s
\end{Bmatrix}
\tag{19}
$$

where $\mathcal{B}_c^s$ is a Boolean restriction operator that maps the global corner vector $\mathcal{U}_c$ to the local corner vector $\mathcal{U}_c^s$ as

$$
\mathcal{U}_c^s = \mathcal{B}_c^s\, \mathcal{U}_c.
$$
Eq. (19) can be rewritten in compact form as

$$
\mathcal{A}^s\, \mathcal{U}^s = \mathcal{F}^s. \tag{20}
$$

Let $\mathcal{B}_r^s$ be a block-diagonal signed Boolean continuity matrix defined such that

$$
\sum_{s=1}^{n_s} \mathcal{B}_r^s\, \mathcal{U}_r^s = 0.
$$
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

$$
\tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} \rightarrow \min
\quad \text{subject to} \quad \mathcal{B}\, \mathcal{U} = 0, \tag{21}
$$

where

$$
\mathcal{A} = \begin{bmatrix} \mathcal{A}^1 & & \\ & \ddots & \\ & & \mathcal{A}^{n_s} \end{bmatrix}, \quad
\mathcal{U} = \begin{Bmatrix} \mathcal{U}^1 \\ \vdots \\ \mathcal{U}^{n_s} \end{Bmatrix}, \quad
\mathcal{F} = \begin{Bmatrix} \mathcal{F}^1 \\ \vdots \\ \mathcal{F}^{n_s} \end{Bmatrix},
$$

$$
\mathcal{B} = \left[ (0 \ \ \mathcal{B}_r^1 \ \ 0) \ \cdots \ (0 \ \ \mathcal{B}_r^s \ \ 0) \ \cdots \ (0 \ \ \mathcal{B}_r^{n_s} \ \ 0) \right].
$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle-point formulation of Eq. (21) can be expressed as

$$
\mathcal{L}(\mathcal{U}, \Lambda) = \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda. \tag{22}
$$
Minimizing Eq. (22) with respect to $\mathcal{U}$ and $\Lambda$ leads to the following equilibrium system:

$$
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s & \mathcal{A}_{ic}^s \mathcal{B}_c^s & 0 \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s & \mathcal{A}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{ci}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s & 0 \\
0 & \sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_i^s \\ \mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{F}_i^s \\ \mathcal{F}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{F}_c^s \\ 0
\end{Bmatrix}
\tag{23}
$$

where

$$
\Lambda = (\lambda_0, \ldots, \lambda_N)^T,
$$
Figure 5. The Lagrange multipliers are the forces required to connect the torn interface boundary.
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. 5.
Eliminating the interior unknowns $\mathcal{U}_i^s$ from Eq. (23) gives

$$
\mathcal{U}_i^s = [\mathcal{A}_{ii}^s]^{-1} (\mathcal{F}_i^s - \mathcal{A}_{ir}^s \mathcal{U}_r^s - \mathcal{A}_{ic}^s \mathcal{B}_c^s \mathcal{U}_c). \tag{24}
$$
Substituting Eq. (24) into Eq. (23) leads to

$$
\begin{bmatrix}
\mathcal{S}_{rr}^s & \mathcal{S}_{rc}^s \mathcal{B}_c^s & \mathcal{B}_r^{sT} \\
\sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s & \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cc}^s \mathcal{B}_c^s & 0 \\
\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_r^s \\ \mathcal{U}_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
\mathcal{G}_r^s \\ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{G}_c^s \\ 0
\end{Bmatrix}
\tag{25}
$$
where

$$
\mathcal{S}_{\alpha\beta}^s = \mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s, \qquad
\mathcal{G}_\alpha^s = \mathcal{F}_\alpha^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{F}_i^s.
$$
The subdomain-level remaining unknown vector $\mathcal{U}_r^s$ can be obtained in parallel from Eq. (25) as

$$
\mathcal{U}_r^s = [\mathcal{S}_{rr}^s]^{-1} (\mathcal{G}_r^s - \mathcal{S}_{rc}^s \mathcal{B}_c^s \mathcal{U}_c - \mathcal{B}_r^{sT} \Lambda). \tag{26}
$$
Substituting Eq. (26) into Eq. (25) leads to

$$
\begin{bmatrix}
\mathcal{F}_{cc} & -\mathcal{F}_{cr} \\
\mathcal{F}_{rc} & \mathcal{F}_{rr}
\end{bmatrix}
\begin{Bmatrix}
\mathcal{U}_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
d_c \\ d_r
\end{Bmatrix}
\tag{27}
$$
where
$$
\mathcal{F}_{cc} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} (\mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s) \mathcal{B}_c^s, \qquad
\mathcal{F}_{cr} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT},
$$

$$
\mathcal{F}_{rc} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s, \qquad
\mathcal{F}_{rr} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{B}_r^{sT},
$$

$$
d_c = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} (\mathcal{G}_c^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s), \qquad
d_r = \sum_{s=1}^{n_s} \mathcal{B}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{G}_r^s.
$$
Solving for $\mathcal{U}_c$ from Eq. (27) gives the following coarse problem:

$$
\mathcal{F}_{cc}\, \mathcal{U}_c = d_c + \mathcal{F}_{cr} \Lambda. \tag{28}
$$

Substituting $\mathcal{U}_c$ into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system:

$$
(\mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr})\, \Lambda = d_r - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} d_c. \tag{29}
$$
Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as

$$
\mathcal{M} = \sum_{s=1}^{n_s} \mathcal{B}_r^s \mathcal{D}_r^s \mathcal{S}_{rr}^s \mathcal{D}_r^s \mathcal{B}_r^{sT}. \tag{30}
$$
8. Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are the same, and can be expressed as

$$
\mathcal{F}_{cc}^* = \mathcal{F}_{cc} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} (\mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s) \mathcal{B}_c^s.
$$
Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as

$$
\mathcal{M}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s
+ \left( \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s \mathcal{B}_c^s \right)
\left[ \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} (\mathcal{S}_{cc}^s - \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{S}_{rc}^s) \mathcal{B}_c^s \right]^{-1}
\left( \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{S}_{cr}^s [\mathcal{S}_{rr}^s]^{-1} \mathcal{D}_r^s \mathcal{B}_r^s \right),
$$

which has the same form as the dual-primal operator in Eq. (29).
9. Parallel implementation

In this section we outline the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
9.1. Primal method

In this subsection we give a brief description of the parallel implementation of Algorithm 1 to solve the primal interface problem in Eq. (14). For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$$
Q_j = (\mathcal{F}_{rr} - \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr}) P_j,
$$
can be computed using the following algorithm
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input: $P$
2. Scatter: $P^s = \mathcal{B}_r^s P$
3. Compute: $v_1^s = \mathcal{S}_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_1^s$
5. Global solve: $\mathcal{F}_{cc} V_2 = V_1$
6. Scatter: $v_2^s = \mathcal{B}_c^s V_2$
7. Compute: $v_3^s = \mathcal{S}_{rc}^s v_2^s$
8. Update: $Q^s = \mathcal{S}_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} Q^s$
10. Output: $Q$
The multiplication of a Schur complement block by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem:

$$
v_\alpha^s = \mathcal{S}_{\alpha\beta}^s v_\beta^s
= (\mathcal{A}_{\alpha\beta}^s - \mathcal{A}_{\alpha i}^s [\mathcal{A}_{ii}^s]^{-1} \mathcal{A}_{i\beta}^s)\, v_\beta^s.
$$
This procedure is outlined in the following algorithm.

Algorithm 5: Dirichlet Solver Procedure
1. Input: $v_\beta^s$
2. Compute: $v_1^s = \mathcal{A}_{i\beta}^s v_\beta^s$
3. Solve: $\mathcal{A}_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = \mathcal{A}_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = \mathcal{A}_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output: $v_\alpha^s$
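The Dirichlet solver procedure above amounts to applying a Schur complement block through one interior solve instead of forming the block explicitly. A serial NumPy sketch with an arbitrary SPD stand-in matrix (the partitioning sizes are made up for illustration):

```python
import numpy as np

def dirichlet_apply(A_ii, A_ib, A_ai, A_ab, v_b):
    """Apply S_ab = A_ab - A_ai [A_ii]^{-1} A_ib to v_b (Algorithm 5)."""
    v1 = A_ib @ v_b                  # step 2
    v2 = np.linalg.solve(A_ii, v1)   # step 3: interior (Dirichlet) solve
    v3 = A_ai @ v2                   # step 4
    v4 = A_ab @ v_b                  # step 5
    return v4 - v3                   # step 6

rng = np.random.default_rng(3)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                      # SPD test matrix
A_ii, A_ib = A[:3, :3], A[:3, 3:]                # 3 interior, 2 boundary dofs
A_ai, A_ab = A[3:, :3], A[3:, 3:]
v = rng.standard_normal(2)

# Cross-check against the explicitly formed Schur complement block.
S_ab = A_ab - A_ai @ np.linalg.solve(A_ii, A_ib)
assert np.allclose(dirichlet_apply(A_ii, A_ib, A_ai, A_ab, v), S_ab @ v)
```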
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner:

$$
\mathcal{M}_{cc}^{-1} \mathcal{F}_{cc} V_2 = \mathcal{M}_{cc}^{-1} V_1, \qquad
\mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s.
$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $\mathcal{F}_r^s = \mathcal{D}_r^s \mathcal{B}_r^s r_\Gamma$
3. Local solve: $\mathcal{S}_{rr}^s v_1^s = \mathcal{F}_r^s$
4. Compute: $d_c^s = \mathcal{S}_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} d_c^s$
6. Global solve: $\mathcal{F}_{cc}^* Z_c = -d_c$
7. Scatter: $Z_c^s = \mathcal{B}_c^s Z_c$
8. Update: $v_2^s = \mathcal{F}_r^s + \mathcal{S}_{rc}^s Z_c^s$
9. Local solve: $\mathcal{S}_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^{sT} \mathcal{D}_r^s Z_f^s$
11. Output: $Z$
The local solves in steps 3 and 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form $\mathcal{S}_{rr}^s \mathcal{U}_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7: Neumann Solver Procedure
1. Input: $r_r^s$
2. Solve:
$$
\begin{bmatrix}
\mathcal{A}_{ii}^s & \mathcal{A}_{ir}^s \\
\mathcal{A}_{ri}^s & \mathcal{A}_{rr}^s
\end{bmatrix}
\begin{Bmatrix}
\mathcal{X}^s \\ \mathcal{U}_r^s
\end{Bmatrix}
=
\begin{Bmatrix}
0 \\ r_r^s
\end{Bmatrix}
$$
3. Output: $\mathcal{U}_r^s$
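Algorithm 7 exploits the fact that solving the augmented subdomain system with a zero interior load is equivalent to inverting the local Schur complement. A serial NumPy sketch with an arbitrary SPD stand-in matrix:

```python
import numpy as np

# Solve S_rr u_r = r_r without forming S_rr: block-eliminate the interior
# equations of [A_ii A_ir; A_ri A_rr] {X; U_r} = {0; r_r}.
rng = np.random.default_rng(4)
M = rng.standard_normal((5, 5))
A = M @ M.T + 5 * np.eye(5)                      # SPD test matrix
A_ii, A_ir = A[:3, :3], A[:3, 3:]                # 3 interior, 2 remaining dofs
A_ri, A_rr = A[3:, :3], A[3:, 3:]
r_r = rng.standard_normal(2)

rhs = np.concatenate([np.zeros(3), r_r])         # right-hand side {0; r_r}
x = np.linalg.solve(A, rhs)                      # augmented (Neumann) solve
U_r = x[3:]                                      # keep the remaining part

# Explicit check: U_r indeed satisfies the local Schur complement system.
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
assert np.allclose(S_rr @ U_r, r_r)
```

The equivalence follows from the first block row, $X = -A_{ii}^{-1} A_{ir} U_r$, substituted into the second.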
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

$$
\mathcal{M}_{cc}^{-1} \mathcal{F}_{cc}^* Z_c = -\mathcal{M}_{cc}^{-1} d_c, \qquad
\mathcal{M}_{cc}^{-1} = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} \mathcal{A}_{cc}^s \mathcal{B}_c^s.
$$
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the flowchart of Fig. 6.

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
9.2. Dual-primal method

In this subsection we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem, Eq. (29). For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$$
Q_j = (\mathcal{F}_{rr} + \mathcal{F}_{rc} [\mathcal{F}_{cc}]^{-1} \mathcal{F}_{cr}) P_j,
$$

can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input: $P$
2. Scatter: $P^s = \mathcal{B}_r^{sT} P$
3. Local solve: $\mathcal{S}_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = \mathcal{S}_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} \mathcal{B}_c^{sT} v_2^s$
6. Global solve: $\mathcal{F}_{cc} V_3 = V_2$
7. Scatter: $v_3^s = \mathcal{B}_c^s V_3$
8. Compute: $v_4^s = \mathcal{S}_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local solve: $\mathcal{S}_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^s Q^s$
12. Output: $Q$
The local solves in steps 3 and 10 of Algorithm 8 are carried out by solving a subdomain-level Neumann problem, as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem of the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r_\Gamma^s = \mathcal{D}_r^s \mathcal{B}_r^{sT} r_\Gamma$
3. Compute: $Z^s = \mathcal{S}_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^s \mathcal{D}_r^s Z^s$
5. Output: $Z$
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the flowchart of Fig. 7.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10. Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

$$
\frac{\partial}{\partial x}\left[ c_x(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial x} \right]
+ \frac{\partial}{\partial y}\left[ c_y(x, y, \theta)\, \frac{\partial u(x, y, \theta)}{\partial y} \right]
= f(x, y) \quad \text{in } \Omega,
$$
where the forcing term is $f(x, y) = 10$. For simplicity, a homogeneous Dirichlet boundary condition is imposed: $u(x, y, \theta) = 0$ on $\partial\Omega$.
The random coefficients $c_x(x, y, \theta)$ and $c_y(x, y, \theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has mean 1.0 and standard deviation 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals falls below $10^{-5}$:

$$
\frac{\| \mathcal{G}_\Gamma - \mathcal{S}\, \mathcal{U}_\Gamma^k \|_2}{\| \mathcal{G}_\Gamma \|_2} \leq 10^{-5}.
$$
Numerical experiments are performed on a Linux cluster with an InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1. Stochastic features

Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23, the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficient u0. Figure 13. Chaos coefficient u1.
10.2. Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size $n$, using a first-order Polynomial Chaos expansion leads to a total problem size of $3n$, and using a third-order Polynomial Chaos expansion leads to
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
21
Figure 14. Chaos coefficient u2. Figure 15. Chaos coefficient u3.
Figure 16. Chaos coefficient u4. Figure 17. Chaos coefficient u5.
Figure 18. Lagrange multiplier λ0. Figure 19. Lagrange multiplier λ1.
a total problem size of $10n$.
Secondly, we fix the problem size in the spatial domain at 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2. The results are reported for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. 24-26.

Figure 20. Lagrange multiplier λ2. Figure 21. Lagrange multiplier λ3.
Figure 22. Lagrange multiplier λ4. Figure 23. Lagrange multiplier λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
10051          10  10  10             8   8   8
20303          11  11  11             8   8   8
40811          11  12  12             8   9   9
59935          13  14  14             10  10  10
71386          12  12  12             9   9   9
80172          11  11  12             8   8   8
101851         12  12  12             9   9   9
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
20     10  11  11             8   8   8
40     12  12  12             9   9   9
60     12  13  13             9   9   9
80     12  12  13             9   9   9
100    12  12  12             9   9   9
120    12  12  12             9   9   9
140    11  11  12             8   8   8
160    12  12  12             8   8   9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
100          10  10  10             8   8   8
200          10  10  11             8   8   8
400          12  13  13             9   9   9
600          11  12  12             8   8   9
800          12  13  13             9   9   9
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performance of the primal method and the dual-primal method demonstrates a similar trend, which points out the similarity (duality) between the two methods through numerical experiments.
11. Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments

The authors gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

CoV    PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
0.05   10  10  10             8   8   8
0.10   10  10  10             8   8   8
0.15   10  10  10             8   8   8
0.20   10  10  10             8   8   8
0.25   10  10  10             8   8   8
0.30   10  10  11             8   8   8
0.35   10  10  11             8   8   8
0.40   10  11  11             8   8   9
0.45   10  11  12             8   8   9
0.50   10  11  12             8   8   9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Figure 1. Partitioning of the domain nodes into interior and interface nodes.
Enforcing the transmission conditions (compatibility and equilibrium) along the interfaces, the global equilibrium equation of the stochastic system can be expressed as the following block linear system of equations:
\[
\sum_{i=0}^{L} \Psi_i
\begin{bmatrix}
A_{II,i}^{1} & \cdots & 0 & A_{I\Gamma,i}^{1} R_1 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & A_{II,i}^{n_s} & A_{I\Gamma,i}^{n_s} R_{n_s} \\
R_1^T A_{\Gamma I,i}^{1} & \cdots & R_{n_s}^T A_{\Gamma I,i}^{n_s} & \sum_{s=1}^{n_s} R_s^T A_{\Gamma\Gamma,i}^{s} R_s
\end{bmatrix}
\begin{Bmatrix}
u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_I^1 \\ \vdots \\ f_I^{n_s} \\ \sum_{s=1}^{n_s} R_s^T f_\Gamma^s
\end{Bmatrix}
\tag{2}
\]
The solution process can be expanded using the same Polynomial Chaos basis as
\[
\begin{Bmatrix}
u_I^1(\theta) \\ \vdots \\ u_I^{n_s}(\theta) \\ u_\Gamma(\theta)
\end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix}
u_{I,j}^1 \\ \vdots \\ u_{I,j}^{n_s} \\ u_{\Gamma,j}
\end{Bmatrix}
\tag{3}
\]
Substituting Eq.(3) into Eq.(2) and performing a Galerkin projection to minimize the error over the space spanned by the Polynomial Chaos basis [1], the following coupled deterministic system of equations is obtained:
\[
\begin{bmatrix}
A_{II}^{1} & \cdots & 0 & A_{I\Gamma}^{1} R_1 \\
\vdots & \ddots & \vdots & \vdots \\
0 & \cdots & A_{II}^{n_s} & A_{I\Gamma}^{n_s} R_{n_s} \\
R_1^T A_{\Gamma I}^{1} & \cdots & R_{n_s}^T A_{\Gamma I}^{n_s} & \sum_{s=1}^{n_s} R_s^T A_{\Gamma\Gamma}^{s} R_s
\end{bmatrix}
\begin{Bmatrix}
U_I^1 \\ \vdots \\ U_I^{n_s} \\ U_\Gamma
\end{Bmatrix}
=
\begin{Bmatrix}
F_I^1 \\ \vdots \\ F_I^{n_s} \\ \sum_{s=1}^{n_s} R_s^T F_\Gamma^s
\end{Bmatrix}
\tag{4}
\]
where
\[
[A_{\alpha\beta}^s]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle A_{\alpha\beta,i}^s, \qquad
F_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle
\]
\[
U_I^m = ( u_{I,0}^m, \ldots, u_{I,N}^m )^T, \qquad
U_\Gamma = ( u_{\Gamma,0}, \ldots, u_{\Gamma,N} )^T
\]
and the subscripts α and β represent the indices I and Γ. The coefficient matrix in Eq.(4) is of order n(N+1) × n(N+1), where n and (N+1) denote the total number of degrees of freedom and chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq.(4) takes the following form:
\[
R_s = \mathrm{blockdiag}( R_s^0, \ldots, R_s^N )
\]
where \( (R_s^0, \ldots, R_s^N) \) are the deterministic restriction operators. In parallel implementation, \( R_s \) acts as a scatter operator while \( R_s^T \) acts as a gather operator; these operators are not constructed explicitly.
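Since the same deterministic restriction typically applies to every chaos coefficient, the stochastic operator can be realized as a block diagonal of N + 1 copies of the deterministic operator, equivalently a Kronecker product with the identity over the chaos indices. A minimal sketch (the small restriction matrix and sizes are illustrative, and identical blocks are assumed):

```python
import numpy as np
import scipy.sparse as sp

# illustrative deterministic restriction: picks 2 of 3 interface dofs
Rdet = sp.csr_matrix(np.array([[1.0, 0.0, 0.0],
                               [0.0, 0.0, 1.0]]))
N = 2  # chaos order, so N + 1 = 3 coefficient blocks

# stochastic restriction R_s = blockdiag(R_s^0, ..., R_s^N), identical blocks
Rs = sp.block_diag([Rdet] * (N + 1), format="csr")
# equivalently, a Kronecker product with the identity over chaos indices
Rs_kron = sp.kron(sp.eye(N + 1), Rdet, format="csr")

u = np.arange(9.0)   # 3 interface dofs x 3 chaos coefficients, block ordered
u_s = Rs @ u         # scatter: subdomain copy of the interface vector
```

In a distributed-memory code such as the paper's PETSc implementation these operators are never formed; the explicit sparse form above is only for serial experimentation.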
A block Gaussian elimination reduces the system in Eq(4) to the following extended Schurcomplement system for the interface variable UΓ
S UΓ = GΓ (5)
where the global extended Schur complement matrix S is given by
\[
S = \sum_{s=1}^{n_s} R_s^T \left[ A_{\Gamma\Gamma}^s - A_{\Gamma I}^s ( A_{II}^s )^{-1} A_{I\Gamma}^s \right] R_s
\]
and the corresponding right-hand-side vector \( G_\Gamma \) is
\[
G_\Gamma = \sum_{s=1}^{n_s} R_s^T \left[ F_\Gamma^s - A_{\Gamma I}^s ( A_{II}^s )^{-1} F_I^s \right]
\]
Once the interface unknowns \( U_\Gamma \) are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain as
\[
A_{II}^s U_I^s = F_I^s - A_{I\Gamma}^s R_s U_\Gamma
\]
4 Solution methods for the extended Schur complement system
Solution methods for linear systems are broadly categorized into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination techniques and are popular for their robustness. However, they are expensive in computation time and memory requirements and therefore cannot be applied to the solution of large-scale linear systems [10]. On the other hand, iterative methods generate a sequence of approximate solutions which converges to the true solution. In iterative methods, the main arithmetic operation is the matrix-vector multiplication. Therefore, the linear system itself need not be constructed explicitly, and only a procedure for the matrix-vector product is required. This property makes iterative methods more suitable for parallel processing than direct methods.
4.1 Preconditioned Conjugate Gradient Method (PCGM)
Non-overlapping domain decomposition methods, or iterative substructuring, can be viewed as preconditioned iterative methods to solve the Schur complement system of the form [11]
S UΓ = GΓ
For a symmetric positive-definite system, such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. However, the rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e. a lower condition number κ(S)). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes
\[
M^{-1} S\, U_\Gamma = M^{-1} G_\Gamma
\]
In general, \( \kappa(M^{-1}S) \) is much smaller than \( \kappa(S) \) and the eigenvalues of \( M^{-1}S \) are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of \( M^{-1} \) is not needed; instead, for a given vector \( r_\Gamma \), a system of the following form is solved:
MZ = rΓ
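The spectral effect of preconditioning can be seen on a toy SPD matrix; with a simple Jacobi preconditioner (an illustrative choice, not the preconditioner used in the paper) the condition number drops sharply:

```python
import numpy as np

# an SPD matrix with a widely spread spectrum
S = np.diag([1.0, 10.0, 100.0]) + 0.1        # adding 0.1 everywhere keeps it SPD
M_inv = np.diag(1.0 / np.diag(S))            # Jacobi (diagonal) preconditioner

kappa_S = np.linalg.cond(S)                  # large: slow CG convergence
kappa_MS = np.linalg.cond(M_inv @ S)         # near one: clustered eigenvalues
```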
The PCGM algorithm to solve the Schur complement system proceeds as follows [10]
Algorithm 1: The PCGM Algorithm
1.  Initialize U_{Γ,0} = 0
2.  Compute r_{Γ,0} = G_Γ - S U_{Γ,0}
3.  Precondition Z_0 = M^{-1} r_{Γ,0}
4.  First search direction P_0 = Z_0
5.  Initialize ρ_0 = (r_{Γ,0}, Z_0)
6.  For j = 0, 1, ... until convergence Do
7.      Q_j = S P_j
8.      ρ_{tmp,j} = (Q_j, P_j)
9.      α_j = ρ_j / ρ_{tmp,j}
10.     U_{Γ,j+1} = U_{Γ,j} + α_j P_j
11.     r_{Γ,j+1} = r_{Γ,j} - α_j Q_j
12.     Z_{j+1} = M^{-1} r_{Γ,j+1}
13.     ρ_{j+1} = (r_{Γ,j+1}, Z_{j+1})
14.     β_j = ρ_{j+1} / ρ_j
15.     P_{j+1} = Z_{j+1} + β_j P_j
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are calculating the product Q = S P in step 7 and the preconditioned residual Z = M^{-1} r_Γ in step 12. These operations can be performed in parallel, as outlined next.
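Algorithm 1 can be sketched in matrix-free form, where the operator S and the preconditioner M^{-1} enter only through callables; the small test system and the Jacobi preconditioner below are illustrative, not the paper's:

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_iter=200):
    """Algorithm 1: PCG on S U = G; S and M^{-1} are supplied as callables."""
    U = np.zeros_like(G)
    r = G - apply_S(U)                 # step 2: initial residual
    Z = apply_Minv(r)                  # step 3: preconditioned residual
    P = Z.copy()                       # step 4: first search direction
    rho = r @ Z                        # step 5
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        Q = apply_S(P)                 # step 7: matrix-vector product
        alpha = rho / (Q @ P)          # steps 8-9
        U += alpha * P                 # step 10
        r -= alpha * Q                 # step 11
        if np.linalg.norm(r) / r0_norm < tol:
            break
        Z = apply_Minv(r)              # step 12: preconditioner effect
        rho_new = r @ Z                # step 13
        beta = rho_new / rho           # step 14
        P = Z + beta * P               # step 15
        rho = rho_new
    return U

# usage on a tiny SPD system with a Jacobi preconditioner
S = np.array([[4.0, 1.0], [1.0, 3.0]])
G = np.array([1.0, 2.0])
U = pcgm(lambda v: S @ v, lambda v: v / np.diag(S), G)
```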
Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as
\[
Q = \sum_{s=1}^{n_s} R_s^T S^s R_s P
\]
where \( n_s \) is the number of subdomains, and \( R_s \) and \( R_s^T \) are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).
Algorithm 2: Parallel Matrix-Vector Product Procedure
1. Input: P
2. Scatter: P^s = R_s P
3. Local operation: Q^s = S^s P^s
4. Gather: Q = Σ_{s=1}^{n_s} R_s^T Q^s
5. Output: Q
The working vectors Ps and Qs are defined on the subdomain level
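A serial sketch of Algorithm 2 in which each restriction operator is held as an index array, so the scatter is array indexing and the gather is an indexed accumulation; the index sets and local Schur blocks are made up for illustration:

```python
import numpy as np

# two hypothetical subdomains sharing global interface node 2
idx = [np.array([0, 1, 2]), np.array([2, 3, 4])]      # rows of R_1, R_2
Ss = [2.0 * np.eye(3), 2.0 * np.eye(3)]               # local Schur blocks

def matvec(P):
    """Algorithm 2: Q = sum_s R_s^T S^s R_s P via scatter / local op / gather."""
    Q = np.zeros_like(P)
    for R, S in zip(idx, Ss):
        Ps = P[R]                    # scatter: P^s = R_s P
        Qs = S @ Ps                  # local operation: Q^s = S^s P^s
        np.add.at(Q, R, Qs)          # gather: Q += R_s^T Q^s
    return Q

P = np.ones(5)
Q = matvec(P)   # the shared node accumulates contributions from both subdomains
```

In the MPI setting the scatter and gather become communication calls; the accumulation pattern is the same.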
Similarly, the effect of the parallel preconditioner on a residual vector, \( Z = M^{-1} r_\Gamma \), can be computed as
\[
Z = \sum_{s=1}^{n_s} R_s^T M_s^{-1} R_s r_\Gamma
\]
This procedure is outlined in the following algorithm.
Algorithm 3: Parallel Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r_Γ^s = R_s r_Γ
3. Local Solve: M_s Z^s = r_Γ^s
4. Gather: Z = Σ_{s=1}^{n_s} R_s^T Z^s
5. Output: Z
The local preconditioner \( M_s \) and the working vectors \( r_\Gamma^s \) and \( Z^s \) are defined at the subdomain level.
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information
globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector \( u^s(\theta) \) into a set of interior unknowns \( u_i^s(\theta) \), corner unknowns \( u_c^s(\theta) \) and remaining unknowns \( u_r^s(\theta) \), as schematically shown in Fig.(2).
Figure 2. Partitioning of the domain nodes into interior, remaining and corner nodes.
According to this partitioning scheme the subdomain equilibrium equation can berepresented as
\[
\begin{bmatrix}
A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\
A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\
A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta)
\end{bmatrix}
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_i^s \\ f_r^s \\ f_c^s
\end{Bmatrix}
\]
The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation
\[
\sum_{l=0}^{L} \Psi_l
\begin{bmatrix}
A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\
A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\
A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s
\end{bmatrix}
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
=
\begin{Bmatrix}
f_i^s \\ f_r^s \\ f_c^s
\end{Bmatrix}
\tag{6}
\]
The solution process is expressed using the same Polynomial Chaos basis as
\[
\begin{Bmatrix}
u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta)
\end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix}
u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s
\end{Bmatrix}
\tag{7}
\]
Substituting Eq.(7) into Eq.(6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:
\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s \\
A_{ri}^s & A_{rr}^s & A_{rc}^s \\
A_{ci}^s & A_{cr}^s & A_{cc}^s
\end{bmatrix}
\begin{Bmatrix}
U_i^s \\ U_r^s \\ U_c^s
\end{Bmatrix}
=
\begin{Bmatrix}
F_i^s \\ F_r^s \\ F_c^s
\end{Bmatrix}
\tag{8}
\]
where
\[
[A_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A_{\alpha\beta,l}^s, \qquad
F_{\alpha,k}^s = \langle \Psi_k f_\alpha^s \rangle, \qquad
U_\alpha^s = ( u_{\alpha,0}^s, \cdots, u_{\alpha,N}^s )^T
\]
and the subscripts α and β represent the indices i, r and c.
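The Galerkin blocks above are weighted sums of the deterministic PC coefficient matrices, with weights given by the triple products ⟨Ψ_l Ψ_j Ψ_k⟩. A sketch for a single Gaussian germ, where the (probabilists') Hermite triple products have a closed form; the two small coefficient matrices are illustrative, not from the paper:

```python
import numpy as np
from math import factorial

def triple_product(l, j, k):
    """<Psi_l Psi_j Psi_k> for 1D Hermite chaos, normalized so <Psi_n^2> = n!."""
    s, rem = divmod(l + j + k, 2)
    if rem or s < max(l, j, k):
        return 0.0
    return factorial(l) * factorial(j) * factorial(k) / (
        factorial(s - l) * factorial(s - j) * factorial(s - k))

def assemble_block(A_l, N):
    """[A]_{jk} = sum_l <Psi_l Psi_j Psi_k> A_l, flattened to one big matrix."""
    n = A_l[0].shape[0]
    A = np.zeros((N + 1, N + 1, n, n))
    for l, Al in enumerate(A_l):
        for j in range(N + 1):
            for k in range(N + 1):
                A[j, k] += triple_product(l, j, k) * Al
    # arrange the (N+1) x (N+1) grid of n x n blocks into one matrix
    return A.transpose(0, 2, 1, 3).reshape((N + 1) * n, (N + 1) * n)

# L = 1 (two PC coefficient matrices), first order expansion (N = 1)
A0, A1 = np.eye(2), 0.1 * np.eye(2)
A = assemble_block([A0, A1], N=1)   # block structure: [[A0, A1], [A1, A0]]
```

The block sparsity of the assembled matrix follows directly from the vanishing triple products.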
Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as
\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s B_r^s & A_{ic}^s B_c^s \\[4pt]
\sum_{s=1}^{n_s} B_r^{sT} A_{ri}^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rc}^s B_c^s \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix}
U_i^s \\ U_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F_i^s \\[2pt] \sum_{s=1}^{n_s} B_r^{sT} F_r^s \\[2pt] \sum_{s=1}^{n_s} B_c^{sT} F_c^s
\end{Bmatrix}
\tag{9}
\]
where \( B_r^s \) is a Boolean rectangular matrix that maps the global remaining vector \( U_r \) to the local remaining vector \( U_r^s \) as
\[
U_r^s = B_r^s U_r
\tag{10}
\]
Similarly, the restriction operator \( B_c^s \) is a Boolean rectangular matrix that maps the global corner vector \( U_c \) to the local corner vector \( U_c^s \) as
\[
U_c^s = B_c^s U_c
\tag{11}
\]
In parallel implementation, both \( B_r^s \) and \( B_c^s \) act as scatter operators, while \( B_r^{sT} \) and \( B_c^{sT} \) act as gather operators.
The first block equation in Eq(9) can be solved for Usi in parallel as
\[
U_i^s = [ A_{ii}^s ]^{-1} ( F_i^s - A_{ir}^s B_r^s U_r - A_{ic}^s B_c^s U_c )
\tag{12}
\]
Substituting Eq.(12) into Eq.(9) leads to the following condensed system, which represents the Schur complement system in Eq.(5), but now the interface boundary nodes are split into
remaining and corner nodes as shown schematically in Fig(3)
\[
\begin{bmatrix}
\sum_{s=1}^{n_s} B_r^{sT} S_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} S_{rc}^s B_c^s \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix}
U_r \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
\sum_{s=1}^{n_s} B_r^{sT} G_r^s \\[2pt] \sum_{s=1}^{n_s} B_c^{sT} G_c^s
\end{Bmatrix}
\tag{13}
\]
where
\[
S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [ A_{ii}^s ]^{-1} A_{i\beta}^s, \qquad
G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [ A_{ii}^s ]^{-1} F_i^s
\]
Figure 3. The interface boundary nodes are split into remaining and corner nodes.
The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem
\[
( F_{rr} - F_{rc} [ F_{cc} ]^{-1} F_{cr} ) U_r = d_r - F_{rc} [ F_{cc} ]^{-1} d_c
\tag{14}
\]
where
\[
F_{\alpha\beta} = \sum_{s=1}^{n_s} B_\alpha^{sT} S_{\alpha\beta}^s B_\beta^s, \qquad
d_\alpha = \sum_{s=1}^{n_s} B_\alpha^{sT} G_\alpha^s
\]
and α and β denote the subscripts r and c. The above system can be solved using PCGM with an appropriate preconditioner M^{-1}, defined in the next section.
6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that on the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig.(4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:
Figure 4. Partial assembly of the corner nodes.
\[
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix}
U_r^s \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F_r^s \\ 0
\end{Bmatrix}
\tag{15}
\]
where
\[
F_r^s = D_r^s B_r^s\, r_j
\]
where \( r_j \) is the residual at the jth iteration of the PCGM, and \( D_r^s \) represents a block-diagonal weighting matrix which satisfies the following property:
\[
\sum_{s=1}^{n_s} B_r^{sT} D_r^s B_r^s = I
\]
The diagonal entries of each block of \( D_r^s \) are the reciprocals of the number of subdomains that share the corresponding interface boundary nodes.
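This reciprocal-multiplicity construction makes the weights a partition of unity over the interface; a small check with made-up index sets:

```python
import numpy as np

# hypothetical remaining-interface index sets; global node 1 is shared by three
idx = [np.array([0, 1]), np.array([1, 2]), np.array([1, 3])]
n_glob = 4

# multiplicity: the number of subdomains sharing each global remaining node
mult = np.zeros(n_glob)
for R in idx:
    np.add.at(mult, R, 1.0)

# diagonal entries of D_r^s: reciprocal multiplicity at that subdomain's nodes
D = [1.0 / mult[R] for R in idx]

# verify the partition-of-unity property  sum_s B_r^{sT} D_r^s B_r^s = I
check = np.zeros(n_glob)
for R, d in zip(idx, D):
    np.add.at(check, R, d)
```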
The subdomain-level remaining unknown vector \( U_r^s \) can be eliminated in parallel from Eq.(15) as
\[
U_r^s = [ S_{rr}^s ]^{-1} ( F_r^s - S_{rc}^s B_c^s U_c )
\tag{16}
\]
Substituting \( U_r^s \) into the second block equation of Eq.(15) leads to the following coarse problem:
\[
F_{cc}^{*} U_c = d_c^{*}
\tag{17}
\]
where
\[
F_{cc}^{*} = \sum_{s=1}^{n_s} B_c^{sT} ( S_{cc}^s - S_{cr}^s [ S_{rr}^s ]^{-1} S_{rc}^s ) B_c^s
\]
\[
d_c^{*} = - \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [ S_{rr}^s ]^{-1} F_r^s
\]
The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as
\[
U_r = \sum_{s=1}^{n_s} B_r^{sT} D_r^s U_r^s
\]
After some algebraic manipulations the preconditioner can be expressed as
\[
M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [ S_{rr}^s ]^{-1} D_r^s B_r^s + R_0^T [ F_{cc}^{*} ]^{-1} R_0
\tag{18}
\]
where
\[
R_0 = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [ S_{rr}^s ]^{-1} D_r^s B_r^s
\]
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq(8) leads to the following subdomain equilibrium equation
\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s \\[4pt]
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\end{bmatrix}
\begin{Bmatrix}
U_i^s \\ U_r^s \\ U_c
\end{Bmatrix}
=
\begin{Bmatrix}
F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s
\end{Bmatrix}
\tag{19}
\]
where \( B_c^s \) is a Boolean restriction operator that maps the global corner vector \( U_c \) to the local corner vector \( U_c^s \) as
\[
U_c^s = B_c^s U_c
\]
Eq(19) can be rewritten in compact form as
\[
A^s U^s = F^s
\tag{20}
\]
Let \( B_r^s \) be a block-diagonal signed Boolean continuity matrix defined such that
\[
\sum_{s=1}^{n_s} B_r^s U_r^s = 0
\]
Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as
\[
\tfrac{1}{2} U^T A U - U^T F \rightarrow \min
\tag{21}
\]
subject to \( B U = 0 \)
where
\[
A = \begin{bmatrix} A^1 & & & \\ & \ddots & & \\ & & A^s & \\ & & & \ddots \\ \end{bmatrix}, \quad
U = \begin{Bmatrix} U^1 \\ \vdots \\ U^s \\ \vdots \\ U^{n_s} \end{Bmatrix}, \quad
F = \begin{Bmatrix} F^1 \\ \vdots \\ F^s \\ \vdots \\ F^{n_s} \end{Bmatrix}
\]
\[
B = \left[ (0\;\; B_r^1\;\; 0) \;\; \cdots \;\; (0\;\; B_r^s\;\; 0) \;\; \cdots \;\; (0\;\; B_r^{n_s}\;\; 0) \right]
\]
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as
\[
\mathcal{L}(U, \Lambda) = \tfrac{1}{2} U^T A U - U^T F + U^T B^T \Lambda
\tag{22}
\]
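On a two-unknown toy problem the resulting saddle-point structure is easy to inspect: stationarity of the Lagrangian gives the KKT system [A, B^T; B, 0]{U; Λ} = {F; 0}, and Λ plays the role of the force tying the constrained unknowns together. All matrices below are illustrative, not from the paper:

```python
import numpy as np

# min 1/2 U^T A U - U^T F  subject to  B U = 0 (a signed continuity constraint)
A = np.array([[2.0, 0.0], [0.0, 2.0]])
F = np.array([1.0, 3.0])
B = np.array([[1.0, -1.0]])

# KKT (saddle-point) system obtained by minimizing L(U, Lambda)
K = np.block([[A, B.T], [B, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([F, [0.0]]))
U, lam = sol[:2], sol[2]   # lam is the "connecting force" across the interface
```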
Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system
\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\[4pt]
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & B_r^{sT} \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\[4pt]
0 & \sum_{s=1}^{n_s} B_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
U_i^s \\ U_r^s \\ U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0
\end{Bmatrix}
\tag{23}
\]
where
\[
\Lambda = ( \lambda_0, \ldots, \lambda_N )^T
\]
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)
Eliminating the interior unknowns Usi from Eq(23) as
\[
U_i^s = [ A_{ii}^s ]^{-1} ( F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c )
\tag{24}
\]
Substituting Eq(24) into Eq(23) leads to
\[
\begin{bmatrix}
S_{rr}^s & S_{rc}^s B_c^s & B_r^{sT} \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\[4pt]
\sum_{s=1}^{n_s} B_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix}
U_r^s \\ U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0
\end{Bmatrix}
\tag{25}
\]
where
\[
S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [ A_{ii}^s ]^{-1} A_{i\beta}^s, \qquad
G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [ A_{ii}^s ]^{-1} F_i^s
\]
The subdomain-level remaining unknown vector \( U_r^s \) can be obtained in parallel from Eq.(25) as
\[
U_r^s = [ S_{rr}^s ]^{-1} ( G_r^s - S_{rc}^s B_c^s U_c - B_r^{sT} \Lambda )
\tag{26}
\]
Substituting Eq(26) into Eq(25) leads to
\[
\begin{bmatrix}
F_{cc} & -F_{cr} \\
F_{rc} & F_{rr}
\end{bmatrix}
\begin{Bmatrix}
U_c \\ \Lambda
\end{Bmatrix}
=
\begin{Bmatrix}
d_c \\ d_r
\end{Bmatrix}
\tag{27}
\]
where
\[
F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} ( S_{cc}^s - S_{cr}^s [ S_{rr}^s ]^{-1} S_{rc}^s ) B_c^s, \qquad
F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [ S_{rr}^s ]^{-1} B_r^{sT}
\]
\[
F_{rc} = \sum_{s=1}^{n_s} B_r^s [ S_{rr}^s ]^{-1} S_{rc}^s B_c^s, \qquad
F_{rr} = \sum_{s=1}^{n_s} B_r^s [ S_{rr}^s ]^{-1} B_r^{sT}
\]
\[
d_c = \sum_{s=1}^{n_s} B_c^{sT} ( G_c^s - S_{cr}^s [ S_{rr}^s ]^{-1} G_r^s ), \qquad
d_r = \sum_{s=1}^{n_s} B_r^s [ S_{rr}^s ]^{-1} G_r^s
\]
Solving for Uc from Eq(27) gives the following coarse problem
\[
F_{cc} U_c = d_c + F_{cr} \Lambda
\tag{28}
\]
Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system
\[
( F_{rr} + F_{rc} [ F_{cc} ]^{-1} F_{cr} ) \Lambda = d_r - F_{rc} [ F_{cc} ]^{-1} d_c
\tag{29}
\]
Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as
\[
M = \sum_{s=1}^{n_s} B_r^s D_r^s S_{rr}^s D_r^s B_r^{sT}
\tag{30}
\]
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as
\[
F_{cc}^{*} = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} ( S_{cc}^s - S_{cr}^s [ S_{rr}^s ]^{-1} S_{rc}^s ) B_c^s
\]
Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as
\[
M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [ S_{rr}^s ]^{-1} D_r^s B_r^s
+ \sum_{s=1}^{n_s} B_r^{sT} D_r^s [ S_{rr}^s ]^{-1} S_{rc}^s B_c^s
\left[ \sum_{s=1}^{n_s} B_c^{sT} ( S_{cc}^s - S_{cr}^s [ S_{rr}^s ]^{-1} S_{rc}^s ) B_c^s \right]^{-1}
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [ S_{rr}^s ]^{-1} D_r^s B_r^s
\]
which has the same form as the dual-primal operator in Eq.(29).
9 Parallel implementation
In this section we give an outline of the parallel implementation of the PCGM to solve the primal (Eq.(14)) and dual-primal (Eq.(29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global-level coarse problem.
9.1 Primal method
In this subsection, we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq.(14).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as
\[
Q_j = ( F_{rr} - F_{rc} [ F_{cc} ]^{-1} F_{cr} ) P_j
\]
can be computed using the following algorithm
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1.  Input: P
2.  Scatter: P^s = B_r^s P
3.  Compute: v_1^s = S_{cr}^s P^s
4.  Gather: V_1 = Σ_{s=1}^{n_s} B_c^{sT} v_1^s
5.  Global Solve: F_{cc} V_2 = V_1
6.  Scatter: v_2^s = B_c^s V_2
7.  Compute: v_3^s = S_{rc}^s v_2^s
8.  Update: Q^s = S_{rr}^s P^s - v_3^s
9.  Gather: Q = Σ_{s=1}^{n_s} B_r^{sT} Q^s
10. Output: Q
The multiplication of a subdomain Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as
\[
v_\alpha^s = S_{\alpha\beta}^s v_\beta^s
\]
\[
v_\alpha^s = ( A_{\alpha\beta}^s - A_{\alpha i}^s [ A_{ii}^s ]^{-1} A_{i\beta}^s ) v_\beta^s
\]
This procedure is outlined in the following algorithm
Algorithm 5: Dirichlet Solver Procedure
1. Input: v_β^s
2. Compute: v_1^s = A_{iβ}^s v_β^s
3. Solve: A_{ii}^s v_2^s = v_1^s
4. Compute: v_3^s = A_{αi}^s v_2^s
5. Compute: v_4^s = A_{αβ}^s v_β^s
6. Compute: v_α^s = v_4^s - v_3^s
7. Output: v_α^s
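Each pass through Algorithm 5 applies a subdomain Schur complement without ever forming it; factoring A_ii^s once and reusing the factorization across PCGM iterations is the usual practice. A SciPy sketch with illustrative blocks (here α = β = r):

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

# illustrative subdomain blocks
A_ii = sp.csc_matrix(np.array([[4.0, 1.0], [1.0, 4.0]]))
A_ir = sp.csc_matrix(np.array([[1.0], [0.0]]))
A_ri = A_ir.T.tocsc()
A_rr = sp.csc_matrix([[3.0]])

lu = spla.splu(A_ii)        # factor A_ii^s once, reuse in every Dirichlet solve

def dirichlet_apply(v_beta):
    """Algorithm 5: v_alpha = (A_rr - A_ri A_ii^{-1} A_ir) v_beta."""
    v1 = A_ir @ v_beta      # step 2
    v2 = lu.solve(v1)       # step 3: solve A_ii v2 = v1
    v3 = A_ri @ v2          # step 4
    v4 = A_rr @ v_beta      # step 5
    return v4 - v3          # step 6

v = dirichlet_apply(np.array([1.0]))
```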
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as
\[
M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1
\]
where
\[
M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\]
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1.  Input: r_Γ
2.  Scatter: F_r^s = D_r^s B_r^s r_Γ
3.  Local Solve: S_{rr}^s v_1^s = F_r^s
4.  Compute: d_c^s = S_{cr}^s v_1^s
5.  Gather: d_c = Σ_{s=1}^{n_s} B_c^{sT} d_c^s
6.  Global Solve: F_{cc}^* Z_c = -d_c
7.  Scatter: Z_c^s = B_c^s Z_c
8.  Update: v_2^s = F_r^s + S_{rc}^s Z_c^s
9.  Local Solve: S_{rr}^s Z_f^s = v_2^s
10. Gather: Z = Σ_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s
11. Output: Z
The local solves in steps 3 and 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form \( S_{rr}^s U_r^s = r_r^s \), which can be solved using the following algorithm.
Algorithm 7: Neumann Solver Procedure
1. Input: r_r^s
2. Solve:
\[
\begin{bmatrix}
A_{ii}^s & A_{ir}^s \\
A_{ri}^s & A_{rr}^s
\end{bmatrix}
\begin{Bmatrix}
X^s \\ U_r^s
\end{Bmatrix}
=
\begin{Bmatrix}
0 \\ r_r^s
\end{Bmatrix}
\]
3. Output: U_r^s
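Algorithm 7 amounts to factoring the subdomain [ii, ir; ri, rr] block once and back-substituting with a zero interior load; the resulting U_r^s then solves the local Schur problem S_rr^s U_r^s = r_r^s. A dense sketch with illustrative blocks:

```python
import numpy as np

# illustrative subdomain blocks
A_ii = np.array([[4.0, 1.0], [1.0, 4.0]])
A_ir = np.array([[1.0], [0.0]])
A_rr = np.array([[3.0]])
K = np.block([[A_ii, A_ir], [A_ir.T, A_rr]])   # the Neumann matrix

def neumann_solve(r_r):
    """Solve [A_ii A_ir; A_ri A_rr] {X; U_r} = {0; r_r} and return U_r."""
    rhs = np.concatenate([np.zeros(A_ii.shape[0]), r_r])
    sol = np.linalg.solve(K, rhs)
    return sol[A_ii.shape[0]:]

U_r = neumann_solve(np.array([1.0]))
# U_r satisfies the subdomain Schur problem S_rr U_r = r_r
S_rr = A_rr - A_ir.T @ np.linalg.solve(A_ii, A_ir)
```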
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as
\[
M_{cc}^{-1} F_{cc}^{*} Z_c = - M_{cc}^{-1} d_c
\]
where
\[
M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s
\]
Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart
Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq.(29).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as
\[
Q_j = ( F_{rr} + F_{rc} [ F_{cc} ]^{-1} F_{cr} ) P_j
\]
can be computed using the following algorithm
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1.  Input: P
2.  Scatter: P^s = B_r^{sT} P
3.  Local Solve: S_{rr}^s v_1^s = P^s
4.  Compute: v_2^s = S_{cr}^s v_1^s
5.  Gather: V_2 = Σ_{s=1}^{n_s} B_c^{sT} v_2^s
6.  Global Solve: F_{cc} V_3 = V_2
7.  Scatter: v_3^s = B_c^s V_3
8.  Compute: v_4^s = S_{rc}^s v_3^s
9.  Update: v_5^s = P^s + v_4^s
10. Local Solve: S_{rr}^s Q^s = v_5^s
11. Gather: Q = Σ_{s=1}^{n_s} B_r^s Q^s
12. Output: Q
The local solves in steps 3 and 10 of Algorithm (8) are performed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem in the primal preconditioner.
Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r_Γ^s = D_r^s B_r^{sT} r_Γ
3. Compute: Z^s = S_{rr}^s r_Γ^s
4. Gather: Z = Σ_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output: Z
We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart
Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as
\[
\frac{\partial}{\partial x} \left[ c_x(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial x} \right]
+ \frac{\partial}{\partial y} \left[ c_y(x, y, \theta) \frac{\partial u(x, y, \theta)}{\partial y} \right]
= f(x, y) \quad \text{in } \Omega
\]
where the forcing term is
\[
f(x, y) = 1.0
\]
For simplicity, a homogeneous Dirichlet boundary condition is imposed:
\[
u(x, y, \theta) = 0 \quad \text{on } \partial\Omega
\]
The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and initial residuals is less than 10^{-5}:
\[
\frac{\| G_\Gamma - S\, U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \leq 10^{-5}
\]
Numerical experiments are performed on a Linux cluster with an InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u_0. Figure 13. Chaos coefficients u_1.
10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of ten does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using the first order Polynomial Chaos expansion leads to a total problem size of (3 × n) and using the third order Polynomial Chaos expansion leads to
Figure 14. Chaos coefficients u_2. Figure 15. Chaos coefficients u_3.
Figure 16. Chaos coefficients u_4. Figure 17. Chaos coefficients u_5.
Figure 18. Lagrange multipliers λ_0. Figure 19. Lagrange multipliers λ_1.
a total problem size of (10 × n).
Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs.(24-26).

Figure 20. Lagrange multipliers λ_2. Figure 21. Lagrange multipliers λ_3.
Figure 22. Lagrange multipliers λ_4. Figure 23. Lagrange multipliers λ_5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

                     PDDM              DP-DDM
Problem size     1st  2nd  3rd     1st  2nd  3rd
10051             10   10   10       8    8    8
20303             11   11   11       8    8    8
40811             11   12   12       8    9    9
59935             13   14   14      10   10   10
71386             12   12   12       9    9    9
80172             11   11   12       8    8    8
101851            12   12   12       9    9    9
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

            PDDM              DP-DDM
CPUs    1st  2nd  3rd     1st  2nd  3rd
20       10   11   11       8    8    8
40       12   12   12       9    9    9
60       12   13   13       9    9    9
80       12   12   13       9    9    9
100      12   12   12       9    9    9
120      12   12   12       9    9    9
140      11   11   12       8    8    8
160      12   12   12       8    8    9
Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

                   PDDM              DP-DDM
Subdomains     1st  2nd  3rd     1st  2nd  3rd
100             10   10   10       8    8    8
200             10   10   11       8    8    8
400             12   13   13       9    9    9
600             11   12   12       8    8    9
800             12   13   13       9    9    9
more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to the problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both the primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

           PDDM              DP-DDM
CoV    1st  2nd  3rd     1st  2nd  3rd
0.05    10   10   10       8    8    8
0.10    10   10   10       8    8    8
0.15    10   10   10       8    8    8
0.20    10   10   10       8    8    8
0.25    10   10   10       8    8    8
0.30    10   10   11       8    8    8
0.35    10   10   11       8    8    8
0.40    10   11   11       8    8    9
0.45    10   11   12       8    8    9
0.50    10   11   12       8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications, accepted
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
where

\[ [\mathcal{A}^s_{\alpha\beta}]_{jk} = \sum_{i=0}^{L} \langle \Psi_i \Psi_j \Psi_k \rangle\, A^s_{\alpha\beta,i}, \qquad \mathcal{F}^s_{\alpha,k} = \langle \Psi_k f^s_{\alpha} \rangle, \]

\[ \mathcal{U}^s_I = (u^s_{I,0}, \ldots, u^s_{I,N})^T, \qquad \mathcal{U}_\Gamma = (u_{\Gamma,0}, \ldots, u_{\Gamma,N})^T, \]

and the subscripts α and β represent the indices I and Γ. The coefficient matrix in Eq. (4) is of order n(N+1) × n(N+1), where n and (N+1) denote the total number of degrees of freedom and the number of chaos coefficients, respectively. The stochastic counterpart of the restriction operator in Eq. (4) takes the following form:

\[ \mathcal{R}_s = \mathrm{blockdiag}(R^0_s, \ldots, R^N_s), \]

where R^0_s, ..., R^N_s are the deterministic restriction operators. In a parallel implementation, \(\mathcal{R}_s\) acts as a scatter operator and \(\mathcal{R}^T_s\) acts as a gather operator; neither is constructed explicitly.
A block Gaussian elimination reduces the system in Eq. (4) to the following extended Schur complement system for the interface variable \(\mathcal{U}_\Gamma\):

\[ \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma, \qquad (5) \]

where the global extended Schur complement matrix \(\mathcal{S}\) is given by

\[ \mathcal{S} = \sum_{s=1}^{n_s} \mathcal{R}^T_s \left[ \mathcal{A}^s_{\Gamma\Gamma} - \mathcal{A}^s_{\Gamma I} (\mathcal{A}^s_{II})^{-1} \mathcal{A}^s_{I\Gamma} \right] \mathcal{R}_s, \]

and the corresponding right-hand-side vector \(\mathcal{G}_\Gamma\) is

\[ \mathcal{G}_\Gamma = \sum_{s=1}^{n_s} \mathcal{R}^T_s \left[ \mathcal{F}^s_\Gamma - \mathcal{A}^s_{\Gamma I} (\mathcal{A}^s_{II})^{-1} \mathcal{F}^s_I \right]. \]

Once the interface unknowns \(\mathcal{U}_\Gamma\) are available, the interior unknowns can be obtained concurrently by solving the interior problem on each subdomain:

\[ \mathcal{A}^s_{II}\, \mathcal{U}^s_I = \mathcal{F}^s_I - \mathcal{A}^s_{I\Gamma} \mathcal{R}_s \mathcal{U}_\Gamma. \]
4 Solution methods for the extended Schur complement system

Solution methods for linear systems fall broadly into direct methods and iterative methods. Direct methods are generally based on sparse Gaussian elimination and are popular for their robustness; however, they are expensive in computation time and memory, and therefore cannot be applied to very large-scale linear systems [10]. Iterative methods, on the other hand, generate a sequence of approximate solutions that converges to the true solution. In iterative methods the main arithmetic operation is the matrix-vector multiplication, so the linear system itself need not be constructed explicitly; only a procedure for the matrix-vector product is required. This property makes iterative methods better suited to parallel processing than direct methods.
High Performance Computing Symposium (HPCS 2010), IOP Publishing. Journal of Physics: Conference Series 256 (2010) 012001, doi:10.1088/1742-6596/256/1/012001
4.1 Preconditioned Conjugate Gradient Method (PCGM)

Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve a Schur complement system of the form [11]

\[ \mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{G}_\Gamma. \]

For a symmetric positive-definite system such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. The rate of convergence can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties, i.e. a lower condition number κ(S). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. The transformed linear system becomes

\[ \mathcal{M}^{-1}\mathcal{S}\, \mathcal{U}_\Gamma = \mathcal{M}^{-1}\mathcal{G}_\Gamma. \]

In general, κ(M⁻¹S) is much smaller than κ(S), and the eigenvalues of M⁻¹S are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice the explicit construction of M⁻¹ is not needed; instead, for a given vector r_Γ, a system of the following form is solved:

\[ \mathcal{M} Z = r_\Gamma. \]
The PCGM algorithm to solve the Schur complement system proceeds as follows [10].

Algorithm 1 The PCGM Algorithm
1. Initialize: U_{Γ,0} = 0
2. Compute: r_{Γ,0} = G_Γ − S U_{Γ,0}
3. Precondition: Z_0 = M⁻¹ r_{Γ,0}
4. First search direction: P_0 = Z_0
5. Initialize: ρ_0 = (r_{Γ,0}, Z_0)
6. For j = 0, 1, ... until convergence Do
7.   Q_j = S P_j
8.   ρ_{tmp,j} = (Q_j, P_j)
9.   α_j = ρ_j / ρ_{tmp,j}
10.  U_{Γ,j+1} = U_{Γ,j} + α_j P_j
11.  r_{Γ,j+1} = r_{Γ,j} − α_j Q_j
12.  Z_{j+1} = M⁻¹ r_{Γ,j+1}
13.  ρ_{j+1} = (r_{Γ,j+1}, Z_{j+1})
14.  β_j = ρ_{j+1} / ρ_j
15.  P_{j+1} = Z_{j+1} + β_j P_j
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are the matrix-vector product Q = S P in step 7 and the preconditioned residual Z = M⁻¹ r_Γ in step 12. These operations can be performed in parallel as outlined next.
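As a concrete illustration, Algorithm 1 can be sketched in a serial NumPy form. The callables `apply_S` and `apply_Minv` are hypothetical stand-ins for the parallel matrix-vector product and preconditioner procedures of the following sections; only the action of the operators is needed, so neither S nor M⁻¹ is ever formed.

```python
import numpy as np

def pcgm(apply_S, apply_Minv, G, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradients for S U = G (Algorithm 1)."""
    U = np.zeros_like(G)                 # step 1
    r = G - apply_S(U)                   # step 2
    Z = apply_Minv(r)                    # step 3
    P = Z.copy()                         # step 4
    rho = r @ Z                          # step 5
    for _ in range(max_iter):            # step 6
        Q = apply_S(P)                   # step 7
        alpha = rho / (Q @ P)            # steps 8-9
        U += alpha * P                   # step 10
        r -= alpha * Q                   # step 11
        if np.linalg.norm(r) < tol * np.linalg.norm(G):
            break                        # relative-residual stopping test
        Z = apply_Minv(r)                # step 12
        rho_new = r @ Z                  # step 13
        P = Z + (rho_new / rho) * P      # steps 14-15
        rho = rho_new
    return U
```

Because the driver touches the operators only through these two callables, the same loop works unchanged whether they wrap dense matrices, subdomain assemblies, or MPI-distributed procedures.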
Given the subdomain Schur complement matrices S^s and a global vector P, the matrix-vector product Q = S P can be calculated in parallel as

\[ Q = \sum_{s=1}^{n_s} \mathcal{R}^T_s \mathcal{S}^s \mathcal{R}_s P, \]

where n_s is the number of subdomains and R_s and R_s^T are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm 2.
Algorithm 2 Parallel Matrix-Vector Product Procedure
1. Input: P
2. Scatter: P^s = R_s P
3. Local operation: Q^s = S^s P^s
4. Gather: Q = Σ_{s=1}^{n_s} R_s^T Q^s
5. Output: Q

The working vectors P^s and Q^s are defined on the subdomain level.
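A serial NumPy sketch of Algorithm 2, with hypothetical index arrays playing the role of the Boolean restriction operators R_s: scattering is fancy indexing and gathering is a scatter-add.

```python
import numpy as np

# Hypothetical two-subdomain example: the global interface vector has 5
# entries; each index array plays the role of the Boolean operator R_s.
idx = [np.array([0, 1, 2]), np.array([2, 3, 4])]   # node 2 is shared

def matvec(S_local, P):
    """Algorithm 2: Q = sum_s R_s^T S^s R_s P, without assembling S."""
    Q = np.zeros_like(P)
    for S_s, i_s in zip(S_local, idx):
        P_s = P[i_s]             # step 2: scatter  P^s = R_s P
        Q_s = S_s @ P_s          # step 3: local subdomain product
        np.add.at(Q, i_s, Q_s)   # step 4: gather   Q += R_s^T Q^s
    return Q
```

In the actual MPI implementation the scatter/gather steps become communication calls, but the algebra is exactly this loop.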
Similarly, the effect of a parallel preconditioner on a residual vector, Z = M⁻¹ r_Γ, can be computed as

\[ Z = \sum_{s=1}^{n_s} \mathcal{R}^T_s \mathcal{M}^{-1}_s \mathcal{R}_s r_\Gamma. \]

This procedure is outlined in the following algorithm.

Algorithm 3 Parallel Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r_Γ^s = R_s r_Γ
3. Local solve: M_s Z^s = r_Γ^s
4. Gather: Z = Σ_{s=1}^{n_s} R_s^T Z^s
5. Output: Z

The local preconditioner M_s and the working vectors r_Γ^s and Z^s are defined on the subdomain level.
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel; these local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information globally across the subdomains, and this global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces the continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each subdomain, we partition the subdomain nodal vector u^s(θ) into a set of interior unknowns u_i^s(θ), corner unknowns u_c^s(θ) and remaining unknowns u_r^s(θ), as schematically shown in Fig. (2).

Figure 2. Partitioning of the subdomain nodes into interior, remaining and corner (•) nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as

\[
\begin{bmatrix}
A^s_{ii}(\theta) & A^s_{ir}(\theta) & A^s_{ic}(\theta) \\
A^s_{ri}(\theta) & A^s_{rr}(\theta) & A^s_{rc}(\theta) \\
A^s_{ci}(\theta) & A^s_{cr}(\theta) & A^s_{cc}(\theta)
\end{bmatrix}
\begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix}.
\]

The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

\[
\sum_{l=0}^{L} \Psi_l
\begin{bmatrix}
A^s_{ii,l} & A^s_{ir,l} & A^s_{ic,l} \\
A^s_{ri,l} & A^s_{rr,l} & A^s_{rc,l} \\
A^s_{ci,l} & A^s_{cr,l} & A^s_{cc,l}
\end{bmatrix}
\begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix}
=
\begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix}. \qquad (6)
\]
The solution process is expressed using the same Polynomial Chaos basis as

\[
\begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix}
= \sum_{j=0}^{N} \Psi_j(\theta)
\begin{Bmatrix} u^s_{i,j} \\ u^s_{r,j} \\ u^s_{c,j} \end{Bmatrix}. \qquad (7)
\]
Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic systems of equations:

\[
\begin{bmatrix}
\mathcal{A}^s_{ii} & \mathcal{A}^s_{ir} & \mathcal{A}^s_{ic} \\
\mathcal{A}^s_{ri} & \mathcal{A}^s_{rr} & \mathcal{A}^s_{rc} \\
\mathcal{A}^s_{ci} & \mathcal{A}^s_{cr} & \mathcal{A}^s_{cc}
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_i \\ \mathcal{U}^s_r \\ \mathcal{U}^s_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}^s_i \\ \mathcal{F}^s_r \\ \mathcal{F}^s_c \end{Bmatrix}, \qquad (8)
\]

where

\[ [\mathcal{A}^s_{\alpha\beta}]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle\, A^s_{\alpha\beta,l}, \qquad \mathcal{F}^s_{\alpha,k} = \langle \Psi_k f^s_{\alpha} \rangle, \qquad \mathcal{U}^s_\alpha = (u^s_{\alpha,0}, \cdots, u^s_{\alpha,N})^T, \]

and the subscripts α and β represent the indices i, r and c.
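The block structure of Eq. (8) can be assembled mechanically once the triple-product tensor is known. A NumPy sketch (illustrative names; the tensor `c[l, j, k] = ⟨Ψ_l Ψ_j Ψ_k⟩` is assumed precomputed for the chosen chaos basis):

```python
import numpy as np

def galerkin_blocks(c, A_chaos):
    """Assemble the (N+1) x (N+1) block matrix of Eq. (8):
    block (j, k) = sum_l <Psi_l Psi_j Psi_k> A_l, where c[l, j, k] is the
    precomputed triple-product tensor and A_chaos[l] are the n x n chaos
    coefficient matrices of the random operator."""
    L1, N1, _ = c.shape
    n = A_chaos.shape[1]
    # result[j*n:(j+1)*n, k*n:(k+1)*n] = sum_l c[l, j, k] * A_chaos[l]
    big = np.einsum('ljk,lmn->jmkn', c, A_chaos)
    return big.reshape(N1 * n, N1 * n)
```

For sparse finite element matrices one would of course assemble block-by-block rather than densely, but the index pattern is the same.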
Enforcing the transmission conditions along the boundary interfaces, the subdomain equilibrium equation can be expressed as

\[
\begin{bmatrix}
\mathcal{A}^s_{ii} & \mathcal{A}^s_{ir}B^s_r & \mathcal{A}^s_{ic}B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_r \mathcal{A}^s_{ri} & \sum_{s=1}^{n_s} B^{sT}_r \mathcal{A}^s_{rr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_r \mathcal{A}^s_{rc} B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_i \\ \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}^s_i \\ \sum_{s=1}^{n_s} B^{sT}_r \mathcal{F}^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c \mathcal{F}^s_c \end{Bmatrix}, \qquad (9)
\]

where B^s_r is a Boolean rectangular matrix that maps the global remaining vector U_r to the local remaining vector U^s_r as

\[ \mathcal{U}^s_r = B^s_r\, \mathcal{U}_r. \qquad (10) \]

Similarly, the restriction operator B^s_c is a Boolean rectangular matrix that maps the global corner vector U_c to the local corner vector U^s_c as

\[ \mathcal{U}^s_c = B^s_c\, \mathcal{U}_c. \qquad (11) \]

In a parallel implementation, both B^s_r and B^s_c act as scatter operators, while B^{sT}_r and B^{sT}_c act as gather operators.

The first block equation in Eq. (9) can be solved for U^s_i in parallel as

\[ \mathcal{U}^s_i = [\mathcal{A}^s_{ii}]^{-1} (\mathcal{F}^s_i - \mathcal{A}^s_{ir}B^s_r\, \mathcal{U}_r - \mathcal{A}^s_{ic}B^s_c\, \mathcal{U}_c). \qquad (12) \]
Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5) but with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig. (3):

\[
\begin{bmatrix}
\sum_{s=1}^{n_s} B^{sT}_r \mathcal{S}^s_{rr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_r \mathcal{S}^s_{rc} B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \sum_{s=1}^{n_s} B^{sT}_r \mathcal{G}^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c \mathcal{G}^s_c \end{Bmatrix}, \qquad (13)
\]

where

\[ \mathcal{S}^s_{\alpha\beta} = \mathcal{A}^s_{\alpha\beta} - \mathcal{A}^s_{\alpha i}[\mathcal{A}^s_{ii}]^{-1}\mathcal{A}^s_{i\beta}, \qquad \mathcal{G}^s_\alpha = \mathcal{F}^s_\alpha - \mathcal{A}^s_{\alpha i}[\mathcal{A}^s_{ii}]^{-1}\mathcal{F}^s_i. \]

Figure 3. The interface boundary nodes split into remainder and corner (•) nodes.

The corner nodal vector U_c in Eq. (13) is eliminated next to obtain the following symmetric positive-definite reduced interface problem:

\[ (\mathcal{F}_{rr} - \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1}\mathcal{F}_{cr})\, \mathcal{U}_r = d_r - \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1} d_c, \qquad (14) \]

where

\[ \mathcal{F}_{\alpha\beta} = \sum_{s=1}^{n_s} B^{sT}_\alpha \mathcal{S}^s_{\alpha\beta} B^s_\beta, \qquad d_\alpha = \sum_{s=1}^{n_s} B^{sT}_\alpha \mathcal{G}^s_\alpha, \]

and α and β denote the subscripts r and c. The above system can be solved using PCGM with an appropriate preconditioner M⁻¹, defined in the next section.
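The block elimination leading from Eq. (13) to Eq. (14) can be checked numerically on a small random SPD block system (sizes here are arbitrary stand-ins for the remaining and corner blocks):

```python
import numpy as np

rng = np.random.default_rng(3)
# Symmetric positive-definite 2x2 block system standing in for Eq. (13):
# [[Frr, Frc], [Fcr, Fcc]] [Ur; Uc] = [dr; dc]
X = rng.standard_normal((7, 7))
F = X @ X.T + 7 * np.eye(7)
Frr, Frc = F[:4, :4], F[:4, 4:]
Fcr, Fcc = F[4:, :4], F[4:, 4:]
d = rng.standard_normal(7)
dr, dc = d[:4], d[4:]

# Eliminate the corner unknowns Uc to get the reduced problem of Eq. (14)
S_red = Frr - Frc @ np.linalg.solve(Fcc, Fcr)
rhs = dr - Frc @ np.linalg.solve(Fcc, dc)
Ur = np.linalg.solve(S_red, rhs)

# Ur must match the remaining-node part of the unreduced solution
assert np.allclose(Ur, np.linalg.solve(F, d)[:4])
```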
6.1 A two-level preconditioner

As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that on the remaining interface boundary nodes is satisfied in a weak sense; this is schematically illustrated in Fig. (4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:

\[
\begin{bmatrix}
\mathcal{S}^s_{rr} & \mathcal{S}^s_{rc}B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}^s_r \\ 0 \end{Bmatrix}, \qquad (15)
\]

Figure 4. Partial assembly of the corner nodes (•).

where

\[ \mathcal{F}^s_r = D^s_r B^s_r\, r_j, \]

r_j is the residual at the j-th iteration of the PCGM, and D^s_r represents a block-diagonal weighting matrix which satisfies the following property:

\[ \sum_{s=1}^{n_s} B^{sT}_r D^s_r B^s_r = I. \]

The diagonal entries of each block of D^s_r are the reciprocal of the number of subdomains that share the corresponding interface boundary node.
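A small NumPy sketch of this construction (hypothetical two-subdomain index sets): each diagonal entry of D^s_r is the reciprocal of the node multiplicity, and the partition-of-unity property Σ_s B^{sT} D^s B^s = I follows directly.

```python
import numpy as np

# Two subdomains sharing remaining-interface node 1 (multiplicity 2);
# index arrays play the role of the Boolean maps B_r^s.
idx = [np.array([0, 1]), np.array([1, 2])]
mult = np.zeros(3)
for i_s in idx:
    mult[i_s] += 1.0                 # subdomain count touching each node

# D_r^s: diagonal blocks of reciprocal multiplicities
D = [np.diag(1.0 / mult[i_s]) for i_s in idx]

# Partition-of-unity property: sum_s B^T D^s B = I
P1 = np.zeros((3, 3))
for D_s, i_s in zip(D, idx):
    P1[np.ix_(i_s, i_s)] += D_s
assert np.allclose(P1, np.eye(3))
```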
The subdomain-level remaining unknown vector U^s_r can be eliminated in parallel from Eq. (15) as

\[ \mathcal{U}^s_r = [\mathcal{S}^s_{rr}]^{-1}(\mathcal{F}^s_r - \mathcal{S}^s_{rc}B^s_c\, \mathcal{U}_c). \qquad (16) \]
Substituting U^s_r into the second block of Eq. (15) leads to the following coarse problem:

\[ \mathcal{F}^{*}_{cc}\, \mathcal{U}_c = d^{*}_c, \qquad (17) \]

where

\[ \mathcal{F}^{*}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left( \mathcal{S}^s_{cc} - \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc} \right) B^s_c, \qquad d^{*}_c = -\sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{F}^s_r. \]

The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

\[ \mathcal{U}_r = \sum_{s=1}^{n_s} B^{sT}_r D^s_r\, \mathcal{U}^s_r. \]

After some algebraic manipulation, the preconditioner can be expressed as

\[ \mathcal{M}^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [\mathcal{S}^s_{rr}]^{-1} D^s_r B^s_r + \mathcal{R}^T_0 [\mathcal{F}^{*}_{cc}]^{-1} \mathcal{R}_0, \qquad (18) \]

where

\[ \mathcal{R}_0 = \sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1} D^s_r B^s_r. \]
7 A dual-primal iterative substructuring method of SPDEs

In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. The continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.

Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation:

\[
\begin{bmatrix}
\mathcal{A}^s_{ii} & \mathcal{A}^s_{ir} & \mathcal{A}^s_{ic}B^s_c \\
\mathcal{A}^s_{ri} & \mathcal{A}^s_{rr} & \mathcal{A}^s_{rc}B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cc}B^s_c
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_i \\ \mathcal{U}^s_r \\ \mathcal{U}_c \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}^s_i \\ \mathcal{F}^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c \mathcal{F}^s_c \end{Bmatrix}, \qquad (19)
\]

where B^s_c is a Boolean restriction operator that maps the global corner vector U_c to the local corner vector U^s_c as

\[ \mathcal{U}^s_c = B^s_c\, \mathcal{U}_c. \]
Eq. (19) can be rewritten in compact form as

\[ \mathcal{A}^s\, \mathcal{U}^s = \mathcal{F}^s. \qquad (20) \]

Let B^s_r be a block-diagonal signed Boolean continuity matrix defined such that

\[ \sum_{s=1}^{n_s} B^s_r\, \mathcal{U}^s_r = 0. \]

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

\[ \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} \rightarrow \min \quad \text{subject to} \quad \mathcal{B}\, \mathcal{U} = 0, \qquad (21) \]

where

\[ \mathcal{A} = \mathrm{blockdiag}(\mathcal{A}^1, \ldots, \mathcal{A}^s, \ldots, \mathcal{A}^{n_s}), \qquad \mathcal{U} = (\mathcal{U}^1, \ldots, \mathcal{U}^s, \ldots, \mathcal{U}^{n_s})^T, \qquad \mathcal{F} = (\mathcal{F}^1, \ldots, \mathcal{F}^s, \ldots, \mathcal{F}^{n_s})^T, \]

\[ \mathcal{B} = \left[ (0\;\; B^1_r\;\; 0)\; \cdots\; (0\;\; B^s_r\;\; 0)\; \cdots\; (0\;\; B^{n_s}_r\;\; 0) \right]. \]
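A toy example of the signed Boolean operator (hypothetical two subdomains sharing one remaining node): when the two local copies of the shared unknown agree, the signed sum, i.e. the interface jump B U, vanishes.

```python
import numpy as np

# Two subdomains, each with 2 remaining unknowns; subdomain 1's second
# entry and subdomain 2's first entry are copies of the same physical node.
U1 = np.array([3.0, 5.0])   # subdomain 1: nodes (a, shared)
U2 = np.array([5.0, 7.0])   # subdomain 2: nodes (shared, b)

# Signed Boolean matrices: one row per continuity constraint
B1 = np.array([[0.0,  1.0]])    # +1 picks subdomain 1's copy
B2 = np.array([[-1.0, 0.0]])    # -1 picks subdomain 2's copy

jump = B1 @ U1 + B2 @ U2        # sum_s B_r^s U_r^s
assert np.allclose(jump, 0.0)   # continuity holds, so the jump vanishes
```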
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle-point formulation of Eq. (21) can be expressed as

\[ L(\mathcal{U}, \Lambda) = \tfrac{1}{2}\, \mathcal{U}^T \mathcal{A}\, \mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda. \qquad (22) \]

Minimizing Eq. (22) with respect to U and Λ leads to the following equilibrium system:

\[
\begin{bmatrix}
\mathcal{A}^s_{ii} & \mathcal{A}^s_{ir} & \mathcal{A}^s_{ic}B^s_c & 0 \\
\mathcal{A}^s_{ri} & \mathcal{A}^s_{rr} & \mathcal{A}^s_{rc}B^s_c & B^{sT}_r \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cc}B^s_c & 0 \\
0 & \sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_i \\ \mathcal{U}^s_r \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{F}^s_i \\ \mathcal{F}^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c \mathcal{F}^s_c \\ 0 \end{Bmatrix}, \qquad (23)
\]

where

\[ \Lambda = (\lambda_0, \ldots, \lambda_N)^T, \]
and λ_j is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. (5).

Figure 5. The Lagrange multipliers are the forces required to connect the torn interface boundary.

Eliminating the interior unknowns U^s_i from Eq. (23) gives

\[ \mathcal{U}^s_i = [\mathcal{A}^s_{ii}]^{-1}(\mathcal{F}^s_i - \mathcal{A}^s_{ir}\mathcal{U}^s_r - \mathcal{A}^s_{ic}B^s_c\, \mathcal{U}_c). \qquad (24) \]

Substituting Eq. (24) into Eq. (23) leads to

\[
\begin{bmatrix}
\mathcal{S}^s_{rr} & \mathcal{S}^s_{rc}B^s_c & B^{sT}_r \\
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cc}B^s_c & 0 \\
\sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} \mathcal{U}^s_r \\ \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} \mathcal{G}^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c \mathcal{G}^s_c \\ 0 \end{Bmatrix}, \qquad (25)
\]

where

\[ \mathcal{S}^s_{\alpha\beta} = \mathcal{A}^s_{\alpha\beta} - \mathcal{A}^s_{\alpha i}[\mathcal{A}^s_{ii}]^{-1}\mathcal{A}^s_{i\beta}, \qquad \mathcal{G}^s_\alpha = \mathcal{F}^s_\alpha - \mathcal{A}^s_{\alpha i}[\mathcal{A}^s_{ii}]^{-1}\mathcal{F}^s_i. \]

The subdomain-level remaining unknown vector U^s_r can be obtained in parallel from Eq. (25) as

\[ \mathcal{U}^s_r = [\mathcal{S}^s_{rr}]^{-1}(\mathcal{G}^s_r - \mathcal{S}^s_{rc}B^s_c\, \mathcal{U}_c - B^{sT}_r \Lambda). \qquad (26) \]

Substituting Eq. (26) into Eq. (25) leads to

\[
\begin{bmatrix} \mathcal{F}_{cc} & -\mathcal{F}_{cr} \\ \mathcal{F}_{rc} & \mathcal{F}_{rr} \end{bmatrix}
\begin{Bmatrix} \mathcal{U}_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix}, \qquad (27)
\]

where
\[ \mathcal{F}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left( \mathcal{S}^s_{cc} - \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc} \right) B^s_c, \qquad \mathcal{F}_{cr} = \sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}B^{sT}_r, \]

\[ \mathcal{F}_{rc} = \sum_{s=1}^{n_s} B^s_r [\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc}B^s_c, \qquad \mathcal{F}_{rr} = \sum_{s=1}^{n_s} B^s_r [\mathcal{S}^s_{rr}]^{-1}B^{sT}_r, \]

\[ d_c = \sum_{s=1}^{n_s} B^{sT}_c \left( \mathcal{G}^s_c - \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{G}^s_r \right), \qquad d_r = \sum_{s=1}^{n_s} B^s_r [\mathcal{S}^s_{rr}]^{-1}\mathcal{G}^s_r. \]

Solving for U_c from Eq. (27) gives the following coarse problem:

\[ \mathcal{F}_{cc}\, \mathcal{U}_c = d_c + \mathcal{F}_{cr}\Lambda. \qquad (28) \]

Substituting U_c into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system:

\[ (\mathcal{F}_{rr} + \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1}\mathcal{F}_{cr})\, \Lambda = d_r - \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1}d_c. \qquad (29) \]

Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as

\[ \mathcal{M} = \sum_{s=1}^{n_s} B^s_r D^s_r \mathcal{S}^s_{rr} D^s_r B^{sT}_r. \qquad (30) \]
8 Connection between the methods

The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are identical:

\[ \mathcal{F}^{*}_{cc} = \mathcal{F}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left( \mathcal{S}^s_{cc} - \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc} \right) B^s_c. \]

Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as

\[
\mathcal{M}^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [\mathcal{S}^s_{rr}]^{-1} D^s_r B^s_r
+ \sum_{s=1}^{n_s} B^{sT}_r D^s_r [\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc}B^s_c
\left[ \sum_{s=1}^{n_s} B^{sT}_c \left( \mathcal{S}^s_{cc} - \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1}\mathcal{S}^s_{rc} \right) B^s_c \right]^{-1}
\sum_{s=1}^{n_s} B^{sT}_c \mathcal{S}^s_{cr}[\mathcal{S}^s_{rr}]^{-1} D^s_r B^s_r,
\]

which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation

In this section we outline the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.

9.1 Primal method

In this subsection we briefly describe the parallel implementation of Algorithm 1 to solve the primal interface problem in Eq. (14). For the j-th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

\[ Q_j = (\mathcal{F}_{rr} - \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1}\mathcal{F}_{cr})\, P_j, \]

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1. Input: P
2. Scatter: P^s = B^s_r P
3. Compute: v^s_1 = S^s_{cr} P^s
4. Gather: V_1 = Σ_{s=1}^{n_s} B^{sT}_c v^s_1
5. Global solve: F_cc V_2 = V_1
6. Scatter: v^s_2 = B^s_c V_2
7. Compute: v^s_3 = S^s_{rc} v^s_2
8. Update: Q^s = S^s_{rr} P^s − v^s_3
9. Gather: Q = Σ_{s=1}^{n_s} B^{sT}_r Q^s
10. Output: Q
Multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem:

\[ v^s_\alpha = \mathcal{S}^s_{\alpha\beta} v^s_\beta = \left( \mathcal{A}^s_{\alpha\beta} - \mathcal{A}^s_{\alpha i}[\mathcal{A}^s_{ii}]^{-1}\mathcal{A}^s_{i\beta} \right) v^s_\beta. \]

This procedure is outlined in the following algorithm.

Algorithm 5 Dirichlet Solver Procedure
1. Input: v^s_β
2. Compute: v^s_1 = A^s_{iβ} v^s_β
3. Solve: A^s_{ii} v^s_2 = v^s_1
4. Compute: v^s_3 = A^s_{αi} v^s_2
5. Compute: v^s_4 = A^s_{αβ} v^s_β
6. Compute: v^s_α = v^s_4 − v^s_3
7. Output: v^s_α
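A NumPy sketch of Algorithm 5: the subdomain Schur complement S^s_{αβ} is applied through one interior solve and is never formed (generic block names and arbitrary sizes, purely for illustration).

```python
import numpy as np

def dirichlet_apply(A_ii, A_ib, A_ai, A_ab, v_b):
    """Algorithm 5: apply S_ab = A_ab - A_ai A_ii^{-1} A_ib to v_b
    via one interior (Dirichlet) solve, without forming S_ab."""
    v1 = A_ib @ v_b                   # step 2
    v2 = np.linalg.solve(A_ii, v1)    # step 3: interior solve
    v3 = A_ai @ v2                    # step 4
    v4 = A_ab @ v_b                   # step 5
    return v4 - v3                    # step 6
```

In practice the interior solve in step 3 would reuse a precomputed factorization of A_ii, since it is invoked at every PCGM iteration.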
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner:

\[ \mathcal{M}^{-1}_{cc} \mathcal{F}_{cc} V_2 = \mathcal{M}^{-1}_{cc} V_1, \qquad \text{where} \quad \mathcal{M}^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cc} B^s_c. \]

Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: F^s_r = D^s_r B^s_r r_Γ
3. Local solve: S^s_{rr} v^s_1 = F^s_r
4. Compute: d^s_c = S^s_{cr} v^s_1
5. Gather: d_c = Σ_{s=1}^{n_s} B^{sT}_c d^s_c
6. Global solve: F*_{cc} Z_c = −d_c
7. Scatter: Z^s_c = B^s_c Z_c
8. Update: v^s_2 = F^s_r + S^s_{rc} Z^s_c
9. Local solve: S^s_{rr} Z^s_f = v^s_2
10. Gather: Z = Σ_{s=1}^{n_s} B^{sT}_r D^s_r Z^s_f
11. Output: Z
The local solves in steps 3 and 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form S^s_{rr} U^s_r = r^s_r, which can be solved using the following algorithm.

Algorithm 7 Neumann Solver Procedure
1. Input: r^s_r
2. Solve:
\[ \begin{bmatrix} \mathcal{A}^s_{ii} & \mathcal{A}^s_{ir} \\ \mathcal{A}^s_{ri} & \mathcal{A}^s_{rr} \end{bmatrix} \begin{Bmatrix} \mathcal{X}^s \\ \mathcal{U}^s_r \end{Bmatrix} = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix} \]
3. Output: U^s_r
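A NumPy sketch of Algorithm 7: one solve with the assembled subdomain (Neumann) matrix yields S_rr⁻¹ r_r, so the inverse Schur complement is applied without ever forming S_rr.

```python
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Algorithm 7: U_r = S_rr^{-1} r_r from one solve with the full
    subdomain matrix; the interior part X of the solution is discarded."""
    n_i = A_ii.shape[0]
    A = np.block([[A_ii, A_ir], [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])
    return np.linalg.solve(A, rhs)[n_i:]
```

Eliminating the interior rows shows why this works: X = −A_ii⁻¹ A_ir U_r, so the remaining rows reduce to (A_rr − A_ri A_ii⁻¹ A_ir) U_r = r_r, i.e. S_rr U_r = r_r.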
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

\[ \mathcal{M}^{-1}_{cc} \mathcal{F}^{*}_{cc} Z_c = -\mathcal{M}^{-1}_{cc} d_c, \qquad \text{where} \quad \mathcal{M}^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \mathcal{A}^s_{cc} B^s_c. \]

Finally, the parallel implementation of the PCGM to solve the primal interface problem is summarized in the following flowchart.

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
9.2 Dual-primal method

In this subsection we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem, Eq. (29). For the j-th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

\[ Q_j = (\mathcal{F}_{rr} + \mathcal{F}_{rc}[\mathcal{F}_{cc}]^{-1}\mathcal{F}_{cr})\, P_j, \]

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input: P
2. Scatter: P^s = B^{sT}_r P
3. Local solve: S^s_{rr} v^s_1 = P^s
4. Compute: v^s_2 = S^s_{cr} v^s_1
5. Gather: V_2 = Σ_{s=1}^{n_s} B^{sT}_c v^s_2
6. Global solve: F_cc V_3 = V_2
7. Scatter: v^s_3 = B^s_c V_3
8. Compute: v^s_4 = S^s_{rc} v^s_3
9. Update: v^s_5 = P^s + v^s_4
10. Local solve: S^s_{rr} Q^s = v^s_5
11. Gather: Q = Σ_{s=1}^{n_s} B^s_r Q^s
12. Output: Q
The local solves in steps 3 and 10 of Algorithm 8 are performed by solving a subdomain-level Neumann problem as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.

Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r^s_Γ = D^s_r B^{sT}_r r_Γ
3. Compute: Z^s = S^s_{rr} r^s_Γ
4. Gather: Z = Σ_{s=1}^{n_s} B^s_r D^s_r Z^s
5. Output: Z

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flowchart.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10 Numerical results

For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

\[ \frac{\partial}{\partial x}\left[ c_x(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial x} \right] + \frac{\partial}{\partial y}\left[ c_y(x,y,\theta)\, \frac{\partial u(x,y,\theta)}{\partial y} \right] = f(x,y) \quad \text{in } \Omega, \]

where the forcing term is f(x, y) = 10. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

\[ u(x,y,\theta) = 0 \quad \text{on } \partial\Omega. \]

The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.

In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the L2 norms of the current and initial residuals drops below 10⁻⁵:

\[ \frac{\| \mathcal{G}_\Gamma - \mathcal{S}\, \mathcal{U}^k_\Gamma \|_2}{\| \mathcal{G}^0_\Gamma \|_2} \leq 10^{-5}. \]

Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph-partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig. (8) shows a typical finite element mesh, while Fig. (9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. (10) and Fig. (11), respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. (12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. (18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficient u0. Figure 13. Chaos coefficient u1.
Figure 14. Chaos coefficient u2. Figure 15. Chaos coefficient u3.
Figure 16. Chaos coefficient u4. Figure 17. Chaos coefficient u5.
Figure 18. Lagrange multipliers λ0. Figure 19. Lagrange multipliers λ1.

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, a first-order Polynomial Chaos expansion leads to a total problem size of 3n, and a third-order expansion leads to a total problem size of 10n.
Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
10051           10, 10, 10              8, 8, 8
20303           11, 11, 11              8, 8, 8
40811           11, 12, 12              8, 9, 9
59935           13, 14, 14              10, 10, 10
71386           12, 12, 12              9, 9, 9
80172           11, 11, 12              8, 8, 8
101851          12, 12, 12              9, 9, 9

Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains; the results are presented in Table 2 for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of the two methods is almost the same, as indicated in Figs. (24-26).
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dofs.

CPUs    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
20      10, 11, 11              8, 8, 8
40      12, 12, 12              9, 9, 9
60      12, 13, 13              9, 9, 9
80      12, 12, 13              9, 9, 9
100     12, 12, 12              9, 9, 9
120     12, 12, 12              9, 9, 9
140     11, 11, 12              8, 8, 8
160     12, 12, 12              8, 8, 9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and dual-primal methods for first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dofs).

Subdomains    PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
100           10, 10, 10              8, 8, 8
200           10, 10, 11              8, 8, 8
400           12, 13, 13              9, 9, 9
600           11, 12, 12              8, 8, 9
800           12, 13, 13              9, 9, 9
Fourthly, we study the performance of the primal and dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.

Finally, it is worth mentioning that the performance of the primal method and the dual-primal method demonstrates a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order (relative residual versus iteration number; P-DDM and DP-DDM curves).
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal method and dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments

The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101,851 dofs) and a fixed number of subdomains (100).

CoV     PDDM (1st, 2nd, 3rd)    DP-DDM (1st, 2nd, 3rd)
0.05    10, 10, 10              8, 8, 8
0.10    10, 10, 10              8, 8, 8
0.15    10, 10, 10              8, 8, 8
0.20    10, 10, 10              8, 8, 8
0.25    10, 10, 10              8, 8, 8
0.30    10, 10, 11              8, 8, 8
0.35    10, 10, 11              8, 8, 8
0.40    10, 11, 11              8, 8, 9
0.45    10, 11, 12              8, 8, 9
0.50    10, 11, 12              8, 8, 9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
4.1 Preconditioned Conjugate Gradient Method (PCGM)
Non-overlapping domain decomposition, or iterative substructuring, can be viewed as a preconditioned iterative method to solve the Schur complement system of the form [11]
$$S\,U_\Gamma = G_\Gamma$$
For a symmetric positive-definite system, such as the Schur complement system, the Conjugate Gradient Method (CGM) is generally used. The performance of CGM mainly depends on the spectrum of the coefficient matrix. The rate of convergence of the iterative method can generally be improved by transforming the original system into an equivalent system whose coefficient matrix has better spectral properties (i.e. a lower condition number $\kappa(S)$). This transformation is called preconditioning, and the matrix used in the transformation is called the preconditioner. In other words, the transformed linear system becomes
$$M^{-1} S\,U_\Gamma = M^{-1} G_\Gamma$$
In general, $\kappa(M^{-1}S)$ is much smaller than $\kappa(S)$, and the eigenvalues of $M^{-1}S$ are clustered near one. This procedure is known as the Preconditioned Conjugate Gradient Method (PCGM). In practice, the explicit construction of $M^{-1}$ is not needed. Instead, for a given vector $r_\Gamma$, a system of the following form is solved
$$M Z = r_\Gamma$$
The PCGM algorithm to solve the Schur complement system proceeds as follows [10]
Algorithm 1: The PCGM Algorithm
1. Initialize: $U_\Gamma^0 = 0$
2. Compute: $r_\Gamma^0 = G_\Gamma - S\,U_\Gamma^0$
3. Precondition: $Z_0 = M^{-1} r_\Gamma^0$
4. First search direction: $P_0 = Z_0$
5. Initialize: $\rho_0 = (r_\Gamma^0, Z_0)$
6. For $j = 0, 1, \cdots$ until convergence Do
7.   $Q_j = S P_j$
8.   $\rho_{tmp_j} = (Q_j, P_j)$
9.   $\alpha_j = \rho_j / \rho_{tmp_j}$
10.  $U_\Gamma^{j+1} = U_\Gamma^j + \alpha_j P_j$
11.  $r_\Gamma^{j+1} = r_\Gamma^j - \alpha_j Q_j$
12.  $Z_{j+1} = M^{-1} r_\Gamma^{j+1}$
13.  $\rho_{j+1} = (r_\Gamma^{j+1}, Z_{j+1})$
14.  $\beta_j = \rho_{j+1} / \rho_j$
15.  $P_{j+1} = Z_{j+1} + \beta_j P_j$
16. EndDo
The PCGM algorithm indicates that the main arithmetic operations are calculating the product $Q = SP$ in step 7 and the preconditioned residual $Z = M^{-1}r_\Gamma$ in step 12. These operations can be performed in parallel, as outlined next.
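The steps of Algorithm (1) can be sketched as a short serial Python routine. This is a minimal illustration, not the authors' implementation: `apply_S` and `apply_M_inv` are assumed user-supplied callables, so the same loop works whether the operator and preconditioner products are evaluated serially or by the parallel procedures described in this paper.

```python
import numpy as np

def pcgm(apply_S, G, apply_M_inv, tol=1e-5, max_iter=200):
    """Preconditioned Conjugate Gradient Method (a sketch of Algorithm 1).

    apply_S     : callable returning the matrix-vector product S @ p
    G           : right-hand-side vector G_Gamma
    apply_M_inv : callable applying the preconditioner M^{-1} to a vector
    """
    u = np.zeros_like(G)                 # step 1: U_Gamma^0 = 0
    r = G - apply_S(u)                   # step 2: initial residual
    z = apply_M_inv(r)                   # step 3: preconditioned residual
    p = z.copy()                         # step 4: first search direction
    rho = r @ z                          # step 5
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):            # step 6
        q = apply_S(p)                   # step 7
        alpha = rho / (q @ p)            # steps 8-9
        u += alpha * p                   # step 10: solution update
        r -= alpha * q                   # step 11: residual update
        if np.linalg.norm(r) / r0_norm < tol:
            break
        z = apply_M_inv(r)               # step 12: preconditioner effect
        rho_new = r @ z                  # step 13
        beta = rho_new / rho             # step 14
        p = z + beta * p                 # step 15: new search direction
        rho = rho_new
    return u
```

For example, with a small SPD matrix and a Jacobi (diagonal) preconditioner standing in for $M^{-1}$, the routine reproduces the direct solution.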
High Performance Computing Symposium (HPCS2010) IOP Publishing
Journal of Physics: Conference Series 256 (2010) 012001 doi:10.1088/1742-6596/256/1/012001
Given the subdomain Schur complement matrices $S^s$ and a global vector $P$, the matrix-vector product $Q = SP$ can be calculated in parallel as
$$Q = \sum_{s=1}^{n_s} R_s^T S^s R_s P$$
where $n_s$ is the number of subdomains, and $R_s$ and $R_s^T$ are the scatter and gather operators, respectively. The parallel implementation of this procedure is summarized in Algorithm (2).
Algorithm 2 Parallel Matrix-Vector Product Procedure
1. Input: $P$
2. Scatter: $P^s = R_s P$
3. Local operation: $Q^s = S^s P^s$
4. Gather: $Q = \sum_{s=1}^{n_s} R_s^T Q^s$
5. Output: $Q$
The working vectors $P^s$ and $Q^s$ are defined at the subdomain level.
Similarly, the effect of a parallel preconditioner on a residual vector, $Z = M^{-1} r_\Gamma$, can be computed as
$$Z = \sum_{s=1}^{n_s} R_s^T M_s^{-1} R_s r_\Gamma$$
This procedure is outlined in the following algorithm.
Algorithm 3 Parallel Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r_\Gamma^s = R_s r_\Gamma$
3. Local Solve: $M_s Z^s = r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} R_s^T Z^s$
5. Output: $Z$
The local preconditioner $M_s$ and the working vectors $r_\Gamma^s$ and $Z^s$ are defined at the subdomain level.
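As an illustration of the scatter/gather structure of Algorithms (2) and (3), the restriction operators $R_s$ can be mimicked serially with index arrays (scatter = fancy indexing, gather = unbuffered scatter-add). The two-subdomain setup below, with one shared interface node, is hypothetical; an MPI implementation would replace the loop with concurrent local products and a reduction.

```python
import numpy as np

# Hypothetical example: a 5-node interface split between two subdomains
# that share node 2. Each index array encodes one restriction operator R_s.
idx = [np.array([0, 1, 2]), np.array([2, 3, 4])]

rng = np.random.default_rng(1)
S_loc = []
for ix in idx:  # random SPD stand-ins for the subdomain Schur complements S^s
    T = rng.standard_normal((len(ix), len(ix)))
    S_loc.append(T @ T.T + len(ix) * np.eye(len(ix)))

def matvec(P):
    """Q = sum_s R_s^T S^s R_s P  (Algorithm 2, serial stand-in for MPI)."""
    Q = np.zeros_like(P)
    for ix, Ss in zip(idx, S_loc):
        Ps = P[ix]                # scatter: P^s = R_s P
        Qs = Ss @ Ps              # local operation: Q^s = S^s P^s
        np.add.at(Q, ix, Qs)      # gather: Q += R_s^T Q^s
    return Q

# The assembled S can be formed explicitly to check the implicit product:
S = np.zeros((5, 5))
for ix, Ss in zip(idx, S_loc):
    S[np.ix_(ix, ix)] += Ss
```

The implicit product `matvec(P)` agrees with the explicitly assembled `S @ P`, which is the point of the subdomain-wise implementation: the global matrix never needs to be formed.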
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information
globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u_i^s(\theta)$, corner unknowns $u_c^s(\theta)$ and remaining unknowns $u_r^s(\theta)$, as schematically shown in Fig. (2).
Figure 2. Partitioning of the domain nodes into interior, remaining and corner (•) nodes.
According to this partitioning scheme, the subdomain equilibrium equation can be represented as
$$\begin{bmatrix} A_{ii}^s(\theta) & A_{ir}^s(\theta) & A_{ic}^s(\theta) \\ A_{ri}^s(\theta) & A_{rr}^s(\theta) & A_{rc}^s(\theta) \\ A_{ci}^s(\theta) & A_{cr}^s(\theta) & A_{cc}^s(\theta) \end{bmatrix} \begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix}$$
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation
$$\sum_{l=0}^{L} \Psi_l(\theta) \begin{bmatrix} A_{ii,l}^s & A_{ir,l}^s & A_{ic,l}^s \\ A_{ri,l}^s & A_{rr,l}^s & A_{rc,l}^s \\ A_{ci,l}^s & A_{cr,l}^s & A_{cc,l}^s \end{bmatrix} \begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \begin{Bmatrix} f_i^s \\ f_r^s \\ f_c^s \end{Bmatrix} \qquad (6)$$
The solution process is expressed using the same Polynomial Chaos basis as
$$\begin{Bmatrix} u_i^s(\theta) \\ u_r^s(\theta) \\ u_c^s(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u_{i,j}^s \\ u_{r,j}^s \\ u_{c,j}^s \end{Bmatrix} \qquad (7)$$
Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic systems of equations
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s & A_{ic}^s \\ A_{ri}^s & A_{rr}^s & A_{rc}^s \\ A_{ci}^s & A_{cr}^s & A_{cc}^s \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r^s \\ U_c^s \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ F_r^s \\ F_c^s \end{Bmatrix} \qquad (8)$$
where
$$[A_{\alpha\beta}^s]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A_{\alpha\beta,l}^s$$
$$[F_\alpha^s]_k = \langle \Psi_k f_\alpha^s \rangle$$
$$U_\alpha^s = (u_{\alpha,0}^s, \cdots, u_{\alpha,N}^s)^T$$
and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.
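As a small illustration of the Galerkin coefficients $\langle \Psi_l \Psi_j \Psi_k \rangle$, the following sketch evaluates them for one-dimensional probabilists' Hermite polynomials by Gauss-Hermite quadrature. The one-dimensional basis and the quadrature degree are assumptions made for illustration; the paper's basis is the multidimensional Polynomial Chaos of [9].

```python
import numpy as np
from numpy.polynomial import hermite_e as He

def hermite_triple_products(N, quad_deg=20):
    """<Psi_l Psi_j Psi_k> for 1D probabilists' Hermite polynomials.

    Expectations are taken against the standard Gaussian density; the
    Gauss-Hermite weights sum to sqrt(2*pi), hence the normalisation.
    """
    x, w = He.hermegauss(quad_deg)
    w = w / np.sqrt(2.0 * np.pi)  # normalise to a probability measure
    # Psi[j, q] = He_j(x_q): evaluate each basis polynomial at the nodes
    Psi = np.array([He.hermeval(x, np.eye(N + 1)[j]) for j in range(N + 1)])
    # c[l, j, k] = sum_q w_q * He_l(x_q) He_j(x_q) He_k(x_q)
    return np.einsum('lq,jq,kq,q->ljk', Psi, Psi, Psi, w)

c = hermite_triple_products(3)
```

These coefficients are sparse (most triples vanish), which is what makes the coupled system in Eq. (8) block-sparse in the stochastic dimension; for instance $\langle \Psi_1 \Psi_1 \Psi_2 \rangle = 2$ while $\langle \Psi_1 \Psi_1 \Psi_3 \rangle = 0$.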
Enforcing the transmission conditions along the interface boundary, the subdomain equilibrium equation can be expressed as
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s B_r^s & A_{ic}^s B_c^s \\ \sum_{s=1}^{n_s} B_r^{sT} A_{ri}^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} A_{rc}^s B_c^s \\ \sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ \sum_{s=1}^{n_s} B_r^{sT} F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix} \qquad (9)$$
where $B_r^s$ is a Boolean rectangular matrix that maps the global remaining vector $U_r$ to the local remaining vector $U_r^s$ as
$$U_r^s = B_r^s U_r \qquad (10)$$
Similarly, the restriction operator $B_c^s$ is a Boolean rectangular matrix that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as
$$U_c^s = B_c^s U_c \qquad (11)$$
In the parallel implementation, both $B_r^s$ and $B_c^s$ act as scatter operators, while $B_r^{sT}$ and $B_c^{sT}$ act as gather operators.
The first block equation in Eq. (9) can be solved for $U_i^s$ in parallel as
$$U_i^s = [A_{ii}^s]^{-1}(F_i^s - A_{ir}^s B_r^s U_r - A_{ic}^s B_c^s U_c) \qquad (12)$$
Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5), but now with the interface boundary nodes split into remaining and corner nodes, as shown schematically in Fig. (3)
$$\begin{bmatrix} \sum_{s=1}^{n_s} B_r^{sT} S_{rr}^s B_r^s & \sum_{s=1}^{n_s} B_r^{sT} S_{rc}^s B_c^s \\ \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} \sum_{s=1}^{n_s} B_r^{sT} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \end{Bmatrix} \qquad (13)$$
where
$$S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s$$
$$G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s$$
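The static condensation defining $S_{\alpha\beta}^s$ and $G_\alpha^s$ can be checked on a small random SPD matrix. This is a toy sketch (block sizes, variable names, and the single generic interface block are all illustrative); it verifies that solving the condensed interface system and back-substituting via Eq. (12) reproduces the solution of the full system.

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, n_g = 4, 3                                   # interior / interface sizes (toy)
T = rng.standard_normal((n_i + n_g, n_i + n_g))
A = T @ T.T + (n_i + n_g) * np.eye(n_i + n_g)     # SPD subdomain stiffness matrix
F = rng.standard_normal(n_i + n_g)

A_ii, A_ig = A[:n_i, :n_i], A[:n_i, n_i:]
A_gi, A_gg = A[n_i:, :n_i], A[n_i:, n_i:]
F_i, F_g = F[:n_i], F[n_i:]

# Schur complement and condensed right-hand side (the 'where' block above):
S = A_gg - A_gi @ np.linalg.solve(A_ii, A_ig)
G = F_g - A_gi @ np.linalg.solve(A_ii, F_i)

U_g = np.linalg.solve(S, G)                       # interface solve
U_i = np.linalg.solve(A_ii, F_i - A_ig @ U_g)     # back-substitution, cf. Eq. (12)
```

In the actual method the interior solves $[A_{ii}^s]^{-1}$ are performed independently on each subdomain, which is what makes the condensation embarrassingly parallel.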
Figure 3. The interface boundary nodes split into remaining and corner (•) nodes.
The corner nodal vector $U_c$ in Eq. (13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem
$$(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\,U_r = d_r - F_{rc}[F_{cc}]^{-1} d_c \qquad (14)$$
where
$$F_{\alpha\beta} = \sum_{s=1}^{n_s} B_\alpha^{sT} S_{\alpha\beta}^s B_\beta^s, \qquad d_\alpha = \sum_{s=1}^{n_s} B_\alpha^{sT} G_\alpha^s$$
and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $M^{-1}$, defined in the next section.
6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig. (4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system
Figure 4. Partial assembly of the corner nodes (•).
$$\begin{bmatrix} S_{rr}^s & S_{rc}^s B_c^s \\ \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_r^s \\ U_c \end{Bmatrix} = \begin{Bmatrix} F_r^s \\ 0 \end{Bmatrix} \qquad (15)$$
where
$$F_r^s = D_r^s B_r^s r_j$$
and $r_j$ is the residual at the $j$th iteration of the PCGM, and $D_r^s$ represents a block diagonal weighting matrix which satisfies the following property:
$$\sum_{s=1}^{n_s} B_r^{sT} D_r^s B_r^s = I$$
The diagonal entries of each block of $D_r^s$ are the reciprocal of the number of subdomains that share the corresponding interface boundary node.
The subdomain-level remaining unknown vector $U_r^s$ can be eliminated in parallel from Eq. (15) as
$$U_r^s = [S_{rr}^s]^{-1}(F_r^s - S_{rc}^s B_c^s U_c) \qquad (16)$$
Substituting $U_r^s$ into the second block of Eq. (15) leads to the following coarse problem
$$F_{cc}^* U_c = d_c^* \qquad (17)$$
where
$$F_{cc}^* = \sum_{s=1}^{n_s} B_c^{sT}(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s)B_c^s$$
$$d_c^* = -\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} F_r^s$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as
$$U_r = \sum_{s=1}^{n_s} B_r^{sT} D_r^s U_r^s$$
After some algebraic manipulations, the preconditioner can be expressed as
$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + R_0^T [F_{cc}^*]^{-1} R_0 \qquad (18)$$
where
$$R_0 = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s$$
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] for SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s \\ A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s \\ \sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix} \qquad (19)$$
where $B_c^s$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as
$$U_c^s = B_c^s U_c$$
Eq. (19) can be rewritten in compact form as
$$A^s U^s = F^s \qquad (20)$$
Let $B_r^s$ be a block diagonal signed Boolean continuity matrix defined such that
$$\sum_{s=1}^{n_s} B_r^s U_r^s = 0$$
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:
$$\frac{1}{2} U^T A U - U^T F \rightarrow \min \qquad (21)$$
subject to
$$B\,U = 0$$
where
$$A = \begin{bmatrix} A^1 & & \\ & \ddots & \\ & & A^{n_s} \end{bmatrix}, \qquad U = \begin{Bmatrix} U^1 \\ \vdots \\ U^{n_s} \end{Bmatrix}, \qquad F = \begin{Bmatrix} F^1 \\ \vdots \\ F^{n_s} \end{Bmatrix}$$
$$B = \begin{bmatrix} (0 \;\; B_r^1 \;\; 0) & \cdots & (0 \;\; B_r^s \;\; 0) & \cdots & (0 \;\; B_r^{n_s} \;\; 0) \end{bmatrix}$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as
$$\mathcal{L}(U, \Lambda) = \frac{1}{2} U^T A U - U^T F + U^T B^T \Lambda \qquad (22)$$
Minimizing Eq. (22) with respect to $U$ and $\Lambda$ leads to the following equilibrium system
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\ A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & B_r^{sT} \\ \sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\ 0 & \sum_{s=1}^{n_s} B_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0 \end{Bmatrix} \qquad (23)$$
where
$$\Lambda = \begin{Bmatrix} \lambda_0 \\ \vdots \\ \lambda_N \end{Bmatrix}$$
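As a sanity check on this saddle-point structure, a dense toy version of the KKT system obtained from $\partial\mathcal{L}/\partial U = 0$ and $\partial\mathcal{L}/\partial\Lambda = 0$ can be solved directly. The sizes, the random SPD $A$, and the dense $B$ below are illustrative stand-ins (in the actual method $A$ is block diagonal over subdomains and $B$ is a signed Boolean jump operator); the point is that the computed $U$ satisfies the compatibility constraint exactly.

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 6, 2
T = rng.standard_normal((n, n))
A = T @ T.T + n * np.eye(n)          # SPD stand-in for the block-diagonal A
F = rng.standard_normal(n)
B = rng.standard_normal((m, n))      # stand-in for the signed Boolean operator B

# KKT system from minimizing L(U, Lambda):
#   [ A  B^T ] [ U      ]   [ F ]
#   [ B   0  ] [ Lambda ] = [ 0 ]
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
rhs = np.concatenate([F, np.zeros(m)])
sol = np.linalg.solve(K, rhs)
U, Lam = sol[:n], sol[n:]
```

The multiplier $\Lambda$ plays exactly the role described below Eq. (23): it supplies the forces needed to enforce $BU = 0$, so the stationarity condition $AU + B^T\Lambda = F$ holds together with the constraint.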
Figure 5. The Lagrange multipliers are the forces required to connect the torn interface boundary.
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. (5).
Eliminating the interior unknowns $U_i^s$ from Eq. (23) gives
$$U_i^s = [A_{ii}^s]^{-1}(F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c) \qquad (24)$$
Substituting Eq. (24) into Eq. (23) leads to
$$\begin{bmatrix} S_{rr}^s & S_{rc}^s B_c^s & B_r^{sT} \\ \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\ \sum_{s=1}^{n_s} B_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} U_r^s \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0 \end{Bmatrix} \qquad (25)$$
where
$$S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s$$
$$G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s$$
The subdomain-level remaining unknown vector $U_r^s$ can be obtained in parallel from Eq. (25) as
$$U_r^s = [S_{rr}^s]^{-1}(G_r^s - S_{rc}^s B_c^s U_c - B_r^{sT} \Lambda) \qquad (26)$$
Substituting Eq. (26) into Eq. (25) leads to
$$\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix} \begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} d_c \\ d_r \end{Bmatrix} \qquad (27)$$
where
$$F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s)B_c^s$$
$$F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} B_r^{sT}$$
$$F_{rc} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s$$
$$F_{rr} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} B_r^{sT}$$
$$d_c = \sum_{s=1}^{n_s} B_c^{sT}(G_c^s - S_{cr}^s[S_{rr}^s]^{-1}G_r^s)$$
$$d_r = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} G_r^s$$
Solving for $U_c$ from Eq. (27) gives the following coarse problem
$$F_{cc} U_c = d_c + F_{cr} \Lambda \qquad (28)$$
Substituting $U_c$ into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system
$$(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,\Lambda = d_r - F_{rc}[F_{cc}]^{-1} d_c \qquad (29)$$
Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as
$$M = \sum_{s=1}^{n_s} B_r^s D_r^s S_{rr}^s D_r^s B_r^{sT} \qquad (30)$$
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are the same and can be expressed as
$$F_{cc}^* = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s)B_c^s$$
Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as
$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + \left( \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s \right) \left[ \sum_{s=1}^{n_s} B_c^{sT}(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s)B_c^s \right]^{-1} \left( \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s \right)$$
which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation
In this section, we outline the parallel implementation of PCGM to solve the primal, Eq. (14), and dual-primal, Eq. (29), interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
9.1 Primal method
In this subsection, we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$Q_j = (F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\,P_j$$
can be computed using the following algorithm
Algorithm 4 Parallel Matrix-Vector Product for Primal Method
1. Input: $P$
2. Scatter: $P^s = B_r^s P$
3. Compute: $v_1^s = S_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s$
5. Global Solve: $F_{cc} V_2 = V_1$
6. Scatter: $v_2^s = B_c^s V_2$
7. Compute: $v_3^s = S_{rc}^s v_2^s$
8. Update: $Q^s = S_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10. Output: $Q$
The multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as
$$v_\alpha^s = S_{\alpha\beta}^s v_\beta^s$$
$$v_\alpha^s = (A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s)\,v_\beta^s$$
This procedure is outlined in the following algorithm
Algorithm 5 Dirichlet Solver Procedure
1. Input: $v_\beta^s$
2. Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3. Solve: $A_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output: $v_\alpha^s$
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as
$$M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1$$
where
$$M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $F_r^s = D_r^s B_r^s r_\Gamma$
3. Local Solve: $S_{rr}^s v_1^s = F_r^s$
4. Compute: $d_c^s = S_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6. Global Solve: $F_{cc}^* Z_c = -d_c$
7. Scatter: $Z_c^s = B_c^s Z_c$
8. Update: $v_2^s = F_r^s + S_{rc}^s Z_c^s$
9. Local Solve: $S_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11. Output: $Z$
The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7 Neumann-Solver Procedure
1. Input: $r_r^s$
2. Solve: $\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}$
3. Output: $U_r^s$
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as
$$M_{cc}^{-1} F_{cc}^* Z_c = -M_{cc}^{-1} d_c$$
where
$$M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s$$
Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flow chart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq. (29).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$Q_j = (F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,P_j$$
can be computed using the following algorithm
Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method
1. Input: $P$
2. Scatter: $P^s = B_r^{sT} P$
3. Local Solve: $S_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = S_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6. Global Solve: $F_{cc} V_3 = V_2$
7. Scatter: $v_3^s = B_c^s V_3$
8. Compute: $v_4^s = S_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local Solve: $S_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} B_r^s Q^s$
12. Output: $Q$
The local solves in step 3 and step 10 of Algorithm (8) are computed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma$
3. Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s$
5. Output: $Z$
We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flow chart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as
$$\frac{\partial}{\partial x}\left[ c_x(x, y, \theta)\,\frac{\partial u(x, y, \theta)}{\partial x} \right] + \frac{\partial}{\partial y}\left[ c_y(x, y, \theta)\,\frac{\partial u(x, y, \theta)}{\partial y} \right] = f(x, y) \quad \text{in } \Omega$$
where the forcing term is
$$f(x, y) = 10$$
For simplicity, a homogeneous Dirichlet boundary condition is imposed:
$$u(x, y, \theta) = 0 \quad \text{on } \partial\Omega$$
The random coefficients $c_x(x, y, \theta)$ and $c_y(x, y, \theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and the initial residuals is less than $10^{-5}$:
$$\frac{\| G_\Gamma - S U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \leq 10^{-5}$$
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (2 quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. (8) shows a typical finite element mesh, while Fig. (9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. (10) and Fig. (11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. (12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. (18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1
10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size ($n$), using the first order Polynomial Chaos expansion leads to a total problem size of ($3 \times n$), and using the third order Polynomial Chaos expansion leads to
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.

Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
10051          10  10  10           8   8   8
20303          11  11  11           8   8   8
40811          11  12  12           8   9   9
59935          13  14  14           10  10  10
71386          12  12  12           9   9   9
80172          11  11  12           8   8   8
101851         12  12  12           9   9   9

Secondly, we fix the problem size in the spatial domain to 71389 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. (24-26).
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dofs.

CPUs   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
20     10  11  11           8   8   8
40     12  12  12           9   9   9
60     12  13  13           9   9   9
80     12  12  13           9   9   9
100    12  12  12           9   9   9
120    12  12  12           9   9   9
140    11  11  12           8   8   8
160    12  12  12           8   8   9
Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dofs).

Subdomains   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
100          10  10  10           8   8   8
200          10  10  11           8   8   8
400          12  13  13           9   9   9
600          11  12  12           8   8   9
800          12  13  13           9   9   9
more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend, and this fact points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history (relative residual versus iteration number) for the case of 160 subdomains and first PC order, for P-DDM and DP-DDM.
Figure 25. The relative PCGM residual history (relative residual versus iteration number) for the case of 160 subdomains and second PC order, for P-DDM and DP-DDM.
Figure 26. The relative PCGM residual history (relative residual versus iteration number) for the case of 160 subdomains and third PC order, for P-DDM and DP-DDM.
primal method and the dual-primal method are numerically scalable with respect to the problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both the primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

CoV    PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
0.05   10  10  10           8   8   8
0.10   10  10  10           8   8   8
0.15   10  10  10           8   8   8
0.20   10  10  10           8   8   8
0.25   10  10  10           8   8   8
0.30   10  10  11           8   8   8
0.35   10  10  11           8   8   8
0.40   10  11  11           8   8   9
0.45   10  11  12           8   8   9
0.50   10  11  12           8   8   9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
Given the subdomain Schur complement matrices Ss and a global vector P the matrix-vectorproduct Q = SP can be calculated in parallel as
Q =
nssum
s=1
RTs SsRsP
where ns is the number of subdomains and Rs and RTs are scatter and gather operator re-
spectively The parallel implementation of this procedure is summarized in Algorithm (2)
Algorithm 2 Parallel Matrix-Vector Product Procedure
1 Input (P)
2 Scatter Ps = RsP
3 Local operation Qs = SsPs
4 Gather Q =
nssum
s=1
RTs Q
s
5 Output (Q)
The working vectors Ps and Qs are defined on the subdomain level
Similarly, the effect of a parallel preconditioner on a residual vector, $Z = M^{-1} r_\Gamma$, can be computed as

$$Z = \sum_{s=1}^{n_s} R^T_s M^{-1}_s R_s\, r_\Gamma.$$

This procedure is outlined in the following algorithm.

Algorithm 3 Parallel Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r^s_\Gamma = R_s r_\Gamma$
3. Local solve: $M_s Z^s = r^s_\Gamma$
4. Gather: $Z = \sum_{s=1}^{n_s} R^T_s Z^s$
5. Output ($Z$)

The local preconditioner $M_s$ and the working vectors $r^s_\Gamma$ and $Z^s$ are defined at the subdomain level.
5 Iterative substructuring methods of SPDEs
The next sections describe the primal and dual-primal substructuring methods in the context of SPDEs. In the primal method, the interface problem is solved iteratively using a PCGM solver equipped with a scalable preconditioner. At each iteration of the solver loop, local problems are solved on each subdomain in parallel. These local problems are used to construct a subdomain-level preconditioner. Moreover, a coarse problem is required to propagate information globally across the subdomains. This global exchange of information leads to a scalable preconditioner. In the dual-primal method, a system of Lagrange multipliers that enforces continuity constraints across the interface boundary is solved iteratively using a PCGM solver. The global coarse problem is already embedded in the operator of the Lagrange multiplier system, and therefore a one-level preconditioner, such as the lumped or Dirichlet preconditioner, is sufficient for scalability. A framework of the primal and dual-primal iterative substructuring methods for SPDEs is detailed next.
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains, we partition the subdomain nodal vector $u^s(\theta)$ into a set of interior unknowns $u^s_i(\theta)$, corner unknowns $u^s_c(\theta)$ and remaining unknowns $u^s_r(\theta)$, as schematically shown in Fig. 2.

Figure 2. Partitioning of the subdomain nodes into interior, remaining and corner (•) nodes
According to this partitioning scheme, the subdomain equilibrium equation can be represented as

$$\begin{bmatrix} A^s_{ii}(\theta) & A^s_{ir}(\theta) & A^s_{ic}(\theta) \\ A^s_{ri}(\theta) & A^s_{rr}(\theta) & A^s_{rc}(\theta) \\ A^s_{ci}(\theta) & A^s_{cr}(\theta) & A^s_{cc}(\theta) \end{bmatrix} \begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix}.$$
The Polynomial Chaos representation of the uncertain model parameters leads to the following subdomain equilibrium equation:

$$\sum_{l=0}^{L} \Psi_l(\theta) \begin{bmatrix} A^s_{ii,l} & A^s_{ir,l} & A^s_{ic,l} \\ A^s_{ri,l} & A^s_{rr,l} & A^s_{rc,l} \\ A^s_{ci,l} & A^s_{cr,l} & A^s_{cc,l} \end{bmatrix} \begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \begin{Bmatrix} f^s_i \\ f^s_r \\ f^s_c \end{Bmatrix}. \quad (6)$$
The solution process is expressed using the same Polynomial Chaos basis as

$$\begin{Bmatrix} u^s_i(\theta) \\ u^s_r(\theta) \\ u^s_c(\theta) \end{Bmatrix} = \sum_{j=0}^{N} \Psi_j(\theta) \begin{Bmatrix} u^s_{ij} \\ u^s_{rj} \\ u^s_{cj} \end{Bmatrix}. \quad (7)$$
Substituting Eq. (7) into Eq. (6) and performing a Galerkin projection leads to the following coupled deterministic system of equations:

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic} \\ A^s_{ri} & A^s_{rr} & A^s_{rc} \\ A^s_{ci} & A^s_{cr} & A^s_{cc} \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U^s_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ F^s_c \end{Bmatrix}, \quad (8)$$

where

$$[A^s_{\alpha\beta}]_{jk} = \sum_{l=0}^{L} \langle \Psi_l \Psi_j \Psi_k \rangle A^s_{\alpha\beta,l}, \qquad [F^s_{\alpha}]_k = \langle \Psi_k f^s_\alpha \rangle, \qquad U^s_\alpha = (u^s_{\alpha 0}, \cdots, u^s_{\alpha N})^T,$$

and the subscripts $\alpha$ and $\beta$ represent the indices $i$, $r$ and $c$.
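The Galerkin coefficients $\langle \Psi_l \Psi_j \Psi_k \rangle$ appearing above can be precomputed once and reused for every block. A hypothetical sketch for a one-dimensional probabilists' Hermite chaos, with the expectations evaluated by Gauss-Hermite quadrature and scalar stand-ins for the matrices $A^s_{\alpha\beta,l}$:

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss, hermeval

# Triple products <Psi_l Psi_j Psi_k> for a 1D probabilists' Hermite chaos,
# evaluated with Gauss-Hermite quadrature (weights normalized so E[1] = 1).
def triple_products(order):
    x, w = hermegauss(3 * order + 1)     # exact for polynomials of this degree
    w = w / w.sum()
    # psi[m] holds Psi_m = He_m evaluated at the quadrature points
    psi = [hermeval(x, [0]*m + [1]) for m in range(order + 1)]
    c = np.zeros((order + 1,) * 3)
    for l in range(order + 1):
        for j in range(order + 1):
            for k in range(order + 1):
                c[l, j, k] = np.sum(w * psi[l] * psi[j] * psi[k])
    return c

c = triple_products(2)
# Known values: <Psi_0^3> = 1 and <Psi_1 Psi_1 Psi_2> = E[x^4] - E[x^2] = 2
assert abs(c[0, 0, 0] - 1.0) < 1e-10
assert abs(c[1, 1, 2] - 2.0) < 1e-10

# Galerkin block [A]_{jk} = sum_l <Psi_l Psi_j Psi_k> A_l,
# with hypothetical scalar PC coefficients A_l standing in for the matrices.
A_l = np.array([4.0, 0.5, 0.1])
A = np.einsum('ljk,l->jk', c, A_l)
```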
Enforcing the transmission conditions along the interface boundaries, the subdomain equilibrium equation can be expressed as

$$\begin{bmatrix} A^s_{ii} & A^s_{ir}B^s_r & A^s_{ic}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_r A^s_{ri} & \sum_{s=1}^{n_s} B^{sT}_r A^s_{rr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_r A^s_{rc} B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c \end{bmatrix} \begin{Bmatrix} U^s_i \\ U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ \sum_{s=1}^{n_s} B^{sT}_r F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \end{Bmatrix}, \quad (9)$$
where $B^s_r$ is a rectangular Boolean matrix that maps the global remaining vector $U_r$ to the local remaining vector $U^s_r$ as

$$U^s_r = B^s_r U_r. \quad (10)$$

Similarly, the restriction operator $B^s_c$ is a rectangular Boolean matrix that maps the global corner vector $U_c$ to the local corner vector $U^s_c$ as

$$U^s_c = B^s_c U_c. \quad (11)$$

In a parallel implementation, both $B^s_r$ and $B^s_c$ act as scatter operators, while $B^{sT}_r$ and $B^{sT}_c$ act as gather operators.
The first block equation in Eq. (9) can be solved for $U^s_i$ in parallel as

$$U^s_i = [A^s_{ii}]^{-1}(F^s_i - A^s_{ir}B^s_r U_r - A^s_{ic}B^s_c U_c). \quad (12)$$

Substituting Eq. (12) into Eq. (9) leads to the following condensed system, which represents the Schur complement system in Eq. (5), but with the interface boundary nodes now split into remaining and corner nodes, as shown schematically in Fig. 3:
$$\begin{bmatrix} \sum_{s=1}^{n_s} B^{sT}_r S^s_{rr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_r S^s_{rc} B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} B^s_r & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc} B^s_c \end{bmatrix} \begin{Bmatrix} U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} \sum_{s=1}^{n_s} B^{sT}_r G^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c G^s_c \end{Bmatrix}, \quad (13)$$

where

$$S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}, \qquad G^s_\alpha = F^s_\alpha - A^s_{\alpha i}[A^s_{ii}]^{-1}F^s_i.$$

Figure 3. The interface boundary nodes split into remaining and corner (•) nodes
The corner nodal vector $U_c$ in Eq. (13) is eliminated next to obtain the following symmetric positive definite reduced interface problem:

$$(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\, U_r = d_r - F_{rc}[F_{cc}]^{-1} d_c, \quad (14)$$

where

$$F_{\alpha\beta} = \sum_{s=1}^{n_s} B^{sT}_\alpha S^s_{\alpha\beta} B^s_\beta, \qquad d_\alpha = \sum_{s=1}^{n_s} B^{sT}_\alpha G^s_\alpha,$$

and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $M^{-1}$, defined in the next section.
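The reduction from Eq. (9) to Eq. (14) is block Gaussian elimination, and can be checked numerically on a single-domain analogue (no subdomain sums): eliminating the interior block, then the corner block, of an arbitrary SPD matrix must reproduce the corresponding piece of the direct solution. A minimal sketch, assuming a random SPD test matrix:

```python
import numpy as np

rng = np.random.default_rng(0)
n_i, n_r, n_c = 4, 3, 2
n = n_i + n_r + n_c
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)              # SPD matrix, partitioned below
i = slice(0, n_i); r = slice(n_i, n_i + n_r); c = slice(n_i + n_r, n)
F = rng.standard_normal(n)

# Schur complement w.r.t. the interior block: S_ab = A_ab - A_ai A_ii^{-1} A_ib
Aii_inv = np.linalg.inv(A[i, i])
blocks = {'r': r, 'c': c}
S, G = {}, {}
for a, sa in blocks.items():
    G[a] = F[sa] - A[sa, i] @ Aii_inv @ F[i]
    for b, sb in blocks.items():
        S[a + b] = A[sa, sb] - A[sa, i] @ Aii_inv @ A[i, sb]

# Eliminate the corner unknowns:
# (S_rr - S_rc S_cc^{-1} S_cr) U_r = G_r - S_rc S_cc^{-1} G_c
Scc_inv = np.linalg.inv(S['cc'])
Ur = np.linalg.solve(S['rr'] - S['rc'] @ Scc_inv @ S['cr'],
                     G['r'] - S['rc'] @ Scc_inv @ G['c'])

U = np.linalg.solve(A, F)                # direct solution of the full system
assert np.allclose(Ur, U[r])             # reduced solve matches the direct one
```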
6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that at the remaining interface boundary nodes is satisfied in a weak sense; this is schematically illustrated in Fig. 4 [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:

Figure 4. Partial assembly at the corner nodes (•)
$$\begin{bmatrix} S^s_{rr} & S^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc} B^s_c \end{bmatrix} \begin{Bmatrix} U^s_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_r \\ 0 \end{Bmatrix}, \quad (15)$$

where

$$F^s_r = D^s_r B^s_r\, r_j,$$

$r_j$ is the residual at the $j$th iteration of the PCGM, and $D^s_r$ represents a block-diagonal weighting matrix which satisfies the property

$$\sum_{s=1}^{n_s} B^{sT}_r D^s_r B^s_r = I.$$

The diagonal entries of each block of $D^s_r$ are the reciprocals of the number of subdomains that share the corresponding interface boundary nodes.
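The partition-of-unity property $\sum_s B^{sT}_r D^s_r B^s_r = I$ can be verified directly on a toy interface: with each diagonal entry of $D^s_r$ set to the reciprocal of the node's multiplicity, the weighted gather of the scattered identity recovers the identity. An illustrative sketch with three subdomains and a four-node interface (the node layout is made up for the example):

```python
import numpy as np

n_global = 4
# Interface nodes seen by each of three subdomains
# (node 1 shared by s0 and s1; node 2 shared by s1 and s2)
local_nodes = [np.array([0, 1]), np.array([1, 2]), np.array([2, 3])]

# Multiplicity of each global node = number of subdomains sharing it
mult = np.zeros(n_global)
for idx in local_nodes:
    mult[idx] += 1

# Check sum_s B_s^T D_s B_s = I with D_s = diag(1 / multiplicity)
acc = np.zeros((n_global, n_global))
for idx in local_nodes:
    B = np.zeros((len(idx), n_global))
    B[np.arange(len(idx)), idx] = 1.0        # Boolean scatter operator B^s_r
    D = np.diag(1.0 / mult[idx])             # weighting matrix D^s_r
    acc += B.T @ D @ B
assert np.allclose(acc, np.eye(n_global))    # partition of unity holds
```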
The subdomain-level remaining unknown vector $U^s_r$ can be eliminated in parallel from Eq. (15) as

$$U^s_r = [S^s_{rr}]^{-1}(F^s_r - S^s_{rc}B^s_c U_c). \quad (16)$$

Substituting $U^s_r$ into the second block equation of Eq. (15) leads to the following coarse problem:

$$F^{*}_{cc} U_c = d^{*}_c, \quad (17)$$

where

$$F^{*}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\right) B^s_c, \qquad d^{*}_c = -\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}F^s_r.$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

$$U_r = \sum_{s=1}^{n_s} B^{sT}_r D^s_r U^s_r.$$

After some algebraic manipulation, the preconditioner can be expressed as

$$M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + R^T_0 [F^{*}_{cc}]^{-1} R_0, \quad (18)$$

where

$$R_0 = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r.$$
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation:

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c \\ A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \end{Bmatrix}, \quad (19)$$

where $B^s_c$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U^s_c$ as

$$U^s_c = B^s_c U_c.$$
Eq. (19) can be rewritten in compact form as

$$A^s U^s = F^s. \quad (20)$$

Let $B^s_r$ be a block-diagonal signed Boolean continuity matrix defined such that

$$\sum_{s=1}^{n_s} B^s_r U^s_r = 0.$$

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

$$\frac{1}{2} U^T A U - U^T F \rightarrow \min \quad \text{subject to} \quad BU = 0, \quad (21)$$
where

$$A = \begin{bmatrix} A^1 & & \\ & \ddots & \\ & & A^{n_s} \end{bmatrix}, \quad U = \begin{Bmatrix} U^1 \\ \vdots \\ U^{n_s} \end{Bmatrix}, \quad F = \begin{Bmatrix} F^1 \\ \vdots \\ F^{n_s} \end{Bmatrix}, \quad B = \begin{bmatrix} (0 \; B^1_r \; 0) & \cdots & (0 \; B^s_r \; 0) & \cdots & (0 \; B^{n_s}_r \; 0) \end{bmatrix}.$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as

$$L(U, \Lambda) = \frac{1}{2} U^T A U - U^T F + U^T B^T \Lambda. \quad (22)$$
Minimizing Eq. (22) with respect to $U$ and $\Lambda$ leads to the following equilibrium system:

$$\begin{bmatrix} A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c & 0 \\ A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c & B^{sT}_r \\ \sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c & 0 \\ 0 & \sum_{s=1}^{n_s} B^s_r & 0 & 0 \end{bmatrix} \begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \\ 0 \end{Bmatrix}, \quad (23)$$

where $\Lambda = (\lambda_0, \cdots, \lambda_N)^T$ and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. 5.

Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary
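The system (23) is the standard KKT system of the constrained minimization (21). On a toy problem with two single-dof "subdomains" sharing one interface node (not the paper's discretization), solving the KKT system enforces continuity exactly and the multiplier comes out as the interface force:

```python
import numpy as np

# Two subdomains, each with a single interface dof; continuity u1 = u2
A = np.diag([2.0, 3.0])          # block-diagonal subdomain stiffness
F = np.array([1.0, 2.0])
B = np.array([[1.0, -1.0]])      # signed Boolean jump operator: B U = u1 - u2

# KKT system of min (1/2) U^T A U - U^T F  subject to  B U = 0
K = np.block([[A, B.T], [B, np.zeros((1, 1))]])
sol = np.linalg.solve(K, np.concatenate([F, [0.0]]))
U, lam = sol[:2], sol[2]

assert abs(U[0] - U[1]) < 1e-12          # continuity is enforced exactly
assert np.allclose(U, [0.6, 0.6])        # assembled problem: (2 + 3) u = 3
assert abs(lam - (-0.2)) < 1e-12         # multiplier = interface force
```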
The interior unknowns $U^s_i$ are eliminated from Eq. (23) as

$$U^s_i = [A^s_{ii}]^{-1}(F^s_i - A^s_{ir}U^s_r - A^s_{ic}B^s_c U_c). \quad (24)$$
Substituting Eq(24) into Eq(23) leads to
Ssrr Ss
rcBsc Bs
rT
nssum
s=1
BscTSs
cr
nssum
s=1
BscTSs
ccBsc 0
nssum
s=1
Bsr 0 0
Usr
Uc
Λ
=
Gsr
nssum
s=1
BscTGs
c
0
(25)
where
Ssαβ = As
αβ minusAsαi[A
sii]
minus1Asiβ
Gsα = Fs
α minusAsαi[A
sii]
minus1Fsi
The subdomain-level remaining unknown vector $U^s_r$ can be obtained in parallel from Eq. (25) as

$$U^s_r = [S^s_{rr}]^{-1}(G^s_r - S^s_{rc}B^s_c U_c - B^{sT}_r \Lambda). \quad (26)$$
Substituting Eq. (26) into Eq. (25) leads to

$$\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix} \begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} d_c \\ d_r \end{Bmatrix}, \quad (27)$$
where

$$F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\right) B^s_c, \qquad F_{cr} = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}B^{sT}_r,$$

$$F_{rc} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}S^s_{rc} B^s_c, \qquad F_{rr} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}B^{sT}_r,$$

$$d_c = \sum_{s=1}^{n_s} B^{sT}_c \left(G^s_c - S^s_{cr}[S^s_{rr}]^{-1}G^s_r\right), \qquad d_r = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}G^s_r.$$
Solving for $U_c$ from Eq. (27) gives the following coarse problem:

$$F_{cc} U_c = d_c + F_{cr}\Lambda. \quad (28)$$

Substituting $U_c$ into Eq. (27) leads to the following symmetric positive definite Lagrange multiplier system:

$$(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,\Lambda = d_r - F_{rc}[F_{cc}]^{-1}d_c. \quad (29)$$

Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as

$$M = \sum_{s=1}^{n_s} B^s_r D^s_r S^s_{rr} D^s_r B^{sT}_r. \quad (30)$$
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are identical, and can be expressed as

$$F^{*}_{cc} = F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c \left(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\right) B^s_c.$$

Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as

$$M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c \left[\sum_{s=1}^{n_s} B^{sT}_c \left(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\right) B^s_c\right]^{-1} \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r,$$

which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation
In this section, we outline the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
9.1 Primal method
In this subsection, we give a brief description of the parallel implementation of Algorithm 1 to solve the primal interface problem in Eq. (14).
For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$$Q_j = (F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\, P_j,$$

can be computed using the following algorithm.

Algorithm 4 Parallel Matrix-Vector Product for the Primal Method
1. Input ($P$)
2. Scatter: $P^s = B^s_r P$
3. Compute: $v^s_1 = S^s_{cr} P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B^{sT}_c v^s_1$
5. Global solve: $F_{cc} V_2 = V_1$
6. Scatter: $v^s_2 = B^s_c V_2$
7. Compute: $v^s_3 = S^s_{rc} v^s_2$
8. Update: $Q^s = S^s_{rr} P^s - v^s_3$
9. Gather: $Q = \sum_{s=1}^{n_s} B^{sT}_r Q^s$
10. Output ($Q$)
The multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem:

$$v^s_\alpha = S^s_{\alpha\beta} v^s_\beta = (A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta})\, v^s_\beta.$$

This procedure is outlined in the following algorithm.
Algorithm 5 Dirichlet Solver Procedure
1. Input ($v^s_\beta$)
2. Compute: $v^s_1 = A^s_{i\beta} v^s_\beta$
3. Solve: $A^s_{ii} v^s_2 = v^s_1$
4. Compute: $v^s_3 = A^s_{\alpha i} v^s_2$
5. Compute: $v^s_4 = A^s_{\alpha\beta} v^s_\beta$
6. Compute: $v^s_\alpha = v^s_4 - v^s_3$
7. Output ($v^s_\alpha$)
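Algorithm 5 applies the Schur complement $S^s_{\alpha\beta}$ without ever forming it; each application costs one interior solve. A small sketch (taking $\alpha = \beta$ on a random SPD test matrix) verifying the step sequence against the explicitly formed Schur complement:

```python
import numpy as np

rng = np.random.default_rng(1)
n_i, n_b = 5, 3                      # interior dofs and boundary dofs
M = rng.standard_normal((n_i + n_b, n_i + n_b))
A = M @ M.T + (n_i + n_b) * np.eye(n_i + n_b)
Aii, Aib = A[:n_i, :n_i], A[:n_i, n_i:]
Abi, Abb = A[n_i:, :n_i], A[n_i:, n_i:]

def dirichlet_apply(v):
    """Algorithm 5: v_a = (A_ab - A_ai A_ii^{-1} A_ib) v_b without forming S."""
    v1 = Aib @ v                     # step 2
    v2 = np.linalg.solve(Aii, v1)    # step 3: interior (Dirichlet) solve
    v3 = Abi @ v2                    # step 4
    v4 = Abb @ v                     # step 5
    return v4 - v3                   # step 6

v = rng.standard_normal(n_b)
S = Abb - Abi @ np.linalg.solve(Aii, Aib)   # explicit Schur complement
assert np.allclose(dirichlet_apply(v), S @ v)
```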
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner:

$$M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1, \qquad M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c.$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.

Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $F^s_r = D^s_r B^s_r r_\Gamma$
3. Local solve: $S^s_{rr} v^s_1 = F^s_r$
4. Compute: $d^s_c = S^s_{cr} v^s_1$
5. Gather: $d_c = \sum_{s=1}^{n_s} B^{sT}_c d^s_c$
6. Global solve: $F^{*}_{cc} Z_c = -d_c$
7. Scatter: $Z^s_c = B^s_c Z_c$
8. Update: $v^s_2 = F^s_r + S^s_{rc} Z^s_c$
9. Local solve: $S^s_{rr} Z^s_f = v^s_2$
10. Gather: $Z = \sum_{s=1}^{n_s} B^{sT}_r D^s_r Z^s_f$
11. Output ($Z$)
The local solves in steps 3 and 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form $S^s_{rr} U^s_r = r^s_r$, which can be solved using the following algorithm.

Algorithm 7 Neumann Solver Procedure
1. Input ($r^s_r$)
2. Solve: $\begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix} \begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix} = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix}$
3. Output ($U^s_r$)
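Algorithm 7 obtains $[S^s_{rr}]^{-1} r^s_r$ through a single augmented solve instead of forming the Schur complement; the equivalence follows from block elimination of the first row and can be checked on a small random SPD matrix:

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, n_r = 4, 3
M = rng.standard_normal((n_i + n_r, n_i + n_r))
A = M @ M.T + (n_i + n_r) * np.eye(n_i + n_r)
Aii, Air = A[:n_i, :n_i], A[:n_i, n_i:]
Ari, Arr = A[n_i:, :n_i], A[n_i:, n_i:]

r = rng.standard_normal(n_r)
# Algorithm 7: solve [[Aii, Air], [Ari, Arr]] [X; Ur] = [0; r] and keep Ur
sol = np.linalg.solve(A, np.concatenate([np.zeros(n_i), r]))
Ur = sol[n_i:]

# Equivalent to solving S_rr Ur = r with S_rr = Arr - Ari Aii^{-1} Air
Srr = Arr - Ari @ np.linalg.solve(Aii, Air)
assert np.allclose(Srr @ Ur, r)
```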
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

$$M^{-1}_{cc} F^{*}_{cc} Z_c = -M^{-1}_{cc} d_c, \qquad M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c.$$

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the flow chart of Fig. 6.

Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem
9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem in Eq. (29).
For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$$Q_j = (F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\, P_j,$$

can be computed using the following algorithm.

Algorithm 8 Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input ($P$)
2. Scatter: $P^s = B^{sT}_r P$
3. Local solve: $S^s_{rr} v^s_1 = P^s$
4. Compute: $v^s_2 = S^s_{cr} v^s_1$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B^{sT}_c v^s_2$
6. Global solve: $F_{cc} V_3 = V_2$
7. Scatter: $v^s_3 = B^s_c V_3$
8. Compute: $v^s_4 = S^s_{rc} v^s_3$
9. Update: $v^s_5 = P^s + v^s_4$
10. Local solve: $S^s_{rr} Q^s = v^s_5$
11. Gather: $Q = \sum_{s=1}^{n_s} B^s_r Q^s$
12. Output ($Q$)
The local solves in steps 3 and 10 of Algorithm 8 are performed by solving a subdomain-level Neumann problem, as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r^s_\Gamma = D^s_r B^{sT}_r r_\Gamma$
3. Compute: $Z^s = S^s_{rr} r^s_\Gamma$
4. Gather: $Z = \sum_{s=1}^{n_s} B^s_r D^s_r Z^s$
5. Output ($Z$)

We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the flow chart of Fig. 7.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

$$\frac{\partial}{\partial x}\left[c_x(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial y}\right] = f(x,y) \quad \text{in } \Omega,$$

where the forcing term is $f(x,y) = 1.0$. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

$$u(x,y,\theta) = 0 \quad \text{on } \partial\Omega.$$
The random coefficients $c_x(x,y,\theta)$ and $c_y(x,y,\theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
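For a lognormal coefficient $c = \exp(g)$ with $g = \mu + \sigma\xi$ and $\xi$ standard Gaussian, the Hermite chaos coefficients are available in closed form, $c_l = e^{\mu + \sigma^2/2}\,\sigma^l/l!$. The following sketch (a one-dimensional illustration of the input model, not the paper's discretization) checks the truncated expansion against the exact exponential for the values used here, $\mu = 1.0$ and $\sigma = 0.25$:

```python
import numpy as np
from math import factorial, exp
from numpy.polynomial.hermite_e import hermeval

mu, sigma = 1.0, 0.25                    # underlying Gaussian mean and std

# Closed-form Hermite chaos coefficients of c = exp(mu + sigma*xi)
c = [exp(mu + 0.5 * sigma**2) * sigma**l / factorial(l) for l in range(8)]

# Reconstruct c(xi) from the truncated expansion and compare with the exact
# exponential on a few sample points of xi
xi = np.linspace(-3.0, 3.0, 7)
recon = sum(c[l] * hermeval(xi, [0]*l + [1]) for l in range(8))
exact = np.exp(mu + sigma * xi)
assert np.max(np.abs(recon - exact) / exact) < 1e-6

# The mean of the lognormal coefficient is the zeroth chaos coefficient
assert abs(c[0] - exp(mu + 0.5 * sigma**2)) < 1e-12
```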
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and the initial residuals falls below $10^{-5}$:

$$\frac{\| G_\Gamma - S U^k_\Gamma \|_2}{\| G^0_\Gamma \|_2} \leq 10^{-5}.$$
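The stopping rule above is the usual relative-residual test. A minimal serial conjugate gradient sketch with this criterion (dense and unpreconditioned, for illustration only):

```python
import numpy as np

def cg(A, b, tol=1e-5, maxit=200):
    """Plain CG with the relative residual stopping test ||r_k|| / ||r_0|| <= tol."""
    x = np.zeros_like(b)
    r = b - A @ x                 # with x0 = 0 the initial residual is the forcing term
    r0_norm = np.linalg.norm(r)
    p = r.copy()
    for k in range(maxit):
        if np.linalg.norm(r) / r0_norm <= tol:
            return x, k
        Ap = A @ p
        alpha = (r @ r) / (p @ Ap)
        x = x + alpha * p
        r_new = r - alpha * Ap
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x, maxit

rng = np.random.default_rng(3)
M = rng.standard_normal((20, 20))
A = M @ M.T + 20 * np.eye(20)     # well-conditioned SPD test matrix
b = rng.standard_normal(20)
x, iters = cg(A, b)
assert np.linalg.norm(b - A @ x) / np.linalg.norm(b) <= 1e-5
```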
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23, the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u0. Figure 13. Chaos coefficients u1.
10.2 Scalability study
First, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods either. Note that for a given spatial problem size n, a first order Polynomial Chaos expansion leads to a total problem size of 3n, and a third order expansion leads to a total problem size of 10n.

Figure 14. Chaos coefficients u2. Figure 15. Chaos coefficients u3.
Figure 16. Chaos coefficients u4. Figure 17. Chaos coefficients u5.
Figure 18. Lagrange multipliers λ0. Figure 19. Lagrange multipliers λ1.
Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
10051          10 10 10             8 8 8
20303          11 11 11             8 8 8
40811          11 12 12             8 9 9
59935          13 14 14             10 10 10
71386          12 12 12             9 9 9
80172          11 11 12             8 8 8
101851         12 12 12             9 9 9

Second, we fix the problem size in the spatial domain at 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2 for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the convergence rates of both methods are almost the same, as indicated in Figs. 24-26.
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
20     10 11 11             8 8 8
40     12 12 12             9 9 9
60     12 13 13             9 9 9
80     12 12 13             9 9 9
100    12 12 12             9 9 9
120    12 12 12             9 9 9
140    11 11 12             8 8 8
160    12 12 12             8 8 9

Third, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
100          10 10 10             8 8 8
200          10 10 11             8 8 8
400          12 13 13             9 9 9
600          11 12 12             8 8 9
800          12 13 13             9 9 9
Fourth, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performance of the primal method and that of the dual-primal method demonstrate similar trends, and this fact points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal and dual-primal methods are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.

Figure 24. The relative PCGM residual history for the case of 160 subdomains and first order PC. Figure 25. The relative PCGM residual history for the case of 160 subdomains and second order PC. Figure 26. The relative PCGM residual history for the case of 160 subdomains and third order PC.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chairs Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100).

CoV    PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
0.05   10 10 10             8 8 8
0.10   10 10 10             8 8 8
0.15   10 10 10             8 8 8
0.20   10 10 10             8 8 8
0.25   10 10 10             8 8 8
0.30   10 10 11             8 8 8
0.35   10 10 11             8 8 8
0.40   10 11 11             8 8 9
0.45   10 11 12             8 8 9
0.50   10 11 12             8 8 9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66 250-271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
globally across the subdomains This global exchange of information leads to a scalablepreconditioner In the dual-primal method a system of Lagrange multiplier that enforcescontinuity constraints across the interface boundary is solved iteratively using PCGM solverThe global coarse problem is already embedded in the operator of the Lagrange multiplier systemand therefore a one-level preconditioner such as lumped or Dirichlet is sufficient for scalability Aframework of the primal and dual-primal iterative substructuring methods for SPDEs is detailednext
6 A primal iterative substructuring method of SPDEs
In order to define local problems over each of the subdomains we partition the subdomainnodal vector us(θ) into a set of interior unknowns us
i (θ) corner unknowns usc(θ) and remaining
unknowns usr(θ) as schematically shown in Fig(2)
Figure 2 Partitioning domain nodes into interior () remaining () and corner(bull) nodes
According to this partitioning scheme the subdomain equilibrium equation can berepresented as
Asii(θ) As
ir(θ) Asic(θ)
Asri(θ) As
rr(θ) Asrc(θ)
Asci(θ) As
cr(θ) Ascc(θ)
usi (θ)
usr(θ)
usc(θ)
=
f si
f sr
f sc
The Polynomial Chaos representation of uncertain model parameters leads to the followingsubdomain equilibrium equation
Lsum
l=0
Ψl
Asiil As
irl Asicl
Asril As
rrl Asrcl
Ascil As
crl Asccl
usi (θ)
usr(θ)
usc(θ)
=
f si
f sr
f sc
(6)
The solution process is expressed using the same Polynomial Chaos basis as
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
8
usi (θ)
usr(θ)
usc(θ)
=
Nsum
j=0
Ψj(θ)
usij
usrj
uscj
(7)
Substituting Eq(7) into Eq(6) and performing Galerkin projection leads to the followingcoupled deterministic systems of equations
Asii As
ir Asic
Asri As
rr Asrc
Asci As
cr Ascc
Usi
Usr
Usc
=
Fsi
Fsr
Fsc
(8)
where
[Asαβ ]jk =
Lsum
l=0
〈ΨlΨjΨk〉Asαβl
Fsαk = 〈Ψkf
sα〉
Usα = (us
α0 middot middot middot usαN )T
the subscripts α and β represent the index i r and c
Enforcing the transmission conditions along the boundary interfaces the subdomainequilibrium equation can be expressed as
Asii As
irBsr As
icBsc
nssum
s=1
BsrTAs
ri
nssum
s=1
BsrTAs
rrBsr
nssum
s=1
BsrTAs
rcBsc
nssum
s=1
BscTAs
ci
nssum
s=1
BscTAs
crBsr
nssum
s=1
BscTAs
ccBsc
Usi
Ur
Uc
=
Fsi
nssum
s=1
BsrTFs
r
nssum
s=1
BscTFs
c
(9)
where Bsr is a Boolean rectangular matrix that maps the global remaining vector Ur to the
local remaining vector Usr as
Usr = Bs
rUr (10)
Similarly the restriction operator Bsc is a Boolean rectangular matrix that maps the global
corner vector Uc to the local corner vector Usc as
Usc = Bs
cUc (11)
In parallel implementation both Bsr and Bs
c act as scatter operators while BsrT and Bs
cT act
as gather operators
The first block equation in Eq(9) can be solved for Usi in parallel as
Usi = [As
ii]minus1(Fs
i minusAsirB
srUr minusAs
icBscUc) (12)
Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
9
remaining and corner nodes as shown schematically in Fig(3)
nssum
s=1
BsrTSs
rrBsr
nssum
s=1
BsrTSs
rcBsc
nssum
s=1
BscTSs
crBsr
nssum
s=1
BscTSs
ccBsc
Ur
Uc
=
nssum
s=1
BsrTGs
r
nssum
s=1
BscTGs
c
(13)
where
Ssαβ = As
αβ minusAsαi[A
sii]
minus1Asiβ
Gsα = Fs
α minusAsαi[A
sii]
minus1Fsi
Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes
The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem
(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]
minus1dc (14)
where
Fαβ =
nssum
s=1
Bsα
TSsαβB
sβ
dα =
nssum
s=1
Bsα
TGsα
and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
10
61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system
Figure 4 Partial assembly of corner nodes (bull)
$$
\begin{bmatrix}
S^s_{rr} & S^s_{rc} B^s_c \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} S^s_{cr} B^s_r & \sum_{s=1}^{n_s} B_c^{sT} S^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix} U^s_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F^s_r \\ 0 \end{Bmatrix}
\qquad (15)
$$

where

$$F^s_r = D^s_r B^s_r\, r_j$$
and $r_j$ is the residual at the $j$th iteration of the PCGM, and $D^s_r$ represents a block diagonal weighting matrix which satisfies the following property:

$$\sum_{s=1}^{n_s} B_r^{sT} D^s_r B^s_r = I.$$

The diagonal entries of each block of $D^s_r$ are the reciprocal of the number of subdomains that share the corresponding interface boundary node.
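This partition-of-unity property can be verified directly on a toy interface layout (the three-node layout below is invented for illustration):

```python
import numpy as np

# Hypothetical layout: 3 global remaining-interface nodes; subdomain 1
# touches nodes {0, 1}, subdomain 2 touches {1, 2}, subdomain 3 touches {1},
# so node 1 is shared by three subdomains.
B = [np.array([[1., 0., 0.], [0., 1., 0.]]),   # B^1_r : global -> local
     np.array([[0., 1., 0.], [0., 0., 1.]]),   # B^2_r
     np.array([[0., 1., 0.]])]                 # B^3_r

# Multiplicity of each global node = number of subdomains sharing it.
mult = sum(Bs.T @ Bs for Bs in B).diagonal()   # -> [1, 3, 1]

# D^s_r: reciprocal of the sharing count, restricted to the local nodes.
D = [np.diag(1.0 / (Bs @ mult)) for Bs in B]

# Partition-of-unity property: sum_s B_r^{sT} D^s_r B^s_r = I
I = sum(Bs.T @ Ds @ Bs for Bs, Ds in zip(B, D))
print(np.allclose(I, np.eye(3)))
```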
The subdomain-level remaining unknown vector $U^s_r$ can be eliminated in parallel from Eq.(15) as

$$U^s_r = [S^s_{rr}]^{-1}\big(F^s_r - S^s_{rc}B^s_c U_c\big). \qquad (16)$$
Substituting $U^s_r$ into the second block of Eq.(15) leads to the following coarse problem:

$$F^{*}_{cc} U_c = d^{*}_c \qquad (17)$$

where

$$F^{*}_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c, \qquad
d^{*}_c = -\sum_{s=1}^{n_s} B_c^{sT} S^s_{cr}[S^s_{rr}]^{-1} F^s_r.$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as

$$U_r = \sum_{s=1}^{n_s} B_r^{sT} D^s_r U^s_r.$$
After some algebraic manipulations, the preconditioner can be expressed as

$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + R_0^{T}[F^{*}_{cc}]^{-1}R_0 \qquad (18)$$

where

$$R_0 = \sum_{s=1}^{n_s} B_c^{sT} S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r.$$
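A dense-algebra sketch of applying the two-level preconditioner of Eq.(18) follows; the restriction operators, weights, local Schur blocks, and coarse operator below are toy stand-ins, not the actual stochastic blocks:

```python
import numpy as np

def apply_two_level(r, B_r, D_r, S_rr, R0, Fcc_star):
    """Action of the two-level preconditioner of Eq.(18):
    M^{-1} r = sum_s B_r^{sT} D_r^s [S_rr^s]^{-1} D_r^s B_r^s r + R0^T [F*_cc]^{-1} R0 r."""
    z = np.zeros_like(r)
    for Bs, Ds, Ss in zip(B_r, D_r, S_rr):
        zs = np.linalg.solve(Ss, Ds @ (Bs @ r))    # weighted scatter + local solve
        z += Bs.T @ (Ds @ zs)                      # weight and gather
    z += R0.T @ np.linalg.solve(Fcc_star, R0 @ r)  # coarse-grid correction
    return z

# Toy data (hypothetical): 3 global remaining dofs, 2 subdomains, 1 corner dof.
rng = np.random.default_rng(0)
B_r = [np.array([[1., 0., 0.], [0., 1., 0.]]),
       np.array([[0., 1., 0.], [0., 0., 1.]])]
D_r = [np.diag([1.0, 0.5]), np.diag([0.5, 1.0])]   # node 1 shared by both subdomains
S_rr = []
for _ in range(2):
    C = rng.standard_normal((2, 2))
    S_rr.append(C @ C.T + 2 * np.eye(2))           # SPD stand-ins for local Schur blocks
R0 = rng.standard_normal((1, 3))
Fcc_star = np.array([[2.0]])

r = rng.standard_normal(3)
z = apply_two_level(r, B_r, D_r, S_rr, R0, Fcc_star)

# Cross-check against the explicitly assembled preconditioner
M = sum(Bs.T @ Ds @ np.linalg.inv(Ss) @ Ds @ Bs for Bs, Ds, Ss in zip(B_r, D_r, S_rr))
M += R0.T @ np.linalg.inv(Fcc_star) @ R0
print(np.allclose(z, M @ r))
```

In practice each local solve and the coarse solve would be distributed across processes rather than looped over serially.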
7 A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq.(8) leads to the following subdomain equilibrium equation:

$$
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c \\[4pt]
A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} A^s_{ci} & \sum_{s=1}^{n_s} B_c^{sT} A^s_{cr} & \sum_{s=1}^{n_s} B_c^{sT} A^s_{cc} B^s_c
\end{bmatrix}
\begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B_c^{sT} F^s_c \end{Bmatrix}
\qquad (19)
$$

where $B^s_c$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U^s_c$ as $U^s_c = B^s_c U_c$.
Eq.(19) can be rewritten in compact form as

$$A^s U^s = F^s. \qquad (20)$$

Let $B^s_r$ be a signed Boolean continuity matrix defined such that the compatibility condition

$$\sum_{s=1}^{n_s} B^s_r U^s_r = 0$$

holds.
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:

$$\tfrac{1}{2}\, U^T A\, U - U^T F \;\rightarrow\; \min \qquad (21)$$

subject to $B\,U = 0$, where

$$A = \begin{bmatrix} A^1 & & \\ & \ddots & \\ & & A^{n_s} \end{bmatrix}, \quad
U = \begin{Bmatrix} U^1 \\ \vdots \\ U^{n_s} \end{Bmatrix}, \quad
F = \begin{Bmatrix} F^1 \\ \vdots \\ F^{n_s} \end{Bmatrix},$$

$$B = \big[\,(0 \;\; B^1_r \;\; 0) \;\cdots\; (0 \;\; B^s_r \;\; 0) \;\cdots\; (0 \;\; B^{n_s}_r \;\; 0)\,\big].$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq.(21) can be expressed as

$$\mathcal{L}(U,\Lambda) = \tfrac{1}{2}\, U^T A\, U - U^T F + U^T B^T \Lambda. \qquad (22)$$
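The constrained minimization of Eqs.(21)-(22) can be illustrated on a small dense problem: stationarity of the Lagrangian yields a saddle point (KKT) system whose solution satisfies the compatibility constraint exactly. The matrices below are random stand-ins for the block-diagonal stiffness and the jump operator, used only for illustration:

```python
import numpy as np

rng = np.random.default_rng(1)
n, m = 6, 2
C = rng.standard_normal((n, n))
A = C @ C.T + n * np.eye(n)          # SPD stand-in for the block-diagonal stiffness
F = rng.standard_normal(n)
B = rng.standard_normal((m, n))      # stand-in for the signed Boolean jump operator

# Stationarity of L(U, Lambda) = 1/2 U^T A U - U^T F + U^T B^T Lambda
# gives the saddle point (KKT) system [[A, B^T], [B, 0]] [U; Lambda] = [F; 0].
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([F, np.zeros(m)]))
U, Lam = sol[:n], sol[n:]
print(np.linalg.norm(B @ U))   # ~0: the compatibility constraint is satisfied
```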
Enforcing stationarity of Eq.(22) with respect to $U$ and $\Lambda$ leads to the following equilibrium system:

$$
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c & 0 \\[4pt]
A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c & B_r^{sT} \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} A^s_{ci} & \sum_{s=1}^{n_s} B_c^{sT} A^s_{cr} & \sum_{s=1}^{n_s} B_c^{sT} A^s_{cc}B^s_c & 0 \\[4pt]
0 & \sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B_c^{sT} F^s_c \\ 0 \end{Bmatrix}
\qquad (23)
$$
where

$$\Lambda = (\lambda_0, \cdots, \lambda_N)^T$$
Figure 5 Lagrange multipliers are the forces required to connect the torn interface boundary

and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig.(5).
The interior unknowns $U^s_i$ are eliminated from Eq.(23) as

$$U^s_i = [A^s_{ii}]^{-1}\big(F^s_i - A^s_{ir}U^s_r - A^s_{ic}B^s_c U_c\big). \qquad (24)$$
Substituting Eq.(24) into Eq.(23) leads to

$$
\begin{bmatrix}
S^s_{rr} & S^s_{rc}B^s_c & B_r^{sT} \\[4pt]
\sum_{s=1}^{n_s} B_c^{sT} S^s_{cr} & \sum_{s=1}^{n_s} B_c^{sT} S^s_{cc}B^s_c & 0 \\[4pt]
\sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U^s_r \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} G^s_r \\ \sum_{s=1}^{n_s} B_c^{sT} G^s_c \\ 0 \end{Bmatrix}
\qquad (25)
$$
where

$$S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}, \qquad
G^s_{\alpha} = F^s_{\alpha} - A^s_{\alpha i}[A^s_{ii}]^{-1}F^s_i.$$
The subdomain-level remaining unknown vector $U^s_r$ can be obtained in parallel from Eq.(25) as

$$U^s_r = [S^s_{rr}]^{-1}\big(G^s_r - S^s_{rc}B^s_c U_c - B_r^{sT}\Lambda\big). \qquad (26)$$
Substituting Eq.(26) into Eq.(25) leads to

$$
\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix}
\begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix}
\qquad (27)
$$
where

$$F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c, \qquad
F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S^s_{cr}[S^s_{rr}]^{-1}B_r^{sT},$$

$$F_{rc} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c, \qquad
F_{rr} = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} B_r^{sT},$$

$$d_c = \sum_{s=1}^{n_s} B_c^{sT}\big(G^s_c - S^s_{cr}[S^s_{rr}]^{-1}G^s_r\big), \qquad
d_r = \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1} G^s_r.$$
Solving for $U_c$ from Eq.(27) gives the following coarse problem:

$$F_{cc} U_c = d_c + F_{cr}\Lambda. \qquad (28)$$

Substituting $U_c$ into Eq.(27) leads to the following symmetric positive definite Lagrange multiplier system:

$$\big(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\big)\,\Lambda = d_r - F_{rc}[F_{cc}]^{-1}d_c. \qquad (29)$$
Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as

$$M = \sum_{s=1}^{n_s} B^s_r D^s_r S^s_{rr} D^s_r B_r^{sT}. \qquad (30)$$
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are identical and can be expressed as

$$F^{*}_{cc} = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c.$$
Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as

$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r
+ \sum_{s=1}^{n_s} B_r^{sT} D^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c
\left[\sum_{s=1}^{n_s} B_c^{sT}\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c\right]^{-1}
\sum_{s=1}^{n_s} B_c^{sT} S^s_{cr}[S^s_{rr}]^{-1} D^s_r B^s_r,$$

which has the same form as the dual-primal operator in Eq.(29).
9 Parallel implementation
In this section, we outline the parallel implementation of PCGM to solve the primal Eq.(14) and dual-primal Eq.(29) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global-level coarse problem.
9.1 Primal method

In this subsection, we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq.(14).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = \big(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr}\big)\,P_j,$$

can be computed using the following algorithm.
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input: $P$
2. Scatter: $P^s = B^s_r P$
3. Compute: $v^s_1 = S^s_{cr} P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B_c^{sT} v^s_1$
5. Global solve: $F_{cc} V_2 = V_1$
6. Scatter: $v^s_2 = B^s_c V_2$
7. Compute: $v^s_3 = S^s_{rc} v^s_2$
8. Update: $Q^s = S^s_{rr} P^s - v^s_3$
9. Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10. Output: $Q$
Multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

$$v^s_\alpha = S^s_{\alpha\beta} v^s_\beta = \big(A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}\big)v^s_\beta.$$
This procedure is outlined in the following algorithm.

Algorithm 5: Dirichlet-Solver Procedure
1. Input: $v^s_\beta$
2. Compute: $v^s_1 = A^s_{i\beta} v^s_\beta$
3. Solve: $A^s_{ii} v^s_2 = v^s_1$
4. Compute: $v^s_3 = A^s_{\alpha i} v^s_2$
5. Compute: $v^s_4 = A^s_{\alpha\beta} v^s_\beta$
6. Compute: $v^s_\alpha = v^s_4 - v^s_3$
7. Output: $v^s_\alpha$
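Algorithm 5 applies the local Schur complement without ever forming it. A minimal numpy sketch (with a random SPD subdomain matrix as a stand-in) can be checked against the explicitly assembled Schur complement:

```python
import numpy as np

def dirichlet_apply(A_ab, A_ai, A_ii, A_ib, v_b):
    """Algorithm 5: apply the local Schur complement
    S_ab = A_ab - A_ai A_ii^{-1} A_ib to v_b without forming S_ab."""
    v1 = A_ib @ v_b                  # step 2
    v2 = np.linalg.solve(A_ii, v1)   # step 3: interior (Dirichlet) solve
    v3 = A_ai @ v2                   # step 4
    v4 = A_ab @ v_b                  # step 5
    return v4 - v3                   # step 6

rng = np.random.default_rng(2)
ni, nb = 5, 3
M = rng.standard_normal((ni + nb, ni + nb))
K = M @ M.T + (ni + nb) * np.eye(ni + nb)    # SPD stand-in for a subdomain matrix
A_ii, A_ib = K[:ni, :ni], K[:ni, ni:]
A_bi, A_bb = K[ni:, :ni], K[ni:, ni:]
v = rng.standard_normal(nb)

S = A_bb - A_bi @ np.linalg.solve(A_ii, A_ib)   # explicit Schur complement
Sv = dirichlet_apply(A_bb, A_bi, A_ii, A_ib, v)
print(np.allclose(Sv, S @ v))
```

In the actual method, the interior solve in step 3 would reuse a factorization of $A^s_{ii}$ across iterations.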
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

$$M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1, \qquad
M^{-1}_{cc} = \sum_{s=1}^{n_s} B_c^{sT} A^s_{cc} B^s_c.$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $F^s_r = D^s_r B^s_r r_\Gamma$
3. Local solve: $S^s_{rr} v^s_1 = F^s_r$
4. Compute: $d^s_c = S^s_{cr} v^s_1$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d^s_c$
6. Global solve: $F^{*}_{cc} Z_c = -d_c$
7. Scatter: $Z^s_c = B^s_c Z_c$
8. Update: $v^s_2 = F^s_r + S^s_{rc} Z^s_c$
9. Local solve: $S^s_{rr} Z^s_f = v^s_2$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D^s_r Z^s_f$
11. Output: $Z$
The local solves in steps 3 and 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form $S^s_{rr} U^s_r = r^s_r$, which can be solved using the following algorithm.
Algorithm 7: Neumann-Solver Procedure
1. Input: $r^s_r$
2. Solve: $\begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix}\begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix} = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix}$
3. Output: $U^s_r$
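The augmented solve in Algorithm 7 returns exactly the action of $[S^s_{rr}]^{-1}$, since block elimination of the interior unknowns reproduces the Schur complement. A small numerical check, using a random SPD stand-in for the subdomain matrix:

```python
import numpy as np

rng = np.random.default_rng(3)
ni, nr = 5, 3
M = rng.standard_normal((ni + nr, ni + nr))
K = M @ M.T + (ni + nr) * np.eye(ni + nr)   # SPD stand-in for [A_ii A_ir; A_ri A_rr]
A_ii, A_ir = K[:ni, :ni], K[:ni, ni:]
A_ri, A_rr = K[ni:, :ni], K[ni:, ni:]
r = rng.standard_normal(nr)

# Algorithm 7: one solve with the full subdomain matrix and zero interior load
sol = np.linalg.solve(K, np.concatenate([np.zeros(ni), r]))
U_r = sol[ni:]

# Eliminating the interior block gives S_rr = A_rr - A_ri A_ii^{-1} A_ir,
# so U_r above must equal [S_rr]^{-1} r.
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
print(np.allclose(U_r, np.linalg.solve(S_rr, r)))
```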
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

$$M^{-1}_{cc} F^{*}_{cc} Z_c = -M^{-1}_{cc} d_c, \qquad
M^{-1}_{cc} = \sum_{s=1}^{n_s} B_c^{sT} A^s_{cc} B^s_c.$$
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2 Dual-primal method

In this subsection, we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq.(29).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = \big(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\big)\,P_j,$$

can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input: $P$
2. Scatter: $P^s = B_r^{sT} P$
3. Local solve: $S^s_{rr} v^s_1 = P^s$
4. Compute: $v^s_2 = S^s_{cr} v^s_1$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v^s_2$
6. Global solve: $F_{cc} V_3 = V_2$
7. Scatter: $v^s_3 = B^s_c V_3$
8. Compute: $v^s_4 = S^s_{rc} v^s_3$
9. Update: $v^s_5 = P^s + v^s_4$
10. Local solve: $S^s_{rr} Q^s = v^s_5$
11. Gather: $Q = \sum_{s=1}^{n_s} B^s_r Q^s$
12. Output: $Q$
The local solves in steps 3 and 10 of Algorithm (8) are computed by solving a subdomain-level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r^s_\Gamma = D^s_r B_r^{sT} r_\Gamma$
3. Compute: $Z^s = S^s_{rr} r^s_\Gamma$
4. Gather: $Z = \sum_{s=1}^{n_s} B^s_r D^s_r Z^s$
5. Output: $Z$
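Algorithm 9 applies the Dirichlet preconditioner of Eq.(30) by a weighted scatter, a local Schur complement product, and a weighted gather. A toy sketch with invented jump operators and local blocks, cross-checked against the explicitly assembled operator:

```python
import numpy as np

def dirichlet_precond(r, B_r, D_r, S_rr):
    """Algorithm 9: action of the Dirichlet preconditioner of Eq.(30),
    M r = sum_s B^s_r D^s_r S^s_rr D^s_r B^{sT}_r r."""
    z = np.zeros_like(r)
    for Bs, Ds, Ss in zip(B_r, D_r, S_rr):
        rs = Ds @ (Bs.T @ r)    # step 2: weighted scatter to the subdomain
        zs = Ss @ rs            # step 3: local Schur complement product
        z += Bs @ (Ds @ zs)     # step 4: weighted gather
    return z

# Toy data (hypothetical): 2 Lagrange-multiplier dofs, 2 subdomains with 2
# remaining dofs each; B^s_r are signed Boolean jump operators.
rng = np.random.default_rng(5)
B_r = [np.array([[1., 0.], [0., 1.]]),
       np.array([[-1., 0.], [0., -1.]])]
D_r = [np.diag([0.5, 0.5]), np.diag([0.5, 0.5])]
S_rr = []
for _ in range(2):
    C = rng.standard_normal((2, 2))
    S_rr.append(C @ C.T + 2 * np.eye(2))   # SPD stand-ins for local Schur blocks

r = rng.standard_normal(2)
M = sum(Bs @ Ds @ Ss @ Ds @ Bs.T for Bs, Ds, Ss in zip(B_r, D_r, S_rr))
z = dirichlet_precond(r, B_r, D_r, S_rr)
print(np.allclose(z, M @ r))
```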
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as

$$\frac{\partial}{\partial x}\Big[c_x(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial x}\Big]
+ \frac{\partial}{\partial y}\Big[c_y(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial y}\Big]
= f(x,y) \quad \text{in } \Omega,$$

where the forcing term is $f(x,y) = 1.0$. For simplicity, a homogeneous Dirichlet boundary condition is imposed: $u(x,y,\theta) = 0$ on $\partial\Omega$.
The random coefficients $c_x(x,y,\theta)$ and $c_y(x,y,\theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
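As a quick sanity check of this input model, lognormal samples can be generated by exponentiating the underlying Gaussian; the moment formula used below is the standard lognormal mean, and the sample size is chosen arbitrarily:

```python
import numpy as np

rng = np.random.default_rng(4)
g = rng.normal(1.0, 0.25, size=100_000)   # underlying Gaussian: mean 1.0, std 0.25
c = np.exp(g)                             # lognormal coefficient samples

# For a lognormal with underlying parameters (mu, s): E[c] = exp(mu + s^2 / 2)
expected_mean = np.exp(1.0 + 0.25**2 / 2)
print(abs(c.mean() - expected_mean) < 0.05)   # True, up to sampling error
```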
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals is less than $10^{-5}$:

$$\frac{\|G_\Gamma - S\,U^k_\Gamma\|_2}{\|G^0_\Gamma\|_2} \leqslant 10^{-5}.$$
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features

Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8 A typical FEM mesh
Figure 9 Mesh partitioning using METIS
Figure 10 The mean of the solution field
Figure 11 The standard deviation of the solution field
Figure 12 Chaos coefficients $u_0$
Figure 13 Chaos coefficients $u_1$
Figure 14 Chaos coefficients $u_2$
Figure 15 Chaos coefficients $u_3$
Figure 16 Chaos coefficients $u_4$
Figure 17 Chaos coefficients $u_5$
Figure 18 Lagrange multipliers $\lambda_0$
Figure 19 Lagrange multipliers $\lambda_1$

10.2 Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size: the number of subdomains is fixed at 100 while both the mesh resolution in the spatial dimension and the Polynomial Chaos order are increased, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Likewise, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size $n$, a first-order Polynomial Chaos expansion leads to a total problem size of $3n$, while a third-order expansion leads to a total problem size of $10n$.
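The term counts behind these factors follow from the Polynomial Chaos dimension formula $N + 1 = \binom{m+p}{p}$ for $m$ random variables and order $p$. Assuming $m = 2$ (one underlying Gaussian per random coefficient, an inference from the $3n$ and $10n$ figures rather than a statement in the text):

```python
from math import comb

def pc_terms(m, p):
    """Number of terms N + 1 in a Polynomial Chaos expansion with
    m random variables and total polynomial order p."""
    return comb(m + p, p)

print(pc_terms(2, 1))   # 3  -> first order: total system size 3 n
print(pc_terms(2, 3))   # 10 -> third order: total system size 10 n
```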
Figure 20 Lagrange multipliers $\lambda_2$
Figure 21 Lagrange multipliers $\lambda_3$
Figure 22 Lagrange multipliers $\lambda_4$
Figure 23 Lagrange multipliers $\lambda_5$

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100)

Problem size | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
10051        | 10   10   10     | 8    8    8
20303        | 11   11   11     | 8    8    8
40811        | 11   12   12     | 8    9    9
59935        | 13   14   14     | 10   10   10
71386        | 12   12   12     | 9    9    9
80172        | 11   11   12     | 8    8    8
101851       | 12   12   12     | 9    9    9

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2) for first-, second-, and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. The dual-primal method requires slightly fewer iterations to converge than the primal method, which may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the convergence rates of the two methods are almost the same, as indicated in Figs.(24-26).
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof

CPUs | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
20   | 10   11   11     | 8    8    8
40   | 12   12   12     | 9    9    9
60   | 12   13   13     | 9    9    9
80   | 12   12   13     | 9    9    9
100  | 12   12   12     | 9    9    9
120  | 12   12   12     | 9    9    9
140  | 11   11   12     | 8    8    8
160  | 12   12   12     | 8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof)

Subdomains | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
100        | 10   10   10     | 8    8    8
200        | 10   10   11     | 8    8    8
400        | 12   13   13     | 9    9    9
600        | 11   12   12     | 8    8    9
800        | 12   13   13     | 9    9    9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first-, second-, and third-order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performances of the primal and dual-primal methods demonstrate a similar trend, which points out the similarity (duality) between the two methods through numerical experiments.
Figure 24 The relative PCGM residual history for the case of 160 subdomains and first PC order
Figure 25 The relative PCGM residual history for the case of 160 subdomains and second PC order
Figure 26 The relative PCGM residual history for the case of 160 subdomains and third PC order

11 Conclusion

Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
primal and dual-primal methods are numerically scalable with respect to the problem size, subdomain size, and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. It is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust, and thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100)

CoV  | PDDM 1st 2nd 3rd | DP-DDM 1st 2nd 3rd
0.05 | 10   10   10     | 8    8    8
0.10 | 10   10   10     | 8    8    8
0.15 | 10   10   10     | 8    8    8
0.20 | 10   10   10     | 8    8    8
0.25 | 10   10   10     | 8    8    8
0.30 | 10   10   11     | 8    8    8
0.35 | 10   10   11     | 8    8    8
0.40 | 10   11   11     | 8    8    9
0.45 | 10   11   12     | 8    8    9
0.50 | 10   11   12     | 8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
In parallel implementation both Bsr and Bs
c act as scatter operators while BsrT and Bs
cT act
as gather operators
The first block equation in Eq(9) can be solved for Usi in parallel as
Usi = [As
ii]minus1(Fs
i minusAsirB
srUr minusAs
icBscUc) (12)
Substituting Eq(12) into Eq(9) leads to the following condensed system which representsthe Schur compliment system in Eq(5) but now the interface boundary nodes are split into
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
9
remaining and corner nodes as shown schematically in Fig(3)
nssum
s=1
BsrTSs
rrBsr
nssum
s=1
BsrTSs
rcBsc
nssum
s=1
BscTSs
crBsr
nssum
s=1
BscTSs
ccBsc
Ur
Uc
=
nssum
s=1
BsrTGs
r
nssum
s=1
BscTGs
c
(13)
where
Ssαβ = As
αβ minusAsαi[A
sii]
minus1Asiβ
Gsα = Fs
α minusAsαi[A
sii]
minus1Fsi
Figure 3 The interface boundary nodes split into remainder () and corner(bull) nodes
The corner nodal vector Uc in Eq(13) is eliminated next to obtain the following (symmetricpositive definite) reduced interface problem
(Frr minus Frc[Fcc]minus1Fcr)Ur = dr minus Frc[Fcc]
minus1dc (14)
where
Fαβ =
nssum
s=1
Bsα
TSsαβB
sβ
dα =
nssum
s=1
Bsα
TGsα
and α and β denotes subscripts r and c The above system can be solved using PCGM withan appropriate preconditioner Mminus1 defined in the next section
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
10
61 A two-level preconditionerAs mentioned previously the continuity condition at the corner nodes is enforced strictly whilethose for the remaining interface boundary nodes is satisfied in a weak sense This fact isschematically illustrated in Fig(4) [12] As the iterations converge the continuity conditionat all interface nodes (both corner and remaining boundary nodes) is satisfied strictly Theassembly of the unknown vector at the corner nodes leads to the following partially assembledSchur complement system
Figure 4 Partial assembly of corner nodes (bull)
Ssrr Ss
rcBsc
nssum
s=1
BscTSs
crBsr
nssum
s=1
BscTSs
ccBsc
Usr
Uc
=
Fsr
0
(15)
where
Fsr = Ds
rBsrrj
and rj is the residual at the jth iteration of the PCGM and Dsr represents a block diagonal
weighting matrix which satisfies the following property
nssum
s=1
BsrTDs
rBsr = I
The diagonal entries of each block of Dsr are the reciprocal of the number of subdomains that
share the interface boundary nodes
The subdomain level remaining unknown vector Usr can be eliminated in parallel from Eq(15)
as
Usr = [Ss
rr]minus1(Fs
r minus SsrcB
scUc) (16)
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
11
Substituting Usr into the second block of Eq(15) leads to the following coarse problem
F lowast
ccUc = dlowastc (17)
where
F lowast
cc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
dlowastc = minus
nssum
s=1
BscTSs
cr[Ssrr]
minus1Fsr
The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as
Ur =
nssum
s=1
BsrTDs
rUsr
After some algebraic manipulations the preconditioner can be expressed as
Mminus1 =
nssum
s=1
BsrTDs
r[Ssrr]
minus1DsrB
sr + RT
0 [F lowast
cc]minus1R0 (18)
where
R0 =
nssum
s=1
BscTScr[S
srr]
minus1DsrB
sr
7 A dual-primal iterative substructuring method of SPDEs
In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes
Partial assembly of Eq(8) leads to the following subdomain equilibrium equation
Asii As
ir AsicB
sc
Asri As
rr AsrcB
sc
nssum
s=1
BscTAs
ci
nssum
s=1
BscTAs
cr
nssum
s=1
BscTAs
ccBsc
Usi
Usr
Uc
=
Fsi
Fsr
nssum
s=1
BscTFs
c
(19)
where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local
corner vector Usc as
Usc = Bs
cUc
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
12
Eq(19) can be rewritten in compact form as
AsUs = Fs (20)
Let Bsr be a block diagonal signed Boolean continuity matrix defined as
s=nssum
s=1
BsrU
sr = 0
Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as
1
2UT AU minus UT F rarr min (21)
subject to BU = 0
where
A =
A1
As
Ans
U =
U1
Us
Uns
F =
F1
Fs
Fns
B =[
(0 B1r 0) middot middot middot (0 Bs
r 0) middot middot middot (0 Bnsr 0)
]
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as
L(U Λ) =1
2UT AU minus UT F + UT BT Λ (22)
Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system
Asii As
ir AsicB
sc 0
Asri As
rr AsrcB
sc Bs
rT
nssum
s=1
BscTAs
ci
nssum
s=1
BscTAs
cr
nssum
s=1
BscTAs
ccBsc 0
0
nssum
s=1
Bsr 0 0
Usi
Usr
Uc
Λ
=
Fsi
Fsr
nssum
s=1
BscTFs
c
0
(23)
where
Λ =
λ0
λN
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
13
Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary
and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)
Eliminating the interior unknowns Usi from Eq(23) as
Usi = [As
ii]minus1(Fs
i minusAsirU
sr minusAs
icBscUc) (24)
Substituting Eq(24) into Eq(23) leads to
Ssrr Ss
rcBsc Bs
rT
nssum
s=1
BscTSs
cr
nssum
s=1
BscTSs
ccBsc 0
nssum
s=1
Bsr 0 0
Usr
Uc
Λ
=
Gsr
nssum
s=1
BscTGs
c
0
(25)
where
Ssαβ = As
αβ minusAsαi[A
sii]
minus1Asiβ
Gsα = Fs
α minusAsαi[A
sii]
minus1Fsi
The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)
as
Usr = [Ss
rr]minus1(Gs
r minus SsrcB
scUc minus Bs
rT Λ) (26)
Substituting Eq(26) into Eq(25) leads to
[
Fcc minusFcr
Frc Frr
]
Uc
Λ
=
dc
dr
(27)
where
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
14
Fcc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
Fcr =
nssum
s=1
BscTSs
cr[Ssrr]
minus1BsrT
Frc =
nssum
s=1
Bsr [S
srr]
minus1SsrcB
sc
Frr =
nssum
s=1
Bsr [S
srr]
minus1BsrT
dc =
nssum
s=1
BscT (Gs
c minus Sscr[S
srr]
minus1Gsr)
dr =
nssum
s=1
Bsr [S
srr]
minus1Gsr
Solving for Uc from Eq(27) gives the following coarse problem
FccUc = (dc + FcrΛ) (28)
Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system
(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]
minus1dc (29)
Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as
M =
nssum
s=1
BsrD
srS
srrD
srB
srT (30)
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as
F lowast
cc = Fcc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
Furthermore the algebraic form of the primal preconditioner in Eq(18) can be re-casted as
Mminus1 =
nssum
s=1
BsrTDs
r[Ssrr]
minus1DsrB
sr
+
nssum
s=1
BsrTDs
r[Ssrr]
minus1SrcBsc
[
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
]minus1
nssum
s=1
BscTScr[S
srr]
minus1DsrB
sr
which has the same form of the dual-primal operator in Eq(29)
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
15
9 Parallel implementation
In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem
91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)
For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as
Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj
can be computed using the following algorithm
Algorithm 4 Parallel Matrix-Vector Product for Primal Method
1 Input (P)
2 Scatter Ps = BsrP
3 Compute vs1 = Ss
crPs
4 Gather V1 =
nssum
s=1
BsrT vs
1
5 Global Solve FccV2 = V1
6 Scatter vs2 = Bs
cV2
7 Compute vs3 = Ss
rcvs2
8 Update Qs = SsrrP
s minus vs3
9 Gather Q =
nssum
s=1
BsrTQs
10 Output (Q)
Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as
vsα = Ss
αβvsβ
vsα = (As
αβ minusAsαi[A
sii]
minus1Asiβ)vs
β
This procedure is outlined in the following algorithm
Algorithm 5: Dirichlet-Solver Procedure
1. Input ($v_\beta^s$)
2. Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3. Solve: $A_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output ($v_\alpha^s$)
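A minimal dense sketch of Algorithm 5, assuming the subdomain matrix is available as a single array indexed by hypothetical interior and interface index lists:

```python
import numpy as np

def apply_schur(A, idx_i, idx_a, idx_b, v_b):
    """Apply S_ab = A_ab - A_ai inv(A_ii) A_ib to v_b (Algorithm 5 sketch)."""
    v1 = A[np.ix_(idx_i, idx_b)] @ v_b                  # step 2
    v2 = np.linalg.solve(A[np.ix_(idx_i, idx_i)], v1)   # step 3: interior (Dirichlet) solve
    v3 = A[np.ix_(idx_a, idx_i)] @ v2                   # step 4
    v4 = A[np.ix_(idx_a, idx_b)] @ v_b                  # step 5
    return v4 - v3                                      # step 6
```

The point of the procedure is that the Schur complement is never formed; only one interior solve per application is needed.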
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as
$$M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1, \qquad \text{where} \quad M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $\mathcal{F}_r^s = D_r^s B_r^s r_\Gamma$
3. Local Solve: $S_{rr}^s v_1^s = \mathcal{F}_r^s$
4. Compute: $d_c^s = S_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6. Global Solve: $F_{cc}^* Z_c = -d_c$
7. Scatter: $Z_c^s = B_c^s Z_c$
8. Update: $v_2^s = \mathcal{F}_r^s - S_{rc}^s Z_c^s$
9. Local Solve: $S_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11. Output ($Z$)
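A dense single-process sketch of Algorithm 6; the per-subdomain dictionaries and the `solve_coarse` callback are hypothetical stand-ins, and the sign of the corner correction is taken consistent with Eqs.(16)-(17).

```python
import numpy as np

def two_level_precond(r_gamma, subs, solve_coarse):
    """One application of the two-level preconditioner (Algorithm 6 sketch)."""
    Fr_list, dc = [], 0.0
    for s in subs:
        Fr = s['Dr'] @ (s['Br'] @ r_gamma)        # step 2: weighted scatter
        v1 = np.linalg.solve(s['Srr'], Fr)        # step 3: local Neumann solve
        dc = dc + s['Bc'].T @ (s['Scr'] @ v1)     # steps 4-5: gather coarse residual
        Fr_list.append(Fr)
    Zc = solve_coarse(-dc)                        # step 6: coarse solve F*_cc Zc = -dc
    Z = 0.0
    for s, Fr in zip(subs, Fr_list):
        v2 = Fr - s['Src'] @ (s['Bc'] @ Zc)       # steps 7-8: corner correction
        Zf = np.linalg.solve(s['Srr'], v2)        # step 9: local Neumann solve
        Z = Z + s['Br'].T @ (s['Dr'] @ Zf)        # step 10: weighted gather
    return Z
```

With one subdomain and identity restrictions and weights, the output agrees with the closed form of Eq.(18), $M^{-1} = S_{rr}^{-1} + S_{rr}^{-1}S_{rc}[F_{cc}^*]^{-1}S_{cr}S_{rr}^{-1}$.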
The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain-level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7: Neumann-Solver Procedure
1. Input ($r_r^s$)
2. Solve: $\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}$
3. Output ($U_r^s$)
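A dense sketch of Algorithm 7: block elimination of the assembled system gives exactly $S_{rr}^s U_r^s = r_r^s$, so the solve can be checked against the Schur complement.

```python
import numpy as np

def neumann_solve(Aii, Air, Ari, Arr, r_r):
    """Solve S_rr U_r = r_r via the block system of Algorithm 7 (dense sketch)."""
    n_i = Aii.shape[0]
    K = np.block([[Aii, Air],
                  [Ari, Arr]])
    rhs = np.concatenate([np.zeros(n_i), r_r])   # zero interior load
    sol = np.linalg.solve(K, rhs)
    return sol[n_i:]                             # U_r; sol[:n_i] is the auxiliary X^s
```

The first block row forces $X^s = -[A_{ii}^s]^{-1}A_{ir}^s U_r^s$, and substituting into the second row recovers the Schur complement equation.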
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
17
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as
$$M_{cc}^{-1} F_{cc}^* Z_c = -M_{cc}^{-1} d_c, \qquad \text{where} \quad M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$$
Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flowchart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2. Dual-primal method
In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq.(29).
For the $j$th iteration of Algorithm (1), the matrix-vector product in step 7, defined as
$$Q_j = (F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,P_j,$$
can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input ($P$)
2. Scatter: $P^s = \mathcal{B}_r^{sT} P$
3. Local Solve: $S_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = S_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6. Global Solve: $F_{cc} V_3 = V_2$
7. Scatter: $v_3^s = B_c^s V_3$
8. Compute: $v_4^s = S_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local Solve: $S_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} \mathcal{B}_r^s Q^s$
12. Output ($Q$)
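Algorithm 8 admits the same kind of dense sketch; here `Br` plays the role of the signed jump operator $\mathcal{B}_r^s$, and all names and the `solve_Fcc` callback are hypothetical stand-ins rather than the paper's parallel implementation.

```python
import numpy as np

def dp_matvec(P, subs, solve_Fcc):
    """Q = (F_rr + F_rc inv(F_cc) F_cr) P  (Algorithm 8 sketch)."""
    V2 = 0.0
    scattered = []
    for s in subs:
        Ps = s['Br'].T @ P                        # step 2: scatter multipliers
        v1 = np.linalg.solve(s['Srr'], Ps)        # step 3: local Neumann solve
        V2 = V2 + s['Bc'].T @ (s['Scr'] @ v1)     # steps 4-5
        scattered.append(Ps)
    V3 = solve_Fcc(V2)                            # step 6: global coarse solve
    Q = 0.0
    for s, Ps in zip(subs, scattered):
        v4 = s['Src'] @ (s['Bc'] @ V3)            # steps 7-8
        Qs = np.linalg.solve(s['Srr'], Ps + v4)   # steps 9-10
        Q = Q + s['Br'] @ Qs                      # step 11: gather
    return Q
```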
The local solves in step 3 and step 10 of Algorithm (8) are computed by solving a subdomain-level Neumann problem as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r_\Gamma^s = D_r^s \mathcal{B}_r^{sT} r_\Gamma$
3. Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} \mathcal{B}_r^s D_r^s Z^s$
5. Output ($Z$)
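Algorithm 9 applies the Dirichlet preconditioner of Eq.(30), $M = \sum_s \mathcal{B}_r^s D_r^s S_{rr}^s D_r^s \mathcal{B}_r^{sT}$; a dense sketch with hypothetical per-subdomain arrays:

```python
import numpy as np

def dirichlet_precond(r, subs):
    """Z = M r with M = sum_s Br Dr Srr Dr Br^T  (Algorithm 9 / Eq.(30) sketch)."""
    Z = 0.0
    for s in subs:
        rs = s['Dr'] @ (s['Br'].T @ r)     # step 2: weighted scatter
        Zs = s['Srr'] @ rs                 # step 3: Schur complement multiply
        Z = Z + s['Br'] @ (s['Dr'] @ Zs)   # step 4: weighted gather
    return Z
```

Each Schur complement multiply is realized by a subdomain Dirichlet solve (Algorithm 5), which is where the preconditioner gets its name.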
We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flowchart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10. Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as
$$\frac{\partial}{\partial x}\left[c_x(x,y,\theta)\frac{\partial u(x,y,\theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x,y,\theta)\frac{\partial u(x,y,\theta)}{\partial y}\right] = f(x,y) \quad \text{in } \Omega,$$
where the forcing term is $f(x,y) = 1.0$. For simplicity, a homogeneous Dirichlet boundary condition is imposed:
$$u(x,y,\theta) = 0 \quad \text{on } \partial\Omega.$$
The random coefficients $c_x(x,y,\theta)$ and $c_y(x,y,\theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals is less than $10^{-5}$:
$$\frac{\| G_\Gamma^k - S U_\Gamma^k \|_2}{\| G_\Gamma^0 \|_2} \leqslant 10^{-5}.$$
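The stopping rule above slots into a generic preconditioned conjugate gradient loop; a minimal dense sketch (not the authors' MPI/PETSc implementation), starting from $x_0 = 0$ so that the initial residual equals the right-hand side:

```python
import numpy as np

def pcg(matvec, precond, b, tol=1e-5, maxit=200):
    """Preconditioned CG with the relative-residual stopping rule
    ||r_k||_2 / ||r_0||_2 <= tol, where r_0 = b (x_0 = 0)."""
    x = np.zeros_like(b)
    r = b.copy()                      # initial residual is the forcing term
    r0_norm = np.linalg.norm(r)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for _ in range(maxit):
        Ap = matvec(p)
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) / r0_norm <= tol:   # paper's termination test
            break
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return x
```

In the paper's setting, `matvec` would be Algorithm 4 or 8 and `precond` Algorithm 6 or 9.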
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1. Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig.(8) shows a typical finite element mesh, while Fig.(9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig.(10) and Fig.(11), respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs.(12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs.(18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u0. Figure 13. Chaos coefficients u1.
10.2. Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size, where we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using a first-order Polynomial Chaos expansion leads to a total problem size of (3 × n), and using a third-order Polynomial Chaos expansion leads to
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM (1st/2nd/3rd)    DP-DDM (1st/2nd/3rd)
10051           10/10/10              8/8/8
20303           11/11/11              8/8/8
40811           11/12/12              8/9/9
59935           13/14/14              10/10/10
71386           12/12/12              9/9/9
80172           11/11/12              8/8/8
101851          12/12/12              9/9/9

Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2), reported for first-, second- and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs.(24-26).
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs    PDDM (1st/2nd/3rd)    DP-DDM (1st/2nd/3rd)
20      10/11/11              8/8/8
40      12/12/12              9/9/9
60      12/13/13              9/9/9
80      12/12/13              9/9/9
100     12/12/12              9/9/9
120     12/12/12              9/9/9
140     11/11/12              8/8/8
160     12/12/12              8/8/9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains    PDDM (1st/2nd/3rd)    DP-DDM (1st/2nd/3rd)
100           10/10/10              8/8/8
200           10/10/11              8/8/8
400           12/13/13              9/9/9
600           11/12/12              8/8/9
800           12/13/13              9/9/9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first-, second- and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the performances of the primal and the dual-primal methods demonstrate a similar trend; this fact points out the similarity (duality) between the two methods through numerical experiments.
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order. Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order. Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

11. Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal and dual-primal methods are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables are required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

CoV     PDDM (1st/2nd/3rd)    DP-DDM (1st/2nd/3rd)
0.05    10/10/10              8/8/8
0.10    10/10/10              8/8/8
0.15    10/10/10              8/8/8
0.20    10/10/10              8/8/8
0.25    10/10/10              8/8/8
0.30    10/10/11              8/8/8
0.35    10/10/11              8/8/8
0.40    10/11/11              8/8/9
0.45    10/11/12              8/8/9
0.50    10/11/12              8/8/9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66 250-271
remaining and corner nodes, as shown schematically in Fig.(3):
$$\begin{bmatrix} \displaystyle\sum_{s=1}^{n_s} B_r^{sT} S_{rr}^s B_r^s & \displaystyle\sum_{s=1}^{n_s} B_r^{sT} S_{rc}^s B_c^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_r \\ U_c \end{Bmatrix} = \begin{Bmatrix} \displaystyle\sum_{s=1}^{n_s} B_r^{sT} G_r^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} G_c^s \end{Bmatrix} \qquad (13)$$

where
$$S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s, \qquad G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s.$$
Figure 3. The interface boundary nodes split into remainder and corner (•) nodes.
The corner nodal vector $U_c$ in Eq.(13) is eliminated next to obtain the following (symmetric positive definite) reduced interface problem:
$$(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr})\,U_r = d_r - F_{rc}[F_{cc}]^{-1}d_c \qquad (14)$$

where
$$F_{\alpha\beta} = \sum_{s=1}^{n_s} B_\alpha^{sT} S_{\alpha\beta}^s B_\beta^s, \qquad d_\alpha = \sum_{s=1}^{n_s} B_\alpha^{sT} G_\alpha^s,$$
and $\alpha$ and $\beta$ denote the subscripts $r$ and $c$. The above system can be solved using PCGM with an appropriate preconditioner $M^{-1}$, defined in the next section.
6.1. A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while that for the remaining interface boundary nodes is satisfied in a weak sense. This fact is schematically illustrated in Fig.(4) [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system:
Figure 4 Partial assembly of corner nodes (bull)
$$\begin{bmatrix} S_{rr}^s & S_{rc}^s B_c^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s B_r^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_r^s \\ U_c \end{Bmatrix} = \begin{Bmatrix} \mathcal{F}_r^s \\ 0 \end{Bmatrix} \qquad (15)$$

where
$$\mathcal{F}_r^s = D_r^s B_r^s r_j,$$
and $r_j$ is the residual at the $j$th iteration of the PCGM; $D_r^s$ represents a block-diagonal weighting matrix which satisfies the property
$$\sum_{s=1}^{n_s} B_r^{sT} D_r^s B_r^s = I.$$
The diagonal entries of each block of $D_r^s$ are the reciprocal of the number of subdomains that share the interface boundary nodes.
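The partition-of-unity property of the weighting matrices can be illustrated on a hypothetical two-subdomain interface with one shared node (node 1 is shared, so its weight is 1/2); all arrays below are illustrative stand-ins:

```python
import numpy as np

# Subdomain 1 owns interface nodes [0, 1]; subdomain 2 owns nodes [1, 2].
B1 = np.array([[1., 0., 0.],
               [0., 1., 0.]])       # Boolean restriction to subdomain 1
B2 = np.array([[0., 1., 0.],
               [0., 0., 1.]])       # Boolean restriction to subdomain 2
D1 = np.diag([1.0, 0.5])            # reciprocal of the sharing counts
D2 = np.diag([0.5, 1.0])

# Check: sum_s Br^T Dr Br = I on the assembled interface
assert np.allclose(B1.T @ D1 @ B1 + B2.T @ D2 @ B2, np.eye(3))
```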
The subdomain-level remaining unknown vector $U_r^s$ can be eliminated in parallel from Eq.(15) as
$$U_r^s = [S_{rr}^s]^{-1}\big(\mathcal{F}_r^s - S_{rc}^s B_c^s U_c\big). \qquad (16)$$
Substituting $U_r^s$ into the second block of Eq.(15) leads to the following coarse problem:
$$F_{cc}^* U_c = d_c^* \qquad (17)$$

where
$$F_{cc}^* = \sum_{s=1}^{n_s} B_c^{sT}\big(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\big)B_c^s, \qquad d_c^* = -\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} \mathcal{F}_r^s.$$
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as
$$U_r = \sum_{s=1}^{n_s} B_r^{sT} D_r^s U_r^s.$$
After some algebraic manipulations, the preconditioner can be expressed as
$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + R_0^T [F_{cc}^*]^{-1} R_0 \qquad (18)$$

where
$$R_0 = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s.$$
7. A dual-primal iterative substructuring method of SPDEs
In this section, the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach, the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq.(8) leads to the following subdomain equilibrium equation:
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s \\ A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ F_r^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} F_c^s \end{Bmatrix} \qquad (19)$$

where $B_c^s$ is a Boolean restriction operator that maps the global corner vector $U_c$ to the local corner vector $U_c^s$ as
$$U_c^s = B_c^s U_c.$$
Eq.(19) can be rewritten in compact form as
$$\mathcal{A}^s\,\mathcal{U}^s = \mathcal{F}^s. \qquad (20)$$

Let $\mathcal{B}_r^s$ be a block-diagonal signed Boolean continuity matrix defined such that
$$\sum_{s=1}^{n_s} \mathcal{B}_r^s U_r^s = 0.$$

Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:
$$\tfrac{1}{2}\,\mathcal{U}^T \mathcal{A}\,\mathcal{U} - \mathcal{U}^T \mathcal{F} \rightarrow \min \qquad (21)$$
$$\text{subject to } \mathcal{B}\,\mathcal{U} = 0,$$
where
$$\mathcal{A} = \begin{bmatrix} \mathcal{A}^1 & & \\ & \ddots & \\ & & \mathcal{A}^{n_s} \end{bmatrix}, \qquad \mathcal{U} = \begin{Bmatrix} \mathcal{U}^1 \\ \vdots \\ \mathcal{U}^{n_s} \end{Bmatrix}, \qquad \mathcal{F} = \begin{Bmatrix} \mathcal{F}^1 \\ \vdots \\ \mathcal{F}^{n_s} \end{Bmatrix},$$
$$\mathcal{B} = \big[\,(0 \;\; \mathcal{B}_r^1 \;\; 0) \;\cdots\; (0 \;\; \mathcal{B}_r^s \;\; 0) \;\cdots\; (0 \;\; \mathcal{B}_r^{n_s} \;\; 0)\,\big].$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle-point formulation of Eq.(21) can be expressed as
$$\mathcal{L}(\mathcal{U},\Lambda) = \tfrac{1}{2}\,\mathcal{U}^T \mathcal{A}\,\mathcal{U} - \mathcal{U}^T \mathcal{F} + \mathcal{U}^T \mathcal{B}^T \Lambda. \qquad (22)$$

Minimizing Eq.(22) with respect to $\mathcal{U}$ and $\Lambda$ leads to the following equilibrium system:
$$\begin{bmatrix} A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\ A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & \mathcal{B}_r^{sT} \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\ 0 & \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} F_i^s \\ F_r^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0 \end{Bmatrix} \qquad (23)$$

where
$$\Lambda = \{\lambda_0, \ldots, \lambda_N\}^T,$$
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
and $\lambda_j$ is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig.(5).

Eliminating the interior unknowns $U_i^s$ from Eq.(23) gives
$$U_i^s = [A_{ii}^s]^{-1}\big(F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c\big). \qquad (24)$$
Substituting Eq.(24) into Eq.(23) leads to
$$\begin{bmatrix} S_{rr}^s & S_{rc}^s B_c^s & \mathcal{B}_r^{sT} \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \displaystyle\sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\ \displaystyle\sum_{s=1}^{n_s} \mathcal{B}_r^s & 0 & 0 \end{bmatrix} \begin{Bmatrix} U_r^s \\ U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} G_r^s \\ \displaystyle\sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0 \end{Bmatrix} \qquad (25)$$

where
$$S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s, \qquad G_\alpha^s = F_\alpha^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s.$$
The subdomain-level remaining unknown vector $U_r^s$ can be obtained in parallel from Eq.(25) as
$$U_r^s = [S_{rr}^s]^{-1}\big(G_r^s - S_{rc}^s B_c^s U_c - \mathcal{B}_r^{sT}\Lambda\big). \qquad (26)$$
Substituting Eq.(26) into Eq.(25) leads to
$$\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix} \begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix} = \begin{Bmatrix} d_c \\ d_r \end{Bmatrix} \qquad (27)$$
where
$$F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\big(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\big)B_c^s, \qquad F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} \mathcal{B}_r^{sT},$$
$$F_{rc} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s, \qquad F_{rr} = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} \mathcal{B}_r^{sT},$$
$$d_c = \sum_{s=1}^{n_s} B_c^{sT}\big(G_c^s - S_{cr}^s[S_{rr}^s]^{-1}G_r^s\big), \qquad d_r = \sum_{s=1}^{n_s} \mathcal{B}_r^s [S_{rr}^s]^{-1} G_r^s.$$
Solving for $U_c$ from Eq.(27) gives the following coarse problem:
$$F_{cc} U_c = (d_c + F_{cr}\Lambda). \qquad (28)$$

Substituting $U_c$ into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system:
$$(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr})\,\Lambda = d_r - F_{rc}[F_{cc}]^{-1}d_c. \qquad (29)$$
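The elimination in Eqs.(27)-(29) can be checked numerically on a dense single-subdomain stand-in (with $B_c = I$ and a hypothetical jump matrix `Br`); the recovered $(U_c, \Lambda)$ must agree with a direct solve of the block system of Eq.(27):

```python
import numpy as np

rng = np.random.default_rng(1)
n_r, n_c = 4, 2
S = rng.standard_normal((n_r + n_c, n_r + n_c))
S = S @ S.T + 6 * np.eye(n_r + n_c)              # SPD Schur-complement stand-in
Srr, Src = S[:n_r, :n_r], S[:n_r, n_r:]
Scr, Scc = S[n_r:, :n_r], S[n_r:, n_r:]
Br = rng.standard_normal((2, n_r))               # hypothetical signed jump operator
Gr = rng.standard_normal(n_r)
Gc = rng.standard_normal(n_c)

Si = np.linalg.inv(Srr)
Fcc = Scc - Scr @ Si @ Src
Fcr = Scr @ Si @ Br.T
Frc = Br @ Si @ Src
Frr = Br @ Si @ Br.T
dc = Gc - Scr @ Si @ Gr
dr = Br @ Si @ Gr

# Eq.(29): Lagrange-multiplier system
Lam = np.linalg.solve(Frr + Frc @ np.linalg.inv(Fcc) @ Fcr,
                      dr - Frc @ np.linalg.solve(Fcc, dc))
# Eq.(28): coarse problem for the corner unknowns
Uc = np.linalg.solve(Fcc, dc + Fcr @ Lam)
```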
Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as
$$M = \sum_{s=1}^{n_s} \mathcal{B}_r^s D_r^s S_{rr}^s D_r^s \mathcal{B}_r^{sT}. \qquad (30)$$
8. Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are the same and can be expressed as
$$F_{cc}^* = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT}\big(S_{cc}^s - S_{cr}^s[S_{rr}^s]^{-1}S_{rc}^s\big)B_c^s.$$
Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as
$$M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s + \left[\sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s\right] \left[\sum_{s=1}^{n_s} B_c^{sT}\big(S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s\big) B_c^s\right]^{-1} \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s,$$

which has the same form as the dual-primal operator in Eq.(29).
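The recast form relies on the identity $R_0^T = \sum_s B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s$ for symmetric Schur blocks ($S_{rc}^s = S_{cr}^{sT}$). A dense single-subdomain check, with $B_r = B_c = D_r = I$ as a simplifying assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
n_r, n_c = 4, 2
S = rng.standard_normal((n_r + n_c, n_r + n_c))
S = S @ S.T + (n_r + n_c) * np.eye(n_r + n_c)    # symmetric SPD Schur stand-in
Srr, Src = S[:n_r, :n_r], S[:n_r, n_r:]
Scr, Scc = S[n_r:, :n_r], S[n_r:, n_r:]

Si = np.linalg.inv(Srr)
Fstar = Scc - Scr @ Si @ Src                     # coarse operator F*_cc
R0 = Scr @ Si                                    # R_0 with D_r = B_r = B_c = I
M1 = Si + R0.T @ np.linalg.inv(Fstar) @ R0       # Eq.(18)
M2 = Si + (Si @ Src) @ np.linalg.inv(Fstar) @ (Scr @ Si)   # recast form above

# Both expressions of the preconditioner coincide
assert np.allclose(M1, M2)
```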
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
15
9 Parallel implementation
In this section we give an outline on parallel implementation of PCGM to solve the primalEq(14) and dual-primal Eq(29) interface problems As mentioned previously in PCGM thecoefficient matrix need not be constructed explicitly as only its effect on a vector is requiredThis matrix-vector product can be obtained concurrently by solving subdomain level problems(Dirichlet and Neumann) and a global level coarse problem
91 Primal methodIn this subsection we give a brief description of parallel implementation of Algorithm (1) to solvethe primal interface problem in Eq(14)
For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as
Qj = (Frr minus Frc[Fcc]minus1Fcr)Pj
can be computed using the following algorithm
Algorithm 4 Parallel Matrix-Vector Product for Primal Method
1 Input (P)
2 Scatter Ps = BsrP
3 Compute vs1 = Ss
crPs
4 Gather V1 =
nssum
s=1
BsrT vs
1
5 Global Solve FccV2 = V1
6 Scatter vs2 = Bs
cV2
7 Compute vs3 = Ss
rcvs2
8 Update Qs = SsrrP
s minus vs3
9 Gather Q =
nssum
s=1
BsrTQs
10 Output (Q)
Multiplication of Schur complement matrix by a vector in step 3 step 7 and step 8 inAlgorithm (4) is computed by solving a corresponding Dirichlet problem as
vsα = Ss
αβvsβ
vsα = (As
αβ minusAsαi[A
sii]
minus1Asiβ)vs
β
This procedure is outlined in the following algorithm
Algorithm 5 Dirichlet Solver Procedure
1 Input (vsβ)
2 Compute vs1 = As
iβvsβ
3 Solve Asiiv
s2 = vs
1
4 Compute vs3 = As
αivs2
5 Compute vs4 = As
αβvsβ
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
16
6 Compute vsα = vs
4 minus vs3
7 Output (vsα)
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equippedwith lumped preconditioner as
Mminus1cc FccV2 = Mminus1
cc V1
where
Mminus1cc =
nssum
s=1
BscTAs
ccBsc
Next the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed bysolving a subdomain level Neumann problem and a global coarse problem The procedure isoutlined in the following algorithm
Algorithm 6 Two-Level Preconditioner Effect Procedure
1 Input (rΓ )
2 Scatter Fsr = Ds
rBsrrΓ
3 Local Solve Ssrrv
s1 = Fs
r
4 Compute dsc = Ss
crvs1
5 Gather dc =
nssum
s=1
BscT ds
c
6 Global Solve F lowastccZc = minusdc
7 Scatter Zsc = BsZc
8 Update vs2 = Fs
r + SsrcZ
sc
9 Local Solve SsrrZ
sf = vs
2
10 Gather Z =
nssum
s=1
BsrTDs
rZsf
11 Output (Z)
The local solve in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumannproblem of the form Ss
rrUsr = rs
r which can be solved using the following algorithm
Algorithm 7 Neumann-Solver Procedure
1 Input (rsr)
2 Solve[
Asii As
ir
Asri As
rr
]
X s
Usr
=
0rsr
3 Output (Usr )
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
17
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallelusing PCGM equipped with lumped preconditioner as
Mminus1cc F lowast
ccZc = minusMminus1cc dc
where
Mminus1cc =
nssum
s=1
BscTAs
ccBsc
Finally we summarize the parallel implementation of the PCGM to solve the primal interfaceproblem in the following flow chart
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
92 Dual-primal methodIn this subsection we outline the parallel implementation of the Algorithm (1) to solve thedual-primal interface problem Eq(29)
For the jth iteration of Algorithm (1) the matrix-vector product in step 7 defined as
Qj = (Frr + Frc[Fcc]minus1Fcr)Pj
can be computed using the following algorithm
Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method
1 Input (P)
2 Scatter Ps = BsrTP
3 Local Solve Ssrrv
s1 = Ps
4 Compute vs2 = Ss
crvs1
5 Gather V2 =
nssum
s=1
BscT vs
2
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
18
6 Global Solve FccV3 = V2
7 Scatter vs3 = Bs
cV3
8 Compute vs4 = Ss
rcvs3
9 Update vs5 = Ps + vs
4
10 Local Solve SsrrQ
s = vs5
11 Gather Q =
nssum
s=1
BsrQ
s
12 Output (Q)
The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner
Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1 Input (rΓ )
2 Scatter rsΓ
= DsrB
srT rΓ
3 Compute Zs = Ssrrr
sΓ
4 Gather Z =
nssum
s=1
BsrD
srZ
s
5 Output (Z)
We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
19
10 Numerical results
For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as
part
partx[cx(x y θ)
partu(x y θ)
partx] +
part
party[cy(x y θ)
partu(x y θ)
party] = f(x y) in Ω
where the forcing term is
f(x y) = 10
For simplicity a homogeneous Dirichlet boundary condition is imposed as
u(x y θ) = 0 on partΩ
The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025
In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5
GkΓminus SUk
Γ2
G0Γ2
6 10minus5
Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh
101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
20
Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS
Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed
Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1
102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
21
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
22
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs   PDDM              DP-DDM
       1st  2nd  3rd     1st  2nd  3rd
 20    10   11   11      8    8    8
 40    12   12   12      9    9    9
 60    12   13   13      9    9    9
 80    12   12   13      9    9    9
100    12   12   12      9    9    9
120    12   12   12      9    9    9
140    11   11   12      8    8    8
160    12   12   12      8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains   PDDM              DP-DDM
             1st  2nd  3rd     1st  2nd  3rd
100          10   10   10      8    8    8
200          10   10   11      8    8    8
400          12   13   13      9    9    9
600          11   12   12      8    8    9
800          12   13   13      9    9    9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for the first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the primal and the dual-primal methods demonstrate similar performance trends, a fact that points out the similarity (duality) between the two methods through numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101851 dofs) and a fixed number of subdomains (100).

CoV    PDDM              DP-DDM
       1st  2nd  3rd     1st  2nd  3rd
0.05   10   10   10      8    8    8
0.10   10   10   10      8    8    8
0.15   10   10   10      8    8    8
0.20   10   10   10      8    8    8
0.25   10   10   10      8    8    8
0.30   10   10   11      8    8    8
0.35   10   10   11      8    8    8
0.40   10   11   11      8    8    9
0.45   10   11   12      8    8    9
0.50   10   11   12      8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250–271
6.1 A two-level preconditioner
As mentioned previously, the continuity condition at the corner nodes is enforced strictly, while the conditions at the remaining interface boundary nodes are satisfied in a weak sense. This fact is schematically illustrated in Fig. 4 [12]. As the iterations converge, the continuity condition at all interface nodes (both corner and remaining boundary nodes) is satisfied strictly. The assembly of the unknown vector at the corner nodes leads to the following partially assembled Schur complement system
Figure 4. Partial assembly of the corner nodes (•).
\[
\begin{bmatrix}
S^s_{rr} & S^s_{rc}B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc}B^s_c
\end{bmatrix}
\begin{Bmatrix} U^s_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F^s_r \\ 0 \end{Bmatrix},
\qquad (15)
\]
where
\[ F^s_r = D^s_r B^s_r r_j, \]
and r_j is the residual at the jth iteration of the PCGM, and D^s_r represents a block diagonal weighting matrix which satisfies the following property:
\[ \sum_{s=1}^{n_s} B^{sT}_r D^s_r B^s_r = I. \]
The diagonal entries of each block of D^s_r are the reciprocal of the number of subdomains that share the corresponding interface boundary node.
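The partition-of-unity property of the weighting matrices is easy to check numerically. The following sketch is not from the paper; the three-node interface, the Boolean restriction operators and the node multiplicities are invented for illustration (two subdomains sharing one interface node):

```python
import numpy as np

# Hypothetical global interface with 3 remaining nodes; subdomain 1 holds
# nodes {0, 1}, subdomain 2 holds nodes {1, 2}; node 1 is shared by both.
B1 = np.array([[1, 0, 0],
               [0, 1, 0]], dtype=float)   # Boolean restriction B_r^1
B2 = np.array([[0, 1, 0],
               [0, 0, 1]], dtype=float)   # Boolean restriction B_r^2

# D_r^s: diagonal entries are reciprocals of the number of subdomains
# sharing each local node (the shared node gets weight 1/2).
D1 = np.diag([1.0, 0.5])
D2 = np.diag([0.5, 1.0])

# Partition-of-unity property: sum_s B_r^{sT} D_r^s B_r^s = I
P = B1.T @ D1 @ B1 + B2.T @ D2 @ B2
print(np.allclose(P, np.eye(3)))  # True
```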
The subdomain level remaining unknown vector U^s_r can be eliminated in parallel from Eq. (15) as
\[ U^s_r = [S^s_{rr}]^{-1}\big(F^s_r - S^s_{rc}B^s_c U_c\big). \qquad (16) \]
Substituting U^s_r into the second block of Eq. (15) leads to the following coarse problem
\[ F^{*}_{cc} U_c = d^{*}_c, \qquad (17) \]
where
\[ F^{*}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c, \qquad
d^{*}_c = -\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}F^s_r. \]
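As a toy illustration of this coarse-problem assembly (not the paper's implementation; the subdomain count, block sizes, corner maps and random SPD stand-ins for the subdomain Schur complements are all invented), one can assemble the coarse operator from local corner Schur complements and confirm it is symmetric positive definite:

```python
import numpy as np

rng = np.random.default_rng(0)

def random_spd(n):
    # Random symmetric positive definite matrix (stand-in for S^s)
    A = rng.standard_normal((n, n))
    return A @ A.T + n * np.eye(n)

# Hypothetical sizes: 4 remaining dofs and 2 corner dofs per subdomain,
# 3 global corner dofs; Bc[s] maps global corners to local corners.
nr, nc_local, nc = 4, 2, 3
Bc = [np.array([[1., 0., 0.], [0., 1., 0.]]),
      np.array([[0., 1., 0.], [0., 0., 1.]])]

Fcc = np.zeros((nc, nc))
for s in range(2):
    S = random_spd(nr + nc_local)            # subdomain Schur complement
    Srr, Src = S[:nr, :nr], S[:nr, nr:]
    Scr, Scc = S[nr:, :nr], S[nr:, nr:]
    # Local corner contribution: Scc - Scr Srr^{-1} Src
    Sc = Scc - Scr @ np.linalg.solve(Srr, Src)
    Fcc += Bc[s].T @ Sc @ Bc[s]              # gather into the coarse operator

# The assembled coarse operator is symmetric positive definite
print(np.allclose(Fcc, Fcc.T))
```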
The continuity of the solution field over the remaining interface nodes is satisfied next by averaging the local results as
\[ U_r = \sum_{s=1}^{n_s} B^{sT}_r D^s_r U^s_r. \]
After some algebraic manipulations the preconditioner can be expressed as
\[ M^{-1} = \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r + R^T_0 [F^{*}_{cc}]^{-1} R_0, \qquad (18) \]
where
\[ R_0 = \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} [S^s_{rr}]^{-1} D^s_r B^s_r. \]
7 A dual-primal iterative substructuring method of SPDEs
In this section the dual-primal domain decomposition method is introduced in the context of stochastic PDEs. This approach is an extension of FETI-DP [5] to SPDEs. In this approach the continuity condition at the corner nodes is enforced strictly, and Lagrange multipliers are used to enforce the continuity condition weakly over the remaining interface nodes.
Partial assembly of Eq. (8) leads to the following subdomain equilibrium equation
\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c \\
A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c \\
\sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc}B^s_c
\end{bmatrix}
\begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \end{Bmatrix}
=
\begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \end{Bmatrix},
\qquad (19)
\]
where B^s_c is a Boolean restriction operator that maps the global corner vector U_c to the local corner vector U^s_c as
\[ U^s_c = B^s_c U_c. \]
Eq. (19) can be rewritten in compact form as
\[ A^s U^s = F^s. \qquad (20) \]
Let B^s_r be a block diagonal signed Boolean continuity matrix defined such that
\[ \sum_{s=1}^{n_s} B^s_r U^s_r = 0. \]
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem:
\[ \tfrac{1}{2} U^T A U - U^T F \rightarrow \min \qquad (21) \]
subject to BU = 0,
where
\[
A = \mathrm{blockdiag}\big(A^1, \ldots, A^s, \ldots, A^{n_s}\big), \quad
U = \begin{Bmatrix} U^1 \\ \vdots \\ U^s \\ \vdots \\ U^{n_s} \end{Bmatrix}, \quad
F = \begin{Bmatrix} F^1 \\ \vdots \\ F^s \\ \vdots \\ F^{n_s} \end{Bmatrix},
\]
\[
B = \big[\, (0\;\; B^1_r\;\; 0) \;\cdots\; (0\;\; B^s_r\;\; 0) \;\cdots\; (0\;\; B^{n_s}_r\;\; 0) \,\big].
\]
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as
\[ \mathcal{L}(U, \Lambda) = \tfrac{1}{2} U^T A U - U^T F + U^T B^T \Lambda. \qquad (22) \]
Minimizing Eq. (22) with respect to U and Λ leads to the following equilibrium system
\[
\begin{bmatrix}
A^s_{ii} & A^s_{ir} & A^s_{ic}B^s_c & 0 \\
A^s_{ri} & A^s_{rr} & A^s_{rc}B^s_c & B^{sT}_r \\
\sum_{s=1}^{n_s} B^{sT}_c A^s_{ci} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc}B^s_c & 0 \\
0 & \sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U^s_i \\ U^s_r \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} F^s_i \\ F^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c F^s_c \\ 0 \end{Bmatrix},
\qquad (23)
\]
where
\[ \Lambda = \{\lambda_0, \ldots, \lambda_N\}^T, \]
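The saddle point structure can be sanity-checked on a small example: solving the KKT system obtained by minimizing the Lagrangian of Eq. (22) yields a solution that satisfies the constraint exactly. The matrix, load vector and single jump constraint below are invented stand-ins, not quantities from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical small example: A SPD (5x5), one signed Boolean-style
# jump constraint B u = 0 (continuity between two matched dofs).
n, m = 5, 1
M = rng.standard_normal((n, n))
A = M @ M.T + n * np.eye(n)
f = rng.standard_normal(n)
B = np.array([[1., -1., 0., 0., 0.]])

# KKT (saddle point) system from minimizing 1/2 u^T A u - u^T f + u^T B^T lam
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
rhs = np.concatenate([f, np.zeros(m)])
sol = np.linalg.solve(K, rhs)
u, lam = sol[:n], sol[n:]
print(np.allclose(B @ u, 0))  # the multiplier enforces the constraint
```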
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
13
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
The multiplier λ_j is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. 5.
The interior unknowns U^s_i are eliminated from Eq. (23) as
\[ U^s_i = [A^s_{ii}]^{-1}\big(F^s_i - A^s_{ir}U^s_r - A^s_{ic}B^s_c U_c\big). \qquad (24) \]
Substituting Eq. (24) into Eq. (23) leads to
\[
\begin{bmatrix}
S^s_{rr} & S^s_{rc}B^s_c & B^{sT}_r \\
\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} & \sum_{s=1}^{n_s} B^{sT}_c S^s_{cc}B^s_c & 0 \\
\sum_{s=1}^{n_s} B^s_r & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U^s_r \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} G^s_r \\ \sum_{s=1}^{n_s} B^{sT}_c G^s_c \\ 0 \end{Bmatrix},
\qquad (25)
\]
where
\[ S^s_{\alpha\beta} = A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}, \qquad
G^s_{\alpha} = F^s_{\alpha} - A^s_{\alpha i}[A^s_{ii}]^{-1}F^s_i. \]
The subdomain level remaining unknown vector U^s_r can be obtained in parallel from Eq. (25) as
\[ U^s_r = [S^s_{rr}]^{-1}\big(G^s_r - S^s_{rc}B^s_c U_c - B^{sT}_r \Lambda\big). \qquad (26) \]
Substituting Eq. (26) into Eq. (25) leads to
\[
\begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix}
\begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} d_c \\ d_r \end{Bmatrix},
\qquad (27)
\]
where
\[
\begin{aligned}
F_{cc} &= \sum_{s=1}^{n_s} B^{sT}_c\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c, &
F_{cr} &= \sum_{s=1}^{n_s} B^{sT}_c S^s_{cr}[S^s_{rr}]^{-1}B^{sT}_r, \\
F_{rc} &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}S^s_{rc}B^s_c, &
F_{rr} &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}B^{sT}_r, \\
d_c &= \sum_{s=1}^{n_s} B^{sT}_c\big(G^s_c - S^s_{cr}[S^s_{rr}]^{-1}G^s_r\big), &
d_r &= \sum_{s=1}^{n_s} B^s_r [S^s_{rr}]^{-1}G^s_r.
\end{aligned}
\]
Solving for U_c from Eq. (27) gives the following coarse problem
\[ F_{cc} U_c = d_c + F_{cr}\Lambda. \qquad (28) \]
Substituting U_c into Eq. (27) leads to the following symmetric positive-definite Lagrange multiplier system
\[ \big(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\big)\Lambda = d_r - F_{rc}[F_{cc}]^{-1}d_c. \qquad (29) \]
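The reduction from Eq. (27) to Eqs. (28)-(29) is plain block elimination, which can be verified numerically on dense stand-in blocks. All sizes and matrices below are invented; F_rc = F_cr^T is used, as follows from the operator definitions above:

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical block sizes for the reduced system of Eq. (27)
nc, nl = 3, 4
M = rng.standard_normal((nc, nc)); Fcc = M @ M.T + nc * np.eye(nc)
Fcr = rng.standard_normal((nc, nl))
Frc = Fcr.T                              # consistent with the definitions
N = rng.standard_normal((nl, nl)); Frr = N @ N.T + nl * np.eye(nl)
dc = rng.standard_normal(nc); dr = rng.standard_normal(nl)

# Direct solve of the 2x2 block system [[Fcc, -Fcr], [Frc, Frr]]
K = np.block([[Fcc, -Fcr], [Frc, Frr]])
Uc_ref, Lam_ref = np.split(np.linalg.solve(K, np.concatenate([dc, dr])), [nc])

# Reduced Lagrange multiplier system, Eq. (29), then back-substitution, Eq. (28)
F = Frr + Frc @ np.linalg.solve(Fcc, Fcr)
Lam = np.linalg.solve(F, dr - Frc @ np.linalg.solve(Fcc, dc))
Uc = np.linalg.solve(Fcc, dc + Fcr @ Lam)
print(np.allclose(Lam, Lam_ref), np.allclose(Uc, Uc_ref))  # True True
```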
Eq. (29) is solved using PCGM with a Dirichlet preconditioner defined as
\[ M = \sum_{s=1}^{n_s} B^s_r D^s_r S^s_{rr} D^s_r B^{sT}_r. \qquad (30) \]
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq. (17) and for the dual-primal operator in Eq. (28) are the same and can be expressed as
\[ F^{*}_{cc} = F_{cc} = \sum_{s=1}^{n_s} B^{sT}_c\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c. \]
Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as
\[
\begin{aligned}
M^{-1} = {} & \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} D^s_r B^s_r \\
& + \sum_{s=1}^{n_s} B^{sT}_r D^s_r [S^s_{rr}]^{-1} S^s_{rc} B^s_c
\Big[\sum_{s=1}^{n_s} B^{sT}_c\big(S^s_{cc} - S^s_{cr}[S^s_{rr}]^{-1}S^s_{rc}\big)B^s_c\Big]^{-1}
\sum_{s=1}^{n_s} B^{sT}_c S^s_{cr} [S^s_{rr}]^{-1} D^s_r B^s_r,
\end{aligned}
\]
which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation
In this section we outline the parallel implementation of PCGM to solve the primal, Eq. (14), and dual-primal, Eq. (29), interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain level problems (Dirichlet and Neumann) and a global level coarse problem.
9.1 Primal method
In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as
\[ Q_j = \big(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr}\big)P_j, \]
can be computed using the following algorithm
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input: P
2. Scatter: P^s = B^s_r P
3. Compute: v^s_1 = S^s_{cr} P^s
4. Gather: V_1 = Σ_{s=1}^{n_s} B^{sT}_c v^s_1
5. Global solve: F_{cc} V_2 = V_1
6. Scatter: v^s_2 = B^s_c V_2
7. Compute: v^s_3 = S^s_{rc} v^s_2
8. Update: Q^s = S^s_{rr} P^s - v^s_3
9. Gather: Q = Σ_{s=1}^{n_s} B^{sT}_r Q^s
10. Output: Q
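In a matrix-free PCGM the interface operator is never formed; only its action on a vector is needed. A minimal dense sketch of that action (the blocks below are random stand-ins, not quantities from the paper) mirrors the local-product, global coarse-solve, and update structure of Algorithm 4:

```python
import numpy as np

rng = np.random.default_rng(6)

# Hypothetical dense stand-ins for the interface blocks; in the parallel
# code these products are evaluated matrix-free via subdomain solves.
nr_, nc_ = 6, 3
Fcr = rng.standard_normal((nc_, nr_))
Frc = Fcr.T
M = rng.standard_normal((nc_, nc_)); Fcc = M @ M.T + nc_ * np.eye(nc_)
N = rng.standard_normal((nr_, nr_)); Frr = N @ N.T + nr_ * np.eye(nr_)

def matvec(P):
    """Matrix-free product Q = (Frr - Frc Fcc^{-1} Fcr) P:
    a local product, a global (coarse) solve, then a local update."""
    V1 = Fcr @ P                      # local products, gathered
    V2 = np.linalg.solve(Fcc, V1)     # global coarse solve
    return Frr @ P - Frc @ V2         # update and gather

P = rng.standard_normal(nr_)
Q = matvec(P)
Qref = (Frr - Frc @ np.linalg.solve(Fcc, Fcr)) @ P
print(np.allclose(Q, Qref))  # True
```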
Multiplication of the Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as
\[ v^s_{\alpha} = S^s_{\alpha\beta} v^s_{\beta}, \qquad
v^s_{\alpha} = \big(A^s_{\alpha\beta} - A^s_{\alpha i}[A^s_{ii}]^{-1}A^s_{i\beta}\big)v^s_{\beta}. \]
This procedure is outlined in the following algorithm
Algorithm 5: Dirichlet Solver Procedure
1. Input: v^s_β
2. Compute: v^s_1 = A^s_{iβ} v^s_β
3. Solve: A^s_{ii} v^s_2 = v^s_1
4. Compute: v^s_3 = A^s_{αi} v^s_2
5. Compute: v^s_4 = A^s_{αβ} v^s_β
6. Compute: v^s_α = v^s_4 - v^s_3
7. Output: v^s_α
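Algorithm 5 applies the Schur complement without ever forming it. A small numpy sketch (hypothetical block sizes and a random SPD subdomain matrix; α = β = r for concreteness) confirms that the five compute steps reproduce the product S^s_{rr} v:

```python
import numpy as np

rng = np.random.default_rng(4)

# Hypothetical subdomain matrix partitioned into interior (i) and
# remaining interface (r) blocks
ni, nr = 5, 3
M = rng.standard_normal((ni + nr, ni + nr))
A = M @ M.T + (ni + nr) * np.eye(ni + nr)
Aii, Air = A[:ni, :ni], A[:ni, ni:]
Ari, Arr = A[ni:, :ni], A[ni:, ni:]
vb = rng.standard_normal(nr)

# Algorithm 5 (Dirichlet solver): apply S_rr = Arr - Ari Aii^{-1} Air to vb
v1 = Air @ vb                     # step 2
v2 = np.linalg.solve(Aii, v1)     # step 3 (interior Dirichlet solve)
v3 = Ari @ v2                     # step 4
v4 = Arr @ vb                     # step 5
va = v4 - v3                      # step 6

Srr = Arr - Ari @ np.linalg.solve(Aii, Air)
print(np.allclose(va, Srr @ vb))  # True
```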
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as
\[ M^{-1}_{cc} F_{cc} V_2 = M^{-1}_{cc} V_1, \]
where
\[ M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c. \]
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: F^s_r = D^s_r B^s_r r_Γ
3. Local solve: S^s_{rr} v^s_1 = F^s_r
4. Compute: d^s_c = S^s_{cr} v^s_1
5. Gather: d_c = Σ_{s=1}^{n_s} B^{sT}_c d^s_c
6. Global solve: F^*_{cc} Z_c = -d_c
7. Scatter: Z^s_c = B^s_c Z_c
8. Update: v^s_2 = F^s_r + S^s_{rc} Z^s_c
9. Local solve: S^s_{rr} Z^s_f = v^s_2
10. Gather: Z = Σ_{s=1}^{n_s} B^{sT}_r D^s_r Z^s_f
11. Output: Z
The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumann problem of the form S^s_{rr} U^s_r = r^s_r, which can be solved using the following algorithm.
Algorithm 7: Neumann Solver Procedure
1. Input: r^s_r
2. Solve:
\[ \begin{bmatrix} A^s_{ii} & A^s_{ir} \\ A^s_{ri} & A^s_{rr} \end{bmatrix}
\begin{Bmatrix} X^s \\ U^s_r \end{Bmatrix} = \begin{Bmatrix} 0 \\ r^s_r \end{Bmatrix} \]
3. Output: U^s_r
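The equivalence between the augmented solve in Algorithm 7 and a Schur complement (Neumann) solve can be checked on a small invented subdomain matrix; the sizes and the random SPD matrix below are stand-ins, not data from the paper:

```python
import numpy as np

rng = np.random.default_rng(3)

# Hypothetical subdomain matrix partitioned into interior (i) and
# remaining interface (r) blocks
ni, nr = 6, 3
M = rng.standard_normal((ni + nr, ni + nr))
A = M @ M.T + (ni + nr) * np.eye(ni + nr)
Aii, Air = A[:ni, :ni], A[:ni, ni:]
Ari, Arr = A[ni:, :ni], A[ni:, ni:]
r = rng.standard_normal(nr)

# Neumann solve (Algorithm 7): augmented system with zero interior load
sol = np.linalg.solve(A, np.concatenate([np.zeros(ni), r]))
Ur = sol[ni:]

# Equivalent Schur complement solve: S_rr Ur = r
Srr = Arr - Ari @ np.linalg.solve(Aii, Air)
print(np.allclose(Srr @ Ur, r))  # True
```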
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as
\[ M^{-1}_{cc} F^{*}_{cc} Z_c = -M^{-1}_{cc} d_c, \]
where
\[ M^{-1}_{cc} = \sum_{s=1}^{n_s} B^{sT}_c A^s_{cc} B^s_c. \]
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flowchart.
Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
9.2 Dual-primal method
In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem, Eq. (29).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as
\[ Q_j = \big(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\big)P_j, \]
can be computed using the following algorithm
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input: P
2. Scatter: P^s = B^{sT}_r P
3. Local solve: S^s_{rr} v^s_1 = P^s
4. Compute: v^s_2 = S^s_{cr} v^s_1
5. Gather: V_2 = Σ_{s=1}^{n_s} B^{sT}_c v^s_2
6. Global solve: F_{cc} V_3 = V_2
7. Scatter: v^s_3 = B^s_c V_3
8. Compute: v^s_4 = S^s_{rc} v^s_3
9. Update: v^s_5 = P^s + v^s_4
10. Local solve: S^s_{rr} Q^s = v^s_5
11. Gather: Q = Σ_{s=1}^{n_s} B^s_r Q^s
12. Output: Q
The local solves in step 3 and step 10 of Algorithm (8) are carried out by solving a subdomain level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r^s_Γ = D^s_r B^{sT}_r r_Γ
3. Compute: Z^s = S^s_{rr} r^s_Γ
4. Gather: Z = Σ_{s=1}^{n_s} B^s_r D^s_r Z^s
5. Output: Z
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flowchart.
Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson's equation with randomly heterogeneous coefficients, given as
\[
\frac{\partial}{\partial x}\Big[c_x(x, y, \theta)\frac{\partial u(x, y, \theta)}{\partial x}\Big]
+ \frac{\partial}{\partial y}\Big[c_y(x, y, \theta)\frac{\partial u(x, y, \theta)}{\partial y}\Big]
= f(x, y) \quad \text{in } \Omega,
\]
where the forcing term is
\[ f(x, y) = 1.0. \]
For simplicity, a homogeneous Dirichlet boundary condition is imposed as
\[ u(x, y, \theta) = 0 \quad \text{on } \partial\Omega. \]
The random coefficients c_x(x, y, θ) and c_y(x, y, θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
In the PCGM implementation, the forcing term is taken to be the initial residual and the iterations are terminated when the ratio of the L2 norms of the current and the initial residuals is less than 10^{-5}:
\[ \frac{\| G_{\Gamma} - S\, U^k_{\Gamma} \|_2}{\| G^0_{\Gamma} \|_2} \leq 10^{-5}. \]
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23 the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh.   Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field.   Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u0.   Figure 13. Chaos coefficients u1.
10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains used to solve the problem to 100, while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size (n), using the first order Polynomial Chaos expansion leads to a total problem size of (3 × n) and using the third order Polynomial Chaos expansion leads to a total problem size of (10 × n).
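The quoted problem sizes are consistent with the standard count of Polynomial Chaos terms, (M + p)! / (M! p!) for M random dimensions and expansion order p: with the two random coefficients c_x and c_y this gives 3 terms at first order and 10 terms at third order, matching the (3 × n) and (10 × n) totals above. A one-line check:

```python
from math import comb

def pc_terms(M, p):
    # Number of Polynomial Chaos terms: N + 1 = (M + p)! / (M! p!)
    return comb(M + p, p)

# Two random dimensions: first order -> 3 terms, third order -> 10 terms
print(pc_terms(2, 1), pc_terms(2, 3))  # 3 10
```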
Figure 14. Chaos coefficients u2.   Figure 15. Chaos coefficients u3.
Figure 16. Chaos coefficients u4.   Figure 17. Chaos coefficients u5.
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
22
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Thirdly we fix problem size per subdomain while increase the overall problem size by adding
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
23
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Substituting Usr into the second block of Eq(15) leads to the following coarse problem
F lowast
ccUc = dlowastc (17)
where
F lowast
cc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
dlowastc = minus
nssum
s=1
BscTSs
cr[Ssrr]
minus1Fsr
The continuity of the solution field over the remaining interface nodes is satisfied next byaveraging the local results as
Ur =
nssum
s=1
BsrTDs
rUsr
After some algebraic manipulations the preconditioner can be expressed as
Mminus1 =
nssum
s=1
BsrTDs
r[Ssrr]
minus1DsrB
sr + RT
0 [F lowast
cc]minus1R0 (18)
where
R0 =
nssum
s=1
BscTScr[S
srr]
minus1DsrB
sr
7 A dual-primal iterative substructuring method of SPDEs
In this section the dual-primal domain decomposition method is introduced in the context ofstochastic PDEs This approach is an extension of FETI-DP [5] for SPDEs In this approachthe continuity condition at the corner nodes is enforced strictly and Lagrange multipliers areused to enforce the continuity condition weakly over the remaining interface nodes
Partial assembly of Eq(8) leads to the following subdomain equilibrium equation
Asii As
ir AsicB
sc
Asri As
rr AsrcB
sc
nssum
s=1
BscTAs
ci
nssum
s=1
BscTAs
cr
nssum
s=1
BscTAs
ccBsc
Usi
Usr
Uc
=
Fsi
Fsr
nssum
s=1
BscTFs
c
(19)
where Bsc is a Boolean restriction operator that maps the global corner vector Uc to the local
corner vector Usc as
Usc = Bs
cUc
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
12
Eq(19) can be rewritten in compact form as
AsUs = Fs (20)
Let Bsr be a block diagonal signed Boolean continuity matrix defined as
s=nssum
s=1
BsrU
sr = 0
Next the original finite element problem can be reformulated as an equivalent constrainedminimization problem as
1
2UT AU minus UT F rarr min (21)
subject to BU = 0
where
A =
A1
As
Ans
U =
U1
Us
Uns
F =
F1
Fs
Fns
B =[
(0 B1r 0) middot middot middot (0 Bs
r 0) middot middot middot (0 Bnsr 0)
]
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraintthe saddle point formulation of Eq(21) can be expressed as
L(U Λ) =1
2UT AU minus UT F + UT BT Λ (22)
Minimizing Eq(22) with respect to U and Λ leads to the following equilibrium system
Asii As
ir AsicB
sc 0
Asri As
rr AsrcB
sc Bs
rT
nssum
s=1
BscTAs
ci
nssum
s=1
BscTAs
cr
nssum
s=1
BscTAs
ccBsc 0
0
nssum
s=1
Bsr 0 0
Usi
Usr
Uc
Λ
=
Fsi
Fsr
nssum
s=1
BscTFs
c
0
(23)
where
Λ =
λ0
λN
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
13
Figure 5 Lagrange multipliers are the forces required to connect the tore interface boundary
and λj is the nodal force required to satisfy compatibility at the remaining interface nodesas shown schematically in Fig(5)
Eliminating the interior unknowns Usi from Eq(23) as
Usi = [As
ii]minus1(Fs
i minusAsirU
sr minusAs
icBscUc) (24)
Substituting Eq(24) into Eq(23) leads to
Ssrr Ss
rcBsc Bs
rT
nssum
s=1
BscTSs
cr
nssum
s=1
BscTSs
ccBsc 0
nssum
s=1
Bsr 0 0
Usr
Uc
Λ
=
Gsr
nssum
s=1
BscTGs
c
0
(25)
where
Ssαβ = As
αβ minusAsαi[A
sii]
minus1Asiβ
Gsα = Fs
α minusAsαi[A
sii]
minus1Fsi
The subdomain level remaining unknown vector Usr can be obtained in parallel from Eq(25)
as
Usr = [Ss
rr]minus1(Gs
r minus SsrcB
scUc minus Bs
rT Λ) (26)
Substituting Eq(26) into Eq(25) leads to
[
Fcc minusFcr
Frc Frr
]
Uc
Λ
=
dc
dr
(27)
where
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
14
Fcc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
Fcr =
nssum
s=1
BscTSs
cr[Ssrr]
minus1BsrT
Frc =
nssum
s=1
Bsr [S
srr]
minus1SsrcB
sc
Frr =
nssum
s=1
Bsr [S
srr]
minus1BsrT
dc =
nssum
s=1
BscT (Gs
c minus Sscr[S
srr]
minus1Gsr)
dr =
nssum
s=1
Bsr [S
srr]
minus1Gsr
Solving for Uc from Eq(27) gives the following coarse problem
FccUc = (dc + FcrΛ) (28)
Substituting Uc into Eq(27) leads to the following symmetric positive-definite Lagrangemultiplier system
(Frr + Frc[Fcc]minus1Fcr)Λ = dr minus Frc[Fcc]
minus1dc (29)
Eq(29) is solved using PCGM with a Dirichlet precondtioner defined as
M =
nssum
s=1
BsrD
srS
srrD
srB
srT (30)
8 Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq(17) andfor the dual-primal operator in Eq(28) are the same and can be expressed as
F lowast
cc = Fcc =
nssum
s=1
BscT (Ss
cc minus Sscr[S
srr]
minus1Ssrc)B
sc
Furthermore, the algebraic form of the primal preconditioner in Eq. (18) can be recast as

$$
M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s
+ \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s
\left[ \sum_{s=1}^{n_s} B_c^{sT} \left(S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s\right) B_c^s \right]^{-1}
\sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s,
$$

which has the same form as the dual-primal operator in Eq. (29).
9 Parallel implementation
In this section we give an outline of the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain level problems (Dirichlet and Neumann) and a global level coarse problem.
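This matrix-free viewpoint maps directly onto SciPy's `LinearOperator` abstraction: the Krylov solver receives only a callback that applies the operator to a vector. In the sketch below, an explicit matrix inside the callback is purely a stand-in for the subdomain and coarse solves of Algorithms 4 and 8; the names and sizes are illustrative, not from the paper.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, cg

rng = np.random.default_rng(1)
n = 50
A = rng.standard_normal((n, n))
A = A @ A.T + n * np.eye(n)          # SPD stand-in for the interface operator
b = rng.standard_normal(n)

# The solver never sees A itself, only its action on a vector. In the paper,
# this action is assembled from subdomain Dirichlet/Neumann solves and a
# coarse solve instead of an explicit matrix-vector product.
def matvec(v):
    return A @ v                      # placeholder for Algorithm 4 / Algorithm 8

op = LinearOperator((n, n), matvec=matvec, dtype=float)
x, info = cg(op, b)                   # conjugate gradients, matrix-free
assert info == 0
assert np.linalg.norm(A @ x - b) <= 1e-4 * np.linalg.norm(b)
```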
9.1 Primal method
In this subsection we give a brief description of the parallel implementation of Algorithm (1) to solve the primal interface problem in Eq. (14).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = \left(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr}\right)P_j,$$
can be computed using the following algorithm
Algorithm 4. Parallel Matrix-Vector Product for Primal Method
1. Input: $P$
2. Scatter: $P^s = B_r^s P$
3. Compute: $v_1^s = S_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s$
5. Global solve: $F_{cc} V_2 = V_1$
6. Scatter: $v_2^s = B_c^s V_2$
7. Compute: $v_3^s = S_{rc}^s v_2^s$
8. Update: $Q^s = S_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10. Output: $Q$
Multiplication of a Schur complement matrix by a vector in step 3, step 7 and step 8 of Algorithm (4) is computed by solving a corresponding Dirichlet problem as

$$v_\alpha^s = S_{\alpha\beta}^s v_\beta^s,$$
$$v_\alpha^s = \left(A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s\right) v_\beta^s.$$
This procedure is outlined in the following algorithm
Algorithm 5. Dirichlet Solver Procedure
1. Input: $v_\beta^s$
2. Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3. Solve: $A_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output: $v_\alpha^s$
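As a sanity check, the step sequence of Algorithm 5 can be mirrored in a few lines of NumPy. This is a hypothetical dense toy problem (a real implementation would reuse a sparse factorization of $A_{ii}^s$); the result is verified against the explicitly formed Schur complement.

```python
import numpy as np

def dirichlet_apply(A_ab, A_ai, A_ii, A_ib, v_b):
    """Apply S_ab = A_ab - A_ai A_ii^{-1} A_ib to v_b by the step
    sequence of Algorithm 5 (one interior 'Dirichlet' solve)."""
    v1 = A_ib @ v_b                    # step 2
    v2 = np.linalg.solve(A_ii, v1)     # step 3: interior solve
    v3 = A_ai @ v2                     # step 4
    v4 = A_ab @ v_b                    # step 5
    return v4 - v3                     # step 6

rng = np.random.default_rng(2)
ni, nb = 5, 3
A = rng.standard_normal((ni + nb, ni + nb))
A = A @ A.T + (ni + nb) * np.eye(ni + nb)   # hypothetical SPD subdomain matrix
A_ii, A_ib = A[:ni, :ni], A[:ni, ni:]
A_ai, A_ab = A[ni:, :ni], A[ni:, ni:]
v_b = rng.standard_normal(nb)

S = A_ab - A_ai @ np.linalg.solve(A_ii, A_ib)   # explicit Schur complement
assert np.allclose(dirichlet_apply(A_ab, A_ai, A_ii, A_ib, v_b), S @ v_b)
```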
The global problem in step 5 of Algorithm (4) is solved iteratively using PCGM equipped with a lumped preconditioner as

$$M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1,$$

where

$$M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$$
Next, the effect of the two-level preconditioner in step 12 of Algorithm (1) is computed by solving a subdomain level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6. Two-Level Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $F_r^s = D_r^s B_r^s r_\Gamma$
3. Local solve: $S_{rr}^s v_1^s = F_r^s$
4. Compute: $d_c^s = S_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6. Global solve: $F_{cc}^{*} Z_c = -d_c$
7. Scatter: $Z_c^s = B_c^s Z_c$
8. Update: $v_2^s = F_r^s + S_{rc}^s Z_c^s$
9. Local solve: $S_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11. Output: $Z$
The local solves in step 3 and step 9 of Algorithm (6) constitute a subdomain level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7. Neumann Solver Procedure
1. Input: $r_r^s$
2. Solve: $\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}$
3. Output: $U_r^s$
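The same idea in NumPy: solving the assembled subdomain system with a zero interior load applies $[S_{rr}^s]^{-1}$ without ever forming the Schur complement. The toy blocks below are hypothetical stand-ins for a subdomain stiffness matrix.

```python
import numpy as np

def neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r):
    """Solve S_rr U_r = r_r implicitly via the assembled subdomain system
    of Algorithm 7, without forming S_rr = A_rr - A_ri A_ii^{-1} A_ir."""
    ni = A_ii.shape[0]
    K = np.block([[A_ii, A_ir], [A_ri, A_rr]])
    rhs = np.concatenate([np.zeros(ni), r_r])   # zero interior load
    sol = np.linalg.solve(K, rhs)
    return sol[ni:]                             # discard the interior auxiliary X^s

rng = np.random.default_rng(3)
ni, nr = 5, 3
A = rng.standard_normal((ni + nr, ni + nr))
A = A @ A.T + (ni + nr) * np.eye(ni + nr)       # hypothetical SPD subdomain matrix
A_ii, A_ir = A[:ni, :ni], A[:ni, ni:]
A_ri, A_rr = A[ni:, :ni], A[ni:, ni:]
r_r = rng.standard_normal(nr)

S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
assert np.allclose(neumann_solve(A_ii, A_ir, A_ri, A_rr, r_r),
                   np.linalg.solve(S_rr, r_r))
```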
The global solve of the coarse problem in step 6 of Algorithm (6) is conducted in parallel using PCGM equipped with a lumped preconditioner as

$$M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c,$$

where

$$M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$$
Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2 Dual-primal method
In this subsection we outline the parallel implementation of Algorithm (1) to solve the dual-primal interface problem in Eq. (29).
For the jth iteration of Algorithm (1), the matrix-vector product in step 7, defined as

$$Q_j = \left(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\right)P_j,$$
can be computed using the following algorithm
Algorithm 8. Parallel Matrix-Vector Product for Dual-Primal Method
1. Input: $P$
2. Scatter: $P^s = B_r^{sT} P$
3. Local solve: $S_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = S_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6. Global solve: $F_{cc} V_3 = V_2$
7. Scatter: $v_3^s = B_c^s V_3$
8. Compute: $v_4^s = S_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local solve: $S_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} B_r^s Q^s$
12. Output: $Q$
The local solves in step 3 and step 10 of Algorithm (8) are carried out by solving a subdomain level Neumann problem, as outlined in Algorithm (7). The global coarse problem in step 6 of Algorithm (8) is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm (1) is obtained using the following algorithm.
Algorithm 9. Dirichlet Preconditioner Effect Procedure
1. Input: $r_\Gamma$
2. Scatter: $r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma$
3. Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s$
5. Output: $Z$
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients given as

$$\frac{\partial}{\partial x}\left[c_x(x, y, \theta)\,\frac{\partial u(x, y, \theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x, y, \theta)\,\frac{\partial u(x, y, \theta)}{\partial y}\right] = f(x, y) \quad \text{in } \Omega,$$

where the forcing term is

$$f(x, y) = 1.0.$$

For simplicity, a homogeneous Dirichlet boundary condition is imposed as

$$u(x, y, \theta) = 0 \quad \text{on } \partial\Omega.$$
The random coefficients $c_x(x, y, \theta)$ and $c_y(x, y, \theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
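For readers reproducing the input model, a lognormal coefficient $c = e^g$ with $g \sim N(1.0, 0.25^2)$ stays strictly positive, which keeps the random diffusion coefficient physically admissible. The sketch below (parameter names assumed, not from the paper) samples such a coefficient and checks the sample moments against the analytic lognormal mean and standard deviation.

```python
import numpy as np

# Underlying Gaussian parameters of the example problem (assumed here):
mu, sigma = 1.0, 0.25

# Analytic moments of the lognormal variable c = exp(g), g ~ N(mu, sigma^2):
mean_c = np.exp(mu + 0.5 * sigma**2)              # E[c]
std_c = mean_c * np.sqrt(np.exp(sigma**2) - 1.0)  # Std[c]

rng = np.random.default_rng(5)
samples = np.exp(rng.normal(mu, sigma, size=200_000))  # lognormal sampling
assert abs(samples.mean() - mean_c) < 0.05
assert abs(samples.std() - std_c) < 0.05
```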
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and the initial residual falls below $10^{-5}$:

$$\frac{\left\| G_\Gamma - S U_\Gamma^k \right\|_2}{\left\| G_\Gamma^0 \right\|_2} \leq 10^{-5}.$$
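A minimal PCG loop implementing this stopping rule might look as follows. This is a generic sketch: a random SPD matrix stands in for the interface operator and a Jacobi preconditioner stands in for the paper's two-level and Dirichlet preconditioners.

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-5, maxiter=500):
    """Preconditioned CG with the paper's stopping rule:
    ||b - A u_k||_2 / ||r_0||_2 <= tol, starting from u_0 = 0 so r_0 = b."""
    u = np.zeros_like(b)
    r = b.copy()                        # initial residual = forcing term
    z = M_inv @ r
    p = z.copy()
    r0_norm = np.linalg.norm(r)
    for k in range(maxiter):
        if np.linalg.norm(r) / r0_norm <= tol:
            return u, k
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        u = u + alpha * p
        r_new = r - alpha * Ap
        z_new = M_inv @ r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return u, maxiter

rng = np.random.default_rng(4)
n = 40
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # SPD stand-in
b = rng.standard_normal(n)
M_inv = np.diag(1.0 / np.diag(A))     # Jacobi stand-in for the preconditioners
u, iters = pcg(A, b, M_inv)
assert np.linalg.norm(b - A @ u) <= 1e-5 * np.linalg.norm(b)
```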
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig. (8) shows a typical finite element mesh, while Fig. (9) shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. (10) and Fig. (11), respectively. Clearly, the maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. (12-17) through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. (18-23), the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficient u0. Figure 13. Chaos coefficient u1.
Figure 14. Chaos coefficient u2. Figure 15. Chaos coefficient u3.
Figure 16. Chaos coefficient u4. Figure 17. Chaos coefficient u5.
Figure 18. Lagrange multiplier λ0. Figure 19. Lagrange multiplier λ1.

10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains used to solve the problem to 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table (1). Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and the dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first order to third order does not affect the performance of the methods. Note that for a given spatial problem size $n$, using the first order Polynomial Chaos expansion leads to a total problem size of $3n$, and using the third order Polynomial Chaos expansion leads to a total problem size of $10n$.
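These growth factors follow from the standard count of Polynomial Chaos terms, $N + 1 = (L + p)! / (L!\,p!)$, for $L$ input random variables and expansion order $p$. If $L$ denotes the number of input random variables, the factors 3 and 10 quoted here are consistent with $L = 2$ (one Gaussian each for $c_x$ and $c_y$); this reading is an assumption on our part. A quick check:

```python
from math import comb

def pc_terms(L, p):
    """Number of Polynomial Chaos terms (N + 1) for L random variables
    expanded up to total order p: binomial coefficient C(L + p, p)."""
    return comb(L + p, p)

# Two input random variables reproduce the growth factors quoted in the text:
assert pc_terms(2, 1) == 3    # first order  -> total size 3n
assert pc_terms(2, 3) == 10   # third order  -> total size 10n

# Size of the coupled stochastic system for n = 101851 spatial dofs:
assert pc_terms(2, 3) * 101851 == 1018510
```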
Figure 20. Lagrange multiplier λ2. Figure 21. Lagrange multiplier λ3.
Figure 22. Lagrange multiplier λ4. Figure 23. Lagrange multiplier λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
10051          10  10  10             8   8   8
20303          11  11  11             8   8   8
40811          11  12  12             8   9   9
59935          13  14  14             10  10  10
71386          12  12  12             9   9   9
80172          11  11  12             8   8   8
101851         12  12  12             9   9   9

Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table (2). The results are reported for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. (24-26).
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dof.

CPUs   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
20     10  11  11             8  8  8
40     12  12  12             9  9  9
60     12  13  13             9  9  9
80     12  12  13             9  9  9
100    12  12  12             9  9  9
120    12  12  12             9  9  9
140    11  11  12             8  8  8
160    12  12  12             8  8  9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains   PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
100          10  10  10             8  8  8
200          10  10  11             8  8  8
400          12  13  13             9  9  9
600          11  12  12             8  8  9
800          12  13  13             9  9  9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table (3) shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table (4) shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.

Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend; this fact points out the similarity (duality) between the two methods through numerical experiments.
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal method and the dual-primal method are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, a fixed problem size (101,851 dofs) and a fixed number of subdomains (100).

CoV    PDDM (1st, 2nd, 3rd)   DP-DDM (1st, 2nd, 3rd)
0.05   10  10  10             8  8  8
0.10   10  10  10             8  8  8
0.15   10  10  10             8  8  8
0.20   10  10  10             8  8  8
0.25   10  10  10             8  8  8
0.30   10  10  11             8  8  8
0.35   10  10  11             8  8  8
0.40   10  11  11             8  8  9
0.45   10  11  12             8  8  9
0.50   10  11  12             8  8  9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
Eq. (19) can be rewritten in compact form as

$$A^s U^s = F^s. \qquad (20)$$
Let $B_r^s$ be a block diagonal signed Boolean continuity matrix defined as

$$\sum_{s=1}^{n_s} B_r^s U_r^s = 0.$$
Next, the original finite element problem can be reformulated as an equivalent constrained minimization problem as

$$\frac{1}{2} U^T A U - U^T F \rightarrow \min \qquad (21)$$
$$\text{subject to } B U = 0,$$

where

$$A = \mathrm{diag}\left(A^1, \ldots, A^s, \ldots, A^{n_s}\right), \quad
U = \begin{Bmatrix} U^1 \\ \vdots \\ U^s \\ \vdots \\ U^{n_s} \end{Bmatrix}, \quad
F = \begin{Bmatrix} F^1 \\ \vdots \\ F^s \\ \vdots \\ F^{n_s} \end{Bmatrix},$$
$$B = \left[ (0 \; B_r^1 \; 0) \; \cdots \; (0 \; B_r^s \; 0) \; \cdots \; (0 \; B_r^{n_s} \; 0) \right].$$
By introducing a vector of Lagrange multipliers to enforce the weak compatibility constraint, the saddle point formulation of Eq. (21) can be expressed as

$$\mathcal{L}(U, \Lambda) = \frac{1}{2} U^T A U - U^T F + U^T B^T \Lambda. \qquad (22)$$
Enforcing stationarity of Eq. (22) with respect to $U$ and $\Lambda$ leads to the following equilibrium system:

$$
\begin{bmatrix}
A_{ii}^s & A_{ir}^s & A_{ic}^s B_c^s & 0 \\
A_{ri}^s & A_{rr}^s & A_{rc}^s B_c^s & B_r^{sT} \\
\sum_{s=1}^{n_s} B_c^{sT} A_{ci}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s & 0 \\
0 & \sum_{s=1}^{n_s} B_r^s & 0 & 0
\end{bmatrix}
\begin{Bmatrix} U_i^s \\ U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
=
\begin{Bmatrix} F_i^s \\ F_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} F_c^s \\ 0 \end{Bmatrix}
\qquad (23)
$$
where

$$\Lambda = \begin{Bmatrix} \lambda_0 \\ \vdots \\ \lambda_N \end{Bmatrix}$$
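The saddle point structure of Eqs. (21)-(23) can be illustrated on a toy equality-constrained quadratic program: stationarity of the Lagrangian yields a KKT system whose solution satisfies the compatibility constraint exactly. All matrices below are random stand-ins, not the paper's assembled operators.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 6, 2
A = rng.standard_normal((n, n)); A = A @ A.T + n * np.eye(n)   # SPD stand-in
F = rng.standard_normal(n)
B = rng.standard_normal((m, n))   # stand-in for the signed Boolean continuity matrix

# Stationarity of L(U, Lambda) = 1/2 U^T A U - U^T F + U^T B^T Lambda
# gives the KKT system  [A  B^T; B  0] {U; Lambda} = {F; 0}.
K = np.block([[A, B.T], [B, np.zeros((m, m))]])
sol = np.linalg.solve(K, np.concatenate([F, np.zeros(m)]))
U, Lam = sol[:n], sol[n:]

assert np.allclose(B @ U, 0)                # the constraint B U = 0 holds
assert np.allclose(A @ U + B.T @ Lam, F)    # equilibrium with multiplier forces
```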
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Figure 5. Lagrange multipliers are the forces required to connect the torn interface boundary.
and λ_j is the nodal force required to satisfy compatibility at the remaining interface nodes, as shown schematically in Fig. 5.
Eliminating the interior unknowns U_i^s from Eq.(23) gives

    U_i^s = [A_{ii}^s]^{-1} \big( F_i^s - A_{ir}^s U_r^s - A_{ic}^s B_c^s U_c \big).    (24)

Substituting Eq.(24) into Eq.(23) leads to

    \begin{bmatrix}
    S_{rr}^s & S_{rc}^s B_c^s & B_r^{sT} \\
    \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s & \sum_{s=1}^{n_s} B_c^{sT} S_{cc}^s B_c^s & 0 \\
    \sum_{s=1}^{n_s} B_r^s & 0 & 0
    \end{bmatrix}
    \begin{Bmatrix} U_r^s \\ U_c \\ \Lambda \end{Bmatrix}
    =
    \begin{Bmatrix} G_r^s \\ \sum_{s=1}^{n_s} B_c^{sT} G_c^s \\ 0 \end{Bmatrix},    (25)

where

    S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s,
    \qquad
    G_{\alpha}^s = F_{\alpha}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} F_i^s.
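The subdomain Schur complement and condensed load vector above come from block elimination of the interior unknowns; a minimal numerical sketch (generic NumPy, with a random symmetric positive-definite matrix standing in for a subdomain stiffness matrix; all names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)

# Random SPD "subdomain" matrix split into interior (i) and interface (b)
# blocks: A = [[A_ii, A_ib], [A_bi, A_bb]].
n_i, n_b = 6, 3
M = rng.standard_normal((n_i + n_b, n_i + n_b))
A = M @ M.T + (n_i + n_b) * np.eye(n_i + n_b)
A_ii, A_ib = A[:n_i, :n_i], A[:n_i, n_i:]
A_bi, A_bb = A[n_i:, :n_i], A[n_i:, n_i:]
F = rng.standard_normal(n_i + n_b)
F_i, F_b = F[:n_i], F[n_i:]

# Condensed operators: S = A_bb - A_bi A_ii^{-1} A_ib, G = F_b - A_bi A_ii^{-1} F_i
S = A_bb - A_bi @ np.linalg.solve(A_ii, A_ib)
G = F_b - A_bi @ np.linalg.solve(A_ii, F_i)

# Check: the interface part of the full solve equals the condensed solve.
U_b = np.linalg.solve(S, G)
U_full = np.linalg.solve(A, F)
assert np.allclose(U_b, U_full[n_i:])
```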
The subdomain-level remaining unknown vector U_r^s can be obtained in parallel from Eq.(25) as

    U_r^s = [S_{rr}^s]^{-1} \big( G_r^s - S_{rc}^s B_c^s U_c - B_r^{sT} \Lambda \big).    (26)

Substituting Eq.(26) into Eq.(25) leads to

    \begin{bmatrix} F_{cc} & -F_{cr} \\ F_{rc} & F_{rr} \end{bmatrix}
    \begin{Bmatrix} U_c \\ \Lambda \end{Bmatrix}
    =
    \begin{Bmatrix} d_c \\ d_r \end{Bmatrix},    (27)
where
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
    F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s,

    F_{cr} = \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} B_r^{sT},

    F_{rc} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s,

    F_{rr} = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} B_r^{sT},

    d_c = \sum_{s=1}^{n_s} B_c^{sT} \big( G_c^s - S_{cr}^s [S_{rr}^s]^{-1} G_r^s \big),

    d_r = \sum_{s=1}^{n_s} B_r^s [S_{rr}^s]^{-1} G_r^s.
Solving for U_c from Eq.(27) gives the following coarse problem:

    F_{cc} U_c = d_c + F_{cr} \Lambda.    (28)

Substituting U_c into Eq.(27) leads to the following symmetric positive-definite Lagrange multiplier system:

    \big( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \big) \Lambda = d_r - F_{rc} [F_{cc}]^{-1} d_c.    (29)

Eq.(29) is solved using PCGM with a Dirichlet preconditioner defined as

    M = \sum_{s=1}^{n_s} B_r^s D_r^s S_{rr}^s D_r^s B_r^{sT}.    (30)
8. Connection between the methods
The explicit forms of the coarse problem operators for the primal preconditioner in Eq.(17) and for the dual-primal operator in Eq.(28) are identical and can be expressed as

    F_{cc}^{*} = F_{cc} = \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s.

Furthermore, the algebraic form of the primal preconditioner in Eq.(18) can be recast as

    M^{-1} = \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} D_r^s B_r^s
           + \sum_{s=1}^{n_s} B_r^{sT} D_r^s [S_{rr}^s]^{-1} S_{rc}^s B_c^s
             \Big[ \sum_{s=1}^{n_s} B_c^{sT} \big( S_{cc}^s - S_{cr}^s [S_{rr}^s]^{-1} S_{rc}^s \big) B_c^s \Big]^{-1}
             \sum_{s=1}^{n_s} B_c^{sT} S_{cr}^s [S_{rr}^s]^{-1} D_r^s B_r^s,

which has the same form as the dual-primal operator in Eq.(29).
9. Parallel implementation
In this section we outline the parallel implementation of PCGM to solve the primal (Eq.(14)) and dual-primal (Eq.(29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, as only its effect on a vector is required. This matrix-vector product can be obtained concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global-level coarse problem.
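Since PCGM only needs the action of the operator and the preconditioner on a vector, the solver can be written entirely against matrix-vector callables. A minimal matrix-free PCG sketch (generic NumPy, not the paper's PETSc implementation; function names are illustrative):

```python
import numpy as np

def pcgm(matvec, precond, b, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradient; the operator and the
    preconditioner are supplied only as matrix-vector products."""
    x = np.zeros_like(b)
    r = b.copy()                     # initial residual (zero initial guess)
    z = precond(r)
    p = z.copy()
    rz = r @ z
    for k in range(max_iter):
        q = matvec(p)
        alpha = rz / (p @ q)
        x += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) / np.linalg.norm(b) < tol:
            return x, k + 1
        z = precond(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p    # beta = (r,z)_new / (r,z)_old
        rz = rz_new
    return x, max_iter

# Usage: a small SPD system with a Jacobi preconditioner.
rng = np.random.default_rng(1)
M = rng.standard_normal((8, 8))
A = M @ M.T + 8 * np.eye(8)
b = rng.standard_normal(8)
x, its = pcgm(lambda v: A @ v, lambda v: v / np.diag(A), b)
assert np.allclose(A @ x, b, atol=1e-4)
```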
9.1. Primal method
In this subsection we briefly describe the parallel implementation of Algorithm 1 to solve the primal interface problem in Eq.(14).
For the jth iteration of Algorithm 1, the matrix-vector product in step 7, defined as

    Q_j = \big( F_{rr} - F_{rc} [F_{cc}]^{-1} F_{cr} \big) P_j,

can be computed using the following algorithm.
Algorithm 4: Parallel Matrix-Vector Product for the Primal Method
1. Input: P
2. Scatter: P^s = B_r^s P
3. Compute: v_1^s = S_{cr}^s P^s
4. Gather: V_1 = \sum_{s=1}^{n_s} B_c^{sT} v_1^s
5. Global solve: F_{cc} V_2 = V_1
6. Scatter: v_2^s = B_c^s V_2
7. Compute: v_3^s = S_{rc}^s v_2^s
8. Update: Q^s = S_{rr}^s P^s - v_3^s
9. Gather: Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s
10. Output: Q
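The Scatter and Gather steps apply the Boolean restriction operators B_r^s, B_c^s and their transposes; in code these reduce to index selections and index-wise accumulations rather than explicit matrices. A small sketch (NumPy; the index sets are invented for illustration):

```python
import numpy as np

# Global interface vector and two subdomain index maps (which global dof
# each local dof corresponds to) -- illustrative values only.
P = np.arange(5.0)                  # global vector of 5 interface dofs
idx = [np.array([0, 1, 2]),         # subdomain 1 touches global dofs 0,1,2
       np.array([2, 3, 4])]         # subdomain 2 touches global dofs 2,3,4

# Scatter: P^s = B^s P  (select the local entries)
P_loc = [P[ix] for ix in idx]

# Gather: Q = sum_s B^{sT} Q^s  (accumulate local contributions)
Q = np.zeros_like(P)
for ix, q in zip(idx, P_loc):
    np.add.at(Q, ix, q)             # shared dof 2 receives both contributions

assert np.allclose(Q, [0.0, 1.0, 4.0, 3.0, 4.0])
```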
Multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem as

    v_{\alpha}^s = S_{\alpha\beta}^s v_{\beta}^s
                 = \big( A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s \big) v_{\beta}^s.
This procedure is outlined in the following algorithm
Algorithm 5: Dirichlet-Solver Procedure
1. Input: v_{\beta}^s
2. Compute: v_1^s = A_{i\beta}^s v_{\beta}^s
3. Solve: A_{ii}^s v_2^s = v_1^s
4. Compute: v_3^s = A_{\alpha i}^s v_2^s
5. Compute: v_4^s = A_{\alpha\beta}^s v_{\beta}^s
6. Compute: v_{\alpha}^s = v_4^s - v_3^s
7. Output: v_{\alpha}^s
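The Dirichlet-solver steps can be checked against a direct application of the Schur complement; a NumPy sketch with a random SPD matrix standing in for the subdomain blocks (all matrix names illustrative):

```python
import numpy as np

rng = np.random.default_rng(2)
n_i, n_r = 5, 3
M = rng.standard_normal((n_i + n_r, n_i + n_r))
A = M @ M.T + (n_i + n_r) * np.eye(n_i + n_r)   # stand-in subdomain matrix
A_ii = A[:n_i, :n_i]
A_ib = A[:n_i, n_i:]     # A_{i,beta}
A_ai = A[n_i:, :n_i]     # A_{alpha,i}
A_ab = A[n_i:, n_i:]     # A_{alpha,beta}
v_b = rng.standard_normal(n_r)

# Steps 2-6 of the Dirichlet-solver procedure:
v1 = A_ib @ v_b                    # v1 = A_{i,beta} v_beta
v2 = np.linalg.solve(A_ii, v1)     # solve A_ii v2 = v1
v3 = A_ai @ v2                     # v3 = A_{alpha,i} v2
v4 = A_ab @ v_b                    # v4 = A_{alpha,beta} v_beta
v_a = v4 - v3                      # v_alpha = S_{alpha,beta} v_beta

# Same result as applying the Schur complement directly.
S = A_ab - A_ai @ np.linalg.solve(A_ii, A_ib)
assert np.allclose(v_a, S @ v_b)
```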
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner as

    M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1,

where

    M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.
Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6: Two-Level Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: F_r^s = D_r^s B_r^s r_Γ
3. Local solve: S_{rr}^s v_1^s = F_r^s
4. Compute: d_c^s = S_{cr}^s v_1^s
5. Gather: d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s
6. Global solve: F_{cc}^{*} Z_c = -d_c
7. Scatter: Z_c^s = B_c^s Z_c
8. Update: v_2^s = F_r^s + S_{rc}^s Z_c^s
9. Local solve: S_{rr}^s Z_f^s = v_2^s
10. Gather: Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s
11. Output: Z
The local solves in step 3 and step 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form S_{rr}^s U_r^s = r_r^s, which can be solved using the following algorithm.
Algorithm 7: Neumann-Solver Procedure
1. Input: r_r^s
2. Solve:

    \begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix}
    \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix}
    =
    \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}

3. Output: U_r^s
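Solving the augmented system applies [S_{rr}^s]^{-1} without ever forming the Schur complement, which the following sketch verifies numerically (NumPy, random SPD stand-in for the subdomain matrix; illustrative only):

```python
import numpy as np

rng = np.random.default_rng(3)
n_i, n_r = 6, 3
M = rng.standard_normal((n_i + n_r, n_i + n_r))
A = M @ M.T + (n_i + n_r) * np.eye(n_i + n_r)   # [[A_ii, A_ir], [A_ri, A_rr]]
r_r = rng.standard_normal(n_r)

# Neumann solve: pad the right-hand side with zeros on the interior dofs
# and keep only the interface part of the solution.
rhs = np.concatenate([np.zeros(n_i), r_r])
U_r = np.linalg.solve(A, rhs)[n_i:]

# Equivalent to applying the inverse Schur complement directly.
A_ii, A_ir = A[:n_i, :n_i], A[:n_i, n_i:]
A_ri, A_rr = A[n_i:, :n_i], A[n_i:, n_i:]
S_rr = A_rr - A_ri @ np.linalg.solve(A_ii, A_ir)
assert np.allclose(U_r, np.linalg.solve(S_rr, r_r))
```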
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner as

    M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c,

where

    M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.
Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flowchart.
Figure 6. Flowchart of parallel PCGM to solve the primal interface problem.
9.2. Dual-primal method
In this subsection we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem in Eq.(29).
For the jth iteration of Algorithm 1, the matrix-vector product in step 7, defined as

    Q_j = \big( F_{rr} + F_{rc} [F_{cc}]^{-1} F_{cr} \big) P_j,

can be computed using the following algorithm.
Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input: P
2. Scatter: P^s = B_r^{sT} P
3. Local solve: S_{rr}^s v_1^s = P^s
4. Compute: v_2^s = S_{cr}^s v_1^s
5. Gather: V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s
6. Global solve: F_{cc} V_3 = V_2
7. Scatter: v_3^s = B_c^s V_3
8. Compute: v_4^s = S_{rc}^s v_3^s
9. Update: v_5^s = P^s + v_4^s
10. Local solve: S_{rr}^s Q^s = v_5^s
11. Gather: Q = \sum_{s=1}^{n_s} B_r^s Q^s
12. Output: Q
The local solves in steps 3 and 10 of Algorithm 8 are carried out by solving a subdomain-level Neumann problem as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Effect Procedure
1. Input: r_Γ
2. Scatter: r_Γ^s = D_r^s B_r^{sT} r_Γ
3. Compute: Z^s = S_{rr}^s r_Γ^s
4. Gather: Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output: Z
We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flowchart.
Figure 7. Flowchart of parallel PCGM to solve the dual-primal interface problem.
10. Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients, given as

    \frac{\partial}{\partial x} \Big[ c_x(x,y,\theta) \frac{\partial u(x,y,\theta)}{\partial x} \Big]
    + \frac{\partial}{\partial y} \Big[ c_y(x,y,\theta) \frac{\partial u(x,y,\theta)}{\partial y} \Big]
    = f(x,y) \quad \text{in } \Omega,

where the forcing term is f(x,y) = 1.0. For simplicity, a homogeneous Dirichlet boundary condition is imposed:

    u(x,y,\theta) = 0 \quad \text{on } \partial\Omega.
The random coefficients c_x(x,y,θ) and c_y(x,y,θ) are modeled as independent lognormal random variables. The underlying Gaussian random variable has a mean of 1.0 and a standard deviation of 0.25.
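For reference, the moments of such a lognormal coefficient follow in closed form from the underlying Gaussian parameters; a quick Monte Carlo check (NumPy; illustrative, not part of the paper's solver):

```python
import numpy as np

mu, sigma = 1.0, 0.25          # underlying Gaussian mean and std (from the text)
g = np.random.default_rng(4).normal(mu, sigma, size=200_000)
c = np.exp(g)                  # lognormal coefficient samples

# Closed-form lognormal moments: E[c] = exp(mu + sigma^2/2),
# CoV[c] = sqrt(exp(sigma^2) - 1).
mean_exact = np.exp(mu + sigma**2 / 2)
cov_exact = np.sqrt(np.exp(sigma**2) - 1.0)

assert abs(c.mean() - mean_exact) / mean_exact < 0.01
assert abs(c.std() / c.mean() - cov_exact) / cov_exact < 0.02
```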
In the PCGM implementation, the forcing term is taken to be the initial residual and the iterations are terminated when the ratio of the L_2 norms of the current and initial residuals drops below 10^{-5}:

    \frac{\| G_{\Gamma} - S U_{\Gamma}^k \|_2}{\| G_{\Gamma} \|_2} \leq 10^{-5}.
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1. Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11 respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23 the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh.    Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field.    Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u_0.    Figure 13. Chaos coefficients u_1.
Figure 14. Chaos coefficients u_2.    Figure 15. Chaos coefficients u_3.
Figure 16. Chaos coefficients u_4.    Figure 17. Chaos coefficients u_5.
Figure 18. Lagrange multipliers λ_0.    Figure 19. Lagrange multipliers λ_1.

10.2. Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size: the number of subdomains is fixed at 100 while both the mesh resolution in the spatial dimension and the Polynomial Chaos order are increased, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, a first order Polynomial Chaos expansion leads to a total problem size of 3n, and a third order expansion leads to a total problem size of 10n.
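These growth factors are simply the number of Polynomial Chaos terms: for d input random variables and order p, the expansion has C(d+p, p) terms, and with the two independent random coefficients c_x and c_y (d = 2, an assumption consistent with the quoted sizes) this gives 3 and 10:

```python
from math import comb

d = 2  # number of input random variables (c_x and c_y) -- assumption
terms = {p: comb(d + p, p) for p in (1, 2, 3)}

# Order 1 -> 3 terms (total size 3n), order 3 -> 10 terms (total size 10n),
# matching the problem sizes quoted in the text.
assert terms[1] == 3
assert terms[3] == 10
print(terms)   # {1: 3, 2: 6, 3: 10}
```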
Figure 20. Lagrange multipliers λ_2.    Figure 21. Lagrange multipliers λ_3.
Figure 22. Lagrange multipliers λ_4.    Figure 23. Lagrange multipliers λ_5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM             DP-DDM
                1st  2nd  3rd    1st  2nd  3rd
 10051          10   10   10     8    8    8
 20303          11   11   11     8    8    8
 40811          11   12   12     8    9    9
 59935          13   14   14     10   10   10
 71386          12   12   12     9    9    9
 80172          11   11   12     8    8    8
101851          12   12   12     9    9    9

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2 for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. 24-26.
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs    PDDM             DP-DDM
        1st  2nd  3rd    1st  2nd  3rd
 20     10   11   11     8    8    8
 40     12   12   12     9    9    9
 60     12   13   13     9    9    9
 80     12   12   13     9    9    9
100     12   12   12     9    9    9
120     12   12   12     9    9    9
140     11   11   12     8    8    8
160     12   12   12     8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains    PDDM             DP-DDM
              1st  2nd  3rd    1st  2nd  3rd
100           10   10   10     8    8    8
200           10   10   11     8    8    8
400           12   13   13     9    9    9
600           11   12   12     8    8    9
800           12   13   13     9    9    9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding
more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the performances of the primal method and the dual-primal method demonstrate a similar trend; this fact points out the similarity (duality) between the two methods through numerical experiments.
11. Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100).

CoV     PDDM             DP-DDM
        1st  2nd  3rd    1st  2nd  3rd
0.05    10   10   10     8    8    8
0.10    10   10   10     8    8    8
0.15    10   10   10     8    8    8
0.20    10   10   10     8    8    8
0.25    10   10   10     8    8    8
0.30    10   10   11     8    8    8
0.35    10   10   11     8    8    8
0.40    10   11   11     8    8    9
0.45    10   11   12     8    8    9
0.50    10   11   12     8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250-271
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
9 Parallel implementation
In this section, we outline the parallel implementation of PCGM to solve the primal (Eq. (14)) and dual-primal (Eq. (29)) interface problems. As mentioned previously, in PCGM the coefficient matrix need not be constructed explicitly, since only its action on a vector is required. This matrix-vector product can be computed concurrently by solving subdomain-level problems (Dirichlet and Neumann) and a global coarse problem.
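The matrix-free idea above can be sketched in a minimal, single-process form (an illustration only, not the authors' MPI implementation): PCGM needs only a routine that returns the action of the operator on a vector, supplied here as a callable.

```python
import numpy as np

def pcg(apply_F, apply_Minv, g, tol=1e-5, max_iter=200):
    """Preconditioned conjugate gradient where the operator F is
    available only through its action on a vector (matrix-free)."""
    u = np.zeros_like(g)
    r = g - apply_F(u)            # initial residual
    z = apply_Minv(r)
    p = z.copy()
    rz = r @ z
    r0_norm = np.linalg.norm(r)
    for _ in range(max_iter):
        q = apply_F(p)            # the only access to the operator
        alpha = rz / (p @ q)
        u += alpha * p
        r -= alpha * q
        if np.linalg.norm(r) / r0_norm < tol:   # relative-residual test
            break
        z = apply_Minv(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return u

# toy SPD system standing in for the interface problem
A = np.array([[4.0, 1.0], [1.0, 3.0]])
g = np.array([1.0, 2.0])
u = pcg(lambda v: A @ v, lambda v: v / np.diag(A), g)   # Jacobi preconditioner
```

In the paper, `apply_F` corresponds to the subdomain Dirichlet/Neumann solves plus the coarse solve described below, performed concurrently across processes.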
9.1 Primal method
In this subsection, we briefly describe the parallel implementation of Algorithm 1 for solving the primal interface problem in Eq. (14).
For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$Q_j = \left(F_{rr} - F_{rc}[F_{cc}]^{-1}F_{cr}\right)P_j,$

can be computed using the following algorithm.
Algorithm 4 Parallel Matrix-Vector Product for Primal Method
1. Input ($P$)
2. Scatter: $P^s = B_r^s P$
3. Compute: $v_1^s = S_{cr}^s P^s$
4. Gather: $V_1 = \sum_{s=1}^{n_s} B_r^{sT} v_1^s$
5. Global Solve: $F_{cc} V_2 = V_1$
6. Scatter: $v_2^s = B_c^s V_2$
7. Compute: $v_3^s = S_{rc}^s v_2^s$
8. Update: $Q^s = S_{rr}^s P^s - v_3^s$
9. Gather: $Q = \sum_{s=1}^{n_s} B_r^{sT} Q^s$
10. Output ($Q$)
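The Scatter and Gather steps apply the Boolean restriction operators $B_r^s$ and their transposes. In practice these are index maps rather than explicit matrices; the following small sketch (the subdomain index sets are made up for the example) shows the sub-assembly pattern, where interface entries shared by two subdomains accumulate contributions from both.

```python
import numpy as np

n_gamma = 6                                   # global interface unknowns (illustrative)
# each subdomain owns a subset of interface indices (B_r^s as an index map)
subdomain_idx = [np.array([0, 1, 2, 3]), np.array([2, 3, 4, 5])]

P = np.arange(1.0, n_gamma + 1)               # a global interface vector

# Scatter: P^s = B_r^s P  (pick out the local entries)
P_local = [P[idx] for idx in subdomain_idx]

# Gather: V = sum_s (B_r^s)^T v^s  (sub-assembly; shared indices accumulate)
V = np.zeros(n_gamma)
for idx, v in zip(subdomain_idx, P_local):
    np.add.at(V, idx, v)
```

Here indices 2 and 3 are shared by both subdomains, so their gathered values are doubled.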
Multiplication of a Schur complement matrix by a vector in steps 3, 7 and 8 of Algorithm 4 is computed by solving a corresponding Dirichlet problem:

$v_\alpha^s = S_{\alpha\beta}^s v_\beta^s,$
$v_\alpha^s = \left(A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s\right) v_\beta^s.$

This procedure is outlined in the following algorithm.
Algorithm 5 Dirichlet Solver Procedure
1. Input ($v_\beta^s$)
2. Compute: $v_1^s = A_{i\beta}^s v_\beta^s$
3. Solve: $A_{ii}^s v_2^s = v_1^s$
4. Compute: $v_3^s = A_{\alpha i}^s v_2^s$
5. Compute: $v_4^s = A_{\alpha\beta}^s v_\beta^s$
6. Compute: $v_\alpha^s = v_4^s - v_3^s$
7. Output ($v_\alpha^s$)
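Algorithm 5 applies the subdomain Schur complement $S_{\alpha\beta}^s = A_{\alpha\beta}^s - A_{\alpha i}^s [A_{ii}^s]^{-1} A_{i\beta}^s$ to a vector without ever forming it. A dense NumPy sketch on a randomly generated SPD block matrix (block sizes are illustrative) checks the steps against the explicit Schur complement:

```python
import numpy as np

rng = np.random.default_rng(0)
ni, nb = 6, 3                                 # interior / interface sizes (illustrative)
A = rng.standard_normal((ni + nb, ni + nb))
A = A @ A.T + (ni + nb) * np.eye(ni + nb)     # make the matrix SPD
Aii, Aib = A[:ni, :ni], A[:ni, ni:]
Abi, Abb = A[ni:, :ni], A[ni:, ni:]

def apply_schur(v):
    """Steps of Algorithm 5: S v = A_bb v - A_bi A_ii^{-1} A_ib v."""
    v1 = Aib @ v                              # step 2
    v2 = np.linalg.solve(Aii, v1)             # step 3 (interior Dirichlet solve)
    v3 = Abi @ v2                             # step 4
    v4 = Abb @ v                              # step 5
    return v4 - v3                            # step 6

v = rng.standard_normal(nb)
S = Abb - Abi @ np.linalg.solve(Aii, Aib)     # explicit Schur complement, for checking
```

In the actual solver, the step-3 solve would reuse a factorization of $A_{ii}^s$ held by each subdomain rather than calling a dense solver.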
The global problem in step 5 of Algorithm 4 is solved iteratively using PCGM equipped with a lumped preconditioner:

$M_{cc}^{-1} F_{cc} V_2 = M_{cc}^{-1} V_1,$ where $M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$

Next, the effect of the two-level preconditioner in step 12 of Algorithm 1 is computed by solving a subdomain-level Neumann problem and a global coarse problem. The procedure is outlined in the following algorithm.
Algorithm 6 Two-Level Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $F_r^s = D_r^s B_r^s r_\Gamma$
3. Local Solve: $S_{rr}^s v_1^s = F_r^s$
4. Compute: $d_c^s = S_{cr}^s v_1^s$
5. Gather: $d_c = \sum_{s=1}^{n_s} B_c^{sT} d_c^s$
6. Global Solve: $F_{cc}^{*} Z_c = -d_c$
7. Scatter: $Z_c^s = B_c^s Z_c$
8. Update: $v_2^s = F_r^s + S_{rc}^s Z_c^s$
9. Local Solve: $S_{rr}^s Z_f^s = v_2^s$
10. Gather: $Z = \sum_{s=1}^{n_s} B_r^{sT} D_r^s Z_f^s$
11. Output ($Z$)
The local solves in step 3 and step 9 of Algorithm 6 constitute a subdomain-level Neumann problem of the form $S_{rr}^s U_r^s = r_r^s$, which can be solved using the following algorithm.
Algorithm 7 Neumann Solver Procedure
1. Input ($r_r^s$)
2. Solve: $\begin{bmatrix} A_{ii}^s & A_{ir}^s \\ A_{ri}^s & A_{rr}^s \end{bmatrix} \begin{Bmatrix} X^s \\ U_r^s \end{Bmatrix} = \begin{Bmatrix} 0 \\ r_r^s \end{Bmatrix}$
3. Output ($U_r^s$)
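A small check of Algorithm 7 on a randomly generated SPD subdomain matrix (all sizes illustrative): solving the full block system with a zero interior load is algebraically equivalent to applying $(S_{rr}^s)^{-1}$ to $r_r^s$, since eliminating the first block row gives $(A_{rr}^s - A_{ri}^s [A_{ii}^s]^{-1} A_{ir}^s) U_r^s = r_r^s$.

```python
import numpy as np

rng = np.random.default_rng(1)
ni, nr = 5, 3                                    # interior / remaining interface sizes
K = rng.standard_normal((ni + nr, ni + nr))
K = K @ K.T + (ni + nr) * np.eye(ni + nr)        # SPD subdomain matrix
Aii, Air = K[:ni, :ni], K[:ni, ni:]
Ari, Arr = K[ni:, :ni], K[ni:, ni:]

r = rng.standard_normal(nr)
rhs = np.concatenate([np.zeros(ni), r])          # zero interior load
x = np.linalg.solve(K, rhs)                      # step 2 of Algorithm 7
Ur = x[ni:]

# explicit Schur complement, for checking the equivalence
Srr = Arr - Ari @ np.linalg.solve(Aii, Air)
```

This is why the Neumann solve never requires $S_{rr}^s$ explicitly: one sparse factorization of the subdomain matrix suffices.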
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

$M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c,$ where $M_{cc}^{-1} = \sum_{s=1}^{n_s} B_c^{sT} A_{cc}^s B_c^s.$

Finally, we summarize the parallel implementation of PCGM to solve the primal interface problem in the following flowchart.
Figure 6 Flowchart of Parallel PCGM to solve the primal interface problem
9.2 Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem, Eq. (29).
For the $j$th iteration of Algorithm 1, the matrix-vector product in step 7, defined as

$Q_j = \left(F_{rr} + F_{rc}[F_{cc}]^{-1}F_{cr}\right)P_j,$

can be computed using the following algorithm.
Algorithm 8 Parallel Matrix-Vector Product for Dual-Primal Method
1. Input ($P$)
2. Scatter: $P^s = B_r^{sT} P$
3. Local Solve: $S_{rr}^s v_1^s = P^s$
4. Compute: $v_2^s = S_{cr}^s v_1^s$
5. Gather: $V_2 = \sum_{s=1}^{n_s} B_c^{sT} v_2^s$
6. Global Solve: $F_{cc} V_3 = V_2$
7. Scatter: $v_3^s = B_c^s V_3$
8. Compute: $v_4^s = S_{rc}^s v_3^s$
9. Update: $v_5^s = P^s + v_4^s$
10. Local Solve: $S_{rr}^s Q^s = v_5^s$
11. Gather: $Q = \sum_{s=1}^{n_s} B_r^s Q^s$
12. Output ($Q$)
The local solves in step 3 and step 10 of Algorithm 8 are performed by solving a subdomain-level Neumann problem, as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure used for the coarse problem in the primal preconditioner.
Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1. Input ($r_\Gamma$)
2. Scatter: $r_\Gamma^s = D_r^s B_r^{sT} r_\Gamma$
3. Compute: $Z^s = S_{rr}^s r_\Gamma^s$
4. Gather: $Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s$
5. Output ($Z$)
We summarize the parallel implementation of PCGM to solve the dual-primal interface problem in the following flowchart.
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
10 Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

$\frac{\partial}{\partial x}\left[c_x(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial x}\right] + \frac{\partial}{\partial y}\left[c_y(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial y}\right] = f(x,y) \quad \text{in } \Omega,$

where the forcing term is $f(x,y) = 1.0$. For simplicity, a homogeneous Dirichlet boundary condition is imposed: $u(x,y,\theta) = 0$ on $\partial\Omega$.
The random coefficients $c_x(x,y,\theta)$ and $c_y(x,y,\theta)$ are modeled as independent lognormal random variables. The underlying Gaussian random variable has mean 1.0 and standard deviation 0.25.
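For concreteness, this coefficient model can be sampled as follows; a sketch assuming the lognormal variable is simply the exponential of the stated underlying Gaussian (mean 1.0, standard deviation 0.25):

```python
import numpy as np

rng = np.random.default_rng(42)
mu, sigma = 1.0, 0.25          # underlying Gaussian mean and std (from the text)
g = rng.normal(mu, sigma, size=100_000)
c = np.exp(g)                  # lognormal coefficient samples, always positive

# moments of a lognormal: E[c] = exp(mu + sigma^2 / 2)
expected_mean = np.exp(mu + sigma**2 / 2)
```

The exponential map guarantees positive coefficients, which is why lognormal models are a common choice for diffusion-type material properties.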
In the PCGM implementation, the forcing term is taken to be the initial residual, and the iterations are terminated when the ratio of the $L_2$ norms of the current and initial residuals falls below $10^{-5}$:

$\frac{\left\| \mathcal{G}_\Gamma^k - \mathcal{S}\,\mathcal{U}_\Gamma^k \right\|_2}{\left\| \mathcal{G}_\Gamma^0 \right\|_2} \leq 10^{-5}.$
Numerical experiments are performed on a Linux cluster with an InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1 Stochastic features
Finite element discretization with linear triangular elements results in 202242 elements and 101851 nodes. The random coefficients and the response are represented by a third order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1018510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. In Figs. 18-23, the Polynomial Chaos coefficients of the Lagrange multipliers are shown.
Figure 8. A typical FEM mesh.    Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field.    Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients $u_0$.    Figure 13. Chaos coefficients $u_1$.
Figure 14. Chaos coefficients $u_2$.    Figure 15. Chaos coefficients $u_3$.
Figure 16. Chaos coefficients $u_4$.    Figure 17. Chaos coefficients $u_5$.
Figure 18. Lagrange multipliers $\lambda_0$.    Figure 19. Lagrange multipliers $\lambda_1$.

10.2 Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that, for a given spatial problem size $n$, a first order Polynomial Chaos expansion leads to a total problem size of $3n$, while a third order expansion leads to a total problem size of $10n$.
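The factors 3 and 10 follow from the number of Polynomial Chaos terms, $\binom{d+p}{p}$ for $d$ random variables and expansion order $p$; a quick check assuming the two independent random coefficients are the only random variables ($d = 2$), consistent with the reported total sizes:

```python
from math import comb

def num_pc_terms(d, p):
    """Number of Polynomial Chaos terms for d random variables, order p."""
    return comb(d + p, p)

first_order = num_pc_terms(2, 1)   # total size 3*n for first order
third_order = num_pc_terms(2, 3)   # total size 10*n for third order
```

Each spatial degree of freedom is replicated once per chaos term, so the total system size is the spatial size times the term count.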
Figure 20. Lagrange multipliers $\lambda_2$.    Figure 21. Lagrange multipliers $\lambda_3$.
Figure 22. Lagrange multipliers $\lambda_4$.    Figure 23. Lagrange multipliers $\lambda_5$.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM             DP-DDM
               1st  2nd  3rd    1st  2nd  3rd
10051          10   10   10     8    8    8
20303          11   11   11     8    8    8
40811          11   12   12     8    9    9
59935          13   14   14     10   10   10
71386          12   12   12     9    9    9
80172          11   11   12     8    8    8
101851         12   12   12     9    9    9

Secondly, we fix the problem size in the spatial domain to 71386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2 for first, second and third order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. The dual-primal method requires slightly fewer iterations to converge than the primal method, which may be attributed to the fact that the initial residual in the dual-primal method is smaller than that in the primal method. However, the rates of convergence of the two methods are almost identical, as indicated in Figs. 24-26.
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71386 dof.

CPUs   PDDM             DP-DDM
       1st  2nd  3rd    1st  2nd  3rd
20     10   11   11     8    8    8
40     12   12   12     9    9    9
60     12   13   13     9    9    9
80     12   12   13     9    9    9
100    12   12   12     9    9    9
120    12   12   12     9    9    9
140    11   11   12     8    8    8
160    12   12   12     8    8    9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101851 dof).

Subdomains   PDDM             DP-DDM
             1st  2nd  3rd    1st  2nd  3rd
100          10   10   10     8    8    8
200          10   10   11     8    8    8
400          12   13   13     9    9    9
600          11   12   12     8    8    9
800          12   13   13     9    9    9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for first, second and third order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the primal and the dual-primal methods demonstrate similar performance trends, a fact that points to the similarity (duality) between the two methods as observed through the numerical experiments.
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.

11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal and dual-primal methods are numerically scalable with respect to problem size, subdomain size and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. It is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters; this aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust, and thank Dr Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dofs) and fixed number of subdomains (100).

CoV    PDDM             DP-DDM
       1st  2nd  3rd    1st  2nd  3rd
0.05   10   10   10     8    8    8
0.10   10   10   10     8    8    8
0.15   10   10   10     8    8    8
0.20   10   10   10     8    8    8
0.25   10   10   10     8    8    8
0.30   10   10   11     8    8    8
0.35   10   10   11     8    8    8
0.40   10   11   11     8    8    9
0.45   10   11   12     8    8    9
0.50   10   11   12     8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) (Berlin, Heidelberg: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250-271
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
The global solve of the coarse problem in step 6 of Algorithm 6 is conducted in parallel using PCGM equipped with a lumped preconditioner:

M_{cc}^{-1} F_{cc}^{*} Z_c = -M_{cc}^{-1} d_c,

where

M_{cc}^{-1} = \sum_{s=1}^{n_s} (B_c^s)^T A_{cc}^s B_c^s.

Finally, we summarize the parallel implementation of the PCGM to solve the primal interface problem in the following flow chart.
Figure 6. Flowchart of the parallel PCGM to solve the primal interface problem.
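The action of the lumped preconditioner above is a gather-multiply-scatter over subdomain corner blocks, with no local solves. A minimal serial sketch, assuming the Boolean restriction operators B_c^s are represented as index arrays (the names `corner_maps` and `A_cc_local` are illustrative, not from the paper):

```python
import numpy as np

def apply_lumped_preconditioner(v, corner_maps, A_cc_local):
    """Compute z = sum_s (B_c^s)^T A_cc^s B_c^s v.

    corner_maps[s]: global indices of subdomain s's corner dofs (stands in for B_c^s).
    A_cc_local[s]:  local corner stiffness block A_cc^s (dense here for simplicity).
    """
    z = np.zeros_like(v)
    for idx, A in zip(corner_maps, A_cc_local):
        z[idx] += A @ v[idx]  # restrict, multiply by the local block, prolongate
    return z
```

Because only local matrix-vector products are involved, applying this preconditioner is much cheaper than applying the coarse operator itself.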
9.2. Dual-primal method
In this subsection, we outline the parallel implementation of Algorithm 1 to solve the dual-primal interface problem, Eq. (29).

For the jth iteration of Algorithm 1, the matrix-vector product in step 7, defined as

Q_j = (F_{rr} + F_{rc} F_{cc}^{-1} F_{cr}) P_j,

can be computed using the following algorithm.

Algorithm 8: Parallel Matrix-Vector Product for the Dual-Primal Method
1. Input (P)
2. Scatter: P^s = (B_r^s)^T P
3. Local Solve: S_{rr}^s v_1^s = P^s
4. Compute: v_2^s = S_{cr}^s v_1^s
5. Gather: V_2 = \sum_{s=1}^{n_s} (B_c^s)^T v_2^s
6. Global Solve: F_{cc} V_3 = V_2
7. Scatter: v_3^s = B_c^s V_3
8. Compute: v_4^s = S_{rc}^s v_3^s
9. Update: v_5^s = P^s + v_4^s
10. Local Solve: S_{rr}^s Q^s = v_5^s
11. Gather: Q = \sum_{s=1}^{n_s} B_r^s Q^s
12. Output (Q)
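As a serial stand-in for the MPI implementation, Algorithm 8 can be sketched as follows. Assumptions: the B operators are treated as plain Boolean maps encoded as index arrays (ignoring the signed jump structure of FETI-type operators), all blocks are dense, and every name is illustrative:

```python
import numpy as np

def dp_matvec(P, maps_r, maps_c, S_rr, S_cr, S_rc, F_cc):
    """Q = (F_rr + F_rc F_cc^{-1} F_cr) P, following the steps of Algorithm 8."""
    n_s = len(maps_r)
    V2 = np.zeros(F_cc.shape[0])
    P_local = []
    for s in range(n_s):
        Ps = P[maps_r[s]]                        # step 2: scatter
        v1s = np.linalg.solve(S_rr[s], Ps)       # step 3: local solve
        V2[maps_c[s]] += S_cr[s] @ v1s           # steps 4-5: compute and gather
        P_local.append(Ps)
    V3 = np.linalg.solve(F_cc, V2)               # step 6: global coarse solve
    Q = np.zeros_like(P)
    for s in range(n_s):
        v3s = V3[maps_c[s]]                      # step 7: scatter
        v5s = P_local[s] + S_rc[s] @ v3s         # steps 8-9: compute and update
        Q[maps_r[s]] += np.linalg.solve(S_rr[s], v5s)  # steps 10-11: solve and gather
    return Q
```

In the parallel code, the two loops are embarrassingly parallel over subdomains, and only steps 5, 6, and 11 require communication.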
The local solves in steps 3 and 10 of Algorithm 8 are carried out by solving a subdomain-level Neumann problem, as outlined in Algorithm 7. The global coarse problem in step 6 of Algorithm 8 is solved in parallel using PCGM with a lumped preconditioner, similar to the procedure for solving the coarse problem in the primal preconditioner.

Next, the effect of the Dirichlet preconditioner in step 12 of Algorithm 1 is obtained using the following algorithm.
Algorithm 9: Dirichlet Preconditioner Application
1. Input (r_Γ)
2. Scatter: r_Γ^s = D_r^s (B_r^s)^T r_Γ
3. Compute: Z^s = S_{rr}^s r_Γ^s
4. Gather: Z = \sum_{s=1}^{n_s} B_r^s D_r^s Z^s
5. Output (Z)
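Under the same serial, index-array conventions as before (hypothetical names; D_r^s taken as a diagonal weight matrix), Algorithm 9 reduces to a weighted scatter, a multiplication by the local Schur complement, and a weighted gather:

```python
import numpy as np

def apply_dirichlet_preconditioner(r, maps_r, D_r, S_rr):
    """Z = sum_s B_r^s D_r^s S_rr^s D_r^s (B_r^s)^T r, following Algorithm 9."""
    Z = np.zeros_like(r)
    for idx, D, S in zip(maps_r, D_r, S_rr):
        rs = D @ r[idx]      # step 2: weighted scatter
        Zs = S @ rs          # step 3: multiply by the local Schur complement
        Z[idx] += D @ Zs     # step 4: weighted gather
    return Z
```

Note that, unlike steps 3 and 10 of Algorithm 8, the preconditioner multiplies by S_{rr}^s rather than solving with it.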
We summarize the parallel implementation of the PCGM to solve the dual-primal interface problem in the following flow chart.

Figure 7. Flowchart of the parallel PCGM to solve the dual-primal interface problem.
10. Numerical results
For numerical illustration of the aforementioned mathematical framework, we consider a stationary stochastic Poisson equation with randomly heterogeneous coefficients:

\frac{\partial}{\partial x}\Big[c_x(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial x}\Big] + \frac{\partial}{\partial y}\Big[c_y(x,y,\theta)\,\frac{\partial u(x,y,\theta)}{\partial y}\Big] = f(x,y) \quad \text{in } \Omega,

where the forcing term is f(x,y) = 1.0. For simplicity, a homogeneous Dirichlet boundary condition is imposed: u(x,y,\theta) = 0 on \partial\Omega.

The random coefficients c_x(x,y,\theta) and c_y(x,y,\theta) are modeled as independent lognormal random variables. The underlying Gaussian random variable has mean 1.0 and standard deviation 0.25.
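As a quick sanity check on this coefficient model (an illustrative Monte Carlo sketch, not part of the paper's Polynomial Chaos machinery), the first two moments of such a lognormal coefficient follow directly from the underlying Gaussian parameters:

```python
import numpy as np

# Underlying Gaussian parameters, as used for c_x and c_y in the example.
mu, sigma = 1.0, 0.25

# Monte Carlo samples of the lognormal coefficient.
rng = np.random.default_rng(0)
c = np.exp(rng.normal(mu, sigma, size=200_000))

# Analytic lognormal moments for comparison.
mean_exact = np.exp(mu + sigma**2 / 2)                  # approx 2.805
std_exact = mean_exact * np.sqrt(np.exp(sigma**2) - 1)  # approx 0.712
```

The sample moments agree with the analytic values, confirming the intended strictly positive, non-Gaussian coefficient field.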
In the PCGM implementation, the forcing term is taken as the initial residual, and the iterations are terminated when the ratio of the L_2 norms of the current and initial residuals falls below 10^{-5}:

\frac{\| G_\Gamma - S\,U_\Gamma^k \|_2}{\| G_\Gamma \|_2} \leq 10^{-5}.
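This stopping rule can be embedded in a generic preconditioned conjugate gradient driver; a minimal dense sketch, where the operator `A` and the preconditioner `M_inv` are passed as callables standing in for the parallel Schur-complement and preconditioner applications:

```python
import numpy as np

def pcgm(A, b, M_inv, tol=1e-5, maxit=200):
    """Preconditioned CG with relative-residual stopping ||r_k|| / ||r_0|| <= tol."""
    x = np.zeros_like(b)
    r = b - A(x)                 # forcing term is the initial residual (x0 = 0)
    r0_norm = np.linalg.norm(r)
    z = M_inv(r)
    p = z.copy()
    for k in range(maxit):
        if np.linalg.norm(r) / r0_norm <= tol:
            return x, k
        Ap = A(p)
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        z_new = M_inv(r_new)
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x, maxit
```

Example usage: a small SPD system with a Jacobi preconditioner converges in a couple of iterations, mirroring the near-constant iteration counts reported below.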
Numerical experiments are performed on a Linux cluster with InfiniBand interconnect (two quad-core 3.0 GHz Intel Xeon processors and 32 GB of memory per node) using the MPI [7] and PETSc [6] parallel libraries. The graph-partitioning tool METIS [8] is used to decompose the finite element mesh.
10.1. Stochastic features
Finite element discretization with linear triangular elements results in 202,242 elements and 101,851 nodes. The random coefficients and the response are represented by a third-order Polynomial Chaos expansion (L = 7, N = 9), leading to a linear system of order 1,018,510. Fig. 8 shows a typical finite element mesh, while Fig. 9 shows a typical mesh decomposition. The mean and the associated standard deviation of the solution process are shown in Fig. 10 and Fig. 11, respectively. The maximum value of the coefficient of variation of the solution field is 0.20. Details of the stochastic features of the solution field are shown in Figs. 12-17 through their Polynomial Chaos coefficients. The mean and the standard deviation of the solution field computed using the dual-primal method (not shown here) exactly match the results from the primal method. Figs. 18-23 show the Polynomial Chaos coefficients of the Lagrange multipliers.
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficients u_0. Figure 13. Chaos coefficients u_1.
10.2. Scalability study
Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously, increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, the first-order Polynomial Chaos expansion leads to a total problem size of 3n, and the third-order expansion leads to a total problem size of 10n.

Figure 14. Chaos coefficients u_2. Figure 15. Chaos coefficients u_3.
Figure 16. Chaos coefficients u_4. Figure 17. Chaos coefficients u_5.
Figure 18. Lagrange multipliers λ_0. Figure 19. Lagrange multipliers λ_1.
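The quoted factors follow from the standard Polynomial Chaos term count, C(L+p, p) terms for an expansion of order p in L random variables; assuming L = 2 underlying random variables (c_x and c_y), which is consistent with the 3n and 10n factors stated above:

```python
from math import comb

def pc_terms(L, p):
    """Number of Polynomial Chaos terms of total order <= p in L random variables."""
    return comb(L + p, p)

first_order = pc_terms(2, 1)          # 3 terms  -> total problem size 3n
third_order = pc_terms(2, 3)          # 10 terms -> total problem size 10n

# For the 101,851-node mesh of Section 10.1, third order gives a system of order:
total_order = third_order * 101_851   # 1,018,510
```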
Figure 20. Lagrange multipliers λ_2. Figure 21. Lagrange multipliers λ_3.
Figure 22. Lagrange multipliers λ_4. Figure 23. Lagrange multipliers λ_5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size    PDDM (1st 2nd 3rd)    DP-DDM (1st 2nd 3rd)
10051           10  10  10            8   8   8
20303           11  11  11            8   8   8
40811           11  12  12            8   9   9
59935           13  14  14            10  10  10
71386           12  12  12            9   9   9
80172           11  11  12            8   8   8
101851          12  12  12            9   9   9

Secondly, we fix the problem size in the spatial domain to 71,386 dofs and increase the number of subdomains used to solve the problem; the results are presented in Table 2. The results are reported for first-, second-, and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the initial residual in the dual-primal method is smaller than the initial residual in the primal method. However, the rate of convergence of the two methods is almost the same, as indicated in Figs. 24-26.
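One way to make this observation quantitative is the average reduction factor per iteration, ρ = (||r_k|| / ||r_0||)^{1/k}. With hypothetical residual histories (made-up numbers for illustration only), a smaller initial residual shifts the convergence curve down without changing its slope:

```python
import numpy as np

def avg_convergence_factor(residuals):
    """Geometric-mean residual reduction factor rho = (r_k / r_0)**(1/k)."""
    r = np.asarray(residuals, dtype=float)
    return (r[-1] / r[0]) ** (1.0 / (len(r) - 1))

# Illustrative histories only: same slope, different starting residual,
# so the second sequence reaches the tolerance one iteration earlier.
primal_hist      = [1.0, 0.3, 0.09, 0.027, 0.0081]
dual_primal_hist = [0.5, 0.15, 0.045, 0.0135]
```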
Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dofs.

CPUs    PDDM (1st 2nd 3rd)    DP-DDM (1st 2nd 3rd)
20      10  11  11            8   8   8
40      12  12  12            9   9   9
60      12  13  13            9   9   9
80      12  12  13            9   9   9
100     12  12  12            9   9   9
120     12  12  12            9   9   9
140     11  11  12            8   8   8
160     12  12  12            8   8   9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dofs).

Subdomains    PDDM (1st 2nd 3rd)    DP-DDM (1st 2nd 3rd)
100           10  10  10            8   8   8
200           10  10  11            8   8   8
400           12  13  13            9   9   9
600           11  12  12            8   8   9
800           12  13  13            9   9   9

Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and dual-primal methods for first-, second-, and third-order Polynomial Chaos expansions. Again, these results suggest that both methods are scalable with respect to a fixed problem size per subdomain.
Fourthly, we study the performance of the primal and dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations is nearly constant.
Finally, it is worth mentioning that the primal and dual-primal methods demonstrate similar performance trends, which points out, through numerical experiments, the similarity (duality) between the two methods.
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
(In each plot, the relative residual of P-DDM and DP-DDM is shown over iterations 0-14 on a logarithmic scale from 10^{-5} to 10^{1}.)

11. Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed primal and dual-primal methods are numerically scalable with respect to the problem size, subdomain size, and number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chairs Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust. The authors thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101,851 dofs), and fixed number of subdomains (100).

CoV     PDDM (1st 2nd 3rd)    DP-DDM (1st 2nd 3rd)
0.05    10  10  10            8   8   8
0.10    10  10  10            8   8   8
0.15    10  10  10            8   8   8
0.20    10  10  10            8   8   8
0.25    10  10  10            8   8   8
0.30    10  10  11            8   8   8
0.35    10  10  11            8   8   8
0.40    10  11  11            8   8   9
0.45    10  11  12            8   8   9
0.50    10  11  12            8   8   9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) (Berlin Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66(2) 250–271
High Performance Computing Symposium (HPCS2010), IOP Publishing. Journal of Physics: Conference Series 256 (2010) 012001. doi:10.1088/1742-6596/256/1/012001
6 Global Solve FccV3 = V2
7 Scatter vs3 = Bs
cV3
8 Compute vs4 = Ss
rcvs3
9 Update vs5 = Ps + vs
4
10 Local Solve SsrrQ
s = vs5
11 Gather Q =
nssum
s=1
BsrQ
s
12 Output (Q)
The local solve in step 3 and step 10 of Algorithm (8) is calculated by solving a subdomainlevel Neumann problem as outlined in Algorithm (7) The global coarse problem in step 6 ofAlgorithm (8) is solved in parallel using PCGM with lumped preconditioner similar to the pro-cedure of solving the coarse problem in the primal preconditioner
Next the effect of the Dirichlet Preconditioner in step 12 of Algorithm (1) is obtained usingthe following algorithm
Algorithm 9 Dirichlet Preconditioner Effect Procedure
1 Input (rΓ )
2 Scatter rsΓ
= DsrB
srT rΓ
3 Compute Zs = Ssrrr
sΓ
4 Gather Z =
nssum
s=1
BsrD
srZ
s
5 Output (Z)
We summarize the parallel implementation of the PCGM to solve the dual-primal interfaceproblem in the following flow chart
Figure 7 Flowchart of Parallel PCGM to solve the dual-primal interface problem
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
19
10 Numerical results
For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as
part
partx[cx(x y θ)
partu(x y θ)
partx] +
part
party[cy(x y θ)
partu(x y θ)
party] = f(x y) in Ω
where the forcing term is
f(x y) = 10
For simplicity a homogeneous Dirichlet boundary condition is imposed as
u(x y θ) = 0 on partΩ
The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025
In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5
GkΓminus SUk
Γ2
G0Γ2
6 10minus5
Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh
101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
20
Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS
Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed
Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1
102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
21
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
22
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Thirdly we fix problem size per subdomain while increase the overall problem size by adding
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
23
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
10 Numerical results
For numerical illustrations to the aforementioned mathematical framework we consider astationary stochastic Poissonrsquos equation with randomly heterogeneous coefficients given as
part
partx[cx(x y θ)
partu(x y θ)
partx] +
part
party[cy(x y θ)
partu(x y θ)
party] = f(x y) in Ω
where the forcing term is
f(x y) = 10
For simplicity a homogeneous Dirichlet boundary condition is imposed as
u(x y θ) = 0 on partΩ
The random coefficients cx(x y θ) and cy(x y θ) are modeled as independent lognormal ran-dom variables The underlying Gaussian random variable has a mean 10 and standard deviation025
In PCGM implementation the forcing term is taken to be the initial residual and theiterations are terminated when the ratio of L2 norms of the current and the initial residualis less than 10minus5
GkΓminus SUk
Γ2
G0Γ2
6 10minus5
Numerical experiments are performed in a Linux cluster with InfiniBand interconnect (2Quad-Core 30 GHz Intel Xeon processors and 32 GB of memory per node) using MPI [7] andPETSc [6] parallel libraries The graph partitioning tool METIS [8] is used to decompose thefinite element mesh
101 Stochastic featuresFinite element discretization with linear triangular elements results in 202242 elements and101851 nodes The random coefficients and the response are represented by third order polyno-mial chaos expansion (L = 7 N = 9) leading to a linear system of order 1018510 Fig(8) showsa typical finite element mesh while Fig(9) shows a typical mesh decomposition The mean andthe associated standard deviation of the solution process are shown in Fig(10) and Fig(11)respectively Clearly the maximum value of the coefficient of variation of the solution field is020 Details of the stochastic features of the solution field are shown in Figs(12-17) throughthere Polynomial Chaos coefficients The mean and the standard deviation of the solution fieldcomputed using the dual-primal method (not shown here) exactly match the results from the pri-mal method In Figs(18-23) the Polynomial Chaos coefficient of Lagrange multipliers are shown
High Performance Computing Symposium (HPCS2010) IOP Publishing
Journal of Physics: Conference Series 256 (2010) 012001 doi:10.1088/1742-6596/256/1/012001
Figure 8. A typical FEM mesh. Figure 9. Mesh partitioning using METIS.
Figure 10. The mean of the solution field. Figure 11. The standard deviation of the solution field.
Figure 12. Chaos coefficient u0. Figure 13. Chaos coefficient u1.
10.2. Scalability study

Firstly, we study the scalability of the algorithms with respect to the problem size: we fix the number of subdomains at 100 while increasing both the mesh resolution in the spatial dimension and the Polynomial Chaos order, as reported in Table 1. Evidently, increasing the mesh resolution by a factor of 10 does not deteriorate the performance of the primal and dual-primal algorithms. Simultaneously increasing the Polynomial Chaos order from first to third order does not affect the performance of the methods. Note that for a given spatial problem size n, a first-order Polynomial Chaos expansion leads to a total problem size of 3n, and a third-order Polynomial Chaos expansion leads to
Figure 14. Chaos coefficient u2. Figure 15. Chaos coefficient u3.
Figure 16. Chaos coefficient u4. Figure 17. Chaos coefficient u5.
Figure 18. Lagrange multipliers λ0. Figure 19. Lagrange multipliers λ1.
a total problem size of 10n.
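The growth factors quoted above follow from the Polynomial Chaos term count: with M random variables and total expansion order p, the number of chaos terms is (M + p)!/(M! p!), and the block linear system replicates each spatial dof once per term. A quick check, under the inference that M = 2 for the two independent random coefficients c_x and c_y:

```python
from math import comb

def n_chaos_terms(M, p):
    """Number of Polynomial Chaos terms for M random variables and
    total polynomial order p: (M + p)! / (M! p!)."""
    return comb(M + p, p)

# M = 2 is inferred from the two independent coefficients c_x and c_y.
for p in (1, 2, 3):
    print(p, n_chaos_terms(2, p))   # orders 1, 2, 3 give 3, 6, 10 terms

# Total system order for the 101,851-node mesh at third order:
print(101_851 * n_chaos_terms(2, 3))  # 1018510
```

This reproduces both the 3n (first-order) and 10n (third-order) factors in the text and the stated system order of 1,018,510.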
Secondly, we fix the problem size in the spatial domain (71,386 dof) and increase the number of subdomains used to solve the problem; the results are presented in Table 2. Results are reported for first-, second-, and third-order Polynomial Chaos expansions. These performance results suggest that both the primal and the dual-primal methods are scalable with respect to the number of subdomains. Clearly, the dual-primal method requires slightly fewer iterations to converge than the primal method. This may be attributed to the fact that the starting initial residual in the dual-primal method is smaller than that in the primal method. However, the rate of convergence of both methods is almost the same, as indicated in Figs. 24-26.

Figure 20. Lagrange multipliers λ2. Figure 21. Lagrange multipliers λ3.
Figure 22. Lagrange multipliers λ4. Figure 23. Lagrange multipliers λ5.

Table 1. Iteration counts of the primal and dual-primal methods for a fixed number of subdomains (100).

Problem size   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
10051          10 10 10             8 8 8
20303          11 11 11             8 8 8
40811          11 12 12             8 9 9
59935          13 14 14             10 10 10
71386          12 12 12             9 9 9
80172          11 11 12             8 8 8
101851         12 12 12             9 9 9
Thirdly, we fix the problem size per subdomain while increasing the overall problem size by adding more subdomains. Table 3 shows the performance of the primal and the dual-primal methods for first-, second-, and third-order Polynomial Chaos expansions. Again, these results suggest that both the primal and the dual-primal methods are scalable with respect to a fixed problem size per subdomain.

Table 2. Iteration counts of the primal and dual-primal methods for a fixed problem size of 71,386 dof.

CPUs   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
20     10 11 11             8 8 8
40     12 12 12             9 9 9
60     12 13 13             9 9 9
80     12 12 13             9 9 9
100    12 12 12             9 9 9
120    12 12 12             9 9 9
140    11 11 12             8 8 8
160    12 12 12             8 8 9

Table 3. Iteration counts of the primal and dual-primal methods for a fixed problem size per subdomain (101,851 dof).

Subdomains   PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
100          10 10 10             8 8 8
200          10 10 11             8 8 8
400          12 13 13             9 9 9
600          11 12 12             8 8 9
800          12 13 13             9 9 9
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 0.05 to 0.50. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the primal and the dual-primal methods demonstrate similar performance trends, which points to the similarity (duality) between the two methods through numerical experiments.
11. Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
Figure 24. The relative PCGM residual history for the case of 160 subdomains and first PC order.
Figure 25. The relative PCGM residual history for the case of 160 subdomains and second PC order.
Figure 26. The relative PCGM residual history for the case of 160 subdomains and third PC order.
(Each figure plots the relative residual of P-DDM and DP-DDM, from 10^1 down to 10^-5, against the iteration number, 0 to 14.)
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size, and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e., the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. It is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments

The authors gratefully acknowledge the financial support of the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chairs Program, the Canada Foundation for Innovation, and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101,851 dof), and fixed number of subdomains (100).

CoV    PDDM (1st 2nd 3rd)   DP-DDM (1st 2nd 3rd)
0.05   10 10 10             8 8 8
0.10   10 10 10             8 8 8
0.15   10 10 10             8 8 8
0.20   10 10 10             8 8 8
0.25   10 10 10             8 8 8
0.30   10 10 11             8 8 8
0.35   10 10 11             8 8 8
0.40   10 11 11             8 8 9
0.45   10 11 12             8 8 9
0.50   10 11 12             8 8 9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689-701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin: Springer) pp 251-268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687-714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: Unstructured Graph Partitioning and Sparse Matrix Ordering System
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2006 International Journal for Numerical Methods in Engineering 66 250-271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Figure 8 A typical FEM mesh Figure 9 Mesh Partitioning using METIS
Figure 10 The mean of the solution filed Figure 11 The standard deviation of thesolution filed
Figure 12 Chaos coefficients u0 Figure 13 Chaos coefficients u1
102 Scalability studyFirstly we study the scalability of the algorithms with respect to the problem size where wefix the number of subdomains used to solve the problem to 100 while increasing both meshresolution in the spatial dimension and the Polynomial Chaos order as reported in Table(1)Evidently increasing mesh resolution by factor (10times) does not deteriorate the performance ofthe primal and the dual-primal algorithms Simultaneously increasing Polynomial Chaos orderfrom the first order to third order does not effect the performance of the methods Note thatfor a given spatial problem size (n) using the first order Polynomial Chaos expansion leads toa total problem size of (3 times n) and using the third order Polynomial Chaos expansion leads to
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
21
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
22
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Thirdly we fix problem size per subdomain while increase the overall problem size by adding
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
23
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Figure 14 Chaos coefficients u2 Figure 15 Chaos coefficients u3
Figure 16 Chaos coefficients u4 Figure 17 Chaos coefficients u5
Figure 18 Lagrange multipliers λ0 Figure 19 Lagrange multipliers λ1
a total problem size of (10 times n)
Secondly we fix the problem size in the spatial domain to (71389 dofs) and increase thenumber of subdomains used to solve the problem the results are presented in Table(2) The re-sults reported for first second and third order Polynomial Chaos expansion These performanceresults suggest that both the primal and the dual-primal methods are scalable with respect tonumber of subdomains Clearly the dual-primal method requires slightly less number of iter-
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
22
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Thirdly we fix problem size per subdomain while increase the overall problem size by adding
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
23
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Figure 20 Lagrange multipliers λ2 Figure 21 Lagrange multipliers λ3
Figure 22 Lagrange multipliers λ4 Figure 23 Lagrange multipliers λ5
Table 1 Iteration counts of the primal and dual-primal methods for fixed number of subdomain(100)
Problem size PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
10051 10 10 10 8 8 820303 11 11 11 8 8 840811 11 12 12 8 9 959935 13 14 14 10 10 1071386 12 12 12 9 9 980172 11 11 12 8 8 8101851 12 12 12 9 9 9
ations to converge than the primal method This may be attributed to fact that the startinginitial residual in the dual-primal method is smaller than the starting initial residual in theprimal method However the rate of convergence of both the methods is almost the same asindicated in Figs(24-26)
Thirdly we fix problem size per subdomain while increase the overall problem size by adding
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
23
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly we study the performance of the primal and the dual-primal methods with respectthe strength of randomness of the system parameters Table (4) shows the performance of thealgorithms when the Coefficient of variation (CoV ) of the random parameters is varied from(5 to 50) Clearly the strength of the randomness does not degrade the performance of thealgorithms as the number of PCGM iteration is nearly constant
Finally it worth mentioning that the performances of the primal method and the dual-primalmethod demonstrate similar trend and this fact points out the similarity (duality) between thetwo methods through numerical experiments
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraiclinear system arising from the stochastic finite element method The primal method is equippedwith a scalable two-level preconditioner The numerical experiments illustrate that the proposed
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
24
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 24 The relative PCGM residualhistory for the case of 160 subdomains andfirst PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 25 The relative PCGM residualhistory for the case of 160 subdomains andsecond PC order
0 2 4 6 8 10 12 1410
minus5
10minus4
10minus3
10minus2
10minus1
100
101
Iteration number
Rel
ativ
e re
sidu
al
PminusDDMDPminusDDM
Figure 26 The relative PCGM residualhistory for the case of 160 subdomains andthird PC order
primal method and the dual-primal method are numerically scalable with respect to problemsize subdomain size and number of subdomains Both algorithms exhibit similar convergencerates with respect to the coefficient of variation (ie the level of uncertainty) and PolynomialChaos order Both primal and dual-primal iterative substructuring methods exploit a coarsegrid in the geometric space At this point it is worth mentioning that adding a coarse grid inthe stochastic space would be beneficial in the cases where a large number of random variablesare required to prescribe uncertainty in the input parameters This aspect is currently beinginvestigated by the authors
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences andEngineering Research Council of Canada through a Discovery Grant Canada Research ChairProgram Canada Foundation for Innovation and Ontario Innovation Trust Dr Ali Rebainefor his help with ParMETIS graph-partitioning software
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
25
Table 4 Iteration counts of the primal and dual primal methods for different CoV fixedproblem size (101851 dofs) and fixed number of subdomains (100)
CoV PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
005 10 10 10 8 8 8010 10 10 10 8 8 8015 10 10 10 8 8 8020 10 10 10 8 8 8025 10 10 10 8 8 8030 10 10 11 8 8 8035 10 10 11 8 8 8040 10 11 11 8 8 9045 10 11 12 8 8 9050 10 11 12 8 8 9
References[1] Sarkar A Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77
689ndash701[2] Sarkar A Benabbou N and Ghanem R 2010 International Journal of High Performance Computing
Applications Accepted[3] Subber W Monajemi H Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic
and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in
Computer Science vol 5976) ed et al D M (Springer Berlin Heidelberg) pp 251ndash268[5] Farhat C Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Application 7 687ndash714[6] Balay S Buschelman K Gropp W D Kaushik D Knepley M G McInnes L C Smith B F and Zhang H 2009
PETSc Web page httpwwwmcsanlgovpetsc[7] Message passing interface forum httpwwwmpi-forumorg[8] Karypis G and Kumar V 1995 METIS unstructured graph partitioning and sparse matrix ordering system[9] Ghanem R and Spanos P 1991 Stochastic Finite Element A Spectral Approach (New York Springer-Verlag)
[10] Saad Y 2003 Iterative methods for sparse linear systems 2nd ed (Philadelphia)[11] Toselli A and Widlund O 2005 Domain Decomposition Methods - Algorithms and Theory (Springer Series
in Computational Mathematics vol 34) (Berlin Springer)[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250ndash271
High Performance Computing Symposium (HPCS2010) IOP PublishingJournal of Physics Conference Series 256 (2010) 012001 doi1010881742-65962561012001
26
Table 2 Iteration counts of the primal and dual-primal methods for fixed problem size of71386 dof
CPUs PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
20 10 11 11 8 8 840 12 12 12 9 9 960 12 13 13 9 9 980 12 12 13 9 9 9100 12 12 12 9 9 9120 12 12 12 9 9 9140 11 11 12 8 8 8160 12 12 12 8 8 9
Table 3 Iteration counts of the primal and dual primal methods for fixed problem size persubdomain (101851 dof)
Subdomains PDDM DP-DDM
1st 2nd 3rd 1st 2nd 3rd
100 10 10 10 8 8 8200 10 10 11 8 8 8400 12 13 13 9 9 9600 11 12 12 8 8 9800 12 13 13 9 9 9
more subdomains Table (3) shows the performance of the primal and the dual-primal methodsfor the first second and third order Polynomial Chaos expansion Again these results suggestthat both the primal and the dual-primal methods are scalable with respect to fixed problemsize per subdomain
Fourthly, we study the performance of the primal and the dual-primal methods with respect to the strength of randomness of the system parameters. Table 4 shows the performance of the algorithms when the coefficient of variation (CoV) of the random parameters is varied from 5% to 50%. Clearly, the strength of the randomness does not degrade the performance of the algorithms, as the number of PCGM iterations remains nearly constant.
Finally, it is worth mentioning that the primal method and the dual-primal method exhibit similar performance trends, which points to the similarity (duality) between the two methods as observed through the numerical experiments.
11 Conclusion
Novel primal and dual-primal domain decomposition methods are proposed to solve the algebraic linear system arising from the stochastic finite element method. The primal method is equipped with a scalable two-level preconditioner. The numerical experiments illustrate that the proposed
[Figure 24. Relative PCGM residual (10^-5 to 10^1, log scale) versus iteration number for the case of 160 subdomains and first-order PC expansion; curves for P-DDM and DP-DDM.]

[Figure 25. Relative PCGM residual versus iteration number for the case of 160 subdomains and second-order PC expansion; curves for P-DDM and DP-DDM.]

[Figure 26. Relative PCGM residual versus iteration number for the case of 160 subdomains and third-order PC expansion; curves for P-DDM and DP-DDM.]
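The residual histories plotted in Figures 24-26 are the standard diagnostic of a preconditioned conjugate gradient run driven to a relative residual of 10^-5. The sketch below shows how such a history is produced; it uses a plain Jacobi preconditioner on a small SPD test matrix purely as a stand-in, and does not reproduce the authors' two-level substructuring preconditioners:

```python
import numpy as np

def pcg(A, b, M_inv, tol=1e-5, max_it=50):
    """Preconditioned conjugate gradients; returns the solution and the
    relative-residual history (the quantity plotted in Figures 24-26)."""
    x = np.zeros_like(b)
    r = b - A @ x
    z = M_inv @ r
    p = z.copy()
    b_norm = np.linalg.norm(b)
    history = [np.linalg.norm(r) / b_norm]
    for _ in range(max_it):
        Ap = A @ p
        alpha = (r @ z) / (p @ Ap)
        x += alpha * p
        r_new = r - alpha * Ap
        history.append(np.linalg.norm(r_new) / b_norm)
        if history[-1] < tol:
            break
        z_new = M_inv @ r_new
        beta = (r_new @ z_new) / (r @ z)
        p = z_new + beta * p
        r, z = r_new, z_new
    return x, history

# Illustrative SPD system; a Jacobi (diagonal) preconditioner stands in
# for the two-level preconditioners of the paper.
rng = np.random.default_rng(0)
B = rng.standard_normal((50, 50))
A = B @ B.T + 50.0 * np.eye(50)
b = rng.standard_normal(50)
M_inv = np.diag(1.0 / np.diag(A))
x, hist = pcg(A, b, M_inv)
```

Plotting `hist` on a log scale against the iteration index gives exactly the kind of curve shown in the figures.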
primal method and the dual-primal method are numerically scalable with respect to the problem size, the subdomain size and the number of subdomains. Both algorithms exhibit similar convergence rates with respect to the coefficient of variation (i.e. the level of uncertainty) and the Polynomial Chaos order. Both primal and dual-primal iterative substructuring methods exploit a coarse grid in the geometric space. At this point, it is worth mentioning that adding a coarse grid in the stochastic space would be beneficial in cases where a large number of random variables is required to prescribe the uncertainty in the input parameters. This aspect is currently being investigated by the authors.
Acknowledgments
The authors gratefully acknowledge the financial support from the Natural Sciences and Engineering Research Council of Canada through a Discovery Grant, the Canada Research Chair Program, the Canada Foundation for Innovation and the Ontario Innovation Trust. The authors also thank Dr. Ali Rebaine for his help with the ParMETIS graph-partitioning software.
Table 4. Iteration counts of the primal and dual-primal methods for different CoV, fixed problem size (101851 dof) and fixed number of subdomains (100).

CoV     PDDM              DP-DDM
        1st  2nd  3rd     1st  2nd  3rd
0.05    10   10   10       8    8    8
0.10    10   10   10       8    8    8
0.15    10   10   10       8    8    8
0.20    10   10   10       8    8    8
0.25    10   10   10       8    8    8
0.30    10   10   11       8    8    8
0.35    10   10   11       8    8    8
0.40    10   11   11       8    8    9
0.45    10   11   12       8    8    9
0.50    10   11   12       8    8    9
References
[1] Sarkar A, Benabbou N and Ghanem R 2009 International Journal for Numerical Methods in Engineering 77 689–701
[2] Sarkar A, Benabbou N and Ghanem R 2010 International Journal of High Performance Computing Applications (accepted)
[3] Subber W, Monajemi H, Khalil M and Sarkar A 2008 International Symposium on Uncertainties in Hydrologic and Hydraulic
[4] Subber W and Sarkar A 2010 High Performance Computing Systems and Applications (Lecture Notes in Computer Science vol 5976) ed D M et al (Berlin, Heidelberg: Springer) pp 251–268
[5] Farhat C, Lesoinne M and Pierson K 2000 Numerical Linear Algebra with Applications 7 687–714
[6] Balay S, Buschelman K, Gropp W D, Kaushik D, Knepley M G, McInnes L C, Smith B F and Zhang H 2009 PETSc Web page http://www.mcs.anl.gov/petsc
[7] Message Passing Interface Forum http://www.mpi-forum.org
[8] Karypis G and Kumar V 1995 METIS: unstructured graph partitioning and sparse matrix ordering system
[9] Ghanem R and Spanos P 1991 Stochastic Finite Elements: A Spectral Approach (New York: Springer-Verlag)
[10] Saad Y 2003 Iterative Methods for Sparse Linear Systems 2nd ed (Philadelphia: SIAM)
[11] Toselli A and Widlund O 2005 Domain Decomposition Methods: Algorithms and Theory (Springer Series in Computational Mathematics vol 34) (Berlin: Springer)
[12] Li J and Widlund O 2010 International Journal for Numerical Methods in Engineering 66(2) 250–271