
Multi-ObjecTive design Optimisation of fluid eneRgy machines

Deliverable number: D3.3
Deliverable name: Report and prototype of compressible flow solver
Workpackage number: WP3
WP leader: TU Delft

Due date: M18
Submission date: 31/05/2017

Project no.: 678727
Duration: 36 months
Classification: PU
File name: MOTOR D3.3

The work leading to these results has received funding from the European Community’s Horizon 2020 Programme (2014-2020) under grant agreement no. 678727. The opinions expressed in the document are of the authors only and in no way reflect the European Commission’s opinions. The European Union is not liable for any use that may be made of the information.

This document, including the information it contains, is the property of the MOTOR consortium and shall not be copied or disclosed in any form to any party outside the consortium without the written permission of the MOTOR General Assembly.

D3.3 MOTOR-678727

Authors and contributors
Author: Matthias Moller; Organisation: TU Delft; E-mail: [email protected]

Document reviewers
Name: Kees Vuik; Organisation: TU Delft; Date: 29/05/2017

Document history
Release: V1.0; Who: Matthias Moller; Date: 28/05/2017; Comment: (none)

Quality control
WP leader: Matthias Moller, 28/05/2017
Internal review: Kees Vuik, 29/05/2017
Coordinator: Matthias Moller, 31/05/2017


Contents

1 Executive summary
2 Introduction
3 Compressible flow solver
3.1 Governing equations
3.2 Spatial discretisation by isogeometric analysis
3.3 Temporal discretisation by explicit Runge-Kutta methods
3.4 Algebraic flux correction
3.4.1 Artificial viscosity operators
3.5 Linearised FCT algorithm
3.5.1 Primitive variable limiter
3.6 Boundary treatment
3.7 Implementation details
3.7.1 FDBB library
3.7.2 G+Smo library
3.8 Numerical results
3.8.1 Stationary isentropic vortex
3.8.2 Sod’s shock tube problem
4 Conclusions


1 Executive summary

This deliverable presents the results of the MOTOR Module MM3.1b – Compressible flow solver. The newly developed simulation tool has been implemented in the open-source C++ library G+Smo (Geometry + Simulation Modules), coordinated by the partner JKU, which provides efficient implementations of state-of-the-art technologies in isogeometric analysis (IgA). From the outset, MM3.1b has been designed with high-performance computing capabilities in mind, since it will form the basis for the detailed two- and three-dimensional computational fluid dynamics (CFD) simulations of performance-critical parts of the MOTOR Product MP4: Screw Machines in work package 8. The results of the envisioned CFD simulations are flow coefficients, which will be used to improve the accuracy of a 0D chamber model simulator that has been developed by the partner TU Dortmund and will be used to optimise the geometry of a screw machine with constant rotor pitch (Task 8.2) and variable rotor pitch (Task 8.3) in work package 8.

The objectives set out in the proposal for the development of MM3.1b are as follows:

1. Implementation of an IgA-flow solver for stationary and transient compressible flows in two- and three-dimensional twin screw compressor configurations

2. Realisation of high-resolution algebraic flux-correction techniques in the IgA setting

3. Implementation and testing of different approaches for imposing boundary conditions

4. Development of efficient multigrid solution techniques for linear/nonlinear problems

5. Validation of the flow solver for different benchmark problems

To test the applicability of the different computational building blocks (high-resolution algebraic flux correction paradigm, treatment of boundary conditions, etc.) in the framework of isogeometric analysis, a proof-of-concept two-dimensional solver for inviscid compressible flows has been implemented during M1-M18. The governing equations for inviscid compressible flows are the compressible Euler equations, which represent a system of conservation laws for mass, momentum, and total energy. The solver has been formulated in terms of the conservative variables density, momentum, and total energy. A fully coupled solution approach is adopted, that is, the coupled system of d + 2 partial differential equations for the (d + 2)-dimensional vector of state variables is solved as a whole. The governing equations are closed by the equation of state (EOS) for an ideal polytropic gas.

The flow solver implements a generalisation of the flux-corrected transport algorithm published in [15, 16] to quadratic and cubic tensor-product B-spline basis functions, making use of the primitive variables – pressure and density – to steer the limiting procedure. The original high-order target discretisation is used whenever possible, and a smooth blending towards a low-order discretisation is adopted to prevent the creation of spurious oscillations near discontinuities. The transient flow solver employs second- and third-order strong stability preserving explicit Runge-Kutta time integration schemes to achieve high temporal accuracy and to enable the use of larger time-step sizes compared to the first-order explicit Euler method.

Boundary conditions are imposed in a weak sense by integrating a local Roe-type Riemann solver into the boundary integral terms. The approach has been validated for various types of boundary conditions needed for the simulation of screw compressor machines. The compressible flow solver has been validated for several two-dimensional benchmark problems with and without shock waves.

To enable the simulation of large 3D problems, a high-performance variant of the flow solver has been implemented in parallel. It is based on an efficient re-formulation of the entire IgA discretisation in terms of sparse matrix-vector multiplications (sparse BLAS level 2 routines) and element-wise vector operations (dense BLAS level 1 routines). The mathematical foundations validated by the proof-of-concept implementation have remained unchanged. To enable the later porting of the solver to GPUs and, possibly, other accelerator devices, the essential building blocks of the compressible flow solver, that is, conservative and primitive state variables and conversion routines between them, inviscid fluxes, equations of state (EOS), and Riemann invariants, have been implemented as the C++ expression template library FDBB, which is available as a stand-alone open-source project (https://gitlab.com/mmoelle1/FDBB). Testing and performance evaluation of this library for several high-performance linear algebra back-ends (ArrayFire, Blaze, Eigen, IT++, VexCL) is complete. The realisation of the HPC variant of the flow solver is ongoing.

In conclusion, the proof-of-concept implementation of the inviscid compressible flow solver is operational for single-patch geometries in 2D and shows that the mathematical concepts intended in the proposal are applicable in the IgA setting. Due to the adopted fully explicit time-integration approach, the solution of linear and/or nonlinear systems of equations was not required, so that no multigrid solution algorithm has been implemented so far. The extra work invested into the HPC variant of the flow solver is considered worthwhile since the speed-ups observed in preliminary performance benchmark studies are significant.

2 Introduction

Objectives. The overall objective of the MOTOR Module MM3.1b is to develop a compressible flow solver for the accurate numerical simulation of the complex flow patterns inside screw machine geometries in work package 8 (WP8). The isogeometric analysis (IgA) framework has been chosen since it allows for representing both the computational geometry and the numerical approximation by the same set of higher-order basis functions, thereby avoiding the additional boundary approximation errors that are typically observed in non-curvilinear finite element (FE) and finite volume (FV) discretisations. More precisely, FE and FV methods defined on computational meshes with straight-sided elements/volumes (i.e. simplex grids, quadrilateral/hexahedral grids) cannot represent curved boundaries exactly. This is a well-known problem of sub-parametric FE schemes (both continuous and discontinuous Galerkin), which use higher-order basis functions for approximating the solution but resort to lower-order approximations of the geometry. Similar problems exist for FV schemes unless curvilinear meshes are adopted.

Approach. This shortcoming of standard FE/FV methods can be overcome by adopting the isogeometric analysis framework with high-order B-spline basis functions or their non-uniform rational counterparts (NURBS), which constitute the standard format in computer-aided design (CAD) tools. For the compressible flow solver that is discussed in this deliverable, quadratic and cubic B-spline basis functions have been chosen both for representing the computational geometry and for approximating the numerical solution. It should be mentioned that this approach still leads to non-exact representations of the very complex screw machine geometry. However, as described in detail in deliverable D8.1 ’User manual pre-processor’, an adaptive spline-based pre-processing tool has been developed, which is able to generate highly accurate bi- and trivariate parameterisations of 2D and 3D screw machine geometries, respectively, thereby making use of adaptively refined quadratic and cubic B-spline basis functions in each spatial direction. To give the reader a brief impression of the complexity of the screw machine geometries and of the quality of the generated B-spline parameterisations, we present analysis-suitable parameterisations of the male and female rotor in Figure 1. Here, analysis-suitable means that the mapping from the parameter space to the physical space is bijective, which is a crucial prerequisite for the successful application of the isogeometric analysis machinery. The technical details of the IgA approach are described in more detail in Section 3.2. It is clear that for practical applications, both rotors need to gear into each other, which will require further modification of the two stand-alone parameterisations so that they meet along the common interface. The necessary adjustment of the geometry pre-processor is ongoing work in WP8.

Figure 1: Bivariate parameterisation of the male (left) and female (right) rotor of a screw machine geometry generated by the adaptive spline-based pre-processor described in deliverable D8.1.

State of the art. It is commonly accepted in the community that the generation of accurate computational meshes is by far the largest challenge in the numerical simulation of screw machines. To the best of our knowledge, only a few commercial tools are available on the market. TwinMesh (https://www.twinmesh.com) is a hexahedral grid-generation pre-processor for the Ansys CFX simulation platform. SCORG (http://pdmanalysis.co.uk) is a complete design and analysis suite for screw machines with a direct link to the solver packages Ansys CFX, PumpLinx and STAR-CCM+. The aforementioned flow solver packages are based on sophisticated FE/FV schemes or hybrid variants thereof and have reached a high level of professional maturity. However, none of them fully exploits the potential offered by the isogeometric analysis framework to represent the complex geometries with very high accuracy. This is the main motivation for the development of MM3.1b – Compressible flow solver.

Outline of the deliverable. The developed IgA-flow solver constitutes a generalisation of the low-order FE approach published in [15, 16] to higher-order B-spline based isogeometric analysis. To avoid reproducing the aforementioned book chapter, this deliverable focuses on the novelties of the IgA approach and refers to the original publication whenever no significant adjustments of the standard procedure were required. Moreover, emphasis is placed on the technical realisation of the HPC variant of the flow solver, which required significant implementation work.


3 Compressible flow solver

3.1 Governing equations

The appropriate mathematical model for describing the complex flow phenomena inside screw machine geometries is given by the compressible Navier-Stokes equations, which constitute a system of nonlinear conservation laws for the mass, the momentum, and the total energy. For the proof-of-concept flow solver implemented in M1-M18 and described in this deliverable, a simplified mathematical model has been adopted, which results from neglecting the dissipative transport phenomena of viscosity, mass diffusion, and thermal conductivity. The resulting governing equations are the compressible Euler equations [8]:

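$$\frac{\partial \rho}{\partial t} + \nabla \cdot (\rho v) = 0 \tag{1}$$

$$\frac{\partial (\rho v)}{\partial t} + \nabla \cdot (\rho v \otimes v + p I) = 0 \tag{2}$$

$$\frac{\partial (\rho E)}{\partial t} + \nabla \cdot (\rho E v + p v) = 0 \tag{3}$$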

Here, ρ is the density, v is the velocity vector, E is the total energy per unit mass, and I stands for the d × d identity tensor, where d ∈ {1, 2, 3} denotes the number of spatial dimensions. These variables are related to the pressure p by the equation of state (EOS) for ideal polytropic gases

$$p = (\gamma - 1)\left(\rho E - \frac{\rho |v|^2}{2}\right) \tag{4}$$

Here, γ denotes the heat capacity ratio, which equals γ = 1.4 for dry air within a temperature range of 0 °C to 200 °C. This setup is considered appropriate for validating the compressible flow solver. Further extensions towards other EOS might be necessary in the further course of the MOTOR project.

The system of conservation laws (1)–(3) can be rewritten in divergence form as follows [15]:

$$\frac{\partial U}{\partial t} + \nabla \cdot F = 0 \tag{5}$$

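Here, $U : \mathbb{R}^d \to \mathbb{R}^{d+2}$ denotes the state vector of conservative variables

$$U = \left[U_1, \dots, U_{d+2}\right]^\top = \left[\rho, \rho v, \rho E\right]^\top \tag{6}$$

and $F : \mathbb{R}^{d+2} \to \mathbb{R}^{(d+2) \times d}$ stands for the tensor of inviscid fluxes

$$F = \begin{bmatrix} F_1^1 & \dots & F_1^d \\ \vdots & \ddots & \vdots \\ F_{d+2}^1 & \dots & F_{d+2}^d \end{bmatrix} = \begin{bmatrix} \rho v \\ \rho v \otimes v + p I \\ \rho E v + p v \end{bmatrix} \tag{7}$$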
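As a concrete illustration of (4), (6) and (7), the following Python sketch evaluates the pressure and the inviscid flux tensor for a single conservative state. It is not taken from the G+Smo or FDBB sources; the function names `pressure` and `inviscid_flux`, the list-based state layout and the sample state are assumptions made for this example only.

```python
GAMMA = 1.4  # heat capacity ratio for dry air, as stated in the text


def pressure(U, gamma=GAMMA):
    """Equation of state (4) for a conservative state U = [rho, rho*v..., rho*E]."""
    rho, momentum, rhoE = U[0], U[1:-1], U[-1]
    kinetic = sum(m * m for m in momentum) / (2.0 * rho)  # rho*|v|^2/2
    return (gamma - 1.0) * (rhoE - kinetic)


def inviscid_flux(U, gamma=GAMMA):
    """Flux tensor (7) as a list of d+2 rows (variables) with d columns (directions)."""
    d = len(U) - 2
    rho, momentum, rhoE = U[0], U[1:-1], U[-1]
    v = [m / rho for m in momentum]
    p = pressure(U, gamma)
    F = [list(momentum)]  # mass flux: rho*v
    for k in range(d):    # momentum flux: rho*v (x) v + p*I
        F.append([momentum[k] * v[l] + (p if k == l else 0.0) for l in range(d)])
    F.append([(rhoE + p) * v[l] for l in range(d)])  # energy flux: rho*E*v + p*v
    return F


# hypothetical sample state in 2D: rho = 1, rho*v = (0.5, 0), rho*E = 2.5
U = [1.0, 0.5, 0.0, 2.5]
p = pressure(U)
F = inviscid_flux(U)
```

The row index of `F` corresponds to the subscript k and the column index to the superscript l used for the flux components in the text.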

The unsteady Euler equations are equipped with initial conditions prescribed at time t = 0

$$U(x, 0) = U_0(x) \quad \text{in } \Omega \subset \mathbb{R}^d \tag{8}$$

The imposition of boundary conditions follows the approach described in [15]. In essence, two types of boundary conditions are considered: Dirichlet boundary conditions on the boundary part $\Gamma_D$

$$U = G(U, U_\infty) \quad \text{on } \Gamma_D \tag{9}$$

and Neumann (normal flux) boundary conditions on the boundary part $\Gamma_N$

$$n \cdot F = F_n(U, U_\infty) \quad \text{on } \Gamma_N \tag{10}$$

where n is the outward unit normal vector. Here, $U_\infty$ denotes the vector of 'free stream' solution values, which are calculated as outlined in Section 3.6 and, in more detail, in the original publication [15].

3.2 Spatial discretisation by isogeometric analysis

The variational formulation of the compressible Euler equations is obtained by multiplying the strong form (5) by a test function W, integrating over the domain Ω and performing integration by parts [15]

$$\int_\Omega W \frac{\partial U}{\partial t} - \nabla W \cdot F \, dx + \int_{\Gamma_N} W F_n \, ds = 0, \quad \forall W \tag{11}$$

The surface integral corresponding to the Dirichlet boundary part vanishes since the test function W is supposed to be zero on $\Gamma_D$, which is common practice in variational calculus.

Following the approach utilised in [15] for classical finite elements, both the numerical solution $U_h \approx U$ and the numerical fluxes $F_h \approx F$ are interpolated using the same set of basis functions $\{\varphi_j\}_{j=1}^N$:

$$U_h(x, t) = \sum_{j=1}^N U_j(t) \varphi_j(x) \tag{12}$$

$$F_h(x, t) = \sum_{j=1}^N F_j(t) \varphi_j(x) \tag{13}$$

Here, $F_j(t) := F(U_j(t))$ denotes the value of the inviscid flux function evaluated at the j-th solution coefficient at time t. The above approach is known as Fletcher’s group finite element formulation [5].

Substitution of approximations (12)–(13) into the variational form (11) and replacement of the test function W by all possible basis functions $\varphi_i$ leads to the following system of semi-discretised equations for the time-dependent coefficients $U_j(t)$ of the numerical solution; cf. the approach in [15]

$$\sum_{j=1}^N \left(\int_\Omega \varphi_i \varphi_j \, dx\right) \frac{dU_j}{dt} - \sum_{j=1}^N \left(\int_\Omega \nabla\varphi_i \, \varphi_j \, dx\right) \cdot F_j + \int_{\Gamma_N} \varphi_i F_n \, ds = 0 \tag{14}$$

Strictly speaking, each occurrence of $\varphi_i$ needs to be replaced by $I \otimes \varphi_i$, where I is the (d + 2)-dimensional identity tensor and '⊗' denotes the Kronecker product, in order to account for the fact that $U_j$ and $F_j$ are vector-valued coefficients and local tensors, respectively. To further simplify the notation, let us define the consistent mass matrix $M_C := \{m_{ij}\}$ and the discrete gradient operator $C := \{c_{ij}\}$ as follows

$$m_{ij} = \int_\Omega \varphi_i \varphi_j \, dx, \qquad c_{ij} = \int_\Omega \varphi_i \nabla\varphi_j \, dx \tag{15}$$

Then the system of semi-discretised equations (14) can be written compactly in matrix form

$$M_C \frac{dU_k}{dt} - \sum_{l=1}^d [C^l]^\top F^l_k + S_k(U) = 0 \quad \text{for } k = 1, \dots, d+2 \tag{16}$$


where superscript l refers to the l-th spatial component of the discrete gradient operator C and the tensor of inviscid fluxes F, respectively, and subscript k stands for the component that corresponds to the k-th variable. Here, $S_k(U)$ accounts for the contribution of boundary fluxes to be discussed in Section 3.6.

The major difference between the approach developed in [15] and the one described here lies in the choice of basis functions. Following the concept of single-patch isogeometric analysis, the computational domain $\Omega \subset \mathbb{R}^d$ needs to be topologically equivalent to the d-dimensional hypercube $\hat{\Omega} = (0, 1)^d$, which is referred to as the parameter domain. Then there exists a bijective (i.e. invertible) mapping $G : \hat{\Omega} \to \Omega$. This approach is quite different from the common practice in parametric finite elements, where the mapping between parameter and physical space (read: element) is defined individually for each element.

A common practice in isogeometric analysis is to approximate the so-called geometric mapping G by

$$G_h : \hat{\Omega} \to \Omega_h, \quad \text{with } G_h(\boldsymbol{\xi}) = \sum_{j=1}^n C_j \hat{\varphi}_j(\boldsymbol{\xi}) \tag{17}$$

where $C_j$ are the points of the control net and $\{\hat{\varphi}_j(\boldsymbol{\xi})\}_{j=1}^n$ denotes an appropriate set of basis functions defined on the parameter domain $\hat{\Omega}$. It should be noted that $G_h \equiv G$ and, consequently, $\Omega \equiv \Omega_h$ if an exact representation of the physical geometry Ω can be achieved with the aid of the basis $\{\hat{\varphi}_j(\boldsymbol{\xi})\}_{j=1}^n$. For the sake of readability, we will drop subscript h in what follows and assume that $G_h \equiv G$ holds.

For the compressible flow solver described in this deliverable, we have restricted ourselves to tensor-product B-spline basis functions. That is, in the two-dimensional case (d = 2), the parameter domain corresponds to the unit square, $\hat{\Omega} = (0, 1)^2$, and each basis function is defined as the product of two univariate B-spline basis functions $\hat{\varphi}_{j_\xi}(\xi)$ and $\hat{\varphi}_{j_\eta}(\eta)$, respectively

$$\hat{\varphi}_j(\boldsymbol{\xi}) := \hat{\varphi}_{j_\xi}(\xi) \, \hat{\varphi}_{j_\eta}(\eta), \qquad \boldsymbol{\xi} = (\xi, \eta) \tag{18}$$

Due to the tensor-product construction, the mapping between the global index $j \in \{1, \dots, n\}$ and the local indices $j_\xi \in \{1, \dots, n_\xi\}$ and $j_\eta \in \{1, \dots, n_\eta\}$ of the univariate functions is given by

$$j := (j_\eta - 1) n_\xi + j_\xi \tag{19}$$
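The index mapping (19) and its inverse can be sketched in a few lines of Python. The function names `global_index` and `local_indices` are hypothetical and chosen for this illustration; all indices are 1-based, as in the text.

```python
def global_index(j_xi, j_eta, n_xi):
    """Global index j from the local indices (j_xi, j_eta), following (19)."""
    return (j_eta - 1) * n_xi + j_xi


def local_indices(j, n_xi):
    """Inverse of (19): recover (j_xi, j_eta) from the global index j."""
    quotient, remainder = divmod(j - 1, n_xi)
    return remainder + 1, quotient + 1


# example: n_xi = 4 functions in xi-direction, n_eta = 3 in eta-direction
n_xi, n_eta = 4, 3
all_indices = [global_index(j_xi, j_eta, n_xi)
               for j_eta in range(1, n_eta + 1)
               for j_xi in range(1, n_xi + 1)]
```

With this ordering the ξ-index runs fastest, so consecutive global indices belong to neighbouring basis functions in the ξ-direction.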

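Univariate B-spline basis functions $\varphi_{i,p}$ of order p are defined by the Cox-de Boor recursion formula [4]

$$\varphi_{i,0}(\xi) = \begin{cases} 1 & \text{if } \xi_i \le \xi < \xi_{i+1} \\ 0 & \text{otherwise} \end{cases} \tag{20}$$

$$\varphi_{i,p}(\xi) = \frac{\xi - \xi_i}{\xi_{i+p} - \xi_i} \, \varphi_{i,p-1}(\xi) + \frac{\xi_{i+p+1} - \xi}{\xi_{i+p+1} - \xi_{i+1}} \, \varphi_{i+1,p-1}(\xi) \tag{21}$$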

Here, $\Xi = \{\xi_1, \xi_2, \dots, \xi_{n+p+1}\}$ is the so-called knot vector, which is a non-decreasing sequence of coordinates $\xi_i$ in the parameter space. It should be noted that the derivative of a p-th order B-spline basis function can be expressed as a linear combination of B-spline basis functions of order p − 1, so that the properties of (20)–(21) carry over to their derivatives. The so-defined univariate B-spline basis functions feature the following favourable properties [4]:

(P1) The compact support property, $\operatorname{supp} \varphi_{i,p}(\xi) = [\xi_i, \xi_{i+p+1})$, ensures that matrices arising from bilinear forms are sparse and, moreover, exhibit a regular band structure, which makes it possible to adopt hardware-optimised matrix-storage formats and even sparse-banded BLAS techniques [1].


(P2) The strict positiveness, $\varphi_{i,p}(\xi) > 0$ for all $\xi \in (\xi_i, \xi_{i+p+1})$, ensures that all non-zero entries of the consistent mass matrix are strictly positive. This makes it possible to perform row-sum mass lumping, i.e.

$$M_L := \operatorname{diag}\{m_i\}, \quad m_i := \sum_{j=1}^n \int_{\Omega_h} \varphi_i(x) \varphi_j(x) \, dx = \sum_{j=1}^n \int_{\hat{\Omega}} \hat{\varphi}_i(\boldsymbol{\xi}) \hat{\varphi}_j(\boldsymbol{\xi}) \, |\det J(G)| \, d\boldsymbol{\xi} > 0 \tag{22}$$

without losing or gaining mass and/or rendering the lumped mass matrix singular. Here, det J(G) is the determinant of the Jacobian matrix of the geometric mapping $G : \hat{\Omega} \to \Omega$, which is strictly non-zero due to the bijectiveness of the mapping.

(P3) The partition of unity property, $\sum_i \varphi_{i,p}(\xi) = 1$ for all $\xi \in (0, 1)$, ensures that the discrete divergence operator $C := \{c_{ij}\}$ has zero row sums, $\sum_j c_{ij} = 0$.
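Properties (P1)–(P3) can be checked numerically with a direct implementation of the recursion (20)–(21). The sketch below is a plain recursive Python version written for illustration (it is not the G+Smo implementation); the function name `bspline_basis`, the 0-based index `i`, the convention 0/0 := 0 for repeated knots and the sample knot vector are assumptions of this example.

```python
def bspline_basis(i, p, xi, knots):
    """Value of the i-th (0-based) B-spline basis function of order p at xi."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + p] - knots[i]
    if left > 0.0:  # assumption: terms with zero denominator are dropped (0/0 := 0)
        value += (xi - knots[i]) / left * bspline_basis(i, p - 1, xi, knots)
    right = knots[i + p + 1] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + p + 1] - xi) / right * bspline_basis(i + 1, p - 1, xi, knots)
    return value


# open knot vector for quadratic B-splines on four equal knot spans of (0, 1)
p = 2
knots = [0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
n = len(knots) - p - 1  # number of basis functions

samples = [0.05 + 0.1 * k for k in range(10)]
values = [[bspline_basis(i, p, xi, knots) for i in range(n)] for xi in samples]
row_sums = [sum(row) for row in values]  # partition of unity: each sum should be 1
```

Each row of `values` holds all basis functions evaluated at one sample point; at most p + 1 entries per row are non-zero, which reflects the compact support property.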

It can be easily shown that the above properties carry over to the multivariate basis functions.

The tensor-product construction allows for using different approximation orders $p_\xi$ and $p_\eta$ in each spatial direction and for creating both equidistantly and non-equidistantly spaced knot vectors Ξ and H.

For a more detailed introduction into the concept of isogeometric analysis and an extensive description of knot insertion and order elevation procedures, the interested reader is referred to the textbook on IgA by Cottrell, Hughes, and Bazilevs [4]. For the remainder of this deliverable, we assume that the geometric mapping (17) is given either by manual construction or as the result of the adaptive spline-based geometry pre-processing tools described in Section 3.3 of deliverable D8.1.

Starting from the B-spline basis $B_G := \{\hat{\varphi}_j(\boldsymbol{\xi})\}_{j=1}^n$ of the geometry mapping (17), a basis $B_U := \{\tilde{\varphi}_j(\boldsymbol{\xi})\}_{j=1}^N$ for approximating the numerical solution (12) and the fluxes (13) is constructed by inserting knots and/or raising the polynomial order of the B-spline basis functions [4]. Both operations preserve the shape of the geometry Ω exactly, that is, there exists another set of control points $\tilde{C}_j$ such that

$$G(\boldsymbol{\xi}) = \sum_{j=1}^n C_j \hat{\varphi}_j(\boldsymbol{\xi}) = \sum_{j=1}^N \tilde{C}_j \tilde{\varphi}_j(\boldsymbol{\xi}) \tag{23}$$

In the current implementation of the compressible flow solver in the G+Smo library, the entries of the system matrices (15) are pre-assembled and stored during the initialisation phase. Consider for instance

$$m_{ij} = \int_\Omega \varphi_i(x) \varphi_j(x) \, dx = \int_{\hat{\Omega}} \hat{\varphi}_i(\boldsymbol{\xi}) \hat{\varphi}_j(\boldsymbol{\xi}) \, |\det J(G)| \, d\boldsymbol{\xi} \tag{24}$$

The integral in the parameter domain is evaluated numerically by applying Gaussian quadrature rules in an 'element-wise' manner, that is, for sub-domains of the form $\omega_{kl} = (\xi_k, \xi_{k+1}) \times (\eta_l, \eta_{l+1})$. It should be noted that more efficient assembly techniques have been developed in the literature [2, 22]. However, since the coefficient matrices are assembled once and for all at the beginning of the simulation, no extra effort has been spent so far on speeding up this step. In the course of the simulation the system operators are formed efficiently by applying sparse matrix-vector multiplications (SpMV) following expression (16).
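To make the element-wise quadrature of (24) and the row-sum lumping (22) tangible, the following Python sketch assembles a dense mass matrix for univariate quadratic B-splines. It is a one-dimensional simplification written for this text and does not reflect the G+Smo assembler: the function names, the dense list-of-lists storage, the hard-coded three-point Gauss-Legendre rule and the default identity geometry map (|det J| = 1) are all assumptions.

```python
import math


def bspline_basis(i, p, xi, knots):
    """Cox-de Boor recursion (20)-(21), 0-based index i, with 0/0 := 0."""
    if p == 0:
        return 1.0 if knots[i] <= xi < knots[i + 1] else 0.0
    value = 0.0
    left = knots[i + p] - knots[i]
    if left > 0.0:
        value += (xi - knots[i]) / left * bspline_basis(i, p - 1, xi, knots)
    right = knots[i + p + 1] - knots[i + 1]
    if right > 0.0:
        value += (knots[i + p + 1] - xi) / right * bspline_basis(i + 1, p - 1, xi, knots)
    return value


def assemble_mass_matrix(p, knots, det_jacobian=lambda xi: 1.0):
    """Dense 1D mass matrix, integrated knot span by knot span as in (24)."""
    n = len(knots) - p - 1
    # three-point Gauss-Legendre rule on (-1, 1), exact up to polynomial degree 5
    nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
    weights = [5.0 / 9.0, 8.0 / 9.0, 5.0 / 9.0]
    M = [[0.0] * n for _ in range(n)]
    for k in range(len(knots) - 1):
        a, b = knots[k], knots[k + 1]
        if b <= a:
            continue  # skip empty knot spans caused by repeated knots
        for t, w in zip(nodes, weights):
            xi = 0.5 * (a + b) + 0.5 * (b - a) * t
            scale = 0.5 * (b - a) * w * abs(det_jacobian(xi))
            phi = [bspline_basis(i, p, xi, knots) for i in range(n)]
            for i in range(n):
                if phi[i] == 0.0:
                    continue
                for j in range(n):
                    M[i][j] += scale * phi[i] * phi[j]
    return M


p = 2
knots = [0.0, 0.0, 0.0, 0.25, 0.5, 0.75, 1.0, 1.0, 1.0]
M = assemble_mass_matrix(p, knots)
lumped = [sum(row) for row in M]  # diagonal of the lumped mass matrix, cf. (22)
```

Because the basis functions form a partition of unity, the entries of `lumped` add up to the length of the domain, which is the discrete statement that lumping neither loses nor gains mass.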

3.3 Temporal discretisation by explicit Runge-Kutta methods

Let us represent system (16) in an even more abstract matrix form as follows

$$M \frac{dU}{dt} = R(U) \tag{25}$$


Here, $M = I \otimes M_C$ is the block counterpart of the consistent (or lumped) mass matrix and R(U) represents a nonlinear operator acting on the state vector U. The current implementation of the flow solver adopts explicit strong stability preserving (SSP) Runge-Kutta time integration schemes [6] of order two

$$M U^{(1)} = M U^n + \Delta t \, R(U^n) \tag{26}$$

$$M U^{n+1} = \frac{1}{2} M U^n + \frac{1}{2} \left( M U^{(1)} + \Delta t \, R(U^{(1)}) \right) \tag{27}$$

and three, respectively

$$M U^{(1)} = M U^n + \Delta t \, R(U^n) \tag{28}$$

$$M U^{(2)} = \frac{3}{4} M U^n + \frac{1}{4} \left( M U^{(1)} + \Delta t \, R(U^{(1)}) \right) \tag{29}$$

$$M U^{n+1} = \frac{1}{3} M U^n + \frac{2}{3} \left( M U^{(2)} + \Delta t \, R(U^{(2)}) \right) \tag{30}$$

If $M = I \otimes M_L$, as is the case for the algebraic flux-correction schemes presented below, then the above Runge-Kutta schemes reduce to scaling the right-hand sides by the inverse of a diagonal matrix.
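The two schemes (26)–(27) and (28)–(30) are short enough to be written out directly. The Python sketch below applies them to the scalar model problem du/dt = −u with M = 1, which is an assumption made to keep the example self-contained; the function names are hypothetical, and `rhs` stands for the mass-scaled right-hand side, i.e. the inverse of the mass matrix applied to R(U).

```python
import math


def ssp_rk2_step(u, dt, rhs):
    """One step of the second-order SSP Runge-Kutta scheme (26)-(27)."""
    u1 = u + dt * rhs(u)
    return 0.5 * u + 0.5 * (u1 + dt * rhs(u1))


def ssp_rk3_step(u, dt, rhs):
    """One step of the third-order SSP Runge-Kutta scheme (28)-(30)."""
    u1 = u + dt * rhs(u)
    u2 = 0.75 * u + 0.25 * (u1 + dt * rhs(u1))
    return u / 3.0 + 2.0 / 3.0 * (u2 + dt * rhs(u2))


def integrate(step, u0, t_end, n_steps, rhs):
    """Advance u0 to t_end with n_steps uniform steps of the given scheme."""
    u, dt = u0, t_end / n_steps
    for _ in range(n_steps):
        u = step(u, dt, rhs)
    return u


def decay(u):
    """Right-hand side of the model problem du/dt = -u."""
    return -u


exact = math.exp(-1.0)
err2 = [abs(integrate(ssp_rk2_step, 1.0, 1.0, n, decay) - exact) for n in (20, 40)]
err3 = [abs(integrate(ssp_rk3_step, 1.0, 1.0, n, decay) - exact) for n in (20, 40)]
```

Halving the step size should reduce the error by a factor of about four for the second-order scheme and about eight for the third-order scheme, which is a simple way to check an implementation of either method.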

3.4 Algebraic flux correction

It is well known that standard approaches like the Galerkin method (14) tend to produce non-physical solution approximations with undershoots and overshoots in the vicinity of steep gradients and discontinuities. These spurious oscillations not only pollute the numerical solution visually but also cause physical quantities like density and pressure to become negative locally, which leads to the immediate breakdown of the simulation. A common mitigation strategy is to add artificial viscosities to prevent the creation of spurious oscillations, either in a linear or in a nonlinear, i.e. shock-capturing, manner. However, many of these approaches require the user to specify one or more problem-dependent parameters.

As an alternative to these standard techniques, the algebraic flux-correction (AFC) paradigm has been developed in a series of publications [9–13, 15–19]. The general concept is to start from the original Galerkin (semi-)discretisation of the problem at hand and derive a low-order counterpart, which ensures that no undershoots and overshoots are created. In a second step, unnecessary artificial viscosities are removed locally with the aid of a flux limiter. In particular, the compressible flow solver makes use of the linearised FCT algorithm introduced in [12] and generalised to the Euler equations in [16]. We therefore present only the main building blocks in this deliverable and refer to the original publications for full details.

3.4.1 Artificial viscosity operators

One of the core ingredients of the AFC paradigm is the construction of artificial viscosities. The homogeneity property of the compressible Euler equations ensures that [8]

$$F = A U, \quad \text{where } A = \frac{\partial F}{\partial U} \tag{31}$$

is the flux Jacobian, which can be decomposed as follows

$$e \cdot A = R(e) \Lambda(e) R^{-1}(e) \tag{32}$$


Here, Λ(e) is the diagonal matrix of eigenvalues

$$\lambda_1 = e \cdot v - c, \quad \lambda_2 = \dots = \lambda_{d+1} = e \cdot v, \quad \lambda_{d+2} = e \cdot v + c \tag{33}$$

and R(e) is the matrix of right eigenvectors, with $c = \sqrt{\gamma p / \rho}$ denoting the speed of sound. Closed-form expressions for the eigenvectors are given for instance in [8, 27].

Using the zero-sum property of the discrete divergence operator following from (P3) and the fact that

$$c_{ji} + c_{ij} = s_{ij} \tag{34}$$

one can show [17] that away from the boundary, i.e. where $s_{ij} = 0$, we obtain [15]

$$R_i^{\mathrm{high}} := \sum_{j=1}^N c_{ji} \cdot F_j = \sum_{j \ne i} e_{ij} \cdot (F_j - F_i), \qquad e_{ij} = \frac{c_{ji} - c_{ij}}{2} \tag{35}$$
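The step from the first to the second sum in (35) can be made explicit as follows. This is only a short sketch of the argument for interior coefficients, under the stated assumption that $s_{ij} = 0$ for all j and that C has zero row sums; it is not a reproduction of the proof in [17].

```latex
\begin{align*}
s_{ij} = 0 \;&\Rightarrow\; c_{ji} = -c_{ij}, \qquad c_{ii} = 0, \qquad
  e_{ij} = \tfrac{1}{2}\,(c_{ji} - c_{ij}) = c_{ji}, \\
\sum_{j \ne i} e_{ij} \;&=\; -\sum_{j \ne i} c_{ij}
  \;=\; c_{ii} - \sum_{j=1}^{N} c_{ij} \;=\; 0, \\
\sum_{j=1}^{N} c_{ji} \cdot F_j \;&=\; \sum_{j \ne i} e_{ij} \cdot F_j
  \;=\; \sum_{j \ne i} e_{ij} \cdot (F_j - F_i)
  + \Big( \sum_{j \ne i} e_{ij} \Big) \cdot F_i
  \;=\; \sum_{j \ne i} e_{ij} \cdot (F_j - F_i).
\end{align*}
```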

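Moreover, Roe has demonstrated that the density-averaged Roe mean values [24], with $H = E + p/\rho$ denoting the total enthalpy,

$$\rho_{ij} = \sqrt{\rho_i \rho_j} \tag{36}$$

$$v_{ij} = \frac{\sqrt{\rho_i} \, v_i + \sqrt{\rho_j} \, v_j}{\sqrt{\rho_i} + \sqrt{\rho_j}} \tag{37}$$

$$H_{ij} = \frac{\sqrt{\rho_i} \, H_i + \sqrt{\rho_j} \, H_j}{\sqrt{\rho_i} + \sqrt{\rho_j}} \tag{38}$$

$$c_{ij} = \sqrt{(\gamma - 1) \left( H_{ij} - \frac{|v_{ij}|^2}{2} \right)} \tag{39}$$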

admit the further decomposition of the flux difference in (35) into the difference between solution values multiplied by the flux Jacobian evaluated for the Roe mean values

$$F_j - F_i = A_{ij} (U_j - U_i) \tag{40}$$

As a final result, expression (35) can be rewritten as follows [15]

$$R_i^{\mathrm{high}} = \sum_{j \ne i} e_{ij} \cdot A_{ij} (U_j - U_i) \tag{41}$$
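Property (40) can be verified numerically. The following Python sketch does so for the one-dimensional Euler equations (d = 1), where the flux Jacobian has a compact closed form in terms of the velocity and the total enthalpy. The restriction to 1D, the function names and the two sample states are assumptions made for this illustration; the flow solver itself works with the multi-dimensional counterpart.

```python
import math

GAMMA = 1.4


def primitives(U):
    """Density, velocity, pressure and total enthalpy of U = [rho, rho*u, rho*E]."""
    rho, momentum, rhoE = U
    u = momentum / rho
    p = (GAMMA - 1.0) * (rhoE - 0.5 * rho * u * u)
    H = (rhoE + p) / rho  # total enthalpy H = E + p/rho
    return rho, u, p, H


def flux(U):
    """One-dimensional inviscid flux, cf. (7) with d = 1."""
    rho, u, p, H = primitives(U)
    return [rho * u, rho * u * u + p, rho * H * u]


def roe_means(Ui, Uj):
    """Roe mean values (36)-(39) for d = 1."""
    rho_i, u_i, _, H_i = primitives(Ui)
    rho_j, u_j, _, H_j = primitives(Uj)
    w_i, w_j = math.sqrt(rho_i), math.sqrt(rho_j)
    rho = w_i * w_j
    u = (w_i * u_i + w_j * u_j) / (w_i + w_j)
    H = (w_i * H_i + w_j * H_j) / (w_i + w_j)
    c = math.sqrt((GAMMA - 1.0) * (H - 0.5 * u * u))
    return rho, u, H, c


def flux_jacobian(u, H):
    """1D Euler flux Jacobian expressed in terms of velocity u and enthalpy H."""
    g = GAMMA
    return [[0.0, 1.0, 0.0],
            [0.5 * (g - 3.0) * u * u, (3.0 - g) * u, g - 1.0],
            [u * (0.5 * (g - 1.0) * u * u - H), H - (g - 1.0) * u * u, g * u]]


# two hypothetical states with a large jump in density and pressure
U_i = [1.0, 0.75, 2.5]
U_j = [0.125, 0.0, 0.25]

_, u_ij, H_ij, c_ij = roe_means(U_i, U_j)
A_ij = flux_jacobian(u_ij, H_ij)
dU = [b - a for a, b in zip(U_i, U_j)]
dF = [b - a for a, b in zip(flux(U_i), flux(U_j))]
A_dU = [sum(A_ij[k][m] * dU[m] for m in range(3)) for k in range(3)]
```

The vectors `dF` and `A_dU` agree up to round-off even though the two states differ strongly, which is precisely the exactness property that distinguishes the Roe mean values from, for example, arithmetic averages.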

Following the derivation procedure detailed in [17], a non-oscillatory low-order scheme can be constructed by eliminating the negative eigenvalues from the operators $e_{ij} \cdot A_{ij}$ and $e_{ji} \cdot A_{ji}$, respectively. This can be achieved in a conservative manner by adding tensorial artificial viscosities of the form [17]

$$D_{ij} := \|e_{ij}\| \, R_{ij} |\Lambda_{ij}| R_{ij}^{-1} \tag{42}$$

to both off-diagonal operators (j ≠ i) and subtracting the same amount from the diagonal operators $e_{ii} \cdot A_{ii}$ and $e_{jj} \cdot A_{jj}$, respectively. In a practical implementation this can be accomplished by setting

$$R_i^{\mathrm{low}} = \sum_{j \ne i} \left( e_{ij} \cdot A_{ij} + D_{ij} \right) (U_j - U_i) \tag{43}$$

Alternative strategies for defining artificial viscosity operators are given in [15].


3.5 Linearised FCT algorithm

The addition of artificial viscosities to the high-order discretisation is not enough to ensure that the low-order solution is free of spurious oscillations. The intrinsic coupling of degrees of freedom by the consistent mass matrix needs to be removed by replacing $M_C$ by its lumped counterpart $M_L$. By collecting the aforementioned modifications in the raw antidiffusive correction term F(U) we obtain [15]

$$M_C \frac{dU}{dt} = R^{\mathrm{high}}(U) + S(U) \tag{44}$$

$$\Leftrightarrow \quad M_L \frac{dU}{dt} = R^{\mathrm{low}}(U) + S(U) + F(U) \tag{45}$$

where S(U) accounts for contributions at the boundary. The general idea of the AFC paradigm is to investigate the local smoothness of the solution U and to constrain the raw antidiffusive correction term F(U) in regions where this is necessary to prevent the generation of undershoots and overshoots.

The current implementation of the flow solver adopts the linearised FCT algorithm from [16]. This two-step predictor-corrector approach first computes a preliminary solution using the low-order scheme

$$M_L \frac{dU}{dt} = R^{\mathrm{low}}(U) + S(U) \tag{46}$$

and computes an improved approximation to the end-of-step solution by applying a limited antidiffusion

$$M_L U^{n+1} = M_L U^{\mathrm{low}} + \Delta t \, \bar{F}(U^{\mathrm{low}}, U^n) \tag{47}$$

Here, $U^n$ is the solution from the previous time step and $U^{\mathrm{low}}$ is the solution to the predictor (46), which is currently computed by the explicit SSP Runge-Kutta schemes (26)–(27) and (28)–(30), respectively.

3.5.1 Primitive variable limiter

The constrained antidiffusion term $\bar{F}(U^{\mathrm{low}}, U^n)$ is calculated using the synchronised flux limiting procedure developed in [16]. To start with, a viable linearisation of the raw antidiffusive fluxes is given by

$$F_i = \sum_{j \ne i} F_{ij}, \qquad F_{ij} = (I \otimes m_{ij}) (\dot{U}_i^{\mathrm{low}} - \dot{U}_j^{\mathrm{low}}) + D_{ij}^{\mathrm{low}} (U_i^{\mathrm{low}} - U_j^{\mathrm{low}}) \tag{48}$$

where $D_{ij}^{\mathrm{low}} = D_{ij}(U_{ij}^{\mathrm{low}})$ denotes the evaluation of the viscosity operator at the Roe mean values based on the predicted low-order solution. Moreover, the low-order approximation to the time derivative reads

$$\dot{U}^{\mathrm{low}} = M_L^{-1} \left[ R(U^{\mathrm{low}}) + S(U^{\mathrm{low}}) \right] \tag{49}$$

The general philosophy of FCT schemes is to multiply the anti-symmetric fluxes $F_{ij}$ by a symmetric correction factor $0 \le \alpha_{ij} \le 1$ to obtain the amount of constrained antidiffusive correction

$$\bar{F}_i = \sum_{j \ne i} \alpha_{ij} F_{ij} \tag{50}$$

The procedure for calculating the correction factors $\alpha_{ij}$ based on the primitive variables density and pressure and the rationale behind it are described in [16]. Since density is both a conservative and a primitive variable, its values $\rho_i$ are known and so are the raw antidiffusive fluxes $f_{ij}^\rho$, which correspond to the first component of $F_{ij}$. The conversion formulas for the pressure variable read

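$$p_i = (\gamma - 1) \left[ (\rho E)_i - \frac{\|(\rho v)_i\|^2}{2 \rho_i} \right] \tag{51}$$

$$f_{ij}^p = (\gamma - 1) \left[ f_{ij}^{\rho E} + \frac{\|v_i\|^2}{2} f_{ij}^\rho - v_i \cdot f_{ij}^{\rho v} \right] \tag{52}$$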
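Formula (52) is the directional derivative of the pressure (51) with respect to the conservative variables, applied to the flux increment. The Python sketch below checks this with a one-sided finite difference for a two-dimensional state; the function names, the sample state `U_i`, the sample flux `f_ij` and the step size `eps` are assumptions chosen for this illustration.

```python
GAMMA = 1.4


def pressure(U):
    """Pressure (51) of a 2D conservative state U = [rho, rho*vx, rho*vy, rho*E]."""
    rho, mx, my, rhoE = U
    return (GAMMA - 1.0) * (rhoE - (mx * mx + my * my) / (2.0 * rho))


def pressure_flux(U, f):
    """Raw antidiffusive pressure flux (52) for the conservative flux f at state U."""
    rho, mx, my, _ = U
    vx, vy = mx / rho, my / rho
    f_rho, f_mx, f_my, f_rhoE = f
    return (GAMMA - 1.0) * (f_rhoE
                            + 0.5 * (vx * vx + vy * vy) * f_rho
                            - (vx * f_mx + vy * f_my))


# hypothetical state and conservative flux increment
U_i = [1.2, 0.6, -0.3, 3.0]
f_ij = [0.05, -0.02, 0.04, 0.1]

f_p = pressure_flux(U_i, f_ij)

# finite-difference approximation of the change in pressure caused by f_ij
eps = 1e-6
U_perturbed = [u + eps * f for u, f in zip(U_i, f_ij)]
finite_difference = (pressure(U_perturbed) - pressure(U_i)) / eps
```

The agreement between `f_p` and `finite_difference` illustrates why the pressure fluxes can be limited in the same way as the density fluxes although pressure is not a conservative variable.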

First, both variables are constrained independently by applying the scalar flux limiting procedure from [16] to $u_i \in \{\rho_i, p_i\}$ and $f_{ij}^u \in \{f_{ij}^\rho, f_{ij}^p\}$. The algorithmic steps are as follows:

1. Compute the sums of positive/negative antidiffusive fluxes per degree of freedom

$$P_i^+ = \sum_{j \ne i} \max\{0, f_{ij}^u\}, \qquad P_i^- = \sum_{j \ne i} \min\{0, f_{ij}^u\} \tag{53}$$

2. Compute the distance to a local maximum/minimum of the low-order solution

$$Q_i^+ = u_i^{\max} - u_i^L, \qquad Q_i^- = u_i^{\min} - u_i^L \tag{54}$$

3. Compute the optimal correction factors per degree of freedom

$$R_i^+ = \min\left\{ 1, \frac{m_i Q_i^+}{\Delta t \, P_i^+} \right\}, \qquad R_i^- = \min\left\{ 1, \frac{m_i Q_i^-}{\Delta t \, P_i^-} \right\} \tag{55}$$

4. Compute symmetric correction factors $\alpha_{ij}^u = \alpha_{ji}^u$

$$\alpha_{ij}^u = \min\{R_{ij}, R_{ji}\}, \qquad R_{ij} = \begin{cases} R_i^+, & \text{if } f_{ij}^u \ge 0 \\ R_i^-, & \text{if } f_{ij}^u < 0 \end{cases} \tag{56}$$

Finally, a synchronised correction factor $\alpha_{ij}$ for (50) is defined as [14, 20, 21]

$$\alpha_{ij} = \min\{\alpha_{ij}^\rho, \alpha_{ij}^p\} \tag{57}$$
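Steps 1–4 can be sketched for a single control variable as follows. The Python code below is a minimal illustration and not the solver's implementation: the function name, the dictionary-based flux storage, the choice of the local bounds as extrema of the low-order coefficients over the direct neighbours, and the convention that a nodal factor stays at 1 when no flux of the corresponding sign is present are all assumptions. For the synchronised limiter (57), the function would be called once for density and once for pressure, taking the minimum of the two results.

```python
def fct_correction_factors(u_low, fluxes, masses, dt, neighbours):
    """Correction factors alpha_ij for one control variable, steps 1-4.

    u_low      : low-order coefficients u_i^L
    fluxes     : dict {(i, j): f_ij}, stored for both orderings with f_ji = -f_ij
    masses     : entries m_i of the lumped mass matrix
    neighbours : neighbours[i] lists all j != i coupled to i
    """
    n = len(u_low)
    R_plus, R_minus = [1.0] * n, [1.0] * n
    for i in range(n):
        # step 1: sums of positive and negative raw antidiffusive fluxes (53)
        P_plus = sum(max(0.0, fluxes[i, j]) for j in neighbours[i])
        P_minus = sum(min(0.0, fluxes[i, j]) for j in neighbours[i])
        # step 2: distances to the local extrema of the low-order solution (54)
        local = [u_low[i]] + [u_low[j] for j in neighbours[i]]
        Q_plus = max(local) - u_low[i]
        Q_minus = min(local) - u_low[i]
        # step 3: nodal correction factors (55)
        if P_plus > 0.0:
            R_plus[i] = min(1.0, masses[i] * Q_plus / (dt * P_plus))
        if P_minus < 0.0:
            R_minus[i] = min(1.0, masses[i] * Q_minus / (dt * P_minus))
    # step 4: symmetric correction factors (56); note that f_ji = -f_ij
    alpha = {}
    for (i, j), f in fluxes.items():
        R_ij = R_plus[i] if f >= 0.0 else R_minus[i]
        R_ji = R_plus[j] if -f >= 0.0 else R_minus[j]
        alpha[i, j] = min(R_ij, R_ji)
    return alpha


# hypothetical 1D example with five coefficients and nearest-neighbour coupling
u_low = [1.0, 1.0, 0.8, 0.0, 0.0]
neighbours = [[1], [0, 2], [1, 3], [2, 4], [3]]
raw = {(0, 1): 0.3, (1, 2): 0.5, (2, 3): -0.4, (3, 4): 0.2}
fluxes = dict(raw)
fluxes.update({(j, i): -f for (i, j), f in raw.items()})
masses = [0.25] * 5
dt = 0.1

alpha = fct_correction_factors(u_low, fluxes, masses, dt, neighbours)
u_new = [u_low[i] + dt / masses[i]
         * sum(alpha[i, j] * fluxes[i, j] for j in neighbours[i])
         for i in range(5)]
```

In this example the fluxes that would create a new maximum at the plateau on the left are cancelled completely, whereas the flux between coefficients 2 and 3 passes unchanged because it steepens the profile without leaving the local bounds.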

Let us remark that for all numerical examples presented in Section 3.8, the flux-corrected coefficients $u_i$ of the control variables u (density and pressure) satisfied the constraint

$$u_i^{\min} \le u_i \le u_i^{\max} \tag{58}$$

without the application of additional post-processing ’tricks’ like the ’failsafe’ strategy from [16].

It should be noted that the above limiting procedure has been developed for low-order nodal finite elements, where $u_i$ equals the solution value at node $i$. However, one can easily show that imposing constraint (58) on the coefficients of a B-spline basis expansion ensures that the flux-corrected solution stays within the bounds set up by the low-order predictor. Let us assume, to the contrary, that for some $x^* \in \Omega$ we have

$$u(x^*) = \sum_{j=1}^{N} u_j \varphi_j(x^*) > \sum_{j=1}^{N} u_j^{\max} \varphi_j(x^*) = u^{\max}(x^*) \tag{59}$$


This is equivalent to the inequality

$$0 > u^{\max}(x^*) - u(x^*) = \sum_{j=1}^{N} \left[u_j^{\max} - u_j\right] \varphi_j(x^*) \tag{60}$$

Due to the strict positivity of B-spline basis functions over their entire support (property P(2)), the above inequality can only hold if at least one coefficient $u_j^{\max} - u_j$ is negative, which violates the assumption that (58) holds for all of them. A similar argument holds for the lower bound, which completes the proof.
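The argument can also be checked numerically. The sketch below assumes uniform quadratic B-splines in one space dimension, for which the three basis functions that are non-zero on a knot span have a simple closed form; the function names and the sampling strategy are illustrative choices only.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Uniform quadratic B-spline basis on one knot span, local coordinate t in [0,1].
// The three values are non-negative and sum to one.
std::array<double, 3> quadratic_bspline_basis(double t) {
    assert(t >= 0.0 && t <= 1.0);
    return {0.5 * (1.0 - t) * (1.0 - t),
            0.5 * (-2.0 * t * t + 2.0 * t + 1.0),
            0.5 * t * t};
}

// Evaluate sum_j c_j * phi_j on knot span `span` (uses c[span], c[span+1], c[span+2]).
double evaluate(const std::vector<double>& c, std::size_t span, double t) {
    assert(span + 2 < c.size());
    const std::array<double, 3> phi = quadratic_bspline_basis(t);
    return c[span] * phi[0] + c[span + 1] * phi[1] + c[span + 2] * phi[2];
}

// Check u_min(x) <= u(x) <= u_max(x) at `samples` points per knot span,
// where all three functions are expanded in the same B-spline basis.
bool stays_within_bounds(const std::vector<double>& u,
                         const std::vector<double>& u_min,
                         const std::vector<double>& u_max,
                         int samples = 11) {
    assert(u.size() >= 3 && u_min.size() == u.size() && u_max.size() == u.size());
    assert(samples >= 2);
    const double tol = 1e-12;  // guards against round-off only
    for (std::size_t span = 0; span + 2 < u.size(); ++span) {
        for (int s = 0; s < samples; ++s) {
            const double t = static_cast<double>(s) / (samples - 1);
            const double ux = evaluate(u, span, t);
            if (ux > evaluate(u_max, span, t) + tol) return false;
            if (ux < evaluate(u_min, span, t) - tol) return false;
        }
    }
    return true;
}
```

If the coefficients satisfy (58), the check passes at every sample point; raising a single coefficient above its upper bound makes it fail on the spans where the corresponding basis function is non-zero.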

3.6 Boundary treatment

The weak imposition of boundary conditions follows the procedure described in [15], which is inspired by the earlier work [25, 28]. In essence, the normal flux for the Neumann boundary condition (10) is approximated by Roe’s approximate Riemann solver [24], which reads

$$F_n(U, U_\infty) = \frac{1}{2}\, n \cdot \left[F(U) + F(U_\infty)\right] - |n \cdot A(U, U_\infty)|\,(U_\infty - U) \tag{61}$$

Here, $U$ and $U_\infty$ are the computed solution and the prescribed ’free stream’ values, respectively, and $A(U, U_\infty)$ denotes the flux Jacobian evaluated for the Roe mean values. The above notation implies

$$|n \cdot A| := \|n\|\, R(U, U_\infty)\, |\Lambda(U, U_\infty)|\, R^{-1}(U, U_\infty), \qquad \|n\| \equiv 1 \tag{62}$$
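As an illustration of the ingredients of (62), the following sketch computes the standard Roe mean values for an ideal gas and the diagonal entries of $|\Lambda|$ for a unit normal. The struct and function names and the default gamma = 1.4 are assumptions of this example; the eigenvector matrices $R$ and $R^{-1}$ are omitted.

```cpp
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>

struct Primitive {
    double rho;
    std::array<double, 3> v;
    double p;
};

struct RoeMean {
    std::array<double, 3> v;  // Roe-averaged velocity
    double H;                 // Roe-averaged specific total enthalpy
    double c;                 // speed of sound at the Roe mean state
};

// Specific total enthalpy H = gamma/(gamma-1) * p/rho + |v|^2/2 (ideal gas).
double total_enthalpy(const Primitive& s, double gamma) {
    double v2 = 0.0;
    for (double vd : s.v) v2 += vd * vd;
    return gamma / (gamma - 1.0) * s.p / s.rho + 0.5 * v2;
}

// Roe mean values: velocity and enthalpy are averaged with weights sqrt(rho).
RoeMean roe_mean(const Primitive& a, const Primitive& b, double gamma = 1.4) {
    assert(a.rho > 0.0 && b.rho > 0.0);
    const double wa = std::sqrt(a.rho), wb = std::sqrt(b.rho);
    RoeMean m{};
    double v2 = 0.0;
    for (std::size_t d = 0; d < 3; ++d) {
        m.v[d] = (wa * a.v[d] + wb * b.v[d]) / (wa + wb);
        v2 += m.v[d] * m.v[d];
    }
    m.H = (wa * total_enthalpy(a, gamma) + wb * total_enthalpy(b, gamma)) / (wa + wb);
    m.c = std::sqrt((gamma - 1.0) * (m.H - 0.5 * v2));
    return m;
}

// Diagonal of |Lambda| for a unit normal n: |v_n - c|, |v_n| (three times), |v_n + c|.
std::array<double, 5> abs_eigenvalues(const RoeMean& m, const std::array<double, 3>& n) {
    double vn = 0.0;
    for (std::size_t d = 0; d < 3; ++d) vn += m.v[d] * n[d];
    return {std::fabs(vn - m.c), std::fabs(vn), std::fabs(vn), std::fabs(vn),
            std::fabs(vn + m.c)};
}
```

For identical left and right states the Roe mean reduces to that state, which gives a simple consistency check.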

In the proof-of-concept implementation of the flow solver, the resulting boundary integrals

$$S(U)_i = \int_{\Gamma_n} \varphi_i\, F_n(U(x), U_\infty(x))\, ds \tag{63}$$

are evaluated by Gaussian quadrature, which contradicts the design goals of the SpMV-formulation (16). An alternative implementation that is currently being tested in the HPC variant of the flow solver adopts Fletcher’s group formulation [5] also at the boundary. For the specification of the ’free-stream’ values the reader is referred to the detailed description in Section 11 of [15].

3.7 Implementation details

The solution procedure described above has been implemented as a proof-of-concept code in the G+Smo library and validated for several benchmark problems (see Section 3.8). Internally, G+Smo makes use of Eigen [7], which is a C++ expression template library for linear algebra tasks.

However, it turned out that using the standard C++ iterators provided by Eigen is computationally inefficient for realising the many summations over matrix entries and fluxes. Moreover, the repeated implementation of the complex formulas for evaluating eigenvectors, eigenvalues, inviscid fluxes and boundary values for different time-stepping schemes and variants of the limiter is prone to programming errors, which are difficult to detect. Last but not least, the explicit usage of data structures, operators and iterators from the Eigen library makes it difficult to port the flow solver to other computing devices like GPUs.


[Figure 2 diagram. Supported expression template libraries: Armadillo, ArrayFire, Blaze, Eigen, IT++, MTL4, uBLAS, VexCL, ViennaCL, ... Low-level layer: unified wrapper function API to core functionality of ETLs (make_temp, tag, tie, +, -, *, /, abs, sqrt, ...). High-level layer: SFETs for conservative/primitive state variables, secondary variables, inviscid fluxes, EOS, and Riemann invariants.]

Figure 2: Structure of the Fluid Dynamics Building Block library [23]

3.7.1 FDBB library

It was therefore decided to develop the Fluid Dynamics Building Blocks (FDBB) C++ library [23] as an extendable toolbox for developing the HPC variant of the flow solver. The overall structure of the open-source software package is depicted in Figure 2. The low-level layer provides a unified application programming interface (API) to several of the most widely used C++ expression template libraries (ETL) for linear algebra tasks. The high-level layer provides so-called fast and smart expression templates (SFET) for converting conservative and primitive state variables into each other and for evaluating secondary variables like entropy and the speed of sound, inviscid fluxes, equations of state, and Riemann invariants. Template meta-programming techniques are used to provide a dimension-independent, intuitive API to the end-user. The code snippet given in Figure 3 illustrates the basic usage of the FDBB library. A switch from the Eigen backend to another ETL requires only minimal changes in line 9. Likewise, a change in the equation of state or the spatial dimension is easily done by updating lines 2 and 5, respectively. Moreover, all FDBB functions are implemented in such a way that they accept an arbitrary number of parameters

```cpp
 1  // EOS for ideal gas (gamma=1.4)
 2  using eos = fdbb::fdbbEOSidealGas<double>;
 3
 4  // Conservative variables in 3d
 5  using var = fdbb::fdbbVariables<eos,3,
 6                fdbb::EnumVar::conservative>;
 7
 8  // Select Eigen backend
 9  Eigen::Array<double> u1(n), u2(n), u3(n), u4(n), u5(n), v(n);
10
11  // Generic implementation of velocity magnitude (squared)
12  v = var::v_mag2(u1, u2, u3, u4, u5);
```

Figure 3: Code snippet illustrating the basic usage of the FDBB library.


```cpp
1  Eigen::Array<double> x(n), y(n), z(n);
2  auto expr = x * y;
3  expr += x / y;
4  z = expr;
```

[Expression tree: root node +, with children * (operands x, y) and / (operands x, y).]

Figure 4: Code snippet using expression template meta-programming techniques (left) and corresponding expression tree (right)

(variadic parameter packs) and the structure fdbb::fdbbVariables accepts an additional template parameter that defines the mapping between parameters and variables at compile time.

The general design concept of ETLs is to prevent the immediate evaluation of each single sub-expression and to build up symbolic expression trees instead. Consider the code snippet given in Figure 4 (left). Without the use of expression template meta-programming techniques, the product ’x*y’ would be evaluated immediately and assigned to the variable ’expr’. Since all quantities are arrays of length n, this would trigger a for-loop over all entries with computational costs of 2n data loads + n data stores. The later addition of the term ’x/y’ would trigger a second for-loop with additional costs of 3n data loads + n data stores. With expression template meta-programming techniques, the same code snippet is expanded into the symbolic expression tree depicted in Figure 4 (right), which is evaluated in a single for-loop (2n data loads + n data stores) at the time when the actual assignment takes place in line 4.
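The mechanism can be illustrated with a deliberately small, self-contained expression template. This is a sketch of the general technique and not the implementation of Eigen, FDBB or any other ETL; all class names are hypothetical. Unlike Figure 4, the tree is built in a single expression, because the type of an expression node changes with every operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CRTP base: marks a type as an expression and gives access to the derived node.
template <class E>
struct Expr {
    const E& self() const { return static_cast<const E&>(*this); }
};

// Leaf node owning the data. Assigning an expression runs the single fused loop.
class Array : public Expr<Array> {
public:
    explicit Array(std::size_t n, double value = 0.0) : data_(n, value) {}
    double operator[](std::size_t i) const { return data_[i]; }
    double& operator[](std::size_t i) { return data_[i]; }
    std::size_t size() const { return data_.size(); }

    template <class E>
    Array& operator=(const Expr<E>& e) {
        const E& expr = e.self();
        assert(expr.size() == size());
        for (std::size_t i = 0; i < size(); ++i) data_[i] = expr[i];
        return *this;
    }

private:
    std::vector<double> data_;
};

// Inner nodes are stored by value, arrays by reference (no copies of the data).
template <class E> struct Stored { using type = E; };
template <> struct Stored<Array> { using type = const Array&; };

struct Add { static double apply(double a, double b) { return a + b; } };
struct Mul { static double apply(double a, double b) { return a * b; } };
struct Div { static double apply(double a, double b) { return a / b; } };

// Inner node of the expression tree: no arithmetic happens until it is indexed.
template <class L, class R, class Op>
class Binary : public Expr<Binary<L, R, Op>> {
public:
    Binary(const L& l, const R& r) : l_(l), r_(r) {}
    double operator[](std::size_t i) const { return Op::apply(l_[i], r_[i]); }
    std::size_t size() const { return l_.size(); }

private:
    typename Stored<L>::type l_;
    typename Stored<R>::type r_;
};

template <class L, class R>
Binary<L, R, Add> operator+(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }
template <class L, class R>
Binary<L, R, Mul> operator*(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }
template <class L, class R>
Binary<L, R, Div> operator/(const Expr<L>& l, const Expr<R>& r) { return {l.self(), r.self()}; }
```

With these definitions, `z = x * y + x / y;` builds the tree from Figure 4 (right) as a nested type and evaluates it entry by entry inside `Array::operator=`, without intermediate arrays.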

[Figure 5 plots. Left: double precision performance, Performance [mflops] versus Problem size [bytes]. Right: double precision performance, Bandwidth [MB/s] versus Problem size [bytes].]

Figure 5: Performance results for different ETLs for the expression v2:=(mx^2+my^2+mz^2)/rho^2. Legend: Armadillo, ArrayFire, Blaze, Blitz, Eigen, IT++, uBLAS, VexCL


Smart and fast ETLs furthermore utilise vectorisation, parallel execution, and other HPC techniques to make the evaluation step as efficient as possible. Figure 5 illustrates the performance of several ETLs for the evaluation of the simple expression v2:=(mx^2+my^2+mz^2)/rho^2, which corresponds to the square of the velocity magnitude $\|v\|^2$ in three space dimensions. The left diagram shows the number of floating-point operations per second (in MFLOPS). Higher values indicate better performance and, consequently, shorter computing times. The results of the IT++ library are comparable to those of a hand-crafted implementation of this expression in a single for-loop without using vectorisation and/or parallel execution. Obviously, the performance of a naive multi-loop implementation would be even worse. Compared with the Eigen ETL, which is currently used in G+Smo, a significant performance gain is expected from switching to ArrayFire, Blaze, or VexCL. The right diagram illustrates the corresponding memory throughput. Again, higher is better. Since the expressions used by the flow solver are quite complex, the performance gain due to optimised evaluation strategies that avoid storing temporary data is considered quite significant.

Last but not least, a block expression engine has been implemented, which makes it possible to tie together scalar arrays, vectors, and matrices and to implement the flow solver adopting the matrix form notation (16). The outer loops over the blocks are expanded at compile time, yielding the same performance as if the sums had been implemented explicitly. This feature supports the overall objective to implement the HPC variant of the flow solver in a largely dimension-independent way. The implementation of the HPC variant with the full functionality of the proof-of-concept version is ongoing work.
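For reference, the two hand-written baselines mentioned in the discussion of Figure 5 can be sketched as follows. The function names are illustrative, and neither variant uses vectorisation or parallel execution; the sketch only contrasts the number of passes over the data and the temporary storage.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Multi-loop variant: every sub-expression is evaluated in its own loop and
// stored in a temporary array before the next operation starts.
Vec v_mag2_multi_loop(const Vec& mx, const Vec& my, const Vec& mz, const Vec& rho) {
    const std::size_t n = rho.size();
    assert(mx.size() == n && my.size() == n && mz.size() == n);
    Vec sum(n), rho2(n), v2(n);
    for (std::size_t i = 0; i < n; ++i) sum[i] = mx[i] * mx[i];
    for (std::size_t i = 0; i < n; ++i) sum[i] += my[i] * my[i];
    for (std::size_t i = 0; i < n; ++i) sum[i] += mz[i] * mz[i];
    for (std::size_t i = 0; i < n; ++i) rho2[i] = rho[i] * rho[i];
    for (std::size_t i = 0; i < n; ++i) v2[i] = sum[i] / rho2[i];
    return v2;
}

// Fused variant: one pass over the data, four loads and one store per entry.
Vec v_mag2_fused(const Vec& mx, const Vec& my, const Vec& mz, const Vec& rho) {
    const std::size_t n = rho.size();
    assert(mx.size() == n && my.size() == n && mz.size() == n);
    Vec v2(n);
    for (std::size_t i = 0; i < n; ++i)
        v2[i] = (mx[i] * mx[i] + my[i] * my[i] + mz[i] * mz[i]) / (rho[i] * rho[i]);
    return v2;
}
```

An ETL aims to generate the fused loop from the readable one-line expression, so that the user does not have to write it by hand for every formula.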

Figure 6: G+Smo CompFlow XML Editor


```cpp
#include "gismo.h"
#include "gsCompFlowConfigList.h"
#include "gsCompFlowConfig.h"
#include "gsCompFlowKernel.h"

namespace gismo {

EXPORT bool run(const gsMultiPatch<double> &geometry,
                const gsMultiBasis<double> &bases,
                const gsBoundaryConditions<double> &bc,
                const gsCompFlowPde<double> &pde,
                const gsOptionList &options)
{
    return
        gsCompFlowKernel<double,
            gsConfigList<
                gsConfig<int,2>,
                gsConfig<gismo::PhysicalProblem, gismo::PhysicalProblem::Euler>,
                gsConfig<gismo::StateVariable, gismo::StateVariable::conservative>,
                gsConfig<gismo::GasModel, gismo::GasModel::air21>,
                gsConfig<gismo::SourceTerm, gismo::SourceTerm::none>,
                gsConfig<gismo::Solver, gismo::Solver::none>,
                gsConfig<gismo::Formulation, gismo::Formulation::timeStepping>,
                gsConfig<gismo::TimeStepping, gismo::TimeStepping::explicitSSPRK1>,
                gsConfig<gismo::InviscidFlux, gismo::InviscidFlux::group>,
                gsConfig<gismo::InviscidFluxAtBoundary, gismo::InviscidFluxAtBoundary::group>,
                gsConfig<gismo::ViscousFlux, gismo::ViscousFlux::none>,
                gsConfig<gismo::ViscousFluxAtBoundary, gismo::ViscousFluxAtBoundary::none>,
                gsConfig<gismo::RhsTerm, gismo::RhsTerm::none>,
                gsConfig<gismo::EtlBackend, gismo::EtlBackend::gismo>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsSparseMatrixConfig<real_t,index_t,0>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsSparseMatrixConfig<real_t,index_t,0>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsSparseMatrixConfig<real_t,index_t,0>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsSparseMatrixConfig<real_t,index_t,0>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsSparseMatrixConfig<real_t,index_t,0>,
                gsSparseMatrixConfig<real_t,index_t,0>, gsVectorConfig<real_t,0,0>
            >>(geometry, bases, bc, pde).run(options);
}

} // namespace gismo
```

Figure 7: Wrapper of the single-patch compute kernel with auto-generated run() method


3.7.2 G+Smo library

As an addition to the G+Smo library, a just-in-time (JIT) compilation framework has been implemented, which makes it possible to compile source code at run time into a shared library and load the newly created functionality dynamically. The JIT compiler is used in the HPC variant of the flow solver to generate a separate hardware-optimised compute kernel for each individual patch, which gets assigned as a whole to a compute device. The overall design philosophy is that each compute device can ’own’ one or more patches but that no patch can be shared between multiple devices. The current implementation supports only single-patch geometries, but a discontinuous Galerkin coupling of multiple patches will be realised.

The developed G+Smo CompFlow XML Editor (see Figure 6) gives the user very fine-grained control over the behaviour of the solution algorithm on each patch. The selected configuration is converted into a compile-time option list, which is passed to the compute kernel as template arguments. The source code of the single-patch compute kernel corresponding to the settings from Figure 6 is presented in Figure 7.

The major benefit of this compile-time approach is that different variants of (parts of) the solution algorithm have been implemented in a monolithic flow solver and can be enabled or disabled selectively without performance loss. An ’if’ statement based on run-time parameters, e.g. to turn pressure-based flux limiting on or off, would be very inefficient if placed inside a nested loop. If the same parameter is passed as a compile-time constant, however, the JIT compiler will either delete the code block or include it unconditionally.
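The effect of a compile-time switch can be sketched with a template parameter and `if constexpr` (C++17). The names `KernelConfig` and `correction_factors` are hypothetical and unrelated to the gsConfig types shown in Figure 7; the example only illustrates how a disabled branch disappears from the generated code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical compile-time configuration with a single option.
template <bool LimitPressure>
struct KernelConfig {
    static constexpr bool limit_pressure = LimitPressure;
};

// Correction factors per edge; the pressure-based part is compiled in or out
// depending on Config, so no run-time branch remains inside the loop.
template <class Config>
std::vector<double> correction_factors(const std::vector<double>& alpha_rho,
                                       const std::vector<double>& alpha_p) {
    assert(alpha_rho.size() == alpha_p.size());
    std::vector<double> alpha(alpha_rho);
    for (std::size_t k = 0; k < alpha.size(); ++k) {
        if constexpr (Config::limit_pressure) {
            // This block is only instantiated if the option is enabled.
            alpha[k] = std::min(alpha[k], alpha_p[k]);
        }
    }
    return alpha;
}
```

Instantiating `correction_factors<KernelConfig<true>>` and `correction_factors<KernelConfig<false>>` yields two separate functions, each containing only the code path it needs.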

3.8 Numerical results

In this section we present a selection of numerical results computed by the proof-of-concept implementation of the inviscid compressible flow solver. We abstain from giving the full details and refer to the literature instead.

3.8.1 Stationary isentropic vortex

The stationary isentropic vortex benchmark is a standard test case to investigate the ability of the flow solver to preserve the shape of an initially prescribed vortex profile [3] over time. Density and pressure are initialised as depicted in Figure 8 and exposed to a counterclockwise rotation about the centre point. The aim is to preserve the initial profile as well as possible for long simulation times. That is, the exact solution at all times equals the initial data. As discussed before, the initial solution must be prescribed by projecting the analytic expressions onto the space BU spanned by the B-spline basis functions. The profiles in Figure 8 correspond to 66 × 66 equidistantly distributed B-spline basis functions of order 2.

The numerical simulation was carried out on the square domain $\Omega = (-5, 5)^2$ with solid wall boundary conditions imposed along the entire boundary. Figure 9 compares the exact (= L2-projected) solution with the numerical approximation at time t = 30 computed by the third-order explicit SSP-RK scheme with a moderate time step size ∆t = 0.01. It can be seen that the initial vortex is very well preserved over time. It should be noted, however, that flux limiting was deactivated for this test case since it would have caused flattening of the maximum and minimum solution values. This phenomenon, termed peak clipping, is a well-known limitation of many flux limiting schemes. A mitigation strategy that evaluates higher derivatives of the solution to distinguish between smooth extrema and discontinuities is currently being developed. However, for the envisioned simulation of the flow in screw machine geometries it is of minor importance since the relevant parts of the flow pattern are characterised by shock waves.


Figure 8: Exact solution for the stationary isentropic vortex benchmark problem. Panels: (a) density ρ, (b) pressure p, (c) horizontal component of velocity v1, (d) vertical component of velocity v2.

3.8.2 Sod’s shock tube problem

The time-dependent shock tube problem introduced by Sod in [26] is a common benchmark to investigate the ability of a numerical scheme to deal with shock waves, contact discontinuities and rarefaction waves at the same time. Simulations have been performed in the two-dimensional domain $\Omega = [0, 1] \times [0, 1]$, which is initially filled with a gas at rest in two different states separated by a membrane at $x = 0.5$:

$$\begin{bmatrix} \rho_L \\ v_L \\ p_L \end{bmatrix} = \begin{bmatrix} 1.0 \\ 0.0 \\ 1.0 \end{bmatrix}, \qquad \begin{bmatrix} \rho_R \\ v_R \\ p_R \end{bmatrix} = \begin{bmatrix} 0.125 \\ 0.0 \\ 0.1 \end{bmatrix} \tag{64}$$

Solid wall boundary conditions are prescribed at all boundaries of the domain. When the membrane is removed instantaneously at time t = 0, the gas starts to move from the high-pressure left part into the low-pressure right part, giving rise to the three aforementioned wave types.
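The initial data (64) can be converted into conservative variables as sketched below. The struct and function names are illustrative, gamma = 1.4 is assumed as in Figure 3, and assigning the point x = 0.5 itself to the right state is an arbitrary choice of this example.

```cpp
#include <cassert>
#include <cmath>

// Conservative variables in two space dimensions.
struct Conservative {
    double rho;     // density
    double rho_v1;  // horizontal momentum
    double rho_v2;  // vertical momentum
    double rhoE;    // total energy per unit volume
};

// Initial state of Sod's shock tube, Eq. (64), as a function of x only.
Conservative sod_initial_state(double x, double gamma = 1.4) {
    const bool left = x < 0.5;  // membrane at x = 0.5
    const double rho = left ? 1.0 : 0.125;
    const double v1 = 0.0;      // gas at rest on both sides
    const double p = left ? 1.0 : 0.1;
    // Ideal gas: rho*E = p / (gamma - 1) + rho * |v|^2 / 2
    const double rhoE = p / (gamma - 1.0) + 0.5 * rho * v1 * v1;
    return {rho, rho * v1, 0.0, rhoE};
}
```

In the solver these point values would not be used directly as coefficients; as noted for the vortex benchmark, the initial solution is obtained by projection onto the B-spline space.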


Figure 9: Exact and numerical solutions to the stationary isentropic vortex benchmark at t = 30 computed by the third-order explicit SSP Runge-Kutta method (∆t = 0.01) with 66 × 66 quadratic B-spline functions. Panels: (a) density ρ along y = 0.5, (b) pressure p along y = 0.5, (c) horizontal velocity component v1 along x = 0.5, (d) vertical velocity component v2 along y = 0.5.


The results depicted in Figure 10 were computed by the explicit SSP Runge-Kutta scheme. The left column shows the density, velocity, and pressure distribution along the cutline y = 0.5 computed by the unstabilised Galerkin scheme adopting a first-order B-spline basis (129 × 129) with ∆t = 0.0005. Nonphysical oscillations are observed in all three quantities. The magnitude of these oscillations grew even further when higher-order basis functions were used (not illustrated here). The right column shows the numerical solution profiles for quadratic B-spline functions (66 × 66) and ∆t = 0.001 when the primitive flux limiter from Section 3.4 is applied. Measures to reduce the smearing of the contact discontinuity are under investigation.

4 Conclusions

The work performed in M1-M18 forms the basis for the simulation of compressible flows in screw machine geometries in WP8. The developed proof-of-concept implementation of the flow solver demonstrated the general applicability of the mathematical concepts envisaged in the proposal. Its lack of performance required the realisation of a second variant, which was designed with high-performance computing capabilities in mind from the very beginning. The computational building blocks have been implemented and validated. The realisation of the HPC variant with the full functionality of the proof-of-concept flow solver is under way.


Figure 10: Exact and numerical solutions to Sod’s shock tube benchmark at t = 0.231 computed by the third-order explicit SSP Runge-Kutta method; AFC (left) vs. unstabilised Galerkin solution (right). Panels: (a), (b) density ρ along y = 0.5; (c), (d) velocity v1 along y = 0.5; (e), (f) pressure p along y = 0.5.

References

[1] M. Altieri, C. Becker, and S. Turek. Proposal for sparse banded BLAS techniques. Technical Report 99-11, University of Heidelberg, SFB 359, 1999.

[2] F. Calabro, G. Sangalli, and M. Tani. Fast formation of isogeometric Galerkin matrices by weighted quadrature. Computer Methods in Applied Mechanics and Engineering, 316:606–622, 2017. Special Issue on Isogeometric Analysis: Progress and Challenges.

[3] P. Castonguay, P.E. Vincent, and A. Jameson. Application of high-order energy stable flux reconstruction schemes to the Euler equations. In 49th AIAA Aerospace Sciences Meeting, volume 686, 2011.

[4] J.A. Cottrell, T.J.R. Hughes, and Y. Bazilevs. Isogeometric Analysis: Toward Integration of CAD and FEA. John Wiley & Sons, Ltd., 2009.

[5] C.A.J. Fletcher. The group finite element formulation. Comput. Methods Appl. Mech. Eng., 37:225–243, 1983.

[6] S. Gottlieb and C.-W. Shu. Total variation diminishing Runge-Kutta schemes. Mathematics of Computation, 67:73–85, 1998.

[7] G. Guennebaud, B. Jacob, et al. Eigen v3. http://eigen.tuxfamily.org, 2010.

[8] C. Hirsch. Numerical Computation of Internal and External Flows; Volume 2: Computational Methods for Inviscid and Viscous Flows. John Wiley & Sons, Ltd., 1990.

[9] D. Kuzmin. On the design of general-purpose flux limiters for finite element schemes. I. Scalar convection. Journal of Computational Physics, 219(2):513–531, 2006.

[10] D. Kuzmin. Algebraic flux correction for finite element discretizations of coupled systems. Computational Methods for Coupled Problems in Science and Engineering II, CIMNE, Barcelona, pages 653–656, 2007.

[11] D. Kuzmin. On the design of algebraic flux correction schemes for quadratic finite elements. Journal of Computational and Applied Mathematics, 218(1):79–87, 2008. Special Issue: Finite Element Methods in Engineering and Science (FEMTEC 2006).

[12] D. Kuzmin. Explicit and implicit FEM-FCT algorithms with flux linearization. Journal of Computational Physics, 228(7):2517–2534, 2009.

[13] D. Kuzmin. Flux-Corrected Transport: Principles, Algorithms and Applications, chapter Algebraic flux correction I. Scalar conservation laws. Springer, 2012.

[14] D. Kuzmin and M. Moller. Flux-Corrected Transport: Principles, Algorithms and Applications, chapter Algebraic flux correction II. Compressible Euler equations. Springer, 2005.

[15] D. Kuzmin, M. Moller, and M. Gurris. Flux-Corrected Transport: Principles, Algorithms and Applications, chapter Algebraic flux correction II. Compressible flow problems. Springer, 2012.


[16] D. Kuzmin, M. Moller, J. N. Shadid, and M. Shashkov. Failsafe flux limiting and constrained data projections for equations of gas dynamics. Journal of Computational Physics, 229(23):8766–8779, 2010.

[17] D. Kuzmin, M. Moller, and S. Turek. High-resolution FEM–FCT schemes for multidimensional conservation laws. Computer Methods in Applied Mechanics and Engineering, 193(45–47):4915–4946, 2004.

[18] D. Kuzmin and S. Turek. Flux correction tools for finite elements. Journal of Computational Physics, 175(2):525–558, 2002.

[19] D. Kuzmin and S. Turek. High-resolution FEM-TVD schemes based on a fully multidimensional flux limiter. Journal of Computational Physics, 198(1):131–158, 2004.

[20] R. Löhner. Chapter 30 years of FCT: Status and directions.

[21] R. Löhner. Finite element flux-corrected transport (FEM-FCT) for the Euler and Navier-Stokes equations. International Journal of Numerical Methods in Fluids, 17:1093–1109, 1987.

[22] A. Mantzaflaris, B. Juttler, B.N. Khoromskij, and U. Langer. Matrix Generation in Isogeometric Analysis by Low Rank Tensor Approximation, pages 321–340. Springer International Publishing, Cham, 2015.

[23] M. Moller and A. Jaeschke. FDBB: Fluid dynamics building blocks. https://mmoelle1.gitlab.io/FDBB/, 2010.

[24] P.L. Roe. Approximate Riemann solvers, parameter vectors, and difference schemes. Journal of Computational Physics, 43(2):357–372, 1981.

[25] R.A. Shapiro. Adaptive Finite Element Solution Algorithm for the Euler Equations, volume 32 of Notes on Numerical Fluid Mechanics and Multidisciplinary Design. Vieweg, 1991.

[26] G.A. Sod. A survey of several finite difference methods for systems of nonlinear hyperbolic conservation laws. Journal of Computational Physics, 27(1):1–31, 1978.

[27] E.F. Toro. Riemann Solvers and Numerical Methods for Fluid Dynamics. A Practical Introduction. Springer, 1999.

[28] P. Wesseling. Principles of Computational Fluid Dynamics. Springer, 2001.
