NUMERICAL ANALYSIS AND IMPLICIT TIME STEPPING FOR

HIGH-ORDER, FLUID FLOW SIMULATIONS ON

GPU ARCHITECTURES

A DISSERTATION

SUBMITTED TO THE DEPARTMENT OF

AERONAUTICS AND ASTRONAUTICS

AND THE COMMITTEE ON GRADUATE STUDIES

OF STANFORD UNIVERSITY

IN PARTIAL FULFILLMENT OF THE REQUIREMENTS

FOR THE DEGREE OF

DOCTOR OF PHILOSOPHY

Jerry Watkins

December 2017

http://creativecommons.org/licenses/by-nc/3.0/us/

This dissertation is online at: http://purl.stanford.edu/xw593rh0121

© 2017 by Jerry Enrique Watkins, II. All Rights Reserved.

Re-distributed by Stanford University under license with the author.

This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.

I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.

Antony Jameson, Primary Adviser


Juan Alonso


Sanjiva Lele

Approved for the Stanford University Committee on Graduate Studies.

Patricia J. Gumport, Vice Provost for Graduate Education

This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.

Abstract

High-order discontinuous finite element methods are becoming increasingly popular

for simulations of vortex dominated flows over complex geometries because they

are inherently less dissipative than traditional second-order finite volume methods.

In particular, the Flux Reconstruction (FR) approach has gained popularity because

it not only offers less dissipation but is also eminently parallelizable on multi-core

processors and accelerators. An extensive amount of research has been performed

for accelerated explicit methods for FR but more realistic simulations often require

high aspect ratio meshes where the maximum stable time step or CFL condition is

determined by the smallest cell. This can lead to significant limitations in explicit

time integration.

In this dissertation, eigensolution analysis is performed to study the effect of

these limitations on the stability, dissipation and dispersion properties of the nodal

Discontinuous Galerkin (DG) scheme via FR for advection-diffusion. It is shown that

the CFL condition for advection-diffusion is stricter than that for pure-advection

or pure-diffusion individually and a suitable estimate for the maximum stable time

step is proposed. The CFL condition is strongly influenced by the choice of interface

fluxes and, in general, the CFL limit for a scheme using centered interface values is much higher

than that for a scheme using one-sided values. It is also shown that schemes with centered

interface values produce less error for well resolved solutions while schemes with one-

sided interface values produce less error for solutions that are under-resolved. These

results are verified for one- and two-dimensional advection-diffusion of an approximate

Gaussian and two-dimensional Couette flow.

In addition to eigensolution analysis, a multi-GPU, implicit time stepping method

is developed, implemented and tested in order to study the feasibility of an implicit

scheme on modern hardware. It is shown that a nonlinear solver can be constructed

for the Direct Flux Reconstruction (DFR) method which maintains element-locality

and high arithmetic intensity with minimal communication costs on modern GPU

clusters. Analytical element local Jacobians for the DFR method are derived and a

Kronecker product formulation is used to reduce the time complexity from $O(P^{3d-1})$ to $O(P^{d+1})$

for Euler simulations and from $O(P^{3d})$ to $O(P^{d+2})$ for Navier-Stokes, where

P is the degree of the Lagrange basis polynomials and d is the spatial dimension.

Numerical results are obtained for the inviscid bump, inviscid NACA0012 airfoil,

isentropic vortex, laminar Joukowski airfoil and three-dimensional half cylinder with

a Reynolds number of 1000.

Acknowledgments

As a student who has benefited greatly from the financial support of grants, scholar-

ships and fellowships throughout my university education, I feel it is only fitting that

I begin by acknowledging the Rose M. Chappelear Memorial Fund and the National

Science Foundation Graduate Research Fellowship Program for supporting me finan-

cially during my graduate level education. I could not have completed my education

without their support and I consider myself very fortunate to have been selected

among the many applicants. For this, I am very grateful.

There are many people who have made my experience at Stanford a memorable

one, but my advisor, Professor Antony Jameson, stands out as someone who has

strongly influenced my academic development. As an advisor, he has always given

me the freedom to choose my own research path and has always been available to

provide valuable feedback. This has given me the opportunity to grow as both an

independent thinker and a researcher. His level of expertise in a wide variety of

subjects continues to fascinate me and I often enjoyed our conversations. It has been

a privilege to have had him as a mentor and I am very grateful for his support.

I would also like to express my gratitude to Professor Juan J. Alonso and Profes-

sor Sanjiva Lele for being a part of my dissertation reading committee and providing

valuable feedback towards the improvement of my dissertation. My gratitude also

extends to Professor Matthias Ihme and Professor Michael Saunders for participat-

ing in my oral examination and providing insightful comments about my research.

I would also like to thank Professor Margot Gerritsen for allowing me to perform

my numerical experiments on the XStream GPU cluster, supported by the National

Science Foundation Major Research Instrumentation program (ACI-1429830).

I’m extremely grateful to my colleagues in the Aerospace Computing Laboratory

who have made my academic experience very enjoyable. First, I’d like to thank David

Williams for teaching me the fundamentals of high-order methods during my first year

of graduate school. I’d also like to thank Kartikey Asthana for being an exceptional

co-author and friend. His expertise in numerical analysis is matched only by his skill

with a guitar, and my experience at Stanford would not have been the same without

the late night jam sessions with him, Akshay and Zach. I’d like to give a special

thanks to Joshua Romero for leading the effort in developing ZEFR, the software

used for the second part of this dissertation, and for co-authoring our first paper

on implicit time stepping. I’ve benefited immensely from his expertise on scientific

computing. I’m also grateful to Jacob Crabill for his help with research over the past

few years and Freddie Witherden for providing numerical expertise and exceptional

feedback as a mentor. Freddie has been an invaluable asset and I’ve learned a great

deal about high-order methods and high performance computing from him. I would

also like to thank David Manosalvas, Manuel Lopez, Abhishek Sheshadri, Jonathan

Chiew and Jonathan Bull for their insight and support throughout my studies.

I’d like to acknowledge my mentor at Sandia National Laboratories, Irina Tezaur,

for being extremely supportive during my last year in graduate school. During my

Sandia internship, she gave me a comprehensive introduction to Sandia National Laboratories

and I have gained valuable experience which has helped me complete my degree.

My friends and family have also been overwhelmingly supportive during my time in

graduate school. I’d like to thank Kristopher Bravo, Matthew Rutledge, Joshua Day,

John Bitcon, Ashley Rutledge, Nicole Siminski-Day, Alan Gamage, Bobby Kianmajd,

Christina Lee and Jade Ontivmaemura for always being great friends and providing

unconditional support. I’m grateful for having such a loving family who has always

been there for me. I want to thank my father, Andy Watkins, for teaching me the

value of hard work, my mother, Petra Watkins, for teaching me the importance of

generosity, my siblings, Andrew Watkins and Xiomara Watkins-Breschi, for teaching

me to follow my dreams and my brother-in-law, Scott Breschi, for providing a positive

outlook on life. I also want to acknowledge my niece and nephew, Camila and Drew

Watkins, in the hope that this might inspire them to pursue a higher education.

Lastly, this dissertation is dedicated to my loving girlfriend, Diana Portillo, who

has stood by my side throughout my entire graduate studies. I’m sincerely grateful

for having her in my life. Her love and support have made this work possible.

Preface

Fluid motion is a complex physical phenomenon that can be found in a variety of

different applications ranging from the analysis of blood flow in the cardiovascular

system to the supersonic flight of aircraft. The primary tool used to analyze this phe-

nomenon is computational fluid dynamics (CFD). The modern scientist or engineer

relies heavily on CFD to obtain an accurate representation of fluid motion. However,

current generation CFD software deployed in industry is not capable of accurately

predicting transient, highly separated flows over complex geometries. Examples of

such flows include high-resolution wingtip vortices from an aircraft during take-off or

the complex vortex structure generated by the blades of a rotorcraft.

These high-fidelity simulations often require a massive amount of memory and

computation to provide sufficient resolution. This often requires the use of large high-

performance computing clusters which have undergone a dramatic change over the

past several years. Memory constrained accelerators characterized by an abundance

of computational power relative to memory bandwidth are now commonplace and

many CFD codes are now having trouble scaling on modern heterogeneous clusters.

This problem will only become worse as pre-exascale systems come online over the

next few years. The design of scalable CFD software will need to be completely

reconsidered in order to balance cheap arithmetic and expensive memory fetches with

limitations on the amount of high-bandwidth memory.

There are two primary reasons why current generation CFD software is unable to

adequately resolve complex flow structures on large scale computing clusters. First,

numerical methods, typically second order accurate finite volume schemes, have a ten-

dency to be overly dissipative. Hence, they require an excessive amount of resolution

in order to successfully track complex flow features over time. Second, the methods

themselves are not well suited to the requirements of modern hardware platforms

which typically include coprocessors and Graphics Processing Units (GPUs). These

limitations have prevented scientists and engineers from conducting the types of high

fidelity simulations that accurately resolve the aforementioned phenomena.

The primary motivation of this dissertation has been to address these issues

through the development of high-order methods. High-order methods refer to a col-

lection of numerical schemes whose spatial accuracy is at least third order. These

methods can reduce numerical dissipation and improve the accuracy of a simulation

at a reduced computational cost compared to traditional second-order methods [90].

They also tend to have more computational work per degree of freedom which means

they benefit more from the high computational throughput on modern hardware.

Discontinuous Galerkin (DG) methods [76, 55, 20] have been the focal point of

recent efforts in developing high-order CFD codes which solve the Navier-Stokes equa-

tions because of their ability to simulate flows around complex geometries. Similar

to continuous finite element methods, DG methods utilize basis and test functions to

obtain high-order accuracy. However, since the solution is allowed to be discontinuous

at element interfaces, they are able to maintain conservation through a methodology

similar to finite volume methods. Methods of particular interest include collocation

based nodal DG [35] and spectral difference (SD) [52, 57] methods which have been

widely adopted because of their simplicity and efficiency.

In 2007, Huynh [41] proposed a Flux Reconstruction (FR) approach for tensor-

product elements that provides a generalized differential framework for recovering

both the nodal DG and SD schemes for linear advection. In 2009, he further extended

this framework to diffusion problems [42]. Since then, a proof of linear stability for

a specific class of FR schemes called the energy stable FR (ESFR) schemes [88] has

been developed along with theory on non-linear stability and the role of aliasing errors

[47, 10]. These schemes have also been successfully extended to triangular [16, 95]

and tetrahedral [96] elements which further improved their capabilities on complex

geometry. Other frameworks such as the Correction Procedure via Reconstruction

(CPR) [30] have been proposed that unify the FR and the Lifting Collocation Penalty

(LCP) [91] formulations. Recently, the direct Flux Reconstruction (DFR) method

has been developed as a simplified formulation of the FR method that reduces the

theoretical and implementation complexity of the FR method [77].

Along with high-order accuracy, which serves to reduce numerical dissipation, the

FR approach allows for the formulation of an explicit semi-discrete equation with

a majority of operations being element-local. This is distinct from a typical finite

element method which requires the assembly of a global matrix. Within the context

of an explicit time stepping method, the FR approach is highly parallelizable on

modern architectures because it is able to make efficient use of fast-access memory

when performing element local operations.

Thus far, the majority of research in scalable methods on modern architectures

using FR has focused on explicit time stepping on GPUs. For example, in 2009,

Klockner et al. [51] demonstrated that a consumer graphics processor could be used to

accelerate a three-dimensional, unstructured nodal DG solver for Maxwell’s equations.

Shortly after, Castonguay et al. [17] developed a three-dimensional, unstructured

FR solver for the Navier-Stokes equations on GPU architectures. This solver was

then extended for Large Eddy Simulations (LES) and the Reynolds Averaged Navier-

Stokes (RANS) equations and released under an open source license using the name

HiFiLES [58]. The most successful implementation of the FR schemes on GPUs has

been PyFR [97] which has seen widespread success due to its Python based framework

which allows the code to be executed on a range of hardware platforms.

Accelerated explicit methods have been researched extensively for flow simula-

tions with low to moderate Reynolds number but more realistic applications in the

aerospace field are high Reynolds number flows with thin boundary layers. These

types of flows often require high aspect ratio meshes where the maximum stable time

step or CFL condition is determined by the smallest cell. This can lead to signif-

icant limitations in explicit time integration because the time step requirement for

numerical stability may be much smaller than the time step needed for accuracy. Often,

a reliable estimate for the maximum stable time step is not available and

researchers must rely on empirically determined estimates or a step size controller

[34]. In this situation, the solution can become unstable and large scale simulations

can become unfeasible. Unsteady solutions are often generated using explicit schemes

near their numerical stability limit so understanding the numerical properties of the

scheme near this limit becomes increasingly important.
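
As a rough illustration of how the smallest cell dictates the explicit time step, the following Python sketch evaluates a textbook-style step limit on a geometrically stretched one-dimensional mesh. The function names, the constants cfl_adv and cfl_diff, and the harmonic combination of the advective and diffusive limits are assumptions made for illustration only; they are not the time step estimate developed in Part I.

```python
def stretched_mesh(first_cell, growth, count):
    """Cell sizes of a geometrically stretched 1D mesh (hypothetical helper)."""
    return [first_cell * growth**i for i in range(count)]


def stable_time_step(cell_sizes, wave_speed, diffusivity,
                     cfl_adv=1.0, cfl_diff=0.5):
    """Illustrative explicit step limit for advection-diffusion.

    Assumes wave_speed != 0 and diffusivity > 0. The constants cfl_adv
    and cfl_diff are placeholders, not values from the dissertation.
    """
    limits = []
    for h in cell_sizes:
        dt_adv = cfl_adv * h / abs(wave_speed)      # advective limit ~ h
        dt_diff = cfl_diff * h * h / diffusivity    # diffusive limit ~ h^2
        # Assumed harmonic combination: never larger than either limit.
        limits.append(1.0 / (1.0 / dt_adv + 1.0 / dt_diff))
    # The global step must satisfy the most restrictive cell.
    return min(limits)


cells = stretched_mesh(first_cell=1e-4, growth=1.2, count=40)
dt = stable_time_step(cells, wave_speed=1.0, diffusivity=1e-3)
print(f"smallest cell: {min(cells):.1e}, largest cell: {max(cells):.1e}")
print(f"global explicit time step limit: {dt:.3e}")
```

In this sketch the limit is set entirely by the first (smallest) cell, even though most of the mesh could tolerate a far larger step. By construction, the combined limit is also smaller than the advective or diffusive limit taken alone.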

Implicit methods allow for larger time steps and can enable a time step to be

chosen based on the desired accuracy rather than numerical stability. However, an

implicit method often requires the construction of a global matrix which can be

prohibitively expensive for higher polynomial orders. The high memory requirements

and low arithmetic intensity of constructing a global matrix are often what make

implicit methods unfeasible for modern hardware. This also makes it difficult to

compete with explicit methods on large GPU clusters.
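
The contrast between a stability-limited and an accuracy-limited step size can be seen on the scalar model problem du/dt = -λu with λ > 0. The sketch below is a minimal illustration in which forward and backward Euler stand in for explicit and implicit schemes in general; the function names and parameter values are hypothetical, and the example does not reflect the cost of the linear solves that an implicit method requires for a system of equations.

```python
def forward_euler(lam, dt, steps, u0=1.0):
    """Explicit update u_{n+1} = u_n + dt * (-lam * u_n)."""
    u = u0
    for _ in range(steps):
        u = u + dt * (-lam * u)
    return u


def backward_euler(lam, dt, steps, u0=1.0):
    """Implicit update u_{n+1} = u_n + dt * (-lam * u_{n+1}), solved in closed form."""
    u = u0
    for _ in range(steps):
        u = u / (1.0 + lam * dt)
    return u


lam = 1000.0            # stiff decay rate (illustrative value)
dt_limit = 2.0 / lam    # forward Euler is stable only for dt < 2 / lam
dt_large = 5.0 * dt_limit

print("forward Euler, dt above limit: ", forward_euler(lam, dt_large, 10))
print("backward Euler, dt above limit:", backward_euler(lam, dt_large, 10))
print("forward Euler, dt below limit: ", forward_euler(lam, 0.5 * dt_limit, 10))
```

With the step five times larger than the explicit limit, the forward Euler iterate grows in magnitude while the backward Euler iterate decays, as the exact solution does. For a discretized flow problem, the implicit update becomes a linear or nonlinear system at every step, which is the source of the memory and arithmetic-intensity concerns described above.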

These major limitations on explicit and implicit time stepping methods must be

addressed before reliable simulations of high Reynolds number flow over aircraft or

rotor blades can be accomplished with high-order methods on GPU architectures.

This dissertation focuses on addressing these issues in two parts.

In Part I, an eigensolution analysis of the FR formulation is performed on the linear

advection-diffusion equation to investigate the stability, dissipation and dispersion

properties associated with the nodal DG scheme for explicit time integration. Some

of the major contributions in Part I include:

• A verification that the CFL condition for advection-diffusion is stricter than

that for pure-advection and pure-diffusion individually,

• A maximum stable time step estimate for the linear advection-diffusion and

Navier-Stokes equations on unstructured, tensor product elements,

• An analysis showing that schemes with centered interface values produce less

error for well resolved solutions while schemes with one-sided interface values

produce less error for solutions that are under-resolved,

• A verification that the CFL condition for schemes with centered interface values

is much higher than that for schemes with one-sided interface values.

In Part II, an implicit time stepping method for GPU architectures is designed,

implemented and tested in order to study the feasibility of an implicit method on

modern hardware. The major contributions in Part II include:

• An analysis showing that a high-order, implicit time stepping method can be

implemented efficiently on multi-GPU architectures,

• A reduction in the time complexity of computing analytical element local Jaco-

bians for advection-diffusion, Euler and the Navier-Stokes equations.
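
The kind of saving that a Kronecker product structure can offer is sketched below using the general identity (A ⊗ B) x = vec(A X Bᵀ), where X is x reshaped row by row. This is only an illustration of the identity in pure Python; the helper names are hypothetical, and the sketch is not the element local Jacobian formulation derived in Part II.

```python
def matmul(a, b):
    """Dense matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]


def transpose(m):
    return [list(row) for row in zip(*m)]


def kron(a, b):
    """Explicit Kronecker product; an (n*n) x (n*n) matrix for n x n inputs."""
    return [[a_ij * b_kl for a_ij in a_row for b_kl in b_row]
            for a_row in a for b_row in b]


def kron_matvec(a, b, x):
    """Apply (a kron b) to x without forming the Kronecker product."""
    q, s = len(a[0]), len(b[0])
    x_mat = [x[j * s:(j + 1) * s] for j in range(q)]   # row-major reshape
    y_mat = matmul(matmul(a, x_mat), transpose(b))     # A X B^T
    return [value for row in y_mat for value in row]   # flatten row by row


a = [[1, 2], [3, 4]]
b = [[0, 1], [1, 1]]
x = [1, 2, 3, 4]

# Reference result using the explicitly assembled Kronecker product.
dense = [sum(row[j] * x[j] for j in range(len(x))) for row in kron(a, b)]
print(dense == kron_matvec(a, b, x))
```

For two n × n factors, applying the assembled product costs on the order of n⁴ operations, whereas the two small matrix products cost on the order of n³. Exploiting tensor-product structure in this way is what allows operation counts to drop by powers of the polynomial degree.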

Contents

Abstract iv

Acknowledgments vi

Preface ix

I Numerical Analysis for Advection-Diffusion 1

1 Introduction 2

2 Eigensolution Analysis 6

2.1 Eigensolution Analysis for 1D Advection-Diffusion . . . . . . . . . . . 6

2.1.1 Problem specification . . . . . . . . . . . . . . . . . . . . . . . 7

2.1.2 The Flux Reconstruction method . . . . . . . . . . . . . . . . 8

2.1.3 Analytical solution of the semi-discrete and exact equations . 11

2.1.4 Stability, dissipation, dispersion and modal weights . . . . . . 12

2.1.5 Dissipation and dispersion error of nodal DG for advection-

diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13

2.2 Spectral Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.2.1 Relative error and resolving efficiency . . . . . . . . . . . . . . 17

2.2.2 Spectral comparison of nodal DG with different interface fluxes 19

3 CFL Restrictions and Time Step Estimates 21

3.1 Time Integration and CFL Restrictions . . . . . . . . . . . . . . . . . 21

3.1.1 Multi-stage explicit time integration scheme . . . . . . . . . . 22

3.1.2 Stability Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 22

3.1.3 CFL restrictions on advection-diffusion . . . . . . . . . . . . . 23

3.1.4 CFL restrictions of nodal DG with different interface fluxes . . 23

3.2 Time Step Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 25

3.2.1 Conservative prediction method . . . . . . . . . . . . . . . . . 25

3.2.2 Extension to tensor product elements . . . . . . . . . . . . . . 27

3.2.3 Extension to Navier-Stokes equations . . . . . . . . . . . . . . 29

4 Numerical Experiments 31

4.1 1D Advection-Diffusion of an approximate Gaussian . . . . . . . . . . 32

4.2 2D Advection-Diffusion of an approximate Gaussian . . . . . . . . . . 35

4.3 2D Couette Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

II Implicit Time Stepping on GPU Architectures 42

5 Introduction 43

6 The Direct Flux Reconstruction Method 45

6.1 One-Dimensional Formulation for Advection-Diffusion . . . . . . . . . 46

6.1.1 Problem Specification . . . . . . . . . . . . . . . . . . . . . . . 46

6.1.2 Direct Flux Reconstruction . . . . . . . . . . . . . . . . . . . 47

6.2 Two-Dimensional Formulation for Advection-Diffusion . . . . . . . . 52



6.3 Three-Dimensional Formulation for Advection-Diffusion . . . . . . . . 61



6.4 Extension to Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . 66

6.4.1 The Euler Equations . . . . . . . . . . . . . . . . . . . . . . . 66

6.4.2 The Navier-Stokes Equations . . . . . . . . . . . . . . . . . . 67


7 Time Stepping Schemes 70

7.1 Time Accurate Schemes . . . . . . . . . . . . . . . . . . . . . . . . . 71

7.1.1 Explicit Runge-Kutta . . . . . . . . . . . . . . . . . . . . . . . 71

7.1.2 Diagonally Implicit Runge-Kutta . . . . . . . . . . . . . . . . 72

7.1.3 Step size control . . . . . . . . . . . . . . . . . . . . . . . . . 73

7.2 Pseudo Time Stepping . . . . . . . . . . . . . . . . . . . . . . . . . . 74

7.2.1 The Global Linear System . . . . . . . . . . . . . . . . . . . . 75

7.2.2 Block Iterative Methods . . . . . . . . . . . . . . . . . . . . . 76

7.3 Dual Time Stepping . . . . . . . . . . . . . . . . . . . . . . . . . . . 81

7.3.1 Application to Diagonally Implicit Runge-Kutta Schemes . . . 81

8 The Element Local Jacobian Matrix 83

8.1 One-Dimensional Formulation for Advection-Diffusion . . . . . . . . . 84

8.1.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85

8.1.2 Kronecker Product Formulation . . . . . . . . . . . . . . . . . 92

8.2 Two- and Three-Dimensional Formulation for Advection-Diffusion . . 93

8.2.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

8.2.2 Two-Dimensional Kronecker Product Formulation . . . . . . . 105

8.2.3 Three-Dimensional Kronecker Product Formulation . . . . . . 112

8.2.4 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 118

8.3 Extension to Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . 121

9 Numerical Experiments 125

9.1 Inviscid flow over a bump . . . . . . . . . . . . . . . . . . . . . . . . 126

9.2 Inviscid flow over the NACA 0012 airfoil . . . . . . . . . . . . . . . . 128

9.3 Convection of an isentropic vortex . . . . . . . . . . . . . . . . . . . . 128

9.4 Laminar flow over a Joukowski airfoil . . . . . . . . . . . . . . . . . . 132

9.5 Viscous flow over a half cylinder . . . . . . . . . . . . . . . . . . . . . 135

10 Implementation 139

10.1 Overview of Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139

10.1.1 Unsteady Solver . . . . . . . . . . . . . . . . . . . . . . . . . . 140

10.1.2 Steady-state Solver . . . . . . . . . . . . . . . . . . . . . . . . 140

10.1.3 Direct Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . 141

10.2 Multicolored Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . 143

10.2.1 Mesh Coloring . . . . . . . . . . . . . . . . . . . . . . . . . . . 143

10.2.2 Residual Computation per Color . . . . . . . . . . . . . . . . 145

10.3 Element Local Jacobians . . . . . . . . . . . . . . . . . . . . . . . . . 145

10.3.1 Constructing the Flux and Solution Jacobians . . . . . . . . . 146

10.3.2 Constructing the Element Local Jacobians . . . . . . . . . . . 150

11 Performance Analysis 155

11.1 Multicolored Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . 156

11.1.1 Inviscid flow over a bump . . . . . . . . . . . . . . . . . . . . 156

11.1.2 Inviscid flow over the NACA 0012 airfoil . . . . . . . . . . . . 157

11.2 GPU performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

11.2.1 Unsteady Solver . . . . . . . . . . . . . . . . . . . . . . . . . . 159

11.2.2 Steady-state Solver . . . . . . . . . . . . . . . . . . . . . . . . 161

11.3 Multi-GPU Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . 163

11.3.1 Strong Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 163

11.3.2 Weak Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 163

12 Conclusions 166

A Boundary Conditions 169

A.1 Solid Slip-Wall and Symmetry . . . . . . . . . . . . . . . . . . . . . . 170

A.2 No-Slip Adiabatic Wall . . . . . . . . . . . . . . . . . . . . . . . . . . 171

A.3 Characteristic Riemann Invariant Far Field . . . . . . . . . . . . . . . 172

Bibliography 175

List of Tables

2.1 Upwinding coefficients and common names for DG schemes with dif-

ferent interface flux formulations. . . . . . . . . . . . . . . . . . . . . 14

2.2 Resolving efficiencies, ε = 0.1, for the DG scheme using Gauss-Legendre

solution points solving the advection-diffusion equation, a = 10. . . . 20

3.1 Maximum nondimensional time steps for the RK44, DG scheme using

Gauss-Legendre points. . . . . . . . . . . . . . . . . . . . . . . . . . . 26

4.1 Element size ‘h’ and nondimensional parameter ‘a’ for the meshes em-

ployed for Couette flow. . . . . . . . . . . . . . . . . . . . . . . . . . 39

4.2 Couette flow results for the RK44, DG scheme using Gauss-Legendre

solution points on Cartesian quadrilateral meshes. . . . . . . . . . . . 40

4.3 Couette flow results for the RK44, DG scheme using Gauss-Legendre

solution points on unstructured quadrilateral meshes. . . . . . . . . . 41

7.1 Scaling factors used for the explicit and implicit step controller. . . . 74

8.1 Diagonal Jacobian matrices used in the construction of the one-dimensional

element local Jacobian matrix for advection-diffusion . . . . . . . . . 85

8.2 Two- and three-dimensional sets . . . . . . . . . . . . . . . . . . . . . 94

8.3 Diagonal Jacobian matrices used in the construction of the two- and

three-dimensional element local Jacobian matrix for advection-diffusion.

The full names of the Jacobians matrices can be found in Table 8.1. . 96

9.1 Numerical time steps, ∆t, used for the convection of an isentropic vortex. . . 131

9.2 Density error and rate of convergence for different time integration

schemes, convection of an isentropic vortex, implicit dual time stepping

with two color MCGS. . . . . . . . . . . . . . . . . . . . . . . . . . . 131

9.3 (x× y) Joukowski airfoil meshes used for each polynomial order where

x is the number of elements along the airfoil and wake and y is the

number of elements in the normal direction. The number of elements

in x is split evenly between the airfoil and wake. The total number of

elements, Neles, is given by the product of x and y. . . . . . . . . . . . 133

9.4 Drag coefficient error and rate of convergence for each polynomial or-

der, laminar flow over a Joukowski airfoil, implicit pseudo time stepping

with two color MCGS. . . . . . . . . . . . . . . . . . . . . . . . . . . 134

9.5 Lift and drag coefficients and the Strouhal number for viscous flow over

a half cylinder, Re = 1000, P = 3. . . . . . . . . . . . . . . . . . . . . 136

10.1 Common sizes for an element local Jacobian for the advection-diffusion

(AD) and Navier-Stokes (NS) equations. These sizes remain the same

for the advection and Euler equations. . . . . . . . . . . . . . . . . . 150

11.1 Convergence results for different meshes, inviscid flow over a bump,

implicit pseudo time stepping with two color MCGS, single GPU, P = 2 . . . 156

11.2 Convergence results for different polynomial orders, inviscid flow over

a bump, implicit pseudo time stepping with two color MCGS, single

GPU, (48× 16) quadrilateral mesh . . . . . . . . . . . . . . . . . . . 157

11.3 Convergence results for different meshes, inviscid flow over the NACA

0012 airfoil, implicit pseudo time stepping with two color MCGS, single

GPU, P = 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

11.4 Convergence results for different mixed meshes, inviscid flow over the

NACA 0012 airfoil, implicit pseudo time stepping with four color MCGS,

single GPU, P = 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158

11.5 Convergence results for different polynomial orders, inviscid flow over

the NACA 0012 airfoil, implicit pseudo time stepping with two color

MCGS, single GPU, (32× 32) quadrilateral mesh . . . . . . . . . . . 158

11.6 A comparison between explicit RK45 and implicit ESDIRK4 for viscous

flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80

GPUs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159

11.7 Speedup of a single GPU over a single CPU core for inviscid flow over

the NACA 0012 airfoil, implicit pseudo time stepping with two color

MCGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162

11.8 Weak scalability study for inviscid flow over the NACA 0012 airfoil,

explicit RK4, P = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164

11.9 Weak scalability study for inviscid flow over the NACA 0012 airfoil,

implicit pseudo time stepping with two color MCGS, P = 5 . . . . . . 164

List of Figures

2.1 Numerical dissipation in each eigenmode p = 1, 2, 3 for the DG scheme

of order P = 2 using Gauss-Legendre solution points solving the advection-

diffusion equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . 14

2.2 Numerical dispersion in each eigenmode p = 1, 2, 3 for the DG scheme

of order P = 2 using Gauss-Legendre solution points solving the advection-

diffusion equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . 15

2.3 Energy distributed among each eigenmode p = 1, 2, 3 for the DG

scheme of order P = 2 using Gauss-Legendre solution points solving

the advection-diffusion equation, a = 10. . . . . . . . . . . . . . . . . 15

2.4 Energy distributed among each eigenmode p = 1, 2, 3 for the centered-

centered, DG scheme of order P = 2 solving the advection-diffusion

equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16

2.5 Upper bound on initial slope of the relative error vs. nondimensional

wavenumber for the DG scheme of order P = 2 using Gauss-Legendre

solution points solving the advection-diffusion equation, a = 10. . . . 19

3.1 Maximum physical time step, ∆tmax, vs. nondimensional wavenumber

for the RK44, DG, centered-centered scheme using Gauss-Legendre

solution points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24

3.2 Maximum nondimensional time step, ∆tmax, vs. nondimensional wavenum-

ber, k/(P + 1), for the RK44, DG scheme using Gauss-Legendre solu-

tion points solving the advection-diffusion equation, a = 10. . . . . . 24

3.3 Maximum nondimensional time step estimates vs. nondimensional

wavespeed for the RK44, DG scheme of order P = 2 using Gauss-

Legendre solution points. . . . . . . . . . . . . . . . . . . . . . . . . . 26

3.4 A visual representation of a quadrilateral element for a polynomial or-

der of P = 1. The flux points are marked by red squares and west, east,

south and north faces are represented by W, E, S, N, respectively. The

distances between flux points, $h_{1,6}$, $h_{3,8}$, $h_{2,5}$, $h_{4,7}$, are used to estimate

the maximum time step in the element. . . . . . . . . . . . . . . . . . 29

4.1 High resolution approximate Gaussian, P = 2, $\sigma = 8/\sqrt{2\pi}$. . . . . . . 33

4.2 Relative error vs. time periods for the RK44, DG scheme of order

P = 2 using Gauss-Legendre solution points solving the advection-

diffusion equation, a = 10, on a high resolution approximate Gaussian,

σ = 8/√

2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

4.3 Low resolution approximate Gaussian, P = 2, $\sigma = 1/\sqrt{2\pi}$. . . . . . . 34

4.4 Relative error vs. time periods for the RK44, DG scheme of order


diffusion equation, a = 10, on a low resolution approximate Gaussian,

σ = 1/√

2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35

4.5 Nonuniform rectilinear grid with 40 × 50 elements and a minimum

element size of h1 = 0.2 and h2 = 0.1. . . . . . . . . . . . . . . . . . . 36

4.6 Initial condition and relative error for the RK44, DG scheme of order


diffusion equation, a = 10, on a high resolution approximate Gaussian,

σ = 8/√

2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.7 Initial condition and relative error for the RK44, DG scheme of order


diffusion equation, a = 10, on a low resolution approximate Gaussian,

σ = 1/√

2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37

4.8 Couette flow Mach contours, DG upwind-one-sided scheme using Gauss-

Legendre solution points, P = 3. . . . . . . . . . . . . . . . . . . . . . 38

xxii

4.9 3rd order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points.

4.10 4th order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points.

6.1 A visual representation of a quadrilateral element in parent space for a polynomial order of P = 1. The solution points are marked by blue circles, the flux points are marked by red squares and left, right, bottom and top faces are represented by L, R, B, T, respectively. Left and right states in an interface flux are represented by − and +.

7.1 A Cartesian mesh with a red-black element color mapping. The DFR element stencil (white) relies on information from neighboring elements from the opposite color.

8.1 The two distinct sparsity patterns used in the two-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2.

8.2 The summation of the two terms in Figure 8.1 produces the sparsity pattern for the first term, ∇δ · (∂f_ele/∂u_ele), in the two-dimensional element local Jacobian for DFR, P = 2.

8.3 The cross-terms Kξ(x)Kη(y) and Kη(x)Kξ(y) in the second term, ∇δ · ((∂f_ele/∂q_ele)(∂q_ele/∂u_ele)), produce a dense matrix so the final sparsity pattern of the two-dimensional element local Jacobian for DFR, P = 2, is fully dense.

8.4 The three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2.

8.5 The summation of the three terms in Figure 8.4 produces the sparsity pattern for the first term, ∇δ · (∂f_ele/∂u_ele), in the three-dimensional element local Jacobian for DFR, P = 2.

8.6 The six cross-terms in the second term, ∇δ · ((∂f_ele/∂q_ele)(∂q_ele/∂u_ele)), produce three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2.

8.7 The summation of all terms produces the final sparsity pattern for the three-dimensional element local Jacobian for DFR, P = 2.

9.1 Entropy error vs. 1/√nDoF for inviscid flow over a bump, implicit pseudo time stepping with two color MCGS.

9.2 (48 × 16) quadrilateral mesh and pressure contours, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, P = 2.

9.3 Lift coefficient vs. 1/√nDoF for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two and four color MCGS.

9.4 Mesh and pressure contours for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping, P = 4.

9.5 Initial density contours for convection of an isentropic vortex.

9.6 Density error vs. numerical time step for convection of an isentropic vortex, implicit dual time stepping with two color MCGS.

9.7 Drag coefficient vs. h = 1/√nDoF for laminar flow over a Joukowski airfoil, implicit MCGS.

9.8 Laminar flow over Joukowski airfoil, implicit pseudo time stepping with two color MCGS, P = 4, (48 × 24) mesh.

9.9 Drag coefficient error vs. h = 1/√nDoF for laminar flow over a Joukowski airfoil, implicit pseudo time stepping with two color MCGS.

9.10 Half cylinder mesh with 20,292 unstructured hexahedral elements. Hexahedral elements are created by extruding the quadrilaterals shown above.

9.11 Instantaneous isosurfaces of density colored by Mach number for viscous flow over a half cylinder, Re = 1000, P = 3.

9.12 Time history of lift and drag coefficient for viscous flow over a half cylinder, Re = 1000, P = 3.

10.1 Examples of the mesh coloring algorithm.

11.1 Entropy error, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU.

11.2 A wall-clock time comparison between the inverse Jacobian computation and the block iterative method for one ESDIRK4 time step of viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.

11.3 A wall-clock time comparison between each section of “Inverse Jacobian” for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.

11.4 A wall-clock time comparison between each section of the block iterative method for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.

11.5 Speedup relative to one GPU for inviscid flow over the NACA 0012 airfoil, P = 5.

Part I

Numerical Analysis for Advection-Diffusion

Chapter 1

Introduction

Numerical analysis of DG methods has primarily relied on functional analysis. For

example, in [53, 48, 27, 81, 82], error estimates for linear and nonlinear time-dependent

problems were derived using approximation theory. For Flux Reconstruction (FR), a

proof of linear stability was developed [88] for ESFR schemes along with theory on

non-linear stability [47].

The primary tool for analyzing dissipation and dispersion properties has been Fourier (von Neumann) analysis. For DG methods, this results in eigensolution analysis because of the eigendecomposition generated from the schemes. In the case of linear advection, Hu et al. [39] showed that the Pth order DG scheme results in one physical mode and P spurious or parasitic modes which damp out quickly for an upwind flux but remain indefinitely for a centered flux. They also showed in [37] that the order of the dissipation and dispersion error for the physical mode is 2P + 2 and 2P + 3, respectively. In [38], these results were extended to quadrilateral and triangular elements. In the asymptotic limit of small wavenumber kh → 0, Ainsworth [4, 5] proved that for a fixed mesh of spacing h, dissipation and dispersion errors decay at an exponential rate when 2P + 1 ≈ κkh where k is the wavenumber and κ > 1 is a constant. This was then extended to the second-order wave equation in [3]. More recently, Moura et al. [62] showed that the spurious modes, in fact, replicate the behavior of the physical mode. They also provided estimates of the largest wavenumber that can be accurately resolved within a set tolerance for Burgers turbulence.


Eigensolution analysis has also been used to determine the CFL condition. For example, the CFL condition for the linear advection equation is

\[
  \frac{a \Delta t}{h} \le C, \tag{1.1}
\]

where a is the physical wavespeed, ∆t is the physical time step, h is the element size and C is a constant determined by the eigensolution analysis of the fully discrete equation. By rearranging Eq. (1.1), a time step estimate is obtained,

\[
  \Delta t \le \frac{C h}{a}. \tag{1.2}
\]

In [23], this condition is used for convection-dominated flows for the Euler and Navier-Stokes equations. Since the constant C depends on the polynomial order P, it was proved in [21] that a time step estimate would be linearly stable for a P + 1 order, P + 1 stage Runge-Kutta method if

\[
  C = \frac{1}{2P + 1}, \quad P \le 2. \tag{1.3}
\]

It was demonstrated in [23] that this condition holds for P ≥ 2 with less than 5% error. In [84], Toulorge et al. obtained CFL estimates for linear advection through eigensolution analysis for triangles. The CFL condition in Eq. (1.1) is often used for two- and three-dimensional cases where a distinct reference length, h, is used for each element type, but a strict bound has yet to be proposed for two- and three-dimensional elements.
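As a small numerical illustration of Eqs. (1.2) and (1.3), the sketch below evaluates the time step estimate for a given wavespeed, element size and polynomial order. The function names are hypothetical, and the use of the magnitude of the wavespeed is an assumption made here so that a negative a still yields a positive time step.

```python
def cfl_constant(P):
    """CFL constant C = 1 / (2P + 1) from Eq. (1.3)."""
    if P < 0:
        raise ValueError("polynomial order P must be non-negative")
    return 1.0 / (2 * P + 1)


def time_step_estimate(a, h, P):
    """Upper bound on the time step, dt <= C h / |a|, following Eq. (1.2).

    Taking |a| instead of a is an assumption of this sketch.
    """
    if a == 0:
        raise ValueError("wavespeed must be nonzero for an advective limit")
    return cfl_constant(P) * h / abs(a)


# Example: wavespeed 2, element size 0.1, polynomial order P = 2
dt_max = time_step_estimate(a=2.0, h=0.1, P=2)
print(dt_max)
```

The estimate shrinks linearly with the element size and roughly as 1/P with the polynomial order, which is the restriction that motivates the time step analysis in Chapter 3.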

For FR, Vincent et al. [87] applied eigensolution analysis to study the ESFR

schemes for linear advection and were able to identify the “c+” scheme that offers

the highest CFL limit for a given polynomial. Asthana et al. [9] derived a new set of

linearly stable schemes which have minimal dissipation and dispersion error.

In [12], Asthana, Jameson and I utilized eigensolution analysis to prove that any FR scheme is consistent for linear advection and any stable FR scheme is convergent. We also established that in the limit of small wavenumber kh → 0, the rate of convergence is a function of time with short-time rates, t = 0+, being determined by interpolation error and long-time rates, t → ∞, being determined by numerical differentiation error. In [11], we were able to derive analytical estimates for the rates of convergence of the first and second derivative operators. This led to the construction of a new class of superconvergent schemes for centered fluxes called SFR which includes the nodal DG scheme. We also demonstrated that the rate of convergence for a steady-state, forced, advection-diffusion problem is the same as the short-time rate of convergence described in [12] and the rates of convergence for the first and second derivative.

In Part I of this dissertation, eigensolution analysis of the FR formulation for the linear advection-diffusion equation is performed. Stability, dissipation and dispersion over the entire range of resolvable wavenumbers are investigated to develop an understanding of how FR behaves when the flow features are under-resolved. The analysis focuses on different interface flux formulations of the nodal DG scheme on Gauss-Legendre points although the same technique can be applied to any scheme within the FR formulation. In pursuit of an accurate, stable time step, a method for estimating the maximum stable time step for the linear advection-diffusion and Navier-Stokes equations on unstructured, tensor product elements is provided.

The first part of this dissertation is organized as follows. In chapter 2, we introduce

the semi-discrete advection-diffusion equation for FR and discuss the dissipation and

dispersion properties associated with nodal DG. In this chapter, we also derive a

measure of relative error in order to compare schemes on the basis of wave-propagation

error. In chapter 3, we utilize the fully discrete advection-diffusion equation and its

stability criteria to compare the CFL restrictions of different equations and schemes.

In this chapter, we also propose a conservative prediction method for the maximum

physical time step and extend it to multidimensional tensor product elements for the

Navier-Stokes equations. Lastly, in chapter 4 we provide the results for 1D and 2D

numerical experiments in order to verify the results found in the previous chapters.

Part I of this dissertation is based on the following publication:

• Jerry Watkins, Kartikey Asthana, and Antony Jameson. A numerical analysis

of the nodal discontinuous Galerkin scheme via Flux Reconstruction for the

advection-diffusion equation. Computers & Fluids, 2016. [93]


and is a continuation of the work provided in:

• Kartikey Asthana, Jerry Watkins, and Antony Jameson. On the rate of con-

vergence of flux reconstruction for steady-state problems. SIAM Journal on

Numerical Analysis, 2016. [11]

• Kartikey Asthana, Jerry Watkins, and Antony Jameson. On consistency and

rate of convergence of Flux Reconstruction for time-dependent problems. Jour-

nal of Computational Physics, 2017. [12]

Chapter 2

Eigensolution Analysis

This chapter is split into two sections. In the first section, an eigensolution analysis

is performed on the 1D FR formulation for the advection-diffusion equation and the

dissipation and dispersion properties for the nodal DG scheme are discussed. In

the second section, a measure of relative error is derived and different schemes are

compared based on wave-propagation error.

2.1 Eigensolution Analysis for 1D Advection-Diffusion

We begin by demonstrating that eigensolution analysis provides a purely element-

local description of the 1D semi-discrete linear advection-diffusion equation. The

derived linear dynamical system can then be solved analytically to obtain the analyt-

ical solution to the semi-discrete equation. The stability, dissipation and dispersion

properties of any FR scheme follow directly from the corresponding eigensolutions

[8]. This section concludes with a short discussion on the dissipation and dispersion

error of the nodal DG scheme on Gauss-Legendre points for the advection-diffusion

equation.


2.1.1 Problem specification

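Consider the 1D linear advection-diffusion equation,

\[
  \frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = b \frac{\partial^2 u}{\partial x^2}, \quad x \in \mathbb{R},\ t > 0, \tag{2.1}
\]

where a ∈ R is the constant wavespeed and b > 0 is the diffusion coefficient. A periodic initial condition is introduced as an isolated Fourier component, u(x, 0) = exp(ikx), where k > 0 is the wavenumber. Using h as the length scale and h²/b as the time scale, Eq. (2.1) and the initial condition can be nondimensionalized as

\[
  \frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \frac{\partial^2 u}{\partial x^2}, \quad x \in \mathbb{R},\ t > 0, \tag{2.2}
\]
\[
  u(x, 0) = \exp(ikx),
\]

where the variables in Eq. (2.2) are the nondimensional counterparts of those in Eq. (2.1), obtained through the substitutions x → x/h, t → tb/h², k → kh and a → ah/b.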
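The scaling above can be sketched as a small helper that maps dimensional inputs to the nondimensional wavespeed, wavenumber and time used in Eq. (2.2). The function and dictionary key names are hypothetical and chosen only for illustration.

```python
def nondimensionalize(a, b, h, k):
    """Return the nondimensional wavespeed a*h/b and wavenumber k*h,
    together with the time scale h^2/b used to scale t."""
    if b <= 0 or h <= 0:
        raise ValueError("diffusion coefficient b and element size h must be positive")
    return {"a": a * h / b, "k": k * h, "time_scale": h * h / b}


def nondimensional_time(t, b, h):
    """Nondimensional time t*b/h^2."""
    return t * b / (h * h)


# Example: a = 5, b = 0.01, h = 0.02, k = 50 in consistent dimensional units
params = nondimensionalize(a=5.0, b=0.01, h=0.02, k=50.0)
print(params)
```

The nondimensional wavespeed a*h/b acts as an element-based Peclet-type number: large values correspond to advection-dominated problems on the given mesh and small values to diffusion-dominated ones.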

Following a traditional nodal finite element method [40], the domain is partitioned into non-overlapping elements, Ω = ⋃_n Ω_n, where Ω_n = {x | x_n ≤ x < x_{n+1}} = [x_n, x_{n+1}) and the sequence of nodes, {x_n}, is increasing on the real line. Each element is further discretized into P + 1 distinct solution points x_n = [x_{n,1}, x_{n,2}, ..., x_{n,P+1}]^T where P is the degree of the piecewise interpolating polynomial representing the numerical solution in a given element. To reduce complexity, a uniform element size of h = x_{n+1} − x_n, ∀n, is used. A linear isoparametric mapping is introduced from the physical domain x ∈ Ω_n to the parent domain ξ ∈ [−1, 1) such that

\[
  \xi|_{\Omega_n}(x) = 2(x - x_n) - 1, \quad x \in [x_n, x_{n+1}). \tag{2.3}
\]

The distribution of solution points across all elements in the parent domain is kept the same, ξ = (ξ_p)_{p=1,2,...,P+1}, so that the numerical solution in the nth element can be represented as

\[
  u^{\delta}_n(\xi, t) = \sum_{p=1}^{P+1} u^{\delta}_{n,p}(t)\, \ell_p(\xi), \quad \xi \in [-1, 1), \tag{2.4}
\]

where ℓ_p is the pth Lagrange polynomial.
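The element mapping of Eq. (2.3) and the nodal expansion of Eq. (2.4) can be sketched as follows for P = 2 on the three Gauss-Legendre points 0 and ±√(3/5). The function names are hypothetical, and the mapping assumes the nondimensional setting with unit element size.

```python
import math

# Gauss-Legendre points for P = 2 (three-point rule) in the parent domain
GL_POINTS_P2 = [-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)]


def to_parent(x, x_n):
    """Map x in element [x_n, x_n + 1) to xi in [-1, 1), Eq. (2.3)."""
    return 2.0 * (x - x_n) - 1.0


def lagrange_basis(points, p, xi):
    """Value at xi of the Lagrange polynomial equal to 1 at points[p]
    and 0 at every other solution point."""
    value = 1.0
    for m, x_m in enumerate(points):
        if m != p:
            value *= (xi - x_m) / (points[p] - x_m)
    return value


def interpolate(points, nodal_values, xi):
    """Evaluate the nodal expansion of Eq. (2.4) at xi."""
    return sum(u * lagrange_basis(points, p, xi)
               for p, u in enumerate(nodal_values))


# A quadratic is reproduced exactly by the P = 2 expansion
f = lambda xi: 1.0 + 2.0 * xi + 3.0 * xi * xi
nodal = [f(xi) for xi in GL_POINTS_P2]
print(interpolate(GL_POINTS_P2, nodal, 0.3), f(0.3))
```

Because the basis is nodal, the expansion coefficients are simply the solution values at the solution points, which is what makes the element-local vector form in the next equations possible.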

Assuming exact time integration, Eq. (2.2) leads to a purely element-local description of the semi-discrete numerical equation,

\[
  \frac{d u^{\delta}_n}{dt} + 2a\, Q_1(k)\, u^{\delta}_n = 4\, Q_2(k)\, u^{\delta}_n, \quad \forall n,\ t > 0, \tag{2.5}
\]
\[
  u^{\delta}_n(0) = \exp(ikx_n)\, w_0(k),
\]

where u^δ_n(t) = (u^δ_{n,p}(t))_{p=1,2,...,P+1} denotes the vector of solution values, w_0(k) is the collocation projection of the initial condition onto the solution points,

\[
  w_0(k) = \left( \exp\!\left( ik\, \tfrac{1}{2}(1 + \xi_p) \right) \right)_{p=1,2,\ldots,P+1}, \tag{2.6}
\]

and Q_1(k) ∈ C^{(P+1)×(P+1)} and Q_2(k) ∈ C^{(P+1)×(P+1)} are the first order and second order numerical differentiation operators in parent space. The definition of these operators depends on the numerical scheme being used and the constants 2 and 4 come from the transformation of the numerical differentiation operator from parent to physical space.
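The collocation projection of Eq. (2.6) is straightforward to evaluate. The sketch below, with hypothetical function names, samples the Fourier mode at the solution points and checks that every entry has unit modulus, so that the vector ℓ2 norm of w_0(k) equals √(P + 1) for any real wavenumber.

```python
import cmath
import math


def collocation_projection(k, points):
    """w_0(k): the mode exp(i k x) sampled at the solution points of an
    element, with x = (1 + xi) / 2 measured from the left node, Eq. (2.6)."""
    return [cmath.exp(1j * k * 0.5 * (1.0 + xi)) for xi in points]


def l2_norm(vector):
    """Vector l2 norm of a list of complex numbers."""
    return math.sqrt(sum(abs(z) ** 2 for z in vector))


# P = 2 Gauss-Legendre solution points
points = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
w0 = collocation_projection(k=2.5, points=points)
print(l2_norm(w0), math.sqrt(len(points)))
```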

2.1.2 The Flux Reconstruction method

Under the Flux Reconstruction method, the first numerical differentiation operator can be obtained by first defining a globally continuous, piecewise numerical solution polynomial of degree P + 1 using correction function polynomials that convey boundary information to the interior of the element [41]. The numerical derivative is then obtained by differentiating this globally continuous solution,

\[
  a \frac{\delta u^{\delta}_n}{\delta x}(\xi, t) = 2 \left[ a \frac{\partial u^{\delta}_n}{\partial \xi}(\xi, t)
  + \left( a u^{\delta}_L - a u^{\delta}_n(-1, t) \right) \frac{d g_L}{d\xi}(\xi)
  + \left( a u^{\delta}_R - a u^{\delta}_n(+1, t) \right) \frac{d g_R}{d\xi}(\xi) \right], \quad \xi \in [-1, 1), \tag{2.7}
\]

where g_L(ξ), g_R(ξ) ∈ P_{P+1} are the left-boundary and right-boundary correction functions in the parent space which satisfy the constraints

\[
  g_L(-1) = g_R(+1) = 1, \qquad g_L(+1) = g_R(-1) = 0, \tag{2.8}
\]

and can recover nodal DG, SD and various other high-order formulations [41, 88]. The common interface fluxes, a u^δ_L and a u^δ_R, are computed from the polynomial functions on either side of the interface,

\[
\begin{aligned}
  u^{\delta}_L &= (1 - \alpha)\, u^{\delta}_{n-1}(+1, t) + \alpha\, u^{\delta}_n(-1, t), \\
  u^{\delta}_R &= (1 - \alpha)\, u^{\delta}_n(+1, t) + \alpha\, u^{\delta}_{n+1}(-1, t),
\end{aligned} \tag{2.9}
\]

where the upwinding coefficient, α, determines the type of common interface value used. A one-sided, upwinding flux is obtained for α = 0.0 and a centered flux is obtained for α = 0.5. Eq. (2.7) and Eq. (2.9) are now utilized to construct the numerical derivative,

\[
  \frac{\delta u^{\delta}_n}{\delta x}(\xi, t) = 2 \left[ \frac{\partial u^{\delta}_n}{\partial \xi}(\xi, t)
  + (1 - \alpha) \left( u^{\delta}_{n-1}(+1, t) - u^{\delta}_n(-1, t) \right) \frac{d g_L}{d\xi}(\xi)
  + \alpha \left( u^{\delta}_{n+1}(-1, t) - u^{\delta}_n(+1, t) \right) \frac{d g_R}{d\xi}(\xi) \right], \quad \xi \in [-1, 1). \tag{2.10}
\]

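This can then be transformed to a matrix-vector operation on u^δ_n, with the factor of 2 from the parent-to-physical transformation carried explicitly in Eq. (2.5),

\[
  \left. \frac{\delta u^{\delta}}{\delta x} \right|_n = \sum_{j=-1}^{1} C_j\, u^{\delta}_{n+j}, \tag{2.11}
\]

where C_j ∈ R^{(P+1)×(P+1)} is given by

\[
\begin{aligned}
  C_{-1} &= (1 - \alpha)\, g_{L,\xi}\, \ell_{+}^{T}, \\
  C_{0}  &= D - (1 - \alpha)\, g_{L,\xi}\, \ell_{-}^{T} - \alpha\, g_{R,\xi}\, \ell_{+}^{T}, \\
  C_{+1} &= \alpha\, g_{R,\xi}\, \ell_{-}^{T}.
\end{aligned}
\]

Here D ∈ R^{(P+1)×(P+1)} is the polynomial differentiation operator such that

\[
  D_{p,m} = \frac{d \ell_m}{d\xi}(\xi_p), \quad p, m = 1, 2, \ldots, P + 1, \tag{2.12}
\]

where ℓ_m is the mth Lagrange polynomial in the parent domain. g_{L,ξ}, g_{R,ξ} ∈ R^{(P+1)×1} are the derivatives of the left-boundary and right-boundary correction functions at the solution points, and ℓ_−, ℓ_+ ∈ R^{(P+1)×1} are the extrapolated values of the P + 1 Lagrange basis polynomials,

\[
  \ell_{+,p} = \ell_p(+1), \quad \ell_{-,p} = \ell_p(-1), \quad p = 1, 2, \ldots, P + 1. \tag{2.13}
\]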
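A minimal sketch of how the three blocks C_{−1}, C_0 and C_{+1} might be assembled is given below. It assumes the nodal DG correction functions, taken here as the Radau polynomials g_L = ((−1)^P / 2)(L_P − L_{P+1}) and g_R = (1/2)(L_P + L_{P+1}) with L_n the Legendre polynomials; this choice satisfies Eq. (2.8) but is not spelled out in the text above, and all function names are hypothetical. The check applies the blocks to a globally linear function, for which the interface jumps vanish and the physical derivative (twice the parent-space result) should be exactly 1 for any α.

```python
import math


def legendre_and_derivative(n, x):
    """Return (L_n(x), L_n'(x)) from the standard Legendre recurrences."""
    vals, ders = [1.0, x], [0.0, 1.0]
    for m in range(1, n):
        vals.append(((2 * m + 1) * x * vals[m] - m * vals[m - 1]) / (m + 1))
        ders.append(ders[m - 1] + (2 * m + 1) * vals[m])
    return vals[n], ders[n]


def lagrange_basis(points, m, xi):
    """Value at xi of the m-th Lagrange polynomial on the given points."""
    value = 1.0
    for i, x_i in enumerate(points):
        if i != m:
            value *= (xi - x_i) / (points[m] - x_i)
    return value


def lagrange_derivative(points, m, xi):
    """Derivative at xi of the m-th Lagrange polynomial."""
    total = 0.0
    for j, x_j in enumerate(points):
        if j == m:
            continue
        term = 1.0 / (points[m] - x_j)
        for i, x_i in enumerate(points):
            if i != m and i != j:
                term *= (xi - x_i) / (points[m] - x_i)
        total += term
    return total


def fr_blocks(points, alpha):
    """Assemble (C_{-1}, C_0, C_{+1}) of Eq. (2.11), assuming DG corrections."""
    n = len(points)
    P = n - 1
    D = [[lagrange_derivative(points, m, xp) for m in range(n)] for xp in points]
    l_minus = [lagrange_basis(points, m, -1.0) for m in range(n)]
    l_plus = [lagrange_basis(points, m, +1.0) for m in range(n)]
    g_L, g_R = [], []
    for xp in points:
        dLP = legendre_and_derivative(P, xp)[1]
        dLP1 = legendre_and_derivative(P + 1, xp)[1]
        g_L.append(0.5 * (-1) ** P * (dLP - dLP1))  # dg_L/dxi at xp
        g_R.append(0.5 * (dLP + dLP1))              # dg_R/dxi at xp
    C_m1 = [[(1 - alpha) * g_L[p] * l_plus[m] for m in range(n)] for p in range(n)]
    C_0 = [[D[p][m] - (1 - alpha) * g_L[p] * l_minus[m] - alpha * g_R[p] * l_plus[m]
            for m in range(n)] for p in range(n)]
    C_p1 = [[alpha * g_R[p] * l_minus[m] for m in range(n)] for p in range(n)]
    return C_m1, C_0, C_p1


def apply_blocks(blocks, u_left, u_mid, u_right):
    """Parent-space derivative: sum_j C_j u_{n+j}."""
    n = len(u_mid)
    return [sum(blocks[0][p][m] * u_left[m] + blocks[1][p][m] * u_mid[m]
                + blocks[2][p][m] * u_right[m] for m in range(n))
            for p in range(n)]


points = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]         # P = 2 Gauss-Legendre
u = lambda n: [n + 0.5 * (1.0 + xi) for xi in points]   # u(x) = x sampled in element n
parent = apply_blocks(fr_blocks(points, alpha=0.0), u(2), u(3), u(4))
print([2.0 * v for v in parent])  # physical derivative of u(x) = x
```

Only the nearest neighbours enter the first derivative, which is why the Fourier symbol of the operator later involves just the three phase factors exp(−ik), 1 and exp(ik).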

The second numerical derivative can be constructed by performing the same operation on the first numerical derivative. Let v^δ_n = δu^δ/δx|_n, then

\[
  \left. \frac{\delta^2 u^{\delta}}{\delta x^2} \right|_n
  = \left. \frac{\delta v^{\delta}}{\delta x} \right|_n
  = \sum_{j=-1}^{1} C_j\, v^{\delta}_{n+j}. \tag{2.14}
\]

The sinusoidal form of the initial condition in Eq. (2.5) provides the displacement relations, u^δ_{n−1} = exp(−ik) u^δ_n and u^δ_{n+1} = exp(ik) u^δ_n, which can then be used to obtain the numerical differentiation operators,

\[
\begin{aligned}
  Q_1(k) &= \sum_{j=-1}^{1} C^{(1,1)}_j \exp(ikj), \\
  Q_2(k) &= \sum_{j=-2}^{2} B_j \exp(ikj),
\end{aligned} \tag{2.15}
\]

where B_j ∈ R^{(P+1)×(P+1)} is given by

\[
\begin{aligned}
  B_{-2} &= C^{(2,2)}_{-1} C^{(2,1)}_{-1}, \\
  B_{-1} &= C^{(2,2)}_{-1} C^{(2,1)}_{0} + C^{(2,2)}_{0} C^{(2,1)}_{-1}, \\
  B_{0}  &= C^{(2,2)}_{+1} C^{(2,1)}_{-1} + C^{(2,2)}_{0} C^{(2,1)}_{0} + C^{(2,2)}_{-1} C^{(2,1)}_{+1}, \\
  B_{+1} &= C^{(2,2)}_{+1} C^{(2,1)}_{0} + C^{(2,2)}_{0} C^{(2,1)}_{+1}, \\
  B_{+2} &= C^{(2,2)}_{+1} C^{(2,1)}_{+1}.
\end{aligned}
\]

Here (1, 1), (2, 1) and (2, 2) are identifiers used to differentiate between different correction procedures in the advection and diffusion terms. The first index in the superscript refers to the first or second numerical differentiation operator while the second index refers to the correction procedure for the discontinuous solution or the discontinuous first derivative. The identifier defines the upwinding coefficient and correction functions used during the correction procedure, for example,

\[
\begin{aligned}
  C^{(1,1)}_{+1} &= \alpha^{(1,1)}\, g^{(1,1)}_{R,\xi}\, \ell_{-}^{T}, \\
  C^{(2,1)}_{+1} &= \alpha^{(2,1)}\, g^{(2,1)}_{R,\xi}\, \ell_{-}^{T}, \\
  C^{(2,2)}_{+1} &= \alpha^{(2,2)}\, g^{(2,2)}_{R,\xi}\, \ell_{-}^{T}.
\end{aligned}
\]
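The five blocks B_j are the discrete convolution of the two three-block stencils, B_j = Σ_{i+l=j} C^{(2,2)}_i C^{(2,1)}_l, so the Fourier symbol of the second derivative factors into the product of the two first-derivative symbols. The sketch below illustrates this with arbitrary 2 × 2 placeholder matrices standing in for the actual FR blocks; the dictionary layout and function names are assumptions made for illustration.

```python
import cmath


def matmul(A, B):
    """Product of two square matrices stored as nested lists."""
    n = len(A)
    return [[sum(A[i][s] * B[s][j] for s in range(n)) for j in range(n)]
            for i in range(n)]


def matadd(A, B):
    """Entrywise sum of two matrices."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def compose_blocks(C22, C21):
    """B_j = sum over i + l = j of C22_i C21_l, for j = -2, ..., 2."""
    n = len(C22[0])
    B = {j: [[0.0] * n for _ in range(n)] for j in range(-2, 3)}
    for i in (-1, 0, 1):
        for l in (-1, 0, 1):
            B[i + l] = matadd(B[i + l], matmul(C22[i], C21[l]))
    return B


def fourier_symbol(blocks, k):
    """Sum over j of blocks[j] * exp(i k j), as in Eq. (2.15)."""
    n = len(blocks[0])
    Q = [[0j] * n for _ in range(n)]
    for j, block in blocks.items():
        phase = cmath.exp(1j * k * j)
        Q = matadd(Q, [[phase * c for c in row] for row in block])
    return Q


# Placeholder blocks (not actual FR operators), keyed by stencil offset
C22 = {-1: [[1.0, 2.0], [0.0, 1.0]], 0: [[0.5, -1.0], [2.0, 3.0]], 1: [[1.0, 0.0], [1.0, 1.0]]}
C21 = {-1: [[0.0, 1.0], [1.0, 0.0]], 0: [[2.0, 0.0], [0.0, -2.0]], 1: [[1.0, 1.0], [0.0, 3.0]]}
Q2 = fourier_symbol(compose_blocks(C22, C21), k=0.7)
Q2_factored = matmul(fourier_symbol(C22, 0.7), fourier_symbol(C21, 0.7))
print(Q2[0][0], Q2_factored[0][0])
```

The order of the factors matters: the correction for the discontinuous first derivative, (2, 2), is applied after the correction for the discontinuous solution, (2, 1).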

2.1.3 Analytical solution of the semi-discrete and exact equations

The analytical solution to the semi-discrete numerical equation, Eq. (2.5), can be found by exactly integrating the linear dynamical system in time using matrix factorization. The semi-discrete numerical equation can be rewritten as

\[
  \frac{d u^{\delta}_n}{dt} + R(a, k)\, u^{\delta}_n = 0, \quad \forall n,\ t > 0, \tag{2.16}
\]

where R(a, k) ∈ C^{(P+1)×(P+1)} and

\[
  R(a, k) = 2a\, Q_1(k) - 4\, Q_2(k). \tag{2.17}
\]

Assuming R(a, k) is diagonalizable,

\[
  R(a, k) = W(a, k)\, \Gamma(a, k)\, W^{-1}(a, k), \tag{2.18}
\]

where Γ is the diagonal matrix of eigenvalues γ_p(a, k) ∈ C for p = 1, 2, ..., P + 1 and W ∈ C^{(P+1)×(P+1)} is the dense matrix containing eigenvectors of the differentiation operator. The initial condition can also be expanded in the basis of the eigenvectors,

\[
  w_0(k) = W(a, k)\, \beta(a, k), \tag{2.19}
\]

where β_p(a, k) ∈ C is the expansion coefficient along the pth column of W, w_p(a, k), for p = 1, 2, ..., P + 1. The solution to Eq. (2.5) can now be written as

\[
\begin{aligned}
  u^{\delta}_n(t) &= \exp(-tR)\, u^{\delta}_n(0) \\
  &= W \exp(-t\Gamma)\, W^{-1} \exp(ikx_n)\, W \beta \\
  &= \exp(ikx_n)\, W \exp(-t\Gamma)\, \beta \\
  &= \exp(ikx_n) \sum_{p=1}^{P+1} \exp(-\gamma_p t)\, \beta_p\, w_p.
\end{aligned} \tag{2.20}
\]

This shows that the numerical solution is a superposition of P + 1 eigenmodes along the eigenvectors of R weighted by the expansion coefficients, β_p, of the initial condition. The time evolution of these modes is dictated by the eigenvalues γ_p.
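The modal superposition in Eq. (2.20) can be checked numerically on a small example. The sketch below uses an arbitrary 2 × 2 complex matrix as a stand-in for R (it is not an actual FR operator), computes its eigenpairs in closed form, expands the initial vector in the eigenvector basis and compares the resulting modal sum against a truncated Taylor series of exp(−tR). The scalar factor exp(ikx_n) multiplies both sides and is omitted. All names are hypothetical.

```python
import cmath


def eig2(R):
    """Eigenvalues and eigenvectors of a 2x2 matrix with R[0][1] != 0
    and distinct eigenvalues (so that R is diagonalizable)."""
    (r11, r12), (r21, r22) = R
    tr, det = r11 + r22, r11 * r22 - r12 * r21
    disc = cmath.sqrt(tr * tr - 4.0 * det)
    gammas = [(tr + disc) / 2.0, (tr - disc) / 2.0]
    vecs = [(r12, g - r11) for g in gammas]  # satisfies (R - g I) w = 0
    return gammas, vecs


def modal_solution(R, u0, t):
    """Sum over modes of exp(-gamma_p t) beta_p w_p, as in Eq. (2.20)."""
    gammas, (w1, w2) = eig2(R)
    det_w = w1[0] * w2[1] - w2[0] * w1[1]
    # Cramer's rule for the expansion coefficients in u0 = W beta
    beta = [(u0[0] * w2[1] - w2[0] * u0[1]) / det_w,
            (w1[0] * u0[1] - w1[1] * u0[0]) / det_w]
    return [sum(cmath.exp(-g * t) * b * w[i]
                for g, b, w in zip(gammas, beta, (w1, w2)))
            for i in range(2)]


def taylor_solution(R, u0, t, terms=60):
    """Reference value of exp(-tR) u0 from a truncated Taylor series."""
    total, term = list(u0), list(u0)
    for m in range(1, terms):
        term = [(-t / m) * (R[i][0] * term[0] + R[i][1] * term[1])
                for i in range(2)]
        total = [total[i] + term[i] for i in range(2)]
    return total


R = [[3.0 + 1.0j, 1.0], [-1.0, 2.0 - 0.5j]]  # illustrative matrix only
u0 = [1.0, 1.0j]
print(modal_solution(R, u0, 0.5))
print(taylor_solution(R, u0, 0.5))
```

Each mode evolves independently, so the long-time behaviour is governed by the eigenvalue with the smallest real part among the modes that carry nonzero weight.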

The analytical solution to the exact equation, Eq. (2.2), on the vector of solution points x_n + ½(1 + ξ) of the nth element can be found by sampling the functional form exp(ikx) exp(−(aik + k²)t),

\[
  u_n(t) = \exp(ikx_n) \exp\!\left( -(aik + k^2)\, t \right) \sum_{p=1}^{P+1} \beta_p\, w_p. \tag{2.21}
\]

2.1.4 Stability, dissipation, dispersion and modal weights

The stability, dissipation and dispersion properties of the numerical scheme can be determined directly from the eigenvalues and eigenvectors of R. The condition for numerical stability can be written as

\[
  \gamma^{Re}_p(a, k) \ge 0 \quad \text{for } p = 1, 2, \ldots, P + 1, \tag{2.22}
\]

which must hold for k ∈ [0, (P + 1)π] where the upper bound corresponds to the Nyquist limit for the mesh. The criteria for numerical dissipation and dispersion can be determined from the equation for absolute error of the numerical solution,

\[
\begin{aligned}
  e_n(t) &= u^{\delta}_n(t) - u_n(t) \\
  &= \exp(ikx_n) \exp\!\left( -(aik + k^2)\, t \right) \sum_{p=1}^{P+1} \left[ \exp\!\left( -(\gamma_p - (aik + k^2))\, t \right) - 1 \right] \beta_p\, w_p.
\end{aligned} \tag{2.23}
\]

For the initial condition of an isolated Fourier component, each eigenmode contributes a certain amount of numerical dissipation and dispersion to the numerical solution. The numerical dissipation in an eigenmode is characterized by γ^Re_p(a, k) − k² > 0 and leads to a mode which decays in amplitude more rapidly than the prescribed decay rate of physical diffusion. An eigenmode can also exhibit numerical anti-dissipation if 0 ≤ γ^Re_p(a, k) < k². In this case, the mode decays in amplitude at a slower rate. Numerical dispersion is present in an eigenmode if γ^Im_p(a, k) − ak ≠ 0. This causes the mode to propagate at an incorrect speed.
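The criteria above translate directly into a small classifier for a single eigenvalue. The sketch below is illustrative only: the function name, the returned labels and the tolerance used to decide when an error counts as zero are assumptions.

```python
def classify_mode(gamma, a, k, tol=1e-12):
    """Classify one eigenvalue gamma of R against the exact value
    i*a*k + k^2 of the nondimensional advection-diffusion equation."""
    dissipation_error = gamma.real - k * k   # gamma^Re - k^2
    dispersion_error = gamma.imag - a * k    # gamma^Im - a k
    if gamma.real < 0.0:
        amplitude = "unstable"               # violates Eq. (2.22)
    elif dissipation_error > tol:
        amplitude = "dissipative"
    elif dissipation_error < -tol:
        amplitude = "anti-dissipative"
    else:
        amplitude = "exact"
    return {
        "stable": gamma.real >= 0.0,
        "amplitude": amplitude,
        "dissipation_error": dissipation_error,
        "dispersion_error": dispersion_error,
        "dispersive": abs(dispersion_error) > tol,
    }


# Example with a = 10, k = 1: the exact eigenvalue would be 1 + 10i
print(classify_mode(complex(1.4, 9.5), a=10.0, k=1.0))
```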

Since all P +1 eigenmodes are present in the numerical solution, it’s important to

determine how each mode contributes quantitatively through the modal weights, βp.

This quantity can best be described as a distribution of energy among all eigenmodes

[9, 8] and is important in determining whether the solution will decay or propagate

at an incorrect speed.

2.1.5 Dissipation and dispersion error of nodal DG for advection-diffusion

The FR formulation can be used to recover various high-order schemes. In this study,

we will focus our analysis on the nodal DG scheme on Gauss-Legendre points with

different interface flux formulations. Table 2.1 defines the set of upwinding coefficients

used to obtain common interface values for four different types of interface flux for-

mulations. Since the nodal DG correction functions are used, the flux formulations

for diffusion are able to recover two common schemes within the DG community:

BR1 introduced by Bassi and Rebay [14] and local Discontinuous Galerkin (LDG)


introduced by Cockburn and Shu [22]. For simplicity, the penalty term often used to

stabilize the schemes [35] will be omitted so that LDG becomes minimal dissipation

LDG [19].

Scheme               α^(1,1)   α^(2,1)   α^(2,2)   Common name for DG scheme
Centered-centered    0.5       0.5       0.5       centered advection - BR1 diffusion
Upwind-centered      0.0       0.5       0.5       upwinded advection - BR1 diffusion
Centered-one-sided   0.5       1.0       0.0       centered advection - LDG diffusion
Upwind-one-sided     0.0       1.0       0.0       upwinded advection - LDG diffusion

Table 2.1: Upwinding coefficients and common names for DG schemes with different interface flux formulations.

The numerical dissipation and dispersion properties for the solution of advection-diffusion problems depend on the nondimensional parameter a. For most results in this part of the dissertation, a value of a = 10 is chosen for illustration. Figures 2.1, 2.2 and 2.3 show the dissipation, dispersion and modal weight of each eigenmode for the centered-centered and upwind-one-sided schemes of polynomial order P = 2. We see that for one of the modes, designated mode 1, the dissipation and dispersion errors vanish in the asymptotic limit of k → 0, and the squared modal weight |β_1|² → P + 1. This mode is defined as the physical mode and its existence and uniqueness have been proven for advection problems [8]. Regarding the effect of interface fluxes, we see that both schemes have little dispersion or dissipation for low wavenumbers. For higher wavenumbers, the physical mode of the upwind-one-sided scheme remains dominant and the scheme is more dissipative. On the other hand, the physical mode of the centered-centered scheme loses its dominance as can be seen from Figure 2.3. Moreover, close to the Nyquist limit, all the modes for this scheme are anti-dissipative.

[Figure 2.1 plots: γ^Re_p − k² vs. k/(P + 1) for Modes 1, 2 and 3, with the exact value and the stability limit. (a) Centered-centered scheme: α^(1,1) = 0.5, α^(2,1) = 0.5, α^(2,2) = 0.5. (b) Upwind-one-sided scheme: α^(1,1) = 0.0, α^(2,1) = 1.0, α^(2,2) = 0.0.]

Figure 2.1: Numerical dissipation in each eigenmode p = 1, 2, 3 for the DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.

[Figure 2.2 plots: γ^Im_p − ak vs. k/(P + 1) for Modes 1, 2 and 3, with the exact value, in two panels.]

Figure 2.2: Numerical dispersion in each eigenmode p = 1, 2, 3 for the DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.

[Figure 2.3 plots: |β_p|² vs. k/(P + 1) for Modes 1, 2 and 3, in two panels.]

Figure 2.3: Energy distributed among each eigenmode p = 1, 2, 3 for the DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.

Even though the chosen flux is a linear function of the state, the state itself

is sinusoidal and cannot be expressed in any finite-dimensional polynomial basis.

Correspondingly, the location of solution points has a direct impact on the distribution

of the initial condition among the P + 1 eigenmodes. This can be observed from

Figure 2.4 which plots the distribution of energy among the 3 eigenmodes for the

centered-centered DG scheme with P = 2, advection-diffusion parameter a = 10,

using Gauss-Legendre, equidistant and Gauss-Lobatto points respectively.

[Figure 2.4 plots: |β_p|² vs. k/(P + 1) for Modes 1, 2 and 3. (a) Gauss-Legendre. (b) Equidistant. (c) Gauss-Lobatto.]

Figure 2.4: Energy distributed among each eigenmode p = 1, 2, 3 for the centered-centered, DG scheme of order P = 2 solving the advection-diffusion equation, a = 10.

2.2 Spectral Comparison

This section provides an investigation into the relative error generated by the nodal

DG scheme with different interface flux formulations for the full range of resolvable

wavenumbers. A resolving efficiency is defined in order to measure the fraction of

resolved waves that propagate with minimal error. We see that the centered-centered

scheme produces the least amount of relative error for a well resolved solution while


the centered-one-sided scheme leads to the least amount of relative error for solutions

that are under-resolved.

2.2.1 Relative error and resolving efficiency

A measure of relative error can be derived by using the vector ℓ2 norm of absolute error. The use of a pointwise error is motivated by the nodal nature of the FR formulation and is different from the L2 functional norm that measures integrated error over a fixed domain. Consider the vector ℓ2 norm of the absolute error using Eq. (2.23),

\[
\begin{aligned}
  \| e_n(t) \|_{\ell_2} &= \| u^{\delta}_n(t) - u_n(t) \|_{\ell_2} \\
  &= \exp(-k^2 t) \left\| \sum_{p=1}^{P+1} \left[ \exp\!\left( -(\gamma_p - (aik + k^2))\, t \right) - 1 \right] \beta_p\, w_p \right\|_{\ell_2},
\end{aligned} \tag{2.24}
\]

where we have used that |exp(ζ)| = 1 if ζ is purely imaginary. Similarly, the norm of the analytical solution to the exact equation, Eq. (2.21), is given by

\[
\begin{aligned}
  \| u_n(t) \|_{\ell_2} &= \exp(-k^2 t)\, \| w_0 \|_{\ell_2} \\
  &= \exp(-k^2 t) \sqrt{ \sum_{p=1}^{P+1} \left| \exp\!\left( ik\, \tfrac{1}{2}(1 + \xi_p) \right) \right|^2 } \\
  &= \exp(-k^2 t) \sqrt{P + 1}.
\end{aligned} \tag{2.25}
\]

A relative error can then be defined by taking the ratio of Eq. (2.24) to Eq. (2.25),

\[
  \frac{\| e_n(t) \|_{\ell_2}}{\| u_n(t) \|_{\ell_2}}
  = \frac{1}{\sqrt{P + 1}} \left\| \sum_{p=1}^{P+1} \left[ \exp\!\left( -(\gamma_p - (aik + k^2))\, t \right) - 1 \right] \beta_p\, w_p \right\|_{\ell_2}. \tag{2.26}
\]

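The triangle inequality can be used to eliminate the dependency on the eigenvectors, which are taken to have unit ℓ2 norm,

\[
\begin{aligned}
  \frac{\| e_n(t) \|_{\ell_2}}{\| u_n(t) \|_{\ell_2}}
  &\le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left\| \left[ \exp\!\left( -(\gamma_p - (aik + k^2))\, t \right) - 1 \right] \beta_p\, w_p \right\|_{\ell_2} \\
  &= \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \exp\!\left( -(\gamma_p - (aik + k^2))\, t \right) - 1 \right| |\beta_p|.
\end{aligned} \tag{2.27}
\]

Then in the limit of t → 0,

\[
  \lim_{t \to 0} \frac{\| e_n(t) \|_{\ell_2}}{\| u_n(t) \|_{\ell_2}}
  \le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \gamma_p - (aik + k^2) \right| |\beta_p|\, t + O(t^2), \tag{2.28}
\]

which gives an upper bound on the initial relative error. In other words, the initial slope of the relative error can be written as

\[
  \lim_{t \to 0} \frac{1}{t} \frac{\| e_n(t) \|_{\ell_2}}{\| u_n(t) \|_{\ell_2}}
  \le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \gamma_p - (aik + k^2) \right| |\beta_p| = \phi(a, k), \tag{2.29}
\]

which shows that the initial growth of error is determined by the products of eigenvalue errors and modal weights of the system. Based on [9, 54], a resolving efficiency can now be defined as

\[
  \eta = \frac{k_f}{(P + 1)\pi}, \tag{2.30}
\]

where k_f is such that

\[
  \phi(a, k) \le \varepsilon \quad \text{for } k \le k_f. \tag{2.31}
\]

η measures the fraction of resolved waves that are propagated with minimal error. The initial growth rate of error for these waves is guaranteed to be less than the specified slope tolerance, ε.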
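A minimal sketch of Eqs. (2.29) to (2.31) follows. The first function evaluates φ(a, k) from eigenvalues and modal weights supplied by the caller; the second scans sampled values of φ for the largest wavenumber below which the tolerance holds. The function names, the sampling-based search and the cubic test profile in the usage lines are all assumptions made for illustration and do not come from an actual scheme.

```python
import math


def initial_error_slope(gammas, betas, a, k):
    """phi(a, k) of Eq. (2.29): weighted sum of eigenvalue errors."""
    exact = 1j * a * k + k * k
    weighted = sum(abs(g - exact) * abs(b) for g, b in zip(gammas, betas))
    return weighted / math.sqrt(len(gammas))


def resolving_efficiency(k_samples, phi_samples, P, eps):
    """eta of Eq. (2.30), with k_f taken as the last sampled wavenumber
    before phi first exceeds eps (k_samples must be increasing)."""
    k_f = 0.0
    for k, phi in zip(k_samples, phi_samples):
        if phi > eps:
            break
        k_f = k
    return k_f / ((P + 1) * math.pi)


# Usage with a purely hypothetical error profile phi(k) = k^3
P, eps, dk = 2, 0.1, 1.0e-3
n_samples = int((P + 1) * math.pi / dk)
k_samples = [i * dk for i in range(n_samples + 1)]
phi_samples = [k ** 3 for k in k_samples]
print(resolving_efficiency(k_samples, phi_samples, P, eps))
```

The accuracy of k_f in this sketch is limited by the sampling interval, so a finer wavenumber grid or a root finder would be needed to reproduce tabulated efficiencies to four digits.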


2.2.2 Spectral comparison of nodal DG with different interface fluxes

Figure 2.5 plots the initial slope of the relative error, Eq. (2.29), for the four types of interface flux formulations in Table 2.1 for an advection-diffusion problem with a = 10. We see that for low wavenumbers or well resolved waves, the centered-centered scheme produces the least amount of error. For larger wavenumbers up to k/(P + 1) ≈ 2, the centered-one-sided scheme generates the least amount of error. Table 2.2 lists the resolving efficiencies for different interface flux formulations and polynomial orders. A slope tolerance of ε = 0.1 is used to compare the ability of a scheme to capture well resolved waves. We see that the centered-centered scheme produces a better resolving efficiency for even or high polynomial orders while the centered-one-sided scheme produces a higher resolving efficiency for odd polynomials of low order. Interestingly, the commonly used upwind-one-sided scheme is not the best for any polynomial order.

[Figure 2.5 plots: φ vs. k/(P + 1) for the centered-centered, upwind-centered, centered-one-sided and upwind-one-sided schemes. (a) Log scale. (b) Linear scale.]

Figure 2.5: Upper bound on initial slope of the relative error vs. nondimensional wavenumber for the DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.


P    Centered-centered    Upwind-centered    Centered-one-sided    Upwind-one-sided
1    0.0296               0.0297             0.0469                0.0464
2    0.0887               0.0650             0.0684                0.0507
3    0.0872               0.0842             0.0932                0.0884
4    0.1145               0.1120             0.1091                0.1034
5    0.1482               0.1350             0.1245                0.1211

Table 2.2: Resolving efficiencies, ε = 0.1, for the DG scheme using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.

Chapter 3

CFL Restrictions and Time Step Estimates

This chapter is split into two sections. In the first section, the fully discrete equation

and its stability criteria are utilized to compare the CFL restrictions of different

schemes and equations. In the second section, a conservative prediction method for

the maximum physical time step is proposed and extended to multidimensional tensor

product elements for the Navier-Stokes equations.

3.1 Time Integration and CFL Restrictions

This section discusses the CFL restriction of nodal DG with different interface flux

formulations for the advection, diffusion and advection-diffusion equations. Towards

this end, the fully discrete advection-diffusion equation is constructed for a general ex-

plicit, M-stage Runge-Kutta (RK) type scheme and the stability criterion is imposed

to solve for the maximum nondimensional time step of the discrete linear dynamical

system. We show that the coupling of advection and diffusion leads to a stricter CFL

limit compared to pure-advection and pure-diffusion. We also show that the centered-

centered scheme has the least restrictive CFL limit with a limit around 5 times larger

than the CFL limit for the upwind-one-sided scheme for certain wavenumbers.


3.1.1 Multi-stage explicit time integration scheme

The fully discrete numerical update equation can be obtained from Eq. (2.16)

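\[
\frac{\delta u_n^{\delta}}{\delta t} = -R(a, k)\, u_n^{\delta}, \qquad \forall n, \quad t > 0. \tag{3.1}
\]

In the case of a general explicit, M-stage Runge-Kutta (RK) time integration scheme,
the numerical solution can be written as

\[
u_n^{\delta}(t + \Delta t) = S(a, k, \Delta t)\, u_n^{\delta}(t), \qquad \forall n, \quad t > 0, \tag{3.2}
\]

where ∆t is the numerical time step and S(a, k, ∆t) is the numerical integration
operator,

\[
S(a, k, \Delta t) = \sum_{m=0}^{M} \frac{(-1)^m \nu_m}{m!} \Delta t^m R(a, k)^m, \tag{3.3}
\]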

where νm for m = 0, 1, . . . ,M are coefficients of the RK scheme and M is the number

of stages.

3.1.2 Stability Criteria

The stability criterion of the discrete linear dynamical system in Eq. (3.2) can be
written as

\[
\rho(S(a, k, \Delta t)) \le 1 \qquad \forall k \in [0, (P + 1)\pi], \tag{3.4}
\]

where ρ is the spectral radius. The computation of the spectral radius requires the

eigenvalues of S(a, k,∆t) which can be found by using the eigendecomposition from

Eq. (2.18) as

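\[
\begin{aligned}
S(a, k, \Delta t) &= W(a, k) \left[ \sum_{m=0}^{M} \frac{(-1)^m \nu_m}{m!} \Delta t^m \Gamma(a, k)^m \right] W^{-1}(a, k) \\
&= W(a, k)\, \Lambda(a, k, \Delta t)\, W^{-1}(a, k),
\end{aligned} \tag{3.5}
\]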


where Λ is the diagonal matrix of eigenvalues λp(a, k,∆t) ∈ C for p = 1, 2, . . . , P + 1.
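As a minimal sketch of the diagonal entries of Λ in Eq. (3.5), the function below evaluates the RK polynomial for a single eigenvalue γ of R. The coefficients ν_m = 1 for m = 0, ..., 4 are an assumption that recovers the classical 4-stage, 4th order stability polynomial; the eigenvalue and time step in the example are hypothetical values.

```python
import cmath
import math

# nu_0 .. nu_4 for the classical 4-stage, 4th order RK scheme (assumed values)
RK44_NU = [1.0, 1.0, 1.0, 1.0, 1.0]

def amplification_factor(gamma, dt, nu=RK44_NU):
    """lambda = sum_m (-1)^m nu_m / m! * (dt * gamma)^m for one eigenvalue gamma of R."""
    z = dt * gamma
    return sum((-1) ** m * nu[m] / math.factorial(m) * z ** m
               for m in range(len(nu)))

# Hypothetical eigenvalue and time step, compared with the exact decay exp(-dt * gamma).
lam = amplification_factor(gamma=2.0 + 1.0j, dt=0.05)
exact = cmath.exp(-0.05 * (2.0 + 1.0j))
print(abs(lam), abs(lam - exact))
```

For small ∆t the polynomial agrees with the exact exponential up to the truncation error of the scheme, which is a quick sanity check on the coefficients.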

The maximum time step can then be obtained as the solution to

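\[
\begin{aligned}
&\text{maximize} && \Delta t \\
&\text{subject to} && \max_{p = 1, \dots, P+1} |\lambda_p(a, k, \Delta t)| \le 1, \quad \forall k \in [0, (P + 1)\pi],
\end{aligned} \tag{3.6}
\]

for a given time integration scheme and nondimensional wavespeed a. To obtain the
stability criterion in physical units, the nondimensional time step is transformed back
into a physical time step using the transformation introduced at the beginning of Chapter 2,

\[
\Delta t \le \Delta t_{\max} = \overline{\Delta t}_{\max} \frac{h^2}{b}, \tag{3.7}
\]

where $\overline{\Delta t}_{\max}$ is the nondimensional solution of Eq. (3.6).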

In the sections to follow, we use a standard 4 stage, 4th order Runge-Kutta (RK44)

scheme for illustration. A bisection method is used to find ∆tmax.
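The bisection search can be sketched as follows. This is an illustrative version only: it assumes the eigenvalues of R have already been sampled over the wavenumber range and collected in a list `gammas`, that the RK44 polynomial has ν_m = 1, and that the scheme is stable on the whole interval from zero up to the limit being sought.

```python
import math

def max_amplification(gammas, dt):
    """Largest |lambda_p| over the supplied eigenvalues of R (RK44, nu_m = 1)."""
    def lam(g):
        z = dt * g
        return sum((-z) ** m / math.factorial(m) for m in range(5))
    return max(abs(lam(g)) for g in gammas)

def max_stable_dt(gammas, dt_hi=1.0, tol=1e-10):
    """Bisection for the largest dt with max |lambda_p| <= 1.

    Assumes stability on [0, dt_max] and instability just above dt_max."""
    while max_amplification(gammas, dt_hi) <= 1.0:
        dt_hi *= 2.0  # grow the bracket until it contains an unstable step
    dt_lo = 0.0
    while dt_hi - dt_lo > tol:
        mid = 0.5 * (dt_lo + dt_hi)
        if max_amplification(gammas, mid) <= 1.0:
            dt_lo = mid
        else:
            dt_hi = mid
    return dt_lo

dt_real = max_stable_dt([1.0])    # purely real eigenvalue (diffusion-like)
dt_imag = max_stable_dt([1.0j])   # purely imaginary eigenvalue (advection-like)
print(dt_real, dt_imag)
```

The two test eigenvalues reproduce the well-known real-axis and imaginary-axis stability limits of the classical RK4 polynomial, which makes them a convenient check before passing in eigenvalues of an actual discretization.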

3.1.3 CFL restrictions on advection-diffusion

The maximum physical time step, ∆tmax, is used to compare the CFL restrictions

for all wavenumbers for advection, diffusion and advection-diffusion problems. Figure

3.1 plots the maximum physical time step for different equations using P = 2 and

P = 3. The figure shows that the coupling of advection and diffusion decreases the

maximum physical time step for all wavenumbers leading to a stricter CFL limit.

Hence, choosing the minimum of the maximum stable time steps for the advection

equation and diffusion equation is inadequate when choosing a time step for the

advection-diffusion equation.

3.1.4 CFL restrictions of nodal DG with different interface fluxes

Figure 3.2 plots the maximum nondimensional time step for the four types of interface
flux formulations in Table 2.1 using P = 2 and P = 3 for an advection-diffusion problem
with a = 10. We see that the centered-centered scheme has the least restrictive
CFL limit while the upwind-one-sided scheme has the most restrictive CFL limit. In
fact, the CFL limit of the centered-centered scheme is larger than that of the
upwind-one-sided scheme for all wavenumbers and around 5 times larger for certain wavenumbers.

[Figure 3.1 panels: (a) P = 2, (b) P = 3. Axes: ∆tmax vs. k/(P + 1). Curves: Advection, a = 10; Diffusion, b = 1; Advection-Diffusion, a = 10.]

Figure 3.1: Maximum physical time step, ∆tmax, vs. nondimensional wavenumber for the RK44, DG, centered-centered scheme using Gauss-Legendre solution points.

[Figure 3.2 panels: (a) P = 2, (b) P = 3. Axes: ∆tmax vs. k/(P + 1).]

Figure 3.2: Maximum nondimensional time step, ∆tmax, vs. nondimensional wavenumber, k/(P + 1), for the RK44, DG scheme using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.


3.2 Time Step Estimates

This section begins by showing that the minimum of the maximum stable time steps

for pure-advection and pure-diffusion is inadequate when choosing a maximum stable

time step for the advection-diffusion equation. However, a conservative prediction

can be obtained from the harmonic sum of the maximum stable time steps for the

advection and diffusion equations. We propose extensions to multidimensional tensor

product elements in structured as well as unstructured grids and to the Navier-Stokes

equations.

3.2.1 Conservative prediction method

The stability criterion in Eq. (3.7) gives an exact value for the maximum physical

time step for an advection-diffusion problem but requires the knowledge of a multi-

variable, nonlinear function ∆tmax which changes depending on the time integration

scheme, choice of correction function, common interface value, polynomial order and

nondimensional parameter ā = ah/b. Several prediction methods are often used in

an effort to reduce the dependency of ∆tmax on a with the hopes that the method is

conservative. One method is to use the minimum of the maximum stable time steps

for pure-advection and pure-diffusion,

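\[
\Delta t \le \min\left( \Delta t^{adv}_{\max} \frac{h}{a},\ \Delta t^{diff}_{\max} \frac{h^2}{b} \right)
\quad \text{or} \quad
\overline{\Delta t}_{\max} = \min\left( \Delta t^{adv}_{\max} \frac{1}{\bar{a}},\ \Delta t^{diff}_{\max} \right), \tag{3.8}
\]

where $\Delta t^{adv}_{\max}$ and $\Delta t^{diff}_{\max}$ are the maximum nondimensional time steps for the
advection and diffusion equations, respectively, and the overbar marks nondimensional
quantities, with ā = ah/b. These values can be determined by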

removing the diffusion or advection term in the fully discrete equation and solving

the optimization problem in Eq. (3.6). Table 3.1 provides these values for a variety

of common interface values and polynomial orders.

         ∆t^adv_max                ∆t^diff_max
P    Centered    Upwind       Centered    One-sided
1    0.707       0.464        0.174       0.0773
2    0.349       0.235        0.0426      0.0187
3    0.213       0.145        0.0158      0.00634
4    0.143       0.100        0.00719     0.00266
5    0.103       0.0736       0.00373     0.00129

Table 3.1: Maximum nondimensional time steps for the RK44, DG scheme using Gauss-Legendre points.

A more conservative method that is used for the Navier-Stokes equations [35, 60, 59]
is the harmonic sum of the maximum stable time steps for the advection equation
and the diffusion equation,

\[
\Delta t \le \frac{1}{\dfrac{a}{\Delta t^{adv}_{\max}\, h} + \dfrac{b}{\Delta t^{diff}_{\max}\, h^2}}
\quad \text{or} \quad
\overline{\Delta t}_{\max} = \frac{1}{\dfrac{\bar{a}}{\Delta t^{adv}_{\max}} + \dfrac{1}{\Delta t^{diff}_{\max}}}, \tag{3.9}
\]

where, as in Eq. (3.8), the second form is written in nondimensional variables.

Figure 3.3 shows how these methods compare with the exact maximum nondimensional
time step for P = 2. The estimate in Eq. (3.8) is at or above the maximum
allowable time step for all values of a, clearly showing that it fails as a conservative
prediction method. On the other hand, the harmonic sum gives conservative predictions
for the full range of advection-diffusion problems. Moreover, in the case of the
upwind-one-sided scheme, a prediction based on the harmonic sum almost exactly
matches the exact CFL limit. These observations hold for other choices of common
interface values and polynomial orders as well, showing that Eq. (3.9) can be used as
a conservative estimate for the CFL restriction. Figure 3.3 also shows that a ≈ 10
is near a region where the coupling of advection and diffusion has a dramatic effect
on the CFL limit of the scheme. In the case of the centered-centered scheme, the
harmonic sum is still conservative but loses accuracy near a ≈ 10.

[Figure 3.3: two panels. Axes: ∆tmax vs. nondimensional wavespeed (logarithmic scale, 10^-2 to 10^4). Curves: Minimum, Harmonic, Exact.]

Figure 3.3: Maximum nondimensional time step estimates vs. nondimensional wavespeed for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points.
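The two estimates in Eqs. (3.8) and (3.9) can be compared with a short sketch in their nondimensional form. The pairing of the Table 3.1 columns with the upwind-one-sided scheme (upwind advective flux, one-sided diffusive flux) is an assumption based on the scheme names, and the function names are illustrative.

```python
def dt_minimum(a_bar, dt_adv, dt_diff):
    """Nondimensional form of Eq. (3.8): minimum of the two pure limits."""
    return min(dt_adv / a_bar, dt_diff)

def dt_harmonic(a_bar, dt_adv, dt_diff):
    """Nondimensional form of Eq. (3.9): harmonic sum of the two pure limits."""
    return 1.0 / (a_bar / dt_adv + 1.0 / dt_diff)

# Table 3.1, P = 2: upwind advection limit and one-sided diffusion limit.
DT_ADV, DT_DIFF = 0.235, 0.0187
a_bar = 10.0

est_min = dt_minimum(a_bar, DT_ADV, DT_DIFF)
est_har = dt_harmonic(a_bar, DT_ADV, DT_DIFF)
print(est_min, est_har)
```

Because both terms in the denominator are positive, the harmonic sum can never exceed the minimum of the two pure limits, which is why it is the more conservative of the two estimates.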

3.2.2 Extension to tensor product elements

The 1D FR formulation for advection-diffusion can be extended to quadrilateral and

hexahedral elements as discussed in [41, 42, 15]. This section proposes a method

for extending the time step estimate from Eq. (3.9) to non-curved tensor product

elements. For simplicity, the discussion focuses on rectilinear and quadrilateral meshes

but the methodology can be easily extended to hexahedral elements.

Consider a multidimensional linear advection-diffusion problem,

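\[
\frac{\partial u}{\partial t} + \sum_{d=1}^{N_d} a_d \frac{\partial u}{\partial x_d} = b \sum_{d=1}^{N_d} \frac{\partial^2 u}{\partial x_d^2}, \qquad x_d \in \mathbb{R}, \quad t > 0,
\]

where Nd is the number of spatial dimensions, ad is the advection coefficient in the
dth dimension and the domain is partitioned into Neles non-overlapping, conforming
tensor product elements, $\Omega = \bigcup_{n=1}^{N_{eles}} \Omega_n$.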

In the special case of a rectangular element, the element domain is defined by
$\Omega_n = [x_{d,n}, x_{d,n+1})$ for d = 1, 2, . . . , Nd. Each element is further discretized into $(P + 1)^{N_d}$
distinct solution points where each point belongs to Nd sets of P + 1 collinear points
allowing for a simple extension of the one-dimensional case to multiple dimensions.
The rectangular element size is defined by Nd side lengths of $h_d = x_{d,n+1} - x_{d,n}$ for
d = 1, 2, . . . , Nd and a linear isoparametric mapping is again used on the physical
domain. The fully discrete numerical equation can now be constructed by following


the same procedure as before,

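\[
\frac{\delta u_n^{\delta}}{\delta t} = -R\, u_n^{\delta}, \qquad \forall n, \quad t > 0, \tag{3.10}
\]

where R is defined as

\[
R(a_1, \dots, a_{N_d}, b, k_1, \dots, k_{N_d}, h_1, \dots, h_{N_d}) = \sum_{d=1}^{N_d} \left[ \frac{2 a_d}{h_d} Q_{1d}(k_d h_d) - \frac{4b}{h_d^2} Q_{2d}(k_d h_d) \right], \tag{3.11}
\]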

and the equation can again be integrated using a general explicit, M-stage RK scheme
to produce a stability criterion similar to Eq. (3.4). The conservative prediction method
for ∆t from Eq. (3.9) can now be extended, in a similar manner, to estimate the time
step of the problem,

\[
\Delta t \le \frac{1}{\displaystyle \sum_{d=1}^{N_d} \left[ \frac{a_d}{\Delta t^{adv}_{\max}\, h_d} + \frac{b}{\Delta t^{diff}_{\max}\, h_d^2} \right]}. \tag{3.12}
\]
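A minimal sketch of Eq. (3.12) for a rectangular element is given below. The function name is illustrative, the absolute value of each advection coefficient is taken as an assumption (mirroring Eq. (3.13)) so that negative speeds are handled, and the element sizes and speeds in the example are hypothetical; the limits are the P = 2 centered values from Table 3.1.

```python
def dt_rectangular(a, h, b, dt_adv, dt_diff):
    """Eq. (3.12): a and h are per-dimension sequences, b is the diffusion coefficient.

    abs(a_d) is used so that the sign of the advection speed does not matter
    (an assumption; Eq. (3.12) is written for a_d directly)."""
    total = sum(abs(a_d) / (dt_adv * h_d) + b / (dt_diff * h_d ** 2)
                for a_d, h_d in zip(a, h))
    return 1.0 / total

DT_ADV, DT_DIFF = 0.349, 0.0426   # Table 3.1, P = 2, centered fluxes

dt_2d = dt_rectangular(a=[10.0, 10.0], h=[0.2, 0.1], b=1.0,
                       dt_adv=DT_ADV, dt_diff=DT_DIFF)
dt_x1 = dt_rectangular([10.0], [0.2], 1.0, DT_ADV, DT_DIFF)
dt_x2 = dt_rectangular([10.0], [0.1], 1.0, DT_ADV, DT_DIFF)
print(dt_2d, dt_x1, dt_x2)
```

With a single dimension the function reduces to the physical form of Eq. (3.9), and the multidimensional estimate is always smaller than any of the one-dimensional ones.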

In the case of a non-rectangular element, we can construct a maximum time

step estimate as the harmonic sum of estimates for each dimension. Each edge of the

quadrilateral element contains P +1 distinct flux points for a total of Nfpts = 4(P +1)

flux points. A time step estimate on each flux point can be computed as,

\[
\Delta t_m = \frac{1}{\dfrac{|a_m|}{\Delta t^{adv}_{\max}\, h_m} + \dfrac{b}{\Delta t^{diff}_{\max}\, h_m^2}}, \qquad m = 1, 2, \dots, N_{fpts}, \tag{3.13}
\]

where |am| is the absolute value of the advection coefficient normal to the face and

hm are distances between flux points defined in Figure 3.4, for example, h1 = h6 =

h1,6. The minimum time step is then computed on each face. From Figure 3.4, the

minimum time steps computed on the west, east, south and north faces are

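\[
\Delta t_W = \min(\Delta t_7, \Delta t_8), \qquad \Delta t_E = \min(\Delta t_3, \Delta t_4), \tag{3.14}
\]

\[
\Delta t_S = \min(\Delta t_1, \Delta t_2), \qquad \Delta t_N = \min(\Delta t_5, \Delta t_6). \tag{3.15}
\]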

Finally from Eq. (3.9), the maximum time step estimate can be based on a harmonic
sum of the minimum time step for each dimension,

\[
\Delta t \le \frac{1}{\dfrac{1}{\min(\Delta t_W, \Delta t_E)} + \dfrac{1}{\min(\Delta t_S, \Delta t_N)}}. \tag{3.16}
\]

Figure 3.4: A visual representation of a quadrilateral element for a polynomial order of P = 1. The flux points are marked by red squares and west, east, south and north faces are represented by W, E, S, N, respectively. The distances between flux points, h1,6, h3,8, h2,5, h4,7 are used to estimate the maximum time step in the element.
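Eqs. (3.13) to (3.16) can be combined into a short sketch. The data layout is an assumption made for illustration: each face is given as a list of (normal advection speed, flux point distance) pairs, and the rectangular example values are hypothetical.

```python
def dt_flux_point(a_n, h, b, dt_adv, dt_diff):
    """Eq. (3.13) at one flux point with normal advection speed a_n and distance h."""
    return 1.0 / (abs(a_n) / (dt_adv * h) + b / (dt_diff * h ** 2))

def dt_quadrilateral(faces, b, dt_adv, dt_diff):
    """Eqs. (3.14)-(3.16). faces maps 'W', 'E', 'S', 'N' to lists of (a_n, h) pairs."""
    face_min = {name: min(dt_flux_point(a_n, h, b, dt_adv, dt_diff)
                          for a_n, h in points)
                for name, points in faces.items()}
    dt_1 = min(face_min["W"], face_min["E"])
    dt_2 = min(face_min["S"], face_min["N"])
    return 1.0 / (1.0 / dt_1 + 1.0 / dt_2)

# Hypothetical P = 1 rectangle with side lengths 0.2 and 0.1 and speeds (10, 10).
faces = {
    "W": [(-10.0, 0.2), (-10.0, 0.2)],
    "E": [(10.0, 0.2), (10.0, 0.2)],
    "S": [(-10.0, 0.1), (-10.0, 0.1)],
    "N": [(10.0, 0.1), (10.0, 0.1)],
}
dt_quad = dt_quadrilateral(faces, b=1.0, dt_adv=0.349, dt_diff=0.0426)
print(dt_quad)
```

For a rectangular element every flux point on opposite faces sees the same distance, so the result coincides with the estimate from Eq. (3.12), which is a useful consistency check.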

3.2.3 Extension to Navier-Stokes equations

Consider the unsteady, two-dimensional, compressible Navier-Stokes equations in con-

servative form,

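\[
\frac{\partial U}{\partial t} + \frac{\partial F_{inv}}{\partial x_1} + \frac{\partial G_{inv}}{\partial x_2} = \frac{\partial F_{visc}}{\partial x_1} + \frac{\partial G_{visc}}{\partial x_2}, \tag{3.17}
\]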

where,

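\[
U = \begin{pmatrix} \rho \\ \rho u \\ \rho v \\ e \end{pmatrix}, \quad
F_{inv} = \begin{pmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ (e + p) u \end{pmatrix}, \quad
G_{inv} = \begin{pmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ (e + p) v \end{pmatrix},
\]

\[
F_{visc} = \begin{pmatrix} 0 \\ \sigma_{11} \\ \sigma_{12} \\ u \sigma_{11} + v \sigma_{21} - q_1 \end{pmatrix}, \quad
G_{visc} = \begin{pmatrix} 0 \\ \sigma_{21} \\ \sigma_{22} \\ u \sigma_{12} + v \sigma_{22} - q_2 \end{pmatrix}, \tag{3.18}
\]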

where ρ is density, u, v are the velocity components in the x1, x2 directions, respec-

tively, and e is total energy per unit volume. The pressure is determined from the

equation of state,

\[
p = (\gamma - 1) \left( e - \frac{1}{2} \rho (u^2 + v^2) \right), \tag{3.19}
\]

where γ is the ratio of specific heats. For a Newtonian fluid, the viscous stresses are

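\[
\sigma_{ij} = \mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) - \frac{2}{3} \mu \delta_{ij} \frac{\partial u_k}{\partial x_k}, \tag{3.20}
\]

and the heat fluxes are

\[
q_i = -k \frac{\partial T}{\partial x_i}, \tag{3.21}
\]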

where k = Cpµ/Pr, T = p/(ρR), Pr is the Prandtl number, Cp is the specific heat at

constant pressure, R is the gas constant and µ is the dynamic viscosity.

As before, a time step estimate on each flux point can be computed as,

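\[
\Delta t_m = \frac{1}{\dfrac{|\lambda_m|}{\Delta t^{adv}_{\max}\, h_m} + \dfrac{\nu_m}{\Delta t^{diff}_{\max}\, h_m^2}}, \qquad m = 1, 2, \dots, N_{fpts}, \tag{3.22}
\]

where λm and νm are the numerical wavespeeds and numerical diffusion coefficients
on each flux point, respectively. From [59], these can be estimated as,

\[
|\lambda_m| = |V_m| + c_m, \qquad \nu_m = \max\left( \frac{\mu_m}{\rho_m}, \frac{\gamma \mu_m}{Pr\, \rho_m} \right), \qquad m = 1, 2, \dots, N_{fpts}, \tag{3.23}
\]

where Vm is the velocity normal to the face and $c_m = \sqrt{\gamma p_m / \rho_m}$ is the speed of sound.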

The maximum time step can then be estimated from Eq. (3.16).
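The flux point quantities in Eqs. (3.22) and (3.23) can be sketched as below. The function names are illustrative, and the flow state in the example (density, normal velocity, pressure, viscosity and flux point distance) is a hypothetical set of values for air, not data from this work.

```python
import math

def wavespeed_and_diffusivity(rho, v_n, p, mu, gamma=1.4, prandtl=0.72):
    """Eq. (3.23): numerical wavespeed |V| + c and numerical diffusion coefficient."""
    c = math.sqrt(gamma * p / rho)            # speed of sound
    lam = abs(v_n) + c
    nu = max(mu / rho, gamma * mu / (prandtl * rho))
    return lam, nu

def ns_flux_point_dt(lam, nu, h, dt_adv, dt_diff):
    """Eq. (3.22): time step estimate at one flux point."""
    return 1.0 / (lam / (dt_adv * h) + nu / (dt_diff * h ** 2))

# Hypothetical state at one flux point (SI units).
lam, nu = wavespeed_and_diffusivity(rho=1.2, v_n=50.0, p=101325.0, mu=1.8e-5)
dt_m = ns_flux_point_dt(lam, nu, h=0.01, dt_adv=0.235, dt_diff=0.0187)
print(lam, nu, dt_m)
```

The per-point values returned by `ns_flux_point_dt` would then be reduced over faces and combined with the harmonic sum of Eq. (3.16).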

Chapter 4

Numerical Experiments

In this chapter, we verify the results obtained in the previous chapters by solving

the advection-diffusion and Navier-Stokes equations on 1D uniform, 2D rectilinear

and 2D unstructured grids. The numerical tests for advection-diffusion confirm that

the centered-centered scheme generates the least amount of error for well resolved

solutions while the centered-one-sided scheme generates the least amount of error

for solutions that are under-resolved. For the Navier-Stokes equations, P = 2, the

upwind-centered scheme generates the least amount of error for well resolved solutions

while the upwind-one-sided scheme generates the least amount of error for solutions

that are under-resolved. These results match the expectations in Section 2.2. The

time step estimates established in Section 3.2 are also verified and the formula in Eq.

(3.12) is shown to be an effective prediction for the maximum stable time step for

both the advection-diffusion and Navier-Stokes equations on Cartesian grids. In the

case of unstructured grids, this time step estimate is not always conservative but it

is still fairly accurate. The estimates are shown to be accurate within 50% error on

all test cases and conservative on tests with Cartesian grids.

In the numerical tests that follow, the correction functions and solution point

locations are chosen to recover the nodal DG scheme on Gauss-Legendre points and

a standard 4 stage, 4th order RK scheme is used for time integration.


4.1 1D Advection-Diffusion of an approximate Gaussian

The first test case involves the solution of the 1D advection-diffusion equation with

an initial condition of an approximate Gaussian and periodic boundary conditions.

The domain is chosen as x ∈ [−10, 10] and Neles = 20 elements are used to construct a

uniform mesh with h = 1. An advection speed of a = 10 and a diffusion coefficient of

b = 1 are used in order to test cases with a nondimensional wavespeed of a = 10. All

cases have been verified to be stable using the maximum stable time step estimate

proposed in Eq. (3.9). We choose to use half of this estimate to minimize time

integration errors.

The initial condition and exact solution of an approximate Gaussian can be com-

puted from,

\[
u_{n,p}(t) = \sum_{j=-N_k}^{N_k} \theta_j \exp(-b k_j^2 t) \cos\left( k_j (x_{n,p} - a t) \right), \tag{4.1}
\]

where θj is the jth spectral weight, kj = 2πj/L is the jth wavenumber associated with
the domain length, L = 20, and Nk is the number of waves used. Nk is chosen to be
the largest value under the constraint $k_{N_k} \le (P + 1)\pi/h$. The spectral weights are
defined as

\[
\theta_j = \frac{\exp\left( -(\sigma k_j)^2 / 2 \right)}{\sqrt{2\pi}\, \sigma \displaystyle \sum_{s=-N_k}^{N_k} \exp\left( -(\sigma k_s)^2 / 2 \right)}, \qquad -N_k \le j \le N_k, \tag{4.2}
\]

where σ is the standard deviation of the Gaussian. The standard deviation dictates

the width of the Gaussian and changes the spectral weight distribution across the

wave spectrum.
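A minimal sketch of Eqs. (4.1) and (4.2) is shown below, using the parameters of this test case (P = 2, h = 1, L = 20, a = 10, b = 1) and the well resolved standard deviation σ = 8/√(2π). The function names are illustrative and the solution is evaluated at a single point rather than on the solution points of a mesh.

```python
import math

def spectral_weights(sigma, L, n_k):
    """Wavenumbers k_j = 2*pi*j/L and weights theta_j of Eq. (4.2), j = -n_k..n_k."""
    k = [2.0 * math.pi * j / L for j in range(-n_k, n_k + 1)]
    g = [math.exp(-((sigma * kj) ** 2) / 2.0) for kj in k]
    norm = math.sqrt(2.0 * math.pi) * sigma * sum(g)
    return k, [gj / norm for gj in g]

def exact_solution(x, t, a, b, k, theta):
    """Eq. (4.1) evaluated at one point x and time t."""
    return sum(th * math.exp(-b * kj ** 2 * t) * math.cos(kj * (x - a * t))
               for kj, th in zip(k, theta))

P, h, L = 2, 1.0, 20.0
a, b = 10.0, 1.0
sigma = 8.0 / math.sqrt(2.0 * math.pi)
n_k = math.floor((P + 1) * L / (2.0 * h))   # largest j with k_j <= (P + 1) * pi / h
k, theta = spectral_weights(sigma, L, n_k)
u0_center = exact_solution(0.0, 0.0, a, b, k, theta)
print(n_k, u0_center)
```

Because the weights are normalized by the sum in the denominator, the initial peak value at x = 0 equals 1/(√(2π)σ) regardless of how many waves are retained.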


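The vector ℓ2 norm of the relative error is computed as

\[
\frac{\| u^{\delta}(t) - u(t) \|_{\ell_2}}{\| u(t) \|_{\ell_2}} =
\left( \frac{\displaystyle \sum_{n=1}^{N_{eles}} \sum_{p=1}^{P+1} \left| u^{\delta}_{n,p}(t) - u_{n,p}(t) \right|^2}{\displaystyle \sum_{n=1}^{N_{eles}} \sum_{p=1}^{P+1} \left| u_{n,p}(t) \right|^2} \right)^{1/2}, \tag{4.3}
\]

where $u^{\delta}_{n,p}(t)$ is the numerical solution in the nth element at the pth solution point.
We also compute the L2 functional norm of the relative error for comparison,

\[
\frac{\| u^{\delta}(x, t) - u(x, t) \|_{L_2}}{\| u(x, t) \|_{L_2}} =
\left( \frac{\displaystyle \sum_{n=1}^{N_{eles}} \int_{\Omega_n} \left| u^{\delta}_n(x, t) - u_n(x, t) \right|^2 d\Omega}{\displaystyle \sum_{n=1}^{N_{eles}} \int_{\Omega_n} \left| u_n(x, t) \right|^2 d\Omega} \right)^{1/2}, \tag{4.4}
\]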

where the integral is approximated numerically using Gaussian quadrature with 12

quadrature points in each element.
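The two error measures in Eqs. (4.3) and (4.4) can be sketched as follows. This is an illustration only: the pointwise norm takes flat lists of nodal values, the functional norm assumes the numerical and exact solutions are available as callables, and the Gauss-Legendre rule is generated with a standard Newton iteration rather than taken from a library. The scaled cosine used as the "numerical" solution is a hypothetical stand-in.

```python
import math

def relative_l2_pointwise(u_num, u_exact):
    """Eq. (4.3) for flat sequences of values at all solution points."""
    num = sum(abs(un - ue) ** 2 for un, ue in zip(u_num, u_exact))
    den = sum(abs(ue) ** 2 for ue in u_exact)
    return math.sqrt(num / den)

def gauss_legendre(n):
    """Nodes and weights of the n-point Gauss-Legendre rule on [-1, 1]."""
    nodes, weights = [], []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))   # initial guess
        for _ in range(100):
            p0, p1 = 1.0, x
            for m in range(2, n + 1):                     # Legendre recurrence
                p0, p1 = p1, ((2 * m - 1) * x * p1 - (m - 1) * p0) / m
            dp = n * (x * p1 - p0) / (x * x - 1.0)        # derivative of P_n
            dx = p1 / dp
            x -= dx
            if abs(dx) < 1e-15:
                break
        nodes.append(x)
        weights.append(2.0 / ((1.0 - x * x) * dp * dp))
    return nodes, weights

def relative_L2_functional(u_num, u_exact, edges, n_quad=12):
    """Eq. (4.4) with n_quad Gauss points per element; edges are element boundaries."""
    xi, w = gauss_legendre(n_quad)
    num = den = 0.0
    for x_l, x_r in zip(edges[:-1], edges[1:]):
        half = 0.5 * (x_r - x_l)
        for xq, wq in zip(xi, w):
            x = x_l + half * (xq + 1.0)
            num += half * wq * abs(u_num(x) - u_exact(x)) ** 2
            den += half * wq * abs(u_exact(x)) ** 2
    return math.sqrt(num / den)

edges = [-10.0 + i for i in range(21)]                    # 20 elements, h = 1
err_fun = relative_L2_functional(lambda x: 1.1 * math.cos(x), math.cos, edges)
err_pts = relative_l2_pointwise([1.1, 2.2, 3.3], [1.0, 2.0, 3.0])
print(err_fun, err_pts)
```

A solution that is uniformly 10% too large gives a relative error of 0.1 in both norms, which provides a simple check of the implementation.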

We begin with the case of a well resolved Gaussian with σ = 8/√(2π) and a solution
polynomial of P = 2. Figure 4.1, which plots the initial condition and the spectral
weight distribution of this Gaussian, shows that the initial condition is indeed well
resolved. Figure 4.2 plots the relative error for different interface fluxes. As expected
from the results in Figure 2.5, the centered-centered scheme produces the least error.

[Figure 4.1 panels: (a) Initial condition, u vs. x; (b) Spectral weight distribution vs. k/(P + 1), logarithmic scale.]

Figure 4.1: High resolution approximate Gaussian, P = 2, σ = 8/√(2π).

[Figure 4.2 panels: (a) ℓ2 pointwise norm, (b) L2 functional norm. Axes: ‖uδ − u‖2/‖u‖2 vs. t/T. Curves: Centered-centered, Upwind-centered, Centered-one-sided, Upwind-one-sided.]

Figure 4.2: Relative error vs. time periods for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a high resolution approximate Gaussian, σ = 8/√(2π).

For the next case, the standard deviation of the Gaussian is decreased to σ = 1/√(2π).
Figure 4.3 plots the initial condition and the spectral weight distribution for
this Gaussian showing that the initial condition is now poorly resolved. Figure 4.4
plots the relative error for different interface fluxes. Once again, as expected from the
results in Figure 2.5, the least amount of error is now exhibited by the centered-one-sided
scheme.

[Figure 4.3 panels: initial condition, u vs. x; (b) Spectral weight distribution vs. k/(P + 1), logarithmic scale.]

Figure 4.3: Low resolution approximate Gaussian, P = 2, σ = 1/√(2π).

[Figure 4.4 panels: (a) ℓ2 pointwise norm, (b) L2 functional norm. Axes: ‖uδ − u‖2/‖u‖2 vs. t/T.]

Figure 4.4: Relative error vs. time periods for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a low resolution approximate Gaussian, σ = 1/√(2π).

4.2 2D Advection-Diffusion of an approximate Gaussian

We now consider the 2D advection-diffusion equation in a periodic box. The domain

is chosen to be square, x1 ∈ [−10, 10], x2 ∈ [−10, 10] and 2000 elements are used to

construct a 2D, nonuniform, rectilinear grid with a minimum element size of h1 = 0.2

and h2 = 0.1 near the center of the domain as shown in Figure 4.5. All cases have been

verified to be stable when using the maximum stable time step estimate proposed in

Eq. (3.12). We choose to use half of this estimate to minimize time integration errors.

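Figure 4.5: Nonuniform rectilinear grid with 40 × 50 elements and a minimum element size of h1 = 0.2 and h2 = 0.1.

The initial condition and exact solution can be computed from,

\[
u_{n,p}(t) = \sum_{j_2=-N_{k_2}}^{N_{k_2}} \sum_{j_1=-N_{k_1}}^{N_{k_1}} \theta_{j_1,j_2} \prod_{d=1}^{2} \exp\left( -b k_{d,j_d}^2 t \right) \cos\left( k_{d,j_d} (x_{d,n,p} - a_d t) \right), \tag{4.5}
\]

where θj1,j2 is defined as

\[
\theta_{j_1,j_2} = \frac{\exp\left( -\sigma^2 (k_{1,j_1}^2 + k_{2,j_2}^2) / 2 \right)}{2\pi\sigma \displaystyle \sum_{s_2=-N_{k_2}}^{N_{k_2}} \sum_{s_1=-N_{k_1}}^{N_{k_1}} \exp\left( -\sigma^2 (k_{1,s_1}^2 + k_{2,s_2}^2) / 2 \right)}, \qquad \forall j_1, j_2. \tag{4.6}
\]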

The relative error is computed using Eq. (4.3).

The first case uses a well resolved Gaussian with σ = 8/√(2π) and a solution
polynomial of P = 2. Figure 4.6 plots the initial condition and relative error for
different interface fluxes. As in the 1D case, the centered-centered scheme produces
the least amount of error.

The standard deviation of the Gaussian is now decreased to σ = 1/√(2π) so that
the initial condition is now under-resolved. Figure 4.7 plots the initial condition and
relative error for different interface fluxes. Since the grid is nonuniform, the Gaussian
travels through a refined region during the first quarter period leading to smaller errors
for the centered-centered and upwind-centered schemes. As the Gaussian travels
through the less refined region near the half period, the centered-one-sided scheme
starts producing the least amount of error resulting in the most accurate solution at
the end of one period.

[Figure 4.6 panels: initial condition; (b) ℓ2 pointwise norm of relative error, ‖uδ − u‖2/‖u‖2 vs. t/T.]

Figure 4.6: Initial condition and relative error for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a high resolution approximate Gaussian, σ = 8/√(2π).

[Figure 4.7 panels: initial condition; (b) ℓ2 pointwise norm of relative error, ‖uδ − u‖2/‖u‖2 vs. t/T.]

Figure 4.7: Initial condition and relative error for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a low resolution approximate Gaussian, σ = 1/√(2π).

4.3 2D Couette Flow

The last numerical test involves the solution of air passing between two parallel plates,

separated by a distance H. The lower plate is kept stationary at a fixed temperature

Tl = 300K while the upper plate travels with constant velocity Uu and at a fixed


temperature Tu = 315K. Uu is computed from a set Mach number of Mau = 0.2.

The Reynolds number of this flow is set to Reu = 20 and the dynamic viscosity, µ, is

held constant. The ratio of specific heats, γ, and the Prandtl number, Pr, are set to

1.4 and 0.72, respectively. The domain is rectangular, x1 ∈ [−1, 1], x2 ∈ [0, 1], with

periodic boundary conditions on the left and right boundaries and fixed and moving

isothermal boundary conditions on the bottom and top boundaries, respectively. The

flow field is initialized with the velocity of the top plate and RK44 is used to march

the simulation to a steady state solution. The exact solution for the velocity profile

is

\[
u(x) = \frac{U_u x_2}{H}, \qquad v(x) = 0. \tag{4.7}
\]

A series of Cartesian and unstructured quadrilateral meshes are used to solve
the Navier-Stokes equations. For example, Figure 4.8 shows a (32 × 16) Cartesian
mesh and an unstructured quadrilateral mesh with (639) elements along with the
numerical Mach contours. For any given mesh, the nondimensional wavespeed, a,
can be approximated by using Eq. (3.23) based on the properties of the flow attached
to the upper plate,

\[
a \approx \frac{\lambda_u h}{\nu_u} = \frac{(U_u + c_u) h}{\max\left( \dfrac{\mu}{\rho_u}, \dfrac{\gamma \mu}{Pr\, \rho_u} \right)}. \tag{4.8}
\]

[Figure 4.8 panels: (a) Cartesian mesh, (32 × 16) elements; (b) Unstructured mesh, (639) elements. Color scale: Mach number from 0 to 0.2.]

Figure 4.8: Couette flow Mach contours, DG upwind-one-sided scheme using Gauss-Legendre solution points, P = 3.


For air, $\max\left( \mu/\rho_u, \gamma\mu/(Pr\, \rho_u) \right) = \gamma\mu/(Pr\, \rho_u)$. The expression for a can now be
simplified by using the Reynolds number,

\[
Re_u = \frac{\rho_u U_u H}{\mu}, \tag{4.9}
\]

\[
a = \frac{Pe_u (Ma_u + 1)}{\gamma Ma_u H} h, \tag{4.10}
\]

where Peu = ReuPr = 14.4 is the Peclet number. For a given mesh, the element size
h is computed as $h = \sqrt{2/N_{eles}}$ which recovers the exact element size for a Cartesian
mesh. a is then computed for each mesh and shown in Table 4.1.

Neles    (4 × 2)    (8 × 4)    (16 × 8)    (32 × 16)    (159)      (639)
h        0.5        0.25       0.125       0.0625       0.07931    0.03956
a        30.86      15.43      7.714       3.857        4.894      2.441

Table 4.1: Element size ‘h’ and nondimensional parameter ‘a’ for the meshes employed for Couette flow.
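The Cartesian entries of Table 4.1 can be reproduced from Eq. (4.10) with a few lines. This is a sketch under the stated flow parameters (Re = 20, Pr = 0.72, Ma = 0.2, γ = 1.4); the plate separation H = 1 and the domain area of 2 are read off the domain definition, and only the Cartesian meshes are evaluated here.

```python
import math

def couette_a(n_eles, reynolds=20.0, prandtl=0.72, mach=0.2,
              gamma=1.4, H=1.0, area=2.0):
    """Element size h = sqrt(area / n_eles) and nondimensional wavespeed, Eq. (4.10)."""
    h = math.sqrt(area / n_eles)
    peclet = reynolds * prandtl
    return h, peclet * (mach + 1.0) / (gamma * mach * H) * h

results = {}
for nx, ny in [(4, 2), (8, 4), (16, 8), (32, 16)]:
    results[(nx, ny)] = couette_a(nx * ny)
    print((nx, ny), results[(nx, ny)])
```

Since a scales linearly with h, each uniform refinement of the Cartesian mesh halves the nondimensional wavespeed.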

Convergence of the L2 velocity error with grid spacing h is shown in Figure 4.9
for P = 2 and Figure 4.10 for P = 3. The rates of convergence for both Cartesian
and unstructured meshes match well with the expected orders of around P + 1 for
the upwind-one-sided scheme [11].

[Figure 4.9 panels: (a) Cartesian meshes, (b) Unstructured quadrilateral meshes. Axes: ‖eV‖2 vs. h = √(2/Neles), logarithmic scales. Curves: Upwind-centered, Upwind-one-sided, Order 3.]

Figure 4.9: 3rd order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points, P = 2.

[Figure 4.10 panels: (a) Cartesian meshes, (b) Unstructured quadrilateral meshes. Axes: ‖eV‖2 vs. h = √(2/Neles), logarithmic scales.]

Figure 4.10: 4th order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points, P = 3.

The L2 velocity errors and orders of accuracy are given in Table 4.2 for the Cartesian
meshes and Table 4.3 for unstructured meshes. For P = 2, Cartesian meshes, the
upwind-one-sided scheme produces less error for the less refined case while the
upwind-centered scheme produces less error when the mesh is more refined. For P = 3, the
upwind-one-sided scheme produces less error for all meshes which matches the results
from Table 2.2. For unstructured meshes, the upwind-centered scheme produces less
error for P = 2 and the upwind-one-sided scheme produces less error for P = 3. This
matches well with the results from the Cartesian meshes since the meshes are more
refined.

                     Upwind-centered                       Upwind-one-sided
P    Neles        L2 Error     Order    ∆tmax/∆tpred    L2 Error     Order    ∆tmax/∆tpred
2    (4 × 2)      4.415e-05    -        1.3             4.358e-05    -        1.1
     (8 × 4)      5.256e-06    3.07     1.4             5.427e-06    3.01     1.0
     (16 × 8)     5.934e-07    3.15     1.5             6.430e-07    3.08     1.0
     (32 × 16)    6.546e-08    3.18     1.3             7.741e-08    3.05     1.0
3    (4 × 2)      4.584e-07    -        1.3             4.380e-07    -        1.1
     (8 × 4)      3.089e-08    3.89     1.4             2.706e-08    4.02     1.0
     (16 × 8)     1.995e-09    3.95     1.2             1.622e-09    4.06     1.0
     (32 × 16)    1.274e-10    3.97     1.1             1.004e-10    4.01     1.0

Table 4.2: Couette flow results for the RK44, DG scheme using Gauss-Legendre solution points on Cartesian quadrilateral meshes.

                     Upwind-centered                       Upwind-one-sided
P    Neles        L2 Error     Order    ∆tmax/∆tpred    L2 Error     Order    ∆tmax/∆tpred
2    (159)        5.214e-07    -        1.5             5.432e-07    -        0.9
     (639)        5.249e-08    3.31     1.4             6.003e-08    3.18     0.9
3    (159)        1.777e-09    -        1.3             1.529e-09    -        0.8
     (639)        9.192e-11    4.27     1.3             7.922e-11    4.27     0.8

Table 4.3: Couette flow results for the RK44, DG scheme using Gauss-Legendre solution points on unstructured quadrilateral meshes.
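The "Order" columns can be checked with the usual two-mesh estimate, sketched below. The formula is the standard observed order of accuracy, assumed here to be what the tables report; the inputs are the first two P = 2 upwind-centered entries of Table 4.2 with the element sizes from Table 4.1.

```python
import math

def observed_order(e_coarse, e_fine, h_coarse, h_fine):
    """Observed order of accuracy from errors on two meshes of sizes h_coarse > h_fine."""
    return math.log(e_coarse / e_fine) / math.log(h_coarse / h_fine)

# P = 2, upwind-centered, (4 x 2) -> (8 x 4) Cartesian meshes.
order = observed_order(4.415e-05, 5.256e-06, 0.5, 0.25)
print(round(order, 2))
```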

The ratios of the maximum allowable time step to the estimated time step computed
using Eq. (3.16), ∆tmax/∆tpred, are shown in Table 4.2 for the Cartesian meshes. The results
show that the time step estimate remains conservative for both schemes and matches
almost exactly for the upwind-one-sided scheme as was expected from Figure 3.3. It is
also important to note that in the case of the upwind-centered scheme, the time step
estimate loses accuracy near a ≈ 10, as expected from Figure 3.3.

The results from the unstructured meshes in Table 4.3 show that the time step
estimate is conservative for the upwind-centered scheme but not for the upwind-one-sided
scheme; however, the time step estimate is still fairly accurate for both cases.

Part II

Implicit Time Stepping on GPU Architectures


Chapter 5

Introduction

The most popular implicit time integration strategies for high-order, unsteady, fluid

flow simulations fall into three categories: backward differentiation formulas (BDF)

[64], diagonally implicit Runge-Kutta (DIRK) methods [71] and fully implicit Runge-Kutta
(IRK) methods [68]. Each has its own advantages and disadvantages but

all require the solution of an implicit function. Pseudo time stepping [49] and New-

ton’s method are often used to solve the nonlinear implicit function. These methods

can be enhanced by using h-multigrid [85], p-multigrid [28], hp-multigrid [63] or a

Newton-Krylov method with preconditioning [75, 70]. Each technique requires the

construction of a Jacobian matrix to solve the resulting linear system. Thus, storing

and computing the Jacobian matrix is a vital portion of any implicit method.

Storing and computing the global Jacobian matrix can be prohibitively expensive

for high-order methods. There have been several methods proposed to attempt to

reduce the size and time complexity. For example, in [83, 56, 24, 25], lower-upper

symmetric Gauss-Seidel (LU-SGS) with first order approximate element local Jaco-

bians are used to avoid the construction of off-diagonal block Jacobians. In [28],

an element line Jacobi smoother is proposed which only stores the Jacobians for a

line of elements. In [72], a line-based DG method is suggested which significantly

reduces the sparsity pattern of the global Jacobian matrix. More recently, Kronecker

products were used in [26, 67, 69] to reduce the size and complexity of element local

Jacobians. Typically, implicit time stepping on GPU architectures has been a more



difficult problem to address because of the high memory requirements and the low

arithmetic intensity but, in [36], an implicit time stepping method was developed for

two-dimensional triangular grids on a single GPU.

In Part II of this dissertation, analytical element local Jacobians for the Direct

Flux Reconstruction (DFR) method applied to advection-diffusion, Euler and the

Navier-Stokes equations are derived. A Kronecker product formulation for these

Jacobians is developed to reduce the time complexity and a detailed discussion on the

sparsity pattern of these matrices is given. An efficient implicit time stepping method

for DFR on multi-GPU architectures is also designed, implemented and tested.

The second part of this dissertation is formatted as follows. In chapter 6, we intro-

duce a brief description of the DFR method in multiple dimensions for the advection-

diffusion, Euler and Navier-Stokes equations. In chapter 7, we describe the time

stepping methods used to solve steady and unsteady fluid flow simulations and the

block iterative methods used for the linear solver. In chapter 8, we analytically de-

rive the element local Jacobian and its Kronecker product formulation for the DFR

method in multiple dimensions for the advection-diffusion, Euler and Navier-Stokes

equations. In chapter 9, we provide numerical results for the inviscid bump, inviscid

NACA0012 airfoil, isentropic vortex, laminar Joukowski airfoil and three-dimensional

half cylinder with a Reynolds number of 1000. Lastly, in chapters 10 and 11, we

describe the implementation and performance of the GPU accelerated implicit time

stepping method for DFR.

Part II of this dissertation is based on the following publications:

• Jerry Watkins, Joshua Romero, and Antony Jameson. Multi-GPU, Implicit

Time Stepping for High-order Methods on Unstructured Grids. 46th AIAA

Fluid Dynamics Conference, 2016. [94]

• Jerry Watkins, Freddie Witherden, and Antony Jameson. A Kronecker prod-

uct formulation for the high-order, direct Flux Reconstruction method. (in

progress).

All results are generated using an in-house, high-order, compressible flow solver for

GPU architectures called ZEFR.

Chapter 6

The Direct Flux Reconstruction Method

The direct Flux Reconstruction (DFR) method was first proposed by Romero et al.

in [77] as a simplified and efficient alternative to the FR method. In the DFR method,

a Lagrange interpolation over all interior and interface points is used to perform the

flux correction as opposed to a correction function. This scheme has been proven

capable of recovering the FR form of the nodal DG method when Gauss-Legendre

points are used in the interior [77] and the method has also been extended to triangle

elements [78]. This work focuses on unstructured tensor-product elements such as

quadrilateral and hexahedral elements.

This chapter is split into four sections. In section 6.1, a description of the DFR

method for one-dimensional advection-diffusion is given. This is then extended to

two- and three-dimensional unstructured quadrilateral and hexahedral elements in

sections 6.2 and 6.3, respectively. Section 6.4 describes the DFR method applied to

the Euler and Navier-Stokes equations.


6.1 One-Dimensional Formulation for Advection-Diffusion

We begin by describing a one-dimensional advection-diffusion problem with discrete

boundaries. The DFR method is then applied to this problem using the techniques

described in [77, 78].

6.1.1 Problem Specification

Consider the conservative form of the one-dimensional advection-diffusion equation

defined on a bounded domain and subject to appropriate initial conditions and boundary conditions,

\[
\frac{\partial u}{\partial t} + \frac{\partial f}{\partial x} = 0, \quad x \in \Omega = [0, 1], \quad t > 0, \tag{6.1}
\]

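where $x$ is the spatial coordinate, $t$ is time, $u = u(x, t)$ is a conserved scalar quantity and $f = f_{adv}(u) - f_{diff}\left(u, \frac{\partial u}{\partial x}\right)$ is the difference between the advective flux and the diffusive flux. Note that the equation can be rewritten as a system of first-order equations,

\[
\begin{aligned}
\frac{\partial u}{\partial t} + \frac{\partial f(u, q)}{\partial x} &= 0, \quad x \in \Omega = [0, 1], \quad t > 0, \\
q - \frac{\partial u}{\partial x} &= 0.
\end{aligned} \tag{6.2}
\]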

Following a traditional nodal finite element method [40], the domain is partitioned

into Neles non-overlapping elements,

\[
\Omega = \bigcup_{ele=1}^{N_{eles}} \Omega_{ele}, \tag{6.3}
\]

where $\Omega_{ele} = [x_{ele}, x_{ele+1})$. With the domain partitioned, the exact solution $u$, the exact solution derivative $q$ and the exact flux $f(u)$ can be approximated by the numerical solution, the numerical solution derivative and the numerical flux,

\[
u^{\delta} = \sum_{ele=1}^{N_{eles}} u^{\delta}_{ele}, \quad
q^{\delta} = \sum_{ele=1}^{N_{eles}} q^{\delta}_{ele}, \quad
f^{\delta} = \sum_{ele=1}^{N_{eles}} f^{\delta}_{ele}. \tag{6.4}
\]


A linear isoparametric mapping is introduced from the physical domain x ∈ Ωele to

the parent domain ξ ∈ ΩS = [−1, 1) such that

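\[
\begin{aligned}
\xi(x|_{\Omega_{ele}}) &= 2\left(\frac{x - x_{ele}}{x_{ele+1} - x_{ele}}\right) - 1, \\
x|_{\Omega_{ele}}(\xi) &= \left(\frac{1 - \xi}{2}\right) x_{ele} + \left(\frac{1 + \xi}{2}\right) x_{ele+1}.
\end{aligned} \tag{6.5}
\]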

Applying this transformation gives rise to a transformed system of equations within

the standard element ΩS of the following form,

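\[
\begin{aligned}
\frac{\partial \hat{u}^{\delta}_{ele}}{\partial t} + \frac{\partial \hat{f}^{\delta}_{ele}}{\partial \xi} &= 0, \\
\hat{q}^{\delta}_{ele} - \frac{\partial \hat{u}^{\delta}_{ele}}{\partial \xi} &= 0,
\end{aligned} \tag{6.6}
\]

where

\[
\hat{u}^{\delta}_{ele} = |J_{ele}|\, u^{\delta}_{ele}(x|_{\Omega_{ele}}(\xi), t), \quad
\hat{q}^{\delta}_{ele} = |J_{ele}|\, q^{\delta}_{ele}(x|_{\Omega_{ele}}(\xi), t), \quad
\hat{f}^{\delta}_{ele} = f^{\delta}_{ele}(x|_{\Omega_{ele}}(\xi), t),
\]

and $|J_{ele}| = \frac{1}{2}(x_{ele+1} - x_{ele})$ is the determinant of the geometric element Jacobian matrix of the coordinate transformation. In what follows, the superscript $\delta$, which distinguishes the numerical solution, solution derivative and flux from their exact values, is dropped to improve readability.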
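The mapping of Eq. (6.5) and its Jacobian determinant can be sketched in a few lines of Python. This is only an illustration: the function names `to_parent`, `to_physical` and `jacobian_det` are hypothetical and are not taken from ZEFR.

```python
def to_parent(x, x_left, x_right):
    """Map a physical coordinate x in [x_left, x_right) to xi in [-1, 1)."""
    return 2.0 * (x - x_left) / (x_right - x_left) - 1.0


def to_physical(xi, x_left, x_right):
    """Inverse map from the parent coordinate xi back to physical space."""
    return 0.5 * (1.0 - xi) * x_left + 0.5 * (1.0 + xi) * x_right


def jacobian_det(x_left, x_right):
    """|J_ele| = (x_{ele+1} - x_ele) / 2 for the linear 1D mapping."""
    return 0.5 * (x_right - x_left)


# Element spanning [0.2, 0.3): its midpoint maps to the parent-space origin.
xi_mid = to_parent(0.25, 0.2, 0.3)
x_back = to_physical(xi_mid, 0.2, 0.3)
jac = jacobian_det(0.2, 0.3)
```

Because the mapping is affine, the Jacobian determinant is constant within each element; it is simply half the element width.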

6.1.2 Direct Flux Reconstruction

Consider rearranging Eq. (6.6) and producing the semi-discrete form of the one-

dimensional advection-diffusion equation as a system of equations,

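\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \frac{\delta f_{ele}}{\delta \xi}, \tag{6.7}
\]

\[
q_{ele} = \frac{1}{|J_{ele}|} \frac{\delta u_{ele}}{\delta \xi}, \tag{6.8}
\]

where $\frac{\delta f_{ele}}{\delta \xi}$ is the numerical derivative of $f_{ele}$ and $\frac{\delta u_{ele}}{\delta \xi}$ is the numerical derivative of $u_{ele}$. In the DFR method, globally $C^0$ continuous piecewise polynomials $u^C$ and $f^C$ are constructed in order to compute the numerical derivatives of $u_{ele}$ and $f_{ele}$ in each element.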

The first step in this process is to further discretize each element by Nspts1D = P+1

distinct solution points so that the discontinuous solution in each element uele can be

represented by a piecewise interpolating polynomial of degree P ,

\[
u_{ele}(\xi) = \sum_{spt=1}^{N_{spts1D}} u_{spt,ele}\, \ell_{spt}(\xi), \tag{6.9}
\]

where $\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)$ are the Lagrange basis polynomials defined at the solution points $\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}$. This can be written in vector format as,

\[
u_{ele}(\xi) = \boldsymbol{\ell}(\xi)' \mathbf{u}_{ele}, \tag{6.10}
\]

where $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$ and $\mathbf{u}_{ele} = [u_{1,ele}, u_{2,ele}, \ldots, u_{N_{spts1D},ele}]'$. To recover the nodal DG method, the solution points are chosen to be collocated with the zeros of the Legendre polynomial of degree $P + 1$, also known as the Gauss-Legendre points [77].

The next step is to extrapolate the discontinuous solution to the element interfaces using Eq. (6.10). The extrapolated values in each element are written as,

\[
u^{L}_{ele} = u_{ele}(-1) = \mathbf{E}^{L} \mathbf{u}_{ele}, \quad u^{R}_{ele} = u_{ele}(+1) = \mathbf{E}^{R} \mathbf{u}_{ele}, \tag{6.11}
\]

where $u^{L}_{ele}$ and $u^{R}_{ele}$ are the extrapolated discontinuous solution values on the left and right boundaries of the $ele$th element and $\mathbf{E}^{L}, \mathbf{E}^{R} \in \mathbb{R}^{1 \times N_{spts1D}}$ are polynomial extrapolation operators where

\[
\mathbf{E}^{L} = \boldsymbol{\ell}(-1)', \quad \mathbf{E}^{R} = \boldsymbol{\ell}(+1)'. \tag{6.12}
\]
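As an illustration of Eqs. (6.9) to (6.12), the following Python sketch evaluates the Lagrange basis at the element ends to form the two extrapolation row vectors. The helper names are hypothetical, and the Gauss-Legendre points for $P = 2$ are hard-coded rather than computed.

```python
import math


def lagrange_basis(nodes, j, x):
    """Value at x of the Lagrange basis polynomial attached to nodes[j]."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            value *= (x - xk) / (nodes[j] - xk)
    return value


def extrapolation_operators(nodes):
    """Row vectors E^L = l(-1)' and E^R = l(+1)' of Eq. (6.12)."""
    n = len(nodes)
    e_left = [lagrange_basis(nodes, j, -1.0) for j in range(n)]
    e_right = [lagrange_basis(nodes, j, +1.0) for j in range(n)]
    return e_left, e_right


# Gauss-Legendre points for P = 2 (zeros of the degree-3 Legendre polynomial).
GL3 = [-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)]
E_L, E_R = extrapolation_operators(GL3)

# Nodal values of u(xi) = xi**2, which a P = 2 interpolant represents exactly.
u_ele = [xi ** 2 for xi in GL3]
u_left = sum(e * u for e, u in zip(E_L, u_ele))
u_right = sum(e * u for e, u in zip(E_R, u_ele))
```

Since $u(\xi) = \xi^2$ lies in the polynomial space, both extrapolated values equal the exact end values $u(\pm 1) = 1$ up to round-off.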


The common interface solutions are computed by using the extrapolated discon-

tinuous solution on both sides of each interface,

\[
u^{I,L}_{ele} = u^{I}\left(u^{R}_{ele-1}, u^{L}_{ele}\right), \quad
u^{I,R}_{ele} = u^{I}\left(u^{R}_{ele}, u^{L}_{ele+1}\right), \tag{6.13}
\]

where $u^{I}(u^{-}, u^{+})$ is the interface solution function, $u^{-}$ and $u^{+}$ are the left and right states of the solution on each side of the interface and $u^{I,L}_{ele}$ and $u^{I,R}_{ele}$ are the common interface solutions on the left and right boundaries of the $ele$th element. The interface solution function is defined by the scheme used to formulate second-order numerical derivatives. In this case, the Local Discontinuous Galerkin (LDG) approach is chosen,

\[
u^{I}(u^{-}, u^{+}) = \frac{1}{2}(u^{-} + u^{+}) - \beta\,(u^{-} - u^{+}), \tag{6.14}
\]

where $\beta = \pm 0.5$ corresponds to one-sided common interface solutions which recover LDG [22] and $\beta = 0$ corresponds to centered common interface solutions which recover BR1 [14]. The common interface solutions at boundaries are computed directly using appropriate boundary conditions for the problem,

\[
u^{I,L}_{1} = u^{b}(u^{L}_{1}), \quad u^{I,R}_{N_{eles}} = u^{b}(u^{R}_{N_{eles}}), \tag{6.15}
\]

where ub(u) is the function which applies the boundary condition and u is the solution

extrapolated to the boundary. Note that ub(u) may be different depending on the

boundary condition.
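The effect of $\beta$ in Eq. (6.14) is easy to see numerically. The short Python sketch below (the function name `interface_solution` is hypothetical) shows that $\beta = 0.5$ selects the right state, $\beta = -0.5$ selects the left state and $\beta = 0$ gives the average.

```python
def interface_solution(u_minus, u_plus, beta):
    """Common interface solution of Eq. (6.14)."""
    return 0.5 * (u_minus + u_plus) - beta * (u_minus - u_plus)


u_one_sided_right = interface_solution(1.0, 3.0, beta=0.5)
u_one_sided_left = interface_solution(1.0, 3.0, beta=-0.5)
u_centered = interface_solution(1.0, 3.0, beta=0.0)
```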

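The next step is to construct a continuous solution $u^{C}_{ele}$ such that a piecewise sum results in a globally $C^0$ continuous solution $u^{C}$ that passes through the common interface solution values at element interfaces. This is accomplished by utilizing Lagrange interpolating polynomials of degree $P + 2$,

\[
u^{C}_{ele}(\xi) = u^{I,L}_{ele}\, \tilde{\ell}_{0}(\xi) + \sum_{spt=1}^{N_{spts1D}} u_{spt,ele}\, \tilde{\ell}_{spt}(\xi) + u^{I,R}_{ele}\, \tilde{\ell}_{P+2}(\xi), \tag{6.16}
\]

where $\tilde{\ell}_{0}(\xi), \tilde{\ell}_{1}(\xi), \ldots, \tilde{\ell}_{P+2}(\xi)$ are the Lagrange interpolating polynomials of degree $P + 2$ defined at the $P + 3$ collocation points $-1, \xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}, 1$. The derivative of $u^{C}_{ele}$ can now be obtained by differentiating Eq. (6.16) with respect to $\xi$,

\[
\frac{\partial u^{C}_{ele}}{\partial \xi}(\xi) = u^{I,L}_{ele} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi) + \sum_{spt=1}^{N_{spts1D}} u_{spt,ele} \frac{\partial \tilde{\ell}_{spt}}{\partial \xi}(\xi) + u^{I,R}_{ele} \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi). \tag{6.17}
\]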

This can be rewritten in vector format as,

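\[
\frac{\partial u^{C}_{ele}}{\partial \xi}(\xi) = u^{I,L}_{ele} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi) + \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)' \mathbf{u}_{ele} + u^{I,R}_{ele} \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \tag{6.18}
\]

where $\tilde{\boldsymbol{\ell}}(\xi)' = [\tilde{\ell}_{1}(\xi), \tilde{\ell}_{2}(\xi), \ldots, \tilde{\ell}_{N_{spts1D}}(\xi)]$. This equation can then be evaluated at each solution point to obtain the numerical derivative of $u_{ele}$,

\[
\frac{\delta u_{spt,ele}}{\delta \xi} = u^{I,L}_{ele} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi_{spt}) + \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi_{spt})' \mathbf{u}_{ele} + u^{I,R}_{ele} \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi_{spt}), \tag{6.19}
\]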

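where $spt = 1, 2, \ldots, N_{spts1D}$. This can be manipulated into a matrix-vector format so that,

\[
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} = \mathbf{D}^{L}_{\xi}\, u^{I,L}_{ele} + \mathbf{D}_{\xi}\, \mathbf{u}_{ele} + \mathbf{D}^{R}_{\xi}\, u^{I,R}_{ele}, \tag{6.20}
\]

where $\mathbf{D}_{\xi} \in \mathbb{R}^{N_{spts1D} \times N_{spts1D}}$ and $\mathbf{D}^{L}_{\xi}, \mathbf{D}^{R}_{\xi} \in \mathbb{R}^{N_{spts1D} \times 1}$ are polynomial differentiation operators such that

\[
\begin{aligned}
D_{\xi\, p,m} &= \frac{\partial \tilde{\ell}_{m}}{\partial \xi}(\xi_p), & p, m &= 1, 2, \ldots, N_{spts1D}, \\
D^{L}_{\xi\, p} &= \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi_p), & p &= 1, 2, \ldots, N_{spts1D}, \\
D^{R}_{\xi\, p} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi_p), & p &= 1, 2, \ldots, N_{spts1D}.
\end{aligned} \tag{6.21}
\]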

Coupling this with Eq. (6.8) concludes the evaluation of the numerical solution deriva-

tive in vector format,

\[
\mathbf{q}_{ele} = \frac{1}{|J_{ele}|} \frac{\delta \mathbf{u}_{ele}}{\delta \xi}. \tag{6.22}
\]
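The operators of Eq. (6.21) can be assembled directly from the derivative of the Lagrange basis on the $P + 3$ collocation points. The Python sketch below is a minimal illustration under that reading; the helper names are hypothetical and the $P = 1$ Gauss-Legendre points are hard-coded.

```python
import math


def lagrange_derivative(nodes, j, x):
    """Derivative at x of the Lagrange basis polynomial attached to nodes[j]."""
    total = 0.0
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        term = 1.0 / (nodes[j] - xk)
        for m, xm in enumerate(nodes):
            if m != j and m != k:
                term *= (x - xm) / (nodes[j] - xm)
        total += term
    return total


def dfr_operators(spts):
    """Return (D_L, D, D_R) of Eq. (6.21) for the 1D solution points spts."""
    ext = [-1.0] + list(spts) + [1.0]  # the P + 3 collocation points
    n = len(spts)
    d_left = [lagrange_derivative(ext, 0, xp) for xp in spts]
    d_right = [lagrange_derivative(ext, n + 1, xp) for xp in spts]
    d_mat = [[lagrange_derivative(ext, m + 1, xp) for m in range(n)] for xp in spts]
    return d_left, d_mat, d_right


def numerical_derivative(spts, u, u_il, u_ir):
    """Eq. (6.20): D_L * uI_L + D * u + D_R * uI_R at each solution point."""
    d_left, d_mat, d_right = dfr_operators(spts)
    n = len(spts)
    return [
        d_left[p] * u_il + sum(d_mat[p][m] * u[m] for m in range(n)) + d_right[p] * u_ir
        for p in range(n)
    ]


GL2 = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]  # Gauss-Legendre, P = 1
# u(xi) = xi**3 has degree P + 2, so the continuous interpolant reproduces it.
u = [xi ** 3 for xi in GL2]
du = numerical_derivative(GL2, u, u_il=-1.0, u_ir=1.0)
```

With interface values taken from the exact function, the result equals $3\xi^2 = 1$ at both solution points, which confirms that the operators differentiate polynomials of degree $P + 2$ exactly.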

The numerical derivative of fele can be evaluated in much the same way as the


numerical derivative of uele. The first step in this process is to extrapolate the dis-

continuous numerical solution derivative to element interfaces,

\[
q^{L}_{ele} = q_{ele}(-1) = \mathbf{E}^{L} \mathbf{q}_{ele}, \quad q^{R}_{ele} = q_{ele}(+1) = \mathbf{E}^{R} \mathbf{q}_{ele}. \tag{6.23}
\]

Once the discontinuous numerical solution derivative is extrapolated, the trans-

formed common interface fluxes can be computed,

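\[
\begin{aligned}
f^{I,L}_{ele} &= f^{I}\left(u^{R}_{ele-1}, u^{L}_{ele}, q^{R}_{ele-1}, q^{L}_{ele}\right), \\
f^{I,R}_{ele} &= f^{I}\left(u^{R}_{ele}, u^{L}_{ele+1}, q^{R}_{ele}, q^{L}_{ele+1}\right),
\end{aligned} \tag{6.24}
\]

where $f^{I}(u^{-}, u^{+}, q^{-}, q^{+}) = f^{I}_{adv}(u^{-}, u^{+}) - f^{I}_{diff}(u^{-}, u^{+}, q^{-}, q^{+})$ is the interface flux function. The Rusanov flux is commonly used as the interface advective flux function so that

\[
f^{I}_{adv}(u^{-}, u^{+}) = \frac{1}{2}\left(f_{adv}(u^{+}) + f_{adv}(u^{-})\right) - \frac{1}{2} |\lambda(u^{-}, u^{+})| (u^{+} - u^{-}), \tag{6.25}
\]

where

\[
|\lambda(u^{-}, u^{+})| = \max\left( \left| \frac{\partial f_{adv}}{\partial u}(u^{+}) \right|, \left| \frac{\partial f_{adv}}{\partial u}(u^{-}) \right| \right), \tag{6.26}
\]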

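and $\frac{\partial f_{adv}}{\partial u}(u)$ is the wavespeed or the derivative of the advective flux with respect to the solution. Continuing with the LDG formulation, the interface diffusive flux function can be defined as

\[
\begin{aligned}
f^{I}_{diff}(u^{-}, u^{+}, q^{-}, q^{+}) = {} & \frac{1}{2}\left(f_{diff}(u^{-}, q^{-}) + f_{diff}(u^{+}, q^{+})\right) \\
& + \beta \left(f_{diff}(u^{-}, q^{-}) - f_{diff}(u^{+}, q^{+})\right) + \tau\, (u^{-} - u^{+}).
\end{aligned} \tag{6.27}
\]

This leaves the transformed common interface fluxes at boundaries which can be computed as

\[
f^{I,L}_{1} = f^{b}(u^{L}_{1}, q^{L}_{1}), \quad f^{I,R}_{N_{eles}} = f^{b}(u^{R}_{N_{eles}}, q^{R}_{N_{eles}}), \tag{6.28}
\]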

where f b(u, q) is the function which applies the boundary condition and u and q are

the solution and solution derivative extrapolated to the boundary.
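A minimal Python sketch of the interface flux functions in Eqs. (6.25) to (6.27) follows. The linear fluxes $f_{adv} = a u$ and $f_{diff} = \nu q$ used in the example, and all function names, are assumptions made for illustration only.

```python
def rusanov_flux(u_minus, u_plus, f_adv, wavespeed):
    """Interface advective flux of Eqs. (6.25) and (6.26)."""
    lam = max(abs(wavespeed(u_plus)), abs(wavespeed(u_minus)))
    return 0.5 * (f_adv(u_plus) + f_adv(u_minus)) - 0.5 * lam * (u_plus - u_minus)


def ldg_diffusive_flux(u_minus, u_plus, q_minus, q_plus, f_diff, beta, tau):
    """Interface diffusive flux of Eq. (6.27)."""
    f_m = f_diff(u_minus, q_minus)
    f_p = f_diff(u_plus, q_plus)
    return 0.5 * (f_m + f_p) + beta * (f_m - f_p) + tau * (u_minus - u_plus)


# Assumed linear model problem: f_adv = a * u and f_diff = nu * q.
a, nu = 2.0, 0.1
f_advective = rusanov_flux(1.0, 3.0, lambda u: a * u, lambda u: a)
f_diffusive = ldg_diffusive_flux(1.0, 3.0, 1.0, 2.0, lambda u, q: nu * q, beta=0.5, tau=0.0)
f_interface = f_advective - f_diffusive
```

For a positive advection speed the Rusanov flux reduces to the upwind value $a u^{-}$, and with $\beta = 0.5$ and $\tau = 0$ the diffusive flux is taken entirely from the left state, the side opposite to the one selected for the common solution in Eq. (6.14).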


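The transformed continuous flux $f^{C}_{ele}$ can be formed and differentiated in much the same way as the continuous solution so that the numerical derivative of $f_{ele}$ can be computed in matrix-vector format,

\[
\frac{\delta \mathbf{f}_{ele}}{\delta \xi} = \mathbf{D}^{L}_{\xi}\, f^{I,L}_{ele} + \mathbf{D}_{\xi}\, \mathbf{f}_{ele} + \mathbf{D}^{R}_{\xi}\, f^{I,R}_{ele}, \tag{6.29}
\]

where $\mathbf{f}_{ele} = f(\mathbf{u}_{ele}, \mathbf{q}_{ele}) = f_{adv}(\mathbf{u}_{ele}) - f_{diff}(\mathbf{u}_{ele}, \mathbf{q}_{ele})$ for all elements. This is coupled with Eq. (6.7) to obtain the semi-discrete equation in vector format,

\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \frac{\delta \mathbf{f}_{ele}}{\delta \xi}. \tag{6.30}
\]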
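Putting the steps of this section together, the right-hand side of Eq. (6.30) can be sketched for pure linear advection ($f = a u$, no diffusive flux) on a periodic mesh. This is an illustrative Python sketch, not the ZEFR implementation; the periodic boundary treatment and all names are assumptions.

```python
import math


def basis(nodes, j, x):
    """Lagrange basis polynomial attached to nodes[j], evaluated at x."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            value *= (x - xk) / (nodes[j] - xk)
    return value


def basis_derivative(nodes, j, x):
    """Derivative at x of the Lagrange basis polynomial attached to nodes[j]."""
    total = 0.0
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        term = 1.0 / (nodes[j] - xk)
        for m, xm in enumerate(nodes):
            if m != j and m != k:
                term *= (x - xm) / (nodes[j] - xm)
        total += term
    return total


def rusanov(u_minus, u_plus, a):
    """Eq. (6.25) for f_adv = a * u, whose wavespeed is the constant a."""
    return 0.5 * a * (u_plus + u_minus) - 0.5 * abs(a) * (u_plus - u_minus)


def advection_residual(u, x_edges, spts, a):
    """Right-hand side of Eq. (6.30); u[ele][spt] holds the nodal values."""
    n_eles, n = len(u), len(spts)
    ext = [-1.0] + list(spts) + [1.0]  # the P + 3 collocation points
    u_left = [sum(basis(spts, j, -1.0) * u[e][j] for j in range(n)) for e in range(n_eles)]
    u_right = [sum(basis(spts, j, 1.0) * u[e][j] for j in range(n)) for e in range(n_eles)]
    residual = []
    for e in range(n_eles):
        jac = 0.5 * (x_edges[e + 1] - x_edges[e])
        # Periodic neighbours: index -1 wraps to the last element in Python.
        f_il = rusanov(u_right[e - 1], u_left[e], a)
        f_ir = rusanov(u_right[e], u_left[(e + 1) % n_eles], a)
        row = []
        for xp in spts:
            dfdxi = f_il * basis_derivative(ext, 0, xp)
            dfdxi += sum(a * u[e][m] * basis_derivative(ext, m + 1, xp) for m in range(n))
            dfdxi += f_ir * basis_derivative(ext, n + 1, xp)
            row.append(-dfdxi / jac)
        residual.append(row)
    return residual


# P = 1 Gauss-Legendre points and four equal elements on [0, 1].
spts = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]
x_edges = [0.0, 0.25, 0.5, 0.75, 1.0]
# Nodal values of u(x) = x, using the mapping of Eq. (6.5).
u_linear = [
    [0.5 * (1 - s) * x_edges[e] + 0.5 * (1 + s) * x_edges[e + 1] for s in spts]
    for e in range(4)
]
res_linear = advection_residual(u_linear, x_edges, spts, a=1.0)
res_const = advection_residual([[1.0, 1.0]] * 4, x_edges, spts, a=1.0)
```

For a constant state the residual vanishes everywhere, and for $u = x$ the two interior elements return $-a\,\partial u/\partial x = -1$; the first and last elements see the artificial jump introduced by the periodic wrap.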

6.2 Two-Dimensional Formulation for Advection-Diffusion

The DFR method, along with other schemes within the Flux Reconstruction (FR)

family, can be directly extended to quadrilateral elements using a tensor-product

formulation [41, 42]. While the cited references describe the methodology in the

context of the standard FR method, the same procedure can be applied to the DFR

method by simply replacing the FR correction procedure using correction polynomials

with the Lagrange interpolation described above. A summary of the procedure is described below for the two-dimensional advection-diffusion equation.

6.2.1 Problem Specification


Consider the conservative form of the two-dimensional advection-diffusion equation defined on a bounded domain and subject to appropriate initial and boundary conditions,

\[
\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{f} = 0, \quad (x, y) \in \Omega, \quad t > 0, \tag{6.31}
\]

where $\Omega$ is an arbitrary domain closed by an arbitrary boundary $\partial\Omega$, $x$ and $y$ are spatial coordinates, $t$ is time, $u = u(x, y, t)$ is a conserved scalar quantity and $\mathbf{f} = (f_x, f_y) = \left(f_{x,adv}(u) - f_{x,diff}(u, \nabla u),\; f_{y,adv}(u) - f_{y,diff}(u, \nabla u)\right)$ is the difference between the advective and diffusive fluxes in the $x$ and $y$ directions. This can be rewritten as a system of first-order equations,

\[
\begin{aligned}
\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{f}(u, \mathbf{q}) &= 0, \quad (x, y) \in \Omega, \quad t > 0, \\
\mathbf{q} - \nabla u &= 0,
\end{aligned} \tag{6.32}
\]

where $\mathbf{q} = (q_x, q_y)$ is the solution gradient.

The domain is partitioned into Neles non-overlapping, conforming quadrilateral

elements,

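\[
\Omega = \bigcup_{ele=1}^{N_{eles}} \Omega_{ele}. \tag{6.33}
\]

Each quadrilateral element in the physical domain $(x, y)$ is mapped to a reference element, $\Omega_S = \{(\xi, \eta) \mid -1 \le \xi, \eta < 1\}$, in the transformed parent space $(\xi, \eta)$ so that,

\[
\begin{pmatrix} x \\ y \end{pmatrix} = \Gamma_{ele}(\xi, \eta) = \sum_{npt=1}^{N_{npts}} M_{npt}(\xi, \eta) \begin{pmatrix} x_{npt,ele} \\ y_{npt,ele} \end{pmatrix}, \tag{6.34}
\]

where $M_{npt}(\xi, \eta)$ are the element shape functions and $N_{npts}$ is the number of points used to define the physical space element.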

Applying this transformation gives rise to a transformed system of equations of

the following form,

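\[
\frac{\partial \hat{u}^{\delta}_{ele}}{\partial t} + \hat{\nabla} \cdot \hat{\mathbf{f}}^{\delta}_{ele} = 0, \tag{6.35}
\]

\[
\hat{\mathbf{q}}^{\delta}_{ele} - \hat{\nabla} \hat{u}^{\delta}_{ele} = 0, \tag{6.36}
\]

where $\hat{\nabla} = \left(\frac{\partial}{\partial \xi}, \frac{\partial}{\partial \eta}\right)$, $\hat{\mathbf{q}}^{\delta}_{ele} = \left(\hat{q}^{\delta}_{\xi,ele}, \hat{q}^{\delta}_{\eta,ele}\right)$, $\hat{\mathbf{f}}^{\delta}_{ele} = \left(\hat{f}^{\delta}_{\xi,ele}, \hat{f}^{\delta}_{\eta,ele}\right)$ and

\[
\begin{aligned}
\hat{u}^{\delta}_{ele} &= |J_{ele}|\, u^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta), t), \\
\hat{\mathbf{q}}^{\delta}_{ele} &= J'_{ele}\, \mathbf{q}^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta), t), \\
\hat{\mathbf{f}}^{\delta}_{ele} &= |J_{ele}|\, J^{-1}_{ele}\, \mathbf{f}^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta), t).
\end{aligned}
\]

The Jacobian matrix of the element shape functions, $J_{ele}$, its transpose, $J'_{ele}$, inverse, $J^{-1}_{ele}$, and determinant, $|J_{ele}|$, come directly from Eq. (6.34). In what follows, the superscript $\delta$, which distinguishes the numerical solution, solution gradient and fluxes from their exact values, is dropped to improve readability.

6.2.2 Direct Flux Reconstruction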


Consider rearranging Eq. (6.35) and producing the semi-discrete form of the two-

dimensional advection-diffusion equation as a system of equations,

\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \hat{\nabla}^{\delta} \cdot \hat{\mathbf{f}}_{ele}, \tag{6.37}
\]

\[
\mathbf{q}_{ele} = J^{-1\prime}_{ele}\, \hat{\nabla}^{\delta} u_{ele}, \tag{6.38}
\]

where $\hat{\nabla}^{\delta} = \left(\frac{\delta}{\delta \xi}, \frac{\delta}{\delta \eta}\right)$ and $J^{-1\prime}_{ele}$ is the transpose of the inverse Jacobian matrix. The DFR method for 2D quadrilateral elements is a direct extension of the 1D method where numerical derivatives are computed along each set of collinear points. The first step is to further discretize each quadrilateral element by $N_{spts2D} = (P + 1)^2$ distinct solution points generated through a tensor product of a set of 1D solution points. Each solution point is defined by the Gauss-Legendre sets $\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}$ and $\eta_1, \eta_2, \ldots, \eta_{N_{spts1D}}$ where the points in the former set are chosen as the most rapidly changing index. The discontinuous solution in each element $u_{ele}$ can be represented by a product of piecewise interpolating polynomials of degree $P$,

\[
u_{ele}(\xi, \eta) = \sum_{spt=1}^{N_{spts2D}} u_{spt,ele}\, \phi_{spt}(\xi, \eta), \tag{6.39}
\]

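where $\phi_1(\xi, \eta), \phi_2(\xi, \eta), \ldots, \phi_{N_{spts2D}}(\xi, \eta)$ are the shape functions constructed from 1D Lagrange basis polynomials defined at the location of the $spt$th solution point. This can be written in vector format as,

\[
u_{ele}(\xi, \eta) = \boldsymbol{\phi}(\xi, \eta)' \mathbf{u}_{ele}, \tag{6.40}
\]

where $\mathbf{u}_{ele} = [u_{1,ele}, u_{2,ele}, \ldots, u_{N_{spts2D},ele}]'$ and the vector $\boldsymbol{\phi}(\xi, \eta)' \in \mathbb{R}^{1 \times N_{spts2D}}$ is defined by the Kronecker product between the vectors $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$ and $\boldsymbol{\ell}(\eta)' = [\ell_1(\eta), \ell_2(\eta), \ldots, \ell_{N_{spts1D}}(\eta)]$,

\[
\boldsymbol{\phi}(\xi, \eta)' = \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)'. \tag{6.41}
\]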
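The ordering implied by Eq. (6.41) can be checked with a small Python sketch. Placing $\boldsymbol{\ell}(\eta)'$ on the left of the Kronecker product makes the $\xi$ index vary fastest, which matches the solution point ordering described above. The helper names are hypothetical.

```python
import math


def basis(nodes, j, x):
    """Lagrange basis polynomial attached to nodes[j], evaluated at x."""
    value = 1.0
    for k, xk in enumerate(nodes):
        if k != j:
            value *= (x - xk) / (nodes[j] - xk)
    return value


def kron(a, b):
    """Kronecker product of two row vectors stored as lists."""
    return [ai * bj for ai in a for bj in b]


def interpolate_2d(spts, u_ele, xi, eta):
    """Evaluate Eq. (6.40) with phi' = l(eta)' kron l(xi)' from Eq. (6.41)."""
    ell_xi = [basis(spts, j, xi) for j in range(len(spts))]
    ell_eta = [basis(spts, j, eta) for j in range(len(spts))]
    phi = kron(ell_eta, ell_xi)  # xi is the most rapidly changing index
    return sum(p * u for p, u in zip(phi, u_ele))


spts = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]  # Gauss-Legendre, P = 1
# Nodal values of u = xi * eta, stored with xi varying fastest.
u_ele = [xi * eta for eta in spts for xi in spts]
value = interpolate_2d(spts, u_ele, 0.3, -0.2)
```

Because $u = \xi\eta$ is bilinear, the $P = 1$ tensor-product interpolant reproduces it, and `value` equals $0.3 \times (-0.2) = -0.06$ up to round-off.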

The next step is to extrapolate the discontinuous solution to $N_{spts1D} = P + 1$ distinct flux points on each of the $N_{faces2D} = 4$ faces of a quadrilateral element for a total of $N_{fpts2D} = N_{faces2D}\, N_{spts1D}$ flux points per quadrilateral. Using Eq. (6.40), the extrapolated values in each element and face are written as,

\[
\mathbf{u}^{face}_{ele} = \mathbf{E}^{face}\, \mathbf{u}_{ele}, \quad face \in \{L, R, B, T\}, \tag{6.42}
\]

where $\mathbf{u}^{L}_{ele}, \mathbf{u}^{R}_{ele}, \mathbf{u}^{B}_{ele}, \mathbf{u}^{T}_{ele} \in \mathbb{R}^{N_{spts1D} \times 1}$ are the extrapolated discontinuous solution vectors on the left, right, bottom and top boundaries of the $ele$th element, respectively, as shown in Figure 6.1. $\mathbf{E}^{L}, \mathbf{E}^{R}, \mathbf{E}^{B}, \mathbf{E}^{T} \in \mathbb{R}^{N_{spts1D} \times N_{spts2D}}$ are polynomial extrapolation operators such that

\[
\begin{aligned}
E^{L}_{p,m} &= \phi_m(-1, \eta_p), & E^{R}_{p,m} &= \phi_m(+1, \eta_p), \\
E^{B}_{p,m} &= \phi_m(\xi_p, -1), & E^{T}_{p,m} &= \phi_m(\xi_p, +1),
\end{aligned} \tag{6.43}
\]

where $p = 1, 2, \ldots, N_{spts1D}$ and $m = 1, 2, \ldots, N_{spts2D}$.

The extrapolated solution can now be used to compute the common interface

solutions,

\[
\begin{aligned}
\mathbf{u}^{I,face}_{ele} &= u^{I}\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}\right), & face &\in \{L, B\}, \\
\mathbf{u}^{I,face}_{ele} &= u^{I}\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}\right), & face &\in \{R, T\},
\end{aligned} \tag{6.44}
\]

where “eleN” and “faceN” refer to the neighboring element and face, respectively, and the neighboring vector $\mathbf{u}^{faceN}_{eleN}$ is permuted appropriately to align with $\mathbf{u}^{face}_{ele}$.

Figure 6.1: A visual representation of a quadrilateral element in parent space for a polynomial order of P = 1. The solution points are marked by blue circles, the flux points are marked by red squares and left, right, bottom and top faces are represented by L, R, B, T, respectively. Left and right states in an interface flux are represented by − and +.

The LDG method [35] is used for the interface solution function,

\[
u^{I}(u^{-}, u^{+}) = \frac{1}{2}(u^{-} + u^{+}) - \boldsymbol{\beta} \cdot (u^{-} \mathbf{n}^{-} + u^{+} \mathbf{n}^{+}), \tag{6.45}
\]

where $\mathbf{n}^{-}$ and $\mathbf{n}^{+}$ are unit normals, $\boldsymbol{\beta} = \pm 0.5\, \mathbf{n}^{-}$ recovers LDG [22] and $\boldsymbol{\beta} = 0$ recovers BR1 [14]. The common interface solutions at boundaries are computed using the same technique as in Eq. (6.15),

\[
\mathbf{u}^{I,\partial\Omega}_{ele} = u^{b}\left(\mathbf{u}^{\partial\Omega}_{ele}\right), \tag{6.46}
\]

where $\mathbf{u}^{\partial\Omega}_{ele} = \mathbf{E}^{\partial\Omega}\, \mathbf{u}_{ele}$ and $\partial\Omega$ represents the appropriate face for the boundary.

Following the 1D formulation in section 6.1.2 along each set of collinear points in

the quadrilateral, the continuous solution can now be constructed and differentiated


by using the common interface solution values at the element interfaces,

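\[
\begin{aligned}
\frac{\partial u^{C}_{ele}}{\partial \xi}(\xi, \eta) &= \sum_{spt=1}^{N_{spts2D}} u_{spt,ele} \frac{\partial \psi_{spt}}{\partial \xi}(\xi, \eta) + \sum_{face \in \{L, R\}} \sum_{fpt=1}^{N_{spts1D}} u^{I,face}_{fpt,ele} \frac{\partial \psi^{face}_{fpt}}{\partial \xi}(\xi, \eta), \\
\frac{\partial u^{C}_{ele}}{\partial \eta}(\xi, \eta) &= \sum_{spt=1}^{N_{spts2D}} u_{spt,ele} \frac{\partial \psi_{spt}}{\partial \eta}(\xi, \eta) + \sum_{face \in \{B, T\}} \sum_{fpt=1}^{N_{spts1D}} u^{I,face}_{fpt,ele} \frac{\partial \psi^{face}_{fpt}}{\partial \eta}(\xi, \eta),
\end{aligned} \tag{6.47}
\]

where new sets of shape functions, $\psi(\xi, \eta)$, have been constructed from sets of 1D Lagrange interpolating polynomials of degree $P + 2$ defined at $P + 3$ collocation points. The gradient can be written in vector format as

\[
\begin{aligned}
\frac{\partial u^{C}_{ele}}{\partial \xi}(\xi, \eta) &= \frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta)'\, \mathbf{u}_{ele} + \sum_{face \in \{L, R\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \xi}(\xi, \eta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^{C}_{ele}}{\partial \eta}(\xi, \eta) &= \frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta)'\, \mathbf{u}_{ele} + \sum_{face \in \{B, T\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \eta}(\xi, \eta)'\, \mathbf{u}^{I,face}_{ele}.
\end{aligned} \tag{6.48}
\]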

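The polynomials $\tilde{\ell}_{0}(\xi)$, $\tilde{\ell}_{0}(\eta)$, $\tilde{\ell}_{P+2}(\xi)$, $\tilde{\ell}_{P+2}(\eta)$ are used along with the vectors $\boldsymbol{\ell}(\xi)'$, $\boldsymbol{\ell}(\eta)'$, $\tilde{\boldsymbol{\ell}}(\xi)' = [\tilde{\ell}_{1}(\xi), \tilde{\ell}_{2}(\xi), \ldots, \tilde{\ell}_{N_{spts1D}}(\xi)]$ and $\tilde{\boldsymbol{\ell}}(\eta)' = [\tilde{\ell}_{1}(\eta), \tilde{\ell}_{2}(\eta), \ldots, \tilde{\ell}_{N_{spts1D}}(\eta)]$, to construct the shape functions,

\[
\begin{aligned}
\frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta)' &= \boldsymbol{\ell}(\eta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)', \\
\frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta)' &= \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \eta}(\eta)' \otimes \boldsymbol{\ell}(\xi)',
\end{aligned} \tag{6.49}
\]

and

\[
\begin{aligned}
\frac{\partial \psi^{L}_{m}}{\partial \xi}(\xi, \eta) &= \ell_m(\eta)\, \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), &
\frac{\partial \psi^{R}_{m}}{\partial \xi}(\xi, \eta) &= \ell_m(\eta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
\frac{\partial \psi^{B}_{m}}{\partial \eta}(\xi, \eta) &= \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta)\, \ell_m(\xi), &
\frac{\partial \psi^{T}_{m}}{\partial \eta}(\xi, \eta) &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta)\, \ell_m(\xi),
\end{aligned} \tag{6.50}
\]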

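for $m = 1, 2, \ldots, N_{spts1D}$. The numerical solution gradient in reference space can then be computed by evaluating the equation at each solution point. The final result is shown in matrix-vector format,

\[
\begin{aligned}
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} &= \mathbf{D}^{L}_{\xi}\, \mathbf{u}^{I,L}_{ele} + \mathbf{D}_{\xi}\, \mathbf{u}_{ele} + \mathbf{D}^{R}_{\xi}\, \mathbf{u}^{I,R}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \eta} &= \mathbf{D}^{B}_{\eta}\, \mathbf{u}^{I,B}_{ele} + \mathbf{D}_{\eta}\, \mathbf{u}_{ele} + \mathbf{D}^{T}_{\eta}\, \mathbf{u}^{I,T}_{ele},
\end{aligned} \tag{6.51}
\]

where the polynomial differentiation operators $\mathbf{D}_{\xi}, \mathbf{D}_{\eta} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ and $\mathbf{D}^{L}_{\xi}, \mathbf{D}^{R}_{\xi}, \mathbf{D}^{B}_{\eta}, \mathbf{D}^{T}_{\eta} \in \mathbb{R}^{N_{spts2D} \times N_{spts1D}}$ are defined as

\[
D_{\xi\, p,m} = \frac{\partial \psi_m}{\partial \xi}(\theta_p), \quad D_{\eta\, p,m} = \frac{\partial \psi_m}{\partial \eta}(\theta_p), \tag{6.52}
\]

for $p, m = 1, 2, \ldots, N_{spts2D}$,

\[
\begin{aligned}
D^{face}_{\xi\, p,m} &= \frac{\partial \psi^{face}_{m}}{\partial \xi}(\theta_p), & face &\in \{L, R\}, \\
D^{face}_{\eta\, p,m} &= \frac{\partial \psi^{face}_{m}}{\partial \eta}(\theta_p), & face &\in \{B, T\},
\end{aligned} \tag{6.53}
\]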

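for $p = 1, 2, \ldots, N_{spts2D}$ and $m = 1, 2, \ldots, N_{spts1D}$, and the set $\theta_1, \theta_2, \ldots, \theta_{N_{spts2D}}$ is used to represent the set of solution points $(\xi_1, \eta_1), (\xi_2, \eta_1), \ldots, (\xi_{N_{spts1D}}, \eta_{N_{spts1D}})$. Eq. (6.51) can now be combined with Eq. (6.38) to obtain the numerical solution gradient in matrix-vector format,

\[
\begin{aligned}
\mathbf{q}_{x,ele} &= \mathbf{J}^{-1\prime}_{(x,\xi),ele} \frac{\delta \mathbf{u}_{ele}}{\delta \xi} + \mathbf{J}^{-1\prime}_{(x,\eta),ele} \frac{\delta \mathbf{u}_{ele}}{\delta \eta}, \\
\mathbf{q}_{y,ele} &= \mathbf{J}^{-1\prime}_{(y,\xi),ele} \frac{\delta \mathbf{u}_{ele}}{\delta \xi} + \mathbf{J}^{-1\prime}_{(y,\eta),ele} \frac{\delta \mathbf{u}_{ele}}{\delta \eta},
\end{aligned} \tag{6.54}
\]

where $\mathbf{J}^{-1\prime}_{(x,\xi),ele}, \mathbf{J}^{-1\prime}_{(x,\eta),ele}, \mathbf{J}^{-1\prime}_{(y,\xi),ele}, \mathbf{J}^{-1\prime}_{(y,\eta),ele} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ are the components of $J^{-1\prime}_{ele}$ represented as diagonal matrices. This operation can be summarized as the block matrix-vector multiplication,

\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \hat{\nabla}^{\delta} \mathbf{u}_{ele}. \tag{6.55}
\]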
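Evaluating Eqs. (6.49) and (6.50) at the tensor-product solution points, where $\ell_m(\eta_b) = \delta_{mb}$, suggests that the 2D operators are Kronecker products of the 1D operators with an identity matrix. The Python sketch below checks this reading for $P = 1$; it is an illustration under that assumption, with hypothetical helper names, and is not the author's implementation.

```python
import math


def basis_derivative(nodes, j, x):
    """Derivative at x of the Lagrange basis polynomial attached to nodes[j]."""
    total = 0.0
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        term = 1.0 / (nodes[j] - xk)
        for m, xm in enumerate(nodes):
            if m != j and m != k:
                term *= (x - xm) / (nodes[j] - xm)
        total += term
    return total


def kron(a, b):
    """Kronecker product of two matrices stored as lists of rows."""
    return [[x * y for x in row_a for y in row_b] for row_a in a for row_b in b]


def matvec(a, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in a]


spts = [-1.0 / math.sqrt(3.0), 1.0 / math.sqrt(3.0)]  # Gauss-Legendre, P = 1
n = len(spts)
ext = [-1.0] + spts + [1.0]  # the P + 3 collocation points
eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
d1 = [[basis_derivative(ext, m + 1, xp) for m in range(n)] for xp in spts]
d1_left = [[basis_derivative(ext, 0, xp)] for xp in spts]        # column vector
d1_right = [[basis_derivative(ext, n + 1, xp)] for xp in spts]   # column vector

D_xi = kron(eye, d1)          # size (n*n) x (n*n), acts along xi lines
D_eta = kron(d1, eye)         # size (n*n) x (n*n), acts along eta lines
D_xi_L = kron(eye, d1_left)   # size (n*n) x n
D_xi_R = kron(eye, d1_right)  # size (n*n) x n

# u = xi**3 * eta, stored with xi varying fastest; exact interface values.
u = [xi ** 3 * eta for eta in spts for xi in spts]
u_I_L = [-eta for eta in spts]   # u(-1, eta)
u_I_R = [eta for eta in spts]    # u(+1, eta)
du_dxi = [
    a + b + c
    for a, b, c in zip(matvec(D_xi_L, u_I_L), matvec(D_xi, u), matvec(D_xi_R, u_I_R))
]
expected = [3.0 * xi ** 2 * eta for eta in spts for xi in spts]
```

The computed `du_dxi` matches $\partial u / \partial \xi = 3\xi^2\eta$ at every solution point, since $u$ has degree $P + 2$ along each $\xi$ line. The $\eta$ derivative follows the same pattern with `D_eta` and the bottom and top interface values.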


The numerical gradient of the flux is evaluated by first extrapolating the discon-

tinuous numerical solution gradient to element interfaces,

\[
\mathbf{q}^{face}_{d,ele} = \mathbf{E}^{face}\, \mathbf{q}_{d,ele}, \quad face \in \{L, R, B, T\}, \quad d \in \{x, y\}, \tag{6.56}
\]

where each numerical solution derivative, $\mathbf{q}_{ele} = (\mathbf{q}_{x,ele}, \mathbf{q}_{y,ele})$, is extrapolated independently. The transformed common interface fluxes that are normal to the element faces are then computed by using the extrapolated discontinuous solution and solution gradient on both sides of each interface as the left and right states in a common interface function. The transformed common interface fluxes are written as

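\[
\begin{aligned}
\hat{\mathbf{f}}^{I,face}_{d,ele} &= \mathbf{A}^{face}_{ele}\, f^{I}\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}\right), & (d, face) &\in \{(\xi, L), (\eta, B)\}, \\
\hat{\mathbf{f}}^{I,face}_{d,ele} &= \mathbf{A}^{face}_{ele}\, f^{I}\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}\right), & (d, face) &\in \{(\xi, R), (\eta, T)\},
\end{aligned} \tag{6.57}
\]

where $\mathbf{A}^{L}_{ele}, \mathbf{A}^{R}_{ele}, \mathbf{A}^{B}_{ele}, \mathbf{A}^{T}_{ele} \in \mathbb{R}^{N_{spts1D} \times N_{spts1D}}$ are diagonal matrices that transform the common interface fluxes. The interface transformation matrices are defined as

\[
A^{face}_{p,p,ele} = \left| J^{face}_{p,ele} \right| \left| \left(J^{face}_{p,ele}\right)^{-1\prime} \mathbf{n}^{face} \right|, \quad face \in \{L, R, B, T\}, \tag{6.58}
\]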

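where $J^{L}_{p,ele}, J^{R}_{p,ele}, J^{B}_{p,ele}, J^{T}_{p,ele}$ are the geometric element Jacobian matrices evaluated at the $p$th flux point, $\mathbf{n}^{L}, \mathbf{n}^{R}, \mathbf{n}^{B}, \mathbf{n}^{T}$ are the unit normals in parent space and $p = 1, 2, \ldots, N_{spts1D}$. The Rusanov flux used for the common interface function from Eq. (6.25) now becomes

\[
f^{I}_{adv}(u^{-}, u^{+}) = \frac{1}{2}\left(f^{n}_{adv}(u^{+}) + f^{n}_{adv}(u^{-})\right) - \frac{1}{2} |\lambda(u^{-}, u^{+})| (u^{+} - u^{-}), \tag{6.59}
\]

where $f^{n}_{adv}(u) = \mathbf{f}_{adv}(u) \cdot \mathbf{n}^{-}$ is the advective flux normal to the face and

\[
|\lambda(u^{-}, u^{+})| = \max\left( \left| \frac{\partial f^{n}_{adv}}{\partial u}(u^{+}) \right|, \left| \frac{\partial f^{n}_{adv}}{\partial u}(u^{-}) \right| \right), \tag{6.60}
\]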

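where $\frac{\partial f^{n}_{adv}}{\partial u}(u)$ is the wavespeed of the normal advective flux or the derivative of the normal advective flux with respect to the solution. For the interface diffusive flux function, the LDG flux can be reduced to the following form

\[
\begin{aligned}
f^{I}_{diff}(u^{-}, u^{+}, \mathbf{q}^{-}, \mathbf{q}^{+}) = {} & \frac{1}{2}\left(f^{n}_{diff}(u^{-}, \mathbf{q}^{-}) + f^{n}_{diff}(u^{+}, \mathbf{q}^{+})\right) \\
& + \boldsymbol{\beta} \cdot \mathbf{n}^{-} \left(\mathbf{f}_{diff}(u^{-}, \mathbf{q}^{-}) \cdot \mathbf{n}^{-} + \mathbf{f}_{diff}(u^{+}, \mathbf{q}^{+}) \cdot \mathbf{n}^{+}\right) \\
& + \tau\, (u^{-} - u^{+}),
\end{aligned} \tag{6.61}
\]

where $f^{n}_{diff}(u, \mathbf{q}) = \mathbf{f}_{diff}(u, \mathbf{q}) \cdot \mathbf{n}^{-}$ is the diffusive flux normal to the face. Lastly, the transformed common interface fluxes at boundaries are computed as,

\[
\hat{\mathbf{f}}^{I,\partial\Omega}_{ele} = \mathbf{A}^{\partial\Omega}_{ele}\, f^{b}\left(\mathbf{u}^{\partial\Omega}_{ele}, \mathbf{q}^{\partial\Omega}_{ele}\right), \tag{6.62}
\]

where $\mathbf{q}^{\partial\Omega}_{ele} = \mathbf{E}^{\partial\Omega}\, \mathbf{q}_{ele}$ for each dimension and $\partial\Omega$ represents the appropriate face for the boundary.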

Using each set of collinear points in the quadrilateral, the transformed continu-

ous fluxes are constructed in each element and differentiated to form the numerical

derivatives in matrix-vector format,

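\[
\begin{aligned}
\frac{\delta \hat{\mathbf{f}}_{\xi,ele}}{\delta \xi} &= \mathbf{D}^{L}_{\xi}\, \hat{\mathbf{f}}^{I,L}_{\xi,ele} + \mathbf{D}_{\xi}\, \hat{\mathbf{f}}_{\xi,ele} + \mathbf{D}^{R}_{\xi}\, \hat{\mathbf{f}}^{I,R}_{\xi,ele}, \\
\frac{\delta \hat{\mathbf{f}}_{\eta,ele}}{\delta \eta} &= \mathbf{D}^{B}_{\eta}\, \hat{\mathbf{f}}^{I,B}_{\eta,ele} + \mathbf{D}_{\eta}\, \hat{\mathbf{f}}_{\eta,ele} + \mathbf{D}^{T}_{\eta}\, \hat{\mathbf{f}}^{I,T}_{\eta,ele},
\end{aligned} \tag{6.63}
\]

where

\[
\begin{aligned}
\hat{\mathbf{f}}_{\xi,ele} &= \mathbf{J}^{-1}_{(\xi,x),ele}\, f_x(\mathbf{u}_{ele}, \mathbf{q}_{ele}) + \mathbf{J}^{-1}_{(\xi,y),ele}\, f_y(\mathbf{u}_{ele}, \mathbf{q}_{ele}), \\
\hat{\mathbf{f}}_{\eta,ele} &= \mathbf{J}^{-1}_{(\eta,x),ele}\, f_x(\mathbf{u}_{ele}, \mathbf{q}_{ele}) + \mathbf{J}^{-1}_{(\eta,y),ele}\, f_y(\mathbf{u}_{ele}, \mathbf{q}_{ele}),
\end{aligned} \tag{6.64}
\]

and $\mathbf{J}^{-1}_{(\xi,x),ele}, \mathbf{J}^{-1}_{(\xi,y),ele}, \mathbf{J}^{-1}_{(\eta,x),ele}, \mathbf{J}^{-1}_{(\eta,y),ele} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ are the components of $|J_{ele}|\, J^{-1}_{ele}$ represented as diagonal matrices. This can be expressed more compactly in block matrix-vector form as

\[
\hat{\mathbf{f}}_{ele} = \mathbf{J}^{-1}_{ele}\, \mathbf{f}(\mathbf{u}_{ele}, \mathbf{q}_{ele}). \tag{6.65}
\]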


Eq. (6.37) can now be written in a matrix-vector format as

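\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\mathbf{V}^{-1}_{ele}\, \hat{\nabla}^{\delta} \cdot \hat{\mathbf{f}}_{ele}, \tag{6.66}
\]

where $\mathbf{V}^{-1}_{ele} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ is a diagonal matrix and is defined as,

\[
V^{-1}_{p,p,ele} = \frac{1}{|J_{p,ele}|}, \quad p = 1, 2, \ldots, N_{spts2D}. \tag{6.67}
\]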

6.3 Three-Dimensional Formulation for Advection-Diffusion

Much like the two-dimensional formulation, the three-dimensional formulation of the

DFR method on hexahedral elements for advection-diffusion is a direct extension of

the one-dimensional formulation. This section serves to summarize these steps for

clarity.


Following directly from the definitions provided in the problem specification in section 6.2.1, the domain is extended to include the z-dimension $(x, y, z) \in \Omega$ so that $u = u(x, y, z, t)$, $\mathbf{q} = (q_x, q_y, q_z)$, and $\mathbf{f} = (f_x, f_y, f_z)$. Each hexahedral element in the partitioned physical domain $(x, y, z)$ is then mapped to a reference element in the transformed parent space $(\xi, \eta, \zeta)$ so that,

\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \Gamma_{ele}(\xi, \eta, \zeta) = \sum_{npt=1}^{N_{npts}} M_{npt}(\xi, \eta, \zeta) \begin{pmatrix} x_{npt,ele} \\ y_{npt,ele} \\ z_{npt,ele} \end{pmatrix}, \tag{6.68}
\]

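where $M_{npt}(\xi, \eta, \zeta)$ are the element shape functions and $N_{npts}$ is the number of points used to define the physical space element. The system of equations resulting from applying this transformation to the three-dimensional advection-diffusion equation becomes

\[
\frac{\partial \hat{u}^{\delta}_{ele}}{\partial t} + \hat{\nabla} \cdot \hat{\mathbf{f}}^{\delta}_{ele} = 0, \tag{6.69}
\]

\[
\hat{\mathbf{q}}^{\delta}_{ele} - \hat{\nabla} \hat{u}^{\delta}_{ele} = 0, \tag{6.70}
\]

where $\hat{\nabla} = \left(\frac{\partial}{\partial \xi}, \frac{\partial}{\partial \eta}, \frac{\partial}{\partial \zeta}\right)$, $\hat{\mathbf{q}}^{\delta}_{ele} = \left(\hat{q}^{\delta}_{\xi,ele}, \hat{q}^{\delta}_{\eta,ele}, \hat{q}^{\delta}_{\zeta,ele}\right)$, $\hat{\mathbf{f}}^{\delta}_{ele} = \left(\hat{f}^{\delta}_{\xi,ele}, \hat{f}^{\delta}_{\eta,ele}, \hat{f}^{\delta}_{\zeta,ele}\right)$ and

\[
\begin{aligned}
\hat{u}^{\delta}_{ele} &= |J_{ele}|\, u^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t), \\
\hat{\mathbf{q}}^{\delta}_{ele} &= J'_{ele}\, \mathbf{q}^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t), \\
\hat{\mathbf{f}}^{\delta}_{ele} &= |J_{ele}|\, J^{-1}_{ele}\, \mathbf{f}^{\delta}_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t).
\end{aligned}
\]

The Jacobian matrix of the element shape functions, $J_{ele}$, its transpose, $J'_{ele}$, inverse, $J^{-1}_{ele}$, and determinant, $|J_{ele}|$, come directly from Eq. (6.68). In what follows, the superscript $\delta$, which distinguishes the numerical solution, solution gradient and fluxes from their exact values, is dropped to improve readability.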


The semi-discrete form of the three-dimensional advection-diffusion equation is formed

by rearranging Eq. (6.69),

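\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \hat{\nabla}^{\delta} \cdot \hat{\mathbf{f}}_{ele}, \tag{6.71}
\]

\[
\mathbf{q}_{ele} = J^{-1\prime}_{ele}\, \hat{\nabla}^{\delta} u_{ele}, \tag{6.72}
\]

where $\hat{\nabla}^{\delta} = \left(\frac{\delta}{\delta \xi}, \frac{\delta}{\delta \eta}, \frac{\delta}{\delta \zeta}\right)$. Much like the DFR method for quadrilateral elements, the DFR method for hexahedral elements is a direct extension of the 1D method. Each element is discretized into $N_{spts} = (P + 1)^3$ distinct Gauss-Legendre solution points defined by the sets $\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}$, $\eta_1, \eta_2, \ldots, \eta_{N_{spts1D}}$ and $\zeta_1, \zeta_2, \ldots, \zeta_{N_{spts1D}}$ where $\xi_i$ is chosen as the most rapidly changing index followed by $\eta_j$. The discontinuous solution is defined in vector format as

\[
u_{ele}(\xi, \eta, \zeta) = \boldsymbol{\phi}(\xi, \eta, \zeta)' \mathbf{u}_{ele}, \tag{6.73}
\]

where the vector $\boldsymbol{\phi}(\xi, \eta, \zeta)' \in \mathbb{R}^{1 \times N_{spts}}$ is defined by the Kronecker product between the vectors $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$, $\boldsymbol{\ell}(\eta)' = [\ell_1(\eta), \ell_2(\eta), \ldots, \ell_{N_{spts1D}}(\eta)]$ and $\boldsymbol{\ell}(\zeta)' = [\ell_1(\zeta), \ell_2(\zeta), \ldots, \ell_{N_{spts1D}}(\zeta)]$,

\[
\boldsymbol{\phi}(\xi, \eta, \zeta)' = \boldsymbol{\ell}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)'. \tag{6.74}
\]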

The discontinuous solution is extrapolated to each of the $N_{faces} = 6$ faces of the hexahedral element, creating $N_{spts2D} = (P + 1)^2$ flux points per face or $N_{fpts} = N_{faces}\, N_{spts2D}$ distinct flux points per hexahedral element. Using Eq. (6.73), the extrapolated values in each element are written as,

\[
\mathbf{u}^{face}_{ele} = \mathbf{E}^{face}\, \mathbf{u}_{ele}, \quad face \in \{L, R, F, Bk, B, T\}, \tag{6.75}
\]

where the extrapolated discontinuous solution vectors are of size $(N_{spts2D} \times 1)$, the polynomial extrapolation operators are of size $(N_{spts2D} \times N_{spts})$ and $L, R, F, Bk, B, T$ correspond to the left, right, front, back, bottom and top boundaries of the element, respectively. The polynomial extrapolation operators are defined as,

\[
\begin{aligned}
E^{L}_{p,m} &= \phi_m(-1, \eta_i, \zeta_j), & E^{R}_{p,m} &= \phi_m(+1, \eta_i, \zeta_j), \\
E^{F}_{p,m} &= \phi_m(\xi_i, -1, \zeta_j), & E^{Bk}_{p,m} &= \phi_m(\xi_i, +1, \zeta_j), \\
E^{B}_{p,m} &= \phi_m(\xi_i, \eta_j, -1), & E^{T}_{p,m} &= \phi_m(\xi_i, \eta_j, +1),
\end{aligned} \tag{6.76}
\]

where $i, j = 1, 2, \ldots, N_{spts1D}$, $p = (j - 1)\, N_{spts1D} + i$ and $m = 1, 2, \ldots, N_{spts}$.

Following the same procedure described in section 6.2.2, the common interface

solution values, including those on boundaries, can be computed,

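\[
\begin{aligned}
\mathbf{u}^{I,face}_{ele} &= u^{I}\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}\right), & face &\in \{L, F, B\}, \\
\mathbf{u}^{I,face}_{ele} &= u^{I}\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}\right), & face &\in \{R, Bk, T\}, \\
\mathbf{u}^{I,\partial\Omega}_{ele} &= u^{b}\left(\mathbf{u}^{\partial\Omega}_{ele}\right).
\end{aligned} \tag{6.77}
\]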

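The gradient of the continuous solution can be constructed along each set of collinear points in the hexahedron. This can be written in vector format as

\[
\begin{aligned}
\frac{\partial u^{C}_{ele}}{\partial \xi}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{L, R\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \xi}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^{C}_{ele}}{\partial \eta}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{F, Bk\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \eta}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^{C}_{ele}}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \zeta}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{B, T\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \zeta}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}.
\end{aligned} \tag{6.78}
\]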

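The polynomials $\tilde{\ell}_{0}(\xi)$, $\tilde{\ell}_{0}(\eta)$, $\tilde{\ell}_{0}(\zeta)$, $\tilde{\ell}_{P+2}(\xi)$, $\tilde{\ell}_{P+2}(\eta)$, $\tilde{\ell}_{P+2}(\zeta)$ are used along with the vectors $\boldsymbol{\ell}(\xi)'$, $\boldsymbol{\ell}(\eta)'$, $\boldsymbol{\ell}(\zeta)'$, $\tilde{\boldsymbol{\ell}}(\xi)'$, $\tilde{\boldsymbol{\ell}}(\eta)'$ and $\tilde{\boldsymbol{\ell}}(\zeta)'$, to construct the shape functions,

\[
\begin{aligned}
\frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta, \zeta)' &= \boldsymbol{\ell}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)', \\
\frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta, \zeta)' &= \boldsymbol{\ell}(\zeta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \eta}(\eta)' \otimes \boldsymbol{\ell}(\xi)', \\
\frac{\partial \boldsymbol{\psi}}{\partial \zeta}(\xi, \eta, \zeta)' &= \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \zeta}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)',
\end{aligned} \tag{6.79}
\]

and

\[
\begin{aligned}
\frac{\partial \psi^{L}_{m}}{\partial \xi}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \ell_i(\eta)\, \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), &
\frac{\partial \psi^{R}_{m}}{\partial \xi}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \ell_i(\eta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
\frac{\partial \psi^{F}_{m}}{\partial \eta}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta)\, \ell_i(\xi), &
\frac{\partial \psi^{Bk}_{m}}{\partial \eta}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta)\, \ell_i(\xi), \\
\frac{\partial \psi^{B}_{m}}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \tilde{\ell}_{0}}{\partial \zeta}(\zeta)\, \ell_j(\eta)\, \ell_i(\xi), &
\frac{\partial \psi^{T}_{m}}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \zeta}(\zeta)\, \ell_j(\eta)\, \ell_i(\xi),
\end{aligned} \tag{6.80}
\]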

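for $i, j = 1, 2, \ldots, N_{spts1D}$ and $m = (j - 1)\, N_{spts1D} + i$. Evaluating the equation at each solution point and rearranging the results in a matrix-vector format produces the numerical solution gradient in reference space,

\[
\begin{aligned}
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} &= \mathbf{D}^{L}_{\xi}\, \mathbf{u}^{I,L}_{ele} + \mathbf{D}_{\xi}\, \mathbf{u}_{ele} + \mathbf{D}^{R}_{\xi}\, \mathbf{u}^{I,R}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \eta} &= \mathbf{D}^{F}_{\eta}\, \mathbf{u}^{I,F}_{ele} + \mathbf{D}_{\eta}\, \mathbf{u}_{ele} + \mathbf{D}^{Bk}_{\eta}\, \mathbf{u}^{I,Bk}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \zeta} &= \mathbf{D}^{B}_{\zeta}\, \mathbf{u}^{I,B}_{ele} + \mathbf{D}_{\zeta}\, \mathbf{u}_{ele} + \mathbf{D}^{T}_{\zeta}\, \mathbf{u}^{I,T}_{ele},
\end{aligned} \tag{6.81}
\]

where the polynomial differentiation operators on solution and flux points are of size $(N_{spts} \times N_{spts})$ and $(N_{spts} \times N_{spts2D})$, respectively. They are defined as

\[
D_{\xi\, p,m} = \frac{\partial \psi_m}{\partial \xi}(\theta_p), \quad D_{\eta\, p,m} = \frac{\partial \psi_m}{\partial \eta}(\theta_p), \quad D_{\zeta\, p,m} = \frac{\partial \psi_m}{\partial \zeta}(\theta_p), \tag{6.82}
\]

for $p, m = 1, 2, \ldots, N_{spts}$,

\[
\begin{aligned}
D^{face}_{\xi\, p,m} &= \frac{\partial \psi^{face}_{m}}{\partial \xi}(\theta_p), & face &\in \{L, R\}, \\
D^{face}_{\eta\, p,m} &= \frac{\partial \psi^{face}_{m}}{\partial \eta}(\theta_p), & face &\in \{F, Bk\}, \\
D^{face}_{\zeta\, p,m} &= \frac{\partial \psi^{face}_{m}}{\partial \zeta}(\theta_p), & face &\in \{B, T\},
\end{aligned} \tag{6.83}
\]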

for $p = 1, 2, \ldots, N_{spts}$ and $m = 1, 2, \ldots, N_{spts2D}$, and the set $\theta_1, \theta_2, \ldots, \theta_{N_{spts}}$ is used to represent the set of solution points $(\xi_1, \eta_1, \zeta_1), (\xi_2, \eta_1, \zeta_1), \ldots, (\xi_{N_{spts1D}}, \eta_{N_{spts1D}}, \zeta_{N_{spts1D}})$. Eq. (6.81) can now be combined with Eq. (6.72) to obtain the numerical solution gradient in block matrix-vector format,

\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \hat{\nabla}^{\delta} \mathbf{u}_{ele}. \tag{6.84}
\]

Next, the numerical solution gradient can be extrapolated using the same tech-

nique as before,

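\[
\mathbf{q}^{face}_{d,ele} = \mathbf{E}^{face}\, \mathbf{q}_{d,ele}, \quad face \in \{L, R, F, Bk, B, T\}, \quad d \in \{x, y, z\}, \tag{6.85}
\]

where each numerical solution derivative, $\mathbf{q}_{ele} = (\mathbf{q}_{x,ele}, \mathbf{q}_{y,ele}, \mathbf{q}_{z,ele})$, is extrapolated using the same polynomial extrapolation operators. The transformed common interface fluxes and the boundary fluxes are then computed as

\[
\begin{aligned}
\hat{\mathbf{f}}^{I,face}_{d,ele} &= \mathbf{A}^{face}_{ele}\, f^{I}\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}\right), & (d, face) &\in \{(\xi, L), (\eta, F), (\zeta, B)\}, \\
\hat{\mathbf{f}}^{I,face}_{d,ele} &= \mathbf{A}^{face}_{ele}\, f^{I}\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}\right), & (d, face) &\in \{(\xi, R), (\eta, Bk), (\zeta, T)\}, \\
\hat{\mathbf{f}}^{I,\partial\Omega}_{ele} &= \mathbf{A}^{\partial\Omega}_{ele}\, f^{b}\left(\mathbf{u}^{\partial\Omega}_{ele}, \mathbf{q}^{\partial\Omega}_{ele}\right),
\end{aligned} \tag{6.86}
\]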

using the same techniques described in section 6.2.2. The numerical derivatives of

the transformed fluxes can then be expressed in matrix-vector format as

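\[
\begin{aligned}
\frac{\delta \hat{\mathbf{f}}_{\xi,ele}}{\delta \xi} &= \mathbf{D}^{L}_{\xi}\, \hat{\mathbf{f}}^{I,L}_{\xi,ele} + \mathbf{D}_{\xi}\, \hat{\mathbf{f}}_{\xi,ele} + \mathbf{D}^{R}_{\xi}\, \hat{\mathbf{f}}^{I,R}_{\xi,ele}, \\
\frac{\delta \hat{\mathbf{f}}_{\eta,ele}}{\delta \eta} &= \mathbf{D}^{F}_{\eta}\, \hat{\mathbf{f}}^{I,F}_{\eta,ele} + \mathbf{D}_{\eta}\, \hat{\mathbf{f}}_{\eta,ele} + \mathbf{D}^{Bk}_{\eta}\, \hat{\mathbf{f}}^{I,Bk}_{\eta,ele}, \\
\frac{\delta \hat{\mathbf{f}}_{\zeta,ele}}{\delta \zeta} &= \mathbf{D}^{B}_{\zeta}\, \hat{\mathbf{f}}^{I,B}_{\zeta,ele} + \mathbf{D}_{\zeta}\, \hat{\mathbf{f}}_{\zeta,ele} + \mathbf{D}^{T}_{\zeta}\, \hat{\mathbf{f}}^{I,T}_{\zeta,ele},
\end{aligned} \tag{6.87}
\]

where

\[
\hat{\mathbf{f}}_{ele} = \mathbf{J}^{-1}_{ele}\, \mathbf{f}(\mathbf{u}_{ele}, \mathbf{q}_{ele}). \tag{6.88}
\]

Eq. (6.71) can now be written in a matrix-vector format as

\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\mathbf{V}^{-1}_{ele}\, \hat{\nabla}^{\delta} \cdot \hat{\mathbf{f}}_{ele}. \tag{6.89}
\]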

6.4 Extension to Fluid Flow

This section describes how the DFR method is applied to the Euler and Navier-Stokes

equations.

6.4.1 The Euler Equations

In conservative form, the unsteady Euler equations are written as

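\[
\frac{\partial U}{\partial t} + \nabla \cdot F = 0, \tag{6.90}
\]

where $F = (F_x, F_y)$,

\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \quad
F_x = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ (e + p) u \end{bmatrix}, \quad
F_y = \begin{bmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ (e + p) v \end{bmatrix}, \tag{6.91}
\]

in the two-dimensional case and $F = (F_x, F_y, F_z)$,

\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ e \end{bmatrix}, \quad
F_x = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u w \\ (e + p) u \end{bmatrix}, \quad
F_y = \begin{bmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ \rho v w \\ (e + p) v \end{bmatrix}, \quad
F_z = \begin{bmatrix} \rho w \\ \rho w u \\ \rho w v \\ \rho w^2 + p \\ (e + p) w \end{bmatrix}, \tag{6.92}
\]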

in the three-dimensional case. The equations are defined by the primitive variables

where ρ is density, u, v, w are the velocity components in the x, y, z directions,

respectively, and e is total energy per unit volume. The pressure is determined from

the equation of state,

\[
p = (\gamma - 1)\left(e - \frac{1}{2} \rho\, |V|^2\right), \tag{6.93}
\]

where γ is the ratio of specific heats and |V | is the magnitude of velocity.
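As a concrete illustration of Eqs. (6.91) and (6.93), the following Python sketch evaluates the pressure and the two-dimensional Euler fluxes from a conservative state. The function names and the default $\gamma = 1.4$ are assumptions made for the example.

```python
def pressure(U, gamma=1.4):
    """Equation of state, Eq. (6.93), for a 2D state U = [rho, rho*u, rho*v, e]."""
    rho, rho_u, rho_v, e = U
    return (gamma - 1.0) * (e - 0.5 * (rho_u ** 2 + rho_v ** 2) / rho)


def euler_fluxes(U, gamma=1.4):
    """Inviscid fluxes F_x and F_y of Eq. (6.91)."""
    rho, rho_u, rho_v, e = U
    u, v = rho_u / rho, rho_v / rho
    p = pressure(U, gamma)
    F_x = [rho_u, rho_u * u + p, rho_u * v, (e + p) * u]
    F_y = [rho_v, rho_v * u, rho_v * v + p, (e + p) * v]
    return F_x, F_y


# Unit density, unit x-velocity and total energy 3 give p = 0.4 * (3 - 0.5) = 1.
U = [1.0, 1.0, 0.0, 3.0]
p = pressure(U)
F_x, F_y = euler_fluxes(U)
```

For this state the fluxes evaluate to approximately $F_x = [1, 2, 0, 4]$ and $F_y = [0, 0, 1, 0]$.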

6.4.2 The Navier-Stokes Equations

In conservative form, the unsteady Navier-Stokes equations are written as

\[
\frac{\partial U}{\partial t} + \nabla \cdot F = 0, \tag{6.94}
\]

where $F = (F_x, F_y)$ in two dimensions, $F = (F_x, F_y, F_z)$ in three dimensions and

\[
F_d = F_{d,inv} - F_{d,visc}, \quad d \in \{x, y, z\}. \tag{6.95}
\]


The solution vector and inviscid flux are defined by the Euler equations in equa-

tions (6.91) and (6.92) for two dimensions and three dimensions, respectively. The

viscous flux is written in two dimensions as

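\[
F_{x,visc} = \begin{bmatrix} 0 \\ \sigma_{xx} \\ \sigma_{xy} \\ u\sigma_{xx} + v\sigma_{yx} - q_x \end{bmatrix}, \quad
F_{y,visc} = \begin{bmatrix} 0 \\ \sigma_{yx} \\ \sigma_{yy} \\ u\sigma_{xy} + v\sigma_{yy} - q_y \end{bmatrix}, \tag{6.96}
\]

while the three-dimensional form is written as

\[
F_{x,visc} = \begin{bmatrix} 0 \\ \sigma_{xx} \\ \sigma_{xy} \\ \sigma_{xz} \\ u\sigma_{xx} + v\sigma_{yx} + w\sigma_{zx} - q_x \end{bmatrix}, \quad
F_{y,visc} = \begin{bmatrix} 0 \\ \sigma_{yx} \\ \sigma_{yy} \\ \sigma_{yz} \\ u\sigma_{xy} + v\sigma_{yy} + w\sigma_{zy} - q_y \end{bmatrix}, \quad
F_{z,visc} = \begin{bmatrix} 0 \\ \sigma_{zx} \\ \sigma_{zy} \\ \sigma_{zz} \\ u\sigma_{xz} + v\sigma_{yz} + w\sigma_{zz} - q_z \end{bmatrix}. \tag{6.97}
\]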

For a Newtonian fluid, the viscous stresses are defined in Einstein notation as

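\[
\sigma_{ij} = \mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) - \frac{2}{3} \mu\, \delta_{ij} \frac{\partial u_k}{\partial x_k}, \tag{6.98}
\]

and the heat fluxes are

\[
q_i = -k \frac{\partial T}{\partial x_i}, \tag{6.99}
\]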

where k = Cpµ/Pr, T = p/(ρR), Pr is the Prandtl number, Cp is the specific heat at

constant pressure, R is the gas constant and µ is the dynamic viscosity. In this work,

the dynamic viscosity is treated as a constant.
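The constitutive relations in Eqs. (6.98) and (6.99) can be sketched in Python as follows. The function names and the argument layout (`grad_u[i][j]` holding $\partial u_i / \partial x_j$) are assumptions chosen for the example.

```python
def viscous_stress(grad_u, mu):
    """Newtonian stress tensor of Eq. (6.98); grad_u[i][j] = d u_i / d x_j."""
    n = len(grad_u)
    divergence = sum(grad_u[k][k] for k in range(n))
    return [
        [
            mu * (grad_u[i][j] + grad_u[j][i])
            - (2.0 / 3.0) * mu * divergence * (1.0 if i == j else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def heat_flux(grad_T, mu, cp, prandtl):
    """Heat flux of Eq. (6.99) with k = Cp * mu / Pr."""
    k = cp * mu / prandtl
    return [-k * dT for dT in grad_T]


# Pure shear du/dy = 1 produces only off-diagonal stresses.
sigma_shear = viscous_stress([[0.0, 1.0], [0.0, 0.0]], mu=2.0)
# A uniform 3D dilatation is stress-free under this relation.
sigma_dilatation = viscous_stress(
    [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]], mu=2.0
)
q = heat_flux([2.0, 0.0], mu=1.0, cp=1.0, prandtl=0.5)
```

The dilatation case returns a zero tensor because the $-\frac{2}{3}\mu\,\delta_{ij}\,\partial u_k/\partial x_k$ term removes the trace in three dimensions.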



The DFR method can be directly applied to the two-dimensional and three-dimensional,

Euler and Navier-Stokes equations using the same techniques described in section 6.2

and section 6.3, respectively. In the case of the Euler equations, a second order term

doesn’t exist and the construction of a numerical solution gradient and diffusive flux

can be avoided.

The transformed semi-discrete equation can be written as,

\[ \frac{\partial U_{ele}}{\partial t} = -V_{ele}^{-1}\, \nabla^{\delta} \cdot F_{ele}, \tag{6.100} \]

where the vector $U_{ele} = [u_{1,ele}', u_{2,ele}', \ldots, u_{N_{vars},ele}']'$ and $N_{vars}$ is the number of conservative variables in the equation. All DFR operators become block diagonal matrices

and act on each conservation equation separately. For example, the extrapolation

and differentiation operations are carried out on each variable. Also, the wavespeed

used for the Rusanov flux changes to,

\[ |\lambda(U_-, U_+)| = \max\left(|V_n(U_+)| + c(U_+),\; |V_n(U_-)| + c(U_-)\right), \tag{6.101} \]

where $V_n(U)$ is the velocity normal to the face and $c(U)$ is the speed of sound. The boundary functions used at the boundary faces depend on the problem; the boundary conditions used are listed in appendix A.
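The wavespeed in Eq. (6.101) can be sketched for a two-dimensional state as follows. The function names, the unit normal argument, and the ideal-gas sound speed $c = \sqrt{\gamma p / \rho}$ are assumptions made for this example and are not taken from the solver itself.

```python
import math


def normal_velocity_and_sound_speed(U, normal, gamma=1.4):
    """Return (V_n, c) for a 2D conservative state U = [rho, rho*u, rho*v, e]."""
    rho, rho_u, rho_v, e = U
    u, v = rho_u / rho, rho_v / rho
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))  # Eq. (6.93)
    v_n = u * normal[0] + v * normal[1]                     # velocity normal to the face
    c = math.sqrt(gamma * p / rho)                          # ideal-gas speed of sound (assumed)
    return v_n, c


def rusanov_wavespeed(U_minus, U_plus, normal, gamma=1.4):
    """Wavespeed of Eq. (6.101): the larger of |V_n| + c on either side."""
    vn_m, c_m = normal_velocity_and_sound_speed(U_minus, normal, gamma)
    vn_p, c_p = normal_velocity_and_sound_speed(U_plus, normal, gamma)
    return max(abs(vn_p) + c_p, abs(vn_m) + c_m)
```

The result is symmetric in the two states, so swapping the left and right arguments should not change the value.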

Chapter 7

Time Stepping Schemes

Following directly from Eq. (6.100), consider the construction of a global, transformed semi-discrete equation,

\[ \frac{\partial U}{\partial t} = -V^{-1}\, \nabla^{\delta} \cdot F, \tag{7.1} \]

where U and ∇δ · F are vectors containing the solution and the divergence of the

flux in all elements. The fully discrete equation is obtained by substituting the exact

time derivative term with the numerical time derivative,

\[ \frac{\delta U}{\delta t} = R(U), \tag{7.2} \]

where $R(U)$ is the residual,

\[ R(U) = -V^{-1}\, \nabla^{\delta} \cdot F. \tag{7.3} \]

A numerical time stepping method can be applied to Eq. (7.2) using the methods

described in this chapter.

The chapter is split into three sections. The first section discusses the explicit

and implicit Runge-Kutta (RK) methods used in this part of the dissertation. The

second section gives a description of pseudo time stepping for steady-state problems

and block iterative methods for linear solvers. The last section gives an overview of

dual time stepping for diagonally implicit Runge-Kutta (DIRK) methods.


7.1 Time Accurate Schemes

Consider the fully discrete equation described in Eq. (7.2). The unsteady solution

of this equation requires a time accurate method which can preserve the order of

accuracy from the spatial discretization. In order to accomplish this goal, Runge-

Kutta (RK) methods are often used to achieve higher temporal accuracy.

RK methods are often described in the following form,

\[ U^{n+1} = U^n + \Delta t^n \sum_{s=1}^{N_{stages}} b_s R_s, \tag{7.4} \]

where $U^n$ is the solution at the current time step $n$, $U^{n+1}$ is the solution at the next time step, $\Delta t^n$ is the numerical time step, $b_s$ are the weights of the residual, $R_s$ are the residuals evaluated at each stage $s$ and $N_{stages}$ is the number of stages for the method.

7.1.1 Explicit Runge-Kutta

In this work, an explicit, four stage, fourth order RK scheme (RK44) is used to update

the solution,

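\[
\begin{aligned}
R_1 &= R(U^n), \\
R_2 &= R(U^n + a_{2,1}\, \Delta t^n R_1), \\
R_3 &= R(U^n + a_{3,2}\, \Delta t^n R_2), \\
R_4 &= R(U^n + a_{4,3}\, \Delta t^n R_3), \\
U^{n+1} &= U^n + \Delta t^n (b_1 R_1 + b_2 R_2 + b_3 R_3 + b_4 R_4),
\end{aligned}
\tag{7.5}
\]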


where the coefficients a and b are defined by the Butcher tableau,

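\[
\begin{array}{c|cccc}
0 & & & & \\
c_2 & a_{2,1} & & & \\
c_3 & 0 & a_{3,2} & & \\
c_4 & 0 & 0 & a_{4,3} & \\
\hline
 & b_1 & b_2 & b_3 & b_4
\end{array}
=
\begin{array}{c|cccc}
0 & & & & \\
1/2 & 1/2 & & & \\
1/2 & 0 & 1/2 & & \\
1 & 0 & 0 & 1 & \\
\hline
 & 1/6 & 1/3 & 1/3 & 1/6
\end{array}.
\]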
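The RK44 update in Eq. (7.5) with the tableau above can be sketched for a scalar problem as follows. The names `rk44_step` and `integrate` are assumptions made for this example, and the residual is taken to be autonomous, R = R(U), as in Eq. (7.2).

```python
import math


def rk44_step(R, u, dt):
    """One step of the four stage, fourth order scheme in Eq. (7.5)."""
    r1 = R(u)
    r2 = R(u + 0.5 * dt * r1)   # a_{2,1} = 1/2
    r3 = R(u + 0.5 * dt * r2)   # a_{3,2} = 1/2
    r4 = R(u + 1.0 * dt * r3)   # a_{4,3} = 1
    return u + dt * (r1 / 6.0 + r2 / 3.0 + r3 / 3.0 + r4 / 6.0)


def integrate(R, u0, t_end, n_steps):
    """Advance u from t = 0 to t_end with n_steps equal RK44 steps."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = rk44_step(R, u, dt)
    return u
```

Applying `integrate` to du/dt = -u and halving the step size should reduce the error by a factor close to 16, which is a simple way to confirm fourth order accuracy.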

A low-storage, explicit, five stage, fourth order RK scheme (RK45) from [50]

referred to as RK4(3)5[2R+] is also used to update the solution. This scheme is

particularly useful because it only requires the storage of the residual from two stages.

7.1.2 Diagonally Implicit Runge-Kutta

Consider the three stage, fourth order diagonally implicit RK scheme (DIRK34)

from [6],

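\[
\begin{aligned}
R_1 &= R(U^n + \Delta t^n\, a_{1,1} R_1), \\
R_2 &= R(U^n + \Delta t^n (a_{2,1} R_1 + a_{2,2} R_2)), \\
R_3 &= R(U^n + \Delta t^n (a_{3,1} R_1 + a_{3,2} R_2 + a_{3,3} R_3)), \\
U^{n+1} &= U^n + \Delta t^n (b_1 R_1 + b_2 R_2 + b_3 R_3),
\end{aligned}
\tag{7.6}
\]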

where the coefficients a and b are defined by the Butcher tableau,

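\[
\begin{array}{c|ccc}
c_1 & a_{1,1} & & \\
c_2 & a_{2,1} & a_{2,2} & \\
c_3 & a_{3,1} & a_{3,2} & a_{3,3} \\
\hline
 & b_1 & b_2 & b_3
\end{array}
=
\begin{array}{c|ccc}
(1+\alpha)/2 & (1+\alpha)/2 & & \\
1/2 & -\alpha/2 & (1+\alpha)/2 & \\
(1-\alpha)/2 & 1+\alpha & -(1+2\alpha) & (1+\alpha)/2 \\
\hline
 & 1/(6\alpha^2) & 1 - 1/(3\alpha^2) & 1/(6\alpha^2)
\end{array}
\]

and $\alpha = 2\cos(\pi/18)/\sqrt{3}$. This type of scheme requires the solution of an implicit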

function at each stage. It’s important to note that each stage of a DIRK scheme is

completely independent from any future stages. This is in sharp contrast to a fully


implicit RK scheme which couples all stages of an RK scheme. Since all the diagonal

terms are equal, this scheme could also be referred to as a singly diagonally implicit

scheme (SDIRK).
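The sequential structure of the DIRK34 stages in Eq. (7.6) can be illustrated on the linear test problem du/dt = λu, where each implicit stage equation can be solved in closed form. This is a sketch under that assumption only; the names `dirk34_step_linear` and `integrate_linear` are hypothetical, and a nonlinear residual would require an iterative solve at each stage instead.

```python
import math

ALPHA = 2.0 * math.cos(math.pi / 18.0) / math.sqrt(3.0)
GAMMA = (1.0 + ALPHA) / 2.0  # shared diagonal coefficient a_{s,s}

# Butcher coefficients of the DIRK34 tableau
A = [[GAMMA, 0.0, 0.0],
     [-ALPHA / 2.0, GAMMA, 0.0],
     [1.0 + ALPHA, -(1.0 + 2.0 * ALPHA), GAMMA]]
B = [1.0 / (6.0 * ALPHA ** 2), 1.0 - 1.0 / (3.0 * ALPHA ** 2), 1.0 / (6.0 * ALPHA ** 2)]


def dirk34_step_linear(lam, u, dt):
    """One DIRK34 step for R(u) = lam * u.

    Stage s satisfies R_s = lam * (u + dt * sum_j a_{s,j} R_j); only R_s itself
    is unknown, so the stages are solved one after another.
    """
    stages = []
    for s in range(3):
        known = u + dt * sum(A[s][j] * stages[j] for j in range(s))
        stages.append(lam * known / (1.0 - dt * A[s][s] * lam))
    return u + dt * sum(b * r for b, r in zip(B, stages))


def integrate_linear(lam, u0, t_end, n_steps):
    """Advance u from t = 0 to t_end with n_steps equal DIRK34 steps."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = dirk34_step_linear(lam, u, dt)
    return u
```

Each stage only uses residuals from earlier stages, which is the property that separates a DIRK scheme from a fully implicit RK scheme.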

In this work, explicit, singly diagonally implicit RK schemes (ESDIRK) are used

to advance the solution in time. In this case, the first stage of the scheme is explicit

while the remaining stages are implicit. The four stage, third order (ESDIRK3) and

six stage, fourth order (ESDIRK4) schemes referred to as ARK3(2)4L[2]SA-ESDIRK

and ARK4(3)6L[2]SA-ESDIRK, respectively, are utilized from [18].

7.1.3 Step size control

The RK45 and ESDIRK4 schemes described above both use the PI step size controller

described in [34, 98] to estimate the maximum stable time step. Consider applying a

time stepping method one order lower than the given scheme in Eq. (7.4),


\[ \hat{U}^{n+1} = U^n + \Delta t^n \sum_{s=1}^{N_{stages}} \hat{b}_s R_s. \tag{7.7} \]

An error approximation can be made by taking the difference between Eq. (7.4) and

Eq. (7.7),

\[ e^n(t^n + \Delta t^n) \approx \Delta t^n \sum_{s=1}^{N_{stages}} (b_s - \hat{b}_s) R_s. \tag{7.8} \]

Normalizing this equation and setting a tolerance produces the relative error,

\[ \sigma^n = \frac{|e^n|}{Tol_a + Tol_r \max(|U^n|, |U^{n+1}|)}, \tag{7.9} \]

where Tola and Tolr are the absolute and relative error tolerances, respectively. The

maximum relative error across all values is found and defined as σn and the scaling

factor can then be computed as

\[ f = (\sigma^n)^{-\alpha/Q} (\sigma^{n-1})^{-\beta/Q}, \tag{7.10} \]


where $\alpha = 0.7$, $\beta = 0.4$ and $Q$ is the order of the scheme. A minimum, maximum and safety factor are also introduced such that,

\[ f_{min} \le f_{safe} f \le f_{max}, \tag{7.11} \]

where the factors are set based on the scheme as shown in Table 7.1. This forces a smoother transition when changing the time step.

\[
\begin{array}{lccc}
\text{Scheme} & f_{min} & f_{safe} & f_{max} \\
\hline
\text{Explicit} & 0.3 & 0.8 & 2.5 \\
\text{Implicit} & 0.5 & 0.9 & 2.5
\end{array}
\]

Table 7.1: Scaling factors used for the explicit and implicit step controller.

The estimate for the maximum stable time step can then be computed as

\[ \Delta t^{n+1} = \Delta t^n f_{safe} f. \tag{7.12} \]
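The step size update in Eqs. (7.10) to (7.12) can be sketched as below. The exponents are transcribed as written in Eq. (7.10), and the bound in Eq. (7.11) is interpreted as clamping the product of the safety factor and f; that interpretation and the function names are assumptions made for this example.

```python
def pi_step_factor(sigma_n, sigma_prev, order, f_min, f_safe, f_max,
                   alpha=0.7, beta=0.4):
    """Scaling factor f_safe * f, limited to [f_min, f_max] as in Eq. (7.11).

    sigma_n and sigma_prev are the maximum relative errors of Eq. (7.9) at the
    current and previous time steps; order is Q in Eq. (7.10).
    """
    f = sigma_n ** (-alpha / order) * sigma_prev ** (-beta / order)  # Eq. (7.10)
    return min(f_max, max(f_min, f_safe * f))


def next_time_step(dt, sigma_n, sigma_prev, order, f_min, f_safe, f_max):
    """Time step estimate of Eq. (7.12) using the limited scaling factor."""
    return dt * pi_step_factor(sigma_n, sigma_prev, order, f_min, f_safe, f_max)
```

With the explicit factors of Table 7.1 (0.3, 0.8, 2.5), a relative error of exactly one shrinks the step by the safety factor, while very large or very small errors saturate at the minimum and maximum factors.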

7.2 Pseudo Time Stepping

Starting from Eq. (7.2), consider the following steady-state problem,

R(U) = 0. (7.13)

The solution to this equation can be found by utilizing pseudo time stepping or

pseudo-transient continuation [49]. This particular method is useful for solving non-

linear systems of partial differential equations where a good initial guess for a proper

root finding technique is not immediately available.

Applying a pseudo time stepping technique to Eq. (7.13) forms the following fully discrete equation,

\[ \frac{\delta U}{\delta \tau} = R(U), \tag{7.14} \]

where an explicit or implicit method can be used in place of the numerical pseudo

time derivative of the solution to march the solution to steady-state. In this case,


stability is considerably more important than accuracy when choosing a time stepping

scheme.

Given an initial solution of Um where m is the current pseudo time step iteration,

backward Euler can be applied to Eq. (7.14) to find the solution at the next pseudo

time step,

\[ \Delta U^m = \Delta T^m R(U^{m+1}), \tag{7.15} \]

where $\Delta U^m = U^{m+1} - U^m$ and $\Delta T^m$ is a diagonal matrix containing the time step, $\Delta\tau^m$, along the diagonal. When an element local time step estimate based on the local CFL number is used, $\Delta\tau^m_{ele}$ is placed along the diagonal instead. Following a linearized backward Euler technique, a Taylor series expansion of $R(U^{m+1})$ is used to linearize the equation,

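\[ R(U^{m+1}) \approx R^m + \frac{\partial R^m}{\partial U^m}\Delta U^m, \tag{7.16} \]

where $R^m = R(U^m)$. Rearranging Eq. (7.15) by using the approximation in Eq. (7.16) gives the following global linear system,

\[ \left((\Delta T^m)^{-1} - \frac{\partial R^m}{\partial U^m}\right)\Delta U^m = R^m. \tag{7.17} \]

It's important to note that Eq. (7.17) approaches Newton's method as $\Delta\tau^m_{ele} \to \infty$,

\[ \frac{\partial R^m}{\partial U^m}\Delta U^m = -R^m. \tag{7.18} \]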

This allows the method to efficiently transition into Newton’s method when the so-

lution is sufficiently close to the final result.
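The linearized backward Euler update in Eq. (7.17) can be sketched for a scalar steady-state problem. The geometric growth of the pseudo time step is an assumption made for this example (the text only states that the method approaches Newton's method as the pseudo time step grows), and the function name is hypothetical.

```python
def pseudo_transient_solve(R, dRdu, u0, dtau0, growth=2.0, tol=1e-12, max_iter=200):
    """Solve R(u) = 0 by repeating the scalar form of Eq. (7.17).

    Each iteration solves (1/dtau - dR/du) * du = R(u) and then enlarges dtau,
    so that later iterations behave like Newton's method, Eq. (7.18).
    Returns the solution estimate and the number of iterations used.
    """
    u, dtau = u0, dtau0
    for iteration in range(max_iter):
        r = R(u)
        if abs(r) < tol:
            return u, iteration
        u += r / (1.0 / dtau - dRdu(u))
        dtau *= growth  # assumed growth strategy, not taken from the text
    return u, max_iter
```

For example, `pseudo_transient_solve(lambda u: 2.0 - u**3, lambda u: -3.0 * u**2, 0.1, 0.1)` starts far from the root, takes damped steps while the pseudo time step is small, and converges rapidly once it is large.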

7.2.1 The Global Linear System

The global linear system in Eq. (7.17) can be written as

\[ A\, \Delta U^m = R^m, \tag{7.19} \]

where

\[ A = (\Delta T^m)^{-1} - \frac{\partial R^m}{\partial U^m}. \tag{7.20} \]

The global matrix, $A$, is a large, sparse, block matrix where each block contains an element residual Jacobian. The diagonal blocks of the matrix are defined as

\[ A_{ele,ele} = (\Delta T^m_{ele})^{-1} - \frac{\partial R^m_{ele}}{\partial U^m_{ele}}, \tag{7.21} \]

while the off-diagonal blocks are written as

\[ A_{ele,eleN} = -\frac{\partial R^m_{ele}}{\partial U^m_{eleN}}, \tag{7.22} \]

where eleN refers to neighboring elements within the stencil of the scheme. For

example, one-dimensional advection would contain two neighboring elements while

one-dimensional diffusion could contain up to four.

7.2.2 Block Iterative Methods

The size of the global matrix can force linear solvers to be prohibitively expensive

for high-order methods. This size can be mitigated through the use of block iterative

methods. A block iterative method can be reformulated to avoid the construction of

the off-diagonal blocks defined in Eq. (7.22). This eliminates the need to store and

compute off-diagonal blocks, helps maintain element locality within a linear solver

and reduces the linear system to a batch of smaller linear systems.

Starting from a traditional splitting method [32], the matrix can be split into two

parts, $A = M - N$ so that Eq. (7.19) becomes,

\[ M\, \Delta U^{k+1} = N\, \Delta U^k + R^m, \tag{7.23} \]

where $\Delta U^{k+1} \to \Delta U^m$ when $k \to \infty$ if the iterative method is convergent.

In order to avoid constructing $N$, Eq. (7.23) is rearranged. Subtracting $M\, \Delta U^k$ from both sides of Eq. (7.23) produces

\[ M\, \Delta^2 U^k = R^m - A\, \Delta U^k, \tag{7.24} \]

where

\[ \Delta^2 U^k = \Delta U^{k+1} - \Delta U^k = (U^{k+1} - U^m) - (U^k - U^m) = U^{k+1} - U^k. \tag{7.25} \]

The linear approximation in Eq. (7.16) is used to reduce Eq. (7.24) even further. The approximation is written as

\[ R(U^k) \approx R(U^m) + \frac{\partial R^m}{\partial U^m}\Delta U^k, \tag{7.26} \]

and can be rewritten as

\[ R^m = R^k - \frac{\partial R^m}{\partial U^m}\Delta U^k. \tag{7.27} \]

Applying this approximation and Eq. (7.20) directly to Eq. (7.24) produces the following,

\[ M\, \Delta^2 U^k = R^k - (\Delta T^m)^{-1} \Delta U^k. \tag{7.28} \]

Note that $U^{k+1} \to U^{m+1}$ as $k \to \infty$, but since time accuracy isn't particularly important when advancing a pseudo time step, the iterative method need not fully converge and the total number of iterations in $k$ is usually small. This particular formulation has the advantage of avoiding the construction of $N$ but has the disadvantage of having to recompute the residual $R(U^k)$ at every $k$ iteration. It's also important to note that the last term in Eq. (7.28), $(\Delta T^m)^{-1} \Delta U^k$, can sometimes be removed for faster convergence [83].

In this work, block Jacobi and multicolored Gauss-Seidel (MCGS) are used to solve the linear system in Eq. (7.28). Each method has its own advantages and disadvantages. Block Jacobi is particularly efficient because all linear systems can be solved simultaneously, as opposed to the color sequence in MCGS, but MCGS tends to have better convergence properties than block Jacobi. Both methods require nonsingular diagonal blocks. This is ensured by adjusting the pseudo time step so that the matrices are strictly diagonally dominant.

Block Jacobi

The simplest block iterative method is the block Jacobi method which splits the global

matrix into a block diagonal matrix and a matrix containing the off-diagonal terms.

In this case, the matrices M and N take the form,

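\[
\begin{aligned}
M_{ele,ele} &= (\Delta T^m_{ele})^{-1} - \frac{\partial R^m_{ele}}{\partial U^m_{ele}}, \\
N_{ele,eleN} &= \frac{\partial R^m_{ele}}{\partial U^m_{eleN}},
\end{aligned}
\tag{7.29}
\]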

where the remaining blocks in both matrices are empty. By placing the off-diagonal

terms in the matrix N , a much smaller system can be solved in each element.

Eq. (7.28) is rewritten as a batch of element local systems,

\[ \left((\Delta T^m_{ele})^{-1} - \frac{\partial R^m_{ele}}{\partial U^m_{ele}}\right)\Delta^2 U^k_{ele} = R^k_{ele} - (\Delta T^m_{ele})^{-1} \Delta U^k_{ele}, \tag{7.30} \]

where all linear systems can be solved simultaneously.
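The element local update in Eq. (7.30) can be sketched on a small model problem. The linear residual R(U) = b - K U with K = tridiag(-1, 2, -1) and one unknown per "element" are assumptions made for this example, so each block is a 1×1 system; the function names are hypothetical.

```python
def residual(U, b):
    """Model residual R(U) = b - K U with K = tridiag(-1, 2, -1)."""
    n = len(U)
    R = []
    for i in range(n):
        left = U[i - 1] if i > 0 else 0.0
        right = U[i + 1] if i < n - 1 else 0.0
        R.append(b[i] - (2.0 * U[i] - left - right))
    return R


def block_jacobi_pseudo_step(U_m, b, dtau, n_inner):
    """Advance one pseudo time step with n_inner block Jacobi sweeps, Eq. (7.30)."""
    U_k = list(U_m)
    diag = 1.0 / dtau + 2.0  # (dT^-1 - dR_ele/dU_ele), since dR_i/dU_i = -2 here
    for _ in range(n_inner):
        R_k = residual(U_k, b)  # residual recomputed at every k iteration
        # every element is updated from the same R_k, so all solves are independent
        U_k = [U_k[i] + (R_k[i] - (U_k[i] - U_m[i]) / dtau) / diag
               for i in range(len(U_k))]
    return U_k


def backward_euler_defect(U, U_m, b, dtau):
    """Largest violation of (U - U_m) / dtau = R(U), the converged form of Eq. (7.15)."""
    R = residual(U, b)
    return max(abs(R[i] - (U[i] - U_m[i]) / dtau) for i in range(len(U)))
```

Because the residual in this model is linear, the converged iterate satisfies the backward Euler equation exactly, and the defect function measures how far a given number of sweeps is from that state.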

Multicolored Gauss-Seidel

The multicolored block Gauss-Seidel (MCGS) method is a block iterative method

which attempts to improve convergence by coloring sets of elements and solving the

resulting system for each color sequentially in order to update the residual of the next

color.

The splitting method used on the global matrix is highly dependent on the coloring

algorithm and the spatial discretization. In this work, groups of elements are chosen

such that no two adjacent elements have the same color. For example, consider the

application of two colors, red and black, on the Cartesian mesh shown in Figure 7.1.

[Figure 7.1: A Cartesian mesh with a red-black element color mapping. The DFR element stencil (white) relies on information from neighboring elements from the opposite color. Panels: (a) Stencil for Advection/Euler; (b) Stencil for Diffusion/Navier-Stokes.]

The corresponding global matrix can be represented as

\[ A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix} \]

where 1 and 2 correspond to the colors red and black, respectively.

When solving advection or Euler equations, the neighboring elements of any particular element belong to the opposite color (see Figure 7.1a). This ensures that neighboring element Jacobians are located in $A_{1,2}$, $A_{2,1}$ and that the matrices $A_{1,1} = D_{1,1}$, $A_{2,2} = D_{2,2}$ where $D_{1,1}$, $D_{2,2}$ are block diagonal matrices. The splitting method then follows a traditional block Gauss-Seidel method where $M$ is a lower block triangular matrix and $N$ is a strictly upper block triangular matrix; with $A = M - N$,

\[
M = \begin{bmatrix} D_{1,1} & 0 \\ A_{2,1} & D_{2,2} \end{bmatrix}, \quad
N = \begin{bmatrix} 0 & -A_{1,2} \\ 0 & 0 \end{bmatrix}.
\]

For diffusion or the Navier-Stokes equations, the stencil is much larger and some

of the neighboring elements may belong to the same color (see Figure 7.1b). In this


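case, the matrices $A_{1,1}$, $A_{2,2}$ have neighboring element Jacobians so that

\[
\begin{aligned}
A_{1,1} &= D_{1,1} + C_{1,1}, \\
A_{2,2} &= D_{2,2} + C_{2,2},
\end{aligned}
\tag{7.31}
\]

where $D_{1,1}$, $D_{2,2}$ contain the block diagonal components and $C_{1,1}$, $C_{2,2}$ contain the off-diagonal blocks. With $A = M - N$, the splitting method is then,

\[
M = \begin{bmatrix} D_{1,1} & 0 \\ A_{2,1} & D_{2,2} \end{bmatrix}, \quad
N = \begin{bmatrix} -C_{1,1} & -A_{1,2} \\ 0 & -C_{2,2} \end{bmatrix}.
\]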

By ensuring that the diagonal matrices of M are block diagonal, all element local

systems within a given color can be solved simultaneously. Combining the M matrix

above with Eq. (7.28) produces the following result,

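\[
\begin{aligned}
D_{1,1}\, \Delta^2 U^k_1 &= R^k_1 - (\Delta T^m_1)^{-1} \Delta U^k_1, \\
A_{2,1}\, \Delta^2 U^k_1 + D_{2,2}\, \Delta^2 U^k_2 &= R^k_2 - (\Delta T^m_2)^{-1} \Delta U^k_2,
\end{aligned}
\tag{7.32}
\]

where

\[
\begin{aligned}
(D_{1,1})_{ele,ele} &= (\Delta T^m_{1,ele})^{-1} - \frac{\partial R^m_{1,ele}}{\partial U^m_{1,ele}}, \\
(D_{2,2})_{ele,ele} &= (\Delta T^m_{2,ele})^{-1} - \frac{\partial R^m_{2,ele}}{\partial U^m_{2,ele}}, \\
(A_{2,1})_{ele,eleN} &= -\frac{\partial R^m_{2,ele}}{\partial U^m_{1,eleN}}.
\end{aligned}
\tag{7.33}
\]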

A linear approximation is used in each element to further reduce Eq. (7.32). The

approximation is written as

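\[
R\left(U^{k+1}_{1,eleN}, U^k_{2,ele}, U^k_{2,eleN}\right) \approx R\left(U^k_{1,eleN}, U^k_{2,ele}, U^k_{2,eleN}\right) + \sum_{eleN} \frac{\partial R^m_{2,ele}}{\partial U^m_{1,eleN}} \left(U^{k+1}_{1,eleN} - U^k_{1,eleN}\right),
\tag{7.34}
\]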

where, in this case, it is assumed that element neighbors may exist within the same


color group. This approximation can be rewritten as

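\[ R^*_{2,ele} = R^k_{2,ele} + \sum_{eleN} \frac{\partial R^m_{2,ele}}{\partial U^m_{1,eleN}} \Delta^2 U^k_{1,eleN}, \tag{7.35} \]

where the $*$ in $R^*_{2,ele}$ defines a computed residual given the most up-to-date solution values in all elements. Applying this approximation for all elements within the second color directly to Eq. (7.32) produces the following,

\[
\begin{aligned}
D_{1,1}\, \Delta^2 U^k_1 &= R^k_1 - (\Delta T^m_1)^{-1} \Delta U^k_1, \\
D_{2,2}\, \Delta^2 U^k_2 &= R^*_2 - (\Delta T^m_2)^{-1} \Delta U^k_2.
\end{aligned}
\tag{7.36}
\]

Eq. (7.36) can be generalized to multiple colors and is rewritten as a batch of element local systems for each color,

\[ \left((\Delta T^m_{c,ele})^{-1} - \frac{\partial R^m_{c,ele}}{\partial U^m_{c,ele}}\right)\Delta^2 U^k_{c,ele} = R^*_{c,ele} - (\Delta T^m_{c,ele})^{-1} \Delta U^k_{c,ele}, \tag{7.37} \]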

where all linear systems within a given color, c, can be solved simultaneously.

7.3 Dual Time Stepping

Dual time stepping [44] is the process of converting each implicit time step into a

modified steady state problem in order to apply a pseudo time stepping technique.

It’s usually used in conjunction with a second order backward difference formula

(BDF2) but it can be applied to implicit Runge-Kutta methods as well [45].

7.3.1 Application to Diagonally Implicit Runge-Kutta Schemes

Consider rewriting Eq. (7.6) into a general form which solves for the solution at each

stage,

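\[
\begin{aligned}
U_s &= U^n + \Delta t^n \sum_{j=1}^{s} a_{s,j} R_j, \\
U^{n+1} &= U^n + \Delta t^n \sum_{s=1}^{N_{stages}} b_s R_s,
\end{aligned}
\tag{7.38}
\]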

where $s$ can be any stage within the DIRK scheme and each stage is solved sequentially. A modified residual can be formed for each stage so that

\[ \mathcal{R}_s(U_s) = 0, \tag{7.39} \]

where

\[ \mathcal{R}_s(U_s) = -(\Delta T^n)^{-1} (U_s - U^n) + \sum_{j=1}^{s} a_{s,j} R_j, \tag{7.40} \]

and $\Delta T^n$ is a diagonal matrix containing the time step, $\Delta t^n$, along the diagonal. Following directly from the pseudo time stepping technique described in section 7.2, a global linear system can be constructed for each stage,

\[ \left((\Delta T^m)^{-1} - \frac{\partial \mathcal{R}^m_s}{\partial U^m_s}\right)\Delta U^m_s = \mathcal{R}^m_s, \tag{7.41} \]

where

\[ \frac{\partial \mathcal{R}^m_s}{\partial U^m_s} = -(\Delta T^n)^{-1} + a_{s,s}\frac{\partial R^m_s}{\partial U^m_s}. \tag{7.42} \]

This global linear system can then be solved using the methods described in sections 7.2.1 and 7.2.2. It's important to note that the stage residual Jacobian, $\partial R^m_s / \partial U^m_s$, can often be replaced by the residual Jacobian of the first stage without a significant loss in convergence. This means that the element local Jacobian only needs to be computed once per time step.
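The stage solve described by Eqs. (7.39) to (7.42) can be sketched for a scalar problem. The function name `solve_stage`, the argument `explicit_part` (standing for the contribution of already computed stages to the sum in Eq. (7.40)), and the geometric growth of the pseudo time step are assumptions made for this example.

```python
def solve_stage(R, dRdu, u_n, dt, a_ss, explicit_part=0.0,
                dtau=1.0, growth=2.0, tol=1e-13, max_iter=100):
    """Solve one implicit DIRK stage by pseudo time stepping.

    The modified residual follows Eq. (7.40) in scalar form,
        -(u - u_n) / dt + explicit_part + a_ss * R(u),
    and each pseudo iteration solves the scalar form of Eq. (7.41).
    """
    def modified_residual(u):
        return -(u - u_n) / dt + explicit_part + a_ss * R(u)

    def modified_jacobian(u):
        return -1.0 / dt + a_ss * dRdu(u)  # Eq. (7.42)

    u = u_n
    for _ in range(max_iter):
        r = modified_residual(u)
        if abs(r) < tol:
            break
        u += r / (1.0 / dtau - modified_jacobian(u))
        dtau *= growth  # assumed growth strategy, not taken from the text
    return u
```

With `a_ss = 1` and no earlier stages this reduces to a backward Euler step, so `solve_stage(lambda u: -u**3, lambda u: -3.0 * u**2, 1.0, 0.5, 1.0)` returns the value u satisfying u = 1 - 0.5 u^3.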

Chapter 8

The Element Local Jacobian Matrix

The block iterative methods described in section 7.2.2 require the construction of element local Jacobians, $\partial R^m_{ele} / \partial U^m_{ele}$. What follows is a derivation of the element local

Jacobian matrix for the DFR method and a reformulation of these matrices into

Kronecker products.

This chapter is split into three sections. Section 8.1 describes the element local

Jacobian derivation for one-dimensional DFR and the one-dimensional Kronecker

operators needed for the multidimensional Kronecker product formulation. Section

8.2 extends the derivation and Kronecker product formulation for two- and three-

dimensional unstructured quadrilateral and hexahedral elements. We show that the

element local Jacobian can be split into two terms which are constructed from two or

three distinct sparsity patterns based on Kronecker products as shown in Figures 8.1–

8.7. We also show that the time complexity of computing an element local Jacobian can be reduced from $O(P^{3d-1})$ to $O(P^{d+1})$ for the advection or Euler equations and from $O(P^{3d})$ to $O(P^{d+2})$ for the advection-diffusion or Navier-Stokes equations. Section

8.3 describes the changes needed to form element local Jacobian matrices for the Euler

and Navier-Stokes equations including the wavespeed Jacobians and modifications to

elements on boundaries.


8.1 One-Dimensional Formulation for Advection-Diffusion

Following from Eq. (6.30), the one-dimensional DFR formulation of the semi-discrete

advection-diffusion equation in each element can be written as

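\[ r_{ele} = r(u_{ele-1}, u_{ele}, u_{ele+1}) = -\frac{1}{|J_{ele}|}\frac{\delta f_{ele}}{\delta\xi}, \tag{8.1} \]

where $r_{ele} = r(u_{ele-1}, u_{ele}, u_{ele+1})$ is the residual. This equation can be differentiated with respect to the discontinuous solution vector, $u_{ele}$, to produce the element local Jacobian matrix,

\[ \frac{\partial r_{ele}}{\partial u_{ele}} = -\frac{1}{|J_{ele}|}\frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta\xi}\right), \tag{8.2} \]

where $\partial r_{ele} / \partial u_{ele}$ is a matrix of size $(N_{spts1D} \times N_{spts1D})$. The Jacobian of the numerical derivative of the flux is split into two distinct parts by applying a chain rule,

\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta\xi}\right) = \frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) + \frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}}\frac{\partial q_{ele}}{\partial u_{ele}}\right), \tag{8.3} \]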

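where the numerical derivative of the flux with respect to the solution is defined as

\[ \frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = D^L_\xi \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*} E^L + D_\xi \frac{\partial f_{ele}}{\partial u_{ele}} + D^R_\xi \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} E^R, \tag{8.4} \]

and the numerical derivative of the flux with respect to the solution derivative is defined as

\[
\frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}}\frac{\partial q_{ele}}{\partial u_{ele}}\right) =
\left(D^L_\xi \frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}} E^L + D_\xi \frac{\partial f_{ele}}{\partial q_{ele}} + D^R_\xi \frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}} E^R\right)
\left(-\frac{1}{|J_{ele}|}\right)
\left(D^L_\xi \frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}} E^L + D_\xi + D^R_\xi \frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}} E^R\right).
\tag{8.5}
\]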


The modified common interface flux Jacobians also contain contributions from neigh-

boring solution derivatives and are defined as

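\[
\begin{aligned}
\frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*} &= \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}} + \frac{\partial f^{I,L}_{ele}}{\partial q^R_{ele-1}} \frac{\partial q^R_{ele-1}}{\partial u^{I,R}_{ele-1}} \frac{\partial u^{I,R}_{ele-1}}{\partial u^L_{ele}}, \\
\frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} &= \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}} + \frac{\partial f^{I,R}_{ele}}{\partial q^L_{ele+1}} \frac{\partial q^L_{ele+1}}{\partial u^{I,L}_{ele+1}} \frac{\partial u^{I,L}_{ele+1}}{\partial u^R_{ele}}.
\end{aligned}
\tag{8.6}
\]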

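Table 8.1 defines the Jacobians which are diagonal matrices primarily constructed through function evaluations. These are described in more detail in section 8.1.1.

\[
\begin{array}{lll}
\text{Name} & \text{Variable} & \text{Size} \\
\hline
\text{Discontinuous Flux} & \frac{\partial f_{ele}}{\partial u_{ele}},\ \frac{\partial f_{ele}}{\partial q_{ele}} & (N_{spts1D} \times N_{spts1D}) \\
\text{Common Flux} & \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}},\ \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}},\ \frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}},\ \frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}},\ \frac{\partial f^{I,L}_{ele}}{\partial q^R_{ele-1}},\ \frac{\partial f^{I,R}_{ele}}{\partial q^L_{ele+1}} & (1 \times 1) \\
\text{Common Solution} & \frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}},\ \frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}},\ \frac{\partial u^{I,R}_{ele-1}}{\partial u^L_{ele}},\ \frac{\partial u^{I,L}_{ele+1}}{\partial u^R_{ele}} & (1 \times 1) \\
\text{Modified Common Flux} & \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*},\ \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} & (1 \times 1) \\
\text{Neighbor Contribution} & \frac{\partial q^R_{ele-1}}{\partial u^{I,R}_{ele-1}},\ \frac{\partial q^L_{ele+1}}{\partial u^{I,L}_{ele+1}} & (1 \times 1)
\end{array}
\]

Table 8.1: Diagonal Jacobian matrices used in the construction of the one-dimensional element local Jacobian matrix for advection-diffusion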

8.1.1 Derivation

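The derivation of the one-dimensional element local Jacobian begins by noting that there is no dependence on the discontinuous solution when applying the numerical derivative, $\delta / \delta\xi$, to the continuous flux in Eq. (8.2). This means that the Jacobian with respect to the discontinuous solution, $\partial / \partial u_{ele}$, can be applied directly to the flux by modifying Eq. (6.29),

\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta\xi}\right) = D^L_\xi \frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) + D_\xi \frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) + D^R_\xi \frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right). \tag{8.7} \]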


The derivation can now be split into sections where the Jacobians in Table 8.1 are

constructed.

Discontinuous Flux Jacobian

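The Jacobian of the discontinuous flux can be computed by applying the chain rule to the original flux function,

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f(u_{ele}, q_{ele})\right), \\
&= \frac{\partial f_{ele}}{\partial u_{ele}} + \frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right),
\end{aligned}
\tag{8.8}
\]

where

\[
\begin{aligned}
\frac{\partial f_{ele}}{\partial u_{ele}} &= \frac{\partial f}{\partial u}(u_{ele}, q_{ele}) = \frac{\partial f_{adv}}{\partial u}(u_{ele}) - \frac{\partial f_{diff}}{\partial u}(u_{ele}, q_{ele}), \\
\frac{\partial f_{ele}}{\partial q_{ele}} &= \frac{\partial f}{\partial q}(u_{ele}, q_{ele}) = -\frac{\partial f_{diff}}{\partial q}(u_{ele}, q_{ele}),
\end{aligned}
\tag{8.9}
\]

and $\frac{\partial f_{adv}}{\partial u}(u)$, $\frac{\partial f_{diff}}{\partial u}(u, q)$ and $\frac{\partial f_{diff}}{\partial q}(u, q)$ are the exact derivatives of the flux functions $f_{adv}(u)$ and $f_{diff}(u, q)$ with respect to $u$ and $q$. Since there's no dependence between solution points when evaluating the flux functions, the discontinuous flux Jacobians, $\partial f_{ele} / \partial u_{ele}$ and $\partial f_{ele} / \partial q_{ele}$, are diagonal matrices of size $(N_{spts1D} \times N_{spts1D})$.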

Common Interface Flux Jacobians

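The Jacobians of the common interface fluxes can also be computed by applying the chain rule. Starting from Eq. (6.24) and utilizing the extrapolation equations (6.11) and (6.23), the Jacobians are formulated as

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f^I\left(u^R_{ele-1}, u^L_{ele}, q^R_{ele-1}, q^L_{ele}\right)\right), \\
&= \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}} E^L + \frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}} E^L \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) + \frac{\partial f^{I,L}_{ele}}{\partial q^R_{ele-1}} E^R \frac{\partial}{\partial u_{ele}}\left(q_{ele-1}\right), \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f^I\left(u^R_{ele}, u^L_{ele+1}, q^R_{ele}, q^L_{ele+1}\right)\right), \\
&= \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}} E^R + \frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}} E^R \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) + \frac{\partial f^{I,R}_{ele}}{\partial q^L_{ele+1}} E^L \frac{\partial}{\partial u_{ele}}\left(q_{ele+1}\right),
\end{aligned}
\tag{8.10}
\]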

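where

\[
\begin{aligned}
\frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}} &= \frac{\partial f^I}{\partial u_+}\left(u^R_{ele-1}, u^L_{ele}, q^L_{ele}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}} &= \frac{\partial f^I}{\partial u_-}\left(u^R_{ele}, u^L_{ele+1}, q^R_{ele}\right), \\
\frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}} &= \frac{\partial f^I}{\partial q_+}\left(u^L_{ele}, q^L_{ele}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}} &= \frac{\partial f^I}{\partial q_-}\left(u^R_{ele}, q^R_{ele}\right), \\
\frac{\partial f^{I,L}_{ele}}{\partial q^R_{ele-1}} &= \frac{\partial f^I}{\partial q_-}\left(u^R_{ele-1}, q^R_{ele-1}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial q^L_{ele+1}} &= \frac{\partial f^I}{\partial q_+}\left(u^L_{ele+1}, q^L_{ele+1}\right),
\end{aligned}
\tag{8.11}
\]

and

\[
\begin{aligned}
\frac{\partial f^I}{\partial u_-}\left(u_-, u_+, q_-\right) &= \frac{\partial f^I_{adv}}{\partial u_-}\left(u_-, u_+\right) - \frac{\partial f^I_{diff}}{\partial u_-}\left(u_-, q_-\right), \\
\frac{\partial f^I}{\partial u_+}\left(u_-, u_+, q_+\right) &= \frac{\partial f^I_{adv}}{\partial u_+}\left(u_-, u_+\right) - \frac{\partial f^I_{diff}}{\partial u_+}\left(u_+, q_+\right), \\
\frac{\partial f^I}{\partial q_-}\left(u_-, q_-\right) &= -\frac{\partial f^I_{diff}}{\partial q_-}\left(u_-, q_-\right), \\
\frac{\partial f^I}{\partial q_+}\left(u_+, q_+\right) &= -\frac{\partial f^I_{diff}}{\partial q_+}\left(u_+, q_+\right),
\end{aligned}
\tag{8.12}
\]

are the exact derivatives of the interface flux function $f^I(u_-, u_+, q_-, q_+)$ with respect to $u_-$, $u_+$, $q_-$ and $q_+$, respectively. Much like the flux function, the interface flux function has no dependence between solution points so the resulting common interface flux Jacobians evaluated from the interface flux derivatives are scalars in the one-dimensional case.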

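The derivatives of the interface advective flux function or, in this case, the Rusanov flux are computed by differentiating Eq. (6.25) with respect to $u_-$ and $u_+$,

\[
\begin{aligned}
\frac{\partial f^I_{adv}}{\partial u_-}\left(u_-, u_+\right) &= \frac{1}{2}\frac{\partial f_{adv}}{\partial u}(u_-) - \frac{1}{2}\frac{\partial}{\partial u_-}\left(|\lambda(u_-, u_+)|(u_+ - u_-)\right), \\
\frac{\partial f^I_{adv}}{\partial u_+}\left(u_-, u_+\right) &= \frac{1}{2}\frac{\partial f_{adv}}{\partial u}(u_+) - \frac{1}{2}\frac{\partial}{\partial u_+}\left(|\lambda(u_-, u_+)|(u_+ - u_-)\right),
\end{aligned}
\tag{8.13}
\]

where

\[
\begin{aligned}
\frac{\partial}{\partial u_-}\left(|\lambda(u_-, u_+)|(u_+ - u_-)\right) &= (u_+ - u_-)\frac{\partial}{\partial u_-}\left(|\lambda(u_-, u_+)|\right) - |\lambda(u_-, u_+)|, \\
\frac{\partial}{\partial u_+}\left(|\lambda(u_-, u_+)|(u_+ - u_-)\right) &= (u_+ - u_-)\frac{\partial}{\partial u_+}\left(|\lambda(u_-, u_+)|\right) + |\lambda(u_-, u_+)|,
\end{aligned}
\tag{8.14}
\]

and

\[
\begin{aligned}
\frac{\partial}{\partial u_-}\left(|\lambda(u_-, u_+)|\right) &=
\begin{cases}
\frac{\partial}{\partial u_-}\left(\left|\frac{\partial f_{adv}}{\partial u}(u_-)\right|\right) & \text{if } \left|\frac{\partial f_{adv}}{\partial u}(u_-)\right| > \left|\frac{\partial f_{adv}}{\partial u}(u_+)\right|, \\
0 & \text{otherwise,}
\end{cases} \\
\frac{\partial}{\partial u_+}\left(|\lambda(u_-, u_+)|\right) &=
\begin{cases}
\frac{\partial}{\partial u_+}\left(\left|\frac{\partial f_{adv}}{\partial u}(u_+)\right|\right) & \text{if } \left|\frac{\partial f_{adv}}{\partial u}(u_+)\right| > \left|\frac{\partial f_{adv}}{\partial u}(u_-)\right|, \\
0 & \text{otherwise.}
\end{cases}
\end{aligned}
\]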
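The derivatives in Eqs. (8.13) and (8.14) can be sketched for a concrete scalar flux. The Burgers flux f(u) = u²/2 is an assumption made for this example, as is the Rusanov form ½(f(u₋) + f(u₊)) − ½|λ|(u₊ − u₋) with |λ| taken as the larger of |∂f/∂u| on either side, which is the form consistent with Eq. (8.13); the function names are hypothetical.

```python
def f_adv(u):
    """Burgers flux, used here only as an example advective flux."""
    return 0.5 * u * u


def df_adv(u):
    """Exact derivative of the Burgers flux."""
    return u


def sign(x):
    return (x > 0) - (x < 0)


def wavespeed(um, up):
    return max(abs(df_adv(um)), abs(df_adv(up)))


def rusanov(um, up):
    """Assumed Rusanov interface flux consistent with Eq. (8.13)."""
    return 0.5 * (f_adv(um) + f_adv(up)) - 0.5 * wavespeed(um, up) * (up - um)


def rusanov_jacobians(um, up):
    """Return (df_I/du_minus, df_I/du_plus) following Eqs. (8.13) and (8.14)."""
    lam = wavespeed(um, up)
    # d|f'(u)|/du = sign(f'(u)) * f''(u), and f'' = 1 for the Burgers flux;
    # only the side that sets the maximum wavespeed contributes
    dlam_dum = sign(df_adv(um)) if abs(df_adv(um)) > abs(df_adv(up)) else 0.0
    dlam_dup = sign(df_adv(up)) if abs(df_adv(up)) > abs(df_adv(um)) else 0.0
    dfdum = 0.5 * df_adv(um) - 0.5 * ((up - um) * dlam_dum - lam)
    dfdup = 0.5 * df_adv(up) - 0.5 * ((up - um) * dlam_dup + lam)
    return dfdum, dfdup
```

A central finite difference of `rusanov` with respect to each argument gives an independent check of the analytical Jacobians away from the points where the two wavespeeds are equal.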

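The derivatives of the interface diffusive flux are formed by differentiating the LDG flux in Eq. (6.27),

\[
\begin{aligned}
\frac{\partial f^I_{diff}}{\partial u_-}\left(u_-, q_-\right) &= \frac{1}{2}\frac{\partial f_{diff}}{\partial u}(u_-, q_-) + \beta\frac{\partial f_{diff}}{\partial u}(u_-, q_-) + \tau, \\
\frac{\partial f^I_{diff}}{\partial u_+}\left(u_+, q_+\right) &= \frac{1}{2}\frac{\partial f_{diff}}{\partial u}(u_+, q_+) - \beta\frac{\partial f_{diff}}{\partial u}(u_+, q_+) - \tau, \\
\frac{\partial f^I_{diff}}{\partial q_-}\left(u_-, q_-\right) &= \frac{1}{2}\frac{\partial f_{diff}}{\partial q}(u_-, q_-) + \beta\frac{\partial f_{diff}}{\partial q}(u_-, q_-), \\
\frac{\partial f^I_{diff}}{\partial q_+}\left(u_+, q_+\right) &= \frac{1}{2}\frac{\partial f_{diff}}{\partial q}(u_+, q_+) - \beta\frac{\partial f_{diff}}{\partial q}(u_+, q_+).
\end{aligned}
\tag{8.15}
\]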

Boundary Flux Jacobians

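The Jacobians of the boundary fluxes are found by differentiating Eq. (6.28),

\[
\begin{aligned}
\frac{\partial}{\partial u_1}\left(f^{I,L}_1\right) &= \frac{\partial}{\partial u_1}\left(f^b\left(u^L_1, q^L_1\right)\right), \\
&= \frac{\partial f^{I,L}_1}{\partial u^L_1} E^L + \frac{\partial f^{I,L}_1}{\partial q^L_1} E^L \frac{\partial}{\partial u_1}\left(q_1\right), \\
\frac{\partial}{\partial u_{N_{eles}}}\left(f^{I,R}_{N_{eles}}\right) &= \frac{\partial}{\partial u_{N_{eles}}}\left(f^b\left(u^R_{N_{eles}}, q^R_{N_{eles}}\right)\right), \\
&= \frac{\partial f^{I,R}_{N_{eles}}}{\partial u^R_{N_{eles}}} E^R + \frac{\partial f^{I,R}_{N_{eles}}}{\partial q^R_{N_{eles}}} E^R \frac{\partial}{\partial u_{N_{eles}}}\left(q_{N_{eles}}\right),
\end{aligned}
\tag{8.16}
\]

where

\[
\begin{aligned}
\frac{\partial f^{I,L}_1}{\partial u^L_1} &= \frac{\partial f^b}{\partial u}\left(u^L_1, q^L_1\right), &
\frac{\partial f^{I,R}_{N_{eles}}}{\partial u^R_{N_{eles}}} &= \frac{\partial f^b}{\partial u}\left(u^R_{N_{eles}}, q^R_{N_{eles}}\right), \\
\frac{\partial f^{I,L}_1}{\partial q^L_1} &= \frac{\partial f^b}{\partial q}\left(u^L_1, q^L_1\right), &
\frac{\partial f^{I,R}_{N_{eles}}}{\partial q^R_{N_{eles}}} &= \frac{\partial f^b}{\partial q}\left(u^R_{N_{eles}}, q^R_{N_{eles}}\right),
\end{aligned}
\tag{8.17}
\]

and $\frac{\partial f^b}{\partial u}(u, q)$ and $\frac{\partial f^b}{\partial q}(u, q)$ are the derivatives of the function which applies the boundary condition.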

Contribution from local solution derivative

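Following directly from Eq. (6.22), the Jacobian of the numerical solution derivative with respect to the discontinuous solution in each element can be written as

\[ \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) = -\frac{1}{|J_{ele}|}\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta\xi}\right), \tag{8.18} \]

where $\partial / \partial u_{ele}$ can be applied directly to the solution in Eq. (6.20),

\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta\xi}\right) = D^L_\xi \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_\xi + D^R_\xi \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right). \tag{8.19} \]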

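The Jacobians of the common solution are constructed by applying the chain rule to Eq. (6.13) and differentiating Eq. (6.11),

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^I\left(u^R_{ele-1}, u^L_{ele}\right)\right) = \frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}} E^L, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^I\left(u^R_{ele}, u^L_{ele+1}\right)\right) = \frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}} E^R,
\end{aligned}
\tag{8.20}
\]

where

\[
\frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}} = \frac{\partial u^I}{\partial u_+} = \frac{1}{2} + \beta, \quad
\frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}} = \frac{\partial u^I}{\partial u_-} = \frac{1}{2} - \beta,
\tag{8.21}
\]

are the exact derivatives of the LDG interface solution function $u^I(u_-, u_+)$ with respect to $u_-$, $u_+$, respectively.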

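In the case of a boundary, the Jacobian is found by differentiating Eq. (6.15),

\[
\begin{aligned}
\frac{\partial}{\partial u_1}\left(u^{I,L}_1\right) &= \frac{\partial}{\partial u_1}\left(u^b\left(u^L_1\right)\right) = \frac{\partial u^{I,L}_1}{\partial u^L_1} E^L, \\
\frac{\partial}{\partial u_{N_{eles}}}\left(u^{I,R}_{N_{eles}}\right) &= \frac{\partial}{\partial u_{N_{eles}}}\left(u^b\left(u^R_{N_{eles}}\right)\right) = \frac{\partial u^{I,R}_{N_{eles}}}{\partial u^R_{N_{eles}}} E^R,
\end{aligned}
\tag{8.22}
\]

where

\[
\frac{\partial u^{I,L}_1}{\partial u^L_1} = \frac{\partial u^b}{\partial u}\left(u^L_1\right), \quad
\frac{\partial u^{I,R}_{N_{eles}}}{\partial u^R_{N_{eles}}} = \frac{\partial u^b}{\partial u}\left(u^R_{N_{eles}}\right),
\tag{8.23}
\]

and $\frac{\partial u^b}{\partial u}(u)$ is the derivative of the function which applies the boundary condition for the solution at the specified boundary.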

Contribution from neighboring solution derivatives

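Since neighboring solution derivatives, $q_{ele-1}$ and $q_{ele+1}$, depend on the element local discontinuous solution, the Jacobians for these contributions must be included in the derivation. The Jacobians of the neighboring solution derivatives can be written as

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(q_{ele-1}\right) &= -\frac{1}{|J_{ele-1}|}\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele-1}}{\delta\xi}\right), \\
\frac{\partial}{\partial u_{ele}}\left(q_{ele+1}\right) &= -\frac{1}{|J_{ele+1}|}\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele+1}}{\delta\xi}\right).
\end{aligned}
\tag{8.24}
\]

Applying $\partial / \partial u_{ele}$ to Eq. (6.20) for the neighboring elements produces

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele-1}}{\delta\xi}\right) &= D^R_\xi \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele-1}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele+1}}{\delta\xi}\right) &= D^L_\xi \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele+1}\right),
\end{aligned}
\tag{8.25}
\]

where the Jacobians of the common solution are written as

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele-1}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^I\left(u^R_{ele-1}, u^L_{ele}\right)\right) = \frac{\partial u^{I,R}_{ele-1}}{\partial u^L_{ele}} E^L, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele+1}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^I\left(u^R_{ele}, u^L_{ele+1}\right)\right) = \frac{\partial u^{I,L}_{ele+1}}{\partial u^R_{ele}} E^R,
\end{aligned}
\tag{8.26}
\]

and

\[
\frac{\partial u^{I,R}_{ele-1}}{\partial u^L_{ele}} = \frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}}, \quad
\frac{\partial u^{I,L}_{ele+1}}{\partial u^R_{ele}} = \frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}}.
\tag{8.27}
\]

The neighboring common interface solution Jacobians can then be evaluated using the results from Eq. (8.21).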

Modified Common Interface Flux Jacobians and Final Result

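The modified common interface flux Jacobians in Eq. (8.6) are constructed by rewriting the Jacobians of the common fluxes in Eq. (8.10) so that the contributions from neighboring solution derivatives are combined with other terms in the equation,

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) &= \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*} E^L + \frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}} E^L \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right) &= \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} E^R + \frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}} E^R \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right),
\end{aligned}
\tag{8.28}
\]

where the modified common interface flux Jacobians, $\frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*}$ and $\frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*}$, are defined to include $\frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}$ and $\frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}$ from Eq. (8.11) as well as the contributions from neighboring solution derivatives in equations (8.10) and (8.24)–(8.27),

\[
\begin{aligned}
\frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*} &= \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}} + \frac{\partial f^{I,L}_{ele}}{\partial q^R_{ele-1}} E^R \left(-\frac{1}{|J_{ele-1}|}\right) D^R_\xi \frac{\partial u^{I,R}_{ele-1}}{\partial u^L_{ele}}, \\
\frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} &= \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}} + \frac{\partial f^{I,R}_{ele}}{\partial q^L_{ele+1}} E^L \left(-\frac{1}{|J_{ele+1}|}\right) D^L_\xi \frac{\partial u^{I,L}_{ele+1}}{\partial u^R_{ele}}.
\end{aligned}
\tag{8.29}
\]

The neighbor contribution Jacobians are now defined as

\[
\begin{aligned}
\frac{\partial q^R_{ele-1}}{\partial u^{I,R}_{ele-1}} &= E^R \left(-\frac{1}{|J_{ele-1}|}\right) D^R_\xi, \\
\frac{\partial q^L_{ele+1}}{\partial u^{I,L}_{ele+1}} &= E^L \left(-\frac{1}{|J_{ele+1}|}\right) D^L_\xi,
\end{aligned}
\tag{8.30}
\]

where it's important to note that the results are scalars.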

By utilizing equations (8.8), (8.28) and (8.18)–(8.20) and grouping terms together,

the Jacobian of the numerical derivative of the flux from Eq. (8.7) can now be split

into two distinct parts and the final result in Eq. (8.3) can be constructed.

8.1.2 Kronecker Product Formulation

The element local Jacobian follows a distinct pattern which comes directly from the

Lagrange basis polynomials used in the DFR method. This allows for a Kronecker

product formulation which is more compact. The numerical derivative of the flux

with respect to the solution and the solution derivative in equations (8.4) and (8.5)

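can be reformulated as

\[ \frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = \frac{\partial f^{I,L}_{ele}}{\partial u^L_{ele}}^{*} \otimes K^L + 1 \otimes \left(K \frac{\partial f_{ele}}{\partial u_{ele}}\right) + \frac{\partial f^{I,R}_{ele}}{\partial u^R_{ele}}^{*} \otimes K^R, \tag{8.31} \]

and

\[
\frac{\delta}{\delta\xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}}\frac{\partial q_{ele}}{\partial u_{ele}}\right) =
\left(\frac{\partial f^{I,L}_{ele}}{\partial q^L_{ele}} \otimes K^L + 1 \otimes \left(K \frac{\partial f_{ele}}{\partial q_{ele}}\right) + \frac{\partial f^{I,R}_{ele}}{\partial q^R_{ele}} \otimes K^R\right)
\left(-\frac{1}{|J_{ele}|}\right)
\left(\frac{\partial u^{I,L}_{ele}}{\partial u^L_{ele}} \otimes K^L + 1 \otimes K + \frac{\partial u^{I,R}_{ele}}{\partial u^R_{ele}} \otimes K^R\right),
\tag{8.32}
\]

where the Kronecker product operators are of size $(N_{spts1D} \times N_{spts1D})$ and come directly from equations (6.21) and (6.12),

\[
\begin{aligned}
K &= D_\xi = \frac{\partial \tilde{\ell}}{\partial\xi}(\xi)', \\
K^L &= D^L_\xi E^L = \frac{\partial \tilde{\ell}_0}{\partial\xi}(\xi)\, \ell(-1)', \\
K^R &= D^R_\xi E^R = \frac{\partial \tilde{\ell}_{P+2}}{\partial\xi}(\xi)\, \ell(+1)'.
\end{aligned}
\tag{8.33}
\]

For clarity, these operators can also be written as,

\[
\begin{aligned}
K_{p,m} &= \frac{\partial \tilde{\ell}_m}{\partial\xi}(\xi_p), & p, m &= 1, 2, \ldots, N_{spts1D}, \\
K^L_{p,m} &= \frac{\partial \tilde{\ell}_0}{\partial\xi}(\xi_p)\, \ell_m(-1), & p, m &= 1, 2, \ldots, N_{spts1D}, \\
K^R_{p,m} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial\xi}(\xi_p)\, \ell_m(+1), & p, m &= 1, 2, \ldots, N_{spts1D}.
\end{aligned}
\tag{8.34}
\]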

Using this new notation, the neighbor contribution Jacobians from Eq. (8.30) can

also be reformulated as

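\[
\begin{aligned}
\frac{\partial q^R_{ele-1}}{\partial u^{I,R}_{ele-1}} &= \left(-\frac{1}{|J_{ele-1}|}\right) \sum_{p=1}^{N_{spts1D}} K^R_{p,p}, \\
\frac{\partial q^L_{ele+1}}{\partial u^{I,L}_{ele+1}} &= \left(-\frac{1}{|J_{ele+1}|}\right) \sum_{p=1}^{N_{spts1D}} K^L_{p,p}.
\end{aligned}
\tag{8.35}
\]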

In the one-dimensional case, Kronecker products do very little to simplify the equation

but by rewriting the equation in terms of Kronecker products, it’s much easier to see

the extension to multiple dimensions.
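The one-dimensional operators in Eq. (8.34) can be sketched directly from their definitions. This example assumes that the extended basis is the set of Lagrange polynomials through the two interface points, -1 and +1, together with the solution points (indices 0 to P+2), and that the basis without a tilde is the Lagrange basis through the solution points alone; the Gauss-Legendre points used in the usage note and all function names are also assumptions.

```python
import math


def lagrange(nodes, m, x):
    """Value at x of the Lagrange polynomial attached to nodes[m]."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != m:
            value *= (x - xj) / (nodes[m] - xj)
    return value


def lagrange_deriv(nodes, m, x):
    """Derivative at x of the Lagrange polynomial attached to nodes[m]."""
    total = 0.0
    for i, xi in enumerate(nodes):
        if i == m:
            continue
        term = 1.0 / (nodes[m] - xi)
        for j, xj in enumerate(nodes):
            if j != m and j != i:
                term *= (x - xj) / (nodes[m] - xj)
        total += term
    return total


def dfr_operators(spts):
    """Return (K, KL, KR) of Eq. (8.34) for the given solution points."""
    n = len(spts)
    ext = [-1.0] + list(spts) + [1.0]  # extended nodes, indices 0 .. P+2
    K = [[lagrange_deriv(ext, m + 1, spts[p]) for m in range(n)] for p in range(n)]
    KL = [[lagrange_deriv(ext, 0, spts[p]) * lagrange(spts, m, -1.0)
           for m in range(n)] for p in range(n)]
    KR = [[lagrange_deriv(ext, n + 1, spts[p]) * lagrange(spts, m, 1.0)
           for m in range(n)] for p in range(n)]
    return K, KL, KR
```

As a consistency check, for `spts = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]` and a polynomial of degree P sampled at those points, the sum K + KL + KR applied to the samples reproduces the exact derivative, because the interface values are then just the extrapolated polynomial values.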

8.2 Two- and Three-Dimensional Formulation for

Advection-Diffusion

In this section, the two- and three-dimensional element local Jacobian formulations for advection-diffusion are written in a generalized form to avoid repetition. To support this, a number of sets are defined in Table 8.2. These are used throughout the section.

Following from Eq. (6.71), the two- or three-dimensional DFR formulation of the

semi-discrete advection-diffusion equation in each element can be written as

rele = r(uele,ueleN) = −V −1ele ∇

δ · fele, (8.36)


Variable 2D 3D

Dx x, y x, y, zDξ ξ, η ξ, η, ζDLξ ξ,L, η,B ξ,L, η,F , ζ,BDRξ ξ,R, η,T ξ,R, η,Bk, ζ,T FL L,B L,F ,BFR R,T R,Bk,T

Table 8.2: Two- and three-dimensional sets

where r(uele,ueleN) is the residual. Differentiating with respect to the discontinuous

solution vector, uele, produces the element local Jacobian matrix,

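\[
\frac{\partial r_{ele}}{\partial u_{ele}} = -V^{-1}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right), \tag{8.37}
\]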

where the Jacobian of the numerical gradient of the flux is split into two distinct

parts by applying a chain rule,

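\[
\frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right) = \nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) + \nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right). \tag{8.38}
\]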

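In the two-dimensional case, the element local Jacobian, $\partial r_{ele}/\partial u_{ele}$, is a matrix of size ($N_{spts2D} \times N_{spts2D}$). The numerical gradient of the flux with respect to the solution is defined as

\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) ={}&
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L}
+ D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}}
+ D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi,ele}}{\partial u^{R}_{ele}}^{*} E^{R} \\
&+ D^{B}_{\eta} \frac{\partial f^{I,B}_{\eta,ele}}{\partial u^{B}_{ele}}^{*} E^{B}
+ D_{\eta} \frac{\partial f_{\eta,ele}}{\partial u_{ele}}
+ D^{T}_{\eta} \frac{\partial f^{I,T}_{\eta,ele}}{\partial u^{T}_{ele}}^{*} E^{T},
\end{aligned}
\tag{8.39}
\]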


and the numerical gradient of the flux with respect to the solution gradient is

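\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x,y\}}
&\left(
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial q^{L}_{d_j,ele}} E^{L}
+ D_{\xi} \frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}
+ D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi,ele}}{\partial q^{R}_{d_j,ele}} E^{R}
\right. \\
&\left.
+ D^{B}_{\eta} \frac{\partial f^{I,B}_{\eta,ele}}{\partial q^{B}_{d_j,ele}} E^{B}
+ D_{\eta} \frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}
+ D^{T}_{\eta} \frac{\partial f^{I,T}_{\eta,ele}}{\partial q^{T}_{d_j,ele}} E^{T}
\right) \\
&\left(
J^{-1\prime}_{(d_j,\xi)\,ele}
\left( D^{L}_{\xi} \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + D_{\xi} + D^{R}_{\xi} \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R} \right)
\right. \\
&\left.
+ J^{-1\prime}_{(d_j,\eta)\,ele}
\left( D^{B}_{\eta} \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} E^{B} + D_{\eta} + D^{T}_{\eta} \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}} E^{T} \right)
\right).
\end{aligned}
\tag{8.40}
\]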

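In the three-dimensional case, the size of $\partial r_{ele}/\partial u_{ele}$ is ($N_{spts} \times N_{spts}$). The numerical gradient of the flux with respect to the solution is

\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) ={}&
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L}
+ D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}}
+ D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi,ele}}{\partial u^{R}_{ele}}^{*} E^{R} \\
&+ D^{F}_{\eta} \frac{\partial f^{I,F}_{\eta,ele}}{\partial u^{F}_{ele}}^{*} E^{F}
+ D_{\eta} \frac{\partial f_{\eta,ele}}{\partial u_{ele}}
+ D^{Bk}_{\eta} \frac{\partial f^{I,Bk}_{\eta,ele}}{\partial u^{Bk}_{ele}}^{*} E^{Bk} \\
&+ D^{B}_{\zeta} \frac{\partial f^{I,B}_{\zeta,ele}}{\partial u^{B}_{ele}}^{*} E^{B}
+ D_{\zeta} \frac{\partial f_{\zeta,ele}}{\partial u_{ele}}
+ D^{T}_{\zeta} \frac{\partial f^{I,T}_{\zeta,ele}}{\partial u^{T}_{ele}}^{*} E^{T},
\end{aligned}
\tag{8.41}
\]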

and the numerical gradient of the flux with respect to the solution gradient is

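\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x,y,z\}}
&\left(
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial q^{L}_{d_j,ele}} E^{L}
+ D_{\xi} \frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}
+ D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi,ele}}{\partial q^{R}_{d_j,ele}} E^{R}
\right. \\
&+ D^{F}_{\eta} \frac{\partial f^{I,F}_{\eta,ele}}{\partial q^{F}_{d_j,ele}} E^{F}
+ D_{\eta} \frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}
+ D^{Bk}_{\eta} \frac{\partial f^{I,Bk}_{\eta,ele}}{\partial q^{Bk}_{d_j,ele}} E^{Bk} \\
&\left.
+ D^{B}_{\zeta} \frac{\partial f^{I,B}_{\zeta,ele}}{\partial q^{B}_{d_j,ele}} E^{B}
+ D_{\zeta} \frac{\partial f_{\zeta,ele}}{\partial q_{d_j,ele}}
+ D^{T}_{\zeta} \frac{\partial f^{I,T}_{\zeta,ele}}{\partial q^{T}_{d_j,ele}} E^{T}
\right) \\
&\left(
J^{-1\prime}_{(d_j,\xi)\,ele}
\left( D^{L}_{\xi} \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + D_{\xi} + D^{R}_{\xi} \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R} \right)
\right. \\
&+ J^{-1\prime}_{(d_j,\eta)\,ele}
\left( D^{F}_{\eta} \frac{\partial u^{I,F}_{ele}}{\partial u^{F}_{ele}} E^{F} + D_{\eta} + D^{Bk}_{\eta} \frac{\partial u^{I,Bk}_{ele}}{\partial u^{Bk}_{ele}} E^{Bk} \right) \\
&\left.
+ J^{-1\prime}_{(d_j,\zeta)\,ele}
\left( D^{B}_{\zeta} \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} E^{B} + D_{\zeta} + D^{T}_{\zeta} \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}} E^{T} \right)
\right).
\end{aligned}
\tag{8.42}
\]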

The modified common interface flux Jacobians contain contributions from neigh-

boring solution derivatives and are defined as

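\[
\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}^{*}
= \frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}
+ \sum_{d_j \in \mathcal{D}_x}
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{faceN}_{d_j,eleN}}
\frac{\partial q^{faceN}_{d_j,eleN}}{\partial u^{I,faceN}_{eleN}}
\frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}},
\qquad \text{for } d_i, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi}. \tag{8.43}
\]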

Table 8.3 defines the Jacobians which are diagonal matrices primarily constructed

through function evaluations. These are described in more detail in section 8.2.1.

Name             Variable                                                                                                  2D Size                  3D Size
D. Flux          ∂f_{d,ele}/∂u_{ele}, ∂f_{d,ele}/∂q_{ele}                                                                  (Nspts2D × Nspts2D)      (Nspts × Nspts)
C. Flux          ∂f^{I,face}_{d,ele}/∂u^{face}_{ele}, ∂f^{I,face}_{d_i,ele}/∂q^{face}_{d_j,ele}, ∂f^{I,face}_{d_i,ele}/∂q^{faceN}_{d_j,eleN}   (Nspts1D × Nspts1D)      (Nspts2D × Nspts2D)
C. Solution      ∂u^{I,face}_{ele}/∂u^{face}_{ele}, ∂u^{I,faceN}_{eleN}/∂u^{face}_{ele}
M. C. Flux       (∂f^{I,face}_{d,ele}/∂u^{face}_{ele})^*                                                                   (Nspts1D × Nspts1D)      (Nspts2D × Nspts2D)
N. Contribution  ∂q^{faceN}_{d_j,eleN}/∂u^{I,faceN}_{eleN}

Table 8.3: Diagonal Jacobian matrices used in the construction of the two- and three-dimensional element local Jacobian matrix for advection-diffusion. The full names of the Jacobian matrices can be found in Table 8.1.


8.2.1 Derivation

The Jacobian of the numerical gradient of the flux follows directly from Eq. (6.87),

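\[
\frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right)
= \sum_{d \in \mathcal{D}_{\xi}} D_{d}\, \frac{\partial}{\partial u_{ele}}\left(f_{d,ele}\right)
+ \sum_{d, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi}} D^{face}_{d}\, \frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d,ele}\right). \tag{8.44}
\]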

The derivation can now be split into sections where the Jacobians in Table 8.3 are

constructed.

Discontinuous Flux Jacobians

The Jacobian of the transformed discontinuous flux in block matrix-vector format is

found directly from the transformation in Eq. (6.88),

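\[
\frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) = J^{-1}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(f(u_{ele}, q_{ele})\right). \tag{8.45}
\]

This can be written as

\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i,ele}\right) = \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)\,ele}\, \frac{\partial}{\partial u_{ele}}\left(f_{d_k}(u_{ele}, q_{ele})\right), \qquad d_i \in \mathcal{D}_{\xi}, \tag{8.46}
\]

where the Jacobian of the discontinuous flux is found by applying the chain rule to the flux function,

\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i}(u_{ele}, q_{ele})\right) = \frac{\partial f_{d_i,ele}}{\partial u_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f_{d_i,ele}}{\partial q_{d_j,ele}}\, \frac{\partial}{\partial u_{ele}}\left(q_{d_j,ele}\right), \qquad d_i \in \mathcal{D}_x, \tag{8.47}
\]

and

\[
\begin{aligned}
\frac{\partial f_{d_i,ele}}{\partial u_{ele}} &= \frac{\partial f_{d_i}}{\partial u}(u_{ele}, q_{ele}) = \frac{\partial f_{d_i,adv}}{\partial u}(u_{ele}) - \frac{\partial f_{d_i,diff}}{\partial u}(u_{ele}, q_{ele}), \\
\frac{\partial f_{d_i,ele}}{\partial q_{d_j,ele}} &= \frac{\partial f_{d_i}}{\partial q_{d_j}}(u_{ele}, q_{ele}) = -\frac{\partial f_{d_i,diff}}{\partial q_{d_j}}(u_{ele}, q_{ele}), \qquad d_i, d_j \in \mathcal{D}_x.
\end{aligned}
\tag{8.48}
\]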


Note that the discontinuous flux Jacobians, $\partial f_{d_i,ele}/\partial u_{ele}$ and $\partial f_{d_i,ele}/\partial q_{d_j,ele}$, are diagonal matrices because there is no dependence among solution points.
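As a worked illustration of Eq. (8.48), consider linear advection-diffusion with a constant advection velocity $a_{d_i}$ and a constant diffusion coefficient $\nu$. Both symbols are assumptions introduced here for the example and are not taken from the surrounding text. With $f_{d_i,adv} = a_{d_i} u$ and $f_{d_i,diff} = \nu\, q_{d_i}$, the discontinuous flux Jacobians reduce to scalar multiples of the identity,

\[
\frac{\partial f_{d_i,ele}}{\partial u_{ele}} = a_{d_i}\, I, \qquad
\frac{\partial f_{d_i,ele}}{\partial q_{d_j,ele}} =
\begin{cases}
-\nu\, I & \text{if } d_i = d_j, \\
0 & \text{otherwise},
\end{cases}
\qquad d_i, d_j \in \mathcal{D}_x,
\]

which makes the diagonal structure explicit: each solution point contributes only to its own row.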

Common Interface Flux Jacobians

The Jacobians of the transformed common fluxes are found by differentiating Eq. (6.86),

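\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d,ele}\right) &= A^{face}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{faceN}_{eleN}, u^{face}_{ele}, q^{faceN}_{eleN}, q^{face}_{ele}\right)\right), & &\text{for } d, face \in \mathcal{D}^{L}_{\xi}, \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d,ele}\right) &= A^{face}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{face}_{ele}, u^{faceN}_{eleN}, q^{face}_{ele}, q^{faceN}_{eleN}\right)\right), & &\text{for } d, face \in \mathcal{D}^{R}_{\xi}.
\end{aligned}
\tag{8.49}
\]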

Applying the chain rule and differentiating the extrapolation equations (6.75) and (6.85)

produces the following result,

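\[
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d_i,ele}\right)
= \frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}} E^{face}
+ \sum_{d_j \in \mathcal{D}_x} \left(
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{face}_{d_j,ele}} E^{face}\, \frac{\partial}{\partial u_{ele}}\left(q_{d_j,ele}\right)
+ \frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{faceN}_{d_j,eleN}} E^{faceN}\, \frac{\partial}{\partial u_{ele}}\left(q_{d_j,eleN}\right)
\right), \tag{8.50}
\]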


where

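\[
\begin{aligned}
\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial u^{+}}\left(u^{faceN}_{eleN}, u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{face}_{d_j,ele}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial q^{+}_{d_j}}\left(u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{faceN}_{d_j,eleN}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial q^{-}_{d_j}}\left(u^{faceN}_{eleN}, q^{faceN}_{eleN}\right),
\qquad \text{for } d_i, face \in \mathcal{D}^{L}_{\xi},\ d_j \in \mathcal{D}_x, \\
\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial u^{-}}\left(u^{face}_{ele}, u^{faceN}_{eleN}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{face}_{d_j,ele}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial q^{-}_{d_j}}\left(u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{faceN}_{d_j,eleN}} &= A^{face}_{ele}\, \frac{\partial f^{I}}{\partial q^{+}_{d_j}}\left(u^{faceN}_{eleN}, q^{faceN}_{eleN}\right),
\qquad \text{for } d_i, face \in \mathcal{D}^{R}_{\xi},\ d_j \in \mathcal{D}_x,
\end{aligned}
\tag{8.51}
\]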

and

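\[
\begin{aligned}
\frac{\partial f^{I}}{\partial u^{-}}\left(u^{-}, u^{+}, q^{-}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right), \\
\frac{\partial f^{I}}{\partial u^{+}}\left(u^{-}, u^{+}, q^{+}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right), \\
\frac{\partial f^{I}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right), \qquad d \in \mathcal{D}_x, \\
\frac{\partial f^{I}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right), \qquad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.52}
\]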

are the exact derivatives of the interface flux function $f^{I}(u^{-}, u^{+}, q^{-}, q^{+})$ with respect to $u^{-}$, $u^{+}$, $q^{-}_{d}$ and $q^{+}_{d}$ for $d \in \mathcal{D}_x$, respectively. Note that the common interface flux Jacobians, $\partial f^{I,face}_{d_i,ele}/\partial u^{face}_{ele}$, $\partial f^{I,face}_{d_i,ele}/\partial q^{face}_{d_j,ele}$ and $\partial f^{I,face}_{d_i,ele}/\partial q^{faceN}_{d_j,eleN}$, are diagonal matrices because there is no dependence among flux points.

In the multi-dimensional case, the derivatives of the interface advective flux func-

tion or the Rusanov flux are computed by differentiating Eq. (6.59) with respect to

u− and u+,

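\[
\begin{aligned}
\frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{adv}}{\partial u}(u^{-}) - \frac{1}{2} \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right), \\
\frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{adv}}{\partial u}(u^{+}) - \frac{1}{2} \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right),
\end{aligned}
\tag{8.53}
\]

where $\frac{\partial f^{n}_{adv}}{\partial u}(u) = \frac{\partial f_{adv}}{\partial u}(u) \cdot n^{-}$,

\[
\begin{aligned}
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right) &= (u^{+} - u^{-})\, \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) - |\lambda(u^{-}, u^{+})|, \\
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right) &= (u^{+} - u^{-})\, \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) + |\lambda(u^{-}, u^{+})|,
\end{aligned}
\tag{8.54}
\]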


and

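\[
\begin{aligned}
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) &=
\begin{cases}
\dfrac{\partial}{\partial u}\left(\left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right|\right) & \text{if } \left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right| > \left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right|, \\
0 & \text{otherwise},
\end{cases} \\
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) &=
\begin{cases}
\dfrac{\partial}{\partial u}\left(\left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right|\right) & \text{if } \left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right| > \left|\dfrac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right|, \\
0 & \text{otherwise}.
\end{cases}
\end{aligned}
\]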
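The Rusanov flux Jacobians above can be checked numerically on a scalar problem. The sketch below is illustrative only: it assumes the standard Rusanov form for Eq. (6.59), a one-dimensional Burgers flux $f = u^2/2$ with outward normal $n^{-} = +1$, and hypothetical function names. It evaluates the derivatives following Eqs. (8.53) and (8.54) and compares them against central finite differences.

```python
def flux(u):
    """Assumed scalar advective flux (Burgers), with normal n = +1."""
    return 0.5 * u * u

def dflux(u):
    """Derivative of the assumed flux with respect to u."""
    return u

def sign(x):
    return (x > 0) - (x < 0)

def rusanov(um, up):
    """Rusanov interface flux with wavespeed max(|f'(u-)|, |f'(u+)|)."""
    lam = max(abs(dflux(um)), abs(dflux(up)))
    return 0.5 * (flux(um) + flux(up)) - 0.5 * lam * (up - um)

def rusanov_jacobians(um, up):
    """Return (dfI/du-, dfI/du+) following Eqs. (8.53) and (8.54)."""
    am, ap = abs(dflux(um)), abs(dflux(up))
    lam = max(am, ap)
    # d|lambda|/du is nonzero only on the side with the larger wavespeed;
    # for Burgers, d|f'(u)|/du = sgn(u).
    dlam_m = sign(um) if am > ap else 0.0
    dlam_p = sign(up) if ap > am else 0.0
    d_m = (up - um) * dlam_m - lam   # Eq. (8.54), derivative in u-
    d_p = (up - um) * dlam_p + lam   # Eq. (8.54), derivative in u+
    return 0.5 * dflux(um) - 0.5 * d_m, 0.5 * dflux(up) - 0.5 * d_p

def finite_difference(um, up, h=1e-6):
    """Central-difference reference values for the two derivatives."""
    d_m = (rusanov(um + h, up) - rusanov(um - h, up)) / (2.0 * h)
    d_p = (rusanov(um, up + h) - rusanov(um, up - h)) / (2.0 * h)
    return d_m, d_p

exact = rusanov_jacobians(1.0, -2.0)
approx = finite_difference(1.0, -2.0)
```

The comparison is only meaningful away from states where the two wavespeeds are equal or where $\partial f^{n}_{adv}/\partial u$ changes sign, since the flux is not differentiable there.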

The derivatives of the interface diffusive flux are formed by differentiating the LDG

flux in Eq. (6.61),

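\[
\begin{aligned}
\frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial u}(u^{-}, q^{-}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial u}(u^{-}, q^{-}) \cdot n^{-}\right) + \tau, \\
\frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial u}(u^{+}, q^{+}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial u}(u^{+}, q^{+}) \cdot n^{+}\right) - \tau, \\
\frac{\partial f^{I}_{diff}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{-}, q^{-}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{-}, q^{-}) \cdot n^{-}\right), \qquad d \in \mathcal{D}_x, \\
\frac{\partial f^{I}_{diff}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{+}, q^{+}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{+}, q^{+}) \cdot n^{+}\right), \qquad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.55}
\]

where $\frac{\partial f^{n}_{diff}}{\partial u}(u, q) = \frac{\partial f_{diff}}{\partial u}(u, q) \cdot n^{-}$ and $\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u, q) = \frac{\partial f_{diff}}{\partial q_{d}}(u, q) \cdot n^{-}$.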

Boundary Flux Jacobian

The Jacobian of the common flux at boundaries is found by differentiating Eq. (6.62),

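\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,\partial\Omega}_{ele}\right)
&= A^{\partial\Omega}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(f^{b}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right)\right), \\
&= \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} E^{\partial\Omega}
+ \sum_{d \in \mathcal{D}_x} \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega}_{d,ele}} E^{\partial\Omega}\, \frac{\partial}{\partial u_{ele}}\left(q_{d,ele}\right),
\end{aligned}
\tag{8.56}
\]

where

\[
\begin{aligned}
\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} &= A^{\partial\Omega}_{ele}\, \frac{\partial f^{b}}{\partial u}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right), \\
\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega}_{d,ele}} &= A^{\partial\Omega}_{ele}\, \frac{\partial f^{b}}{\partial q_{d}}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right), \qquad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.57}
\]

and $\frac{\partial f^{b}}{\partial u}(u, q)$ and $\frac{\partial f^{b}}{\partial q_{d}}(u, q)$ are the derivatives of the function which applies the boundary condition.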

Contribution from local solution gradient

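Following directly from Eq. (6.84), the Jacobian of the numerical solution gradient with respect to the discontinuous solution in block matrix-vector format can be written as

\[
\frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) = J^{-1\prime}_{ele}\, \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} u_{ele}\right), \tag{8.58}
\]

where $\frac{\partial}{\partial u_{ele}}$ can be applied directly to the solution in Eq. (6.51) in the two-dimensional case,

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right) &= D^{L}_{\xi}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_{\xi} + D^{R}_{\xi}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \eta}\right) &= D^{B}_{\eta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,B}_{ele}\right) + D_{\eta} + D^{T}_{\eta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,T}_{ele}\right),
\end{aligned}
\tag{8.59}
\]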

and Eq. (6.81) in the three-dimensional case,

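\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right) &= D^{L}_{\xi}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_{\xi} + D^{R}_{\xi}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \eta}\right) &= D^{F}_{\eta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,F}_{ele}\right) + D_{\eta} + D^{Bk}_{\eta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,Bk}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \zeta}\right) &= D^{B}_{\zeta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,B}_{ele}\right) + D_{\zeta} + D^{T}_{\zeta}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,T}_{ele}\right).
\end{aligned}
\tag{8.60}
\]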

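The Jacobians of the common solution values are constructed by differentiating Eq. (6.44),

\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{faceN}_{eleN}, u^{face}_{ele}\right)\right), \qquad face \in \mathcal{F}^{L}, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{face}_{ele}, u^{faceN}_{eleN}\right)\right), \qquad face \in \mathcal{F}^{R}.
\end{aligned}
\tag{8.61}
\]

Applying the chain rule to this equation and differentiating Eq. (6.75) produces the following,

\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) = \frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} E^{face}, \qquad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R}, \tag{8.62}
\]

where

\[
\begin{aligned}
\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} &= \frac{\partial u^{I}}{\partial u^{+}} = \frac{1}{2} - \beta \cdot n^{+}, \qquad face \in \mathcal{F}^{L}, \\
\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} &= \frac{\partial u^{I}}{\partial u^{-}} = \frac{1}{2} - \beta \cdot n^{-}, \qquad face \in \mathcal{F}^{R},
\end{aligned}
\tag{8.63}
\]

are the exact derivatives of the LDG interface solution function $u^{I}(u^{-}, u^{+})$ with respect to $u^{+}$ and $u^{-}$, respectively. It is important to note that the common interface solution Jacobians, $\partial u^{I,face}_{ele}/\partial u^{face}_{ele}$, are diagonal matrices defined by a single scalar.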

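In the case of a boundary, the Jacobian of the common solution is found by differentiating Eq. (6.46),

\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,\partial\Omega}_{ele}\right) = \frac{\partial}{\partial u_{ele}}\left(u^{b}\left(u^{\partial\Omega}_{ele}\right)\right) = \frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} E^{\partial\Omega}, \tag{8.64}
\]

where

\[
\frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} = \frac{\partial u^{b}}{\partial u}\left(u^{\partial\Omega}_{ele}\right), \tag{8.65}
\]

and $\frac{\partial u^{b}}{\partial u}(u)$ is the derivative of the function which applies the boundary condition for the solution at the specified boundary.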


Contribution from neighboring solution gradients

The Jacobian of the neighboring solution gradient can be written as

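\[
\frac{\partial}{\partial u_{ele}}\left(q_{eleN}\right) = J^{-1\prime}_{eleN}\, \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} u_{eleN}\right). \tag{8.66}
\]

Applying $\frac{\partial}{\partial u_{ele}}$ to Eq. (6.81) for the neighboring elements produces

\[
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{eleN}}{\delta d}\right) = D^{faceN}_{d}\, \frac{\partial}{\partial u_{ele}}\left(u^{I,faceN}_{eleN}\right), \qquad d, faceN \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi}, \tag{8.67}
\]

where the Jacobians of the common solution values are written as

\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,faceN}_{eleN}\right) = \frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}} E^{face}, \qquad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R}, \tag{8.68}
\]

and

\[
\frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}} = \frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}}, \qquad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R}. \tag{8.69}
\]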

The neighboring common interface solution Jacobians can then be evaluated using

the results from Eq. (8.63).

Modified Common Interface Flux Jacobians and Final Result

The Jacobian of the transformed discontinuous flux can be rewritten by combining

equations (8.46) and (8.47) and applying the flux transformation directly to the dis-

continuous flux Jacobians,

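\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i,ele}\right) = \frac{\partial f_{d_i,ele}}{\partial u_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f_{d_i,ele}}{\partial q_{d_j,ele}}\, \frac{\partial}{\partial u_{ele}}\left(q_{d_j,ele}\right), \qquad d_i \in \mathcal{D}_{\xi}, \tag{8.70}
\]

where

\[
\begin{aligned}
\frac{\partial f_{d_i,ele}}{\partial u_{ele}} &= \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)\,ele}\, \frac{\partial f_{d_k,ele}}{\partial u_{ele}}, & d_i &\in \mathcal{D}_{\xi}, \\
\frac{\partial f_{d_i,ele}}{\partial q_{d_j,ele}} &= \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)\,ele}\, \frac{\partial f_{d_k,ele}}{\partial q_{d_j,ele}}, & d_i &\in \mathcal{D}_{\xi},\ d_j \in \mathcal{D}_x.
\end{aligned}
\tag{8.71}
\]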


The modified common interface flux Jacobians in Eq. (8.43) are constructed by

including the contributions from the neighboring solution gradients in the common

interface flux Jacobians from Eq. (8.50),

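\[
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d_i,ele}\right)
= \frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}^{*} E^{face}
+ \sum_{d_j \in \mathcal{D}_x} \frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{face}_{d_j,ele}} E^{face}\, \frac{\partial}{\partial u_{ele}}\left(q_{d_j,ele}\right), \tag{8.72}
\]

where the modified common interface flux Jacobians, $\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}^{*}$, are defined to include $\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}$ from Eq. (8.51) as well as the contributions from neighboring solution gradients in equations (8.50) and (8.66)–(8.69),

\[
\frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}^{*}
= \frac{\partial f^{I,face}_{d_i,ele}}{\partial u^{face}_{ele}}
+ \sum_{d_j \in \mathcal{D}_x} \frac{\partial f^{I,face}_{d_i,ele}}{\partial q^{faceN}_{d_j,eleN}}\, E^{faceN} J^{-1\prime}_{(d_j,d_N)\,eleN} D^{faceN}_{d_N}\, \frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}}, \tag{8.73}
\]

where "$d_N$" refers to the neighboring face's dimension in reference space. Eq. (8.73) can be further simplified by grouping terms which do not depend on the solution,

\[
\frac{\partial q^{faceN}_{d_j,eleN}}{\partial u^{I,faceN}_{eleN}} = E^{faceN} J^{-1\prime}_{(d_j,d_N)\,eleN} D^{faceN}_{d_N}, \qquad \text{for } d_j \in \mathcal{D}_x,\ faceN \in \mathcal{F}^{L} \cup \mathcal{F}^{R}, \tag{8.74}
\]

where $\frac{\partial q^{faceN}_{d_j,eleN}}{\partial u^{I,faceN}_{eleN}}$ are diagonal matrices which are permuted according to the matching local element flux point locations. This leads directly to the construction of Eq. (8.43).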

By utilizing equations (8.70), (8.72) and (8.58)–(8.62) and grouping terms to-

gether, the Jacobian of the numerical gradient of the flux from Eq. (8.44) can now be

split into two distinct parts and the final result in Eq. (8.38) can be constructed.


8.2.2 Two-Dimensional Kronecker Product Formulation

The two-dimensional element local Jacobian can be written more compactly by us-

ing Kronecker products. First, consider the Kronecker product formulations of the

polynomial differentiation and extrapolation operators. The extrapolation operators

in equations (6.41) and (6.43) can be written as

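\[
\begin{aligned}
E^{L} &= \ell(\eta)' \otimes \ell(-1)', & E^{R} &= \ell(\eta)' \otimes \ell(+1)', \\
E^{B} &= \ell(-1)' \otimes \ell(\xi)', & E^{T} &= \ell(+1)' \otimes \ell(\xi)'.
\end{aligned}
\tag{8.75}
\]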

The differentiation operators in equations (6.49) and (6.52) are written as

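\[
D_{\xi} = \ell(\eta)' \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', \qquad
D_{\eta} = \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes \ell(\xi)', \tag{8.76}
\]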

and from equations (6.50) and (6.53),

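\[
\begin{aligned}
D^{L}_{\xi} &= \ell(\eta)' \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= \ell(\eta)' \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{B}_{\eta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes \ell(\xi)', & D^{T}_{\eta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes \ell(\xi)'.
\end{aligned}
\tag{8.77}
\]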

In order to simplify these expressions, we invoke the following property for Lagrange

polynomials,

\[
\ell_m(\xi_p) =
\begin{cases}
1 & \text{if } m = p, \\
0 & \text{otherwise},
\end{cases}
\qquad p,m = 1, 2, \dots, N_{spts1D}, \tag{8.78}
\]

so that equations (8.75), (8.76) and (8.77) become,

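\[
\begin{aligned}
E^{L} &= I \otimes \ell(-1)', & E^{R} &= I \otimes \ell(+1)', \\
E^{B} &= \ell(-1)' \otimes I, & E^{T} &= \ell(+1)' \otimes I, \\
D_{\xi} &= I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', & D_{\eta} &= \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes I, \\
D^{L}_{\xi} &= I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{B}_{\eta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes I, & D^{T}_{\eta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes I,
\end{aligned}
\tag{8.79}
\]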

where I is an identity matrix of size (Nspts1D ×Nspts1D).


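The discontinuous flux Jacobians can also be rewritten in terms of Kronecker products. As an example, consider expanding the discontinuous flux Jacobians, $\partial f_{d_i,ele}/\partial u_{ele}$, into block diagonals with each diagonal containing $N_{spts1D}$ values corresponding to the collinear points along the direction $d_i \in \{\xi, \eta\}$,

\[
\frac{\partial f_{\xi,ele}}{\partial u_{ele}} = \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j}, \qquad
\frac{\partial f_{\eta,ele}}{\partial u_{ele}} = \sum_{i=1}^{N_{spts1D}} \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i} \otimes e_i e_i', \tag{8.80}
\]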

where ei represents a vector of size (Nspts1D × 1) with a 1 in the ith index and 0 in

all other indices. Now consider a product of the polynomial differentiation operator

and one of the discontinuous flux Jacobians,

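\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}}
&= \left(I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)'\right) \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j}, \\
&= \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(\frac{\partial \tilde{\ell}}{\partial \xi}(\xi)' \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j}\right),
\end{aligned}
\tag{8.81}
\]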

where Eq. (8.79) is used and the mixed product property of Kronecker products is

invoked. The same operation can be performed on the other product except the

Kronecker product is in reverse. Substituting the Kronecker product operator from


Eq. (8.33) into both formulations produces the following,

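\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j}\right), \\
D_{\eta} \frac{\partial f_{\eta,ele}}{\partial u_{ele}} &= \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i}\right) \otimes e_i e_i'.
\end{aligned}
\tag{8.82}
\]

This operation can be repeated for the discontinuous flux Jacobians, $\partial f_{d_i,ele}/\partial q_{d_j,ele}$.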
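The block-diagonal expansion behind Eq. (8.82) can be verified on a small case. The sketch below is illustrative rather than the author's code: it uses nested lists, placeholder integer values for $K$ and for the diagonal Jacobian, and assumes the solution points are ordered with the ξ index running fastest. All helper names are hypothetical.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def add(A, B):
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

def unit_outer(n, j):
    """e_j e_j': a single 1 at position (j, j)."""
    return diag([1 if i == j else 0 for i in range(n)])

N = 3                                     # N_spts1D for P = 2
K = [[1, 2, 3], [4, 5, 6], [7, 8, 10]]    # placeholder derivative operator
d = list(range(1, N * N + 1))             # placeholder diagonal Jacobian values
identity = diag([1] * N)
zero = [[0] * (N * N) for _ in range(N * N)]

# xi direction: D_xi = I (x) K, blocks are contiguous slices of the diagonal.
lhs_xi = matmul(kron(identity, K), diag(d))
rhs_xi = zero
for j in range(N):
    block = matmul(K, diag(d[N * j:N * j + N]))
    rhs_xi = add(rhs_xi, kron(unit_outer(N, j), block))

# eta direction: D_eta = K (x) I, blocks are strided slices of the diagonal.
lhs_eta = matmul(kron(K, identity), diag(d))
rhs_eta = zero
for i in range(N):
    block = matmul(K, diag(d[i::N]))
    rhs_eta = add(rhs_eta, kron(block, unit_outer(N, i)))
```

Each small block costs one $N_{spts1D} \times N_{spts1D}$ product instead of a product of full $N_{spts2D} \times N_{spts2D}$ matrices, which is the source of the savings discussed in section 8.2.4.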

Common Interface Jacobians

The product of the polynomial differentiation operator, common interface Jacobian

and polynomial extrapolation operator forms a Kronecker product as well. As an

example consider the following reformulation,

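\[
\begin{aligned}
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L}
&= \left(I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right) \left(\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} \otimes 1\right) \left(I \otimes \ell(-1)'\right), \\
&= \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} \otimes \left(\frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\, \ell(-1)'\right),
\end{aligned}
\tag{8.83}
\]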

where Eq. (8.79) is used, the mixed product property of Kronecker products is invoked

and a scalar 1 is placed in the appropriate location to complete the product. The

Kronecker product operator from Eq. (8.33) can now be substituted so that,

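\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L} = \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} \otimes K^{L}. \tag{8.84}
\]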

This type of operation is repeated for all products involving common interface Jaco-

bians.
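The mixed product property, $(A \otimes B)(C \otimes D) = (AC) \otimes (BD)$, is what collapses the three-matrix product in Eq. (8.83) into the single Kronecker product of Eq. (8.84). A minimal numerical check is sketched below with placeholder integer vectors standing in for $\partial \tilde{\ell}_0/\partial \xi(\xi)$ and $\ell(-1)'$; the values and helper names are hypothetical and chosen only so that the comparison is exact.

```python
def kron(A, B):
    """Kronecker product of two matrices stored as nested lists."""
    return [[a * b for a in row_a for b in row_b] for row_a in A for row_b in B]

def matmul(A, B):
    """Plain matrix product of nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def diag(values):
    n = len(values)
    return [[values[i] if i == j else 0 for j in range(n)] for i in range(n)]

N = 3
identity = diag([1] * N)
dl0 = [[2], [3], [5]]          # placeholder column: d(l~_0)/d(xi) at the solution points
ell_left = [[7, 11, 13]]       # placeholder row: l(-1)'
jac = diag([1, 4, 9])          # placeholder diagonal interface flux Jacobian

DL = kron(identity, dl0)       # (N^2 x N) correction operator, Eq. (8.79)
EL = kron(identity, ell_left)  # (N x N^2) extrapolation operator, Eq. (8.79)
middle = kron(jac, [[1]])      # scalar 1 completes the Kronecker product

lhs = matmul(matmul(DL, middle), EL)   # D^L (df/du)* E^L as in Eq. (8.83)
KL = matmul(dl0, ell_left)             # K^L = D^L_xi E^L in 1D, Eq. (8.33)
rhs = kron(jac, KL)                    # (df/du)* (x) K^L as in Eq. (8.84)
```

The left-hand side multiplies matrices with $N_{spts2D}$ rows or columns, while the right-hand side only needs the small operator $K^{L}$, which can be precomputed once.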


Neighbor Contribution Jacobians

The neighbor contribution Jacobians from Eq. (8.74) are also reformulated. As an example, consider expanding $J^{-1\prime}_{(d_j,\xi)\,eleN}$ into block diagonal components as was performed on the discontinuous flux Jacobians,

\[
J^{-1\prime}_{(d_j,\xi)\,eleN} = \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j}, \qquad d_j \in \mathcal{D}_x. \tag{8.85}
\]

Now consider the left face of a neighbor. The neighbor contribution Jacobians can

be expressed as

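\[
\begin{aligned}
\frac{\partial q^{L}_{d_j,eleN}}{\partial u^{I,L}_{eleN}}
&= E^{L} J^{-1\prime}_{(d_j,\xi)\,eleN} D^{L}_{\xi}, \qquad d_j \in \mathcal{D}_x, \\
&= \left(I \otimes \ell(-1)'\right) \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j} \left(I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right), \\
&= \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(\ell(-1)' \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right),
\end{aligned}
\tag{8.86}
\]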

where Eq. (8.79) is used and the mixed product property of Kronecker products is

invoked. The final result after utilizing the Kronecker product operator in Eq. (8.33)

is

\[
\frac{\partial q^{L}_{d_j,eleN}}{\partial u^{I,L}_{eleN}} = \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_j e_j' \otimes K^{L}_{i,i} \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{i,j}, \qquad d_j \in \mathcal{D}_x. \tag{8.87}
\]

This type of operation is repeated for all products involving neighbor contribution Jacobians. It is also important to note that the resulting Jacobians need to be permuted to match the local element flux points.


Final Result

By utilizing the Kronecker product formulations in equations (8.79), (8.82) and (8.84),

Eq. (8.39) is rewritten as

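\[
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right) + K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right), \tag{8.88}
\]

where

\[
\begin{aligned}
K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right) &= \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} \otimes K^{L} + \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j}\right) + \frac{\partial f^{I,R}_{\xi,ele}}{\partial u^{R}_{ele}}^{*} \otimes K^{R}, \\
K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right) &= K^{L} \otimes \frac{\partial f^{I,B}_{\eta,ele}}{\partial u^{B}_{ele}}^{*} + \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i}\right) \otimes e_i e_i' + K^{R} \otimes \frac{\partial f^{I,T}_{\eta,ele}}{\partial u^{T}_{ele}}^{*},
\end{aligned}
\tag{8.89}
\]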

and Eq. (8.40) is rewritten as

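\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x,y\}}
&\left(K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right) + K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right)\right) \\
&\left(J^{-1\prime}_{(d_j,\xi)\,ele}\, K_{\xi}\!\left(\frac{\partial q_{\xi,ele}}{\partial u_{ele}}\right) + J^{-1\prime}_{(d_j,\eta)\,ele}\, K_{\eta}\!\left(\frac{\partial q_{\eta,ele}}{\partial u_{ele}}\right)\right),
\end{aligned}
\tag{8.90}
\]

where

\[
\begin{aligned}
K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right) &= \frac{\partial f^{I,L}_{\xi,ele}}{\partial q^{L}_{d_j,ele}} \otimes K^{L} + \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right|_{j}\right) + \frac{\partial f^{I,R}_{\xi,ele}}{\partial q^{R}_{d_j,ele}} \otimes K^{R}, \\
K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right) &= K^{L} \otimes \frac{\partial f^{I,B}_{\eta,ele}}{\partial q^{B}_{d_j,ele}} + \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right|_{i}\right) \otimes e_i e_i' + K^{R} \otimes \frac{\partial f^{I,T}_{\eta,ele}}{\partial q^{T}_{d_j,ele}}, \\
K_{\xi}\!\left(\frac{\partial q_{\xi,ele}}{\partial u_{ele}}\right) &= \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} \otimes K^{L} + I \otimes K + \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} \otimes K^{R}, \\
K_{\eta}\!\left(\frac{\partial q_{\eta,ele}}{\partial u_{ele}}\right) &= K^{L} \otimes \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} + K \otimes I + K^{R} \otimes \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}}.
\end{aligned}
\tag{8.91}
\]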


Sparsity Pattern

The Kronecker product formulation highlights the unique dependency between the

sparsity pattern of the matrix and the dimensions of the numerical derivatives. There

are two distinct structures that emerge from the formulation: Kξ (x) and Kη (x). Fig-

ure 8.1 shows an example of the sparsity pattern for these structures for a polynomial

order of P = 2. These patterns dictate the structure of the element local Jacobian.

Figure 8.1: The two distinct sparsity patterns used in the two-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panels: (a) $K_{\xi}(x)$, number of nonzeros: 27; (b) $K_{\eta}(x)$.

The first term, $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$, defined in Eq. (8.88), is simply the summation of these two structures. Figure 8.2 shows the sparsity pattern for $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$ for P = 2.

The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.90), can be expanded into four terms: two like-terms, $K_{\xi}(x) K_{\xi}(y)$ and $K_{\eta}(x) K_{\eta}(y)$, and two cross-terms, $K_{\xi}(x) K_{\eta}(y)$ and $K_{\eta}(x) K_{\xi}(y)$. The two like-terms do not change the sparsity pattern of the matrix. The cross-terms, however, produce the dense matrix shown in Figure 8.3.
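These sparsity statements can be reproduced with a purely structural calculation. The sketch below assumes that every one-dimensional factor ($K$, $K^{L}$, $K^{R}$) is fully dense and every Jacobian factor is a full diagonal, and it ignores accidental numerical cancellation; the helper names are hypothetical.

```python
def kron_pattern(A, B):
    """Sparsity pattern of a Kronecker product of two boolean patterns."""
    return [[a and b for a in row_a for b in row_b] for row_a in A for row_b in B]

def union(A, B):
    """Pattern of a sum of two matrices."""
    return [[x or y for x, y in zip(ra, rb)] for ra, rb in zip(A, B)]

def product_pattern(A, B):
    """Structural pattern of a matrix product (no cancellation assumed)."""
    return [[any(A[i][k] and B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def nnz(A):
    return sum(1 for row in A for x in row if x)

n = 3                                            # N_spts1D for P = 2
dense = [[True] * n for _ in range(n)]
ident = [[i == j for j in range(n)] for i in range(n)]

K_xi = kron_pattern(ident, dense)                # structure of K_xi(x)
K_eta = kron_pattern(dense, ident)               # structure of K_eta(x)

first_term = union(K_xi, K_eta)                  # pattern of Eq. (8.88)
like_term = product_pattern(K_xi, K_xi)          # K_xi(x) K_xi(y)
cross_term = product_pattern(K_xi, K_eta)        # K_xi(x) K_eta(y)

counts = (nnz(K_xi), nnz(K_eta), nnz(first_term), nnz(cross_term))
```

Under these assumptions each structure has 27 nonzeros, their sum has 45, the like-terms keep the pattern of the original structure, and the cross-terms fill all 81 entries of the 9 × 9 matrix.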


Figure 8.2: The summation of the two terms in Figure 8.1 produces the sparsity pattern of the first term, $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$, in the two-dimensional element local Jacobian for DFR, P = 2.

Figure 8.3: The cross-terms $K_{\xi}(x) K_{\eta}(y)$ and $K_{\eta}(x) K_{\xi}(y)$ in the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, produce a dense matrix, so the final sparsity pattern of the two-dimensional element local Jacobian for DFR, P = 2, is fully dense.


8.2.3 Three-Dimensional Kronecker Product Formulation

Following the same procedure described in section 8.2.2, consider applying the prop-

erty in Eq. (8.78) to the three-dimensional Kronecker product formulations of the

polynomial differentiation and extrapolation operators. The extrapolation operators

in equations (6.74) and (6.76) can be written as

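\[
\begin{aligned}
E^{L} &= I \otimes I \otimes \ell(-1)', & E^{R} &= I \otimes I \otimes \ell(+1)', \\
E^{F} &= I \otimes \ell(-1)' \otimes I, & E^{Bk} &= I \otimes \ell(+1)' \otimes I, \\
E^{B} &= \ell(-1)' \otimes I \otimes I, & E^{T} &= \ell(+1)' \otimes I \otimes I.
\end{aligned}
\tag{8.92}
\]

The differentiation operators in equations (6.79) and (6.82) are written as

\[
D_{\xi} = I \otimes I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', \qquad
D_{\eta} = I \otimes \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes I, \qquad
D_{\zeta} = \frac{\partial \tilde{\ell}}{\partial \zeta}(\zeta)' \otimes I \otimes I, \tag{8.93}
\]

and from equations (6.80) and (6.83),

\[
\begin{aligned}
D^{L}_{\xi} &= I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= I \otimes I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{F}_{\eta} &= I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes I, & D^{Bk}_{\eta} &= I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes I, \\
D^{B}_{\zeta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \zeta}(\zeta) \otimes I \otimes I, & D^{T}_{\zeta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \zeta}(\zeta) \otimes I \otimes I.
\end{aligned}
\tag{8.94}
\]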

These operators can then be used to write Kronecker product formulations of the

various Jacobians.


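Consider expanding the discontinuous flux Jacobians, $\partial f_{d_i,ele}/\partial u_{ele}$, into block diagonals with each diagonal containing $N_{spts1D}$ values corresponding to the collinear points along the direction $d_i \in \{\xi, \eta, \zeta\}$,

\[
\begin{aligned}
\frac{\partial f_{\xi,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j,k}, \\
\frac{\partial f_{\eta,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_k e_k' \otimes \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i,k} \otimes e_i e_i', \\
\frac{\partial f_{\zeta,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left.\frac{\partial f_{\zeta,ele}}{\partial u_{ele}}\right|_{i,j} \otimes e_j e_j' \otimes e_i e_i'.
\end{aligned}
\tag{8.95}
\]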

An example of a product between a polynomial differentiation operator from Eq. (8.93)

and a discontinuous flux Jacobian is given below,

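\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}}
&= \left(I \otimes I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)'\right) \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j,k}, \\
&= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left(\frac{\partial \tilde{\ell}}{\partial \xi}(\xi)' \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j,k}\right),
\end{aligned}
\tag{8.96}
\]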

where the mixed product property of Kronecker products is invoked. The same type

of operation can also be performed on the other products. Substituting the Kronecker

product operator from Eq. (8.33) produces the following,

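\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j,k}\right), \\
D_{\eta} \frac{\partial f_{\eta,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_k e_k' \otimes \left(K \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i,k}\right) \otimes e_i e_i', \\
D_{\zeta} \frac{\partial f_{\zeta,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta,ele}}{\partial u_{ele}}\right|_{i,j}\right) \otimes e_j e_j' \otimes e_i e_i'.
\end{aligned}
\tag{8.97}
\]

This operation can be repeated for the discontinuous flux Jacobians, $\partial f_{d_i,ele}/\partial q_{d_j,ele}$.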



As an example, consider expanding the following common interface Jacobian into

block diagonals,

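\[
\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} = \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j}. \tag{8.98}
\]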

The product between the polynomial differentiation operator from Eq. (8.94), the

common interface Jacobian and the polynomial extrapolation operator from Eq. (8.92)

is given below,

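\[
\begin{aligned}
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L}
&= \left(I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right) \left(\sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes 1\right) \left(I \otimes I \otimes \ell(-1)'\right), \\
&= \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes \left(\frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\, \ell(-1)'\right),
\end{aligned}
\tag{8.99}
\]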

where the mixed product property of Kronecker products is invoked and a scalar 1 is

placed in the appropriate location to complete the product. The Kronecker product

operator from Eq. (8.33) can now be substituted so that,

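\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L} = \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes K^{L}. \tag{8.100}
\]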

This type of operation is repeated for all products involving common interface Jaco-

bians.

Neighbor Contribution Jacobians

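As an example, consider expanding $J^{-1\prime}_{(d_j,\xi)\,eleN}$ into block diagonal components as was performed on the discontinuous flux Jacobians,

\[
J^{-1\prime}_{(d_j,\xi)\,eleN} = \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j,k}, \qquad d_j \in \mathcal{D}_x. \tag{8.101}
\]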


Now consider the left face of a neighbor. The neighbor contribution Jacobians can

be expressed as

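\[
\begin{aligned}
\frac{\partial q^{L}_{d_j,eleN}}{\partial u^{I,L}_{eleN}}
&= E^{L} J^{-1\prime}_{(d_j,\xi)\,eleN} D^{L}_{\xi}, \qquad d_j \in \mathcal{D}_x, \\
&= \left(I \otimes I \otimes \ell(-1)'\right) \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j,k} \left(I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right), \\
&= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left(\ell(-1)' \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{j,k} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right),
\end{aligned}
\tag{8.102}
\]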

where equations (8.94) and (8.92) are used and the mixed product property of Kro-

necker products is invoked. The final result after utilizing the Kronecker product

operator in Eq. (8.33) is

\[
\frac{\partial q^{L}_{d_j,eleN}}{\partial u^{I,L}_{eleN}} = \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes K^{L}_{i,i} \left. J^{-1\prime}_{(d_j,\xi)\,eleN} \right|_{i,j,k}, \qquad d_j \in \mathcal{D}_x. \tag{8.103}
\]

This type of operation is repeated for all products involving neighbor contribution Jacobians. It is also important to note that the resulting Jacobians need to be permuted to match the local element flux points.

Final Result

By utilizing the Kronecker product formulations in equations (8.93), (8.97) and

(8.100), Eq. (8.41) is rewritten as

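\[
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right) + K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right) + K_{\zeta}\!\left(\frac{\partial f_{\zeta,ele}}{\partial u_{ele}}\right), \tag{8.104}
\]

where

\[
\begin{aligned}
K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right) ={}& \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial u_{ele}}\right|_{j,k}\right) \\
&+ \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(\left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes K^{L} + \left.\frac{\partial f^{I,R}_{\xi,ele}}{\partial u^{R}_{ele}}^{*}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right) ={}& \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_k e_k' \otimes \left(K \left.\frac{\partial f_{\eta,ele}}{\partial u_{ele}}\right|_{i,k}\right) \otimes e_i e_i' \\
&+ \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K^{L} \otimes \left.\frac{\partial f^{I,F}_{\eta,ele}}{\partial u^{F}_{ele}}^{*}\right|_{j} + K^{R} \otimes \left.\frac{\partial f^{I,Bk}_{\eta,ele}}{\partial u^{Bk}_{ele}}^{*}\right|_{j}\right), \\
K_{\zeta}\!\left(\frac{\partial f_{\zeta,ele}}{\partial u_{ele}}\right) ={}& \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta,ele}}{\partial u_{ele}}\right|_{i,j}\right) \otimes e_j e_j' \otimes e_i e_i' \\
&+ \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial f^{I,B}_{\zeta,ele}}{\partial u^{B}_{ele}}^{*}\right|_{i} + K^{R} \otimes \left.\frac{\partial f^{I,T}_{\zeta,ele}}{\partial u^{T}_{ele}}^{*}\right|_{i}\right) \otimes e_i e_i',
\end{aligned}
\tag{8.105}
\]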

and Eq. (8.42) is rewritten as

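\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x,y,z\}}
&\left(K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right) + K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right) + K_{\zeta}\!\left(\frac{\partial f_{\zeta,ele}}{\partial q_{d_j,ele}}\right)\right) \\
&\left(J^{-1\prime}_{(d_j,\xi)\,ele}\, K_{\xi}\!\left(\frac{\partial q_{\xi,ele}}{\partial u_{ele}}\right) + J^{-1\prime}_{(d_j,\eta)\,ele}\, K_{\eta}\!\left(\frac{\partial q_{\eta,ele}}{\partial u_{ele}}\right) + J^{-1\prime}_{(d_j,\zeta)\,ele}\, K_{\zeta}\!\left(\frac{\partial q_{\zeta,ele}}{\partial u_{ele}}\right)\right),
\end{aligned}
\tag{8.106}
\]

where

\[
\begin{aligned}
K_{\xi}\!\left(\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right) ={}& \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_k e_k' \otimes e_j e_j' \otimes \left(K \left.\frac{\partial f_{\xi,ele}}{\partial q_{d_j,ele}}\right|_{j,k}\right)
+ \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(\left.\frac{\partial f^{I,L}_{\xi,ele}}{\partial q^{L}_{d_j,ele}}\right|_{j} \otimes K^{L} + \left.\frac{\partial f^{I,R}_{\xi,ele}}{\partial q^{R}_{d_j,ele}}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\!\left(\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right) ={}& \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_k e_k' \otimes \left(K \left.\frac{\partial f_{\eta,ele}}{\partial q_{d_j,ele}}\right|_{i,k}\right) \otimes e_i e_i'
+ \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K^{L} \otimes \left.\frac{\partial f^{I,F}_{\eta,ele}}{\partial q^{F}_{d_j,ele}}\right|_{j} + K^{R} \otimes \left.\frac{\partial f^{I,Bk}_{\eta,ele}}{\partial q^{Bk}_{d_j,ele}}\right|_{j}\right), \\
K_{\zeta}\!\left(\frac{\partial f_{\zeta,ele}}{\partial q_{d_j,ele}}\right) ={}& \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta,ele}}{\partial q_{d_j,ele}}\right|_{i,j}\right) \otimes e_j e_j' \otimes e_i e_i'
+ \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial f^{I,B}_{\zeta,ele}}{\partial q^{B}_{d_j,ele}}\right|_{i} + K^{R} \otimes \left.\frac{\partial f^{I,T}_{\zeta,ele}}{\partial q^{T}_{d_j,ele}}\right|_{i}\right) \otimes e_i e_i', \\
K_{\xi}\!\left(\frac{\partial q_{\xi,ele}}{\partial u_{ele}}\right) ={}& I \otimes I \otimes K + \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(\left.\frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}}\right|_{j} \otimes K^{L} + \left.\frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\!\left(\frac{\partial q_{\eta,ele}}{\partial u_{ele}}\right) ={}& I \otimes K \otimes I + \sum_{j=1}^{N_{spts1D}} e_j e_j' \otimes \left(K^{L} \otimes \left.\frac{\partial u^{I,F}_{ele}}{\partial u^{F}_{ele}}\right|_{j} + K^{R} \otimes \left.\frac{\partial u^{I,Bk}_{ele}}{\partial u^{Bk}_{ele}}\right|_{j}\right), \\
K_{\zeta}\!\left(\frac{\partial q_{\zeta,ele}}{\partial u_{ele}}\right) ={}& K \otimes I \otimes I + \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}}\right|_{i} + K^{R} \otimes \left.\frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}}\right|_{i}\right) \otimes e_i e_i'.
\end{aligned}
\tag{8.107}
\]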

Sparsity Pattern

For the three-dimensional formulation, there are three distinct sparsity patterns that emerge: $K_{\xi}(x)$, $K_{\eta}(x)$ and $K_{\zeta}(x)$. Figure 8.4 shows an example of the sparsity pattern for these structures for a polynomial order of P = 2. The first term, $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$, defined in Eq. (8.104), is the summation of these three structures. Figure 8.5 shows the sparsity pattern for $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$ for P = 2.

The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.106), can be expanded into nine terms, where three terms are like-terms and six terms are cross-terms. The three like-terms do not change the sparsity pattern of the matrix. The cross-terms can be split into three more structures as shown in Figure 8.6. The summation of all the terms produces the final sparsity pattern of the element local Jacobian, as shown in Figure 8.7 for P = 2.
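As a rough check on these patterns, the number of structural nonzeros can be counted by hand. The following count is an illustration that assumes every one-dimensional factor is fully dense and every Jacobian factor is a full diagonal, with $N = N_{spts1D} = P + 1$:

\[
\begin{aligned}
\text{each of } K_{\xi}(x),\ K_{\eta}(x),\ K_{\zeta}(x): &\quad N^{4}, \\
\text{their sum (first term)}: &\quad 3N^{4} - 2N^{3}, \\
\text{each cross-term structure}: &\quad N^{5}, \\
\text{sum of all terms}: &\quad 3N^{5} - 3N^{4} + N^{3} = N^{6} - N^{3}(N-1)^{3}.
\end{aligned}
\]

For P = 2 (N = 3) this gives 81, 189, 243 and 513 nonzeros, respectively, out of $N^{6} = 729$ entries. Unlike the two-dimensional case, the three-dimensional element local Jacobian is therefore not fully dense under these assumptions: entries coupling two solution points that differ in all three reference coordinates remain zero.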


Figure 8.4: The three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panels: (a) $K_{\xi}(x)$; (b) $K_{\eta}(x)$; (c) $K_{\zeta}(x)$.

Figure 8.5: The summation of the three terms in Figure 8.4 produces the sparsity pattern of the first term, $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$, in the three-dimensional element local Jacobian for DFR, P = 2.

8.2.4 Time Complexity

Figure 8.6: The six cross-terms in the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, produce three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panels: (a) $K_{\xi}(x) K_{\eta}(y)$, $K_{\eta}(x) K_{\xi}(y)$; (b) $K_{\xi}(x) K_{\zeta}(y)$, $K_{\zeta}(x) K_{\xi}(y)$; (c) $K_{\eta}(x) K_{\zeta}(y)$, $K_{\zeta}(x) K_{\eta}(y)$.

Figure 8.7: The summation of all terms produces the final sparsity pattern for the three-dimensional element local Jacobian for DFR, P = 2.

When solving the advection or Euler equations, the first term, $\nabla^{\delta} \cdot \left(\partial f_{ele}/\partial u_{ele}\right)$, is the only computation needed to form the element local Jacobians. The time complexity for the original formulation can be found by considering the most expensive term in Eq. (8.39),

\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} E^{L} \;\rightarrow\; P^{d}\, P^{d-1}\, P^{d} \;\rightarrow\; O\!\left(P^{3d-1}\right). \tag{8.108}
\]

For the two- and three-dimensional cases, the time complexity of computing this term is $N_{spts2D} N_{spts1D} N_{spts2D} \approx O(P^{5})$ and $N_{spts} N_{spts2D} N_{spts} \approx O(P^{8})$, respectively. This leads to the general time complexity of $O(P^{3d-1})$. The most expensive term in the Kronecker product formulation in Eq. (8.88) or Eq. (8.104) is

\[
\sum \frac{\partial f^{I,L}_{\xi,ele}}{\partial u^{L}_{ele}}^{*} K^{L} \;\rightarrow\; P^{d-1}\, P\, P \;\rightarrow\; O\!\left(P^{d+1}\right). \tag{8.109}
\]

For the two- and three-dimensional cases, the time complexity of computing this term multiple times is $N^{3}_{spts1D} \approx O(P^{3})$ and $N^{4}_{spts1D} \approx O(P^{4})$, respectively. This leads to the time complexity $O(P^{d+1})$, which is significantly less than that of the original form.

For the advection-diffusion or Navier-Stokes equations, the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, is required to form the element local Jacobians. In this case, the original time complexity is dictated by the matrix multiplication between the two terms in Eq. (8.40). As an example, consider

\[
D_{\xi} \frac{\partial f_{\xi,ele}}{\partial u_{ele}} D_{\xi} \;\rightarrow\; P^{d}\, P^{d}\, P^{d} \;\rightarrow\; O\!\left(P^{3d}\right), \tag{8.110}
\]

which is $N_{spts2D} N_{spts2D} N_{spts2D} \approx O(P^{6})$ in 2D and $N_{spts} N_{spts} N_{spts} \approx O(P^{9})$ in 3D. This leads to the time complexity $O(P^{3d})$. Through the use of Kronecker products, this matrix multiplication is expanded into multiple terms. The time complexity is then dictated by a sum of matrix multiplications, for example,

\[
\sum K \frac{\partial f_{\xi,ele}}{\partial u_{ele}} K \;\rightarrow\; P^{d-1}\, P\, P\, P \;\rightarrow\; O\!\left(P^{d+2}\right). \tag{8.111}
\]

After accounting for the sums, the time complexity is $N^{4}_{spts1D} \approx O(P^{4})$ for 2D and $N^{5}_{spts1D} \approx O(P^{5})$ for 3D. This leads to the general form $O(P^{d+2})$.
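To make the difference between these estimates concrete, the leading-order operation counts can be tabulated for a given polynomial order. The sketch below simply evaluates the four exponents quoted above with $N_{spts1D} = P + 1$; it ignores constants and lower-order terms, and the function name `op_counts` is hypothetical.

```python
def op_counts(P, d):
    """Leading-order multiply counts from Eqs. (8.108)-(8.111).

    P is the polynomial order and d the number of dimensions (2 or 3).
    Constants and lower-order terms are deliberately ignored.
    """
    n = P + 1  # N_spts1D
    return {
        "first_term_original": n ** (3 * d - 1),   # Eq. (8.108)
        "first_term_kronecker": n ** (d + 1),      # Eq. (8.109)
        "second_term_original": n ** (3 * d),      # Eq. (8.110)
        "second_term_kronecker": n ** (d + 2),     # Eq. (8.111)
    }

counts_2d = op_counts(P=2, d=2)
counts_3d = op_counts(P=3, d=3)
speedup_3d = counts_3d["second_term_original"] // counts_3d["second_term_kronecker"]
```

For P = 3 in three dimensions the second term drops from $4^{9}$ to $4^{5}$ leading-order operations, a factor of 256, and the gap widens as P increases.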


8.3 Extension to Fluid Flow

Following from Eq. (6.100), the DFR formulation of the semi-discrete Euler or Navier-

Stokes equations in each element can be written as

\[
R_{ele} = R(U_{ele}, U_{eleN}) = -V^{-1}_{ele}\, \nabla^{\delta} \cdot F_{ele}. \tag{8.112}
\]

Differentiating with respect to the discontinuous solution vector, Uele, produces the

element local Jacobian matrix,

\[
\frac{\partial R_{ele}}{\partial U_{ele}} = -V^{-1}_{ele}\, \frac{\partial}{\partial U_{ele}}\left(\nabla^{\delta} \cdot F_{ele}\right), \tag{8.113}
\]

where $\partial R_{ele}/\partial U_{ele}$ is a matrix of size ($N_{spts2D} N_{vars} \times N_{spts2D} N_{vars}$) or ($N_{spts} N_{vars} \times N_{spts} N_{vars}$) depending on whether the problem is two-dimensional or three-dimensional. In the case of the Euler equations, there are no second-derivative terms, so the construction of the solution Jacobians and viscous flux Jacobians can be avoided.

As stated in section 6.4.3, all DFR operators become block diagonal matrices and act on each conservation equation separately. The discontinuous and common interface flux Jacobians, $\partial f_{d_i,ele}/\partial u_{ele}$, $\partial f_{d_i,ele}/\partial q_{d_j,ele}$, $\partial f^{I,face}_{d_i,ele}/\partial u^{face}_{ele}$, $\partial f^{I,face}_{d_i,ele}/\partial q^{face}_{d_j,ele}$ and $\partial f^{I,face}_{d_i,ele}/\partial q^{faceN}_{d_j,eleN}$, become block matrices with a diagonal matrix for each variable pair. Lastly, since the common interface solution values have no dependence between conservative variables for interior interfaces, the Jacobians of the numerical solution gradient and neighboring numerical solution gradients are also block diagonal matrices. After evaluating the block matrix multiplications, the element local Jacobian matrix on interior elements can be constructed by a direct application of the techniques in section 8.2 for each variable pair.

For the Navier-Stokes equations, the exact derivatives of the flux functions, $F_{inv}(U)$ and $F_{visc}(U,Q)$, with respect to $U$ and $Q$ are $\frac{\partial F_{inv}}{\partial U}(U)$, $\frac{\partial F_{visc}}{\partial U}(U,Q)$ and $\frac{\partial F_{visc}}{\partial Q}(U,Q)$. These are given in reference [92].

Wavespeed Jacobian

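Since the wavespeed used for the Rusanov flux was redefined in section 6.4.3, the derivative of the wavespeed is also redefined. The derivative of the term with the wavespeed can be split into a piecewise function,

\[
\begin{aligned}
\frac{\partial}{\partial U^-}\left( |\lambda(U^-,U^+)| (U^+ - U^-) \right) &= (U^+ - U^-) \frac{\partial}{\partial U^-}\left( |\lambda(U^-,U^+)| \right) - |\lambda(U^-,U^+)| \, I, \\
\frac{\partial}{\partial U^+}\left( |\lambda(U^-,U^+)| (U^+ - U^-) \right) &= (U^+ - U^-) \frac{\partial}{\partial U^+}\left( |\lambda(U^-,U^+)| \right) + |\lambda(U^-,U^+)| \, I,
\end{aligned}
\tag{8.114}
\]

where $I$ represents an identity matrix of size $(N_{vars} \times N_{vars})$ and

\[
\frac{\partial}{\partial U^-}\left( |\lambda(U^-,U^+)| \right) =
\begin{cases}
\frac{\partial \lambda}{\partial U}(U^-) & \text{if } \lambda(U^-) > \lambda(U^+), \\
0 & \text{otherwise,}
\end{cases}
\qquad
\frac{\partial}{\partial U^+}\left( |\lambda(U^-,U^+)| \right) =
\begin{cases}
\frac{\partial \lambda}{\partial U}(U^+) & \text{if } \lambda(U^+) > \lambda(U^-), \\
0 & \text{otherwise.}
\end{cases}
\]

The derivative of the wavespeed, $\lambda(U) = |V^n(U)| + c(U)$, is computed as

\[
\frac{\partial \lambda}{\partial U}(U) =
\begin{bmatrix}
-\frac{\mathrm{sgn}(V^n) V^n}{\rho} - \frac{c}{2\rho} + \frac{\gamma(\gamma-1)(u^2+v^2)}{4\rho c} \\
\frac{\mathrm{sgn}(V^n) n_x}{\rho} - \frac{\gamma(\gamma-1) u}{2\rho c} \\
\frac{\mathrm{sgn}(V^n) n_y}{\rho} - \frac{\gamma(\gamma-1) v}{2\rho c} \\
\frac{\gamma(\gamma-1)}{2\rho c}
\end{bmatrix},
\tag{8.115}
\]

for the two-dimensional case and

\[
\frac{\partial \lambda}{\partial U}(U) =
\begin{bmatrix}
-\frac{\mathrm{sgn}(V^n) V^n}{\rho} - \frac{c}{2\rho} + \frac{\gamma(\gamma-1)(u^2+v^2+w^2)}{4\rho c} \\
\frac{\mathrm{sgn}(V^n) n_x}{\rho} - \frac{\gamma(\gamma-1) u}{2\rho c} \\
\frac{\mathrm{sgn}(V^n) n_y}{\rho} - \frac{\gamma(\gamma-1) v}{2\rho c} \\
\frac{\mathrm{sgn}(V^n) n_z}{\rho} - \frac{\gamma(\gamma-1) w}{2\rho c} \\
\frac{\gamma(\gamma-1)}{2\rho c}
\end{bmatrix},
\tag{8.116}
\]

for the three-dimensional case where $\mathrm{sgn}(x)$ is the signum function and $n_x$, $n_y$, $n_z$ are the x, y and z components of the unit normal vector $n$.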
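The two-dimensional wavespeed derivative can be checked numerically against a finite difference of $\lambda(U)$. The sketch below is a minimal host-side illustration and is not taken from ZEFR: the names `State`, `wavespeed` and `wavespeedGradient` are hypothetical, and it assumes an ideal gas with the conservative variables ordered as $(\rho, \rho u, \rho v, E)$.

```cpp
#include <array>
#include <cassert>
#include <cmath>

using State = std::array<double, 4>;  // assumed ordering: (rho, rho*u, rho*v, E)

// Wavespeed lambda(U) = |Vn| + c for an ideal gas in 2D.
double wavespeed(const State& U, double nx, double ny, double gamma) {
    assert(U[0] > 0.0);
    const double rho = U[0], u = U[1] / rho, v = U[2] / rho;
    const double p = (gamma - 1.0) * (U[3] - 0.5 * rho * (u * u + v * v));
    return std::abs(u * nx + v * ny) + std::sqrt(gamma * p / rho);
}

// Analytic gradient d(lambda)/dU, entry by entry as in Eq. (8.115).
State wavespeedGradient(const State& U, double nx, double ny, double gamma) {
    assert(U[0] > 0.0);
    const double rho = U[0], u = U[1] / rho, v = U[2] / rho;
    const double p = (gamma - 1.0) * (U[3] - 0.5 * rho * (u * u + v * v));
    const double c = std::sqrt(gamma * p / rho);
    const double Vn = u * nx + v * ny;
    const double s = (Vn > 0.0) - (Vn < 0.0);  // sgn(Vn)
    const double g = gamma * (gamma - 1.0);
    return {-s * Vn / rho - c / (2.0 * rho) + g * (u * u + v * v) / (4.0 * rho * c),
            s * nx / rho - g * u / (2.0 * rho * c),
            s * ny / rho - g * v / (2.0 * rho * c),
            g / (2.0 * rho * c)};
}
```

Perturbing one entry of `U` at a time and differencing `wavespeed` should reproduce each entry of `wavespeedGradient` away from $V^n = 0$, where the absolute value is not differentiable.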


Boundary Flux Jacobians

Boundary conditions for the Euler and Navier-Stokes equations can be found in appendix A. The boundary flux Jacobians can be computed by taking the derivative of the boundary flux with respect to the extrapolated solution and the extrapolated solution gradient. Eq. (A.1) is differentiated to obtain

\[ \frac{\partial F^b}{\partial U}(U,Q) = \frac{\partial F^n}{\partial U}\left( U^b, Q^b \right) \frac{\partial U^b}{\partial U}(U) + \frac{\partial F^n}{\partial Q^b}\left( U^b, Q^b \right) \frac{\partial Q^b}{\partial U}(U,Q), \tag{8.117} \]

\[ \frac{\partial F^b}{\partial Q}(U,Q) = \frac{\partial F^n}{\partial Q^b}\left( U^b, Q^b \right) \frac{\partial Q^b}{\partial Q}(U,Q), \tag{8.118} \]

where $F^n$ is the flux normal to the face. The normal flux Jacobians evaluated using the solution and solution gradient at the boundary are computed using the exact derivatives of the flux functions. The boundary Jacobians, $\frac{\partial U^b}{\partial U}(U)$, $\frac{\partial Q^b}{\partial U}(U,Q)$ and $\frac{\partial Q^b}{\partial Q}(U,Q)$, are found by taking the exact derivative of the boundary conditions in appendix A. These are derived using Mathematica [43] and the results are given in reference [92].

Element local Jacobian formulation on boundary elements

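For elements with boundary conditions on a face, a block matrix multiplication among variable pairs must be performed between the flux Jacobians and the Jacobian of the numerical solution gradient. The Jacobian of the discontinuous flux on boundary elements is written as

\[
\left[ \frac{\partial}{\partial u_{ele}} \left( f^{d_i}(u_{ele}, q_{ele}) \right) \right]_{v_i,v_j}
= \left[ \frac{\partial f^{d_i}_{ele}}{\partial u_{ele}} \right]_{v_i,v_j}
+ \sum_{d_j \in \{x,y,z\}} \sum_{v_k=1}^{N_{vars}}
\left[ \frac{\partial f^{d_i}_{ele}}{\partial q^{d_j}_{ele}} \right]_{v_i,v_k}
\left[ \frac{\partial}{\partial u_{ele}} \left( q^{d_j}_{ele} \right) \right]_{v_k,v_j},
\]
\[ \text{for } d_i \in \{x,y,z\}, \quad v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.119} \]

and the Jacobian of the boundary flux is written as

\[
\left[ \frac{\partial}{\partial u_{ele}} \left( f^{I,\partial\Omega}_{ele} \right) \right]_{v_i,v_j}
= \left[ \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} \right]_{v_i,v_j} E_{\partial\Omega}
+ \sum_{d \in \{x,y,z\}} \sum_{v_k=1}^{N_{vars}}
\left[ \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega,d}_{ele}} \right]_{v_i,v_k} E_{\partial\Omega}
\left[ \frac{\partial}{\partial u_{ele}} \left( q^{d}_{ele} \right) \right]_{v_k,v_j},
\]
\[ \text{for } v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.120} \]

where $v_i, v_j$ refer to a single component of an $(N_{vars} \times N_{vars})$ block matrix. The sparsity of the block matrix for the Jacobian of the numerical solution gradient is dependent on the boundary condition used for the common solution,

\[
\left[ \frac{\partial}{\partial u_{ele}} \left( u^{I,\partial\Omega}_{ele} \right) \right]_{v_i,v_j}
= \left[ \frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} \right]_{v_i,v_j} E_{\partial\Omega},
\quad v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.121}
\]

where

\[ \frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} = \frac{\partial u^b}{\partial u}\left( u^{\partial\Omega}_{ele} \right). \tag{8.122} \]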

Chapter 9

Numerical Experiments

In this chapter, five test cases are studied in order to verify the implementation of the

implicit, high-order DFR method on unstructured meshes for GPUs. The first test

case deals with the inviscid flow over a bump where the rate of convergence of entropy

error is verified for a polynomial order of P = 2. For the second test case, inviscid flow

over the NACA 0012 airfoil is simulated for a polynomial order of P = 4 and a grid

converged lift coefficient is obtained which compares well with results from Overflow

and CFL3D. In the third test case, convection of an isentropic vortex is performed

using the ESDIRK3 and ESDIRK4 schemes and the correct order of accuracy for

time integration is obtained. In the fourth test case, laminar flow over a Joukowski

airfoil is simulated for various polynomial orders and a rate of convergence of at least

2P is found for the drag coefficient. The last test case is an unsteady simulation of

viscous flow over a half cylinder where it is shown that the Strouhal number for the

implicit method compares well with the Strouhal number for the explicit method on

ZEFR and PYFR [97].

The vector $\ell_1$ norm of the residual for the continuity equation is computed and is

used to track the convergence of all steady-state simulations. A converged solution

is assumed if the residual drops by 10 orders of magnitude from the initial residual.

100 block iterations are used per pseudo time step and local time stepping is used

in all cases. For simulations that start from uniform flow, the CFL from section 3.2

is exponentially increased until a CFL of around 10000 is reached. For the unsteady



simulations, each stage uses a fixed number of one pseudo time step and 200 block

iterations where the stage residual drops by at least 7 orders of magnitude. It’s

important to note that the solution diverges without the relaxation from the pseudo

time step (i.e. Newton iterations do not work). All meshes are constructed with

high-order elements.

9.1 Inviscid flow over a bump

The first test case involves the solution of subsonic flow over a smooth Gaussian

bump in a channel. The inflow Mach number is set to 0.5 with zero angle of attack.

A characteristic Riemann invariant farfield boundary condition is used for the in-

flow and outflow boundaries and a slip-wall or symmetry boundary condition is used

for the surface of the bump and the top boundary. These are described in appen-

dices A.3 and A.1, respectively. The L2 functional norm of the entropy error is used

to determine the accuracy of the solution and is given by

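\[ \| e_S \|_{L_2(\Omega)} = \sqrt{ \frac{ \int_\Omega \left( \frac{p}{p_\infty} \left( \frac{\rho_\infty}{\rho} \right)^{\gamma} - 1 \right)^2 dV }{ \int_\Omega dV } }, \tag{9.1} \]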

where the integrals are approximated numerically using Gaussian quadrature with 10

quadrature points in each element. A full description of this problem can be found

in the first international high-order workshop [1].

The entropy error is computed for a series of meshes on a single GPU using the pseudo time stepping method and the two color, MCGS block iterative method described in section 7.2. Figure 9.1 shows the entropy error vs. length scale $h = 1/\sqrt{nDoF}$. The results show a rate of convergence of 2.89 for the fixed polynomial order of P = 2, which is close to the theoretical result for a linear, steady-state case: P + 1 [11]. The results also show that increasing the polynomial order on a (48 × 16) mesh reduces the entropy error while maintaining fewer degrees of freedom than a more refined mesh. The (48 × 16) quadrilateral mesh and final pressure contours for a polynomial order of P = 2 are shown in Figure 9.2.


[Plot: entropy error $\| e_S \|_{L_2(\Omega)}$ against $h = 1/\sqrt{nDoF}$ on logarithmic axes; legend entries: P = 2, (48 × 16), Order 3.]

Figure 9.1: Entropy error vs. $1/\sqrt{nDoF}$ for inviscid flow over a bump, implicit pseudo time stepping with two color MCGS.

[Contour plot: pressure coefficient Cp, color scale from −1.19 to 0.212.]

Figure 9.2: (48 × 16) quadrilateral mesh and pressure contours, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, P = 2.


9.2 Inviscid flow over the NACA 0012 airfoil

The second test case involves the solution of subsonic flow over the NACA 0012 airfoil

at 1.25 degree angle of attack. The inflow Mach number is set to 0.5, a characteristic

Riemann invariant farfield boundary condition from appendix A.3 is used on the

farfield and a slip wall boundary condition from appendix A.1 is used on the surface

of the airfoil. The lift coefficient is used to determine the accuracy of the simulation

and is compared to results from Vassberg and Jameson [86]. A complete description

is also found in this reference.

The lift coefficient is computed on a series of O-meshes and mixed, quadrilateral

and triangle meshes using the pseudo time stepping method and the two and four

color, MCGS block iterative method described in section 7.2 on a single GPU. Trian-

gles are generated using an edge-collapsing method [80]. All meshes have a far field

located 100 chord lengths away. Figure 9.3 shows the lift coefficient vs. length scale $h = 1/\sqrt{nDoF}$ for three different codes. Degrees of freedom for Overflow and CFL3D are assumed to be equal to the number of mesh elements. The figure shows that ZEFR is able to obtain a lift coefficient that is relatively close to the results from Overflow and CFL3D with fewer degrees of freedom. It’s also important to note that the coarsest mixed mesh was able to obtain a fairly accurate lift coefficient with fewer degrees of freedom by coarsening the far field regions. The (32 × 32) quadrilateral and 764 mixed, quadrilateral and triangle meshes are shown in Figure 9.4 along with final pressure contours for a polynomial order of P = 4. It’s important to note that

the mixed mesh cases maintain the ability to run large time steps, unconstrained by

the CFL limit. This is a notable result, as a strong CFL constraint was observed to

limit the utility of the collapsed-edge triangular elements when coupled with explicit

time stepping [80].

9.3 Convection of an isentropic vortex

In this section, a two-dimensional isentropic vortex is convected in inviscid flow across

a square box domain, Ω = {(x, y) | −20 < x, y < 20}, where the freestream Mach


[Plot: lift coefficient Cl against $h = 1/\sqrt{nDoF}$ (logarithmic horizontal axis); legend entries: ZEFR - Quad, P = 4; ZEFR - Mixed, P = 4; Overflow; CFL3D.]

Figure 9.3: Lift coefficient vs. $1/\sqrt{nDoF}$ for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two and four color MCGS.

[Contour plots: pressure coefficient Cp, color scale from −0.72 to 1. (a) (32 × 32) O-mesh, 2 color MCGS. (b) 764 mixed mesh, 4 color MCGS.]

Figure 9.4: Mesh and pressure contours for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping, P = 4.


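number is set to 0.4. The initial condition follows the description in [87, 12] and consists of uniform flow in the y-direction superposed with an isentropic vortex centered at the origin where

\[
\begin{bmatrix} \rho \\ u \\ v \\ p \end{bmatrix}(x, y, 0) =
\begin{bmatrix}
\left[ 1 - \frac{\gamma - 1}{2} (\omega M R f)^2 \right]^{\frac{1}{\gamma - 1}} \\
-\omega f y \\
1 + \omega f x \\
\frac{1}{\gamma M^2} \rho(x, y, 0)^{\gamma}
\end{bmatrix},
\tag{9.2}
\]

\[ \omega = \frac{\Gamma}{2 \pi R}, \tag{9.3} \]

\[ f = \exp\left( \frac{1 - (x^2 + y^2)}{2 R^2} \right). \tag{9.4} \]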

Following directly from [87, 12], Γ = 13.5 is the strength of the vortex and R = 1.5

is a measure of the radius of the vortex. Periodic boundary conditions are used on

the exterior boundaries of the mesh so that the exact solution can be determined

by the periodic propagation of the vortex in the y-direction. The periodic boundary

condition creates a lattice of infinite vortices which interact with each other through

dispersion error. This test case is used to verify the implementation of the unsteady

implicit time stepping method and is commonly used to test high-order time integra-

tion schemes [89, 13].
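As an illustration of Eqs. (9.2)-(9.4), the initial state at a point can be evaluated as in the sketch below. The type `Primitive` and the function `vortexInitialCondition` are hypothetical names introduced here, the default arguments are the values quoted in this section, and $\gamma = 1.4$ is an assumption since the heat capacity ratio is not stated for this test case.

```cpp
#include <cassert>
#include <cmath>

struct Primitive { double rho, u, v, p; };

// Initial condition of Eqs. (9.2)-(9.4) at the point (x, y).
Primitive vortexInitialCondition(double x, double y, double M = 0.4,
                                 double Gamma = 13.5, double R = 1.5,
                                 double gamma = 1.4) {
    assert(R > 0.0 && M > 0.0 && gamma > 1.0);
    const double pi = std::acos(-1.0);
    const double omega = Gamma / (2.0 * pi * R);                          // Eq. (9.3)
    const double f = std::exp((1.0 - (x * x + y * y)) / (2.0 * R * R));   // Eq. (9.4)
    const double a = omega * M * R * f;
    const double rho = std::pow(1.0 - 0.5 * (gamma - 1.0) * a * a,
                                1.0 / (gamma - 1.0));
    const double p = std::pow(rho, gamma) / (gamma * M * M);
    return {rho, -omega * f * y, 1.0 + omega * f * x, p};
}
```

Far from the origin $f$ decays to zero, so the state returns to uniform flow in the y-direction with $\rho = 1$, $v = 1$ and $p = 1/(\gamma M^2)$.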

The error in the solution is computed after 5 periods on a uniform Cartesian grid

of (180 × 180) quadrilaterals where a polynomial order of P = 3 is used. In this

case, a large time step is used to ensure that the error due to time integration is

much larger than spatial discretization error. The implicit ESDIRK3 and ESDIRK4

schemes described in section 7.1.2 are used to advance the solution in time on 16

NVIDIA Tesla K80 GPUs. Pseudo time stepping and MCGS are used to converge

each stage of the DIRK scheme. An example of the density contours for the isentropic

vortex is shown in Figure 9.5. The time steps used for each case are given in Table 9.1.

Figure 9.6 shows the error in density after 5 periods for all cases described above.


Figure 9.5: Initial density contours for convection of an isentropic vortex.

          ESDIRK3    ESDIRK4
Case 1    0.125      0.5
Case 2    0.0625     0.25
Case 3    0.03125    0.125

Table 9.1: Numerical time steps, ∆t, used for the convection of an isentropic vortex.

The figure shows that the same error can be obtained with the ESDIRK4 scheme

with a much larger time step compared to the ESDIRK3 scheme. Linear regression

is used to obtain the rates of convergence in Figure 9.6 and the results are shown in

Table 9.2. For this particular test case, the ESDIRK schemes are able to maintain

          ESDIRK3          ESDIRK4
Case 1    8.565e-03        1.426e-02
Case 2    1.116e-03        6.291e-04
Case 3    1.416e-04        5.049e-05
Order     2.959 ± 0.006    4.1 ± 0.144

Table 9.2: Density error and rate of convergence for different time integration schemes, convection of an isentropic vortex, implicit dual time stepping with two color MCGS.


[Plot: density error against numerical time step on logarithmic axes.]

Figure 9.6: Density error vs. numerical time step for convection of an isentropic vortex, implicit dual time stepping with two color MCGS.

the correct rates of convergence for time steps which are significantly larger than the maximum stable time step for an explicit method. For example, in this particular case the maximum stable time step for the RK45 scheme was on the order of $10^{-3}$.
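The rates of convergence in Table 9.2 follow from a straight-line fit in log-log space. A minimal sketch of such a fit is given below, assuming an ordinary least-squares regression of log(error) on log(∆t); the function name `observedOrder` is hypothetical and the exact fitting procedure used for the table is not specified beyond linear regression.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(error) against log(step), i.e. the observed order.
double observedOrder(const std::vector<double>& step,
                     const std::vector<double>& error) {
    assert(step.size() == error.size() && step.size() >= 2);
    const double n = static_cast<double>(step.size());
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (std::size_t i = 0; i < step.size(); ++i) {
        const double x = std::log(step[i]);
        const double y = std::log(error[i]);
        sx += x;
        sy += y;
        sxx += x * x;
        sxy += x * y;
    }
    return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}
```

Applied to the ESDIRK3 column, `observedOrder({0.125, 0.0625, 0.03125}, {8.565e-03, 1.116e-03, 1.416e-04})` gives a slope close to the tabulated order of 2.959.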

9.4 Laminar flow over a Joukowski airfoil

In this section, steady, laminar flow over a symmetric Joukowski airfoil at a zero

degree angle of attack is simulated where the freestream Mach number is set to 0.5,

the Reynolds number based on the chord is set to 1000, the heat capacity ratio is set

to γ = 1.4, the Prandtl number is set to 0.72 and the dynamic viscosity remains fixed.

A characteristic Riemann invariant farfield boundary condition is used on the farfield

and a no-slip adiabatic wall boundary condition is used on the surface of the airfoil.

These are described in appendices A.3 and A.2, respectively. This benchmark case

comes from the 4th International Workshop on High-order CFD Methods [2] and is

used to verify the high-order DFR implementation of the Navier-Stokes equations by

computing rates of convergence on drag coefficient.

The drag coefficient is computed on a series of structured quadrilateral c-meshes


from [2] using the pseudo time stepping method and the two color, MCGS block

iterative method described in section 7.2 on up to 16 NVIDIA Tesla K80 GPUs

and polynomial orders of P = 1, 2, 3, 4. The meshes used for each case are given

in Table 9.3. The number of degrees of freedom (nDoF ) for each case is found by

P         1               2              3            4
Case 1    (256 × 128)     (64 × 32)      (32 × 16)    (24 × 12)
Case 2    (512 × 256)     (128 × 64)     (48 × 24)    (32 × 16)
Case 3    (1024 × 512)    (256 × 128)    (64 × 32)    (48 × 24)

Table 9.3: (x × y) Joukowski airfoil meshes used for each polynomial order where x is the number of elements along the airfoil and wake and y is the number of elements in the normal direction. The number of elements in x is split evenly between the airfoil and wake. The total number of elements, Neles, is given by the product of x and y.

multiplying $N_{eles}$ by $N_{spts2D}$ where $N_{spts2D} = (P + 1)^2$ is the nDoF per element. Figure 9.7 shows the drag coefficient vs. length scale $h = 1/\sqrt{nDoF}$ for each case. A

[Plot: drag coefficient Cd against $h = 1/\sqrt{nDoF}$ (logarithmic horizontal axis); legend entries: P = 1, P = 2, P = 3, P = 4, Reference.]

Figure 9.7: Drag coefficient vs. $h = 1/\sqrt{nDoF}$ for laminar flow over a Joukowski airfoil, implicit MCGS.

machine zero lift and a drag coefficient of Cd = 0.1219 is obtained that is consistent


with the findings from multiple CFD codes from the high-order workshop [29]. The

most efficient case in terms of nDoF is the P = 4, case 3 simulation on the (48× 24)

c-mesh. The Mach and pressure contours for this case are shown in Figure 9.8.

Figure 9.8: Laminar flow over Joukowski airfoil, implicit pseudo time stepping with two color MCGS, P = 4, (48 × 24) mesh. (a) Mach contours. (b) Pressure contours.

A drag coefficient error is also computed by converging a P = 4 case with (256 × 128) elements and using the drag coefficient as a reference as shown in Figure 9.7. The drag coefficient error is then plotted against $h = 1/\sqrt{nDoF}$ in Figure 9.9. The figure shows that the same error can be obtained with fewer degrees of freedom at higher polynomial orders. Linear regression is used to obtain the rates of convergence in Figure 9.9 and Table 9.4. The

P         1                2               3              4
Case 1    2.309e-04        2.143e-04       3.961e-04      6.243e-04
Case 2    5.284e-05        9.015e-06       3.866e-05      3.814e-05
Case 3    1.257e-05        4.432e-07       3.706e-06      1.351e-06
Order     2.099 ± 0.006    4.46 ± 0.025    6.7 ± 0.253    8.8 ± 0.155

Table 9.4: Drag coefficient error and rate of convergence for each polynomial order, laminar flow over a Joukowski airfoil, implicit pseudo time stepping with two color MCGS.

results show that the DFR scheme is able to obtain a rate of convergence of at least

2P for the drag coefficient which is expected for an integral functional quantity in a


[Plot: drag coefficient error against $h = 1/\sqrt{nDoF}$ on logarithmic axes; legend entries: P = 1, P = 2, P = 3, P = 4, Line fit.]

Figure 9.9: Drag coefficient error vs. $h = 1/\sqrt{nDoF}$ for laminar flow over a Joukowski airfoil, implicit pseudo time stepping with two color MCGS.

finite element method [73, 31]. This is also consistent with the results from other

finite element methods shown in the high-order workshop [29].

9.5 Viscous flow over a half cylinder

In this section, unsteady, viscous flow over a half cylinder is studied where the

freestream Mach number is set to 0.2, the Reynolds number based on the diame-

ter is set to 1000, the heat capacity ratio is set to γ = 1.4, the Prandtl number is set

to 0.72 and the dynamic viscosity remains fixed. As before, a characteristic Riemann

invariant farfield boundary condition is used on the farfield and a no-slip adiabatic


wall boundary condition is used on the surface of the half cylinder. The half cylinder

is used to both verify the 3D Navier-Stokes discretization of the DFR method and

the unsteady implicit time stepping method.

The flow simulation over a half cylinder with a span of 0.5 times the diameter uses

a polynomial order of P = 3 and a mesh containing 20,292 unstructured hexahedral

elements, shown in Figure 9.10, such that the total number of degrees of freedom is

approximately 1.3 million.

The explicit RK45 and implicit ESDIRK4 schemes described in sections 7.1.1

and 7.1.2, respectively, are used to advance the solution in time on 12 NVIDIA Tesla

K80 GPUs. Pseudo time stepping and block Jacobi are used to converge each stage for

the implicit method. An example of the instantaneous isosurfaces of density colored

by Mach number is given in Figure 9.11.

Table 9.5 shows the lift and drag coefficients and the Strouhal number for three cases where the span times π times the diameter is used as the reference area. The table compares the results generated through the explicit RK45 time stepping method in PyFR [97] with the RK45 and ESDIRK4 methods in ZEFR using the same mesh.

Case Studies      CL               CD              St
PYFR - RK54       7e-05 ± 0.112    1.30 ± 0.084    0.224
ZEFR - RK54       1e-03 ± 0.152    1.3 ± 0.157     0.222
ZEFR - ESDIRK4    3e-03 ± 0.155    1.3 ± 0.165     0.220

Table 9.5: Lift and drag coefficients and the Strouhal number for viscous flow over a half cylinder, Re = 1000, P = 3.

The results show a reasonable agreement between the three simulations. Figure 9.12

shows the time history of the lift and drag coefficients for all three simulations.


Figure 9.10: Half cylinder mesh with 20,292 unstructured hexahedral elements. Hexahedral elements are created by extruding the quadrilaterals shown above. (a) Full domain. (b) Close up.

Figure 9.11: Instantaneous isosurfaces of density colored by Mach number for viscous flow over a half cylinder, Re = 1000, P = 3.

[Plots: (a) lift coefficient CL and (b) drag coefficient CD against time t from 0 to 100; legend entries: PYFR - RK54, ZEFR - RK54, ZEFR - ESDIRK.]

Figure 9.12: Time history of lift and drag coefficient for viscous flow over a half cylinder, Re = 1000, P = 3.

Chapter 10

Implementation

The methods described in the previous chapters have been implemented within an

existing in-house, high-order, compressible flow solver for GPU architectures called

ZEFR. An explanation of the implementation of the explicit method along with the

computation of the residual is given in [79]. What follows is the implementation de-

tails of the implicit method. We show that the implicit method proposed does not

require the construction of global Jacobian matrices or any additional MPI commu-

nication compared to an explicit method. We also show that all operations can be

performed in an element-local fashion.

This chapter is split into three sections. Section 10.1 provides an overview of

the steps within each solver for the implicit method. Section 10.2 describes the mesh

coloring algorithm and the modifications to the residual computation for the multicolored

Gauss-Seidel (MCGS) block iterative solver. Section 10.3 discusses the steps, data

structures and algorithms needed to construct the element local Jacobians.

10.1 Overview of Solvers

This section provides an overview of the steps within each solver for the implicit

method. We begin by providing the steps for the unsteady solver which utilizes the

dual time stepping technique described in section 7.3.1. We then provide the steps

for the steady-state solver in section 7.2 and conclude with a discussion on the direct


solver used to solve each element local linear system.

10.1.1 Unsteady Solver

A dual time stepping approach for DIRK schemes, described in section 7.3.1, is used

to solve unsteady problems. The main steps for one time step are given below,

1. (ESDIRK) Compute first stage residual, R1

2. Loop over all remaining stages:

(a) Solve modified stage residual problem, Rs(Us) = 0, using steady-state

solver

3. Advance solution using Eq. (7.38)

4. (Ctrl) Compute error from Eq. (7.8)

5. (Ctrl) Estimate next time step using Eq. (7.12)

where (ESDIRK) is used to refer to the first explicit stage in ESDIRK schemes and

(Ctrl) is used to refer to the step size control for DIRK schemes with embedded pairs

in section 7.1.3. If only one Newton iteration or pseudo time step is required for

solving the modified stage residual, it’s possible to use the same LHS matrix for all

stages. This significantly reduces the computation time but limits the physical time

step.

10.1.2 Steady-state Solver

Newton’s method and pseudo time stepping, described in section 7.2, are used to

solve steady-state problems. Both methods can be split into two main sections for

each Newton iteration or pseudo time step.

First, the left-hand side (LHS) matrices are constructed and processed for the

direct solver. The steps in this procedure are as follows,

1. Compute residual, $R^m_{ele}$

2. (PTS) Compute and/or adapt element local pseudo time step, $\Delta\tau^m_{ele}$

3. Compute element local Jacobians, $\frac{\partial R^m_{ele}}{\partial U^m_{ele}}$

4. (PTS) Apply pseudo time step to element local Jacobians

5. Process LHS matrices for direct solver

where (PTS) is used to describe the steps which are only applicable to pseudo time

stepping. The residual is computed first in order to obtain the information necessary

to compute the element local Jacobians. The implementation details of the element

local Jacobian and the direct solvers are given in sections 10.3 and 10.1.3, respectively.

The second section of the steady-state solver is the block iterative method. Block

Jacobi and multicolored Gauss-Seidel (MCGS) are both described in section 7.2.2

where the equations are given in (7.30) and (7.37), respectively. The total number of iterations is fixed and is usually set between 100–200. The procedure to complete one block iteration is given below,

1. Compute residual, $R^k_{ele}$ or $R^{*}_{c,ele}$

2. Construct right-hand side (RHS) from Eq. (7.30) or Eq. (7.37)

3. Solve linear system for $\Delta^2 U^k_{ele}$ or $\Delta^2 U^k_{c,ele}$ using direct solver

4. Advance solution by adding difference to current solution, $U^k_{ele}$ or $U^k_{c,ele}$

where the procedure is repeated for each color for MCGS. More information about

the implementation of MCGS can be found in section 10.2.

10.1.3 Direct Solver

A solver is needed to solve each element local linear system. In this work, LU factor-

ization is used on both the CPU and GPU. A modified procedure where the inverse

of the left-hand side is computed is also used to enhance performance. The procedure

for the direct solver is split into two main parts as shown in section 10.1.2.


First, after computing the left-hand side for all elements, a batch LU factorization

is performed. If the modified procedure is chosen, a batch inverse is also computed

using the LU factorization and the inverse is stored in a separate data structure.

During the block iteration procedure, the element local systems are solved using

batch forward and backward substitution. If an inverse is computed, a batch matrix-

vector multiplication is used instead.
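The two-part structure of the direct solver, factor once and then reuse the factors in every block iteration, can be sketched for a single element-local system as follows. This is a serial host-side illustration with partial pivoting and is only a stand-in for the batched library routines used in this work; the names `LuFactors`, `luFactor` and `luSolve` are hypothetical, and a row-major dense layout is assumed.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <utility>
#include <vector>

// LU factors of one element-local LHS matrix (row-major, n x n) with row pivots.
struct LuFactors {
    std::size_t n;
    std::vector<double> lu;        // L below the diagonal (unit diagonal), U on and above
    std::vector<std::size_t> piv;  // row exchanged with row k at elimination step k
};

// Part one: factor the left-hand side once per Newton iteration or pseudo time step.
LuFactors luFactor(std::vector<double> A, std::size_t n) {
    assert(A.size() == n * n);
    std::vector<std::size_t> piv(n);
    for (std::size_t k = 0; k < n; ++k) {
        std::size_t p = k;  // partial pivoting: largest magnitude in column k
        for (std::size_t i = k + 1; i < n; ++i)
            if (std::abs(A[i * n + k]) > std::abs(A[p * n + k])) p = i;
        assert(A[p * n + k] != 0.0);  // singular matrices are not handled in this sketch
        piv[k] = p;
        for (std::size_t j = 0; j < n; ++j) std::swap(A[k * n + j], A[p * n + j]);
        for (std::size_t i = k + 1; i < n; ++i) {
            A[i * n + k] /= A[k * n + k];
            for (std::size_t j = k + 1; j < n; ++j)
                A[i * n + j] -= A[i * n + k] * A[k * n + j];
        }
    }
    return {n, std::move(A), std::move(piv)};
}

// Part two: reuse the factors in each block iteration.
std::vector<double> luSolve(const LuFactors& f, std::vector<double> b) {
    assert(b.size() == f.n);
    const std::size_t n = f.n;
    for (std::size_t k = 0; k < n; ++k) std::swap(b[k], b[f.piv[k]]);  // apply row exchanges
    for (std::size_t i = 1; i < n; ++i)  // forward substitution with L
        for (std::size_t j = 0; j < i; ++j) b[i] -= f.lu[i * n + j] * b[j];
    for (std::size_t i = n; i-- > 0;) {  // backward substitution with U
        for (std::size_t j = i + 1; j < n; ++j) b[i] -= f.lu[i * n + j] * b[j];
        b[i] /= f.lu[i * n + i];
    }
    return b;
}
```

Calling `luFactor` once and `luSolve` for each new right-hand side mirrors the split between the LHS processing step and the block iterations; replacing `luSolve` with a stored inverse trades extra setup work and storage for a matrix-vector product per iteration.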

For the CPU implementation, all batch operations are performed in serial by using

either the TNT/JAMA libraries [74] or the Eigen linear algebra library [33]. For the

GPU implementation, cuBLAS [66] is used for the batch LU factorization, inverse

computation and forward and backward substitution. The kernel used for the batch

matrix-vector multiplication on the GPU is shown in Listing 10.1.

Listing 10.1: CUDA kernel for batched matrix-vector multiplication where blockDim = (32, 6) and gridDim = ((M + 32 - 1)/32, std::min((batchCount + 6 - 1)/6, 65535)).

    __global__
    void DgemvBatched(const int M, const int N,
        const double **Aarray, int lda, const double **xarray,
        int incx, double **yarray, int incy, int batchCount)
    {
      // Row of the matrix-vector product handled by this thread.
      const unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;

      if (i >= M) return;

      // Grid-stride loop over the matrices in the batch.
      for (unsigned int batch = blockDim.y * blockIdx.y + threadIdx.y;
           batch < batchCount; batch += gridDim.y * blockDim.y)
      {
        // Dot product of row i of the column-major matrix with x.
        double sum = 0.0;
        for (unsigned int j = 0; j < N; j++)
          sum += Aarray[batch][i + j*lda] * xarray[batch][j * incx];

        yarray[batch][i * incy] = sum;
        // No barrier is placed here: each thread writes a distinct entry of y
        // and no shared memory is used, so threads need not synchronize.
      }
    }

10.2 Multicolored Gauss-Seidel

This section describes the mesh coloring algorithm and the modifications to the residual computation for the multicolored Gauss-Seidel (MCGS) block iterative solver. We

show that the mesh coloring algorithm is able to obtain the target two colors for a

structured mesh and four colors for an unstructured mesh.

10.2.1 Mesh Coloring

MCGS requires graph coloring in order to have separate sets of elements to advance sequentially and improve convergence. In this work, a modified vertex coloring algorithm is created where the main requirement is that no two adjacent elements have the same color. This is done in order to maximize the amount of data in M and minimize the amount of data in N for the MCGS splitting method in section 7.2.2. The algorithm also attempts to distribute the elements evenly among the colors. This ensures that the computational cost of each color remains about the same and that the GPU has as much work as possible per color.

The total number of colors used in the algorithm is set by the user. Ideally, a minimal number of colors is desirable in order to maintain larger subproblem sizes leading to more efficient GPU performance. For two-dimensional meshes, the minimum is two for structured meshes and at most four colors are needed for unstructured meshes via the four-color theorem for planar graphs [7].


The algorithm used is a modified greedy mesh coloring algorithm. First, a vector

of counts for each color is initialized to zero. Then, the element connectivity graph

is traversed in a breadth-first order. Each element is set to a color with the lowest

count. The color must also be unused by adjacent elements. If all available colors are

used, the algorithm fails.
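The steps above can be sketched as follows. This is a minimal illustration and not the ZEFR implementation: the function name `colorMesh` and the adjacency-list input are hypothetical, ties between equally used colors are broken by the lowest color index, and the traversal is restarted for disconnected graphs, both of which are assumptions not stated in the text.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// Balanced greedy coloring of an element connectivity graph.
// adj[e] lists the face neighbors of element e. Returns an empty vector on failure.
std::vector<int> colorMesh(const std::vector<std::vector<std::size_t>>& adj,
                           int nColors) {
    assert(nColors > 0);
    const std::size_t nEles = adj.size();
    std::vector<int> color(nEles, -1);
    std::vector<std::size_t> count(static_cast<std::size_t>(nColors), 0);
    std::vector<bool> queued(nEles, false);

    for (std::size_t seed = 0; seed < nEles; ++seed) {  // restart for disconnected graphs
        if (queued[seed]) continue;
        std::queue<std::size_t> frontier;
        frontier.push(seed);
        queued[seed] = true;
        while (!frontier.empty()) {  // breadth-first traversal
            const std::size_t e = frontier.front();
            frontier.pop();
            std::vector<bool> used(static_cast<std::size_t>(nColors), false);
            for (std::size_t nb : adj[e])
                if (color[nb] >= 0) used[color[nb]] = true;
            int best = -1;  // unused color with the lowest element count
            for (int c = 0; c < nColors; ++c)
                if (!used[c] && (best < 0 || count[c] < count[best])) best = c;
            if (best < 0) return {};  // every color is taken by a neighbor
            color[e] = best;
            ++count[best];
            for (std::size_t nb : adj[e])
                if (!queued[nb]) {
                    queued[nb] = true;
                    frontier.push(nb);
                }
        }
    }
    return color;
}
```

On a structured quadrilateral mesh with two colors the traversal yields a checkerboard pattern, while a graph containing three mutually adjacent elements cannot be colored with two colors and triggers the failure path.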

Examples of the mesh coloring algorithm on two-dimensional structured and un-

structured meshes are shown in Figure 10.1. In these examples, the algorithm was

Figure 10.1: Examples of the mesh coloring algorithm. (a) Channel mesh, 2 color distribution: (384, 384). (b) NACA0012 mesh, 2 color distribution: (512, 512). (c) Mixed NACA0012 mesh, 4 color distribution: (379, 378, 378, 377).

able to obtain the target two colors for the structured cases and four colors for the

unstructured case. It was also able to distribute the colors evenly as desired. The

mesh coloring is performed in serial on a single process on the CPU. The color sets

are then partitioned between ranks for MPI simulations.


10.2.2 Residual Computation per Color

The algorithm used to compute the residual requires coalesced memory access among

elements in order to be efficient on the GPU. Creating staggered color sets and com-

puting the elements within one color set at a time leads to irregular memory access

patterns which is detrimental to the performance. In order to avoid this problem,

elements are shuffled in memory after mesh coloring. This ensures that each color

has coalesced memory access patterns.
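One way to obtain such a layout is a stable counting sort of the element indices by color, sketched below. The names `ColorOrdering` and `orderByColor` are hypothetical and the actual reordering in ZEFR may differ; the sketch only shows how a permutation and per-color offsets could be built so that each color occupies a contiguous range.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Permutation that stores elements of the same color contiguously.
struct ColorOrdering {
    std::vector<std::size_t> newToOld;  // newToOld[i] = original index of the element in slot i
    std::vector<std::size_t> offset;    // color c occupies slots [offset[c], offset[c + 1])
};

ColorOrdering orderByColor(const std::vector<int>& color, int nColors) {
    assert(nColors > 0);
    ColorOrdering ord;
    ord.offset.assign(static_cast<std::size_t>(nColors) + 1, 0);
    for (int c : color) {
        assert(c >= 0 && c < nColors);
        ++ord.offset[static_cast<std::size_t>(c) + 1];  // histogram, shifted by one slot
    }
    for (int c = 0; c < nColors; ++c)
        ord.offset[c + 1] += ord.offset[c];             // prefix sum gives start offsets
    ord.newToOld.resize(color.size());
    std::vector<std::size_t> next(ord.offset.begin(), ord.offset.end() - 1);
    for (std::size_t e = 0; e < color.size(); ++e)
        ord.newToOld[next[color[e]]++] = e;             // stable within each color
    return ord;
}
```

A kernel working on color `c` can then loop over slots `offset[c]` to `offset[c + 1] - 1`, so consecutive threads touch consecutive elements.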

In the inviscid case, all residual operations which occur on the element, such as the computation of the discontinuous flux and the extrapolation and differentiation operations, only occur for a particular color. In the viscous case, the gradient is recomputed and extrapolated on all elements to ensure the most up-to-date value. For both cases, all operations which occur on faces, such as the computation of the common interface flux, are performed on all faces. This also means that all MPI communication is performed for each color.

The work load for the residual computation in MCGS is higher than that for

block Jacobi, especially when using a large number of colors. These inefficiencies

could be mitigated by only updating information that is necessary but this makes the

algorithm more complex and leads to more irregular data access patterns.

10.3 Element Local Jacobians

The construction of the element local Jacobians is carried out in several steps:

1. Compute discontinuous flux Jacobians

2. Compute boundary condition Jacobians

3. Compute common interface flux and solution Jacobians

4. Compute modified interface flux Jacobians

5. Construct element local Jacobians

6. Scale Jacobians by determinant of geometric Jacobian


Each step is described in detail in the sections that follow. The flux Jacobian and

boundary Jacobians are computed using the functions in [92]. It’s important to

note that the element local Jacobian computation requires only the solution and

solution gradient information on MPI interfaces. This should already be provided

from the residual computation so no additional MPI communication is required for

the computation of the element local Jacobians or the implicit method in general.

10.3.1 Constructing the Flux and Solution Jacobians

The flux and solution Jacobian computations follow a similar procedure to the re-

spective functions in the residual computation. This section provides a high level

overview of the additional data structures and computations required.

Data Structures

All flux and solution Jacobian data structures are organized into a row-major, structure of arrays (SoA) format where data across all elements is contiguous in memory. This data layout leads to a stride when accessing different values within an element but is particularly beneficial on GPUs because it allows for coalesced memory access and vectorization of operations within kernels when performing the various function evaluations. The dimensions of the data structures are

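$$
\begin{aligned}
\text{Discontinuous Flux:}\quad & \frac{\partial F}{\partial U} : [N_{dims}, N_{spts}, N_{vars}, N_{vars}, N_{eles}], \\
& \frac{\partial F}{\partial Q} : [N_{dims}, N_{dims}, N_{spts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Common Interface Flux:}\quad & \frac{\partial F^{I}}{\partial U} : [N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
& \frac{\partial F^{I}}{\partial Q} : [N_{dims}, N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
& \frac{\partial F^{I}}{\partial Q^{N}} : [N_{dims}, N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Common Interface Solution:}\quad & \frac{\partial U^{I}}{\partial U} : [N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Neighbor Contribution:}\quad & \frac{\partial Q^{N}}{\partial U^{I,N}} : [N_{dims}, N_{fpts}, N_{eles}],
\end{aligned}
\tag{10.1}
$$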

where $Q^{N}$ refers to the neighboring element solution gradient, $N_{dims}$ is the number of spatial dimensions, $N_{spts}$ is the number of solution points per element, $N_{fpts}$ is the number of flux points per element, $N_{vars}$ is the number of conservative variables in the equation and $N_{eles}$ is the number of elements in the domain. These numbers are defined in chapter 6 for each problem type. It's also important to note that the common interface Jacobians for each face can be found from each respective structure, for example,

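$$
\frac{\partial F^{I,face}}{\partial U^{face}} \in \frac{\partial F^{I}}{\partial U}, \quad
\frac{\partial F^{I,face}}{\partial Q^{face}} \in \frac{\partial F^{I}}{\partial Q}, \quad
\frac{\partial U^{I,face}}{\partial U^{face}} \in \frac{\partial U^{I}}{\partial U}, \quad
\text{for } face \in F_L \cup F_R. \tag{10.2}
$$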

Lastly, the transpose of the inverse geometric Jacobian matrix, $J^{-1\prime}$, is also a necessary structure and its dimensions are defined as,

$$
\text{Geometric Jacobian:}\quad J^{-1\prime} : [N_{dims}, N_{spts}, N_{dims}, N_{eles}]. \tag{10.3}
$$
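As a rough illustration of this layout, the following NumPy sketch allocates an array with the dimensions of $\partial F / \partial U$ from Eq. (10.1) and compares two strides. The sizes and the helper `flat_index` are hypothetical and exist only for this example. The point is that, in a row-major array with the element index last, consecutive elements are adjacent in memory while values within one element are separated by a stride.

```python
import numpy as np

# Hypothetical sizes, chosen only for illustration.
n_dims, n_spts, n_vars, n_eles = 2, 9, 4, 1000

# Row-major (C-order) array with the element index last, as in Eq. (10.1).
dFdU = np.zeros((n_dims, n_spts, n_vars, n_vars, n_eles))

def flat_index(dim, spt, vi, vj, ele):
    """Row-major offset of entry (dim, spt, vi, vj, ele) in the flat buffer."""
    return (((dim * n_spts + spt) * n_vars + vi) * n_vars + vj) * n_eles + ele

# Neighboring elements are one entry apart, so consecutive GPU threads that
# each handle one element read consecutive addresses (coalesced access).
stride_between_elements = flat_index(0, 0, 0, 0, 1) - flat_index(0, 0, 0, 0, 0)

# Values inside one element are n_eles entries apart (strided access).
stride_within_element = flat_index(0, 0, 0, 1, 0) - flat_index(0, 0, 0, 0, 0)

print(stride_between_elements, stride_within_element)
```

With one thread per element, each load in a kernel would then touch a contiguous run of memory, which is the access pattern the SoA layout is meant to provide.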


The discontinuous flux Jacobians are computed by directly evaluating the flux Jacobians at solution points as shown in Eq. (8.48),

$$
{\frac{\partial F}{\partial U}}_{spts} = \frac{\partial F}{\partial U}(U_{spts}, Q_{spts}), \quad
{\frac{\partial F}{\partial Q}}_{spts} = \frac{\partial F}{\partial Q}(U_{spts}, Q_{spts}), \tag{10.4}
$$

where $U_{spts}$ is the solution at solution points, $Q_{spts}$ is the solution gradient at solution points and the functions $\frac{\partial F}{\partial U}(U, Q)$ and $\frac{\partial F}{\partial Q}(U, Q)$ are given in [92] for the Euler and Navier-Stokes equations.


The discontinuous flux Jacobians are then transformed to create the transformed discontinuous flux Jacobians using Eq. (8.71),

$$
\frac{\partial F}{\partial U} = J^{-1} {\frac{\partial F}{\partial U}}_{spts}, \quad
\frac{\partial F}{\partial Q} = J^{-1} {\frac{\partial F}{\partial Q}}_{spts}, \tag{10.5}
$$

where a temporary structure is used to avoid overwriting data. For the GPU implementation, this process is parallelized over all elements and solution points.


Common interface Jacobians are computed by evaluating functions on global flux

points, “gfpts”, as opposed to element local flux points, “fpts”, where there is a

unique global flux point for each flux point pair between elements. This is done to

avoid computing values multiple times.

The process of computing the common interface flux Jacobians is described in

detail from an element local perspective starting from Eq. (8.51) in section 8.2.1.

Using this process from a global flux point perspective, the common interface flux

Jacobians are expressed as

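$$
\begin{aligned}
{\frac{\partial F^{I}}{\partial U^{-}}}_{gfpts} &= A_{gfpts} \frac{\partial F^{I}}{\partial U^{-}}\left(U^{-}_{gfpts}, U^{+}_{gfpts}, Q^{-}_{gfpts}\right), \\
{\frac{\partial F^{I}}{\partial U^{+}}}_{gfpts} &= A_{gfpts} \frac{\partial F^{I}}{\partial U^{+}}\left(U^{-}_{gfpts}, U^{+}_{gfpts}, Q^{+}_{gfpts}\right), \\
{\frac{\partial F^{I}}{\partial Q^{-}}}_{gfpts} &= A_{gfpts} \frac{\partial F^{I}}{\partial Q^{-}}\left(U^{-}_{gfpts}, Q^{-}_{gfpts}\right), \\
{\frac{\partial F^{I}}{\partial Q^{+}}}_{gfpts} &= A_{gfpts} \frac{\partial F^{I}}{\partial Q^{+}}\left(U^{+}_{gfpts}, Q^{+}_{gfpts}\right),
\end{aligned}
\tag{10.6}
$$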

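where $U^{-}_{gfpts}$, $U^{+}_{gfpts}$, $Q^{-}_{gfpts}$ and $Q^{+}_{gfpts}$ are the solution and solution gradient extrapolated from element solution points to the left and right sides of each global flux point, respectively. These Jacobians correspond directly to the element local common interface flux Jacobians through the use of pointers,

$$
\begin{aligned}
{\frac{\partial F^{I}}{\partial U^{-}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial U}, \quad
{\frac{\partial F^{I}}{\partial U^{+}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial U}, \\
{\frac{\partial F^{I}}{\partial Q^{-}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial Q}, \quad
{\frac{\partial F^{I}}{\partial Q^{+}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial Q}, \\
{\frac{\partial F^{I}}{\partial Q^{-}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial Q^{N}}, \quad
{\frac{\partial F^{I}}{\partial Q^{+}}}_{gfpts} \rightarrow \frac{\partial F^{I}}{\partial Q^{N}},
\end{aligned}
\tag{10.7}
$$

where copies of ${\frac{\partial F^{I}}{\partial Q^{-}}}_{gfpts}$ and ${\frac{\partial F^{I}}{\partial Q^{+}}}_{gfpts}$ are made to avoid communication costs when constructing the element local Jacobian.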

The process of computing the common interface solution Jacobians is shown in Eq. (8.63) and also corresponds to the element local structures through pointers,

$$
{\frac{\partial U^{I}}{\partial U^{-}}}_{gfpts} \rightarrow \frac{\partial U^{I}}{\partial U}, \quad
{\frac{\partial U^{I}}{\partial U^{+}}}_{gfpts} \rightarrow \frac{\partial U^{I}}{\partial U}. \tag{10.8}
$$

As shown in equations (8.57) and (8.65), the boundary flux and solution Jacobians are computed using the boundary functions $\frac{\partial F^{b}}{\partial U}(U, Q)$, $\frac{\partial F^{b}}{\partial Q}(U, Q)$ and $\frac{\partial U^{b}}{\partial U}(U)$ defined in [92] for a given boundary condition of the Euler and Navier-Stokes equations. For the GPU implementation, this entire process is parallelized over all global flux points.
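The relationship between global and element local flux points can be sketched with index arrays. This is only an illustration: the text describes pointers, whereas the sketch below swaps in NumPy fancy indexing, and the connectivity, the array names `fpt2gfpt` and `side`, and the scalar stand-in values are all hypothetical.

```python
import numpy as np

# Hypothetical toy connectivity: two elements with 4 flux points each, where
# flux point 1 of element 0 and flux point 3 of element 1 share a face point.
n_fpts, n_eles = 4, 2

# fpt2gfpt[fpt, ele] is the global flux point seen by each element-local slot.
fpt2gfpt = np.array([[0, 4],
                     [1, 5],
                     [2, 6],
                     [3, 1]])

# side[fpt, ele] is 0 if the element sits on the "-" side, 1 for the "+" side.
side = np.array([[0, 0],
                 [0, 0],
                 [0, 0],
                 [0, 1]])

n_gfpts = fpt2gfpt.max() + 1

# One value per side of every global flux point, evaluated once.
# These scalars stand in for dF^I/dU^- (row 0) and dF^I/dU^+ (row 1).
dFdU_gfpts = np.arange(2 * n_gfpts, dtype=float).reshape(2, n_gfpts)

# Element-local view: each slot looks up its own side of its global point.
dFdU_local = dFdU_gfpts[side, fpt2gfpt]
```

The shared point is evaluated once at the global level and then read by both neighboring elements, which is the duplication the global flux point numbering is meant to avoid.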

Modified Common Interface Flux Jacobians

The neighbor contribution Jacobian, $\frac{\partial Q^{N}}{\partial U^{I,N}}$, is computed only once during preprocessing by utilizing Eq. (8.103) or Eq. (8.103) for a two- or three-dimensional case, respectively. The contribution from neighboring solution gradients is then added to the common interface flux Jacobians using Eq. (8.43),

$$
\frac{\partial F^{I}}{\partial U} = \frac{\partial F^{I}}{\partial U} + \frac{\partial F^{I}}{\partial Q^{N}} \frac{\partial Q^{N}}{\partial U^{I,N}} \frac{\partial U^{I,N}}{\partial U}, \tag{10.9}
$$

where $\frac{\partial U^{I,N}}{\partial U} = \frac{\partial U^{I}}{\partial U}$. For the GPU implementation, this process is parallelized over all elements and variables. When solving the advection or Euler equations, the common interface flux Jacobians do not need to be modified and this section is skipped.
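A minimal sketch of the update in Eq. (10.9) is given here, using the array dimensions listed in Eq. (10.1). The function name `modify_interface_jacobian`, the random inputs and the sizes are hypothetical. The summation over spatial dimensions is an assumption inferred from the array shapes, not a statement of how ZEFR implements the step.

```python
import numpy as np

rng = np.random.default_rng(0)
n_dims, n_fpts, n_vars, n_eles = 2, 3, 4, 5  # hypothetical sizes

dFdU = rng.random((n_fpts, n_vars, n_vars, n_eles))           # dF^I/dU
dFdQN = rng.random((n_dims, n_fpts, n_vars, n_vars, n_eles))  # dF^I/dQ^N
dQNdUIN = rng.random((n_dims, n_fpts, n_eles))                # dQ^N/dU^{I,N}
dUIdU = rng.random((n_fpts, n_vars, n_vars, n_eles))          # dU^I/dU

def modify_interface_jacobian(dFdU, dFdQN, dQNdUIN, dUIdU):
    """Return dF^I/dU + dF^I/dQ^N * dQ^N/dU^{I,N} * dU^{I,N}/dU."""
    # Contract over the dimension d and the intermediate variable k at
    # every flux point f and element e.
    update = np.einsum("dfike,dfe,fkje->fije", dFdQN, dQNdUIN, dUIdU)
    return dFdU + update

modified = modify_interface_jacobian(dFdU, dFdQN, dQNdUIN, dUIdU)
```

Because every (flux point, element) pair is independent, the same contraction maps naturally onto one thread per element and variable.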


10.3.2 Constructing the Element Local Jacobians

This section provides a description of the left-hand side matrix data structure and

examples of algorithms used for the Kronecker product formulation of the element

local Jacobians.

Data Structure

The element local Jacobians are stored in the left-hand side (LHS) matrices described in section 10.1 using a row-major, array of structures (AoS) format where data associated with a single LHS matrix is contiguous in memory. This data layout was chosen in order to utilize the various batched cuBLAS [66] routines for the direct solver in section 10.1.3. The dimensions of the LHS matrix structure are

$$
\text{Left-hand side matrix:}\quad LHS : [N_{eles}, N_{vars}, N_{spts}, N_{vars}, N_{spts}]. \tag{10.10}
$$

The LHS matrix structure uses the most memory within the implicit algorithm because it depends directly on the number of elements in the domain and the square of the number of solution points within an element. The amount of memory used by the LHS matrix per element scales as $(P + 1)^{2d}$ where $P$ is the degree of the Lagrange basis polynomials and $d$ is the spatial dimension. Table 10.1 shows some common sizes for an element local Jacobian.

P    2D AD        3D AD          2D NS          3D NS
1    (4 × 4)      (8 × 8)        (16 × 16)      (40 × 40)
2    (9 × 9)      (27 × 27)      (36 × 36)      (135 × 135)
3    (16 × 16)    (64 × 64)      (64 × 64)      (320 × 320)
4    (25 × 25)    (125 × 125)    (100 × 100)    (625 × 625)
5    (36 × 36)    (216 × 216)    (144 × 144)    (1080 × 1080)

Table 10.1: Common sizes for an element local Jacobian for the advection-diffusion (AD) and Navier-Stokes (NS) equations. These sizes remain the same for the advection and Euler equations.
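The sizes in Table 10.1 follow from $N_{vars} (P + 1)^{d}$ rows per element, and the total LHS storage follows from the dimensions in Eq. (10.10). A short sketch of that arithmetic is given here. The function names are hypothetical, and double precision storage (8 bytes per value) is an assumption.

```python
def lhs_size(p, d, n_vars):
    """Rows (= columns) of one element local Jacobian: N_vars * (P + 1)^d."""
    return n_vars * (p + 1) ** d

def lhs_memory_gb(n_eles, p, d, n_vars, bytes_per_value=8):
    """Memory for all LHS matrices in GB (10^9 bytes); 8-byte values assumed."""
    n = lhs_size(p, d, n_vars)
    return n_eles * n * n * bytes_per_value / 1e9

# 3D Navier-Stokes (5 variables), P = 3.
print(lhs_size(3, 3, 5))  # 320

# 2D Euler (4 variables), P = 5, on a 256 x 256 mesh.
print(round(lhs_memory_gb(256 * 256, 5, 2, 4), 2))  # 10.87
```

Under these assumptions the second value appears consistent with the LHS memory reported for the (256 × 256) mesh in Table 11.9.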


Two-Dimensional Element Local Jacobian

The first term, $\nabla_{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, of the two-dimensional element local Jacobian is constructed from the Kronecker products in Eq. (8.88). An example of one term in the formulation is shown in algorithm 10.1. The resulting sparsity pattern follows the same sparsity pattern as a Kronecker sum as shown in Figures 8.1 and 8.2 for a polynomial order of P = 2.

Algorithm 10.1 Calculate 2D $K_{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial u_{ele}}\right)$

for ele = 1 to Neles do
  for varj = 1 to Nvars do
    for vari = 1 to Nvars do
      for fi = 1 to Nspts1D do
        for si = 1 to Nspts1D do
          for sj = 1 to Nspts1D do
            spti ← fi * Nspts1D + si
            sptj ← fi * Nspts1D + sj
            ∂F/∂U ← ∂F/∂U(0, sptj, vari, varj, ele)
            ∂F^{I,L}/∂U^L ← ∂F^{I,L}/∂U^L(fi, vari, varj, ele)
            ∂F^{I,R}/∂U^R ← ∂F^{I,R}/∂U^R(fi, vari, varj, ele)
            LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti)
                + K(si, sj) * ∂F/∂U
                + KL(si, sj) * ∂F^{I,L}/∂U^L
                + KR(si, sj) * ∂F^{I,R}/∂U^R
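The loop structure of algorithm 10.1 can be sketched in Python for a single element and a single variable pair. This is a simplified illustration under stated assumptions: indices are zero-based, the operator matrices `K`, `KL` and `KR` and all Jacobian values are random stand-ins, and the function name `k_xi_term` is hypothetical.

```python
import numpy as np

def k_xi_term(K, KL, KR, dFdU, dFL, dFR):
    """Accumulate the xi-direction term for one element and one variable pair.

    K, KL, KR : (n, n) one-dimensional operator matrices (stand-ins here).
    dFdU      : (n*n,) flux Jacobian at the solution points.
    dFL, dFR  : (n,) common interface flux Jacobians on the left/right faces.
    """
    n = K.shape[0]
    lhs = np.zeros((n * n, n * n))
    for fi in range(n):          # index of the line of points along xi
        for si in range(n):
            for sj in range(n):
                spti = fi * n + si
                sptj = fi * n + sj
                lhs[sptj, spti] += (K[si, sj] * dFdU[sptj]
                                    + KL[si, sj] * dFL[fi]
                                    + KR[si, sj] * dFR[fi])
    return lhs

rng = np.random.default_rng(1)
n = 3  # P = 2
lhs = k_xi_term(rng.random((n, n)) + 1, rng.random((n, n)) + 1,
                rng.random((n, n)) + 1, rng.random(n * n) + 1,
                rng.random(n) + 1, rng.random(n) + 1)

# Nonzeros only couple points that share the same line index fi, which is
# the block-diagonal pattern of a Kronecker product with the identity.
pattern = np.kron(np.eye(n), np.ones((n, n))) != 0
```

In this sketch the entries of `lhs` are nonzero exactly where `pattern` is true, so only $N_{spts1D}^{3}$ of the $N_{spts1D}^{4}$ entries are touched.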


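The second term, $\nabla_{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, of the two-dimensional element local Jacobian is constructed from the product of the two Kronecker formulations in Eq. (8.90). The product produces four terms which can be written explicitly as

$$
\begin{aligned}
\left[\nabla_{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)\right]_{v_i, v_j}
= \sum_{d_j \in \{x, y\}} \sum_{v_k = 1}^{N_{vars}} \Bigg(
& \left[K_{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \xi)\, ele} \left[K_{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j} \\
+ & \left[K_{\eta}\left(\frac{\partial f^{\eta}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \eta)\, ele} \left[K_{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j} \\
+ & \left[K_{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \eta)\, ele} \left[K_{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j} \\
+ & \left[K_{\eta}\left(\frac{\partial f^{\eta}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \xi)\, ele} \left[K_{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j} \Bigg),
\end{aligned}
\tag{10.11}
$$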

where a loop over variables is included to account for boundary conditions as discussed

in section 8.3. The first two terms are like-terms where the sparsity pattern is the

same between the two terms in the product. An example of this process is shown

in algorithm 10.2. The last two terms are cross-terms where the sparsity pattern is

different between the two terms in the product. The result produces a dense matrix

with no sparsity pattern because of the products between the cross-terms as shown in

Figure 8.3 for P = 2. This is shown in algorithm 10.3. For the GPU implementation,

algorithms 10.1–10.3 are parallelized over all elements and variables.
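The difference between like-terms and cross-terms can be seen with plain Kronecker products. The sketch below is only an analogy: the dense random matrices `A`, `B` and `C` are hypothetical stand-ins for the one-dimensional operators, and the pointwise Jacobian factors of Eq. (10.11) are omitted because they do not change the sparsity pattern.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 3  # P = 2, so n * n = 9 solution points in 2D
A, B, C = (rng.random((n, n)) + 1 for _ in range(3))
I = np.eye(n)

xi_op_1 = np.kron(I, A)   # acts along xi
xi_op_2 = np.kron(I, B)   # acts along xi
eta_op = np.kron(C, I)    # acts along eta

like = xi_op_1 @ xi_op_2  # equals kron(I, A @ B): pattern is preserved
cross = xi_op_1 @ eta_op  # equals kron(C, A): every entry is filled

print(np.count_nonzero(like), np.count_nonzero(cross))  # 27 81
```

The like-term keeps the block-diagonal pattern of its factors, while the cross-term fills the whole 9 × 9 matrix, which matches the dense result described for P = 2.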

Three-Dimensional Element Local Jacobian

The three-dimensional implementation of the element local Jacobian is very similar to the two-dimensional implementation with a few exceptions. The first term, $\nabla_{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, is constructed using Eq. (8.104). The sparsity patterns are shown in Figures 8.4 and 8.5 for a polynomial order of P = 2. The algorithm for each Kronecker product formulation looks similar to algorithm 10.1.

The second term, $\nabla_{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, is constructed using Eq. (8.106). The product between the two Kronecker formulations produces a total of nine terms where three terms are like-terms and six are cross-terms. The algorithms for building the like-terms and cross-terms look similar to algorithms 10.2 and 10.3, respectively. The sparsity patterns are shown in Figures 8.6 and 8.7 for P = 2.

Scale Jacobian

The last step in constructing the element local Jacobians is to scale each Jacobian by the determinant of the geometric Jacobian as shown in Eq. (8.113). For the GPU implementation, this process is parallelized over all elements and variables.


Algorithm 10.2 Calculate 2D like-term $\left[K_{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \xi)\, ele} \left[K_{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j}$

for ele = 1 to Neles do
  for varj = 1 to Nvars do
    for vari = 1 to Nvars do
      for fi = 1 to Nspts1D do
        for si = 1 to Nspts1D do
          for sj = 1 to Nspts1D do
            val ← 0.0
            for dimj = 1 to Ndims do
              for vark = 1 to Nvars do
                diag ← (vark == varj)
                ∂F^{I,L}/∂Q^L ← ∂F^{I,L}/∂Q^L(dimj, fi, vari, vark, ele)
                ∂F^{I,R}/∂Q^R ← ∂F^{I,R}/∂Q^R(dimj, fi, vari, vark, ele)
                ∂U^{I,L}/∂U^L ← ∂U^{I,L}/∂U^L(fi, vark, varj, ele)
                ∂U^{I,R}/∂U^R ← ∂U^{I,R}/∂U^R(fi, vark, varj, ele)
                for sk = 1 to Nspts1D do
                  sptk ← fi * Nspts1D + sk
                  ∂F/∂Q ← ∂F/∂Q(0, dimj, sptk, vari, vark, ele)
                  J^{-1′} ← J^{-1′}(0, sptk, dimj, ele)
                  kron1 ← K(si, sk) * ∂F/∂Q
                      + KL(si, sk) * ∂F^{I,L}/∂Q^L
                      + KR(si, sk) * ∂F^{I,R}/∂Q^R
                  kron2 ← K(sk, sj) * diag
                      + KL(sk, sj) * ∂U^{I,L}/∂U^L
                      + KR(sk, sj) * ∂U^{I,R}/∂U^R
                  val ← val + kron1 * J^{-1′} * kron2
                end for
              end for
            end for
            spti ← fi * Nspts1D + si
            sptj ← fi * Nspts1D + sj
            LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti) + val


Algorithm 10.3 Calculate 2D cross-term $\left[K_{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i, v_k} J^{-1\prime}_{(d_j, \eta)\, ele} \left[K_{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k, v_j}$

for ele = 1 to Neles do
  for varj = 1 to Nvars do
    for vari = 1 to Nvars do
      for etai = 1 to Nspts1D do
        for etaj = 1 to Nspts1D do
          for xii = 1 to Nspts1D do
            for xij = 1 to Nspts1D do
              val ← 0.0
              sptij ← etai * Nspts1D + xij
              for dimj = 1 to Ndims do
                for vark = 1 to Nvars do
                  diag ← (vark == varj)
                  ∂F/∂Q ← ∂F/∂Q(0, dimj, sptij, vari, vark, ele)
                  ∂F^{I,L}/∂Q^L ← ∂F^{I,L}/∂Q^L(dimj, etai, vari, vark, ele)
                  ∂F^{I,R}/∂Q^R ← ∂F^{I,R}/∂Q^R(dimj, etai, vari, vark, ele)
                  ∂U^{I,B}/∂U^B ← ∂U^{I,B}/∂U^B(xij, vark, varj, ele)
                  ∂U^{I,T}/∂U^T ← ∂U^{I,T}/∂U^T(xij, vark, varj, ele)
                  J^{-1′} ← J^{-1′}(1, sptij, dimj, ele)
                  kron1 ← K(xii, xij) * ∂F/∂Q
                      + KL(xii, xij) * ∂F^{I,L}/∂Q^L
                      + KR(xii, xij) * ∂F^{I,R}/∂Q^R
                  kron2 ← K(etai, etaj) * diag
                      + KL(etai, etaj) * ∂U^{I,B}/∂U^B
                      + KR(etai, etaj) * ∂U^{I,T}/∂U^T
                  val ← val + kron1 * J^{-1′} * kron2
                end for
              end for
              spti ← etai * Nspts1D + xii
              sptj ← etaj * Nspts1D + xij
              LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti) + val

Chapter 11

Performance Analysis

In this chapter, the computational performance of the implicit method within ZEFR

is characterized within three sections. The first section tests the pseudo time stepping

method with MCGS on the bump and NACA 0012 test cases. The results show that,

in general, the total amount of pseudo time steps needed for convergence increases

with mesh refinement but does not necessarily increase with polynomial refinement.

The second section studies the GPU performance of the unsteady and steady-state

solvers on the half cylinder and NACA 0012 test cases, respectively. A comparison

between the explicit RK45 and implicit ESDIRK4 shows that the implicit method is

about 35 times slower than the explicit method for the special case of viscous flow over

a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80 GPUs. A performance

breakdown of the unsteady implicit method shows that 50% of the time is spent on

the batched matrix vector operation while the Jacobian computations only take up

about 15%. It is also shown that at least an order of magnitude speedup is achieved

on a single GPU over a single CPU core for the steady-state solver.

The last section studies the multi-GPU scalability of the steady-state solver for

the NACA 0012 test case. The results show that the implicit steady-state solver

achieves better strong and weak scalability when compared to the explicit method

and a weak scalability efficiency of about 98% across eight GPUs is obtained.

In what follows, all GPU tests invert the left-hand side matrices using the modified

direct solver described in section 10.1.3.


11.1 Multicolored Gauss-Seidel

In this section, the convergence properties of the pseudo time stepping method with MCGS are studied for the bump and NACA 0012 airfoil test cases from sections 9.1 and 9.2, respectively. As stated in the beginning of chapter 9, a solution is considered converged if the residual drops by 10 orders of magnitude. All cases start from uniform flow, and the CFL is exponentially increased at each pseudo time step until a CFL of around 10000 is reached. Local time stepping is used and 100 block iterations are used per pseudo time step. All cases are performed on a single NVIDIA Tesla C2070 GPU and the element local Jacobians are not constructed using the Kronecker product formulation described in section 10.3.
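An exponential CFL ramp of the kind described here can be sketched as follows. The starting CFL and the growth factor are hypothetical, since only the cap of around 10000 is stated, and the function name `cfl_schedule` is an assumption made for this example.

```python
def cfl_schedule(cfl_start, growth, cfl_max, n_steps):
    """CFL used at each pseudo time step: geometric growth with a cap."""
    return [min(cfl_start * growth ** n, cfl_max) for n in range(n_steps)]

# Hypothetical values: start at CFL = 1 and double every pseudo time step.
schedule = cfl_schedule(cfl_start=1.0, growth=2.0, cfl_max=1.0e4, n_steps=20)
print(schedule[:4], schedule[-1])  # [1.0, 2.0, 4.0, 8.0] 10000.0
```

With these particular values the cap is reached after 14 pseudo time steps, so the early steps stay close to the uniform-flow initial condition while later steps take the large pseudo time steps that implicit methods allow.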

11.1.1 Inviscid flow over a bump

Table 11.1 shows the total number of pseudo time steps, the wall-clock time and the entropy error for a series of meshes, P = 2, for the bump test case. The results show that the total amount of pseudo time steps needed for convergence increases with mesh refinement.

Neles                (24 × 8)    (48 × 16)   (96 × 32)   (192 × 64)
Pseudo Time Steps    14          27          52          103
Wall Time (s)        0.722       2.01        10.43       75.35
Entropy Error        7.28e-05    1.04e-05    1.40e-06    1.79e-07

Table 11.1: Convergence results for different meshes, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU, P = 2

Table 11.2 shows the total number of pseudo time steps, the wall-clock time and

the entropy error for a series of polynomial orders, (48× 16) mesh, for the bump test

case. In this case, the total amount of pseudo time steps remained relatively the same

as the polynomial order was increased.

P                    2           3           4           5
Pseudo Time Steps    27          44          47          43
Wall Time (s)        2.01        4.92        13.47       20.02
Entropy Error        1.04e-05    1.30e-06    6.65e-07    3.63e-07

Table 11.2: Convergence results for different polynomial orders, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU, (48 × 16) quadrilateral mesh

Figure 11.1 summarizes the results by showing the entropy error vs. total number of block iterations and wall-clock time for all cases. The results show that increasing the polynomial order on the (48 × 16) mesh reduces the entropy error while maintaining a smaller wall-clock time compared to the more refined meshes.

[Figure 11.1: Entropy error, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU. (a) Entropy error vs. block iterations; (b) entropy error vs. wall time (s). Vertical axis: $\|e_S\|_{L_2(\Omega)}$. Series: P = 2 and (48 × 16).]

11.1.2 Inviscid flow over the NACA 0012 airfoil

Table 11.3 shows the total number of pseudo time steps, the wall-clock time and the

lift coefficient for a series of O-meshes, P = 4, for the NACA 0012 test case. As in

the previous test case, the total amount of pseudo time steps needed for convergence

increased as the mesh was refined. The results are the same for the mixed meshes in

Table 11.4.

Neles                (8 × 8)       (16 × 16)     (32 × 32)     (64 × 64)     (128 × 128)
Pseudo Time Steps    8             10            20            42            76
Wall Time (s)        0.718         1.302         7.52          56.48         395.32
Lift Coefficient     1.8107e-01    1.8037e-01    1.7948e-01    1.7949e-01    1.7950e-01

Table 11.3: Convergence results for different meshes, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, single GPU, P = 4

Neles                764           1512          6048
Pseudo Time Steps    36            43            105
Wall Time (s)        12.28         24.81         220.14
Lift Coefficient     1.7953e-01    1.7952e-01    1.7949e-01

Table 11.4: Convergence results for different mixed meshes, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with four color MCGS, single GPU, P = 4

Table 11.5 shows the total number of pseudo time steps, the wall-clock time and the lift coefficient for a series of polynomial orders, (32 × 32) mesh, for the NACA 0012 test case. As shown previously, the total amount of pseudo time steps remained relatively the same as the polynomial order was increased.

P                    2             3             4             5
Pseudo Time Steps    15            37            20            39
Wall Time (s)        1.357         5.23          7.50          24.54
Lift Coefficient     1.7853e-01    1.7963e-01    1.7948e-01    1.7950e-01

Table 11.5: Convergence results for different polynomial orders, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, single GPU, (32 × 32) quadrilateral mesh

11.2 GPU performance

This section studies the GPU performance of the unsteady and steady-state solvers.

The GPU performance study on the unsteady solver compares the explicit RK45


and implicit ESDIRK4 methods and shows a relative performance breakdown of each

component of the implicit method. The GPU performance study on the steady-state

solver shows the speedup of a single GPU over a single CPU core.

11.2.1 Unsteady Solver

In this section, the GPU performance of the unsteady solver is studied for the half cylinder test case in section 9.5. The unsteady solver for this problem uses the ESDIRK4 scheme from section 7.1.2 on 12 NVIDIA Tesla K80 GPUs and consists of five implicit stages per time step. As discussed in the beginning of chapter 9, each stage uses one pseudo time step with 200 block iterations, over which the stage residual drops by 7 orders of magnitude. For this section, the element local Jacobians are constructed using the Kronecker product formulation described in section 10.3.

Comparison between explicit and implicit

The number of time steps and wall-clock time for the full simulation from section 9.5 for the explicit RK45 and implicit ESDIRK4 schemes are compared in Table 11.6. The results show that the implicit method is about 35 times slower than the explicit method for this particular test case.

Case Studies       Time Steps    Wall-clock Time (s)
ZEFR - RK45        212000        2696.9
ZEFR - ESDIRK4     5400          94919.1

Table 11.6: A comparison between explicit RK45 and implicit ESDIRK4 for viscous flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80 GPUs.
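The ratios implied by Table 11.6 can be worked out directly. The sketch below uses only the tabulated values. Interpreting the ratio of step counts as the ratio of time step sizes assumes that both runs cover the same physical time.

```python
# Values from Table 11.6.
explicit_steps, explicit_time = 212000, 2696.9
implicit_steps, implicit_time = 5400, 94919.1

# Ratio of total wall-clock times (implicit over explicit).
slowdown = implicit_time / explicit_time

# How many explicit steps fit into one implicit step (same physical time assumed).
step_ratio = explicit_steps / implicit_steps

# Cost of one implicit time step measured in explicit time steps.
cost_per_step_ratio = (implicit_time / implicit_steps) / (explicit_time / explicit_steps)

print(f"{slowdown:.1f} {step_ratio:.1f} {cost_per_step_ratio:.0f}")
```

Under that assumption, one implicit step costs roughly 1400 explicit steps while covering only about 39 of them, which is where the factor of about 35 comes from.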

Performance Breakdown

The implicit method can be split into two sections. The first section, called “Inverse Jacobian”, involves the construction and inversion of the left-hand side matrices used to advance the solution. The second section is the block iterative method where the inverse matrices are multiplied by the right-hand side vectors. The implicit method is profiled over the course of several time steps using NVIDIA’s nvprof, a profiling tool for GPUs [65]. Figure 11.2 compares the wall-clock time between the two sections. The results show that the majority of the time, 78%, is being spent within the block iterative method.

[Figure 11.2: A wall-clock time comparison between the inverse Jacobian computation and the block iterative method for one ESDIRK4 time step of viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs. Inverse Jacobian: 22%; Block Iterations: 78%.]

The “Inverse Jacobian” profile is broken down even further in Figure 11.3 where each section of the element local Jacobian computation and inverse is compared. “Functions”, “Term 1” and “Scaling” take up a very small portion of overall time. These refer to the construction of the flux and solution Jacobians from section 10.3.1, the first term in the construction of the element local Jacobian from section 10.3.2 and the geometric and time step scaling operations for the left-hand side matrices, respectively. The results show that the majority of the time is being spent constructing “Term 2” (60%) and inverting all the left-hand side matrices (34%). The second term in the element local Jacobian computation requires multiple products between matrices so it’s not surprising that it’s much more expensive than the first term.

[Figure 11.3: A wall-clock time comparison between each section of “Inverse Jacobian” for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs. Functions: 1%; Term 1: 2%; Term 2: 60%; Scaling: 3%; Inverse: 34%.]

The profile for the block iterative method is also broken down further in Figure 11.4 where the computation of the residual called “Residual”, the right-hand side vectors called “RHS” and the batched matrix-vector operations called “DgemvBatched” are compared. Based on these results, the operation that takes the most time within the block iterative method is the batched matrix-vector operation (64%).

Overall, across a single time step, the batched matrix-vector operation takes up a total of about 50% while the element local Jacobian computation takes up about 15%.
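The overall shares quoted here can be recovered by combining the percentages in Figures 11.2 through 11.4. The sketch below shows that arithmetic. Counting every part of “Inverse Jacobian” except the inversion as Jacobian computation is an assumption made for this example.

```python
# Fractions read from Figures 11.2-11.4.
inverse_jacobian, block_iterations = 0.22, 0.78
functions, term1, term2, scaling, inverse = 0.01, 0.02, 0.60, 0.03, 0.34
residual, rhs, dgemv_batched = 0.23, 0.13, 0.64

# Share of one time step spent in the batched matrix-vector products.
dgemv_share = block_iterations * dgemv_batched

# Share spent building the element local Jacobians (everything in
# "Inverse Jacobian" except the inversion itself; an assumption).
jacobian_share = inverse_jacobian * (functions + term1 + term2 + scaling)

print(f"{dgemv_share:.2f} {jacobian_share:.2f}")  # 0.50 0.15
```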

11.2.2 Steady-state Solver

In this section, the CPU and GPU performance of the steady-state solver are compared for the NACA 0012 airfoil test case in section 9.2. Since convergence is not important in this section, the total number of pseudo time steps is fixed to 10 and the pseudo time step is fixed to $\Delta\tau = 1 \times 10^{-9}$. All cases are performed on a single NVIDIA Tesla C2070 GPU and a single Intel Xeon X5650 CPU core. In this section, the element local Jacobians are not constructed using the Kronecker product formulation described in section 10.3.

[Figure 11.4: A wall-clock time comparison between each section of the block iterative method for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs. Residual: 23%; RHS: 13%; DgemvBatched: 64%.]

Table 11.7 shows the overall speedup of the GPU code compared to the CPU code for the steady-state solver with pseudo time stepping and two color MCGS. The results show that at least an order of magnitude speedup is achieved on the GPU over the CPU. The results also show that the speedup of the GPU code over the CPU code increases with mesh size but stays roughly the same with polynomial order. Note that the memory requirements of the steady-state solver exceeded the available device memory for the (256 × 256) mesh with polynomial orders P = 4 and P = 5. As a reference, Tesla C2070 GPUs contain 6 GB of device memory.

Neles    (32 × 32)    (64 × 64)    (128 × 128)    (256 × 256)
P = 2    12.5         18.0         20.5           21.7
P = 3    18.6         22.3         24.2           25.4
P = 4    15.3         17.2         18.0           -
P = 5    17.9         20.3         21.4           -

Table 11.7: Speedup of a single GPU over a single CPU core for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS

11.3 Multi-GPU Scalability

In this section, the multi-GPU scalability of the steady-state solver is studied for the NACA 0012 airfoil test case in section 9.2. Since convergence is not important in this section, the total number of pseudo time steps is fixed to 50 and the pseudo time step is fixed to $\Delta\tau = 1 \times 10^{-9}$. All cases are performed on two nodes of an NVIDIA Tesla C2070 GPU cluster with six GPUs and two CPUs installed on each node. In this section, the element local Jacobians are not constructed using the Kronecker product formulation described in section 10.3.

11.3.1 Strong Scalability

An MPI strong scalability study is performed on up to 12 GPUs to study the performance of the explicit and implicit methods as the workload per device becomes smaller and the communication overhead becomes more significant. Figure 11.5 shows the speedup relative to one GPU for both the explicit and implicit methods. The results show that strong scalability improves with increased mesh refinement for both methods. In this case, the implicit steady-state solver achieves better strong scalability for smaller problem sizes because there’s more work relative to communication available with the implicit method.

[Figure 11.5: Speedup relative to one GPU for inviscid flow over the NACA 0012 airfoil, P = 5. (a) Explicit RK4, meshes 32x32, 64x64, 128x128 and 256x256; (b) Implicit Steady w/ MCGS, meshes 32x32, 64x64 and 128x128. Axes: Speedup vs. Number of GPUs.]

11.3.2 Weak Scalability

An MPI weak scalability study is performed on up to 8 GPUs to study the performance of the explicit and implicit methods as the workload per device remains constant and additional resources are used to solve a larger problem. Tables 11.8 and 11.9 show the wall-clock time and efficiency for the explicit and implicit methods. The memory required for the left-hand side matrices is also reported. The results show that the implicit steady-state solver achieves better weak scalability compared to the explicit method. The implicit method is also able to maintain an efficiency greater than 98% on a problem with over 21 GB of data. This suggests that the implicit method can be effectively distributed over multiple GPUs to solve larger problems without a significant degradation in performance.

NGPUs             1              2              4              8
Neles             (128 × 128)    (256 × 128)    (256 × 256)    (512 × 256)
Wall Time (s)     191.6          205.28         213.04         212.11
Efficiency (%)    100            93.3           89.9           90.3

Table 11.8: Weak scalability study for inviscid flow over the NACA 0012 airfoil, explicit RK4, P = 5

NGPUs             1              2              4              8
Neles             (128 × 128)    (256 × 128)    (256 × 256)    (512 × 256)
Wall Time (s)     396.81         398.19         401.35         403.35
Efficiency (%)    100            99.6           98.9           98.4
Memory (GB)       2.71           5.44           10.87          21.74

Table 11.9: Weak scalability study for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, P = 5
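The efficiencies in Tables 11.8 and 11.9 appear to follow the usual weak scaling definition, the single-GPU wall-clock time divided by the N-GPU wall-clock time. The sketch below applies that definition to the tabulated times. The function name `weak_efficiency` is hypothetical, and the definition itself is an assumption that happens to reproduce the tabulated values to within rounding.

```python
def weak_efficiency(times):
    """Weak scaling efficiency in percent: T(1 GPU) / T(N GPUs) * 100."""
    return [100.0 * times[0] / t for t in times]

# Wall-clock times (s) from Tables 11.8 and 11.9 for 1, 2, 4 and 8 GPUs.
explicit_times = [191.6, 205.28, 213.04, 212.11]
implicit_times = [396.81, 398.19, 401.35, 403.35]

explicit_eff = weak_efficiency(explicit_times)
implicit_eff = weak_efficiency(implicit_times)

print([round(e, 1) for e in explicit_eff])
print([round(e, 1) for e in implicit_eff])
```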

Chapter 12

Conclusions

This dissertation focused on addressing two major limitations preventing high-order

flux reconstruction methods from being utilized for unsteady aerospace simulations.

First, explicit methods often become unreliable because they are pushed close to their

numerical stability limit. Second, implicit methods can be prohibitively expensive for

high-order methods, specifically on modern hardware.

In Part I of this dissertation, it was shown that the CFL condition for nodal DG

via FR for the advection-diffusion equation is stricter than that for pure-advection

and pure-diffusion individually. This demonstrated that the coupling of advective

and diffusive terms within any equation has a dramatic effect on numerical stability.

Given these findings, a maximum stable time step estimate for the linear advection-

diffusion and Navier-Stokes equations on unstructured, tensor product elements was

deduced. The estimates were shown to be accurate within 50% error on all test cases

and conservative on tests with Cartesian grids.

Theoretical and numerical verification was also presented that showed that schemes

with centered values produce less error for well resolved solutions while schemes with

one-sided values produce less error for solutions that are under-resolved. It was also

shown that the CFL condition is strongly influenced by the choice of interface fluxes

and, in general, the condition for a scheme using centered values is much higher than

that which has one-sided values. This motivates the use of centered schemes for

explicit and implicit methods.


In Part II of this dissertation, a multi-GPU, high-order, implicit time stepping

method was developed, implemented, tested and shown to be feasible on GPU ar-

chitectures. It was shown that the steady-state solver was able to obtain the correct

rates of convergence of P + 1 for solution quantities and 2P for integral functional

quantities. High-order polynomials were utilized to obtain the same error with fewer

degrees of freedom in less wall-clock time. The total number of pseudo time

steps needed for convergence increased with mesh refinement but did not necessar-

ily increase with polynomial refinement and at least an order of magnitude speedup

was achieved on a single GPU over a single CPU core. The steady-state solver also

achieved better strong and weak scalability when compared to the explicit method

and a weak scalability efficiency of about 98% across eight GPUs was obtained. All of

these attributes are promising for the development of future high-order steady-state

solvers.

A Kronecker product formulation was used to show that the time complexity of computing the analytical element local Jacobian for DFR can be reduced from $O(P^{3d-1})$ to $O(P^{d+1})$ for the advection or Euler equations and $O(P^{3d})$ to $O(P^{d+2})$ for the advection-diffusion or Navier-Stokes equations. It was shown that the element local Jacobian can be split into two terms which are constructed from two or three distinct sparsity patterns based on Kronecker products as shown in Figures 8.1–8.7. The analysis of the sparsity pattern of the element local Jacobians led to the construction of efficient algorithms for GPU architectures.
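To give a feel for these complexity estimates, the sketch below compares the leading-order operation counts with and without the Kronecker product formulation. Constants and lower-order terms are ignored, so the numbers are only indicative, and the function names are hypothetical.

```python
def dense_cost(p, d, viscous):
    """Leading-order cost without the Kronecker formulation."""
    return p ** (3 * d) if viscous else p ** (3 * d - 1)

def kronecker_cost(p, d, viscous):
    """Leading-order cost with the Kronecker formulation."""
    return p ** (d + 2) if viscous else p ** (d + 1)

for d in (2, 3):
    for p in (2, 3, 4, 5):
        ratio = dense_cost(p, d, True) / kronecker_cost(p, d, True)
        print(d, p, ratio)  # ratio equals p ** (2 * d - 2)
```

In both the inviscid and viscous cases the reduction factor is $P^{2d-2}$, so the benefit grows quickly with polynomial order in three dimensions.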

A comparison between the explicit RK45 and implicit ESDIRK4 showed that the

unsteady implicit method was about 35 times slower than the explicit method for the

special case of viscous flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA

Tesla K80 GPUs. This seems to indicate that explicit methods are more efficient

than implicit methods for these particular types of problems. Based on the results,

one could estimate that an implicit method would begin to become competitive when

the maximum time step of the explicit method is around two orders of magnitude

lower. This seems reasonable as this problem had an explicit time step restriction on the order of $10^{-4}$, which is not particularly large. A performance breakdown of the unsteady implicit method showed that 50% of the time was spent on the batched matrix-vector operation while the Jacobian computations took about 15%. This is an important finding moving forward as future development focuses on creating more efficient methods which eliminate these types of bottlenecks.

Lastly, the memory size of the left-hand side matrices increased rapidly with

polynomial order but the memory issues on the GPU were mitigated by utilizing

the robust scaling properties of the method. Despite the memory deficiencies and

lack of competitiveness with GPU accelerated explicit methods, the results show

promise for solving problems of engineering importance on GPU clusters. As modern

GPU clusters become more powerful, implicit methods will become more feasible

for high Reynolds number flows where explicit methods are much more limited. An

investigation into methods of reducing memory requirements and producing more

efficient methods is warranted and is a topic of future research.

Appendix A

Boundary Conditions

For the Navier-Stokes equations, the common interface fluxes at boundary faces are

computed as,

$$F^b(U, Q) = F^n\left(U^b(U),\, Q^b(U, Q)\right), \tag{A.1}$$

where $F^n(U)$ is the flux normal to the face, $U^b(U)$ is the solution prescribed at the

boundary face, $Q^b(U, Q)$ is the solution gradient prescribed at the boundary face, $U$

is the solution extrapolated to the face and $Q$ is the solution gradient extrapolated

to the face.

In two dimensions, the extrapolated solution and solution gradient are defined as

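$$U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \qquad Q_d = \begin{bmatrix} \partial \rho / \partial d \\ \partial (\rho u) / \partial d \\ \partial (\rho v) / \partial d \\ \partial e / \partial d \end{bmatrix}, \qquad d \in \{x, y\}, \tag{A.2}$$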

and the components of the unit normal vector are defined as $n = [n_x, n_y]^T$. In three

dimensions, the extrapolated solution and solution gradient are

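$$U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ e \end{bmatrix}, \qquad Q_d = \begin{bmatrix} \partial \rho / \partial d \\ \partial (\rho u) / \partial d \\ \partial (\rho v) / \partial d \\ \partial (\rho w) / \partial d \\ \partial e / \partial d \end{bmatrix}, \qquad d \in \{x, y, z\}, \tag{A.3}$$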

and the components of the unit normal vector are $n = [n_x, n_y, n_z]^T$. The solution, $U^b$,

and solution gradient, $Q^b$, at the boundary are determined directly from the following

boundary conditions.

A.1 Solid Slip-Wall and Symmetry

On a solid surface where the flow is allowed to slip, the flow must remain tangent to

the surface [61]. In 2D, the velocities on the boundary can be written as,

$$u_b = u - V^n n_x, \qquad v_b = v - V^n n_y,$$

where $V^n = u\,n_x + v\,n_y$ is the velocity normal to the face.

An extrapolated pressure is used to compute the total energy on the wall so that the

solution on the boundary face in 2D is computed as,

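$$U^b = \begin{bmatrix} \rho \\ \rho u_b \\ \rho v_b \\ \dfrac{p}{\gamma - 1} + \dfrac{1}{2}\rho\left(u_b^2 + v_b^2\right) \end{bmatrix}. \tag{A.4}$$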


For 3D, the process is very similar,

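$$U^b = \begin{bmatrix} \rho \\ \rho u_b \\ \rho v_b \\ \rho w_b \\ \dfrac{p}{\gamma - 1} + \dfrac{1}{2}\rho\left(u_b^2 + v_b^2 + w_b^2\right) \end{bmatrix}. \tag{A.5}$$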

This boundary condition can also be applied at symmetry boundaries. It is only used

for the Euler equations, so a solution gradient does not need to be defined.
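The 2D slip-wall state of Equation (A.4) can be sketched in a few lines of Python. This is a minimal illustration, not the solver's implementation: the function name `slip_wall_state_2d` is hypothetical, the state is passed as a plain list `[rho, rho*u, rho*v, e]`, and the extrapolated pressure is assumed to follow the ideal gas relation $p = (\gamma - 1)\left(e - \tfrac{1}{2}\rho(u^2 + v^2)\right)$, which is consistent with the energy expression in (A.4).

```python
def slip_wall_state_2d(U, n, gamma=1.4):
    """Boundary state U^b for a slip wall, following Eq. (A.4).

    U : [rho, rho*u, rho*v, e], solution extrapolated to the face
    n : (nx, ny), unit normal of the face
    gamma : ratio of specific heats (1.4 is an assumed default)
    """
    rho, rho_u, rho_v, e = U
    nx, ny = n
    u, v = rho_u / rho, rho_v / rho

    # Extrapolated pressure (ideal gas relation, assumed here).
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))

    # Remove the normal velocity component so the flow stays tangent.
    vn = u * nx + v * ny
    ub = u - vn * nx
    vb = v - vn * ny

    # Total energy rebuilt from the extrapolated pressure and wall velocity.
    eb = p / (gamma - 1.0) + 0.5 * rho * (ub * ub + vb * vb)
    return [rho, rho * ub, rho * vb, eb]


if __name__ == "__main__":
    Ub = slip_wall_state_2d([1.0, 1.0, 0.5, 3.0], (1.0, 0.0))
    print(Ub)  # the x-momentum vanishes for a wall with normal (1, 0)
```

For a unit normal, the returned velocity satisfies $u_b n_x + v_b n_y = 0$ by construction.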

A.2 No-Slip Adiabatic Wall

On a solid surface where the flow is not allowed to slip [61], the velocity is set to zero.

The prescribed solution on the boundary in 2D is found by extrapolating the density

and pressure so that

$$U^b = \begin{bmatrix} \rho \\ 0 \\ 0 \\ \dfrac{p}{\gamma - 1} \end{bmatrix}, \tag{A.6}$$

and in 3D,

$$U^b = \begin{bmatrix} \rho \\ 0 \\ 0 \\ 0 \\ \dfrac{p}{\gamma - 1} \end{bmatrix}. \tag{A.7}$$
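As an illustration of Equation (A.6), the following Python sketch builds the 2D no-slip boundary state from the extrapolated solution. The function name `no_slip_wall_state_2d` is hypothetical, and the pressure is again assumed to come from the ideal gas relation $p = (\gamma - 1)\left(e - \tfrac{1}{2}\rho(u^2 + v^2)\right)$.

```python
def no_slip_wall_state_2d(U, gamma=1.4):
    """Boundary state U^b for a no-slip wall, following Eq. (A.6).

    U : [rho, rho*u, rho*v, e], solution extrapolated to the face
    gamma : ratio of specific heats (1.4 is an assumed default)
    """
    rho, rho_u, rho_v, e = U
    u, v = rho_u / rho, rho_v / rho

    # Density and pressure are extrapolated; pressure uses the
    # ideal gas relation (assumed here).
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))

    # Zero wall velocity leaves only the internal energy term.
    return [rho, 0.0, 0.0, p / (gamma - 1.0)]


if __name__ == "__main__":
    print(no_slip_wall_state_2d([1.0, 1.0, 0.5, 3.0]))
```

The boundary energy equals the extrapolated total energy minus the extrapolated kinetic energy, since the wall velocity is zero.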

The solution gradient at the boundary is computed by removing the temperature

derivative normal to the wall from the energy gradient. The temperature gradient is

computed as,

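$$\frac{\partial T}{\partial d} = \frac{\partial e}{\partial d} - \frac{\partial \rho}{\partial d}\frac{e}{\rho} - \rho\left(u\frac{\partial u}{\partial d} + v\frac{\partial v}{\partial d}\right), \qquad d \in \{x, y\},$$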


and the temperature derivative normal to the wall is computed as,

$$\frac{\partial T}{\partial n} = \sum_{d \in \{x, y\}} \frac{\partial T}{\partial d}\, n_d.$$

The energy gradient at the boundary can then be computed as,

$$\frac{\partial e_b}{\partial d} = \frac{\partial e}{\partial d} - \frac{\partial T}{\partial n}\, n_d, \qquad d \in \{x, y\}.$$

The remaining gradients at the boundary are extrapolated so that the solution gra-

dient in 2D becomes,

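$$Q^b_d = \begin{bmatrix} \partial \rho / \partial d \\ \partial (\rho u) / \partial d \\ \partial (\rho v) / \partial d \\ \partial e_b / \partial d \end{bmatrix}, \qquad d \in \{x, y\}. \tag{A.8}$$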

In 3D, the same procedure is used and the solution gradient becomes,

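$$Q^b_d = \begin{bmatrix} \partial \rho / \partial d \\ \partial (\rho u) / \partial d \\ \partial (\rho v) / \partial d \\ \partial (\rho w) / \partial d \\ \partial e_b / \partial d \end{bmatrix}, \qquad d \in \{x, y, z\}. \tag{A.9}$$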
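The 2D adiabatic gradient correction leading to Equation (A.8) can be sketched as follows. This is an illustrative Python version under stated assumptions: the function names are hypothetical, the gradient is passed as a pair of lists `(Q_x, Q_y)`, and the velocity gradients are recovered from the conservative gradients with the product rule, $\partial u/\partial d = \left(\partial(\rho u)/\partial d - u\,\partial\rho/\partial d\right)/\rho$, which the appendix does not state explicitly.

```python
def temperature_gradient_2d(U, Q):
    """Return [dT/dx, dT/dy] as defined in the text for Section A.2.

    U : [rho, rho*u, rho*v, e]
    Q : (Q_x, Q_y), each a list of gradients of the entries of U
    """
    rho, rho_u, rho_v, e = U
    u, v = rho_u / rho, rho_v / rho
    dT = []
    for drho, drho_u, drho_v, de in Q:
        # Product rule for the velocity gradients (assumption).
        du = (drho_u - u * drho) / rho
        dv = (drho_v - v * drho) / rho
        dT.append(de - drho * e / rho - rho * (u * du + v * dv))
    return dT


def adiabatic_wall_gradient_2d(U, Q, n):
    """Boundary gradient Q^b of Eq. (A.8) for a unit normal n = (nx, ny)."""
    dT = temperature_gradient_2d(U, Q)
    # Temperature derivative normal to the wall.
    dT_dn = sum(dT_d * n_d for dT_d, n_d in zip(dT, n))
    Qb = []
    for (drho, drho_u, drho_v, de), n_d in zip(Q, n):
        # Only the energy gradient is modified; the rest is extrapolated.
        Qb.append([drho, drho_u, drho_v, de - dT_dn * n_d])
    return Qb


if __name__ == "__main__":
    U = [1.0, 1.0, 0.5, 3.0]
    Q = ([0.1, 0.2, 0.3, 0.4], [0.5, 0.6, 0.7, 0.8])
    print(adiabatic_wall_gradient_2d(U, Q, (0.6, 0.8)))
```

A useful check is that re-evaluating the temperature gradient with the corrected energy gradient gives a vanishing normal component, which is the adiabatic condition the correction is meant to enforce.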

A.3 Characteristic Riemann Invariant Far Field

On a far field boundary, Riemann invariants for a one dimensional flow normal to the

boundary are used to determine the solution at the boundary face [46]. First, the

velocity normal to the face and the speed of sound are computed using the extrapolated

solution and the free stream values in 2D,

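$$V^n = u\,n_x + v\,n_y, \qquad V^n_\infty = u_\infty n_x + v_\infty n_y,$$

$$c = \sqrt{\frac{\gamma p}{\rho}}, \qquad c_\infty = \sqrt{\frac{\gamma p_\infty}{\rho_\infty}},$$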

where ∞ denotes freestream values that are set for a specific problem. The Riemann

invariants can then be written as,

$$R = V^n + \frac{2c}{\gamma - 1}, \qquad R_\infty = V^n_\infty - \frac{2c_\infty}{\gamma - 1}.$$

The normal velocity and speed of sound at the boundary are written as,

$$V^n_b = \frac{1}{2}\left(R + R_\infty\right), \qquad c_b = \frac{\gamma - 1}{4}\left(R - R_\infty\right).$$

If $V^n < 0$, the flow is entering the domain and the velocity and entropy at the

boundary are computed as,

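$$u_b = u_\infty + \left(V^n_b - V^n_\infty\right) n_x, \qquad v_b = v_\infty + \left(V^n_b - V^n_\infty\right) n_y, \qquad s_b = \frac{p_\infty}{\rho_\infty^\gamma},$$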

otherwise, the flow is exiting the domain and the velocity and entropy at the boundary

are computed as,

$$u_b = u + \left(V^n_b - V^n\right) n_x, \qquad v_b = v + \left(V^n_b - V^n\right) n_y, \qquad s_b = \frac{p}{\rho^\gamma}.$$

The density and the pressure at the boundary can be computed from the entropy and

speed of sound so that,

$$\rho_b = \left(\frac{1}{\gamma}\frac{c_b^2}{s_b}\right)^{\frac{1}{\gamma - 1}}, \qquad p_b = \frac{1}{\gamma}\rho_b c_b^2.$$

The 2D solution at the boundary can then be computed as,

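$$U^b = \begin{bmatrix} \rho_b \\ \rho_b u_b \\ \rho_b v_b \\ \dfrac{p_b}{\gamma - 1} + \dfrac{1}{2}\rho_b\left(u_b^2 + v_b^2\right) \end{bmatrix}. \tag{A.10}$$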


Following the same procedure, the 3D solution at the boundary is computed as,

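$$U^b = \begin{bmatrix} \rho_b \\ \rho_b u_b \\ \rho_b v_b \\ \rho_b w_b \\ \dfrac{p_b}{\gamma - 1} + \dfrac{1}{2}\rho_b\left(u_b^2 + v_b^2 + w_b^2\right) \end{bmatrix}. \tag{A.11}$$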

The solution gradient at the boundary is the same as the extrapolated gradient,

$Q^b = Q$.
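The sequence of steps in Section A.3 can be collected into a single routine. The Python sketch below follows the 2D formulas leading to Equation (A.10); it is an illustration under stated assumptions rather than the solver's code. The function name `riemann_far_field_state_2d` is hypothetical, the freestream is passed in primitive form `(rho_inf, u_inf, v_inf, p_inf)` for convenience, and the extrapolated pressure is assumed to follow the ideal gas relation.

```python
import math


def riemann_far_field_state_2d(U, freestream, n, gamma=1.4):
    """Boundary state U^b from Riemann invariants, following Eq. (A.10).

    U          : [rho, rho*u, rho*v, e], solution extrapolated to the face
    freestream : (rho_inf, u_inf, v_inf, p_inf), assumed primitive form
    n          : (nx, ny), unit outward normal of the face
    """
    rho, rho_u, rho_v, e = U
    rho_inf, u_inf, v_inf, p_inf = freestream
    nx, ny = n
    u, v = rho_u / rho, rho_v / rho
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))  # ideal gas (assumed)

    # Normal velocities and speeds of sound.
    vn = u * nx + v * ny
    vn_inf = u_inf * nx + v_inf * ny
    c = math.sqrt(gamma * p / rho)
    c_inf = math.sqrt(gamma * p_inf / rho_inf)

    # Outgoing and incoming Riemann invariants.
    R = vn + 2.0 * c / (gamma - 1.0)
    R_inf = vn_inf - 2.0 * c_inf / (gamma - 1.0)
    vn_b = 0.5 * (R + R_inf)
    c_b = 0.25 * (gamma - 1.0) * (R - R_inf)

    if vn < 0.0:
        # Inflow: tangential velocity and entropy from the freestream.
        ub = u_inf + (vn_b - vn_inf) * nx
        vb = v_inf + (vn_b - vn_inf) * ny
        s_b = p_inf / rho_inf ** gamma
    else:
        # Outflow: tangential velocity and entropy from the interior.
        ub = u + (vn_b - vn) * nx
        vb = v + (vn_b - vn) * ny
        s_b = p / rho ** gamma

    rho_b = (c_b * c_b / (gamma * s_b)) ** (1.0 / (gamma - 1.0))
    p_b = rho_b * c_b * c_b / gamma
    e_b = p_b / (gamma - 1.0) + 0.5 * rho_b * (ub * ub + vb * vb)
    return [rho_b, rho_b * ub, rho_b * vb, e_b]
```

A simple consistency check is that when the extrapolated state equals the freestream, the routine returns the freestream state for both inflow and outflow faces.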

Bibliography

[1] 1st International Workshop on High-Order CFD Methods. Problem C1.1.

Inviscid flow through a channel with a smooth bump. http://dept.ku.edu/

~cfdku/hiocfd/case_c1.1.html. Accessed: 2016-05-07.

[2] 4th International Workshop on High-Order Methods. BL1 - Laminar

Joukowski airfoil at Re=1000. https://how4.cenaero.be/content/

bl1-laminar-joukowski-airfoil-re1000. Accessed: 2017-10-11.

[3] M Ainsworth, P Monk, and W Muniz. Dispersive and dissipative properties of

discontinuous Galerkin finite element methods for the second-order wave equa-

tion. Journal of Scientific Computing, 27(1-3):5–40, 2006.

[4] Mark Ainsworth. Discrete dispersion relation for hp-version finite element ap-

proximation at high wave number. SIAM Journal on Numerical Analysis,

42(2):553–575, 2004.

[5] Mark Ainsworth. Dispersive and dissipative behaviour of high order discon-

tinuous Galerkin finite element methods. Journal of Computational Physics,

198(1):106–130, 2004.

[6] Roger Alexander. Diagonally implicit Runge–Kutta methods for stiff ODEs.

SIAM Journal on Numerical Analysis, 14(6):1006–1021, 1977.

[7] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable. Bulletin

of the American mathematical Society, 82(5):711–712, 1976.

[8] Kartikey Asthana. Analysis and design of optimal discontinuous finite element

schemes. PhD thesis, Stanford University, 2016.

[9] Kartikey Asthana and Antony Jameson. High-order flux reconstruction schemes

with minimal dispersion and dissipation. Journal of Scientific Computing, pages

1–32, 2014.

[10] Kartikey Asthana, Manuel R Lopez-Morales, and Antony Jameson. Non-linear

stabilization of high-order flux reconstruction schemes via Fourier-spectral filter-

ing. Journal of Computational Physics, 303:269–294, 2015.

[11] Kartikey Asthana, Jerry Watkins, and Antony Jameson. On the rate of con-

vergence of flux reconstruction for steady-state problems. SIAM Journal on

Numerical Analysis, 54(5):2910–2937, 2016.

[12] Kartikey Asthana, Jerry Watkins, and Antony Jameson. On consistency and

rate of convergence of flux reconstruction for time-dependent problems. Journal

of Computational Physics, 334:367–391, 2017.

[13] F Bassi, A Colombo, C De Bartolo, N Franchina, A Ghidoni, and A Nigro. Inves-

tigation of high-order temporal schemes for the discontinuous Galerkin solution

of the Navier–Stokes equations. In Joint 11th World Congress on Computational

Mechanics, WCCM 2014, the 5th European Conference on Computational Me-

chanics, ECCM 2014 and the 6th European Conference on Computational Fluid

Dynamics, ECFD 2014, pages 5651–5662. International Center for Numerical

Methods in Engineering, 2014.

[14] Francesco Bassi and Stefano Rebay. A high-order accurate discontinuous finite

element method for the numerical solution of the compressible Navier–Stokes

equations. Journal of computational physics, 131(2):267–279, 1997.

[15] Patrice Castonguay. High-order energy stable flux reconstruction schemes for

fluid flow simulations on unstructured grids. PhD thesis, Stanford University,

2012.

[16] Patrice Castonguay, Peter E Vincent, and Antony Jameson. A new class of high-

order energy stable flux reconstruction schemes for triangular elements. Journal

of Scientific Computing, 51(1):224–256, 2012.

[17] Patrice Castonguay, David M Williams, Peter E Vincent, Manuel Lopez, and

Antony Jameson. On the development of a high-order, multi-GPU enabled,

compressible viscous flow solver for mixed unstructured grids. AIAA paper,

3229:2011, 2011.

[18] Christopher A Kennedy and Mark H Carpenter. Additive Runge-Kutta schemes

for convection-diffusion-reaction equations. 2001.

[19] Bernardo Cockburn and Bo Dong. An analysis of the minimal dissipation local

discontinuous Galerkin method for convection–diffusion problems. Journal of

Scientific Computing, 32(2):233–262, 2007.

[20] Bernardo Cockburn, George E Karniadakis, and Chi-Wang Shu. The develop-

ment of discontinuous Galerkin methods. Springer, 2000.

[21] Bernardo Cockburn and Chi-Wang Shu. TVB Runge-Kutta local projection

discontinuous Galerkin finite element method for conservation laws. II. General

framework. Mathematics of computation, 52(186):411–435, 1989.

[22] Bernardo Cockburn and Chi-Wang Shu. The local discontinuous Galerkin

method for time-dependent convection-diffusion systems. SIAM Journal on Nu-

merical Analysis, 35(6):2440–2463, 1998.

[23] Bernardo Cockburn and Chi-Wang Shu. Runge–Kutta discontinuous Galerkin

methods for convection-dominated problems. Journal of scientific computing,

16(3):173–261, 2001.

[24] Christopher Cox, Chunlei Liang, and Michael W Plesniak. A high-order method

for solving unsteady incompressible Navier-Stokes equations with implicit time

stepping on unstructured grids. In 53rd AIAA Aerospace Sciences Meeting, page

0830, 2015.

[25] Christopher Cox, Chunlei Liang, and Michael W Plesniak. A high-order solver

for unsteady incompressible Navier–Stokes equations using the flux reconstruc-

tion method on unstructured grids with implicit dual time stepping. Journal of

Computational Physics, 314:414–435, 2016.

[26] Laslo T Diosady and Scott M Murman. Tensor-product preconditioners for

higher-order space–time discontinuous Galerkin methods. Journal of Computa-

tional Physics, 330:296–318, 2017.

[27] Richard S Falk. Analysis of finite element methods for linear hyperbolic problems.

In Discontinuous Galerkin Methods, pages 103–112. Springer, 2000.

[28] Krzysztof J Fidkowski, Todd A Oliver, James Lu, and David L Darmo-

fal. p-multigrid solution of high-order discontinuous Galerkin discretizations of

the compressible Navier–Stokes equations. Journal of Computational Physics,

207(1):92–113, 2005.

[29] Marshall C. Galbraith and Carl Ollivier-Gooch. BI2 - smooth bump, BL1 -

laminar airfoil & BR1 - turbulent airfoil. https://how4.cenaero.be/system/

files/filedepot/12/BI2_BL1_BR1_Summary_Galbraith.pdf. Accessed: 2017-

10-11.

[30] Haiyang Gao and ZJ Wang. A conservative correction procedure via reconstruc-

tion formulation with the chain-rule divergence evaluation. Journal of Compu-

tational Physics, 232(1):7–13, 2013.

[31] Michael B Giles and Endre Suli. Adjoint methods for PDEs: a posteriori error

analysis and postprocessing by duality. Acta numerica, 11:145–236, 2002.

[32] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU

Press, 2012.

[33] Gael Guennebaud, Benoit Jacob, et al. Eigen: a C++ linear algebra library.

http://eigen.tuxfamily.org. Accessed: 2017-05-7.

[34] Ernst Hairer and Gerhard Wanner. Solving ordinary differential equations. II,

volume 14 of Springer series in computational mathematics, 1996.

[35] Jan S Hesthaven and Tim Warburton. Nodal discontinuous Galerkin methods:

algorithms, analysis, and applications, volume 54. Springer Science & Business

Media, 2007.

[36] Malte Hoffmann, Claus-Dieter Munz, and ZJ Wang. Efficient implementation

of the CPR formulation for the Navier–Stokes equations on GPUs. In Seventh

International Conference on Computational Fluid Dynamics (ICCFD7), 2012.

[37] Fang Q Hu and Harold L Atkins. Eigensolution analysis of the discontinuous

Galerkin method with nonuniform grids: I. one space dimension. Journal of

Computational Physics, 182(2):516–545, 2002.

[38] Fang Q Hu and Harold L Atkins. Two-dimensional wave analysis of the discon-

tinuous Galerkin method with non-uniform grids and boundary conditions. In

Proceedings of the 8th AIAA/CEAS Aeroacoustics Conference, 2002.

[39] Fang Q Hu, MY Hussaini, and Patrick Rasetarinera. An analysis of the discontin-

uous Galerkin method for wave propagation problems. Journal of Computational

Physics, 151(2):921–946, 1999.

[40] Thomas JR Hughes. The finite element method: linear static and dynamic finite

element analysis. Courier Corporation, 2012.

[41] HT Huynh. A flux reconstruction approach to high-order schemes including

discontinuous Galerkin methods. AIAA paper, 4079:2007, 2007.

[42] Hung T Huynh. A reconstruction approach to high-order schemes including

discontinuous Galerkin for diffusion. AIAA paper, 403:2009, 2009.

[43] Wolfram Research, Inc. Mathematica, Version 11.2. Champaign, IL, 2017.

[44] Antony Jameson. Time dependent calculations using multigrid, with applications

to unsteady flows past airfoils and wings. AIAA paper, 1596:1991, 1991.

[45] Antony Jameson. Application of dual time stepping to fully implicit Runge

Kutta schemes for unsteady flow calculations. In 22nd AIAA Computational

Fluid Dynamics Conference, page 2753, 2015.

[46] Antony Jameson and Timothy Baker. Solution of the Euler equations for complex

configurations. In 6th Computational Fluid Dynamics Conference Danvers, page

1929, 1983.

[47] Antony Jameson, Peter E Vincent, and Patrice Castonguay. On the non-

linear stability of flux reconstruction schemes. Journal of Scientific Computing,

50(2):434–445, 2012.

[48] Claes Johnson and Juhani Pitkaranta. An analysis of the discontinuous Galerkin

method for a scalar hyperbolic equation. Mathematics of computation, 46(173):1–

26, 1986.

[49] Carl Timothy Kelley and David E Keyes. Convergence analysis of pseudo-

transient continuation. SIAM Journal on Numerical Analysis, 35(2):508–523,

1998.

[50] Christopher A Kennedy, Mark H Carpenter, and R Michael Lewis. Low-storage,

explicit Runge–Kutta schemes for the compressible Navier–Stokes equations. Ap-

plied numerical mathematics, 35(3):177–219, 2000.

[51] Andreas Klockner, Tim Warburton, Jeff Bridge, and Jan S Hesthaven. Nodal dis-

continuous Galerkin methods on graphics processors. Journal of Computational

Physics, 228(21):7863–7882, 2009.

[52] David A Kopriva and John H Kolias. A conservative staggered-grid Chebyshev

multidomain method for compressible flow. Technical report, DTIC Document,

1995.

[53] P Lesaint and PA Raviart. On a finite element method for solving the neutron

transport equation. In Mathematical Aspects of Finite Elements in Partial

Differential Equations, Proceedings of a Symposium Conducted by the Mathemat-

ics Research Center, the University of Wisconsin-Madison, Madison, WI, USA,

pages 1–3, 1974.

[54] Sanjiva K Lele. Compact finite difference schemes with spectral-like resolution.

Journal of Computational Physics, 103(1):16–42, 1992.

[55] P Lesaint and Pierre-Arnaud Raviart. On a finite element method for solving the

neutron transport equation. Mathematical aspects of finite elements in partial

differential equations, (33):89–123, 1974.

[56] Chunlei Liang, Ravi Kannan, and ZJ Wang. A p-multigrid spectral difference

method with explicit and implicit smoothers on unstructured triangular grids.

Computers & fluids, 38(2):254–265, 2009.

[57] Yen Liu, Marcel Vinokur, and ZJ Wang. Spectral difference method for unstruc-

tured grids I: basic formulation. Journal of Computational Physics, 216(2):780–

801, 2006.

[58] M Lopez-Morales, Jonathan Bull, Jacob Crabill, Thomas D Economon, David

Manosalvas, Joshua Romero, Abhishek Sheshadri, JE Watkins, David Williams,

Francisco Palacios, et al. Verification and validation of HiFiLES: a high-order

LES unstructured solver on multi-GPU platforms. In 32nd AIAA applied aero-

dynamics conference, Atlanta, Georgia, USA, pages 16–20, 2014.

[59] R. W. MacCormack. Current status of numerical solutions of the Navier-Stokes

equations. AIAA paper, 32:1985, 1985.

[60] Robert William MacCormack. A numerical method for solving the equations of

compressible viscous flow. AIAA journal, 20(9):1275–1281, 1982.

[61] Gianmarco Mengaldo, Daniele De Grazia, J Peiro, Antony Farrington, F With-

erden, PE Vincent, and SJ Sherwin. A guide to the implementation of boundary

conditions in compact high-order methods for compressible aerodynamics. In

7th AIAA Theoretical Fluid Mechanics Conference, AIAA Aviation, American

Institute of Aeronautics and Astronautics, 2014.

[62] RC Moura, SJ Sherwin, and J Peiro. Linear dispersion–diffusion analysis and

its application to under-resolved turbulence simulations using discontinuous

Galerkin spectral/hp methods. Journal of Computational Physics, 298:695–710,

2015.

[63] Cristian R Nastase and Dimitri J Mavriplis. High-order discontinuous Galerkin

methods using an hp-multigrid approach. Journal of Computational Physics,

213(1):330–357, 2006.

[64] A Nigro, A Ghidoni, Stefano Rebay, and Francesco Bassi. Modified extended

BDF scheme for the discontinuous Galerkin solution of unsteady compressible

flows. International Journal for Numerical Methods in Fluids, 76(9):549–574,

2014.

[65] NVIDIA. nvprof. https://docs.nvidia.com/cuda/profiler-users-guide/.

Accessed: 2017-11-09.

[66] NVIDIA. CUBLAS library. https://developer.nvidia.com/cublas. Ac-

cessed: 2016-05-23.

[67] Will Pazner and Per-Olof Persson. High-order DNS and LES simulations us-

ing an implicit tensor-product discontinuous Galerkin method. In 23rd AIAA

Computational Fluid Dynamics Conference, page 3948, 2017.

[68] Will Pazner and Per-Olof Persson. Stage-parallel fully implicit Runge–Kutta

solvers for discontinuous Galerkin fluid simulations. Journal of Computational

Physics, 335:700–717, 2017.

[69] Will Pazner and Per-Olof Persson. Approximate tensor-product preconditioners

for very high order discontinuous Galerkin methods. Journal of Computational

Physics, 354:344–369, 2018.

[70] P-O Persson and Jaime Peraire. Newton-GMRES preconditioning for discontin-

uous Galerkin discretizations of the Navier–Stokes equations. SIAM Journal on

Scientific Computing, 30(6):2709–2733, 2008.

[71] Per-Olof Persson. High-order LES simulations using implicit-explicit Runge-

Kutta schemes. In Proceedings of the 49th AIAA Aerospace Sciences Meeting

and Exhibit, AIAA, volume 684, 2011.

[72] Per-Olof Persson. A sparse and high-order accurate line-based discontinuous

Galerkin method for unstructured meshes. Journal of Computational Physics,

233:414–429, 2013.

[73] Niles A Pierce and Michael B Giles. Adjoint recovery of superconvergent func-

tionals from PDE approximations. SIAM review, 42(2):247–264, 2000.

[74] Roldon Pozo. Template Numerical Toolkit. http://math.nist.gov/tnt/. Ac-

cessed: 2016-05-23.

[75] Patrick Rasetarinera and MY Hussaini. An efficient implicit discontinuous spec-

tral Galerkin method. Journal of Computational Physics, 172(2):718–738, 2001.

[76] Wm H Reed and TR Hill. Triangular mesh methods for the neutron transport

equation. Los Alamos Report LA-UR-73-479, 1973.

[77] J Romero, K Asthana, and A Jameson. A simplified formulation of the flux

reconstruction method. Journal of Scientific Computing, pages 1–24, 2015.

[78] J Romero, FD Witherden, and A Jameson. A direct flux reconstruction scheme

for advection–diffusion problems on triangular grids. Journal of Scientific Com-

puting, pages 1–30, 2017.

[79] Joshua Romero. On the development of the direct flux reconstruction scheme for

high-order fluid flow simulations. PhD thesis, Stanford University, 2018.

[80] Joshua Romero and Antony Jameson. Extension of the flux reconstruction

method to triangular elements using collapsed-edge quadrilaterals. In 54th AIAA

Aerospace Sciences Meeting, page 1825, 2016.

[81] T Strouboulis and JT Oden. A posteriori error estimation of finite element

approximations in fluid mechanics. Computer methods in applied mechanics and

engineering, 78(2):201–242, 1990.

[82] Endre Suli. A posteriori error analysis and adaptivity for finite element approx-

imations of hyperbolic problems. In An introduction to recent developments in

theory and numerics for conservation laws, pages 123–194. Springer, 1999.

[83] Yuzhi Sun, ZJ Wang, and Yen Liu. Efficient implicit non-linear LU-SGS approach

for compressible flow computation using high-order spectral difference method.

Communications in Computational Physics, 5(2-4):760–778, 2009.

[84] Thomas Toulorge and Wim Desmet. CFL conditions for Runge–Kutta discontin-

uous Galerkin methods on triangular grids. Journal of Computational Physics,

230(12):4657–4678, 2011.

[85] Jacobus JW van der Vegt and H Van der Ven. Space–time discontinuous Galerkin

finite element method with dynamic grid motion for inviscid compressible flows:

I. General formulation. Journal of Computational Physics, 182(2):546–585, 2002.

[86] John C Vassberg and Antony Jameson. In pursuit of grid convergence for two-

dimensional Euler solutions. Journal of Aircraft, 47(4):1152–1166, 2010.

[87] Peter E Vincent, Patrice Castonguay, and Antony Jameson. Insights from von

Neumann analysis of high-order flux reconstruction schemes. Journal of Com-

putational Physics, 230(22):8134–8154, 2011.

[88] Peter E Vincent, Patrice Castonguay, and Antony Jameson. A new class of high-

order energy stable flux reconstruction schemes. Journal of Scientific Computing,

47(1):50–72, 2011.

[89] Li Wang and Dimitri J Mavriplis. Implicit solution of the unsteady Euler equa-

tions for high-order accurate discontinuous Galerkin discretizations. Journal of

Computational Physics, 225(2):1994–2015, 2007.

[90] ZJ Wang, Krzysztof Fidkowski, Remi Abgrall, Francesco Bassi, Doru Caraeni,

Andrew Cary, Herman Deconinck, Ralf Hartmann, Koen Hillewaert, HT Huynh,

et al. High-order CFD methods: current status and perspective. International

Journal for Numerical Methods in Fluids, 72(8):811–845, 2013.

[91] ZJ Wang and Haiyang Gao. A unifying lifting collocation penalty formulation in-

cluding the discontinuous Galerkin, spectral volume/difference methods for con-

servation laws on mixed grids. Journal of Computational Physics, 228(21):8161–

8186, 2009.

[92] Jerry Watkins. Navier-Stokes Jacobians. https://github.com/jewatkins/

navier-stokes-jacobians, 2017. Accessed: 2017-11-11.

[93] Jerry Watkins, Kartikey Asthana, and Antony Jameson. A numerical analy-

sis of the nodal discontinuous Galerkin scheme via flux reconstruction for the

advection-diffusion equation. Computers & Fluids, 139:233–247, 2016.

[94] Jerry Watkins, Joshua Romero, and Antony Jameson. Multi-GPU, implicit time

stepping for high-order methods on unstructured grids. In 46th AIAA Fluid

Dynamics Conference, page 3965, 2016.

[95] DM Williams, Patrice Castonguay, Peter E Vincent, and Antony Jameson. En-

ergy stable flux reconstruction schemes for advection–diffusion problems on tri-

angles. Journal of Computational Physics, 250:53–76, 2013.

[96] DM Williams and A Jameson. Energy stable flux reconstruction schemes for

advection–diffusion problems on tetrahedra. Journal of Scientific Computing,

59(3):721–759, 2014.

[97] Freddie D Witherden, Antony M Farrington, and Peter E Vincent. PyFR: An

open source framework for solving advection–diffusion type problems on stream-

ing architectures using the flux reconstruction approach. Computer Physics Com-

munications, 185(11):3028–3040, 2014.

[98] Freddie David Witherden. On the Development and Implementation of High-

Order Flux Reconstruction Schemes for Computational Fluid Dynamics. PhD

thesis, Imperial College London, 2015.
