NUMERICAL ANALYSIS AND IMPLICIT TIME STEPPING FOR
HIGH-ORDER, FLUID FLOW SIMULATIONS ON
GPU ARCHITECTURES
A DISSERTATION
SUBMITTED TO THE DEPARTMENT OF
AERONAUTICS AND ASTRONAUTICS
AND THE COMMITTEE ON GRADUATE STUDIES
OF STANFORD UNIVERSITY
IN PARTIAL FULFILLMENT OF THE REQUIREMENTS
FOR THE DEGREE OF
DOCTOR OF PHILOSOPHY
Jerry Watkins
December 2017
http://creativecommons.org/licenses/by-nc/3.0/us/
This dissertation is online at: http://purl.stanford.edu/xw593rh0121
© 2017 by Jerry Enrique Watkins, II. All Rights Reserved.
Re-distributed by Stanford University under license with the author.
This work is licensed under a Creative Commons Attribution-Noncommercial 3.0 United States License.
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Antony Jameson, Primary Adviser
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Juan Alonso
I certify that I have read this dissertation and that, in my opinion, it is fully adequate in scope and quality as a dissertation for the degree of Doctor of Philosophy.
Sanjiva Lele
Approved for the Stanford University Committee on Graduate Studies.
Patricia J. Gumport, Vice Provost for Graduate Education
This signature page was generated electronically upon submission of this dissertation in electronic format. An original signed hard copy of the signature page is on file in University Archives.
Abstract
High-order discontinuous finite element methods are becoming increasingly popular
for simulations of vortex dominated flows over complex geometries because they
are inherently less dissipative than traditional second-order finite volume methods.
In particular, the Flux Reconstruction (FR) approach has gained popularity because
it not only offers less dissipation but is also eminently parallelizable on multi-core
processors and accelerators. An extensive amount of research has been performed
for accelerated explicit methods for FR but more realistic simulations often require
high aspect ratio meshes where the maximum stable time step or CFL condition is
determined by the smallest cell. This can lead to significant limitations in explicit
time integration.
In this dissertation, eigensolution analysis is performed to study the effect of
these limitations on the stability, dissipation and dispersion properties of the nodal
Discontinuous Galerkin (DG) scheme via FR for advection-diffusion. It is shown that
the CFL condition for advection-diffusion is stricter than that for pure-advection
or pure-diffusion individually and a suitable estimate for the maximum stable time
step is proposed. The CFL condition is strongly influenced by the choice of interface
fluxes and, in general, the condition for a scheme using centered values is much higher
than that for a scheme using one-sided values. It is also shown that schemes with centered
interface values produce less error for well resolved solutions while schemes with one-
sided interface values produce less error for solutions that are under-resolved. These
results are verified for one- and two-dimensional advection-diffusion of an approximate
Gaussian and two-dimensional Couette flow.
In addition to eigensolution analysis, a multi-GPU, implicit time stepping method
is developed, implemented and tested in order to study the feasibility of an implicit
scheme on modern hardware. It is shown that a nonlinear solver can be constructed
for the Direct Flux Reconstruction (DFR) method which maintains element-locality
and high arithmetic intensity with minimal communication costs on modern GPU
clusters. Analytical element local Jacobians for the DFR method are derived and a
Kronecker product formulation is used to reduce the time complexity from $O(P^{3d-1})$
to $O(P^{d+1})$ for Euler simulations and $O(P^{3d})$ to $O(P^{d+2})$ for Navier-Stokes where
$P$ is the degree of the Lagrange basis polynomials and $d$ is the spatial dimension.
Numerical results are obtained for the inviscid bump, inviscid NACA0012 airfoil,
isentropic vortex, laminar Joukowski airfoil and three-dimensional half cylinder with
a Reynolds number of 1000.
Acknowledgments
As a student who has benefited greatly from the financial support of grants, scholar-
ships and fellowships throughout my university education, I feel it is only fitting that
I begin by acknowledging the Rose M. Chappelear Memorial Fund and the National
Science Foundation Graduate Research Fellowship Program for supporting me finan-
cially during my graduate level education. I could not have completed my education
without their support and I consider myself very fortunate to have been selected
among the many applicants. For this, I am very grateful.
There are many people who have made my experience at Stanford a memorable
one, but my advisor, Professor Antony Jameson, stands out as someone who has
strongly influenced my academic development. As an advisor, he has always given
me the freedom to choose my own research path and has always been available to
provide valuable feedback. This has given me the opportunity to grow as both an
independent thinker and a researcher. His level of expertise in a wide variety of
subjects continues to fascinate me and I often enjoyed our conversations. It has been
a privilege to have had him as a mentor and I am very grateful for his support.
I would also like to express my gratitude to Professor Juan J. Alonso and Profes-
sor Sanjiva Lele for being a part of my dissertation reading committee and providing
valuable feedback towards the improvement of my dissertation. My gratitude also
extends to Professor Matthias Ihme and Professor Michael Saunders for participat-
ing in my oral examination and providing insightful comments about my research.
I would also like to thank Professor Margot Gerritsen for allowing me to perform
my numerical experiments on the XStream GPU cluster, supported by the National
Science Foundation Major Research Instrumentation program (ACI-1429830).
I’m extremely grateful to my colleagues in the Aerospace Computing Laboratory
who have made my academic experience very enjoyable. First, I’d like to thank David
Williams for teaching me the fundamentals of high-order methods during my first year
of graduate school. I’d also like to thank Kartikey Asthana for being an exceptional
co-author and friend. His expertise in numerical analysis is matched only by his skill
with a guitar, and my experience at Stanford would not have been the same without
the late night jam sessions with him, Akshay and Zach. I’d like to give a special
thanks to Joshua Romero for leading the effort in developing ZEFR, the software
used for the second part of this dissertation, and for co-authoring our first paper
on implicit time stepping. I’ve benefited immensely from his expertise on scientific
computing. I’m also grateful to Jacob Crabill for his help with research over the past
few years and Freddie Witherden for providing numerical expertise and exceptional
feedback as a mentor. Freddie has been an invaluable asset and I’ve learned a great
deal about high-order methods and high performance computing from him. I would
also like to thank David Manosalvas, Manuel Lopez, Abhishek Sheshadri, Jonathan
Chiew and Jonathan Bull for their insight and support throughout my studies.
I’d like to acknowledge my mentor at Sandia National Laboratories, Irina Tezaur,
for being extremely supportive during my last year in graduate school. While I was a Sandia
intern, she gave me a comprehensive introduction to Sandia National Laboratories
and I have gained valuable experience which has helped me complete my degree.
My friends and family have also been overwhelmingly supportive during my time in
graduate school. I’d like to thank Kristopher Bravo, Matthew Rutledge, Joshua Day,
John Bitcon, Ashley Rutledge, Nicole Siminski-Day, Alan Gamage, Bobby Kianmajd,
Christina Lee and Jade Ontivmaemura for always being great friends and providing
unconditional support. I’m grateful for having such a loving family who has always
been there for me. I want to thank my father, Andy Watkins, for teaching me the
value of hard work, my mother, Petra Watkins, for teaching me the importance of
generosity, my siblings, Andrew Watkins and Xiomara Watkins-Breschi, for teaching
me to follow my dreams and my brother-in-law, Scott Breschi, for providing a positive
outlook on life. I also want to acknowledge my niece and nephew, Camila and Drew
Watkins, in the hope that this might inspire them to pursue a higher education.
Lastly, this dissertation is dedicated to my loving girlfriend, Diana Portillo, who
has stood by my side throughout my entire graduate studies. I’m sincerely grateful
for having her in my life. Her love and support have made this work possible.
Preface
Fluid motion is a complex physical phenomenon that can be found in a variety of
different applications ranging from the analysis of blood flow in the cardiovascular
system to the supersonic flight of aircraft. The primary tool used to analyze this phe-
nomenon is computational fluid dynamics (CFD). The modern scientist or engineer
relies heavily on CFD to obtain an accurate representation of fluid motion. However,
current generation CFD software deployed in industry is not capable of accurately
predicting transient, highly separated flows over complex geometries. Examples of
such flows include high-resolution wingtip vortices from an aircraft during take-off or
the complex vortex structure generated by the blades of a rotorcraft.
These high-fidelity simulations often require a massive amount of memory and
computation to provide sufficient resolution. This often requires the use of large high-
performance computing clusters which have undergone a dramatic change over the
past several years. Memory constrained accelerators characterized by an abundance
of computational power relative to memory bandwidth are now commonplace and
many CFD codes are now having trouble scaling on modern heterogeneous clusters.
This problem will only become worse as pre-exascale systems come online over the
next few years. The design of scalable CFD software will need to be completely
reconsidered in order to balance cheap arithmetic and expensive memory fetches with
limitations on the amount of high-bandwidth memory.
There are two primary reasons why current generation CFD software is unable to
adequately resolve complex flow structures on large scale computing clusters. First,
numerical methods, typically second-order accurate finite volume schemes, have a tendency
to be overly dissipative. Hence, they require an excessive amount of resolution
in order to successfully track complex flow features over time. Second, the methods
themselves are not well suited to the requirements of modern hardware platforms
which typically include coprocessors and Graphics Processing Units (GPUs). These
limitations have prevented scientists and engineers from conducting the types of high
fidelity simulations that accurately resolve the aforementioned phenomena.
The primary motivation of this dissertation has been to address these issues
through the development of high-order methods. High-order methods refer to a col-
lection of numerical schemes whose spatial accuracy is at least third order. These
methods can reduce numerical dissipation and improve the accuracy of a simulation
at a reduced computational cost compared to traditional second-order methods [90].
They also tend to have more computational work per degree of freedom which means
they benefit more from the high computational throughput on modern hardware.
Discontinuous Galerkin (DG) methods [76, 55, 20] have been the focal point of
recent efforts in developing high-order CFD codes which solve the Navier-Stokes equa-
tions because of their ability to simulate flows around complex geometries. Similar
to continuous finite element methods, DG methods utilize basis and test functions to
obtain high-order accuracy. However, since the solution is allowed to be discontinuous
at element interfaces, they are able to maintain conservation through a methodology
similar to finite volume methods. Methods of particular interest include collocation
based nodal DG [35] and spectral difference (SD) [52, 57] methods which have been
widely adopted because of their simplicity and efficiency.
In 2007, Huynh [41] proposed a Flux Reconstruction (FR) approach for tensor-
product elements that provides a generalized differential framework for recovering
both the nodal DG and SD schemes for linear advection. In 2009, he further extended
this framework to diffusion problems [42]. Since then, a proof of linear stability for
a specific class of FR schemes called the energy stable FR (ESFR) schemes [88] has
been developed along with theory on non-linear stability and the role of aliasing errors
[47, 10]. These schemes have also been successfully extended to triangular [16, 95]
and tetrahedral [96] elements which further improved their capabilities on complex
geometry. Other frameworks such as the Correction Procedure via Reconstruction
(CPR) [30] have been proposed that unify the FR and the Lifting Collocation Penalty
(LCP) [91] formulations. Recently, the direct Flux Reconstruction (DFR) method
has been developed as a simplified formulation of the FR method that reduces the
theoretical and implementation complexity of the FR method [77].
Along with high-order accuracy, which serves to reduce numerical dissipation, the
FR approach allows for the formulation of an explicit semi-discrete equation with
a majority of operations being element-local. This is distinct from a typical finite
element method which requires the assembly of a global matrix. Within the context
of an explicit time stepping method, the FR approach is highly parallelizable on
modern architectures because it is able to make efficient use of fast-access memory
when performing element local operations.
Thus far, the majority of research in scalable methods on modern architectures
using FR has focused on explicit time stepping on GPUs. For example, in 2009,
Klockner et al. [51] demonstrated that a consumer graphics processor could be used to
accelerate a three-dimensional, unstructured nodal DG solver for Maxwell’s equations.
Shortly after, Castonguay et al. [17] developed a three-dimensional, unstructured
FR solver for the Navier-Stokes equations on GPU architectures. This solver was
then extended for Large Eddy Simulations (LES) and the Reynolds Averaged Navier-
Stokes (RANS) equations and released under an open source license using the name
HiFiLES [58]. The most successful implementation of the FR schemes on GPUs has
been PyFR [97] which has seen widespread success due to its Python based framework
which allows the code to be executed on a range of hardware platforms.
Accelerated explicit methods have been researched extensively for flow simula-
tions with low to moderate Reynolds number but more realistic applications in the
aerospace field are high Reynolds number flows with thin boundary layers. These
types of flows often require high aspect ratio meshes where the maximum stable time
step or CFL condition is determined by the smallest cell. This can lead to signif-
icant limitations in explicit time integration because the time step requirement for
numerical stability may be much smaller than the time step needed for accuracy.
Often, a reliable estimate for the maximum stable time step is not available and
researchers must rely on empirically determined estimates or a step size controller
[34]. In this situation, the solution can become unstable and large scale simulations
can become unfeasible. Unsteady solutions are often generated using explicit schemes
near their numerical stability limit so understanding the numerical properties of the
scheme near this limit becomes increasingly important.
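To illustrate how the smallest cell controls an explicit time step, the following is a minimal sketch in Python. It uses the textbook advective scaling dt <= C * h_min / |a| and not the estimate proposed in this dissertation. The function name, the constant C (the cfl argument) and the wall-refined mesh are assumptions chosen purely for illustration.

```python
def explicit_step_limit(cell_sizes, wave_speed, cfl=1.0):
    """Textbook advective limit: dt <= cfl * h_min / |a| (illustrative only)."""
    h_min = min(cell_sizes)
    return cfl * h_min / abs(wave_speed)


# Hypothetical boundary-layer mesh: cell size halves ten times toward a wall.
cells = [0.1 * 0.5**k for k in range(10)]

# Step limit set by the smallest cell versus the coarsest cell alone.
dt_fine = explicit_step_limit(cells, wave_speed=1.0)
dt_coarse = explicit_step_limit(cells[:1], wave_speed=1.0)

# Factor by which the number of explicit steps grows due to the refinement.
steps_ratio = dt_coarse / dt_fine
print(dt_fine, steps_ratio)
```

For this hypothetical mesh the refined cells force about 512 times as many explicit steps as the coarsest cell alone would require, even though the flow features of interest may evolve on the coarser time scale.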
Implicit methods allow for larger time steps and can enable a time step to be
chosen based on the desired accuracy rather than numerical stability. However, an
implicit method often requires the construction of a global matrix which can be
prohibitively expensive for higher polynomial orders. The high memory requirements
and low arithmetic intensity of constructing a global matrix are often what make
implicit methods unfeasible for modern hardware. This also makes it difficult to
compete with explicit methods on large GPU clusters.
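The trade-off between explicit and implicit time stepping can be seen on the scalar model problem du/dt = -lam * u. The sketch below compares forward and backward Euler, which are stand-ins chosen for brevity and are not the Runge-Kutta schemes studied in this work. The decay rate lam and the step size dt are assumed values.

```python
def forward_euler(u0, lam, dt, n_steps):
    """Explicit update u <- u + dt * f(u); stable only if dt <= 2 / lam."""
    u = u0
    for _ in range(n_steps):
        u = u + dt * (-lam * u)
    return u


def backward_euler(u0, lam, dt, n_steps):
    """Implicit update: solve u_new = u + dt * f(u_new) for u_new.

    For this linear scalar problem the solve reduces to a division; for a
    discretized PDE it becomes a (non)linear system at every step.
    """
    u = u0
    for _ in range(n_steps):
        u = u / (1.0 + lam * dt)
    return u


lam = 1000.0  # assumed stiff decay rate
dt = 0.01     # five times larger than the explicit limit 2 / lam = 0.002

u_explicit = forward_euler(1.0, lam, dt, 20)
u_implicit = backward_euler(1.0, lam, dt, 20)
print(u_explicit, u_implicit)
```

With this step size the explicit solution grows without bound while the implicit solution decays, as the exact solution does. The price is the solve inside each implicit step, which is the cost that a global matrix makes prohibitive at high polynomial order.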
These major limitations on explicit and implicit time stepping methods must be
addressed before reliable simulations of high Reynolds number flow over aircraft or
rotor blades can be accomplished with high-order methods on GPU architectures.
This dissertation focuses on addressing these issues in two parts.
In Part I, an eigensolution analysis of the FR formulation is performed on the linear
advection-diffusion equation to investigate the stability, dissipation and dispersion
properties associated with the nodal DG scheme for explicit time integration. Some
of the major contributions in Part I include:
• A verification that the CFL condition for advection-diffusion is stricter than
that for pure-advection and pure-diffusion individually,
• A maximum stable time step estimate for the linear advection-diffusion and
Navier-Stokes equations on unstructured, tensor product elements,
• An analysis showing that schemes with centered interface values produce less
error for well resolved solutions while schemes with one-sided interface values
produce less error for solutions that are under-resolved,
• A verification that the CFL condition for schemes with centered interface values
is much higher than that for schemes with one-sided interface values.
In Part II, an implicit time stepping method for GPU architectures is designed,
implemented and tested in order to study the feasibility of an implicit method on
modern hardware. The major contributions in Part II include:
• An analysis showing that a high-order, implicit time stepping method can be
implemented efficiently on multi-GPU architectures,
• A reduction in the time complexity of computing analytical element local Jaco-
bians for advection-diffusion, Euler and the Navier-Stokes equations.
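The reason a Kronecker product structure lowers the operation count can be sketched with the generic identity (A ⊗ B) vec(X) = vec(A X B^T). The Python example below is not the Jacobian construction derived in Part II. It only compares applying a dense Kronecker matrix with applying its two small factors in sequence, using hypothetical helper names and small integer matrices.

```python
def matmul(A, B):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]


def transpose(A):
    return [list(col) for col in zip(*A)]


def kron(A, B):
    """Dense Kronecker product; entry ((i, j), (k, l)) equals A[i][k] * B[j][l]."""
    return [[a * b for a in row_a for b in row_b]
            for row_a in A for row_b in B]


def dense_apply(A, B, x):
    """Form the full (m n) x (m n) matrix, then multiply: (m n)^2 operations."""
    K = kron(A, B)
    return [sum(K_rc * x_c for K_rc, x_c in zip(row, x)) for row in K]


def factored_apply(A, B, x):
    """Apply A and B one after the other: m^2 n + m n^2 operations."""
    m, n = len(A), len(B)
    X = [x[i * n:(i + 1) * n] for i in range(m)]  # row-major reshape to m x n
    Y = matmul(matmul(A, X), transpose(B))        # Y = A X B^T
    return [y for row in Y for y in row]          # flatten back to a vector


A = [[1, 2], [3, 4]]
B = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
x = [1, 2, 3, 4, 5, 6]

y_dense = dense_apply(A, B, x)
y_factored = factored_apply(A, B, x)
print(y_dense == y_factored)
```

With m = n = P + 1 the dense route costs on the order of P^4 multiplications while the factored route costs on the order of P^3, which is the kind of saving that a tensor-product structure makes available.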
Contents
Abstract iv
Acknowledgments vi
Preface ix
I Numerical Analysis for Advection-Diffusion 1
1 Introduction 2
2 Eigensolution Analysis 6
2.1 Eigensolution Analysis for 1D Advection-Diffusion . . . . . . . . . . . 6
2.1.1 Problem specification . . . . . . . . . . . . . . . . . . . . . . . 7
2.1.2 The Flux Reconstruction method . . . . . . . . . . . . . . . . 8
2.1.3 Analytical solution of the semi-discrete and exact equations . 11
2.1.4 Stability, dissipation, dispersion and modal weights . . . . . . 12
2.1.5 Dissipation and dispersion error of nodal DG for advection-
diffusion . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 13
2.2 Spectral Comparison . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.2.1 Relative error and resolving efficiency . . . . . . . . . . . . . . 17
2.2.2 Spectral comparison of nodal DG with different interface fluxes 19
3 CFL Restrictions and Time Step Estimates 21
3.1 Time Integration and CFL Restrictions . . . . . . . . . . . . . . . . . 21
3.1.1 Multi-stage explicit time integration scheme . . . . . . . . . . 22
3.1.2 Stability Criteria . . . . . . . . . . . . . . . . . . . . . . . . . 22
3.1.3 CFL restrictions on advection-diffusion . . . . . . . . . . . . . 23
3.1.4 CFL restrictions of nodal DG with different interface fluxes . . 23
3.2 Time Step Estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
3.2.1 Conservative prediction method . . . . . . . . . . . . . . . . . 25
3.2.2 Extension to tensor product elements . . . . . . . . . . . . . . 27
3.2.3 Extension to Navier-Stokes equations . . . . . . . . . . . . . . 29
4 Numerical Experiments 31
4.1 1D Advection-Diffusion of an approximate Gaussian . . . . . . . . . . 32
4.2 2D Advection-Diffusion of an approximate Gaussian . . . . . . . . . . 35
4.3 2D Couette Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
II Implicit Time Stepping on GPU Architectures 42
5 Introduction 43
6 The Direct Flux Reconstruction Method 45
6.1 One-Dimensional Formulation for Advection-Diffusion . . . . . . . . . 46
6.1.1 Problem Specification . . . . . . . . . . . . . . . . . . . . . . . 46
6.1.2 Direct Flux Reconstruction . . . . . . . . . . . . . . . . . . . 47
6.2 Two-Dimensional Formulation for Advection-Diffusion . . . . . . . . 52
6.2.1 Problem Specification . . . . . . . . . . . . . . . . . . . . . . . 52
6.2.2 Direct Flux Reconstruction . . . . . . . . . . . . . . . . . . . 54
6.3 Three-Dimensional Formulation for Advection-Diffusion . . . . . . . . 61
6.3.1 Problem Specification . . . . . . . . . . . . . . . . . . . . . . . 61
6.3.2 Direct Flux Reconstruction . . . . . . . . . . . . . . . . . . . 62
6.4 Extension to Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . 66
6.4.1 The Euler Equations . . . . . . . . . . . . . . . . . . . . . . . 66
6.4.2 The Navier-Stokes Equations . . . . . . . . . . . . . . . . . . 67
6.4.3 Direct Flux Reconstruction . . . . . . . . . . . . . . . . . . . 69
7 Time Stepping Schemes 70
7.1 Time Accurate Schemes . . . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.1 Explicit Runge-Kutta . . . . . . . . . . . . . . . . . . . . . . . 71
7.1.2 Diagonally Implicit Runge-Kutta . . . . . . . . . . . . . . . . 72
7.1.3 Step size control . . . . . . . . . . . . . . . . . . . . . . . . . 73
7.2 Pseudo Time Stepping . . . . . . . . . . . . . . . . . . . . . . . . . . 74
7.2.1 The Global Linear System . . . . . . . . . . . . . . . . . . . . 75
7.2.2 Block Iterative Methods . . . . . . . . . . . . . . . . . . . . . 76
7.3 Dual Time Stepping . . . . . . . . . . . . . . . . . . . . . . . . . . . 81
7.3.1 Application to Diagonally Implicit Runge-Kutta Schemes . . . 81
8 The Element Local Jacobian Matrix 83
8.1 One-Dimensional Formulation for Advection-Diffusion . . . . . . . . . 84
8.1.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
8.1.2 Kronecker Product Formulation . . . . . . . . . . . . . . . . . 92
8.2 Two- and Three-Dimensional Formulation for Advection-Diffusion . . 93
8.2.1 Derivation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
8.2.2 Two-Dimensional Kronecker Product Formulation . . . . . . . 105
8.2.3 Three-Dimensional Kronecker Product Formulation . . . . . . 112
8.2.4 Time Complexity . . . . . . . . . . . . . . . . . . . . . . . . . 118
8.3 Extension to Fluid Flow . . . . . . . . . . . . . . . . . . . . . . . . . 121
9 Numerical Experiments 125
9.1 Inviscid flow over a bump . . . . . . . . . . . . . . . . . . . . . . . . 126
9.2 Inviscid flow over the NACA 0012 airfoil . . . . . . . . . . . . . . . . 128
9.3 Convection of an isentropic vortex . . . . . . . . . . . . . . . . . . . . 128
9.4 Laminar flow over a Joukowski airfoil . . . . . . . . . . . . . . . . . . 132
9.5 Viscous flow over a half cylinder . . . . . . . . . . . . . . . . . . . . . 135
10 Implementation 139
10.1 Overview of Solvers . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
10.1.1 Unsteady Solver . . . . . . . . . . . . . . . . . . . . . . . . . . 140
10.1.2 Steady-state Solver . . . . . . . . . . . . . . . . . . . . . . . . 140
10.1.3 Direct Solver . . . . . . . . . . . . . . . . . . . . . . . . . . . 141
10.2 Multicolored Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2.1 Mesh Coloring . . . . . . . . . . . . . . . . . . . . . . . . . . . 143
10.2.2 Residual Computation per Color . . . . . . . . . . . . . . . . 145
10.3 Element Local Jacobians . . . . . . . . . . . . . . . . . . . . . . . . . 145
10.3.1 Constructing the Flux and Solution Jacobians . . . . . . . . . 146
10.3.2 Constructing the Element Local Jacobians . . . . . . . . . . . 150
11 Performance Analysis 155
11.1 Multicolored Gauss-Seidel . . . . . . . . . . . . . . . . . . . . . . . . 156
11.1.1 Inviscid flow over a bump . . . . . . . . . . . . . . . . . . . . 156
11.1.2 Inviscid flow over the NACA 0012 airfoil . . . . . . . . . . . . 157
11.2 GPU performance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
11.2.1 Unsteady Solver . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.2.2 Steady-state Solver . . . . . . . . . . . . . . . . . . . . . . . . 161
11.3 Multi-GPU Scalability . . . . . . . . . . . . . . . . . . . . . . . . . . 163
11.3.1 Strong Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 163
11.3.2 Weak Scalability . . . . . . . . . . . . . . . . . . . . . . . . . 163
12 Conclusions 166
A Boundary Conditions 169
A.1 Solid Slip-Wall and Symmetry . . . . . . . . . . . . . . . . . . . . . . 170
A.2 No-Slip Adiabatic Wall . . . . . . . . . . . . . . . . . . . . . . . . . . 171
A.3 Characteristic Riemann Invariant Far Field . . . . . . . . . . . . . . . 172
Bibliography 175
List of Tables
2.1 Upwinding coefficients and common names for DG schemes with dif-
ferent interface flux formulations. . . . . . . . . . . . . . . . . . . . . 14
2.2 Resolving efficiencies, ε = 0.1, for the DG scheme using Gauss-Legendre
solution points solving the advection-diffusion equation, a = 10. . . . 20
3.1 Maximum nondimensional time steps for the RK44, DG scheme using
Gauss-Legendre points. . . . . . . . . . . . . . . . . . . . . . . . . . . 26
4.1 Element size ‘h’ and nondimensional parameter ‘a’ for the meshes em-
ployed for Couette flow. . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Couette flow results for the RK44, DG scheme using Gauss-Legendre
solution points on Cartesian quadrilateral meshes. . . . . . . . . . . . 40
4.3 Couette flow results for the RK44, DG scheme using Gauss-Legendre
solution points on unstructured quadrilateral meshes. . . . . . . . . . 41
7.1 Scaling factors used for the explicit and implicit step controller. . . . 74
8.1 Diagonal Jacobian matrices used in the construction of the one-dimensional
element local Jacobian matrix for advection-diffusion . . . . . . . . . 85
8.2 Two- and three-dimensional sets . . . . . . . . . . . . . . . . . . . . . 94
8.3 Diagonal Jacobian matrices used in the construction of the two- and
three-dimensional element local Jacobian matrix for advection-diffusion.
The full names of the Jacobians matrices can be found in Table 8.1. . 96
9.1 Numerical time steps, ∆t, used for the convection of an isentropic vortex. 131
9.2 Density error and rate of convergence for different time integration
schemes, convection of an isentropic vortex, implicit dual time stepping
with two color MCGS. . . . . . . . . . . . . . . . . . . . . . . . . . . 131
9.3 (x× y) Joukowski airfoil meshes used for each polynomial order where
x is the number of elements along the airfoil and wake and y is the
number of elements in the normal direction. The number of elements
in x is split evenly between the airfoil and wake. The total number of
elements, Neles, is given by the product of x and y. . . . . . . . . . . . 133
9.4 Drag coefficient error and rate of convergence for each polynomial or-
der, laminar flow over a Joukowski airfoil, implicit pseudo time stepping
with two color MCGS. . . . . . . . . . . . . . . . . . . . . . . . . . . 134
9.5 Lift and drag coefficients and the Strouhal number for viscous flow over
a half cylinder, Re = 1000, P = 3. . . . . . . . . . . . . . . . . . . . . 136
10.1 Common sizes for an element local Jacobian for the advection-diffusion
(AD) and Navier-Stokes (NS) equations. These sizes remain the same
for the advection and Euler equations. . . . . . . . . . . . . . . . . . 150
11.1 Convergence results for different meshes, inviscid flow over a bump,
implicit pseudo time stepping with two color MCGS, single GPU, P = 2 . . 156
11.2 Convergence results for different polynomial orders, inviscid flow over
a bump, implicit pseudo time stepping with two color MCGS, single
GPU, (48× 16) quadrilateral mesh . . . . . . . . . . . . . . . . . . . 157
11.3 Convergence results for different meshes, inviscid flow over the NACA
0012 airfoil, implicit pseudo time stepping with two color MCGS, single
GPU, P = 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
11.4 Convergence results for different mixed meshes, inviscid flow over the
NACA 0012 airfoil, implicit pseudo time stepping with four color MCGS,
single GPU, P = 4 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
11.5 Convergence results for different polynomial orders, inviscid flow over
the NACA 0012 airfoil, implicit pseudo time stepping with two color
MCGS, single GPU, (32× 32) quadrilateral mesh . . . . . . . . . . . 158
11.6 A comparison between explicit RK45 and implicit ESDIRK4 for viscous
flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80
GPUs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
11.7 Speedup of a single GPU over a single CPU core for inviscid flow over
the NACA 0012 airfoil, implicit pseudo time stepping with two color
MCGS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 162
11.8 Weak scalability study for inviscid flow over the NACA 0012 airfoil,
explicit RK4, P = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
11.9 Weak scalability study for inviscid flow over the NACA 0012 airfoil,
implicit pseudo time stepping with two color MCGS, P = 5 . . . . . . 164
List of Figures
2.1 Numerical dissipation in each eigenmode p = 1, 2, 3 for the DG scheme
of order P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . 14
2.2 Numerical dispersion in each eigenmode p = 1, 2, 3 for the DG scheme
of order P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . 15
2.3 Energy distributed among each eigenmode p = 1, 2, 3 for the DG
scheme of order P = 2 using Gauss-Legendre solution points solving
the advection-diffusion equation, a = 10. . . . . . . . . . . . . . . . . 15
2.4 Energy distributed among each eigenmode p = 1, 2, 3 for the centered-
centered, DG scheme of order P = 2 solving the advection-diffusion
equation, a = 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 16
2.5 Upper bound on initial slope of the relative error vs. nondimensional
wavenumber for the DG scheme of order P = 2 using Gauss-Legendre
solution points solving the advection-diffusion equation, a = 10. . . . 19
3.1 Maximum physical time step, ∆tmax, vs. nondimensional wavenumber
for the RK44, DG, centered-centered scheme using Gauss-Legendre
solution points. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
3.2 Maximum nondimensional time step, ∆tmax, vs. nondimensional wavenum-
ber, k/(P + 1), for the RK44, DG scheme using Gauss-Legendre solu-
tion points solving the advection-diffusion equation, a = 10. . . . . . 24
3.3 Maximum nondimensional time step estimates vs. nondimensional
wavespeed for the RK44, DG scheme of order P = 2 using Gauss-
Legendre solution points. . . . . . . . . . . . . . . . . . . . . . . . . . 26
3.4 A visual representation of a quadrilateral element for a polynomial or-
der of P = 1. The flux points are marked by red squares and west, east,
south and north faces are represented by W,E, S,N , respectively. The
distances between flux points, h1,6, h3,8, h2,5, h4,7 are used to estimate
the maximum time step in the element. . . . . . . . . . . . . . . . . . 29
4.1 High resolution approximate Gaussian, P = 2, σ = 8/√2π. . . . . . . 33
4.2 Relative error vs. time periods for the RK44, DG scheme of order
P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10, on a high resolution approximate Gaussian,
σ = 8/√
2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
4.3 Low resolution approximate Gaussian, P = 2, σ = 1/√
2π. . . . . . . 34
4.4 Relative error vs. time periods for the RK44, DG scheme of order
P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10, on a low resolution approximate Gaussian,
σ = 1/√
2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
4.5 Nonuniform rectilinear grid with 40 × 50 elements and a minimum
element size of h1 = 0.2 and h2 = 0.1. . . . . . . . . . . . . . . . . . . 36
4.6 Initial condition and relative error for the RK44, DG scheme of order
P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10, on a high resolution approximate Gaussian,
σ = 8/√
2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.7 Initial condition and relative error for the RK44, DG scheme of order
P = 2 using Gauss-Legendre solution points solving the advection-
diffusion equation, a = 10, on a low resolution approximate Gaussian,
σ = 1/√
2π. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
4.8 Couette flow Mach contours, DG upwind-one-sided scheme using Gauss-
Legendre solution points, P = 3. . . . . . . . . . . . . . . . . . . . . . 38
4.9 3rd order results for Couette flow, RK44, DG scheme using Gauss-
Legendre solution points, P = 2. . . . . . . . . . . . . . . . . . . . . . 39
4.10 4th order results for Couette flow, RK44, DG scheme using Gauss-
Legendre solution points, P = 3. . . . . . . . . . . . . . . . . . . . . . 40
6.1 A visual representation of a quadrilateral element in parent space for
a polynomial order of P = 1. The solution points are marked by
blue circles, the flux points are marked by red squares and left, right,
bottom and top faces are represented by L,R,B,T , respectively. Left
and right states in an interface flux are represented by − and +. . . . 56
7.1 A Cartesian mesh with a red-black element color mapping. The DFR
element stencil (white) relies on information from neighboring elements
from the opposite color. . . . . . . . . . . . . . . . . . . . . . . . . . 79
8.1 The two distinct sparsity patterns used in the two-dimensional Kro-
necker product formulation of the element local Jacobian for DFR,
P = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110
8.2 The summation of the two terms in Figure 8.1 produces the sparsity
pattern for the first term, ∇δ · (∂f_ele/∂u_ele), in the two-dimensional
element local Jacobian for DFR, P = 2. . . . . . . . . . . . . . . . . . 111
8.3 The cross-terms Kξ(x)Kη(y) and Kη(x)Kξ(y) in the second term,
∇δ · ((∂f_ele/∂q_ele)(∂q_ele/∂u_ele)), produce a dense matrix so the final
sparsity pattern of the two-dimensional element local Jacobian for DFR,
P = 2, is fully dense. . . . . . . . . . . . . . . . . . . . . . . . . . . 111
8.4 The three distinct sparsity patterns used in the three-dimensional Kro-
necker product formulation of the element local Jacobian for DFR, P = 2. 118
8.5 The summation of the three terms in Figure 8.4 produces the sparsity
pattern for the first term, ∇δ · (∂f_ele/∂u_ele), in the three-dimensional
element local Jacobian for DFR, P = 2. . . . . . . . . . . . . . . . . . 118
8.6 The six cross-terms in the second term, ∇δ · ((∂f_ele/∂q_ele)(∂q_ele/∂u_ele)), produce
three distinct sparsity patterns used in the three-dimensional Kro-
necker product formulation of the element local Jacobian for DFR,
P = 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 119
8.7 The summation of all terms produces the final sparsity pattern for the
three-dimensional element local Jacobian for DFR, P = 2. . . . . . . 119
9.1 Entropy error vs. 1/√nDoF for inviscid flow over a bump, implicit pseudo
time stepping with two color MCGS. . . . . . . . . . . . . . . . . . . 127
9.2 (48× 16) quadrilateral mesh and pressure contours, inviscid flow over
a bump, implicit pseudo time stepping with two color MCGS, P = 2. 127
9.3 Lift coefficient vs. 1/√nDoF for inviscid flow over the NACA 0012 airfoil,
implicit pseudo time stepping with two and four color MCGS . . . . . 129
9.4 Mesh and pressure contours for inviscid flow over the NACA 0012
airfoil, implicit pseudo time stepping, P = 4. . . . . . . . . . . . . . . 129
9.5 Initial density contours for convection of an isentropic vortex. . . . . 131
9.6 Density error vs. numerical time step for convection of an isentropic
vortex, implicit dual time stepping with two color MCGS. . . . . . . 132
9.7 Drag coefficient vs. h = 1/√nDoF for laminar flow over a Joukowski
airfoil, implicit MCGS. . . . . . . . . . . . . . . . . . . . . . . . . . . 133
9.8 Laminar flow over Joukowski airfoil, implicit pseudo time stepping with
two color MCGS, P = 4, (48× 24) mesh. . . . . . . . . . . . . . . . . 134
9.9 Drag coefficient error vs. h = 1/√nDoF for laminar flow over a Joukowski
airfoil, implicit pseudo time stepping with two color MCGS. . . . . . 135
9.10 Half cylinder mesh with 20,292 unstructured hexahedral elements. Hex-
ahedral elements are created by extruding the quadrilaterals shown
above. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 137
9.11 Instantaneous isosurfaces of density colored by Mach number for vis-
cous flow over a half cylinder, Re = 1000, P = 3. . . . . . . . . . . . 138
9.12 Time history of lift and drag coefficient for viscous flow over a half
cylinder, Re = 1000, P = 3. . . . . . . . . . . . . . . . . . . . . . . . 138
10.1 Examples of the mesh coloring algorithm. . . . . . . . . . . . . . . . . 144
11.1 Entropy error, inviscid flow over a bump, implicit pseudo time stepping
with two color MCGS, single GPU . . . . . . . . . . . . . . . . . . . 157
11.2 A wall-clock time comparison between the inverse Jacobian computa-
tion and the block iterative method for one ESDIRK4 time step of
viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA
Tesla K80 GPUs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 160
11.3 A wall-clock time comparison between each section of “Inverse Jaco-
bian” for viscous flow over a half cylinder, Re = 1000, P = 3, on 12
NVIDIA Tesla K80 GPUs. . . . . . . . . . . . . . . . . . . . . . . . . 161
11.4 A wall-clock time comparison between each section of the block itera-
tive method for viscous flow over a half cylinder, Re = 1000, P = 3,
on 12 NVIDIA Tesla K80 GPUs. . . . . . . . . . . . . . . . . . . . . 162
11.5 Speedup relative to one GPU for inviscid flow over the NACA 0012
airfoil, P = 5 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 164
Part I
Numerical Analysis for
Advection-Diffusion
Chapter 1
Introduction
Numerical analysis of DG methods has primarily relied on functional analysis. For
example, in [53, 48, 27, 81, 82], error estimates for linear and nonlinear time-dependent
problems were derived using approximation theory. For Flux Reconstruction (FR), a
proof of linear stability was developed [88] for ESFR schemes along with theory on
non-linear stability [47].
The primary tool for analyzing dissipation and dispersion properties has been
Fourier (von Neumann) analysis. For DG methods, this results in eigensolution anal-
ysis because of the eigendecomposition generated from the schemes. In the case of
linear advection, Hu et al. [39] showed that the P th order DG scheme results in one
physical mode and P spurious or parasitic modes which dampen out quickly for an
upwind flux but remain indefinitely for a centered flux. They also showed in [37] that
the order of the dissipation and dispersion error for the physical mode is 2P + 2 and
2P +3, respectively. In [38], these results were extended to quadrilateral and triangu-
lar elements. In the asymptotic limit of small wavenumber kh→ 0, Ainsworth [4, 5]
proved that for a fixed mesh of spacing h, dissipation and dispersion errors decay at
an exponential rate when 2P + 1 ≈ κkh where k is the wavenumber and κ > 1 is
a constant. This was then extended to the second-order wave equation in [3]. More
recently, Moura et al. [62] showed that the spurious modes, in fact, replicate the be-
havior of the physical mode. They also provided estimates of the largest wavenumber
that can be accurately resolved within a set tolerance for Burgers turbulence.
Eigensolution analysis has also been used to determine the CFL condition. For
example, the CFL condition for the linear advection equation is,
\[
\frac{a \Delta t}{h} \le C, \tag{1.1}
\]
where a is the physical wavespeed, ∆t is the physical time step, h is the element size
and C is a constant determined by the eigensolution analysis of the fully discrete
equation. By rearranging Eq. (1.1), a time step estimate is obtained,
\[
\Delta t \le \frac{C h}{a}. \tag{1.2}
\]
In [23], this condition is used for convection-dominated flows for the Euler and Navier-
Stokes equations. Since the constant C depends on the polynomial order P , it was
proved in [21] that a time step estimate would be linearly stable for a P + 1 order,
P + 1 stage Runge-Kutta method if
\[
C = \frac{1}{2P + 1}, \quad P \le 2. \tag{1.3}
\]
It was demonstrated in [23] that this condition holds for P ≥ 2 with less than 5%
error. In [84], Toulorge et al. obtained CFL estimates for linear advection through
eigensolution analysis for triangles. The CFL condition in Eq. (1.1) is often used for
two- and three-dimensional cases where a distinct reference length, h, is used for each
element type but a strict bound has yet to be proposed for two- and three-dimensional
elements.
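As a minimal sketch of how Eq. (1.2) and Eq. (1.3) combine into a time step estimate, the following Python helper evaluates ∆t = Ch/a with C = 1/(2P + 1). The function names are illustrative only, and using the constant for P > 2 relies on the roughly 5% agreement reported in [23] rather than on a proof.

```python
def cfl_constant(P):
    """CFL constant of Eq. (1.3), C = 1 / (2P + 1)."""
    if P < 0:
        raise ValueError("polynomial order P must be non-negative")
    return 1.0 / (2 * P + 1)


def time_step_estimate(a, h, P):
    """Time step bound of Eq. (1.2), dt <= C h / |a|, for wavespeed a and element size h."""
    if a == 0:
        raise ValueError("an advective CFL bound needs a nonzero wavespeed")
    return cfl_constant(P) * h / abs(a)


# Example: P = 2, element size h = 0.1, wavespeed a = 2.
dt = time_step_estimate(a=2.0, h=0.1, P=2)
```

The absolute value of the wavespeed is an assumption added here so that the same bound applies to waves travelling in either direction.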
For FR, Vincent et al. [87] applied eigensolution analysis to study the ESFR
schemes for linear advection and were able to identify the “c+” scheme that offers
the highest CFL limit for a given polynomial. Asthana et al. [9] derived a new set of
linearly stable schemes which have minimal dissipation and dispersion error.
In [12], Asthana, Jameson and I utilized eigensolution analysis to prove that any
FR scheme is consistent for linear advection and any stable FR scheme is convergent.
We also established that in the limit of small wavenumber kh → 0, the rate of
convergence is a function of time with short-time rates, t = 0+, being determined
by interpolation error and long-time rates, t → ∞, being determined by numerical
differentiation error. In [11], we were able to derive analytical estimates for the
rates of convergence of the first and second derivative operators. This led to the
construction of a new class of superconvergent schemes for centered fluxes called
SFR which includes the nodal DG scheme. We also demonstrated that the rate of
convergence for a steady-state, forced, advection-diffusion problem is the same as the
short-time rate of convergence described in [12] and the rates of convergence for the
first and second derivative.
In Part I of this dissertation, eigensolution analysis of the FR formulation for the
linear advection-diffusion equation is performed. Stability, dissipation and dispersion
over the entire range of resolvable wavenumbers are investigated to develop an
understanding of how FR behaves when the flow features are under-resolved. The
analysis focuses on different interface flux formulations of the nodal DG scheme on
Gauss-Legendre points although the same technique can be applied to any scheme
within the FR formulation. In pursuit of an accurate, stable time step, a method
for estimating the maximum stable time step for the linear advection-diffusion and
Navier-Stokes equations on unstructured, tensor product elements is provided.
The first part of this dissertation is formatted as follows. In chapter 2, we introduce
the semi-discrete advection-diffusion equation for FR and discuss the dissipation and
dispersion properties associated with nodal DG. In this chapter, we also derive a
measure of relative error in order to compare schemes on the basis of wave-propagation
error. In chapter 3, we utilize the fully discrete advection-diffusion equation and its
stability criteria to compare the CFL restrictions of different equations and schemes.
In this chapter, we also propose a conservative prediction method for the maximum
physical time step and extend it to multidimensional tensor product elements for the
Navier-Stokes equations. Lastly, in chapter 4 we provide the results for 1D and 2D
numerical experiments in order to verify the results found in the previous chapters.
Part I of this dissertation is based on the following publication:
• Jerry Watkins, Kartikey Asthana, and Antony Jameson. A numerical analysis
of the nodal discontinuous Galerkin scheme via Flux Reconstruction for the
advection-diffusion equation. Computers & Fluids, 2016. [93]
and is a continuation of the work provided in:
• Kartikey Asthana, Jerry Watkins, and Antony Jameson. On the rate of con-
vergence of flux reconstruction for steady-state problems. SIAM Journal on
Numerical Analysis, 2016. [11]
• Kartikey Asthana, Jerry Watkins, and Antony Jameson. On consistency and
rate of convergence of Flux Reconstruction for time-dependent problems. Jour-
nal of Computational Physics, 2017. [12]
Chapter 2
Eigensolution Analysis
This chapter is split into two sections. In the first section, an eigensolution analysis
is performed on the 1D FR formulation for the advection-diffusion equation and the
dissipation and dispersion properties for the nodal DG scheme are discussed. In
the second section, a measure of relative error is derived and different schemes are
compared based on wave-propagation error.
2.1 Eigensolution Analysis for
1D Advection-Diffusion
We begin by demonstrating that eigensolution analysis provides a purely element-
local description of the 1D semi-discrete linear advection-diffusion equation. The
derived linear dynamical system can then be solved analytically to obtain the analyt-
ical solution to the semi-discrete equation. The stability, dissipation and dispersion
properties of any FR scheme follow directly from the corresponding eigensolutions
[8]. This section concludes with a short discussion on the dissipation and dispersion
error of the nodal DG scheme on Gauss-Legendre points for the advection-diffusion
equation.
2.1.1 Problem specification
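Consider the 1D linear advection-diffusion equation,
\[
\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = b \frac{\partial^2 u}{\partial x^2}, \quad x \in \mathbb{R}, \; t > 0, \tag{2.1}
\]
where a ∈ R is the constant wavespeed and b > 0 is the diffusion coefficient. A
periodic initial condition is introduced as an isolated Fourier component, u(x, 0) =
exp(ikx), where k > 0 is the wavenumber. Using h as the length scale and h^2/b as
the time scale, Eq. (2.1) and the initial condition can be nondimensionalized as,
\[
\frac{\partial u}{\partial t} + a \frac{\partial u}{\partial x} = \frac{\partial^2 u}{\partial x^2}, \quad x \in \mathbb{R}, \; t > 0, \tag{2.2}
\]
\[
u(x, 0) = \exp(ikx),
\]
where x, t, k and a now denote the nondimensional quantities x/h, tb/h^2, kh and ah/b,
respectively, each formed from its dimensional counterpart in Eq. (2.1).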
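The nondimensionalization can be checked with the chain rule. The short derivation below is a sketch in which hats mark the nondimensional variables; the hat notation is introduced here only for illustration and is an assumption, not necessarily the notation of the original text.

```latex
% Scale space by h and time by h^2 / b.
x = h\,\hat{x}, \qquad t = \frac{h^2}{b}\,\hat{t}
\quad\Longrightarrow\quad
\frac{\partial u}{\partial t} = \frac{b}{h^2}\frac{\partial u}{\partial \hat{t}}, \qquad
\frac{\partial u}{\partial x} = \frac{1}{h}\frac{\partial u}{\partial \hat{x}}, \qquad
\frac{\partial^2 u}{\partial x^2} = \frac{1}{h^2}\frac{\partial^2 u}{\partial \hat{x}^2}.

% Substitute into Eq. (2.1) and multiply through by h^2 / b.
\frac{\partial u}{\partial \hat{t}} + \frac{a h}{b}\frac{\partial u}{\partial \hat{x}}
  = \frac{\partial^2 u}{\partial \hat{x}^2},
\qquad
\exp(ikx) = \exp\!\big(i\,(kh)\,\hat{x}\big).
```

The only parameters left are ah/b and kh, which is why the nondimensional problem in Eq. (2.2) depends on a single wavespeed-like parameter and a single wavenumber.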
Following a traditional nodal finite element method [40], the domain is partitioned
into non-overlapping elements, $\Omega = \bigcup_n \Omega_n$ where
$\Omega_n = \{x \mid x_n \le x < x_{n+1}\} = [x_n, x_{n+1})$
and the sequence of nodes, $\{x_n\}$, is increasing on the real line. The element is
further discretized into P + 1 distinct solution points
$\mathbf{x}_n = [x_{n,1}, x_{n,2}, \ldots, x_{n,P+1}]^T$
where P is the degree of the piecewise interpolating polynomial representing the
numerical solution in a given element. To reduce complexity, a uniform element size
of $h = x_{n+1} - x_n, \forall n$ is used. A linear isoparametric mapping is introduced from the
physical domain $x \in \Omega_n$ to the parent domain ξ ∈ [−1, 1) such that
\[
\xi|_{\Omega_n}(x) = 2(x - x_n) - 1, \quad x \in [x_n, x_{n+1}). \tag{2.3}
\]
The distribution of solution points across all elements in the parent domain is kept
the same, $\boldsymbol{\xi} = (\xi_p)_{p=1,2,\ldots,P+1}$, so that the numerical solution in the nth element can
be represented as
\[
u^\delta_n(\xi, t) = \sum_{p=1}^{P+1} u^\delta_{n,p}(t)\, \ell_p(\xi), \quad \xi \in [-1, 1), \tag{2.4}
\]
where $\ell_p$ is the pth Lagrange polynomial.
Assuming exact time integration, Eq. (2.2) leads to a purely element-local description
of the semi-discrete numerical equation,
\[
\frac{d\mathbf{u}^\delta_n}{dt} + 2a\, Q_1(k)\, \mathbf{u}^\delta_n = 4\, Q_2(k)\, \mathbf{u}^\delta_n, \quad \forall n, \; t > 0, \tag{2.5}
\]
\[
\mathbf{u}^\delta_n(0) = \exp(ikx_n)\, \mathbf{w}_0(k),
\]
where $\mathbf{u}^\delta_n(t) = (u^\delta_{n,p}(t))_{p=1,2,\ldots,P+1}$ denotes the vector of solution values, $\mathbf{w}_0(k)$ is the
collocation projection of the initial condition onto the solution points,
\[
\mathbf{w}_0(k) = \left( \exp\left( ik\, \tfrac{1}{2}(1 + \xi_p) \right) \right)_{p=1,2,\ldots,P+1}, \tag{2.6}
\]
and $Q_1(k) \in \mathbb{C}^{(P+1)\times(P+1)}$ and $Q_2(k) \in \mathbb{C}^{(P+1)\times(P+1)}$ are the first order and second
order numerical differentiation operators in parent space. The definition of these
operators depends on the numerical scheme being used and the constants 2 and 4
come from the transformation of the numerical differentiation operator from parent
to physical space.
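The collocation projection in Eq. (2.6) can be sketched in a few lines of Python. The example below is illustrative only: it hard-codes the three Gauss-Legendre points for P = 2, and the function names are not taken from the source.

```python
import cmath
import math

# Gauss-Legendre points for P = 2 (three solution points) in the parent domain.
XI_P2 = [-math.sqrt(3.0 / 5.0), 0.0, math.sqrt(3.0 / 5.0)]


def w0(k, xi):
    """Collocation projection of exp(i k x) onto the solution points, Eq. (2.6)."""
    return [cmath.exp(1j * k * 0.5 * (1.0 + x)) for x in xi]


def initial_state(k, x_n, xi):
    """Initial solution vector in the element whose left node is x_n, Eq. (2.5)."""
    phase = cmath.exp(1j * k * x_n)
    return [phase * w for w in w0(k, xi)]


def l2_norm(v):
    """Vector l2 norm of a list of complex numbers."""
    return math.sqrt(sum(abs(z) ** 2 for z in v))


k = 1.3
u_here = initial_state(k, x_n=4.0, xi=XI_P2)
u_next = initial_state(k, x_n=5.0, xi=XI_P2)
```

Every entry of w0 has unit modulus, so its l2 norm is the square root of P + 1, and the state in the neighbouring element (one nondimensional element width away) differs from u_here only by the factor exp(ik).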
2.1.2 The Flux Reconstruction method
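Under the Flux Reconstruction method, the first numerical differentiation operator
can be obtained by first defining a globally continuous, piecewise numerical solution
polynomial of degree P + 1 using correction function polynomials that convey boundary
information to the interior of the element [41]. The numerical derivative is then
obtained by differentiating this globally continuous solution,
\[
a \frac{\delta u^\delta_n}{\delta x}(\xi, t) = 2 \left[ a \frac{\partial u^\delta_n}{\partial \xi}(\xi, t)
+ \left( a u^\delta_L - a u^\delta_n(-1, t) \right) \frac{dg_L}{d\xi}(\xi)
+ \left( a u^\delta_R - a u^\delta_n(+1, t) \right) \frac{dg_R}{d\xi}(\xi) \right], \quad \xi \in [-1, 1), \tag{2.7}
\]
where $g_L(\xi), g_R(\xi) \in \mathbb{P}_{P+1}$ are the left-boundary and right-boundary correction
functions in the parent space which satisfy the constraints,
\[
g_L(-1) = g_R(+1) = 1, \qquad g_L(+1) = g_R(-1) = 0, \tag{2.8}
\]
and can recover nodal DG, SD and various other high-order formulations [41, 88]. The
common interface fluxes, $a u^\delta_L$ and $a u^\delta_R$, are computed from the polynomial functions
on either side of the interface,
\[
u^\delta_L = (1 - \alpha)\, u^\delta_{n-1}(+1, t) + \alpha\, u^\delta_n(-1, t),
\]
\[
u^\delta_R = (1 - \alpha)\, u^\delta_n(+1, t) + \alpha\, u^\delta_{n+1}(-1, t), \tag{2.9}
\]
where the upwinding coefficient, α, determines the type of common interface value
used. A one-sided, upwinding flux is obtained for α = 0.0 and a centered flux is
obtained for α = 0.5. Eq. (2.7) and Eq. (2.9) are now utilized to construct the
numerical derivative,
\[
\frac{\delta u^\delta_n}{\delta x}(\xi, t) = 2 \left[ \frac{\partial u^\delta_n}{\partial \xi}(\xi, t)
+ (1 - \alpha) \left( u^\delta_{n-1}(+1, t) - u^\delta_n(-1, t) \right) \frac{dg_L}{d\xi}(\xi)
+ \alpha \left( u^\delta_{n+1}(-1, t) - u^\delta_n(+1, t) \right) \frac{dg_R}{d\xi}(\xi) \right], \quad \xi \in [-1, 1). \tag{2.10}
\]
This can then be transformed to a matrix-vector operation on $\mathbf{u}^\delta_n$,
\[
\left. \frac{\delta \mathbf{u}^\delta}{\delta x} \right|_n = \sum_{j=-1}^{1} C_j\, \mathbf{u}^\delta_{n+j}, \tag{2.11}
\]
where $C_j \in \mathbb{R}^{(P+1)\times(P+1)}$ is given by
\[
C_{-1} = (1 - \alpha)\, \mathbf{g}_{L,\xi}\, \boldsymbol{\ell}_+^T,
\]
\[
C_0 = D - (1 - \alpha)\, \mathbf{g}_{L,\xi}\, \boldsymbol{\ell}_-^T - \alpha\, \mathbf{g}_{R,\xi}\, \boldsymbol{\ell}_+^T,
\]
\[
C_{+1} = \alpha\, \mathbf{g}_{R,\xi}\, \boldsymbol{\ell}_-^T.
\]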
Here $D \in \mathbb{R}^{(P+1)\times(P+1)}$ is the polynomial differentiation operator such that
\[
D_{p,m} = \frac{d\ell_m}{d\xi}(\xi_p), \quad p, m = 1, 2, \ldots, P + 1, \tag{2.12}
\]
where $\ell_m$ is the mth Lagrange polynomial in the parent domain. $\mathbf{g}_{L,\xi}, \mathbf{g}_{R,\xi} \in \mathbb{R}^{(P+1)\times 1}$
are the derivatives of the left-boundary and right-boundary correction functions at
the solution points, and $\boldsymbol{\ell}_-, \boldsymbol{\ell}_+ \in \mathbb{R}^{(P+1)\times 1}$ are the extrapolated values of the P + 1
Lagrange basis polynomials,
\[
\ell_{+p} = \ell_p(+1), \quad \ell_{-p} = \ell_p(-1), \quad p = 1, 2, \ldots, P + 1. \tag{2.13}
\]
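The differentiation operator of Eq. (2.12) and the extrapolation vectors of Eq. (2.13) follow directly from the Lagrange basis. The sketch below builds them in plain Python for the P = 2 Gauss-Legendre points; the function names and the 0-based indexing are assumptions made for illustration.

```python
import math


def lagrange_value(nodes, m, x):
    """Value of the m-th Lagrange polynomial (0-based m) at x."""
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != m:
            value *= (x - xj) / (nodes[m] - xj)
    return value


def lagrange_derivative(nodes, m, x):
    """Derivative of the m-th Lagrange polynomial at x (product rule over the factors)."""
    total = 0.0
    for i, xi in enumerate(nodes):
        if i == m:
            continue
        term = 1.0 / (nodes[m] - xi)
        for j, xj in enumerate(nodes):
            if j != m and j != i:
                term *= (x - xj) / (nodes[m] - xj)
        total += term
    return total


def differentiation_matrix(nodes):
    """D[p][m] = d l_m / d xi evaluated at xi_p, Eq. (2.12)."""
    return [[lagrange_derivative(nodes, m, xp) for m in range(len(nodes))]
            for xp in nodes]


def boundary_extrapolation(nodes):
    """Vectors l_minus and l_plus of Eq. (2.13)."""
    l_minus = [lagrange_value(nodes, m, -1.0) for m in range(len(nodes))]
    l_plus = [lagrange_value(nodes, m, +1.0) for m in range(len(nodes))]
    return l_minus, l_plus


# Gauss-Legendre points for P = 2.
nodes = [-math.sqrt(0.6), 0.0, math.sqrt(0.6)]
D = differentiation_matrix(nodes)
l_minus, l_plus = boundary_extrapolation(nodes)
```

Two quick sanity checks are that D differentiates any polynomial of degree at most P exactly at the solution points, and that the entries of each extrapolation vector sum to one.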
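The second numerical derivative can be constructed by performing the same operation
on the first numerical derivative. Let $\mathbf{v}^\delta_n = \left. \frac{\delta \mathbf{u}^\delta}{\delta x} \right|_n$, then,
\[
\left. \frac{\delta^2 \mathbf{u}^\delta}{\delta x^2} \right|_n = \left. \frac{\delta \mathbf{v}^\delta}{\delta x} \right|_n = \sum_{j=-1}^{1} C_j\, \mathbf{v}^\delta_{n+j}. \tag{2.14}
\]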
The sinusoidal form of the initial condition in Eq. (2.5) provides the displacement
relations, $\mathbf{u}^\delta_{n-1} = \exp(-ik)\, \mathbf{u}^\delta_n$, $\mathbf{u}^\delta_{n+1} = \exp(ik)\, \mathbf{u}^\delta_n$, which can then be used to obtain
the numerical differentiation operators,
\[
Q_1(k) = \sum_{j=-1}^{1} C^{(1,1)}_j \exp(ikj),
\]
\[
Q_2(k) = \sum_{j=-2}^{2} B_j \exp(ikj), \tag{2.15}
\]
where $B_j \in \mathbb{R}^{(P+1)\times(P+1)}$ is given by
\[
B_{-2} = C^{(2,2)}_{-1} C^{(2,1)}_{-1},
\]
\[
B_{-1} = C^{(2,2)}_{-1} C^{(2,1)}_{0} + C^{(2,2)}_{0} C^{(2,1)}_{-1},
\]
\[
B_{0} = C^{(2,2)}_{+1} C^{(2,1)}_{-1} + C^{(2,2)}_{0} C^{(2,1)}_{0} + C^{(2,2)}_{-1} C^{(2,1)}_{+1},
\]
\[
B_{+1} = C^{(2,2)}_{+1} C^{(2,1)}_{0} + C^{(2,2)}_{0} C^{(2,1)}_{+1},
\]
\[
B_{+2} = C^{(2,2)}_{+1} C^{(2,1)}_{+1}.
\]
Here (1, 1), (2, 1) and (2, 2) are identifiers used to differentiate between different
correction procedures in the advection and diffusion terms. The first index in the
superscript refers to the first or second numerical differentiation operator while the
second index refers to the correction procedure for the discontinuous solution or the
discontinuous first derivative. The identifier defines the upwinding coefficient and
correction functions used during the correction procedure, for example,
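\[
C^{(1,1)}_{+1} = \alpha^{(1,1)}\, \mathbf{g}^{(1,1)}_{R,\xi}\, \boldsymbol{\ell}_-^T,
\]
\[
C^{(2,1)}_{+1} = \alpha^{(2,1)}\, \mathbf{g}^{(2,1)}_{R,\xi}\, \boldsymbol{\ell}_-^T,
\]
\[
C^{(2,2)}_{+1} = \alpha^{(2,2)}\, \mathbf{g}^{(2,2)}_{R,\xi}\, \boldsymbol{\ell}_-^T.
\]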
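The assembly of Q1(k) and Q2(k) from the blocks above can be sketched as follows. This is a hedged, standard-library illustration rather than the author's implementation: it assumes the nodal DG correction functions take the Radau form g_L = (−1)^P (L_P − L_{P+1})/2 and g_R = (L_P + L_{P+1})/2, with L_n the Legendre polynomials, uses the same correction functions for all three correction procedures, and exploits the fact that the sum over B_j exp(ikj) in Eq. (2.15) equals the product of the (2,2) and (2,1) sums. All function names are hypothetical.

```python
import cmath
import math


def legendre(n, x):
    """Return (L_n(x), L_n'(x)) from the standard recurrences."""
    p_prev, p = 1.0, x        # L_0, L_1
    d_prev, d = 0.0, 1.0      # L_0', L_1'
    if n == 0:
        return p_prev, d_prev
    for m in range(1, n):
        p_next = ((2 * m + 1) * x * p - m * p_prev) / (m + 1)
        d_next = d_prev + (2 * m + 1) * p
        p_prev, p, d_prev, d = p, p_next, d, d_next
    return p, d


def gauss_legendre_points(n):
    """Roots of L_n in ascending order, found by Newton iteration."""
    points = []
    for i in range(1, n + 1):
        x = -math.cos(math.pi * (i - 0.25) / (n + 0.5))
        for _ in range(100):
            p, dp = legendre(n, x)
            x -= p / dp
        points.append(x)
    return points


def lagrange_value(nodes, m, x):
    value = 1.0
    for j, xj in enumerate(nodes):
        if j != m:
            value *= (x - xj) / (nodes[m] - xj)
    return value


def lagrange_derivative(nodes, m, x):
    total = 0.0
    for i in range(len(nodes)):
        if i == m:
            continue
        term = 1.0 / (nodes[m] - nodes[i])
        for j in range(len(nodes)):
            if j != m and j != i:
                term *= (x - nodes[j]) / (nodes[m] - nodes[j])
        total += term
    return total


def symbol(P, alpha, k):
    """Sum over j of C_j exp(i k j) for one correction procedure, Eq. (2.11)."""
    xi = gauss_legendre_points(P + 1)
    n = P + 1
    l_minus = [lagrange_value(xi, m, -1.0) for m in range(n)]
    l_plus = [lagrange_value(xi, m, +1.0) for m in range(n)]
    # Derivatives of the assumed DG (Radau) correction functions at the solution points.
    gL = [0.5 * (-1) ** P * (legendre(P, x)[1] - legendre(P + 1, x)[1]) for x in xi]
    gR = [0.5 * (legendre(P, x)[1] + legendre(P + 1, x)[1]) for x in xi]
    e_m, e_p = cmath.exp(-1j * k), cmath.exp(1j * k)
    Q = [[0j] * n for _ in range(n)]
    for p in range(n):
        for m in range(n):
            c_minus = (1 - alpha) * gL[p] * l_plus[m]
            c_zero = (lagrange_derivative(xi, m, xi[p])
                      - (1 - alpha) * gL[p] * l_minus[m] - alpha * gR[p] * l_plus[m])
            c_plus = alpha * gR[p] * l_minus[m]
            Q[p][m] = c_minus * e_m + c_zero + c_plus * e_p
    return Q


def matmul(A, B):
    n = len(A)
    return [[sum(A[i][l] * B[l][j] for l in range(n)) for j in range(n)] for i in range(n)]


def operators(P, alphas, k):
    """Q1(k) and Q2(k) of Eq. (2.15); alphas = (alpha11, alpha21, alpha22)."""
    a11, a21, a22 = alphas
    return symbol(P, a11, k), matmul(symbol(P, a22, k), symbol(P, a21, k))
```

For small k, 2 Q1(k) applied to w0(k) should be close to ik w0(k), and with the illustrative choice P = 0 and the upwind-one-sided coefficients the operators reduce to the first-order upwind difference and the three-point second difference.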
2.1.3 Analytical solution of the semi-discrete and exact equations
The analytical solution to the semi-discrete numerical equation, Eq. (2.5), can be
found by exactly integrating the linear dynamical system in time using matrix
factorization. The semi-discrete numerical equation can be rewritten as
\[
\frac{d\mathbf{u}^\delta_n}{dt} + R(a, k)\, \mathbf{u}^\delta_n = 0, \quad \forall n, \; t > 0, \tag{2.16}
\]
where $R(a, k) \in \mathbb{C}^{(P+1)\times(P+1)}$ and,
\[
R(a, k) = 2a\, Q_1(k) - 4\, Q_2(k). \tag{2.17}
\]
Assuming R(a, k) is diagonalizable,
\[
R(a, k) = W(a, k)\, \Gamma(a, k)\, W^{-1}(a, k), \tag{2.18}
\]
where Γ is the diagonal matrix of eigenvalues $\gamma_p(a, k) \in \mathbb{C}$ for p = 1, 2, . . . , P + 1
and $W \in \mathbb{C}^{(P+1)\times(P+1)}$ is the dense matrix containing eigenvectors of the differentiation
operator. The initial condition can also be expanded in the basis of the eigenvectors,
\[
\mathbf{w}_0(k) = W(a, k)\, \boldsymbol{\beta}(a, k), \tag{2.19}
\]
where $\beta_p(a, k) \in \mathbb{C}$ is the expansion coefficient along the pth column of W, $\mathbf{w}_p(a, k)$,
for p = 1, 2, . . . , P + 1. The solution to Eq. (2.5) can now be written as
\[
\begin{aligned}
\mathbf{u}^\delta_n(t) &= \exp(-tR)\, \mathbf{u}^\delta_n(0) \\
&= W \exp(-t\Gamma)\, W^{-1} \exp(ikx_n)\, W \boldsymbol{\beta} \\
&= \exp(ikx_n)\, W \exp(-t\Gamma)\, \boldsymbol{\beta} \\
&= \exp(ikx_n) \sum_{p=1}^{P+1} \exp(-\gamma_p t)\, \beta_p \mathbf{w}_p.
\end{aligned} \tag{2.20}
\]
This shows that the numerical solution is a superposition of P + 1 eigenmodes along the
eigenvectors of R weighted by the expansion coefficients, $\beta_p$, of the initial condition.
The time evolution of these modes is dictated by the eigenvalues $\gamma_p$.
The analytical solution to the exact equation, Eq. (2.2), on the vector of solution
points $\mathbf{x}_n = x_n + \tfrac{1}{2}(1 + \boldsymbol{\xi})$ can be found by sampling the functional form
$\exp(ikx) \exp(-(aik + k^2)t)$,
\[
\mathbf{u}_n(t) = \exp(ikx_n) \exp(-(aik + k^2)t) \sum_{p=1}^{P+1} \beta_p \mathbf{w}_p. \tag{2.21}
\]
2.1.4 Stability, dissipation, dispersion and modal weights
The stability, dissipation and dispersion properties of the numerical scheme can be
determined directly from the eigenvalues and eigenvectors of R. The condition for
numerical stability can be written as
\[
\gamma^{Re}_p(a, k) \ge 0 \quad \text{for } p = 1, 2, \ldots, P + 1, \tag{2.22}
\]
which must hold for k ∈ [0, (P + 1)π] where the upper bound corresponds to the
Nyquist limit for the mesh. The criteria for numerical dissipation and dispersion can
be determined from the equation for absolute error of the numerical solution,
\[
\begin{aligned}
\mathbf{e}_n(t) &= \mathbf{u}^\delta_n(t) - \mathbf{u}_n(t) \\
&= \exp(ikx_n) \exp(-(aik + k^2)t) \sum_{p=1}^{P+1} \left[ \exp(-(\gamma_p - (aik + k^2))t) - 1 \right] \beta_p \mathbf{w}_p.
\end{aligned} \tag{2.23}
\]
For the initial condition of an isolated Fourier component, each eigenmode contributes
a certain amount of numerical dissipation and dispersion to the numerical solution.
The numerical dissipation in an eigenmode is characterized by $\gamma^{Re}_p(a, k) - k^2 > 0$ and
leads to a mode which decays in amplitude more rapidly than the prescribed decay
rate of physical diffusion. An eigenmode can also exhibit numerical anti-dissipation
if $0 \le \gamma^{Re}_p(a, k) < k^2$. In this case, the mode decays in amplitude at a slower rate.
Numerical dispersion is present in an eigenmode if $\gamma^{Im}_p(a, k) - ak \ne 0$. This causes
the mode to propagate at an incorrect speed.
Since all P +1 eigenmodes are present in the numerical solution, it’s important to
determine how each mode contributes quantitatively through the modal weights, βp.
This quantity can best be described as a distribution of energy among all eigenmodes
[9, 8] and is important in determining whether the solution will decay or propagate
at an incorrect speed.
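These criteria can be illustrated on the simplest possible case. The sketch below assumes P = 0 (one solution point per element), which is not a case analyzed in this part but has a single eigenvalue that can be written in closed form; it also assumes the DG correction functions reduce to g_L = (1 − ξ)/2 and g_R = (1 + ξ)/2, so that D = 0 and both extrapolation vectors equal 1. The upwinding coefficients follow Table 2.1, and the function names are hypothetical.

```python
import cmath
import math

# Upwinding coefficients (alpha11, alpha21, alpha22), following Table 2.1.
SCHEMES = {
    "centered-centered": (0.5, 0.5, 0.5),
    "upwind-centered": (0.0, 0.5, 0.5),
    "centered-one-sided": (0.5, 1.0, 0.0),
    "upwind-one-sided": (0.0, 1.0, 0.0),
}


def symbol_p0(alpha, k):
    """P = 0 value of sum_j C_j exp(i k j), with g_L' = -1/2 and g_R' = +1/2 (assumed)."""
    return 0.5 * (-(1.0 - alpha) * cmath.exp(-1j * k)
                  + (1.0 - 2.0 * alpha)
                  + alpha * cmath.exp(1j * k))


def eigenvalue_p0(scheme, a, k):
    """Single eigenvalue of R(a, k) = 2 a Q1(k) - 4 Q2(k), Eq. (2.17), for P = 0."""
    a11, a21, a22 = SCHEMES[scheme]
    return 2.0 * a * symbol_p0(a11, k) - 4.0 * symbol_p0(a22, k) * symbol_p0(a21, k)


def classify(scheme, a, k, tol=1e-12):
    """Return (label, gamma_Re - k^2, gamma_Im - a k) for the P = 0 eigenmode."""
    gamma = eigenvalue_p0(scheme, a, k)
    dissipation_error = gamma.real - k * k
    dispersion_error = gamma.imag - a * k
    if gamma.real < -tol:
        label = "unstable"
    elif dissipation_error > tol:
        label = "dissipative"
    elif dissipation_error < -tol:
        label = "anti-dissipative"
    else:
        label = "exact decay"
    return label, dissipation_error, dispersion_error


label_cc, _, _ = classify("centered-centered", a=10.0, k=1.0)
label_uo, _, _ = classify("upwind-one-sided", a=10.0, k=1.0)
```

In this reduced setting the centered-centered eigenvalue is sin²k + ia sin k, which is stable but anti-dissipative for k > 0, while the upwind-one-sided eigenvalue has real part (a + 2)(1 − cos k), which exceeds k² at a = 10 and k = 1.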
2.1.5 Dissipation and dispersion error of nodal DG for
advection-diffusion
The FR formulation can be used to recover various high-order schemes. In this study,
we will focus our analysis on the nodal DG scheme on Gauss-Legendre points with
different interface flux formulations. Table 2.1 defines the set of upwinding coefficients
used to obtain common interface values for four different types of interface flux for-
mulations. Since the nodal DG correction functions are used, the flux formulations
for diffusion are able to recover two common schemes within the DG community:
BR1 introduced by Bassi and Rebay [14] and local Discontinuous Galerkin (LDG)
introduced by Cockburn and Shu [22]. For simplicity, the penalty term often used to
stabilize the schemes [35] will be omitted so that LDG becomes minimal dissipation
LDG [19].
Scheme               α(1,1)   α(2,1)   α(2,2)   Common name for DG scheme
Centered-centered    0.5      0.5      0.5      centered advection - BR1 diffusion
Upwind-centered      0.0      0.5      0.5      upwinded advection - BR1 diffusion
Centered-one-sided   0.5      1.0      0.0      centered advection - LDG diffusion
Upwind-one-sided     0.0      1.0      0.0      upwinded advection - LDG diffusion
Table 2.1: Upwinding coefficients and common names for DG schemes with different
interface flux formulations.
The numerical dissipation and dispersion properties for the solution of advection-
diffusion problems are dependent on the non-dimensional parameter a. For most results
in this part of the dissertation, a non-dimensional parameter of a = 10 is chosen for
illustration. Figures 2.1, 2.2 and 2.3 show the dissipation, dispersion and modal
weight of each eigenmode for the centered-centered and upwind-one-sided schemes of
polynomial order, P = 2. We see that for one of the modes, designated mode
1, the dissipation and dispersion errors vanish in the asymptotic limit of k → 0,
and the squared modal weight |β1|^2 → P + 1. This mode is defined as the physical
mode and its existence and uniqueness have been proven for advection problems
[8]. Regarding the effect of interface fluxes, we see that both schemes have little
dispersion or dissipation for low wavenumbers. For higher wavenumbers, the physical
mode of the upwind-one-sided scheme remains dominant and the scheme is more
dissipative. On the other hand, the physical mode of the centered-centered scheme
loses its dominance as can be seen from Figure 2.3. Moreover, close to the Nyquist
limit, all the modes for this scheme are anti-dissipative.
Figure 2.1: Numerical dissipation in each eigenmode p = 1, 2, 3 for the DG scheme
of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion
equation, a = 10. Panels: (a) centered-centered scheme, α(1,1) = 0.5, α(2,1) = 0.5,
α(2,2) = 0.5; (b) upwind-one-sided scheme, α(1,1) = 0.0, α(2,1) = 1.0, α(2,2) = 0.0.
Axes: γRe_p − k^2 vs. k/(P + 1), with curves for Mode 1, Mode 2, Mode 3, Exact and
Stability Limit.
Figure 2.2: Numerical dispersion in each eigenmode p = 1, 2, 3 for the DG scheme
of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion
equation, a = 10. Panels: (a) centered-centered scheme, α(1,1) = 0.5, α(2,1) = 0.5,
α(2,2) = 0.5; (b) upwind-one-sided scheme, α(1,1) = 0.0, α(2,1) = 1.0, α(2,2) = 0.0.
Axes: γIm_p − ak vs. k/(P + 1), with curves for Mode 1, Mode 2, Mode 3 and Exact.
Figure 2.3: Energy distributed among each eigenmode p = 1, 2, 3 for the DG scheme
of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion
equation, a = 10. Panels: (a) centered-centered scheme, α(1,1) = 0.5, α(2,1) = 0.5,
α(2,2) = 0.5; (b) upwind-one-sided scheme, α(1,1) = 0.0, α(2,1) = 1.0, α(2,2) = 0.0.
Axes: |βp|^2 vs. k/(P + 1), with curves for Mode 1, Mode 2 and Mode 3.
Even though the chosen flux is a linear function of the state, the state itself
is sinusoidal and cannot be expressed in any finite-dimensional polynomial basis.
Correspondingly, the location of solution points has a direct impact on the distribution
of the initial condition among the P + 1 eigenmodes. This can be observed from
Figure 2.4 which plots the distribution of energy among the 3 eigenmodes for the
centered-centered DG scheme with P = 2, advection-diffusion parameter a = 10,
using Gauss-Legendre, equidistant and Gauss-Lobatto points respectively.
Figure 2.4: Energy distributed among each eigenmode p = 1, 2, 3 for the centered-
centered, DG scheme of order P = 2 solving the advection-diffusion equation, a = 10.
Panels: (a) Gauss-Legendre; (b) Equidistant; (c) Gauss-Lobatto. Axes: |βp|^2 vs.
k/(P + 1), with curves for Mode 1, Mode 2 and Mode 3.
2.2 Spectral Comparison
This section provides an investigation into the relative error generated by the nodal
DG scheme with different interface flux formulations for the full range of resolvable
wavenumbers. A resolving efficiency is defined in order to measure the fraction of
resolved waves that propagate with minimal error. We see that the centered-centered
scheme produces the least amount of relative error for a well resolved solution while
the centered-one-sided scheme leads to the least amount of relative error for solutions
that are under-resolved.
2.2.1 Relative error and resolving efficiency
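A measure of relative error can be derived by using the vector ℓ2 norm of absolute
error. The use of a pointwise error is motivated by the nodal nature of the FR
formulation and is different from the L2 functional norm that measures integrated
error over a fixed domain. Consider the vector ℓ2 norm of the absolute error using
Eq. (2.23),
\[
\|\mathbf{e}_n(t)\|_{\ell_2} = \|\mathbf{u}^\delta_n(t) - \mathbf{u}_n(t)\|_{\ell_2}
= \exp(-k^2 t) \left\| \sum_{p=1}^{P+1} \left[ \exp(-(\gamma_p - (aik + k^2))t) - 1 \right] \beta_p \mathbf{w}_p \right\|_{\ell_2}, \tag{2.24}
\]
where we have used that | exp(ζ)| = 1 if ζ is purely imaginary. Similarly, the norm
of the analytical solution to the exact equation, Eq. (2.21), is given by
\[
\begin{aligned}
\|\mathbf{u}_n(t)\|_{\ell_2} &= \exp(-k^2 t)\, \|\mathbf{w}_0\|_{\ell_2} \\
&= \exp(-k^2 t) \sqrt{ \sum_{p=1}^{P+1} \left| \exp\left( ik\, \tfrac{1}{2}(1 + \xi_p) \right) \right|^2 } \\
&= \exp(-k^2 t) \sqrt{P + 1}.
\end{aligned} \tag{2.25}
\]
A relative error can then be defined by taking the ratio of Eq. (2.24) to Eq. (2.25),
\[
\frac{\|\mathbf{e}_n(t)\|_{\ell_2}}{\|\mathbf{u}_n(t)\|_{\ell_2}} = \frac{1}{\sqrt{P + 1}} \left\| \sum_{p=1}^{P+1} \left[ \exp(-(\gamma_p - (aik + k^2))t) - 1 \right] \beta_p \mathbf{w}_p \right\|_{\ell_2}. \tag{2.26}
\]
The triangle inequality can be used to eliminate the dependency on the eigenvectors,
\[
\begin{aligned}
\frac{\|\mathbf{e}_n(t)\|_{\ell_2}}{\|\mathbf{u}_n(t)\|_{\ell_2}} &\le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left\| \left[ \exp(-(\gamma_p - (aik + k^2))t) - 1 \right] \beta_p \mathbf{w}_p \right\|_{\ell_2} \\
&= \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \exp(-(\gamma_p - (aik + k^2))t) - 1 \right| |\beta_p|,
\end{aligned} \tag{2.27}
\]
where the last step takes the eigenvectors to be normalized so that $\|\mathbf{w}_p\|_{\ell_2} = 1$.
Then in the limit of t → 0,
\[
\lim_{t \to 0} \frac{\|\mathbf{e}_n(t)\|_{\ell_2}}{\|\mathbf{u}_n(t)\|_{\ell_2}} \le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \gamma_p - (aik + k^2) \right| |\beta_p|\, t + O(t^2), \tag{2.28}
\]
which gives an upper bound on the initial relative error. In other words, the initial
slope of the relative error can be written as
\[
\lim_{t \to 0} \frac{1}{t} \frac{\|\mathbf{e}_n(t)\|_{\ell_2}}{\|\mathbf{u}_n(t)\|_{\ell_2}} \le \frac{1}{\sqrt{P + 1}} \sum_{p=1}^{P+1} \left| \gamma_p - (aik + k^2) \right| |\beta_p| = \phi(a, k), \tag{2.29}
\]
which shows that the initial growth of error is determined by the products of eigenvalue
errors and modal weights of the system. Based on [9, 54], a resolving efficiency
can now be defined as
\[
\eta = \frac{k_f}{(P + 1)\pi}, \tag{2.30}
\]
where $k_f$ is such that
\[
\phi(a, k) \le \varepsilon \quad \text{for } k \le k_f. \tag{2.31}
\]
η measures the fraction of resolved waves that are propagated with minimal error.
The initial growth rate of error for these waves is guaranteed to be less than the
specified slope tolerance, ε.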
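The definition of the resolving efficiency translates into a simple scan over wavenumber. The sketch below is illustrative only: the scan resolution and function names are assumptions, and the example φ is the closed-form bound for an assumed P = 0 upwind-one-sided scheme (a single eigenmode with |β1| = 1), chosen so that no eigenvalue solver is needed. It does not reproduce the tabulated values for P ≥ 1.

```python
import cmath
import math


def phi_p0_upwind_one_sided(a, k):
    """phi(a, k) of Eq. (2.29) for an assumed P = 0 upwind-one-sided scheme."""
    # Eigenvalue: first-order upwind advection plus three-point diffusion.
    gamma = a * (1.0 - cmath.exp(-1j * k)) + 2.0 * (1.0 - math.cos(k))
    # With P = 0 the modal weight has unit modulus and sqrt(P + 1) = 1.
    return abs(gamma - (1j * a * k + k * k))


def resolving_efficiency(phi, a, P, eps, samples=20000):
    """eta = k_f / ((P + 1) pi), Eq. (2.30), with k_f found by scanning up from k = 0."""
    k_max = (P + 1) * math.pi          # Nyquist limit for the mesh
    k_f = 0.0
    for i in range(1, samples + 1):
        k = k_max * i / samples
        if phi(a, k) > eps:            # first violation of Eq. (2.31) ends the scan
            break
        k_f = k
    return k_f / k_max


eta = resolving_efficiency(phi_p0_upwind_one_sided, a=10.0, P=0, eps=0.1)
```

Stopping at the first violation enforces that the tolerance holds for every sampled wavenumber up to k_f, as Eq. (2.31) requires, rather than only at isolated wavenumbers.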
2.2.2 Spectral comparison of nodal DG with different interface fluxes
Figure 2.5 plots the initial slope of the relative error, Eq. (2.29), for the four types
of interface flux formulations in Table 2.1 for an advection-diffusion problem with
a = 10. We see that for low wavenumbers or well resolved waves, the centered-
centered scheme produces the least amount of error. For larger wavenumbers up to
k/(P + 1) ≈ 2, the centered-one-sided scheme generates the least amount of error.
Figure 2.5: Upper bound on initial slope of the relative error vs. nondimensional
wavenumber for the DG scheme of order P = 2 using Gauss-Legendre solution points
solving the advection-diffusion equation, a = 10. Panels: (a) log scale; (b) linear
scale. Axes: φ vs. k/(P + 1), with curves for the centered-centered, upwind-centered,
centered-one-sided and upwind-one-sided schemes.
Table 2.2 lists the resolving efficiencies for different interface flux formulations and
polynomial orders. A slope tolerance of ε = 0.1 is used to compare the ability of a
scheme to capture well resolved waves. We see that the centered-centered scheme
produces a better resolving efficiency for even or high polynomial orders while the
centered-one-sided scheme produces a higher resolving efficiency for odd polynomials
of low order. Interestingly, the commonly used upwind-one-sided scheme is not the
best for any polynomial order.
P   Centered-centered   Upwind-centered   Centered-one-sided   Upwind-one-sided
1   0.0296              0.0297            0.0469               0.0464
2   0.0887              0.0650            0.0684               0.0507
3   0.0872              0.0842            0.0932               0.0884
4   0.1145              0.1120            0.1091               0.1034
5   0.1482              0.1350            0.1245               0.1211
Table 2.2: Resolving efficiencies, ε = 0.1, for the DG scheme using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10.
Chapter 3
CFL Restrictions and Time Step Estimates
This chapter is split into two sections. In the first section, the fully discrete equation
and its stability criteria are utilized to compare the CFL restrictions of different
schemes and equations. In the second section, a conservative prediction method for
the maximum physical time step is proposed and extended to multidimensional tensor
product elements for the Navier-Stokes equations.
3.1 Time Integration and CFL Restrictions
This section discusses the CFL restriction of nodal DG with different interface flux
formulations for the advection, diffusion and advection-diffusion equations. Towards
this end, the fully discrete advection-diffusion equation is constructed for a general ex-
plicit, M-stage Runge-Kutta (RK) type scheme and the stability criterion is imposed
to solve for the maximum nondimensional time step of the discrete linear dynamical
system. We show that the coupling of advection and diffusion leads to a stricter CFL
limit compared to pure-advection and pure-diffusion. We also show that the centered-
centered scheme has the least restrictive CFL limit with a limit around 5 times larger
than the CFL limit for the upwind-one-sided scheme for certain wavenumbers.
3.1.1 Multi-stage explicit time integration scheme
The fully discrete numerical update equation can be obtained from Eq. (2.16)
\[
\frac{\delta u^{\delta}_n}{\delta t} = -R(a, k)\, u^{\delta}_n, \quad \forall n,\; t > 0.
\tag{3.1}
\]
In the case of a general explicit, M-stage Runge-Kutta (RK) time integration scheme,
the numerical solution can be written as
\[
u^{\delta}_n(t + \Delta t) = S(a, k, \Delta t)\, u^{\delta}_n(t), \quad \forall n,\; t > 0,
\tag{3.2}
\]
where ∆t is the numerical time step and S(a, k,∆t) is the numerical integration
operator,
\[
S(a, k, \Delta t) = \sum_{m=0}^{M} \frac{(-1)^m \nu_m}{m!}\, \Delta t^m R(a, k)^m,
\tag{3.3}
\]
where νm for m = 0, 1, . . . ,M are coefficients of the RK scheme and M is the number
of stages.
3.1.2 Stability Criteria
The stability criterion of the discrete linear dynamical system in Eq. (3.2) can be
written as
\[
\rho(S(a, k, \Delta t)) \le 1 \quad \forall k \in [0, (P+1)\pi],
\tag{3.4}
\]
where ρ is the spectral radius. The computation of the spectral radius requires the
eigenvalues of S(a, k,∆t) which can be found by using the eigendecomposition from
Eq. (2.18) as
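\[
\begin{aligned}
S(a, k, \Delta t) &= W(a, k) \left[ \sum_{m=0}^{M} \frac{(-1)^m \nu_m}{m!}\, \Delta t^m \Gamma(a, k)^m \right] W^{-1}(a, k), \\
&= W(a, k)\, \Lambda(a, k, \Delta t)\, W^{-1}(a, k),
\end{aligned}
\tag{3.5}
\]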
where Λ is the diagonal matrix of eigenvalues λp(a, k,∆t) ∈ C for p = 1, 2, . . . , P + 1.
The maximum time step can then be obtained as the solution to
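\[
\begin{aligned}
&\text{maximize} && \Delta t \\
&\text{subject to} && \max_{p = 1, \ldots, P+1} |\lambda_p(a, k, \Delta t)| \le 1, \quad \forall k \in [0, (P+1)\pi],
\end{aligned}
\tag{3.6}
\]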
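for a given time integration scheme and nondimensional wavespeed a. To find the
stability criterion in physical units, the nondimensional time step is transformed back
into a physical time step using the transformation introduced at the beginning of Chapter 2,
\[
\Delta t \le \Delta t_{\max} = \overline{\Delta t}_{\max} \frac{h^2}{b},
\tag{3.7}
\]
where the overbar marks the nondimensional maximum time step obtained from Eq. (3.6).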
In the sections to follow, we use a standard 4 stage, 4th order Runge-Kutta (RK44)
scheme for illustration. A bisection method is used to find ∆tmax.
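The procedure of Eqs. (3.5)-(3.6) can be sketched in Python as follows. This is a minimal illustration rather than the solver used for the results: it assumes the eigenvalues γp of R are already available for the wavenumbers of interest, uses νm = 1 for all m (which recovers the classical RK44 stability polynomial), and assumes the scheme is stable for every time step below the value returned by the bisection. The function names and the bracket `dt_hi` are hypothetical.

```python
def rk_amplification(gamma, dt, nu=(1.0, 1.0, 1.0, 1.0, 1.0)):
    """Eigenvalue lambda of S for an eigenvalue gamma of R, as in Eq. (3.5).

    nu holds nu_0..nu_M; all ones corresponds to classical RK44.
    """
    lam = 0.0 + 0.0j
    term = 1.0 + 0.0j
    for m, nu_m in enumerate(nu):
        if m > 0:
            term *= -dt * gamma / m      # accumulates (-1)^m (dt*gamma)^m / m!
        lam += nu_m * term
    return lam

def max_stable_dt(gammas, dt_hi=10.0, tol=1e-10):
    """Bisection for the largest dt with max_p |lambda_p| <= 1, Eq. (3.6).

    gammas: eigenvalues of R collected over all sampled wavenumbers.
    Assumes dt_hi is unstable and that stability holds below the result.
    """
    def stable(dt):
        return max(abs(rk_amplification(g, dt)) for g in gammas) <= 1.0 + 1e-12

    dt_lo = 0.0
    while dt_hi - dt_lo > tol:
        mid = 0.5 * (dt_lo + dt_hi)
        if stable(mid):
            dt_lo = mid
        else:
            dt_hi = mid
    return dt_lo

# Scalar checks: a purely real and a purely imaginary eigenvalue.
print(max_stable_dt([1.0]))   # real-axis limit of RK44, about 2.785
print(max_stable_dt([1j]))    # imaginary-axis limit of RK44, about 2.828
```

In an actual computation the list `gammas` would be assembled from the eigenvalues of R(a, k) over a fine sampling of k in [0, (P + 1)π].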
3.1.3 CFL restrictions on advection-diffusion
The maximum physical time step, ∆tmax, is used to compare the CFL restrictions
for all wavenumbers for advection, diffusion and advection-diffusion problems. Figure
3.1 plots the maximum physical time step for different equations using P = 2 and
P = 3. The figure shows that the coupling of advection and diffusion decreases the
maximum physical time step for all wavenumbers leading to a stricter CFL limit.
Hence, choosing the minimum of the maximum stable time steps for the advection
equation and diffusion equation is inadequate when choosing a time step for the
advection-diffusion equation.
Figure 3.1: Maximum physical time step, ∆tmax, vs. nondimensional wavenumber, k/(P + 1), for the RK44, DG, centered-centered scheme using Gauss-Legendre solution points. Panels: (a) P = 2; (b) P = 3. Curves: advection, a = 10; diffusion, b = 1; advection-diffusion, a = 10.

3.1.4 CFL restrictions of nodal DG with different interface fluxes
Figure 3.2 plots the maximum nondimensional time step for the four types of interface
flux formulations in Table 2.1 using P = 2 and P = 3 for an advection-diffusion problem
with a = 10. We see that the centered-centered scheme has the least restrictive
CFL limit while the upwind-one-sided scheme has the most restrictive CFL limit. In
fact, the CFL limit of the centered-centered scheme is larger than the upwind-one-sided
scheme for all wavenumbers and around 5 times larger for certain wavenumbers.

Figure 3.2: Maximum nondimensional time step, ∆tmax, vs. nondimensional wavenumber, k/(P + 1), for the RK44, DG scheme using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10. Panels: (a) P = 2; (b) P = 3. Curves: centered-centered, upwind-centered, centered-one-sided, upwind-one-sided.
3.2 Time Step Estimates
This section begins by showing that the minimum of the maximum stable time steps
for pure-advection and pure-diffusion is inadequate when choosing a maximum stable
time step for the advection-diffusion equation. However, a conservative prediction
can be obtained from the harmonic sum of the maximum stable time steps for the
advection and diffusion equations. We propose extensions to multidimensional tensor
product elements in structured as well as unstructured grids and to the Navier-Stokes
equations.
3.2.1 Conservative prediction method
The stability criterion in Eq. (3.7) gives an exact value for the maximum physical
time step for an advection-diffusion problem but requires the knowledge of a multi-
variable, nonlinear function ∆tmax which changes depending on the time integration
scheme, choice of correction function, common interface value, polynomial order and
nondimensional parameter $\bar{a} = ah/b$. Several prediction methods are often used in
an effort to reduce the dependency of $\overline{\Delta t}_{\max}$ on $\bar{a}$, with the hope that the method is
conservative. One method is to use the minimum of the maximum stable time steps
for pure-advection and pure-diffusion,
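\[
\Delta t \le \min\left( \Delta t^{adv}_{\max} \frac{h}{a},\; \Delta t^{diff}_{\max} \frac{h^2}{b} \right)
\quad \text{or} \quad
\overline{\Delta t}_{\max} = \min\left( \Delta t^{adv}_{\max} \frac{1}{\bar{a}},\; \Delta t^{diff}_{\max} \right),
\tag{3.8}
\]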
where ∆tadvmax and ∆tdiffmax are the maximum nondimensional time steps for the ad-
vection and diffusion equations, respectively. These values can be determined by
removing the diffusion or advection term in the fully discrete equation and solving
the optimization problem in Eq. (3.6). Table 3.1 provides these values for a variety
of common interface values and polynomial orders.
     $\Delta t^{adv}_{\max}$          $\Delta t^{diff}_{\max}$
P    Centered   Upwind     Centered   One-sided
1    0.707      0.464      0.174      0.0773
2    0.349      0.235      0.0426     0.0187
3    0.213      0.145      0.0158     0.00634
4    0.143      0.100      0.00719    0.00266
5    0.103      0.0736     0.00373    0.00129
Table 3.1: Maximum nondimensional time steps for the RK44, DG scheme using Gauss-Legendre points.

A more conservative method that is used for the Navier-Stokes equation [35, 60, 59]
is the harmonic sum of the maximum stable time steps for the advection equation
and the diffusion equation,
\[
\Delta t \le \frac{1}{\dfrac{a}{\Delta t^{adv}_{\max}\, h} + \dfrac{b}{\Delta t^{diff}_{\max}\, h^2}}
\quad \text{or} \quad
\overline{\Delta t}_{\max} = \frac{1}{\dfrac{\bar{a}}{\Delta t^{adv}_{\max}} + \dfrac{1}{\Delta t^{diff}_{\max}}}.
\tag{3.9}
\]
Figure 3.3: Maximum nondimensional time step estimates vs. nondimensional wavespeed, $\bar{a} = ah/b$, for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points. Panels: (a) centered-centered scheme, α(1,1) = 0.5, α(2,1) = 0.5, α(2,2) = 0.5; (b) upwind-one-sided scheme, α(1,1) = 0.0, α(2,1) = 1.0, α(2,2) = 0.0. Curves: minimum, harmonic, exact.

Figure 3.3 shows how these methods compare with the exact maximum nondimensional
time step for P = 2. The estimate in Eq. (3.8) is at or above the maximum
allowable time step for all values of a, clearly showing that it fails as a conservative
prediction method. On the other hand, the harmonic sum gives conservative predictions
for the full range of advection-diffusion problems. Moreover, in the case of the
upwind-one-sided scheme, a prediction based on the harmonic sum almost exactly
matches the exact CFL limit. These observations hold for other choices of common
interface values and polynomial orders as well, showing that Eq. (3.9) can be used as
a conservative estimate for the CFL restriction. Figure 3.3 also shows that a ≈ 10
is near a region where the coupling of advection and diffusion has a dramatic effect
on the CFL limit of the scheme. In the case of the centered-centered scheme, the
harmonic sum is still conservative but loses accuracy near a ≈ 10.
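The two estimates in Eqs. (3.8) and (3.9) are simple enough to compare directly. The Python sketch below does so for the centered-centered scheme at P = 2, taking the nondimensional limits from Table 3.1; the function names and the choice a = 10, b = 1, h = 1 are illustrative assumptions.

```python
def dt_minimum_estimate(a, b, h, dt_adv, dt_diff):
    """Eq. (3.8): minimum of the pure-advection and pure-diffusion limits."""
    return min(dt_adv * h / a, dt_diff * h ** 2 / b)

def dt_harmonic_estimate(a, b, h, dt_adv, dt_diff):
    """Eq. (3.9): harmonic sum of the pure-advection and pure-diffusion limits."""
    return 1.0 / (a / (dt_adv * h) + b / (dt_diff * h ** 2))

# Table 3.1, P = 2: centered advective flux, centered diffusive flux.
dt_adv, dt_diff = 0.349, 0.0426
a, b, h = 10.0, 1.0, 1.0    # illustrative values giving ah/b = 10

print(dt_minimum_estimate(a, b, h, dt_adv, dt_diff))    # about 0.0349
print(dt_harmonic_estimate(a, b, h, dt_adv, dt_diff))   # about 0.0192
```

For positive inputs the harmonic sum is always smaller than either individual limit, which is why it is the more conservative of the two estimates.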
3.2.2 Extension to tensor product elements
The 1D FR formulation for advection-diffusion can be extended to quadrilateral and
hexahedral elements as discussed in [41, 42, 15]. This section proposes a method
for extending the time step estimate from Eq. (3.9) to non-curved tensor product
elements. For simplicity, the discussion focuses on rectilinear and quadrilateral meshes
but the methodology can be easily extended to hexahedral elements.
Consider a multidimensional linear advection-diffusion problem,
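\[
\frac{\partial u}{\partial t} + \sum_{d=1}^{N_d} a_d \frac{\partial u}{\partial x_d}
= b \sum_{d=1}^{N_d} \frac{\partial^2 u}{\partial x_d^2}, \quad x_d \in \mathbb{R},\; t > 0,
\]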
where Nd is the number of spatial dimensions, ad is the advection coefficient in the
dth dimension and the domain is partitioned into Neles non-overlapping, conforming
tensor product elements, $\Omega = \bigcup_{n=1}^{N_{eles}} \Omega_n$.
In the special case of a rectangular element, the element domain is defined by $\Omega_n =
[x_{d,n}, x_{d,n+1})$ for $d = 1, 2, \ldots, N_d$. Each element is further discretized into $(P+1)^{N_d}$
distinct solution points where each point belongs to $N_d$ sets of $P + 1$ collinear points,
allowing for a simple extension of the one-dimensional case to multiple dimensions.
The rectangular element size is defined by $N_d$ side lengths of $h_d = x_{d,n+1} - x_{d,n}$ for
d = 1, 2, . . . , Nd and a linear isoparametric mapping is again used on the physical
domain. The fully discrete numerical equation can now be constructed by following
the same procedure as before,
\[
\frac{\delta u^{\delta}_n}{\delta t} = -R\, u^{\delta}_n, \quad \forall n,\; t > 0,
\tag{3.10}
\]
where R is defined as,
\[
R(a_1, \ldots, a_{N_d}, b, k_1, \ldots, k_{N_d}, h_1, \ldots, h_{N_d})
= \sum_{d=1}^{N_d} \left[ \frac{2 a_d}{h_d} Q_{1d}(k_d h_d) - \frac{4b}{h_d^2} Q_{2d}(k_d h_d) \right],
\tag{3.11}
\]
and the equation can again be integrated using a general explicit, M-stage RK scheme
to produce a stability criterion similar to Eq. (3.4). The conservative prediction method
for ∆t from Eq. (3.9) can now be extended, in a similar manner, to estimate the time
step of the problem,
\[
\Delta t \le \frac{1}{\displaystyle \sum_{d=1}^{N_d} \left[ \frac{a_d}{\Delta t^{adv}_{\max}\, h_d} + \frac{b}{\Delta t^{diff}_{\max}\, h_d^2} \right]}.
\tag{3.12}
\]
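A minimal Python sketch of Eq. (3.12) is given below. The function name is hypothetical, and taking the absolute value of each advection coefficient is a defensive assumption added here so that negative speeds do not enlarge the estimate; the element sizes and speeds in the usage lines are illustrative only.

```python
def dt_tensor_product_estimate(a, h, b, dt_adv, dt_diff):
    """Eq. (3.12): harmonic sum over the spatial dimensions.

    a, h: sequences of length N_d holding a_d and h_d.
    abs(a_d) is used as a defensive assumption (not stated in Eq. (3.12)).
    """
    total = 0.0
    for a_d, h_d in zip(a, h):
        total += abs(a_d) / (dt_adv * h_d) + b / (dt_diff * h_d ** 2)
    return 1.0 / total

# Illustrative 2D rectangular element, limits from Table 3.1 (P = 2, centered).
print(dt_tensor_product_estimate(a=[10.0, 10.0], h=[0.2, 0.1], b=1.0,
                                 dt_adv=0.349, dt_diff=0.0426))
```

With a single dimension the function reduces to the harmonic estimate of Eq. (3.9).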
In the case of a non-rectangular element, we can construct a maximum time
step estimate as the harmonic sum of estimates for each dimension. Each edge of the
quadrilateral element contains P +1 distinct flux points for a total of Nfpts = 4(P +1)
flux points. A time step estimate on each flux point can be computed as,
\[
\Delta t_m = \frac{1}{\dfrac{|a_m|}{\Delta t^{adv}_{\max}\, h_m} + \dfrac{b}{\Delta t^{diff}_{\max}\, h_m^2}}, \quad m = 1, 2, \ldots, N_{fpts},
\tag{3.13}
\]
where |am| is the absolute value of the advection coefficient normal to the face and
hm are distances between flux points defined in Figure 3.4, for example, h1 = h6 =
h1,6. The minimum time step is then computed on each face. From Figure 3.4, the
minimum time steps computed on the west, east, south and north faces are
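\[
\Delta t_W = \min(\Delta t_7, \Delta t_8), \quad \Delta t_E = \min(\Delta t_3, \Delta t_4),
\tag{3.14}
\]
\[
\Delta t_S = \min(\Delta t_1, \Delta t_2), \quad \Delta t_N = \min(\Delta t_5, \Delta t_6).
\tag{3.15}
\]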
Figure 3.4: A visual representation of a quadrilateral element for a polynomial order of P = 1. The flux points, numbered 1 to 8, are marked by red squares and west, east, south and north faces are represented by W, E, S, N, respectively. The distances between flux points, h1,6, h3,8, h2,5, h4,7 are used to estimate the maximum time step in the element.

Finally, from Eq. (3.9), the maximum time step estimate can be based on a harmonic
sum of the minimum time step for each dimension,
\[
\Delta t \le \frac{1}{\dfrac{1}{\min(\Delta t_W, \Delta t_E)} + \dfrac{1}{\min(\Delta t_S, \Delta t_N)}}.
\tag{3.16}
\]
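The steps in Eqs. (3.13)-(3.16) can be sketched in Python as follows. The data layout (a dictionary mapping face labels to lists of per-flux-point estimates) and the function names are assumptions made for illustration; the usage lines build a unit square element, for which the result should coincide with the rectangular estimate of Eq. (3.12).

```python
def flux_point_dt(a_normal, h, b, dt_adv, dt_diff):
    """Eq. (3.13): estimate at one flux point.

    a_normal: advection coefficient normal to the face
    h:        distance between opposing flux points (see Figure 3.4)
    """
    return 1.0 / (abs(a_normal) / (dt_adv * h) + b / (dt_diff * h ** 2))

def quad_dt_estimate(dt_faces):
    """Eqs. (3.14)-(3.16) for one quadrilateral element.

    dt_faces maps 'W', 'E', 'S', 'N' to the list of flux-point estimates
    on that face (hypothetical layout).
    """
    dt_w, dt_e, dt_s, dt_n = (min(dt_faces[face]) for face in "WESN")
    return 1.0 / (1.0 / min(dt_w, dt_e) + 1.0 / min(dt_s, dt_n))

# Unit square, P = 1 (two flux points per face), limits from Table 3.1 (P = 2
# values are reused here purely as sample numbers).
dt_pt = flux_point_dt(a_normal=10.0, h=1.0, b=1.0, dt_adv=0.349, dt_diff=0.0426)
faces = {face: [dt_pt, dt_pt] for face in "WESN"}
print(quad_dt_estimate(faces))
```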
3.2.3 Extension to Navier-Stokes equations
Consider the unsteady, two-dimensional, compressible Navier-Stokes equations in con-
servative form,
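\[
\frac{\partial U}{\partial t} + \frac{\partial F_{inv}}{\partial x_1} + \frac{\partial G_{inv}}{\partial x_2}
= \frac{\partial F_{visc}}{\partial x_1} + \frac{\partial G_{visc}}{\partial x_2},
\tag{3.17}
\]
where,
\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \quad
F_{inv} = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ (e + p) u \end{bmatrix}, \quad
G_{inv} = \begin{bmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ (e + p) v \end{bmatrix},
\]
\[
F_{visc} = \begin{bmatrix} 0 \\ \sigma_{11} \\ \sigma_{12} \\ u \sigma_{11} + v \sigma_{21} - q_1 \end{bmatrix}, \quad
G_{visc} = \begin{bmatrix} 0 \\ \sigma_{21} \\ \sigma_{22} \\ u \sigma_{12} + v \sigma_{22} - q_2 \end{bmatrix},
\tag{3.18}
\]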
where ρ is density, u, v are the velocity components in the x1, x2 directions, respec-
tively, and e is total energy per unit volume. The pressure is determined from the
equation of state,
\[
p = (\gamma - 1) \left( e - \frac{1}{2} \rho \left( u^2 + v^2 \right) \right),
\tag{3.19}
\]
where γ is the ratio of specific heats. For a Newtonian fluid, the viscous stresses are
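\[
\sigma_{ij} = \mu \left( \frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i} \right) - \frac{2}{3} \mu\, \delta_{ij} \frac{\partial u_k}{\partial x_k},
\tag{3.20}
\]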
and the heat fluxes are
\[
q_i = -k \frac{\partial T}{\partial x_i},
\tag{3.21}
\]
where k = Cpµ/Pr, T = p/(ρR), Pr is the Prandtl number, Cp is the specific heat at
constant pressure, R is the gas constant and µ is the dynamic viscosity.
As before, a time step estimate on each flux point can be computed as,
\[
\Delta t_m = \frac{1}{\dfrac{|\lambda_m|}{\Delta t^{adv}_{\max}\, h_m} + \dfrac{\nu_m}{\Delta t^{diff}_{\max}\, h_m^2}}, \quad m = 1, 2, \ldots, N_{fpts},
\tag{3.22}
\]
where λm and νm are the numerical wavespeeds and numerical diffusion coefficients
on each flux point, respectively. From [59], these can be estimated as,
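\[
|\lambda_m| = |V_m| + c_m, \quad
\nu_m = \max\left( \frac{\mu_m}{\rho_m}, \frac{\gamma \mu_m}{Pr\, \rho_m} \right), \quad m = 1, 2, \ldots, N_{fpts},
\tag{3.23}
\]
where $V_m$ is the velocity normal to the face and $c_m = \sqrt{\gamma p_m / \rho_m}$ is the speed of sound.
The maximum time step can then be estimated from Eq. (3.16).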
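For concreteness, Eqs. (3.22) and (3.23) can be sketched in Python as below. The function names are hypothetical, the default values γ = 1.4 and Pr = 0.72 are those used later for the Couette flow test, and the flow state in the usage lines is an arbitrary sample rather than data from this work.

```python
import math

def wavespeed_and_diffusion(rho, v_normal, p, mu, gamma=1.4, Pr=0.72):
    """Eq. (3.23): |lambda| = |V| + c and nu = max(mu/rho, gamma*mu/(Pr*rho))."""
    c = math.sqrt(gamma * p / rho)           # speed of sound
    lam = abs(v_normal) + c
    nu = max(mu / rho, gamma * mu / (Pr * rho))
    return lam, nu

def navier_stokes_flux_point_dt(rho, v_normal, p, mu, h, dt_adv, dt_diff,
                                gamma=1.4, Pr=0.72):
    """Eq. (3.22): harmonic-sum time step estimate at one flux point."""
    lam, nu = wavespeed_and_diffusion(rho, v_normal, p, mu, gamma, Pr)
    return 1.0 / (lam / (dt_adv * h) + nu / (dt_diff * h ** 2))

# Arbitrary sample state: unit density, unit sound speed, normal Mach 0.2.
lam, nu = wavespeed_and_diffusion(rho=1.0, v_normal=0.2, p=1.0 / 1.4, mu=0.01)
print(lam, nu)
print(navier_stokes_flux_point_dt(1.0, 0.2, 1.0 / 1.4, 0.01, h=0.1,
                                  dt_adv=0.235, dt_diff=0.0187))
```

The per-flux-point values would then be combined face by face as in Eqs. (3.14)-(3.16).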
Chapter 4
Numerical Experiments
In this chapter, we verify the results obtained in the previous chapters by solving
the advection-diffusion and Navier-Stokes equations on 1D uniform, 2D rectilinear
and 2D unstructured grids. The numerical tests for advection-diffusion confirm that
the centered-centered scheme generates the least amount of error for well resolved
solutions while the centered-one-sided scheme generates the least amount of error
for solutions that are under-resolved. For the Navier-Stokes equations, P = 2, the
upwind-centered scheme generates the least amount of error for well resolved solutions
while the upwind-one-sided scheme generates the least amount of error for solutions
that are under-resolved. These results match the expectations in Section 2.2. The
time step estimates established in Section 3.2 are also verified and the formula in Eq.
(3.12) is shown to be an effective prediction for the maximum stable time step for
both the advection-diffusion and Navier-Stokes equations on Cartesian grids. In the
case of unstructured grids, this time step estimate is not always conservative but it
is still fairly accurate. The estimates are shown to be accurate within 50% error on
all test cases and conservative on tests with Cartesian grids.
In the numerical tests that follow, the correction functions and solution point
locations are chosen to recover the nodal DG scheme on Gauss-Legendre points and
a standard 4 stage, 4th order RK scheme is used for time integration.
4.1 1D Advection-Diffusion of an approximate Gaussian
The first test case involves the solution of the 1D advection-diffusion equation with
an initial condition of an approximate Gaussian and periodic boundary conditions.
The domain is chosen as x ∈ [−10, 10] and Neles = 20 elements are used to construct a
uniform mesh with h = 1. An advection speed of a = 10 and a diffusion coefficient of
b = 1 are used in order to test cases with a nondimensional wavespeed of a = 10. All
cases have been verified to be stable using the maximum stable time step estimate
proposed in Eq. (3.9). We choose to use half of this estimate to minimize time
integration errors.
The initial condition and exact solution of an approximate Gaussian can be com-
puted from,
\[
u_{n,p}(t) = \sum_{j=-N_k}^{N_k} \theta_j \exp(-b k_j^2 t) \cos(k_j (x_{n,p} - a t)),
\tag{4.1}
\]
where θj is the jth spectral weight, kj = 2πj/L is the jth wavenumber associated with
the domain length, L = 20, and Nk is the number of waves used. Nk is chosen to be
the largest value under the constraint kNk ≤ (P + 1)π/h. The spectral weights are
defined as
\[
\theta_j = \frac{\exp(-(\sigma k_j)^2 / 2)}{\sqrt{2\pi}\, \sigma \displaystyle \sum_{s=-N_k}^{N_k} \exp(-(\sigma k_s)^2 / 2)}, \quad -N_k \le j \le N_k,
\tag{4.2}
\]
where σ is the standard deviation of the Gaussian. The standard deviation dictates
the width of the Gaussian and changes the spectral weight distribution across the
wave spectrum.
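Equations (4.1) and (4.2) can be evaluated with a few lines of Python. The sketch below is illustrative only: the function names are hypothetical, and the defaults L = 20, h = 1, P = 2, a = 10 and b = 1 are the values quoted for this test case. With these definitions the weights sum to 1/(√(2π)σ), so the peak of the initial condition is 1/8 for σ = 8/√(2π) and 1 for σ = 1/√(2π).

```python
import math

def gaussian_modes(sigma, L=20.0, h=1.0, P=2):
    """Wavenumbers k_j = 2*pi*j/L and spectral weights theta_j of Eq. (4.2)."""
    # Largest N_k satisfying 2*pi*N_k/L <= (P + 1)*pi/h.
    n_k = int(math.floor((P + 1) * L / (2.0 * h) + 1e-12))
    ks = [2.0 * math.pi * j / L for j in range(-n_k, n_k + 1)]
    raw = [math.exp(-0.5 * (sigma * k) ** 2) for k in ks]
    norm = math.sqrt(2.0 * math.pi) * sigma * sum(raw)
    return ks, [r / norm for r in raw]

def approximate_gaussian(x, t, sigma, a=10.0, b=1.0, **mode_args):
    """Exact solution of Eq. (4.1) at position x and time t."""
    ks, thetas = gaussian_modes(sigma, **mode_args)
    return sum(theta * math.exp(-b * k * k * t) * math.cos(k * (x - a * t))
               for k, theta in zip(ks, thetas))

sigma_wide = 8.0 / math.sqrt(2.0 * math.pi)
print(approximate_gaussian(0.0, 0.0, sigma_wide))   # peak of the initial condition
```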
The vector `2 norm of the relative error is computed as
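\[
\frac{\|u^{\delta}(t) - u(t)\|_{\ell_2}}{\|u(t)\|_{\ell_2}}
= \left( \frac{\displaystyle \sum_{n=1}^{N_{eles}} \sum_{p=1}^{P+1} \left| u^{\delta}_{n,p}(t) - u_{n,p}(t) \right|^2}{\displaystyle \sum_{n=1}^{N_{eles}} \sum_{p=1}^{P+1} \left| u_{n,p}(t) \right|^2} \right)^{1/2},
\tag{4.3}
\]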
where uδn,p(t) is the numerical solution in the nth element at the pth solution point.
We also compute the L2 functional norm of the relative error for comparison,
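\[
\frac{\|u^{\delta}(x, t) - u(x, t)\|_{L_2}}{\|u(x, t)\|_{L_2}}
= \left( \frac{\displaystyle \sum_{n=1}^{N_{eles}} \int_{\Omega_n} \left| u^{\delta}_n(x, t) - u_n(x, t) \right|^2 d\Omega}{\displaystyle \sum_{n=1}^{N_{eles}} \int_{\Omega_n} \left| u_n(x, t) \right|^2 d\Omega} \right)^{1/2},
\tag{4.4}
\]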
where the integral is approximated numerically using Gaussian quadrature with 12
quadrature points in each element.
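The pointwise norm of Eq. (4.3) can be sketched in Python as follows. The nested-list layout, indexed as [element][solution point], and the function name are assumptions for illustration.

```python
import math

def relative_l2_error(u_num, u_exact):
    """Eq. (4.3): relative error in the vector l2 norm.

    u_num, u_exact: nested sequences indexed as [element][solution point].
    """
    num = sum(abs(un - ue) ** 2
              for row_num, row_exact in zip(u_num, u_exact)
              for un, ue in zip(row_num, row_exact))
    den = sum(abs(ue) ** 2 for row_exact in u_exact for ue in row_exact)
    return math.sqrt(num / den)

# Two elements with two points each; a uniform 10% overshoot gives an error of 0.1.
exact = [[3.0, 4.0], [0.0, 0.0]]
numerical = [[3.3, 4.4], [0.0, 0.0]]
print(relative_l2_error(numerical, exact))
```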
We begin with the case of a well resolved Gaussian with $\sigma = 8/\sqrt{2\pi}$ and a solution
polynomial of P = 2. Figure 4.1, which plots the initial condition and the spectral
weight distribution of this Gaussian, shows that the initial condition is indeed well
resolved. Figure 4.2 plots the relative error for different interface fluxes. As expected
from the results in Figure 2.5, the centered-centered scheme produces less error.

Figure 4.1: High resolution approximate Gaussian, P = 2, $\sigma = 8/\sqrt{2\pi}$. Panels: (a) initial condition, u vs. x; (b) spectral weight distribution vs. k/(P + 1).

Figure 4.2: Relative error vs. time periods, t/T, for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a high resolution approximate Gaussian, $\sigma = 8/\sqrt{2\pi}$. Panels: (a) ℓ2 pointwise norm; (b) L2 functional norm. Curves: centered-centered, upwind-centered, centered-one-sided, upwind-one-sided.

For the next case, the standard deviation of the Gaussian is decreased to $\sigma =
1/\sqrt{2\pi}$. Figure 4.3 plots the initial condition and the spectral weight distribution for
this Gaussian showing that the initial condition is now poorly resolved. Figure 4.4
plots the relative error for different interface fluxes. Once again, as expected from the
results in Figure 2.5, the least amount of error is now exhibited by the centered-one-sided
scheme.

Figure 4.3: Low resolution approximate Gaussian, P = 2, $\sigma = 1/\sqrt{2\pi}$. Panels: (a) initial condition, u vs. x; (b) spectral weight distribution vs. k/(P + 1).

Figure 4.4: Relative error vs. time periods, t/T, for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a low resolution approximate Gaussian, $\sigma = 1/\sqrt{2\pi}$. Panels: (a) ℓ2 pointwise norm; (b) L2 functional norm. Curves: centered-centered, upwind-centered, centered-one-sided, upwind-one-sided.

4.2 2D Advection-Diffusion of an approximate Gaussian
We now consider the 2D advection-diffusion equation in a periodic box. The domain
is chosen to be square, x1 ∈ [−10, 10], x2 ∈ [−10, 10] and 2000 elements are used to
construct a 2D, nonuniform, rectilinear grid with a minimum element size of h1 = 0.2
and h2 = 0.1 near the center of the domain as shown in Figure 4.5. All cases have been
verified to be stable when using the maximum stable time step estimate proposed in
Eq. (3.12). We choose to use half of this estimate to minimize time integration errors.
The initial condition and exact solution can be computed from,
\[
u_{n,p}(t) = \sum_{j_2=-N_{k_2}}^{N_{k_2}} \sum_{j_1=-N_{k_1}}^{N_{k_1}} \theta_{j_1,j_2} \prod_{d=1}^{2} \exp(-b k_{d,j_d}^2 t) \cos(k_{d,j_d} (x_{d,n,p} - a_d t)),
\tag{4.5}
\]
where $\theta_{j_1,j_2}$ is defined as
\[
\theta_{j_1,j_2} = \frac{\exp(-\sigma^2 (k_{1,j_1}^2 + k_{2,j_2}^2) / 2)}{2\pi\sigma \displaystyle \sum_{s_2=-N_{k_2}}^{N_{k_2}} \sum_{s_1=-N_{k_1}}^{N_{k_1}} \exp(-\sigma^2 (k_{1,s_1}^2 + k_{2,s_2}^2) / 2)}, \quad \forall j_1, j_2.
\tag{4.6}
\]
The relative error is computed using Eq. (4.3).

Figure 4.5: Nonuniform rectilinear grid with 40 × 50 elements and a minimum element size of h1 = 0.2 and h2 = 0.1. Axes: x1 and x2, each from −10 to 10.

The first case uses a well resolved Gaussian with $\sigma = 8/\sqrt{2\pi}$ and a solution
polynomial of P = 2. Figure 4.6 plots the initial condition and relative error for
different interface fluxes. As in the 1D case, the centered-centered scheme produces
the least amount of error.
The standard deviation of the Gaussian is now decreased to $\sigma = 1/\sqrt{2\pi}$ so that
the initial condition is now under-resolved. Figure 4.7 plots the initial condition and
relative error for different interface fluxes. Since the grid is nonuniform, the Gaussian
travels through a refined region during the first quarter period leading to smaller errors
for the centered-centered and upwind-centered schemes. As the Gaussian travels
through the less refined region near the half period, the centered-one-sided scheme
starts producing the least amount of error resulting in the most accurate solution at
the end of one period.

Figure 4.6: Initial condition and relative error for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a high resolution approximate Gaussian, $\sigma = 8/\sqrt{2\pi}$. Panels: (a) initial condition; (b) ℓ2 pointwise norm of relative error vs. t/T. Curves: centered-centered, upwind-centered, centered-one-sided, upwind-one-sided.

Figure 4.7: Initial condition and relative error for the RK44, DG scheme of order P = 2 using Gauss-Legendre solution points solving the advection-diffusion equation, a = 10, on a low resolution approximate Gaussian, $\sigma = 1/\sqrt{2\pi}$. Panels: (a) initial condition; (b) ℓ2 pointwise norm of relative error vs. t/T. Curves: centered-centered, upwind-centered, centered-one-sided, upwind-one-sided.

4.3 2D Couette Flow
The last numerical test involves the solution of air passing between two parallel plates,
separated by a distance H. The lower plate is kept stationary at a fixed temperature
Tl = 300K while the upper plate travels with constant velocity Uu and at a fixed
temperature Tu = 315K. Uu is computed from a set Mach number of Mau = 0.2.
The Reynolds number of this flow is set to Reu = 20 and the dynamic viscosity, µ, is
held constant. The ratio of specific heats, γ, and the Prandtl number, Pr, are set to
1.4 and 0.72, respectively. The domain is rectangular, x1 ∈ [−1, 1], x2 ∈ [0, 1], with
periodic boundary conditions on the left and right boundaries and fixed and moving
isothermal boundary conditions on the bottom and top boundaries, respectively. The
flow field is initialized with the velocity of the top plate and RK44 is used to march
the simulation to a steady state solution. The exact solution for the velocity profile
is
\[
u(x) = \frac{U_u x_2}{H}, \quad v(x) = 0.
\tag{4.7}
\]
Figure 4.8: Couette flow Mach contours (Mach number from 0 to 0.2), DG upwind-one-sided scheme using Gauss-Legendre solution points, P = 3. Panels: (a) Cartesian mesh, (32 × 16) elements; (b) unstructured mesh, (639) elements.

A series of Cartesian and unstructured quadrilateral meshes is used to solve
the Navier-Stokes equations. For example, Figure 4.8 shows a (32 × 16) Cartesian
mesh and an unstructured quadrilateral mesh with (639) elements along with the
numerical Mach contours. For any given mesh, the nondimensional wavespeed, a,
can be approximated by using Eq. (3.23) based on the properties of the flow attached
to the upper plate,
\[
a \approx \frac{\lambda_u h}{\nu_u} = \frac{(U_u + c_u)\, h}{\max\left( \dfrac{\mu}{\rho_u}, \dfrac{\gamma \mu}{Pr\, \rho_u} \right)}.
\tag{4.8}
\]
For air, $\max\left( \mu/\rho_u,\; \gamma\mu/(Pr\, \rho_u) \right) = \gamma\mu/(Pr\, \rho_u)$. The expression for a can now be simplified by using the Reynolds number,
\[
Re_u = \frac{\rho_u U_u H}{\mu},
\tag{4.9}
\]
\[
a = \frac{Pe_u (Ma_u + 1)}{\gamma\, Ma_u H}\, h,
\tag{4.10}
\]
where Peu = ReuPr = 14.4 is the Peclet number. For a given mesh, the element size
h is computed as $h = \sqrt{2/N_{eles}}$, which recovers the exact element size for a Cartesian
mesh. a is then computed for each mesh and shown in Table 4.1.
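As a quick check of Eq. (4.10), the Python sketch below evaluates h and a for the Cartesian meshes. The function names are hypothetical, and H = 1 is assumed from the domain x2 ∈ [0, 1]; the remaining defaults are the flow parameters quoted above.

```python
import math

def element_size(n_eles, area=2.0):
    """h = sqrt(area / N_eles); area = 2 for the domain [-1, 1] x [0, 1]."""
    return math.sqrt(area / n_eles)

def couette_wavespeed(h, Re=20.0, Pr=0.72, Ma=0.2, gamma=1.4, H=1.0):
    """Eq. (4.10): a = Pe (Ma + 1) h / (gamma Ma H), with Pe = Re Pr."""
    Pe = Re * Pr
    return Pe * (Ma + 1.0) * h / (gamma * Ma * H)

for nx, ny in [(4, 2), (8, 4), (16, 8), (32, 16)]:
    h = element_size(nx * ny)
    print((nx, ny), h, couette_wavespeed(h))
```

For the (4 × 2) mesh this gives h = 0.5 and a ≈ 30.86, in agreement with the first Cartesian column of Table 4.1.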
Neles   (4 × 2)   (8 × 4)   (16 × 8)   (32 × 16)   (159)     (639)
h       0.5       0.25      0.125      0.0625      0.07931   0.03956
a       30.86     15.43     7.714      3.857       4.894     2.441
Table 4.1: Element size ‘h’ and nondimensional parameter ‘a’ for the meshes employed for Couette flow.
Convergence of the L2 velocity error with grid spacing h is shown in Figure 4.9
for P = 2 and Figure 4.10 for P = 3. The rates of convergence for both Cartesian
and unstructured meshes match well with the expected orders of around P + 1 for
the upwind-one-sided scheme [11].

Figure 4.9: 3rd order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points, P = 2. Panels: (a) Cartesian meshes; (b) unstructured quadrilateral meshes. Axes: L2 velocity error, ‖eV‖2, vs. $h = \sqrt{2/N_{eles}}$. Curves: upwind-centered, upwind-one-sided, order 3 reference.

Figure 4.10: 4th order results for Couette flow, RK44, DG scheme using Gauss-Legendre solution points, P = 3. Panels: (a) Cartesian meshes; (b) unstructured quadrilateral meshes. Axes: L2 velocity error, ‖eV‖2, vs. $h = \sqrt{2/N_{eles}}$. Curves: upwind-centered, upwind-one-sided, order 4 reference.

The L2 velocity errors and orders of accuracy are given in Table 4.2 for the Cartesian
meshes and Table 4.3 for unstructured meshes. For P = 2, Cartesian meshes, the
upwind-one-sided scheme produces less error for the less refined case while the
upwind-centered scheme produces less error when the mesh is more refined. For P = 3, the
upwind-one-sided scheme produces less error for all meshes which matches the results
from Table 2.2. For unstructured meshes, the upwind-centered scheme produces less
error for P = 2 and the upwind-one-sided scheme produces less error for P = 3. This
matches well with the results from the Cartesian meshes since the meshes are more
refined.

                  Upwind-centered                    Upwind-one-sided
P   Neles       L2 Error    Order   ∆tmax/∆tpred   L2 Error    Order   ∆tmax/∆tpred
2   (4 × 2)     4.415e-05   -       1.3            4.358e-05   -       1.1
    (8 × 4)     5.256e-06   3.07    1.4            5.427e-06   3.01    1.0
    (16 × 8)    5.934e-07   3.15    1.5            6.430e-07   3.08    1.0
    (32 × 16)   6.546e-08   3.18    1.3            7.741e-08   3.05    1.0
3   (4 × 2)     4.584e-07   -       1.3            4.380e-07   -       1.1
    (8 × 4)     3.089e-08   3.89    1.4            2.706e-08   4.02    1.0
    (16 × 8)    1.995e-09   3.95    1.2            1.622e-09   4.06    1.0
    (32 × 16)   1.274e-10   3.97    1.1            1.004e-10   4.01    1.0
Table 4.2: Couette flow results for the RK44, DG scheme using Gauss-Legendre solution points on Cartesian quadrilateral meshes.

                  Upwind-centered                    Upwind-one-sided
P   Neles       L2 Error    Order   ∆tmax/∆tpred   L2 Error    Order   ∆tmax/∆tpred
2   (159)       5.214e-07   -       1.5            5.432e-07   -       0.9
    (639)       5.249e-08   3.31    1.4            6.003e-08   3.18    0.9
3   (159)       1.777e-09   -       1.3            1.529e-09   -       0.8
    (639)       9.192e-11   4.27    1.3            7.922e-11   4.27    0.8
Table 4.3: Couette flow results for the RK44, DG scheme using Gauss-Legendre solution points on unstructured quadrilateral meshes.

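The orders of accuracy reported in the tables are consistent with the usual two-mesh estimate, log(e_coarse/e_fine)/log(h_coarse/h_fine). The text does not state the formula used, so the Python sketch below assumes this standard definition; the function name is hypothetical.

```python
import math

def observed_order(err_coarse, err_fine, h_coarse, h_fine):
    """Observed order of accuracy between two meshes (standard estimate)."""
    return math.log(err_coarse / err_fine) / math.log(h_coarse / h_fine)

# Upwind-centered, P = 2, (4 x 2) and (8 x 4) Cartesian meshes of Table 4.2,
# with h = 0.5 and h = 0.25 from Table 4.1.
print(round(observed_order(4.415e-05, 5.256e-06, 0.5, 0.25), 2))  # 3.07
```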
The ratios of the maximum allowable time step to the estimated time step computed
using Eq. (3.16), ∆tmax/∆tpred, are shown in Table 4.2 for the Cartesian meshes. The results
show that the time step estimate remains conservative for both schemes and matches
almost exactly for the upwind-one-sided scheme, as was expected from Figure 3.3. It is
also important to note that in the case of the upwind-centered scheme, the time step
estimate loses accuracy near a ≈ 10, as expected from Figure 3.3.
The results from the unstructured meshes in Table 4.3 show that the time step
estimate is conservative for the upwind-centered scheme but not for the upwind-one-sided
scheme; however, the time step estimate is still fairly accurate in both cases.
Part II
Implicit Time Stepping on GPU Architectures
Chapter 5
Introduction
The most popular implicit time integration strategies for high-order, unsteady, fluid
flow simulations fall into three categories: backward differentiation formulas (BDF)
[64], diagonally implicit Runge-Kutta (DIRK) methods [71] and fully implicit Runge-Kutta
(IRK) methods [68]. Each has its own advantages and disadvantages but
all require the solution of an implicit function. Pseudo time stepping [49] and New-
ton’s method are often used to solve the nonlinear implicit function. These methods
can be enhanced by using h-multigrid [85], p-multigrid [28], hp-multigrid [63] or a
Newton-Krylov method with preconditioning [75, 70]. Each technique requires the
construction of a Jacobian matrix to solve the resulting linear system. Thus, storing
and computing the Jacobian matrix is a vital portion of any implicit method.
Storing and computing the global Jacobian matrix can be prohibitively expensive
for high-order methods. Several methods have been proposed to reduce the storage
and time complexity. For example, in [83, 56, 24, 25], lower-upper
symmetric Gauss-Seidel (LU-SGS) with first order approximate element local Jacobians
is used to avoid the construction of off-diagonal block Jacobians. In [28],
an element line Jacobi smoother is proposed which only stores the Jacobians for a
line of elements. In [72], a line-based DG method is suggested which significantly
reduces the sparsity pattern of the global Jacobian matrix. More recently, Kronecker
products were used in [26, 67, 69] to reduce the size and complexity of element local
Jacobians. Typically, implicit time stepping on GPU architectures has been a more
difficult problem to address because of the high memory requirements and the low
arithmetic intensity but, in [36], an implicit time stepping method was developed for
two-dimensional triangular grids on a single GPU.
In Part II of this dissertation, analytical element local Jacobians for the Direct
Flux Reconstruction (DFR) method applied to advection-diffusion, Euler and the
Navier-Stokes equations are derived. A Kronecker product formulation for these
Jacobians is developed to reduce the time complexity and a detailed discussion on the
sparsity pattern of these matrices is given. An efficient implicit time stepping method
for DFR on multi-GPU architectures is also designed, implemented and tested.
The second part of this dissertation is formatted as follows. In chapter 6, we intro-
duce a brief description of the DFR method in multiple dimensions for the advection-
diffusion, Euler and Navier-Stokes equations. In chapter 7, we describe the time
stepping methods used to solve steady and unsteady fluid flow simulations and the
block iterative methods used for the linear solver. In chapter 8, we analytically de-
rive the element local Jacobian and its Kronecker product formulation for the DFR
method in multiple dimensions for the advection-diffusion, Euler and Navier-Stokes
equations. In chapter 9, we provide numerical results for the inviscid bump, inviscid
NACA0012 airfoil, isentropic vortex, laminar Joukowski airfoil and three-dimensional
half cylinder with a Reynolds number of 1000. Lastly, in chapters 10 and 11, we
describe the implementation and performance of the GPU accelerated implicit time
stepping method for DFR.
Part II of this dissertation is based on the following publications:
• Jerry Watkins, Joshua Romero, and Antony Jameson. Multi-GPU, Implicit
Time Stepping for High-order Methods on Unstructured Grids. 46th AIAA
Fluid Dynamics Conference, 2016. [94]
• Jerry Watkins, Freddie Witherden, and Antony Jameson. A Kronecker prod-
uct formulation for the high-order, direct Flux Reconstruction method. (in
progress).
All results are generated using an in-house, high-order, compressible flow solver for
GPU architectures called ZEFR.
Chapter 6
The Direct Flux Reconstruction
Method
The direct Flux Reconstruction (DFR) method was first proposed by Romero et al.
in [77] as a simplified and efficient alternative to the FR method. In the DFR method,
a Lagrange interpolation over all interior and interface points is used to perform the
flux correction as opposed to a correction function. This scheme has been proven
capable of recovering the FR form of the nodal DG method when Gauss-Legendre
points are used in the interior [77] and the method has also been extended to triangle
elements [78]. This work focuses on unstructured tensor-product elements such as
quadrilateral and hexahedral elements.
This chapter is split into four sections. In section 6.1, a description of the DFR
method for one-dimensional advection-diffusion is given. This is then extended to
two- and three-dimensional unstructured quadrilateral and hexahedral elements in
sections 6.2 and 6.3, respectively. Section 6.4 describes the DFR method applied to
the Euler and Navier-Stokes equations.
6.1 One-Dimensional Formulation for Advection-
Diffusion
We begin by describing a one-dimensional advection-diffusion problem with discrete
boundaries. The DFR method is then applied to this problem using the techniques
described in [77, 78].
6.1.1 Problem Specification
Consider the conservative form of the one-dimensional advection-diffusion equation
defined on a bounded domain and subject to appropriate initial and boundary
conditions,
\[
\frac{\partial u}{\partial t} + \frac{\partial f}{\partial x} = 0, \quad x \in \Omega = [0, 1], \quad t > 0, \tag{6.1}
\]
where $x$ is the spatial coordinate, $t$ is time, $u = u(x, t)$ is a conserved scalar quantity
and $f = f_{adv}(u) - f_{diff}\left(u, \frac{\partial u}{\partial x}\right)$ is the difference between the advective flux and diffusive
flux. Note that the equation can be rewritten as a system of first-order equations,
\[
\begin{aligned}
\frac{\partial u}{\partial t} + \frac{\partial f(u, q)}{\partial x} &= 0, \quad x \in \Omega = [0, 1], \quad t > 0, \\
q - \frac{\partial u}{\partial x} &= 0.
\end{aligned} \tag{6.2}
\]
Following a traditional nodal finite element method [40], the domain is partitioned
into $N_{eles}$ non-overlapping elements,
\[
\Omega = \bigcup_{ele=1}^{N_{eles}} \Omega_{ele}, \tag{6.3}
\]
where $\Omega_{ele} = [x_{ele}, x_{ele+1})$. With the domain partitioned, the exact solution $u$, the
exact solution derivative $q$ and the exact flux $f(u)$ can be approximated by the numerical
solution, the numerical solution derivative and the numerical flux,
\[
u^\delta = \sum_{ele=1}^{N_{eles}} u^\delta_{ele}, \quad q^\delta = \sum_{ele=1}^{N_{eles}} q^\delta_{ele}, \quad f^\delta = \sum_{ele=1}^{N_{eles}} f^\delta_{ele}. \tag{6.4}
\]
A linear isoparametric mapping is introduced from the physical domain $x \in \Omega_{ele}$ to
the parent domain $\xi \in \Omega_S = [-1, 1)$ such that
\[
\begin{aligned}
\xi(x|_{\Omega_{ele}}) &= 2\left(\frac{x - x_{ele}}{x_{ele+1} - x_{ele}}\right) - 1, \\
x|_{\Omega_{ele}}(\xi) &= \left(\frac{1 - \xi}{2}\right) x_{ele} + \left(\frac{1 + \xi}{2}\right) x_{ele+1}.
\end{aligned} \tag{6.5}
\]
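The mapping of Eq. (6.5) can be sketched in a few lines. This is a minimal illustration only; the function names `to_parent`, `to_physical` and `jacobian_det` are hypothetical, and `jacobian_det` simply returns the derivative of the linear map, $dx/d\xi$.

```python
def to_parent(x, x_left, x_right):
    """Map a physical point x in [x_left, x_right) to xi in [-1, 1), Eq. (6.5)."""
    return 2.0 * (x - x_left) / (x_right - x_left) - 1.0


def to_physical(xi, x_left, x_right):
    """Inverse map from the parent coordinate xi back to physical space, Eq. (6.5)."""
    return 0.5 * (1.0 - xi) * x_left + 0.5 * (1.0 + xi) * x_right


def jacobian_det(x_left, x_right):
    """dx/dxi of the linear map: half the element width."""
    return 0.5 * (x_right - x_left)


# Hypothetical element [0.2, 0.3): its midpoint maps to xi = 0.
xi_mid = to_parent(0.25, 0.2, 0.3)
```

The two maps are inverses of each other, so a round trip through parent space returns the original physical coordinate.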
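Applying this transformation gives rise to a transformed system of equations within
the standard element $\Omega_S$ of the following form,
\[
\begin{aligned}
\frac{\partial u^\delta_{ele}}{\partial t} + \frac{\partial f^\delta_{ele}}{\partial \xi} &= 0, \\
q^\delta_{ele} - \frac{\partial u^\delta_{ele}}{\partial \xi} &= 0,
\end{aligned} \tag{6.6}
\]
where the transformed quantities on the left-hand sides below, which are functions of $\xi$, are
related to the physical quantities on the right-hand sides, evaluated at $x|_{\Omega_{ele}}(\xi)$, by
\[
u^\delta_{ele} = |J_{ele}|\, u^\delta_{ele}(x|_{\Omega_{ele}}(\xi), t), \quad
q^\delta_{ele} = |J_{ele}|\, q^\delta_{ele}(x|_{\Omega_{ele}}(\xi), t), \quad
f^\delta_{ele} = f^\delta_{ele}(x|_{\Omega_{ele}}(\xi), t),
\]
and $|J_{ele}| = \frac{1}{2}(x_{ele+1} - x_{ele})$ is the determinant of the geometric element Jacobian
matrix of the coordinate transformation. In what follows, the superscript $\delta$ used to
differentiate the numerical solution, solution derivative and flux from their exact values
will be dropped to improve readability.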
6.1.2 Direct Flux Reconstruction
Consider rearranging Eq. (6.6) and producing the semi-discrete form of the
one-dimensional advection-diffusion equation as a system of equations,
\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \frac{\delta f_{ele}}{\delta \xi}, \tag{6.7}
\]
\[
q_{ele} = \frac{1}{|J_{ele}|} \frac{\delta u_{ele}}{\delta \xi}, \tag{6.8}
\]
where $\frac{\delta f_{ele}}{\delta \xi}$ is the numerical derivative of $f_{ele}$ and $\frac{\delta u_{ele}}{\delta \xi}$ is the numerical derivative of
$u_{ele}$. In the DFR method, globally $C^0$ continuous piecewise polynomials $u^C$ and $f^C$
are constructed in order to compute the numerical derivatives of $u_{ele}$ and $f_{ele}$ in each
element.
The first step in this process is to further discretize each element by Nspts1D = P+1
distinct solution points so that the discontinuous solution in each element uele can be
represented by a piecewise interpolating polynomial of degree P ,
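\[
u_{ele}(\xi) = \sum_{spt=1}^{N_{spts1D}} u_{spt,ele}\, \ell_{spt}(\xi), \tag{6.9}
\]
where $\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)$ are the Lagrange basis polynomials defined at the
solution points $\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}$. This can be written in vector format as,
\[
u_{ele}(\xi) = \boldsymbol{\ell}(\xi)'\, \mathbf{u}_{ele}, \tag{6.10}
\]
where $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$ and $\mathbf{u}_{ele} = [u_{1,ele}, u_{2,ele}, \ldots, u_{N_{spts1D},ele}]'$. To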
recover the nodal DG method, the solution points are chosen to be collocated with the
zeros of the Legendre polynomial of degree P + 1, also known as the Gauss-Legendre
points [77].
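The Gauss-Legendre points can be computed numerically. The following is a minimal sketch, assuming a Newton iteration on the Legendre polynomial of degree $n = P + 1$ with a standard cosine initial guess; the function names are hypothetical and this is not necessarily how the solver used in this work obtains its points.

```python
import math


def legendre(n, x):
    """Return (P_n(x), P_n'(x)) via the three-term recurrence (valid for |x| != 1)."""
    if n == 0:
        return 1.0, 0.0
    p_prev, p = 1.0, x
    for k in range(2, n + 1):
        p_prev, p = p, ((2 * k - 1) * x * p - (k - 1) * p_prev) / k
    dp = n * (x * p - p_prev) / (x * x - 1.0)
    return p, dp


def gauss_legendre_points(n, tol=1e-14, max_iter=100):
    """Zeros of the Legendre polynomial of degree n, in ascending order."""
    pts = []
    for i in range(1, n + 1):
        x = math.cos(math.pi * (i - 0.25) / (n + 0.5))  # initial guess
        for _ in range(max_iter):
            p, dp = legendre(n, x)
            dx = p / dp
            x -= dx  # Newton update
            if abs(dx) < tol:
                break
        pts.append(x)
    return sorted(pts)


# Solution points for P = 2 (three points per element).
solution_points = gauss_legendre_points(3)
```

For $P = 1$ this returns $\pm 1/\sqrt{3}$, and for $P = 2$ it returns $0$ and $\pm\sqrt{3/5}$.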
The next step is to extrapolate the discontinuous solution to the element interfaces
using Eq. (6.10). The extrapolated values in each element are written as,
\[
u^L_{ele} = u_{ele}(-1) = \mathbf{E}^L\, \mathbf{u}_{ele}, \quad u^R_{ele} = u_{ele}(+1) = \mathbf{E}^R\, \mathbf{u}_{ele}, \tag{6.11}
\]
where $u^L_{ele}$ and $u^R_{ele}$ are the extrapolated discontinuous solution values on the left
and right boundaries of the $ele$th element and $\mathbf{E}^L, \mathbf{E}^R \in \mathbb{R}^{1 \times N_{spts1D}}$ are polynomial
extrapolation operators where
\[
\mathbf{E}^L = \boldsymbol{\ell}(-1)', \quad \mathbf{E}^R = \boldsymbol{\ell}(+1)'. \tag{6.12}
\]
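The extrapolation operators of Eq. (6.12) are simply the Lagrange basis evaluated at $\xi = \pm 1$. A small sketch for $P = 1$ follows; the helper names `lagrange_basis`, `extrapolation_operators` and `dot` are hypothetical.

```python
def lagrange_basis(nodes, x):
    """Evaluate every Lagrange basis polynomial defined on `nodes` at x."""
    values = []
    for j, xj in enumerate(nodes):
        v = 1.0
        for m, xm in enumerate(nodes):
            if m != j:
                v *= (x - xm) / (xj - xm)
        values.append(v)
    return values


def extrapolation_operators(nodes):
    """Row vectors E^L = l(-1)' and E^R = l(+1)' of Eq. (6.12)."""
    return lagrange_basis(nodes, -1.0), lagrange_basis(nodes, 1.0)


def dot(row, vec):
    return sum(r * v for r, v in zip(row, vec))


# P = 1: the two Gauss-Legendre points are +/- 1/sqrt(3).
nodes = [-(3 ** -0.5), 3 ** -0.5]
E_L, E_R = extrapolation_operators(nodes)
u_ele = [2.0 * xi + 1.0 for xi in nodes]  # samples of u(xi) = 2 xi + 1
u_left, u_right = dot(E_L, u_ele), dot(E_R, u_ele)
```

Because the sampled function is linear and $P = 1$, the extrapolation reproduces $u(-1) = -1$ and $u(+1) = 3$ up to round-off.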
The common interface solutions are computed by using the extrapolated discontinuous
solution on both sides of each interface,
\[
u^{I,L}_{ele} = u^I\left(u^R_{ele-1}, u^L_{ele}\right), \quad u^{I,R}_{ele} = u^I\left(u^R_{ele}, u^L_{ele+1}\right), \tag{6.13}
\]
where $u^I(u_-, u_+)$ is the interface solution function, $u_-$ and $u_+$ are the left and right
states of the solution on each side of the interface and $u^{I,L}_{ele}$ and $u^{I,R}_{ele}$ are the common
interface solutions on the left and right boundaries of the $ele$th element. The interface
solution function is defined by the scheme used to formulate second-order numerical
derivatives. In this case, the Local Discontinuous Galerkin (LDG) approach is chosen,
\[
u^I(u_-, u_+) = \frac{1}{2}(u_- + u_+) - \beta\, (u_- - u_+), \tag{6.14}
\]
where $\beta = \pm 0.5$ corresponds to one-sided common interface solutions which recover
LDG [22] and $\beta = 0$ corresponds to centered common interface solutions which recover
BR1 [14]. The common interface solutions at boundaries are computed directly using
appropriate boundary conditions for the problem,
\[
u^{I,L}_{1} = u^b(u^L_1), \quad u^{I,R}_{N_{eles}} = u^b(u^R_{N_{eles}}), \tag{6.15}
\]
where $u^b(u)$ is the function which applies the boundary condition and $u$ is the solution
extrapolated to the boundary. Note that $u^b(u)$ may be different depending on the
boundary condition.
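Equations (6.13) to (6.15) can be illustrated with a short sketch. It assumes, purely for illustration, that the boundary function $u^b$ returns a fixed Dirichlet value; the names `interface_solution` and `common_interface_solutions` are hypothetical.

```python
def interface_solution(u_minus, u_plus, beta=0.5):
    """LDG common interface solution, Eq. (6.14)."""
    return 0.5 * (u_minus + u_plus) - beta * (u_minus - u_plus)


def common_interface_solutions(u_left, u_right, u_bnd_left, u_bnd_right, beta=0.5):
    """Apply Eq. (6.13) in the interior and Eq. (6.15) at the two domain ends.

    u_left[e], u_right[e] are the values extrapolated to the left and right
    boundaries of element e. Assumption: Dirichlet data at both domain ends.
    """
    n = len(u_left)
    uI_L, uI_R = [0.0] * n, [0.0] * n
    for e in range(n):
        uI_L[e] = u_bnd_left if e == 0 else interface_solution(u_right[e - 1], u_left[e], beta)
        uI_R[e] = u_bnd_right if e == n - 1 else interface_solution(u_right[e], u_left[e + 1], beta)
    return uI_L, uI_R


uI_L, uI_R = common_interface_solutions([1.0, 2.0, 3.0], [1.5, 2.5, 3.5], 0.0, 9.0)
```

With $\beta = 0.5$ the function returns the right state $u_+$, with $\beta = -0.5$ the left state $u_-$, and with $\beta = 0$ the average. Both elements sharing an interface receive the same value.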
The next step is to construct a continuous solution $u^C_{ele}$ such that a piecewise
sum results in a globally $C^0$ continuous solution $u^C$ that passes through the common
interface solution values at element interfaces. This is accomplished by utilizing
Lagrange interpolating polynomials of degree $P + 2$,
\[
u^C_{ele}(\xi) = u^{I,L}_{ele}\, \tilde{\ell}_0(\xi) + \sum_{spt=1}^{N_{spts1D}} u_{spt,ele}\, \tilde{\ell}_{spt}(\xi) + u^{I,R}_{ele}\, \tilde{\ell}_{P+2}(\xi), \tag{6.16}
\]
where $\tilde{\ell}_0(\xi), \tilde{\ell}_1(\xi), \ldots, \tilde{\ell}_{P+2}(\xi)$ are the Lagrange interpolating polynomials of degree
$P + 2$ defined at the $P + 3$ collocation points $-1, \xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}, 1$. The derivative
of $u^C_{ele}$ can now be obtained by differentiating Eq. (6.16) with respect to $\xi$,
\[
\frac{\partial u^C_{ele}}{\partial \xi}(\xi) = u^{I,L}_{ele}\, \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi) + \sum_{spt=1}^{N_{spts1D}} u_{spt,ele}\, \frac{\partial \tilde{\ell}_{spt}}{\partial \xi}(\xi) + u^{I,R}_{ele}\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi). \tag{6.17}
\]
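This can be rewritten in vector format as,
\[
\frac{\partial u^C_{ele}}{\partial \xi}(\xi) = u^{I,L}_{ele}\, \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi) + \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)'\, \mathbf{u}_{ele} + u^{I,R}_{ele}\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \tag{6.18}
\]
where $\tilde{\boldsymbol{\ell}}(\xi)' = [\tilde{\ell}_1(\xi), \tilde{\ell}_2(\xi), \ldots, \tilde{\ell}_{N_{spts1D}}(\xi)]$. This equation can then be evaluated at
each solution point to obtain the numerical derivative of $u_{ele}$,
\[
\frac{\delta u_{spt,ele}}{\delta \xi} = u^{I,L}_{ele}\, \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi_{spt}) + \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi_{spt})'\, \mathbf{u}_{ele} + u^{I,R}_{ele}\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi_{spt}), \tag{6.19}
\]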
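where $spt = 1, 2, \ldots, N_{spts1D}$. This can be manipulated into a matrix-vector format
so that,
\[
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} = \mathbf{D}^L_\xi\, u^{I,L}_{ele} + \mathbf{D}_\xi\, \mathbf{u}_{ele} + \mathbf{D}^R_\xi\, u^{I,R}_{ele}, \tag{6.20}
\]
where $\mathbf{D}_\xi \in \mathbb{R}^{N_{spts1D} \times N_{spts1D}}$ and $\mathbf{D}^L_\xi, \mathbf{D}^R_\xi \in \mathbb{R}^{N_{spts1D} \times 1}$ are polynomial
differentiation operators such that
\[
\begin{aligned}
D_{\xi\, p,m} &= \frac{\partial \tilde{\ell}_m}{\partial \xi}(\xi_p), \quad p, m = 1, 2, \ldots, N_{spts1D}, \\
D^L_{\xi\, p} &= \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi_p), \quad p = 1, 2, \ldots, N_{spts1D}, \\
D^R_{\xi\, p} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi_p), \quad p = 1, 2, \ldots, N_{spts1D}.
\end{aligned} \tag{6.21}
\]
Coupling this with Eq. (6.8) concludes the evaluation of the numerical solution
derivative in vector format,
\[
\mathbf{q}_{ele} = \frac{1}{|J_{ele}|} \frac{\delta \mathbf{u}_{ele}}{\delta \xi}. \tag{6.22}
\]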
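The operators of Eq. (6.21) can be assembled directly from the derivatives of the degree $P + 2$ Lagrange polynomials. The sketch below is one possible construction, not necessarily the one used in this work; the names `lagrange_derivative`, `dfr_operators` and `numerical_derivative` are hypothetical.

```python
def lagrange_derivative(nodes, j, x):
    """Derivative of the j-th Lagrange polynomial on `nodes`, evaluated at x."""
    total = 0.0
    for k, xk in enumerate(nodes):
        if k == j:
            continue
        term = 1.0 / (nodes[j] - xk)
        for m, xm in enumerate(nodes):
            if m != j and m != k:
                term *= (x - xm) / (nodes[j] - xm)
        total += term
    return total


def dfr_operators(sol_pts):
    """Return (D_L, D, D_R) of Eq. (6.21) for the given 1D solution points."""
    ext = [-1.0] + list(sol_pts) + [1.0]  # the P + 3 collocation points
    n = len(sol_pts)
    D = [[lagrange_derivative(ext, m + 1, xp) for m in range(n)] for xp in sol_pts]
    D_L = [lagrange_derivative(ext, 0, xp) for xp in sol_pts]
    D_R = [lagrange_derivative(ext, n + 1, xp) for xp in sol_pts]
    return D_L, D, D_R


def numerical_derivative(sol_pts, u, uI_L, uI_R):
    """Evaluate Eq. (6.20) at every solution point."""
    D_L, D, D_R = dfr_operators(sol_pts)
    n = len(u)
    return [D_L[p] * uI_L + sum(D[p][m] * u[m] for m in range(n)) + D_R[p] * uI_R
            for p in range(n)]


# P = 1 check with u(xi) = xi^3, whose interface values are -1 and +1.
nodes = [-(3 ** -0.5), 3 ** -0.5]
du = numerical_derivative(nodes, [x ** 3 for x in nodes], -1.0, 1.0)
```

Since the continuous polynomial has degree $P + 2 = 3$, the cubic is differentiated exactly and both entries of `du` equal $3\xi^2 = 1$ up to round-off.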
The numerical derivative of $f_{ele}$ can be evaluated in much the same way as the
numerical derivative of $u_{ele}$. The first step in this process is to extrapolate the
discontinuous numerical solution derivative to element interfaces,
\[
q^L_{ele} = q_{ele}(-1) = \mathbf{E}^L\, \mathbf{q}_{ele}, \quad q^R_{ele} = q_{ele}(+1) = \mathbf{E}^R\, \mathbf{q}_{ele}. \tag{6.23}
\]
Once the discontinuous numerical solution derivative is extrapolated, the transformed
common interface fluxes can be computed,
\[
\begin{aligned}
f^{I,L}_{ele} &= f^I\left(u^R_{ele-1}, u^L_{ele}, q^R_{ele-1}, q^L_{ele}\right), \\
f^{I,R}_{ele} &= f^I\left(u^R_{ele}, u^L_{ele+1}, q^R_{ele}, q^L_{ele+1}\right),
\end{aligned} \tag{6.24}
\]
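where $f^I(u_-, u_+, q_-, q_+) = f^I_{adv}(u_-, u_+) - f^I_{diff}(u_-, u_+, q_-, q_+)$ is the interface flux
function. The Rusanov flux is commonly used as the interface advective flux function
so that
\[
f^I_{adv}(u_-, u_+) = \frac{1}{2}\left(f_{adv}(u_+) + f_{adv}(u_-)\right) - \frac{1}{2} |\lambda(u_-, u_+)| (u_+ - u_-), \tag{6.25}
\]
where
\[
|\lambda(u_-, u_+)| = \max\left( \left|\frac{\partial f_{adv}}{\partial u}(u_+)\right|, \left|\frac{\partial f_{adv}}{\partial u}(u_-)\right| \right), \tag{6.26}
\]
and $\frac{\partial f_{adv}}{\partial u}(u)$ is the wavespeed, that is, the derivative of the advective flux with respect
to the solution. Continuing with the LDG formulation, the interface diffusive flux
function can be defined as
\[
\begin{aligned}
f^I_{diff}(u_-, u_+, q_-, q_+) = {} & \frac{1}{2}\left(f_{diff}(u_-, q_-) + f_{diff}(u_+, q_+)\right) \\
& + \beta \left(f_{diff}(u_-, q_-) - f_{diff}(u_+, q_+)\right) + \tau\, (u_- - u_+).
\end{aligned} \tag{6.27}
\]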
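The interface flux functions of Eqs. (6.25) to (6.27) translate almost line by line into code. The sketch below is illustrative only: the function names are hypothetical, the fluxes are passed in as callables, and the example assumes linear advection-diffusion with $f_{adv} = a u$ and $f_{diff} = \nu q$ for arbitrarily chosen $a = 2$ and $\nu = 0.1$.

```python
def rusanov_flux(u_minus, u_plus, f_adv, wavespeed):
    """Rusanov advective interface flux, Eqs. (6.25) and (6.26)."""
    lam = max(abs(wavespeed(u_plus)), abs(wavespeed(u_minus)))
    return 0.5 * (f_adv(u_plus) + f_adv(u_minus)) - 0.5 * lam * (u_plus - u_minus)


def ldg_diffusive_flux(u_minus, u_plus, q_minus, q_plus, f_diff, beta=0.5, tau=0.0):
    """LDG diffusive interface flux, Eq. (6.27)."""
    f_m, f_p = f_diff(u_minus, q_minus), f_diff(u_plus, q_plus)
    return 0.5 * (f_m + f_p) + beta * (f_m - f_p) + tau * (u_minus - u_plus)


def interface_flux(u_minus, u_plus, q_minus, q_plus, f_adv, wavespeed, f_diff,
                   beta=0.5, tau=0.0):
    """Total interface flux: advective part minus diffusive part."""
    return (rusanov_flux(u_minus, u_plus, f_adv, wavespeed)
            - ldg_diffusive_flux(u_minus, u_plus, q_minus, q_plus, f_diff, beta, tau))


# Assumed model problem: f_adv = a*u and f_diff = nu*q with a = 2, nu = 0.1.
a, nu = 2.0, 0.1
f_adv = lambda u: a * u
speed = lambda u: a
f_diff = lambda u, q: nu * q
upwind = rusanov_flux(1.0, 3.0, f_adv, speed)
```

For a linear flux with $a > 0$ the Rusanov flux reduces to the upwind value $a u_-$. With $\beta = 0.5$ and $\tau = 0$ the diffusive flux takes the left state, the opposite side from the interface solution of Eq. (6.14).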
This leaves the transformed common interface fluxes at boundaries which can be
computed as
\[
f^{I,L}_{1} = f^b\left(u^L_1, q^L_1\right), \quad f^{I,R}_{N_{eles}} = f^b\left(u^R_{N_{eles}}, q^R_{N_{eles}}\right), \tag{6.28}
\]
where $f^b(u, q)$ is the function which applies the boundary condition and $u$ and $q$ are
the solution and solution derivative extrapolated to the boundary.
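The transformed continuous flux $f^C_{ele}$ can be formed and differentiated in much
the same way as the continuous solution so that the numerical derivative of $f_{ele}$ can
be computed in matrix-vector format,
\[
\frac{\delta \mathbf{f}_{ele}}{\delta \xi} = \mathbf{D}^L_\xi\, f^{I,L}_{ele} + \mathbf{D}_\xi\, \mathbf{f}_{ele} + \mathbf{D}^R_\xi\, f^{I,R}_{ele}, \tag{6.29}
\]
where $\mathbf{f}_{ele} = f(\mathbf{u}_{ele}, \mathbf{q}_{ele}) = f_{adv}(\mathbf{u}_{ele}) - f_{diff}(\mathbf{u}_{ele}, \mathbf{q}_{ele})$ for all elements. This is coupled
with Eq. (6.7) to obtain the semi-discrete equation in vector format,
\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\frac{1}{|J_{ele}|} \frac{\delta \mathbf{f}_{ele}}{\delta \xi}. \tag{6.30}
\]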
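A useful sanity check of Eq. (6.30) is the lowest-order case. For $P = 0$ the single solution point is $\xi = 0$, the collocation points are $\{-1, 0, 1\}$, and differentiating the quadratic Lagrange polynomials gives $D^L_\xi = -1/2$, $D_\xi = 0$ and $D^R_\xi = 1/2$. The sketch below assumes pure linear advection $f = a u$ (no diffusion), a uniform periodic grid and the Rusanov flux; the function name `dfr_p0_rhs` is hypothetical.

```python
def dfr_p0_rhs(u, a, dx):
    """Right-hand side of Eq. (6.30) for f = a*u with P = 0 on a periodic grid."""
    n = len(u)
    D_L, D, D_R = -0.5, 0.0, 0.5  # Eq. (6.21) for the points {-1, 0, 1}
    jac = 0.5 * dx                # |J_ele| for a uniform grid
    rhs = []
    for e in range(n):
        u_prev, u_next = u[e - 1], u[(e + 1) % n]
        # With one solution point the extrapolated values equal u[e].
        fI_L = 0.5 * a * (u_prev + u[e]) - 0.5 * abs(a) * (u[e] - u_prev)
        fI_R = 0.5 * a * (u[e] + u_next) - 0.5 * abs(a) * (u_next - u[e])
        dfdxi = D_L * fI_L + D * (a * u[e]) + D_R * fI_R
        rhs.append(-dfdxi / jac)
    return rhs


rhs = dfr_p0_rhs([1.0, 2.0, 4.0], a=1.0, dx=0.5)
```

Under these assumptions the update collapses to $-(f^{I,R}_{ele} - f^{I,L}_{ele})/\Delta x$, which is the first-order upwind finite volume scheme.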
6.2 Two-Dimensional Formulation for Advection-
Diffusion
The DFR method, along with other schemes within the Flux Reconstruction (FR)
family, can be directly extended to quadrilateral elements using a tensor-product
formulation [41, 42]. While the cited references describe the methodology in the
context of the standard FR method, the same procedure can be applied to the DFR
method by simply replacing the FR correction procedure using correction polynomials
with the Lagrange interpolation described above. A summary of the procedure is
described below for the two-dimensional advection-diffusion equation.
6.2.1 Problem Specification
Consider the conservative form of the two-dimensional advection-diffusion equation
defined on a bounded domain and subject to appropriate initial and boundary
conditions,
\[
\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{f} = 0, \quad (x, y) \in \Omega, \quad t > 0, \tag{6.31}
\]
where $\Omega$ is an arbitrary domain closed by an arbitrary boundary $\partial\Omega$, $x$ and $y$ are
spatial coordinates, $t$ is time, $u = u(x, y, t)$ is a conserved scalar quantity and
\[
\mathbf{f} = (f_x, f_y) = \left(f_{x,adv}(u) - f_{x,diff}(u, \nabla u),\; f_{y,adv}(u) - f_{y,diff}(u, \nabla u)\right)
\]
is the difference between the advective and diffusive fluxes in the $x$ and $y$ directions. This can be
rewritten as a system of first-order equations,
\[
\begin{aligned}
\frac{\partial u}{\partial t} + \nabla \cdot \mathbf{f}(u, \mathbf{q}) &= 0, \quad (x, y) \in \Omega, \quad t > 0, \\
\mathbf{q} - \nabla u &= 0,
\end{aligned} \tag{6.32}
\]
where $\mathbf{q} = (q_x, q_y)$ is the solution gradient.
The domain is partitioned into $N_{eles}$ non-overlapping, conforming quadrilateral
elements,
\[
\Omega = \bigcup_{ele=1}^{N_{eles}} \Omega_{ele}. \tag{6.33}
\]
Each quadrilateral element in the physical domain $(x, y)$ is mapped to a reference
element, $\Omega_S = \{(\xi, \eta) \,|\, -1 \le \xi, \eta < 1\}$, in the transformed parent space $(\xi, \eta)$ so that,
\[
\begin{pmatrix} x \\ y \end{pmatrix} = \Gamma_{ele}(\xi, \eta) = \sum_{npt=1}^{N_{npts}} M_{npt}(\xi, \eta) \begin{pmatrix} x_{npt,ele} \\ y_{npt,ele} \end{pmatrix}, \tag{6.34}
\]
where $M_{npt}(\xi, \eta)$ are the element shape functions and $N_{npts}$ is the number of points
used to define the physical space element.
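Applying this transformation gives rise to a transformed system of equations of
the following form,
\[
\frac{\partial u^\delta_{ele}}{\partial t} + \nabla \cdot \mathbf{f}^\delta_{ele} = 0, \tag{6.35}
\]
\[
\mathbf{q}^\delta_{ele} - \nabla u^\delta_{ele} = 0, \tag{6.36}
\]
where $\nabla = \left(\frac{\partial}{\partial \xi}, \frac{\partial}{\partial \eta}\right)$ is here the gradient in parent space, $\mathbf{q}^\delta_{ele} = \left(q^\delta_{\xi_{ele}}, q^\delta_{\eta_{ele}}\right)$, $\mathbf{f}^\delta_{ele} = \left(f^\delta_{\xi_{ele}}, f^\delta_{\eta_{ele}}\right)$ and
the transformed quantities on the left-hand sides below are related to the physical
quantities on the right-hand sides, evaluated at $\Gamma_{ele}(\xi, \eta)$, by
\[
\begin{aligned}
u^\delta_{ele} &= |\mathbf{J}_{ele}|\, u^\delta_{ele}(\Gamma_{ele}(\xi, \eta), t), \\
\mathbf{q}^\delta_{ele} &= \mathbf{J}'_{ele}\, \mathbf{q}^\delta_{ele}(\Gamma_{ele}(\xi, \eta), t), \\
\mathbf{f}^\delta_{ele} &= |\mathbf{J}_{ele}|\, \mathbf{J}^{-1}_{ele}\, \mathbf{f}^\delta_{ele}(\Gamma_{ele}(\xi, \eta), t).
\end{aligned}
\]
The Jacobian matrix of the element shape functions, $\mathbf{J}_{ele}$, its transpose, $\mathbf{J}'_{ele}$, inverse,
$\mathbf{J}^{-1}_{ele}$, and determinant, $|\mathbf{J}_{ele}|$, come directly from Eq. (6.34). In what follows, the
superscript $\delta$ used to differentiate the numerical solution, solution gradient and fluxes
from their exact values will be dropped to improve readability.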
6.2.2 Direct Flux Reconstruction
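Consider rearranging Eq. (6.35) and producing the semi-discrete form of the
two-dimensional advection-diffusion equation as a system of equations,
\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|\mathbf{J}_{ele}|} \nabla^\delta \cdot \mathbf{f}_{ele}, \tag{6.37}
\]
\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \nabla^\delta u_{ele}, \tag{6.38}
\]
where $\nabla^\delta = \left(\frac{\delta}{\delta \xi}, \frac{\delta}{\delta \eta}\right)$ and $\mathbf{J}^{-1\prime}_{ele}$ is the transpose of the inverse Jacobian matrix. The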
DFR method for 2D quadrilateral elements is a direct extension of the 1D method
where numerical derivatives are computed along each set of collinear points. The first
step is to further discretize each quadrilateral element by $N_{spts2D} = (P + 1)^2$ distinct
solution points generated through a tensor product of a set of 1D solution points.
Each solution point is defined by the Gauss-Legendre sets $\{\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}\}$ and
$\{\eta_1, \eta_2, \ldots, \eta_{N_{spts1D}}\}$ where the points in the former set are chosen as the most rapidly
changing index. The discontinuous solution in each element $u_{ele}$ can be represented
by a product of piecewise interpolating polynomials of degree $P$,
\[
u_{ele}(\xi, \eta) = \sum_{spt=1}^{N_{spts2D}} u_{spt,ele}\, \phi_{spt}(\xi, \eta), \tag{6.39}
\]
where $\phi_1(\xi, \eta), \phi_2(\xi, \eta), \ldots, \phi_{N_{spts2D}}(\xi, \eta)$ are the shape functions constructed from
1D Lagrange basis polynomials defined at the location of the $spt$th solution point.
This can be written in vector format as,
\[
u_{ele}(\xi, \eta) = \boldsymbol{\phi}(\xi, \eta)'\, \mathbf{u}_{ele}, \tag{6.40}
\]
where $\mathbf{u}_{ele} = [u_{1,ele}, u_{2,ele}, \ldots, u_{N_{spts2D},ele}]'$ and the vector $\boldsymbol{\phi}(\xi, \eta)' \in \mathbb{R}^{1 \times N_{spts2D}}$ is
defined by the Kronecker product between the vectors $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$
and $\boldsymbol{\ell}(\eta)' = [\ell_1(\eta), \ell_2(\eta), \ldots, \ell_{N_{spts1D}}(\eta)]$,
\[
\boldsymbol{\phi}(\xi, \eta)' = \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)'. \tag{6.41}
\]
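The ordering implied by Eq. (6.41) can be checked with a small sketch. In zero-based indexing, entry $j N_{spts1D} + i$ of $\boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)'$ equals $\ell_j(\eta)\, \ell_i(\xi)$, so the $\xi$ index changes fastest. The helper names below are hypothetical and the test function is an arbitrary bilinear polynomial.

```python
def lagrange_basis(nodes, x):
    """Evaluate every Lagrange basis polynomial defined on `nodes` at x."""
    values = []
    for j, xj in enumerate(nodes):
        v = 1.0
        for m, xm in enumerate(nodes):
            if m != j:
                v *= (x - xm) / (xj - xm)
        values.append(v)
    return values


def kron_row(a, b):
    """Kronecker product of two row vectors: result[j*len(b) + i] = a[j] * b[i]."""
    return [aj * bi for aj in a for bi in b]


def phi(nodes, xi, eta):
    """Tensor-product shape functions of Eq. (6.41), xi index fastest."""
    return kron_row(lagrange_basis(nodes, eta), lagrange_basis(nodes, xi))


def f(xi, eta):
    return 1.0 + xi + 2.0 * eta + xi * eta  # arbitrary bilinear test function


nodes = [-(3 ** -0.5), 3 ** -0.5]                      # P = 1
u_ele = [f(xi, eta) for eta in nodes for xi in nodes]  # xi fastest
u_interp = sum(p * u for p, u in zip(phi(nodes, 0.3, -0.2), u_ele))
```

Because the test function lies in the $P = 1$ tensor-product space, `u_interp` reproduces `f(0.3, -0.2)` up to round-off.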
The next step is to extrapolate the discontinuous solution to $N_{spts1D} = P + 1$
distinct flux points on each of the $N_{faces2D} = 4$ faces of a quadrilateral element
for a total of $N_{fpts2D} = N_{faces2D}\, N_{spts1D}$ flux points per quadrilateral. Using Eq. (6.40),
the extrapolated values in each element and face are written as,
\[
\mathbf{u}^{face}_{ele} = \mathbf{E}^{face}\, \mathbf{u}_{ele}, \quad face \in \{L, R, B, T\}, \tag{6.42}
\]
where $\mathbf{u}^L_{ele}, \mathbf{u}^R_{ele}, \mathbf{u}^B_{ele}, \mathbf{u}^T_{ele} \in \mathbb{R}^{N_{spts1D} \times 1}$ are the extrapolated discontinuous solution
vectors on the left, right, bottom and top boundaries of the $ele$th element, respectively,
as shown in Figure 6.1. $\mathbf{E}^L, \mathbf{E}^R, \mathbf{E}^B, \mathbf{E}^T \in \mathbb{R}^{N_{spts1D} \times N_{spts2D}}$ are polynomial
extrapolation operators such that
\[
\begin{aligned}
E^L_{p,m} &= \phi_m(-1, \eta_p), & E^R_{p,m} &= \phi_m(+1, \eta_p), \\
E^B_{p,m} &= \phi_m(\xi_p, -1), & E^T_{p,m} &= \phi_m(\xi_p, +1),
\end{aligned} \tag{6.43}
\]
where $p = 1, 2, \ldots, N_{spts1D}$ and $m = 1, 2, \ldots, N_{spts2D}$.
The extrapolated solution can now be used to compute the common interface
solutions,
\[
\begin{aligned}
\mathbf{u}^{I,face}_{ele} &= u^I\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}\right), \quad face \in \{L, B\}, \\
\mathbf{u}^{I,face}_{ele} &= u^I\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}\right), \quad face \in \{R, T\},
\end{aligned} \tag{6.44}
\]
where "eleN" and "faceN" refer to the neighboring element and face, respectively,
and the neighboring vector $\mathbf{u}^{faceN}_{eleN}$ is permuted appropriately to align with $\mathbf{u}^{face}_{ele}$.

Figure 6.1: A visual representation of a quadrilateral element in parent space for a polynomial order of P = 1. The solution points are marked by blue circles, the flux points are marked by red squares and left, right, bottom and top faces are represented by L, R, B, T, respectively. Left and right states in an interface flux are represented by − and +.

The LDG method [35] is used for the interface solution function,
\[
u^I(u_-, u_+) = \frac{1}{2}(u_- + u_+) - \boldsymbol{\beta} \cdot (u_- \mathbf{n}_- + u_+ \mathbf{n}_+), \tag{6.45}
\]
where $\mathbf{n}_-$ and $\mathbf{n}_+$ are unit normals, $\boldsymbol{\beta} = \pm 0.5\, \mathbf{n}_-$ recovers LDG [22] and $\boldsymbol{\beta} = 0$
recovers BR1 [14]. The common interface solutions at boundaries are computed using
the same technique as in Eq. (6.15),
\[
\mathbf{u}^{I,\partial\Omega}_{ele} = u^b\left(\mathbf{u}^{\partial\Omega}_{ele}\right), \tag{6.46}
\]
where $\mathbf{u}^{\partial\Omega}_{ele} = \mathbf{E}^{\partial\Omega}\, \mathbf{u}_{ele}$ and $\partial\Omega$ represents the appropriate face for the boundary.
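Following the 1D formulation in section 6.1.2 along each set of collinear points in
the quadrilateral, the continuous solution can now be constructed and differentiated
by using the common interface solution values at the element interfaces,
\[
\begin{aligned}
\frac{\partial u^C_{ele}}{\partial \xi}(\xi, \eta) &= \sum_{spt=1}^{N_{spts2D}} u_{spt,ele}\, \frac{\partial \psi_{spt}}{\partial \xi}(\xi, \eta) + \sum_{face \in \{L,R\}} \sum_{fpt=1}^{N_{spts1D}} u^{I,face}_{fpt,ele}\, \frac{\partial \psi^{face}_{fpt}}{\partial \xi}(\xi, \eta), \\
\frac{\partial u^C_{ele}}{\partial \eta}(\xi, \eta) &= \sum_{spt=1}^{N_{spts2D}} u_{spt,ele}\, \frac{\partial \psi_{spt}}{\partial \eta}(\xi, \eta) + \sum_{face \in \{B,T\}} \sum_{fpt=1}^{N_{spts1D}} u^{I,face}_{fpt,ele}\, \frac{\partial \psi^{face}_{fpt}}{\partial \eta}(\xi, \eta),
\end{aligned} \tag{6.47}
\]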
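where new sets of shape functions, $\psi(\xi, \eta)$, have been constructed from sets of 1D
Lagrange interpolating polynomials of degree $P + 2$ defined at $P + 3$ collocation points. The
gradient can be written in vector format as
\[
\begin{aligned}
\frac{\partial u^C_{ele}}{\partial \xi}(\xi, \eta) &= \frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta)'\, \mathbf{u}_{ele} + \sum_{face \in \{L,R\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \xi}(\xi, \eta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^C_{ele}}{\partial \eta}(\xi, \eta) &= \frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta)'\, \mathbf{u}_{ele} + \sum_{face \in \{B,T\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \eta}(\xi, \eta)'\, \mathbf{u}^{I,face}_{ele}.
\end{aligned} \tag{6.48}
\]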
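The polynomials $\tilde{\ell}_0(\xi)$, $\tilde{\ell}_0(\eta)$, $\tilde{\ell}_{P+2}(\xi)$, $\tilde{\ell}_{P+2}(\eta)$ are used along with the vectors $\boldsymbol{\ell}(\xi)'$,
$\boldsymbol{\ell}(\eta)'$, $\tilde{\boldsymbol{\ell}}(\xi)' = [\tilde{\ell}_1(\xi), \tilde{\ell}_2(\xi), \ldots, \tilde{\ell}_{N_{spts1D}}(\xi)]$ and $\tilde{\boldsymbol{\ell}}(\eta)' = [\tilde{\ell}_1(\eta), \tilde{\ell}_2(\eta), \ldots, \tilde{\ell}_{N_{spts1D}}(\eta)]$, to
construct the shape functions,
\[
\frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta)' = \boldsymbol{\ell}(\eta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)', \quad
\frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta)' = \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \eta}(\eta)' \otimes \boldsymbol{\ell}(\xi)', \tag{6.49}
\]
and
\[
\begin{aligned}
\frac{\partial \psi^L_m}{\partial \xi}(\xi, \eta) &= \ell_m(\eta)\, \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi), &
\frac{\partial \psi^R_m}{\partial \xi}(\xi, \eta) &= \ell_m(\eta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
\frac{\partial \psi^B_m}{\partial \eta}(\xi, \eta) &= \frac{\partial \tilde{\ell}_0}{\partial \eta}(\eta)\, \ell_m(\xi), &
\frac{\partial \psi^T_m}{\partial \eta}(\xi, \eta) &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta)\, \ell_m(\xi),
\end{aligned} \tag{6.50}
\]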
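for $m = 1, 2, \ldots, N_{spts1D}$. The numerical solution gradient in reference space can then
be computed by evaluating the equation at each solution point. The final result is
shown in matrix-vector format,
\[
\begin{aligned}
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} &= \mathbf{D}^L_\xi\, \mathbf{u}^{I,L}_{ele} + \mathbf{D}_\xi\, \mathbf{u}_{ele} + \mathbf{D}^R_\xi\, \mathbf{u}^{I,R}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \eta} &= \mathbf{D}^B_\eta\, \mathbf{u}^{I,B}_{ele} + \mathbf{D}_\eta\, \mathbf{u}_{ele} + \mathbf{D}^T_\eta\, \mathbf{u}^{I,T}_{ele},
\end{aligned} \tag{6.51}
\]
where the polynomial differentiation operators $\mathbf{D}_\xi, \mathbf{D}_\eta \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ and $\mathbf{D}^L_\xi$,
$\mathbf{D}^R_\xi$, $\mathbf{D}^B_\eta$, $\mathbf{D}^T_\eta \in \mathbb{R}^{N_{spts2D} \times N_{spts1D}}$ are defined as
\[
D_{\xi\, p,m} = \frac{\partial \psi_m}{\partial \xi}(\theta_p), \quad D_{\eta\, p,m} = \frac{\partial \psi_m}{\partial \eta}(\theta_p), \tag{6.52}
\]
for $p, m = 1, 2, \ldots, N_{spts2D}$,
\[
\begin{aligned}
D^{face}_{\xi\, p,m} &= \frac{\partial \psi^{face}_m}{\partial \xi}(\theta_p), \quad face \in \{L, R\}, \\
D^{face}_{\eta\, p,m} &= \frac{\partial \psi^{face}_m}{\partial \eta}(\theta_p), \quad face \in \{B, T\},
\end{aligned} \tag{6.53}
\]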
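for $p = 1, 2, \ldots, N_{spts2D}$ and $m = 1, 2, \ldots, N_{spts1D}$, and the set $\{\theta_1, \theta_2, \ldots, \theta_{N_{spts2D}}\}$ is
used to represent the set of solution points $\{(\xi_1, \eta_1), (\xi_2, \eta_1), \ldots, (\xi_{N_{spts1D}}, \eta_{N_{spts1D}})\}$.
Eq. (6.51) can now be combined with Eq. (6.38) to obtain the numerical solution
gradient in matrix-vector format,
\[
\begin{aligned}
\mathbf{q}_{x_{ele}} &= \mathbf{J}^{-1\prime}_{(x,\xi)_{ele}} \frac{\delta \mathbf{u}_{ele}}{\delta \xi} + \mathbf{J}^{-1\prime}_{(x,\eta)_{ele}} \frac{\delta \mathbf{u}_{ele}}{\delta \eta}, \\
\mathbf{q}_{y_{ele}} &= \mathbf{J}^{-1\prime}_{(y,\xi)_{ele}} \frac{\delta \mathbf{u}_{ele}}{\delta \xi} + \mathbf{J}^{-1\prime}_{(y,\eta)_{ele}} \frac{\delta \mathbf{u}_{ele}}{\delta \eta},
\end{aligned} \tag{6.54}
\]
where $\mathbf{J}^{-1\prime}_{(x,\xi)_{ele}}, \mathbf{J}^{-1\prime}_{(x,\eta)_{ele}}, \mathbf{J}^{-1\prime}_{(y,\xi)_{ele}}, \mathbf{J}^{-1\prime}_{(y,\eta)_{ele}} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ are the components of
$\mathbf{J}^{-1\prime}_{ele}$ represented as diagonal matrices. This operation can be summarized as the
block matrix-vector multiplication,
\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \nabla^\delta \mathbf{u}_{ele}. \tag{6.55}
\]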
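At a single solution point, Eq. (6.55) is a 2 by 2 linear transformation of the reference-space gradient. The sketch below assumes a 4-node (bilinear) quadrilateral with nodes ordered counter-clockwise starting from $(\xi, \eta) = (-1, -1)$; that shape-function choice and the names `jacobian` and `physical_gradient` are illustrative assumptions.

```python
def jacobian(quad, xi, eta):
    """J = d(x, y)/d(xi, eta) of a bilinear quadrilateral at (xi, eta).

    `quad` lists the four (x, y) nodes counter-clockwise from (-1, -1).
    Returns ((x_xi, x_eta), (y_xi, y_eta)).
    """
    dM_dxi = [-(1 - eta) / 4, (1 - eta) / 4, (1 + eta) / 4, -(1 + eta) / 4]
    dM_deta = [-(1 - xi) / 4, -(1 + xi) / 4, (1 + xi) / 4, (1 - xi) / 4]
    x_xi = sum(d * x for d, (x, _) in zip(dM_dxi, quad))
    x_eta = sum(d * x for d, (x, _) in zip(dM_deta, quad))
    y_xi = sum(d * y for d, (_, y) in zip(dM_dxi, quad))
    y_eta = sum(d * y for d, (_, y) in zip(dM_deta, quad))
    return (x_xi, x_eta), (y_xi, y_eta)


def physical_gradient(J, du_dxi, du_deta):
    """Apply the transposed inverse Jacobian to (du/dxi, du/deta)."""
    (x_xi, x_eta), (y_xi, y_eta) = J
    det = x_xi * y_eta - x_eta * y_xi
    q_x = (y_eta * du_dxi - y_xi * du_deta) / det
    q_y = (-x_eta * du_dxi + x_xi * du_deta) / det
    return q_x, q_y


# Hypothetical parallelogram element and the linear field u = 2x + 3y.
quad = [(0.0, 0.0), (2.0, 0.0), (3.0, 1.0), (1.0, 1.0)]
J = jacobian(quad, 0.3, -0.4)
du_dxi = 2.0 * J[0][0] + 3.0 * J[1][0]   # chain rule: u_x x_xi + u_y y_xi
du_deta = 2.0 * J[0][1] + 3.0 * J[1][1]  # chain rule: u_x x_eta + u_y y_eta
q_x, q_y = physical_gradient(J, du_dxi, du_deta)
```

The recovered gradient is $(2, 3)$, the physical gradient of the assumed field.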
The numerical gradient of the flux is evaluated by first extrapolating the
discontinuous numerical solution gradient to element interfaces,
\[
\mathbf{q}^{face}_{d_{ele}} = \mathbf{E}^{face}\, \mathbf{q}_{d_{ele}}, \quad face \in \{L, R, B, T\}, \quad d \in \{x, y\}, \tag{6.56}
\]
where each numerical solution derivative, $\mathbf{q}_{ele} = (\mathbf{q}_{x_{ele}}, \mathbf{q}_{y_{ele}})$, is extrapolated
independently. The transformed common interface fluxes that are normal to the element
faces are then computed by using the extrapolated discontinuous solution and solution
gradient on both sides of each interface as the left and right states in a common
interface function. The transformed common interface fluxes are written as
\[
\begin{aligned}
\mathbf{f}^{I,face}_{d_{ele}} &= \mathbf{A}^{face}_{ele}\, f^I\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}\right), \quad (d, face) \in \{(\xi, L), (\eta, B)\}, \\
\mathbf{f}^{I,face}_{d_{ele}} &= \mathbf{A}^{face}_{ele}\, f^I\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}\right), \quad (d, face) \in \{(\xi, R), (\eta, T)\},
\end{aligned} \tag{6.57}
\]
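where $\mathbf{A}^L_{ele}, \mathbf{A}^R_{ele}, \mathbf{A}^B_{ele}, \mathbf{A}^T_{ele} \in \mathbb{R}^{N_{spts1D} \times N_{spts1D}}$ are diagonal matrices that transform
the common interface fluxes. The interface transformation matrices are defined as
\[
A^{face}_{p,p,ele} = \left|\mathbf{J}^{face}_{p,ele}\right| \left| \left(\mathbf{J}^{face}_{p,ele}\right)^{-1\prime} \mathbf{n}^{face} \right|, \quad face \in \{L, R, B, T\}, \tag{6.58}
\]
where $\mathbf{J}^L_{p,ele}, \mathbf{J}^R_{p,ele}, \mathbf{J}^B_{p,ele}, \mathbf{J}^T_{p,ele}$ are the geometric element Jacobian matrices evaluated
at the $p$th flux point, $\mathbf{n}^L, \mathbf{n}^R, \mathbf{n}^B, \mathbf{n}^T$ are the unit normals in parent space and
$p = 1, 2, \ldots, N_{spts1D}$. The Rusanov flux used for the common interface function from
Eq. (6.25) now becomes
\[
f^I_{adv}(u_-, u_+) = \frac{1}{2}\left(f^n_{adv}(u_+) + f^n_{adv}(u_-)\right) - \frac{1}{2} |\lambda(u_-, u_+)| (u_+ - u_-), \tag{6.59}
\]
where $f^n_{adv}(u) = \mathbf{f}_{adv}(u) \cdot \mathbf{n}_-$ is the advective flux normal to the face and
\[
|\lambda(u_-, u_+)| = \max\left( \left|\frac{\partial f^n_{adv}}{\partial u}(u_+)\right|, \left|\frac{\partial f^n_{adv}}{\partial u}(u_-)\right| \right), \tag{6.60}
\]
where $\frac{\partial f^n_{adv}}{\partial u}(u)$ is the wavespeed of the normal advective flux, that is, the derivative of the
normal advective flux with respect to the solution. For the interface diffusive flux
function, the LDG flux can be reduced to the following form
\[
\begin{aligned}
f^I_{diff}(u_-, u_+, \mathbf{q}_-, \mathbf{q}_+) = {} & \frac{1}{2}\left(f^n_{diff}(u_-, \mathbf{q}_-) + f^n_{diff}(u_+, \mathbf{q}_+)\right) \\
& + \boldsymbol{\beta} \cdot \mathbf{n}_- \left(\mathbf{f}_{diff}(u_-, \mathbf{q}_-) \cdot \mathbf{n}_- + \mathbf{f}_{diff}(u_+, \mathbf{q}_+) \cdot \mathbf{n}_+\right) \\
& + \tau\, (u_- - u_+),
\end{aligned} \tag{6.61}
\]
where $f^n_{diff}(u, \mathbf{q}) = \mathbf{f}_{diff}(u, \mathbf{q}) \cdot \mathbf{n}_-$ is the diffusive flux normal to the face. Lastly, the
transformed common interface fluxes at boundaries are computed as,
\[
\mathbf{f}^{I,\partial\Omega}_{ele} = \mathbf{A}^{\partial\Omega}_{ele}\, f^b\left(\mathbf{u}^{\partial\Omega}_{ele}, \mathbf{q}^{\partial\Omega}_{ele}\right), \tag{6.62}
\]
where $\mathbf{q}^{\partial\Omega}_{ele} = \mathbf{E}^{\partial\Omega}\, \mathbf{q}_{ele}$ for each dimension and $\partial\Omega$ represents the appropriate face for
the boundary.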
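The diagonal entries of Eq. (6.58) can be evaluated point by point. The following sketch assumes the parent-space unit normals $\mathbf{n}^L = (-1, 0)$, $\mathbf{n}^R = (1, 0)$, $\mathbf{n}^B = (0, -1)$ and $\mathbf{n}^T = (0, 1)$, and takes the Jacobian in the form $((x_\xi, x_\eta), (y_\xi, y_\eta))$; the function name `interface_scale` is hypothetical.

```python
import math


def interface_scale(J, n_parent):
    """Evaluate |J| * |J^{-T} n| from Eq. (6.58) at one flux point."""
    (x_xi, x_eta), (y_xi, y_eta) = J
    det = x_xi * y_eta - x_eta * y_xi
    n_xi, n_eta = n_parent
    # Components of the transposed inverse Jacobian applied to the parent normal.
    v_x = (y_eta * n_xi - y_xi * n_eta) / det
    v_y = (-x_eta * n_xi + x_xi * n_eta) / det
    return abs(det) * math.hypot(v_x, v_y)


# Hypothetical axis-aligned rectangle of width 4 and height 2.
J_rect = ((2.0, 0.0), (0.0, 1.0))
A_left = interface_scale(J_rect, (-1.0, 0.0))
A_bottom = interface_scale(J_rect, (0.0, -1.0))
```

For this rectangle the left face scale is 1 and the bottom face scale is 2, which is half the physical face length in each case, i.e. the ratio of the physical face length to the parent face length of 2.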
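Using each set of collinear points in the quadrilateral, the transformed continuous
fluxes are constructed in each element and differentiated to form the numerical
derivatives in matrix-vector format,
\[
\begin{aligned}
\frac{\delta \mathbf{f}_{\xi_{ele}}}{\delta \xi} &= \mathbf{D}^L_\xi\, \mathbf{f}^{I,L}_{\xi_{ele}} + \mathbf{D}_\xi\, \mathbf{f}_{\xi_{ele}} + \mathbf{D}^R_\xi\, \mathbf{f}^{I,R}_{\xi_{ele}}, \\
\frac{\delta \mathbf{f}_{\eta_{ele}}}{\delta \eta} &= \mathbf{D}^B_\eta\, \mathbf{f}^{I,B}_{\eta_{ele}} + \mathbf{D}_\eta\, \mathbf{f}_{\eta_{ele}} + \mathbf{D}^T_\eta\, \mathbf{f}^{I,T}_{\eta_{ele}},
\end{aligned} \tag{6.63}
\]
where
\[
\begin{aligned}
\mathbf{f}_{\xi_{ele}} &= \mathbf{J}^{-1}_{(\xi,x)_{ele}}\, f_x(\mathbf{u}_{ele}, \mathbf{q}_{ele}) + \mathbf{J}^{-1}_{(\xi,y)_{ele}}\, f_y(\mathbf{u}_{ele}, \mathbf{q}_{ele}), \\
\mathbf{f}_{\eta_{ele}} &= \mathbf{J}^{-1}_{(\eta,x)_{ele}}\, f_x(\mathbf{u}_{ele}, \mathbf{q}_{ele}) + \mathbf{J}^{-1}_{(\eta,y)_{ele}}\, f_y(\mathbf{u}_{ele}, \mathbf{q}_{ele}),
\end{aligned} \tag{6.64}
\]
and $\mathbf{J}^{-1}_{(\xi,x)_{ele}}, \mathbf{J}^{-1}_{(\xi,y)_{ele}}, \mathbf{J}^{-1}_{(\eta,x)_{ele}}, \mathbf{J}^{-1}_{(\eta,y)_{ele}} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ are the components of
$|\mathbf{J}_{ele}|\, \mathbf{J}^{-1}_{ele}$ represented as diagonal matrices. This can be expressed more compactly in
block matrix-vector form as
\[
\mathbf{f}_{ele} = \mathbf{J}^{-1}_{ele}\, \mathbf{f}(\mathbf{u}_{ele}, \mathbf{q}_{ele}). \tag{6.65}
\]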
Eq. (6.37) can now be written in a matrix-vector format as
\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\mathbf{V}^{-1}_{ele}\, \nabla^\delta \cdot \mathbf{f}_{ele}, \tag{6.66}
\]
where $\mathbf{V}^{-1}_{ele} \in \mathbb{R}^{N_{spts2D} \times N_{spts2D}}$ is a diagonal matrix defined as
\[
V^{-1}_{p,p,ele} = \frac{1}{|\mathbf{J}_{p,ele}|}, \quad p = 1, 2, \ldots, N_{spts2D}. \tag{6.67}
\]
6.3 Three-Dimensional Formulation for Advection-
Diffusion
Much like the two-dimensional formulation, the three-dimensional formulation of the
DFR method on hexahedral elements for advection-diffusion is a direct extension of
the one-dimensional formulation. This section serves to summarize these steps for
clarity.
6.3.1 Problem Specification
Following directly from the definitions provided in the problem specification in section
6.2.1, the domain is extended to include the $z$-dimension, $(x, y, z) \in \Omega$, so that
$u = u(x, y, z, t)$, $\mathbf{q} = (q_x, q_y, q_z)$, and $\mathbf{f} = (f_x, f_y, f_z)$. Each hexahedral element in the
partitioned physical domain $(x, y, z)$ is then mapped to a reference element in the
transformed parent space $(\xi, \eta, \zeta)$ so that,
\[
\begin{pmatrix} x \\ y \\ z \end{pmatrix} = \Gamma_{ele}(\xi, \eta, \zeta) = \sum_{npt=1}^{N_{npts}} M_{npt}(\xi, \eta, \zeta) \begin{pmatrix} x_{npt,ele} \\ y_{npt,ele} \\ z_{npt,ele} \end{pmatrix}, \tag{6.68}
\]
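where $M_{npt}(\xi, \eta, \zeta)$ are the element shape functions and $N_{npts}$ is the number of points
used to define the physical space element. The system of equations resulting from
applying this transformation to the three-dimensional advection-diffusion equation
becomes
\[
\frac{\partial u^\delta_{ele}}{\partial t} + \nabla \cdot \mathbf{f}^\delta_{ele} = 0, \tag{6.69}
\]
\[
\mathbf{q}^\delta_{ele} - \nabla u^\delta_{ele} = 0, \tag{6.70}
\]
where $\nabla = \left(\frac{\partial}{\partial \xi}, \frac{\partial}{\partial \eta}, \frac{\partial}{\partial \zeta}\right)$ is here the gradient in parent space, $\mathbf{q}_{ele} = (q_{\xi_{ele}}, q_{\eta_{ele}}, q_{\zeta_{ele}})$, $\mathbf{f}_{ele} = (f_{\xi_{ele}}, f_{\eta_{ele}}, f_{\zeta_{ele}})$ and
the transformed quantities on the left-hand sides below are related to the physical
quantities on the right-hand sides, evaluated at $\Gamma_{ele}(\xi, \eta, \zeta)$, by
\[
\begin{aligned}
u_{ele} &= |\mathbf{J}_{ele}|\, u_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t), \\
\mathbf{q}_{ele} &= \mathbf{J}'_{ele}\, \mathbf{q}_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t), \\
\mathbf{f}_{ele} &= |\mathbf{J}_{ele}|\, \mathbf{J}^{-1}_{ele}\, \mathbf{f}_{ele}(\Gamma_{ele}(\xi, \eta, \zeta), t).
\end{aligned}
\]
The Jacobian matrix of the element shape functions, $\mathbf{J}_{ele}$, its transpose, $\mathbf{J}'_{ele}$, inverse,
$\mathbf{J}^{-1}_{ele}$, and determinant, $|\mathbf{J}_{ele}|$, come directly from Eq. (6.68). In what follows, the
superscript $\delta$ used to differentiate the numerical solution, solution gradient and fluxes
from their exact values will be dropped to improve readability.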
6.3.2 Direct Flux Reconstruction
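The semi-discrete form of the three-dimensional advection-diffusion equation is formed
by rearranging Eq. (6.69),
\[
\frac{\partial u_{ele}}{\partial t} = -\frac{1}{|\mathbf{J}_{ele}|} \nabla^\delta \cdot \mathbf{f}_{ele}, \tag{6.71}
\]
\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \nabla^\delta u_{ele}, \tag{6.72}
\]
where $\nabla^\delta = \left(\frac{\delta}{\delta \xi}, \frac{\delta}{\delta \eta}, \frac{\delta}{\delta \zeta}\right)$. Much like the DFR method for quadrilateral elements, the
DFR method for hexahedral elements is a direct extension of the 1D method. Each
element is discretized into $N_{spts} = (P + 1)^3$ distinct Gauss-Legendre solution points
defined by the sets $\{\xi_1, \xi_2, \ldots, \xi_{N_{spts1D}}\}$, $\{\eta_1, \eta_2, \ldots, \eta_{N_{spts1D}}\}$ and $\{\zeta_1, \zeta_2, \ldots, \zeta_{N_{spts1D}}\}$
where $\xi_i$ is chosen as the most rapidly changing index followed by $\eta_j$. The
discontinuous solution is defined in vector format as
\[
u_{ele}(\xi, \eta, \zeta) = \boldsymbol{\phi}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele}, \tag{6.73}
\]
where the vector $\boldsymbol{\phi}(\xi, \eta, \zeta)' \in \mathbb{R}^{1 \times N_{spts}}$ is defined by the Kronecker product between
the vectors $\boldsymbol{\ell}(\xi)' = [\ell_1(\xi), \ell_2(\xi), \ldots, \ell_{N_{spts1D}}(\xi)]$, $\boldsymbol{\ell}(\eta)' = [\ell_1(\eta), \ell_2(\eta), \ldots, \ell_{N_{spts1D}}(\eta)]$
and $\boldsymbol{\ell}(\zeta)' = [\ell_1(\zeta), \ell_2(\zeta), \ldots, \ell_{N_{spts1D}}(\zeta)]$,
\[
\boldsymbol{\phi}(\xi, \eta, \zeta)' = \boldsymbol{\ell}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)'. \tag{6.74}
\]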
The discontinuous solution is extrapolated to each of the $N_{faces} = 6$ faces of
the hexahedral element, creating $N_{spts2D} = (P + 1)^2$ flux points per face or
$N_{fpts} = N_{faces}\, N_{spts2D}$ distinct flux points per hexahedral element. Using Eq. (6.73), the
extrapolated values in each element are written as,
\[
\mathbf{u}^{face}_{ele} = \mathbf{E}^{face}\, \mathbf{u}_{ele}, \quad face \in \{L, R, F, Bk, B, T\}, \tag{6.75}
\]
where the extrapolated discontinuous solution vectors are of size $(N_{spts2D} \times 1)$, the
polynomial extrapolation operators are of size $(N_{spts2D} \times N_{spts})$ and $L, R, F, Bk, B, T$
correspond to the left, right, front, back, bottom and top boundaries of the element,
respectively. The polynomial extrapolation operators are defined as,
\[
\begin{aligned}
E^L_{p,m} &= \phi_m(-1, \eta_i, \zeta_j), & E^R_{p,m} &= \phi_m(+1, \eta_i, \zeta_j), \\
E^F_{p,m} &= \phi_m(\xi_i, -1, \zeta_j), & E^{Bk}_{p,m} &= \phi_m(\xi_i, +1, \zeta_j), \\
E^B_{p,m} &= \phi_m(\xi_i, \eta_j, -1), & E^T_{p,m} &= \phi_m(\xi_i, \eta_j, +1),
\end{aligned} \tag{6.76}
\]
where $i, j = 1, 2, \ldots, N_{spts1D}$, $p = (j - 1)\, N_{spts1D} + i$ and $m = 1, 2, \ldots, N_{spts}$.
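Following the same procedure described in section 6.2.2, the common interface
solution values, including those on boundaries, can be computed,
\[
\begin{aligned}
\mathbf{u}^{I,face}_{ele} &= u^I\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}\right), \quad face \in \{L, F, B\}, \\
\mathbf{u}^{I,face}_{ele} &= u^I\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}\right), \quad face \in \{R, Bk, T\}, \\
\mathbf{u}^{I,\partial\Omega}_{ele} &= u^b\left(\mathbf{u}^{\partial\Omega}_{ele}\right).
\end{aligned} \tag{6.77}
\]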
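The gradient of the continuous solution can be constructed along each set of
collinear points in the hexahedron. This can be written in vector format as
\[
\begin{aligned}
\frac{\partial u^C_{ele}}{\partial \xi}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{L,R\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \xi}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^C_{ele}}{\partial \eta}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{F,Bk\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \eta}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}, \\
\frac{\partial u^C_{ele}}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \boldsymbol{\psi}}{\partial \zeta}(\xi, \eta, \zeta)'\, \mathbf{u}_{ele} + \sum_{face \in \{B,T\}} \frac{\partial \boldsymbol{\psi}^{face}}{\partial \zeta}(\xi, \eta, \zeta)'\, \mathbf{u}^{I,face}_{ele}.
\end{aligned} \tag{6.78}
\]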
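The polynomials $\tilde{\ell}_0(\xi)$, $\tilde{\ell}_0(\eta)$, $\tilde{\ell}_0(\zeta)$, $\tilde{\ell}_{P+2}(\xi)$, $\tilde{\ell}_{P+2}(\eta)$, $\tilde{\ell}_{P+2}(\zeta)$ are used along with the
vectors $\boldsymbol{\ell}(\xi)'$, $\boldsymbol{\ell}(\eta)'$, $\boldsymbol{\ell}(\zeta)'$, $\tilde{\boldsymbol{\ell}}(\xi)'$, $\tilde{\boldsymbol{\ell}}(\eta)'$ and $\tilde{\boldsymbol{\ell}}(\zeta)'$, to construct the shape functions,
\[
\begin{aligned}
\frac{\partial \boldsymbol{\psi}}{\partial \xi}(\xi, \eta, \zeta)' &= \boldsymbol{\ell}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \xi}(\xi)', \\
\frac{\partial \boldsymbol{\psi}}{\partial \eta}(\xi, \eta, \zeta)' &= \boldsymbol{\ell}(\zeta)' \otimes \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \eta}(\eta)' \otimes \boldsymbol{\ell}(\xi)', \\
\frac{\partial \boldsymbol{\psi}}{\partial \zeta}(\xi, \eta, \zeta)' &= \frac{\partial \tilde{\boldsymbol{\ell}}}{\partial \zeta}(\zeta)' \otimes \boldsymbol{\ell}(\eta)' \otimes \boldsymbol{\ell}(\xi)',
\end{aligned} \tag{6.79}
\]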
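and
\[
\begin{aligned}
\frac{\partial \psi^L_m}{\partial \xi}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \ell_i(\eta)\, \frac{\partial \tilde{\ell}_0}{\partial \xi}(\xi), &
\frac{\partial \psi^R_m}{\partial \xi}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \ell_i(\eta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
\frac{\partial \psi^F_m}{\partial \eta}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \frac{\partial \tilde{\ell}_0}{\partial \eta}(\eta)\, \ell_i(\xi), &
\frac{\partial \psi^{Bk}_m}{\partial \eta}(\xi, \eta, \zeta) &= \ell_j(\zeta)\, \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta)\, \ell_i(\xi), \\
\frac{\partial \psi^B_m}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \tilde{\ell}_0}{\partial \zeta}(\zeta)\, \ell_j(\eta)\, \ell_i(\xi), &
\frac{\partial \psi^T_m}{\partial \zeta}(\xi, \eta, \zeta) &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \zeta}(\zeta)\, \ell_j(\eta)\, \ell_i(\xi),
\end{aligned} \tag{6.80}
\]
for $i, j = 1, 2, \ldots, N_{spts1D}$ and $m = (j - 1)\, N_{spts1D} + i$. Evaluating the equation at each
solution point and rearranging the results in a matrix-vector format produces the
numerical solution gradient in reference space,
\[
\begin{aligned}
\frac{\delta \mathbf{u}_{ele}}{\delta \xi} &= \mathbf{D}^L_\xi\, \mathbf{u}^{I,L}_{ele} + \mathbf{D}_\xi\, \mathbf{u}_{ele} + \mathbf{D}^R_\xi\, \mathbf{u}^{I,R}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \eta} &= \mathbf{D}^F_\eta\, \mathbf{u}^{I,F}_{ele} + \mathbf{D}_\eta\, \mathbf{u}_{ele} + \mathbf{D}^{Bk}_\eta\, \mathbf{u}^{I,Bk}_{ele}, \\
\frac{\delta \mathbf{u}_{ele}}{\delta \zeta} &= \mathbf{D}^B_\zeta\, \mathbf{u}^{I,B}_{ele} + \mathbf{D}_\zeta\, \mathbf{u}_{ele} + \mathbf{D}^T_\zeta\, \mathbf{u}^{I,T}_{ele},
\end{aligned} \tag{6.81}
\]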
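where the polynomial differentiation operators on solution and flux points are of size
$(N_{spts} \times N_{spts})$ and $(N_{spts} \times N_{spts2D})$, respectively. They are defined as
\[
D_{\xi\, p,m} = \frac{\partial \psi_m}{\partial \xi}(\theta_p), \quad D_{\eta\, p,m} = \frac{\partial \psi_m}{\partial \eta}(\theta_p), \quad D_{\zeta\, p,m} = \frac{\partial \psi_m}{\partial \zeta}(\theta_p), \tag{6.82}
\]
for $p, m = 1, 2, \ldots, N_{spts}$,
\[
\begin{aligned}
D^{face}_{\xi\, p,m} &= \frac{\partial \psi^{face}_m}{\partial \xi}(\theta_p), \quad face \in \{L, R\}, \\
D^{face}_{\eta\, p,m} &= \frac{\partial \psi^{face}_m}{\partial \eta}(\theta_p), \quad face \in \{F, Bk\}, \\
D^{face}_{\zeta\, p,m} &= \frac{\partial \psi^{face}_m}{\partial \zeta}(\theta_p), \quad face \in \{B, T\},
\end{aligned} \tag{6.83}
\]
for $p = 1, 2, \ldots, N_{spts}$ and $m = 1, 2, \ldots, N_{spts2D}$, and the set $\{\theta_1, \theta_2, \ldots, \theta_{N_{spts}}\}$ is used
to represent the set of solution points
$\{(\xi_1, \eta_1, \zeta_1), (\xi_2, \eta_1, \zeta_1), \ldots, (\xi_{N_{spts1D}}, \eta_{N_{spts1D}}, \zeta_{N_{spts1D}})\}$. Eq. (6.81) can now be
combined with Eq. (6.72) to obtain the numerical solution gradient in block matrix-vector
format,
\[
\mathbf{q}_{ele} = \mathbf{J}^{-1\prime}_{ele}\, \nabla^\delta \mathbf{u}_{ele}. \tag{6.84}
\]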
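Next, the numerical solution gradient can be extrapolated using the same technique
as before,
\[
\mathbf{q}^{face}_{d_{ele}} = \mathbf{E}^{face}\, \mathbf{q}_{d_{ele}}, \quad face \in \{L, R, F, Bk, B, T\}, \quad d \in \{x, y, z\}, \tag{6.85}
\]
where each numerical solution derivative, $\mathbf{q}_{ele} = (\mathbf{q}_{x_{ele}}, \mathbf{q}_{y_{ele}}, \mathbf{q}_{z_{ele}})$, is extrapolated using
the same polynomial extrapolation operators. The transformed common interface
fluxes and the boundary fluxes are then computed as
\[
\begin{aligned}
\mathbf{f}^{I,face}_{d_{ele}} &= \mathbf{A}^{face}_{ele}\, f^I\left(\mathbf{u}^{faceN}_{eleN}, \mathbf{u}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}\right), \quad (d, face) \in \{(\xi, L), (\eta, F), (\zeta, B)\}, \\
\mathbf{f}^{I,face}_{d_{ele}} &= \mathbf{A}^{face}_{ele}\, f^I\left(\mathbf{u}^{face}_{ele}, \mathbf{u}^{faceN}_{eleN}, \mathbf{q}^{face}_{ele}, \mathbf{q}^{faceN}_{eleN}\right), \quad (d, face) \in \{(\xi, R), (\eta, Bk), (\zeta, T)\}, \\
\mathbf{f}^{I,\partial\Omega}_{ele} &= \mathbf{A}^{\partial\Omega}_{ele}\, f^b\left(\mathbf{u}^{\partial\Omega}_{ele}, \mathbf{q}^{\partial\Omega}_{ele}\right),
\end{aligned} \tag{6.86}
\]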
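using the same techniques described in section 6.2.2. The numerical derivatives of
the transformed fluxes can then be expressed in matrix-vector format as
\[
\begin{aligned}
\frac{\delta \mathbf{f}_{\xi_{ele}}}{\delta \xi} &= \mathbf{D}^L_\xi\, \mathbf{f}^{I,L}_{\xi_{ele}} + \mathbf{D}_\xi\, \mathbf{f}_{\xi_{ele}} + \mathbf{D}^R_\xi\, \mathbf{f}^{I,R}_{\xi_{ele}}, \\
\frac{\delta \mathbf{f}_{\eta_{ele}}}{\delta \eta} &= \mathbf{D}^F_\eta\, \mathbf{f}^{I,F}_{\eta_{ele}} + \mathbf{D}_\eta\, \mathbf{f}_{\eta_{ele}} + \mathbf{D}^{Bk}_\eta\, \mathbf{f}^{I,Bk}_{\eta_{ele}}, \\
\frac{\delta \mathbf{f}_{\zeta_{ele}}}{\delta \zeta} &= \mathbf{D}^B_\zeta\, \mathbf{f}^{I,B}_{\zeta_{ele}} + \mathbf{D}_\zeta\, \mathbf{f}_{\zeta_{ele}} + \mathbf{D}^T_\zeta\, \mathbf{f}^{I,T}_{\zeta_{ele}},
\end{aligned} \tag{6.87}
\]
where
\[
\mathbf{f}_{ele} = \mathbf{J}^{-1}_{ele}\, \mathbf{f}(\mathbf{u}_{ele}, \mathbf{q}_{ele}). \tag{6.88}
\]
Eq. (6.71) can now be written in a matrix-vector format as
\[
\frac{\partial \mathbf{u}_{ele}}{\partial t} = -\mathbf{V}^{-1}_{ele}\, \nabla^\delta \cdot \mathbf{f}_{ele}. \tag{6.89}
\]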
6.4 Extension to Fluid Flow
This section describes how the DFR method is applied to the Euler and Navier-Stokes
equations.
6.4.1 The Euler Equations
In conservative form, the unsteady Euler equations are written as
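\[ \frac{\partial U}{\partial t} + \nabla \cdot F = 0, \tag{6.90} \]
where $F = (F_x, F_y)$,
\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \quad
F_x = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ (e + p) u \end{bmatrix}, \quad
F_y = \begin{bmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ (e + p) v \end{bmatrix},
\tag{6.91}
\]
in the two-dimensional case and $F = (F_x, F_y, F_z)$,
\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ e \end{bmatrix}, \quad
F_x = \begin{bmatrix} \rho u \\ \rho u^2 + p \\ \rho u v \\ \rho u w \\ (e + p) u \end{bmatrix}, \quad
F_y = \begin{bmatrix} \rho v \\ \rho v u \\ \rho v^2 + p \\ \rho v w \\ (e + p) v \end{bmatrix}, \quad
F_z = \begin{bmatrix} \rho w \\ \rho w u \\ \rho w v \\ \rho w^2 + p \\ (e + p) w \end{bmatrix},
\tag{6.92}
\]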
in the three-dimensional case. The equations are defined by the primitive variables
where ρ is density, u, v, w are the velocity components in the x, y, z directions,
respectively, and e is total energy per unit volume. The pressure is determined from
the equation of state,
\[ p = (\gamma - 1)\left(e - \frac{1}{2}\rho |V|^2\right), \tag{6.93} \]
where γ is the ratio of specific heats and |V | is the magnitude of velocity.
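As a minimal sketch of Eq. (6.93), the pressure can be recovered from the conservative variables as shown below. The function name `pressure`, its argument layout and the default value `gamma=1.4` (air) are assumptions made for illustration and do not come from the dissertation's code.

```python
def pressure(rho, momentum, e, gamma=1.4):
    """Pressure from conservative variables, following Eq. (6.93).

    rho      : density
    momentum : tuple of momentum components (rho*u, rho*v[, rho*w])
    e        : total energy per unit volume
    gamma    : ratio of specific heats (1.4 is an assumed default for air)
    """
    # |V|^2 = (rho*V . rho*V) / rho^2
    speed_sq = sum(m * m for m in momentum) / (rho * rho)
    return (gamma - 1.0) * (e - 0.5 * rho * speed_sq)


# Fluid at rest: all of the total energy is internal energy.
p_rest = pressure(1.0, (0.0, 0.0), 2.5)
# Moving fluid: the kinetic energy is removed before applying (gamma - 1).
p_moving = pressure(2.0, (2.0, 0.0), 3.5)
```

The same function covers two and three dimensions because the momentum components are passed as a tuple of arbitrary length.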
6.4.2 The Navier-Stokes Equations
In conservative form, the unsteady Navier-Stokes equations are written as
\[ \frac{\partial U}{\partial t} + \nabla \cdot F = 0, \tag{6.94} \]
where $F = (F_x, F_y)$ in two dimensions, $F = (F_x, F_y, F_z)$ in three dimensions and
\[ F_d = F_{d,inv} - F_{d,visc}, \quad d \in \{x, y, z\}. \tag{6.95} \]
The solution vector and inviscid flux are defined by the Euler equations in equations (6.91) and (6.92) for two dimensions and three dimensions, respectively. The viscous flux is written in two dimensions as
\[
F_{x,visc} = \begin{bmatrix} 0 \\ \sigma_{xx} \\ \sigma_{xy} \\ u\sigma_{xx} + v\sigma_{yx} - q_x \end{bmatrix}, \quad
F_{y,visc} = \begin{bmatrix} 0 \\ \sigma_{yx} \\ \sigma_{yy} \\ u\sigma_{xy} + v\sigma_{yy} - q_y \end{bmatrix},
\tag{6.96}
\]
while the three dimensional form is written as
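\[
F_{x,visc} = \begin{bmatrix} 0 \\ \sigma_{xx} \\ \sigma_{xy} \\ \sigma_{xz} \\ u\sigma_{xx} + v\sigma_{yx} + w\sigma_{zx} - q_x \end{bmatrix}, \quad
F_{y,visc} = \begin{bmatrix} 0 \\ \sigma_{yx} \\ \sigma_{yy} \\ \sigma_{yz} \\ u\sigma_{xy} + v\sigma_{yy} + w\sigma_{zy} - q_y \end{bmatrix}, \quad
F_{z,visc} = \begin{bmatrix} 0 \\ \sigma_{zx} \\ \sigma_{zy} \\ \sigma_{zz} \\ u\sigma_{xz} + v\sigma_{yz} + w\sigma_{zz} - q_z \end{bmatrix}.
\tag{6.97}
\]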
For a Newtonian fluid, the viscous stresses are defined in Einstein notation as
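\[ \sigma_{ij} = \mu\left(\frac{\partial u_i}{\partial x_j} + \frac{\partial u_j}{\partial x_i}\right) - \frac{2}{3}\mu\,\delta_{ij}\frac{\partial u_k}{\partial x_k}, \tag{6.98} \]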
and the heat fluxes are
\[ q_i = -k\frac{\partial T}{\partial x_i}, \tag{6.99} \]
where k = Cpµ/Pr, T = p/(ρR), Pr is the Prandtl number, Cp is the specific heat at
constant pressure, R is the gas constant and µ is the dynamic viscosity. In this work,
the dynamic viscosity is treated as a constant.
6.4.3 Direct Flux Reconstruction
The DFR method can be directly applied to the two-dimensional and three-dimensional,
Euler and Navier-Stokes equations using the same techniques described in section 6.2
and section 6.3, respectively. In the case of the Euler equations, a second order term
doesn’t exist and the construction of a numerical solution gradient and diffusive flux
can be avoided.
The transformed semi-discrete equation can be written as,
\[ \frac{\partial U_{ele}}{\partial t} = -V^{-1}_{ele} \nabla^{\delta} \cdot F_{ele}, \tag{6.100} \]
where the vector $U_{ele} = [u'_{1,ele}, u'_{2,ele}, \ldots, u'_{N_{vars},ele}]'$ and $N_{vars}$ is the number of conservative variables in the equation. All DFR operators become block diagonal matrices and act on each conservation equation separately. For example, the extrapolation and differentiation operations are carried out on each variable. Also, the wavespeed used for the Rusanov flux changes to,
\[ |\lambda(U^-, U^+)| = \max\left(|V_n(U^+)| + c(U^+),\; |V_n(U^-)| + c(U^-)\right), \tag{6.101} \]
where $V_n(U)$ is the velocity normal to the face and $c(U)$ is the speed of sound. The boundary functions used at the boundary faces depend on the problem, and a list of the boundary conditions used is given in appendix A.
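The wavespeed in Eq. (6.101) can be sketched as follows. The state layout `(rho, rho*u, ..., e)`, the function names and the default `gamma=1.4` are illustrative assumptions; the speed of sound uses the standard ideal-gas relation c = sqrt(gamma p / rho), which is not written out in this section.

```python
import math


def normal_velocity_and_sound_speed(U, normal, gamma=1.4):
    """Return (V_n, c) for a conservative state U = (rho, rho*u, ..., e)."""
    rho = U[0]
    momentum = U[1:-1]
    e = U[-1]
    # Velocity component normal to the face.
    v_n = sum(m * n for m, n in zip(momentum, normal)) / rho
    # Pressure from the equation of state, Eq. (6.93).
    speed_sq = sum(m * m for m in momentum) / (rho * rho)
    p = (gamma - 1.0) * (e - 0.5 * rho * speed_sq)
    # Ideal-gas speed of sound (assumed relation, see lead-in).
    c = math.sqrt(gamma * p / rho)
    return v_n, c


def rusanov_wavespeed(U_minus, U_plus, normal, gamma=1.4):
    """Maximum of |V_n| + c over the two interface states, Eq. (6.101)."""
    vn_m, c_m = normal_velocity_and_sound_speed(U_minus, normal, gamma)
    vn_p, c_p = normal_velocity_and_sound_speed(U_plus, normal, gamma)
    return max(abs(vn_p) + c_p, abs(vn_m) + c_m)


# Two 2D states sharing a face whose unit normal points in +x.
lam = rusanov_wavespeed((1.0, 0.0, 0.0, 2.5), (1.0, 1.0, 0.0, 3.0), (1.0, 0.0))
```

Both states in the usage line have unit pressure and density, so the larger wavespeed comes from the state with the nonzero normal velocity.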
Chapter 7
Time Stepping Schemes
Following directly from Eq. (6.100), consider the construction of a global, transformed semi-discrete equation,
\[ \frac{\partial U}{\partial t} = -V^{-1} \nabla^{\delta} \cdot F, \tag{7.1} \]
where $U$ and $\nabla^{\delta} \cdot F$ are vectors containing the solution and the divergence of the flux in all elements. The fully discrete equation is obtained by substituting the exact time derivative term with the numerical time derivative,
\[ \frac{\delta U}{\delta t} = R(U), \tag{7.2} \]
where $R(U)$ is the residual,
\[ R(U) = -V^{-1} \nabla^{\delta} \cdot F. \tag{7.3} \]
A numerical time stepping method can be applied to Eq. (7.2) using the methods
described in this chapter.
The chapter is split into three sections. The first section discusses the explicit
and implicit Runge-Kutta (RK) methods used in this part of the dissertation. The
second section gives a description of pseudo time stepping for steady-state problems
and block iterative methods for linear solvers. The last section gives an overview of
dual time stepping for diagonally implicit Runge-Kutta (DIRK) methods.
7.1 Time Accurate Schemes
Consider the fully discrete equation described in Eq. (7.2). The unsteady solution
of this equation requires a time accurate method which can preserve the order of
accuracy from the spatial discretization. In order to accomplish this goal, Runge-
Kutta (RK) methods are often used to achieve higher temporal accuracy.
RK methods are often described in the following form,
\[ U^{n+1} = U^{n} + \Delta t^{n} \sum_{s=1}^{N_{stages}} b_s R_s, \tag{7.4} \]
where $U^{n}$ is the solution at the current time step $n$, $U^{n+1}$ is the solution at the next time step, $\Delta t^{n}$ is the numerical time step, $b_s$ are the weights of the residual, $R_s$ are the residuals evaluated at each stage $s$ and $N_{stages}$ is the number of stages for the method.
7.1.1 Explicit Runge-Kutta
In this work, an explicit, four stage, fourth order RK scheme (RK44) is used to update
the solution,
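\[
\begin{aligned}
R_1 &= R(U^{n}), \\
R_2 &= R(U^{n} + a_{2,1}\,\Delta t^{n} R_1), \\
R_3 &= R(U^{n} + a_{3,2}\,\Delta t^{n} R_2), \\
R_4 &= R(U^{n} + a_{4,3}\,\Delta t^{n} R_3), \\
U^{n+1} &= U^{n} + \Delta t^{n}(b_1 R_1 + b_2 R_2 + b_3 R_3 + b_4 R_4),
\end{aligned}
\tag{7.5}
\]
where the coefficients $a$ and $b$ are defined by the Butcher tableau,
\[
\begin{array}{c|cccc}
0 & & & & \\
c_2 & a_{2,1} & & & \\
c_3 & 0 & a_{3,2} & & \\
c_4 & 0 & 0 & a_{4,3} & \\
\hline
 & b_1 & b_2 & b_3 & b_4
\end{array}
=
\begin{array}{c|cccc}
0 & & & & \\
1/2 & 1/2 & & & \\
1/2 & 0 & 1/2 & & \\
1 & 0 & 0 & 1 & \\
\hline
 & 1/6 & 1/3 & 1/3 & 1/6
\end{array}.
\]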
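The RK44 update in Eq. (7.5) can be sketched for a scalar problem as follows. The function names and the test problem dU/dt = -U are illustrative assumptions; in the DFR setting `residual` would evaluate Eq. (7.3) on the full solution vector.

```python
import math


def rk44_step(residual, u, dt):
    """One step of the explicit four stage, fourth order scheme in Eq. (7.5)."""
    r1 = residual(u)
    r2 = residual(u + 0.5 * dt * r1)   # a_{2,1} = 1/2
    r3 = residual(u + 0.5 * dt * r2)   # a_{3,2} = 1/2
    r4 = residual(u + dt * r3)         # a_{4,3} = 1
    # Weights b = (1/6, 1/3, 1/3, 1/6).
    return u + dt * (r1 + 2.0 * r2 + 2.0 * r3 + r4) / 6.0


def integrate_rk44(residual, u0, dt, n_steps):
    """Advance u0 by n_steps steps of size dt."""
    u = u0
    for _ in range(n_steps):
        u = rk44_step(residual, u, dt)
    return u


# Linear decay dU/dt = -U, integrated from t = 0 to t = 1.
u_end = integrate_rk44(lambda u: -u, 1.0, 0.1, 10)
error = abs(u_end - math.exp(-1.0))
```

Each stage only needs the residual of the previous stage, which is why the tableau has a single nonzero entry per row.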
A low-storage, explicit, five stage, fourth order RK scheme (RK45) from [50]
referred to as RK4(3)5[2R+] is also used to update the solution. This scheme is
particularly useful because it only requires the storage of the residual from two stages.
7.1.2 Diagonally Implicit Runge-Kutta
Consider the three stage, fourth order diagonally implicit RK scheme (DIRK34)
from [6],
\[
\begin{aligned}
R_1 &= R(U^{n} + \Delta t^{n}\, a_{1,1} R_1), \\
R_2 &= R(U^{n} + \Delta t^{n} (a_{2,1} R_1 + a_{2,2} R_2)), \\
R_3 &= R(U^{n} + \Delta t^{n} (a_{3,1} R_1 + a_{3,2} R_2 + a_{3,3} R_3)), \\
U^{n+1} &= U^{n} + \Delta t^{n} (b_1 R_1 + b_2 R_2 + b_3 R_3),
\end{aligned}
\tag{7.6}
\]
where the coefficients $a$ and $b$ are defined by the Butcher tableau,
\[
\begin{array}{c|ccc}
c_1 & a_{1,1} & & \\
c_2 & a_{2,1} & a_{2,2} & \\
c_3 & a_{3,1} & a_{3,2} & a_{3,3} \\
\hline
 & b_1 & b_2 & b_3
\end{array}
=
\begin{array}{c|ccc}
(1+\alpha)/2 & (1+\alpha)/2 & & \\
1/2 & -\alpha/2 & (1+\alpha)/2 & \\
(1-\alpha)/2 & 1+\alpha & -(1+2\alpha) & (1+\alpha)/2 \\
\hline
 & 1/(6\alpha^2) & 1 - 1/(3\alpha^2) & 1/(6\alpha^2)
\end{array}
\]
and $\alpha = 2\cos(\pi/18)/\sqrt{3}$. This type of scheme requires the solution of an implicit function at each stage. It’s important to note that each stage of a DIRK scheme is completely independent from any future stages. This is in sharp contrast to a fully implicit RK scheme which couples all stages of an RK scheme. Since all the diagonal terms are equal, this scheme could also be referred to as a singly diagonally implicit scheme (SDIRK).
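To illustrate how each implicit stage depends only on earlier stages, the sketch below applies the DIRK34 tableau of Eq. (7.6) to the linear scalar problem R(U) = lam*U, for which every stage equation can be solved in closed form. The linear test problem and all names are assumptions for illustration; a nonlinear residual would need an iterative stage solver such as the pseudo time stepping of section 7.2.

```python
import math

ALPHA = 2.0 * math.cos(math.pi / 18.0) / math.sqrt(3.0)
GAMMA = 0.5 * (1.0 + ALPHA)  # common diagonal entry a_{s,s}

# Butcher coefficients of Eq. (7.6).
A = [
    [GAMMA, 0.0, 0.0],
    [-0.5 * ALPHA, GAMMA, 0.0],
    [1.0 + ALPHA, -(1.0 + 2.0 * ALPHA), GAMMA],
]
B = [1.0 / (6.0 * ALPHA ** 2), 1.0 - 1.0 / (3.0 * ALPHA ** 2), 1.0 / (6.0 * ALPHA ** 2)]


def dirk34_step(lam, u, dt):
    """One DIRK34 step for the linear residual R(U) = lam * U."""
    stage_residuals = []
    for s in range(3):
        # Contribution of the already computed stages j < s.
        known = u + dt * sum(A[s][j] * stage_residuals[j] for j in range(s))
        # Stage equation R_s = lam * (known + dt * a_ss * R_s), solved exactly.
        stage_residuals.append(lam * known / (1.0 - dt * A[s][s] * lam))
    return u + dt * sum(b * r for b, r in zip(B, stage_residuals))


def integrate_dirk34(lam, u0, t_end, n_steps):
    """Integrate dU/dt = lam * U from 0 to t_end with n_steps equal steps."""
    dt = t_end / n_steps
    u = u0
    for _ in range(n_steps):
        u = dirk34_step(lam, u, dt)
    return u


err_coarse = abs(integrate_dirk34(-1.0, 1.0, 1.0, 10) - math.exp(-1.0))
err_fine = abs(integrate_dirk34(-1.0, 1.0, 1.0, 20) - math.exp(-1.0))
```

Halving the time step should reduce the error by roughly a factor of sixteen for a fourth order scheme, which can be checked by comparing `err_coarse` and `err_fine`.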
In this work, explicit, singly diagonally implicit RK schemes (ESDIRK) are used
to advance the solution in time. In this case, the first stage of the scheme is explicit
while the remaining stages are implicit. The four stage, third order (ESDIRK3) and
six stage, fourth order (ESDIRK4) schemes referred to as ARK3(2)4L[2]SA-ESDIRK
and ARK4(3)6L[2]SA-ESDIRK, respectively, are utilized from [18].
7.1.3 Step size control
The RK45 and ESDIRK4 schemes described above both use the PI step size controller
described in [34, 98] to estimate the maximum stable time step. Consider applying a
time stepping method one order lower than the given scheme in Eq. (7.4),
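\[ \hat{U}^{n+1} = U^{n} + \Delta t^{n} \sum_{s=1}^{N_{stages}} \hat{b}_s R_s, \tag{7.7} \]
where the hat marks the solution and weights of the lower order scheme. An error approximation can be made by taking the difference between Eq. (7.4) and Eq. (7.7),
\[ e^{n}(t^{n} + \Delta t^{n}) \approx \Delta t^{n} \sum_{s=1}^{N_{stages}} (b_s - \hat{b}_s) R_s. \tag{7.8} \]
Normalizing this equation and setting a tolerance produces the relative error,
\[ \sigma^{n} = \frac{|e^{n}|}{Tol_a + Tol_r \max(|U^{n}|, |U^{n+1}|)}, \tag{7.9} \]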
where Tola and Tolr are the absolute and relative error tolerances, respectively. The
maximum relative error across all values is found and defined as σn and the scaling
factor can then be computed as
\[ f = (\sigma^{n})^{-\alpha/Q} (\sigma^{n-1})^{-\beta/Q}, \tag{7.10} \]
where $\alpha = 0.7$, $\beta = 0.4$ and $Q$ is the order of the scheme. Minimum, maximum and safety factors are also introduced such that,
\[ f_{min} \le f_{safe} f \le f_{max}, \tag{7.11} \]
where the factors are set based on the scheme as shown in Table 7.1. This forces a smoother transition when changing the time step.

Scheme     fmin   fsafe   fmax
Explicit   0.3    0.8     2.5
Implicit   0.5    0.9     2.5
Table 7.1: Scaling factors used for the explicit and implicit step controller.

The estimate for the maximum stable time step can then be computed as
\[ \Delta t^{n+1} = \Delta t^{n} f_{safe} f. \tag{7.12} \]
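The limiting step in Eqs. (7.11) and (7.12) can be sketched as below, taking the scaling factor f of Eq. (7.10) as an input. Reading the bound in Eq. (7.11) as a clamp on the product f_safe f is an interpretation, and the function and dictionary names are hypothetical.

```python
# (f_min, f_safe, f_max) from Table 7.1.
SCALING_FACTORS = {
    "explicit": (0.3, 0.8, 2.5),
    "implicit": (0.5, 0.9, 2.5),
}


def next_time_step(dt, f, scheme):
    """Return the next time step given the current step and scaling factor f.

    The product f_safe * f is limited to [f_min, f_max] (Eq. 7.11) and then
    applied to the current time step (Eq. 7.12).
    """
    f_min, f_safe, f_max = SCALING_FACTORS[scheme]
    scale = min(max(f_safe * f, f_min), f_max)
    return dt * scale


dt_small_error = next_time_step(1.0e-3, 10.0, "implicit")  # growth capped at f_max
dt_large_error = next_time_step(1.0e-3, 0.1, "explicit")   # shrink capped at f_min
```

Limiting the change per step keeps the controller from reacting too strongly to a single unusually small or large error estimate.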
7.2 Pseudo Time Stepping
Starting from Eq. (7.2), consider the following steady-state problem,
R(U) = 0. (7.13)
The solution to this equation can be found by utilizing pseudo time stepping or
pseudo-transient continuation [49]. This particular method is useful for solving non-
linear systems of partial differential equations where a good initial guess for a proper
root finding technique is not immediately available.
Applying a pseudo time stepping technique to Eq. (7.13) forms the following fully discrete equation,
\[ \frac{\delta U}{\delta \tau} = R(U), \tag{7.14} \]
where an explicit or implicit method can be used in place of the numerical pseudo time derivative of the solution to march the solution to steady-state. In this case, stability is considerably more important than accuracy when choosing a time stepping scheme.
Given an initial solution of $U^{m}$ where $m$ is the current pseudo time step iteration, backward Euler can be applied to Eq. (7.14) to find the solution at the next pseudo time step,
\[ \Delta U^{m} = \Delta T^{m} R(U^{m+1}), \tag{7.15} \]
where $\Delta U^{m} = U^{m+1} - U^{m}$ and $\Delta T^{m}$ is a diagonal matrix containing the time step, $\Delta\tau^{m}$, along the diagonal. When an element local time step estimate based on the local CFL number is used, $\Delta\tau^{m}_{ele}$ is placed along the diagonal instead. Following a linearized backward Euler technique, a Taylor series expansion of $R(U^{m+1})$ is used to linearize the equation,
\[ R(U^{m+1}) \approx R^{m} + \frac{\partial R^{m}}{\partial U^{m}} \Delta U^{m}, \tag{7.16} \]
where $R^{m} = R(U^{m})$. Rearranging Eq. (7.15) by using the approximation in Eq. (7.16) gives the following global linear system,
\[ \left((\Delta T^{m})^{-1} - \frac{\partial R^{m}}{\partial U^{m}}\right) \Delta U^{m} = R^{m}. \tag{7.17} \]
It’s important to note that Eq. (7.17) approaches Newton’s method as $\Delta\tau^{m}_{ele} \to \infty$,
\[ \frac{\partial R^{m}}{\partial U^{m}} \Delta U^{m} = -R^{m}. \tag{7.18} \]
This allows the method to efficiently transition into Newton’s method when the solution is sufficiently close to the final result.
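A scalar sketch of the linearized backward Euler update in Eq. (7.17) is given below. The residual R(u) = 2 - u^2, the initial pseudo time step and the doubling of the pseudo time step after every iteration are assumptions chosen only to show the transition towards Newton's method; they are not the settings used in this work.

```python
def residual(u):
    """Illustrative steady-state residual with a stable root at sqrt(2)."""
    return 2.0 - u * u


def residual_jacobian(u):
    """Exact derivative dR/du of the illustrative residual."""
    return -2.0 * u


def pseudo_time_solve(u, dtau=0.1, growth=2.0, tol=1e-12, max_iter=100):
    """March R(u) = 0 to steady state with linearized backward Euler."""
    for iteration in range(max_iter):
        r = residual(u)
        if abs(r) < tol:
            return u, iteration
        # Scalar form of Eq. (7.17): (1/dtau - dR/du) * du = R.
        du = r / (1.0 / dtau - residual_jacobian(u))
        u += du
        # As dtau grows, 1/dtau vanishes and the update tends to Newton's
        # method, Eq. (7.18).
        dtau *= growth
    return u, max_iter


u_steady, n_iterations = pseudo_time_solve(0.1)
```

The small initial pseudo time step damps the first updates when the guess is poor, while the later large steps recover the fast convergence of Newton's method.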
7.2.1 The Global Linear System
The global linear system in Eq. (7.17) can be written as
\[ A\, \Delta U^{m} = R^{m}, \tag{7.19} \]
where
\[ A = (\Delta T^{m})^{-1} - \frac{\partial R^{m}}{\partial U^{m}}. \tag{7.20} \]
The global matrix, $A$, is a large, sparse, block matrix where each block contains an element residual Jacobian. The diagonal blocks of the matrix are defined as
\[ A_{ele,ele} = (\Delta T^{m}_{ele})^{-1} - \frac{\partial R^{m}_{ele}}{\partial U^{m}_{ele}}, \tag{7.21} \]
while the off-diagonal blocks are written as
\[ A_{ele,eleN} = -\frac{\partial R^{m}_{ele}}{\partial U^{m}_{eleN}}, \tag{7.22} \]
where eleN refers to neighboring elements within the stencil of the scheme. For
example, one-dimensional advection would contain two neighboring elements while
one-dimensional diffusion could contain up to four.
7.2.2 Block Iterative Methods
The size of the global matrix can force linear solvers to be prohibitively expensive
for high-order methods. This size can be mitigated through the use of block iterative
methods. A block iterative method can be reformulated to avoid the construction of
the off-diagonal blocks defined in Eq. (7.22). This eliminates the need to store and
compute off-diagonal blocks, helps maintain element locality within a linear solver
and reduces the linear system to a batch of smaller linear systems.
Starting from a traditional splitting method [32], the matrix can be split into two
parts, A = M −N so that Eq. (7.19) becomes,
\[ M\, \Delta U^{k+1} = N\, \Delta U^{k} + R^{m}, \tag{7.23} \]
where $\Delta U^{k+1} \to \Delta U^{m}$ when $k \to \infty$ if the iterative method is convergent.
In order to avoid constructing $N$, Eq. (7.23) is rearranged. Subtracting $M\, \Delta U^{k}$ from both sides of Eq. (7.23) produces
\[ M\, \Delta^{2} U^{k} = R^{m} - A\, \Delta U^{k}, \tag{7.24} \]
where
\[ \Delta^{2} U^{k} = \Delta U^{k+1} - \Delta U^{k} = (U^{k+1} - U^{m}) - (U^{k} - U^{m}) = U^{k+1} - U^{k}. \tag{7.25} \]
The linear approximation in Eq. (7.16) is used to reduce Eq. (7.24) even further. The
approximation is written as
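\[ R(U^{k}) \approx R(U^{m}) + \frac{\partial R^{m}}{\partial U^{m}} \Delta U^{k}, \tag{7.26} \]
and can be rewritten as
\[ R^{m} = R^{k} - \frac{\partial R^{m}}{\partial U^{m}} \Delta U^{k}. \tag{7.27} \]
Applying this approximation and Eq. (7.20) directly to Eq. (7.24) produces the following,
\[ M\, \Delta^{2} U^{k} = R^{k} - (\Delta T^{m})^{-1} \Delta U^{k}. \tag{7.28} \]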
Note that $U^{k+1} \to U^{m+1}$ as $k \to \infty$ but since time accuracy isn’t particularly important when advancing a pseudo time step, the iterative method need not fully converge and the total number of iterations in $k$ is usually small. This particular formulation has the advantage of avoiding the construction of $N$ but has the disadvantage of having to recompute the residual $R(U^{k})$ at every $k$ iteration. It’s also important to note that the last term in Eq. (7.28), $(\Delta T^{m})^{-1} \Delta U^{k}$, can sometimes be removed for faster convergence [83].
In this work, block Jacobi and multicolored Gauss-Seidel (MCGS) are used to solve the linear system in Eq. (7.28). Each method has its own advantages and disadvantages. Block Jacobi is particularly efficient because all linear systems can be solved simultaneously, as opposed to the color sequence in MCGS, but MCGS tends to have better convergence properties than block Jacobi. Both methods require nonsingular diagonal blocks. This is provided by adjusting the pseudo time step to ensure strictly diagonally dominant matrices.
Block Jacobi
The simplest block iterative method is the block Jacobi method which splits the global
matrix into a block diagonal matrix and a matrix containing the off-diagonal terms.
In this case, the matrices M and N take the form,
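\[
\begin{aligned}
M_{ele,ele} &= (\Delta T^{m}_{ele})^{-1} - \frac{\partial R^{m}_{ele}}{\partial U^{m}_{ele}}, \\
N_{ele,eleN} &= \frac{\partial R^{m}_{ele}}{\partial U^{m}_{eleN}},
\end{aligned}
\tag{7.29}
\]
where the remaining blocks in both matrices are empty. By placing the off-diagonal terms in the matrix $N$, a much smaller system can be solved in each element. Eq. (7.28) is rewritten as a batch of element local systems,
\[ \left((\Delta T^{m}_{ele})^{-1} - \frac{\partial R^{m}_{ele}}{\partial U^{m}_{ele}}\right) \Delta^{2} U^{k}_{ele} = R^{k}_{ele} - (\Delta T^{m}_{ele})^{-1} \Delta U^{k}_{ele}, \tag{7.30} \]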
where all linear systems can be solved simultaneously.
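The batch of element local systems in Eq. (7.30) can be sketched on a small linear model problem. Here each unknown plays the role of a 1 x 1 "element", the residual is R(U) = b - K U with K = tridiag(-1, 4, -1), and a single global pseudo time step is used; all of these choices and names are assumptions for illustration only.

```python
def residual(u, b):
    """R(U) = b - K U for K = tridiag(-1, 4, -1)."""
    n = len(u)
    r = []
    for i in range(n):
        left = u[i - 1] if i > 0 else 0.0
        right = u[i + 1] if i < n - 1 else 0.0
        r.append(b[i] - (4.0 * u[i] - left - right))
    return r


def block_jacobi_pseudo_step(u_m, b, dtau, n_iter):
    """Advance one pseudo time step with the block Jacobi form of Eq. (7.30)."""
    # Diagonal block (dtau)^-1 - dR_ele/dU_ele, with dR_ele/dU_ele = -4 here.
    diag = 1.0 / dtau + 4.0
    u_k = list(u_m)
    for _ in range(n_iter):
        r_k = residual(u_k, b)
        # Every local system uses only iterate-k data, so all of them could
        # be solved simultaneously; the list comprehension mimics the batch.
        u_k = [
            u + (r - (u - um) / dtau) / diag
            for u, um, r in zip(u_k, u_m, r_k)
        ]
    return u_k


u_old = [0.0] * 5
rhs = [1.0] * 5
u_new = block_jacobi_pseudo_step(u_old, rhs, dtau=0.5, n_iter=60)
```

When the inner iteration converges, `u_new` satisfies the backward Euler relation (u_new - u_old)/dtau = R(u_new), and the off-diagonal coupling only ever enters through the recomputed residual.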
Multicolored Gauss-Seidel
The multicolored block Gauss-Seidel (MCGS) method is a block iterative method
which attempts to improve convergence by coloring sets of elements and solving the
resulting system for each color sequentially in order to update the residual of the next
color.
The splitting method used on the global matrix is highly dependent on the coloring
algorithm and the spatial discretization. In this work, groups of elements are chosen
such that no two adjacent elements have the same color. For example, consider the
application of two colors, red and black, on the Cartesian mesh shown in Figure 7.1.
The corresponding global matrix can be represented as
\[ A = \begin{bmatrix} A_{1,1} & A_{1,2} \\ A_{2,1} & A_{2,2} \end{bmatrix}, \]
where 1 and 2 correspond to the colors red and black, respectively.
[Figure 7.1, panels: (a) Stencil for Advection/Euler; (b) Stencil for Diffusion/Navier-Stokes]
Figure 7.1: A Cartesian mesh with a red-black element color mapping. The DFR element stencil (white) relies on information from neighboring elements from the opposite color.
When solving advection or Euler equations, the neighboring elements of any particular element belong to the opposite color (see Figure 7.1a). This ensures that neighboring element Jacobians are located in $A_{1,2}$, $A_{2,1}$ and that the matrices $A_{1,1} = D_{1,1}$, $A_{2,2} = D_{2,2}$ where $D_{1,1}$, $D_{2,2}$ are block diagonal matrices. The splitting method then follows a traditional block Gauss-Seidel method where $M$ is a lower block triangular matrix and $N$ is a strictly upper block triangular matrix,
\[ M = \begin{bmatrix} D_{1,1} & 0 \\ A_{2,1} & D_{2,2} \end{bmatrix}, \quad N = \begin{bmatrix} 0 & A_{1,2} \\ 0 & 0 \end{bmatrix}. \]
For diffusion or the Navier-Stokes equations, the stencil is much larger and some of the neighboring elements may belong to the same color (see Figure 7.1b). In this case, the matrices $A_{1,1}$, $A_{2,2}$ have neighboring element Jacobians so that
\[
\begin{aligned}
A_{1,1} &= D_{1,1} + C_{1,1}, \\
A_{2,2} &= D_{2,2} + C_{2,2},
\end{aligned}
\tag{7.31}
\]
where $D_{1,1}$, $D_{2,2}$ contain the block diagonal components and $C_{1,1}$, $C_{2,2}$ contain the off-diagonal blocks. The splitting method is then,
\[ M = \begin{bmatrix} D_{1,1} & 0 \\ A_{2,1} & D_{2,2} \end{bmatrix}, \quad N = \begin{bmatrix} C_{1,1} & A_{1,2} \\ 0 & C_{2,2} \end{bmatrix}. \]
By ensuring that the diagonal matrices of M are block diagonal, all element local
systems within a given color can be solved simultaneously. Combining the M matrix
above with Eq. (7.28) produces the following result,
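\[
\begin{aligned}
D_{1,1}\, \Delta^{2} U^{k}_{1} &= R^{k}_{1} - (\Delta T^{m}_{1})^{-1} \Delta U^{k}_{1}, \\
A_{2,1}\, \Delta^{2} U^{k}_{1} + D_{2,2}\, \Delta^{2} U^{k}_{2} &= R^{k}_{2} - (\Delta T^{m}_{2})^{-1} \Delta U^{k}_{2},
\end{aligned}
\tag{7.32}
\]
where
\[
\begin{aligned}
D_{1,1_{ele,ele}} &= (\Delta T^{m}_{1_{ele}})^{-1} - \frac{\partial R^{m}_{1_{ele}}}{\partial U^{m}_{1_{ele}}}, \\
D_{2,2_{ele,ele}} &= (\Delta T^{m}_{2_{ele}})^{-1} - \frac{\partial R^{m}_{2_{ele}}}{\partial U^{m}_{2_{ele}}}, \\
A_{2,1_{ele,eleN}} &= -\frac{\partial R^{m}_{2_{ele}}}{\partial U^{m}_{1_{eleN}}}.
\end{aligned}
\tag{7.33}
\]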
A linear approximation is used in each element to further reduce Eq. (7.32). The
approximation is written as
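\[
R\left(U^{k+1}_{1_{eleN}}, U^{k}_{2_{ele}}, U^{k}_{2_{eleN}}\right) \approx R\left(U^{k}_{1_{eleN}}, U^{k}_{2_{ele}}, U^{k}_{2_{eleN}}\right) + \sum_{eleN} \frac{\partial R^{m}_{2_{ele}}}{\partial U^{m}_{1_{eleN}}} \left(U^{k+1}_{1_{eleN}} - U^{k}_{1_{eleN}}\right),
\tag{7.34}
\]
where, in this case, it is assumed that element neighbors may exist within the same color group. This approximation can be rewritten as
\[ R^{*}_{2_{ele}} = R^{k}_{2_{ele}} + \sum_{eleN} \frac{\partial R^{m}_{2_{ele}}}{\partial U^{m}_{1_{eleN}}} \Delta^{2} U^{k}_{1_{eleN}}, \tag{7.35} \]
where the $*$ in $R^{*}_{2_{ele}}$ defines a computed residual given the most up-to-date solution values in all elements. Applying this approximation for all elements within the second color directly to Eq. (7.32) produces the following,
\[
\begin{aligned}
D_{1,1}\, \Delta^{2} U^{k}_{1} &= R^{k}_{1} - (\Delta T^{m}_{1})^{-1} \Delta U^{k}_{1}, \\
D_{2,2}\, \Delta^{2} U^{k}_{2} &= R^{*}_{2} - (\Delta T^{m}_{2})^{-1} \Delta U^{k}_{2}.
\end{aligned}
\tag{7.36}
\]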
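Eq. (7.36) can be generalized to multiple colors and is rewritten as a batch of element local systems for each color,
\[ \left((\Delta T^{m}_{c_{ele}})^{-1} - \frac{\partial R^{m}_{c_{ele}}}{\partial U^{m}_{c_{ele}}}\right) \Delta^{2} U^{k}_{c_{ele}} = R^{*}_{c_{ele}} - (\Delta T^{m}_{c_{ele}})^{-1} \Delta U^{k}_{c_{ele}}, \tag{7.37} \]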
where all linear systems within a given color, c, can be solved simultaneously.
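A two-color version of Eq. (7.37) can be sketched on a one-dimensional model problem in which even-indexed unknowns are "red" and odd-indexed unknowns are "black", so that no two neighbors share a color. The linear residual R(U) = b - K U with K = tridiag(-1, 4, -1), the 1 x 1 "elements" and all names are illustrative assumptions.

```python
def element_residual(u, b, i):
    """Residual of unknown i for R(U) = b - K U, K = tridiag(-1, 4, -1)."""
    left = u[i - 1] if i > 0 else 0.0
    right = u[i + 1] if i < len(u) - 1 else 0.0
    return b[i] - (4.0 * u[i] - left - right)


def mcgs_pseudo_step(u_m, b, dtau, n_iter):
    """Advance one pseudo time step with a red-black form of Eq. (7.37)."""
    n = len(u_m)
    diag = 1.0 / dtau + 4.0  # (dtau)^-1 - dR_ele/dU_ele with dR_ele/dU_ele = -4
    colors = [range(0, n, 2), range(1, n, 2)]  # red, then black
    u_k = list(u_m)
    for _ in range(n_iter):
        for color in colors:
            # R* is evaluated with the most up-to-date values, so the black
            # sweep already sees the red updates of this iteration.
            updates = {
                i: (element_residual(u_k, b, i) - (u_k[i] - u_m[i]) / dtau) / diag
                for i in color
            }
            # Unknowns of one color are independent and are updated together.
            for i, d2u in updates.items():
                u_k[i] += d2u
    return u_k


u_old = [0.0] * 6
rhs = [1.0] * 6
u_new = mcgs_pseudo_step(u_old, rhs, dtau=0.5, n_iter=40)
```

Compared with block Jacobi, the only structural change is that the residual is recomputed between colors, which trades some parallelism for faster convergence.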
7.3 Dual Time Stepping
Dual time stepping [44] is the process of converting each implicit time step into a
modified steady state problem in order to apply a pseudo time stepping technique.
It’s usually used in conjunction with a second order backward difference formula
(BDF2) but it can be applied to implicit Runge-Kutta methods as well [45].
7.3.1 Application to Diagonally Implicit Runge-Kutta Schemes
Consider rewriting Eq. (7.6) into a general form which solves for the solution at each
stage,
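\[
\begin{aligned}
U_{s} &= U^{n} + \Delta t^{n} \sum_{j=1}^{s} a_{s,j} R_{j}, \\
U^{n+1} &= U^{n} + \Delta t^{n} \sum_{s=1}^{N_{stages}} b_{s} R_{s},
\end{aligned}
\tag{7.38}
\]
where $s$ can be any stage within the DIRK scheme and each stage is solved sequentially. A modified residual, written here as $\mathcal{R}_{s}$ to distinguish it from the stage residual $R_{s}$, can be formed for each stage so that
\[ \mathcal{R}_{s}(U_{s}) = 0, \tag{7.39} \]
where
\[ \mathcal{R}_{s}(U_{s}) = -(\Delta T^{n})^{-1} (U_{s} - U^{n}) + \sum_{j=1}^{s} a_{s,j} R_{j}, \tag{7.40} \]
and $\Delta T^{n}$ is a diagonal matrix containing the time step, $\Delta t^{n}$, along the diagonal. Following directly from the pseudo time stepping technique described in section 7.2, a global linear system can be constructed for each stage,
\[ \left((\Delta T^{m})^{-1} - \frac{\partial \mathcal{R}^{m}_{s}}{\partial U^{m}_{s}}\right) \Delta U^{m}_{s} = \mathcal{R}^{m}_{s}, \tag{7.41} \]
where
\[ \frac{\partial \mathcal{R}^{m}_{s}}{\partial U^{m}_{s}} = -(\Delta T^{n})^{-1} + a_{s,s} \frac{\partial R^{m}_{s}}{\partial U^{m}_{s}}. \tag{7.42} \]
This global linear system can then be solved using the methods described in sections 7.2.1 and 7.2.2. It’s important to note that the stage residual Jacobian, $\frac{\partial R^{m}_{s}}{\partial U^{m}_{s}}$, can often be replaced by the residual Jacobian of the first stage without a significant loss in convergence. This means that the element local Jacobian only needs to be computed once per time step.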
Chapter 8
The Element Local Jacobian Matrix
The block iterative methods described in section 7.2.2 require the construction of element local Jacobians, $\frac{\partial R^{m}_{ele}}{\partial U^{m}_{ele}}$. What follows is a derivation of the element local Jacobian matrix for the DFR method and a reformulation of these matrices into Kronecker products.
This chapter is split into three sections. Section 8.1 describes the element local
Jacobian derivation for one-dimensional DFR and the one-dimensional Kronecker
operators needed for the multidimensional Kronecker product formulation. Section
8.2 extends the derivation and Kronecker product formulation for two- and three-
dimensional unstructured quadrilateral and hexahedral elements. We show that the
element local Jacobian can be split into two terms which are constructed from two or
three distinct sparsity patterns based on Kronecker products as shown in Figures 8.1–8.7. We also show that the time complexity of computing an element local Jacobian can be reduced from $O(P^{3d-1})$ to $O(P^{d+1})$ for the advection or Euler equations and $O(P^{3d})$ to $O(P^{d+2})$ for the advection-diffusion or Navier-Stokes equations. Section 8.3 describes the changes needed to form element local Jacobian matrices for the Euler and Navier-Stokes equations including the wavespeed Jacobians and modifications to elements on boundaries.
8.1 One-Dimensional Formulation for Advection-Diffusion
Following from Eq. (6.30), the one-dimensional DFR formulation of the semi-discrete
advection-diffusion equation in each element can be written as
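\[ r_{ele} = r(u_{ele-1}, u_{ele}, u_{ele+1}) = -\frac{1}{|J_{ele}|} \frac{\delta f_{ele}}{\delta \xi}, \tag{8.1} \]
where $r_{ele} = r(u_{ele-1}, u_{ele}, u_{ele+1})$ is the residual. This equation can be differentiated with respect to the discontinuous solution vector, $u_{ele}$, to produce the element local Jacobian matrix,
\[ \frac{\partial r_{ele}}{\partial u_{ele}} = -\frac{1}{|J_{ele}|} \frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta \xi}\right), \tag{8.2} \]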
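where $\frac{\partial r_{ele}}{\partial u_{ele}}$ is a matrix of size $(N_{spts1D} \times N_{spts1D})$. The Jacobian of the numerical derivative of the flux is split into two distinct parts by applying a chain rule,
\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta \xi}\right) = \frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) + \frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right), \tag{8.3} \]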
where the numerical derivative of the flux with respect to the solution is defined as
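\[ \frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = D^{L}_{\xi} \left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*} E^{L} + D_{\xi} \frac{\partial f_{ele}}{\partial u_{ele}} + D^{R}_{\xi} \left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*} E^{R}, \tag{8.4} \]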
and the numerical derivative of the flux with respect to the solution derivative is
defined as
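\[
\frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \left(D^{L}_{\xi} \frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}} E^{L} + D_{\xi} \frac{\partial f_{ele}}{\partial q_{ele}} + D^{R}_{\xi} \frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}} E^{R}\right) \left(-\frac{1}{|J_{ele}|}\right) \left(D^{L}_{\xi} \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + D_{\xi} + D^{R}_{\xi} \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R}\right).
\tag{8.5}
\]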
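The modified common interface flux Jacobians also contain contributions from neighboring solution derivatives and are defined as
\[
\begin{aligned}
\left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*} &= \frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}} + \frac{\partial f^{I,L}_{ele}}{\partial q^{R}_{ele-1}} \frac{\partial q^{R}_{ele-1}}{\partial u^{I,R}_{ele-1}} \frac{\partial u^{I,R}_{ele-1}}{\partial u^{L}_{ele}}, \\
\left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*} &= \frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}} + \frac{\partial f^{I,R}_{ele}}{\partial q^{L}_{ele+1}} \frac{\partial q^{L}_{ele+1}}{\partial u^{I,L}_{ele+1}} \frac{\partial u^{I,L}_{ele+1}}{\partial u^{R}_{ele}}.
\end{aligned}
\tag{8.6}
\]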
Table 8.1 defines the Jacobians which are diagonal matrices primarily constructed
through function evaluations. These are described in more detail in section 8.1.1.
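Name | Variable | Size
Discontinuous Flux | $\frac{\partial f_{ele}}{\partial u_{ele}}$, $\frac{\partial f_{ele}}{\partial q_{ele}}$ | $(N_{spts1D} \times N_{spts1D})$
Common Flux | $\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}$, $\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}$, $\frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}}$, $\frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}}$, $\frac{\partial f^{I,L}_{ele}}{\partial q^{R}_{ele-1}}$, $\frac{\partial f^{I,R}_{ele}}{\partial q^{L}_{ele+1}}$ | $(1 \times 1)$
Common Solution | $\frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}}$, $\frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}}$, $\frac{\partial u^{I,R}_{ele-1}}{\partial u^{L}_{ele}}$, $\frac{\partial u^{I,L}_{ele+1}}{\partial u^{R}_{ele}}$ | $(1 \times 1)$
Modified Common Flux | $\left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*}$, $\left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*}$ | $(1 \times 1)$
Neighbor Contribution | $\frac{\partial q^{R}_{ele-1}}{\partial u^{I,R}_{ele-1}}$, $\frac{\partial q^{L}_{ele+1}}{\partial u^{I,L}_{ele+1}}$ | $(1 \times 1)$
Table 8.1: Diagonal Jacobian matrices used in the construction of the one-dimensional element local Jacobian matrix for advection-diffusion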
8.1.1 Derivation
The derivation of the one-dimensional element local Jacobian begins by noting that there is no dependence on the discontinuous solution when applying the numerical derivative, $\frac{\delta}{\delta \xi}$, to the continuous flux in Eq. (8.2). This means that the Jacobian with respect to the discontinuous solution, $\frac{\partial}{\partial u_{ele}}$, can be applied directly to the flux by modifying Eq. (6.29),
\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta f_{ele}}{\delta \xi}\right) = D^{L}_{\xi} \frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) + D_{\xi} \frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) + D^{R}_{\xi} \frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right). \tag{8.7} \]
The derivation can now be split into sections where the Jacobians in Table 8.1 are
constructed.
Discontinuous Flux Jacobian
The Jacobian of the discontinuous flux can be computed by applying the chain rule
to the original flux function,
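\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f(u_{ele}, q_{ele})\right), \\
&= \frac{\partial f_{ele}}{\partial u_{ele}} + \frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right),
\end{aligned}
\tag{8.8}
\]
where
\[
\begin{aligned}
\frac{\partial f_{ele}}{\partial u_{ele}} &= \frac{\partial f}{\partial u}(u_{ele}, q_{ele}) = \frac{\partial f_{adv}}{\partial u}(u_{ele}) - \frac{\partial f_{diff}}{\partial u}(u_{ele}, q_{ele}), \\
\frac{\partial f_{ele}}{\partial q_{ele}} &= \frac{\partial f}{\partial q}(u_{ele}, q_{ele}) = -\frac{\partial f_{diff}}{\partial q}(u_{ele}, q_{ele}),
\end{aligned}
\tag{8.9}
\]
and $\frac{\partial f_{adv}}{\partial u}(u)$, $\frac{\partial f_{diff}}{\partial u}(u, q)$ and $\frac{\partial f_{diff}}{\partial q}(u, q)$ are the exact derivatives of the flux functions $f_{adv}(u)$ and $f_{diff}(u, q)$ with respect to $u$ and $q$, accordingly. Since there’s no dependence between solution points when evaluating the flux functions, the discontinuous flux Jacobians, $\frac{\partial f_{ele}}{\partial u_{ele}}$ and $\frac{\partial f_{ele}}{\partial q_{ele}}$, are diagonal matrices of size $(N_{spts1D} \times N_{spts1D})$.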
Common Interface Flux Jacobians
The Jacobians of the common interface fluxes can also be computed by applying the
chain rule. Starting from Eq. (6.24) and utilizing the extrapolation equations (6.11)
and (6.23), the Jacobians are formulated as
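\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{R}_{ele-1}, u^{L}_{ele}, q^{R}_{ele-1}, q^{L}_{ele}\right)\right), \\
&= \frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + \frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}} E^{L} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) + \frac{\partial f^{I,L}_{ele}}{\partial q^{R}_{ele-1}} E^{R} \frac{\partial}{\partial u_{ele}}\left(q_{ele-1}\right), \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{R}_{ele}, u^{L}_{ele+1}, q^{R}_{ele}, q^{L}_{ele+1}\right)\right), \\
&= \frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R} + \frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}} E^{R} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) + \frac{\partial f^{I,R}_{ele}}{\partial q^{L}_{ele+1}} E^{L} \frac{\partial}{\partial u_{ele}}\left(q_{ele+1}\right),
\end{aligned}
\tag{8.10}
\]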
where
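\[
\begin{aligned}
\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}} &= \frac{\partial f^{I}}{\partial u^{+}}\left(u^{R}_{ele-1}, u^{L}_{ele}, q^{L}_{ele}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}} &= \frac{\partial f^{I}}{\partial u^{-}}\left(u^{R}_{ele}, u^{L}_{ele+1}, q^{R}_{ele}\right), \\
\frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}} &= \frac{\partial f^{I}}{\partial q^{+}}\left(u^{L}_{ele}, q^{L}_{ele}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}} &= \frac{\partial f^{I}}{\partial q^{-}}\left(u^{R}_{ele}, q^{R}_{ele}\right), \\
\frac{\partial f^{I,L}_{ele}}{\partial q^{R}_{ele-1}} &= \frac{\partial f^{I}}{\partial q^{-}}\left(u^{R}_{ele-1}, q^{R}_{ele-1}\right), &
\frac{\partial f^{I,R}_{ele}}{\partial q^{L}_{ele+1}} &= \frac{\partial f^{I}}{\partial q^{+}}\left(u^{L}_{ele+1}, q^{L}_{ele+1}\right),
\end{aligned}
\tag{8.11}
\]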
and
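\[
\begin{aligned}
\frac{\partial f^{I}}{\partial u^{-}}\left(u^{-}, u^{+}, q^{-}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right), \\
\frac{\partial f^{I}}{\partial u^{+}}\left(u^{-}, u^{+}, q^{+}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right), \\
\frac{\partial f^{I}}{\partial q^{-}}\left(u^{-}, q^{-}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{-}}\left(u^{-}, q^{-}\right), \\
\frac{\partial f^{I}}{\partial q^{+}}\left(u^{+}, q^{+}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{+}}\left(u^{+}, q^{+}\right),
\end{aligned}
\tag{8.12}
\]
are the exact derivatives of the interface flux function $f^{I}(u^{-}, u^{+}, q^{-}, q^{+})$ with respect to $u^{-}$, $u^{+}$, $q^{-}$ and $q^{+}$, respectively. Much like the flux function, the interface flux function has no dependence between solution points so the resulting common interface flux Jacobians evaluated from the interface flux derivatives are scalars in the one-dimensional case.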
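The derivatives of the interface advective flux function or, in this case, the Rusanov flux are computed by differentiating Eq. (6.25) with respect to $u^{-}$ and $u^{+}$,
\[
\begin{aligned}
\frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f_{adv}}{\partial u}(u^{-}) - \frac{1}{2} \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right), \\
\frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f_{adv}}{\partial u}(u^{+}) - \frac{1}{2} \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right),
\end{aligned}
\tag{8.13}
\]
where
\[
\begin{aligned}
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right) &= (u^{+} - u^{-}) \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) - |\lambda(u^{-}, u^{+})|, \\
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|(u^{+} - u^{-})\right) &= (u^{+} - u^{-}) \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) + |\lambda(u^{-}, u^{+})|,
\end{aligned}
\tag{8.14}
\]
and
\[
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) =
\begin{cases}
\frac{\partial}{\partial u^{-}}\left(\left|\frac{\partial f_{adv}}{\partial u}(u^{-})\right|\right) & \text{if } \left|\frac{\partial f_{adv}}{\partial u}(u^{-})\right| > \left|\frac{\partial f_{adv}}{\partial u}(u^{+})\right|, \\
0 & \text{otherwise},
\end{cases}
\]
\[
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) =
\begin{cases}
\frac{\partial}{\partial u^{+}}\left(\left|\frac{\partial f_{adv}}{\partial u}(u^{+})\right|\right) & \text{if } \left|\frac{\partial f_{adv}}{\partial u}(u^{+})\right| > \left|\frac{\partial f_{adv}}{\partial u}(u^{-})\right|, \\
0 & \text{otherwise}.
\end{cases}
\]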
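As a hedged illustration of Eqs. (8.13) and (8.14), the sketch below evaluates the Rusanov flux Jacobians for the Burgers flux f(u) = u^2/2 and compares them with finite differences. The choice of Burgers' equation, the assumed form of the Rusanov flux (central average minus half the wavespeed times the jump, consistent with Eq. (8.13)) and all function names are assumptions for illustration.

```python
def flux(u):
    """Burgers advective flux, used here only as an example."""
    return 0.5 * u * u


def dflux(u):
    """Exact derivative df_adv/du of the example flux."""
    return u


def sign(x):
    return (x > 0) - (x < 0)


def rusanov_flux(um, up):
    """Assumed Rusanov form: average flux minus 0.5 * |lambda| * jump."""
    lam = max(abs(dflux(um)), abs(dflux(up)))
    return 0.5 * (flux(um) + flux(up)) - 0.5 * lam * (up - um)


def rusanov_flux_jacobians(um, up):
    """Derivatives with respect to u^- and u^+, Eqs. (8.13) and (8.14)."""
    wave_m, wave_p = abs(dflux(um)), abs(dflux(up))
    lam = max(wave_m, wave_p)
    # d|f'(u)|/du = sign(f'(u)) * f''(u), with f'' = 1 for Burgers; the
    # derivative is nonzero only on the side that sets the wavespeed.
    dlam_dum = sign(dflux(um)) if wave_m > wave_p else 0.0
    dlam_dup = sign(dflux(up)) if wave_p > wave_m else 0.0
    d_um = 0.5 * dflux(um) - 0.5 * ((up - um) * dlam_dum - lam)
    d_up = 0.5 * dflux(up) - 0.5 * ((up - um) * dlam_dup + lam)
    return d_um, d_up


um, up, h = 2.0, 1.0, 1.0e-6
d_um, d_up = rusanov_flux_jacobians(um, up)
# Central finite differences as an independent check.
fd_um = (rusanov_flux(um + h, up) - rusanov_flux(um - h, up)) / (2.0 * h)
fd_up = (rusanov_flux(um, up + h) - rusanov_flux(um, up - h)) / (2.0 * h)
```

The finite difference check is only valid away from the switch point where both wavespeeds are equal, since the maximum is not differentiable there.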
The derivatives of the interface diffusive flux are formed by differentiating the LDG
flux in Eq. (6.27),
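\[
\begin{aligned}
\frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f_{diff}}{\partial u}(u^{-}, q^{-}) + \beta \frac{\partial f_{diff}}{\partial u}(u^{-}, q^{-}) + \tau, \\
\frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f_{diff}}{\partial u}(u^{+}, q^{+}) - \beta \frac{\partial f_{diff}}{\partial u}(u^{+}, q^{+}) - \tau, \\
\frac{\partial f^{I}_{diff}}{\partial q^{-}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f_{diff}}{\partial q}(u^{-}, q^{-}) + \beta \frac{\partial f_{diff}}{\partial q}(u^{-}, q^{-}), \\
\frac{\partial f^{I}_{diff}}{\partial q^{+}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f_{diff}}{\partial q}(u^{+}, q^{+}) - \beta \frac{\partial f_{diff}}{\partial q}(u^{+}, q^{+}).
\end{aligned}
\tag{8.15}
\]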
Boundary Flux Jacobians
The Jacobians of the boundary fluxes are found by differentiating Eq. (6.28),
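\[
\begin{aligned}
\frac{\partial}{\partial u_{1}}\left(f^{I,L}_{1}\right) &= \frac{\partial}{\partial u_{1}}\left(f^{b}\left(u^{L}_{1}, q^{L}_{1}\right)\right), \\
&= \frac{\partial f^{I,L}_{1}}{\partial u^{L}_{1}} E^{L} + \frac{\partial f^{I,L}_{1}}{\partial q^{L}_{1}} E^{L} \frac{\partial}{\partial u_{1}}\left(q_{1}\right), \\
\frac{\partial}{\partial u_{N_{eles}}}\left(f^{I,R}_{N_{eles}}\right) &= \frac{\partial}{\partial u_{N_{eles}}}\left(f^{b}\left(u^{R}_{N_{eles}}, q^{R}_{N_{eles}}\right)\right), \\
&= \frac{\partial f^{I,R}_{N_{eles}}}{\partial u^{R}_{N_{eles}}} E^{R} + \frac{\partial f^{I,R}_{N_{eles}}}{\partial q^{R}_{N_{eles}}} E^{R} \frac{\partial}{\partial u_{N_{eles}}}\left(q_{N_{eles}}\right),
\end{aligned}
\tag{8.16}
\]
where
\[
\begin{aligned}
\frac{\partial f^{I,L}_{1}}{\partial u^{L}_{1}} &= \frac{\partial f^{b}}{\partial u}\left(u^{L}_{1}, q^{L}_{1}\right), &
\frac{\partial f^{I,R}_{N_{eles}}}{\partial u^{R}_{N_{eles}}} &= \frac{\partial f^{b}}{\partial u}\left(u^{R}_{N_{eles}}, q^{R}_{N_{eles}}\right), \\
\frac{\partial f^{I,L}_{1}}{\partial q^{L}_{1}} &= \frac{\partial f^{b}}{\partial q}\left(u^{L}_{1}, q^{L}_{1}\right), &
\frac{\partial f^{I,R}_{N_{eles}}}{\partial q^{R}_{N_{eles}}} &= \frac{\partial f^{b}}{\partial q}\left(u^{R}_{N_{eles}}, q^{R}_{N_{eles}}\right),
\end{aligned}
\tag{8.17}
\]
and $\frac{\partial f^{b}}{\partial u}(u, q)$ and $\frac{\partial f^{b}}{\partial q}(u, q)$ are the derivatives of the function which applies the boundary condition.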
Contribution from local solution derivative
Following directly from Eq. (6.22), the Jacobian of the numerical solution derivative
with respect to the discontinuous solution in each element can be written as
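\[ \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) = -\frac{1}{|J_{ele}|} \frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right), \tag{8.18} \]
where $\frac{\partial}{\partial u_{ele}}$ can be applied directly to the solution in Eq. (6.20),
\[ \frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right) = D^{L}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_{\xi} + D^{R}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right). \tag{8.19} \]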
The Jacobians of the common solution are constructed by applying the chain rule
to Eq. (6.13) and differentiating Eq. (6.11),
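\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{R}_{ele-1}, u^{L}_{ele}\right)\right) = \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L}, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{R}_{ele}, u^{L}_{ele+1}\right)\right) = \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R},
\end{aligned}
\tag{8.20}
\]
where
\[ \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} = \frac{\partial u^{I}}{\partial u^{+}} = \frac{1}{2} + \beta, \quad \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} = \frac{\partial u^{I}}{\partial u^{-}} = \frac{1}{2} - \beta, \tag{8.21} \]
are the exact derivatives of the LDG interface solution function $u^{I}(u^{-}, u^{+})$ with respect to $u^{-}$, $u^{+}$, respectively.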
In the case of a boundary, the Jacobian is found by differentiating Eq.(6.15),
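\[
\begin{aligned}
\frac{\partial}{\partial u_{1}}\left(u^{I,L}_{1}\right) &= \frac{\partial}{\partial u_{1}}\left(u^{b}\left(u^{L}_{1}\right)\right) = \frac{\partial u^{I,L}_{1}}{\partial u^{L}_{1}} E^{L}, \\
\frac{\partial}{\partial u_{N_{eles}}}\left(u^{I,R}_{N_{eles}}\right) &= \frac{\partial}{\partial u_{N_{eles}}}\left(u^{b}\left(u^{R}_{N_{eles}}\right)\right) = \frac{\partial u^{I,R}_{N_{eles}}}{\partial u^{R}_{N_{eles}}} E^{R},
\end{aligned}
\tag{8.22}
\]
where
\[ \frac{\partial u^{I,L}_{1}}{\partial u^{L}_{1}} = \frac{\partial u^{b}}{\partial u}\left(u^{L}_{1}\right), \quad \frac{\partial u^{I,R}_{N_{eles}}}{\partial u^{R}_{N_{eles}}} = \frac{\partial u^{b}}{\partial u}\left(u^{R}_{N_{eles}}\right), \tag{8.23} \]
and $\frac{\partial u^{b}}{\partial u}(u)$ is the derivative of the function which applies the boundary condition for the solution at the specified boundary.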
Contribution from neighboring solution derivatives
Since neighboring solution derivatives, qele-1 and qele+1, depend on the element local
discontinuous solution, the Jacobians for these contributions must be included in the
derivation. The Jacobians of the neighboring solution derivatives can be written as
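\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(q_{ele-1}\right) &= -\frac{1}{|J_{ele-1}|} \frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele-1}}{\delta \xi}\right), \\
\frac{\partial}{\partial u_{ele}}\left(q_{ele+1}\right) &= -\frac{1}{|J_{ele+1}|} \frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele+1}}{\delta \xi}\right).
\end{aligned}
\tag{8.24}
\]
Applying $\frac{\partial}{\partial u_{ele}}$ to Eq. (6.20) for the neighboring elements produces
\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele-1}}{\delta \xi}\right) &= D^{R}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele-1}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele+1}}{\delta \xi}\right) &= D^{L}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele+1}\right),
\end{aligned}
\tag{8.25}
\]
where the Jacobians of the common solution are written as
\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele-1}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{R}_{ele-1}, u^{L}_{ele}\right)\right) = \frac{\partial u^{I,R}_{ele-1}}{\partial u^{L}_{ele}} E^{L}, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele+1}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{R}_{ele}, u^{L}_{ele+1}\right)\right) = \frac{\partial u^{I,L}_{ele+1}}{\partial u^{R}_{ele}} E^{R},
\end{aligned}
\tag{8.26}
\]
and
\[ \frac{\partial u^{I,R}_{ele-1}}{\partial u^{L}_{ele}} = \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}}, \quad \frac{\partial u^{I,L}_{ele+1}}{\partial u^{R}_{ele}} = \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}}. \tag{8.27} \]
The neighboring common interface solution Jacobians can then be evaluated using the results from Eq. (8.21).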
Modified Common Interface Flux Jacobians and Final Result
The modified common interface flux Jacobians in Eq. (8.6) are constructed by rewrit-
ing the Jacobians of the common fluxes in Eq. (8.10) so that the contributions from
neighboring solution derivatives are combined with other terms in the equation,
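\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,L}_{ele}\right) &= \left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*} E^{L} + \frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}} E^{L} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,R}_{ele}\right) &= \left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*} E^{R} + \frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}} E^{R} \frac{\partial}{\partial u_{ele}}\left(q_{ele}\right),
\end{aligned}
\tag{8.28}
\]
where the modified common interface flux Jacobians, $\left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*}$ and $\left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*}$, are defined to include $\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}$ and $\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}$ from Eq. (8.11) as well as the contributions from neighboring solution derivatives in equations (8.10) and (8.24)–(8.27),
\[
\begin{aligned}
\left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*} &= \frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}} + \frac{\partial f^{I,L}_{ele}}{\partial q^{R}_{ele-1}} E^{R} \left(-\frac{1}{|J_{ele-1}|}\right) D^{R}_{\xi} \frac{\partial u^{I,R}_{ele-1}}{\partial u^{L}_{ele}}, \\
\left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*} &= \frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}} + \frac{\partial f^{I,R}_{ele}}{\partial q^{L}_{ele+1}} E^{L} \left(-\frac{1}{|J_{ele+1}|}\right) D^{L}_{\xi} \frac{\partial u^{I,L}_{ele+1}}{\partial u^{R}_{ele}}.
\end{aligned}
\tag{8.29}
\]
The neighbor contribution Jacobians are now defined as
\[
\frac{\partial q^{R}_{ele-1}}{\partial u^{I,R}_{ele-1}} = E^{R} \left(-\frac{1}{|J_{ele-1}|}\right) D^{R}_{\xi}, \quad
\frac{\partial q^{L}_{ele+1}}{\partial u^{I,L}_{ele+1}} = E^{L} \left(-\frac{1}{|J_{ele+1}|}\right) D^{L}_{\xi},
\tag{8.30}
\]
where it’s important to note that the results are scalars.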
By utilizing equations (8.8), (8.28) and (8.18)–(8.20) and grouping terms together,
the Jacobian of the numerical derivative of the flux from Eq. (8.7) can now be split
into two distinct parts and the final result in Eq. (8.3) can be constructed.
8.1.2 Kronecker Product Formulation
The element local Jacobian follows a distinct pattern which comes directly from the
Lagrange basis polynomials used in the DFR method. This allows for a Kronecker
product formulation which is more compact. The numerical derivative of the flux
with respect to the solution and the solution derivative in equations (8.4) and (8.5)
can be reformulated as
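\[ \frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = \left(\frac{\partial f^{I,L}_{ele}}{\partial u^{L}_{ele}}\right)^{*} \otimes K^{L} + 1 \otimes \left(K \frac{\partial f_{ele}}{\partial u_{ele}}\right) + \left(\frac{\partial f^{I,R}_{ele}}{\partial u^{R}_{ele}}\right)^{*} \otimes K^{R}, \tag{8.31} \]
and
\[
\frac{\delta}{\delta \xi}\left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \left(\frac{\partial f^{I,L}_{ele}}{\partial q^{L}_{ele}} \otimes K^{L} + 1 \otimes \left(K \frac{\partial f_{ele}}{\partial q_{ele}}\right) + \frac{\partial f^{I,R}_{ele}}{\partial q^{R}_{ele}} \otimes K^{R}\right) \left(-\frac{1}{|J_{ele}|}\right) \left(\frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} \otimes K^{L} + 1 \otimes K + \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} \otimes K^{R}\right),
\tag{8.32}
\]
where the Kronecker product operators are of size $(N_{spts1D} \times N_{spts1D})$ and come directly from equations (6.21) and (6.12),
\[
\begin{aligned}
K &= D_{\xi} = \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', \\
K^{L} &= D^{L}_{\xi} E^{L} = \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\, \ell(-1)', \\
K^{R} &= D^{R}_{\xi} E^{R} = \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi)\, \ell(+1)'.
\end{aligned}
\tag{8.33}
\]
For clarity, these operators can also be written as,
\[
\begin{aligned}
K_{p,m} &= \frac{\partial \tilde{\ell}_{m}}{\partial \xi}(\xi_{p}), & p, m &= 1, 2, \ldots, N_{spts1D}, \\
K^{L}_{p,m} &= \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi_{p})\, \ell_{m}(-1), & p, m &= 1, 2, \ldots, N_{spts1D}, \\
K^{R}_{p,m} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi_{p})\, \ell_{m}(+1), & p, m &= 1, 2, \ldots, N_{spts1D}.
\end{aligned}
\tag{8.34}
\]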
Using this new notation, the neighbor contribution Jacobians from Eq. (8.30) can
also be reformulated as
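\[
\frac{\partial q^{R}_{ele-1}}{\partial u^{I,R}_{ele-1}} = \left(-\frac{1}{|J_{ele-1}|}\right) \sum_{p=1}^{N_{spts1D}} K^{R}_{p,p}, \qquad
\frac{\partial q^{L}_{ele+1}}{\partial u^{I,L}_{ele+1}} = \left(-\frac{1}{|J_{ele+1}|}\right) \sum_{p=1}^{N_{spts1D}} K^{L}_{p,p}.
\tag{8.35}
\]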
In the one-dimensional case, Kronecker products do very little to simplify the equation
but by rewriting the equation in terms of Kronecker products, it’s much easier to see
the extension to multiple dimensions.
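The operators in Eq. (8.34) can be assembled numerically. The following is a minimal sketch, not the implementation used in this work. It assumes that the solution points are the Gauss-Legendre points and that the extended DFR basis interpolates the nodes -1, the solution points, and +1, indexed 0 to P+2. The helper names `lagrange_matrices` and `dfr_operators` are hypothetical.

```python
import numpy as np
from numpy.polynomial.legendre import leggauss


def lagrange_matrices(nodes, pts):
    """Return (L, dL) with L[i, j] = l_j(pts[i]) and dL[i, j] = l_j'(pts[i])."""
    n = len(nodes)
    # Column j holds the monomial coefficients of the j-th Lagrange polynomial.
    coeffs = np.linalg.inv(np.vander(nodes, n, increasing=True))
    V = np.vander(pts, n, increasing=True)        # V[i, k] = pts[i]**k
    dV = np.zeros_like(V)
    dV[:, 1:] = V[:, :-1] * np.arange(1, n)       # d/dx x**k = k x**(k-1)
    return V @ coeffs, dV @ coeffs


def dfr_operators(P):
    """Build K, K^L and K^R of Eq. (8.34) under the assumptions stated above."""
    xi, _ = leggauss(P + 1)                       # assumed solution points
    ext = np.concatenate(([-1.0], xi, [1.0]))     # assumed extended node set
    _, dlt = lagrange_matrices(ext, xi)           # derivatives of the extended basis
    ell_m1, _ = lagrange_matrices(xi, np.array([-1.0]))  # l_m(-1), shape (1, N)
    ell_p1, _ = lagrange_matrices(xi, np.array([1.0]))   # l_m(+1), shape (1, N)
    K = dlt[:, 1:-1]                              # K[p, m]
    KL = np.outer(dlt[:, 0], ell_m1[0])           # K^L[p, m]
    KR = np.outer(dlt[:, -1], ell_p1[0])          # K^R[p, m]
    return xi, K, KL, KR


xi, K, KL, KR = dfr_operators(3)
u = xi**3 - 2.0 * xi
du = (K + KL + KR) @ u                            # derivative when interface values are extrapolated
neighbor_scalar = np.trace(KL)                    # the sum over K^L_{p,p} in Eq. (8.35), without the 1/|J| factor
```

When the interface values are simply the extrapolated interior values, the three operators together differentiate a polynomial of degree P exactly, which gives a quick consistency check on the assembly.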
8.2 Two- and Three-Dimensional Formulation for
Advection-Diffusion
In this section, the two- and three-dimensional element local Jacobian formulations
for advection-diffusion are presented in a single, dimension-general form to avoid
repetition. To generalize the formulation, a number of sets are defined in Table 8.2.
These are used throughout the section.
Table 8.2: Two- and three-dimensional sets

Variable                 2D                     3D
$\mathcal{D}_x$          x, y                   x, y, z
$\mathcal{D}_\xi$        ξ, η                   ξ, η, ζ
$\mathcal{D}^L_\xi$      (ξ, L), (η, B)         (ξ, L), (η, F), (ζ, B)
$\mathcal{D}^R_\xi$      (ξ, R), (η, T)         (ξ, R), (η, Bk), (ζ, T)
$\mathcal{F}^L$          L, B                   L, F, B
$\mathcal{F}^R$          R, T                   R, Bk, T

Following from Eq. (6.71), the two- or three-dimensional DFR formulation of the
semi-discrete advection-diffusion equation in each element can be written as
\[
r_{ele} = r(u_{ele}, u_{eleN}) = -V^{-1}_{ele} \nabla^{\delta} \cdot f_{ele},
\tag{8.36}
\]
where r(uele,ueleN) is the residual. Differentiating with respect to the discontinuous
solution vector, uele, produces the element local Jacobian matrix,
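\[
\frac{\partial r_{ele}}{\partial u_{ele}} = -V^{-1}_{ele} \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right),
\tag{8.37}
\]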
where the Jacobian of the numerical gradient of the flux is split into two distinct
parts by applying a chain rule,
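\[
\frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right) = \nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) + \nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right).
\tag{8.38}
\]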
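In the two-dimensional case, the element local Jacobian, $\frac{\partial r_{ele}}{\partial u_{ele}}$, is a matrix of size
$(N_{spts2D} \times N_{spts2D})$. The numerical gradient of the flux with respect to the solution is
defined as
\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = {} & D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} + D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} + D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi\,ele}}{\partial u^{R}_{ele}}^{*} E^{R} + {} \\
& D^{B}_{\eta} \frac{\partial f^{I,B}_{\eta\,ele}}{\partial u^{B}_{ele}}^{*} E^{B} + D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial u_{ele}} + D^{T}_{\eta} \frac{\partial f^{I,T}_{\eta\,ele}}{\partial u^{T}_{ele}}^{*} E^{T},
\end{aligned}
\tag{8.39}
\]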
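and the numerical gradient of the flux with respect to the solution gradient is
\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x, y\}} & \left(D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial q^{L}_{d_j\,ele}} E^{L} + D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}} + D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi\,ele}}{\partial q^{R}_{d_j\,ele}} E^{R} + {} \right. \\
& \left. D^{B}_{\eta} \frac{\partial f^{I,B}_{\eta\,ele}}{\partial q^{B}_{d_j\,ele}} E^{B} + D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}} + D^{T}_{\eta} \frac{\partial f^{I,T}_{\eta\,ele}}{\partial q^{T}_{d_j\,ele}} E^{T}\right) \\
& \left(J^{-1'}_{(d_j,\xi)ele} \left(D^{L}_{\xi} \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + D_{\xi} + D^{R}_{\xi} \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R}\right) + {} \right. \\
& \left. J^{-1'}_{(d_j,\eta)ele} \left(D^{B}_{\eta} \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} E^{B} + D_{\eta} + D^{T}_{\eta} \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}} E^{T}\right)\right).
\end{aligned}
\tag{8.40}
\]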
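In the three-dimensional case, the size of $\frac{\partial r_{ele}}{\partial u_{ele}}$ is $(N_{spts} \times N_{spts})$. The numerical
gradient of the flux with respect to the solution is
\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = {} & D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} + D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} + D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi\,ele}}{\partial u^{R}_{ele}}^{*} E^{R} + {} \\
& D^{F}_{\eta} \frac{\partial f^{I,F}_{\eta\,ele}}{\partial u^{F}_{ele}}^{*} E^{F} + D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial u_{ele}} + D^{Bk}_{\eta} \frac{\partial f^{I,Bk}_{\eta\,ele}}{\partial u^{Bk}_{ele}}^{*} E^{Bk} + {} \\
& D^{B}_{\zeta} \frac{\partial f^{I,B}_{\zeta\,ele}}{\partial u^{B}_{ele}}^{*} E^{B} + D_{\zeta} \frac{\partial f_{\zeta\,ele}}{\partial u_{ele}} + D^{T}_{\zeta} \frac{\partial f^{I,T}_{\zeta\,ele}}{\partial u^{T}_{ele}}^{*} E^{T},
\end{aligned}
\tag{8.41}
\]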
and the numerical gradient of the flux with respect to the solution gradient is
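\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x, y, z\}} & \left(D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial q^{L}_{d_j\,ele}} E^{L} + D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}} + D^{R}_{\xi} \frac{\partial f^{I,R}_{\xi\,ele}}{\partial q^{R}_{d_j\,ele}} E^{R} + {} \right. \\
& D^{F}_{\eta} \frac{\partial f^{I,F}_{\eta\,ele}}{\partial q^{F}_{d_j\,ele}} E^{F} + D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}} + D^{Bk}_{\eta} \frac{\partial f^{I,Bk}_{\eta\,ele}}{\partial q^{Bk}_{d_j\,ele}} E^{Bk} + {} \\
& \left. D^{B}_{\zeta} \frac{\partial f^{I,B}_{\zeta\,ele}}{\partial q^{B}_{d_j\,ele}} E^{B} + D_{\zeta} \frac{\partial f_{\zeta\,ele}}{\partial q_{d_j\,ele}} + D^{T}_{\zeta} \frac{\partial f^{I,T}_{\zeta\,ele}}{\partial q^{T}_{d_j\,ele}} E^{T}\right) \\
& \left(J^{-1'}_{(d_j,\xi)ele} \left(D^{L}_{\xi} \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} E^{L} + D_{\xi} + D^{R}_{\xi} \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} E^{R}\right) + {} \right. \\
& J^{-1'}_{(d_j,\eta)ele} \left(D^{F}_{\eta} \frac{\partial u^{I,F}_{ele}}{\partial u^{F}_{ele}} E^{F} + D_{\eta} + D^{Bk}_{\eta} \frac{\partial u^{I,Bk}_{ele}}{\partial u^{Bk}_{ele}} E^{Bk}\right) + {} \\
& \left. J^{-1'}_{(d_j,\zeta)ele} \left(D^{B}_{\zeta} \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} E^{B} + D_{\zeta} + D^{T}_{\zeta} \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}} E^{T}\right)\right).
\end{aligned}
\tag{8.42}
\]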
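The modified common interface flux Jacobians contain contributions from neighboring
solution derivatives and are defined as
\[
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}^{*} = \frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}} \frac{\partial q^{faceN}_{d_j\,eleN}}{\partial u^{I,faceN}_{eleN}} \frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}},
\quad \text{for } d_i, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi}.
\tag{8.43}
\]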
Table 8.3 defines the Jacobians which are diagonal matrices primarily constructed
through function evaluations. These are described in more detail in section 8.2.1.
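Name              Variable                                                                                                                                                                                              2D Size                              3D Size
D. Flux           $\frac{\partial f_{d\,ele}}{\partial u_{ele}}$, $\frac{\partial f_{d\,ele}}{\partial q_{ele}}$                                                                                                     $(N_{spts2D} \times N_{spts2D})$     $(N_{spts} \times N_{spts})$
C. Flux           $\frac{\partial f^{I,face}_{d\,ele}}{\partial u^{face}_{ele}}$, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}}$, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}}$   $(N_{spts1D} \times N_{spts1D})$     $(N_{spts2D} \times N_{spts2D})$
C. Solution       $\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}}$, $\frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}}$                                                                         $(N_{spts1D} \times N_{spts1D})$     $(N_{spts2D} \times N_{spts2D})$
M. C. Flux        $\frac{\partial f^{I,face}_{d\,ele}}{\partial u^{face}_{ele}}^{*}$                                                                                                                                  $(N_{spts1D} \times N_{spts1D})$     $(N_{spts2D} \times N_{spts2D})$
N. Contribution   $\frac{\partial q^{faceN}_{d_j\,eleN}}{\partial u^{I,faceN}_{eleN}}$                                                                                                                                $(N_{spts1D} \times N_{spts1D})$     $(N_{spts2D} \times N_{spts2D})$

Table 8.3: Diagonal Jacobian matrices used in the construction of the two- and three-dimensional element local Jacobian matrix for advection-diffusion. The full names of the Jacobian matrices can be found in Table 8.1.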
8.2.1 Derivation
The Jacobian of the numerical gradient of the flux follows directly from Eq. (6.87),
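\[
\frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} \cdot f_{ele}\right) = \sum_{d \in \mathcal{D}_{\xi}} D_{d} \frac{\partial}{\partial u_{ele}}\left(f_{d\,ele}\right) + \sum_{d, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi}} D^{face}_{d} \frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d\,ele}\right).
\tag{8.44}
\]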
The derivation can now be split into sections where the Jacobians in Table 8.3 are
constructed.
Discontinuous Flux Jacobians
The Jacobian of the transformed discontinuous flux in block matrix-vector format is
found directly from the transformation in Eq. (6.88),
\[
\frac{\partial}{\partial u_{ele}}\left(f_{ele}\right) = J^{-1}_{ele} \frac{\partial}{\partial u_{ele}}\left(f(u_{ele}, q_{ele})\right).
\tag{8.45}
\]
This can be written as
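\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i\,ele}\right) = \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)ele} \frac{\partial}{\partial u_{ele}}\left(f_{d_k}(u_{ele}, q_{ele})\right), \quad d_i \in \mathcal{D}_{\xi},
\tag{8.46}
\]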
where the Jacobian of the discontinuous flux is found by applying the chain rule to
the flux function,
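\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i}(u_{ele}, q_{ele})\right) = \frac{\partial f_{d_i\,ele}}{\partial u_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}} \frac{\partial}{\partial u_{ele}}\left(q_{d_j\,ele}\right), \quad d_i \in \mathcal{D}_x,
\tag{8.47}
\]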
and
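\[
\begin{aligned}
\frac{\partial f_{d_i\,ele}}{\partial u_{ele}} &= \frac{\partial f_{d_i}}{\partial u}(u_{ele}, q_{ele}) = \frac{\partial f_{d_i,adv}}{\partial u}(u_{ele}) - \frac{\partial f_{d_i,diff}}{\partial u}(u_{ele}, q_{ele}), \\
\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}} &= \frac{\partial f_{d_i}}{\partial q_{d_j}}(u_{ele}, q_{ele}) = -\frac{\partial f_{d_i,diff}}{\partial q_{d_j}}(u_{ele}, q_{ele}), \quad d_i, d_j \in \mathcal{D}_x.
\end{aligned}
\tag{8.48}
\]
Note that the discontinuous flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial u_{ele}}$ and $\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}}$, are diagonal matrices
because there is no dependence among solution points.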
Common Interface Flux Jacobians
The Jacobians of the transformed common fluxes are found by differentiating Eq. (6.86),
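\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d\,ele}\right) &= A^{face}_{ele} \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{faceN}_{eleN}, u^{face}_{ele}, q^{faceN}_{eleN}, q^{face}_{ele}\right)\right), & \text{for } d, face \in \mathcal{D}^{L}_{\xi}, \\
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d\,ele}\right) &= A^{face}_{ele} \frac{\partial}{\partial u_{ele}}\left(f^{I}\left(u^{face}_{ele}, u^{faceN}_{eleN}, q^{face}_{ele}, q^{faceN}_{eleN}\right)\right), & \text{for } d, face \in \mathcal{D}^{R}_{\xi}.
\end{aligned}
\tag{8.49}
\]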
Applying the chain rule and differentiating the extrapolation equations (6.75) and (6.85)
produces the following result,
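\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d_i\,ele}\right) = {} & \frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}} E^{face} + {} \\
& \sum_{d_j \in \mathcal{D}_x} \left(\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}} E^{face} \frac{\partial}{\partial u_{ele}}\left(q_{d_j\,ele}\right) + \frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}} E^{faceN} \frac{\partial}{\partial u_{ele}}\left(q_{d_j\,eleN}\right)\right), \\
& \text{for } d_i, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi},
\end{aligned}
\tag{8.50}
\]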
where
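\[
\begin{aligned}
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial u^{+}}\left(u^{faceN}_{eleN}, u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial q^{+}_{d_j}}\left(u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial q^{-}_{d_j}}\left(u^{faceN}_{eleN}, q^{faceN}_{eleN}\right), \\
& \text{for } d_i, face \in \mathcal{D}^{L}_{\xi}, \; d_j \in \mathcal{D}_x, \\
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial u^{-}}\left(u^{face}_{ele}, u^{faceN}_{eleN}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial q^{-}_{d_j}}\left(u^{face}_{ele}, q^{face}_{ele}\right), \\
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}} &= A^{face}_{ele} \frac{\partial f^{I}}{\partial q^{+}_{d_j}}\left(u^{faceN}_{eleN}, q^{faceN}_{eleN}\right), \\
& \text{for } d_i, face \in \mathcal{D}^{R}_{\xi}, \; d_j \in \mathcal{D}_x,
\end{aligned}
\tag{8.51}
\]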
and
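\[
\begin{aligned}
\frac{\partial f^{I}}{\partial u^{-}}\left(u^{-}, u^{+}, q^{-}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right), \\
\frac{\partial f^{I}}{\partial u^{+}}\left(u^{-}, u^{+}, q^{+}\right) &= \frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) - \frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right), \\
\frac{\partial f^{I}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right), \quad d \in \mathcal{D}_x, \\
\frac{\partial f^{I}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right) &= -\frac{\partial f^{I}_{diff}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right), \quad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.52}
\]
are the exact derivatives of the interface flux function $f^{I}(u^{-}, u^{+}, q^{-}, q^{+})$ with respect
to $u^{-}$, $u^{+}$, $q^{-}_{d}$ and $q^{+}_{d}$ for $d \in \mathcal{D}_x$, respectively. Note that the common interface flux
Jacobians, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}$, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}}$ and $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}}$, are diagonal matrices because there is no
dependence among flux points.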
In the multi-dimensional case, the derivatives of the interface advective flux func-
tion or the Rusanov flux are computed by differentiating Eq. (6.59) with respect to
u− and u+,
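\[
\begin{aligned}
\frac{\partial f^{I}_{adv}}{\partial u^{-}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{adv}}{\partial u}(u^{-}) - \frac{1}{2} \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})| (u^{+} - u^{-})\right), \\
\frac{\partial f^{I}_{adv}}{\partial u^{+}}\left(u^{-}, u^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{adv}}{\partial u}(u^{+}) - \frac{1}{2} \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})| (u^{+} - u^{-})\right),
\end{aligned}
\tag{8.53}
\]
where $\frac{\partial f^{n}_{adv}}{\partial u}(u) = \frac{\partial f_{adv}}{\partial u}(u) \cdot n^{-}$,
\[
\begin{aligned}
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})| (u^{+} - u^{-})\right) &= (u^{+} - u^{-}) \frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) - |\lambda(u^{-}, u^{+})|, \\
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})| (u^{+} - u^{-})\right) &= (u^{+} - u^{-}) \frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) + |\lambda(u^{-}, u^{+})|,
\end{aligned}
\tag{8.54}
\]
and
\[
\begin{aligned}
\frac{\partial}{\partial u^{-}}\left(|\lambda(u^{-}, u^{+})|\right) &=
\begin{cases}
\frac{\partial}{\partial u}\left(\left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right|\right) & \text{if } \left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right| > \left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right|, \\
0 & \text{otherwise},
\end{cases} \\
\frac{\partial}{\partial u^{+}}\left(|\lambda(u^{-}, u^{+})|\right) &=
\begin{cases}
\frac{\partial}{\partial u}\left(\left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right|\right) & \text{if } \left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{+})\right| > \left|\frac{\partial f^{n}_{adv}}{\partial u}(u^{-})\right|, \\
0 & \text{otherwise}.
\end{cases}
\end{aligned}
\]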
The derivatives of the interface diffusive flux are formed by differentiating the LDG
flux in Eq. (6.61),
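\[
\begin{aligned}
\frac{\partial f^{I}_{diff}}{\partial u^{-}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial u}(u^{-}, q^{-}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial u}(u^{-}, q^{-}) \cdot n^{-}\right) + \tau, \\
\frac{\partial f^{I}_{diff}}{\partial u^{+}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial u}(u^{+}, q^{+}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial u}(u^{+}, q^{+}) \cdot n^{+}\right) - \tau, \\
\frac{\partial f^{I}_{diff}}{\partial q^{-}_{d}}\left(u^{-}, q^{-}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{-}, q^{-}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{-}, q^{-}) \cdot n^{-}\right), \quad d \in \mathcal{D}_x, \\
\frac{\partial f^{I}_{diff}}{\partial q^{+}_{d}}\left(u^{+}, q^{+}\right) &= \frac{1}{2} \frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{+}, q^{+}) + \beta \cdot n^{-} \left(\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u^{+}, q^{+}) \cdot n^{+}\right), \quad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.55}
\]
where $\frac{\partial f^{n}_{diff}}{\partial u}(u, q) = \frac{\partial f_{diff}}{\partial u}(u, q) \cdot n^{-}$ and $\frac{\partial f^{n}_{diff}}{\partial q_{d}}(u, q) = \frac{\partial f_{diff}}{\partial q_{d}}(u, q) \cdot n^{-}$.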
Boundary Flux Jacobian
The Jacobian of the common flux at boundaries is found by differentiating Eq. (6.62),
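\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(f^{I,\partial\Omega}_{ele}\right) &= A^{\partial\Omega}_{ele} \frac{\partial}{\partial u_{ele}}\left(f^{b}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right)\right), \\
&= \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} E^{\partial\Omega} + \sum_{d \in \mathcal{D}_x} \frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega}_{d\,ele}} E^{\partial\Omega} \frac{\partial}{\partial u_{ele}}\left(q_{d\,ele}\right),
\end{aligned}
\tag{8.56}
\]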
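where
\[
\begin{aligned}
\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} &= A^{\partial\Omega}_{ele} \frac{\partial f^{b}}{\partial u}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right), \\
\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega}_{d\,ele}} &= A^{\partial\Omega}_{ele} \frac{\partial f^{b}}{\partial q_{d}}\left(u^{\partial\Omega}_{ele}, q^{\partial\Omega}_{ele}\right), \quad d \in \mathcal{D}_x,
\end{aligned}
\tag{8.57}
\]
and $\frac{\partial f^{b}}{\partial u}(u, q)$ and $\frac{\partial f^{b}}{\partial q_{d}}(u, q)$ are the derivatives of the function which applies the
boundary condition.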
Contribution from local solution gradient
Following directly from Eq. (6.84), the Jacobian of the numerical solution gradient
with respect to the discontinuous solution in block matrix-vector format can be
written as
\[
\frac{\partial}{\partial u_{ele}}\left(q_{ele}\right) = J^{-1'}_{ele} \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} u_{ele}\right),
\tag{8.58}
\]
where $\frac{\partial}{\partial u_{ele}}$ can be applied directly to the solution in Eq. (6.51) in the two-dimensional
case,
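\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right) &= D^{L}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_{\xi} + D^{R}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \eta}\right) &= D^{B}_{\eta} \frac{\partial}{\partial u_{ele}}\left(u^{I,B}_{ele}\right) + D_{\eta} + D^{T}_{\eta} \frac{\partial}{\partial u_{ele}}\left(u^{I,T}_{ele}\right),
\end{aligned}
\tag{8.59}
\]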
and Eq. (6.81) in the three-dimensional case,
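\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \xi}\right) &= D^{L}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,L}_{ele}\right) + D_{\xi} + D^{R}_{\xi} \frac{\partial}{\partial u_{ele}}\left(u^{I,R}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \eta}\right) &= D^{F}_{\eta} \frac{\partial}{\partial u_{ele}}\left(u^{I,F}_{ele}\right) + D_{\eta} + D^{Bk}_{\eta} \frac{\partial}{\partial u_{ele}}\left(u^{I,Bk}_{ele}\right), \\
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{ele}}{\delta \zeta}\right) &= D^{B}_{\zeta} \frac{\partial}{\partial u_{ele}}\left(u^{I,B}_{ele}\right) + D_{\zeta} + D^{T}_{\zeta} \frac{\partial}{\partial u_{ele}}\left(u^{I,T}_{ele}\right).
\end{aligned}
\tag{8.60}
\]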
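The Jacobians of the common solution values are constructed by differentiating
Eq. (6.44),
\[
\begin{aligned}
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{faceN}_{eleN}, u^{face}_{ele}\right)\right), & face \in \mathcal{F}^{L}, \\
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) &= \frac{\partial}{\partial u_{ele}}\left(u^{I}\left(u^{face}_{ele}, u^{faceN}_{eleN}\right)\right), & face \in \mathcal{F}^{R}.
\end{aligned}
\tag{8.61}
\]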
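Applying the chain rule to this equation and differentiating Eq. (6.75) produces the
following,
\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,face}_{ele}\right) = \frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} E^{face}, \quad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R},
\tag{8.62}
\]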
where
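\[
\begin{aligned}
\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} &= \frac{\partial u^{I}}{\partial u^{+}} = \frac{1}{2} - \beta \cdot n^{+}, & face \in \mathcal{F}^{L}, \\
\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}} &= \frac{\partial u^{I}}{\partial u^{-}} = \frac{1}{2} - \beta \cdot n^{-}, & face \in \mathcal{F}^{R},
\end{aligned}
\tag{8.63}
\]
are the exact derivatives of the LDG interface solution function $u^{I}(u^{-}, u^{+})$ with
respect to $u^{+}$ and $u^{-}$, respectively. It is important to note that the common interface
solution Jacobians, $\frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}}$, are diagonal matrices defined by a single scalar.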
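In the case of a boundary, the Jacobian of the common solution is found by
differentiating Eq. (6.46),
\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,\partial\Omega}_{ele}\right) = \frac{\partial}{\partial u_{ele}}\left(u^{b}\left(u^{\partial\Omega}_{ele}\right)\right) = \frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} E^{\partial\Omega},
\tag{8.64}
\]
where
\[
\frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} = \frac{\partial u^{b}}{\partial u}\left(u^{\partial\Omega}_{ele}\right),
\tag{8.65}
\]
and $\frac{\partial u^{b}}{\partial u}(u)$ is the derivative of the function which applies the boundary condition for
the solution at the specified boundary.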
Contribution from neighboring solution gradients
The Jacobian of the neighboring solution gradient can be written as
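\[
\frac{\partial}{\partial u_{ele}}\left(q_{eleN}\right) = J^{-1'}_{eleN} \frac{\partial}{\partial u_{ele}}\left(\nabla^{\delta} u_{eleN}\right).
\tag{8.66}
\]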
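Applying $\frac{\partial}{\partial u_{ele}}$ to Eq. (6.81) for the neighboring elements produces
\[
\frac{\partial}{\partial u_{ele}}\left(\frac{\delta u_{eleN}}{\delta d}\right) = D^{faceN}_{d} \frac{\partial}{\partial u_{ele}}\left(u^{I,faceN}_{eleN}\right), \quad d, faceN \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi},
\tag{8.67}
\]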
where the Jacobians of the common solution values are written as
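\[
\frac{\partial}{\partial u_{ele}}\left(u^{I,faceN}_{eleN}\right) = \frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}} E^{face}, \quad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R},
\tag{8.68}
\]
and
\[
\frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}} = \frac{\partial u^{I,face}_{ele}}{\partial u^{face}_{ele}}, \quad face \in \mathcal{F}^{L} \cup \mathcal{F}^{R}.
\tag{8.69}
\]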
The neighboring common interface solution Jacobians can then be evaluated using
the results from Eq. (8.63).
Modified Common Interface Flux Jacobians and Final Result
The Jacobian of the transformed discontinuous flux can be rewritten by combining
equations (8.46) and (8.47) and applying the flux transformation directly to the dis-
continuous flux Jacobians,
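\[
\frac{\partial}{\partial u_{ele}}\left(f_{d_i\,ele}\right) = \frac{\partial f_{d_i\,ele}}{\partial u_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}} \frac{\partial}{\partial u_{ele}}\left(q_{d_j\,ele}\right), \quad d_i \in \mathcal{D}_{\xi},
\tag{8.70}
\]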
where
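\[
\begin{aligned}
\frac{\partial f_{d_i\,ele}}{\partial u_{ele}} &= \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)ele} \frac{\partial f_{d_k\,ele}}{\partial u_{ele}}, & d_i &\in \mathcal{D}_{\xi}, \\
\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}} &= \sum_{d_k \in \mathcal{D}_x} J^{-1}_{(d_i,d_k)ele} \frac{\partial f_{d_k\,ele}}{\partial q_{d_j\,ele}}, & d_i &\in \mathcal{D}_{\xi}, \; d_j \in \mathcal{D}_x.
\end{aligned}
\tag{8.71}
\]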
The modified common interface flux Jacobians in Eq. (8.43) are constructed by
including the contributions from the neighboring solution gradients in the common
interface flux Jacobians from Eq. (8.50),
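\[
\frac{\partial}{\partial u_{ele}}\left(f^{I,face}_{d_i\,ele}\right) = \frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}^{*} E^{face} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}} E^{face} \frac{\partial}{\partial u_{ele}}\left(q_{d_j\,ele}\right),
\quad \text{for } d_i, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi},
\tag{8.72}
\]
where the modified common interface flux Jacobians, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}^{*}$, are defined to include $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}$
from Eq. (8.51) as well as the contributions from neighboring solution gradients in
equations (8.50) and (8.66)–(8.69),
\[
\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}^{*} = \frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}} + \sum_{d_j \in \mathcal{D}_x} \frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}} E^{faceN} J^{-1'}_{(d_j,d_N)eleN} D^{faceN}_{d_N} \frac{\partial u^{I,faceN}_{eleN}}{\partial u^{face}_{ele}},
\quad \text{for } d_i, face \in \mathcal{D}^{L}_{\xi} \cup \mathcal{D}^{R}_{\xi},
\tag{8.73}
\]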
where “dN” refers to the neighboring face’s dimension in reference space. Eq. (8.73)
can be further simplified by grouping terms which do not depend on the solution,
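\[
\frac{\partial q^{faceN}_{d_j\,eleN}}{\partial u^{I,faceN}_{eleN}} = E^{faceN} J^{-1'}_{(d_j,d_N)eleN} D^{faceN}_{d_N}, \quad \text{for } d_j \in \mathcal{D}_x, \; faceN \in \mathcal{F}^{L} \cup \mathcal{F}^{R},
\tag{8.74}
\]
where $\frac{\partial q^{faceN}_{d_j\,eleN}}{\partial u^{I,faceN}_{eleN}}$ are diagonal matrices which are permuted according to the matching
local element flux point locations. This leads directly to the construction of Eq. (8.43).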
By utilizing equations (8.70), (8.72) and (8.58)–(8.62) and grouping terms to-
gether, the Jacobian of the numerical gradient of the flux from Eq. (8.44) can now be
split into two distinct parts and the final result in Eq. (8.38) can be constructed.
8.2.2 Two-Dimensional Kronecker Product Formulation
The two-dimensional element local Jacobian can be written more compactly by us-
ing Kronecker products. First, consider the Kronecker product formulations of the
polynomial differentiation and extrapolation operators. The extrapolation operators
in equations (6.41) and (6.43) can be written as
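\[
\begin{aligned}
E^{L} &= \ell(\eta)' \otimes \ell(-1)', & E^{R} &= \ell(\eta)' \otimes \ell(+1)', \\
E^{B} &= \ell(-1)' \otimes \ell(\xi)', & E^{T} &= \ell(+1)' \otimes \ell(\xi)'.
\end{aligned}
\tag{8.75}
\]
The differentiation operators in equations (6.49) and (6.52) are written as
\[
D_{\xi} = \ell(\eta)' \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', \qquad D_{\eta} = \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes \ell(\xi)',
\tag{8.76}
\]
and from equations (6.50) and (6.53),
\[
\begin{aligned}
D^{L}_{\xi} &= \ell(\eta)' \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= \ell(\eta)' \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{B}_{\eta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes \ell(\xi)', & D^{T}_{\eta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes \ell(\xi)'.
\end{aligned}
\tag{8.77}
\]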
In order to simplify these expressions, we invoke the following property for Lagrange
polynomials,
\[
\ell_{m}(\xi_{p}) =
\begin{cases}
1 & \text{if } m = p, \\
0 & \text{otherwise},
\end{cases}
\qquad p, m = 1, 2, \ldots, N_{spts1D},
\tag{8.78}
\]
so that equations (8.75), (8.76) and (8.77) become,
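\[
\begin{aligned}
E^{L} &= I \otimes \ell(-1)', & E^{R} &= I \otimes \ell(+1)', \\
E^{B} &= \ell(-1)' \otimes I, & E^{T} &= \ell(+1)' \otimes I, \\
D_{\xi} &= I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', & D_{\eta} &= \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes I, \\
D^{L}_{\xi} &= I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{B}_{\eta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes I, & D^{T}_{\eta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes I,
\end{aligned}
\tag{8.79}
\]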
where I is an identity matrix of size (Nspts1D ×Nspts1D).
Discontinuous Flux Jacobians
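The discontinuous flux Jacobians can also be rewritten in terms of Kronecker products.
As an example, consider expanding the discontinuous flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial u_{ele}}$,
into block diagonals with each diagonal containing $N_{spts1D}$ values corresponding to
the collinear points along the direction $d_i \in \{\xi, \eta\}$,
\[
\begin{aligned}
\frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j}, \\
\frac{\partial f_{\eta\,ele}}{\partial u_{ele}} &= \sum_{i=1}^{N_{spts1D}} \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i} \otimes e_{i} e_{i}',
\end{aligned}
\tag{8.80}
\]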
where ei represents a vector of size (Nspts1D × 1) with a 1 in the ith index and 0 in
all other indices. Now consider a product of the polynomial differentiation operator
and one of the discontinuous flux Jacobians,
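\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \left(I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)'\right) \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j}, \\
&= \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(\frac{\partial \tilde{\ell}}{\partial \xi}(\xi)' \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j}\right),
\end{aligned}
\tag{8.81}
\]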
where Eq. (8.79) is used and the mixed product property of Kronecker products is
invoked. The same operation can be performed on the other product except the
Kronecker product is in reverse. Substituting the Kronecker product operator from
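Eq. (8.33) into both formulations produces the following,
\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j}\right), \\
D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial u_{ele}} &= \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i}\right) \otimes e_{i} e_{i}'.
\end{aligned}
\tag{8.82}
\]
This operation can be repeated for the discontinuous flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}}$.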
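The block structure behind Eq. (8.82) can be verified numerically. The sketch below uses random stand-ins for the operator K and for the diagonal entries of the flux Jacobians. It also assumes that points are ordered with ξ as the fastest index, which is inferred from the position of K in the Kronecker products; the variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)
N = 3                                  # N_spts1D for P = 2
I = np.eye(N)
K = rng.standard_normal((N, N))        # stand-in for the 1D operator K
a = rng.standard_normal((N, N))        # a[j, i]: xi-flux Jacobian entry at point (xi_i, eta_j)
b = rng.standard_normal((N, N))        # b[j, i]: eta-flux Jacobian entry at point (xi_i, eta_j)

# Left-hand sides: full 2D operator times the full diagonal Jacobian.
lhs_xi = np.kron(I, K) @ np.diag(a.ravel())
lhs_eta = np.kron(K, I) @ np.diag(b.ravel())

# Right-hand sides of Eq. (8.82): one small N x N product per line of collinear points.
rhs_xi = sum(np.kron(np.outer(I[j], I[j]), K @ np.diag(a[j])) for j in range(N))
rhs_eta = sum(np.kron(K @ np.diag(b[:, i]), np.outer(I[i], I[i])) for i in range(N))
```

Each term on the right-hand side involves only an N × N product, which is the source of the reduced cost discussed later in this chapter.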
Common Interface Jacobians
The product of the polynomial differentiation operator, common interface Jacobian
and polynomial extrapolation operator forms a Kronecker product as well. As an
example consider the following reformulation,
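\[
\begin{aligned}
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} &= \left(I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right) \left(\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} \otimes 1\right) \left(I \otimes \ell(-1)'\right), \\
&= \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} \otimes \left(\frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\, \ell(-1)'\right),
\end{aligned}
\tag{8.83}
\]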
where Eq. (8.79) is used, the mixed product property of Kronecker products is invoked
and a scalar 1 is placed in the appropriate location to complete the product. The
Kronecker product operator from Eq. (8.33) can now be substituted so that,
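\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} = \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} \otimes K^{L}.
\tag{8.84}
\]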
This type of operation is repeated for all products involving common interface Jaco-
bians.
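The reformulation in equations (8.83) and (8.84) relies only on the mixed product property, so it can be checked with arbitrary data. In the sketch below the column `dl0`, the row `lm1` and the diagonal matrix `A` are random stand-ins for the correction function derivative at the solution points, the extrapolation row ℓ(−1)′ and the modified common interface flux Jacobian; none of these names come from the original text.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 3                                     # N_spts1D for P = 2
I = np.eye(N)
dl0 = rng.standard_normal((N, 1))         # stand-in column for the derivative of l~_0 at the solution points
lm1 = rng.standard_normal((1, N))         # stand-in row for l(-1)'
A = np.diag(rng.standard_normal(N))       # stand-in diagonal interface Jacobian (one entry per flux point)

DL = np.kron(I, dl0)                      # D^L_xi, size (N^2, N)
EL = np.kron(I, lm1)                      # E^L,    size (N, N^2)
KL = dl0 @ lm1                            # K^L = D^L_xi E^L in 1D, size (N, N)

lhs = DL @ A @ EL                         # triple product in the original formulation
rhs = np.kron(A, KL)                      # Kronecker form of Eq. (8.84)
```

Because A is diagonal, the Kronecker form makes it clear that the result is block diagonal with one scaled copy of K^L per flux point.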
Neighbor Contribution Jacobians
The neighbor contribution Jacobians from Eq. (8.74) are also reformulated. As an
example, consider expanding $J^{-1'}_{(d_j,\xi)eleN}$ into block diagonal components as was
performed on the discontinuous flux Jacobians,
\[
J^{-1'}_{(d_j,\xi)eleN} = \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j}, \quad d_j \in \mathcal{D}_x.
\tag{8.85}
\]
Now consider the left face of a neighbor. The neighbor contribution Jacobians can
be expressed as
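\[
\begin{aligned}
\frac{\partial q^{L}_{d_j\,eleN}}{\partial u^{I,L}_{eleN}} &= E^{L} J^{-1'}_{(d_j,\xi)eleN} D^{L}_{\xi}, \quad d_j \in \mathcal{D}_x, \\
&= \left(I \otimes \ell(-1)'\right) \left(\sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j}\right) \left(I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right), \\
&= \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(\ell(-1)' \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right),
\end{aligned}
\tag{8.86}
\]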
where Eq. (8.79) is used and the mixed product property of Kronecker products is
invoked. The final result after utilizing the Kronecker product operator in Eq. (8.33)
is
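\[
\frac{\partial q^{L}_{d_j\,eleN}}{\partial u^{I,L}_{eleN}} = \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{j} e_{j}' \otimes K^{L}_{i,i} \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{i,j}, \quad d_j \in \mathcal{D}_x.
\tag{8.87}
\]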
This type of operation is repeated for all products involving neighbor contribution Ja-
cobians. It’s also important to note that the resulting Jacobians need to be permuted
to match the local element flux points.
Final Result
By utilizing the Kronecker product formulations in equations (8.79), (8.82) and (8.84),
Eq. (8.39) is rewritten as
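\[
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right) + K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right),
\tag{8.88}
\]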
where
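\[
\begin{aligned}
K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right) &= \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} \otimes K^{L} + \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j}\right) + \frac{\partial f^{I,R}_{\xi\,ele}}{\partial u^{R}_{ele}}^{*} \otimes K^{R}, \\
K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right) &= K^{L} \otimes \frac{\partial f^{I,B}_{\eta\,ele}}{\partial u^{B}_{ele}}^{*} + \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i}\right) \otimes e_{i} e_{i}' + K^{R} \otimes \frac{\partial f^{I,T}_{\eta\,ele}}{\partial u^{T}_{ele}}^{*},
\end{aligned}
\tag{8.89}
\]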
and Eq. (8.40) is rewritten as
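\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x, y\}} & \left(K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right) + K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right)\right) \\
& \left(J^{-1'}_{(d_j,\xi)ele} K_{\xi}\left(\frac{\partial q_{\xi\,ele}}{\partial u_{ele}}\right) + J^{-1'}_{(d_j,\eta)ele} K_{\eta}\left(\frac{\partial q_{\eta\,ele}}{\partial u_{ele}}\right)\right),
\end{aligned}
\tag{8.90}
\]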
where
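\[
\begin{aligned}
K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right) &= \frac{\partial f^{I,L}_{\xi\,ele}}{\partial q^{L}_{d_j\,ele}} \otimes K^{L} + \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right|_{j}\right) + \frac{\partial f^{I,R}_{\xi\,ele}}{\partial q^{R}_{d_j\,ele}} \otimes K^{R}, \\
K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right) &= K^{L} \otimes \frac{\partial f^{I,B}_{\eta\,ele}}{\partial q^{B}_{d_j\,ele}} + \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right|_{i}\right) \otimes e_{i} e_{i}' + K^{R} \otimes \frac{\partial f^{I,T}_{\eta\,ele}}{\partial q^{T}_{d_j\,ele}}, \\
K_{\xi}\left(\frac{\partial q_{\xi\,ele}}{\partial u_{ele}}\right) &= \frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}} \otimes K^{L} + I \otimes K + \frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}} \otimes K^{R}, \\
K_{\eta}\left(\frac{\partial q_{\eta\,ele}}{\partial u_{ele}}\right) &= K^{L} \otimes \frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}} + K \otimes I + K^{R} \otimes \frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}}.
\end{aligned}
\tag{8.91}
\]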
Sparsity Pattern
The Kronecker product formulation highlights the unique dependency between the
sparsity pattern of the matrix and the dimensions of the numerical derivatives. There
are two distinct structures that emerge from the formulation: Kξ (x) and Kη (x). Fig-
ure 8.1 shows an example of the sparsity pattern for these structures for a polynomial
order of P = 2. These patterns dictate the structure of the element local Jacobian.
Figure 8.1: The two distinct sparsity patterns used in the two-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panel (a) shows $K_{\xi}(x)$ and panel (b) shows $K_{\eta}(x)$; each is a 9 × 9 matrix with 27 nonzeros.
The first term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.88) is simply the summation of these two
structures. Figure 8.2 shows the sparsity pattern for $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$ for P = 2.
The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.90) can be expanded into four
terms where two terms are like-terms, $K_{\xi}(x) K_{\xi}(y)$ and $K_{\eta}(x) K_{\eta}(y)$, and two terms
are cross-terms, $K_{\xi}(x) K_{\eta}(y)$ and $K_{\eta}(x) K_{\xi}(y)$. The two like-terms do not change
the sparsity pattern of the matrix. The cross-terms, however, produce the dense
matrix shown in Figure 8.3.
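The nonzero counts quoted for P = 2 follow from the Kronecker structure alone and can be reproduced with pattern matrices. This is a minimal sketch in which every 1D block (K, K^L, K^R) is assumed to be fully dense and every diagonal Jacobian is assumed to have no zero entries; the names are hypothetical.

```python
import numpy as np

N = 3                                    # N_spts1D for P = 2
I = np.eye(N, dtype=int)
F = np.ones((N, N), dtype=int)           # pattern of a dense 1D block

K_xi = np.kron(I, F)                     # pattern of K_xi(x): diagonal (x) dense
K_eta = np.kron(F, I)                    # pattern of K_eta(x): dense (x) diagonal

nnz_xi = int(K_xi.sum())                                        # 27
nnz_eta = int(K_eta.sum())                                      # 27
nnz_first = int(((K_xi + K_eta) > 0).sum())                     # 45
nnz_like = int(((K_xi @ K_xi + K_eta @ K_eta) > 0).sum())       # 45
nnz_cross = int(((K_xi @ K_eta + K_eta @ K_xi) > 0).sum())      # 81
```

The like-terms stay inside the pattern of the first term, while either cross-term alone already fills the full 9 × 9 matrix.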
Figure 8.2: The summation of the two terms in Figure 8.1 produces the sparsity pattern for the first term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, in the two-dimensional element local Jacobian for DFR, P = 2 (9 × 9 matrix, 45 nonzeros).

Figure 8.3: The cross-terms $K_{\xi}(x) K_{\eta}(y)$ and $K_{\eta}(x) K_{\xi}(y)$ in the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, produce a dense matrix so the final sparsity pattern of the two-dimensional element local Jacobian for DFR, P = 2, is fully dense (9 × 9 matrix, 81 nonzeros).
8.2.3 Three-Dimensional Kronecker Product Formulation
Following the same procedure described in section 8.2.2, consider applying the prop-
erty in Eq. (8.78) to the three-dimensional Kronecker product formulations of the
polynomial differentiation and extrapolation operators. The extrapolation operators
in equations (6.74) and (6.76) can be written as
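\[
\begin{aligned}
E^{L} &= I \otimes I \otimes \ell(-1)', & E^{R} &= I \otimes I \otimes \ell(+1)', \\
E^{F} &= I \otimes \ell(-1)' \otimes I, & E^{Bk} &= I \otimes \ell(+1)' \otimes I, \\
E^{B} &= \ell(-1)' \otimes I \otimes I, & E^{T} &= \ell(+1)' \otimes I \otimes I.
\end{aligned}
\tag{8.92}
\]
The differentiation operators in equations (6.79) and (6.82) are written as
\[
D_{\xi} = I \otimes I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)', \qquad D_{\eta} = I \otimes \frac{\partial \tilde{\ell}}{\partial \eta}(\eta)' \otimes I, \qquad D_{\zeta} = \frac{\partial \tilde{\ell}}{\partial \zeta}(\zeta)' \otimes I \otimes I,
\tag{8.93}
\]
and from equations (6.80) and (6.83),
\[
\begin{aligned}
D^{L}_{\xi} &= I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi), & D^{R}_{\xi} &= I \otimes I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \xi}(\xi), \\
D^{F}_{\eta} &= I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \eta}(\eta) \otimes I, & D^{Bk}_{\eta} &= I \otimes \frac{\partial \tilde{\ell}_{P+2}}{\partial \eta}(\eta) \otimes I, \\
D^{B}_{\zeta} &= \frac{\partial \tilde{\ell}_{0}}{\partial \zeta}(\zeta) \otimes I \otimes I, & D^{T}_{\zeta} &= \frac{\partial \tilde{\ell}_{P+2}}{\partial \zeta}(\zeta) \otimes I \otimes I.
\end{aligned}
\tag{8.94}
\]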
These operators can then be used to write Kronecker product formulations of the
various Jacobians.
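The position of the 1D operator in each triple Kronecker product in Eq. (8.93) determines the direction it acts along. The sketch below illustrates this with a random stand-in for the 1D differentiation matrix and a solution array stored as `u[k, j, i]`, with ξ as the fastest index. That storage order is an assumption inferred from the equations rather than a statement about the actual code.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 3                                        # N_spts1D for P = 2
I = np.eye(N)
K = rng.standard_normal((N, N))              # stand-in for the 1D differentiation matrix
u = rng.standard_normal((N, N, N))           # u[k, j, i]: zeta, eta, xi indices

D_xi = np.kron(I, np.kron(I, K))             # I (x) I (x) K
D_eta = np.kron(I, np.kron(K, I))            # I (x) K (x) I
D_zeta = np.kron(K, np.kron(I, I))           # K (x) I (x) I

# Equivalent line-by-line application of K along a single index.
ref_xi = np.einsum("pi,kji->kjp", K, u)
ref_eta = np.einsum("pj,kji->kpi", K, u)
ref_zeta = np.einsum("pk,kji->pji", K, u)

out_xi = (D_xi @ u.ravel()).reshape(N, N, N)
out_eta = (D_eta @ u.ravel()).reshape(N, N, N)
out_zeta = (D_zeta @ u.ravel()).reshape(N, N, N)
```

The line-by-line form never builds the N³ × N³ matrices, which is the practical benefit of keeping the operators in Kronecker form.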
Discontinuous Flux Jacobians
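Consider expanding the discontinuous flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial u_{ele}}$, into block diagonals
with each diagonal containing $N_{spts1D}$ values corresponding to the collinear points
along the direction $d_i \in \{\xi, \eta, \zeta\}$,
\[
\begin{aligned}
\frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j,k}, \\
\frac{\partial f_{\eta\,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{k} e_{k}' \otimes \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i,k} \otimes e_{i} e_{i}', \\
\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left.\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}}\right|_{i,j} \otimes e_{j} e_{j}' \otimes e_{i} e_{i}'.
\end{aligned}
\tag{8.95}
\]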
An example of a product between a polynomial differentiation operator from Eq. (8.93)
and a discontinuous flux Jacobian is given below,
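\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \left(I \otimes I \otimes \frac{\partial \tilde{\ell}}{\partial \xi}(\xi)'\right) \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j,k}, \\
&= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left(\frac{\partial \tilde{\ell}}{\partial \xi}(\xi)' \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j,k}\right),
\end{aligned}
\tag{8.96}
\]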
where the mixed product property of Kronecker products is invoked. The same type
of operation can also be performed on the other products. Substituting the Kronecker
product operator from Eq. (8.33) produces the following,
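\[
\begin{aligned}
D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j,k}\right), \\
D_{\eta} \frac{\partial f_{\eta\,ele}}{\partial u_{ele}} &= \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{k} e_{k}' \otimes \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i,k}\right) \otimes e_{i} e_{i}', \\
D_{\zeta} \frac{\partial f_{\zeta\,ele}}{\partial u_{ele}} &= \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}}\right|_{i,j}\right) \otimes e_{j} e_{j}' \otimes e_{i} e_{i}'.
\end{aligned}
\tag{8.97}
\]
This operation can be repeated for the discontinuous flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}}$.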
Common Interface Jacobians
As an example, consider expanding the following common interface Jacobian into
block diagonals,
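\[
\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} = \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j}.
\tag{8.98}
\]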
The product between the polynomial differentiation operator from Eq. (8.94), the
common interface Jacobian and the polynomial extrapolation operator from Eq. (8.92)
is given below,
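\[
\begin{aligned}
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} &= \left(I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right) \left(\sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes 1\right) \left(I \otimes I \otimes \ell(-1)'\right), \\
&= \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes \left(\frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\, \ell(-1)'\right),
\end{aligned}
\tag{8.99}
\]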
where the mixed product property of Kronecker products is invoked and a scalar 1 is
placed in the appropriate location to complete the product. The Kronecker product
operator from Eq. (8.33) can now be substituted so that,
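\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} = \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes K^{L}.
\tag{8.100}
\]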
This type of operation is repeated for all products involving common interface Jaco-
bians.
Neighbor Contribution Jacobians
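As an example, consider expanding $J^{-1'}_{(d_j,\xi)eleN}$ into block diagonal components as was
performed on the discontinuous flux Jacobians,
\[
J^{-1'}_{(d_j,\xi)eleN} = \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j,k}, \quad d_j \in \mathcal{D}_x.
\tag{8.101}
\]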
Now consider the left face of a neighbor. The neighbor contribution Jacobians can
be expressed as
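\[
\begin{aligned}
\frac{\partial q^{L}_{d_j\,eleN}}{\partial u^{I,L}_{eleN}} &= E^{L} J^{-1'}_{(d_j,\xi)eleN} D^{L}_{\xi}, \quad d_j \in \mathcal{D}_x, \\
&= \left(I \otimes I \otimes \ell(-1)'\right) \left(\sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j,k}\right) \left(I \otimes I \otimes \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right), \\
&= \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left(\ell(-1)' \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{j,k} \frac{\partial \tilde{\ell}_{0}}{\partial \xi}(\xi)\right),
\end{aligned}
\tag{8.102}
\]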
where equations (8.94) and (8.92) are used and the mixed product property of Kro-
necker products is invoked. The final result after utilizing the Kronecker product
operator in Eq. (8.33) is
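\[
\frac{\partial q^{L}_{d_j\,eleN}}{\partial u^{I,L}_{eleN}} = \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes K^{L}_{i,i} \left. J^{-1'}_{(d_j,\xi)eleN} \right|_{i,j,k}, \quad d_j \in \mathcal{D}_x.
\tag{8.103}
\]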
This type of operation is repeated for all products involving neighbor contribution Ja-
cobians. It’s also important to note that the resulting Jacobians need to be permuted
to match the local element flux points.
Final Result
By utilizing the Kronecker product formulations in equations (8.93), (8.97) and
(8.100), Eq. (8.41) is rewritten as
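\[
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right) = K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right) + K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right) + K_{\zeta}\left(\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}}\right),
\tag{8.104}
\]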
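where
\[
\begin{aligned}
K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right) = {} & \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial u_{ele}}\right|_{j,k}\right) + {} \\
& \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(\left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*}\right|_{j} \otimes K^{L} + \left.\frac{\partial f^{I,R}_{\xi\,ele}}{\partial u^{R}_{ele}}^{*}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right) = {} & \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{k} e_{k}' \otimes \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial u_{ele}}\right|_{i,k}\right) \otimes e_{i} e_{i}' + {} \\
& \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K^{L} \otimes \left.\frac{\partial f^{I,F}_{\eta\,ele}}{\partial u^{F}_{ele}}^{*}\right|_{j} + K^{R} \otimes \left.\frac{\partial f^{I,Bk}_{\eta\,ele}}{\partial u^{Bk}_{ele}}^{*}\right|_{j}\right), \\
K_{\zeta}\left(\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}}\right) = {} & \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta\,ele}}{\partial u_{ele}}\right|_{i,j}\right) \otimes e_{j} e_{j}' \otimes e_{i} e_{i}' + {} \\
& \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial f^{I,B}_{\zeta\,ele}}{\partial u^{B}_{ele}}^{*}\right|_{i} + K^{R} \otimes \left.\frac{\partial f^{I,T}_{\zeta\,ele}}{\partial u^{T}_{ele}}^{*}\right|_{i}\right) \otimes e_{i} e_{i}',
\end{aligned}
\tag{8.105}
\]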
and Eq. (8.42) is rewritten as
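\[
\begin{aligned}
\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right) = \sum_{d_j \in \{x, y, z\}} & \left(K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right) + K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right) + K_{\zeta}\left(\frac{\partial f_{\zeta\,ele}}{\partial q_{d_j\,ele}}\right)\right) \\
& \left(J^{-1'}_{(d_j,\xi)ele} K_{\xi}\left(\frac{\partial q_{\xi\,ele}}{\partial u_{ele}}\right) + J^{-1'}_{(d_j,\eta)ele} K_{\eta}\left(\frac{\partial q_{\eta\,ele}}{\partial u_{ele}}\right) + J^{-1'}_{(d_j,\zeta)ele} K_{\zeta}\left(\frac{\partial q_{\zeta\,ele}}{\partial u_{ele}}\right)\right),
\end{aligned}
\tag{8.106}
\]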
where
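\[
\begin{aligned}
K_{\xi}\left(\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right) = {} & \sum_{k=1}^{N_{spts1D}} \sum_{j=1}^{N_{spts1D}} e_{k} e_{k}' \otimes e_{j} e_{j}' \otimes \left(K \left.\frac{\partial f_{\xi\,ele}}{\partial q_{d_j\,ele}}\right|_{j,k}\right) + {} \\
& \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(\left.\frac{\partial f^{I,L}_{\xi\,ele}}{\partial q^{L}_{d_j\,ele}}\right|_{j} \otimes K^{L} + \left.\frac{\partial f^{I,R}_{\xi\,ele}}{\partial q^{R}_{d_j\,ele}}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\left(\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right) = {} & \sum_{k=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} e_{k} e_{k}' \otimes \left(K \left.\frac{\partial f_{\eta\,ele}}{\partial q_{d_j\,ele}}\right|_{i,k}\right) \otimes e_{i} e_{i}' + {} \\
& \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K^{L} \otimes \left.\frac{\partial f^{I,F}_{\eta\,ele}}{\partial q^{F}_{d_j\,ele}}\right|_{j} + K^{R} \otimes \left.\frac{\partial f^{I,Bk}_{\eta\,ele}}{\partial q^{Bk}_{d_j\,ele}}\right|_{j}\right), \\
K_{\zeta}\left(\frac{\partial f_{\zeta\,ele}}{\partial q_{d_j\,ele}}\right) = {} & \sum_{j=1}^{N_{spts1D}} \sum_{i=1}^{N_{spts1D}} \left(K \left.\frac{\partial f_{\zeta\,ele}}{\partial q_{d_j\,ele}}\right|_{i,j}\right) \otimes e_{j} e_{j}' \otimes e_{i} e_{i}' + {} \\
& \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial f^{I,B}_{\zeta\,ele}}{\partial q^{B}_{d_j\,ele}}\right|_{i} + K^{R} \otimes \left.\frac{\partial f^{I,T}_{\zeta\,ele}}{\partial q^{T}_{d_j\,ele}}\right|_{i}\right) \otimes e_{i} e_{i}', \\
K_{\xi}\left(\frac{\partial q_{\xi\,ele}}{\partial u_{ele}}\right) = {} & I \otimes I \otimes K + \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(\left.\frac{\partial u^{I,L}_{ele}}{\partial u^{L}_{ele}}\right|_{j} \otimes K^{L} + \left.\frac{\partial u^{I,R}_{ele}}{\partial u^{R}_{ele}}\right|_{j} \otimes K^{R}\right), \\
K_{\eta}\left(\frac{\partial q_{\eta\,ele}}{\partial u_{ele}}\right) = {} & I \otimes K \otimes I + \sum_{j=1}^{N_{spts1D}} e_{j} e_{j}' \otimes \left(K^{L} \otimes \left.\frac{\partial u^{I,F}_{ele}}{\partial u^{F}_{ele}}\right|_{j} + K^{R} \otimes \left.\frac{\partial u^{I,Bk}_{ele}}{\partial u^{Bk}_{ele}}\right|_{j}\right), \\
K_{\zeta}\left(\frac{\partial q_{\zeta\,ele}}{\partial u_{ele}}\right) = {} & K \otimes I \otimes I + \sum_{i=1}^{N_{spts1D}} \left(K^{L} \otimes \left.\frac{\partial u^{I,B}_{ele}}{\partial u^{B}_{ele}}\right|_{i} + K^{R} \otimes \left.\frac{\partial u^{I,T}_{ele}}{\partial u^{T}_{ele}}\right|_{i}\right) \otimes e_{i} e_{i}'.
\end{aligned}
\tag{8.107}
\]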
Sparsity Pattern
For the three-dimensional formulation, there are three distinct sparsity patterns that
emerge: $K_{\xi}(x)$, $K_{\eta}(x)$ and $K_{\zeta}(x)$. Figure 8.4 shows an example of the sparsity
pattern for these structures for a polynomial order of P = 2. The first term,
$\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.104) is the summation of these three structures. Figure 8.5
shows the sparsity pattern for $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$ for P = 2.
The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, defined in Eq. (8.106) can be expanded into
nine terms where three terms are like-terms and six terms are cross-terms. The three
like-terms do not change the sparsity pattern of the matrix. The cross-terms can be
split into three more structures as shown in Figure 8.6. The summation of all the
terms produces the final sparsity pattern of the element local Jacobian, as shown in
Figure 8.7 for P = 2.
Figure 8.4: The three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panels (a), (b) and (c) show $K_{\xi}(x)$, $K_{\eta}(x)$ and $K_{\zeta}(x)$; each is a 27 × 27 matrix with 81 nonzeros.

Figure 8.5: The summation of the three terms in Figure 8.4 produces the sparsity pattern for the first term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, in the three-dimensional element local Jacobian for DFR, P = 2 (27 × 27 matrix, 189 nonzeros).

Figure 8.6: The six cross-terms in the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, produce three distinct sparsity patterns used in the three-dimensional Kronecker product formulation of the element local Jacobian for DFR, P = 2. Panel (a) shows $K_{\xi}(x) K_{\eta}(y)$ and $K_{\eta}(x) K_{\xi}(y)$, panel (b) shows $K_{\xi}(x) K_{\zeta}(y)$ and $K_{\zeta}(x) K_{\xi}(y)$, and panel (c) shows $K_{\eta}(x) K_{\zeta}(y)$ and $K_{\zeta}(x) K_{\eta}(y)$; each is a 27 × 27 matrix with 243 nonzeros.

Figure 8.7: The summation of all terms produces the final sparsity pattern for the three-dimensional element local Jacobian for DFR, P = 2 (27 × 27 matrix, 513 nonzeros).

8.2.4 Time Complexity
When solving the advection or Euler equations, the first term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, is the
only computation needed to form the element local Jacobians. The time complexity
for the original formulation can be found by considering the most expensive term in
Eq. (8.39),
\[
D^{L}_{\xi} \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} E^{L} \;\rightarrow\; P^{d}\, P^{d-1}\, P^{d} \;\rightarrow\; O\left(P^{3d-1}\right).
\tag{8.108}
\]
For the two- and three-dimensional case, the time complexity of computing this term
is $N_{spts2D} N_{spts1D} N_{spts2D} \approx O(P^{5})$ and $N_{spts} N_{spts2D} N_{spts} \approx O(P^{8})$, respectively. This
leads to the general time complexity of $O(P^{3d-1})$. The most expensive term in the
Kronecker product formulation in Eq. (8.88) or Eq. (8.104) is
\[
\sum \frac{\partial f^{I,L}_{\xi\,ele}}{\partial u^{L}_{ele}}^{*} K^{L} \;\rightarrow\; P^{d-1}\, P\, P \;\rightarrow\; O\left(P^{d+1}\right).
\tag{8.109}
\]
For the two- and three-dimensional case, the time complexity of computing this term
multiple times is $N^{3}_{spts1D} \approx O(P^{3})$ and $N^{4}_{spts1D} \approx O(P^{4})$, respectively. This leads to
the time complexity $O(P^{d+1})$, which is significantly less than the original form.
For the advection-diffusion or Navier-Stokes equations, the second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$, is required to form the element local Jacobians. In this case, the original time complexity is dictated by the matrix multiplication between the two terms in Eq. (8.40). As an example, consider

$$D_{\xi} \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} D_{\xi} \;\to\; P^{d}\, P^{d}\, P^{d} \;\to\; O(P^{3d}), \tag{8.110}$$

which is $N_{spts2D} N_{spts2D} N_{spts2D} \approx O(P^6)$ in 2D and $N_{spts} N_{spts} N_{spts} \approx O(P^9)$ in 3D. This leads to the time complexity $O(P^{3d})$. Through the use of Kronecker products, this matrix multiplication is expanded into multiple terms. The time complexity is then dictated by a sum of matrix multiplications, for example,

$$\sum K \frac{\partial f_{\xi\,ele}}{\partial u_{ele}} K \;\to\; P^{d-1}\, P\, P\, P \;\to\; O(P^{d+2}). \tag{8.111}$$

After accounting for the sums, the time complexity is $N^{4}_{spts1D} \approx O(P^4)$ for 2D and $N^{5}_{spts1D} \approx O(P^5)$ for 3D. This leads to the general form $O(P^{d+2})$.
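To give a feel for the size of the savings, the leading-order counts above can be tabulated for a given polynomial order. The following is a minimal sketch only: the function names are hypothetical, $N_{spts1D} = P + 1$ is assumed, and constant factors and lower-order terms are ignored, so the numbers are order-of-magnitude estimates rather than measured costs.

```cpp
#include <cassert>
#include <cstdint>

// Integer power base^exp for small non-negative exponents.
std::uint64_t ipow(std::uint64_t base, unsigned exp) {
  std::uint64_t r = 1;
  for (unsigned i = 0; i < exp; ++i) r *= base;
  return r;
}

// Leading-order operation counts per element, with n = P + 1 points per
// direction and d spatial dimensions (hypothetical helper names).
std::uint64_t denseInviscidOps(unsigned P, unsigned d) {
  assert(d == 2 || d == 3);
  return ipow(P + 1, 3 * d - 1);  // Eq. (8.108): n^d * n^(d-1) * n^d
}

std::uint64_t kroneckerInviscidOps(unsigned P, unsigned d) {
  assert(d == 2 || d == 3);
  return ipow(P + 1, d + 1);      // Eq. (8.109): n^(d-1) * n * n
}

std::uint64_t denseViscousOps(unsigned P, unsigned d) {
  assert(d == 2 || d == 3);
  return ipow(P + 1, 3 * d);      // Eq. (8.110): n^d * n^d * n^d
}

std::uint64_t kroneckerViscousOps(unsigned P, unsigned d) {
  assert(d == 2 || d == 3);
  return ipow(P + 1, d + 2);      // Eq. (8.111): n^(d-1) * n * n * n
}
```

For P = 2 in three dimensions, for instance, these estimates differ by a factor of $3^4 = 81$ for both the inviscid and the viscous term.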
8.3 Extension to Fluid Flow
Following from Eq. (6.100), the DFR formulation of the semi-discrete Euler or Navier-Stokes equations in each element can be written as

$$R_{ele} = R(U_{ele}, U_{eleN}) = -V^{-1}_{ele} \nabla^{\delta} \cdot F_{ele}. \tag{8.112}$$

Differentiating with respect to the discontinuous solution vector, $U_{ele}$, produces the element local Jacobian matrix,

$$\frac{\partial R_{ele}}{\partial U_{ele}} = -V^{-1}_{ele} \frac{\partial}{\partial U_{ele}} \left(\nabla^{\delta} \cdot F_{ele}\right), \tag{8.113}$$

where $\frac{\partial R_{ele}}{\partial U_{ele}}$ is a matrix of size $(N_{spts2D} N_{vars} \times N_{spts2D} N_{vars})$ or $(N_{spts} N_{vars} \times N_{spts} N_{vars})$ depending on whether the problem is two-dimensional or three-dimensional. The Euler equations contain no second-derivative terms, so the construction of the solution Jacobians and viscous flux Jacobians can be avoided in that case.
As stated in section 6.4.3, all DFR operators become block diagonal matrices and act on each conservation equation separately. The discontinuous and common interface flux Jacobians, $\frac{\partial f_{d_i\,ele}}{\partial u_{ele}}$, $\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}}$, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial u^{face}_{ele}}$, $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{face}_{d_j\,ele}}$ and $\frac{\partial f^{I,face}_{d_i\,ele}}{\partial q^{faceN}_{d_j\,eleN}}$, become block matrices with a diagonal matrix for each variable pair. Lastly, since the common interface solution values have no dependence between conservative variables for interior interfaces, the Jacobians of the numerical solution gradient and neighboring numerical solution gradients are also block diagonal matrices. After evaluating the block matrix multiplications, the element local Jacobian matrix on interior elements can be constructed by a direct application of the techniques in section 8.2 for each variable pair.
For the Navier-Stokes equations, the exact derivatives of the flux functions, $F_{inv}(U)$ and $F_{visc}(U, Q)$, with respect to $U$ and $Q$ are $\frac{\partial F_{inv}}{\partial U}(U)$, $\frac{\partial F_{visc}}{\partial U}(U, Q)$ and $\frac{\partial F_{visc}}{\partial Q}(U, Q)$. These are given in reference [92].
Wavespeed Jacobian
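Since the wavespeed used for the Rusanov flux was redefined in section 6.4.3, the derivative of the wavespeed is also redefined. The derivative of the term with the wavespeed can be split into a piecewise function,

$$\begin{aligned}
\frac{\partial}{\partial U^{-}} \left( |\lambda(U^{-}, U^{+})| (U^{+} - U^{-}) \right) &= (U^{+} - U^{-}) \frac{\partial}{\partial U^{-}} \left( |\lambda(U^{-}, U^{+})| \right) - |\lambda(U^{-}, U^{+})|\, I, \\
\frac{\partial}{\partial U^{+}} \left( |\lambda(U^{-}, U^{+})| (U^{+} - U^{-}) \right) &= (U^{+} - U^{-}) \frac{\partial}{\partial U^{+}} \left( |\lambda(U^{-}, U^{+})| \right) + |\lambda(U^{-}, U^{+})|\, I,
\end{aligned} \tag{8.114}$$

where $I$ represents an identity matrix of size $(N_{vars} \times N_{vars})$ and

$$\frac{\partial}{\partial U^{-}} \left( |\lambda(U^{-}, U^{+})| \right) = \begin{cases} \frac{\partial \lambda}{\partial U}(U^{-}) & \text{if } \lambda(U^{-}) > \lambda(U^{+}), \\ 0 & \text{otherwise,} \end{cases}$$

$$\frac{\partial}{\partial U^{+}} \left( |\lambda(U^{-}, U^{+})| \right) = \begin{cases} \frac{\partial \lambda}{\partial U}(U^{+}) & \text{if } \lambda(U^{+}) > \lambda(U^{-}), \\ 0 & \text{otherwise.} \end{cases}$$

The derivative of the wavespeed, $\lambda(U) = |V^{n}(U)| + c(U)$, is computed as

$$\frac{\partial \lambda}{\partial U}(U) = \begin{bmatrix}
-\dfrac{\mathrm{sgn}(V^{n}) V^{n}}{\rho} - \dfrac{c}{2\rho} + \dfrac{\gamma(\gamma-1)(u^2+v^2)}{4\rho c} \\[2ex]
\dfrac{\mathrm{sgn}(V^{n})\, n_x}{\rho} - \dfrac{\gamma(\gamma-1)u}{2\rho c} \\[2ex]
\dfrac{\mathrm{sgn}(V^{n})\, n_y}{\rho} - \dfrac{\gamma(\gamma-1)v}{2\rho c} \\[2ex]
\dfrac{\gamma(\gamma-1)}{2\rho c}
\end{bmatrix}, \tag{8.115}$$

for the two-dimensional case and

$$\frac{\partial \lambda}{\partial U}(U) = \begin{bmatrix}
-\dfrac{\mathrm{sgn}(V^{n}) V^{n}}{\rho} - \dfrac{c}{2\rho} + \dfrac{\gamma(\gamma-1)(u^2+v^2+w^2)}{4\rho c} \\[2ex]
\dfrac{\mathrm{sgn}(V^{n})\, n_x}{\rho} - \dfrac{\gamma(\gamma-1)u}{2\rho c} \\[2ex]
\dfrac{\mathrm{sgn}(V^{n})\, n_y}{\rho} - \dfrac{\gamma(\gamma-1)v}{2\rho c} \\[2ex]
\dfrac{\mathrm{sgn}(V^{n})\, n_z}{\rho} - \dfrac{\gamma(\gamma-1)w}{2\rho c} \\[2ex]
\dfrac{\gamma(\gamma-1)}{2\rho c}
\end{bmatrix}, \tag{8.116}$$

for the three-dimensional case, where $\mathrm{sgn}(x)$ is the signum function and $n_x$, $n_y$, $n_z$ are the x, y and z components of the unit normal vector $n$.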
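The two-dimensional wavespeed derivative can be checked numerically against a finite difference of $\lambda(U)$. The sketch below is illustrative only and is not taken from ZEFR: it assumes the conservative variable ordering $(\rho, \rho u, \rho v, E)$, an ideal gas with $p = (\gamma - 1)(E - \tfrac{1}{2}\rho(u^2 + v^2))$, and hypothetical function names.

```cpp
#include <array>
#include <cassert>
#include <cmath>

// Assumed ordering of the conservative variables: (rho, rho*u, rho*v, E).
using State = std::array<double, 4>;

// Wavespeed lambda(U) = |V^n| + c for an ideal gas.
double wavespeed(const State& U, double nx, double ny, double gamma) {
  const double rho = U[0], u = U[1] / rho, v = U[2] / rho;
  const double p = (gamma - 1.0) * (U[3] - 0.5 * rho * (u * u + v * v));
  return std::abs(u * nx + v * ny) + std::sqrt(gamma * p / rho);
}

// Analytic derivative of the wavespeed with respect to U, two-dimensional case.
State wavespeedJacobian(const State& U, double nx, double ny, double gamma) {
  const double rho = U[0], u = U[1] / rho, v = U[2] / rho;
  const double p = (gamma - 1.0) * (U[3] - 0.5 * rho * (u * u + v * v));
  assert(rho > 0.0 && p > 0.0);
  const double c = std::sqrt(gamma * p / rho);
  const double Vn = u * nx + v * ny;
  const double sgn = (Vn > 0.0) - (Vn < 0.0);  // signum of V^n
  const double g = gamma * (gamma - 1.0) / (2.0 * rho * c);
  return {-sgn * Vn / rho - c / (2.0 * rho) + 0.5 * g * (u * u + v * v),
          sgn * nx / rho - g * u,
          sgn * ny / rho - g * v,
          g};
}
```

A central difference of `wavespeed` in each component of `U` should agree with `wavespeedJacobian` away from $V^{n} = 0$, where the absolute value is not differentiable.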
Boundary Flux Jacobians
Boundary conditions for the Euler and Navier-Stokes equations can be found in appendix A. The boundary flux Jacobians can be computed by taking the derivative of the boundary flux with respect to the extrapolated solution and the extrapolated solution gradient. Eq. (A.1) is differentiated to obtain

$$\frac{\partial F^{b}}{\partial U}(U, Q) = \frac{\partial F^{n}}{\partial U}\left(U^{b}, Q^{b}\right) \frac{\partial U^{b}}{\partial U}(U) + \frac{\partial F^{n}}{\partial Q^{b}}\left(U^{b}, Q^{b}\right) \frac{\partial Q^{b}}{\partial U}(U, Q), \tag{8.117}$$

$$\frac{\partial F^{b}}{\partial Q}(U, Q) = \frac{\partial F^{n}}{\partial Q^{b}}\left(U^{b}, Q^{b}\right) \frac{\partial Q^{b}}{\partial Q}(U, Q), \tag{8.118}$$

where $F^{n}$ is the flux normal to the face. The normal flux Jacobians evaluated using the solution and solution gradient at the boundary are computed using the exact derivatives of the flux functions. The boundary Jacobians, $\frac{\partial U^{b}}{\partial U}(U)$, $\frac{\partial Q^{b}}{\partial U}(U, Q)$ and $\frac{\partial Q^{b}}{\partial Q}(U, Q)$, are found by taking the exact derivative of the boundary conditions in appendix A. These are derived using Mathematica [43] and the results are given in reference [92].
Element local Jacobian formulation on boundary elements
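For elements with boundary conditions on a face, a block matrix multiplication among variable pairs must be performed between the flux Jacobians and the Jacobian of the numerical solution gradient. The Jacobian of the discontinuous flux on boundary elements is written as

$$\left[\frac{\partial}{\partial u_{ele}} \left( f_{d_i}(u_{ele}, q_{ele}) \right)\right]_{v_i, v_j} = \left[\frac{\partial f_{d_i\,ele}}{\partial u_{ele}}\right]_{v_i, v_j} + \sum_{d_j \in \{x, y, z\}} \sum_{v_k = 1}^{N_{vars}} \left[\frac{\partial f_{d_i\,ele}}{\partial q_{d_j\,ele}}\right]_{v_i, v_k} \left[\frac{\partial}{\partial u_{ele}} \left( q_{d_j\,ele} \right)\right]_{v_k, v_j},$$
$$\text{for } d_i \in \{x, y, z\}, \quad v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.119}$$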
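and the Jacobian of the boundary flux is written as

$$\left[\frac{\partial}{\partial u_{ele}} \left( f^{I,\partial\Omega}_{ele} \right)\right]_{v_i, v_j} = \left[\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}}\right]_{v_i, v_j} E^{\partial\Omega} + \sum_{d \in \{x, y, z\}} \sum_{v_k = 1}^{N_{vars}} \left[\frac{\partial f^{I,\partial\Omega}_{ele}}{\partial q^{\partial\Omega}_{d\,ele}}\right]_{v_i, v_k} E^{\partial\Omega} \left[\frac{\partial}{\partial u_{ele}} \left( q_{d\,ele} \right)\right]_{v_k, v_j},$$
$$\text{for } v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.120}$$

where $v_i, v_j$ refer to a single component of an $(N_{vars} \times N_{vars})$ block matrix. The sparsity of the block matrix for the Jacobian of the numerical solution gradient is dependent on the boundary condition used for the common solution,

$$\left[\frac{\partial}{\partial u_{ele}} \left( u^{I,\partial\Omega}_{ele} \right)\right]_{v_i, v_j} = \left[\frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}}\right]_{v_i, v_j} E^{\partial\Omega}, \quad v_i, v_j = 1, 2, \ldots, N_{vars}, \tag{8.121}$$

where

$$\frac{\partial u^{I,\partial\Omega}_{ele}}{\partial u^{\partial\Omega}_{ele}} = \frac{\partial u^{b}}{\partial u}\left(u^{\partial\Omega}_{ele}\right). \tag{8.122}$$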
Chapter 9
Numerical Experiments
In this chapter, five test cases are studied in order to verify the implementation of the
implicit, high-order DFR method on unstructured meshes for GPUs. The first test
case deals with the inviscid flow over a bump where the rate of convergence of entropy
error is verified for a polynomial order of P = 2. For the second test case, inviscid flow
over the NACA 0012 airfoil is simulated for a polynomial order of P = 4 and a grid
converged lift coefficient is obtained which compares well with results from Overflow
and CFL3D. In the third test case, convection of an isentropic vortex is performed
using the ESDIRK3 and ESDIRK4 schemes and the correct order of accuracy for
time integration is obtained. In the fourth test case, laminar flow over a Joukowski
airfoil is simulated for various polynomial orders and a rate of convergence of at least
2P is found for the drag coefficient. The last test case is an unsteady simulation of
viscous flow over a half cylinder where it is shown that the Strouhal number for the
implicit method compares well with the Strouhal number for the explicit method in
ZEFR and PyFR [97].
The vector $\ell^1$ norm of the residual for the continuity equation is computed and is used to track the convergence of all steady-state simulations. A converged solution is assumed if the residual drops by 10 orders of magnitude from the initial residual. 100 block iterations are used per pseudo time step and local time stepping is used in all cases. For simulations that start from uniform flow, the CFL from section 3.2 is exponentially increased until a CFL of around 10000 is reached. For the unsteady simulations, each stage uses a single pseudo time step and 200 block iterations, for which the stage residual drops by at least 7 orders of magnitude. It is important to note that the solution diverges without the relaxation from the pseudo time step (i.e. Newton iterations do not work). All meshes are constructed with high-order elements.
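The CFL ramp and the convergence criterion described above can be sketched as follows. This is a hedged illustration, not the ZEFR implementation: the initial CFL and the growth factor are not stated in the text, so `cfl0` and `growth` are hypothetical parameters, as are the function names.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Exponentially ramped CFL number after `step` pseudo time steps, capped at
// cflMax (e.g. 10000). cfl0 and growth are assumed, user-chosen values.
double rampedCfl(double cfl0, double growth, int step, double cflMax) {
  assert(cfl0 > 0.0 && growth >= 1.0 && step >= 0);
  return std::min(cfl0 * std::pow(growth, step), cflMax);
}

// True once the residual norm has dropped by `orders` orders of magnitude
// relative to the initial residual norm res0.
bool isConverged(double res0, double res, double orders) {
  return res <= res0 * std::pow(10.0, -orders);
}
```

With `orders = 10` this reproduces the steady-state stopping rule, and with `orders = 7` the check used for each unsteady stage.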
9.1 Inviscid flow over a bump
The first test case involves the solution of subsonic flow over a smooth Gaussian
bump in a channel. The inflow Mach number is set to 0.5 with zero angle of attack.
A characteristic Riemann invariant farfield boundary condition is used for the in-
flow and outflow boundaries and a slip-wall or symmetry boundary condition is used
for the surface of the bump and the top boundary. These are described in appen-
dices A.3 and A.1, respectively. The L2 functional norm of the entropy error is used
to determine the accuracy of the solution and is given by
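$$\| e_S \|_{L^2(\Omega)} = \sqrt{ \frac{ \int_{\Omega} \left( \frac{p}{p_\infty} \left( \frac{\rho_\infty}{\rho} \right)^{\gamma} - 1 \right)^2 dV }{ \int_{\Omega} dV } }, \tag{9.1}$$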
where the integrals are approximated numerically using Gaussian quadrature with 10
quadrature points in each element. A full description of this problem can be found
in the first international high-order workshop [1].
The entropy error is computed for a series of meshes on a single GPU using the pseudo time stepping method and the two color, MCGS block iterative method described in section 7.2. Figure 9.1 shows the entropy error vs. length scale $h = 1/\sqrt{nDoF}$. The results show a rate of convergence of 2.89 for the fixed polynomial order of P = 2, which is close to the theoretical result for a linear, steady-state case: P + 1 [11]. The results also show that increasing the polynomial order on a (48 × 16) mesh reduces the entropy error while maintaining fewer degrees of freedom than a more refined mesh. The (48 × 16) quadrilateral mesh and final pressure contours for a polynomial order of P = 2 are shown in Figure 9.2.
Figure 9.1: Entropy error, $\| e_S \|_{L^2(\Omega)}$, vs. $h = 1/\sqrt{nDoF}$ for inviscid flow over a bump, implicit pseudo time stepping with two color MCGS. Legend: P = 2; (48 × 16); Order 3.

Figure 9.2: (48 × 16) quadrilateral mesh and pressure contours ($C_p$), inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, P = 2.
9.2 Inviscid flow over the NACA 0012 airfoil
The second test case involves the solution of subsonic flow over the NACA 0012 airfoil
at 1.25 degree angle of attack. The inflow Mach number is set to 0.5, a characteristic
Riemann invariant farfield boundary condition from appendix A.3 is used on the
farfield and a slip wall boundary condition from appendix A.1 is used on the surface
of the airfoil. The lift coefficient is used to determine the accuracy of the simulation
and is compared to results from Vassberg and Jameson [86]. A complete description
is also found in this reference.
The lift coefficient is computed on a series of O-meshes and mixed, quadrilateral and triangle meshes using the pseudo time stepping method and the two and four color, MCGS block iterative method described in section 7.2 on a single GPU. Triangles are generated using an edge-collapsing method [80]. All meshes have a far field located 100 chord lengths away. Figure 9.3 shows the lift coefficient vs. length scale $h = 1/\sqrt{nDoF}$ for three different codes. Degrees of freedom for Overflow and CFL3D are assumed to be equal to the number of mesh elements. The figure shows that ZEFR is able to obtain a lift coefficient that is relatively close to the results from Overflow and CFL3D with fewer degrees of freedom. It is also important to note that the coarsest mixed mesh was able to obtain a fairly accurate lift coefficient with fewer degrees of freedom by coarsening the far field regions. The (32 × 32) quadrilateral mesh and the 764 element mixed, quadrilateral and triangle mesh are shown in Figure 9.4 along with final pressure contours for a polynomial order of P = 4. The mixed mesh cases maintain the ability to run large time steps, unconstrained by the CFL limit. This is a notable result, as a strong CFL constraint was observed to limit the utility of the collapsed-edge triangular elements when coupled with explicit time stepping [80].
Figure 9.3: Lift coefficient, $C_l$, vs. $h = 1/\sqrt{nDoF}$ for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two and four color MCGS. Legend: ZEFR - Quad, P = 4; ZEFR - Mixed, P = 4; Overflow; CFL3D.

Figure 9.4: Mesh and pressure contours ($C_p$) for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping, P = 4. (a) (32 × 32) O-mesh, 2 color MCGS. (b) 764 mixed mesh, 4 color MCGS.

9.3 Convection of an isentropic vortex
In this section, a two-dimensional isentropic vortex is convected in inviscid flow across a square box domain, $\Omega = \{(x, y) \mid -20 < x, y < 20\}$, where the freestream Mach number is set to 0.4. The initial condition follows the description in [87, 12] and consists of uniform flow in the y-direction superposed with an isentropic vortex centered at the origin where

$$\begin{bmatrix} \rho \\ u \\ v \\ p \end{bmatrix}(x, y, 0) = \begin{bmatrix} \left[ 1 - \frac{\gamma - 1}{2} (\omega M R f)^2 \right]^{\frac{1}{\gamma - 1}} \\ -\omega f y \\ 1 + \omega f x \\ \frac{1}{\gamma M^2} \rho(x, y, 0)^{\gamma} \end{bmatrix}, \tag{9.2}$$

$$\omega = \frac{\Gamma}{2 \pi R}, \tag{9.3}$$

$$f = \exp\left( \frac{1 - (x^2 + y^2)}{2 R^2} \right). \tag{9.4}$$
Following directly from [87, 12], Γ = 13.5 is the strength of the vortex and R = 1.5
is a measure of the radius of the vortex. Periodic boundary conditions are used on
the exterior boundaries of the mesh so that the exact solution can be determined
by the periodic propagation of the vortex in the y-direction. The periodic boundary
condition creates a lattice of infinite vortices which interact with each other through
dispersion error. This test case is used to verify the implementation of the unsteady
implicit time stepping method and is commonly used to test high-order time integra-
tion schemes [89, 13].
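The initial condition can be evaluated pointwise as in the minimal sketch below. It assumes the exponent in Eq. (9.4) reads $(1 - (x^2 + y^2)) / (2R^2)$ and a heat capacity ratio of $\gamma = 1.4$, which is not stated for this test case; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cmath>

struct Primitive {
  double rho, u, v, p;
};

// Isentropic vortex superposed on uniform flow in the y-direction,
// following Eqs. (9.2)-(9.4). Defaults: M = 0.4, Gamma = 13.5, R = 1.5;
// gamma = 1.4 is an assumption.
Primitive vortexInitialCondition(double x, double y, double mach = 0.4,
                                 double gamma = 1.4, double Gamma = 13.5,
                                 double R = 1.5) {
  const double pi = std::acos(-1.0);
  const double omega = Gamma / (2.0 * pi * R);                          // Eq. (9.3)
  const double f = std::exp((1.0 - (x * x + y * y)) / (2.0 * R * R));   // Eq. (9.4)
  const double s = omega * mach * R * f;
  const double base = 1.0 - 0.5 * (gamma - 1.0) * s * s;
  assert(base > 0.0);  // density must stay positive
  Primitive w;
  w.rho = std::pow(base, 1.0 / (gamma - 1.0));
  w.u = -omega * f * y;
  w.v = 1.0 + omega * f * x;
  w.p = std::pow(w.rho, gamma) / (gamma * mach * mach);
  return w;
}
```

Far from the origin the perturbation decays and the state approaches the uniform flow $(\rho, u, v, p) = (1, 0, 1, 1/(\gamma M^2))$, which is what makes the periodic boundary treatment reasonable on this domain.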
The error in the solution is computed after 5 periods on a uniform Cartesian grid
of (180 × 180) quadrilaterals where a polynomial order of P = 3 is used. In this
case, a large time step is used to ensure that the error due to time integration is
much larger than spatial discretization error. The implicit ESDIRK3 and ESDIRK4
schemes described in section 7.1.2 are used to advance the solution in time on 16
NVIDIA Tesla K80 GPUs. Pseudo time stepping and MCGS are used to converge
each stage of the DIRK scheme. An example of the density contours for the isentropic
vortex is shown in Figure 9.5. The time steps used for each case are given in Table 9.1.
Figure 9.6 shows the error in density after 5 periods for all cases described above.
Figure 9.5: Initial density contours for convection of an isentropic vortex.
          ESDIRK3    ESDIRK4
Case 1    0.125      0.5
Case 2    0.0625     0.25
Case 3    0.03125    0.125
Table 9.1: Numerical time steps, ∆t, used for the convection of an isentropic vortex.
The figure shows that the same error can be obtained with the ESDIRK4 scheme with a much larger time step compared to the ESDIRK3 scheme. Linear regression is used to obtain the rates of convergence in Figure 9.6 and the results are shown in Table 9.2. For this particular test case, the ESDIRK schemes are able to maintain the correct rates of convergence for time steps which are significantly larger than the maximum stable time step for an explicit method. For example, in this particular case the maximum stable time step for the RK45 scheme was on the order of $10^{-3}$.

          ESDIRK3          ESDIRK4
Case 1    8.565e-03        1.426e-02
Case 2    1.116e-03        6.291e-04
Case 3    1.416e-04        5.049e-05
Order     2.959 ± 0.006    4.1 ± 0.144
Table 9.2: Density error and rate of convergence for different time integration schemes, convection of an isentropic vortex, implicit dual time stepping with two color MCGS.

Figure 9.6: Density error vs. numerical time step for convection of an isentropic vortex, implicit dual time stepping with two color MCGS.
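The observed orders of accuracy can be recovered from the tabulated errors by a least-squares fit in log-log space. The sketch below is a generic illustration with a hypothetical function name, not the post-processing script used for the table; it returns only the slope and does not compute the quoted uncertainty.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Least-squares slope of log(err) against log(h), i.e. the observed order
// of convergence for err ~ C * h^order.
double observedOrder(const std::vector<double>& h,
                     const std::vector<double>& err) {
  assert(h.size() == err.size() && h.size() >= 2);
  const double n = static_cast<double>(h.size());
  double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
  for (std::size_t i = 0; i < h.size(); ++i) {
    const double x = std::log(h[i]);
    const double y = std::log(err[i]);
    sx += x;
    sy += y;
    sxx += x * x;
    sxy += x * y;
  }
  return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}
```

Applied to the time steps of Table 9.1 and the density errors of Table 9.2, this fit gives slopes of approximately 2.96 for ESDIRK3 and 4.07 for ESDIRK4.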
9.4 Laminar flow over a Joukowski airfoil
In this section, steady, laminar flow over a symmetric Joukowski airfoil at a zero
degree angle of attack is simulated where the freestream Mach number is set to 0.5,
the Reynolds number based on the chord is set to 1000, the heat capacity ratio is set
to γ = 1.4, the Prandtl number is set to 0.72 and the dynamic viscosity remains fixed.
A characteristic Riemann invariant farfield boundary condition is used on the farfield
and a no-slip adiabatic wall boundary condition is used on the surface of the airfoil.
These are described in appendices A.3 and A.2, respectively. This benchmark case
comes from the 4th International Workshop on High-order CFD Methods [2] and is
used to verify the high-order DFR implementation of the Navier-Stokes equations by
computing rates of convergence on drag coefficient.
The drag coefficient is computed on a series of structured quadrilateral c-meshes from [2] using the pseudo time stepping method and the two color, MCGS block iterative method described in section 7.2 on up to 16 NVIDIA Tesla K80 GPUs and polynomial orders of P = 1, 2, 3, 4. The meshes used for each case are given in Table 9.3. The number of degrees of freedom (nDoF) for each case is found by multiplying $N_{eles}$ by $N_{spts2D}$ where $N_{spts2D} = (P + 1)^2$ is the nDoF per element. Figure 9.7 shows the drag coefficient vs. length scale $h = 1/\sqrt{nDoF}$ for each case. A machine zero lift and a drag coefficient of $C_d = 0.1219$ is obtained that is consistent with the findings from multiple CFD codes from the high-order workshop [29]. The most efficient case in terms of nDoF is the P = 4, case 3 simulation on the (48 × 24) c-mesh. The Mach and pressure contours for this case are shown in Figure 9.8.

P         1               2              3            4
Case 1    (256 × 128)     (64 × 32)      (32 × 16)    (24 × 12)
Case 2    (512 × 256)     (128 × 64)     (48 × 24)    (32 × 16)
Case 3    (1024 × 512)    (256 × 128)    (64 × 32)    (48 × 24)
Table 9.3: (x × y) Joukowski airfoil meshes used for each polynomial order where x is the number of elements along the airfoil and wake and y is the number of elements in the normal direction. The number of elements in x is split evenly between the airfoil and wake. The total number of elements, $N_{eles}$, is given by the product of x and y.

Figure 9.7: Drag coefficient, $C_d$, vs. $h = 1/\sqrt{nDoF}$ for laminar flow over a Joukowski airfoil, implicit MCGS. Legend: P = 1; P = 2; P = 3; P = 4; Reference.
Figure 9.8: Laminar flow over Joukowski airfoil, implicit pseudo time stepping with two color MCGS, P = 4, (48 × 24) mesh. (a) Mach contours. (b) Pressure contours.
A drag coefficient error is also computed by converging a P = 4 case with (256 × 128) elements and using the resulting drag coefficient as a reference, as shown in Figure 9.7. The drag coefficient error is then plotted against $h = 1/\sqrt{nDoF}$ in Figure 9.9. The figure shows that the same error can be obtained with fewer degrees of freedom when a higher polynomial order is used. Linear regression is used to obtain the rates of convergence in Figure 9.9 and Table 9.4. The results show that the DFR scheme is able to obtain a rate of convergence of at least 2P for the drag coefficient, which is expected for an integral functional quantity in a finite element method [73, 31]. This is also consistent with the results from other finite element methods shown in the high-order workshop [29].

P         1                2               3              4
Case 1    2.309e-04        2.143e-04       3.961e-04      6.243e-04
Case 2    5.284e-05        9.015e-06       3.866e-05      3.814e-05
Case 3    1.257e-05        4.432e-07       3.706e-06      1.351e-06
Order     2.099 ± 0.006    4.46 ± 0.025    6.7 ± 0.253    8.8 ± 0.155
Table 9.4: Drag coefficient error and rate of convergence for each polynomial order, laminar flow over a Joukowski airfoil, implicit pseudo time stepping with two color MCGS.

Figure 9.9: Drag coefficient error vs. $h = 1/\sqrt{nDoF}$ for laminar flow over a Joukowski airfoil, implicit pseudo time stepping with two color MCGS. Legend: P = 1; P = 2; P = 3; P = 4; Line fit.
9.5 Viscous flow over a half cylinder
In this section, unsteady, viscous flow over a half cylinder is studied where the freestream Mach number is set to 0.2, the Reynolds number based on the diameter is set to 1000, the heat capacity ratio is set to γ = 1.4, the Prandtl number is set to 0.72 and the dynamic viscosity remains fixed. As before, a characteristic Riemann invariant farfield boundary condition is used on the farfield and a no-slip adiabatic wall boundary condition is used on the surface of the half cylinder. The half cylinder is used to verify both the 3D Navier-Stokes discretization of the DFR method and the unsteady implicit time stepping method.
The flow simulation over a half cylinder with a span of 0.5 times the diameter uses a polynomial order of P = 3 and a mesh containing 20,292 unstructured hexahedral elements, shown in Figure 9.10, such that the total number of degrees of freedom is approximately 1.3 million.
The explicit RK45 and implicit ESDIRK4 schemes described in sections 7.1.1
and 7.1.2, respectively, are used to advance the solution in time on 12 NVIDIA Tesla
K80 GPUs. Pseudo time stepping and block Jacobi are used to converge each stage for
the implicit method. An example of the instantaneous isosurfaces of density colored
by Mach number is given in Figure 9.11.
Table 9.5 shows the lift and drag coefficients and the Strouhal number for three cases where the span times π times the diameter is used as the reference area. The table compares the results generated through the explicit RK45 time stepping method in PyFR [97] with the RK45 and ESDIRK4 methods in ZEFR using the same mesh. The results show a reasonable agreement between the three simulations. Figure 9.12 shows the time history of the lift and drag coefficients for all three simulations.

Case Studies      CL               CD              St
PYFR - RK54       7e-05 ± 0.112    1.30 ± 0.084    0.224
ZEFR - RK54       1e-03 ± 0.152    1.3 ± 0.157     0.222
ZEFR - ESDIRK4    3e-03 ± 0.155    1.3 ± 0.165     0.220
Table 9.5: Lift and drag coefficients and the Strouhal number for viscous flow over a half cylinder, Re = 1000, P = 3.
Figure 9.10: Half cylinder mesh with 20,292 unstructured hexahedral elements. Hexahedral elements are created by extruding the quadrilaterals shown above. (a) Full domain. (b) Close up.

Figure 9.11: Instantaneous isosurfaces of density colored by Mach number for viscous flow over a half cylinder, Re = 1000, P = 3.

Figure 9.12: Time history of lift and drag coefficient for viscous flow over a half cylinder, Re = 1000, P = 3. (a) Lift coefficient, $C_L$, vs. t. (b) Drag coefficient, $C_D$, vs. t. Legend: PYFR - RK54; ZEFR - RK54; ZEFR - ESDIRK.
Chapter 10
Implementation
The methods described in the previous chapters have been implemented within an
existing in-house, high-order, compressible flow solver for GPU architectures called
ZEFR. An explanation of the implementation of the explicit method along with the
computation of the residual is given in [79]. What follows is the implementation de-
tails of the implicit method. We show that the implicit method proposed does not
require the construction of global Jacobian matrices or any additional MPI commu-
nication compared to an explicit method. We also show that all operations can be
performed in an element-local fashion.
This chapter is split into three sections. Section 10.1 provides an overview of the steps within each solver for the implicit method. Section 10.2 describes the mesh coloring algorithm and the modifications to the residual computation for the multicolored Gauss-Seidel (MCGS) block iterative solver. Section 10.3 discusses the steps, data structures and algorithms needed to construct the element local Jacobians.
10.1 Overview of Solvers
This section provides an overview of the steps within each solver for the implicit method. We begin by providing the steps for the unsteady solver which utilizes the dual time stepping technique described in section 7.3.1. We then provide the steps for the steady-state solver in section 7.2 and conclude with a discussion on the direct solver used to solve each element local linear system.
10.1.1 Unsteady Solver
A dual time stepping approach for DIRK schemes, described in section 7.3.1, is used
to solve unsteady problems. The main steps for one time step are given below,
1. (ESDIRK) Compute first stage residual, R1
2. Loop over all remaining stages:
(a) Solve modified stage residual problem, Rs(Us) = 0, using steady-state
solver
3. Advance solution using Eq. (7.38)
4. (Ctrl) Compute error from Eq. (7.8)
5. (Ctrl) Estimate next time step using Eq. (7.12)
where (ESDIRK) is used to refer to the first explicit stage in ESDIRK schemes and
(Ctrl) is used to refer to the step size control for DIRK schemes with embedded pairs
in section 7.1.3. If only one Newton iteration or pseudo time step is required for
solving the modified stage residual, it’s possible to use the same LHS matrix for all
stages. This significantly reduces the computation time but limits the physical time
step.
10.1.2 Steady-state Solver
Newton’s method and pseudo time stepping, described in section 7.2, are used to
solve steady-state problems. Both methods can be split into two main sections for
each Newton iteration or pseudo time step.
First, the left-hand side (LHS) matrices are constructed and processed for the
direct solver. The steps in this procedure are as follows,
1. Compute residual, $R^{m}_{ele}$
2. (PTS) Compute and/or adapt element local pseudo time step, $\Delta\tau^{m}_{ele}$
3. Compute element local Jacobians, $\frac{\partial R^{m}_{ele}}{\partial U^{m}_{ele}}$
4. (PTS) Apply pseudo time step to element local Jacobians
5. Process LHS matrices for direct solver
where (PTS) is used to describe the steps which are only applicable to pseudo time
stepping. The residual is computed first in order to obtain the information necessary
to compute the element local Jacobians. The implementation details of the element
local Jacobian and the direct solvers are given in sections 10.3 and 10.1.3, respectively.
The second section of the steady-state solver is the block iterative method. Block Jacobi and multicolored Gauss-Seidel (MCGS) are both described in section 7.2.2 where the equations are given in (7.30) and (7.37), respectively. The total number of iterations is fixed and is usually set between 100 and 200. The procedure to complete one block iteration is given below,
1. Compute residual, $R^{k}_{ele}$ or $R^{*}_{c\,ele}$
2. Construct right-hand side (RHS) from Eq. (7.30) or Eq. (7.37)
3. Solve linear system for $\Delta^2 U^{k}_{ele}$ or $\Delta^2 U^{k}_{c\,ele}$ using direct solver
4. Advance solution by adding difference to current solution, $U^{k}_{ele}$ or $U^{k}_{c\,ele}$
where the procedure is repeated for each color for MCGS. More information about the implementation of MCGS can be found in section 10.2.
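The four steps of one block iteration can be illustrated on a small linear model problem. The sketch below is a stand-in under stated assumptions rather than the solver in ZEFR: the flow residual is replaced by the residual of a linear system on a one-dimensional chain of elements, every element shares the same 2 × 2 diagonal block `D` and neighbor coupling block `C`, and all names are hypothetical.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec2 = std::array<double, 2>;
using Mat2 = std::array<double, 4>;  // row-major 2x2 block

Vec2 matVec(const Mat2& M, const Vec2& x) {
  return {M[0] * x[0] + M[1] * x[1], M[2] * x[0] + M[3] * x[1]};
}

// Direct solve of the element local system D * dx = r (stand-in for batched LU).
Vec2 solve2x2(const Mat2& D, const Vec2& r) {
  const double det = D[0] * D[3] - D[1] * D[2];
  assert(det != 0.0);
  return {(D[3] * r[0] - D[1] * r[1]) / det, (D[0] * r[1] - D[2] * r[0]) / det};
}

// Linear model residual r_e = b_e - D x_e - C (x_{e-1} + x_{e+1}).
std::vector<Vec2> residual(const Mat2& D, const Mat2& C,
                           const std::vector<Vec2>& b,
                           const std::vector<Vec2>& x) {
  const std::size_t n = x.size();
  std::vector<Vec2> r(n);
  for (std::size_t e = 0; e < n; ++e) {
    Vec2 nb = {0.0, 0.0};
    if (e > 0) { nb[0] += x[e - 1][0]; nb[1] += x[e - 1][1]; }
    if (e + 1 < n) { nb[0] += x[e + 1][0]; nb[1] += x[e + 1][1]; }
    const Vec2 Dx = matVec(D, x[e]), Cn = matVec(C, nb);
    r[e] = {b[e][0] - Dx[0] - Cn[0], b[e][1] - Dx[1] - Cn[1]};
  }
  return r;
}

// One block Jacobi iteration: all elements are updated from the same old state.
void blockJacobiSweep(const Mat2& D, const Mat2& C, const std::vector<Vec2>& b,
                      std::vector<Vec2>& x) {
  const std::vector<Vec2> r = residual(D, C, b, x);  // steps 1 and 2
  for (std::size_t e = 0; e < x.size(); ++e) {
    const Vec2 dx = solve2x2(D, r[e]);               // step 3
    x[e][0] += dx[0];                                // step 4
    x[e][1] += dx[1];
  }
}

// Largest residual component in absolute value.
double maxResidual(const Mat2& D, const Mat2& C, const std::vector<Vec2>& b,
                   const std::vector<Vec2>& x) {
  double m = 0.0;
  for (const Vec2& r : residual(D, C, b, x))
    m = std::max(m, std::max(std::abs(r[0]), std::abs(r[1])));
  return m;
}
```

An MCGS variant of this sketch would recompute the residual and apply the update one color at a time, so that elements of a later color see the already updated values of their neighbors.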
10.1.3 Direct Solver
A solver is needed to solve each element local linear system. In this work, LU factor-
ization is used on both the CPU and GPU. A modified procedure where the inverse
of the left-hand side is computed is also used to enhance performance. The procedure
for the direct solver is split into two main parts as shown in section 10.1.2.
First, after computing the left-hand side for all elements, a batch LU factorization
is performed. If the modified procedure is chosen, a batch inverse is also computed
using the LU factorization and the inverse is stored in a separate data structure.
During the block iteration procedure, the element local systems are solved using
batch forward and backward substitution. If an inverse is computed, a batch matrix-
vector multiplication is used instead.
For the CPU implementation, all batch operations are performed in serial by using
either the TNT/JAMA libraries [74] or the Eigen linear algebra library [33]. For the
GPU implementation, cuBLAS [66] is used for the batch LU factorization, inverse
computation and forward and backward substitution. The kernel used for the batch
matrix-vector multiplication on the GPU is shown in Listing 10.1.
Listing 10.1: CUDA kernel for batched matrix-vector multiplication where
blockDim = (32, 6) and
gridDim = ((M + 32 - 1)/32, std::min((batchCount + 6 - 1)/6, 65535)).

    __global__
    void DgemvBatched(const int M, const int N,
        const double **Aarray, int lda, const double **xarray,
        int incx, double **yarray, int incy, int batchCount)
    {
      // Row index handled by this thread
      const unsigned int i = blockDim.x * blockIdx.x + threadIdx.x;

      if (i >= M) return;

      // Grid-stride loop over the batch in the y-direction
      for (unsigned int batch = blockDim.y * blockIdx.y + threadIdx.y;
           batch < batchCount; batch += gridDim.y * blockDim.y)
      {
        // Dot product of row i of the column-major matrix with x
        double sum = 0.0;
        for (unsigned int j = 0; j < N; j++)
          sum += Aarray[batch][i + j*lda] * xarray[batch][j * incx];

        yarray[batch][i * incy] = sum;

        // No __syncthreads() here: the kernel uses no shared memory, and a
        // barrier after the early return above would not be reached by all
        // threads of a block.
      }
    }
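A serial host-side reference is useful for validating the kernel above and for checking the launch configuration quoted in the caption. The following is a minimal sketch with hypothetical names (`gemvLaunchDims`, `dgemvBatchedRef`); it mirrors the column-major indexing of the listing but stores each batch entry in a `std::vector` instead of device pointers.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct LaunchDims {
  unsigned gridX, gridY;
};

// Grid dimensions from the caption of Listing 10.1 for blockDim = (32, 6).
LaunchDims gemvLaunchDims(int M, int batchCount) {
  assert(M > 0 && batchCount > 0);
  const unsigned gx = (M + 32 - 1) / 32;
  const unsigned gy = std::min((batchCount + 6 - 1) / 6, 65535);
  return {gx, gy};
}

// Serial reference: y[b] = A[b] * x[b], with A[b] stored column-major
// using leading dimension lda, as in the kernel.
void dgemvBatchedRef(int M, int N, const std::vector<std::vector<double>>& A,
                     int lda, const std::vector<std::vector<double>>& x,
                     int incx, std::vector<std::vector<double>>& y, int incy) {
  for (std::size_t b = 0; b < A.size(); ++b) {
    for (int i = 0; i < M; ++i) {
      double sum = 0.0;
      for (int j = 0; j < N; ++j) sum += A[b][i + j * lda] * x[b][j * incx];
      y[b][i * incy] = sum;
    }
  }
}
```

The cap of 65535 on the second grid dimension is the reason the kernel loops over the batch with a stride of `gridDim.y * blockDim.y` rather than assigning exactly one batch entry per thread.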
10.2 Multicolored Gauss-Seidel
This section describes the mesh coloring algorithm and the modifications to the residual computation for the multicolored Gauss-Seidel (MCGS) block iterative solver. We show that the mesh coloring algorithm is able to obtain the target two colors for a structured mesh and four colors for an unstructured mesh.
10.2.1 Mesh Coloring
MCGS requires graph coloring in order to have separate sets of elements to advance sequentially and improve convergence. In this work, a modified vertex coloring algorithm is created where the main requirement is that no two adjacent elements have the same color. This is done in order to maximize the amount of data in M and minimize the amount of data in N for the MCGS splitting method in section 7.2.2. The algorithm also attempts to distribute the number of elements used for each color evenly. This ensures that the computation requirement for each color remains about the same and the GPU has the largest amount of work possible per color.

The total number of colors used in the algorithm is set by the user. Ideally, a minimal number of colors is desirable in order to maintain larger subproblem sizes, leading to more efficient GPU performance. For two-dimensional meshes, the minimum is two for structured meshes and four for unstructured meshes via the four-color theorem for planar graphs [7].
The algorithm used is a modified greedy mesh coloring algorithm. First, a vector
of counts for each color is initialized to zero. Then, the element connectivity graph
is traversed in a breadth-first order. Each element is set to a color with the lowest
count. The color must also be unused by adjacent elements. If all available colors are
used, the algorithm fails.
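The algorithm described above can be sketched as follows. This is an illustrative reading of the description, not the ZEFR source: ties between colors with equal counts are broken by the lowest color index, failure is signaled by an empty result, and the function names and the structured-mesh adjacency helper are hypothetical.

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Balanced greedy coloring over a breadth-first traversal of the element
// connectivity graph. Returns one color per element, or an empty vector if
// some element has all colors taken by its neighbors.
std::vector<int> colorMesh(const std::vector<std::vector<int>>& adj,
                           int nColors) {
  const int n = static_cast<int>(adj.size());
  std::vector<int> color(n, -1), count(nColors, 0);
  std::vector<bool> queued(n, false);
  for (int seed = 0; seed < n; ++seed) {  // also covers disconnected graphs
    if (queued[seed]) continue;
    std::queue<int> q;
    q.push(seed);
    queued[seed] = true;
    while (!q.empty()) {
      const int e = q.front();
      q.pop();
      std::vector<bool> used(nColors, false);
      for (int nb : adj[e])
        if (color[nb] >= 0) used[color[nb]] = true;
      int best = -1;  // unused color with the lowest element count
      for (int c = 0; c < nColors; ++c)
        if (!used[c] && (best < 0 || count[c] < count[best])) best = c;
      if (best < 0) return {};  // all available colors are used: failure
      color[e] = best;
      ++count[best];
      for (int nb : adj[e])
        if (!queued[nb]) {
          queued[nb] = true;
          q.push(nb);
        }
    }
  }
  return color;
}

// Face-neighbor adjacency of an nx-by-ny structured quadrilateral mesh.
std::vector<std::vector<int>> structuredAdjacency(int nx, int ny) {
  std::vector<std::vector<int>> adj(nx * ny);
  for (int j = 0; j < ny; ++j)
    for (int i = 0; i < nx; ++i) {
      const int e = j * nx + i;
      if (i > 0) adj[e].push_back(e - 1);
      if (i + 1 < nx) adj[e].push_back(e + 1);
      if (j > 0) adj[e].push_back(e - nx);
      if (j + 1 < ny) adj[e].push_back(e + nx);
    }
  return adj;
}
```

On a (48 × 16) structured mesh with two colors, this sketch produces a checkerboard pattern with 384 elements per color.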
Examples of the mesh coloring algorithm on two-dimensional structured and unstructured meshes are shown in Figure 10.1. In these examples, the algorithm was able to obtain the target two colors for the structured cases and four colors for the unstructured case. It was also able to distribute the colors evenly as desired. The mesh coloring is performed in serial on a single process on the CPU. The color sets are then partitioned between ranks for MPI simulations.

Figure 10.1: Examples of the mesh coloring algorithm. (a) Channel mesh, 2 color distribution: (384, 384). (b) NACA0012 mesh, 2 color distribution: (512, 512). (c) Mixed NACA0012 mesh, 4 color distribution: (379, 378, 378, 377).
10.2.2 Residual Computation per Color
The algorithm used to compute the residual requires coalesced memory access among
elements in order to be efficient on the GPU. Creating staggered color sets and com-
puting the elements within one color set at a time leads to irregular memory access
patterns which is detrimental to the performance. In order to avoid this problem,
elements are shuffled in memory after mesh coloring. This ensures that each color
has coalesced memory access patterns.
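One way to realize such a shuffle is a stable counting sort on the element colors, which yields a permutation and the contiguous index range owned by each color. The sketch below is an assumption about how this could be done, with hypothetical names; the text does not specify the reordering used in ZEFR.

```cpp
#include <cassert>
#include <vector>

struct ColorLayout {
  std::vector<int> newToOld;  // original element index stored at each new position
  std::vector<int> offset;    // color c occupies [offset[c], offset[c + 1])
};

// Group elements by color with a stable counting sort so that each color
// occupies one contiguous block of memory.
ColorLayout groupByColor(const std::vector<int>& color, int nColors) {
  ColorLayout layout;
  layout.offset.assign(nColors + 1, 0);
  for (int c : color) {
    assert(c >= 0 && c < nColors);
    ++layout.offset[c + 1];  // histogram of elements per color
  }
  for (int c = 0; c < nColors; ++c)
    layout.offset[c + 1] += layout.offset[c];  // prefix sum gives block starts
  layout.newToOld.resize(color.size());
  std::vector<int> next(layout.offset.begin(), layout.offset.end() - 1);
  for (int e = 0; e < static_cast<int>(color.size()); ++e)
    layout.newToOld[next[color[e]]++] = e;
  return layout;
}
```

A kernel for color `c` can then be launched over the index range `[offset[c], offset[c + 1])`, so neighboring threads touch neighboring elements in memory.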
In the inviscid case, all residual operations which occur on the element, such as the computation of the discontinuous flux and the extrapolation and differentiation operations, are performed only for a particular color. In the viscous case, the gradient is recomputed and extrapolated on all elements to ensure the most up-to-date value. For both cases, all operations which occur on faces, such as the computation of the common interface flux, are performed on all faces. This also means that all MPI communication is performed for each color.
The work load for the residual computation in MCGS is higher than that for
block Jacobi, especially when using a large number of colors. These inefficiencies
could be mitigated by only updating information that is necessary but this makes the
algorithm more complex and leads to more irregular data access patterns.
10.3 Element Local Jacobians
The construction of the element local Jacobians is carried out in several steps:
1. Compute discontinuous flux Jacobians
2. Compute boundary condition Jacobians
3. Compute common interface flux and solution Jacobians
4. Compute modified interface flux Jacobians
5. Construct element local Jacobians
6. Scale Jacobians by determinant of geometric Jacobian
Each step is described in detail in the sections that follow. The flux Jacobian and
boundary Jacobians are computed using the functions in [92]. It’s important to
note that the element local Jacobian computation requires only the solution and
solution gradient information on MPI interfaces. This should already be provided
from the residual computation so no additional MPI communication is required for
the computation of the element local Jacobians or the implicit method in general.
10.3.1 Constructing the Flux and Solution Jacobians
The flux and solution Jacobian computations follow a similar procedure to the re-
spective functions in the residual computation. This section provides a high level
overview of the additional data structures and computations required.
Data Structures
All flux and solution Jacobian data structures are organized into a row-major,
structure of arrays (SoA) format where data across all elements is contiguous in
memory. This data layout leads to a stride when accessing different values within
an element but is particularly beneficial on GPUs because it allows for coalesced
memory access and vectorization of operations within kernels when performing the
various function evaluations. The dimensions of the data structures are
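\[
\begin{aligned}
\text{Discontinuous Flux:}\quad
  & \frac{\partial F}{\partial U} : [N_{dims}, N_{spts}, N_{vars}, N_{vars}, N_{eles}], \\
  & \frac{\partial F}{\partial Q} : [N_{dims}, N_{dims}, N_{spts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Common Interface Flux:}\quad
  & \frac{\partial F^{I}}{\partial U} : [N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
  & \frac{\partial F^{I}}{\partial Q} : [N_{dims}, N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
  & \frac{\partial F^{I}}{\partial Q^{N}} : [N_{dims}, N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Common Interface Solution:}\quad
  & \frac{\partial U^{I}}{\partial U} : [N_{fpts}, N_{vars}, N_{vars}, N_{eles}], \\
\text{Neighbor Contribution:}\quad
  & \frac{\partial Q^{N}}{\partial U^{I,N}} : [N_{dims}, N_{fpts}, N_{eles}],
\end{aligned}
\tag{10.1}
\]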
where $Q^{N}$ refers to the neighboring element solution gradient, $N_{dims}$ is the
number of spatial dimensions, $N_{spts}$ is the number of solution points per element,
$N_{fpts}$ is the number of flux points per element, $N_{vars}$ is the number of
conservative variables in the equation and $N_{eles}$ is the number of elements in the
domain. These numbers are defined in chapter 6 for each problem type. It is also
important to note that the common interface fluxes for each face can be found from
each respective structure, for example,
\[
\frac{\partial F^{I,face}}{\partial U^{face}} \in \frac{\partial F^{I}}{\partial U}, \quad
\frac{\partial F^{I,face}}{\partial Q^{face}} \in \frac{\partial F^{I}}{\partial Q}, \quad
\frac{\partial U^{I,face}}{\partial U^{face}} \in \frac{\partial U^{I}}{\partial U}, \quad
\text{for } face \in F_L \cup F_R.
\tag{10.2}
\]
Lastly, the transpose of the inverse geometric Jacobian matrix, $J^{-1'}$, is also a
necessary structure and its dimensions are defined as
\[
\text{Geometric Jacobian:}\quad J^{-1'} : [N_{dims}, N_{spts}, N_{dims}, N_{eles}].
\tag{10.3}
\]
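The effect of the SoA layout on memory strides can be illustrated with the linear index of a row-major array whose last dimension is the element index. The helper `soa_index` below is hypothetical and assumes zero-based indices; it mirrors the $[N_{fpts}, N_{vars}, N_{vars}, N_{eles}]$ structure of the common interface flux Jacobian.

```python
def soa_index(fpt, vari, varj, ele, n_vars, n_eles):
    """Linear offset into a row-major [Nfpts, Nvars, Nvars, Neles] array."""
    return ((fpt * n_vars + vari) * n_vars + varj) * n_eles + ele


n_vars, n_eles = 4, 100

# Neighboring elements are adjacent in memory (stride 1), so threads that
# each handle one element read consecutive addresses.
element_stride = (soa_index(0, 0, 0, 1, n_vars, n_eles)
                  - soa_index(0, 0, 0, 0, n_vars, n_eles))

# Neighboring variables within one element are Neles entries apart.
variable_stride = (soa_index(0, 0, 1, 0, n_vars, n_eles)
                   - soa_index(0, 0, 0, 0, n_vars, n_eles))
```

The unit stride between elements is what makes the access pattern coalesced when a kernel assigns one thread per element.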
Discontinuous Flux Jacobians
The discontinuous flux Jacobians are computed by directly evaluating the flux
Jacobians at solution points as shown in Eq. (8.48),
\[
\left(\frac{\partial F}{\partial U}\right)_{spts} = \frac{\partial F}{\partial U}(U_{spts}, Q_{spts}), \qquad
\left(\frac{\partial F}{\partial Q}\right)_{spts} = \frac{\partial F}{\partial Q}(U_{spts}, Q_{spts}),
\tag{10.4}
\]
where $U_{spts}$ is the solution at solution points, $Q_{spts}$ is the solution gradient at
solution points and the functions $\frac{\partial F}{\partial U}(U, Q)$ and
$\frac{\partial F}{\partial Q}(U, Q)$ are given in [92] for the Euler and Navier-Stokes
equations.
The discontinuous flux Jacobians are then transformed to create the transformed
discontinuous flux Jacobians using Eq. (8.71),
\[
\frac{\partial F}{\partial U} = J^{-1} \left(\frac{\partial F}{\partial U}\right)_{spts}, \qquad
\frac{\partial F}{\partial Q} = J^{-1} \left(\frac{\partial F}{\partial Q}\right)_{spts},
\tag{10.5}
\]
where a temporary structure is used to avoid overwriting data. For the GPU
implementation, this process is parallelized over all elements and solution points.
Common Interface Jacobians
Common interface Jacobians are computed by evaluating functions on global flux
points, “gfpts”, as opposed to element local flux points, “fpts”, where there is a
unique global flux point for each flux point pair between elements. This is done to
avoid computing values multiple times.
The process of computing the common interface flux Jacobians is described in
detail from an element local perspective starting from Eq. (8.51) in section 8.2.1.
Using this process from a global flux point perspective, the common interface flux
Jacobians are expressed as
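\[
\begin{aligned}
\left(\frac{\partial F^{I}}{\partial U^{-}}\right)_{gfpts} &= A_{gfpts}\, \frac{\partial F^{I}}{\partial U^{-}}\left(U^{-}_{gfpts}, U^{+}_{gfpts}, Q^{-}_{gfpts}\right), \\
\left(\frac{\partial F^{I}}{\partial U^{+}}\right)_{gfpts} &= A_{gfpts}\, \frac{\partial F^{I}}{\partial U^{+}}\left(U^{-}_{gfpts}, U^{+}_{gfpts}, Q^{+}_{gfpts}\right), \\
\left(\frac{\partial F^{I}}{\partial Q^{-}}\right)_{gfpts} &= A_{gfpts}\, \frac{\partial F^{I}}{\partial Q^{-}}\left(U^{-}_{gfpts}, Q^{-}_{gfpts}\right), \\
\left(\frac{\partial F^{I}}{\partial Q^{+}}\right)_{gfpts} &= A_{gfpts}\, \frac{\partial F^{I}}{\partial Q^{+}}\left(U^{+}_{gfpts}, Q^{+}_{gfpts}\right),
\end{aligned}
\tag{10.6}
\]
where $U^{-}_{gfpts}$, $U^{+}_{gfpts}$, $Q^{-}_{gfpts}$ and $Q^{+}_{gfpts}$ are the solution and solution
gradient extrapolated from element solution points to the left and right sides of each
global flux point, respectively. These Jacobians directly correlate to the element local
common interface flux Jacobians through the use of pointers,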
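\[
\begin{aligned}
\left(\frac{\partial F^{I}}{\partial U^{-}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial U}, &
\left(\frac{\partial F^{I}}{\partial U^{+}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial U}, \\
\left(\frac{\partial F^{I}}{\partial Q^{-}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial Q}, &
\left(\frac{\partial F^{I}}{\partial Q^{+}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial Q}, \\
\left(\frac{\partial F^{I}}{\partial Q^{-}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial Q^{N}}, &
\left(\frac{\partial F^{I}}{\partial Q^{+}}\right)_{gfpts} &\rightarrow \frac{\partial F^{I}}{\partial Q^{N}},
\end{aligned}
\tag{10.7}
\]
where copies of $\left(\frac{\partial F^{I}}{\partial Q^{-}}\right)_{gfpts}$ and
$\left(\frac{\partial F^{I}}{\partial Q^{+}}\right)_{gfpts}$ are made to avoid communication
costs when constructing the element local Jacobian.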
The process of computing the common interface solution Jacobians is shown
in Eq. (8.63) and also correlates to the element local structures through pointers,
\[
\left(\frac{\partial U^{I}}{\partial U^{-}}\right)_{gfpts} \rightarrow \frac{\partial U^{I}}{\partial U}, \qquad
\left(\frac{\partial U^{I}}{\partial U^{+}}\right)_{gfpts} \rightarrow \frac{\partial U^{I}}{\partial U}.
\tag{10.8}
\]
As shown in equations (8.57) and (8.65), the boundary flux and solution Jacobians
are computed using the boundary functions $\frac{\partial F^{b}}{\partial U}(U, Q)$,
$\frac{\partial F^{b}}{\partial Q}(U, Q)$ and $\frac{\partial U^{b}}{\partial U}(U)$ defined
in [92] for a given boundary condition of the Euler and Navier-Stokes equations. For
the GPU implementation, this entire process is parallelized over all global flux points.
Modified Common Interface Flux Jacobians
The neighbor contribution Jacobian, $\frac{\partial Q^{N}}{\partial U^{I,N}}$, is computed
only once during preprocessing by utilizing Eq. (8.103) or Eq. (8.103) for a two- or
three-dimensional case, respectively. The contribution from neighboring solution
gradients is then added to the common interface flux Jacobians using Eq. (8.43),
\[
\frac{\partial F^{I}}{\partial U} = \frac{\partial F^{I}}{\partial U}
  + \frac{\partial F^{I}}{\partial Q^{N}} \frac{\partial Q^{N}}{\partial U^{I,N}} \frac{\partial U^{I,N}}{\partial U},
\tag{10.9}
\]
where $\frac{\partial U^{I,N}}{\partial U} = \frac{\partial U^{I}}{\partial U}$. For the GPU
implementation, this process is parallelized over all elements and variables. When
solving the advection or Euler equations, the common interface flux Jacobians do not
need to be modified and this section is skipped.
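As an illustration of the update in Eq. (10.9), the sketch below applies it at a single flux point of a single element. It assumes that the product contracts over the spatial dimension and over the intermediate variable index, which is suggested by the dimensions in Eq. (10.1) but is not stated explicitly here; the function name and the nested-list storage are hypothetical.

```python
def modify_interface_jacobian(dFdU, dFdQN, dQNdUIN, dUIdU):
    """Add the neighbor gradient contribution to dF^I/dU at one flux point.

    dFdU    : [Nvars][Nvars]         common interface flux Jacobian
    dFdQN   : [Ndims][Nvars][Nvars]  Jacobian w.r.t. the neighbor gradient
    dQNdUIN : [Ndims]                neighbor contribution (one scalar per dim)
    dUIdU   : [Nvars][Nvars]         common interface solution Jacobian
    """
    n_vars = len(dFdU)
    n_dims = len(dFdQN)
    out = [row[:] for row in dFdU]   # temporary copy, input is left untouched
    for vi in range(n_vars):
        for vj in range(n_vars):
            for d in range(n_dims):          # assumed sum over dimensions
                for vk in range(n_vars):     # assumed sum over variables
                    out[vi][vj] += dFdQN[d][vi][vk] * dQNdUIN[d] * dUIdU[vk][vj]
    return out


modified = modify_interface_jacobian(
    dFdU=[[1.0, 0.0], [0.0, 1.0]],
    dFdQN=[[[1.0, 2.0], [3.0, 4.0]], [[0.0, 1.0], [1.0, 0.0]]],
    dQNdUIN=[2.0, 3.0],
    dUIdU=[[0.5, 0.0], [0.0, 0.5]],
)
```

Because each flux point and element is independent in this update, the outer loops map naturally onto GPU threads.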
10.3.2 Constructing the Element Local Jacobians
This section provides a description of the left-hand side matrix data structure and
examples of algorithms used for the Kronecker product formulation of the element
local Jacobians.
Data Structure
The element local Jacobians are stored in the left-hand side (LHS) matrices described
in section 10.1 using a row-major, array of structures (AoS) format where data
associated with a single LHS matrix is contiguous in memory. This data layout was
chosen in order to utilize the various batch cuBLAS [66] routines for the direct solver
in section 10.1.3. The dimensions of the LHS matrix structure are
\[
\text{Left-hand side matrix:}\quad LHS : [N_{eles}, N_{vars}, N_{spts}, N_{vars}, N_{spts}].
\tag{10.10}
\]
The LHS matrix structure uses the most memory within the implicit algorithm because
it is directly dependent on the number of elements in the domain and the square of
the number of solution points within an element. The amount of memory used by the
LHS matrix per element scales as $(P + 1)^{2d}$, where $P$ is the degree of the Lagrange
basis polynomials and $d$ is the spatial dimension. More precisely, from Eq. (10.10)
each matrix holds $(N_{vars} (P + 1)^{d})^{2}$ entries. Table 10.1 shows some common sizes
for an element local Jacobian.
P    2D AD        3D AD          2D NS          3D NS
1    (4 × 4)      (8 × 8)        (16 × 16)      (40 × 40)
2    (9 × 9)      (27 × 27)      (36 × 36)      (135 × 135)
3    (16 × 16)    (64 × 64)      (64 × 64)      (320 × 320)
4    (25 × 25)    (125 × 125)    (100 × 100)    (625 × 625)
5    (36 × 36)    (216 × 216)    (144 × 144)    (1080 × 1080)
Table 10.1: Common sizes for an element local Jacobian for the advection-diffusion (AD) and Navier-Stokes (NS) equations. These sizes remain the same for the advection and Euler equations.
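The sizes in Table 10.1 and the resulting storage can be reproduced with a short calculation. The sketch below assumes tensor product elements with $(P+1)^d$ solution points, double precision storage (8 bytes per entry) and four conservative variables for the two-dimensional Euler equations; the function names are hypothetical.

```python
def lhs_block_size(p, n_dims, n_vars):
    """Rows (and columns) of one element local Jacobian."""
    return n_vars * (p + 1) ** n_dims


def lhs_memory_bytes(n_eles, p, n_dims, n_vars, bytes_per_value=8):
    """Storage for all LHS matrices, assuming dense double precision blocks."""
    n = lhs_block_size(p, n_dims, n_vars)
    return n_eles * n * n * bytes_per_value


# 3D Navier-Stokes, P = 3: a (320 x 320) block as in Table 10.1.
size_3d_ns = lhs_block_size(p=3, n_dims=3, n_vars=5)

# 2D Euler, P = 5 on a (128 x 128) mesh: about 2.72e9 bytes.
bytes_2d_euler = lhs_memory_bytes(128 * 128, p=5, n_dims=2, n_vars=4)
```

Under these assumptions the second case requires roughly 2.72 GB, which is in line with the 2.71 GB reported for the single GPU case in Table 11.9.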
Two-Dimensional Element Local Jacobian
The first term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, of the
two-dimensional element local Jacobian is constructed from the Kronecker products in
Eq. (8.88). An example of one term in the formulation is shown in algorithm 10.1. The
resulting sparsity pattern follows the same sparsity pattern as a Kronecker sum as
shown in Figures 8.1 and 8.2 for a polynomial order of P = 2.
Algorithm 10.1 Calculate 2D $K^{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial u_{ele}}\right)$
for ele = 0 to Neles - 1 do
  for varj = 0 to Nvars - 1 do
    for vari = 0 to Nvars - 1 do
      for fi = 0 to Nspts1D - 1 do
        for si = 0 to Nspts1D - 1 do
          for sj = 0 to Nspts1D - 1 do
            spti ← fi * Nspts1D + si
            sptj ← fi * Nspts1D + sj
            $\partial F/\partial U$ ← $\partial F/\partial U$(0, sptj, vari, varj, ele)
            $\partial F^{I,L}/\partial U^{L}$ ← $\partial F^{I,L}/\partial U^{L}$(fi, vari, varj, ele)
            $\partial F^{I,R}/\partial U^{R}$ ← $\partial F^{I,R}/\partial U^{R}$(fi, vari, varj, ele)
            LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti)
                + K(si, sj) * $\partial F/\partial U$
                + KL(si, sj) * $\partial F^{I,L}/\partial U^{L}$
                + KR(si, sj) * $\partial F^{I,R}/\partial U^{R}$
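The loop nest of algorithm 10.1 can be related to explicit Kronecker products for a single element and a single pair of variables. The sketch below is an illustration under stated assumptions: solution points are numbered as fi * Nspts1D + s, the matrix is stored as A[spti][sptj] rather than in the LHS ordering of Eq. (10.10), and all function names and test values are hypothetical.

```python
def kron(a, b):
    """Kronecker product of two dense matrices stored as nested lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def diag(values):
    """Diagonal matrix built from a list of values."""
    n = len(values)
    return [[values[i] if i == j else 0.0 for j in range(n)] for i in range(n)]


def term1_loops(K, KL, KR, dFdU, dFL, dFR):
    """Loop form, following the structure of algorithm 10.1."""
    n = len(K)
    A = [[0.0] * (n * n) for _ in range(n * n)]
    for fi in range(n):
        for si in range(n):
            for sj in range(n):
                spti = fi * n + si
                sptj = fi * n + sj
                A[spti][sptj] += (K[si][sj] * dFdU[sptj]
                                  + KL[si][sj] * dFL[fi]
                                  + KR[si][sj] * dFR[fi])
    return A


def term1_kron(K, KL, KR, dFdU, dFL, dFR):
    """Same matrix: (I kron K) diag(dFdU) + diag(dFL) kron KL + diag(dFR) kron KR."""
    n = len(K)
    IK = kron(diag([1.0] * n), K)
    L = kron(diag(dFL), KL)
    R = kron(diag(dFR), KR)
    m = n * n
    return [[IK[i][j] * dFdU[j] + L[i][j] + R[i][j] for j in range(m)]
            for i in range(m)]


K = [[1.0, 2.0], [3.0, 4.0]]
KL = [[0.5, 0.0], [0.0, 0.0]]
KR = [[0.0, 0.0], [0.0, 0.5]]
args = (K, KL, KR, [1.0, 2.0, 3.0, 4.0], [10.0, 20.0], [30.0, 40.0])
A_loops = term1_loops(*args)
A_kron = term1_kron(*args)
```

The loop form only touches the block diagonal entries, which is why it avoids the cost of forming the full Kronecker products.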
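The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$,
of the two-dimensional element local Jacobian is constructed from the product of the
two Kronecker formulations in Eq. (8.90). The product produces four terms which can
be written explicitly as
\[
\begin{aligned}
\left[\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)\right]_{v_i,v_j}
= \sum_{d_j \in \{x,y\}} \sum_{v_k=1}^{N_{vars}} \Bigg(
 & \left[K^{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k}
   J^{-1'}_{(d_j,\xi)\,ele}
   \left[K^{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j} \\
 + & \left[K^{\eta}\left(\frac{\partial f^{\eta}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k}
   J^{-1'}_{(d_j,\eta)\,ele}
   \left[K^{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j} \\
 + & \left[K^{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k}
   J^{-1'}_{(d_j,\eta)\,ele}
   \left[K^{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j} \\
 + & \left[K^{\eta}\left(\frac{\partial f^{\eta}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k}
   J^{-1'}_{(d_j,\xi)\,ele}
   \left[K^{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j}
 \Bigg),
\end{aligned}
\tag{10.11}
\]
where a loop over variables is included to account for boundary conditions as discussed
in section 8.3. The first two terms are like-terms where the sparsity pattern is the
same between the two terms in the product. An example of this process is shown
in algorithm 10.2. The last two terms are cross-terms where the sparsity pattern is
different between the two terms in the product. The result produces a dense matrix
with no sparsity pattern because of the products between the cross-terms as shown in
Figure 8.3 for P = 2. This is shown in algorithm 10.3. For the GPU implementation,
algorithms 10.1–10.3 are parallelized over all elements and variables.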
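The difference between like-terms and cross-terms can be checked numerically with small stand-in operators. The sketch below assumes solution points numbered as eta * Nspts1D + xi, so that an operator acting along $\xi$ has the form I ⊗ A and one acting along $\eta$ has the form B ⊗ I; the matrices A and B are arbitrary dense placeholders rather than the actual operators of chapter 8.

```python
def kron(a, b):
    """Kronecker product of two dense matrices stored as nested lists."""
    rb, cb = len(b), len(b[0])
    return [[a[i // rb][j // cb] * b[i % rb][j % cb]
             for j in range(len(a[0]) * cb)]
            for i in range(len(a) * rb)]


def matmul(a, b):
    """Dense matrix product of nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def count_nonzeros(m):
    return sum(1 for row in m for x in row if x != 0.0)


n = 3  # Nspts1D for P = 2
eye = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
A = [[float(1 + i + j) for j in range(n)] for i in range(n)]  # dense placeholder
B = [[float(2 + i * j) for j in range(n)] for i in range(n)]  # dense placeholder

xi_A = kron(eye, A)    # acts along xi
xi_B = kron(eye, B)    # acts along xi
eta_B = kron(B, eye)   # acts along eta

like = matmul(xi_A, xi_B)    # equals I kron (A B): block diagonal
cross = matmul(xi_A, eta_B)  # equals B kron A: fully dense

nnz_like = count_nonzeros(like)
nnz_cross = count_nonzeros(cross)
```

For P = 2 the like-term keeps 27 of the 81 entries while the cross-term fills all 81, consistent with the dense pattern described for Figure 8.3.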
Three-Dimensional Element Local Jacobian
The three-dimensional implementation of the element local Jacobian is very similar
to the two-dimensional implementation with a few exceptions. The first term,
$\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial u_{ele}}\right)$, is constructed using
Eq. (8.104). The sparsity patterns are shown in Figures 8.4 and 8.5 for a polynomial
order of P = 2. The algorithm for each Kronecker product formulation looks similar to
algorithm 10.1.
The second term, $\nabla^{\delta} \cdot \left(\frac{\partial f_{ele}}{\partial q_{ele}} \frac{\partial q_{ele}}{\partial u_{ele}}\right)$,
is constructed using Eq. (8.106). The product between the two Kronecker formulations
produces a total of nine terms where three terms are like-terms and six are
cross-terms. The algorithms for building the like-terms and cross-terms look similar
to algorithms 10.2 and 10.3, respectively. The sparsity patterns are shown in Figures
8.6 and 8.7 for P = 2.
Scale Jacobian
The last step in constructing the element local Jacobians is to scale each Jacobian
by the determinant of the geometric Jacobian as shown in Eq. (8.113). For the GPU
implementation, this process is parallelized over all elements and variables.
Algorithm 10.2 Calculate 2D like-term $\left[K^{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k} J^{-1'}_{(d_j,\xi)\,ele} \left[K^{\xi}\left(\frac{\partial q^{\xi}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j}$
for ele = 0 to Neles - 1 do
  for varj = 0 to Nvars - 1 do
    for vari = 0 to Nvars - 1 do
      for fi = 0 to Nspts1D - 1 do
        for si = 0 to Nspts1D - 1 do
          for sj = 0 to Nspts1D - 1 do
            val ← 0.0
            for dimj = 0 to Ndims - 1 do
              for vark = 0 to Nvars - 1 do
                diag ← (vark == varj)
                $\partial F^{I,L}/\partial Q^{L}$ ← $\partial F^{I,L}/\partial Q^{L}$(dimj, fi, vari, vark, ele)
                $\partial F^{I,R}/\partial Q^{R}$ ← $\partial F^{I,R}/\partial Q^{R}$(dimj, fi, vari, vark, ele)
                $\partial U^{I,L}/\partial U^{L}$ ← $\partial U^{I,L}/\partial U^{L}$(fi, vark, varj, ele)
                $\partial U^{I,R}/\partial U^{R}$ ← $\partial U^{I,R}/\partial U^{R}$(fi, vark, varj, ele)
                for sk = 0 to Nspts1D - 1 do
                  sptk ← fi * Nspts1D + sk
                  $\partial F/\partial Q$ ← $\partial F/\partial Q$(0, dimj, sptk, vari, vark, ele)
                  $J^{-1'}$ ← $J^{-1'}$(0, sptk, dimj, ele)
                  kron1 ← K(si, sk) * $\partial F/\partial Q$
                      + KL(si, sk) * $\partial F^{I,L}/\partial Q^{L}$
                      + KR(si, sk) * $\partial F^{I,R}/\partial Q^{R}$
                  kron2 ← K(sk, sj) * diag
                      + KL(sk, sj) * $\partial U^{I,L}/\partial U^{L}$
                      + KR(sk, sj) * $\partial U^{I,R}/\partial U^{R}$
                  val ← val + kron1 * $J^{-1'}$ * kron2
                end for
              end for
            end for
            spti ← fi * Nspts1D + si
            sptj ← fi * Nspts1D + sj
            LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti) + val
Algorithm 10.3 Calculate 2D cross-term $\left[K^{\xi}\left(\frac{\partial f^{\xi}_{ele}}{\partial q^{d_j}_{ele}}\right)\right]_{v_i,v_k} J^{-1'}_{(d_j,\eta)\,ele} \left[K^{\eta}\left(\frac{\partial q^{\eta}_{ele}}{\partial u_{ele}}\right)\right]_{v_k,v_j}$
for ele = 0 to Neles - 1 do
  for varj = 0 to Nvars - 1 do
    for vari = 0 to Nvars - 1 do
      for etai = 0 to Nspts1D - 1 do
        for etaj = 0 to Nspts1D - 1 do
          for xii = 0 to Nspts1D - 1 do
            for xij = 0 to Nspts1D - 1 do
              val ← 0.0
              sptij ← etai * Nspts1D + xij
              for dimj = 0 to Ndims - 1 do
                for vark = 0 to Nvars - 1 do
                  diag ← (vark == varj)
                  $\partial F/\partial Q$ ← $\partial F/\partial Q$(0, dimj, sptij, vari, vark, ele)
                  $\partial F^{I,L}/\partial Q^{L}$ ← $\partial F^{I,L}/\partial Q^{L}$(dimj, etai, vari, vark, ele)
                  $\partial F^{I,R}/\partial Q^{R}$ ← $\partial F^{I,R}/\partial Q^{R}$(dimj, etai, vari, vark, ele)
                  $\partial U^{I,B}/\partial U^{B}$ ← $\partial U^{I,B}/\partial U^{B}$(xij, vark, varj, ele)
                  $\partial U^{I,T}/\partial U^{T}$ ← $\partial U^{I,T}/\partial U^{T}$(xij, vark, varj, ele)
                  $J^{-1'}$ ← $J^{-1'}$(1, sptij, dimj, ele)
                  kron1 ← K(xii, xij) * $\partial F/\partial Q$
                      + KL(xii, xij) * $\partial F^{I,L}/\partial Q^{L}$
                      + KR(xii, xij) * $\partial F^{I,R}/\partial Q^{R}$
                  kron2 ← K(etai, etaj) * diag
                      + KL(etai, etaj) * $\partial U^{I,B}/\partial U^{B}$
                      + KR(etai, etaj) * $\partial U^{I,T}/\partial U^{T}$
                  val ← val + kron1 * $J^{-1'}$ * kron2
                end for
              end for
              spti ← etai * Nspts1D + xii
              sptj ← etaj * Nspts1D + xij
              LHS(ele, varj, sptj, vari, spti) ← LHS(ele, varj, sptj, vari, spti) + val
Chapter 11
Performance Analysis
In this chapter, the computational performance of the implicit method within ZEFR
is characterized within three sections. The first section tests the pseudo time stepping
method with MCGS on the bump and NACA 0012 test cases. The results show that,
in general, the total amount of pseudo time steps needed for convergence increases
with mesh refinement but does not necessarily increase with polynomial refinement.
The second section studies the GPU performance of the unsteady and steady-state
solvers on the half cylinder and NACA 0012 test cases, respectively. A comparison
between the explicit RK45 and implicit ESDIRK4 shows that the implicit method is
about 35 times slower than the explicit method for the special case of viscous flow over
a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80 GPUs. A performance
breakdown of the unsteady implicit method shows that 50% of the time is spent on
the batched matrix vector operation while the Jacobian computations only take up
about 15%. It is also shown that at least an order of magnitude speedup is achieved
on a single GPU over a single CPU core for the steady-state solver.
The last section studies the multi-GPU scalability of the steady-state solver for
the NACA 0012 test case. The results show that the implicit steady-state solver
achieves better strong and weak scalability when compared to the explicit method
and a weak scalability efficiency of about 98% across eight GPUs is obtained.
In what follows, all GPU tests invert the left-hand side matrices using the modified
direct solver described in section 10.1.3.
CHAPTER 11. PERFORMANCE ANALYSIS 156
11.1 Multicolored Gauss-Seidel
In this section, the convergence properties of the pseudo time stepping method with
MCGS are studied for the bump and NACA 0012 airfoil test cases from sections 9.1
and 9.2, respectively. As stated in the beginning of chapter 9, a solution is considered
converged if the residual drops by 10 orders of magnitude. All cases start from uniform
flow, and the CFL is exponentially increased at each pseudo time step until a CFL
of around 10000 is reached. Local time stepping is used with 100 block iterations
per pseudo time step. All cases are performed on a single NVIDIA Tesla C2070 GPU
and the element local Jacobians are not constructed using the Kronecker product
formulation described in section 10.3.
11.1.1 Inviscid flow over a bump
Table 11.1 shows the total number of pseudo time steps, the wall-clock time and the
entropy error for a series of meshes, P = 2, for the bump test case. The results show
that the total number of pseudo time steps needed for convergence increases with
mesh refinement.
Neles              (24 × 8)    (48 × 16)   (96 × 32)   (192 × 64)
Pseudo Time Steps  14          27          52          103
Wall Time (s)      0.722       2.01        10.43       75.35
Entropy Error      7.28e-05    1.04e-05    1.40e-06    1.79e-07
Table 11.1: Convergence results for different meshes, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU, P = 2
Table 11.2 shows the total number of pseudo time steps, the wall-clock time and
the entropy error for a series of polynomial orders, (48× 16) mesh, for the bump test
case. In this case, the total amount of pseudo time steps remained relatively the same
as the polynomial order was increased.
P                  2           3           4           5
Pseudo Time Steps  27          44          47          43
Wall Time (s)      2.01        4.92        13.47       20.02
Entropy Error      1.04e-05    1.30e-06    6.65e-07    3.63e-07
Table 11.2: Convergence results for different polynomial orders, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU, (48 × 16) quadrilateral mesh
Figure 11.1 summarizes the results by showing the entropy error vs. total number
of block iterations and wall-clock time for all cases. The results show that increasing
the polynomial order on the (48 × 16) mesh reduces the entropy error while
maintaining a smaller wall-clock time compared to the more refined meshes.
(a) Entropy Error $\|e_S\|_{L^2(\Omega)}$ vs. Block Iterations (series: P = 2; (48 × 16))
(b) Entropy Error $\|e_S\|_{L^2(\Omega)}$ vs. Wall Time (s) (series: P = 2; (48 × 16))
Figure 11.1: Entropy error, inviscid flow over a bump, implicit pseudo time stepping with two color MCGS, single GPU
11.1.2 Inviscid flow over the NACA 0012 airfoil
Table 11.3 shows the total number of pseudo time steps, the wall-clock time and the
lift coefficient for a series of O-meshes, P = 4, for the NACA 0012 test case. As in
the previous test case, the total number of pseudo time steps needed for convergence
increased as the mesh was refined. The same trend holds for the mixed meshes in
Table 11.4.
Neles              (8 × 8)      (16 × 16)    (32 × 32)    (64 × 64)    (128 × 128)
Pseudo Time Steps  8            10           20           42           76
Wall Time (s)      0.718        1.302        7.52         56.48        395.32
Lift Coefficient   1.8107e-01   1.8037e-01   1.7948e-01   1.7949e-01   1.7950e-01
Table 11.3: Convergence results for different meshes, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, single GPU, P = 4
Neles              764          1512         6048
Pseudo Time Steps  36           43           105
Wall Time (s)      12.28        24.81        220.14
Lift Coefficient   1.7953e-01   1.7952e-01   1.7949e-01
Table 11.4: Convergence results for different mixed meshes, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with four color MCGS, single GPU, P = 4
Table 11.5 shows the total number of pseudo time steps, the wall-clock time and
the lift coefficient for a series of polynomial orders, (32 × 32) mesh, for the NACA
0012 test case. As shown previously, the total number of pseudo time steps remained
relatively the same as the polynomial order was increased.
P                  2            3            4            5
Pseudo Time Steps  15           37           20           39
Wall Time (s)      1.357        5.23         7.50         24.54
Lift Coefficient   1.7853e-01   1.7963e-01   1.7948e-01   1.7950e-01
Table 11.5: Convergence results for different polynomial orders, inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, single GPU, (32 × 32) quadrilateral mesh
11.2 GPU performance
This section studies the GPU performance of the unsteady and steady-state solvers.
The GPU performance study on the unsteady solver compares the explicit RK45
and implicit ESDIRK4 methods and shows a relative performance breakdown of each
component of the implicit method. The GPU performance study on the steady-state
solver shows the speedup of a single GPU over a single CPU core.
11.2.1 Unsteady Solver
In this section, the GPU performance of the unsteady solver is studied for the half
cylinder test case in section 9.5. The unsteady solver for this problem uses the
ESDIRK4 scheme from section 7.1.2 on 12 NVIDIA Tesla K80 GPUs and consists of
five implicit stages per time step. As discussed in the beginning of chapter 9, each
stage uses a single pseudo time step with 200 block iterations, over which the stage
residual drops by 7 orders of magnitude. For this section, the element local
Jacobians are constructed using the Kronecker product formulation described in
section 10.3.
Comparison between explicit and implicit
The number of time steps and wall-clock time for the full simulation from section 9.5
for the explicit RK45 and implicit ESDIRK4 schemes are compared in Table 11.6.
The results show that the implicit method is about 35 times slower than the explicit
method for this particular test case.
Case Studies       Time Steps   Wall-clock Time (s)
ZEFR - RK54        212000       2696.9
ZEFR - ESDIRK4     5400         94919.1
Table 11.6: A comparison between explicit RK45 and implicit ESDIRK4 for viscous flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA Tesla K80 GPUs.
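The factor of 35 follows directly from the values in Table 11.6, and the same values also give the cost per time step. The short calculation below is only a worked check of that arithmetic; the variable names are illustrative.

```python
# Values taken from Table 11.6.
explicit_steps, explicit_time = 212000, 2696.9   # RK45
implicit_steps, implicit_time = 5400, 94919.1    # ESDIRK4

# Overall slowdown of the implicit method (about 35).
slowdown = implicit_time / explicit_time

# The implicit method takes about 39 times fewer time steps ...
step_ratio = explicit_steps / implicit_steps

# ... but each implicit step costs roughly 1400 times more wall-clock time.
cost_per_step_ratio = ((implicit_time / implicit_steps)
                       / (explicit_time / explicit_steps))
```

In other words, the larger implicit time step recovers a factor of about 39, which is not enough to offset the much higher cost of each implicit step for this test case.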
Performance Breakdown
The implicit method can be split into two sections. The first section called “Inverse
Jacobian” involves the construction and inversion of the left-hand side matrices used
to advance the solution. The second section is the block iterative method where the
inverse matrices are multiplied by the right-hand side vectors. The implicit method is
profiled over the course of several time steps using NVIDIA’s nvprof, a profiling tool
for GPUs [65]. Figure 11.2 compares the wall-clock time between the two sections.
The results show that the majority of the time, 78%, is being spent within the block
iterative method.
Inverse Jacobian: 22%
Block Iterations: 78%
Figure 11.2: A wall-clock time comparison between the inverse Jacobian computation and the block iterative method for one ESDIRK4 time step of viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.
The “Inverse Jacobian” profile is broken down even further in Figure 11.3 where
each section of the element local Jacobian computation and inverse is compared.
“Functions”, “Term 1” and “Scaling” take up a very small portion of overall time.
These refer to the construction of the flux and solution Jacobians from section 10.3.1,
the first term in the construction of the element local Jacobian from section 10.3.2
and the geometric and time step scaling operations for the left-hand side matrices,
respectively. The results show that the majority of the time is being spent
constructing “Term 2” (60%) and inverting all the left-hand side matrices (34%). The
second term in the element local Jacobian computation requires multiple products
between matrices so it is not surprising that it is much more expensive than the
first term.
Functions: 1%
Term 1: 2%
Term 2: 60%
Scaling: 3%
Inverse: 34%
Figure 11.3: A wall-clock time comparison between each section of “Inverse Jacobian” for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.
The profile for the block iterative method is also broken down further in Figure
11.4 where the computation of the residual called “Residual”, the right-hand side
vectors called “RHS” and the batched matrix-vector operations called “DgemvBatched”
are compared. Based on these results, the operation that takes the most time within
the block iterative method is the batched matrix-vector operation (64%).
Overall, across a single time step, the batched matrix-vector operation takes up
a total of about 50% while the element local Jacobian computation takes up about
15%.
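The overall percentages follow from combining the fractions in Figures 11.2 to 11.4. The check below assumes that the element local Jacobian computation covers every part of “Inverse Jacobian” except the inversion itself, which is an interpretation rather than something stated explicitly.

```python
# Figure 11.2: split of one ESDIRK4 time step.
inverse_jacobian = 0.22
block_iterations = 0.78

# Figure 11.3: parts of "Inverse Jacobian" other than the inversion
# (Functions, Term 1, Term 2, Scaling).
jacobian_construction = 0.01 + 0.02 + 0.60 + 0.03

# Figure 11.4: share of the batched matrix-vector product in the block iterations.
dgemv_batched = 0.64

dgemv_total = block_iterations * dgemv_batched             # about 0.50
jacobian_total = inverse_jacobian * jacobian_construction  # about 0.15
```

Both products round to the values quoted above, about 50% for the batched matrix-vector operation and about 15% for the Jacobian computation.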
11.2.2 Steady-state Solver
In this section, the CPU and GPU performance of the steady-state solver are
compared for the NACA 0012 airfoil test case in section 9.2. Since convergence is not
important in this section, the total number of pseudo time steps is fixed to 10 and
the pseudo time step is fixed to $\Delta\tau = 1 \times 10^{-9}$. All cases are performed on a
single NVIDIA Tesla C2070 GPU and a single Intel Xeon X5650 CPU core. In this
section, the element local Jacobians are not constructed using the Kronecker product
formulation described in section 10.3.
Residual: 23%
RHS: 13%
DgemvBatched: 64%
Figure 11.4: A wall-clock time comparison between each section of the block iterative method for viscous flow over a half cylinder, Re = 1000, P = 3, on 12 NVIDIA Tesla K80 GPUs.
Table 11.7 shows the overall speedup of the GPU code compared to the CPU code
for the steady-state solver with pseudo time stepping and 2 color MCGS. The results
show that at least an order of magnitude speedup is achieved on the GPU over the
CPU. The results also show that the speedup of the GPU code over the CPU code
increases with mesh size but stays roughly the same with polynomial order. Note that
the steady-state solver ran out of device memory for the (256 × 256) mesh with
polynomial orders P = 4 and P = 5. As a reference, Tesla C2070 GPUs contain 6 GB
of device memory.
Neles    (32 × 32)   (64 × 64)   (128 × 128)   (256 × 256)
P = 2    12.5        18.0        20.5          21.7
P = 3    18.6        22.3        24.2          25.4
P = 4    15.3        17.2        18.0          -
P = 5    17.9        20.3        21.4          -
Table 11.7: Speedup of a single GPU over a single CPU core for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS
11.3 Multi-GPU Scalability
In this section, the multi-GPU scalability of the steady-state solver is studied for the
NACA 0012 airfoil test case in section 9.2. Since convergence is not important in this
section, the total number of pseudo time steps is fixed to 50 and the pseudo time step
is fixed to $\Delta\tau = 1 \times 10^{-9}$. All cases are performed on two nodes of an NVIDIA
Tesla C2070 GPU cluster with six GPUs and two CPUs installed on each node. In this
section, the element local Jacobians are not constructed using the Kronecker product
formulation described in section 10.3.
11.3.1 Strong Scalability
An MPI strong scalability study is performed on up to 12 GPUs to study the
performance of the explicit and implicit methods as the workload per device becomes
smaller and the communication overhead becomes more significant. Figure 11.5 shows
the speedup relative to one GPU for both the explicit and implicit methods. The
results show that strong scalability improves with increased mesh refinement for both
methods. In this case, the implicit steady-state solver achieves better strong
scalability for smaller problem sizes because the implicit method performs more work
relative to communication.
11.3.2 Weak Scalability
An MPI weak scalability study is performed on up to 8 GPUs to study the
performance of the explicit and implicit methods as the workload per device remains
constant and additional resources are used to solve a larger problem. Tables 11.8 and
11.9 show the wall-clock time and efficiency for the explicit and implicit methods.
The memory required for the left-hand side matrices is also reported. The results
show that the implicit steady-state solver achieves better weak scalability compared
to the explicit method. The implicit method is also able to maintain an efficiency
greater than 98% on a problem with over 21 GB of data. This suggests that the
implicit method can be effectively distributed over multiple GPUs to solve larger
problems without a significant degradation in performance.
(a) Explicit RK4, speedup vs. number of GPUs (meshes: 32x32, 64x64, 128x128, 256x256)
(b) Implicit Steady w/ MCGS, speedup vs. number of GPUs (meshes: 32x32, 64x64, 128x128)
Figure 11.5: Speedup relative to one GPU for inviscid flow over the NACA 0012 airfoil, P = 5
NGPUs            1             2             4             8
Neles            (128 × 128)   (256 × 128)   (256 × 256)   (512 × 256)
Wall Time (s)    191.6         205.28        213.04        212.11
Efficiency (%)   100           93.3          89.9          90.3
Table 11.8: Weak scalability study for inviscid flow over the NACA 0012 airfoil, explicit RK4, P = 5
NGPUs            1             2             4             8
Neles            (128 × 128)   (256 × 128)   (256 × 256)   (512 × 256)
Wall Time (s)    396.81        398.19        401.35        403.35
Efficiency (%)   100           99.6          98.9          98.4
Memory (GB)      2.71          5.44          10.87         21.74
Table 11.9: Weak scalability study for inviscid flow over the NACA 0012 airfoil, implicit pseudo time stepping with two color MCGS, P = 5
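The efficiencies in Tables 11.8 and 11.9 are consistent with the usual definition of weak scaling efficiency, the single GPU wall-clock time divided by the N GPU wall-clock time at fixed work per GPU. The sketch below assumes that definition, which is not spelled out in the text, and uses a hypothetical helper name.

```python
def weak_efficiency(wall_times):
    """Weak scaling efficiency in percent, relative to the first entry."""
    reference = wall_times[0]
    return [100.0 * reference / t for t in wall_times]


# Wall-clock times from Tables 11.8 and 11.9 for 1, 2, 4 and 8 GPUs.
explicit_eff = weak_efficiency([191.6, 205.28, 213.04, 212.11])
implicit_eff = weak_efficiency([396.81, 398.19, 401.35, 403.35])
```

The computed values agree with the tabulated efficiencies to within about 0.1 percentage point.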
Chapter 12
Conclusions
This dissertation focused on addressing two major limitations preventing high-order
flux reconstruction methods from being utilized for unsteady aerospace simulations.
First, explicit methods often become unreliable because they are pushed close to their
numerical stability limit. Second, implicit methods can be prohibitively expensive for
high-order methods, specifically on modern hardware.
In Part I of this dissertation, it was shown that the CFL condition for nodal DG
via FR for the advection-diffusion equation is stricter than that for pure advection
and pure diffusion individually. This demonstrated that the coupling of advective
and diffusive terms within any equation has a dramatic effect on numerical stability.
Given these findings, a maximum stable time step estimate for the linear
advection-diffusion and Navier-Stokes equations on unstructured, tensor product
elements was deduced. The estimates were shown to be accurate within 50% error on
all test cases and conservative on tests with Cartesian grids.
Theoretical and numerical verification was also presented that showed that schemes
with centered values produce less error for well-resolved solutions while schemes with
one-sided values produce less error for solutions that are under-resolved. It was also
shown that the CFL condition is strongly influenced by the choice of interface fluxes
and, in general, the CFL limit for a scheme using centered values is much higher than
that for a scheme using one-sided values. This motivates the use of centered schemes
for explicit and implicit methods.
CHAPTER 12. CONCLUSIONS 167
In Part II of this dissertation, a multi-GPU, high-order, implicit time stepping
method was developed, implemented, tested and shown to be feasible on GPU
architectures. It was shown that the steady-state solver was able to obtain the correct
rates of convergence of P + 1 for solution quantities and 2P for integral functional
quantities. High-order polynomials were utilized to obtain the same error with fewer
degrees of freedom at a faster wall-clock time. The total number of pseudo time
steps needed for convergence increased with mesh refinement but did not necessarily
increase with polynomial refinement and at least an order of magnitude speedup
was achieved on a single GPU over a single CPU core. The steady-state solver also
achieved better strong and weak scalability when compared to the explicit method
and a weak scalability efficiency of about 98% across eight GPUs was obtained. All of
these attributes are promising for the development of future high-order steady-state
solvers.
A Kronecker product formulation was used to show that the time complexity
of computing the analytical element local Jacobian for DFR can be reduced from
$O(P^{3d-1})$ to $O(P^{d+1})$ for the advection or Euler equations and from
$O(P^{3d})$ to $O(P^{d+2})$ for the advection-diffusion or Navier-Stokes equations.
It was shown that the element
local Jacobian can be split into two terms which are constructed from two or three
distinct sparsity patterns based on Kronecker products as shown in Figures 8.1–
8.7. The analysis of the sparsity pattern of the element local Jacobians led to the
construction of efficient algorithms for GPU architectures.
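The kind of saving that a Kronecker product structure makes possible can be illustrated with the standard identity $(A \otimes B)\,\mathrm{vec}(X) = \mathrm{vec}(A X B^T)$ for a row-major flattening of $X$. The sketch below is not the DFR Jacobian algorithm of this dissertation; it is a minimal pure-Python illustration, with hypothetical helper names (`kron`, `matvec`, `kron_apply`), showing that a Kronecker-structured operator can be applied factor by factor without ever forming the full matrix. For an m-by-m factor and an n-by-n factor, the dense product costs on the order of (mn)^2 operations while the factored form costs on the order of mn(m + n).

```python
def kron(a, b):
    """Dense Kronecker product of two square matrices (nested lists)."""
    m, n = len(a), len(b)
    return [[a[i][k] * b[j][l] for k in range(m) for l in range(n)]
            for i in range(m) for j in range(n)]


def matvec(mat, x):
    """Dense matrix-vector product."""
    return [sum(row[c] * x[c] for c in range(len(x))) for row in mat]


def matmul(a, b):
    """Dense matrix-matrix product for nested lists."""
    inner, cols = len(b), len(b[0])
    return [[sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
            for i in range(len(a))]


def transpose(a):
    return [list(col) for col in zip(*a)]


def kron_apply(a, b, x):
    """Apply (a kron b) to x without forming the Kronecker product.

    x is the row-major flattening of an m-by-n array X; the result is the
    row-major flattening of a X b^T.
    """
    m, n = len(a), len(b)
    X = [x[i * n:(i + 1) * n] for i in range(m)]
    Y = matmul(matmul(a, X), transpose(b))
    return [value for row in Y for value in row]
```

With integer inputs the two routes agree exactly, for example `kron_apply(a, b, x) == matvec(kron(a, b), x)` for any compatible `a`, `b` and `x`.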
A comparison between the explicit RK45 and implicit ESDIRK4 showed that the
unsteady implicit method was about 35 times slower than the explicit method for the
special case of viscous flow over a half cylinder, Re = 1000, P = 3 on 12 NVIDIA
Tesla K80 GPUs. This seems to indicate that explicit methods are more efficient
than implicit methods for these particular types of problems. Based on the results,
one could estimate that an implicit method would begin to become competitive when
the maximum time step of the explicit method is around two orders of magnitude
lower. This seems reasonable as this problem had an explicit time step restriction
on the order of $10^{-4}$, which is not particularly large. A performance breakdown of
the unsteady implicit method showed that 50% of the time was spent on the batched
matrix vector operation while the Jacobian computations accounted for about 15%. This is
an important finding moving forward as future development focuses on creating more
efficient methods which eliminate these types of bottlenecks.
Lastly, the memory size of the left-hand side matrices increased rapidly with
polynomial order but the memory issues on the GPU were mitigated by utilizing
the robust scaling properties of the method. Despite the memory deficiencies and
lack of competitiveness with GPU accelerated explicit methods, the results show
promise for solving problems of engineering importance on GPU clusters. As modern
GPU clusters become more powerful, implicit methods will become more feasible
for high Reynolds number flows where explicit methods are much more limited. An
investigation into methods of reducing memory requirements and producing more
efficient methods is warranted and is a topic of future research.
Appendix A
Boundary Conditions
For the Navier-Stokes equations, the common interface fluxes at boundary faces are
computed as
\[ F^b(U, Q) = F^n\left(U^b(U),\, Q^b(U, Q)\right), \tag{A.1} \]
where $F^n(U)$ is the flux normal to the face, $U^b(U)$ is the solution prescribed at the
boundary face, $Q^b(U, Q)$ is the solution gradient prescribed at the boundary face, $U$
is the solution extrapolated to the face and $Q$ is the solution gradient extrapolated
to the face.
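In two dimensions, the extrapolated solution and solution gradient are defined as
\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ e \end{bmatrix}, \qquad
Q_d = \begin{bmatrix}
\frac{\partial \rho}{\partial d} \\
\frac{\partial (\rho u)}{\partial d} \\
\frac{\partial (\rho v)}{\partial d} \\
\frac{\partial e}{\partial d}
\end{bmatrix}, \quad d \in \{x, y\}, \tag{A.2}
\]
and the components of the unit normal vector are defined as $n = [n_x, n_y]^T$. In three
dimensions, the extrapolated solution and solution gradient are
\[
U = \begin{bmatrix} \rho \\ \rho u \\ \rho v \\ \rho w \\ e \end{bmatrix}, \qquad
Q_d = \begin{bmatrix}
\frac{\partial \rho}{\partial d} \\
\frac{\partial (\rho u)}{\partial d} \\
\frac{\partial (\rho v)}{\partial d} \\
\frac{\partial (\rho w)}{\partial d} \\
\frac{\partial e}{\partial d}
\end{bmatrix}, \quad d \in \{x, y, z\}, \tag{A.3}
\]
and the components of the unit normal vector are $n = [n_x, n_y, n_z]^T$. The solution, $U^b$,
and solution gradient, $Q^b$, at the boundary are determined directly from the following
boundary conditions.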
A.1 Solid Slip-Wall and Symmetry
On a solid surface where the flow is allowed to slip, the flow must remain tangent to
the surface [61]. In 2D, the velocities on the boundary can be written as
\[ u_b = u - V^n n_x, \qquad v_b = v - V^n n_y, \]
where $V^n = u\,n_x + v\,n_y$ is the velocity normal to the face.
An extrapolated pressure is used to compute the total energy on the wall so that the
solution on the boundary face in 2D is computed as
\[
U^b = \begin{bmatrix}
\rho \\ \rho u_b \\ \rho v_b \\
\frac{p}{\gamma - 1} + \frac{1}{2}\rho\left(u_b^2 + v_b^2\right)
\end{bmatrix}. \tag{A.4}
\]
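For 3D, the process is very similar,
\[
U^b = \begin{bmatrix}
\rho \\ \rho u_b \\ \rho v_b \\ \rho w_b \\
\frac{p}{\gamma - 1} + \frac{1}{2}\rho\left(u_b^2 + v_b^2 + w_b^2\right)
\end{bmatrix}. \tag{A.5}
\]
This boundary condition can also be applied as a symmetry boundary condition. It is
only used for the Euler equations, so a solution gradient does not need to be defined.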
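The 2D slip-wall state of Eq. (A.4) can be sketched in a few lines. This is an illustrative, pure-Python sketch rather than the solver's implementation: the function name `slip_wall_state_2d`, the default `gamma=1.4` and the recovery of the pressure from the extrapolated conserved state through the ideal-gas relation are assumptions made for the example.

```python
def slip_wall_state_2d(U, nx, ny, gamma=1.4):
    """Boundary state U_b for a slip wall, following Eq. (A.4).

    U is the extrapolated conserved state [rho, rho*u, rho*v, e] and
    (nx, ny) is the unit normal of the boundary face.
    """
    rho, rhou, rhov, e = U
    u, v = rhou / rho, rhov / rho

    # Extrapolated pressure (assumed ideal-gas relation for the total energy).
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))

    # Remove the normal velocity component so the flow stays tangent to the wall.
    vn = u * nx + v * ny
    ub = u - vn * nx
    vb = v - vn * ny

    # Total energy on the wall from the extrapolated pressure and wall velocity.
    e_b = p / (gamma - 1.0) + 0.5 * rho * (ub * ub + vb * vb)
    return [rho, rho * ub, rho * vb, e_b]
```

For a unit normal, the returned momentum has no component along the normal, which is a quick sanity check when wiring such a routine into a flux evaluation.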
A.2 No-Slip Adiabatic Wall
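On a solid surface where the flow is not allowed to slip [61], the velocity is set to zero.
The prescribed solution on the boundary in 2D is found by extrapolating the density
and pressure so that
\[
U^b = \begin{bmatrix} \rho \\ 0 \\ 0 \\ \frac{p}{\gamma - 1} \end{bmatrix}, \tag{A.6}
\]
and in 3D,
\[
U^b = \begin{bmatrix} \rho \\ 0 \\ 0 \\ 0 \\ \frac{p}{\gamma - 1} \end{bmatrix}. \tag{A.7}
\]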
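The solution gradient at the boundary is computed by removing the temperature
derivative normal to the wall from the energy gradient. The temperature gradient is
computed as
\[
\frac{\partial T}{\partial d} = \frac{\partial e}{\partial d}
- \frac{\partial \rho}{\partial d}\frac{e}{\rho}
- \rho\left(u\frac{\partial u}{\partial d} + v\frac{\partial v}{\partial d}\right),
\quad d \in \{x, y\},
\]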
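and the temperature derivative normal to the wall is computed as
\[
\frac{\partial T}{\partial n} = \sum_{d \in \{x, y\}} \frac{\partial T}{\partial d}\, n_d.
\]
The energy gradient at the boundary can then be computed as
\[
\frac{\partial e_b}{\partial d} = \frac{\partial e}{\partial d} - \frac{\partial T}{\partial n}\, n_d,
\quad d \in \{x, y\}.
\]
The remaining gradients at the boundary are extrapolated so that the solution gradient
in 2D becomes
\[
Q^b_d = \begin{bmatrix}
\frac{\partial \rho}{\partial d} \\
\frac{\partial (\rho u)}{\partial d} \\
\frac{\partial (\rho v)}{\partial d} \\
\frac{\partial e_b}{\partial d}
\end{bmatrix}, \quad d \in \{x, y\}. \tag{A.8}
\]
In 3D, the same procedure is used and the solution gradient becomes
\[
Q^b_d = \begin{bmatrix}
\frac{\partial \rho}{\partial d} \\
\frac{\partial (\rho u)}{\partial d} \\
\frac{\partial (\rho v)}{\partial d} \\
\frac{\partial (\rho w)}{\partial d} \\
\frac{\partial e_b}{\partial d}
\end{bmatrix}, \quad d \in \{x, y, z\}. \tag{A.9}
\]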
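The 2D no-slip adiabatic wall procedure of Eqs. (A.6) and (A.8) can be sketched as follows. This is a hedged illustration, not the solver's code: the function names, the default `gamma=1.4`, the ideal-gas pressure relation and the recovery of the velocity gradients from the conserved gradients with the quotient rule are all assumptions made for the example. The temperature gradient term follows the expression in this appendix as written.

```python
def temperature_gradient_term(U, Q):
    """Temperature gradient expression of this appendix for one direction d.

    U = [rho, rho*u, rho*v, e] is the extrapolated state and
    Q = [d(rho), d(rho*u), d(rho*v), d(e)] its gradient in direction d.
    """
    rho, rhou, rhov, e = U
    u, v = rhou / rho, rhov / rho
    drho, drhou, drhov, de = Q
    # Velocity gradients from conserved gradients (quotient rule, assumed here).
    du = (drhou - u * drho) / rho
    dv = (drhov - v * drho) / rho
    return de - drho * e / rho - rho * (u * du + v * dv)


def no_slip_adiabatic_wall_2d(U, Qx, Qy, nx, ny, gamma=1.4):
    """Return (U_b, Qx_b, Qy_b) following Eqs. (A.6) and (A.8)."""
    rho, rhou, rhov, e = U
    u, v = rhou / rho, rhov / rho
    p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))

    # Eq. (A.6): extrapolated density and pressure, zero velocity.
    U_b = [rho, 0.0, 0.0, p / (gamma - 1.0)]

    # Temperature derivative normal to the wall.
    dT_dn = (temperature_gradient_term(U, Qx) * nx
             + temperature_gradient_term(U, Qy) * ny)

    # Eq. (A.8): only the energy gradient is modified; the rest is extrapolated.
    Qx_b = [Qx[0], Qx[1], Qx[2], Qx[3] - dT_dn * nx]
    Qy_b = [Qy[0], Qy[1], Qy[2], Qy[3] - dT_dn * ny]
    return U_b, Qx_b, Qy_b
```

A useful check is that, for a unit normal, re-evaluating the temperature gradient term with the modified gradients gives a zero normal derivative, which is the adiabatic condition the construction is meant to enforce.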
A.3 Characteristic Riemann Invariant Far Field
On a far field boundary, Riemann invariants for a one dimensional flow normal to the
boundary are used to determine the solution at the boundary face [46]. First, the
velocity normal to the face and the speed of sound are computed using the extrapolated
solution and the free stream values in 2D,
\[ V^n = u\,n_x + v\,n_y, \qquad V^n_\infty = u_\infty n_x + v_\infty n_y, \]
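\[ c = \sqrt{\frac{\gamma p}{\rho}}, \qquad c_\infty = \sqrt{\frac{\gamma p_\infty}{\rho_\infty}}, \]
where $\infty$ denotes freestream values that are set for a specific problem. The Riemann
invariants can then be written as
\[ R = V^n + \frac{2c}{\gamma - 1}, \qquad R_\infty = V^n_\infty - \frac{2c_\infty}{\gamma - 1}. \]
The normal velocity and speed of sound at the boundary are written as
\[ V^n_b = \frac{1}{2}\left(R + R_\infty\right), \qquad c_b = \frac{\gamma - 1}{4}\left(R - R_\infty\right). \]
If $V^n < 0$, the flow is entering the domain and the velocity and entropy at the
boundary are computed as
\[
u_b = u_\infty + \left(V^n_b - V^n_\infty\right) n_x, \qquad
v_b = v_\infty + \left(V^n_b - V^n_\infty\right) n_y, \qquad
s_b = \frac{p_\infty}{\rho_\infty^{\gamma}};
\]
otherwise, the flow is exiting the domain and the velocity and entropy at the boundary
are computed as
\[
u_b = u + \left(V^n_b - V^n\right) n_x, \qquad
v_b = v + \left(V^n_b - V^n\right) n_y, \qquad
s_b = \frac{p}{\rho^{\gamma}}.
\]
The density and the pressure at the boundary can be computed from the entropy and
speed of sound so that
\[
\rho_b = \left(\frac{1}{\gamma}\frac{c_b^2}{s_b}\right)^{\frac{1}{\gamma - 1}}, \qquad
p_b = \frac{1}{\gamma}\rho_b c_b^2.
\]
The 2D solution at the boundary can then be computed as
\[
U^b = \begin{bmatrix}
\rho_b \\ \rho_b u_b \\ \rho_b v_b \\
\frac{p_b}{\gamma - 1} + \frac{1}{2}\rho_b\left(u_b^2 + v_b^2\right)
\end{bmatrix}. \tag{A.10}
\]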
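Following the same procedure, the 3D solution at the boundary is computed as
\[
U^b = \begin{bmatrix}
\rho_b \\ \rho_b u_b \\ \rho_b v_b \\ \rho_b w_b \\
\frac{p_b}{\gamma - 1} + \frac{1}{2}\rho_b\left(u_b^2 + v_b^2 + w_b^2\right)
\end{bmatrix}. \tag{A.11}
\]
The solution gradient at the boundary is the same as the extrapolated gradient,
$Q^b = Q$.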
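The 2D far-field procedure leading to Eq. (A.10) can be collected into a single routine. The sketch below is illustrative only: the function name `riemann_far_field_2d`, the default `gamma=1.4` and the ideal-gas conversion between conserved and primitive variables are assumptions made for the example, and the freestream is passed in as a conserved state for convenience.

```python
import math


def riemann_far_field_2d(U, U_inf, nx, ny, gamma=1.4):
    """Boundary state U_b from Riemann invariants, following Eq. (A.10).

    U is the extrapolated conserved state, U_inf the freestream conserved
    state, and (nx, ny) the unit normal of the boundary face.
    """
    def primitives(W):
        rho, rhou, rhov, e = W
        u, v = rhou / rho, rhov / rho
        p = (gamma - 1.0) * (e - 0.5 * rho * (u * u + v * v))
        return rho, u, v, p

    rho, u, v, p = primitives(U)
    rho_i, u_i, v_i, p_i = primitives(U_inf)

    # Normal velocities and speeds of sound.
    vn = u * nx + v * ny
    vn_i = u_i * nx + v_i * ny
    c = math.sqrt(gamma * p / rho)
    c_i = math.sqrt(gamma * p_i / rho_i)

    # Riemann invariants from the interior and from the freestream.
    R = vn + 2.0 * c / (gamma - 1.0)
    R_i = vn_i - 2.0 * c_i / (gamma - 1.0)

    vn_b = 0.5 * (R + R_i)
    c_b = 0.25 * (gamma - 1.0) * (R - R_i)

    if vn < 0.0:
        # Inflow: tangential velocity and entropy come from the freestream.
        ub = u_i + (vn_b - vn_i) * nx
        vb = v_i + (vn_b - vn_i) * ny
        s_b = p_i / rho_i ** gamma
    else:
        # Outflow: tangential velocity and entropy come from the interior.
        ub = u + (vn_b - vn) * nx
        vb = v + (vn_b - vn) * ny
        s_b = p / rho ** gamma

    rho_b = (c_b * c_b / (gamma * s_b)) ** (1.0 / (gamma - 1.0))
    p_b = rho_b * c_b * c_b / gamma
    e_b = p_b / (gamma - 1.0) + 0.5 * rho_b * (ub * ub + vb * vb)
    return [rho_b, rho_b * ub, rho_b * vb, e_b]
```

When the extrapolated state equals the freestream, the routine returns the freestream itself for both inflow and outflow faces, which makes a convenient consistency check.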
Bibliography
[1] 1st International Workshop on High-Order CFD Methods. Problem C1.1.
Inviscid flow through a channel with a smooth bump. http://dept.ku.edu/
~cfdku/hiocfd/case_c1.1.html. Accessed: 2016-05-07.
[2] 4th International Workshop on High-Order Methods. BL1 - Laminar
Joukowski airfoil at Re=1000. https://how4.cenaero.be/content/
bl1-laminar-joukowski-airfoil-re1000. Accessed: 2017-10-11.
[3] M Ainsworth, P Monk, and W Muniz. Dispersive and dissipative properties of
discontinuous Galerkin finite element methods for the second-order wave equa-
tion. Journal of Scientific Computing, 27(1-3):5–40, 2006.
[4] Mark Ainsworth. Discrete dispersion relation for hp-version finite element ap-
proximation at high wave number. SIAM Journal on Numerical Analysis,
42(2):553–575, 2004.
[5] Mark Ainsworth. Dispersive and dissipative behaviour of high order discon-
tinuous Galerkin finite element methods. Journal of Computational Physics,
198(1):106–130, 2004.
[6] Roger Alexander. Diagonally implicit Runge–Kutta methods for stiff ODEs.
SIAM Journal on Numerical Analysis, 14(6):1006–1021, 1977.
[7] Kenneth Appel and Wolfgang Haken. Every planar map is four colorable. Bulletin
of the American mathematical Society, 82(5):711–712, 1976.
[8] Kartikey Asthana. Analysis and design of optimal discontinuous finite element
schemes. PhD thesis, Stanford University, 2016.
[9] Kartikey Asthana and Antony Jameson. High-order flux reconstruction schemes
with minimal dispersion and dissipation. Journal of Scientific Computing, pages
1–32, 2014.
[10] Kartikey Asthana, Manuel R Lopez-Morales, and Antony Jameson. Non-linear
stabilization of high-order flux reconstruction schemes via Fourier-spectral filter-
ing. Journal of Computational Physics, 303:269–294, 2015.
[11] Kartikey Asthana, Jerry Watkins, and Antony Jameson. On the rate of con-
vergence of flux reconstruction for steady-state problems. SIAM Journal on
Numerical Analysis, 54(5):2910–2937, 2016.
[12] Kartikey Asthana, Jerry Watkins, and Antony Jameson. On consistency and
rate of convergence of flux reconstruction for time-dependent problems. Journal
of Computational Physics, 334:367–391, 2017.
[13] F Bassi, A Colombo, C De Bartolo, N Franchina, A Ghidoni, and A Nigro. Inves-
tigation of high-order temporal schemes for the discontinuous Galerkin solution
of the Navier–Stokes equations. In Joint 11th World Congress on Computational
Mechanics, WCCM 2014, the 5th European Conference on Computational Me-
chanics, ECCM 2014 and the 6th European Conference on Computational Fluid
Dynamics, ECFD 2014, pages 5651–5662. International Center for Numerical
Methods in Engineering, 2014.
[14] Francesco Bassi and Stefano Rebay. A high-order accurate discontinuous finite
element method for the numerical solution of the compressible Navier–Stokes
equations. Journal of computational physics, 131(2):267–279, 1997.
[15] Patrice Castonguay. High-order energy stable flux reconstruction schemes for
fluid flow simulations on unstructured grids. PhD thesis, Stanford University,
2012.
[16] Patrice Castonguay, Peter E Vincent, and Antony Jameson. A new class of high-
order energy stable flux reconstruction schemes for triangular elements. Journal
of Scientific Computing, 51(1):224–256, 2012.
[17] Patrice Castonguay, David M Williams, Peter E Vincent, Manuel Lopez, and
Antony Jameson. On the development of a high-order, multi-GPU enabled,
compressible viscous flow solver for mixed unstructured grids. AIAA paper,
3229:2011, 2011.
[18] Christopher A Kennedy and Mark H Carpenter. Additive Runge-Kutta schemes
for convection-diffusion-reaction equations. 2001.
[19] Bernardo Cockburn and Bo Dong. An analysis of the minimal dissipation local
discontinuous Galerkin method for convection–diffusion problems. Journal of
Scientific Computing, 32(2):233–262, 2007.
[20] Bernardo Cockburn, George E Karniadakis, and Chi-Wang Shu. The develop-
ment of discontinuous Galerkin methods. Springer, 2000.
[21] Bernardo Cockburn and Chi-Wang Shu. TVB Runge-Kutta local projection
discontinuous Galerkin finite element method for conservation laws. II. General
framework. Mathematics of computation, 52(186):411–435, 1989.
[22] Bernardo Cockburn and Chi-Wang Shu. The local discontinuous Galerkin
method for time-dependent convection-diffusion systems. SIAM Journal on Nu-
merical Analysis, 35(6):2440–2463, 1998.
[23] Bernardo Cockburn and Chi-Wang Shu. Runge–Kutta discontinuous Galerkin
methods for convection-dominated problems. Journal of scientific computing,
16(3):173–261, 2001.
[24] Christopher Cox, Chunlei Liang, and Michael W Plesniak. A high-order method
for solving unsteady incompressible Navier-Stokes equations with implicit time
stepping on unstructured grids. In 53rd AIAA Aerospace Sciences Meeting, page
0830, 2015.
[25] Christopher Cox, Chunlei Liang, and Michael W Plesniak. A high-order solver
for unsteady incompressible Navier–Stokes equations using the flux reconstruc-
tion method on unstructured grids with implicit dual time stepping. Journal of
Computational Physics, 314:414–435, 2016.
[26] Laslo T Diosady and Scott M Murman. Tensor-product preconditioners for
higher-order space–time discontinuous Galerkin methods. Journal of Computa-
tional Physics, 330:296–318, 2017.
[27] Richard S Falk. Analysis of finite element methods for linear hyperbolic problems.
In Discontinuous Galerkin Methods, pages 103–112. Springer, 2000.
[28] Krzysztof J Fidkowski, Todd A Oliver, James Lu, and David L Darmo-
fal. p-multigrid solution of high-order discontinuous Galerkin discretizations of
the compressible Navier–Stokes equations. Journal of Computational Physics,
207(1):92–113, 2005.
[29] Marshall C. Galbraith and Carl Ollivier-Gooch. BI2 - smooth bump, BL1 -
laminar airfoil & BR1 - turbulent airfoil. https://how4.cenaero.be/system/
files/filedepot/12/BI2_BL1_BR1_Summary_Galbraith.pdf. Accessed: 2017-
10-11.
[30] Haiyang Gao and ZJ Wang. A conservative correction procedure via reconstruc-
tion formulation with the chain-rule divergence evaluation. Journal of Compu-
tational Physics, 232(1):7–13, 2013.
[31] Michael B Giles and Endre Suli. Adjoint methods for PDEs: a posteriori error
analysis and postprocessing by duality. Acta numerica, 11:145–236, 2002.
[32] Gene H Golub and Charles F Van Loan. Matrix computations, volume 3. JHU
Press, 2012.
[33] Gael Guennebaud, Benoit Jacob, et al. Eigen: a C++ linear algebra library.
http://eigen.tuxfamily.org. Accessed: 2017-05-7.
[34] Ernst Hairer and Gerhard Wanner. Solving ordinary differential equations. II,
volume 14 of Springer series in computational mathematics, 1996.
[35] Jan S Hesthaven and Tim Warburton. Nodal discontinuous Galerkin methods:
algorithms, analysis, and applications, volume 54. Springer Science & Business
Media, 2007.
[36] Malte Hoffmann, Claus-Dieter Munz, and ZJ Wang. Efficient implementation
of the CPR formulation for the Navier–Stokes equations on GPUs. In Seventh
International Conference on Computational Fluid Dynamics (ICCFD7), 2012.
[37] Fang Q Hu and Harold L Atkins. Eigensolution analysis of the discontinuous
Galerkin method with nonuniform grids: I. one space dimension. Journal of
Computational Physics, 182(2):516–545, 2002.
[38] Fang Q Hu and Harold L Atkins. Two-dimensional wave analysis of the discon-
tinuous Galerkin method with non-uniform grids and boundary conditions. In
Proceedings of the 8th AIAA/CEAS Aeroacoustics Conference, 2002.
[39] Fang Q Hu, MY Hussaini, and Patrick Rasetarinera. An analysis of the discontin-
uous Galerkin method for wave propagation problems. Journal of Computational
Physics, 151(2):921–946, 1999.
[40] Thomas JR Hughes. The finite element method: linear static and dynamic finite
element analysis. Courier Corporation, 2012.
[41] HT Huynh. A flux reconstruction approach to high-order schemes including
discontinuous Galerkin methods. AIAA paper, 4079:2007, 2007.
[42] Hung T Huynh. A reconstruction approach to high-order schemes including
discontinuous Galerkin for diffusion. AIAA paper, 403:2009, 2009.
[43] Wolfram Research, Inc. Mathematica, Version 11.2. Champaign, IL, 2017.
[44] Antony Jameson. Time dependent calculations using multigrid, with applications
to unsteady flows past airfoils and wings. AIAA paper, 1596:1991, 1991.
[45] Antony Jameson. Application of dual time stepping to fully implicit Runge
Kutta schemes for unsteady flow calculations. In 22nd AIAA Computational
Fluid Dynamics Conference, page 2753, 2015.
[46] Antony Jameson and Timothy Baker. Solution of the Euler equations for complex
configurations. In 6th Computational Fluid Dynamics Conference Danvers, page
1929, 1983.
[47] Antony Jameson, Peter E Vincent, and Patrice Castonguay. On the non-
linear stability of flux reconstruction schemes. Journal of Scientific Computing,
50(2):434–445, 2012.
[48] Claes Johnson and Juhani Pitkaranta. An analysis of the discontinuous Galerkin
method for a scalar hyperbolic equation. Mathematics of computation, 46(173):1–
26, 1986.
[49] Carl Timothy Kelley and David E Keyes. Convergence analysis of pseudo-
transient continuation. SIAM Journal on Numerical Analysis, 35(2):508–523,
1998.
[50] Christopher A Kennedy, Mark H Carpenter, and R Michael Lewis. Low-storage,
explicit Runge–Kutta schemes for the compressible Navier–Stokes equations. Ap-
plied numerical mathematics, 35(3):177–219, 2000.
[51] Andreas Klockner, Tim Warburton, Jeff Bridge, and Jan S Hesthaven. Nodal dis-
continuous Galerkin methods on graphics processors. Journal of Computational
Physics, 228(21):7863–7882, 2009.
[52] David A Kopriva and John H Kolias. A conservative staggered-grid Chebyshev
multidomain method for compressible flow. Technical report, DTIC Document,
1995.
[53] P Lasaint and PA Raviart. On a finite element method for solving the neu-
tron transport equation. In Mathematical Aspects of Finite Elements in Partial
Differential Equations, Proceedings of a Symposium Conducted by the Mathemat-
ics Research Center, the University of Wisconsin-Madison, Madison, WI, USA,
pages 1–3, 1974.
[54] Sanjiva K Lele. Compact finite difference schemes with spectral-like resolution.
Journal of Computational Physics, 103(1):16–42, 1992.
[55] P Lesaint and Pierre-Arnaud Raviart. On a finite element method for solving the
neutron transport equation. Mathematical aspects of finite elements in partial
differential equations, (33):89–123, 1974.
[56] Chunlei Liang, Ravi Kannan, and ZJ Wang. A p-multigrid spectral difference
method with explicit and implicit smoothers on unstructured triangular grids.
Computers & fluids, 38(2):254–265, 2009.
[57] Yen Liu, Marcel Vinokur, and ZJ Wang. Spectral difference method for unstruc-
tured grids I: basic formulation. Journal of Computational Physics, 216(2):780–
801, 2006.
[58] M Lopez-Morales, Jonathan Bull, Jacob Crabill, Thomas D Economon, David
Manosalvas, Joshua Romero, Abhishek Sheshadri, JE Watkins, David Williams,
Francisco Palacios, et al. Verification and validation of HiFiLES: a high-order
LES unstructured solver on multi-GPU platforms. In 32nd AIAA applied aero-
dynamics conference, Atlanta, Georgia, USA, pages 16–20, 2014.
[59] R. W. MacCormack. Current status of numerical solutions of the Navier-Stokes
equations. AIAA paper, 32:1985, 1985.
[60] Robert William MacCormack. A numerical method for solving the equations of
compressible viscous flow. AIAA journal, 20(9):1275–1281, 1982.
[61] Gianmarco Mengaldo, Daniele De Grazia, J Peiro, Antony Farrington, F With-
erden, PE Vincent, and SJ Sherwin. A guide to the implementation of boundary
conditions in compact high-order methods for compressible aerodynamics. In
7th AIAA Theoretical Fluid Mechanics Conference, AIAA Aviation, American
Institute of Aeronautics and Astronautics, 2014.
[62] RC Moura, SJ Sherwin, and J Peiro. Linear dispersion–diffusion analysis and
its application to under-resolved turbulence simulations using discontinuous
Galerkin spectral/hp methods. Journal of Computational Physics, 298:695–710,
2015.
[63] Cristian R Nastase and Dimitri J Mavriplis. High-order discontinuous Galerkin
methods using an hp-multigrid approach. Journal of Computational Physics,
213(1):330–357, 2006.
[64] A Nigro, A Ghidoni, Stefano Rebay, and Francesco Bassi. Modified extended
BDF scheme for the discontinuous Galerkin solution of unsteady compressible
flows. International Journal for Numerical Methods in Fluids, 76(9):549–574,
2014.
[65] NVIDIA. nvprof. https://docs.nvidia.com/cuda/profiler-users-guide/.
Accessed: 2017-11-09.
[66] NVIDIA. CUBLAS library. https://developer.nvidia.com/cublas. Ac-
cessed: 2016-05-23.
[67] Will Pazner and Per-Olof Persson. High-order DNS and LES simulations us-
ing an implicit tensor-product discontinuous Galerkin method. In 23rd AIAA
Computational Fluid Dynamics Conference, page 3948, 2017.
[68] Will Pazner and Per-Olof Persson. Stage-parallel fully implicit Runge–Kutta
solvers for discontinuous Galerkin fluid simulations. Journal of Computational
Physics, 335:700–717, 2017.
[69] Will Pazner and Per-Olof Persson. Approximate tensor-product preconditioners
for very high order discontinuous Galerkin methods. Journal of Computational
Physics, 354(Supplement C):344 – 369, 2018.
[70] P-O Persson and Jaime Peraire. Newton-GMRES preconditioning for discontin-
uous Galerkin discretizations of the Navier–Stokes equations. SIAM Journal on
Scientific Computing, 30(6):2709–2733, 2008.
[71] Per-Olof Persson. High-order LES simulations using implicit-explicit Runge-
Kutta schemes. In Proceedings of the 49th AIAA Aerospace Sciences Meeting
and Exhibit, AIAA, volume 684, 2011.
[72] Per-Olof Persson. A sparse and high-order accurate line-based discontinuous
Galerkin method for unstructured meshes. Journal of Computational Physics,
233:414–429, 2013.
[73] Niles A Pierce and Michael B Giles. Adjoint recovery of superconvergent func-
tionals from PDE approximations. SIAM review, 42(2):247–264, 2000.
[74] Roldon Pozo. Template Numerical Toolkit. http://math.nist.gov/tnt/. Ac-
cessed: 2016-05-23.
[75] Patrick Rasetarinera and MY Hussaini. An efficient implicit discontinuous spec-
tral Galerkin method. Journal of Computational Physics, 172(2):718–738, 2001.
[76] Wm H Reed and TR Hill. Triangular mesh methods for the neutron transport
equation. Los Alamos Report LA-UR-73-479, 1973.
[77] J Romero, K Asthana, and A Jameson. A simplified formulation of the flux
reconstruction method. Journal of Scientific Computing, pages 1–24, 2015.
[78] J Romero, FD Witherden, and A Jameson. A direct flux reconstruction scheme
for advection–diffusion problems on triangular grids. Journal of Scientific Com-
puting, pages 1–30, 2017.
[79] Joshua Romero. On the development of the direct flux reconstruction scheme for
high-order fluid flow simulations. PhD thesis, Stanford University, 2018.
[80] Joshua Romero and Antony Jameson. Extension of the flux reconstruction
method to triangular elements using collapsed-edge quadrilaterals. In 54th AIAA
Aerospace Sciences Meeting, page 1825, 2016.
[81] T Strouboulis and JT Oden. A posteriori error estimation of finite element
approximations in fluid mechanics. Computer methods in applied mechanics and
engineering, 78(2):201–242, 1990.
[82] Endre Suli. A posteriori error analysis and adaptivity for finite element approx-
imations of hyperbolic problems. In An introduction to recent developments in
theory and numerics for conservation laws, pages 123–194. Springer, 1999.
[83] Yuzhi Sun, ZJ Wang, and Yen Liu. Efficient implicit non-linear LU-SGS approach
for compressible flow computation using high-order spectral difference method.
Communications in Computational Physics, 5(2-4):760–778, 2009.
[84] Thomas Toulorge and Wim Desmet. CFL conditions for Runge–Kutta discontin-
uous Galerkin methods on triangular grids. Journal of Computational Physics,
230(12):4657–4678, 2011.
[85] Jacobus JW van der Vegt and H Van der Ven. Space–time discontinuous Galerkin
finite element method with dynamic grid motion for inviscid compressible flows:
I. General formulation. Journal of Computational Physics, 182(2):546–585, 2002.
[86] John C Vassberg and Antony Jameson. In pursuit of grid convergence for two-
dimensional Euler solutions. Journal of Aircraft, 47(4):1152–1166, 2010.
[87] Peter E Vincent, Patrice Castonguay, and Antony Jameson. Insights from von
Neumann analysis of high-order flux reconstruction schemes. Journal of Com-
putational Physics, 230(22):8134–8154, 2011.
[88] Peter E Vincent, Patrice Castonguay, and Antony Jameson. A new class of high-
order energy stable flux reconstruction schemes. Journal of Scientific Computing,
47(1):50–72, 2011.
[89] Li Wang and Dimitri J Mavriplis. Implicit solution of the unsteady Euler equa-
tions for high-order accurate discontinuous Galerkin discretizations. Journal of
Computational Physics, 225(2):1994–2015, 2007.
[90] ZJ Wang, Krzysztof Fidkowski, Remi Abgrall, Francesco Bassi, Doru Caraeni,
Andrew Cary, Herman Deconinck, Ralf Hartmann, Koen Hillewaert, HT Huynh,
et al. High-order CFD methods: current status and perspective. International
Journal for Numerical Methods in Fluids, 72(8):811–845, 2013.
[91] ZJ Wang and Haiyang Gao. A unifying lifting collocation penalty formulation in-
cluding the discontinuous Galerkin, spectral volume/difference methods for con-
servation laws on mixed grids. Journal of Computational Physics, 228(21):8161–
8186, 2009.
[92] Jerry Watkins. Navier-Stokes Jacobians. https://github.com/jewatkins/
navier-stokes-jacobians, 2017. Accessed: 2017-11-11.
[93] Jerry Watkins, Kartikey Asthana, and Antony Jameson. A numerical analy-
sis of the nodal discontinuous Galerkin scheme via flux reconstruction for the
advection-diffusion equation. Computers & Fluids, 139:233–247, 2016.
[94] Jerry Watkins, Joshua Romero, and Antony Jameson. Multi-GPU, implicit time
stepping for high-order methods on unstructured grids. In 46th AIAA Fluid
Dynamics Conference, page 3965, 2016.
[95] DM Williams, Patrice Castonguay, Peter E Vincent, and Antony Jameson. En-
ergy stable flux reconstruction schemes for advection–diffusion problems on tri-
angles. Journal of Computational Physics, 250:53–76, 2013.
[96] DM Williams and A Jameson. Energy stable flux reconstruction schemes for
advection–diffusion problems on tetrahedra. Journal of Scientific Computing,
59(3):721–759, 2014.
[97] Freddie D Witherden, Antony M Farrington, and Peter E Vincent. PyFR: An
open source framework for solving advection–diffusion type problems on stream-
ing architectures using the flux reconstruction approach. Computer Physics Com-
munications, 185(11):3028–3040, 2014.
[98] Freddie David Witherden. On the Development and Implementation of High-
Order Flux Reconstruction Schemes for Computational Fluid Dynamics. PhD
thesis, Imperial College London, 2015.