
Algorithms and software for fast atmospheric tracer transport in E3SM

Andrew M. Bradley

Center for Computing Research, Sandia National Laboratories¹

Joint work with Pete Bosler, Oksana Guba, Mark Taylor (CCR, SNL)

¹ This work was supported by the U.S. Department of Energy, Office of Science, Biological and Environmental Research Program and Advanced Scientific Computing Research Program. This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231. Sandia National Laboratories is a multimission laboratory managed and operated by the National Technology and Engineering Solutions of Sandia, L.L.C., a wholly owned subsidiary of Honeywell International, Inc., for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-NA-0003525. This presentation describes objective technical results and analysis. Any subjective views or opinions that might be expressed do not necessarily represent the views of the U.S. Department of Energy or the United States Government. SAND2019-14060 PE.

Andrew M. Bradley (CCR, SNL) Fast tracer transport 18 Nov 2019 1 / 30


Outline

1 Introduction

2 Communication-efficient density reconstructors
    Problem formulation
    Algorithms
        Optimization problems and algorithms
        Limited-reduction algorithms
        Safety problem
        Tree algorithms
    Numerical experiments

3 Integration into the HOMME dynamical core


DOE Energy Exascale Earth System Model²

“DOE’s E3SM is a state-of-the-science Earth system model development and simulation project to investigate energy-relevant science using code optimized for DOE’s advanced computers.” (e3sm.org)

² E3SM Project [Computer Software] https://dx.doi.org/10.11578/E3SM/dc.20180418.36


E3SM Atmosphere (EAM) dynamical core: HOMME³ v1

Horizontal: Continuous Spectral Element Method (SEM)

Vertically Lagrangian by default

Gauss-Lobatto-Legendre (GLL) basis

Diagonal mass matrix: direct stiffness summation (DSS); very efficient

[Figure: composition of a total time step in HOMME. Total time step = vertical remap + 3 horizontal steps. Horizontal step = 5 RK stages for dynamics + 3 HV subcycles for dynamics HV + 3 RK stages for tracers and HV. Labels on the tracer stages: MPI exchange for halo minmax; MPI exchange; MPI exchange for hyperviscosity (HV); limiter.]

³ Dennis et al., CAM-SE: A scalable spectral element dynamical core for the Community Atmosphere Model, J. HPC Appl., 2012; Taylor, Conservation of Mass and Energy for the Moist Atmospheric Primitive Equations on Unstructured Grids, in Numerical Techniques for Global Atmospheric Models, 2012; right figures from Bertagna et al., HOMMEXX 1.0: A Performance Portable Atmospheric Dynamical Core for the Energy Exascale Earth System Model, GMD, 2019.


Motivation

MOVIE

EAM has three conceptual components:
1 Dynamics: Compute wind, thermodynamic variable, total air density.
2 Tracer transport: Transport constituent trace species according to the wind.
3 Physics parameterizations: Compute parameterized column physics, such as microphysics, turbulence, shallow and deep convection, aerosols, chemistry.

Depending on configuration, 2 or 3 is the dominant cost.

To speed up 2, minimize the number of communication rounds per unit of simulated time.
- ⇒ Semi-Lagrangian transport.

To speed up 3, match physics column grid density to the effective resolution of dynamics.
- ⇒ Place physics columns on a different grid than dynamics uses.
- High-order physics-dynamics remap between grids.


Problem: Property preservation

A property is a quantity that must be maintained to essentially machine precision.

Importance:

Discrete mass conservation is necessary for long-term (e.g., climate) simulations.

Shape preservation is needed by chemistry and physics models.

Tracer consistency is an artifact of coupling multiple discretizations together.
- Handling tracer consistency enables using a different discretization for each piece.
- Two different pieces may imply a solution for a particular field.
- They must agree.

Utility: Very few efficient algorithms are natively property preserving.
⇒ A Constrained Density Reconstructor (CDR) is an algorithm that restores properties.


Motivation, continued

The property preservation problem comes up a lot:
- high-order semi-Lagrangian tracer transport,
- high-order physics-dynamics remap,
- many other use cases.

2-phase time step:
- Apply a fast non-property-preserving algorithm.
- Apply a CDR to restore properties.

A CDR is useful only if it is
- fast: in particular, as little communication as possible;
- accurate: property restoration entails a usefully bounded amount of mass redistribution.

Important problem distinction:
- local: within an element, column, or both: no MPI communication;
- global: among elements or columns, or both: MPI communication.

A CDR makes the overall space-time operator nonlinear. Strict property preservation with local shape preservation
- is only 1st-order accurate if the space-time operator is linear, but
- can be 2nd-order accurate with a CDR.


High-order semi-Lagrangian tracer transport

Time step is not restricted by the advective CFL number, allowing very large time steps.

Variant 1: High-order cell-integrated incremental remap semi-Lagrangian method (CISL)
- Locally mass conserving . . .
- . . . but requires a CDR to restore properties.

Variant 2: High-order interpolation semi-Lagrangian method (ISL)
- Point-basis method: 4× smaller communication volume than CISL.
- Requires a CDR to restore properties.
- Not mass conserving, but a CDR fixes this for essentially no additional cost.

ISL is very cheap.
- Can get arbitrarily high order (up to property preservation) without having to create an intersection algorithm for curved edges.
- Careful MPI communication to take advantage of small communication volume.
- These details are a topic for another talk; today we take these as given.
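To make the interpolation variant concrete, here is a minimal 1D sketch; it is not the HOMME implementation. It assumes a periodic uniform grid, a constant wind, and cubic Lagrange interpolation. The function name `isl_step` and the parameter `cfl` are illustrative.

```python
import numpy as np

def isl_step(q, cfl):
    """One interpolation semi-Lagrangian step on a periodic uniform 1D grid.

    q   : mixing ratio at the grid points at time step n-1
    cfl : u*dt/dx; may exceed 1 because each grid point is traced back to
          its departure point instead of being updated from its neighbors
    """
    n = len(q)
    xd = np.arange(n) - cfl           # departure points in index space
    j = np.floor(xd).astype(int)      # grid point at or to the left of xd
    s = xd - j                        # fractional offset in [0, 1)
    # Cubic Lagrange weights for the stencil j-1, j, j+1, j+2.
    w = [-s * (s - 1) * (s - 2) / 6,
         (s + 1) * (s - 1) * (s - 2) / 2,
         -(s + 1) * s * (s - 2) / 2,
         (s + 1) * s * (s - 1) / 6]
    # Periodic wrap-around via the modulo on the stencil indices.
    return sum(wk * q[(j + k - 1) % n] for k, wk in enumerate(w))
```

For the rectangle `q = np.array([0, 0, 1, 1, 1, 1, 0, 0], float)` and `cfl = 2.5`, the interpolated field overshoots 1 next to the edges of the rectangle. This is the kind of bound violation that a CDR is meant to repair.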


Tracer transport⁴

Total density ρ > 0.

Tracer density Q ≥ 0. (Multiple tracers in practice.)

Tracer mixing ratio q ≡ Q/ρ.

Flow field (wind) u.

Continuity equation:

$$0 = \frac{\partial \rho}{\partial t} + \nabla \cdot (u \rho)$$

Solved by the dynamical equations solver.

Tracer transport equation:

$$0 = \frac{\partial Q}{\partial t} + \nabla \cdot (u Q) \quad \text{(conservation form)}$$

$$0 = \frac{\partial q}{\partial t} + u \cdot \nabla q \quad \text{(advective form)}$$

Solved by the tracer transport solver.

Global mass of tracer is conserved ⇒ Discrete mass conservation problem.

In a parcel of air, q’s extrema are maintained; no new extrema form ⇒ Shape preservation problem.

If q = 1, the tracer transport equation is the same as the continuity equation ⇒ Tracer consistency problem.

⁴ All material and figures in this section are from Bradley, A.M., Bosler, P.A., Guba, O., Taylor, M.A. and Barnett, G.A., Communication-efficient property preservation in tracer transport, SISC, 41(3), pp. C161-C193, 2019.
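The link between the conservation and advective forms follows from the product rule and the continuity equation. The short derivation below uses only the symbols defined on this slide, with Q = qρ.

```latex
\begin{align*}
\frac{\partial Q}{\partial t} + \nabla \cdot (u Q)
  &= \frac{\partial (q \rho)}{\partial t} + \nabla \cdot (u \rho q) \\
  &= \rho \left( \frac{\partial q}{\partial t} + u \cdot \nabla q \right)
   + q \left( \frac{\partial \rho}{\partial t} + \nabla \cdot (u \rho) \right) \\
  &= \rho \left( \frac{\partial q}{\partial t} + u \cdot \nabla q \right).
\end{align*}
```

The last step uses the continuity equation. Because ρ > 0, the conservation form vanishes exactly when the advective form does. Setting q = 1 gives Q = ρ, so the conservation form reduces to the continuity equation.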


Property preservation

1 Discrete mass conservation. Discretization’s global masses ($\bar{\rho}^\circ$, $\bar{Q}^\circ$), in the absence of source terms, must be constant in time.

2 Shape preservation. Global and local extrema at time step n − 1, or some a priori bounds, are not violated at time step n. Easier to harder:
- Example (positivity, global static lower bound): $\rho_i \ge 0$
- Example (mixing ratio, global static range): $0 \le q_i \le 1$
- Example (global dynamic range, safety): $\min_j (q_j)^{\min} \le q_i \le \max_j (q_j)^{\max}$
- Example (local dynamic range, today’s focus): $(q_i)^{\min} \le q_i \le (q_i)^{\max}$

3 Tracer consistency. If $q^{n-1}$ = const everywhere, then $Q^n \propto \rho^n$, where $\rho^n$ comes from solving the continuity equation.
- Since the transport operator is nonlinear in practice, we need to be precise about what is being requested.
- (Could have a magic switch that detects the constant $q^{n-1}$ case.)
- Reasonable: The transport operator must be a continuous function of $q^k$, $\rho^k$, k = n − 1, n.


Approaches

Eulerian finite volume, spectral element.
- Severe time step restriction.
- [Guba et al. JCP 2014]

Flux-form cell-integrated semi-Lagrangian.
- Mass conservation is free.
- For the others, adjust departure cell geometry.
- Example: CSLAM splits edges to adjust areas to match ρ fluxes. [Lauritzen et al. MWR 2017]
- (Simply scaling fluxes gives tracer consistency but breaks shape preservation.)
- Flux form required.

Solve a Poisson equation for compensating fluxes to match ρ fluxes.
- Iterative and expensive iterations.
- [Rotman et al. JGR 2004]

Heuristic or ad hoc iteration.
- Noniterative (and can fail) or iterative but inexpensive iterations (reductions).
- [Priestley MWR 1993], [Bermejo, Conde MWR 2002], [McGregor 2005], [Zerroukat JCP 2010]

Solve a global optimization problem.
- Iterative but inexpensive iterations (reductions).
- [Bochev, Ridzal, Shashkov JCP 2013], [Bochev, Ridzal, Peterson JCP 2014], [Bochev, Moe, Peterson, Ridzal Coupled Problems 2015]

What can be achieved with exactly one MPI_Allreduce-like communication round?


Problem formulation

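Notation:
- Element-wise operations: e.g., xy is element-wise multiplication.
- Inner product: x · y.
- e is the vector of all ones.

Tracer transport discretization variables:
- Total density ρ > 0, total mass $\bar{\rho}$ > 0;
- Tracer density Q ≥ 0, tracer mass $\bar{Q}$ ≥ 0;
- Mixing ratio $q \equiv Q/\rho = \bar{Q}/\bar{\rho}$.

Given:
- Desired global mass $\bar{Q}^\circ$;
- Local bounds on q: $q^{\min}$, $q^{\max}$;
- Preliminary, or target, solution $\bar{Q}^*$.

Find:

$$\bar{Q} \in \bar{\mathcal{Q}}(\bar{\rho}, \bar{Q}^\circ, q^{\min}, q^{\max}) \equiv \{\bar{Q} : \text{(i) } e \cdot \bar{Q} = \bar{Q}^\circ \text{ and (ii) } \bar{\rho} q^{\min} \le \bar{Q} \le \bar{\rho} q^{\max}\}.$$

Correction variables:
- Correction $x = \bar{Q} - \bar{Q}^*$;
- Total mass $a \equiv \bar{\rho} > 0$;
- Mass discrepancy $b \equiv \bar{Q}^\circ - e \cdot \bar{Q}^*$;
- Local bounds $l \equiv \bar{\rho} q^{\min} - \bar{Q}^*$, $u \equiv \bar{\rho} q^{\max} - \bar{Q}^*$;
- Target $x^* \equiv \bar{Q}^* - \bar{Q}^* = 0e$.

Find:

$$x \in \mathcal{T}(b, l, u) \equiv \{x : \text{(i) } e \cdot x = b \text{ and (ii) } l \le x \le u\}.$$

1 (Feasibility N&SC)

$\mathcal{T}(b, l, u) \ne \emptyset$ if and only if $l \le u$ and $e \cdot l \le b \le e \cdot u$.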
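The change of variables and the feasibility condition can be sketched in a few lines of NumPy. The function names `correction_variables` and `is_feasible` are illustrative and not taken from any library; arrays hold one entry per cell.

```python
import numpy as np

def correction_variables(rho_mass, Q_target, Q_global, qmin, qmax):
    """Map (rho_bar, Q_bar*, Q_bar°, qmin, qmax) to the correction data (b, l, u)."""
    b = Q_global - Q_target.sum()      # mass discrepancy
    l = rho_mass * qmin - Q_target     # lower bound on the correction x
    u = rho_mass * qmax - Q_target     # upper bound on the correction x
    return b, l, u

def is_feasible(b, l, u):
    """T(b, l, u) is nonempty iff l <= u and e.l <= b <= e.u."""
    return bool(np.all(l <= u) and l.sum() <= b <= u.sum())
```

In a distributed setting the two sums `l.sum()` and `u.sum()` would be global reductions; here they are plain array sums.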


Optimization problems and algorithms

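With ω > 0:

$$\mathcal{T}_1(b, l, u) \equiv \arg\min_{x \in \mathcal{T}(b, l, u)} \|x\|_1,$$

$$\mathcal{T}_{w1}(\omega, b, l, u) \equiv \arg\min_{x \in \mathcal{T}(b, l, u)} \omega \cdot |x|,$$

$$\mathcal{T}_{w2}(\omega, b, l, u) \equiv \arg\min_{x \in \mathcal{T}(b, l, u)} \omega \cdot x^2.$$

See [Dai & Fletcher, Math. Prog., 2006] for an efficient, but iterative, method to solve the 2-norm problem. Each iteration requires a reduction.

2 (1-norm minimal)

$\mathcal{T}_{w1}(\omega, b, l, u),\ \mathcal{T}_{w2}(\omega, b, l, u) \subseteq \mathcal{T}_1(b, l, u)$.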
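To illustrate why an iterative solver for the weighted 2-norm problem needs one reduction per iteration, here is a sketch that uses plain bisection on the Lagrange multiplier. This is a simpler technique swapped in for illustration, not the Dai & Fletcher method; the function name `solve_w2_bisection` and the fixed iteration count are assumptions.

```python
import numpy as np

def solve_w2_bisection(w, b, l, u, iters=100):
    """Minimize sum(w * x**2) subject to sum(x) == b and l <= x <= u.

    Assumes w > 0 and a nonempty feasible set. The KKT conditions give
    x_i = clip(lam / (2 w_i), l_i, u_i); sum(x) is nondecreasing in lam,
    so the multiplier lam can be bracketed and bisected.
    """
    x_of = lambda lam: np.clip(lam / (2 * w), l, u)
    lo, hi = np.min(2 * w * l), np.max(2 * w * u)   # x_of(lo) == l, x_of(hi) == u
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if x_of(lam).sum() < b:    # in parallel, this sum is one reduction
            lo = lam
        else:
            hi = lam
    return x_of(0.5 * (lo + hi))
```

For example, with weights (1, 3), bounds of ±5 in each entry, and b = 4, the minimizer is x = (3, 1).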


Correct, good solution with one batch all-to-all reduction⁵

function CLIPANDASSUREDSUM(b, l, u)
    x̄ ← CLIP(l, u)
    m ← b − e · x̄
    if m = 0 then return x̄
    v ← MAKEASSUREDWEIGHTS(l, u, m, x̄)
    return x̄ + m v
end function

function CLIP(l, u)
    for i ← 1, n do x̄_i ← max{l_i, min{u_i, 0}}
    return x̄
end function

function MAKEASSUREDWEIGHTS(l, u, m, x̄)
    return (u − x̄) / (e · (u − x̄)) if m > 0 else (x̄ − l) / (e · (x̄ − l))
end function

3 (Correct)

If X ≡ T(b, l, u) ≠ ∅, then x ← CLIPANDASSUREDSUM(b, l, u) ∈ X.

⁵ For example, equations 22–25 of Kaas et al., A hybrid Eulerian–Lagrangian numerical scheme for solving prognostic equations in fluid dynamics, GMD, 2013.
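The pseudocode on this slide translates almost line for line into NumPy. The following serial sketch assumes T(b, l, u) is nonempty; the function name `clip_and_assured_sum` is illustrative and this is not the Compose library's code.

```python
import numpy as np

def clip_and_assured_sum(b, l, u):
    """Return x with sum(x) == b and l <= x <= u, assuming T(b, l, u) is nonempty."""
    x_bar = np.maximum(l, np.minimum(u, 0.0))   # CLIP: project the target 0 onto [l, u]
    m = b - x_bar.sum()                         # mass still to be distributed
    if m == 0:
        return x_bar
    # MAKEASSUREDWEIGHTS: weights proportional to the remaining room
    # toward u (when adding mass) or toward l (when removing mass).
    room = u - x_bar if m > 0 else x_bar - l
    return x_bar + m * room / room.sum()
```

The only global quantities are sums over all cells (of `x_bar` and of the room toward each bound). These do not depend on one another, so in a parallel setting they could plausibly be gathered in a single batched reduction, in line with the slide title.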


2D illustrations

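[Figure: two 2D illustrations. Each shows the bounds l and u, the constraint line e · y = b, the step m v, and a point labeled x_I. The first panel also marks e · x̄, m, b, the points x_caas and x_w2, and the set T_1(b, l, u). In the second panel, where e · y = b = 0, T_1(b, l, u) = x_w2 = x_caas.]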


Generic weights, assured solution

Use weights computed as a function of, e.g., smoothness [e.g., Bermejo, Conde MWR 2002].

These weights are not guaranteed to give a solution in T (b, l, u).

Algorithm CLIPANDASSUREDGENERICSUM finds the convex combination of proposed and assured weights nearest the proposed and in T(b, l, u) ≠ ∅.

Two batch all-to-all reductions:
- At least two: Cannot know whether proposed w will push x̄ out of bounds until after m is computed.
- Two ends up being enough.


Correction magnitude

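Out of boundedness:

$$B_l(l) \equiv \sum_i \max\{0, l_i\} = \sum_{i : l_i > 0} l_i,$$

$$B_u(u) \equiv \sum_i \max\{0, -u_i\} = -\sum_{i : u_i < 0} u_i,$$

$$B(l, u) \equiv B_l(l) + B_u(u) = \sum_i \max\{0, l_i, -u_i\}.$$

4 (Minimum correction magnitude)

For $x \in \mathcal{T}(b, l, u)$, $\|x\|_1 \ge \max\{|b|, B(l, u)\}$.

[Figure: mixing ratio versus x, the 1D spatial coordinate; curves: y, CAAS, QLT.]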
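The out-of-boundedness terms and the lower bound of result 4 are cheap to evaluate. A small sketch, with illustrative function names and assuming l ≤ u:

```python
import numpy as np

def out_of_boundedness(l, u):
    """Return (B_l, B_u, B) for correction bounds with l <= u."""
    B_l = np.maximum(0.0, l).sum()     # bounds that force x_i > 0
    B_u = np.maximum(0.0, -u).sum()    # bounds that force x_i < 0
    return B_l, B_u, B_l + B_u

def min_correction_magnitude(b, l, u):
    """Lower bound on ||x||_1 for any x in T(b, l, u)."""
    return max(abs(b), out_of_boundedness(l, u)[2])
```

For l = (−1, 0.2, −1), u = (1, 1, −0.3), and b = 0.5, this gives B_l = 0.2, B_u = 0.3, and a lower bound of 0.5 on the 1-norm of any feasible correction.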


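Correction magnitude

Technical term:

$$d(l, u) \equiv \begin{cases} e \cdot l & \text{if this is positive} \\ e \cdot u & \text{if this is negative} \\ 0 & \text{else.} \end{cases}$$

5 (1-norm)

If $\mathcal{T}(b, l, u) \ne \emptyset$, then $x \in \mathcal{T}_1(b, l, u)$ has norm

$$\|x\|_1 = \begin{cases} b + 2 B_u(u) & \text{if } m \ge 0 \\ -b + 2 B_l(l) & \text{if } m \le 0 \end{cases} \ \le\ |b| + 2\,(B(l, u) - |d(l, u)|) \ \le\ |b| + 2 B(l, u).$$

6 (Correction magnitude)

If $X \equiv \mathcal{T}(b, l, u) \ne \emptyset$ and x is returned by CLIPANDASSUREDSUM, CLIPANDASSUREDGENERICSUM, or one of the optimization algorithms, then $x \in \mathcal{T}_1(b, l, u)$.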
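A small worked instance of result 5 (1-norm), with three cells and values chosen purely for illustration; x̄ and m are the clipped point and mass discrepancy from CLIPANDASSUREDSUM.

```latex
\begin{align*}
l &= (-1,\ 0.2,\ -1), \quad u = (1,\ 1,\ -0.3), \quad b = 0.5, \\
B_l(l) &= 0.2, \quad B_u(u) = 0.3, \quad B(l, u) = 0.5, \\
e \cdot l &= -1.8 \le 0, \quad e \cdot u = 1.7 \ge 0 \ \Rightarrow\ d(l, u) = 0, \\
\bar{x} &= (0,\ 0.2,\ -0.3), \quad m = b - e \cdot \bar{x} = 0.5 - (-0.1) = 0.6 \ge 0, \\
\|x\|_1 &= b + 2 B_u(u) = 1.1 \ \le\ |b| + 2\,(B(l, u) - |d(l, u)|) = 1.5 .
\end{align*}
```

The feasible point x = (1/3, 7/15, −3/10) attains this value: its entries sum to 0.5 and their absolute values sum to 1.1.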


What if T(b, l, u) = ∅?

$$\bar{\mathcal{Q}}_s(\bar{\rho}, \bar{Q}^\circ, q^{\min}, q^{\max}) \equiv \{\bar{Q} : \text{(i) } e \cdot \bar{Q} = \bar{Q}^\circ \text{ and (ii) } (\min_i q_i^{\min})\, \bar{\rho} \le \bar{Q} \le (\max_i q_i^{\max})\, \bar{\rho}\}.$$

$\mathcal{T}_s(\bar{Q}^*, a, b, l, u)$ is the equivalent set for the correction variables.

$\bar{\mathcal{Q}}_s(\bar{\rho}, \bar{Q}^\circ, q^{\min}, q^{\max})$ is the set of all global range preserving corrections.

Local extrema may be violated, but global extrema are not.

Informal:

7 (Safety feasibility SC)

Assume:
1 a simulation conserves total mass;
2 tracer mass is intended to be conserved;
3 bound extrema are at least the extrema of the previous time step.

Then $\mathcal{T}_s(\bar{Q}^*, a, b, l, u) \ne \emptyset$.

⇒ Mass conservation, tracer consistency are still achievable.

RECONSTRUCTSAFELY wraps an unsafe CDR SELECTX.

Communication:
- Safe CLIPANDASSUREDSUM still requires just one batch all-to-all reduction (BAR). Larger communication volume.
- Safe CLIPANDASSUREDGENERICSUM still requires two BARs. Larger communication volume.
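The relaxation from local bounds to the global range can be sketched as below. This only shows how the two sets of correction bounds differ; it does not implement RECONSTRUCTSAFELY, whose details are not given on this slide. All function names are illustrative.

```python
import numpy as np

def local_bounds(rho_mass, Q_target, qmin, qmax):
    """Correction bounds (l, u) from per-cell mixing-ratio bounds."""
    return rho_mass * qmin - Q_target, rho_mass * qmax - Q_target

def safety_bounds(rho_mass, Q_target, qmin, qmax):
    """Correction bounds from the global range [min_i qmin_i, max_i qmax_i]."""
    return rho_mass * qmin.min() - Q_target, rho_mass * qmax.max() - Q_target

def is_feasible(b, l, u):
    """Nonempty iff l <= u and e.l <= b <= e.u."""
    return bool(np.all(l <= u) and l.sum() <= b <= u.sum())
```

With two unit-mass cells, targets (0.5, 0.5), local bounds qmin = (0, 0.5) and qmax = (0.2, 1), and desired global mass 0.4, the local problem is infeasible because the lower bounds alone require a mass of 0.5. The safety problem, with global range [0, 1], is feasible.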


Quasi-Local Tree-based CDR (QLT)

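[Figure: QLT on a binary tree.

Leaves to root: leaf 1 stores $l_{11}, m^*_{11}, u_{11}$ with $m^*_{11} = x^*_1$, and sends $l_{11}, m^*_{11}, u_{11}$ to its parent; leaf 2 stores $l_{12}, m^*_{12}, u_{12}$ with $m^*_{12} = x^*_2$. The parent stores $\vec{l}_{21} = (l_{11}, l_{12})$, $\vec{m}^*_{21} = (m^*_{11}, m^*_{12})$, $\vec{u}_{21} = (u_{11}, u_{12})$ and sends the sums $l_{11} + l_{12}$, $m^*_{11} + m^*_{12}$, $u_{11} + u_{12}$.

Root to leaves: the root solves

$$\min_{\vec{m}_{31}} \|\vec{m}_{31} - \vec{m}^*_{31}\| \quad \text{s.t. } m_{31}(1) + m_{31}(2) = M_G, \quad \vec{l}_{31} \le \vec{m}_{31} \le \vec{u}_{31},$$

and passes down $m_{31}(1)$ and $m_{31}(2)$. The next level solves

$$\min_{\vec{m}_{21}} \|\vec{m}_{21} - \vec{m}^*_{21}\| \quad \text{s.t. } m_{21}(1) + m_{21}(2) = m_{31}(1), \quad \vec{l}_{21} \le \vec{m}_{21} \le \vec{u}_{21},$$

and the leaves receive $x_1 \leftarrow m_{21}(1)$, $x_2 \leftarrow m_{21}(2)$.]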

Can incorporate a weight vector.

At each node, user can provide a node-local CDR SELECTX.

At each node, RECONSTRUCTSAFELY wraps SELECTX.
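The two sweeps can be sketched serially as follows. This is a toy illustration under stated assumptions, not the QLT implementation: the number of cells is a power of two, the problem is posed in masses m rather than corrections x, the root problem is feasible, and the node-local CDR is a simple clip-and-redistribute rule. The names `split_mass` and `tree_reconstruct` are hypothetical.

```python
import numpy as np

def split_mass(M, lo, hi, target):
    """Node-local problem: m near target with sum(m) == M and lo <= m <= hi.

    Assumes lo.sum() <= M <= hi.sum(). Uses clip-and-redistribute as the
    node-local CDR; other choices are possible.
    """
    m = np.clip(target, lo, hi)
    d = M - m.sum()
    if d != 0:
        room = hi - m if d > 0 else m - lo
        m = m + d * room / room.sum()
    return m

def tree_reconstruct(M_G, lo, hi, target):
    """Binary-tree reconstruction for a power-of-two number of cells."""
    # Leaves to root: each parent keeps its children's data and passes up sums.
    levels = [(lo, hi, target)]
    while len(levels[-1][0]) > 1:
        levels.append(tuple(a.reshape(-1, 2).sum(axis=1) for a in levels[-1]))
    # Root to leaves: each node splits its assigned mass between its two children.
    m = np.array([M_G])
    for lo_k, hi_k, t_k in reversed(levels[:-1]):
        children = zip(lo_k.reshape(-1, 2), hi_k.reshape(-1, 2), t_k.reshape(-1, 2))
        m = np.concatenate([split_mass(M, *c) for M, c in zip(m, children)])
    return m
```

With four cells, bounds [0, 1], targets (1.2, −0.1, 0.5, 0.3), and global mass 1.9, the result is (1.0, 0.1, 0.5, 0.3): the out-of-bounds mass in the first pair is resolved within that pair, and the other two cells keep their targets.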


Top-level algorithms: RECONSTRUCTSAFELY and QLT

8 (Correct)

If $X \equiv \mathcal{T}(b, l, u) \ne \emptyset$, then RECONSTRUCTSAFELY and QLT return $x \in X$;
else (ii) if $X_s \equiv \mathcal{T}_s(\bar{Q}^*, a, b, l, u) \ne \emptyset$, then $x \in X_s$;
else (iii) x such that $e \cdot x = b$.

9 (Correction magnitude)

Suppose that if $\mathcal{T}(b, l, u) \ne \emptyset$, SELECTX(b, l, u) returns $y \in \mathcal{T}(b, l, u)$ such that $\|y\|_1 \le |b| + 2\,(B(l, u) - |d(l, u)|)$.

Then RECONSTRUCTSAFELY and QLT return x such that
(i) if $X \equiv \mathcal{T}(b, l, u) \ne \emptyset$, then $\|x\|_1 \le |b| + 2\,(B(l, u) - |d(l, u)|)$;
else (ii) if $b \ge e \cdot u$, then $\|x - u\|_1 \le b - e \cdot u$ and $x \ge u$,
    else $\|l - x\|_1 \le e \cdot l - b$ and $x \le l$.


Quasi-locality

10 (Quasi-locality)

For x returned by QLT, x may not be in $X_1$ even if $\mathcal{T}(b, l, u) \ne \emptyset$.

[Figure, left: mixing ratio versus x, the 1D spatial coordinate; curves: y, CAAS, QLT. Right: density profile contours at 0.1 increments, total mass versus cell index (0 to 63), with density profiles (mass in cell, 0 to 1) for total mass 16 and total mass 24.]


Implementation

One MPI_Allreduce equivalent.
- But can’t be implemented using MPI_Allreduce.

Less than half the communication volume of unsafe CLIPANDASSUREDSUM.

Level schedule tree ⇒ Solution is (bit-for-bit)
- independent of process decomposition,
- including in finite precision.
- Practically relevant to E3SM, which requires decomposition-independent solutions.
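One way to see why a fixed tree gives decomposition-independent results in finite precision is the toy sketch below. It is not the library's implementation; it assumes a power-of-two number of cells and that each simulated rank owns one aligned, contiguous block. The function names are hypothetical.

```python
import numpy as np

def tree_sum(v):
    """Sum v (length a power of two) in a fixed binary-tree order."""
    v = np.asarray(v, dtype=float)
    while len(v) > 1:
        v = v[0::2] + v[1::2]      # add each pair of neighbors, level by level
    return v[0]

def decomposed_tree_sum(v, nranks):
    """Same sum, with each of nranks simulated ranks owning one aligned block."""
    blocks = np.split(np.asarray(v, dtype=float), nranks)
    partials = [tree_sum(block) for block in blocks]   # rank-local subtrees
    return tree_sum(partials)                          # upper levels of the same tree
```

Because every floating-point addition is performed on the same operands in the same order regardless of `nranks`, the results compare equal with `==`, not just approximately.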


One-dimensional transport: Convergence

[Figure, six panels: Locality, Sinusoid, Rectangle, Triangle, Bell, Gaussian. x-axis: refinement level (1 to 4); y-axis: log2 l2 relative error. Legend: Cubic, CAAS, BC, LS(1), QLT(1), QLT(BC). Annotated reference slopes: 1/3 (Locality, Rectangle), 1 (Triangle), 2 (Bell), 2 and 3 (Sinusoid, Gaussian).]


One-dimensional transport: Randomized realizations

[Figure, two rows (uniform mesh, perturbed mesh) by three columns: log10 relative error versus x; log10 relative mass redistribution versus x; cumulative density of l2 relative error versus log10 l2 relative error. Legend: Cubic, CAAS, BC, LS(1), QLT(1), QLT(BC).]


Transport on the sphere: Convergence and mass redistribution

[Figure, two rows by three columns. Columns: Rotation, Divergent Flow, Nondivergent Flow. Top row: log2 l2 relative error versus refinement level (0 to 4) for Slotted Cylinders, Cosine Bells, and Gaussian Hills; legend: None, CAAS, LS(1), QLT(1); reference slopes 2 and 1/3. Bottom row: log2 relative total redistributed mass versus refinement level (0 to 4); legend: CAAS, LS(1), QLT(1); reference slopes 2 and 1/2.]


Fidelity study⁶

[Figure, two panels: l2 relative error (log scale, 10⁻³ to 10⁻¹) versus mesh resolution (3°, 3/2°, 3/4°, 3/8°, 3/16°). Left, "Transport error with tuned parameters": HOMME tuned and HOMME/SL 5/3-halo, each for Cosine Bells and Gaussian Hills. Right, "Transport error with operational parameters": CAM operational and HOMME/SL 2/3-halo, for Gaussian Hills.]

Nondivergent flow test case.

Compare (1) tuned parameters and (2) operational parameters, as in previous slide.

SL transport is uniformly more accurate.

⁶ “HOMME tuned” data are from O. Guba et al., Optimization-based limiters for the spectral element method, JCP, 2014. “CAM operational” data are from P. H. Lauritzen et al., A standard test case suite for two-dimensional linear transport on the sphere: results from a collection of state-of-the-art schemes, GMD, 7(1), 2013.


Resolution: DCMIP2016 Baroclinic Instability

Left: Eulerian transport

Middle: SL transport

Right: SL transport with optional hyperviscosity applied


[Figure: Performance as a function of number of tracers. x-axis: number of tracers (0 to 120, with an inset covering 0 to 10 tracers and 0.00 to 0.10 s); y-axis: wall-clock time (s) per SL transport time step length. Curves: SL HSW, SL KNL, Eulerian KNL, Eulerian HSW. Configuration: 24 KNL nodes, 64 ranks/node, 2 threads/rank; 24 HSW nodes, 32 ranks/node, 1 thread/rank; 6,144 elements, 72 levels.]


HOMME dycore strong scaling performance

[Figure, panel 1: HOMME v1 1/4 Degree. SYPD (10^-1 to 10^1) versus Number of Compute Nodes (8 to 3600). Series: Cori-KNL HOMME, Cori-KNL HOMME/SL, Edison-IB HOMME, Edison-IB HOMME/SL.]

[Figure, panel 2: 13km NGGPS Benchmark. SYPD (10^-1 to 10^1) versus Number of Compute Nodes (128 to 5462). Series: Cori-KNL HOMME, Cori-KNL HOMME/SL, Edison-IB HOMME, Edison-IB HOMME/SL.]


Summary and conclusions

The property preservation problem appears in many settings in a coupled Earth system model.

A fast Constrained Density Reconstructor (CDR) enables fast 2-phase algorithms: (1) a fast non-property-preserving algorithm, then (2) a fast CDR to restore properties.

Developed a framework to analyze CDRs and created some new algorithms.

Implemented in open-source library github.com/E3SM-Project/Compose.

CDRs are used in two separate parts of EAM so far: semi-Lagrangian tracer transport and physics-dynamics remap.


Thanks!


Generic weights, assured solution

function CLIPANDASSUREDGENERICSUM(b, l, u)
    x̄ ← CLIP(l, u)
    m ← b − e · x̄
    if m = 0 then return x̄
    choose w ≥ 0
    δ ← e · w
    if δ = 0 then return CLIPANDASSUREDSUM(b, l, u)
    z ← w/δ
    v ← MAKEASSUREDWEIGHTS(l, u, m, x̄)
    α ← MAKEBESTCONVEXCOMBINATION(l, u, z, v, m, x̄)
    y ← αz + (1 − α)v
    return x̄ + my
end function

function MAKEBESTCONVEXCOMBINATION(l, u, z, v, m, x̄)
    α ← 1
    if m = 0 then return α
    d ← u if m > 0 else l
    for i ← 1, n do
        if z_i > v_i then α ← min{α, (d_i − x̄_i − m v_i) / (m (z_i − v_i))}
    return α
end function
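The two routines above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the Compose implementation: CLIP is assumed to clip the zero increment to [l, u]; MAKEASSUREDWEIGHTS and the CLIPANDASSUREDSUM fallback are assumed to weight entries by their remaining capacity toward the active bound; the problem is assumed feasible (e · l ≤ b ≤ e · u); and the weights w are passed in by the caller rather than chosen inside the routine. All Python names are hypothetical.

```python
def clip_zero(l, u):
    # Assumption: CLIP(l, u) clips the zero increment to [l, u].
    return [min(max(0.0, li), ui) for li, ui in zip(l, u)]


def make_assured_weights(l, u, m, xbar):
    # Assumption: weights proportional to the room left toward the
    # active bound (u if mass must be added, l if it must be removed).
    cap = [(ui - xi) if m > 0 else (xi - li)
           for li, ui, xi in zip(l, u, xbar)]
    total = sum(cap)
    if total <= 0:
        raise ValueError("infeasible: no capacity left for the mass defect")
    return [c / total for c in cap]


def clip_and_assured_sum(b, l, u):
    # Assumed fallback: clip, then spread the defect by capacity weights.
    xbar = clip_zero(l, u)
    m = b - sum(xbar)
    if m == 0:
        return xbar
    v = make_assured_weights(l, u, m, xbar)
    return [xi + m * vi for xi, vi in zip(xbar, v)]


def make_best_convex_combination(l, u, z, v, m, xbar):
    # Largest alpha in [0, 1] such that xbar + m*(alpha*z + (1-alpha)*v)
    # stays within the active bound d.
    alpha = 1.0
    if m == 0:
        return alpha
    d = u if m > 0 else l
    for i in range(len(z)):
        if z[i] > v[i]:
            limit = (d[i] - xbar[i] - m * v[i]) / (m * (z[i] - v[i]))
            alpha = min(alpha, limit)
    return max(alpha, 0.0)  # roundoff guard; not part of the slide


def clip_and_assured_generic_sum(b, l, u, w):
    xbar = clip_zero(l, u)
    m = b - sum(xbar)          # mass defect after clipping
    if m == 0:
        return xbar
    delta = sum(w)
    if delta == 0:
        return clip_and_assured_sum(b, l, u)
    z = [wi / delta for wi in w]               # preferred weights
    v = make_assured_weights(l, u, m, xbar)    # weights that always work
    alpha = make_best_convex_combination(l, u, z, v, m, xbar)
    y = [alpha * zi + (1 - alpha) * vi for zi, vi in zip(z, v)]
    return [xi + m * yi for xi, yi in zip(xbar, y)]


# Example: the preferred weights alone would push entry 1 past u[1].
x = clip_and_assured_generic_sum(
    2.0, [-1.0, -1.0, -1.0], [1.0, 0.5, 2.0], [1.0, 1.0, 0.0])
```

In the example, using only the preferred weights z would violate the second upper bound, so α drops below 1 and part of the mass defect is routed through the capacity weights v instead.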


Safety

function RECONSTRUCTSAFELY(Q̄∗, Q̄◦, a, l, u, w, SELECTX)
    if w = NONE then w ← a
    b ← Q̄◦ − e · Q̄∗
    if e · l ≤ b ≤ e · u then return SELECTX(b, l, u, w)      ▷ T(b, l, u) ≠ ∅
    if b > e · u then
        u_s ← (max_i (Q̄∗_i + u_i)/ρ̄_i) a − Q̄∗
        if b ≤ e · u_s then
            return u + SELECTX(b − e · u, 0, u_s − u, w)      ▷ T_s(Q̄∗, a, b, l, u) ≠ ∅
        else
            return u_s + ((b − e · u_s)/(e · a)) a            ▷ T_s(Q̄∗, a, b, l, u) = ∅
        end if
    else
        l_s ← (min_i (Q̄∗_i + l_i)/ρ̄_i) a − Q̄∗
        if b ≥ e · l_s then
            return l + SELECTX(b − e · l, l_s − l, 0, w)
        else
            return l_s + ((b − e · l_s)/(e · a)) a
        end if
    end if
end function
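The safety logic above can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the library code: the densities ρ̄_i are taken to be the entries of a, and `select_x` is a hypothetical stand-in for SELECTX that spreads the mass defect by remaining capacity and ignores w. The function and variable names are likewise hypothetical.

```python
def select_x(b, l, u, w):
    # Hypothetical stand-in for SELECTX: clip the zero increment, then
    # spread the remaining defect by capacity. Assumes e.l <= b <= e.u.
    xbar = [min(max(0.0, li), ui) for li, ui in zip(l, u)]
    m = b - sum(xbar)
    if m == 0:
        return xbar
    cap = [(ui - xi) if m > 0 else (xi - li)
           for li, ui, xi in zip(l, u, xbar)]
    total = sum(cap)
    return [xi + m * c / total for xi, c in zip(xbar, cap)]


def reconstruct_safely(q_star, q_total, a, l, u, w=None, select=select_x):
    n = len(q_star)
    if w is None:
        w = a
    b = q_total - sum(q_star)  # mass increment that must be distributed

    # Feasible case: the original bounds can absorb b.
    if sum(l) <= b <= sum(u):
        return select(b, l, u, w)

    if b > sum(u):
        # Relax every upper bound to the largest mixing-ratio bound.
        # Assumption: rho_i == a[i].
        q_max = max((qs + ui) / ai for qs, ui, ai in zip(q_star, u, a))
        u_s = [q_max * ai - qs for ai, qs in zip(a, q_star)]
        if b <= sum(u_s):
            extra = [us - ui for us, ui in zip(u_s, u)]
            dx = select(b - sum(u), [0.0] * n, extra, w)
            return [ui + di for ui, di in zip(u, dx)]
        # Even the relaxed bounds fail: add a uniform mixing-ratio shift.
        shift = (b - sum(u_s)) / sum(a)
        return [us + shift * ai for us, ai in zip(u_s, a)]

    # Mirror image for b < e.l.
    q_min = min((qs + li) / ai for qs, li, ai in zip(q_star, l, a))
    l_s = [q_min * ai - qs for ai, qs in zip(a, q_star)]
    if b >= sum(l_s):
        extra = [ls - li for ls, li in zip(l_s, l)]
        dx = select(b - sum(l), extra, [0.0] * n, w)
        return [li + di for li, di in zip(l, dx)]
    shift = (b - sum(l_s)) / sum(a)
    return [ls + shift * ai for ls, ai in zip(l_s, a)]


# Example: three cells with unit density and per-cell increment bounds.
x = reconstruct_safely([1.0, 2.0, 3.0], 6.6, [1.0, 1.0, 1.0],
                       [-0.5, -0.5, -0.5], [0.1, 0.1, 0.1])
```

In the example the requested increment exceeds e · u, so the sketch takes the relaxed-bound branch: mass is still conserved, and no cell exceeds the largest mixing ratio that the original bounds allowed anywhere.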


QLT, top level

function QLT(Q̄∗, Q̄◦, a, l, u, w, SELECTX, r)
    if w = NONE then w ← a
    LEAVESTOROOT(r, Q̄∗, a, l, u, w)
    x ← 0
    b ← Q̄◦ − r.data[0]
    ROOTTOLEAVES(r, b, x)
    return x
end function


QLT, implementation

function LEAVESTOROOT(n, Q̄∗, a, l, u, w)
    if n.kids = ∅ then
        n.data ← (Q̄∗[n.id], a[n.id], l[n.id], u[n.id], w[n.id])      ▷ List of 5 scalars
    else
        for k in n.kids do LEAVESTOROOT(k, Q̄∗, a, l, u, w)
        n.data ← Σ_{k ∈ n.kids} k.data                                ▷ Element-wise sum of kids' lists
    end if
end function

function ROOTTOLEAVES(n, b_n, x)
    if n.kids = ∅ then
        x[n.id] ← b_n
    else
        Q̄∗_n, a_n, l_n, u_n, w_n ← ()                                 ▷ Initialize 5 empty lists
        for k ∈ n.kids do                                             ▷ Fill the lists
            append entries of k.data to Q̄∗_n, a_n, l_n, u_n, w_n, respectively
        end for
        if e · w_n = 0 then w_n ← a_n
        Q̄◦_n ← b_n + n.data[0]
        x_n ← RECONSTRUCTSAFELY(Q̄∗_n, Q̄◦_n, a_n, l_n, u_n, w_n, SELECTX)
        for i ← 1, LENGTH(n.kids) do ROOTTOLEAVES(n.kids[i], x_n[i], x)
    end if
end function
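The tree traversal in the QLT pseudocode can be sketched in plain Python. This is a rough illustration under stated assumptions, not the Compose implementation: `Node` is a hypothetical tree class, and `solve_node` is a simplified stand-in for RECONSTRUCTSAFELY with SELECTX that handles only the feasible case (e · l ≤ b ≤ e · u at the root) and ignores a and w. Any solver with the same argument list could be passed in its place.

```python
class Node:
    """Hypothetical tree node: leaves carry a cell index, others carry kids."""

    def __init__(self, leaf_id=None, kids=()):
        self.id = leaf_id
        self.kids = list(kids)
        self.data = None  # [Q*, a, l, u, w], summed over the subtree


def solve_node(q_star, q_total, a, l, u, w):
    # Simplified stand-in for RECONSTRUCTSAFELY: feasible case only.
    b = q_total - sum(q_star)
    xbar = [min(max(0.0, li), ui) for li, ui in zip(l, u)]
    m = b - sum(xbar)
    if m == 0:
        return xbar
    cap = [(ui - xi) if m > 0 else (xi - li)
           for li, ui, xi in zip(l, u, xbar)]
    total = sum(cap)
    return [xi + m * c / total for xi, c in zip(xbar, cap)]


def leaves_to_root(n, q_star, a, l, u, w):
    if not n.kids:
        i = n.id
        n.data = [q_star[i], a[i], l[i], u[i], w[i]]  # 5 scalars
    else:
        for k in n.kids:
            leaves_to_root(k, q_star, a, l, u, w)
        # Element-wise sum of the kids' 5-entry lists.
        n.data = [sum(col) for col in zip(*(k.data for k in n.kids))]


def root_to_leaves(n, b_n, x, solve):
    if not n.kids:
        x[n.id] = b_n
        return
    # One list per quantity, with one entry per kid.
    qs, an, ln, un, wn = (list(col) for col in zip(*(k.data for k in n.kids)))
    if sum(wn) == 0:
        wn = an
    q_total_n = b_n + n.data[0]
    xn = solve(qs, q_total_n, an, ln, un, wn)
    for kid, xi in zip(n.kids, xn):
        root_to_leaves(kid, xi, x, solve)


def qlt(q_star, q_total, a, l, u, root, w=None, solve=solve_node):
    if w is None:
        w = a
    leaves_to_root(root, q_star, a, l, u, w)
    x = [0.0] * len(q_star)
    b = q_total - root.data[0]
    root_to_leaves(root, b, x, solve)
    return x


# Example: four cells under a two-level binary tree.
root = Node(kids=[Node(kids=[Node(0), Node(1)]),
                  Node(kids=[Node(2), Node(3)])])
x = qlt([1.0, 2.0, 3.0, 4.0], 10.2, [1.0] * 4,
        [-0.2, 0.1, -0.3, -0.1], [0.2, 0.3, 0.0, 0.1], root)
```

Each interior node solves a small problem over its kids' aggregated bounds, so the increment handed to a kid always lies within what that kid's subtree can absorb. In the example, cell 1 starts below its lower bound (l > 0) and is pushed up into range while the total increment still matches the requested 0.2.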


Resolution: DCMIP2016 Baroclinic Instability

Configuration: theta-l, nonhydrostatic mode, moist, ne = 30, tstep = 300, rsplit×qsplit = 6

Eulerian at left; SL at right

(a) qv, level 20, day 30 (b) qv, level 30, day 29

(c) Toy chemistry tracer, level 30, day 30 (d) Toy chemistry diagnostic, level 30, day 15


p-refined tracer transport

[Figure: three panels (Gaussian Hills, Cosine Bells, Slotted Cylinders) showing log2 of the l2 relative error versus ne (2, 4, 8, 16, 32, 64, 128). Legend: np 4, np 6, np 11; QLT, No limiter; Interpolated v, Exact v.]
