porting s3d to titan (turbulent combustion) · 2017-07-20 · • recent 3d simulation on jaguar...
![Page 1: Porting S3D to Titan (Turbulent Combustion) · 2017-07-20 · • Recent 3D simulation on Jaguar was used to extrapolate and plan a target Titan simulation • Planned simulation](https://reader033.vdocuments.us/reader033/viewer/2022042018/5e7629659a169808d478b20a/html5/thumbnails/1.jpg)
Porting S3D to Titan (Turbulent Combustion)
Application Readiness Team
G. Bansal, J. H. Chen (SNL)
R. W. Grout (NREL)
R. Sankaran, K. Spafford, W. Wang (ORNL)
M. Fatica, N. Juffa, G. Ruetsch (Nvidia)
J. Levesque, J. Schwarzmeier (Cray)
Presented by: Ramanan Sankaran
August 15, 2011
Direct numerical simulations (DNS)
• Turbulent combustion occurs over a wide range of scales
  – Device sizes are O(1 m)
  – Diffusive scales and flame thickness are O(10-100 µm)
  – Non-linear coupling and interaction among the entire range of scales
• Combustion CFD approaches
  [Figure: scale axis running from small scales to large scales, with DNS, LES and RANS marked]
• Direct numerical simulation (DNS)
  – No sub-grid models, but limited in range of scales
  – Simulations limited to canonical research configurations
DNS is hard
• DNS of combustion is highly floating-point intensive
• To increase the range of scales captured by a factor X, the computational work increases by ~X^3
• More time steps are needed for better statistics and less dependence on initial conditions
• Complex fuels require a higher number of equations per grid point
• Device-scale simulations are intractable and beyond the reach of DNS
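The cubic growth in cost can be made concrete with a few lines of arithmetic. The sketch below assumes only the ~X^3 rule of thumb quoted on this slide; the function name `relative_work` is illustrative and not part of S3D.

```python
def relative_work(scale_factor: float) -> float:
    """Relative computational work when the resolved range of scales
    grows by `scale_factor`, using the ~X^3 rule of thumb."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    return scale_factor ** 3


# Print how quickly the cost grows for a few example factors.
for x in (2, 4, 10):
    print(f"range of scales x{x:>2} -> work x{relative_work(x):,.0f}")
```

Even a modest increase in the resolved range of scales therefore multiplies the cost many times over, before accounting for longer runs or larger chemical mechanisms.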
What do we want to simulate on Titan?
[Figure: four panels with T' = 3.75 K, 7.50 K, 15.0 K and 30.0 K. Stratification increases from left to right, from homogeneous to front-like.]
Results from a 2D parametric study with hydrogen chemistry (9 chemical species), Chen et al. 2003.
• Objective: 3-dimensional DNS of HCCI combustion in a high-pressure stratified turbulent dimethyl ether (DME) blended iso-octane/air mixture using detailed chemical kinetics (60 chemical species)
• Grid: 2D O(10^6) ➟ 3D O(10^9). Chemical complexity: 9 ➟ 60 species.
• Goals:
  – Investigate the interaction of 3D turbulence with important chemical kinetic pathways leading to ignition
  – Investigate the effects of charge stratification on heat release modes, pressure rise rates, and pollutant formation
  – Generate a high-fidelity database for use as a benchmark to validate sub-grid combustion models for mixed-mode combustion in LES and RANS
Planning the science simulation
• Recent 3D simulation on Jaguar was used to extrapolate and plan a target Titan simulation
• Planned simulation will have more grid points and/or larger chemistry
• Will need a month on 12,000 hybrid nodes of Titan
Figure 5: Computational domain and grid to be used for simulations of the CRF HCCI engine.
Figure 6: Reaction and diffusion structures for OH radical during the third stage thermal explosion of a high-pressure DME fueled autoignition process.
Recent 3D DNS of auto-ignition with 30-species DME chemistry (Bansal et al. 2011)
Defined a target problem and profiled it
• A benchmark problem was defined to closely resemble the target simulation
  – 52-species n-heptane chemistry and 48^3 grid points per node
  – 48^3 * 12,000 nodes ≈ 1.3 billion grid points
• Code was benchmarked and profiled on dual-hexcore XT5
• Several kernels were identified and extracted into stand-alone driver programs
[Figure labels: Chemistry, Core S3D]
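The problem size follows directly from the per-node grid and the node count. The short calculation below is a sketch using only the numbers quoted on this slide; the variable names are illustrative, and the count of species values is an added illustration rather than a figure from the slides.

```python
POINTS_PER_EDGE = 48   # 48^3 grid points per node (benchmark definition)
NODES = 12_000         # hybrid Titan nodes planned for the run
SPECIES = 52           # size of the n-heptane mechanism

points_per_node = POINTS_PER_EDGE ** 3
total_points = points_per_node * NODES
# Illustrative only: one value per species per grid point.
species_values = total_points * SPECIES

print(f"grid points per node: {points_per_node:,}")
print(f"total grid points:    {total_points:,} (~{total_points / 1e9:.2f} billion)")
print(f"species values:       {species_values:,}")
```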
Chemistry Kernels
• Reaction rates, thermodynamic properties and transport coefficients account for 55% of time.
  – Complex chemical kinetic models are needed to address multi-stage ignition and flame dynamics
• Point-wise functions that are independent of S3D’s mesh data structure and MPI layer
  – Efforts will pay off for a long time irrespective of changes to S3D’s algorithms, data structures and solver.
• Used by other combustion codes in the community.
  – Impacts other HPC and workstation-scale combustion applications.
General library vs model specific code
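[Diagram contrasting two approaches.
General library: state input passes through a legacy Fortran library (lots of if-then, do) to produce the output.
Model-specific code: chemistry tables are parsed, analyzed and used to generate a compute kernel (Fortran/CUDA/??); the generated code maps state input to output.]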
Chemistry Code Generator
• Code generation for the chemistry routines is in vogue
  – Allows non-standard and reduced chemistry models.
  – Branches are evaluated and loops are unrolled at generation time.
  – Model constants are in place, avoiding pointer chases.
• Goals of the new code generator
  – Ensure accuracy and correctness for complex chemistry.
  – Ability to generate for newer architectures and programming models.
  – Target a wider audience using much more complex chemistry.
    • Multi-zone type calculations with thousands of species
  – Expose more levels of parallelism
    • Partition and parallelize the reaction network
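The idea of resolving branches, unrolling loops and inlining constants at generation time can be illustrated with a toy generator. This is a minimal sketch, not the team's generator: it assumes Arrhenius-form rate constants k = A * T^b * exp(-Ea / (R T)), the three-reaction table holds placeholder numbers rather than a real mechanism, and it emits Python instead of Fortran or CUDA. All function names are hypothetical.

```python
import math

R_GAS = 8.314462618  # J/(mol K), universal gas constant

# Hypothetical reaction table: (A, b, Ea). Placeholder values only.
REACTIONS = [
    (1.0e10, 0.0, 8.0e4),
    (3.5e6, 1.5, 2.5e4),
    (2.0e12, 0.0, 0.0),
]


def rates_generic(temp, table):
    """Library-style evaluation: loop and branches executed at run time."""
    out = []
    for a, b, ea in table:
        k = a
        if b != 0.0:
            k *= temp ** b
        if ea != 0.0:
            k *= math.exp(-ea / (R_GAS * temp))
        out.append(k)
    return out


def generate_rates_source(table, name="rates_generated"):
    """Emit straight-line source for one specific table."""
    lines = [f"def {name}(temp):"]
    for i, (a, b, ea) in enumerate(table):
        terms = [repr(a)]                      # constant inlined in the code
        if b != 0.0:                           # branch decided at generation time
            terms.append(f"temp ** {b!r}")
        if ea != 0.0:
            folded = -ea / R_GAS               # Ea/R folded into one constant
            terms.append(f"math.exp({folded!r} / temp)")
        lines.append(f"    k{i} = " + " * ".join(terms))
    names = ", ".join(f"k{i}" for i in range(len(table)))
    lines.append(f"    return [{names}]")
    return "\n".join(lines) + "\n"


source = generate_rates_source(REACTIONS)
namespace = {"math": math}
exec(source, namespace)                        # compile the generated kernel
rates_generated = namespace["rates_generated"]

print(source)
```

The generated function contains no loop and no conditional, so a compiler for a real target language would see a single block of arithmetic per reaction. Comparing `rates_generated(T)` against `rates_generic(T, REACTIONS)` is a simple correctness check of the kind the goals above call for.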
Need to accelerate more than chemistry
• Amdahl’s law: If you don’t accelerate everything, soon you won’t have accelerated any
[Figure: Weak scaling of target problem on Jaguar XT5. Bars for XT5 on 1 node and on 7200 nodes, each split into Core S3D and Chemistry.]
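Amdahl's law puts a number on this. The sketch below uses the profile shares reported in these slides (55% chemistry, 40% core time-advance loop); the 10x kernel speedup is a hypothetical value chosen only for illustration, and `amdahl_speedup` is an illustrative helper name.

```python
def amdahl_speedup(accelerated_fraction: float, kernel_speedup: float) -> float:
    """Overall speedup when only `accelerated_fraction` of the run time
    is sped up by a factor `kernel_speedup` (Amdahl's law)."""
    if not 0.0 <= accelerated_fraction <= 1.0:
        raise ValueError("accelerated_fraction must lie in [0, 1]")
    if kernel_speedup <= 0.0:
        raise ValueError("kernel_speedup must be positive")
    remaining = 1.0 - accelerated_fraction
    return 1.0 / (remaining + accelerated_fraction / kernel_speedup)


CHEMISTRY_FRACTION = 0.55    # share of time in chemistry kernels (from the profile)
CORE_FRACTION = 0.40         # share of time in the core time-advance loop
HYPOTHETICAL_SPEEDUP = 10.0  # assumed kernel speedup, for illustration only

chem_only = amdahl_speedup(CHEMISTRY_FRACTION, HYPOTHETICAL_SPEEDUP)
chem_limit = amdahl_speedup(CHEMISTRY_FRACTION, float("inf"))
chem_and_core = amdahl_speedup(CHEMISTRY_FRACTION + CORE_FRACTION,
                               HYPOTHETICAL_SPEEDUP)

print(f"chemistry only, 10x kernels:   {chem_only:.2f}x")
print(f"chemistry only, upper bound:   {chem_limit:.2f}x")
print(f"chemistry + core, 10x kernels: {chem_and_core:.2f}x")
```

Accelerating chemistry alone is capped at 1 / 0.45, a little over 2x, no matter how fast the kernels become, which is why the rest of the time-advance loop has to move to the accelerator as well.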
Core S3D Time Advance Loop
• Flux, derivatives, RHS and integrate account for 40% of time
  – d/dt (Q_k) = (Advection) + (Diffusion) + (Source)
  – Evaluation and accumulation of the terms on the right-hand side of the equation
  – Time integration
• Memory-intensive 3-, 4- and 5-dimensional loops and arrays
  – Low computational intensity
  – Strided and non-contiguous memory access
  – Representative of other CFD codes
• MPI communication between nearest neighbors in S3D’s 3D grid topology for halo exchange
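The structure of the time-advance loop (evaluate advection, diffusion and source terms, accumulate them into a right-hand side, then integrate) can be sketched in one dimension. This is a toy model under stated assumptions: a single scalar on a periodic grid, second-order central differences, and classical fourth-order Runge-Kutta stand in for whatever schemes S3D actually uses, which these slides do not describe. All names are illustrative.

```python
import math


def rhs(q, dx, velocity, diffusivity, source):
    """dq/dt = advection + diffusion + source on a periodic 1D grid."""
    n = len(q)
    out = [0.0] * n
    for i in range(n):
        left, right = q[i - 1], q[(i + 1) % n]   # q[-1] wraps around
        advection = -velocity * (right - left) / (2.0 * dx)
        diffusion = diffusivity * (right - 2.0 * q[i] + left) / dx ** 2
        out[i] = advection + diffusion + source(q[i])
    return out


def rk4_step(q, dt, f):
    """One classical fourth-order Runge-Kutta step for dq/dt = f(q)."""
    k1 = f(q)
    k2 = f([qi + 0.5 * dt * k for qi, k in zip(q, k1)])
    k3 = f([qi + 0.5 * dt * k for qi, k in zip(q, k2)])
    k4 = f([qi + dt * k for qi, k in zip(q, k3)])
    return [qi + dt / 6.0 * (a + 2.0 * b + 2.0 * c + d)
            for qi, a, b, c, d in zip(q, k1, k2, k3, k4)]


def advance(q, steps, dt, dx, velocity, diffusivity, source):
    """Time-advance loop: repeated RHS evaluation and integration."""
    def f(state):
        return rhs(state, dx, velocity, diffusivity, source)
    for _ in range(steps):
        q = rk4_step(q, dt, f)
    return q


n = 64
dx = 1.0 / n
q0 = [math.exp(-((i * dx - 0.5) / 0.1) ** 2) for i in range(n)]
q1 = advance(q0, steps=100, dt=0.005, dx=dx, velocity=1.0,
             diffusivity=1.0e-3, source=lambda value: 0.0)
print(f"integral before: {sum(q0) * dx:.6f}, after: {sum(q1) * dx:.6f}")
```

With the source term switched off, the integral of the field is conserved on the periodic grid, which gives a quick sanity check. Each point does only a handful of flops per neighbor value loaded, which reflects the low computational intensity noted above.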
Hybridization of S3D
• Turn S3D from a pure MPI code into a hybrid MPI+OpenMP code
  – Improve MPI scaling
  – Use OpenMP extensions for porting all of S3D to the accelerator
• S3D code revisions
  – Rearranging the compute loops and derivatives to have large compute regions interleaved with communication
  – Variable scoping: distinguishing shared global variables from thread-private variables
  – Overlapping computation and communication by identifying opportunities for pre-posting halo communication
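The overlap pattern can be sketched without any MPI at all. In the toy below, a periodic 1D field is split across pretend ranks; `post_halo_exchange` stands in for pre-posted nonblocking sends and receives (in a real code these would be calls such as MPI_Irecv, MPI_Isend and MPI_Waitall), and each rank updates its interior points before touching the halo values. The three-point stencil and all function names are assumptions made for illustration, not S3D code.

```python
def stencil(left, center, right):
    """Three-point stencil standing in for a derivative operator."""
    return right - 2.0 * center + left


def split_periodic(field, parts):
    """Cut a periodic 1D field into equal contiguous subdomains."""
    if len(field) % parts != 0:
        raise ValueError("field length must be divisible by parts")
    size = len(field) // parts
    return [field[r * size:(r + 1) * size] for r in range(parts)]


def post_halo_exchange(subdomains):
    """Stand-in for pre-posted nonblocking communication: per rank, the
    (left_ghost, right_ghost) values owned by its periodic neighbors."""
    parts = len(subdomains)
    return [(subdomains[r - 1][-1], subdomains[(r + 1) % parts][0])
            for r in range(parts)]


def update_rank(local, pending_halo):
    """Update one subdomain (needs at least two points)."""
    out = [0.0] * len(local)
    # 1) Interior points need no remote data, so they can be computed
    #    while the halo messages are still in flight.
    for i in range(1, len(local) - 1):
        out[i] = stencil(local[i - 1], local[i], local[i + 1])
    # 2) "Wait" for the halo, then finish the two boundary points.
    left_ghost, right_ghost = pending_halo
    out[0] = stencil(left_ghost, local[0], local[1])
    out[-1] = stencil(local[-2], local[-1], right_ghost)
    return out


field = [float((i * i) % 7) for i in range(12)]
subdomains = split_periodic(field, parts=3)
halos = post_halo_exchange(subdomains)          # posted before any compute
decomposed = [value
              for local, halo in zip(subdomains, halos)
              for value in update_rank(local, halo)]

# Reference: the same stencil applied to the undivided periodic field.
n = len(field)
reference = [stencil(field[i - 1], field[i], field[(i + 1) % n])
             for i in range(n)]
print("decomposed result matches reference:", decomposed == reference)
```

The larger the interior region relative to the boundary, the more communication time can be hidden, which is one motivation for rearranging loops into large compute regions.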
While we were hybridizing S3D
• XE6 came with Gemini interconnect
[Figure: Scaling of original code (XT5 vs XE6). Bars for XT5 (1 node, 7200 nodes) and XE6 (1 node and a multi-node run), each split into Core S3D and Chemistry.]
Performance of hybrid code
• Currently we cannot beat pure-MPI performance
  – Scales better, but poor in absolute time
• Needs more tuning
  – Memory affinity
  – Placement
  – Optimum number of threads
• Our vision in hybridizing S3D is to be on the right curve for exascale
[Figure: performance versus year, 2008 to 2018]
S3D rewrite is in progress
• Programming models being explored
  – CUDA (Fortran)
  – CCE !$omp directives
  – PGI !$acc directives
• We anticipate 4X better performance on XK6 Titan nodes than on XT5 Jaguar nodes
  – Combination of on-node GPU acceleration and improved parallel scaling