![Page 1: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/1.jpg)
Design and Optimization of OpenFOAM-based CFD Applications for Hybrid and Heterogeneous HPC Platforms
Amani AlOnazi∗, David E. Keyes∗, Alexey Lastovetsky†, Vladimir Rychkov†
∗Extreme Computing Research Center, KAUST, Thuwal, Saudi Arabia
†Heterogeneous Computing Laboratory, UCD, Dublin, Ireland
May 21, 2014
Amani AlOnazi (KAUST) ParCFD 2014 May 21, 2014 1 / 31
![Page 2: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/2.jpg)
Overview
1 Introduction: OpenFOAM CFD Package; OpenFOAM Applications
2 Performance Analysis: laplacianFoam; icoFoam; Conjugate Gradient
3 Proposed Optimizations: Hybrid CG Solver; Hybrid Pipelined CG Solver; Heterogeneous Decomposition
4 Experimental Results: Heterogeneous Decomposition; Speedup icoFoam; Speedup laplacianFoam; Hybrid Solvers Performance
5 Conclusion
![Page 4: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/4.jpg)
Motivation
Hardware changes have to be taken into account: parallelism and heterogeneity in modern hardware
Per-processor performance on heterogeneous systems
Algorithms and codes have to be redesigned
The heterogeneity of these platforms leads to several challenges, and much contemporary attention is devoted to new software solutions. This trend in HPC platforms invites a redesign of CFD packages, or of the algorithms themselves, to use these platforms efficiently.
“I would rather have today’s algorithms on yesterday’s computers than vice versa.”
![Page 5: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/5.jpg)
OpenFOAM CFD Package
Open source Field Operation And Manipulation (OpenFOAM) is a library written in C++ used to solve PDEs
It covers a wide range of solvers employed in CFD, such as Laplace and Poisson equations, incompressible flow, multiphase flow, and user-defined models
Free, open source, parallel, modular, and flexible
Large, user-driven support community
![Page 6: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/6.jpg)
OpenFOAM Parallel Approach
MPI Bulk Synchronous Parallel Model
OpenFOAM uses MPI to provide parallel multi-processor functionality
The mesh and its associated fields are divided into subdomains and allocated to separate processors (domain decomposition)
Domain Decomposition Methods
It provides different utilities for partitioning the domain, such as:
Simple
Hierarchical
METIS
SCOTCH
The communication between the subdomains is handled implicitly within the package
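As a rough illustration of domain decomposition, the sketch below assigns every cell of a structured box mesh to a subdomain by cutting each direction into near-equal blocks, in the spirit of a geometric method such as Simple. The function name `simple_decompose` and the grid sizes are hypothetical; this is not OpenFOAM code.

```python
from collections import Counter


def simple_decompose(shape, splits):
    """Map each cell (i, j, k) of a structured grid to a subdomain id.

    shape  -- number of cells in the x, y and z directions
    splits -- number of subdomains in the x, y and z directions
    """
    nx, ny, nz = shape
    px, py, pz = splits
    owner = {}
    for i in range(nx):
        for j in range(ny):
            for k in range(nz):
                # Block index of the cell along each direction
                bx = i * px // nx
                by = j * py // ny
                bz = k * pz // nz
                # Flatten the three block indices into one subdomain id
                owner[(i, j, k)] = (bx * py + by) * pz + bz
    return owner


# Hypothetical example: an 8 x 8 x 4 mesh split over 2 x 2 x 1 processors
owner = simple_decompose((8, 8, 4), (2, 2, 1))
cells_per_subdomain = Counter(owner.values())
```

Each of the four subdomains receives the same number of cells here, which is the right choice only when all processors are equally fast.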
![Page 7: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/7.jpg)
OpenFOAM Selected Applications
icoFoam: The incompressible laminar Navier-Stokes equations can be solved by icoFoam, which applies the PISO algorithm in the time-stepping loop.
$$\nabla \cdot \mathbf{u} = 0$$
$$\frac{\partial \mathbf{u}}{\partial t} + \nabla \cdot (\mathbf{u}\mathbf{u}) - \nabla \cdot (\nu \nabla \mathbf{u}) = -\nabla p$$
p: CG
u: Bi-CGSTAB
laplacianFoam: The solver is used to find the solution of the Laplacian equation. The equation contains one variable, a passive scalar, for instance, a temperature, T.
$$\frac{\partial T}{\partial t} - \nabla^2 (D_T \, T) = 0$$
T: CG
![Page 9: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/9.jpg)
Performance Analysis: Platform
A single node consisting of two sockets, each connected to a GPU device (Tesla C2050)
Each socket holds a six-core Intel Xeon X5670 CPU @ 2.93 GHz
![Page 10: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/10.jpg)
laplacianFoam: 3D Heat Equation
Heat equation test case solved over a three-dimensional slab geometry using laplacianFoam.
![Page 11: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/11.jpg)
icoFoam: Lid-driven Cavity Flow
The lid-driven cavity flow test case contains the solution of a laminar, isothermal and incompressible flow over a three-dimensional cubic geometry. The top boundary of the cube is a wall that moves in the x direction, whereas the rest are static walls.
![Page 12: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/12.jpg)
Conjugate Gradient 1 iteration
A single iteration of the conjugate gradient solver using a matrix derived from the 3D heat equation.
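For reference, one iteration of the textbook conjugate gradient method consists of a sparse matrix-vector product, two dot products (each a global reduction in a parallel run), and three vector updates. The sketch below is a minimal pure-Python version of that standard algorithm; the small tridiagonal matrix is a hypothetical stand-in for the 3D heat equation matrix, and none of the names come from OpenFOAM.

```python
def dot(u, v):
    """Dot product; in a parallel run this is a global reduction."""
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    """Dense matrix-vector product standing in for the SpMV kernel."""
    return [dot(row, v) for row in A]


def cg(A, b, tol=1e-10, max_iter=1000):
    """Textbook conjugate gradient for a symmetric positive definite A."""
    x = [0.0] * len(b)
    r = list(b)                      # r = b - A x with x = 0
    p = list(r)
    rr = dot(r, r)
    for k in range(max_iter):
        if rr ** 0.5 < tol:
            return x, k
        q = matvec(A, p)             # matrix-vector product
        alpha = rr / dot(p, q)       # dot product (reduction)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * qi for ri, qi in zip(r, q)]
        rr_new = dot(r, r)           # dot product (reduction)
        beta = rr_new / rr
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x, max_iter


# Hypothetical test problem: 1D Laplacian (2 on the diagonal, -1 off it)
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = matvec(A, [1.0] * n)             # right-hand side whose solution is all ones
x, iterations = cg(A, b)
```

The two reductions per iteration depend on each other through the vector updates, so they cannot be merged; this is the synchronization cost that the pipelined variant targets.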
![Page 14: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/14.jpg)
Proposed Optimizations
Hybrid CG Solver
Hybrid Pipelined CG Solver
Heterogeneous Decomposition
“Future computing architectures will be hybrid systems with parallel-core GPUs working in tandem with multi-core CPUs.”
![Page 15: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/15.jpg)
Hybrid CG
Multicore/MultiGPU
CUDA kernel is based on CUSP and Thrust calls.
CSR format on the GPU and Skyline format on the CPU
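To make the CSR (compressed sparse row) layout concrete, the sketch below stores a small matrix as three arrays (values, column indices, row pointers) and multiplies it by a vector. It is plain Python meant only to show the data layout; it does not use or imitate the CUSP API, and the helper names are hypothetical.

```python
def dense_to_csr(A):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in A:
        for j, a in enumerate(row):
            if a != 0:
                values.append(a)
                col_idx.append(j)
        # row_ptr[i + 1] marks where row i ends in values/col_idx
        row_ptr.append(len(values))
    return values, col_idx, row_ptr


def csr_spmv(values, col_idx, row_ptr, x):
    """Compute y = A x for a matrix stored in CSR format."""
    y = []
    for i in range(len(row_ptr) - 1):
        start, end = row_ptr[i], row_ptr[i + 1]
        y.append(sum(values[k] * x[col_idx[k]] for k in range(start, end)))
    return y


# Hypothetical 3 x 3 example
A = [[4, -1, 0],
     [-1, 4, -1],
     [0, -1, 4]]
values, col_idx, row_ptr = dense_to_csr(A)
y = csr_spmv(values, col_idx, row_ptr, [1, 2, 3])
```

Each row is processed independently, which is why the format maps naturally onto one GPU thread (or group of threads) per row.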
![Page 16: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/16.jpg)
Hybrid Pipelined CG
Recently introduced by Ghysels and Vanroose∗
Reduces the global communication to once per iteration instead of three times
Offers better scalability at the price of extra computations
r_0 = b − A x_0
w_0 = A r_0
k = 0
while ‖r_k‖_2 ≥ τ do
    λ_k = dotc(r_k, r_k)
    δ = dotc(w_k, r_k)
    q = SpMV(A, w_k)
    if (k > 0) then
        β = λ_k / λ_{k−1}
        α_k = λ_k / (δ − β λ_k / α_{k−1})
    else
        β = 0
        α_k = λ_k / δ
    end if
    z = q + β z
    s = w + β s
    p = r + β p
    x = x + α_k p
    r = r − α_k s
    w = w − α_k z
    k = k + 1
end while
∗ P. Ghysels and W. Vanroose, Hiding global synchronization latency in the preconditioned conjugate gradient algorithm, Parallel Computing, 2013.
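The pipelined CG recurrences listed on this slide can be checked with a few lines of Python. The sketch below follows the slide's unpreconditioned algorithm step by step: both dot products are computed back to back from the same vectors, so in a parallel run they could share a single reduction that overlaps with the matrix-vector product. The dense test matrix and the helper names are hypothetical, and the code is sequential, so it illustrates the arithmetic only, not the communication hiding.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(A, v):
    return [dot(row, v) for row in A]


def axpy(u, a, v):
    """Return u + a * v."""
    return [ui + a * vi for ui, vi in zip(u, v)]


def pipelined_cg(A, b, tol=1e-10, max_iter=1000):
    n = len(b)
    x = [0.0] * n
    r = list(b)                  # r_0 = b - A x_0 with x_0 = 0
    w = matvec(A, r)             # w_0 = A r_0
    z, s, p = [0.0] * n, [0.0] * n, [0.0] * n
    lam_old, alpha = 1.0, 1.0    # placeholders, not used when k == 0
    for k in range(max_iter):
        lam = dot(r, r)          # these two dot products can be fused
        delta = dot(w, r)        # into one global reduction
        if lam ** 0.5 < tol:
            return x, k
        q = matvec(A, w)         # can overlap with the reduction
        if k > 0:
            beta = lam / lam_old
            alpha = lam / (delta - beta * lam / alpha)
        else:
            beta = 0.0
            alpha = lam / delta
        z = axpy(q, beta, z)     # z = q + beta z
        s = axpy(w, beta, s)     # s = w + beta s
        p = axpy(r, beta, p)     # p = r + beta p
        x = axpy(x, alpha, p)    # x = x + alpha p
        r = axpy(r, -alpha, s)   # r = r - alpha s
        w = axpy(w, -alpha, z)   # w = w - alpha z
        lam_old = lam
    return x, max_iter


# Hypothetical test problem: 1D Laplacian with an all-ones solution
n = 5
A = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
b = matvec(A, [1.0] * n)
x, iterations = pipelined_cg(A, b)
```

Compared with standard CG, each iteration carries three extra vector updates (z, s, w), which is the extra computation traded for fewer synchronization points.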
![Page 17: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/17.jpg)
Hybrid Pipelined CG
Multicore/MultiGPU
CUDA kernel is based on CUSP and Thrust calls.
CSR format on the GPU and Skyline format on the CPU
![Page 18: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/18.jpg)
Heterogeneous Decomposition
The main idea is to partition the domain and assign each subdomain to a processor in proportion to that processor's performance: more powerful processors get larger subdomains
Combines the performance model and the METIS/SCOTCH library
The load will be balanced if the number of computations performed by each processor is in proportion to its speed when executing the kernel
$$s_i(n_i) = \frac{n_i + 2F_i}{T(n_i)}$$
$$r_i(n_i) = \frac{s_i(n_i)}{\sum_{j=1}^{p} s_j(n_j)}$$
$$n_{i,\mathrm{new}} = N \cdot r_i(n_i)$$
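The three formulas can be turned into a short rebalancing routine. The sketch below assumes that the speed is work divided by measured kernel time, that n_i is the number of cells in subdomain i, and that F_i is its number of internal faces (so n_i + 2 F_i counts the matrix nonzeros); the slide does not define F_i, so that reading is an assumption. The function name and the sample numbers are hypothetical.

```python
def rebalance(cells, faces, times):
    """Return new subdomain sizes proportional to measured processor speed.

    cells -- current cells per processor (n_i)
    faces -- assumed internal faces per processor (F_i)
    times -- measured kernel execution time per processor, T(n_i)
    """
    total_cells = sum(cells)                                  # N
    speeds = [(n + 2 * f) / t for n, f, t in zip(cells, faces, times)]
    total_speed = sum(speeds)
    # Round cumulative shares so the new sizes always add up to N
    bounds, cumulative = [0], 0.0
    for s in speeds[:-1]:
        cumulative += s
        bounds.append(round(total_cells * cumulative / total_speed))
    bounds.append(total_cells)
    return [hi - lo for lo, hi in zip(bounds, bounds[1:])]


# Hypothetical measurement: the second processor runs the kernel 4x faster
new_sizes = rebalance(cells=[1000, 1000], faces=[2900, 2900], times=[1.0, 0.25])
```

In the approach described on the slide, sizes like these would then be passed to METIS or SCOTCH as target partition weights; that step is not shown here.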
![Page 20: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/20.jpg)
Experimental Results: Heterogeneous Decomposition
[Figure: average time per iteration (y-axis, 0 to 0.02) for the Metis and Heterogeneous decomposition methods (x-axis). Series: Hybrid CG, Hybrid Pipelined CG.]
![Page 21: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/21.jpg)
Experimental Results: Speedup icoFoam I
N=100k
[Figure: speedup (y-axis, 0 to 3.5) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]
![Page 22: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/22.jpg)
Experimental Results: Speedup icoFoam II
N=1M
[Figure: speedup (y-axis, 0 to 2) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]
![Page 23: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/23.jpg)
Experimental Results: Speedup laplacianFoam I
N=100k
[Figure: speedup (y-axis, 0 to 8) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]
![Page 24: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/24.jpg)
Experimental Results: Speedup laplacianFoam II
N=1M
[Figure: speedup (y-axis, 0 to 4) versus number of processors (x-axis: 2, 4, 8, 12, 16). Series: Hybrid CG, Hybrid Pipelined CG.]
![Page 25: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/25.jpg)
Experimental Results: Performance
[Figure: performance in MFLOPs/s (y-axis, 1 to 512) versus problem size (x-axis, 10^3 to 10^7). Series: Hybrid Pipelined CG, Hybrid CG.]
![Page 26: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/26.jpg)
Experimental Results: Roofline
[Figure: roofline plot of attainable Gflop/s (y-axis, 1 to 512) versus FLOP:BYTE ratio (x-axis, 0.125 to 16). Lines: DRAM BW, PCI BW, Stream DRAM BW, Peak DP.]
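The roofline model bounds attainable performance by the smaller of the peak compute rate and the product of memory bandwidth and arithmetic intensity (FLOP:BYTE ratio). The sketch below applies that standard formula to a double-precision dot product, which does 2 flops per pair of 8-byte operands read. The peak and bandwidth figures are assumptions chosen to resemble vendor specifications for a Tesla C2050 class GPU; they are not values read from the slide.

```python
def attainable_gflops(intensity, peak_gflops, bandwidth_gbs):
    """Roofline bound: min(peak, bandwidth * arithmetic intensity)."""
    return min(peak_gflops, intensity * bandwidth_gbs)


# Assumed hardware figures (illustrative, not taken from the slides)
PEAK_DP_GFLOPS = 515.0     # double-precision peak, GFLOP/s
DRAM_BW_GBS = 144.0        # device memory bandwidth, GB/s

# Dot product of two double vectors: 2 flops per 16 bytes read
dot_intensity = 2 / 16

dot_bound = attainable_gflops(dot_intensity, PEAK_DP_GFLOPS, DRAM_BW_GBS)
compute_bound = attainable_gflops(16.0, PEAK_DP_GFLOPS, DRAM_BW_GBS)
```

Under these assumed numbers the dot product sits at the far left of the plot (intensity 0.125) and is limited by memory bandwidth to a small fraction of the peak, while a kernel with intensity 16 would reach the compute ceiling.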
![Page 28: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/28.jpg)
Conclusion
Memory-bound applications, such as the selected OpenFOAM solvers, can take better advantage of the full hardware potential, which is now complex, hybrid and heterogeneous, if all resources are taken into account in a holistic approach.
The vector reduction kernel performs n × 1 memory transactions, where n is the vector size, and cannot be combined with other operations → low arithmetic intensity, low memory throughput and poor scalability when increasing the number of GPUs/CPUs.
There is a need for dynamic load-balancing scheduling, which adaptively balances the workload at run time through memory-aware work stealing.
The experimental results show that the hybrid implementation of both solvers significantly outperforms state-of-the-art implementations of a widely used open source package.
![Page 29: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/29.jpg)
Thank You!
![Page 30: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/30.jpg)
Backup Slides.
![Page 31: Design and Optimization of OpenFOAM-based CFD Applications](https://reader033.vdocuments.us/reader033/viewer/2022050612/62741fa9f76c303914555d24/html5/thumbnails/31.jpg)
PISO Algorithm
Pressure Implicit with Splitting of Operators (PISO)*
1 Set the boundary conditions.
2 Solve the discretized momentum equation to compute an intermediate velocity field.
3 Compute the mass fluxes at the cell faces.
4 Solve the pressure equation.
5 Correct the mass fluxes at the cell faces.
6 Correct the velocities on the basis of the new pressure field.
7 Update the boundary conditions.
8 Repeat from 3 for the prescribed number of times.
9 Increase the time step and repeat from 1.
*J. H. Ferziger, M. Peric, Computational Methods for Fluid Dynamics, Springer, 3rd Ed., 2001.
H. Jasak, Error Analysis and Estimation for the Finite Volume Method with Applications to Fluid Flows, Ph.D. Thesis, Imperial College, London, 1996.
http://openfoamwiki.net