gpu computing with cuda lecture 9 - applications - cfdgpu computing with cuda lecture 9 -...
TRANSCRIPT
![Page 1: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/1.jpg)
GPU Computing with CUDALecture 9 - Applications - CFD
Christopher Cooper Boston University
August, 2011UTFSM, Valparaíso, Chile
1
![Page 2: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/2.jpg)
Outline of lecture
‣Overview of CFD
- Navier Stokes equations
- Types of problems
- Discretization methods
‣ “Conventional” CFD
‣ Port CFD codes to CUDA
‣ Efforts
‣ Example problem: implicit heat transfer
2
![Page 3: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/3.jpg)
‣Numerical modeling of fluid systems
‣Navier-Stokes equation: momentum conservation
‣ Type of problems:
- Incompressible
- Compressible (non-viscous approximation)
- Shallow water
- Biphasic flows....3
CFD - Introduction
!u!t
+ (u ·!)u = "!p
"+ #!2u
![Page 4: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/4.jpg)
‣ Earliest: Richardson (1910)
- Human computers
- Quickest averaged 2000 operations a week
‣CFD development tied with computers!
- 50s-60s: use of digital computers, finite difference methods
- 70s: finite element methods, spectral methods
- 80s: finite volume methods
- 90s: application to diverse industries
4
CFD - Introduction
![Page 5: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/5.jpg)
CFD - Main discretization methods
‣ Finite difference
‣ Finite volume
5
!u
!x=
ui+1,j ! ui,j
!x
!
!t
!
!
"Ud! +"
!!
"Fnds = 0
!"U
!t+
! "F
!x= 0
![Page 6: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/6.jpg)
‣ Finite element method
‣ Spectral methods
6
CFD - Main discretization methods
!
!!u ·!v dx =
!
!fv dx.
!!u
!x= iku
![Page 7: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/7.jpg)
‣Mesh free methods
- Smoothed Particle Hydrodynamics
- Vortex methods
- Radial Basis Functions
- ...
7
CFD - Main discretization methods
!u
!x=
N!
i=0
"i!#i
!x
![Page 8: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/8.jpg)
‣ Fluid flow is a multi-scale phenomena
- We need Re3 mesh points to reproduce all scales!
- Turbulence modeling
- Approximate turbulence effects
8
CFD - Fluid Modeling
Re =V D
!
![Page 9: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/9.jpg)
Conventional CFD
‣Unstructured grids
- Unstructured sparse matrices
‣ Incompressible
- Projection methods
‣ Implicit
- Linear solvers
‣Modeled turbulence
- Reduced number of points
9
! · u = 0
![Page 10: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/10.jpg)
‣CFD is a tough problem for the GPU:
- Memory bound problems
‣Also, needs to convince people
- Old legacy codes
- How to port old codes to the GPU?
‣On the other hand, CFD codes are
- SIMD
- Single precision
- Large data sets10
Conventional CFD
![Page 11: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/11.jpg)
Porting a code to GPU
‣Option 1: accelerate the existing code
‣Option 2: Rewrite code from scratch
‣Option 3: Rethink algorithms
11Next slides credits: J. Cohen - NVIDIA
Pot
entia
l ac
cele
ratio
n
![Page 12: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/12.jpg)
‣ Easiest way
‣ Probably not huge speedup
‣ Libraries like Cusp or CUFFT may be useful
12
Option 1: Accelerate existing code
![Page 13: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/13.jpg)
‣ SpeedIT (OpenFOAM)
- Ported linear solvers to GPU
- Supports multi-GPU
13
Option 1: Accelerate existing code - SpeedIT
Ville Tossavainen (Seeinside Ltd.)
Mesh Speedup
20x20 -100x
96x96x96 2.4x
128x128x32 2.0x
Xeon X5650 CPUM2050 GPU
![Page 14: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/14.jpg)
‣ FEAST (Finite Element Analysis and Solution Tools)
- High level abstraction approach
- Isolate “accelerable” parts of code
- Ports solver to GPU: Multigrid
14
Option 1: Accelerate existing code - FEAST
Strzodka, Goddeke, Behr (2009)
Opteron 2214 4 nodes CPUGTX 8800 GPU
Acceleration fraction: 75%Local speedup: 11.5xGlobal speedup: 3.8x
![Page 15: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/15.jpg)
Option 2: Rewrite whole code
‣ First need to think about
- What is the total application speedup that you can get
- How does rewrite compare to accelerator approach
- Good design
- What global optimizations are possible
15
![Page 16: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/16.jpg)
Option 2: Rewrite whole code - cuIBM
‣ Immersed Boundary Method on GPU (cuIBM)
- Finite difference code with immersed boundary no slip condition
- 2 linear systems: implicit diffusion and projection
- Reported speedup: 7x
16
Layton, Krishnan, Barba 2011
0
0.5
1
1.5
2
2.5
Average over 16000 timesteps
Tim
e [s
]
AXPYApply BCsConversionForce CalculationForce OutputGenerate bc1Generate r2Generate rNMMMMat!vecMem TransferOutputPreconditionerSolve 1Solve 2Transfer qTransposeUpdate BUpdate QT
![Page 17: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/17.jpg)
17
Option 2: Rewrite whole code - cuIBM
0
5
10
15
20
25
30
35
40
45
50
Tim
e [s
]
pyAMGblackbox
pyAMGSmoothed
Aggregation
CuspNon!PC
CuspDiagonal PC
CuspScaled
Bridson
CuspSmoothed
Aggregation
With good pre-conditioner, GPU is 9x faster, not much difference in other cases (best is 1.6x faster)
CPU
GPU
![Page 18: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/18.jpg)
Option 2: Rewrite whole code - Open Current
‣Developed by Jonathan Cohen in NVIDIA
‣Compared a highly optimized CPU code and GPU code
- CPU: Fortran, 8-core 2.5 GHz Xeon (8 thredas with MPI and OpenMP)
- GPU: CUDA, Tesla C1060
‣ Solved the Rayleigh-Bernard with a finite difference code
18
![Page 19: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/19.jpg)
Option 2: Rewrite whole code - Open Current
19
ResolutionCUDA
time/step msFortran
time/step msSpeedup
64x64x32 24 47 2.0x
128x128x64 79 327 4.1x
256x256x128 498 4070 8.2x
384x384x192 1616 13670 8.5x
![Page 20: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/20.jpg)
Option 3: Rethink numerical algorithms
‣Most time consuming alternative!
‣Maybe new architectures require new numerics
‣ Find methods that map well to the hardware
- Maybe we overlooked something in the past because it was impractical
20
![Page 21: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/21.jpg)
Option 3: Rethink numerical algorithms - DG
‣Discontinuous Galerkin Methods
- Arithmetically intensive
- Mainly local
‣ Klockner et al. used DG to solve conservation laws
21
GPU T1060CPU: Xeon E5472
![Page 22: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/22.jpg)
Implicit heat equation solver
‣Conventional CFD usually is dominated by Poisson type solvers
- Projection methods
- Implicit solvers to avoid stability constraints
‣Heat equation with Crank-Nicolson
- No stability constraint!
22
!u
!t= "!2u
!k
h2< 0.5
![Page 23: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/23.jpg)
Implicit heat equation solver
23
T = 200
T = 200
T = 0
T = 0
3.5
3.5
α = 0.645k = 1e-5N = 128
!u
!t= "!2u
un+1i,j ! un
i,j
k= !
2
!un
i,j+1+uni,j!1+un
i+1,j+uni!1,j!4un
i,j
h2
+un+1i,j+1+un+1
i,j!1+un+1i+1,j+un+1
i!1,j!4un+1i,j
h2
"
![Page 24: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/24.jpg)
Implicit heat equation solver
24
aun+1i,j!1 + aun+1
i,j+1 + aun+1i!1,j + aun+1
i+1,j + bun+1i,j = RHSi,j
a = ! !k
2h2b = 1! 4a = 1 +
2!k
h2
RHSi,j = uni,j +
!k
2h2
!un
i,j!1 + uni,j+1 + un
i!1,j + uni+1,j ! 4un
i,j
"!BCn+1
![Page 25: GPU Computing with CUDA Lecture 9 - Applications - CFDGPU Computing with CUDA Lecture 9 - Applications - CFD Christopher Cooper Boston University August, 2011 UTFSM, Valparaíso, Chile](https://reader030.vdocuments.us/reader030/viewer/2022040403/5e8948e7305b5d1cd5163da8/html5/thumbnails/25.jpg)
Implicit heat equation solver
25
[A] = I +!k
2h2· [Poisson]
[A]un+1 = RHS
[A] size (N-2)2 x (N-2)2
RHS size (N-2)2
un+1 size (N-2)2