OpenFOAM on POWER8

Dr. Markus Bühler

POWER Acceleration and Design Center
IBM Deutschland Research & Development GmbH, Boeblingen

buehler@de.ibm.com


©2015 IBM Corporation, 14 October 2015

Agenda

• OpenFOAM Overview
• The Benchmarks
• Performance Results
• OpenFOAM with GPU
• Summary


OpenFOAM Overview


• Open Source Field Operation and Manipulation
• C++ toolbox for the simulation of mechanical problems
  − Solvers
  − Pre-/post-processing utilities
  − MPI parallelized
  − No OpenMP
  − No vector intrinsics
• Widely used in industry and academia
  − Computational Fluid Dynamics (CFD)
    • Aerodynamics simulations
    • Pharmaceutical industry
  − Electromagnetism
  − Combustion
  − ...
• Many different modules
  − simpleFoam – static (steady-state) analysis
  − pisoFoam – dynamic (transient) analysis
• Known to be memory bandwidth limited

http://www.openfoam.com


The Benchmarks


OpenFOAM on POWER8

• OpenFOAM 2.3.0 compiled on x86 and POWER8
  − Only minor configuration changes needed to compile on POWER8
  − Optimizations: see the Optimizations for POWER8 slide
• Benchmark 1: Motorbike example
  − Provided with the OpenFOAM examples: incompressible/simpleFoam
  − Different problem sizes by changing the grid: 1k – 100M points
• Benchmark 2: Car
  − Car model
  − Morphed from two real cars
  − Only a few problem sizes


Hardware Configurations

                          IBM                      Intel Ivybridge              Intel Haswell
Processor                 P8                       E7-4890V2                    E5-2699V3
Clock                     3.4 GHz                  2.8 GHz                      2.3 GHz
Cores / Threads (max)     10 / 80                  15 / 30                      18 / 36
Memory                    128 GB                   96 GB                        384 GB
Memory BW                 230 GB/s                 85 GB/s                      68 GB/s
Caches L1 / L2 / L3 / L4  640k / 5M / 80M / 128M   480k / 3.75M / 37.5M / n/a   576k / 4.6M / 45M / n/a
Linux                     Ubuntu 14.04             Red Hat 6.5                  Red Hat 7.1
Compiler                  GCC 4.8.2                GCC 4.7.2                    GCC 4.8.3

All values per socket


MotorBike Benchmark Overview (1)

9 Steps

1. surfaceFeatureExtract
2. blockMesh
3. decomposePar
4. snappyHexMesh
5. patchSummary
6. potentialFoam
7. simpleFoam
8. reconstructParMesh
9. reconstructPar

[Figure: Runtime Distribution. y-axis: Runtime [s], 0 to 16000; legend: reconstructPar, reconstructParMesh, simpleFoam, potentialFoam, patchSummary, snappyHexMesh.]

• simpleFoam dominates the runtime
• Other steps are not always required
• Concentrate on simpleFoam
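The nine steps can be chained in a small driver script. The following is a minimal sketch, not the script used for the benchmark. It assumes the OpenFOAM utilities are on `PATH`, that the steps from snappyHexMesh through simpleFoam run under `mpirun` with OpenFOAM's `-parallel` option, and that the remaining steps run serially. The serial/parallel split, the names `build_pipeline` and `run_pipeline`, the case directory and the rank count are illustrative assumptions; the tutorial's own run script may pass further options.

```python
import subprocess

# The nine benchmark steps, in the order listed on the slide.
STEPS = [
    "surfaceFeatureExtract",
    "blockMesh",
    "decomposePar",
    "snappyHexMesh",
    "patchSummary",
    "potentialFoam",
    "simpleFoam",
    "reconstructParMesh",
    "reconstructPar",
]

# Assumption: these steps work on the decomposed case and run under MPI.
PARALLEL_STEPS = {"snappyHexMesh", "patchSummary", "potentialFoam", "simpleFoam"}


def build_pipeline(n_ranks):
    """Return one argv list per step; parallel steps are wrapped in mpirun."""
    commands = []
    for step in STEPS:
        if step in PARALLEL_STEPS:
            commands.append(["mpirun", "-np", str(n_ranks), step, "-parallel"])
        else:
            commands.append([step])
    return commands


def run_pipeline(case_dir, n_ranks, dry_run=True):
    """Print each command; execute it in case_dir only when dry_run is False."""
    for argv in build_pipeline(n_ranks):
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, cwd=case_dir, check=True)


if __name__ == "__main__":
    # Dry run: only prints the nine commands. "motorBike" is a hypothetical path.
    run_pipeline("motorBike", n_ranks=40)
```

For a real run, the number of subdomains configured for decomposePar has to match the rank count passed to `mpirun`.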


MotorBike Benchmark Overview (2)

OpenFOAM Example

① Start

② blockMesh
  Mesh the 3D space
  • 20 x 8 x 8 = 1280 cells (default)
  • …
  • 857 x 343 x 343 = 100M cells

③ decomposePar
  Divide into submeshes

[Figure: blockMesh and decomposePar illustrated on a grid with x, y, z axes.]
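The problem sizes quoted for blockMesh follow directly from the number of cells per direction, and decomposePar then spreads those cells over the submeshes. The sketch below only reproduces that arithmetic for the background mesh, before snappyHexMesh refines it. The function names and the choice of 40 subdomains (10 cores x 4 threads on one P8 socket) are assumptions for illustration.

```python
def cell_count(nx, ny, nz):
    """Number of hexahedral cells in a structured nx x ny x nz block."""
    return nx * ny * nz


def cells_per_subdomain(total_cells, n_subdomains):
    """Average number of cells each submesh receives after decomposition."""
    return total_cells / n_subdomains


default_cells = cell_count(20, 8, 8)       # 1280, the tutorial default
largest_cells = cell_count(857, 343, 343)  # 100825193, quoted as 100M

print(default_cells, largest_cells)

# Assumption: 40 subdomains, i.e. one per hardware thread in use on a P8 socket.
print(round(cells_per_subdomain(largest_cells, 40)))
```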


MotorBike Benchmark Overview (3)

④ snappyHexMesh

Adapt the mesh to fit the object (dynamic grid)

(Reference) http://www.rccm.co.jp/icem/pukiwiki/index.php?SnappyHexMesh


MotorBike Benchmark Overview (4)

⑦ simpleFoam

• One of the main solvers included in OpenFOAM
• Semi-Implicit Method for Pressure-Linked Equations (SIMPLE): computes velocity and pressure iteratively
• Benchmark: run 500 iterations
• Outcome: aerodynamic coefficients


Optimizations for POWER8

• Compiled with -O3
• Linked with tc_malloc instead of the standard malloc
  − simpleFoam: almost no impact
  − reconstructParMesh: up to 2x speedup
• Threads per core
  − P8 offers up to 8 threads per core
  − Runtime impact: see the P8 Runtime vs. Threads figure
  − Chose 4 threads per core unless stated otherwise
  − x86: little impact on runtime → use 1 thread per core

[Figure: P8 Runtime vs. Threads, 50M cells. x-axis: Threads per Core, 0 to 10; y-axis: Runtime [s], 0 to 14000.]
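Because OpenFOAM is parallelized with MPI only, the thread settings translate into a number of MPI ranks per socket. The sketch below works this out from the core counts in the hardware configuration table. It assumes one MPI rank per hardware thread in use, which the slides do not state explicitly; the dictionary and function names are illustrative.

```python
# Cores per socket, taken from the hardware configuration table.
CORES_PER_SOCKET = {"P8": 10, "Ivybridge": 15, "Haswell": 18}

# Threads per core chosen in the slides: 4 on P8, 1 on x86.
THREADS_PER_CORE = {"P8": 4, "Ivybridge": 1, "Haswell": 1}


def ranks_per_socket(system):
    """MPI ranks needed to occupy the chosen hardware threads on one socket.

    Assumption: exactly one MPI rank is placed on each hardware thread in use.
    """
    return CORES_PER_SOCKET[system] * THREADS_PER_CORE[system]


for system in CORES_PER_SOCKET:
    print(system, ranks_per_socket(system))
```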


Performance Results


MotorBike Performance

• Running different problem sizes on P8, Ivybridge and Haswell
• Best threads/socket setting used for every data point

[Figure: Runtimes simpleFoam. x-axis: Cells, 0 to 1.E+08; y-axis: Runtime [s], 0 to 90000; series: P8, x86 Ivybridge, x86 Haswell.]

Largest case (100M cells):
• Ivybridge seems to reach its memory limit (96 GB), although it is not swapping
• Similar but less obvious effect on P8 (128 GB)
• Haswell box has more memory (256 GB) → linear behaviour

Smaller cases:
• Show memory bandwidth dependence

Conclusion
• POWER8 is 1.5-3x faster than x86
• Likely due to
  − Memory bandwidth: 3x
  − Caches: 2x
  − SMT
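The rounded hardware factors in the conclusion (memory bandwidth 3x, caches 2x) can be checked against the per-socket values in the hardware configuration table. The sketch below is only that arithmetic; using the L3 size as the cache figure is an assumption, and the names are illustrative.

```python
# Per-socket values from the hardware configuration table.
MEMORY_BW_GBS = {"P8": 230, "Ivybridge": 85, "Haswell": 68}
L3_CACHE_MB = {"P8": 80, "Ivybridge": 37.5, "Haswell": 45}


def advantage(values, system, baseline="P8"):
    """Ratio of the baseline's value to the given system's value."""
    return values[baseline] / values[system]


for system in ("Ivybridge", "Haswell"):
    bw = advantage(MEMORY_BW_GBS, system)
    l3 = advantage(L3_CACHE_MB, system)
    print(f"P8 vs {system}: memory bandwidth {bw:.1f}x, L3 cache {l3:.1f}x")
```

The bandwidth ratios come out at roughly 2.7x and 3.4x and the L3 ratios at roughly 2.1x and 1.8x, which matches the rounded factors quoted on the slide.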


Car Performance

Industrial Testcase

[Figure: simpleFoam Runtime [s]. x-axis: 9M and 40M cells; y-axis: 0 to 16000; series: P8, Ivybridge; speedup labels: 1.9x, 2.4x, 2.4x, 1.7x.]

Conclusion
• Similar behaviour to motorBike
  − Speedup 1.5 – 2.5x
  − Larger advantage for P8 for larger cases


Dynamic Simulation

• pisoFoam: simulates transient behaviour
• Runtimes even longer than simpleFoam
  − Hinders industrial use

[Figure: pisoFoam. x-axis: 9M, 25M, 40M cells; step counts listed under the chart: 1000 steps, 300 steps, 300 steps; y-axis: 0 to 30000; series: P8, Ivybridge; speedup labels: 2.4x, 1.9x, 2.4x, 2.2x, 1.7x, 2.2x.]

Conclusion
• Similar speedup as simpleFoam


Scaling

• All previous runs on a single socket
• Multinode runs with
  − 2 sockets per node
  − 2 threads/socket
  − OpenMPI 1.8.3
  − InfiniBand EDR with 100 Gb/s

[Figure: Multinode Runtime. x-axis: Cells, 1.E+04 to 1.E+08 (log scale); y-axis: Runtime [s], 100 to 100000 (log scale); series: 0.5 Nodes, 1 Node, 2 Nodes. Annotation: adaptive mesh increases number of cells.]

Conclusion
• Good scaling
  − More nodes to be run in the future
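One common way to put a number on "good scaling" is parallel efficiency: the measured speedup divided by the increase in resources. The slides give no runtimes in text form, so the sketch below only defines the metric; the runtimes in the example are placeholders, not measurements from the benchmark, and the function names are illustrative.

```python
def speedup(t_ref, t_n):
    """Speedup of a run taking t_n seconds relative to a reference run."""
    return t_ref / t_n


def parallel_efficiency(t_ref, nodes_ref, t_n, nodes_n):
    """Speedup divided by the increase in node count (1.0 = ideal scaling)."""
    return speedup(t_ref, t_n) / (nodes_n / nodes_ref)


# Placeholder runtimes in seconds, NOT taken from the slides.
t_half_node, t_two_nodes = 8000.0, 2500.0

# Going from 0.5 nodes (one socket) to 2 nodes is a 4x increase in resources.
print(parallel_efficiency(t_half_node, 0.5, t_two_nodes, 2))
```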


OpenFOAM on GPU


AeroFluidX: GPU plugin for OpenFOAM

• Developed by FluiDyna
• Ported to POWER8 in collaboration with PADC Boeblingen
• Implements solver & matrix assembly
  − MPI communication by host
  − I/O and data preparation by OpenFOAM
  − Based on AmgX from NVIDIA
• 95% of total runtime on GPU → runtime independent of host
• Still under development
• First results indicate performance of 1 K80 ~ 1 P8 socket

                           P8          K80
mem. bandwidth [GB/s]      230         2x240
GPU memory [GB]            n/a         2x12
CPU-GPU bandwidth [GB/s]   16

Future Systems

                           CPU         GPU        Improvement
mem. bandwidth [GB/s]      ~constant   1 TB/s     2x
GPU memory [GB]            n/a         32         1.3x
CPU-GPU bandwidth [GB/s]   80                     5x
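The improvement factors for future systems follow from dividing the future figures by the current K80 figures. The sketch below reproduces that arithmetic. It assumes that the improvement factors compare the future values with the K80 totals (2x240 GB/s and 2x12 GB) and that 1 TB/s is taken as 1000 GB/s; the dictionary names are illustrative.

```python
# Current figures from the table; a K80 board carries two GPUs.
current = {
    "gpu_memory_bandwidth_gbs": 2 * 240,  # 480 GB/s in total
    "gpu_memory_gb": 2 * 12,              # 24 GB in total
    "cpu_gpu_bandwidth_gbs": 16,
}

# Future-system figures from the table (assumption: 1 TB/s = 1000 GB/s).
future = {
    "gpu_memory_bandwidth_gbs": 1000,
    "gpu_memory_gb": 32,
    "cpu_gpu_bandwidth_gbs": 80,
}

improvement = {key: future[key] / current[key] for key in current}

for key, factor in improvement.items():
    print(f"{key}: {factor:.1f}x")
```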


Summary


Summary & Outlook

• POWER8: up to 3x better performance than x86 on typical benchmarks
  − Steady-state: simpleFoam
  − Transient: pisoFoam
  − Contributing factors: memory bandwidth, cache sizes, SMT
• OpenFOAM scales well on P8 with up to 2 nodes
  − More tests with a larger cluster ongoing
• GPU acceleration with POWER8 possible
  − Becomes interesting with future systems
  − Better concepts possible with Pascal and NVLink
• Looking into further optimizations with XLC
• Evaluation with new 822LC


Acknowledgments

• Tsuyoshi Kamenoue, IBM Japan: Motorbike benchmark overview
• Peter Kaltstein, IBM R&D Germany: Multinode simulations
• Björn Landmann, FluiDyna: AeroFluidX port and GPU simulations


OpenFOAM (V2.4.0) Command List

root@s824l01:~/OpenFOAM/OpenFOAM-2.4.0/platforms/linuxPPC64leGccDPOpt/bin# ls

Co combinePatchFaces foamToEnsight nonNewtonianIcoFoam rhoSimpleFoam surfaceHookUp

DPMFoam compressibleInterDyMFoam foamToEnsightParts objToVTK rhoSimplecFoam surfaceInertia

LTSInterFoam compressibleInterFoam foamToGMV orientFaceZone rotateMesh surfaceLambdaMuSmooth

LTSReactingFoam compressibleMultiphaseInterFoam foamToStarMesh pPrime2 sammToFoam surfaceMeshConvert

LTSReactingParcelFoam createBaffles foamToSurface particleTracks sample surfaceMeshConvertTesting

Lambda2 createExternalCoupledPatchGeometry foamToTetDualMesh patchAverage scalarTransportFoam surfaceMeshExport

MPPICFoam createPatch foamToVTK patchIntegrate selectCells surfaceMeshImport

Mach createTurbulenceFields foamUpgradeCyclics patchSummary setFields surfaceMeshInfo

PDRFoam datToFoam foamUpgradeFvSolution pdfPlot setSet surfaceMeshTriangulate

PDRMesh decomposePar gambitToFoam pimpleDyMFoam setsToZones surfaceOrient

Pe deformedGeom gmshToFoam pimpleFoam shallowWaterFoam surfacePointMerge

Q dnsFoam icoFoam pisoFoam simpleFoam surfaceRedistributePar

R driftFluxFoam icoUncoupledKinematicParcelDyMFoam plot3dToFoam simpleReactingParcelFoam surfaceRefineRedGreen

SRFPimpleFoam dsmcFieldsCalc icoUncoupledKinematicParcelFoam polyDualMesh singleCellMesh surfaceSplitByPatch

SRFSimpleFoam dsmcFoam ideasUnvToFoam porousInterFoam smapToFoam surfaceSplitByTopology

XiFoam dsmcInitialise insideCells porousSimpleFoam snappyHexMesh surfaceSplitNonManifolds

adiabaticFlameT electrostaticFoam interDyMFoam postChannel solidDisplacementFoam surfaceSubset

adjointShapeOptimizationFoam engineCompRatio interFoam potentialFoam solidEquilibriumDisplacementFoam surfaceToPatch

ansysToFoam engineFoam interMixingFoam potentialFreeSurfaceDyMFoam sonicDyMFoam surfaceTransformPoints

applyBoundaryLayer engineSwirl interPhaseChangeDyMFoam potentialFreeSurfaceFoam sonicFoam temporalInterpolate

applyWallFunctionBoundaryConditions enstrophy interPhaseChangeFoam probeLocations sonicLiquidFoam tetgenToFoam

attachMesh equilibriumCO kivaToFoam ptot splitCells thermoFoam

autoPatch equilibriumFlameT laplacianFoam reactingFoam splitMesh topoSet

autoRefineMesh execFlowFunctionObjects magneticFoam reactingParcelFilmFoam splitMeshRegions transformPoints

blockMesh expandDictionary mapFields reactingParcelFoam sprayEngineFoam twoLiquidMixingFoam

boundaryFoam extrude2DMesh mdEquilibrationFoam reconstructPar sprayFoam twoPhaseEulerFoam

boxTurb extrudeMesh mdFoam reconstructParMesh star3ToFoam uncoupledKinematicParcelFoam

buoyantBoussinesqPimpleFoam extrudeToRegionMesh mdInitialise redistributePar star4ToFoam uprime

buoyantBoussinesqSimpleFoam faceAgglomerate mergeMeshes refineHexMesh steadyParticleTracks viewFactorsGen

buoyantPimpleFoam financialFoam mergeOrSplitBaffles refineMesh stitchMesh vorticity

buoyantSimpleFoam fireFoam mhdFoam refineWallLayer streamFunction vtkUnstructuredToFoam

cavitatingDyMFoam flattenMesh mirrorMesh refinementLevel stressComponents wallFunctionTable

cavitatingFoam flowType mixtureAdiabaticFlameT removeFaces subsetMesh wallGradU

cfx4ToFoam fluent3DMeshToFoam modifyMesh renumberMesh surfaceAdd wallHeatFlux

changeDictionary fluentMeshToFoam moveDynamicMesh rhoCentralDyMFoam surfaceAutoPatch wallShearStress

checkMesh foamCalc moveEngineMesh rhoCentralFoam surfaceBooleanFeatures wdot

chemFoam foamDataToFluent moveMesh rhoLTSPimpleFoam surfaceCheck writeCellCentres

chemkinToFoam foamDebugSwitches mshToFoam rhoPimpleDyMFoam surfaceClean writeMeshObj

chtMultiRegionFoam foamFormatConvert multiphaseEulerFoam rhoPimpleFoam surfaceCoarsen yPlusLES

chtMultiRegionSimpleFoam foamHelp multiphaseInterDyMFoam rhoPimplecFoam surfaceConvert yPlusRAS

coalChemistryFoam foamInfoExec multiphaseInterFoam rhoPorousSimpleFoam surfaceFeatureConvert zipUpMesh

coldEngineFoam foamListTimes netgenNeutralToFoam rhoReactingBuoyantFoam surfaceFeatureExtract

collapseEdges foamMeshToFluent noise rhoReactingFoam surfaceFind