vps and scalability on arm · •vps –major ... • few and cheap operations on large global...

26
1 www.esi-group.com Copyright © ESI Group, 2019. All rights reserved. VPS and Scalability on ARM Arm HPC User’s Group at ISC’19 Greg Samonds and Thorsten Queckboerner ESI Group

Upload: others

Post on 20-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

1www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

VPS and Scalability on ARM

Arm HPC User’s Group at ISC’19

Greg Samonds and Thorsten Queckboerner

ESI Group

Page 2: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

2www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Agenda

VPS and Scalability on ARM

• Overview and introduction

• VPS – Major characteristics and typical applications

• Tuning activities – A historical overview

• HPC relevant code characteristics

• VPS explicit on ARM

• Compiler version comparison – Runtime and compilation speed

• Compiler options and runtime

• Scalability VPS explicit on ARM

Page 3: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

3www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Agenda

VPS and Scalability on ARM

• Overview and introduction

• VPS – Major characteristics and typical applications

• Tuning activities – A historical overview

• HPC relevant code characteristics

• VPS explicit on ARM

• Compiler version comparison – Runtime and compilation speed

• Compiler options and runtime

• Scalability VPS explicit on ARM

Page 4: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

4www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

What our Customers are doing

VPS – Virtual Performance Solution

• Characteristics of VPS (formerly also called PAM-CRASH)

• Structural mechanics FE solver

• Explicit solution scheme

• Implicit solution schemes

• General purpose Code with a strong focus on Automotive Virtual Prototyping

• Typical Customer applications

• Crash and Safety simulation

• Dummies and Human models

• Airbag unfolding with gas dynamics (embedded FPM CFD solver)

• Static load cases

• Implicit dynamics, NVH, acoustics

Page 5: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

5www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Virtual Prototyping - Crash Analysis

VPS – Virtual Performance Solution

Page 6: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

6www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

VPS – Virtual Performance SolutionVirtual Prototyping – Safety and Dummies

• Objective

• Virtual assessment of injury risks and safety ratings

• Realistic representation of deformation, injury, kinematics and sensor readings needed

• Dummy development and calibration

• Set up of experimental program for calibration of numerical model

• Validation of material, component and full body

Full body barrier tests

Source: http://www.mynrma.com.au/motoring-services/reviews/ancap/small-cars/mitsubishi-mirage-2013-ancap.htm

Injury risk example

Source: http://www.euroncap.com/en

Safety rating example

Source: https://www.safetywissen.com

Test setup example

Material tests

Component tests Full body pendulum tests

Page 7: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

7www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Virtual Prototyping - Airbag Modelling

VPS – Virtual Performance Solution

Driver airbag unfolding out of a steering wheel

Driver airbag unfolding out of a steering wheel (cut view)FPM velocity vectors

Page 8: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

8www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Virtual Prototyping - Statics & Dynamics & Acoustics

VPS – Virtual Performance Solution

• ResultsStatic loading

Displacements Stresses Torsion angle along vehicle length

Bending Test

Deformation/Strength

Indentation Test(Oil-Canning)

Buckling/ Remaining deformation

Misuse

Forces of retainer clipsStresses/

Remaining deformation

Creep

Remanent deformation after climate chamber

t

T

RT

90°C

Page 9: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

9www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Virtual Prototyping - Statics & Dynamics & Acoustics

VPS – Virtual Performance Solution

DYNAMICS

• Objective

• Improve structural dynamic strength

• Increase comfort during operation

• Noise

• Vibrations

• Methodology

• Linear dynamics

• Structural vibrations (FE)

• Acoustics in/outside of cavity (FE/BEM)

• Coupled vibro-acoustics (FE-FE, FE-BEM)

Eigenfrequencies/-modes

Resonance frequencies

Local dynamic stiffness

Frequency dependent stiffness

Coupled fluid-structure-trim

Interior/exterior sound pressure level Transfer path analysis

Page 10: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

10www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Usage at Customers/HPC-Centers/Partners

VPS – Virtual Performance Solution

• Automotive OEM: HPC usage (12 months):

• VPS-explicit 57 %

• openfoam 28 %

• foamapps 6 %

• starccm 4 %

• abaqus 2 %

• mentalray < 1 %

• starcd < 1 %

• nastran < 1 %

• vectris < 1 %

• aura < 1 %

• caebench < 1 %

Page 11: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

11www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Agenda

VPS and Scalability on ARM

• Overview and introduction

• VPS – Major characteristics and typical applications

• Tuning activities – A historical overview

• HPC relevant code characteristics

• VPS explicit on ARM

• Compiler version comparison – Runtime and compilation speed

• Compiler options and runtime

• Scalability VPS explicit on ARM

Page 12: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

12www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Tuning Activities

VPS – Virtual Performance Solution

• Constantly increasing requirements with regard to computing load• Model size increase

• More load cases

• New applications

• …

• Therefore, code tuning is daily business• Since more then 30 years

• On different areas• Sequential

• Vector

• SMP - OpenMP

• DMP - MPI

• With all major HPC vendors/players (alphabetic order)• CRAY, COMPAQ, Fujitsu, IBM, HP, SGI, SUN, NEC …

• AMD, ARM, INTEL …

Page 13: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

13www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Tuning Activities

VPS – Virtual Performance Solution

• Sequential tuning• Support new chip features

• Vector computer

• AVX units (x86)

• Code modifications + Compiler features

• Memory access

• Tuning on cache usage

• Direct access and data alignment

• Algorithms• Faster algorithms

• But keep memory consumption in mind

Page 14: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

14www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Tuning Activities

VPS – Virtual Performance Solution

• SMP tuning (OpenMP)• Load distribution

• Thread data grain size

• Thread load balance

• Reduce/avoid locks

• Thread and NUMA aware memory management

• DMP tuning (MPI)• Minimize number of communications: Interconnect latency

• Minimize size of messages: Interconnect bandwidth

• Minimize load unbalance• Static load unbalance (Domain decomposition)

• Dynamic load unbalance (Open issue)

Page 15: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

15www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

1980 20001990 2010

Simulation: ResearchCode: SequentialHardware: Vector (Cray, NEC)

Simulation: Industrial, verificationCode: SMP (openMP)Hardware: Risc (SGI, HP, IBM)

Simulation: PredictiveCode: DMP (MPI)Hardware: lean x86 cluster

Simulation: Daily businessCode: hybrid SMP-DMPHardware: fat x86 cluster

week

day

hour

VPS – Virtual Performance Solution

Evolution: Simulation, Software and Hardware evolution (explicit)

2020

Simulation: Daily businessCode: hybrid SMP-DMPHardware: many core cluster

Page 16: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

16www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Agenda

VPS and Scalability on ARM

• Overview and introduction

• VPS – Major characteristics and typical applications

• Tuning activities – A historical overview

• HPC relevant code characteristics

• VPS explicit on ARM

• Compiler version comparison – Runtime and compilation speed

• Compiler options and runtime

• Scalability VPS explicit on ARM

Page 17: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

17www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Portability

VPS – Virtual Performance Solution

• Language:• Std. Fortran (90 or higher)

• Std. C

• Size (Solver core only):• 8000 files

• 2.5 Million lines of code

• Supports all major OS• Unix, Linux, Windows

• Supports all major architecture• ia32, em64t, amd64, ia64, mips, pa-risc, ibm-power, ARM (work in progress)

• MPI Supports by dynamic load module• One executable – multiple MPI runtimes

• All major interconnect• Ethernet, Myrinet, Infiniband, Omnipath

Page 18: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

18www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Performance characteristics

VPS – Virtual Performance Solution

• Main code sections (for typical explicit crash/safety simulation):• Internal forces (FE) 50-65%

• Cache friendly (scatter-gather to local arrays)

• Many operations with few global data

• Mainly direct addressing on static data

• Scales very well for SMP and DMP

• Explicit time integration 15-20%• Few and cheap operations on large global arrays

• Direct addressing on static data

• Most demanding option for memory bandwidth

• Scalability in DMP and SMP limited by hardware

• Contact 15-40%• Indirect addressing on unstructured dynamic data

• Difficult for SMP, very difficult for DMP

• Scales reasonable well for DMP and SMP

Page 19: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

19www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Performance characteristics

VPS – Virtual Performance Solution

• A Crash/Safe simulation model is a big challenge for HPC and parallelization:

• Very feature rich and inhomogeneous input definition

• All kind of elements

• All kind of material models

• Complicated contact

• Particle methods (FPM, SPH)

• Exotic safety options (slipring, airbag, MBSYS, …)

-> Flat profile – no real hotspots

• Big risk to hit

• Load unbalance (dynamic or static)

• Serial code (keep in mind Amdahl)

Page 20: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

21www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Agenda

VPS and Scalability on ARM

• Overview and introduction

• VPS – Major characteristics and typical applications

• Tuning activities – A historical overview

• HPC relevant code characteristics

• VPS explicit on ARM

• Compiler version comparison – Runtime and compilation speed

• Compiler options and runtime

• Scalability

Page 21: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

22www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Testing the latest arm64 compilers, options, and scalability

VPS Arm Porting and Compilers Study

• Compilers: ArmCC-19.0, ArmCC-19.1, ArmCC-19.2, GCC-8.3.0, GCC-9.1.0

• Options: Flags recommended by PRACE Best Practices Guide, flags recommended by hardware provider

• Scalability: Up to 96 cores

• Test cases: Neon and Lupo explicit crash simulation

Page 22: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

23www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Machine configuration and compiler options

Test Setup

Configuration Optionsarmcc-19.0-pflag "-march=armv8.2-a -mcpu=cortex-a72 -O3 -fcray-pointer -fopenmp -fomit-frame-pointer"

armcc-19.0-hflag"-march=armv8.2-a+lse+crc+crypto+fp+simd -mcpu=cortex-a72 -O3 -ffp-contract=fast -funroll-loops -fcray-pointer -fopenmp -fomit-frame-pointer"

armcc-19.1-pflag "-march=armv8.2-a -mcpu=tsv110 -O3 -fcray-pointer -fopenmp -fomit-frame-pointer"

armcc-19.1-hflag"-march=armv8.2-a+lse+crc+crypto+fp+simd -mcpu=tsv110 -O3 -ffp-contract=fast -funroll-loops -fcray-pointer -fopenmp -fomit-frame-pointer"

armcc-19.2-pflag "-march=armv8.2-a -mcpu=thunderx2t99 -O3 -fcray-pointer -fopenmp -fomit-frame-pointer"

armcc-19.2-hflag"-march=armv8.2-a+lse+crc+crypto+fp+simd -mcpu=thunderx2t99 -O3 -ffp-contract=fast -funroll-loops -fcray-pointer -fopenmp -fomit-frame-pointer"

gcc-8.3.0-pflag"-march=armv8.2-a+crc+simd -mtune=cortex-a72 -O3 -fcray-pointer -fopenmp -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -fomit-frame-pointer"

gcc-8.3.0-hflag "-march=armv8.2-a+crc+simd -mtune=cortex-a72 -O3 -fcray-pointer -fopenmp -fbacktrace -ffpe-summary=none"

gcc-9.1.0-pflag"-march=armv8.2-a -mtune=tsv110 -O3 -fcray-pointer -fopenmp -floop-optimize -falign-loops -falign-labels -falign-functions -falign-jumps -fomit-frame-pointer"

gcc-9.1.0-hflag "-march=armv8.2-a+crc+simd -mtune=tsv110 -O3 -fcray-pointer -fopenmp -fbacktrace -ffpe-summary=none"

CPU RAM OS / Kernel MPI Implementation VPS Version

2 X Hi1620 48p (Kunpeng 920)

256GB DDR4 2666MHz

EL 7.6 / 4.14.0-115 HPC-X 2.3.0 (OpenMPI 4.1.0a1)

2018

Page 23: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

24www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Compiled executable runtime comparison, 96p, 5 run average

Test Results

249,94251,42

227,96229,36

226,52 227,12

251,7 251,12

244,76 244,96

210

215

220

225

230

235

240

245

250

255

Elap

sed

Tim

e (s

)

Configuration

VPS Lupo Simulation

163,98 164,38

153,06 153,48

149,18

151,36

154,16 154,72153,88

152,76

140

145

150

155

160

165

170

Elap

sed

Tim

e (s

)

Configuration

VPS Neon Simulation

Page 24: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

25www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Scalability and compilation speed

Test Results

76,7767359980,8189224

0

10

20

30

40

50

60

70

80

90

100

Neon Lupo

Scal

abili

ty %

Case

VPS 48p to 96p Scalability

350

322 318

167 173

0

50

100

150

200

250

300

350

400

ArmCC-19.0 ArmCC-19.1 ArmCC-19.2 GCC-8.3.0 GCC-9.1.0

Tim

e (s

)

Compiler

VPS Compilation Time

Page 25: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

26www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved.

Observations from experiment

Summary

• Meaningful improvement in runtime with ArmCC-19.1/19.2 compared to 19.0

• ArmCC 19.2 executable was faster than GCC 9.1.0, with the margin depending on test case (Neon ~2%, Lupo ~8%)

• Significantly faster compilation speed with GCC compared to ArmCC

• Very small difference in runtime across tested compiler option sets, usually ~1% (within scatter)

• Scaling is limited by our machine’s memory bandwidth. Previously observed ~85% (+10%) scaling up to 96 procs with Neon case on machine with faster RAM

• Promising overall performance, Huawei machine produced similar runtime to dual socket Xeon Gold 6150 machine

Page 26: VPS and Scalability on ARM · •VPS –Major ... • Few and cheap operations on large global arrays • Direct addressing on static data • Most demanding option for memory bandwidth

27www.esi-group.com

Copyright © ESI Group, 2019. All rights reserved. Arm HPC User’s Group at ISC 19 Greg Samonds and Thorsten Queckboerner

Thanks