MPI RUNTIMES AT JSC, NOW AND IN THE FUTURE

08.07.2018 I MUG’18, COLUMBUS (OH) I DAMIAN ALVAREZ

Which, why and how do they compare in our systems?

Outline

• FZJ’s mission
• JSC’s role
• JSC’s vision for Exascale-era computing
• JSC’s systems
• MPI runtimes at JSC
• MPI performance at JSC systems
• MVAPICH at JSC

07 September 2018 !2

FZJ’s mission

[Figure: chart of FZJ institutes. Labels: JSC, INM, IBG, ICS, JCNS, IEK, IKP, ZEA]

JSC’s role

• Supercomputer operation for:
  • Centre – FZJ
  • Region – RWTH Aachen University
  • Germany – Gauss Centre for Supercomputing, John von Neumann Institute for Computing
  • Europe – PRACE, EU projects
• Application support
  • Unique support & research environment at JSC
  • Peer review support and coordination
• R&D work
  • Methods and algorithms, computational science, performance analysis and tools
  • Scientific Big Data Analytics with HPC
  • Computer architectures, Co-Design, Exascale Labs together with IBM, Intel, NVIDIA
• Education and training

[Figure: JSC structure. Labels: Communities, Research Groups, Simulation Labs, Cross-Sectional Teams, Data Life Cycle Labs, Exascale co-Design, Facilities, PADC]

Source: Jack Dongarra, 30 Years of Supercomputing: History of TOP500, February 2018

JSC will host the first Exascale system in the EU around 2022

JSC’s vision

[Figure. Labels: Extreme Scale Computing, Big Data Analytics, Deep Learning, Interactivity]

[Figure: modular system diagram.
Modules: Module 0: Storage; Module 1: Cluster; Module 2: GPU-Acc; Module 3: Many core Booster; Module 4: Memory Booster; Module 5: Data Analytics; Module 6: Graphics Booster.
Node labels: CN, BN, GA, NAM, NICMEM, DN, GN, Disk]

[Figure: timeline of JSC systems. Track labels: General Purpose Cluster, Highly scalable, Modular Pilot System, Modular Supercomputer]

• IBM Power 4+ JUMP (2004), 9 TFlop/s
• IBM Power 6 JUMP, 9 TFlop/s
• IBM Blue Gene/L JUBL, 45 TFlop/s
• IBM Blue Gene/P JUGENE, 1 PFlop/s
• HPC-FF, 100 TFlop/s
• JUROPA, 200 TFlop/s
• IBM Blue Gene/Q JUQUEEN (2012), 5.9 PFlop/s
• JURECA Cluster (2015), 2.2 PFlop/s
• JURECA Booster (2017), 5 PFlop/s
• JUWELS Cluster Module (2018), 12 PFlop/s
• JUWELS Scalable Module (2019/20), 50+ PFlop/s

JSC’s systems

JURECA

▪ Dual-socket Intel Haswell (E5-2680 v3)
  ▪ 12× cores/socket
  ▪ 2.5 GHz
  ▪ ≥ 128 GB main memory
▪ 1,884 compute nodes (45,216 cores)
  ▪ 75 nodes: 2× K80 NVIDIA GPUs
  ▪ 12 nodes: 2× K40 NVIDIA GPUs, 512 GB main memory
▪ Peak performance: 2.2 Petaflop/s (1.7 w/o GPUs)
▪ Mellanox InfiniBand EDR
▪ Connected to the GPFS file system on JUST
  ▪ ~15 PByte online disk and 100 PByte offline tape capacity

JURECA Booster

▪ Intel Knights Landing (7250-F)
  ▪ 68× cores
  ▪ 1.4 GHz
  ▪ 16 + 64 GB main memory
▪ 1,640 compute nodes (111,520 cores)
▪ Peak performance: ~5 Petaflop/s
▪ Intel OmniPath
▪ Connected to GPFS

June 2018 (Booster + JURECA): #14 in Europe, #38 worldwide, #66 in Green500
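
The quoted Booster peak of about 5 Petaflop/s can be roughly reproduced from the node count, core count and clock listed above. The sketch below is only an estimate: the value of 32 double-precision operations per core per cycle is an assumption about Knights Landing (two AVX-512 units, 8 doubles each, fused multiply-add counted as 2) and does not appear on the slide; the function and constant names are illustrative.

```python
# Rough peak-performance estimate for the JURECA Booster.
# Node count, cores per node and clock come from the slide.
# FLOP_PER_CYCLE is an assumption (not on the slide):
# 2 AVX-512 units x 8 doubles x 2 operations per fused multiply-add.
NODES = 1_640
CORES_PER_NODE = 68
CLOCK_HZ = 1.4e9
FLOP_PER_CYCLE = 32


def peak_flops(nodes, cores_per_node, clock_hz, flop_per_cycle):
    """Theoretical double-precision peak in FLOP/s."""
    return nodes * cores_per_node * clock_hz * flop_per_cycle


total_cores = NODES * CORES_PER_NODE
booster_peak = peak_flops(NODES, CORES_PER_NODE, CLOCK_HZ, FLOP_PER_CYCLE)

print(total_cores)                           # matches the 111,520 cores on the slide
print(f"{booster_peak / 1e15:.2f} PFlop/s")  # close to the quoted ~5 Petaflop/s
```

Sustained performance on real applications is normally well below such a theoretical figure.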

JUWELS

▪ Dual-socket Intel Skylake (Xeon Platinum 8168)
  ▪ 24× cores/socket
  ▪ 2.7 GHz
  ▪ ≥ 96 GB main memory
▪ 2,559 compute nodes (122,448 cores)
  ▪ 48 nodes: 4× V100 NVIDIA GPUs, 192 GB main memory
  ▪ 4 nodes: 1× P100 NVIDIA GPUs, 768 GB main memory
▪ Peak performance: 12 Petaflop/s (10.4 w/o GPUs)
▪ Mellanox InfiniBand EDR
▪ Connected to the GPFS file system on JUST

June 2018: #7 in Europe, #23 worldwide, #29 in Green500
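
As a sanity check, the core totals quoted for the three systems can be compared with the plain product of nodes and cores per node. The sketch below uses only numbers from the slides for that comparison. The final step is a hypothesis, not something the slides state: it assumes the 48 GPU nodes of JUWELS carry 40 cores each instead of 48, which would account for the difference.

```python
# (nodes, cores per node, core total quoted on the slide)
SYSTEMS = {
    "JURECA": (1_884, 2 * 12, 45_216),
    "JURECA Booster": (1_640, 68, 111_520),
    "JUWELS": (2_559, 2 * 24, 122_448),
}


def core_gap(nodes, cores_per_node, quoted_total):
    """Quoted total minus the naive product nodes * cores_per_node."""
    return quoted_total - nodes * cores_per_node


gaps = {name: core_gap(*spec) for name, spec in SYSTEMS.items()}
print(gaps)  # JURECA and Booster match exactly; JUWELS is 384 cores short

# Hypothetical explanation for JUWELS (assumption, not on the slide):
# the 48 GPU nodes have 40 cores each rather than 48.
GPU_NODES = 48
ASSUMED_GPU_NODE_CORES = 40
juwels_total = (2_559 - GPU_NODES) * 48 + GPU_NODES * ASSUMED_GPU_NODE_CORES
print(juwels_total)
```

Under that assumption the recomputed total equals the 122,448 cores on the slide.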

MPI runtimes

• ParaStationMPI
  • Preferred MPI runtime
  • Developed in a consortium including JSC, ParTec, KIT and University of Wuppertal
  • Based on MPICH
  • Intel + GCC

• ParaStationMPI
  • For the JURECA Cluster-Booster System, the ParaStation MPI Gateway Protocol bridges between Mellanox EDR and Intel Omni-Path
  • In general, the ParaStation MPI Gateway Protocol can connect any two low-level networks supported by pscom.
  • Implemented using the psgw plugin to pscom, working together with instances of psgwd, the ParaStation MPI Gateway daemon.

• IntelMPI
  • As a backup for ParaStationMPI
  • Software stack mirrored between these 2 MPI runtimes
  • Intel compiler
• MVAPICH2-GDR
  • Purely for GPU nodes (but can run on normal nodes)
  • PGI + GCC (and Intel in the past)

MPI performance

• Microbenchmarks: Intel MPI Benchmarks
  • PingPong
  • Broadcast, gather, scatter and allgather
• ParaStationMPI, Intel MPI 2018.2, Intel MPI 2019 beta, MVAPICH2 and OpenMPI
• JURECA, Booster and JUWELS
• 1 MPI process per node. Average times. No cache disabling.
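
For readers unfamiliar with PingPong-style measurements: one process sends a message, the partner sends it straight back, and the reported time is conventionally half of the measured round trip, with bandwidth derived from that one-way time. The sketch below shows only this arithmetic. The function names and the timing samples are made up for illustration and are not results from this talk.

```python
def average(samples):
    """Arithmetic mean of a non-empty list of timings."""
    return sum(samples) / len(samples)


def pingpong_metrics(round_trip_s, msg_bytes):
    """Return (one-way time in seconds, bandwidth in bytes/s)."""
    one_way = round_trip_s / 2.0
    bandwidth = msg_bytes / one_way if one_way > 0 else float("inf")
    return one_way, bandwidth


# Made-up round-trip timings in seconds (illustrative only).
round_trips = [2.0e-6, 2.2e-6, 1.8e-6]
one_way, bandwidth = pingpong_metrics(average(round_trips), msg_bytes=1)

print(f"one-way time: {one_way * 1e6:.2f} us")
print(f"bandwidth:    {bandwidth / 1e6:.2f} MB/s")
```

Averaging over repetitions, as the slide notes, smooths out run-to-run noise but hides outliers.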

Runtime | Jureca | Booster | Juwels
ParaStationMPI | Verbs | PSM2 | UCX
Intel 2018 | Default (DAPL+Verbs) | Default (PSM2) | Default (DAPL+Verbs)
Intel 2019 | Default (libfabric+Verbs) | Default (libfabric+PSM2) | Default (libfabric+Verbs)
MVAPICH2-GDR | Verbs | - | -
MVAPICH2 | Verbs | PSM2 | Verbs
OpenMPI | Default (UCX, Verbs and libfabric+Verbs enabled) | - | -

[Figures: MPI performance plots, each with panels for Jureca, Booster and Juwels]
• PingPong (two slides)
• Broadcast, msg size 1 byte
• Broadcast, msg size 4MB
• Scatter, msg size 1 byte
• Scatter, msg size 2MB
• Gather, msg size 1 byte
• Gather, msg size 2MB

PEPC performance

• N-Body code
• Asynchronous communication based on a communicating thread spawned using pthreads
• 2 algorithms tested
  • Single particle processing with pthreads
  • Tile processing using OpenMP tasks
• Experiments on JURECA using 16 nodes, 4 MPI ranks per node and 12 computing threads per MPI rank (SMT enabled)
• Violin plots show average time per timestep and its distribution
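
The PEPC run configuration can be checked against the JURECA node layout with a few lines of arithmetic. The sketch below assumes two hardware threads per core when SMT is enabled, which the slides do not state explicitly. It counts only the computing threads, so the extra communicating thread per rank is left out.

```python
# Run configuration from the PEPC slide.
NODES = 16
RANKS_PER_NODE = 4
THREADS_PER_RANK = 12

# JURECA node layout from the systems slide: dual socket, 12 cores per socket.
CORES_PER_NODE = 2 * 12
SMT_WAYS = 2  # assumption: two hardware threads per core with SMT enabled

total_ranks = NODES * RANKS_PER_NODE
compute_threads_per_node = RANKS_PER_NODE * THREADS_PER_RANK
hw_threads_per_node = CORES_PER_NODE * SMT_WAYS
total_compute_threads = total_ranks * THREADS_PER_RANK

print(total_ranks)               # MPI ranks in the job
print(compute_threads_per_node)  # computing threads placed on each node
print(hw_threads_per_node)       # hardware threads available per node
print(total_compute_threads)     # computing threads in the whole job
```

With these numbers the computing threads fill every hardware thread of a node, so the communicating threads have to share hardware threads with them.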

• PEPC performance with single particle processing

[Figure: violin plot. Plot: Dirk Brömmel]

• PEPC performance with tile processing

[Figure: violin plot. Plot: Dirk Brömmel]

MVAPICH at JSC

• Generally it’s been a positive experience.

But (as the guy in charge of installing all software in our systems)...

• The fact that MVAPICH2-GDR is a binary-only package poses some annoyances
  • Need to ask the MVAPICH2 team for binaries for particular compiler versions
  • No icc/icpc/ifort version
  • Patching of compiler wrappers (xFLAGS used during compilation leak into them)
• The Deep Learning community is pushing for OpenMPI as CUDA-Aware MPI of choice in our systems

THANK YOU!