gasnet-ex: a high-performance, portable communication ... · gasnet-1: overview • started in 2002...

44
GASNet-EX: A High-Performance, Portable Communication Library for Exascale Dan Bonachea and Paul H. Hargrove [email protected] https://gasnet.lbl.gov Workshop on Languages and Compilers for Parallel Computing (LCPC’18) October 11, 2018

Upload: others

Post on 03-Aug-2020

3 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

GASNet-EX: A High-Performance, Portable Communication Library for Exascale

Dan Bonachea and Paul H. Hargrove [email protected] https://gasnet.lbl.gov

Workshop on Languages and Compilers for Parallel Computing (LCPC’18) October 11, 2018

Page 2: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Abstract •  Present GASNet-EX, the successor to GASNet-1 •  Show performance improvements due to the redesign •  Show RMA performance remains competitive w/ MPI-3 – Better in many cases, across multiple HPC systems

GASNet-EX / LCPC’18 / Paul H. Hargrove 2

Page 3: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Outline 1.  Introduction to GASNet-1 and GASNet-EX 2.  Overview of GASNet-EX Improvements 3.  Specific GASNet-EX Improvements 4.  RMA Microbenchmarks 5.  Conclusions

3 GASNet-EX / LCPC’18 / Paul H. Hargrove

Page 4: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

GASNet-1: Overview •  Started in 2002 to provide a portable network

communication runtime for three PGAS languages: – UPC, CAF and Titanium

•  Primary features: – Non-blocking RMA (one-sided Put and Get) – Active Messages (simplification of Berkeley AM-2)

•  Motivated by semantic issues in (then current) MPI-2.0 –  Dan Bonachea, Jason Duell, "Problems with using MPI 1.1 and 2.0 as compilation targets for

parallel language implementations", IJHPCN 2004.

GASNet-EX / LCPC’18 / Paul H. Hargrove 4

Page 5: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

GASNet: Adoption and Portability •  Client runtimes

•  Network conduits •  Supported platforms

–  Over 10 compiler families, 15 operating systems and dozens of architectures * These lists and counts include both current and past support

GASNet-EX / LCPC’18 / Paul H. Hargrove 5

LBNL UPC++ Berkley UPC GCC/UPC Clang UPC Cray Chapel

Stanford Legion Titanium Rice Co-Array Fortran OpenUH Co-Array Fortran OpenCoarrays in GCC Fortran

OpenSHMEM reference impl. Omni XcalableMP At least 7 others cited in the paper

OpenFabrics Verbs (InfiniBand) Mellanox MXM and VAPI (InfiniBand) Cray uGNI (Gemini and Aries) Intel PSM2 (OmniPath)

IBM PAMI (BG/Q and others) IBM DCMF (BG/P) IBM LAPI (Colony and Federation) Cray Portals3 (Seastar)

SHMEM (Cray X1 and SGI Altix) Quadric elan3/4 (QsNet I/II) Myricom GM (Myrinet) Dolphin SISCI UDP (any TCP/IP network)

MPI 1.1 or newer OFI/libfabric Sandia Portals4

Shared memory (no network)

Page 6: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

GASNet-EX: Overview •  GASNet-EX is the next generation of GASNet – Addressing needs of newer programming models such

as LBNL UPC++, Stanford Legion and Cray Chapel –  Incorporating over 15 years of lessons learned – Provides backward compatibility for GASNet-1 clients

•  Motivating goals include

GASNet-EX / LCPC’18 / Paul H. Hargrove 6

- Support more client asynchrony - Enable more client adaptation - Improve memory footprint - Improve threading support

- Support offload to network h/w - Support multi-client applications - Support for device memory

Page 7: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

GASNet-EX: Status •  GASNet-EX is a work-in-progress – Not every new feature has been implemented yet – Many have, with benefits this presentation will show

•  Three key clients using GASNet-EX – UPC++ v1.0 requires GASNet-EX – Legion and Chapel are starting work to use EX features

•  Will displace legacy GASNet-1 implementation in 2019

GASNet-EX / LCPC’18 / Paul H. Hargrove 7

Page 8: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Outline 1.  Introduction to GASNet-1 and GASNet-EX 2.  Overview of GASNet-EX Improvements 3.  Specific GASNet-EX Improvements 4.  RMA Microbenchmarks 5.  Conclusions

8 GASNet-EX / LCPC’18 / Paul H. Hargrove

Page 9: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

A representative example: non-blocking RMA Put Changes between these two (in red on following slides) illustrate some of the most meaningful changes made in the GASNet-EX design. They provide the means to address several goals.

GASNet-EX / LCPC’18 / Paul H. Hargrove 9

GASNet-1: GASNet-EX:

Page 10: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

Return Type GASNet-1: “handle” to test for operation completion •  Thread-specific (only the issuing thread can test/wait for completion) GASNet-EX: “Event” generalizes handle in two directions •  Not thread-specific (for progress threads, continuation passing, etc.) •  Supports multiple sub-events (e.g. local completion on later slide)

GASNet-EX / LCPC’18 / Paul H. Hargrove 10

GASNet-1: GASNet-EX:

Page 11: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

Destination GASNet-1: an integer node identifier to name a process GASNet-EX: a (team,rank) pair to name an “Endpoint” •  “Team” is an ordered sets of Endpoints (also used in collectives) •  Multiple Endpoints for multi-threading and access to device memory •  Multiple Client runtimes for hybrid applications

GASNet-EX / LCPC’18 / Paul H. Hargrove 11

GASNet-1: GASNet-EX:

Page 12: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

Destination Address GASNet-1: a remote virtual address GASNet-EX: a remote virtual address or an offset •  Offsets can improve scalability of clients using symmetric heaps •  Used with multiple endpoints will enable addressing device memory

GASNet-EX / LCPC’18 / Paul H. Hargrove 12

GASNet-1: GASNet-EX:

Page 13: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

Local Completion (when local source buffer may be overwritten) GASNet-1: …put_nb() vs. …put_nb_bulk() •  Local completion can occur separately from remote completion •  Option to conflate it with either injection or remote completion GASNet-EX: lc_opt selects a local completion behavior •  Both GASNet-1 options, plus an Event the client can test/wait

GASNet-EX / LCPC’18 / Paul H. Hargrove 13

GASNet-1: GASNet-EX:

Page 14: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Overview of Improvements gasnet_handle_t gasnet_put_nb(gasnet_node_t node, void *dest_addr, void *src_addr, size_t nbytes);

gex_Event_t gex_RMA_PutNB(gex_TM_t tm, gex_Rank_t rank, gex_Addr_t dest_addr, void *src_addr, size_t nbytes, gex_Event_t *lc_opt, gex_Flags_t flags);

Per-operation Flags GASNet-EX: introduces extensibility modifiers •  Require non-default behaviors, such as offset-based addressing •  Allow optional behaviors, such as “Immediate Mode” (later slide) •  Assert properties which may eliminate more costly dynamic checks GASNet-1: has no direct equivalent

GASNet-EX / LCPC’18 / Paul H. Hargrove 14

GASNet-1: GASNet-EX:

Page 15: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Outline 1.  Introduction to GASNet-1 and GASNet-EX 2.  Overview of GASNet-EX Improvements 3.  Specific GASNet-EX Improvements 4.  RMA Microbenchmarks 5.  Conclusions

15 GASNet-EX / LCPC’18 / Paul H. Hargrove

Page 16: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Specific GASNet-EX Improvements •  Several new features are already delivering benefits •  This section reports on four of these – Local Completion Control –  Immediate-mode Communication Injection – Negotiated-payload Active Messages – Remote Atomics

•  This section’s results collected on Cray XC40 systems •  The paper provides more detail than can be given here

GASNet-EX / LCPC’18 / Paul H. Hargrove 16

Page 17: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Local Completion Control •  Figure shows a proxy for how

exposing a local completion event can improve overlap:

• The analogous change within GASNet-EX’s aries-conduit has improved flood bandwidth

•  Blue series shows bandwidth prior to utilizing GNI-level local completion

•  Red series shows up to 32% increased bandwidth with the local completion event

Non-bulk Put flood bandwidth on Cray Aries with and without use of a local

completion event at the GNI level

GASNet-EX / LCPC’18 / Paul H. Hargrove 17

0

1

2

3

4

5

6

7

8

9

512B 2kiB 8kiB 32kiB 128KiB 512KiB 2MiB

Floo

d Ba

ndw

idth

(GiB

/s)

Payload Size

With GNI local completion eventWithout GNI local completion event

GO

OD

Page 18: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Immediate-mode Communication Injection •  Lack of resources can stall communication injection – Such backpressure may be path-specific

•  New feature allows client adaptation to such a scenario – E.g. work-stealing could select a different victim

•  Immediate-mode is a flag which permits (does not require) implementation to return without performing communication, in the presence of backpressure

GASNet-EX / LCPC’18 / Paul H. Hargrove 18

Page 19: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Immediate-mode Communication Injection •  Figure illustrates performance

on a benchmark modeling AM communication with inattentive peers

•  Shows reduction in time to complete communication using a “reactive” immediate-mode approach

•  The series compare reactive to three distinct static schedules

•  Best case is 93% reduction

Reduced communication delays using immediate-mode Active Messages

GASNet-EX / LCPC’18 / Paul H. Hargrove 19

0%

20%

40%

60%

80%

100%

2 4 6 8 10 12 14 16

Red

uctio

n in

Com

mun

icat

ion

Tim

e

Receiving Processes (1 per node)

IMM reactive vs. Full Block staticIMM reactive vs. Blocksize 5 staticIMM reactive vs. Cyclic static

GO

OD

Page 20: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Negotiated-Payload Active Messages •  “Negotiated-Payload” is a new family of AM interfaces –  Splits AM injection into distinct Prepare and Commit phases –  Client and GASNet can negotiate the buffer size and ownership

•  Case 1: “chunking” loops may better utilize available buffer resources, allowing fewer larger messages

•  Case 2: remove critical-path memcpy for some patterns

// Fixed-Payload code, for which most conduits require a memcpy to an internal buffer: assemble_payload(client_buf, len); // writes client-owned memory gex_AM_RequestMedium1(team, rank, handler, client_buf, len, GEX_EVENT_NOW, flags, arg); // Negotiated-Payload avoids the memcpy via payload assembly into a GASNet-owned buffer: gex_AM_SrcDesc_t sd = gex_AM_PrepareRequestMedium(team, rank, NULL, len, len, NULL, flags, 1); assemble_payload(gex_AM_SrcDescAddr(sd), len); // writes GASNet-owned memory gex_AM_CommitRequestMedium1(sd, handler, len, arg);

GASNet-EX / LCPC’18 / Paul H. Hargrove 20

Page 21: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Negotiated-Payload Active Messages •  Figure shows an AM ping-pong

bandwidth benchmark using the memcpy-removal pattern on the previous slide

•  Normalized to the Fixed-Payload performance

•  Shows NP-AM implementation for Cray Aries network delivering up to a 14% improvement

Aries-conduit NP-AM speedup on a ping-pong test with dynamically-generated

payload

GASNet-EX / LCPC’18 / Paul H. Hargrove 21

0.98

1.00

1.02

1.04

1.06

1.08

1.10

1.12

1.14

1.16

128B 256B 512B 1kiB 2kiB 4kiB 8kiB 16kiB 32kiB 64kiB

Band

wid

th, n

orm

aliz

ed to

Fix

ed-P

aylo

ad

Message Size

GO

OD

Page 22: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Remote Atomics •  “Remote Atomics” is a new family of RMA interfaces – Analogous to MPI accumulate operations

•  Interface designed with offload in mind •  Uses the “atomics domain” concept –  Introduced by UPC 1.3 – Enables efficient offload, even in the presence of

concurrent updates to the same location using multiple distinct operations

GASNet-EX / LCPC’18 / Paul H. Hargrove 22

Page 23: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Remote Atomics •  Offload reduces latency of

fetch-and-add by 70% relative to generic AM-based reference

•  Figure shows aggregate throughput of a “hot-spot” test of fetch-and-add (all to one)

•  Green series shows robust scaling to saturation when offloaded to the Aries NIC

Scaling of a remote atomics “hot-spot” test in the Cray Aries network

GASNet-EX / LCPC’18 / Paul H. Hargrove 23

1e+05

1e+06

1e+07

1e+08

1e+09

64 128 256 512 1024 2048 4096 8192

Aggr

egat

e Fl

ood

Thro

ughp

ut (o

ps/s

)

Processes (64/node)

Perfect ScalingAries NIC Offload

Reference Implementation

GO

OD

Page 24: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Outline 1.  Introduction to GASNet-1 and GASNet-EX 2.  Overview of GASNet-EX Improvements 3.  Specific GASNet-EX Improvements 4.  RMA Microbenchmarks 5.  Conclusions

24 GASNet-EX / LCPC’18 / Paul H. Hargrove

Page 25: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Bandwidth Microbenchmarks •  This section reports unidirectional flood bandwidth

measured between two nodes, one process per node. •  Intel MPI Benchmarks v2018.1 to measure MPI-3 RMA –  IMB-RMA test, Unidir_put and Unidir_get subtests –  “Aggregate” result category reports bandwidth of

•  Series of many MPI_Put (or Get) operations •  A single final call to MPI_Win_flush

–  All within a passive-target access epoch established by a call to MPI_Win_lock(SHARED)outside the timed region

•  GASNet-EX measures nearest semantic equivalent

GASNet-EX / LCPC’18 / Paul H. Hargrove 25

Page 26: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Bandwidth Microbenchmarks •  Results collected on four platforms –  Three different MPI implementation (Cray, IBM and MVAPICH2) –  Two distinct networks (Cray Aries and Mellanox EDR InfiniBand) –  Three CPU families (Xeon Haswell, Xeon Phi, and POWER8) –  Complete details are given in the paper

•  Results are collected in “out of the box” configurations –  Used center’s defaults on the three production systems –  No tuning knobs used to improve performance

GASNet-EX / LCPC’18 / Paul H. Hargrove 26

Page 27: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Bandwidth Microbenchmarks

GASNet-EX / LCPC’18 / Paul H. Hargrove 27

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB

Band

wid

th (G

iB/s

)

Transfer Size

RMA Put on Cori-I (Haswell, Aries, Cray MPI)

GASNet-EX PutMPI RMA Put

GO

OD

Page 28: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Bandwidth Microbenchmarks

GASNet-EX / LCPC’18 / Paul H. Hargrove 28

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB

Band

wid

th (G

iB/s

)

Transfer Size

RMA Get on Cori-I (Haswell, Aries, Cray MPI)

GASNet-EX GetMPI RMA Get

GO

OD

Page 29: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

29

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Cori-II: Xeon Phi, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Cori-I: Haswell, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

0

2

4

6

8

10

12

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiBBa

ndw

idth

(GiB

/s)

Transfer Size

Gomez: Haswell-EX, InfiniBand, MVAPICH2

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

0

2

4

6

8

10

12

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Summitdev (single-rail): POWER8, InfiniBand, IBM Spectrum MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

RMA Bandwidth Microbenchmarks

GASNet-EX / LCPC’18 / Paul H. Hargrove

GO

OD

Page 30: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Latency Microbenchmarks •  RMA full-operation latency

• Same RMA Put or Get operation as flood test

• But keep just one in-flight instead of many (Win_flush after each)

•  Figure reports representative 8-byte latencies

• GASNet-EX uniformly competitive with MPI-3, better for small sizes

Latency of RMA Put and Get

GASNet-EX / LCPC’18 / Paul H. Hargrove 30

0

1

2

3

4

5

6

7

8

9

Cori-I Cori-II Summitdev Gomez

RM

A O

pera

tion

Lata

ncy

(µs)

MPI RMA GetGASNet-EX GetMPI RMA PutGASNet-EX Put

GO

OD

Haswell Aries

Cray MPI

Xeon Phi Aries

Cray MPI

POWER8 InfiniBand

IBM Spectrum MPI

Haswell-EX InfiniBand

MVAPICH2

Page 31: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Outline 1.  Introduction to GASNet-1 and GASNet-EX 2.  Overview of GASNet-EX Improvements 3.  Specific GASNet-EX Improvements 4.  RMA Microbenchmarks 5.  Conclusions

31 GASNet-EX / LCPC’18 / Paul H. Hargrove

Page 32: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Conclusions •  GASNet-EX is the next generation of GASNet,

addressing needs of newer programming models –  Asynchrony, adaptively, threading, scalability, device memory, ...

•  Already in production use by UPC++ –  Looking for new clients, talk to me over coffee!

•  Provides backward compatibility for GASNet-1 clients •  Benefits of new features are already measurable •  Delivers RMA performance competitive with MPI-3 RMA

GASNet-EX / LCPC’18 / Paul H. Hargrove 32

Page 33: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

THANK YOU

[email protected] https://gasnet.lbl.gov

GASNet-EX and UPC++ have a research poster at SC18

GASNet-EX / LCPC’18 / Paul H. Hargrove 33

Page 34: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Acknowledgements •  This research was funded in part by the Exascale Computing Project

(17-SC-20-SC), a collaborative effort of the U.S. Department of Energy Office of Science and the National Nuclear Security Administration.

•  This research used resources of the National Energy Research Scientific Computing Center, a DOE Office of Science User Facility supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC02-05CH11231.

•  This research used resources of the Argonne Leadership Computing Facility, which is a DOE Office of Science User Facility supported under Contract DE-AC02-06CH11357.

•  This research used resources of the Oak Ridge Leadership Computing Facility at the Oak Ridge National Laboratory, which is supported by the Office of Science of the U.S. Department of Energy under Contract No. DE-AC05-00OR22725.

GASNet-EX / LCPC’18 / Paul H. Hargrove 34

Page 35: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

BACKUP SLIDES

Page 36: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Local Completion Control •  GASNet-EX introduces means for client to test (or wait)

for local completion between injection and completion of a non-blocking Put

•  Exposes greater opportunity for communication overlap than possible with the options available in GASNet-1

GASNet-EX / LCPC’18 / Paul H. Hargrove 36

Page 37: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Non-Contiguous RMA •  GASNet-EX adds Vector-Indexed-Strided (VIS) APIs –  Express non-blocking Put and Get of non-contiguous data –  Names reflects the three metadata formats

•  Different trade-offs between size and generality –  Small modifications to an unofficial GASNet-1 extension

•  Implementation uses Active Messages, when appropriate, for pack/unpack of data –  Benefits from reimplementation using Negotiated-Payload AM

GASNet-EX / LCPC’18 / Paul H. Hargrove 37

Page 38: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

Non-Contiguous RMA •  Figure illustrates performance

of Strided Put API •  Red series shows performance

using Fixed-Payload AM •  Blue series shows performance

using Negotiated-Payload AM •  Both normalized to GASNet-1

Improved Strided Put performance, relative to GASNet-1

GASNet-EX / LCPC’18 / Paul H. Hargrove 38

!"#$%

&"!!%

&"!$%

&"&!%

&"&$%

&"'!%

&"'$%

&"(!%

&')% '$*% $&'% &+% '+,-% .+,-% )+,-% &*+,-% ('+,-% *.+,-% &')+,-% '$*+,-% $&'+,-% &/,-% '/,-%

-012

3,2456%17890:,;<

2%47%=>?

@<4A&%

B740:%C0D:702%

@<E7F04<2AG0D:702%>/%

H,I<2AG0D:702%>/%

@JK?L%L78,AMM%N+@OP%'A172<6%08,<QAR712S,46%*.ATD4<%<:<9<14%Q,;<6%'$U%2<1Q,4D%%

&')-%%%%%%%%'$*-%%%%%%%$&'-%%%%%%%%%&+,-%%

Page 39: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Cori-I: Haswell, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

Page 40: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

0 1 2 3 4 5 6 7 8 9

10

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Cori-II: Xeon Phi, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

Page 41: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

0

2

4

6

8

10

12

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Summitdev (single-rail): POWER8, InfiniBand, IBM Spectrum MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

Page 42: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

0

2

4

6

8

10

12

256 B 1kiB 4kiB 16kiB 64kiB 256kiB 1MiB 4MiB

Band

wid

th (G

iB/s

)

Transfer Size

Gomez: Haswell-EX, InfiniBand, MVAPICH2

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA GetMPI ISend/IRecv

Page 43: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

RMA Latency Microbenchmarks

GASNet-EX / LCPC’18 / Paul H. Hargrove 43

System 8-Byte RMA Put Latency 8-Byte RMA Get Latency

GASNet-EX MPI3-RMA Ratio GASNet-EX MPI3-RMA Ratio Cori-I 1.07 µs 1.20 µs 0.89 1.43 µs 1.57 µs 0.91 Cori-II 2.15 µs 3.42 µs 0.63 2.60 µs 4.06 µs 0.64

Summitdev 1.61 µs 8.10 µs 0.20 2.10 µs 8.13 µs 0.26 Gomez 1.41 µs 1.51 µs 0.94 1.82 µs 1.91 µs 0.95

Page 44: GASNet-EX: A High-Performance, Portable Communication ... · GASNet-1: Overview • Started in 2002 to provide a portable network communication runtime for three PGAS languages: –

44

0.00.51.01.52.02.53.03.54.04.5

4 16 64 256 1024 4096

RM

A O

pera

tion

Lata

ncy

(µs)

Transfer Size (bytes)

Cori-II: Xeon Phi, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA Get

0.0

0.5

1.0

1.5

2.0

2.5

4 16 64 256 1024 4096

RM

A O

pera

tion

Lata

ncy

(µs)

Transfer Size (bytes)

Cori-I: Haswell, Aries, Cray MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA Get

0.0

0.5

1.0

1.5

2.0

2.5

3.0

4 16 64 256 1024 4096R

MA

Ope

ratio

n La

tanc

y (µ

s)Transfer Size (bytes)

Gomez: Haswell-EX, InfiniBand, MVAPICH2

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA Get

0.01.02.03.04.05.06.07.08.09.0

4 16 64 256 1024 4096

RM

A O

pera

tion

Lata

ncy

(µs)

Transfer Size (bytes)

Summitdev (single-rail): POWER8, InfiniBand, IBM Spectrum MPI

GASNet-EX PutMPI RMA PutGASNet-EX GetMPI RMA Get

RMA Latency Microbenchmarks

GASNet-EX / LCPC’18 / Paul H. Hargrove

GO

OD