libfabric: your new BFF Sung-Eun Choi, Cray Inc. Salishan Conference, Random Access Session April 29, 2015

Background

The Open Fabrics Interface Working Group (OFI WG) formed in August 2013

•  Co-chairs:
   –  Sean Hefty, Intel
   –  Paul Grun, Cray Inc.

Charter: Develop an extensible, open source framework and interface aligned with upper-layer protocol and application needs for high-performance fabric services.

Translation

The only network API you’ll ever need (we hope)

Why?

•  Today middleware needs to be ported to a new (and sometimes more complicated) low-level network API every 3-5 years

•  These hardware-specific APIs have to be supported for upwards of 10 years

•  A common API gives you portability on day one*

How?

OFI WG is a community effort

•  In no particular order: DOE, DOD, NASA, Intel, Cray, Cisco, Mellanox, IBM, UNH (plus storage vendors)…

•  Expertise spans hardware and software

Charter: open source

Development on GitHub

•  https://github.com/ofiwg
•  Dual licensing: GPL and BSD

Charter: ULP and apps

Engage the user community to define requirements

•  MPI requirements
   –  ETH Zurich, SNL, ORNL, ANL, Cisco, IBM, Intel, AMD, Cray, Microsoft, Mellanox, SGI, U Edinburgh/EPCC, U Alabama Birmingham

•  PGAS and SHMEM requirements
   –  LANL, ORNL, SNL, Intel, Mellanox, Cray

Charter: high performance fabric

Software leading hardware

•  Vendor involvement
   –  Can we influence HPC network vendors?

•  Extensible interface
   –  Can be vendor-specific
   –  Good way to propose acceptance into main API

libfabric architecture

[Diagram: layered architecture. Labels recovered from the figure:]

•  middleware
•  libfabric APIs: Control Interface, Message Queue, RDMA, Atomics, Event Queues, Tag Matching, Triggered Operations, CM Services
•  provider implementation
•  I/O service (shown four times)

libfabric architecture: realized

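[Diagram: the same layers, populated with actual implementations. Labels recovered from the figure:]

•  Middleware: MPICH, OpenMPI, OpenSHMEM, GASNet, UPC, …
•  libfabric APIs: Control Interface, Message Queue, RDMA, Atomics, Event Queues, Tag Matching, Triggered Operations, CM Services
•  Provider implementations: sockets, verbs, Cisco usNIC, Intel PSM, Cray uGNI, …
•  Annotation: available on github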

libfabric in a visual nutshell

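[Diagram: libfabric services grouped into four areas. Labels recovered from the figure:]

•  Control Services
   –  Discovery (fi_info)

•  Communication Services
   –  Connection Management
   –  Address Vectors

•  Completion Services
   –  Event Queues
   –  Counters

•  Data Transfer Services
   –  Message Queues
   –  Tag Matching
   –  RMA
   –  Atomics
   –  Triggered Operations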

Not shown: usage models

•  Capabilities
   –  Application desired features and permissions
   –  e.g., RMA, Atomics, tag matching

•  Attributes
   –  Defines the limits and behavior of selected interfaces
   –  e.g., thread safety, message ordering constraints

•  Mode
   –  Provider request on application
   –  e.g., local memory registration, user-allocated context
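One way to picture how capabilities, attributes, and mode interact is as a matching step between what the application asks for and what a provider offers. The sketch below is a self-contained model under that assumption; every type, constant, and function in it (`app_hints`, `provider_desc`, `CAP_*`, `MODE_*`, `provider_matches`, `select_provider`) is hypothetical and is not the libfabric API, where discovery goes through `fi_getinfo` and the `fi_info` structure.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical capability bits: features the application asks for. */
enum { CAP_RMA = 1 << 0, CAP_ATOMICS = 1 << 1, CAP_TAGGED = 1 << 2 };

/* Hypothetical mode bits: obligations a provider places on the application. */
enum { MODE_LOCAL_MR = 1 << 0, MODE_USER_CONTEXT = 1 << 1 };

/* Hypothetical attributes: behavior of the selected interfaces. */
struct attrs {
    bool thread_safe;
    bool ordered_messages;
};

struct provider_desc {
    const char *name;
    uint32_t caps;      /* capabilities the provider offers */
    uint32_t mode;      /* modes the provider requires of the application */
    struct attrs attrs; /* behavior the provider guarantees */
};

struct app_hints {
    uint32_t caps;      /* capabilities the application needs */
    uint32_t mode;      /* modes the application agrees to honor */
    struct attrs attrs; /* behavior the application needs */
};

/* A provider matches when it offers every requested capability, requires
 * no mode the application has not agreed to, and guarantees each
 * attribute the application asked for. */
bool provider_matches(const struct provider_desc *p, const struct app_hints *h)
{
    assert(p != NULL && h != NULL);
    bool caps_ok = (p->caps & h->caps) == h->caps;
    bool mode_ok = (p->mode & ~h->mode) == 0;
    bool attrs_ok = (!h->attrs.thread_safe || p->attrs.thread_safe) &&
                    (!h->attrs.ordered_messages || p->attrs.ordered_messages);
    return caps_ok && mode_ok && attrs_ok;
}

/* Return the first matching provider in the list, or NULL if none match. */
const struct provider_desc *select_provider(const struct provider_desc *list,
                                            size_t n,
                                            const struct app_hints *h)
{
    for (size_t i = 0; i < n; i++) {
        if (provider_matches(&list[i], h))
            return &list[i];
    }
    return NULL;
}
```

Note the asymmetry in the model: capabilities flow from application to provider ("I need this"), while mode flows the other way ("you must do this for me"), so an application that declines a mode bit can lose access to an otherwise suitable provider.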

“The Fine Print”

•  It’s a large API with lots of options

•  It’s a work-in-progress

•  Have an opinion? Join us:
   –  http://lists.openfabrics.org/mailman/listinfo/ofiwg
   –  Weekly meetings: Tuesdays 9am PT
   –  https://github.com/ofiwg/libfabric

Status

•  Release 1.0 coming soon!
   –  providers: sockets, verbs, PSM, usNIC

Thanks!

And thanks to Sean and Paul for slide content
