![Page 1: Versatile Data Services for Computational Science Applications · 6 Productively Developing High-Performance, Scalable (Data) Services Vision Specialized data services Composed from](https://reader033.vdocuments.us/reader033/viewer/2022050405/5f82ee0fa3c93513b0643ede/html5/thumbnails/1.jpg)
Rob Ross
Mathematics and Computer Science Division
Argonne National Laboratory
[email protected]
Versatile Data Services for Computational Science Applications
Philip Carns, Matthieu Dorier, Kevin Harms, Robert Latham, and Shane Snyder
Argonne National Laboratory
Sam Gutierrez, Bob Robey, Brad Settlemyer, and Galen Shipman
Los Alamos National Laboratory
George Amvrosiadis, Chuck Cranor, Greg Ganger, and Qing Zheng
Carnegie Mellon University
Jerome Soumagne, Neil Fortner
The HDF Group
New Science and Systems: Leading to New Services?
ALCF 2021 Exascale Supercomputer (A21)
● Intel/Cray Aurora supercomputer planned for 2018 shifted to 2021
● Scaled up from 180 PF to over 1000 PF
[Figure: A21 project timeline, CY 2017 to CY 2022. Tracks: NRE (HW and SW engineering and productization); ALCF-3 ESP (application readiness); ALCF-3 facility and site prep, commissioning. Milestones: NRE contract award, build contract modification, pre-planning review, design review, rebaseline review, build/delivery, acceptance.]
Support for three “pillars”: Simulation, Data, and Learning
Top image credit B. Helland (ASCR). Bottom left and right images credit ALCF. Bottom center image credit OLCF.
Data Services in Computational Science
Science Workflow
● Executables and Libraries: SPINDLE
● Checkpoints: SCR, FTI
● Input and Intermediate Data Products: DataSpaces, MDHIM, Kelpie
● Performance Data: Darshan, LMT
There is an opportunity to extend this concept to domain-specific scientific data models as well.
Lots of Common Functionality
| Service | Description | Provisioning | Comm. | Local Storage | Fault Mgmt. and Group Membership | Security |
|---|---|---|---|---|---|---|
| ADLB | Data store and pub/sub. | MPI ranks | MPI | RAM | N/A | N/A |
| DataSpaces | Data store and pub/sub. | Indep. job | Dart | RAM (SSD) | Under devel. | N/A |
| DataWarp | Burst Buffer mgmt. | Admin./sched. | DVS/lnet | XFS, SSD | Ext. monitor | Kernel, lnet |
| FTI | Checkpoint/restart mgmt. | MPI ranks | MPI | RAM, SSD | N/A | N/A |
| Faodel | Dist. in-mem. key/val store | MPI ranks | Opbox | RAM (Object) | N/A | Obfusc. IDs |
| SPINDLE | Exec. and library mgmt. | LaunchMON | TCP | RAMdisk | N/A | Shared secret |
Reusability in (data) service development.
Productively Developing High-Performance, Scalable (Data) Services
Vision
● Specialized data services
● Composed from basic building blocks
● Matching application requirements and available technologies
● Constraining coherence, scalability, security, and reliability to application/workflow scope
Approach
● Lightweight, user-space components and microservices
● Implementations that effectively utilize modern hardware
● Common API for on-node and off-node communication
Impact
● Better, more capable services for DOE science and facilities
● Significant code reuse
● Ecosystem for service development, float all boats
See http://www.mcs.anl.gov/research/projects/mochi/.
Building Mochi Components
● Mercury: RPC/RDMA with support for shared memory and multiple native transports
● Argobots: Threading/tasking using user-level threads
● Margo: Hide Mercury and Argobots details, focus on RPC handlers
● Thallium: C++14 bindings
[Figure: two deployment configurations. In one, Service A and Service B share a single Margo layer on top of Mercury and Argobots. In the other, Service A and Service B each sit on their own Margo, Mercury, and Argobots stack.]
Single Process:
● Direct execution of RPC handlers
Separate Processes:
● Shared memory (separate processes on same node)
● RPC and RDMA over native transport (separate nodes)
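The idea of one call path that adapts to where the target lives can be illustrated with a small sketch. This is not the Margo or Mercury API: `Endpoint`, `choose_path`, and `forward` are hypothetical names, and `pickle` stands in for whatever encoding a real transport would use. It only shows a caller using the same `forward` call whether the handler runs in the same process, on the same node, or on another node.

```python
import pickle
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Tuple


@dataclass
class Endpoint:
    """Hypothetical stand-in for a service endpoint (not a Mochi type)."""
    process_id: int
    node: str
    handlers: Dict[str, Callable[[Any], Any]] = field(default_factory=dict)

    def register(self, name: str, handler: Callable[[Any], Any]) -> None:
        self.handlers[name] = handler


def choose_path(caller: Endpoint, target: Endpoint) -> str:
    """Pick a delivery path from the relative placement of caller and target."""
    if caller.process_id == target.process_id and caller.node == target.node:
        return "direct"          # same process: run the handler directly
    if caller.node == target.node:
        return "shared-memory"   # separate processes on the same node
    return "network"             # separate nodes: native transport


def forward(caller: Endpoint, target: Endpoint, rpc: str, arg: Any) -> Tuple[str, Any]:
    """Invoke an RPC handler; the caller's code is identical on every path."""
    path = choose_path(caller, target)
    handler = target.handlers[rpc]
    if path == "direct":
        return path, handler(arg)
    # Assumption: crossing a process boundary is simulated by a
    # serialize/deserialize round trip of the argument and the result.
    request = pickle.dumps(arg)
    response = pickle.dumps(handler(pickle.loads(request)))
    return path, pickle.loads(response)
```

For example, with `service_b = Endpoint(1, "n0")` and a registered `"double"` handler, a caller `Endpoint(1, "n0")` takes the direct path, `Endpoint(2, "n0")` takes the shared-memory path, and `Endpoint(3, "n1")` takes the network path, all through the same `forward` call.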
More Components!
● BAKE: RDMA-enabled data transfer to remote storage (e.g. SSD, NVRAM)
● SDS-KeyVal: Key/Value store backed by LevelDB or BerkeleyDB
● Scalable Service Groups (SSG): group membership management using gossip
● PLASMA: Distributed approximate k-NN database
● POESIE: Enables running Python and Lua interpreters in Mochi services
● Python wrappers: Py-Margo, Py-Bake, Py-SDSKV, Py-SSG, Py-Mobject, etc.
● MDCS: Lightweight diagnostic component
BAKE: A Composed Service for Remotely Accessing Objects
[Figure: BAKE software stack. The client app uses the object API on top of Margo, Argobots, and Mercury (CCI), and reaches the provider (target) over IB/verbs. The provider runs Margo, Argobots, and Mercury (CCI), and uses libpmem to access RAM, NVM, or SSD. Legend: Client, Client API, Mochi*, External.]
* We contribute to Argobots, but it’s primarily supported by P. Balaji’s team.
P. Carns et al. “Enabling NVM for Data-Intensive Scientific Services.” INFLOW 2016, November 2016.
BAKE: Latency of Access
● Haswell nodes, FDR IB
● Backing to RAM rather than persistent memory
● No busy polling
● Each access is at least 1 network round trip, 1 libpmem access, and 1 new (Argobots) thread
Multiple protocols:
● Small: data is packed into RPC msg
● Medium: data is copied to/from pre-registered RDMA buffers
● Large: RDMA “in place” by registering memory on demand
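A size-based choice among the three protocols could look like the sketch below. The slide does not give the cutoffs, so `SMALL_MAX` and `MEDIUM_MAX` are hypothetical values chosen only for illustration, and `select_protocol` is not a BAKE function.

```python
# Hypothetical thresholds; the source does not state the real cutoffs.
SMALL_MAX = 4 * 1024        # assumed upper bound for packing data into the RPC message
MEDIUM_MAX = 1024 * 1024    # assumed upper bound for pre-registered RDMA buffers


def select_protocol(nbytes: int,
                    small_max: int = SMALL_MAX,
                    medium_max: int = MEDIUM_MAX) -> str:
    """Return which transfer protocol a request of nbytes would use."""
    if nbytes < 0:
        raise ValueError("size must be non-negative")
    if nbytes <= small_max:
        return "small"    # data travels inside the RPC message
    if nbytes <= medium_max:
        return "medium"   # copy to/from pre-registered RDMA buffers
    return "large"        # register memory on demand, RDMA in place
```

The trade-off behind such a scheme is that copying is cheap for small payloads, while registering memory on demand has a fixed cost that only pays off for large transfers.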
Examples of composed services.
HEPnOS: Fast Event-Store for High-Energy Physics (HEP)
Goals:
● Manage physics event data from simulation and experiment through multiple phases of analysis
● Accelerate access by retaining data in the system throughout analysis process
Properties:
● Write-once, read-many
● Hierarchical namespace (datasets, runs, subruns)
● C++ API (serialization of C++ objects)
Components:
● Mercury, Argobots, Margo, SDSKV, BAKE, SSG
● New code: C++ event interface
[Figure: HEPnOS architecture. HEP code calls the C++ API, which maps the data model into stores. Components shown: BAKE over PMEM and SDS-KeyVal over LevelDB, reached via RPC and RDMA.]
Collaboration with Fermilab led by J. Kowalkowski.
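One way to picture mapping a hierarchical namespace into a flat key/value store is sketched below. This is not HEPnOS's actual key layout or API (which is C++): `EventStore`, `make_key`, and the zero-padded key format are assumptions for illustration, and a Python dict stands in for the key/value backend.

```python
import pickle


class EventStore:
    """Toy write-once event store over a flat key/value map (illustrative only)."""

    def __init__(self):
        self._kv = {}  # stand-in for a key/value backend

    @staticmethod
    def make_key(dataset: str, run: int, subrun: int, event: int) -> str:
        # Assumed layout: zero-padded numbers so that lexicographic key
        # order matches numeric order within a dataset.
        return f"{dataset}/{run:08d}/{subrun:08d}/{event:08d}"

    def store(self, dataset, run, subrun, event, obj) -> None:
        key = self.make_key(dataset, run, subrun, event)
        if key in self._kv:
            raise KeyError(f"{key} already written (write-once)")
        self._kv[key] = pickle.dumps(obj)  # serialize the object

    def load(self, dataset, run, subrun, event):
        return pickle.loads(self._kv[self.make_key(dataset, run, subrun, event)])

    def events_in(self, dataset, run, subrun):
        """List keys of all events in one subrun via a prefix match."""
        prefix = f"{dataset}/{run:08d}/{subrun:08d}/"
        return sorted(k for k in self._kv if k.startswith(prefix))
```

With this layout, listing a subrun is a prefix scan, which is the kind of operation an ordered key/value store can serve efficiently.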
FlameStore: A Transient Storage System for Deep Neural Networks
Goals:
● Store a collection of deep neural network models during a deep learning workflow
● Maintain metadata (e.g., hyperparameters, score) to inform retention over course of workflow
Properties:
● Write-once, read-many
● Flat namespace
● High level of semantics
● Python API (stores Keras models)
Components:
● Mercury, Argobots, Margo, BAKE, POESIE, and their Python wrappers
● New code: Python API, master and worker managers
[Figure: FlameStore architecture. Components shown: DL task, Python API, master manager, worker manager, BAKE, PMEM; connections labeled RPC and RDMA.]
Collaboration with CANDLE cancer project, led by R. Stevens.
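How metadata might inform retention can be sketched as follows. The policy here (keep at most `capacity` models and evict the lowest score, assuming higher is better) is a hypothetical example, not FlameStore's documented behavior, and `ModelRecord` and `TransientModelStore` are illustrative names.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ModelRecord:
    name: str
    hyperparameters: dict
    score: float      # assumption: higher is better
    payload: bytes    # serialized model, opaque to the store


class TransientModelStore:
    """Toy flat-namespace, write-once store with score-based eviction."""

    def __init__(self, capacity: int):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._records: Dict[str, ModelRecord] = {}

    def put(self, record: ModelRecord) -> List[str]:
        """Store a model; return the names evicted to stay within capacity."""
        if record.name in self._records:
            raise KeyError(f"{record.name} already stored (write-once)")
        self._records[record.name] = record
        evicted = []
        while len(self._records) > self.capacity:
            worst = min(self._records.values(), key=lambda r: r.score)
            del self._records[worst.name]
            evicted.append(worst.name)
        return evicted

    def names(self) -> List[str]:
        return sorted(self._records)
```

Keeping the score next to the payload lets the store make retention decisions without deserializing any model.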
Mobject: An Object Store Composed from Microservices
Goals:
● Validate approach with a more complex model
● Provide familiar basis for use by other libraries (e.g., HDF5)
Properties:
● Concurrent read/write
● Flat namespace
● RADOS client API (subset)
Components:
● Mercury, Argobots, Margo, SDSKV, BAKE, SSG
● New code: Sequencer, RADOS API
[Figure: Mobject architecture. Components shown: client, RADOS API, sequencer, BAKE over PMEM, SDS-KeyVal over LevelDB; connections labeled RPC and RDMA.]
Collaboration with the HDF Group.
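The slide names a sequencer, a bulk data store, and a key/value store but does not describe how they interact. The sketch below is one plausible composition under stated assumptions, not Mobject's design: the sequencer is assumed to assign an order to writes, data goes into a region store (standing in for BAKE), and an index (standing in for SDS-KeyVal) records which regions make up each object. `ToyObjectStore` is a hypothetical name.

```python
import itertools


class ToyObjectStore:
    """Illustrative object store composed from three simple pieces."""

    def __init__(self):
        self._regions = {}                   # stand-in for a bulk data store
        self._index = {}                     # stand-in for a key/value store
        self._region_ids = itertools.count()
        self._seq = itertools.count(1)       # stand-in for a sequencer

    def write(self, name: str, offset: int, data: bytes) -> int:
        """Store data as a new region and log it under the object's name."""
        seq = next(self._seq)                # assumed: sequencer orders writes
        rid = next(self._region_ids)
        self._regions[rid] = bytes(data)
        self._index.setdefault(name, []).append((seq, offset, rid))
        return seq

    def read(self, name: str) -> bytes:
        """Rebuild the object by replaying its writes in sequence order."""
        buf = bytearray()
        for _seq, offset, rid in sorted(self._index.get(name, [])):
            data = self._regions[rid]
            end = offset + len(data)
            if len(buf) < end:
                buf.extend(b"\0" * (end - len(buf)))  # zero-fill gaps
            buf[offset:end] = data
        return bytes(buf)
```

Replaying writes in sequence order gives later writes precedence over earlier ones, which is one simple way to give concurrent writers a consistent result.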
Why am I here?
Learning about this community, but also …
● How should we analyze these services?
● Looking for potential users and collaborators!
  ○ Performance data management service?
Thomas Ilsche et al., “Optimizing I/O forwarding techniques for extreme-scale event tracing”, Cluster Computing Journal, June 2013.
● Interested in how others build distributed services in HPC
● Thinking about autonomics, implementing control loops
  ○ Real-time performance analysis
  ○ Architecture for (decentralized) control of (multi-component) services
Thanks!
This work is in part supported by the Director, Office of Advanced Scientific Computing Research, Office of Science, of the U.S. Department of Energy under Contract No. DE-AC02-06CH11357; in part supported by the Exascale Computing Project (17-SC-20-SC), a joint project of the U.S. Department of Energy’s Office of Science and National Nuclear Security Administration, responsible for delivering a capable exascale ecosystem, including software, applications, and hardware technology, to support the nation’s exascale computing imperative; and in part supported by the U.S. Department of Energy, Office of Science, Office of Advanced Scientific Computing Research, Scientific Discovery through Advanced Computing (SciDAC) program.
http://www.mcs.anl.gov/research/projects/mochi/