challenges and opportunities for i/o - limitless storage ... · the current i/o stack a community...
TRANSCRIPT
Challengesandopportunities for I/O
Limitless StorageLimitless Possibilities
https://hps.vi4io.org
Image under free license (CC0)
Computer Science Department
Copyright University of Reading
2018-07-04
LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT
Julian M. Kunkel
Gung Ho Network Meeting
SH
∞
)
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Outline
1 The Current I/O Stack
2 A Community Strategy
3 Smart Interfaces Examples
4 Next Generation Interfaces
5 Summary
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 2 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
User Challenges with the Current I/O Stack
� Coexistence of multiple access paradigms in workflows
I Dealing with file (POSIX, ADIOS, HDF5), SQL, NoSQL together
� Low-level semantics
I Organization of a hierarchical namespace (good: MARS)I Missing opportunity to express workflowsI Setup of in-situ/in-transit path difficultI Describing information lifecycle and data provenance in shell scriptsI All data (logfile, checkpoint, result) is treated identically
� Restricted (performance) portability
I Manual tuning/mapping to storage path necessaryI Users lack technological knowledge for tweaking
� The performance is often quite suboptimal
� Difficulty to analyze behavior and understand performance
I Unclear access patterns, assessing performance?
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 3 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Example: A Software Stack for NWP/Climate
� Domain semantics
I XIOS writes independent variables to one file eachI 2nd servers for performance reasons
� Why user side servers besides data modelI Performant mappings to files are limited
• Map data semantics to one "file"• File formats are notorious inefficient
I Domain metadata is treated like normal data
• Need for higher-level databases
I Interfaces focus on variables but lack features
• Workflows• Information life cycle management
Application
XIOS
XIOS 2nd server
MPI-IO
Parallel File System
File system
Block device
HDF5
NetCDF
Domain sem
antic
Data m
odel
Types
Byte a
rray
Figure: Typical I/O stack
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 4 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Critical Discussion
Questions from the storage users’ perspective
� Why do I have to organize the file format?
I It’s like taking care of the memory layout of C-structs
� Why do I have to convert data between storage paradigms?
I Big data solutions typically do not require this step!
� Why must I provide system specific performance hints?
I It’s like telling the compiler to unroll a loop exactly 4 times
� Why is a file system not offering the consistency model I need?
I My application knows the required level of synchronization
Being a user, I would rather code an application?
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 5 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Challenges Faced by HPC I/O Experts/Developers
� Semantical information is lost through layers
I Suboptimal performance, lost opportunities
� Re-implementation of features across stack
I Unpredictable interactionsI Wasted resources
� Utilizing complex future storage landscapes unclear
I No performance awareness
� Difficulty to analyze behavior and understand performance
I Unclear access patterns (users, sites)
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 6 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Future Systems: Coexistence of Storage Systems
HDDHDD
Node
Memory
Node
Memory
NVRAM
SSD Tape
S3
CloudEC2
HDDHDD HDDMemory HDDBurst Buffer SSD
...
� We shall be able to use all storage technologies concurrently
I Without explicit migration etc. put data where it fitsI Administrators just add a new technology (e.g., SSD pool) and users benefit
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 7 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Alternative Software Stack
Some Examples
� High-level abstractions: Dataclay, Dataspaces, Mochi
� Data models: ADIOS, HDF5, NetCDF, VTK
� Standard API across file formats: Silo, VTK, CDI, (HDF5)
� Low-level libraries: SIONlib, PLFS
� Storage interfaces: MPI-IO, POSIX, vendor-specific (e.g., CLOVIS), S3
� Big-data: HDFS, Spark, Flink, MongoDB, Cassandra
� Research: Countless storage system prototypes every year
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 8 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Outline
1 The Current I/O Stack
2 A Community Strategy
3 Smart Interfaces Examples
4 Next Generation Interfaces
5 Summary
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 9 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Opportunities for NWP/Climate
Chance to influence a future standard!
� Community performs various high-level software developments
I Semantic namespaces (MARS)I Workflow management (Cylc)I High-level semantics with, e.g., XIOSI Middleware NetCDFI Standardization expertise; GRIB, (WMO), UGRID
� Is a relevant big-data use case
I Relevant market for vendorsI But so far the influence to RD&E in CS was limited!
Benefits
� Re-think current I/O approaches, better separation of concerns
� Harness development effort across HPC (and other) communities
� Hopefully, better software and resilient to coming changesJulian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 10 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Glimpsing at Standardization Attempts
Promising
� Container storage interface (community driven / involves companies)
� Cloud Data Management Interface (SNIA driven)
� pmem.io (good candidate for persistent memory programming)
� HDF5 (towards a de-facto standard interfaces)
How about HPC?
� MPI-IO (partially successful)
� Exascale10/EOFS (failed)
� Various earlier attempts that failed
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 11 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Thoughts
I believe we must re-architect the IO stack together to
� Enable smart hardware and software components
� Optimize storage and compute together
� Deal with scientific metadata and workflows as first-class citizens
� Become self-aware instead of unconscious
� Enable self-learning and improving system behavior over time
� Allow schedulers to generate optimized plans
I LiquidComputing: Running pieces on storage, compute, IoT, network
Constraints
� The process should be steered by a standard and open forum
� Open ecosystem for any vendor, research, ...
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 12 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
A Potential Approach in the Community: Following MPI
� The standardization of a high-level data model & interface
I Targeting data intensive and HPC workloadsI Lifting semantic access to a new levelI To have a future: must be beneficial for Big Data + Desktop, too
� Development of a reference implementation of a smart runtime system
I Implementing key features
� Demonstration of benefits on socially relevant data-intense apps
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 13 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Development of the Data Model and API
� Establishing a Forum similarly to MPI
� Define data model for HPC
� Open board: encourage community collaborationS
tand
ard
1.0
Data Model
Interface
Reference Implementation
Standard-Forum
Steering Board
Committee
WorkgroupUse
-cas
es
Mini-apps
Workflows
Industry
Data centers
Scientists
Pseudo code
Mem
ber
s
Bod
ies
Nex
t Gen
erat
ion
Sta
ndar
d 2.
0
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 14 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Outline
1 The Current I/O Stack
2 A Community Strategy
3 Smart Interfaces Examples
4 Next Generation Interfaces
5 Summary
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 15 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Compression Research: Involvement� Scientific Compression Library (SCIL)
I Separates concern of data accuracy and choice of algorithmsI Users specify necessary accuracy and performance parametersI Metacompression library makes the choice of algorithmsI Supports also new algorithmsI Ongoing: standardization of useful compression quantities
� Development of algorithms for lossless compressionI MAFISC: suite of preconditioners for HDF5, pack data optimally, reduces
climate data by additional 10-20%, simple filters are sufficient
� Cost-benefit analysis: e.g., for long-term storage MAFISC pays of� Analysis of compression characteristics for earth-science related data sets
I Lossless LZMA yields best ratio but is very slow, LZ4fast outperforms BLOSCI Lossy: GRIB+JPEG2000 vs. MAFSISC and proprietary software
� Method for system-wide determination of ratio/performanceI Script suite to scan data centers...
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 16 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
SCIL: Supported User-Space Quantities
Quantities defining the residual (error):absolute tolerance: compressed can become true value ± absolute tolerance
relative tolerance: percentage the compressed value can deviate from true value
relative error finest tolerance: value defining the abs tol error for rel compression for values around 0
significant digits: number of significant decimal digits
significant bits: number of significant decimals in bits
field conservation: limits the sum (mean) of field’s change
Quantities defining the performance behavior:compression throughput
decompression throughput
in MiB or GiB, or relative to network or storage speed
Aim to standardize user-space quantities across compressors!See https://www.vi4io.org/std/compression
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 17 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Ongoing Activity: Earth-Science Data Middleware
Part of the ESiWACE Center of Excellence in H2020
Design Goals of the Earth System Data Middleware
1 Understand application data structures and scientific metadata
2 Flexible mapping of data to multiple storage backends
I Placement based on site-configuration + performance modelI Site-specific optimized data layout schemes
3 Relaxed access semantics, tailored to scientific data generation
4 A configurable namespace based on scientific metadata
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 18 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Architecture
Tools and services
ESD
Application1
NetCDF4 (patched)
Application2 Application3
GRIB
HDF5 VOL (unmodified)
ESD interface
cp-esd esd-daemonesd-FUSE
Layout Datatypes
ESD (Plugin)
Performance model
Metadata backend Storage backends
Site configuration
RDBMSNoSQL POSIX-IO Object storage Lustre
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 19 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Outline
1 The Current I/O Stack
2 A Community Strategy
3 Smart Interfaces Examples
4 Next Generation Interfaces
5 Summary
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 20 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
A Pragmatic View to a First Prototype� Take existing data model like NetCDF (or VTK) as baseline� With a chunk of:
I Scientific metadata handlingI Workflow and processing interfaceI Information lifecycle managementI Hardware model interface (hardware provides its own performance models)
� First prototype utilizes existing software stackI Like Cylc for workflowsI Like MongoDB for metadataI Like a parallel file system (or object storage)
� Work on:I Scheduler for performant mapping of data/compute to storage/computeI A FUSE client for flexible data mappings on semantic metadataI Importer/Exporter tools for standard file formats
� Add some magic (knowledge of experts developing APIs)� Next implementations will be supported by vendors!
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 21 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Next-Generation HPC IO API Key Features� High-level data model for HPC
I Storage understands data structures vs. byte arrayI Relaxed consistency
� Semantic namespace and storage-aware data formatsI Organize based on domain-specific metadata (instead of file system)I Support domain-specific operations and addressing schemes
� Integrated processing capabilitiesI Offload data-intensive compute to storage systemI Managed data-driven workflows supporting events and servicesI Scheduler maps compute and I/O to hardware
� Enhanced data management featuresI Information life-cycle management (and value of data)I Embedded performance analysisI Resilience, import/export, ...
NG-HPC-IO
ManagementInformationIntentWorkflowOperationObjectData description
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 22 / 23
The Current I/O Stack A Community Strategy Smart Interfaces Examples Next Generation Interfaces Summary
Summary
� The separation of concerns in the existing storage stack is suboptimal
� There is a huge potential for the next-generation interface
I Is climate/NWP a driver of next generation standards?
� Can the community work together to define next generation APIs?
I Are you willing to rethink how you do I/O?I Computer scientists will then optimize the right interfaces!
� If you are interested, subscribe to:https://www.vi4io.org/mailman/listinfo/io-ngi
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 23 / 23
Appendix
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 24 / 23
The ESiWACE project has received funding from the European Union’s Horizon2020 research and innovation programme under grant agreement No 675191
Disclaimer: This material reflects only the author’s view and the EU-Commission is not responsible for any use
that may be made of the information it contains
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 25 / 23
Data Model: Addressing and Metadata
attr ibuting
Information Concepts
Storage & Processing Concepts
Fragment
Storage System
SchemaRegistry
CoverageVar
Collection
Schema Metadata
ContainerDataset
partit ionedintousing schema
1
1..*
lives on
references0..* 0..*
stored i n
describedb y
contains
contains1
is a � Data Formats are providedby schema registry
� Define addressing andhigh-level metadata space
� Containers, datasets,fragments
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 26 / 23
Processing Model
runs
Job script
Cluster workload manager
Traditional cluster management
dispatchesadjustsscale out
observesdependencies
listens to
Storage & Processing Concepts
Event
processes
UserAdmin
Workflow scheduler
Code TaskResource mgmt ServiceWorkflow
Information Concepts
Slot
Fragment
Storage System
SchemaRegistry
Schema
ContainerDataset
schedulesdispatches
submitsprologepilogue
triggersevents
upon change
reads, writes
registersRunson
Runs consists of11..*
providedb y
runsoptionallyprovided
b y
Julian M. Kunkel SH LIMITLESS POTENTIAL | LIMITLESS OPPORTUNITIES | LIMITLESS IMPACT 27 / 23