spectrum scale ai ecosystem and how it supports gpu ......2020/09/01 · spectrum scale ai insights...
TRANSCRIPT
Spectrum Scale AI ecosystem and how it supports GPU based workloads including Power AI
Tomer Perry, Scalable I/O development
Includes content provided by Piyush Chaudhary, Ted Hoover, Andreas Koeninger, Simon Lorenz
IBM Confidential
Disclaimer
IBM Spectrum Scale / Dec, 2019 / © 2019 IBM Corporation
Spectrum Scale AI
The information in this document is IBM CONFIDENTIAL.
This information is provided on an "AS IS" basis without warranty of any kind, express or implied, including, but not limited to, the implied warranties of merchantability and fitness for a particular purpose. Some jurisdictions do not allow disclaimers of express or implied warranties in certain transactions; therefore, this statement may not apply to you.
This information is provided for information purposes only as a high level overview of possible future products. PRODUCT SPECIFICATIONS, ANNOUNCE DATES, AND OTHER INFORMATION CONTAINED HEREIN ARE SUBJECT TO CHANGE AND WITHDRAWAL WITHOUT NOTICE.
USE OF THIS DOCUMENT IS LIMITED TO SELECT IBM PERSONNEL AND TO BUSINESS PARTNERS WHO HAVE A CURRENT SIGNED NONDISCLOSURE AGREEMENT ON FILE WITH IBM. THIS INFORMATION CAN ALSO BE SHARED WITH CUSTOMERS WHO HAVE A CURRENT SIGNED NONDISCLOSURE AGREEMENT ON FILE WITH IBM, BUT THIS DOCUMENT SHOULD NOT BE GIVEN TO A CUSTOMER EITHER IN HARDCOPY OR ELECTRONIC FORMAT.
IBM's statements regarding its plans, directions, and intent are subject to change or withdrawal without notice at IBM's sole discretion. Information regarding potential future products is intended to outline our general product direction and it should not be relied on in making a purchasing decision. The information mentioned regarding potential future products is not a commitment, promise, or legal obligation to deliver any material, code, or functionality. The development, release, and timing of any future features or functionality described for our products remains at our sole discretion.
IBM reserves the right to change product specifications and offerings at any time without notice. This publication could include technical inaccuracies or typographical errors. References herein to IBM products and services do not imply that IBM intends to make them available in all countries.
Outline
• AI Pipeline
• HDP goes Mainstream
• Containers
• Storage for AI
• Getting the data closer to the GPU
Workflow and Data Flow is complex
Data Source
• Traditional Business
• IoT & Sensors
• Collaboration Partners
• Mobile Apps & Social Media
• Legacy
• New Data
• Years of Data
Data Preparation (weeks & months)
• Data Cleansing & Pre-Processing
• Training Dataset
• Testing Dataset
Flow labels: Heavy IO; Iterate
Build, Train, Optimize Models (days & weeks)
• AI Deep Learning Frameworks (TensorFlow & Caffe)
• Monitor & Advise
• Instrumentation
• Distributed & Elastic Deep Learning
• Parallel Hyper-Parameter Search & Optimization
• Network Models
• Hyper-Parameters
Inference (seconds to results)
• Trained Model
• Deploy in Production using Trained Model
Enterprise Data Pipeline with IBM Spectrum Storage
Stages: EDGE, INGEST, ORGANIZE, ANALYZE, ML / DL, INSIGHTS
• Data In
• Transient Storage (SDS/Cloud): throughput-oriented, software defined temporary landing zone
• Global Ingest (Cloud): throughput-oriented, globally accessible
• Fast Ingest / Real-time Analytics (SSD): high throughput
• Classification & Metadata Tagging: high volume, index & auto-tagging zone
• ETL (SSD/Hybrid): high throughput, random I/O
• Archive (HDD, Cloud, Tape): high scalability, large/sequential I/O
• Hadoop / Spark Data Lakes (Hybrid/HDD): throughput-oriented
• ML / DL, Prep ⇨ Training ⇨ Inference (SSD/NVMe): high throughput, low latency, random I/O
• Trained Model, Inference, Insights Out
Tier labels on the diagram: capacity tier, performance tier, performance & capacity tier
Enterprise Data Pipeline with IBM Spectrum Storage
Stages: EDGE, INGEST, ORGANIZE, ANALYZE, ML / DL, INSIGHTS
• Data In
• Transient Storage (SDS/Cloud)
• Global Ingest (Cloud)
• Fast Ingest / Real-time Analytics (SSD)
• Classification & Metadata Tagging
• ETL (SSD/Hybrid)
• Archive (HDD, Cloud, Tape)
• Hadoop / Spark Data Lakes (Hybrid/HDD)
• ML / DL, Prep ⇨ Training ⇨ Inference (SSD/NVMe)
• Trained Model, Inference, Insights Out
IBM products overlaid on the pipeline blocks: Spectrum Scale, Cloud Object Storage, Spectrum Discover, Spectrum Archive
Integration of HDFS in CES
• HDFS Transparency becomes an integral part of Spectrum Scale
• Easy setup of HDFS Transparency using existing CES mechanisms, e.g. "mmces service enable"
• Only NameNodes will be managed by CES
• HDFS clients always talk to the same CES IP (for NameNode requests)
• CES monitors the NameNode and moves the CES IP to another available node if something goes wrong
• Multiple HDFS clusters supported through multiple CES groups
Diagram: IBM Spectrum Scale Cluster
• CES Node (Active HDFS NameNode) and CES Node (Standby HDFS NameNode), in a special CES group with a single IP
• Regular GPFS Nodes (HDFS DataNodes)
• HDFS Client: always talks to the same CES IP
• If the Active NameNode fails, CES will move the IP to a working NameNode
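The failover behavior on this slide can be illustrated with a small simulation. This is a toy model and not CES code: the class names `NameNode` and `CesGroup`, the host names, and the IP address are all hypothetical, and real CES health monitoring is considerably more involved.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NameNode:
    """A CES node running an HDFS NameNode (hypothetical model)."""
    host: str
    healthy: bool = True


@dataclass
class CesGroup:
    """A CES group with a single floating IP (toy model of the slide)."""
    ip: str
    nodes: List[NameNode] = field(default_factory=list)
    owner: Optional[NameNode] = None

    def monitor(self) -> Optional[NameNode]:
        """Keep the IP on a healthy node; move it if the current owner failed."""
        if self.owner is not None and self.owner.healthy:
            return self.owner
        # Pick the first healthy node, or None if no node is available.
        self.owner = next((n for n in self.nodes if n.healthy), None)
        return self.owner


def resolve(group: CesGroup) -> str:
    """What an HDFS client sees: the same IP, whichever node owns it."""
    owner = group.monitor()
    if owner is None:
        raise RuntimeError("no healthy NameNode available")
    return f"{group.ip} -> {owner.host}"


active = NameNode("ces-node-1")
standby = NameNode("ces-node-2")
group = CesGroup(ip="192.0.2.10", nodes=[active, standby])

before = resolve(group)   # IP is served by the active NameNode
active.healthy = False    # simulate a NameNode failure
after = resolve(group)    # IP has moved to the standby NameNode
print(before)
print(after)
```

The client-facing address is identical before and after the failure; only the node behind it changes, which is the point of placing the NameNodes in a CES group with a single IP.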
Journey to cloud requires an open, hybrid approach
Productive, Predictable, Portable
• Container platform: Future proof by building once, deploying anywhere for flexible data and workload placement
• Operational services: Open and integrated consistent management services that ensure operational integrity and reduce cost
• Containerized software, secure by design: Integrated and secure containerized software for an agile, yet governed, enterprise
Manage, Build, Move
Goal: Deliver High Performance File Services to Containerized Application Workloads
Support for Multiple Clouds
• Public, Private, Hybrid
Support Hybrid Use Cases
• Cloud Burst – Single Name Space
• Multi Cloud Data Sharing
• Archive
• High Performance Tiering
Solution Integration (Partners)
Support Workloads that Require High Performance File Services
• Analytics & Cognitive
• High Performance Computing
• AI Data Pipeline
Support the Workload Ecosystem in the Cloud
• Containerized Applications, Storage
• Ephemeral and Persistent Storage Volumes
Flexible Deployment
• Dynamic Provisioning, Configuration, Upgrade
Spectrum Scale Containers Models
Storage for Containers
• Container Ready Storage
• Diagram: application containers; Container Storage Interface (CSI) for Scale (storage provision and attachment); Spectrum Scale client; connectivity to Spectrum Scale; auto-deploy
Storage in Containers
• Containerized Storage
• Diagram: application containers; Container Storage Interface (CSI) for Scale (storage provision and attachment); Containerized Spectrum Scale; auto-deploy
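With a CSI driver in place, an application container typically requests storage through a standard Kubernetes PersistentVolumeClaim. A minimal sketch follows, building such a claim as a plain Python dict. The claim name `training-data`, the storage class name `spectrum-scale-sc`, and the size are hypothetical; the storage class that maps to the Spectrum Scale CSI driver would be defined by the cluster administrator and is not shown.

```python
import json


def make_pvc(name: str, storage_class: str, size: str) -> dict:
    """Build a Kubernetes PersistentVolumeClaim manifest as a plain dict."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # ReadWriteMany lets several application containers share the volume.
            "accessModes": ["ReadWriteMany"],
            "resources": {"requests": {"storage": size}},
            # Hypothetical class name; it must match a class defined in the cluster.
            "storageClassName": storage_class,
        },
    }


pvc = make_pvc("training-data", "spectrum-scale-sc", "100Gi")
print(json.dumps(pvc, indent=2))
```

The printed JSON can be saved to a file and applied with `kubectl apply -f`, since Kubernetes accepts manifests in JSON as well as YAML.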
Evolution of IBM Spectrum Scale Containers
Diagram: evolution stages and their stacks
• Stages: Spectrum Scale (bare metal deployment); Scale for Containers v1; Scale for Containers v2; Scale in a Container; Scale in a Container w/ CloudPaks
• Stack layers shown: Infrastructure (IBM & Partners; IBM & Expanded Partner Ecosystem); OS Support; RHEL; Kubernetes; OpenShift Interoperability; OpenShift; Spectrum Scale; Common Services
• Interface versions shown: SEC 2.0, CSI 1.0, CSI 1.1, CSI 1.2
Evolution of IBM Spectrum Scale on Cloud
• Spectrum Scale (bare metal deployment): Infrastructure (IBM & Partners), OS Support, Spectrum Scale
• Spectrum Scale IBM Cloud: Infrastructure (IBM & Partners), OS Support, Spectrum Scale
• Spectrum Scale on AWS: Infrastructure (IBM & Partners), OS Support, Spectrum Scale, AWS Common Services, AMI
• Spectrum Scale Partner Solutions: Infrastructure (IBM & Partners), OS Support, Spectrum Scale, Common Services
• Scale in a Container On Cloud: Infrastructure (IBM & Partners), OS Support, CSI 1.2, Spectrum Scale, Scale in a Container, Common Services
• Scale in a Container w/ CloudPaks (Multi-Cloud): Infrastructure (IBM & Expanded Partner Ecosystem), RHEL, OpenShift, CSI 1.2, Kubernetes, Spectrum Scale, Common Services
Groupings on the diagram: Current Partner and Scale Offerings; Future Partner and Scale Offerings
Evolution of Hybrid Cloud with IBM Spectrum Scale
Current
• Spectrum Scale on AWS: Infrastructure (IBM & Partners), OS Support, Spectrum Scale, AWS Common Services, AMI
• Spectrum Scale IBM Cloud: Infrastructure (IBM & Partners), OS Support, Spectrum Scale
• Spectrum Scale (bare metal deployment): Infrastructure (IBM & Partners), OS Support, Spectrum Scale
Single Name Space w/AFM
• Scale in a Container On Cloud: Infrastructure (IBM & Partners), OS Support, CSI 1.2, Spectrum Scale, Scale in a Container, Common Services
• Spectrum Scale (bare metal deployment): Infrastructure (IBM & Partners), OS Support, Spectrum Scale
• Scale in a Container w/ CloudPaks (Multi-Cloud): Infrastructure (IBM & Expanded Partner Ecosystem), RHEL, OpenShift, CSI 1.2, Kubernetes, Spectrum Scale, Common Services
Future
Start small and scale easily from experiment to production at enterprise scale
NVMe Flash for AI and Big Data Workloads: IBM Elastic Storage System 3000
All-new storage solution
• Leverages proven FS9100 technology
• Integrated scale-out advanced data management with end-to-end NVMe storage
• Containerized software for ease of install and update
• Fast initial configuration, update and scale-out expansion
• Performance, capacity, and ease of integration for AI and Big Data workflows
Data Accelerator for AI and Analytics Infrastructure
• Performance Tier
• Maximize performance of storage: $/IOP & $/GB/s are key
• Low latency random I/O & High bandwidth sequential
• Relatively small compared to Capacity tier (say 5-25%)
• Can be Lower Durability, Lower Availability, Lower Reliability, if architected properly
• No Geo-distribution
• Capacity Tier (aka “Data Lake”)
• Minimize the cost of storage: $/TB is key
• High Durability, Availability, Reliability, Geo-distribution
Diagram labels:
• Pipeline stages: Ingest, Organize, Analyze, ML / DL
• Workloads: Hadoop / Spark; ML / DL (Prep ⇨ Training ⇨ Inference)
• Servers with CPUs & GPUs
• High Performance Tier: IBM Spectrum Scale
• Metadata Search Engine; Organizer / Porter
• Capacity Tier / Data Lake (Data Lakes / Archive): IBM Cloud Object Storage, IBM Spectrum Scale, NAS Filers
• Shared Storage
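The sizing rule of thumb on this slide (a performance tier of roughly 5-25% of the capacity tier) can be turned into a quick calculation. The sketch below is illustrative only: the function name `performance_tier_range` and the 2000 TB example capacity are assumptions, and the percentages are the slide's "say 5-25%" rather than a formal sizing guideline.

```python
from typing import Tuple


def performance_tier_range(
    capacity_tb: float, low_pct: int = 5, high_pct: int = 25
) -> Tuple[float, float]:
    """Return the (min, max) performance-tier size in TB for a capacity tier."""
    if capacity_tb < 0:
        raise ValueError("capacity must be non-negative")
    if not 0 <= low_pct <= high_pct:
        raise ValueError("percentages must satisfy 0 <= low_pct <= high_pct")
    return capacity_tb * low_pct / 100, capacity_tb * high_pct / 100


# Hypothetical data lake of 2000 TB (2 PB).
min_tb, max_tb = performance_tier_range(2000)
print(f"Performance tier: {min_tb:.0f} TB to {max_tb:.0f} TB")
```

For the assumed 2000 TB data lake this gives a performance tier of 100 TB to 500 TB, which is where the $/IOP and $/GB/s metrics apply, while $/TB governs the remaining capacity.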
Accelerating data for NVIDIA GPUs
• NVIDIA Magnum IO is a collection of software APIs and libraries to optimize storage and network I/O performance in multi-GPU, multi-node processing environments. NVIDIA developed Magnum IO in close collaboration with storage industry leaders, including IBM.
• Collaboration with NVIDIA continues to align Spectrum Scale’s pagepool with GPU memory (“pagepool tiering”)
https://devblogs.nvidia.com/wp-content/uploads/2019/08/GPUDirect-Fig-1-New.png
https://www.nvidia.com/en-us/data-center/magnum-io/
Questions?
Updated and Simplified IBM Spectrum Scale Picture
IBM Spectrum Scale / June 07, 2019 / © 2019 IBM Corporation
Data Accelerator for AI and Analytics
IBM Spectrum Scale: Automated data placement and data migration
• Clients: New Gen. & Traditional Applications, Micro-Services, Apps
• File: POSIX, NFS, SMB
• Analytics: Transparent HDFS
• Object: S3, Swift
• Container: Storage Enabler for Containers / Container Storage Interface (container runtime)
• Global Namespace, Management API (RESTful), Advanced GUI
• Storage: Flash, Disk, Tape, Shared Nothing Cluster, JBOD/JBOF, Spectrum Scale ECE
• Transparent Cloud Tier: IBM Cloud Object Storage, IBM Cloud, S3, S3 (Swift), … and others
• Worldwide Data Distribution (AFM): Site A, Site B, Site C
• Runs on…: IBM Cloud, … and others