Breaking IO Performance Barriers: Scalable Parallel File System for AWS
DESCRIPTION
Across all industries worldwide, HPC is helping innovative users achieve breakthrough results, from leading-edge academic research to data-intensive applications such as weather prediction and large-scale manufacturing in the aerospace and automotive sectors. As HPC-powered simulations grow ever larger and more complex, scientists are looking for cost-effective, high-performance compute resources that are available when they need them. Access to on-demand infrastructure creates opportunities to experiment and try new speculative models. AWS provides computing infrastructure that allows scientists and engineers to solve complex science, engineering, and business problems using applications that require high-bandwidth, low-latency networking and very high compute capabilities. Driven by the flexibility and affordability of AWS, many HPC and big data workloads are transitioning from on-premises infrastructure entirely onto AWS. But as with on-premises HPC, maximizing application performance for "HPC cloud" workloads requires fast and highly scalable storage. Intel® Cloud Edition for Lustre Software has been purpose-built for use with the dynamic computing resources available from Amazon Web Services to provide the fast, massively scalable storage software resources needed to accelerate performance, even on complex workloads.
TRANSCRIPT
© 2014 Amazon.com, Inc. and its affiliates. All rights reserved. May not be copied, modified, or distributed in whole or in part without the express consent of Amazon.com, Inc.
Breaking IO Performance Barriers:
Scalable Parallel File System for AWS
Paresh G. Pattani, Ph.D.
Sr. Director, High Performance Data Solutions
Intel Corporation
July 10, 2014
The need for parallel storage
Parallel Storage Needs
Performance
• Time spent storing and retrieving data is time not spent on compute. Fast storage maximizes processing utilization.
Scalability
• Growing datasets require greater amounts of storage and the ability to expand existing storage.
Reliability
• Large clusters and critical workloads require a comprehensive focus on data availability.
Scale Out Storage Using Lustre*
• Purpose-built for HPC
• Distributed, Parallel, Vast Global Namespace
• Linux server based
• Linux, Windows and Mac client support
• Support for 100,000+ Clients
• Designed for Reliable Storage
• Now available on AWS Marketplace lustre.intel.com/cloudedition
* Some names and brands may be claimed as the property of others.
Intel Strategy for Lustre* Storage
Extend core Lustre* for use across HPC and enterprise applications
Intel Enhanced Lustre* – HPC Clouds
Extend core Lustre* with key features for new markets and use cases
Push Lustre* onto HPC cloud infrastructure
Open-source innovation driving performance at scale
Open Source - Powerful storage foundation for exascale applications
Increased scale and streaming bandwidth
Accelerate maturity, lower risk and grow the ecosystem
Use Models: Cloud Resources for HPC
1 Augment: burst peak workloads and supplement resources
2 Transition: move on-premises HPC to cloud infrastructure
3 Deploy: launch new applications exclusively to the cloud
Key HPC Markets Using Lustre* Today
Large-scale Manufacturing
Weather and Climate
Life Sciences
Energy
Finance
What Does Intel® Cloud Edition for Lustre* Software Look Like?
Lustre* Components
Management (MGS, MGT)
• Lustre* mount service
• Initial point of contact for clients
Metadata (MDS, MDT)
• Namespace of file system
• File layouts, no data
• Scalable
Storage (OSS, OST)
• File content stored as objects
• Striped across targets
• Scales to 100+
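The "striped across targets" point can be illustrated with a small sketch. This is a simplified round-robin (RAID0-style) mapping from a file offset to an object and an offset inside that object. The function name, the 1 MiB stripe size, and the stripe count of 4 are illustrative assumptions, not values taken from the slides.

```python
MIB = 1024 * 1024


def locate_stripe(offset, stripe_size=MIB, stripe_count=4):
    """Map a byte offset in a file to (object index, offset inside that object).

    Assumes plain round-robin striping: consecutive stripe-sized chunks of
    the file go to consecutive objects, wrapping around after stripe_count.
    """
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("offset must be >= 0; stripe_size and stripe_count must be > 0")
    stripe_number = offset // stripe_size          # which chunk of the file
    object_index = stripe_number % stripe_count    # which object holds that chunk
    # Each object stores every stripe_count-th chunk back to back.
    object_offset = (stripe_number // stripe_count) * stripe_size + offset % stripe_size
    return object_index, object_offset


if __name__ == "__main__":
    for off in (0, MIB, 4 * MIB + 5, 5 * MIB + 10):
        print(off, "->", locate_stripe(off))
```

With more objects on more storage targets, sequential reads and writes of a large file are spread across more servers, which is the property the large file benchmark later exercises.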
Deploying a Storage Cluster
Monitoring & Command Line Interface
Performance
Large File Benchmark
Comparing 3 Lustre* cluster configurations
Increase the number of OSSs
• 4 OSS
• 8 OSS
• 16 OSS
Configurations of MGS and MDS are the same
We use 32 clients
Node configurations
• MGS: m1.medium, 94 MB/sec
• MDS: m3.2xlarge, EBS Optimized, RAID0, 8x 40GB Standard, 110 MB/sec
• OSS: m3.2xlarge, EBS Optimized, 8x 100GB Standard, 110 MB/sec
• Client: m3.2xlarge, 110 MB/sec
IOR Sequential Read FPP
[Chart: throughput in MB/sec (0 to 1600) vs. N. Clients (1, 2, 4, 8, 16, 32) for 4OSS, 8OSS, and 16OSS. Annotations: "Client's network bottleneck", "OSS's network bottleneck" (twice), "Close to the OSS network".]
IOR Sequential Write FPP
[Chart: throughput in MB/sec (0 to 1600) vs. N. Clients (1, 2, 4, 8, 16, 32) for 4OSS, 8OSS, and 16OSS. Annotations: "Client's network bottleneck", "OSS's network bottleneck" (twice).]
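The bottleneck annotations on the IOR charts suggest a simple first-order model: aggregate throughput is capped by whichever side, clients or OSS nodes, has less total network bandwidth. The sketch below is only that model, not the measured results. Treating the 110 MB/sec figure listed for clients and OSS nodes as a per-node network limit is an assumption, and the function name is hypothetical.

```python
def estimated_throughput_mb_s(n_clients, n_oss, client_bw=110.0, oss_bw=110.0):
    """Upper bound on aggregate MB/sec under a two-sided network limit.

    client_bw and oss_bw are assumed per-node network limits in MB/sec.
    """
    if n_clients < 1 or n_oss < 1:
        raise ValueError("n_clients and n_oss must be at least 1")
    client_side = n_clients * client_bw   # what the clients can move in total
    server_side = n_oss * oss_bw          # what the OSS nodes can move in total
    return min(client_side, server_side)


if __name__ == "__main__":
    client_counts = (1, 2, 4, 8, 16, 32)
    for n_oss in (4, 8, 16):
        row = [estimated_throughput_mb_s(c, n_oss) for c in client_counts]
        print(f"{n_oss:>2} OSS:", row)
```

Under this model the curve rises with the client count until the number of clients matches the number of OSS nodes, then flattens, which is the general shape the annotations describe.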
Ops
Aggregate Performance During Run
• LTOP is available, and we use it to record OST activity during the IOR run.
• With a simple Python script we create this graph, "aggregate performance vs. time", to analyze the problem.
[Chart: aggregate MB/sec vs. time. Labels: "1920", "Long tail".]
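A minimal sketch of the kind of script described above, assuming the recorded OST activity has already been parsed into (timestamp, OST name, MB/sec) tuples. The slides do not show the recorded format or the original script, so the input shape, the function names, and the 50% threshold used to flag the tail are all assumptions.

```python
from collections import defaultdict


def aggregate_by_time(samples):
    """Sum per-OST throughput into one (timestamp, total MB/sec) pair per timestamp."""
    totals = defaultdict(float)
    for timestamp, _ost, mb_per_sec in samples:
        totals[timestamp] += mb_per_sec
    return sorted(totals.items())


def tail_start(series, fraction=0.5):
    """Return the first timestamp after which the aggregate stays below
    fraction * peak, or None if there is no such tail."""
    if not series:
        return None
    threshold = fraction * max(value for _, value in series)
    # Index of the last sample that is still at or above the threshold.
    last_busy = max(i for i, (_, value) in enumerate(series) if value >= threshold)
    if last_busy + 1 < len(series):
        return series[last_busy + 1][0]
    return None


if __name__ == "__main__":
    # Hypothetical samples: two OSTs, one of which finishes late.
    samples = [
        (0, "OST0000", 100.0), (0, "OST0001", 100.0),
        (1, "OST0000", 100.0), (1, "OST0001", 90.0),
        (2, "OST0001", 20.0),
        (3, "OST0001", 10.0),
    ]
    series = aggregate_by_time(samples)
    print(series)
    print("tail starts at t =", tail_start(series))
```

A long stretch of low aggregate throughput at the end of a run means a few targets are still busy after the rest have finished, which lowers the average reported by the benchmark.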
Compare Lustre* and NFS
Small File Benchmark
Simulated EDA Benchmark
• Simulate workload by compiling a package
• untar; configure; make;
• Python wrapper parallelizes on cluster using MPI
• Calculate score based on (total workload/runtime)
32 Clients
• Linux, c3.xlarge
Compare with NFS
• Linux, i2.4xlarge
• 4x EBS RAID0
Lustre* Configuration
1 MGT
• m3.medium
1 - 4 MDTs
• m3.2xlarge
• 8x 40GB EBS
4 OSTs
• c3.xlarge
• 8x 40GB EBS
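The benchmark score is described only as (total workload/runtime). The sketch below shows one way such a score could be computed. Counting each completed untar/configure/make job as one unit of workload and using the slowest process as the runtime are assumptions, and the function names are hypothetical. The MPI wrapper itself is not reproduced here.

```python
def edabench_score(total_workload, runtime_seconds):
    """Score = total workload / runtime, as stated on the slide."""
    if runtime_seconds <= 0:
        raise ValueError("runtime_seconds must be positive")
    return total_workload / runtime_seconds


def score_from_runs(job_seconds_per_process):
    """Compute a score from per-process job timings.

    job_seconds_per_process: one list per process, holding the seconds each
    compile job took on that process (jobs on a process run back to back).
    Assumption: workload = number of jobs, runtime = slowest process.
    """
    total_jobs = sum(len(jobs) for jobs in job_seconds_per_process)
    wall_clock = max(sum(jobs) for jobs in job_seconds_per_process)
    return edabench_score(total_jobs, wall_clock)


if __name__ == "__main__":
    # Hypothetical timings: two processes, two compile jobs each.
    print(score_from_runs([[10.0, 10.0], [5.0, 15.0]]))
```

Because the runtime is set by the slowest process, a file system that serializes metadata operations lowers the score as the process count grows, which is the effect a small file benchmark of this kind is meant to expose.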
EDABench – Lustre* vs. NFS
[Chart: EDABench score (compile), 0 to 12000, vs. processes (1, 2, 4, 8, 16, 32, 64, 128) on 32 clients, for 1 MDT, 2 MDTs, 4 MDTs, and NFS.]
Storage Instance Cost Comparison
• EBS Optimized for all storage instances
• Global Support for Lustre*
• Does not include EBS cost
Cluster Option               Total Cost / Hour
Lustre* – 1xMDT + 4xOSS      $2.00
Lustre* – 2xMDT + 4xOSS      $2.69
Lustre* – 4xMDT + 4xOSS      $4.07
NFS – i2.4xlarge             $3.51
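The hourly figures in the cost table can be compared with a little arithmetic. The sketch below uses only the listed prices. It computes the incremental cost per added MDT and each Lustre* option's cost relative to the NFS instance. The dictionary keys and function name are illustrative, and as the slide notes, EBS cost is not included.

```python
# Hourly costs in USD, copied from the cost comparison table.
HOURLY_COST = {
    "lustre_1mdt": 2.00,
    "lustre_2mdt": 2.69,
    "lustre_4mdt": 4.07,
    "nfs_i2_4xlarge": 3.51,
}


def cost_per_added_mdt(cost_small, n_small, cost_large, n_large):
    """Incremental hourly cost for each MDT added between two configurations."""
    if n_large <= n_small:
        raise ValueError("n_large must be greater than n_small")
    return (cost_large - cost_small) / (n_large - n_small)


if __name__ == "__main__":
    step_1_to_2 = cost_per_added_mdt(HOURLY_COST["lustre_1mdt"], 1, HOURLY_COST["lustre_2mdt"], 2)
    step_2_to_4 = cost_per_added_mdt(HOURLY_COST["lustre_2mdt"], 2, HOURLY_COST["lustre_4mdt"], 4)
    print(f"1 -> 2 MDTs: ${step_1_to_2:.2f} per added MDT per hour")
    print(f"2 -> 4 MDTs: ${step_2_to_4:.2f} per added MDT per hour")
    for name in ("lustre_1mdt", "lustre_2mdt", "lustre_4mdt"):
        ratio = HOURLY_COST[name] / HOURLY_COST["nfs_i2_4xlarge"]
        print(f"{name}: {ratio:.2f}x the NFS hourly cost")
```

At the listed prices, the one- and two-MDT clusters cost less per hour than the NFS instance, and only the four-MDT cluster costs more.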
Intel® Cloud Edition for Lustre* software
Status Today
• Available on AWS Marketplace
• Setup in less than 10 minutes
• Try for yourself lustre.intel.com/cloudedition
lustre.intel.com/contactus
Thank You.