WekaIO Confidential © 2018 All rights reserved.
WekaIO: Extracting Performance from Modern HPC Hardware
Derek Burke, Regional Account Manager
Chris Weeden, Senior Systems Engineer
Company Overview
COMPANY
Founded in 2013
US HQ in San Jose, CA
Engineering in Tel Aviv, Israel
• Founding team built XIV, which was acquired by IBM
• Strong patent portfolio
• 54 Patents Filed
• 10 issued
• Matrix became GA in July 2017
• Showcased at AWS re:Invent in Dec. 2017
• Partner focused
• Rapidly growing customer base
INVESTMENT
$40.35M
OUR ACCOLADES
WHAT WE DO
WekaIO Matrix is the fastest, most scalable parallel file system for AI and technical compute workloads, on premises and in the public cloud
Single Namespace for Multiple Workloads
[Diagram labels: APP and GPU hosts, Matrix Client, SMB, NFS, MatrixFS, Ethernet or InfiniBand Network, Storage Servers, Unified Namespace, Data Lake, S3, Public]
Workloads: Microscopy, Genomics, Imaging Device, Video Editing, Media Rendering, HPC, OLTP Databases, Data Mining, Real-time Analytics, Financial Processing & Analyses, Fraud Detection, Geospatial Research
Focused On Performance Use Cases
• Seismic
• Reservoir simulation
• Analytics
• Machine Learning / AI
• Real-time analytics
• IoT
• Genomics sequencing and analytics
• Drug discovery
• Microscopy
• Algorithmic trading
• Business analytics (SAS Grid)
• Risk analysis (Monte Carlo simulation)
• CFD, Simulations
• EDA
• Media rendering
• Transcoding
• Visual Effects (VFX)
We Make a Bold Claim…
From one rack of commodity hardware with 100GbE Network
Approximately:
• 400GB/s
• 30 Million IOPS
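As a rough sanity check on these figures, the sketch below converts 400GB/s into the number of 100GbE links needed at line rate, and 30 million IOPS into implied bandwidth. The 4 KiB I/O size and the function names are assumptions for illustration; the slide does not state an I/O size.

```python
import math


def links_needed(gbytes_per_s, link_gbit=100):
    """Minimum number of links, at full line rate, to carry the throughput."""
    return math.ceil(gbytes_per_s * 8 / link_gbit)


def iops_to_gbytes_per_s(iops, io_bytes=4096):
    """Bandwidth implied by an IOPS figure at an assumed I/O size."""
    return iops * io_bytes / 1e9


if __name__ == "__main__":
    print("100GbE links for 400 GB/s:", links_needed(400))
    print("GB/s at 30M IOPS, 4 KiB each:", iops_to_gbytes_per_s(30_000_000))
```

Under these assumptions, 400GB/s corresponds to at least 32 saturated 100GbE links, so the claim is about a rack with many network ports rather than a single link.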
SPEC SFS2014 vda Results
Lower and Longer Are Better
1.8ms at highest latency
Test is 90% write intensive
SPEC SFS2014 eda Results
Lower and Longer Are Better
0.45ms latency
Frontend is 50% stats test
Backend is 50% R and W
SPEC SFS2014 vdi Results
Lower and Longer Are Better
0.42ms latency
Test is 60% random write, 20% random read
SPEC SFS2014 db Results
Lower and Longer Are Better
0.29ms latency
Test is 80% random read, 20% random write
How do we do it…
Why Data Locality is Irrelevant
o Modern networks on 10Gbit Ethernet are 30 times faster than SSD for reads and 10 times faster than SSD for writes
o With the right networking stack, shared storage is faster than local storage
[Chart: Time it takes to complete a 4KB page move, in microseconds (axis 0 to 120). Bars: 100Gbit (EDR), 10Gbit (SDR), SSD Write, SSD Read]
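The network side of the 4KB page move can be approximated with simple arithmetic. The sketch below computes serialization time only (bits divided by link rate); it ignores propagation, switching and protocol overhead, so real transfers take somewhat longer. The function names are illustrative, and the SSD times are not measured here: they are only what the slide's 30x and 10x ratios would imply.

```python
def wire_time_us(payload_bytes, link_gbit):
    """Serialization time in microseconds for a payload on a link."""
    return payload_bytes * 8 / (link_gbit * 1e9) * 1e6


def implied_ssd_time_us(network_us, ratio):
    """SSD time implied by a claim that the network is `ratio` times faster."""
    return network_us * ratio


if __name__ == "__main__":
    page = 4096  # 4KB page, as in the chart title
    for rate in (10, 100):
        print(f"{rate} Gbit: {wire_time_us(page, rate):.2f} us")
    ten_gbit = wire_time_us(page, 10)
    print(f"implied SSD read:  {implied_ssd_time_us(ten_gbit, 30):.0f} us")
    print(f"implied SSD write: {implied_ssd_time_us(ten_gbit, 10):.0f} us")
```

At 10Gbit a 4KB page takes about 3.3 microseconds on the wire, so the quoted ratios correspond to SSD times of roughly 100 and 33 microseconds, which fits the chart's 0 to 120 microsecond axis.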
The Kernel is designed for multi-tasking…
• The system needs to be notified of each new packet and pass the data into a specially allocated buffer, the sk_buff struct (Linux allocates one of these buffers for every packet).
• To do this, Linux uses an interrupt mechanism: interrupts are generated when a new packet enters the system. The packet then needs to be transferred to user space.
• As more packets have to be processed, more resources are consumed, negatively impacting overall system performance.
• sk_buff struct: the Linux network stack was originally designed to be compatible with as many protocols as possible. As such, metadata for all of these protocols is included in the sk_buff struct, but that is simply not necessary for processing specific packets. Because of this overly complicated struct, processing is slower than it could be.
• When an application in user space needs to send or receive a packet, it executes a system call. The context is switched to kernel mode and then back to user mode. This consumes a significant amount of system resources.
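To make the system-call point concrete, here is a minimal sketch that counts how many read() system calls it takes to move the same data with small and large requests. It uses a temporary file rather than a network socket, and the helper names are hypothetical; it illustrates per-call overhead in general, not WekaIO's or DPDK's code.

```python
import os
import tempfile


def make_test_file(size):
    """Create a temporary file of `size` zero bytes and return its path."""
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\0" * size)
        return f.name


def read_in_chunks(path, chunk_size):
    """Read a file with raw os.read calls; return (bytes_read, read_calls)."""
    fd = os.open(path, os.O_RDONLY)
    total = 0
    calls = 0
    try:
        while True:
            data = os.read(fd, chunk_size)  # each call enters the kernel once
            calls += 1
            if not data:  # an empty result signals end of file
                break
            total += len(data)
    finally:
        os.close(fd)
    return total, calls


if __name__ == "__main__":
    path = make_test_file(1 << 20)  # 1 MiB
    try:
        for chunk in (4096, 1 << 20):
            size, calls = read_in_chunks(path, chunk)
            print(f"chunk={chunk}: {size} bytes in {calls} read() calls")
    finally:
        os.remove(path)
```

Every one of those calls pays for a switch into kernel mode and back, which is why moving the data path into user space and batching work removes a large fixed cost per packet or per I/O.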
Context Switching is wasteful…
DPDK bypasses the kernel…
Components of DPDK:
• Environment Abstraction Layer (EAL): Responsible for gaining access to low-level resources such as hardware and memory space.
• Memory Manager: Responsible for allocating pools of objects in memory. A pool is created in huge page memory space and uses a ring to store free objects.
• Buffer Manager: Significantly reduces the time the operating system spends allocating and de-allocating buffers, using techniques such as bulk allocation, buffer chains and per-core buffer caches.
• Queue Manager: Implements safe lockless queues instead of spinlocks, allowing different software components to process packets while avoiding unnecessary wait times.
• Packet Flow Classification: Implements hash-based flow classification to quickly place packets into flows for processing.
• Poll Mode Drivers: Instead of relying on interrupts, a PMD polls the NIC to check whether packets have arrived. This avoids interrupt and context-switch overhead, at the cost of keeping a CPU core busy polling.
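Three of the ideas in this list can be sketched in a few lines: a pool of preallocated buffers with a ring of free objects, burst polling of a receive queue, and hash-based flow classification. This is plain Python written for illustration, not the DPDK C API; the class, function and field names are all hypothetical.

```python
from collections import deque


class BufferPool:
    """Fixed set of preallocated buffers; a ring (deque) holds the free ones."""

    def __init__(self, count, size):
        self._free = deque(bytearray(size) for _ in range(count))

    def get(self):
        """Take a free buffer, or None if the pool is exhausted."""
        return self._free.popleft() if self._free else None

    def put(self, buf):
        """Return a buffer to the pool for reuse (no allocation happens)."""
        self._free.append(buf)

    def available(self):
        return len(self._free)


def poll_burst(rx_queue, burst=32):
    """Poll the queue once and take up to `burst` packets without blocking."""
    out = []
    while rx_queue and len(out) < burst:
        out.append(rx_queue.popleft())
    return out


def classify(packets):
    """Group packets into flows keyed by the hashed 5-tuple."""
    flows = {}
    for pkt in packets:
        key = (pkt["src"], pkt["dst"], pkt["sport"], pkt["dport"], pkt["proto"])
        flows.setdefault(key, []).append(pkt)  # dict lookup hashes the key
    return flows


if __name__ == "__main__":
    pool = BufferPool(count=4, size=2048)
    buf = pool.get()
    pool.put(buf)
    rx = deque({"src": "10.0.0.1", "dst": "10.0.0.2", "sport": 1000 + i % 2,
                "dport": 80, "proto": "tcp"} for i in range(40))
    batch = poll_burst(rx)
    print(len(batch), "packets in", len(classify(batch)), "flows")
```

The design point in each case is the same: do the expensive work (allocation, notification, lookup setup) once up front, so that the per-packet path is a few cheap operations.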
We also use SR-IOV and SPDK
Multiple Virtual NICs
Benefits of SPDK
Software Architecture
Runs in LXC Container for isolation
Kernel Module
▪ VFS Interface
Front End
▪ POSIX Client
▪ Other protocol access
Back End
▪ Data placement
▪ Data protection
▪ File system metadata
▪ Tiering
Networking
▪ SR-IOV for network stack
▪ I/O bypasses network stack
Storage Agent
▪ I/O bypasses the kernel
[Architecture diagram. Layers: User Space, Kernel, Hardware. Blocks: Application, Front End (Client Access), Back End (Storage Services), Networking, Storage Device Agent, Flash, Kernel Module, NFS, SR-IOV]
WekaIO Data Path
▪ Application IO (file operations)
▪ Access WekaIO Client as Local FS
▪ User-Space, Low-Latency
▪ POSIX-complete, high-perf
▪ Kernel Module for VFS integration
OR
▪ Client-side NFS
▪ Bottlenecked by Kernel
▪ Handled by WekaIO’s Front End
▪ WekaIO Front-Ends are Cluster-Aware
▪ Incoming read requests are optimized for data location and load conditions
▪ Incoming Writes can go anywhere
▪ Metadata fully distributed
▪ No redirects required
▪ SR-IOV optimizes Network access
▪ WekaIO directly accesses NVMe flash
▪ Bypassing kernel, better perf
File System Scales Linearly with Cluster Size
[Chart: Linear Scalability - IOPS. X axis: cluster size (0 to 240). Y axis: IOPS (0 to 8M). Series: 100% random write, 100% random read]
[Chart: Linear Scalability - Latency (QD1). X axis: cluster size (0 to 240). Y axis: milliseconds (0.2 to 0.6). Series: read latency, write latency]
[Chart: Linear Scalability - Throughput. X axis: cluster size (0 to 240). Y axis: GB/second (0 to 100). Series: 100% write throughput, 100% read throughput]
Test environment: cluster of 30 to 240 AWS R3.8xlarge instances in 1 AZ, using 2 cores, 2 local SSD drives and 10GB of RAM on each instance (about 5% of CPU/RAM).
~30K OPS/AWS Instance
~375MB/sec/AWS Instance
<400 microsecond latency
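The per-instance figures can be projected to cluster totals if scaling is perfectly linear, which is what the slide claims. A minimal sketch, using only the two per-instance numbers quoted above (the function and constant names are illustrative):

```python
PER_INSTANCE_OPS = 30_000   # ~30K OPS per AWS instance, from the slide
PER_INSTANCE_MB_S = 375     # ~375MB/sec per AWS instance, from the slide


def aggregate(instances):
    """Project cluster totals assuming perfectly linear scaling."""
    return {
        "iops": instances * PER_INSTANCE_OPS,
        "gb_per_s": instances * PER_INSTANCE_MB_S / 1000,
    }


if __name__ == "__main__":
    for n in (30, 120, 240):
        print(n, aggregate(n))
```

At 240 instances this gives 7.2 million IOPS and 90 GB/s, which sits inside the 8M and 100 GB/second ranges of the chart axes.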
WekaIO Matrix #1 File System on SPEC
[Chart: benchmark metric (axis 0 to 8000) by workload (Video Streams, SW Build, Database, EDA, VDI). Series: WekaIO, Oracle, NetApp, DDN]
Summary of SPEC SFS 2014 Testing

| Benchmark | #1 Position | #1 Score | #1 ORT (ms) | #2 Position | #2 Score | #2 ORT (ms) |
|---|---|---|---|---|---|---|
| Software build | WekaIO | 5700 | 0.26 | NetApp | 4200 | 0.78 |
| Database | WekaIO | 4480 | 0.34 | Oracle | 2240 | 0.78 |
| Engineering design | WekaIO | 2000 | 0.48 | Oracle | 900 | 0.61 |
| Video streams | WekaIO | 6800 | 1.56 | DDN | 3400 | 50.07 |
| Virtual desktop | WekaIO | 1600 | 0.48 | DDN | 800 | 2.58 |
#1 File System on the IO500 Test
o 31% better than World’s largest Supercomputer
o 85% better than Lustre
[Chart: IO500 Ten Node Challenge Score (axis 0 to 70). Bars: DDN Lustre, IBM Spectrum Scale, WekaIO]
Test measures bandwidth and metadata
https://www.vi4io.org/io500/list/19-01/10node
https://docs.weka.io https://start.weka.io
Thank you.