Page 1: Nanosecond Scale Storage...2017/03/31  · 3/22/2017 1 Nanosecond Scale Storage: Ultrafast SSDs and Persistent Memory Applications of Emerging NVMs Zvonimir Z. Bandic Next-Gen. Platform

3/22/2017

Nanosecond Scale Storage: Ultrafast SSDs and Persistent Memory Applications of Emerging NVMs

Zvonimir Z. Bandic

Next-Gen. Platform Technologies

Western Digital

Acknowledgments

• Qingbo Wang
• Filip Blagojevic
• Md Kamruzzaman
• Martin Lueker-Boden
• Dejan Vucinic
• Damien LeMoal
• Cyril Guyot
• Steffen Hellmold

Agenda

• What are emerging NVMs?
• Programming models
  – CPU memory
  – Fast block storage
• Prototyping and performance
• Large scale deployment
  – RDMA networking
  – NVMe over fabrics
• Conclusions

DRAM System Scaling Challenges

• DRAM is expensive and does not scale well beyond 4 TB per node; at that point, system cost and DRAM cost are prohibitively high

• Big Data analytics, in-memory DBs, and HPC all require a lot of memory, not just in a single node but at rack and data center level

Source: Microsoft, 2015

Data Centric Compute Architectures

[Figure: data-centric compute architectures along a time axis, with panels for Mobile, Data Center (rack scale architecture), and Client Compute. Block labels: AP, eDRAM, eMRAM, SRAM / eMRAM, SCM module, DDR / New, CPU / CS, DRAM / MRAM, DDR, GPU, SSD, CNTLR, PCIe-NVMe, NAND.]

• Data center: cheap CPUs around PB of SCM
• Client compute: SCM complements DRAM for compute-intensive clients
• Large memory requirements for Virtual Reality

Extending Storage to 250ns Latency

Source: Western Digital estimates

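[Figure: latency hierarchy from per-core registers, L1 cache, and L2 cache, through shared L3 cache and DRAM, to HDD. Latency values are not recoverable from the transcript.]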

Memory Architecture Models

• Type b) can be implemented via a customized memory controller (Intel) or a new programming model
  – For example, new data structure types
  – Requires significant rewriting of OS/applications
• Type c) requires OS improvements, such as rewriting swap() or coming up with new memory architectures, but has less impact on applications
• Best understood model, with direct application for large in-memory DBs (Oracle, IBM) and web analytics on “Big Data”
• OS changes are needed
  – E.g. already ready in Linux and Windows

[Figure: memory architecture models a), b), and c), grouped under “SCM as memory” and “SCM as storage”. Diagram labels: desired memory backed by SCM alone or by DRAM plus SCM; SCM is exposed as memory; SCM is exposed as ramdisk or ultrafast PCIe block device (see research nvswap() project); in-memory DB using mmap(), with optimized access for mmap()-ed files via ext4-dax.]
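The mmap() access path named in the diagram can be sketched in Python. This is a minimal illustration, assuming an ordinary temporary file stands in for a file on a DAX-mounted file system; the file name, the record layout, and the helpers `open_mapped` and `increment` are hypothetical and do not come from the slides.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: one little-endian unsigned 64-bit counter per slot.
RECORD = struct.Struct("<Q")

def open_mapped(path, size):
    """Create the file if needed, size it to `size` bytes, and map it read/write."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)
        return mmap.mmap(fd, size)  # shared read/write mapping by default
    finally:
        os.close(fd)  # the mapping keeps its own reference to the file

def increment(mapped, slot):
    """Update a counter in place with plain loads and stores on the mapping."""
    offset = slot * RECORD.size
    (value,) = RECORD.unpack_from(mapped, offset)
    RECORD.pack_into(mapped, offset, value + 1)
    mapped.flush()  # ask the OS to write the mapped range back to the file
    return value + 1

# Usage: a temporary file stands in for a file on persistent memory (assumption).
path = os.path.join(tempfile.mkdtemp(), "counter.bin")
mapped = open_mapped(path, 4096)
increment(mapped, 0)
increment(mapped, 0)
mapped.close()
```

The application never issues read() or write() calls for the counter; after the mapping is set up, updates are ordinary memory accesses, which is the property the mmap()-based model relies on.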

Where Should We Attach Non-volatile Memory?

CPU bus, parallel
– Physical interface: DIMM
– Logical interface: non-standard DDR4, NVDIMM-P
– Pros: low latency; high bandwidth; power proportional; coherent through memory controller
– Cons: CPU memory controller has to implement specific logical interface; not suited for stochastic latency behavior; not hot pluggable; BIOS needs change

CPU bus, serial
– Physical interface: DIMM/other
– Logical interface: CCIX, OpenCAPI 3.1, Rapid-IO, Gen-Z
– Pros: high bandwidth; significant pin reduction; higher memory bandwidth to CPU; coherent through memory controller, or in some cases can even present lowest point of coherence
– Cons: CPU memory controller has to support; may have higher power consumption

Serial peripheral bus
– Physical interface: PCIe
– Logical interface: NVMe, DC express*
– Pros: standardized; CPU/platform independent; latency low enough for storage; easy RDMA integration; hot pluggable
– Cons: higher latency (~1 us)

PCIe Attached Non-volatile Memory Block Device
Shown at FMS 2014

Western Digital Innovation
– NVMe
  • Polling and polling driver work
– DC Express
  • New, leaner PCIe storage protocol
  • Minimizes number of packets per command
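Polling for completion, as opposed to sleeping until an interrupt arrives, can be illustrated with a user-space simulation. This sketch assumes a worker thread standing in for the device; `Completion`, `fake_device`, and `poll_for_completion` are hypothetical names, and nothing here reflects actual NVMe or DC Express driver code.

```python
import threading
import time

class Completion:
    """Shared completion record that the simulated device fills in."""
    def __init__(self):
        self.done = False
        self.value = None

def fake_device(completion, delay_s, value):
    """Simulated device: finishes the request after `delay_s` seconds."""
    time.sleep(delay_s)
    completion.value = value   # publish the result first
    completion.done = True     # then mark the request complete

def poll_for_completion(completion, timeout_s=1.0):
    """Busy-poll the completion flag instead of blocking on a wake-up."""
    deadline = time.perf_counter() + timeout_s
    while not completion.done:
        if time.perf_counter() > deadline:
            raise TimeoutError("request did not complete in time")
    return completion.value

# Usage: submit one simulated request and spin until it completes.
request = Completion()
worker = threading.Thread(target=fake_device, args=(request, 0.01, b"block"))
worker.start()
result = poll_for_completion(request)
worker.join()
```

The trade-off is the usual one: the polling thread burns CPU while it waits, in exchange for avoiding the wake-up path on completion.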

DC Express Prototype Device Performance

• FPGA prototype devices demonstrated at Flash Memory Summit in 2014
• Proprietary low-latency technology from HGST
  – DC express: no doorbells, no completions
• User library, proprietary DC express driver
  – QD=1 latency is 1.8 us for 512 B (vs. ~4.3 us for NVMe)
  – At high QDs, 99.9% of IOs complete within 5.5 us

Demonstrated a new low-latency interface technology, DC express, in 2014
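The two kinds of figures quoted on this slide, a latency ratio and a "fraction of IOs within a bound" statistic, can be computed as follows. This is a sketch: the 1.8 us and 4.3 us values come from the slide, while the sample list and the helper name `fraction_within` are hypothetical and are not measurements.

```python
def fraction_within(latencies_us, limit_us):
    """Return the fraction of samples that completed within `limit_us`."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    within = sum(1 for latency in latencies_us if latency <= limit_us)
    return within / len(latencies_us)

# Ratio of the QD=1 latencies quoted on the slide (NVMe vs. DC express).
speedup = 4.3 / 1.8

# Hypothetical samples, for illustration only: 997 fast IOs and 3 slower ones.
samples_us = [1.8] * 997 + [4.0, 5.0, 9.0]
share = fraction_within(samples_us, 5.5)

print(f"DC express is about {speedup:.1f}x faster at QD=1")
print(f"{share:.1%} of the sample IOs complete within 5.5 us")
```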

DRAM+SCM as Memory – Using Swap()

We have studied the latency observed by the application

In most cases it is between 1K and 32K cycles (300 ns to 10 us), assuming DRAM-like swap() performance

We have also observed very long tail behavior, up to 16M cycles (corresponding to 4 ms), which is a consequence of the reactive behavior of Linux swap() and can be improved

[Figure: latency observed by the application, with swap on a ramdisk or fast block device]
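The cycle counts above convert to time once a clock frequency is fixed. The slide does not state one; the sketch below assumes 3.3 GHz because that maps 1K cycles to roughly 300 ns. `ASSUMED_CLOCK_HZ` and `cycles_to_seconds` are hypothetical names.

```python
ASSUMED_CLOCK_HZ = 3.3e9  # assumption: the clock frequency is not given on the slide

def cycles_to_seconds(cycles, clock_hz=ASSUMED_CLOCK_HZ):
    """Convert a CPU cycle count to seconds at the given clock frequency."""
    if clock_hz <= 0:
        raise ValueError("clock frequency must be positive")
    return cycles / clock_hz

for label, cycles in [("1K", 1_000), ("32K", 32_000), ("16M", 16_000_000)]:
    microseconds = cycles_to_seconds(cycles) * 1e6
    print(f"{label} cycles is about {microseconds:.2f} us")
```

Under this assumption the 16M-cycle tail works out closer to 4.8 ms; the rounded 4 ms on the slide would correspond to a clock near 4 GHz.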

DRAM+SCM as Memory – Using Swap()

We have also studied what happens if the ramdisk device has additional slowdown due to the technology itself, in the range between 0 and 5 us

Execution time clearly increases as latency increases

Also, faster swap devices can process more page faults (as expected)

The slowdown difference between 100 ns and 500 ns is not dramatic, at least for the MCF benchmark

[Figure: MCF benchmark results with swap on a ramdisk or fast block device]
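One way to see why a few hundred nanoseconds of extra device latency may matter little is a simple additive model: execution time is compute time plus, for every page fault, a fixed software overhead and the device latency. The model and every number below are assumptions chosen for illustration; they are not taken from the slides or from the MCF runs.

```python
def execution_time_s(compute_s, page_faults, software_overhead_us, device_latency_us):
    """Additive model: each page fault costs software overhead plus device latency."""
    per_fault_us = software_overhead_us + device_latency_us
    return compute_s + page_faults * per_fault_us * 1e-6

# Hypothetical workload: 10 s of compute, one million page faults,
# and 5 us of kernel swap-path overhead per fault.
COMPUTE_S = 10.0
PAGE_FAULTS = 1_000_000
SOFTWARE_OVERHEAD_US = 5.0

for device_latency_us in (0.1, 0.5, 5.0):
    total = execution_time_s(COMPUTE_S, PAGE_FAULTS, SOFTWARE_OVERHEAD_US, device_latency_us)
    print(f"device latency {device_latency_us} us: {total:.1f} s")
```

In this model the step from 0.1 us to 0.5 us changes the per-fault cost only slightly because the software overhead dominates, while the step to 5 us doubles it.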

Using SCM as Storage – Comparison Between PMEM and Block*

• We compare latencies of file-system-level access to DRAM vs. the DC express PCIe prototype card vs. a standard SSD
• The tremendous potential of SCM technology lies in getting DRAM-like performance with storage-level persistence and even APIs/access models
• The PCIe block device is obviously slower; however, due to file system overhead and the QoS impact of the operating system, the difference is not dramatic
  – The PCIe device may be very interesting in the early phase of the market, prior to standardization

* Includes file system overhead

[Figure: latency comparison of PMEM/NVDIMM, PCIe-DCe userlib, and SSD]
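A file-system-level latency measurement of the kind compared here can be sketched as a timed loop of small reads. This is not the benchmark used for the slides; the helper name, the 512-byte block size, and the temporary test file are assumptions, and on an ordinary file these reads are likely to be served from the page cache, so the numbers mostly reflect software overhead.

```python
import os
import tempfile
import time

def measure_read_latency_ns(path, block_size=512, iterations=1000):
    """Time `iterations` unbuffered reads of `block_size` bytes at scattered offsets."""
    blocks = os.path.getsize(path) // block_size
    if blocks == 0:
        raise ValueError("file is smaller than one block")
    latencies = []
    with open(path, "rb", buffering=0) as f:
        for i in range(iterations):
            # Simple deterministic scatter over the file; not a true random pattern.
            offset = ((i * 7919) % blocks) * block_size
            start = time.perf_counter_ns()
            f.seek(offset)
            data = f.read(block_size)
            latencies.append(time.perf_counter_ns() - start)
            if len(data) != block_size:
                raise IOError("short read")
    return latencies

# Usage: a 1 MiB temporary file stands in for the device under test (assumption).
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(os.urandom(1 << 20))
    test_path = tmp.name

samples_ns = sorted(measure_read_latency_ns(test_path))
print(f"median read latency: {samples_ns[len(samples_ns) // 2]} ns")
os.remove(test_path)
```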

NVMf Host Software Architecture

• Common code was extracted from the NVMe PCIe driver
• The NVMe Fabrics driver is new and is incorporated into the NVMe driver
• Other driver, stack, and FS modules are unmodified
• RoCE, iWARP, InfiniBand – all supported

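[Figure: host software stack, from application down to transport. Layers: Application; File System; Block Layer; Common NVMe code; NVMe Fabrics Driver (new) and NVMe PCIe Driver (minus common code); RDMA Stack and PCIe Stack; HCA Driver; InfiniBand / RoCE / iWARP and PCI Express. Interface labels: generic FS operations, block device I/O, generic block commands, NVMe commands by IOCTL.]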

NVMe over Fabrics Controller Architecture

• Target devices include
  – RAM disk
  – NVMe device
  – Other NVM SATA/SAS devices
• RoCE, iWARP, InfiniBand – all supported

©2017 Western Digital Corporation or its affiliates. All rights reserved. Confidential.

[Figure: target (controller) software stack. Labels: InfiniBand / RoCE / iWARP; HCA Driver; RDMA Stack; NVMe Fabric Target Driver (new); NVMe Target Driver (new); Block Layer; NVMe Driver; Storage Devices; NVMe PCIe Devices.]

Performance Measurements (with polling)

• Over InfiniBand
• Added polling on the host side
  – On the controller side the Ramdisk driver always executes synchronously
• Latency (end-to-end) is 8 us:
  – Network latency contribution is <7 us

[Figure: NVMe fabrics performance, random read. Histogram of percentage of IOs [%] versus latency [us]; series: NVMe fabric to Ramdisk with polling, nQ=1.]
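A latency distribution of this form, percentage of IOs per latency bin, can be built from raw samples as sketched below. The bin width, the helper name `latency_histogram`, and the sample values are hypothetical; they are not the data behind the slide.

```python
import math
from collections import Counter

def latency_histogram(latencies_us, bin_width_us=1.0):
    """Map each bin's lower edge (in us) to the percentage of IOs that fall in it."""
    if not latencies_us:
        raise ValueError("need at least one latency sample")
    counts = Counter(
        math.floor(latency / bin_width_us) * bin_width_us for latency in latencies_us
    )
    total = len(latencies_us)
    return {edge: 100.0 * n / total for edge, n in sorted(counts.items())}

# Hypothetical samples in microseconds, for illustration only.
samples_us = [7.2, 7.8, 8.1, 8.4]
histogram = latency_histogram(samples_us)

for edge, percent in histogram.items():
    print(f"{edge:4.1f} to {edge + 1.0:4.1f} us: {percent:.1f}% of IOs")
```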

Remote eNVM Has Performance Similar to Remote DRAM

Raw RDMA access to remote PCM via PCIe peer-to-peer is 26% slower than to DRAM

[Figure: raw RDMA access, remote PCM compared with remote DRAM. Plot data are not recoverable from the transcript.]

Memory, Storage Fabric Standardization

• Transition from CPU-centric to non-volatile-memory-centric architectures
• Industry standards efforts ongoing
  – NVDIMM-P / NVDIMM-N
  – Gen-Z
  – CCIX
  – OpenCAPI
  – Rapid-IO

Open industry standards are key to broad-based adoption of nanosecond-class storage and to capital investments

[Figure: transition from CPU-centric architecture to NVM-centric architecture, with timeline labels 2018 and 2020. Components shown include cache coherent memory controllers and an NVMe over fabrics controller.]

Conclusions

• Emerging NVM programming models
  – CPU attached memory
  – Ultrafast block device
• Both models have their pros and cons, primarily related to interface standardization
• Putting persistent memory on the network
  – Need for fast fabrics, i.e. RDMA Ethernet
  – Protocols for block storage: NVMe over fabrics
• Network latency needs to be similar to or better than the latency of the underlying persistent memory resource
  – Network is a new bottleneck
  – Memory fabrics will be required: Gen-Z, OpenCAPI, etc.
• More standardization is required
  – Simplification of memory interface options
  – Standardization of memory fabrics

Western Digital: A Storage Solutions Leader

• In a strong strategic position to lead the global evolution of a broad-based and changing storage industry

• Broad storage portfolio, including HDDs, SSDs, embedded and removable flash memory, and storage-related systems

• 13,500+ active patents worldwide

• Vertically integrated business model to maximize operational efficiency

• Consistent profitable performance, strong free cash flow

Broadest Portfolio of Products & Solutions

Client Devices

Client Solutions

Data Center Devices & Solutions

Performance Measurements

• Over InfiniBand
• 13 us latency at QD=1 for random reads
  – Sub-10 us network contribution
• Further improvements
  – Polling library should remove 3 us from the local device
  – 2-3 us additional improvement in network contribution should be possible
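The projected effect of these improvements is simple arithmetic on the figures above. The sketch below uses only numbers from this slide; the function name `projected_latency_range_us` is hypothetical.

```python
def projected_latency_range_us(measured_us, polling_gain_us, network_gain_range_us):
    """Return (best case, worst case) latency after subtracting the expected gains."""
    smallest_gain, largest_gain = network_gain_range_us
    best_case = measured_us - polling_gain_us - largest_gain
    worst_case = measured_us - polling_gain_us - smallest_gain
    return best_case, worst_case

# 13 us measured, 3 us from the polling library, 2 to 3 us from the network.
low_us, high_us = projected_latency_range_us(13.0, 3.0, (2.0, 3.0))
print(f"projected latency: {low_us:.0f} to {high_us:.0f} us")
```

The upper end of this range is in line with the 8 us end-to-end latency reported on the "Performance Measurements (with polling)" slide.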

Thank You