Intel Builder’s Conference - NetApp
John Meneghini – Data ONTAP NVMe-oF Target Architect
Madhu Pai – Data ONTAP NVMe-oF Transport Architect
April 21, 2017
V1.2
© 2017 NetApp, Inc. All rights reserved.
Introduction
1) Data ONTAP SAN Engineering
   • FC & iSCSI Transport engineering teams
   • SCSI Target protocol engineering team
   • QA and Host Interop teams
2) Active at T10, T11 and NVMexpress.org
   • Many TPARs, TPs, and ECNs
   • Many T10 and T11 Proposals
3) Adding support for NVMe-oF to Data ONTAP
   • For more information see: http://www.netapp.com/us/media/wp-7248.pdf
4) SPDK provides the following that NetApp wants to leverage
   • NVMe-oF virtual Target & Initiator protocol engine
   • RDMA (and FC-NVMe) transports
   • Libraries, unit tests, scripts and tools
What NetApp Wants To Use
[Figure: SPDK architecture diagram, with a Current/Future legend. Storage Protocols: iSCSI Target, NVMe-oF* Target, vhost-scsi Target, vhost-blk Target, SCSI, NVMe. Storage Services: Block Device Abstraction (BDEV), Ceph RBD, Linux Async IO, Blob bdev, NetApp bdev, Virtual NVMe. Drivers: NVMe Devices, NVMe-oF* Initiator, NVMe* PCIe Driver, Intel® QuickData Technology Driver. Object Integration: RocksDB, Ceph. Core: Application Framework. Libraries: bdev, json, event, copy, conf, nvmf, trace, log, util.]
What Got My Attention
2013: SPDK starts as an Intel® internal project
Sept 2015: NVMe driver on GitHub
Jan 2016: First external contributor
Jun 2016: NVMe-oF* Target
Apr 2017: First SPDK Summit
NetApp Likes/Dislikes
1) SPDK Libraries, Modules and APIs (Likes)
   • The BDAL API is what made SPDK work for NetApp
   • NetApp likes the modularity and APIs in SPDK
   • Improved modularity and APIs make it even better
2) DPDK and /usr/lib dependencies (Dislikes)
   • The DPDK environment doesn’t work for our application (Data ONTAP)
   • Expand the SPDK EAL (env_dpdk) to abstract DPDK dependencies
   • Abstract dependencies on POSIX
3) Threading model (Dislikes)
   • Would like a flexible, dynamic threading model supported by SPDK
4) Management plane (Dislikes)
   • Would like to improve the NVMe-oF management APIs
5) Tired of chasing the tip of master, not knowing what will show up next
NetApp’s Vision for SPDK
1) Open source library that supports enterprise applications
   • Well defined APIs that support a variety of use cases
   • Support for multiple platforms and execution environments
   • Don’t even assume user space
   • Enterprise class Reliability, Availability and Supportability
2) Community collaboration
   • Better distribution lists (e.g. separate lists for code-reviews)
   • Public bug reporting and code reviews
   • Shared test environment and automation
   • Feature branch development
3) Governance
   • True open source project governed by community members
   • Feature roadmaps and schedules
NetApp’s Vision for SPDK - Simplified
[Figure: diagram relating Development Effort to Functionality, with regions labeled Core, Value-Add, Shared, SPDK Shared, and Proprietary.]
Agenda
• Platform Abstraction
• NVMe-oF Transport Improvements
• Support for NVM Protocol Features
• NVMe-oF Management APIs
• Enterprise Readiness with RAS
• NVMe-oF Target Threading Model
Platform Abstraction Improvements
1) Abstract dependencies on DPDK
   • Improvements to env.h and env_dpdk/env.c
   • Not all modules affected (e.g. vhost)
2) Abstract dependencies on POSIX APIs and user libraries
   • Pthreads abstracted
   • /usr/include, /usr/lib abstracted
3) Makefile improvements
   • Better compile tool chain support
   • Support for: armv8a, dpaa2, thunderx, xgene1, power8
   • Support for different compilers
   • Optional build targets
   • Only build the libraries and applications I want
   • Cf. dpdk/config
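The abstraction requested on this slide can be pictured as library code that calls a small environment interface, with each platform supplying its own backend. The following is a minimal sketch only, written in Python as a modeling language (SPDK itself is C). `Env`, `HostEnv`, and `alloc_io_buffer` are hypothetical names for illustration and are not SPDK or DPDK APIs.

```python
from abc import ABC, abstractmethod
import threading


class Env(ABC):
    """Hypothetical environment interface; library code depends only on this."""

    @abstractmethod
    def alloc(self, size: int) -> bytearray: ...

    @abstractmethod
    def spawn(self, fn, *args) -> None: ...

    @abstractmethod
    def join_all(self) -> None: ...


class HostEnv(Env):
    """One possible backend, built on the host's heap and threads."""

    def __init__(self):
        self._threads = []

    def alloc(self, size):
        return bytearray(size)

    def spawn(self, fn, *args):
        t = threading.Thread(target=fn, args=args)
        self._threads.append(t)
        t.start()

    def join_all(self):
        for t in self._threads:
            t.join()
        self._threads.clear()


def alloc_io_buffer(env: Env, blocks: int, block_size: int = 4096) -> bytearray:
    """Library-side helper with no direct dependency on a particular backend."""
    return env.alloc(blocks * block_size)
```

An application with its own memory allocator and thread scheduler would provide a different `Env` subclass, and the library-side helper would not need to change.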
FC-NVMe and RDMA Transports
1) Interested in SPDK FC-NVMe and RDMA Transports
2) Changes to struct spdk_nvmf_transport expected
   • FC-NVMe and RDMA differences
   • Feature branch desired
3) Changes to SGL infrastructure expected
   • Some applications require more than 2 SGL entries
4) RDMA Transport Improvements
   • Verbs API in rdma.c to abstract user OFED libraries
   • Cf. Linux NVMe-oF RDMA layering
NVM Protocol Improvements (Virtual Target)
1) Additional NVM Commands
   • Abort and Identify command improvements
   • Persistent Reservations
   • Fused Operations, Compare and Write
2) Additional Controller features
   • Scaling Controllers, QPs and Namespaces
   • Support for static Controllers
   • Namespace sharing
   • Namespace mapping
   • Submission Queue flow
   • WRR with Urgent Priority Arbitration
3) In-band Namespace Management
   • NVMe Specification Improvements
   • Namespace Management and Attach command support
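To make the "WRR with Urgent Priority Arbitration" item concrete, the sketch below models the general idea in Python: admin and urgent queues are served with strict priority, and the high, medium, and low classes share the remaining service in proportion to configured weights. It is a simplified illustration under stated assumptions, not the NVMe specification's exact algorithm (arbitration burst, for example, is not modeled), and `WrrArbiter` and its methods are hypothetical names.

```python
from collections import deque


class WrrArbiter:
    STRICT = ("admin", "urgent")          # always served first, in this order
    WEIGHTED = ("high", "medium", "low")  # share the remainder by weight

    def __init__(self, weights):
        self.weights = dict(weights)      # e.g. {"high": 4, "medium": 2, "low": 1}
        self.credits = dict(self.weights)
        self.queues = {c: [] for c in self.STRICT + self.WEIGHTED}
        self._rr = {c: 0 for c in self.queues}

    def add_queue(self, cls):
        """Create a submission queue in the given priority class."""
        q = deque()
        self.queues[cls].append(q)
        return q

    def _pop_round_robin(self, cls):
        """Take one command from the class, rotating across its queues."""
        qs = self.queues[cls]
        for i in range(len(qs)):
            idx = (self._rr[cls] + i) % len(qs)
            if qs[idx]:
                self._rr[cls] = (idx + 1) % len(qs)
                return qs[idx].popleft()
        return None

    def next_command(self):
        for cls in self.STRICT:
            cmd = self._pop_round_robin(cls)
            if cmd is not None:
                return cmd
        for _ in range(2):  # the second pass runs after credits are refilled
            for cls in self.WEIGHTED:
                if self.credits[cls] > 0:
                    cmd = self._pop_round_robin(cls)
                    if cmd is not None:
                        self.credits[cls] -= 1
                        return cmd
            self.credits = dict(self.weights)
        return None
```

With weights of 2, 1, and 1, a backlog of high-priority commands is served two at a time between low-priority commands, while anything on an urgent queue is served ahead of both.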
NVMe-oF Improvements
Objectives:
   • Complete SPDK EAL
   • Add POSIX Abstractions
   • Add FC-NVMe Transport
   • Add FCT and Verbs API
   • Develop NVM Protocol
[Figure: layered stack diagram, with items marked New or Current. Labels: Storage Protocols (NVMe-oF Target, NVMe-oF Initiator, NVM Protocol: Direct, Virtual); Storage Services (Storage, Block Device Abstraction Layer (BDAL), BDAL Extension Module); transports (RDMA, FC-NVMe) with Verbs API, User Verbs, FCT API, FCT; SPDK EAL and POSIX abstractions; DPDK EAL + POSIX Libraries; Hardware Drivers (FC Driver, RNIC Driver, UIO, OFED); Linux Kernel (RHEL 7, Debian 8); Hardware (HBA, HCA).]
Management Plane improvements (OOB)
1) Support for administratively configuring Subsystems and Controllers
   • Add a Subsystem with no Subsystem Ports or Namespaces
   • Start a Subsystem or stop a Subsystem (forces disconnect)
   • Configure number of Controllers per Subsystem
   • Add and remove Hosts from a Subsystem (dynamic discovery service)
2) Support for administratively configuring Namespaces
   • Out of band Namespace Management
   • Mapping NVMe-oF Hosts to NVMe Namespaces (ACL)
   • Support both private and shared Namespaces (ACL)
   • Adding and removing Namespaces (AEN support)
3) Support for administratively configuring Subsystem Ports
   • Subsystem port online/offline (disconnect)
   • Subsystem port add/remove
   • Dynamic updates to Discovery Service
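The management operations listed above can be modeled as a small state machine. The sketch below, in Python as a modeling language, shows one possible shape: a subsystem that can be created empty, started and stopped, given a host allow list, and given private or shared namespaces, with a change notification recorded for connected hosts (standing in for an AEN). All class, method, and field names are hypothetical and are not SPDK management APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Subsystem:
    nqn: str
    started: bool = False
    allowed_hosts: set = field(default_factory=set)
    # nsid -> set of host names for a private namespace, None for a shared one
    namespaces: dict = field(default_factory=dict)
    connected: set = field(default_factory=set)
    events: list = field(default_factory=list)  # (host, event, nsid) records

    def start(self):
        self.started = True

    def stop(self):
        self.started = False
        self.connected.clear()  # stopping forces every host to disconnect

    def add_host(self, host):
        self.allowed_hosts.add(host)

    def remove_host(self, host):
        self.allowed_hosts.discard(host)
        self.connected.discard(host)

    def connect(self, host):
        if not self.started or host not in self.allowed_hosts:
            raise PermissionError(f"{host} may not connect to {self.nqn}")
        self.connected.add(host)

    def _can_see(self, host, nsid):
        acl = self.namespaces[nsid]
        return acl is None or host in acl

    def visible_namespaces(self, host):
        return sorted(n for n in self.namespaces if self._can_see(host, n))

    def add_namespace(self, nsid, hosts=None):
        self.namespaces[nsid] = None if hosts is None else set(hosts)
        self._notify(nsid)

    def remove_namespace(self, nsid):
        self._notify(nsid)
        del self.namespaces[nsid]

    def _notify(self, nsid):
        # Stand-in for an asynchronous event sent to each affected host.
        for host in sorted(self.connected):
            if self._can_see(host, nsid):
                self.events.append((host, "namespace_changed", nsid))
```

In this model a namespace added with an explicit host set is private to those hosts, and one added without a host set is visible to every host allowed on the subsystem.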
SPDK and DPDK Reliability, Availability, Supportability
1) Improve Logging (log.c)
   • Well defined APIs
   • Implementation needs to be abstracted (e.g. with a constructor module)
2) Improve Tracing (trace.c)
   • Well defined APIs
   • Implementation needs to be abstracted
3) First Failure Detection and Capture (FFDC)
   • Trigger and dump traces and logs on error or exception
4) Add performance counters and histograms
   • Counters in the perf path
   • Programmable, not always on
5) Better error handling
   • Don’t panic or abort() on error
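The FFDC, counter, and error-handling items can be illustrated together. The sketch below, in Python as a modeling language, keeps recent trace records in a bounded ring, counts events only when counting is switched on, and captures a dump at the point of failure while returning an error to the caller rather than aborting. `Ffdc` and `handle_io` are hypothetical names and do not correspond to SPDK's log.c or trace.c interfaces.

```python
from collections import Counter, deque


class Ffdc:
    def __init__(self, capacity=256):
        self.ring = deque(maxlen=capacity)  # oldest records fall off the front
        self.counters = Counter()
        self.counting = False               # programmable, not always on
        self.dumps = []

    def trace(self, event, **fields):
        self.ring.append((event, fields))

    def count(self, name, n=1):
        if self.counting:
            self.counters[name] += n

    def capture(self, reason):
        """Snapshot the trace ring and counters at the first failure."""
        dump = {
            "reason": reason,
            "trace": list(self.ring),
            "counters": dict(self.counters),
        }
        self.dumps.append(dump)
        return dump


def handle_io(ffdc, lba, length, capacity_blocks):
    ffdc.trace("io_start", lba=lba, length=length)
    ffdc.count("io_submitted")
    if lba + length > capacity_blocks:
        ffdc.capture("lba_out_of_range")
        return -1  # report the error to the caller instead of calling abort()
    ffdc.trace("io_done", lba=lba)
    return 0
```

The bounded ring keeps the cost of always-on tracing predictable, and the capture step preserves the records leading up to the failure before they are overwritten.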
SPDK Threading Model
1) Hybrid Polling
   • Implemented in the Linux 4.10 kernel
2) SPDK Libraries should be thread model independent
   • Different applications will use different threading models
3) Changes to the NVMe-oF Target threading model
   • Scaling threads in an NVMe-oF Target Subsystem
      • More than 1 thread per Subsystem
   • Dynamic thread association with Controllers
      • Based upon dynamic controller creation
   • Dynamic thread association with Queue Pairs
      • As QPs scale, threads scale
SPDK Threading Model (Limitations)
1) The number of threads is static. Pre-provisioning the NVMe-oF threads therefore wastes cores and cycles.
2) There is no flexible thread model with a dynamic association between threads and their work.
3) The model of binding a subsystem to a thread does not scale in our environment.
4) The queuing architecture supported by the FC hardware is NOT well suited to the dynamic creation and deletion of queue pairs done in RDMA.
SPDK Threading Model (Requirements)
1) Dynamic threading model that activates and quiesces threads.
2) Dynamically create an NVMe-oF Subsystem and associate it with a thread.
3) Break the affinity between subsystem IO traffic and a single thread.
4) Support the same lockless semantics.
5) Enhancements should be compatible with future releases of SPDK.
SPDK Threading Model (Terminology/Extensions)
1) Hardware Queue Pair (HWQP) is the basic “unit” for polling
   • The HWQP specifies a set of FC-NVMe queues that work together to provide the Send/Completion QP.
   • In the new model, a thread would poll one or more HWQPs.
   • An HWQP has affinity to a thread.
2) Subsystems would still have a thread affinity for the event handling.
   • The thread “owning” the subsystem is the Master thread (for that subsystem).
3) IO and Admin queue pairs (IOQP/AQP) are spread across many HWQPs.
4) Threads polling the HWQPs are “Poller” threads.
   • A “Master” thread could be a “Poller” thread (depending on the subsystem).
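The relationships among these terms can be written down as a small data model. The sketch below, in Python as a modeling language, gives each HWQP a fixed poller thread, gives each subsystem a master thread, and spreads a subsystem's admin and IO queue pairs across the HWQPs. The round-robin placement policy is an assumption made for illustration, and `Hwqp`, `Target`, and their fields are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class Hwqp:
    hwqp_id: int
    poller: str  # name of the thread that polls this HWQP
    queue_pairs: list = field(default_factory=list)  # (subsystem, qp_name)


@dataclass
class Target:
    hwqps: list
    masters: dict = field(default_factory=dict)  # subsystem -> master thread
    _next: int = 0

    def add_subsystem(self, name, master):
        self.masters[name] = master

    def place_queue_pair(self, subsystem, qp_name):
        """Assign a queue pair to the next HWQP (assumed round-robin policy)."""
        hwqp = self.hwqps[self._next % len(self.hwqps)]
        self._next += 1
        hwqp.queue_pairs.append((subsystem, qp_name))
        return hwqp

    def pollers_for(self, subsystem):
        """Threads that end up processing IO for the given subsystem."""
        return sorted(
            {h.poller for h in self.hwqps
             if any(s == subsystem for s, _ in h.queue_pairs)}
        )
```

With three HWQPs polled by two threads, the queue pairs of a single subsystem land on both threads, which is the break from the one-thread-per-subsystem model described in the limitations.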
HWQP Layout
SPDK Threading Model (Operation)
1) All subsystem data is “owned” by the Master thread.
2) Poller threads process IO by polling the HWQPs (and hence the IOQPs).
3) Master thread propagates to the Poller threads a cache of the data needed during the lifecycle of an IO.
4) Poller threads route NVMe admin (and fabric) commands to the Master thread for processing.
5) Master thread coordinates out-of-band management commands and verifies that required caches in the Poller threads are set up correctly.
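The division of work described in these steps can be sketched as a single-threaded simulation. In the Python model below, the master holds the authoritative namespace table and pushes a copy to each poller, pollers complete IO using only their cached copy, and admin commands are forwarded to the master's inbox. A real implementation would pass messages between threads without locks; here a plain queue stands in for that mechanism, and all names are hypothetical.

```python
from collections import deque


class Master:
    def __init__(self):
        self.namespaces = {}   # authoritative data: nsid -> size in blocks
        self.inbox = deque()   # admin/fabric commands routed from pollers
        self.pollers = []
        self.handled = []

    def attach(self, poller):
        self.pollers.append(poller)
        poller.master = self
        poller.cache = dict(self.namespaces)

    def set_namespace(self, nsid, blocks):
        """Out-of-band management change, followed by a cache push."""
        self.namespaces[nsid] = blocks
        for p in self.pollers:
            p.cache = dict(self.namespaces)

    def caches_consistent(self):
        return all(p.cache == self.namespaces for p in self.pollers)

    def run_once(self):
        while self.inbox:
            self.handled.append(self.inbox.popleft())


class Poller:
    def __init__(self):
        self.master = None
        self.cache = {}
        self.hwqp = deque()    # commands arriving on the polled HWQPs
        self.completed = []

    def poll(self):
        while self.hwqp:
            kind, payload = self.hwqp.popleft()
            if kind == "io":
                nsid, lba = payload
                ok = nsid in self.cache and lba < self.cache[nsid]
                self.completed.append((nsid, lba, "ok" if ok else "error"))
            else:
                # Admin and fabric commands are not handled on the IO path.
                self.master.inbox.append((kind, payload))
```

Because a poller reads only its own cache while handling IO, the IO path needs no shared-state access, which is consistent with the lockless requirement stated earlier.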
Thank You
Questions?