TRANSCRIPT
#vmworld
HCI2049BU
vSAN Trends from the Trenches: What Our Largest Global Customers Are Doing
David Boone, VMware, Inc.
Dave Morera, VMware, Inc.
#HCI2049BU
VMworld 2019 Content: Not for publication or distribution
©2019 VMware, Inc.
Disclaimer
This presentation may contain product features or functionality that are currently under development.
This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
Technical feasibility and market demand will affect final delivery.
Pricing and packaging for any new features/functionality/technology discussed or presented have not been determined.
The information in this presentation is for informational purposes only and may not be incorporated into any contract. There is no commitment or obligation to deliver any items presented herein.
Agenda
Trends from Global Customers
How these trends are changing the way customers do business
How these trends change requirements
Tips, tricks, and lessons learned
Overview: vSAN Success in Global Enterprises
Variety of Industries
Our team interacts with our largest customers
• Financials / Insurance
• Healthcare
• Automotive
• Communications
• Logistics
Key Areas of vSAN Design
• Application workloads
• Servers (CPU, memory, NICs)
• Disk subsystems
• Networking
• BC/DR/availability requirements
• Advanced features: pros and cons
• Cost vs. everything
Trend 1: Tier 1 Applications
Proper design and sizing
Design and Sizing: Tier 1 Applications
Overview
vSAN Clusters for Tier 1
• Not all workloads are equal
• Expect high performance
• High Availability
vSAN design is key
• Best practices
Problems Observed
Not sizing properly
• SPBM
• Growth
Workload Profiles
• Read/write ratio
• Peak metrics
• Random vs. Sequential
Latency vs. IOPS
Application Workloads – Prepare Yourself
Understand vSAN’s Distributed Object Model and how vSAN achieves availability
Diagram: Availability Branching
Application Workloads
Build generic when possible
Application characteristics
• Read and write I/O ratios
• Average read and write I/O sizes
• Average and peak IOPS and throughput
• Application-level replication?
Define “success”
• Latency < “x” ms
• Write throughput > “y” MB/sec
• Data remains available with “z” simultaneous failures in the cluster
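Writing the success criteria down as explicit thresholds makes them testable. The Python sketch below is illustrative only. The `SuccessCriteria` and `Measurement` names are hypothetical, and the threshold values are stand-ins for the placeholder values above, which each customer defines for itself.

```python
from dataclasses import dataclass


@dataclass
class SuccessCriteria:
    """Hypothetical targets agreed with the application owner."""
    max_latency_ms: float       # latency must stay below this value
    min_write_mb_per_s: float   # write throughput must exceed this value
    failures_to_survive: int    # simultaneous failures the data must survive


@dataclass
class Measurement:
    """Observed values, e.g. from a load test or production monitoring."""
    latency_ms: float           # which statistic (average, 95th percentile) is your choice
    write_mb_per_s: float
    policy_ftt: int             # FTT of the storage policy assigned to the VMs


def unmet_criteria(target: SuccessCriteria, observed: Measurement) -> list[str]:
    """Return the name of every success criterion the observed values miss."""
    misses = []
    if observed.latency_ms >= target.max_latency_ms:
        misses.append("latency")
    if observed.write_mb_per_s <= target.min_write_mb_per_s:
        misses.append("write throughput")
    if observed.policy_ftt < target.failures_to_survive:
        misses.append("availability")
    return misses


if __name__ == "__main__":
    target = SuccessCriteria(max_latency_ms=2.0, min_write_mb_per_s=400, failures_to_survive=2)
    observed = Measurement(latency_ms=1.4, write_mb_per_s=520, policy_ftt=1)
    print(unmet_criteria(target, observed))  # ['availability']
```

In this example, performance meets both targets. However, an FTT=1 policy cannot meet a requirement to survive two simultaneous failures.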
Workload Profile
Live Optics
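Whichever tool collects the data, Live Optics or another, the application characteristics listed earlier reduce to a few numbers per sampling interval. This is a minimal sketch. It assumes hypothetical interval samples of read and write operation counts and bytes, and it does not model the Live Optics export format.

```python
from dataclasses import dataclass


@dataclass
class Sample:
    """One monitoring interval (hypothetical fields, not a real export format)."""
    seconds: float
    read_ops: int
    write_ops: int
    read_bytes: int
    write_bytes: int


def summarize(samples: list[Sample]) -> dict[str, float]:
    """Reduce interval samples to the characteristics used for sizing."""
    reads = sum(s.read_ops for s in samples)
    writes = sum(s.write_ops for s in samples)
    seconds = sum(s.seconds for s in samples)
    # IOPS per interval, so that the peak is not averaged away.
    interval_iops = [(s.read_ops + s.write_ops) / s.seconds for s in samples]
    return {
        "read_pct": 100 * reads / (reads + writes),
        "avg_read_kb": sum(s.read_bytes for s in samples) / reads / 1024,
        "avg_write_kb": sum(s.write_bytes for s in samples) / writes / 1024,
        "avg_iops": (reads + writes) / seconds,
        "peak_iops": max(interval_iops),
    }


if __name__ == "__main__":
    samples = [
        Sample(60, 42_000, 18_000, 42_000 * 8192, 18_000 * 32768),
        Sample(60, 84_000, 36_000, 84_000 * 8192, 36_000 * 32768),
    ]
    for name, value in summarize(samples).items():
        print(f"{name}: {value:.1f}")
```

For these two samples, the profile is:

- 70% reads
- 8KB average reads
- 32KB average writes
- 1,500 average IOPS
- 2,000 IOPS peak

Sizing to the average alone would miss the peak by 500 IOPS. These counters cannot show whether access is random or sequential, because that requires address information.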
Servers: Sizing and Performance
CPU
• Clock speed is more important than core count to vSAN
• Plan 10% overhead for standard vSAN, 35% with Deduplication & Compression
• Intel vs. AMD
Memory
• Plan 10% overhead for standard vSAN, 35% with Deduplication & Compression
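The overhead guideline becomes a simple reservation when sizing hosts. The sketch below applies the 10% and 35% planning figures from the slide to a host's total CPU capacity. The host specification in the example is hypothetical, and the result is a planning estimate, not measured vSAN consumption.

```python
def cpu_for_vms_ghz(sockets: int, cores_per_socket: int, clock_ghz: float,
                    dedup_compression: bool) -> float:
    """Estimate CPU left for VMs after reserving the planned vSAN overhead."""
    total_ghz = sockets * cores_per_socket * clock_ghz
    # Planning figures from the slide: 10% for standard vSAN, 35% with DD&C.
    overhead = 0.35 if dedup_compression else 0.10
    return total_ghz * (1 - overhead)


if __name__ == "__main__":
    # Hypothetical host: 2 sockets x 16 cores at 2.5 GHz = 80 GHz total.
    print(round(cpu_for_vms_ghz(2, 16, 2.5, dedup_compression=False), 1))  # 72.0
    print(round(cpu_for_vms_ghz(2, 16, 2.5, dedup_compression=True), 1))   # 52.0
```

Enabling Deduplication & Compression on this host moves 20 GHz from the VM budget to vSAN.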
Disk Subsystems
• 2 disk groups per host
• Additional controller if > 2 DGs
• Key vSAN performance metric in OEM specs is “4KB Random Write IOPS”
• 800GB SSDs for vSAN cache; resyncs can use the cache, giving faster MTTR
• Exception: the 375GB Intel Optane P4800X is so fast and parallel that its capacity is enough
• NVMe for caching tier
• NVMe or SAS SSDs for capacity tier
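These guidelines are easy to check before an order is placed. The sketch below encodes them together with the disk group limits of vSAN releases from this period: one cache device and one to seven capacity devices per disk group, and at most five disk groups per host.

Several parts of the sketch are assumptions:

- The `DiskGroup` structure and controller names are hypothetical.
- "2 disk groups per host" is treated as a minimum.
- "Additional controller if > 2 DGs" is read literally: more than two disk groups must not all sit behind one controller.

```python
from dataclasses import dataclass


@dataclass
class DiskGroup:
    """One planned vSAN disk group on a host (hypothetical structure)."""
    controller: str                       # controller the group's devices sit behind
    cache_gb: int                         # size of the single cache device
    capacity_devices: int                 # number of capacity-tier devices
    cache_is_optane_p4800x: bool = False  # the slide's exception to the 800GB rule


def layout_warnings(disk_groups: list[DiskGroup]) -> list[str]:
    """Return a warning for every guideline or limit the planned layout breaks."""
    warnings = []
    if len(disk_groups) < 2:
        warnings.append("fewer than 2 disk groups per host")
    if len(disk_groups) > 5:
        warnings.append("more than 5 disk groups per host")
    controllers = {dg.controller for dg in disk_groups}
    if len(disk_groups) > 2 and len(controllers) < 2:
        warnings.append("more than 2 disk groups behind a single controller")
    for i, dg in enumerate(disk_groups):
        if not 1 <= dg.capacity_devices <= 7:
            warnings.append(f"disk group {i}: needs 1-7 capacity devices")
        if dg.cache_gb < 800 and not dg.cache_is_optane_p4800x:
            warnings.append(f"disk group {i}: cache device smaller than 800GB")
    return warnings


if __name__ == "__main__":
    planned = [DiskGroup("hba0", 800, 4), DiskGroup("hba0", 800, 4), DiskGroup("hba0", 400, 4)]
    for warning in layout_warnings(planned):
        print(warning)
```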
BC/DR/Availability
Snapshots are not backups
Stretched Cluster
• Consider a secondary level of Failures To Tolerate (SFTT, SFTM)
Consider FTT=2
• Maintenance mode counts as a fault
• Disks don’t fail immediately as bad blocks form; bad blocks are only corrected and repaired when the block is read
RAID-6 is attractive but needs 20Gb+ vSAN network bandwidth
The worst data corruption bugs affected only customers using DD&C (Deduplication & Compression)
Example vSAN storage policies:
• Name: General Purpose | FTM: RAID-1 | FTT: FTT=1
• Name: Dev/Test | FTM: RAID-5 | FTT: FTT=1 | IOPS Limit: 1,000
• Name: SQL Servers | FTM: RAID-1 | FTT: FTT=2
• Name: Default Storage Policy | FTM: RAID-1 | FTT: FTT=1
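The policies above differ in how many hosts they need and how much raw capacity they consume. The sketch below uses the standard vSAN rules for these failure tolerance methods:

- RAID-1 with FTT=n keeps n+1 full copies and needs 2n+1 hosts.
- RAID-5 (FTT=1) uses a 3+1 layout across at least 4 hosts.
- RAID-6 (FTT=2) uses a 4+2 layout across at least 6 hosts.

It ignores witness components, slack space, and data reduction, so the output is a first-order estimate. `tolerance_left` is a deliberate simplification of "maintenance mode counts as a fault": each host in maintenance mode is treated as one failure already consumed.

```python
from fractions import Fraction

# (failure tolerance method, FTT) -> (minimum hosts, raw-capacity multiplier)
POLICY_RULES = {
    ("RAID-1", 1): (3, Fraction(2)),
    ("RAID-1", 2): (5, Fraction(3)),
    ("RAID-1", 3): (7, Fraction(4)),
    ("RAID-5", 1): (4, Fraction(4, 3)),
    ("RAID-6", 2): (6, Fraction(3, 2)),
}


def min_hosts(ftm: str, ftt: int) -> int:
    """Smallest cluster that can place every component of the policy."""
    return POLICY_RULES[(ftm, ftt)][0]


def raw_capacity_gb(ftm: str, ftt: int, usable_gb: float) -> float:
    """Raw vSAN capacity consumed by usable_gb of VM data under the policy."""
    _, multiplier = POLICY_RULES[(ftm, ftt)]
    return float(usable_gb * multiplier)


def tolerance_left(ftt: int, hosts_in_maintenance: int) -> int:
    """Simplification: each host in maintenance mode consumes one failure."""
    return max(ftt - hosts_in_maintenance, 0)


if __name__ == "__main__":
    for name, ftm, ftt in [("General Purpose", "RAID-1", 1),
                           ("Dev/Test", "RAID-5", 1),
                           ("SQL Servers", "RAID-1", 2)]:
        print(name, min_hosts(ftm, ftt), round(raw_capacity_gb(ftm, ftt, 1000), 1))
```

Dev/Test uses RAID-5 and trades a fourth host for consuming 1,333GB instead of 2,000GB per 1,000GB of data. `tolerance_left(1, 1) == 0` shows why FTT=2 is worth considering: with one host in maintenance, an FTT=1 object has no protection left.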
Trend 2: All-NVMe vSAN
Superior Performance
All-NVMe vSAN Clusters: Investment in the Latest Tech
Overview
vSAN Clusters with All-NVMe devices
• Future proof
• Great price compared to older SSDs
• Fewer bottlenecks
Application driven
• Tier 1 applications
• Demanding Workloads
Superior Performance
Problems Observed
Unknown Success Criteria
Application Best Practices not applied
• Oracle RAC
• Microsoft SQL Server
Network Stack Ignored
• NIC speed
• Switch Buffers
• Too many hops
• MTU
vSAN Performance on Hardware
• All-flash: more predictable and responsive than hybrid
• SATA protocol is 1-to-1 and locks the bus; avoid if possible
• Storage controllers can be a bottleneck
• Limited support for SAS expanders (due to performance)
• NVMe: fastest, simple (no external controller), and low CPU overhead
The cost/performance pyramid of storage device types (top to bottom):
• 3D XPoint cache / NVMe capacity
• All NVMe
• NVMe cache / SAS capacity
• All SAS
• NVMe cache / SATA capacity
• SAS cache / SATA capacity
• All SATA
I/O Flow
vSAN can go as fast as the hardware allows
• Potential bottlenecks depend on the hardware selected
• New tech = fewer obstacles
• A race car can only go so fast on a rocky road
Diagram: potential bottlenecks in a vSAN cluster, where each vSphere/vSAN host contributes disk groups to a single vSAN datastore: SATA/SAS queue depth, disk controllers, and network buffer size
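Queue depth matters because of Little's law: the I/Os a device can complete per second are bounded by the number of outstanding I/Os divided by the time each one takes. The sketch below applies that bound. The 200 µs latency is an illustrative assumption. The 32-command limit is the SATA NCQ maximum, not a measurement of any particular device.

```python
def max_iops(outstanding_ios: int, latency_us: int) -> float:
    """Little's law: completions per second = outstanding I/Os / time per I/O."""
    return outstanding_ios * 1_000_000 / latency_us


def outstanding_needed(target_iops: int, latency_us: int) -> float:
    """Concurrency a device must sustain to reach target_iops at latency_us."""
    return target_iops * latency_us / 1_000_000


if __name__ == "__main__":
    # A SATA NCQ queue holds at most 32 commands; 200 us is an illustrative latency.
    print(max_iops(32, 200))                 # 160000.0
    print(outstanding_needed(500_000, 200))  # 100.0
```

At 200 µs per I/O, a single SATA device cannot exceed 160,000 IOPS, however fast its media is. Reaching 500,000 IOPS at the same latency needs about 100 I/Os in flight, which is more than a SATA queue can hold.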
Networking: Switching
Switch Buffers
• 16MB port buffers minimum
• Deep buffer size (1GB+)
• 16MB ÷ 48 ports = 0.33MB per port
• Port speed
• Active/Standby vs. LACP
• Network extenders
• QoS, NetQueue
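The buffer arithmetic on this slide generalizes. A shared buffer divided evenly among ports leaves little headroom per port, and at line rate that headroom amounts to a fraction of a millisecond of traffic. The sketch below reproduces the 16MB ÷ 48 ports example and adds how many microseconds of line-rate traffic one port's share can hold. It assumes even division and decimal megabytes; real switches allocate shared buffers dynamically.

```python
def per_port_buffer_mb(shared_buffer_mb: float, ports: int) -> float:
    """Even share of a shared switch packet buffer (real switches allocate dynamically)."""
    return shared_buffer_mb / ports


def buffer_time_us(buffer_mb: float, port_gbps: float) -> float:
    """Microseconds of line-rate traffic that buffer_mb can hold."""
    bits = buffer_mb * 1_000_000 * 8     # decimal megabytes -> bits
    return bits / (port_gbps * 1_000)    # 1 Gb/s = 1,000 bits per microsecond


if __name__ == "__main__":
    share = per_port_buffer_mb(16, 48)
    print(f"{share:.2f} MB per port")                        # 0.33 MB per port
    print(f"{buffer_time_us(share, 10):.0f} us at 10 Gb/s")  # 267 us at 10 Gb/s
    print(f"{buffer_time_us(share, 25):.0f} us at 25 Gb/s")  # 107 us at 25 Gb/s
```

The faster the port, the shorter the burst a fixed per-port share can absorb. Port speed upgrades therefore raise the importance of buffer depth rather than lowering it.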
NICs
Often overlooked
• NICs – virtualization offloading
• 30% IOPS gain – Mellanox CX4
• Firmware and driver: KB 2030818
• Native inbox drivers
Trend 3: Automation
The “Easy Button”
Environment Automation: Accomplish More, Faster
Overview
Empower teams
• Faster deployments
• Less time wasted
Faster Life Cycle Management (LCM)
• Upgrade solutions end-to-end with no downtime
Interoperability
Problems Observed
Homebrewed scripts: outdated
Knowledge base
• Too many components
Low visibility between solutions
Deployment at Scale
Automation is Key
Cluster-wide settings must be consistent
• Use Distributed vSwitch or NSX
• Active / Standby, Standby / Active, LACP
• vSwitch MTU, vmkernel MTU
• Advanced Settings
• Boot settings
• Power Management
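Consistency is simple to verify once each host's settings are collected in one place, for example by a PowerCLI or API script. The sketch below assumes that collection has already happened and that settings arrive as a flat name-to-value mapping per host. The host names and setting keys are hypothetical. It reports every setting whose value is not identical on all hosts.

```python
from collections.abc import Hashable


def find_drift(settings_by_host: dict[str, dict[str, Hashable]]) -> dict[str, dict[str, Hashable]]:
    """Return {setting: {host: value}} for every setting that differs across hosts.

    A setting missing from a host is reported with the value None.
    """
    all_keys: set[str] = set()
    for settings in settings_by_host.values():
        all_keys.update(settings)
    drift = {}
    for key in sorted(all_keys):
        values = {host: settings.get(key) for host, settings in settings_by_host.items()}
        if len(set(values.values())) > 1:  # more than one distinct value = drift
            drift[key] = values
    return drift


if __name__ == "__main__":
    hosts = {  # hypothetical hosts and setting names
        "esx01": {"vmk_mtu": 9000, "vswitch_mtu": 9000, "power_policy": "High Performance"},
        "esx02": {"vmk_mtu": 1500, "vswitch_mtu": 9000, "power_policy": "High Performance"},
        "esx03": {"vmk_mtu": 9000, "vswitch_mtu": 9000},
    }
    for setting, values in find_drift(hosts).items():
        print(setting, values)
```

In this example, esx02's vmkernel MTU and esx03's missing power policy are flagged. The consistent vSwitch MTU is not.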
vSAN Automation Overview
Diagram: vCenter and every host in the vSAN cluster expose the vSAN API, which is consumed through the UI, CLI, and SDK
• vSAN API endpoint on ESXi
• vSAN API endpoint on vCenter
• UI - vSphere HTML5 Client / Embedded Host Client
• CLI - PowerCLI, ESXCLI & RVC
• SDK - Programming/Scripting languages
vSAN & PowerCLI
vSAN also uses SPBM (Storage Policy Based Management), so make sure you know the following cmdlets:
Replication
• Start-SpbmReplicationFailover
• Sync-SpbmReplicationGroup
• Get-SpbmReplicationGroup
• Get-SpbmReplicationPair
• Start-SpbmReplicationPrepareFailover
• Start-SpbmReplicationPromote
• Start-SpbmReplicationReverse
• Start-SpbmReplicationTestFailover
• Stop-SpbmReplicationTestFailover
Rules
• New-SpbmRule
• New-SpbmRuleSet
Storage Policy
• Remove-SpbmStoragePolicy
• New-SpbmStoragePolicy
• Import-SpbmStoragePolicy
• Get-SpbmStoragePolicy
• Set-SpbmStoragePolicy
• Export-SpbmStoragePolicy
Others
• Get-SpbmCapability
• Get-SpbmCompatibleStorage
• Get-SpbmEntityConfiguration
• Set-SpbmEntityConfiguration
• Get-SpbmFaultDomain
• Get-SpbmPointInTimeReplica
https://github.com/jasemccarty
VMware Cloud Foundation
Automation for the entire stack
• Full stack approach: compute, storage, networking, and management
• Standardized architecture
• Built-in security
• Apps/services/infrastructure automation
• Tested and validated
• Simplified experience
• Consistency & security
• Data center, edge, and public cloud
Trend 4: Operationalizing vSAN
Day 2 Operations
Operations and Support Teams
Overview
Other teams taking over Day 2 operations
• Scale up/out
• HW replacement
• Driver/FW/BIOS updates
vSAN is easy
• No training provided
Problems
Lack of knowledge within these teams
• Self-inflicted outages
Best Practices not applied
May result in performance issues
Distributed Architecture
The vSAN Difference
Add capacity the way you want
• Scale UP by adding drives
• Scale OUT by adding hosts
• Scale UP and OUT for maximum agility
Diagram: a single vSAN datastore that scales up (more drives per host) and out (more vSphere/vSAN hosts)
Disk Groups – Use as a Strategy for Growth
Easily design hosts to increase capacity and performance without adding hosts or licenses
• Initial purchase with only some drive bays populated
• In 12-18 months, populate the remaining bays
• Cycle out older devices in a future purchasing cycle to increase density even further
Take advantage of technology improvements and market conditions
Diagram: an all-flash vSAN host with multiple disk groups (cache and capacity tiers) contributing to a single vSAN datastore across the cluster
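The partial-population strategy is easiest to evaluate as raw cluster capacity before and after the remaining bays are filled. The sketch below assumes hypothetical host counts, bay counts, and drive sizes. It counts raw capacity-tier space only, with no policy overhead, slack space, or data reduction.

```python
def raw_cluster_tb(hosts: int, disk_groups_per_host: int,
                   capacity_drives_per_group: int, drive_tb: float) -> float:
    """Raw capacity-tier space in a cluster; cache devices add no usable capacity."""
    return hosts * disk_groups_per_host * capacity_drives_per_group * drive_tb


if __name__ == "__main__":
    # Hypothetical 8-host cluster, 2 disk groups per host, 3.84 TB capacity drives.
    day_one = raw_cluster_tb(8, 2, 3, 3.84)   # 3 of 7 capacity bays per group filled
    later = raw_cluster_tb(8, 2, 7, 3.84)     # remaining bays populated 12-18 months on
    print(round(day_one, 2), round(later, 2))  # 184.32 430.08
```

Filling the remaining bays more than doubles raw capacity without adding hosts. The drives bought later may also be larger or cheaper, which is the market-conditions point on the slide.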
VMware Compatibility Guide (VCG) Rules for vSAN
BIOS
• Match ESXi version to BIOS version; equal or newer versions can be used
NIC
• Match ESXi version to NIC firmware/device driver version; equal or newer versions can be used
• Alternative: see KB 2030818
Storage Controller
• Match ESXi version to storage controller firmware version and device driver version. Firmware and driver versions must align in the same row and must exactly match what was tested. No newer versions until certified.
Disks
• If disk firmware is listed, match ESXi version to disk firmware. Must be equal or newer.
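These rules differ in one important way. BIOS, NIC, and disk versions may be equal or newer than the listed version, while the storage controller must match a certified firmware/driver pair exactly.

The sketch below encodes that distinction. The `VcgEntry` structure, version strings, and example values are hypothetical. It also assumes purely numeric dotted versions. A real check would read the certified versions from the VMware Compatibility Guide for the specific ESXi release and device.

```python
from dataclasses import dataclass, field
from typing import Optional


def parse_version(text: str) -> tuple[int, ...]:
    """Turn a purely numeric dotted version such as '2.7.3' into (2, 7, 3)."""
    return tuple(int(part) for part in text.split("."))


@dataclass
class VcgEntry:
    """Certified versions for one device and ESXi release (hypothetical structure)."""
    bios: str
    nic_driver: str
    controller_rows: list[tuple[str, str]] = field(default_factory=list)  # (firmware, driver)
    disk_firmware: Optional[str] = None  # None when the VCG lists no disk firmware


def vcg_violations(entry: VcgEntry, bios: str, nic_driver: str, controller_fw: str,
                   controller_driver: str, disk_firmware: Optional[str]) -> list[str]:
    problems = []
    # BIOS and NIC: the listed version or anything newer is acceptable.
    if parse_version(bios) < parse_version(entry.bios):
        problems.append("BIOS older than listed")
    if parse_version(nic_driver) < parse_version(entry.nic_driver):
        problems.append("NIC driver older than listed")
    # Storage controller: firmware AND driver must match one certified row exactly.
    if (controller_fw, controller_driver) not in entry.controller_rows:
        problems.append("controller firmware/driver pair not certified")
    # Disks: checked only when the VCG lists a firmware version.
    if entry.disk_firmware is not None and (
            disk_firmware is None
            or parse_version(disk_firmware) < parse_version(entry.disk_firmware)):
        problems.append("disk firmware older than listed")
    return problems


if __name__ == "__main__":
    entry = VcgEntry(bios="2.4.0", nic_driver="4.1.9",
                     controller_rows=[("24.21.0", "7.708.0")], disk_firmware="1.2")
    # A newer controller firmware is still a violation until it is certified.
    print(vcg_violations(entry, "2.5.1", "4.1.9", "24.22.0", "7.708.0", "1.2"))
    # ['controller firmware/driver pair not certified']
```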
Fix Knowledge Gaps
Train your staff
• vSAN Badges
• Hands On Lab (HOL)
• StorageHub
• Videos