VMworld 2015: Conducting a Successful Virtual SAN Proof of Concept
TRANSCRIPT
Conducting a Successful Virtual SAN Proof of Concept
Cormac Hogan, VMware, Inc
Julienne Pham, VMware, Inc
STO4572
#STO4572
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
CONFIDENTIAL
Agenda
1 Introduction to STO4572 Session
2 Introduction to Virtual SAN
3 Initial considerations for a proof of concept on VSAN
4 Tools available to conduct a successful proof of concept
5 POC validation scenarios
6 Measuring Performance
7 Moving from POC to Production
This Session…
• Virtual SAN has been available for 18 months
• VMware recognizes that conducting a Virtual SAN proof of concept can be challenging
• Since the launch of Virtual SAN, additional tools for managing, monitoring and troubleshooting Virtual SAN have become available
• This session discusses the tools available to vSphere and Virtual SAN administrators, and how they can help deliver a Virtual SAN proof of concept
• The session will also cover considerations for moving Virtual SAN from POC to production
Unprecedented Customer Momentum
2000+ Customers in the first 15 months
“In my experience VMware solutions are rock solid…we’re ready to nearly double our VSAN deployment.”
“It really did work as advertised…the fact that I have been able to set it and forget it is huge!”
Introduction to VMware Virtual SAN
• Storage scale out architecture built into the hypervisor
• Aggregates locally attached storage from each ESXi host in a cluster
• Dynamic capacity and performance scalability
• Flash optimized storage solution
• Fully integrated with vSphere and interoperable:
– vMotion, DRS, HA, VDP, VR …
• VM-centric data operations
[Diagram: Virtual SAN Datastore]
Proof of Concept Considerations
Before you start …
Before Considering a Virtual SAN PoC
Accelerate Use Case
Planning Outcome
Organization Challenges
Culture Barrier
• Fear of what you do not know, and the lack of control and visibility
Storage team operations
• New methodology
• New way to see things and operate
• Converged compute and storage
Support
• Single Point of Contact
• No vendor finger pointing
Technical Requirements
Hardware
• EVO:RAIL, VSAN Ready Node or do-it-yourself
• Uniform configuration
Networking
• Shared network vs. dedicated
• Distributed Switch vs. Standard
• Multicast
Storage
• Controller choices
• RAID0 vs. pass-through
• SSD/HDD ratio choices
• Performance vs. endurance
• SAS expanders
What I Need to Be Successful
Tools to conduct a successful Virtual SAN POC
Success Tool #1: Health Plugin
• Introduced with Virtual SAN 6.0
• Incorporated in the vSphere Web Client
• The Virtual SAN Health Check tool includes:
– General Health
– Proactive tests
– Virtual SAN HCL health
– Physical disk health
• Especially useful for observing injected errors and verifying that they have been remediated
Success Tool #1: Health Plugin
• Proactive tools that run on the Virtual SAN cluster, and pre-production tests:
– VM Creation test
– Storage Load test
– Multicast Performance test
Success Tool #2: RVC/Virtual SAN Observer
• Native tools installed on VCSA and on vCenter Server for Windows
• Used for configuration and status of the Virtual SAN cluster
• For performance and activity monitoring on demand:
– VM level
– Host level
– VMDK level
– HDD/SSD level
• Any anomaly shows up with the metric in question displayed in red
Success Tool #2: RVC/Virtual SAN Observer
Virtual SAN Operation
vsan.apply_license_to_cluster
vsan.enable_vsan_on_cluster
vsan.disable_vsan_on_cluster
vsan.clear_disks_cache
vsan.cluster_change_autoclaim
vsan.cluster_set_default_policy
vsan.enter_maintenance_mode
vsan.fix_renamed_vms
vsan.object_reconfigure
vsan.host_wipe_vsan_disks
vsan.recover_spbm
vsan.reapply_vsan_vmknic_config
Virtual SAN Information and Virtual SAN Monitoring
Cluster
vsan.check_limits
vsan.check_state
vsan.cluster_info
vsan.cmmds_find
vsan.whatif_host_failures
vsan.resync_dashboard
Disk
vsan.disk_object_info
vsan.disks_info
vsan.disks_stats
Host
vsan.host_info
vsan.host_consume_disks
Networking
vsan.lldpnetmap
VM
vsan.vm_object_info
vsan.vm_perf_stats
vsan.vmdk_stats
vsan.obj_status_report
vsan.object_info
Troubleshooting
vsan.support_information
vsan.observer
Success Tool #3: Virtual SAN Pack for vROps
• Integrates with the comprehensive vSphere monitoring software vRealize Operations 6.0.1
• Available with the Advanced or Enterprise edition
• Collects SSD/HDD disk performance across the cluster
• Collects SMART information
• Monitors information across multiple levels:
– disk group
– host
– cluster
– datacenter
Custom Dashboards
In the VSAN cluster
• Disk Group Throughput
• SSD/MDs Information
• Capacity Usage by hosts
Success Tool #4: Log Insight
• Built in with VMware vSphere
• Troubleshooting tool
• Log analytics tool
• Any Virtual SAN failure can be correlated between hosts and disk groups
• Tracks Virtual SAN operations
Storage – VSAN view
Storage – VSAN Interactive Analytic view
Validation Scenarios
Expected outcomes from various activities
PoC Validation
• What are the most important validation tests?
1. Successful VSAN configuration
2. Successful VM deployments on VSAN datastore
3. VM Availability in the event of failures (host, storage device, network)
4. VSAN serviceability
5. VM Performance meets expectations
Case #1 – Successfully Deploy VSAN
• Ensure correct vSphere versions
• Ensure appropriate licenses are available (if the PoC is going to take a long time)
• Ensure the network is in place. Remember the multicast requirement, so prep the network team.
• Minimum of three servers.
• Minimum of three servers contributing storage:
– At least one storage controller – check the HCL, verify drivers and firmware are valid
– At least one flash device (SSD, PCIe) for cache – make sure these are on the HCL
– At least one magnetic disk or flash device for capacity – check the HCL
– Or consider VSAN Ready Nodes as an option …
Remember, the VSAN Health Check will do most of this work for you
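The minimums on this slide can be written down as a small pre-flight check before the PoC starts. This is a minimal sketch, not a VMware tool: the `Host` record and its field names are hypothetical, and the check does not consult the HCL, which the Health Check plugin covers.

```python
from dataclasses import dataclass

MIN_HOSTS = 3  # minimum number of servers contributing storage, per the slide


@dataclass
class Host:
    """Hypothetical inventory record for one ESXi host (not a VMware API type)."""
    name: str
    controllers: int = 0       # storage controllers
    cache_devices: int = 0     # flash devices (SSD, PCIe) used for cache
    capacity_devices: int = 0  # magnetic disks or flash devices used for capacity


def contributes_storage(host: Host) -> bool:
    """A host contributes storage if it has at least one of each component."""
    return (host.controllers >= 1
            and host.cache_devices >= 1
            and host.capacity_devices >= 1)


def check_prerequisites(hosts: list[Host]) -> list[str]:
    """Return human-readable problems; an empty list means the minimums are met."""
    problems = []
    if len(hosts) < MIN_HOSTS:
        problems.append(f"need at least {MIN_HOSTS} hosts, found {len(hosts)}")
    contributing = [h for h in hosts if contributes_storage(h)]
    if len(contributing) < MIN_HOSTS:
        problems.append(
            f"need at least {MIN_HOSTS} hosts contributing storage, "
            f"found {len(contributing)}"
        )
    return problems
```

An empty result only says the counts are right; driver, firmware and HCL validation still need the Health Check.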
Case #1 – Successfully Deploy VSAN
Check the Virtual SAN Health Check plugin regularly
Run this after every test! Also use it to make sure you fixed the problem you previously introduced!
Case #2: Successful VM Deployment
Use the Health Check to do an initial VM deployment check
Part of the Proactive Tests. This will verify if VMs can be created on the VSAN cluster
Case #2: Successful VM Deployment
I created a new VM, but I am not sure where the VM is stored
Component host location
Case #3: VM Availability in the Event of Failures
• There are various failures that may be introduced as part of a typical POC:
– Host failure
– Flash device / Magnetic Disk failure – Cache/Capacity failures
– Network failure
• The primary objective is to ensure that the VM continues to be available in the event of a failure. This might mean the VM is restarted on another node in the cluster.
• vSphere HA also has a role to play here. It is integrated with Virtual SAN.
Case #3.1: Host Failures
• How many hosts do I really need?
• A minimum of 3 hosts is needed to support VSAN
• What about rebuilding after a failure or maintenance mode operations?
• If you want virtual machines to remain highly available on VSAN during these scenarios, consider configuring additional capacity, i.e. a minimum of 4 nodes
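The host-count reasoning can be sketched as a one-line calculation. It assumes the usual mirroring rule that tolerating n failures needs 2n + 1 hosts; that rule is not stated on the slide, and the function and parameter names are hypothetical.

```python
VSAN_MIN_HOSTS = 3  # minimum cluster size stated on the slide


def min_hosts(failures_to_tolerate: int = 1, spare_hosts: int = 0) -> int:
    """Hosts needed to tolerate the given failures, plus optional spare hosts.

    Assumption (not from the slide): mirroring needs 2n + 1 hosts to
    tolerate n failures. spare_hosts is headroom for rebuilds and
    maintenance mode.
    """
    if failures_to_tolerate < 0 or spare_hosts < 0:
        raise ValueError("arguments must be non-negative")
    return max(VSAN_MIN_HOSTS, 2 * failures_to_tolerate + 1) + spare_hosts


print(min_hosts())               # 3: the bare minimum
print(min_hosts(spare_hosts=1))  # 4: room to rebuild after a failure
```

With one failure to tolerate and one spare host, the result matches the 4-node recommendation above.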
Case #3.2: Storage Failures
• The Virtual SAN 6.0 Proof Of Concept Guide has details on how to inject temporary disk errors for the purpose of testing
– A real disk failure results in immediate rebuild activity initiated by VSAN
Eject/Offline/Unplug: Absent (VSAN waits 60 minutes before remediation)
Failure: Degraded (immediate remediation)
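When planning a storage failure test, it helps to know when rebuild activity should begin so the observation window is long enough. A minimal sketch using the two delays from the slide; the lowercase state keys and the function name are hypothetical, not VSAN identifiers.

```python
from datetime import datetime, timedelta

# Delay before VSAN starts remediation, per component state (values from the slide).
REBUILD_DELAY = {
    "absent": timedelta(minutes=60),   # eject / offline / unplug
    "degraded": timedelta(minutes=0),  # real device failure
}


def rebuild_starts_at(state: str, failed_at: datetime) -> datetime:
    """Return the earliest time rebuild activity is expected for a component."""
    try:
        return failed_at + REBUILD_DELAY[state.lower()]
    except KeyError:
        raise ValueError(f"unknown component state: {state!r}") from None


pulled = datetime(2015, 9, 1, 12, 0)
print(rebuild_starts_at("absent", pulled))    # one hour after the disk was pulled
print(rebuild_starts_at("degraded", pulled))  # immediately
```

A test that unplugs a disk and checks for resync after ten minutes will therefore see nothing, which is expected behavior rather than a fault.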
Case #3.3: Network Failure
Part of the Proactive Tests. This will verify if multicast performance is acceptable for the VSAN cluster
Multicast configuration is the most common issue
Case #3.4: Validating Rebuild Activity after Failure
• Virtual SAN might need to move data around in the background: policy change, host failure, long term/permanent component loss, user triggered reconfig, maintenance mode, etc.
• The UI Resync Dashboard shows the VMs that are resyncing and the remaining bytes to sync
Remember! Test one thing at a time!
Case #4: VSAN Serviceability
I want to update one of my ESXi hosts in a VSAN cluster. What do I do?
VSAN provides multiple options for maintenance mode
Case #4: VSAN Serviceability
Ensure Availability | Full Data Migration | No Data Migration
Loss of VM compliance | Full VM data compliance | No VM availability ensured
Short time maintenance | More than one hour of maintenance | Short time maintenance
Short storage preparation | Long storage preparation | No impact
Limited free storage space required | Free storage space requirements on the other nodes | No impact
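The choice between the three maintenance mode options can be sketched as a simple decision helper. This is one reading of the slide, not VMware guidance: the one-hour threshold comes from the "More than one hour" entry, and the function and parameter names are hypothetical.

```python
def suggest_maintenance_mode(maintenance_minutes: float,
                             keep_vms_available: bool = True) -> str:
    """Suggest a maintenance mode option based on the expected outage length."""
    if maintenance_minutes < 0:
        raise ValueError("maintenance_minutes must be non-negative")
    if maintenance_minutes > 60:
        # Long maintenance: move all data so VMs stay fully compliant.
        return "Full Data Migration"
    if keep_vms_available:
        # Short maintenance: VMs stay available but lose policy compliance.
        return "Ensure Availability"
    # Short maintenance where VM availability is not required.
    return "No Data Migration"


print(suggest_maintenance_mode(30))         # Ensure Availability
print(suggest_maintenance_mode(180))        # Full Data Migration
print(suggest_maintenance_mode(15, False))  # No Data Migration
```

Full Data Migration also needs enough free space on the remaining nodes, so check capacity before choosing it.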
Case #4: Management – Disks Serviceability
The disk serviceability feature enables identification of magnetic disks and flash-based devices that need to be replaced
Case #4: Management – Disk/Disk Group Evacuation
• Allows you to evacuate data from disk groups and individual disks before removing a disk/disk group from a Virtual SAN host
• Allows Virtual SAN to ensure all workloads stay fully compliant with their policy!
• Supported in the UI, ESXCLI and RVC
• Check box in the “Remove disk/disk group” UI screen
How to Measure Virtual SAN Performance?
How to Test Performance…
• The distributed architecture of VMware Virtual SAN dictates that reasonable performance is achieved when the pooled compute and storage resources in the cluster are well utilized
• This usually means a number of VMs each running the specified workload should be distributed in the cluster and run in a consistent manner to deliver aggregated performance
• This part of an evaluation can be complex and time-consuming
• Real application workloads are best, but …
– synthetic workloads (IOmeter) might be easier to set up
– simplistic workloads don’t really reflect what Virtual SAN can do
• Worth a read: Pro Tips For Storage Performance Testing
– http://blogs.vmware.com/storage/2015/08/12/tips-storage-performance-testing/
Performance Testing Considerations
Is the test utilizing the distributed storage resources of Virtual SAN?
• Multiple VMs across multiple hosts will deliver better performance than a single VM on one host
Is the working set fully in cache, utilizing flash performance?
• Read-cache misses will incur latency
Is the workload cache friendly?
• Sustained sequential write workloads fill cache, which must then be destaged. Mixed R/W workloads are best
Is the cache warmed?
• Results from the start of a test will not reflect overall performance
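One way to act on the cache warm-up point is to drop the early samples before averaging. A minimal sketch; how many samples count as warm-up is an assumption the tester has to choose, and the sample values in the usage line are made up for illustration.

```python
from statistics import mean


def steady_state_mean(samples: list[float], warmup_samples: int) -> float:
    """Average a metric after discarding the first warmup_samples readings."""
    if warmup_samples < 0:
        raise ValueError("warmup_samples must be non-negative")
    steady = samples[warmup_samples:]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return mean(steady)


# Hypothetical per-minute IOPS readings; the first two are taken on a cold cache.
iops = [100, 400, 1000, 1000, 1000]
print(steady_state_mean(iops, warmup_samples=2))  # 1000
```

Averaging all five readings instead would give 700, understating what the warmed cluster delivers.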
Performance Considerations
• Application
– Single vs. multiple workers
– Working set size – is it all in cache?
– Sequential workloads versus random workloads – cache friendly workload?
– Outstanding I/Os – do you have a decent queue depth on the storage controllers?
– Block size – if synthetic, does it represent the typical application block size?
– Guest file system considerations – raw or not?
• VSAN
– Cache warm up considerations
– Number of magnetic disk drives/striping considerations
– Performance during failures and rebuild activity
Performance Test with IOmeter
• Do NOT forget to warm the SSD before your performance test
• First test:
– Single worker
– < 8 Outstanding I/O
– Write I/O Data Pattern will use repeating bytes
– 4KB I/O size
– 70% Read/30% Write
– 100% Random
• Consider moving, over time, to:
– multiple workers
– multiple VMs
– multiple hosts
– Increasing OIO – latency versus IOPS
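The latency versus IOPS trade-off when increasing outstanding I/O (OIO) follows Little's Law: average latency equals OIO divided by IOPS. A minimal sketch; the (OIO, IOPS) pairs in the usage loop are made-up illustrative numbers, not Virtual SAN results.

```python
def avg_latency_ms(outstanding_io: int, iops: float) -> float:
    """Little's Law: average latency (ms) = outstanding I/Os * 1000 / IOPS."""
    if outstanding_io <= 0 or iops <= 0:
        raise ValueError("outstanding_io and iops must be positive")
    return outstanding_io * 1000.0 / iops


# Hypothetical (OIO, measured IOPS) pairs from successive runs; illustrative only.
runs = [(1, 1000.0), (8, 4000.0), (32, 8000.0)]
for oio, iops in runs:
    print(f"OIO={oio:>2}  IOPS={iops:>7.0f}  latency={avg_latency_ms(oio, iops):.2f} ms")
```

Once IOPS stops rising while OIO keeps increasing, the extra queue depth only adds latency, which is the point at which to stop raising OIO.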
Virtual SAN Health Check Plugin – Proactive Storage Tests
• Run this performance test in a non-production environment
• It will create ~10-20 VMDKs per host, which will be distributed by VSAN onto physical disks, and then issue a synthetic IO workload on all VMDKs on all hosts in parallel
• A way to validate IOPS and bandwidth requirements
From PoC to Production
Day 2 Operation Considerations
Considerations
HA/DR
• Stretched Cluster
• Use of VR/SRM
Monitoring
• Set up alarms
• Use vROps
• vSAN Health Plugin
Operations
• Maintenance Mode
• Workflow
• Third Party tools
• SSD/HD rebuild
Design for Scaling
• Script install
• Capacity planning
Case #4 : Other ways of monitoring VSAN Activity
• VSAN Health Check Plugin
– Rerun tests and check if any of the many checks have failed
– Any check that fails will also generate an alarm (new in version 6.1)
– Link to VMware KB via AskVMware to assist with troubleshooting
• vRealize Operations Management with storage pack for VSAN
– Ships with a number of preconfigured dashboards
– Surfaces various events and warnings that are specific to VSAN
– Provides troubleshooting guidance
• vRealize Log Insight
– Examines logs from VSAN events as well as VSAN traces
Case #4 : Monitoring VSAN Activity
Number of Virtual SAN Clusters
Virtual Machine Object
Top Virtual SAN issues
Virtual SAN Alerts
VM Information through vROps
Case #4 : Monitoring VSAN Activity
Magnetic disks used by this Virtual SAN Cluster
Storage Performance
Disk latencies through vROps
Case #4 : Observing VSAN Activity
Host disconnected from the network
Impact of failure on VSAN, along with recommendations on what to do next