Yiting Jin, Product Management, VMware
Joe Bruneau, Systems Administrator, General Mills
Sebastian Neagu, Principal Engineer, United Airlines
Rick Stopf, Product Marketing Manager, Honeywell
SER3107PU
#VMworld #SER3107PU
Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#SER3107PU CONFIDENTIAL
1,000 Host failures per year
Target Market of vSphere Fault Tolerance

What happens when each type of workload starts going down?

• "Cattle" – no big deal, create a new one. e.g. test VMs.
• "Pattle" – bring them back up, but an HA restart is enough; >0 RPO / RTO is okay. "For everything else, there's HA." e.g. standard production VMs.
• "Pets" – disastrously expensive if any data loss or downtime: SAVE AT ALL COST; 0-RPO, 0-RTO. "Workloads where I can't afford to lose any state or experience downtime." e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation.

Fault Tolerance… who cares?
What's a vSphere Admin to do?

Disastrously expensive if any data loss or downtime. SAVE AT ALL COST ("Pets"). e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation.

1. Spend in-house resources building application protection – for each type of mission-critical workload you have
2. Pay extra $$$ for third-party solutions and support, spend time training teams on the technology, add complexity to availability management
3. "… nah, they'll never go down."
4. Enable vSphere Fault Tolerance – and not pay anything extra
New in vSphere 6.5
• Performance improvements on maximum and average response time
– Reduced maximum latency from 100ms to 12ms, average of 1ms
• Multiple NIC aggregation for improved performance
– e.g. rather than dedicating a single 10 Gb NIC, aggregate multiple 10 Gb or faster NICs for the FT network
• Interoperate with Distributed Resource Scheduler (DRS)
– DRS takes into consideration FT requirements in determining optimal initial host placement
Using Fault Tolerance with VSAN (vSphere 6.0u1 and later)
• Fault-tolerant VSAN datastore in cluster
• Restart VMs from other hosts in a VSAN cluster
• Preserve storage policies across FT failovers
• Secondary FT VM can be placed on the same VSAN datastore as the primary
• FT primary VM and secondary VM are independent from any replicated VMs for VSAN
• FT and VSAN for Remote and Branch Offices (ROBO)
Introduction of Panelists
▪ Joe Bruneau, Systems Administrator, Enterprise Infrastructure, General Mills
▪ Sebastian Neagu, Principal Engineer, United Airlines
▪ Rick Stopf, Product Marketing Manager, Honeywell
History with VMware Products and Solutions
• Global footprint
• Number of datacenters, vCenters, hosts globally
What Does One Minute of Downtime Mean to You?
• Elaborate on some past experiences when hardware failure was costly
• Host failures vs. Storage failures
Describe Your Offerings and Future FT Enablement
• What kinds of applications and workloads do you protect today with Fault Tolerance?
• What are you looking to protect in the future?
Alternate Solutions for Protecting Workloads
• Can you talk about alternative solutions and how your experience there was compared with FT?
• Ease of setup
• Zero downtime / zero data loss
• Ability to integrate with vSphere features such as vMotion, snapshots, backups
• Do you think differently about hardware failure?
• How was performance? Is the tradeoff between performance and zero-data loss, zero-downtime protection worth it?
How Easy was it to Set Up Fault Tolerance?
• Setup through vSphere client
• Networking requirements: FT logging bandwidth
• Storage: redundant VMDKs
• Capacity planning and memory reservation
[Diagram: FT logging channel between Primary and Secondary, with VMDKs on Datastore A and Datastore B]
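One point behind the capacity-planning bullet above: an FT-protected VM gets a full memory reservation, and the secondary consumes the same CPU and memory again on a second host, so an FT pair roughly doubles the protected VM's footprint. A minimal sketch of that arithmetic (the function and its inputs are illustrative, not a vSphere API):

```python
# Illustrative FT capacity-planning sketch: each FT pair consumes
# roughly 2x the protected VM's vCPU and memory across the cluster,
# because the secondary mirrors the primary's full reservation.

def ft_cluster_footprint(ft_vms):
    """ft_vms: list of (vcpus, mem_gb) per FT-protected VM.
    Returns (total_vcpus, total_mem_gb) for primary + secondary copies."""
    total_vcpus = sum(2 * v for v, _ in ft_vms)
    total_mem = sum(2 * m for _, m in ft_vms)
    return total_vcpus, total_mem

# Two FT VMs: 4 vCPU / 32 GB and 2 vCPU / 16 GB
print(ft_cluster_footprint([(4, 32), (2, 16)]))  # (12, 96)
```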
Supported Scalability and Hardware Requirements
• 4 vCPU / 64 GB vRAM per FT VM
• 8 vCPU / 128 GB vRAM of FT VMs per host
• 4 total FT VMs per host
• 16 virtual disks
• Virtual disk size: 2 TB
• 10 Gb link for FT logging network + multi-NIC aggregation (dedicated 10 Gb not required, but recommended)
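The per-VM and per-host limits above can be expressed as a simple placement check. The numbers come straight from the slide; the function itself is an illustrative sketch, not a vSphere API:

```python
# vSphere 6.5 FT limits from the slide above; the checker is illustrative.
MAX_VCPU_PER_FT_VM = 4
MAX_VRAM_GB_PER_FT_VM = 64
MAX_FT_VCPU_PER_HOST = 8
MAX_FT_VRAM_GB_PER_HOST = 128
MAX_FT_VMS_PER_HOST = 4

def host_within_ft_limits(ft_vms_on_host):
    """ft_vms_on_host: list of (vcpus, vram_gb) for FT VMs on one host."""
    if len(ft_vms_on_host) > MAX_FT_VMS_PER_HOST:
        return False
    if any(v > MAX_VCPU_PER_FT_VM or m > MAX_VRAM_GB_PER_FT_VM
           for v, m in ft_vms_on_host):
        return False
    return (sum(v for v, _ in ft_vms_on_host) <= MAX_FT_VCPU_PER_HOST
            and sum(m for _, m in ft_vms_on_host) <= MAX_FT_VRAM_GB_PER_HOST)

print(host_within_ft_limits([(4, 64), (4, 32)]))          # True: 8 FT vCPUs, 2 VMs
print(host_within_ft_limits([(4, 64), (4, 64), (1, 8)]))  # False: 9 FT vCPUs
```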
Technology Preview
• Increasing to 8 vCPU / 128 GB vRAM per FT-protected VM
– Same host scalability: 8 vCPU of FT VMs per host, 4 total FT VMs per host
• Storage Failure Protection for Fault Tolerance
– Integration with VM Component Protection (VMCP)
– Storage APD / PDL failures will trigger FT failover instead of restarting VM. No data loss.
• End of support for Legacy Record & Replay (1-vCPU) Fault Tolerance
• Fault Tolerance with Site Recovery Manager
• Longer term: Stretched Cluster FT
– Collaboration with Distributed Resource Scheduler (DRS) team
Summary
• Fault Tolerance provides zero data loss, zero downtime protection against host failures
• No extra licensing cost
• No need to change your applications
• Simple to manage with software
• FT integration with VSAN – no extra shared storage setup needed
• Technology preview provides storage protection with Fault Tolerance, improved scalability (to 8 vCPU per FT VM)
Q & A
Related Sessions

Session | Day / Time | Session Type
ELW181107U – vSphere HTML Client SDK - Build a Plugin Workshop | Sunday, 1:30 pm – 3:00 pm | Hands-on Labs
SER3101PU – Acting as One: Plug in to vSphere | Monday, 2:30 pm – 3:30 pm | Panel Discussion
SER3100GU – Discuss Plug-In Experience with the vSphere Client | Tuesday, 11:30 am – 12:30 pm | Group Discussion
SER1411BU – vSphere Clients Roadmap: HTML5 Client, Host Client, and Web Client | Tuesday, 1:00 pm – 2:00 pm | Breakout
SER3084BU – Mind Your Foundation: Extending the Power of the vSphere Platform | Tuesday, 5:30 pm – 6:30 pm | Breakout
SER3107PU – Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users | Wednesday, 8:30 am – 9:30 am | Panel Discussion
SER1792GU – Discussion of vSphere Web Client (HTML5) and the Transition Experience | Wednesday, 11:30 am – 12:30 pm | Group Discussion
SER2790BU – Journey to a vSphere HTML Client Ecosystem: Deep Dive with Big Switch Networks | Wednesday, 3:30 pm – 4:30 pm | Breakout
Follow us on Twitter: @VMwarevSphere, @YitingJin
Appendix
Improved Fault Tolerance workflow
Simplifying protection for your VMs
1. Right-click on the VM to turn on Fault Tolerance
2. Select a datastore for the VM configuration files
3. Select another host in the HA cluster to place the secondary VM
Fault Tolerance: Introduction
▪ Continuous availability for all FT-protected workloads
▪ Protect mission-critical applications from vSphere host failure
▪ RPO = 0, RTO = 0, no loss of TCP connections
▪ Any OS, any application
▪ Supports workloads on vSphere Standard and above: 4 vCPU, 64 GB vRAM per VM; 8 vCPU of FT VMs per host, with 4 FT VMs (total primary + secondary) per host
▪ Simple configuration: point and click to select a VM to enable FT protection
[Diagram: FT logging channel between Primary and Secondary]
Redundant Storage
▪ Separate VMX and VMDK files, changes to which are constantly mirrored to the secondary
▪ FT creates a second copy of the VMDKs
• Can be located on separate datastores for further fault-domain isolation
[Diagram: FT logging channel between Primary (Datastore A) and Secondary (Datastore B)]
Failover
▪ Failure occurs: the secondary VM becomes the new primary
▪ HA starts a new secondary VM on a new host
▪ HA initiates an FT migration on the primary VM to set up FT protection again
[Diagram: after failover, the former secondary is the new primary, with a new FT logging channel to the new secondary]
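The failover sequence above can be sketched as a small state transition: on primary-host failure the secondary is promoted, then HA places a fresh secondary on a surviving host to re-arm FT. Host names and the data shape here are made up for illustration:

```python
# Toy model of FT failover: promote the secondary, then pick a
# surviving host for the replacement secondary (as HA would).

def ft_failover(pair, failed_host, surviving_hosts):
    """pair: {'primary_host': ..., 'secondary_host': ...}.
    Only primary-host failure is modeled in this sketch."""
    if pair["primary_host"] != failed_host:
        return pair
    new_primary = pair["secondary_host"]
    # HA starts the new secondary on a host other than the new primary's
    new_secondary = next(h for h in surviving_hosts
                         if h != new_primary and h != failed_host)
    return {"primary_host": new_primary, "secondary_host": new_secondary}

pair = {"primary_host": "esx01", "secondary_host": "esx02"}
print(ft_failover(pair, "esx01", ["esx02", "esx03"]))
# {'primary_host': 'esx02', 'secondary_host': 'esx03'}
```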
Why Fault Tolerance adds Network Latency
(Why 0-downtime and 0-data loss isn't free)
▪ To get to zero downtime and zero data loss, any data generated by the primary is not transmitted to the outside world until that data has been replicated completely to the secondary
▪ Outgoing network packets are batched, agreement between primary and secondary is achieved, and packets are released en masse at every checkpoint
▪ This adds a varying degree of latency and jitter to every network packet
[Diagram: FT pair releasing batched packets to the network]
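The checkpoint batching above can be modeled in a few lines: a packet generated at time t is held until the next checkpoint boundary, so its added latency depends on where t falls in the interval, which is exactly the jitter the slide mentions. The 10 ms interval is a hypothetical value for illustration, not a vSphere parameter:

```python
# Toy model of checkpoint batching: outgoing packets are held until
# the next checkpoint, so added latency varies from ~0 up to one
# full checkpoint interval (hence jitter).
import math

def release_time(t_ms, checkpoint_interval_ms):
    """Packets are released en masse at the next checkpoint after t."""
    return math.ceil(t_ms / checkpoint_interval_ms) * checkpoint_interval_ms

interval = 10  # hypothetical checkpoint every 10 ms
for t in (1.0, 9.5, 10.0, 14.0):
    print(t, "->", release_time(t, interval),
          "added:", release_time(t, interval) - t)
```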
Best Practices and Hardware Requirements
▪ Requires Intel Sandy Bridge / AMD Bulldozer or later
▪ Improved performance on newer processor generations
▪ Recommend 10Gb NIC for a separate FT logging network
Configuration requirements
▪ VMs to be protected by FT must be in an HA cluster
▪ Shared storage for configuration file and tiebreaker (witness / arbiter) files so that the primary and secondary VMs can see the files.
▪ 2 separate VMDKs for redundancy: 1 for primary VM, 1 for secondary VM
▪ VMDKs can be local, but VMDKs on shared storage provide the advantage of multiple hosts being able to restart secondary VMs.
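The configuration requirements above amount to a pre-flight checklist. A minimal sketch of that checklist follows; the field names are invented for this example, not vSphere API properties:

```python
# Illustrative pre-flight checklist derived from the FT configuration
# requirements above; field names are hypothetical.

def ft_config_issues(vm):
    """vm: dict describing the candidate VM. Returns a list of problems."""
    issues = []
    if not vm.get("in_ha_cluster"):
        issues.append("VM must be in an HA cluster")
    if not vm.get("shared_storage_for_config"):
        issues.append("config and tiebreaker files need shared storage")
    if vm.get("vmdk_copies", 0) < 2:
        issues.append("need separate VMDKs for primary and secondary")
    return issues

print(ft_config_issues({"in_ha_cluster": True,
                        "shared_storage_for_config": True,
                        "vmdk_copies": 2}))  # []
```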
More info on FT in vSphere 6.0
▪ Best practices for deploying SMP-FT in vSphere 6: http://www.vmware.com/techpapers/2015/performance-best-practices-for-vmware-vsphere-60-10480.html
▪ vSphere 6 FT Performance Paper: http://blogs.vmware.com/performance/2016/01/vsphere6-fault-tolerance-perf.html