Yiting Jin, Product Management, VMware
Joe Bruneau, Systems Administrator, General Mills
Sebastian Neagu, Principal Engineer, United Airlines
Rick Stopf, Product Marketing Manager, Honeywell
SER3107PU
#VMworld #SER3107PU
Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users
VMworld 2017 Content: Not for publication or distribution
Disclaimer
• This presentation may contain product features that are currently under development.
• This overview of new technology represents no commitment from VMware to deliver these features in any generally available product.
• Features are subject to change, and must not be included in contracts, purchase orders, or sales agreements of any kind.
• Technical feasibility and market demand will affect final delivery.
• Pricing and packaging for any new technologies or features discussed or presented have not been determined.
#SER3107PU CONFIDENTIAL
1,000 Host failures per year
Target Market of vSphere Fault Tolerance

What happens when each type of workload starts going down?

• "Cattle" – no big deal, create a new one. e.g. test VMs.
• "Pattle" – bring them back up, but an HA restart is enough; >0 RPO / RTO is okay. "For everything else, there's HA." e.g. standard production VMs.
• "Pets" – disastrously expensive if any data loss or downtime: SAVE AT ALL COST; 0-RPO, 0-RTO. "Workloads where I can't afford to lose any state or experience downtime." e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation.

Fault Tolerance… who cares?
What's a vSphere Admin to do?

Disastrously expensive if any data loss or downtime. SAVE AT ALL COST ("Pets"). e.g. apps monitoring acid / chemical pools, apps tracking inventory and revenue generation.

1. Spend in-house resources building application protection – for each type of mission-critical workload you have
2. Pay extra $$$ for third-party solutions and support, spend time training teams on the technology, add complexity to availability management
3. "… nah, they'll never go down."
4. Enable vSphere Fault Tolerance – and not pay anything extra
New in vSphere 6.5
• Performance improvements on maximum and average response time
– Reduced maximum latency from 100ms to 12ms, average of 1ms
• Multiple NIC aggregation for improved performance
– e.g. rather than dedicating a single 10 Gb NIC, aggregate multiple 10 Gb or faster NICs for the FT network
• Interoperate with Distributed Resource Scheduler (DRS)
– DRS takes into consideration FT requirements in determining optimal initial host placement
Using Fault Tolerance with VSAN (vSphere 6.0u1 and later)
• Fault-tolerant VSAN datastore in cluster
• Restart VMs from other hosts in a VSAN cluster
• Preserve storage policies across FT failovers
• Secondary FT VM can be placed on the same VSAN datastore as the primary
• FT primary VM and secondary VM are independent from any replicated VMs for VSAN
• FT and VSAN for Remote and Branch Offices (ROBO)
Introduction of Panelists
▪ Joe Bruneau, Systems Administrator, Enterprise Infrastructure, General Mills
▪ Sebastian Neagu, Principal Engineer, United Airlines
▪ Rick Stopf, Product Marketing Manager, Honeywell
History with VMware Products and Solutions
• Global footprint
• Number of datacenters, vCenters, hosts globally
What Does One Minute of Downtime Mean to You?
• Elaborate on some past experiences when hardware failure was costly
• Host failures vs. Storage failures
Describe Your Offerings and Future FT Enablement
• What kinds of applications and workloads do you protect today with Fault Tolerance?
• What are you looking to protect in the future?
Alternate Solutions for Protecting Workloads
• Can you talk about alternative solutions and how your experience there was compared with FT?
• Ease of setup
• Zero downtime / zero data loss
• Ability to integrate with vSphere features such as vMotion, snapshots, backups
• Do you think differently about hardware failure?
• How was performance? Is the tradeoff between performance and zero-data loss, zero-downtime protection worth it?
How Easy was it to Set Up Fault Tolerance?
• Setup through vSphere client
• Networking requirements: FT logging bandwidth
• Storage: redundant VMDKs
• Capacity planning and memory reservation
[Diagram: FT logging channel between Primary and Secondary, with VMDKs on Datastore A and Datastore B]
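One point behind the capacity-planning bullet above: an FT-protected VM gets a full memory reservation, and the secondary consumes the same CPU and memory again on a second host, so an FT pair roughly doubles the protected VM's footprint. A minimal sketch of that arithmetic (the function and its inputs are illustrative, not a vSphere API):

```python
# Illustrative FT capacity-planning sketch: each FT pair consumes
# roughly 2x the protected VM's vCPU and memory across the cluster,
# because the secondary mirrors the primary's full reservation.

def ft_cluster_footprint(ft_vms):
    """ft_vms: list of (vcpus, mem_gb) per FT-protected VM.
    Returns (total_vcpus, total_mem_gb) for primary + secondary copies."""
    total_vcpus = sum(2 * v for v, _ in ft_vms)
    total_mem = sum(2 * m for _, m in ft_vms)
    return total_vcpus, total_mem

# Two FT VMs: 4 vCPU / 32 GB and 2 vCPU / 16 GB
print(ft_cluster_footprint([(4, 32), (2, 16)]))  # (12, 96)
```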
Supported Scalability and Hardware Requirements
• 4 vCPU / 64 GB vRAM per FT VM
• 8 vCPU / 128 GB vRAM of FT VMs per host
• 4 total FT VMs per host
• 16 virtual disks
• Virtual disk size: 2 TB
• 10 Gb link for FT logging network + multi-NIC aggregation (dedicated 10 Gb not required, but recommended)
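The per-VM and per-host limits above can be expressed as a simple placement check. The numbers come straight from the slide; the function itself is an illustrative sketch, not a vSphere API:

```python
# vSphere 6.5 FT limits from the slide above; the checker is illustrative.
MAX_VCPU_PER_FT_VM = 4
MAX_VRAM_GB_PER_FT_VM = 64
MAX_FT_VCPU_PER_HOST = 8
MAX_FT_VRAM_GB_PER_HOST = 128
MAX_FT_VMS_PER_HOST = 4

def host_within_ft_limits(ft_vms_on_host):
    """ft_vms_on_host: list of (vcpus, vram_gb) for FT VMs on one host."""
    if len(ft_vms_on_host) > MAX_FT_VMS_PER_HOST:
        return False
    if any(v > MAX_VCPU_PER_FT_VM or m > MAX_VRAM_GB_PER_FT_VM
           for v, m in ft_vms_on_host):
        return False
    return (sum(v for v, _ in ft_vms_on_host) <= MAX_FT_VCPU_PER_HOST
            and sum(m for _, m in ft_vms_on_host) <= MAX_FT_VRAM_GB_PER_HOST)

print(host_within_ft_limits([(4, 64), (4, 32)]))          # True: 8 FT vCPUs, 2 VMs
print(host_within_ft_limits([(4, 64), (4, 64), (1, 8)]))  # False: 9 FT vCPUs
```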
Technology Preview
• Increasing to 8 vCPU / 128 GB vRAM per FT-protected VM
– Same host scalability: 8 vCPU of FT VMs per host, 4 total FT VMs per host
• Storage Failure Protection for Fault Tolerance
– Integration with VM Component Protection (VMCP)
– Storage APD / PDL failures will trigger FT failover instead of restarting VM. No data loss.
• End of support for Legacy Record & Replay (1-vCPU) Fault Tolerance
• Fault Tolerance with Site Recovery Manager
• Longer term: Stretched Cluster FT
– Collaboration with Distributed Resource Scheduler (DRS) team
Summary
• Fault Tolerance provides zero data loss, zero downtime protection against host failures
• No extra licensing cost
• No need to change your applications
• Simple to manage with software
• FT integration with VSAN – no extra shared storage setup needed
• Technology preview provides storage protection with Fault Tolerance, improved scalability (to 8 vCPU per FT VM)
Q & A
Related Sessions

Session | Day / Time | Session Type
ELW181107U – vSphere HTML Client SDK - Build a Plugin Workshop | Sunday, 1:30 pm – 3:00 pm | Hands-on Labs
SER3101PU – Acting as One: Plug in to vSphere | Monday, 2:30 pm – 3:30 pm | Panel Discussion
SER3100GU – Discuss Plug-In Experience with the vSphere Client | Tuesday, 11:30 am – 12:30 pm | Group Discussion
SER1411BU – vSphere Clients Roadmap: HTML5 Client, Host Client, and Web Client | Tuesday, 1:00 pm – 2:00 pm | Breakout
SER3084BU – Mind Your Foundation: Extending the Power of the vSphere Platform | Tuesday, 5:30 pm – 6:30 pm | Breakout
SER3107PU – Running on Zero Downtime, Zero Data Loss: Real-Life Cases with vSphere Fault Tolerance Users | Wednesday, 8:30 am – 9:30 am | Panel Discussion
SER1792GU – Discussion of vSphere Web Client (HTML5) and the Transition Experience | Wednesday, 11:30 am – 12:30 pm | Group Discussion
SER2790BU – Journey to a vSphere HTML Client Ecosystem: Deep Dive with Big Switch Networks | Wednesday, 3:30 pm – 4:30 pm | Breakout
Follow us on Twitter: @VMwarevSphere, @YitingJin
Appendix
Improved Fault Tolerance workflow
Simplifying protection for your VMs
1. Right-click on the VM to turn on Fault Tolerance
2. Select a datastore for the VM configuration files
3. Select another host in the HA cluster to place the secondary VM
Fault Tolerance: Introduction
▪ Continuous availability for all FT-protected workloads
▪ Protect mission-critical applications from vSphere host failure
▪ RPO = 0, RTO = 0, no loss of TCP connections
▪ Any OS, any application
▪ Supports workloads on vSphere Standard and above: 4 vCPU, 64 GB vRAM per VM; 8 vCPU of FT VMs per host, with 4 FT VMs (total primary + secondary) per host
▪ Simple configuration: point and click to select a VM to enable FT protection
[Diagram: FT logging channel between Primary and Secondary]
Redundant Storage
▪ Separate VMX and VMDK files, changes to which are constantly mirrored to the secondary
▪ FT creates a second copy of the VMDKs
• Can be located on separate datastores for further fault-domain isolation
[Diagram: FT logging channel between Primary (Datastore A) and Secondary (Datastore B)]
Failover
▪ Failure occurs: the secondary VM becomes the new primary
▪ HA starts a new secondary VM on a new host
▪ HA initiates an FT migration on the primary VM to set up FT protection again
[Diagram: after failover, the former secondary is the new primary, with a new FT logging channel to the new secondary]
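The failover sequence above can be sketched as a small state transition: on primary-host failure the secondary is promoted, then HA places a fresh secondary on a surviving host to re-arm FT. Host names and the data shape here are made up for illustration:

```python
# Toy model of FT failover: promote the secondary, then pick a
# surviving host for the replacement secondary (as HA would).

def ft_failover(pair, failed_host, surviving_hosts):
    """pair: {'primary_host': ..., 'secondary_host': ...}.
    Only primary-host failure is modeled in this sketch."""
    if pair["primary_host"] != failed_host:
        return pair
    new_primary = pair["secondary_host"]
    # HA starts the new secondary on a host other than the new primary's
    new_secondary = next(h for h in surviving_hosts
                         if h != new_primary and h != failed_host)
    return {"primary_host": new_primary, "secondary_host": new_secondary}

pair = {"primary_host": "esx01", "secondary_host": "esx02"}
print(ft_failover(pair, "esx01", ["esx02", "esx03"]))
# {'primary_host': 'esx02', 'secondary_host': 'esx03'}
```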
Why Fault Tolerance adds Network Latency
(Why 0-downtime and 0-data loss isn't free)
▪ To get to zero downtime and zero data loss, any data generated by the primary is not transmitted to the outside world until that data has been replicated completely to the secondary
▪ Outgoing network packets are batched, agreement between primary and secondary is achieved, and packets are released en masse at every checkpoint
▪ This adds a varying degree of latency and jitter to every network packet
[Diagram: FT pair releasing batched packets to the network]
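The checkpoint batching above can be modeled in a few lines: a packet generated at time t is held until the next checkpoint boundary, so its added latency depends on where t falls in the interval, which is exactly the jitter the slide mentions. The 10 ms interval is a hypothetical value for illustration, not a vSphere parameter:

```python
# Toy model of checkpoint batching: outgoing packets are held until
# the next checkpoint, so added latency varies from ~0 up to one
# full checkpoint interval (hence jitter).
import math

def release_time(t_ms, checkpoint_interval_ms):
    """Packets are released en masse at the next checkpoint after t."""
    return math.ceil(t_ms / checkpoint_interval_ms) * checkpoint_interval_ms

interval = 10  # hypothetical checkpoint every 10 ms
for t in (1.0, 9.5, 10.0, 14.0):
    print(t, "->", release_time(t, interval),
          "added:", release_time(t, interval) - t)
```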
Best Practices and Hardware Requirements
▪ Requires Intel Sandy Bridge / AMD Bulldozer or later
▪ Improved performance on newer processor generations
▪ Recommend 10Gb NIC for a separate FT logging network
Configuration requirements
▪ VMs to be protected by FT must be in an HA cluster
▪ Shared storage for configuration file and tiebreaker (witness / arbiter) files so that the primary and secondary VMs can see the files.
▪ 2 separate VMDKs for redundancy: 1 for primary VM, 1 for secondary VM
▪ VMDKs can be local, but VMDKs on shared storage provide the advantage of multiple hosts being able to restart secondary VMs.
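The configuration requirements above amount to a pre-flight checklist. A minimal sketch of that checklist follows; the field names are invented for this example, not vSphere API properties:

```python
# Illustrative pre-flight checklist derived from the FT configuration
# requirements above; field names are hypothetical.

def ft_config_issues(vm):
    """vm: dict describing the candidate VM. Returns a list of problems."""
    issues = []
    if not vm.get("in_ha_cluster"):
        issues.append("VM must be in an HA cluster")
    if not vm.get("shared_storage_for_config"):
        issues.append("config and tiebreaker files need shared storage")
    if vm.get("vmdk_copies", 0) < 2:
        issues.append("need separate VMDKs for primary and secondary")
    return issues

print(ft_config_issues({"in_ha_cluster": True,
                        "shared_storage_for_config": True,
                        "vmdk_copies": 2}))  # []
```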
More info on FT in vSphere 6.0
▪ Best practices for deploying SMP-FT in vSphere 6: http://www.vmware.com/techpapers/2015/performance-best-practices-for-vmware-vsphere-60-10480.html
▪ vSphere 6 FT Performance Paper: http://blogs.vmware.com/performance/2016/01/vsphere6-fault-tolerance-perf.html