
Post on 25-Jul-2020


© 2014 VMware Inc. All rights reserved.

Implementing a Holistic BC/DR Strategy with VMware

Roberto Barbero, Solution Architect

VMware vForum, 2014

What’s on the agenda?

• Defining the problem

• Definitions

• VMware technologies that provide BC and DR

– vSphere HA and App HA

– vSphere FT

– vSphere Data Protection / Advanced

– vCenter Availability

– vSphere Replication

– vCenter Site Recovery Manager (SRM)

– vCenter Infrastructure Navigator (VIN)

• Find out more

IT Business Continuity

Is It a Real Problem?


UK Bank group

An outage in June 2012 prevented millions of customers from receiving or making payments and lasted for almost an entire week.

£125 Million

What’s the Difference?

Disaster Avoidance

Disaster Recovery

Planned vs. Unplanned

Disaster Recovery vs. Business Continuity

Example: Tuesday, August 23, 2011 at 1:51 PM EDT - Magnitude 5.8 earthquake near Mineral, Virginia

Disaster recovery required?

No

Interruption to business continuance?

YES!

Fault Tolerance vs. High Availability

• Fault tolerance

– Ability to recover from component loss

– Example: Hard drive failure

• High availability

Uptime percentage in one year    Downtime in one year
99                               3.65 days
99.9                             8.76 hours
99.99                            52 minutes
99.999 (“five nines”)            5 minutes
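The downtime figures in the table above follow directly from the uptime percentage. As a minimal sketch (the function name is illustrative, and a 365-day year is assumed), the arithmetic can be reproduced like this:

```python
def downtime_hours_per_year(uptime_percent: float, days_per_year: float = 365.0) -> float:
    """Return the downtime (in hours per year) implied by an uptime percentage."""
    if not 0.0 <= uptime_percent <= 100.0:
        raise ValueError("uptime_percent must be between 0 and 100")
    unavailable_fraction = 1.0 - uptime_percent / 100.0
    return unavailable_fraction * days_per_year * 24.0


if __name__ == "__main__":
    for pct in (99.0, 99.9, 99.99, 99.999):
        hours = downtime_hours_per_year(pct)
        print(f"{pct}% uptime allows {hours:.2f} hours ({hours * 60:.1f} minutes) of downtime per year")
```

The slide rounds the last two rows down to whole minutes; the exact values are a little over 52 and a little over 5 minutes.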

RTO, RPO, and MTD

• Recovery Time Objective (RTO)

– How long it should take to recover

• Recovery Point Objective (RPO)

– Amount of data loss that can be incurred

• Maximum Tolerable Downtime (MTD)

– Downtime that can occur before significant loss is incurred

– Examples: Financial, reputation
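One way to make these three objectives concrete is to record them per application and compare them with what a given protection method actually delivers. The sketch below is illustrative only: the class and function names are hypothetical, and the rule that RTO should not exceed MTD is a common planning assumption rather than something stated on the slide.

```python
from dataclasses import dataclass
from datetime import timedelta


@dataclass(frozen=True)
class RecoveryObjectives:
    rto: timedelta  # how long recovery should take
    rpo: timedelta  # how much data loss (measured in time) can be incurred
    mtd: timedelta  # downtime that can occur before significant loss is incurred


def objectives_are_consistent(objectives: RecoveryObjectives) -> bool:
    """Assumption: a recovery target longer than the tolerable downtime makes no sense."""
    return objectives.rto <= objectives.mtd


def meets_objectives(objectives: RecoveryObjectives,
                     recovery_time: timedelta,
                     data_loss_window: timedelta) -> bool:
    """Check a measured (or estimated) recovery against the RTO and RPO."""
    return recovery_time <= objectives.rto and data_loss_window <= objectives.rpo


if __name__ == "__main__":
    # Hypothetical tier-1 application.
    tier1 = RecoveryObjectives(rto=timedelta(hours=1),
                               rpo=timedelta(minutes=15),
                               mtd=timedelta(hours=4))
    print(objectives_are_consistent(tier1))
    print(meets_objectives(tier1, timedelta(minutes=30), timedelta(minutes=15)))
```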

Making an Application Service Highly Available

• vSphere HA

• NEW: vSphere App HA

vSphere App HA (New)

• Policy-based

• Protect off-the-shelf apps

[Diagram labels: VMware vFabric™ tc Server; vSphere HA Cluster; vFabric Hyperic Virtual Appliance; vSphere App HA Virtual Appliance; Hyperic Agents Running in VMs; vCenter Server; four vSphere hosts]

vSphere HA – Keep In Mind…

• RTO – measured in minutes (not seconds)

• Requires shared storage

• Best practices

– Use admission control – percentage policy

– Test post-failure performance with host maintenance mode

– Isolation response – leave powered on

– Network and storage redundancy
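For the percentage-based admission control policy, a common rule of thumb is to reserve the share of cluster capacity that the tolerated host failures represent. The sketch below assumes equally sized hosts and a hypothetical function name; it is a sizing aid, not a vSphere API call.

```python
import math


def failover_reserve_percent(total_hosts: int, host_failures_to_tolerate: int = 1) -> int:
    """Percentage of cluster CPU/memory to reserve, assuming all hosts are the same size."""
    if total_hosts < 2:
        raise ValueError("an HA cluster needs at least two hosts")
    if not 0 < host_failures_to_tolerate < total_hosts:
        raise ValueError("host_failures_to_tolerate must be between 1 and total_hosts - 1")
    # Round up so the reservation never falls short of a whole host.
    return math.ceil(100 * host_failures_to_tolerate / total_hosts)


if __name__ == "__main__":
    for hosts in (3, 4, 8):
        print(f"{hosts} hosts, 1 failure tolerated: reserve {failover_reserve_percent(hosts)}%")
```

With unequal hosts, the reservation would need to be based on the largest host instead, which this sketch does not attempt.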

vSphere Fault Tolerance (FT)

• Zero recovery time, zero data loss

– Host hardware failure only

– Does not protect against OS and application failure

• Works fine with HA, App HA

• Why not FT?

– Resource requirements – does workload really need it?

– VM has multiple vCPUs (FT supports a single vCPU only)

– No VM snapshots – backups require agent

Data Protection (Backup and Restore)

• Agents? No Agents? – Both!

– No agents for majority of workloads – keep it simple

– Agents for certain apps

• vSphere Data Protection (VDP) Advanced

– Backup and recovery for VMware, from VMware

– Based on proven, mature EMC Avamar™

– Agent-less VM backup and restore

– Agents for granular tier-1 application protection

vSphere Data Protection (New)

VDP Advanced – Keep In Mind…

• Engineered for SMB environments

• Uses VADP – VM snapshots, CBT

• Utilizes Windows VSS in VMware Tools

• Works fine with HA, not with FT

• RDM – virtual yes, physical no

• Is it DR?

– Maybe – depends on RTO, RPO

– Needs replication offsite, right?

VDP Advanced – Keep In Mind…

• Best Practices

– Prepopulate DNS, always use FQDN

– Manage VM snapshots

– Avoid deploying to slow storage

– Do not power-off, always shut down gracefully

– Do not schedule backups during maintenance window

vCenter Availability

• Run vCenter Server application in a VM

• Run vCenter Server database in a VM

• Run both in same VM?

• Protect with vSphere HA

– vCenter and DB VM restart priority set to High

– Enable guest OS and App monitoring

• App HA can protect SQL Server database

vCenter Availability

• Back up vCenter Server VM and database

– Image-level backup for vCenter Server VM

– App-level backup using agent for database backup

• Why not FT for vCenter Server?

– vCenter Server requires minimum of 2 vCPUs

– FT does not protect against application failure

• Replicate vCenter Server, database VMs?

vSphere Replication – DR

• Native tool built into the platform

• Per-VM hypervisor replication, managed in VC

• Selectable RPO from 15 minutes up to 24 hours

• Selectable destination datastore (disk-type agnostic)
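When mapping application requirements onto vSphere Replication, it can help to check each desired RPO against the selectable range quoted above. This is a minimal sketch: the bounds come from the slide, while the function names are hypothetical and nothing here calls a VMware API.

```python
from datetime import timedelta

# Selectable range as stated on the slide.
MIN_RPO = timedelta(minutes=15)
MAX_RPO = timedelta(hours=24)


def is_selectable_rpo(rpo: timedelta) -> bool:
    """True if the requested RPO falls inside the selectable range."""
    return MIN_RPO <= rpo <= MAX_RPO


def clamp_rpo(desired: timedelta) -> timedelta:
    """Return the closest selectable RPO to the desired value."""
    return max(MIN_RPO, min(desired, MAX_RPO))


if __name__ == "__main__":
    wanted = timedelta(minutes=5)
    if not is_selectable_rpo(wanted):
        print(f"RPO of {wanted} is outside the range; closest selectable value is {clamp_rpo(wanted)}")
```

An application that needs an RPO tighter than the minimum is a candidate for a different replication method, not for clamping.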

Replication Across Sites

[Diagram labels: a vCenter Server, a VR Appliance, storage, and three ESXi hosts (each labeled NFC and VRA) per site; VMDK1 replicated between the two sites' storage]

Four Steps for Full Recovery

Right-click, select “Recover”

Select a target folder

Select a target resource

Click Finish

Will validate your choices as you go

New Feature – Retain Historical Replicas

[Diagram labels: vSphere, VR Agent]

After recovery, use the snapshot manager to revert to earlier points.

Retention of multiple points in time allows reversion to earlier known good states.

MPIT (Multiple Points in Time) Presented as VM Snapshots after Failover

Use the snapshot manager to revert to earlier points, an interface all administrators have been comfortable with for many years.

vSphere Replication – Interoperability

Fault tolerance – doesn’t work with VR

• FT conflicts at the vSCSI disk filter level.

VDP

• Mostly no problem!

• If using VSS… ensure you are using 5.5!!

HA, vMotion, DRS

Storage vMotion and Storage DRS

• Now supported!

SRM

What is it?

• A disaster recovery engine

• A tool that uses externally replicated data (VR or array-based) to shorten the RTO of a business continuity plan (BCP)

• A product that allows DR to be tested, automated, planned, repeatable and customizable

What is it not?

• A replication engine

• A tool for systems that need near-instant RPO

• A disaster avoidance stretched cluster

Key Components of SRM

[Diagram labels: Replication, vCenter Server, SRM Server]

• One vCenter Server (Windows or VCVA) per site, same versions

• One SRM Server per site, same versions

• vSphere hosts, recommend same versions per site (pre-vSphere 5.x hosts only if using array replication)

• vSphere Essentials Plus and higher editions supported

SRM Replication Options

• SRM can utilize BOTH array-based AND vSphere Replication

• SRM will “see” existing standalone vSphere Replication protected VMs

• SRM can install vSphere Replication from scratch if needed

[Diagram labels: vSphere Replication; Storage-based Replication; Multi-tier App (Web, App, DB), shown twice; LUN 1; LUN 2; Hub]

Recovery Workflows

Failover Automation

• User-defined recovery plan

• Minimize errors

Non-disruptive Failover Testing

• Isolated test environment

• Increase confidence in DR process

Planned Migration

• Zero data loss

• Operational migration

Failback Automation

• Re-protect VMs, migrate back

SRM Interoperability

• Works with VR and ABR

• Backups, VADP or other, are fine

• HA is no problem at all

• vMotion and DRS are fine

• Storage vMotion and Storage DRS – sort of…

– Replication dependent

• FT is “yellow”

– Array replicated only, and the FT status is not recovered

• Web vs vSphere Client

SRM – A Few Best Practices

Big ones:

Storage Layout

Test Network Configuration

Test often!

Size vCenter correctly

Biggest one:

Do a Business Impact Analysis

RPO, RTO, cost of downtime, interdependencies, criticality of applications, priorities, units of failover, overlooked externalities, executive buy-in, …
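Interdependencies and priorities from the impact analysis ultimately become a recovery order. A minimal sketch of deriving that order from a dependency map, using Python's standard `graphlib` module (Python 3.9+); the application names and the map itself are hypothetical:

```python
from graphlib import TopologicalSorter
from typing import Dict, List, Set


def recovery_order(depends_on: Dict[str, Set[str]]) -> List[str]:
    """Order applications so that each one starts only after everything it depends on."""
    # TopologicalSorter expects a mapping of node -> set of predecessors,
    # and raises graphlib.CycleError if the dependencies are circular.
    return list(TopologicalSorter(depends_on).static_order())


if __name__ == "__main__":
    # Hypothetical three-tier application: web needs app, app needs db.
    dependencies = {"web": {"app"}, "app": {"db"}, "db": set()}
    print(recovery_order(dependencies))
```

A circular dependency surfaces as an exception here, which is itself a useful finding for the analysis.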

Protection Groups (PGs)

• More PGs = more granular testing/failover

– DR testing is easier – fewer resource requirements

– Fail-over only what is needed

– More configuration/complexity

• Fewer protection groups = less complexity

– Fewer LUNs, PGs, recovery plans

– Less flexibility

Fewer LUNs/PGs: less complexity, less flexibility

More LUNs/PGs: more complexity, more flexibility

Majority of outages are partial (not entire data center) – design accordingly
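The trade-off above can be explored on paper before touching the storage layout. The sketch below assumes array-based replication where VMs that share a LUN fail over together, and uses hypothetical VM and LUN names; it simply groups a placement map by LUN to show how many units of failover a layout yields.

```python
from collections import defaultdict
from typing import Dict, List


def group_vms_by_lun(vm_to_lun: Dict[str, str]) -> Dict[str, List[str]]:
    """Group VMs by the LUN they live on; each group is one candidate unit of failover."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for vm, lun in vm_to_lun.items():
        groups[lun].append(vm)
    # Sort VM names so the result is stable regardless of input order.
    return {lun: sorted(vms) for lun, vms in groups.items()}


if __name__ == "__main__":
    # Hypothetical placement: two tiers share LUN 1, the database has its own LUN.
    placement = {"web01": "LUN 1", "app01": "LUN 1", "db01": "LUN 2"}
    for lun, vms in group_vms_by_lun(placement).items():
        print(f"{lun}: {', '.join(vms)}")
```

More groups in the output mean finer-grained testing and failover, at the cost of more LUNs, PGs and recovery plans to manage.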

Test Network

– Use VLAN or isolated network for test environment

• Default “Auto” setting does not allow VM communication between hosts

– Different vSwitch can be specified in SRM for test versus run

• Specified in Recovery Plan

vCenter Infrastructure Navigator (VIN)

VMware – Multiple Levels of Protection

[Diagram, built up over three slides: a SQL VM at Site A protected by vSphere HA/FT; then VDPA added at Site A; then VR/SRM added, with a SQL VM at Site B]

Find Out More

• Take an online hands-on lab

• Ask for a demo

• Install 60-day evaluation

http://labs.hol.vmware.com/

Thank You
