cost-effective bc/dr with vmware site recovery manager (srm) and lefthand networks

Company Confidential

Cost-effective BC/DR with VMware Site Recovery Manager (SRM) and LeftHand Networks

Presented by Stephan Stelter, LeftHand Networks

Agenda

Introduction
• Definitions: Business Continuance, High Availability, Disaster Recovery, RPO, RTO
• The impact of disasters and downtime; virtualization to the rescue!

Relevant LeftHand Networks Products and Features
• Key features of LeftHand Networks SANs that provide BC/DR benefits to VMware environments

Customer Examples
• How are LeftHand Networks customers using VMware for BC/DR?

Introduction

Business Continuance, defined

“According to a recent Gartner Group document, a business continuance plan should include:
1) a disaster recovery plan, which specifies an organization's planned strategies for post-failure procedures
2) a business resumption plan, which specifies a means of maintaining essential services at the crisis location
3) a business recovery plan, which specifies a means of recovering business functions at an alternate location
4) and a contingency plan, which specifies a means of dealing with external events that can seriously impact the organization.”
– SearchStorage.com

High Availability, Disaster Recovery, RPO, RTO

High Availability - refers to a system or component that is continuously operational for a desirably long length of time

Disaster Recovery (plan) - describes how an organization is to deal with potential disasters; disaster recovery planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention

Recovery Point Objective (RPO) - the age of files that must be recovered for normal operations to resume if a system goes down as a result of a failure

Recovery Time Objective (RTO) - the maximum tolerable length of time that a computer, system, network, or application can be down after a failure or disaster occurs
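As a rough illustration of the RPO and RTO definitions, the sketch below computes the recovery point and recovery time actually achieved in an incident from three timestamps and compares them with the stated objectives. The function names, the objectives, and the example times are all hypothetical; nothing here comes from a VMware or LeftHand tool.

```python
from datetime import datetime, timedelta

def achieved_rpo(last_good_copy: datetime, failure: datetime) -> timedelta:
    """Data-loss window: time between the last recoverable copy and the failure."""
    return failure - last_good_copy

def achieved_rto(failure: datetime, service_restored: datetime) -> timedelta:
    """Downtime: time between the failure and the moment service resumed."""
    return service_restored - failure

def meets_objective(achieved: timedelta, objective: timedelta) -> bool:
    """An objective is met when the achieved value does not exceed it."""
    return achieved <= objective

# Hypothetical incident: last replica at 14:00, failure at 14:20, service back at 14:50.
failure = datetime(2008, 1, 1, 14, 20)
rpo = achieved_rpo(datetime(2008, 1, 1, 14, 0), failure)
rto = achieved_rto(failure, datetime(2008, 1, 1, 14, 50))

# Hypothetical objectives: lose at most 1 hour of data, be down at most 15 minutes.
rpo_ok = meets_objective(rpo, timedelta(hours=1))
rto_ok = meets_objective(rto, timedelta(minutes=15))
```

In this made-up incident the recovery point objective is met but the recovery time objective is not, which is the kind of gap a DR test is meant to expose before a real disaster.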

The impact of disasters - do you have a plan?

• Every year, one out of 500 data centers will experience a severe disaster (McGladrey and Pullen)

• 43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)

• 93% of businesses that lost their data center for 10 days went bankrupt within one year (National Archives & Records Administration)

Type of Downtime                        Business Continuity Component
Unplanned downtime, planned downtime    High Availability (HA)
Disasters                               Disaster Recovery (DR)

When we speak with customers, many of them ask:

• How can I simply automate my disaster recovery plan?

• How can I affordably eliminate application downtime and prevent data loss?

• How can I test my disaster recovery plan quickly and easily?

• How can I reduce recovery times from hours to minutes?

Virtualization to the rescue

Virtualization to the rescue?

Traditional Servers: one server failure, one application goes down

Virtualized Servers: one server failure, ALL applications go down

• Application server consolidation onto fewer physical servers exposes users to more application downtime in event of a hardware failure

• Delivering high availability requires application and storage HA

Virtualization to the rescue!

• Distributed Resource Scheduler (DRS)
• VMotion
• High Availability (HA)
• Consolidated Backup (VCB)

But wait, there’s more!

• Distributed Resource Scheduler (DRS)
• VMotion
• High Availability (HA)
• Consolidated Backup (VCB)

VMware Site Recovery Manager

• Simplifies and automates disaster recovery workflows: setup, testing, failover
• Turns manual recovery runbooks into automated recovery plans
• Provides central management of recovery plans from VirtualCenter
• Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable

Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation

What is VMware Site Recovery Manager?

Phases: Preparation, then Disaster Happens

Set Up and Planning
• Pre-programmed disaster responses
• Finds replicated volumes to speed set up
• Maps VMs to replicated volumes
• DR plan change control
• Ensures primary and remote site consistency

Testing
• Automates volume snapshot for test
• Automates testing, one click to test DR plan
• No impact to application availability
• Isolates network traffic with alternate VLAN / port groups

Failover & Failback
• Coordinates application and storage failover
• Automates workflow
• Automates promotion of remote volumes
• Automates networking of VMs
• Ensures quality of service during/after failover

Simplifies, coordinates, automates storage and application disaster recovery.
• Simplifies set up and management of DR plans to lower DR cost.
• Enables DR plan testing for storage and applications to ensure reliability.
• Coordinates and automates storage and application failover for faster availability.

Disaster Recovery Solution

[Diagram: a Production site and a Disaster Recovery site. Each site is labeled with Storage, Servers, VMware Infrastructure, Virtual Machines, VirtualCenter, and Site Recovery Manager. LeftHand Remote Copy links the storage at the two sites; Site Recovery Manager links the two sites and covers the protected virtual machines.]

How Site Recovery Manager works

1. Pre-program your DR plan
2. Test to ensure reliability
3. Disaster strikes!

• Site failure is detected
  • Alert when heartbeat lost
• Initiate failover
  • User confirmation of outage
  • Granular failover initiation
• Manage replication failover
  • Break replication
  • Make replica visible to recovery hosts
• Execute recovery process
  • Use pre-programmed plan
  • Provide visibility into progress

Question: What RTO have we achieved? What RPO have we achieved?
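The failover sequence on this slide can be sketched as a small simulation. This is only an illustration of the order of the steps (detect, confirm, fail over replication, run the plan); it does not use or imitate the Site Recovery Manager API, and every name in it (`RecoveryPlan`, `site_failed`, `run_failover`, the heartbeat threshold of 3) is a hypothetical choice made for the example.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class RecoveryPlan:
    """A pre-programmed plan: which VMs to start, and in what order."""
    name: str
    vm_start_order: List[str]
    log: List[str] = field(default_factory=list)

def site_failed(missed_heartbeats: int, threshold: int = 3) -> bool:
    """Treat the protected site as failed after `threshold` missed heartbeats (assumed value)."""
    return missed_heartbeats >= threshold

def run_failover(plan: RecoveryPlan, missed_heartbeats: int, operator_confirmed: bool) -> bool:
    """Walk the slide's steps in order; return True only if recovery was executed."""
    if not site_failed(missed_heartbeats):
        return False
    plan.log.append("alert: heartbeat lost")
    if not operator_confirmed:
        # Failover is never started without a person confirming the outage.
        plan.log.append("waiting for user confirmation")
        return False
    plan.log.append("break replication")
    plan.log.append("make replica visible to recovery hosts")
    for vm in plan.vm_start_order:
        plan.log.append(f"power on {vm}")  # the log doubles as progress visibility
    return True

plan = RecoveryPlan("demo-plan", ["database", "app-server"])
executed = run_failover(plan, missed_heartbeats=3, operator_confirmed=True)
```

Keeping the confirmation step separate from detection mirrors the slide: a lost heartbeat raises an alert, but a person decides whether it is a real outage.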

Relevant LeftHand Networks Products and Features

LeftHand Networks, Inc.

Leader in iSCSI SANs
• Pioneer in the IP SAN market, founded in 1999
• Highly available, simple to manage, and “grow as needed” architecture

Rapid market acceptance and growth
• More than 10,000 installations; over 3,000 customers
• Strategic VARs and resellers in North America and Europe

Strategic industry partnerships
• Microsoft, VMware, Citrix

Typical Storage Array Architecture

Monolithic Array (Scale-up Storage)
• Not scalable
• Controller head becomes bottleneck
• Scales capacity only
• Single point of failure
• Forklift upgrades
• Provisioning capacity tends to involve manipulating individual disks and RAID levels for each LUN or volume

The LeftHand Networks Difference

Scale-everything architecture pairs redundant hardware with enterprise-class features

SAN/iQ Storage Clustering

True clustering brings reliability, performance, and ease of management

Storage Cluster
• Aggregates all components for performance
• Data is load balanced across all nodes
• Predictable scalability

Grow on Your Terms
• Non-disruptive scalability
• No forklift upgrades
• Scale everything
• Throttle Bandwidth

Create Tiers of Storage
• Create a tiered environment for different performance requirements
• Online Volume Migration

Simple Centralized Management
• Provisioning
• Monitoring
• Security

[Diagram labels: SAS, SATA, Centralized Management Console]

SAN/iQ Network RAID

Integrates Synchronous Replication with Automated Failover and Failback

Beyond Component Redundancy
• Protects data from array failure
• Synchronous Replication
• Configure on a per-volume basis
• Change RAID level on-the-fly

High Availability
• Multiple disks, controllers, or arrays
• Zero disruption of data access
• Ensures “high availability” for data

[Diagram: SAN/iQ Cluster with data blocks A, B, C, and D, each appearing on more than one storage node]

SAN/iQ Multi-site SAN

Real-time protection from site failure

Protect Storage By:
• Rack
• Room
• Floor
• Building
• Site

Keep Data Online During:
• Facility disruption
• Natural disaster
• Site maintenance

[Diagram: SAN/iQ Cluster configured as a SAN/iQ Multi-site SAN, with copies of data blocks A, B, C, and D held at each site. Caption: Volumes Remain Online]

SAN/iQ Remote Copy

Time and space-efficient asynchronous replication for disaster recovery and backups

Remote Copy
• Asynchronous Replication
  • Per volume basis
  • Scheduled or manual
  • Thin provisioned
• Simple to Manage
  • Bandwidth management
  • Failover / Failback Wizard
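To make the idea of a baseline copy followed by incremental copies concrete, here is a minimal sketch of snapshot-based asynchronous replication. It is not how SAN/iQ Remote Copy is implemented; it only shows why incremental copies move less data than the baseline. A snapshot is modeled as a dictionary from block number to contents, the volume is assumed to have a fixed set of blocks, and all names are hypothetical.

```python
def changed_blocks(prev, curr):
    """Blocks whose contents differ between two snapshots (block number -> data)."""
    return {n: data for n, data in curr.items() if prev.get(n) != data}

def remote_copy(snapshots):
    """Replay snapshots in order; send only the blocks that changed since the last one.

    Returns the remote volume and the number of blocks sent per copy.
    The first copy has nothing to compare against, so it is the baseline.
    """
    remote, prev, transferred = {}, {}, []
    for snap in snapshots:
        delta = changed_blocks(prev, snap)
        remote.update(delta)
        transferred.append(len(delta))
        prev = snap
    return remote, transferred

# Three hypothetical snapshots of a 3-block volume, e.g. taken at 1:00, 2:00, and 3:00.
snap_1 = {0: "a", 1: "b", 2: "c"}
snap_2 = {0: "a", 1: "B", 2: "c"}
snap_3 = {0: "a", 1: "B", 2: "C"}

remote, transferred = remote_copy([snap_1, snap_2, snap_3])
```

Here the baseline sends all three blocks and each incremental copy sends one, so the remote volume ends up identical to the latest snapshot at a fraction of the bandwidth of three full copies.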

[Diagram: SAN 1 and SAN 2 with snapshots at 1:00, 2:00, and 3:00 shown on both. Labels: Baseline Copy, Incremental Copy, Incremental Copy, Recovery Server]

Virtual SAN Appliance for VMware ESX

High Availability for Server & Storage For Remote/Branch Offices

SAN/iQ cluster within ESX
• Highly Available storage across multiple ESX systems
• Shared storage for VMs

In the event of an ESX failure:
• SAN/iQ keeps volume online
• VMware HA will failover VMs

Full Featured Virtual SAN
• SAN/iQ within an ESX virtual machine
• Virtualizes an ESX server’s internal disk resources
• Significant storage footprint (up to 2TB)
• Only SAN appliance on VMware SAN/Storage HCL

[Diagram: three VSA instances]

VSA as Remote Office / Branch Office Replication Client

• Cost effective DR solution
• Provide HA for stranded sites
• Replicate data with SAN/iQ Remote Copy to central data center

[Diagram labels: SAN/iQ Cluster, VSA Cluster (two), VSA instances]

LeftHand SAN Integration with Site Recovery Manager

[Diagram labels: Storage, Servers, Virtual Machines, VirtualCenter, Site Recovery Manager, LeftHand Remote Copy]

Site Recovery Manager
> Manages and monitors recovery plans
> Tightly integrated with VirtualCenter

LeftHand Remote Copy
> Storage Replication Adapter certified by VMware

VMware Infrastructure
> Requires ESX Server 3.0.2 or later
> Requires VirtualCenter 2.5 or later

LeftHand iSCSI SAN Storage
> On VMware SAN/Storage Compatibility Guide

One of First Vendors With Certified Adapter

Customer Examples

University of Maryland School of Medicine

HA/DR Project

The fifth oldest medical school in the United States
• Established in 1807

On the University of Maryland, Baltimore campus, the School of Medicine
• Serves as the foundation for a large academic health center that combines medical education, biomedical research, patient care and community service.

Recognized technology leadership within the University of Maryland
• Adoption of Server and Storage Virtualization

The Challenge: Provide high availability & effective disaster recovery across geographically separated data centers

http://www.umd.edu/

SAN/iQ Multi-Site SAN and VMware ESX Cluster

[Diagram: a VMware ESX HA Cluster on top of a SAN/iQ Cluster. A virtual volume / LUN made of data blocks A through F is shown with a copy of every block at each of two sites.]

• SAN/iQ Cluster is configured with equal storage in each site
• ESX cluster is configured with equal hosts in each site
• SAN/iQ Network RAID replicates data between sites synchronously
• In the event of a site failure SAN/iQ keeps volumes available
• ESX High Availability boots up virtual machines lost at the failed site
• When the failed site comes back online ESX rebalances virtual machines (DRS)
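The behavior described above can be modeled in a few lines: if every block is mirrored synchronously to both sites, losing one site leaves a full copy of the volume, and the virtual machines from the failed site can be restarted on the surviving hosts. This is a toy model under that assumption, not SAN/iQ or VMware HA logic, and the function and site names are hypothetical.

```python
def place_blocks(blocks, sites):
    """Synchronous mirroring: every site holds a copy of every block."""
    return {site: set(blocks) for site in sites}

def available_blocks(placement, failed_sites):
    """Blocks still readable after the given sites have failed."""
    surviving = [blks for site, blks in placement.items() if site not in failed_sites]
    return set().union(*surviving)

def restart_vms(vm_sites, failed_site, surviving_site):
    """Move VMs that ran at the failed site to the surviving site; leave the rest alone."""
    return {vm: (surviving_site if site == failed_site else site)
            for vm, site in vm_sites.items()}

placement = place_blocks("ABCDEF", ["site_1", "site_2"])
after_failure = available_blocks(placement, {"site_1"})

vms = {"exchange": "site_1", "sql": "site_2"}
vms_after = restart_vms(vms, failed_site="site_1", surviving_site="site_2")
```

With equal storage at each site, any single site failure leaves all six blocks available, which is why the volume stays online while VMware HA restarts the affected virtual machines.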


The Result: Reduced Unexpected Downtime From Hours To Seconds!

“Our solution combined the VMware HA feature with LeftHand’s Multi-Site SAN capability that synchronously replicates data between multiple sites.” says Jimmy Reid. “As a result, when we had a power outage affect one of our sites, the combined solution detected a failure within 15 seconds and restarted the virtual machines within a minute—as opposed to the several hours that would be needed for an administrator to physically go to the site and bring the servers online.”

http://www.umd.edu/

Charlotte County

Server and Storage Project

Project goals
• Cost effective server and storage solution
• Reduce physical server sprawl
• Reduce operational expense requirements
• Scalable
• Survivable

http://en.wikipedia.org/wiki/Image:Charlotte_County_Fl_Seal.jpg

http://en.wikipedia.org/wiki/Image:Map_of_Florida_highlighting_Charlotte_County.svg

Charlotte County IT Environment

900 Windows XP workstations (PCs and laptops)

60 Microsoft Standard and Enterprise 2000 and 2003 servers
• HP DL320 1U servers
• IBM LS41 AMD Opteron blade servers

Applications
• Exchange 2003, SQL 2000 SP4, SQL 2005
• File servers housing dept. shared data, user home directories and misc flat file print servers

VMware & LeftHand
• VMware Infrastructure 3 with VMotion, DRS, HA
• 54TB of LeftHand iSCSI SAN storage

Charlotte County Data Centers

• Two tiers of storage needed in each site
• Need both sites operational if link fails
• Need RPO of zero if site disaster occurs

[Diagram: the Murdock Administration Building and the Public Safety Building, 27 km apart, linked by single mode fiber carrying 10Gb Ethernet. Labels: iSCSI SAN Cluster 1 and iSCSI SAN Cluster 2 in each building (4 iSCSI SAN storage clusters in total), SAS and SATA storage, two ESX Clusters, two Failover Managers.]

Current Results and Future Plans

Current Results
• Migrated approximately 1300 Exchange mailboxes to new VMware based Exchange servers connected to LeftHand iSCSI SAN (SAS based)
• 12-15 virtual machines: SQL 2005, Exchange, flat file servers
• Tested fiber cut scenario, worked flawlessly

Future Plans
• Continue migrating all physical servers to virtual servers attached to LeftHand iSCSI SAN
• Next phase will include dept. shared data and user’s home directory data migrated to LeftHand iSCSI SAN (SATA based)

http://en.wikipedia.org/wiki/Image:Charlotte_County_Fl_Seal.jpg

Summary

Benefits of VMware and LeftHand Networks for Business Continuance

High Availability
• LeftHand’s Network RAID combined with VMware HA delivers superior high availability
• Simple to deploy and manage

Disaster Recovery
• Site Recovery Manager and LeftHand SANs
  • Certified solution
  • Simple Setup and Management
  • Fast, Automated Recovery
  • Easy Disaster Recovery Tests

Storage Disaster Recovery Capability Check List

Capability
• Certified with VMware Site Recovery Manager *
• Incremental storage failback
• Test failover does not interrupt replication
• No reserve space required at remote site
• Bandwidth management/throttling
• Remote replication bundled with SAN system software
• Single Storage Replication Adapter for all products

Thank you!

Questions?
