Cost-effective BC/DR with VMware Site Recovery Manager (SRM) and LeftHand Networks
Presented by Stephan Stelter, LeftHand Networks
Agenda
Introduction
• Definitions: Business Continuance, High Availability, Disaster Recovery, RPO, RTO
• The impact of disasters and downtime; virtualization to the rescue!
Relevant LeftHand Networks Products and Features
• Key features of LeftHand Networks SANs that provide BC/DR benefits to VMware environments
Customer Examples
• How are LeftHand Networks customers using VMware for BC/DR?
Introduction
Business Continuance, defined
“According to a recent Gartner Group document, a business continuance plan should include:
1) a disaster recovery plan, which specifies an organization's planned strategies for post-failure procedures
2) a business resumption plan, which specifies a means of maintaining essential services at the crisis location
3) a business recovery plan, which specifies a means of recovering business functions at an alternate location
4) and a contingency plan, which specifies a means of dealing with external events that can seriously impact the organization.” – SearchStorage.com
High Availability, Disaster Recovery, RPO, RTO
High Availability - refers to a system or component that is continuously operational for a desirably long length of time
Disaster Recovery (plan) - describes how an organization is to deal with potential disasters; disaster recovery planning involves an analysis of business processes and continuity needs; it may also include a significant focus on disaster prevention
Recovery Point Objective (RPO) - the age of files that must be recovered for normal operations to resume if a system goes down as a result of a failure
Recovery Time Objective (RTO) - the maximum tolerable length of time that a computer, system, network, or application can be down after a failure or disaster occurs
The impact of disasters - do you have a plan?
• Every year, one out of 500 data centers will experience a severe disaster (McGladrey and Pullen)
• 43% of companies experiencing disasters never re-open, and 29% close within two years (McGladrey and Pullen)
• 93% of businesses that lost their data center for 10 days went bankrupt within one year (National Archives & Records Administration)
Type of Downtime | Business Continuity Component
Unplanned downtime | High Availability (HA)
Planned downtime | High Availability (HA)
Disasters | Disaster Recovery (DR)
When we speak with customers, many of them ask:
• How can I simply automate my disaster recovery plan?
• How can I affordably eliminate application downtime and prevent data loss?
• How can I test my disaster recovery plan quickly and easily?
• How can I reduce recovery times from hours to minutes?
Virtualization to the rescue
Virtualization to the rescue?
Traditional Servers: one server failure, one application goes down
Virtualized Servers: one server failure, ALL applications go down
• Application server consolidation onto fewer physical servers exposes users to more application downtime in the event of a hardware failure
• Delivering high availability requires application and storage HA
Virtualization to the rescue!
[Diagram: Distributed Resource Scheduler (DRS), VMotion, High Availability (HA), Consolidated Backup (VCB)]
But wait, there’s more!
[Diagram: Distributed Resource Scheduler (DRS), VMotion, High Availability (HA), Consolidated Backup (VCB), VMware Site Recovery Manager]
VMware Site Recovery Manager
• Simplifies and automates disaster recovery workflows: setup, testing, failover
• Turns manual recovery runbooks into automated recovery plans
• Provides central management of recovery plans from VirtualCenter
• Works with VMware Infrastructure to make disaster recovery rapid, reliable, manageable, affordable
What is VMware Site Recovery Manager?
Site Recovery Manager leverages VMware Infrastructure to deliver advanced disaster recovery management and automation.
Simplifies, coordinates, automates storage and application disaster recovery.
• Simplifies set up and management of DR plans to lower DR cost.
• Enables DR plan testing for storage and applications to ensure reliability.
• Coordinates and automates storage and application failover for faster availability.
Set Up and Planning (Preparation)
• Pre-programmed disaster responses
• Finds replicated volumes to speed set up
• Maps VMs to replicated volumes
• DR plan change control
• Ensures primary and remote site consistency
Testing (Preparation)
• Automates volume snapshot for test
• Automates testing, one click to test DR plan
• No impact to application availability
• Isolates network traffic with alternate VLAN / port groups
Failover & Failback (Disaster Happens)
• Coordinates application and storage failover
• Automates workflow
• Automates promotion of remote volumes
• Automates networking of VMs
• Ensures quality of service during/after failover
Disaster Recovery Solution
[Diagram: a Production site and a Disaster Recovery site, each with Storage, Servers, VMware Infrastructure, Virtual Machines, VirtualCenter, and Site Recovery Manager. The Production site hosts the protected virtual machines; LeftHand Remote Copy and Site Recovery Manager link the two sites.]
How Site Recovery Manager works
1. Pre-program your DR plan
2. Test to ensure reliability
3. Disaster strikes!
• Site failure is detected
  • Alert when heartbeat lost
• Initiate failover
  • User confirmation of outage
  • Granular failover initiation
• Manage replication failover
  • Break replication
  • Make replica visible to recovery hosts
• Execute recovery process
  • Use pre-programmed plan
  • Provide visibility into progress
Question: What RTO have we achieved? What RPO have we achieved?
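The two closing questions can be answered from three timestamps: when the last replica completed, when the failure occurred, and when service was restored. The following is a minimal sketch of that arithmetic; the function names, the timeline, and the objectives are all hypothetical values chosen for illustration, not output from Site Recovery Manager.

```python
from datetime import datetime, timedelta


def achieved_rpo(last_replicated_at: datetime, failed_at: datetime) -> timedelta:
    """Achieved RPO: the age of the recovered data at the moment of failure."""
    return failed_at - last_replicated_at


def achieved_rto(failed_at: datetime, restored_at: datetime) -> timedelta:
    """Achieved RTO: how long the service was down."""
    return restored_at - failed_at


# Hypothetical timeline, for illustration only.
last_replicated_at = datetime(2008, 6, 1, 2, 0)  # last replica completed at the recovery site
failed_at = datetime(2008, 6, 1, 2, 40)          # heartbeat lost
restored_at = datetime(2008, 6, 1, 3, 5)         # recovery plan finished, VMs running

rpo = achieved_rpo(last_replicated_at, failed_at)
rto = achieved_rto(failed_at, restored_at)

# Hypothetical objectives to compare against.
rpo_objective = timedelta(hours=1)
rto_objective = timedelta(minutes=30)

print(f"Achieved RPO: {rpo} (objective met: {rpo <= rpo_objective})")
print(f"Achieved RTO: {rto} (objective met: {rto <= rto_objective})")
```

With asynchronous replication the achieved RPO depends on how recently the last copy finished, so the replication schedule bounds it; the achieved RTO depends on how quickly the recovery plan runs.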
Relevant LeftHand Networks Products and Features
LeftHand Networks, Inc.
Leader in iSCSI SANs
• Pioneer in the IP SAN market, founded in 1999
• Highly available, simple to manage, and “grow as needed” architecture
Rapid market acceptance and growth
• More than 10,000 installations; over 3,000 customers
• Strategic VARs and resellers in North America and Europe
Strategic industry partnerships
• Microsoft, VMware, Citrix
Typical Storage Array Architecture
Monolithic Array (Scale-up Storage)
• Not scalable
• Controller head becomes bottleneck
• Scales capacity only
• Single point of failure
• Forklift upgrades
• Provisioning capacity tends to involve manipulating individual disks and RAID levels for each LUN or volume
The LeftHand Networks Difference
Scale-everything architecture pairs redundant hardware with enterprise-class features
SAN/iQ Storage Clustering
True clustering brings reliability, performance, and ease of management
Storage Cluster
• Aggregates all components for performance
• Data is load balanced across all nodes
• Predictable scalability
Grow on Your Terms
• Non-disruptive scalability
• No forklift upgrades
• Scale everything
• Throttle Bandwidth
Create Tiers of Storage
• Create a tiered environment for different performance requirements
• Online Volume Migration
Simple Centralized Management
• Provisioning
• Monitoring
• Security
[Diagram: SAS and SATA storage tiers with the Centralized Management Console]
SAN/iQ Network RAID
Integrates Synchronous Replication with Automated Failover and Failback
Beyond Component Redundancy
• Protects data from array failure
• Synchronous Replication
• Configure on a per-volume basis
• Change RAID level on-the-fly
High Availability
• Multiple disks, controllers, or arrays
• Zero disruption of data access
• Ensures “high availability” for data
[Diagram: SAN/iQ Cluster with copies of blocks A, B, C, D distributed across the storage nodes]
SAN/iQ Multi-site SAN
Real-time protection from site failure
Protect Storage By:
• Rack
• Room
• Floor
• Building
• Site
Keep Data Online During:
• Facility disruption
• Natural disaster
• Site maintenance
[Diagram: a SAN/iQ Cluster configured as a SAN/iQ Multi-site SAN, with copies of blocks A, B, C, D distributed across the nodes. Volumes Remain Online.]
SAN/iQ Remote Copy
Time and space-efficient asynchronous replication for disaster recovery and backups
Remote Copy
• Asynchronous Replication
  • Per volume basis
  • Scheduled or manual
  • Thin provisioned
• Simple to Manage
  • Bandwidth management
  • Failover / Failback Wizard
[Diagram: snapshots taken on SAN 1 at 1:00, 2:00, and 3:00 are copied to SAN 2 as a Baseline Copy followed by two Incremental Copies; the diagram also shows a Recovery Server]
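The baseline-plus-incremental pattern in the Remote Copy diagram can be sketched in a few lines. This is an illustration of the general technique only, not SAN/iQ's implementation: volumes are modeled as hypothetical dictionaries mapping block numbers to contents, and the function names are invented for the example.

```python
from typing import Dict, List, Tuple

Snapshot = Dict[int, bytes]  # block number -> block contents (hypothetical model)


def changed_blocks(previous: Snapshot, current: Snapshot) -> Snapshot:
    """Return the blocks in `current` that are new or differ from `previous`."""
    return {n: data for n, data in current.items() if previous.get(n) != data}


def remote_copy(snapshots: List[Snapshot]) -> Tuple[Snapshot, List[int]]:
    """Replicate snapshots in order and report how many blocks each copy sent.

    The first copy is the baseline (every block is sent); each later copy is
    incremental (only blocks changed since the previous snapshot are sent).
    """
    remote: Snapshot = {}
    previous: Snapshot = {}
    blocks_sent = []
    for snapshot in snapshots:
        delta = changed_blocks(previous, snapshot)
        remote.update(delta)  # apply the delta at the remote site
        blocks_sent.append(len(delta))
        previous = snapshot
    return remote, blocks_sent


# Hypothetical snapshots taken at 1:00, 2:00, and 3:00.
snap_1 = {0: b"a", 1: b"b", 2: b"c", 3: b"d"}
snap_2 = {**snap_1, 1: b"B"}             # one block changed since 1:00
snap_3 = {**snap_2, 2: b"C", 3: b"D"}    # two blocks changed since 2:00

remote, blocks_sent = remote_copy([snap_1, snap_2, snap_3])
print(blocks_sent)
print(remote == snap_3)
```

Because only changed blocks cross the link after the baseline, the bandwidth needed per scheduled copy tracks the change rate of the volume and not its size.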
Virtual SAN Appliance for VMware ESX
High Availability for Server & Storage For Remote/Branch Offices
SAN/iQ cluster within ESX
• Highly Available storage across multiple ESX systems
• Shared storage for VMs
In the event of an ESX failure:
• SAN/iQ keeps volume online
• VMware HA will fail over VMs
Full Featured Virtual SAN
• SAN/iQ within an ESX virtual machine
• Virtualizes an ESX server’s internal disk resources
• Significant storage footprint (up to 2TB)
• Only SAN appliance on VMware SAN/Storage HCL
[Diagram: three VSA instances]
VSA as Remote Office / Branch Office Replication Client
• Cost effective DR solution
• Provide HA for stranded sites
• Replicate data with SAN/iQ Remote Copy to central data center
[Diagram: VSA Clusters and a SAN/iQ Cluster]
LeftHand SAN Integration with Site Recovery Manager
One of First Vendors With Certified Adapter
Site Recovery Manager
> Manages and monitors recovery plans
> Tightly integrated with VirtualCenter
LeftHand Remote Copy
> Storage Replication Adapter certified by VMware
VMware Infrastructure
> Requires ESX Server 3.0.2 or later
> Requires VirtualCenter 2.5 or later
LeftHand iSCSI SAN Storage
> On VMware SAN/Storage Compatibility Guide
[Diagram: Storage, Servers, VMware Infrastructure, Virtual Machines, VirtualCenter, Site Recovery Manager, and LeftHand Remote Copy]
Customer Examples
University of Maryland School of Medicine
HA/DR Project
The fifth oldest medical school in the United States
• Established in 1807
On the University of Maryland, Baltimore campus, the School of Medicine
• Serves as the foundation for a large academic health center that combines medical education, biomedical research, patient care and community service.
Recognized technology leadership within the University of Maryland
• Adoption of Server and Storage Virtualization
The Challenge – Provide high availability & effective disaster recovery across geographically separated data centers
SAN/iQ Multi-Site SAN and VMware ESX Cluster
[Diagram: a VMware ESX HA Cluster and a SAN/iQ Cluster; the 6 blocks (A through F) of a Virtual Volume / LUN appear in two copies]
• SAN/iQ Cluster is configured with equal storage in each site
• ESX cluster is configured with equal hosts in each site
• SAN/iQ Network RAID replicates data between sites synchronously
• In the event of a site failure, SAN/iQ keeps volumes available
• ESX High Availability boots up virtual machines lost at the failed site
• When the failed site comes back online, ESX rebalances virtual machines (DRS)
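Why a volume stays available after a site failure follows from where the block copies live. The sketch below assumes a two-way mirror with one copy of every block per site, placed round-robin across that site's nodes; the site names, node names, and placement policy are hypothetical and only illustrate the idea, they are not SAN/iQ's actual layout algorithm.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: two sites with equal storage in each.
SITES: Dict[str, List[str]] = {
    "site_1": ["node_1", "node_2", "node_3"],
    "site_2": ["node_4", "node_5", "node_6"],
}

Placement = Dict[str, List[Tuple[str, str]]]  # block -> [(site, node), ...]


def place_blocks(blocks: List[str], sites: Dict[str, List[str]]) -> Placement:
    """Give every block one copy per site, round-robin across the site's nodes."""
    return {
        block: [(site, nodes[i % len(nodes)]) for site, nodes in sites.items()]
        for i, block in enumerate(blocks)
    }


def available_blocks(placement: Placement, failed_site: str) -> Set[str]:
    """Blocks that still have at least one copy outside the failed site."""
    return {
        block
        for block, copies in placement.items()
        if any(site != failed_site for site, _ in copies)
    }


blocks = list("ABCDEF")  # the 6 blocks of the virtual volume
placement = place_blocks(blocks, SITES)

for failed in SITES:
    survivors = available_blocks(placement, failed)
    print(f"{failed} down: volume online = {survivors == set(blocks)}")
```

Since each block has a synchronously written copy at both sites, losing either site leaves a complete copy of the volume, which is what lets VMware HA restart the affected virtual machines at the surviving site.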
The Result: Reduced Unexpected Downtime From Hours To Seconds!
“Our solution combined the VMware HA feature with LeftHand’s Multi-Site SAN capability that synchronously replicates data between multiple sites.” says Jimmy Reid. “As a result, when we had a power outage affect one of our sites, the combined solution detected a failure within 15 seconds and restarted the virtual machines within a minute—as opposed to the several hours that would be needed for an administrator to physically go to the site and bring the servers online.”
Charlotte County
Server and Storage Project
Project goals
• Cost effective server and storage solution
• Reduce physical server sprawl
• Reduce operational expense requirements
• Scalable
• Survivable
Charlotte County IT Environment
900 Windows XP workstations (PCs and laptops)
60 Microsoft Standard and Enterprise 2000 and 2003 servers
• HP DL320 1U servers
• IBM LS41 AMD Opteron blade servers
Applications
• Exchange 2003, SQL 2000 SP4, SQL 2005
• File servers housing dept. shared data, user home directories and misc flat file print servers
VMware & LeftHand
• VMware Infrastructure 3 with VMotion, DRS, HA
• 54TB of LeftHand iSCSI SAN storage
Charlotte County Data Centers
• Two tiers of storage needed in each site
• Need both sites operational if link fails
• Need RPO of zero if site disaster occurs
[Diagram: the Murdock Administration Building and the Public Safety Building, 27 km apart, connected by a 10Gb Ethernet link over single mode fiber. Each building hosts iSCSI SAN Cluster 1 and iSCSI SAN Cluster 2, for 4 iSCSI SAN storage clusters in total, using SAS and SATA storage. The diagram also shows two ESX Clusters and two Failover Managers.]
Current Results and Future Plans
Current Results
• Migrated approximately 1300 Exchange mailboxes to new VMware based Exchange servers connected to LeftHand iSCSI SAN (SAS based)
• 12 to 15 virtual machines: SQL 2005, Exchange, flat file servers
• Tested fiber cut scenario, worked flawlessly
Future Plans
• Continue migrating all physical servers to virtual servers attached to LeftHand iSCSI SAN
• Next phase will include dept. shared data and users’ home directory data migrated to LeftHand iSCSI SAN (SATA based)
Summary
Benefits of VMware and LeftHand Networks for Business Continuance
High Availability
• LeftHand’s Network RAID combined with VMware HA delivers superior high availability
• Simple to deploy and manage
Disaster Recovery
• Site Recovery Manager and LeftHand SANs
  • Certified solution
  • Simple Setup and Management
  • Fast, Automated Recovery
  • Easy Disaster Recovery Tests
Storage Disaster Recovery Capability Check List
Capability
• Certified with VMware Site Recovery Manager *
• Incremental storage failback
• Test failover does not interrupt replication
• No reserve space required at remote site
• Bandwidth management/throttling
• Remote replication bundled with SAN system software
• Single Storage Replication Adapter for all products
Thank you!
Questions?