Failover Clustering & Hyper-V: Multisite Disaster Recovery
Prakash Gopinadham
Support Escalation Engineer
Microsoft Corporation
Multi-Site Clustering Content
Design guide: http://technet.microsoft.com/en-us/library/dd197430.aspx
Deployment guide/checklist: http://technet.microsoft.com/en-us/library/dd197546.aspx
Customer case studies using multi-site clustering: http://blogs.msdn.com/b/clustering/archive/2009/11/04/9917628.aspx
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Defining High-Availability
But what if there is a catastrophic event and you lose the entire datacenter?
Site A
High-Availability allows applications to maintain service availability by moving them between nodes in a cluster
Defining Disaster Recovery
Disaster Recovery (DR) allows applications to maintain service availability by moving them to a cluster node in a different physical location
[Diagram: Site A and Site B with SAN storage; the node at Site B is located at a physically separate site]
Benefits of a Multi-Site Cluster
• Protects against loss of an entire location
– Power outages, fires, hurricanes, floods, earthquakes, terrorism
• Automates failover
– Reduced downtime
– Lower-complexity disaster recovery plan
What is the primary reason why DR solutions fail?
Dependence on People
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Stretching the Network
• Longer distance traditionally means greater network latency
• Missed inter-node health checks can cause false failover
• Cluster heartbeating is fully configurable
– SameSubnetDelay (default = 1 second)
  • Frequency heartbeats are sent
– SameSubnetThreshold (default = 5 heartbeats)
  • Missed heartbeats before an interface is considered down
– CrossSubnetDelay (default = 1 second)
  • Frequency heartbeats are sent to nodes on dissimilar subnets
– CrossSubnetThreshold (default = 5 heartbeats)
  • Missed heartbeats before an interface is considered down to nodes on dissimilar subnets
– Command Line: Cluster.exe /prop
– PowerShell (R2): Get-Cluster | fl *
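As a rough illustration of how the delay and threshold settings interact, the sketch below multiplies the two to estimate how long a link can stay silent before an interface is considered down. It is a simplified model rather than the cluster service's actual logic; the class name `HeartbeatSettings`, the helper `tolerates_outage`, and the "relaxed" values are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatSettings:
    """Hypothetical container for a delay/threshold pair (not a real cluster API)."""
    delay_seconds: float = 1.0  # default delay quoted on the slide
    threshold: int = 5          # default number of missed heartbeats quoted on the slide

    def detection_time(self) -> float:
        """Approximate seconds of silence before an interface is considered down."""
        return self.delay_seconds * self.threshold


def tolerates_outage(settings: HeartbeatSettings, outage_seconds: float) -> bool:
    """True if a network blip of this length would not trip the threshold (simplified)."""
    return outage_seconds < settings.detection_time()


defaults = HeartbeatSettings()
# Assumed example values for a high-latency cross-site link, not a recommendation.
relaxed = HeartbeatSettings(delay_seconds=2.0, threshold=10)

print(defaults.detection_time())      # seconds of silence tolerated with the defaults
print(tolerates_outage(defaults, 8))  # would an 8-second blip be tolerated?
print(tolerates_outage(relaxed, 8))
```

Under this model, raising either value makes the cluster more tolerant of WAN latency at the cost of slower detection of a real failure.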
Security over the WAN
• Encrypt inter-node communication
• Trade-off security versus performance
• SecurityLevel cluster property:
– 0 = clear text
– 1 = signed (default)
– 2 = encrypted
[Diagram: inter-node communication between Site A (10.10.10.1, 30.30.30.1) and Site B (20.20.20.1, 40.40.40.1)]
Network Considerations
Network Deployment Options:
1. Stretch VLANs across sites
2. Cluster nodes can reside in different subnets
[Diagram: Public Network (10.10.10.1 at Site A, 20.20.20.1 at Site B) and Redundant Network (30.30.30.1 at Site A, 40.40.40.1 at Site B)]
DNS Considerations
• Nodes in dissimilar subnets
• VM obtains new IP address
• Clients need that new IP address from DNS to reconnect
[Diagram: the record VM = 10.10.10.111 is created on DNS Server 1 (Site A); after failover to Site B the record is updated to VM = 20.20.20.222, DNS replication updates the record on the other DNS server, and the client then obtains the new record]
Faster Failover for Multi-Subnet Clusters
• RegisterAllProvidersIP (default = 0 for FALSE)
– Determines if all IP Addresses for a Network Name will be registered by DNS
– TRUE (1): IP Addresses can be online or offline and will still be registered
– Ensure application is set to try all IP Addresses, so clients can come online quicker
• HostRecordTTL (default = 1200 seconds)
– Controls time the DNS record lives on client for a cluster network name
– Shorter TTL: DNS records for clients updated sooner
– Exchange Server 2007 recommends a value of five minutes (300 seconds)
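The "try all IP Addresses" advice can be pictured with a small client-side sketch. It assumes DNS has returned every registered address for the network name and that the client simply walks the list until one responds. The function `first_reachable` and the `can_connect` callback are hypothetical stand-ins for whatever connection logic an application really uses; the addresses are the ones shown in the DNS diagram.

```python
from typing import Callable, Iterable, Optional


def first_reachable(addresses: Iterable[str],
                    can_connect: Callable[[str], bool]) -> Optional[str]:
    """Return the first address that accepts a connection, or None if none do."""
    for address in addresses:
        if can_connect(address):
            return address
    return None


# Both addresses are registered in DNS; only the Site B address is online after failover.
registered = ["10.10.10.111", "20.20.20.222"]
online = {"20.20.20.222"}

chosen = first_reachable(registered, lambda ip: ip in online)
print(chosen)
```

A client written this way does not have to wait for a stale record to expire before it can reach the address that is currently online.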
Solution #1: Local Failover First
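• Configure local failover first for high availability
– No change in IP addresses
– No DNS replication issues
– No data going over the WAN
• Cross-site failover for disaster recovery
[Diagram: on local failover within Site A the VM keeps 10.10.10.111 and the record on DNS Server 1 is unchanged; on cross-site failover to Site B it comes online as 20.20.20.222]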
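The local-first policy can be sketched as a simple target-selection rule: prefer a healthy node in the same site, and only cross the WAN when no local node is left. This is an illustration of the idea, not how the cluster service picks owners; the function `pick_failover_target` and the node names are hypothetical.

```python
from typing import Dict, List, Optional


def pick_failover_target(current_node: str,
                         node_sites: Dict[str, str],
                         healthy_nodes: List[str]) -> Optional[str]:
    """Prefer a healthy node in the current site; otherwise fall back to another site."""
    current_site = node_sites[current_node]
    candidates = [n for n in healthy_nodes if n != current_node]
    local = [n for n in candidates if node_sites[n] == current_site]
    if local:
        return local[0]  # local failover: no IP change, no WAN traffic
    return candidates[0] if candidates else None  # cross-site failover for DR


# Hypothetical three-node layout: two nodes at Site A, one at Site B.
sites = {"A1": "Site A", "A2": "Site A", "B1": "Site B"}

print(pick_failover_target("A1", sites, ["A2", "B1"]))  # a local node is available
print(pick_failover_target("A1", sites, ["B1"]))        # Site A is gone
```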
Solution #2: Stretch VLANs
• Deploying a VLAN minimizes client reconnection times
– IP of the VM never changes
[Diagram: a VLAN stretched across Site A and Site B; DNS Server 1 and DNS Server 2 hold FS = 10.10.10.111 and the address 10.10.10.111 is unchanged after failover]
Solution #3: Abstraction in Networking Device
• Networking device uses independent 3rd IP Address
• 3rd IP Address is registered in DNS & used by client
[Diagram: DNS Server 1 and DNS Server 2 hold VM = 30.30.30.30; clients connect to 30.30.30.30 on the networking device, which fronts 10.10.10.111 at Site A and 20.20.20.222 at Site B]
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Storage in Multi-Site Clusters
Different than local clusters:
– Multiple storage arrays, independent per site
– Nodes commonly access own site storage
– No ‘true’ shared disk visible to all nodes
[Diagram: SAN storage at Site A and Site B]
Storage Considerations
• DR requires data replication mechanism between sites
• Changes are made on Site A and replicated to Site B
[Diagram: SAN at Site A replicating to a replica at Site B]
Replication Partners
• Hardware storage-based replication
– Block-level replication
• Software host-based replication
– File-level replication
• Appliance replication
– File-level replication
Synchronous Replication
Host receives “write complete” response from the storage after the data is successfully written on both storage devices
[Diagram: Write Request to Primary Storage (Site A), Replication to Secondary Storage (Site B), Acknowledgement back to Primary Storage, then Write Complete returned to the host]
Asynchronous Replication
• Host receives “write complete” response from the storage after the data is successfully written to just the primary storage device; replication to the secondary storage device happens afterwards
[Diagram: Write Request to Primary Storage (Site A), Write Complete returned to the host, then Replication to Secondary Storage (Site B)]
Synchronous versus Asynchronous
Synchronous | Asynchronous
No data loss | Potential data loss on hard failures
Requires high bandwidth/low latency connection | Enough bandwidth to keep up with data replication
Stretches over shorter distances | Stretches over longer distances
Write latencies impact application performance | No significant impact on application performance
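The data-loss difference between the two modes can be shown with a toy model. The sketch below is not a storage implementation: the class `ReplicatedVolume` and its methods are hypothetical, and it only tracks which writes have reached each site at the moment "write complete" is returned.

```python
from typing import List


class ReplicatedVolume:
    """Toy model of a primary/secondary storage pair (illustrative only)."""

    def __init__(self, synchronous: bool) -> None:
        self.synchronous = synchronous
        self.primary: List[str] = []
        self.secondary: List[str] = []
        self.pending: List[str] = []  # acknowledged to the host but not yet replicated

    def write(self, block: str) -> str:
        self.primary.append(block)
        if self.synchronous:
            # Synchronous: the secondary has the data before the host is told "complete".
            self.secondary.append(block)
        else:
            # Asynchronous: acknowledge now, replicate later.
            self.pending.append(block)
        return "write complete"

    def replicate(self) -> None:
        """Ship any pending writes to the secondary site."""
        self.secondary.extend(self.pending)
        self.pending.clear()

    def lost_if_primary_fails(self) -> List[str]:
        """Writes the host believes are safe but that exist only at the primary."""
        return list(self.pending)


sync_volume = ReplicatedVolume(synchronous=True)
sync_volume.write("a")
sync_volume.write("b")

async_volume = ReplicatedVolume(synchronous=False)
async_volume.write("a")
async_volume.replicate()
async_volume.write("b")  # the primary fails before the next replication cycle

print(sync_volume.lost_if_primary_fails())
print(async_volume.lost_if_primary_fails())
```

The model leaves out latency entirely; in practice the synchronous path pays for its guarantee with the round trip to the secondary site on every write.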
Cluster Validation with Replicated Storage
Multi-Site clusters are not required to pass the Storage tests to be supported
Validation Guide and Policy: http://go.microsoft.com/fwlink/?LinkID=119949
Challenges of Block Storage Replication
• Storage block-level replication is typically uni-directional (per LUN)
– Change blocks flow from source to remote
– Possible to have different LUNs replicating in different directions
– Storage cannot enforce block-level collision resolution
– Application must determine resolution, or be coordinated in some way
• Applications today implement a shared-nothing model
– Surfacing storage as R/W at multiple sites is only useful if applications can handle a distributed access device
– Few applications implement the necessary support
– Obvious exception is Cluster Shared Volumes for Hyper-V
Multi-Site Clustering
Introduction
Networking
Storage
Quorum
Quorum Overview
Majority is greater than 50%
Possible Voters: Nodes (1 each) + 1 Witness (Disk or File Share)
4 Quorum Types:
• Disk only (not recommended)
• Node and Disk majority
• Node majority
• Node and File Share majority
[Diagram: five voters, one vote each]
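The "greater than 50%" rule reduces to a little integer arithmetic. The helper names `votes_required` and `has_quorum` below are hypothetical and used only to illustrate the rule.

```python
def votes_required(total_votes: int) -> int:
    """Smallest number of votes that is strictly more than half of the total."""
    return total_votes // 2 + 1


def has_quorum(reachable_votes: int, total_votes: int) -> bool:
    """True if the reachable voters form a majority."""
    return reachable_votes >= votes_required(total_votes)


print(votes_required(5))  # five voters
print(votes_required(4))  # an even split of four voters leaves neither half with a majority
print(has_quorum(2, 4))
```

The four-voter case shows why an even number of voters is usually paired with a witness: a 2/2 split leaves neither side with quorum.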
Replicated Disk Witness
• A witness is a tie breaker when nodes lose network connectivity
– The witness disk must be a single decision maker, or problems can occur
• Do not use a Disk Witness in multi-site clusters unless directed by vendor
[Diagram: three node votes plus a disk witness on replicated storage whose vote is in question (?)]
Node Majority
Cross-site network connectivity broken!
5 Node Cluster: Majority = 3 (majority in primary site)
Each node asks: can I communicate with a majority of the nodes in the cluster?
– Yes, then stay up
– No, drop out of cluster membership
[Diagram: Site A and Site B with the link between them broken]
Node Majority
Disaster at Site A
5 Node Cluster: Majority = 3 (majority in primary site)
• Site A: We are down!
• Site B: Can I communicate with a majority of the nodes in the cluster?
– No, drop out of cluster membership
• Need to force quorum manually
Forcing Quorum
Forcing quorum is a way to manually override and start a node even if the cluster does not have quorum
– Important: understand why quorum was lost
– Cluster starts in a special “forced” state
– Once majority achieved, drops out of “forced” state
Command Line:
• net start clussvc /fixquorum (or /fq)
PowerShell (R2):
• Start-ClusterNode -FixQuorum (or -fq)
Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of any 1 site
[Diagram: Site A and Site B connected over the WAN to a File Share Witness (\\Foo\Share) at Site C (branch office)]
Multi-Site with File Share Witness
Complete resiliency and automatic recovery from the loss of connection between sites
Each site asks: can I communicate with a majority of the nodes (+FSW) in the cluster?
– Yes, then stay up
– No (lock failed), drop out of cluster membership
[Diagram: Site A and Site B, with the link between them broken, each connected over the WAN to the File Share Witness (\\Foo\Share) at Site C (branch office)]
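The tie-break role of the file share witness can be sketched for a cluster split evenly across two sites. The function `partition_stays_up` is hypothetical; it assumes one vote per node plus one vote for the witness, and that only one partition can hold the witness lock at a time.

```python
def partition_stays_up(nodes_in_partition: int,
                       total_nodes: int,
                       holds_witness_lock: bool) -> bool:
    """True if this partition's votes (nodes plus witness, if locked) exceed half of all votes."""
    total_votes = total_nodes + 1  # every node plus the file share witness
    votes = nodes_in_partition + (1 if holds_witness_lock else 0)
    return 2 * votes > total_votes


# Assumed layout: four nodes, two per site, and the inter-site link is lost.
print(partition_stays_up(2, 4, holds_witness_lock=True))   # the site that locked the witness
print(partition_stays_up(2, 4, holds_witness_lock=False))  # the site whose lock attempt failed
```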
File Share Witness (FSW) Considerations
• Simple Windows File Server
• Single file server can serve as a witness for multiple clusters
– Each cluster requires its own share
– Can be made highly available on a separate cluster
• Recommended to be at 3rd separate site for DR
• FSW cannot be on a node in the same cluster
• FSW should not be in a VM running on the same cluster
Quorum Model Recap
Node and File Share Majority
• Even number of nodes
• Highest availability solution has FSW in 3rd site
Node Majority
• Odd number of nodes
• More nodes in primary site
Node and Disk Majority
• Use as directed by vendor
No Majority: Disk Only
• Not Recommended
• Use as directed by vendor
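The recap's even/odd guidance can be written down as a small decision helper. This sketch covers only the two models the recap recommends without vendor direction; the function name `recommend_quorum_model` is hypothetical.

```python
def recommend_quorum_model(node_count: int) -> str:
    """Map a node count to the quorum model suggested in the recap (simplified)."""
    if node_count < 1:
        raise ValueError("a cluster needs at least one node")
    if node_count % 2 == 0:
        # Even node count: add a file share witness, ideally in a 3rd site.
        return "Node and File Share Majority"
    # Odd node count: nodes alone can form a majority; keep more nodes in the primary site.
    return "Node Majority"


print(recommend_quorum_model(4))
print(recommend_quorum_model(5))
```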
Session Summary
• Multi-site Failover Clusters have many benefits
• You can achieve high-availability and disaster recovery in a single solution using Windows Server Failover Clustering
• Multi-site clusters have additional considerations:
– Determine network topology across sites
– Choose a storage replication solution
– Plan quorum model & nodes
Failover Clustering Resources
• Design for a Clustered Service or Application in a Multi-Site Failover Cluster http://technet.microsoft.com/en-us/library/dd197430(WS.10).aspx
• Checklist: Setting Up a Clustered Service or Application in a Multi-Site Failover Cluster http://technet.microsoft.com/en-us/library/dd197546(WS.10).aspx
• Cluster Information Portal: http://www.microsoft.com/windowsserver2008/en/us/clustering-home.aspx
• Clustering Technical Resources: http://www.microsoft.com/windowsserver2008/en/us/clustering-resources.aspx
• Clustering Forum (2008): http://forums.technet.microsoft.com/en-US/winserverClustering/threads/
• Clustering Forum (2008 R2): http://social.technet.microsoft.com/Forums/en-US/windowsserver2008r2highavailability/threads/
• R2 Cluster Features: http://technet.microsoft.com/en-us/library/dd443539.aspx
Resources
Software Application Developers: http://msdn.microsoft.com/
Infrastructure Professionals: http://technet.microsoft.com/
msdnindia technetindia @msdnindia @technetindia
© 2011 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries.
The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and
Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.