matt hollingsworth principal program manager microsoft corporation dat303
TRANSCRIPT
Building a High Availability Strategy for Your Enterprise Using Microsoft SQL Server 2008
Matt HollingsworthPrincipal Program ManagerMicrosoft CorporationDAT303
Agenda
Introduction to High Availability and Disaster Recovery
SQL Server Always On Technologies
Developing Your Availability Solution
Conclusion
Introduction to High Availability and Disaster Recovery
DefinitionsIntroduce key terms and concepts
Business Continuity PlanningOverview of the BCP process
SQL Server High Availability PlanningHow does BCP apply to SQL Server availability?
High Availability and Disaster Recovery: Definition
High AvailabilityHigh availability is a system design protocol and associated implementation that ensures a certain absolute degree of operational continuity during a given measurement period
Availability defined in terms of service level agreements (SLA)
Recovery TimeData loss during unplanned downtime
A highly available application should be accessible by users x% of the time
Disaster RecoveryProcesses and procedures designed to restore business operations due to a natural or human-induced disaster
Typically involves providing redundancy spanning multiple sites or across geographic regions
Defining x and SLARecovery Time Objective (RTO) guided by availability requirements
How much downtime can you tolerate?
Recovery Point Objective (RPO) guided by criticality of application data
How much data can you lose?
AvailabilityClass
Acceptable Downtime (hrs/yr) OR RTO
Acceptable Data Loss (time of last copy) OR RPO
Tier 1 >99.99%(1 hr or less)
5 min or less
Tier 2 99.9% - 99.99% (1- 8.5 hrs)
5 mins to 8.5 hrs
Tier 3 (<99.9%)(Hours to days)
Hours to days
Tier1
RTO
RPO
Tier2
Tier3
Availa
bility
Requirements
Protection LevelsProtection against resource failures
Machine Database CorruptionDisk
Location RedundancyBuilding< 10 miles
Local HA
Regional DR
Geographic DR
Protection against Network Outages Site Failures
Location Redundancy– City, County– < 100-200 miles
Protection against Natural Disasters
Location Redundancy– State, Country– > 100-200 miles
Business Continuity Planning
Analysis
Solution Design
ImplementationTesting
Maintenance
Impact AnalysisCritical FunctionsThreat IdentificationRecovery Objectives
Solution DesignAchieve recovery objectives for relevant threats within specified constraints like budget, human resources etcCost\Benefit analysis of solutions
ImplementationDeploy the recommended solution
TestingTest to see if the solution meets the recovery requirements
MaintenanceYearly testing and review of procedures
SQL Server High Availability PlanningAnalysis
Application tiers serviced by the databasesCauses of database downtimeProtection levels: Local HA, Regional DR, Geographic DR
Solution DesignNeed to understand what solutions exists?What are the characteristics and cost of the solution?
ImplementationWhat are the deployment steps and best practices?
TestingHow do I test my implementation?
MaintenanceHow do I monitor and maintain the solution?
Analysis
Solution Design
ImplementationTesting
Maintenance
Database Downtime Drivers
Database Downtime
Unplanned Downtime
FailureProtection
User Errors
Planned Downtime
Online Administration
Predictable Resourcing
Analysis
Solution Design
Solution Architecture
HA Capabilities
Limitations and Caveats
Cost Vector
Understand the solutions and choices before making a decision
Solution Design
SQL Server Always On Technologies
Solution Design
Always On TechnologiesProvides a full range of options to minimize downtime and maintain appropriate levels of application availability
Solution Design
Increases Availability
• Backup and Restore• Log Shipping• Database Mirroring• Failover Clustering• Peer-Peer Replication
Decreased Downtime
• Online Index Operations• Table Partitioning• Enhanced Locking• Resource Governor• Database Snapshot• Dedicated Admin Connection• Dynamic Configuration
Always On Technology Overview
Architecture OverviewHow does it work?
Solution CharacteristicsData Loss GuaranteesFailover CharacteristicsRedundancy Levels and UtilizationCostLimitations and Caveats
Solution Design
Increases Availability
• Backup and Restore• Log Shipping• Database Mirroring• Failover Clustering• Peer-Peer Replication
Backup and RestoreBase availability technology for any solution
Protects against failures and recovery from errorsProvides Local HA and Site DR
Need to ensure the backups are accessible if site goes downHigh RTO due to restore timeRPO=0 can never be guaranteed
Types: Full, Differential, and Transaction LogFile-group backup/restore for large databases
Backup Compression provides faster and smaller backups in SQL Server 2008
Solution Design
Log ShippingAutomated transaction log backup and restore provides redundancy at the database level
SQLLogship.exe provides the underlying framework for doing automated backup, copy and restore
Backup on primary instanceRestore on secondary instance(s)
Scheduling is done through SQL Server Agent jobs
SQL Server 2008 provides sub-minute scheduling interval providing the ability to do quick backup and restores
No automatic failover capabilities
Solution Design
Database MirroringA database level high availability solution that provides complete protection against data loss and fast recovery through automatic failover
Maintains a redundant database by shipping log blocks when the transactions are committed on the principal
Synchronous and Asynchronous modes provide the spectrum of options to choose between availability and performance
Automatic failover when using witness server
Solution Design
Failover Clustering
Instance level protection built on Windows Failover Clustering shared disk model
Cluster nodes typically co-located within the same site to provide local HARegional DR possible using VLAN and stretch storage level replication
No built in data redundancy like database mirroring and log shipping
Data protection has to be provided at the storage level or by combining with other solutions
Solution Design
Transactional ReplicationA high performance data replication solution that provides granular table level replication
Logical data movement provides flexibility and better hardware utilization
Key scenarios:Customized application-specific DRReal-time reporting on secondary server that be used for Site DRScale out application queries with ability to use any one database copy for Site DR
Two types relevant for HA and DRTransactional and Peer-to-Peer
Solution Design
Always On Solution Characteristics
No Data Loss(RPO=0)
Failover Unit AutoFailover(RTO)
Inst DB Tab
+ **
Read Mult-iple
Write
*
*
*
Solutions
Log Shipping
DBM Sync
Async
Cluster
Transactional Replication
Peer-PeerReplication
RPO FailoverRedundancy and
Utilization
Hard-ware
App PerfImpact
Manag-eability
Low Low Low
Low High Low
Low Low Low
High*** Low *** Low***
Low Low High
Low Low High
Cost
Solution Design
* Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively** Database Mirroring provides fastest failover to hot secondary*** Depends on SAN technology
What’s New in SQL Server 2008New Features
Resource GovernorManage SQL Server workloads and resources by specifying limits on resource consumption
Backup CompressionReduce backup and restore time
Feature EnhancementsDatabase Mirroring
Automatic recovery from page corruptionLog stream compressionFaster recovery on failover
Log ShippingSub-Minute Log ShippingBackup compression
Failover Clustering16 nodesRolling upgrade
Peer-Peer ReplicationHot add new nodes
Backup CompressionCommon questions:
“How much compression will I see?”“Will it be comparable to, say, SQL Litespeed?”
One simple answer: “It depends!”
All data compresses differently – the compression ratio achieved depends on:
The type of data in the databaseWhether the data in the database is already compressedWhether the data/database is encrypted
“We saw an 85 percent reduction in file size using SQL Server 2008 Backup Compression,” says Colin Neller, Senior Software Engineer at ServiceU and part of the company’s SQL Server 2008 implementation team. “A backup file that was previously over 300 GB is now only 40 GB, and the job runs in about half the time.”
Backup Compression: Backup Performance
Backup of a 322 MB Adventureworks database
A LOT more CPU used (avg 25%) BUT runtime = 21.6s (45% improvement) and backup stored in 76.7MB (4.2x compression ratio)
Hardly any CPU used (avg 5%), runtime = 39.5s, compression ratio of 0.
Uncom
pressed
Compressed
DEMO: Increasing Availability Using Always On Technologies
demo
Developing Your Availability Solution using SQL Server Always On Technologies
Solution Design
RecapApplication availability requirements or SLA drive primary solution choices
RPO and RTO are the key metrics used to define the SLA
Need mitigation against planned and unplanned downtimes
Multiple solution choices that provides varying cost\benefits
Other requirements apart from application SLA factor into the choice
Understand constraints and tradeoffs you can make
Database Mirroring
Clustering
Log Shipping
Peer-PeerReplication
Application Availability
Unplanned downtime Planned Downtime
Solution Design
Always On Solution Characteristics
No Data Loss(RPO=0)
Failover Unit AutoFailover(RTO)
Inst DB Tab
+ **
Read Mult-iple
Write
*
*
*
Solutions
Log Shipping
DBM Sync
Async
Cluster
Transactional Replication
Peer-PeerReplication
RPO FailoverRedundancy and
Utilization
Hard-ware
App PerfImpact
Manag-eability
Low Low Low
Low High Low
Low Low Low
High*** Low *** Low***
Low Low High
Low Low High
Cost
* Database Mirroring and Log Shipping can provide point in time read capability using STANDBY or database snapshots respectively** Database Mirroring provides fastest failover to hot secondary*** Depends on SAN technology
Solution Design
AdventureWorks Inc Scenario Adventureworks Inc is a
manufacturing company that manufactures and sells bicycles across the world. There are a number of applications, some that are mission critical that run on multiple SQL Server Instances
The DBA team is run by Darren who is responsible for deploying and managing the application databases. One of his core responsibilities is to ensure availability of all application databases in order to meet the application SLA
One datacenter located in OmahaThree applications
Manufacturing – Tier 1Finance – Tier 2 Scheduling – Tier 3
Manufacturing application runs on a dedicated SQL Server 2008 Instance
All other applications run on a second instance
Availability of manufacturing application is criticalImplement a solution at the lowest possible cost
Solution Design
Application Requirements
Manufacturing application has strict SLA’sFinance application requires readability on the secondary
The reports are run every 4 hours and need to be fresh as of the last one hour. To offload the reporting load from the main system they would like to utilize the mirror
Data LossRPO=0
RTO in secs
Failover Unit AutoFailover
Inst DB Tab
Read Multiple Sites
ReadWrite
Applications
Manufacturing
Finance
Scheduling
Solution Design
Solution Choice for Manufacturing Application
Clustering can provide a zero data loss solution that can also provide fast instance level failoverUse RAID configuration to provide data redundancy on the SANIf a redundant copy is required that can provide instance failover with zero data loss use SAN replication
High Cost Solution
Use synchronous database mirroring if instance failover is not needed
Solutions Data LossRPO=0
Fast RTO
Failover Unit AutoFailover
Read > 1Sites\ Copy
Read WriteInst DB Tab
Cluster
SAN Replication
DBM - Sync
Log Shipping
Transactional Replication
Peer-PeerReplication
Clustering with RAID
DBM - Async
Solution Design
For database level redundancy with acceptable data loss with minimal perf impact, asynchronous database mirroring is an optimal choice
Use database snapshots at periodic intervals to provide a readable snapshot of the data for reporting
Low cost solution
DBM - Async
Cluster
Solution Choice for Finance Application
Solutions Data LossRPO=0
Fast RTO
Failover Unit AutoFailover
Read > 1Sites\ Copy
Read WriteInst DB Tab
SAN Replication
Log Shipping
Transactional Replication
Peer-PeerReplication
Async Database Mirroring
Omaha Datacenter
Finance
Db Snapshotevery hour
Reports
Scheduling
DBM - Sync
Solution Design
Adding a Regional Datacenter Into the Mix
Regulatory and compliance requirements drive the need for having a additional datacenter within a 10 mile radius to provide redundancy against site level failure.
It is now required that all applications have the ability to failover to the regional datacenter across the river in Council Bluff
The SLA need to be maintained for tier 1 applications even in the case of site failures
Solution Design
Regional Site Solution Choices
Async Database Mirroring
Omaha Datacenter
Finance
Db Snapshotevery hour
Reports
Scheduling
Sync Mirroringno witness
Log Shipping
CB Datacenter
Cluster with SAN
Manufacturing
Solution Design
A Complete Topology
Considering the potential of floods and tornadoes destroying the regional data centers, Adventureworks Inc wants to maintain a disaster recovery site in San Antonio, TX
The disaster recovery site has lower SLA requirements for all applications
The manufacturing application can have an RPO of 1 hourThe RTO is set at 4 hours
Solution Design
Topology Diagram
Sync MirroringNo witness
Cluster with SAN
Log Shipping
Manufacturing
Solution Design
Scale Out and Availability Scenario
Adventureworks is building a new web based order management system that allows customers from all over the world access the system and place ordersThe core group of customers are in Western Europe, South East Asia and North America
Requirements– Geo Redundancy – Data Locality– High Availability– Local Read-Scale
Workload Characteristics– Mainly reads– Few writes
Application Characteristics– Each user logging in connects to a
particular server Partitioned based on user-id and region Writes from a user always happen on one
server regardless of the region the user log in from
– All reads redirected to the closest geo-location
Reasonable tolerance for latency (5-10 minutes)
Solution Design
Replication Topology
Peer Nodes
Read-Only Servers
Asia1 Asia2
Solution Design
Implementing and maintaining a HA solutiondemo
Licensing Facts
Passive servers are mirror, log shipped secondary and clustering passive nodeNo license required on passive if it is truly passiveA passive server does not need a license if the number of processors in the passive server is equal to or less than the number of processors in the active server. The passive server can take the duties of the active server for 30 days. Afterwards, it must be licensed accordingly.
HA Features Edition SupportFeature Express Workgroup Standard Enterprise Comments
Database Mirroring 1
Advanced high availability solution that includes fast failover and automatic client redirection
Failover Clustering 2
Backup Log-shipping Data backup and recovery solution
Online System Changes
Includes Hot Add Memory, dedicated administrative connection, and other online operations
Online Indexing
Online Restore
Fast RecoveryDatabase available when undo operations begin
₁Single thread redo₂ Limited to 2 node cluster
SummaryThere is no “one size fits all” solution
Consider the cost\benefits\constraints and compare that to availability requirements of the organization to determine the best solution
Use the charts to understand cost, benefit and constraints of the various SQL Server High Availability solutions
TEST the solution to ensure it can meet the availability requirements and meet SLA’s
question & answer