VMworld 2013: Virtualizing Highly Available SQL Servers

1. Virtualizing Highly Available SQL Servers Scott Salyer, VMware VAPP5932 #VAPP5932

2. Agenda
- Why Virtualize
- Causes of Downtime and Planning a Strategy
- Scenario 1: Baseline High Availability
- Scenario 2: AlwaysOn Availability Groups
- Scenario 3: SQL Server Failover Clustering
- Scenario 4: Rolling Upgrades
- Disaster Recovery and Backup
- Summary

3. Setting Expectations
- This is NOT a Best Practices session.
- This session covers availability and recovery for SQL Server database VMs.
- This session does NOT cover performance, sizing, scaling, or consolidation. For more information on these topics, please attend VAPP1006-GD SQL/MS Apps with Jeff Szastak (a group discussion).

4. Summary
Complete Flexibility. Non-Stop Reliability.
- Quality of Service (QoS)
  - Guaranteed performance SLAs through resource controls, dynamic load balancing, capacity & performance management
  - Simplified security SLAs with app protection
- Availability
  - Protection against app failures through high availability and fault tolerance
  - Simplified business continuity with automated disaster recovery & backup
- Time to Market (TTM)
  - Reduced app provisioning times to minutes through use of templates & intelligent policy management
  - Dynamic scaling of apps through scale-up/scale-out capacity on demand

5. Causes of Downtime
- Planned Downtime
  - Software upgrade (OS patches, SQL Server cumulative updates)
  - Hardware/BIOS upgrade
- Unplanned Downtime
  - Datacenter failure (natural disasters, fire)
  - Server failure (failed CPU, bad network card)
  - I/O subsystem failure (disk failure, controller failure)
  - Software/data corruption (application bugs, OS binary corruptions)
  - User error (shut down a SQL service, dropped a table)

6.
SQL Server Native Availability Features
- Failover Clustering
  - Local server redundancy
  - Instance-level failover
  - Zero data loss
- Database Mirroring
  - Local server and storage redundancy
  - Disaster recovery
  - Database-level failover
  - Zero data loss with high safety mode
- Log Shipping
  - Multiple disaster recovery sites for databases
  - Manual failover required
  - App/user error recovery
- AlwaysOn (new in SQL Server 2012)
  - AlwaysOn Failover Cluster Instance with shared disk architecture, native support for multi-site cluster
  - AlwaysOn Availability Group with non-shared disk architecture, support for multiple secondaries, readable secondary

7. Planning a High Availability Strategy
- Requirements
  - Recovery Time Objective (RTO): What does 99.99% availability really mean?
  - Recovery Point Objective (RPO): Zero data lost?
  - HA vs. DR requirements
- Evaluating a technology
  - What's the cost of implementing the technology?
  - What's the complexity of implementing and managing the technology?
  - What's the downtime potential?
  - What's the data loss exposure?

Availability %              Downtime / Year   Downtime / Month*   Downtime / Week
"Two Nines" - 99%           3.65 Days         7.2 Hours           1.68 Hours
"Three Nines" - 99.9%       8.76 Hours        43.2 Minutes        10.1 Minutes
"Four Nines" - 99.99%       52.56 Minutes     4.32 Minutes        1.01 Minutes
"Five Nines" - 99.999%      5.26 Minutes      25.9 Seconds        6.05 Seconds
* Using a 30-day month

8. High Availability Options
[Chart: hardware failure tolerance (Unprotected, Automated Restart, Continuous) against application coverage (0%, 10%, 100%), positioning VMware HA, VMware FT, vMotion (planned downtime), DB Mirroring / RAC / AAG, and Microsoft Clustering / Data Guard / AAG]
- Clustering is too complex and expensive for most applications
- VMware HA and FT provide simple, cost-effective availability
- vMotion provides continuous availability against planned downtime

9. Scenario 1: Baseline High Availability
Moving beyond physical limitations

10. VMware Availability Features

11.
VMware vSphere High Availability (HA)
- Protection against host or operating system failure
- Automatic restart of virtual machines on any available host in the cluster
- Provides a simple and reliable first line of defense for all databases
- Minutes to restart
- OS and application independent; does not require complex configuration or expensive licenses

12. VM Mobility
- Server Maintenance
  - VMware vSphere vMotion and VMware vSphere Distributed Resource Scheduler (DRS) Maintenance Mode
  - Migrate running VMs to other servers in the pool
  - Automatically distribute workloads for optimal performance
- Storage Maintenance
  - VMware vSphere Storage vMotion
  - Migrate VM disks to other storage targets without disruption
- Key Benefits
  - Eliminate downtime for common maintenance
  - No application or end-user impact
  - Freedom to perform maintenance whenever desired

13. App-Aware HA Through Health Monitoring APIs
Leverage third-party solutions that integrate with VMware HA (for example, Symantec ApplicationHA).
- (1) Database Health Monitoring
  - Detect database service failures inside the VM
- (2) Database Service Restart Inside VM
  - App start / stop / restart inside the VM
  - Automatic restart when an app problem is detected
- (3) Integration with VMware HA
  - VMware HA is automatically initiated when:
    - App restart fails inside the VM
    - Heartbeat from the VM fails

14. Standalone SQL Server VM with VMware HA, DRS, & vMotion
- Highlights:
  - Quickly restore service after host failure
  - Simple to configure and easy to manage
  - Can use Standard Windows and SQL Server editions
- Note:
  - Protection against hardware failures only
  - Does not provide application-level protection

15. Scenario 2: AlwaysOn High Availability
What happens when a node fails?

16. What are SQL Server AlwaysOn Availability Groups?
- Database-level replication over IP, no shared storage requirement
- Same advantages as failover clustering (service availability, patching, etc.)
- Two copies of the data, protection from data corruption
- Readable secondary
- Automatic or manual failover through WSFC policies

17. Scenario 2: Improving on AlwaysOn High Availability
- Technology Chosen
  - AlwaysOn AG for HW and SW protection
  - VMware HA & vMotion for added protection
  - SRM for DR, SRM integration to restore AG on remote site
- Benefits
  - Quickly restart a failed AAG node to bring the cluster back to full capabilities
  - Migrate nodes off physical hardware (hosts or storage) without downtime or impact
  - Automate disaster recovery at the remote site with SRM

18. vSphere HA with AlwaysOn Availability Group (AG)
- Protection against HW/SW failures and DB corruption
- Storage flexibility (FC, iSCSI, NFS)
- Compatible with vMotion, DRS, HA
- RTO in a few seconds
- vSphere HA + AlwaysOn AG
  - Seamless integration: VMs rejoin the AG after vSphere HA recovery
  - Can shorten the time that the database is in an unprotected state
  - Reduces synchronization time after VM recovery

19. Demo: Deploying AlwaysOn Availability Group on vSphere

20. Deploying AlwaysOn Availability Group on vSphere
- Step 1: vSphere platform setup
  - Ensure disk is created as Thick Eager Zeroed
  - Create a DRS anti-affinity rule to avoid running the VMs on the same host
- Step 2: Create WSFC
  - Install the Failover Clustering feature
  - Create a cluster for the Availability Group
  - Add SQL Server VMs as cluster nodes
  - Configure quorum policy to use Node and File Share Majority
- Step 3: Enable SQL Server for AlwaysOn
  - Configure the SQL Server service to enable AlwaysOn Availability Groups on each SQL Server instance
  - Restart the SQL service

21. Deploying AlwaysOn Availability Group on vSphere (Continued)
- Step 4: Create AG for AdventureWorks2012 database
  - Prerequisite: Set database to use the full recovery model
  - Prerequisite: Take a full backup of the database
  - Create a 2-node AG with synchronous commit, automatic failover
  - Create a Database Listener for the AG
- Step 5: Monitor AG from Dashboard
  - Dashboard shows the health state of the AG and the status of each replica

22.
Scenario 3: SQL Server Failover Clustering (Shared Disk)

23. What is Microsoft Failover Clustering?
- Provides application high availability through a shared-disk architecture
- One copy of the data; relies on storage technology to provide data redundancy
- Automatic failover for any application or user
- Suffers from restrictions in storage and VMware configuration

24. vSphere HA with Failover Clustering
- Highlights:
  - RTO in a few seconds
  - Protection against HW/SW failures but not DB corruption
  - Legacy application support (those not mirror-aware)
- Note:
  - DRS and vMotion not available (only cold migration)
  - No protection from data corruption or storage failures
  - Storage must be FC
  - Must use RDMs

25. VMware Support For Microsoft Clustering On vSphere

Column groups: "Storage Protocols support" covers FC through FCoE; "Shared Disk" covers RDM and VMFS. Parenthesized digits are the slide's footnote markers.

Microsoft Clustering on VMware | vSphere support | VMware HA support | vMotion DRS support | Storage vMotion support | MSCS Node Limits | FC | In-Guest OS iSCSI | Native iSCSI | In-Guest OS SMB | FCoE | RDM | VMFS
Shared Disk
MSCS with Shared Disk | Yes | Yes(1) | No | No | 2, 5 (5.1 only) | Yes | Yes | No | Yes(5) | Yes(4) | Yes(2) | Yes(3)
Exchange Single Copy Cluster | Yes | Yes(1) | No | No | 2, 5 (5.1 only) | Yes | Yes | No | Yes(5) | Yes(4) | Yes(2) | Yes(3)
SQL Clustering | Yes | Yes(1) | No | No | 2, 5 (5.1 only) | Yes | Yes | No | Yes(5) | Yes(4) | Yes(2) | Yes(3)
SQL AlwaysOn Failover Cluster Instance | Yes | Yes(1) | No | No | 2, 5 (5.1 only) | Yes | Yes | No | Yes(5) | Yes(4) | Yes(2) | Yes(3)
Non-Shared Disk
Network Load Balance | Yes | Yes(1) | Yes | Yes | Same as OS/app | Yes | Yes | Yes | N/A | Yes | N/A | N/A
Exchange CCR | Yes | Yes(1) | Yes | Yes | Same as OS/app | Yes | Yes | Yes | N/A | Yes | N/A | N/A
Exchange DAG | Yes | Yes(1) | Yes | Yes | Same as OS/app | Yes | Yes | Yes | N/A | Yes | N/A | N/A
SQL AlwaysOn Availability Group | Yes | Yes(1) | Yes | Yes | Same as OS/app | Yes | Yes | Yes | N/A | Yes | N/A | N/A

Shared Disk Configurations: Supported on vSphere with additional considerations for storage protocols and disk configs
Non-Shared Disk Configurations: Supported on vSphere just like on physical
* Use affinity/anti-affinity rules when using vSphere HA
** RDMs required in Cluster-across-Box (CAB) configurations,
VMFS required in Cluster-in-Box (CIB) configurations
VMware Knowledge Base Article: http://kb.vmware.com/kb/1037959

26. Scenario 4: Rolling Upgrades
Patching without clusters

27. Patching Non-clustered Databases
- Benefits
  - No need to deploy an MS cluster simply for patching / upgrading the OS and database
  - Ability to test in a controlled manner (multiple times if needed)
  - Minimal impact to production site until OS patching is completed and tested
  - Patching of the secondary VM can occur during regular business hours
- Requires you to lay out VMDKs correctly to support this scenario

28. Scripted MS SQL Server Rolling Patch Upgrades
- VMware PowerCLI and PowerShell provide a reproducible result
- What about an audit trail / log of execution?
- Which roles participate in managing the upgrade, and how?

29. Use vCenter Orchestrator and vCloud Automation Center to Enhance Rolling Patch Upgrades
- Automation Execution and Status
  - Workflows provide a powerful means for process flow and control
  - Creates a standard definition of infrastructure processes
  - Execution status available in real time
  - Integrates with Scripting and Systems
  - Managed PowerShell execution
- Self Service
  - Self Service Portal
  - Initiated by assigned user
  - Roles
  - Delegated Approvals

30. Demo: Automated Rolling Patch Upgrade using Standby VM

31. Rolling Patch Upgrade Using Standby VM
- Step 1: Configure Standby VM
  - Create VM using SQL Server Sysprep or using an OS-only clone + SQL install
  - Apply any server-level configuration changes
  - Patch Standby VM to the target service pack level
  - Start client app (for demo purposes only)
- Step 2: Remove Primary VM from public network
  - Disconnect public NIC
  - Observe: the client experiences a temporary connection loss and loops trying to reconnect
- Step 3: Hot remove resource from Primary VM
  - Detach database from SQL Server instance using a script
  - Take disk offline
  - Hot remove VMDK from VM

32.
Rolling Patch Upgrade Using Standby VM (Continued)
- Step 4: Hot add resource to Standby VM
  - Hot add VMDK to Standby VM
  - Bring disk online
  - Attach database to SQL Server instance
- Step 5: Perform final role switch
  - Configure Standby VM to take the IP address of the Primary VM public NIC. Standby is now the new primary.
  - Observe: the client is automatically reconnected to the new primary with the updated service pack
  - The old Primary VM can be taken down to apply the service pack
- See blog post: http://blogs.vmware.com/apps/2011/11/sql-server-rolling-patch-upgrade-using-standby-vm.html

33. Disaster Recovery and Backup

34. VMware vCenter Site Recovery Manager (SRM)
- Relies on storage or vSphere host replication
- Allows creation, maintenance, and execution of an automated process to facilitate site recovery
- Safe testing without impacting the production environment
- Self-documenting

35. VMware vCenter SRM with SQL Server AAG
- AAG provides local availability
- Storage replication keeps the DR facility in sync
- During a site failure, the admin has full control of recovery
- After the workflow is initiated, SRM automates the recovery process
- The entire process can be tested without actually failing over services!

36. In-guest SQL Server-Aware Backup Solution
- Standard method for physical or virtual
- Agent runs in the VM guest and handles database quiescing
- Data is sent over the IP network
- Can affect CPU utilization in the guest OS

37. Array-based Backup
- Backup vendor software coordinates with VSS to create a supported backup image of the SQL Server databases
- Snapshotted databases can later be streamed to tape as flat files with no I/O impact to the production SQL Server

38.
Putting It All Together
- VMware
  - Planned downtime avoidance
    - vMotion & Storage vMotion
    - Rolling SQL Server upgrades with vCO / vCAC
  - Unplanned downtime recovery
    - vSphere HA + AppAware HA
    - vSphere FT
  - Disaster recovery
    - Site Recovery Manager
- SQL Server 2012
  - AlwaysOn Availability Groups
- Pre-SQL Server 2012
  - Failover Clustering
  - Database Mirroring
  - Log Shipping
  - Replication

39. Summary
Complete Flexibility. Non-Stop Reliability.
- Quality of Service (QoS)
  - Guaranteed performance SLAs through resource controls, dynamic load balancing, capacity & performance management
  - Simplified security SLAs with app protection
- Availability
  - Protection against app failures through high availability and fault tolerance
  - Simplified business continuity with automated disaster recovery & backup
- Time to Market (TTM)
  - Reduced app provisioning times to minutes through use of templates & intelligent policy management
  - Dynamic scaling of apps through scale-up/scale-out capacity on demand

40. Resources
- Visit us on the web to learn more on specific apps: http://www.vmware.com/solutions/business-critical-apps/
- Visit our Business Critical Application blog: http://blogs.vmware.com/apps/
- Please attend our sessions listed below for more detailed information on virtualizing and managing Tier 1 Apps on VMware!
  - VAPP5473 Automated Management of Tier-1 Applications on VMware
  - VAPP5613 Successfully Virtualize Microsoft Exchange Server
  - VAPP5932 Virtualizing Highly Available SQL Servers
  - VAPP6124 Automating VMware Cloud and Virtualization Deployments with Dell Active Infrastructure
  - VAPP5618 Virtualize Active Directory, the Right Way!
  - VAPP4906 Architecting Oracle Databases on vSphere 5 with NetApp Storage
  - VAPP5834 Virtualizing Mission Critical Oracle RAC with vSphere and vCOPS
  - BCO4905 Disaster Recovery Solution with Oracle Data Guard and Site Recovery Manager
  - VAPP4813 Real-world Design Examples for Virtualized SAP Environments
  - VCM4891 Performance Management of Business Critical Applications using vCenter Operations Management

41. Questions?

42. Other VMware Activities Related to This Session
- HOL: HOL-SDC-1304 and HOL-SDC-1317
  - vSphere Performance Optimization
  - vCloud Suite Use Cases - Business Critical Applications
- Group Discussions:
  - VAPP1006-GD SQL/MS Apps with Jeff Szastak

43. THANK YOU

44. Virtualizing Highly Available SQL Servers Scott Salyer, VMware VAPP5932 #VAPP5932
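
Worked example for slide 7 (Planning a High Availability Strategy): the yearly and monthly figures in the availability table follow from multiplying the unavailable fraction by the length of the period. The Python sketch below is illustrative and not part of the original deck. The function names are made up for this example, and it assumes a 365-day year and the slide's 30-day month.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def allowed_downtime_seconds(availability_percent: float, period_days: float) -> float:
    """Return the downtime budget, in seconds, for one period of the given length."""
    unavailable_fraction = 1.0 - availability_percent / 100.0
    return unavailable_fraction * period_days * SECONDS_PER_DAY


def describe(seconds: float) -> str:
    """Render a duration in the largest unit that keeps the value at or above 1."""
    for unit, size in (("days", 86400), ("hours", 3600), ("minutes", 60)):
        if seconds >= size:
            return f"{seconds / size:.2f} {unit}"
    return f"{seconds:.2f} seconds"


if __name__ == "__main__":
    # Assumed periods: 365-day year, 30-day month (as in the slide's footnote).
    for percent in (99.0, 99.9, 99.99, 99.999):
        per_year = describe(allowed_downtime_seconds(percent, 365))
        per_month = describe(allowed_downtime_seconds(percent, 30))
        print(f"{percent}%: {per_year} per year, {per_month} per month")
```

Running the numbers this way makes the RTO question on the slide concrete: at "four nines", a single restart that takes a few minutes can consume the whole monthly budget.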
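
Worked example for slide 20 (Step 2, quorum policy): with Node and File Share Majority, each cluster node and the file share witness hold one vote, and the cluster stays up while a strict majority of votes is reachable. The Python sketch below is a simplified static vote count, not part of the original deck. The function names are hypothetical, and it deliberately ignores dynamic quorum and per-node vote adjustments that WSFC may apply.

```python
def votes_needed(total_votes: int) -> int:
    """A strict majority: more than half of all configured votes."""
    return total_votes // 2 + 1


def has_quorum(nodes_online: int, total_nodes: int, witness_online: bool) -> bool:
    """Static vote count for a Node and File Share Majority cluster (simplified model)."""
    total_votes = total_nodes + 1  # every node votes, plus the file share witness
    online_votes = nodes_online + (1 if witness_online else 0)
    return online_votes >= votes_needed(total_votes)


if __name__ == "__main__":
    # The demo's two-node Availability Group with a file share witness has 3 votes.
    print(has_quorum(nodes_online=1, total_nodes=2, witness_online=True))
    print(has_quorum(nodes_online=1, total_nodes=2, witness_online=False))
```

Under this model the two-node cluster from the demo survives the loss of one node as long as the witness is reachable, which is why the witness matters for automatic failover in a two-node design.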