configuration manager high availability and disaster recoveryschd.ws/hosted_files/mms2017/03/mms2017...
TRANSCRIPT
CONFIGURATION MANAGER HIGH AVAILABILITY AND DISASTER
RECOVERYBenjamin Reynoldsblogs.technet.microsoft.com/benjamin/Microsoft
Steve Thompsonwww.stevethompsonmvp.wordpress.comSenior ConsultantSoftchoice
@Steve_TSQL
SQL / EMS MVP
Many moons
Blues & BBQ
?
SQL, SSRS, SCCM
15 years
Sunriver, Oregon & Indian
Steve ThompsonBenjamin Reynolds
PRESENTATION Just a placeholder slide. Please use the example slides in the “Example Slides” Section. Questions? [email protected]
AGENDA SQL Server High Availability Options
FCI AlwaysOn Availability Groups AlwaysOn options in Azure Summary AOAG
MS IT - Migrating SCCM SQL Server to Azure Azure Considerations
Setting up an Availability Group on Azure
Efficiently moving the CM DB to an Availability Group
Lessons learned
SQL SERVER HIGH AVAILABILITY OPTIONS
SQL SERVER HIGH AVAILABILITY OPTIONS FCI (Failover Cluster Instance) AlwaysOn Availability Groups
Architecture Synchronous vs Asynchronous
AlwaysOn options in Azure Hybrid
Summary
MISSION-CRITICAL PERFORMANCE
Availability OptionsBasic Availability GroupsWith SQL 2016 Standard Edition
Enhanced Always OnDistributed Availability Groups, automatic replica seeding, distributed transactions, automatic failover, load balancing, manageability
Backup enhancementsManaged backup to Azure, Database Recovery Advisor
ALWAYS ON
Failover on SQL Server instance level Shared storage (SAN/SMB) Failover can take minutes based on load Multi-node clustering Passive secondary nodes
Failover on database level Direct attached storage Failover takes seconds Multiple secondaries Active secondaries
Availability
Failover Cluster Instancesfor servers
Availability Groupsfor groups of databases
WSFC node WSFC node
FAILOVER CLUSTER INSTANCES
Server failover Shared storage Multi-node clustering Passive secondary nodes Failover in minutes
SQL Server 2016
Shared storage
SQL Server 2016SQL Server Failover Cluster Instance
Availability
Availability Group Listener
AVAILABILITY GROUPS Multi-database failover Direct attached storage Multiple secondaries Either synchronous or asynchronous replicas Active secondaries Automatic page repair Planned manual failover Automatic failover without data loss when
using synchronous-commit mode Failover in seconds
SQL Server 2016
SQL Server 2016SQL Server 2016
Availability
MISSION-CRITICAL AVAILABILITY
Provides unified, simplified solution Streamlines deployment,
management, and monitoring
Reuses existing investments Offers SAN/DAS environments
Allows use of HA hardware resources Supports Fast, transparent failover
Detects failures reliably Handles multiple failures at once
Availability
AVAILABILITY GROUPS + FAILOVER CLUSTERING
WSFC needs to be running when Availability Groups are configured
Health status Failover Cluster Instances
and Availability Groups work together to ensure data is accessible despite failures
SQL Server 2016 nowsupports max 2 auto-failover targets
Availability Group Example
Windows Server Failover Clustering (WSFC) ClusterNetwork Subnet Network Subnet
Node NodeNodeNodeNode
SQL ServerInstance
SQL ServerInstance
SQL ServerInstance
AlwaysOn SQL ServerFailover Cluster Instance
Primary Replica Secondary Replica Secondary Replica Secondary Replica
Always On Availability Group
InstanceNetwork Name
WSFCConfiguration
WSFCConfiguration
WSFCConfiguration
WSFCConfiguration
WSFCConfiguration
InstanceNetwork Name
InstanceNetwork Name
InstanceNetwork Name
Availability Group Listener Virtual Network Name
Storage Storage Storage Shared Storage
BASIC AVAILABILITY GROUPS
Available in SQL Server 2016 Standard Edition or higher Provides failover support for single database Replaces database mirroring feature (now deprecated) Single replica for primary database, using either synchronous or asynchronous
commit mode Note: not supported with Configuration Manager
SQL SERVER 2016 – HIGH AVAILABILITYEnterprise Edition Availability with Enhanced AlwaysOnNEW
Standard Edition High Availability NEW
MISSION CRITICAL HA + hybrid DR• Multiple node failover clustering
(3 synchronous, up to 8 replicas) • Multiple database failover as a group• Fully readable secondaries for backup or reporting
operations• No domain join required with WS 2016
Enterprise
BASIC HA• Two node failover (1 active, 1 passive)• Single database failover to non-readable secondary• No domain join required with WS 2016
Standard
Enterprise includes all Standard features.
SQL AO AG Listener
Greater scalabilityLoad-balancing readable secondaries
Increased number of automatic failover targets
Log transport performance
Improved manageabilityDTC support with limitations
Database-level health monitoring
Group Managed Service Account
Domain-independent Availability Groups
AG_Listener
New York(Primary)
Asynchronous data movement
Synchronous data movement
Unified HA solution
Enhancements in Always On Availability Groups
AG
Hong Kong(Secondary)
AG
New Jersey(Secondary)
AG
LOAD BALANCING IN READABLE SECONDARIES In SQL 2014, read-only transactions routed by the Listener went to first available secondary Read-only routing lists can be configured to round-robin among specific set of secondaries
(for each primary)
DATABASE-LEVEL FAILOVER TRIGGER In SQL Server 2014, Availability Groups only monitor health of the instance
Database can be offline or corrupt, but will not trigger failover as long as instance itself is healthy
SQL Server 2016: Option to also monitor health of databases in Availability Group
Databases going offline trigger change in health status
You can now configure AlwaysOn Availability Groups for failover when database goes offline
This change requires setting DB_FAILOVER option to ON in CREATE AVAILABILITY GROUP (Transact-SQL) or ALTER AVAILABILITY GROUP (Transact-SQL) statements
CROSS-DATABASE TRANSACTIONS AND DISTRIBUTED TRANSACTIONSSupport for cross-database transactions within the same SQL Server Instance
Cross-database transactions within the same SQL Server instance are not supported for Always On Availability Groups
Support for distributed transactions Distributed transactions are supported with Always On Availability Groups between databases hosted by
two different SQL Server instances. It also applies to distributed transactions between SQL Server and another DTC-compliant server
Availability Groups must be running on Windows Server 2016 or Windows Server 2012 R2. For Windows Server 2012 R2 you must install the update in KB3090973
Availability Groups must be created with the CREATE AVAILABILITY GROUP command and the WITH DTC_SUPPORT = PER_DB clause
Distributed transactions are not supported for database mirroringhttps://msdn.microsoft.com/en-us/library/ms366279.aspx
Availability
HYBRID CLOUD SOLUTION FOR HA & DR
Easy Setup of on-premises and hybrid cloud HA & DR
Disaster Recovery Environment in Azure
Readable Secondary Replica for running backups to Azure Blob Storage
Cloud Reporting Data Source
Previously listener configuration in Azure was manual
Allows configuring availability grouplistener in Azure
Clients can connect to Azure replica after failover using listener name
SIMPLIFIED ADD AZURE REPLICA WIZARDAUTOMATIC LISTENER CONFIGURATION
BEFORE EXTENDING ALWAYS ON CONFIGURATIONConfigure site-to-site VPN tunnel between on-premises SQL Server and Azure
ALWAYS ON REPLICA IN AZURESample deployment
VPN Device VPN Gateway
S1
S2
Availability Group Virtual Network
PVPN Tunnel
On-premises Microsoft Azure
AD/DNS AGCloudReplica
CapabilityFor scalability, SQL Server 2016 adds in load balancing of readable secondariesIncreases number of auto-failover targets from two to three (synchronous replication)
BenefitsLog transport performance has been improvedSupport for Distributed Transaction Coordinator (DTC): Enrolled transactions for Availability Group databases with limitationsDatabase-level health monitoring
Availability
SUMMARY: ENHANCED ALWAYS ON
Always On Availability Group Demo
SITE SERVER ROLE HIGH AVAILABILITY! HOT OFF THE PRESS! (still being coded) New SCCM Role Active / Passive Mode for Primary Site Servers No longer the single point of failure Stay right here for “Configuration Manager High Availability and Disaster Recovery – Part 2”!
MIGRATING SCCM SQL SERVER TO AZURE
MSIT CURRENT STATE
Redmond Site 198,000 domain joined clients
Redmond Site 297,000 domain joined clients
North & South America
45,000 domain joined clients
Europe, MidEast, Africa 45,000 domain joined clients
Australia & Asia 65,000 domain joined clients
Device Mgmt.site
~126k devices
Infrastructure• 6 primary sites• 13 secondary sites• 300 distribution points
PCs and Devices• ~350,000 domain joined
managed desktops• ~126,000 enrolled mobile
devices• ~225k mobile devices (EAS)
Users• 265,000 users
Microsoft Intune
Azure Active Directory
Connector site role
Intune subscription
User Discovery
MS Online Directory Sync
Active Directory Federation Server
System Center Configuration Manager
33,500
MDM enrolled devices
15,000 34,500 36,000 3,500 3,500
MSIT CURRENT STATE
Device Mgmt Site (INT)• ~125,000 Mobile Devices• 1 SMS Provider • 4 SMS Certificate Registration
Point• 1 SMS Distribution Point• 1 SMS Enrollment Server• 1 SMS Management Point• 2 SQL servers with Cluster
Redmond Campus Site 1 (RD2)• ~83,000 Clients• 1 SMS Provider • 2 SMS Application Web Service• 18 SMS Distribution Point• 1 SMS Fallback Status Point• 6 SMS Management Point• 5 SMS Software Update Point• 2 SQL servers with Cluster
North & South America (NA1)• ~43,000 Clients• 1 SMS Provider • 2 SMS Application Web Service• 65 SMS Distribution Point• 1 SMS Fallback Status Point• 6 SMS Management Point• 4 SMS Software Update Point• 2 SQL servers with Cluster
Redmond Campus Site 2 (RD3)• ~108,000 Clients• 1 SMS Provider • 2 SMS Application Web Service• 25 SMS Distribution Point• 1 SMS Fallback Status Point• 6 SMS Management Point• 5 SMS Software Update Point• 2 SQL servers with Cluster
Asia Pacific & Australia (AU1)• ~67,000 Clients• 1 SMS Provider • 2 SMS Application Web Service• 11 SMS Distribution Point• 1 SMS Fallback Status Point• 3 SMS Management Point• 3 SMS Software Update Point• 2 SQL servers with Cluster
Beijing, China (BJW)• 6 Distribution Point1
SMS• 1 Management PointBangalore, India (BGL)• 4 Distribution Point1
SMS• 1 Management PointHyderabad, India (HDB)• 4 Distribution Point1
SMS• 1 Management PointSydney, Australia (SDN)• 7 Distribution Point1
SMS• 1 Management PointShanghai, China (SHN)• 7 Distribution Point1
SMS• 1 Management Point
k O
Charlotte, NC (CLT)• 15 Distribution Point1
SMS• 1 Management PointSao Paulo, Brazil (SPA)• 5 Distribution Point1
SMS• 1 Management PointSilicon Valley, CA (SVC)• 9 Distribution Point1
SMS• 1 Management Point
Summary• ~300,000 PCs and ~85,000 Mobile Devices Managed.• 6 Primary sites with 13 Secondary site and ~350 Distribution points globally.• 99% of all server roles are virtual and hosted on prem Hyper-V except SQL
Servers.• All site roles are reductant for high availability except CM primary site.
• SQL cluster with Hyper–V Cluster nodes for high availability.
Central Administration Site (CAZ)
• 1 SMS Provider • 1 SMS Software Update Point• 2 SQL servers with Cluster• 1 SMS SRS Reporting Point• Intune Connector Role
Microsoft IT onprem Active Directory for
user discovery & sync
CM/Hybrid Intune Connector
Europe, Middle East & Africa (EU1)• ~40,000 Clients• 1 SMS Provider • 2 SMS Application Web Service• 38 SMS Distribution Point• 1 SMS Fallback Status Point• 3 SMS Management Point• 3 SMS Software Update Point• 2 SQL servers with Cluster
London, UK (LDN)• 12 Distribution Point1
SMS• 1 Management PointMunich, Germany (MCH)• 7 Distribution Point1
SMS• 1 Management PointReading, UK (TVP)• 24 Distribution Point1
SMS• 1 Management PointVedbaek, Denmark (VED)• 7 Distribution Point1
SMS• 1 Management Point
MSIT FUTURE STATE = AZURE
MSIT AZURE All roles in Azure Replace secondaries with on-premise MP/DP Heavy use of peer cache Reduce server footprint from 363 to 107 SQL Server Always On Availability Groups Cloud Management Gateway Cloud DP
MIGRATING SCCM SQL SERVER TO AZURE Azure Considerations Setting up an Availability Group on Azure Efficiently moving the ConfigMgr DB to an Availability Group Lessons Learned
AZURE CONSIDERATIONS Network
ExpressRoute & VPN Gateway
VM Size Disk Size & Configuration VM and Disk Limitations
Throttling
Availability Sets (NOT Availability Group!)
https://docs.microsoft.com/en-us/sccm/core/understand/configuration-manager-on-azure
AZURE VM SIZE Premium Storage compatible
DS, DSv2, GS, and Fs series
https://docs.microsoft.com/en-us/azure/virtual-machines/virtual-machines-windows-sizes-memory
DISK SIZE & CONFIGURATION Premium Storage!
Use managed disks to be aligned with the VM’s fault domains* Throughput and IOPS vs Size Considerations
Stripe drives for more IOPS/throughput
Did you see our session on Configuring SQL for optimal performance? Separate data files from log files Separate TempDB files Format with 64KB allocation unit size
https://docs.microsoft.com/en-us/azure/storage/storage-premium-storage
Disk Max Size (GB)P10 128P20 512P30 1024
LIMITATIONS & THROTTLING
Striping premium disks increases the limitations IOPS or MB/s * Number of Disks Striped = New limitation (IOPS or MB/s) / 20 * Number of Disks Striped = “real” throttling points (50 ms intervals)
IOPS MB/s IO/50ms MB/50msP10 500 100 25 5P20 2,300 150 115 7.5P30 5,000 200 250 10
RECOMMENDED AZURE SIZE
AVAILABILITY SETS Planned vs Unplanned maintenance events OS or Application specific failures vs hardware/network/power failures Update Domain
Only one update domain is rebooted at a time
Fault Domain Groups of VMs that share a common power source and network switch
Each VM in an availability set is assigned an update domain and a fault domain VMs are separate from each other up to 5 VMs
Managed disks are aligned with the VM’s fault domains
So what? Ensure uptime during a planned or unplanned maintenance event! Use separate storage accounts for each VM Managed Disks! Use with a SQL Availability Group!
https://docs.microsoft.com/en-us/azure/virtual-machines/windows/manage-availability
AZURE VM PLACEHOLDER
Premium Disk Throughput limit
(MBps)1xP20 1501xP30 200
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
VM size
Azure VM Limit (MBps)
Disks
PotentialMBps Actual MBps
DS_V2 13 250 2xP
30 400 250 (VM limit)
DS_V2 13 250 3xP
30 600 250 (VM limit)
DS_V2 14 512 2xP
30 512 400 (Disk limit)
DS_V2 14 512 3xP
30 600 512 (VM limit)
VM Size Throughput limit
(MBps)
IOPS at 4K Serial
Read
IOPS at 64k Random
Write
DS12_v2 128 12800 3984
DS13_v2 256 25600 6182
DS14_v2 512 50000 7963
AZURE VM PLACEHOLDER
https://docs.microsoft.com/en-us/azure/azure-subscription-service-limits
AN AVAILABILITY GROUP IN AZURE Create a Resource Group Create an Availability Set Create/Configure VMs Create Failover Cluster Install/Configure SQL
Make sure certain configurations are set! Check proper perms – just as in any recovery
Create Availability Group Manual vs Auto Failover
Create/Configure Internal Load Balancer (ILB) Create Configure Client Access Point/Listener
*Azure Template to do this now?
EFFICIENTLY MOVE THE CM DB TO AN AGChecklist – Pre-Downtime Activities Create the Availability Group using a “dummy” database On the current CM DB server install SQL backup jobs - full and log backups Change the CM DB to FULL recovery mode Take a Full database backup (then disable this job for now) Take a Log backup (and ensure the job is running every couple of hours) Restore the first full and log backups to both of the nodes in the availability group
Don’t forget to add “WITH NORECOVERY”!
Continue to backup/restore logs until ready for downtime
EFFICIENTLY MOVE THE CM DB TO AN AGChecklist – Go Time! Disable the log backup job Turn off sms services Wait 10-15 minutes Take one last log backup and restore this on both the nodes as previously done
At this point both nodes will have the latest copy of the CM database in a status of "Recovering"
On one node (and one node only!) run the following (update as appropriate)
Add the database to the Availability Group JOIN ONLY!
EFFICIENTLY MOVE THE CM DB TO AN AGChecklist – Go Time! Run the following script on the primary node
Failover to the other node (via SSMS) Run the previous script on the current primary node (previously the secondary) Perform a DB Move via Configuration Manager Recovery
LESSONS LEARNED Monitor for suspended databases!
Disk Latency – throttling CPU/Memory good Azure connections are very fast! Replication (DRS) on availability group failover – possible loss of messages
Hman should cleanly reset the DRS queues to account for this (in 1702+)
If DB Move fails…try adding port 445 to the ILB for the move (in 1702 TP?)
MSIT Azure Availability Group Demo
SECTION HEADERThis is the next section
TITLE Line1
Bullet Level 1 Bullet Level 2
Bullet Level 3
TITLECode
Text Only with Border Level 1
Level 2 Level 3
Text Only (Red) Level 1
Level 2 Level 3
TITLE Text 1
Level 1 Level 2
Level 3
Text 2 Level 1
Level 2 Level 3
Section 1 Text
Level 1 Level 2
Level 3
Section 2 Text
Level 1 Level 2
Level 3
TITLE
Demo Title