Deep dive into Exchange 2010 high availability technologies by an Exchange guru
DESCRIPTION
Become an expert on Exchange high availability too! A technical session, in English, given by the guru of Exchange high availability technologies: Scott Schnoll. Scott speaks at Microsoft's TechReady and TechEd, has written numerous reference books, and will be here exclusively to lead this session. Topics covered include: How do I separate my log replication traffic from my client traffic? When a DAG (Database Availability Group) goes down, how does the system choose the right copy of the database to replicate? Go beyond the basic high availability features and learn what really happens in the inner workings of an Exchange DAG. This session covers the internals of DAGs: we will discuss DAG networks, Active Manager, how the system selects the best database copies, and Datacenter Activation Coordination mode.
TRANSCRIPT
Palais des Congrès, Paris
February 7, 8, and 9, 2012
February 8, 2012
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
Exchange Server 2010 High Availability Deep Dive
MSG306
In English!
Exchange Server 2010 High Availability Deep Dive
- Quorum
- Witness, Witness Server, and Alternate Witness Server
- Database Availability Group Networks
- Active Manager
- Best Copy Selection
- Datacenter Activation Coordination Mode
Agenda
Exchange Server 2010 High Availability
Concept: Quorum
- Used to ensure that only one subset of members is functioning at one time
- Requires a majority of members to be active and have communications with each other
- Represents a shared view of members (voters and some resources)
- Dual usage
  - Data shared between the voters representing configuration, etc.
  - Number of voters required for the solution to stay running (majority); quorum is a consensus of voters
    - When a majority of voters can communicate with each other, the cluster has quorum
    - When a majority of voters cannot communicate with each other, the cluster does not have quorum
Quorum
- Quorum is necessary for cluster functions and for DAG functions
  - The DAG must have quorum in order to mount and activate databases
- Exchange 2010 uses only two of the four cluster quorum models
  - Node Majority (DAGs with an odd number of members)
  - Node and File Share Majority (DAGs with an even number of members)
- Quorum = (V/2) + 1 (whole numbers only: V/2 is rounded down)
  - 6 members: (6/2) + 1 = 4 votes for quorum (can lose 3 voters)
  - 9 members: (9/2) + 1 = 5 votes for quorum (can lose 4 voters)
  - 13 members: (13/2) + 1 = 7 votes for quorum (can lose 6 voters)
  - 15 members: (15/2) + 1 = 8 votes for quorum (can lose 7 voters)
Quorum
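The quorum arithmetic on this slide can be checked with a short Python sketch. It is an illustration only, not how the cluster service computes quorum. The function names are hypothetical, and the sketch assumes that a DAG with an even number of members has one extra voter (the witness, under Node and File Share Majority), which is what makes the "can lose" figures on the slide work out.

```python
def votes_for_quorum(members: int) -> int:
    """Quorum = (V/2) + 1, with V/2 rounded down to a whole number."""
    return members // 2 + 1


def total_voters(members: int) -> int:
    """Assumption: an even-sized DAG gets one extra voter from the witness."""
    return members + 1 if members % 2 == 0 else members


def voters_that_can_fail(members: int) -> int:
    """How many voters can be lost while the rest still reach quorum."""
    return total_voters(members) - votes_for_quorum(members)


if __name__ == "__main__":
    for members in (6, 9, 13, 15):
        print(members, votes_for_quorum(members), voters_that_can_fail(members))
```

Running it on the four DAG sizes from the slide gives the same vote counts and tolerated losses as the bullets above.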
Exchange Server 2010 High Availability
Concept: Witness, Witness Server, and Alternate Witness Server
- A witness is a share on a server that is external to the DAG that participates in quorum by providing a weighted vote for the DAG member that has a lock on the witness.log file
  - Configured for all DAGs
  - Used only by DAGs that have an even number of members
- Witness server does not maintain a copy of quorum data, does not vote, and is not a member of the DAG or cluster
Witness and Witness Server
- Witness server used by a DAG after a datacenter switchover
- DAG is configured to use alternate witness server when you run Restore-DatabaseAvailabilityGroup or ahead of time by using Set-DatabaseAvailabilityGroup
- DAGs do not dynamically switch witness servers
- Alternate witness server does not provide redundancy for witness server or FSW resource
Alternate Witness Server
Exchange Server 2010 High Availability
Deep Dive: Database Availability Group Networks
- A DAG network is a collection of one or more subnets
- There are two types of DAG networks
  - MAPI Network: connects DAG members to network resources (Active Directory, other Exchange servers, DNS, etc.)
    - Registered in DNS / DNS configured
    - Uses default gateway
    - Client for Microsoft Networks/File and Print Sharing enabled
  - Replication Network: used for/by continuous replication (log shipping and seeding)
    - Not registered in DNS / DNS not configured
    - Typically no default gateway
    - Client for Microsoft Networks/File and Print Sharing disabled
DAG Networks
- Maximum round-trip latency between all DAG members must be 500 ms or less
  - Regardless of the latency of the solution, customers should validate that the network between all DAG members is capable of satisfying the data protection and availability goals of the deployment
  - May need to investigate increasing the number of databases or decreasing the number of mailboxes per database to achieve desired goals
DAG Networks
- All DAGs must have:
  - Exactly one MAPI network
  - Zero or more Replication networks
    - Separate network(s) on separate subnet(s)
    - LRU (least recently used) determines which replication network is used with multiple replication networks
- DAG networks automatically created when server is added to DAG
  - Based on cluster’s enumeration of networks
    - Cluster enumeration based on subnet
    - One cluster network is created for each subnet
DAG Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A
DAG Networks
Name          Subnet(s)       Interface(s)                            MAPI Access Enabled  Replication Enabled
DAGNetwork01  192.168.0.0/24  EX1 (192.168.0.15), EX2 (192.168.0.16)  True                 True
DAGNetwork02  10.0.0.0/24     EX1 (10.0.0.15), EX2 (10.0.0.16)        False                True
DAG Networks
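The rule that one network is created per subnet can be illustrated with a small Python sketch using the standard `ipaddress` module. It is not the cluster's actual enumeration code. The function name `enumerate_dag_networks` is hypothetical, and the `DAGNetworkNN` numbering (assigned in the order subnets are first seen) is an assumption chosen to match the names on the slide.

```python
import ipaddress


def enumerate_dag_networks(interfaces):
    """Group (server, "ip/prefix") pairs into one network per subnet.

    Returns a dict mapping an illustrative network name to its subnet
    and member interfaces. Names follow first-seen order (an assumption).
    """
    by_subnet = {}
    for server, cidr in interfaces:
        iface = ipaddress.ip_interface(cidr)
        # iface.network is the subnet the address belongs to, e.g. 192.168.0.0/24
        by_subnet.setdefault(iface.network, []).append((server, str(iface.ip)))

    networks = {}
    for index, (subnet, members) in enumerate(by_subnet.items(), start=1):
        networks[f"DAGNetwork{index:02d}"] = {
            "subnet": str(subnet),
            "interfaces": members,
        }
    return networks


if __name__ == "__main__":
    single_site = [
        ("EX1", "192.168.0.15/24"),
        ("EX1", "10.0.0.15/24"),
        ("EX2", "192.168.0.16/24"),
        ("EX2", "10.0.0.16/24"),
    ]
    for name, info in enumerate_dag_networks(single_site).items():
        print(name, info["subnet"], info["interfaces"])
```

With both servers on the same two subnets, the sketch yields two networks. If EX2 is moved to 192.168.1.15/24 and 10.0.1.15/24, the same function yields four, which is the situation the next slides address by collapsing subnets.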
Name          Subnet(s)       Interface(s)        MAPI Access Enabled  Replication Enabled
DAGNetwork01  192.168.0.0/24  EX1 (192.168.0.15)  True                 True
DAGNetwork02  10.0.0.0/24     EX1 (10.0.0.15)     False                True
DAGNetwork03  192.168.1.0/24  EX2 (192.168.1.15)  True                 True
DAGNetwork04  10.0.1.0/24     EX2 (10.0.1.15)     False                True

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A
Collapse subnets into two DAG networks and disable replication for the MAPI network:
DAG Networks
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
DAG Networks
Name          Subnet(s)                       Interface(s)                            MAPI Access Enabled  Replication Enabled
DAGNetwork01  192.168.0.0/24, 192.168.1.0/24  EX1 (192.168.0.15), EX2 (192.168.1.15)  True                 False
DAGNetwork02  10.0.0.0/24, 10.0.1.0/24        EX1 (10.0.0.15), EX2 (10.0.1.15)        False                True
Exchange Server 2010 High Availability
Deep Dive: Active Manager
- Exchange component that manages high availability platform
  - Runs inside the Microsoft Exchange Replication service on every Mailbox server
  - Is the definitive source of information on where a database is active
    - Stores this information in cluster database
    - Provides this information to Active Manager client running on other server roles (Client Access and Hub Transport)
Active Manager
- Standalone Active Manager
- Primary Active Manager (PAM)
- Standby Active Manager (SAM)
- Active Manager Client
  - Runs in RPC Client Access service on CAS and Transport service on Hub
Active Manager Roles
- Primary Active Manager (PAM)
  - Runs on the node that owns the cluster core resources (cluster group)
  - Gets topology change notifications
  - Reacts to server failures
  - Selects the best database copy on failovers and targetless switchovers
  - Detects failures of local Information Store and local databases
Active Manager
- Standby Active Manager (SAM)
  - Runs on every other node in the DAG
  - Detects failures of local Information Store and local databases
    - Reacts to failures by asking PAM to initiate a failover
  - Responds to queries from CAS/Hub about which server hosts the active copy
- Both roles are necessary for automatic recovery
  - If the Microsoft Exchange Replication service is stopped, automatic recovery will not happen
Active Manager
- Mount and Dismount Databases
- Provide Database Availability Information
- Provide Interface for Administrative Tasks
- Maintains Database and Server State Information
- Monitor for Failures and Initiate Recovery
Active Manager Functionality
Exchange Server 2010 High Availability
Deep Dive: Best Copy Selection
- Process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status
- Active Manager selects the “best” copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover
Best Copy Selection
- Sorts copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
- Selects from sorted list based on which set of criteria is met by each copy
- Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy
Best Copy Selection – RTM
- Sorts copies by activation preference when auto database mount dial is set to Lossless
  - Otherwise, sorts copies based on copy queue length, with activation preference used as a secondary sorting key if necessary
- Selects from sorted list based on which set of criteria is met by each copy
- Attempt Copy Last Logs (ACLL) runs and attempts to copy missing log files from previous active copy
Best Copy Selection – SP1 and later
- Is database mountable?
- Is copy queue length <= AutoDatabaseMountDial?
  - If Yes, database is marked as current active and mount request is issued
  - If not, next best database tried (if one is available)
- During best copy selection, any servers that are unreachable or “activation blocked” are ignored
Best Copy Selection
Criteria  Copy Queue Length  Replay Queue Length  Content Index Status
1         < 10 logs          < 50 logs            Healthy
2         < 10 logs          < 50 logs            Crawling
3         N/A                < 50 logs            Healthy
4         N/A                < 50 logs            Crawling
5         N/A                < 50 logs            N/A
6         < 10 logs          N/A                  Healthy
7         < 10 logs          N/A                  Crawling
8         N/A                N/A                  Healthy
9         N/A                N/A                  Crawling
10        Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
Best Copy Selection
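The criteria table can be encoded as data and evaluated in order. The Python sketch below is one reading of the table only, not Active Manager's implementation. The function name `first_criteria_met` and the tuple layout are hypothetical, and the sketch deliberately ignores any other checks Exchange performs on a copy.

```python
# Each entry: (copy queue limit, replay queue limit, required content index state).
# None means the column is N/A for that criteria set.
CRITERIA = [
    (10, 50, "Healthy"),      # 1
    (10, 50, "Crawling"),     # 2
    (None, 50, "Healthy"),    # 3
    (None, 50, "Crawling"),   # 4
    (None, 50, None),         # 5
    (10, None, "Healthy"),    # 6
    (10, None, "Crawling"),   # 7
    (None, None, "Healthy"),  # 8
    (None, None, "Crawling"), # 9
]

# Criteria set 10 looks only at the copy status.
ACCEPTABLE_STATUSES = {
    "Healthy",
    "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing",
    "SeedingSource",
}


def first_criteria_met(cql, rql, ci_state, status):
    """Return the number (1-10) of the first criteria set a copy meets, or None."""
    for number, (max_cql, max_rql, required_ci) in enumerate(CRITERIA, start=1):
        if max_cql is not None and cql >= max_cql:
            continue  # copy queue length must be strictly below the limit
        if max_rql is not None and rql >= max_rql:
            continue  # replay queue length must be strictly below the limit
        if required_ci is not None and ci_state != required_ci:
            continue
        return number
    return 10 if status in ACCEPTABLE_STATUSES else None


if __name__ == "__main__":
    print(first_criteria_met(4, 0, "Healthy", "Healthy"))    # meets set 1
    print(first_criteria_met(10, 0, "Crawling", "Healthy"))  # first match is set 4
```

A copy with a copy queue length of exactly 10 fails every "< 10 logs" row, which is why the second call falls through to a later criteria set.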
- Four copies of DB1
- DB1 currently active on Server1
Best Copy Selection – RTM
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2\DB1    2                      4                  0                    Healthy   Healthy
Server3\DB1    3                      2                  2                    Healthy   DiscAndHealthy
Server4\DB1    4                      10                 0                    Crawling  Healthy

[Diagram: Server1, Server2, Server3, and Server4, each hosting a copy of DB1]
Sort the list of available copies by copy queue length (using activation preference as the secondary sort key if necessary): Server3\DB1, Server2\DB1, Server4\DB1
Best Copy Selection – RTM
Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy): Server3\DB1 and Server2\DB1. Server4\DB1 does not, because its copy queue length is 10 and its content index is Crawling.
Best Copy Selection – RTM
Server3\DB1 has the lowest copy queue length, so it is tried first.
- Four copies of DB1
- DB1 currently active on Server1
- Auto database mount dial set to Lossless
Best Copy Selection – SP1 and later
Database Copy  Activation Preference  Copy Queue Length  Replay Queue Length  CI State  Database State
Server2\DB1    2                      4                  0                    Healthy   Healthy
Server3\DB1    3                      2                  2                    Healthy   DiscAndHealthy
Server4\DB1    4                      10                 0                    Crawling  Healthy
Sort the list of available copies by activation preference: Server2\DB1, Server3\DB1, Server4\DB1
Best Copy Selection – SP1 and later
Best Copy Selection – SP1 and later
Server2\DB1 has the lowest activation preference value, so it is tried first.
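The difference between the RTM and SP1 sort orders in the two walkthroughs can be reproduced with a few lines of Python. This is a sketch of the sorting step only, under the rules stated on the slides. The `Copy` type and the function names are hypothetical, and the mount dial is passed as a plain string.

```python
from typing import NamedTuple


class Copy(NamedTuple):
    name: str
    activation_preference: int
    copy_queue_length: int


def sort_rtm(copies):
    """RTM: copy queue length first, activation preference as tie-breaker."""
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))


def sort_sp1(copies, mount_dial):
    """SP1 and later: activation preference when the dial is Lossless,
    otherwise the same ordering as RTM."""
    if mount_dial == "Lossless":
        return sorted(copies, key=lambda c: c.activation_preference)
    return sort_rtm(copies)


# Values taken from the example table on the slides.
COPIES = [
    Copy(r"Server2\DB1", 2, 4),
    Copy(r"Server3\DB1", 3, 2),
    Copy(r"Server4\DB1", 4, 10),
]

if __name__ == "__main__":
    print([c.name for c in sort_rtm(COPIES)])
    print([c.name for c in sort_sp1(COPIES, "Lossless")])
```

With the example data, the RTM ordering starts with Server3\DB1 and the SP1 Lossless ordering starts with Server2\DB1, matching the two walkthroughs.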
After Active Manager determines the best copy to activate:
- The Replication service on the target server attempts to copy missing log files from the source (ACLL)
  - If successful, then the database will mount with zero data loss
  - If unsuccessful (lossy failure), then the database will mount based on the AutoDatabaseMountDial setting
    - If data loss is outside of dial setting, next copy will be tried
Best Copy Selection
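The ACLL and mount dial decision described above can be sketched as a loop over the already sorted copies. This is a simplified model, not the Replication service's logic. The function `choose_copy_to_mount` and the `acll_succeeds` callback are hypothetical stand-ins. The log thresholds for each dial value (0, 6, and 12) are not stated on the slides and are an assumption. The sketch also assumes that, after a failed ACLL, the number of lost logs equals the copy queue length.

```python
# Assumption: maximum number of lost logs tolerated by each dial setting.
MOUNT_DIAL_MAX_LOST_LOGS = {
    "Lossless": 0,
    "GoodAvailability": 6,
    "BestAvailability": 12,
}


def choose_copy_to_mount(sorted_copies, mount_dial, acll_succeeds):
    """Walk the best-first list and return (name, logs_lost) for the copy
    that would mount, or None if no copy qualifies.

    sorted_copies: list of (name, copy_queue_length), best candidate first.
    acll_succeeds: callable(name) -> bool, a stand-in for Attempt Copy Last Logs.
    """
    limit = MOUNT_DIAL_MAX_LOST_LOGS[mount_dial]
    for name, copy_queue_length in sorted_copies:
        if acll_succeeds(name):
            return name, 0  # missing logs were copied: zero data loss
        if copy_queue_length <= limit:
            return name, copy_queue_length  # lossy mount within the dial setting
        # Data loss is outside the dial setting: try the next copy.
    return None


if __name__ == "__main__":
    copies = [(r"Server2\DB1", 4), (r"Server3\DB1", 2), (r"Server4\DB1", 10)]
    print(choose_copy_to_mount(copies, "GoodAvailability", lambda name: False))
    print(choose_copy_to_mount(copies, "Lossless", lambda name: False))
```

In the second call no copy can mount, because ACLL fails everywhere and Lossless tolerates no missing logs.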
After Active Manager determines the best copy to activate:
- The mounted database will generate new log files (using the same log generation sequence)
- Transport Dumpster requests will be initiated for the mounted database to recover lost messages
- When original server or database recovers, it will run through divergence detection and either perform an incremental resync or require a full reseed
Best Copy Selection
Exchange Server 2010 High Availability
Deep Dive: Datacenter Activation Coordination Mode
- Datacenter Activation Coordination (DAC) mode is a property setting of a DAG
- Acts as an application-level form of quorum
  - Designed to prevent multiple copies of same database mounting on different members due to loss of network
- Also enables use of Site Resilience cmdlets
  - Stop-DatabaseAvailabilityGroup
  - Restore-DatabaseAvailabilityGroup
  - Start-DatabaseAvailabilityGroup
DAC Mode
- Exchange 2010 RTM
  - DAC Mode is only for DAGs with three or more members that are extended to two Active Directory sites
- Exchange 2010 SP1 and later
  - DAC Mode can (and should) be enabled for all DAGs
DAC Mode
- Uses Datacenter Activation Coordination Protocol (DACP), which is a bit in memory set to either:
  - 0 = can’t mount
  - 1 = can mount
DAC Mode
- Active Manager startup sequence
  - DACP is set to 0
  - DAG member communicates with other DAG members it can reach to determine the current value for their DACP bits
    - If the starting DAG member can communicate with all other members, DACP bit switches to 1
    - If other DACP bits are set to 0, starting DAG member DACP bit remains at 0
    - If another DACP bit is set to 1, starting DAG member DACP bit switches to 1
DAC Mode
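The startup rules above reduce to a small decision function. The Python sketch below models only those three rules and is not the DACP implementation. The function name `dacp_bit_after_startup` and its parameters are hypothetical, and the member names in the usage example are taken from the topology diagram.

```python
def dacp_bit_after_startup(other_member_count, reachable_bits):
    """Return the starting member's DACP bit (0 = can't mount, 1 = can mount).

    other_member_count: number of other members in the DAG.
    reachable_bits: dict of member name -> DACP bit, for the members the
                    starting member can currently reach.
    """
    bit = 0  # DACP always starts at 0

    # Rule 1: the starting member can communicate with all other members.
    if len(reachable_bits) == other_member_count:
        return 1

    # Rule 3: another reachable member already has its bit set to 1.
    if any(value == 1 for value in reachable_bits.values()):
        return 1

    # Rule 2: every reachable bit is 0 (or nobody is reachable), so stay at 0.
    return bit


if __name__ == "__main__":
    # MBX-A starts up in a four-member DAG and can reach only MBX-B.
    print(dacp_bit_after_startup(3, {"MBX-B": 0}))
    print(dacp_bit_after_startup(3, {"MBX-B": 1}))
```

The first call models members restarting in isolation from the rest of the DAG: neither can confirm it is safe to mount, so the bit stays at 0 and the same database is not mounted in two places.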
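[Diagram: DAG1 stretched across a primary datacenter and a secondary datacenter, with Mailbox servers MBX-A, MBX-B, MBX-C, and MBX-D, Client Access servers CAS-Pri and CAS-Sec, Hub Transport (HT2010) servers, Outlook clients, the DAG1 file share witness (FSW), and active databases in both datacenters]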
DAC Mode
[Diagram: DAG1 across the primary and secondary datacenters (MBX-A through MBX-D, CAS-Pri, CAS-Sec, HT2010, Outlook clients, DAG1 FSW), with an alternate witness server (AWS) added]
DAC Mode
[Diagram: DAG1 across the primary and secondary datacenters with the alternate witness server (AWS), annotated with DACP bit values 0, 0, 1, 1]
DAC Mode
Thank you for attending!
Contact me at any time with questions:
- [email protected]
- Twitter: @schnoll
- Blog: http://blogs.technet.com/scottschnoll
Questions?
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.