Exchange Server 2010 SP2 High Availability Deep Dive
Scott Schnoll
Principal Technical Writer
Microsoft Corporation
EXL401
Agenda
Recent Behavior Changes – SP2 + UR3
Database Availability Group Networks
Active Manager
Best Copy Selection
Datacenter Activation Coordination Mode
Deep Dive
Code changes in Service Pack 2 (SP2) and Update Rollup 3 for SP2
Recent Behavior Changes
OWA Cross-Site Silent Redirection with SSO
Introduced in Service Pack 2
If you access OWA via CAS in the ‘wrong’ AD site, CAS has a decision to make: it can proxy or redirect to the target site
  If there is no ExternalURL in that site, we proxy, the mailbox opens and the user gets access
  If the target site has an ExternalURL, the user gets a page with a link to click
  The user clicks the link, logs in again, and gets access
  The user has to log in twice
We removed the need to click the link, which in some scenarios results in a single sign-on (SSO) experience
OWA Cross-Site Silent Redirection with SSO
Documentation
  http://aka.ms/ljimzw
  http://aka.ms/wjysst
Video
  http://aka.ms/qjtvmq
Changes to Set-DatabaseAvailabilityGroup
Introduced in Update Rollup 3 for Exchange 2010 SP2
Enables use of the AllowCrossSiteRPCClientAccess property of the DAG
  True – When a database *overs from DC1 to DC2, Outlook will continue to use the CAS array in DC1 as the RPC endpoint
  False (default) – When a database *overs from DC1 to DC2, the Outlook profile will be updated to use the CAS array in DC2 as the RPC endpoint
    This will require a restart of the Outlook client
AlternateWitnessServer and AlternateWitnessDirectory can now be set to $null
Changes to Set-DatabaseAvailabilityGroup
Documentation
  http://aka.ms/qjtvmq
  http://technet.microsoft.com/en-us/library/dd297934.aspx*
*To be published very soon!
Changes to Active Manager
Introduced in Update Rollup 3 for Exchange 2010 SP2
A mailbox database move using dial tone portability can take an extremely long time to fail if the source or destination Mailbox server is down
  Get-Mailbox -Database DB1 | Set-Mailbox -Database DTDB1
RPC attempts to purge remote mailbox objects, but it can’t when one side is down, and it eventually times out
  This could take 40-60 seconds per mailbox!
We no longer need to wait that long, as we now handle the down server in a different manner
High Availability Concepts
http://aka.ms/ExHAConcepts
Deep Dive
Database Availability Group Networks
Database Availability Group Networks
A DAG network is a collection of one or more subnets
Two types of DAG networks
MAPI Network - connects DAG members to Active Directory, other Exchange servers, DNS, etc.; also used by content indexing
  Registered in DNS / DNS configured
  Uses default gateway
  Client for Microsoft Networks/File and Print Sharing enabled
Replication Network - used for continuous replication
  Not registered in DNS / DNS not configured
  Does not use a default gateway
  Client for Microsoft Networks/File and Print Sharing disabled
Database Availability Group Networks
All DAGs must have:
  Exactly one MAPI network
  Zero or more Replication networks
  Separate network(s) on separate subnet(s)
  LRU (least recently used) determines which network is used in a multiple Replication network environment
Automatically created when a server is added to the DAG
  Based on the cluster’s enumeration of networks
Cluster enumeration is based on subnet
  One cluster network is created for each subnet
Database Availability Group Networks
Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.0.16/24            192.168.0.1
EX2 – REPLICATION    10.0.0.16/24               N/A

Name           Subnet(s)        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15), EX2 (192.168.0.16)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15), EX2 (10.0.0.16)         False                 True
Database Availability Group Networks
Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Server / Network     IP Address / Subnet Bits   Default Gateway
EX1 – MAPI           192.168.0.15/24            192.168.0.1
EX1 – REPLICATION    10.0.0.15/24               N/A
EX2 – MAPI           192.168.1.15/24            192.168.1.1
EX2 – REPLICATION    10.0.1.15/24               N/A
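
The subnet-based enumeration described above can be illustrated with a small Python sketch. It is a model of the idea only, not the cluster's actual implementation: it groups the four interface addresses from the multi-subnet table by subnet and shows why four networks result. The function name and the sequential naming scheme are assumptions made for illustration.

```python
import ipaddress
from collections import defaultdict

# Interface addresses from the multi-subnet example table.
interfaces = [
    ("EX1", "192.168.0.15/24"),
    ("EX1", "10.0.0.15/24"),
    ("EX2", "192.168.1.15/24"),
    ("EX2", "10.0.1.15/24"),
]

def enumerate_networks(ifaces):
    """Group interfaces by subnet: one network per distinct subnet."""
    by_subnet = defaultdict(list)
    for server, cidr in ifaces:
        iface = ipaddress.ip_interface(cidr)
        by_subnet[iface.network].append((server, str(iface.ip)))
    # Sequential names in order of first appearance (hypothetical naming scheme).
    return {
        f"DAGNetwork{i:02d}": (str(subnet), members)
        for i, (subnet, members) in enumerate(by_subnet.items(), start=1)
    }

networks = enumerate_networks(interfaces)
for name, (subnet, members) in networks.items():
    print(name, subnet, members)
```

Because each server sits on its own pair of subnets, every interface lands in a separate group, which is the situation the collapse procedure on the following slides corrects.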
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)        Interface(s)         MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24   EX1 (192.168.0.15)   True                  True
DAGNetwork02   10.0.0.0/24      EX1 (10.0.0.15)      False                 True
DAGNetwork03   192.168.1.0/24   EX2 (192.168.1.15)   True                  True
DAGNetwork04   10.0.1.0/24      EX2 (10.0.1.15)      False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Database Availability Group Networks
After the first command, DAGNetwork01 holds both MAPI subnets with replication disabled, and DAGNetwork03 is empty:

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24                      EX1 (10.0.0.15)                          False                 True
DAGNetwork03   (empty)
DAGNetwork04   10.0.1.0/24                      EX2 (10.0.1.15)                          False                 True

Database Availability Group Networks
After the second command, DAGNetwork02 holds both Replication subnets, and DAGNetwork04 is empty:

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True
DAGNetwork03   (empty)
DAGNetwork04   (empty)

Database Availability Group Networks
The two empty DAG networks are then removed:

Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Database Availability Group Networks
Collapse subnets into two DAG networks and disable replication for the MAPI network:

Name           Subnet(s)                        Interface(s)                             MAPI Access Enabled   Replication Enabled
DAGNetwork01   192.168.0.0/24, 192.168.1.0/24   EX1 (192.168.0.15), EX2 (192.168.1.15)   True                  False
DAGNetwork02   10.0.0.0/24, 10.0.1.0/24         EX1 (10.0.0.15), EX2 (10.0.1.15)         False                 True

Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork01 -Subnets 192.168.0.0,192.168.1.0 -ReplicationEnabled:$false
Set-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork02 -Subnets 10.0.0.0,10.0.1.0
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork03
Remove-DatabaseAvailabilityGroupNetwork -Identity DAG2\DAGNetwork04
Database Availability Group Networks
When using a single NIC
  It is both the MAPI and the Replication network
  ReplicationEnabled is $True
When using multiple NICs
  One NIC is the MAPI network
  ReplicationEnabled is $False (typically)
  Other NIC(s) are Replication network(s)
  Replication uses LRU to pick the network to use
  If Replication networks are unavailable, the MAPI network is used
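
A rough Python sketch of the selection behavior just described, assuming LRU means "least recently used": the Replication network that was used longest ago is picked next, and the MAPI network is used only when no Replication network is available. The class, its method names, and the network names are hypothetical and do not correspond to any Exchange API.

```python
from collections import OrderedDict

class ReplicationNetworkPicker:
    """Toy model: least-recently-used choice among Replication networks."""

    def __init__(self, replication_networks, mapi_network):
        # Key order doubles as recency order: the first key is the least recently used.
        self._lru = OrderedDict((name, True) for name in replication_networks)
        self._mapi = mapi_network

    def set_available(self, name, available):
        # Updating an existing key keeps its position in the recency order.
        self._lru[name] = available

    def pick(self):
        # First available network in recency order, or None if there is none.
        name = next((n for n, ok in self._lru.items() if ok), None)
        if name is None:
            return self._mapi  # fall back to the MAPI network
        self._lru.move_to_end(name)  # it is now the most recently used
        return name

picker = ReplicationNetworkPicker(["Repl-A", "Repl-B"], "MAPI")
first_three = [picker.pick() for _ in range(3)]
picker.set_available("Repl-A", False)
picker.set_available("Repl-B", False)
fallback = picker.pick()
print(first_three, fallback)
```

With two healthy Replication networks the picks alternate between them; once both are marked unavailable, the sketch returns the MAPI network.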
Database Availability Group Networks
DAG/cluster should ignore iSCSI or dedicated backup networks
Set-DatabaseAvailabilityGroupNetwork -Identity <DAG Network Name> -ReplicationEnabled:$false -IgnoreNetwork:$true
Database Availability Group Networks
Install hotfix from KB 2469100
  Fixes issue with manually added route table entries disappearing
Block cross-network communication
[Diagram: MAPI (M) and Replication (R) interfaces across Subnet 1 through Subnet 4, with paths labeled Allowed and Blocked]
Deep Dive
Active Manager
Active Manager
Internal Exchange component that manages the high availability platform
Runs inside the Microsoft Exchange Replication service on every Mailbox server
Is the definitive source of information on where a database is active
  Stores this information in the cluster database
  Provides this information to the Active Manager client running on other server roles (Client Access and Hub Transport)
Active Manager
Standalone Active Manager
Primary Active Manager (PAM)
Standby Active Manager (SAM)
Active Manager Client
  Runs in the RPC Client Access service on CAS and the Transport service on Hub
Active Manager
Primary Active Manager (PAM)
Runs on the node that owns the cluster core resources (cluster group)
Gets topology change notifications
Reacts to server failures
Selects the best database copy on failovers and targetless switchovers
Detects failures of local Information Store and local databases
Active Manager
Standby Active Manager (SAM)
Runs on every other node in the DAG
Detects failures of local Information Store and local databases
  Reacts to failures by asking PAM to initiate a failover
Responds to queries from CAS/Hub about which server hosts the active copy
Both roles are necessary for automatic recovery
  If the Microsoft Exchange Replication service is stopped, automatic recovery will not happen
Active Manager
Which DAG member is the current PAM?
  Get-DatabaseAvailabilityGroup DAG1 -Status | fl PrimaryActiveManager
How can I move the PAM role?
  Move-ClusterGroup "Cluster Group" -Node MBX2
or
  cluster group "Cluster Group" /move
Active Manager
Transitions of the Active Manager role state are logged in the Microsoft-Exchange-HighAvailability/Operational event log (crimson channel)
Active Manager Functionality
Mount and Dismount Databases
Provide Database Availability Information
Provide Interface for Administrative Tasks
Maintains Database and Server State Information
Monitor for Failures and Initiate Recovery
Database *over
Database *overs
  Switchover - An administrator action invoked by a task
  Failover - Automatic operation initiated by the PAM
Begins with a Dismount operation and ends with a Mount operation
Mount / Dismount Database
Mount Database
  An admin action invoked through a task
  The last part of a database *over
Dismount Database
  An admin action invoked through a task
  The first part of a database switchover
AutoDismount
Occurs when a DAG loses quorum
All DAG members are running (but may not be participating in the cluster)
Databases are dismounted as quickly as possible by terminating the Information Store service
The only exception to the “SAM can take no action” rule
Crimson Channel
Present on all Exchange 2010 Mailbox servers
Applications and Services Logs\Microsoft\Exchange
  HighAvailability
    BlockReplication
    Debug
    Operational
    TruncationDebug
  MailboxDatabaseFailureItems
    Debug
    Operational
Applications and Services Logs\Microsoft\Windows
  FailoverClustering
HighAvailability
Events for startup/shutdown of MSExchangeRepl.exe and its components:
  Active Manager
  Third-Party Replication API
  Tasks RPC server
  TCP listener
  VSS writer
Used by Active Manager for events related to role monitoring, database mount operations, log truncation, and cluster-related events
MailboxDatabaseFailureItems
Used to log events associated with any failures that affect a replicated mailbox database
Deep Dive
Best Copy Selection
Best Copy Selection
Process of finding the best copy of an individual database to activate, given a list of potential copies for activation and their status
Active Manager selects the “best” copy to become the new active copy when the existing active copy fails or when an administrator performs a targetless switchover
During best copy selection, any servers that are unreachable or activation-blocked are ignored
Best Copy Selection First Three Steps – RTM
1. Sort copies by copy queue length to minimize data loss, using activation preference as a secondary sorting key if necessary
2. Select the “best” copy from the sorted list based on which set of criteria is met by each copy
3. Run Attempt Copy Last Logs (ACLL) and try to copy any missing log files from the previous active copy
Best Copy Selection First Three Steps – SP1+
1. Sort copies by activation preference when the auto database mount dial is set to Lossless (otherwise, sort copies based on RTM behavior)
2. Select the “best” copy from the sorted list based on which set of criteria is met by each copy
3. Run Attempt Copy Last Logs (ACLL) and try to copy any missing log files from the previous active copy
Best Copy Selection Last Step – RTM+
4. Is the database mountable?
  Is copy queue length <= AutoDatabaseMountDial?
  If yes, the database is marked as the current active and a mount request is issued by Active Manager to the Information Store
  If not, the next database in the sorted list is tried, if one is available. If one is not available, an administrator must manually resolve the problem
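
The sort in step 1 and the mountability check in step 4 can be sketched in Python as follows. This is a simplified model, not Active Manager's code: it ignores steps 2 and 3 (criteria matching and ACLL, which may shorten the copy queue before step 4), and the `mount_dial_threshold` parameter is a hypothetical stand-in for the log count that the AutoDatabaseMountDial setting allows. The data are the three passive copies used in the example later in this deck.

```python
from dataclasses import dataclass

@dataclass
class Copy:
    name: str
    activation_preference: int
    copy_queue_length: int

def sort_copies(copies, sp1_or_later, lossless):
    """Step 1: order the candidate copies."""
    if sp1_or_later and lossless:
        # SP1+ with the mount dial at Lossless: activation preference only.
        return sorted(copies, key=lambda c: c.activation_preference)
    # RTM behavior: copy queue length first, activation preference as tie-breaker.
    return sorted(copies, key=lambda c: (c.copy_queue_length, c.activation_preference))

def is_mountable(copy, mount_dial_threshold):
    """Step 4: the copy queue must not exceed what the mount dial allows."""
    return copy.copy_queue_length <= mount_dial_threshold

copies = [
    Copy("Server2\\DB1", 2, 4),
    Copy("Server3\\DB1", 3, 2),
    Copy("Server4\\DB1", 4, 10),
]

rtm_order = [c.name for c in sort_copies(copies, sp1_or_later=False, lossless=True)]
sp1_order = [c.name for c in sort_copies(copies, sp1_or_later=True, lossless=True)]
print(rtm_order)
print(sp1_order)
```

The two orderings differ only in which copy is tried first: the copy with the shortest copy queue under RTM behavior, and the copy with the lowest activation preference value under SP1+ with Lossless.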
Best Copy Selection – Selection Criteria
Criteria   Copy Queue Length   Replay Queue Length   Content Index Status
1          < 10 logs           < 50 logs             Healthy
2          < 10 logs           < 50 logs             Crawling
3          N/A                 < 50 logs             Healthy
4          N/A                 < 50 logs             Crawling
5          N/A                 < 50 logs             N/A
6          < 10 logs           N/A                   Healthy
7          < 10 logs           N/A                   Crawling
8          N/A                 N/A                   Healthy
9          N/A                 N/A                   Crawling
10         Any database copy with a status of Healthy, DisconnectedAndHealthy, DisconnectedAndResynchronizing, or SeedingSource
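
One way to read the criteria table is as an ordered list of rules, where a copy is rated by the first set it satisfies. The Python sketch below encodes that reading. It is an interpretation for illustration, not Exchange code; the function names and the rule that a tie goes to the copy appearing earlier in the sorted list are assumptions.

```python
# (copy queue limit, replay queue limit, required content index state); None means N/A.
CRITERIA = [
    (10, 50, "Healthy"),
    (10, 50, "Crawling"),
    (None, 50, "Healthy"),
    (None, 50, "Crawling"),
    (None, 50, None),
    (10, None, "Healthy"),
    (10, None, "Crawling"),
    (None, None, "Healthy"),
    (None, None, "Crawling"),
]
SET_10_STATUSES = {
    "Healthy", "DisconnectedAndHealthy",
    "DisconnectedAndResynchronizing", "SeedingSource",
}

def first_matching_set(cql, rql, ci_state, copy_status):
    """Return the number (1-10) of the first criteria set the copy meets, else None."""
    for number, (max_cql, max_rql, ci) in enumerate(CRITERIA, start=1):
        if max_cql is not None and cql >= max_cql:
            continue
        if max_rql is not None and rql >= max_rql:
            continue
        if ci is not None and ci_state != ci:
            continue
        return number
    return 10 if copy_status in SET_10_STATUSES else None

def select_copy(sorted_copies):
    """Pick the copy meeting the lowest-numbered set; earlier list position wins ties."""
    best = None
    for name, cql, rql, ci_state, status in sorted_copies:
        number = first_matching_set(cql, rql, ci_state, status)
        if number is not None and (best is None or number < best[0]):
            best = (number, name)
    return best

# Copies from this deck's example, already sorted by copy queue length (RTM behavior).
rtm_sorted = [
    ("Server3\\DB1", 2, 2, "Healthy", "DisconnectedAndHealthy"),
    ("Server2\\DB1", 4, 0, "Healthy", "Healthy"),
    ("Server4\\DB1", 10, 0, "Crawling", "Healthy"),
]
choice = select_copy(rtm_sorted)
print(choice)
```

In this reading, Server4\DB1 falls through to the fourth set because its copy queue length of 10 is not below 10 and its content index is Crawling.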
Example: Best Copy Selection – RTM
Four copies of DB1
DB1 active on Server1

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DisconnectedAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: DB1 copies on Server1, Server2, Server3, and Server4; one copy is marked with an X]
Example: Best Copy Selection – RTM
Sort the list of available copies by Copy Queue Length (using AP if necessary):
  Server3\DB1
  Server2\DB1
  Server4\DB1
Example: Best Copy Selection – RTM
Only two copies meet the first set of criteria for activation (CQL < 10; RQL < 50; CI = Healthy):
  Server3\DB1 (lowest copy queue length, tried first)
  Server2\DB1
  Server4\DB1 (does not meet the first set of criteria: CQL is 10 and CI is Crawling)
Example: Best Copy Selection – SP1+
Four copies of DB1
DB1 active on Server1
Auto database mount dial set to Lossless

Database Copy   Activation Preference   Copy Queue Length   Replay Queue Length   CI State   Database State
Server2\DB1     2                       4                   0                     Healthy    Healthy
Server3\DB1     3                       2                   2                     Healthy    DisconnectedAndHealthy
Server4\DB1     4                       10                  0                     Crawling   Healthy

[Diagram: DB1 copies on Server1, Server2, Server3, and Server4; one copy is marked with an X]
Example: Best Copy Selection – SP1+
Sort the list of available copies by Activation Preference:
  Server2\DB1 (lowest preference value, tried first)
  Server3\DB1
  Server4\DB1
Best Copy Selection – Post-Activation Events
Transport Dumpster requests will be initiated for the mailbox database to recover any lost messages
The new active and mounted mailbox database will generate new log files using the same log generation sequence
When the previous active copy recovers, it will run through divergence detection and either perform an incremental resynchronization or require an administrator to reseed the database copy
Deep Dive
The MommyMayIMount Bit
Datacenter Activation Coordination Mode
Datacenter Activation Coordination Mode
DAC mode is a property of a DAG
Acts as an application-level form of quorum
  Designed to prevent multiple copies of the same database from mounting on multiple members
Enables the use of Exchange cmdlets for datacenter switchovers of DAGs
  Stop-DatabaseAvailabilityGroup
  Restore-DatabaseAvailabilityGroup
  Start-DatabaseAvailabilityGroup
Datacenter Activation Coordination Mode
Uses the Datacenter Activation Coordination Protocol (DACP), also known as the MommyMayIMount bit
DACP is an in-memory bit in the Exchange Replication service
DACP has two possible values:
  0 = cannot automatically mount databases on startup
  1 = can automatically mount databases on startup, provided Automount consensus is True
Datacenter Activation Coordination Mode
Microsoft Exchange Replication service startup sequence
Active Manager initializes
DACP bit is set to 0
DAG member communicates with other DAG members
  If the starting DAG member can communicate with all members on the StartedMailboxServers list, the starting DAG member's DACP bit is set to 1
  If the starting DAG member can communicate with another DAG member that has a DACP bit set to 1, the starting DAG member's DACP bit is set to 1
  If the starting DAG member can communicate only with members that have a DACP bit set to 0, the starting DAG member's DACP bit remains at 0
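
The three startup rules above can be condensed into a short Python sketch. It is a model of the decision only, under the assumption that the starting member does not need to contact itself; the function name and the dictionary of reachable peers are hypothetical, and the member names are taken from the deck's DAG1 diagram.

```python
def dacp_bit_on_startup(self_name, reachable_peers, started_mailbox_servers):
    """Return the DACP bit a starting DAG member ends up with.

    reachable_peers maps each peer the member can talk to onto that peer's DACP bit.
    """
    bit = 0  # the bit always starts at 0 when the Replication service starts
    # Assumption: the starting member itself is excluded from the reachability check.
    others = set(started_mailbox_servers) - {self_name}
    if others <= set(reachable_peers):
        bit = 1  # every member on the StartedMailboxServers list is reachable
    elif any(peer_bit == 1 for peer_bit in reachable_peers.values()):
        bit = 1  # at least one reachable member already holds a 1
    return bit

started = ["MBX1", "MBX2", "MBX3", "MBX4"]
only_zero_peers = dacp_bit_on_startup("MBX1", {"MBX2": 0}, started)
peer_with_one = dacp_bit_on_startup("MBX1", {"MBX2": 0, "MBX3": 1}, started)
all_reachable = dacp_bit_on_startup("MBX1", {"MBX2": 0, "MBX3": 0, "MBX4": 0}, started)
print(only_zero_peers, peer_with_one, all_reachable)
```

The first case is the one that protects against split brain: a member that can reach only peers holding 0, and not the full list, stays at 0 and does not mount databases automatically.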
[Diagram: DAG1 spanning a Primary Datacenter and a Secondary Datacenter, with members MBX1, MBX2, MBX3, and MBX4 and the witness servers DAG1FSW and DAG1AWS; the final build shows the values 0 0 1 1]
Datacenter Activation Coordination Mode
Side effects of enabling DAC mode
When a DAG in DAC mode is started after a complete shutdown, databases will not be mountable until all DAG members are up, running, and in communication with each other
When performing a datacenter switchover where only a single node remains in the cluster supporting the DAG, any reboot that changes both the boot time of the witness server and the boot time of the DAG member will prevent databases from mounting automatically
  If the reboots were necessary and valid operations, administrators can force the databases online without causing split brain
Related Content
EXL308 - Real World High Availability and Site Resilient Design
EXL307 - Using a Load Balancer in Your Exchange Server 2010 Environment
EXL316 - Microsoft Lync 2010: Availability, Resiliency, and Recovery
EXL203 - How to Tell Your Manager You Need Quotas on Your Mailboxes
© 2012 Microsoft Corporation. All rights reserved. Microsoft, Windows, Windows Vista and other product names are or may be registered trademarks and/or trademarks in the U.S. and/or other countries. The information herein is for informational purposes only and represents the current view of Microsoft Corporation as of the date of this presentation. Because Microsoft must respond to changing market conditions, it should not be interpreted to be a commitment on the part of Microsoft, and Microsoft cannot guarantee the accuracy of any information provided after the date of this presentation. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN THIS PRESENTATION.