active directory replication troubleshooter


Upload: ontherig

Post on 12-Apr-2015


Page 1: Active Directory Replication Troubleshooter

Active Directory Replication Troubleshooter

Directory Replication

Welcome to the Directory Replication component troubleshooter. This troubleshooter breaks down the steps required to troubleshoot the majority of directory replication issues. It is not all-encompassing, but it provides a breakdown of the components required for replication to work properly.

These links lead to the component troubleshooting steps for each of the dependencies of Active Directory replication. These troubleshooting steps are in a specific order based on the component dependencies. To begin troubleshooting, click the first link below.

Replication topology
Name Resolution
RPC Interface
Kerberos security
JET database

Overview

This diagram outlines the steps necessary for Active Directory replication to occur. Portions of this diagram will be referenced in each step of the dependency troubleshooter.

Replication Dependencies

Active Directory replication relies on several core operating system components that must be functioning before replication will succeed. The most common causes for replication failures are listed below.

Reference

Windows 2003 Help and Support - How replication works

Windows 2003 Technical Reference - Active Directory replication

Directory Replication: Data Collection

There are many data points involved in directory replication. However, most replication issues involve relatively simple troubleshooting steps. MPS Reports gathers a large amount of data, and frequently the data can be used to resolve the issue. Please gather an MPS Reports output if the following troubleshooting steps do not resolve the issue.

MPS Reports

Most of the data required to troubleshoot most directory replication issues can be found in a recent Directory Services MPS Reports output. Some additional data collection methods appear throughout this troubleshooter. If MPS Reports data can be used instead of gathering additional data, it will be called out in the troubleshooting step. The Directory Services MPS Reports can be downloaded from the following link.

http://download.microsoft.com/download/b/b/1/bb139fcb-4aac-4fe5-a579-30b0bd915706/MPSRPT_DirSvc.EXE


Next Steps

Replication Topology

Replication Topology

Repadmin is a utility that can be used to display information about replication connections to the domain controller on which it is executed. In most cases, replication fails due to a failure with the RPC connection between one domain controller and another. Always rule out DNS as the cause first!

Goals

- Determine the failing replication links
- Identify the objectGuids associated with the failures

Procedure

This procedure gathers data that is found in the repadmin.txt file from an Active Directory MPSReports output. If MPSReports has recently been run, this procedure may be skipped.

Use Repadmin To Determine Replication Topology

1.    Use repadmin /showreps to determine the replication partners and determine any failed replication attempts. In this case, the domain controller called RESKITDC01 has not been able to replicate with a domain controller called RESKITDC02.

    =====================================
    C:\>repadmin /showreps
    REDMOND\RESKITDC01
    DSA Options : (none)
    objectGuid : a0d6dbaf-4297-47b3-92b8-2d604d290bb5
    invocationID: e805158b-f7e1-4d23-a797-5121262c0fa2

    ==== INBOUND NEIGHBORS ======================================

    CN=Schema,CN=Configuration,DC=RESKIT,DC=com
        Default-First-Site-Name\RESKITDC02 via RPC
            objectGuid: 035046f0-5de5-4adb-b1fc-259614a8de64
            Last attempt @ 2004-05-20 05:58.19 failed, result 8453:
                Replication access was denied.
            Last success @ 2004-05-19 12:12.01.
            14 consecutive failure(s).

    CN=Configuration,DC=RESKIT,DC=com
        Default-First-Site-Name\RESKITDC02 via RPC
            objectGuid: 035046f0-5de5-4adb-b1fc-259614a8de64
            Last attempt @ 2004-05-20 05:58.19 failed, result 8453:
                Replication access was denied.
            Last success @ 2004-05-19 12:12.01.
            14 consecutive failure(s).

    DC=RESKIT,DC=com
        Default-First-Site-Name\RESKITDC02 via RPC
            objectGuid: 035046f0-5de5-4adb-b1fc-259614a8de64
            Last attempt @ 2004-05-20 05:58.19 failed, result 8453:
                Replication access was denied.
            Last success @ 2004-05-19 12:12.01.
            14 consecutive failure(s).

2.    Locate the objectGuid of the failing replication connection. In the above output, the objectGuid is 035046f0-5de5-4adb-b1fc-259614a8de64.
3.    Note the failing objectGuid and focus on fixing replication between this DC (the RPC client) and the inbound neighbor DC (the RPC server). This GUID will be used to test DNS in the next troubleshooting step.
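When the repadmin output is long, the failing links can be pulled out with a short script. The following is a minimal sketch, assuming the output has one field per line as in the sample above; the function name failing_links and the dictionary keys are illustrative and are not part of repadmin.

```python
import re

# Patterns modeled on the sample output in this section.
PARTNER = re.compile(r"(\S+\\\S+) via RPC")
GUID = re.compile(r"objectGuid\s*:\s*([0-9a-fA-F-]{36})")
FAILED = re.compile(r"Last attempt @ \S+ \S+ failed, result (\d+)")
STREAK = re.compile(r"(\d+) consecutive failure")


def failing_links(showreps_text):
    """Return one dict per inbound neighbor whose last attempt failed."""
    links = []
    current = None
    in_neighbors = False
    last_line = ""
    for raw in showreps_text.splitlines():
        line = raw.strip()
        if "INBOUND NEIGHBORS" in line:
            # Lines before this banner describe the local DC, not its partners.
            in_neighbors = True
            continue
        if not in_neighbors or not line:
            continue
        partner = PARTNER.search(line)
        if partner:
            # The line just before "<site>\<server> via RPC" names the partition.
            current = {"partition": last_line, "partner": partner.group(1),
                       "guid": None, "result": None, "failures": 0}
            links.append(current)
        elif current is not None:
            guid = GUID.search(line)
            failed = FAILED.search(line)
            streak = STREAK.search(line)
            if guid:
                current["guid"] = guid.group(1)
            elif failed:
                current["result"] = int(failed.group(1))
            elif streak:
                current["failures"] = int(streak.group(1))
        last_line = line
    # Keep only the neighbors that reported a failed attempt.
    return [link for link in links if link["result"] is not None]
```

Calling failing_links on the contents of repadmin.txt should yield the partner name, objectGuid, result code, and failure count for each failing partition, which are the values carried into the DNS tests.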

Next Steps

Name Resolution

Name Resolution

DNS is the primary name resolution mechanism for the TCP/IP protocol. Because Active Directory replication relies on RPC over TCP/IP, it is very important that name resolution for TCP/IP functions correctly. Basic testing involves using nslookup to query the current DNS server to make sure the requested DNS names are resolved correctly to an IP address. The names requested by the Active Directory RPC endpoint include the host name and objectGuid name of the replication partner.

Procedure

Test DNS configuration

1.    Use the ipconfig utility to dump existing records from the DNS resolver cache.

    C:\>ipconfig /displaydns

    Windows IP Configuration

    1.0.0.127.in-addr.arpa
    ----------------------------------------
    Record Name . . . . . : 1.0.0.127.in-addr.arpa.
    Record Type . . . . . : 12
    Time To Live . . . . : 0
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    PTR Record . . . . . : localhost

    RESKIT-DC2
    ----------------------------------------
    Record Name . . . . . : RESKIT-DC2.reskit.com
    Record Type . . . . . : 1
    Time To Live . . . . : 1506
    Data Length . . . . . : 4
    Section . . . . . . . : Answer
    A (Host) Record . . . : 54.34.192.30

2.    Try to resolve the host name by using DNS. Nslookup bypasses the resolver cache. Therefore, its results can be compared to any existing cached entries.

a.    Use the nslookup utility to query DNS by using the UDP protocol.

    C:\>nslookup reskit-dc2.reskit.com
    Server: ns.reskit.com
    Address: 61.53.4.32

    Name: reskit-dc2.reskit.com
    Address: 54.34.192.30

b.    If this query fails, try the same query by using the TCP protocol by specifying the -vc switch on the nslookup command line.

    C:\>nslookup -vc reskit-dc2.reskit.com
    Server: ns.reskit.com
    Address: 61.53.4.32

    Name: reskit-dc2.reskit.com
    Address: 54.34.192.30

3.    If both query attempts fail, there are several possible problems.

- The server name is incorrect. Verify the actual remote server name.
- The domain suffix is incorrect. Verify the remote server is in the domain specified.
- The DNS server being used is no longer a valid DNS server. Try the queries in step 2 by using an alternate DNS server. If directory replication is failing, DNS records may not be up to date. Use nslookup to query an alternate DNS server.

    C:\>nslookup reskit-dc2.reskit.com ns2.reskit.com
    Server: ns2.reskit.com
    Address: 61.53.4.35

    Name: reskit-dc2.reskit.com
    Address: 54.34.192.30

- Replication of DNS records has not occurred. Therefore, the DNS server cannot resolve the name to an IP address. Use an alternate DNS server, preferably a DNS server in the same site as the remote server.
- Network problems are causing DNS queries to fail. Remember, DNS queries use either the TCP or the UDP protocol. Both protocols require network connectivity to function.

4.    Verify the correct IP address of the remote server by using the ipconfig utility on the remote server.
5.    If the cached entry is determined to be in error, flush the cached entry on the local server by using the ipconfig /flushdns command.

    C:\>ipconfig /flushdns

    Windows IP Configuration

    Successfully flushed the DNS Resolver Cache.

6.    Try to resolve the GUID-based name of the failing inbound replication partner to an IP address by using NSLOOKUP. The format of the CNAME record is GUID._msdcs.DnsForestName, where GUID is the objectGuid of the failing replication partner and DnsForestName is the fully qualified DNS name of the forest.

    C:\>NSLOOKUP 035046f0-5de5-4adb-b1fc-259614a8de64._msdcs.reskit.com
    Server: ns.reskit.com
    Address: 65.53.4.32

    Name: RESKIT-DC2.reskit.com
    Address: 54.34.192.30
    Aliases: 035046f0-5de5-4adb-b1fc-259614a8de64._msdcs.reskit.com

7.    If the NSLOOKUP command successfully resolves the CNAME record and the IP address is the correct IP address for the failing replication partner, then DNS is working well enough for replication purposes between this domain controller and its replication partner.
8.    Verify the DNS server that is used to return the query is the expected DNS server.
9.    Verify that the IP address that is returned by the DNS server is the correct address for the queried names.
10.   If the correct IP address is not returned by the DNS server, either the client is querying the wrong server, or the DNS server has records that are out of date.
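The GUID-based name from step 6 is easy to mistype, so it can help to build it programmatically. The sketch below is illustrative only: dsa_cname and resolve are hypothetical helper names, and resolve uses the operating system resolver, which (unlike nslookup) may answer from the local cache.

```python
import socket
import uuid


def dsa_cname(object_guid, forest_dns_name):
    """Build GUID._msdcs.DnsForestName for a replication partner."""
    # uuid.UUID raises ValueError if the GUID is malformed.
    guid = str(uuid.UUID(object_guid))
    return f"{guid}._msdcs.{forest_dns_name.strip('.')}"


def resolve(name):
    """Return (canonical_name, addresses), or None if resolution fails."""
    try:
        canonical, _aliases, addresses = socket.gethostbyname_ex(name)
    except OSError:
        return None
    return canonical, addresses
```

For example, resolve(dsa_cname("035046f0-5de5-4adb-b1fc-259614a8de64", "reskit.com")) attempts the same lookup as the NSLOOKUP command in step 6. A result of None corresponds to a failed query, and the returned addresses can be compared with the partner's actual IP address.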


Next Steps

RPC Troubleshooting

RPC Interface

The endpoint mapper is a database that stores information about each RPC server that is running on a particular computer. The endpoint mapper is an RPC interface that listens on TCP port 135. The Directory Replication Service (DRS) listens on a dynamically assigned TCP port.

Goals

- Verify that TCP port 135 is not blocked between domain controllers (DCs).
- Verify that the Directory Replication Service on the replication partner is in the listening state.
- Verify that the TCP port that is used by the Directory Replication Service on the replication partner DC can be accessed.

Tools

Tool Install point

Portqry.exe http://www.microsoft.com/downloads/details.aspx?familyid=89811747-c74b-4638-a2d5-ac828bdc6983&displaylang=en

Rpcdump.exe http://www.microsoft.com/downloads/details.aspx?FamilyID=9d467a69-57ff-4ae7-96ee-b18c4790cffd&displaylang=en

Procedures

This procedure verifies both TCP port 135 connectivity and connectivity to the DRS RPC endpoint at the same time by using one command.

Verify port 135 connectivity and RPC endpoint connectivity by using rpcdump.exe

1.    Use rpcdump to query the endpoint mapper database and the DRS endpoint.

    c:\>rpcdump /s <partner_dc> /v /i > endpoints.txt

2.    Open the endpoints.txt file, and look for the endpoint that uses the ncacn_ip_tcp ProtSeq and has a UUID of e3514235-4b06-11d1-ab04-00c04fc2dcd2. The output should look similar to the following:

    ProtSeq:ncacn_ip_tcp
    Endpoint:1025
    NetOpt:
    Annotation:MS NT Directory DRS Interface
    IsListening:YES
    StringBinding:ncacn_ip_tcp:65.53.63.15[1025]
    UUID:e3514235-4b06-11d1-ab04-00c04fc2dcd2
    ComTimeOutValue:RPC_C_BINDING_DEFAULT_TIMEOUT
    VersMajor 4 VersMinor 0

3.    If the value of IsListening is YES, then both port 135 and the port being used by the DRS interface can be accessed. The port that is used by the DRS interface is indicated by the Endpoint value. In the above case, the TCP port being used by DRS is TCP port 1025. However, port 1025 is not the only port the DRS interface can listen on, so check the output to verify the port number.
4.    If the value of IsListening is NO, the problem is related to port blocking issues for the Endpoint port. Gather a network monitor trace and consult with a networking support engineer.
5.    If the first lines in the endpoints.txt file indicate a failure to query endpoints on the replication partner, suspect an issue with TCP port 135 connectivity between this domain controller and the replication partner. The first few lines of the endpoints.txt file should look similar to the following:

    Querying Endpoint Mapper Database...
    137 registered endpoints found.

6.    If no registered endpoints were found, the problem is connectivity to TCP port 135. Gather a network monitor trace and consult with a networking support engineer.
7.    If the endpoints.txt file indicates no problems with port connectivity, proceed to Next Steps.
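Because endpoints.txt usually lists many endpoints, the DRS entry can be located with a small parser. This is a sketch under the assumption that each field appears on its own line in Key:Value form, as in the sample output in step 2; drs_endpoints is a hypothetical helper name.

```python
DRS_UUID = "e3514235-4b06-11d1-ab04-00c04fc2dcd2"


def drs_endpoints(endpoints_text):
    """Return (tcp_port, is_listening) for each DRS endpoint over ncacn_ip_tcp."""
    records = []
    current = None
    for raw in endpoints_text.splitlines():
        key, sep, value = raw.strip().partition(":")
        if not sep:
            continue  # lines without a colon carry no Key:Value field
        if key == "ProtSeq":
            # A ProtSeq line starts a new endpoint record.
            current = {}
            records.append(current)
        if current is not None:
            current[key] = value.strip()
    return [
        (int(r["Endpoint"]), r.get("IsListening") == "YES")
        for r in records
        if r.get("ProtSeq") == "ncacn_ip_tcp"
        and r.get("UUID", "").lower() == DRS_UUID
        and r.get("Endpoint", "").isdigit()
    ]
```

An empty result suggests that the DRS interface was not returned at all, while a tuple whose second element is False corresponds to the IsListening: NO case in step 4.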

Next Steps

Replication Security

Security

Active Directory replication relies on the Kerberos security package for authentication. Security can fail for several reasons, including a failure of Active Directory replication itself.

Goals

- Verify time synchronization is within Kerberos time constraints.
- Verify the correct SPN registration (e3514235-4b06-11d1-ab04-00c04fc2dcd2/ntdsa_objectGuid/domainname)
- Verify passwords are synchronized between replication partners.
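The SPN format named in the goals above can be assembled and checked against a list of registered SPNs. The following sketch is illustrative: expected_drs_spn and spn_is_registered are hypothetical names, and the case-insensitive comparison is an assumption of this example.

```python
DRS_INTERFACE_UUID = "e3514235-4b06-11d1-ab04-00c04fc2dcd2"


def expected_drs_spn(ntdsa_object_guid, domain_name):
    """Build <DRS interface UUID>/<ntdsa objectGuid>/<domain name>."""
    return f"{DRS_INTERFACE_UUID}/{ntdsa_object_guid}/{domain_name}"


def spn_is_registered(registered_spns, ntdsa_object_guid, domain_name):
    """Check a list of SPN strings for the expected replication SPN."""
    wanted = expected_drs_spn(ntdsa_object_guid, domain_name).lower()
    # Assumption: SPNs are compared without regard to case.
    return wanted in {spn.lower() for spn in registered_spns}
```

The list passed as registered_spns could come from the servicePrincipalName values exported with ldifde later in this section.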


Tools

The setspn utility is available from the Windows Support Tools or the following URL.

http://www.microsoft.com/windows2000/techinfo/reskit/tools/existing/setspn-o.asp

Procedures

To verify the Access this computer from the network user right

1.    Check the <computername>_userrights.txt file in the Directory Services MPSReports to confirm which groups are listed. Everyone, Authenticated Users, and Enterprise Domain Controllers must have that user right for successful replication.

Important   There are cases where the Everyone group is removed from Access this computer from the network which is acceptable as long as Authenticated Users and Enterprise Domain Controllers are listed.

To check the time skew between domain controllers

1.    See Knowledge Base article 257187
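Once the clock values of two domain controllers have been collected, the comparison itself is simple arithmetic. The sketch below assumes the default Kerberos tolerance of 5 minutes; the effective value is set by domain policy and may differ. The name within_kerberos_skew is illustrative.

```python
from datetime import datetime, timedelta

# Default Kerberos tolerance for clock differences; domain policy may override it.
DEFAULT_MAX_SKEW = timedelta(minutes=5)


def within_kerberos_skew(local_time, partner_time, max_skew=DEFAULT_MAX_SKEW):
    """Return True if the two clock readings differ by no more than max_skew."""
    return abs(local_time - partner_time) <= max_skew
```

Both arguments must be expressed in the same time zone (for example, both in UTC) for the comparison to be meaningful.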

Verify userAccountControl and the Kerberos trust

1.    Ensure the Kerberos Key Distribution Center (KDC) service is started.
2.    Ensure the Trust computer for delegation check box is selected on the General tab of the domain controller Properties dialog box in Active Directory Users and Computers.
3.    Using Adsiedit or Ldp (both included in the Windows 2000 Support Tools), confirm that the userAccountControl attribute is set to 532480. To check this, perform the following steps:

a.    Click Start, click Run, and then type adsiedit.msc.
b.    Expand the Domain NC container.
c.    Expand the object below, for example DC=Contoso,DC=COM.
d.    Expand OU=Domain Controllers.
e.    Right-click CN=<domain_controller>, and select Properties.
f.    Under Select a property to view, select userAccountControl and verify the value is 532480.

Note   Check this value for each failing DC account on the local copy of AD for every partner DC. For example if DC-A and DC-B are failing replication, check the above on DC-A’s copy of AD and DC-B’s copy of AD.

4.    If the problem exists between domain controllers from different domains, verify the trust relationship between the two domains.

a.    Open Active Directory Domains and Trusts.
b.    Right-click the desired domain and select Properties.
c.    Click the Trusts tab.
d.    Highlight the domain to verify and click Edit.
e.    Click Verify.
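The userAccountControl value 532480 checked in step 3 is a bit mask: 8192 (SERVER_TRUST_ACCOUNT) plus 524288 (TRUSTED_FOR_DELEGATION). The sketch below decodes a value into flag names so that an unexpected value is easier to interpret. Only a few flags are listed, and decode_uac is a hypothetical helper name.

```python
# A subset of userAccountControl flags relevant to domain controller accounts.
UAC_FLAGS = {
    0x0002: "ACCOUNTDISABLE",
    0x1000: "WORKSTATION_TRUST_ACCOUNT",
    0x2000: "SERVER_TRUST_ACCOUNT",
    0x80000: "TRUSTED_FOR_DELEGATION",
}

EXPECTED_DC_UAC = 532480  # 0x2000 + 0x80000


def decode_uac(value):
    """Return the names of the known flags that are set in value."""
    return [name for bit, name in sorted(UAC_FLAGS.items()) if value & bit]


def is_expected_dc_value(value):
    """Return True if value matches the setting this procedure looks for."""
    return value == EXPECTED_DC_UAC
```

A domain controller account whose decoded flags lack TRUSTED_FOR_DELEGATION corresponds to the cleared Trust computer for delegation check box described in step 2. Bits that are not in the table are ignored by decode_uac.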

Modify KDC Related Parameters

1.    If the problem exists between domain controllers from different domains, add the following registry value to the upstream replication partner.

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters
    Value name: Replicator Allow SPN Fallback
    Value type: REG_DWORD
    Value data: 1

2.    Run the following command from the upstream partner:

    c:\>repadmin /add CN=Configuration,DC=<domain controller>,DC=<com> <root DC name> <fully qualified name of child domain controller>

3.    Remove the Replicator Allow SPN Fallback registry value after testing replication.

Reset Password and Refresh Kerberos Tickets

1.    Copy KLIST.EXE to the problem domain controllers.

Note   Perform the following steps on all DCs failing to pull replication (downstream DCs). A DC that is pulling replication should be chosen to keep the KDC running. If all DCs are failing to pull replication, choose any one DC and do not turn off the KDC there. The PDCe is generally chosen to keep the KDC running unless it is the failing DC and another DC is pulling replication without issue.

2.    Stop the KDC service:

    c:\>net stop KDC

3.    Purge the user's Kerberos tickets using KLIST.

    c:\>klist purgeall

Or

c:\>klist purge


(enter 'y' to confirm the purging of each ticket)

4.    Start a command prompt as the LocalSystem account using the Scheduler Service.

    c:\>at system_time /interactive cmd.exe

(where system_time should be replaced with the current local time plus one minute)

5.    At the scheduled time a new command line window will open using the system account. Purge system tickets from this window via these steps:

Note   The window only opens on the console session. It will not appear if you have a TS session to the DC.

6.    Purge the user's and machine's Kerberos tickets using KLIST.

    c:\>klist purgeall

Or

c:\>klist purge

(enter 'y' to confirm the purging of each ticket)

7.    Reset the secure channel from the domain controllers failing to pull replication to the domain controller which has the KDC service running.

    c:\>netdom resetpwd /server:PDCe /userd:domain\admin_account /passwordd:*

Type the password associated with the domain user.

Note   The domain controller chosen to have the KDC service running will be referred to as the PDCe from this step forward. However, this machine does not need to be the PDC Emulator.

8.    Access the PDCe by FQDN to force the problem DCs to request new Kerberos tickets.

    c:\>net use \\MyPDCe.MyDomain.com\IPC$

9.    Force the domain controller to replicate from the PDCe using AD Sites and Services

Note   Only force replication from the PDCe to the problem domain controllers.

Attempting to replicate from the problem domain controllers to the PDCe will fail.

10.   Perform the following steps on the PDCe only.
11.   Open AD Sites and Services.
12.   Select the PDCe server, and then select NTDS Settings.
13.   Delete the inbound connection objects from the problem domain controllers.
14.   Start the KCC.

    c:\>repadmin /kcc

15.   If the problem domain controllers exist in only one domain with more than two domain controllers, force all computer accounts to be replicated throughout the enterprise. This means all domain controllers must be synchronized with all other copies of their domain. For each computer that is reporting a replication error, use the following command to force that computer to become synchronized. The domain to synchronize must be specified. For more information see KB article 296993.

    c:\>repadmin /syncall /d /e problem_domain_controller domain_dn

For large environments, remove the /e switch to replicate domain controllers within the same site, or use /sync to target specific domain controllers in remote sites.
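When several domain controllers report errors, the syncall command lines can be generated rather than typed by hand. This is a minimal sketch that follows the argument order shown in the step above; domain_dn and syncall_commands are hypothetical helper names, and the script only builds command strings without running them.

```python
def domain_dn(dns_name):
    """Convert a DNS domain name such as reskit.com to DC=reskit,DC=com."""
    return ",".join(f"DC={label}" for label in dns_name.strip(".").split("."))


def syncall_commands(problem_dcs, dns_domain, enterprise=True):
    """Build one repadmin /syncall command line per problem domain controller."""
    # Dropping /e keeps the synchronization within the same site.
    switches = "/d /e" if enterprise else "/d"
    dn = domain_dn(dns_domain)
    return [f"repadmin /syncall {switches} {dc} {dn}" for dc in problem_dcs]
```

For example, syncall_commands(["RESKITDC01"], "reskit.com", enterprise=False) produces the single-site form of the command for one domain controller.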

Check for SPN Registration

1.    Make sure the Service Principal Name (SPN) is registered for each domain controller object on each partner domain controller. For more information see KB article 308111.
2.    Review the Registered Service Principal Names section of the Netdiag output on partner domain controllers to ensure that the test passes. Export the SPNs of each domain controller object involved in the replication failure from each partner using the following command:

    ldifde -f spndump.txt -p base -l servicePrincipalName -d <DN of DC>

3.    If you have not already gathered an MPSReports output, you can gather the required netdiag output using the following command.

    c:\>netdiag /v > netdiag.txt

4.    Visually compare the SPNs or use the Windiff tool from the Windows 2000 Support Tools to compare the files for differences. Under the Options menu in Windiff, uncheck everything except Show different files, Show left-only lines, and Show right-only lines. After identifying the missing SPNs, edit the good SPN file as follows.

Change changetype: add to changetype: modify.

Add replace: servicePrincipalName after the changetype line.

Add "-" to the last line of the file.

5.    Import the correctly registered SPNs on the partner domain controllers that do not have proper SPNs registered for their replication partner domain controllers.

    ldifde -i -f goodSPNs.txt
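The three edits to the good SPN file described in step 4 can also be applied by a script. The sketch below assumes the export contains a single entry (one dn: line), since each entry would otherwise need its own terminating "-" line; to_modify_ldif is a hypothetical helper name.

```python
def to_modify_ldif(exported):
    """Rewrite a single-entry ldifde export so that it replaces the SPN list."""
    out = []
    for line in exported.strip().splitlines():
        if line.strip().lower() == "changetype: add":
            # Edit 1: switch the operation from add to modify.
            out.append("changetype: modify")
            # Edit 2: name the attribute to replace, right after changetype.
            out.append("replace: servicePrincipalName")
        else:
            out.append(line)
    # Edit 3: terminate the modification with a "-" on the last line.
    out.append("-")
    return "\n".join(out) + "\n"
```

The returned text can be saved as goodSPNs.txt and reviewed before it is imported with ldifde.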

Check Permissions on the Directory Partitions

1.    Ensure the Enterprise Domain Controllers group has the required permissions on the directory partition’s access control list (ACL).

a.    Start AdsiEdit.
b.    Right-click each partition object, and then select Properties.
c.    On the View menu, select Advanced Features.
d.    Select the Security tab, click Enterprise Domain Controllers in the name list, and then make sure the following permissions are selected under Allow:

- Manage Replication Topology
- Replicating Directory Changes
- Replication Synchronization

2.    Use Active Directory Sites and Services to make sure the server object and its corresponding NTDS Settings child object exist in the correct site.
3.    Verify the following Group Policy security options under Security Settings match on all partner domain controllers.

- Digitally Sign Client Communication (Always)
- Digitally Sign Client Communication (When Possible)
- Digitally Sign Server Communication (Always)
- Digitally Sign Server Communication (When Possible)
- LAN Manager Authentication Level
- Crash on Audit Fail

4.    Check for Kerberos fragmentation by typing ping <destination computer> -f -l 1472. If it fails at 1472, then packets are likely being fragmented. For more information see KB article 244474.
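The value 1472 in step 4 follows from the standard Ethernet MTU of 1500 bytes minus the 20-byte IPv4 header and the 8-byte ICMP header. The sketch below reproduces that arithmetic for other MTU values; it assumes IPv4 without IP options, and the function name is illustrative.

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def max_unfragmented_ping_payload(mtu=1500):
    """Largest ping -l value that fits in one packet for the given MTU."""
    return mtu - IPV4_HEADER_BYTES - ICMP_HEADER_BYTES
```

If a link on the path has a smaller MTU, the same calculation gives the largest payload that should still pass with the -f (do not fragment) switch.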

JET Database

There are currently no troubleshooting steps related to the Jet database used to store the Active Directory. The Ntds.dit file is a Jet database. Therefore, typical Jet-related issues still apply. However, the Jet engine has been thoroughly tested and rarely fails. If a Jet error is returned, query the Knowledge Base for existing Jet issues.

http://support.microsoft.com/search/?adv=1

Typical Issues

Antivirus Scanning of the NTDS Folder

Antivirus software can affect the Jet database because it opens the transaction log files that are used by Jet. If the antivirus software keeps a transaction log file open, it may prevent updates to the Active Directory database (Ntds.dit) from occurring in a timely manner. This can cause failures in the Active Directory components. Microsoft recommends that you exclude the NTDS directory from antivirus scans.

Missing Log Files

To avoid the loss of data caused by a power outage or disk failure, the Jet engine updates the Jet database by first writing to a transaction log and then writing to the database. By first writing to a transaction log, the database is less susceptible to corruption caused by power failures or hardware failures. As transactions from the log files are written to the database, a checkpoint is created to mark the transactions that have been successfully written. If a power failure occurs, the Active Directory components will start during the restart of the computer. However, any transactions that have not been written to the database will be read from the transaction logs and written to the database. If any log files that have not been checkpointed are missing, the Jet engine has no way of updating its database and therefore will not initialize successfully. Typically this behavior occurs when one of the following scenarios is true:

- Deleted log files without realizing the importance of the files.
- Restored the database from a backup but did not restore the log files along with the Ntds.dit file.

Insufficient Disk Space

The Jet engine relies on first updating a transaction log file before updating the database. If sufficient disk space does not exist to allow updates to the transaction log, updates to the Jet database will fail.

Replication Symptoms

For more information about replication errors, click one of the following topics depending on the symptoms that you are experiencing.

Symptoms

Authentication Errors
Global Catalog Errors
Replication Engine Errors
Disjointed Namespace Issues

Authentication Errors

Active Directory replication is a client/server application that uses an RPC mechanism for communication between domain controllers. Authentication is handled by the RPC components and uses the Kerberos security package by default. Most of the authentication errors encountered by the replication engine are a result of issues with the Kerberos protocol.

Reference

Microsoft LDAP Error Codes

Microsoft Kerberos Overview

Related Sections

Kerberos Troubleshooting

Access is denied

This issue typically indicates a Kerberos authentication problem, although there are several exceptions. These steps outline how to resolve the authentication failure.

To verify the Access this computer from the network user right

5.    Check the <computername>_userrights.txt file in the Directory Services MPSReports to confirm which groups are listed. Everyone, Authenticated Users, and Enterprise Domain Controllers must have that user right for successful replication.

Important   There are cases where the Everyone group is removed from Access this computer from the network which is acceptable as long as Authenticated Users and Enterprise Domain Controllers are listed.

To check the time skew between domain controllers

6.    See Knowledge Base article 257187

To check the CrashOnAuditFail Registry Key

7.    Check the <computername>_regentries.txt file in the Directory Services MPSReports to confirm if crashonauditfail [REG_DWORD] = 0x2
8.    If CrashOnAuditFail = 0x2, perform the following steps:

a.    Click Start, click Run, and then type regedit.
b.    Expand HKEY_LOCAL_MACHINE\System\CurrentControlSet\Control\LSA
c.    Right-click crashonauditfail, and select Modify.
d.    Under Value data:, select 2 and change the value to 0.
e.    Reboot the domain controller.

9.    See Knowledge Base article 823659
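The CrashOnAuditFail check in the steps above can be scripted against the regentries file. The sketch below assumes the line format quoted in that step, crashonauditfail [REG_DWORD] = 0x2; the real file layout may differ, and crash_on_audit_fail is a hypothetical helper name.

```python
import re

# Assumed line format: crashonauditfail [REG_DWORD] = 0x2
PATTERN = re.compile(
    r"crashonauditfail\s*\[REG_DWORD\]\s*=\s*(0x[0-9a-fA-F]+|\d+)",
    re.IGNORECASE,
)


def crash_on_audit_fail(regentries_text):
    """Return the CrashOnAuditFail value as an int, or None if it is absent."""
    match = PATTERN.search(regentries_text)
    if not match:
        return None
    raw = match.group(1)
    # Accept both hexadecimal (0x2) and decimal (2) notation.
    return int(raw, 16) if raw.lower().startswith("0x") else int(raw)
```

A return value of 2 corresponds to the condition in which the value should be changed to 0 and the domain controller rebooted.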

Verify userAccountControl and the Kerberos trust

10.   Ensure the Kerberos Key Distribution Center (KDC) service is started.
11.   Ensure the Trust computer for delegation check box is selected on the General tab of the domain controller Properties dialog box in Active Directory Users and Computers.
12.   Using Adsiedit or Ldp (both included in the Windows 2000 Support Tools), confirm that the userAccountControl attribute is set to 532480. To check this, perform the following steps:

a.    Click Start, click Run, and then type adsiedit.msc.
b.    Expand the Domain NC container.
c.    Expand the object below, for example DC=Contoso,DC=COM.
d.    Expand OU=Domain Controllers.
e.    Right-click CN=<domain_controller>, and select Properties.
f.    Under Select a property to view, select userAccountControl and verify the value is 532480.

Note   Check this value for each failing DC account on the local copy of AD for every partner DC. For example if DC-A and DC-B are failing replication, check the above on DC-A’s copy of AD and DC-B’s copy of AD.

13.   If the problem exists between domain controllers from different domains, verify the trust relationship between the two domains.

a.    Open Active Directory Domains and Trusts.
b.    Right-click the desired domain and select Properties.
c.    Click the Trusts tab.
d.    Highlight the domain to verify and click Edit.
e.    Click Verify.

Modify KDC Related Parameters

14.   If the problem exists between domain controllers from different domains, add the following registry value to the upstream replication partner.

    HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters
    Value name: Replicator Allow SPN Fallback
    Value type: REG_DWORD
    Value data: 1

15.   Run the following command from the upstream partner:

    c:\>repadmin /add CN=Configuration,DC=<domain controller>,DC=<com> <root DC name> <fully qualified name of child domain controller>

16. Remove the Replicator Allow SPN Fallback registry value after testing replication.

Reset Password and Refresh Kerberos Tickets

17.   Copy KLIST.EXE to the problem domain controllers.

Note   Perform the following steps on all DCs failing to pull replication (downstream DCs). A DC that is pulling replication should be chosen to keep the KDC running. If all DCs are failing to pull replication, choose any one DC and do not turn off the KDC there. The PDCe is generally chosen to keep the KDC running unless it is the failing DC and another DC is pulling replication without issue.

18.   Stop the KDC service:

    c:\>net stop KDC

19.   Purge the user's Kerberos tickets using KLIST.

    c:\>klist purgeall

Or

c:\>klist purge

(enter 'y' to confirm the purging of each ticket)

20.   Start a command prompt as the LocalSystem account using the Scheduler Service.

    c:\>at system_time /interactive cmd.exe

(where system_time should be replaced with the current local time plus one minute)

21. At the scheduled time a new command line window will open using the system account. Purge system tickets from this window via these steps:

Note   The window only opens on the console session. It will not appear if you have a TS session to the DC.

22.   Purge the user's and machine's Kerberos tickets using KLIST.

    c:\>klist purgeall

Or

c:\>klist purge

(enter 'y' to confirm the purging of each ticket)

23.   Reset the secure channel from the domain controllers failing to pull replication to the domain controller which has the KDC service running.

    c:\>netdom resetpwd /server:PDCe /userd:domain\admin_account /passwordd:*

Type the password associated with the domain user.

Note   The domain controller chosen to have the KDC service running will be referred to as the PDCe from this step forward. However, this machine does not need to be the PDC Emulator.

24.   Access the PDCe by FQDN to force the problem DCs to request new Kerberos tickets.

    c:\>net use \\MyPDCe.MyDomain.com\IPC$

25. Force the domain controller to replicate from the PDCe using AD Sites and Services


Note   Only force replication from the PDCe to the problem domain controllers.

Attempting to replicate from the problem domain controllers to the PDCe will fail.

26. Perform the following steps on the PDCe only.

27. Open AD Sites and Services.

28. Select the PDCe server, and then select NTDS Settings.

29. Delete the inbound connection objects from the problem domain controllers.

30. Start the KCC.

c:\>repadmin /kcc

31. If the problem domain controllers exist in only one domain with more than two domain controllers, force all computer accounts to be replicated throughout the enterprise. This means all domain controllers must be synchronized with all other copies of their domain. For each computer that is reporting a replication error, use the following command to force that computer to become synchronized. The domain to synchronize must be specified. For more information see KB article 296993.

c:\>repadmin /syncall /d /e problem_domain_controller domain_dn

For large environments, remove the /e switch to replicate only with domain controllers in the same site, or use /sync to target specific domain controllers in remote sites.

Check for SPN Registration

32. Make sure the Service Principal Name (SPN) is registered for each domain controller object on each partner domain controller. For more information see KB article 308111.

33. Review the Registered Service Principal Names section of the Netdiag output on partner domain controllers to ensure that the test passes. Export the SPNs of each domain controller object involved in the replication failure from each partner using the following command:

ldifde -f spndump.txt -p base -l servicePrincipalName -d <DN of DC>

34. If you have not already received an MPS Report, you can gather the required Netdiag output using the following command.

c:\>netdiag /v > netdiag.txt

35. Visually compare the SPNs or use the Windiff tool from the Windows 2000 Support Tools to compare the files for differences. Under the Options menu in Windiff, uncheck everything except Show different files, Show left-only lines, and Show right-only lines. After identifying the missing SPNs, edit the good SPN file as follows.

Change changetype: add to changetype: modify.

Add replace: servicePrincipalName after the changetype line.

Add "-" to the last line of the file.

36. Import the correctly registered SPNs on the partner domain controllers that do not have proper SPNs registered for their replication partner domain controllers.

ldifde -i -f goodSPNs.txt
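The three manual edits to the good SPN file can also be applied by script. The following Python sketch is illustrative only: it assumes the export holds a single object (as produced with -p base), and the function name to_modify_ldif is hypothetical.

```python
def to_modify_ldif(export_text: str, attribute: str = "servicePrincipalName") -> str:
    """Turn a single-entry LDIF export into a 'replace' modification.

    Mirrors the manual edits: changetype add -> modify, a 'replace:' line
    after the changetype line, and a '-' terminator as the last line.
    Assumption: the export contains exactly one object.
    """
    out = []
    for line in export_text.strip().splitlines():
        if line.replace(" ", "").lower() == "changetype:add":
            out.append("changetype: modify")
            out.append(f"replace: {attribute}")
        else:
            out.append(line)
    out.append("-")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    sample = (
        "dn: CN=DC1,OU=Domain Controllers,DC=contoso,DC=com\n"
        "changetype: add\n"
        "servicePrincipalName: HOST/dc1.contoso.com\n"
    )
    print(to_modify_ldif(sample), end="")
```

Review the generated text before importing it, since a replace operation overwrites every existing servicePrincipalName value on the target object.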

Check Permissions on the Directory Partitions

37. Ensure the Enterprise Domain Controllers group has the required permissions on the directory partition’s access control list (ACL).

u.    Start AdsiEdit.
v.    Right-click each partition object, and then select Properties.
w.    On the View menu, select Advanced Features.
x.    Select the Security tab, click Enterprise Domain Controllers in the name list, and then make sure the following permissions are selected under Allow:
•    Manage Replication Topology
•    Replicating Directory Changes
•    Replication Synchronization

38. Use Active Directory Sites and Services to make sure the server object and its corresponding NTDS Settings child object exist in the correct site.

39. Verify the following Group Policy security options under Security Settings match on all partner domain controllers.

Windows 2000 Group Policy

•    Digitally Sign Client Communication (Always)
•    Digitally Sign Client Communication (When Possible)
•    Digitally Sign Server Communication (Always)
•    Digitally Sign Server Communication (When Possible)
•    LAN Manager Authentication Level
•    Shut down system immediately if unable to log security audits

Windows Server 2003 Group Policy

•    Microsoft network client: Digitally sign communications (always)
•    Microsoft network client: Digitally sign communications (if server agrees)
•    Microsoft network server: Digitally sign communications (always)
•    Microsoft network server: Digitally sign communications (if client agrees)
•    Network security: LAN Manager authentication level
•    Audit: Shut down system immediately if unable to log security audits

40. Check for Kerberos fragmentation by typing ping <destination computer> -f -l 1472. If it fails at 1472, then packets are likely being fragmented. For more information see KB article 244474.
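The 1472-byte figure follows from the standard 1500-byte Ethernet MTU: the ping payload is wrapped in an 8-byte ICMP header and a 20-byte IPv4 header (assuming no IP options). A short Python sketch of the arithmetic, with hypothetical function names:

```python
IPV4_HEADER_BYTES = 20  # IPv4 header without options
ICMP_HEADER_BYTES = 8   # ICMP echo request header


def packet_size(payload_bytes: int) -> int:
    """Total IPv4 packet size produced by `ping -l payload_bytes`."""
    return payload_bytes + ICMP_HEADER_BYTES + IPV4_HEADER_BYTES


def max_unfragmented_payload(mtu_bytes: int) -> int:
    """Largest `ping -f -l` payload that fits in one packet for a given path MTU."""
    return mtu_bytes - ICMP_HEADER_BYTES - IPV4_HEADER_BYTES


if __name__ == "__main__":
    print(packet_size(1472))               # size on the wire for the test in step 40
    print(max_unfragmented_payload(1500))  # payload limit for a 1500-byte MTU
```

If the ping only succeeds with a smaller payload, adding 28 to the largest working value gives an estimate of the effective path MTU.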

Target account name is incorrect

The error messages "Target account name is incorrect" and "Target principal name is incorrect" are essentially the same error, so the troubleshooting steps for the two messages are the same.

Procedures

Troubleshooting Steps

1.    If replication is failing between domain controllers in different domains, add the registry value below to the upstream replication partner.

HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters
Value name:  Replicator Allow SPN Fallback
Value type:  REG_DWORD
Value data:  1

2.    Run the following command from the upstream partner: repadmin /add CN=Configuration,DC=<Contoso>,DC=<com> <root DC name> <fully qualified name of child DC>

Remove the Replicator Allow SPN Fallback registry value after testing replication.
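Because the value has to be added and later removed, it can help to keep both changes as .reg text. The Python sketch below only renders the text of a registry file for the key and value named in step 1; the function name reg_dword is hypothetical, and saving and importing the file is left to the administrator.

```python
KEY = r"HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NTDS\Parameters"
VALUE_NAME = "Replicator Allow SPN Fallback"


def reg_dword(key_path, value_name, data=None):
    """Render .reg text that sets a REG_DWORD value, or deletes it when data is None."""
    lines = ["Windows Registry Editor Version 5.00", "", f"[{key_path}]"]
    if data is None:
        lines.append(f'"{value_name}"=-')  # a trailing '-' deletes the value on import
    else:
        lines.append(f'"{value_name}"=dword:{data:08x}')
    return "\r\n".join(lines) + "\r\n"


if __name__ == "__main__":
    print(reg_dword(KEY, VALUE_NAME, 1))  # enable SPN fallback for the test
    print(reg_dword(KEY, VALUE_NAME))     # remove the value after testing
```

Keeping the removal text next to the enabling text makes it harder to forget the cleanup step once replication has been tested.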

3.    Search for duplicate computer or user accounts in the domain of the failing domain controller and its upstream replication partner. For more information see KB article 310340.

4.    Review the server objects of the problematic domain controllers in Active Directory Sites and Services to make sure there are no duplicates or conflicting objects present.

5.    Verify that multiple server names with the same IP address are not registered in DNS, which can happen if a domain controller is renamed and old DNS records are not scavenged. Use Adsiedit or Ldp (both are included in the Windows 2000 Support Tools) to verify that the dNSHostName attribute on each domain controller is populated with the correct value. To do this, perform the following steps.

y.    From Start\Run, run adsiedit.msc.
z.    Expand the Domain NC container.
aa.  Expand the object below it, for example DC=Contoso,DC=COM.
bb.  Expand OU=Domain Controllers.
cc.  Right-click CN=<domain_controller>, and select Properties.
dd.  Under Select a property to view, select dNSHostName and verify the value contains the fully qualified name of the server, for example dc1.contoso.com.

6.    If the problem domain controllers exist in only one domain with more than two domain controllers, then force all computer accounts to be replicated throughout the enterprise. That means all domain controllers must be synchronized with all other copies of their domain. For each computer that is reporting a replication error, use the following command to force that computer to become synchronized.

repadmin /syncall /d /e <problem domain controller> <DN of domain>

For large environments, remove the /e switch to replicate only with domain controllers in the same site, or use /sync to target specific domain controllers in remote sites.

7.    If the failing domain controllers reside in different domains, then specify the configuration partition. For more information see KB article 296993.

repadmin /syncall /d /e <problem domain controller> <DN of config>

For large environments, remove the /e switch to replicate only with domain controllers in the same site, or use /sync to target specific domain controllers in remote sites.
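The two commands above differ only in the naming context passed as the last argument. As a rough illustration, the Python sketch below derives a domain DN and a configuration DN from DNS names and assembles the command line in the same argument order; all function names are hypothetical, and the configuration DN assumes the DNS name supplied is the forest root domain.

```python
def domain_dn(dns_name: str) -> str:
    """Convert a DNS domain name such as contoso.com to DC=contoso,DC=com."""
    return ",".join(f"DC={label}" for label in dns_name.strip(".").split("."))


def config_dn(forest_root_dns: str) -> str:
    """DN of the configuration partition (assumes the forest root DNS name is given)."""
    return "CN=Configuration," + domain_dn(forest_root_dns)


def syncall_command(dc: str, naming_context: str, enterprise: bool = True) -> str:
    """Build the repadmin /syncall line; drop /e to stay within the local site."""
    switches = "/d /e" if enterprise else "/d"
    return f"repadmin /syncall {switches} {dc} {naming_context}"


if __name__ == "__main__":
    # Same domain: synchronize the domain partition.
    print(syncall_command("dc1.contoso.com", domain_dn("contoso.com")))
    # Different domains: synchronize the configuration partition.
    print(syncall_command("dc1.child.contoso.com", config_dn("contoso.com")))
```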

8.    If the problem exists between domain controllers from different domains, check the trust relationship by doing the following:

ee.  Open Active Directory Domains and Trusts.
ff.   Right-click the desired domain and select Properties.
gg.  Click the Trusts tab.
hh.  Highlight the domain to verify and click Edit.
ii.    Click Verify.

9.    The Netdom tool, included in the Windows 2000 Support Tools, can also be used to verify the trust.

netdom trust <trusting_domain_name> /domain:<trusted_domain_name> /userd:<administrator> /passwordd:<password> /verify /kerberos

10. If the error happens when attempting replication between two domain controllers in different domains that have a parent/child or tree root trust relationship, it may be the result of a missing object that represents the trust relationship between the two domains. This object is known as a trustedDomain object and is found in the System container in the Active Directory Users and Computers tool. This type of object directly relates to the trust relationships displayed in Active Directory Domains and Trusts. If this object is not present in Active Directory, cross-domain authentication will fail. If you discover that the trustedDomain object is missing, refer to the “Missing trustedDomain object” section of the troubleshooter.

11. Make sure the Service Principal Name (SPN) is registered for each domain controller object on each partner domain controller. For more information see KB article 308111.

12. Review the Registered Service Principal Names section of the Netdiag output on partner domain controllers to ensure that the test passes. Export the SPNs of each domain controller object involved in the replication failure from each partner using the following command:

ldifde -f spndump.txt -p base -l servicePrincipalName -d <DN of DC>

Either visually compare the SPNs or use the Windiff tool from the Windows 2000 Support Tools to compare the files for differences. Under the Options menu in Windiff, uncheck everything except Show different files, Show left-only lines, and Show right-only lines. Once you have identified the missing SPNs, edit the good SPN file with the following steps:

Change changetype:  add to changetype:  modify.

Add replace:  servicePrincipalName after the changetype line.

Add "-" to the last line of the file.

13. Import the correctly registered SPNs on the partner domain controllers that do not have proper SPNs registered for their replication partner domain controllers.

ldifde -i -f goodSPNs.txt
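As an alternative to a visual or Windiff comparison, the SPN values in two exports can be compared as sets. This is a minimal Python sketch, assuming plain-text (not base64-encoded) servicePrincipalName lines; the function names are hypothetical.

```python
def spns(ldif_text: str) -> set:
    """Collect the servicePrincipalName values found in an LDIF export."""
    found = set()
    for line in ldif_text.splitlines():
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "serviceprincipalname":
            found.add(value.strip())
    return found


def missing_spns(good_ldif: str, suspect_ldif: str) -> list:
    """SPNs present in the good export but absent from the suspect export."""
    return sorted(spns(good_ldif) - spns(suspect_ldif))


if __name__ == "__main__":
    good = "servicePrincipalName: HOST/dc1.contoso.com\nservicePrincipalName: HOST/DC1\n"
    suspect = "servicePrincipalName: HOST/DC1\n"
    for spn in missing_spns(good, suspect):
        print("missing:", spn)
```

The comparison is exact and case-sensitive, so differences in capitalization are reported as missing values and should be checked by eye.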

LDAP Error 0x31

LDAP error 0x31 is the error returned from the LDAP engine when credentials appear to be invalid. This is usually the result of a Kerberos failure during a secure LDAP bind from one domain controller to another.

Procedures

Password Synchronization

1.    Attempt to reset the computer account password and force a refresh of Kerberos tickets. Use the Netdom tool from the Windows 2000 Support Tools to reset the machine account password.

netdom resetpwd /server:<computername> /userd:<domain>\administrator /passwordd:<password>

Note Run the command on the problem domain controller. <computername> is any domain controller other than the domain controller with the invalid password.

2.    Set the Kerberos Key Distribution Center (KDC) service to Manual on the problem domain controller and reboot.

3.    After the reboot, start the KDC service and change it back to Automatic.

4.    Sometimes the HKEY_LOCAL_MACHINE\Security\Policy\PolAcDmN registry key is set to the computer name instead of the NetBIOS domain name. Use Regedt32 to view this value, because Regedit does not display REG_BINARY values properly.

5.    By default SYSTEM is the only account with permissions to the SECURITY key under HKEY_LOCAL_MACHINE. To allow administrators to view the value of HKEY_LOCAL_MACHINE\Security\Policy\PolAcDmN, the permissions on the SECURITY key need to be modified.

To modify the permissions on the Security key

1.    Start regedt32.exe.
2.    Select Edit Permissions.
3.    Highlight Administrators.
4.    Check Full Control under Allow.
5.    Click OK.
6.    Close the Registry Editor.
7.    Restart regedt32.exe.

8.    Highlight the No Name value and choose Display binary data from the View menu.

9.    Confirm that the value in HKEY_LOCAL_MACHINE\Security\Policy\PolPrDmN is set to the NetBIOS domain name.

10. Copy that value and paste it into HKEY_LOCAL_MACHINE\Security\Policy\PolAcDmN.

If the trustedDomain object is missing, there will usually be an Event ID 1265 logged in the directory service event log referencing a "Target account name is incorrect" error. If the error is being reported for replication between two domain controllers of different domains which have a parent/child or tree root trust relationship, this error may be the result of a missing object that represents the trust relationship between the two domains. This object is known as a trustedDomain object and is found in the System container in Active Directory Users and Computers. If this object is not present, cross-domain authentication will fail. To resolve this issue perform the following steps. For more information see KB article 257844.

Restore a missing trustedDomainObject (TDO)

1.    Work from the domain that is generating the Event ID 1265 or “LDAP Bind error 31” error messages.

2.    Open Active Directory Domains and Trusts on the domain controller that holds the PDC Emulator operations master role for the domain.

3.    Right-click the object that represents the domain, and then select Properties.

4.    Click the Trusts tab, and click Add to create both sides of the trust relationship to the remote domain. Because this would normally be a Kerberos trust, creating both sides of the trust is required. Creating the trusted side first generates the error message "Active Directory cannot verify the trust. Access is denied."

5.    Click OK. Active Directory Domains and Trusts displays the trust as a transitive, shortcut trust. Adding the trusting side generates the message "To verify the new trust, you must have permissions to administer trusts for the domain <domain name>. Do you want to verify the new trust?"

6.    Click Yes, and supply the administrator credentials for the remote domain. When prompted for credentials, specify the NetBIOS domain name as well as the user name, for example CONTOSO\Administrator. The following error message is generated: "Active Directory cannot verify the trust. Access is denied."

7.    Click OK. Again, note that Active Directory Domains and Trusts displays the trust as a transitive, shortcut trust.

8.    After both sides of the trust are created, run the Netdom command below (Netdom is included in the Windows 2000 Support Tools):

netdom trust <local_domain> /domain:<remote_domain> /userd:administrator /passwordd:* /usero:administrator /passwordo:* /reset /twoway

Where <local_domain> is the domain on which the trust is being created and <remote_domain> is the parent, child, or root domain being trusted. In either case, the fully qualified domain name (FQDN) should be used, for example "Contoso.com". This should result in the following output:

Type the password associated with the domain user:  (This is UserD)
Type the password associated with the object user:  (This is UserO)
Resetting the trust passwords between <local_domain> and <remote_domain>.
The trust between <local_domain> and <remote_domain> has been successfully reset and verified.
The command completed successfully.

9.    Reboot the domain controller where these changes were made.

10. Wait several minutes for Active Directory to establish a secure channel and the Knowledge Consistency Checker (KCC) to attempt to re-establish replication links to the domain controllers in the remote domain. During this period, test that logons across the trust relationship are successful and that no errors are logged in the directory service event log.

Note   This procedure should only be performed if the trustedDomain object for the remote domain is not present in the System container.

Global Catalog Errors

This section covers errors related to issues with Global Catalog servers.

No Global Catalog can be contacted

A failure to discover a global catalog can occur for a number of reasons, especially if it is a Microsoft® Exchange server that is failing to locate a global catalog. However, there are several tests that can be performed to verify whether a global catalog is unavailable or the problem client is just not receiving the advertisement.

Tools

The nltest utility is included in the Support Tools package.

Procedures

Using NLTEST to discover a global catalog

1.    Run the following command to attempt to locate a global catalog server.

nltest /dsgetdc:<fully qualified name of the domain> /gc /force

Using Windows Address Book to test global catalog functionality

1.    Click Start.
2.    Select Run.
3.    Type wab.exe and click OK.
4.    In the Windows Address Book, select Find People.
5.    Select Active Directory from the Look In box.
6.    Type in a user name (for example, John Doe).
7.    Click Find Now.

Replication Global Catalog promotion

When promoting a server to be a global catalog, Event ID 1119 indicates the promotion was successful. If Event ID 1119 is not logged, use the following procedure to determine the cause.

Procedure

Resolving Global Catalog promotion issues

1.    Review the directory service event log for relevant events such as 1559, 1578, 1110, and 1126. If you do not see any relevant events, enable diagnostic logging on the global catalog by configuring the following values in the registry. For more information see KB article 314980.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics
Inter-Site Messaging - 2
Replication Events - 3
Internal Processing - 1
Global Catalog - 4

2.    If an Event ID 1119 exists stating that the domain controller was successfully promoted as a global catalog, and it was a recently logged event, then the domain controller may have started advertising before it fully synchronized all domain partitions hosted by domain controllers in remote sites. By default, any Windows 2000 domain controller on SP2 or lower will only check to ensure that all domain partitions hosted in its own site have successfully replicated. If there is a domain context in the forest that does not have a domain controller in the server’s local site or another global catalog in the site containing that partition, the domain controller will still advertise as a global catalog even though those partitions have not yet synchronized. This behavior is controlled by the following registry value.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Parameters
Value Name:  Global Catalog Partition Occupancy
Value Type:  REG_DWORD
Default Value:  4

3.    The default value of 4 means all partitions in the same site must be fully synchronized. This is also the maximum value if the domain controller is at SP2 or lower. In SP3, the value can be set to 6, which requires all partitions in the forest to be synchronized before a domain controller will advertise as a global catalog. If the issue involves Microsoft Exchange Server, reference KB article 304403 for more information. Creating a connection object to the appropriate domain controller hosting the missing domain partition and forcing replication may expedite the process. To do this, perform the following steps.

jj.   In Active Directory Sites and Services, expand the problem server's site, and then the server object for that server.
kk.  Right-click NTDS Settings and select New Active Directory Connection.
ll.    Locate a domain controller that hosts the missing domain partition, double-click it, and click OK.
mm. Right-click the new connection object and select Replicate Now.

Alternatively, you can use repadmin.exe to force replication rather than Active Directory Sites and Services from step 3.

repadmin /sync DC=<MyMissingDomainName>,DC=<com> <MyProblemServerName> <GUID of source server, for example 0d67193c-8cb1-4c4c-bd7c-af98e11d6d67>

4.    To obtain the GUID of the server, run repadmin /showreps \\<source server> and copy the ObjectGuid.

5.    If no Event ID 1119 exists in the directory service event log, or the domain controller is not advertising as a global catalog, then determine which partitions have not replicated yet. Focusing on any Knowledge Consistency Checker (KCC) errors, specifically Event ID 1265, will help determine which partitions it is having problems with. If no helpful events are logged, enable diagnostic logging as described in KB article 314980. The more important registry entries to focus on are the following:

Replication Events:  set to 3
Inter-Site Messaging:  set to 2
Internal Processing:  set to 1
Global Catalog:  set to 4

Note Remove these settings when finished troubleshooting, as they will continue to fill up the event log.

6.    Once relevant events are identified, try to determine the reason for the replication failure, which is often listed at the bottom of the event description, generally referring to a “DNS lookup failure” or “Access is denied” error. After obtaining the error refer back to the troubleshooter and follow steps in the section pertaining to that error message.7.    After resolving all of the relevant errors, to verify the global catalog is advertising you can check the isGlobalCatalogReady value to ensure it is TRUE.

nn.  Start the Ldp tool included in the Windows 2000 Support Tools.
oo.  Click Connect on the Connections menu.
pp.  Type the name of the global catalog server that is used for lookup in the Server Name box.
qq.  Type 3268 in the Port Number box.
rr.   Clear the Connectionless check box.
ss.  Look for the isGlobalCatalogReady value in the text output.

Replication Engine Errors

This section covers symptoms related to Active Directory replication.

Replication Operation Encountered a Database Error

This error is generally seen in Event ID 1084 and 1085, as well as in Dcdiag output. The error occurs if there is a mangled object or attribute such as DEL:<GUID> or CNF:<GUID> seen in the event description. This condition can block the promotion of a new global catalog, replication of a new partition, and can prevent an additional domain controller from being promoted. New features in the SP3 or above versions of Ntdsutil’s go fixup command can help clean up mangled attributes. SP3 and above also contain new functionality to help prevent this condition from happening in the first place.

Symptom

Directory Services Event Log

Source: NTDS

Error: 1084

Error Text:

Procedures

To enable diagnostic logging

1.    Enable diagnostic logging using the steps below and then force replication.

2.    Use Regedit to locate the 19 Inter-Site Messaging value under the following key in the registry.

HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\NTDS\Diagnostics

3.    Note the original values of the following registry entries so you can restore the original values after you identify the problem.

4.    On the Edit menu, click DWORD, type 2, and then click OK.

5.    Locate the 5 Replication Events value under the same key.

6.    On the Edit menu, click DWORD, type 4, then click OK.

7.    Locate the 9 Internal Processing value under the same key.

8.    On the Edit menu, click DWORD, type 1, then click OK.

9.    Quit Regedit.

10. Open Active Directory Sites and Services, select the server object of the problem server, and force inbound replication with one of its replication partners.

C:\>repadmin /syncall /d /e <DN of the problem Global Catalog> <DN of config>

11. With diagnostic logging enabled, there should be events describing which upstream partners, by GUID, the server is unable to replicate with. To translate the source server’s object GUID listed in the event description:

12. Run repadmin /showreps from the server logging the errors.

C:\windows\system32\>repadmin /showreps

13. Copy the object GUID from the event description and search for it in the Repadmin output. Then find a corresponding server name under the Inbound partners section.

14. Installing SP4 is recommended to resolve these errors. If that is not possible, install SP3 and hotfix 812499 on both the problem server and the upstream source replication partner. The source server is listed in the event error first, and then the server generating the errors is listed. KB article (hotfix) 812499 will prevent this problem in the future, and also includes an updated version of Ntdsutil to fix mangled objects and attributes.

15. Ntdsutil must be run from Directory Services Restore Mode when attempting to fix mangled objects.
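The GUID lookup in step 13 can be scripted against saved repadmin /showreps output. The Python sketch below is a rough illustration only: the layout of the sample text (a partner line ending in "via RPC" followed by an indented objectGuid line) is an assumption about the output format, and the function name partner_for_guid is hypothetical. Adjust the matching to the output your version of Repadmin actually produces.

```python
def partner_for_guid(showreps_text: str, guid: str):
    """Return the partner line that precedes `guid` in saved showreps output.

    Assumption: each partner is introduced by a line containing ' via '
    and its GUID appears on a later line before the next partner.
    Returns None when the GUID is not listed under any partner.
    """
    wanted = guid.strip().lower()
    partner = None
    for line in showreps_text.splitlines():
        text = line.strip()
        if " via " in text:
            partner = text.split(" via ")[0]
        elif wanted in text.lower():
            return partner
    return None


if __name__ == "__main__":
    # Hypothetical sample; real output contains more fields.
    sample = r"""
DC=contoso,DC=com
    Default-First-Site-Name\DC02 via RPC
        objectGuid: 0d67193c-8cb1-4c4c-bd7c-af98e11d6d67
"""
    print(partner_for_guid(sample, "0d67193c-8cb1-4c4c-bd7c-af98e11d6d67"))
```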

To perform a semantic check using ntdsutil

1.    Restart the domain controller in Directory Services Restore Mode.

2.    Start a command prompt.

3.    Start ntdsutil.exe and enter the commands to start the semantic checker.

c:\>ntdsutil
sem d a
go fix
q

4.    You should see text similar to the following sample, which indicates a successful recovery.

c:\>ntdsutil
ntdsutil: sem d a
semantic checker: go fix
Fixup mode is turned on
Opening DIT database... Done.
Could not update "datatable" table: key already exists.
Could not retrieve "ATTk589826" column in "datatable" table: (warning) column is null.
Done.
Opening database [Current].....Done.
Getting record count...1744 records
Writing summary into log file dsdit.dmp.0
Records scanned: 1700
Processing records..Done.
semantic checker: q

5.    Reboot the domain controller into Active Directory mode.

Caution   The errors returned by running the semantic checker will vary between domain controllers. The purpose of running the semantic checker is the verification and cleanup of common database related problems.

Disjointed Namespace Issues

The following symptoms can occur when the DNS suffix of a domain controller is not the same as the DNS suffix used for name resolution.

For example:

The domain controller's fully-qualified domain name (FQDN) is dc01.contoso.com but the DNS domain is called northwindtraders.com.
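The condition in this example can be checked mechanically by comparing the suffix of the domain controller's FQDN with the DNS domain name. A minimal Python sketch, with hypothetical function names:

```python
def dns_suffix(fqdn: str) -> str:
    """Everything after the first label of a fully qualified name, lower-cased."""
    _host, _dot, suffix = fqdn.rstrip(".").partition(".")
    return suffix.lower()


def is_disjoint(dc_fqdn: str, dns_domain: str) -> bool:
    """True when the DC's DNS suffix differs from the DNS domain name."""
    return dns_suffix(dc_fqdn) != dns_domain.rstrip(".").lower()


if __name__ == "__main__":
    print(is_disjoint("dc01.contoso.com", "northwindtraders.com"))
    print(is_disjoint("dc01.contoso.com", "contoso.com"))
```

The comparison ignores case and a trailing dot, since DNS names are case-insensitive and may be written in absolute form.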

Symptoms

•    You may be unable to join a client workstation to the domain. When you try to join a Windows XP Professional computer to the domain, you may receive an error message that is similar to the following:

A domain controller for the domain DomainName.local could not be contacted.

If you click Details on this message, the details of the error message may include text that is similar to the following:

DNS was successfully queried for the service location (SRV) resource record used to locate a domain controller for domain DomainName.local. The query was for the SRV record for _ldap._tcp.dc._msdcs.DomainName.LOCAL

•    You may be unable to log on to the domain.

•    You may be unable to promote an additional domain controller into the domain, and the following errors may occur:

The specified domain either does not exist or cannot be contacted

A Service Principal Name (SPN) could not be constructed because the provided hostname is not in the necessary format

The Directory Service failed to create the server object for CN=NTDS Settings,CN=CLIENT01,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=Contoso,DC=com on server DC01. Please ensure the network credentials provided have sufficient access to add a replica.

The operation failed because: failed finding a suitable domain controller for the domain contoso.com. The specified domain either does not exist or could not be contacted.

•    You may receive the following errors when attempting to use any Active Directory MMC snap-ins:

Naming information cannot be located because: The logon attempt failed

Naming information could not be located because the object name has bad syntax

•    You may see the following errors in the system event log of a client, member server, or domain controller:

Event ID: 5788
Source: Netlogon
Description: Attempt to update Service Principal Name (SPN) of the computer object in Active Directory failed. The following error occurred: The attribute syntax specified to the directory service is invalid.

Event ID: 5789
Source: Netlogon
Description: Attempt to update DNS Host Name of the computer object in Active Directory failed. The following error occurred: The parameter is incorrect.

•    You may see the following errors in the application event log of a client, member server, or domain controller:

Event ID: 1000
Source: Userenv
Description: Windows cannot establish a connection to CONTOSO.COM with (1787).

Event ID: 1000
Source: Userenv
Description: Windows cannot query for the list of Group Policy objects. A message that describes the reason for this was previously logged by this policy engine.

Event ID: 1000
Source: Userenv
Description: Windows cannot determine the user or computer name. Return value (1326).

Event ID: 5721
Source: Net Logon
Description: The session setup to the Windows NT or Windows 2000 Domain Controller for the domain contoso.com failed because the Domain Controller does not have an account for the computer <computername>.

•    Installing the Exchange Server Recipient Update Service (RUS) results in the error:

Only one instance of the Recipient Update Service can update a Domain Controller and all Domain Controllers on contoso.com are being updated. ID No: c1039c6c.

•    The Exchange System Attendant service for Exchange 2000 may fail to start, and the following warning is logged in the application event log:

Event ID: 9157
Source: MSExchangeSA
Description: Microsoft Exchange System Attendant does not have sufficient rights to read Exchange configuration objects in Active Directory. System attendant will try again in approximately one minute.

•    The Setspn tool may fail with the following error:

Requested name "contoso\DC01$" not found in directory.

•    PXE clients fail to authenticate using valid domain administrator credentials. The Client Installation Wizard screen titled "Logon Error" shows the following information:

00004e28.OSC error - The System cannot validate your User Name Password or Domain

The system cannot validate your user name, password, or domain name. Verify that your user name and domain name are correct, and then retype your password. Passwords must be typed using the correct case. Be sure the CAPS LOCK key is not pressed.

•    During setup on a Mobile Information Server (MIS) server, when you enter the password for the Message Processor, you may receive the following error:

The wizard was interrupted before Mobile Information Server could be completely installed. Your system has not been modified.

Event ID: 10005
Source: MSIInstaller
Description: Product: Mobile Information Server - error 29910 failed to validate user. Error no: 0x0 Error message: The operation completed successfully.

•    When running ADMT you may receive the following error in the migration.log file:

2002-01-23 15:00:34 ERR2:7422 Failed to move object CN=Jsmith, hr=8009030d The credentials supplied to the package were not recognized

•    Dcdiag may show the following errors:

Starting test: NetLogons
* Network Logons Privileges Check
[DC01] An net use or LsaPolicy operation failed with error 1231, The network location cannot be reached
Starting test: MachineAccount
Could not open pipe with [DC01]:failed with 1231: The network location cannot be reached. For information about network troubleshooting, see Windows Help.
Could not get NetBIOSDomainName
Failed can not test for HOST SPN

•    When you use the Small Business Personal Console or Active Directory Users and Computers to create users, and then you mailbox-enable the user, the following issues occur:

•    E-mail properties are not generated.
•    SMTP addresses are not generated.
•    The user is not displayed in the global address list (GAL).

•    The directory service event log may show the following warning event:

Event ID: 1655
Source: NTDS
Description: The attempt to communicate with global catalog \\DC01 failed with the following status: A Service Principal Name (SPN) could not be constructed because the provided hostname is not in the necessary format. The operation in progress might be unable to continue. The directory service will use the locator to try find an available global catalog server for the next operation that requires one.

•    You receive the following error when installing Services for Unix 2.0 (SFU):

error 26065 NIS Schema Upgrade Failed


Resolution

Refer to Knowledge Base article 257623:

http://support.microsoft.com/kb/257623/EN-US

Directory Replication: KB Articles

These are some of the commonly used KB articles for solving directory replication related issues.

Name Resolution Issues

Domain controller's domain name system suffix does not match domain name (257623)

Troubleshooting Active Directory replication failures that occur because of DNS lookup failures, event 2087, or event 2088

How to troubleshoot RPC Endpoint Mapper errors in Windows Server 2003 (839880)

Connectivity Issues

How to troubleshoot RPC Endpoint Mapper errors in Windows Server 2003 (839880)

Service overview and network port requirements for the Windows Server system (832017)

Error Messages

Troubleshooting Active Directory replication failures during dcpromo promotion (833372)

HOW TO: Troubleshoot Intra-Site Replication Failures (249256)

How to troubleshoot an "Internal error" error message during the replication phase of dcpromo (265090)

Replication Not Working Properly Between Domain Controllers After Deleting One from Sites and Services (262561)

How to troubleshoot Event ID 1311 messages on a Windows 2000 domain (307593)

"Replication Access Was Denied" Error Messages Occur After You Promote a Server to Domain Controller (329860)

How to troubleshoot RPC Endpoint Mapper errors in Windows Server 2003 (839880)

Service overview and network port requirements for the Windows Server system (832017)

Tools Related

Determining the Server GUID of a Domain Controller (224544)

Using Repadmin.exe to Troubleshoot Active Directory Replication (229896)

Active Directory Scalability

The goal of these tests was to test the scalability of Active Directory on x64 hardware, particularly with very large directory databases. The physical and logical diagrams from the validation section are representative of the environment used to perform the scalability tests. Support.contoso.com consisted of only x64 AMD Opteron machines. Engineering.contoso.com consisted of only x32 Intel machines.

The tasks associated with this portion of the testing were as follows:

• x32 vs x64 Testing
• DNS Service Start-up time

Active Directory Scalability Results

As expected, Active Directory was able to easily scale to our test load. Our primary constraint during testing and load simulation turned out to be the disk subsystem, both in performance and space. We used ADTest to populate the Directory and create the OU structure. Our simple population testing (creating users, modifying group membership) showed a large difference in write performance between the two platforms.

Platform    Writes Per Second    ADTest Threads    Disk Controller Cache
x32         ~70-90               4                 48 MB
x64         ~575-625             8                 256 MB

During the month of population testing, we observed that the x64 machines were able not only to handle more ADTest threads but also to perform more writes per second before operations began failing and being dropped. Increasing the number of threads past 4 on x32 or past 8 on x64 resulted in write failures.
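To put the write-performance gap in perspective, the short sketch below compares the midpoints of the two observed ranges. The use of midpoints is an assumption made purely for illustration; the source reports only approximate ranges, and the function names are hypothetical.

```python
# Observed write rates (writes per second) from the population tests above.
X32_WRITES = (70, 90)
X64_WRITES = (575, 625)


def midpoint(value_range):
    """Return the midpoint of a (low, high) range."""
    low, high = value_range
    return (low + high) / 2


def speedup(slow_range, fast_range):
    """Ratio of the fast platform's midpoint to the slow platform's midpoint."""
    return midpoint(fast_range) / midpoint(slow_range)


# Midpoints are an illustrative simplification, not a measured figure.
print(f"x64 wrote roughly {speedup(X32_WRITES, X64_WRITES):.1f}x faster than x32")
```

On these midpoints the x64 platform sustained roughly seven and a half times the write rate of the x32 platform, with twice the ADTest threads.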

We believe one of the reasons for the difference in write performance is the amount of cache on the controller cards. The disk subsystem was configured the same way on both platforms, except that the x64 machines had larger hard disks. RAID 1 was used for all volumes, with the OS and DIT on the first volume and the log files on the second volume. This data provides evidence that the entire disk subsystem should be tested as a whole. Ultimately, Active Directory will be write-bound, so disk controllers should be sized accordingly.

Note: For a complete treatment of the performance benefits of 64-bit domain controllers, readers are encouraged to review the Active Directory Performance for 64-bit Versions of Windows Server 2003 whitepaper available on Microsoft.com.

The primary data points we wanted to capture centered on how the directory information tree (DIT) grew on disk as we added users and groups. We took numerous snapshots as we added users and groups. The DIT size grew as follows:

Total Users    Total Security Groups    Users Per Group    DIT Size
100,000        0                        0                  ~900 MB
100,000        25,000                   25                 ~1.1 GB
400,000        25,000                   25                 ~4.2 GB
500,000        100,000                  125                ~9.5 GB
1,000,000      250,000                  125                ~20 GB


To track the growth of the DIT, we took raw snapshots of the database after each progressive run. We then accounted for any whitespace by multiplying the number of owned pages with data on them by the Active Directory page size, 8k.
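The whitespace adjustment described above is a single multiplication. The following is a minimal sketch of that arithmetic, assuming the count of owned pages with data is already known; the function names and the example page count are hypothetical and are not taken from the test data.

```python
PAGE_SIZE_BYTES = 8 * 1024  # Active Directory database page size (8K), as stated above


def used_dit_bytes(owned_pages_with_data, page_size=PAGE_SIZE_BYTES):
    """Space occupied by pages that actually hold data, excluding whitespace."""
    if owned_pages_with_data < 0:
        raise ValueError("page count cannot be negative")
    return owned_pages_with_data * page_size


def to_gib(size_bytes):
    """Convert bytes to binary gigabytes."""
    return size_bytes / (1024 ** 3)


# Hypothetical page count, chosen only to illustrate the calculation.
example_pages = 2_621_440
print(f"{to_gib(used_dit_bytes(example_pages)):.1f} GiB in use")
```

Counting only pages that hold data gives a size that is independent of how much free space a particular copy of the database file happens to contain.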

There are a few other useful data points that can be gleaned from the data that was collected during the various DIT stages. Both the DIT and its internal structures grew somewhat predictably as the number of users and groups increased. We were unable to test the growth effects of other objects, such as certificates and DNS records, due to time constraints, and both of these should be tested by Contoso.

As the number of security groups and associated group membership increased, the amount of metadata increased steadily. We include in the term ‘metadata’ not only the commonly associated replication metadata (originating DC, USN, Version, Date, etc) but also the metadata that is used to maintain the internal database relationships, such as the link tables, and indexes.

Total Users    Total Security Groups    Users Per Group    DIT Size    Indices as a Percent of DIT Size
100,000        0                        0                  ~900 MB     ~10%
100,000        25,000                   25                 ~1.1 GB     ~20%
400,000        25,000                   25                 ~4.2 GB     ~10%
500,000        100,000                  125                ~9.5 GB     ~30%
1,000,000      250,000                  125                ~20 GB      ~40%

It is interesting to note that for the third data set (400,000 users with 25,000 groups and 25 members in each group), the indices as a percentage of the total DIT size decreased compared with the same number of groups and 100,000 users. There are two explanations for this. First, group membership did not change between the two data points, so the link table and its indices in the 400,000 user test were unchanged from the 100,000 user test. Second, other indices did increase in size, namely those responsible for indexing user attributes such as UPN, samAccountName, and Given Name, but their increase was not large enough relative to the growth of the DIT to offset the link data that remained constant. Thus, while the indices as a percentage of the DIT went down, the total amount of space consumed by the indices in the 400,000 user test case actually increased.
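The difference between relative and absolute index size can be checked directly from the table. The sketch below multiplies each approximate DIT size by its approximate index percentage; since both inputs are rounded in the source, the results are rough estimates only, and the function name is hypothetical.

```python
# (total users, total security groups, DIT size in GB, index share of DIT)
# Values are the approximate figures from the table above.
SNAPSHOTS = [
    (100_000, 0, 0.9, 0.10),
    (100_000, 25_000, 1.1, 0.20),
    (400_000, 25_000, 4.2, 0.10),
    (500_000, 100_000, 9.5, 0.30),
    (1_000_000, 250_000, 20.0, 0.40),
]


def index_size_gb(dit_gb, index_share):
    """Estimated absolute space consumed by indices, in GB."""
    return dit_gb * index_share


for users, groups, dit_gb, share in SNAPSHOTS:
    estimate = index_size_gb(dit_gb, share)
    print(f"{users:>9,} users, {groups:>7,} groups: ~{estimate:.2f} GB of indices")
```

For the second and third rows the estimate rises from about 0.22 GB to about 0.42 GB even though the percentage falls from 20% to 10%.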

The data also indicates that very quickly, the amount of metadata becomes a much larger proportion of the data stored in the DIT than the data itself. This high cost of the local disk space allows for highly efficient replication. If less metadata were stored on disk, the DIT would be smaller, but the amount of bandwidth required between domain controllers would increase.

Thus, it is important to note that DIT size does not equate to total replication traffic. That is to say, if we were to promote a new domain controller into the domain with 1M users and 250,000 security groups, we would not expect to see anywhere close to the approximately 20 Gigs of data that is consumed by the DIT. Most of the aforementioned metadata is not replicated to other domain controllers, but is rather recreated on the destination. For example, with the 20 Gig DIT, we know that indices consumed 40% of the space, and these indices would never be replicated on the wire.
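As a rough illustration of this point, the sketch below subtracts the index share from the DIT size. It assumes that index space is the only portion excluded, which makes the result a loose ceiling on the non-index portion of the DIT rather than a replication traffic estimate; other metadata that is recreated on the destination would lower the figure further. The function name is hypothetical.

```python
def non_index_gb(dit_gb, index_share):
    """Portion of the DIT not consumed by indices, in GB.

    This is only a ceiling for illustration: it excludes index space but
    no other locally maintained metadata.
    """
    if not 0.0 <= index_share <= 1.0:
        raise ValueError("index_share must be between 0 and 1")
    return dit_gb * (1.0 - index_share)


# 1M users, 250,000 security groups: ~20 GB DIT with ~40% indices.
print(f"Non-index portion: ~{non_index_gb(20.0, 0.40):.0f} GB")
```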

Indices for attributes containing no values consume only one 8K page in the DIT as a placeholder. The commonly populated attribute indices, such as UPN, samAccountName, Given Name, Surname, and Display Name, consumed the most space. Each of these attribute indices consumed between 50 and 60 megs of disk space for the 1M user test case. While that is not a large portion of the total space consumed by the DIT, it is important to recognize that each additional piece of data that Contoso wishes to index, store, and present, and the format of the data itself, will have a measurable impact on the space consumed by the DIT as a whole. On the other hand, the sum total of the linked value indices for the same 1M user test case was around 7-8 Gigs. This provides evidence of the benefits of a well-planned identity management strategy: poorly managed groups and group membership have a quantifiable impact on the Directory.

The space consumed by indices provides another useful reference point for domain controller sizing. Depending on the role of the domain controller (authentication DC, Exchange query DC, line of business DC), we can make some broad generalizations regarding the effectiveness of varying amounts of RAM.

A domain controller is essentially a ‘write-few, read-many’ database. One of the reasons indices are maintained is so that the ‘read-many’ portion of the workload is handled as efficiently as possible. There are thus three general levels of database caching that we can use as benchmarks.

1. Caching only some of the indices and data: This would provide the lowest level of performance. If we are unable to cache even the frequently used indices, system performance would be heavily bound by the disk subsystem.
2. Caching all the indices: This would provide a very good level of relative performance for a reasonable cost. By having enough RAM to cache all of the indices, the most frequently used data structures do not require disk paging and performance should be markedly improved.
3. Caching the entire database: Clearly this would provide the greatest level of relative performance, but at the cost of more expensive hardware.

For Contoso, the best blend of price and performance will center on the second scenario: providing sufficient RAM to cache all the indices, plus some additional amount dependent on the role of the DC.

1. Authentication Domain Controller: For domain controllers whose primary purpose is user authentication, we would expect good performance from having enough RAM to cache all the indices plus the data portion of the Directory for the user population that we expect to serve. This user population will vary from location to location and would require testing to further refine. As a starting point, with 1M users and 250,000 security groups with 125 members in each group, we would recommend a bare minimum of 8 Gigs of RAM (40% of 20 Gigs). Note that the total amount of space consumed by indices is predominantly a function of the link indices, and not of the attribute indices.
2. Exchange Domain Controller: For domain controllers / global catalog servers whose primary purpose is to service the needs of Microsoft Exchange Server, we would expect good performance from having enough RAM to cache all of the indices, plus some additional RAM for any Exchange-specific data that is frequently required. The Exchange domain controllers in the Application Forest will not be processing interactive logons and will house only disabled mail accounts. Thus the data caching requirements should be different from the needs of an authentication domain controller, and should be tested accordingly.
3. Line of Business Domain Controller: For domain controllers (or ADAM servers) deployed to handle specific line of business directory-integrated applications, we would expect performance to vary based on the needs of the application. Presumably, custom applications would create their own application-specific indices, and thus sufficient RAM to cache the application-specific indices and data would be required for best performance. We are unable to provide any specific guidance for this scenario and recommend that Contoso test applications for performance prior to deployment.

Performance will increase incrementally as the amount of RAM grows from enough to cache all indices to enough to cache the entire directory.
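The three caching levels and the 8 Gig starting point can be expressed as a simple comparison of available RAM against the index size and the full DIT size. This is a minimal sketch that, as a simplifying assumption, ignores memory used by the operating system and other processes; the function names and tier labels are hypothetical.

```python
def index_cache_ram_gb(dit_gb, index_share):
    """RAM needed to hold all indices, in GB (e.g. 40% of a 20 GB DIT)."""
    return dit_gb * index_share


def caching_tier(ram_gb, dit_gb, index_share):
    """Classify available RAM into one of the three caching levels.

    Simplifying assumption: all of ram_gb is available to the directory
    database cache.
    """
    if ram_gb >= dit_gb:
        return "entire database"
    if ram_gb >= index_cache_ram_gb(dit_gb, index_share):
        return "all indices"
    return "some indices and data"


# 1M users, 250,000 groups: ~20 GB DIT, ~40% of it indices.
for ram in (4, 8, 16, 24):
    print(f"{ram:>2} GB RAM -> {caching_tier(ram, 20.0, 0.40)}")
```

Any RAM between the index size and the full DIT size falls in the middle tier, which is where the role-specific data caching described above would be refined through testing.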

Contoso can engage Microsoft Services to help determine the amount of space consumed by indices in any given DIT. The domain controller must be brought offline for the process to occur.

Answer: 67% of the DIT was dedicated to indices.

Posted: Friday, September 07, 2007 3:07 PM by rbeard47 Filed under: DS, Active Directory, Replication, Kerberos
