
I have a DC that was recently part of a business continuity test. From what I understand, the server (which is virtual) was snapshotted, the test was carried out while the link between the two sites was down, and the server was then reverted to the snapshot. Now that the link is back up, I am seeing notifications through SolarWinds that the AD service is in error. Looking at the server, the NETLOGON service is paused. From what I can gather from the event logs, this is due to repeated replication attempts failing. There is also a notification that AD was restored using an unsupported method (probably the snapshot).

I have tried to force replication using the Sites and Services snap-in, but that fails, stating that the server is currently rejecting replication. I can ping the server, though oddly it seems to respond from the 10.168.3 NIC and not the 10.168.50 NIC that I would have expected. Both IPs can be pinged, though, and I can connect to the server via RDP or the vSphere console.

Running repadmin /show reports various failures, but I am sure these are due to some underlying problem that is blocking the replication service from starting. I'm a bit new to this level of troubleshooting, but would be grateful for any help that could be thrown my way.

EDIT: Wondering if it may be something to do with a USN rollback? Link to KB here.

  • How did you take the snapshot, and how did you restore it? Snapshots of DCs are not supported; they are usually only used to retrieve object attributes or deleted objects. – Tatas Nov 04 '11 at 15:20

2 Answers


Your issue is almost certainly due to a USN rollback. Reverting to a snapshot is not a supported method of recovering a DC. To resolve the issue, follow the steps outlined in the KB article you referenced: demote the DC, clean up the metadata, and then promote it again.
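To make the failure mode concrete, here is a toy Python model of a USN rollback. The `DC` and `Partner` classes and the `snapshot_revert_demo` scenario are hypothetical illustrations, not an Active Directory API. The model assumes that a replication partner records, per source invocationID, the highest USN it has already received and only asks for changes above that number. Reverting a snapshot winds the DC's USN counter back without changing its invocationID, so changes made after the revert reuse USNs the partner believes it already has.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class DC:
    """Hypothetical stand-in for a domain controller's change log."""
    name: str
    invocation_id: str
    usn: int = 0                                  # highest USN issued so far
    changes: dict = field(default_factory=dict)   # usn -> change description

    def write(self, description: str) -> int:
        # Every originating write consumes the next USN.
        self.usn += 1
        self.changes[self.usn] = description
        return self.usn


@dataclass
class Partner:
    """Hypothetical destination DC tracking what it has already received."""
    utdv: dict = field(default_factory=dict)      # invocationID -> highest USN applied
    applied: list = field(default_factory=list)

    def pull_from(self, source: DC) -> None:
        seen = self.utdv.get(source.invocation_id, 0)
        # Only changes above the recorded USN are requested.
        for usn in sorted(source.changes):
            if usn > seen:
                self.applied.append(source.changes[usn])
        self.utdv[source.invocation_id] = max(seen, source.usn)


def usn_rollback_suspected(source: DC, partner: Partner) -> bool:
    # The partner claims to have seen USNs from this invocationID that the
    # source has not issued (yet), which implies the source went back in time.
    return partner.utdv.get(source.invocation_id, 0) > source.usn


def snapshot_revert_demo():
    dc = DC("DC2", "invocation-A")
    partner = Partner()
    dc.write("create user alice")       # USN 1
    snapshot = copy.deepcopy(dc)        # snapshot taken at USN 1
    dc.write("create user bob")         # USN 2
    dc.write("reset bob's password")    # USN 3
    partner.pull_from(dc)               # partner records USN 3
    dc = snapshot                       # revert: back to USN 1, same invocationID
    dc.write("create user carol")       # reuses USN 2
    partner.pull_from(dc)               # nothing above 3, so carol is skipped
    return dc, partner
```

In this model the change made after the revert never reaches the partner, and the simple check only fires while the reverted DC's USN is still below what the partner recorded. It is a simplification of the detection described in the KB, not a reproduction of it.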

HostBits
  • Thanks for that. From what I have read of the KB, once the DC is demoted it is a case of removing the server object from AD Users and Computers (as we are running a 2008 native domain)? This should clear up the metadata. –  Nov 04 '11 at 15:33
  • You should also look in your domain DNS zone under the _msdcs, _sites, _tcp, _udp, DomainDnsZones, and ForestDnsZones (and any subfolders in those zones) for references to the demoted DC. Those references should be removed. – HostBits Nov 04 '11 at 15:57
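Once you have a list of the records in those zones, the check is mostly a matter of finding records whose data points at the demoted DC's host name or IP address. A minimal Python sketch follows; the `Record` tuple is a hypothetical format (how you export the zone contents is outside the sketch), and names such as `dc2.example.local` are made up:

```python
from typing import Iterable, List, NamedTuple, Set


class Record(NamedTuple):
    zone: str    # e.g. "_msdcs.example.local" (hypothetical)
    name: str    # owner name within the zone, e.g. "_ldap._tcp.dc"
    rtype: str   # "A", "SRV", "NS", "CNAME", ...
    data: str    # record data as text, e.g. "0 100 389 dc2.example.local."


def _norm(host: str) -> str:
    # DNS names are case-insensitive and may carry a trailing root dot.
    return host.strip().rstrip(".").lower()


def stale_records(records: Iterable[Record], dc_fqdn: str,
                  dc_ips: Set[str]) -> List[Record]:
    """Return records whose data names the demoted DC or one of its IPs."""
    fqdn = _norm(dc_fqdn)
    hits = []
    for r in records:
        tokens = {_norm(t) for t in r.data.split()}
        # SRV/NS/CNAME data contain the host name; A records contain the IP.
        if fqdn in tokens or tokens & dc_ips:
            hits.append(r)
    return hits
```

Review the hits by hand before deleting anything; the sketch only does exact name and IP matching.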
  • Also look in AD Sites & Services within the AD site for that DC, and verify the server object and the NTDS settings are gone after the demotion. If they are not, delete them (only after the removal has occurred). – HostBits Nov 04 '11 at 15:58
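The objects in that comment live in the Configuration partition, under the DC's site. A small Python sketch that builds their distinguished names and reports which are still present; `existing_dns` stands in for whatever listing you pull from AD (a hypothetical input), and the server, site, and forest names used with it are made up:

```python
from typing import Iterable, List


def dc_config_dns(server: str, site: str, forest_dn: str) -> List[str]:
    """DNs of the NTDS Settings object (child first) and its server object."""
    server_dn = (f"CN={server},CN=Servers,CN={site},CN=Sites,"
                 f"CN=Configuration,{forest_dn}")
    return [f"CN=NTDS Settings,{server_dn}", server_dn]


def leftover_dc_objects(existing_dns: Iterable[str], server: str,
                        site: str, forest_dn: str) -> List[str]:
    # Simple case-insensitive comparison; assumes DNs are written without
    # extra spaces around commas and without escaped characters.
    present = {dn.lower() for dn in existing_dns}
    return [dn for dn in dc_config_dns(server, site, forest_dn)
            if dn.lower() in present]
```

An empty result means neither object was found in the listing you supplied.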

Three things:

If you see errors like that, you should never attempt to force replication. There is a reason that replication was stopped, and it is usually bad.

Do not use snapshots on a domain controller.

You don't want to be in a scenario where someone turned up an old copy of a DC and now you are replicating objects that should be gone. If you have not already done so, you should enable strict replication consistency. Enabling this setting on a domain controller prevents lingering objects from being replicated inbound from an offending DC that holds a lingering object.
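On Windows, strict replication consistency is controlled by the `Strict Replication Consistency` REG_DWORD under `HKLM\SYSTEM\CurrentControlSet\Services\NTDS\Parameters` (1 = enabled), and `repadmin /regkey` with `+strict` can set it. The decision the setting changes can be sketched as a toy Python function. The `Outcome` names are hypothetical labels, and the model covers only the case of an attribute update arriving for an object the destination has no record of:

```python
from enum import Enum
from typing import Set


class Outcome(Enum):
    APPLY_UPDATE = "apply the update to the local copy"
    HALT_INBOUND = "stop inbound replication of this partition from the source"
    REQUEST_FULL_OBJECT = "ask for the full object (reintroduces a lingering object)"


def handle_inbound_update(local_guids: Set[str], object_guid: str,
                          strict: bool) -> Outcome:
    """Toy model: the source sends changed attributes of an existing object."""
    if object_guid in local_guids:
        return Outcome.APPLY_UPDATE
    # The destination has never heard of this object (or has already
    # garbage-collected its tombstone), so the source may hold a lingering object.
    if strict:
        return Outcome.HALT_INBOUND
    return Outcome.REQUEST_FULL_OBJECT
```

The point of the sketch is the asymmetry: with strict mode off, the "unknown object" branch quietly pulls the stale object back in instead of stopping.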

Running Domain Controllers in Hyper-V
http://technet.microsoft.com/en-us/library/virtual_active_directory_domain_controller_virtualization_hyperv%28WS.10%29.aspx

From the article:
Strict replication consistency should be enabled on all domain controllers
http://technet.microsoft.com/en-us/library/dd723692%28WS.10%29.aspx

When a domain controller in your Active Directory environment is disconnected from the replication topology for an extended period of time, all objects that are deleted from AD DS on all other domain controllers might remain on the disconnected domain controller. Such objects are called lingering objects. When this domain controller is reconnected to the replication topology, it acts as a source replication partner that has one or more objects that its destination replication partners no longer have. Problems occur when these lingering objects on the source domain controller are updated and these updates are sent by replication to the destination domain controllers. A destination domain controller can respond in one of two ways:

  1. If the destination domain controller has strict replication consistency enabled, it recognizes that it cannot update the object (because the object does not exist), and it locally halts inbound replication of the directory partition from that source domain controller.

  2. If the destination domain controller does not have strict replication consistency enabled, it requests the full replica of the updated object, which introduces a lingering object into the directory.

An outdated domain controller can store lingering objects with no noticeable effect as long as an administrator, application, or service does not update the lingering object or attempt to create an object with the same name in the domain or with the same user principal name (UPN) in the forest. However, the existence of lingering objects can cause problems, especially if the object is a security principal. The following symptoms indicate that a domain controller has lingering objects:

  • A deleted user or group account remains in the global address list (GAL) on computers running Microsoft Exchange Server. Therefore, although the account name appears in the GAL, attempts to send e-mail messages result in errors.

  • Multiple copies of an object appear in the object picker or GAL for an object that should be unique in the forest. Duplicate objects sometimes appear with altered names, causing confusion on directory searches. For example, if the relative distinguished name (also known as RDN) of two objects cannot be resolved, conflict resolution appends "*CNF:GUID" to the name, where * represents a reserved character, CNF is a constant that indicates a conflict resolution, and GUID represents the objectGUID attribute value.

  • E-mail messages are not delivered to a user whose Active Directory account appears to be current. After an outdated domain controller or global catalog server becomes reconnected, both instances of the user object appear in the global catalog. Because both objects have the same e-mail address, e-mail messages cannot be delivered.

  • A universal group that no longer exists continues to appear in a user’s access token. Although the group no longer exists, if a user account still has the group in its security token, the user might have access to a resource that you intended to be unavailable to that user.

  • A new object or Exchange mailbox cannot be created, but you do not see the object in AD DS. An error message reports that the object already exists.

  • Searches that use attributes of an existing object incorrectly find multiple copies of an object of the same name. One object has been deleted from the domain, but it remains in an isolated global catalog server.
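For the duplicate-name symptom, the reserved character in the "*CNF:GUID" pattern is a line feed (0x0A), which string-form DNs typically show escaped as `\0A`. A hedged Python sketch that spots such conflict-mangled RDN values and extracts the GUID; the sample name and GUID used with it are invented:

```python
import re
import uuid
from typing import Optional, Tuple

# A line feed (raw, or escaped as "\0A" in a string DN), then "CNF:" and a GUID.
_CNF_SUFFIX = re.compile(r"(?:\n|\\0[aA])CNF:([0-9a-fA-F-]{36})")


def parse_cnf_name(rdn_value: str) -> Optional[Tuple[str, uuid.UUID]]:
    """Return (original name, objectGUID) for a conflict-mangled RDN, else None."""
    match = _CNF_SUFFIX.search(rdn_value)
    if match is None:
        return None
    try:
        guid = uuid.UUID(match.group(1))
    except ValueError:
        return None  # 36 hex/hyphen characters, but not a well-formed GUID
    return rdn_value[:match.start()], guid
```

Grouping the hits by the returned original name is one way to see which objects have been duplicated.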

Greg Askew
  • Wow, thanks for the detail. I will look into enabling strict replication in our environment and have updated our procedures to make sure snapshots are not used on DCs. I have accepted Cheekaleak's answer as it looks like that was the solution, but have upvoted your post as it is definitely the way to prevent this in future! Thanks again –  Nov 07 '11 at 09:26