I seem to be having an issue with replication between our domain controllers. The setup is as follows:

One domain with two domain controllers (2008): one is virtualised, one is physical. Both are in the same site, and ping between the domain controllers is fine.

I had to do a BIOS upgrade on the server that hosts the virtual machines (one of which is a domain controller). After the update we had a problem with our Cisco switch: smart ports was enabled and was stopping traffic between the virtual machines and the physical network containing all the other physical machines.

We fixed this by disabling smart ports on the 2960, and all virtual machines can now communicate with the physical machines. However, when we started the VM for the domain controller it took a VERY long time to boot (common with AD / DNS problems, I know). When it finally booted I logged in and immediately tried to ping the second DC. The ping responded fine, so all was OK network-wise. But the domain controllers are no longer synchronising. I tried repadmin /syncall and dcdiag /q, and both report errors. In short, the RPC service cannot communicate with the FSMO holder.

I checked and the DFSR service is running fine. I switched off all firewalls and antivirus, and the servers still cannot communicate except by ping. Nothing else has changed at all.

Can someone point me in the right direction as to where to start? For testing purposes I created an object on the second DC and it did not replicate to the first DC (FSMO holder).

C:\Users\Administrator>dcdiag /q
         There are warning or error events within the last 24 hours after the
         SYSVOL has been shared.  Failing SYSVOL replication problems may cause
         Group Policy problems.
         ......................... IME-DC1 failed test DFSREvent
         [Replications Check,IME-DC1] A recent replication attempt failed:
            From IME-DC2 to IME-DC1
            Naming Context: DC=ForestDnsZones,DC=XXX,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2013-10-02 21:11:34.
            The last success occurred at 2013-10-02 20:05:07.
            2 failures have occurred since the last success.
         [Replications Check,IME-DC1] A recent replication attempt failed:
            From IME-DC2 to IME-DC1
            Naming Context: DC=DomainDnsZones,DC=XXX,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2013-10-02 21:09:56.
            The last success occurred at 2013-10-02 20:04:39.
            2 failures have occurred since the last success.
         [Replications Check,IME-DC1] A recent replication attempt failed:
            From IME-DC2 to IME-DC1
            Naming Context: CN=Schema,CN=Configuration,DC=XXX,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2013-10-02 21:02:40.
            The last success occurred at 2013-10-02 17:55:42.
            6 failures have occurred since the last success.
         [Replications Check,IME-DC1] A recent replication attempt failed:
            From IME-DC2 to IME-DC1
            Naming Context: CN=Configuration,DC=XXX,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2013-10-02 20:57:56.
            The last success occurred at 2013-10-02 20:04:36.
            3 failures have occurred since the last success.
         [Replications Check,IME-DC1] A recent replication attempt failed:
            From IME-DC2 to IME-DC1
            Naming Context: DC=XXX,DC=com
            The replication generated an error (1726):
            The remote procedure call failed.
            The failure occurred at 2013-10-02 21:05:29.
            The last success occurred at 2013-10-02 20:05:10.
            2 failures have occurred since the last success.
         ......................... IME-DC1 failed test Replications
         An Error Event occurred.  EventID: 0x00000457
            Time Generated: 10/02/2013   21:47:42
            Event String:
            Driver Microsoft XPS Document Writer v4 required for printer Microsoft XPS Document Writer is unknown. Contact the administrator to install the driver before you log in again.
         ......................... IME-DC1 failed test SystemLog

C:\Users\Administrator>

I've also included an event log error from the Active Directory log.

Log Name:      Directory Service
Source:        Microsoft-Windows-ActiveDirectory_DomainService
Date:          02/10/2013 22:13:33
Event ID:      1308
Task Category: Knowledge Consistency Checker
Level:         Warning
Keywords:      Classic
User:          ANONYMOUS LOGON
Computer:      IME-DC1.XXX.com
Description:
The Knowledge Consistency Checker (KCC) has detected that successive attempts to replicate with the following directory service has consistently failed. 

Attempts:
7 
Directory service:
CN=NTDS Settings,CN=IME-DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=XXX,DC=com 
Period of time (minutes):
128 

The Connection object for this directory service will be ignored, and a new temporary connection will be established to ensure that replication continues. Once replication with this directory service resumes, the temporary connection will be removed. 

Additional Data 
Error value:
1818 The remote procedure call was cancelled.
Event Xml:
<Event xmlns="http://schemas.microsoft.com/win/2004/08/events/event">
  <System>
    <Provider Name="Microsoft-Windows-ActiveDirectory_DomainService" Guid="{0e8478c5-3605-4e8c-8497-1e730c959516}" EventSourceName="NTDS KCC" />
    <EventID Qualifiers="32768">1308</EventID>
    <Version>0</Version>
    <Level>3</Level>
    <Task>1</Task>
    <Opcode>0</Opcode>
    <Keywords>0x8080000000000000</Keywords>
    <TimeCreated SystemTime="2013-10-02T18:13:33.071Z" />
    <EventRecordID>12274</EventRecordID>
    <Correlation />
    <Execution ProcessID="652" ThreadID="1332" />
    <Channel>Directory Service</Channel>
    <Computer>IME-DC1.XXX.com</Computer>
    <Security UserID="S-1-5-7" />
  </System>
  <EventData>
    <Data>7</Data>
    <Data>CN=NTDS Settings,CN=IME-DC2,CN=Servers,CN=Default-First-Site-Name,CN=Sites,CN=Configuration,DC=XXX,DC=com</Data>
    <Data>128</Data>
    <Data>The remote procedure call was cancelled.</Data>
    <Data>1818</Data>
  </EventData>
</Event>
DanBig
Raf

1 Answer

Long startup times are a sign that you've ordered your DNS servers wrong in your DCs' network adapter settings. This can also cause the replication issues that you're seeing. Read the answer to this question and correct your setup. I'd imagine you'll likely see an improvement afterwards.

What should the order of DNS servers be for an AD Domain Controller and Why?

If that still does not fix your issue, you need to find out why RPC is not working between the two servers. This can be because of a network configuration issue, a firewall issue (hardware or host-based) or any number of other reasons. Simply pinging a server does not ensure that RPC can communicate successfully; all it means is that ICMP is working between the two.
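To illustrate the difference between "ping works" and "RPC can connect", here is a minimal Python sketch that tries a plain TCP connection to a few ports on the other DC. The port list is an assumption (well-known defaults: 53 DNS, 88 Kerberos, 135 RPC endpoint mapper, 389 LDAP, 445 SMB), and the function names are made up for this example. It only checks that a TCP handshake completes; it does not speak RPC, and it does not cover the dynamic port range that RPC uses after contacting the endpoint mapper (49152-65535 by default on Server 2008), so a clean result here does not prove replication will work.

```python
import socket

# Assumption: well-known default ports for services a DC normally exposes.
AD_PORTS = {
    "DNS": 53,
    "Kerberos": 88,
    "RPC endpoint mapper": 135,
    "LDAP": 389,
    "SMB": 445,
}


def tcp_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False


def check_dc(host, ports=AD_PORTS, timeout=3.0):
    """Map each service name to True/False for TCP reachability on host."""
    return {name: tcp_port_open(host, port, timeout) for name, port in ports.items()}
```

Running something like `check_dc("IME-DC2")` on DC1 (and the reverse on DC2) would show whether TCP 135 is reachable even though ICMP is. A `False` for a name that pings fine points towards a firewall, driver or switch problem rather than basic connectivity.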

MDMarra
  • My DNS has always been as per that link. The only difference is that rather than 127.0.0.1 for the alternate, each DC uses its own IP address. I have disabled all firewalls and antivirus, but my point is that nothing has changed that would make these changes necessary. – Raf Oct 02 '13 at 19:14
  • @Raf It sounds like an awful lot has changed in your networking configuration. I'd look there. – MDMarra Oct 02 '13 at 19:16
  • I don't believe it. I went to Control Panel on DC2 and checked the firewall; it said it was disabled, but when I clicked switch on/off it was actually switched on. I switched it off, rebooted the server and everything is syncing again. Could it have been the firewall on DC2? I thought you could use firewalls between domain controllers provided the correct exceptions are in place! – Raf Oct 02 '13 at 19:29
  • You absolutely can (and should) use firewalls. If turning the firewalls off solved your problem, then that means that you didn't have the proper exceptions in place. – MDMarra Oct 02 '13 at 19:32
  • OK, I rebooted DC1 to make sure everything would be fine and the issues have come back. I'll leave it until tomorrow morning to see if anything improves. But basically it shows there are no physical connectivity issues; something is holding up DFS replication. – Raf Oct 02 '13 at 20:14
  • Sysvol uses DFS, but the directory replication does not. It uses RPC. You have larger issues than just DFS not working. – MDMarra Oct 02 '13 at 20:16
  • Can you point me in the right direction as to where to start? Telling me it's just an RPC issue doesn't really help. I know my way around AD, so if there is any output you need I can always post it. Thanks for all your help. – Raf Oct 02 '13 at 20:21
  • @Raf I already have. If RPC is failing, then something is blocking RPC traffic between your hosts. – MDMarra Oct 02 '13 at 20:45
  • What network profile are the DCs showing in Network and Sharing Center? If it's anything other than "Domain network" then the firewall is likely still causing the problem, as different rules exist for different network profiles. If that's the case then it's likely still a DNS problem causing the DC to identify that it's on a network other than a Domain network. – joeqwerty Oct 02 '13 at 21:30
  • I have fixed the issue. When we did the HP PSP update it also upgraded the network drivers (forgot to mention, my bad). After reading this link: http://social.technet.microsoft.com/wiki/contents/articles/11795.troubleshooting-ad-replication-error-1818-the-remote-procedure-call-was-cancelled.aspx - there is a section that mentions network drivers. Removing the new 'updated' drivers and installing the older ones cleared all errors, and replication started again. The updated host drivers must have been causing network connectivity issues for the host's own virtual machines. – Raf Oct 04 '13 at 08:52