Our Exchange 2013 server suddenly started logging errors about bad replication between sites (there should be no replication as there is no other server) as well as several other issues that appear to be related.

We have two sites connected by VPN. This is the Active Directory Sites configuration:
[image: Active Directory Sites]
There are multiple subnets at each site and the routing between the subnets and the sites is working fine.

The Exchange server's IP address is 10.10.0.26 and it is running on the same Hyper-V host as a DC with the IP 10.10.0.21 (the one named XXXX-DC01 in the picture) which is set as part of the Default-First-Site.
[image: Servers in Sites]

The Exchange server thinks it is in the YGXXX site:
[image]

I enabled NETLOGON.LOG, but the only related information appears to be:
[image]

How can I figure out why the server is choosing the wrong site?

yakatz
    From the member server, telnet to ports 88, 389, 445, etc., of your site-local domain controllers. I would be blaming the network so hard right now if I were you, but then, firewalls spontaneously reconfigure themselves in the middle of the night just specifically to piss me off where I work. – Ryan Ries Dec 11 '14 at 20:58
    @RyanRies Those and others all seem to be open. Windows Firewall is disabled on the DC, and the Exchange server and the DC are on the same Hyper-V host, but I just noticed they are on different virtual switches (in this case, different physical network ports). Could that do it? – yakatz Dec 12 '14 at 00:47
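
The port checks suggested in the comment above can also be scripted instead of run by hand with telnet. This is only a minimal sketch: the function name `check_tcp_ports` and the timeout are illustrative choices, not something from the thread, and a successful TCP connect says nothing about whether the service behind the port is healthy.

```python
import socket

# Ports named in the comment: Kerberos, LDAP, SMB.
DC_PORTS = {88: "kerberos", 389: "ldap", 445: "smb"}


def check_tcp_ports(host, ports=DC_PORTS, timeout=3.0):
    """Return {port: True/False} for whether a TCP connect succeeded."""
    results = {}
    for port in ports:
        try:
            # create_connection raises OSError on refusal, timeout,
            # or an unreachable host.
            with socket.create_connection((host, port), timeout=timeout):
                results[port] = True
        except OSError:
            results[port] = False
    return results


# Example call against the site-local DC address from the question:
#   check_tcp_ports("10.10.0.21")
```

Running it from the Exchange server against each DC in the local site gives a quick picture of which DCs are reachable on which ports.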

1 Answer

The Microsoft article KB247811, "How Domain Controllers Are Located in Windows", is useful here.

That said, here's a list of things I'd check if you haven't tried them already:

  • Run dcdiag.exe on all domain controllers to see if they've been having trouble replicating. You may also want to check the event logs -- sometimes I find them easier to read than the dcdiag output.
  • Make sure the XXXX-DC01 server's IP address is among the DNS servers configured in your Exchange server's network connection properties. If the YG site DC is listed there, consider removing it if it provides no meaningful redundancy.
  • From your Exchange server, test DNS lookup:

    c:\> nslookup
    Default Server: XXXX-DC01.xxxxxxxxx.edu
    Address: 10.10.0.21
    
    > set q=SRV
    > _ldap._tcp.xxxxxxxxx.edu
    
  • If you don't get a response that points to your first site DC, you have a DNS connectivity problem, a DNS server problem, and/or an FSMO problem.
  • If you do get a good response, then try running an LDAP query against the DCs returned as a result. Given your setup, you likely already have Active Directory Users and Computers (dsa.msc) installed on the Exchange server. Run it from there, right-click the root object in the hierarchy, and connect to your XXXX-DC01 domain controller. If you can't connect, then you know you have an LDAP problem, either with the service on the DC or with connection and authentication from the Exchange VM.
  • If you can connect through dsa.msc, then my last suggestion would be to check FSMOs and global catalogs. This is unlikely to be the problem, but is worth checking. Make sure you have one DC in each site that is a global catalog (the GC property can be changed on the server's NTDS Settings object properties inside Active Directory Sites and Services), and that the infrastructure master FSMO is not on a global catalog server. Alternatively, you can just make all the servers global catalog servers. Setting them all is a bit of a brain-dead option, but if you have a small directory structure that rarely updates, it's not the worst thing in the world.
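
Since the underlying question is why the server lands in the wrong site, it may also help to reason about the subnet-to-site mapping directly: a client is placed in the site of the most specific subnet object that contains its IP address. The sketch below mimics that lookup and builds the site-specific SRV name to query with nslookup. The subnet values are hypothetical placeholders (the real ones are in the screenshots, not in the text), and the function names are mine, so treat this as an illustration and not as how Netlogon is implemented.

```python
import ipaddress

# HYPOTHETICAL subnet-to-site mapping; replace with the subnet objects
# actually defined in Active Directory Sites and Services.
SUBNET_TO_SITE = {
    "10.10.0.0/16": "Default-First-Site",
    "10.20.0.0/16": "YGXXX",
}


def site_for_ip(ip, subnet_to_site=SUBNET_TO_SITE):
    """Return the site of the most specific subnet containing ip, or None."""
    addr = ipaddress.ip_address(ip)
    best = None
    for cidr, site in subnet_to_site.items():
        net = ipaddress.ip_network(cidr)
        # Longest prefix wins, so a narrow subnet assigned to the wrong
        # site overrides a broader, correct one.
        if addr in net and (best is None or net.prefixlen > best[0].prefixlen):
            best = (net, site)
    return best[1] if best else None


def site_srv_name(site, domain):
    """Build the site-specific DC locator SRV record name."""
    return f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}"


# Example calls with the values from the question:
#   site_for_ip("10.10.0.26")
#   site_srv_name("Default-First-Site", "xxxxxxxxx.edu")
```

If the real mapping puts 10.10.0.26 in the expected site, compare that with what the server itself reports (`nltest /dsgetsite` prints the site the machine currently believes it is in) and with the DCs returned for the site-specific SRV name.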
  • DNS is fine. All of our DCs are already GCs. From the Exchange server, one of the 3 DCs in the first site is unreachable (it is actually the old Exchange server, which was a DC too; odd, because the new server can contact Exchange on that system for mailbox replication). Might it be trying to contact only that one and then switching sites? – yakatz Dec 23 '14 at 01:26
  • You wrote "there should be no replication as there is no other server". So do you mean that there _used_ to be Exchange on that server and it used to replicate OK when it was still installed, or do you mean that Exchange is still there -- just maybe not as actively used -- and the newer Exchange server can still talk to it? Either way, you may want to look at why some traffic between them fails as "unreachable". – Howard Miller Dec 23 '14 at 20:16
  • The servers don't have replication set up, but the new one keeps saying replication failed because it thinks it is in `Site B` and it knows the Mailbox Database should be in `Site A`. Exchange on the old system has no mailbox databases on it now, so replication is not even possible. – yakatz Dec 23 '14 at 20:23
  • Why do you still have Exchange components installed on the older server then? If you don't need them, uninstall them. If you do, migrate their functions to the new server. Then uninstall them. – Howard Miller Dec 23 '14 at 20:26
  • Working on uninstalling them, but did not finish yet... – yakatz Dec 23 '14 at 20:31