
I need your help. I've struggled with this for months and nothing I've found online has helped me. The problem is, domain computers sometimes point to an incorrect domain controller in a different site. I have two sites connected via VPN: Site-A with two domain controllers and Site-B with one. Here is my current configuration:

[Image: Network configuration]

Computers in Site-A usually connect to either SRV-1 or SRV-2 (as they should), but computers in Site-B only rarely connect to SRV-3. The sites are linked by a very slow ADSL connection, so connecting to the wrong site makes a client nearly unusable.

All DCs are also DFS servers. The biggest downside is that when clients connect to the wrong DC, they also connect to the wrong DFS server and list only the servers in the wrong site as available DFS servers.

There is a WINS server on SRV-1 and all the machines are pointing their WINS client to 192.168.0.70. WINS records seem okay:

[Image: WINS records on SRV-1]

I've also gone through the DNS records on all servers, and they seem correct. The servers are in correct sites in AD Sites and Services and they have been assigned the correct subnets. All servers are connected (two-way) to each other in NTDS settings.

Some observations I've made:

SRV-1 in Site-A (192.168.0.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator>nltest /DSGETSITE
Site-A
The command completed successfully

C:\Users\Administrator>nltest /DSGETDC:DOMAIN
           DC: \\SRV-1
      Address: \\192.168.0.70
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: PDC GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator>nltest /dsgetsitecov
Site-A
The command completed successfully

SRV-2 in Site-A (192.168.0.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETSITE
Site-A
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETDC:DOMAIN
           DC: \\SRV-2
      Address: \\192.168.0.71
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /dsgetsitecov
Site-A
The command completed successfully

SRV-3 in Site-B (192.168.2.0/24):

C:\Users\Administrator>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-1'.
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETSITE
Site-B
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /DSGETDC:DOMAIN
           DC: \\SRV-3
      Address: \\192.168.2.70
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-B
Our Site Name: Site-B
        Flags: GC DS LDAP KDC WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

C:\Users\Administrator.DOMAIN>nltest /dsgetsitecov
Site-B
The command completed successfully

Client PC in Site-B (192.168.2.0/24):

C:\WINDOWS\system32>nltest /DCLIST:DOMAIN
Get list of DCs in domain 'DOMAIN' from '\\SRV-2'.
    SRV-2.domain.example.local        [DS] Site: Site-A
    SRV-1.domain.example.local [PDC]  [DS] Site: Site-A
    SRV-3.domain.example.local        [DS] Site: Site-B
The command completed successfully

C:\WINDOWS\system32>nltest /DSGETSITE
Site-A
The command completed successfully

C:\WINDOWS\system32>nltest /DSGETDC:DOMAIN
           DC: \\SRV-2
      Address: \\192.168.0.71
     Dom Guid: d8a18714-3272-4075-a5de-b1af522ec649
     Dom Name: DOMAIN
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
        Flags: GC DS LDAP KDC TIMESERV WRITABLE DNS_FOREST CLOSE_SITE FULL_SECRET WS
The command completed successfully

Note that DSGETSITE and DSGETDC return wrong values on the Client PC.
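The mismatch can also be checked mechanically. Below is a minimal sketch that parses the `Key: value` lines of `nltest /DSGETDC` output and compares both site fields against the site the client is supposed to be in. The sample is abbreviated from the client output above; the function names `parse_dsgetdc` and `site_mismatch` are made up for this illustration and are not part of any Windows tooling.

```python
# Abbreviated from the Site-B client output shown above.
SAMPLE = r"""
           DC: \\SRV-2
      Address: \\192.168.0.71
  Forest Name: domain.example.local
 Dc Site Name: Site-A
Our Site Name: Site-A
"""


def parse_dsgetdc(output):
    """Collect the 'Key: value' lines of nltest /DSGETDC output into a dict."""
    fields = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines without a colon
            fields[key.strip()] = value.strip()
    return fields


def site_mismatch(output, expected_site):
    """Return a list of site fields that differ from the expected site."""
    fields = parse_dsgetdc(output)
    problems = []
    for key in ("Our Site Name", "Dc Site Name"):
        actual = fields.get(key)
        if actual != expected_site:
            problems.append(f"{key}: expected {expected_site}, got {actual}")
    return problems


# The client sits in 192.168.2.0/24, so Site-B is what it should report.
problems = site_mismatch(SAMPLE, "Site-B")
for problem in problems:
    print(problem)
```

For the client output above, both fields are flagged, which matches the observation that the client believes it is in Site-A and has picked a Site-A DC.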

Oddly, which DC the clients choose changes from day to day. Restarting the clients doesn't help, and restarting the servers one by one made no difference. None of the servers are multi-homed.

The servers run Windows Server 2008 R2 and the clients run Win7 Pro / Win10 Pro.

Any help will be much appreciated!

Aleksiv95
  • `1.` Not for nothing, but best practice is to use 127.0.0.1 as Tertiary DNS, not as Primary or Secondary DNS. `2.` Ping isn't a valid tool to test this scenario. It will return the first DC that answers, but that doesn't have any relevance to determining Site affiliation. `3.` I'll have more comments momentarily. – joeqwerty Dec 11 '15 at 17:37
  • Why are you using WINS? – joeqwerty Dec 11 '15 at 17:39
  • My previous VPN configuration didn't work without WINS, I'm not sure if it's needed at all currently. – Aleksiv95 Dec 11 '15 at 17:41
  • That sounds weird. Why would a VPN depend upon WINS? Nothing in AD, DNS or DFS is dependent upon WINS. What do the site specific DNS records look like? Run `nltest /dsgetsitecov` on each DC to see which site or sites each DC is covering. – joeqwerty Dec 11 '15 at 17:48
  • Do you mean the ones in Forward lookup zones > domain.example.local > _sites? http://i.imgur.com/KqjRUaO.png and those dns records point to the right IPs. `nltest /dsgetsitecov` returns Site-A on SRV-1 and SRV-2, and Site-B in SRV-3. So they're correct. – Aleksiv95 Dec 11 '15 at 17:54
  • what's the utilization on the DC's per site? If SRV-3 doesn't resolve to the right IP how can anyone authenticate against that? – Jim B Dec 11 '15 at 18:02
  • SRV-3 resolves to the right IP as I understand it. The utilization is fairly low on all servers, since there are only around ten clients in the domain. – Aleksiv95 Dec 11 '15 at 18:06
  • Did you run `nltest /dsgetsitecov` on each DC to confirm that they're all covering the correct site? – joeqwerty Dec 11 '15 at 18:10
  • Yes, I did. `nltest /dsgetsitecov` returns Site-A on SRV-1 and SRV-2, and Site-B in SRV-3. So they're correct. – Aleksiv95 Dec 11 '15 at 18:27
  • What did "Note that SRV-3 pings to the wrong IP, and DSGETSITE and DSGETDC return wrong values" mean if it is returning the correct IP? – Jim B Dec 11 '15 at 18:52
  • `ping SRV-3` returns the correct IP. I assumed that pinging the domain name returns the DC in the current site, but apparently that's not the case, so I removed that from the original post. Sorry about the confusion. `DSGETSITE` and `DSGETDC` still return the wrong site and DC. – Aleksiv95 Dec 12 '15 at 19:08

2 Answers


Okay, I figured it out. In the end it was a network issue; no changes needed to be made to the domain controllers. I had already configured policy routes for the VPN, but I had forgotten to specify how to prioritize packets. I added an additional policy route for in-LAN traffic and assigned it a DSCP value of cs4. The tunneling routes got cs5. I'm not familiar with DSCP, but as I understood it, the smaller the number, the more important the route is (4 and 5 are just arbitrary numbers). Below is a screenshot of the final configuration on my ZyXEL ZyWall routers (I hope you appreciate Paint art):

[Image: policy route configuration on the ZyXEL ZyWall routers]

I sort of understand why this solved my problem: now the main priority is to send packets to the local network, and only after that over the VPN. I still find it a bit confusing. Is it possible that if the server and the client are in different networks, the server doesn't see the IP of the client but the IP of one of the routers, and thus cannot decide which site the IP address belongs to? I'd be curious to hear a fuller explanation.
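To make that hypothesis concrete, here is a minimal sketch of a subnet-to-site lookup. As far as I understand it, a DC matches the source address of the incoming locator request against the subnets defined in AD Sites and Services, so what matters is the address the DC sees, not the address the client has. The two subnets are the ones from the question; the client address `192.168.2.50` and the router address `192.168.0.1` are invented for illustration, and nothing here proves that address translation was the actual cause.

```python
import ipaddress

# Subnet assignments from the question (AD Sites and Services).
SUBNET_TO_SITE = {
    ipaddress.ip_network("192.168.0.0/24"): "Site-A",
    ipaddress.ip_network("192.168.2.0/24"): "Site-B",
}


def site_for_source(source_ip):
    """Return the site whose subnet contains source_ip, or None if no subnet matches."""
    addr = ipaddress.ip_address(source_ip)
    matches = [net for net in SUBNET_TO_SITE if addr in net]
    if not matches:
        return None
    # Prefer the most specific subnet when several overlap.
    best = max(matches, key=lambda net: net.prefixlen)
    return SUBNET_TO_SITE[best]


# Hypothetical addresses, not taken from the question.
real_client = "192.168.2.50"   # a Site-B client, seen as-is by the DC
translated = "192.168.0.1"     # same client if a router rewrote the source address

print(site_for_source(real_client))  # Site-B
print(site_for_source(translated))   # Site-A
```

If a Site-A DC ever saw requests from Site-B arriving with a Site-A source address, it would report Site-A as the client site, which is consistent with the `DSGETSITE` output in the question.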

Thanks to everyone who helped me, I appreciate it :)

Aleksiv95

Ping does not provide any useful information. Ping is a straight DNS lookup, and does not represent how the DC Locator process functions.

You may want to use `w32tm /query /status /verbose /computer:SRV-3` to confirm the time service on SRV-3 is functioning correctly.

It's probably simplest to do a packet capture, but you may also be able to manually isolate where the process is failing by simulating what occurs on the client PC in Site B.

  1. nslookup
    set type=srv
    _ldap._tcp.dc._msdcs.domain

This should return the list of ALL of your domain controllers (those that have an A record registered in DNS and aren't filtered by DNS mnemonics).

  2. Build a list of functional DCs by performing an LDAP bind to each DC.

  3. The first DC to respond returns the client site, the site the DC is in, and DSClosestFlag (0 or 1).

  4. If the DC is in the client site, or DSClosestFlag = 1, or the client has no site, use that DC. If not, perform:

    nslookup
    set type=srv
    _ldap._tcp.sitename._sites.domain

  5. Build a list of functional DCs by performing an LDAP bind to each DC.

  6. If that returns no results, use any functional DC (unless "Try next closest site" is enabled; by default it is not).

  7. If there is only one DC in the results, use it. If there are multiple results, select a DC based on the lowest SRV priority number and the highest weight number.
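The decision flow in this list can be sketched in a few lines of Python. This is a simplified model for reasoning about the steps, not the actual DC Locator implementation: DNS and the LDAP exchange are replaced by plain dictionaries and callables, the `Srv` and `Reply` types are invented for the example, the priority and weight values are assumed, and SRV ordering is reduced to a deterministic sort (real clients pick among equal-priority records at random, biased by weight).

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Srv:
    target: str
    priority: int
    weight: int


@dataclass
class Reply:
    dc_site: str
    client_site: Optional[str]
    closest: bool


def order_srv(records):
    """Lowest priority number first, then highest weight (simplified, no randomness)."""
    return sorted(records, key=lambda r: (r.priority, -r.weight))


def locate_dc(domain, lookup_srv, ping):
    """Walk the domain-wide SRV list, then fall back to the site-specific list."""
    for rec in lookup_srv(f"_ldap._tcp.dc._msdcs.{domain}"):
        reply = ping(rec.target)
        if reply is None:
            continue  # DC did not answer, try the next one
        # Keep this DC if it is close, or if the client has no site at all.
        if reply.closest or reply.client_site is None or reply.dc_site == reply.client_site:
            return rec.target
        # Otherwise query the records for the site the DC says the client is in.
        site_query = f"_ldap._tcp.{reply.client_site}._sites.{domain}"
        in_site = [r for r in order_srv(lookup_srv(site_query)) if ping(r.target) is not None]
        # No functional DC in that site: stay with the DC that answered.
        return in_site[0].target if in_site else rec.target
    return None


# Example data modelled on the question; priority 0 / weight 100 are assumed values.
DOMAIN = "domain.example.local"
ALL_DCS = [Srv("SRV-1", 0, 100), Srv("SRV-2", 0, 100), Srv("SRV-3", 0, 100)]
DNS = {
    f"_ldap._tcp.dc._msdcs.{DOMAIN}": ALL_DCS,
    f"_ldap._tcp.Site-A._sites.{DOMAIN}": ALL_DCS[:2],
    f"_ldap._tcp.Site-B._sites.{DOMAIN}": ALL_DCS[2:],
}
DC_SITE = {"SRV-1": "Site-A", "SRV-2": "Site-A", "SRV-3": "Site-B"}


def make_ping(client_site_seen):
    """Fake LDAP exchange: every DC answers and reports the given client site."""
    def ping(target):
        site = DC_SITE[target]
        return Reply(site, client_site_seen, closest=(site == client_site_seen))
    return ping


def lookup(name):
    return DNS.get(name, [])


correct = locate_dc(DOMAIN, lookup, make_ping("Site-B"))
wrong = locate_dc(DOMAIN, lookup, make_ping("Site-A"))
print(correct, wrong)  # SRV-3 SRV-1
```

The second call shows why the site reported by the first responding DC matters so much: if that DC places the client in Site-A, the client never asks for the Site-B records and stays with a Site-A DC.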

Greg Askew
  • There was an error in `w32tm` settings (sync error 1), so I reconfigured it (`/config /syncfromflags:domhier /update` and `/resync /rediscover`). This didn't solve the original issue. However, `w32tm` went back to sync error 1 after a while; can that be the cause for the wrong site issue? PDC is SRV-1 and that is pointing to time.windows.com. – Aleksiv95 Dec 12 '15 at 19:30
  • Enable w32time debug logging using: http://blogs.msdn.com/b/w32time/archive/2008/02/28/configuring-the-time-service-enabling-the-debug-log.aspx . Also, what is the version of w32time.dll? – Greg Askew Dec 13 '15 at 15:44
  • The file version is 6.1.7600.16385 (servers are 2008 R2). So today I thought I'd give Wireshark a go, but suddenly all the clients show the right DC, site, and DFS server. So I'll wait; sooner or later they will point to the wrong site, certainty is 100%. – Aleksiv95 Dec 13 '15 at 17:52
  • That version of w32time.dll has a known defect. You should update to the latest version here: https://support.microsoft.com/en-us/kb/2493006. Also, are you not running SP1 on your DC's? – Greg Askew Dec 13 '15 at 17:59
  • I see, so that fixed the issues with `w32tm`. Thanks :) Today I started thinking outside the box and discovered that the network metrics might be configured incorrectly. After some tuning, clients _seem_ to be connecting to the right sites now. I'm cautiously optimistic about this, so I'll observe how it works for a couple of days, and if this is the solution, I'll post here and explain what the exact cause was and how I resolved it. – Aleksiv95 Dec 14 '15 at 19:13
  • Oh and yes, all the DC's are running SP1. – Aleksiv95 Dec 14 '15 at 19:13