
I have several offices joined using various combinations of IPsec VPNs and an MPLS network. The majority of sites form a mesh arrangement using the VPNs, but site B only has a single IPsec VPN to site A. Site B cannot reach any of the other sites (sites C and D).

Sites A, C, and D all share an Active Directory domain, say "companya.com". Domain controllers for companya.com are located in all three sites, and are all running Windows Server 2012 R2.

Site B runs its own Active Directory domain - say "companyb.com". Domain controllers for companyb.com are located solely in site B. One runs Windows Server 2019, the other runs Windows Server 2012 R2.

We have established a two-way trust between the AD domains companya.com and companyb.com. This was achieved using a conditional forwarder in the AD DNS servers for companyb.com, pointing to both domain controllers on site A for companya.com; in addition, we set up a stub zone in companya.com to point to both domain controllers on site B for companyb.com.

As expected, both domain controllers in site A can reliably contact a domain controller in site B. However, both domain controllers in site B can only reliably contact domain controllers in site A. Because we deployed a conditional forwarder, DNS lookups from site B for the SRV records describing companya.com's domain controllers sometimes return results for sites C and D, which site B cannot access at all. This is causing sporadic errors such as "The system cannot contact a domain controller to service the authentication request. Please try again later."
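
To confirm which returned records point at hosts site B cannot route to, the resolved addresses can be compared against the subnets reachable over the tunnel. The following is a rough Python sketch only: the host names, IP addresses, and the site A subnet are hypothetical placeholders, not values from this environment.

```python
import ipaddress


def split_by_reachability(dc_addresses, reachable_subnets):
    """Split DC host names into (reachable, unreachable) lists.

    dc_addresses: dict mapping host name -> IP address string.
    reachable_subnets: iterable of CIDR strings routable from site B.
    """
    networks = [ipaddress.ip_network(s) for s in reachable_subnets]
    reachable, unreachable = [], []
    for host, addr in dc_addresses.items():
        ip = ipaddress.ip_address(addr)
        if any(ip in net for net in networks):
            reachable.append(host)
        else:
            unreachable.append(host)
    return sorted(reachable), sorted(unreachable)


# Hypothetical data: one DC per companya.com site.
dc_addresses = {
    "dc-a1.companya.com": "10.1.0.10",  # site A (assumed subnet)
    "dc-c1.companya.com": "10.3.0.10",  # site C (assumed subnet)
    "dc-d1.companya.com": "10.4.0.10",  # site D (assumed subnet)
}
reachable, unreachable = split_by_reachability(dc_addresses, ["10.1.0.0/24"])
print("reachable:", reachable)
print("unreachable:", unreachable)
```

Any name in the unreachable list is a record that would need to be absent from whatever site B's DNS servers return for companya.com.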

I need to ensure that when domain controllers in companyb.com perform lookups to find domain controllers in companya.com, only domain controllers in site A are returned.

I have tried:

  • Configuring an AD Site for site B, using the subnet to which the companyb.com domain controllers are connected. However, I don't think this works because there is nothing configured in DNS to specify that site B should only be served results for site A.
  • Replacing the conditional forwarder on DCs in companyb.com with a stub zone. The same problem persisted, except worse because the stub zone caused lookups against DNS servers in sites C and D, which are unreachable.
  • Manually adding a primary AD-integrated zone for companya.com on companyb.com's DNS servers, and adding records all pointing to domain controllers on site A:
    • Multiple A records for companya.com.
    • Multiple _gc._tcp.companya.com records.
    • Multiple _ldap._tcp.companya.com records.
    • Multiple _kerberos._tcp.companya.com records.
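
For comparison against the records listed above, here is a minimal Python sketch that generates the SRV owner names a domain controller commonly registers, including those under `_msdcs`. It assumes a single-domain forest (so the forest root is the same as the domain), uses a hypothetical AD site name `Site-A`, and is not an exhaustive list (for example, the GUID-based `domains._msdcs` records are left out).

```python
def dc_locator_srv_names(domain, site=None):
    """Return SRV owner names commonly registered by a domain controller.

    Assumes a single-domain forest, so forest-wide records (gc) are
    placed under the same DNS name as the domain. Not exhaustive.
    """
    names = [
        f"_ldap._tcp.{domain}",
        f"_ldap._tcp.dc._msdcs.{domain}",
        f"_ldap._tcp.pdc._msdcs.{domain}",
        f"_ldap._tcp.gc._msdcs.{domain}",
        f"_gc._tcp.{domain}",
        f"_kerberos._tcp.{domain}",
        f"_kerberos._tcp.dc._msdcs.{domain}",
        f"_kerberos._udp.{domain}",
        f"_kpasswd._tcp.{domain}",
        f"_kpasswd._udp.{domain}",
    ]
    if site:
        # Site-specific variants live under <site>._sites.
        names += [
            f"_ldap._tcp.{site}._sites.{domain}",
            f"_ldap._tcp.{site}._sites.dc._msdcs.{domain}",
            f"_ldap._tcp.{site}._sites.gc._msdcs.{domain}",
            f"_gc._tcp.{site}._sites.{domain}",
            f"_kerberos._tcp.{site}._sites.{domain}",
            f"_kerberos._tcp.{site}._sites.dc._msdcs.{domain}",
        ]
    return names


# "Site-A" is a hypothetical site name used for illustration only.
srv_names = dc_locator_srv_names("companya.com", site="Site-A")
for name in srv_names:
    print(name)
```

The three record types I added by hand are a subset of this list, and none of them sit under `_msdcs.companya.com`, which may be where the gap is.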

I am not able to build IPsec tunnels between site B and sites C and D, nor am I able to route traffic to sites C and D via the existing tunnel from site B to site A.

I suspect that Windows Server 2016's DNS policies feature might help in this situation, but I don't have access to DCs running Windows Server 2016. I also suspect I may have missed some records when I manually set up a DNS zone on companyb.com's servers.

Any insight would be appreciated.

  • This seems unnecessary and convoluted. The DC Locator process is designed to select and test DCs. If the error is "system cannot contact a domain controller", that means there were NO usable DCs in any of the returned results. Additionally, the client should attempt to use any of the records if it doesn't have a record in its local site for the target domain. – Greg Askew Aug 13 '21 at 14:29
  • What might cause connections to be sporadic in this instance? I've checked the VPN and can see traffic destined for domain controllers in site A from site B, but only for some requests. Others show traffic from site B to sites C and D, and such traffic fails to route in the absence of any tunnels between the relevant subnets. – Daniel Arkley Aug 13 '21 at 15:06
  • There could be numerous causes. There needs to be a correlated packet capture running when the symptom occurs to know more. It could be a DC is passing the LDAP ping test, but is operationally down. It could be incorrect firewall port allocations for RPC activity. – Greg Askew Aug 14 '21 at 12:10
  • As suggested I've looked through packet captures and I can see active attempts to connect to unreachable DCs. It isn't possible for these DCs to have responded to an LDAP ping test as you've described, as there is no VPN tunnel between site B and sites C and D, so I'm unsure why traffic is being directed that way? – Daniel Arkley Aug 17 '21 at 13:36
  • It depends on what the "connect" is and the application. If it is Windows native functionality, it is expected that the endpoint will connect to any and all of the DCs and test them. If it is a (dysfunctional) application that is connecting to the domain FQDN, it will connect to the first DC record returned regardless of status. Furthermore, if a site DC is not accessible from other locations, it should not be advertised as a globally available DC. That's what DNS Mnemonics are for. – Greg Askew Aug 17 '21 at 14:05
  • I'm seeing tcp/445, categorised Active Directory, from site B to sites C and D. The application in question is Sage connecting to a Windows File Share to locate its data. Can you elaborate on the DNS mnemonics? My understanding of the DNS setup was that if I were in companya.com's domain, I could define an AD site, sitea.companya.com. Then, if clients want to locate DCs on sitea they can resolve _ldap._tcp.sitea.companya.com. How does that work across a trust given that trusts aren't aware of sites in other domains? – Daniel Arkley Aug 18 '21 at 22:01
  • If it is an application, I don't see SRV records factoring into this. If you have A records with the same name as the parent record, and those DCs aren't reachable, this would be expected. You either need to clean up the A records for the problem domain in the site where the issue is occurring, or change the application to be more resilient and not use dead records (unlikely). Stub zones are for native Windows functionality in a normal domain, not for application usage in a forest where dead DNS records are published for the domain. – Greg Askew Aug 19 '21 at 11:06
