
I'm not much of an admin, so I'll try to describe the problem as I experience it.

I'm running a virtual machine on Azure with Windows Server 2019 Datacenter. It acts as our Navision (NAV) server and is joined to our domain. There is only one domain controller (DC), on premises, which serves as the AD, DHCP and DNS server. The DC and the NAV server are connected via a VPN tunnel.

Recently, users have been unable to start their NAV client software; they receive a Kerberos error. The reason is that the NAV server has lost its trust relationship with the domain. The NAV server's event logs contain Group Policy update errors saying that the DC's name can't be resolved.

So I tried flushing the DNS cache on the NAV server, and that fixed it, but only for a few hours.

  • nltest /query returned connection status 1311 (ERROR_NO_LOGON_SERVERS).
  • Pinging the DC by its short name and by its FQDN both failed with a "host not found" error.
  • Pinging the DC's IP address worked!
  • nslookup for the DC's name and for the _ldap._tcp.dc._msdcs... SRV record worked!
  • At this point the cache holds no entry for the DC, since the entry was removed when its TTL expired.

I have already removed the NAV server from the domain and joined it again, with no change. The only thing that seems to repair it is flushing the DNS cache. After that, authentication against the DC works fine again and the NAV client software starts.

For testing, I decreased the TTL on the DC's DNS server to one minute. Once the DNS entry has expired from the cache (after one minute), pinging the DC by name restores the entry, as I would expect.

Without any activity on the NAV server, name resolution fails again after some hours.
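The expiry behavior from the one-minute TTL test can be modeled with a toy TTL cache. This is a minimal sketch, not the Windows DNS Client service: `TtlCache`, the upstream callable, the host name `dc01.corp.example` and the address `10.0.0.10` are all hypothetical. It only shows that an expired entry is dropped and re-fetched on the next lookup.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple

# Hypothetical upstream resolver: returns (address, ttl_seconds); address is None on failure.
Upstream = Callable[[str], Tuple[Optional[str], float]]


@dataclass
class CacheEntry:
    address: str
    expires_at: float


class TtlCache:
    """Toy stub-resolver cache that honors record TTLs (illustration only)."""

    def __init__(self, upstream: Upstream):
        self.upstream = upstream
        self.entries: Dict[str, CacheEntry] = {}
        self.upstream_queries = 0  # how often the DNS server had to be asked

    def resolve(self, name: str, now: float) -> Optional[str]:
        entry = self.entries.get(name)
        if entry is not None and now < entry.expires_at:
            return entry.address  # cache hit: no network traffic
        self.entries.pop(name, None)  # drop the expired entry, if any
        self.upstream_queries += 1
        address, ttl = self.upstream(name)
        if address is not None:
            # Only successful answers are stored, for as long as their TTL allows.
            self.entries[name] = CacheEntry(address, now + ttl)
        return address

    def flush(self) -> None:
        """Rough equivalent of flushing the resolver cache."""
        self.entries.clear()


cache = TtlCache(lambda name: ("10.0.0.10", 60))  # one-minute TTL, as in the test
cache.resolve("dc01.corp.example", now=0)   # miss: upstream queried
cache.resolve("dc01.corp.example", now=30)  # hit: served from cache
cache.resolve("dc01.corp.example", now=61)  # expired: upstream queried again
print(cache.upstream_queries)               # 2
```

In this model an expired entry is simply re-fetched from upstream, which matches the one-minute test; on its own it does not explain why resolution breaks after some hours.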

UPDATE: I have investigated a bit deeper and inspected the DNS traffic sent by the client. As long as everything works, the Azure VM sends a DNS request and receives the answer from the DC: one request, one reply. As soon as name resolution fails, the behavior is as follows:

  1. The Azure VM sends a request to the DC.
  2. The Azure VM immediately sends a request to the secondary DNS server (1.1.1.1) with an identical QueryId. Is this normal?
  3. The secondary DNS server replies with Name Error (of course, since 1.1.1.1 doesn't know my DC).
  4. The DC replies with the correct IP address.

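The header fields involved in the capture (the QueryId, the response flag and the RCODE, where 3 is displayed as Name Error) can be illustrated with a small sketch built on Python's `struct` module. The functions `build_query`, `make_reply` and `first_matching_reply` are hypothetical helpers, and the sketch assumes, without the capture proving it, that the client accepts the first reply whose QueryId matches, regardless of which server sent it. Under that assumption, the Name Error from 1.1.1.1 in step 3 would win over the DC's correct answer in step 4.

```python
import struct
from typing import List, Optional, Tuple

NOERROR, NXDOMAIN = 0, 3  # RCODE values from RFC 1035 (3 = "Name Error")


def build_query(query_id: int, name: str, qtype: int = 1) -> bytes:
    """Minimal DNS query: header with RD set, one question, class IN (A record by default)."""
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in name.split("."))
    return header + qname + b"\x00" + struct.pack("!HH", qtype, 1)


def make_reply(query_id: int, rcode: int) -> bytes:
    """Bare reply header (QR, RD, RA set) with the given RCODE and all counts zero."""
    return struct.pack("!HHHHHH", query_id, 0x8000 | 0x0100 | 0x0080 | rcode, 0, 0, 0, 0)


def parse_header(message: bytes) -> Tuple[int, bool, int]:
    """Return (query_id, is_response, rcode) from the start of the 12-byte header."""
    query_id, flags = struct.unpack("!HH", message[:4])
    return query_id, bool(flags & 0x8000), flags & 0x000F


def first_matching_reply(query_id: int, arrivals: List[Tuple[str, bytes]]) -> Optional[Tuple[str, int]]:
    """Assumed client policy: accept the first response whose ID matches the query."""
    for server, message in arrivals:
        reply_id, is_response, rcode = parse_header(message)
        if is_response and reply_id == query_id:
            return server, rcode
    return None


qid = 0x1A2B
query = build_query(qid, "dc01.corp.example")  # hypothetical DC name; same bytes go to both servers
arrivals = [
    ("1.1.1.1", make_reply(qid, NXDOMAIN)),  # step 3 arrives first
    ("DC", make_reply(qid, NOERROR)),        # step 4 arrives later
]
print(first_matching_reply(qid, arrivals))   # ('1.1.1.1', 3)
```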
This is not how I imagined the DNS client would work, and it doesn't seem correct to me.

Thomas

1 Answer


Check the time on all of your VMs. If any of them are out of sync, you can run into problems like this.

This is especially common in a VM environment. Normally the host provides the time to its guests, but admins quite often forget to configure an external time source on the PDC. The time then comes from the host's internal clock, which drifts over time.
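Kerberos rejects authentication when the clocks differ by more than the policy's maximum tolerance, which is 5 minutes by default on Windows. The sketch below applies the standard NTP offset formula from RFC 5905 to four timestamps and checks the result against that limit; the helper names and the timestamps (a VM running 6 minutes behind the DC, 10 ms one-way delay) are made up for illustration.

```python
from datetime import datetime, timedelta, timezone

# Default "Maximum tolerance for computer clock synchronization" in Windows Kerberos policy.
KERBEROS_MAX_SKEW = timedelta(minutes=5)


def ntp_offset(t1: datetime, t2: datetime, t3: datetime, t4: datetime) -> timedelta:
    """Offset of the server clock relative to the client clock (RFC 5905).

    t1: client sends request (client clock), t2: server receives it (server clock),
    t3: server sends reply (server clock),  t4: client receives it (client clock).
    """
    return ((t2 - t1) + (t3 - t4)) / 2


def within_kerberos_tolerance(offset: timedelta, max_skew: timedelta = KERBEROS_MAX_SKEW) -> bool:
    return abs(offset) <= max_skew


t1 = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)  # VM sends request
t2 = t1 + timedelta(minutes=6, milliseconds=10)          # DC receives it, 6 min ahead
t3 = t2                                                  # DC replies immediately
t4 = t1 + timedelta(milliseconds=20)                     # VM receives reply
offset = ntp_offset(t1, t2, t3, t4)
print(offset, within_kerberos_tolerance(offset))         # 0:06:00 False
```

The symmetric network delay cancels out of the formula, so the computed offset is exactly the 6 minutes of drift, which is outside the default Kerberos tolerance.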

Bevan