Last Friday, I turned on DNS Scavenging on one of my domain controllers, with the goal of removing old, unused A records left behind by decommissioned computers. However, it turned out that our DHCP (provided by Meraki hardware) was misconfigured and was not relaying information back to DNS, so computers were typically only creating A records when they were first joined to the domain. As a result, after scavenging ran on Saturday, 600 out of 900 A records were removed. That's not the problem I'm asking about, though.

The actual problem is that since DNS Scavenging ran, a number of computers have been unable to reach internal resources by name. This has mostly affected remote worker laptops that connect to our VPN, but we have also had laptops affected that have never left our corporate network, as well as one server. Some more precise symptoms:

• Affected computers cannot ping any of our servers by name, getting back the error "Ping request could not find host x. Please check the name and try again." Running Wireshark shows that the ping initiates a DNS request, which returns the result "0x8182 standard query response, server failure."
• Affected computers can run nslookup against any of our servers, getting back the proper name and IP address.
• Affected computers can ping any of our servers by IP address.
• Affected computers had their A records deleted from DNS, but many more computers had their A records deleted and continued working fine.
• External DNS works fine for affected computers.
• Running ipconfig /registerdns from an affected computer does not create a new Forward Lookup Zone A record.
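
The nslookup and ping symptoms can disagree because the two tools take different paths: nslookup sends its own queries straight to the DNS server, while ping goes through the operating system's resolver (the DNS Client service on Windows), which applies any local name resolution policy first. As a rough way to compare both paths from one script, here is a minimal Python sketch. The function names, the server address, and the host name in the usage comments are illustrative assumptions, not values from our environment.

```python
import socket
import struct


def build_query(name: str, xid: int = 0x1234) -> bytes:
    """Build a minimal DNS query for an A record with recursion desired."""
    # Header: ID, flags (RD=1), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", xid, 0x0100, 1, 0, 0, 0)
    labels = name.rstrip(".").split(".")
    qname = b"".join(bytes([len(label)]) + label.encode("ascii") for label in labels)
    qname += b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def rcode_of(response: bytes) -> int:
    """Return the RCODE (low 4 bits of the flags word) of a DNS response."""
    (flags,) = struct.unpack("!H", response[2:4])
    return flags & 0x000F


def query_server_directly(name: str, server: str, timeout: float = 3.0) -> int:
    """Ask one specific DNS server over UDP, bypassing the OS resolver."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        data, _ = sock.recvfrom(4096)
    return rcode_of(data)


def resolve_via_os(name: str):
    """Resolve through the OS resolver, the same path ping relies on."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
        return sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        return f"OS resolver failed: {exc}"


# Hypothetical usage (replace the name and server with real values):
#   print(query_server_directly("server1.corp.example", "10.0.0.10"))  # 0 = NOERROR, 2 = SERVFAIL
#   print(resolve_via_os("server1.corp.example"))
```

If the direct query returns RCODE 0 while the OS resolver fails for the same name, that would point at something on the client between the application and the DNS server rather than at the zone data itself.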

The only successful workaround we have found so far is removing a computer from our domain and re-adding it. In most cases this fixes the issue and the computer can access network resources successfully. However, this does not create new A records for those computers in DNS, and some affected computers are not fixed and remain unable to access network resources or ping servers by name.

Attempted Fixes:

• Running various commands to clear out network settings on affected computers: ipconfig /flushdns, ipconfig /registerdns, ipconfig /release, ipconfig /renew, netsh winsock reset catalog, netsh int ip reset reset.log, route /f. Even combined with restarts, none of these changed the problem.
• Navigating to an affected computer in AD and clicking Reset account - no change.
• Restarting domain controllers multiple times - no change.
• Enabling DNS debug logging - this shows that internal DNS queries from affected PCs are getting back messages with an RCODE of 2, indicating a server failure. See the end for a full packet example.
• Changing dynamic updates on the forward lookup zone to allow both nonsecure and secure updates - no change.
• Creating a new Server 2016 virtual domain controller in Azure and reconfiguring DHCP to direct some affected computers to use it as their primary DNS server - affected computers using it as their DNS server do create new A records in DNS, but they still cannot reach network resources by name.

So we are still unable to consistently get every affected computer working again, and have had to remove many computers from our domain and rejoin them as a workaround. If any of you know what could cause this, or have other potential fixes, please let me know.

Return DNS packet example:

5/15/2018 9:11:07 AM 17CC PACKET 00000211E32038A0 UDP Snd 10.2.151.35   aa53 R Q [8281 DR SERVFAIL] A     (7)dcazure(0)
UDP response info at 00000211E32038A0
  Socket = 724
  Remote addr 10.2.151.35, port 50641
  Time Query=40319, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0019 (25)
  Message:
    XID     0xaa53
    Flags   0x8182
      QR       1 (RESPONSE)
      OPCODE   0 (QUERY)
      AA       0
      TC       0
      RD       1
      RA       1
      Z       0
      CD       0
      AD       0
      RCODE   2 (SERVFAIL)
    QCOUNT   1
    ACOUNT   0
    NSCOUNT 0
    ARCOUNT 0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name     "(7)dcazure(0)"
      QTYPE A (1)
      QCLASS 1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty
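
For anyone reading the dump above, the flags word 0x8182 breaks down into the individual bits the debug log lists. A minimal Python sketch of that decoding follows, using the standard DNS header layout from RFC 1035 with the AD and CD bits from RFC 4035; the function name decode_flags is just an illustrative choice.

```python
RCODE_NAMES = {
    0: "NOERROR",
    1: "FORMERR",
    2: "SERVFAIL",
    3: "NXDOMAIN",
    4: "NOTIMP",
    5: "REFUSED",
}


def decode_flags(flags: int) -> dict:
    """Split the 16-bit DNS header flags word into its named fields."""
    rcode = flags & 0xF
    return {
        "QR": (flags >> 15) & 1,       # 1 = response
        "OPCODE": (flags >> 11) & 0xF,  # 0 = standard query
        "AA": (flags >> 10) & 1,       # authoritative answer
        "TC": (flags >> 9) & 1,        # truncated
        "RD": (flags >> 8) & 1,        # recursion desired
        "RA": (flags >> 7) & 1,        # recursion available
        "Z": (flags >> 6) & 1,         # reserved
        "AD": (flags >> 5) & 1,        # authentic data
        "CD": (flags >> 4) & 1,        # checking disabled
        "RCODE": rcode,
        "RCODE_NAME": RCODE_NAMES.get(rcode, "OTHER"),
    }


fields = decode_flags(0x8182)
print(fields)
```

Decoding 0x8182 this way gives QR=1, RD=1, RA=1, AA=0, and RCODE=2 (SERVFAIL), which matches the field values in the log: the server accepted a recursive query and answered, but as a non-authoritative failure with no records.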
  • Did you tick the option to allow both secure and nonsecure updates for the period while DNS is being repopulated from DHCP? – yagmoth555 May 16 '18 at 17:08

1 Answer

After further troubleshooting, we discovered that this problem was caused by DirectAccess being misconfigured on client machines, which led them to believe they were not on our corporate network when they were. We fixed it by adjusting our DirectAccess group policy and deleting a registry key on affected clients to force them to clear out their existing settings.