
In a Windows Server 2008 domain the on-site domain controller serves as both DNS and WINS server. It is also a DHCP server. One workstation is used as a platform for all admin of the various servers in the domain. All servers are shut down at night after a system-wide backup. Every morning there is a period during which no remote desktop sessions can connect to any of the servers which have IP addresses allocated by DHCP. My strategy is to nail down the IP addresses of these boxes, but I want to understand the problem first. This morning the problem is as follows:

Nslookup and Nblookup both agree on the IP address of the target computer, but when I ping the target computer by name, ping resolves the name to a different IP address and fails. My first indication of the problem was when three different remote desktop sessions all failed to connect. If I wait an hour or so after the servers in question have booted up, the problem resolves itself.

On the workstation that cannot connect I have run IPCONFIG /FLUSHDNS. On the DHCP server I set DHCP to use 30-day leases, on the theory that changing IP addresses somehow causes this problem, but the next morning the problem remained. I have confirmed that no computers are mentioned in any HOSTS or LMHOSTS files. I can't figure out where ping is getting its IP address.

Any ideas?

MadHatter
user2161082
  • Have you disabled the DNS client on a machine with the problem? Then at least you can confirm whether it's a local issue, as the machine's only choice will be to try a DNS request. – NickW Mar 12 '13 at 14:45
  • Does it resolve to an external IP address or internal IP address? – Nixphoe Mar 12 '13 at 14:47
  • My suggestion, fire up wireshark/tcpdump when you are seeing the problem. See what name resolution request is made directly before the first ICMP packet. – Zoredache Apr 16 '13 at 00:15
  • I agree with @Zoredache that a network capture may be in order. When you successfully ping the name, are you using the FQDN or just the hostname? If hostname, can you try it again (make sure you type it lowercase) and report back what the "Pinging ...." line reads? You can obscure the name/IP, but I'm curious about the format and case of the result. I also suggest you run `ipconfig /displaydns` after the ping to see if it is somehow being resolved by DNS. – charleswj81 Jun 13 '15 at 20:32
  • Also, this is a long shot, but are you using DirectAccess or Name Resolution Policies by chance? – charleswj81 Jun 13 '15 at 20:34
  • Sorry about the flood of comments, but what is the erroneous IP? If you can't post it, does it have any significance? – charleswj81 Jun 13 '15 at 20:36

5 Answers


Nslookup and Nblookup both obtain their data directly from the servers, while ping goes through local resources first: the hosts file, then the DNS cache, then a DNS lookup (I may have skipped a step). If it finds an answer at an earlier step, that's what sticks.
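To see the two paths side by side, here is a minimal Python sketch. It assumes Python is available on the workstation and that `nslookup` is on the PATH. `socket.getaddrinfo` asks the operating system's resolver, which should follow roughly the same path ping does, while the `nslookup` call queries the DNS server directly. The function names and the sample host names in the comments are made up for illustration.

```python
import re
import socket
import subprocess

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def system_resolver_addresses(name):
    """IPv4 addresses returned by the OS resolver (hosts file, cache, DNS, ...)."""
    try:
        infos = socket.getaddrinfo(name, None, socket.AF_INET)
    except socket.gaierror:
        return set()
    return {info[4][0] for info in infos}


def parse_nslookup(output):
    """IPv4 addresses from the answer part of nslookup output.

    The first "Address:" line describes the DNS server itself, so only the
    text after the "Name:" line is searched.
    """
    _, found, answer = output.partition("Name:")
    return set(IPV4.findall(answer)) if found else set()


def nslookup_addresses(name, server=None):
    """Ask a DNS server directly, bypassing the local resolver cache."""
    cmd = ["nslookup", name] + ([server] if server else [])
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=15)
    return parse_nslookup(result.stdout)


def compare(name, server=None):
    """Report what each path returns and whether they agree."""
    local = system_resolver_addresses(name)
    direct = nslookup_addresses(name, server)
    return {"system": local, "nslookup": direct, "agree": local == direct}
```

Calling `compare("server1")` (a hypothetical host name) while the problem is occurring would show whether the operating system's resolver really returns a different address than the DNS server does.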

NickW
  • Yes, obviously ping has some local resources, but I have flushed the DNS cache and checked the HOSTS / LMHOSTS files. Where else? This morning, again an hour after starting up all the servers (mostly VMs), I attempted to open remote desktops without success. NBLOOKUP and NSLOOKUP both give the same, correct IP address, but ping comes up with an erroneous IP address and fails. If I wait another hour without taking any action the problem clears up on its own. I haven't been systematic enough to notice whether ping is coming up with yesterday's IP address or something entirely different. Tomorrow. – user2161082 Mar 13 '13 at 14:44

Is there an entry for that IP address in your local C:\Windows\System32\Drivers\etc\hosts file?

EDIT: I just saw you mention there's nothing in the hosts file. I can't really explain your reported behavior. I would go back to square one and do a series of sanity checks to verify what you think you know is true and correct.

EDIT 2: The DHCP service is configured by default to update DNS as leases are modified. These changes then have to be replicated to all of your AD-integrated DNS servers via AD replication. This process can take up to fifteen minutes to fully complete. During this time, some DNS servers may show the correct IP while others might show the old one. You can confirm this by querying the DNS servers directly. Try running `nslookup <query> <resolver>`, where `<resolver>` is the name of your internal DNS server. Query each one in turn to verify they are returning consistent results.
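Querying each server in turn can be scripted. The following is a rough Python sketch, assuming `nslookup` is on the PATH; the function names and the server names `dc01` and `dc02` in the usage note are hypothetical.

```python
import re
import subprocess

IPV4 = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def query_server(name, server):
    """Run `nslookup <name> <server>` and return the IPv4 answers."""
    result = subprocess.run(
        ["nslookup", name, server], capture_output=True, text=True, timeout=15
    )
    # Skip the leading "Server:/Address:" lines, which describe the resolver.
    _, found, answer = result.stdout.partition("Name:")
    return frozenset(IPV4.findall(answer)) if found else frozenset()


def check_consistency(name, servers, lookup=query_server):
    """Ask every server for the same name; True means all answers match."""
    answers = {server: lookup(name, server) for server in servers}
    consistent = len(set(answers.values())) <= 1
    return answers, consistent
```

For example, `check_consistency("server1", ["dc01", "dc02"])` returns each server's answer plus a flag that is `False` when the servers disagree, which is what you would expect to see while replication is still in progress.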

John Homer
  • The DHCP server is the same machine as the domain controller / DNS server and is on the same subnet as both computers (the server I am trying to reach by remote desktop and the workstation from which I am trying to connect), so a different domain controller on a different subnet should not come into the picture, though it might be listed as a second DNS server. I am racking my brains for a solution along the lines of a "sanity check", double-checking everything I can think of. – user2161082 Mar 13 '13 at 14:38
  • Did you clear the DNS cache for the local resolver for your workstation (the second DNS server you mention above)? Each DNS server maintains its own local cache. Try clearing the cache for both DNS servers, in addition to 'ipconfig /flushdns'. This is done by right-clicking on the DNS server in DNS Manager and selecting 'Clear Cache'. – John Homer Mar 14 '13 at 16:49

This problem just went away. The actions I took were first to force DHCP to register IP address changes in DNS (even if not asked to). That did not fix the problem. Next I set leases to 30 days. That also did not fix the problem. After the problem went away I set leases to one hour, but that didn't cause the problem to recur. Note that none of these actions is logically connected to the actual problem. They fall in the category of grasping at straws. I cannot think of any good reason why, after flushing the DNS cache, NSLOOKUP and NBLOOKUP would give one IP address and ping (and, presumably, the remote desktop application) would find another IP address. (HOSTS and LMHOSTS are default with no IP addresses mentioned.)

user2161082
  • The problem came back this morning. After a remote desktop session failed to connect I flushed the DNS cache, then did an NSLOOKUP, which gave me IP address A, and an NBLOOKUP, which also gave me address A. Then I pinged, which resolved to address B and failed. Finally I logged into the problem server via a Virtual Machine connection, and with one more DNS flush the problem went away. The problem stayed around for almost an hour until I logged into the machine. – user2161082 Mar 21 '13 at 18:09

This sounds like a DNS problem for sure. Set the TTL to 1 minute and, if your DNS server has a setting that lets you bump the zone serial number, bump it.

If that fails, I'd try isolating the possible culprits one by one. Schedule a maintenance window when no one needs a workstation (say, 8PM or however it works for your office). With all the client machines still turned on, portscan the entire netblock to ensure that no one else is running a DHCP, DNS, or WINS server. Ensure that every last option on your DNS server is set correctly.
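A dedicated scanner is the better tool for this, but as a rough illustration, the Python sketch below walks a netblock and tries a TCP connection to port 53 (DNS) and port 42 (WINS replication). The port choice, the function names, and the example netblock are assumptions for illustration. Note that DHCP (UDP 67) and NetBIOS name service (UDP 137) are UDP-based, so a TCP connect scan like this one will not find them; a packet capture is a more reliable way to spot a second DHCP server.

```python
import ipaddress
import socket

# TCP ports probed here are an assumption: 53 = DNS, 42 = WINS replication.
PORTS = {53: "DNS", 42: "WINS replication"}


def hosts_in(netblock):
    """List the usable host addresses in a netblock such as 192.168.1.0/24."""
    return [str(h) for h in ipaddress.ip_network(netblock, strict=False).hosts()]


def tcp_port_open(host, port, timeout=0.5):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def scan(netblock, ports=PORTS, probe=tcp_port_open):
    """Return (host, port, label) for every probed port that answers."""
    found = []
    for host in hosts_in(netblock):
        for port, label in ports.items():
            if probe(host, port):
                found.append((host, port, label))
    return found
```

Running `scan("192.168.1.0/24")` (substitute your own netblock) should list only the servers you expect; anything else answering on those ports deserves a closer look.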

If that doesn't help, I'd start by picking a new IP to be your DNS server. When everyone has gone home, configure the master to hand out that IP to everyone over DHCP and WINS. Shut down all the client machines. Migrate the DNS server to a new, clean machine or VM, then shut down the domain master as well.

Then, turn on the domain master and all the client machines. What happens?

Huns

Did you ever sort this problem? It sounds like something is/was fundamentally broken with the address assignment on your network.

I would have Wiresharked while pinging and observed the IPs returned by ICMP ECHO and nslookup/nblookup.

If you're on a domain, nslookup by default will be returning the IP as known by the domain's DNS server (and the Domain Controller will often be handing out its own address as the DNS server for DHCP Clients on smaller domains).

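Ping relies on ARP once it has an IP address: ARP maps that IP to a MAC address on the local subnet, and each host maintains its own cache of IP-to-MAC mappings for the hosts it has talked to. ARP does not turn a name into an IP, so a stale ARP entry could make a ping to the correct IP fail, but it would not by itself make ping show a different IP than nslookup.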

It may be that the DC is getting confused about addresses it's assigning, or you may have competing DHCP servers on the LAN (coincidentally assigning IPs in the same range) and some clients are talking to the wrong one.

Either way, I'd be interested to know if you ever found a resolution to this issue.

Chris Woods