Let me start out by stating that I have already searched many sources for information or a resolution, but I have been unable to find a permanent solution.

Problem: At random intervals, and for no apparent reason, the Windows server starts returning the error "The name limit for the local computer network adapter card was exceeded" whenever it tries to access any network resource. It does not matter whether the resource is a mapped network drive, a UNC path, or a symbolic link. Once the error starts, clients cannot reach the server either. Remote Desktop reports that the machine cannot be found when I try to connect. PING resolves the assigned IP address, but every request times out. Nothing related to this error is recorded in the Windows event log.

The server is a VM running Windows Server 2016. There is only one virtual network card assigned and there are no segmented VLANs.

Starting with http://support.microsoft.com/kb/319504 - I realize that this article is for an older version of Windows, but I do in fact get “system error 68 has occurred” when I execute the command “net use * \\server\folder” while the server is producing the error. However, none of the fixes listed in the article work.

I have a hard time believing that all ephemeral ports have been utilized. Executing the command “netsh int ipv4 show dynamicport tcp” currently displays that there are 16384 ports available for use.

Executing “netstat -ano” while the server is producing the error shows very few connections in use (fewer than 50). Every one is in either the LISTENING or the ESTABLISHED state. No sessions or ports are stuck in TIME_WAIT or CLOSE_WAIT.
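For anyone who wants to repeat this check without counting rows by eye, here is a minimal Python sketch that tallies TCP states from saved “netstat -ano” output. The function name and the sample text are hypothetical and only illustrate the shape of the output; they are not taken from the affected server.

```python
from collections import Counter


def count_tcp_states(netstat_output: str) -> Counter:
    """Tally TCP connection states from `netstat -ano` text output."""
    states = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have five columns: Proto, Local, Foreign, State, PID.
        # UDP rows have no State column, so they are skipped here.
        if len(fields) == 5 and fields[0] == "TCP":
            states[fields[3]] += 1
    return states


# Hypothetical sample, shaped like real `netstat -ano` output.
SAMPLE = """\
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       868
  TCP    0.0.0.0:1433           0.0.0.0:0              LISTENING       2340
  TCP    10.0.0.5:1433          10.0.0.21:50112        ESTABLISHED     2340
  UDP    0.0.0.0:123            *:*                                    1044
"""

if __name__ == "__main__":
    for state, count in count_tcp_states(SAMPLE).most_common():
        print(f"{state}: {count}")
```

A large TIME_WAIT or CLOSE_WAIT count would point toward port exhaustion; a tally containing only LISTENING and ESTABLISHED, as described above, does not.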

Next, https://support.microsoft.com/en-us/help/929851/the-default-dynamic-port-range-for-tcp-ip-has-changed-in-windows-vista. This article confirms what I am seeing for the dynamic port range: it starts at 49152 instead of using the older range of 1025 through 5000. It is also where I found the netsh command used above.
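As a quick sanity check on those numbers, a start port of 49152 with 16384 ports covers 49152 through 65535. The sketch below does that arithmetic from saved “netsh int ipv4 show dynamicport tcp” output. The function name is hypothetical, and the sample text is an assumption modeled on the usual layout of that command's output, using the values quoted in this question.

```python
import re


def dynamic_port_range(netsh_output: str) -> tuple:
    """Return (first_port, last_port) parsed from netsh dynamicport output."""
    start = int(re.search(r"Start Port\s*:\s*(\d+)", netsh_output).group(1))
    count = int(re.search(r"Number of Ports\s*:\s*(\d+)", netsh_output).group(1))
    # The range is inclusive, so the last port is start + count - 1.
    return start, start + count - 1


# Assumed sample output, using the values reported in the question.
SAMPLE = """\
Protocol tcp Dynamic Port Range
---------------------------------
Start Port      : 49152
Number of Ports : 16384
"""

if __name__ == "__main__":
    first, last = dynamic_port_range(SAMPLE)
    print(f"Dynamic TCP ports: {first}-{last}")
```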

Most of the Google searches either point me back to support.microsoft.com/kb/319504, which is the first article that I went to, or are for an unrelated product (such as BizTalk or Exchange).

The VM has a light load. There are not many clients connected. The only software that is currently installed is SQL Server 2016.

If I reboot the VM, the error goes away for a few days and then comes back. The really weird thing is that I have two VMs acting this way. The VM host machine is working without error, and so are all of the other VMs on that host. The underlying network has no reported issues either. All machines are on the same domain.

I am at a loss on what is producing the error. Any assistance would be greatly appreciated.

Thanks

yodabit
  • By chance are you running ISCSI? – Bill Woodall Jul 21 '17 at 15:29
  • I've had a colleague almost rip his face off over this issue with no foreseeable resolution. Have you reached out to Microsoft about this? – Spooler Mar 09 '18 at 11:19
  • Is the Windows Firewall service stopped? I've seen similar issues where someone stopped the service thinking it would stop the firewall. The service acts as a helper for the high range ephemeral ports and stopping it causes lots of problems. – duct_tape_coder Sep 27 '18 at 19:32
  • I have been having the same issue on a VM. It is a generation 1 VM that was V2V'd. All others are working fine. Rebuilding the NIC doesn't work. Did you find a fix? I'm about to rebuild the server. – KeithRichardson Sep 29 '18 at 12:50
  • @nurgent - look here: https://capens.net/content/fix-windows-error-name-limit-local-computer-network-adapter-card-was-exceeded – paulsm4 Oct 12 '18 at 20:08
  • This might be obvious, but have you checked thoroughly what is different about the two servers that have this issue? Also, have they been doing this since they were built, or is it newish? If the latter, what changes were made around the time that it started? – PatrickTaylor Feb 24 '19 at 07:28

2 Answers

I started having this same exact issue on a Windows Server 2016 Hypervisor. My netstat results show me about the same as the original post as far as established/listening connections - nothing out of the ordinary.

Here's where things get interesting: I had set this hypervisor up as a WatchGuard logging repository for the firewall, and I noticed that whenever the connectivity problem kicks in, one component of the logging service has also failed. I'm not sure whether this failure is the chicken or the egg. The logging service uses PostgreSQL-8.2, which runs as a Windows service with its startup type set to 'Automatic'. When the problem occurs, that service is not running, and if I try to start it manually in this state, it starts and then immediately stops. I also noticed a few hung postgres.exe processes still running even though the service was stopped. If I end those processes, I can start the PostgreSQL-8.2 service, and the server's network connectivity suddenly returns to normal.

I'm not sure yet how to keep this from happening, though, or whether the PostgreSQL service is causing this or just a domino in a cascade of failures.

This might not be exactly what to look for in most cases but could be a clue to others to look for a service or process that has hung up network connectivity.
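If you want to look for leftover processes like this on your own server, one option is to save the output of “tasklist /fo csv” and search it for an image name. The Python sketch below does that. The function name, PIDs, and memory figures are made up for illustration, and postgres.exe is only the culprit in my case; substitute whatever service you suspect.

```python
import csv
import io


def find_processes(tasklist_csv: str, image_name: str) -> list:
    """Return the PIDs of all processes whose image name matches."""
    reader = csv.DictReader(io.StringIO(tasklist_csv))
    wanted = image_name.lower()
    return [
        int(row["PID"])
        for row in reader
        if row["Image Name"].lower() == wanted
    ]


# Hypothetical sample in the column layout of `tasklist /fo csv`.
SAMPLE = '''\
"Image Name","PID","Session Name","Session#","Mem Usage"
"System Idle Process","0","Services","0","8 K"
"postgres.exe","4812","Services","0","5,120 K"
"postgres.exe","4996","Services","0","4,884 K"
"sqlservr.exe","2340","Services","0","412,300 K"
'''

if __name__ == "__main__":
    for pid in find_processes(SAMPLE, "postgres.exe"):
        print(f"postgres.exe still running with PID {pid}")
```

Any PIDs reported while the corresponding service shows as stopped are candidates for the kind of hung process described above.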

nscny

The network configuration of your VM is relevant here. Could you please share it?

Though I'm more familiar with Linux than Windows, if you are using a simple bridged network, I can imagine this happening in one of two ways. The first is resource exhaustion caused by one or more other nodes: two VMs and a host share an IP, and between them they use up all of the ephemeral ports. The second is that the ephemeral port the system wants to use is already taken by another VM or by the host itself. If Windows naively assumes that it has exclusive rights to all ports, it would treat a failure to bind on min(in_use_port + 1, max_port) as unambiguous proof that the ports are exhausted. The one thing that doesn't fit this hypothesis is that ping isn't responding. Ping uses ICMP, which has nothing to do with the availability of ephemeral ports.

BMDan