I had previously asked this on Stack Overflow and was advised to move the question over to Server Fault. So here I am :)
The application I am working with connects to a couple of TCP servers running on different computers.
On one occasion, we took a brand-new Windows installation to the site, installed our application, and plugged the two relevant networks into the client computer. The application would not connect to one of the servers, while the other one connected just fine. This is a screenshot of the ipconfig output:
The server we were failing to connect to had IP address 192.168.3.2. The client on which the application was running had IP address 192.168.3.1.
We tried pinging the server, which was successful:
And finally, this is the output of route print:
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination  Netmask          Gateway       Interface       Metric
0.0.0.0              0.0.0.0          192.168.42.1  192.168.42.136  291
127.0.0.0            255.0.0.0        On-link       127.0.0.1       331
127.0.0.1            255.255.255.255  On-link       127.0.0.1       331
127.255.255.255      255.255.255.255  On-link       127.0.0.1       331
192.168.42.0         255.255.255.0    On-link       192.168.42.136  291
192.168.42.136       255.255.255.255  On-link       192.168.42.136  291
192.168.42.255       255.255.255.255  On-link       192.168.42.136  291
224.0.0.0            240.0.0.0        On-link       127.0.0.1       331
224.0.0.0            240.0.0.0        On-link       192.168.42.136  291
255.255.255.255      255.255.255.255  On-link       127.0.0.1       331
255.255.255.255      255.255.255.255  On-link       192.168.42.136  291
===========================================================================
Persistent Routes:
Network Address      Netmask          Gateway Address  Metric
192.168.3.0          255.255.255.0    192.168.3.1      1
0.0.0.0              0.0.0.0          192.168.42.1     Default
===========================================================================
We observed that the client application log recorded the following whenever a connection was attempted:
IpUtility.CheckIPRouting IP address 192.168.3.2 incorrectly routed through 192.168.42.136
We opened the application source code and saw that all it does is query which local interface would be used to reach the given remote IP address (a mechanism I don't fully understand):
This is a snippet of CheckIPRouting:
public static void CheckIPRouting(IPAddress iPAddressTarget, IPAddress ipAddressGateway)
{
    IPEndPoint remoteEndPoint = new IPEndPoint(iPAddressTarget, 0);
    Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
    IPAddress localEndPoint = QueryRoutingInterface(socket, remoteEndPoint);
    if (localEndPoint.Equals(ipAddressGateway))
    {
        log.Trace($"IP address {iPAddressTarget} correctly routed through {localEndPoint.MapToIPv4()}");
    }
    else
    {
        log.Debug($"IP address {iPAddressTarget} incorrectly routed through {localEndPoint.MapToIPv4()}");
    }
}
And this is QueryRoutingInterface:
private static IPAddress QueryRoutingInterface(Socket socket, IPEndPoint remoteEndPoint)
{
    SocketAddress address = remoteEndPoint.Serialize();
    byte[] remoteAddrBytes = new byte[address.Size];
    for (int i = 0; i < address.Size; i++)
    {
        remoteAddrBytes[i] = address[i];
    }
    byte[] outBytes = new byte[remoteAddrBytes.Length];
    socket.IOControl(IOControlCode.RoutingInterfaceQuery, remoteAddrBytes, outBytes);
    for (int i = 0; i < address.Size; i++)
    {
        address[i] = outBytes[i];
    }
    EndPoint ep = remoteEndPoint.Create(address);
    return ((IPEndPoint) ep).Address;
}
Later, we noticed that the client application code also has a setting to force the connection through an interface IP address with this code:
logger.Info("Creating tcpClient for {0} using address {1}", connectionResourceName, adapterAddress);
IPEndPoint adapter = new IPEndPoint(adapterAddress, 0);
tcpClient = new TcpClient(adapter);
This was clearly visible in the application log:
Creating tcpClient for RESOURCE using address 192.168.3.1
At this point I was baffled. I then changed the application's configuration: instead of using the 192.168.3.1 interface to establish the connection, I set the interface address to 0.0.0.0.
Then the connection was established!
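As I understand it, the difference between the two settings is whether the local end of the socket is pinned to a specific address or left for the operating system to choose. The following is a minimal sketch of that difference, not the application's actual code: the class and method names are hypothetical, and it uses a loopback listener as a stand-in for the real server so that it can run anywhere.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Hypothetical helper for illustration only; not part of the application.
public static class BindSketch
{
    // Bind the local end to adapterAddress (port 0 = any free port), connect
    // to the remote endpoint, and report which local address was actually used.
    public static IPAddress ConnectFrom(IPAddress adapterAddress, IPEndPoint remote)
    {
        using (var client = new TcpClient(new IPEndPoint(adapterAddress, 0)))
        {
            client.Connect(remote);
            return ((IPEndPoint)client.Client.LocalEndPoint).Address;
        }
    }

    public static void Main()
    {
        // Stand-in server on loopback (an assumption made so the sketch is self-contained).
        var listener = new TcpListener(IPAddress.Loopback, 0);
        listener.Start();
        var remote = (IPEndPoint)listener.LocalEndpoint;

        // Pinned: the local address is the one we asked for.
        Console.WriteLine(ConnectFrom(IPAddress.Loopback, remote));

        // 0.0.0.0 (IPAddress.Any): the OS picks the local address when connecting.
        Console.WriteLine(ConnectFrom(IPAddress.Any, remote));

        listener.Stop();
    }
}
```

With a specific adapter address the socket is bound before the connection is attempted, whereas with 0.0.0.0 the choice of source address is deferred to the operating system at connect time.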
The route print at that point then became:
IPv4 Route Table
===========================================================================
Active Routes:
Network Destination  Netmask          Gateway       Interface       Metric
0.0.0.0              0.0.0.0          192.168.42.1  192.168.42.136  291
127.0.0.0            255.0.0.0        On-link       127.0.0.1       331
127.0.0.1            255.255.255.255  On-link       127.0.0.1       331
127.255.255.255      255.255.255.255  On-link       127.0.0.1       331
192.168.3.0          255.255.255.0    On-link       192.168.3.1     2
192.168.3.1          255.255.255.255  On-link       192.168.3.1     257
192.168.3.255        255.255.255.255  On-link       192.168.3.1     257
192.168.42.0         255.255.255.0    On-link       192.168.42.136  291
192.168.42.136       255.255.255.255  On-link       192.168.42.136  291
192.168.42.255       255.255.255.255  On-link       192.168.42.136  291
224.0.0.0            240.0.0.0        On-link       127.0.0.1       331
224.0.0.0            240.0.0.0        On-link       192.168.42.136  291
224.0.0.0            240.0.0.0        On-link       192.168.3.1     257
255.255.255.255      255.255.255.255  On-link       127.0.0.1       331
255.255.255.255      255.255.255.255  On-link       192.168.42.136  291
255.255.255.255      255.255.255.255  On-link       192.168.3.1     257
===========================================================================
Persistent Routes:
Network Address      Netmask          Gateway Address  Metric
192.168.3.0          255.255.255.0    192.168.3.1      1
0.0.0.0              0.0.0.0          192.168.42.1     Default
===========================================================================
I then reverted the client application's configuration to force the connection through the 192.168.3.1 adapter as before, and this time the connection was established normally.
I compared the two route prints (before and after) and noticed the following new entries in the second:
192.168.3.0          255.255.255.0    On-link       192.168.3.1     2
192.168.3.1          255.255.255.255  On-link       192.168.3.1     257
192.168.3.255        255.255.255.255  On-link       192.168.3.1     257
224.0.0.0            240.0.0.0        On-link       192.168.3.1     257
255.255.255.255      255.255.255.255  On-link       192.168.3.1     257
I'm guessing these are making all the difference. But I thought the persistent route would do that just fine.
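To make the comparison concrete, here is a minimal sketch of how I would expect a route lookup to behave on the two active tables. It is my own simplified model (longest matching prefix first, then lowest metric), not the application's code and not Windows' actual implementation; the Route and RouteLookupSketch names are hypothetical, and only the relevant unicast entries from the route prints above are included.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Net;

// Hypothetical types for illustration only; not part of the application.
public sealed class Route
{
    public readonly string Destination, Netmask, Interface;
    public readonly int Metric;
    public Route(string destination, string netmask, string iface, int metric)
    {
        Destination = destination; Netmask = netmask; Interface = iface; Metric = metric;
    }
}

public static class RouteLookupSketch
{
    // Convert a dotted IPv4 string to a 32-bit number, most significant octet first.
    private static uint ToUInt32(string ip)
    {
        byte[] b = IPAddress.Parse(ip).GetAddressBytes();
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | (uint)b[3];
    }

    // Among the routes whose network contains the target, prefer the longest
    // prefix and then the lowest metric. Returns null when nothing matches.
    public static string SelectInterface(IEnumerable<Route> activeRoutes, string target)
    {
        uint t = ToUInt32(target);
        return activeRoutes
            .Where(r => (t & ToUInt32(r.Netmask)) == ToUInt32(r.Destination))
            .OrderByDescending(r => ToUInt32(r.Netmask)) // contiguous masks: larger value means longer prefix
            .ThenBy(r => r.Metric)
            .Select(r => r.Interface)
            .FirstOrDefault();
    }

    public static void Main()
    {
        // Relevant active routes from the first route print.
        var before = new List<Route>
        {
            new Route("0.0.0.0", "0.0.0.0", "192.168.42.136", 291),
            new Route("192.168.42.0", "255.255.255.0", "192.168.42.136", 291),
        };
        // The second route print additionally has an on-link route for 192.168.3.0/24.
        var after = new List<Route>(before)
        {
            new Route("192.168.3.0", "255.255.255.0", "192.168.3.1", 2),
        };

        Console.WriteLine(SelectInterface(before, "192.168.3.2")); // prints 192.168.42.136
        Console.WriteLine(SelectInterface(after, "192.168.3.2"));  // prints 192.168.3.1
    }
}
```

Under this model, the first table has no active entry covering 192.168.3.0/24, so only the default route matches 192.168.3.2, which is consistent with the "incorrectly routed through 192.168.42.136" log line.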
Finally, I get to the question:
Why did Windows ignore the persistent route and decide to route TCP packets through the wrong interface? How can I make this foolproof, so that Windows won't make the wrong decision?
For reference, this is with Windows 10 IoT Enterprise.
0.0.0.0 0.0.0.0 192.168.42.1, which is unnecessary, because it's identical to the default gateway for the 192.168.42.136 interface. It looks like interface 192.168.3.1 was not ready to communicate. It's a bit confusing: as you said, did you get a ping response from 192.168.3.2 the whole time you were testing the TCP connection? I'm trying to say that your problem may have a very simple cause, like a disconnected cable. – Artur Wosztyl Aug 30 '19 at 07:35