I had previously asked this on Stack Overflow and was advised to move the question over to Server Fault. So here I am :)

The application I am working with connects to a couple of TCP servers running on different computers.

On one occasion, we took a brand-new installation of Windows to the site, installed our application, and plugged the two relevant networks into the client computer. The application would not connect to one of the servers, while it connected to the other one just fine. This is a screenshot of the ipconfig output:

[screenshot: ipconfig output]

The server we were failing to connect to had IP address 192.168.3.2. The client on which the application was running had IP address 192.168.3.1.

We tried pinging the server, which was successful: [screenshot: ping output]

Tracert was also successful: [screenshot: tracert output]

And finally, this is the output of route print:

IPv4 Route Table 
=========================================================================== 
Active Routes: 
Network Destination        Netmask          Gateway       Interface  Metric 
          0.0.0.0          0.0.0.0     192.168.42.1   192.168.42.136    291 
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331 
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331 
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331 
     192.168.42.0    255.255.255.0         On-link    192.168.42.136    291 
   192.168.42.136  255.255.255.255         On-link    192.168.42.136    291 
   192.168.42.255  255.255.255.255         On-link    192.168.42.136    291 
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331 
        224.0.0.0        240.0.0.0         On-link    192.168.42.136    291 
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331 
  255.255.255.255  255.255.255.255         On-link    192.168.42.136    291 
=========================================================================== 
Persistent Routes: 
  Network Address          Netmask  Gateway Address  Metric 
      192.168.3.0    255.255.255.0      192.168.3.1       1 
          0.0.0.0          0.0.0.0     192.168.42.1  Default  
=========================================================================== 

We observed that the client application log recorded this whenever a connection was attempted: IpUtility.CheckIPRouting IP address 192.168.3.2 incorrectly routed through 192.168.42.136.

We opened the application source code and noticed that all it does is to query the interface used for the particular remote IP address (which I don't particularly understand):

This is a snippet of CheckIPRouting:

        public static void CheckIPRouting(IPAddress iPAddressTarget, IPAddress ipAddressGateway)
        {
            IPEndPoint remoteEndPoint = new IPEndPoint(iPAddressTarget, 0);
            Socket socket = new Socket(AddressFamily.InterNetwork, SocketType.Stream, ProtocolType.Tcp);
            IPAddress localEndPoint = QueryRoutingInterface(socket, remoteEndPoint);
            if (localEndPoint.Equals(ipAddressGateway))
            {
                log.Trace($"IP address {iPAddressTarget} correctly routed through {localEndPoint.MapToIPv4()}");
            }
            else
            {
                log.Debug($"IP address {iPAddressTarget} incorrectly routed through {localEndPoint.MapToIPv4()}");
            }
        }

And this is QueryRoutingInterface:

        private static IPAddress QueryRoutingInterface(Socket socket, IPEndPoint remoteEndPoint)
        {
            SocketAddress address = remoteEndPoint.Serialize();

            byte[] remoteAddrBytes = new byte[address.Size];
            for (int i = 0; i < address.Size; i++)
            {
                remoteAddrBytes[i] = address[i];
            }

            byte[] outBytes = new byte[remoteAddrBytes.Length];
            socket.IOControl(IOControlCode.RoutingInterfaceQuery, remoteAddrBytes, outBytes);
            for (int i = 0; i < address.Size; i++)
            {
                address[i] = outBytes[i];
            }

            EndPoint ep = remoteEndPoint.Create(address);
            return ((IPEndPoint) ep).Address;
        }
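For readers unsure what this query does: `IOControlCode.RoutingInterfaceQuery` (Winsock's `SIO_ROUTING_INTERFACE_QUERY`) asks the network stack which local interface address it would use to reach the given remote address. A commonly used, roughly similar probe is to connect a UDP socket and read back its local endpoint. The sketch below is only an illustration of that idea; the class and method names are hypothetical, port 9 is an arbitrary placeholder, and the result is not guaranteed to match the IOCTL in every case.

```csharp
using System.Net;
using System.Net.Sockets;

// Hypothetical helper, not part of the application discussed above.
public static class LocalAddressProbe
{
    // Returns the local IPv4 address the stack selects for traffic to `target`.
    public static IPAddress GetLocalAddressFor(IPAddress target)
    {
        using (var socket = new Socket(AddressFamily.InterNetwork, SocketType.Dgram, ProtocolType.Udp))
        {
            // Connect on a UDP socket sends no packets; it only makes the
            // stack choose a route and a source address for this destination.
            // It throws SocketException if the destination is unreachable.
            socket.Connect(new IPEndPoint(target, 9));
            return ((IPEndPoint)socket.LocalEndPoint).Address;
        }
    }
}
```

Calling `LocalAddressProbe.GetLocalAddressFor(IPAddress.Parse("192.168.3.2"))` on the client would be expected to return `192.168.3.1` when the dedicated network is being used, and some other interface address otherwise.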

Later, we noticed that the client application code also has a setting to force the connection through an interface IP address with this code:

            logger.Info("Creating tcpClient for {0} using address {1}", connectionResourceName, adapterAddress);
            IPEndPoint adapter = new IPEndPoint(adapterAddress, 0);
            tcpClient = new TcpClient(adapter);

This was clearly visible in the application log:

Creating tcpClient for RESOURCE using address 192.168.3.1

At this point I was baffled. I then changed the application's configuration so that, instead of binding to the 192.168.3.1 interface to establish the connection, it used the interface address 0.0.0.0.

Then the connection was established!

The route print at that point then became:

IPv4 Route Table 
=========================================================================== 
Active Routes: 
Network Destination        Netmask          Gateway       Interface  Metric 
          0.0.0.0          0.0.0.0     192.168.42.1   192.168.42.136    291 
        127.0.0.0        255.0.0.0         On-link         127.0.0.1    331 
        127.0.0.1  255.255.255.255         On-link         127.0.0.1    331 
  127.255.255.255  255.255.255.255         On-link         127.0.0.1    331 
      192.168.3.0    255.255.255.0         On-link       192.168.3.1      2 
      192.168.3.1  255.255.255.255         On-link       192.168.3.1    257 
    192.168.3.255  255.255.255.255         On-link       192.168.3.1    257 
     192.168.42.0    255.255.255.0         On-link    192.168.42.136    291 
   192.168.42.136  255.255.255.255         On-link    192.168.42.136    291 
   192.168.42.255  255.255.255.255         On-link    192.168.42.136    291 
        224.0.0.0        240.0.0.0         On-link         127.0.0.1    331 
        224.0.0.0        240.0.0.0         On-link    192.168.42.136    291 
        224.0.0.0        240.0.0.0         On-link       192.168.3.1    257 
  255.255.255.255  255.255.255.255         On-link         127.0.0.1    331 
  255.255.255.255  255.255.255.255         On-link    192.168.42.136    291 
  255.255.255.255  255.255.255.255         On-link       192.168.3.1    257 
=========================================================================== 
Persistent Routes: 
  Network Address          Netmask  Gateway Address  Metric 
      192.168.3.0    255.255.255.0      192.168.3.1       1 
          0.0.0.0          0.0.0.0     192.168.42.1  Default  
=========================================================================== 

Then I reverted the client application's configuration to force the connection through the 192.168.3.1 adapter as before, and the connection was still established normally.

I then compared the 2 route prints (before and after). I noticed the following new entries on the second:

      192.168.3.0    255.255.255.0         On-link       192.168.3.1      2 
      192.168.3.1  255.255.255.255         On-link       192.168.3.1    257 
    192.168.3.255  255.255.255.255         On-link       192.168.3.1    257 
        224.0.0.0        240.0.0.0         On-link       192.168.3.1    257
  255.255.255.255  255.255.255.255         On-link       192.168.3.1    257

I'm guessing these are making all the difference. But I thought the persistent route would do that just fine.

Finally, I get to the question:

Why did Windows ignore the persistent route and decide to route TCP packets through the wrong interface? How can I make this foolproof, so that Windows won't make the wrong decision?

For reference, this is with Windows 10 IoT Enterprise.

1 Answer

The persistent route you added is unnecessary and probably wrong.

Persistent Routes: 
Network Address          Netmask  Gateway Address  Metric 
      192.168.3.0    255.255.255.0      192.168.3.1       1 

You've told Windows: "Send all communication addressed to the 192.168.3.0/24 subnet via the 192.168.3.1 gateway."

192.168.3.0/24 is a directly connected network. You do not need any explicit routes to communicate with it.

In the first route print output there was no information about how to reach the 192.168.3.0/24 network. This could be caused by the interface being down or a disconnected cable. Can you reproduce the problem on a different machine?
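To make the point about the first table concrete, here is a minimal sketch of route selection as a longest-prefix match with the metric as tie-breaker. It is a simplified model, not how Windows is actually implemented (the real stack also considers interface state and source address binding), and `RouteEntry` and `RouteLookup` are hypothetical names. The rows are the active routes from the first `route print` in the question.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Net;

// One row of an "Active Routes" table (illustrative type, not a Windows API).
public sealed class RouteEntry
{
    public string Destination { get; }
    public string Netmask { get; }
    public string Interface { get; }
    public int Metric { get; }

    public RouteEntry(string destination, string netmask, string iface, int metric)
    {
        Destination = destination; Netmask = netmask; Interface = iface; Metric = metric;
    }
}

public static class RouteLookup
{
    // Converts a dotted-quad IPv4 string to a 32-bit value (most significant octet first).
    private static uint ToUInt32(string dottedQuad)
    {
        byte[] b = IPAddress.Parse(dottedQuad).GetAddressBytes();
        return ((uint)b[0] << 24) | ((uint)b[1] << 16) | ((uint)b[2] << 8) | (uint)b[3];
    }

    // Returns the matching route with the longest prefix, breaking ties by
    // lowest metric, or null when no route matches the target.
    public static RouteEntry Select(IEnumerable<RouteEntry> table, string target)
    {
        uint t = ToUInt32(target);
        return table
            .Where(r => (t & ToUInt32(r.Netmask)) == ToUInt32(r.Destination))
            .OrderByDescending(r => ToUInt32(r.Netmask))
            .ThenBy(r => r.Metric)
            .FirstOrDefault();
    }

    // Active routes copied from the first `route print` in the question.
    public static readonly RouteEntry[] FirstTable =
    {
        new RouteEntry("0.0.0.0", "0.0.0.0", "192.168.42.136", 291),
        new RouteEntry("127.0.0.0", "255.0.0.0", "127.0.0.1", 331),
        new RouteEntry("127.0.0.1", "255.255.255.255", "127.0.0.1", 331),
        new RouteEntry("127.255.255.255", "255.255.255.255", "127.0.0.1", 331),
        new RouteEntry("192.168.42.0", "255.255.255.0", "192.168.42.136", 291),
        new RouteEntry("192.168.42.136", "255.255.255.255", "192.168.42.136", 291),
        new RouteEntry("192.168.42.255", "255.255.255.255", "192.168.42.136", 291),
        new RouteEntry("224.0.0.0", "240.0.0.0", "127.0.0.1", 331),
        new RouteEntry("224.0.0.0", "240.0.0.0", "192.168.42.136", 291),
        new RouteEntry("255.255.255.255", "255.255.255.255", "127.0.0.1", 331),
        new RouteEntry("255.255.255.255", "255.255.255.255", "192.168.42.136", 291),
    };
}
```

Under this model, `RouteLookup.Select(RouteLookup.FirstTable, "192.168.3.2")` matches only the default route, whose interface is 192.168.42.136. That is consistent with the "incorrectly routed through 192.168.42.136" log line. Once the on-link `192.168.3.0 / 255.255.255.0` row from the second table is present, the same lookup selects the 192.168.3.1 interface.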

  • If I don't need routes to communicate because it is directly connected, then why do you say that in the first `route print` there was no information about how to communicate? Sorry, that is a bit confusing. Assuming it does need information in the route table, didn't the persistent route have that information? I mean, it was pinging properly and `tracert` showed a direct connection (1 hop). Yet, on the TCP layer, the packets were going through the wrong adapter... – johnildergleidisson Aug 29 '19 at 14:27
  • I have a vague feeling that routes for local interfaces aren't added to the table until they're needed, perhaps that might explain why the route suddenly appeared when the software succeeded in making the connection? And perhaps in the original configuration Windows got confused because you were explicitly trying to connect *from the gateway address* listed in the persistent route? – Harry Johnston Aug 30 '19 at 00:02
  • @johnildergleidisson as I said, persistent routes are not necessary in your case. You have another rule, 0.0.0.0 0.0.0.0 192.168.42.1, which is unnecessary because it is identical to the default gateway for the 192.168.42.136 interface. It looks like the 192.168.3.1 interface was not ready to communicate. It's a bit confusing, as you said. Did you get ping responses from 192.168.3.2 the whole time you were testing the TCP connection? I'm trying to say that your problem may have a very simple cause, like a disconnected cable. – Artur Wosztyl Aug 30 '19 at 07:35
  • The persistent route to `0.0.0.0` wasn't added by us, not sure why Windows decided to do that. The persistent route to `192.168.3.0` was added by us in the past because for some reason we don't seem to fully understand yet, sometimes Windows decides to use the wrong route. The server we connect on `192.168.3.2` has a fixed IP and a dedicated network. After the route was added by us, it solved some issues but as you point out, it might also create problems. That's why I'm here asking for expert opinion :). Regarding the ping to `192.168.3.1`, yes, it was pinging at all times. – johnildergleidisson Aug 30 '19 at 13:18
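One way to test the "interface not ready" theory from the comments would be to have the client check, before binding, whether the configured adapter address is currently assigned to an interface that reports itself as up. The sketch below uses `System.Net.NetworkInformation`; the class and method names are hypothetical, and it only reports the state at the moment of the call, so it cannot by itself prove what happened on site.

```csharp
using System.Linq;
using System.Net;
using System.Net.NetworkInformation;

// Hypothetical pre-flight check, not part of the application discussed above.
public static class AdapterCheck
{
    // True when `address` is a unicast address of an interface whose
    // operational status is currently Up.
    public static bool IsAssignedToUpInterface(IPAddress address)
    {
        return NetworkInterface.GetAllNetworkInterfaces()
            .Where(nic => nic.OperationalStatus == OperationalStatus.Up)
            .SelectMany(nic => nic.GetIPProperties().UnicastAddresses)
            .Any(unicast => unicast.Address.Equals(address));
    }
}
```

Logging the result of `AdapterCheck.IsAssignedToUpInterface(adapterAddress)` just before `new TcpClient(adapter)` would show whether 192.168.3.1 was actually usable when the connection attempt failed.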