A while ago a strange problem started occurring in our Kubernetes cluster. We have a network containing Windows servers (web server, mail server, etc.) and a Kubernetes cluster running Rancher v2.6.0.
The cluster communicates with the Windows servers via HTTP requests, and via SMTP/IMAP to send and read emails. For a while now, random HTTP requests have been failing with the error message "no route to host". It seems to be limited to connections within the network; requests to third-party APIs are not affected. The error also does not occur every time: many requests go through without any issues, and some fail. I implemented a retry policy that tries the same request again a few seconds later. Sometimes it works on the first retry, sometimes on the second, and sometimes not at all.
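To make the retry behaviour concrete, here is a minimal sketch of the kind of retry policy described above. It assumes Python, which may not match the real application, and the names `is_no_route` and `call_with_retry`, the attempt count, and the delay are all hypothetical. It retries only when the underlying error is `EHOSTUNREACH` (the errno behind "no route to host") and re-raises everything else immediately.

```python
import errno
import time


def is_no_route(exc):
    """Return True if exc, or anything in its cause chain, is an
    OSError with errno EHOSTUNREACH ("no route to host")."""
    seen = set()
    while exc is not None and id(exc) not in seen:
        seen.add(id(exc))
        if isinstance(exc, OSError) and exc.errno == errno.EHOSTUNREACH:
            return True
        # HTTP libraries often wrap the socket error, so walk the chain.
        exc = exc.__cause__ or exc.__context__
    return False


def call_with_retry(func, attempts=3, delay_seconds=3.0, sleep=time.sleep):
    """Call func(); on "no route to host", wait and try again.

    attempts and delay_seconds are illustrative values (assumptions),
    not the ones used in the real application.
    """
    for attempt in range(1, attempts + 1):
        try:
            return func()
        except Exception as exc:
            # Give up on unrelated errors or once the attempts are used up.
            if not is_no_route(exc) or attempt == attempts:
                raise
            print(f"attempt {attempt}: no route to host, "
                  f"retrying in {delay_seconds}s")
            sleep(delay_seconds)
```

Logging the attempt number and a timestamp for every failure, as the `print` call hints at, would also show whether the failures cluster around particular nodes or times.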
I tried to google for a solution, but I couldn't find anything, especially since only a percentage of all requests are affected.
Our sysadmin, who maintains the network and the Windows servers, cannot identify any issues or even see the failed requests. So my guess is that the requests never leave the cluster, if that makes sense.
Unfortunately, the Kubernetes cluster used to be maintained by a colleague who is no longer available. I'd be very grateful for suggestions on where to start looking for a solution.