13

I have an issue with a long-lived process called kube-proxy, which is part of Kubernetes.

The problem is that, from time to time, a connection is left in the FIN_WAIT2 state.

$ sudo netstat -tpn | grep FIN_WAIT2
tcp6       0      0 10.244.0.1:33132        10.244.0.35:48936       FIN_WAIT2   14125/kube-proxy
tcp6       0      0 10.244.0.1:48340        10.244.0.35:56339       FIN_WAIT2   14125/kube-proxy
tcp6       0      0 10.244.0.1:52619        10.244.0.35:57859       FIN_WAIT2   14125/kube-proxy
tcp6       0      0 10.244.0.1:33132        10.244.0.50:36466       FIN_WAIT2   14125/kube-proxy

These connections stack up over time, making the process misbehave. I have already reported an issue on the Kubernetes bug tracker, but I'd like to understand why such connections are not closed by the Linux kernel.

According to the kernel documentation (search for tcp_fin_timeout), a connection in the FIN_WAIT2 state should be closed by the kernel after X seconds, where X can be read from /proc. On my machine it is set to 60:

$ cat /proc/sys/net/ipv4/tcp_fin_timeout
60

So if I understand it correctly, such connections should be closed within 60 seconds. But this is not the case: they are left in that state for hours.

While I also understand that FIN_WAIT2 connections are pretty unusual (it means the host is waiting for some ACK from the remote end of the connection, which might already be gone), I don't get why these connections are not "closed" by the system.

Is there anything I could do about it?

Note that restarting the related process is a last resort.

Adam Romanek
  • By the way, in FIN-WAIT2, the connection is *not* waiting for an ACK (the FIN it has sent has already been acknowledged, which is why we are not in FIN-WAIT1). Instead, the other end still has the option to send an unlimited amount of data. – Hagen von Eitzen Nov 23 '15 at 16:27

2 Answers

15

The kernel timeout only applies if the connection is orphaned. If the connection is still attached to a socket, the program that owns that socket is responsible for timing out the shutdown of the connection. Likely it has called shutdown and is waiting for the connection to shut down cleanly. The application can wait as long as it likes for the shutdown to complete.

The typical clean shutdown flow goes like this:

  1. The application decides to shut down the connection and shuts down the write side of the connection.

  2. The application waits for the other side to shut down its half of the connection.

  3. The application detects the other side's shutdown of the connection and closes its socket.

The application can wait at step 2 for as long as it likes.
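
To make the three steps concrete, here is a minimal Go sketch of that flow (Go because kube-proxy is written in Go, but this is not its actual code; the function name and the dialed address, copied from the netstat output above, are only for illustration). The read loop in step 2 is exactly where the local end of the connection sits in FIN_WAIT2:

package main

import (
    "io"
    "log"
    "net"
)

func closeCleanly(conn *net.TCPConn) error {
    // Step 1: shut down the write side of the connection (sends our FIN).
    if err := conn.CloseWrite(); err != nil {
        return err
    }

    // Step 2: wait for the other side to shut down its half, i.e. read
    // until EOF. This can block for as long as the peer keeps its half open.
    buf := make([]byte, 4096)
    for {
        _, err := conn.Read(buf)
        if err == io.EOF {
            break // the peer sent its FIN
        }
        if err != nil {
            return err
        }
    }

    // Step 3: close the socket, releasing the file descriptor.
    return conn.Close()
}

func main() {
    c, err := net.Dial("tcp", "10.244.0.35:48936")
    if err != nil {
        log.Fatal(err)
    }
    if err := closeCleanly(c.(*net.TCPConn)); err != nil {
        log.Print(err)
    }
}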

It sounds like the application needs a timeout. Once it decides to shut the connection down, it should give up waiting for the other side to do a clean shutdown after some reasonable amount of time.
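
A rough sketch of that fix, again in Go and again hypothetical (the function name and the 30-second limit are arbitrary choices, not anything kube-proxy actually does): setting a read deadline bounds how long step 2 can block, after which the socket is closed and the kernel's own FIN_WAIT2 handling takes over.

package example

import (
    "net"
    "time"
)

func closeWithTimeout(conn *net.TCPConn, limit time.Duration) error {
    if err := conn.CloseWrite(); err != nil { // step 1: send our FIN
        return err
    }
    // Bound step 2: if the peer has not shut down its half within `limit`,
    // Read returns a timeout error instead of blocking forever.
    if err := conn.SetReadDeadline(time.Now().Add(limit)); err != nil {
        return err
    }
    buf := make([]byte, 4096)
    for {
        if _, err := conn.Read(buf); err != nil {
            break // io.EOF on a clean shutdown, a timeout error if we gave up
        }
    }
    return conn.Close() // step 3: release the socket either way
}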

David Schwartz
  • I will check this information with the Kubernetes developers to see if such a timeout is implemented. I'll accept the answer once I verify it. Nevertheless, thanks for the quick response. – Adam Romanek Nov 23 '15 at 12:01
  • I'd like to understand your answer in greater detail. Could you please explain what is an orphaned connection? – Adam Romanek Nov 26 '15 at 11:42
  • @AdamRomanek An orphaned connection is one with no associated sockets, that is, one that can only be accessed by the kernel itself and that no process can perform an operation on. – David Schwartz Nov 27 '15 at 20:19
  • This would help: https://blog.cloudflare.com/this-is-strictly-a-violation-of-the-tcp-specification/ – John Greene Sep 16 '16 at 00:28
3

If the socket has been shutdown() but not yet close()d, it will stay in the FIN_WAIT2 state. And since the application still owns the file descriptor, the kernel won't bother to clean it up.
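
For illustration, a tiny Go sketch of that situation (hypothetical code, with the address taken from the question's netstat output): CloseWrite is Go's equivalent of shutdown(fd, SHUT_WR), and because Close is never called the descriptor stays owned by the process and the socket lingers in FIN_WAIT2.

package main

import (
    "log"
    "net"
    "time"
)

func main() {
    c, err := net.Dial("tcp", "10.244.0.35:48936")
    if err != nil {
        log.Fatal(err)
    }
    tcp := c.(*net.TCPConn)
    // Half-close: roughly the equivalent of shutdown(fd, SHUT_WR).
    // Our side sends its FIN and, once it is acknowledged, enters FIN_WAIT2.
    if err := tcp.CloseWrite(); err != nil {
        log.Fatal(err)
    }
    // Close() is never called, so the kernel will not reap the connection;
    // netstat -tpn keeps showing it in FIN_WAIT2, attributed to this process.
    time.Sleep(10 * time.Minute)
}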

L. Yan