By default, when both tcp_tw_reuse
and tcp_tw_recycle
are disabled, the kernel will make sure that sockets in TIME_WAIT
state will remain in that state long enough -- long enough to be sure that packets belonging to future connections will not be mistaken for late packets of the old connection.
When you enable tcp_tw_reuse
, sockets in TIME_WAIT
state can be used before they expire, and the kernel will try to make sure that there is no collision regarding TCP sequence numbers. If you enable tcp_timestamps
(a.k.a. PAWS, for Protection Against Wrapped Sequence Numbers), it will make sure that those collisions cannot happen. However, you need TCP timestamps to be enabled on both ends (at least, that's my understanding). See the definition of tcp_twsk_unique for the gory details.
When you enable tcp_tw_recycle
, the kernel becomes much more aggressive, and will make assumptions on the timestamps used by remote hosts. It will track the last timestamp used by each remote host having a connection in TIME_WAIT
state), and allow to re-use a socket if the timestamp has correctly increased. However, if the timestamp used by the host changes (i.e. warps back in time), the SYN
packet will be silently dropped, and the connection won't establish (you will see an error similar to "connect timeout"). If you want to dive into kernel code, the definition of tcp_timewait_state_process might be a good starting point.
Now, timestamps should never go back in time; unless:
- the host is rebooted (but then, by the time it comes back up,
TIME_WAIT
socket will probably have expired, so it will be a non issue);
- the IP address is quickly reused by something else (
TIME_WAIT
connections will stay a bit, but other connections will probably be struck by TCP RST
and that will free up some space);
- network address translation (or a smarty-pants firewall) is involved in the middle of the connection.
In the latter case, you can have multiple hosts behind the same IP address, and therefore, different sequences of timestamps (or, said timestamps are randomized at each connection by the firewall). In that case, some hosts will be randomly unable to connect, because they are mapped to a port for which the TIME_WAIT
bucket of the server has a newer timestamp. That's why the docs tell you that "NAT devices or load balancers may start drop frames because of the setting".
Some people recommend to leave tcp_tw_recycle
alone, but enable tcp_tw_reuse
and lower tcp_timewait_len
. I concur :-)