3

SOME THEORY

I've been doing some reading on TCP TIME-WAIT (here and there) and what I read is that it's a value set to 2 x MSL (maximum segment lifetime) which keeps a connection in the "connection table" for a while to guarantee that, "before you're allowed to create a connection with the same tuple, all the packets belonging to previous incarnations of that tuple will be dead".

Since segments received (apart from SYN under specific circumstances) while a connection is either in TIME-WAIT or no longer existing would be discarded, why not close the connection right away?

Q1: Is it because there is less processing involved in dealing with segments from old connections and less processing to create a new connection on the same tuple when in TIME-WAIT (i.e. are there performance benefits)?

If the above explanation doesn't stand, the only reason I see for TIME-WAIT being useful would be if a client sends a SYN for a connection before it sends the remaining segments of an old connection on the same tuple, in which case the receiver would re-open the connection but then get bad segments and would have to terminate it.

Q2: Is this analysis correct?
Q3: Are there other benefits to using TIME-WAIT?

SOME PRACTICE

I've been looking at the munin graphs on a production server that I administer. Here is one: [image: munin graph of TCP connection states]

As you can see there are more connections in TIME-WAIT than ESTABLISHED, around twice as many most of the time, on some occasions four times as many.

Q4: Does this have an impact on performance?
Q5: If so, is it wise/recommended to reduce the TIME-WAIT value (and what to)?
Q6: Is this ratio of TIME-WAIT / ESTABLISHED connections normal? Could this be related to malicious connection attempts?

Max
  • Related: There is an active RFC draft that advocates abolishing the `TIME-WAIT` state completely. Title: *"Sharp Close": Elimination of TIME-WAIT state of TCP connections*. RFC draft: [draft-kitamura-tcp-sharp-close](https://datatracker.ietf.org/doc/draft-kitamura-tcp-sharp-close/) – StackzOfZtuff Jul 02 '15 at 10:57

2 Answers

4

In short, don't worry about TIME_WAIT. The overhead is almost none, and usually poses no problems.

On a busy server, port exhaustion is possible, and in that case there is the sysctl option net.ipv4.tcp_tw_reuse = 1, which allows the kernel to reuse sockets that are still in TIME_WAIT for new outgoing connections as needed.
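To check what a Linux box is currently set to before changing anything, the sysctl can be read straight from /proc/sys. This is a minimal sketch; the helper name `read_sysctl` and its `root` parameter are made up for illustration, and it assumes the usual Linux mapping of dotted sysctl names to paths.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the current value of a sysctl such as 'net.ipv4.tcp_tw_reuse'.

    Assumes the Linux convention that dots in the name map to
    directory separators under /proc/sys.
    """
    path = Path(root) / name.replace(".", "/")
    return path.read_text().strip()


if __name__ == "__main__":
    # Only meaningful on Linux; elsewhere the file will not exist.
    try:
        print("net.ipv4.tcp_tw_reuse =", read_sysctl("net.ipv4.tcp_tw_reuse"))
    except OSError as exc:
        print("could not read sysctl:", exc)
```

The value can then be changed with `sysctl -w net.ipv4.tcp_tw_reuse=1` as root if it really turns out to be needed.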

TIME_WAIT is part of the TCP specification, and is there to catch packets that may still be in transit (remember, not all connections are reliable, and that is what TCP aimed to solve). The timeout value may be very high for most modern uses, but it doesn't normally interfere with anything other than the output of netstat.

If you are in control of the socket yourself, and are certain you aren't waiting for data (e.g. you're the final sender, or you don't care about a response), you can set the SO_LINGER option with a zero timeout before closing the socket, which terminates the connection with an RST and discards the socket immediately.
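The RST-on-close behaviour described above is normally obtained through `SO_LINGER` with `l_onoff = 1` and `l_linger = 0`. Here is a rough sketch in Python; the helper names are mine, and the two-int layout of `struct linger` is an assumption that holds on Linux but may differ on other platforms.

```python
import socket
import struct

# struct linger { int l_onoff; int l_linger; } -- layout assumed for Linux.
_LINGER_FMT = "ii"


def set_abortive_close(sock: socket.socket) -> None:
    """Turn lingering on with a zero timeout.

    On a connected TCP socket, close() then sends an RST instead of a FIN,
    so the socket does not pass through TIME_WAIT.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                    struct.pack(_LINGER_FMT, 1, 0))


def get_linger(sock: socket.socket) -> tuple:
    """Return (l_onoff, l_linger) as currently set on the socket."""
    raw = sock.getsockopt(socket.SOL_SOCKET, socket.SO_LINGER,
                          struct.calcsize(_LINGER_FMT))
    return struct.unpack(_LINGER_FMT, raw)


if __name__ == "__main__":
    s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    set_abortive_close(s)
    print(get_linger(s))
    s.close()
```

Keep in mind that an abortive close throws away any unsent data and the peer sees a connection reset, so it only makes sense when you really don't care about either.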

So your questions:

Q1,Q2,Q3: It's there to collect late packets, "just in case", because links can be unreliable. It's part of the spec, it prevents packet loss, and adjusting it has no real benefit.

Q4: No

Q5: Don't worry about it, and you have the option of forcing the reuse of these sockets if need be.

Q6: TIME_WAIT and ESTABLISHED aren't correlated, other than that the more short-lived connections you have, the greater that ratio will be. It could be caused by something malicious, but it's not an indicator any more than "excessive network activity" would be.
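A back-of-the-envelope way to see why short-lived connections push the ratio up: in steady state, the number of connections in a state is roughly the arrival rate times the time spent in that state. The sketch below assumes the server is the side that closes first (only that side holds TIME_WAIT) and uses 60 seconds for TIME_WAIT, which is what Linux uses; the traffic figures are invented for illustration.

```python
def steady_state_counts(conn_per_sec, mean_lifetime_s, time_wait_s=60.0):
    """Estimate steady-state socket counts from rate and durations.

    Returns (established, time_wait, ratio), where ratio is
    TIME_WAIT / ESTABLISHED. The rate cancels out of the ratio, so it
    depends only on how long connections live compared with TIME_WAIT.
    """
    established = conn_per_sec * mean_lifetime_s
    time_wait = conn_per_sec * time_wait_s
    return established, time_wait, time_wait / established


if __name__ == "__main__":
    # Hypothetical load: 50 new connections per second, each open for 15 s.
    print(steady_state_counts(50, 15.0))
    # Same rate, but connections that last 30 s.
    print(steady_state_counts(50, 30.0))
```

With those made-up numbers, connections lasting 15 seconds give four times as many TIME_WAIT as ESTABLISHED sockets, and 30 seconds gives twice as many, with no malicious traffic involved.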

JimB
2

Some answers from my limited experience involving TIME_WAIT:

1/2/3) See this SO question and this page for a good explanation of TIME_WAIT. It is less a performance issue and more a quality of service to ensure that all TCP packets in a connection get properly received.

4/5) One performance issue related to TIME_WAIT is that on a very busy server you may eventually run out of available connections if you have too many in the TIME_WAIT state. If you are running into this problem you may try reducing the TIME_WAIT value but this may fall into the "I know what I'm doing" tweak category. See this SO question for a few more details.
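To put a rough number on "running out": for outgoing connections to a single destination address and port, each connection needs its own local port, and that port stays tied up for the length of TIME_WAIT after the close. The sketch below assumes the Linux default ephemeral port range (32768 to 60999) and a 60 second TIME_WAIT; both are assumptions about your system, so check `net.ipv4.ip_local_port_range` before trusting the result.

```python
def max_outbound_rate(port_low=32768, port_high=60999, time_wait_s=60):
    """Estimate the sustainable rate of new outgoing connections to ONE
    destination (IP, port) when the local side closes first.

    Returns (usable_ports, connections_per_second).
    """
    usable_ports = port_high - port_low + 1
    return usable_ports, usable_ports / time_wait_s


if __name__ == "__main__":
    ports, rate = max_outbound_rate()
    print(f"{ports} ports -> about {rate:.0f} new connections/s per destination")
```

Incoming connections to a server are a different case, since the client address and port are part of the tuple, which is one reason this problem mostly shows up on proxies and clients making many outbound connections.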

6) The default value of TIME_WAIT "should" be around 240s (or twice the packet MSL of 120s). Thus the ratio of established/wait connections will depend on your incoming connection rate and how long they stay open. For example, I checked on a few of my busy servers and the ratio ranged from 1.3 to 400, all of which I would consider normal based on the server and traffic it receives.
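If you want to check the ratio on your own servers without munin, the per-state counts can be pulled out of /proc/net/tcp on Linux. This is a minimal sketch, assuming the usual layout where the fourth whitespace-separated column is the state as a hex code (01 = ESTABLISHED, 06 = TIME_WAIT, 0A = LISTEN); the function name is just for illustration, and IPv6 sockets live in /proc/net/tcp6 and are not covered here.

```python
from collections import Counter

# Subset of the kernel's TCP state codes as they appear in /proc/net/tcp.
TCP_STATES = {"01": "ESTABLISHED", "06": "TIME_WAIT", "0A": "LISTEN"}


def count_tcp_states(proc_net_tcp_text):
    """Count sockets per state from the text of /proc/net/tcp."""
    counts = Counter()
    # Skip the header line, then read the 'st' column of each row.
    for line in proc_net_tcp_text.splitlines()[1:]:
        fields = line.split()
        if len(fields) > 3:
            code = fields[3]
            # Unknown codes are kept as-is rather than dropped.
            counts[TCP_STATES.get(code, code)] += 1
    return counts


if __name__ == "__main__":
    try:
        with open("/proc/net/tcp") as fh:
            counts = count_tcp_states(fh.read())
    except OSError as exc:
        print("could not read /proc/net/tcp:", exc)
    else:
        print(dict(counts))
```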

uesp