29

I'm dealing with a server that isn't mine to configure. Something somewhere between me and this server kills any connection after 5 minutes if no traffic passes between the two machines. This includes active connections where a command is running on the server; specifically, I've demonstrated that this disconnection occurs using an SSH connection (running a bash command that doesn't print any output for more than 5 minutes) and a SQL database (running a SQL command that lasts more than 5 minutes). It also disconnects if I go read something and don't send any commands for 5 minutes, of course.

For the SSH setting, the group responsible for the server has recommended that I enable keep alive client side. They haven't provided any suggestions for the database connections.

I'm completely confused, though. What is the security benefit here? For SSH, I can bypass their settings completely from the client side, and they even recommend doing so. (The latter means there cannot even be a "security through obscurity" argument, since it won't be obscure because every user will need to know about it. Not that I think this would foil an attack even if it were obscure, especially since I found out on my own before they even recommended it.) For both, this demonstrably degrades availability. I could potentially see that disconnecting active sessions after a few hours of inactivity might be beneficial (Although the possible threats of long open sessions aren't immediately clear to me.), but every 5 minutes? This means I can't even read a DBA.SE post while I'm working out my SQL without being disconnected. Is this as ludicrous as it sounds to me, or is there something I'm missing?

Clarifications

Some points have been mentioned in comments, so I'd like to clarify a little.

  • I am able to consistently reproduce the timeout at 5 minutes. I used commands that record a timestamp server side every few seconds, and after a disconnect, the last timestamp recorded was always exactly 5 minutes after the first timestamp. So the disconnects were never sporadic.
  • Shortly after I wrote this post, the system/network admins responded that this is indeed intentional. I quote, "More fallout from the 5 min time limit set on the F5s a few weeks back?" I'm not sure what an F5 is; some Googling suggests it's a very expensive switch, which corresponds to someone who was trying to find a workaround for my DB connections later mentioning the switch settings.

(I don't believe this information invalidates any answers.)

jpmc26
  • Why do you think this is being done intentionally? – Nick Bastin Apr 27 '16 at 23:39
  • 4
    @NickBastin Because of the sys admins' response. Apparently, this is a recently implemented change, and I'm not the first to bring it up. – jpmc26 Apr 27 '16 at 23:44
  • Have you tried setting the `ServerAliveInterval` option in [.ssh/config](http://linux.die.net/man/5/ssh_config)? This sends periodic “ping” messages at the application layer through the encrypted channel, so the proxy shouldn't be able to distinguish them from actual data that is still flowing, just slowly. That should work around the issue. – Jan Hudec Apr 28 '16 at 06:37
  • 1
    Also, I noticed that the `TCPKeepAlive` [ssh option](http://linux.die.net/man/5/ssh_config) is on by default. If it is not turned off on your side, that indicates that terminating the connections is indeed deliberate, because firewalls/masquerades will _not_ terminate connections that send TCP keepalives. Proxies will, though, because TCP keepalive sends empty packets that are not visible at the socket layer. – Jan Hudec Apr 28 '16 at 06:40
  • @JanHudec I'm aware of `ServerAliveInterval`, but as I said, this isn't my server to manage. What you mention is of course the server side equivalent to the client side settings I mentioned. However, the database does not go through SSH, as far as I know, which indicates that the connection killing is more general than SSH. I'm really not entirely clear on *where* exactly they're enforcing this, which is why I tried not to be overly specific in the question and just focus on the connection termination itself. – jpmc26 Apr 28 '16 at 06:49
  • 4
    @jpmc26, no, `ServerAliveInterval` is a _client_ option. `ServerAliveInterval` and `TCPKeepAlive` are _very different_ options. Also, if you can SSH to the server and if you can keep SSH from timing out, you could use its forwarding feature to tunnel the database connection through it to protect it. – Jan Hudec Apr 28 '16 at 07:04
  • @JanHudec: the default keepalive interval is 2 hours on Linux and MSWindows. – symcbean Apr 28 '16 at 11:52
  • It could be in place to free up resources if it's a shared server or a server with lots of traffic. – Pharap Apr 29 '16 at 06:53
  • Unrelated to the actual security-focused question of why such a scenario was created: if the server has tmux or screen, those commands can support resuming a detached session, which may be an effective way to not lose work. – TOOGAM Apr 29 '16 at 09:00
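
The client-side workaround discussed in the comments above (SSH application-level keepalives, optionally with the database connection tunnelled through the same session) could be sketched roughly as follows. This is only an illustration: the host name, ports and the helper `build_ssh_command` are hypothetical, and the interval of 60 seconds is an assumption chosen merely to stay well under the 5-minute cutoff. `ServerAliveInterval`, `ServerAliveCountMax` and `-L` are standard OpenSSH client options.

```python
import shlex


def build_ssh_command(host, alive_interval=60, alive_count_max=3, forward=None):
    """Build an ssh argv list that enables application-level keepalives.

    forward is an optional (local_port, remote_host, remote_port) tuple for
    tunnelling another service (for example a database) through the session.
    """
    argv = [
        "ssh",
        # Send a keepalive through the encrypted channel after this many idle seconds.
        "-o", f"ServerAliveInterval={alive_interval}",
        # Give up after this many keepalives in a row go unanswered.
        "-o", f"ServerAliveCountMax={alive_count_max}",
    ]
    if forward is not None:
        local_port, remote_host, remote_port = forward
        # -L forwards a local port to remote_host:remote_port via the SSH server.
        argv += ["-L", f"{local_port}:{remote_host}:{remote_port}"]
    argv.append(host)
    return argv


# Hypothetical host and ports, for illustration only.
cmd = build_ssh_command("user@db.example.com", forward=(15432, "localhost", 5432))
print(shlex.join(cmd))
```

With a tunnel like this, the database client would connect to the local port instead of the remote server, so its traffic travels inside the SSH connection that the keepalives hold open.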

5 Answers

52

I'm completely confused, though. What is the security benefit here?

Nothing. The most likely scenario is that something in between is timing out the connection after 5 minutes to conserve resources. That could be a firewall, a WAN accelerator, an SSL accelerator, etc. Or it could be just a bad default setting. Who knows?

Network admins often have different concerns than everyone else, and those concerns can often come into conflict with others'. We work in a siloed world where the holistic picture isn't taken into account.

Don't assume there's a particularly good reason for every setting, but leave room for the potential that the 5 minute timeout was a quick fix for some other problem they're having, and your application problem was blowback.

Steve Sether
  • 2
    I noticed that SSH defaults to enabling TCP keepalive. This would prevent a firewall or masquerade from dropping the connection. A proxy still might, because it wouldn't see TCP keepalives. – Jan Hudec Apr 28 '16 at 06:42
  • 6
    The "router" I got from my cable Internet provider did disconnect every connection after 15min. Sucks if you need to use SSH. I found out that this piece of cheap plastic couldn't really handle more than ~5000 open connections without crashing, so that's probably why they implemented this. – Josef Apr 28 '16 at 07:23
  • 1
    IME, the most likely cause is not the connection being timed out but a badly configured firewall with an overflowing state table. – symcbean Apr 28 '16 at 11:49
  • 2
    @symcbean Also very possible. But you'd expect this to be less regular than 5 minutes. – Steve Sether Apr 28 '16 at 13:49
31

This sounds like a good example of a security "cargo cult". A security control has been implemented blindly without understanding the context involved or indeed implementing it correctly.

Generally speaking, in security the point of an idle timeout is to reduce the risk of situations where a client machine is left unattended and a malicious user gets to the machine and executes unauthorised commands on it. The balance in these timeouts tends to be between usability (which favours longer or no timeouts) and security (which favours shorter timeouts).

You can sometimes spot security cargo culting by exactly what you've mentioned: the operators of the system are actually helping you bypass the nominal control (in your case, by recommending that keep-alives be used).

Rory McCune
  • Risk of client machines left unattended should be mitigated by making sure they have automatic screen lock configured. On Windows it can be forced via group policy. – Jan Hudec Apr 28 '16 at 07:00
  • 1
    This is of course a good idea where you control all the client machines and can enforce policy on them, although again I've seen environments where people insist on applying that at the application layer and the OS layer regardless... – Rory McCune Apr 28 '16 at 07:02
  • 1
    You're overlooking the clear benefit of this approach: if you log into a system, lock your local screen, grab a cup of coffee, you'll come back to find that your session has timed out, so you can log in again and that forces you to retype your password which builds muscle memory so the administrators can start raising the bar to include 64 character passwords with 4 emoticons – Foon Apr 29 '16 at 13:06
13

I'm completely confused, though. What is the security benefit here?

It might not be a question of security but have a different reason. Unfortunately your question only offers your view so we can only speculate what the real reason might be.

One explanation might be that there is a simple stateful packet filter where the states time out after 120 seconds of inactivity. This means any data transferred after this period of inactivity will be blocked because there is no open state any more. The reason for this might be a device which has only very limited resources and thus cannot keep too many states open at the same time, for instance a firewall which was designed with 10 users in mind but is now used by 100 users.
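
To make the mechanism concrete, here is a toy model of such a state table. It is a sketch under stated assumptions, not a description of any real firewall: the class `StateTable`, its method names and the flow identifier are all hypothetical, and the 120-second value simply mirrors the example above.

```python
class StateTable:
    """Toy model of a stateful packet filter with an idle timeout."""

    def __init__(self, idle_timeout):
        self.idle_timeout = idle_timeout
        self.last_seen = {}  # flow id -> time (seconds) of the last packet

    def open(self, flow, now):
        """Record a new connection, as if its handshake had just been seen."""
        self.last_seen[flow] = now

    def packet(self, flow, now):
        """Return True if the packet is forwarded, False if it is dropped."""
        last = self.last_seen.get(flow)
        if last is None or now - last > self.idle_timeout:
            # No state, or the state expired: forget the flow and drop the packet.
            self.last_seen.pop(flow, None)
            return False
        # Traffic refreshes the state, which is why keepalives help.
        self.last_seen[flow] = now
        return True


table = StateTable(idle_timeout=120)
table.open("client->server:22", now=0)
print(table.packet("client->server:22", now=100))  # idle for 100 s
print(table.packet("client->server:22", now=200))  # idle for 100 s again
print(table.packet("client->server:22", now=400))  # idle for 200 s
```

In this model the first two packets pass because each arrives within the timeout of the previous one, while the third is dropped after a longer silence, and every later packet on that flow is dropped too because the state is gone.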

Of course, it might also be that a BOFH is running the system, which is probably closer to your view of the issue :) But this is hard to tell without more insight into the actual system.

Steffen Ullrich
  • I pray it's not a load issue. I didn't mention this in the question (so I understand why you make note of the possibility), but this is supposed to be an entire "private cloud" environment that will host dozens of applications for many development teams. And this restriction is in place even for the servers dedicated to *development*. – jpmc26 Apr 27 '16 at 20:34
  • Good point about the packet filters - without a timeout, an attacker can DoS by a SYN flood type of attack. The timeout provides a limit to how long that remains effective after the attack stops. – Toby Speight Apr 28 '16 at 10:14
8

As you may have noticed this has nothing to do with security whatsoever. Instead, the practice of killing long-lived dormant TCP connections has to do with bugs - it's a work-around for buggy software.

One of the more famous examples of software leaving TCP sessions hanging is Internet Explorer (at least up to version 7). IE had the habit of not ending TCP sessions correctly, so while the socket is considered closed on the client PC/laptop/phone, the server still sees the connection as open. And IE is just one of the more well-known examples. There is a lot of buggy software out there.

So why should a server care? Because each open socket is also an open file descriptor. Not handling this situation causes the server to run out of file descriptors. In theory the server can also run out of socket IDs, but that number is much, much higher than the number of open file descriptors an OS can handle.
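
As a rough illustration of that limit, a process on a Unix-like system can ask how many file descriptors it is allowed to hold open. The helper name `fd_limits` is hypothetical; `resource.getrlimit` and `RLIMIT_NOFILE` are part of Python's standard library on Unix only, so this sketch assumes such a platform.

```python
import resource  # standard library, available on Unix-like systems only


def fd_limits():
    """Return the (soft, hard) limits on open file descriptors for this process."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


soft, hard = fd_limits()
print(f"open file descriptors: soft limit {soft}, hard limit {hard}")
```

Every connection that the server still believes is open counts against the soft limit until it is closed or timed out.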

Because of this, various implementations of the TCP stack include a timeout on dead sockets. On some OSes you can configure this timeout.
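
One mechanism of this kind is TCP keepalive, where the stack probes an idle peer and drops the socket if the probes go unanswered. Whether this is the exact timeout meant above is an assumption. The sketch below shows how an application might enable and tune it per socket; the helper `enable_keepalive` and the chosen values are hypothetical, and the `TCP_KEEPIDLE`, `TCP_KEEPINTVL` and `TCP_KEEPCNT` constants are platform-specific, so they are only used when Python exposes them.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive for sock and tune it where the platform allows."""
    # Ask the TCP stack to probe the peer when the connection is idle.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # Per-socket tuning is not portable, so guard each option.
    if hasattr(socket, "TCP_KEEPIDLE"):
        # Idle seconds before the first probe is sent.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPIDLE, idle)
    if hasattr(socket, "TCP_KEEPINTVL"):
        # Seconds between successive probes.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPINTVL, interval)
    if hasattr(socket, "TCP_KEEPCNT"):
        # Unanswered probes before the connection is considered dead.
        sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_KEEPCNT, probes)
    return sock


# The socket is never connected here; this only demonstrates setting the options.
with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
    enable_keepalive(s)
    print(s.getsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE) != 0)
```

Without such tuning the system-wide default applies, which, as noted in a comment on the question, is typically 2 hours before the first probe.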

slebetman
2

Timeouts on inactive TCP sessions are almost always implemented in routers, switches and all that low-level network machinery. This has little to do with security in the sense of protection against a deliberate attack; it is simply an attempt not to exhaust resources because of buggy software failing to close connections, or because of hardware errors in an external network preventing the closing packet from reaching its destination.

Because of those possible transient failures, network equipment (normally never reset unless broken) that never timed out inactive connections would end up starved of resources and would itself cause a denial of service...

Worse, these are often low-resource (low-cost) systems, and for that reason network administrators try hard to reduce the risk that their part will crash under load, and as such tend to use aggressive timeouts. That's the reason for using TCP keep-alive on connections (if the keep-alive packet passes, there is something at the other end) combined with short network timeouts.

But the real problem here is that application developers know little about low-level network usage, and that network administrators do not care much about high-level applications. Life is like that...

Serge Ballesta