
Recently we had an apache server which was responding very slowly due to SYN flooding. The workaround for this was to enable tcp_syncookies (net.ipv4.tcp_syncookies=1 in /etc/sysctl.conf).

I posted a question about this here if you want more background.

After enabling syncookies we started seeing the following message in /var/log/messages approximately every 60 seconds:

[84440.731929] possible SYN flooding on port 80. Sending cookies.

Vinko Vrsalovic informed me that this means the syn backlog is getting full, so I raised tcp_max_syn_backlog to 4096. At some point I also lowered tcp_synack_retries to 3 (down from the default of 5) by issuing sysctl -w net.ipv4.tcp_synack_retries=3. After doing this, the frequency seemed to drop, with the interval of the messages varying between roughly 60 and 180 seconds.
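To track how often the warning appears without eyeballing the log, the bracketed kernel timestamps can be pulled out and differenced. This is a minimal sketch, assuming dmesg-style `[seconds.micros]` stamps like the one quoted above; the function name is hypothetical, and only the first sample timestamp comes from the question (the others are made up):

```python
import re

# A dmesg-style timestamp such as "[84440.731929]" followed somewhere on the
# same line by the warning text. IGNORECASE because some kernels print
# "possible" and others "Possible" (both spellings appear in this thread).
_SYN_FLOOD = re.compile(r"\[\s*(\d+\.\d+)\].*possible SYN flooding", re.IGNORECASE)


def syn_flood_intervals(lines):
    """Return the gaps, in seconds, between consecutive SYN flooding warnings."""
    stamps = [float(m.group(1)) for m in map(_SYN_FLOOD.search, lines) if m]
    return [round(later - earlier, 3) for earlier, later in zip(stamps, stamps[1:])]


# Only the first timestamp is from the question; the other two are invented.
sample = [
    "[84440.731929] possible SYN flooding on port 80. Sending cookies.",
    "[84501.002000] possible SYN flooding on port 80. Sending cookies.",
    "kernel: [84680.500000] TCP: Possible SYN flooding on port 80. Sending cookies.",
]
print(syn_flood_intervals(sample))  # [60.27, 179.498]
```

On the server, an open log file can be passed directly, e.g. `syn_flood_intervals(open("/var/log/messages"))`, since iterating a file yields its lines.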

Next I issued sysctl -w net.ipv4.tcp_max_syn_backlog=65536, but I am still getting the message in the log.

Throughout all this I've been watching the number of connections in SYN_RECV state (by running watch --interval=5 'netstat -tuna |grep "SYN_RECV"|wc -l'), and it never goes higher than about 240, much lower than the size of the backlog. By comparison, I have a Red Hat server on which this count hovers around 512 (the limit on that server is the default of 1024).
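The same count can be read straight from the kernel's socket tables instead of parsing netstat output. A minimal sketch, assuming the Linux /proc layout in which the fourth column of /proc/net/tcp (and /proc/net/tcp6) holds the socket state in hex, with 03 meaning SYN_RECV; the function names are hypothetical:

```python
import os

TCP_SYN_RECV = "03"  # state code shown in /proc/net/tcp for SYN_RECV


def count_syn_recv(proc_net_tcp_text, port=None):
    """Count SYN_RECV entries in the text of /proc/net/tcp or /proc/net/tcp6.

    If `port` is given, only entries whose *local* port matches are counted.
    """
    count = 0
    for line in proc_net_tcp_text.splitlines()[1:]:  # first line is a header
        fields = line.split()
        if len(fields) < 4 or fields[3] != TCP_SYN_RECV:
            continue
        local_port = int(fields[1].rsplit(":", 1)[1], 16)  # ports are hex
        if port is None or local_port == port:
            count += 1
    return count


def syn_recv_on_this_host(port=None):
    """Sum SYN_RECV entries over IPv4 and IPv6 tables that exist on this host."""
    total = 0
    for path in ("/proc/net/tcp", "/proc/net/tcp6"):
        if os.path.exists(path):
            with open(path) as f:
                total += count_syn_recv(f.read(), port)
    return total
```

Calling `syn_recv_on_this_host(80)` restricts the count to the web server port, which the netstat pipeline above does not do.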

Are there any other tcp settings which would limit the size of the backlog or am I barking up the wrong tree? Should the number of SYN_RECV connections in netstat -tuna correlate to the size of the backlog?


Update

As best I can tell I'm dealing with legitimate connections here, netstat -tuna|wc -l hovers around 5000. I've been researching this today and found this post from a last.fm employee, which has been rather useful.

I've also discovered that tcp_max_syn_backlog has no effect when syncookies are enabled (according to this link).

So as a next step I set the following in sysctl.conf:

net.ipv4.tcp_syn_retries = 3
        # default=5
net.ipv4.tcp_synack_retries = 3
        # default=5
net.ipv4.tcp_max_syn_backlog = 65536
        # default=1024
net.core.wmem_max = 8388608
        # default=124928
net.core.rmem_max = 8388608
        # default=131071
net.core.somaxconn = 512
        # default = 128
net.core.optmem_max = 81920
        # default = 20480

I then set up my response time test, ran sysctl -p and disabled syncookies with sysctl -w net.ipv4.tcp_syncookies=0.

After doing this the number of connections in the SYN_RECV state still remained around 220-250, but connections were starting to delay again. Once I noticed these delays I re-enabled syncookies and the delays stopped.
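The question doesn't describe the response time test itself. One simple way to observe this kind of delay is to time only the TCP handshake, which is the step that stalls when SYNs are dropped and the client has to retransmit them. A minimal sketch; the function names, default port and sampling interval are assumptions:

```python
import socket
import time


def connect_time(host, port=80, timeout=10.0):
    """Seconds needed to complete a TCP three-way handshake with host:port."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass  # connection established; close it straight away
    return time.perf_counter() - start


def sample_connect_times(host, port=80, count=10, interval=5.0):
    """Measure `count` handshakes, `interval` seconds apart; None marks a failure."""
    results = []
    for i in range(count):
        try:
            results.append(connect_time(host, port))
        except OSError:  # includes connection timeouts
            results.append(None)
        if i < count - 1:
            time.sleep(interval)
    return results
```

Running it from another machine while toggling net.ipv4.tcp_syncookies makes any jump in handshake time directly visible, separately from how long Apache takes to produce a response.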

I believe what I was seeing was still an improvement over the initial state; however, some requests were still delayed, which is much worse than having syncookies enabled. So it looks like I'm stuck with them enabled until we can get some more servers online to cope with the load. Even then, I'm not sure I see a valid reason to disable them again, as they're (apparently) only sent when the server's buffers get full.

But the syn backlog doesn't appear to be full with only ~250 connections in the SYN_RECV state! Is it possible that the SYN flooding message is a red herring and it's something other than the syn_backlog that's filling up?

If anyone has any other tuning options I haven't tried yet I'd be more than happy to try them out, but I'm starting to wonder if the syn_backlog setting isn't being applied properly for some reason.
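One way to rule out "the setting isn't being applied" is to compare what sysctl.conf asks for with what the kernel currently reports under /proc/sys. A minimal sketch, assuming the usual mapping of dotted sysctl names to /proc/sys paths; the function names are hypothetical, and it only checks kernel values, so it cannot tell whether an already-running process such as Apache picked them up:

```python
import os


def parse_sysctl_conf(text):
    """Parse 'key = value' lines; blank lines and '#' or ';' comments are skipped."""
    settings = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = " ".join(value.split())
    return settings


def proc_path(key):
    """net.ipv4.tcp_max_syn_backlog -> /proc/sys/net/ipv4/tcp_max_syn_backlog.

    Assumes no dots inside a single component (e.g. VLAN interface names).
    """
    return os.path.join("/proc/sys", *key.split("."))


def live_value(key):
    with open(proc_path(key)) as f:
        return " ".join(f.read().split())  # normalise tabs in multi-value entries


def mismatches(settings, read=live_value):
    """Return {key: (configured, live)} for every setting that differs."""
    result = {}
    for key, wanted in settings.items():
        live = read(key)
        if live != wanted:
            result[key] = (wanted, live)
    return result
```

On the server: `mismatches(parse_sysctl_conf(open("/etc/sysctl.conf").read()))` returns an empty dict when every configured value is live.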

Alex Forbes

3 Answers


So, this is a neat question.

Initially, I was surprised that you saw any connections in SYN_RECV state with SYN cookies enabled. The beauty of SYN cookies is that the server can statelessly participate in the TCP 3-way handshake using cryptography, so I would expect the server not to represent half-open connections at all, because that would be the very state that isn't being kept.
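To make the "stateless, using cryptography" idea concrete, here is a toy sketch. The server derives its SYN-ACK sequence number from a keyed hash of the connection's addresses and ports plus a coarse clock, keeps no per-connection state, and later recognises a legitimate client because the client's ACK must acknowledge that number plus one. This is not the Linux cookie format (the real one also encodes an MSS value and uses its own hash and bit layout); the secret, the 64-second tick and the 8/24-bit split are assumptions:

```python
import hashlib
import hmac
import os

SECRET = os.urandom(16)  # hypothetical per-boot secret known only to the server
PERIOD = 64              # seconds per time-counter tick (an assumption)


def _mac24(src, sport, dst, dport, counter):
    """24-bit keyed hash of the connection 4-tuple and a coarse time counter."""
    msg = f"{src}:{sport}>{dst}:{dport}@{counter}".encode()
    return int.from_bytes(hmac.new(SECRET, msg, hashlib.sha256).digest()[:3], "big")


def make_cookie(src, sport, dst, dport, now):
    """Initial sequence number for the SYN-ACK. Nothing is stored."""
    counter = int(now // PERIOD)
    # Top 8 bits: low bits of the counter; bottom 24 bits: the MAC.
    return ((counter & 0xFF) << 24) | _mac24(src, sport, dst, dport, counter)


def check_ack(src, sport, dst, dport, ack_seq, now, max_age=2):
    """Accept the final ACK only if it acknowledges (cookie + 1) for a cookie
    that could have been issued within the last `max_age` counter ticks."""
    cookie = (ack_seq - 1) & 0xFFFFFFFF
    current = int(now // PERIOD)
    for counter in range(current - max_age, current + 1):
        if ((counter & 0xFF) == cookie >> 24
                and _mac24(src, sport, dst, dport, counter) == cookie & 0xFFFFFF):
            return True
    return False
```

The only thing the server needs to remember is `SECRET`, which is why a flood of spoofed SYNs answered this way consumes no queue entries.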

In fact, a quick peek at the source (tcp_ipv4.c) shows interesting information about how the kernel implements SYN cookies. Essentially, despite turning them on, the kernel behaves as it would normally until its queue of pending connections is full. This explains your existing list of connections in SYN_RECV state.

Only when the queue of pending connections is full, AND another SYN packet (connection attempt) is received, AND it has been more than a minute since the last warning message, does the kernel send the warning message you have seen ("sending cookies"). SYN cookies are sent even when the warning message isn't; the warning message is just to give you a heads up that the issue hasn't gone away.
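The sequence described in the two paragraphs above can be written as a small simulation. It is a simplified model of that description, not kernel code: the class and method names are made up, and it ignores entries leaving the queue when handshakes complete or time out.

```python
from dataclasses import dataclass

WARN_INTERVAL = 60.0  # "more than a minute since the last warning message"


@dataclass
class SynQueueModel:
    capacity: int                        # half-open (SYN_RECV) entries the queue can hold
    syncookies: bool
    pending: int = 0
    last_warning: float = float("-inf")

    def on_syn(self, now):
        """Handle one incoming SYN at time `now` (seconds).

        Returns (action, warned) where action is "syn_recv", "cookie" or "drop".
        """
        if self.pending < self.capacity:
            self.pending += 1            # normal path: state is kept, netstat shows SYN_RECV
            return "syn_recv", False
        if not self.syncookies:
            return "drop", False         # queue full and no cookies: the SYN is not answered
        warned = now - self.last_warning > WARN_INTERVAL
        if warned:
            self.last_warning = now      # "possible SYN flooding ... Sending cookies."
        return "cookie", warned          # a cookie is sent whether or not we warned
```

Feeding it a steady stream of SYNs with a full queue produces one warning per minute and a cookie for every SYN, which matches the roughly 60-second cadence reported in the question.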

Put another way, if you turn off SYN cookies, the message will go away. That is only going to work out for you if you are no longer being SYN flooded.

To address some of the other things you've done:

  • net.ipv4.tcp_synack_retries:
    • Increasing this won't have any positive effect for those incoming connections that are spoofed, nor for any that receive a SYN cookie instead of server-side state (no retries for them either).
    • For incoming spoofed connections, increasing this increases the number of packets you send to a fake address, and possibly the amount of time that that spoofed address stays in your connection table (this could be a significant negative effect).
    • Under normal load / number of incoming connections, the higher this is, the more likely you are to quickly / successfully complete connections over links that drop packets. There are diminishing returns for increasing this.
  • net.ipv4.tcp_syn_retries: Changing this cannot have any effect on inbound connections (it only affects outbound connections).
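To put rough numbers on how long an unanswered (e.g. spoofed) half-open entry lingers, here is a back-of-the-envelope calculation. It assumes the SYN-ACK retransmission timeout starts at `initial_rto` seconds and doubles after each retry; the real initial value depends on the kernel version, so it is left as a parameter, and the function name is hypothetical:

```python
def syn_recv_lifetime(synack_retries, initial_rto=1.0):
    """Approximate seconds a half-open entry lives if the client never answers.

    The first SYN-ACK goes out at t=0. Each of the `synack_retries`
    retransmissions, plus the final wait before giving up, uses a timeout
    that doubles every time (exponential backoff).
    """
    return sum(initial_rto * 2 ** i for i in range(synack_retries + 1))


print(syn_recv_lifetime(5))  # 63.0
print(syn_recv_lifetime(3))  # 15.0
```

Under these assumptions, lowering tcp_synack_retries from the default 5 to 3 (as in the question) shortens that lifetime from about 63 to about 15 seconds; a larger initial timeout scales both numbers proportionally.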

The other variables you mention I haven't researched, but I suspect the answers to your question are pretty much right here.

If you aren't being SYN flooded and the machine is responsive to non-HTTP connections (e.g. SSH), I think there is probably a network problem, and you should have a network engineer help you look at it. If the machine is generally unresponsive even when you aren't being SYN flooded, it sounds like a serious load problem if it affects the creation of TCP connections (a fairly low-level and resource-light operation).

quanta
Slartibartfast
  • Thanks - this is an interesting and informative answer. It certainly answers my query about the relationship between the connections in the SYN_RECV state and the sending of cookies. The machine was responsive to non-HTTP traffic, including SSH and HTTPS, which receives much less traffic than HTTP. Thus we have decided that reducing the traffic is the way to go. – Alex Forbes Aug 04 '11 at 08:46
  • With regards to getting a network engineer to take a look - good suggestion but we're migrating away from this datacentre, so it's probably not worthwhile when we're bringing a couple of new servers online elsewhere. I think you might be right about it being a network issue - perhaps a problem with the load balancer or firewall. Thanks again for your insights! – Alex Forbes Aug 04 '11 at 08:47

I faced exactly the same problem on a fresh install of Ubuntu Oneiric 11.10 running a web server (apache2) for a heavily loaded website. On Ubuntu Oneiric 11.10, syncookies were enabled by default.

I had the same kernel messages stating a possible SYN flood attack on the webserver port:

kernel: [739408.882650] TCP: Possible SYN flooding on port 80. Sending cookies.

At the same time, I was pretty sure that there was no attack happening. The messages recurred at 5-minute intervals. This looked like a load peak rather than an attack, because an attacker would keep the load high all the time while trying to make the server stop responding to requests.

Tuning the net.ipv4.tcp_max_syn_backlog parameter did not lead to any improvement; the messages continued at the same rate. The fact that the number of SYN_RECV connections was always really low (in my case under 250) indicated that some other parameter must be responsible for this message.

I found this bug report https://bugzilla.redhat.com/show_bug.cgi?id=734991 on the Red Hat site stating that the kernel message can be the result of a bug (or misconfiguration) on the application side. Of course, the log message is very misleading! In that case the responsible parameter is not a kernel parameter, but a parameter of your application that is passed to the kernel.

So we should also look at the configuration parameters of our web server application. See the Apache docs at http://httpd.apache.org/docs/2.0/mod/mpm_common.html#listenbacklog

The default value of the ListenBacklog parameter is 511. (This corresponds to the number of connections you observed on your Red Hat server. Your other server may have a lower value configured.)

Apache has its own configuration parameter for the backlog queue of incoming connections. If you have a lot of incoming connections and at some moment (just by chance) they all arrive at nearly the same time, so that the web server cannot serve them fast enough, your backlog will fill up with 511 connections and the kernel will emit the above message about a possible SYN flood attack.

To solve this, I added the following line to /etc/apache2/ports.conf or one of the other .conf files loaded by Apache (/etc/apache2/apache2.conf should also be fine):

ListenBackLog 5000

You should also set net.ipv4.tcp_max_syn_backlog to a reasonable value. In my understanding, the kernel maximum limits the value that you can effectively configure in the Apache configuration. So run:

sudo sysctl -w net.ipv4.tcp_max_syn_backlog=5000
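As a quick sanity check before restarting Apache, a sketch like the following compares a planned ListenBacklog with the kernel limits. It relies on one fact from the listen(2) man page that this answer does not mention: a backlog larger than net.core.somaxconn is silently truncated to that value. The function names and the choice to return warnings as strings are illustrative:

```python
def backlog_warnings(listen_backlog, somaxconn, tcp_max_syn_backlog):
    """List reasons why a requested ListenBacklog may not take full effect."""
    warnings = []
    if listen_backlog > somaxconn:
        warnings.append(
            f"ListenBacklog {listen_backlog} will be truncated to "
            f"net.core.somaxconn={somaxconn}")
    if listen_backlog > tcp_max_syn_backlog:
        warnings.append(
            f"ListenBacklog {listen_backlog} exceeds "
            f"net.ipv4.tcp_max_syn_backlog={tcp_max_syn_backlog}")
    return warnings


def read_int(path):
    """Read a single integer sysctl value, e.g. /proc/sys/net/core/somaxconn."""
    with open(path) as f:
        return int(f.read())
```

On the server: `backlog_warnings(5000, read_int("/proc/sys/net/core/somaxconn"), read_int("/proc/sys/net/ipv4/tcp_max_syn_backlog"))` returns an empty list once both kernel limits are at least 5000.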

After tuning the config, do not forget to restart your apache:

sudo service apache2 restart (or sudo /etc/init.d/apache2 restart)

In my case, this configuration change immediately stopped the kernel warnings. I'm able to reproduce the messages by setting a low ListenBackLog value in the apache config.

quanta
Jeff
  • Great answer. Assuming what you say is correct, I'd mark this as the accepted answer, but I can't really test it - reducing the load solved the problem and I have a policy of not tinkering with production servers without good cause :) – Alex Forbes Feb 15 '12 at 16:44
  • I can confirm this works. Essentially it's a kernel anti-DDoS feature; however, when you are receiving a lot of web traffic, it ends up blocking your legitimate users! – Areeb Soo Yasir Jan 04 '18 at 18:40

After some tests with kernel 3.4.9, I found that the number of SYN_RECV connections in netstat depends on:

  • /proc/sys/net/core/somaxconn rounded up to the next power of 2 (e.g. 128 -> 256)
  • 75% of /proc/sys/net/ipv4/tcp_max_syn_backlog if /proc/sys/net/ipv4/tcp_syncookies is set to 0 or 100% if /proc/sys/net/ipv4/tcp_syncookies is set to 1
  • ListenBackLog in the apache config rounded up to the next power of 2 (e.g. 128 -> 256)

The minimum of these parameters is used. After changing somaxconn or ListenBackLog, Apache has to be restarted.

After increasing tcp_max_syn_backlog, Apache also has to be restarted.

Without tcp_syncookies, Apache blocks. It is strange that in this case the limit is only 75% of tcp_max_syn_backlog, and increasing this parameter raises the SYN_RECV connections to 100% of the old value without restarting Apache.
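The empirical rule in the list above can be written as a small function. It encodes only what this answer reports for kernel 3.4.9 (reading "rounded up to the next power of 2" as strictly greater, to match the 128 -> 256 example); it is not derived from kernel source, and the function names are hypothetical:

```python
def next_pow2_above(n):
    """Smallest power of two strictly greater than n (128 -> 256, 511 -> 512)."""
    return 1 << n.bit_length()


def estimated_syn_recv_cap(somaxconn, listen_backlog, tcp_max_syn_backlog, syncookies):
    """Upper bound on SYN_RECV connections for one listening socket,
    following the observations reported above for kernel 3.4.9."""
    if syncookies:
        syn_backlog_share = tcp_max_syn_backlog            # 100% with cookies on
    else:
        syn_backlog_share = tcp_max_syn_backlog * 3 // 4   # 75% with cookies off
    return min(next_pow2_above(somaxconn),
               next_pow2_above(listen_backlog),
               syn_backlog_share)


print(estimated_syn_recv_cap(128, 511, 1024, True))  # 256
```

For example, with net.core.somaxconn at the default of 128 noted in the question's sysctl.conf comments and Apache's default ListenBacklog of 511, the estimate is 256 whenever tcp_max_syn_backlog is at least 256, which is in the same range as the roughly 240 to 250 SYN_RECV connections reported in the question.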

usoft
  • Also, running `/bin/echo m >/proc/sysrq-trigger` often triggers a _possible SYN flooding on port 80. Sending cookies_ message. – usoft Oct 11 '12 at 11:53