Recently we had an Apache server that was responding very slowly due to SYN flooding. The workaround was to enable tcp_syncookies (net.ipv4.tcp_syncookies=1 in /etc/sysctl.conf).
I posted a question about this here if you want more background.
After enabling syncookies we started seeing the following message in /var/log/messages approximately every 60 seconds:
[84440.731929] possible SYN flooding on port 80. Sending cookies.
Vinko Vrsalovic informed me that this means the SYN backlog is getting full, so I raised tcp_max_syn_backlog to 4096. At some point I also lowered tcp_synack_retries to 3 (down from the default of 5) by issuing sysctl -w net.ipv4.tcp_synack_retries=3. After doing this, the frequency seemed to drop, with the interval between messages varying between roughly 60 and 180 seconds.
Next I issued sysctl -w net.ipv4.tcp_max_syn_backlog=65536, but I am still getting the message in the log.
Throughout all this I've been watching the number of connections in the SYN_RECV state (by running watch --interval=5 'netstat -tuna |grep "SYN_RECV"|wc -l'), and it never goes higher than about 240, far lower than the size of the backlog. Yet I have a Red Hat server which hovers around 512 (the limit on that server is the default of 1024).
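Since the log message names a specific port, it may help to break the SYN_RECV count down per local port instead of taking one total. The following is a minimal sketch, assuming the usual netstat -tn column layout (local address in column 4, state in column 6); the function name syn_recv_by_port is made up for illustration.

```shell
#!/bin/sh
# Read netstat -tn style output on stdin and print "port count" for every
# local port that has connections in the SYN_RECV state, sorted by port.
# Assumes column 4 is the local address and column 6 is the TCP state.
syn_recv_by_port() {
    awk '$6 == "SYN_RECV" {
             # The port is whatever follows the last ":" in the local address.
             n = split($4, part, ":")
             count[part[n]]++
         }
         END { for (port in count) print port, count[port] }' | sort -n
}
```

It could be run as netstat -tn | syn_recv_by_port, or wrapped in watch the same way as the command above.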
Are there any other TCP settings which would limit the size of the backlog, or am I barking up the wrong tree? Should the number of SYN_RECV connections in netstat -tuna correlate to the size of the backlog?
Update
As best I can tell I'm dealing with legitimate connections here; netstat -tuna|wc -l hovers around 5000. I've been researching this today and found this post from a last.fm employee, which has been rather useful.
I've also discovered that tcp_max_syn_backlog has no effect when syncookies are enabled (as per this link).
So as a next step I set the following in sysctl.conf:
net.ipv4.tcp_syn_retries = 3
# default=5
net.ipv4.tcp_synack_retries = 3
# default=5
net.ipv4.tcp_max_syn_backlog = 65536
# default=1024
net.core.wmem_max = 8388608
# default=124928
net.core.rmem_max = 8388608
# default=131071
net.core.somaxconn = 512
# default = 128
net.core.optmem_max = 81920
# default = 20480
I then set up my response time test, ran sysctl -p, and disabled syncookies with sysctl -w net.ipv4.tcp_syncookies=0.
After doing this, the number of connections in the SYN_RECV state still remained around 220-250, but connections started to be delayed again. Once I noticed these delays I re-enabled syncookies and the delays stopped.
I believe what I was seeing was still an improvement over the initial state; however, some requests were still delayed, which is much worse than having syncookies enabled. So it looks like I'm stuck with them enabled until we can get some more servers online to cope with the load. Even then, I'm not sure I see a valid reason to disable them again, as they're only sent (apparently) when the server's buffers get full.
But the syn backlog doesn't appear to be full with only ~250 connections in the SYN_RECV state! Is it possible that the SYN flooding message is a red herring and it's something other than the syn_backlog that's filling up?
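One way to look for a different queue filling up is to read the kernel's TcpExt counters, assuming a Linux kernel that exposes them in /proc/net/netstat as a header line of names followed by a line of values. The sketch below pulls out a single counter by name; tcpext_counter is a hypothetical helper, and which counters exist depends on the kernel version.

```shell
#!/bin/sh
# Print one TcpExt counter by name from /proc/net/netstat style input on stdin.
# The first "TcpExt:" line holds the counter names, the second the values.
# Prints nothing if the named counter is not present.
tcpext_counter() {
    awk -v name="$1" '
        $1 == "TcpExt:" && !have_header {
            for (i = 2; i <= NF; i++) col[$i] = i
            have_header = 1
            next
        }
        $1 == "TcpExt:" && have_header {
            if (name in col) print $(col[name])
            exit
        }
    '
}
```

For example, tcpext_counter ListenOverflows < /proc/net/netstat, repeated a minute apart, would show whether listen queue overflows are still climbing, and tcpext_counter SyncookiesSent would show how many cookies have gone out. If ListenOverflows rises while the SYN_RECV count stays low, that would point at the accept queue rather than the SYN backlog, though this is only a diagnostic hint, not a confirmed cause.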
If anyone has any other tuning options I haven't tried yet I'd be more than happy to try them out, but I'm starting to wonder if the syn_backlog setting isn't being applied properly for some reason.
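To rule out the settings simply not being applied, the values in sysctl.conf can be compared against what the kernel currently reports under /proc/sys. This is a rough sketch, assuming single-valued integer settings like the ones listed above; sysctl_path and check_sysctl_conf are hypothetical helper names, and the optional root argument exists only so the check can be pointed at a scratch directory.

```shell
#!/bin/sh
# Map a dotted sysctl key to its file under a /proc/sys style root.
# The second argument is optional and defaults to /proc/sys.
sysctl_path() {
    printf '%s/%s\n' "${2:-/proc/sys}" "$(printf '%s' "$1" | tr '.' '/')"
}

# Read "key = value" lines on stdin (sysctl.conf syntax) and report whether
# each running value matches. Blank lines and # or ; comments are skipped.
# Whitespace is stripped, so this only suits single-valued settings.
check_sysctl_conf() {
    root="${1:-/proc/sys}"
    while IFS='=' read -r key want; do
        key=$(printf '%s' "$key" | tr -d ' \t')
        want=$(printf '%s' "$want" | tr -d ' \t')
        case "$key" in ''|'#'*|';'*) continue ;; esac
        file=$(sysctl_path "$key" "$root")
        if [ -r "$file" ]; then
            have=$(tr -d ' \t\n' < "$file")
        else
            have="unreadable"
        fi
        if [ "$have" = "$want" ]; then
            echo "OK       $key = $have"
        else
            echo "MISMATCH $key: wanted $want, running $have"
        fi
    done
}
```

Running check_sysctl_conf < /etc/sysctl.conf after sysctl -p would then flag any key whose running value differs from the file. A match only confirms the sysctl itself was set; it says nothing about whether the listening socket's own backlog is also limiting the queue.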