The first thing to do is to determine what's happening. All you know at the moment is that Apache is unresponsive sometimes and during these times you have a lot of half-open TCP connections. You don't know if the TCP connections are the cause of the problem or just a symptom.
Do you have a performance monitoring system such as Cacti, Munin, Zabbix, Observium or any of the other options in this space? If not, get one now. Configure it to graph all the normal things such as memory usage, load average, CPU usage, free disk space, IOPS, network usage, etc. It might also be worth adding custom metrics such as requests per minute or TCP connection states. Also add whatever templates match the services you are using, such as Apache, MySQL, memcached, etc.
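If your monitoring tool has no ready-made graph for TCP states, a custom metric can be as small as the sketch below. It assumes Linux, where /proc/net/tcp and /proc/net/tcp6 list every socket with a hex state code in the fourth column. The function names are my own, and you would still need to adapt the output to whatever plugin format your tool expects.

```python
from collections import Counter

# State codes used in the "st" column of /proc/net/tcp and /proc/net/tcp6.
TCP_STATES = {
    "01": "ESTABLISHED", "02": "SYN_SENT", "03": "SYN_RECV",
    "04": "FIN_WAIT1", "05": "FIN_WAIT2", "06": "TIME_WAIT",
    "07": "CLOSE", "08": "CLOSE_WAIT", "09": "LAST_ACK",
    "0A": "LISTEN", "0B": "CLOSING",
}


def count_tcp_states(lines):
    """Tally sockets per TCP state from /proc/net/tcp-style lines."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Socket rows start with an index like "0:"; the header row does not.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        counts[TCP_STATES.get(fields[3].upper(), "UNKNOWN")] += 1
    return counts


def read_socket_tables(paths=("/proc/net/tcp", "/proc/net/tcp6")):
    """Read the kernel's socket tables, skipping any file that is missing."""
    lines = []
    for path in paths:
        try:
            with open(path) as handle:
                lines.extend(handle)
        except OSError:
            continue
    return lines


if __name__ == "__main__":
    # One "STATE count" pair per line, easy to feed into a graphing plugin.
    for state, total in sorted(count_tcp_states(read_socket_tables()).items()):
        print(f"{state} {total}")
```

Run it from cron or your monitoring agent every minute or so. A sudden climb in SYN_RECV or SYN_SENT then shows up on the same timeline as your other graphs.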
By analysing the graphs that these tools produce you should be able to find the resource that is hitting 100% during these downtimes. From there, you can trace the cause-and-effect chain back to the initial trigger. It's possible that the exhausted resource is one you are not measuring, and it may not even be under your control.
As a guess, if your connections are in the `SYN_SENT` state, you are probably using someone else's API over HTTP and either they are down or they have temporarily blocked you. If they are in the `SYN_RECV` state, there might be a firewall problem that is stopping your `SYN-ACK` responses from reaching the client, or stopping the client's reply from reaching you. It could also be the SYN flood you suggested.
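To check the `SYN_SENT` theory, it helps to see which remote endpoints those stuck outgoing connections point at. The following is a rough sketch, not a polished tool: it assumes Linux, IPv4 sockets only (lines read from /proc/net/tcp) and a little-endian host such as x86, because the kernel prints the address as a native-order hex integer. The function names are hypothetical.

```python
import socket
import struct
from collections import Counter

SYN_SENT = "02"  # state code in the "st" column of /proc/net/tcp


def decode_ipv4_endpoint(hex_endpoint):
    """Turn '0100007F:0050' into ('127.0.0.1', 80).

    Assumes a little-endian host, where the address bytes appear reversed.
    """
    hex_addr, hex_port = hex_endpoint.split(":")
    addr = socket.inet_ntoa(struct.pack("<I", int(hex_addr, 16)))
    return addr, int(hex_port, 16)


def syn_sent_destinations(lines):
    """Count SYN_SENT sockets per remote (address, port) pair."""
    counts = Counter()
    for line in lines:
        fields = line.split()
        # Skip the header row and anything too short to be a socket row.
        if len(fields) < 4 or not fields[0].endswith(":"):
            continue
        if fields[3].upper() == SYN_SENT:
            counts[decode_ipv4_endpoint(fields[2])] += 1
    return counts
```

Pass it the lines of /proc/net/tcp, for example `syn_sent_destinations(open("/proc/net/tcp"))`, and call `.most_common()` on the result. If nearly all entries point at one address on port 80 or 443, that is very likely the third-party API that has stopped answering.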
During a real SYN flood, you will see the bandwidth and the number of packets per second jump significantly. Use `tcpdump -w packet-capture-file.cap` or `-j LOG` in iptables to log these packets and see if you can spot a pattern. Maybe the source address is always the same, maybe it's always in a small range, maybe a strange TCP flag (such as URG or PSH) is set. Failing that, see if the IP addresses involved make multiple connection attempts. If they do, you can add them to a `DROP` rule in your firewall so that your OS won't have to deal with them. Depending on how many different IP addresses there are, you may need to split your rules into several chains to reduce the size of the list iptables has to scan through.
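Spotting a pattern by eye in a large capture is tedious, so here is a minimal sketch of how you might tally it instead. It assumes you have first turned the capture into text with `tcpdump -nn -r packet-capture-file.cap` and that the lines use tcpdump's default one-line IPv4 summary format. The function name and the /24 grouping are my own choices, not anything standard.

```python
import ipaddress
import re
from collections import Counter

# Matches tcpdump's default IPv4/TCP summary line, e.g.
# 12:00:00.000001 IP 203.0.113.5.51234 > 198.51.100.10.80: Flags [S], seq 1, ...
PACKET_RE = re.compile(
    r"IP (\d+\.\d+\.\d+\.\d+)\.\d+ > (\d+\.\d+\.\d+\.\d+)\.\d+: Flags \[([^\]]*)\]"
)


def summarize_syns(lines, prefix_len=24):
    """Count bare SYN packets per source address and per source network."""
    by_source = Counter()
    by_network = Counter()
    for line in lines:
        match = PACKET_RE.search(line)
        if not match:
            continue
        source, _dest, flags = match.groups()
        # "S" is a bare SYN; "S." is a SYN-ACK and is ignored here.
        if flags != "S":
            continue
        by_source[source] += 1
        network = ipaddress.ip_network(f"{source}/{prefix_len}", strict=False)
        by_network[str(network)] += 1
    return by_source, by_network
```

Feed it the text lines and look at `by_source.most_common(20)` and `by_network.most_common(20)`. A handful of addresses or one small range dominating the list is a good candidate for a `DROP` rule. If the counts are spread thinly over thousands of unrelated addresses, the sources are probably spoofed and per-address blocking won't help much.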
The DDoS may be big enough that it's actually exhausting your hosting provider's bandwidth. If that is the case, they or you will probably need to go to a dedicated DDoS mitigation company. These guys use either DNS or BGP to route your traffic to them, filter out the DDoS traffic and send the rest on to you. They're generally not cheap, so you will have to weigh the cost of downtime against the cost of the mitigation service. There are also services like CloudFlare that can absorb DDoS attacks up to certain limits and have more affordable pricing plans.