We were targeted by a DDoS attack earlier today. There were 20x as many connections as normal on our load balancer (HAProxy), and all the backend nodes kept going down during the attack.
System structure: HAProxy > Squid > Apache (for ModSecurity) > IIS app layer.
During the attack, I noticed a MaxClients reached error in Apache, so I bumped the setting from 150 to 250, which seemed to help to some extent. However, I had to keep restarting Apache manually for the backends to recover. The attack lasted about 50 minutes.
After the attack began to subside, a final Apache restart on each node brought us into the green, but now I'm looking into why the backends kept failing in the first place. In the Apache error logs, I see a lot of these:
[Wed Jun 22 11:46:12 2011] [error] [client 10.x.x.x] proxy: Error reading from remote server returned by /favicon.ico
[Wed Jun 22 11:46:13 2011] [error] [client 10.x.x.x] (70007)The timeout specified has expired: proxy: error reading status line from remote server www.example.com
Apache is using default keep-alive settings (keep-alives are enabled and the timeout is set to 15 seconds). After doing some additional reading on HAProxy + keep-alives, is it reasonable to conclude that the DDoS was made worse by keep-alives being enabled?
HAProxy's max connections are set well below the maximums set in Apache, but perhaps with 20x the usual connections, too many were being opened in the ol' DoS fashion and Apache was then holding them open.
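For reference, here is a rough sketch of the Apache directives in question. The MaxClients, KeepAlive, and KeepAliveTimeout values are the ones described above; everything else in the real config is omitted, and the commented-out lines are only the change I'm asking about, not something I have tested.

```apache
# Sketch of the relevant httpd.conf directives (not the full config).

# Raised from 150 during the attack.
MaxClients 250

# Current keep-alive settings: an idle client connection
# can hold a worker slot for up to 15 seconds.
KeepAlive On
KeepAliveTimeout 15

# Hypothetical change under consideration (untested assumption):
# KeepAlive Off
# or a much shorter timeout, e.g.
# KeepAliveTimeout 2
```

If keep-alives are the culprit, each idle connection would occupy one of the 250 slots until the timeout expires, which would fit with the MaxClients error showing up under 20x the usual connection count.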