In summary, I had a case where all Apache threads were hung: each one had sent its HTTP response and was waiting for a TCP ACK from the client. As a result, each thread waited 300s (the Timeout value in the configuration) before moving on to the next request, where the same thing happened again.
Here is how it unfolded: there was a sudden traffic peak, the load on the Apache and DB servers went up as expected, and at some point every Apache thread entered this state and the Apache and DB load dropped to 0. It stayed that way for hours. After an Apache restart, pages were processed normally again and the load went up, then the same thing happened and the load dropped back to 0. Whether this was an attack or a consequence of faulty software or hardware is not yet known.
Now the details: when everything is hung, the server-status page of each Apache shows all threads marked "W" (Sending Reply), with the SS timer (seconds since the beginning of the most recent request) climbing to 300s. The thread then moves on to the next HTTP request and the timer starts climbing to 300s again.
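For what it's worth, the same check can be done without reading the page by hand: mod_status also has a machine-readable form (`server-status?auto`) that includes a `Scoreboard:` line with one character per worker slot. Below is a minimal Python sketch that counts the states in that text. The function name and the sample string are made up for illustration and are not output from my servers.

```python
from collections import Counter


def count_worker_states(status_text):
    """Count scoreboard characters in mod_status '?auto' output.

    Each character of the 'Scoreboard:' line is one worker slot,
    e.g. 'W' = sending reply, '_' = waiting for connection, '.' = open slot.
    """
    for line in status_text.splitlines():
        if line.startswith("Scoreboard:"):
            return Counter(line.split(":", 1)[1].strip())
    return Counter()  # no scoreboard line found


# Hypothetical sample, not real output from the affected servers.
sample = "BusyWorkers: 4\nIdleWorkers: 1\nScoreboard: WWWW_..."
states = count_worker_states(sample)
print(states["W"], "workers sending a reply out of", sum(states.values()), "slots")
```

Run periodically, something like this would show the moment when every slot flips to "W" and stays there.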
On the socket side, netstat shows all the sockets belonging to these Apache threads stuck in CLOSE_WAIT with a high Send-Q value (bytes sent but not yet acknowledged by the client). strace confirms that Apache is doing a poll() with a 300s timeout on the socket, waiting for the data to be acknowledged.
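To make the socket symptom easier to spot, here is a rough Python sketch that filters `netstat -tan` output for CLOSE_WAIT sockets that still have data in Send-Q. It assumes the usual Linux column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State). The function name, the threshold parameter, and the sample lines (documentation-range addresses) are hypothetical, not data from the incident.

```python
def stuck_close_wait(netstat_text, min_send_q=1):
    """Return (local, remote, send_q) for CLOSE_WAIT sockets with unacked data.

    Expects the column layout of `netstat -tan` on Linux:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    hits = []
    for line in netstat_text.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a TCP socket line.
        if len(fields) < 6 or not fields[0].startswith("tcp"):
            continue
        send_q, local, remote, state = fields[2], fields[3], fields[4], fields[5]
        if state == "CLOSE_WAIT" and send_q.isdigit() and int(send_q) >= min_send_q:
            hits.append((local, remote, int(send_q)))
    return hits


# Hypothetical sample using documentation-range addresses, not real data.
sample = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0  43520 192.0.2.10:80           198.51.100.7:51234      CLOSE_WAIT
tcp        0      0 192.0.2.10:80           198.51.100.8:40022      ESTABLISHED
tcp        1      0 192.0.2.10:80           198.51.100.9:40100      CLOSE_WAIT
"""
for local, remote, send_q in stuck_close_wait(sample):
    print(local, remote, send_q)
```

In practice the text would come from running netstat on the web server. Grouping the hits by remote address might help tell a handful of misbehaving clients apart from a network-wide loss of ACKs.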
Whether this is an attack or a bad network configuration that lost the packets, my question is: how do we prevent it? It looks like a particularly nasty kind of attack.
I am aware of the Slowloris attack, where the client sends an HTTP request very slowly, and that it can be mitigated with a CDN, a reverse proxy, and so on. But for this particular case, I don't see what could prevent it.
How would you prevent this from happening?
Thanks!