I'm wondering what backpressure strategies people commonly use for their web services?
Imagine your service operates under heavy load, and at some point the load reaches 120% of your capacity. How do you deal with this?
The soundest strategy I can think of is to start rejecting connections. If one host reaches its peak capacity (e.g. all Apache workers are busy), I would start rejecting TCP connections until one of the workers frees up. This way every connection that is accepted is handled immediately without queuing (so latency stays minimal), and the excess 20% are rejected, allowing the load balancer to redispatch them to another host or to apply some other load-shedding strategy (e.g. redirecting to static/cached content).
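To make the idea concrete, here is a rough sketch in C of what the fail-fast behavior looks like at the application level (`MAX_WORKERS`, `busy_workers`, and `hand_off_to_worker` are invented for illustration):

```c
#include <stdatomic.h>
#include <sys/socket.h>
#include <unistd.h>

#define MAX_WORKERS 8            /* illustrative capacity limit */

static atomic_int busy_workers;  /* workers decrement this when they finish */

extern void hand_off_to_worker(int fd);  /* assumed helper, defined elsewhere */

/* Fail-fast accept loop: when every worker is busy, close the new
 * connection immediately instead of queueing it behind the others. */
void accept_loop(int listen_fd)
{
    for (;;) {
        int conn = accept(listen_fd, NULL, NULL);
        if (conn < 0)
            continue;
        if (atomic_load(&busy_workers) >= MAX_WORKERS) {
            close(conn);         /* shed load: reject rather than queue */
            continue;
        }
        atomic_fetch_add(&busy_workers, 1);
        hand_off_to_worker(conn);
    }
}
```

The catch, as I explain below, is that by the time the application can close() the socket, the TCP handshake has already completed, so the rejection happens too late for the load balancer to redispatch cleanly.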
I think this fail-fast approach is far superior to any kind of queueing. Small queues are good for absorbing short bursts of traffic, but with excessive queueing the system can fail spectacularly under heavy load. For example, with FIFO processing and no AQM (active queue management), the system can reach a state where every request it finishes has already timed out on the client side, so it makes no forward progress at all.
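To illustrate the failure mode, here is a minimal deadline-aware dequeue sketch in C (`struct request`, `queue_pop`, and `drop_request` are invented helpers): requests whose client-side deadline has already passed are dropped instead of processed, which is exactly the check a plain FIFO lacks.

```c
#include <stdint.h>
#include <time.h>

/* Hypothetical request record: deadline_ms is the point in time after
 * which the client will have given up waiting. */
struct request {
    uint64_t deadline_ms;
    /* ... payload ... */
};

extern struct request *queue_pop(void);       /* assumed queue helpers */
extern void drop_request(struct request *r);

static uint64_t now_ms(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000 + (uint64_t)ts.tv_nsec / 1000000;
}

/* Pop requests until we find one the client is still waiting for.
 * Without a check like this, a deep FIFO can spend all of its time
 * serving requests that have already timed out on the client side. */
struct request *next_live_request(void)
{
    struct request *req;
    while ((req = queue_pop()) != NULL) {
        if (req->deadline_ms > now_ms())
            return req;          /* still worth processing */
        drop_request(req);       /* expired: shed it */
    }
    return NULL;
}
```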
I was surprised that this strategy is not as easy to implement as it sounds. My approach was to set a small listen backlog on the web server, expecting connections that don't fit to be rejected. But due to changes in Linux kernel 2.2 this strategy falls apart (see http://veithen.github.io/2014/01/01/how-tcp-backlog-works-in-linux.html).
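For reference, my naive attempt looked roughly like this sketch (minimal error handling): a listening socket with a deliberately tiny backlog, hoping the kernel would refuse anything beyond it.

```c
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Create a listener with a deliberately tiny backlog. The hope was
 * that once one connection is pending, further connection attempts
 * get refused; on Linux >= 2.2 the backlog argument only caps the
 * accept queue, so this does not behave as I expected. */
int make_listener(uint16_t port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, 1) < 0) {     /* backlog of 1: the failed rejection hope */
        close(fd);
        return -1;
    }
    return fd;
}
```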
Newer Linux kernels complete the handshake for you unconditionally: the SYN-ACK is sent to the client without considering the listen backlog size at all. Enabling the tcp_abort_on_overflow option does not help much either. It makes the kernel send an RST when a connection does not fit into the accept queue, but by that point the client already considers the connection ESTABLISHED and may have started sending data.
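From the client's perspective the sequence looks roughly like this sketch (192.0.2.10:8080 is a placeholder address, error handling compressed):

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <sys/socket.h>
#include <unistd.h>

/* Client-side view of the problem. Against a server whose accept queue
 * is full and which has tcp_abort_on_overflow enabled, connect() can
 * still succeed, and the first write() may land in the local socket
 * buffer, so the RST only surfaces later as ECONNRESET. */
int main(void)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    struct sockaddr_in addr = { .sin_family = AF_INET,
                                .sin_port = htons(8080) };
    inet_pton(AF_INET, "192.0.2.10", &addr.sin_addr);

    if (connect(fd, (struct sockaddr *)&addr, sizeof(addr)) == 0)
        puts("connected: handshake completed, client sees ESTABLISHED");

    /* Typically succeeds even if the server will reset us: the kernel
     * just buffers the bytes locally. */
    if (write(fd, "GET / HTTP/1.0\r\n\r\n", 18) > 0)
        puts("request 'sent' as far as the application can tell");

    char buf[512];
    if (read(fd, buf, sizeof(buf)) < 0 && errno == ECONNRESET)
        puts("RST arrived only now, after the request was already sent");

    close(fd);
    return 0;
}
```

By the time the RST is visible, the client has no safe way to know whether the request reached the application.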
This is especially problematic with HAProxy: if a connection was successfully established, it will not redispatch the request to another server, since the request may already have had side effects on the first one.
So I guess my questions are:
- am I the weird one for trying to implement something like this?
- are there any other strategies for dealing with sustained high load you can recommend?
- is the Linux kernel's tcp_abort_on_overflow behavior broken, and should it have applied to the half-open (SYN) queue instead?
Thanks in advance!