HAProxy graceful reload with zero packet loss

Question

I'm running an HAProxy load balancing server to balance load to multiple Apache servers. I need to reload HAProxy at any given time in order to change the load balancing algorithm.

This all works fine, except for the fact that I have to reload the server without losing a single packet (at the moment a reload is giving me 99.76% success on average, with 1000 requests per second for 5 seconds). I have done many hours of research about this, and have found the following command for "gracefully reloading" the HAProxy server:

haproxy -D -f /etc/haproxy/haproxy.cfg -p /var/run/haproxy.pid -sf $(cat /var/run/haproxy.pid)

However, this has little or no effect compared with the plain old `service haproxy reload`; it still drops 0.24% of requests on average.
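
To put that figure in concrete terms: 1000 requests per second for 5 seconds is 5000 requests, so a 99.76% success rate means 12 failed requests per reload. A minimal sketch for turning raw counts from a load test into that percentage (the function name `success_rate` is made up for illustration):

```shell
# success_rate TOTAL FAILED
# Prints the percentage of requests that succeeded, to two decimals.
success_rate() {
    awk -v total="$1" -v failed="$2" \
        'BEGIN { printf "%.2f\n", (total - failed) * 100 / total }'
}

# 1000 req/s for 5 s = 5000 requests, 12 of which failed
success_rate 5000 12   # prints 99.76
```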

Is there any way of reloading the HAProxy config file without a single dropped packet from any user?

If you need that much reliability a better solution would be to run more than one instance of HAproxy where you can take one out of service to reload, put it back in and repeat for the other(s). — , Mar 07 '14 at 21:19

score 39 · Accepted Answer · answered Mar 07 '14 at 21:48


According to https://github.com/aws/opsworks-cookbooks/pull/40 and consequently http://www.mail-archive.com/haproxy@formilux.org/msg06885.html you can:

iptables -I INPUT -p tcp --dport $PORT --syn -j DROP
sleep 1
service haproxy restart
iptables -D INPUT -p tcp --dport $PORT --syn -j DROP

This has the effect of dropping the SYN before a restart, so that clients will resend this SYN until it reaches the new process.
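
One weakness of running the four commands by hand is that a failed restart can leave the DROP rule in place. A minimal sketch of a wrapper that always removes the rule is shown here; the function names, the `RUN` dry-run switch and the default port 80 are assumptions for illustration, not part of the linked posts:

```shell
# Dry run by default: commands are only printed. Set RUN= (empty) to execute them as root.
RUN=${RUN-echo}
PORT=${PORT-80}   # assumed port; use the one haproxy actually listens on

block_syn()   { $RUN iptables -I INPUT -p tcp --dport "$PORT" --syn -j DROP; }
unblock_syn() { $RUN iptables -D INPUT -p tcp --dport "$PORT" --syn -j DROP; }

restart_with_syn_drop() {
    block_syn || return 1
    $RUN sleep 1
    $RUN service haproxy restart
    status=$?
    unblock_syn        # remove the rule even if the restart failed
    return $status
}

restart_with_syn_drop
```

While the rule is active, clients only see a delayed handshake, because their TCP stacks retransmit the dropped SYN.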


Mxx


http://serverfault.com/questions/627988/will-this-haproxy-restart-script-work-as-gracefully-i-think-it-will – Kladskull Sep 12 '14 at 00:23
both of these commands gave me this: `iptables v1.4.14: invalid port/service `--syn' specified` – Dmitri DB Oct 28 '14 at 21:26
@DmitriDB you're supposed to replace `$PORT` with the actual port `haproxy` is listening on. If haproxy is listening on multiple ports, replace `--dport $PORT` with `--dports $PORTS_SEPARATED_BY_COMMAS`, e.g., `--dports 80,443`. – pepoluan Dec 18 '14 at 11:17
iptables 1.4.7 (CentOS 6.7) - you also have to specify -m multiport if you want to use --dports. So it's "iptables -I INPUT -p tcp -m multiport --dports 80,443 --syn -j DROP" and likewise for the -D – carpii Sep 16 '15 at 21:47
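
The comments above boil down to two points: an empty `$PORT` makes iptables read `--syn` as the port, and a comma-separated list needs the multiport match. A small sketch of a helper that picks the right match arguments (the name `port_match` is hypothetical):

```shell
# port_match PORTS
#   "80"      -> --dport 80
#   "80,443"  -> -m multiport --dports 80,443
#   ""        -> error, so iptables never sees "--dport --syn"
port_match() {
    case $1 in
        '')  echo "port_match: no port given" >&2; return 1 ;;
        *,*) printf '%s\n' "-m multiport --dports $1" ;;
        *)   printf '%s\n' "--dport $1" ;;
    esac
}

PORTS=${PORTS-80,443}
# $(port_match ...) is left unquoted on purpose so it splits into separate arguments.
# echo only prints the resulting command; drop it to run the command as root.
echo iptables -I INPUT -p tcp $(port_match "$PORTS") --syn -j DROP
```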

Steve Jansen · Answer 2 · 2015-06-01T20:00:00.140

Yelp shared a more sophisticated approach based on meticulous testing. The blog article is a deep dive, and well worth the time investment to fully appreciate it.

True Zero Downtime HAProxy Reloads

tl;dr use Linux tc (traffic control) and iptables to temporarily queue SYN packets while HAProxy is reloading and has two pids attached to the same port (SO_REUSEPORT).

I'm not comfortable re-publishing the entire article on ServerFault; nevertheless, here are a few excerpts to pique your interest:

By delaying SYN packets coming into our HAProxy load balancers that run on each machine, we are able to minimally impact traffic during HAProxy reloads, which allows us to add, remove, and change service backends within our SOA without fear of significantly impacting user traffic.

# plug_manipulation.sh
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --buffer
service haproxy reload
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

# setup_iptables.sh
iptables -t mangle -I OUTPUT -p tcp -s 169.254.255.254 --syn -j MARK --set-mark 1

# setup_qdisc.sh
## Set up the queuing discipline
tc qdisc add dev lo root handle 1: prio bands 4
tc qdisc add dev lo parent 1:1 handle 10: pfifo limit 1000
tc qdisc add dev lo parent 1:2 handle 20: pfifo limit 1000
tc qdisc add dev lo parent 1:3 handle 30: pfifo limit 1000

## Create a plug qdisc with 1 meg of buffer
nl-qdisc-add --dev=lo --parent=1:4 --id=40: plug --limit 1048576
## Release the plug
nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite

## Set up the filter, any packet marked with “1” will be
## directed to the plug
tc filter add dev lo protocol ip parent 1:0 prio 1 handle 1 fw classid 1:4
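
The excerpts above only cover setup. A teardown sketch that reverses them, assuming the same device (`lo`), handles and source address as in the excerpts; it is not taken from the Yelp article, and the `RUN` dry-run switch and function name are made up for illustration:

```shell
# Dry run by default: commands are only printed. Set RUN= (empty) to execute them as root.
RUN=${RUN-echo}

teardown_plug() {
    # Release the plug first so nothing stays buffered when the qdiscs go away.
    $RUN nl-qdisc-add --dev=lo --parent=1:4 --id=40: --update plug --release-indefinite
    # Deleting the root qdisc also removes its child qdiscs and the fw filter.
    $RUN tc qdisc del dev lo root
    # Mirror of the -I rule in setup_iptables.sh.
    $RUN iptables -t mangle -D OUTPUT -p tcp -s 169.254.255.254 --syn -j MARK --set-mark 1
}

teardown_plug
```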

Gist: https://gist.github.com/jolynch/97e3505a1e92e35de2c0

Cheers to Yelp for sharing such amazing insights.

Excellent link! But perhaps you'd like to summarize it in here in case the link expires. That's the only reason for no upvote. — hookenz, May 20 '15 at 22:54

score 10 · Answer 3 · answered Dec 31 '15 at 00:22

There is another, much simpler way to reload haproxy with true zero downtime: it is called iptables flipping (the article is actually Unbounce's response to the Yelp solution). It is cleaner than the accepted answer, as there is no need to drop any packets, which may cause problems with long reloads.

Briefly, the solution consists of the following steps:

1) Run a pair of haproxy instances: an active one that receives traffic and a standby one that receives none.
2) Reconfigure (reload) the standby instance at any time.
3) Once the standby is ready with the new config, divert all NEW connections to it; it becomes the new active. Unbounce provides a bash script that does the flip with a few simple iptables commands.
4) For a moment you have two active instances. Wait until the open connections to the old active instance cease. How long that takes depends on your service behaviour and keep-alive settings.
5) Traffic to the old active instance stops and it becomes the new standby: you are back at step 1.
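
As a rough illustration of the flip itself, here is a minimal sketch that assumes both instances run on the same host on ports 8080 and 8081 and that public traffic arrives on port 80. This is not the Unbounce script; the ports, the `RUN` dry-run switch and the function name are assumptions:

```shell
# Dry run by default: commands are only printed. Set RUN= (empty) to execute them as root.
RUN=${RUN-echo}
PUBLIC_PORT=${PUBLIC_PORT-80}

# flip_to NEW_PORT OLD_PORT
flip_to() {
    # Insert the rule for the new active instance first so there is never a gap,
    # then delete the rule that pointed at the old one.
    $RUN iptables -t nat -I PREROUTING -p tcp --dport "$PUBLIC_PORT" -j REDIRECT --to-ports "$1" &&
    $RUN iptables -t nat -D PREROUTING -p tcp --dport "$PUBLIC_PORT" -j REDIRECT --to-ports "$2"
}

flip_to 8081 8080
```

The nat table is only consulted for the first packet of a connection, so connections already established to the old instance keep flowing to it until they close. PREROUTING only sees traffic arriving from other hosts, not locally generated connections.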

Moreover, the solution can be adapted to any kind of service (nginx, apache etc.) and is more fault tolerant, as you can test the standby configuration before it goes online.

Jason Stubbs · Answer 4 · 2019-02-20T20:36:32.407

Edit: My answer makes the assumption that the kernel only sends traffic to the most recent port to be opened with SO_REUSEPORT, whereas it actually sends traffic to all processes as described in one of the comments. In other words, the iptables dance is still required. :(

If you're on a kernel that supports SO_REUSEPORT, then this problem shouldn't happen.

The process that haproxy takes when it restarts is:

1) Try setting SO_REUSEPORT when opening the port (https://github.com/haproxy/haproxy/blob/3cd0ae963e958d5d5fb838e120f1b0e9361a92f8/src/proto_tcp.c#L792-L798)

2) Try opening the port (will succeed with SO_REUSEPORT)

3) If it didn't succeed, signal the old process to close its port, wait 10ms and try it all again. (https://github.com/haproxy/haproxy/blob/3cd0ae963e958d5d5fb838e120f1b0e9361a92f8/src/haproxy.c#L1554-L1577)

SO_REUSEPORT was first supported in the Linux 3.9 kernel, but some distros have backported it. For example, EL6 kernels from 2.6.32-417.el6 onwards support it.
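
A quick way to check the mainline threshold is to compare the running kernel release against 3.9. The sketch below is only a heuristic: as noted, distro kernels with backports (such as the EL6 one) report an older version while still supporting the option. The function name is made up for illustration:

```shell
# kernel_at_least MAJOR MINOR [RELEASE]
# Succeeds if RELEASE (default: output of uname -r) is at least MAJOR.MINOR.
# The body runs in a subshell so its variables do not leak into the caller.
kernel_at_least() (
    release=${3-$(uname -r)}
    major=${release%%.*}          # "4.15.0-20-generic" -> "4"
    rest=${release#*.}            # -> "15.0-20-generic"
    minor=${rest%%[!0-9]*}        # -> "15"
    [ "$major" -gt "$1" ] || { [ "$major" -eq "$1" ] && [ "$minor" -ge "$2" ]; }
)

if kernel_at_least 3 9; then
    echo "mainline SO_REUSEPORT threshold met"
else
    echo "older than 3.9: only works if the distro backported SO_REUSEPORT"
fi
```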

It will still happen with `SO_REUSEPORT` in some scenarios, especially under heavy traffic: when a SYN is sent to the old haproxy process at the same moment it closes its listening socket, the result is an RST. See the Yelp article mentioned in the other answer above. — gertas, Dec 30 '15 at 23:43
That sucks... Just to summarise the issue, Linux distributes new connections between all processes listening on a particular port when SO_REUSEPORT is used so there is a short time where the old process will still get connections put into its queue. — Jason Stubbs, Jan 04 '16 at 11:19

score 2 · Answer 5 · answered May 03 '17 at 17:05

I'll explain my setup and how I solved the graceful reloads:

I have a typical setup with 2 nodes running HAProxy and keepalived. Keepalived tracks interface dummy0, so I can do an "ifconfig dummy0 down" to force a switchover.

The real problem is that, for reasons I don't understand, a "haproxy reload" still drops all the ESTABLISHED connections :( I tried the "iptables flipping" proposed by gertas, but I ran into issues because it performs NAT on the destination IP address, which is not suitable in some scenarios.

Instead, I decided to use a CONNMARK dirty hack to mark packets belonging to NEW connections, and then redirect those marked packets to the other node.

Here's the iptables ruleset:

iptables -t mangle -A PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
iptables -t mangle -A PREROUTING -i eth1 -m mark --mark 1 -j DROP

The first two rules mark the packets belonging to new flows (123.123.123.123 is the keepalived VIP that the haproxy frontends bind to).

The third and fourth rules mark FIN/RST packets. (I don't know why, but the TEE target "ignores" FIN/RST packets.)

Fifth rule sends a duplicate of all marked packets to the other HAproxy (192.168.0.2).

Sixth rule drops packets belonging to new flows to prevent reaching their original destination.

Remember to disable rp_filter on the interfaces, or the kernel will drop those martian packets.

And last but not least, mind the returning packets! In my case there is asymmetric routing (requests go client -> haproxy1 -> haproxy2 -> webserver, and replies go webserver -> haproxy1 -> client), but it has no ill effect. It works fine.

I know the most elegant solution would be to use iproute2 to do the divert, but it only worked for the first SYN packet. When it received the ACK (3rd packet of the 3-way handshake), it didn't mark it :( I couldn't spend much time investigating; as soon as I saw it work with the TEE target, I left it there. Of course, feel free to try it with iproute2.

Basically, the "graceful reload" works like this:

1) I enable the iptables ruleset and immediately see the new connections going to the other HAProxy.
2) I keep an eye on "netstat -an | grep ESTABLISHED | wc -l" to supervise the "draining" process.
3) Once there are just a few (or zero) connections, I run "ifconfig dummy0 down" to force keepalived to fail over, so all traffic goes to the other HAProxy.
4) I remove the iptables ruleset.
5) (Only for a "non-preempting" keepalived config) I run "ifconfig dummy0 up".
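
The manual "keep an eye on netstat" step can be scripted. Here is a minimal sketch of a polling loop; the function names, the `LIST_CMD` override and the default 5-second interval are assumptions for illustration. Like the one-liner above, it counts every ESTABLISHED connection on the host, including SSH sessions:

```shell
# Command that lists connections; overridable so the loop can be tried without netstat.
LIST_CMD=${LIST_CMD-"netstat -an"}

# Counts ESTABLISHED lines in netstat-style output read from stdin.
count_established() {
    # grep -c prints 0 (and exits non-zero) when nothing matches, which is fine here.
    grep -c ESTABLISHED
}

# wait_for_drain [THRESHOLD] [INTERVAL_SECONDS]
# Returns once the number of established connections is at or below THRESHOLD.
wait_for_drain() {
    threshold=${1-0}
    interval=${2-5}
    while n=$($LIST_CMD | count_established); [ "$n" -gt "$threshold" ]; do
        echo "$n connections still established, waiting..."
        sleep "$interval"
    done
    echo "drained: $n established connections left"
}

# Example: wait until at most 3 connections remain, polling every 5 seconds:
#   wait_for_drain 3 5 && ifconfig dummy0 down
```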

The iptables ruleset can be easily integrated into a start/stop script:

#!/bin/sh
# Enable or disable the redirection of NEW connections to the other HAProxy node.

case "$1" in
start)
        echo "Redirection for new sessions is enabled"

#       echo 0 > /proc/sys/net/ipv4/tcp_fwmark_accept
        # Disable reverse-path filtering so the kernel does not drop the diverted packets
        for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done
        # Same rules as the ruleset above: -d matches the keepalived VIP
        iptables -t mangle -A PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
        iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
        iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
        iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
        iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
        iptables -t mangle -A PREROUTING -i eth1 -m mark --mark 1 -j DROP
        ;;
stop)
        # Delete the rules in reverse order
        iptables -t mangle -D PREROUTING -i eth1 -m mark --mark 1 -j DROP
        iptables -t mangle -D PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
        iptables -t mangle -D PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
        iptables -t mangle -D PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
        iptables -t mangle -D PREROUTING -j CONNMARK --restore-mark
        iptables -t mangle -D PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1

        echo "Redirection for new sessions is disabled"
        ;;
*)
        echo "Usage: $0 {start|stop}" >&2
        exit 1
        ;;
esac
