I'll explain my setup and how I solved the graceful reloads:
I have a typical setup with 2 nodes running HAProxy and keepalived. Keepalived tracks the interface dummy0, so I can run "ifconfig dummy0 down" to force a switchover.
The real problem is that, for reasons I could not determine, a "haproxy reload" still drops all the ESTABLISHED connections :( I tried the "iptables flipping" proposed by gertas, but it performs a NAT on the destination IP address, which is not suitable in some scenarios.
Instead, I decided to use a dirty CONNMARK hack: mark the packets belonging to NEW connections, then redirect those marked packets to the other node.
Here's the iptables ruleset:
iptables -t mangle -A PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
iptables -t mangle -A PREROUTING -i eth1 -m mark --mark 1 -j DROP
The first two rules mark the packets belonging to new flows (123.123.123.123 is the keepalived VIP that the HAProxy frontends bind to). The first sets connection mark 1 on NEW connections to the VIP; the second copies each connection's mark onto its packets.
The third and fourth rules mark FIN/RST packets with mark 2. (I don't know why, but the TEE target "ignores" FIN/RST packets.)
The fifth rule sends a duplicate of every marked packet to the other HAProxy (192.168.0.2).
The sixth rule drops the packets belonging to new flows (mark 1) so that they do not reach their original destination.
Remember to disable rp_filter on the interfaces, or the kernel will drop the diverted packets as martians.
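A quick way to confirm that nothing still has reverse-path filtering turned on is sketched below. The function name `rp_filter_enabled` and its optional directory argument are my own additions for illustration (the argument exists only so the check can be pointed at a test directory). Note that the kernel uses the higher of the conf/all value and the per-interface value, so every entry needs to be 0.

```shell
#!/bin/sh
# List the interfaces whose rp_filter is still non-zero.
# $1 is an optional directory (hypothetical parameter, used for testing);
# on a real host it defaults to /proc/sys/net/ipv4/conf.
rp_filter_enabled() {
    conf_dir=${1:-/proc/sys/net/ipv4/conf}
    for f in "$conf_dir"/*/rp_filter; do
        [ -r "$f" ] || continue
        value=$(cat "$f")
        if [ "$value" != "0" ]; then
            # Print "<interface> <value>"
            printf '%s %s\n' "$(basename "$(dirname "$f")")" "$value"
        fi
    done
}
```

Calling `rp_filter_enabled` with no argument should print nothing once the redirection has been set up correctly.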
And last but not least, mind the returning packets! In my case the routing is asymmetric (requests go client -> haproxy1 -> haproxy2 -> webserver, and replies go webserver -> haproxy1 -> client), but this causes no problems. It works fine.
I know the most elegant solution would be to use iproute2 to do the divert, but it only worked for the first SYN packet. When the ACK arrived (the 3rd packet of the 3-way handshake), it was not marked :( I couldn't spend much time investigating: as soon as I saw that it worked with the TEE target, I left it there. Of course, feel free to try it with iproute2.
Basically, the "graceful reload" works like this:
- I enable the iptables ruleset and immediately see the new connections going to the other HAProxy.
- I keep an eye on "netstat -an | grep ESTABLISHED | wc -l" to supervise the "draining" process.
- Once there are just a few (or zero) connections left, I run "ifconfig dummy0 down" to force keepalived to fail over, so all traffic goes to the other HAProxy.
- I remove the iptables ruleset.
- (Only for a "non-preempting" keepalived config) I run "ifconfig dummy0 up".
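The "keep an eye on netstat" step can be scripted roughly as follows. This is only a sketch: the function names (`count_established`, `snapshot`, `wait_for_drain`) are made up for illustration, and unlike the one-liner above it counts only connections whose local address is the VIP, which assumes the usual Linux `netstat -ant` column layout (local address in column 4, state in column 6).

```shell
#!/bin/sh
# Count ESTABLISHED TCP connections whose local address is the VIP ($1).
# Reads `netstat -ant`-style lines on stdin.
count_established() {
    awk -v vip="$1" '$6 == "ESTABLISHED" && index($4, vip ":") == 1 { n++ } END { print n + 0 }'
}

# Source of the connection table; kept as a function so it can be stubbed out.
snapshot() {
    netstat -ant
}

# Block until at most $2 connections to VIP $1 remain, polling every $3 seconds.
wait_for_drain() {
    drain_vip=$1
    drain_max=${2:-0}
    drain_poll=${3:-5}
    while :; do
        drain_left=$(snapshot | count_established "$drain_vip")
        echo "established connections on $drain_vip: $drain_left"
        [ "$drain_left" -le "$drain_max" ] && return 0
        sleep "$drain_poll"
    done
}
```

For example, `wait_for_drain 123.123.123.123 0 5` would return once no client connections to the VIP remain, after which "ifconfig dummy0 down" can be run and the ruleset removed.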
The iptables ruleset can be easily integrated into a start/stop script:
#!/bin/sh
# Start/stop the redirection of NEW connections to the other HAProxy.
case "$1" in
start)
    echo "Redirection for new sessions is enabled"
    # echo 0 > /proc/sys/net/ipv4/tcp_fwmark_accept
    # Disable reverse-path filtering so the diverted packets are not dropped as martians
    for f in /proc/sys/net/ipv4/conf/*/rp_filter; do echo 0 > "$f"; done
    # Same six rules as above: mark new flows to the VIP, duplicate them to the peer, drop them locally
    iptables -t mangle -A PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
    iptables -t mangle -A PREROUTING -j CONNMARK --restore-mark
    iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
    iptables -t mangle -A PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
    iptables -t mangle -A PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
    iptables -t mangle -A PREROUTING -i eth1 -m mark --mark 1 -j DROP
    ;;
stop)
    # Delete the rules in reverse order
    iptables -t mangle -D PREROUTING -i eth1 -m mark --mark 1 -j DROP
    iptables -t mangle -D PREROUTING -i eth1 -m mark ! --mark 0 -j TEE --gateway 192.168.0.2
    iptables -t mangle -D PREROUTING -i eth1 -p tcp --tcp-flags RST RST -j MARK --set-mark 2
    iptables -t mangle -D PREROUTING -i eth1 -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
    iptables -t mangle -D PREROUTING -j CONNMARK --restore-mark
    iptables -t mangle -D PREROUTING -i eth1 -d 123.123.123.123/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
    echo "Redirection for new sessions is disabled"
    ;;
*)
    echo "Usage: $0 {start|stop}" >&2
    exit 1
    ;;
esac
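Since the start and stop branches repeat the same six rule specifications, one possible variation is to generate both from a single list so they cannot drift apart. The sketch below only prints the commands; the names `rule_specs` and `rule_commands` and the IFACE/VIP/PEER variables are hypothetical, with defaults taken from the example values in this post.

```shell
#!/bin/sh
# Example values from the post; override via the environment if needed.
IFACE=${IFACE:-eth1}
VIP=${VIP:-123.123.123.123}
PEER=${PEER:-192.168.0.2}

# Print the six rule specifications, one per line, in the order they are appended.
rule_specs() {
    cat <<EOF
-i $IFACE -d $VIP/32 -m conntrack --ctstate NEW -j CONNMARK --set-mark 1
-j CONNMARK --restore-mark
-i $IFACE -p tcp --tcp-flags FIN FIN -j MARK --set-mark 2
-i $IFACE -p tcp --tcp-flags RST RST -j MARK --set-mark 2
-i $IFACE -m mark ! --mark 0 -j TEE --gateway $PEER
-i $IFACE -m mark --mark 1 -j DROP
EOF
}

# Emit complete iptables commands for the given action:
# -A to append the rules, -D to delete them.
rule_commands() {
    rule_specs | while IFS= read -r spec; do
        printf 'iptables -t mangle %s PREROUTING %s\n' "$1" "$spec"
    done
}
```

Running `rule_commands -A` lets you review the commands first; piping the output to `sh` as root would apply them, and `rule_commands -D | sh` would remove them again.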