How to set up failover with a RabbitMQ cluster behind a load balancer like F5

Question

I have been searching F5 DevCentral and the internet for information on the following setup, with no luck.

We want to have a RabbitMQ cluster with 3 nodes. One node will always be the primary/master node for ALL queues.

1. Send all connections / traffic for all queues to the current primary node (A).
2. When node A is unresponsive (due to application-layer or network-layer issues), the load balancer should automatically fail over all traffic to node B.
3. If node B fails, move to node C.

The question: how do we decide that a node is unresponsive and that failover to another node should happen? Is there a way to make a call to RabbitMQ over the AMQP protocol through the load balancer for this purpose? I can't find this well documented.

Even if you don't know how to implement this with F5, feel free to answer from the perspective of a different load balancer or of application code.

Addition to the question: I would think that whatever this health check turns out to be, it has to be conclusive enough that, by the time the LB switches over, the RabbitMQ cluster has already failed the master over to node B and the failure was not a false alarm.

Thank you for taking time to read and answer.

Does RabbitMQ not have built-in high availability? F5 'is alive' checks are pretty much just HTTP and ICMP checks. It is unlikely that any product-specific protocols are supported. – spuder, Jun 23 '16 at 02:58
@GregL I am trying to use a health check that would also ensure that the RabbitMQ software on the master node is operating at full capability. I think our IT team has set up a partial port check or something, but I don't think that ensures the application layer is healthy. – cdpnet, Jun 23 '16 at 18:17
RabbitMQ may have high availability in terms of clustering, mirroring, etc. But RabbitMQ leaves the responsibility for redirecting clients from one master node to another in the hands of a load balancer or the application. – cdpnet, Jun 23 '16 at 18:19

score 4 · Answer 1 · answered Jun 23 '16 at 19:32

I've done L4 load balancing for clustered RabbitMQ with Stingray load balancers. It works well, and we have run round-robin (RR) without any particular issue.

In the event that one Rabbit node goes down, TCP connections fail and the load balancer sends traffic to the other node.
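If a plain TCP connect is not a strong enough signal, a probe can go one step further and check that the broker actually speaks AMQP. Below is a minimal sketch, assuming AMQP 0-9-1 on the default port 5672 and a load balancer that can run an external script as its monitor. The function name `amqp_alive` is made up for illustration. It sends the protocol header and checks that the first frame back is Connection.Start (class 10, method 10). It does not log in or touch any queue, so it proves the listener is answering, not that every queue is healthy.

```python
import socket
import struct

AMQP_HEADER = b"AMQP\x00\x00\x09\x01"  # AMQP 0-9-1 protocol header


def _read_exact(sock, n):
    """Read exactly n bytes or raise if the peer closes early."""
    buf = b""
    while len(buf) < n:
        chunk = sock.recv(n - len(buf))
        if not chunk:
            raise ConnectionError("peer closed the connection")
        buf += chunk
    return buf


def amqp_alive(host, port=5672, timeout=3.0):
    """Return True if the peer answers the AMQP header with Connection.Start."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(AMQP_HEADER)
            # Frame header: type (1 byte), channel (2 bytes), payload size (4 bytes)
            frame_type, channel, size = struct.unpack(">BHI", _read_exact(sock, 7))
            if frame_type != 1 or channel != 0 or size < 4:
                return False  # not a method frame on channel 0
            class_id, method_id = struct.unpack(">HH", _read_exact(sock, 4))
            return (class_id, method_id) == (10, 10)  # Connection.Start
    except (OSError, struct.error):
        # Refused, timed out, reset, or closed early: treat as down
        return False
```

A monitor wrapper would call `amqp_alive("node-a")` and exit 0 or 1 accordingly. Because the probe hangs up before finishing the handshake, expect the broker to log each probe as an aborted connection.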

Now this is technically inefficient, as any record sent to node A will also be sent to node B, and vice versa, internally by Rabbit over Erlang's inter-node distribution (the nodes locate each other through epmd).

One very important note is that you must set the load balancer to hold TCP connections open indefinitely. This is a common issue, as RabbitMQ uses long-running TCP connections, but most load balancers are tuned for HTTP-style connection parameters. Some software (nginx) has very aggressive TCP cleanup windows and will shut these TCP connections down, causing a connection failure even though all machines are happy.
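If the idle timeout on the load balancer cannot be raised, one client-side mitigation is to stop the connection from ever looking idle. RabbitMQ client libraries can send AMQP heartbeats for this. At the socket level, TCP keepalive is a similar idea. The sketch below shows only the socket options; the helper name `enable_keepalive` and the timing values are assumptions for illustration, and whether a given load balancer counts keepalive probes as activity is something to verify on your own device.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, probes=5):
    """Turn on TCP keepalive and, where the platform exposes them, tune the timers."""
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # These constants are platform-dependent, so only set the ones that exist.
    tuning = (
        ("TCP_KEEPIDLE", idle),       # seconds of silence before the first probe
        ("TCP_KEEPINTVL", interval),  # seconds between probes
        ("TCP_KEEPCNT", probes),      # failed probes before the connection is dropped
    )
    for name, value in tuning:
        opt = getattr(socket, name, None)
        if opt is not None:
            sock.setsockopt(socket.IPPROTO_TCP, opt, value)
    return sock
```

The idle value should be comfortably shorter than the load balancer's idle timeout, otherwise the first probe arrives after the connection has already been cut.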

Jacob Evans · Answer 2 · 2016-06-28T03:04:47.867

0

I would skip the load balancer if you are only following the master. Use keepalived with a status check that tests whether the local node is the master; if it is, that node holds the VIP.
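A rough sketch of what such a status check could look like, assuming the RabbitMQ management plugin is enabled on each node and that the `node` field returned by `/api/queues` names the node hosting each queue's master. The URL, the `guest` credentials, and the function names are placeholders, not something from your setup. keepalived would run a small wrapper around `main()` as its check script and treat exit code 0 as "this node should hold the VIP".

```python
import base64
import json
import urllib.request


def is_sole_master(queues, node_name):
    """True only if every queue in the listing has its master on node_name."""
    masters = {q["node"] for q in queues if "node" in q}
    # An empty listing returns False: with no queues, no node claims the VIP.
    return masters == {node_name}


def fetch_queues(base_url, user, password, timeout=3.0):
    """Fetch the queue listing from the management HTTP API (assumed enabled)."""
    req = urllib.request.Request(base_url.rstrip("/") + "/api/queues")
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req.add_header("Authorization", "Basic " + token)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.load(resp)


def main(node_name, base_url="http://localhost:15672", user="guest", password="guest"):
    """Return 0 if this node is master for all queues, 1 otherwise."""
    try:
        ok = is_sole_master(fetch_queues(base_url, user, password), node_name)
    except (OSError, ValueError):
        ok = False  # API unreachable or bad response: do not claim the VIP
    return 0 if ok else 1
```

For example, a wrapper could end with `sys.exit(main("rabbit@node-a"))`, where `rabbit@node-a` stands for the local node's name. Querying the local management API also covers part of the application-layer concern, since the check fails when the broker on that node stops answering.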

edited Jun 28 '16 at 03:04

answered Jun 23 '16 at 04:27

Jacob Evans


I am a developer, so I will try to learn a bit about it. It sounds like keepalived will go on all the Rabbit nodes. I just have to find out how the application will know to switch to a different node when the master node goes down. The application will not be able to know multiple IPs and try them all on failure. – cdpnet Jun 23 '16 at 18:32
