I have the following setup on a Linux stack, with the front-ends running an nginx proxy plus static assets and the back-ends running Ruby on Rails and MySQL in master-master replication:
- Primary site: front-end.a, back-end.a
- Secondary site: front-end.b, back-end.b
- A router sitting on a shared network that can route to both primary and secondary sites
The primary site serves requests most of the time. The secondary site is redundant. back-end.b is in master-master replication with back-end.a but is read-only.
When the primary site goes down, requests need to be redirected to the secondary site. The secondary site will serve a 503 Service Unavailable page until manual intervention confirms that the primary site won't come back, at which point an operator hits the big switch that makes the secondary site live and read-write.
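For reference, the 503 behaviour on the standby front-end is just a maintenance toggle in nginx; roughly something like the sketch below. The flag-file path, upstream name, and document root are illustrative assumptions, not our actual config:

```
# sketch: front-end.b serves 503 until an operator removes the flag file
upstream backend_rails {
    server back-end.b:3000;            # assumed Rails app server/port
}

server {
    listen 80;

    # /etc/nginx/maintenance.flag is a hypothetical marker file
    if (-f /etc/nginx/maintenance.flag) {
        return 503;
    }

    error_page 503 @maintenance;
    location @maintenance {
        root /var/www/maintenance;     # assumed path to the static 503 page
        rewrite ^ /503.html break;
    }

    location / {
        proxy_pass http://backend_rails;
    }
}
```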
The primary site can then be brought back in a controlled fashion, with back-end.a becoming a read-only replication slave of back-end.b. When everything on the primary site is ready again, front-end.b will start serving 503 Service Unavailable, back-end.b will switch to read-only, requests will be redirected to the primary site, and finally the primary site will become read-write again.
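To make that fail-back order concrete, here is a rough sketch of the manual steps as I understand them. The hostnames are the ones above; the mysql invocations and the flag file are illustrative assumptions, not our actual scripts:

```
# 1. rejoin back-end.a as a read-only replica of back-end.b and let it catch up
mysql -h back-end.a -e "SET GLOBAL read_only = ON; START SLAVE;"
mysql -h back-end.a -e "SHOW SLAVE STATUS\G"   # wait for Seconds_Behind_Master: 0

# 2. stop taking writes on the secondary site
#    (front-end.b starts serving 503, back-end.b goes read-only)
ssh front-end.b touch /etc/nginx/maintenance.flag   # hypothetical toggle
mysql -h back-end.b -e "SET GLOBAL read_only = ON;"

# 3. redirect traffic back to the primary site (move the virtual IP there),
#    then open up writes on back-end.a
mysql -h back-end.a -e "SET GLOBAL read_only = OFF;"
```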
The priorities:
- The site must not become completely dead and unreachable
- Switchover to a live working site must be fairly fast
- Preventing data loss / inconsistency is more important than absolute reliability
Now, the current approach being used is Linux-HA / Heartbeat / Pacemaker, using a virtual IP shared between front-end.a and front-end.b, with a location preference set to front-end.a.
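For context, the Pacemaker side is essentially a single IPaddr2 resource plus a location constraint, roughly like this (the IP, netmask, constraint name and score are placeholders):

```
# crm configure sketch (placeholder IP, netmask and score)
primitive vip ocf:heartbeat:IPaddr2 \
    params ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=10s
location prefer-primary vip 100: front-end.a
```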
This works excellently for failing over the IP if the primary site disappears. However, the level of manual control thereafter is rather lacking.
For example, after the primary site has failed and the secondary site needs to be brought up, we need to ensure the primary site doesn't try to steal back the IP address when it comes back up. However, Linux-HA doesn't seem to support this very well. crm resource move is the documented command for moving a resource (it works by adding an infinite-weight location rule), but if the resource has already failed over, the command fails, saying the resource has already been moved. Adding an explicit higher-weight location preference doesn't seem to work reliably either. So far the most reliable approach has been to remove the existing location rule and replace it with a new rule preferring the secondary site, as sketched below. This feels like fighting the tool and trying to make it do something it wasn't designed to do.
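Concretely, the workaround that has proved most reliable so far looks roughly like this (constraint names follow the sketch above and are otherwise arbitrary):

```
# after failover, stop front-end.a from taking the IP back
crm configure delete prefer-primary
crm configure location prefer-secondary vip 100: front-end.b
```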
And there are other oddities with Linux-HA. Frequently the cluster gets stuck in a split-brain state while booting up: both nodes are sending out heartbeat packets (verified with packet sniffing) and both nodes can ping one another, but crm_mon on each reports the other node as offline. The heartbeat service needs to be restarted on one node or the other to get things working, and sometimes it needs a SIGKILL rather than a SIGTERM to bring it down. Also, crm_mon shows that the CIB (the cluster database) is replicated almost instantaneously when the configuration is altered on either front-end.a or front-end.b, but Pacemaker takes its time actually moving the IP resource: it can take several minutes to move across, potentially putting our SLAs at risk.
So I'm starting to look at other options that are more focused on virtual IPs and IP failover rather than general clustered resources. The two other options I see are ucarp and keepalived.
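In case it helps frame the comparison, the keepalived equivalent would be a single VRRP instance on each front-end, something like the sketch below (interface, virtual_router_id, priorities and IP are placeholders). The nopreempt option is the part that interests me, since it is meant to stop a recovered node from grabbing the address back, though as far as I know it only applies when both instances start in state BACKUP:

```
# /etc/keepalived/keepalived.conf on front-end.a (sketch, placeholder values)
vrrp_instance VI_FRONTEND {
    state BACKUP          # both nodes start as BACKUP so nopreempt applies
    nopreempt             # don't steal the VIP back after recovery
    interface eth0
    virtual_router_id 51
    priority 150          # front-end.b would use a lower priority, e.g. 100
    advert_int 1
    virtual_ipaddress {
        192.0.2.10/24
    }
}
```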
However, given the amount of time I've already spent setting up Heartbeat and friends and trying to make it work, I'd like feedback on the best approach for this setup.