Apologies if this has been asked before, but I can't seem to find much on it.

We're going to be using HAProxy to load balance our MariaDB Galera Cluster. All the articles/tutorials I have seen on this use Keepalived (or something similar) for an active/passive HAProxy setup.

Is there any good reason why you shouldn't have an active/active setup?

Each HAProxy node would have a fixed IP and a floating IP of its own. Under normal conditions requests are shared between the two HAProxy nodes. If one goes down, the other takes over its floating IP and handles requests on both IPs. When the failed node comes back up, it takes back its floating IP and its share of the load.
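To make the intended behaviour concrete, here is a minimal Python sketch of which node should hold which floating IP. It is only a model of the takeover and failback logic described above, not Keepalived configuration; the node names `haproxy1`/`haproxy2` and the `192.0.2.x` addresses are made-up examples.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FloatingIP:
    address: str    # hypothetical example address
    preferred: str  # node that holds the IP under normal conditions
    fallback: str   # node that takes the IP over when the preferred node is down


def current_owners(vips, alive):
    """Map each floating IP to the node that should currently hold it."""
    owners = {}
    for vip in vips:
        if vip.preferred in alive:
            owners[vip.address] = vip.preferred  # normal case, or failback
        elif vip.fallback in alive:
            owners[vip.address] = vip.fallback   # takeover
        else:
            owners[vip.address] = None           # both nodes are down
    return owners


# Active/active: each node is preferred for one IP and fallback for the other.
VIPS = [
    FloatingIP("192.0.2.10", preferred="haproxy1", fallback="haproxy2"),
    FloatingIP("192.0.2.11", preferred="haproxy2", fallback="haproxy1"),
]
```

Calling `current_owners(VIPS, {"haproxy2"})` models the failure of `haproxy1`: both addresses map to `haproxy2` until `haproxy1` is back in the `alive` set.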

I'd appreciate your opinions on this.

Luke

Luke Cousins
  • I have found this article on the topic http://huinn.wordpress.com/2012/01/02/keepalived-2-active-servers/ but I don't understand why not many people seem to do it. – Luke Cousins Mar 18 '14 at 10:43
  • Just in case anyone is reading this now, we went with this solution over a year ago and have never had any issues with it. I'd recommend it. – Luke Cousins Jun 15 '15 at 10:01

2 Answers

The important considerations against an active/active setup with two virtual IP addresses for the same resource are:

  • How do you distribute requests over the two virtual IPs?
  • How do you deal with sticky sessions, affinity, persistence and such? That is, what happens when subsequent requests start off going to virtual IP 1 and then go to virtual IP 2, and do you need those requests to reach the same back-end server?
  • What happens when one of the virtual IP addresses fails over to the other host?
HBruijn
  • I appreciate your response. In my case I was intending to "randomly" pick one of the two virtual IPs to connect to the database with. If for some reason that fails, try the other (but it shouldn't fail for long if one does go down). In this case of a DB server, sticky sessions, etc. are not an issue, but it is a good point with regard to other areas where it may need a work-around, or be a show-stopper. Regarding your third point, will Keepalived not try to bring the IP back when its main node comes up again? – Luke Cousins Mar 18 '14 at 15:40
  • The [MySQL Query Cache](http://dev.mysql.com/doc/refman/5.7/en/query-cache.html) may be a good reason in some scenarios to maintain sticky sessions even when load balancing database queries. – HBruijn Mar 18 '14 at 16:05
  • That's an interesting point that, again, I hadn't thought of. In our case we have the MySQL Query Cache disabled due to it being a single point of contention and a couple of other reasons (it slows down all selects, even most non-cacheable ones, and all writes (invalidating caches)). Do you know of any other reasons why you would want MySQL stickiness? Thanks. – Luke Cousins Mar 18 '14 at 17:04
  • Any reason source-IP hash stickiness shouldn't be sufficient here? We use this successfully. Of course, if quorum changes, the stickiness will be disrupted once. – namezero Oct 28 '18 at 18:18
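The client-side approach mentioned in the comments above (pick one of the two virtual IPs at random, and try the other if that fails) could be sketched roughly as follows in Python. This is an illustration under assumptions, not the code actually used: the function name `connect_via_vips` is hypothetical, and the `connect` and `rng` parameters exist only so the behaviour can be exercised without a real network.

```python
import random
import socket


def connect_via_vips(vips, port, timeout=2.0,
                     connect=socket.create_connection, rng=random):
    """Try the virtual IPs in random order and return the first connection."""
    candidates = list(vips)
    rng.shuffle(candidates)  # spreads new connections across both IPs
    last_error = None
    for vip in candidates:
        try:
            return connect((vip, port), timeout)
        except OSError as exc:  # refused, unreachable, timed out, ...
            last_error = exc
    raise ConnectionError(f"no virtual IP reachable: {candidates}") from last_error
```

A real database client would hand the chosen address to its driver rather than open a raw socket, but the retry order is the same. Note that this only balances new connections; it does nothing for stickiness.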

Update for 2020: keepalived has been obsolete for a while because it doesn't work in virtual clouds (AWS).

A bit of history

Once upon a time, there was a (Cisco) internet router in the office. The router provided internet access to all the machines and it was good.

... then the router died and internet was broken for everyone and it sucked.

Turns out, it takes two of anything to have redundancy. So Cisco started offering pairs of routers that work in tandem.

This is done with a protocol called HSRP, VRRP or CARP. HSRP is the original Cisco-made protocol to solve the problem. It was later standardized as VRRP (first published as RFC 2338 in 1998, then revised as https://www.rfc-editor.org/rfc/rfc3768 in 2004), which got implemented by most network devices and vendors. The BSD folks invented their own protocol, CARP, to do the same thing; they couldn't adopt VRRP due to concerns around licensing or patents.

Keepalived and uCARP are software that implement VRRP and CARP respectively. They can be set up on two regular Linux servers to provide failover between them.

The rise of AWS and the end of VRRP

How does VRRP operate? For starters it needs a floating IP, let's say 192.168.1.254. Only one router has ownership of the IP at any point in time. Devices in the network simply send traffic to that (floating) IP and reach the active router; they don't know it's floating and don't care. Both routers talk constantly to one another, and if the active one dies, the other router takes over the IP and starts processing traffic.
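As a rough illustration of how the backup decides that the active router has died, here is a small Python sketch of the timer arithmetic from RFC 3768: a backup waits for the "master down interval" without hearing an advertisement before it takes over. The function names are mine, and this models only the timeout calculation, not the protocol itself.

```python
def skew_time(priority: int) -> float:
    """Per-router offset in seconds: higher priority means a shorter wait."""
    if not 1 <= priority <= 255:
        raise ValueError("VRRP priority must be between 1 and 255")
    return (256 - priority) / 256


def master_down_interval(priority: int, advert_interval: float = 1.0) -> float:
    """Seconds of silence after which a backup assumes the master is dead."""
    return 3 * advert_interval + skew_time(priority)


def first_to_take_over(backups: dict) -> str:
    """Given {name: priority}, return the backup whose timer expires first."""
    return min(backups, key=lambda name: master_down_interval(backups[name]))
```

With the default one-second advertisement interval, a backup with priority 100 waits about 3.6 seconds, and a backup with priority 200 would time out slightly earlier and win the takeover.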

One needs to be familiar with OSI network layers 2 and 3 at this point (MAC and IP). Network devices communicate using MAC and IP addresses, and IP addresses are resolved to MAC addresses with ARP.

The concept of a floating IP being taken over involves a number of shenanigans in the network stack (all the acronyms above); it's not exactly designed-in or expected behaviour.

On a physical network, with multiple computers plugged into one Ethernet switch, it usually works.

On a virtual machine, it usually doesn't work. The virtual network has to handle network traffic (MAC and IP layers), and it typically blocks the magic packets or isolates the virtual host, preventing VRRP from operating.

On the major virtual clouds (AWS, Google and co), it definitely doesn't work, and that's on purpose. Imagine if an AWS instance could take over the IP (and all the traffic) of another Linux instance, maybe one belonging to another customer. What the hell?!

Cloud and CDN solutions

Cloud providers offer load balancer solutions, see AWS ELB and Google Cloud load balancers. They come with built-in redundancy for this problem, so you don't have to think about it. keepalived is simply obsolete.

The next aspect is CDNs (CloudFlare, Akamai). All public websites run behind a CDN nowadays that provides caching, filtering and DDoS protection. A CDN can also load balance between multiple upstream servers: simply configure all the individual servers and the traffic is split.

Last but not least, keepalived only allows a single active server out of many, which wastes resources, to put it lightly. This is actually a catastrophic issue in the real world because things need to scale and it can't scale by design. Failover solutions in use today (as found in clouds and CDNs) are meant to distribute traffic across multiple destinations that are all active. That is a lot more complicated to achieve and is done cumulatively at different layers (see DNS, Anycast, OSPF, BGP). keepalived is not part of the big picture anymore.

user5994461
  • Thanks @user5994461 that's a useful (and interesting) update, but `keepalived has been obsolete for a while because it doesn't work in virtual clouds (AWS)` is misleading. It's not obsolete; it just doesn't work in all scenarios, like AWS. It would still work just fine in the use case we had 6 years ago. Its last commit on GitHub was today https://github.com/acassen/keepalived/commits/master – Luke Cousins Apr 27 '20 at 13:00
  • Additionally, `All public websites run behind a CDN...` is completely incorrect. Lots of websites run behind a CDN; most don't. Cloudflare is the biggest CDN by number of sites by a long way and only has 12.7% of all known sites. https://w3techs.com/technologies/details/cn-cloudflare – Luke Cousins Apr 27 '20 at 13:00
  • It's correct unless one is being pedantic for the last percent. There are hundreds of individual CDN services and managed hosting solutions that act as one out of the box. CloudFlare alone is not outstanding when compared to the whole wide web. – user5994461 Apr 27 '20 at 15:40
  • I genuinely don't think VRRP is usable anymore except in very limited use cases, like a fully physical setup untouched since 2010. And even there it needs support at the OS level (depends on Linux and Windows versions) that I didn't get into. Modern systems always have some AWS or VMware or VirtualBox or Docker or god knows what; virtualization is just everywhere, and it doesn't play nice with VRRP. – user5994461 Apr 27 '20 at 16:02
  • I'm not looking for an argument, but you're quoting (misguided) opinions as facts. I'm not being pedantic about the last percent. I'm being realistic about the facts and backing it up with supporting evidence. – Luke Cousins Apr 28 '20 at 10:55
  • Fact: A Google search for "market share per CDN" shows CloudFlare in second place. Link to the first Google result: https://www.datanyze.com/market-share/cdn--10 (you could say it's dodgy and I would agree; there are no facts anyway, since companies do not publish information about their customer base). – user5994461 Apr 28 '20 at 14:14