There are a couple of ways to achieve HA (high availability) for a load balancer - or, for that matter, any service. Let's assume you have two machines with these IP addresses:
- 192.168.100.101
- 192.168.100.102
Users connect to an IP, so what you want to do is separate the IP from any specific box, i.e. create a virtual IP (VIP). That IP will be 192.168.100.100.
Now you can choose an HA service that takes care of automatic failover/failback of the IP address. Some of the simplest such services for Unix are (u)carp and keepalived; more complex ones include Red Hat Cluster Suite and Pacemaker.
Let's take keepalived as an example: two keepalived services, each running on its own box, communicate with each other. That communication is often called a heartbeat.
|  VIP  |                             |       |
| Box A | ------v^-----------v^------ | Box B |
|  IP1  |                             |  IP2  |
If one keepalived stops responding (either the service goes down for whatever reason, or the box reboots or shuts down), keepalived on the other box will notice the missed heartbeats, presume the other node is dead, and take failover actions. In our case that action is bringing up the floating IP.
                                      |  VIP  |
          ------------------   ------ | Box B |
                                      |  IP2  |
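The failover decision described above can be sketched as a small timeout check. This is only an illustration of the idea, not how keepalived is implemented; the class name `HeartbeatMonitor`, the one-second interval and the three-missed-beats threshold are all assumptions made for the example, and it models the backup box only.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Backup-side view of the peer: take the VIP when heartbeats stop."""
    interval: float = 1.0    # assumed seconds between heartbeats
    max_missed: int = 3      # assumed missed beats before the peer counts as dead
    last_seen: float = 0.0   # timestamp of the most recent heartbeat
    holds_vip: bool = False

    def on_heartbeat(self, now: float) -> None:
        # Peer is alive: record the time and release the VIP (failback).
        self.last_seen = now
        self.holds_vip = False

    def tick(self, now: float) -> bool:
        # Called periodically; take over the VIP once too many beats are missed.
        if now - self.last_seen > self.interval * self.max_missed:
            self.holds_vip = True
        return self.holds_vip


monitor = HeartbeatMonitor()
monitor.on_heartbeat(now=10.0)
print(monitor.tick(now=12.0))   # False: only 2 seconds of silence
print(monitor.tick(now=14.5))   # True: more than 3 intervals missed
```

In a real setup "taking the VIP" means adding 192.168.100.100 to a local interface and announcing it on the network, which the HA service does for you.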
The worst that can happen in this case is the loss of sessions for clients, but they will be able to reconnect. If you want to avoid that, the two load balancers have to be able to sync session data between them; if they can do that, users won't notice anything except maybe a short delay.
Another pitfall of this setup is split brain: both boxes are online but the link between them is severed, so both boxes bring up the same IP. This is often resolved through some kind of fencing mechanism (SCSI reservation, IPMI restart, smart PDU power cut, ...), or by using an odd number of nodes and requiring a majority of cluster members to be alive before the service is started.
|  VIP  |                             |  VIP  |
| Box A |                             | Box B |
|  IP1  |                             |  IP2  |
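The majority rule mentioned above comes down to a one-line check. The sketch below is a simplified model, not any particular cluster manager's logic; `has_quorum` is a hypothetical helper name, and `reachable_nodes` is assumed to count the local node itself.

```python
def has_quorum(reachable_nodes: int, cluster_size: int) -> bool:
    """Return True if this partition holds a strict majority of the cluster."""
    return reachable_nodes > cluster_size // 2


# Two-node cluster, link severed: each side sees only itself.
print(has_quorum(1, 2))   # False: neither side may start the service
# Three-node cluster split 2 / 1.
print(has_quorum(2, 3))   # True: the larger side keeps the VIP
print(has_quorum(1, 3))   # False: the isolated node stands down
```

This also shows why a plain two-node cluster cannot rely on majority voting alone: after a split neither side has quorum, so it needs fencing or a third voter.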
More complex cluster management software (like Pacemaker) can move a whole service (e.g. stop it on one node and start it on another), and this is how HA for services like databases can be achieved.
Another possible way, if you control the routers near your load balancers, is to use ECMP (equal-cost multi-path routing). This approach also lets you scale load balancers horizontally.
This works by having each of your two boxes talk BGP to your router(s). Each box has to advertise the virtual IP (192.168.100.100), and the router will load balance traffic across them via ECMP. If a machine dies, it stops advertising the VIP, which in turn stops the routers from sending traffic to it. The one thing you have to take care of in this setup is to stop advertising the VIP if the load balancer service itself dies while the box stays up.
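To make the ECMP behaviour concrete, here is a rough model of how a router might spread flows across the boxes that currently advertise the VIP. Real routers use their own, vendor-specific hash functions; the SHA-256 hash, the `pick_next_hop` name and the sample flow below are assumptions made purely for illustration.

```python
import hashlib


def pick_next_hop(flow, next_hops):
    """Pick a next hop for a flow by hashing its 5-tuple (simplified ECMP model)."""
    if not next_hops:
        raise ValueError("no next hop is advertising the VIP")
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    # Sort so the choice does not depend on set iteration order.
    return sorted(next_hops)[index]


# Boxes currently advertising 192.168.100.100 over BGP.
advertising = {"192.168.100.101", "192.168.100.102"}
# (source IP, source port, destination IP, destination port, protocol)
flow = ("10.0.0.7", 51234, "192.168.100.100", 443, "tcp")

print(pick_next_hop(flow, advertising))   # the same flow always maps to the same box

# Box A dies (or its health check fails) and withdraws the route.
advertising.discard("192.168.100.101")
print(pick_next_hop(flow, advertising))   # 192.168.100.102
```

Because the choice depends on the number of next hops, a withdrawal can remap existing flows to a different box, so the earlier point about lost sessions applies here as well unless the load balancers share session state.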