How does HA / failover work?

Question

Let's say we have ten servers, each of which has a copy of a stateless application.

The user enters an address in a browser / client program.
The DNS server returns a list of IPs (however, many people say that DNS should not be used to provide HA, especially for non-browser clients).
So an old browser / client program tries the first IP; it's down, so... what happens? Does the connection fail?

How is this solved? Virtual IP? Some other mechanism? Please give me some links, or at least some buzzwords, so I can read more about it.

EDIT: OK, I know we should have some load balancer in front of the cluster, but then the problem just moves one layer further: how to provide HA of that load balancer? After all, it can go down too.

"how to provide HA of that load balancer?", that's what I'm talking about at the end of my answer. It is easy to set up Active/Passive (VRRP) solution for 2-nodes with low load. It gets much more tricky with 10 nodes. Especially if you do not want asymmetric load. — Fox, Jun 26 '15 at 07:31

score 5 · Answer 1 · answered Jun 26 '15 at 07:12

You are confusing DNS and High Availability. Repeat after me: DNS is not failover. DNS is not failover. DNS is not failover.

If you want to do high availability and load balancing, you need a reverse proxy that specialises in this. The most well known one is haproxy.

Mark Henderson


Some people use DNS for failover, so I mentioned it. About the reverse proxy: it runs on some computer with some IP. What if this computer goes down? – piotrek Jun 26 '15 at 07:16
@piotrek then you configure your HAProxy server to be highly available. But you don't do this with DNS. Yes, some people use DNS for failover, but it's far from an ideal solution. There are plenty of ways of making HAProxy highly available, most of them involving something like [keepalived](https://www.google.com.au/webhp?sourceid=chrome-instant&ion=1&espv=2&ie=UTF-8#q=haproxy%20keepalived). – Mark Henderson Jun 26 '15 at 07:20
So basically it all comes down to a virtual IP, right? Is it the only way of doing HA, or should I also read about something else? – piotrek Jun 26 '15 at 07:22
@piotrek that is a common way. Another way would be to virtualise it and use your virtualisation platform's high availability options. – Mark Henderson Jun 26 '15 at 07:24

score 3 · Answer 2 · edited Jun 26 '15 at 08:33

For me, one of the most obvious solutions to doing HA over more than two hosts is Load Balancing, even though the name does not suggest HA.

What kind of LB is most suitable for your use case really depends on the type of client and app, but these are the three most common options.

L7 load balancing. You have a proxy that understands the protocol used. There are many such proxies for HTTP. The proxy also knows how to check whether the backend servers are alive, and with some proxies it is even easy to deal with incorrect responses (500). The proxy keeps a list of all servers and a list of servers that are alive and working well. Once a request arrives, it forwards it to one of those OK backend servers. (This can work with HTTPS as well if there is SSL offloading.)
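To make the bookkeeping above a bit more concrete, here is a minimal Python sketch of the "list of all servers plus list of healthy servers" idea. The class name `BackendPool`, its methods and the example addresses are made up for illustration; real proxies such as HAProxy implement this very differently and with many more options. The sketch assumes that a failed connection or any 5xx response marks a backend as unhealthy.

```python
class BackendPool:
    """Hypothetical pool: all configured backends plus the healthy subset."""

    def __init__(self, backends):
        self.backends = list(backends)     # every configured server
        self.healthy = set(self.backends)  # servers that passed their last check
        self._next = 0                     # round-robin cursor

    def record_check(self, backend, status_code):
        # Assumption: no response (None) or a 5xx status means "unhealthy".
        if status_code is None or status_code >= 500:
            self.healthy.discard(backend)
        else:
            self.healthy.add(backend)

    def pick(self):
        # Walk the full list round-robin, skipping unhealthy servers.
        for _ in range(len(self.backends)):
            candidate = self.backends[self._next % len(self.backends)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy backends")


pool = BackendPool(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
pool.record_check("10.0.0.2", 500)         # second backend starts failing
picks = [pool.pick() for _ in range(4)]    # requests only go to the other two
```

Once a later check on `10.0.0.2` returns something other than a 5xx, `record_check` puts it back into rotation.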

L4 load balancing works in pretty much the same way, but instead of looking at each request, it looks at each connection. It does not understand the protocol, so it usually works even if you are using something ephemeral or encrypted like HTTPS.

L3 load balancing takes a hash of the source IP address (and maybe port) and, based on this hash, forwards the connection to one of the servers. It will work even for stateful UDP protocols.
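The hashing idea can be sketched in a few lines of Python. This is only an illustration under my own assumptions: the function name `pick_server` is hypothetical, and SHA-256 is used just because it is in the standard library (routers and firewalls use much cheaper hashes). The point is that the same source address always maps to the same server, with no per-connection state.

```python
import hashlib
import ipaddress


def pick_server(source_ip, servers, source_port=None):
    """Map a client address (and optionally port) to one server by hashing."""
    key = ipaddress.ip_address(source_ip).packed   # 4 bytes (IPv4) or 16 (IPv6)
    if source_port is not None:
        key += source_port.to_bytes(2, "big")      # ports fit in 16 bits
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
first = pick_server("192.0.2.10", servers)
second = pick_server("192.0.2.10", servers)        # same client, same server
```

Note that with a plain modulo like this, adding or removing a server changes the mapping for many clients.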

There are more ways of accomplishing this. But I'd say these are the most basic.

Of course, now you've got another SPOF: the load balancer. But as all three of these methods (without SSL) are not very resource intensive, it is reasonable to run an Active/Passive configuration using something like VRRP to fail over in case of problems.

L7 for HTTP can be done by software like Varnish (a great caching and LB solution, in my experience), HAProxy, nginx, Apache httpd, etc.; most web servers/proxies can do this. For other protocols, you have to use a proxy specific to the protocol.

L4 can be done by HAProxy and similar software, or through a firewall (though you then have to implement the status checking yourself).

L3 is done on a router and/or in a firewall. You could probably do it with Linux and iptables (IPVS), or some commercial software.

score 0 · Answer 3 · answered Jun 26 '15 at 07:37

HA is a fairly broad subject, depending on the particular entities you want to make highly available: for example, web servers, network gear, databases, etc. The general idea is to avoid single points of failure.

In your case, you want to enable HA on a web server and there are two approaches I could think of at the moment: Active/Active and Active/Passive (These two concepts can be generalized

Active/Active: In this scenario, you have a reverse proxy (HAProxy or Nginx) sitting in front of your actual web servers. What it does is essentially forward requests and responses. It knows the list of available web servers and normally distributes incoming requests among them. Users always access the same IP exposed by this proxy, and it's up to the proxy which web server to forward each request to. In this case, if one of the web servers goes down, the proxy simply stops sending work to it.
Active/Passive: A more interesting setup is Active/Passive, in which one server is doing the actual work and the others are all in a standby state. For example, you have two web servers, A and B. Both of them have IP address 1.1.1.1, although only A is responding to ARP requests. Hence, your cluster will only see server A, which is doing the actual work. For now, B is just a ghost server ready to take over when a failure occurs. A and B run some sort of heartbeat protocol between them to check each other's health. When A goes down, B will find out within a certain delay and take over.
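The heartbeat logic on the standby side (server B) might look roughly like the Python sketch below. The class `StandbyNode`, the timeout value and the injectable clock are my own assumptions for illustration, not how Pacemaker or keepalived are implemented. In a real setup, the takeover step would also have to claim the shared IP, for example by starting to answer ARP and announcing itself with a gratuitous ARP.

```python
import time


class StandbyNode:
    """Hypothetical standby: takes over if the active node goes silent."""

    def __init__(self, timeout_s, now=time.monotonic):
        self.timeout_s = timeout_s          # how long silence is tolerated
        self._now = now                     # injectable clock, eases testing
        self.last_heartbeat = now()
        self.active = False

    def on_heartbeat(self):
        # Called whenever a heartbeat from the active node arrives.
        self.last_heartbeat = self._now()

    def tick(self):
        """Call periodically; returns True only when this call triggers takeover."""
        silent_for = self._now() - self.last_heartbeat
        if not self.active and silent_for > self.timeout_s:
            self.active = True              # a real node would claim the shared IP here
            return True
        return False


clock = [0.0]                               # fake clock so the example is deterministic
node = StandbyNode(timeout_s=3.0, now=lambda: clock[0])
clock[0] = 2.0
node.on_heartbeat()                         # A is still alive at t=2
clock[0] = 4.5
still_passive = not node.tick()             # only 2.5 s of silence so far
clock[0] = 5.5
took_over = node.tick()                     # 3.5 s of silence, B takes over
```

The timeout is the "certain delay" mentioned above: a shorter value means faster failover but a higher risk of taking over while A is merely slow.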

In a cloud, basically all elements are HA-ready, including network, compute, and controller. So should your services be. There are also tools such as Pacemaker and Corosync, or keepalived, to achieve this.

I can't provide any good references right now, since the topic you are addressing is really broad, but I encourage you to google specific use cases to gain a deeper understanding.

Cheers, J
