80

Multiple A records for the same domain seem to be used almost exclusively to implement DNS Round Robin as a cheap load-balancing technique.

The usual warning against DNS RR is that it is not good for high availability: when one IP goes down, clients will continue to use it for minutes.

A load balancer is often suggested as a better choice.

Neither claim is completely true:

  1. When the traffic is HTTP, most HTML browsers are able to automatically try the next A record if the previous one is down, without a new DNS look-up (see the sketch after this list). Read here, chapter 3.1, and here.

  2. When multiple data centers are involved, DNS RR is the only option to distribute traffic across them.
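
As a minimal illustration of point 1, here is a hedged sketch of client-side fallback across the A records returned by one DNS look-up. The hostname and timeout are placeholders; real browsers implement this logic internally.

    # Sketch of client-side fallback across multiple A records (point 1).
    # Hostname and timeout are placeholders; browsers do this internally.
    import socket

    def connect_with_fallback(host, port=80, timeout=3.0):
        # A single DNS look-up returns every A (and AAAA) record for the name.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        last_error = None
        for family, socktype, proto, _canon, sockaddr in addresses:
            try:
                sock = socket.socket(family, socktype, proto)
                sock.settimeout(timeout)
                sock.connect(sockaddr)      # first reachable IP wins
                return sock
            except OSError as err:          # this IP is down -> try the next record
                last_error = err
        raise last_error or OSError("no addresses returned")

    # sock = connect_with_fallback("www.example.com")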

So, is it true that, with multiple data centers and HTTP traffic, the use of DNS RR is the ONLY way to assure instant fail-over when one data center goes down?

Thanks,

Valentino

Edit:

  • Of course each data center has a local load balancer with a hot spare.
  • It's OK to sacrifice session affinity for an instant fail-over.
  • AFAIK the only way for a DNS to suggest one data center instead of another is to reply with just the IP (or IPs) associated with that data center. If the data center becomes unreachable then all those IPs are also unreachable. This means that, even if smart HTML browsers are able to instantly try another A record, all the attempts will fail until the local cache entry expires and a new DNS lookup is done, fetching the new working IPs (I assume the DNS automatically suggests a new data center when one fails). So, "smart DNS" cannot assure instant fail-over.
  • Conversely, DNS round-robin permits it. When one data center fails, the smart HTML browsers (most of them) instantly try the other cached A records, jumping to another (working) data center. So, DNS round-robin doesn't assure session affinity or the lowest RTT, but it seems to be the only way to assure instant fail-over when the clients are "smart" HTML browsers.

Edit 2:

  • Some people suggest TCP Anycast as a definitive solution. In this paper (chapter 6) it is explained that Anycast fail-over is related to BGP convergence. For this reason Anycast fail-over can take anywhere from 20 seconds up to 15 minutes to complete. 20 seconds is possible on networks whose topology was optimized for this. Probably only CDN operators can guarantee such fast fail-overs.

Edit 3:

  • I did some DNS look-ups and traceroutes (maybe some expert can double-check) and:
    • The only CDN using TCP Anycast seems to be CacheFly; other operators like CDNetworks and BitGravity use CacheFly. It seems that their edges cannot be used as reverse proxies, therefore they cannot be used to grant instant fail-over.
    • Akamai and LimeLight seem to use geo-aware DNS. But! They return multiple A records. From traceroutes it seems that the returned IPs are in the same data center. So, I'm puzzled about how they can offer a 100% SLA when one data center goes down.
  • By high availability I meant almost instant fail-over. The client should not notice any problem even if one data center goes down. I refined the question. – Valentino Miazzo Sep 30 '09 at 13:48
  • MaxCDN uses anycast TCP and its edges can be used in caching proxy mode ("origin fetch" in CDN industry terminology). – rmalayter Mar 22 '10 at 13:05
  • @vmiazzo, Your pdf link is down... Do you mean 15 minutes or 20 seconds to 15 minutes? – Pacerier May 14 '14 at 05:55

11 Answers

35

When I use the term "DNS Round Robin" I generally mean it in the sense of the "cheap load balancing technique", as the OP describes it.

But that's not the only way DNS can be used for global high availability. Most of the time, it's just hard for people with different (technology) backgrounds to communicate well.

The best load balancing technique (if money is not a problem) is generally considered to be:

  1. An Anycast'ed global network of 'intelligent' DNS servers,
  2. and a set of globally spread out datacenters,
  3. where each DNS node implements Split Horizon DNS,
  4. and monitoring of availability and traffic flows is available to the 'intelligent' DNS nodes in some fashion,
  5. so that the user DNS request flows to the nearest DNS server via IP Anycast,
  6. and this DNS server hands out a low-TTL A Record / set of A Records for the nearest / best datacenter for this end user via 'intelligent' split horizon DNS.
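
A minimal sketch of the decision logic behind step 6, assuming a made-up mapping from resolver networks to datacenters and placeholder IPs; a real implementation would live inside an authoritative DNS server and be fed live health and traffic data:

    # Sketch of 'intelligent' split-horizon DNS: hand out the A records of the
    # nearest healthy datacenter for the querying resolver, with a low TTL.
    # The network-to-datacenter map and the IPs below are placeholders.
    import ipaddress

    DATACENTERS = {
        "eu-west": {"healthy": True, "a_records": ["192.0.2.10", "192.0.2.11"]},
        "us-east": {"healthy": True, "a_records": ["198.51.100.10", "198.51.100.11"]},
    }

    # Which datacenter is 'nearest' for which client networks (normally GeoIP/BGP data).
    NEAREST = [
        (ipaddress.ip_network("10.0.0.0/8"), "eu-west"),
        (ipaddress.ip_network("0.0.0.0/0"), "us-east"),   # default
    ]

    def answer_for(resolver_ip, ttl=30):
        client = ipaddress.ip_address(resolver_ip)
        preferred = [dc for net, dc in NEAREST if client in net]
        # Fall back to any healthy datacenter if the nearest one is down.
        for dc in preferred + list(DATACENTERS):
            if DATACENTERS[dc]["healthy"]:
                return [(ip, ttl) for ip in DATACENTERS[dc]["a_records"]]
        return []   # no healthy datacenter left

    # answer_for("10.1.2.3") -> the eu-west records, each with TTL 30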

Using anycast for DNS is generally fine, because DNS responses are stateless and extremely short. So if the BGP routes change it's highly unlikely to interrupt a DNS query.

Anycast is less suited to the longer, stateful HTTP conversations, so this system uses split horizon DNS. An HTTP session between a client and a server is kept to one datacenter; it generally cannot fail over to another datacenter without breaking the session.

As I indicated with "set of A Records" what I would call 'DNS Round Robin' can be used together with the setup above. It is typically used to spread the traffic load over multiple highly available load balancers in each datacenter (so that you can get better redundancy, use smaller/cheaper load balancers, not overwhelm the Unix network buffers of a single host server, etc).

So, is it true that, with multiple data centers and HTTP traffic, the use of DNS RR is the ONLY way to assure high availability?

No it's not true, not if by 'DNS Round Robin' we simply mean handing out multiple A records for a domain. But it's true that clever use of DNS is a critical component in any global high availability system. The above illustrates one common (often best) way to go.

Edit: The Google paper "Moving Beyond End-to-End Path Information to Optimize CDN Performance" seems to me to be state-of-the-art in global load distribution for best end-user performance.

Edit 2: I read the article "Why DNS Based .. GSLB .. Doesn't Work" that OP linked to, and it is a good overview -- I recommend looking at it. Read it from the top.

In the section "The solution to the browser caching issue" it advocates DNS responses with multiple A Records pointing to multiple datacenters as the only possible solution for instantaneous fail over.

In the section "Watering it down" near the bottom, it expands on the obvious, that sending multiple A Records is uncool if they point to datacenters on multiple continents, because the client will connect at random and thus quite often get a 'slow' DC on another continent. Thus for this to work really well, multiple datacenters on each continent are needed.

This is a different solution than my steps 1 - 6. I can't provide a perfect answer on this, I think a DNS specialist from the likes of Akamai or Google is needed, because much of this boils down to practical know-how on the limitations of deployed DNS caches and browsers today. AFAIK, my steps 1-6 are what Akamai does with their DNS (can anyone confirm this?).

My feeling -- coming from having worked as a PM on mobile browser portals (cell phones) -- is that the diversity and level of total brokenness of the browsers out there is incredible. I personally would not trust an HA solution that requires the end user terminal to 'do the right thing'; thus I believe that global instantaneous fail over without breaking a session isn't feasible today.

I think my steps 1-6 above are the best that are available with commodity technology. This solution does not have instantaneous fail over.

I'd love for one of those DNS specialists from Akamai, Google etc to come around and prove me wrong. :-)

  • I added more explanations in the question. If I understand your "best load balancing technique" (point 6), it advertises just the A records of the 'best' data center. As I tried to explain in the question this doesn't permit instant fail-over on the client. – Valentino Miazzo Sep 30 '09 at 14:28
  • @vmiazzo: Yes, you understood me correctly. I'm adding a 2nd edit to my post to clarify -- but basically I think the instant fail over that you seek is impractical / impossible. –  Sep 30 '09 at 15:30
  • What I find interesting is that no-one has suggested combining the two approaches together. While not ideal, it would provide reasonable speed when things function correctly, and additional resiliency when they don't. The penalty would be a large delay as clients switched from one anycast-based DNS address to another. – Avery Payne Sep 13 '12 at 19:58
  • @JesperMortensen, When you say 'intelligent' DNS, do you mean split-horizon DNS? Or do you mean something else (deciding based on factors *beyond* source IP)? – Pacerier May 14 '14 at 06:10
19

Your question is: "Is DNS Round Robin the ONLY way to assure instant fail-over?"

The answer is: "DNS Round Robin is NEVER the right way to assure instant fail-over".

(at least not on its own)

The right way to achieve instant fail-over is to use BGP4 routing such that both sites use the same IP addresses. This way the internet's core routing technology is used to route requests to the right data center, instead of the internet's core addressing technology.

In the simplest configuration this only provides fail-over. It can also be used to provide Anycast, with the caveat that TCP based protocols will fail at the moment of switch-over if there is any instability in the routing.

Alnitak
6

So, is it true that, with multiple data centers and HTTP traffic, the use of DNS RR is the ONLY way to assure high availability?

Clearly it is a false claim - you only have to look at Google, Akamai, and Yahoo to see that they're not using round-robin[*] responses as their sole solution (some may use it in part, along with other approaches).

There are many possible options, but it really depends upon what other constraints you have, with your service/application as to which you pick.

It is possible to use round-robin techniques on a simple, co-located server approach, and not have to worry about server failure, if you also arrange for the 'fail-over' of the IP address. (But most opt for load-balancing techniques, a single IP address, and fail-over between load-balancers.)

Maybe you need all requests for a single session to go to the same servers, but you want requests to be spread across different, regional server clusters? Round robin is not appropriate for that: you need to do something that ensures any given client accesses the same physical server cluster each time (except when 'exceptions' occur, such as server failure). Either they receive a consistent IP address from a DNS query, or they get routed to the same physical server cluster. Solutions for that include various commercial and non-commercial DNS "load balancers", or (if you have more control of your network) BGP network advertisements. You could simply arrange for your own domain's nameservers to give entirely different responses (but, as DNS requests can get sent all over the place, you won't achieve any location affinity with that approach).
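
As an illustration of that "same physical server cluster each time" idea, here is a minimal sketch that maps a client (or its resolver) to a cluster with a stable hash. The cluster names and IPs are hypothetical, and a real DNS load balancer would also factor in health and geography:

    # Deterministically map a client to one regional cluster, so repeated
    # look-ups return the same cluster unless that cluster is removed from
    # the pool (e.g. on failure). Cluster names/IPs are hypothetical.
    import hashlib

    CLUSTERS = {
        "ap-cluster": "203.0.113.20",
        "eu-cluster": "192.0.2.20",
        "us-cluster": "198.51.100.20",
    }

    def cluster_for(client_ip, available=None):
        candidates = sorted(available or CLUSTERS)           # stable order
        digest = hashlib.sha1(client_ip.encode()).digest()   # stable hash of the client
        index = int.from_bytes(digest[:4], "big") % len(candidates)
        return candidates[index]

    # The same client always lands on the same cluster...
    # cluster_for("203.0.113.99")
    # ...until an 'exception' (a failed cluster) shrinks the pool:
    # cluster_for("203.0.113.99", available=["eu-cluster", "us-cluster"])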

[* I'm going to use "round-robin", because 'RR' in DNS terminology means "resource record".]

jrg
  • I added more explanations in the question. Your suggestion to use DNS "load balancers" IMHO doesn't permit instant fail-over. About BGP, do you refer to an Anycast TCP solution? – Valentino Miazzo Sep 30 '09 at 14:21
  • I'm not suggesting any particular solution over another - I'm saying that you need to pick the right solution for your problem (which you've not actually stated in your question) and your constraints (ditto.) DNS round-robin does not provide an instant fail-over any more than DNS LB, because browsers are not guaranteed to do "the right thing" (mainly because the "right thing" is not strictly defined or prescribed. I don't believe there are enough "smart HTML browsers", even now - I concur with Jesper that they're too varied in their behaviours to rely upon them at all.) – jrg Sep 30 '09 at 22:57
  • I understand your skepticism. Anyway, as you can read here http://crypto.stanford.edu/dns/dns-rebinding.pdf most of the current HTML browsers are already "smart". – Valentino Miazzo Oct 01 '09 at 09:47
5

Very nice observation vmiazzo, +1 for you!! I'm stuck exactly where you are... baffled by how these CDNs do their magic.

Following are my guesses on how CDNs run their networks:

  • Use Anycast DNS (mentioned by Jesper Mortensen) to get the closest data centre
  • They run their own network which spans the different data centres, allowing them to do something like CARP on hosts across those data centres

Or

At the moment the following solution works for me: the DNS returns multiple IPs, e.g.:

www -> CNAME www1 , www1 A -> 123.123.123.1
www -> CNAME www2 , www2 A -> 123.123.123.1 
www -> CNAME www3 , www3 A -> 123.123.123.1 
                    www3 A -> 8.4.56.7 <--- reverse proxy
  • The last entry points to a reverse proxy on the Amazon cloud, which intelligently passes requests to an available server (or serves an "under maintenance" page)

The reverse proxy still gets hit, but not as heavily as the main servers.

Rianto Wahyudi
  • The order of the multiple DNS records that clients receive is intentionally randomized, so your reverse proxy is probably getting hit around 1/6th of the time (1/2 of 1/3). How is that any better or different than having 6 A records? – ColinM Jan 17 '13 at 20:14
3

Why is RFC 2782 (which applies the same idea as MX priorities to services like HTTP, IMAP, ...) not implemented in any kind of browser? Things would be easier... There has been a bug about it, open for ten years, in Mozilla!!! Is it because it would be the end of the commercial load-balancer industry??? I'm very disappointed about that.
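
For illustration, a hedged sketch of the client behaviour RFC 2782 describes, assuming the dnspython library and a hypothetical _http._tcp SRV record (which, as noted above, browsers don't actually consult):

    # Sketch of RFC 2782 selection: sort SRV records by priority, then pick
    # within the lowest-priority group, weighted by the 'weight' field.
    # Requires dnspython; the domain is a placeholder.
    import random
    import dns.resolver

    def pick_srv_target(service="_http._tcp.example.com"):
        answers = dns.resolver.resolve(service, "SRV")
        records = sorted(answers, key=lambda r: r.priority)
        lowest = [r for r in records if r.priority == records[0].priority]
        # Weighted random choice inside the lowest-priority group.
        chosen = random.choices(lowest, weights=[r.weight or 1 for r in lowest])[0]
        return str(chosen.target).rstrip("."), chosen.port

    # host, port = pick_srv_target()   # e.g. ("www1.example.com", 80)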

2

I wonder how many people answering these questions are actually running a large worldwide network of servers? Google is using round robin and my company has been using it for years. It can work pretty well, with some limitations. Yes, it needs to be augmented with other measures.

The real key is to be willing to accept a hiccup or two if a server goes down. When I pull the plug on a server, if a browser is trying to access that server, there will be a delay of a minute or so while the browser learns that the IP address is down. But it then goes to another server very quickly.

It works great, and people who claim that it causes a lot of problems do not know what they are talking about. It just requires the right design.

Failover sucks. The best HA uses all resources all of the time.

I have been working with HA since 1986. I went through extensive training to create failover systems and I am not at all a fan of failover.

Also, RR does work to distribute load, even if passively rather than actively. Our server logs clearly show the appropriate percentage of traffic on each server - within reason.

old_guy
  • 21
  • 2
2

2 - You can do this with Anycast using Quagga

(Even if there is some information saying that Anycast is bad with TCP, there are several big companies using it, like CacheFly.)

rkthkr
1

TCP Anycast is actually very stable and is used at least by CacheFly (since 2002), Prolexic and BitGravity. A good presentation on TCP Anycast was done at NANOG 37: http://198.108.95.21/meetings/nanog37/presentations/matt.levine.pdf

1

Another very simple choice is to use a low TTL (how low depends on your needs) in the DNS A or CNAME record, and to update this record to choose which IP will be used.

We have 2 ISPs and several public services, and we have been using this method successfully for high availability for 3 years.
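
A minimal sketch of this approach, assuming BIND-style dynamic updates driven by the standard nsupdate tool; the hostnames, addresses, key file and health check are placeholders:

    # Health-check loop: if the primary IP stops answering, rewrite the
    # low-TTL A record to point at the backup IP via nsupdate (BIND
    # dynamic update). All names and addresses are hypothetical.
    import socket
    import subprocess
    import time

    PRIMARY, BACKUP = "192.0.2.1", "198.51.100.1"

    def is_up(ip, port=80, timeout=3.0):
        try:
            with socket.create_connection((ip, port), timeout=timeout):
                return True
        except OSError:
            return False

    def point_www_at(ip, ttl=60):
        commands = "\n".join([
            "server ns1.example.com",
            "zone example.com",
            "update delete www.example.com A",
            f"update add www.example.com {ttl} A {ip}",
            "send",
        ]) + "\n"
        subprocess.run(["nsupdate", "-k", "/etc/ddns.key"],
                       input=commands, text=True, check=True)

    current = None
    while True:
        desired = PRIMARY if is_up(PRIMARY) else BACKUP
        if desired != current:          # only touch DNS when the answer changes
            point_www_at(desired)
            current = desired
        time.sleep(30)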

lg.
  • I added more explanations in the question. Many HTML browsers ignore the DNS TTL (DNS pinning); see the paper linked in the question. Changing the DNS config when a data center goes down doesn't permit an instant fail-over on the client. – Valentino Miazzo Sep 30 '09 at 14:24
1

One spanner in the works is that a number of ISPs have badly configured resolvers that cache records for a set interval and completely ignore TTL settings. It shouldn't be so, and there is no excuse for it, but sadly, from my experience migrating numerous websites and services, it does happen.

Twirrim
  • There is an excuse for it. Low TTLs have a large performance impact on busy DNS servers, and using them permanently, rather than just temporarily when a change is due, is an abuse of the system and of their resources. Most ISPs will only enforce a minimum TTL once it has been set low for longer than a reasonable time period. – JamesRyan Dec 04 '09 at 11:00
-1

Multiple A records are the only way to eliminate a possible single point of failure. Any other solution forces all incoming requests to go through a single device somewhere between the server and the client.

So for absolute redundancy, it is necessary. That is why Google does it, as does anyone else who wants to be assured of continuous service availability.

It is pretty obvious why this is the case... multiple A records are the only way of moving the point at which requests are routed to the client browser. Any other method will rely on a single point between the client browser and the server at which a failure can occur, bringing down your service. By using A records, the only single point of failure from client to server becomes the client itself.

If you don't have multiple A records set up, you are asking for downtime...

This method obviously can not be relied on for load balancing though.

  • What? Multiple A records do not eliminate a single point of failure! It is asking for problems. You use a virtual 'floating' IP within one datacenter, or routing tricks if you want to quickly fail over between multiple datacenters. – pQd Jun 03 '10 at 06:04
  • Absolutely not necessary for a single IP to pass through a single device. – Sandman4 Jan 16 '12 at 21:10