
As a follow-up to this very popular question: Why is DNS failover not recommended?, I think it was agreed that DNS failover is not 100% reliable due to caching.

However, the highest-voted answer did not really discuss what the better solution is for achieving failover between two different data centers. The only solution presented was local load balancing (within a single data center).

So my question is quite simple: what is the real solution to cross-data-center failover?

IMB
  • If you fail over the IPs of downed nodes to surviving nodes, there is nothing wrong with DNS failover... – Nils Sep 08 '12 at 21:27

3 Answers


This started off as a comment...but it's getting too long.

Sadly, most of the answers to the previous question are wrong: they assume that failover has something to do with the TTL. The top-voted answer is SPECTACULARLY wrong, and notably cites no sources. The TTL applies to the zone record as a whole and has nothing to do with Round Robin.

From RFC 1794 (which is all about Round Robin DNS serving):

There is no use in handing out information with TTLs of an hour [or less]

(IME it's nearer to 3 hours before you get full propagation.)

From RFC 1035:

When several RRs of the same type are available for a particular owner name, the resolver should either cache them all or none at all

RFC 1034 sets out the requirements for negative caching - a method for indicating that all requests must be served fresh from the authoritative DNS server (in which case the TTL does control failover) - but in my experience support for this varies.

Since any failover would have to be implemented high in the client stack, it's arguably not part of TCP/IP or DNS - indeed, SIP, SMTP, RADIUS and other protocols running on top of TCP/IP define how the client should work with Round Robin - RFC 2616 (HTTP/1.1) is remarkable in not mentioning how it should behave.

However, in my experience, every browser and most other HTTP clients written in the last 10 years will transparently check additional A RRs if the connection appears to be taking longer than expected. And it's not just me:

Failover times vary by implementation but are in the region of seconds. It's not an ideal solution, since (due to the limits of DNS) withdrawing a failed node takes up to the DNS TTL to propagate - in the meantime you have to rely on client-side detection.

Round Robin is not a substitute for other HA mechanisms within a site, but it does complement them (the guys who wrote HAProxy recommend using a pair of installations accessed via round-robin DNS). It is the best-supported mechanism for implementing HA across multiple sites: indeed, as far as I can determine, it is the only failover mechanism available on standard clients.
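
To illustrate that client-side behaviour, here is a minimal Python sketch (the function name, port and timeout are placeholders, not anything from the RFCs above); it walks through whatever address records the resolver hands back and moves on as soon as one address stalls:

```python
import socket

def connect_with_failover(host, port, timeout=3.0):
    """Try each A/AAAA record for host in turn, mimicking the
    transparent fallback modern HTTP clients implement."""
    last_error = None
    # getaddrinfo returns one tuple per address record the resolver returned
    for family, socktype, proto, _, sockaddr in socket.getaddrinfo(
            host, port, type=socket.SOCK_STREAM):
        sock = socket.socket(family, socktype, proto)
        sock.settimeout(timeout)   # give each address only a few seconds
        try:
            sock.connect(sockaddr)
            return sock            # first address that answers wins
        except OSError as exc:
            last_error = exc
            sock.close()           # dead or slow node: try the next record
    raise last_error or OSError("no addresses returned for %s" % host)
```

This is roughly what Python's own socket.create_connection() does internally, so the fallback-over-multiple-RRs pattern is baked into standard libraries as well as browsers.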

symcbean
  • Just when I'm getting excited about other methods like anycast I get this. LOL. I guess the big question is, will DNS failover work 100% in the future? Are all browser manufacturers required to follow this failover mechanism or is it optional? One of the biggest arguments is that we can't expect clients to behave as we expect. But if every browser manufacturer is going to fix this then this is definitely more cost-effective than an anycast implementation. – IMB Sep 06 '12 at 12:20
  • @IMB this is not a client-question. This is **IP** - and how that works is defined in the RFCs. – Nils Sep 08 '12 at 21:29
  • @Nils: from my reading of the RFCs, they don't define this behaviour (but happy to be proved incorrect). The timeouts involved are *much* shorter than those generally used - to the extent that I suspect it's an adaptive behaviour by the browsers - but it seems to be implemented in all the major browsers. – symcbean Nov 20 '12 at 22:42
  • @symcbean the client cache has to obey RFC 1035 too, doesn't it? I just took a closer look at that RFC. It has been updated by RFC 5966 (http://tools.ietf.org/html/rfc5966#page-5) which states "**It is therefore RECOMMENDED that the default application-level idle period should be of the order of seconds, but no particular value is specified.**" This might explain the high reconnect rate. On the other hand, there is DNS caching on the OS client side, too. – Nils Nov 22 '12 at 21:06
  • [Related discussion.](http://serverfault.com/a/774411/152073) It should be emphasized that relying on applications that *aren't* browsers (this question did not specify) is a bad idea without careful testing of the software involved, as there is no RFC requirement to implement this behavior ("`SHOULD`") and laziness often wins. – Andrew B May 17 '16 at 00:10
  • Additionally, RFC 1034 should not be cited when discussing negative caching as the entirety of section 4.3.4 was replaced by [RFC 2308](https://tools.ietf.org/html/rfc2308). This is not a mechanism for requiring that the data be served fresh, but rather a backoff for when data is not found or servers are unreachable or misconfigured. – Andrew B May 17 '16 at 00:10

For this to apply, a whole data center would need to go down or become unreachable. Your backup at another data center would then be reached by routing the same IP addresses there: once the BGP route announcements from the primary data center are no longer provided, the announcements from the secondary data center take over.

Smaller businesses are generally not large enough to justify the expense of portable IP address allocations and their own autonomous system number to announce BGP routes with. In that case, a provider with multiple locations is the way to go.

You either have to be reachable via your original IP addresses, or via a change of IP address done through DNS. Since DNS is not designed to do this in the way "failover" requires (users can be out of reach for at least as long as your TTL, or the TTL imposed by some caching servers), going to the backup site with the same IPs is the best solution.
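
As a rough sketch of the announcement side (all ASNs and prefixes here are hypothetical values from the documentation ranges, and the syntax is FRR-style), the secondary data center can announce the same prefix with a prepended AS path, so its route is only chosen once the primary's announcement is withdrawn:

```
! Secondary data center: announce the same prefix as the primary,
! but lengthen the AS path so this route loses while the primary is up.
router bgp 64496
 neighbor 203.0.113.1 remote-as 64511
 address-family ipv4 unicast
  network 198.51.100.0/24
  neighbor 203.0.113.1 route-map BACKUP-PREPEND out
 exit-address-family
!
route-map BACKUP-PREPEND permit 10
 set as-path prepend 64496 64496
```

When the primary site goes dark its announcement is withdrawn, and traffic for 198.51.100.0/24 converges on the secondary within BGP convergence time, with no DNS change and no TTL to wait out.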

Skaperen
  • See also http://serverfault.com/questions/245457/what-is-needed-to-use-anycast-ips/245517#245517 – Dan Pritts Sep 06 '12 at 05:22
  • Does this mean I need to have the same IP in datacenter 1 and datacenter 2? If yes, how do you actually do that? – IMB Sep 06 '12 at 06:52
  • Yes. You get portable addresses, either on your own, or by hosting with a provider that can suballocate address space for you out of space it has already set up in multiple data centers. – Skaperen Sep 06 '12 at 20:19

The simplest approach to dual-DC redundancy would be an L2 MPLS VPN between the two sites, along with maintaining the BGP sessions between the two.

You can then essentially just have a physical IP per server and a virtual IP that floats between the two (HSRP/VRRP/CARP, etc.). Your DNS would point to this virtual IP, and traffic would be directed accordingly.
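
To make the floating-IP part concrete, here is a hedged keepalived (VRRP) sketch; the interface name, router ID, priority and address are made-up values, and the peer node would carry the same block with state BACKUP and a lower priority:

```
# /etc/keepalived/keepalived.conf on the MASTER node (hypothetical values)
vrrp_instance VI_1 {
    state MASTER           # the peer node runs "state BACKUP"
    interface eth0         # NIC the virtual IP lives on
    virtual_router_id 51   # must match on both nodes
    priority 150           # higher than the BACKUP node's value
    advert_int 1           # send VRRP advertisements every second
    virtual_ipaddress {
        192.0.2.10/24      # the floating IP your DNS record points at
    }
}
```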

The next consideration would be split brain - but that's another question for another time.

Juniper wrote a good white paper on dual-DC management with MPLS; you can grab the PDF here: http://www.juniper.net/us/en/local/pdf/whitepapers/2000407-en.pdf

Ben Lessani