
I've got a site that is starting to get a lot of traffic, and just the other day we had a network outage at the datacenter where our load balancer (haproxy) is hosted. This worried me because, despite all my efforts to make the system fully redundant, I still could not make our DNS redundant, and I don't think there's an easy solution for it.

The only thing I was able to find was to sign up for DNS failover from places like DNSME, etc., but they cost too much for budding startups. Even their Corporate plan only gives you 50 million queries per month, and we use that up in a week.

So my question is: is there any self-hosted DNS setup that provides failover the way DNSME does it?

Uwe L. Korn
Jae Lee
  • Your title asks about making haproxy redundant, but your question asks about making your DNS redundant. Which is it? – MadHatter Oct 10 '12 at 06:47
  • Your budding startup is getting more then 50 million DNS queries a week? That's quite the startup! That part aside, we use DNSME's DNSFO between two datacenters and it works great (I've been running tests all morning). So if you could get the queries thing figured out, I give a good recommendation for DNSME. – Safado Oct 10 '12 at 18:12
  • @madhatter Sorry, my question wasn't phrased properly. Basically I want to make haproxy redundant by having as many instances as I want, anywhere in the world. That way, if my load balancer goes down, another haproxy can kick in and I can finally sleep easy at night. – Jae Lee Oct 11 '12 at 07:20

2 Answers


DNS is designed for redundancy. Set up two BIND servers in separate datacenters. Configure one as the slave of the other. Make sure both are listed with your domain registrar. Done.

Here's a randomly selected guide on setting up a slave server: http://docstore.mik.ua/orelly/networking_2ndEd/dns/ch04_08.htm
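As a minimal sketch of what those two zone declarations look like (the zone name, file paths, and IPs here are all placeholders):

```
// On the master (ns1), named.conf:
zone "example.com" {
    type master;
    file "zones/example.com.db";
    allow-transfer { 203.0.113.2; };   // the slave's IP
    also-notify { 203.0.113.2; };
};

// On the slave (ns2), named.conf:
zone "example.com" {
    type slave;
    file "slaves/example.com.db";
    masters { 203.0.113.1; };          // the master's IP
};
```

The slave pulls the zone by AXFR from the master, so records only have to be maintained in one place.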

OK, so you're looking for failover by switching your A record to a different IP address. That's fairly easy to do as well if you are using BIND for your name servers. You can write a script that calls nsupdate to add, delete, or change your DNS records. Whatever clustering or monitoring system you use can check whether your load balancer is inaccessible and change the DNS record to point to one that still works; or, if you're using round-robin DNS, remove failed nodes and add them back in when they come back up.
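A rough sketch of that monitor-and-nsupdate approach (the zone, record names, IPs, and key path are all hypothetical; it assumes `nsupdate` is installed and the zone permits dynamic updates with that TSIG key):

```python
import socket
import subprocess

ZONE = "example.com."          # hypothetical zone
RECORD = "www.example.com."    # record to fail over
TTL = 60                       # keep low so caches expire quickly
PRIMARY = "203.0.113.10"       # main haproxy
BACKUP = "203.0.113.20"        # standby haproxy

def haproxy_alive(ip, port=80, timeout=3):
    """Crude health check: can we open a TCP connection to the LB?"""
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return True
    except OSError:
        return False

def nsupdate_script(target_ip):
    """Build the command stream fed to nsupdate: replace the A record."""
    return "\n".join([
        f"update delete {RECORD} A",
        f"update add {RECORD} {TTL} A {target_ip}",
        "send",
    ]) + "\n"

def failover_if_needed():
    if not haproxy_alive(PRIMARY):
        # -k points nsupdate at the TSIG key authorized for dynamic updates
        subprocess.run(["nsupdate", "-k", "/etc/bind/update.key"],
                       input=nsupdate_script(BACKUP), text=True, check=True)
```

Run `failover_if_needed()` from cron or your monitoring system every minute or so; how fast clients actually move over still depends on resolvers honoring the TTL.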

Grant
  • He's not referring to redundancy with the DNS server. DNSME provides a DNS failover solution that changes your A record to a different IP address if the monitor detects that your server is down. – Safado Oct 10 '12 at 18:13
  • @Safado Just a quick comment about part of your comment: I don't think DNSME would do the job adequately, because simply changing the A record IP address isn't a HA "failover" solution - if you just changed the A record, it would still take 24-48 hours+ to propagate. – David W Oct 11 '12 at 00:41
  • If you keep your TTL really low, it will failover faster **sometimes**. Many DNS servers out there don't respect low TTLs and cache for longer. And it increases the load on your DNS server substantially. – Grant Oct 11 '12 at 00:48
  • @grant Safado is right; I wasn't really asking how to make DNS redundant. My question is more about how I can make haproxy redundant across multiple datacenters, where I don't have the ability to assign a virtual IP for heartbeat/keepalived. – Jae Lee Oct 11 '12 at 07:16
  • What I'm currently doing now is, when we detect a network outage, server down, etc., I manually update the IP at the DNS level to point to a different haproxy. But from testing, we noticed 10-20% of traffic still hits the old IP even after 24 hours; this was with the TTL set to 600. So I wonder, how do big companies handle this sort of situation, like Amazon for instance? Surely they must have geographically distributed load balancers all answering as a single IP? – Jae Lee Oct 11 '12 at 07:27
  • You can do that...but then you are relying on updating routing tables when things fail and you have the same problem - it takes time. I don't know of any inter-datacenter failover method that immediately switches 100% of the traffic to the failover site. – Grant Oct 11 '12 at 12:35
  • Our ISP is one of those nice ones that apparently respects TTLs because we have a TTL of 60 on our domain that we're using the DNSFO feature on and the fail over is pretty instantaneous. However, we have yet to have a REAL failover on our production servers occur, so the delay for our client base world wide is still unknown. Through my testing though, I have used online services that query DNS servers all around the world and about 95% of DNS servers I queried (roughly 30 or so) updated within 5 minutes. – Safado Oct 11 '12 at 14:40
  • The small percent that didn't update for more than an hour was an acceptable enough risk to our management that they gave the go ahead for this solution (combined with the fact that it doesn't cost additional money on top of the DNS services we're already getting through DNSME). But like I said, in my tests I've had full failover in under 5 minutes. – Safado Oct 11 '12 at 14:44
  • While researching the solution, I came across this http://serverfault.com/questions/60553/why-is-dns-failover-not-recommended?rq=1 and some of the answers (particularly Scott's and Ryan's) convinced me to give it a shot. And they were right. The number of DNS servers that don't respect the TTL is not nearly as high as those that do. – Safado Oct 11 '12 at 14:52
  • @Jae Lee - One other solution you could look into is Akamai's Global Load Balancing http://www.akamai.com/html/solutions/gtm.html - I really liked this one but I couldn't get the price point approved once I had pitched the DNSFO option. They have an Active/Passive "load balance" option as part of this deal if you're looking to have one active datacenter and one standby datacenter. – Safado Oct 11 '12 at 14:55
  • Thanks, so it looks like DNSFO is really the only method for me. – Jae Lee Oct 12 '12 at 00:38

I do something similar, and run multiple haproxy instances (failover-clustered within each datacenter, even) in several datacenters around the world. I also needed GeoIP-based traffic segregation across these datacenters, so I went with Dyn.com's "Advanced Traffic Management" solution, which routes clients in different regions of the world to their fastest location, and also handles the monitoring and failover you are looking for. Dyn (and I'm sure others) offers the monitoring/failover piece as a stand-alone product too, e.g., http://dyn.com/dns/dynect-managed-dns/active-failover/

If you are trying to do this on the cheap, and by your haproxy being "down" you mean not responding due to a datacenter outage, you could try serving multiple A records from your DNS server for each request. This essentially round-robins requests across your different servers, and lets clients try the others if the first fails.
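In zone-file terms that's just several A records for the same name, ideally with a short TTL (the name and addresses here are placeholders):

```
; round-robin across two haproxy instances in different datacenters
www.example.com.  60  IN  A  203.0.113.10
www.example.com.  60  IN  A  198.51.100.10
```

Note that client retry behavior varies: browsers generally try the next address if the first refuses the connection, but a timeout (as in a datacenter outage) can still mean a slow first page load.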

I do recommend going with a hosted solution though, as it's worked out great for me. I think DnsMadeEasy also offers a similar product for geographic distribution that includes monitoring as well (cheaper than Dyn's).

You could of course build out some solution yourself, but you should weigh the all-up cost of doing that against focusing on the core service your company is offering. It's all about the trade-offs... :)

Also, if your DNS queries really are pushing 50M/week, that sounds like a lot unless you have a lot of one-time visitors (which I actually do). Make sure your TTL settings aren't too low; if they are, you may end up paying for a much bigger hosted plan than you actually need.
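As a back-of-the-envelope check on how TTL drives query volume (the resolver count below is made up, and real resolver caching behavior varies widely):

```python
# Each caching resolver asks the authoritative server at most once per TTL
# window, so an upper bound on monthly query volume is:
#   queries/month ~= active_resolvers * (seconds_per_month / TTL)

SECONDS_PER_MONTH = 30 * 24 * 3600   # 2,592,000

def monthly_queries(active_resolvers, ttl_seconds):
    """Upper bound, assuming every resolver re-queries as soon as its cache expires."""
    return active_resolvers * SECONDS_PER_MONTH // ttl_seconds

# With a hypothetical 10,000 resolvers hitting your zone:
low_ttl  = monthly_queries(10_000, 60)     # TTL 60s -> 432,000,000 queries/month
high_ttl = monthly_queries(10_000, 3600)   # TTL 1h  ->   7,200,000 queries/month
```

In other words, raising the TTL from 60 seconds to an hour can cut the ceiling by 60x, which is exactly the tension with fast DNS failover: low TTLs fail over faster but multiply your query bill.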

JesseP