
I'm sure this is an easy solution, I'm just not too familiar with how DNS works or if that's even related to this problem.

If I'm running a web service on Amazon EC2, distributed across many instances, how can I make a single domain name reach the entire pool of servers, whose membership will change from time to time?

Since the instances may be present one second but gone the next (and vice versa), I need a way to randomly pick an active member of the cluster to route to. The updates would have to be instantaneous. Is this even possible, with DNS caching and all?

ryeguy

2 Answers

3

There are several approaches to this, some of which are within your reach.

In your case I recommend simply publishing multiple DNS records with a relatively short TTL. The distribution is not optimal, and clients may not pick the lowest-latency node. However, it is extremely simple: all you need is the ability to add DNS records. This is a widely used and tested technique.

Should you need to remove a server from the pool, simply remove its DNS records and most clients will stop using it after the TTL expires. The same goes for new servers: add them, and once the TTL has expired clients will start using them.
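To illustrate what a client can do with several A records, here is a minimal Python sketch. It assumes the client resolves every published address itself and falls back to the next one when a connection fails; the function names `resolve_all` and `connect_any` are made up for this example, and real browsers and resolvers each have their own selection logic.

```python
import random
import socket


def resolve_all(hostname, port=80):
    """Return the unique IPv4 addresses published for hostname."""
    infos = socket.getaddrinfo(hostname, port, socket.AF_INET, socket.SOCK_STREAM)
    # Each entry is (family, type, proto, canonname, (address, port)).
    return sorted({info[4][0] for info in infos})


def connect_any(addresses, port=80, timeout=2.0, connect=socket.create_connection):
    """Try the addresses in random order; return the first successful connection.

    `connect` is injectable (an assumption of this sketch) so the logic can be
    exercised without touching the network.
    """
    candidates = list(addresses)
    random.shuffle(candidates)
    last_error = None
    for address in candidates:
        try:
            return connect((address, port), timeout=timeout)
        except OSError as exc:
            last_error = exc  # node is probably down; try the next record
    raise ConnectionError(f"no address reachable: {last_error}")
```

Typical use would be `connect_any(resolve_all("example.com"))`. A client that retries like this hides a dead node at the cost of one connection timeout, but you cannot rely on every client behaving this way.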

Google, for example, uses this as part of its load-balancing techniques:

$ dig A google.com

;; QUESTION SECTION:
;google.com.                    IN      A

;; ANSWER SECTION:
google.com.             297     IN      A       74.125.77.99
google.com.             297     IN      A       74.125.77.104
google.com.             297     IN      A       74.125.77.147

Google extends this: once you have reached their website, they redirect you to a version hosted close to you (and in your language), based on the country they guess you are in. But that is a level you often do not need.

  • Isn't it possible that a client could receive the IP of a downed node before the TTL is up, though? – ryeguy Dec 22 '10 at 19:43
  • Yes, that is possible. You can try a TTL of zero, but I'm not entirely sure how resolvers around the world behave with that. I'm pretty sure the standard allows for it. –  Dec 22 '10 at 20:00
  • Hmm..I guess the best thing to do then is put a couple of HAproxy nodes and put *those* in the DNS since they're less likely to change. When a node goes up or down, I just edit HAproxy's destination list. – ryeguy Dec 22 '10 at 20:05
  • Yeah, it is important to note that while multiple DNS records can be nice and useful, there are public DNS servers that do not advertise TTLs lower than an hour, no matter what your authoritative NS tells them. So this isn't necessarily the best solution if you plan on stopping here, but more one piece of a multi-piece solution. – Charles Dec 22 '10 at 21:21
  • If you know when the nodes are going to go down (i.e. you're cancelling them), then you can arrange for the records to be removed before the instances are decommissioned. Otherwise you might as well just live with occasional loss on proper failures, it'll cost another order of magnitude to fix this tiny window of loss (and by the sounds of it, you don't have the experience). Pick your fights... – Dominic Cleal Dec 22 '10 at 21:30
  • @Erik: Sorry, but downvoted. DNS round robin is *not* a high availability solution, and this is well known. See e.g. http://serverfault.com/questions/101053/is-round-robin-dns-good-enough-for-load-balancing-static-content –  Dec 22 '10 at 22:16
2

The updates would have to be instantaneous. Is this even possible, with dns caching and all?

No, it's not possible with DNS. DNS records are served with a Time To Live (TTL), which specifies how long caches may use the record without checking back with the authoritative DNS server. And for various reasons, DNS TTLs can't effectively be less than 10 minutes. DNS round robin is not the solution to load balancing, at least not if you need service uptime. See e.g. this older question by Jeff Atwood.

You can use third-party DNS services that combine DNS round robin with proactive monitoring of the servers and automatically remove dead servers from DNS. It's not a good solution, but it can be good enough for less important sites, and it's trivial to set up using e.g. DNSMadeEasy or EdgeDirector.
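The monitoring idea can be sketched roughly as follows in Python. This is not how DNSMadeEasy or EdgeDirector work internally; it only shows the general shape, assuming a plain TCP connect counts as a health check. The names `tcp_alive` and `plan_dns_changes` are hypothetical, and actually applying the changes would go through your DNS provider's own API, which is not shown.

```python
import socket


def tcp_alive(address, port=80, timeout=1.0):
    """Return True if a TCP connection to address:port succeeds."""
    try:
        with socket.create_connection((address, port), timeout=timeout):
            return True
    except OSError:
        return False


def plan_dns_changes(published, pool, is_alive=tcp_alive):
    """Compare the published A records with the currently healthy pool.

    published: addresses currently in DNS
    pool:      every address that is supposed to serve traffic
    Returns (to_add, to_remove) as sorted lists of addresses.
    """
    healthy = {address for address in pool if is_alive(address)}
    to_add = sorted(healthy - set(published))
    to_remove = sorted(set(published) - healthy)
    return to_add, to_remove
```

Run on a schedule, this pulls dead servers out of DNS, but clients and caches still hold the old records until the TTL expires, which is why it is only "good enough" rather than instantaneous.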

The industry standard way to handle webserver availability is a Layer 4 or Layer 7 load balancer in front of the webservers.
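The core of what such a load balancer does with a changing pool can be sketched in a few lines of Python. This is a toy illustration only: `BackendPool` is a made-up class, and real balancers such as HAProxy add health checks, connection handling and much more. It shows why pool changes take effect immediately: the backend is chosen per request from the current list, so no cache is involved.

```python
import itertools
import threading


class BackendPool:
    """Round-robin selection over a backend list that can change at runtime."""

    def __init__(self, backends=()):
        self._lock = threading.Lock()
        self._backends = list(backends)
        self._counter = itertools.count()

    def add(self, backend):
        with self._lock:
            if backend not in self._backends:
                self._backends.append(backend)

    def remove(self, backend):
        with self._lock:
            if backend in self._backends:
                self._backends.remove(backend)

    def pick(self):
        """Return the next backend; the very next call sees any add/remove."""
        with self._lock:
            if not self._backends:
                raise RuntimeError("no backends available")
            index = next(self._counter) % len(self._backends)
            return self._backends[index]
```

Only the address of the balancer itself goes into DNS, so the record rarely needs to change while instances behind it come and go.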

web service on amazon ec2, distributed across many instances, how can I make it so a single domain name can be used to access the entire pool of servers

Amazon offers a plug'n'play service for this, called Amazon Elastic Load Balancing. Basically it's a managed service from Amazon, which sets up a Layer 7 (HTTP) or Layer 4 (TCP) load balancer in front of your EC2 web servers.

Another common option is to set up an EC2 instance with a L7 load balancer such as nginx, HAProxy, Apsis Pound, Apache 2.2, Zeus Load Balancer or something else (there are several). But if you go this route, you will need to manage the OS + load balancing software yourself, and consider how to make the EC2 load balancer instance itself sufficiently highly available.