70

We have a set of shared, static content that we serve up between our websites at http://sstatic.net. Unfortunately, this content is not currently load balanced at all -- it's served from a single server. If that server has problems, all the sites that rely on it are effectively down, because the shared resources are essential shared JavaScript libraries and images.

We are looking at ways to load balance the static content on this server, to avoid the single server dependency.

I realize that round-robin DNS is, at best, a low end (some might even say ghetto) solution, but I can't help wondering -- is round robin DNS a "good enough" solution for basic load balancing of static content?

There is some discussion of this in the [dns] [load-balancing] tags, and I've read through some great posts on the topic.

I am aware of the common downsides of DNS load balancing through multiple round-robin A records (a minimal zone sketch follows this list):

  • there are typically no heartbeats or failure detection with DNS records, so if a given server in the rotation goes down, its A record must be removed from the DNS entries manually
  • the time to live (TTL) must necessarily be set quite low for this to work at all, since DNS entries are cached aggressively throughout the internet
  • the client computers are responsible for seeing that there are multiple A records and picking the correct one
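
For concreteness, the whole mechanism is nothing more than several A records for the same name with a short TTL; a minimal BIND-style zone sketch (the TTL and addresses here are illustrative placeholders):

    sstatic.net.    300    IN    A    203.0.113.10
    sstatic.net.    300    IN    A    203.0.113.11
    sstatic.net.    300    IN    A    203.0.113.12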

But, is round robin DNS good enough as a starter, better than nothing, "while we research and implement better alternatives" form of load balancing for our static content? Or is DNS round robin pretty much worthless under any circumstances?

Jeff Atwood
  • HAProxy not an option? – Howiecamp Jan 09 '10 at 03:20
  • as I said in the post, this is a specific question about *this* solution -- can we stay on topic? – Jeff Atwood Jan 09 '10 at 03:43
  • load balancing (http://en.wikipedia.org/wiki/Load_balancing_%28computing%29) is very different from redundancy (http://en.wikipedia.org/wiki/Redundancy_%28engineering%29). As Jeff stated in his opening paragraph, he's looking for a means of removing a single point of failure (redundancy), not actual load balancing. Can someone retag? – antony.trupe Jan 09 '10 at 04:51
  • true, I wasn't precise... I think of them in similar terms, but they're technically different. To me one implies the other -- can you even have load balancing *without* redundancy? – Jeff Atwood Jan 09 '10 at 08:05
  • @jeff - absolutely, a dumb load balancer (which plain round robin DNS is) does not do redundancy. It's even harder if you're talking about balancing / redundancy across multiple sites. – Alnitak Jan 09 '10 at 18:00
  • You've been badly misinformed. You don't need a low TTL. You don't need to reconfigure DNS in the event of a failure. The DNS server (not the client) defines the preference list - the client should only choose a different host if the first choice is unavailable. You can easily add failure detection to your monitoring system by adding a unique IP name for each node as well as the round-robin name. Having worked with several mid-scale websites, round robin has consistently proved more reliable and cheaper than a dedicated load balancing controller. – symcbean Aug 26 '10 at 15:31
  • @symcbean No, if a DNS answer contains multiple responses the client may choose any of them, and nothing should be assumed based on the order returned by the server. In particular, a recursive server may send out the answers in a completely different order to the original authoritative server. – Alnitak Sep 07 '11 at 07:36
  • @Alnitak: please go read the definitions for 'may', 'should', 'must' included in most RFCs. Yes, order is not significant - but what's that got to do with what we are talking about? – symcbean Sep 07 '11 at 10:54
  • @symcbean I am intimately familiar with the terminology documented in RFC 2119. You said that the DNS server defines the preference list. Unless you have some particularly odd definition of "preference list", that is simply not true. – Alnitak Sep 07 '11 at 13:40
  • I found this article really helpful for explaining how to combine Round-Robin DNS and software load balancers: http://www.rightscale.com/blog/enterprise-cloud-strategies/dns-load-balancing-and-using-multiple-load-balancers-cloud – OzzieOrca Jan 11 '15 at 07:54

18 Answers

59

Jeff, I disagree. Load balancing does not imply redundancy; in fact, it's quite the opposite. The more servers you have, the more likely you are to have a failure at any given instant. That's why redundancy IS mandatory when doing load balancing, but unfortunately there are a lot of solutions which only provide load balancing without performing any health checks, resulting in a less reliable service.

DNS round robin is excellent for increasing capacity, by distributing the load across multiple points (potentially geographically distributed). But it does not provide fail-over. You must first describe what type of failure you are trying to cover. A server failure must be covered locally using a standard IP address takeover mechanism (VRRP, CARP, ...). A switch failure is covered by resilient links from the server to two switches. A WAN link failure can be covered by a multi-link setup between you and your provider, using either a routing protocol or a layer-2 solution (e.g. multi-link PPP). A site failure should be covered by BGP: your IP addresses are replicated over multiple sites and you announce them to the net only where they are available.

From your question, it seems that you only need to provide a server fail-over solution, which is the easiest one since it involves neither extra hardware nor a contract with any ISP. You just have to set up the appropriate software on your servers, and it's by far the cheapest and most reliable solution.

You asked "what if an HAProxy machine fails?". It's the same. All the people I know who use HAProxy for load balancing and high availability have two machines and run either ucarp, keepalived, or heartbeat on them to ensure that one of them is always available.
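
As an illustration, a minimal keepalived configuration for such an active/backup pair might look like the sketch below (the interface, password, and virtual IP are placeholder assumptions, not a definitive setup):

    vrrp_instance VI_1 {
        state MASTER               # use BACKUP on the second machine
        interface eth0
        virtual_router_id 51
        priority 101               # lower (e.g. 100) on the backup
        advert_int 1
        authentication {
            auth_type PASS
            auth_pass s3cret
        }
        virtual_ipaddress {
            192.0.2.100            # the address clients actually connect to
        }
    }

Whichever machine holds the virtual address answers traffic; if it dies, the backup claims the address within seconds and clients keep using the same IP.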

Hoping this helps!

Willy Tarreau
  • BTW you might be interested in an article I wrote about 4 years ago on these concepts: http://1wt.eu/articles/2006_lb/ (take the PDF; reading the HTML through the pages is boring). – Willy Tarreau Jan 09 '10 at 11:19
  • -1: "does not provide fail-over" - yes it does - and it implements it at the only place where non-availability can be reliably determined - at the client. – symcbean Aug 26 '10 at 15:21
  • Not at all. It would work if DNS did not make use of caches, but this is not the case, and clients can't force caches to refresh. Talk to any person who regularly switches DNS entries and they'll tell you that even though they observe an 80% switch in 5 minutes, it generally takes more than one week to get close to 100%. So DNS does not provide fail-over. – Willy Tarreau Aug 28 '10 at 19:10
  • A simple example of "load balancing without redundancy" is RAID0. – robbyt Feb 26 '11 at 23:28
  • Willy, you're right that DNS records take ages to update. But RR-DNS with browsers is handled at the browser level, trying each IP one after the other if the first one sent by the DNS seems down. In that case, you never change your DNS records, so there are no updates to wait for. – Yvan Aug 14 '17 at 13:18
21

As load-balancing, it's ghetto but more-or-less effective. If you had one server that was falling over from the load, and wanted to spread it to multiple servers, that might be a good reason to do this, at least temporarily.

There are a number of valid criticisms of round-robin DNS as load "balancing," and I wouldn't recommend doing it for that other than as a short-term band-aid.

But you say your primary motivation is to avoid a single-server dependency. Without some automated way of taking dead servers out of rotation, it's not very valuable as a way of preventing downtime. (With an automated way of pulling servers from rotation and a short TTL, it becomes ghetto failover. Manually, it's not even that.)

If one of your two round-robined servers goes down, then 50% of your customers will get a failure. This is better than 100% failure with only one server, but almost any other solution that did real failover would be better than this.

If the probability of failure of one server is N, with two servers the probability that at least one of them is down is 1 - (1 - N)^2, or roughly 2N for small N. Without automated, fast failover, this scheme increases the probability that some of your users will experience a failure.

If you plan to take the dead server out of rotation manually, you're limited by the speed with which you can do that and the DNS TTL. What if the server dies at 4 AM? The best part of true failover is getting to sleep through the night. You already use HAProxy, so you should be familiar with it. I strongly suggest using it, as HAProxy is designed for exactly this situation.
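
For reference, a minimal HAProxy configuration for fronting two static-content servers could look like this sketch (the server addresses and health-check path are placeholder assumptions):

    frontend static_in
        mode http
        bind *:80
        default_backend static_servers

    backend static_servers
        mode http
        balance roundrobin
        option httpchk GET /health     # a failing check pulls the server out automatically
        server static1 192.0.2.11:80 check
        server static2 192.0.2.12:80 check

The "check" keyword is what buys you the sleep-through-the-night property: dead servers leave the rotation without anyone editing DNS.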

Schof
  • totally off-topic, but we also have the problem of needing multiple HAProxy instances to fail over to -- what if the HAProxy machine fails? Subject of future questions, though, REALLY off topic for this one. – Jeff Atwood Jan 09 '10 at 03:55
  • +1 - The "With an automated way ... it becomes ghetto failover. Manually it's not even that." should be in big bold letters. DNS round-robin becomes a *liability* if you're not monitoring machines and removing them from the DNS if they fail, and the only reasonable way to do this is with an automated solution. There are much better solutions than DNS round-robin. – Evan Anderson Jan 09 '10 at 04:56
  • totally agree, but 20% of your customers calling you with complaints *is* better than 100% of them calling with complaints... – Jeff Atwood Jan 09 '10 at 08:07
  • The key point (for me) that Schof makes in answering Jeff's question is that without fast failover, Round Robin means that over time you have more customers impacted than without it, but each (more frequent) incident impacts only a subset of customers rather than all of them. Whether this is "better" or not depends on the scenario, but in most cases I would say it is not. – Helvick Jan 10 '10 at 21:39
  • `The best part of true failover is getting to sleep through the night.` That is one clear definition! – Basil Bourque Feb 06 '14 at 10:23
  • `If one of your two round-robined servers goes down, then 50% of your customers will get a failure` is wrong if your clients are browsers. I shut down webservers and all the traffic goes to the other servers without any issue. – Yvan Aug 14 '17 at 13:14
15

Round robin DNS is not what people think it is. As an author of DNS server software (namely, BIND), we get users who wonder why their round robin stops working as planned. They don't understand that even with a TTL of 0 seconds there will be some amount of caching out there, since some caches impose a minimum TTL (often 30-300 seconds) no matter what.

Also, while your AUTH servers may do round robin, there is no guarantee the ones you care about -- the caches your users speak to -- will. In short, round robin doesn't guarantee any ordering from the client's point of view, only what your auth servers provide to a cache.
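
You can observe this from the outside: repeated queries through a recursive resolver often come back in rotated or arbitrary order. The output below is illustrative, not a real capture:

    $ dig +short sstatic.net A
    203.0.113.11
    203.0.113.10
    $ dig +short sstatic.net A
    203.0.113.10
    203.0.113.11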

If you want real failover, DNS is but one step. It's not a bad idea to list more than one IP address for two different clusters, but I'd use other technology there (such as simple anycast) to do the actual load balancing. I personally despise load balancing hardware which mucks with DNS, as it usually gets it wrong. And don't forget DNSSEC is coming, so if you do choose something in this area, ask your vendor what happens when you sign your zone.

Michael Graff
  • and some DNS servers (or the control panels) are configured to give you a TTL of 7200 regardless of what you set it to - some large hosting companies do this IIRC. – gbjbaanb Jan 09 '10 at 14:08
  • Caching is accounted for in round-robin DNS. Every cache on the way will rotate the list on each subsequent query, so if two customers at the same ISP ask their ISP's DNS resolver for a particular name, they will get a different address as the first response, and over a large number of requests this evens out fairly nicely. – Simon Richter Jul 31 '20 at 13:00
14

I've said it several times before, and I'll say it again - if resiliency is the problem then DNS tricks are not the answer.

The best HA systems will allow your clients to keep using the exact same IP address for every request. This is the only way to ensure that clients don't even notice the failure.

So the fundamental rule is that true resilience requires IP routing level trickery. Use a load-balancer appliance, or OSPF "equal cost multi-path", or even VRRP.

DNS on the other hand is an addressing technology. It exists solely to map from one namespace to another. It was not designed to permit very short term dynamic changes to that mapping, and hence when you try to make such changes many clients will either not notice them, or at best will take a long time to notice them.

I would also say that, since load isn't a problem for you, you might just as well have another server ready to run as a hot standby. If you use dumb round robin you have to proactively change your DNS records when something breaks, so you might just as well proactively flip the hot standby server into action and not change your DNS.

Alnitak
9

I've read through all the answers, and one thing I didn't see mentioned is that most modern web browsers will try one of the alternative IP addresses if a server is not responding. If I remember correctly, Chrome will even try multiple IP addresses and continue with the server that responds first. So in my opinion DNS round robin load balancing is always better than nothing.

BTW: I see DNS round robin more as a simple load distribution solution.

SjorsH
  • Oops, didn't see your answer before posting mine, so +1 on yours so that the truth comes out! – Yvan Aug 14 '17 at 13:11
5

I'm late to this thread, so my answer will probably just hover alone at the bottom, neglected, sniff sniff.

First off, the right answer to the question is not to answer the question, but to say:

  1. "You probably want Windows Network Load Balancing instead." OR
  2. "Get with the times, place your static content on something like Cloud Files or S3, and have a CDN mirror it worldwide."

NLB is mature, well suited to the task, and pretty easy to set up. Cloud solutions come with their own pros and cons, which are outside the scope of this question.

Question

is round robin DNS good enough as a starter, better than nothing, "while we research and implement better alternatives" form of load balancing for our static content?

Between, say, 2 or 3 static web servers? Yes, it is better than nothing, because there are DNS providers who will integrate DNS Round Robin with server health checks, and will temporarily remove dead servers from the DNS records. So in this way you get decent load distribution and some high availability; and it all takes less than 5 minutes to set up.

But the caveats outlined by others in this thread do apply:

  • Current Microsoft browsers cache DNS data for 30 minutes, so you're looking at more than 30 minutes failover time for a subset of your users, depending on their initial DNS cache state.
  • What the user sees during fail-over can be ... strange (you're not using auth on static content, and certainly not form auth, but the link shows something to watch out for).

Other solutions

HAProxy is fantastic, but since Stack Overflow is on the Microsoft technology stack, maybe using the Microsoft load balancing & high availability tools will have less admin overhead. Network Load Balancing takes care of one part of the problem, and Microsoft actually has a L7 HTTP reverse proxy / load balancer now.

I have never used ARR myself, but given that it's on its second major release, and coming from Microsoft, I assume it has been tested well enough. It has easy-to-understand docs; here is one on how they see distribution of static and dynamic content on webnodes, and here is a piece on how to use ARR with NLB to achieve both load distribution and high availability.

5

It's remarkable how many of the contributors are helping spread disinformation about DNS round robin as a load-spreading and resilience mechanism. It does usually work, but you do need to understand how it works, and avoid the mistakes caused by all that disinformation.

1) The TTL on DNS records used for Round robin should be short - but NOT ZERO. Having the TTL at zero breaks the main way that resilience is provided.

2) DNS RR spreads load, but it does not balance it. It spreads load because, over a large client base, clients tend to query the DNS server independently and so end up with different first-choice DNS entries. Those different first choices mean the clients are serviced by different servers, and the load is spread out. But it all depends on which device is doing the DNS query and how long it holds the result. A common example is that all the clients behind a corporate proxy (which performs the DNS query for them) will end up targeting a single server. Load is spread - but it isn't balanced evenly.

3) DNS RR provides resilience as long as the client software implements it properly (and both the TTL and the user's attention span aren't too short). This is because DNS round robin provides an ordered list of server IP addresses, and the client software should try to contact each one of them in turn, until it finds a server that accepts the connection.

So if the first-choice server is down, the client's TCP/IP connection times out, and provided neither the TTL nor the attention span has expired, the client software makes another connection attempt to the second entry in the list - and so on until the TTL expires, or it gets to the end of the list (or the user gives up in disgust).

A long list of broken servers (your fault) and large TCP/IP connect retry limits (a client configuration misfeature) can make for a long period before the client actually finds a working server. Too short a TTL means that it never gets to work its way to the end of the list, and instead issues a new DNS query and gets served a new list (hopefully in a different order).

Sometimes the client gets unlucky and the new list still starts with broken servers. To give the system the best chance of providing client resilience, you should ensure the TTL is longer than the typical attention span, and long enough for the client to work its way to the bottom of the list.

Once the client has found a working server it should remember it, and when it needs to make the next connection it should not repeat the search (unless the TTL has expired). A longer TTL reduces the frequency with which users experience a delay while the client searches for a working server - giving a better experience.
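
The behaviour described above boils down to: resolve once, then walk the address list until a connection succeeds, and remember the winner. A minimal Python sketch of that client-side logic (the host name and port are placeholders; real browsers add timeouts and caching on top of this):

    import socket

    def connect_round_robin(host, port, timeout=3):
        # Resolve once; getaddrinfo returns every A/AAAA record for the name.
        addresses = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        last_error = None
        for family, socktype, proto, _, sockaddr in addresses:
            try:
                # Try each address in turn until one accepts the connection.
                return socket.create_connection(sockaddr[:2], timeout=timeout)
            except OSError as exc:
                last_error = exc   # remember the failure, move to the next entry
        raise last_error or OSError("no addresses found for %s" % host)

    conn = connect_round_robin("sstatic.net", 80)

(Python's own socket.create_connection performs this walk internally when given a host name; the loop is spelled out here only to make the logic visible.)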

4) DNS TTL comes into its own when you want to manually change the DNS records (e.g. to remove a long-term broken server). A short TTL allows that change to propagate quickly (once you have got around to making it), so consider the balance between how long it will take before you know about the issue and make that manual change, and the fact that normal clients will only have to do a new search for a working server once the TTL expires.

DNS round robin has two outstanding features that make it very cost-effective in a wide range of scenarios: firstly, it's free, and secondly, it is almost as geographically dispersed as your client base.

It does not introduce a new 'unit of failure' which all the other 'clever' systems do. There are no added components which could experience a common and simultaneous failure over a whole load of inter-linked elements.

The 'clever' systems are great, and introduce wonderful mechanisms to coordinate and provide a seamless balancing and fail-over mechanism, but ultimately the very methods that they use to provide that seamless experience are their Achilles heel - the additional complicated thing that can go wrong, and when it does, will provide a seamless experience of failure system-wide.

So YES, DNS round robin is definitely "good enough" for your first step beyond a single server hosting all your static content in one place.

Old Fogey
  • And I forgot to say that the mechanism is rather dumb. It works when the server fails totally, but not when it is merely 'unhelpful' or 'unhealthy'. A server that merely returns HTTP 500 errors in response to each and every request, will not be removed from the DNS RR list, and will carry on frustrating its random share of your client base. The 'clever' mechanisms should always implement a robust health check that can ditch a zombie like that. – Old Fogey Aug 13 '16 at 13:33
  • If you have good logic behind the RR-DNS, you won't return 500 errors. Use Varnish with directors, for example, and you can query multiple backend servers until one answers correctly. If you have RR, it means that you have multiple backends, so you should not treat each as if it were on its own. Or you should monitor 500 errors and take automatic or manual measures when they occur. But you're right to point out that the webserver must be fully down for browsers to handle RR accordingly. – Yvan Aug 14 '17 at 13:09
  • Just a comment to thank you for your answer. I don't understand why the top answer doesn't recommend RR, which is a first step to an HA infrastructure, simple and easy to implement. – Jérôme B Jun 03 '19 at 17:20
4

Windows Vista & Windows 7 implement client support for round robin differently, because they backported the IPv6 address selection rules (RFC 3484) to IPv4: roughly, addresses are preferred by longest matching prefix against the client's own address, instead of being used in the order returned.

So, if you have significant numbers of Vista, Windows 7, and Windows 2008 users, you're likely to find the behavior of your ersatz load balancing solution inconsistent with what you planned.

duffbeer703
  • ah, thank you, excellent, I was looking for this link -- I had heard about this but couldn't find the reference! – Jeff Atwood Jan 10 '10 at 10:08
4

I've always used round-robin DNS, with a long TTL, as a load balancer. It works really well for HTTP/HTTPS services with browsers.

I really do stress «with browsers», as most browsers implement some sort of «retry on another IP», but I don't know how other libraries or software would handle the multiple-IP solution.

When the browser doesn't get a reply from one server, it will automatically try the next IP, and then stick with it (until that one is down... and then it tries another one).

Back in 2007, I did the following test:

  • add an iframe on my website, pointing to one Round-Robin entry, such as http://roundrobin.test:10080/ping.php
  • the page was served by 3 PHP sockets, listening on 3 different IPs, all on port 10080 (I couldn't afford to test on port 80, as my website was running on it)
  • one socket (say A) was there to check that the browser could connect on the 10080 port (as many companies allow only standard ports)
  • the other two sockets (say B and C) could be enabled or disabled on the fly.

I let it run for one hour, and collected a lot of data. The results were that for 99.5% of the hits on socket A, I had a hit on either socket B or C (I didn't disable both of those at the same time, of course). Browsers were: iPhone, Chrome, Opera, MSIE 6/7/8, BlackBerry, Firefox 3/3.5... So even not-that-compliant browsers were handling it right!

To this day, I have never tested it again, but perhaps I'll set up a new test one day, or release the code on github so that others can test it.

Important note: even if it works most of the time, it doesn't remove the fact that some requests will fail. I do use it for POST requests too, as my application will return an error message in case it doesn't work, so that the user can send the data again, and most probably the browser will use another IP in that case and the save will work. And for static content, it works really well.

So if you're working with browsers, do use Round-Robin DNS, either for static or dynamic content, you'll be mostly fine. Servers can also go down in the middle of a transaction, and even with the best load-balancer you can't handle such a case. For dynamic content, you have to make your sessions/database/files synchronous, else you won't be able to handle this (but that's also true with a real load-balancer).

Additional note: you can test the behaviour on your own IP using iptables. For example, before your firewall rule for HTTP traffic, add:

iptables -A INPUT -p tcp --dport 80 --source 12.34.56.78 -j REJECT

(where 12.34.56.78 is obviously your IP)
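
To put the server back into the test, delete the matching rule (same parameters, -D instead of -A):

    iptables -D INPUT -p tcp --dport 80 --source 12.34.56.78 -j REJECT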

Don't use DROP, as it leaves the port filtered, and your browser will wait until it times out. So now you can enable or disable one server or the other. The most obvious test is to disable server A, load the page, then enable server A and disable server B. When you load the page again, you'll see a little wait from the browser, then it will load from server A again. In Chrome, you can confirm the server's IP by looking at the request in the network panel. In the General section of its Headers tab, you'll see a fake header named Remote Address. This is the IP from which you got an answer.

So, if you need to put one server into maintenance mode, just disable its HTTP/HTTPS traffic with one iptables REJECT rule, and all requests will go to the other servers (with one little wait, almost unnoticeable for users).

Yvan
1

I do not think it is a good enough solution, because let's say you have two servers now, and you round robin using DNS to each server's IP address. When one server goes down, the DNS servers have no knowledge that it went down and will continue to serve its IP address as part of the RR process. Then 50% of your audience will get a broken site missing JavaScript or images.

Perhaps it is easier to point to a common IP address that is handled by Windows NLB, representing two servers behind it. Unless you are using a Linux server for your static content, as I seem to remember reading somewhere?

icelava
  • NLB is just round-robin at the server NICs, rather than at the DNS server. For this on Linux you want a HA solution - RedHat has one, or look at UltraMonkey for lots of detail. – gbjbaanb Jan 09 '10 at 14:06
  • Yes, I know what NLB does. I'm recommending it over DNS RR because a server failure won't cripple half the users. – icelava Jan 09 '10 at 16:59
  • @gbjbaanb or put another way, NLB is round robin at Layer 2. DNS based round robin is at (or depends on) Layer 7 – Alnitak Sep 07 '11 at 07:40
1

Round-robin load balancing only works when you are also in control of the DNS zone, so that you can change the list of servers and push it to the zone masters in a timely manner.

As mentioned in one of the other answers, the hidden evil of round robin is DNS caching, which can happen anywhere between your servers and the client and which completely negates the small benefit of this solution. Even with the DNS TTL set to a very low value, you have little control over how long ISPs or even the client's DNS cache will keep the now-dead IP address active.

It's an improvement over a SPOF for sure, but only a marginal one. I would take a look at whoever is hosting your server and see what they have to offer; many have some sort of basic load balancer service they can provide.

You may as well have a single server with the static content duplicated in S3 and switch to the S3 CNAME when your primary goes down. You will end up with the same delay but without the multiple server cost.

bear
1

This really depends on what you're talking about and how many servers you're rotating through. I once had a site that ran on several servers, and I used DNS round robin on it, mainly due to my being a novice at the time, and it really wasn't a big issue. It wasn't a big issue because it didn't crash. It was a really stupid, non-complicated system, so it held up, and it had a pretty constant traffic level. If it did crash from traffic, it was during the day and something I could easily take care of. I'd say your static content qualifies as simple enough not to cause crashes on its own.

Outside of hardware failure etc., how stable has your server been? How "spiky" is your traffic on this content? Assuming straight-up Apache or something similar and relatively flat traffic, it's not going to crash a lot, and I would say round robin is "good enough".

I'm sure I'll get downvoted because I'm not preaching a 100% HA solution, but that's not what you asked for. It comes down to what you're willing to accept as a solution vs. the effort spent.

UltimateBrent
1

If you were using RR DNS for load balancing, it would be fine, but you aren't. You're using it to enable a redundant server, in which case it is not fine.

As a previous post said, you need something to detect a lost heartbeat and stop sending traffic to a server until it comes back.

The good news is heartbeat is available really cheaply, either in switches or in Windows.

Dunno about other OSs but I assume it's there as well.

1

I suggest that you assign an additional IP address to each of your servers (in addition to the static IP that you use for, say, SSH), and take that into the DNS pool. Then you use some software to switch these IP addresses around in case a server fails. Heartbeat or CARP can do that, for example, but there are other solutions out there.

This has the advantage that for the clients of your service, nothing has to change in the setup, and you don't have to worry about DNS caching or TTL, but you can still take advantage of the DNS round-robin "load balancing".
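
On Linux, the takeover itself is just a matter of moving the service address and announcing it; a sketch of what the failover software effectively does (the interface and address are placeholder assumptions):

    # on the surviving host, claim the failed server's service IP
    ip addr add 192.0.2.21/24 dev eth0
    # send gratuitous ARP so the LAN learns the new MAC for this IP
    arping -U -I eth0 -c 3 192.0.2.21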

Peter Eisentraut
1

It'll probably do the job, especially if you can have multiple IPs on your static boxes. Have one "serve static content" IP and one "manage machine" IP. If a box then goes down, you can use either an existing HA solution or manual intervention to bring the IP from the failed machine up on one of the other "cluster members" or on a completely new machine (depending on how fast it would be to get that up and running).

However, such a solution will have some small issues. The load balancing will not be anywhere close to perfect and if you're relying on manual intervention you may have outages for some visitors.

A hardware load balancer can probably do a better job of both sharing the load and providing "cluster uptime" than DNS round robin will. On the flip side, that is one (or two, since ideally you have the LBs in an HA cluster) piece of hardware that will need buying, power and cooling, and (possibly) some time to get acquainted with (if you do not already have dedicated load balancers).

Vatine
1

To succinctly answer the question (is round robin DNS good enough as a starter, better than nothing, "while we research and implement better alternatives" form of load balancing for our static content?), I would say that it is better than nothing, but you should definitely continue to research other forms of load balancing.

hmallett
1

When researching Windows Load Balancing several years ago, I saw a document that stated that Microsoft's web farm was configured as multiple load-balancing groups, with DNS round robin between them. Since you can have multiple DNS servers responding in each namespace, and since Microsoft's load balancing is self-healing, this provides both redundancy and load balancing.

Downside: you need at least 4 servers (2 servers x 2 groups).

Answering Jeff's comment on Schof's answer, is there a way to DNS round-robin between HAProxy servers?

Graham Powell
0

It has very marginal use, enough to get you by while you put a real solution in place. Like you say, the TTLs have to be set quite low. This has the side benefit, though, of letting you pull a problematic machine out of DNS while it's having issues. Say you have SvrA, SvrB and SvrC handing out your content, and SvrA goes down. You pull it out of DNS, and after the short time period defined by your low TTL, resolvers will figure out a different server (SvrB or SvrC) that is up. You get SvrA back online and put it back into DNS. A short downtime for some folks, none for others. Not great, but workable. The more static servers you put in the mix, the less likely you are to have majority groups of users down.
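
If your zone allows dynamic updates, pulling a dead server can even be scripted rather than done by hand; a sketch using nsupdate (the name server, key file, and address are placeholder assumptions):

    nsupdate -k /etc/bind/ddns.key <<'EOF'
    server ns1.example.com
    update delete sstatic.net. A 203.0.113.10
    send
    EOF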

You certainly will not get the true balanced distribution that a real load balancing solution will provide due to the topology of the Internet. I'd still watch the load on all of the servers involved.

squillman
  • the content is 100% static so the load is negligible -- even on one server. It's mostly bandwidth. – Jeff Atwood Jan 09 '10 at 03:17
  • All out the same pipe? – squillman Jan 09 '10 at 03:35
  • TTLs are most of the time not honoured by the DNS servers you'll hit along the way. Each DNS server does what its administrator wants, and most of them would never allow a TTL of 5 minutes, which would mean reloading the data from the DNS source every 5 minutes... the best way to overload a DNS server for no valid reason. And you're wrong about «marginal use»: Google uses it for all its search servers... and I really doubt they're the only ones to do it. RR-DNS is great, when you know what it does. – Yvan Aug 14 '17 at 13:03