A bit of a bodged up title but I don't know enough of the subject to come up with a more suitable one.

I've read time and time again that anycast is a great solution for load balancing and is preferred over DNS-based load balancing. However, anycast appears to offer only load balancing and no help with redundancy, whereas a plain DNS setup (i.e. just multiple A records) doesn't offer any real load balancing but does appear to offer better redundancy.

I have been taking a closer look at DNS services and noticed that in 2016 Dyn suffered an outage: https://en.wikipedia.org/wiki/2016_Dyn_cyberattack. That raises a few questions:

1) If something goes wrong with the server behind a particular anycast announcement, are other routes automatically tried? If so, why did Dyn suffer such an outage - or is this due to DNS running on UDP?

For example, if we are trying to connect to a blue node, and follow the route 1-2-6, and find route 6 is broken (cannot connect to server or some error), will routes 1-2-5 or 1-3-4 automatically be tried?

[Figure: routing diagram with the numbered routes referred to above; image not reproduced]

2) Is there anything that a client could do to mitigate this problem?

3) It seems to me that anycast is more likely to sacrifice a particular region to keep other regions online, whereas a DNS round-robin approach would not offer the same performance but would cushion such an attack better. So why (assuming my thoughts are correct) does there seem to be a big push for anycast, and less of a push for round-robin DNS services that return the servers in an order relevant to the user?

I'm aware of the question "Multiple data centers and HTTP traffic: DNS Round Robin is the ONLY way to assure instant fail-over?", but I don't consider this a duplicate, as I'm interested in the reasons why anycast can fail as it does.

R4D4

1 Answer

So first off let's briefly review what's implied by using anycast for DNS:

  1. A given IP address a is the resolver that we wish to make more available. Host a is a member of the /24 subnet A. Anycast can be accomplished with specific host routes (i.e. a/32) but this is generally only seen within private networks, not on the general Internet.

  2. There is some mechanism in place such that the A subnet is dynamically announced only when the corresponding DNS service is operational. Please note (and this is really important) that the advertisement itself could be coming from a single host within a site that runs a resolver, or from an entire physical site containing multiple instances of said resolver (i.e. many hosts running resolvers, with the site as a whole sharing a single route).

  3. The same route (A) will be advertised from multiple points on the public Internet. This might take the form of a large provider (read: points of presence dispersed across the globe) presenting the same route at each point of interconnection with foreign networks or the same route coming from points hosted within multiple carriers.

So - when an arbitrary client sends a packet toward the anycast IP, said packet will tend to find its way to the "closest" point of advertisement. I've put scare-quotes around closest because it's only close in the sense of how the routing topology has been laid out and what policies are in place for the routers along the way. It's entirely possible that the closest instance of the anycast address might actually be the furthest physically.

If, in turn, the point at which this route is advertised fails (...which could be the result of the service failing on the host and the route being withdrawn, or of a more traditional network reachability issue) then packets bound to the anycast address will be routed to the next-closest (again - in routing protocol terms) instance of the route. During network reconvergence the client's resolution might fail and be re-attempted, with the re-attempt now following a longer path to reach what is - apparently - the same address. This is all transparent to both the client process and the user and is best thought of in network terms as following an alternate path to a given network.
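The selection and failover behaviour described above can be sketched as a toy model. This is only an illustration, not how any real router is implemented: the site names and cost values are hypothetical, and `path_cost` is a stand-in for whatever the routing protocol and local policy actually compare (AS-path length, local preference, IGP metric and so on).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Advertisement:
    site: str       # hypothetical point of presence announcing the anycast prefix
    path_cost: int  # stand-in for routing "distance" as seen by one client network


def best_site(ads):
    """Return the site traffic is forwarded to, or None if no route is left."""
    if not ads:
        return None
    # Lowest cost wins; the site name only breaks ties deterministically.
    return min(ads, key=lambda ad: (ad.path_cost, ad.site)).site


def withdraw(ads, site):
    """Model a route withdrawal: the failed site stops advertising the prefix."""
    return [ad for ad in ads if ad.site != site]


# The same prefix is advertised from three places (all values are made up).
ads = [
    Advertisement("frankfurt", 2),
    Advertisement("ashburn", 4),
    Advertisement("tokyo", 5),
]

before = best_site(ads)
after = best_site(withdraw(ads, "frankfurt"))
print(before, "->", after)
```

The client keeps sending to the same address throughout; only the path, and therefore the physical destination, changes once the withdrawal has propagated.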

It's sometimes helpful to think of an anycast network as a logical construct. It's a virtual subnet that contains the service you're interested in. That virtual subnet is reachable via many paths through the network.

That said, here are the major caveats to anycast designs:

  1. Since there's no guarantee that a given packet to the anycast IP will reach the same physical host, this approach really only maps to connectionless protocols.

  2. The reliability of the solution is only as good as the logic tying the correct operation of the service to the advertisement of the route. If the service dies and the route continues to be advertised then there will be a potentially significant black hole.

  3. Getting the anycast route advertisements well- and properly- distributed across the public Internet is not trivial. It's very easy to create hot-spots: a particular instance of an anycast route that happens to be preferable to most clients. This is still a potentially decent HA solution (for the easier types of failures) but it doesn't speak to load balancing.
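Caveat 2 is about the glue between service health and route advertisement. A minimal sketch of that glue is below, under several assumptions: the health result is supplied by the caller (a real check would send an actual DNS query to the local resolver), the prefix is a documentation range standing in for subnet A, the failure threshold is arbitrary, and the command strings are only modelled on the text commands some BGP route injectors accept, so treat their exact syntax as hypothetical.

```python
from typing import Optional

PREFIX = "192.0.2.0/24"  # documentation prefix standing in for the anycast subnet A


class RouteController:
    """Decide when to announce or withdraw the anycast route from health results."""

    def __init__(self, prefix: str, fail_threshold: int = 3):
        self.prefix = prefix
        self.fail_threshold = fail_threshold  # consecutive failures before withdrawing
        self.failures = 0
        self.announced = False

    def update(self, healthy: bool) -> Optional[str]:
        """Feed one health-check result; return a command to emit, or None."""
        if healthy:
            self.failures = 0
            if not self.announced:
                self.announced = True
                return f"announce route {self.prefix} next-hop self"
            return None
        self.failures += 1
        if self.announced and self.failures >= self.fail_threshold:
            self.announced = False
            return f"withdraw route {self.prefix} next-hop self"
        return None


ctl = RouteController(PREFIX)
results = [True, True, False, False, False, False, True]
commands = [ctl.update(healthy) for healthy in results]
for cmd in commands:
    if cmd is not None:
        print(cmd)
```

The threshold is a design choice: withdrawing on the first failed check makes the route flap on transient errors, while waiting too long extends the black hole the caveat warns about.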

Now - finally - with all of this laid out, your question is easier to answer:

There's nothing inherent to anycast that makes it more resistant to DDoS. Each of the potentially millions of flows of DDoS traffic will find its way to its nearest instance, likely making that instance unavailable to any legitimate clients who would otherwise be routed to these points.

Now, if the vast majority of the hosts on the botnets in use happened to be in, say, Eastern Europe and one of the anycast routes happened to be originated in a nearby PoP (again - "nearby" in terms of routing topology) then this traffic would be sunk to one point while much of the rest of the world continued to resolve to the same route that was also hosted at convenient points on other continents. In this particular case anycast would arguably be one of the best mechanisms to minimize the damage of a DDoS attack. This is highly contingent on how the anycast routes have been distributed and how policy has been configured (see #3 above - not a trivial problem).

Clearly this use-case isn't as compelling in the case of a DDoS attack that's truly distributed. If properly engineered, though, the localization of the anycast routes means that the attack load can now be spread across an arbitrary number of geographically dispersed physical hosts. This will tend to dilute the effect of the attack on the target as well as potentially spreading the impact across a bigger chunk of the network. Again - a huge amount is contingent on how things have been engineered and configured.
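The two scenarios above (a regionally concentrated botnet versus a truly distributed one) come down to simple arithmetic on catchments. The sketch below uses entirely made-up numbers: the regions, site names, traffic volumes and the 40 Gbps per-site capacity are assumptions for illustration only, and it pretends each region maps cleanly onto one site, which real routing does not guarantee.

```python
from collections import defaultdict


def site_load(attack_by_region, catchment):
    """Sum attack traffic (Gbps) per anycast site, given which site each region reaches."""
    load = defaultdict(float)
    for region, gbps in attack_by_region.items():
        load[catchment[region]] += gbps
    return dict(load)


def saturated(load, capacity_gbps):
    """Return the sites whose attack load exceeds their capacity."""
    return sorted(site for site, gbps in load.items() if gbps > capacity_gbps)


# Hypothetical catchments: the site each region's traffic is routed to.
catchment = {
    "eastern_europe": "warsaw",
    "western_europe": "london",
    "north_america": "ashburn",
    "asia": "tokyo",
}
CAPACITY_GBPS = 40.0  # assumed per-site capacity

# The same 100 Gbps attack, sourced in two different ways.
concentrated = {"eastern_europe": 90.0, "western_europe": 5.0,
                "north_america": 3.0, "asia": 2.0}
distributed = {region: 25.0 for region in catchment}

lost_concentrated = saturated(site_load(concentrated, catchment), CAPACITY_GBPS)
lost_distributed = saturated(site_load(distributed, catchment), CAPACITY_GBPS)
print(lost_concentrated, lost_distributed)
```

In the concentrated case one site is sacrificed while the others keep answering; in the distributed case the load is diluted, and whether any site survives depends on total attack volume against total capacity.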

Why is this considered a win over round-robin? Simply because it's possible to deploy an arbitrary number of hosts without the need for separate load-balancers on the individual IPs, and there's also no reliance on the timeout value for particular clients deciding to move over to another resolver. One could literally deploy a thousand hosts within a single data center with the same IP and balance the traffic accordingly (nb - obviously massive practical limits based on size of ECMP tables, etc) or deploy a thousand geographically disparate sites each with a thousand hosts. All this could be accomplished without changing a client configuration, without the (admittedly usually clustered) point of failure of a load balancer, etc. In short - when properly engineered it scales as well as the Internet as a whole.
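Spreading traffic for one IP across many hosts inside a site is typically done by hashing each flow onto one of several equal-cost next hops. The sketch below illustrates only the idea: real routers use their own vendor-specific hash functions and fields, and the host names, addresses and the use of SHA-256 here are assumptions chosen for readability.

```python
import hashlib
from collections import Counter


def pick_next_hop(flow, next_hops):
    """Map a flow 5-tuple onto one of the equal-cost next hops, deterministically."""
    key = "|".join(str(field) for field in flow).encode()
    digest = hashlib.sha256(key).digest()
    index = int.from_bytes(digest[:8], "big") % len(next_hops)
    return next_hops[index]


# Hypothetical resolvers that all answer on the same anycast address.
hosts = ["resolver-1", "resolver-2", "resolver-3", "resolver-4"]

# 1000 made-up client flows: (src IP, src port, dst IP, dst port, protocol).
flows = [
    ("198.51.100.%d" % (i % 250 + 1), 1024 + i, "192.0.2.53", 53, "udp")
    for i in range(1000)
]

counts = Counter(pick_next_hop(flow, hosts) for flow in flows)
print(counts)
```

Because the mapping depends only on the flow's own fields, packets of one flow keep landing on the same host while the set of next hops is unchanged, and no load balancer has to hold per-client state. Adding or removing a host changes `len(next_hops)` and remaps most flows in this naive version, which is one reason it suits short DNS exchanges better than long-lived connections.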

rnxrx
  • long answer but I think you covered everything, additionally anycast is specific to routing, whereas round robin dns is dns, you would typically use both backed by load balancers on vips or ecmp as suggested – Jacob Evans Mar 26 '17 at 02:25
  • Really didn't expect such a detailed answer - thank you very much! – R4D4 Mar 27 '17 at 16:24
  • This is a great answer, just keep in mind that the benefits aren't limited to DDoS related failures. (which the question didn't single out) Devices can also retract their routes as part of scripted software conditions, which can be beneficial if a health check script detects a problem but is unable to automatically remedy the situation. – Andrew B Mar 27 '17 at 22:23