Did the October 21, 2016 DDoS attack on Dyn's DNS service cause Bandwidth Exhaustion?

Question

A recent attack on Dyn's DNS services affected several major websites last Friday, Oct 21, 2016. I would be interested to know if this was Bandwidth Exhaustion, or if it was mainly load on the server and/or router equipment?

The official statement states that 10s of millions of IP addresses were involved in the DDoS attack. I presume that Bandwidth Exhaustion would occur under such circumstances. Bandwidth Exhaustion attacks of course can only be solved in cooperation with the ISP.

However, based on this description, it seems that this was not the substance of the attack. (emphasis mine)

[Dale Drew, CSO of Level 3] says the attack consisted mainly of TCP SYN floods aimed directly against port 53 of Dyn’s DNS servers, but also a prepend attack, which is also called a subdomain attack. That’s when attackers send DNS requests to a server for a domain for which they know the target is authoritative. But they tack onto the front of the domain name random prepends or subnet designations. The server won’t have these in its cache so will have to look them up, sapping computational resources and effectively preventing the server from handling legitimate traffic, he says.

On the other hand, both of those strike me as fairly simple to mitigate quickly, for example by temporarily disabling or throttling TCP requests and recursive lookups (or is my interpretation wrong? Perhaps it's not a recursive lookup, just a database lookup?) to make room for more important requests, such as authoritative lookups of twitter.com, which are almost exclusively served over UDP.

Besides better understanding why those vectors could not have been temporarily sacrificed to mitigate the attack, my main question is whether Bandwidth Exhaustion occurred. (Is this public knowledge?) That would certainly explain the impact better than the vectors reported above.

For authoritative DNS servers like the ones hosted by Dyn, it is indeed just a database lookup and not a recursive one. But looking into a database to verify that a requested subdomain does not exist is still much more resource intensive than returning a cached result. — tlng05, Oct 24 '16 at 17:23
Most authoritative DNS servers - and certainly all high-volume ones like Dyn - disable recursive lookups, and have their zone files in memory. I'm sceptical of the article's claim that the "prepend attack" will sap resources any more than bandwidth exhaustion. But I don't have any inside info. — paj28, Oct 24 '16 at 17:55
Unless someone has insider information OR Dyn makes a statement, I think this question is unanswerable. — Jesse K, Oct 24 '16 at 18:02
Both TCP SYN packets and DNS query packets (which are mainly UDP, unless you are not running DNSSEC) are small (a matter of bytes, or at most kilobytes). So bandwidth choking is not possible through TCP SYN or DNS queries. — Gaurav Kansal, Oct 25 '16 at 04:08
@GauravKansal, That is surprising to me. Why not simply send more of them? — 700 Software, Oct 25 '16 at 12:21
@GeorgeBailey.... A DNS request will generally be in the range of 100 bytes. So, to generate 1 GB of requests per second, you would need 1024*1024*1024/100 ≈ 10^7, i.e., about 10 million packets per second. A TCP SYN packet is 40 bytes. So it is tough to choke a link with DNS requests and TCP SYN packets. — Gaurav Kansal, Oct 25 '16 at 12:29

phyrfox · Answer 1 · 2016-10-24T18:10:08.517

TL;DR: The attacks effectively capped out allocated memory buffers and ran CPU usage up to capacity, leaving the servers unable to respond to legitimate requests. Bandwidth Exhaustion rarely occurs with SYN attacks, because very little bandwidth is required for SYN attacks and/or bogus DNS lookups.

A TCP SYN flood simply fills up available memory buffers while the server waits for ACK packets that never arrive. Very little bandwidth is consumed in this attack, and it can take down a server with small TCP pools even if that server had literally unlimited bandwidth, because bandwidth isn't the bottleneck; the memory held by the pending connections is.

You can visualize this like a telephone operator's switchboard, where many of the calls are just people breathing into the other end of the line without actually saying anything useful; legitimate callers have to wait for the operator to decide to hang up on all the prank calls. Eventually, there are no more spare phone lines, so legitimate callers effectively get a busy tone and have to try again later.
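As a rough illustration of the switchboard picture, here is a toy model of a bounded queue of half-open connections. It is only a sketch: the class name, the capacity of 128, and the 30-second timeout are assumptions chosen for the example, not how any particular TCP stack (or Dyn's servers) behaves.

```python
from collections import OrderedDict


class SynBacklog:
    """Toy model of a fixed-size queue of half-open TCP connections."""

    def __init__(self, capacity, timeout):
        self.capacity = capacity      # max half-open entries (hypothetical value)
        self.timeout = timeout        # seconds before an unanswered entry is dropped
        self.pending = OrderedDict()  # (src_ip, src_port) -> arrival time

    def expire(self, now):
        # Drop the oldest entries whose final ACK never arrived in time.
        while self.pending:
            key, arrived = next(iter(self.pending.items()))
            if now - arrived < self.timeout:
                break
            del self.pending[key]

    def on_syn(self, src, now):
        """Return True if the SYN was queued, False if it was dropped."""
        self.expire(now)
        if len(self.pending) >= self.capacity:
            return False
        self.pending[src] = now
        return True

    def on_ack(self, src):
        """Handshake completed: free the slot. Returns True if it existed."""
        return self.pending.pop(src, None) is not None


def simulate():
    backlog = SynBacklog(capacity=128, timeout=30.0)
    # 200 spoofed SYNs arrive within one second and are never ACKed.
    # At 40 bytes each that is only about 8 kB of traffic.
    for i in range(200):
        backlog.on_syn(("198.51.100.%d" % (i % 250), 1024 + i), now=i * 0.005)
    legit = ("192.0.2.10", 40000)
    during_flood = backlog.on_syn(legit, now=1.0)   # queue is still full
    after_timeout = backlog.on_syn(legit, now=31.0)  # stale entries expired
    return during_flood, after_timeout
```

In this model `simulate()` returns `(False, True)`: the legitimate caller is turned away while the prank calls hold every line, and gets through only once they time out, even though the flood itself used almost no bandwidth.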

A prepend attack also uses very little bandwidth, as the scripts simply ask for random subdomain names of the target domain, forcing the DNS server to spend time performing recursive lookups and repeatedly querying DNS records for entries not in its cache. In this case, instead of memory being the bottleneck, the CPU fills that role by performing a lot more work than it has to for legitimate requests.

You can visualize this sort of attack as many people calling the operator at the switchboard, each asking to speak to non-existent people. The operator has to take time initially to look up each request before telling the caller that there's nobody there by that name, while callers that want to talk to known people have to wait for all the lookups the operator has to go through. This works best if you visualize the operator having a huge phone book and they have to spend a lot of time flipping pages to determine if the name actually exists.
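The "huge phone book" effect can be sketched with a toy cache sitting in front of a lookup table. Everything here is hypothetical (the zone contents, the `CachingFrontEnd` class, the label length); it only illustrates why random prepended names never hit a cache, and says nothing about how Dyn's servers were actually built.

```python
import random
import string

# Hypothetical zone contents, for illustration only.
ZONE = {"www.example.com", "mail.example.com"}


class CachingFrontEnd:
    """Toy cache in front of a (comparatively expensive) zone lookup."""

    def __init__(self, zone):
        self.zone = zone
        self.cache = {}
        self.backend_lookups = 0  # how often the slow path was taken

    def resolve(self, name):
        if name in self.cache:
            return self.cache[name]  # cheap path: answer already known
        self.backend_lookups += 1    # slow path: consult the "phone book"
        answer = "NOERROR" if name in self.zone else "NXDOMAIN"
        self.cache[name] = answer
        return answer


def random_label(rng, length=12):
    return "".join(rng.choice(string.ascii_lowercase) for _ in range(length))


def compare(n=1000, seed=0):
    rng = random.Random(seed)
    legit = CachingFrontEnd(ZONE)
    for _ in range(n):
        legit.resolve("www.example.com")  # same name: cached after first hit
    attacked = CachingFrontEnd(ZONE)
    for _ in range(n):
        # A fresh random prefix almost never repeats, so the cache never helps.
        attacked.resolve(random_label(rng) + ".example.com")
    return legit.backend_lookups, attacked.backend_lookups
```

With repeated legitimate queries the slow path runs once; with random prefixes it runs for essentially every query, and the cache itself fills up with useless negative entries.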

Bandwidth Exhaustion, by way of comparison, is when the network becomes saturated with data trying to reach the server. In this case, neither the server's memory nor its processing power is the bottleneck; instead, the attack depends on traffic exceeding the network's capacity for some length of time. The best way to accomplish this type of attack is to send really large packets really fast, from as many locations as possible.

This is harder to visualize, but imagine many prank calls coming in at once, and the operator at the switchboard has to wait for each person to finish saying what they have to say before going on to the next call. Each prank call is actually some person prattling on about their entire life story and family history for the last 50 years, tying up the lines for the caller who simply wants to be transferred. It's not as perfect an analogy as the others, though, because this is effectively occurring upstream from the operator. It's out of their control.
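To put rough numbers on why small packets make saturation harder, here is a back-of-the-envelope calculation. The packet sizes are the approximate figures quoted in the comments (about 100 bytes for a DNS query, 40 bytes for a TCP SYN); the 10 Gbit/s link is purely an assumed example, and link-layer framing overhead is ignored.

```python
def packets_per_second_to_fill(link_bits_per_second, packet_bytes):
    """Packets per second needed to saturate a link, ignoring framing overhead."""
    return link_bits_per_second / (packet_bytes * 8)


LINK = 10_000_000_000  # assumed 10 Gbit/s link, for illustration only

dns_rate = packets_per_second_to_fill(LINK, 100)  # ~100-byte DNS queries
syn_rate = packets_per_second_to_fill(LINK, 40)   # 40-byte TCP SYNs

print(f"DNS queries needed: {dns_rate:,.0f} per second")
print(f"TCP SYNs needed:    {syn_rate:,.0f} per second")
```

Under these assumptions that is 12.5 million DNS queries or 31.25 million SYNs every second to fill the pipe, whereas a server's memory or CPU can give out at far lower rates.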

Do you have any sources for these claims? [SYN cookies](https://en.wikipedia.org/wiki/SYN_cookies) are an effective defence against SYN flood. And dyn's servers have recursion disabled. So I doubt either of your explanations apply in this case. I won't downvote you though because your answer is well written. — paj28, Oct 24 '16 at 18:25

score 1 · Answer 2 · answered Oct 24 '16 at 17:36

Why not just disable lookup?

You have to consider that the attacker's goal here is to prevent users of the sites that use Dyn's DNS from resolving names. If Dyn had disabled subdomain lookups while the attack was happening, that would have prevented legitimate users of these sites from resolving those names as well. This is exactly what the attacker wanted: goal achieved.

For every second that Dyn keeps its service disabled, its customers, the big sites that rely on its DNS service, are unreachable and losing huge amounts of money. When the gate is closed, the attacker can just dial down the attack (reducing their own cost and making themselves less visible and harder to block), wait until Dyn reopens, and then flood again; rinse and repeat. All the while, legitimate customers can't connect to their favorite sites.

Those are helpful points about dialing down and up again. My main question is whether or not Bandwidth Exhaustion occurred? — 700 Software, Oct 24 '16 at 17:42

score 1 · Answer 3 · answered Oct 24 '16 at 19:02

This is all speculative:

Looking at the events and the services this company provides, it looks like this was very targeted against the DNS pools that try to optimize geo placement and traffic direction. The reason TCP is utilized is because of RTT measurements (set the truncation bit and the client sends the request over TCP) and auth. This can give you approximations of the geo location, backed by ISP and ASN geo tags.

If these were Linux-based systems providing TCP SYN backlogs, then the tuning could have been off for the volume. Because of the quantity of bots, even at low volume (100 SYNs a second per bot) there could be ~100 million SYNs a second at peak if the CNC issues 100 SYNs per second. Also, there could be a secondary effect associated with spoofing if the source address ranges happen to focus on IP addresses associated with a specific geo region like the east coast of the US.
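As a speculative sanity check on those figures: the bot count below is an assumption (one million is simply what 100 SYNs per second per bot needs to reach ~100 million SYNs per second), and the 40-byte SYN size counts only TCP/IP headers.

```python
def aggregate_syn_load(bots, syns_per_bot_per_second, syn_bytes=40):
    """Return (SYNs per second, Gbit/s) for a botnet of the given size."""
    total_syns = bots * syns_per_bot_per_second
    gbit_per_second = total_syns * syn_bytes * 8 / 1e9
    return total_syns, gbit_per_second


# Assumed botnet size of one million; 100 SYNs/s per bot as in the text above.
syns, gbits = aggregate_syn_load(bots=1_000_000, syns_per_bot_per_second=100)
print(f"{syns:,} SYNs/s, about {gbits:.0f} Gbit/s of headers alone")
```

So under these assumptions the backlog has to absorb 100 million connection attempts per second, and the tiny packets still add up to roughly 32 Gbit/s spread across the targeted sites.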

Throttling does work, but again only at specific scales. Then you run into the needle-in-a-haystack problem of legit traffic vs. spoofed traffic with the same source. As I said above, the geo IP range could have been targeted for the spoof pool just as much as the service itself.
