53

Why was the recent DDoS attack against DNS provider Dyn, and other similar attacks, successful? Sure, a DDoS attack can bring an entity down, and if that entity controls DNS servers, then queries to those nameservers will fail, and domains listed under those nameservers will not be reachable by any host that doesn't already have IP information for them.

But since browsers cache DNS records, many hosts will already have IP information for those domains (at least until their cache entries expire), so the fact that the nameservers are down shouldn't matter to hosts with populated caches. Yet this does not appear to be the case: during yesterday's attack I was unable to access GitHub, npm, etc.

aeb0
  • 640
  • 5
  • 8
  • Although most answers here are talking about browser caches, I'm also interested in the intermediate caches (such as your ISP). When I make a DNS change I'm often told that it can take a day for the change to propagate across the internet. Why are those intermediate DNS caches not helping? – Alex White Oct 31 '16 at 15:47

6 Answers

56

You are correct that the DNS cache would mitigate the impact of a nameserver being unavailable. However, it is extremely common to have a TTL of 5 minutes or lower. Hence, 5 minutes after the DDoS attack brought down Dyn, your cached entries would have expired and you wouldn't have been able to reach GitHub, etc.
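As an illustration, here is a minimal sketch (assuming the third-party dnspython package, `pip install dnspython`; the domain is just an example) of checking what TTL a resolver reports for a record:

```python
# Illustrative only: print the TTL a resolver reports for an A record.
# Requires the third-party dnspython package (pip install dnspython).
import dns.resolver

answer = dns.resolver.resolve("github.com", "A")   # example domain
print("Addresses:", [rr.address for rr in answer])
print("TTL (seconds):", answer.rrset.ttl)          # often in the 60-300 range for large sites
```

A recursive resolver reports the remaining time on its cached copy, so the value ticks down between queries.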

Shackledtodesk
  • 1,201
  • 10
  • 10
  • 13
    How frequently do IP addresses for such major sites change? I would've thought the cache would be at least a few days, maybe weeks. – Alexander Oct 23 '16 at 05:42
  • 33
    @AlexanderMomchliov The TTL is chosen for how quickly a change takes effect, not for how often changes happen. If your IP address changes, you don't want to wait weeks before people can use the new one. – OrangeDog Oct 23 '16 at 08:12
  • 3
    Out of curiosity... DNS-driven round-robin load balancing and failover (which is usually done entirely differently!) set aside, what is the rationale for making the cache so needlessly short-lived? It's not like a site's DNS entry normally changes 200 times per day. One would think two hours would work just fine, too. – Damon Oct 23 '16 at 14:33
  • 22
    @Damon: Back to OrangeDog's comment: it does not change often, but when it does change you'd rather it changed *now*, not 2 days later. Also, some cloud-based sites/services *are* dynamic: VMs get killed, shuffled, stopped, spawned, ... and the end user should still be directed to a server/port where the desired site/service is actually located. On the other hand, DNS caches like [EdgeDNS](https://github.com/jedisct1/edgedns) will keep expired entries around and use them until they manage to refresh them from the authoritative DNS, which is useful when it's down/slow. – Matthieu M. Oct 23 '16 at 15:34
  • 2
    As mentioned, but worth repeating: if you have a sub-5-minute TTL on your DNS, you are probably doing geographically distributed load balancing through DNS. At the 5-minute mark, we are talking about being able to fail over to a DR (Disaster Recovery) site when the primary has failed. Hence, you don't want a long TTL for your DNS for either reason. While EdgeDNS may keep DNS entries after the cache expires AND the primary is not responding, this is outside of the RFC spec for DNS and is both not normal and usually not what you want. – Shackledtodesk Oct 24 '16 at 01:42
  • 5
    This is what happens when people use Stupid DNS Tricks® to provide failover and other features (e.g. geo-routing) that DNS was not designed to do. Failover belongs in the IP layer, not the naming layer. – Alnitak Oct 24 '16 at 15:54
  • @MatthieuM. Notice that in the cloud you rarely directly attach a domain to 1 server. You attach it to a load balancer. – Anemoia Oct 24 '16 at 19:56
  • 2
    @Alnitak Can you add that as an answer? I think it's underemphasized. – bright-star Oct 24 '16 at 23:39
  • @Shackledtodesk `usually not what you want.` When would it ever be better to provide no DNS record than to provide an expired DNS record, if the primary DNS server for the host is unreachable? – Nateowami Oct 25 '16 at 10:18
  • @TrevorAlexander done – Alnitak Oct 25 '16 at 21:48
49

A small design change to DNS caches could make a big difference. Most DNS caches remove an entry when the TTL expires. A cache could instead keep the entry, but mark it as expired. If a query comes in for an expired entry, the cache would first try to resolve the name upstream, and if that fails, return the expired entry. I expect this is technically in violation of the DNS protocol, but still a better failure behaviour.
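As a rough sketch of the idea (not how any particular resolver implements it), the cache keeps entries past their expiry and only serves them when an upstream refresh fails; `lookup_upstream` here is a hypothetical stand-in for a real upstream query:

```python
import time

class StaleServingCache:
    """Toy DNS cache that keeps expired entries as a last-resort fallback."""

    def __init__(self, lookup_upstream):
        self.lookup_upstream = lookup_upstream  # hypothetical: returns (addresses, ttl)
        self.entries = {}                       # name -> (addresses, expiry timestamp)

    def resolve(self, name):
        entry = self.entries.get(name)
        if entry and entry[1] > time.time():
            return entry[0]                     # still fresh: serve from cache
        try:
            addresses, ttl = self.lookup_upstream(name)
        except Exception:
            if entry:
                return entry[0]                 # upstream unreachable: serve the stale entry
            raise                               # nothing cached, nothing we can do
        self.entries[name] = (addresses, time.time() + ttl)
        return addresses
```

Serving stale answers trades freshness for availability, which is exactly the trade-off debated in the comments below.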

However, I don't expect to see this happen. The impact of DNS servers being down would still be significant - all the sites you don't have in your cache. The focus will remain on keeping the DNS infrastructure operational.

Update: @MatthieuM has pointed out that EdgeDNS does this.

paj28
  • 32,736
  • 8
  • 92
  • 130
  • 16
    Note that [EdgeDNS](https://github.com/jedisct1/edgedns) does exactly this. It keeps expired entries around and uses them until it manages to get a reply from the authoritative DNS for the entry. – Matthieu M. Oct 23 '16 at 15:29
  • 11
    This is a security vulnerability. If I gain control of a site's old IP address, then I trick people into visiting my page instead by DoS-ing their DNS. This could happen long after the IP changes if I know someone hasn't visited the site in a long time. – BlueRaja - Danny Pflughoeft Oct 23 '16 at 22:28
  • Is there any software that does this for Windows? @MatthieuM. – user541686 Oct 24 '16 at 02:31
  • 10
    @BlueRaja-DannyPflughoeft - Any site where security matters should have SSL, which stops that – paj28 Oct 24 '16 at 09:25
  • @paj28 It's not a mediocre answer, it's the _correct_ answer. The current accepted answer basically says "because that's not the way browser-based DNS caches are currently designed", whereas this answer gets to the root of the question of "why" by explaining that, while browsers _could_ mitigate the effects of DNS going down, doing so would be of limited impact. – Ajedi32 Oct 24 '16 at 17:04
  • 2
    From a performance standpoint, I would think it would be advantageous to say that if a program wants to establish a connection to a host whose cache entry is more than e.g. 5 minutes old but less than e.g. a day old, the program should immediately connect using the cached address, but the cache should issue a DNS request and update itself if the response indicates a new address. – supercat Oct 24 '16 at 22:46
  • 3
    @paj28 `The impact of DNS servers being down would still be significant - all the sites you don't have in your cache.` Assuming we're talking about the browser cache. But couldn't non-authoritative DNS servers do the same? Then if Dyn went down again, all the other DNS servers would keep DNS entries cached. Smaller websites wouldn't always be cached, but it would make it a ton harder to take down large portions of the net by hitting one centralized target. (Unless of course I am wrong). – Nateowami Oct 25 '16 at 10:32
  • @Nateowami - Nice idea! – paj28 Oct 25 '16 at 10:49
  • The DDoS will just move up the stack.... – Ian Ringrose Oct 25 '16 at 18:23
  • @IanRingrose - Too true :( We don't have a comprehensive solution to DDoS at the moment, just a series of mitigations from SYN cookies to distributed scrubbers. For now, tactical countermeasures are the best anyone's got. – paj28 Oct 25 '16 at 20:53
11

@Shackledtodesk is correct (+1): the browser's DNS cache is only kept for a short time. Ironically enough, some of the best references on this fact have been published by Dyn themselves:

A simple program I wrote to query the top 1000 websites (according to Alexa) shows 212 hits with a TTL value of 300 (5 mins), 192 hits with a TTL of 3600 (1 hr), 116 hits with a TTL of 600 (10 mins) and 79 hits with a TTL of 86400. The rest of the results had hits in the 50s and less, ranging anywhere from a TTL of 5 (1 hit) to a TTL of 864000 (1 hit).

This is a quote from Ben Anderson, a researcher and technical writer at Dyn.

Looking at those results you can see that within a small amount of time your browser invalidates its DNS cache, and your DNS resolution starts to fail.
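The tally itself is easy to reproduce on a smaller scale. A rough sketch, assuming the third-party dnspython package and a domain list of your own (the three names below are just examples, not the Alexa top 1000):

```python
# Illustrative re-creation of a TTL tally, not Ben Anderson's original program.
# Requires dnspython (pip install dnspython).
from collections import Counter
import dns.resolver

domains = ["github.com", "npmjs.com", "twitter.com"]  # example list
ttl_counts = Counter()

for name in domains:
    try:
        answer = dns.resolver.resolve(name, "A")
        ttl_counts[answer.rrset.ttl] += 1
    except Exception:
        pass  # skip names that fail to resolve

for ttl, hits in ttl_counts.most_common():
    print(f"TTL {ttl:>6}s: {hits} hits")
```

Bear in mind that a recursive resolver reports the remaining TTL of its cached copy, so the counts will drift a little from what the authoritative servers publish.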

Reference


PS: To add insult to injury, the linked article from Dyn argues that the browser DNS cache is a bad thing.

grochmal
  • 5,677
  • 2
  • 19
  • 30
5

Browsers do not cache DNS records

This is a function of the resolver, which is an adjunct to the network stack.
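For example, an ordinary application simply asks the operating system's resolver rather than keeping its own DNS store; in Python that delegation looks something like this:

```python
# The application hands name resolution to the system resolver / network stack.
import socket

infos = socket.getaddrinfo("example.com", 443, proto=socket.IPPROTO_TCP)
for family, _type, _proto, _canonname, sockaddr in infos:
    print(family, sockaddr)  # whatever the OS resolver (and its cache) returned
```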

DNS Caching would not help much

The Mirai-enslaved devices are capable of carrying out any number of different attacks as directed by the C&C. In the case of both the attack on Krebs on Security and the one on Dyn, the attackers simply filled the targets' bandwidth with traffic - it didn't actually matter what the traffic was. While DNS can be exploited for an indirect amplification attack, it is my understanding that this does not apply in the case of the attacks on Krebs and Dyn. DNS traffic was used in the latter attack because it made it impractical to filter the real traffic from the attack traffic.

If the DNS records had been cached elsewhere, accessible to normal users (in DNS caches, not in browsers), then the attack would have had much less impact; however, Dyn's business model primarily targets DNS hosting and end-user provision, and the latter would have been disrupted regardless. The presence of the data in intermediate caches / other end-user providers is predicated on the volume of traffic and the expiry time (it's my experience that expiry times of less than 2 hours are ineffective). Further, a high-traffic site will have multiple geographic points of presence along with multiple A records at each POP - anycast addresses are expensive, and (due to edns-client-subnet) not required other than for DNS (in the absence of DoS attacks).

symcbean
  • 18,278
  • 39
  • 73
  • 7
    They don't? Perhaps at least Chrome caches DNS records? Or perhaps 'cache' is not exactly the right word for what it does? chrome://net-internals/#dns – aeb0 Oct 23 '16 at 17:33
  • 4
    Caches, caches everywhere http://superuser.com/questions/203674/how-to-clear-flush-the-dns-cache-in-google-chrome – leonbloy Oct 23 '16 at 18:46
3

The DNS was primarily designed to provide a stable (and loosely coherent) mapping of names to addresses. In the good ol' days, the Time To Live (TTL) on DNS records was typically in the range of 3600 to 86400 seconds. It was expected that whoever asked for a particular record would always get the same answer.

Some people then figured out that by using really short TTLs they could perform Stupid DNS Tricks® that coerce the DNS into doing things it wasn't intended to do.

For example, some load-balancing appliances have integrated DNS servers that monitor the health of the back-end servers and serve out a different answer to each inbound request based on their current load.

Some operators look at the source address of the incoming query and send back different answers to redirect the client to the nearest application cluster (aka "Global Server Load Balancing").
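As a toy illustration of that kind of trick (not any particular appliance's logic), the authoritative server synthesises a per-query answer from backend health and load instead of serving a static record, with a deliberately tiny TTL; the addresses and thresholds here are made up:

```python
# Toy "health-aware" DNS answer selection, purely illustrative.
import random

backends = {
    "192.0.2.10": {"healthy": True,  "load": 0.30},
    "192.0.2.11": {"healthy": True,  "load": 0.85},
    "192.0.2.12": {"healthy": False, "load": 0.00},
}

def pick_answer():
    """Return (address, ttl), chosen per query the way an LB-integrated DNS might."""
    candidates = [ip for ip, state in backends.items() if state["healthy"]]
    candidates.sort(key=lambda ip: backends[ip]["load"])    # prefer the least-loaded backend
    chosen = candidates[0] if candidates else random.choice(list(backends))
    return chosen, 30    # very short TTL so clients re-ask frequently

print(pick_answer())
```

Because the answer is computed per query, there is no static zone file that a secondary operator could simply transfer and serve.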

Apropos last week's attack on Dyn - good DNS practice used to be that you'd distribute your authoritative DNS servers across multiple networks (and/or operators) so that an attack on, or outage of, one would still leave you with working DNS.

However, the aforementioned "tricks" involve bespoke algorithms and "intelligence" that are not inherent to the DNS itself, such that it becomes very hard (if not impossible) to rely on the built-in resilience of the DNS. A system that generates synthesised answers instead of using a zone file cannot be shared across multiple operators using AXFR.

Alnitak
  • 130
  • 5
1

DNS caching does mitigate DDoS attacks on DNS providers, but cached records SHOULD only last for a short time.

The maximum time a resource record should be cached is specified by the server; this value is called the TTL.

The meaning of the TTL field is a time limit on how long an RR can be kept in a cache. This limit does not apply to authoritative data in zones; it is also timed out, but by the refreshing policies for the zone. The TTL is assigned by the administrator for the zone where the data originates. While short TTLs can be used to minimize caching, and a zero TTL prohibits caching, the realities of Internet performance suggest that these times should be on the order of days for the typical host. If a change can be anticipated, the TTL can be reduced prior to the change to minimize inconsistency during the change, and then increased back to its former value following the change.

(taken from RFC 1034)

The server can tell the resolver that the record may be cached for over 68 years, which is usually long enough for an attack to be dealt with. But servers usually don't do so. Big websites don't want a failure in the network to affect them for a long time, so one approach is to set the TTL of their resource records to a relatively short time, such as 5 minutes. That way, they can change their DNS records if some of their servers fail. And clients querying the RR every 5 minutes isn't much more load than querying it only once.
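(The "over 68 years" figure comes from the effective cap on the TTL field, 2^31 - 1 seconds per RFC 2181; a quick check:)

```python
# Where "over 68 years" comes from: the effective maximum TTL is 2^31 - 1 seconds.
max_ttl_seconds = 2**31 - 1
print(max_ttl_seconds)                          # 2147483647
print(max_ttl_seconds / (365.25 * 24 * 3600))   # roughly 68.05 years
```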

Additionally, applications usually cache RRs in RAM, so the records are lost once the application is restarted. (There are exceptions: you can dump BIND's cache to the filesystem, for example.)

I want to mention Namecoin here. It stores domain names on the disk, in a blockchain. If your website uses a .bit domain, it is unlikely to go down solely because of the DNS provider.

v7d8dpo4
  • 267
  • 1
  • 5