
This is a Canonical Question about securing public DNS resolvers.

Open DNS servers seem pretty neat and convenient, as they provide IP addresses that we can use consistently across our company regardless of where they are located. Google and OpenDNS provide this functionality, but I'm not sure that I want these companies to have access to our DNS queries.

I want to set up something like this for use by our company, but I hear a lot about this being a dangerous practice (particularly with regard to amplification attacks) and I want to make sure that we do this right. What things do I need to keep in mind when building this type of environment?

Mark Henderson
Andrew B

2 Answers


There are a few things you need to understand going into this:


This is a network engineering problem.

Most of the people who are looking to set up this type of environment are system administrators. That's cool, I'm a system administrator too! Part of the job is understanding where your responsibilities end and someone else's begin, and believe me, this is not a problem system administrators can solve on their own. Here's why:

  • UDP is a stateless protocol. There is no client handshake.
  • Queries against a DNS server are an unauthenticated two-step transaction (query, reply). There is no way for the server to know whether the source IP is spoofed before it replies.
  • By the time a query has reached the server, it is already too late to prevent a spoofed UDP packet. Spoofing can only be prevented by a practice known as ingress filtering, a topic which is covered by documents BCP 38 and BCP 84. These are implemented by the networking devices sitting in front of your DNS server.
  • We can't give you a walkthrough on how to set up your datacenter from end to end, or how to implement these best practices. These things are very specific to your own needs. Q&A format just doesn't work for this, and this site is not intended to be a substitute for hiring professional people to do professional work.
  • Do not assume that your billion dollar too-big-to-fail company implements ingress filtering correctly.
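
The decision that ingress filtering makes can be sketched in a few lines. This is only an illustration of the check a network device performs at the edge, not router configuration; the prefixes are documentation ranges used as placeholders, and `CUSTOMER_PREFIXES` and `permit_ingress` are hypothetical names.

```python
import ipaddress

# Hypothetical prefixes assigned to the customers behind one router interface.
# These are documentation ranges, used here only as placeholders.
CUSTOMER_PREFIXES = [
    ipaddress.ip_network("192.0.2.0/24"),
    ipaddress.ip_network("198.51.100.0/24"),
]


def permit_ingress(source_ip: str, allowed=CUSTOMER_PREFIXES) -> bool:
    """Return True if a packet arriving from the customer side claims a
    source address that actually belongs to that customer."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowed)


# A packet with a legitimate source address is forwarded.
print(permit_ingress("192.0.2.10"))
# A packet claiming an address from outside the customer's prefixes is
# dropped at the edge, before it can ever reach a DNS server.
print(permit_ingress("203.0.113.7"))
```

The point of the sketch is where the check runs: on the device facing the sender, not on the DNS server, which only ever sees the claimed source address.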

This is not a best practice. The best practice is not to do this.

It's very easy to set up an internet facing DNS resolver. It takes far less research to set one up than to understand the risks involved in doing so. This is one of those cases where good intentions inadvertently enable the misdeeds (and suffering) of others.

  • If your DNS server will respond to any source IP address it sees, you're running an open resolver. These are constantly being leveraged in amplification attacks against innocent parties. New system administrators are standing up open resolvers every day, making it lucrative for malicious individuals to scan for them constantly. There isn't really a question whether or not your open resolver is going to be used in an attack: as of 2015, it's pretty much a given. It may not be immediate, but it's going to happen for sure.
  • Even if you apply an ACL using your DNS software (e.g. BIND), all this does is limit which spoofed DNS packets your server will reply to. It's important to understand that your DNS infrastructure can be used not only to attack the devices in the ACL, but any networking devices between your DNS server and the devices it will respond for. If you don't own the datacenter, that's a problem for more than just you.
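
Both points above can be made concrete with a small sketch. The packet sizes are assumed values chosen for illustration, not measurements, and `amplification_factor`, `ACL` and `acl_allows` are hypothetical names rather than anything from BIND.

```python
import ipaddress


def amplification_factor(query_bytes: int, response_bytes: int) -> float:
    """How many bytes reach the victim per byte the attacker sends."""
    return response_bytes / query_bytes


# Hypothetical ACL: the only clients this resolver is meant to serve.
ACL = [ipaddress.ip_network("192.0.2.0/24")]


def acl_allows(claimed_source: str, acl=ACL) -> bool:
    """An ACL can only inspect the source address written in the packet.
    It cannot tell whether that address was spoofed."""
    addr = ipaddress.ip_address(claimed_source)
    return any(addr in net for net in acl)


# Assumed sizes: a 64-byte query that triggers a 3200-byte response.
print(amplification_factor(64, 3200))  # 50.0

# A query claiming an in-ACL address is answered no matter who really sent
# it, so the reply still lands on the host that owns that address.
print(acl_allows("192.0.2.55"))
# A query claiming an address outside the ACL is refused.
print(acl_allows("203.0.113.9"))
```

The ACL narrows the set of possible victims to the addresses it lists (and the network path to them); it does not stop the server from being used as a reflector.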

Google and OpenDNS do this, so why can't I?

Sometimes it's necessary to weigh enthusiasm against reality. Here are some hard questions to ask yourself:

  • Is this something you want to set up on a whim, or is this something you have a few million dollars to invest in doing it right?

  • Do you have a dedicated security team? Dedicated abuse team? Do both of them have the cycles to deal with abuse of your new infrastructure, and complaints that you'll get from external parties?

  • Do you have a legal team?

  • When all of this is said and done, will all of this effort even remotely begin to pay for itself, turn a profit for the company, or exceed the monetary value of dealing with the inconvenience that led you in this direction?


In closing, I know this Q&A is kind of a letdown for most of you who are being linked to it. Serverfault is here for providing answers, and an answer of "this is a bad idea, don't do it" isn't usually perceived as very helpful. Some problems are much more complicated than they appear to be at the outset, and this is one of them.

If you want to try to make this work, you can still ask us for help as you try to implement this kind of solution. The main thing to realize is that the problem is too big by itself for the answer to be provided in convenient Q&A format. You need to have invested a significant amount of time researching the topic already, and approach us with specific logic problems that you've encountered during your implementation. The purpose of this Q&A is to give you a better understanding of the larger picture, and help you understand why we can't answer a question as broad as this one.

Help us keep the internet safe! :)

Andrew B
  • As a complement, people can check that they don't have an open DNS relay on their public range via the [openresolver project](http://openresolverproject.org). Everyone should keep in mind that the internet contains about [20 million](http://openresolverproject.org/breakdown.cgi) open relays accepting recursive queries. An example of the consequences: CloudFlare suffered a 300 Gb/s DNS amplification attack using 0.1% of these – Xavier Lucas Oct 09 '14 at 21:33
  • Couldn't you disable UDP and force all queries to use TCP instead? – 小太郎 Oct 10 '14 at 02:38
  • @小太郎 [Refer to this question.](http://serverfault.com/q/348399/152073) A resolver library will default to UDP mode and in many cases retry with TCP if a reply was truncated, but that's about it. It would work if the application was bypassing the OS and performing its own lookup, but that would usually defeat the purpose of what people are trying to accomplish with this setup. – Andrew B Oct 10 '14 at 02:49

Whether you are running an open DNS recursor or an authoritative DNS server, the problem is the same and most of the possible solutions are also the same.

The best solution

DNS cookies is a proposed standard which gives DNS servers a way to require clients to send a cookie in order to prove that the client IP address has not been spoofed. This will cost one additional roundtrip for the first lookup, which is the lowest overhead any solution could offer.
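
The idea can be sketched as follows. This is a simplified illustration and not the wire format or hash function from the DNS cookies proposal: the HMAC-SHA256 construction, the 8-byte truncation, and the names `SERVER_SECRET`, `make_server_cookie` and `cookie_is_valid` are all assumptions made for the example.

```python
import hashlib
import hmac
import os

# Hypothetical per-server secret; a real deployment would manage and rotate it.
SERVER_SECRET = os.urandom(32)


def make_server_cookie(client_ip: str, client_cookie: bytes,
                       secret: bytes = SERVER_SECRET) -> bytes:
    """Derive a cookie bound to the client's IP address and client cookie.
    The server needs no per-client state, only its secret."""
    msg = client_ip.encode("ascii") + b"|" + client_cookie
    return hmac.new(secret, msg, hashlib.sha256).digest()[:8]


def cookie_is_valid(client_ip: str, client_cookie: bytes,
                    server_cookie: bytes,
                    secret: bytes = SERVER_SECRET) -> bool:
    """Recompute the cookie for the claimed source and compare in constant time."""
    expected = make_server_cookie(client_ip, client_cookie, secret)
    return hmac.compare_digest(expected, server_cookie)


client_cookie = os.urandom(8)
# First round trip: the server sends this cookie to the claimed source address.
issued = make_server_cookie("192.0.2.10", client_cookie)
# Later queries from the same address present it and are accepted.
print(cookie_is_valid("192.0.2.10", client_cookie, issued))
# The same cookie presented with a different source address is rejected.
print(cookie_is_valid("203.0.113.7", client_cookie, issued))
```

A spoofer never receives the reply sent to the forged address, so it cannot learn the cookie it would need to present for that address.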

Fallback for older clients

Because DNS cookies are not yet standardized, it will of course be necessary to support older clients now and for years to come.

You can rate limit requests from clients without DNS cookie support. But rate limits make it easier for an attacker to DoS your DNS server. Beware that some DNS servers have a rate limit feature designed only for authoritative DNS servers. Since you are asking about a recursive resolver, such rate limiting implementations may not be applicable to you. The rate limit by design will become the bottleneck for your server, and thus an attacker will need to send you less traffic in order to cause legitimate requests to be dropped than he would have if there was no rate limit.

One advantage of rate limits is that if an attacker does flood your DNS server with requests, you are more likely to have capacity left over to SSH to the server and investigate the situation. Additionally, rate limits can be designed to primarily drop requests from client IPs sending many requests, which may be enough to protect you against DoS from attackers who cannot spoof client IPs.

For those reasons a rate limit a little under your actual capacity may be a good idea, even if it doesn't actually protect against amplification.
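
A limiter that primarily drops requests from busy client IPs could look roughly like this. It is a minimal fixed-window sketch, not the rate limiting algorithm implemented by any particular DNS server; the class name, the limit of 5 per second, and the injected `now` timestamp are assumptions made for the example.

```python
class PerClientRateLimiter:
    """Allow at most max_per_second requests per client IP per one-second window."""

    def __init__(self, max_per_second: int):
        self.max_per_second = max_per_second
        self._state = {}  # client_ip -> (window_start_second, count)

    def allow(self, client_ip: str, now: float) -> bool:
        window = int(now)
        start, count = self._state.get(client_ip, (window, 0))
        if start != window:
            # A new one-second window has begun for this client.
            start, count = window, 0
        if count >= self.max_per_second:
            self._state[client_ip] = (start, count)
            return False
        self._state[client_ip] = (start, count + 1)
        return True


limiter = PerClientRateLimiter(max_per_second=5)

# One client sends 10 requests within the same second; only 5 get through.
results = [limiter.allow("192.0.2.10", now=100.0 + i * 0.05) for i in range(10)]
print(sum(results))  # 5

# A different client is unaffected by the busy one.
print(limiter.allow("198.51.100.1", now=100.5))  # True
```

Because the limiter keys on the claimed source address, requests that spoof a legitimate client's IP count against that client's budget, which is the trade-off described above.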

Using TCP

It is possible to force a client to use TCP by sending a truncated response (one with the TC bit set), which indicates that the answer is too large for UDP. This has two drawbacks: it costs two additional roundtrips, and some faulty clients do not support it.

The cost of two additional roundtrips can be limited to only the first request using this approach:

When the client IP has not been confirmed, the DNS server can send a truncated response to force the client to switch to TCP. The truncated response can be as short as the request (or shorter if the client uses EDNS0 and the response does not) which eliminates the amplification.
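
A truncated reply of this kind can be built by echoing the question and setting the TC bit. The sketch below assumes a query with exactly one question and no name compression; `truncated_reply` and `build_query` are hypothetical helper names, and nothing here is taken from an existing DNS server.

```python
import struct

QR = 0x8000      # this message is a response
OPCODE = 0x7800  # opcode bits, echoed from the query
TC = 0x0200      # truncated: the client should retry over TCP
RD = 0x0100      # recursion desired, echoed from the query


def truncated_reply(query: bytes) -> bytes:
    """Return a reply carrying only the header and question, with TC set."""
    if len(query) < 12:
        raise ValueError("too short to be a DNS message")
    msg_id, flags, qdcount = struct.unpack("!HHH", query[:6])
    if qdcount != 1:
        raise ValueError("this sketch handles exactly one question")
    pos = 12
    while True:  # walk the labels of the question name
        if pos >= len(query):
            raise ValueError("question name runs past end of message")
        length = query[pos]
        if length & 0xC0:
            raise ValueError("compressed names are not handled")
        pos += 1 + length
        if length == 0:
            break
    end = pos + 4  # QTYPE and QCLASS follow the name
    if end > len(query):
        raise ValueError("question is cut short")
    reply_flags = QR | TC | (flags & (OPCODE | RD))
    header = struct.pack("!HHHHHH", msg_id, reply_flags, 1, 0, 0, 0)
    # Anything after the question (such as an EDNS0 record) is dropped.
    return header + query[12:end]


def build_query(name: str, msg_id: int = 0x1234) -> bytes:
    """Build a minimal A/IN query, used only to exercise the sketch."""
    labels = b"".join(bytes([len(p)]) + p.encode("ascii") for p in name.split("."))
    question = labels + b"\x00" + struct.pack("!HH", 1, 1)
    return struct.pack("!HHHHHH", msg_id, RD, 1, 0, 0, 0) + question


query = build_query("example.com")
reply = truncated_reply(query)
print(len(reply) == len(query))  # True: the reply is no larger than the query
```

Since the reply never exceeds the size of the query, a spoofed query gains the attacker nothing in bandwidth.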

Any client IP which completes a TCP handshake and sends a DNS request on the connection can be temporarily whitelisted. Once whitelisted that IP gets to send UDP queries and receive UDP responses up to 512 bytes (4096 bytes if using EDNS0). If a UDP response triggers an ICMP error message, the IP is removed from the whitelist again.

The method can also be reversed using a blacklist, which just means that client IPs are allowed to query over UDP by default, but any ICMP error message causes the IP to be blacklisted, after which it needs a TCP query to get off the blacklist.
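
The whitelist variant amounts to a small state machine, sketched below. The class name, the 300-second lifetime, and the injected `now` timestamps are assumptions made for the example; the return values "answer" and "truncate" stand in for what a real server would send.

```python
class ValidatedClients:
    """Track which client IPs have recently proven themselves over TCP."""

    def __init__(self, ttl_seconds: float = 300.0):  # assumed lifetime
        self.ttl_seconds = ttl_seconds
        self._expires = {}  # client_ip -> time at which validation lapses

    def on_tcp_query(self, client_ip: str, now: float) -> str:
        # A completed TCP handshake shows the address was not spoofed.
        self._expires[client_ip] = now + self.ttl_seconds
        return "answer"

    def on_udp_query(self, client_ip: str, now: float) -> str:
        if self._expires.get(client_ip, 0.0) > now:
            return "answer"
        # Unknown or expired: send a truncated reply to force TCP.
        self._expires.pop(client_ip, None)
        return "truncate"

    def on_icmp_error(self, client_ip: str) -> None:
        # A UDP reply bounced, so stop trusting this address.
        self._expires.pop(client_ip, None)


clients = ValidatedClients()
print(clients.on_udp_query("192.0.2.10", now=0.0))  # truncate
print(clients.on_tcp_query("192.0.2.10", now=1.0))  # answer
print(clients.on_udp_query("192.0.2.10", now=2.0))  # answer
clients.on_icmp_error("192.0.2.10")
print(clients.on_udp_query("192.0.2.10", now=3.0))  # truncate
```

The blacklist variant inverts the default: `on_udp_query` would answer unless the address had been recorded by `on_icmp_error`. Note that the sketch treats every ICMP error as genuine, which a real implementation could not safely assume.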

A bitmap covering all relevant IPv4 addresses could be stored in 444MB of memory. IPv6 addresses would have to be stored in some other way.
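
For a sense of scale, one bit per address across the entire IPv4 space is 2^32 bits, or 512 MiB; a smaller figure follows from leaving out address ranges that are not relevant. The sketch below shows the data structure over a single /16 so that it stays small; `IPv4Bitmap` is a hypothetical name and the prefix is a placeholder.

```python
import ipaddress


class IPv4Bitmap:
    """One bit per address in a given IPv4 network."""

    def __init__(self, network: str):
        self.net = ipaddress.ip_network(network)
        self.bits = bytearray((self.net.num_addresses + 7) // 8)

    def _index(self, ip: str) -> int:
        addr = ipaddress.ip_address(ip)
        if addr not in self.net:
            raise ValueError(f"{ip} is outside {self.net}")
        return int(addr) - int(self.net.network_address)

    def set(self, ip: str) -> None:
        i = self._index(ip)
        self.bits[i >> 3] |= 1 << (i & 7)

    def clear(self, ip: str) -> None:
        i = self._index(ip)
        self.bits[i >> 3] &= ~(1 << (i & 7)) & 0xFF

    def test(self, ip: str) -> bool:
        i = self._index(ip)
        return bool(self.bits[i >> 3] & (1 << (i & 7)))


# Memory needed for one bit per address over all of IPv4, in MiB.
print((2 ** 32 // 8) / 2 ** 20)  # 512.0

bitmap = IPv4Bitmap("10.20.0.0/16")  # placeholder prefix
bitmap.set("10.20.3.4")
print(bitmap.test("10.20.3.4"))  # True
print(bitmap.test("10.20.3.5"))  # False
print(len(bitmap.bits))          # 8192 bytes for 65536 addresses
```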

I do not know if any DNS server has implemented this approach.

It has also been reported that some TCP stacks can be exploited in amplification attacks. That however applies to any TCP based service and not just DNS. Such vulnerabilities should be mitigated by upgrading to a kernel version where the TCP stack has been fixed to not send more than one packet in response to a SYN packet.

kasperd
  • To be fair, my answer is focused on out of the box tech that is in our hands now. Most of the people who have asked this question on Serverfault aren't looking to develop their own nameserver software or write patches for existing nameserver software. Alnitak has advised us that the TCP+whitelisting approach you're suggesting [appears to be patented](http://serverfault.com/questions/708076/what-kinds-of-security-vulnerabilities-does-providing-dnssec-expose/747213#comment936499_708143), though he hasn't cited the exact patent. – Andrew B Jan 07 '16 at 13:24
  • Also, have you been able to produce the DoS attack you've mentioned in using any of the current DNS server software implementing RRL, or know of a case where someone else has achieved it? I'm pretty sure this would have come up on any number of mailing lists I subscribe to. – Andrew B Jan 07 '16 at 13:25
  • @AndrewB I haven't tested yet because I wouldn't want to cause a flood on somebody else's server. And some of the people mentioning rate limiting have an attitude that makes me think they wouldn't trust the results if I did it on my own server. But since you are asking I am going to give it a try, I just need to set up a separate DNS server for testing it. Does using the default Bind version on Ubuntu LTS 14.04 sound like a sensible setup? Which exact settings on the authoritative server would you consider reasonable for such a test? – kasperd Jan 07 '16 at 13:51
  • I'm not the best person to ask for settings unfortunately, we haven't started lab testing yet. I would still encourage you to try and create the proposed scenario: regardless of the attitudes of the parties you've been conversing with, there are numerous parties across multiple software install bases who would take interest in a practical exploit. I also suggest that you monitor UDP receive queue overflows using SNMP, graphing that will help to demonstrate whether you're successfully bogging down the software's ability to accept packets. – Andrew B Jan 07 '16 at 13:59
  • @AndrewB I just realized a minor discrepancy here. This question is about recursive resolvers. But rate limiting is not designed for recursive resolvers. `Deliberately open recursive DNS servers are outside the scope of this document.` For now I have added a warning about that. I should test if it is even possible to enable rate limiting on Bind configured as recursive resolver, and if it will behave properly. – kasperd Jan 07 '16 at 16:12
  • @AndrewB I tested `bind9` on Ubuntu 14.04 configured as recursor. I tested with 1 QPS of legitimate traffic and 10 QPS of attack traffic. Without rate limiting the legitimate requests saw an average response time of 24ms and 0% errors. With a configured rate limit of 5 responses per second, the legitimate requests saw an average response time of 7s and 19% failed to resolve the domain at all. So now I can conclusively say that I have tested it, and it is a real attack vector. – kasperd Jan 07 '16 at 17:42
  • [an open DNS recursor or an authoritative DNS server, the problem is the same] no. a recursive server can be killed without spoofing; consider the now-common random-subdomain attack. upstream context is expensive. [But rate limits make it easier for an attacker to DoS your DNS server. ... The rate limit by design will become the bottleneck for your server, and thus an attacker will need to send you less traffic in order to cause legitimate requests to be dropped than he would have if there was no rate limit. ... it doesn't actually protect against amplification.] this is false, all of it. – Paul Vixie Jan 07 '16 at 21:26
  • I'm not opposed to you finding a problem (the opposite, actually), but those numbers are awfully low. Are you missing zeroes or a k in there? What was the duration of the drops being seen? What type of attack were you using? – Andrew B Jan 07 '16 at 21:34
  • [I do not know if any DNS server has implemented (the force-TCP-fallback) approach.] yes, and their users don't turn it on, because as you say, [some faulty clients do not support it,] in fact many clients can't do TCP/53 at all, and end-user blackouts and complaints are expensive. this method is also reported to have been patented. [any ICMP error message cause the IP to be blacklisted] did you know that ICMP can be spoofed? [kernel version where the TCP stack has been fixed to not send more than one packet in response to a SYN packet] that's what the TCP specification requires #notabug. – Paul Vixie Jan 07 '16 at 21:49
  • @AndrewB I got the 5 QPS number from here: http://ss.vix.su/~vixie/isc-tn-2012-1.txt Maybe you didn't notice that the number specified is per client IP address. In order for the experiment to make any sense the rate of legitimate requests from that IP address would have to be lower than that rate, and the rate of spoofed requests from that IP address would have to be higher. That made 1 QPS and 10 QPS from that client IP address seem like reasonable values for a first test, and the DoS attack was effective with those values. – kasperd Jan 07 '16 at 21:58
  • @PaulVixie Excuse me. Are you accusing me of lying? Because that is almost what it sounds like, when you call my statements for false after I have just performed a test that showed the exact behavior I was predicting. – kasperd Jan 07 '16 at 22:01
  • [Let's take this to chat.](http://chat.stackexchange.com/rooms/33990/dns-rrl-fuzzing) Given that you've found something, I don't want the limitations of the comment system to be working against us here. If you share the config, methodologies, etc., in detail this will be much easier to comment on appropriately. – Andrew B Jan 07 '16 at 22:05
  • kasperd: [Are you accusing me of lying?] I am not in a position to know whether you believe what you're saying. I am in a position to know that what you're saying isn't true. Almost all the root name servers and over half the GTLD servers now run DNS RRL. Ask yourself these questions: (1) would vixie and schryver have designed RRL with the flaw you describe? (2) would any server operator anywhere use RRL if it had the flaw you describe? ... I think you should read the spec (http://www.redbarn.org/dns/ratelimits, including all its references) and share your questions and observations. – Paul Vixie Jan 07 '16 at 22:52
  • @PaulVixie But those are authoritative servers. You are responding to a question about recursive resolvers. That makes a huge difference. If you take a rate limiting algorithm designed for authoritative servers only and apply that to a recursive resolver, you can't expect a reasonable outcome. – kasperd Jan 07 '16 at 22:56
  • DNS RRL is explicitly designed for authority servers; the spec explains why. when you said [Beware that some DNS servers have a rate limit feature designed only for authoritative DNS servers] I wondered why you used the word "beware", but I offered no correction to your description of DNS RRL as inappropriate for recursive-only servers, because that's absolutely true. recursive servers can only be protected the easy way (firewall it off, use it internally only) or the hard way (like google and opendns, use deep learning and 24x7 human monitoring). dns cookies will someday offer a third way. – Paul Vixie Jan 07 '16 at 23:07
  • @PaulVixie It is possible to configure a recursive resolver with rate limiting enabled. An administrator doing so is not going to get as much benefit from the rate limiting as he may have been hoping for. Some might go as far as calling it a misconfiguration to enable the rate limiting on a recursive resolver. That is why I suggest administrators beware of such configurations. – kasperd Jan 07 '16 at 23:12
  • it's documented as not being suitable. anyone who turns RRL on with recursion enabled will provide bad service. stub queries can reasonably repeat since the initiators have no cache. no one, ever, should turn RRL on, if recursion is enabled. the spec is very clear on this point. ... on the other hand, every authority server needs RRL as a bare minimum config, and it should become the default for all servers. ... finally, there is no off-the-shelf technology, including your TCP fallback idea, that can make recursion safe except behind a firewall. i'd love it if you'd stop asserting so. – Paul Vixie Jan 07 '16 at 23:39
  • @PaulVixie In that case I don't get why you were first opposed to me warning against using rate limiting for a recursor. And where do you see me asserting that there are any bullet-proof off-the-shelf solutions for the problem? My only aim is to explain what is the best you can do as an administrator and/or developer of a server intended to be used as a public resolver as well as warn against the shortcomings of those solutions. – kasperd Jan 08 '16 at 00:03
  • i'm being asked to move this discussion into a chat, which i'm happy to do. my objection was not to you discommending dns rrl for recursion. it was to your descriptions of rrl in general that I objected. as to your aim, the tcp fallback idea isn't widely practicable today, in other words, like dns cookies it is vaporware. and forced TCP fallback is not practiced by operators who have that knob, so in other words, it's controversial vaporware. that's why I downvoted your answer. i'd thank you to walk back those proposals and leave the audience with practical, noncontroversial proposals. – Paul Vixie Jan 08 '16 at 00:27
  • Let us [continue this discussion in chat](http://chat.stackexchange.com/rooms/33996/discussion-between-kasperd-and-paul-vixie). – kasperd Jan 08 '16 at 00:29