
If a DNS server looks up a record and it's missing, it will often "negatively cache" the fact that this record is missing and not try to look it up again for a while. I don't see anything in the RFC about what the TTL on negative caching should be, so I'm guessing it's somewhat arbitrary. In the real world, how long do these negative records stick around?

Leopd
    [RFC 2308, Negative Caching of DNS Queries](http://tools.ietf.org/html/rfc2308) explains how this is supposed to work. (Related SO question: [Does a caching nameserver usually cache the negative DNS response SERVFAIL?](http://stackoverflow.com/questions/73433/does-a-caching-nameserver-usually-cache-the-negative-dns-response-servfail)) – Skyhawk Sep 12 '12 at 16:06

3 Answers


The TTL for negative caching is not arbitrary. It is taken from the SOA record at the top of the zone to which the requested record would have belonged, had it existed. For example:

example.org.    IN      SOA     master-ns1.example.org. Hostmaster.example.org. (
            2012091201 43200 1800 1209600 86400 )

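The last value in the SOA record (86400) is the amount of time, in seconds, that clients are asked to cache negative results for names in the example.org. zone.

If a client requests doesnotexist.example.org., it will cache the negative result for up to 86400 seconds. Strictly speaking, RFC 2308 section 5 sets the negative TTL to the lesser of this value and the TTL of the SOA record itself.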

Celada
    @MarcusAdams ...and a client won't negative-cache any records on SERVFAIL. The TTL in the SOA record is, in fact, used for negative caching. That's why the SOA record is produced in NXDOMAIN answers. – Celada Jul 17 '16 at 11:45
    @MarcusAdams Correct. If you get a SERVFAIL then you don't get an SOA or a TTL. There is no answer for you to negative-cache. If instead you get an NXDOMAIN then you *do* get an SOA, with a TTL. You will negative-cache that response for the duration of the TTL. – Celada Jul 20 '16 at 20:37
  • Beartrap for DNS RBL users: since RBL answers tend to be minimal (and the DNS server implementation possibly non-conforming) you might not get an SOA with the NXDOMAIN answer. This may mean your DNS cache doesn't cache NXDOMAIN (i.e. the non-spammers) at all :-/ – mr.spuratic Nov 22 '17 at 16:26
    It's actually `MIN(SOA TTL, SOA.MINIMUM)`, not simply `SOA.MINIMUM`. (See https://tools.ietf.org/html/rfc2308#section-5) – Håkan Lindqvist Aug 20 '19 at 06:27

This depends on your exact definition of a "negative query", but in either case it is documented in RFC 2308, «Negative Caching of DNS Queries (DNS NCACHE)»:


NXDOMAIN

  • If the resolution is successful and results in NXDOMAIN, the response will come with an SOA record, which contains the NXDOMAIN TTL (traditionally known as the MINIMUM field). rfc2308#section-4

SERVFAIL

  • If the resolution is not successful and results in a timeout (SERVFAIL), then it may as well not be cached at all, and in all circumstances MUST NOT be cached for longer than 5 minutes. rfc2308#section-7.1

    Note that in practice, caching such results for the full allowable 5 minutes is a great way to degrade the experience of clients whose cache server occasionally suffers brief connectivity issues (and effectively makes it vulnerable to a denial-of-service amplification, where a few seconds of downtime would leave certain parts of the DNS unavailable for the full five minutes).

    Prior to BIND 9.9.6-S1 (released in 2014), apparently, SERVFAIL was not cached at all. a878301 (2014-09-04)

    In other words, at the time of your question and in all versions of BIND released prior to 2014, the BIND recursive resolver did not cache SERVFAIL at all, if the above commit and the documentation about its first introduction in 9.9.6-S1 are to be believed.

    In the latest BIND, the default servfail-ttl is 1s, and the setting is hardcoded to a ceiling of 30s (in place of the RFC-mandated ceiling of 300s). 90174e6 (2015-10-17)

    Furthermore, the following are some noteworthy quotes on the matter:

    • https://kb.isc.org/article/AA-01178/ (2014/2016-01-07)

      The outcome of caching SERVFAIL responses has included some situations where it was seen to be detrimental to the client experience, particularly when the causes of the SERVFAIL being presented to the client were transient and from a scenario where an immediate retry of the query would be a more appropriate action.

    • http://cr.yp.to/djbdns/third-party.html (2003-01-11)

      The second tactic is to claim that widespread DNS clients will do something Particularly Evil when they are unable to reach all DNS servers. The problem with this argument is that the claim is false. Any such client is clearly buggy, and will be unable to survive in the marketplace: consider what happens if the client's routers briefly go down, or if the client's network is temporarily flooded.


In summary, an NXDOMAIN response would be cached as specified in the SOA of the applicable zone, whereas SERVFAIL is unlikely to be cached, or, if cached, it'll be at most a double-digit number of seconds.
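
To make the summary concrete, here is a rough Python sketch of the decision a caching resolver might make. It is not taken from BIND or any other real resolver: the function name and parameters are hypothetical, the NXDOMAIN branch uses the min(SOA TTL, SOA.MINIMUM) rule from RFC 2308 section 5, and the SERVFAIL default of 1 second simply mirrors the BIND default mentioned above.

```python
from typing import Optional

# Upper bound on SERVFAIL caching from RFC 2308 section 7.1 (5 minutes).
RFC2308_SERVFAIL_CEILING = 300


def negative_cache_seconds(rcode: str,
                           soa_ttl: Optional[int] = None,
                           soa_minimum: Optional[int] = None,
                           servfail_ttl: int = 1) -> int:
    """Return how long (in seconds) a negative response might be cached.

    Hypothetical helper for illustration only.
    """
    if rcode == "NXDOMAIN":
        if soa_ttl is None or soa_minimum is None:
            # No SOA in the authority section: nothing to derive a TTL from.
            return 0
        return min(soa_ttl, soa_minimum)
    if rcode == "SERVFAIL":
        # Clamp the locally configured value to the RFC ceiling.
        return max(0, min(servfail_ttl, RFC2308_SERVFAIL_CEILING))
    raise ValueError(f"response code not handled in this sketch: {rcode}")


print(negative_cache_seconds("NXDOMAIN", soa_ttl=900, soa_minimum=86400))  # 900
print(negative_cache_seconds("SERVFAIL"))                                  # 1
print(negative_cache_seconds("SERVFAIL", servfail_ttl=3600))               # 300
```

Returning 0 when no SOA is present is a simplification; a real resolver may apply its own policy in that case.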

cnst

There is an RFC dedicated to this topic: RFC 2308 - Negative Caching of DNS Queries (DNS NCACHE).

The relevant section to read is 5 - Caching Negative Answers which states:

Like normal answers negative answers have a time to live (TTL). As there is no record in the answer section to which this TTL can be applied, the TTL must be carried by another method. This is done by including the SOA record from the zone in the authority section of the reply. When the authoritative server creates this record its TTL is taken from the minimum of the SOA.MINIMUM field and SOA's TTL. This TTL decrements in a similar manner to a normal cached answer and upon reaching zero (0) indicates the cached negative answer MUST NOT be used again.

First, let's identify the SOA.MINIMUM and SOA TTL described in the RFC. The TTL is the number before the record class IN (900 seconds in the example below), while the minimum is the last field in the record (86400 seconds in the example below).

$ dig serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> serverfault.com soa @ns-1135.awsdns-13.org +noall +answer +multiline
;; global options: +cmd
serverfault.com.    900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. (
                1          ; serial
                7200       ; refresh (2 hours)
                900        ; retry (15 minutes)
                1209600    ; expire (2 weeks)
                86400      ; minimum (1 day)
                )
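
If you want to pull these two numbers out programmatically, a minimal Python sketch could look like the following. It assumes the text is dig-style output laid out as owner, TTL, class, SOA and then the seven RDATA fields; the `SoaTimers` class and `parse_soa` function are names I made up for illustration, not part of any DNS library.

```python
import re
from dataclasses import dataclass


@dataclass
class SoaTimers:
    ttl: int      # TTL of the SOA record itself
    minimum: int  # last RDATA field (SOA.MINIMUM)

    @property
    def negative_ttl(self) -> int:
        # RFC 2308 section 5: the lesser of the two values.
        return min(self.ttl, self.minimum)


def parse_soa(text: str) -> SoaTimers:
    """Parse one SOA record from dig-style text (single- or multi-line)."""
    # Drop ';' comments first, then the grouping parentheses.
    lines = [line.split(";", 1)[0] for line in text.splitlines()]
    tokens = re.sub(r"[()]", " ", " ".join(lines)).split()
    soa_at = tokens.index("SOA")
    # Assumed layout: <owner> <ttl> IN SOA <mname> <rname>
    #                 <serial> <refresh> <retry> <expire> <minimum>
    return SoaTimers(ttl=int(tokens[soa_at - 2]),
                     minimum=int(tokens[soa_at + 7]))


record = """
serverfault.com.    900 IN SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. (
                1          ; serial
                7200       ; refresh (2 hours)
                900        ; retry (15 minutes)
                1209600    ; expire (2 weeks)
                86400      ; minimum (1 day)
                )
"""
soa = parse_soa(record)
print(soa.ttl, soa.minimum, soa.negative_ttl)  # 900 86400 900
```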

Now let's look at some examples. The serverfault.com zone is illustrative because it has authoritative servers from two different providers that are configured differently.

Let's find the authoritative nameservers for the serverfault.com zone:

$ host -t ns serverfault.com
serverfault.com name server ns-860.awsdns-43.net.
serverfault.com name server ns-1135.awsdns-13.org.
serverfault.com name server ns-cloud-c1.googledomains.com.
serverfault.com name server ns-cloud-c2.googledomains.com.

Then check the SOA record using an AWS nameserver:

$ dig serverfault.com soa @ns-1135.awsdns-13.org | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com.    900 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

From this we can see that the TTL of the SOA record is 900 seconds while the negative TTL value is 86400 seconds. The SOA TTL value of 900 is lower, so we expect this value to be used.

Now if we query an authoritative server for a nonexistent domain, we should get a response without an answer and with an SOA record in the authority section:

$ dig nxdomain.serverfault.com @ns-1135.awsdns-13.org

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-1135.awsdns-13.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 51948
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
;; QUESTION SECTION:
;nxdomain.serverfault.com.  IN  A

;; AUTHORITY SECTION:
serverfault.com.    900 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

;; Query time: 125 msec
;; SERVER: 205.251.196.111#53(205.251.196.111)
;; WHEN: Tue Aug 20 15:49:47 NZST 2019
;; MSG SIZE  rcvd: 135

When a recursive (caching) resolver receives this answer it will parse the SOA record in the AUTHORITY SECTION and use the TTL of this record to determine how long it should cache the negative result (in this case 900 seconds).

Now let's follow the same procedure with a Google nameserver:

$ dig serverfault.com soa @ns-cloud-c2.googledomains.com | grep 'ANSWER SECTION' -A 1
;; ANSWER SECTION:
serverfault.com.    21600   IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300

You can see that the Google nameservers have different values for both the SOA TTL and the negative TTL. In this case the negative TTL of 300 is lower than the SOA TTL of 21600. Therefore the Google server should use the lower value in the AUTHORITY SECTION SOA record when returning an NXDOMAIN response:

$ dig nxdomain.serverfault.com @ns-cloud-c2.googledomains.com

; <<>> DiG 9.11.3-1ubuntu1.8-Ubuntu <<>> nxdomain.serverfault.com @ns-cloud-c2.googledomains.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 25920
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 512
;; QUESTION SECTION:
;nxdomain.serverfault.com.  IN  A

;; AUTHORITY SECTION:
serverfault.com.    300 IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300

;; Query time: 130 msec
;; SERVER: 216.239.34.108#53(216.239.34.108)
;; WHEN: Tue Aug 20 16:05:24 NZST 2019
;; MSG SIZE  rcvd: 143

As expected the TTL of the SOA record in the NXDOMAIN response is 300 seconds.

The example above also demonstrates how easy it is to get different answers to the same query. The answer that an individual caching resolver ends up using depends on which authoritative nameserver it queried.
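
As a toy illustration of that point, the Python sketch below picks one of the two nameservers queried above and reports the negative TTL a resolver would end up with. The uniform random choice is purely an assumption for the demo (real resolvers use their own server-selection logic), and the function names are hypothetical.

```python
import random

# (SOA TTL, SOA.MINIMUM) as served by each nameserver in the dig output above.
SOA_BY_NAMESERVER = {
    "ns-1135.awsdns-13.org.": (900, 86400),
    "ns-cloud-c2.googledomains.com.": (21600, 300),
}


def negative_ttl_from(nameserver: str) -> int:
    """Negative TTL a resolver would learn from this nameserver."""
    soa_ttl, minimum = SOA_BY_NAMESERVER[nameserver]
    return min(soa_ttl, minimum)


def resolve_missing_name(rng: random.Random):
    """Pick a nameserver at random (an assumption) and return what it yields."""
    nameserver = rng.choice(sorted(SOA_BY_NAMESERVER))
    return nameserver, negative_ttl_from(nameserver)


rng = random.Random()
for _ in range(3):
    print(resolve_missing_name(rng))
```

Depending on which server gets picked, the same NXDOMAIN ends up cached for either 900 or 300 seconds.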

In my testing I have also observed that some recursive (caching) resolvers do not return an AUTHORITY SECTION with an SOA record with a decrementing TTL for subsequent requests, whereas others do.

For example, the Cloudflare resolver does (note the decrementing TTL value):

$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    674 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400
$ dig nxdomain.serverfault.com @1.1.1.1 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    668 IN  SOA ns-1135.awsdns-13.org. awsdns-hostmaster.amazon.com. 1 7200 900 1209600 86400

While the default resolver in an AWS VPC will respond with an authority section only on the first request:

$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1
;; AUTHORITY SECTION:
serverfault.com.    300 IN  SOA ns-cloud-c1.googledomains.com. cloud-dns-hostmaster.google.com. 1 21600 3600 259200 300
$ dig nxdomain.serverfault.com @169.254.169.253 | grep 'AUTHORITY SECTION' -A 1 | wc -l
0
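
The decrementing TTL seen in the Cloudflare output can be modelled with a small cache that stores an expiry time and reports the seconds remaining. This is only a sketch under my own assumptions: the `NegativeCache` class is hypothetical, it keys entries by name alone, and the clock is injectable so the example can reproduce the 674 and 668 second readings without waiting.

```python
import time
from typing import Callable, Dict, Optional


class NegativeCache:
    """Toy negative cache keyed by name only (real caches track more)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._expires_at: Dict[str, float] = {}

    def store(self, name: str, ttl: int) -> None:
        # Remember when the negative answer stops being usable.
        self._expires_at[name] = self._clock() + ttl

    def remaining(self, name: str) -> Optional[int]:
        """Seconds left for a cached negative answer, or None if absent."""
        expires_at = self._expires_at.get(name)
        if expires_at is None:
            return None
        left = int(expires_at - self._clock())
        if left <= 0:
            # TTL reached zero: the entry must not be used again.
            del self._expires_at[name]
            return None
        return left


now = [0.0]  # fake clock so the example is deterministic
cache = NegativeCache(clock=lambda: now[0])
cache.store("nxdomain.serverfault.com.", 900)

now[0] = 226.0
print(cache.remaining("nxdomain.serverfault.com."))  # 674
now[0] = 232.0
print(cache.remaining("nxdomain.serverfault.com."))  # 668
now[0] = 900.0
print(cache.remaining("nxdomain.serverfault.com."))  # None
```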

Note: This answer addresses the behavior of NXDOMAIN answers.


htaccess