12

I enabled DNSSEC on my primary domain about a week ago. It's not a major website or anything -- just my personal domain name that I use for email and the like (TLD: com; DNSSEC algorithm 13; authoritative DNS provider: Cloudflare).

Over the last 24 hours, the domain has received 15,605 queries. In response, it has dished out 15,601 NOERROR response codes and a total of 4 NXDOMAIN response codes.

How are NXDOMAIN responses still possible? What could be generating them?

Personally I cannot trigger one no matter what query I attempt, and my understanding is that DNSSEC should, at least in theory, eliminate this response code entirely.

Am I incorrect?

Patrick Mevzek
Collin

2 Answers

24

TL;DR

The lack of NXDOMAIN responses for Cloudflare hosted domains is a consequence of their specific DNSSEC implementation (using so called "black lies") and not a design of the DNSSEC protocol itself; hence observations will be different with other providers doing DNSSEC.

Initial questions

How are NXDOMAIN responses still possible?

Why wouldn't they be possible? DNSSEC or not, if you query for a name that doesn't exist, you get an NXDOMAIN reply back.

my understanding is that DNSSEC should, at least in theory, eliminate this response code entirely

Why? And from where do you get that feeling?

Live example with a DNSSEC enabled domain

icann.org is DNSSEC enabled right now. If I query for a name that does not exist under it, I get an NXDOMAIN:

$ dig NS icann.org +short
b.icann-servers.net.
c.icann-servers.net.
ns.icann.org.
a.icann-servers.net.

$ dig @a.icann-servers.net does-not-exist-foobar.icann.org

; <<>> DiG 9.18.4 <<>> @a.icann-servers.net does-not-exist-foobar.icann.org
; (1 server found)
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38891
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 98228e9e0c5ef4e6
;; QUESTION SECTION:
;does-not-exist-foobar.icann.org. IN A

;; QUERY SIZE: 72

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38891
                                       ^^^^^^^^

DNSSEC is an extension of DNS in the sense that for a non validating resolver, answers are not different, even if the domain is DNSSEC enabled. So all return codes work in the same way.
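To make the "same return codes" point concrete, here is a minimal sketch of where the status shown by dig comes from: the RCODE is the low 4 bits of the flags word in the 12-byte DNS header, whether or not the zone is signed. The function name and the hand-built header are illustrative assumptions, and the sketch ignores the extended RCODE bits that EDNS can add.

```python
import struct

# Only the codes discussed in this thread; others are shown numerically.
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}

def rcode_of(message: bytes) -> str:
    """Return the response code name from a raw DNS message (hypothetical helper)."""
    # Header: ID, flags, QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT (six 16-bit fields).
    _id, flags, *_counts = struct.unpack("!6H", message[:12])
    code = flags & 0x000F  # RCODE is the low 4 bits of the flags word
    return RCODES.get(code, f"RCODE{code}")

# Hand-built header for illustration: QR=1, AA=1, RCODE=3, one question.
header = struct.pack("!6H", 38891, 0x8403, 1, 0, 0, 0)
print(rcode_of(header))
```

Nothing in this field is DNSSEC specific; DNSSEC only adds records around it.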

Explanations about NSEC/NSEC3/RRSIG

What it does change, and what you can see by adding +dnssec to dig (which doesn't mean "activate DNSSEC" but "request and display DNSSEC related records", those being RRSIG, NSEC and NSEC3, as they are normally not displayed), is that the AUTHORITY section of an NXDOMAIN reply gives further explanations with NSEC or NSEC3 records:

;; AUTHORITY SECTION:
icann.org.      1h IN SOA sns.dns.icann.org. noc.dns.icann.org. (
                2022070670 ; serial
                10800      ; refresh (3 hours)
                3600       ; retry (1 hour)
                1209600    ; expire (2 weeks)
                3600       ; minimum (1 hour)
                )
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                J9HKD4G746GMUTGGUV6AM37GSJAD6NRR
                A NS SOA MX TXT AAAA RRSIG DNSKEY NSEC3PARAM )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                TE4FB4PVMU1GQNPG9P01ID48U1BTN2G4
                A RRSIG )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
                LT4I8S7OTQ7ACOSF73M7LHCIC7C1J17I
                A RRSIG )
icann.org.      1h IN RRSIG SOA 7 2 3600 (
                20220804192816 20220714153322 3425 icann.org.
                NMcD1TeozFyCRDlmqFMoM/V/VmWQUmRNIH0/igPzdj2S
                hemnQHeXDOudBxsUgE/DpSV4KHsgqLQKdgbQruqCO7Dt
                iLK1bCLBZs38LdOadyJs3jWjjuJ9+mEnLXTsqMeeMllw
                YFL6pPyo1TfChZm05KJ+DJNw0SHJw3MWBRtV4iI= )
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724054620 20220703065347 58935 icann.org.
                gmo0VP8k9Li9lutMA3uTrMfABMmFBN23GonYo72Twk9l
                wGYqFvlU/naN0KKtEd3g+zOiYB0Jb1J1270Dveew/vYa
                hTmeMYrwUbEt9gZYCvi74zm6Ss0cQ8uxJ5bZw70nZ7oU
                LAtWYVGJMgupfjtne6021AJoLNB1CaMhFwo+TPo= )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724101659 20220703045347 58935 icann.org.
                hGsUeE4di9yFuDMq8ly1YQEs1OvOFAHVctOQrs6Poixl
                STqcErjC20V2CI0YApX6SbiI8AP/dqMjBm3fZh91mtDf
                aSrZypfScBEO/KVdlqbW9G+y8VR65ryjTAA7TZIzqN+z
                7YyTAESWb8E7T4NCtQPPwYpjl/S9krbEGSiKfaw= )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
                20220724151521 20220703105347 58935 icann.org.
                P9qwkFoGkCd+m3aDQkzF/g7SJfn/byt6d4zugLzRKuH1
                rLmYZdlJNOC+fI1saCZySarsP9KavFSBzw6S9GMLobQJ
                hTVpu1ZUkEP9BMOZo28eeRLrGvAbrVb7aB9CWl9TgUMc
                2+s4nG87HTvD2TCJHmyPC1mIbBLYmJoa7iGLGiI= )

NSEC3 is more complicated (less human friendly) as it uses hashes of domain names. But what all the above means in summary is that the name I requested does not exist because it lands between two names that exist (but can't be seen immediately, because they are hashed), and that no wildcard exists (which is why you have three NSEC3 records). The RRSIG records sign the NSEC3 ones, so all the above allows a resolving nameserver to double check that the NXDOMAIN is legit and not introduced by some on-path attacker, because all the NSEC3 and RRSIG records match the expectations.

Simpler example with NSEC case

Let us take a domain DNSSEC enabled with NSEC instead of NSEC3: the root itself :-)

If I do dig @g.root-servers.net foobar. +dnssec right now I get NXDOMAIN, for the same reason as above: that TLD does not exist (yet?)

But let us look in the results and especially one NSEC record:

foo.            1d IN NSEC food. NS DS RRSIG NSEC

This is an affirmative, signed (there is a corresponding RRSIG record) assertion from the nameserver telling me that foobar does not exist in the zone, because both foo and food exist, but nothing in between. Per DNSSEC ordering rules foobar would sort between foo and food, hence the above proves that foobar does not exist. Incidentally it proves that a lot of other names do not exist either, and a resolver could cache this NSEC and derive answers without requesting anything.

Why? Because if I know that nothing exists between foo and food I immediately know that fooa doesn't exist, nor fooa42 or foobie or fooccc or similar…
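The range reasoning above can be sketched in a few lines. This is a simplified take on the canonical ordering from RFC 4034 (labels compared right to left as lowercase byte strings); the helper names are mine, and escaped characters in names are deliberately not handled here.

```python
def canonical_key(name: str) -> list:
    """Sort key approximating DNSSEC canonical ordering (no escape handling)."""
    labels = [label.lower().encode("ascii")
              for label in name.rstrip(".").split(".") if label]
    return labels[::-1]  # compare starting from the rightmost label

def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if qname falls strictly between owner and next_name."""
    o, n, q = canonical_key(owner), canonical_key(next_name), canonical_key(qname)
    if o < n:
        return o < q < n
    # The last NSEC of a zone wraps around to the apex.
    return q > o or q < n

for candidate in ("foobar.", "fooa.", "fooa42.", "foobie.", "fooccc.", "fooz."):
    print(candidate, nsec_covers("foo.", "food.", candidate))
```

Every candidate except fooz. is covered by the single foo → food record, which is what lets a resolver answer for all of them from one cached NSEC.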

Back to CloudFlare specific case

Cloudflare implements "DNSSEC White Lies" AND "Black Lies" (see https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/ and https://blog.cloudflare.com/black-lies/) for their own various reasons, in part because they generate signatures dynamically: the RRSIG records are generated at the moment the request comes in, not in advance. This is a compromise; both approaches have advantages and drawbacks.

What does that mean? They fake existence of ALL names, hence there is almost never an NXDOMAIN.

Let us see one example:

$ dig dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec

; <<>> DiG 9.18.4 <<>> dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: fd8d36048320c848
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.    IN A

;; QUERY SIZE: 87

;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.    IN A

;; AUTHORITY SECTION:
cloudflare.com.     5m IN SOA ns3.cloudflare.com. dns.cloudflare.com. (
                2282614227 ; serial
                10000      ; refresh (2 hours 46 minutes 40 seconds)
                2400       ; retry (40 minutes)
                604800     ; expire (1 week)
                300        ; minimum (5 minutes)
                )
dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. 5m IN NSEC \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. RRSIG NSEC

(I removed the RRSIG records).

So what does that tell us? First: NOERROR instead of NXDOMAIN, so the nameserver tells me the name I queried for exists, but maybe not for the type I asked (A, which is dig's default type). This is valid and known as NODATA: NOERROR but no content either, that is, an empty ANSWER section, as happens when the name exists but not with that type.

The AUTHORITY part, and specifically that NSEC record, tells me that there are no names between dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. (the very name I asked for, not some previous existing one) and \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com., which may look like a strange name but 1) is totally valid (it is not a valid hostname, because \000 is how a byte of value 0 is written in DNS presentation format, but it is still a valid domain name, as domain names in the DNS specifications can contain arbitrary bytes) and 2) is, per the DNSSEC ordering algorithm, the name "right after" my name (so the range between the two names does not include any other name).
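A small sketch of why \000.<name> is the name "right after" <name>. It assumes names in presentation format with \DDD decimal escapes only (escaped dots are not handled), and the helper names are illustrative, not taken from any DNS library.

```python
import re

def canonical_key(name: str) -> list:
    """Sort key approximating DNSSEC canonical ordering, decoding \\DDD escapes."""
    labels = []
    for raw in name.rstrip(".").split("."):
        # Turn each \DDD decimal escape (e.g. \000) into the single byte it denotes.
        label = re.sub(rb"\\(\d{3})",
                       lambda m: bytes([int(m.group(1))]),
                       raw.encode("ascii"))
        labels.append(label.lower())
    return labels[::-1]  # compare starting from the rightmost label

def minimal_next(name: str) -> str:
    """Smallest name sorting after `name`: a child whose label is one zero byte."""
    return "\\000." + name

qname = "dwewgewfgewfee-32cewcewcew-2284.cloudflare.com."
nxt = minimal_next(qname)
print(nxt)
# The queried name sorts first, then \000.<name>, then any other child of it.
print(canonical_key(qname) < canonical_key(nxt) < canonical_key("a." + qname))
```

Since a label cannot be empty and no byte sorts below 0, no other name can fit between the two ends of that NSEC range.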

The RRSIG NSEC part at the end of the NSEC record means that there is no record of type A on the name, but there are records of types RRSIG and NSEC, which makes sense because I am looking at exactly the NSEC record of that name, and as we are in DNSSEC land, of course there is an RRSIG.

So this is called a "lie" because the nameserver is replying to you: this name exists, but not this record type. And no matter which record type you ask for (except NSEC and RRSIG) the nameserver will tell you: "this name does not exist for this record type". At the end, if it does not exist for any record type (besides NSEC and RRSIG) it is really as if it (the name) does not exist at all, but it is just presented in a different way for reasons quickly detailed below.
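The reading of the type list described above can be written down as a tiny sketch. The function names are mine and the logic is only the interpretation given in this answer; as noted further down for empty non-terminals, a bitmap of just RRSIG and NSEC is a hint, not proof, that a name was synthesized.

```python
DNSSEC_ONLY_TYPES = {"RRSIG", "NSEC"}

def type_present(nsec_types, qtype):
    """True if the NSEC type list says qtype exists at the owner name."""
    return qtype in set(nsec_types)

def name_is_effectively_absent(nsec_types):
    """True if the type list holds nothing beyond the DNSSEC bookkeeping types."""
    return set(nsec_types) <= DNSSEC_ONLY_TYPES

bitmap = ["RRSIG", "NSEC"]  # as in the cloudflare.com reply above
for qtype in ("A", "AAAA", "MX", "TXT"):
    print(qtype, "present" if type_present(bitmap, qtype) else "NODATA")
print("effectively absent:", name_is_effectively_absent(bitmap))
```

Compare with the apex of icann.org earlier, whose NSEC3 record lists A, NS, SOA, MX and more: there the same check reports a real name.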

I recommend reading the second link, but the gist of its explanation is below (I am skipping all the points regarding NSEC/NSEC3 and wildcard records, with all the details on the "closest encloser" and so on, but those are important if going deep on NSEC stuff):

NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible.

(which is why they don't use NSEC3 and keep NSEC but then still need another solution to avoid walking the zone and hence enumerating all names)

There are two problems with negative answers:

The first is that the authoritative server needs to return the previous and next name. As you’ll see, this is computationally expensive for CloudFlare, and as you’ve already seen, it can leak information about a zone.

The second is that negative answers require two NSEC records and their two subsequent signatures (or three NSEC3 records and three NSEC3 signatures) to authenticate the nonexistence of one name. This means that answers are bigger than they need to be.

So the part above is the basic explanation of why they want to avoid NXDOMAIN and "emulate" it with a success status (NOERROR) while at the same time responding negatively to any query (name + type, for any type requested).

The other point, again very specific to Cloudflare, is that it is difficult in their case to compute the "next" name (because NSEC really gives a "range" of two names, as a link between two existing things). So instead of using the real next name as it exists in their storage, they compute the minimal "next" one following the DNSSEC ordering algorithm, hence the strange name above with \000. as prefix. That name obviously doesn't exist either, so if you query for it you will again get the same kind of reply, this time with an NSEC record listing \000.\000. (not \001.) on the right, and so on.

Further down:

For an NXDOMAIN, we always return \000.(the missing name) as the next name, and because we return an NSEC directly on the missing name, we do not have to return an additional NSEC for the wildcard. This way we only have to return SOA, SOA RRSIG, NSEC and NSEC RRSIG, and we do not need to search the database or precompute dynamic answers.

The goal reached with all that is smaller replies. This is important in DNS land, because of various problems around fragmentation. In their example they go from 1096 bytes to just 357 bytes with black lies, cutting about two thirds, quite an accomplishment!
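As a quick check of those numbers, here is the arithmetic as a sketch. The record counts come from the quotes above (plus one SOA and its signature in each case); the byte sizes are the two figures from the Cloudflare example, and the dictionary and function names are just for illustration.

```python
# AUTHORITY records needed to prove one missing name under each scheme.
AUTHORITY_RECORDS = {
    "NSEC": 1 + 1 + 2 + 2,        # SOA, RRSIG(SOA), two NSEC, two RRSIG(NSEC)
    "NSEC3": 1 + 1 + 3 + 3,       # SOA, RRSIG(SOA), three NSEC3, three RRSIG(NSEC3)
    "black lies": 1 + 1 + 1 + 1,  # SOA, RRSIG(SOA), one NSEC, one RRSIG(NSEC)
}

def reduction(before: int, after: int) -> float:
    """Fraction of the response size saved."""
    return 1 - after / before

for scheme, count in AUTHORITY_RECORDS.items():
    print(f"{scheme}: {count} records")
print(f"size reduction: {reduction(1096, 357):.1%}")
```

The counts line up with the transcripts above: eight AUTHORITY records in the icann.org NSEC3 reply, and "AUTHORITY: 4" in the cloudflare.com one.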

All the above may become a "standard" in the future, for those wanting to do the same, as they wrote a document that may one day become an IETF RFC: https://datatracker.ietf.org/doc/html/draft-valsorda-dnsop-black-lies

Do note it has consequences though:

  • NXDOMAIN is an important signal: various other stuff is built on top of that, see RFC 8020 "NXDOMAIN: There Really Is Nothing Underneath" and RFC 8198 "Aggressive Use of DNSSEC-Validated Cache", so not having this signal anymore can have side effects (and it wouldn't be a good idea to change other recursive resolvers to try finding out if the authoritative side is using black lies and then consider them, that would be brittle; that point is exactly discussed in the draft above)
  • it also impacts ENT or "Empty Non Terminal", where a name has to exist in the DNS tree not because it has any type attached to it, but just because there are names below it; see https://www.ietf.org/archive/id/draft-huque-dnsop-blacklies-ent-01.html for more details on that topic
  • no implementation is free of bugs, DNSSEC is complicated, and tricks around DNSSEC are even more complicated; I am not sure anymore and I can't find references, but I think there was a bug in the beginning where the returned types (in the NSEC bitmap) were not computed correctly, hence breaking some stuff. I will try to update this if I find again what I think I have seen, but I could be delusional (easy to be with DNSSEC...); in fact I think it is related to the observation that all their initial examples put far more types in the NSEC bitmap, where now they put only RRSIG and NSEC. See https://indico.dns-oarc.net/event/40/contributions/899/attachments/862/1563/nsec-bitmaps.pdf for live examples of errors in NSEC bitmaps and their consequences

Ah no, in fact I remembered right: a bug in this NSEC bitmap is right at the source of a recent Slack outage :-), but it was not Cloudflare's fault; the problem was at AWS Route53. See https://www.potaroo.net/ispcol/2021-12/oarc36.pdf for those details, but in short:

Now you can lie with NSEC records, [..] But what a server should never do is return an empty bit-vector in the NSEC record. Because some resolvers, including Google’s Public DNS service interpret an empty NSEC bit-vector as claiming that there are no resource records at all for that domain name. This is not a Google DNS bug. It's a perfectly legitimate interpretation of the DNSSEC specification. The problem that Slack encountered was that the Route 53 server was returning a NSEC response with an almost empty RR-type bit-vector when the wildcard entry was used to form the response and the query type was not defined for the wildcard resource. This was a bug in the Route 53 implementation.

So, in short, lying does have bad consequences sometimes :-) (and/or: DNSSEC is complicated, and wildcards in the DNS do create all sorts of complications too; in fact DNSSEC + wildcards + CNAME records are like 3 sure signs of apocalypse somehow...).
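The failure mode in that quote can be sketched as a toy model of a resolver that reuses cached NSEC records (in the spirit of RFC 8198). This is a deliberate simplification with made-up names (nsec_cache, app.example.), not the logic of any particular resolver or of Route 53.

```python
nsec_cache = {}  # owner name -> set of types listed in its cached NSEC

def remember(owner, types):
    """Store the type list from a validated NSEC record."""
    nsec_cache[owner] = set(types)

def answer_from_cache(qname, qtype):
    """Decide whether a cached NSEC is enough to answer without asking upstream."""
    types = nsec_cache.get(qname)
    if types is None or qtype in types:
        # Nothing cached, or the type exists: the data must be fetched.
        return "ask authoritative"
    # The cached NSEC says this type is absent: synthesize NODATA.
    return "NODATA"

# Hypothetical scenario: the name really has an A record (say, via a wildcard),
# but a buggy server signed an NSEC listing only RRSIG and NSEC.
remember("app.example.", ["RRSIG", "NSEC"])
print(answer_from_cache("app.example.", "A"))
```

Once such a record is cached, later A queries are denied from cache until it expires, even though the authoritative server would have answered them.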

This is only ONE way to do things: the near absence of NXDOMAIN responses is absolutely not a consequence of the protocol (DNSSEC) but just of their implementation. So don't take it for granted at all; it will be different with other providers. But does it really change anything for you as owner of the zone, or for its users? Not so much. Why were you so worried about NXDOMAIN responses :-) ?

briantist
Patrick Mevzek
  • Hi Patrick. Thank you so much for the detailed response. Okay, this is strange. Now I am even more confused, as I do receive NXDOMAIN responses for ICANN's DNSSEC-enabled domain, but I cannot trigger NXDOMAIN for my own domains -- only NOERROR (querying with 1.0.0.1 in both cases). I would think that this has to be related to my provider's implementation of DNSSEC (Cloudflare). Are you able to generate NXDOMAIN responses for cloudflare.com? Both cloudflare.com and my own domains have the same DNSSEC configuration (except for unique data like the keys and hash, of course). – Collin Jul 14 '22 at 20:31
  • 2
    "I would think that this has to be related to my provider's implementation of DNSSEC (Cloudflare). " Yes, it is. I have added a full section for that now in my answer, plus a TL;DR. Please make sure to really remember that what you see is SPECIFIC to one provider, it is not a core property of DNSSEC, where `NXDOMAIN` exists and is totally "fine". – Patrick Mevzek Jul 14 '22 at 22:05
  • 1
    Thank you so much, Patrick. I really appreciate it! – Collin Jul 15 '22 at 03:04
  • Where's the NXT record? – joshudson Jul 17 '22 at 15:28
  • @joshudson https://www.rfc-editor.org/rfc/rfc3755.html §4.1 "Type 30 (NXT) should be marked as Obsolete." That's its fate, mostly because when used: "This results in unsecure delegations being invisible to 2535-aware resolvers and violates the basic architectural principle that DNSSEC must do no harm -- the signing of zones must not prevent the resolution of unsecured delegations. " – Patrick Mevzek Jul 18 '22 at 03:29
5

In Cloudflare's particular case, their authoritative servers still generate NXDOMAIN responses for queries that do not set the DO bit (do not request DNSSEC records).

There may or may not be other cases where they return NXDOMAIN as well.

$ dig +norecurse nxdomain.cloudflare.com @ns6.cloudflare.com

; <<>> DiG 9.19.2-1+ubuntu20.04.1+isc+1-Ubuntu <<>> +norecurse nxdomain.cloudflare.com @ns6.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 49384
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 1, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 1232
;; QUESTION SECTION:
;nxdomain.cloudflare.com.   IN  A

;; AUTHORITY SECTION:
cloudflare.com.     300 IN  SOA ns3.cloudflare.com. dns.cloudflare.com. 2282614227 10000 2400 604800 300

;; Query time: 3 msec
;; SERVER: 2400:cb00:2049:1::a29f:506#53(ns6.cloudflare.com) (UDP)
;; WHEN: Fri Jul 15 04:33:54 UTC 2022
;; MSG SIZE  rcvd: 96

$ dig +dnssec +norecurse nxdomain.cloudflare.com @ns6.cloudflare.com

; <<>> DiG 9.19.2-1+ubuntu20.04.1+isc+1-Ubuntu <<>> +dnssec +norecurse nxdomain.cloudflare.com @ns6.cloudflare.com
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42126
;; flags: qr aa; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;nxdomain.cloudflare.com.   IN  A

;; AUTHORITY SECTION:
cloudflare.com.     300 IN  SOA ns3.cloudflare.com. dns.cloudflare.com. 2282614227 10000 2400 604800 300
nxdomain.cloudflare.com. 300    IN  NSEC    \000.nxdomain.cloudflare.com. RRSIG NSEC
cloudflare.com.     300 IN  RRSIG   SOA 13 2 300 20220716053359 20220714033359 34505 cloudflare.com. VI/f0QNfsim677htUOQ4yZxFK41C2jzhXsF+T5/oFiQtwPIm3m3gLr3Y WB8NUpsje9v+ARFcQUPqM6SKGJ7CJQ==
nxdomain.cloudflare.com. 300    IN  RRSIG   NSEC 13 3 300 20220716053359 20220714033359 34505 cloudflare.com. 9BuyhrKElvzvzv5w4eOJRikcX3eFUOr9z6IYOgWjwez2tfVaR+P8x9kN uSYporOFom3KxhS3krcq9zbDO7kxdw==

;; Query time: 3 msec
;; SERVER: 2400:cb00:2049:1::a29f:506#53(ns6.cloudflare.com) (UDP)
;; WHEN: Fri Jul 15 04:33:59 UTC 2022
;; MSG SIZE  rcvd: 363

(Setting DO is no guarantee that a resolver will validate the response. For example, a non-validating resolver may request DNSSEC records so that it can pass them through to validating clients, or a deployment in testing may log errors but accept bogus responses.)
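For reference, the DO bit that separates the two transcripts above is a single flag in the EDNS OPT pseudo-record that dig appends to the query. A minimal sketch of that record's wire format follows; the function name is mine, and the 1232 byte default simply mirrors the UDP size shown in the replies above.

```python
import struct

DO_FLAG = 0x8000  # "DNSSEC OK": top bit of the 16-bit EDNS flags field

def opt_record(udp_size: int = 1232, do: bool = False) -> bytes:
    """Build a bare EDNS0 OPT pseudo-record with no options."""
    flags = DO_FLAG if do else 0
    # The 32-bit TTL field carries: extended RCODE (8 bits), version (8), flags (16).
    ttl = (0 << 24) | (0 << 16) | flags
    # NAME = root (one zero byte), TYPE = 41 (OPT), CLASS = UDP payload size,
    # TTL as above, RDLENGTH = 0.
    return struct.pack("!BHHIH", 0, 41, udp_size, ttl, 0)

print(opt_record(do=False).hex())
print(opt_record(do=True).hex())
```

The two outputs differ in a single byte, and that byte is what makes Cloudflare's servers switch between the NXDOMAIN and the NOERROR plus NSEC reply in the transcripts above.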

Matt Nordhoff
  • Ah, so this is what triggers it! Or at least it's one thing that does. Thank you! Was unreasonably curious about this. – Collin Jul 16 '22 at 02:18