TL;DR
The lack of NXDOMAIN responses for Cloudflare-hosted domains is a consequence of their specific DNSSEC implementation (using so-called "black lies") and not of the design of the DNSSEC protocol itself; hence observations will be different with other providers doing DNSSEC.
Initial questions
How are NXDOMAIN responses still possible?
Why wouldn't they be possible? DNSSEC or not, if you query for a name that doesn't exist, you get an NXDOMAIN reply back.
my understanding is that DNSSEC should, at least in theory, eliminate this response code entirely
Why? And from where do you get that feeling?
Live example with a DNSSEC enabled domain
icann.org is DNSSEC enabled right now. If I query for a name that does not exist under it, I get an NXDOMAIN:
$ dig NS icann.org +short
b.icann-servers.net.
c.icann-servers.net.
ns.icann.org.
a.icann-servers.net.
$ dig @a.icann-servers.net does-not-exist-foobar.icann.org
; <<>> DiG 9.18.4 <<>> @a.icann-servers.net does-not-exist-foobar.icann.org
; (1 server found)
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 38891
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4096
; COOKIE: 98228e9e0c5ef4e6
;; QUESTION SECTION:
;does-not-exist-foobar.icann.org. IN A
;; QUERY SIZE: 72
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NXDOMAIN, id: 38891
^^^^^^^^
DNSSEC is an extension of DNS in the sense that, for a non-validating resolver, answers are no different even if the domain is DNSSEC enabled. So all return codes work in the same way.
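To make the "status" that dig prints less magic: it is just the low 4 bits of the flags word in the fixed 12-byte DNS header (RFC 1035), and DNSSEC does not touch it. Here is a minimal sketch that decodes such a header; the function name `parse_header` and the sample header are mine, built by hand for illustration, not captured from the wire.

```python
import struct

# RCODE values from RFC 1035 (only the ones discussed here).
RCODES = {0: "NOERROR", 2: "SERVFAIL", 3: "NXDOMAIN"}

def parse_header(message: bytes) -> dict:
    """Decode the fixed 12-byte DNS header at the start of a message."""
    ident, flags, qd, an, ns, ar = struct.unpack("!6H", message[:12])
    rcode = flags & 0x000F  # the response code is the low 4 bits
    return {
        "id": ident,
        "qr": bool(flags & 0x8000),  # this is a response
        "aa": bool(flags & 0x0400),  # authoritative answer
        "rd": bool(flags & 0x0100),  # recursion desired
        "ad": bool(flags & 0x0020),  # authentic data
        "status": RCODES.get(rcode, f"RCODE{rcode}"),
        "counts": (qd, an, ns, ar),
    }

# Hand-built header: id 38891, flags "qr aa rd", rcode 3.
# Flags and section counts are made up for the example.
sample = struct.pack("!6H", 38891, 0x8000 | 0x0400 | 0x0100 | 3, 1, 0, 8, 1)
print(parse_header(sample)["status"])
```

The same header layout is used whether or not the zone is signed, which is why a non-validating client sees the same NXDOMAIN either way.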
Explanations about NSEC/NSEC3/RRSIG
What it does change, and what you can see by adding +dnssec to dig (which doesn't mean "activate DNSSEC" but "display DNSSEC-related records", namely RRSIG, NSEC and NSEC3, as they are normally not displayed; technically it sets the DO bit in the query to ask the server for them), is that the AUTHORITY section of the NXDOMAIN response gives further explanations with NSEC or NSEC3 records:
;; AUTHORITY SECTION:
icann.org. 1h IN SOA sns.dns.icann.org. noc.dns.icann.org. (
2022070670 ; serial
10800 ; refresh (3 hours)
3600 ; retry (1 hour)
1209600 ; expire (2 weeks)
3600 ; minimum (1 hour)
)
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
J9HKD4G746GMUTGGUV6AM37GSJAD6NRR
A NS SOA MX TXT AAAA RRSIG DNSKEY NSEC3PARAM )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
TE4FB4PVMU1GQNPG9P01ID48U1BTN2G4
A RRSIG )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN NSEC3 1 0 5 9714B5ACB8F7A193 (
LT4I8S7OTQ7ACOSF73M7LHCIC7C1J17I
A RRSIG )
icann.org. 1h IN RRSIG SOA 7 2 3600 (
20220804192816 20220714153322 3425 icann.org.
NMcD1TeozFyCRDlmqFMoM/V/VmWQUmRNIH0/igPzdj2S
hemnQHeXDOudBxsUgE/DpSV4KHsgqLQKdgbQruqCO7Dt
iLK1bCLBZs38LdOadyJs3jWjjuJ9+mEnLXTsqMeeMllw
YFL6pPyo1TfChZm05KJ+DJNw0SHJw3MWBRtV4iI= )
j93jujiqg7ge3616mub4r5bei85poet9.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
20220724054620 20220703065347 58935 icann.org.
gmo0VP8k9Li9lutMA3uTrMfABMmFBN23GonYo72Twk9l
wGYqFvlU/naN0KKtEd3g+zOiYB0Jb1J1270Dveew/vYa
hTmeMYrwUbEt9gZYCvi74zm6Ss0cQ8uxJ5bZw70nZ7oU
LAtWYVGJMgupfjtne6021AJoLNB1CaMhFwo+TPo= )
tdr1at6eafsrigdrlj6atpb2dge2aof0.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
20220724101659 20220703045347 58935 icann.org.
hGsUeE4di9yFuDMq8ly1YQEs1OvOFAHVctOQrs6Poixl
STqcErjC20V2CI0YApX6SbiI8AP/dqMjBm3fZh91mtDf
aSrZypfScBEO/KVdlqbW9G+y8VR65ryjTAA7TZIzqN+z
7YyTAESWb8E7T4NCtQPPwYpjl/S9krbEGSiKfaw= )
lsrp57e1pe333jadkpdgh3v1i8vs80rd.icann.org. 1h IN RRSIG NSEC3 7 3 3600 (
20220724151521 20220703105347 58935 icann.org.
P9qwkFoGkCd+m3aDQkzF/g7SJfn/byt6d4zugLzRKuH1
rLmYZdlJNOC+fI1saCZySarsP9KavFSBzw6S9GMLobQJ
hTVpu1ZUkEP9BMOZo28eeRLrGvAbrVb7aB9CWl9TgUMc
2+s4nG87HTvD2TCJHmyPC1mIbBLYmJoa7iGLGiI= )
NSEC3 is more complicated (less human friendly) as it uses hashes of domain names. But what all the above means, in summary, is that the name I requested does not exist because its hash lands between the hashes of two names that exist (which can't be seen immediately, because they are hashed), and that no wildcard exists either. This is why you have three NSEC3 records: one matches the closest existing ancestor (here the zone apex, recognizable by its SOA and DNSKEY types), one covers the requested name, and one covers the wildcard that could have matched it. The RRSIG records sign the NSEC3 ones, so all the above allows a resolving nameserver to double check that the NXDOMAIN is legit and not introduced by some on-path attacker, because all the NSEC3 and RRSIG records match the expectations.
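For the curious, the hashed owner names can be recomputed. Below is a minimal sketch of the NSEC3 hash as I read RFC 5155: SHA-1 over the lowercased wire-format name plus the salt, re-hashed "iterations" more times, then base32hex encoded. The helper names `to_wire` and `nsec3_hash` are mine, and the parameters (salt 9714B5ACB8F7A193, 5 iterations) are read from the "NSEC3 1 0 5 9714B5ACB8F7A193" fields in the output above.

```python
import base64
import hashlib

# Translation table from the standard base32 alphabet to base32hex (RFC 4648).
_B32_TO_B32HEX = bytes.maketrans(
    b"ABCDEFGHIJKLMNOPQRSTUVWXYZ234567",
    b"0123456789ABCDEFGHIJKLMNOPQRSTUV",
)

def to_wire(name: str) -> bytes:
    """Lowercased wire format: length-prefixed labels plus the root label."""
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"

def nsec3_hash(name: str, salt_hex: str, iterations: int) -> str:
    """SHA-1 over name||salt, followed by `iterations` extra rounds."""
    salt = bytes.fromhex(salt_hex)
    digest = hashlib.sha1(to_wire(name) + salt).digest()
    for _ in range(iterations):
        digest = hashlib.sha1(digest + salt).digest()
    # 20 bytes of SHA-1 encode to exactly 32 base32 characters, no padding.
    return base64.b32encode(digest).translate(_B32_TO_B32HEX).decode("ascii").lower()

print(nsec3_hash("icann.org", "9714B5ACB8F7A193", 5))
print(nsec3_hash("does-not-exist-foobar.icann.org", "9714B5ACB8F7A193", 5))
```

Comparing the printed hashes with the NSEC3 owner names and "next hashed owner" fields above is a good sanity check, assuming the zone's NSEC3 parameters have not changed since that output was captured.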
Simpler example with NSEC case
Let us take a DNSSEC-enabled zone that uses NSEC instead of NSEC3: the root itself :-)
If I do dig @g.root-servers.net foobar. +dnssec right now I get NXDOMAIN, again for the same reasons as above, as that TLD does not exist (yet?).
But let us look at the results, and especially at one NSEC record:
foo. 1d IN NSEC food. NS DS RRSIG NSEC
This is an affirmative, signed (there is a corresponding RRSIG record) assertion from the nameserver telling me that foobar does not exist in the zone, because both foo and food exist, but nothing in between. Per DNSSEC ordering rules, foobar would sort between foo and food, hence the above proves that foobar does not exist. Incidentally, it proves that a lot of other names do not exist either, and a resolver could cache this NSEC and derive answers without querying anything.
Why? Because if I know that nothing exists between foo and food, I immediately know that fooa doesn't exist, nor fooa42 or foobie or fooccc or similar…
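The range check above can be sketched in a few lines. This is a simplified take on the canonical ordering of RFC 4034 section 6.1 (labels compared right to left, as lowercased byte strings); the function names `canonical_key` and `nsec_covers` are mine, and escaped bytes in names are deliberately not handled.

```python
def canonical_key(name: str) -> tuple:
    """Sort key for canonical DNS name ordering.

    Labels are compared right to left (most significant first), as
    lowercased byte strings; a name sorts before its own children.
    Simplification: escaped bytes in labels are not parsed here.
    """
    labels = [label for label in name.lower().rstrip(".").split(".") if label]
    return tuple(label.encode("ascii") for label in reversed(labels))

def nsec_covers(owner: str, next_name: str, qname: str) -> bool:
    """True if qname falls strictly between owner and next_name."""
    o, n, q = (canonical_key(x) for x in (owner, next_name, qname))
    if o < n:
        return o < q < n
    # Last NSEC of the zone: its "next" field wraps around to the apex.
    return q > o

for candidate in ("foobar", "fooa", "fooa42", "foobie", "fooccc", "fooz", "food"):
    print(candidate, nsec_covers("foo.", "food.", candidate + "."))
```

The first five candidates come out as covered (so provably nonexistent from that single NSEC record), while fooz sorts after food and food itself exists, so neither is covered by this record.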
Back to CloudFlare specific case
CloudFlare implements "DNSSEC White Lies" AND "Black Lies" (see https://www.cloudflare.com/dns/dnssec/dnssec-complexities-and-considerations/ and https://blog.cloudflare.com/black-lies/) for various reasons of their own (in part because they do dynamic signature generation: they generate the RRSIG records at the moment the request comes in, and not in advance; this is a compromise, both approaches have advantages and drawbacks).
What does that mean? They fake the existence of ALL names, hence there is almost never an NXDOMAIN.
Let us see one example:
$ dig dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec
; <<>> DiG 9.18.4 <<>> dwewgewfgewfee-32cewcewcew-2284.cloudflare.com @ns3.cloudflare.com. +dnssec
;; global options: +cmd
;; Sending:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: rd ad; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 4096
; COOKIE: fd8d36048320c848
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. IN A
;; QUERY SIZE: 87
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 9469
;; flags: qr aa rd; QUERY: 1, ANSWER: 0, AUTHORITY: 4, ADDITIONAL: 1
;; WARNING: recursion requested but not available
;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags: do; udp: 1232
;; QUESTION SECTION:
;dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. IN A
;; AUTHORITY SECTION:
cloudflare.com. 5m IN SOA ns3.cloudflare.com. dns.cloudflare.com. (
2282614227 ; serial
10000 ; refresh (2 hours 46 minutes 40 seconds)
2400 ; retry (40 minutes)
604800 ; expire (1 week)
300 ; minimum (5 minutes)
)
dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. 5m IN NSEC \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. RRSIG NSEC
(I removed the RRSIG records.)
So what does that tell us? First: NOERROR instead of NXDOMAIN, so the nameserver tells me the name I queried for exists, but maybe not for the type I asked (A, which is dig's default type). This is valid and known as NODATA: NOERROR but no content either, that is, an empty ANSWER section, as happens when the name exists but not with that type.
The AUTHORITY part, and specifically that NSEC record, tells me that there are no names between dwewgewfgewfee-32cewcewcew-2284.cloudflare.com. (the very name I asked for, not some previous existing one) and \000.dwewgewfgewfee-32cewcewcew-2284.cloudflare.com., which may look like a strange name but 1) is totally valid (it is not a valid hostname, because \000 stands for byte value 0, which has to be written as \000 in textual form, but it is still a valid domain name, as labels in the DNS specifications can contain arbitrary bytes) and 2) is, under the DNSSEC ordering algorithm, the name "right after" my name (so the range between the two names cannot include any other name).
The RRSIG NSEC part at the end of the NSEC record means that there is no record of type A on the name, but there are records of types RRSIG and NSEC, which makes sense because I am looking at exactly the NSEC record of that name and, as we are in DNSSEC land, of course there is an RRSIG.
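On the wire, that list of types is not text but a bitmap. Here is a minimal sketch of the "window block" encoding as I read RFC 4034 section 4.1.2; the function names are mine and the `TYPES` table only lists a handful of type numbers from the IANA registry.

```python
# A few RR type numbers from the IANA DNS parameters registry.
TYPES = {"A": 1, "NS": 2, "SOA": 6, "MX": 15, "TXT": 16, "AAAA": 28,
         "RRSIG": 46, "NSEC": 47, "DNSKEY": 48}

def encode_type_bitmap(type_names) -> bytes:
    """Encode RR types as window blocks: window number, length, bitmap."""
    windows = {}
    for number in sorted(TYPES[t] for t in type_names):
        window, offset = divmod(number, 256)
        bitmap = windows.setdefault(window, bytearray(32))
        bitmap[offset // 8] |= 0x80 >> (offset % 8)  # most significant bit first
    out = bytearray()
    for window in sorted(windows):
        bitmap = bytes(windows[window]).rstrip(b"\x00")  # drop trailing zero octets
        out += bytes([window, len(bitmap)]) + bitmap
    return bytes(out)

def decode_type_bitmap(data: bytes) -> set:
    """Return the set of RR type numbers present in an encoded bitmap."""
    types, pos = set(), 0
    while pos < len(data):
        window, length = data[pos], data[pos + 1]
        for i, octet in enumerate(data[pos + 2 : pos + 2 + length]):
            for bit in range(8):
                if octet & (0x80 >> bit):
                    types.add(window * 256 + i * 8 + bit)
        pos += 2 + length
    return types

black_lie = encode_type_bitmap(["RRSIG", "NSEC"])
print(black_lie.hex())
print(TYPES["A"] in decode_type_bitmap(black_lie))
```

A validator answering "does type A exist at this name?" only has to test one bit, which is why a bitmap listing just RRSIG and NSEC works as a signed "nothing useful here".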
So this is called a "lie" because the nameserver is replying to you: this name exists, but not this record type. And no matter which record type you ask for (except NSEC and RRSIG) the nameserver will tell you: "this name does not exist for this record type".
In the end, if it does not exist for any record type (besides NSEC and RRSIG), it is really as if the name does not exist at all; it is just presented in a different way, for reasons briefly detailed below.
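The reasoning above can be turned into a small heuristic. This is only a sketch of how one could label a negative reply by hand, not something resolvers are supposed to do; `classify_negative_answer` and its parameters are hypothetical names of mine, and as far as I understand it cannot tell a black lie apart from a name that genuinely exists without any data of its own.

```python
def classify_negative_answer(status, answer_count, qname,
                             nsec_owner=None, nsec_types=()):
    """Label a reply as NXDOMAIN, NODATA, or NODATA that looks like a black lie.

    Heuristic: a black lie is a NOERROR reply with an empty answer section
    whose NSEC record sits exactly on the queried name and lists only
    RRSIG and NSEC in its type bitmap.
    """
    if status == "NXDOMAIN":
        return "NXDOMAIN"
    if status == "NOERROR" and answer_count == 0:
        same_name = (nsec_owner is not None
                     and nsec_owner.lower().rstrip(".") == qname.lower().rstrip("."))
        if same_name and set(nsec_types) <= {"RRSIG", "NSEC"}:
            return "NODATA (looks like a black lie)"
        return "NODATA"
    return "OTHER"

# Values taken from the Cloudflare reply shown above.
print(classify_negative_answer(
    "NOERROR", 0,
    "dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.",
    nsec_owner="dwewgewfgewfee-32cewcewcew-2284.cloudflare.com.",
    nsec_types=("RRSIG", "NSEC"),
))
```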
I recommend reading the second link, but the gist of it is below (I am skipping all the points regarding NSEC/NSEC3 and wildcard records, with all the details on "closest encloser" and so on, but those are important if going deep on NSEC stuff):
NSEC3 was a “close but no cigar” solution to the problem. While it’s true that it made zone walking harder, it did not make it impossible.
(which is why they don't use NSEC3 and keep NSEC, but then still need another solution to avoid zone walking and hence the enumeration of all names)
There are two problems with negative answers:
The first is that the authoritative server needs to return the previous and next name. As you’ll see, this is computationally expensive for CloudFlare, and as you’ve already seen, it can leak information about a zone.
The second is that negative answers require two NSEC records and their two subsequent signatures (or three NSEC3 records and three NSEC3 signatures) to authenticate the nonexistence of one name. This means that answers are bigger than they need to be.
So that part above is the basic explanation of why they want to avoid NXDOMAIN, "emulating" it with a success code (NOERROR) while at the same time responding negatively to any query (name + type, for any type requested).
The other point, again very specific to CloudFlare, is that it is difficult in their case to compute the "next" name (because NSEC really gives a "range" of two names, as a link between two existing things), so instead of using the real next name as it exists in their storage, they compute the minimal "next" one following the DNSSEC ordering algorithm, hence the strange name above with \000. as prefix. That name obviously doesn't exist either, so if you query for it you will again get the same kind of reply, but this time with an NSEC record listing on the right a name starting with \001., or rather \000.\000. in fact, and so on...
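Computing that minimal "next" name needs no database lookup at all, which is the whole point. A minimal sketch, working on raw labels (lists of bytes) rather than text; the function names are mine and the escaping in `to_text` is a simplified version of the usual textual form.

```python
def black_lie_next(labels: list) -> list:
    """Smallest name sorting after `labels`: prepend a one-byte zero label."""
    return [b"\x00"] + labels

def canonical_key(labels: list) -> tuple:
    """Canonical ordering key: lowercased labels, most significant first."""
    return tuple(label.lower() for label in reversed(labels))

def to_text(labels: list) -> str:
    """Textual form, writing bytes that are not plain printable ASCII as 3-digit escapes."""
    def escape(label: bytes) -> str:
        return "".join(
            chr(b) if 0x21 <= b <= 0x7E and b not in b'.\\"' else f"\\{b:03d}"
            for b in label
        )
    return ".".join(escape(label) for label in labels) + "."

name = [b"dwewgewfgewfee-32cewcewcew-2284", b"cloudflare", b"com"]
nxt = black_lie_next(name)
print(to_text(nxt))                   # the NSEC "next" field seen above
print(to_text(black_lie_next(nxt)))   # what a query for that name would get back

# Nothing fits between the name and its successor: any other child label
# (here "a") and any later sibling sort after the successor.
child = [b"a"] + name
sibling = [b"dwewgewfgewfee-32cewcewcew-2285", b"cloudflare", b"com"]
print(canonical_key(name) < canonical_key(nxt) < canonical_key(child) < canonical_key(sibling))
```

Since a label must contain at least one byte, a single zero byte is the smallest possible child label, so the range claimed by the NSEC record is as narrow as it can be.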
Further down:
For an NXDOMAIN, we always return \000.(the missing name) as the next name, and because we return an NSEC directly on the missing name, we do not have to return an additional NSEC for the wildcard. This way we only have to return SOA, SOA RRSIG, NSEC and NSEC RRSIG, and we do not need to search the database or precompute dynamic answers.
The goal reached with all that is smaller replies. And this is important in DNS land, because of various problems around fragmentation. In their example they go from 1096 bytes to just 357 bytes with black lies, cutting about 2/3, quite an accomplishment!
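As a quick check of that figure, using only the two sizes quoted from their post:

```python
before, after = 1096, 357  # reply sizes in bytes, as quoted from the Cloudflare post

saved = before - after
ratio = saved / before
print(f"saved {saved} bytes, i.e. {ratio:.1%} smaller")
```

So the saving is a bit more than two thirds of the original reply.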
All the above may become a "standard" in the future, for those wanting to do the same, as they wrote a document that may one day become an IETF RFC: https://datatracker.ietf.org/doc/html/draft-valsorda-dnsop-black-lies
Do note it has consequences though:
- NXDOMAIN is an important signal: various other things are built on top of it, see RFC 8020 "NXDOMAIN: There Really Is Nothing Underneath" and RFC 8198 "Aggressive Use of DNSSEC-Validated Cache", so not having this signal anymore can have side effects (and it wouldn't be a good idea to change recursive resolvers so that they try to find out whether the authoritative side is using black lies and then act on it, that would be brittle; that point is exactly discussed in the draft above)
- it also impacts ENT or "Empty Non Terminal", where a name has to exist in the DNS tree not because it has any type attached to it, but just because there are names below it; see https://www.ietf.org/archive/id/draft-huque-dnsop-blacklies-ent-01.html for more details on that topic
- no implementation is free of bugs, DNSSEC is complicated, and tricks around DNSSEC are even more complicated; I am not sure anymore and I can't find references, but I think there was a bug in the beginning where the returned types (in the NSEC bitmap) were not computed correctly, hence breaking some stuff. I will try to update this if I find again what I think I have seen, but I could be delusional (easy to be with DNSSEC...); in fact I think it is related to the observation that all their initial examples put far more types in the last part of the NSEC record, where now they put only RRSIG and NSEC. See https://indico.dns-oarc.net/event/40/contributions/899/attachments/862/1563/nsec-bitmaps.pdf for live examples of errors in NSEC bitmaps and their consequences
Ah no, in fact I remembered right: a bug in this NSEC bitmap is at the very source of a recent Slack outage :-), but it was not Cloudflare's fault, the problem was in AWS Route53. See https://www.potaroo.net/ispcol/2021-12/oarc36.pdf for the details, but in short:
Now you can lie with NSEC records, [..] But what a server should never do is return an empty bit-vector in the NSEC record. Because some resolvers, including Google’s Public DNS service interpret an empty NSEC bit-vector as claiming that there are no resource records at all for that domain name. This is not a Google DNS bug. It's a perfectly legitimate interpretation of the DNSSEC specification. The problem that Slack encountered was that the Route 53 server was returning a NSEC response with an almost empty RR-type bit-vector when the wildcard entry was used to form the response and the query type was not defined for the wildcard resource. This was a bug in the Route 53 implementation.
So, in short, lying does have bad consequences sometimes :-)
(and/or: DNSSEC is complicated, and wildcards in the DNS do create all sorts of complications too; in fact DNSSEC + wildcards + CNAME records are like 3 sure signs of apocalypse somehow...).
This is only ONE way to do things; the (almost complete) absence of NXDOMAIN responses is absolutely not a consequence of the protocol (DNSSEC) but just of their implementation. So don't take this for granted at all, it will be different with other providers. But does it really change anything for you as owner of the zone, or for its users? Not so much. Why were you so worried about NXDOMAIN responses :-) ?
PS: