
I recently had a problem where a remote service that was looking up the IP address for my server (the DNS is with a hosted DNS provider) failed with:

DNS problem: SERVFAIL looking up A for mysql.xavamedia.nl

(Update: the remote service mentioned here is Let's Encrypt; I filed a bug against their issue tracker, which led me on this path.)

In testing on my local network, I was able to see that I sometimes get an empty DNS response from the hosted DNS server. Apparently this is intermittent because it happens only when the DNS records are not in the cache, and it's only a problem when the DNS server is really busy.

Here's a Wireshark description of an empty response message:

[Image: Wireshark screenshot of empty response]

Of course, since most DNS queries and responses are sent over UDP, a local resolver will just wait a while for the response, and then give up. What I am now left wondering is, are there guidelines for DNS response times? My DNS hoster sort of shrugged and said that my local resolver sent the empty response too soon. I've never had this problem before, but I'm surprised at the failure mode -- an empty DNS response without an error code.

Does someone know of some guidelines on how this is supposed to work, and when/how I can prove my DNS hosting is doing something wrong?

djc
  • Can you please update the question to provide more information about the empty response? That can mean a number of things depending on the flags set and what the authority section looks like. We'd either need to see the output of `dig`/`nslookup` or a Wireshark dissection. (`tcpdump` won't be good enough.) If you're using `nslookup`, execute `set debug` first. – Andrew B Feb 12 '16 at 19:39
  • I have a pcap, but not sure how I can best show it here? – djc Feb 12 '16 at 21:43
  • Open it in Wireshark, click on the packet, then expand the information for the DNS protocol. Expand the subcategories as well, then post a screenshot in your question using the insert image button. You can crop the screenshot to the DNS protocol stuff. – Andrew B Feb 12 '16 at 21:45

2 Answers

6

The empty response that you're looking at is a synthetic state known as NODATA. NODATA and NXDOMAIN both indicate that the requested record does not exist, but NXDOMAIN means the name itself does not exist, which also rules out any names beneath it. NODATA advises that either the name is associated with records of a type you did not request, or that there are other records beneath the name you're requesting (e.g. example.test.xavamedia.nl.).

Your takeaway from NODATA and NXDOMAIN is effectively the same in this context: the record of the requested name and type did not exist. An authoritative nameserver was reached for the requested domain, and it replied back stating that a record of that name and type did not exist. This is not a communication error. The authoritative server said that it didn't have the data. More than likely the server you were talking to had already processed this request and negative cached the absence of that record within the last four hours. (14400 seconds is the negative cache interval defined by the SOA record for xavamedia.nl.)
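To make the distinction concrete, here is a minimal sketch in Python (standard library only) that labels a raw DNS response from its 12-byte header. The function name `classify` and the label strings are my own for illustration. It assumes that a NOERROR reply with an empty answer section and the AA bit set is NODATA; a full check would also look for the SOA record in the authority section.

```python
import struct

# Only the response codes discussed in this thread.
RCODE_NAMES = {2: "SERVFAIL", 3: "NXDOMAIN"}

def classify(message: bytes) -> str:
    """Label a raw DNS response using only its 12-byte header."""
    if len(message) < 12:
        raise ValueError("truncated DNS header")
    _txid, flags, _qdcount, ancount, _nscount, _arcount = struct.unpack(
        "!HHHHHH", message[:12]
    )
    if not flags & 0x8000:  # QR bit clear: this is a query, not a response
        raise ValueError("not a DNS response")
    rcode = flags & 0x000F
    authoritative = bool(flags & 0x0400)  # AA bit
    if rcode != 0:
        return RCODE_NAMES.get(rcode, f"RCODE {rcode}")
    if ancount > 0:
        return "ANSWER"
    # NOERROR with an empty answer section. Assumption: AA set means NODATA;
    # without AA it could be a referral or a cached negative answer.
    return "NODATA" if authoritative else "EMPTY (referral or cached NODATA)"
```

Feeding it the UDP payload of the packet from the capture would show whether the empty response really carries NOERROR rather than SERVFAIL.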

Neither NXDOMAIN nor NODATA by itself will result in a timeout in this instance, but your resolver library will probably move on from here to appending the DNS search suffix, which may in turn trigger a timeout for the authoritative DNS servers of the search domain.

It should be noted that none of this explains why you encountered a SERVFAIL response when looking up mysql.xavamedia.nl. That points at a problem with the recursive server getting the answer from the authoritative servers. Either the authoritative server replied with SERVFAIL, the recursive server could not reach any of the authoritative servers, or the recursive server determined that the data returned was invalid. None of this can be proven with the information that you've provided.

Andrew B
  • Thanks for your detailed answer! Some things are still unclear: if the NODATA response is initiated by the authoritative server somehow, my DNS hosting has a problem, because these domains had been existing for a long time (by virtue of a wild card A record). So then my other question is, how might I prove whether the authoritative server did something wrong? – djc Feb 13 '16 at 08:28
  • The `NODATA` in your packet capture is the proof. The pertinent question is *"why did an authoritative server reply and say that no such record existed?"*. Unfortunately it's a hard issue to press unless you can prove it with direct lookups against the authoritative servers (removing the ability to shrug and blame the operators of the recursive servers), keeping in mind that only one of the three may be occasionally misbehaving. – Andrew B Feb 13 '16 at 08:48
  • `NODATA` means the name **does** exist, but it doesn't have a record of the type requested. E.g. you ask for `A` record, but it only has `MX` record. It could also happen if the name is for an intermediate node in the DNS hierarchy and has no records of its own. – Barmar Feb 16 '16 at 22:35
  • @Barmar Yes, what is being said here is that the authoritative server is reporting an absence of that record name+type pair, and djc is expressing confusion over this due to a wildcard record that has been present for some time. – Andrew B Feb 16 '16 at 22:36
  • My comment is addressed to your first point "NODATA and NXDOMAIN both indicate that the name does not exist". `NXDOMAIN` means the name doesn't exist, `NODATA` means the name does exist but the requested record type doesn't. – Barmar Feb 16 '16 at 22:38
  • @Barmar It's a bit of a nit because the first paragraph already explains that and puts it into context, but I've tweaked the wording to avoid the confusion of a too literal interpretation. – Andrew B Feb 16 '16 at 22:40
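The direct lookups against the authoritative servers suggested in the comments can be sketched roughly as follows, using only the Python standard library. The function names are my own, and the 5 second timeout is an assumption. The query is sent with the RD (recursion desired) bit clear, so the answer comes from the server itself and no recursive resolver is involved.

```python
import socket
import struct

def build_query(name: str, qtype: int = 1, txid: int = 0x1234) -> bytes:
    """Build a non-recursive DNS query (qtype 1 = A, class IN)."""
    # Flags 0x0000: standard query with RD clear, so no recursion is requested.
    header = struct.pack("!HHHHHH", txid, 0x0000, 1, 0, 0, 0)
    labels = name.rstrip(".").encode("ascii").split(b".")
    qname = b"".join(bytes([len(label)]) + label for label in labels) + b"\x00"
    return header + qname + struct.pack("!HH", qtype, 1)

def ask(server: str, name: str, timeout: float = 5.0) -> str:
    """Send one UDP query to `server` and summarise the reply header."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name), (server, 53))
        try:
            reply, _addr = sock.recvfrom(4096)
        except socket.timeout:
            return "timeout"
    if len(reply) < 12:
        return "short reply"
    flags, _qdcount, ancount = struct.unpack("!HHH", reply[2:8])
    return f"rcode={flags & 0xF} aa={bool(flags & 0x0400)} answers={ancount}"
```

Calling `ask(address, "mysql.xavamedia.nl")` in a loop for each of the authoritative servers' addresses (not listed in this thread, so they have to be filled in) and logging every result that is not `rcode=0` with at least one answer would give you evidence that does not depend on any recursive server.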
2

I don't know of any specific guidelines except those defined in section "6.1.3.3 Efficient Resource Usage" of RFC 1123 http://tools.ietf.org/rfcmarkup?rfc=1123#page-77

There, a timeout value of "no less than 5 seconds" is specified. The RFC also states that temporary failures should be cached. This is to prevent an excessive number of DNS requests if clients violate section 2.2 of the RFC, which states that clients should wait a "reasonable" amount of time between retries in case of soft failures.
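As a rough illustration of those two recommendations (a per-query timeout plus caching of temporary failures), here is a sketch in Python. Everything in it is hypothetical: the class and function names, the 120 second cache lifetime, the two attempts, and the `send` callable, which stands in for whatever actually sends the query and raises `TimeoutError` when no reply arrives.

```python
import time

class TemporaryFailureCache:
    """Remembers recent soft failures so they are not retried immediately."""

    def __init__(self, ttl: float = 120.0, clock=time.monotonic):
        self.ttl = ttl          # assumed lifetime of a cached failure, in seconds
        self.clock = clock      # injectable so the cache can be tested
        self._expires = {}

    def record_failure(self, name: str, qtype: str) -> None:
        self._expires[(name.lower(), qtype)] = self.clock() + self.ttl

    def is_failing(self, name: str, qtype: str) -> bool:
        key = (name.lower(), qtype)
        expires = self._expires.get(key)
        if expires is None:
            return False
        if self.clock() >= expires:
            del self._expires[key]  # entry has aged out
            return False
        return True

def resolve(send, cache, name, qtype="A", timeout=5.0, attempts=2):
    """Try `send` a few times; cache the failure if every attempt times out."""
    if cache.is_failing(name, qtype):
        raise LookupError("cached temporary failure")
    for _ in range(attempts):
        try:
            return send(name, qtype, timeout)
        except TimeoutError:
            continue
    cache.record_failure(name, qtype)
    raise LookupError("no response from server")
```

With something like this in place, a server that is too busy to answer is asked at most `attempts` times per cache lifetime for a given name and type, instead of being asked again by every client retry.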

There is also a Stack Overflow thread about this topic, but it doesn't contain much more information beyond some real-world observations. https://stackoverflow.com/questions/3036054/ideal-timeout-period-for-dns-lookup

That's all I can say about this topic. If someone else has more to add, I'd be interested as well.