I'm looking for information about a problem I resolved without understanding how.

Today, my home DNS server (BIND 9.10.3-P4-Raspbian), which serves the machines at home, suddenly stopped working.

From this server, I could run the following without any problem:

ping 8.8.8.8

and even:

telnet 8.8.8.8 53

However, a simple:

host google.fr

led to the answer:

Trying "google.fr"
;; connection timed out; no servers could be reached

I tried restarting Bind to no avail.

Manually setting an external resolver in /etc/resolv.conf made resolution work again, so it was clearly a Bind problem.

Here comes the weird part. Here is my configuration:

acl goodclients {
    192.0.0.0/24;
    localhost;
    localnets;
};

options {
        directory "/var/cache/bind";
        dnssec-enable yes;
        dnssec-lookaside auto;
        dnssec-validation auto;

        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { any; };

        recursion yes;
        allow-query { goodclients; };
};

I found a few errors in the dnssec-related logs:

validating dlv.isc.org/SOA: verify failed due to bad signature (keyid=64263): RRSIG has expired

So I tried disabling all DNSSEC-related parameters, like so:

# dnssec-enable yes;
# dnssec-lookaside auto;
# dnssec-validation auto;

Restarting Bind afterwards led to correct resolution, so I thought the problem came from these parameters. I therefore decided to uncomment them one by one to understand, restarting Bind after each change.

It kept working after each restart, even after I had uncommented everything.

So I'm now in the weird situation where everything is working as before, with exactly the same configuration. It is as if simply changing the configuration and then rolling it back solved the problem.

I'd like to understand what happened, so I'm asking the wise people around here: has anyone ever encountered such a situation?

Thanks in advance.

David Verdin
  • Are you sure it was DNSSEC-related (despite the error) and not simply that the bind service had shut down for some reason? The error you are getting on the client clearly says that no servers are available, as if your DNS server was simply not responding on port 53 at all (I would have expected a different DNS error in the case of DNSSEC errors). – olivierg Mar 25 '20 at 21:01
  • Hi @olivierg! Nope, I'm sure the bind daemon was running. I restarted it several times and each time made sure it was running (using the `systemctl status bind9` command). I also would have expected some errors. Maybe the problem mentioned by @MustardCat in the answer was the cause. – David Verdin Mar 26 '20 at 14:00
  • There is a huge Reddit thread about this. Basically, a certificate that wasn't in use expired. The functionality it provided was deprecated in 2017. Bind9 is the key failure point. – Rowan Hawkins Mar 26 '20 at 18:14

1 Answer

This is the cause: https://lists.isc.org/pipermail/bind-users/2020-March/102822.html

Set your dnssec-lookaside to "no".
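
As a minimal sketch, assuming the same `options` block shown in the question, the change amounts to replacing `dnssec-lookaside auto;` with `dnssec-lookaside no;` and leaving the other settings as they were:

```
options {
        directory "/var/cache/bind";
        dnssec-enable yes;
        dnssec-lookaside no;    // was "auto"; BIND no longer consults dlv.isc.org
        dnssec-validation auto;

        auth-nxdomain no;    # conform to RFC1035
        listen-on-v6 { any; };

        recursion yes;
        allow-query { goodclients; };
};
```

Running `named-checkconf` before restarting Bind should catch any syntax mistake introduced while editing the file.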

MustardCat
  • Also relevant regarding why `dnssec-lookaside` should be disabled whether it "works" again or not: https://www.isc.org/blogs/dlv/ (currently supported BIND versions do not have the `dnssec-lookaside auto` option) – Håkan Lindqvist Mar 25 '20 at 22:20
  • Thanks for the answer (and for the clarification, @håkan-lindqvist)! I set the parameter to "no" and will keep an eye on the daemon's behaviour in the coming days. It is perfectly possible that yesterday's error had an impact on the server, though I'm at a loss to explain why. – David Verdin Mar 26 '20 at 14:05
  • As pointed out by Rowan Hawkins, this Reddit thread explains the whole problem: https://www.reddit.com/r/sysadmin/comments/fovkv7/bind_98_dnsseclookaside_rrsig_expiration/ – David Verdin Mar 26 '20 at 21:20