We have just set up a recursive DNS server using the latest stable release of BIND 9.10.

We are finding that recursive DNS lookups are quite slow, taking anywhere from 1 to 3 seconds. Once the lookup is in cache, DNS resolves in a matter of milliseconds, as expected.

We are using root hints for the recursive lookups, and this seems to be where the slowness is coming from. If we configure a forwarder, DNS resolution comes down to a sensible recursion time of 100 to 300 ms.

For the service we are setting up, I don't want to rely on forwarders; I would prefer to use root hints.

Here is the main config from our named.conf file. Any pointers to help improve the performance would be great.

options {
    allow-recursion { any; };
    allow-query-cache { any; };
    allow-query { any; };

    listen-on port 53 { any; };
    listen-on-v6 port 53 { any; };

    dnssec-enable yes;
    dnssec-validation yes;
    dnssec-lookaside auto;

    zone-statistics yes;
    max-cache-ttl 3600;
    max-ncache-ttl 3600;

    /* Path to ISC DLV key */
    bindkeys-file "/etc/named.iscdlv.key";
    managed-keys-directory "/var/named/dynamic";

    directory "/var/named";
    dump-file "/var/named/data/cache_dump.db";
    statistics-file "/var/named/stats/named_stats.txt";
    memstatistics-file "/var/named/stats/named_mem_stats.txt";

    rate-limit {
        responses-per-second 10;
        log-only yes;
    };

    prefetch 5;
};

zone "." {
    type hint;
    file "named.ca";
};

include "/var/named/conf/logging.conf";
ausip

2 Answers

We found the issue. It was a NIC hardware offloading issue.

Running tcpdump -vvv -s 0 -l -n port 53 showed a handful of [bad udp cksum 6279!] errors for each DNS query.
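To get a rough feel for how often the checksum annotation shows up, the tcpdump output can be saved to a file and counted. The following is only a minimal sketch: the function name `count_bad_checksums` and the two sample lines are made up for illustration, and only the `[bad udp cksum ...]` annotation itself is taken from the capture above.

```python
import re

# Matches tcpdump -vvv annotations such as "[bad udp cksum 6279!]".
BAD_CKSUM = re.compile(r"\[bad udp cksum [^\]]*\]")


def count_bad_checksums(lines):
    """Return (total_lines, bad_lines) for an iterable of tcpdump output lines."""
    total = 0
    bad = 0
    for line in lines:
        total += 1
        if BAD_CKSUM.search(line):
            bad += 1
    return total, bad


if __name__ == "__main__":
    # Hypothetical, heavily shortened sample lines (not real capture output).
    sample = [
        "query-1 [bad udp cksum 6279!] A? centos.org.",
        "query-2 [udp sum ok] A? centos.org.",
    ]
    total, bad = count_bad_checksums(sample)
    print(f"{bad} of {total} lines report a bad UDP checksum")
```

With a real capture, the lines of the saved file would be passed in instead of `sample`, for example `count_bad_checksums(open("capture.txt"))`, where `capture.txt` is an assumed file name.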

A quick search on Google pointed me in the right direction. As it turns out, because our CentOS system runs as a VM on XenServer (similar issues are reported with VMware and others), NIC hardware offloading is enabled by default.

Running ethtool -k eth0 | grep on showed the following:

x-checksumming: on
tx-checksum-ipv4: on
scatter-gather: on
tx-scatter-gather: on
tcp-segmentation-offload: off
tx-tcp-segmentation: off
tx-tcp-ecn-segmentation: off
tx-tcp6-segmentation: off
udp-fragmentation-offload: off [fixed]
generic-segmentation-offload: on
generic-receive-offload: on
tx-gso-robust: on [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: off [fixed]
tx-udp_tnl-segmentation: off [fixed]
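Because `grep on` also matches every line containing "segmentation" or "fragmentation", the listing above mixes enabled and disabled features. As a rough sketch, the output of `ethtool -k` could be filtered so that only features that are `on` and not marked `[fixed]` remain, since those are the ones that can actually be switched off. The function name `changeable_enabled_features` is hypothetical.

```python
def changeable_enabled_features(ethtool_output):
    """List features reported as exactly "on" (so not "off" and not "on [fixed]")."""
    features = []
    for line in ethtool_output.splitlines():
        name, sep, state = line.partition(":")
        if not sep:
            continue  # not a "feature: state" line
        if state.strip() == "on":
            features.append(name.strip())
    return features


if __name__ == "__main__":
    # A few lines copied from the listing above.
    sample = """\
tx-checksum-ipv4: on
scatter-gather: on
tcp-segmentation-offload: off
udp-fragmentation-offload: off [fixed]
tx-gso-robust: on [fixed]
"""
    print(changeable_enabled_features(sample))
```

For the sample lines this leaves `tx-checksum-ipv4` and `scatter-gather`; the `[fixed]` entries are dropped because ethtool cannot change them.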

Running ethtool -K eth0 tx off rx off disabled TX and RX checksum offloading. I restarted the networking service for good measure

service network restart

and tested BIND. We are now getting very speedy response times:

dig centos.org

; <<>> DiG 9.10.2-P4-RedHat-9.10.2-P4.el6 <<>> centos.org
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 61933
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;centos.org.                    IN      A

;; ANSWER SECTION:
centos.org.             60      IN      A       85.12.30.227

;; Query time: 268 msec
;; SERVER: 192.168.10.25#53(192.168.10.25)
;; WHEN: Thu Sep 17 08:25:39 AEST 2015
;; MSG SIZE  rcvd: 55
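To compare uncached and cached lookups more systematically, the "Query time" line can be pulled out of the dig output. This is a sketch under assumptions: `query_time_ms` and `time_lookup` are hypothetical helper names, and `time_lookup` assumes `dig` is installed and on the PATH.

```python
import re
import subprocess

# dig prints a line such as ";; Query time: 268 msec".
QUERY_TIME = re.compile(r"^;; Query time: (\d+) msec", re.MULTILINE)


def query_time_ms(dig_output):
    """Return the query time in milliseconds from dig output, or None if absent."""
    match = QUERY_TIME.search(dig_output)
    return int(match.group(1)) if match else None


def time_lookup(name, server):
    """Run dig for `name` against `server` and return the reported query time."""
    result = subprocess.run(
        ["dig", f"@{server}", name],
        capture_output=True,
        text=True,
        check=True,
    )
    return query_time_ms(result.stdout)


if __name__ == "__main__":
    # Two lines copied from the dig output above.
    sample = ";; Query time: 268 msec\n;; SERVER: 192.168.10.25#53(192.168.10.25)\n"
    print(query_time_ms(sample))
```

Calling `time_lookup("centos.org", "192.168.10.25")` twice in a row should give one full recursion time followed by a much smaller cached time, provided the name was not already in cache.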
ausip

I had this same problem with very slow recursive queries on a physical CentOS 7 BIND server and found this answer (TX Offloading) and many IPv6-oriented fixes around various threads, none of which worked for me.

It turns out the site hosting the server in question had an older Cisco ASA firewall that was limiting UDP DNS response packets to 512 bytes. These days UDP responses to DNS queries are often much larger than that, up to around 2000 bytes: EDNS0 lets a resolver advertise a bigger UDP buffer, and DNSSEC-signed responses routinely exceed 512 bytes. There's a page about it here:

Why DNS through UDP has a 512 bytes limit?

I configured the ASA to allow larger UDP response packets (there's a specific fixup command for this), which resolved the issue:

https://supportforums.cisco.com/t5/getting-started-with-lans/dns-dropped-because-packets-to-big-for-configured-512/td-p/861718
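One way to check whether something on the path drops UDP responses larger than 512 bytes is to send a query that advertises a large EDNS0 buffer and see whether a reply comes back. The sketch below builds such a query by hand following the EDNS0 OPT record layout from RFC 6891. The helper names `build_query` and `response_size` are hypothetical, and the defaults (4096-byte buffer, DNSSEC OK bit set, DNSKEY as the probe type) are assumptions chosen to provoke a large answer.

```python
import socket
import struct

OPT_TYPE = 41      # EDNS0 OPT pseudo-record type (RFC 6891)
DO_FLAG = 0x8000   # "DNSSEC OK" bit in the OPT flags field


def build_query(name, qtype=1, udp_size=4096, dnssec_ok=True, query_id=0x1234):
    """Build a DNS query with an OPT record advertising `udp_size` bytes."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=1.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 1)
    # QNAME: length-prefixed labels terminated by a zero byte.
    labels = [label for label in name.rstrip(".").split(".") if label]
    qname = b"".join(bytes([len(l)]) + l.encode("ascii") for l in labels) + b"\x00"
    question = qname + struct.pack("!HH", qtype, 1)  # QCLASS 1 = IN
    # OPT RR: root name, TYPE, CLASS (= UDP size), ext-RCODE, version, flags, RDLEN.
    flags = DO_FLAG if dnssec_ok else 0
    opt = b"\x00" + struct.pack("!HHBBHH", OPT_TYPE, udp_size, 0, 0, flags, 0)
    return header + question + opt


def response_size(server, name, qtype=48, udp_size=4096, timeout=3.0):
    """Send one UDP query (default type 48 = DNSKEY) and return the reply size."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, qtype, udp_size), (server, 53))
        data, _ = sock.recvfrom(65535)
        return len(data)
```

If `response_size(server, ".")` times out with the default buffer but returns promptly with `udp_size=512`, that would hint at a device on the path discarding large UDP DNS replies, although other causes such as general packet loss would need to be ruled out.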