Cached response times from Google Public DNS slow

Question

I noticed strange but consistent performance behavior from Google Public DNS (8.8.8.8). Even though a DNS record was cached, the response time was in the range of 20-30 ms, which seems high. When I switched over to OpenDNS, cached response times dropped to 1 ms. Needless to say, I just switched all the servers to OpenDNS. Can anybody explain this poor performance from Google Public DNS?

Here is the output from dig for each test case:

Google Public DNS

Uncached (389ms)

➜  ~  dig @8.8.8.8 commando.io

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @8.8.8.8 commando.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 655
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;commando.io.       IN  A

;; ANSWER SECTION:
commando.io.        300 IN  A   192.241.225.51

;; Query time: 389 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Sep 22 12:08:37 2013
;; MSG SIZE  rcvd: 45

Cached (24ms)

➜  ~  dig @8.8.8.8 commando.io

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> @8.8.8.8 commando.io
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 55425
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;commando.io.           IN  A

;; ANSWER SECTION:
commando.io.        295 IN  A   192.241.225.51

;; Query time: 24 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
;; WHEN: Sun Sep 22 12:08:42 2013
;; MSG SIZE  rcvd: 45

OpenDNS

Uncached (46ms)

➜  ~  dig commando.io 

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> commando.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 49578
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;commando.io.       IN  A

;; ANSWER SECTION:
commando.io.        300 IN  A   192.241.225.51

;; Query time: 46 msec
;; SERVER: 208.67.222.222#53(208.67.222.222)
;; WHEN: Sun Sep 22 12:09:43 2013
;; MSG SIZE  rcvd: 45

Cached (1ms)

➜  ~  dig commando.io

; <<>> DiG 9.8.2rc1-RedHat-9.8.2-0.17.rc1.el6_4.6 <<>> commando.io
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 42532
;; flags: qr rd ra; QUERY: 1, ANSWER: 1, AUTHORITY: 0, ADDITIONAL: 0

;; QUESTION SECTION:
;commando.io.       IN  A

;; ANSWER SECTION:
commando.io.        273 IN  A   192.241.225.51

;; Query time: 1 msec
;; SERVER: 208.67.222.222#53(208.67.222.222)
;; WHEN: Sun Sep 22 12:10:10 2013
;; MSG SIZE  rcvd: 45

score 1 · Accepted Answer · answered Sep 22 '13 at 19:36

This probably has to do with proximity more than anything else - a response time of 1ms indicates that the OpenDNS server you're hitting is extremely close to your system from a routing perspective.

How do the DNS query times compare to the raw round trip times (ping)?
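
One way to make that comparison concrete is to subtract the ping round trip from the query time that `dig` reports. The following is a minimal sketch, not a measurement tool: the helper names `query_time_ms` and `server_overhead_ms` are hypothetical, the sample text is the trailer of the cached 8.8.8.8 query from the question, and the 20 ms RTT is an assumed placeholder (it matches what the asker reports in the comments) that should be replaced with your own ping result.

```python
import re


def query_time_ms(dig_output: str) -> int:
    """Return N from dig's ';; Query time: N msec' trailer line."""
    match = re.search(r"^;; Query time: (\d+) msec", dig_output, re.MULTILINE)
    if match is None:
        raise ValueError("no 'Query time' line found in dig output")
    return int(match.group(1))


def server_overhead_ms(dig_output: str, ping_rtt_ms: float) -> float:
    """Estimate the part of the query time not explained by network round trip."""
    return query_time_ms(dig_output) - ping_rtt_ms


# Trailer lines copied from the cached 8.8.8.8 query in the question.
GOOGLE_CACHED = """\
;; Query time: 24 msec
;; SERVER: 8.8.8.8#53(8.8.8.8)
"""

# Assumption: placeholder RTT; substitute the value your own ping reports.
ASSUMED_RTT_MS = 20.0

if __name__ == "__main__":
    print(server_overhead_ms(GOOGLE_CACHED, ASSUMED_RTT_MS))
```

If the difference is only a few milliseconds, nearly all of the cached response time is network distance rather than resolver processing.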

Shane Madden


Forgive me if I'm wrong, but if the record is cached, shouldn't the query never hit the external DNS server? You're right though, pinging `8.8.8.8` is around `20ms`, while pinging `208.67.222.222` is `1ms`. – Justin Sep 22 '13 at 19:41
By "external DNS server", are you referring to the recursive DNS server that you're querying (OpenDNS / Google), or are you referring to the authoritative DNS server for the `commando.io` domain? – Shane Madden Sep 22 '13 at 19:43
It is my understanding that when a DNS record is cached according to the TTL, a request does not need to hit the external DNS server (Google/OpenDNS). – Justin Sep 22 '13 at 19:44
@Justin That's only if the client system is caching. `dig` is not interested in using a cache if there is one (which there probably isn't on that system - is `nscd` running?) - it's specifically for sending remote queries. – Shane Madden Sep 22 '13 at 19:45
But with **dig**, if I keep querying, I see the TTL count down to `0`; then the next query jumps back to around `50ms`. After that, the response time is `1ms` again for the next 5 minutes. – Justin Sep 22 '13 at 19:47
@Justin OpenDNS and Google are caching the response from the authoritative name server. With the initial request, the recursive resolver you're using (OpenDNS/Google) has no entry in its cache, so they must send the query to the authoritative server (causing the rather slow response). For subsequent requests, the OpenDNS/Google server has the entry in ***its*** cache, so it responds to you directly without querying the authoritative server. – Shane Madden Sep 22 '13 at 19:49
I see, thanks for the explanation. Makes total sense now. The switch to **OpenDNS** is staying, better performance, probably because they have an anycast node in my DC. – Justin Sep 22 '13 at 19:51
@Justin Yup, 1ms is almost certainly same building. Glad to help! – Shane Madden Sep 22 '13 at 19:51
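
The caching behavior described in the comments above (slow first query, fast answers with a decreasing TTL, then a slow query again once the TTL runs out) can be illustrated with a toy model. This is only a sketch under stated assumptions: `RecursiveResolverCache` and `fake_authoritative` are hypothetical names, the clock is injected so the example is deterministic, and real resolvers such as Google Public DNS or OpenDNS are far more involved than a single dictionary.

```python
import time
from typing import Callable, Dict, List, Tuple


class RecursiveResolverCache:
    """Toy model of a recursive resolver's answer cache (not a real DNS server)."""

    def __init__(
        self,
        lookup_authoritative: Callable[[str], Tuple[str, int]],
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._lookup = lookup_authoritative  # returns (address, ttl_seconds)
        self._clock = clock
        self._entries: Dict[str, Tuple[str, float]] = {}  # name -> (address, expiry)

    def resolve(self, name: str) -> Tuple[str, int, bool]:
        """Return (address, remaining_ttl_seconds, served_from_cache)."""
        now = self._clock()
        cached = self._entries.get(name)
        if cached is not None and cached[1] > now:
            address, expires_at = cached
            return address, int(expires_at - now), True
        # Cache miss or expired entry: ask the authoritative server (the slow path).
        address, ttl = self._lookup(name)
        self._entries[name] = (address, now + ttl)
        return address, ttl, False


# Hypothetical stand-in for the authoritative server, using the record from the question.
upstream_queries: List[str] = []


def fake_authoritative(name: str) -> Tuple[str, int]:
    upstream_queries.append(name)
    return "192.241.225.51", 300


if __name__ == "__main__":
    fake_now = [0.0]  # manually advanced clock, in seconds
    resolver = RecursiveResolverCache(fake_authoritative, clock=lambda: fake_now[0])
    print(resolver.resolve("commando.io"))  # first query: cache miss
    fake_now[0] = 5.0
    print(resolver.resolve("commando.io"))  # 5 s later: cache hit, lower TTL
    fake_now[0] = 301.0
    print(resolver.resolve("commando.io"))  # after expiry: cache miss again
```

Note that the cache lives in the resolver, not in the client: every `resolve` call in this model still corresponds to a network round trip from `dig` to the resolver, which is why the cached query time cannot drop below the ping time.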