
We're experiencing a frustrating problem on our LAN. Periodically, DNS queries to our ISP nameservers time out, forcing a 5-second delay. Even if I bypass /etc/resolv.conf by running dig directly against one of our DNS servers, I still encounter the problem. Here's an example:

mv-m-dmouratis:~ dmourati$ time dig www.google.com @209.81.9.1 

; <<>> DiG 9.8.3-P1 <<>> www.google.com @209.81.9.1
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 14473
;; flags: qr rd ra; QUERY: 1, ANSWER: 5, AUTHORITY: 4, ADDITIONAL: 4

;; QUESTION SECTION:
;www.google.com.            IN  A

;; ANSWER SECTION:
www.google.com.     174 IN  A   74.125.239.148
www.google.com.     174 IN  A   74.125.239.147
www.google.com.     174 IN  A   74.125.239.146
www.google.com.     174 IN  A   74.125.239.144
www.google.com.     174 IN  A   74.125.239.145

;; AUTHORITY SECTION:
google.com.     34512   IN  NS  ns2.google.com.
google.com.     34512   IN  NS  ns1.google.com.
google.com.     34512   IN  NS  ns3.google.com.
google.com.     34512   IN  NS  ns4.google.com.

;; ADDITIONAL SECTION:
ns2.google.com.     212097  IN  A   216.239.34.10
ns3.google.com.     207312  IN  A   216.239.36.10
ns4.google.com.     212097  IN  A   216.239.38.10
ns1.google.com.     212096  IN  A   216.239.32.10

;; Query time: 8 msec
;; SERVER: 209.81.9.1#53(209.81.9.1)
;; WHEN: Fri Jul 26 14:44:25 2013
;; MSG SIZE  rcvd: 248


real    0m5.015s
user    0m0.004s
sys 0m0.002s

Note that dig reports a query time of 8 msec, yet the wall-clock time is just over 5 seconds. At other times, queries are answered almost instantly (under 20 ms or so). I've done a packet trace and discovered something interesting: the DNS server is responding, but the client ignores the initial response and then sends a second, identical query, which is answered immediately.

See packet trace. Note the identical source ports to the queries (62076).
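To catch an intermittent failure like this outside of dig, it can help to send many single queries without any retry and count how often the first reply never reaches the application. The following is a minimal sketch using only the Python standard library, not what dig does internally; the function names (`build_query`, `time_query`, `count_slow`) and the 1-second threshold are assumptions made for illustration.

```python
import socket
import struct
import time


def build_query(name, qid):
    """Build a minimal DNS query for an A record (class IN, recursion desired)."""
    # Header: ID, flags (RD set), QDCOUNT=1, ANCOUNT=0, NSCOUNT=0, ARCOUNT=0
    header = struct.pack("!HHHHHH", qid, 0x0100, 1, 0, 0, 0)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def time_query(server, name, qid, timeout=10.0):
    """Send one UDP query with no retry.

    Returns the seconds elapsed until a reply with a matching ID arrives,
    or None if nothing arrives within the timeout.
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.settimeout(timeout)
    try:
        start = time.monotonic()
        sock.sendto(build_query(name, qid), (server, 53))
        while True:
            data, _ = sock.recvfrom(4096)
            if len(data) >= 2 and struct.unpack("!H", data[:2])[0] == qid:
                return time.monotonic() - start
    except socket.timeout:
        return None
    finally:
        sock.close()


def count_slow(latencies, threshold=1.0):
    """Count queries that were lost (None) or slower than the threshold."""
    return sum(1 for t in latencies if t is None or t > threshold)
```

Calling `time_query("209.81.9.1", "www.google.com", qid)` in a loop of a few hundred iterations, with a different `qid` each time, and passing the results to `count_slow` gives a rough failure rate that can be compared before and after a configuration change. Because the sketch never retries, a dropped reply shows up as `None` rather than as a 5-second answer.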

Question: what is causing the first DNS query to fail?

UPDATE

Resources:

Packet trace:

http://www.cloudshark.org/captures/8b1c32d9d015

Dtruss (strace for mac):

https://gist.github.com/dmourati/6115180

Related question on apple.stackexchange.com, "Mountain Lion firewall is randomly delaying DNS requests":

https://apple.stackexchange.com/questions/80678/mountain-lion-firewall-is-randomly-delaying-dns-requests

UPDATE 2

System Software Overview:

  System Version:   OS X 10.8.4 (12E55)
  Kernel Version:   Darwin 12.4.0
  Boot Volume:  Macintosh HD
  Boot Mode:    Normal
  Computer Name:    mv-m-dmouratis
  User Name:    Demetri Mouratis (dmourati)
  Secure Virtual Memory:    Enabled
  Time since boot:  43 minutes

Hardware Overview:

  Model Name:   MacBook Pro
  Model Identifier: MacBookPro10,1
  Processor Name:   Intel Core i7
  Processor Speed:  2.7 GHz
  Number of Processors: 1
  Total Number of Cores:    4
  L2 Cache (per Core):  256 KB
  L3 Cache: 6 MB
  Memory:   16 GB

Firewall Settings:

  Mode: Limit incoming connections to specific services and applications
  Services:
  Apple Remote Desktop: Allow all connections
  Screen Sharing:   Allow all connections
  Applications:
  com.apple.java.VisualVM.launcher: Block all connections
  com.getdropbox.dropbox:   Allow all connections
  com.jetbrains.intellij.ce:    Allow all connections
  com.skype.skype:  Allow all connections
  com.yourcompany.Bitcoin-Qt:   Allow all connections
  org.m0k.transmission: Allow all connections
  org.python.python:    Allow all connections
  Firewall Logging: Yes
  Stealth Mode: No
dmourati
  • `dtruss` output looks truncated. We never see the system calls that write the program output to STDOUT. – Andrew B Jul 30 '13 at 23:42
  • Have you tried another public name server, e.g. Google DNS? – vasco.debian Jul 31 '13 at 04:29
  • @vasco.debian yes, same behavior. – dmourati Jul 31 '13 at 15:15
  • Can we get pcap and dtrace output from both working and delayed DNS requests? Can we also get timestamps on the dtrace output? – Etan Reisner Jul 31 '13 at 16:14
  • Does this happen on every LAN client? – Giovanni Toraldo Jul 31 '13 at 20:23
  • @dmourati Did you see the comment about the `dtruss` output being truncated? The number of socket operations seem too few. – Andrew B Jul 31 '13 at 21:59
  • The only difference I see between these two request-response pairs is the delay between request and response. I don't see any problems on the network either. Experiment and check whether the delay matters: the OS might be dropping some UDP packets before they reach the application for some reason, even though they show up in the analyzer. It's definitely not a problem with the network or the general configuration; "dig" must work. Maybe something is wrong with the network stack tuning. Check the sysctl settings for the network, like this: http://rolande.wordpress.com/2010/12/30/performance-tuning-the-network-stack-on-mac-osx-10-6/ – GioMac Aug 01 '13 at 00:46
  • Does this problem also occur with a DNS server in the same subnet as the client? – Mels Aug 01 '13 at 10:02
  • You don't say whether you have a firewall running on the Mac? – JustinP Aug 01 '13 at 10:06

2 Answers

3

This appears to be a bug in Mountain Lion's firewall. Is it enabled on your system?

In this MacRumors thread (DNS problems after updating to Mountain Lion (10.8)), a possible workaround is discussed:

Try reducing MTU size.

System Preferences > Network > WiFi > Advanced > Hardware > Manually > MTU: Custom > 1300

Worked for me.

Could you check whether reducing the MTU size mitigates your problem?

Mels
  • Changing the firewall settings made the issue go away. The MTU had no effect. The firewall needs to be either disabled or set to "Block all incoming connections." – dmourati Aug 01 '13 at 18:40
  • Changing the firewall to either setting decreased the problem's frequency but did not eliminate it entirely. I can still reproduce it roughly 1 time in 200. – dmourati Aug 01 '13 at 18:57
  • I would consider a packet loss of that magnitude quite reasonable when traversing the internet, especially if there are congested hops on the route. Remember, DNS uses UDP, which doesn't guarantee datagram delivery. Which is exactly why the DNS protocol itself has retries and a timeout mechanism built in. – Mels Aug 02 '13 at 09:42
  • By the way, I know we aren't supposed to post "thank you" comments on here, but you just increased my reputation sixfold :) – Mels Aug 02 '13 at 09:43
-1

I had a similar issue recently and found that the Cisco ASA firewall wasn't configured to support EDNS0, the extension that allows DNS UDP packets larger than 512 bytes. Once my firewall admin allowed up to 4096 bytes, the issue was resolved. Great info here:

http://www.petenetlive.com/KB/Article/0000312.htm
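To make the 512-byte point concrete, here is a small sketch of the two pieces involved: a check against the classic UDP message limit, and the OPT pseudo-record a client appends to a query to advertise a larger payload size under EDNS0. The helper names are hypothetical, and the 4096 value simply mirrors the limit mentioned above.

```python
import struct

CLASSIC_UDP_LIMIT = 512  # maximum DNS-over-UDP message size without EDNS0


def exceeds_classic_limit(msg_size):
    """Return True if a response of this size only fits over UDP with EDNS0."""
    return msg_size > CLASSIC_UDP_LIMIT


def build_opt_record(udp_payload_size=4096):
    """Build an EDNS0 OPT pseudo-record advertising the given UDP payload size.

    Layout: NAME = root (single zero byte), TYPE = 41 (OPT),
    CLASS = advertised payload size, TTL = 0 (extended RCODE, version, flags),
    RDLENGTH = 0 (no options).
    The record goes in the additional section, so the query header's
    ARCOUNT must be set to 1 when it is appended.
    """
    return b"\x00" + struct.pack("!HHIH", 41, udp_payload_size, 0, 0)
```

For example, `exceeds_classic_limit(248)` is false for the 248-byte response shown in the question's dig output, so a device that enforces the 512-byte limit would not be expected to drop that particular reply.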

Rob
  • I don't think that applies here. The response is well beneath 512 bytes for this particular DNS query, even with the authority and additional sections. – Andrew B Jul 30 '13 at 23:10