
(Rewriting most of this question since a lot of my original tests are irrelevant in light of new information)

I'm having issues with Server 2012R2 DNS servers. The biggest side effect of these issues is Exchange emails not going through. Exchange queries for AAAA records before trying A records. When it sees SERVFAIL for the AAAA record, it doesn't even try A records; it just gives up.

For some domains, when querying against my Active Directory DNS servers, I get SERVFAIL instead of NOERROR with no results.

I have tried this from several different Server 2012R2 domain controllers that are running DNS. One of them is an entirely separate domain, on a different network behind a different firewall and internet connection.

Two addresses that I know cause this problem are smtpgw1.gov.on.ca and mxmta.owm.bell.net

I've been using dig on a linux machine to test this (192.168.5.5 is my domain controller):

grant@linuxbox:~$ dig @192.168.5.5 smtpgw1.gov.on.ca -t AAAA

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @192.168.5.5 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: SERVFAIL, id: 56328
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 4000
;; QUESTION SECTION:
;smtpgw1.gov.on.ca.             IN      AAAA

;; Query time: 90 msec
;; SERVER: 192.168.5.5#53(192.168.5.5)
;; WHEN: Wed Oct 21 14:09:10 EDT 2015
;; MSG SIZE  rcvd: 46

But queries against a public DNS resolver work as expected:

grant@home-ssh:~$ dig @4.2.2.1 smtpgw1.gov.on.ca -t AAAA

; <<>> DiG 9.9.5-3ubuntu0.5-Ubuntu <<>> @4.2.2.1 smtpgw1.gov.on.ca -t AAAA
; (1 server found)
;; global options: +cmd
;; Got answer:
;; ->>HEADER<<- opcode: QUERY, status: NOERROR, id: 269
;; flags: qr rd ra; QUERY: 1, ANSWER: 0, AUTHORITY: 0, ADDITIONAL: 1

;; OPT PSEUDOSECTION:
; EDNS: version: 0, flags:; udp: 8192
;; QUESTION SECTION:
;smtpgw1.gov.on.ca.             IN      AAAA

;; Query time: 136 msec
;; SERVER: 4.2.2.1#53(4.2.2.1)
;; WHEN: Wed Oct 21 14:11:19 EDT 2015
;; MSG SIZE  rcvd: 46
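
The comparison above can also be scripted. What follows is a minimal sketch using only the Python standard library, not a replacement for dig: the function names (`build_query`, `rcode_name`, `query_rcode`) and the fixed query ID are made up for illustration. It builds an AAAA query with an optional EDNS0 OPT record and reads the RCODE from the reply header. For smtpgw1.gov.on.ca the packet comes out at 35 bytes without OPT and 46 bytes with it, which matches the message sizes shown by dig and by the DNS debug log.

```python
import socket
import struct

QTYPE_AAAA = 28
QCLASS_IN = 1
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL", 3: "NXDOMAIN", 5: "REFUSED"}


def build_query(name, qtype=QTYPE_AAAA, edns_size=None, query_id=0x1234):
    """Build a DNS query packet; append an EDNS0 OPT record if edns_size is set."""
    arcount = 1 if edns_size else 0
    # Header: ID, flags (only RD set), QDCOUNT, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, arcount)
    # QNAME: each label prefixed by its length, terminated by a zero byte
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in name.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", qtype, QCLASS_IN)
    opt = b""
    if edns_size:
        # OPT RR: root name, TYPE 41, CLASS = UDP payload size, TTL 0, RDLEN 0
        opt = b"\x00" + struct.pack("!HHIH", 41, edns_size, 0, 0)
    return header + question + opt


def rcode_name(response):
    """Return the RCODE (low 4 bits of the flags word) as a readable name."""
    flags = struct.unpack("!H", response[2:4])[0]
    rcode = flags & 0x000F
    return RCODES.get(rcode, str(rcode))


def query_rcode(server, name, edns_size=4096, timeout=3.0):
    """Send one UDP query to server:53 and return the RCODE name of the reply."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(build_query(name, edns_size=edns_size), (server, 53))
        data, _ = sock.recvfrom(4096)
    return rcode_name(data)
```

Calling `query_rcode("192.168.5.5", "smtpgw1.gov.on.ca")` and `query_rcode("4.2.2.1", "smtpgw1.gov.on.ca")` would be expected to reproduce the SERVFAIL versus NOERROR difference seen with dig. Passing `edns_size=None` sends the same query without the OPT record.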

As I said, I've tried this on two different networks and domains. One is a brand new domain, which definitely has all default settings for DNS. The other has been migrated to Server 2012, so some old settings from 2003/2008 may have carried over. I get the same results on both of them.

Disabling EDNS with dnscmd /config /enableednsprobes 0 fixes it. I see many search results about EDNS being a problem in Server 2003, but not much that matches what I'm seeing in Server 2012. Neither firewall has a problem with EDNS. Disabling EDNS should just be a temporary workaround though - it prevents the use of DNSSEC, and might cause other issues.

I have also seen some posts about issues with Server 2008R2 and EDNS, but those same posts say things are fixed in Server 2012, so it should work properly.

I have also tried enabling the debug log for DNS. I can see the packets that I expected, but it doesn't give me much insight as to why it's returning SERVFAIL. Here are the relevant portions of the DNS server debug log:

First packet - query from client to my DNS server

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF1BF01A0 UDP Rcv 172.16.0.254    a61e   Q [2001   D   NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID       0xa61e
    Flags     0x0120
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        0
      Z         0
      CD        0
      AD        1
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0023, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4096
      TTL    0
      DLEN   0
      DATA   
        Buffer Size  = 4096
        Rcode Ext    = 0
        Rcode Full   = 0
        Version      = 0
        Flags        = 0

Second packet - query from my DNS server to their DNS server

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF0A22160 UDP Snd 204.41.8.237    3e6c   Q [0000       NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP question info at 000000EFF0A22160
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=0, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID       0x3e6c
    Flags     0x0000
      QR        0 (QUESTION)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        0
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

Third packet - response from their DNS server (NOERROR)

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF2188100 UDP Rcv 204.41.8.237    3e6c R Q [0084 A     NOERROR] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF2188100
  Socket = 9812
  Remote addr 204.41.8.237, port 53
  Time Query=4556080, Queued=0, Expire=0
  Buf length = 0x0fa0 (4000)
  Msg length = 0x0023 (35)
  Message:
    XID       0x3e6c
    Flags     0x8400
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        1
      TC        0
      RD        0
      RA        0
      Z         0
      CD        0
      AD        0
      RCODE     0 (NOERROR)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   0
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
      empty

Fourth packet - response from my DNS server to client (SERVFAIL)

10/16/2015 9:42:29 AM 0974 PACKET  000000EFF1BF01A0 UDP Snd 172.16.0.254    a61e R Q [8281   DR SERVFAIL] AAAA   (7)smtpgw1(3)gov(2)on(2)ca(0)
UDP response info at 000000EFF1BF01A0
  Socket = 508
  Remote addr 172.16.0.254, port 50764
  Time Query=4556080, Queued=4556080, Expire=4556083
  Buf length = 0x0fa0 (4000)
  Msg length = 0x002e (46)
  Message:
    XID       0xa61e
    Flags     0x8182
      QR        1 (RESPONSE)
      OPCODE    0 (QUERY)
      AA        0
      TC        0
      RD        1
      RA        1
      Z         0
      CD        0
      AD        0
      RCODE     2 (SERVFAIL)
    QCOUNT    1
    ACOUNT    0
    NSCOUNT   0
    ARCOUNT   1
    QUESTION SECTION:
    Offset = 0x000c, RR count = 0
    Name      "(7)smtpgw1(3)gov(2)on(2)ca(0)"
      QTYPE   AAAA (28)
      QCLASS  1
    ANSWER SECTION:
      empty
    AUTHORITY SECTION:
      empty
    ADDITIONAL SECTION:
    Offset = 0x0023, RR count = 0
    Name      "(0)"
      TYPE   OPT  (41)
      CLASS  4000
      TTL    0
      DLEN   0
      DATA   
        Buffer Size  = 4000
        Rcode Ext    = 0
        Rcode Full   = 2
        Version      = 0
        Flags        = 0
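
For anyone cross-checking the "Flags" values in the four packets above, here is a small sketch that splits the 16-bit header flags word into the same fields the debug log prints. The helper name `decode_flags` is made up for illustration; the bit positions are the standard DNS header layout.

```python
RCODES = {0: "NOERROR", 1: "FORMERR", 2: "SERVFAIL",
          3: "NXDOMAIN", 4: "NOTIMP", 5: "REFUSED"}


def decode_flags(flags):
    """Split the 16-bit DNS header flags word into its named fields."""
    rcode = flags & 0xF
    return {
        "QR": (flags >> 15) & 1,       # 0 = question, 1 = response
        "OPCODE": (flags >> 11) & 0xF,
        "AA": (flags >> 10) & 1,       # authoritative answer
        "TC": (flags >> 9) & 1,        # truncated
        "RD": (flags >> 8) & 1,        # recursion desired
        "RA": (flags >> 7) & 1,        # recursion available
        "Z": (flags >> 6) & 1,
        "AD": (flags >> 5) & 1,        # authentic data
        "CD": (flags >> 4) & 1,        # checking disabled
        "RCODE": RCODES.get(rcode, str(rcode)),
    }
```

Decoding 0x8400 (third packet) gives QR=1, AA=1 and RCODE NOERROR, while 0x8182 (fourth packet) gives QR=1, RD=1, RA=1 and RCODE SERVFAIL, which agrees with the field-by-field dump in the log.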

Other things of note:

  • One of the networks has native IPv6 internet access, the other does not (but the IPv6 stack is enabled on the servers with default settings). It doesn't seem to be an IPv6 network issue.
  • It doesn't affect all domains. For example, dig @192.168.5.5 -t AAAA serverfault.com returns NOERROR with no results, and the same query for google.com returns Google's IPv6 addresses properly.
  • Tried installing hotfix from KB3014171, made no difference.
  • The update from KB3004539 is already installed.

Edit Nov 7, 2015

I've set up another non-domain-joined Server 2012R2 machine, installed the DNS server role, and tested with the command nslookup -type=aaaa smtpgw1.gov.on.ca localhost. It does NOT have the same issues.

Both VMs are on the same host, and same network, so that eliminates any network/firewall issues. It's now down to either patch level or being a domain member/domain controller that makes the difference.

Edit Nov 8, 2015

Applied all updates, made no difference. Went through to double check if there were any configuration differences between my new test server and my domain controller's DNS settings, and there are - the domain controller had forwarders set up.

Now, I'm sure I tried with forwarders and without in my initial tests, but I only tried it using dig from a Linux machine. I do get slightly different results with and without forwarders set up (tried with Google, OpenDNS, 4.2.2.1, and my ISP DNS servers) when I use nslookup on a Windows machine.

With a forwarder set, I get Server failed.

Without a forwarder (so it uses root DNS servers), I get No IPv6 address (AAAA) records available for smtpgw1.gov.on.ca.

But that's still not the same as what I get for other domains that don't have IPv6 records - nslookup on windows just returns no results for other domains.

With or without forwarders, dig still shows SERVFAIL for that name when querying my windows DNS server.

There IS a small difference between the problem domain and other ones that seems relevant, even when I don't involve my windows DNS server:

dig -t aaaa @8.8.8.8 smtpgw1.gov.on.ca has no answers, and does not have an authority section.

dig -t aaaa @8.8.8.8 serverfault.com returns no answers, but does have an authority section. So do most other domains I try, no matter what resolver I use.

So why is that authority section missing, and why does Windows DNS server treat it as a failure when other DNS servers don't?
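
To make the distinction concrete: an empty NOERROR response can arrive with or without records in the authority section, and RFC 2308 lists the empty-authority form as one of the NODATA response types. Below is a rough sketch that classifies a raw response from the header alone. The function name `classify_response` and the wording of the labels are made up for illustration, and it only looks at the RCODE and section counts; it does not parse the records to tell an SOA from an NS referral.

```python
import struct


def classify_response(response):
    """Classify a DNS response using only the RCODE and the section counts."""
    _id, flags, _qd, ancount, nscount, _ar = struct.unpack("!HHHHHH", response[:12])
    rcode = flags & 0xF
    if rcode == 2:
        return "SERVFAIL"
    if rcode == 3:
        return "NXDOMAIN"
    if rcode != 0:
        return "other error"
    if ancount > 0:
        return "answer"
    if nscount == 0:
        # NOERROR, no answers, nothing in authority: what smtpgw1.gov.on.ca returns
        return "NODATA, empty authority section"
    # NOERROR, no answers, authority populated (SOA for NODATA, or NS for a referral)
    return "NODATA or referral, authority section present"
```

Fed the header of the third packet in the debug log (flags 0x8400, all counts zero except QCOUNT), this returns "NODATA, empty authority section", the case that Windows DNS appears to handle differently.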

Daniel
Grant
  • Are you performing these tests from the Exchange server? If not, I would suggest doing that so that you can see it from Exchange's perspective. You might want to try running SMTPDiag from the Exchange server as well. I'd suggest running it while performing a network capture on the Exchange server so that you can view the details of the network/DNS activity. SMTPDiag is an old tool, but it's a command line tool that doesn't require any installation, so I'm thinking that it should work on all versions of Exchange. - http://www.microsoft.com/en-us/download/details.aspx?id=11393 – joeqwerty Oct 15 '15 at 22:44
  • Some network devices don't recognize and will reject EDNS packets. Did your network team introduce new device/setting recently? To eliminate this possibility, try to resolve google.com's AAAA record, it should return an IPv6 address. – strongline Oct 16 '15 at 00:05
  • @strongline EDNS packets come through fine. AAAA record for google works, as do a couple other sites I know have IPv6 running. Only change made recently was getting rid of our last Server 2008R2 DC/DNS server and replacing with 2012R2. – Grant Oct 16 '15 at 00:10
  • Is IPv6 disabled in any way in your environment? – Jim B Oct 16 '15 at 00:23
  • @JimB neither really enabled nor disabled...IPv6 stack is running on the servers, because it's on by default, with whatever default configuration it has. Gateway and internet connection have no IPv6 whatsoever. – Grant Oct 16 '15 at 00:31
  • I'm also seeing no IPv6 AAAA records for smtpgw1.gov.on.ca, so I'd expect that to fail in a DNS lookup. The real question is why is Exchange preferring IPv6 over IPv4 for the DNS record? – joeqwerty Oct 16 '15 at 01:16
  • @joeqwerty but it shouldnt *fail*, it should return no records. In dig I get status "SERVFAIL" and nslookup gives "server failure". Not just a no such host error - if it got that exchange would try for the A record. – Grant Oct 16 '15 at 01:26
  • That's what I meant, it's not trying the A record... for some strange reason. – joeqwerty Oct 16 '15 at 01:37
  • @joeqwerty I read somewhere it tried for AAAA first and if that doesn't work out it goes to IPv4. Packet captures seem to confirm that. But when it encounters a dns server failure it seems to just give up entirely instead. Which is weird and a stupid design. – Grant Oct 16 '15 at 01:49
  • it seems that your DNS has "recursion" disabled. Try to enable it. – strongline Oct 20 '15 at 14:05
  • That KB installed ? https://support.microsoft.com/en-us/kb/3014171 - "•When a Windows Server 2012 R2-based DNS server is enabled for domain name system security extensions (DNSSEC) validation, the DNS server may not always resolve some DNSSEC-signed zones. For example, when the DNS server receives a request to resolve a host name in a DNSSEC-signed zone, the DNS server returns a SERVFAIL error to the client." – yagmoth555 Oct 21 '15 at 19:53
  • @yagmoth555 nope, hadn't seen that hotfix. Going to try it first thing in the morning. – Grant Oct 22 '15 at 02:19
  • @yagmoth555 well, I was really hopeful, but after installing that hotfix I still get the SERVFAIL error. – Grant Oct 22 '15 at 14:05
  • What about this? https://support.microsoft.com/en-us/kb/3004539 – duenni Oct 23 '15 at 10:00
  • @duenni update 3000850 (the one from that kb article) is already installed – Grant Oct 23 '15 at 11:23
  • Just rechecked your question, and I have now the feeling it will finish with MS's support, and probably a new kb will be issued by them after. – yagmoth555 Oct 26 '15 at 18:14
  • @yagmoth555 yeah, if I ever find the time to call them about it. Curious if anyone else has 2012R2/2012/2008R2 DNS servers they could test against and see if they get the same error. – Grant Oct 26 '15 at 21:33
  • @pjmahoney I will give bypassing the firewall a try and read those articles. I have tried it from networks using watchguard and pfsense firewalls, both with latest firmware, and neither doing any filtering or inspection of dns traffic, and both allow tcp and udp for dns, so I dont think thats the problem but I will certainly give it a try. – Grant Nov 07 '15 at 05:19
  • @pjmahoney also if it were the firewall, I would expect the queries against 4.2.2.1 to fail as well from that network, but they dont. Only queries against the 2012r2 servers fail. – Grant Nov 07 '15 at 05:29
  • @pjmahoney yes one affected network has a watchguard xtm510 running firmware 11.10. Rule allowing dns is just the built in dns packet filter rule (not dns proxy). Windows firewall on that server is disabled. – Grant Nov 07 '15 at 14:09
  • @Grant, I have just checked with a 2012R2 dns server and it doesn't return any SERVFAIL. Could you please compare just the `dig` output from both the dns servers you have to see if they are also identical? – Diamond Nov 22 '15 at 21:20
  • If you can run the dig command from the DNS server itself, try ```dig +trace``` for the problem names. That acts a bit like a recursive server, and will show each step on the way. Running it from another machine on the same network may be sufficient in a pinch. – Michael Graff Dec 25 '15 at 03:34
  • You mentioned that disabling EDNS fixes the issue. Is that still the case? If so, it could be a network issue. EDNS increases the packet size of the UDP packet. The exact size of the packet is contained in the OPT record. https://en.wikipedia.org/wiki/Extension_mechanisms_for_DNS I wonder if packets larger than a certain size are being truncated or dropped? I also noticed in the first packet, your client to server, the size is 4096. But in the last packet, from server to client, the packet size is 4000. – Mike Marseglia Dec 27 '15 at 17:37
  • @mikemarseglia dns on linux systems seems to work fine even for large packet on the same networks - its only server 2012 dns servers that seem affected. – Grant Dec 27 '15 at 18:35
  • If you really need the issue to be resolved, it might be worth a shot to open a support ticket at Microsoft: Https://support.microsoft.com/OAS. They are quite cheap considering how much time one spends on such issues. :) – Daniel Jan 12 '16 at 23:11

2 Answers


I've looked into the network trace some more and done some reading. When the requested AAAA record does not exist, the response returns an SOA. It turns out the SOA is for a different domain than the one being requested. I suspect that's why Windows is rejecting the response. Request: AAAA for mx.atomwide.com. Response: SOA for lgfl.org.uk. I will see if we can make some progress with this information. EDIT: Just for future reference, temporarily turning off "Secure cache against pollution" will allow the query to succeed. Not ideal, but it proves the issue is with a dodgy DNS record. RFC4074 is also a good reference - Intro and Section.
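
The mismatch can be illustrated with a simple ancestor check on the two names. This is only a sketch of the kind of test a cache-pollution filter might apply; it is an assumption, not Windows DNS Server's actual logic, and the helper names `labels` and `is_ancestor_or_self` are made up for illustration.

```python
def labels(name):
    """Split a domain name into lowercase labels, ignoring a trailing dot."""
    return [part for part in name.lower().rstrip(".").split(".") if part]


def is_ancestor_or_self(zone, qname):
    """True if zone equals qname or is a parent domain of qname."""
    z, q = labels(zone), labels(qname)
    # zone must match the rightmost labels of qname, label by label
    return len(z) <= len(q) and q[len(q) - len(z):] == z
```

With the names above, `is_ancestor_or_self("lgfl.org.uk", "mx.atomwide.com")` is False, whereas an SOA owned by atomwide.com would pass. A resolver that discards out-of-zone records would drop the lgfl.org.uk SOA and be left with nothing usable in the response.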

Grant
  • I am going to try to test this today in my environment, but I think you may be onto something! – Grant Mar 18 '16 at 12:13
  • Also I have edited out your link - signatures and off topic links are not allowed here, and I don't want to see your otherwise excellent answer get deleted for it. – Grant Mar 18 '16 at 12:19

According to KB832223

Cause

This issue occurs because of the Extension Mechanisms for DNS (EDNS0) functionality that is supported in Windows Server DNS.

EDNS0 allows larger User Datagram Protocol (UDP) packet sizes. However, some firewall programs may not allow UDP packets that are larger than 512 bytes. Therefore, these DNS packets may be blocked by the firewall.

Microsoft has the following resolution:

Resolution

To resolve this issue, update the firewall program to recognize and allow UDP packets that are larger than 512 bytes. For more information about how to do this, contact the manufacturer of your firewall program.

Microsoft has the following suggestion to work around the issue:

Workaround

To work around this issue, turn off the EDNS0 feature on Windows-based DNS servers. To do this, take the following action:

At a command prompt, type the following command, and then press Enter:

dnscmd /config /enableednsprobes 0

Note Type a 0 (zero) and not the letter "O" after "enableednsprobes" in this command.

Tim Penner
  • I have seen this article - the firewalls I have tested with both pass large dns packets without issue, as evidenced by it working perfectly on linux. Disabling edns prevents the use of DNSSEC, so though it fixes the problem it is not a good solution. – Grant Feb 29 '16 at 22:09
  • sorry I didn't realize that Microsoft's guidance would apply to Linux also. Out of curiosity, do you have **any** Microsoft OS that is working through the firewall? – Tim Penner Feb 29 '16 at 23:36