
While investigating an incident, I noticed an error in my syslog that looks like this (anonymized):

Feb  3 21:59:59 ns1 named[18824]: client xxx.xxx.xxx.xxx#2091 (us-east1-aws.api.snapchat.com): view MyView: rpz QNAME rewrite us-east1-aws.api.snapchat.com via us-east1-aws.api.snapchat.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded

Feb  3 21:59:59 ns1 named[18824]: client yyy.yyy.yyy.yyy#27720 (time-ios.apple.com): view MyView: rpz QNAME rewrite time-osx.g.aaplimg.com via time-osx.g.aaplimg.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded

Feb  3 21:59:59 ns1 named[18824]: client yyy.yyy.yyy.yyy#27720 (time-ios.apple.com): view MyView: rpz QNAME rewrite time.apple.com via time.apple.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded

We have query logging turned on, and the server runs BIND 9. We subscribe to a DNS service from a vendor that uses Spamhaus as a threat feed. The service is implemented by slaving an RPZ hosted by the vendor. Messages like these are unusual for this service.

What I noticed:

  • The "rpz" label in the domain name seems to point to a Response Policy Zone (RPZ) problem
  • Sites that should have been blocked by this service were not being blocked
  • Almost every DNS query that was not whitelisted appeared with this same message
  • The error message seems to imply that the service's RPZ is failing to load from the master

What does this log message mean? And why did this happen in the middle of February?

Watki02

2 Answers


It turns out that the log message means exactly what it says: the zone named in the message was not loaded. :-) More specifically, the RPZ slave zone was failing to get updates. That raises the obvious follow-up questions: why, and what is the real problem here?

An RPZ can fail to load for several reasons (the master server is down, a firewall rule changed, etc.). In our case, the problem was that we had never applied the new annual license, which contains the TSIG key that allows us to subscribe to the service.

Our licenses follow the calendar year, so why did this happen in the middle of February? It turns out the vendor allows a substantial grace period, after which the zone probably hit its refresh or expire limit and finally 'died'.
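To illustrate the last step, here is a minimal Python sketch of how the SOA expire timer would determine when a slave zone stops being served once transfers start failing. The function name `zone_expiry`, the date, and the 30-day expire value are hypothetical; I did not record the vendor's actual SOA timers or the time of our last successful transfer.

```python
from datetime import datetime, timedelta


def zone_expiry(last_refresh: datetime, soa_expire_seconds: int) -> datetime:
    """Return the time at which a slave zone expires.

    A slave keeps serving its last copy of the zone until the SOA
    expire interval has elapsed since the last successful refresh.
    """
    return last_refresh + timedelta(seconds=soa_expire_seconds)


if __name__ == "__main__":
    # Hypothetical values: last good transfer on 1 January, expire = 30 days.
    last_good_transfer = datetime(2020, 1, 1)
    expire_seconds = 30 * 24 * 60 * 60
    print(zone_expiry(last_good_transfer, expire_seconds))
```

With real values taken from the zone's SOA record, the same arithmetic gives a rough idea of how long after the last good transfer the "zone not loaded" messages would begin.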

I acquired the license, applied it, and deployed it, and we were back in business: no more strange log messages (at least none like the ones described above).

Watki02

I found another possible answer to this on Pastebin:

https://pastebin.com/NCwum7up

For some reason, RPZ rewrites on my setup kept failing. The DNS query log showed the following:

Apr 18 12:26:28 Internal-DNS-DHCP named[7257]: client 172.16.11.17#58306: rpz QNAME rewrite gateway.fe.apple-dns.net via gateway.fe.apple-dns.net.rpz.local.net query_getzonedb() failed: zone not loaded

Apr 18 12:26:31 Internal-DNS-DHCP named[7257]: client 172.16.10.13#64377: rpz QNAME rewrite teredo.ipv6.microsoft.com via teredo.ipv6.microsoft.com.rpz.local.net query_getzonedb() failed: zone not loaded

The error is: query_getzonedb() failed: zone not loaded
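To see which policy zone is affected and how often, the log lines can be tallied with a short script. This is only a sketch based on the log samples quoted on this page; the names `RPZ_FAILURE`, `policy_zone`, and `count_failures` are made up for illustration, and other BIND versions may format the message differently.

```python
import re
from collections import Counter

# Matches the failure lines quoted on this page. The space before
# "failed" is optional because the quoted samples differ on it.
RPZ_FAILURE = re.compile(
    r"rpz QNAME rewrite (?P<qname>\S+) via (?P<via>\S+) "
    r"query_getzonedb\(\)\s*failed: zone not loaded"
)


def policy_zone(qname, via):
    """Strip the queried name from the front of the 'via' name."""
    prefix = qname.rstrip(".") + "."
    via = via.rstrip(".")
    return via[len(prefix):] if via.startswith(prefix) else via


def count_failures(lines):
    """Count 'zone not loaded' RPZ failures per policy zone."""
    counts = Counter()
    for line in lines:
        match = RPZ_FAILURE.search(line)
        if match:
            counts[policy_zone(match["qname"], match["via"])] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "Apr 18 12:26:28 Internal-DNS-DHCP named[7257]: client "
        "172.16.11.17#58306: rpz QNAME rewrite gateway.fe.apple-dns.net via "
        "gateway.fe.apple-dns.net.rpz.local.net query_getzonedb() failed: "
        "zone not loaded",
    ]
    print(count_failures(sample))
```

Feeding it the syslog file line by line shows whether every failure points at the same policy zone, which is the zone to look for in the startup log.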

The Fix:

  1. Check your BIND DNS startup log:

Apr 18 12:39:23 Internal-DNS-DHCP named[7551]: zone rpz/IN: loading from master file /etc/named/zone/response-override.db failed: permission denied

Apr 18 12:39:23 Internal-DNS-DHCP named[7551]: zone rpz/IN: not loaded due to errors.

  2. Fix the zone file permission error.
  3. Restart named.
  4. Done.
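Before restarting, it may help to confirm that the account named runs as can actually read the zone file. The sketch below checks only the classic owner/group/other read bits; it ignores ACLs, SELinux or AppArmor policy, and the permissions of parent directories, any of which can also produce "permission denied". The function name `readable_by` is hypothetical, and the uid and group ids to pass in depend on your distribution.

```python
import os
import stat


def readable_by(path, uid, gids):
    """Check the plain mode bits to see if a user may read a file.

    uid  -- numeric user id the daemon runs as
    gids -- set of numeric group ids that user belongs to
    """
    if uid == 0:
        return True  # root bypasses ordinary mode-bit checks
    info = os.stat(path)
    if info.st_uid == uid:
        return bool(info.st_mode & stat.S_IRUSR)
    if info.st_gid in gids:
        return bool(info.st_mode & stat.S_IRGRP)
    return bool(info.st_mode & stat.S_IROTH)
```

For example, after looking up the daemon's uid and groups (with `id named` or similar, the account name being an assumption), call `readable_by("/etc/named/zone/response-override.db", uid, gids)` and fix ownership or mode if it returns False.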
Watki02