While investigating an incident, I noticed an error in my syslog that looks like this (anonymized):
Feb 3 21:59:59 ns1 named[18824]: client xxx.xxx.xxx.xxx#2091 (us-east1-aws.api.snapchat.com): view MyView: rpz QNAME rewrite us-east1-aws.api.snapchat.com via us-east1-aws.api.snapchat.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded
Feb 3 21:59:59 ns1 named[18824]: client yyy.yyy.yyy.yyy#27720 (time-ios.apple.com): view MyView: rpz QNAME rewrite time-osx.g.aaplimg.com via time-osx.g.aaplimg.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded
Feb 3 21:59:59 ns1 named[18824]: client yyy.yyy.yyy.yyy#27720 (time-ios.apple.com): view MyView: rpz QNAME rewrite time.apple.com via time.apple.com.rpz.vendorsite.com query_getzonedb()failed: zone not loaded
The server runs BIND 9 with query logging turned on. We subscribe to a DNS service from a vendor that uses Spamhaus as its threat feed. The service is implemented as a Response Policy Zone (RPZ) hosted by the vendor, which our server loads as a slave zone. This kind of message is unusual for this service.
What I noticed:
- The "rpz" in the domain name seems to refer to a Response Policy Zone
- Sites that should have been blocked by this service were not being blocked
- Almost every DNS query that was not whitelisted appeared with this same message
- The error message seems to imply that the service's RPZ is failing to load from the master
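To put a number on "almost every query", a rough sketch like the following could tally these failures per policy zone. The regular expression is modeled only on the three sample lines above, so other BIND versions or log formats may need adjustments, and the names `RPZ_FAILURE`, `policy_zone`, and `summarize` are made up for illustration.

```python
import re
import sys
from collections import Counter

# Pattern modeled on the sample syslog lines above (an assumption, not an
# official description of BIND's log format).
RPZ_FAILURE = re.compile(
    r"client (?P<client>\S+)#\d+ \((?P<qname>[^)]+)\): "
    r"view (?P<view>[^:]+): rpz (?P<trigger>\S+) rewrite (?P<name>\S+) "
    r"via (?P<via>\S+) query_getzonedb\(\)\s*failed: (?P<reason>.+)$"
)


def policy_zone(name, via):
    """Strip the looked-up name from the front of the 'via' name,
    leaving the policy zone it was looked up in."""
    prefix = name + "."
    return via[len(prefix):] if via.startswith(prefix) else via


def summarize(lines):
    """Count failed RPZ lookups, keyed by (policy zone, failure reason)."""
    counts = Counter()
    for line in lines:
        m = RPZ_FAILURE.search(line.rstrip("\n"))
        if m:
            counts[(policy_zone(m["name"], m["via"]), m["reason"])] += 1
    return counts


if __name__ == "__main__":
    # Reads log lines from standard input and prints one row per zone/reason.
    for (zone, reason), count in summarize(sys.stdin).most_common():
        print(f"{count:6d}  {zone}  {reason}")
```

Fed the log on standard input, this prints one row per policy zone and failure reason, which makes it easier to see whether the failures are confined to the vendor's zone and when they started.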
What does this log message mean? And why did this happen in the middle of February?