[DNSfirewalls] How reliable is RPZ in production? I'm seeing flakiness in testing.

Anne Bennett anne at encs.concordia.ca
Wed Dec 17 22:07:39 UTC 2014

Well, no response to my question on Monday about RPZ logging,
but I'll try my luck here again on a different aspect of RPZ...

I'm testing now with essentially two RPZs enabled in the
response-policy statement: first, a "whitelist" to make sure
that "important" sites (ours, and various vendor patch sites)
never get "blocked", and second, a "quarantine" to redirect
a local client to our "quarantine web server" using a CNAME,
if that client asks for anything not whitelisted.
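
For readers unfamiliar with how a client-address rule is written, here
is a minimal Python sketch of building such a record.  It assumes the
documented bind encoding for IPv4 client-IP triggers (prefix length,
then the octets in reverse order, under "rpz-client-ip").  The address
192.0.2.10, the target quarantine.example.com, and both function names
are hypothetical placeholders, not values from my zones.

```python
import ipaddress


def client_ip_owner(cidr):
    """Return the RPZ owner name (relative to the policy zone) for an
    IPv4 client-IP trigger: <prefixlen>.<reversed octets>.rpz-client-ip
    """
    net = ipaddress.ip_network(cidr, strict=True)
    if net.version != 4:
        # IPv6 uses a different encoding; deliberately out of scope here.
        raise ValueError("this sketch handles IPv4 only")
    octets = str(net.network_address).split(".")
    return ".".join([str(net.prefixlen)] + octets[::-1] + ["rpz-client-ip"])


def quarantine_record(cidr, target):
    """Format one zone-file line that answers the client with a CNAME
    to the (hypothetical) quarantine web server."""
    return f"{client_ip_owner(cidr)}\tCNAME\t{target}"


# Hypothetical client and target, for illustration only.
print(quarantine_record("192.0.2.10/32", "quarantine.example.com."))
```

A CNAME to an ordinary host name like this is what bind logs as a
"Local-Data" rewrite.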

In case it matters, my resolvers are slaves for the two RPZs
in question, whose master files reside on my two master DNS
servers.  All of these nameservers are running bind-9.10.1-P1.
The zones appear to be notifying and transferring correctly upon
update (based on log entries and the contents of the slave data
files, which I keep in text form with "masterfile-format text").

I'm having partial success with my testing: sometimes the RPZs
work as expected, then suddenly they stop matching, even when I
haven't changed the data.  At one point neither a reload nor a 
reconfig solved the problem, but a daemon restart brought the
"dead" RPZ to life again.

When things work as expected, this is the logged entry for a
hit on a whitelist entry by name:

  Dec 17 16:34:58 sloth named[13717]: rpz: info: client (alcor.concordia.ca): rpz QNAME PASSTHRU rewrite alcor.concordia.ca via alcor.concordia.ca.rpz-whitelist

This is a hit for the same by IP address (I had to comment out
the QNAME rules to test this):

  Dec 17 15:26:13 sloth named[1088]: rpz: info: client (alcor.concordia.ca): rpz IP PASSTHRU rewrite alcor.concordia.ca via

This is another whitelist hit, for a more complicated case where
the answer is:
  www.microsoft.com is an alias for toggle.www.ms.akadns.net.
  toggle.www.ms.akadns.net is an alias for www.microsoft.com.edgekey.net.
  www.microsoft.com.edgekey.net is an alias for e10088.dscb.akamaiedge.net.
  e10088.dscb.akamaiedge.net has address
  e10088.dscb.akamaiedge.net has IPv6 address 2600:140a:0:199::2768
  e10088.dscb.akamaiedge.net has IPv6 address 2600:140a:0:194::2768

  Dec 17 15:30:03 sloth named[1088]: rpz: info: client (www.microsoft.com): rpz IP PASSTHRU rewrite e10088.dscb.akamaiedge.net via

And this is a hit for a non-whitelisted entry, where my quarantined
client is redirected:

  Dec 17 16:34:43 sloth named[13717]: rpz: info: client (hecate.therockgarden.ca): rpz CLIENT-IP Local-Data rewrite hecate.therockgarden.ca via

As you can see, all of the above looks as though it is working.
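
To pin down the moment a rule stops matching, hits like the ones above
can be tallied per trigger and action.  This is a minimal Python
sketch, assuming the log line format shown above.  The regular
expression and the function name are illustrative only and are not
part of bind.

```python
import re
from collections import Counter

# Matches the tail of an rpz log line, e.g.
#   "... rpz QNAME PASSTHRU rewrite alcor.concordia.ca via <zone entry>"
# The part after "via" is optional, since it may be absent or cut off.
RPZ_HIT = re.compile(
    r"rpz (?P<trigger>[A-Z-]+) (?P<action>\S+) "
    r"rewrite (?P<qname>\S+) via ?(?P<matched>\S*)"
)


def tally_rpz_hits(lines):
    """Count rpz rewrites per (trigger, action) pair.

    Lines that do not look like rpz rewrite entries are ignored.
    """
    counts = Counter()
    for line in lines:
        m = RPZ_HIT.search(line)
        if m:
            counts[(m.group("trigger"), m.group("action"))] += 1
    return counts
```

Running this over the log at intervals (for example with
tally_rpz_hits(open(path)), where path is wherever named logs on your
system) and comparing the counts would show when, say, the
("CLIENT-IP", "Local-Data") count stops increasing.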

However, at one point in my testing, my quarantine rule just
stopped matching (based on no logged hits, and no redirection
of my queries from the quarantined host).  I hadn't changed
the quarantine RPZ at all, and my client had not changed its
IP address.  Only a restart of named on the resolver brought the
quarantine back, but then the whitelist worked only partially:
it was okay for simple queries, but failed to match for my
multi-level "www.microsoft.com" attempt.

I don't know what to make of this; it looks as though the
technology is several years old, and my experience with ISC
bind is usually excellent.  Has anyone else encountered this
type of flakiness?

Ms. Anne Bennett, Senior Sysadmin, ENCS, Concordia University, Montreal H3G 1M8
anne at encs.concordia.ca                                    +1 514 848-2424 x2285
