[DNSfirewalls] RPZ and client perception

Wed May 28 21:39:07 UTC 2014

        We implemented RPZ with a purchased feed about a month ago on our 
production DNS servers.   As expected from our testing and pilot, there 
were a few immediate issues, which we have taken care of.   However, we are 
still getting a trickle of complaints about slowness and failures that 
appear to be related to the RPZ and the time it takes to complete all 
the extra queries for the NSDNAME checks.   When we research these 
issues, they seem to fit into two groups. 

        1.   DNS zones with "slightly" broken infrastructure.  These are 
domains where one or more name servers respond slowly or do not respond 
at all.  A recursive resolver without an RPZ loaded can work through the 
issues and provide a timely response to the client.  However, the extra 
lookups required, primarily for the NSDNAME checks, amplify what would be 
a "minor" DNS issue and increase the query time to the point where DNS 
times out from the client's perspective.    I can't really see a fix 
here: the issue does reside with the domain owner, and we are simply more 
susceptible to it because of the RPZs.

        2.  DNS zones with a large number of NS records whose name 
servers have FQDNs in several different DNS zones.  I found some where the 
2nd- and 3rd-level domains have different lists of NS records in various 
unrelated domains.  These have primarily been non-business-related sites 
that I don't care about; however, here is a simple real-world example: 

;; QUESTION SECTION:
;banque-france.org.             IN      NS

;; ANSWER SECTION:
banque-france.org.      600     IN      NS      indom80.indomco.hk.
banque-france.org.      600     IN      NS      indom30.indomco.fr.
banque-france.org.      600     IN      NS      indom20.indomco.net.
banque-france.org.      600     IN      NS      indom10.indomco.com.
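
As a rough illustration of why this delegation is expensive on a cold 
cache, the sketch below tallies the zones and TLDs that the four NS 
targets above fall under.  The parent_zone() helper is a naive assumption 
(it simply drops the first label and does not look for real zone cuts), 
so treat the result as an approximation, not as what a resolver computes.

```python
from collections import Counter

# NS targets copied from the dig answer above.
NS_TARGETS = [
    "indom80.indomco.hk.",
    "indom30.indomco.fr.",
    "indom20.indomco.net.",
    "indom10.indomco.com.",
]


def parent_zone(fqdn):
    """Naive guess at the zone holding a name: everything after the first label.

    This ignores real zone cuts (no SOA or NS queries are made), so it is
    only an approximation that happens to suit the names listed here.
    """
    labels = fqdn.rstrip(".").lower().split(".")
    return ".".join(labels[1:])


def tld(fqdn):
    """Return the last label of the name."""
    return fqdn.rstrip(".").lower().split(".")[-1]


zones = Counter(parent_zone(name) for name in NS_TARGETS)
tlds = Counter(tld(name) for name in NS_TARGETS)

print(f"{len(NS_TARGETS)} name servers in {len(zones)} zones under {len(tlds)} TLDs")
```

Every name server sits in its own zone under its own TLD, so a resolver 
starting from root hints has four separate delegation chains to walk 
before it can finish checking the name server names.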

        These are the most frustrating, as there is really nothing wrong 
with this setup in my opinion.   It is, by design, just going to 
generate a large number of DNS lookups to do a full NSDNAME check.  These 
are hard to explain away, as the sites "work from home" and "work from my 
phone".  They are also difficult because they are region specific.  For 
example:

These times are from recursive resolvers, physically located around the 
world, set up with root hints only, an empty cache, and an RPZ loaded that 
includes an NSDNAME check.  I asked each of them for www.banque-france.org. 
This lookup requires ~30 individual DNS lookups to complete the NSDNAME 
checks.

                no RPZ  RPZ
Europe          20ms    70ms
US              30ms    350ms
China           90ms    900ms
Australia       110ms   1400ms

        I understand the queries and latency amplification behind these 
times.  But due to poorly written web applications, anycast / 
load-balanced DNS servers that do not share a cache, and generally short 
TTLs on nearly every hop in this particular lookup, the NSDNAME checks 
turn a web site that is very usable globally into one that is only usable 
in Europe. 
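
To read the table above as ratios, here is a small sketch that uses only 
the figures already quoted.  It treats the "~30" lookups as exactly 30 and 
assumes they run one after another; both are simplifying assumptions, so 
the per-lookup figure is only a rough average.

```python
# Timings copied from the table above, in milliseconds: (no RPZ, RPZ).
TIMINGS_MS = {
    "Europe": (20, 70),
    "US": (30, 350),
    "China": (90, 900),
    "Australia": (110, 1400),
}

# "~30 individual DNS lookups" from the text, treated here as exactly 30.
LOOKUPS = 30


def slowdown(no_rpz_ms, rpz_ms):
    """Return how many times longer the lookup took with the RPZ loaded."""
    return rpz_ms / no_rpz_ms


def added_ms_per_lookup(no_rpz_ms, rpz_ms, lookups=LOOKUPS):
    """Average extra delay per lookup, assuming the lookups are serial."""
    return (rpz_ms - no_rpz_ms) / lookups


for region, (no_rpz, rpz) in TIMINGS_MS.items():
    print(f"{region:<10} {slowdown(no_rpz, rpz):5.1f}x slower, "
          f"{added_ms_per_lookup(no_rpz, rpz):5.1f} ms added per lookup")
```

The added time per lookup grows with distance from Europe, which fits the 
idea that each extra query pays a round trip to servers located there.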

        Have others found similar issues when implementing RPZs?
     What have you done to mitigate them?
     Is there an RPZ log event that says "It took over (X) seconds to 
complete this query because of RPZ"?  Basically, I got a good answer back 
for the 'real' query, but I did not provide it to the client within X 
seconds because the RPZ check was still ongoing.  I can imagine there 
would be a huge amount of noise in those messages, but they could 
conceivably be acted on before the client calls with an issue.
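
Whether or not such an event exists in the resolver, slow lookups can at 
least be spotted from the client side.  The sketch below is a hypothetical 
helper, not a resolver feature: timed_lookup() and its threshold_s 
parameter are names made up for illustration, and because it measures 
from outside the resolver it cannot say that the RPZ caused the delay.

```python
import logging
import time

log = logging.getLogger("slow-dns")


def timed_lookup(resolve, name, threshold_s=1.0):
    """Call resolve(name), time it, and warn when it exceeds threshold_s.

    `resolve` is any callable that performs the lookup, for example
    socket.gethostbyname on a host configured to use the resolver under
    test.  Returns a (result, elapsed_seconds, was_slow) tuple.
    """
    start = time.monotonic()
    result = resolve(name)
    elapsed = time.monotonic() - start
    slow = elapsed > threshold_s
    if slow:
        # Only the total time is known here; the cause is not.
        log.warning("lookup of %s took %.3fs (threshold %.3fs)",
                    name, elapsed, threshold_s)
    return result, elapsed, slow
```

Run periodically against a list of names users care about, with and 
without the RPZ loaded, something like this could surface the slow cases 
before a client calls.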

David A. Evans
Enterprise IP/DNS Management
Network Infrastructure Tools and Services