[ratelimits] Rate-limiting in NSD

Sun Oct 14 01:49:51 UTC 2012

> From: Stephane Bortzmeyer <bortzmeyer at nic.fr>

> http://www.nlnetlabs.nl/blog/2012/10/11/nsd-ratelimit/

That web page raises some questions:
  - Is hash collision resolution the standard quadratic probing that
     it sounds like?
  - The computation of the probability of hash table collisions
     differs from the standard computation as a function of load
     factor (e.g. Knuth).  It would be good to have an estimate of
     false negatives based on that standard computation.
  - How do you detect and count false positives?  Are all discarded
     responses counted and optionally logged?

There is a major difference between the NSD mechanism and RRL:

  - RRL limits identical non-error responses by (IP,qname,qtype) while
     the NSD mechanism limits non-error responses almost entirely by
     IP address.  The NSD scheme would be more accurately described
     as client rate limiting instead of response rate limiting.

Again: a mechanism that counts all queries from a source with all
valid, non-wildcard names and any normal record type with a single
counter might be a good idea, but it is not *response* rate limiting.
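
To make the distinction concrete, here is a toy sketch and not either
implementation: it only counts responses per key for one second of
traffic and reports how many exceed a limit. The function names, the
client address 192.0.2.1, and the nameN.example names are all
hypothetical; the limits of 10 and 200 are the defaults quoted further
down in this message. Real RRL and NSD also mask addresses and hash
their keys, which this ignores.

```shell
#!/bin/sh
# over_limit LIMIT MODE
# Reads "ip qname qtype" lines on stdin (one second of traffic) and
# prints how many responses exceed LIMIT for their key.
#   MODE=response : key is (ip, qname, qtype), roughly like RRL
#   MODE=client   : key is ip only, roughly like the NSD scheme
over_limit() {
    awk -v limit="$1" -v mode="$2" '
        {
            key = (mode == "client") ? $1 : ($1 " " $2 " " $3)
            if (++count[key] > limit) dropped++
        }
        END { print dropped + 0 }'
}

# gen_queries N: one client asking for N distinct names (hypothetical data)
gen_queries() {
    awk -v n="$1" 'BEGIN {
        for (i = 1; i <= n; i++) printf "192.0.2.1 name%d.example A\n", i
    }'
}

echo "per-response key, limit 10:  $(gen_queries 300 | over_limit 10 response) over limit"
echo "per-client key,   limit 200: $(gen_queries 300 | over_limit 200 client) over limit"
```

With distinct names, every per-response counter stays at 1, while the
single per-client counter passes its limit once the client has sent
more than 200 queries.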

I think that difference will yield significant differences in practice.

  - The RRL scheme has no false negatives, but the NSD scheme does.
     The NSD counter for an attack stream can be reset by hash collisions
     with other queries.  The colliding queries need happen only once
     for every (rrl-ratelimit-1) attack queries.  Given a probability
     of collision X, the default rrl-ratelimit=200, and hand waving
     about uniformity, the probability of a false negative with
     a rate of 200 legitimate queries/second is P=1-(1-X)^199.
     (I'll not take the next step with X=0.001 from the web page
     because I think that number is wrong.)

  - The RRL scheme has false positives only when the reflection attack
     victim requests the same type and name as are being forged.  For
     example, while the forged stream is `dig +dnssec www.isc.org A`
     the victim's `dig +dnssec www.isc.org AAAA` are not affected by
     the RRL scheme.

     With the NSD scheme, the victim must rely on the 'slip' statistics
     because the victim's legitimate requests for names other than the
     attack name are not distinguished from requests for the attack name.
     The NSD web page does not mention the NSD slip rate, so assume
     that it is the same as the default RRL "slip 2;".  If the
     victim tries and retries 'www.isc.org AAAA' a total of 5 times,
     then the probability of 'slip' saving the day is 1-0.5^4=0.94.
     That's not bad but it differs significantly from the 1.0
     probability of success for RRL for distinct names or types.

  - The default rate limit for the RRL scheme is 10/second.
     I suspect false positives are why the NSD scheme has a default
     limit of 200/second.  The RRL scheme happily defends against
     attack rates as low as 5/second.

     Consider running RRL and NSD mechanisms on a TLD server.  A
     single big recursive client can easily need 200 different names
     per second, especially after being restarted.  RRL will count
     the distinct names separately and see no false positives even
     with a limit of 10/second, but the NSD mechanism will see them
     as a single attack stream and make all but the first 200 false
     positives.  At best the NSD mechanism will drop 50% of those
     legitimate requests and cause timeouts and retransmissions and
     convert the other 50% to TCP.
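
The two probability estimates in the list above have the same shape,
1-(1-p)^n, so a small helper can evaluate both. This is only a sketch
of the arithmetic: the helper name at_least_once is made up here, the
X values are illustrative (0.001 is the web page's figure, which is
doubted above), and "slip 2" is modeled as each retry being answered
independently with probability 0.5, as the text assumes.

```shell
#!/bin/sh
# at_least_once P N
# Probability that an event of per-trial probability P happens at
# least once in N independent trials: 1 - (1 - P)^N.
at_least_once() {
    LC_ALL=C awk -v p="$1" -v n="$2" 'BEGIN { printf "%.4f\n", 1 - (1 - p) ^ n }'
}

# False negative estimate: collision probability X, 199 chances per
# counter reset.  The X values are illustrative assumptions.
for x in 0.0001 0.001 0.01; do
    printf 'X=%s  P(false negative)=%s\n' "$x" "$(at_least_once "$x" 199)"
done

# Slip estimate: "slip 2" taken as a 0.5 chance per retry, using the
# exponent 4 from the 1-0.5^4 figure in the text.
printf 'P(slip saves the day)=%s\n' "$(at_least_once 0.5 4)"
```

Because of the exponent 199, even small changes in X move the false
negative estimate a long way, which is why the value of X matters.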

Testing the false negative difference seems difficult, but it is
easy to see the false positive difference.  

    - create this shell script in /tmp/foo and make it executable
	#!/bin/sh
	dig +short +time=1 www.example.com A @ns.example.com >/dev/null &

    - start 5 csh or tcsh shells and run this command in each of them:
	 repeat 100000 /tmp/foo

    - run this command
	 dig +short www.example.com AAAA @ns.example.com

Replace "www.example.com" and "ns.example.com" so that you are asking
authoritative servers with RRL and NSD rate limiting.

The RRL scheme will not even pause on the AAAA request.
The NSD scheme will either fail to detect the attack, because the 5
`repeat` loops aren't fast enough to send 200 qps, or it will almost always
hiccup, pause for seconds, and then fall back to TCP for the AAAA request.
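
`repeat` is a csh/tcsh builtin, so for anyone using a Bourne-style
shell, the load generator above can be sketched as a POSIX sh function.
The function name flood and the DIG override variable are inventions
for this sketch; the dig options are the ones used in /tmp/foo. As
with the original, point it only at servers you operate.

```shell
#!/bin/sh
# flood NAME SERVER COUNT
# Starts COUNT background queries for "NAME A" against SERVER, then
# waits for all of them to finish.  DIG can be set to substitute
# another command for dig (hypothetical hook, handy for dry runs).
flood() {
    name=$1
    server=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        "${DIG:-dig}" +short +time=1 "$name" A "@$server" >/dev/null &
        i=$((i + 1))
    done
    wait
}

# Example (not run here), in each of 5 terminals:
#   flood www.example.com ns.example.com 100000
# and then, in another terminal:
#   dig +short www.example.com AAAA @ns.example.com
```

Whether 5 such loops reach 200 qps depends on the machine, which is
the same caveat that applies to the `repeat` version.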

Vernon Schryver    vjs at rhyolite.com