[ratelimits] Rate-limiting in NSD
vjs at rhyolite.com
Sun Oct 14 01:49:51 UTC 2012
> From: Stephane Bortzmeyer <bortzmeyer at nic.fr>
That web page raises some questions:
- Is hash collision resolution the standard quadratic probing that
it sounds like?
- The computation of the probability of hash table collisions
differs from the standard computation as a function of load
factor (e.g. Knuth). It would be good to have an estimate of
false negatives based on that standard computation.
- How do you detect and count false positives? Are all discarded
responses counted and optionally logged?
There is a major difference between the NSD mechanism and RRL:
- RRL limits identical non-error responses by (IP,qname,qtype) while
the NSD mechanism limits non-error responses almost entirely by
IP address. The NSD scheme would be more accurately described
as client rate limiting instead of response rate limiting.
Again: a mechanism that uses a single counter for all queries from a
source, covering all valid, non-wildcard names and any normal record
type, might be a good idea, but it is not *response* rate limiting.
I think that difference will yield significant differences in practice.
- The RRL scheme has no false negatives, but the NSD scheme does.
The NSD counter for an attack stream can be reset by hash collisions
with other queries. The colliding queries need happen only once
for every (rrl-ratelimit-1) attack queries. Given a probability
of collision X, the default rrl-ratelimit=200, and hand waving
about uniformity, the probability of a false negative with
a rate of 200 legitimate queries/second is P=1-(1-X)^199.
(I'll not take the next step with X=0.001 from the web page
because I think that number is wrong.)
- The RRL scheme has false positives only when the reflection attack
victim requests the same type and name as are being forged. For
example, while the forged stream is `dig +dnssec www.isc.org A`
the victim's `dig +dnssec www.isc.org AAAA` are not affected by
the RRL scheme.
With the NSD scheme, the victim must rely on the 'slip' statistics
because the victim's legitimate requests for names other than the
attack name are not distinguished from requests for the attack name.
The NSD web page does not mention the NSD slip rate, so assume
that it is the same as the default RRL "slip 2;". If the
victim tries and retries 'www.isc.org AAAA' a total of 5 times,
then the probability of 'slip' saving the day is 1-0.5^4=0.94.
That's not bad but it differs significantly from the 1.0
probability of success for RRL for distinct names or types.
- The default rate limit for the RRL scheme is 10/second.
I suspect false positives are why the NSD scheme has a default
limit of 200/second. The RRL scheme happily defends against
attack rates as low as 5.
Consider running RRL and NSD mechanisms on a TLD server. A
single big recursive client can easily need 200 different names
per second, especially after being restarted. RRL will count
the distinct names separately and see no false positives even
with a limit of 10/second, but the NSD mechanism will see them
as a single attack stream and make all but the first 200 false
positives. At best the NSD mechanism will drop 50% of those
legitimate requests and cause timeouts and retransmissions and
convert the other 50% to TCP.
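
As a rough check of the numbers in the list above, here is a sketch in
POSIX sh and awk. The function names are invented for illustration and
are not part of NSD or BIND. false_negative_p evaluates
P=1-(1-X)^(limit-1) for a collision probability X that you must supply.
slip_success_p treats each retry as an independent 1/slip chance of
getting a reply, which is the same simplification used above.
max_per_key reads lines of "IP qname qtype" and reports the largest
counter under either keying, ignoring hashing and time windows.

```sh
#!/bin/sh
# Sketch only: hypothetical helper names, simplified models.

# Probability that at least one collision resets the counter within
# (limit-1) attack queries, assuming uniform, independent collisions.
# $1 = collision probability X, $2 = rate limit (e.g. 200)
false_negative_p() {
    awk -v x="$1" -v n="$2" \
        'BEGIN { printf "%.4f\n", 1 - (1 - x) ^ (n - 1) }'
}

# Probability that at least one of k attempts gets a slipped reply,
# modelling each attempt as an independent chance of 1/slip.
# $1 = slip value (e.g. 2), $2 = number of attempts counted (k)
slip_success_p() {
    awk -v s="$1" -v k="$2" \
        'BEGIN { printf "%.4f\n", 1 - (1 - 1 / s) ^ k }'
}

# Largest per-key count for queries read from stdin, one per line as
# "IP qname qtype".  $1 = "rrl" keys on (IP,qname,qtype); any other
# value keys on the IP address alone.
max_per_key() {
    awk -v mode="$1" '
        {
            key = (mode == "rrl") ? ($1 " " $2 " " $3) : $1
            if (++n[key] > max) max = n[key]
        }
        END { print max + 0 }'
}
```

For example, `slip_success_p 2 4` prints 0.9375. Feeding 200 distinct
names from one address to max_per_key gives 1 with "rrl" keying and 200
with "ip" keying, which is the TLD resolver case described above.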
Testing the false negative difference seems difficult, but it is
easy to see the false positive difference.
- create this shell script in /tmp/foo
    dig +short +time=1 www.example.com A @ns.example.com >/dev/null &
- start 5 csh or tcsh shells and start this command in each of them:
    repeat 100000 /tmp/foo
- run this command
    dig +short www.example.com AAAA @ns.example.com
Replace "www.example.com" and "ns.example.com" so that you are asking
authoritative servers with RRL and NSD rate limiting.
The RRL scheme will not even pause on the AAAA request.
The NSD scheme will either fail to detect the attack, because the 5
`repeat` loops aren't fast enough to send 200 qps, or almost always
hiccup, pause for seconds, and then fall back to TCP for the AAAA request.
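
Since `repeat` is a csh/tcsh builtin, here is a sketch of the same
query loop for a POSIX sh. The function name flood and the DIG override
variable are made up for this example; the dig options are the ones
used above. Like the original, it backgrounds every query, so a large
count can start many dig processes at once.

```sh
#!/bin/sh
# Sketch only: a POSIX sh stand-in for "repeat N /tmp/foo".
# flood NAME SERVER COUNT
#   sends COUNT background A queries for NAME to SERVER, then waits.
# DIG is a hypothetical override for the dig command (default: dig);
# it is left unquoted on purpose so it may contain arguments.
flood() {
    i=0
    while [ "$i" -lt "$3" ]; do
        ${DIG:-dig} +short +time=1 "$1" A "@$2" >/dev/null &
        i=$((i + 1))
    done
    wait
}

# Possible use, with names replaced as described above:
#   for n in 1 2 3 4 5; do
#       flood www.example.com ns.example.com 100000 &
#   done
#   dig +short www.example.com AAAA @ns.example.com
```

Whether five such loops reach 200 qps depends on the machine, as with
the csh version.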
Vernon Schryver vjs at rhyolite.com