[ratelimits] Rate-limiting in NSD

Tue Nov 6 13:10:37 UTC 2012

Hi,

[Vernon and Wouter continued this discussion off list, it would be good
to share our view also in public]

On 10/14/2012 03:49 AM, Vernon Schryver wrote:
>> From: Stephane Bortzmeyer <bortzmeyer at nic.fr>
> 
>> http://www.nlnetlabs.nl/blog/2012/10/11/nsd-ratelimit/
> 
> That web page raises some questions: - Is hash collision resolution
> the standard quadratic probing that it sounds like?

No, the hash collision resolution described in the blog has nothing to
do with the quadratic probing scheme. What we do when a collision is
encountered, is overwrite the bucket IP source with the new source and
initialization of the rate counter on it.

In other words, if the source is not the same, the possible false
positive response is not blocked. Instead, the rate counter is
reinitialized, allowing the attacker another 200 queries to reflect (or
whatever you have configured for "rrl-ratelimit:").

With this algorithm, we try to prevent false positives as much as
possible. If collisions are sporadic, this shouldn't be a big problem.
If collisions happen often, an attack might not be stopped properly.

A false positive can still occur, if the source is the same as of the
attack. Therefor, we also implemented the SLIP technique.

> - The computation of the probability of hash table collisions differs
> from the standard computation as a function of load factor (e.g.
> Knuth).  It would be good to have an estimate of false negatives
> based on that standard computation.

Vernon calculated the probability of collision with evil streams is (1 -
(1-B/S)^R). This seems to be right, the more attacks there are, the
bigger the chance a collision occurs in NSD-RRL.

> - How do you detect and count false positives?  Are all discarded 
> responses counted and optionally logged?

We do have logging, but we still need to take a good look at it. We do
not want to log all discarded responses, this appears to be a too
verbose approach (It is also possible to run the NSD-RRL without logging).

> There is a major difference between the NSD mechanism and RRL:
> 
> - RRL limits identical non-error responses by (IP,qname,qtype) while 
> the NSD mechanism limits non-error responses almost entirely by IP
> address.  The NSD scheme would be more accurately described as client
> rate limiting instead of response rate limiting.

NSD-RRL uses the source IP, a domain name and the type of response for
classification. The value of the domain name depends on the type of
response. For positive answers, this is the QNAME. For delegations this
is the delegation point name.

The bucket contains the source netmask and rate counters and timestamps.
The bucket is identified by hash(source netmask, domain name, type of
response). The entry source netmask is checked, and if it is different,
the collision is detected. Other collisions go undetected. In other
words, multiple attacks or legit queries that come from the *same
network* may result in false negatives (attack not blocked) and false
positives (legit query blocked).

> Again: a mechanism that counts all queries from a source with all 
> valid, non-wildcard names and any normal record type with a single 
> counter might be a good idea, but it is not *response* rate
> limiting.
> 
> I think that difference will yield significant differences in
> practice.
> 
> - The RRL scheme has no false negatives, but the NSD scheme does. The
> NSD counter for an attack stream can be reset by hash collisions with
> other queries.  The colliding queries need happen only once for every
> (rrl-ratelimit-1) attack queries.  Given a probability of collision
> X, the default rrl-ratelimit=200, and hand waving about uniformity,
> then the probability of a false negative with a rate of 200
> legitimate queries/second P=1-(1-X)^(199). (I'll not take the next
> step with X=0.001 from the web page because I think that number is
> wrong.)
> 
> - The RRL scheme has false positives only when the reflection attack 
> victim requsts the same type and name as are being forged.  For 
> example, while the forged stream is `dig +dnssec www.isc.org A` the
> victim's `dig +dnssec www.isc.org AAAA` are not affected by the RRL
> scheme.
> 
> With the NSD scheme, the victim must rely on the 'slip' statistics 
> because the victims legitimate requests for names other than the 
> attack name are not distinguished from the attack name. The NSD web
> page does not mention the NSD slip rate, so assume that it is the
> same as the default RRL "slip 2;".  If the victim tries and retries
> 'www.isc.org AAAA' a total of 5 time, then the probability of of
> 'slip' saving the day is 1-0.5^4=0.93. That's not bad but it differs
> significantly from the 1.0 probability of success for RRL for
> distinct names or types.

The false negatives may become a problem when there are many different
attacks. If this in practice becomes a problem, we need to implement
something more delicate, at the cost of some additional bytes.

About the QTYPE: I don't know how the attacks look like. An attack could
also cover multiple types. In general: I don't know whether being
delicate or being coarse will provide better limiting.

> - The default rate limit for the RRL scheme is 10/second. I suspect
> false positives are why the NSD scheme has a default limit of
> 200/second.  The RRL scheme happily defends against attack rates as
> low as 5.

You can configure the limit. Default it is 200. This should not be an
amount that you have to worry about, while we do catch the big fish. But
if you want, you can configure 5. It would be good to hear from people
that try out NSD-RRL, what they configure and their motivation for that
number. We are thinking of setting the default to 100, which seems to be
more reasonable given the attacks we have seen.

> Consider running RRL and NSD mechanisms on a TLD server.  A single
> big recursive client can easily need 200 different names per second,
> especially after being restarted.  RRL will count the distinct names
> separately and see no false positives even with a limit of 10/second,
> but the NSD mechanism will see them as a single attack stream and
> make all but the first 200 false positives.  At best the NSD
> mechanism will drop 50% of those legitimate requests and cause
> timeouts and retransmissions and convert the other 50% to TCP.

Unlikely, because the names are used to create the hash.

> Testing the false negative difference seems difficult, but it is easy
> to see the false positive difference.
> 
> - create this shell script in /tmp/foo #!/bin/sh dig +short +time=1
> www.example.com A @ns.example.com >/dev/null &
> 
> - start 5 c or tcsh shells and start this command in each of them: 
> repeat 100000 /tmp/foo
> 
> - run this command dig +short www.example.com AAAA @ns.example.com
> 
> Replace "www.example.com" and "ns.example.com" so that you are
> asking authoritative servers with RRL and NSD rate limiting.
> 
> The RRL scheme will not even pause on the AAAA request. The NSD
> scheme wil either fail to detect the attack because the 5 `repeat`
> loops aren't fast enough to send 200 qps or almost always hiccup and
> pause for seconds and then fall back to TCP for the AAAA
request.

If you are asking for the same qname, then yes. But your assumption
above is that the recursive client asks different names.

Best regards,
  Matthijs

> 
> 
> Vernon Schryver    vjs at rhyolite.com 
> _______________________________________________ ratelimits mailing
> list ratelimits at lists.redbarn.org 
> http://lists.redbarn.org/mailman/listinfo/ratelimits
> 

-------------- next part --------------
A non-text attachment was scrubbed...
Name: signature.asc
Type: application/pgp-signature
Size: 551 bytes
Desc: OpenPGP digital signature
URL: <http://lists.redbarn.org/pipermail/ratelimits/attachments/20121106/141eb7f5/attachment.pgp>