[ratelimits] Remarks regarding the Knot DNS 1.2.0 RRL implementation

Wed Mar 6 16:23:40 UTC 2013

> From: =?UTF-8?Q?Marek_Vavru=C5=A1a?= <marek.vavrusa at nic.cz>

> But back to the original scenario, I have made a quick test to measure
> number of collisions using the exact same query (QNAME=test.rrl.) from
> N=100 000 random IPs (uniform dist.) and checked the number of
> collisions. N is chosen as an expected response rate, since collision
> in a >=1sec. inactive bin isn't really a collision just bucket reuse.
> There were ~3400 collisions in 100 000 different IPs (SD~52.2, 1000
> test runs). That is about p=0.034 which is reasonable for me
> (distribution attached).
> Yes, there can be other cases, but if they are unpredictable and seed
> changes over time,

Some random comments of no particular importance:

  - IP addresses are not uniformly distributed in [0,4294967295]
     so a uniform selection from [0,4294967295] is not entirely
     representative.

  - That the seed varies and so the collisions are not reproducable
     is a problem.  Unreproducable failures are the worst kind.

  - You don't mention the size of the hash table.  Collisions in good
     hash schemes depend strongly on load factor.

  - You didn't mention the query rate modeled by 100,000 random IPs.

  - Instead of a probabilty of collision for 100,000 IPs, we need
     the false positive or negative probability.  I think that is
     related to the expected value of the number of collisions per
     second given a query rate of Q and a hash table size of N.

>                                                 we would be willing to
> implement the internal chaining you proposed, similar to open
> addressing. It is a reasonable compromise, yes.

I honestly do not understand why any RRL implementation does not use
some sort of chaining.  It both saves memory and prevents failures.

Vernon Schryver    vjs at rhyolite.com