[ratelimits] Remarks regarding the Knot DNS 1.2.0 RRL implementation

Wed Mar 6 17:07:02 UTC 2013

On 6 March 2013 17:23, Vernon Schryver <vjs at rhyolite.com> wrote:
>> From: =?UTF-8?Q?Marek_Vavru=C5=A1a?= <marek.vavrusa at nic.cz>
>
>> But back to the original scenario, I have made a quick test to measure
>> number of collisions using the exact same query (QNAME=test.rrl.) from
>> N=100 000 random IPs (uniform dist.) and checked the number of
>> collisions. N is chosen as an expected response rate, since collision
>> in a >=1sec. inactive bin isn't really a collision just bucket reuse.
>> There were ~3400 collisions in 100 000 different IPs (SD~52.2, 1000
>> test runs). That is about p=0.034 which is reasonable for me
>> (distribution attached).
>> Yes, there can be other cases, but if they are unpredictable and seed
>> changes over time,
>
> Some random comments of no particular importance:
>
>   - IP addresses are not uniformly distributed in [0,4294967295]
>      so a uniform selection from [0,4294967295] is not entirely
>      representative.

Yes, I do not deny that. If anyone gives me reasonable set, I'll be
happy to rerun the tests.

>   - That the seed varies and so the collisions are not reproducable
>      is a problem.  Unreproducable failures are the worst kind.

Seed doesn't vary during the run, can be set to a fixed value, but as
you see from the the SD,
it doesn't affect the results much.

>   - You don't mention the size of the hash table.  Collisions in good
>      hash schemes depend strongly on load factor.

I used the default table size (prime near 1.5M), it scales
proportionally to the load factor,
p for size=1M is around 0.05, for size=500k around 0.20.

>   - You didn't mention the query rate modeled by 100,000 random IPs.

I stated that the test simulates 1 second in which 100k random IPs
with the same question query RRL.
Therefore it simulates query rate of 100k pps. As the bucket rate is
renewed every second, it is valid
to assume that the number of collisions within 1 second will be
similar to the number of collisions over a longer period of time.

>   - Instead of a probabilty of collision for 100,000 IPs, we need
>      the false positive or negative probability.  I think that is
>      related to the expected value of the number of collisions per
>      second given a query rate of Q and a hash table size of N.

Yes, it would be interesting to have more test cases, I just
implemented the discussed
scenario you hinted in the previous emails, nothing less nothing more.

>
>>                                                 we would be willing to
>> implement the internal chaining you proposed, similar to open
>> addressing. It is a reasonable compromise, yes.
>
> I honestly do not understand why any RRL implementation does not use
> some sort of chaining.  It both saves memory and prevents failures.

Depends, we now use 16B per bucket, so...
But I digress, you stated how you feel about chaining, I learned
something from it,
if it is what our users want, then I'll probably implement it.

>
>
> Vernon Schryver    vjs at rhyolite.com
> _______________________________________________
> ratelimits mailing list
> ratelimits at lists.redbarn.org
> http://lists.redbarn.org/mailman/listinfo/ratelimits