[ratelimits] Why not use chaining?

Marek Vavruša marek.vavrusa at nic.cz
Tue Mar 19 14:28:20 UTC 2013


So, some news in our implementation, hopefully for the better.
I've been reluctant to use external chaining, so I poked around and
implemented the hopscotch hashing scheme
(http://people.csail.mit.edu/shanir/publications/disc2008_submission_98.pdf).
It is internally chained and needs just 4 bytes per bucket for the hop
bitvector. Each bucket's neighbourhood can hold H different keys
(H = sizeof(unsigned) * 8, i.e. 32 on most platforms).
Unlike external chaining it cannot hold an unlimited number of keys per
bucket, but it is proven that the probability of more than H keys
hashing to the same bucket is about 1/H!, which is acceptable to me.
Even if that happens (load factor close to 1), it is handled as before:
the bucket is shared for a while and given some extra tokens. Long
story short, it works quite well up to a load factor of 0.9, and
lookup/insert time is constant.
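
For the curious, here is a stripped-down sketch of the scheme (my own
toy illustration, not the actual Knot code: keys must be nonzero, a zero
key marks an empty slot, the hash is arbitrary, and the bucket-sharing
fallback mentioned above is left out):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define H 32              /* neighbourhood size = bits in the hop vector */
#define TABLE_SIZE 1024   /* toy size; the real table is much larger     */

struct bucket {
    uint32_t hop;    /* bit i set => slot (home + i) holds a key homed here */
    uint64_t key;    /* 0 = empty in this toy                               */
    uint32_t tokens; /* per-bucket rate-limit state would live here         */
};

static struct bucket table[TABLE_SIZE];

static size_t home_of(uint64_t key)
{
    return (size_t)(key * 11400714819323198485ULL) % TABLE_SIZE;
}

/* Lookup touches at most the H slots named by the home bucket's hop vector. */
static struct bucket *lookup(uint64_t key)
{
    size_t home = home_of(key);
    uint32_t hop = table[home].hop;
    while (hop) {
        unsigned i = (unsigned)__builtin_ctz(hop);   /* next occupied offset */
        struct bucket *b = &table[(home + i) % TABLE_SIZE];
        if (b->key == key)
            return b;
        hop &= hop - 1;                              /* clear that bit       */
    }
    return NULL;
}

/* Insert: probe linearly for a free slot, then "hop" it backwards until it
 * lies within H slots of the key's home bucket. */
static bool insert(uint64_t key)
{
    size_t home = home_of(key);
    size_t free_slot = home, dist = 0;

    while (dist < TABLE_SIZE && table[free_slot].key != 0) {
        free_slot = (free_slot + 1) % TABLE_SIZE;
        dist++;
    }
    if (dist == TABLE_SIZE)
        return false;                        /* table completely full */

    while (dist >= H) {
        /* Among the H-1 buckets just before the free slot, find one whose
         * neighbourhood has an item stored before the free slot and move
         * that item into the free slot. */
        bool moved = false;
        for (size_t j = H - 1; j >= 1; j--) {
            size_t cand = (free_slot + TABLE_SIZE - j) % TABLE_SIZE;
            uint32_t hop = table[cand].hop;
            if (hop == 0)
                continue;
            unsigned i = (unsigned)__builtin_ctz(hop);
            if (i >= j)
                continue;                    /* nothing movable before free slot */
            size_t victim = (cand + i) % TABLE_SIZE;
            table[free_slot].key = table[victim].key;
            table[free_slot].tokens = table[victim].tokens;
            table[victim].key = 0;
            table[cand].hop = (hop & ~(1u << i)) | (1u << j);
            free_slot = victim;
            dist -= j - i;                   /* free slot moved closer to home */
            moved = true;
            break;
        }
        if (!moved)
            return false;                    /* here we'd fall back to sharing */
    }
    table[free_slot].key = key;
    table[free_slot].tokens = 0;
    table[home].hop |= 1u << dist;
    return true;
}

int main(void)
{
    insert(1234);
    insert(5678);
    printf("1234 %s, 9999 %s\n",
           lookup(1234) ? "found" : "missing",
           lookup(9999) ? "found" : "missing");
    return 0;
}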

Also, to make it more BIND9 RRL compliant, we map buckets to the tuple
<cls, imputed(qname), netblock>, where cls is more or less the same as
the errorstatus mandated by the memo. The imputed qname is hashed with
our default hash function (MurmurHash3); it works quite well, so I
didn't implement any RRL-specific hash function.
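
Roughly, the idea is something like the sketch below (the names, the
class values and the /24 mask are made up for illustration, and FNV-1a
stands in here for the MurmurHash3 the real code uses):

#include <stddef.h>
#include <stdint.h>
#include <arpa/inet.h>

enum rrl_cls {            /* roughly the error/status classes of the memo */
    CLS_NORMAL, CLS_ERROR, CLS_NXDOMAIN, CLS_WILDCARD, CLS_EMPTY
};

/* FNV-1a as a stand-in for MurmurHash3. */
static uint32_t hash_bytes(const uint8_t *p, size_t len, uint32_t h)
{
    while (len--) {
        h ^= *p++;
        h *= 16777619u;
    }
    return h;
}

/* Reduce a response to a bucket index: the class, the qname imputed from
 * the response (e.g. the wildcard owner for wildcard answers), and the
 * client address masked to a netblock. */
static uint32_t rrl_bucket(enum rrl_cls cls,
                           const uint8_t *imputed_qname, size_t qname_len,
                           struct in_addr client, uint32_t table_size)
{
    uint32_t netblock = ntohl(client.s_addr) & 0xffffff00u; /* e.g. a /24 */
    uint8_t c = (uint8_t)cls;
    uint32_t h = 2166136261u;               /* FNV-1a offset basis */
    h = hash_bytes(&c, 1, h);
    h = hash_bytes(imputed_qname, qname_len, h);
    h = hash_bytes((const uint8_t *)&netblock, sizeof(netblock), h);
    return h % table_size;
}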

The default table size is now set to 393241, which is comfortably
enough for up to ~350k distinct responses per second.
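(Back-of-the-envelope: 350 000 / 393 241 ≈ 0.89, so that rate keeps the
table just under the 0.9 load factor where hopscotch still behaves
well; that is my reading of the constant, not a measured figure.)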

There is still a minor difference in the cls/errorstatus compared to
what the memo mandates, but other than that it should behave very
similarly to BIND9 RRL.

The updated RRL will be in the next release (perhaps a release candidate).

Cheers,
Marek

On 6 March 2013 00:41, Vernon Schryver <vjs at rhyolite.com> wrote:
> I've seen neither the Knot nor the NSD RRL code, but I really wonder
> why neither uses keys and chaining.
>
>
> Consider that the NSD RRL code is said to use a fixed size hash
> table of default 1,000,000 entries in 20 MByte.
>
> If each entry were increased to contain 4 bytes of compressed (hashed)
> qname and four 20-bit hash table indices, then the code could easily
> use hashing with internal chaining.  That would only increase the
> default size of the table to 34 MBytes.
>
> However, by using chaining, you could significantly reduce the number
> of entries, because collisions would not matter except to lookup speed.
> For a target query rate of 100K qps and the familiar hash table load
> factor of 0.9 to get an average lookup rate of 1.1 or about 10%
> collisions, you could use a table of only 110K or 120K entries or less
> than 4 MBytes.  That is only 20% of the current default 20 MBytes and
> it escapes all of the false positive or false negative problems
> of collisions in the current implementations.
>
> Two of the 20-bit indices would be used to double-link entries into a
> hash chain.  The other two 20-bit indices would be used for an LRU
> list of entries.  It would use 20-bit indices instead of pointers to
> save space, especially on 64-bit systems where pointers want 64 bits.
>
> I would be happy to outline a suitable variation of the internal
> chaining hashing I've used in many applications including a commercial
> database decades ago and DCC in this century.  It's simple and only
> about as messy as the BIND9 scheme.  It would differ from the BIND9
> RRL code, which uses external chaining.  The BIND9 RRL code is
> (perhaps prematurely and unnecessarily) optimized to dynamically
> size its hash table instead of using a fixed size as in the NSD RRL
> code.  I think internal chaining would require temporarily almost
> doubling the memory footprint during expansion.  I can't guarantee
> the idea is free of patent trolls, but the first time I saw and
> used the idea was in the late 1970's.
>
>
> Vernon Schryver    vjs at rhyolite.com
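
To make the arithmetic in the quoted proposal concrete: a 4-byte hashed
qname plus four 20-bit indices is 14 extra bytes per entry; on top of
the existing 20 MBytes for 1,000,000 entries that gives the 34 MBytes
mentioned. One possible layout, with field names and packing helpers
that are mine rather than anything from BIND9, NSD or DCC:

#include <stdint.h>

struct rrl_entry {
    uint32_t qname_hash;   /* 4 bytes of compressed (hashed) qname          */
    uint8_t  links[10];    /* four 20-bit indices packed: hash-chain prev/
                              next and LRU prev/next, 80 bits in total      */
    /* ... plus the existing per-entry rate/token state ... */
};

/* Read one of the four packed 20-bit indices (which = 0..3). */
static uint32_t entry_link_get(const struct rrl_entry *e, unsigned which)
{
    unsigned bit = which * 20;
    uint32_t v = 0;
    for (unsigned i = 0; i < 3; i++)        /* a 20-bit field spans <= 3 bytes */
        v |= (uint32_t)e->links[bit / 8 + i] << (8 * i);
    return (v >> (bit % 8)) & 0xfffffu;
}

/* Write one of the four packed 20-bit indices. */
static void entry_link_set(struct rrl_entry *e, unsigned which, uint32_t idx)
{
    unsigned bit = which * 20;
    uint32_t v = 0;
    for (unsigned i = 0; i < 3; i++)
        v |= (uint32_t)e->links[bit / 8 + i] << (8 * i);
    v &= ~(0xfffffu << (bit % 8));          /* clear the old 20-bit value */
    v |= (idx & 0xfffffu) << (bit % 8);
    for (unsigned i = 0; i < 3; i++)
        e->links[bit / 8 + i] = (uint8_t)(v >> (8 * i));
}

In a real table these fields would be folded into the existing entry
record rather than kept as a standalone struct, so the 14 bytes are a
marginal cost and no extra alignment padding is needed.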

