[ratelimits] Why not use chaining?
marek.vavrusa at nic.cz
Tue Mar 19 14:28:20 UTC 2013
So, some news in our implementation, hopefully for the good.
I've been reluctant to use external chaining, so I poked around and
implemented Hopscotch hashing scheme (
It is internally chained and requires just 4B per bucket for hop
bitvector. Each bucket can contain H different keys
(H=sizeof(unsigned)*8, 32 in most platforms).
Unlike external chaining, it can't contain unlimited amount of buckets
but it is proven, that the probability of having more than H keys in a
bucket is 1/(H!) which is acceptable to me.
Even if that happens (load factor close to 1), it is handled as before
- bucket is shared for a while and given some extra tokens. Long story
short, it works quite well up to a load factor of 0.9 and
lookup/insert time is constant.
Also, to make it more BIND9 RRL compliant, we map buckets to tuple
<cls,imputed(qname),netblock> where cls is more/less the same as
errorstatus mandated by the memo. Imputed qname is hashed using our
default hash function (murmurhash3) as it works quite well I didn't
implement any specific hash function.
The default table size is now set to 393241 which is well enough up to
~ 350k different r/s.
There is still a minor difference in cls/errorstatus mandated by the
memo, but other than that,
it should behave very similar to BIND9 RRL.
Updated RRL will be present in the next release (candidate perhaps).
On 6 March 2013 00:41, Vernon Schryver <vjs at rhyolite.com> wrote:
> I've seen neither the Knot nor the NSD RRL code, but I really wonder
> why neither uses keys and chaining.
> Consider that the NSD RRL code is said to use a fixed size hash
> table of default 1,000,000 entries in 20 MByte.
> If each entry were increased to contain 4 bytes of compressed (hashed)
> qname and four 20-bit hash table indeces, then the code could easily
> use hashing with internal chaining. That would only increase the
> default size of the table to 34 MBytes.
> However, by using chaining, you could significantly reduce the number
> of entries, because collisions would not matter except to lookup speed.
> For a target query rate of 100K qps and the familiar hash table load
> factor of 0.9 to get an average lookup rate of 1.1 or about 10%
> collisions, you could use a table of only 110K or 120 entries or less
> than 4 MBytes. That is only 20% of the current default 20 MBytes and
> it escapes all of the the false positive or false negative problems
> of collisions in the current implementations.
> Two of the 20-bit indeces would be used to double link entries into a
> hash chain. The other two 20-bit indeces would be used for an LRU
> list of entries. It would use 20-bit indeces instead of pointers to
> save space, especially on 64-bit systems where pointers want 64 bits.
> I would be happy to outline a suitable variation of the internal
> chaining hashing I've used in many applications including a commercial
> database decades ago and DCC in this century. It's simple and only
> about as messy as the BIND9 scheme. It would differ from the BIND9
> RRL code, which uses external chaining. The BIND9 RRL code is
> (perhaps prematurely and unnecessarily) optimized to dynamically
> size its hash table instead of using a fixed size as in the NSD RRL
> code. I think internal chaining would require temporarily almost
> doubling the memory footprint during expansion. I can't guarantee
> the idea is free of patent trolls, but the first time I saw and
> used the idea was in the late 1970's.
> Vernon Schryver vjs at rhyolite.com
> ratelimits mailing list
> ratelimits at lists.redbarn.org
More information about the ratelimits