[ratelimits] ratelimiting /24 for <tld>

Fri Jul 6 20:53:49 UTC 2012

> From: Tony Finch <dot at dotat.at>

> hits is for recursive service, but if I understand the logs correctly it
> is using the TLD rather than the QNAME as the hash table key. For example,
>
> 05-Jul-2012 17:02:12.352 rate-limit: info: client 127.0.0.1#45990 (google.com): rate limiting /24 for com
> 05-Jul-2012 17:02:13.772 rate-limit: info: client 127.0.0.1#49967 (google.com): rate limiting /24 for com
> 05-Jul-2012 17:02:50.666 rate-limit: info: client 127.0.0.1#61454 (google.com): rate limiting /24 for com
> 05-Jul-2012 17:02:56.856 rate-limit: info: client 127.0.0.1#47649 (google.com): rate limiting /24 for com
> 05-Jul-2012 17:03:45.339 rate-limit: info: client 127.0.0.1#33859 (feeds.feedburner.com): rate limiting /24 for com
> 05-Jul-2012 17:03:55.705 rate-limit: info: client 127.0.0.1#60883 (feeds.feedburner.com): rate limiting /24 for com
> 05-Jul-2012 17:04:28.586 rate-limit: info: client 127.0.0.1#32788 (feeds.feedburner.com): rate limiting /24 for com

> This looks like a bug to me. Should query_find() be passing
> client->query.qname rather than fname to dns_rrl(), perhaps?

It is an intentional characteristic.  I think it is a valuable feature
rather than a bug, because it catches random name attacks.

The key used to find the database entry that counts the response is
not the TLD, but the closest name to the query name that BIND can
find in its authoritative zones and cache.  When the
name server is authoritative, this is exactly what is needed.  If
the key were the QNAME, then there would be no rate limiting on
<random>.example.com given a wildcard *.example.com, because each
random name would get its own counter.
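To make the difference between the two keys concrete, here is a
minimal Python sketch.  It is not BIND's code: closest_known_name(),
KNOWN_NAMES, and the two counters are hypothetical stand-ins for the
lookup in the authoritative zones and for the rate limit table, and
the wildcard handling is simplified.

```python
from collections import Counter

# Hypothetical stand-in for the names present in an authoritative zone.
# "*.example.com" represents a wildcard record.
KNOWN_NAMES = {"example.com", "*.example.com"}


def closest_known_name(qname, known):
    """Return the closest name to qname found in `known`.

    Simplified assumption: walk from the full name toward the root and
    accept either an exact match or a wildcard directly below a parent.
    """
    labels = qname.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in known:
            return candidate
        parent = ".".join(labels[i + 1:])
        if parent and "*." + parent in known:
            return "*." + parent
    return "."  # nothing closer than the root is known


# Three "random" names, as in a random name attack.
queries = ["a1.example.com", "zq9.example.com", "k7.example.com"]

# Keyed by QNAME: every name gets its own counter, so no counter grows.
by_qname = Counter(queries)

# Keyed by the found name: all three land on the same counter.
by_found = Counter(closest_known_name(q, KNOWN_NAMES) for q in queries)
```

With these inputs by_qname holds three counters of 1 each, while
by_found holds a single counter of 3 under "*.example.com", which is
the one a rate limit could act on.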

When the name server is providing recursive service, one can argue
either way.  A big argument against using fname for rate limiting
on a recursive server is that the found name will vary.  After
recursion has finished, the found name will be the QNAME, and so
rate limiting might get a different result.
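The way the found name shifts on a recursive server can be sketched
as follows.  This is only an illustration in Python, not BIND's
lookup: closest_cached_name() and the `cache` set are hypothetical,
and the starting cache contents (only "com") are an assumption chosen
to match the "rate limiting /24 for com" lines in the quoted log.

```python
def closest_cached_name(qname, cache):
    """Return the longest suffix of qname present in `cache` (hypothetical)."""
    labels = qname.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        candidate = ".".join(labels[i:])
        if candidate in cache:
            return candidate
    return "."  # nothing closer than the root is cached


# Assumed state before recursion: only the com delegation is cached.
cache = {"com"}
before = closest_cached_name("google.com", cache)   # "com"

# After recursion finishes, the answer for google.com is in the cache.
cache.add("google.com")
after = closest_cached_name("google.com", cache)    # "google.com"
```

The same query is counted under "com" before recursion and under
"google.com" afterward, so the two responses fall into different
counters.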

A counter-argument is that when rate limiting must be done before
recursing, a recursive server cannot know if the answer will be
records or an error.  A recursive server can never know if records
are generated from a wildcard on the authoritative server.  If the
recursive server counted QNAMEs instead of the names it knows about
and if the authoritative server also uses rate limiting, then a bad
guy could get the recursive server rate limited by the authoritative
server by requesting a lot of <random>.example.com names.
For example, if Google used rate limiting based on requests instead
of responses, a bad guy could get your server blocked at Google by
making a lot of <random>.google.com requests.

This complication is one of the reasons that the official line
is that this mechanism is not suitable for recursive servers.

In this particular case, why is your recursive server rate limiting
requests for google.com?  What application made so many approximately
simultaneous requests for google.com at 17:02:12 that it got rate limited?
Why wasn't google.com in the cache 1.4 and 38 seconds later?
What "window" and "response-rate" are you using?

Vernon Schryver    vjs at rhyolite.com