[ratelimits] CH/TXT/id.server queries rate-limited

Fri Oct 26 02:35:42 UTC 2012

> From: Jay Daley <jay at nzrs.net.nz>

> > It looks like rate-limits kick in for this network, but two or three
> > seconds later the rate-limit is removed. There are many such log lines,
> > showing the limit being removed around two seconds after it is first
> > applied. I thought the limit would be enforced for at least "window"
> > seconds. Have I misunderstood something?
>
> This is a common characteristic of token bucket rate limiting systems
> where the frequency of events being rate-limited is only just above
> the limit and the alignment of those events with the boundaries of the
> token bucket window is not consistent.  For example if 100 queries are
> sent in 10 seconds, with a limit of 10 qps then you often see a pattern
> of queries per bucket similar to 11/9/11/9/11/9/11/9 and so the rate
> limiting goes on/off/on/off/on/off/on/off.

One problem with that diagnosis is that it assumes many DNS client
address blocks are running at 10 qps.  Why that particular rate?
There's no cost to a bad guy in modestly overshooting the rate
limit.  The best tactic is to send plenty of qps to be sure to fill
the rate-limit quota, but not so many that too many of your bots
are found and cured.
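The difference between a sender just over the limit and one that overshoots comfortably can be shown with a toy simulation. It uses the simple per-second bucket model from the quoted explanation, not the BIND9 RRL patch's own accounting. LIMIT and the evenly spaced arrivals are assumptions chosen for illustration.

```python
from fractions import Fraction

LIMIT = 10  # hypothetical limit: responses allowed per one-second window

def window_counts(rate_qps, seconds):
    """Count evenly spaced queries that land in each one-second window."""
    rate = Fraction(rate_qps)      # exact arithmetic avoids float drift
    counts = [0] * seconds
    i = 0
    while True:
        t = i / rate               # arrival time of query i, in seconds
        if t >= seconds:
            break
        counts[int(t)] += 1        # int() floors a non-negative Fraction
        i += 1
    return counts

def limiting_pattern(rate_qps, seconds):
    """Label each window 'on' if it exceeds LIMIT, else 'off'."""
    return ["on" if c > LIMIT else "off"
            for c in window_counts(rate_qps, seconds)]

print(limiting_pattern(10.5, 4))   # just over the limit: flickers
print(limiting_pattern(20, 4))     # comfortably over: stays limited
```

At 10.5 qps the windows alternate between 11 and 10 queries, so limiting switches on and off every second. At 20 qps every window is over the limit. An attacker who overshoots would produce the second pattern, not the first.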

Instead I think two other things are going on.

The "stop limiting" messages should appear only when either
  - 60 seconds have passed since rate limiting stopped,
  - or there is a shortage of rate-limit database entries and fairly
      recent entries must be recycled.
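The two conditions can be written as a predicate. This is a minimal sketch under assumptions: the Entry fields and should_log_stop() are hypothetical names, not the patch's data structures. Only the 60-second delay and the recycling case come from the description.

```python
from dataclasses import dataclass
from typing import Optional

STOP_LOG_DELAY = 60.0  # seconds after limiting stops (from the text)

@dataclass
class Entry:
    """Hypothetical rate-limit table entry; field names are illustrative."""
    limiting: bool                      # currently being rate limited
    limit_stopped_at: Optional[float]   # when limiting last stopped, or None
    logged_start: bool                  # a "limit" message was logged earlier

def should_log_stop(entry: Entry, now: float, recycling: bool) -> bool:
    """Return True if a "stop limiting" message should be logged now."""
    if not entry.logged_start:
        return False   # nothing was announced, so nothing to close out
    if recycling:
        return True    # entry is being reused early: close it out now
    if entry.limiting or entry.limit_stopped_at is None:
        return False   # still limiting, or no stop time recorded yet
    return now - entry.limit_stopped_at >= STOP_LOG_DELAY
```

Under this model, a "stop limiting" message two or three seconds after a start message can only come from the recycling branch.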
Recycling prefers entries older than 1 second that do not need a
"stop limiting" message when they are recycled.  So unless there's
a bug, I think the DNS server is seeing significantly more distinct
IP address blocks per second than the 40000 supported by the
"max-table-size 40000;" clause.

In other words, even if fence-posting is happening, it should not
be visible in the log messages.
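The recycling preference can be sketched as victim selection for a full table. This is a sketch only. The Entry fields, pick_victim(), and the exact tie-breaking (oldest first) are assumptions. The 1-second age threshold and the "no stop message needed" preference come from the description.

```python
from dataclasses import dataclass

@dataclass
class Entry:
    """Hypothetical rate-limit table entry; fields are illustrative."""
    key: str
    last_used: float        # time of the most recent query in this bucket
    needs_stop_log: bool    # recycling it would force a "stop limiting" message

def pick_victim(entries, now, min_age=1.0):
    """Choose an entry to recycle when the table is full.

    Prefer the oldest entry idle for more than min_age seconds that needs no
    "stop limiting" message; otherwise fall back to the oldest entry overall,
    which is when early "stop limiting" messages would appear.
    """
    quiet = [e for e in entries
             if now - e.last_used > min_age and not e.needs_stop_log]
    pool = quiet or entries
    return min(pool, key=lambda e: e.last_used)
```

If a server often falls through to the second branch, the table is too small for the number of distinct address blocks it sees each second.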

Second, more or less legitimate DNS clients naturally stutter when
rate limited.  They pause and generally try to slow down until they
are no longer limited.  Then they speed up and hit the limit again.
You can see this in `repeat XXX dig ....`.
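The stutter happens without any window-boundary effect. This toy model shows it with a hypothetical client that halves its rate when limited and adds a fixed step otherwise. Real resolvers retry on timeouts rather than follow this exact rule. LIMIT, start_rate and step are assumed values.

```python
LIMIT = 10  # hypothetical server limit, responses per second

def stutter(seconds, start_rate=4, step=2):
    """Simulate a client that backs off when limited and speeds up otherwise.

    Returns the per-second sequence of (rate, limited) pairs.
    """
    rate = start_rate
    history = []
    for _ in range(seconds):
        limited = rate > LIMIT
        history.append((rate, limited))
        # Multiplicative back-off when limited, additive speed-up otherwise.
        rate = max(1, rate // 2) if limited else rate + step
    return history

print([limited for _, limited in stutter(10)])
```

The limited flag turns on, off for a few seconds, and on again. In a log this would show as repeated start/stop pairs even if the server's accounting were perfect.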

} From: Jay Daley <jay at nzrs.net.nz>

} When I've previously implemented rate limiting systems I've found
} that miscreants get to learn the limits and adjust accordingly. 

Yes, that is a necessary basic assumption that is too often ignored
in favor of neat ideas.

} For example, if I wished to launch a reflection attack then I would not
} be bothered by a server running a rate limit of 10qps, I would just
} find 1000 such servers and use all of them at once.  With the current
} RRL implementation I could happily run this attack for hours or even
} days.

On the contrary, whether you use 1 or 1,000,000 sources for your DNS
amplified reflection DoS attack, all of your packets will carry forged
source addresses in your intended victim's address block and so will
share a single bucket in the BIND9 RRL patch.
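This sketch shows why the number of attack sources does not matter. If buckets are keyed by the (forged) source address block, every spoofed query aimed at one victim lands in the same bucket. The key here is the address block alone, although a real implementation may add more to its key, such as the query name. The /24 and /56 prefix lengths are assumptions chosen for illustration.

```python
import ipaddress

# Assumed prefix lengths used to group client addresses into blocks.
PREFIX_LEN = {4: 24, 6: 56}

def bucket_block(src):
    """Map a (possibly forged) source address to its address-block bucket."""
    addr = ipaddress.ip_address(src)
    return ipaddress.ip_network(f"{addr}/{PREFIX_LEN[addr.version]}",
                                strict=False)

def buckets_for(sources):
    """Distinct address-block buckets touched by a collection of sources."""
    return {bucket_block(s) for s in sources}

# Spoofed queries from any number of bots all claim victim addresses.
spoofed = ["192.0.2.7", "192.0.2.200", "192.0.2.7"] * 1000
print(buckets_for(spoofed))   # one bucket: the victim's /24
```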

Please remember that the BIND9 RRL patch is not intended to defend
against a DoS attack on a DNS server, although it might in some cases.
Instead, the BIND9 RRL patch is intended to keep a DNS server from
being used to attack a third party in a DNS amplified reflection attack.

Both "amplified" and "reflection" are important aspects of the design
goal.  Unless you set "slip 0;", the BIND9 RRL patch will reflect or
send about 50% as many bits to the victim as the bad guy sends to the
DNS server.  With the default "slip 2;", the patch reduces the "gain"
of a DNS server in an amplified reflection attack to much less than
1.0 and so makes a DNS server uninteresting to bad guys.
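Here is the "gain" arithmetic for responses that are already being rate limited. It assumes that with slip N, one in N such responses goes out as a small truncated (TC=1) answer and the rest are dropped. It also assumes the truncated answer is about the same size as the query. Both are modelling assumptions; reflection_gain() is a hypothetical helper.

```python
def reflection_gain(slip, query_bytes, truncated_bytes):
    """Bits sent toward the victim per bit the attacker sends.

    Counts only rate-limited responses: with slip N, every Nth one is sent
    as a truncated answer and the others are dropped; slip 0 drops them all.
    """
    if slip == 0:
        return 0.0
    return truncated_bytes / (slip * query_bytes)

# With truncated answers about the size of the query, slip 2 gives ~0.5.
print(reflection_gain(2, 64, 64))
```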

It is easy to imagine scenarios in which a reflection attack with
a gain of less than 1.0 might be useful, but there are other protocols
with higher gains than DNS with the BIND9 RRL patch.  For example,
consider TCP/IPv6 SYN to port 80.

} Another benefit of a more general approach like this is that everybody
} does it differently, which makes it far harder for miscreants to
} predict cumulative behaviour and use cumulative behaviour to their
} advantage.

First, it's vital to address the problem at hand, amplified reflection
DoS, instead of other problems, such as DoS against DNS servers.

Second, code diversity is good, but people running big (i.e. multi-box)
DNS servers have made their strong preference clear.  There must be a
useful intersection of the parameters and features of all amplified
reflection DoS rate limiting implementations at a site.

Vernon Schryver    vjs at rhyolite.com