[ratelimits] resource bounding and the question of referral rate limiting

Thu Jan 10 22:46:39 UTC 2013

we've been talking about rate limiting, which is a subclass of the
general topic of resource bounding. requests are smaller than responses,
and require less processing to generate and receive than responses do.
assuming a symmetric connection speed (same uplink as downlink
bandwidth) and symmetric link loading by other non-dns services that use
that link, there is a limit as to how many queries you can usefully
respond to before your responses will congest your outbound path or your
cpu or both. above that limit, more queries can be carried, but uselessly.
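
as a rough illustration of that limit, here is a minimal sketch in python. the
function name and all of the figures are assumptions chosen for the example,
not measurements; it only shows that the useful ceiling is whichever of the
outbound link or the cpu saturates first.

```python
def useful_query_ceiling(uplink_bps, avg_response_bytes, cpu_qps):
    """Estimate the query rate above which extra queries are answered uselessly.

    All three inputs are assumptions supplied by the operator:
      uplink_bps          outbound bandwidth available to dns, in bits/second
      avg_response_bytes  average size of a response on the wire
      cpu_qps             responses per second the cpu can generate
    """
    # responses per second the outbound path can carry before congesting
    link_qps = uplink_bps / (avg_response_bytes * 8)
    # the tighter of the two bounds is the real ceiling
    return min(link_qps, cpu_qps)


# hypothetical figures: 100 Mbit/s uplink, 500-byte responses, cpu good for 50,000 qps
ceiling = useful_query_ceiling(100_000_000, 500, 50_000)
print(ceiling)
```

with these made-up numbers the link, not the cpu, is the bottleneck. queries
arriving faster than that can still reach the server, since they are smaller
than the responses, but the answers cannot all get back out.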

this is due to the connectionless nature of udp. with tcp this does not
happen, because in non-parallel non-attack scenarios the rate at which
queries can be launched is limited by the number of responses that
actually make it back. (this is sometimes called ACK-clocking).

what this means is, it is incumbent upon every requestor to rate limit
their dns requests sent to the same server. you can see an early
prototype of this thinking in late BIND4 (carried into BIND8) by which
the number of parallel requests made for SOA RRs by the zone management
code had to be limited or else the flood of outgoing SOA queries toward
primary name servers would crash upon some link bottleneck and mostly
just die, wasting bandwidth, costing time. i argue that this kind of
request throttling should be used by all dns requestors capable of
sending more than one query to a given server at the same time.
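
the kind of request throttling described above can be sketched as a cap on
outstanding queries per server. this is a hypothetical illustration in python,
not the BIND4/BIND8 code; the class name, the method names and the default cap
of 2 are all assumptions.

```python
from collections import defaultdict


class PerServerThrottle:
    """Cap the number of queries in flight to any one server (hypothetical sketch)."""

    def __init__(self, max_in_flight=2):
        self.max_in_flight = max_in_flight   # assumed cap, tune per deployment
        self.in_flight = defaultdict(int)    # server address -> outstanding queries

    def try_acquire(self, server):
        """Return True if a query may be sent to this server now."""
        if self.in_flight[server] >= self.max_in_flight:
            return False                     # caller should queue the query instead
        self.in_flight[server] += 1
        return True

    def release(self, server):
        """Call when a response arrives or the query times out."""
        if self.in_flight[server] > 0:
            self.in_flight[server] -= 1


throttle = PerServerThrottle(max_in_flight=2)
sent = [throttle.try_acquire("192.0.2.1") for _ in range(3)]
throttle.release("192.0.2.1")                # one response came back
after_release = throttle.try_acquire("192.0.2.1")
other_server = throttle.try_acquire("192.0.2.2")
print(sent, after_release, other_server)
```

because a new query is only released when an earlier one completes or times
out, the sending rate follows the rate at which responses actually make it
back, which is roughly what tcp gives you for free.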

let me repeat for emphasis: the proper place for the rate limiting
burden is the requestor. (all requestors.)

DNS RRL simulates various conditions in which queries were lost or
responses would not fit in a packet, and preserves correct behaviour for
the statistically vast set of legitimate clients whose traffic might be
mixed in with a spoofed-source DDoS. precious few failures can be caused
by the RRL logic. the statistically likely worst case scenario is that a
legitimate client may have to retry with udp, or retry with tcp.
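
to make the "retry with udp, or retry with tcp" outcome concrete, here is a
deliberately simplified responder-side sketch in python. it is not the RRL
implementation: the fixed one-second window, the key, the class name and the
parameter names are assumptions made for illustration. over-limit responses
are either dropped (the client sees a lost packet and retries with udp) or
sent truncated (the client retries with tcp).

```python
class ResponseLimiter:
    """Simplified per-key response limiter (hypothetical, not the real RRL logic)."""

    def __init__(self, responses_per_second=5, slip=2):
        self.rate = responses_per_second  # identical responses allowed per second
        self.slip = slip                  # every slip-th over-limit response is truncated
        self.state = {}                   # key -> [second, responses seen, over-limit seen]

    def decide(self, key, now):
        """Return 'answer', 'truncate' or 'drop' for one would-be response."""
        second = int(now)
        entry = self.state.get(key)
        if entry is None or entry[0] != second:
            entry = [second, 0, 0]        # start a fresh one-second window
            self.state[key] = entry
        entry[1] += 1
        if entry[1] <= self.rate:
            return "answer"               # under the limit: respond normally
        entry[2] += 1
        if self.slip and entry[2] % self.slip == 0:
            return "truncate"             # looks like "response would not fit"
        return "drop"                     # looks like "query was lost"


limiter = ResponseLimiter(responses_per_second=2, slip=2)
key = ("198.51.100.0/24", "example.com")  # assumed key: client network plus response name
same_second = [limiter.decide(key, 0.0) for _ in range(6)]
next_second = limiter.decide(key, 1.0)
print(same_second, next_second)
```

a spoofed-source flood sees most of its reflected traffic disappear, while a
real client behind the same key still gets an answer after a retry.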

a legitimate (non-spoofed) client who fails to carry their burden of
rate limiting, will run into the RRL logic, and will be forced to slow
down. that's reasonable behaviour by any standard, since there won't be
(statistically speaking) failures.

if some set of operators only turns on response rate limiting when they
hear a complaint, or only turns on referral rate limiting when they hear
a complaint, then the permanent floating global resource pool available
to spoofed-source attackers is too large. in other words, such a policy
would be a public health hazard, no matter how well intended, and no
matter how little harm is done by any one operator.

so:

resource use will be bounded, either by hardware limitations, software
limitations, or policy. correctness requires that we do it with policy,
and that policy is a primary burden for dns requestors. if responders
have to use policy to overcome the lack of policy by requestors, then
that policy will be low resolution ("somewhat ham handed"). that's not
the operator's fault, nor RRL's fault.

paul