From each at isc.org Fri Sep 11 20:37:14 2015
From: each at isc.org (Evan Hunt)
Date: Fri, 11 Sep 2015 20:37:14 +0000
Subject: [dnstap] dnstap with auth/recursive servers
Message-ID: <20150911203714.GB87680@isc.org>

Greetings,

I'm working on a BIND implementation of dnstap (targeted for BIND 9.11.0,
early 2016), and have run into a problem. How should I differentiate
between AUTH_{QUERY,RESPONSE} and CLIENT_{QUERY,RESPONSE} when the server
is configured to be both authoritative and recursive?

If a query arrives with RD=1, I can log it as a CQ, but then it might
be answered authoritatively, in which case I might log it as AR, but it
seems strange for the query and response to be unbalanced like that.

I could postpone logging the query until I've determined whether we have
an authoritative answer, but by that time I'd already be sending a
response, and AQ and AR messages would be emitted almost simultaneously.

It seems the best solution would be to log all RD=1 queries as CQ and
their responses as CR, and all RD=0 queries as AQ and their responses as
AR, and to extend the CR message to indicate whether the response was
authoritative.

A suggestion was already made in
http://lists.redbarn.org/pipermail/dnstap/2015-February/000017.html
to extend the CR message to differentiate between cache hits and misses.
I'd like to piggyback on that suggestion, and propose this, to be added
as an optional field in the Message type.

    enum DataSource {
        // all data used to generate this response
        // are from local authoritative sources.
        AUTH_DATA = 1;

        // this response was generated from a
        // cache of previously-sent whole DNS responses.
        MESSAGE_CACHE = 2;

        // this response was generated by consulting
        // a cache of DNS records, but without sending
        // iterative queries
        RECORD_CACHE = 3;

        // at least one iterative query was sent in
        // the construction of this response
        RECURSION = 4;
    };

Thoughts?

--
Evan Hunt -- each at isc.org
Internet Systems Consortium, Inc.

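[Illustrative sketch, not part of the original message.] The RD-bit proposal in the message above can be sketched roughly as follows. The DataSource values mirror the proposed enum; the function names `classify_by_rd` and `log_pair`, and the dictionary layout of a log entry, are hypothetical and are not taken from BIND or from the dnstap schema.

```python
from enum import IntEnum
from typing import Dict, List, Tuple


class DataSource(IntEnum):
    """Mirrors the DataSource enum proposed in the message above."""
    AUTH_DATA = 1
    MESSAGE_CACHE = 2
    RECORD_CACHE = 3
    RECURSION = 4


def classify_by_rd(rd: bool) -> Tuple[str, str]:
    """Pick (query type, response type) from the RD bit alone."""
    if rd:
        return ("CLIENT_QUERY", "CLIENT_RESPONSE")
    return ("AUTH_QUERY", "AUTH_RESPONSE")


def log_pair(rd: bool, source: DataSource) -> List[Dict[str, str]]:
    """Build hypothetical log entries for one query/response exchange.

    Only the CLIENT_RESPONSE carries the data source, as in the proposal.
    """
    query_type, response_type = classify_by_rd(rd)
    response = {"type": response_type}
    if response_type == "CLIENT_RESPONSE":
        response["data_source"] = source.name
    return [{"type": query_type}, response]


# An RD=1 query answered from local zone data stays a balanced CQ/CR pair,
# and the CR records that the answer was authoritative.
print(log_pair(True, DataSource.AUTH_DATA))
```

Under this scheme the query can be logged on receipt, because its type never depends on how the answer is eventually produced.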
From edmonds at mycre.ws Sat Sep 12 01:21:51 2015
From: edmonds at mycre.ws (Robert Edmonds)
Date: Fri, 11 Sep 2015 21:21:51 -0400
Subject: [dnstap] dnstap with auth/recursive servers
In-Reply-To: <20150911203714.GB87680@isc.org>
References: <20150911203714.GB87680@isc.org>
Message-ID: <20150912012151.GA24170@mycre.ws>

Hi, Evan:

This email is a bit long, sorry about that. I try to go into some detail
below about what I was thinking when originally developing the dnstap
schema. Thanks for making me write this down.

Evan Hunt wrote:
> I'm working on a BIND implementation of dnstap (targeted for BIND 9.11.0,
> early 2016), and have run into a problem. How should I differentiate
> between AUTH_{QUERY,RESPONSE} and CLIENT_{QUERY,RESPONSE} when the server
> is configured to be both authoritative and recursive?
>
> If a query arrives with RD=1, I can log it as a CQ, but then it might
> be answered authoritatively, in which case I might log it as AR, but it
> seems strange for the query and response to be unbalanced like that.

This is a good question, and one that hasn't come up before in previous
server implementations of dnstap in Unbound and Knot, since Unbound is
caching/forwarding only, and Knot is authoritative only.

There isn't a really good reason to enforce that the query and its
corresponding response be "paired" in terms of Message.Type values
(other than symmetry, I guess). Adopting Joe's "message_tag" proposal
might make it slightly easier to locate a query/response pair from a
dnstap log.

How "malleable" is the runtime configuration of BIND with regard to
whether authoritative, recursive, or mixed mode service is being
provided? (IIRC, weren't there some rndc "addzone" and "delzone"
commands added at some point?)

Your hypothetical here is a server that's been configured for mixed-mode
service. What about the other two cases, where a server is configured
only for recursive service, or only for authoritative service?
Is there a global variable that indicates whether the server has been
configured for recursive-only vs authoritative-only service? (That is,
is it straightforward for BIND to make good use of the AUTH_QUERY and
CLIENT_QUERY values when it's not running in mixed mode?)

> I could postpone logging the query until I've determined whether we have
> an authoritative answer, but by that time I'd already be sending a
> response, and AQ and AR messages would be emitted almost simultaneously.

Yeah, ideally a DNS server should emit its dnstap log messages as early
as possible (but in the case of responses, *after* the response has been
sent, because logging should take a secondary priority to providing name
service). For instance, the Unbound dnstap implementation generates CQs
before even doing basic sanity and ACL checks on the message, but this
is because we can make the simplification that all inbound queries
processed by Unbound will be marked as CLIENT_QUERYs. But in BIND's
case, you might end up traversing a fair amount of data structures
before being able to determine how the query should be classified,
right? That strikes me as less than optimal, but as long as you can emit
the log message without waiting on cache misses to be filled, it seems
that it would still be desirable to be able to accurately classify the
inbound query.

> It seems the best solution would be to log all RD=1 queries as CQ and
> their responses as CR, and all RD=0 queries as AQ and their responses as
> AR, and to extend the CR message to indicate whether the response was
> authoritative.

Hm, so, I intentionally tried to not define the Message.Type enum's
*_QUERY values based solely on the RD bit in the query message, because
of the corner cases:

(1) A recursive-only server receiving an RD=0 query is processing a
    "cache snooping" request. It might answer from cache without
    performing recursion, or REFUSE it based on policy (e.g. Unbound
    without the "allow_snoop" ACL set), etc.
    A mixed-mode server might also process these queries via the cache
    if it doesn't match an authoritative zone, too. So, it shouldn't be
    classified as an AUTH_QUERY based solely on the RD bit, because it's
    not necessarily being processed as if it were a request for
    authoritative service.

(2) An authoritative-only server receiving an RD=1 query is
    processing... well, I don't think there's a cute name for it, but
    you usually get back a response without the RA bit set that's
    identical to what you would have received if the RD bit were
    cleared. (The most common cause of this is probably people running
    something like "dig @ ..." without setting +norec, because after
    all, it still works even if you don't set +norec, right?) So, it
    shouldn't be classified as a CLIENT_QUERY based solely on the RD
    bit, because it's not being processed as if it were a
    recursion-desired query.

I originally thought of each Message.Type value as representing a unique
code site inside the nameserver implementation, and corresponding to
separate dnstap logging config knobs that could be independently enabled
or disabled. (So, for instance, suppose you were interested in passive
DNS replication. You could enable logging RESOLVER_RESPONSE's but leave
RESOLVER_QUERY's disabled, since the query is largely redundant for that
use case, anyway.)

This concept of Message.Type values corresponding to specific code sites
broke down a bit when I actually implemented dnstap in Unbound and found
that the same code paths were used for both RESOLVER_* and FORWARDER_*,
and there wasn't a good way to distinguish between the two cases, other
than by actually inspecting the RD bit [0,1] of the query that Unbound
was sending out. I think it's OK to make this classification (compared
to the corner cases above) because the specific RD bit being inspected
here is always under the control of the server and it's correct 100% of
the time; there aren't any corner cases, AFAIK.

[0] https://github.com/jedisct1/unbound/blob/cbe0bdb67691fb8bfa9fa869e1da61389479c150/dnstap/dnstap.c#L420-L429
[1] https://github.com/jedisct1/unbound/blob/cbe0bdb67691fb8bfa9fa869e1da61389479c150/dnstap/dnstap.c#L471-L480

I think we should try to accurately classify the response messages (AR
vs CR) according to how they're actually processed in the server, and
not based on what the query header bits look like. So I think I'm
leaning towards recommending postponing AQ/CQ logging until you know A
vs C, or possibly introducing an indeterminate "QUERY" type that just
represents a generic query received by a responder.

> A suggestion was already made in
> http://lists.redbarn.org/pipermail/dnstap/2015-February/000017.html
> to extend the CR message to differentiate between cache hits and misses.
> I'd like to piggyback on that suggestion, and propose this, to be added
> as an optional field in the Message type.
>
>     enum DataSource {
>         // all data used to generate this response
>         // are from local authoritative sources.
>         AUTH_DATA = 1;
>
>         // this response was generated from a
>         // cache of previously-sent whole DNS responses.
>         MESSAGE_CACHE = 2;
>
>         // this response was generated by consulting
>         // a cache of DNS records, but without sending
>         // iterative queries
>         RECORD_CACHE = 3;
>
>         // at least one iterative query was sent in
>         // the construction of this response
>         RECURSION = 4;
>     };
>
> Thoughts?

Is that a replacement for the original CacheStatus enum in the message
you reference? Where did the "cache miss" value go?

--
Robert Edmonds

From each at isc.org Sat Sep 12 06:55:48 2015
From: each at isc.org (Evan Hunt)
Date: Sat, 12 Sep 2015 06:55:48 +0000
Subject: [dnstap] dnstap with auth/recursive servers
In-Reply-To: <20150912012151.GA24170@mycre.ws>
References: <20150911203714.GB87680@isc.org> <20150912012151.GA24170@mycre.ws>
Message-ID: <20150912065548.GA91292@isc.org>

On Fri, Sep 11, 2015 at 09:21:51PM -0400, Robert Edmonds wrote:
> There isn't a really good reason to enforce that the query and its
> corresponding response be "paired" in terms of Message.Type values
> (other than symmetry, I guess). Adopting Joe's "message_tag" proposal
> might make it slightly easier to locate a query/response pair from a
> dnstap log.

Fair point, but if I've configured dnstap to log AUTH traffic but not
CLIENT, or vice versa, then we might end up logging some AUTH responses
and leaving out the corresponding queries, which seems nonoptimal.

Hmmm. On the other hand, I *could* make this problem go away by not
allowing such a configuration. If logging CLIENT automatically includes
AUTH, I doubt very many people would be disappointed. (I suspect the
same would be true if I had RESOLVER automatically include FORWARDER,
for that matter...)

> How "malleable" is the runtime configuration of BIND with regard to
> whether authoritative, recursive, or mixed mode service is being
> provided? (IIRC, weren't there some rndc "addzone" and "delzone"
> commands added at some point?)

Yes. And even without addzone/delzone, a server could be reconfigured
to add zones any time.

> Your hypothetical here is a server that's been configured for mixed-mode
> service. What about the other two cases, where a server is configured
> only for recursive service, or only for authoritative service? Is there
> a global variable that indicates whether the server has been configured
> for recursive-only vs authoritative-only service?
> (That is, is it straightforward for BIND to make good use of the
> AUTH_QUERY and CLIENT_QUERY values when it's not running in mixed mode?)

Basically, you can't trust a recursive server not to have any
authoritative data -- if nothing else, most of them have built-in empty
zones for RFC1918 reverse lookups. However, you can configure a server
to refuse recursion and cache lookups, in which case all traffic (except
as permitted by the relevant ACLs) would be authoritative, and
AUTH_QUERY would always be appropriate.

> I think we should try to accurately classify the response messages (AR
> vs CR) according to how they're actually processed in the server, and
> not based on what the query header bits look like. So I think I'm
> leaning towards recommending postponing AQ/CQ logging until you know A
> vs C, or possibly introducing an indeterminate "QUERY" type that just
> represents a generic query received by a responder.

Hm, that's promising. There's no difference in content (other than
message type) between AUTH_QUERY and CLIENT_QUERY anyway, is there?

Another thought, though, is that the AUTH_RESPONSE could have the same
DataSource enum I suggested for CLIENT_RESPONSE, and then we'd be able
to see if an RD=0 query was answered via cache snooping.

> Is that a replacement for the original CacheStatus enum in the message
> you reference? Where did the "cache miss" value go?

I renamed it "recursion" because "cache miss" isn't a data source.

--
Evan Hunt -- each at isc.org
Internet Systems Consortium, Inc.
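
[Illustrative sketch, not part of the original thread.] The two ideas discussed at the end of the thread might be combined roughly as follows: the query is logged on receipt under a generic "QUERY" type, and the response is classified by how it was actually produced, with a DataSource attached to both response types. The generic "QUERY" type is only a suggestion in the thread; the function names, the rule that maps AUTH_DATA to AUTH_RESPONSE, and the dictionary layout are assumptions made for illustration, not BIND or dnstap behaviour.

```python
from enum import IntEnum
from typing import Dict, List


class DataSource(IntEnum):
    """Mirrors the DataSource enum proposed earlier in the thread."""
    AUTH_DATA = 1
    MESSAGE_CACHE = 2
    RECORD_CACHE = 3
    RECURSION = 4


def response_type(source: DataSource) -> str:
    """Classify a response by how it was produced, ignoring the RD bit.

    Assumption: only answers built entirely from local authoritative
    data count as AUTH_RESPONSE; everything else is CLIENT_RESPONSE.
    """
    if source is DataSource.AUTH_DATA:
        return "AUTH_RESPONSE"
    return "CLIENT_RESPONSE"


def log_exchange(rd: bool, source: DataSource) -> List[Dict[str, object]]:
    """Build hypothetical log entries for one query/response exchange."""
    return [
        # Generic type, so the query can be logged before classification.
        {"type": "QUERY", "rd": rd},
        {"type": response_type(source), "rd": rd, "data_source": source.name},
    ]


def is_cache_snoop(entry: Dict[str, object]) -> bool:
    """True for a response to an RD=0 query that was served from a cache."""
    from_cache = entry.get("data_source") in ("MESSAGE_CACHE", "RECORD_CACHE")
    return (not entry["rd"]) and from_cache


# An RD=1 query answered from local zone data is logged as AUTH_RESPONSE.
print(log_exchange(True, DataSource.AUTH_DATA)[1]["type"])
```

With the data source recorded on every response, a log consumer could identify cache snooping after the fact by combining it with the RD bit, instead of relying on the message type alone.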