From each at isc.org  Fri Sep 11 20:37:14 2015
From: each at isc.org (Evan Hunt)
Date: Fri, 11 Sep 2015 20:37:14 +0000
Subject: [dnstap] dnstap with auth/recursive servers
Message-ID: <20150911203714.GB87680@isc.org>

Greetings,

I'm working on a BIND implementation of dnstap (targeted for BIND 9.11.0,
early 2016), and have run into a problem.  How should I differentiate
between AUTH_{QUERY,RESPONSE} and CLIENT_{QUERY,RESPONSE} when the server
is configured to be both authoritative and recursive?

If a query arrives with RD=1, I can log it as a CQ, but then it might
be answered authoritatively, in which case I might log it as AR, but it
seems strange for the query and response to be unbalanced like that.

I could postpone logging the query until I've determined whether we have
an authoritative answer, but by that time I'd already be sending a
response, and AQ and AR messages would be emitted almost simultaneously.

It seems the best solution would be to log all RD=1 queries as CQ and
their responses as CR, and all RD=0 queries as AQ and their responses as
AR, and to extend the CR message to indicate whether the response was
authoritative.
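For illustration only, here is a minimal Python sketch of that RD-based split. The helper name and the use of plain strings for the dnstap Message.Type names are hypothetical choices, not part of any dnstap library:

```python
from typing import Tuple


def classify_by_rd(rd_bit: bool) -> Tuple[str, str]:
    """Return the (query type, response type) pair for an inbound query
    based only on its RD bit, per the proposal above (hypothetical helper)."""
    if rd_bit:
        # RD=1: CLIENT_QUERY / CLIENT_RESPONSE.  An authoritative answer
        # would be flagged by a new field on the CLIENT_RESPONSE message.
        return ("CLIENT_QUERY", "CLIENT_RESPONSE")
    # RD=0: AUTH_QUERY / AUTH_RESPONSE.
    return ("AUTH_QUERY", "AUTH_RESPONSE")
```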

A suggestion was already made in
http://lists.redbarn.org/pipermail/dnstap/2015-February/000017.html
to extend the CR message to differentiate between cache hits and misses.
I'd like to piggyback on that suggestion, and propose this, to be added
as an optional field in the Message type.

        enum DataSource {
                // all data used to generate this response
                // are from local authoritative sources.
                AUTH_DATA = 1;

                // this response was generated from a 
                // cache of previously-sent whole DNS responses.
                MESSAGE_CACHE = 2;

                // this response was generated by consulting
                // a cache of DNS records, but without sending
                // iterative queries
                RECORD_CACHE = 3;

                // at least one iterative query was sent in
                // the construction of this response
                RECURSION = 4;
        };
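Here is a rough sketch of how a server might pick one of these values for each response. It assumes the server records three hypothetical flags while building the answer; the flag names are illustrative, not BIND internals. The enum values are copied from the proposal above:

```python
from enum import IntEnum


class DataSource(IntEnum):
    # Mirrors the proposed protobuf enum above.
    AUTH_DATA = 1
    MESSAGE_CACHE = 2
    RECORD_CACHE = 3
    RECURSION = 4


def pick_data_source(sent_iterative_query: bool,
                     used_message_cache: bool,
                     used_record_cache: bool) -> DataSource:
    """Hypothetical helper: choose the DataSource for one response."""
    if sent_iterative_query:
        # Any outbound iterative query means RECURSION, regardless of
        # what else contributed to the answer.
        return DataSource.RECURSION
    if used_message_cache:
        # The whole response came from a cached DNS message.
        return DataSource.MESSAGE_CACHE
    if used_record_cache:
        # At least some records came from the record cache.
        return DataSource.RECORD_CACHE
    # No cache and no recursion: everything came from local zones.
    return DataSource.AUTH_DATA
```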

Thoughts?

--
Evan Hunt -- each at isc.org
Internet Systems Consortium, Inc.

From edmonds at mycre.ws  Sat Sep 12 01:21:51 2015
From: edmonds at mycre.ws (Robert Edmonds)
Date: Fri, 11 Sep 2015 21:21:51 -0400
Subject: [dnstap] dnstap with auth/recursive servers
In-Reply-To: <20150911203714.GB87680@isc.org>
References: <20150911203714.GB87680@isc.org>
Message-ID: <20150912012151.GA24170@mycre.ws>

Hi, Evan:

This email is a bit long, sorry about that.  I try to go into some
detail below about what I was thinking when originally developing the
dnstap schema.  Thanks for making me write this down.

Evan Hunt wrote:
> I'm working on a BIND implementation of dnstap (targeted for BIND 9.11.0,
> early 2016), and have run into a problem.  How should I differentiate
> between AUTH_{QUERY,RESPONSE} and CLIENT_{QUERY,RESPONSE} when the server
> is configured to be both authoritative and recursive?
> 
> If a query arrives with RD=1, I can log it as a CQ, but then it might
> be answered authoritatively, in which case I might log it as AR, but it
> seems strange for the query and response to be unbalanced like that.

This is a good question, and one that hasn't come up in previous
server implementations of dnstap in Unbound and Knot, since Unbound is
caching/forwarding only, and Knot is authoritative only.

There isn't a really good reason to enforce that the query and its
corresponding response be "paired" in terms of Message.Type values
(other than symmetry, I guess).  Adopting Joe's "message_tag" proposal
might make it slightly easier to locate a query/response pair from a
dnstap log.

How "malleable" is the runtime configuration of BIND with regard to
whether authoritative, recursive, or mixed mode service is being
provided?  (IIRC, weren't there some rndc "addzone" and "delzone"
commands added at some point?)

Your hypothetical here is a server that's been configured for mixed-mode
service.  What about the other two cases, where a server is configured
only for recursive service, or only for authoritative service?  Is there
a global variable that indicates whether the server has been configured
for recursive-only vs authoritative-only service?  (That is, is it
straightforward for BIND to make good use of the AUTH_QUERY and
CLIENT_QUERY values when it's not running in mixed mode?)

> I could postpone logging the query until I've determined whether we have
> an authoritative answer, but by that time I'd already be sending a
> response, and AQ and AR messages would be emitted almost simultaneously.

Yeah, ideally a DNS server should emit its dnstap log messages as early
as possible (but in the case of responses, *after* the response has been
sent, because logging should take a secondary priority to providing name
service).  For instance, the Unbound dnstap implementation generates CQs
before even doing basic sanity and ACL checks on the message, but this
is because we can make the simplification that all inbound queries
processed by Unbound will be marked as CLIENT_QUERYs.
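The following is a minimal sketch of that ordering. Hypothetical answer/send/log callbacks stand in for the real server plumbing; this is not Unbound's code:

```python
from typing import Callable, List, Tuple


def handle_query(wire_query: bytes,
                 answer: Callable[[bytes], bytes],
                 send: Callable[[bytes], None],
                 log: Callable[[str, bytes], None]) -> None:
    """Hypothetical handler showing only the logging order described above."""
    # Log the query on arrival, before sanity or ACL checks.
    log("CLIENT_QUERY", wire_query)
    # Checks, lookup, and any recursion happen here.
    wire_response = answer(wire_query)
    # Providing name service comes first...
    send(wire_response)
    # ...and the response is logged only after it has been sent.
    log("CLIENT_RESPONSE", wire_response)


# Usage: record the order of events with stub callbacks.
events: List[Tuple[str, bytes]] = []
handle_query(b"query",
             answer=lambda q: b"response",
             send=lambda r: events.append(("sent", r)),
             log=lambda t, m: events.append((t, m)))
```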

But in BIND's case, you might end up traversing a fair number of data
structures before being able to determine how the query should be
classified, right?  That strikes me as less than optimal, but as long
as you can emit the log message without waiting on cache misses to be
filled, it still seems desirable to be able to accurately classify the
inbound query.

> It seems the best solution would be to log all RD=1 queries as CQ and
> their responses as CR, and all RD=0 queries as AQ and their responses as
> AR, and to extend the CR message to indicate whether the response was
> authoritative.

Hm, so, I intentionally tried not to define the Message.Type enum's
*_QUERY values based solely on the RD bit in the query message, because
of the corner cases:

(1) A recursive-only server receiving an RD=0 query is processing a
"cache snooping" request.  It might answer from cache without performing
recursion, or REFUSE it based on policy (e.g. Unbound without the
"allow_snoop" ACL set), etc.  A mixed-mode server might also process
these queries via the cache if they don't match an authoritative zone.
So, such a query shouldn't be classified as an AUTH_QUERY based solely on
the RD bit, because it's not necessarily being processed as if it were a
request for authoritative service.

(2) An authoritative-only server receiving an RD=1 query is
processing...  well, I don't think there's a cute name for it, but you
usually get back a response without the RA bit set that's identical to
what you would have received if the RD bit were cleared.  (The most
common cause of this is probably people running something like "dig
@<AUTH-SERVER> ..." without setting +norec, because after all, it still
works even if you don't set +norec, right?)  So, it shouldn't be
classified as a CLIENT_QUERY based solely on the RD bit, because it's
not being processed as if it were a recursion-desired query.
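To make the two corner cases concrete, here is a hypothetical classifier that looks at how the server will process the query rather than at the RD bit. matches_auth_zone and serves_from_cache are assumed inputs, not real BIND or Unbound state:

```python
def classify_query(rd_bit: bool, matches_auth_zone: bool,
                   serves_from_cache: bool) -> str:
    """Hypothetical: pick AUTH_QUERY or CLIENT_QUERY from the processing
    path.  rd_bit is accepted but deliberately ignored."""
    _ = rd_bit  # intentionally unused; see corner cases (1) and (2)
    if matches_auth_zone:
        # Answered from local zone data, whatever the client asked for.
        return "AUTH_QUERY"
    if serves_from_cache:
        # Answered (or snooped) via the cache/resolver path.
        return "CLIENT_QUERY"
    # Authoritative-only server with no matching zone (e.g. REFUSED).
    return "AUTH_QUERY"
```

Case (1) corresponds to classify_query(rd_bit=False, matches_auth_zone=False, serves_from_cache=True), which yields CLIENT_QUERY. Case (2) corresponds to rd_bit=True on an authoritative-only server with a matching zone, which yields AUTH_QUERY.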

I originally thought of each Message.Type value as representing a unique
code site inside the nameserver implementation, and corresponding to
separate dnstap logging config knobs that could be independently enabled
or disabled.  (So, for instance, suppose you were interested in passive
DNS replication.  You could enable logging RESOLVER_RESPONSEs but leave
RESOLVER_QUERYs disabled, since the query is largely redundant for that
use case, anyway.)
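Here is a small sketch of that idea, with one independent switch per Message.Type value. The dictionary and helper are hypothetical and do not reflect any server's actual configuration syntax:

```python
from typing import Dict

# Hypothetical per-type switches, configured for passive DNS replication:
# keep the resolver's upstream responses, skip its (redundant) queries.
LOG_ENABLED: Dict[str, bool] = {
    "RESOLVER_QUERY": False,
    "RESOLVER_RESPONSE": True,
    "CLIENT_QUERY": False,
    "CLIENT_RESPONSE": False,
}


def should_log(msg_type: str, enabled: Dict[str, bool] = LOG_ENABLED) -> bool:
    """Each code site checks only its own switch; unknown types are off."""
    return enabled.get(msg_type, False)
```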

This concept of Message.Type values corresponding to specific code sites
broke down a bit when I actually implemented dnstap in Unbound and found
that the same code paths were used for both RESOLVER_* and FORWARDER_*,
and there wasn't a good way to distinguish between the two cases, other
than by actually inspecting the RD bit [0,1] of the query that Unbound
was sending out.  I think it's OK to make this classification (compared
to the corner cases above) because the specific RD bit being inspected
here is always under the control of the server and it's correct 100% of
the time; there aren't any corner cases, AFAIK.

[0] https://github.com/jedisct1/unbound/blob/cbe0bdb67691fb8bfa9fa869e1da61389479c150/dnstap/dnstap.c#L420-L429

[1] https://github.com/jedisct1/unbound/blob/cbe0bdb67691fb8bfa9fa869e1da61389479c150/dnstap/dnstap.c#L471-L480
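The following is roughly what that check amounts to, sketched in Python rather than C. It is an illustration, not a port of the linked Unbound code. It reads the RD bit (0x0100 in the 16-bit header flags word, per RFC 1035) of an outbound query that the server itself built:

```python
import struct

# RD ("recursion desired") bit in the 16-bit DNS header flags word (RFC 1035).
RD_FLAG = 0x0100
DNS_HEADER_LEN = 12


def outbound_query_type(wire_query: bytes) -> str:
    """Classify a query this server is sending upstream.

    The server set the RD bit itself, so RD=1 reliably means it is
    forwarding (FORWARDER_QUERY) and RD=0 means it is iterating
    (RESOLVER_QUERY)."""
    if len(wire_query) < DNS_HEADER_LEN:
        raise ValueError("truncated DNS header")
    # The header starts with the ID (16 bits), followed by the flags word.
    _msg_id, flags = struct.unpack("!HH", wire_query[:4])
    return "FORWARDER_QUERY" if flags & RD_FLAG else "RESOLVER_QUERY"
```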

I think we should try to accurately classify the response messages (AR
vs CR) according to how they're actually processed in the server, and
not based on what the query header bits look like.  So I think I'm
leaning towards recommending postponing AQ/CQ logging until you know A
vs C, or possibly introducing an indeterminate "QUERY" type that just
represents a generic query received by a responder.
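Here is a minimal sketch combining both options: the query is held until the server knows A vs C, and it falls back to a generic "QUERY" type if it is dropped before classification. The class, its methods, and the "QUERY" value are hypothetical; "QUERY" is not an existing dnstap Message.Type:

```python
from typing import Callable, List, Optional, Tuple


class DeferredQueryLog:
    """Hypothetical helper: hold an inbound query until the server knows
    whether it is answering authoritatively, then emit AUTH_QUERY or
    CLIENT_QUERY exactly once.  A query abandoned before classification
    is emitted as the indeterminate "QUERY" type."""

    def __init__(self, wire_query: bytes,
                 log: Callable[[str, bytes], None]) -> None:
        self.wire_query = wire_query
        self.log = log
        self.emitted: Optional[str] = None

    def _emit(self, msg_type: str) -> None:
        if self.emitted is None:  # never log the same query twice
            self.emitted = msg_type
            self.log(msg_type, self.wire_query)

    def classify(self, authoritative: bool) -> None:
        self._emit("AUTH_QUERY" if authoritative else "CLIENT_QUERY")

    def abandon(self) -> None:
        self._emit("QUERY")


# Usage with a list standing in for the dnstap output stream.
logged: List[Tuple[str, bytes]] = []
q1 = DeferredQueryLog(b"q1", lambda t, m: logged.append((t, m)))
q1.classify(authoritative=True)
q1.classify(authoritative=False)  # ignored: already emitted
q2 = DeferredQueryLog(b"q2", lambda t, m: logged.append((t, m)))
q2.abandon()
```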

> A suggestion was already made in
> http://lists.redbarn.org/pipermail/dnstap/2015-February/000017.html
> to extend the CR message to differentiate between cache hits and misses.
> I'd like to piggyback on that suggestion, and propose this, to be added
> as an optional field in the Message type.
> 
>         enum DataSource {
>                 // all data used to generate this response
>                 // are from local authoritative sources.
>                 AUTH_DATA = 1;
> 
>                 // this response was generated from a 
>                 // cache of previously-sent whole DNS responses.
>                 MESSAGE_CACHE = 2;
> 
>                 // this response was generated by consulting
>                 // a cache of DNS records, but without sending
>                 // iterative queries
>                 RECORD_CACHE = 3;
> 
>                 // at least one iterative query was sent in
>                 // the construction of this response
>                 RECURSION = 4;
>         };
> 
> Thoughts?

Is that a replacement for the original CacheStatus enum in the message
you reference?  Where did the "cache miss" value go?

-- 
Robert Edmonds

From each at isc.org  Sat Sep 12 06:55:48 2015
From: each at isc.org (Evan Hunt)
Date: Sat, 12 Sep 2015 06:55:48 +0000
Subject: [dnstap] dnstap with auth/recursive servers
In-Reply-To: <20150912012151.GA24170@mycre.ws>
References: <20150911203714.GB87680@isc.org> <20150912012151.GA24170@mycre.ws>
Message-ID: <20150912065548.GA91292@isc.org>

On Fri, Sep 11, 2015 at 09:21:51PM -0400, Robert Edmonds wrote:
> There isn't a really good reason to enforce that the query and its
> corresponding response be "paired" in terms of Message.Type values
> (other than symmetry, I guess).  Adopting Joe's "message_tag" proposal
> might make it slightly easier to locate a query/response pair from a
> dnstap log.

Fair point, but if I've configured dnstap to log AUTH traffic but not
CLIENT, or vice versa, then we might end up logging some AUTH responses
and leaving out the corresponding queries, which seems nonoptimal.

Hmmm.  On the other hand, I *could* make this problem go away by not
allowing such a configuration.  If logging CLIENT automatically
includes AUTH, I doubt very many people would be disappointed.
(I suspect the same would be true if I had RESOLVER automatically
include FORWARDER, for that matter...)
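Here is a minimal sketch of such an implication rule. The category names are hypothetical and mirror the Message.Type prefixes; they are not actual BIND configuration options:

```python
from typing import Dict, FrozenSet, Set

# Hypothetical implication rules: enabling the left-hand category
# silently enables the right-hand ones as well.
IMPLIES: Dict[str, FrozenSet[str]] = {
    "CLIENT": frozenset({"AUTH"}),
    "RESOLVER": frozenset({"FORWARDER"}),
}


def expand_dnstap_config(requested: Set[str]) -> Set[str]:
    """Return the categories actually logged for a requested set."""
    expanded = set(requested)
    for category in requested:
        expanded |= IMPLIES.get(category, frozenset())
    return expanded
```

With this rule, a configuration that logs CLIENT traffic but not AUTH traffic can no longer be expressed.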

> How "malleable" is the runtime configuration of BIND with regard to
> whether authoritative, recursive, or mixed mode service is being
> provided?  (IIRC, weren't there some rndc "addzone" and "delzone"
> commands added at some point?)

Yes. And even without addzone/delzone, a server could be reconfigured
to add zones at any time.

> Your hypothetical here is a server that's been configured for mixed-mode
> service.  What about the other two cases, where a server is configured
> only for recursive service, or only for authoritative service?  Is there
> a global variable that indicates whether the server has been configured
> for recursive-only vs authoritative-only service?  (That is, is it
> straightforward for BIND to make good use of the AUTH_QUERY and
> CLIENT_QUERY values when it's not running in mixed mode?)

Basically, you can't trust a recursive server not to have any authoritative
data -- if nothing else, most of them have built-in empty zones for RFC1918
reverse lookups.

However, you can configure a server to refuse recursion and cache lookups,
in which case all traffic (except as permitted by the relevant ACLs) would
be authoritative, and AUTH_QUERY would always be appropriate.

> I think we should try to accurately classify the response messages (AR
> vs CR) according to how they're actually processed in the server, and
> not based on what the query header bits look like.  So I think I'm
> leaning towards recommending postponing AQ/CQ logging until you know A
> vs C, or possibly introducing an indeterminate "QUERY" type that just
> represents a generic query received by a responder.

Hm, that's promising. There's no difference in content (other than
message type) between AUTH_QUERY and CLIENT_QUERY anyway, is there?

Another thought, though, is that the AUTH_RESPONSE could have the
same DataSource enum I suggested for CLIENT_RESPONSE, and then we'd
be able to see if an RD=0 query was answered via cache snooping.
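Here is a sketch of how a log consumer could then spot answered snooping requests. It assumes the RD-based classification proposed earlier in this thread, so every AUTH_RESPONSE answers an RD=0 query. It also assumes a hypothetical decoded record type; neither is an existing dnstap API:

```python
from enum import IntEnum
from typing import Iterable, List, NamedTuple


class DataSource(IntEnum):
    # Values copied from the DataSource enum proposed earlier in the thread.
    AUTH_DATA = 1
    MESSAGE_CACHE = 2
    RECORD_CACHE = 3
    RECURSION = 4


class LoggedResponse(NamedTuple):
    # Hypothetical decoded log record; not a real dnstap library type.
    msg_type: str
    data_source: DataSource


CACHE_SOURCES = {DataSource.MESSAGE_CACHE, DataSource.RECORD_CACHE}


def cache_snooping_answers(
        records: Iterable[LoggedResponse]) -> List[LoggedResponse]:
    """AUTH_RESPONSEs (RD=0 queries) that were answered from a cache."""
    return [r for r in records
            if r.msg_type == "AUTH_RESPONSE" and r.data_source in CACHE_SOURCES]


sample = [
    LoggedResponse("AUTH_RESPONSE", DataSource.AUTH_DATA),
    LoggedResponse("AUTH_RESPONSE", DataSource.RECORD_CACHE),
    LoggedResponse("CLIENT_RESPONSE", DataSource.MESSAGE_CACHE),
]
snooped = cache_snooping_answers(sample)
```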

> Is that a replacement for the original CacheStatus enum in the message
> you reference?  Where did the "cache miss" value go?

I renamed it "recursion" because "cache miss" isn't a data source.

-- 
Evan Hunt -- each at isc.org
Internet Systems Consortium, Inc.