From dot at dotat.at Fri Apr 13 17:59:33 2018
From: dot at dotat.at (Tony Finch)
Date: Fri, 13 Apr 2018 18:59:33 +0100
Subject: [dnstap] dnstap fanout and replay
Message-ID:

I have a short wishlist of dnstap-related tools. I haven't managed to find
out if anything like this already exists - if it does exist I'll be
grateful for any pointers!

We have a couple of kinds of people who have expressed interest in getting
dnstap feeds from our campus resolvers.

* There are people on site doing threat intelligence research, who would
  like a full feed of client queries and responses.

* And there are third parties who would like a passive DNS feed of
  outgoing resolver queries, and who aren't allowed a full-fat feed for
  privacy reasons.

The dnstap implementation in BIND only supports one output stream, so if
we are going to satisfy these consumers, we would need to split the dnstap
feed downstream of BIND before feeding the distributaries onwards.

More recently it occurred to me that it might be useful to generate
queries from a dnstap feed. I have a couple of scenarios:

* Replay client queries against a test server, to verify that it behaves
  OK with real-ish traffic. I have a tool for replaying cache dump files,
  but these are nothing like real user traffic since they don't include
  repeated queries etc.

* Replay resolver queries from a live server against a standby server.
  These queries are effectively the cache misses, so they are less costly
  to replicate than all the client traffic. This keeps the standby cache
  hot whereas at the moment my standby servers have cold caches.

It might also be worth duplicating this traffic from one live server to
the other one, in the hope that this increases the cache hit rate, since
hit rate increases the more users a cache has. (Some experimentation
needed!)

I'm not really interested in the responses to these queries so it's OK if
the replay just drops the answers.
(Though when replaying a CQ feed it might be useful to compare the
responses to the CR feed.)

If anything like this does not exist, I might write it myself.

I have not used protobufs before so I'm keen to hear advice from those who
have already got their hands dirty / fingers burned.

I'm tempted to weld libfstrm to Lua, so you can configure filtering,
replication, and output with a bit of Lua. The number of Lua protobuf
implementations is a bit of a worry - if anyone has a recommendation I'd
like to short-cut the experimental stage. (I should ask this on the Lua
list I guess!)

Alternatively it might be easier to hack around with the golang-dnstap
code, tho then I would have to think harder about how to configure it...

Tony.
-- 
f.anthony.n.finch http://dotat.at/
justice and liberty cannot be confined by national boundaries

From edmonds at mycre.ws Fri Apr 13 21:27:24 2018
From: edmonds at mycre.ws (Robert Edmonds)
Date: Fri, 13 Apr 2018 17:27:24 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To:
References:
Message-ID: <20180413212724.difwgfbnl4dzarkh@mycre.ws>

Tony Finch wrote:
> We have a couple of kinds of people who have expressed interest in getting
> dnstap feeds from our campus resolvers.
>
> * There are people on site doing threat intelligence research, who would
> like a full feed of client queries and responses.
>
> * And there are third parties who would like a passive DNS feed of
> outgoing resolver queries, and who aren't allowed a full-fat feed for
> privacy reasons.

Note that most folks doing passive DNS replication would really need the
RESOLVER_RESPONSE messages, not just the queries.

> The dnstap implementation in BIND only supports one output stream, so if
> we are going to satisfy these consumers, we would need to split the dnstap
> feed downstream of BIND before feeding the distributaries onwards.

This makes sense. If a component has to burn CPU on making copies it
should probably be downstream of the DNS server process.
The component doesn't even need to be on the same machine as the DNS
server if you're using the 'next' branch of fstrm, which has TCP support,
though BIND would probably need to be updated to allow configuring an
fstrm TCP writer.

> More recently it occurred to me that it might be useful to generate
> queries from a dnstap feed. I have a couple of scenarios:
>
> * Replay client queries against a test server, to verify that it behaves
> OK with real-ish traffic. I have a tool for replaying cache dump files,
> but these are nothing like real user traffic since they don't include
> repeated queries etc.
>
> * Replay resolver queries from a live server against a standby server.
> These queries are effectively the cache misses, so they are less costly
> to replicate than all the client traffic. This keeps the standby cache
> hot whereas at the moment my standby servers have cold caches.
>
> It might also be worth duplicating this traffic from one live server to
> the other one, in the hope that this increases the cache hit rate, since
> hit rate increases the more users a cache has. (Some experimentation
> needed!)

This is definitely one of the use cases I had in mind as something that
dnstap could support, with the right glue utilities, but nowadays I wonder
if the "keeping a standby cache hot" use case wouldn't best be served by
existing functionality in dnsdist, if you're using dnsdist to front your
recursive servers?

> I'm not really interested in the responses to these queries so it's OK if
> the replay just drops the answers. (Though when replaying a CQ feed it
> might be useful to compare the responses to the CR feed.)
>
> If anything like this does not exist, I might write it myself.
>
> I have not used protobufs before so I'm keen to hear advice from those who
> have already got their hands dirty / fingers burned.

The only advice I can offer would be to watch out for Protocol Buffers v2
versus Protocol Buffers v3 compatibility issues.
dnstap was designed using the Protocol Buffers v2 language, and v3 removes
some v2 functionality that the dnstap schema uses. (Also note that,
technically, "protobuf" is an implementation of the "Protocol Buffers"
language / serialization format. For instance, the "protobuf v3"
implementation supports, for now, both the "Protocol Buffers v2" and
"Protocol Buffers v3" languages. The v3 re-design happened shortly after
the dnstap protocol buffers schema was written, and if I were doing things
from scratch I would probably avoid the whole issue by using something
else, probably CBOR.)

> I'm tempted to weld libfstrm to Lua, so you can configure filtering,
> replication, and output with a bit of Lua. The number of Lua protobuf
> implementations is a bit of a worry - if anyone has a recommendation I'd
> like to short-cut the experimental stage. (I should ask this on the Lua
> list I guess!)

One of the cool things about Frame Streams is that it's strongly layered
to the point that it doesn't care what is actually in the payloads that it
transports (other than carrying a "content type" for the user to describe
the type of payloads carried in the stream), so if you don't actually need
to edit or consume the dnstap protobuf payloads, you could certainly write
a fanout utility in Lua without needing a protobuf dependency at all.
Though for replaying, you would certainly need to deserialize each payload
to extract the query messages.

The Frame Streams protocol is also (intentionally) easy to implement, for
instance the fstrm_capture utility is libevent based for the I/O handling
but actually directly implements most of the protocol itself (except for a
few bits that use the fstrm_control interface for encoding and decoding
control frames):

https://github.com/farsightsec/fstrm/blob/master/src/fstrm_capture.c

Unfortunately the Frame Streams protocol itself isn't fully documented.
Parts of it are described in the libfstrm API documentation, though:

http://farsightsec.github.io/fstrm/group__fstrm__control.html

> Alternatively it might be easier to hack around with the golang-dnstap
> code, tho then I would have to think harder about how to configure it...

-- 
Robert Edmonds

From matt at conundrum.com Fri Apr 13 21:43:08 2018
From: matt at conundrum.com (Matthew Pounsett)
Date: Fri, 13 Apr 2018 17:43:08 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To:
References:
Message-ID:

On 13 April 2018 at 13:59, Tony Finch wrote:

> The dnstap implementation in BIND only supports one output stream, so if
> we are going to satisfy these consumers, we would need to split the dnstap
> feed downstream of BIND before feeding the distributaries onwards.

Maybe have a look at Jerry's luadns (somewhere on the dns-oarc.net web
site, which isn't responding to me at the moment). I had a chat with him
after the last OARC meeting about a problem I had been trying to solve in a
past life where we had three or four different things all attached to the
same bpf on our measurement machines. He seemed to think his tool would be
well suited to taking a single feed of DNS packet info and branching out
into processing it in multiple ways.

I don't recall how pcap-specific his sample code is, and can't go check
right now, but I'm hoping it'd be pretty easily adapted to dnstap messages.
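
Because Frame Streams does not look inside the payloads it carries, the
fanout idea discussed above can be sketched without any protobuf
dependency. What follows is a minimal Python illustration, not an existing
tool. It assumes the framing as I understand it from the libfstrm
documentation: each data frame is a 4-byte big-endian length followed by
the payload, a zero length escapes a control frame, and the control types
START = 2 and STOP = 3 and the content type field number 1 have those
values. The function names (control_frame, data_frame, fanout) are
hypothetical, and only unidirectional streams (such as a capture file) are
handled; the bidirectional socket handshake is left out.

```python
import io
import struct

# Assumed constants, as I read them from the fstrm_control documentation.
CONTROL_START = 2
CONTROL_STOP = 3
FIELD_CONTENT_TYPE = 1
DNSTAP_CONTENT_TYPE = b"protobuf:dnstap.Dnstap"


def control_frame(ctype, content_type=None):
    """Build a control frame: zero escape, length, type, optional field."""
    body = struct.pack("!I", ctype)
    if content_type is not None:
        body += struct.pack("!II", FIELD_CONTENT_TYPE, len(content_type))
        body += content_type
    return struct.pack("!II", 0, len(body)) + body


def data_frame(payload):
    """Build a data frame: 4-byte big-endian length, then opaque payload."""
    return struct.pack("!I", len(payload)) + payload


def read_exact(stream, n):
    """Read exactly n bytes or raise EOFError on a truncated stream."""
    data = stream.read(n)
    if len(data) != n:
        raise EOFError("truncated Frame Streams input")
    return data


def fanout(src, sinks):
    """Copy every frame from src to each sink, never decoding payloads.

    Returns the number of data frames copied. Stops at STOP or at EOF.
    """
    copied = 0
    while True:
        head = src.read(4)
        if not head:
            break  # clean end of input without a STOP frame
        if len(head) != 4:
            raise EOFError("truncated frame length")
        (length,) = struct.unpack("!I", head)
        if length == 0:
            # Zero length is the escape that introduces a control frame.
            clen_raw = read_exact(src, 4)
            (clen,) = struct.unpack("!I", clen_raw)
            if clen < 4:
                raise ValueError("control frame too short")
            body = read_exact(src, clen)
            (ctype,) = struct.unpack("!I", body[:4])
            for sink in sinks:
                sink.write(head + clen_raw + body)
            if ctype == CONTROL_STOP:
                break
        else:
            payload = read_exact(src, length)
            for sink in sinks:
                sink.write(head + payload)
            copied += 1
    return copied


if __name__ == "__main__":
    stream = io.BytesIO(
        control_frame(CONTROL_START, DNSTAP_CONTENT_TYPE)
        + data_frame(b"opaque dnstap payload 1")
        + data_frame(b"opaque dnstap payload 2")
        + control_frame(CONTROL_STOP)
    )
    out_a, out_b = io.BytesIO(), io.BytesIO()
    print(fanout(stream, [out_a, out_b]), "data frames copied")
```

Filtering by message type (for example, passing only resolver queries to a
third party) would still need the payload to be decoded, since that tag
lives inside the dnstap protobuf message rather than in the framing.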
From dot at dotat.at Sun Apr 15 10:59:22 2018
From: dot at dotat.at (Tony Finch)
Date: Sun, 15 Apr 2018 11:59:22 +0100
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <20180413212724.difwgfbnl4dzarkh@mycre.ws>
References: <20180413212724.difwgfbnl4dzarkh@mycre.ws>
Message-ID:

> On 13 Apr 2018, at 22:27, Robert Edmonds wrote:
>
> The component doesn't even need to be on the same machine as the DNS
> server if you're using the 'next' branch of fstrm, which has TCP support,
> though BIND would probably need to be updated to allow configuring an
> fstrm TCP writer.

Sweet, I should have a look at that!

> This is definitely one of the use cases I had in mind as something that
> dnstap could support, with the right glue utilities, but nowadays I wonder
> if the "keeping a standby cache hot" use case wouldn't best be served by
> existing functionality in dnsdist, if you're using dnsdist to front your
> recursive servers?

We aren't - dnsdist is undeniably cool, but I am a bit reluctant to have
two layers of health checks / failover when one layer does the job, and we
don't have the amount of traffic or level of abuse that would make it
compelling.

I quite like being able to implement an optional extra like this off the
critical path, so any breakage is decoupled from the main job of serving
queries.

> The only advice I can offer would be to watch out for Protocol Buffers v2
> versus Protocol Buffers v3 compatibility issues. dnstap was designed using
> the Protocol Buffers v2 language, and v3 removes some v2 functionality
> that the dnstap schema uses.

I think the more interesting Lua implementations are plugins to protoc,
which I hope means they will work OK!

> One of the cool things about Frame Streams is that it's strongly layered
> to the point that it doesn't care what is actually in the payloads that it
> transports (other than carrying a "content type" for the user to describe
> the type of payloads carried in the stream), so if you don't actually need
> to edit or consume the dnstap protobuf payloads, you could certainly write
> a fanout utility in Lua without needing a protobuf dependency at all.
> Though for replaying, you would certainly need to deserialize each payload
> to extract the query messages.

Oh, right, I was vaguely aware of this layering but I thought the
client/resolver/etc. query/response tag was in the protobuf blob rather
than the fstrm metadata (it's described in the dnstap.proto file) which
would imply even a simple dnstap filter needs to know about protobuf...

Thanks for the tips about code to look at!

Tony.
-- 
f.anthony.n.finch http://dotat.at

From jerry at dns-oarc.net Tue Apr 24 08:14:24 2018
From: jerry at dns-oarc.net (Jerry Lundström)
Date: Tue, 24 Apr 2018 08:14:24 +0000
Subject: [dnstap] dnstap fanout and replay
In-Reply-To:
References:
Message-ID: <1524557664.1.15.camel@dns-oarc.net>

On Fri, 2018-04-13 at 17:43 -0400, Matthew Pounsett wrote:
> On 13 April 2018 at 13:59, Tony Finch wrote:
>
> > The dnstap implementation in BIND only supports one output stream, so if
> > we are going to satisfy these consumers, we would need to split the dnstap
> > feed downstream of BIND before feeding the distributaries onwards.
>
> Maybe have a look at Jerry's luadns (somewhere on the dns-oarc.net web
> site, which isn't responding to me at the moment). I had a chat with him
> after the last OARC meeting about a problem I had been trying to solve in a
> past life where we had three or four different things all attached to the
> same bpf on our measurement machines. He seemed to think his tool would be
> well suited to taking a single feed of DNS packet info and branching out
> into processing it in multiple ways.

dnsjit is something new I am developing which is basically parts from dsc,
dnscap and drool that are glued together with Lua (kinda like snabb).

https://github.com/DNS-OARC/dnsjit

> I don't recall how pcap-specific his sample code is, and can't go check
> right now, but I'm hoping it'd be pretty easily adapted to dnstap messages.

This is the good thing with breaking up the components, it's very easy to
add new ones so it's not dependent on just one format or library.

I would gladly see dnstap move towards CBOR, it's basically the same thing
as msgpack/protobuf so the transition should be easy.

Cheers,
Jerry

From paul at redbarn.org Tue Apr 24 14:24:40 2018
From: paul at redbarn.org (Paul Vixie)
Date: Tue, 24 Apr 2018 07:24:40 -0700
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <1524557664.1.15.camel@dns-oarc.net>
References: <1524557664.1.15.camel@dns-oarc.net>
Message-ID: <5ADF3E28.4040600@redbarn.org>

Jerry Lundström wrote:
>
> I would gladly see dnstap move towards CBOR, it's basically the same thing
> as msgpack/protobuf so the transition should be easy.
>

cbor is pretty recent. does it have a code generator for C yet?

dnstap is now years old. i don't expect it to change its on-wire format.

-- 
P Vixie

From jerry at dns-oarc.net Wed Apr 25 06:20:38 2018
From: jerry at dns-oarc.net (Jerry Lundström)
Date: Wed, 25 Apr 2018 06:20:38 +0000
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <5ADF3E28.4040600@redbarn.org>
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org>
Message-ID: <1524637238.1.21.camel@dns-oarc.net>

On Tue, 2018-04-24 at 07:24 -0700, Paul Vixie wrote:
> cbor is pretty recent. does it have a code generator for C yet?

There are plenty of libraries for reading and writing CBOR today in
multiple languages.

The "code generator" is protobuf specific.

> dnstap is now years old. i don't expect it to change its on-wire format.

But it would be really good to move away from a proprietary format.

/Jerry

From paul at redbarn.org Wed Apr 25 08:13:47 2018
From: paul at redbarn.org (Paul Vixie)
Date: Wed, 25 Apr 2018 01:13:47 -0700
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <1524637238.1.21.camel@dns-oarc.net>
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org> <1524637238.1.21.camel@dns-oarc.net>
Message-ID: <5AE038BB.6080507@redbarn.org>

Jerry Lundström wrote:
> On Tue, 2018-04-24 at 07:24 -0700, Paul Vixie wrote:
>> cbor is pretty recent. does it have a code generator for C yet?
>
> There are plenty of libraries for reading and writing CBOR today in
> multiple languages.
>
> The "code generator" is protobuf specific.

so, cbor is designed without that assumption? most really good xdr's have
had codegen.

>
>> dnstap is now years old. i don't expect it to change its on-wire format.
>
> But it would be really good to move away from a proprietary format.

i don't think it's proprietary in any sense. google open sourced it.

what i'm hoping is that the first dnstap file ever recorded will always be
readable, just as the first dns-containing pcap file will always be.

movement for the sake of movement is not what i'd call good engineering.

-- 
P Vixie

From matt at conundrum.com Wed Apr 25 16:38:36 2018
From: matt at conundrum.com (Matthew Pounsett)
Date: Wed, 25 Apr 2018 12:38:36 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <5AE038BB.6080507@redbarn.org>
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org> <1524637238.1.21.camel@dns-oarc.net> <5AE038BB.6080507@redbarn.org>
Message-ID:

On 25 April 2018 at 04:13, Paul Vixie wrote:

>>> dnstap is now years old. i don't expect it to change its on-wire format.
>>
>> But it would be really good to move away from a proprietary format.
>
> i don't think it's proprietary in any sense. google open sourced it.
>
> what i'm hoping is that the first dnstap file ever recorded will always be
> readable, just as the first dns-containing pcap file will always be.
>
> movement for the sake of movement is not what i'd call good engineering.

I think it's no secret that a group of us have been working on writing a
clear specification for the current implementations of dnstap. The goal is
to get that specification published, and then start work on a version 2 to
make it better. One of the benefits of using protobufs is that we can
easily do that in a backward compatible way.

I like C-DNS for long term storage, but even its creators agree that it's
not particularly good at streaming data, which is a basic requirement of
any useful dnstap deployment. So, I don't see us moving in that direction.
Someone might want to write a dnstap->C-DNS converter for long term
storage, however.

From paul at redbarn.org Wed Apr 25 16:43:22 2018
From: paul at redbarn.org (Paul Vixie)
Date: Wed, 25 Apr 2018 09:43:22 -0700
Subject: [dnstap] dnstap fanout and replay
In-Reply-To:
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org> <1524637238.1.21.camel@dns-oarc.net> <5AE038BB.6080507@redbarn.org>
Message-ID: <5AE0B02A.5010902@redbarn.org>

Matthew Pounsett wrote:
> ...
>
> I like C-DNS for long term storage, but even its creators agree that
> it's not particularly good at streaming data, which is a basic
> requirement of any useful dnstap deployment. So, I don't see us moving
> in that direction. Someone might want to write a dnstap->C-DNS
> converter for long term storage, however.

to be clear, dnstap can do streaming, or storage. it lacks a lot compared
to c-dns for storage, but it's not limited to streaming.

also to be clear, protobuf could be swapped out to any other xdr including
cbor, and dnstap would retain its current character. i don't recommend
this, since protobuf is open source and dnstap already has an installed
base. but it's not nec'y to use c-dns in order to use cbor, and that point
should not be lost.

cbor is excellent work, and if we wanted a new binary dns protocol, i'd
pick cbor vs. asn1, protobuf, or any of the binary json mappings.

-- 
P Vixie

From matt at conundrum.com Wed Apr 25 17:39:50 2018
From: matt at conundrum.com (Matthew Pounsett)
Date: Wed, 25 Apr 2018 13:39:50 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <5AE0B02A.5010902@redbarn.org>
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org> <1524637238.1.21.camel@dns-oarc.net> <5AE038BB.6080507@redbarn.org> <5AE0B02A.5010902@redbarn.org>
Message-ID:

On 25 April 2018 at 12:43, Paul Vixie wrote:

> Matthew Pounsett wrote:
>
>> ...
>>
>> I like C-DNS for long term storage, but even its creators agree that
>> it's not particularly good at streaming data, which is a basic
>> requirement of any useful dnstap deployment. So, I don't see us moving
>> in that direction. Someone might want to write a dnstap->C-DNS
>> converter for long term storage, however.
>
> to be clear, dnstap can do streaming, or storage. it lacks a lot compared
> to c-dns for storage, but it's not limited to streaming.

Indeed. I didn't mean to suggest it wasn't useful for that.. just that for
the streaming component protobuf has clear advantages over C-DNS.

From edmonds at mycre.ws Sat Apr 28 17:51:53 2018
From: edmonds at mycre.ws (Robert Edmonds)
Date: Sat, 28 Apr 2018 13:51:53 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <1524557664.1.15.camel@dns-oarc.net>
References: <1524557664.1.15.camel@dns-oarc.net>
Message-ID: <20180428175153.e2h2vxvfsizzy557@mycre.ws>

Jerry Lundström wrote:
> I would gladly see dnstap move towards CBOR, it's basically the same thing
> as msgpack/protobuf so the transition should be easy.

There are major differences among CBOR, msgpack, and protobuf. It doesn't
really make sense to refer to them all as basically the same thing.

protobufs are not self describing and require a schema to define the data
types and field names of particular wire fields. There isn't a 1:1 mapping
of wire types to data types, for instance "varints" on the wire can be
int32, int64, uint32, uint64, etc.

MessagePack is sort of a JSON inspired binary encoding. I think the only
reason it's interesting for this mailing list is because it can store
arbitrary binary data (e.g., DNS wire messages) efficiently. Otherwise you
get many of the downsides of JSON plus the ecosystem downside of being
much more obscure than JSON.
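
To illustrate the earlier point that protobuf wire types do not map 1:1
onto data types, here is a small Python sketch that decodes a base-128
varint by hand and then reads the same decoded number under several schema
types. It uses no protobuf library; the helper names (decode_varint,
as_int64, as_sint64) are made up for this example, and the
interpretations follow the published Protocol Buffers encoding rules
(two's complement for int64, zigzag for sint64).

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from buf, returning (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry data
        if not byte & 0x80:               # high bit clear: last byte
            return result, pos
        shift += 7


def as_uint64(value):
    """uint64: the raw unsigned value."""
    return value


def as_int64(value):
    """int64: the same bits read as 64-bit two's complement."""
    return value - (1 << 64) if value >= (1 << 63) else value


def as_sint64(value):
    """sint64: zigzag decoding (0, -1, 1, -2, ... for 0, 1, 2, 3, ...)."""
    return (value >> 1) ^ -(value & 1)


def as_bool(value):
    """bool: any non-zero varint is true."""
    return value != 0


if __name__ == "__main__":
    # Nine 0xff bytes and a final 0x01: one varint, several meanings.
    raw, _ = decode_varint(bytes([0xFF] * 9 + [0x01]))
    for name, fn in (("uint64", as_uint64), ("int64", as_int64),
                     ("sint64", as_sint64), ("bool", as_bool)):
        print(name, fn(raw))
```

Without the schema there is no way to tell which of these readings the
sender intended, which is the sense in which protobuf is not self
describing.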

The thing that really distinguishes CBOR from protobuf for me is the IANA
"Concise Binary Object Representation (CBOR) Tags" registry
(https://www.iana.org/assignments/cbor-tags/cbor-tags.xhtml). There is
nothing like this for protobuf. This is sort of like the RRTYPEs registry
for DNS. If I understand correctly, CBOR lets you apply a "tag" to a field
on the wire that describes the semantics of the field. protobuf, on the
other hand, doesn't have any space for these extra semantics and they have
to be carried around in programmers' heads, or in comments. E.g., take
this field in the dnstap protobuf schema:

https://github.com/dnstap/dnstap.pb/blob/2d8098aaef53e548e3808f757e48acc51c37b6c9/dnstap.proto#L209-L211

    // The "zone" or "bailiwick" pertaining to the DNS query message.
    // This is a wire-format DNS domain name.
    optional bytes query_zone = 11;

This field is a protobuf byte string, so it can literally have any binary
data in it, and there's a comment requesting that implementers of encoders
and decoders treat this field as a wire-format DNS name. In CBOR you could
register a tag for a DNS wire-format name and make this request have an
explicit wire-format representation and standardized meaning. Similarly
for other data types like DNS wire messages, IP addresses, time stamps,
etc.

-- 
Robert Edmonds

From edmonds at mycre.ws Sat Apr 28 18:28:43 2018
From: edmonds at mycre.ws (Robert Edmonds)
Date: Sat, 28 Apr 2018 14:28:43 -0400
Subject: [dnstap] dnstap fanout and replay
In-Reply-To: <5ADF3E28.4040600@redbarn.org>
References: <1524557664.1.15.camel@dns-oarc.net> <5ADF3E28.4040600@redbarn.org>
Message-ID: <20180428182843.74wjoqjc2rmldgv2@mycre.ws>

Paul Vixie wrote:
> Jerry Lundström wrote:
> >
> > I would gladly see dnstap move towards CBOR, it's basically the same thing
> > as msgpack/protobuf so the transition should be easy.
> >
>
> cbor is pretty recent. does it have a code generator for C yet?

RFC 7049 is from 2013, which was five years ago.
The initial public release of Protocol Buffers was in 2008, and the first
commits in the dnstap.pb repository are from 2013. So when work on dnstap
began, protobuf was about as old as CBOR is now.

> dnstap is now years old. i don't expect it to change its on-wire format.

Byte stream stability is a very defensible argument. But, future
implementations would have to deal with the technical debt of dnstap being
implemented in the protobuf 2 language, and the protobuf ecosystem, while
having a significant amount of open source technology, is still basically
controlled by Google. E.g., except for a few niche implementations, almost
everybody uses Google's protobuf C++ compiler library or implements their
code generator as a plugin for Google's protoc tool, *including* the
protobuf-c implementation that the DNS servers written in C have used for
their dnstap implementations.

The dnstap protobuf schema is written in the proto2 language and uses the
'optional' keyword heavily, which has been removed from proto3. As long as
new versions of Google protobuf continue to implement support for proto2,
the technical debt of dnstap being implemented in proto2 is fairly
manageable. There is a risk that Google will decide to retire proto2
support from new versions of protobuf in the future, at which point the
dnstap ecosystem would be in the awkward position of telling users that
they need to build software against older releases of protobuf. Or it may
be possible to port the dnstap schema from proto2 to proto3.

Note that I'm not advocating for or against a breaking format change for
dnstap (I don't use it enough these days to really have a say), but if a
new format is needed, it can easily support a superset of the semantics of
the previous format, and a tool to convert old captures is easy enough to
develop.

-- 
Robert Edmonds
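
The CBOR tagging idea raised earlier in the thread (a registered tag that
marks a byte string as a wire-format DNS name) can be made concrete with a
few lines of hand-rolled encoding. This Python sketch is only an
illustration under stated assumptions: the tag number 65000 is an
arbitrary placeholder and is not claimed to be registered with IANA for
this purpose, the function names are invented for the example, and the
head encoding follows the major type layout in RFC 7049 (major type 6 for
tags, major type 2 for byte strings).

```python
# Placeholder only: NOT a registered CBOR tag for DNS names.
HYPOTHETICAL_DNS_NAME_TAG = 65000

MAJOR_BYTE_STRING = 2
MAJOR_TAG = 6


def cbor_head(major, value):
    """Encode a CBOR initial byte plus its unsigned argument."""
    if value < 24:
        return bytes([(major << 5) | value])
    # Additional info 24..27 means a 1, 2, 4 or 8 byte argument follows.
    for extra, size in ((24, 1), (25, 2), (26, 4), (27, 8)):
        if value < (1 << (8 * size)):
            return bytes([(major << 5) | extra]) + value.to_bytes(size, "big")
    raise ValueError("argument too large for CBOR")


def dns_name_to_wire(name):
    """Convert a presentation-format ASCII name to DNS wire format."""
    stripped = name.rstrip(".")
    if not stripped:
        return b"\x00"  # the root name
    wire = b""
    for label in stripped.split("."):
        raw = label.encode("ascii")
        if not 0 < len(raw) < 64:
            raise ValueError("label length must be 1..63 octets")
        wire += bytes([len(raw)]) + raw
    return wire + b"\x00"


def tagged_dns_name(name):
    """Encode name as: tag(HYPOTHETICAL_DNS_NAME_TAG) + byte string."""
    wire = dns_name_to_wire(name)
    return (cbor_head(MAJOR_TAG, HYPOTHETICAL_DNS_NAME_TAG)
            + cbor_head(MAJOR_BYTE_STRING, len(wire))
            + wire)


if __name__ == "__main__":
    print(tagged_dns_name("example.com.").hex())
```

A decoder that does not know the tag can still skip or store the byte
string, while one that does know it gets the "this is a DNS name" meaning
from the wire rather than from a schema comment.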