hannes.robur.coop/Posts/DNS
2018-03-17 14:44:36 +00:00

311 lines
16 KiB
Text

---
title: My 2018 contains robur and starts with re-engineering DNS
author: hannes
tags: mirageos, protocol
abstract: New year brings new possibilities and a new environment. I've been working on the most Widely deployed key-value store, the domain name system. Primary and secondary name services are available, including dynamic updates, notify, and tsig authentication.
---
## 2018
At the end of 2017, I resigned from my PostDoc position at University of
Cambridge (in the [rems](https://www.cl.cam.ac.uk/~pes20/rems/) project). Early
December 2017 I organised the [4th MirageOS hack
retreat](https://mirage.io/blog/2017-winter-hackathon-roundup), with which I'm
very satisfied. In March 2018 the [5th retreat](http://retreat.mirage.io) will
happen (please sign up!).
In 2018 I moved to Berlin and started to work for the (non-profit) [Centre for
the cultivation of technology](https://techcultivation.org) with our
[robur.io](http://robur.io) project "At robur, we build performant bespoke
minimal operating systems for high-assurance services". robur is only possible
by generous donations in autumn 2017, enthusiastic collaborateurs, supportive
friends, and a motivated community, thanks to all. We will receive funding from
the [prototypefund](https://prototypefund.de/project/robur-io/) to work on a
[CalDAV server](http://robur.io/Projects/CalDAV) implementation in OCaml
targeting MirageOS. We're still looking for donations and further funding,
please get in touch. Apart from CalDAV, I want to start the year by finishing
several projects which I discovered on my hard drive. This includes DNS, [opam
signing](/Posts/Conex), TCP, ... . My personal goal for 2018 is to develop a
flexible `mirage deploy`, because after configuring and building a unikernel, I
want to get it smoothly up and running (spoiler: I already use
[albatross](/Posts/VMM) in production).
To kick off (3% of 2018 is already used) this year, I'll talk in more detail
about [µDNS](https://github.com/roburio/udns), an opinionated from-scratch
re-engineered DNS library, which I've been using since Christmas 2017 in production for
[ns.nqsb.io](https://github.com/hannesm/ns.nqsb.io) and
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary). The
development started in March 2017, and continued over several evenings and long
weekends. My initial motivation was to implement a recursive resolver to run on
my laptop. I had a working prototype in use on my laptop over 4 months in the
summer 2017, but that code was not in a good shape, so I went down the rabbit
hole and (re)wrote a server (and learned more about GADT). A configurable
resolver needs a server, as local overlay, usually anyways. Furthermore,
dynamic updates are standardised and thus a configuration interface exists
inside the protocol, even with hmac-signatures for authentication!
Coincidentally, I started to solve another issue, namely automated management of let's
encrypt certificates (see [this
branch](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate) for an
initial hack). On my journey, I also reported a cache poisoning vulnerability,
which was fixed in [Docker for
Windows](https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17090-ce-win32-2017-10-02-stable).
But let's get started with some content. Please keep in mind that while the
code is publicly available, it is not yet released (mainly since the test
coverage is not high enough, and the lack of documentation). I appreciate early
adopters, please let me know if you find any issues or find a use case which is
not straightforward to solve. This won't be the last article about DNS this
year - persistent storage, resolver, let's encrypt support are still missing.
## What is DNS?
The [domain name system](https://en.wikipedia.org/wiki/DNS) is a core Internet
protocol, which translates domain names to IP addresses. A domain name is
easier to memorise for human beings than an IP address. DNS is hierarchical and
decentralised. It was initially "specified" in Nov 1987 in [RFC
1034](https://tools.ietf.org/html/rfc1034) and [RFC
1035](https://tools.ietf.org/html/rfc1035). Nowadays it spans over more than 20
technical RFCs, 10 security related, 5 best current practises and another 10
informational. The basic encoding and mechanisms did not change.
On the Internet, there is a set of root servers (administrated by IANA) which
provide the information about which name servers are authoritative for which top level
domain (such as ".com"). They provide the information about which name servers are
responsible for which second level domain name (such as "example.com"), and so
on. There are at least two name servers for each domain name in separate
networks - in case one is unavailable the other can be reached.
The building blocks for DNS are: the resolver, a stub (`gethostbyname` provided
by your C library) or caching forwarding resolver (at your ISP), which send DNS
packets to another resolver, or a recursive resolver which, once seeded with the
root servers, finds out the IP address of a requested domain name. The other
part are authoritative servers, which reply to requests for their configured
domain.
To get some terminology, a DNS client sends a query, consisting of a domain
name and a query type, and expects a set of answers, which are called resource
records, and contain: name, time to live, type, and data. The resolver
iteratively requests resource records from authoritative servers, until the requested
domain name is resolved or fails (name does not exist, server
failure, server offline).
DNS usually uses UDP as transport which is not reliable and limited to 512 byte
payload on the Internet (due to various middleboxes). DNS can also be
transported via TCP, and even via TLS over UDP or TCP. If a DNS packet
transferred via UDP is larger than 512 bytes, it is cut at the 512 byte mark,
and a bit in its header is set. The receiver can decide whether to use the 512
bytes of information, or to throw it away and attempt a TCP connection.
### DNS packet
The packet encoding starts with a 16bit identifier followed by a 16bit header
(containing operation, flags, status code), and four counters, each 16bit,
specifying the amount of resource records in the body: questions, answers,
authority records, and additional records. The header starts with one bit
operation (query or response), four bits opcode, various flags (recursion,
authoritative, truncation, ...), and the last four bit encode the response code.
A question consists of a domain name, a query type, and a query class. A
resource record additionally contains a 32bit time to live, a length, and the
data.
Each domain name is a case sensitive string of up to 255 bytes, separated by `.`
into labels of up to 63 bytes each. A label is either encoded by its length
followed by the content, or by an offset to the start of a label in the current
DNS frame (poor mans compression). Care must be taken during decoding to avoid
cycles in offsets. Common operations on domain names are comparison: equality,
ordering, and also whether some domain name is a subdomain of another domain
name, should be efficient. My initial representation naïvely was a list of
strings, now it is an array of strings in reverse order. This speeds up common
operations by a factor of 5 (see test/bench.ml).
The only really used class is `IN` (for Internet), as mentioned in [RFC
6895](https://tools.ietf.org/html/rfc6895). Various query types (`MD`, `MF`,
`MB`, `MG`, `MR`, `NULL`, `AFSDB`, ...) are barely or never used. There is no
need to convolute the implementation and its API with these legacy options (if
you have a use case and see those in the wild, please tell me).
My implemented packet decoding does decompression, only allows valid internet
domain names, and may return a partial parse - to use as many resource records
in truncated packets as possible. There are no exceptions raised, the parsing
uses a monadic style error handling. Since label decompression requires the
parser to know absolute offsets, the original buffer and the offset is manually
passed around at all times, instead of using smaller views on the buffer. The
decoder does not allow for gaps, when the outer resource data length specifies a
byte length which is not completely consumed by the specific resource data
subparser (an A record must always consume four bytes). Failing to check this can
lead to a way to exfiltrate data without getting noticed.
Each zone (a served domain name) contains a SOA "start of authority" entry,
which includes the primary nameserver name, the hostmaster's email address (both
encoded as domain name), a serial number of the zone, a refresh, retry, expiry,
and minimum interval (all encoded as 32bit unsigned number in seconds). Common
resource records include A, which payload is 32bit IPv4 address. A nameserver
(NS) record carries a domain name as payload. A mail exchange (MX) whose
payload is a 16bit priority and a domain name. A CNAME record is an alias to
another domain name. These days, there are even records to specify the
certificate authority authorisation (CAA) records containing a flag (critical),
a tag ("issue") and a value ("letsencrypt.org").
## Server
The operation of a DNS server is to listen for a request and serve a reply.
Data to be served can be canonically encoded (the RFC describes the format) in a
zone file. Apart from insecurity in DNS server implementations, another attack
vector are amplification attacks where an attacker crafts a small UDP frame
with a fake source IP address, and the server answers with a large response to
that address which may lead to a DoS attack. Various mitigations exist
including rate limiting, serving large replies only via TCP, ...
Internally, the zone file data is stored in a tree (module
[Dns_trie](https://github.com/roburio/udns/blob/master/server/dns_trie.mli)
[implementation](https://github.com/roburio/udns/blob/master/server/dns_trie.ml)),
where each node contains two maps: `sub`, which key is a label and value is a
subtree and `dns_map` (module Dns_map), which key is a resource record type and
value is the resource record. Both use the OCaml
[Map](http://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.html) ("also known
as finite maps or dictionaries, given a total ordering function over the
keys. All operations over maps are purely applicative (no side-effects). The
implementation uses balanced binary trees, and therefore searching and insertion
take time logarithmic in the size of the map").
The server looks up the queried name, and in the returned Dns_map the queried
type. The found resource records are sent as answer, which also includes the
question and authority information (NS records of the zone) and additional glue
records (IP addresses of names mentioned earlier in the same zone).
### Dns_map
The data structure which contains resource record types as key, and a collection
of matching resource records as values. In OCaml the value type must be
homogenous - using a normal sum type leads to an unneccessary unpacking step
(or lacking type information):
```OCaml
let lookup_ns t =
match Map.find NS t with
| None -> Error `NotFound
| Some (NS nameservers) -> Ok nameservers
| Some _ -> Error `NotFound
```
Instead, I use in my current rewrite [generalized algebraic data
types](https://en.wikipedia.org/wiki/Generalized_algebraic_data_type) (read
[OCaml manual](http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251) and
[Mads Hartmann blog post about use cases for
GADTs](http://mads-hartmann.com/ocaml/2015/01/05/gadt-ocaml.html), [Andreas
Garnæs about using GADTs for GraphQL type
modifiers](https://andreas.github.io/2018/01/05/modeling-graphql-type-modifiers-with-gadts/))
to preserve a relation between key and value (and A record has a list of IPv4
addresses and a ttl as value) - similar to
[hmap](http://erratique.ch/software/hmap), but different: a closed key-value
mapping (the GADT), no int for each key and mutable state. Thanks to Justus
Matthiesen for helping me with GADTs and this code. Look into the
[interface](https://github.com/roburio/udns/blob/master/src/dns_map.mli) and
[implementation](https://github.com/roburio/udns/blob/master/src/dns_map.ml).
```OCaml
(* an ordering relation, I dislike using int for that *)
module Order = struct
type (_,_) t =
| Lt : ('a, 'b) t
| Eq : ('a, 'a) t
| Gt : ('a, 'b) t
end
module Key = struct
(* The key and its value type *)
type _ t =
| Soa : (int32 * Dns_packet.soa) t
| A : (int32 * Ipaddr.V4.t list) t
| Ns : (int32 * Dns_name.DomSet.t) t
| Cname : (int32 * Dns_name.t) t
(* we need a total order on our keys *)
let compare : type a b. a t -> b t -> (a, b) Order.t = fun t t' ->
let open Order in
match t, t' with
| Cname, Cname -> Eq | Cname, _ -> Lt | _, Cname -> Gt
| Ns, Ns -> Eq | Ns, _ -> Lt | _, Ns -> Gt
| Soa, Soa -> Eq | Soa, _ -> Lt | _, Soa -> Gt
| A, A -> Eq
end
type 'a key = 'a Key.t
(* our OCaml Map with an encapsulated constructor as key *)
type k = K : 'a key -> k
module M = Map.Make(struct
type t = k
(* the price I pay for not using int as three-state value *)
let compare (K a) (K b) = match Key.compare a b with
| Order.Lt -> -1
| Order.Eq -> 0
| Order.Gt -> 1
end)
(* v contains a key and value pair, wrapped by a single constructor *)
type v = V : 'a key * 'a -> v
(* t is the main type of a Dns_map, used by clients *)
type t = v M.t
(* retrieve a typed value out of the store *)
let get : type a. a Key.t -> t -> a = fun k t ->
match M.find (K k) t with
| V (k', v) ->
(* this comparison is superfluous, just for the types *)
match Key.compare k k' with
| Order.Eq -> v
| _ -> assert false
```
This helps me to programmaticaly retrieve tightly typed values from the cache,
important when code depends on concrete values (i.e. when there are domain
names, look these up as well and add as additional records). Look into [server/dns_server.ml](https://github.com/roburio/udns/blob/master/server/dns_server.ml)
### Dynamic updates, notifications, and authentication
[Dynamic updates](https://tools.ietf.org/html/rfc2136) specify in-protocol
record updates (supported for example by `nsupdate` from ISC bind-tools),
[notifications](https://tools.ietf.org/html/rfc1996) are used by primary servers
to notify secondary servers about updates, which then initiate a [zone
transfer](https://tools.ietf.org/html/rfc5936) to retrieve up to date
data. [Shared hmac secrets](https://tools.ietf.org/html/rfc2845) are used to
ensure that the transaction (update, zone transfer) was authorised. These are
all protocol extensions, there is no need to use out-of-protocol solutions.
The server logic for update and zone transfer frames is slightly more complex,
and includes a dependency upon an authenticator (implemented using the
[nocrypto](https://github.com/mirleft/ocaml-nocrypto) library, and
[ptime](http://erratique.ch/software/ptime)).
### Deployment and Let's Encrypt
To deploy servers without much persistent data, an authentication schema is
hardcoded in the dns-server: shared secrets are also stored as DNS entries
(DNSKEY), and `_transfer.zone`, `_update.zone`, and `_key-management.zone` names
are introduced to encode the permissions. A `_transfer` key also needs to
encode the IP address of the primary (to know where to request zone transfers)
and secondary IP (to know where to send notifications).
Please have a look at
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary) and the [examples](https://github.com/roburio/udns/blob/master/mirage/examples) for more details. The shared secrets are provided as boot parameter of the unikernel.
I hacked maker's
[ocaml-letsencrypt](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate)
library to use µDNS and sending update frames to the given IP address. I
already used this to have letsencrypt issue various certificates for my domains.
There is no persistent storage of updates yet, but this can be realised by
implementing a secondary (which is notified on update) that writes every new
zone to persistent storage (e.g. [disk](https://github.com/mirage/mirage-block)
or [git](https://github.com/mirage/ocaml-git)). I also plan to have an
automated Let's Encrypt certificate unikernel which listens for certificate
signing requests and stores signed certificates in DNS. Luckily the year only
started and there's plenty of time left.
I'm interested in feedback, either via <strike>[twitter](https://twitter.com/h4nnes)</strike>
hannesm@mastodon.social or an issue on the [data
repository](https://github.com/hannesm/hannes.nqsb.io/issues).