310 lines
16 KiB
Text
310 lines
16 KiB
Text
---
|
|
title: My 2018 contains robur and starts with re-engineering DNS
|
|
author: hannes
|
|
tags: mirageos, protocol
|
|
abstract: New year brings new possibilities and a new environment. I've been working on the most Widely deployed key-value store, the domain name system. Primary and secondary name services are available, including dynamic updates, notify, and tsig authentication.
|
|
---
|
|
|
|
## 2018
|
|
|
|
At the end of 2017, I resigned from my PostDoc position at University of
|
|
Cambridge (in the [rems](https://www.cl.cam.ac.uk/~pes20/rems/) project). Early
|
|
December 2017 I organised the [4th MirageOS hack
|
|
retreat](https://mirageos.org/blog/2017-winter-hackathon-roundup), with which I'm
|
|
very satisfied. In March 2018 the [5th retreat](http://retreat.mirageos.org) will
|
|
happen (please sign up!).
|
|
|
|
In 2018 I moved to Berlin and started to work for the (non-profit) [Center for
|
|
the cultivation of technology](https://techcultivation.org) with our
|
|
[robur.coop](http://robur.coop) project "At robur, we build performant bespoke
|
|
minimal operating systems for high-assurance services". robur is only possible
|
|
by generous donations in autumn 2017, enthusiastic collaborateurs, supportive
|
|
friends, and a motivated community, thanks to all. We will receive funding from
|
|
the [prototypefund](https://prototypefund.de/project/robur-io/) to work on a
|
|
[CalDAV server](https://robur.coop/Our%20Work/Projects#CalDAV-Server) implementation in OCaml
|
|
targeting MirageOS. We're still looking for donations and further funding,
|
|
please get in touch. Apart from CalDAV, I want to start the year by finishing
|
|
several projects which I discovered on my hard drive. This includes DNS, [opam
|
|
signing](/Posts/Conex), TCP, ... . My personal goal for 2018 is to develop a
|
|
flexible `mirage deploy`, because after configuring and building a unikernel, I
|
|
want to get it smoothly up and running (spoiler: I already use
|
|
[albatross](/Posts/VMM) in production).
|
|
|
|
To kick off (3% of 2018 is already used) this year, I'll talk in more detail
|
|
about [µDNS](https://github.com/roburio/udns), an opinionated from-scratch
|
|
re-engineered DNS library, which I've been using since Christmas 2017 in production for
|
|
[ns.nqsb.io](https://github.com/hannesm/ns.nqsb.io) and
|
|
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary). The
|
|
development started in March 2017, and continued over several evenings and long
|
|
weekends. My initial motivation was to implement a recursive resolver to run on
|
|
my laptop. I had a working prototype in use on my laptop over 4 months in the
|
|
summer 2017, but that code was not in a good shape, so I went down the rabbit
|
|
hole and (re)wrote a server (and learned more about GADT). A configurable
|
|
resolver needs a server, as local overlay, usually anyways. Furthermore,
|
|
dynamic updates are standardised and thus a configuration interface exists
|
|
inside the protocol, even with hmac-signatures for authentication!
|
|
Coincidentally, I started to solve another issue, namely automated management of let's
|
|
encrypt certificates (see [this
|
|
branch](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate) for an
|
|
initial hack). On my journey, I also reported a cache poisoning vulnerability,
|
|
which was fixed in [Docker for
|
|
Windows](https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17090-ce-win32-2017-10-02-stable).
|
|
|
|
But let's get started with some content. Please keep in mind that while the
|
|
code is publicly available, it is not yet released (mainly since the test
|
|
coverage is not high enough, and the lack of documentation). I appreciate early
|
|
adopters, please let me know if you find any issues or find a use case which is
|
|
not straightforward to solve. This won't be the last article about DNS this
|
|
year - persistent storage, resolver, let's encrypt support are still missing.
|
|
|
|
## What is DNS?
|
|
|
|
The [domain name system](https://en.wikipedia.org/wiki/DNS) is a core Internet
|
|
protocol, which translates domain names to IP addresses. A domain name is
|
|
easier to memorise for human beings than an IP address. DNS is hierarchical and
|
|
decentralised. It was initially "specified" in Nov 1987 in [RFC
|
|
1034](https://tools.ietf.org/html/rfc1034) and [RFC
|
|
1035](https://tools.ietf.org/html/rfc1035). Nowadays it spans over more than 20
|
|
technical RFCs, 10 security related, 5 best current practises and another 10
|
|
informational. The basic encoding and mechanisms did not change.
|
|
|
|
On the Internet, there is a set of root servers (administrated by IANA) which
|
|
provide the information about which name servers are authoritative for which top level
|
|
domain (such as ".com"). They provide the information about which name servers are
|
|
responsible for which second level domain name (such as "example.com"), and so
|
|
on. There are at least two name servers for each domain name in separate
|
|
networks - in case one is unavailable the other can be reached.
|
|
|
|
The building blocks for DNS are: the resolver, a stub (`gethostbyname` provided
|
|
by your C library) or caching forwarding resolver (at your ISP), which send DNS
|
|
packets to another resolver, or a recursive resolver which, once seeded with the
|
|
root servers, finds out the IP address of a requested domain name. The other
|
|
part are authoritative servers, which reply to requests for their configured
|
|
domain.
|
|
|
|
To get some terminology, a DNS client sends a query, consisting of a domain
|
|
name and a query type, and expects a set of answers, which are called resource
|
|
records, and contain: name, time to live, type, and data. The resolver
|
|
iteratively requests resource records from authoritative servers, until the requested
|
|
domain name is resolved or fails (name does not exist, server
|
|
failure, server offline).
|
|
|
|
DNS usually uses UDP as transport which is not reliable and limited to 512 byte
|
|
payload on the Internet (due to various middleboxes). DNS can also be
|
|
transported via TCP, and even via TLS over UDP or TCP. If a DNS packet
|
|
transferred via UDP is larger than 512 bytes, it is cut at the 512 byte mark,
|
|
and a bit in its header is set. The receiver can decide whether to use the 512
|
|
bytes of information, or to throw it away and attempt a TCP connection.
|
|
|
|
### DNS packet
|
|
|
|
The packet encoding starts with a 16bit identifier followed by a 16bit header
|
|
(containing operation, flags, status code), and four counters, each 16bit,
|
|
specifying the amount of resource records in the body: questions, answers,
|
|
authority records, and additional records. The header starts with one bit
|
|
operation (query or response), four bits opcode, various flags (recursion,
|
|
authoritative, truncation, ...), and the last four bit encode the response code.
|
|
|
|
A question consists of a domain name, a query type, and a query class. A
|
|
resource record additionally contains a 32bit time to live, a length, and the
|
|
data.
|
|
|
|
Each domain name is a case sensitive string of up to 255 bytes, separated by `.`
|
|
into labels of up to 63 bytes each. A label is either encoded by its length
|
|
followed by the content, or by an offset to the start of a label in the current
|
|
DNS frame (poor mans compression). Care must be taken during decoding to avoid
|
|
cycles in offsets. Common operations on domain names are comparison: equality,
|
|
ordering, and also whether some domain name is a subdomain of another domain
|
|
name, should be efficient. My initial representation naïvely was a list of
|
|
strings, now it is an array of strings in reverse order. This speeds up common
|
|
operations by a factor of 5 (see test/bench.ml).
|
|
|
|
The only really used class is `IN` (for Internet), as mentioned in [RFC
|
|
6895](https://tools.ietf.org/html/rfc6895). Various query types (`MD`, `MF`,
|
|
`MB`, `MG`, `MR`, `NULL`, `AFSDB`, ...) are barely or never used. There is no
|
|
need to convolute the implementation and its API with these legacy options (if
|
|
you have a use case and see those in the wild, please tell me).
|
|
|
|
My implemented packet decoding does decompression, only allows valid internet
|
|
domain names, and may return a partial parse - to use as many resource records
|
|
in truncated packets as possible. There are no exceptions raised, the parsing
|
|
uses a monadic style error handling. Since label decompression requires the
|
|
parser to know absolute offsets, the original buffer and the offset is manually
|
|
passed around at all times, instead of using smaller views on the buffer. The
|
|
decoder does not allow for gaps, when the outer resource data length specifies a
|
|
byte length which is not completely consumed by the specific resource data
|
|
subparser (an A record must always consume four bytes). Failing to check this can
|
|
lead to a way to exfiltrate data without getting noticed.
|
|
|
|
Each zone (a served domain name) contains a SOA "start of authority" entry,
|
|
which includes the primary nameserver name, the hostmaster's email address (both
|
|
encoded as domain name), a serial number of the zone, a refresh, retry, expiry,
|
|
and minimum interval (all encoded as 32bit unsigned number in seconds). Common
|
|
resource records include A, which payload is 32bit IPv4 address. A nameserver
|
|
(NS) record carries a domain name as payload. A mail exchange (MX) whose
|
|
payload is a 16bit priority and a domain name. A CNAME record is an alias to
|
|
another domain name. These days, there are even records to specify the
|
|
certificate authority authorisation (CAA) records containing a flag (critical),
|
|
a tag ("issue") and a value ("letsencrypt.org").
|
|
|
|
## Server
|
|
|
|
The operation of a DNS server is to listen for a request and serve a reply.
|
|
Data to be served can be canonically encoded (the RFC describes the format) in a
|
|
zone file. Apart from insecurity in DNS server implementations, another attack
|
|
vector are amplification attacks where an attacker crafts a small UDP frame
|
|
with a fake source IP address, and the server answers with a large response to
|
|
that address which may lead to a DoS attack. Various mitigations exist
|
|
including rate limiting, serving large replies only via TCP, ...
|
|
|
|
Internally, the zone file data is stored in a tree (module
|
|
[Dns_trie](https://github.com/roburio/udns/blob/master/server/dns_trie.mli)
|
|
[implementation](https://github.com/roburio/udns/blob/master/server/dns_trie.ml)),
|
|
where each node contains two maps: `sub`, which key is a label and value is a
|
|
subtree and `dns_map` (module Dns_map), which key is a resource record type and
|
|
value is the resource record. Both use the OCaml
|
|
[Map](http://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.html) ("also known
|
|
as finite maps or dictionaries, given a total ordering function over the
|
|
keys. All operations over maps are purely applicative (no side-effects). The
|
|
implementation uses balanced binary trees, and therefore searching and insertion
|
|
take time logarithmic in the size of the map").
|
|
|
|
The server looks up the queried name, and in the returned Dns_map the queried
|
|
type. The found resource records are sent as answer, which also includes the
|
|
question and authority information (NS records of the zone) and additional glue
|
|
records (IP addresses of names mentioned earlier in the same zone).
|
|
|
|
### Dns_map
|
|
|
|
The data structure which contains resource record types as key, and a collection
|
|
of matching resource records as values. In OCaml the value type must be
|
|
homogenous - using a normal sum type leads to an unneccessary unpacking step
|
|
(or lacking type information):
|
|
|
|
```OCaml
|
|
let lookup_ns t =
|
|
match Map.find NS t with
|
|
| None -> Error `NotFound
|
|
| Some (NS nameservers) -> Ok nameservers
|
|
| Some _ -> Error `NotFound
|
|
```
|
|
|
|
Instead, I use in my current rewrite [generalized algebraic data
|
|
types](https://en.wikipedia.org/wiki/Generalized_algebraic_data_type) (read
|
|
[OCaml manual](http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251) and
|
|
[Mads Hartmann blog post about use cases for
|
|
GADTs](http://mads-hartmann.com/ocaml/2015/01/05/gadt-ocaml.html), [Andreas
|
|
Garnæs about using GADTs for GraphQL type
|
|
modifiers](https://andreas.github.io/2018/01/05/modeling-graphql-type-modifiers-with-gadts/))
|
|
to preserve a relation between key and value (and A record has a list of IPv4
|
|
addresses and a ttl as value) - similar to
|
|
[hmap](http://erratique.ch/software/hmap), but different: a closed key-value
|
|
mapping (the GADT), no int for each key and mutable state. Thanks to Justus
|
|
Matthiesen for helping me with GADTs and this code. Look into the
|
|
[interface](https://github.com/roburio/udns/blob/master/src/dns_map.mli) and
|
|
[implementation](https://github.com/roburio/udns/blob/master/src/dns_map.ml).
|
|
|
|
|
|
```OCaml
|
|
(* an ordering relation, I dislike using int for that *)
|
|
module Order = struct
|
|
type (_,_) t =
|
|
| Lt : ('a, 'b) t
|
|
| Eq : ('a, 'a) t
|
|
| Gt : ('a, 'b) t
|
|
end
|
|
|
|
module Key = struct
|
|
(* The key and its value type *)
|
|
type _ t =
|
|
| Soa : (int32 * Dns_packet.soa) t
|
|
| A : (int32 * Ipaddr.V4.t list) t
|
|
| Ns : (int32 * Dns_name.DomSet.t) t
|
|
| Cname : (int32 * Dns_name.t) t
|
|
|
|
(* we need a total order on our keys *)
|
|
let compare : type a b. a t -> b t -> (a, b) Order.t = fun t t' ->
|
|
let open Order in
|
|
match t, t' with
|
|
| Cname, Cname -> Eq | Cname, _ -> Lt | _, Cname -> Gt
|
|
| Ns, Ns -> Eq | Ns, _ -> Lt | _, Ns -> Gt
|
|
| Soa, Soa -> Eq | Soa, _ -> Lt | _, Soa -> Gt
|
|
| A, A -> Eq
|
|
end
|
|
|
|
type 'a key = 'a Key.t
|
|
|
|
(* our OCaml Map with an encapsulated constructor as key *)
|
|
type k = K : 'a key -> k
|
|
module M = Map.Make(struct
|
|
type t = k
|
|
(* the price I pay for not using int as three-state value *)
|
|
let compare (K a) (K b) = match Key.compare a b with
|
|
| Order.Lt -> -1
|
|
| Order.Eq -> 0
|
|
| Order.Gt -> 1
|
|
end)
|
|
|
|
(* v contains a key and value pair, wrapped by a single constructor *)
|
|
type v = V : 'a key * 'a -> v
|
|
|
|
(* t is the main type of a Dns_map, used by clients *)
|
|
type t = v M.t
|
|
|
|
(* retrieve a typed value out of the store *)
|
|
let get : type a. a Key.t -> t -> a = fun k t ->
|
|
match M.find (K k) t with
|
|
| V (k', v) ->
|
|
(* this comparison is superfluous, just for the types *)
|
|
match Key.compare k k' with
|
|
| Order.Eq -> v
|
|
| _ -> assert false
|
|
```
|
|
|
|
This helps me to programmaticaly retrieve tightly typed values from the cache,
|
|
important when code depends on concrete values (i.e. when there are domain
|
|
names, look these up as well and add as additional records). Look into [server/dns_server.ml](https://github.com/roburio/udns/blob/master/server/dns_server.ml)
|
|
|
|
### Dynamic updates, notifications, and authentication
|
|
|
|
[Dynamic updates](https://tools.ietf.org/html/rfc2136) specify in-protocol
|
|
record updates (supported for example by `nsupdate` from ISC bind-tools),
|
|
[notifications](https://tools.ietf.org/html/rfc1996) are used by primary servers
|
|
to notify secondary servers about updates, which then initiate a [zone
|
|
transfer](https://tools.ietf.org/html/rfc5936) to retrieve up to date
|
|
data. [Shared hmac secrets](https://tools.ietf.org/html/rfc2845) are used to
|
|
ensure that the transaction (update, zone transfer) was authorised. These are
|
|
all protocol extensions, there is no need to use out-of-protocol solutions.
|
|
|
|
The server logic for update and zone transfer frames is slightly more complex,
|
|
and includes a dependency upon an authenticator (implemented using the
|
|
[nocrypto](https://github.com/mirleft/ocaml-nocrypto) library, and
|
|
[ptime](http://erratique.ch/software/ptime)).
|
|
|
|
### Deployment and Let's Encrypt
|
|
|
|
To deploy servers without much persistent data, an authentication schema is
|
|
hardcoded in the dns-server: shared secrets are also stored as DNS entries
|
|
(DNSKEY), and `_transfer.zone`, `_update.zone`, and `_key-management.zone` names
|
|
are introduced to encode the permissions. A `_transfer` key also needs to
|
|
encode the IP address of the primary (to know where to request zone transfers)
|
|
and secondary IP (to know where to send notifications).
|
|
|
|
Please have a look at
|
|
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary) and the [examples](https://github.com/roburio/udns/blob/master/mirage/examples) for more details. The shared secrets are provided as boot parameter of the unikernel.
|
|
|
|
I hacked maker's
|
|
[ocaml-letsencrypt](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate)
|
|
library to use µDNS and sending update frames to the given IP address. I
|
|
already used this to have letsencrypt issue various certificates for my domains.
|
|
|
|
There is no persistent storage of updates yet, but this can be realised by
|
|
implementing a secondary (which is notified on update) that writes every new
|
|
zone to persistent storage (e.g. [disk](https://github.com/mirage/mirage-block)
|
|
or [git](https://github.com/mirage/ocaml-git)). I also plan to have an
|
|
automated Let's Encrypt certificate unikernel which listens for certificate
|
|
signing requests and stores signed certificates in DNS. Luckily the year only
|
|
started and there's plenty of time left.
|
|
|
|
I'm interested in feedback, either via <strike>[twitter](https://twitter.com/h4nnes)</strike>
|
|
hannesm@mastodon.social or via eMail.
|