initial DNS
parent 295403b642
commit dbe8791b02
4 changed files with 320 additions and 4 deletions
@@ -229,7 +229,7 @@ conex_author: [ERROR] package index arp was not found in repository
This shows your key material and accounts, team membership and packages you are
authorised to modify (inferred as described
-[here](https://hannes.nqsb.io/Posts/Maintainer).
+[here](https://hannes.nqsb.io/Posts/Maintainers).

The `--noteam` argument limits the package list to only these you are personally
authorised for. The `--id` argument presents you with a view of another author,

Posts/DNS (Normal file, 311 lines added)
@@ -0,0 +1,311 @@
---
title: My 2018 contains robur and starts with re-engineering DNS
author: hannes
tags: mirageos, protocol
abstract: New year brings new possibilities and a new environment. I've been working on the most widely deployed key-value store, the domain name system. Primary and secondary name services are available, including dynamic updates, notify, and tsig authentication.
---
|
## 2018

At the end of 2017, I resigned from my PostDoc position at the University of
Cambridge (in the [rems](https://www.cl.cam.ac.uk/~pes20/rems/) project). In
early December 2017 I organised the [4th MirageOS hack
retreat](https://mirage.io/blog/2017-winter-hackathon-roundup), which I'm very
satisfied with. In March 2018 the [5th retreat](http://retreat.mirage.io) will
take place (please sign up!).
|
In 2018 I moved to Berlin and started to work for the (non-profit) [Centre for
the cultivation of technology](https://techcultivation.org) on our
[robur.io](http://robur.io) project: "At robur, we build performant bespoke
minimal operating systems for high-assurance services". robur is only possible
thanks to generous donations in autumn 2017, enthusiastic collaborators,
supportive friends, and a motivated community; thanks to all of you. We will
receive funding from the [prototypefund](https://prototypefund.de/project/robur-io/)
to work on a [CalDAV server](http://robur.io/Projects/CalDAV) implementation in
OCaml targeting MirageOS. We're still looking for donations and further funding,
so please get in touch. Apart from CalDAV, I want to start the year by finishing
several projects which I discovered on my hard drive. These include DNS, [opam
signing](/Posts/Conex), TCP, etc. My personal goal for 2018 is to develop a
flexible `mirage deploy`, because after configuring and building a unikernel, I
want to get it up and running smoothly (spoiler: I already use
[albatross](/Posts/VMM) in production).
|
To kick off this year (3% of 2018 is already used up), I'll talk in more detail
about [µDNS](https://github.com/roburio/udns), an opinionated, from-scratch
re-engineered DNS library, which I've been using in production since Christmas
2017 for [ns.nqsb.io](https://github.com/hannesm/ns.nqsb.io) and
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary). Development
started in March 2017 and continued over several evenings and long
weekends. My initial motivation was to implement a recursive resolver to run on
my laptop. I had a working prototype in use on my laptop for 4 months in the
summer of 2017, but that code was not in good shape, so I went down the rabbit
hole and (re)wrote a server (and learned more about GADTs). A configurable
resolver usually needs a server anyway, as a local overlay. Furthermore,
dynamic updates are standardised, so a configuration interface exists
inside the protocol, even with HMAC signatures for authentication!
Coincidentally, I started to solve another issue along the way, namely automated
management of Let's Encrypt certificates (see [this
branch](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate) for an
initial hack). On my journey, I also reported a cache poisoning vulnerability,
which was fixed in [Docker for
Windows](https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17090-ce-win32-2017-10-02-stable).
|
But let's get started with some content. Please keep in mind that while the
code is publicly available, it is not yet released (mainly because the test
coverage is not high enough and documentation is lacking). I appreciate early
adopters: please let me know if you find any issues or a use case which is
not straightforward to solve. This won't be the last article about DNS this
year: persistent storage, the resolver, and Let's Encrypt support are still missing.
|
## What is DNS?

The [domain name system](https://en.wikipedia.org/wiki/DNS) is a core Internet
protocol which translates domain names to IP addresses. A domain name is
easier for human beings to memorise than an IP address. DNS is hierarchical and
decentralised. It was initially "specified" in November 1987 in [RFC
1034](https://tools.ietf.org/html/rfc1034) and [RFC
1035](https://tools.ietf.org/html/rfc1035). Nowadays it spans more than 20
technical RFCs, 10 security-related ones, 5 best current practices, and another
10 informational ones. The basic encoding and mechanisms have not changed.
|
On the Internet, there is a set of root servers (administered by IANA) which
provide the information about which name servers are authoritative for which top
level domain (such as ".com"). The name servers of a top level domain in turn
provide the information about which name servers are responsible for which
second level domain name (such as "example.com"), and so on. There are at least
two name servers for each domain name, in separate networks, so that if one is
unavailable the other can still be reached.
|
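The delegation chain can be made concrete with a small sketch (a hypothetical helper, not part of µDNS) that lists the zones a resolver passes through from the root down to a name; at each step, the name servers of one zone refer to those of the next. It assumes a plain dotted name without escaped dots.

```OCaml
(* Sketch: the chain of zones from the root down to [name].
   Hypothetical helper; assumes a plain dotted name without escapes. *)
let zone_chain (name : string) : string list =
  (* drop empty labels, e.g. the one produced by a trailing dot *)
  let labels = List.filter (fun l -> l <> "") (String.split_on_char '.' name) in
  (* walk the labels right to left, growing the suffix one label at a time *)
  let rec grow acc suffix = function
    | [] -> List.rev acc
    | label :: rest ->
      let zone = if suffix = "" then label else label ^ "." ^ suffix in
      grow (zone :: acc) zone rest
  in
  "." :: grow [] "" (List.rev labels)
```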
The building blocks of DNS are resolvers and authoritative servers. A stub
resolver (`gethostbyname` provided by your C library) or a caching forwarding
resolver (at your ISP) sends DNS packets to another resolver, whereas a
recursive resolver, once seeded with the root servers, finds out the IP address
of a requested domain name itself. Authoritative servers reply to requests for
their configured domains.
|
To establish some terminology: a DNS client sends a query, consisting of a domain
name and a query type, and expects a set of answers, called resource records,
each of which contains a name, a time to live, a type, and data. The resolver
iteratively requests resource records from authoritative servers until the
requested domain name is resolved or resolution fails (name does not exist,
server failure, server offline).
|
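A minimal sketch of this iterative process: start at a root server and follow referrals until an answer or an error arrives. The in-memory `ask` function, the server names below the root, and the `response` type are hypothetical stand-ins for real network queries; caching, CNAME handling, and retrying other servers are left out.

```OCaml
(* Hypothetical, simplified replies of an authoritative server. *)
type response =
  | Answer of string         (* the resolved IPv4 address, as text *)
  | Referral of string list  (* name servers responsible for a subzone *)
  | Nx_domain                (* the name does not exist *)

(* In-memory stand-in for sending a query for [name] to server [ns]. *)
let ask ns name =
  match ns, name with
  | "a.root-servers.net", _ -> Referral [ "a.gtld-servers.net" ]
  | "a.gtld-servers.net", "www.example.com" -> Referral [ "ns1.example.com" ]
  | "ns1.example.com", "www.example.com" -> Answer "192.0.2.1"
  | _ -> Nx_domain

(* Start at the root and follow referrals; [max_steps] bounds the number of
   queries so a delegation loop cannot make us run forever. Only the first
   listed server is asked in this sketch. *)
let resolve ?(max_steps = 10) name =
  let rec step n servers =
    if n = 0 then Error "too many referrals"
    else
      match servers with
      | [] -> Error "no name server to ask"
      | ns :: _ ->
        (match ask ns name with
         | Answer ip -> Ok ip
         | Referral next -> step (n - 1) next
         | Nx_domain -> Error "name does not exist")
  in
  step max_steps [ "a.root-servers.net" ]
```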
DNS usually uses UDP as transport, which is unreliable and limited to a 512 byte
payload on the Internet (due to various middleboxes). DNS can also be
transported via TCP, and even via TLS (over TCP, or as DTLS over UDP). If a DNS
packet transferred via UDP would be larger than 512 bytes, it is cut at the 512
byte mark, and the truncation (TC) bit in its header is set. The receiver can
decide whether to use the 512 bytes of information, or to throw them away and
attempt a TCP connection.
|
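A sketch of the two transport details above, working on raw bytes: checking the truncation (TC) bit of a UDP reply to decide whether to retry over TCP, and framing a message for TCP, where RFC 1035 (section 4.2.2) prefixes each message with a two byte length. The function names are hypothetical; `Bytes.get_uint16_be` and `Bytes.set_uint16_be` need OCaml 4.08 or later.

```OCaml
(* The TC flag is bit 9 (mask 0x0200) of the 16 bit flags word, which
   follows the 16 bit identifier at offset 2. *)
let truncated (msg : Bytes.t) : bool =
  Bytes.length msg >= 4 && Bytes.get_uint16_be msg 2 land 0x0200 <> 0

(* Over TCP, each DNS message is preceded by its length as a 16 bit
   big-endian integer, so the receiver knows where the message ends. *)
let frame_for_tcp (msg : Bytes.t) : Bytes.t =
  let len = Bytes.length msg in
  if len > 0xFFFF then invalid_arg "DNS message too large for TCP framing";
  let framed = Bytes.create (len + 2) in
  Bytes.set_uint16_be framed 0 len;
  Bytes.blit msg 0 framed 2 len;
  framed
```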
### DNS packet

The packet encoding starts with a 16bit identifier followed by a 16bit header
(containing operation, flags, and status code), and four 16bit counters
specifying the number of entries in each section of the body: questions,
answers, authority records, and additional records. The header starts with one
bit for the operation (query or response), four bits of opcode, various flags
(recursion, authoritative, truncation, ...), and the last four bits encode the
response code.
|
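A sketch of decoding the fixed 12 byte header laid out above (2 bytes identifier, 2 bytes flags, four 2 byte counters). The record and its field names are illustrative, not µDNS's types; the bit positions are those of RFC 1035, section 4.1.1. It needs OCaml 4.08 or later for `Bytes.get_uint16_be`.

```OCaml
(* Illustrative header record; bits 6 to 4 of the flags are reserved and
   ignored here. *)
type header = {
  id : int;                   (* 16 bit identifier *)
  response : bool;            (* operation bit: false = query, true = response *)
  opcode : int;               (* 4 bits *)
  authoritative : bool;       (* AA flag *)
  truncated : bool;           (* TC flag *)
  recursion_desired : bool;   (* RD flag *)
  recursion_available : bool; (* RA flag *)
  rcode : int;                (* response code, last 4 bits *)
  questions : int;            (* the four 16 bit counters *)
  answers : int;
  authority : int;
  additional : int;
}

let decode_header (buf : Bytes.t) : (header, string) result =
  if Bytes.length buf < 12 then Error "header needs 12 bytes"
  else
    let u16 off = Bytes.get_uint16_be buf off in
    let flags = u16 2 in
    let bit n = flags land (1 lsl n) <> 0 in
    Ok {
      id = u16 0;
      response = bit 15;
      opcode = (flags lsr 11) land 0xF;
      authoritative = bit 10;
      truncated = bit 9;
      recursion_desired = bit 8;
      recursion_available = bit 7;
      rcode = flags land 0xF;
      questions = u16 4;
      answers = u16 6;
      authority = u16 8;
      additional = u16 10;
    }
```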
A question consists of a domain name, a query type, and a query class. A
resource record additionally contains a 32bit time to live, a 16bit data length,
and the data.
|
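Going the other way, a minimal sketch of encoding a question: the name as length-prefixed labels terminated by a zero byte, followed by the 16 bit query type and class. It does not compress names and uses the standard numeric codes (A = 1, IN = 1); the function name is hypothetical, and `Buffer.add_uint16_be` needs OCaml 4.08 or later.

```OCaml
(* Encode a question entry without name compression. Hypothetical helper;
   rejects labels over 63 bytes and names over 255 bytes. *)
let encode_question (name : string) ~(qtype : int) ~(qclass : int) :
    (string, string) result =
  let labels = List.filter (fun l -> l <> "") (String.split_on_char '.' name) in
  (* encoded size: a length byte per label, the labels, and the final zero *)
  let size = List.fold_left (fun acc l -> acc + 1 + String.length l) 1 labels in
  if List.exists (fun l -> String.length l > 63) labels then Error "label too long"
  else if size > 255 then Error "name too long"
  else begin
    let buf = Buffer.create (size + 4) in
    List.iter
      (fun l ->
        Buffer.add_char buf (Char.chr (String.length l)); (* length prefix *)
        Buffer.add_string buf l)
      labels;
    Buffer.add_char buf '\000';      (* empty root label ends the name *)
    Buffer.add_uint16_be buf qtype;  (* e.g. 1 for A *)
    Buffer.add_uint16_be buf qclass; (* 1 for IN *)
    Ok (Buffer.contents buf)
  end
```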
Each domain name is a case-insensitive string of up to 255 bytes, separated by
`.` into labels of up to 63 bytes each. A label is either encoded by its length
followed by the content, or by an offset to the start of a label in the current
DNS frame (poor man's compression). Care must be taken during decoding to avoid
cycles in offsets. Common operations on domain names, such as comparison
(equality and ordering) and checking whether some domain name is a subdomain of
another, should be efficient. My initial, naïve representation was a list of
strings; now it is an array of strings in reverse order. This speeds up common
operations by a factor of 5 (see test/bench.ml).
|
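A minimal sketch of such a representation, assuming a name is stored as an array of lowercased labels in reverse order (so "www.example.com" becomes `[| "com"; "example"; "www" |]`). With that layout, the subdomain check is a prefix test on arrays, and ordering compares labels from the top of the hierarchy downwards. This is an illustration, not the µDNS Dns_name module.

```OCaml
(* Sketch: a domain name as an array of lowercased labels, root side first. *)
type name = string array

let of_string (s : string) : name =
  String.split_on_char '.' (String.lowercase_ascii s)
  |> List.filter (fun l -> l <> "")
  |> List.rev
  |> Array.of_list

(* [sub] is below [parent] if [parent]'s labels are a prefix of [sub]'s
   (a name counts as a subdomain of itself here). *)
let is_subdomain ~(sub : name) ~(parent : name) : bool =
  let n = Array.length parent in
  Array.length sub >= n
  && (let rec check i = i >= n || (sub.(i) = parent.(i) && check (i + 1)) in
      check 0)

(* Compare label by label from the root side; a name that is a prefix of a
   longer one sorts first. *)
let compare_name (a : name) (b : name) : int =
  let la = Array.length a and lb = Array.length b in
  let rec go i =
    if i = la || i = lb then Int.compare la lb
    else
      match String.compare a.(i) b.(i) with
      | 0 -> go (i + 1)
      | c -> c
  in
  go 0
```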
The only class really used is `IN` (for Internet), as mentioned in [RFC
6895](https://tools.ietf.org/html/rfc6895). Various query types (`MD`, `MF`,
`MB`, `MG`, `MR`, `NULL`, `AFSDB`, ...) are barely or never used. There is no
need to complicate the implementation and its API with these legacy options (if
you have a use case and see those in the wild, please tell me).
|
My packet decoder performs decompression, only accepts valid Internet domain
names, and may return a partial parse, in order to use as many resource records
from truncated packets as possible. No exceptions are raised; the parser uses
monadic error handling. Since label decompression requires the parser to know
absolute offsets, the original buffer and the offset are passed around
explicitly at all times, instead of using smaller views on the buffer. The
decoder does not allow for gaps: when the outer resource data length specifies a
byte length which is not completely consumed by the specific resource data
subparser (an A record must always consume exactly four bytes), decoding fails.
Failing to check this could be abused to exfiltrate data unnoticed.
|
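To illustrate decompression with absolute offsets and result-based error handling, here is a minimal sketch (not the µDNS decoder) that reads a possibly compressed name from the whole message, starting at an absolute offset, and returns the labels plus the offset just after the name. To rule out offset cycles, it only accepts pointers that jump strictly before the start of the previously read segment; this conservative rule is an assumption of the sketch.

```OCaml
(* Decode a possibly compressed name from [buf] at absolute offset [off].
   Returns the labels and the offset right after the name. *)
let decode_name (buf : string) (off : int) : (string list * int, string) result =
  let len = String.length buf in
  (* [bound]: a pointer must jump strictly before it, so jumps always move
     backwards and cannot cycle. [after] is fixed by the first pointer. *)
  let rec go labels total pos bound after =
    if pos >= len then Error "name runs past end of buffer"
    else begin
      let b = Char.code buf.[pos] in
      match b land 0xC0 with
      | 0 when b = 0 ->
        (* root label: the name is complete *)
        let after = match after with Some a -> a | None -> pos + 1 in
        Ok (List.rev labels, after)
      | 0 ->
        (* ordinary label of length b (at most 63, the top bits are 0) *)
        let total = total + b + 1 in
        if pos + 1 + b > len then Error "label runs past end of buffer"
        else if total > 255 then Error "name longer than 255 bytes"
        else go (String.sub buf (pos + 1) b :: labels) total (pos + 1 + b) bound after
      | 0xC0 ->
        (* compression pointer: 14 bit absolute offset into the message *)
        if pos + 1 >= len then Error "truncated pointer"
        else begin
          let target = ((b land 0x3F) lsl 8) lor Char.code buf.[pos + 1] in
          if target >= bound then Error "pointer does not point backwards"
          else
            let after = match after with None -> Some (pos + 2) | a -> a in
            go labels total target target after
        end
      | _ -> Error "unsupported label type"
    end
  in
  (* total starts at 1 for the terminating zero byte *)
  go [] 1 off off None
```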
Each zone (a served domain name) contains a SOA ("start of authority") entry,
which includes the primary name server's name, the hostmaster's email address
(both encoded as domain names), a serial number of the zone, and refresh, retry,
expiry, and minimum intervals (all encoded as 32bit unsigned numbers in
seconds). Common resource records include A, whose payload is a 32bit IPv4
address. A name server (NS) record carries a domain name as payload. A mail
exchange (MX) record's payload is a 16bit priority and a domain name. A CNAME
record is an alias to another domain name. These days, there are even
certificate authority authorisation (CAA) records, containing a flag
(critical), a tag ("issue"), and a value ("letsencrypt.org").
|
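The record types above could be modelled as a plain OCaml variant, shown here with a printer producing zone-file-like text. This is a simplified sketch: strings stand in for proper IPv4 address and domain name types (µDNS uses dedicated types such as `Ipaddr.V4.t` and `Dns_name.t`), and the constructor names are hypothetical.

```OCaml
(* Simplified SOA payload; intervals are in seconds. *)
type soa = {
  nameserver : string;  (* primary name server *)
  hostmaster : string;  (* email address, encoded as a domain name *)
  serial : int32;
  refresh : int32;
  retry : int32;
  expiry : int32;
  minimum : int32;
}

(* Simplified resource record payloads. *)
type rdata =
  | Soa of soa
  | A of string                    (* IPv4 address *)
  | Ns of string                   (* name server's domain name *)
  | Mx of int * string             (* 16 bit priority and mail exchange *)
  | Cname of string                (* alias target *)
  | Caa of bool * string * string  (* critical flag, tag, value *)

(* Render a payload roughly as it would appear in a zone file;
   %lu prints an int32 as unsigned. *)
let to_string = function
  | Soa s ->
    Printf.sprintf "SOA %s %s %lu %lu %lu %lu %lu" s.nameserver s.hostmaster
      s.serial s.refresh s.retry s.expiry s.minimum
  | A ip -> "A " ^ ip
  | Ns name -> "NS " ^ name
  | Mx (prio, name) -> Printf.sprintf "MX %d %s" prio name
  | Cname name -> "CNAME " ^ name
  | Caa (critical, tag, value) ->
    (* the critical flag is the 128 bit of the CAA flags byte *)
    Printf.sprintf "CAA %d %s \"%s\"" (if critical then 128 else 0) tag value
```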
## Server

The operation of a DNS server is to listen for a request and serve a reply.
Data to be served can be canonically encoded (the RFC describes the format) in a
zone file. Apart from vulnerabilities in DNS server implementations, another
attack vector is amplification: an attacker crafts a small UDP frame with a
spoofed source IP address, and the server sends a large response to that
address, which can be used for a denial of service (DoS) attack on its owner.
Various mitigations exist, including rate limiting and serving large replies
only via TCP.
|
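One of the mitigations mentioned, serving large replies only via TCP, can be sketched as follows: if the encoded UDP reply would exceed a size limit, the server instead sends just the header and question with the TC bit set, so an honest client retries over TCP and a spoofed victim only receives a small packet. The `reply` type and the default limit are assumptions of this sketch, not µDNS's API.

```OCaml
(* Hypothetical, simplified reply: the encoded 12 byte header, the encoded
   question section, and the encoded answer/authority/additional sections. *)
type reply = { header : Bytes.t; question : string; sections : string }

(* Reply over UDP: if the full reply exceeds [max_udp] bytes, send only the
   header and question, with the TC flag set and the other counters zeroed. *)
let udp_reply ?(max_udp = 512) (r : reply) : string =
  let full_len =
    Bytes.length r.header + String.length r.question + String.length r.sections
  in
  if full_len <= max_udp then Bytes.to_string r.header ^ r.question ^ r.sections
  else begin
    let h = Bytes.copy r.header in
    (* set TC (0x0200) in the flags word at offset 2 *)
    Bytes.set_uint16_be h 2 (Bytes.get_uint16_be h 2 lor 0x0200);
    (* answer, authority, and additional counts live at offsets 6, 8, 10 *)
    List.iter (fun off -> Bytes.set_uint16_be h off 0) [ 6; 8; 10 ];
    Bytes.to_string h ^ r.question
  end
```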
Internally, the zone file data is stored in a tree (module
[Dns_trie](https://github.com/roburio/udns/blob/master/server/dns_trie.mli),
[implementation](https://github.com/roburio/udns/blob/master/server/dns_trie.ml)),
where each node contains two maps: `sub`, whose key is a label and whose value
is a subtree, and `dns_map` (module Dns_map), whose key is a resource record type
and whose value is the resource record. Both use the OCaml
[Map](http://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.html) ("also known
as finite maps or dictionaries, given a total ordering function over the
keys. All operations over maps are purely applicative (no side-effects). The
implementation uses balanced binary trees, and therefore searching and insertion
take time logarithmic in the size of the map").
|
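A stripped-down sketch of such a tree using OCaml's Map: each node has a `sub` map from label to subtree and a `dns_map` from record type to data. Here the record type is a plain string and the data a string list, standing in for Dns_map's typed keys and values; `insert` and `lookup` are hypothetical names, not Dns_trie's interface, and delegation, wildcards, and CNAMEs are not handled. Paths list labels from the root side, matching the reversed order described earlier.

```OCaml
module SMap = Map.Make (String)

(* A node: subtrees keyed by label, records keyed by type (e.g. "A"). *)
type trie = { sub : trie SMap.t; dns_map : string list SMap.t }

let empty = { sub = SMap.empty; dns_map = SMap.empty }

(* [path] lists labels from the root side, e.g. ["com"; "example"; "www"]. *)
let rec insert path rtype data t =
  match path with
  | [] -> { t with dns_map = SMap.add rtype data t.dns_map }
  | label :: rest ->
    let child = Option.value (SMap.find_opt label t.sub) ~default:empty in
    { t with sub = SMap.add label (insert rest rtype data child) t.sub }

(* Walk down to the node for [path], then look up [rtype] in its map. *)
let rec lookup path rtype t =
  match path with
  | [] -> SMap.find_opt rtype t.dns_map
  | label :: rest ->
    (match SMap.find_opt label t.sub with
     | None -> None
     | Some child -> lookup rest rtype child)
```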
The server looks up the queried name, and then the queried type in the returned
Dns_map. The found resource records are sent as the answer, together with the
question, authority information (NS records of the zone), and additional glue
records (IP addresses of names mentioned earlier in the reply which are in the
same zone).
|
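The assembly of a reply can be sketched over a flat list of records: the matching records form the answer, the zone's NS records the authority section, and A records of those name servers which are inside the zone the glue. The `rr` record, the `zone` argument, and the naive, case-sensitive suffix test for "inside the zone" are simplifications for illustration.

```OCaml
(* A resource record as (name, type, data); a simplification for this sketch. *)
type rr = { name : string; rtype : string; data : string }

(* Naive check whether [name] equals [zone] or lies below it. *)
let in_zone ~zone name =
  let lz = String.length zone and ln = String.length name in
  name = zone || (ln > lz && String.sub name (ln - lz - 1) (lz + 1) = "." ^ zone)

(* Answer records, authority (the zone's NS records), and glue (A records of
   those name servers which are inside the zone). *)
let reply ~zone (records : rr list) (qname, qtype) =
  let find n t = List.filter (fun r -> r.name = n && r.rtype = t) records in
  let answer = find qname qtype in
  let authority = find zone "NS" in
  let additional =
    List.concat
      (List.map
         (fun ns -> if in_zone ~zone ns.data then find ns.data "A" else [])
         authority)
  in
  (answer, authority, additional)
```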
### Dns_map

Dns_map is the data structure which maps resource record types (keys) to a
collection of matching resource records (values). In OCaml the value type of a
map must be homogeneous; using a normal sum type leads to an unnecessary
unpacking step (or to lost type information):
|
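```OCaml
(* M is a map keyed by record type, with a plain sum type as value: each
   lookup has to unpack the constructor again *)
let lookup_ns t =
  match M.find_opt NS t with
  | None -> Error `NotFound
  | Some (NS nameservers) -> Ok nameservers
  | Some _ -> Error `NotFound (* unreachable, but the type checker cannot know *)
```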
|
Instead, in my current rewrite I use [generalized algebraic data
types](https://en.wikipedia.org/wiki/Generalized_algebraic_data_type) (read the
[OCaml manual](http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251),
[Mads Hartmann's blog post about use cases for
GADTs](http://mads-hartmann.com/ocaml/2015/01/05/gadt-ocaml.html), and [Andreas
Garnæs about using GADTs for GraphQL type
modifiers](https://andreas.github.io/2018/01/05/modeling-graphql-type-modifiers-with-gadts/))
to preserve the relation between key and value (an A record has a list of IPv4
addresses and a TTL as value). This is similar to
[hmap](http://erratique.ch/software/hmap), but different: a closed key-value
mapping (the GADT), no int for each key, and no mutable state. Thanks to Justus
Matthiesen for helping me with GADTs and this code. Look into the
[interface](https://github.com/roburio/udns/blob/master/src/dns_map.mli) and
[implementation](https://github.com/roburio/udns/blob/master/src/dns_map.ml).
|
```OCaml
(* an ordering relation, I dislike using int for that *)
module Order = struct
  type (_,_) t =
    | Lt : ('a, 'b) t
    | Eq : ('a, 'a) t
    | Gt : ('a, 'b) t
end

module Key = struct
  (* The key and its value type *)
  type _ t =
    | Soa : (int32 * Dns_packet.soa) t
    | A : (int32 * Ipaddr.V4.t list) t
    | Ns : (int32 * Dns_name.DomSet.t) t
    | Cname : (int32 * Dns_name.t) t

  (* we need a total order on our keys *)
  let compare : type a b. a t -> b t -> (a, b) Order.t = fun t t' ->
    let open Order in
    match t, t' with
    | Cname, Cname -> Eq | Cname, _ -> Lt | _, Cname -> Gt
    | Ns, Ns -> Eq | Ns, _ -> Lt | _, Ns -> Gt
    | Soa, Soa -> Eq | Soa, _ -> Lt | _, Soa -> Gt
    | A, A -> Eq
end

type 'a key = 'a Key.t

(* our OCaml Map with an encapsulated constructor as key *)
type k = K : 'a key -> k
module M = Map.Make(struct
    type t = k
    (* the price I pay for not using int as three-state value *)
    let compare (K a) (K b) = match Key.compare a b with
      | Order.Lt -> -1
      | Order.Eq -> 0
      | Order.Gt -> 1
  end)

(* v contains a key and value pair, wrapped by a single constructor *)
type v = V : 'a key * 'a -> v

(* t is the main type of a Dns_map, used by clients *)
type t = v M.t

(* retrieve a typed value out of the store; raises Not_found if k is unbound *)
let get : type a. a Key.t -> t -> a = fun k t ->
  match M.find (K k) t with
  | V (k', v) ->
    (* this comparison is superfluous, just for the types *)
    match Key.compare k k' with
    | Order.Eq -> v
    | _ -> assert false
```
|
This helps me to programmatically retrieve tightly typed values from the cache,
which is important when code depends on concrete values (e.g. when there are
domain names, look these up as well and add them as additional records). Look
into [server/dns_server.ml](https://github.com/roburio/udns/blob/master/server/dns_server.ml).
|
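To try the idea without the µDNS dependencies, here is a cut-down, self-contained variant of the Dns_map code above, with stdlib types standing in for the real ones (string lists and strings instead of `Ipaddr.V4.t` and `Dns_name.t`) and only two keys. The `add` and `find` functions are hypothetical additions for this sketch; the point is that `find` returns a value whose type is determined by the key, with no unpacking step.

```OCaml
module Order = struct
  type (_, _) t = Lt : ('a, 'b) t | Eq : ('a, 'a) t | Gt : ('a, 'b) t
end

module Key = struct
  (* each key fixes its value type: a TTL and some payload *)
  type _ t =
    | A : (int32 * string list) t  (* addresses, as strings here *)
    | Cname : (int32 * string) t   (* alias target *)

  let compare : type a b. a t -> b t -> (a, b) Order.t = fun k k' ->
    match k, k' with
    | Cname, Cname -> Order.Eq
    | Cname, _ -> Order.Lt
    | _, Cname -> Order.Gt
    | A, A -> Order.Eq
end

type k = K : 'a Key.t -> k

module M = Map.Make (struct
    type t = k
    let compare (K a) (K b) =
      match Key.compare a b with Order.Lt -> -1 | Order.Eq -> 0 | Order.Gt -> 1
  end)

type v = V : 'a Key.t * 'a -> v

let add : type a. a Key.t -> a -> v M.t -> v M.t =
  fun k v t -> M.add (K k) (V (k, v)) t

(* [None] if the key is absent; the result type follows from the key *)
let find : type a. a Key.t -> v M.t -> a option = fun k t ->
  match M.find_opt (K k) t with
  | None -> None
  | Some (V (k', v)) ->
    (match Key.compare k k' with Order.Eq -> Some v | _ -> None)
```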
### Dynamic updates, notifications, and authentication

[Dynamic updates](https://tools.ietf.org/html/rfc2136) specify in-protocol
record updates (supported for example by `nsupdate` from ISC bind-tools).
[Notifications](https://tools.ietf.org/html/rfc1996) are used by primary servers
to notify secondary servers about updates, which then initiate a [zone
transfer](https://tools.ietf.org/html/rfc5936) to retrieve up-to-date
data. [Shared HMAC secrets](https://tools.ietf.org/html/rfc2845) are used to
ensure that a transaction (update, zone transfer) was authorised. These are
all protocol extensions, so there is no need for out-of-protocol solutions.
|
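When a secondary receives a notification, it checks whether the primary's SOA serial is newer than its own before transferring the zone. Serials are 32 bit unsigned numbers that may wrap around, so the comparison uses serial number arithmetic (RFC 1982). A minimal sketch using OCaml's `Int32`, whose subtraction wraps modulo 2^32; the function name is hypothetical.

```OCaml
(* [newer ~current candidate]: is [candidate] a newer serial than [current]?
   Int32.sub wraps modulo 2^32, so a positive difference means [candidate]
   is ahead by less than 2^31. A difference of exactly 2^31 is undefined in
   RFC 1982 and treated as "not newer" here. *)
let newer ~(current : int32) (candidate : int32) : bool =
  Int32.compare (Int32.sub candidate current) 0l > 0
```

A secondary would then request a zone transfer only if `newer ~current:local_serial primary_serial` holds (both names hypothetical).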
The server logic for update and zone transfer frames is slightly more complex,
and includes a dependency on an authenticator (implemented using the
[nocrypto](https://github.com/mirleft/ocaml-nocrypto) library and
[ptime](http://erratique.ch/software/ptime)).
|
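To give an idea of what such an authenticator checks, here is a simplified sketch of the two TSIG checks (RFC 2845): a MAC over the message computed with the shared secret, and a check that the signing time lies within the allowed fudge. It implements HMAC (RFC 2104) over the stdlib's MD5 `Digest`, since HMAC-MD5 was TSIG's original algorithm. Real TSIG also covers further fields (key name, algorithm name, time signed, fudge, ...) and µDNS uses nocrypto and ptime instead, so treat this as an illustration only; the function names are hypothetical.

```OCaml
(* HMAC (RFC 2104) with MD5 from the stdlib Digest module; block size 64. *)
let hmac_md5 ~(key : string) (msg : string) : string =
  let block = 64 in
  (* keys longer than a block are hashed first, then zero padded *)
  let key = if String.length key > block then Digest.string key else key in
  let key = key ^ String.make (block - String.length key) '\000' in
  let xor_with pad = String.map (fun c -> Char.chr (Char.code c lxor pad)) key in
  let inner = Digest.string (xor_with 0x36 ^ msg) in
  Digest.string (xor_with 0x5c ^ inner)

(* Accept a signed message if the signing time lies within [fudge] seconds
   of [now] and the MAC matches. A real implementation should compare the
   MACs in constant time. *)
let verify ~key ~now ~time_signed ~fudge ~mac msg =
  abs (now - time_signed) <= fudge && Digest.equal (hmac_md5 ~key msg) mac
```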
### Deployment and Let's Encrypt

To deploy servers without much persistent data, an authentication scheme is
hardcoded in the dns-server: shared secrets are also stored as DNS entries
(DNSKEY), and `_transfer.zone`, `_update.zone`, and `_key-management.zone` names
are introduced to encode the permissions. A `_transfer` key also needs to
encode the IP address of the primary (to know where to request zone transfers)
and of the secondary (to know where to send notifications).
|
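How such key names could be mapped to permissions: a sketch assuming a key is named `<keyname>._update.<zone>` (and likewise for `_transfer` and `_key-management`), so that the first operation label decides the permission and the labels after it name the zone. This naming layout is an assumption for illustration; the exact format µDNS expects is defined in its source and examples.

```OCaml
type permission = Transfer | Update | Key_management

let permission_of_label = function
  | "_transfer" -> Some Transfer
  | "_update" -> Some Update
  | "_key-management" -> Some Key_management
  | _ -> None

(* Split a key name like "foo._update.example.com" into the permission it
   grants and the zone it applies to ("example.com"). Hypothetical layout. *)
let permission_of_key (key_name : string) : (permission * string) option =
  let rec scan = function
    | [] -> None
    | label :: rest ->
      (match permission_of_label label with
       | Some p when rest <> [] -> Some (p, String.concat "." rest)
       | _ -> scan rest)
  in
  scan (String.split_on_char '.' key_name)
```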
Please have a look at
[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary) and the [examples](https://github.com/roburio/udns/blob/master/mirage/examples) for more details. The shared secrets are provided as boot parameters of the unikernel.
|
I hacked maker's
[ocaml-letsencrypt](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate)
library to use µDNS and send update frames to a given IP address. I have
already used this to have Let's Encrypt issue various certificates for my domains.
|
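For the Let's Encrypt use, the relevant piece is the ACME dns-01 challenge (RFC 8555): the client proves control of a domain by publishing a TXT record at `_acme-challenge.<domain>`, whose value is the base64url-encoded SHA-256 digest of the key authorization. A small sketch of the update such a client sends; the `op` type is a hypothetical stand-in for a real dynamic update frame, the 60 second TTL is an arbitrary choice, and the digest is assumed to be computed elsewhere (the OCaml stdlib has no SHA-256).

```OCaml
(* Hypothetical, simplified update operations (RFC 2136 style). *)
type op =
  | Delete_rrset of string * string          (* name, record type *)
  | Add of string * int32 * string * string  (* name, TTL, type, data *)

(* Replace any old challenge TXT record with the new [digest] value. *)
let dns01_update ~(domain : string) ~(digest : string) : op list =
  let name = "_acme-challenge." ^ domain in
  [ Delete_rrset (name, "TXT"); Add (name, 60l, "TXT", digest) ]
```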
There is no persistent storage of updates yet, but this can be realised by
implementing a secondary (which is notified on update) that writes every new
zone to persistent storage (e.g. [disk](https://github.com/mirage/mirage-block)
or [git](https://github.com/mirage/ocaml-git)). I also plan to build an
automated Let's Encrypt certificate unikernel which listens for certificate
signing requests and stores signed certificates in DNS. Luckily the year has
only just started and there's plenty of time left.
|
I'm interested in feedback, either via <strike>[twitter](https://twitter.com/h4nnes)</strike>
hannesm@mastodon.social or an issue on the [data
repository](https://github.com/hannesm/hannes.nqsb.io/issues).

@@ -44,8 +44,8 @@ create virtual network interfaces and execute virtual machines.
To get rid of these ad-hoc shell scripts and copying of virtual machine images,
I developed an UNIX daemon which accomplishes the required work. This daemon
waits for (mutually!) authenticated network connections, and provides the
-desired commands; to create a new virtual machine, to aquire a block device of a
-given size, to destroy a virtual machine, to stream the console output of a
+desired commands; to create a new virtual machine, to acquire a block device of
+a given size, to destroy a virtual machine, to stream the console output of a
virtual machine.

## System design

@@ -87,7 +87,7 @@ virtual machines.
Connecting to the vmmd requires a TLS client, a CA certificate, a leaf
certificate (and the delegation chain) and its private key. In the background,
it is a multi-step process using TLS: first, the client establishes a TLS
-connection where it authenticates the server using the CA certificae, then the
+connection where it authenticates the server using the CA certificate, then the
server demands a TLS renegotiation where it requires the client to authenticate
with its leaf certificate and private key. Using renegotiation over the
encrypted channel prevents passive observers to see the client certificate in

@@ -18,6 +18,10 @@ html a:visited {
color: black;
}

+p {
+line-height: 1.6;
+}
+
.navbar {
box-sizing: border-box;
box-shadow: 0 2px 2px -2px rgba(0,0,0,.15);

@@ -61,6 +65,7 @@ footer {

pre {
padding: 0px;
line-height: 1.3;
}

body h2 {