diff --git a/Posts/Conex b/Posts/Conex index c34be73..2e3b7ae 100644 --- a/Posts/Conex +++ b/Posts/Conex @@ -229,7 +229,7 @@ conex_author: [ERROR] package index arp was not found in repository This shows your key material and accounts, team membership and packages you are authorised to modify (inferred as described -[here](https://hannes.nqsb.io/Posts/Maintainer). +[here](https://hannes.nqsb.io/Posts/Maintainers). The `--noteam` argument limits the package list to only these you are personally authorised for. The `--id` argument presents you with a view of another author, diff --git a/Posts/DNS b/Posts/DNS new file mode 100644 index 0000000..71c64a3 --- /dev/null +++ b/Posts/DNS @@ -0,0 +1,311 @@ +--- +title: My 2018 contains robur and starts with re-engineering DNS +author: hannes +tags: mirageos, protocol +abstract: New year brings new possibilities and a new environment. I've been working on the most widely deployed key-value store, the domain name system. Primary and secondary name services are available, including dynamic updates, notify, and TSIG authentication. +--- + +## 2018 + +At the end of 2017, I resigned from my PostDoc position at the University of +Cambridge (in the [rems](https://www.cl.cam.ac.uk/~pes20/rems/) project). Early +December 2017 I organised the [4th MirageOS hack +retreat](https://mirage.io/blog/2017-winter-hackathon-roundup), with which I'm +very satisfied. In March 2018 the [5th retreat](http://retreat.mirage.io) will +happen (please sign up!). + +In 2018 I moved to Berlin and started to work for the (non-profit) [Centre for +the cultivation of technology](https://techcultivation.org) with our +[robur.io](http://robur.io) project: "At robur, we build performant bespoke +minimal operating systems for high-assurance services". robur is only possible +thanks to generous donations in autumn 2017, enthusiastic collaborators, supportive +friends, and a motivated community; thanks to all. 
We will receive funding from +the [prototypefund](https://prototypefund.de/project/robur-io/) to work on a +[CalDAV server](http://robur.io/Projects/CalDAV) implementation in OCaml +targeting MirageOS. We're still looking for donations and further funding; +please get in touch. Apart from CalDAV, I want to start the year by finishing +several projects which I discovered on my hard drive. This includes DNS, [opam +signing](/Posts/Conex), TCP, ... My personal goal for 2018 is to develop a +flexible `mirage deploy`, because after configuring and building a unikernel, I +want to get it smoothly up and running (spoiler: I already use +[albatross](/Posts/VMM) in production). + +To kick off this year (3% of 2018 is already used), I'll talk in more detail +about [µDNS](https://github.com/roburio/udns), an opinionated from-scratch +re-engineered DNS library, which I've been using since Christmas 2017 in production for +[ns.nqsb.io](https://github.com/hannesm/ns.nqsb.io) and +[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary). The +development started in March 2017, and continued over several evenings and long +weekends. My initial motivation was to implement a recursive resolver to run on +my laptop. I had a working prototype in use on my laptop for over 4 months in +summer 2017, but that code was not in good shape, so I went down the rabbit +hole and (re)wrote a server (and learned more about GADTs). A configurable +resolver usually needs a server anyway, as a local overlay. Furthermore, +dynamic updates are standardised and thus a configuration interface exists +inside the protocol, even with HMAC signatures for authentication! +Coincidentally, I started to solve another issue, namely automated management of Let's +Encrypt certificates (see [this +branch](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate) for an +initial hack). 
On my journey, I also reported a cache poisoning vulnerability, +which was fixed in [Docker for +Windows](https://docs.docker.com/docker-for-windows/release-notes/#docker-community-edition-17090-ce-win32-2017-10-02-stable). + +But let's get started with some content. Please keep in mind that while the +code is publicly available, it is not yet released (mainly because the test +coverage is not high enough and documentation is lacking). I appreciate early +adopters; please let me know if you find any issues or find a use case which is +not straightforward to solve. This won't be the last article about DNS this +year - persistent storage, resolver, Let's Encrypt support are still missing. + +## What is DNS? + +The [domain name system](https://en.wikipedia.org/wiki/DNS) is a core Internet +protocol, which translates domain names to IP addresses. A domain name is +easier to memorise for human beings than an IP address. DNS is hierarchical and +decentralised. It was initially "specified" in Nov 1987 in [RFC +1034](https://tools.ietf.org/html/rfc1034) and [RFC +1035](https://tools.ietf.org/html/rfc1035). Nowadays it spans more than 20 +technical RFCs, 10 security related, 5 best current practices and another 10 +informational. The basic encoding and mechanisms did not change. + +On the Internet, there is a set of root servers (administered by IANA) which +provide the information about which name servers are authoritative for which top level +domain (such as ".com"). Those name servers in turn provide the information about which name servers are +responsible for which second level domain name (such as "example.com"), and so +on. There are at least two name servers for each domain name in separate +networks - in case one is unavailable the other can be reached. 
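
To make the delegation hierarchy concrete, here is a minimal sketch that lists the zones a lookup passes through, from the root down to the requested name. It is not part of µDNS: `delegation_chain` is a hypothetical helper, and it assumes a plain dot-separated name without escaping.

```OCaml
(* Hypothetical helper (not from µDNS): the chain of zones consulted when
   resolving [name], starting at the root zone ".". *)
let delegation_chain (name : string) : string list =
  (* split into labels, dropping the empty label of a trailing dot *)
  let labels =
    List.filter (fun l -> l <> "") (String.split_on_char '.' name)
  in
  (* walk from the rightmost label (the top level domain) to the left,
     growing the suffix by one label per step *)
  let _, chain =
    List.fold_left
      (fun (suffix, acc) label ->
         let zone = if suffix = "" then label else label ^ "." ^ suffix in
         (zone, zone :: acc))
      ("", []) (List.rev labels)
  in
  "." :: List.rev chain
```

For "www.example.com" this yields the root, then "com", then "example.com", then the full name, which mirrors the order in which a recursive resolver asks the respective name servers.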
+ +The building blocks for DNS are: the resolver, which is either a stub (`gethostbyname` provided +by your C library) or caching forwarding resolver (at your ISP) that sends DNS +packets to another resolver, or a recursive resolver which, once seeded with the +root servers, finds out the IP address of a requested domain name. The other +part is the authoritative servers, which reply to requests for their configured +domain. + +To introduce some terminology: a DNS client sends a query, consisting of a domain +name and a query type, and expects a set of answers, which are called resource +records, and contain: name, time to live, type, and data. The resolver +iteratively requests resource records from authoritative servers, until the requested +domain name is resolved or resolution fails (name does not exist, server +failure, server offline). + +DNS usually uses UDP as transport, which is not reliable and is limited to a 512 byte +payload on the Internet (due to various middleboxes). DNS can also be +transported via TCP, and even via TLS over UDP or TCP. If a DNS packet +transferred via UDP is larger than 512 bytes, it is cut at the 512 byte mark, +and the truncation bit in its header is set. The receiver can decide whether to use the 512 +bytes of information, or to throw it away and attempt a TCP connection. + +### DNS packet + +The packet encoding starts with a 16bit identifier followed by a 16bit header +(containing operation, flags, status code), and four counters, each 16bit, +specifying the number of resource records in the body: questions, answers, +authority records, and additional records. The header starts with a one bit +operation (query or response), four bits opcode, various flags (recursion, +authoritative, truncation, ...), and the last four bits encode the response code. + +A question consists of a domain name, a query type, and a query class. A +resource record additionally contains a 32bit time to live, a length, and the +data. 
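
The fixed 12 byte prefix described above can be decoded with a few shifts and masks. The following is a minimal sketch and not the µDNS decoder: the record type, its field names and `decode_header` are made up for illustration, and the bit positions of the individual flags are taken from RFC 1035 (section 4.1.1).

```OCaml
(* Illustrative sketch, not the µDNS API. *)
type header = {
  id : int;                    (* 16bit identifier *)
  response : bool;             (* operation bit: query or response *)
  opcode : int;                (* four bits *)
  authoritative : bool;
  truncated : bool;
  recursion_desired : bool;
  recursion_available : bool;
  rcode : int;                 (* last four bits: response code *)
  questions : int;
  answers : int;
  authority : int;
  additional : int;
}

(* read a big-endian 16bit unsigned integer at offset [off] *)
let u16 buf off = (Char.code buf.[off] lsl 8) lor Char.code buf.[off + 1]

let decode_header (buf : string) : (header, [> `Too_short ]) result =
  if String.length buf < 12 then Error `Too_short
  else
    let flags = u16 buf 2 in
    (* bit 15 is the most significant bit of the 16bit flags field *)
    let bit n = flags land (1 lsl n) <> 0 in
    Ok { id = u16 buf 0;
         response = bit 15;
         opcode = (flags lsr 11) land 0xF;
         authoritative = bit 10;
         truncated = bit 9;
         recursion_desired = bit 8;
         recursion_available = bit 7;
         rcode = flags land 0xF;
         questions = u16 buf 4;
         answers = u16 buf 6;
         authority = u16 buf 8;
         additional = u16 buf 10 }
```

Returning a `result` rather than raising keeps the sketch in line with the exception-free decoding style; a receiver would inspect `truncated` to decide whether to retry via TCP.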
+ +Each domain name is a case insensitive string of up to 255 bytes, separated by `.` +into labels of up to 63 bytes each. A label is either encoded by its length +followed by the content, or by an offset to the start of a label in the current +DNS frame (poor man's compression). Care must be taken during decoding to avoid +cycles in offsets. Common operations on domain names, namely comparison (equality, +ordering) and checking whether one domain name is a subdomain of another, +should be efficient. My initial, naïve representation was a list of +strings; now it is an array of strings in reverse order. This speeds up common +operations by a factor of 5 (see test/bench.ml). + +The only class really in use is `IN` (for Internet), as mentioned in [RFC +6895](https://tools.ietf.org/html/rfc6895). Various query types (`MD`, `MF`, +`MB`, `MG`, `MR`, `NULL`, `AFSDB`, ...) are barely or never used. There is no +need to complicate the implementation and its API with these legacy options (if +you have a use case and see those in the wild, please tell me). + +My packet decoding implementation does decompression, only allows valid Internet +domain names, and may return a partial parse - to use as many resource records +in truncated packets as possible. No exceptions are raised; the parsing +uses monadic style error handling. Since label decompression requires the +parser to know absolute offsets, the original buffer and the offset are manually +passed around at all times, instead of using smaller views on the buffer. The +decoder does not allow gaps, i.e. cases where the outer resource data length specifies a +byte length which is not completely consumed by the specific resource data +subparser (an A record must always consume four bytes). Failing to check this can +lead to a way to exfiltrate data without getting noticed. 
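
As an illustration of label decompression with a guard against offset cycles, here is a minimal sketch. It is not the µDNS decoder: `decode_name` and its error tags are hypothetical, and the guard (only accept offsets that point strictly before the place the current run of labels started) is one common way to guarantee termination, assumed here rather than taken from the library. The encoding of an offset as two bytes with the top two bits set follows RFC 1035.

```OCaml
(* Illustrative sketch: decode the name starting at [start] in [buf].
   Returns the labels and the offset just after the name in the stream. *)
let decode_name (buf : string) (start : int) =
  let len = String.length buf in
  let byte off =
    if off >= 0 && off < len then Some (Char.code buf.[off]) else None
  in
  (* [limit]: offsets must point strictly below it, so every jump goes
     further back and the recursion terminates.
     [next]: where parsing continues, fixed at the first jump.
     [total]: encoded size so far, to enforce the 255 byte bound. *)
  let rec go off limit next labels total =
    match byte off with
    | None -> Error `Bad_offset
    | Some 0 ->
      (* the zero length label terminates the name *)
      let next = match next with Some n -> n | None -> off + 1 in
      Ok (List.rev labels, next)
    | Some b when b land 0xC0 = 0xC0 ->
      (* compression: 14 bits of absolute offset into the frame *)
      (match byte (off + 1) with
       | None -> Error `Bad_offset
       | Some lo ->
         let target = ((b land 0x3F) lsl 8) lor lo in
         if target >= limit then Error `Bad_offset
         else
           let next =
             match next with Some _ -> next | None -> Some (off + 2)
           in
           go target target next labels total)
    | Some b when b land 0xC0 <> 0 -> Error `Bad_label
    | Some b ->
      (* plain label: length byte (at most 63) followed by the content *)
      let total = total + b + 1 in
      if off + 1 + b > len then Error `Bad_offset
      else if total > 254 then Error `Too_long
      else
        go (off + 1 + b) limit next
          (String.sub buf (off + 1) b :: labels) total
  in
  go start start None [] 0
```

The absolute offset into the original buffer is threaded through every call, which is the reason the text above gives for not working on smaller views of the buffer.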
+ +Each zone (a served domain name) contains a SOA "start of authority" entry, +which includes the primary nameserver name, the hostmaster's email address (both +encoded as domain names), a serial number of the zone, a refresh, retry, expiry, +and minimum interval (all encoded as 32bit unsigned numbers in seconds). Common +resource records include A, whose payload is a 32bit IPv4 address. A nameserver +(NS) record carries a domain name as payload. A mail exchange (MX) record's +payload is a 16bit priority and a domain name. A CNAME record is an alias to +another domain name. These days, there are even +certificate authority authorisation (CAA) records, containing a flag (critical), +a tag ("issue") and a value ("letsencrypt.org"). + +## Server + +The operation of a DNS server is to listen for a request and serve a reply. +Data to be served can be canonically encoded (the RFC describes the format) in a +zone file. Apart from insecurity in DNS server implementations, another attack +vector is amplification attacks, where an attacker crafts a small UDP frame +with a fake source IP address, and the server answers with a large response to +that address, which may lead to a DoS attack. Various mitigations exist +including rate limiting, serving large replies only via TCP, ... + +Internally, the zone file data is stored in a tree (module +[Dns_trie](https://github.com/roburio/udns/blob/master/server/dns_trie.mli) +[implementation](https://github.com/roburio/udns/blob/master/server/dns_trie.ml)), +where each node contains two maps: `sub`, whose key is a label and value is a +subtree and `dns_map` (module Dns_map), whose key is a resource record type and +value is the resource record. Both use the OCaml +[Map](http://caml.inria.fr/pub/docs/manual-ocaml/libref/Map.html) ("also known +as finite maps or dictionaries, given a total ordering function over the +keys. All operations over maps are purely applicative (no side-effects). 
The +implementation uses balanced binary trees, and therefore searching and insertion +take time logarithmic in the size of the map"). + +The server looks up the queried name, and in the returned Dns_map the queried +type. The found resource records are sent as answer, which also includes the +question and authority information (NS records of the zone) and additional glue +records (IP addresses of names mentioned earlier in the same zone). + +### Dns_map + +This data structure contains resource record types as keys, and a collection +of matching resource records as values. In OCaml the value type must be +homogeneous - using a normal sum type leads to an unnecessary unpacking step +(or lacking type information): + +```OCaml +let lookup_ns t = + match Map.find_opt NS t with + | None -> Error `NotFound + | Some (NS nameservers) -> Ok nameservers + | Some _ -> Error `NotFound +``` + +Instead, in my current rewrite I use [generalized algebraic data +types](https://en.wikipedia.org/wiki/Generalized_algebraic_data_type) (read +[OCaml manual](http://caml.inria.fr/pub/docs/manual-ocaml/extn.html#sec251) and +[Mads Hartmann blog post about use cases for +GADTs](http://mads-hartmann.com/ocaml/2015/01/05/gadt-ocaml.html), [Andreas +Garnæs about using GADTs for GraphQL type +modifiers](https://andreas.github.io/2018/01/05/modeling-graphql-type-modifiers-with-gadts/)) +to preserve a relation between key and value (an A record has a list of IPv4 +addresses and a ttl as value) - similar to +[hmap](http://erratique.ch/software/hmap), but different: a closed key-value +mapping (the GADT), no int for each key, and no mutable state. Thanks to Justus +Matthiesen for helping me with GADTs and this code. Look into the +[interface](https://github.com/roburio/udns/blob/master/src/dns_map.mli) and +[implementation](https://github.com/roburio/udns/blob/master/src/dns_map.ml). 
+ + +```OCaml +(* an ordering relation, I dislike using int for that *) +module Order = struct + type (_,_) t = + | Lt : ('a, 'b) t + | Eq : ('a, 'a) t + | Gt : ('a, 'b) t +end + +module Key = struct + (* The key and its value type *) + type _ t = + | Soa : (int32 * Dns_packet.soa) t + | A : (int32 * Ipaddr.V4.t list) t + | Ns : (int32 * Dns_name.DomSet.t) t + | Cname : (int32 * Dns_name.t) t + + (* we need a total order on our keys *) + let compare : type a b. a t -> b t -> (a, b) Order.t = fun t t' -> + let open Order in + match t, t' with + | Cname, Cname -> Eq | Cname, _ -> Lt | _, Cname -> Gt + | Ns, Ns -> Eq | Ns, _ -> Lt | _, Ns -> Gt + | Soa, Soa -> Eq | Soa, _ -> Lt | _, Soa -> Gt + | A, A -> Eq +end + +type 'a key = 'a Key.t + +(* our OCaml Map with an encapsulated constructor as key *) +type k = K : 'a key -> k +module M = Map.Make(struct + type t = k + (* the price I pay for not using int as three-state value *) + let compare (K a) (K b) = match Key.compare a b with + | Order.Lt -> -1 + | Order.Eq -> 0 + | Order.Gt -> 1 + end) + +(* v contains a key and value pair, wrapped by a single constructor *) +type v = V : 'a key * 'a -> v + +(* t is the main type of a Dns_map, used by clients *) +type t = v M.t + +(* retrieve a typed value out of the store *) +let get : type a. a Key.t -> t -> a = fun k t -> + match M.find (K k) t with + | V (k', v) -> + (* this comparison is superfluous, just for the types *) + match Key.compare k k' with + | Order.Eq -> v + | _ -> assert false +``` + +This helps me to programmatically retrieve tightly typed values from the cache, +which is important when code depends on concrete values (e.g. when a value contains domain +names, look these up as well and add them as additional records). 
Look into [server/dns_server.ml](https://github.com/roburio/udns/blob/master/server/dns_server.ml). + +### Dynamic updates, notifications, and authentication + +[Dynamic updates](https://tools.ietf.org/html/rfc2136) specify in-protocol +record updates (supported for example by `nsupdate` from ISC bind-tools); +[notifications](https://tools.ietf.org/html/rfc1996) are used by primary servers +to notify secondary servers about updates; the secondaries then initiate a [zone +transfer](https://tools.ietf.org/html/rfc5936) to retrieve up to date +data. [Shared HMAC secrets](https://tools.ietf.org/html/rfc2845) are used to +ensure that the transaction (update, zone transfer) was authorised. These are +all protocol extensions; there is no need to use out-of-protocol solutions. + +The server logic for update and zone transfer frames is slightly more complex, +and includes a dependency upon an authenticator (implemented using the +[nocrypto](https://github.com/mirleft/ocaml-nocrypto) library, and +[ptime](http://erratique.ch/software/ptime)). + +### Deployment and Let's Encrypt + +To deploy servers without much persistent data, an authentication scheme is +hardcoded in the dns-server: shared secrets are also stored as DNS entries +(DNSKEY), and `_transfer.zone`, `_update.zone`, and `_key-management.zone` names +are introduced to encode the permissions. A `_transfer` key also needs to +encode the IP address of the primary (to know where to request zone transfers) +and secondary IP (to know where to send notifications). + +Please have a look at +[ns.robur.io](https://git.robur.io/?p=ns.robur.io.git;a=summary) and the [examples](https://github.com/roburio/udns/blob/master/mirage/examples) for more details. The shared secrets are provided as boot parameters of the unikernel. + +I hacked maker's +[ocaml-letsencrypt](https://github.com/hannesm/ocaml-letsencrypt/tree/nsupdate) +library to use µDNS and send update frames to the given IP address. 
I +already used this to have Let's Encrypt issue various certificates for my domains. + +There is no persistent storage of updates yet, but this can be realised by +implementing a secondary (which is notified on update) that writes every new +zone to persistent storage (e.g. [disk](https://github.com/mirage/mirage-block) +or [git](https://github.com/mirage/ocaml-git)). I also plan to have an +automated Let's Encrypt certificate unikernel which listens for certificate +signing requests and stores signed certificates in DNS. Luckily the year has only just +started and there's plenty of time left. + +I'm interested in feedback, either via [twitter](https://twitter.com/h4nnes), +hannesm@mastodon.social, or an issue on the [data +repository](https://github.com/hannesm/hannes.nqsb.io/issues). diff --git a/Posts/VMM b/Posts/VMM index 7b0fdd8..b385195 100644 --- a/Posts/VMM +++ b/Posts/VMM @@ -44,8 +44,8 @@ create virtual network interfaces and execute virtual machines. To get rid of these ad-hoc shell scripts and copying of virtual machine images, I developed an UNIX daemon which accomplishes the required work. This daemon waits for (mutually!) authenticated network connections, and provides the -desired commands; to create a new virtual machine, to aquire a block device of a -given size, to destroy a virtual machine, to stream the console output of a +desired commands; to create a new virtual machine, to acquire a block device of +a given size, to destroy a virtual machine, to stream the console output of a virtual machine. ## System design @@ -87,7 +87,7 @@ virtual machines. Connecting to the vmmd requires a TLS client, a CA certificate, a leaf certificate (and the delegation chain) and its private key. 
In the background, it is a multi-step process using TLS: first, the client establishes a TLS -connection where it authenticates the server using the CA certificae, then the +connection where it authenticates the server using the CA certificate, then the server demands a TLS renegotiation where it requires the client to authenticate with its leaf certificate and private key. Using renegotiation over the encrypted channel prevents passive observers to see the client certificate in diff --git a/static/css/style.css b/static/css/style.css index 7a5d637..a96cde9 100644 --- a/static/css/style.css +++ b/static/css/style.css @@ -18,6 +18,10 @@ html a:visited { color: black; } +p { + line-height: 1.6; +} + .navbar { box-sizing: border-box; box-shadow: 0 2px 2px -2px rgba(0,0,0,.15); @@ -61,6 +65,7 @@ footer { pre { padding: 0px; + line-height: 1.3; } body h2 {