About
Published: 2016-04-01 (last updated: 2021-11-19)
What is a "full stack engineer"?
+About
Published: 2016-04-01 (last updated: 2023-11-20)
What is a "full stack engineer"?
Analysing the word literally, we should start with silicon and some electrons, maybe a soldering iron, and build everything all the way up to our favourite communication system.
@@ -72,7 +72,7 @@ Morocco. A good friend of mine pointed me to Mirage, a clean-slate operating system written in the high-level language OCaml. I got hooked pretty fast: after some experience with LISP machines, I imagined a modern OS written in a single functional programming language.
-From summer 2014 until end of 2017 I worked as a postdoctoral researcher at University of Cambridge (in the rigorous engineering of mainstream systems project) with Peter Sewell. I primarily worked on TLS, MirageOS, opam signing, and network semantics. In 2018 I relocated back to Berlin and am working on robur.
+From summer 2014 until the end of 2017 I worked as a postdoctoral researcher at the University of Cambridge (in the rigorous engineering of mainstream systems project) with Peter Sewell. I primarily worked on TLS, MirageOS, opam signing, and network semantics. In 2018 I relocated back to Berlin, where I am working on robur.
MirageOS had various bits and pieces in place, including infrastructure for
building and testing (and a neat self-hosted website). A big gap was security.
No access control, no secure sockets layer, nothing. This will be the topic of
@@ -84,5 +84,5 @@ not invalidated :-)" Xavier Leroy
You can find me on Twitter and on
GitHub. The data of this blog is stored in a git repository.
Me on the intertubes
Conex, establish trust in community repositories
Published: 2017-02-16 (last updated: 2021-11-19)
Less than two years after the initial proposal, we're happy to present conex
+Conex, establish trust in community repositories
Published: 2017-02-16 (last updated: 2023-11-20)
Less than two years after the initial proposal, we're happy to present conex 0.9.2. Please note that this is still work in progress, to be deployed with opam 2.0 and the opam repository.
@@ -313,5 +313,5 @@ cannot enable them.
verification experiments, and opam2 integration.
I'm interested in feedback; please open an issue on the conex repository. This article itself is stored as
-Markdown in a different repository.
+Markdown in a different repository.
My 2018 contains robur and starts with re-engineering DNS
Published: 2018-01-11 (last updated: 2021-11-19)
2018
+My 2018 contains robur and starts with re-engineering DNS
Published: 2018-01-11 (last updated: 2023-11-20)
2018
At the end of 2017, I resigned from my PostDoc position at the University of Cambridge (in the rems project). In early December 2017, I organised the 4th MirageOS hack
@@ -8,12 +8,12 @@ very satisfied. In March 2018 the 5th retreat will happen (please sign up!).
In 2018 I moved to Berlin and started to work for the (non-profit) Center for
the cultivation of technology with our
-robur.io project "At robur, we build performant bespoke
+robur.coop project "At robur, we build performant bespoke
minimal operating systems for high-assurance services". robur is only possible
due to generous donations in autumn 2017, enthusiastic collaborators, supportive
friends, and a motivated community. Thanks to all. We will receive funding from
the prototypefund to work on a
-CalDAV server implementation in OCaml
+CalDAV server implementation in OCaml
targeting MirageOS. We're still looking for donations and further funding;
please get in touch. Apart from CalDAV, I want to start the year by finishing
several projects which I discovered on my hard drive. This includes DNS, opam
diff --git a/Posts/Deploy b/Posts/Deploy
index 15e3bee..a17b825 100644
--- a/Posts/Deploy
+++ b/Posts/Deploy
@@ -1,5 +1,5 @@
- The focus of MirageOS development has largely been on tooling and the developer experience, but to accomplish our goal to "get MirageOS into production", we need to lower the barrier. For us, this means releasing binary unikernels. As described earlier, we received a grant for "Deploying MirageOS" from NGI Pointer to work on the required infrastructure. This is joint work with Reynir. At builds.robur.coop we provide binary unikernel images (and supplementary software). Doing binary releases of MirageOS unikernels is challenging in two aspects: firstly, to be useful for everyone, a binary unikernel should not contain any configuration (such as private keys, certificates, etc.). Secondly, the binaries should be reproducible. This is crucial for security: everyone can reproduce the exact same binary and verify that our build service only used the sources. No malware or backdoors included. This post describes how you can deploy MirageOS unikernels without compiling them from source, then dives into the two issues outlined above (configuration and reproducibility), and finally describes how to set up your own reproducible build infrastructure for MirageOS, and how to bootstrap it. With opam, we already have precise tracking of which opam packages are used, and since opam 2.1 opam switch export also records extra-files (patches) and VCS versions. The goal of reproducible builds can certainly be achieved in several ways, including storing all sources and used executables in a huge tarball (or docker container), which is preserved for rebuilders. The questions of a minimal trusted computing base, and of how such a container could be rebuilt from sources in a reproducible way, remain open. The opam-repository is a community repository, to which many OCaml developers release packages on a daily basis. Package dependencies usually only specify lower bounds on other packages, and the continuous integration system of the opam repository takes care that upon API changes all reverse dependencies include the right upper bounds.
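Verifying a reproducible build boils down to comparing checksums: a rebuilder builds from the same recorded inputs and checks that the output is bit-for-bit identical to the published binary. A minimal sketch, assuming SHA-256 as the checksum and local copies of both binaries; the helper names sha256_file and verify_rebuild are hypothetical and not part of orb or builder:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_file(path: Path, chunk_size: int = 64 * 1024) -> str:
    """Hash a file incrementally, so large images need not fit into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_rebuild(published: Path, rebuilt: Path) -> bool:
    """A rebuild reproduces the published artifact iff both checksums match."""
    return sha256_file(published) == sha256_file(rebuilt)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        published, rebuilt = Path(d, "published.hvt"), Path(d, "rebuilt.hvt")
        published.write_bytes(b"unikernel image bytes")
        rebuilt.write_bytes(b"unikernel image bytes")
        print(verify_rebuild(published, rebuilt))  # True
```

Any difference in the inputs, such as an embedded timestamp or a newer dependency release, shows up as a different checksum, which is why the build environment and package versions have to be recorded precisely.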
Using the head commit of opam-repository usually leads to a working package universe. For our MirageOS unikernels, we don't want to stay behind with ancient versions of libraries. That's why our automated builds run daily with the head commit of opam-repository. Since our unikernels are not part of the main opam repository (they include the configuration information which target to use, e.g. hvt), and we occasionally use development versions of opam packages, we use the unikernel-repo as an overlay. If no dependency had a new release, the resulting binary has the same checksum. If any dependency had a new release, this is picked up, and the checksum may change. Each unikernel (and non-unikernel) job (e.g. dns-primary) outputs some artifacts: These tools are themselves reproducible, and built on a daily basis. The infrastructure executing the build jobs installs the most recent packages of orb and builder before conducting a build. This means that our build infrastructure is reproducible as well, and uses the latest code when it is released. We at robur developed opam-mirror in the last month and run a public opam mirror at https://opam.robur.coop (updated hourly). Opam is the OCaml package manager (also used by other projects such as coq).
It is a source-based system: the so-called repository contains the metadata (URLs of source tarballs, build dependencies, author, homepage, development repository) of all packages. The main repository is hosted on GitHub as ocaml/opam-repository, where authors of OCaml software can contribute their latest releases (as pull requests). When a pull request is opened, automated systems attempt to build not only the newly released package on various platforms and OCaml versions, but also all reverse dependencies, and also with the lowest allowed versions of its dependencies. That's crucial since semantic versioning has not been adopted across the OCaml ecosystem (which is tricky: for example, due to local opens any newly introduced binding would lead to a major version bump), nor do many people add upper bounds on dependencies when releasing a package (nobody is keen to state "my package will not work with cmdliner in version 1.2.0"). According to DNS, opam.ocaml.org is a machine at Amazon. It likely, apart from the website, uses Apart from being a single point of failure, if you're compiling a lot of opam projects (e.g. a continuous integration / continuous build system), it makes sense from a network usage (and thus sustainability) perspective to move the cache closer to where you need the source archives. We're also organising the MirageOS hack retreats in a northern African country with poor connectivity, so if you gather two dozen camels, you had better bring your opam repository cache with you to reduce the bandwidth usage (NB: at the moment, this requires all participants to cooperate by configuring their default opam repository accordingly). Given the need for a local opam cache at our reproducible build infrastructure and at the retreats, we decided to develop opam-mirror as a MirageOS unikernel.
Apart from a useful showcase using persistent storage (that won't fit into memory), and having fun while developing it, our aim was to reduce our time spent on system administration (the Another reason for re-developing the functionality was that the opam code (what opam admin index actually does) is part of the opam source code, which totals 50_000 lines of code -- looking up whether one or all checksums are verified before adding a tarball to the cache was rather tricky. In earlier years, we avoided persistent storage and block devices in MirageOS (by embedding data into the source code with crunch, or using a remote git repository), but recent development, e.g. of chamelon, sparked some interest in actually using file systems and figuring out whether MirageOS is ready in that area. A month ago we started the opam-mirror project. Opam-mirror takes a remote repository URL and downloads all referenced archives. It serves as a cache and as an opam-repository, and periodically updates from the remote repository. The idea is to validate all available checksums, store each tarball only once, and store overlays (as maps) from the other hash algorithms. There is already a gap in the above plan: which HTTP client to use in MirageOS. In the best case, it is something similar to our http-lwt-client: it should support HTTP 1.1 and HTTP 2, TLS (with certificate validation), and use happy-eyeballs to seamlessly support both IPv6 and legacy IPv4. Of course it should follow redirects; without that we won't get far in the current Internet.
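The storage plan (validate every available checksum, keep each tarball once, and map the other hash algorithms onto it) can be illustrated with a small Python sketch. This is not opam-mirror's code, which is OCaml: we assume SHA-256 is the primary key and that checksums arrive as an algorithm-to-hex-digest mapping, as in opam checksum fields such as sha256=<hex>; the class and method names are hypothetical.

```python
import hashlib
from typing import Dict, Optional


class ArchiveCache:
    """Hypothetical cache: each archive is stored once, keyed by its SHA-256;
    digests of other algorithms are overlays mapping onto that key."""

    def __init__(self) -> None:
        self.store: Dict[str, bytes] = {}              # sha256 hex -> archive
        self.overlays: Dict[str, Dict[str, str]] = {}  # algorithm -> digest -> sha256

    def add(self, data: bytes, checksums: Dict[str, str]) -> str:
        # Validate *all* available checksums before anything is cached.
        for alg, expected in checksums.items():
            actual = hashlib.new(alg, data).hexdigest()
            if actual != expected:
                raise ValueError(f"{alg} mismatch: expected {expected}, got {actual}")
        key = hashlib.sha256(data).hexdigest()
        self.store.setdefault(key, data)  # adding the same bytes again is a no-op
        for alg, digest in checksums.items():
            if alg != "sha256":
                self.overlays.setdefault(alg, {})[digest] = key
        return key

    def lookup(self, alg: str, digest: str) -> Optional[bytes]:
        key = digest if alg == "sha256" else self.overlays.get(alg, {}).get(digest)
        return None if key is None else self.store.get(key)


if __name__ == "__main__":
    tarball = b"pretend this is foo-1.0.tar.gz"
    sums = {alg: hashlib.new(alg, tarball).hexdigest() for alg in ("md5", "sha256", "sha512")}
    cache = ArchiveCache()
    cache.add(tarball, sums)
    print(cache.lookup("md5", sums["md5"]) == tarball)  # True
```

Keying the store by a single algorithm means that an archive referenced via md5 by one package and via sha512 by another still occupies storage only once.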
Along the way (over the last month), we fixed file descriptor leaks (memory leaks) in paf, which is used as a runtime for httpaf and h2. Then we ran into some trouble with chamelon (out of memory, degraded performance, and it reporting out of disk space), and reconsidered our requirements for opam-mirror. Since the cache is only ever growing (new packages keep being released), there's no need to ever remove anything: it is append-only. Once we figured that out, we investigated what needs to be done in ocaml-tar (tar is in fact a tape archive, and was initially designed as a file format to be appended to) to support appending to an archive. We also reconsidered our bandwidth usage, and instead of cloning the git remote at startup, we developed git-kv, which can dump and restore the git state. Also, initially we computed all hashes of all tarballs at startup, but with the growing size (all archives together are around 7.5GB) this led to a major startup time issue (around 5 minutes on a laptop), so we wanted to save and restore the maps as well. Neither the git state nor the maps are suitable for tar's append-only semantics, and we didn't want to investigate yet another file system: fat, for example, may just work fine, but the code looks slightly bitrotten, and the reported issues and inactivity don't make this package very trustworthy from our point of view. Instead, we developed mirage-block-partition to partition a block device into two. Then we just store the maps and the git state at the end: the end of a tar archive is marked by two blocks of zeroes, so anything beyond that isn't considered by any tooling. Extending the tar archive is also possible; only the maps and git state need to be moved to the end (or recomputed). As a file system, we developed oneffs, which stores a single value on the block device.
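Why data stored after the tar archive does not disturb tar tooling can be shown with a toy model in Python. Everything here is an assumption for illustration: a bytearray stands in for the block device, the partition helper merely mimics the idea of mirage-block-partition (it is not its API), and the second partition holds a single length-prefixed value whose header layout is made up, loosely in the spirit of what oneffs is described as doing.

```python
import io
import struct
import tarfile
from typing import Dict, Optional, Tuple

SECTOR = 512  # tar's block size, reused as the sector size of our toy device


def partition(device: bytearray, split: int) -> Tuple[memoryview, memoryview]:
    """Split a toy block device into two non-overlapping windows at sector `split`."""
    view = memoryview(device)
    return view[: split * SECTOR], view[split * SECTOR :]


def append_to_tar(part: memoryview, files: Dict[str, bytes]) -> None:
    """Append files to the tar archive in `part`; existing members are kept and the
    new headers overwrite the old end-of-archive zero blocks."""
    buf = io.BytesIO(bytes(part))  # "a" mode needs the (possibly all-zero) old contents
    with tarfile.open(fileobj=buf, mode="a", format=tarfile.USTAR_FORMAT) as tar:
        for name, data in files.items():
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    archive = buf.getvalue()
    if len(archive) > len(part):
        raise ValueError("tar partition is full")
    # For simplicity the whole window is copied back; a real implementation
    # would only write the sectors that changed.
    part[: len(archive)] = archive


# Hypothetical header layout (not oneffs' on-disk format): 4-byte magic, 8-byte length.
HEADER = struct.Struct(">4sQ")
MAGIC = b"ONE1"


def write_value(part: memoryview, value: bytes) -> None:
    blob = HEADER.pack(MAGIC, len(value)) + value
    if len(blob) > len(part):
        raise ValueError("value does not fit into the partition")
    part[: len(blob)] = blob


def read_value(part: memoryview) -> Optional[bytes]:
    magic, length = HEADER.unpack_from(part)
    if magic != MAGIC:
        return None  # nothing stored yet
    return bytes(part[HEADER.size : HEADER.size + length])


if __name__ == "__main__":
    device = bytearray(64 * SECTOR)              # a 32 KiB toy disk
    tar_part, meta_part = partition(device, 48)  # 48 sectors tar, 16 sectors metadata
    append_to_tar(tar_part, {"a.tar.gz": b"first release"})
    append_to_tar(tar_part, {"b.tar.gz": b"second release"})
    write_value(meta_part, b"serialized checksum maps and git state")
    # Reading the *whole* device as tar stops at the first all-zero block,
    # so the value stored in the second partition is ignored by tar tooling.
    with tarfile.open(fileobj=io.BytesIO(bytes(device)), mode="r:") as tar:
        print(tar.getnames())  # ['a.tar.gz', 'b.tar.gz']
```

Appending only ever touches the first window; if the archive outgrew it, the split point and the stored value would have to move, which mirrors the caveat about moving or recomputing the maps and git state.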
We observed high memory usage, since each requested archive was first read from the block device into memory, and then sent out. Thanks to Pierre Alain's recent enhancements of the mirage-kv API, there is a
What is next?
Downloading and writing to the tar archive could be done chunk-wise as well; also, dumping and restoring the git state is quite CPU intensive, and we would like to improve that. Adding a TLS frontend (currently done at our site by our TLS termination proxy tlstunnel), similar to how unipi does it, including Let's Encrypt provisioning, should be straightforward (drop us a note if you'd be interested in that feature).
Deploying binary MirageOS unikernels
Published: 2021-06-30 (last updated: 2021-11-15)
Introduction
+Deploying binary MirageOS unikernels
Published: 2021-06-30 (last updated: 2023-11-20)
Introduction
opam switch export includes extra-files (patches) and records the VCS version. Based on this functionality, orb, an alternative command line application using the opam-client library, can be used to collect (a) the switch export, (b) host system packages, and (c) the environment variables. Only required environment variables are kept, all others are unset while conducting a build. The only required environment variables are PATH (sanitized with an allow list, /bin, /sbin, with /usr, /usr/local, and /opt prefixes), and HOME. To enable Debian's apt to install packages, DEBIAN_FRONTEND is set to noninteractive. The SWITCH_PATH is recorded to allow orb to use the same path during a rebuild. The SOURCE_DATE_EPOCH is set to enable tools that record a timestamp to use a static one. The OS* variables are only used for recording the host OS and version.
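The environment handling described above amounts to an allow-list filter. A minimal Python sketch, not orb's actual code (orb is written in OCaml): build_environment is a hypothetical name, we interpret the allow list as /bin and /sbin either bare or below /usr, /usr/local, and /opt, and SWITCH_PATH and the OS* variables are left out for brevity since they are recorded rather than needed by the build tools.

```python
from typing import Dict

# Allowed PATH entries, reading the allow list as: /bin and /sbin,
# either bare or below one of the prefixes /usr, /usr/local, /opt.
PREFIXES = ("", "/usr", "/usr/local", "/opt")
ALLOWED_PATH = {prefix + d for prefix in PREFIXES for d in ("/bin", "/sbin")}


def build_environment(host_env: Dict[str, str], source_date_epoch: int) -> Dict[str, str]:
    """Compute a build environment: every variable not set here stays unset."""
    # Keep allowed PATH entries in their original order, so lookup precedence is unchanged.
    path = [p for p in host_env.get("PATH", "").split(":") if p in ALLOWED_PATH]
    env = {
        "PATH": ":".join(path),
        "DEBIAN_FRONTEND": "noninteractive",          # apt must not prompt during a build
        "SOURCE_DATE_EPOCH": str(source_date_epoch),  # static timestamp for tools that record one
    }
    if "HOME" in host_env:
        env["HOME"] = host_env["HOME"]
    return env


if __name__ == "__main__":
    host = {"PATH": "/home/me/bin:/usr/local/bin:/usr/bin:/bin", "HOME": "/home/me", "EDITOR": "vi"}
    print(build_environment(host, 1600000000))
```

Building the environment from scratch, instead of deleting unwanted variables from the host environment, means that a variable nobody thought of cannot leak into the build.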
@@ -47,7 +47,7 @@
Mirroring the opam repository and all tarballs
Published: 2022-09-29 (last updated: 2022-10-11)
Mirroring the opam repository and all tarballs
Published: 2022-09-29 (last updated: 2023-11-20)
What is opam and why should I care?
opam admin index periodically to create the index tarball and the cache. There's an observable delay between a package being merged into the opam-repository and it showing up at opam.ocaml.org. Recently, there was a reported downtime.
Re-developing "opam admin create" as MirageOS unikernel
-opam admin index
is only one part of the story, it needs a Unix system and a webserver next to it - plus remote access for doing software updates - which has quite some attack surface.
opam admin index
is only one part of the story: it needs a Unix system and a webserver next to it (plus remote access for doing software updates), which has quite some attack surface.
get_partial, which we use to read the archive chunk-wise and send it via HTTP. Now, the memory usage is around 20MB (the git repository and the generated tarball are kept in memory).
Conclusion
diff --git a/atom b/atom
index 1bf21b2..41a054a 100644
--- a/atom
+++ b/atom
@@ -1,4 +1,4 @@
-