---
title: Technology
author: someone
abstract: some abstract
---

We develop digital infrastructure with a minimal footprint. Where other approaches try to patch general-purpose operating systems by adding more layers of indirection, we strive to build a secure system from the ground up. Each piece of digital infrastructure or service is written in a high-level memory-safe programming language and tailored at compilation time to contain only the required functionality. This reduces both the attack vectors and the attack surface. The resulting service is executed as a virtual machine on a modern hypervisor. Its size is usually around 1-10 MB, much smaller than a UNIX / Linux system, and it boots within milliseconds.

## MirageOS - bespoke operating systems

Our work is based on MirageOS, a suite to build operating systems. It has been developed since 2009 at the University of Cambridge, UK, and is written in the programming language OCaml (see [Why OCaml](#Why-OCaml)). Most libraries are developed as open source (MIT/ISC/BSD2/Apache2).

MirageOS is a library operating system. It composes OCaml libraries into a bespoke operating system, called a unikernel. A unikernel can be compiled as a UNIX binary or as a standalone virtual machine image. To build the right unikernel for your custom business logic, we can pick from hundreds of libraries which implement network protocols, storage on block devices, or interfaces to network devices via the hypervisor. On top of the hypervisor, a small layer of C code unifies the interface on which OCaml runs.

OCaml is a functional programming language that minimizes side effects and mutable state. Its functional programming concepts give us a number of security advantages for MirageOS.

## Running a unikernel & system security

Aside from automated memory management to avoid memory corruption, and type checking to avoid many common programming errors, the major advantage of functional programming is localized reasoning about program code.
All inputs, outputs and effects of a function are known. Immutable data structures and cooperative multitasking allow us to reason about the state of the entire system, even if we use parallelism and complex distributed systems.

### Simple config management model with localized reasoning

There are three ways to feed a virtual machine with configuration data, such as the network configuration or a TLS certificate and key:

- Compile the information into the virtual machine image, which requires recompilation on configuration change.
- Pass the information as boot parameters, which requires a reboot on configuration change.
- Store the information in a virtual block device which is attached to the virtual machine.

For example, logs can be written from the unikernel to a syslog collector with UDP, TCP, or TLS as transport. The transport needs to be chosen at compile time, because TLS requires the TLS library to be linked into the kernel image, but the log destination is passed as a boot parameter.

### Simple concurrency model with localized reasoning

MirageOS is an event-based operating system with asynchronous tasks. A task yields the CPU once its execution is finished, or if it has to wait for IO. This concurrency model leads to a cooperative multitasking programming style, rather than the error-prone preemptive multitasking, where each code block needs to use appropriate locking strategies to avoid reentrant execution errors. A recent example of code which is not safe under reentrant execution [in Ethereum](http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/) led to a huge amount of ether being transferred. Established software like the [Firefox JavaScript engine](http://www.nist.org/news.php?extend.175) or [PHP](https://bugs.php.net/bug.php?id=74308) shows similar problems on a regular basis.
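The cooperative model described above can be illustrated with a minimal sketch that uses only the OCaml standard library. It is not the MirageOS task engine; the names `spawn`, `run`, and `withdraw` are hypothetical and chosen for this illustration. The point it shows: since a task is never interrupted, a check followed by an update of shared state needs no lock.

```ocaml
(* A toy cooperative scheduler: tasks are plain closures in a FIFO queue.
   Hypothetical illustration only, not the MirageOS task engine. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

(* Register a task; it runs only when the scheduler picks it. *)
let spawn task = Queue.push task run_queue

(* Run tasks one at a time until the queue is empty. A task is never
   interrupted: control returns to the scheduler only when it finishes. *)
let run () =
  while not (Queue.is_empty run_queue) do
    let task = Queue.pop run_queue in
    task ()
  done

(* Shared mutable state, updated without any locking. *)
let balance = ref 100

(* The check and the update happen in one uninterrupted step, so no other
   task can observe or modify [balance] in between. *)
let withdraw amount () =
  if !balance >= amount then balance := !balance - amount

let () =
  spawn (withdraw 80);
  spawn (withdraw 80);
  run ();
  (* The second withdrawal is refused, so this prints "balance = 20". *)
  Printf.printf "balance = %d\n" !balance
```

Under preemptive scheduling, both tasks could pass the balance check before either one performs the subtraction, which is the kind of reentrancy error that locking has to prevent.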
### Simple process memory model with localized reasoning

The virtual memory subsystem in contemporary operating systems provides an address mapping for each process. Since a unikernel is only a single service, it uses a single address space, avoiding the need for complex address mapping code altogether. An example of page table corruption is [Xen's XSA-182](http://xenbits.xen.org/xsa/advisory-182.html).

### Simple library model with localized reasoning

A MirageOS unikernel is much smaller than a comparable UNIX virtual machine. By avoiding superfluous code we decrease the attack surface immensely. As an example, the following table compares the [Bitcoin Piñata](/Projects/Pinata) to a virtual machine using Linux and OpenSSL, in thousands of lines of code:

|         | Linux | MirageOS |
|---------|------:|---------:|
| Kernel  |  1600 |       48 |
| Runtime |   689 |       25 |
| Crypto  |   230 |       23 |
| TLS     |    41 |        6 |
| Total   |  2560 |      102 |

The choice of programming language avoids several attack vectors. Immutable data structures, type checking and the OCaml runtime memory manager ensure memory safety in OCaml. By reducing each unikernel to its minimal footprint, security breaches are contained to the information the unikernel holds.

### Secure updates

If an OCaml library introduces security flaws or information leakage, all unikernels depending on that library need to be updated. Updating an OCaml library can safely be done via its package manager opam, which uses signed repositories.

TODO: For example ..

## Why OCaml

OCaml is a functional programming language with automated memory management, which prevents manual memory management errors. The strong and expressive type system of OCaml catches many programming errors at compile time. We use a declarative style with immutable data structures and memory to avoid side effects that are hard to reason about. Errors are expressed as part of the type signature. IO is kept in a layer on top of the protocol logic, so an implementation of a protocol can be used both as executable code and as a test oracle.

Most security problems for network services arise in parsers of received network data. OCaml allows us to write strict parsers, which return either success or an error at the type level, so that the caller has to handle all cases. If a parser contains an error, in our system the impact is local: it cannot access memory beyond the network data. Errors in parsers written in unsafe languages often lead to buffer overflows, which can lead to remote code execution.

OCaml code is compiled to native code running on the OCaml runtime, with performance on par with C++ code. The OCaml runtime is only used for memory management, and is very small compared to a JVM or Python runtime. As an example, our TLS library has up to 85% of the bulk throughput of OpenSSL (using AES128-CBC). The TLS handshake performance is equal to that of OpenSSL.
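The strict-parser idea from this section can be sketched as follows. The wire format (one length byte followed by that many payload bytes), the `error` type, and the functions `parse_record` and `describe` are assumptions made up for this illustration; they are not taken from any MirageOS library. The sketch shows a parser whose return type forces the caller to handle failure, and which checks bounds before it reads.

```ocaml
(* Hypothetical wire format for illustration: one length byte followed by
   that many payload bytes. *)
type error =
  | Truncated_header
  | Truncated_payload of { expected : int; available : int }

(* The result type makes failure explicit: there is no way to obtain the
   payload without going through the [Ok] case. *)
let parse_record (buf : string) : (string, error) result =
  if String.length buf < 1 then Error Truncated_header
  else
    let len = Char.code buf.[0] in
    let available = String.length buf - 1 in
    if available < len then
      Error (Truncated_payload { expected = len; available })
    else Ok (String.sub buf 1 len)

(* The compiler warns if one of the cases below is missing. *)
let describe buf =
  match parse_record buf with
  | Ok payload -> Printf.sprintf "payload %S" payload
  | Error Truncated_header -> "error: missing length byte"
  | Error (Truncated_payload { expected; available }) ->
    Printf.sprintf "error: expected %d bytes, got %d" expected available

let () =
  print_endline (describe "\003abc");
  print_endline (describe "\005ab")
```

Even if the length check were forgotten, `String.sub` would raise an exception instead of reading past the end of the buffer, which is the local failure mode described above.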
TODO: although limited to certain contexts / not a mainstream language because it takes effort to learn?

OCaml is known as a mature and safe programming language that is used in both industry (Facebook for compilers, Jane Street for trading, Docker, Ahrefs, SimCorp, LexiFi) and academia (Coq, CompCert).

### Module system & Compilation

OCaml has a unique module system. A module specifies abstract datatypes and functions, and each module can have multiple implementations. Modules can take other modules as parameters. The module system is a complete programming language of its own, evaluated at compile time.

MirageOS uses this module system as a powerful abstraction mechanism to adapt the unikernel to the compilation target. It defines modules for all operating system services, such as the console, the network stack, and the random number generator. For each service, an implementation can be provided depending on the compilation target (UNIX process or virtual machine). On UNIX, the sockets API is used as the networking stack. On a virtual machine, the TCP/IP stack written in OCaml is used.

By leveraging the module system of OCaml for MirageOS, we get module separation and dependency analysis from the well-tested module system of OCaml and avoid reimplementing these error-prone and important parts. A MirageOS unikernel developer does not need to reason about compilation targets, just about the module interface.

TODO: the OCaml runtime has been reviewed by the French counterpart of the BSI; solo5 has not had a proper review yet

OCaml code can be very fast (our TLS implementation reaches up to 85% of the throughput of OpenSSL), and compiles either to native code on various architectures or to bytecode. It can even compile to JavaScript. OCaml is memory managed, so individual developers don't have to manually allocate and release memory (which is a common source of security issues in other operating systems).
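The way modules parameterised over other modules adapt a service to its target can be sketched in plain OCaml. The signature `CONSOLE` and the modules `Stdout_console`, `Buffer_console`, and `Hello` are hypothetical names for this illustration and do not reproduce the actual MirageOS interfaces.

```ocaml
(* Interface every console implementation has to provide. *)
module type CONSOLE = sig
  val log : string -> unit
end

(* Implementation for running as a normal process: print to stdout. *)
module Stdout_console : CONSOLE = struct
  let log msg = print_endline msg
end

(* Stand-in for a second target: collect messages in memory. *)
module Buffer_console = struct
  let lines : string list ref = ref []
  let log msg = lines := msg :: !lines
end

(* The service is a functor: it only sees the CONSOLE interface and
   contains no target-specific code. *)
module Hello (C : CONSOLE) = struct
  let start () =
    C.log "hello";
    C.log "world"
end

(* Selecting a target means applying the functor to an implementation. *)
module On_stdout = Hello (Stdout_console)
module In_memory = Hello (Buffer_console)

let () =
  On_stdout.start ();
  In_memory.start ();
  Printf.printf "buffered %d lines\n" (List.length !Buffer_console.lines)
```

The body of `Hello` is type checked once against `CONSOLE`, so the same service code is known to work with every implementation that satisfies the signature.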
In 2016, Facebook developed [reason](https://reasonml.github.io/), a dialect of OCaml whose syntax is closer to JavaScript and easier to comprehend for beginners. Reason and OCaml code can be easily combined in a single application, since the same compiler is used.

Links:

- [OCaml for the masses](http://queue.acm.org/detail.cfm?id=2038036)
- [Why OCaml (from realworldocaml)](https://realworldocaml.org/v1/en/html/prologue.html)
- [Replacing Python with OCaml in 0install](http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-replacement-for-0install/)
- [Why tezos uses OCaml](https://www.tezos.com/static/papers/position_paper.pdf)

## Current state and future directions

Many libraries developed in the MirageOS project are deployed by Docker for Mac and Docker for Windows, which have more than 100,000 active users. Available libraries include an IPv4 stack (TCP, UDP, ARP), DHCP client and server, DNS server and resolver (both recursive and forwarding), HTTP (including webmachine for request routing, and sessions), syslog, git (both client and server, with multiple storage backends: block device, in-memory), and Prometheus integration. A TLS library, including a random number generator (Fortuna), cryptographic primitives (RSA, DSA, DH, AES), and X.509 (using ASN.1), was developed 3 years ago and is in production serving websites, plus some applications using the client side.

A prototype implementation for managing unikernels on the host system is already deployed and actively used. It is similar to libvirt, but has a minimised code base and is written in OCaml. Monitoring is done with Prometheus.

TODO: does that mean structured data and cloud ready / scalable / works in a distributed system?

More libraries are under active development, including an OpenPGP implementation, an SSH implementation, structured syslog, and Cap'n Proto (RPC with support for capabilities).
OCaml can be compiled to JavaScript, which means projects can be developed in a single language to ensure consistency and avoid errors, while the code can be executed on the client or on the server.

The idea of unikernels is not limited to MirageOS; other projects apply the same concept in different programming languages. HalVM (the Haskell Lightweight Virtual Machine) was developed by Galois Inc. and is based on Haskell. It is used for network services such as honeypots and secure IPSec gateways. IncludeOS is a C++ based unikernel which was initially developed at the University of Oslo and is now further developed in a startup. Other unikernels are listed on [Wikipedia](https://en.wikipedia.org/wiki/Unikernel), and there is an exchange of ideas between the different approaches. Among all unikernels, MirageOS has been around the longest, has the most libraries available and deployed to production, and has an active developer community, and its safe programming language makes it suitable for secure systems.

Apart from the CPU (and its virtualisation extensions VT-x, VT-d, EPT) and the hypervisor implementation, MirageOS has a small trusted code base compared to other operating systems. On top of the hypervisor in the host system, a tiny virtual machine monitor (solo5) is executed. It does not rely on qemu or other emulation code, but only contains the drivers needed for the actual unikernel (block and network devices). The unikernel itself contains roughly 2000 lines of C code providing basic functions such as malloc, memcpy, and memcmp, on top of which the vanilla OCaml runtime is executed. On top of that, only OCaml code is executed, which includes an asynchronous task engine, the TCP/IP stack mentioned above, and the concrete services.

TODO: maybe remove the bullet points, and instead explain the formal verification argument in detail?
We plan to improve the security of MirageOS unikernels further in several areas:

- data segments will be mapped read/write, code segments execute-only
- private key material will be zeroed before destruction
- the address space layout will be randomised to make exploitation harder
- MirageOS will be ported to (se)L4 as hypervisor to minimize the trusted code running on the host system
- once open hardware (RISC-V) is widely available, MirageOS will use it as a target; there is already a RISC-V backend for OCaml
- OCaml will be compilable to Coq (an interactive theorem prover) definitions, within which theorems about the code can be proven
- Coq code will also be extractable to OCaml