2017-09-15 20:19:19 +00:00
|
|
|
---
|
|
|
|
title: Technology behind robur
|
|
|
|
author: someone
|
|
|
|
abstract: some abstract
|
|
|
|
---
|
|
|
|
|
2017-09-16 15:46:56 +00:00
|
|
|
We develop digital infrastructure with a small footprint. This is in stark
|
|
|
|
contrast with other approaches that try to patch general purpose operating
|
|
|
|
systems by adding more layers of indirection.
|
|
|
|
|
|
|
|
Each piece of digital infrastructure (or service) is (a) written in a high-level
|
|
|
|
memory-safe programming language and (b) specialised to only contain the
|
|
|
|
required functionality at compilation time. This (a) reduces the attack vectors
|
|
|
|
and (b) drastically reduces the attack surface.
|
|
|
|
|
|
|
|
The resulting service is executed as a virtual machine on any modern hypervisor.
|
|
|
|
Its size is usually two orders of magnitude smaller (ranging from kilobytes to
|
|
|
|
16 megabytes) than a UNIX, it boots within milliseconds.
|
|
|
|
|
|
|
|
As programming language we use [OCaml](https://ocaml.org), a multi-paradigm
|
|
|
|
programming language, which unifies functional, imperative, and object-oriented
|
|
|
|
programming. OCaml has an expressive static type system, and type inference. A
|
|
|
|
developer can specify complex invariants in the type system, which are
|
|
|
|
checked at compile time, and violations are caught early.
|
|
|
|
|
2017-09-16 15:49:37 +00:00
|
|
|
We discuss more reasons why we use OCaml [further down](#Why-OCaml).
|
2017-09-16 15:46:56 +00:00
|
|
|
|
|
|
|
## MirageOS
|
|
|
|
|
2017-09-16 17:04:53 +00:00
|
|
|
Our work is based on MirageOS, a suite to develop operating systems (developed
|
|
|
|
since 2009 at University of Cambridge, UK) written in the programming language
|
|
|
|
OCaml (developed since 1996 at INRIA in Paris, France). Most libraries are
|
|
|
|
developed under permissive licenses (MIT/ISC/BSD2/Apache2). The OCaml compiler
|
|
|
|
is dual-licensed: LGPLv2 or BSD for consortium members (3000 EUR/year).
|
|
|
|
|
|
|
|
MirageOS is a library operating system. It composes OCaml libraries into a
|
|
|
|
bespoke operating system, called a unikernel. A unikernel can be a compiled as a
|
|
|
|
UNIX binary, or a standalone virtual machine image. To build exactly the right
|
|
|
|
unikernel for each purpose, we can pick from hundreds of libraries which
|
|
|
|
implement protocols, storage on block devices, or interfaces to network devices
|
|
|
|
provided by the hypervisor.
|
|
|
|
|
|
|
|
On top of the hypervisor, a small layer of C code (developed at IBM research) is
|
|
|
|
used to unify the interface between the different hypervisors, on which the
|
|
|
|
OCaml runtime system is executed.
|
|
|
|
|
|
|
|
OCaml is a functional programming language with automated memory management,
|
|
|
|
preventing manual memory management errors. The strong and expressive type
|
|
|
|
system of OCaml catches most programming errors already at compile time. We use
|
|
|
|
a declarative style with immutable data structures and memory to avoid side
|
|
|
|
effects that are hard to reason about. Errors are expressed as part of the type
|
|
|
|
signature. IO is contained on top of the protocol logic. An implementation of a
|
|
|
|
protocol can be used both as executable code, and as a test oracle.
|
|
|
|
|
|
|
|
Most security problems for network services arise in parsers of received network
|
|
|
|
data. OCaml allows us to write strict parsers, which return success and error
|
|
|
|
on the type system level to ensure that the caller handles all cases. If a
|
|
|
|
parser contains an error, in our system the impact is local, it cannot access
|
|
|
|
memory beyond the the network data. Errors in parsers written in unsafe
|
|
|
|
languages often lead to buffer overflows, which can lead to remote code
|
|
|
|
execution.
|
|
|
|
|
|
|
|
OCaml code is compiled to native code running in the OCaml runtime, which is
|
|
|
|
very performant, on par with C++ code. The OCaml runtime is just used for
|
|
|
|
memory management, and very small compared to a JVM or Python runtime. As
|
|
|
|
example, our TLS library has up to 85% of the bulk throughput of OpenSSL (using
|
|
|
|
AES128-CBC). The TLS handshake performance is equal with OpenSSL.
|
|
|
|
|
|
|
|
TODO: OBWOHL AUF BESTIMMTE KONTEXTE BESCHRAENKT/NICHT ALLERWELTSPRACHE DA LERNINTENSIV?
|
|
|
|
OCaml is known as a mature and safe programming language that is used in both
|
|
|
|
industry (facebook for compilers, jane street for trading, docker, ahrefs,
|
|
|
|
simcorp, lexifi) and academia (coq, compcert, ...).
|
|
|
|
|
|
|
|
### Module system & Compilation
|
|
|
|
|
|
|
|
OCaml has a unique module system, in which a module specifies abstract datatypes
|
|
|
|
and functions, and each module can have multiple implementations. Modules can
|
|
|
|
take other modules as parameters, the module system is a complete programming
|
|
|
|
language, evaluated at compile time. MirageOS uses this module system as a
|
|
|
|
powerful abstraction mechanism to adapt the unikernel to the compilation target.
|
|
|
|
It defines modules for all operating system services, such as the console, the
|
|
|
|
network stack, the random number generator. For each service, an implementation
|
|
|
|
can be provided depending on the compilation target (UNIX process or virtual
|
|
|
|
machine). On UNIX, the sockets API is used as the networking stack. On a
|
|
|
|
virtual machine, the TCP/IP stack in OCaml is being used.
|
|
|
|
|
|
|
|
By leveraging the module system of OCaml for MirageOS, we get module separation
|
|
|
|
and dependency analysis from the well-tested module system of OCaml and avoid
|
|
|
|
reimplementing these error-prone and important parts. A MirageOS unikernel
|
|
|
|
developer does not need to reason about compilation targets, just about the
|
|
|
|
module interface.
|
|
|
|
|
|
|
|
## MirageOS: Running a unikernel & system security
|
|
|
|
|
|
|
|
### Simple config management model with localized reasoning
|
|
|
|
|
|
|
|
There are three ways to feed a virtual machine with configuration data, such as
|
|
|
|
network configuration or TLS certificate and key.
|
|
|
|
- Compile the information into the virtual machine image, which requires
|
|
|
|
recompilation on configuration change.
|
|
|
|
- Pass the information as boot parameters, which requires reboot on
|
|
|
|
configuration change.
|
|
|
|
- Store this information in a virtual block device which is attached to the
|
|
|
|
virtual machine.
|
|
|
|
|
|
|
|
For example, logs can be exfiltrated using syslog with UDP, TCP, or TLS as
|
|
|
|
transport. The transport to use has to be chosen at compile time because TLS
|
|
|
|
requires the TLS library to be linked in, but the log destination can be passed
|
|
|
|
as boot parameter to the unikernel.
|
|
|
|
|
|
|
|
### Simple concurrency model with localized reasoning
|
|
|
|
|
|
|
|
MirageOS is an event based operating system with asynchronous tasks. A task
|
|
|
|
yields the CPU once its execution is finished, or if it has to wait for IO.
|
|
|
|
This concurrency model leads to a cooperative multitasking programming style,
|
|
|
|
rather than the error prone preemptive multitasking where each code block needs
|
|
|
|
to be hardened to allow reentrant execution.
|
|
|
|
|
|
|
|
TODO: For example ..
|
|
|
|
|
|
|
|
### Simple process memory model with localized reasoning
|
|
|
|
|
|
|
|
The virtual memory subsystem in contemporary operating systems provides an
|
|
|
|
address mapping for each process. Since a unikernel is only a single service, it
|
|
|
|
uses a single address space, avoiding the need for complex address mapping code
|
|
|
|
altogether.
|
|
|
|
|
|
|
|
TODO: For example ..
|
|
|
|
|
|
|
|
### Simple library model with localized reasoning
|
|
|
|
|
|
|
|
As stated above, a MirageOS unikernel is much smaller than a comparable UNIX
|
|
|
|
virtual machine, since only the required libraries are linked into the virtual
|
|
|
|
machine image. By avoiding superfluous code we decrease the attack surface
|
|
|
|
immensly.
|
|
|
|
|
|
|
|
TODO: Example: lines of code / binary size fuer zb dns server / dhcp server
|
|
|
|
|
|
|
|
The choice of programming language avoids several attack vectors. Immutable data
|
|
|
|
structures, type checking and the OCaml runtime memory manager ensure memory
|
|
|
|
safety in OCaml. By minimising each unikernel to its minimal footprint,
|
|
|
|
security breaches are contained to the information the unikernel contains.
|
|
|
|
|
|
|
|
### Secure updates
|
|
|
|
|
|
|
|
If an OCaml library introduces security flaws or information leakage, all
|
|
|
|
unikernels depending on that library need to be updated. Updating an OCaml
|
|
|
|
library can safely be done via its package manager opam, which uses signed
|
|
|
|
repositories.
|
|
|
|
|
|
|
|
TODO: For example ..
|
2017-09-16 15:46:56 +00:00
|
|
|
|
|
|
|
## Why OCaml
|
|
|
|
|
2017-09-16 17:04:53 +00:00
|
|
|
TODO: OCaml runtime vom franz. BSI reviewed, solo5 noch kein wirkliches review
|
|
|
|
|
2017-09-16 15:46:56 +00:00
|
|
|
OCaml code can be very fast (our TLS implementation reaches up to
|
|
|
|
85% of the throughput of OpenSSL), and compiles either to native code on various
|
|
|
|
architectures or to bytecode. It can even compile to JavaScript. OCaml is
|
|
|
|
memory managed, individual developers don't have to manually allocate and
|
|
|
|
release memory (which is a common source of security issues in other operating
|
|
|
|
systems).
|
|
|
|
|
|
|
|
In 2016, Facebook developed [reason](https://reasonml.github.io/), a dialect of
|
|
|
|
OCaml which syntax is closer to JavaScript, and easier to comprehend for
|
|
|
|
beginners. Reason and OCaml code can be easily combined in a single
|
|
|
|
application, since the same compiler is used.
|
2017-09-16 15:48:51 +00:00
|
|
|
|
|
|
|
Links:
|
|
|
|
- [OCaml for the masses](http://queue.acm.org/detail.cfm?id=2038036)
|
|
|
|
- [Why OCaml (from realworldocaml)](https://realworldocaml.org/v1/en/html/prologue.html)
|
|
|
|
- [Replacing Python with OCaml in 0install](http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-replacement-for-0install/)
|
|
|
|
- [Why tezos uses OCaml](https://www.tezos.com/static/papers/position_paper.pdf)
|
2017-09-16 16:52:05 +00:00
|
|
|
|
|
|
|
## Current state and future directions
|
|
|
|
|
2017-09-16 17:04:53 +00:00
|
|
|
Many libraries developed in the MirageOS project are deployed by Docker for Mac
|
|
|
|
and Docker for Windows, which have more than 100000 active users.
|
|
|
|
|
|
|
|
Available libraries include an IPv4 stack (TCP, UDP, ARP), DHCP client and
|
|
|
|
server, DNS server and resolver (both recursive and forwarding), HTTP (including
|
|
|
|
webmachine for request routing, and sessions), syslog, git (both client and
|
|
|
|
server, with mutliple storage backends: block device, in-memory), prometheus
|
|
|
|
integration. A TLS library, including random number generator (Fortuna),
|
|
|
|
cryptographic primitives (RSA, DSA, DH, AES), X.509 (using ASN.1), was developed
|
|
|
|
3 years ago and is in production serving websites, plus some applications using
|
|
|
|
the client side. A prototype implementation for managing unikernels on the host
|
|
|
|
system is already deployed and actively used, similar to libvirt, but with a
|
|
|
|
minimised code base, and written in OCaml. Monitoring is done with
|
|
|
|
prometheus. <- TODO: das bedeutet structured data und cloud ready / scalable /
|
|
|
|
funktioniert in distributed system?
|
|
|
|
|
|
|
|
More libraries are under active development, this includes an OpenPGP
|
|
|
|
implementation, an ssh implementation, structured syslog, Cap'n proto (RPC with
|
|
|
|
support for capabilities).
|
|
|
|
|
|
|
|
OCaml can be compiled to JavaScript, which means projects can developed in a
|
|
|
|
single language to ensure consistency and avoid errors, but code can be executed
|
|
|
|
on the client, or on the server.
|
|
|
|
|
|
|
|
The idea of unikernels is not limited to MirageOS, other projects apply the same
|
|
|
|
methodology in different programming languages. HalVM - the Haskell ligthweight
|
|
|
|
virtual machine - was developed by Galois Inc., and is based on Haskell. It is
|
|
|
|
used for network services such as honeypots and secure IPSec gateways.
|
|
|
|
|
|
|
|
IncludeOS is a C++ based unikernel which was initially developed at University
|
|
|
|
of Oslo, and now further developed in a startup.
|
|
|
|
|
|
|
|
Other unikernels are listed on
|
|
|
|
[Wikipedia](https://en.wikipedia.org/wiki/Unikernel) and there is an exchange of
|
|
|
|
ideas between different approaches. Among all unikernels, MirageOS has been
|
|
|
|
around the longest, has the most libraries available and deployed to production,
|
|
|
|
has an active developer community, and the safe programming language makes it
|
|
|
|
suitable for secure systems.
|
|
|
|
|
|
|
|
MirageOS has a small trusted code base compared to other operating systems.
|
|
|
|
Apart from the CPU (and its virtualisation extensions, VT-x, VT-d, EPT), and the
|
|
|
|
hypervisor implementation.Dieser satzkein verb On top of the hypervisor in the
|
|
|
|
host system a tiny virtual machine monitor (solo5) is executed. It does not rely
|
|
|
|
on qemu or other emulation code, but only contains drivers needed for the actual
|
|
|
|
unikernel (block and network devices). The unikernel itself consists of roughly
|
|
|
|
2000 lines of C code with basic functions such as malloc, memcopy, memcmp, on
|
|
|
|
which the vanilla OCaml runtime is executed. On top of that, only OCaml code is
|
|
|
|
executed, which includes an asynchronous task engine, the mentioned TCP/IP
|
|
|
|
stack, and the concrete services.
|
|
|
|
|
|
|
|
TODO: vllt bullet points rausnehmen, eher das formal verification argument detailliert erklaeren?
|
|
|
|
|
|
|
|
The security of MirageOS unikernels is planned to be improved even more in
|
|
|
|
several areas:
|
2017-09-16 16:52:05 +00:00
|
|
|
- data segments will be be mapped read/write, code segments execute-only
|
|
|
|
- private key material will be zeroed before destruction
|
|
|
|
- the address space layout will be randomised to make exploitation harder
|
2017-09-16 17:04:53 +00:00
|
|
|
- MirageOS will be ported to (se)L4 as hypervisor to minimize the trusted code
|
|
|
|
running on the host system
|
|
|
|
- once open hardware (RISC-V) is widely available, MirageOS will use this as
|
|
|
|
target. There is already a RISC-V backend for OCaml
|
|
|
|
- OCaml will be compilable to Coq (an interactive theorem prover) definitions,
|
|
|
|
within which theorems about the code can be proven
|
2017-09-16 16:52:05 +00:00
|
|
|
- Coq code will also be extractable to OCaml.
|
|
|
|
|
|
|
|
## Conclusion
|
|
|
|
|
2017-09-16 17:04:53 +00:00
|
|
|
MirageOS started as a research project, and has matured to a full suite for
|
|
|
|
building secure operating systems, with libraries that work well in production
|
|
|
|
and cover a variety of application needs. MirageOS is a game changer for secure
|
|
|
|
network services, since the attack surface is minimised to 1% of the size of
|
|
|
|
other contemporary operating systems. In addition, common attack vectors are
|
|
|
|
avoided by the usage of a programming language with memory safety. A unikernel
|
|
|
|
boots within tens of milliseconds, and services can be spawned on demand. When a
|
|
|
|
request (e.g. a DNS request) for a unikernel comes in, the kernel boots up,
|
|
|
|
handles the request, and is destroyed after an inactivity period. Only the
|
|
|
|
necessary services need to run, and they can be short-lived to minimize state in
|
|
|
|
the system.
|
|
|
|
|
|
|
|
The choice of a high-level programming language also allows for rapid
|
|
|
|
prototyping, new features can be developed quickly. In contrast to scripting
|
|
|
|
languages, the code does not need to be re-implemented for production use (but
|
|
|
|
nevertheless can be fine-tuned for performance).
|
2017-09-16 16:52:05 +00:00
|
|
|
|
|
|
|
|
|
|
|
WHY YOU NEED THIS!
|
|
|
|
WHAT ERRORS WE CAN AVOID
|
|
|
|
HOW WE CAN HELP
|