252 lines
13 KiB
Text
252 lines
13 KiB
Text
---
|
|
title: Technology
|
|
author: someone
|
|
abstract: some abstract
|
|
---
|
|
|
|
We develop digital infrastructure with a minimal footprint. Where other approaches
|
|
try to patch general purpose operating systems by adding more layers of indirection,
|
|
we strive to build a secure system from the ground up.
|
|
|
|
Each piece of digital infrastructure or service is written in a high-level
|
|
memory-safe programming language and tailored to only contain the
|
|
required functionality at compilation time. This reduces the attack vectors
|
|
and the attack surface.
|
|
|
|
The resulting service is executed as a virtual machine on a modern hypervisor.
|
|
Its size is usually around 1-10 MB, much smaller than a UNIX / Linux system, and boots within milliseconds.
|
|
|
|
## MirageOS - bespoke operating systems
|
|
|
|
Our work is based on MirageOS, a suite to build operating systems. It has been developed
|
|
since 2009 at University of Cambridge, UK and is written in the programming language
|
|
OCaml (see [Why OCaml](#Why-OCaml)).
|
|
Most libraries are developed as open source (MIT/ISC/BSD2/Apache2).
|
|
|
|
MirageOS is a library operating system. It composes OCaml libraries into a
|
|
bespoke operating system, called a unikernel. A unikernel can be a compiled as a
|
|
UNIX binary, or a standalone virtual machine image. To build the right
|
|
unikernel for your custom business logic, we can pick from hundreds of libraries which
|
|
implement network protocols, storage on block devices, or interfaces to network devices
|
|
via the hypervisor.
|
|
|
|
On top of the hypervisor, a small layer of C code unifies
|
|
the interface on which OCaml runs.
|
|
|
|
OCaml is a functional programming language that minimizes side effects and mutable state.
|
|
Its functional programming concepts give us a list of security advantages for MirageOS.
|
|
|
|
## Running unikernel, system security
|
|
|
|
Aside from automated memory management to avoid memory corruption, and type checking to avoid many common
|
|
programming errors, the major advantage of functional programming is localized reasoning about program code.
|
|
All inputs, outputs and effects of a function are known.
|
|
Immutable datastructures and cooperative multitasking allow us to reason about the state of the entire system,
|
|
even if we use parallelism and complex distributed systems.
|
|
|
|
### Simple config management model with localized reasoning
|
|
|
|
There are three ways to feed a virtual machine with configuration data, such as
|
|
network configuration or TLS certificate and key.
|
|
- Compile the information into the virtual machine image, which requires
|
|
recompilation on configuration change.
|
|
- Pass the information as boot parameters, which requires reboot on
|
|
configuration change.
|
|
- Store this information in a virtual block device which is attached to the
|
|
virtual machine.
|
|
|
|
For example, logs can be written from the unikernel to a syslog collector with UDP, TCP, or TLS as
|
|
transport. The transport needs to be chosen at compile time because TLS
|
|
requires the TLS library to be linked into the kernel image, but the log destination is passed
|
|
as boot parameter.
|
|
|
|
### Simple concurrency model with localized reasoning
|
|
|
|
MirageOS is an event based operating system with asynchronous tasks. A task
|
|
yields the CPU once its execution is finished, or if it has to wait for IO.
|
|
This concurrency model leads to a cooperative multitasking programming style,
|
|
rather than the error prone preemptive multitasking, where each code block needs
|
|
to make sure to use appropriate locking strategies to avoid reentrant execution errors.
|
|
|
|
A recent example for code which is not safe under reentrant execution
|
|
[in Ethereum](http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/)
|
|
lead to a huge amount of ether being transferred.
|
|
|
|
Established software like the [Firefox JavaScript engine](http://www.nist.org/news.php?extend.175),
|
|
or [PHP](https://bugs.php.net/bug.php?id=74308) shows similar problems on a regular basis.
|
|
|
|
### Simple process memory model with localized reasoning
|
|
|
|
The virtual memory subsystem in contemporary operating systems provides an
|
|
address mapping for each process. Since a unikernel is only a single service, it
|
|
uses a single address space, avoiding the need for complex address mapping code
|
|
altogether.
|
|
|
|
An example for corrupting the page table is [Xen's XSA-182](http://xenbits.xen.org/xsa/advisory-182.html).
|
|
|
|
### Simple library model with localized reasoning
|
|
|
|
A MirageOS unikernel is much smaller than a comparable UNIX
|
|
virtual machine. By avoiding superfluous code we decrease the attack surface
|
|
immensly.
|
|
|
|
Consider the breakdown of the code of the example system [Bitcoin Piñata](/Projects/Pinata) compared
|
|
to a virtual machine using Linux and OpenSSL, measured in thousands of lines of code:
|
|
|
|
<table>
|
|
<tr><th></th><th>Linux</th><th>MirageOS</th></tr>
|
|
<tr><td>Kernel</td><td>1600</td><td>48</td></tr>
|
|
<tr><td>Runtime</td><td>689</td><td>25</td></tr>
|
|
<tr><td>Crypto</td><td>230</td><td>23</td></tr>
|
|
<tr><td>TLS</td><td>41</td><td>6</td></tr>
|
|
<tr><td>Total</td><td>2560</td><td>102</td></tr>
|
|
</table>
|
|
|
|
By minimising each unikernel to its minimal footprint,
|
|
security breaches are contained to the information the unikernel contains.
|
|
|
|
### Secure updates
|
|
|
|
If an OCaml library introduces security flaws or information leakage, all
|
|
unikernels depending on that library need to be updated. Updating an OCaml
|
|
library can safely be done via its package manager opam, which uses signed
|
|
repositories.
|
|
|
|
TODO: For example ..
|
|
|
|
## Why OCaml
|
|
|
|
OCaml is a functional programming language with automated memory management,
|
|
preventing manual memory management errors. The strong and expressive type
|
|
system of OCaml catches most programming errors already at compile time. We use
|
|
a declarative style with immutable data structures and memory to avoid side
|
|
effects that are hard to reason about. Errors are expressed as part of the type
|
|
signature. IO is contained on top of the protocol logic. An implementation of a
|
|
protocol can be used both as executable code, and as a test oracle.
|
|
|
|
Most security problems for network services arise in parsers of received network
|
|
data. OCaml allows us to write strict parsers, which return success and error
|
|
on the type system level to ensure that the caller handles all cases. If a
|
|
parser contains an error, in our system the impact is local, it cannot access
|
|
memory beyond the the network data. Errors in parsers written in unsafe
|
|
languages often lead to buffer overflows, which can lead to remote code
|
|
execution.
|
|
|
|
OCaml code is compiled to native code running in the OCaml runtime, which is
|
|
very performant, on par with C++ code. The OCaml runtime is just used for
|
|
memory management, and very small compared to a JVM or Python runtime. As
|
|
example, our TLS library has up to 85% of the bulk throughput of OpenSSL (using
|
|
AES128-CBC). The TLS handshake performance is equal with OpenSSL.
|
|
|
|
TODO: OBWOHL AUF BESTIMMTE KONTEXTE BESCHRAENKT/NICHT ALLERWELTSPRACHE DA LERNINTENSIV?
|
|
OCaml is known as a mature and safe programming language that is used in both
|
|
industry (facebook for compilers, jane street for trading, docker, ahrefs,
|
|
simcorp, lexifi) and academia (coq, compcert).
|
|
|
|
### Module system & Compilation
|
|
|
|
OCaml has a unique module system. A module specifies abstract datatypes
|
|
and functions, and each module can have multiple implementations. Modules can
|
|
take other modules as parameters, the module system is a complete programming
|
|
language, evaluated at compile time. MirageOS uses this module system as a
|
|
powerful abstraction mechanism to adapt the unikernel to the compilation target.
|
|
It defines modules for all operating system services, such as the console, the
|
|
network stack, the random number generator. For each service, an implementation
|
|
can be provided depending on the compilation target (UNIX process or virtual
|
|
machine). On UNIX, the sockets API is used as the networking stack. On a
|
|
virtual machine, the TCP/IP stack in OCaml is being used.
|
|
|
|
By leveraging the module system of OCaml for MirageOS, we get module separation
|
|
and dependency analysis from the well-tested module system of OCaml and avoid
|
|
reimplementing these error-prone and important parts. A MirageOS unikernel
|
|
developer does not need to reason about compilation targets, just about the
|
|
module interface.
|
|
|
|
TODO: OCaml runtime vom franz. BSI reviewed, solo5 noch kein wirkliches review
|
|
|
|
OCaml code can be very fast (our TLS implementation reaches up to
|
|
85% of the throughput of OpenSSL), and compiles either to native code on various
|
|
architectures or to bytecode. It can even compile to JavaScript. OCaml is
|
|
memory managed, individual developers don't have to manually allocate and
|
|
release memory (which is a common source of security issues in other operating
|
|
systems).
|
|
|
|
In 2016, Facebook developed [reason](https://reasonml.github.io/), a dialect of
|
|
OCaml which syntax is closer to JavaScript, and easier to comprehend for
|
|
beginners. Reason and OCaml code can be easily combined in a single
|
|
application, since the same compiler is used.
|
|
|
|
Links:
|
|
- [OCaml for the masses](http://queue.acm.org/detail.cfm?id=2038036)
|
|
- [Why OCaml (from realworldocaml)](https://realworldocaml.org/v1/en/html/prologue.html)
|
|
- [Replacing Python with OCaml in 0install](http://roscidus.com/blog/blog/2013/06/09/choosing-a-python-replacement-for-0install/)
|
|
- [Why tezos uses OCaml](https://www.tezos.com/static/papers/position_paper.pdf)
|
|
|
|
## Current state and future directions
|
|
|
|
Many libraries developed in the MirageOS project are deployed by Docker for Mac
|
|
and Docker for Windows, which have more than 100000 active users.
|
|
|
|
Available libraries include an IPv4 stack (TCP, UDP, ARP), DHCP client and
|
|
server, DNS server and resolver (both recursive and forwarding), HTTP (including
|
|
webmachine for request routing, and sessions), syslog, git (both client and
|
|
server, with mutliple storage backends: block device, in-memory), prometheus
|
|
integration. A TLS library, including random number generator (Fortuna),
|
|
cryptographic primitives (RSA, DSA, DH, AES), X.509 (using ASN.1), was developed
|
|
3 years ago and is in production serving websites, plus some applications using
|
|
the client side. A prototype implementation for managing unikernels on the host
|
|
system is already deployed and actively used, similar to libvirt, but with a
|
|
minimised code base, and written in OCaml. Monitoring is done with
|
|
prometheus. <- TODO: das bedeutet structured data und cloud ready / scalable /
|
|
funktioniert in distributed system?
|
|
|
|
More libraries are under active development, this includes an OpenPGP
|
|
implementation, an ssh implementation, structured syslog, Cap'n proto (RPC with
|
|
support for capabilities).
|
|
|
|
OCaml can be compiled to JavaScript, which means projects can developed in a
|
|
single language to ensure consistency and avoid errors, but code can be executed
|
|
on the client or on the server.
|
|
|
|
The idea of unikernels is not limited to MirageOS, other projects apply the same
|
|
concept in different programming languages. HalVM - the Haskell ligthweight
|
|
virtual machine - was developed by Galois Inc., and is based on Haskell. It is
|
|
used for network services such as honeypots and secure IPSec gateways.
|
|
|
|
IncludeOS is a C++ based unikernel which was initially developed at University
|
|
of Oslo, and now further developed in a startup.
|
|
|
|
Other unikernels are listed on
|
|
[Wikipedia](https://en.wikipedia.org/wiki/Unikernel) and there is an exchange of
|
|
ideas between different approaches. Among all unikernels, MirageOS has been
|
|
around the longest, has the most libraries available and deployed to production,
|
|
has an active developer community, and the safe programming language makes it
|
|
suitable for secure systems.
|
|
|
|
MirageOS has a small trusted code base compared to other operating systems.
|
|
Apart from the CPU (and its virtualisation extensions, VT-x, VT-d, EPT), and the
|
|
hypervisor implementation.Dieser satzkein verb On top of the hypervisor in the
|
|
host system a tiny virtual machine monitor (solo5) is executed. It does not rely
|
|
on qemu or other emulation code, but only contains drivers needed for the actual
|
|
unikernel (block and network devices). The unikernel itself consists of roughly
|
|
2000 lines of C code with basic functions such as malloc, memcopy, memcmp, on
|
|
which the vanilla OCaml runtime is executed. On top of that, only OCaml code is
|
|
executed, which includes an asynchronous task engine, the mentioned TCP/IP
|
|
stack, and the concrete services.
|
|
|
|
TODO: vllt bullet points rausnehmen, eher das formal verification argument detailliert erklaeren?
|
|
|
|
The security of MirageOS unikernels is planned to be improved even more in
|
|
several areas:
|
|
- data segments will be be mapped read/write, code segments execute-only
|
|
- private key material will be zeroed before destruction
|
|
- the address space layout will be randomised to make exploitation harder
|
|
- MirageOS will be ported to (se)L4 as hypervisor to minimize the trusted code
|
|
running on the host system
|
|
- once open hardware (RISC-V) is widely available, MirageOS will use this as
|
|
target. There is already a RISC-V backend for OCaml
|
|
- OCaml will be compilable to Coq (an interactive theorem prover) definitions,
|
|
within which theorems about the code can be proven
|
|
- Coq code will also be extractable to OCaml.
|
|
|
|
|