---
title: Technology
author: someone
abstract: some abstract
---
We develop digital infrastructure with a minimal footprint. Where other approaches
try to patch general-purpose operating systems by adding more layers of indirection,
we strive to build a secure system from the ground up.
Each piece of digital infrastructure or service is written in a high-level,
memory-safe programming language and tailored at compilation time to contain only
the required functionality. This reduces both the attack vectors and the attack
surface.
The resulting service is executed as a virtual machine on a modern hypervisor.
Its size is usually around 1-10 MB, much smaller than a UNIX/Linux system, and it boots within milliseconds.
## MirageOS - bespoke operating systems
Our work is based on MirageOS, a suite for building operating systems. It has been developed
since 2009 at the University of Cambridge, UK, and is written in the programming language
OCaml (see [Why OCaml](#Why-OCaml)).
Most libraries are developed as open source (MIT/ISC/BSD2/Apache2).

MirageOS is a library operating system. It composes OCaml libraries into a
bespoke operating system, called a unikernel. A unikernel can be compiled as a
UNIX binary or as a standalone virtual machine image. To build the right
unikernel for your custom business logic, we can pick from hundreds of libraries that
implement network protocols, storage on block devices, or interfaces to network devices
via the hypervisor.
On top of the hypervisor, a small layer of C code provides a uniform interface
on which OCaml runs.

OCaml is a functional programming language that discourages side effects and mutable state.
Its functional programming concepts give MirageOS a number of security advantages.
## Running unikernels: system security
Aside from automated memory management to avoid memory corruption, and type checking to avoid many common
programming errors, the major advantage of functional programming is localized reasoning about program code.
The inputs and outputs of a function are explicit, and its effects are easy to identify.
Immutable data structures and cooperative multitasking allow us to reason about the state of the entire system,
even if we use parallelism and complex distributed systems.
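
As a small illustration of localized reasoning, here is a sketch of a pure transition function for a hypothetical connection state machine. The states, events and the names `step` and `run` are made up for this example; they are not taken from the MirageOS TCP/IP stack. Because `step` has no side effects and the state is an immutable value, each transition can be checked on its own, and the final state depends only on the sequence of events.

```ocaml
(* Hypothetical connection states and events, for illustration only. *)
type state = Closed | Listening | Established
type event = Listen | Connect | Close

(* A pure transition function: the result depends only on the arguments. *)
let step state event =
  match state, event with
  | Closed, Listen -> Listening
  | Listening, Connect -> Established
  | (Listening | Established), Close -> Closed
  | s, _ -> s (* every other event leaves the state unchanged *)

(* Replay a list of events, starting from the initial state [Closed]. *)
let run events = List.fold_left step Closed events
```

For example, `run [Listen; Connect]` evaluates to `Established`, while `run [Connect]` stays `Closed`, since a connection cannot be established without first listening.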
### Simple config management model with localized reasoning
There are three ways to provide a virtual machine with configuration data, such as
the network configuration or a TLS certificate and key:

- Compile the information into the virtual machine image, which requires
  recompilation on every configuration change.
- Pass the information as boot parameters, which requires a reboot on every
  configuration change.
- Store the information on a virtual block device that is attached to the
  virtual machine.
For example, logs can be sent from the unikernel to a syslog collector using UDP, TCP, or TLS as
transport. The transport needs to be chosen at compile time, because TLS
requires the TLS library to be linked into the unikernel image, whereas the log destination is passed
as a boot parameter.
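
The split between compiled-in values and boot parameters can be sketched in plain OCaml. This is a hypothetical illustration, not the MirageOS configuration API: the `config` record, the helpers `split_kv` and `apply_boot_params`, and the boot parameter keys `syslog-host` and `syslog-port` are all assumptions made for this example. The transport choice itself is a build-time decision and is not modelled here.

```ocaml
(* Hypothetical configuration record; the field names are illustrative. *)
type config = { syslog_host : string; syslog_port : int }

(* Compiled into the image: changing these values requires a rebuild. *)
let compiled_in = { syslog_host = "127.0.0.1"; syslog_port = 514 }

(* Split a "key=value" boot parameter at the first '='. *)
let split_kv s =
  match String.index_opt s '=' with
  | Some i ->
    Some (String.sub s 0 i, String.sub s (i + 1) (String.length s - i - 1))
  | None -> None

(* Boot parameters override compiled-in values: changing them requires a
   reboot, but no rebuild. Unknown keys and malformed values are ignored. *)
let apply_boot_params base params =
  List.fold_left
    (fun cfg p ->
      match split_kv p with
      | Some ("syslog-host", host) -> { cfg with syslog_host = host }
      | Some ("syslog-port", v) ->
        (match int_of_string_opt v with
         | Some port -> { cfg with syslog_port = port }
         | None -> cfg)
      | _ -> cfg)
    base params
```

Because `apply_boot_params` returns a new record, the compiled-in defaults stay unchanged, and the effective configuration can be understood by looking only at its two inputs.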
### Simple concurrency model with localized reasoning
MirageOS is an event-based operating system with asynchronous tasks. A task
yields the CPU once its execution is finished, or when it has to wait for I/O.
This concurrency model leads to a cooperative multitasking programming style,
rather than error-prone preemptive multitasking, where each code block needs
to use appropriate locking strategies to avoid reentrant execution errors.
A recent example is code
[in Ethereum](http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/)
that was not safe under reentrant execution, which led to a huge amount of ether being transferred.
Established software such as the [Firefox JavaScript engine](http://www.nist.org/news.php?extend.175)
or [PHP](https://bugs.php.net/bug.php?id=74308) shows similar problems on a regular basis.
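
The following toy scheduler sketches why cooperative multitasking makes such bugs easier to avoid. It is an illustration only, not MirageOS's task engine: the names `spawn`, `yield`, `run`, `withdraw` and the shared `balance` are hypothetical. A task can only be interrupted where it explicitly yields, so a check followed by an update with no yield in between cannot be interleaved with another task.

```ocaml
(* Queue of runnable tasks, executed in FIFO order. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

(* Schedule [task] to run later. *)
let spawn task = Queue.add task run_queue

(* Suspend the current task; the continuation [k] runs later. *)
let yield k = Queue.add k run_queue

(* Run tasks until none are left. *)
let run () =
  while not (Queue.is_empty run_queue) do
    (Queue.pop run_queue) ()
  done

(* Hypothetical shared state. *)
let balance = ref 100
let results = ref []

(* The check and the update happen without a yield in between, so no other
   task can run between them. A yield placed between the check and the update
   would let the other task observe the stale balance. *)
let withdraw amount =
  let ok = !balance >= amount in
  if ok then balance := !balance - amount;
  yield (fun () -> results := ok :: !results)

let () =
  spawn (fun () -> withdraw 80);
  spawn (fun () -> withdraw 80);
  run ()
```

After `run ()`, `!balance` is `20` and exactly one of the two withdrawals succeeded: `!results` is `[false; true]`.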
### Simple process memory model with localized reasoning
The virtual memory subsystem in contemporary operating systems provides an
address mapping for each process. Since a unikernel runs only a single service, it
uses a single address space, avoiding the need for complex address mapping code
altogether.
An example of a page table corruption vulnerability is [Xen's XSA-182](http://xenbits.xen.org/xsa/advisory-182.html).
### Simple library model with localized reasoning
A MirageOS unikernel is much smaller than a comparable UNIX
virtual machine. By avoiding superfluous code, we decrease the attack surface
immensely.
Consider the breakdown of the code of the example system [Bitcoin Piñata](/Projects/Pinata) compared
to a virtual machine using Linux and OpenSSL, measured in thousands of lines of code:
<table>
<tr><th>Component (kLOC)</th><th>Linux</th><th>MirageOS</th></tr>
<tr><td>Kernel</td><td>1600</td><td>48</td></tr>
<tr><td>Runtime</td><td>689</td><td>25</td></tr>
<tr><td>Crypto</td><td>230</td><td>23</td></tr>
<tr><td>TLS</td><td>41</td><td>6</td></tr>
<tr><td>Total</td><td>2560</td><td>102</td></tr>
</table>
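
Using the totals from the table, the size reduction can be worked out directly. This is only a ratio of lines of code, not a measure of how many vulnerabilities are avoided:

$$
\frac{2560}{102} \approx 25.1, \qquad \frac{102}{2560} \approx 0.04
$$

In other words, the MirageOS build of the Bitcoin Piñata contains about 4% of the code of the Linux and OpenSSL based virtual machine, roughly 25 times less.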
### Secure updates
If a security flaw is discovered in a library and the library receives a security update,
all unikernels depending on this library need to be updated as well.
This can be done with the OCaml package manager, opam.
It resolves dependencies and lets authors sign their releases,
so there is no need to trust a central package repository server.
Central repository servers are known targets for attackers and have been breached in the past, among them
those of the [Linux kernel](https://lwn.net/Articles/57135/), [FreeBSD
infrastructure](https://www.freebsd.org/news/2012-compromise.html),
[Debian](https://www.debian.org/News/2003/20031202) and
[PHP](http://php.net/archive/2013.php#id2013-10-24-2).
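
Finding the unikernels that must be rebuilt after a library update is a reverse dependency query. opam performs dependency resolution itself; the sketch below only illustrates the idea on a made-up dependency graph. The package names and the functions `direct_deps`, `depends_on` and `affected` are hypothetical, not opam's algorithm or API.

```ocaml
(* A made-up dependency graph: each package maps to its direct
   dependencies. The package names are illustrative only. *)
let deps = [
  "web-unikernel", ["tls"; "http"; "tcpip"];
  "dns-unikernel", ["dns"; "tcpip"];
  "tls", ["crypto"; "x509"];
  "x509", ["asn1"; "crypto"];
  "http", ["tcpip"];
  "dns", [];
  "tcpip", [];
  "crypto", [];
  "asn1", [];
]

let direct_deps p =
  match List.assoc_opt p deps with Some ds -> ds | None -> []

(* [depends_on p lib] holds if [p] depends on [lib] directly or
   transitively. Assumes the graph is acyclic; a cycle would make this
   recursion loop forever. *)
let rec depends_on p lib =
  List.exists (fun d -> d = lib || depends_on d lib) (direct_deps p)

(* All packages that need to be rebuilt after [lib] is updated. *)
let affected lib =
  List.map fst (List.filter (fun (p, _) -> depends_on p lib) deps)
```

Here an update of `crypto` affects `["web-unikernel"; "tls"; "x509"]`, while `dns-unikernel`, which does not use it, is left untouched.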
2017-09-16 15:46:56 +00:00
## Why OCaml

### Functional programming style

### Performance

OCaml code compiles to native code running in the OCaml runtime, and its performance is
often comparable to compiled C++ code. The OCaml runtime is mainly used for
memory management and is very small compared to a JVM or Python runtime. As
an example, our TLS library reaches up to 85% of the bulk throughput of OpenSSL (using
AES128-CBC), and its TLS handshake performance is comparable to that of OpenSSL.
### Dependency management
MirageOS leverages OCaml's module system to adapt the unikernel to the compilation target.
Each operating system service in MirageOS is a module, for example the console, the
network stack, or the random number generator.
Each service has multiple implementations, which are chosen based on the target.
On UNIX, the sockets API of the host is used as the network stack; in a standalone
unikernel, the TCP/IP stack implemented natively in OCaml is used.
A MirageOS developer does not need to reason about compilation targets, only about the
module interface.
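
The OCaml mechanism behind this is module types and functors. In MirageOS the build tooling selects the implementation for each target; the sketch below only shows the language feature. The signature `CONSOLE` and the modules `Unix_console`, `Serial_console` and `Hello` are hypothetical and are not the actual MirageOS interfaces. To keep the sketch testable, `write` returns the text it would output instead of performing I/O.

```ocaml
(* An operating system service described as a module type (hypothetical). *)
module type CONSOLE = sig
  val target : string
  val write : string -> string (* returns the text that would be written *)
end

(* Implementation for a UNIX binary (hypothetical). *)
module Unix_console : CONSOLE = struct
  let target = "unix"
  let write s = "[stdout] " ^ s
end

(* Implementation for a standalone virtual machine (hypothetical). *)
module Serial_console : CONSOLE = struct
  let target = "vm"
  let write s = "[serial] " ^ s
end

(* Application code is written once, against the interface only. *)
module Hello (C : CONSOLE) = struct
  let start () = C.write ("hello from " ^ C.target)
end

(* Selecting the implementation is a single functor application. *)
module Hello_unix = Hello (Unix_console)
module Hello_vm = Hello (Serial_console)
```

`Hello_unix.start ()` yields `"[stdout] hello from unix"` and `Hello_vm.start ()` yields `"[serial] hello from vm"`, although the body of `Hello` is identical in both cases.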
### Verification
A large subset of the OCaml semantics has been
[mechanized](http://www.cl.cam.ac.uk/~so294/ocaml/) in a theorem prover, and
its metatheory has been verified.

OCaml is the implementation language of the well-known proof assistant
[Coq](https://coq.inria.fr). Developments in Coq can be extracted to OCaml code
in order to be compiled and executed, as demonstrated by
[CompCert](http://compcert.inria.fr/), a formally verified optimizing C compiler.
The other direction is also possible: OCaml code can be translated into Coq definitions
(using [Coq of OCaml](https://github.com/clarus/coq-of-ocaml/)).
The National Cybersecurity Agency of France (ANSSI) reviewed the implementation of the
OCaml runtime system; [their
report](http://www.ssi.gouv.fr/agence/publication/lafosec-securite-et-langages-fonctionnels/)
led to some language modifications, such as making strings immutable.
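
Since OCaml 4.06, the separation between immutable `string` and mutable `bytes` is enforced by default (the `-safe-string` mode). The following minimal sketch, with hypothetical variable names, shows that modifying text requires an explicit copy into a `bytes` buffer, which leaves the original string untouched.

```ocaml
(* [string] values are immutable; mutable byte buffers have type [bytes]. *)
let greeting = "hello"

(* Bytes.of_string copies the string into a fresh, mutable buffer. *)
let buf = Bytes.of_string greeting

let () = Bytes.set buf 0 'H'

(* Bytes.to_string copies again, producing a new immutable string. *)
let shouted = Bytes.to_string buf
```

Afterwards `greeting` is still `"hello"` and `shouted` is `"Hello"`; an attempt to call `Bytes.set` directly on `greeting` is rejected by the type checker.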
### Modern dialects and compile targets
OCaml is a mature programming language that is used both in
industry (Facebook, Jane Street Capital, Docker, Ahrefs,
SimCorp, LexiFi) and academia.

In 2016, Facebook developed [Reason](https://reasonml.github.io/), a dialect of
OCaml whose syntax is closer to JavaScript and easier for beginners to
comprehend. Reason and OCaml code can easily be combined in a single
application, since the same compiler is used.

More literature on why OCaml is a good choice in the modern world has been
written by Yaron Minsky (Jane Street) in the article [OCaml for the masses](http://queue.acm.org/detail.cfm?id=2038036), and more recently by the crypto-ledger [Tezos](https://www.tezos.com/static/papers/position_paper.pdf).
## Current state and future directions
Many libraries developed in the MirageOS project are used in Docker for Mac
and Docker for Windows, which have more than 100,000 active users.

Available libraries include an IPv4 stack (TCP, UDP, ARP), a DHCP client and
server, a DNS server and resolver (both recursive and forwarding), HTTP (including
webmachine for request routing, and sessions), syslog, git (both client and
server, with multiple storage backends: block device and in-memory), and Prometheus
integration. A TLS library, including a random number generator (Fortuna),
cryptographic primitives (RSA, DSA, DH, AES) and X.509 (using ASN.1), was developed
three years ago; it is in production serving websites, and some applications use
its client side. A prototype for managing unikernels on the host system, similar
to libvirt but with a minimised code base and written in OCaml, is already
deployed and actively used. Monitoring is done with Prometheus.
TODO: does this mean structured data and cloud-ready / scalable / works in a
distributed system?

More libraries are under active development, including an OpenPGP
implementation, an SSH implementation, structured syslog, and Cap'n Proto (RPC with
support for capabilities).
OCaml can be compiled to JavaScript, which means projects can be developed in a
single language to ensure consistency and avoid errors, while the code can be executed
either on the client or on the server.
The idea of unikernels is not limited to MirageOS; other projects apply the same
concept in different programming languages. HalVM (the Haskell Lightweight
Virtual Machine) was developed by Galois Inc. and is based on Haskell. It is
used for network services such as honeypots and secure IPsec gateways.
IncludeOS is a C++-based unikernel that was initially developed at the University
of Oslo and is now being developed further by a startup.
Other unikernels are listed on
[Wikipedia](https://en.wikipedia.org/wiki/Unikernel), and there is an exchange of
ideas between the different approaches. Among all unikernels, MirageOS has been
around the longest, has the most libraries available and deployed in production,
and has an active developer community; its safe programming language makes it
suitable for secure systems.

MirageOS has a small trusted code base compared to other operating systems.
The trusted code base consists of the CPU (including its virtualisation extensions
VT-x, VT-d and EPT), the hypervisor implementation, and the layers described next.
On top of the hypervisor in the host system, a tiny virtual machine monitor (Solo5)
is executed. It does not rely on QEMU or other emulation code, and only contains the
drivers needed by the actual unikernel (block and network devices). The unikernel
itself consists of roughly 2000 lines of C code with basic functions such as malloc,
memcpy and memcmp, on which the vanilla OCaml runtime is executed. On top of that,
only OCaml code is executed, including an asynchronous task engine, the TCP/IP
stack mentioned above, and the concrete services.
TODO: maybe remove the bullet points and instead explain the formal verification argument in detail?
We plan to improve the security of MirageOS unikernels even further in
several areas:

- data segments will be mapped read/write but not executable, and code segments execute-only
- private key material will be zeroed before destruction
- the address space layout will be randomised to make exploitation harder
- MirageOS will be ported to (se)L4 as the hypervisor, to minimize the trusted code
  running on the host system
- once open hardware (RISC-V) is widely available, MirageOS will use it as a
  target; there is already a RISC-V backend for OCaml
- OCaml code will be translatable into definitions for Coq (an interactive theorem
  prover), within which theorems about the code can be proven
- Coq code will also be extractable to OCaml.
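
Zeroing private key material, the second item in the list above, can be sketched in plain OCaml. The helpers `zero_out` and `with_secret` are hypothetical and not part of any MirageOS library; `Fun.protect` requires OCaml 4.08 or later. One caveat: OCaml's garbage collector may move heap blocks (for example when promoting them from the minor heap or during compaction), so wiping one buffer does not guarantee that no other copy of the secret remains in memory.

```ocaml
(* Overwrite a buffer that held secret data with zero bytes. *)
let zero_out (buf : Bytes.t) = Bytes.fill buf 0 (Bytes.length buf) '\000'

(* Run [f] on [secret], then wipe [secret], even if [f] raises. *)
let with_secret (secret : Bytes.t) (f : Bytes.t -> 'a) : 'a =
  Fun.protect ~finally:(fun () -> zero_out secret) (fun () -> f secret)
```

For instance, after `with_secret key sign_something` returns or raises, every byte of `key` is `'\000'`; here `sign_something` stands for any hypothetical function using the key.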