179 lines
8.8 KiB
Text
179 lines
8.8 KiB
Text
---
|
|
title: Technology
|
|
---
|
|
|
|
We develop digital infrastructure with a minimal footprint. Where other approaches
|
|
try to patch general purpose operating systems by adding more layers of indirection,
|
|
we strive to build a secure system from the ground up.
|
|
|
|
Each piece of digital infrastructure or service is written in a high-level
|
|
memory-safe programming language and tailored to only contain the
|
|
required functionality at compilation time. This reduces the attack vectors
|
|
and the attack surface.
|
|
|
|
The resulting service is executed as a virtual machine on a modern hypervisor.
|
|
Its size is usually around 1-10 MB, much smaller than a UNIX / Linux system, and boots within milliseconds.
|
|
|
|
## MirageOS - bespoke operating systems
|
|
|
|
Our work is based on MirageOS, a suite to build operating systems. It has been developed
|
|
since 2009 at University of Cambridge, UK and is written in the programming language
|
|
OCaml (see [Why OCaml](#Why-OCaml)).
|
|
Most libraries are developed as open source (MIT/ISC/BSD2/Apache2).
|
|
|
|
MirageOS is a library operating system. It composes OCaml libraries into a
|
|
bespoke operating system, called a unikernel. A unikernel can be a compiled as a
|
|
UNIX binary, or a standalone virtual machine image. To build the right
|
|
unikernel for your custom business logic, we can pick from hundreds of libraries which
|
|
implement network protocols, storage on block devices, or interfaces to network devices
|
|
via the hypervisor.
|
|
|
|
On top of the hypervisor, a small layer of C code unifies
|
|
the interface on which OCaml runs.
|
|
|
|
OCaml is a functional programming language that minimizes side effects and mutable state.
|
|
Its functional programming concepts give us a list of security advantages for MirageOS.
|
|
|
|
## Running unikernel, system security
|
|
|
|
Aside from automated memory management to avoid memory corruption, and type checking to avoid many common
|
|
programming errors, the major advantage of functional programming is localized reasoning about program code.
|
|
All inputs, outputs and effects of a function are known.
|
|
Immutable datastructures and cooperative multitasking allow us to reason about the state of the entire system,
|
|
even if we use parallelism and complex distributed systems.
|
|
|
|
### Simple config management model with localized reasoning
|
|
|
|
There are three ways to feed a virtual machine with configuration data like
|
|
network configuration or TLS certificate and key:
|
|
- compile the information into the virtual machine image, which requires
|
|
recompilation on configuration change
|
|
- pass the information as boot parameters, which requires reboot on
|
|
configuration change
|
|
- store this information in a virtual block device attached to the
|
|
virtual machine.
|
|
|
|
For example, logs can be written from the unikernel to a syslog collector with UDP, TCP, or TLS as
|
|
transport. The transport needs to be chosen at compile time because TLS
|
|
requires the TLS library to be linked into the kernel image, but the log destination is passed
|
|
as boot parameter.
|
|
|
|
### Simple concurrency model with localized reasoning
|
|
|
|
MirageOS is an event based operating system with asynchronous tasks. A task
|
|
yields the CPU once its execution is finished, or if it has to wait for IO.
|
|
This concurrency model leads to a cooperative multitasking programming style,
|
|
rather than the error prone preemptive multitasking, where each code block needs
|
|
to make sure to use appropriate locking strategies to avoid reentrant execution errors.
|
|
|
|
A recent example for code which is not safe under reentrant execution
|
|
[in Ethereum](http://hackingdistributed.com/2016/06/18/analysis-of-the-dao-exploit/)
|
|
lead to a huge amount of ether being stolen.
|
|
|
|
Established software like the [Firefox JavaScript engine](http://www.nist.org/news.php?extend.175),
|
|
or [PHP](https://bugs.php.net/bug.php?id=74308) shows similar problems on a regular basis.
|
|
|
|
### Simple process memory model with localized reasoning
|
|
|
|
The virtual memory subsystem in contemporary operating systems provides an
|
|
address mapping for each process. Since a unikernel is only a single service, it
|
|
uses a single address space, avoiding the need for complex address mapping code
|
|
altogether.
|
|
|
|
[//]: # (I think we should explain the context for mentioning Xen here)
|
|
|
|
An example for corrupting the page table is [Xen's XSA-182](http://xenbits.xen.org/xsa/advisory-182.html).
|
|
|
|
### Simple library model with localized reasoning
|
|
|
|
A MirageOS unikernel is much smaller than a comparable UNIX
|
|
virtual machine. By avoiding superfluous code we decrease the attack surface
|
|
immensely.
|
|
|
|
Consider the breakdown of the code of the example system [Bitcoin Piñata](/Projects/Pinata) compared
|
|
to a virtual machine using Linux and OpenSSL, measured in thousands of lines of code:
|
|
|
|
<table>
|
|
<tr><th></th><th>Linux</th><th>MirageOS</th></tr>
|
|
<tr><td>Kernel</td><td>1600</td><td>48</td></tr>
|
|
<tr><td>Runtime</td><td>689</td><td>25</td></tr>
|
|
<tr><td>Crypto</td><td>230</td><td>23</td></tr>
|
|
<tr><td>TLS</td><td>41</td><td>6</td></tr>
|
|
<tr><td>Total</td><td>2560</td><td>102</td></tr>
|
|
</table>
|
|
|
|
### Secure updates
|
|
|
|
If a security flaw is discovered in a library, and there is a security update,
|
|
all unikernels depending on this library need to be updated as well.
|
|
This can be done with the [OCaml package manager](https://opam.ocaml.org).
|
|
It resolves dependencies and lets [authors sign their releases](https://github.com/hannesm/conex),
|
|
so there is no need for a central package repository server.
|
|
|
|
Central repository servers are known targets for attackers and have been breached in the past, amongst them
|
|
the [Linux kernel](https://lwn.net/Articles/57135/), [FreeBSD
|
|
infrastructure](https://www.freebsd.org/news/2012-compromise.html),
|
|
[Debian](https://www.debian.org/News/2003/20031202) and
|
|
[PHP](http://php.net/archive/2013.php#id2013-10-24-2).
|
|
|
|
## Why OCaml
|
|
|
|
### Functional programming style
|
|
|
|
As discussed in the system security paragraph, many security advantages of MirageOS are
|
|
based on the choice of a programming language in which problems can be solved in a functional style.
|
|
This style allows us to reason about the possible states of a system.
|
|
|
|
### Performance
|
|
|
|
OCaml code compiles to native code, which is
|
|
competitive, and comparable to compiled C code. As
|
|
an example, our [TLS library](https://usenix15.nqsb.io) has up to 85% of the bulk throughput of OpenSSL (using
|
|
AES128-CBC). The TLS handshake performance is comparable with OpenSSL.
|
|
|
|
### Dependency management
|
|
|
|
MirageOS leverages OCaml's module system to adapt the unikernel to the compilation target.
|
|
Each operating system service in MirageOS is a module, for example the console, the
|
|
network stack, the random number generator.
|
|
Each of the services has multiple implementations that are chosen based on the target.
|
|
On UNIX, the sockets API of the host is used as networking stack. On a
|
|
unikernel, the TCP/IP stack natively implemented in OCaml is being used.
|
|
A MirageOS developer does not need to reason about compilation targets, just about the
|
|
module interface.
|
|
|
|
### Verification
|
|
|
|
A large subset of the OCaml semantics has been
|
|
[mechanically proven sound](http://www.cl.cam.ac.uk/~so294/ocaml/) in a theorem prover.
|
|
|
|
OCaml is the implementation language of the well-known proof assistant
|
|
[Coq](https://coq.inria.fr). Development in Coq can be extracted to OCaml code,
|
|
as demonstrated by [compcert](http://compcert.inria.fr/), a formally verified
|
|
optimizing C compiler, in order to be compiled and executed. The other
|
|
direction is also possible: OCaml code can be translated into Coq definitions
|
|
(using [Coq of OCaml](https://github.com/clarus/coq-of-ocaml/)).
|
|
|
|
The National Cybersecurity Agency of France reviewed the implementation of the
|
|
OCaml runtime system, [their
|
|
report](http://www.ssi.gouv.fr/agence/publication/lafosec-securite-et-langages-fonctionnels/)
|
|
prompted some language modifications, such as that strings are no longer mutable.
|
|
|
|
### Modern dialects and compile targets
|
|
|
|
OCaml is a mature programming language that is used both in
|
|
industry (Facebook, Jane Street Capital, Docker, ahrefs,
|
|
simcorp, lexifi) and academia.
|
|
|
|
The OCaml compiler generates native code for x86, arm, etc., and has a bytecode
|
|
backend, which can target microcontrollers (PIC18 family in the [OcaPIC project](http://www.algo-prog.info/ocapic/web/?id=OCAPIC:OCAPIC)).
|
|
OCaml can also be compiled to JavaScript, so both client
|
|
and server side of a web application can be developed in the same language with shared interface code (more details at the [ocsigen project](http://ocsigen.org/)).
|
|
|
|
In 2016, Facebook developed [reason](https://reasonml.github.io/), a dialect of
|
|
OCaml which syntax is closer to JavaScript, and easier to comprehend for
|
|
beginners. Reason and OCaml code can be easily combined into a single
|
|
application, since they use the same compiler.
|
|
|
|
More literature on why OCaml is a good choice has been
|
|
written by Yaron Minsky (Jane Street) in the article [OCaml for the masses](http://queue.acm.org/detail.cfm?id=2038036), and more recently by the crypto-ledger [tezos](https://www.tezos.com/static/papers/position_paper.pdf).
|