This commit is contained in:
Hannes Mehnert 2017-09-16 18:04:53 +01:00
parent 42cdb93dda
commit 7ac00bd572
2 changed files with 216 additions and 85 deletions

61
Home
View file

@ -1,27 +1,56 @@
--- ---
title: Robust Open Bare Ubiquitous Resilient services title: Robust Open Bare-metal Ubiquitous Resilient
author: someone author: someone
abstract: some abstract
--- ---
TODO: vermutlich ist es marketingmaessig klug, folgende features besser zu verkaufen:
- targeted hypervisors: KVM, ... on amd64 and arm64; Xen on am64 and arm32
- compiles to JavaScript (for single-language web applications)
- high-level memory-safe language allows for quick turnaround cycles during development and feature addition
- visualisation / comparison of the trusted code base
- decent performance
- fast boot time
- several libraries already deployed by others (Docker)
- 'scalable cloud-ready' (cap'n proto, prometheus, ...)
- small footprint (MB of memory -> lots of per server)
- could spawn a VM for each user
- energy efficient (https://sites.google.com/view/energy-efficiency-languages/)
Computers on the Internet get compromised, Computers on the Internet get compromised,
- to access data on the computer (databases, passwords, user accounts) or block access to it (Ransomware), - to access data on the computer (databases, passwords, user accounts) or block
- to participate in DoS of other services (e.g. the Mirai botnet on SmartTVs) and manipulate opininon (chatbots). access to it (Ransomware),
- to participate in DoS of other services (e.g. the Mirai botnet on SmartTVs)
and manipulate opininon (chatbots).
The software stack is often missing critical security updates, most embedded devices (home router, SmartTV, etc.) have no secure update channel, they need a secure system from the start. The software stack is often missing critical security updates, most embedded
devices (home router, SmartTV, etc.) have no secure update channel, they need a
secure system from the start.
Recent security solutions focus on containing compromised software by using virtualisation technology and containers. But the root cause remains: running old systems that are insecure. Recent security solutions focus on containing compromised software by using
virtualisation technology and containers. But the root cause remains: running
old systems that are insecure.
Instead of trying to fix these decades-old operating systems, which were designed as multi-user time sharing systems of the past, Instead of trying to fix these decades-old operating systems, which were
we build small services from scratch with security in mind, directly on the virtualization layer. designed as multi-user time sharing systems of the past, we build small services
In our operating system, each service is a separate virtual machine with only the required code (usually no need for process- and user management, or a file system), and no copy of a multi-purpose operating system kernel. from scratch with security in mind, directly on the virtualization layer. In
our operating system, each service is a separate virtual machine with only the
required code (usually no need for process- and user management, or a file
system), and no copy of a multi-purpose operating system kernel.
This makes our systems much smaller. This makes our systems much smaller. The binary size of an HTTP server with TLS
The binary size of an HTTP server with TLS support in our system is around 4% compared to one in a conventional operating system, making the attack surface much smaller. support in our system is around 4% compared to one in a conventional operating
system, making the attack surface much smaller.
Additionally, we use a safe programming language: a statically and strongly typed functional programming language with automated memory management. Additionally, we use a safe programming language: a statically and strongly
This reduces the attack vectors: temporal and spatial memory corruption are no concern anymore. typed functional programming language with automated memory management. This
The declarative programming style makes it possible to formally verify the correctness of the entire virtual machine with a theorem prover. reduces the attack vectors: temporal and spatial memory corruption are no
concern anymore. The declarative programming style makes it possible to
formally verify the correctness of the entire virtual machine with a theorem
prover.
One of our single purpose operating systems boots within milliseconds, and has a tiny memory footprint. One of our single purpose operating systems boots within milliseconds, and has a
For client-side features that run in a webbrowser, we compile to JavaScript from the same codebase, to ensure consistency. The strong and static type system helps to detect errors early, and enables rapid prototyping. For production use the prototype code can be further optimized for performance. tiny memory footprint. For client-side features that run in a webbrowser, we
compile to JavaScript from the same codebase, to ensure consistency. The strong
and static type system helps to detect errors early, and enables rapid
prototyping. For production use the prototype code can be further optimized for
performance.

View file

@ -27,75 +27,133 @@ We discuss more reasons why we use OCaml [further down](#Why-OCaml).
## MirageOS ## MirageOS
Our work is based on MirageOS, a suite to develop operating systems (developed since 2009 at University of Cambridge, UK) written in the programming language OCaml (developed since 1996 at INRIA in Paris, France). Our work is based on MirageOS, a suite to develop operating systems (developed
Most libraries are developed under permissive licenses (MIT/ISC/BSD2/Apache2). The OCaml compiler is dual-licensed: LGPLv2 or BSD for consortium members (3000 EUR/year). since 2009 at University of Cambridge, UK) written in the programming language
OCaml (developed since 1996 at INRIA in Paris, France). Most libraries are
developed under permissive licenses (MIT/ISC/BSD2/Apache2). The OCaml compiler
is dual-licensed: LGPLv2 or BSD for consortium members (3000 EUR/year).
MirageOS is a library operating system. It composes OCaml libraries into a bespoke operating system, called a unikernel. A unikernel can be a compiled as a UNIX binary, or a standalone virtual machine image. To build exactly the right unikernel for each purpose, we can pick from hundreds of libraries which implement protocols, storage on block devices, or interfaces to network devices provided by the hypervisor. MirageOS is a library operating system. It composes OCaml libraries into a
bespoke operating system, called a unikernel. A unikernel can be a compiled as a
UNIX binary, or a standalone virtual machine image. To build exactly the right
unikernel for each purpose, we can pick from hundreds of libraries which
implement protocols, storage on block devices, or interfaces to network devices
provided by the hypervisor.
On top of the hypervisor, a small layer of C code (developed at IBM research) is used to unify the interface between the different hypervisors, on which the OCaml runtime system is executed. On top of the hypervisor, a small layer of C code (developed at IBM research) is
used to unify the interface between the different hypervisors, on which the
OCaml runtime system is executed.
OCaml is a functional programming language with automated memory management, preventing manual memory management errors. OCaml is a functional programming language with automated memory management,
The strong and expressive type system of OCaml catches most programming errors already at compile time. preventing manual memory management errors. The strong and expressive type
We use a declarative style with immutable data structures and memory to avoid side effects that are hard to reason about. Errors are expressed as part of the type signature. IO is contained on top of the protocol logic. An implementation of a protocol can be used both as executable code, and as a test oracle. system of OCaml catches most programming errors already at compile time. We use
a declarative style with immutable data structures and memory to avoid side
effects that are hard to reason about. Errors are expressed as part of the type
signature. IO is contained on top of the protocol logic. An implementation of a
protocol can be used both as executable code, and as a test oracle.
Most security problems for network services arise in parsers of received network data. Most security problems for network services arise in parsers of received network
OCaml allows us to write strict parsers, which return success and error on the type system level to ensure that the caller handles all cases. data. OCaml allows us to write strict parsers, which return success and error
If a parser contains an error, in our system the impact is local, it cannot access memory beyond the the network data. on the type system level to ensure that the caller handles all cases. If a
Errors in parsers written in unsafe languages often lead to buffer overflows, which can lead to remote code execution. parser contains an error, in our system the impact is local, it cannot access
memory beyond the the network data. Errors in parsers written in unsafe
languages often lead to buffer overflows, which can lead to remote code
execution.
OCaml code is compiled to native code running in the OCaml runtime, which is very performant, on par with C++ code. OCaml code is compiled to native code running in the OCaml runtime, which is
The OCaml runtime is just used for memory management, and very small compared to a JVM or Python runtime. very performant, on par with C++ code. The OCaml runtime is just used for
As example, our TLS library has up to 85% of the bulk throughput of OpenSSL (using AES128-CBC). The TLS handshake performance is equal with OpenSSL. memory management, and very small compared to a JVM or Python runtime. As
OBWOHL AUF BESTIMMTE KONTEXTE BESCHRAENKT/NICHT ALLERWELTSPRACHE DA LERNINTENSIV? OCaml is known as a mature and safe programming language that is used in both industry (facebook for compilers, jane street for trading, docker, ahrefs, simcorp, lexifi) and academia (coq, compcert, ...). example, our TLS library has up to 85% of the bulk throughput of OpenSSL (using
AES128-CBC). The TLS handshake performance is equal with OpenSSL.
TODO: OBWOHL AUF BESTIMMTE KONTEXTE BESCHRAENKT/NICHT ALLERWELTSPRACHE DA LERNINTENSIV?
OCaml is known as a mature and safe programming language that is used in both
industry (facebook for compilers, jane street for trading, docker, ahrefs,
simcorp, lexifi) and academia (coq, compcert, ...).
2. Module system & Compilation ### Module system & Compilation
OCaml has a unique module system, in which a module specifies abstract datatypes and functions, and each module can have multiple implementations. Modules can take other modules as parameters, the module system is a complete programming language, evaluated at compile time.
MirageOS uses this module system as a powerful abstraction mechanism to adapt the unikernel to the compilation target.
It defines modules for all operating system services, such as the console, the network stack, the random number generator.
For each service, an implementation can be provided depending on the compilation target (UNIX process or virtual machine).
On UNIX, the sockets API is used as the networking stack. On a virtual machine, the TCP/IP stack in OCaml is being used.
By leveraging the module system of OCaml for MirageOS, we get module separation and dependency analysis from the well-tested module system of OCaml and avoid reimplementing these error-prone and important parts. OCaml has a unique module system, in which a module specifies abstract datatypes
A MirageOS unikernel developer does not need to reason about compilation targets, just about the module interface. and functions, and each module can have multiple implementations. Modules can
take other modules as parameters, the module system is a complete programming
language, evaluated at compile time. MirageOS uses this module system as a
powerful abstraction mechanism to adapt the unikernel to the compilation target.
It defines modules for all operating system services, such as the console, the
network stack, the random number generator. For each service, an implementation
can be provided depending on the compilation target (UNIX process or virtual
machine). On UNIX, the sockets API is used as the networking stack. On a
virtual machine, the TCP/IP stack in OCaml is being used.
3. MirageOS: Running a unikernel & system security By leveraging the module system of OCaml for MirageOS, we get module separation
and dependency analysis from the well-tested module system of OCaml and avoid
reimplementing these error-prone and important parts. A MirageOS unikernel
developer does not need to reason about compilation targets, just about the
module interface.
3.0 Simple config management model with localized reasoning ## MirageOS: Running a unikernel & system security
There are three ways to feed a virtual machine with configuration data, such as network configuration or TLS certificate and key.
- Compile the information into the virtual machine image, which requires recompilation on configuration change.
- Pass the information as boot parameters, which requires reboot on configuration change.
- Store this information in a virtual block device which is attached to the virtual machine.
For example, logs can be exfiltrated using syslog with UDP, TCP, or TLS as transport. The transport to use has to be chosen at compile time because TLS requires the TLS library to be linked in, but the log destination can be passed as boot parameter to the unikernel. ### Simple config management model with localized reasoning
3.1 Simple concurrency model with localized reasoning There are three ways to feed a virtual machine with configuration data, such as
MirageOS is an event based operating system with asynchronous tasks. A task yields the CPU once its execution is finished, or if it has to wait for IO. network configuration or TLS certificate and key.
This concurrency model leads to a cooperative multitasking programming style, rather than the error prone preemptive multitasking where each code block needs to be hardened to allow reentrant execution. - Compile the information into the virtual machine image, which requires
recompilation on configuration change.
- Pass the information as boot parameters, which requires reboot on
configuration change.
- Store this information in a virtual block device which is attached to the
virtual machine.
For example .. For example, logs can be exfiltrated using syslog with UDP, TCP, or TLS as
transport. The transport to use has to be chosen at compile time because TLS
requires the TLS library to be linked in, but the log destination can be passed
as boot parameter to the unikernel.
3.2 Simple process memory model with localized reasoning ### Simple concurrency model with localized reasoning
The virtual memory subsystem in contemporary operating systems provides an address mapping for each process. Since a unikernel is only a single service, it uses a single address space, avoiding the need for complex address mapping code altogether.
For example .. MirageOS is an event based operating system with asynchronous tasks. A task
yields the CPU once its execution is finished, or if it has to wait for IO.
This concurrency model leads to a cooperative multitasking programming style,
rather than the error prone preemptive multitasking where each code block needs
to be hardened to allow reentrant execution.
3.3 Simple library model with localized reasoning TODO: For example ..
As stated above, a MirageOS unikernel is much smaller than a comparable UNIX virtual machine, since only the required libraries are linked into the virtual machine image. By avoiding superfluous code we decrease the attack surface immensly.
Example: lines of code / binary size fuer zb dns server / dhcp server ### Simple process memory model with localized reasoning
The choice of programming language avoids several attack vectors. Immutable data structures, type checking and the OCaml runtime memory manager ensure memory safety in OCaml. The virtual memory subsystem in contemporary operating systems provides an
By minimising each unikernel to its minimal footprint, security breaches are contained to the information the unikernel contains. address mapping for each process. Since a unikernel is only a single service, it
uses a single address space, avoiding the need for complex address mapping code
altogether.
3.4 Secure updates TODO: For example ..
If an OCaml library introduces security flaws or information leakage, all unikernels depending on that library need to be updated.
Updating an OCaml library can safely be done via its package manager opam, which uses signed repositories.
For example .. ### Simple library model with localized reasoning
As stated above, a MirageOS unikernel is much smaller than a comparable UNIX
virtual machine, since only the required libraries are linked into the virtual
machine image. By avoiding superfluous code we decrease the attack surface
immensly.
TODO: Example: lines of code / binary size fuer zb dns server / dhcp server
The choice of programming language avoids several attack vectors. Immutable data
structures, type checking and the OCaml runtime memory manager ensure memory
safety in OCaml. By minimising each unikernel to its minimal footprint,
security breaches are contained to the information the unikernel contains.
### Secure updates
If an OCaml library introduces security flaws or information leakage, all
unikernels depending on that library need to be updated. Updating an OCaml
library can safely be done via its package manager opam, which uses signed
repositories.
TODO: For example ..
## Why OCaml ## Why OCaml
TODO: OCaml runtime vom franz. BSI reviewed, solo5 noch kein wirkliches review
OCaml code can be very fast (our TLS implementation reaches up to OCaml code can be very fast (our TLS implementation reaches up to
85% of the throughput of OpenSSL), and compiles either to native code on various 85% of the throughput of OpenSSL), and compiles either to native code on various
architectures or to bytecode. It can even compile to JavaScript. OCaml is architectures or to bytecode. It can even compile to JavaScript. OCaml is
@ -116,45 +174,89 @@ Links:
## Current state and future directions ## Current state and future directions
Many libraries developed in the MirageOS project are deployed by Docker for Mac and Docker for Windows, which have more than 100000 active users. Many libraries developed in the MirageOS project are deployed by Docker for Mac
and Docker for Windows, which have more than 100000 active users.
Available libraries include an IPv4 stack (TCP, UDP, ARP), DHCP client and server, DNS server and resolver (both recursive and forwarding), HTTP (including webmachine for request routing, and sessions), syslog, git (both client and server, with mutliple storage backends: block device, in-memory), prometheus integration. Available libraries include an IPv4 stack (TCP, UDP, ARP), DHCP client and
A TLS library, including random number generator (Fortuna), cryptographic primitives (RSA, DSA, DH, AES), X.509 (using ASN.1), was developed 3 years ago and is in production serving websites, plus some applications using the client side. server, DNS server and resolver (both recursive and forwarding), HTTP (including
A prototype implementation for managing unikernels on the host system is already deployed and actively used, similar to libvirt, but with a minimised code base, and written in OCaml. Monitoring is done with prometheus. <- das bedeutet structured data und cloud ready / scalable / funktioniert in distributed system? webmachine for request routing, and sessions), syslog, git (both client and
server, with mutliple storage backends: block device, in-memory), prometheus
integration. A TLS library, including random number generator (Fortuna),
cryptographic primitives (RSA, DSA, DH, AES), X.509 (using ASN.1), was developed
3 years ago and is in production serving websites, plus some applications using
the client side. A prototype implementation for managing unikernels on the host
system is already deployed and actively used, similar to libvirt, but with a
minimised code base, and written in OCaml. Monitoring is done with
prometheus. <- TODO: das bedeutet structured data und cloud ready / scalable /
funktioniert in distributed system?
More libraries are under active development, this includes an OpenPGP implementation, an ssh implementation, structured syslog, Cap'n proto (RPC with support for capabilities). More libraries are under active development, this includes an OpenPGP
implementation, an ssh implementation, structured syslog, Cap'n proto (RPC with
support for capabilities).
OCaml can be compiled to JavaScript, which means projects can developed in a single language to ensure consistency and avoid errors, but code can be executed on the client, or on the server. OCaml can be compiled to JavaScript, which means projects can developed in a
single language to ensure consistency and avoid errors, but code can be executed
on the client, or on the server.
The idea of unikernels is not limited to MirageOS, other projects apply the same methodology in different programming languages. The idea of unikernels is not limited to MirageOS, other projects apply the same
HalVM - the Haskell ligthweight virtual machine - was developed by Galois Inc., and is based on Haskell. It is used for network services such as honeypots and secure IPSec gateways. methodology in different programming languages. HalVM - the Haskell ligthweight
virtual machine - was developed by Galois Inc., and is based on Haskell. It is
used for network services such as honeypots and secure IPSec gateways.
IncludeOS is a C++ based unikernel which was initially developed at University of Oslo, and now further developed in a startup. IncludeOS is a C++ based unikernel which was initially developed at University
of Oslo, and now further developed in a startup.
Other unikernels are listed on Wikipedia https://en.wikipedia.org/wiki/Unikernel and there is an exchange of ideas between different approaches. Among all unikernels, MirageOS has been around the longest, has the most libraries available and deployed to production, has an active developer community, and the safe programming language makes it suitable for secure systems. Other unikernels are listed on
[Wikipedia](https://en.wikipedia.org/wiki/Unikernel) and there is an exchange of
ideas between different approaches. Among all unikernels, MirageOS has been
around the longest, has the most libraries available and deployed to production,
has an active developer community, and the safe programming language makes it
suitable for secure systems.
MirageOS has a small trusted code base compared to other operating systems. Apart from the CPU (and its virtualisation extensions, VT-x, VT-d, EPT), and the hypervisor implementation.Dieser satzkein verb MirageOS has a small trusted code base compared to other operating systems.
On top of the hypervisor in the host system a tiny virtual machine monitor (solo5) is executed. It does not rely on qemu or other emulation code, but only contains drivers needed for the actual unikernel (block and network devices). Apart from the CPU (and its virtualisation extensions, VT-x, VT-d, EPT), and the
The unikernel itself consists of roughly 2000 lines of C code with basic functions such as malloc, memcopy, memcmp, on which the vanilla OCaml runtime is executed. hypervisor implementation.Dieser satzkein verb On top of the hypervisor in the
On top of that, only OCaml code is executed, which includes an asynchronous task engine, the mentioned TCP/IP stack, and the concrete services. host system a tiny virtual machine monitor (solo5) is executed. It does not rely
on qemu or other emulation code, but only contains drivers needed for the actual
unikernel (block and network devices). The unikernel itself consists of roughly
2000 lines of C code with basic functions such as malloc, memcopy, memcmp, on
which the vanilla OCaml runtime is executed. On top of that, only OCaml code is
executed, which includes an asynchronous task engine, the mentioned TCP/IP
stack, and the concrete services.
The security of MirageOS unikernels is planned to be improved even more in several areas: TODO: vllt bullet points rausnehmen, eher das formal verification argument detailliert erklaeren?
The security of MirageOS unikernels is planned to be improved even more in
several areas:
- data segments will be be mapped read/write, code segments execute-only - data segments will be be mapped read/write, code segments execute-only
- private key material will be zeroed before destruction - private key material will be zeroed before destruction
- the address space layout will be randomised to make exploitation harder - the address space layout will be randomised to make exploitation harder
- MirageOS will be ported to (se)L4 as hypervisor to minimize the trusted code running on the host system - MirageOS will be ported to (se)L4 as hypervisor to minimize the trusted code
- once open hardware (RISC-V) is widely available, MirageOS will use this as target. There is already a RISC-V backend for OCaml running on the host system
- OCaml will be compilable to Coq (an interactive theorem prover) definitions, within which theorems about the code can be proven - once open hardware (RISC-V) is widely available, MirageOS will use this as
target. There is already a RISC-V backend for OCaml
- OCaml will be compilable to Coq (an interactive theorem prover) definitions,
within which theorems about the code can be proven
- Coq code will also be extractable to OCaml. - Coq code will also be extractable to OCaml.
## Conclusion ## Conclusion
MirageOS started as a research project, and has matured to a full suite for building secure operating systems, with libraries that work well in production and cover a variety of application needs. MirageOS started as a research project, and has matured to a full suite for
MirageOS is a game changer for secure network services, since the attack surface is minimised to 1% of the size of other contemporary operating systems. In addition, common attack vectors are avoided by the usage of a programming language with memory safety. building secure operating systems, with libraries that work well in production
A unikernel boots within tens of milliseconds, and services can be spawned on demand. When a request (e.g. a DNS request) for a unikernel comes in, the kernel boots up, handles the request, and is destroyed after an inactivity period. Only the necessary services need to run, and they can be short-lived to minimize state in the system. and cover a variety of application needs. MirageOS is a game changer for secure
network services, since the attack surface is minimised to 1% of the size of
other contemporary operating systems. In addition, common attack vectors are
avoided by the usage of a programming language with memory safety. A unikernel
boots within tens of milliseconds, and services can be spawned on demand. When a
request (e.g. a DNS request) for a unikernel comes in, the kernel boots up,
handles the request, and is destroyed after an inactivity period. Only the
necessary services need to run, and they can be short-lived to minimize state in
the system.
The choice of a high-level programming language also allows for rapid prototyping, new features can be developed quickly. The choice of a high-level programming language also allows for rapid
In contrast to scripting languages, the code does not need to be re-implemented for production use (but nevertheless can be fine-tuned for performance). prototyping, new features can be developed quickly. In contrast to scripting
languages, the code does not need to be re-implemented for production use (but
nevertheless can be fine-tuned for performance).
WHY YOU NEED THIS! WHY YOU NEED THIS!