Instead of the bhyve binary, a small (~70kB) `ukvm-bin` binary (dynamically
linked against libc) can be used; it is the solo5 virtual machine monitor on
the host side.
Until now, I manually created and deployed virtual machines using shell scripts,
ssh logins, and a network file system shared with the FreeBSD virtual machine
which builds my MirageOS unikernels.
But this approach has several drawbacks. The biggest is that sharing resources
is hard: to enable a friend to run their unikernel on my server, they need a
user account, and even privileged permissions to create virtual network
interfaces and execute virtual machines.
To get rid of these ad-hoc shell scripts and copying of virtual machine images,
I developed a UNIX daemon which accomplishes the required work. This daemon
waits for (mutually!) authenticated network connections, and provides the
desired commands: to create a new virtual machine, to acquire a block device of
a given size, to destroy a virtual machine, and to stream the console output of
a virtual machine.
## System design
The system is minimalistic. The single interface to the
outside world is a TLS stream over TCP. Internally, there is a family of
processes, one of which has superuser privileges, communicating via unix domain
sockets. The processes do not need any persistent storage (apart from the
revocation lists). A brief enumeration of the processes is provided below:
* `vmmd` (superuser privileges), which terminates TLS sessions, proxies messages, and creates and destroys virtual machines (including setup and teardown of network interfaces and virtual block devices)
* `vmm_stats` periodically gathers resource usage and network interface statistics
* `vmm_console` reads console output of every provided fifo, and stores this in a ringbuffer, replaying to a client on demand
* `vmm_log` consumes the event log (login, starting, and stopping of virtual machines)
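
To illustrate the ring buffer behaviour described for `vmm_console`, here is a minimal Python sketch. It is not the daemon's code: the class name `ConsoleBuffer`, the line-based granularity, and the capacity are assumptions made for the example.

```python
from collections import deque


class ConsoleBuffer:
    """Keep only the most recent console lines of one virtual machine."""

    def __init__(self, capacity=1024):
        # A deque with maxlen drops the oldest entry once it is full.
        self._lines = deque(maxlen=capacity)

    def append(self, line):
        self._lines.append(line)

    def replay(self):
        # Return a copy, oldest line first, for a newly attached client.
        return list(self._lines)


buf = ConsoleBuffer(capacity=3)
for i in range(5):
    buf.append(f"line {i}")
replayed = buf.replay()
```

With a capacity of three, only the last three of the five lines survive, which is what a client attaching late would see.
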
The system uses X.509 certificates as tokens. These are authenticated key-value
stores. There are four shapes of certificates: a *virtual machine certificate*
which embeds the entire virtual machine image, together with configuration
information (resource usage, how many and which network interfaces, block device
access); a *command certificate* (for interactive use, allowing (a subset of)
commands such as attaching to console output); a *revocation certificate* which
contains a list of revoked certificates; and a *delegation certificate* to
distribute resources to someone else (an intermediate CA certificate).
The resources which can be controlled are CPUs, memory consumption, block
storage, and access to bridge interfaces (virtual switches) - encoded in the
virtual machine and delegation certificates. Additionally, delegation
certificates can limit the number of virtual machines.
Leveraging the X.509 system ensures that the client always has to present a
certificate chain from the root certificate. Each intermediate certificate is a
delegation certificate, which may further restrict resources. The serial
numbers of the chain are used as a unique identifier for each virtual machine
and other certificates. The chain restricts access of the leaf certificate as
well: only the subtree of the chain can be viewed. E.g. if there are
delegations to both Alice and Bob from the root certificate, they cannot see
each other's virtual machines.
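
The subtree rule can be sketched in Python as a prefix comparison. This is an illustration only: representing an identifier as a tuple of serial numbers, the serial numbers themselves, and the function name `can_view` are assumptions for the example, not taken from `vmmd`.

```python
def can_view(delegation_path, target_id):
    """True if target_id lies in the subtree below delegation_path.

    Both arguments are tuples of certificate serial numbers, ordered
    from the first delegation below the root down to the leaf.
    """
    return target_id[: len(delegation_path)] == delegation_path


# Hypothetical serial numbers: Alice and Bob each hold one delegation.
alice = (1001,)
bob = (1002,)
alice_vm = alice + (7,)
bob_vm = bob + (9,)

alice_sees_own = can_view(alice, alice_vm)
alice_sees_bob = can_view(alice, bob_vm)
root_sees_bob = can_view((), bob_vm)
```

Here Alice can view her own virtual machine but not Bob's, while a certificate issued directly by the root (empty delegation path) can view both.
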
Connecting to `vmmd` requires a TLS client, a CA certificate, a leaf
certificate (and the delegation chain) and its private key. In the background,
it is a multi-step process using TLS: first, the client establishes a TLS
connection where it authenticates the server using the CA certificate, then the
server demands a TLS renegotiation where it requires the client to authenticate
with its leaf certificate and private key. Using renegotiation over the
encrypted channel prevents passive observers from seeing the client certificate
in the clear.
Depending on the leaf certificate, the server logic is slightly different. A
command certificate opens an interactive session where - depending on
permissions encoded in the certificate - different commands can be issued: the
console output can be streamed, the event log can be viewed, virtual machines
can be destroyed, statistics can be collected, and block devices can be managed.
When a virtual machine certificate is presented, the desired resource usage is
checked against the resource policies in the delegation certificate chain and
the currently running virtual machines. If sufficient resources are free, the
embedded virtual machine is started. In addition to other resource information,
a delegation certificate may embed IP usage, listing the network configuration
(gateway and netmask), and which addresses you're supposed to use. Boot
arguments can be encoded in the certificate as well; they are simply passed to
the virtual machine (for easy deployment of off-the-shelf systems).
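
The admission check for a virtual machine certificate might look roughly like the following Python sketch. It is a simplification under stated assumptions: the field names (`vms`, `memory_mb`, `bridges`) are invented for the example, CPUs and block storage are left out, and the real policy encoding in the certificates may differ.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    """Limits carried by one delegation certificate (hypothetical fields)."""
    vms: int
    memory_mb: int
    bridges: frozenset


@dataclass(frozen=True)
class Vm:
    """Resources requested by, or already granted to, one virtual machine."""
    memory_mb: int
    bridges: frozenset


def fits(policy, running, request):
    """Check a request against one delegation and the VMs running below it."""
    used_memory = sum(vm.memory_mb for vm in running)
    return (
        len(running) + 1 <= policy.vms
        and used_memory + request.memory_mb <= policy.memory_mb
        and request.bridges <= policy.bridges
    )


def admit(chain, request):
    """chain lists (policy, running VMs) pairs from the root downwards."""
    return all(fits(policy, running, request) for policy, running in chain)


service = frozenset({"service"})
root = Policy(vms=10, memory_mb=4096, bridges=frozenset({"service", "mgmt"}))
alice = Policy(vms=2, memory_mb=512, bridges=service)

alice_vm = Vm(memory_mb=256, bridges=service)
bob_vm = Vm(memory_mb=1024, bridges=service)
chain = [(root, [alice_vm, bob_vm]), (alice, [alice_vm])]

small_ok = admit(chain, Vm(memory_mb=128, bridges=service))
large_ok = admit(chain, Vm(memory_mb=512, bridges=service))
```

The 128 MB request fits at every level of the chain, whereas the 512 MB request is rejected because it would exceed Alice's delegated memory, even though the root still has plenty free.
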
If a revocation certificate is presented, the embedded revocation list is
verified, and stored on the host system. Revocation is enforced by destroying
any revoked virtual machines and terminating any revoked interactive sessions.
If a delegation certificate is revoked, the connected block devices are
destroyed as well.
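
One way to picture revocation enforcement is the Python sketch below. It assumes identifiers are tuples of serial numbers along the certificate chain, and that revoking a delegation affects everything issued below it; both are assumptions for illustration, and the serial numbers are made up.

```python
def affected_by_revocation(active_ids, revoked_serials):
    """Return the identifiers whose chain contains a revoked serial number."""
    return [
        ident
        for ident in active_ids
        if any(serial in revoked_serials for serial in ident)
    ]


# Hypothetical identifiers: two VMs below delegation 1001, one below 1002.
active = [(1001, 7), (1001, 8), (1002, 9)]

delegation_revoked = affected_by_revocation(active, {1001})
leaf_revoked = affected_by_revocation(active, {9})
```

Revoking the delegation with serial 1001 selects both virtual machines below it, while revoking the leaf with serial 9 selects only that one.
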
The maximum size of a virtual machine image embedded into an X.509 certificate
transferred over TLS is 2 ^ 24 - 1 bytes, roughly 16 MB. If this turns out not
to be sufficient, compression or staged deployment may help.
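
As a rough illustration of the size limit and of the compression idea, the following Python sketch checks an image against the 2 ^ 24 - 1 byte bound and falls back to zlib. Whether and how the daemon would accept compressed images is an open assumption here; `prepare_image` is a hypothetical helper.

```python
import zlib

MAX_IMAGE_SIZE = 2**24 - 1  # 16,777,215 bytes, the limit stated above


def prepare_image(image):
    """Return (payload, compressed) or raise if the image cannot be made to fit."""
    if len(image) <= MAX_IMAGE_SIZE:
        return image, False
    packed = zlib.compress(image, 9)
    if len(packed) <= MAX_IMAGE_SIZE:
        return packed, True
    raise ValueError("image exceeds the limit even after compression")


small_image = b"\x00" * 1024
small_payload, small_compressed = prepare_image(small_image)

# An image slightly over the limit; all zeroes, so it compresses very well.
big_image = bytes(2**24 + 100)
big_payload, big_compressed = prepare_image(big_image)
```

Real unikernel images compress far less than this all-zero example, so the fallback only buys limited headroom.
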
## An example
Instructions on how to set up `vmmd` and the certificate authority are in the
README file of the [`vmm` git repository](https://github.com/hannesm/vmm). Here