Compare commits
2 commits
e9a3a6fe40
...
8c3a9220c2
Author | SHA1 | Date | |
---|---|---|---|
8c3a9220c2 | |||
009a734fae |
2 changed files with 711 additions and 0 deletions
247
articles/2024-08-21-OpenVPN-and-MirageVPN.markdown
Normal file
|
@ -0,0 +1,247 @@
|
||||||
|
---
|
||||||
|
date: 2024-08-21
|
||||||
|
article.title: MirageVPN and OpenVPN
|
||||||
|
article.description: Discoveries made implementing MirageVPN, an OpenVPN-compatible VPN library
|
||||||
|
tags:
|
||||||
|
- MirageVPN
|
||||||
|
- OpenVPN
|
||||||
|
- security
|
||||||
|
author:
|
||||||
|
name: Reynir Björnsson
|
||||||
|
email: reynir@reynir.dk
|
||||||
|
link: https://reyn.ir/
|
||||||
|
---
|
||||||
|
At [Robur][robur] we have been busy at work implementing our OpenVPN™-compatible MirageVPN software.
|
||||||
|
Recently we have implemented the [server side][miragevpn-server].
|
||||||
|
In order to implement this side of the protocol I studied parts of the OpenVPN™ source code and performed experiments to understand what the implementation does at the protocol level.
|
||||||
|
Studying the OpenVPN™ implementation has led me to discover two security issues: CVE-2024-28882 and CVE-2024-5594.
|
||||||
|
In this article I will talk about the relevant parts of the protocol, and describe the security issues in detail.
|
||||||
|
|
||||||
|
A VPN establishes a secure tunnel in which (usually) IP packets are sent.
|
||||||
|
The OpenVPN protocol establishes a TLS tunnel[^openvpn-tls] with which key material and configuration options are negotiated.
|
||||||
|
Once established the TLS tunnel is used to exchange so-called control channel messages.
|
||||||
|
They are NUL-terminated (well, more on that later) text messages sent in a single TLS record frame (mostly, more on that later).
|
||||||
|
|
||||||
|
I will describe two groups of control channel messages (and a bonus control channel message):
|
||||||
|
|
||||||
|
* `EXIT`, `RESTART`, and `HALT`
|
||||||
|
* `PUSH_REQUEST` / `PUSH_REPLY`
|
||||||
|
* (`AUTH_FAILED`)
|
||||||
|
|
||||||
|
The `EXIT`, `RESTART`, and `HALT` messages are similar.
|
||||||
|
They are all three used to signal to the client that it should disconnect[^disconnect] from the server.
|
||||||
|
`HALT` tells the client to disconnect and suggests the client should terminate.
|
||||||
|
`RESTART` also tells the client to disconnect and suggests the client can reconnect, either to the same server or to the next server if multiple are configured, depending on flags in the message.
|
||||||
|
`EXIT` tells the *peer* that it is exiting and the *peer* should disconnect.
|
||||||
|
The last one can be sent by either the server or the client and is useful when the underlying transport is UDP.
|
||||||
|
It informs the peer that the sender is exiting and will (soon) not be receiving and ACK'ing messages; for UDP the peer would otherwise (re)send messages until a timeout.
|
||||||
|
|
||||||
|
Because the underlying transport can either be TCP or UDP the sender may have no guarantees that the message arrives.
|
||||||
|
OpenVPN's control channel implements a reliable layer with ACKs and retransmissions to work around that.
|
||||||
|
To accommodate this OpenVPN™ will wait five seconds before disconnecting to allow for retransmission of the exit message.
|
||||||
|
|
||||||
|
### The bug
|
||||||
|
|
||||||
|
While I was working on implementing more control channel message types I modified a client application that connects to a server and sends pings over the tunnel - instead of ICMPv4 echo requests I modified it to send the `EXIT` control channel message once a second.
|
||||||
|
In the server logs I saw that the server successfully received the `EXIT` message!
|
||||||
|
But nothing else happened.
|
||||||
|
The server just kept receiving `EXIT` messages but for some reason it never disconnected the client.
|
||||||
|
|
||||||
|
Curious about this behavior I dived into the OpenVPN™ source code and found that on each `EXIT` message it (re)schedules an exit (disconnect) timer! That is, every time the server receives an `EXIT` message it'll go "OK! I'll shut down this connection in five seconds" forgetting it promised to do so earlier, too.
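
The rescheduling behaviour can be modelled in a few lines of C. This is a minimal sketch and not OpenVPN's code: `struct client`, `on_exit_buggy`, `on_exit_fixed` and `simulate` are hypothetical names, time is counted in whole seconds, and the guard in `on_exit_fixed` is only one plausible way to let the first scheduled exit win.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-client state; not OpenVPN's actual data structure. */
struct client {
    bool exit_scheduled;
    long exit_deadline;   /* seconds on some monotonic clock */
};

#define EXIT_DELAY 5

/* Buggy variant: every EXIT message pushes the deadline further out. */
static void on_exit_buggy(struct client *c, long now)
{
    c->exit_scheduled = true;
    c->exit_deadline = now + EXIT_DELAY;
}

/* Guarded variant (assumption): a later EXIT does not move the deadline. */
static void on_exit_fixed(struct client *c, long now)
{
    if (!c->exit_scheduled) {
        c->exit_scheduled = true;
        c->exit_deadline = now + EXIT_DELAY;
    }
}

/* Deliver one EXIT per second, starting at time 0, for `seconds` seconds.
 * Returns the time at which the client is disconnected, or -1 if it is
 * still connected when the simulation ends. */
static long simulate(void (*on_exit)(struct client *, long), long seconds)
{
    struct client c = { false, 0 };

    assert(seconds >= 0);
    for (long now = 0; now < seconds; now++) {
        if (c.exit_scheduled && now >= c.exit_deadline)
            return now;
        on_exit(&c, now);
    }
    return -1;
}
```

With one `EXIT` per second, `simulate(on_exit_buggy, 60)` never reaches its deadline and returns `-1`, while `simulate(on_exit_fixed, 60)` returns `5`.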
|
||||||
|
|
||||||
|
### Implications
|
||||||
|
|
||||||
|
At first this seemed like a relatively benign bug.
|
||||||
|
What's the worst that could happen if a client says "let's stop in five seconds! No, five seconds from now! No, five seconds from now!" etc?
|
||||||
|
Well, it turns out the same timer is used when the server sends an exit message.
|
||||||
|
Ok, so what?
|
||||||
|
The client can hold open a resource it *was* authorized to use *longer*.
|
||||||
|
So we have a somewhat boring potential denial of service attack.
|
||||||
|
|
||||||
|
Then I learned more about the management interface.
|
||||||
|
The management interface is a text protocol to communicate with the OpenVPN server (or client) and query for information or send commands.
|
||||||
|
One command is the `client-kill` command.
|
||||||
|
The documentation says to use this command to "[i]mmediately kill a client instance[...]".
|
||||||
|
In practice it sends an exit message to the client (either a custom one or the default `RESTART`).
|
||||||
|
I learnt that it shares code paths with the exit control messages to schedule an exit (disconnect)[^kill-immediately].
|
||||||
|
That is, `client-kill` schedules the same five second timer.
|
||||||
|
|
||||||
|
Thus a malicious client can, instead of exiting on receiving an exit or `RESTART` message, repeatedly send `EXIT` back to the server to reset the five second timer.
|
||||||
|
This way the client can indefinitely delay the exit/disconnect, assuming a sufficiently stable and responsive network.
|
||||||
|
This is suddenly not so good.
|
||||||
|
The application using the management interface might be enforcing a security policy which we can now circumvent!
|
||||||
|
The client might be a former employee of a company, and the security team might want to revoke the former employee's access to the internal network, and as part of that process use `client-kill` to kick off all of their connected clients.
|
||||||
|
The former employee, if prepared, can circumvent this by sending back `EXIT` messages repeatedly and thus keep unauthorized access.
|
||||||
|
Or a commercial VPN service may try to enforce a data transfer limit with the same mechanism which is then rather easily circumvented by sending `EXIT` messages.
|
||||||
|
|
||||||
|
Does anyone use the management interface in this way?
|
||||||
|
I don't know.
|
||||||
|
If you do or are aware of software that enforces policies this way please do reach out to [me][contact].
|
||||||
|
It would be interesting to hear and discuss.
|
||||||
|
The OpenVPN security@ mailing list took it seriously enough to assign it CVE-2024-28882.
|
||||||
|
|
||||||
|
## OpenVPN configuration language
|
||||||
|
|
||||||
|
Next up we have `PUSH_REQUEST` / `PUSH_REPLY`.
|
||||||
|
As the names suggest it's a request/response protocol.
|
||||||
|
It is used to communicate configuration options from the server to the client.
|
||||||
|
These options include routes, IP address configuration, and negotiated cryptographic algorithms.
|
||||||
|
The client signals it would like to receive configuration options from the server by sending the `PUSH_REQUEST` control channel message[^proto-push-request].
|
||||||
|
The server then sends a `PUSH_REPLY` message.
|
||||||
|
|
||||||
|
The format of a `PUSH_REPLY` message is `PUSH_REPLY,` followed by a comma separated list of OpenVPN configuration directives terminated by a NUL byte as in other control channel messages.
|
||||||
|
Note that this means pushed configuration directives cannot contain commas.
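
To make the wire format concrete, here is a small sketch of how a sender could assemble such a message. `build_push_reply` is a hypothetical helper written for this illustration (it is not part of OpenVPN or MirageVPN), and the directives used in the usage note are only examples.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: writes "PUSH_REPLY,<d0>,<d1>,..." followed by the
 * terminating NUL byte into out. Returns the message length including the
 * NUL, or 0 on error (a directive contains a comma, or out is too small). */
static size_t build_push_reply(char *out, size_t cap,
                               const char *const *directives, size_t n)
{
    static const char head[] = "PUSH_REPLY";
    size_t len = sizeof(head) - 1;

    assert(out != NULL);
    if (cap < len + 1)
        return 0;
    memcpy(out, head, len);

    for (size_t i = 0; i < n; i++) {
        size_t dlen = strlen(directives[i]);

        /* A comma inside a directive would be read as a separator. */
        if (strchr(directives[i], ',') != NULL)
            return 0;
        /* Room for ',' + directive + the final NUL. */
        if (cap - len < dlen + 2)
            return 0;
        out[len++] = ',';
        memcpy(out + len, directives[i], dlen);
        len += dlen;
    }
    out[len++] = '\0';
    return len;
}
```

Called with the two directives `route 10.0.0.0 255.0.0.0` and `ping 10`, the helper produces `PUSH_REPLY,route 10.0.0.0 255.0.0.0,ping 10` plus the NUL byte; a directive such as `a,b` is refused because the receiver could not tell it apart from two directives.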
|
||||||
|
|
||||||
|
When implementing the `push` server configuration directive, which tells the server to send the parameter of `push` as a configuration option to the client in the `PUSH_REPLY`, I studied how exactly OpenVPN™ parses configuration options.
|
||||||
|
I learned some quirks of the configuration language which I find surprising and somewhat hard to implement.
|
||||||
|
I will not cover all corners of the configuration language.
|
||||||
|
|
||||||
|
In some sense you could say the configuration language of OpenVPN™ is line based.
|
||||||
|
At least, the first step OpenVPN 2.X takes when parsing configuration directives is to read one line at a time and parse it as one configuration directive[^inline-files].
|
||||||
|
A line is whatever `fgets()` says it is - this includes the newline if not at the end of the file[^configuration-newlines].
|
||||||
|
This is how it is for configuration files.
|
||||||
|
However, if it is a `PUSH_REPLY` a *"line"* is the text string up to a comma or the end of file (or, importantly, a NUL byte).
|
||||||
|
This "line" tokenization is done by repeatedly calling OpenVPN™'s `buf_parse(buf, ',', line, sizeof(line))` function.
|
||||||
|
|
||||||
|
```C
/* file: src/openvpn/buffer.c */
bool
buf_parse(struct buffer *buf, const int delim, char *line, const int size)
{
    bool eol = false;
    int n = 0;
    int c;

    ASSERT(size > 0);

    do
    {
        c = buf_read_u8(buf);
        if (c < 0)
        {
            eol = true;
        }
        if (c <= 0 || c == delim)
        {
            c = 0;
        }
        if (n >= size)
        {
            break;
        }
        line[n++] = c;
    }
    while (c);

    line[size-1] = '\0';
    return !(eol && !strlen(line));
}
```
|
||||||
|
|
||||||
|
`buf_parse()` takes a `struct buffer*` which is a pointer to a byte array with an offset and length field, a delimiter character (in our case `','`), a destination buffer `line` and its length `size`.
|
||||||
|
It calls `buf_read_u8()` which returns the first character in the buffer and advances the offset and decrements the length, or returns `-1` if the buffer is empty.
|
||||||
|
In essence, `buf_parse()` "reads" from the buffer and copies over to `line` until it encounters `delim`, a NUL byte or the end of the buffer.
|
||||||
|
In that case a NUL byte is written to `line`.
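
The behaviour is easy to try outside of OpenVPN. The sketch below keeps the logic of the `buf_parse()` listing above, but `struct buffer` and `buf_read_u8()` are simplified stand-ins written for this example (the real structure has more fields), and `ASSERT` is replaced by the standard `assert`.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Simplified stand-in for OpenVPN's struct buffer (assumption: only the
 * data pointer, offset and remaining length matter for this demo). */
struct buffer {
    const unsigned char *data;
    int offset;
    int len;
};

/* Returns the next byte and advances, or -1 if the buffer is empty. */
static int buf_read_u8(struct buffer *buf)
{
    if (buf->len <= 0)
        return -1;
    buf->len--;
    return buf->data[buf->offset++];
}

/* Same logic as the buf_parse() listing above. */
static bool buf_parse(struct buffer *buf, const int delim, char *line, const int size)
{
    bool eol = false;
    int n = 0;
    int c;

    assert(size > 0);

    do {
        c = buf_read_u8(buf);
        if (c < 0)
            eol = true;              /* ran off the end of the buffer */
        if (c <= 0 || c == delim)
            c = 0;                   /* end of buffer, NUL and delim all end the line */
        if (n >= size)
            break;
        line[n++] = (char)c;
    } while (c);

    line[size - 1] = '\0';
    return !(eol && !strlen(line));
}
```

Fed the five bytes `a`, `,`, `b`, NUL, `c`, repeated calls with `','` as delimiter yield the lines `a`, `b` and `c` and then return `false`: the NUL byte ends the second line exactly like a comma would, and parsing carries on behind it.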
|
||||||
|
|
||||||
|
What is interesting is that a NUL byte is effectively considered a delimiter, too, and that it is consumed by `buf_parse()`.
|
||||||
|
Next, let's look at how incoming control channel messages are handled (modified for brevity):
|
||||||
|
|
||||||
|
```C
/* file: src/openvpn/forward.c (before fix) */
/*
 * Handle incoming configuration
 * messages on the control channel.
 */
static void
check_incoming_control_channel(struct context *c, struct buffer buf)
{
    /* force null termination of message */
    buf_null_terminate(&buf);

    /* enforce character class restrictions */
    string_mod(BSTR(&buf), CC_PRINT, CC_CRLF, 0);

    if (buf_string_match_head_str(&buf, "AUTH_FAILED"))
    {
        receive_auth_failed(c, &buf);
    }
    else if (buf_string_match_head_str(&buf, "PUSH_"))
    {
        incoming_push_message(c, &buf);
    }
    /* SNIP */
}
```
|
||||||
|
|
||||||
|
First, the buffer is ensured to be NUL terminated by replacing the last byte with a NUL byte.
|
||||||
|
This is already somewhat questionable as it could make an otherwise invalid message valid.
|
||||||
|
Next, character class restrictions are "enforced".
|
||||||
|
Roughly, this removes non-printable characters, carriage returns and line feeds from the C string.
|
||||||
|
The macro `BSTR()` returns the underlying buffer behind the `struct buffer` with the offset added.
|
||||||
|
Notably, `string_mod()` works on (NUL terminated) C strings and not `struct buffer`s.
|
||||||
|
As an example, the string (with the usual C escape sequences):
|
||||||
|
|
||||||
|
"PUSH_REPLY,line \nfeeds\n,are\n,removed\n\000"
|
||||||
|
|
||||||
|
becomes
|
||||||
|
|
||||||
|
"PUSH_REPLY,line feeds,are,removed\000ed\n\000"
|
||||||
|
|
||||||
|
As you can see, if interpreted as a C string we have removed the line feeds.
|
||||||
|
But what is this at the end?
|
||||||
|
These are the last 4 bytes of the original string, left behind unchanged.
|
||||||
|
More generally, it is the last N bytes from the original string if the original string has N line feeds (or other disallowed characters).
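
The leftover bytes can be reproduced with a simplified filter. This is a sketch under assumptions, not OpenVPN's `string_mod()`: `strip_disallowed` and `stale_tail` are hypothetical names, and "allowed" is taken to mean printable ASCII (0x20 to 0x7e), which is enough to remove line feeds.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for the filtering step: drops every byte that is
 * not printable ASCII from the NUL-terminated C string s, in place. */
static void strip_disallowed(char *s)
{
    assert(s != NULL);

    char *out = s;
    for (const char *in = s; *in != '\0'; in++) {
        unsigned char b = (unsigned char)*in;
        if (b >= 0x20 && b <= 0x7e)
            *out++ = *in;
    }
    /* Terminate the shortened string. The bytes behind this new NUL are
     * not touched and keep their original values. */
    *out = '\0';
}

/* Number of bytes after the new terminator, up to and including the old
 * terminator, given the string length before filtering. */
static size_t stale_tail(const char *s, size_t original_len)
{
    return original_len - strlen(s);
}
```

Applied to the example message, the array ends up holding `PUSH_REPLY,line feeds,are,removed`, a NUL byte, and then the four stale bytes `e`, `d`, line feed, NUL. Code that only looks at the C string never sees them, but code that walks the whole buffer does.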
|
||||||
|
|
||||||
|
The whole buffer is still passed to the push reply parsing.
|
||||||
|
Remember that the "line" parser will not only consume commas as the line delimiter, but also NUL bytes!
|
||||||
|
This means the configuration directives are parsed as lines:
|
||||||
|
|
||||||
|
```C
"line feeds"
"are"
"removed"
"ed\n"
```
|
||||||
|
|
||||||
|
With this technique we can now inject (almost; the exception is NUL) arbitrary bytes as configuration directive lines.
|
||||||
|
This is bad because the configuration directive is printed to the console if it doesn't parse.
|
||||||
|
As a proof of concept I sent a `PUSH_REPLY` with an embedded BEL character, and the OpenVPN™ client prints to console (abbreviated):
|
||||||
|
|
||||||
|
Unrecognized option or missing or extra parameter(s): ^G
|
||||||
|
|
||||||
|
The `^G` is how the BEL character is printed in my terminal.
|
||||||
|
I was also able to hear an audible bell.
|
||||||
|
|
||||||
|
A more thorough explanation of how terminal escape sequences can be exploited can be found on [G-Research's blog](https://www.gresearch.com/news/g-research-the-terminal-escapes/).
|
||||||
|
|
||||||
|
### The fix
|
||||||
|
|
||||||
|
The fix is also a first step towards decoupling the control channel messaging from the TLS record frames.
|
||||||
|
First, the data is split on NUL bytes in order to get the control channel message(s), and then messages are rejected if they contain illegal characters.
|
||||||
|
This solves the vulnerability described previously.
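
The order of operations matters: split first, validate second. The sketch below illustrates that order only. It is not the actual OpenVPN patch; `message_is_clean` and `split_and_check` are hypothetical names, "illegal" is assumed to mean anything outside printable ASCII, and bytes after the last NUL are assumed to be an incomplete message that the caller keeps for later.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* True if every byte of msg[0..len) is printable ASCII (assumption: this
 * stands in for the real character class check). */
static bool message_is_clean(const unsigned char *msg, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (msg[i] < 0x20 || msg[i] > 0x7e)
            return false;
    }
    return true;
}

/* Splits data on NUL bytes and validates each complete message as a whole.
 * Nothing is edited in place, so no stale bytes are left behind.
 * Returns the number of bytes consumed; bytes after the last NUL are an
 * incomplete message and are left to the caller. */
static size_t split_and_check(const unsigned char *data, size_t len,
                              size_t *accepted, size_t *rejected)
{
    size_t start = 0;

    assert(accepted != NULL && rejected != NULL);
    *accepted = 0;
    *rejected = 0;

    for (size_t i = 0; i < len; i++) {
        if (data[i] != '\0')
            continue;
        if (message_is_clean(data + start, i - start))
            (*accepted)++;
        else
            (*rejected)++;   /* dropped entirely, never partially parsed */
        start = i + 1;
    }
    return start;
}
```

For the input `PUSH_REPLY,ping 10`, NUL, `PUSH_REPLY,bad`, line feed, NUL, `tail`, the first message is accepted, the second is rejected because of the line feed, and the trailing `tail` is not consumed.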
|
||||||
|
|
||||||
|
Unfortunately, it turns out that especially for the `AUTH_FAILED` control channel message it is easy to create invalid messages:
|
||||||
|
If 2FA is implemented using the script mechanism to send custom messages, those messages, which ask the client to enter the verification code, easily end up with a trailing newline.
|
||||||
|
I believe in 2.6.12 the client tolerates trailing newline characters.
|
||||||
|
|
||||||
|
## Conclusion
|
||||||
|
|
||||||
|
The first bug, the timer rescheduling bug, is at least 20 years old!
|
||||||
|
It hasn't always been exploitable, but the bug itself goes back as far as the git history does.
|
||||||
|
I haven't attempted further software archeology to find the exact time of introduction.
|
||||||
|
Either way, it's old and has gone unnoticed for quite a while.
|
||||||
|
|
||||||
|
I think this shows that diversity in implementations is a great way to exercise corner cases, push forward (protocol) documentation efforts and get thorough code review by motivated peers.
|
||||||
|
This work was funded by [the EU NGI Assure Fund through NLnet](https://nlnet.nl/project/MirageVPN/).
|
||||||
|
In my opinion, this shows that funding one open source project can have a positive impact on other open source projects, too.
|
||||||
|
|
||||||
|
[robur]: https://robur.coop/
|
||||||
|
[miragevpn-server]: https://blog.robur.coop/articles/miragevpn-server.html
|
||||||
|
[contact]: https://reyn.ir/contact.html
|
||||||
|
|
||||||
|
[^openvpn-tls]: This is not always the case. It is possible to use static shared secret keys, but it is mostly considered deprecated.
|
||||||
|
[^disconnect]: I say "disconnect" even when the underlying transport is the connection-less UDP.
|
||||||
|
[^kill-immediately]: As the alert reader might have realized this is inaccurate. It does not kill the client "immediately" as it will wait five seconds after the exit message is sent before exiting. At best this will kill a cooperating client once it's received the kill message.
|
||||||
|
[^proto-push-request]: There is another mechanism to request a `PUSH_REPLY` earlier with fewer round trips, but let's ignore that for now. The exact message is `PUSH_REQUEST<NUL-BYTE>` as messages need to be NUL-terminated.
|
||||||
|
[^inline-files]: An exception being inline files which can span multiple lines. They vaguely resemble XML tags with an open `<tag>` and close `</tag>` each on their own line with the data in between. I doubt these are sent in `PUSH_REPLY`s, but without diving into the source code I can't rule out that it is possible to send inline files.
|
||||||
|
[^configuration-newlines]: This results in the quirk that it is possible to sort-of escape a newline in a configuration directive. But since the line splitting is done *first* it's not possible to continue the directive on the next line! I believe this is mostly useless, but it is a way to inject line feeds in configuration options without modifying the OpenVPN source code.
|
464
articles/tar-release.md
Normal file
|
@ -0,0 +1,464 @@
|
||||||
|
---
|
||||||
|
date: 2024-08-15
|
||||||
|
article.title: The new Tar release, a retrospective
|
||||||
|
article.description: A little retrospective to the new Tar release and changes
|
||||||
|
tags:
|
||||||
|
- OCaml
|
||||||
|
- Cstruct
|
||||||
|
- functors
|
||||||
|
author:
|
||||||
|
name: Romain Calascibetta
|
||||||
|
email: romain.calascibetta@gmail.com
|
||||||
|
link: https://blog.osau.re
|
||||||
|
---
|
||||||
|
We are delighted to announce the new release of `ocaml-tar`, a small library for
|
||||||
|
reading and writing tar archives in OCaml. Since this is a major release, we'll
|
||||||
|
take the time in this article to explain the work that's been done by the
|
||||||
|
cooperative on this project.
|
||||||
|
|
||||||
|
Tar is an **old** project. Originally written by David Scott as part of Mirage,
|
||||||
|
this project is particularly interesting for building bridges between the tools
|
||||||
|
we can offer and what already exists. Tar is, in fact, widely used. So we're
|
||||||
|
both dealing with a format that's older than I am (but I'm used to it by email)
|
||||||
|
and a project that's been around since... 2012 (over 10 years!).
|
||||||
|
|
||||||
|
But we intend to maintain and improve it, since we're using it for the
|
||||||
|
[opam-mirror][opam-mirror] project among other things - this unikernel is to
|
||||||
|
provide an opam-repository "tarball" for opam when you do `opam update`.
|
||||||
|
|
||||||
|
## `Cstruct.t` & bytes
|
||||||
|
|
||||||
|
As some of you may have noticed, over the last few months we've begun a fairly
|
||||||
|
substantial change to the Mirage ecosystem, replacing the use of `Cstruct.t` in
|
||||||
|
key places with bytes/string.
|
||||||
|
|
||||||
|
This choice is based on 2 considerations:
|
||||||
|
- we came to realize that `Cstruct.t` could be very costly in terms of
|
||||||
|
performance
|
||||||
|
- `Cstruct.t` remains a "Mirage" structure; outside the Mirage ecosystem, the
|
||||||
|
use of `Cstruct.t` is not so "obvious".
|
||||||
|
|
||||||
|
The pull-request is available here: https://github.com/mirage/ocaml-tar/pull/137.
|
||||||
|
The discussion can be interesting in discovering common bugs (uninitialized
|
||||||
|
buffer, invalid access). There's also a small benchmark to support our initial
|
||||||
|
intuition<sup>[1](#fn1)</sup>.
|
||||||
|
|
||||||
|
But this PR can also be an opportunity to understand the existence of
|
||||||
|
`Cstruct.t` in the Mirage ecosystem and the reasons for this historic choice.
|
||||||
|
|
||||||
|
### `Cstruct.t` as non-movable data
|
||||||
|
|
||||||
|
I've already [made][discuss-cstruct] a list of pros/cons when it comes to
|
||||||
|
bigarrays. Indeed, `Cstruct.t` is based on a bigarray:
|
||||||
|
```ocaml
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t =
  { buffer : buffer
  ; off : int
  ; len : int }
```
|
||||||
|
|
||||||
|
The experienced reader may rightly wonder why `Cstruct.t` is a bigarray with `off`
|
||||||
|
and `len`. First, we need to clarify what a bigarray is for OCaml.
|
||||||
|
|
||||||
|
A bigarray is a somewhat special value in OCaml. This value is allocated in the
|
||||||
|
C heap. In other words, its contents do not live in OCaml's garbage-collected heap, but
|
||||||
|
exist outside it. The first (and very important) implication of this feature is
|
||||||
|
that the contents of a bigarray do not move (even if the GC tries to defragment
|
||||||
|
the memory). This feature has several advantages:
|
||||||
|
- in parallel programming, it can be very interesting to use a bigarray knowing
|
||||||
|
that, from the point of view of the 2 processes, the position of the bigarray
|
||||||
|
will never change - this is essentially what [parmap][parmap] does (before
|
||||||
|
OCaml 5).
|
||||||
|
- for calculations such as checksum or hash, it can be interesting to use a
|
||||||
|
bigarray. The calculation would not be interrupted by the GC since the
|
||||||
|
bigarray does not move. The calculation can therefore be continued at the same
|
||||||
|
point, which can help the CPU to better predict the next stage of the
|
||||||
|
calculation. This is what [digestif][digestif] offers and what
|
||||||
|
[decompress][decompress] requires.
|
||||||
|
- for one reason or another, particularly when interacting with something other
|
||||||
|
than OCaml, you need to offer a memory zone that cannot move. This is
|
||||||
|
particularly true for unikernels as Xen guests (where the _net device_
|
||||||
|
corresponds to a fixed memory zone with which we need to interact) or
|
||||||
|
[mmap][mmap].
|
||||||
|
- there are other subtleties more related to the way OCaml compiles. For
|
||||||
|
example, using bigarray layouts to manipulate "bigger words" can really have
|
||||||
|
an impact on performance, as [this PR][pr-utcp] on [utcp][utcp] shows.
|
||||||
|
- finally, it may be useful to store sensitive information in a bigarray so as
|
||||||
|
to have the opportunity to clean up this information as quickly as possible
|
||||||
|
(ensuring that the GC has not made a copy) in certain situations.
|
||||||
|
|
||||||
|
All these examples show that bigarrays can be of real interest as long as
|
||||||
|
**their uses are properly contextualized** - which ultimately remains very
|
||||||
|
specific. Our experience of using them in Mirage has shown us their advantages,
|
||||||
|
but also, and above all, their disadvantages:
|
||||||
|
- keep in mind that bigarray allocation uses either a system call like `mmap` or
|
||||||
|
`malloc()`. The latter, compared with what OCaml can offer, is slow. As soon
|
||||||
|
as you need to allocate bytes/strings smaller than
|
||||||
|
[`(256 * words)`][minor-alloc], these values are allocated in the minor heap,
|
||||||
|
which is incredibly fast to allocate (3 processor instructions which can be
|
||||||
|
predicted very well). So, preferring to allocate a 10-byte bigarray rather
|
||||||
|
than a 10-byte `bytes` penalizes you enormously.
|
||||||
|
- since the bigarray exists in the C heap, the GC has a special mechanism for
|
||||||
|
knowing when to `free()` the zone as soon as the value is no longer in use.
|
||||||
|
Reference-counting is used to then allocate "small" values in the OCaml heap
|
||||||
|
and use them to manipulate _indirectly_ the bigarray.
|
||||||
|
|
||||||
|
#### Ownership, proxy and GC
|
||||||
|
|
||||||
|
This last point deserves a little clarification, particularly with regard to the
|
||||||
|
`Bigarray.sub` function. This function will not create a new, smaller bigarray
|
||||||
|
and copy what was in the old one to the new one (as `Bytes.sub`/`String.sub`
|
||||||
|
does). In fact, OCaml will allocate a "proxy" of your bigarray that represents a
|
||||||
|
subfield. This is where _reference-counting_ comes in. This proxy value needs
|
||||||
|
the initial bigarray to be manipulated. So, as long as proxies exist, the GC
|
||||||
|
cannot `free()` the initial bigarray.
|
||||||
|
|
||||||
|
This poses several problems:
|
||||||
|
- the first is the allocation of these proxies. They can help us to manipulate
|
||||||
|
the initial bigarray in several places without copying it, but as time goes
|
||||||
|
by, these proxies could be very expensive
|
||||||
|
- the second is GC intervention. You still need to scan the bigarray, in a
|
||||||
|
particular way, to know whether or not to keep it. This particular scan, once
|
||||||
|
again in time immemorial, was not all that common.
|
||||||
|
- the third concerns bigarray ownership. Since we're talking about proxies, we
|
||||||
|
can imagine 2 competing tasks having access to the same bigarray.
|
||||||
|
|
||||||
|
As far as the first point is concerned, `Bigarray.sub` could still be "slow" for
|
||||||
|
small data since it was, _de facto_ (since a bigarray always has a finalizer -
|
||||||
|
don't forget reference counting!), allocated in the major heap. And, in truth,
|
||||||
|
this is perhaps the main reason for the existence of Cstruct! To have a "proxy"
|
||||||
|
to a bigarray allocated in the minor heap (and, be fast). But since
|
||||||
|
[Pierre Chambart's PR#92][bigarray-minor], the problem is no more.
|
||||||
|
|
||||||
|
The second point, on the other hand, is still topical, even if we can see that
|
||||||
|
[considerable efforts][better-bigarray-free] have been made. What we see every
|
||||||
|
day on our unikernels is [the pressure][gc-bigarray-pressure] that can be put on
|
||||||
|
the GC when it comes to bigarrays. Indeed, bigarrays use memory and making the C
|
||||||
|
heap cohabit with the OCaml heap inevitably comes at a cost. As far as
|
||||||
|
unikernels are concerned, which have a more limited memory than an OCaml
|
||||||
|
application, we reach this limit rather quickly and we therefore ask the GC to
|
||||||
|
work more specifically on our 10 or 20 byte bigarrays...
|
||||||
|
|
||||||
|
Finally, the third point can be the toughest. On several occasions, we've
|
||||||
|
noticed competing accesses on our bigarrays that we didn't want (for example,
|
||||||
|
`http-lwt-client` had [this problem][http-lwt-client-bug]). In our experience,
|
||||||
|
it's very difficult to observe and know that there is indeed an unauthorized
|
||||||
|
concurrent access changing the contents of our buffer. In this respect, the
|
||||||
|
question remains open as regards `Cstruct.t` and the possibility of encoding
|
||||||
|
ownership of a `Cstruct.t` in the type to prevent unauthorized access.
|
||||||
|
[This PR][cstruct-cap] is interesting to see all the discussions that have taken
|
||||||
|
place on this subject<sup>[2](#fn2)</sup>.
|
||||||
|
|
||||||
|
It should be noted that, with regard to the third point, the problem also
|
||||||
|
applies to bytes and the use of `Bytes.unsafe_to_string`!
|
||||||
|
|
||||||
|
### Conclusion about Cstruct
|
||||||
|
|
||||||
|
We hope we've been thorough enough in our experience with Cstruct. If we go back
|
||||||
|
to the initial definition of our `Cstruct.t` shown above and take all the
|
||||||
|
history into account, it becomes increasingly difficult to argue for a
|
||||||
|
**systematic** use of Cstruct in our unikernels. In fact, the question of
|
||||||
|
`Cstruct.t` versus bytes/string remains completely open.
|
||||||
|
|
||||||
|
It's worth noting that the original reasons for `Cstruct.t` are no longer really
|
||||||
|
relevant if we consider how OCaml has evolved. It should also be noted that this
|
||||||
|
systematic approach to using `Cstruct.t` rather than bytes/string has cost us.
|
||||||
|
|
||||||
|
This is not to say that `Cstruct.t` is obsolete. The library is very good and
|
||||||
|
offers an API where manipulating bytes to extract information such as a TCP/IP
|
||||||
|
packet remains more pleasant than directly using bytes (even if, here too,
|
||||||
|
[efforts][ocaml-getters] have been made).
|
||||||
|
|
||||||
|
As far as `ocaml-tar` is concerned, what really counts is the possibility for
|
||||||
|
other projects to use this library without requiring `Cstruct.t` - thus
|
||||||
|
facilitating its adoption. In other words, given the advantages/disadvantages of
|
||||||
|
`Cstruct.t`, we felt it would be a good idea to remove this dependency.
|
||||||
|
|
||||||
|
<hr />
|
||||||
|
|
||||||
|
<tag id="fn1">**1**</tag>: It should be noted that the benchmark also concerns
|
||||||
|
compression. In this case, we use `decompress`, which uses bigarrays. So there's
|
||||||
|
some copying involved (from bytes to bigarrays)! But despite this copying, it
|
||||||
|
seems that the change is worthwhile.
|
||||||
|
|
||||||
|
<tag id="fn2">**2**</tag>: It reminds me that we've been experimenting with
|
||||||
|
capabilities and using the type system to enforce certain characteristics. To
|
||||||
|
date, `Cstruct_cap` has not been used anywhere, which raises a real question
|
||||||
|
about the advantages/disadvantages in everyday use.
|
||||||
|
|
||||||
|
## Functors
|
||||||
|
|
||||||
|
This is perhaps the other point of the Mirage ecosystem that is also the subject
|
||||||
|
of debate. Functors! Before we talk about functors, we need to understand their
|
||||||
|
relevance in the context of Mirage.
|
||||||
|
|
||||||
|
Mirage transforms an application into an operating system. What's the difference
|
||||||
|
between a "normal" application and a unikernel? The "subsystem" with which you
|
||||||
|
interact. In this case, a normal application will interact with the host system,
|
||||||
|
whereas a unikernel will have to interact with the Solo5 _mini-system_.
|
||||||
|
|
||||||
|
What Mirage is trying to offer is the ability for an application to transform
|
||||||
|
itself into either without changing a thing! Mirage's aim is to **inject** the
|
||||||
|
subsystem into your application. In this case:
|
||||||
|
- inject `unix.cmxa` when you want a Mirage application to become a simple
|
||||||
|
executable
|
||||||
|
- inject [ocaml-solo5][ocaml-solo5] when you want to produce a unikernel
|
||||||
|
|
||||||
|
So we're not going to talk about the pros and cons of this approach here, but
|
||||||
|
consider this feature as one that requires us to use functors.
|
||||||
|
|
||||||
|
Indeed, what's the best way in OCaml to inject one implementation into another:
|
||||||
|
functors. There are definite advantages here too, but we're going to concentrate
|
||||||
|
on one in particular: the expressiveness of types at module level (which can be
|
||||||
|
used as arguments to our functors).
|
||||||
|
|
||||||
|
For example, did you know that OCaml has a dependent type system?
|
||||||
|
```ocaml
type 'a nat = Zero : zero nat | Succ : 'a nat -> 'a succ nat
and zero = |
and 'a succ = S

module type T = sig type t val v : t nat end
module type Rec = functor (T:T) -> T
module type Nat = functor (S:Rec) -> functor (Z:T) -> T

module Zero = functor (S:Rec) -> functor (Z:T) -> Z
module Succ = functor (N:Nat) -> functor (S:Rec) -> functor (Z:T) -> S(N(S)(Z))
module Add = functor (X:Nat) -> functor (Y:Nat) -> functor (S:Rec) -> functor (Z:T) -> X(S)(Y(S)(Z))

module One = Succ(Zero)
module Two_a = Add(One)(One)
module Two_b = Succ(One)

module Z : T with type t = zero = struct
  type t = zero
  let v = Zero
end

module S (T:T) : T with type t = T.t succ = struct
  type t = T.t succ
  let v = Succ T.v
end

module A = Two_a(S)(Z)
module B = Two_b(S)(Z)

type ('a, 'b) refl = Refl : ('a, 'a) refl

let _ : (A.t, B.t) refl = Refl (* 1+1 == succ 1 *)
```
|
||||||
|
|
||||||
|
The code is ... magical, but it shows that two differently constructed modules
(`Two_a` & `Two_b`) ultimately produce the same type, and OCaml is able to prove
this equality. Above all, the example shows just how powerful functors can be.
But it also shows just how difficult functors can be to understand and use.

In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
makes the code difficult to read and understand. It can be difficult to deduce
in your head the type that results from an application of functors, and the
constraints associated with it... (yes, I don't use `merlin`).

But back to our initial problem: injection! In truth, the functor is a
sledgehammer used to swat a fly in most cases. There are many other ways of
injecting what the system would be (and how to do a `read` or `write`) into an
implementation. The best example, as [@nojb pointed out][nojb-response], is of
course [ocaml-tls][ocaml-tls] - this answer also shows a contrast between the
functor approach (with [CoHTTP][cohttp] for example) and the "pure value-passing
interface" of `ocaml-tls`.

What's more, we've been trying to find other approaches for injecting the system
we want for several years now. We can already list several:
- the "value-passing" approach of `ocaml-tls`, of course, but also of
  `decompress`
- the passing of [a record][poor-man-functor] (a sort of mini-module whose
  types are less expressive, but which does the job - a poor man's functor, in
  short) which holds the functions that perform the system's operations
- [mimic][mimic] can be used to inject a module as an implementation of a
  flow/stream according to a resolution mechanism (DNS, `/etc/services`, etc.) -
  a little closer to the idea of _runtime-resolved implicit implementations_
- there are, of course, the variants (but if we go back to 2010, this solution
  wasn't so obvious) popularized by [ptime][ptime]/[mtime][mtime], `digestif` &
  [dune][dune-variants]
- and finally, [GADTs][decompress-lzo], which describe what the process should
  do, then let the user implement the `run` function according to the system.

In short, based on this list and the various experiments we've carried out on a
number of projects, we've decided to remove the functors from `ocaml-tar`! The
crucial question now is: which method to choose?

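To make the record-passing idea from the list above a little more concrete, here
is a minimal sketch. The `sys` record, its `read`/`write` fields and the `copy`
function are hypothetical names invented for this illustration (they come from
neither `ocaml-tar` nor `colombe`); the only point is that the system's
operations travel as an ordinary value rather than as a functor argument.

```ocaml
(* Hypothetical record of system operations: a "poor man's functor".
   ['i] is the input handle, ['o] the output handle. *)
type ('i, 'o) sys = {
  read : 'i -> bytes -> int -> int -> int; (* returns 0 at end of input *)
  write : 'o -> string -> unit;
}

(* System-agnostic logic: copy everything from [src] to [dst] and return
   the number of bytes copied. It only knows about the record. *)
let copy sys src dst =
  let buf = Bytes.create 4096 in
  let rec go total =
    match sys.read src buf 0 (Bytes.length buf) with
    | 0 -> total
    | n ->
      sys.write dst (Bytes.sub_string buf 0 n);
      go (total + n)
  in
  go 0

(* One possible injection: a purely in-memory "system". *)
type mem = { data : string; mutable pos : int }

let mem_sys : (mem, Buffer.t) sys = {
  read = (fun m buf off len ->
    let n = min len (String.length m.data - m.pos) in
    Bytes.blit_string m.data m.pos buf off n;
    m.pos <- m.pos + n;
    n);
  write = Buffer.add_string;
}

let copied, output =
  let out = Buffer.create 16 in
  let n = copy mem_sys { data = "hello tar"; pos = 0 } out in
  n, Buffer.contents out
```

A Unix flavour of the same record could be built from `Unix.read` and
`Unix.write_substring` without touching `copy`, which is exactly the kind of
injection a functor would otherwise provide.
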
### The best answers

There's no real answer to that, and in truth it depends on what level of
abstraction you're at. In fact, you'd like to have a fairly simple method of
abstraction from the system at the start and at the lowest level, to end up
proposing a functor that does all the _ceremony_ (the glue between your
implementation and the system) at the end - that's what [ocaml-git][ocaml-git]
does, for example.

The abstraction you choose also depends on how the process is going to work. As
far as streams/protocols are concerned, the `ocaml-tls`/`decompress` approach
still seems the best. But when it comes to introspecting a file/block-device, it
may be preferable to use a GADT that will force the user to implement an
arbitrary memory access rather than consume a sequence of bytes. In short, at
this stage, experience speaks for itself and, just as we were wrong about
functors, we won't be advising you to use this or that solution.

But based on our experience of `ocaml-tls` & `decompress` with LZO (which
requires arbitrary access to the content) and the way Tar works, we decided to
use a "value-passing" approach (to describe when we need to read/write) and a
GADT to describe calculations such as:
- iterating over the files/folders contained in a Tar document
- producing a Tar file according to a "dispenser" of inputs

```ocaml
val decode : decode_state -> string ->
  decode_state
  * [ `Read of int
    | `Skip of int
    | `Header of Header.t ] option
  * Header.Extended.t option
(** [decode state] returns a new state and what the user should do next:
    - [`Skip] skip bytes
    - [`Read] read bytes
    - [`Header hdr] do something according to the last header extracted
      (like stream-out the contents of a file). *)

type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t
```

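As an illustration of what "letting the user implement `run`" can look like,
here is a sketch of an interpreter for the GADT above that works on a plain
in-memory string. Everything outside the GADT itself is an assumption made for
the example and not `ocaml-tar`'s actual behaviour: the `mem` record, the
`` `Eof `` error, the `let*` helper, and the choice to treat `Seek` as a
relative skip.

```ocaml
(* The GADT from the text, repeated so that the sketch is self-contained. *)
type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t

let ( let* ) x f = Bind (x, f)

(* Hypothetical in-memory "system": an input string with a cursor and an
   output buffer. *)
type mem = { input : string; mutable pos : int; output : Buffer.t }

let rec run : type a. mem -> (a, [ `Eof ]) t -> (a, [ `Eof ]) result =
 fun m -> function
  | Return r -> r
  | Bind (x, f) ->
    (match run m x with
     | Ok v -> run m (f v)
     | Error e -> Error e)
  | Read n ->
    (* Read at most [n] bytes; may return fewer. *)
    let n = min n (String.length m.input - m.pos) in
    let s = String.sub m.input m.pos n in
    m.pos <- m.pos + n;
    Ok s
  | Really_read n ->
    (* Read exactly [n] bytes or fail. *)
    if String.length m.input - m.pos < n then Error `Eof
    else begin
      let s = String.sub m.input m.pos n in
      m.pos <- m.pos + n;
      Ok s
    end
  | Seek n ->
    (* Assumption: a relative skip, clamped to the end of the input. *)
    m.pos <- min (String.length m.input) (m.pos + n);
    Ok ()
  | Write s ->
    Buffer.add_string m.output s;
    Ok ()

(* A small description of a computation, independent of any system. *)
let program : (int, [ `Eof ]) t =
  let* header = Really_read 4 in
  let* () = Seek 1 in
  let* body = Read 100 in
  let* () = Write (header ^ "/" ^ body) in
  Return (Ok (String.length body))

let m = { input = "HEAD-payload"; pos = 0; output = Buffer.create 16 }
let result = run m program
```
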
However, and this is where we come back to OCaml's limitations and where
functors could help us: higher kinded polymorphism!

### Higher kinded Polymorphism

If we return to our functor example above, there's one element that may be of
interest: `T with type t = T.t succ`

In other words, it adds a constraint to a type of the signature. A constraint
often seen with Mirage (but deprecated now according to
[this issue][mirage-lwt]) is the type `io` and its constraint: `type 'a io`,
`with type 'a io = 'a Lwt.t`.

So we had this type in Tar. The problem is that our GADT can't understand that
sometimes it will have to manipulate _Lwt_ values, sometimes _Async_ or
sometimes _Eio_ (or _Miou_!). In other words: how do we compose our `Bind` with
the `Bind` of these three targets? The difficulty lies above all in history.
Supporting this library requires us to assume a certain compatibility with
applications over which we have no control. What's more, we need to maintain
support for all three libraries without imposing one.

<hr />

A small digression at this stage seems important to us, as we've been working
in this way for over 10 years. Of course, despite all the solutions mentioned
above, not depending on a system (and/or a scheduler) also allows us to ensure
the existence of libraries like Tar over more than a decade! The OCaml ecosystem
is changing, and choosing this or that library to facilitate the development of
an application has implications we might regret 10 years down the line (for
example... `Cstruct.t`!). So, it can be challenging to ensure compatibility with
all systems, but the result is libraries steeped in the experience and know-how
of many developers!

<hr />

So, and this is why we talk about Higher Kinded Polymorphism, how do we abstract
the `t` from `'a t` (to replace it with `Lwt.t` or even with a type such as
`type 'a t = 'a`)? This is where we're going to use the trick explained in
[this paper][hkt]. The trick is to consider a "new type" that will represent our
monad (lwt or async) and inject/project a value from this monad to something
understandable by our GADT: `High : ('a, 't) io -> ('a, 'err, 't) t`.

```ocaml
type ('a, 't) io

type ('a, 'err, 't) t =
  | Really_read : int -> (string, 'err, 't) t
  | Read : int -> (string, 'err, 't) t
  | Seek : int -> (unit, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Write : string -> (unit, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t
```

Next, we need to create this new type according to the chosen scheduler. Let's
take _Lwt_ as an example:

```ocaml
open Lwt.Infix (* for >>= and >|= *)

module Make (X : sig type 'a t end) = struct
  type t (* our new type *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

module L = Make(Lwt)

let rec run
  : type a err. (a, err, L.t) t -> (a, err) result Lwt.t
  = function
  | High v -> L.prj v >|= fun v -> Ok v (* wait for the Lwt value, then wrap it *)
  | Return v -> Lwt.return v
  | Bind (x, f) ->
    (run x >>= function
     | Ok value -> run (f value)
     | Error err -> Lwt.return (Error err)) (* stop at the first error *)
  | _ -> ... (* Really_read, Read, Seek and Write depend on the system *)
```

So, as you can see, it's a real trick that you shouldn't try at home
unsupervised. Indeed, the use of `%identity` corresponds to an `Obj.magic`! So
even if the `io` type is exposed (to let the user derive Tar for their own
system), this trick is not exposed to other packages, and we instead suggest
helpers such as:

```ocaml
val lwt : 'a Lwt.t -> ('a, 'err, lwt) t
val miou : 'a -> ('a, 'err, miou) t
```

But this way, Tar can always be derived for another system, and the process for
extracting entries from a Tar file is the same for **all** systems!

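To see the whole trick working end to end without depending on any scheduler,
here is a small self-contained sketch. It reuses the `io` type and `Make`
functor from above with a reduced GADT, and runs the same description under two
stand-in "schedulers": a direct-style one (`type 'a t = 'a`) and one based on
thunks. The modules `Id`, `Deferred`, `Direct` and `Thunk` and the function
`sum` are names invented for this illustration; they are assumptions and not
part of Tar's API.

```ocaml
(* The "brand" type from the text: ['t] names a scheduler. *)
type ('a, 't) io

(* A reduced version of the GADT, keeping only what the sketch needs. *)
type ('a, 'err, 't) t =
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t

module Make (X : sig type 'a t end) = struct
  type t
  external inj : 'a X.t -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a X.t = "%identity"
end

(* Stand-in scheduler 1: direct style, a value is just itself. *)
module Id = struct type 'a t = 'a end
module Direct = Make (Id)

(* Stand-in scheduler 2: a value is a deferred computation. *)
module Deferred = struct type 'a t = unit -> 'a end
module Thunk = Make (Deferred)

let rec run_direct : type a err. (a, err, Direct.t) t -> (a, err) result =
  function
  | Return r -> r
  | High v -> Ok (Direct.prj v)
  | Bind (x, f) ->
    (match run_direct x with
     | Ok v -> run_direct (f v)
     | Error e -> Error e)

let rec run_thunk
  : type a err. (a, err, Thunk.t) t -> unit -> (a, err) result
  = fun prog () ->
    match prog with
    | Return r -> r
    | High v -> Ok (Thunk.prj v ()) (* force the deferred value *)
    | Bind (x, f) ->
      (match run_thunk x () with
       | Ok v -> run_thunk (f v) ()
       | Error e -> Error e)

(* The same description, written once for any brand ['t]. *)
let sum (x : (int, 't) io) (y : (int, 't) io) : (int, 'err, 't) t =
  Bind (High x, fun a -> Bind (High y, fun b -> Return (Ok (a + b))))

let direct_result : (int, string) result =
  run_direct (sum (Direct.inj 20) (Direct.inj 22))

let thunk_result : (int, string) result =
  run_thunk (sum (Thunk.inj (fun () -> 20)) (Thunk.inj (fun () -> 22))) ()
```

Note that handing a `Thunk`-branded description to `run_direct` is rejected by
the type checker, since `Direct.t` and `Thunk.t` are distinct abstract types.
That is what keeps the `%identity` casts from being misused outside the module
that defines them.
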
## Conclusion

This Tar release isn't as impressive as this article, but it does sum up all the
work we've been able to do over the last few months and years. We hope that our
work is appreciated and that this article, which sets out all the thoughts we've
had (and still have), helps you to better understand our work!

[opam-mirror]: https://hannes.robur.coop/Posts/OpamMirror
[discuss-cstruct]: https://discuss.ocaml.org/t/buffered-io-bytes-vs-bigstring/8978/3
[parmap]: https://github.com/rdicosmo/parmap
[digestif]: https://github.com/mirage/digestif
[decompress]: https://github.com/mirage/decompress
[pr-utcp]: https://github.com/robur-coop/utcp/pull/29
[utcp]: https://github.com/robur-coop/utcp
[mmap]: https://ocaml.org/manual/5.2/api/Unix.html#1_Mappingfilesintomemory
[minor-alloc]: https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175
[bigarray-minor]: https://github.com/ocaml/ocaml/pull/92
[http-lwt-client-bug]: https://github.com/robur-coop/http-lwt-client/pull/16
[cstruct-cap]: https://github.com/mirage/ocaml-cstruct/pull/237
[gc-bigarray-pressure]: https://github.com/ocaml/ocaml/issues/7750
[better-bigarray-free]: https://github.com/ocaml/ocaml/pull/1738
[ocaml-getters]: https://github.com/ocaml/ocaml/pull/1864
[ocaml-solo5]: https://github.com/mirage/ocaml-solo5
[nojb-response]: https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure
[ocaml-tls]: https://github.com/mirleft/ocaml-tls
[cohttp]: https://github.com/mirage/ocaml-cohttp
[poor-man-functor]: https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16
[mimic]: https://github.com/dinosaure/mimic
[ptime]: https://github.com/dbuenzli/ptime
[mtime]: https://github.com/dbuenzli/mtime
[dune-variants]: https://github.com/ocaml/dune/pull/1207
[decompress-lzo]: https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175
[ocaml-git]: https://github.com/mirage/ocaml-git
[mirage-lwt]: https://github.com/mirage/mirage/issues/1004#issue-507517315
[hkt]: https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf