---
date: 2024-08-15
article.title: The new Tar release, a retrospective
article.description: A little retrospective to the new Tar release and changes
tags:
  - OCaml
  - Cstruct
  - functors
author:
  name: Romain Calascibetta
  email: romain.calascibetta@gmail.com
  link: https://blog.osau.re
---
We are delighted to announce the new release of `ocaml-tar`, a small library for
reading and writing tar archives in OCaml. Since this is a major release, we'll
take the time in this article to explain the work that's been done by the
cooperative on this project.

Tar is an **old** project. Originally written by David Scott as part of Mirage,
this project is particularly interesting for building bridges between the tools
we can offer and what already exists. Tar is, in fact, widely used. So we're
dealing with both a format that's older than I am (though email has gotten me
used to that) and a project that's been around since... 2012 (over 10 years!).

But we intend to maintain and improve it, since we're using it for the
[opam-mirror][opam-mirror] project among other things: this unikernel provides
an opam-repository "tarball" to opam when you run `opam update`.

## `Cstruct.t` & bytes

As some of you may have noticed, over the last few months we've begun a fairly
substantial change to the Mirage ecosystem, replacing the use of `Cstruct.t` in
key places with bytes/string.

This choice is based on 2 considerations:
- we came to realize that `Cstruct.t` could be very costly in terms of
  performance
- `Cstruct.t` remains a "Mirage" structure; outside the Mirage ecosystem, the
  use of `Cstruct.t` is not so "obvious".

The pull-request is available here: https://github.com/mirage/ocaml-tar/pull/137.
The discussion is worth reading to discover common bugs (uninitialized
buffer, invalid access). There's also a small benchmark to support our initial
intuition<sup>[1](#fn1)</sup>.

But this PR can also be an opportunity to understand the existence of
`Cstruct.t` in the Mirage ecosystem and the reasons for this historical choice.

### `Cstruct.t` as non-moveable data

I've already [made][discuss-cstruct] a list of pros/cons when it comes to
bigarrays. Indeed, `Cstruct.t` is based on a bigarray:
```ocaml
type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t =
  { buffer : buffer
  ; off : int
  ; len : int }
```

The experienced reader may rightly wonder why `Cstruct.t` is a bigarray with `off`
and `len`. First, we need to clarify what a bigarray is for OCaml.

A bigarray is a somewhat special value in OCaml. Its contents are allocated in
the C heap. In other words, they do not live in the heap managed by OCaml's
garbage collector, but exist outside it. The first (and very important)
implication is that the contents of a bigarray do not move (even if the GC
tries to defragment the memory). This feature has several advantages:
- in parallel programming, it can be very useful to use a bigarray knowing
  that, from the point of view of the two processes, the position of the bigarray
  will never change. This is essentially what [parmap][parmap] does (before
  OCaml 5).
- for calculations such as a checksum or a hash, it can be useful to use a
  bigarray. The calculation would not be interrupted by the GC since the
  bigarray does not move. The calculation can therefore be continued at the same
  point, which can help the CPU to better predict the next stage of the
  calculation. This is what [digestif][digestif] offers and what
  [decompress][decompress] requires.
- for one reason or another, particularly when interacting with something other
  than OCaml, you need to offer a memory zone that cannot move. This is
  particularly true for unikernels as Xen guests (where the _net device_
  corresponds to a fixed memory zone with which we need to interact) or
  for [mmap][mmap].
- there are other subtleties more related to the way OCaml compiles. For
  example, using bigarray layouts to manipulate "bigger words" can really have
  an impact on performance, as [this PR][pr-utcp] on [utcp][utcp] shows.
- finally, it may be useful to store sensitive information in a bigarray so as
  to have the opportunity to clean up this information as quickly as possible
  (ensuring that the GC has not made a copy) in certain situations.

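To make the last point of this list concrete, here is a minimal sketch that only uses the standard `Bigarray` module. The names `secret`, `secret_of_string` and `wipe` are ours and purely illustrative. Since the payload lives outside the OCaml heap, the GC never duplicates it, so overwriting it in place erases the copy held by the bigarray (the `string` it was built from is, of course, a separate matter).

```ocaml
(* A secret kept in a bigarray: its payload is never moved or copied by the GC. *)
type secret =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical helper: copy a string into a freshly allocated bigarray. *)
let secret_of_string (s : string) : secret =
  let ba =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set ba i c) s;
  ba

(* Hypothetical helper: overwrite the payload in place with zero bytes. *)
let wipe (ba : secret) : unit = Bigarray.Array1.fill ba '\000'

let key = secret_of_string "hunter2"
let () = wipe key
```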
All these examples show that bigarrays can be of real interest as long as
**their uses are properly contextualized**, which ultimately remains very
specific. Our experience of using them in Mirage has shown us their advantages,
but also, and above all, their disadvantages:
- keep in mind that bigarray allocation uses either a system call like `mmap` or
  `malloc()`. The latter, compared with what OCaml can offer, is slow. By
  contrast, bytes/strings smaller than [`(256 * words)`][minor-alloc] are
  allocated in the minor heap, where allocation is incredibly fast (3 processor
  instructions which can be predicted very well). So, preferring to allocate a
  10-byte bigarray rather than a 10-byte `bytes` penalizes you enormously.
- since the bigarray exists in the C heap, the GC needs a special mechanism to
  know when to `free()` the zone once the value is no longer in use.
  Reference counting is used for this: "small" values are allocated in the OCaml
  heap and used to manipulate the bigarray _indirectly_.

#### Ownership, proxy and GC

This last point deserves a little clarification, particularly with regard to the
`Bigarray.sub` function (concretely, `Bigarray.Array1.sub` and its siblings).
This function will not create a new, smaller bigarray and copy what was in the
old one to the new one (as `Bytes.sub`/`String.sub` do). In fact, OCaml will
allocate a "proxy" of your bigarray that represents a sub-range of it. This is
where _reference-counting_ comes in. This proxy value needs the initial bigarray
in order to be manipulated. So, as long as proxies exist, the GC cannot `free()`
the initial bigarray.

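The difference between a proxy and a copy can be observed directly. The sketch below uses only the standard library; the two function names are ours. Writing through the result of `Bigarray.Array1.sub` changes the initial bigarray, whereas writing into the result of `Bytes.sub` leaves the original untouched.

```ocaml
(* [Bigarray.Array1.sub] returns a proxy onto the same memory: no copy. *)
let shared_after_sub () =
  let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 4 in
  Bigarray.Array1.fill ba 'a';
  let view = Bigarray.Array1.sub ba 1 2 in
  Bigarray.Array1.set view 0 'X';
  (* index 0 of the view is index 1 of the initial bigarray *)
  Bigarray.Array1.get ba 1

(* [Bytes.sub] allocates a new buffer and copies into it. *)
let copied_after_sub () =
  let b = Bytes.of_string "aaaa" in
  let copy = Bytes.sub b 1 2 in
  Bytes.set copy 0 'X';
  Bytes.to_string b
```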
This poses several problems:
- the first is the allocation of these proxies. They let us manipulate
  the initial bigarray in several places without copying it but, as time goes
  by, these proxies can become very expensive
- the second is GC intervention. The GC still needs to scan the bigarray, in a
  particular way, to know whether or not to keep it. This particular scan, once
  again back in the early days, was not all that common.
- the third concerns bigarray ownership. Since we're talking about proxies, we
  can imagine two concurrent tasks having access to the same bigarray.

As far as the first point is concerned, `Bigarray.sub` could still be "slow" for
small data since the proxy was, _de facto_, allocated in the major heap (a
bigarray always has a finalizer: don't forget reference counting!). And, in
truth, this is perhaps the main reason for the existence of Cstruct: to have a
"proxy" to a bigarray that is allocated in the minor heap (and is therefore
fast). But since [Pierre Chambart's PR#92][bigarray-minor], this is no longer a
problem.

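As an illustration of this "proxy in the minor heap" idea, here is a simplified stand-in for `Cstruct.t`. It is a sketch under our own assumptions, not the real Cstruct API: the names `view`, `create`, `sub`, `get` and `set` are ours. The point is that `sub` only allocates a three-field record, an ordinary small OCaml value without any finalizer, while the underlying bigarray is shared.

```ocaml
type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Same shape as [Cstruct.t]: a bigarray plus a window onto it. *)
type view = { buffer : buffer; off : int; len : int }

let create len =
  { buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len
  ; off = 0
  ; len }

(* Narrow the window: the bigarray itself is neither copied nor proxied. *)
let sub t off len =
  if off < 0 || len < 0 || off + len > t.len then invalid_arg "sub";
  { t with off = t.off + off; len }

let get t i =
  if i < 0 || i >= t.len then invalid_arg "get";
  Bigarray.Array1.get t.buffer (t.off + i)

let set t i c =
  if i < 0 || i >= t.len then invalid_arg "set";
  Bigarray.Array1.set t.buffer (t.off + i) c
```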
The second point, on the other hand, is still topical, even if we can see that
[considerable efforts][better-bigarray-free] have been made. What we see every
day on our unikernels is [the pressure][gc-bigarray-pressure] that bigarrays
can put on the GC. Indeed, bigarrays use memory, and making the C heap cohabit
with the OCaml heap inevitably comes at a cost. Unikernels have a more limited
memory than a regular OCaml application, so we reach this limit rather quickly
and therefore ask the GC to work harder, specifically on our 10 or 20 byte
bigarrays...

Finally, the third point can be the toughest. On several occasions, we've
noticed concurrent accesses on our bigarrays that we didn't want (for example,
`http-lwt-client` had [this problem][http-lwt-client-bug]). In our experience,
it's very difficult to observe and know that there is indeed an unauthorized
concurrent access changing the contents of our buffer. In this respect, the
question remains open as regards `Cstruct.t` and the possibility of encoding
ownership of a `Cstruct.t` in the type to prevent unauthorized access.
[This PR][cstruct-cap] is worth reading for all the discussions that have taken
place on this subject<sup>[2](#fn2)</sup>.

It should be noted that, with regard to the third point, the problem also
applies to bytes and the use of `Bytes.unsafe_to_string`!

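The same aliasing can be reproduced with the standard library alone. In this small sketch (function names are ours), the string obtained with `Bytes.unsafe_to_string` shares its memory with the buffer, so a later write to the buffer silently changes a value that the type system presents as immutable. `Bytes.to_string`, which copies, does not have this problem.

```ocaml
let aliased () =
  let buf = Bytes.of_string "hello" in
  (* No copy: [s] and [buf] are the same block of memory. *)
  let s = Bytes.unsafe_to_string buf in
  (* Someone who still holds [buf] writes into it... *)
  Bytes.set buf 0 'j';
  (* ...and the supposedly immutable string has changed. *)
  s

let copied () =
  let buf = Bytes.of_string "hello" in
  (* [Bytes.to_string] copies, so later writes are not visible. *)
  let s = Bytes.to_string buf in
  Bytes.set buf 0 'j';
  s
```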
### Conclusion about Cstruct

We hope we've been thorough enough in sharing our experience with Cstruct. If we
go back to the initial definition of our `Cstruct.t` shown above and take all the
history into account, it becomes increasingly difficult to argue for a
**systematic** use of Cstruct in our unikernels. In fact, the question of
`Cstruct.t` versus bytes/string remains completely open.

It's worth noting that the original reasons for `Cstruct.t` are no longer really
relevant if we consider how OCaml has evolved. It should also be noted that this
systematic approach to using `Cstruct.t` rather than bytes/string has cost us.

This is not to say that `Cstruct.t` is obsolete. The library is very good and
offers an API where manipulating bytes to extract information such as a TCP/IP
packet remains more pleasant than directly using bytes (even if, here too,
[efforts][ocaml-getters] have been made).

As far as `ocaml-tar` is concerned, what really counts is the possibility for
other projects to use this library without requiring `Cstruct.t`, thus
facilitating its adoption. In other words, given the advantages/disadvantages of
`Cstruct.t`, we felt it would be a good idea to remove this dependency.

<hr />

<tag id="fn1">**1**</tag>: It should be noted that the benchmark also concerns
compression. In this case, we use `decompress`, which uses bigarrays. So there's
some copying involved (from bytes to bigarrays)! But despite this copying, it
seems that the change is worthwhile.

<tag id="fn2">**2**</tag>: It reminds me that we've been experimenting with
capabilities and using the type system to enforce certain characteristics. To
date, `Cstruct_cap` has not been used anywhere, which raises a real question
about the advantages/disadvantages in everyday use.

## Functors

This is perhaps the other aspect of the Mirage ecosystem that is the subject
of debate. Functors! Before we talk about functors, we need to understand their
relevance in the context of Mirage.

Mirage transforms an application into an operating system. What's the difference
between a "normal" application and a unikernel? The "subsystem" with which you
interact. A normal application will interact with the host system,
whereas a unikernel will have to interact with the Solo5 _mini-system_.

What Mirage is trying to offer is the ability for an application to transform
itself into either without changing a thing! Mirage's aim is to **inject** the
subsystem into your application. In this case:
- inject `unix.cmxa` when you want a Mirage application to become a simple
  executable
- inject [ocaml-solo5][ocaml-solo5] when you want to produce a unikernel

We're not going to talk about the pros and cons of this approach here, but
simply consider it as a feature that requires us to use functors.

Indeed, what's the best way in OCaml to inject one implementation into another?
Functors! There are definite advantages here too, but we're going to concentrate
on one in particular: the expressiveness of types at module level (which can be
used as arguments to our functors).

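As a toy illustration of injection through a functor (this is not how Mirage itself is structured, and every name below is ours), the application is written against a signature and only receives a concrete "subsystem" when the functor is applied:

```ocaml
(* What the application needs from its "subsystem". *)
module type SYSTEM = sig
  val name : string
  val write : string -> unit
end

(* The application never mentions a concrete system. *)
module App (S : SYSTEM) = struct
  let greet () = S.write ("hello from " ^ S.name)
end

(* Two stand-in subsystems: one prints, the other records into a buffer. *)
module Stdout_system : SYSTEM = struct
  let name = "stdout"
  let write = print_endline
end

let log = Buffer.create 16

module Memory_system : SYSTEM = struct
  let name = "memory"
  let write s = Buffer.add_string log s
end

(* Injection happens here, at functor application. *)
module On_stdout = App (Stdout_system)
module In_memory = App (Memory_system)
```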
For example, did you know that OCaml has a dependent type system?
```ocaml
type 'a nat = Zero : zero nat | Succ : 'a nat -> 'a succ nat
and zero = |
and 'a succ = S

module type T = sig type t val v : t nat end
module type Rec = functor (T:T) -> T
module type Nat = functor (S:Rec) -> functor (Z:T) -> T

module Zero = functor (S:Rec) -> functor (Z:T) -> Z
module Succ = functor (N:Nat) -> functor (S:Rec) -> functor (Z:T) -> S(N(S)(Z))
module Add = functor (X:Nat) -> functor (Y:Nat) -> functor (S:Rec) -> functor (Z:T) -> X(S)(Y(S)(Z))

module One = Succ(Zero)
module Two_a = Add(One)(One)
module Two_b = Succ(One)

module Z : T with type t = zero = struct
  type t = zero
  let v = Zero
end

module S (T:T) : T with type t = T.t succ = struct
  type t = T.t succ
  let v = Succ T.v
end

module A = Two_a(S)(Z)
module B = Two_b(S)(Z)

type ('a, 'b) refl = Refl : ('a, 'a) refl

let _ : (A.t, B.t) refl = Refl (* 1+1 == succ 1 *)
```

The code is ... magical, but it shows that two differently constructed modules
(`Two_a` & `Two_b`) ultimately produce the same type, and OCaml is able to prove
this equality. Above all, the example shows just how powerful functors can be.
But it also shows just how difficult functors can be to understand and use.

In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
makes the code difficult to read and understand. It can be difficult to deduce
in your head the type that results from an application of functors, and the
constraints associated with it... (yes, I don't use `merlin`).

But back to our initial problem: injection! In truth, the functor is, in most
cases, a sledgehammer to kill a fly. There are many other ways of injecting
what the system would be (and how to do a `read` or `write`) into an
implementation. The best example, as [@nojb pointed out][nojb-response], is of
course [ocaml-tls][ocaml-tls]. His answer also shows the contrast between the
functor approach (with [CoHTTP][cohttp] for example) and the "pure value-passing
interface" of `ocaml-tls`.

What's more, we've been trying to find other approaches for injecting the system
we want for several years now. We can already list several:
- the "value-passing" approach of `ocaml-tls`, of course, but also of
  `decompress`
- the passing of [a record][poor-man-functor] (a sort of mini-module with fewer
  possibilities with types, but which does the job: a poor man's functor, in
  short) which holds the functions that perform the system's operations
- [mimic][mimic] can be used to inject a module as an implementation of a
  flow/stream according to a resolution mechanism (DNS, `/etc/services`, etc.),
  a little closer to the idea of _runtime-resolved implicit implementations_
- there are, of course, the variants (but if we go back to 2010, this solution
  wasn't so obvious) popularized by [ptime][ptime]/[mtime][mtime], `digestif` &
  [dune][dune-variants]
- and finally, [GADTs][decompress-lzo], which describe what the process should
  do, then let the user implement the `run` function according to the system.

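To give an idea of the second item of this list, here is a hedged sketch of a "poor man's functor". The `flow` record and the helpers `copy`, `of_string` and `to_buffer` are made up for this illustration (the record linked above is different): the system is passed as a plain value holding the functions the library needs.

```ocaml
(* The "system" as a plain record of functions. *)
type flow = {
  read : bytes -> int -> int -> int;
  (* [read buf off len] returns the number of bytes read, 0 at end of input *)
  write : string -> unit;
}

(* Library code: copy everything from [src] to [dst] through a small buffer. *)
let copy ~(src : flow) ~(dst : flow) : int =
  let buf = Bytes.create 4 in
  let rec go total =
    match src.read buf 0 (Bytes.length buf) with
    | 0 -> total
    | n ->
      dst.write (Bytes.sub_string buf 0 n);
      go (total + n)
  in
  go 0

(* One possible "system": an in-memory source... *)
let of_string (s : string) : flow =
  let pos = ref 0 in
  let read buf off len =
    let n = min len (String.length s - !pos) in
    Bytes.blit_string s !pos buf off n;
    pos := !pos + n;
    n
  in
  { read; write = (fun _ -> invalid_arg "read-only flow") }

(* ...and an in-memory sink. *)
let to_buffer (b : Buffer.t) : flow =
  { read = (fun _ _ _ -> 0); write = Buffer.add_string b }
```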
In short, based on this list and the various experiments we've carried out on a
number of projects, we've decided to remove the functors from `ocaml-tar`! The
crucial question now is: which method to choose?

### The best answers

There's no real answer to that, and in truth it depends on what level of
abstraction you're at. Ideally, you'd have a fairly simple method of
abstraction from the system at the start, at the lowest level, and end up
proposing a functor that does all the _ceremony_ (the glue between your
implementation and the system) at the end. That's what [ocaml-git][ocaml-git]
does, for example.

The abstraction you choose also depends on how the process is going to work. As
far as streams/protocols are concerned, the `ocaml-tls`/`decompress` approach
still seems the best. But when it comes to introspecting a file/block-device, it
may be preferable to use a GADT that will force the user to implement an
arbitrary memory access rather than consume a sequence of bytes. In short, at
this stage, experience speaks for itself and, just as we were wrong about
functors, we won't be advising you to use this or that solution.

But based on our experience of `ocaml-tls` & `decompress` with LZO (which
requires arbitrary access to the content) and the way Tar works, we decided to
use a "value-passing" approach (to describe when we need to read/write) and a
GADT to describe calculations such as:
- iterating over the files/folders contained in a Tar document
- producing a Tar file according to a "dispenser" of inputs

```ocaml
val decode : decode_state -> string ->
  decode_state
  * [ `Read of int
    | `Skip of int
    | `Header of Header.t ] option
  * Header.Extended.t option
(** [decode state] returns a new state and what the user should do next:
    - [`Skip] skip bytes
    - [`Read] read bytes
    - [`Header hdr] do something according to the last header extracted
      (like stream-out the contents of a file). *)

type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t
```
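To show how such a GADT separates the description of a computation from its execution, here is a toy interpreter over an in-memory string. It is only a sketch under our own assumptions, not the `run` function shipped with `ocaml-tar`: we assume that `Seek n` skips `n` bytes forward, that `Read` may return fewer bytes than requested, that sizes are non-negative, and that a short `Really_read` is reported as `` `Eof ``. The `error` type, `run`, `program` and the `let*` operator are ours.

```ocaml
type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t

(* Binding operator (OCaml >= 4.08) to chain steps of a description. *)
let ( let* ) x f = Bind (x, f)

type error = [ `Eof ]

(* Interpret a description against [input], sending writes to [output]. *)
let run ~(input : string) ~(output : Buffer.t) (t : ('a, error) t)
  : ('a, error) result =
  let pos = ref 0 in
  let take n =
    let s = String.sub input !pos n in
    pos := !pos + n;
    s
  in
  let rec go : type a. (a, error) t -> (a, error) result = function
    | Read n -> Ok (take (min n (String.length input - !pos)))
    | Really_read n ->
      if !pos + n > String.length input then Error `Eof else Ok (take n)
    | Seek n ->
      (* assumption: relative seek, clamped to the bounds of the input *)
      pos := max 0 (min (String.length input) (!pos + n));
      Ok ()
    | Write s -> Buffer.add_string output s; Ok ()
    | Return r -> r
    | Bind (x, f) ->
      (match go x with
       | Ok v -> go (f v)
       | Error e -> Error e)
  in
  go t

(* Read a 4-byte tag, skip 2 bytes, then copy the next 3 bytes to the output. *)
let program : (string, error) t =
  let* tag = Really_read 4 in
  let* () = Seek 2 in
  let* body = Really_read 3 in
  let* () = Write body in
  Return (Ok tag)
```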
However, this is where we come back to OCaml's limitations, and where
functors could help us: higher-kinded polymorphism!

### Higher-kinded polymorphism

If we return to our functor example above, there's one element that may be of
interest: `T with type t = T.t succ`

In other words, we add a constraint on a type of the signature. A constraint
often seen with Mirage (but deprecated now according to
[this issue][mirage-lwt]) concerns the type `io`: the signature declares
`type 'a io` and the constraint reads `with type 'a io = 'a Lwt.t`.

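Here is a minimal sketch of that pattern. To stay free of dependencies it uses a direct-style implementation (`with type 'a io = 'a`) where a Mirage library would have written `with type 'a io = 'a Lwt.t`. The names `READER`, `Direct` and `Count` are ours.

```ocaml
(* A signature that abstracts over the scheduler through [type 'a io]. *)
module type READER = sig
  type 'a io
  val return : 'a -> 'a io
  val bind : 'a io -> ('a -> 'b io) -> 'b io
  val read_line : unit -> string option io
end

(* The constraint [with type 'a io = 'a] exposes what [io] really is. *)
module Direct (I : sig val lines : string list end)
  : READER with type 'a io = 'a = struct
  type 'a io = 'a
  let return x = x
  let bind x f = f x
  let rest = ref I.lines
  let read_line () =
    match !rest with
    | [] -> None
    | l :: tl -> rest := tl; Some l
end

(* Code written against [READER] works for any [io]. *)
module Count (R : READER) = struct
  let rec lines n =
    R.bind (R.read_line ()) (function
      | None -> R.return n
      | Some _ -> lines (n + 1))
end

module D = Direct (struct let lines = [ "a"; "b"; "c" ] end)
module C = Count (D)

(* Thanks to the constraint, [int D.io] is known to be [int]. *)
let total : int = C.lines 0
```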
So we had this type in Tar. The problem is that our GADT can't know that it
will sometimes have to manipulate _Lwt_ values, sometimes _Async_ ones and
sometimes _Eio_ (or _Miou_!) ones. In other words: how do we compose our `Bind`
with the `Bind` of each of these targets? The difficulty lies above all in
history. Supporting this library requires us to assume a certain compatibility
with applications over which we have no control. What's more, we need to
maintain support for all of these libraries without imposing one.

<hr />

A small digression at this stage seems important to us, as we've been working
in this way for over 10 years. Of course, despite all the solutions mentioned
above, not depending on a system (and/or a scheduler) also allows us to ensure
the existence of libraries like Tar over more than a decade! The OCaml ecosystem
is changing, and choosing this or that library to facilitate the development of
an application has implications we might regret 10 years down the line (for
example... `Cstruct.t`!). So, it can be challenging to ensure compatibility with
all systems, but the result is libraries steeped in the experience and know-how
of many developers!

<hr />

So, and this is why we talk about higher-kinded polymorphism, how do we abstract
the `t` from `'a t` (to replace it with `Lwt.t` or even with a type such as
`type 'a t = 'a`)? This is where we're going to use the trick explained in
[this paper][hkt]. The trick is to consider a "new type" that will represent our
monad (lwt or async) and inject/project a value from this monad to something
understandable by our GADT: `High : ('a, 't) io -> ('a, 'err, 't) t`.

```ocaml
type ('a, 't) io

type ('a, 'err, 't) t =
  | Really_read : int -> (string, 'err, 't) t
  | Read : int -> (string, 'err, 't) t
  | Seek : int -> (unit, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Write : string -> (unit, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t
```

Next, we need to create this new type according to the chosen scheduler. Let's
take _Lwt_ as an example:

```ocaml
module Make (X : sig type 'a t end) = struct
  type t (* our new type *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

module L = Make(Lwt)

let rec run
  : type a err. (a, err, L.t) t -> (a, err) result Lwt.t
  = function
  (* [L.prj v] is an [a Lwt.t]: wrap its result into [Ok] *)
  | High v -> Lwt.map Result.ok (L.prj v)
  | Return v -> Lwt.return v
  | Bind (x, f) ->
    (* the bind of [Lwt_result] stops at the first [Error] *)
    let open Lwt_result.Infix in
    run x >>= fun value -> run (f value)
  | _ -> ...
```
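Since the snippet above needs Lwt and elides some cases, here is a self-contained variant of the same trick that can be compiled with the standard library only. It is a sketch under our own assumptions: the GADT is reduced to three constructors, and the "scheduler" is a stand-in module `Thunk` (a computation is a `unit -> 'a` function) playing the role of Lwt. The names `Thunk`, `H`, `thunk` and `program` are ours.

```ocaml
type ('a, 't) io

type ('a, 'err, 't) t =
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t

module Make (X : sig type 'a t end) = struct
  type t (* the "brand" standing for X.t *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

(* Stand-in scheduler: a computation is a thunk. *)
module Thunk = struct
  type 'a t = unit -> 'a
  let return x () = x
  let bind x f () = f (x ()) ()
end

module H = Make (Thunk)

(* Helper hiding [inj], in the spirit of the [lwt] helper of the article. *)
let thunk (v : 'a Thunk.t) : ('a, 'err, H.t) t = High (H.inj v)

let rec run : type a err. (a, err, H.t) t -> (a, err) result Thunk.t = function
  | High v -> Thunk.bind (H.prj v) (fun x -> Thunk.return (Ok x))
  | Return r -> Thunk.return r
  | Bind (x, f) ->
    Thunk.bind (run x) (function
      | Ok v -> run (f v)
      | Error e -> Thunk.return (Error e))

let program : (int, string, H.t) t =
  Bind (thunk (fun () -> 20), fun x ->
  Bind (thunk (fun () -> 22), fun y ->
  Return (Ok (x + y))))
```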
So, as you can see, this is a real trick, not to be tried at home
unaccompanied. Indeed, the use of `%identity` corresponds to an `Obj.magic`! So
even if the `io` type is exposed (to let the user derive Tar for their own
system), this trick is not exposed to other packages, and we instead suggest
helpers such as:

```ocaml
val lwt : 'a Lwt.t -> ('a, 'err, lwt) t
val miou : 'a -> ('a, 'err, miou) t
```

This way, Tar can always be derived for another system, and the process for
extracting entries from a Tar file is the same for **all** systems!

## Conclusion

This Tar release isn't as impressive as this article, but it does sum up all the
work we've been able to do over the last few months and years. We hope that our
work is appreciated and that this article, which sets out all the thoughts we've
had (and still have), helps you to better understand our work!

[opam-mirror]: https://hannes.robur.coop/Posts/OpamMirror
[discuss-cstruct]: https://discuss.ocaml.org/t/buffered-io-bytes-vs-bigstring/8978/3
[parmap]: https://github.com/rdicosmo/parmap
[digestif]: https://github.com/mirage/digestif
[decompress]: https://github.com/mirage/decompress
[pr-utcp]: https://github.com/robur-coop/utcp/pull/29
[utcp]: https://github.com/robur-coop/utcp
[mmap]: https://ocaml.org/manual/5.2/api/Unix.html#1_Mappingfilesintomemory
[minor-alloc]: https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175
[bigarray-minor]: https://github.com/ocaml/ocaml/pull/92
[http-lwt-client-bug]: https://github.com/robur-coop/http-lwt-client/pull/16
[cstruct-cap]: https://github.com/mirage/ocaml-cstruct/pull/237
[gc-bigarray-pressure]: https://github.com/ocaml/ocaml/issues/7750
[better-bigarray-free]: https://github.com/ocaml/ocaml/pull/1738
[ocaml-getters]: https://github.com/ocaml/ocaml/pull/1864
[ocaml-solo5]: https://github.com/mirage/ocaml-solo5
[nojb-response]: https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure
[ocaml-tls]: https://github.com/mirleft/ocaml-tls
[cohttp]: https://github.com/mirage/ocaml-cohttp
[poor-man-functor]: https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16
[mimic]: https://github.com/dinosaure/mimic
[ptime]: https://github.com/dbuenzli/ptime
[mtime]: https://github.com/dbuenzli/mtime
[dune-variants]: https://github.com/ocaml/dune/pull/1207
[decompress-lzo]: https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175
[ocaml-git]: https://github.com/mirage/ocaml-git
[mirage-lwt]: https://github.com/mirage/mirage/issues/1004#issue-507517315
[hkt]: https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf