<!doctype html>
|
||
|
<html lang="en">
|
||
|
<head>
|
||
|
<meta charset="utf-8">
|
||
|
<meta http-equiv="x-ua-compatible" content="ie=edge">
|
||
|
<meta name="viewport" content="width=device-width, initial-scale=1">
|
||
|
<title>
|
||
|
Robur's blog - The new Tar release, a retrospective
|
||
|
</title>
|
||
|
<meta name="description" content="A little retrospective to the new Tar release and changes">
|
||
|
<link type="text/css" rel="stylesheet" href="../css/hl.css">
|
||
|
<link type="text/css" rel="stylesheet" href="../css/style.css">
|
||
|
<script src="../js/hl.js"></script>
|
||
|
<link rel="alternate" type="application/rss+xml" href="../feed.xml" title="blog.robur.coop">
|
||
|
</head>
|
||
|
<body>
|
||
|
<header>
|
||
|
<h1>blog.robur.coop</h1>
|
||
|
<blockquote>
|
||
|
The <strong>Robur</strong> cooperative blog.
|
||
|
</blockquote>
|
||
|
</header>
|
||
|
<main><a href="/index.html">Back to index</a>
|
||
|
|
||
|
<article>
|
||
|
<h1>The new Tar release, a retrospective</h1>
|
||
|
<ul class="tags-list"><li><a href="/tags/ocaml.html">ocaml</a></li><li><a href="/tags/cstruct.html">cstruct</a></li><li><a href="/tags/functors.html">functors</a></li></ul><p>We are delighted to announce the new release of <code>ocaml-tar</code>, a small library for
|
||
|
reading and writing tar archives in OCaml. Since this is a major release, we'll
|
||
|
take the time in this article to explain the work that's been done by the
|
||
|
cooperative on this project.</p>
|
||
|
<p>Tar is an <strong>old</strong> project. Originally written by David Scott as part of Mirage,
|
||
|
this project is particularly interesting for building bridges between the tools
|
||
|
we can offer and what already exists. Tar is, in fact, widely used. So we're
|
||
|
both dealing with a format that's older than I am (though email has gotten me used to that)
|
||
|
and a project that's been around since... 2012 (over 10 years!).</p>
|
||
|
<p>But we intend to maintain and improve it, since we're using it for the
<a href="https://hannes.robur.coop/Posts/OpamMirror">opam-mirror</a> project among other things: this unikernel
serves an opam-repository "tarball" to opam when you run <code>opam update</code>.</p>
|
||
|
<h2><code>Cstruct.t</code> & bytes</h2>
|
||
|
<p>As some of you may have noticed, over the last few months we've begun a fairly
|
||
|
substantial change to the Mirage ecosystem, replacing the use of <code>Cstruct.t</code> in
|
||
|
key places with bytes/string.</p>
|
||
|
<p>This choice is based on 2 considerations:</p>
|
||
|
<ul>
|
||
|
<li>we came to realize that <code>Cstruct.t</code> could be very costly in terms of
|
||
|
performance</li>
|
||
|
<li><code>Cstruct.t</code> remains a "Mirage" structure; outside the Mirage ecosystem, the
|
||
|
use of <code>Cstruct.t</code> is not so "obvious".</li>
|
||
|
</ul>
|
||
|
<p>The pull-request is available here: <a href="https://github.com/mirage/ocaml-tar/pull/137">https://github.com/mirage/ocaml-tar/pull/137</a>.
The discussion is worth reading to discover common bugs (uninitialized
buffers, invalid accesses). There's also a small benchmark to support our initial
intuition<sup><a href="#fn1">1</a></sup>.</p>
|
||
|
<p>But this PR can also be an opportunity to understand the existence of
|
||
|
<code>Cstruct.t</code> in the Mirage ecosystem and the reasons for this historic choice.</p>
|
||
|
<h3><code>Cstruct.t</code> as non-movable data</h3>
|
||
|
<p>I've already <a href="https://discuss.ocaml.org/t/buffered-io-bytes-vs-bigstring/8978/3">made</a> a list of pros/cons when it comes to
|
||
|
bigarrays. Indeed, <code>Cstruct.t</code> is based on a bigarray:</p>
|
||
|
<pre><code class="language-ocaml">type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

type t =
  { buffer : buffer
  ; off : int
  ; len : int }
</code></pre>
|
||
|
<p>The experienced reader may rightly wonder why Cstruct.t is a bigarray with <code>off</code>
|
||
|
and <code>len</code>. First, we need to clarify what a bigarray is for OCaml.</p>
|
||
|
<p>A bigarray is a somewhat special value in OCaml. Its contents are allocated in the
C heap. In other words, they do not live in the heap managed by OCaml's garbage
collector, but exist outside it. The first (and very important) implication is
that the contents of a bigarray do not move (even if the GC tries to defragment
the memory). This feature has several advantages:</p>
|
||
|
<ul>
|
||
|
<li>in parallel programming, it can be very interesting to use a bigarray knowing
|
||
|
that, from the point of view of the 2 processes, the position of the bigarray
|
||
|
will never change - this is essentially what <a href="https://github.com/rdicosmo/parmap">parmap</a> does (before
|
||
|
OCaml 5).</li>
|
||
|
<li>for calculations such as checksum or hash, it can be interesting to use a
|
||
|
bigarray. The calculation would not be interrupted by the GC since the
|
||
|
bigarray does not move. The calculation can therefore be continued at the same
|
||
|
point, which can help the CPU to better predict the next stage of the
|
||
|
calculation. This is what <a href="https://github.com/mirage/digestif">digestif</a> offers and what
|
||
|
<a href="https://github.com/mirage/decompress">decompress</a> requires.</li>
|
||
|
<li>for one reason or another, particularly when interacting with something other
|
||
|
than OCaml, you need to offer a memory zone that cannot move. This is
|
||
|
particularly true for unikernels as Xen guests (where the <em>net device</em>
|
||
|
corresponds to a fixed memory zone with which we need to interact) or
|
||
|
<a href="https://ocaml.org/manual/5.2/api/Unix.html#1_Mappingfilesintomemory">mmap</a>.</li>
|
||
|
<li>there are other subtleties more related to the way OCaml compiles. For
|
||
|
example, using bigarray layouts to manipulate "bigger words" can really have
|
||
|
an impact on performance, as <a href="https://github.com/robur-coop/utcp/pull/29">this PR</a> on <a href="https://github.com/robur-coop/utcp">utcp</a> shows.</li>
|
||
|
<li>finally, it may be useful to store sensitive information in a bigarray so as
|
||
|
to have the opportunity to clean up this information as quickly as possible
|
||
|
(ensuring that the GC has not made a copy) in certain situations.</li>
|
||
|
</ul>
|
||
|
<p>All these examples show that bigarrays can be of real interest as long as
|
||
|
<strong>their uses are properly contextualized</strong> - which ultimately remains very
|
||
|
specific. Our experience of using them in Mirage has shown us their advantages,
|
||
|
but also, and above all, their disadvantages:</p>
|
||
|
<ul>
|
||
|
<li>keep in mind that bigarray allocation uses either a system call like <code>mmap</code> or
|
||
|
<code>malloc()</code>. The latter, compared with what OCaml can offer, is slow. As soon
|
||
|
as you need to allocate bytes/strings smaller than
|
||
|
<a href="https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175"><code>(256 * words)</code></a>, these values are allocated in the minor heap,
|
||
|
which is incredibly fast to allocate (3 processor instructions which can be
|
||
|
predicted very well). So, preferring to allocate a 10-byte bigarray rather
|
||
|
than a 10-byte <code>bytes</code> penalizes you enormously.</li>
|
||
|
<li>since the bigarray exists in the C heap, the GC needs a special mechanism to
know when to <code>free()</code> the zone once the value is no longer in use.
Reference counting is used for this: "small" values are allocated in the OCaml
heap and used to manipulate the bigarray <em>indirectly</em>.</li>
|
||
|
</ul>
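<p>The difference in allocation paths can be observed, at least roughly, from
OCaml itself. The following is a minimal sketch using only the standard library;
the names <code>before</code>, <code>small</code>, <code>after</code> and <code>big</code> are ours and are not part
of <code>ocaml-tar</code>.</p>
<pre><code class="language-ocaml">(* Count the words allocated in the minor heap around a small allocation. *)
let before = Gc.minor_words ()
let small = Sys.opaque_identity (Bytes.create 10)
let after = Gc.minor_words ()

(* The 10-byte [bytes] value was bump-allocated in the minor heap, so the
   counter has moved. *)
let allocated_words = after -. before

(* By contrast, only a small handle for this bigarray lives in the OCaml heap;
   its 10 bytes of content come from the C allocator. *)
let big = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 10
</code></pre>
<p>This does not measure time, it only shows where each value ends up. A real
comparison needs a benchmark such as the one linked from the pull-request.</p>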
|
||
|
<h4>Ownership, proxy and GC</h4>
|
||
|
<p>This last point deserves a little clarification, particularly with regard to the
<code>Bigarray.Array1.sub</code> function. This function will not create a new, smaller bigarray
and copy what was in the old one to the new one (as <code>Bytes.sub</code>/<code>String.sub</code>
do). Instead, OCaml allocates a "proxy" of your bigarray that represents a
sub-range of it. This is where <em>reference-counting</em> comes in. This proxy value needs
the initial bigarray in order to be manipulated. So, as long as proxies exist, the GC
cannot <code>free()</code> the initial bigarray.</p>
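<p>The sharing behaviour described above can be checked with a few lines of
standard-library code. This is only an illustrative sketch; the names <code>big</code>,
<code>view</code>, <code>buf</code> and <code>copy</code> are ours.</p>
<pre><code class="language-ocaml">(* A bigarray of 8 characters, all set to 'a'. *)
let big = Bigarray.Array1.create Bigarray.char Bigarray.c_layout 8
let () = Bigarray.Array1.fill big 'a'

(* [sub] does not copy: [view] is a proxy onto bytes 2..5 of [big]. *)
let view = Bigarray.Array1.sub big 2 4
let () = Bigarray.Array1.set view 0 'X'

(* The same operation on [bytes] copies the data. *)
let buf = Bytes.make 8 'a'
let copy = Bytes.sub buf 2 4
let () = Bytes.set copy 0 'X'
</code></pre>
<p>After these writes, <code>big</code> has changed at index 2 (the write went through the
proxy) while <code>buf</code> is untouched (the write went to a copy).</p>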
|
||
|
<p>This poses several problems:</p>
|
||
|
<ul>
|
||
|
<li>the first is the allocation of these proxies. They can help us to manipulate
|
||
|
the initial bigarray in several places without copying it, but as time goes
|
||
|
by, these proxies could be very expensive</li>
|
||
|
<li>the second is GC intervention. The GC still needs to scan the bigarray, in a
particular way, to know whether or not to keep it. Back in the day, this
particular scan was not all that common.</li>
|
||
|
<li>the third concerns bigarray ownership. Since we're talking about proxies, we
|
||
|
can imagine 2 competing tasks having access to the same bigarray.</li>
|
||
|
</ul>
|
||
|
<p>As far as the first point is concerned, <code>Bigarray.Array1.sub</code> could still be "slow" for
small data since the proxy was, <em>de facto</em>, allocated in the major heap (a bigarray
always has a finalizer - don't forget reference counting!). And, in truth,
this is perhaps the main reason for the existence of Cstruct: to have a "proxy"
to a bigarray that is allocated in the minor heap (and is therefore fast). But since
<a href="https://github.com/ocaml/ocaml/pull/92">Pierre Chambart's PR#92</a>, this is no longer a problem.</p>
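<p>To make the idea concrete, here is a minimal sketch of a Cstruct-like view,
assuming the record layout shown earlier. It is not the actual <code>Cstruct</code>
implementation: the type <code>view</code> and the functions <code>of_string</code>, <code>sub</code>, <code>get</code> and
<code>to_string</code> are hypothetical names chosen for this illustration.</p>
<pre><code class="language-ocaml">type buffer = (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* A view is an ordinary OCaml record pointing into a bigarray. *)
type view = { buffer : buffer; off : int; len : int }

let of_string s =
  let len = String.length s in
  let buffer = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  String.iteri (fun i c -> Bigarray.Array1.set buffer i c) s;
  { buffer; off = 0; len }

(* [sub] only allocates a new three-field record: no bigarray proxy is
   created and no byte is copied. *)
let sub t off len =
  if 0 > off || 0 > len || off + len > t.len then invalid_arg "sub";
  { t with off = t.off + off; len }

let get t i = Bigarray.Array1.get t.buffer (t.off + i)
let to_string t = String.init t.len (get t)

let whole = of_string "hello world"
let world = sub whole 6 5
</code></pre>
<p>Here <code>world</code> and <code>whole</code> share the very same underlying bigarray; only the
offset and length differ.</p>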
|
||
|
<p>The second point, on the other hand, is still topical, even if we can see that
|
||
|
<a href="https://github.com/ocaml/ocaml/pull/1738">considerable efforts</a> have been made. What we see every
|
||
|
day on our unikernels is <a href="https://github.com/ocaml/ocaml/issues/7750">the pressure</a> that can be put on
|
||
|
the GC when it comes to bigarrays. Indeed, bigarrays use memory and making the C
|
||
|
heap cohabit with the OCaml heap inevitably comes at a cost. As far as
|
||
|
unikernels are concerned, which have a more limited memory than an OCaml
|
||
|
application, we reach this limit rather quickly and we therefore ask the GC to
|
||
|
work more specifically on our 10 or 20 byte bigarrays...</p>
|
||
|
<p>Finally, the third point can be the toughest. On several occasions, we've
|
||
|
noticed competing accesses on our bigarrays that we didn't want (for example,
|
||
|
<code>http-lwt-client</code> had <a href="https://github.com/robur-coop/http-lwt-client/pull/16">this problem</a>). In our experience,
|
||
|
it's very difficult to observe and know that there is indeed an unauthorized
|
||
|
concurrent access changing the contents of our buffer. In this respect, the
|
||
|
question remains open as regards <code>Cstruct.t</code> and the possibility of encoding
|
||
|
ownership of a <code>Cstruct.t</code> in the type to prevent unauthorized access.
|
||
|
<a href="https://github.com/mirage/ocaml-cstruct/pull/237">This PR</a> is interesting to see all the discussions that have taken
|
||
|
place on this subject<sup><a href="#fn2">2</a></sup>.</p>
|
||
|
<p>It should be noted that, with regard to the third point, the problem also
|
||
|
applies to bytes and the use of <code>Bytes.unsafe_to_string</code>!</p>
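<p>A short sketch of that last remark, using only the standard library (the
names <code>buf</code>, <code>alias</code> and <code>copy</code> are ours): a string obtained with
<code>Bytes.unsafe_to_string</code> still shares its storage with the original buffer, so a
later write to the buffer is visible through the supposedly immutable string.</p>
<pre><code class="language-ocaml">let buf = Bytes.of_string "hello"

(* No copy: [alias] and [buf] are the same block of memory. *)
let alias = Bytes.unsafe_to_string buf

(* A copy: [copy] is independent of [buf]. *)
let copy = Bytes.to_string buf

(* Whoever still owns [buf] can change what [alias] contains. *)
let () = Bytes.set buf 0 'j'
</code></pre>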
|
||
|
<h3>Conclusion about Cstruct</h3>
|
||
|
<p>We hope we've been thorough enough in describing our experience with Cstruct. If we go back
|
||
|
to the initial definition of our <code>Cstruct.t</code> shown above and take all the
|
||
|
history into account, it becomes increasingly difficult to argue for a
|
||
|
<strong>systematic</strong> use of Cstruct in our unikernels. In fact, the question of
|
||
|
<code>Cstruct.t</code> versus bytes/string remains completely open.</p>
|
||
|
<p>It's worth noting that the original reasons for <code>Cstruct.t</code> are no longer really
|
||
|
relevant if we consider how OCaml has evolved. It should also be noted that this
|
||
|
systematic approach to using <code>Cstruct.t</code> rather than bytes/string has cost us.</p>
|
||
|
<p>This is not to say that <code>Cstruct.t</code> is obsolete. The library is very good and
|
||
|
offers an API where manipulating bytes to extract information such as a TCP/IP
|
||
|
packet remains more pleasant than directly using bytes (even if, here too,
|
||
|
<a href="https://github.com/ocaml/ocaml/pull/1864">efforts</a> have been made).</p>
|
||
|
<p>As far as <code>ocaml-tar</code> is concerned, what really counts is the possibility for
|
||
|
other projects to use this library without requiring <code>Cstruct.t</code> - thus
|
||
|
facilitating its adoption. In other words, given the advantages/disadvantages of
|
||
|
<code>Cstruct.t</code>, we felt it would be a good idea to remove this dependency.</p>
|
||
|
<hr />
|
||
|
<p><tag id="fn1"><strong>1</strong></tag>: It should be noted that the benchmark also concerns
|
||
|
compression. In this case, we use <code>decompress</code>, which uses bigarrays. So there's
|
||
|
some copying involved (from bytes to bigarrays)! But despite this copying, it
|
||
|
seems that the change is worthwhile.</p>
|
||
|
<p><tag id="fn2"><strong>2</strong></tag>: It reminds me that we've been experimenting with
|
||
|
capabilities and using the type system to enforce certain characteristics. To
|
||
|
date, <code>Cstruct_cap</code> has not been used anywhere, which raises a real question
|
||
|
about the advantages/disadvantages in everyday use.</p>
|
||
|
<h2>Functors</h2>
|
||
|
<p>This is perhaps the other point of the Mirage ecosystem that is also the subject
|
||
|
of debate. Functors! Before we talk about functors, we need to understand their
|
||
|
relevance in the context of Mirage.</p>
|
||
|
<p>Mirage transforms an application into an operating system. What's the difference
between a "normal" application and a unikernel? The "subsystem" with which you
interact. A normal application interacts with the host system,
whereas a unikernel has to interact with the Solo5 <em>mini-system</em>.</p>
|
||
|
<p>What Mirage is trying to offer is the ability for an application to transform
|
||
|
itself into either without changing a thing! Mirage's aim is to <strong>inject</strong> the
|
||
|
subsystem into your application. In this case:</p>
|
||
|
<ul>
|
||
|
<li>inject <code>unix.cmxa</code> when you want a Mirage application to become a simple
|
||
|
executable</li>
|
||
|
<li>inject <a href="https://github.com/mirage/ocaml-solo5">ocaml-solo5</a> when you want to produce a unikernel</li>
|
||
|
</ul>
|
||
|
<p>So we're not going to talk about the pros and cons of this approach here, but
|
||
|
consider this feature as one that requires us to use functors.</p>
|
||
|
<p>Indeed, what's the best way in OCaml to inject one implementation into another?
Functors. There are definite advantages here too, but we're going to concentrate
on one in particular: the expressiveness of types at module level (which can be
used as arguments to our functors).</p>
|
||
|
<p>For example, did you know that OCaml has a dependent type system?</p>
|
||
|
<pre><code class="language-ocaml">type 'a nat = Zero : zero nat | Succ : 'a nat -> 'a succ nat
and zero = |
and 'a succ = S

module type T = sig type t val v : t nat end
module type Rec = functor (T:T) -> T
module type Nat = functor (S:Rec) -> functor (Z:T) -> T

module Zero = functor (S:Rec) -> functor (Z:T) -> Z
module Succ = functor (N:Nat) -> functor (S:Rec) -> functor (Z:T) -> S(N(S)(Z))
module Add = functor (X:Nat) -> functor (Y:Nat) -> functor (S:Rec) -> functor (Z:T) -> X(S)(Y(S)(Z))

module One = Succ(Zero)
module Two_a = Add(One)(One)
module Two_b = Succ(One)

module Z : T with type t = zero = struct
  type t = zero
  let v = Zero
end

module S (T:T) : T with type t = T.t succ = struct
  type t = T.t succ
  let v = Succ T.v
end

module A = Two_a(S)(Z)
module B = Two_b(S)(Z)

type ('a, 'b) refl = Refl : ('a, 'a) refl

let _ : (A.t, B.t) refl = Refl (* 1+1 == succ 1 *)
</code></pre>
|
||
|
<p>The code is ... magical, but it shows that two differently constructed modules
|
||
|
(<code>Two_a</code> & <code>Two_b</code>) ultimately produce the same type, and OCaml is able to prove
|
||
|
this equality. Above all, the example shows just how powerful functors can be.
|
||
|
But it also shows just how difficult functors can be to understand and use.</p>
|
||
|
<p>In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
|
||
|
makes the code difficult to read and understand. It can be difficult to deduce
|
||
|
in your head the type that results from an application of functors, and the
|
||
|
constraints associated with it... (yes, I don't use <code>merlin</code>).</p>
|
||
|
<p>But back to our initial problem: injection! In truth, the functor is a
|
||
|
fly-killing sledgehammer in most cases. There are many other ways of injecting
|
||
|
what the system would be (and how to do a <code>read</code> or <code>write</code>) into an
|
||
|
implementation. The best example, as <a href="https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure">@nojb pointed out</a>, is of
|
||
|
course <a href="https://github.com/mirleft/ocaml-tls">ocaml-tls</a> - this answer also shows a contrast between the
|
||
|
functor approach (with <a href="https://github.com/mirage/ocaml-cohttp">CoHTTP</a> for example) and the "pure value-passing
|
||
|
interface" of <code>ocaml-tls</code>.</p>
|
||
|
<p>What's more, we've been trying to find other approaches for injecting the system
|
||
|
we want for several years now. We can already list several:</p>
|
||
|
<ul>
|
||
|
<li><code>ocaml-tls</code>' "value-passing" approach, of course, but also <code>decompress</code></li>
|
||
|
<li>of course, there's the passing of <a href="https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16">a record</a> (a sort of
|
||
|
mini-module with fewer possibilities with types, but which does the job - a
|
||
|
poor man's functor, in short) which would have the functions to perform the
|
||
|
system's operations</li>
|
||
|
<li><a href="https://github.com/dinosaure/mimic">mimic</a> can be used to inject a module as an implementation of a
|
||
|
flow/stream according to a resolution mechanism (DNS, <code>/etc/services</code>, etc.) -
|
||
|
a little closer to the idea of <em>runtime-resolved implicit implementations</em></li>
|
||
|
<li>there are, of course, the variants (but if we go back to 2010, this solution
|
||
|
wasn't so obvious) popularized by <a href="https://github.com/dbuenzli/ptime">ptime</a>/<a href="https://github.com/dbuenzli/mtime">mtime</a>, <code>digestif</code> &
|
||
|
<a href="https://github.com/ocaml/dune/pull/1207">dune</a></li>
|
||
|
<li>and finally, <a href="https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175">GADTs</a>, which describe what the process should
|
||
|
do, then let the user implement the <code>run</code> function according to the system.</li>
|
||
|
</ul>
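<p>As an illustration of the "record" approach from the list above, here is a
minimal sketch. It is not taken from <code>colombe</code> or <code>ocaml-tar</code>: the type <code>sys</code> and
the functions <code>copy</code> and <code>in_memory</code> are hypothetical names. The library code
(<code>copy</code>) only knows the record, and the caller decides what "the system" is.</p>
<pre><code class="language-ocaml">(* A record of system operations: a "poor man's functor". *)
type sys =
  { read : int -> string     (* at most [n] bytes, "" at end of input *)
  ; write : string -> unit }

(* Library code: it never mentions Unix, Lwt or Solo5. *)
let rec copy sys =
  match sys.read 4 with
  | "" -> ()
  | chunk -> sys.write chunk; copy sys

(* One possible "system": an in-memory input and an output buffer. *)
let in_memory input =
  let pos = ref 0 and out = Buffer.create 16 in
  let read n =
    let len = min n (String.length input - !pos) in
    let chunk = String.sub input !pos len in
    pos := !pos + len;
    chunk in
  ({ read; write = Buffer.add_string out }, out)

let copied =
  let sys, out = in_memory "hello world" in
  copy sys;
  Buffer.contents out
</code></pre>
<p>A Unix-backed or unikernel-backed record could be passed to the same <code>copy</code>
function without changing it.</p>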
|
||
|
<p>In short, based on this list and the various experiments we've carried out on a
|
||
|
number of projects, we've decided to remove the functors from <code>ocaml-tar</code>! The
|
||
|
crucial question now is: which method to choose?</p>
|
||
|
<h3>The best answers</h3>
|
||
|
<p>There's no real answer to that, and in truth it depends on what level of
|
||
|
abstraction you're at. In fact, you'd like to have a fairly simple method of
|
||
|
abstraction from the system at the start and at the lowest level, to end up
|
||
|
proposing a functor that does all the <em>ceremony</em> (the glue between your
|
||
|
implementation and the system) at the end - that's what <a href="https://github.com/mirage/ocaml-git">ocaml-git</a>
|
||
|
does, for example.</p>
|
||
|
<p>The abstraction you choose also depends on how the process is going to work. As
|
||
|
far as streams/protocols are concerned, the <code>ocaml-tls</code>/<code>decompress</code> approach
|
||
|
still seems the best. But when it comes to introspecting a file/block-device, it
|
||
|
may be preferable to use a GADT that will force the user to implement an
|
||
|
arbitrary memory access rather than consume a sequence of bytes. In short, at
|
||
|
this stage, experience speaks for itself and, just as we were wrong about
|
||
|
functors, we won't be advising you to use this or that solution.</p>
|
||
|
<p>But based on our experience of <code>ocaml-tls</code> & <code>decompress</code> with LZO (which
|
||
|
requires arbitrary access to the content) and the way Tar works, we decided to
|
||
|
use a "value-passing" approach (to describe when we need to read/write) and a
|
||
|
GADT to describe calculations such as:</p>
|
||
|
<ul>
|
||
|
<li>iterating over the files/folders contained in a Tar document</li>
|
||
|
<li>producing a Tar file according to a "dispenser" of inputs</li>
|
||
|
</ul>
|
||
|
<pre><code class="language-ocaml">val decode : decode_state -> string ->
  decode_state
  * [ `Read of int
    | `Skip of int
    | `Header of Header.t ] option
  * Header.Extended.t option
(** [decode state] returns a new state and what the user should do next:
    - [`Skip] skip bytes
    - [`Read] read bytes
    - [`Header hdr] do something according to the last header extracted
      (like stream-out the contents of a file). *)

type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t
</code></pre>
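<p>To give an idea of what "implementing <code>run</code>" means for such a GADT, here is a
sketch of an interpreter over an in-memory string. It is not the interpreter
shipped with <code>ocaml-tar</code>: <code>run_in_memory</code>, <code>prog</code> and the <code>let*</code> operator are
hypothetical names, <code>Seek</code> is assumed to skip forward relative to the current
position, and a short <code>Really_read</code> is assumed to raise <code>End_of_file</code>.</p>
<pre><code class="language-ocaml">type ('a, 'err) t =
  | Really_read : int -> (string, 'err) t
  | Read : int -> (string, 'err) t
  | Seek : int -> (unit, 'err) t
  | Bind : ('a, 'err) t * ('a -> ('b, 'err) t) -> ('b, 'err) t
  | Return : ('a, 'err) result -> ('a, 'err) t
  | Write : string -> (unit, 'err) t

let ( let* ) x f = Bind (x, f)

(* Interpret a description against [input]; writes go to a buffer. *)
let run_in_memory input prog =
  let pos = ref 0 and out = Buffer.create 16 in
  let read n =
    let len = max 0 (min n (String.length input - !pos)) in
    let s = String.sub input !pos len in
    pos := !pos + len;
    s in
  let rec go : type a err. (a, err) t -> (a, err) result = function
    | Read n -> Ok (read n)
    | Really_read n ->
      if n > String.length input - !pos then raise End_of_file
      else Ok (read n)
    | Seek n -> pos := !pos + n; Ok ()
    | Write s -> Buffer.add_string out s; Ok ()
    | Return r -> r
    | Bind (x, f) ->
      (match go x with
       | Ok v -> go (f v)
       | Error e -> Error e) in
  let result = go prog in
  (result, Buffer.contents out)

(* A description of a computation: nothing is read or written yet. *)
let prog : (int, string) t =
  let* header = Really_read 5 in
  let* () = Seek 1 in
  let* rest = Read 100 in
  let* () = Write (rest ^ " " ^ header) in
  Return (Ok (String.length rest))

let result, output = run_in_memory "hello world" prog
</code></pre>
<p>The description <code>prog</code> is independent of any system; only <code>run_in_memory</code>
knows how reads, seeks and writes are actually performed.</p>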
|
||
|
<p>However, and this is where we come back to OCaml's limitations and where
|
||
|
functors could help us: higher kinded polymorphism!</p>
|
||
|
<h3>Higher kinded Polymorphism</h3>
|
||
|
<p>If we return to our functor example above, there's one element that may be of
|
||
|
interest: <code>T with type t = T.t succ</code></p>
|
||
|
<p>In other words, it adds a constraint on a type of a signature. A constraint often seen
with Mirage (but deprecated now according to <a href="https://github.com/mirage/mirage/issues/1004#issue-507517315">this issue</a>) is the
type <code>io</code> and its constraint: <code>type 'a io</code>, <code>with type 'a io = 'a Lwt.t</code>.</p>
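<p>The effect of such a constraint can be seen in a small, self-contained
sketch. The signature <code>FLOW</code> and the modules <code>Opaque</code> and <code>Direct</code> are hypothetical
and do not come from Mirage; to avoid depending on Lwt, the constraint used here
is <code>with type 'a io = 'a</code> rather than <code>'a Lwt.t</code>.</p>
<pre><code class="language-ocaml">module type FLOW = sig
  type 'a io
  val return : 'a -> 'a io
  val read : unit -> string io
end

(* Without a constraint, ['a io] is abstract: callers cannot look inside. *)
module Opaque : FLOW = struct
  type 'a io = 'a
  let return x = x
  let read () = "data"
end

(* With the constraint, the equality ['a io = 'a] is visible from outside. *)
module Direct : FLOW with type 'a io = 'a = struct
  type 'a io = 'a
  let return x = x
  let read () = "data"
end

(* Accepted: [Direct.read ()] is known to be a plain [string]. *)
let length = String.length (Direct.read ())

(* [String.length (Opaque.read ())] would be rejected by the type checker. *)
</code></pre>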
|
||
|
<p>So we had this type in Tar. The problem is that our GADT can't understand that
sometimes it will have to manipulate <em>Lwt</em> values, sometimes <em>Async</em> or
sometimes <em>Eio</em> (or <em>Miou</em>!). In other words: how do we compose our <code>Bind</code> with
the <code>Bind</code> of these targets? The difficulty lies above all in history.
Supporting this library requires us to assume a certain compatibility with
applications over which we have no control. What's more, we need to maintain
support for all of these libraries without imposing one.</p>
|
||
|
<hr />
|
||
|
<p>A small digression at this stage seems important to us, as we've been working
|
||
|
in this way for over 10 years. Of course, despite all the solutions mentioned
|
||
|
above, not depending on a system (and/or a scheduler) also allows us to ensure
|
||
|
the existence of libraries like Tar over more than a decade! The OCaml ecosystem
|
||
|
is changing, and choosing this or that library to facilitate the development of
|
||
|
an application has implications we might regret 10 years down the line (for
|
||
|
example... <code>Cstruct.t</code>!). So, it can be challenging to ensure compatibility with
|
||
|
all systems, but the result is libraries steeped in the experience and know-how
|
||
|
of many developers!</p>
|
||
|
<hr />
|
||
|
<p>So, and this is why we talk about Higher Kinded Polymorphism, how do we abstract
|
||
|
the <code>t</code> from <code>'a t</code> (to replace it with <code>Lwt.t</code> or even with a type such as
|
||
|
<code>type 'a t = 'a</code>)? This is where we're going to use the trick explained in
|
||
|
<a href="https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf">this paper</a>. The trick is to consider a "new type" that will represent our
|
||
|
monad (lwt or async) and inject/project a value from this monad to something
|
||
|
understandable by our GADT: <code>High : ('a, 't) io -> ('a, 'err, 't) t</code>.</p>
|
||
|
<pre><code class="language-ocaml">type ('a, 't) io

type ('a, 'err, 't) t =
  | Really_read : int -> (string, 'err, 't) t
  | Read : int -> (string, 'err, 't) t
  | Seek : int -> (unit, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Write : string -> (unit, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t
</code></pre>
|
||
|
<p>Next, we need to create this new type according to the chosen scheduler. Let's
|
||
|
take <em>Lwt</em> as an example:</p>
|
||
|
<pre><code class="language-ocaml">module Make (X : sig type 'a t end) = struct
  type t (* our new type *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

module L = Make(Lwt)

open Lwt.Infix

let rec run
  : type a err. (a, err, L.t) t -> (a, err) result Lwt.t
  = function
  | High v -> L.prj v >|= fun v -> Ok v
  | Return v -> Lwt.return v
  | Bind (x, f) ->
    (* stop at the first error, otherwise continue with the value *)
    run x >>= (function
      | Ok value -> run (f value)
      | Error err -> Lwt.return (Error err))
  | _ -> ...
</code></pre>
|
||
|
<p>So, as you can see, this is a real trick that you shouldn't try at home
unsupervised. Indeed, the use of <code>%identity</code> corresponds to an <code>Obj.magic</code>! So even
if the <code>io</code> type is exposed (to let the user derive Tar for their own system),
this trick is not exposed to other packages, and we instead provide helpers
such as:</p>
|
||
|
<pre><code class="language-ocaml">val lwt : 'a Lwt.t -> ('a, 'err, lwt) t
val miou : 'a -> ('a, 'err, miou) t
</code></pre>
|
||
|
<p>But this way, Tar can always be derived from another system, and the process for
|
||
|
extracting entries from a Tar file is the same for <strong>all</strong> systems!</p>
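<p>As a rough illustration of deriving the same description for another
system, here is a self-contained sketch for a direct-style "scheduler" where
<code>type 'a t = 'a</code>. It reuses the shape of the definitions above, reduced to three
constructors for brevity; the module <code>Direct</code>, the helper <code>direct</code> and the value
<code>answer</code> are hypothetical names and are not part of the <code>ocaml-tar</code> API.</p>
<pre><code class="language-ocaml">type ('a, 't) io

type ('a, 'err, 't) t =
  | Return : ('a, 'err) result -> ('a, 'err, 't) t
  | Bind : ('a, 'err, 't) t * ('a -> ('b, 'err, 't) t) -> ('b, 'err, 't) t
  | High : ('a, 't) io -> ('a, 'err, 't) t

module Make (X : sig type 'a t end) = struct
  type t (* the "brand" standing for the scheduler *)
  type 'a s = 'a X.t

  external inj : 'a s -> ('a, t) io = "%identity"
  external prj : ('a, t) io -> 'a s = "%identity"
end

(* A direct-style system: a "promise" of ['a] is just an ['a]. *)
module Direct = Make (struct type 'a t = 'a end)

(* Helper hiding [inj], in the spirit of [lwt] and [miou]. *)
let direct : 'a -> ('a, 'err, Direct.t) t = fun v -> High (Direct.inj v)

let rec run : type a err. (a, err, Direct.t) t -> (a, err) result = function
  | High v -> Ok (Direct.prj v)
  | Return r -> r
  | Bind (x, f) ->
    (match run x with
     | Ok v -> run (f v)
     | Error e -> Error e)

let answer : (int, string) result =
  run (Bind (direct 20, fun x -> Bind (direct 22, fun y -> Return (Ok (x + y)))))
</code></pre>
<p>Only <code>run</code> and the helper change from one system to another; the description
built from <code>Bind</code>, <code>Return</code> and <code>High</code> stays the same.</p>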
|
||
|
<h2>Conclusion</h2>
|
||
|
<p>This Tar release isn't as impressive as this article, but it does sum up all the
|
||
|
work we've been able to do over the last few months and years. We hope that our
|
||
|
work is appreciated and that this article, which sets out all the thoughts we've
|
||
|
had (and still have), helps you to better understand our work!</p>
|
||
|
|
||
|
</article>
|
||
|
|
||
|
</main>
|
||
|
<footer>
|
||
|
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
|
||
|
<br />
|
||
|
</footer>
|
||
|
<script>hljs.highlightAll();</script>
|
||
|
</body>
|
||
|
</html>