<ul class="tags-list"><li><a href="https://blog.robur.coop/tags.html#tag-OCaml">OCaml</a></li><li><a href="https://blog.robur.coop/tags.html#tag-Cstruct">Cstruct</a></li><li><a href="https://blog.robur.coop/tags.html#tag-functors">functors</a></li></ul><p>We are delighted to announce the new release of <code>ocaml-tar</code>. A small library for
<h3 id="cstructt-as-a-non-moveable-data"><a class="anchor" aria-hidden="true" href="#cstructt-as-a-non-moveable-data"></a><code>Cstruct.t</code> as non-moveable data</h3>
<li>there are other subtleties more related to the way OCaml compiles. For
example, using bigarray layouts to manipulate "bigger words" can really have
an impact on performance, as <a href="https://github.com/robur-coop/utcp/pull/29">this PR</a> on <a href="https://github.com/robur-coop/utcp">utcp</a> shows.</li>
<li>finally, it may be useful to store sensitive information in a bigarray so as
to have the opportunity to clean up this information as quickly as possible
(ensuring that the GC has not made a copy) in certain situations.</li>
</ul>
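<p>As a minimal sketch of the last point, a secret can be kept in a bigarray and
overwritten as soon as it is no longer needed. The names <code>secret_of_string</code> and
<code>wipe</code> are hypothetical helpers written for this illustration, not functions
from a library mentioned here; only the standard <code>Bigarray</code> module is used.</p>

```ocaml
(* Illustrative only: [secret], [secret_of_string] and [wipe] are hypothetical. *)
type secret =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Copy a string into a bigarray. The bigarray payload lives outside the OCaml
   heap, so the GC never moves or duplicates it. Note that the source string
   itself still lives in the OCaml heap: real code would read the secret
   directly into the bigarray. *)
let secret_of_string (s : string) : secret =
  let ba =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout (String.length s)
  in
  String.iteri (fun i c -> Bigarray.Array1.set ba i c) s;
  ba

(* Overwrite every byte of the secret in place. *)
let wipe (ba : secret) : unit = Bigarray.Array1.fill ba '\000'

let () =
  let key = secret_of_string "hunter2" in
  assert (Bigarray.Array1.get key 0 = 'h');
  wipe key;
  assert (Bigarray.Array1.get key 0 = '\000')
```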
<p>All these examples show that bigarrays can be of real interest as long as
<strong>their uses are properly contextualized</strong> - which ultimately remains very
specific. Our experience of using them in Mirage has shown us their advantages,
but also, and above all, their disadvantages:</p>
<ul>
<li>keep in mind that bigarray allocation goes through either a system call such as <code>mmap</code> or
the C allocator, <code>malloc()</code>. The latter, compared with what OCaml can offer, is slow. As long
as the bytes/strings you allocate are smaller than
<a href="https://github.com/ocaml/ocaml/blob/744006bfbfa045cc1ca442ff7b52c2650d2abe32/runtime/alloc.c#L175"><code>(256 * words)</code></a>, these values are allocated in the minor heap,
where allocation is incredibly fast (3 processor instructions which can be
predicted very well). So, preferring to allocate a 10-byte bigarray rather
than a 10-byte <code>bytes</code> penalizes you enormously.</li>
<li>since the bigarray lives in the C heap, the GC needs a special mechanism to
know when to <code>free()</code> the zone once the value is no longer in use.
Reference counting is used for this: "small" values are allocated in the OCaml heap
and used to manipulate the bigarray <em>indirectly</em>.</li>
<p>The code is ... magical, but it shows that two differently constructed modules
(<code>Two_a</code> and <code>Two_b</code>) ultimately produce the same type, and OCaml is able to prove
this equality. Above all, the example shows just how powerful functors can be.
But it also shows just how difficult functors can be to understand and use.</p>
<p>In fact, this is one of Mirage's biggest drawbacks: the overuse of functors
makes the code difficult to read and understand. It can be difficult to deduce
in your head the type that results from an application of functors, and the
constraints associated with it... (yes, I don't use <code>merlin</code>).</p>
<p>But back to our initial problem: injection! In truth, the functor is a
sledgehammer to crack a nut in most cases. There are many other ways of injecting
a description of the system (and of how to do a <code>read</code> or <code>write</code>) into an
implementation. The best example, as <a href="https://discuss.ocaml.org/t/best-practices-and-design-patterns-for-supporting-concurrent-io-in-libraries/15001/4?u=dinosaure">@nojb pointed out</a>, is of
course <a href="https://github.com/mirleft/ocaml-tls">ocaml-tls</a> - this answer also shows a contrast between the
functor approach (with <a href="https://github.com/mirage/ocaml-cohttp">CoHTTP</a> for example) and the "pure value-passing
interface" of <code>ocaml-tls</code>.</p>
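<p>To make the idea of a "pure value-passing interface" concrete, here is a
minimal sketch. It is not the <code>ocaml-tls</code> API: the toy protocol
(newline-terminated lines) and the names <code>state</code>, <code>empty</code> and
<code>feed</code> are assumptions made up for the illustration. The point is that the
library never reads or writes anything itself; the caller obtains the bytes
however it likes (Unix, Lwt, a unikernel...) and passes them in as values.</p>

```ocaml
(* Hypothetical toy protocol: newline-terminated lines.
   The decoder state is a plain immutable value. *)
type state = { pending : string }

let empty = { pending = "" }

(* [feed st chunk] consumes bytes obtained by the caller and returns the new
   state together with the complete lines found in the data seen so far.
   No I/O happens here. *)
let feed (st : state) (chunk : string) : state * string list =
  let data = st.pending ^ chunk in
  match List.rev (String.split_on_char '\n' data) with
  | [] -> ({ pending = "" }, []) (* unreachable: split never returns [] *)
  | last :: complete_rev -> ({ pending = last }, List.rev complete_rev)

let () =
  (* The caller decides where chunks come from; here they are literals. *)
  let st, lines = feed empty "ab\ncd" in
  assert (lines = [ "ab" ]);
  let _, lines = feed st "\n" in
  assert (lines = [ "cd" ])
```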
<p>What's more, we've been trying to find other approaches for injecting the system
we want for several years now. We can already list several:</p>
<ul>
<li><code>ocaml-tls</code>' "value-passing" approach, of course, but also <code>decompress</code></li>
<li>of course, there's the passing of <a href="https://github.com/mirage/colombe/blob/07cd4cf134168ecd841924ee7ddda1a9af8fbd5a/src/sigs.ml#L13-L16">a record</a> (a sort of
mini-module with fewer possibilities with types, but which does the job - a
poor man's functor, in short) that holds the functions performing the
system's operations</li>
<li><a href="https://github.com/dinosaure/mimic">mimic</a> can be used to inject a module as an implementation of a
flow/stream according to a resolution mechanism (DNS, <code>/etc/services</code>, etc.) -
a little closer to the idea of <em>runtime-resolved implicit implementations</em></li>
<li>there are, of course, the variants (but if we go back to 2010, this solution
wasn't so obvious) popularized by <a href="https://github.com/dbuenzli/ptime">ptime</a>/<a href="https://github.com/dbuenzli/mtime">mtime</a>, <code>digestif</code> &amp;
<li>and finally, <a href="https://github.com/mirage/decompress/blob/c8301ba674e037b682338958d6d0bb5c42fd720e/lib/lzo.ml#L164-L175">GADTs</a>, which describe what the process should
do, then let the user implement the <code>run</code> function according to the system.</li>
</ul>
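<p>The last item of the list can be sketched as follows. This is an
illustration under assumptions, not the code of <code>decompress</code> or
<code>ocaml-tar</code>: the constructors <code>Return</code>, <code>Read</code> and <code>Bind</code>, the
process <code>two_chunks</code> and the interpreter <code>run_on_string</code> are hypothetical
names. The GADT only <em>describes</em> what should happen; each system provides
its own <code>run</code>.</p>

```ocaml
(* Hypothetical description of a process, independent of any scheduler. *)
type _ t =
  | Return : 'a -> 'a t
  | Read : int -> string t (* ask the system for at most [n] bytes *)
  | Bind : 'a t * ('a -> 'b t) -> 'b t

let ( >>= ) m f = Bind (m, f)

(* A system-independent process: read two chunks and concatenate them. *)
let two_chunks : string t =
  Read 4 >>= fun a ->
  Read 4 >>= fun b ->
  Return (a ^ b)

(* One possible [run], backed by an in-memory string instead of a socket.
   An Lwt or Unix version would interpret [Read] with its own primitives. *)
let run_on_string (src : string) (p : 'a t) : 'a =
  let pos = ref 0 in
  let rec go : type a. a t -> a = function
    | Return v -> v
    | Read n ->
        let len = min n (String.length src - !pos) in
        let s = String.sub src !pos len in
        pos := !pos + len;
        s
    | Bind (m, f) -> go (f (go m))
  in
  go p

let () = assert (run_on_string "abcdefgh-tail" two_chunks = "abcdefgh")
```

<p>The process <code>two_chunks</code> never mentions a file descriptor or a
scheduler; only <code>run_on_string</code> knows where the bytes come from.</p>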
<p>In short, based on this list and the various experiments we've carried out on a
number of projects, we've decided to remove the functors from <code>ocaml-tar</code>! The
crucial question now is: which method to choose?</p>
<p>If we return to our functor example above, there's one element that may be of
interest: <code>T with type t = T.t succ</code></p>
<p>In other words, add a constraint on a type of a signature. A constraint often seen
with Mirage (but deprecated now according to <a href="https://github.com/mirage/mirage/issues/1004#issue-507517315">this issue</a>) is the
type <code>io</code> and its constraint: <code>type 'a io</code>, <code>with type 'a io = 'a Lwt.t</code>.</p>
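<p>A small self-contained sketch of such a constraint follows. To avoid
depending on Lwt, it uses the identity type <code>'a</code> in place of <code>'a Lwt.t</code>;
the signature <code>IO</code> and the module <code>Identity</code> are hypothetical names chosen
for the example.</p>

```ocaml
(* Hypothetical signature with an abstract ['a io], in the Mirage style. *)
module type IO = sig
  type 'a io

  val return : 'a -> 'a io
  val bind : 'a io -> ('a -> 'b io) -> 'b io
end

(* The constraint [with type 'a io = 'a] exposes the definition of [io].
   With Lwt, one would write [with type 'a io = 'a Lwt.t] instead. *)
module Identity : IO with type 'a io = 'a = struct
  type 'a io = 'a

  let return x = x
  let bind x f = f x
end

(* This only type-checks because the constraint reveals that
   [int Identity.io] is [int]; with a plain [: IO] annotation the
   result would be abstract and the [int] annotation would be rejected. *)
let forty_two : int =
  Identity.bind (Identity.return 41) (fun x -> Identity.return (x + 1))
```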
<p>So we had this type in Tar. The problem is that our GADT can't understand that
sometimes it will have to manipulate <em>Lwt</em> values, sometimes <em>Async</em> or
sometimes <em>Eio</em> (or <em>Miou</em>!). In other words: how do we compose our <code>Bind</code> with
the <code>Bind</code> of these three targets? The difficulty lies above all in history:
supporting this library requires us to assume a certain compatibility with
applications over which we have no control. What's more, we need to maintain
support for all three libraries without imposing one.</p>
<hr/>
<p>A small digression at this stage seems important to us, as we've been working
in this way for over 10 years. Of course, despite all the solutions mentioned
above, not depending on a system (and/or a scheduler) also allows us to ensure
the existence of libraries like Tar over more than a decade! The OCaml ecosystem
is changing, and choosing this or that library to facilitate the development of
an application has implications we might regret 10 years down the line (for
example... <code>Cstruct.t</code>!). So, it can be challenging to ensure compatibility with
all systems, but the result is libraries steeped in the experience and know-how
of many developers!</p>
<hr/>
<p>So, and this is why we talk about Higher Kinded Polymorphism, how do we abstract
the <code>t</code> from <code>'a t</code> (to replace it with <code>Lwt.t</code> or even with a type such as
<code>type 'a t = 'a</code>)? This is where we're going to use the trick explained in
<a href="https://www.cl.cam.ac.uk/~jdy22/papers/lightweight-higher-kinded-polymorphism.pdf">this paper</a>. The trick is to consider a "new type" that will represent our
monad (Lwt or Async) and inject/project a value from this monad to something