---
date: 2024-02-11
article.title: Cooperation and Lwt.pause
article.description:
  A digression about Lwt and Miou
tags:
  - OCaml
  - Scheduler
  - Community
  - Unikernel
  - Git
breaks: false
---

Here's a concrete example of the notion of availability and of the scheduler
used (in this case Lwt). As you may know, at Robur we have developed a
unikernel: [opam-mirror][opam-mirror]. It launches an HTTP service that can be
used as an opam overlay available from a Git repository (with
`opam repository add <name> <url>`).

The purpose of such a unikernel was to respond to a failure of the official
repository (which fortunately did not last long) and to offer decentralisation
of such a service. You can use https://opam.robur.coop!

It was also useful at the Mirage retreat, where we don't usually have a great
internet connection. Caching packages for our OCaml users on the local network
has benefited us in terms of our Internet bill, by allowing the OCaml users to
fetch opam packages over the local network instead of over the shared, metered
4G Internet connection.

Finally, it's a unikernel that I also use on my server for my software
[reproducibility service][reproducibility], in order to have an overlay for my
software like [Bob][bob].

In short, I advise you to use it; you can see how to install it
[here][installation]. (I think that in the context of a company, it can be
interesting to have such a unikernel available internally.)

However, this unikernel had a long-standing problem. We were already talking
about it at the Mirleft retreat: when we tried to fetch the repository from Git,
our HTTP server became unavailable for a fairly long time. Basically, we had to
wait ~10 min before the service offered by the unikernel was available.

## Availability

If you follow my [articles][miou-articles] about Miou, you know that from the
outset I have stressed the notion of availability as a requirement if we were
to make yet another scheduler for OCaml 5. We emphasised this notion because we
had quite a few problems with Lwt on this subject.

In this case, availability requires the scheduler to be able to observe system
events as often as possible. The problem is that Lwt doesn't really work this
way.

Indeed, Lwt offers a way to give the scheduler the opportunity to observe
system events (`Lwt.pause`), but it does not do so systematically. The only time
you really give the scheduler the opportunity to see whether you can read or
write is when you want to... read or write...

More generally, it is said that Lwt's **bind** does not _yield_. In other words,
you can chain any number of functions together (via the `>>=` operator), but
from Lwt's point of view, there is no opportunity to see if an event has
occurred. Lwt always tries to go as far down your chain as possible, until it:

- finishes your promise,
- or comes across an operation that requires a system event (read or write),
- or comes across an `Lwt.pause` (as a _yield_ point).

Lwt is rather sparse in adding cooperation points besides `Lwt.pause` and
read/write operations, in contrast with Async where the bind operator is a
cooperation point.
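
To make this concrete, here is a small sketch that assumes the `lwt` and
`lwt.unix` libraries. A plain bind on an already-resolved promise runs its
callback immediately. The operator `>>=!` is hypothetical and not part of Lwt;
it is only a rough imitation of the Async-like behaviour mentioned above, and
inserts an `Lwt.pause` before each continuation. With it, the callback only runs
once the scheduler resumes it.

```ocaml
(* Requires the lwt and lwt.unix libraries. *)
open Lwt.Infix

(* Hypothetical operator, NOT part of Lwt: a bind that yields to the
   scheduler (via Lwt.pause) before running its continuation. *)
let ( >>=! ) p f = p >>= fun x -> Lwt.pause () >>= fun () -> f x

let order = ref []
let note s = order := s :: !order

let () =
  (* Plain bind: the promise is already resolved, so the callback runs
     right away, before the next line is reached. *)
  let p1 = Lwt.return 1 >>= fun x -> note "plain bind callback"; Lwt.return x in
  note "after plain bind";
  (* Yielding bind: the callback is parked until the scheduler runs. *)
  let p2 = Lwt.return 2 >>=! fun x -> note "yielding bind callback"; Lwt.return x in
  note "after yielding bind";
  let x, y = Lwt_main.run (Lwt.both p1 p2) in
  assert (x = 1 && y = 2)

(* List.rev !order =
   ["plain bind callback"; "after plain bind";
    "after yielding bind"; "yielding bind callback"] *)
```

Each extra pause costs a trip through the Lwt main loop. This is the trade-off
between raw speed and availability that the rest of this article is about.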

### If there is no I/O, do not wrap in Lwt

This is (bad<sup>[1](#fn1)</sup>) advice I was once given: if a function doesn't
do I/O, there's no point in putting it in Lwt. At first glance, the idea may
seem sound. If you have a function that doesn't do I/O, whether it's in the Lwt
monad or not makes no difference to the way Lwt executes it. Once again, Lwt
goes as far as possible. So Lwt resolves both of the following functions in the
same way:

```ocaml
(* Merges two sorted arrays into a new sorted array. *)
let merge a b =
  let la = Array.length a and lb = Array.length b in
  let res = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j)) then begin
      res.(k) <- a.(!i); incr i
    end else begin
      res.(k) <- b.(!j); incr j
    end
  done;
  res

(* Plain merge sort, without Lwt. *)
let rec sort0 arr =
  if Array.length arr <= 1 then arr
  else
    let m = Array.length arr / 2 in
    let arr0 = sort0 (Array.sub arr 0 m) in
    let arr1 = sort0 (Array.sub arr m (Array.length arr - m)) in
    merge arr0 arr1

(* The same merge sort, in the Lwt monad. *)
let rec sort1 arr =
  let open Lwt.Infix in
  if Array.length arr <= 1 then Lwt.return arr
  else
    let m = Array.length arr / 2 in
    (* OCaml does not specify the evaluation order of arguments; the usual
       compilers evaluate them right to left, so the left half (second
       argument) is sorted first. *)
    Lwt.both
      (sort1 (Array.sub arr m (Array.length arr - m)))
      (sort1 (Array.sub arr 0 m))
    >|= fun (arr0, arr1) ->
    merge arr0 arr1
```

If we trace the execution of the two functions (for example, by displaying
`arr` each time), we see the same behaviour whether Lwt is used or not.
However, what is interesting in the Lwt code is the use of `both`, which
suggests that the processes run _at the same time_.

"At the same time" does not necessarily mean using several cores or running "in
parallel". It rather means that the right-hand side may get the opportunity to
be executed even if the left-hand side has not finished. In other words, the
two processes can run **concurrently**.

But in practice, this is not the case. Even though there is a potential point of
cooperation (the `>|=` operator), Lwt goes as far as possible. Since neither
branch ever suspends, each recursive call to `sort1` returns an already-resolved
promise, and the left part is completely finished before the right part is even
started:

```shell
$ ./a.out
sort0: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort0: [|3; 4; 2; 1; 7|]
sort0: [|3; 4|]
sort0: [|2; 1; 7|]
sort0: [|1; 7|]
sort0: [|5; 8; 9; 0; 6|]
sort0: [|5; 8|]
sort0: [|9; 0; 6|]
sort0: [|0; 6|]

sort1: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort1: [|3; 4; 2; 1; 7|]
sort1: [|3; 4|]
sort1: [|2; 1; 7|]
sort1: [|1; 7|]
sort1: [|5; 8; 9; 0; 6|]
sort1: [|5; 8|]
sort1: [|9; 0; 6|]
sort1: [|0; 6|]
```
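
For comparison, here is a minimal sketch of a hypothetical `sort2` that calls
`Lwt.pause` before working on each sub-array. It assumes the `lwt` and
`lwt.unix` libraries and uses a simple `merge` written for the example. Each
call to `sort2` now returns a pending promise, so `Lwt_main.run` gets to resume
the right half before the left half has been fully sorted. The trace is no
longer the depth-first order shown above.

```ocaml
(* Requires the lwt and lwt.unix libraries. *)
open Lwt.Infix

(* Merges two sorted arrays into a new sorted array. *)
let merge a b =
  let la = Array.length a and lb = Array.length b in
  let res = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j)) then begin
      res.(k) <- a.(!i); incr i
    end else begin
      res.(k) <- b.(!j); incr j
    end
  done;
  res

(* Records every sub-array in the order it is visited. *)
let trace = ref []

(* Hypothetical variant of sort1 that yields before each step. *)
let rec sort2 arr =
  Lwt.pause () >>= fun () ->
  trace := arr :: !trace;
  if Array.length arr <= 1 then Lwt.return arr
  else
    let m = Array.length arr / 2 in
    (* Sequential [let]s make the left half start first. *)
    let left = sort2 (Array.sub arr 0 m) in
    let right = sort2 (Array.sub arr m (Array.length arr - m)) in
    Lwt.both left right >|= fun (l, r) -> merge l r

let sorted = Lwt_main.run (sort2 [| 3; 4; 2; 1; 7; 5; 8; 9; 0; 6 |])
let visited = List.rev !trace
```

In `visited`, `[|5; 8; 9; 0; 6|]` now appears before `[|3; 4|]`. Each
`Lwt.pause` costs a trip through the main loop, so this version trades some
speed for the ability to interleave the two halves.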

<hr>

**<tag id="fn1">1</tag>**: However, if you are not interested in availability
and would like the scheduler to try to resolve your promises as quickly as
possible, this advice is clearly valid.

#### Performance

It should be noted, however, that Lwt has a cost. Even if the behaviour is the
same, the Lwt layer is not free. A quick benchmark shows that there is an
overhead:

```ocaml
(* Sorts the same array 1000 times with sort0, then with sort1. *)
let arr = [| 3; 4; 2; 1; 7; 5; 8; 9; 0; 6 |]

let () =
  let t0 = Unix.gettimeofday () in
  for _i = 1 to 1000 do ignore (sort0 arr) done;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort0 %fs\n%!" (t1 -. t0)

let () =
  let t0 = Unix.gettimeofday () in
  Lwt_main.run @@ begin
    let open Lwt.Infix in
    let rec go idx =
      if idx = 1000 then Lwt.return_unit
      else sort1 arr >>= fun _ -> go (succ idx)
    in
    go 0
  end;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort1 %fs\n%!" (t1 -. t0)
```

```sh
$ ./a.out
sort0 0.000264s
sort1 0.000676s
```

This is the fairly obvious argument for not using Lwt when there's no I/O.
Then, if the Lwt monad is really needed, a simple `Lwt.return` at the very last
moment is sufficient (or, better, `Lwt.map` / `>|=`).
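
As an illustration, here is a minimal sketch of wrapping a pure computation
only once. It assumes the `lwt` library, and `sort_pure`, `sort_lwt` and
`sort_fetched` are hypothetical names. The work is done outside Lwt. The result
is then either lifted with `Lwt.return`, or mapped over an existing promise with
`>|=`.

```ocaml
(* Requires the lwt library. *)
open Lwt.Infix

(* A pure function: no I/O, no Lwt. *)
let sort_pure arr =
  let copy = Array.copy arr in
  Array.sort compare copy;
  copy

(* Lift the result into Lwt only at the very end. *)
let sort_lwt arr = Lwt.return (sort_pure arr)

(* When the input already comes from a promise, map the pure function
   over it instead of wrapping every intermediate step. *)
let sort_fetched (fetch : unit -> int array Lwt.t) = fetch () >|= sort_pure

let p1 = sort_lwt [| 3; 1; 2 |]
let p2 = sort_fetched (fun () -> Lwt.return [| 9; 7; 8 |])
```

This avoids paying for Lwt on every recursive call, but it does nothing for
availability: the pure computation still runs in one go.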

#### Cooperation and concrete example

So `Lwt.both` is the function to use when we want to run two processes "at the
same time". For example, [ocaml-git][ocaml-git] attempts _both_ to retrieve a
repository and to analyse it, as can be seen in this snippet of
[code][ocaml-git-both].

In our example with ocaml-git, the problem "shouldn't" appear because, in this
case, both the left and the right side do I/O. The left side reads from a
socket, while the right side saves Git objects in your file system. So, in our
tests with `Git_unix`, we were able to see that the analysis (right-hand side)
was indeed executed and "interleaved" with the reception of objects from the
network.

### Composability

However, let's go back to our initial problem: our opam-mirror unikernel. As
you might expect, there is no standalone MirageOS file system (and many of our
unikernels don't need one). So, in the case of opam-mirror, we use the
in-memory implementation of ocaml-git: `Git_mem`.

`Git_mem` is different in that Git objects are simply stored in a `Hashtbl`.
There is no cooperation point when it comes to obtaining Git objects from this
`Hashtbl`. So let's return to our original advice:

> don't wrap code in Lwt if it doesn't do I/O.

And, of course, `Git_mem` doesn't do I/O. It does, however, have to work with
Lwt, since it offers the same Lwt interface as `Git_unix`. In this case,
`Git_mem` wraps its results in Lwt **as late as possible** (as explained above,
so as not to slow down our processes unnecessarily). This choice inevitably
means that the right-hand side can no longer offer cooperation points. And this
is where our problem begins: composition.

In fact, we had something like:

```ocaml
let clone socket git =
  let open Lwt.Infix in
  Lwt.both (receive_pack socket) (analyse_pack git) >>= fun ((), ()) ->
  Lwt.return_unit
```

However, our `analyse_pack` function comes from a functor argument representing
the Git backend, in other words `Git_unix` or `Git_mem`:

```ocaml
module Make (Git : Git.S) = struct
  open Lwt.Infix

  let clone socket git =
    Lwt.both (receive_pack socket) (Git.analyse_pack git) >>= fun ((), ()) ->
    Lwt.return_unit
end
```

Composability poses a problem here. Even if `Git_unix` and `Git_mem` offer the
same function (so both modules can be used), one of them will always leave
other services (such as an HTTP service) some room to run. The other offers an
Lwt function that goes as far as possible, to the point of making other services
unavailable.

Composing with one or the other therefore does not produce the same behavior.
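
The difference can be reproduced in a few lines. The following is a minimal
sketch that assumes the `lwt` and `lwt.unix` libraries. `STORE`, `Eager_store`,
`Cooperative_store`, `Make` and `run_scenario` are hypothetical names and are far
simpler than ocaml-git's real backends. Both stores satisfy the same signature,
but only the second one calls `Lwt.pause` after each write.

```ocaml
(* Requires the lwt and lwt.unix libraries. *)
open Lwt.Infix

module type STORE = sig
  val write : string -> string -> unit Lwt.t
end

let table : (string, string) Hashtbl.t = Hashtbl.create 16

(* Wraps the result in Lwt as late as possible: never yields. *)
module Eager_store : STORE = struct
  let write k v = Hashtbl.replace table k v; Lwt.return_unit
end

(* Same signature, but offers a cooperation point after each write. *)
module Cooperative_store : STORE = struct
  let write k v = Hashtbl.replace table k v; Lwt.pause ()
end

let events = ref []
let record e = events := e :: !events

module Make (Store : STORE) = struct
  (* A stand-in for analyse_pack: writes [n] objects one after another. *)
  let analyse n =
    let rec go i =
      if i = n then Lwt.return_unit
      else Store.write (string_of_int i) "object" >>= fun () ->
        record "write"; go (i + 1)
    in
    go 0
end

(* A stand-in for the HTTP service: it only wants one turn. *)
let service () = Lwt.pause () >|= fun () -> record "service"

let run_scenario (module Store : STORE) =
  let module M = Make (Store) in
  events := [];
  Lwt_main.run
    (let analysis = M.analyse 3 in
     let http = service () in
     Lwt.join [ analysis; http ]);
  List.rev !events

let eager = run_scenario (module Eager_store)
(* eager = ["write"; "write"; "write"; "service"] *)
let cooperative = run_scenario (module Cooperative_store)
```

With `Eager_store`, the service only runs once every write is done. With
`Cooperative_store`, it gets its turn before the analysis has finished, even
though `Make` itself did not change.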

#### Where to put `Lwt.pause`?

In this case, our `analyse_pack` reads from and writes to the Git store. As far
as `Git_mem` is concerned, we said that these reads and writes are just
accesses to a `Hashtbl`.

Thanks to [Hannes][hannes]' help, it took us an afternoon to work out where we
needed to add cooperation points in `Git_mem` so that `analyse_pack` could give
another service, such as HTTP, the opportunity to run. Basically, this series
of [commits][commits] shows where we needed to add `Lwt.pause`.

However, this points to a number of problems:

1) composability alone (by _functor_ or by value) does not guarantee that Lwt
   reacts in the same way;
2) subtly, you have to dig into the code to find the right places where to put
   `Lwt.pause` by hand;
3) in the end, Lwt has no mechanism for ensuring the availability of a service
   (this is something that must be taken into account by the implementer).

### In-depth knowledge of Lwt

I haven't mentioned another problem we encountered with [Armael][armael] when
implementing [multipart_form][multipart_form]. There, using a stream meant that
Lwt did not interleave the two processes, and a _bounded stream_ was required.
Again, even when it comes to I/O, Lwt always tries to go as far as possible in
one of the two branches of a `Lwt.both`.

This leads us to conclude that beyond the monad, Lwt has subtleties in its
behaviour which may differ from those of another scheduler such as Async (hence
the incompatibility between the two, which is not just a matter of the `'a t`
type).

### Digression on Miou

That's why we put so much emphasis on the notion of availability when it comes
to Miou: to avoid repeating the mistakes of the past. The choices made with
regard to this notion in particular have a major impact, and can be
unsatisfactory to the user in certain cases (for example, so-called pure
computations could take longer with Miou than with another scheduler).

In this sense, we have tried to constrain ourselves in the development of Miou
through the use of `Effect.Shallow`. It requires us to always re-attach our
handler (our scheduler) as soon as an effect is produced, unlike `Effect.Deep`,
which can re-use the same handler for several effects. In other words, and as
we've described here, **an effect yields**!
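
To give an idea of what this means, here is a toy round-robin scheduler built
on `Effect.Shallow`, using only the OCaml 5 standard library. It is a minimal
sketch, not Miou's implementation, and `Yield`, `step`, `run` and `run_queue`
are hypothetical names. A shallow handler only handles the first effect
performed by the computation it runs. So `step` must supply a handler again
each time it resumes a continuation, and every effect hands control back to the
scheduler.

```ocaml
(* OCaml 5 standard library only; a toy, not Miou. *)
module S = Effect.Shallow

type _ Effect.t += Yield : unit Effect.t

let yield () = Effect.perform Yield

(* Run queue of suspended tasks. *)
let run_queue : (unit -> unit) Queue.t = Queue.create ()

(* Resume [k] with [v]. A shallow handler only covers the next effect,
   so the handler is re-attached on every resumption. *)
let rec step : type c. (c, unit) S.continuation -> c -> unit =
 fun k v ->
  S.continue_with k v
    { S.retc = (fun () -> ());
      exnc = raise;
      effc =
        (fun (type e) (eff : e Effect.t) ->
          match eff with
          | Yield ->
              Some
                (fun (k : (e, unit) S.continuation) ->
                  (* Park the task at the back of the queue. *)
                  Queue.push (fun () -> step k ()) run_queue)
          | _ -> None) }

let run tasks =
  List.iter (fun f -> Queue.push (fun () -> step (S.fiber f) ()) run_queue) tasks;
  while not (Queue.is_empty run_queue) do
    (Queue.pop run_queue) ()
  done

let trace = ref []

(* A task that records its progress and yields after each step. *)
let task name () =
  for i = 1 to 2 do
    trace := Printf.sprintf "%s%d" name i :: !trace;
    yield ()
  done

let () = run [ task "a"; task "b" ]
(* List.rev !trace = ["a1"; "b1"; "a2"; "b2"] *)
```

Because every `yield` returns to the scheduler, the two tasks interleave by
construction, without anyone having to choose where to put a pause.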

## Conclusion

As far as opam-mirror is concerned, we now have a unikernel that remains
available even while it clones a Git repository and saves Git objects in
memory. At least, an HTTP service can co-exist with ocaml-git!

I hope we'll be able to use it at [the next retreat][retreat], which I invite
you to attend to talk more about Lwt, schedulers, Git and unikernels!

[opam-mirror]: https://git.robur.coop/robur/opam-mirror
[reproducibility]: https://blog.osau.re/articles/reproducible.html
[bob]: https://bob.osau.re/
[installation]: https://blog.osau.re/articles/reproducible.html
[ocaml-git]: https://github.com/mirage/ocaml-git
[ocaml-git-both]: https://github.com/mirage/ocaml-git/blob/a36c90404b149ab85f429439af8785bb1dde1bee/src/not-so-smart/smart_git.ml#L476-L481
[hannes]: https://hannes.robur.coop/
[armael]: https://cambium.inria.fr/~agueneau/
[multipart_form]: https://discuss.ocaml.org/t/ann-release-of-multipart-form-0-2-0/7704#memory-bound-implementation
[retreat]: https://retreat.mirage.io/
[commits]: https://github.com/mirage/ocaml-git/pull/631/files
[miou-articles]: https://blog.osau.re/tags/scheduler.html