From d1e411bf7e3e13f5a9f8c535dc86f07830881690 Mon Sep 17 00:00:00 2001
From: Calascibetta Romain
Date: Wed, 21 Feb 2024 10:44:53 +0100
Subject: [PATCH] Add the article about lwt pause

---
 articles/lwt_pause.md | 306 ++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 306 insertions(+)
 create mode 100644 articles/lwt_pause.md

diff --git a/articles/lwt_pause.md b/articles/lwt_pause.md
new file mode 100644
index 0000000..177bd15
--- /dev/null
+++ b/articles/lwt_pause.md
@@ -0,0 +1,306 @@
+---
+date: 2024-02-11
+article.title: Cooperation and Lwt.pause
+article.description:
+  A digression about Lwt and Miou
+tags:
+  - OCaml
+  - Scheduler
+  - Community
+  - Unikernel
+  - Git
+breaks: false
+---
+
+Here's a concrete example of the notion of availability and the scheduler used
+(in this case Lwt). As you may know, at Robur we have developed a unikernel:
+[opam-mirror][opam-mirror]. It launches an HTTP service that can be used as an
+OPAM overlay available from a Git repository (with `opam repository add
+`).
+
+The purpose of this unikernel was to respond to a failure of the official
+repository (which fortunately did not last long) and to offer decentralisation
+of such a service. You can use https://opam.robur.coop!
+
+It was also useful at the Mirage retreat, where we don't usually have a
+great internet connection. Caching packages for our OCaml users on the local
+network has benefited us in terms of our Internet bill by allowing the OCaml
+users to fetch opam packages over the local network instead of over the shared,
+metered 4G Internet connection.
+
+Finally, it's a unikernel that I also use on my server for my software
+[reproducibility service][reproducibility] in order to have an overlay for my
+software like [Bob][bob].
+
+In short, I advise you to use it; you can see its installation
+[here][installation] (I think that in the context of a company, internally, it
+can be interesting to have such a unikernel available).
+
+However, this unikernel had a long-standing problem. We were already talking
+about it at the Mirleft retreat: when we tried to get the repository from Git,
+our HTTP server was unavailable for a (fairly long) while. Basically, we had to
+wait ~10 min before the service offered by the unikernel was available.
+
+## Availability
+
+If you follow my [articles][miou-articles] about Miou, you know that from the
+outset I have talked about the notion of availability as a requirement if we
+were to make yet another new scheduler for OCaml 5. We emphasised this notion
+because we had quite a few problems on this subject with Lwt.
+
+In this case, the notion of availability requires the scheduler to be able to
+observe system events as often as possible. The problem is that Lwt doesn't
+really offer this approach.
+
+Indeed, Lwt offers a way of observing system events (`Lwt.pause`) but does not
+do so systematically. The only time you really give the scheduler the
+opportunity to see whether you can read or write is when you want to...
+read or write...
+
+More generally, it is said that Lwt's **bind** does not _yield_. In other words,
+you can chain any number of functions together (via the `>>=` operator), but
+from Lwt's point of view, there is no opportunity to see if an event has
+occurred. Lwt always tries to go as far down your chain as possible, until it
+either:
+- finishes your promise
+- or comes across an operation that requires a system event (read or write)
+- or comes across a `Lwt.pause` (as a _yield_ point)
+
+Lwt is rather sparse in adding cooperation points besides `Lwt.pause` and
+read/write operations, in contrast with Async where the bind operator is a
+cooperation point.
+
+### If there is no I/O, do not wrap in Lwt
+
+It was (bad[1](#fn1)) advice I was given. If a function doesn't do
+I/O, there's no point in putting it in Lwt. At first glance, however, the idea
+may be a good one.
+If you have a function that doesn't do I/O, whether it's in
+the Lwt monad or not won't make any difference to the way Lwt tries to execute
+it. Once again, Lwt tries to go as far as possible. So Lwt executes both
+functions in the same way:
+
+```ocaml
+(* Only the signature of [merge] is shown: it merges two sorted arrays. *)
+val merge : int array -> int array -> int array
+
+(* Plain merge sort, without Lwt. *)
+let rec sort0 arr =
+  if Array.length arr <= 1 then arr
+  else
+    let m = Array.length arr / 2 in
+    let arr0 = sort0 (Array.sub arr 0 m) in
+    let arr1 = sort0 (Array.sub arr m (Array.length arr - m)) in
+    merge arr0 arr1
+
+(* The same merge sort, wrapped in Lwt and using [Lwt.both]. *)
+let rec sort1 arr =
+  let open Lwt.Infix in
+  if Array.length arr <= 1 then Lwt.return arr
+  else
+    let m = Array.length arr / 2 in
+    Lwt.both
+      (sort1 (Array.sub arr m (Array.length arr - m)))
+      (sort1 (Array.sub arr 0 m))
+    >|= fun (arr0, arr1) ->
+    merge arr0 arr1
+```
+
+If we trace the execution of the two functions (for example, by displaying our
+`arr` each time), we see the same behaviour whether Lwt is used or not. However,
+what is interesting in the Lwt code is the use of `both`, which suggests that
+the processes are running _at the same time_.
+
+"At the same time" does not necessarily suggest the use of several cores or "in
+parallel", but the possibility that the right-hand side may also have the
+opportunity to be executed even if the left-hand side has not finished. In other
+words, that the two processes can run **concurrently**.
+
+But factually, this is not the case, because even if we had the possibility of
+a point of cooperation (with the `>|=` operator), Lwt tries to go as far as
+possible and decides to finish the left part before launching the right part:
+
+```shell
+$ ./a.out
+sort0: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
+sort0: [|3; 4; 2; 1; 7|]
+sort0: [|3; 4|]
+sort0: [|2; 1; 7|]
+sort0: [|1; 7|]
+sort0: [|5; 8; 9; 0; 6|]
+sort0: [|5; 8|]
+sort0: [|9; 0; 6|]
+sort0: [|0; 6|]
+
+sort1: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
+sort1: [|3; 4; 2; 1; 7|]
+sort1: [|3; 4|]
+sort1: [|2; 1; 7|]
+sort1: [|1; 7|]
+sort1: [|5; 8; 9; 0; 6|]
+sort1: [|5; 8|]
+sort1: [|9; 0; 6|]
+sort1: [|0; 6|]
+```
+
+
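+For contrast, here is a minimal sketch of what an explicit cooperation point
+changes. It is not code from ocaml-git or opam-mirror: `sort2`, `trace` and
+this particular `merge` are hypothetical names written for the illustration,
+and the sketch assumes the `lwt` and `lwt.unix` libraries. Each call records
+the size of its array and then calls `Lwt.pause` before recursing, so the two
+halves handed to `Lwt.both` are both started before either of them goes
+deeper.
+
+```ocaml
+(* Hypothetical helper: merge two sorted arrays into a new sorted array. *)
+let merge a b =
+  let la = Array.length a and lb = Array.length b in
+  let res = Array.make (la + lb) 0 in
+  let i = ref 0 and j = ref 0 in
+  for k = 0 to la + lb - 1 do
+    if !j >= lb || (!i < la && a.(!i) <= b.(!j))
+    then (res.(k) <- a.(!i); incr i)
+    else (res.(k) <- b.(!j); incr j)
+  done;
+  res
+
+(* Sizes of the arrays seen by [sort2], most recent first. *)
+let trace = ref []
+
+(* Hypothetical cooperative variant of [sort1]. *)
+let rec sort2 arr =
+  let open Lwt.Infix in
+  if Array.length arr <= 1 then Lwt.return arr
+  else begin
+    trace := Array.length arr :: !trace;
+    (* Explicit cooperation point: the rest of this call only runs once the
+       scheduler resumes paused promises. *)
+    Lwt.pause () >>= fun () ->
+    let m = Array.length arr / 2 in
+    Lwt.both
+      (sort2 (Array.sub arr 0 m))
+      (sort2 (Array.sub arr m (Array.length arr - m)))
+    >|= fun (arr0, arr1) -> merge arr0 arr1
+  end
+
+let () =
+  let sorted = Lwt_main.run (sort2 [| 3; 4; 2; 1; 7; 5; 8; 9; 0; 6 |]) in
+  Array.iter (Printf.printf "%d ") sorted;
+  print_newline ()
+```
+
+With this variant, `List.rev !trace` starts with `10; 5; 5`: both halves are
+entered before any smaller sub-array, whereas `sort1` finishes one half
+entirely before starting the other. The price is one trip through the
+scheduler per recursive call.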
+
+**1**: However, if you are not interested in availability
+and would like the scheduler to try to resolve your promises as quickly as
+possible, this advice is clearly valid.
+
+#### Performance
+
+It should be noted, however, that Lwt has a cost. Even if the behaviour is
+the same, the Lwt layer is not free. A quick benchmark shows that there is an
+overhead:
+
+```ocaml
+(* [arr] is the input array to sort, defined elsewhere. *)
+let _ =
+  let t0 = Unix.gettimeofday () in
+  (* 1000 iterations, like the Lwt version below. *)
+  for _ = 1 to 1000 do ignore (sort0 arr) done;
+  let t1 = Unix.gettimeofday () in
+  Fmt.pr "sort0 %fs\n%!" (t1 -. t0)
+
+let _ =
+  let t0 = Unix.gettimeofday () in
+  Lwt_main.run @@ begin
+    let open Lwt.Infix in
+    let rec go idx =
+      if idx = 1000 then Lwt.return_unit
+      else sort1 arr >>= fun _ -> go (succ idx) in
+    go 0 end;
+  let t1 = Unix.gettimeofday () in
+  Fmt.pr "sort1 %fs\n%!" (t1 -. t0)
+```
+
+```sh
+$ ./a.out
+sort0 0.000264s
+sort1 0.000676s
+```
+
+This is the fairly obvious argument for not using Lwt when there's no I/O. Then,
+if the Lwt monad is really needed, a simple `Lwt.return` at the very last
+moment is sufficient (or, better, the use of `Lwt.map` / `>|=`).
+
+#### Cooperation and concrete example
+
+So `Lwt.both` is the one to use when we want to run two processes
+"at the same time". For example, [ocaml-git][ocaml-git] attempts _both_ to
+retrieve a repository and to analyse it. This can be seen in this snippet
+of [code][ocaml-git-both].
+
+In our example with ocaml-git, the problem "shouldn't" appear because, in this
+case, both the left and right side do I/O (the left side reads from a socket
+while the right side saves Git objects in your file system). So, in our tests
+with `Git_unix`, we were able to see that the analysis (right-hand side) was
+indeed executed and 'interleaved' with the reception of objects from the
+network.
+
+### Composability
+
+However, if we go back to our initial problem, we were talking about our
+opam-mirror unikernel.
+As you might expect, there is no standalone MirageOS file
+system (and many of our unikernels don't need one). So, in the case of
+opam-mirror, we use the ocaml-git memory implementation: `Git_mem`.
+
+`Git_mem` is different in that Git objects are simply stored in a `Hashtbl`.
+There is no cooperation point when it comes to obtaining Git objects from this
+`Hashtbl`. So let's return to our original advice:
+
+> don't wrap code in Lwt if it doesn't do I/O.
+
+And, of course, `Git_mem` doesn't do I/O. It does, however, have to work with
+Lwt. In this case, `Git_mem` wraps the results in Lwt
+**as late as possible** (as explained above, so as not to slow down our
+processes unnecessarily). This choice inevitably means that the right-hand side
+can no longer offer cooperation points. And this is where our problem begins:
+composition.
+
+In fact, we had something like:
+
+```ocaml
+let clone socket git =
+  Lwt.both (receive_pack socket) (analyse_pack git) >>= fun ((), ()) ->
+  Lwt.return_unit
+```
+
+However, our `analyse_pack` function is injected through a functor parameter
+representing the Git backend, in other words `Git_unix` or `Git_mem`:
+
+```ocaml
+module Make (Git : Git.S) = struct
+  let clone socket git =
+    Lwt.both (receive_pack socket) (Git.analyse_pack git) >>= fun ((), ()) ->
+    Lwt.return_unit
+end
+```
+
+Composability poses a problem here because even if `Git_unix` and `Git_mem`
+offer the same function (so both modules can be used), the fact remains that one
+will always offer a certain availability to other services (such as an HTTP
+service) while the other will offer an Lwt function which will try to go as far
+as possible, to the point of making other services unavailable.
+
+Composing with one or the other therefore does not produce the same behaviour.
+
+#### Where to put `Lwt.pause`?
+
+In this case, our `analyse_pack` does read/write on the Git store.
+As far as
+`Git_mem` is concerned, we said that these read/write accesses were just
+accesses to a `Hashtbl`.
+
+Thanks to [Hannes][hannes]' help, it took us an afternoon to work out where we
+needed to add cooperation points in `Git_mem` so that `analyse_pack` could give
+another service such as HTTP the opportunity to work. Basically, this series of
+[commits][commits] shows where we needed to add `Lwt.pause`.
+
+However, this points to a number of problems:
+1) Two modules that compose in the same way (by _functor_ or by value) do not
+   necessarily make Lwt behave in the same way.
+2) You have to dig into the code to find the right places to put `Lwt.pause`,
+   by hand, and those places are subtle.
+3) In the end, Lwt has no mechanism for ensuring the availability of a service
+   (this is something that must be taken into account by the implementer).
+
+### In-depth knowledge of Lwt
+
+I haven't mentioned another problem we encountered with [Armael][armael] when
+implementing [multipart_form][multipart_form], where the use of a stream meant
+that Lwt didn't interleave the two processes and the use of a _bounded stream_
+was required. Again, even when it comes to I/O, Lwt always tries to go as far as
+possible in one of the two branches of a `Lwt.both`.
+
+This allows us to conclude that beyond the monad, Lwt has subtleties in its
+behaviour which may be different from another scheduler such as Async (hence the
+incompatibility between the two, which is not just a matter of the `'a t` type).
+
+### Digression on Miou
+
+That's why we put so much emphasis on the notion of availability when it comes
+to Miou: to avoid repeating the mistakes of the past. The choices that can be
+made with regard to this notion in particular have a major impact, and can be
+unsatisfactory to the user in certain cases (for example, so-called pure
+calculations could take longer with Miou than with another scheduler).
+
+In this sense, we have tried to constrain ourselves in the development of Miou
+through the use of `Effect.Shallow`, which requires us to always re-attach our
+handler (our scheduler) as soon as an effect is produced, unlike `Effect.Deep`
+which can re-use the same handler for several effects. In other words, and as
+we've described here, **an effect yields**!
+
+## Conclusion
+
+As far as opam-mirror is concerned, we now have a unikernel that stays available
+even while it clones a Git repository and saves Git objects in memory. At
+least, an HTTP service can co-exist with ocaml-git!
+
+I hope we'll be able to use it at [the next retreat][retreat], which I invite
+you to attend to talk more about Lwt, schedulers, Git and unikernels!
+
+[opam-mirror]: https://git.robur.coop/robur/opam-mirror
+[reproducibility]: https://blog.osau.re/articles/reproducible.html
+[bob]: https://bob.osau.re/
+[installation]: https://blog.osau.re/articles/reproducible.html
+[ocaml-git]: https://github.com/mirage/ocaml-git
+[ocaml-git-both]: https://github.com/mirage/ocaml-git/blob/a36c90404b149ab85f429439af8785bb1dde1bee/src/not-so-smart/smart_git.ml#L476-L481
+[hannes]: https://hannes.robur.coop/
+[armael]: https://cambium.inria.fr/~agueneau/
+[multipart_form]: https://discuss.ocaml.org/t/ann-release-of-multipart-form-0-2-0/7704#memory-bound-implementation
+[retreat]: https://retreat.mirage.io/
+[commits]: https://github.com/mirage/ocaml-git/pull/631/files
+[miou-articles]: https://blog.osau.re/tags/scheduler.html