Cooperation and Lwt.pause
Here's a concrete example of the notion of availability and the scheduler used (in this case Lwt). As you may know, at Robur we have developed a unikernel: opam-mirror. It launches an HTTP service that can be used as an OPAM overlay available from a Git repository (with opam repository add <name> <url>).
The purpose of such a unikernel was to respond to a failure of the official repository, which fortunately did not last long, and to offer decentralisation of such a service. You can use https://opam.robur.coop!
It was also useful at the Mirage retreat, where we don't usually have a great internet connection. Caching packages on the local network reduced our Internet bill: OCaml users could fetch opam packages over the local network instead of over the shared, metered 4G Internet connection.
Finally, it's a unikernel that I also use on my server for my software reproducibility service in order to have an overlay for my software like Bob.
In short, I advise you to use it; you can see its installation here (I think that in the context of a company, internally, it can be interesting to have such a unikernel available).
However, this unikernel had a long-standing problem, which we were already talking about at the Mirleft retreat: when we tried to get the repository from Git, our HTTP server was unavailable for a fairly long time. Basically, we had to wait ~10 min before the service offered by the unikernel was available.
Availability
If you follow my articles, as far as Miou is concerned, I have talked from the outset about the notion of availability in case we were to make yet another new scheduler for OCaml 5. We emphasised this notion because we had quite a few problems on this subject with Lwt.
In this case, the notion of availability requires the scheduler to be able to observe system events as often as possible. The problem is that Lwt doesn't really offer this approach.
Indeed, Lwt offers a way of observing system events (Lwt.pause) but does not do so systematically. The only time you really give the scheduler the opportunity to see whether you can read or write is when you want to... read or write...
More generally, it is said that Lwt's bind does not yield. In other words, you can chain any number of functions together (via the >>= operator), but from Lwt's point of view, there is no opportunity to see if an event has occurred. Lwt always tries to go as far down your chain as possible:
- and finish your promise
- or come across an operation that requires a system event (read or write)
- or come across an Lwt.pause (as a yield point)
Lwt is rather sparse in adding cooperation points besides Lwt.pause and read/write operations, in contrast with Async where the bind operator is a cooperation point.
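The difference between a bind on an already resolved promise and an explicit Lwt.pause can be observed with a small experiment. This is a minimal sketch, assuming the lwt and lwt.unix libraries are available; the names log, record, worker and run are hypothetical and only serve the illustration.

```ocaml
(* Sketch: build with the lwt and lwt.unix libraries. *)
let log = ref []
let record s = log := s :: !log

(* A worker that records [n] steps. When [yield] is true it calls
   [Lwt.pause] between steps; otherwise it only binds on promises
   that are already resolved, which never gives control back. *)
let rec worker ~yield name i n =
  let open Lwt.Infix in
  if i > n then Lwt.return_unit
  else begin
    record (Printf.sprintf "%s%d" name i);
    (if yield then Lwt.pause () else Lwt.return_unit) >>= fun () ->
    worker ~yield name (i + 1) n
  end

let run ~yield =
  log := [];
  (* Create both promises explicitly so that A always starts before B. *)
  let a = worker ~yield "A" 1 3 in
  let b = worker ~yield "B" 1 3 in
  let ((), ()) = Lwt_main.run (Lwt.both a b) in
  List.rev !log

let sequential = run ~yield:false
let interleaved = run ~yield:true
```

Without the pause, sequential is A1, A2, A3, B1, B2, B3: the first worker runs to completion before the second one starts. With the pause, B1 is recorded right after A1, so the second worker gets a chance to run before the first has finished.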
If there is no I/O, do not wrap in Lwt
It was (bad) advice I was given: if a function doesn't do I/O, there's no point in putting it in Lwt. At first glance, however, the idea may be a good one. If you have a function that doesn't do I/O, whether it's in the Lwt monad or not won't make any difference to the way Lwt tries to execute it. Once again, Lwt goes as far as possible. So Lwt tries to resolve both functions in the same way:
val merge : int array -> int array -> int array

let rec sort0 arr =
  if Array.length arr <= 1 then arr
  else
    let m = Array.length arr / 2 in
    let arr0 = sort0 (Array.sub arr 0 m) in
    let arr1 = sort0 (Array.sub arr m (Array.length arr - m)) in
    merge arr0 arr1

let rec sort1 arr =
  let open Lwt.Infix in
  if Array.length arr <= 1 then Lwt.return arr
  else
    let m = Array.length arr / 2 in
    Lwt.both
      (sort1 (Array.sub arr m (Array.length arr - m)))
      (sort1 (Array.sub arr 0 m))
    >|= fun (arr0, arr1) ->
    merge arr0 arr1
If we trace the execution of the two functions (for example, by displaying our arr each time), we see the same behaviour whether Lwt is used or not. However, what is interesting in the Lwt code is the use of both, which suggests that the processes are running at the same time.
"At the same time" does not necessarily suggest the use of several cores or "in parallel", but the possibility that the right-hand side may also have the opportunity to be executed even if the left-hand side has not finished. In other words, that the two processes can run concurrently.
But factually, this is not the case, because even if we had the possibility of a point of cooperation (with the >|= operator), Lwt tries to go as far as possible and decides to finish the left part before launching the right part:
$ ./a.out
sort0: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort0: [|3; 4; 2; 1; 7|]
sort0: [|3; 4|]
sort0: [|2; 1; 7|]
sort0: [|1; 7|]
sort0: [|5; 8; 9; 0; 6|]
sort0: [|5; 8|]
sort0: [|9; 0; 6|]
sort0: [|0; 6|]

sort1: [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|]
sort1: [|3; 4; 2; 1; 7|]
sort1: [|3; 4|]
sort1: [|2; 1; 7|]
sort1: [|1; 7|]
sort1: [|5; 8; 9; 0; 6|]
sort1: [|5; 8|]
sort1: [|9; 0; 6|]
sort1: [|0; 6|]
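For comparison, here is a hedged sketch of a variant that does interleave the two halves. It is not code from ocaml-git or from the benchmark above: sort2, trace and this particular merge are hypothetical (the article only gives the signature of merge), and the only real change is an explicit Lwt.pause before recursing.

```ocaml
(* Sketch: build with the lwt and lwt.unix libraries. *)

(* An assumed implementation of [merge] for two sorted arrays. *)
let merge a b =
  let la = Array.length a and lb = Array.length b in
  let out = Array.make (la + lb) 0 in
  let i = ref 0 and j = ref 0 in
  for k = 0 to la + lb - 1 do
    if !j >= lb || (!i < la && a.(!i) <= b.(!j))
    then (out.(k) <- a.(!i); incr i)
    else (out.(k) <- b.(!j); incr j)
  done;
  out

(* Arrays of length > 1 are recorded here, most recent first. *)
let trace = ref []

let rec sort2 arr =
  let open Lwt.Infix in
  if Array.length arr <= 1 then Lwt.return arr
  else begin
    trace := Array.to_list arr :: !trace;
    (* Explicit cooperation point: the rest runs on a later iteration
       of the Lwt main loop, so the sibling branch can start first. *)
    Lwt.pause () >>= fun () ->
    let m = Array.length arr / 2 in
    let left = sort2 (Array.sub arr 0 m) in
    let right = sort2 (Array.sub arr m (Array.length arr - m)) in
    Lwt.both left right >|= fun (arr0, arr1) -> merge arr0 arr1
  end

let sorted = Lwt_main.run (sort2 [|3; 4; 2; 1; 7; 5; 8; 9; 0; 6|])
let order = List.rev !trace
```

In this version the right half [|5; 8; 9; 0; 6|] is traced before [|3; 4|], because the left half pauses before recursing further. The result is still the sorted array; only the order in which the work is done changes.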
Performance
It should be noted, however, that Lwt has an impact. Even if the behaviour is the same, the Lwt layer is not free. A quick benchmark shows that there is an overhead:
let _ =
  let t0 = Unix.gettimeofday () in
  (* 1000 iterations, matching the Lwt loop below *)
  for _i = 1 to 1000 do ignore (sort0 arr) done;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort0 %fs\n%!" (t1 -. t0)

let _ =
  let t0 = Unix.gettimeofday () in
  Lwt_main.run @@ begin
    let open Lwt.Infix in
    let rec go idx = if idx = 1000 then Lwt.return_unit
      else sort1 arr >>= fun _ -> go (succ idx) in
    go 0 end;
  let t1 = Unix.gettimeofday () in
  Fmt.pr "sort1 %fs\n%!" (t1 -. t0)
$ ./a.out
sort0 0.000264s
sort1 0.000676s
This is the fairly obvious argument for not using Lwt when there's no I/O. Then, if the Lwt monad is really needed, a simple Lwt.return at the very end is sufficient (or, better, the use of Lwt.map / >|=).
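As an illustration of that advice, a pure function can stay outside the monad and be lifted only at the boundary. This is a minimal sketch; sum_squares and the two wrappers are hypothetical names, not part of any library.

```ocaml
(* Sketch: build with the lwt and lwt.unix libraries. *)

(* Pure computation: no I/O, so no Lwt inside. *)
let sum_squares xs = List.fold_left (fun acc x -> acc + (x * x)) 0 xs

(* Lift the result at the very end, when an Lwt value is required. *)
let sum_squares_lwt xs = Lwt.return (sum_squares xs)

(* Or apply the pure function to the result of an existing promise. *)
let sum_squares_of promise = Lwt.map sum_squares promise

let direct = Lwt_main.run (sum_squares_lwt [1; 2; 3])
let mapped = Lwt_main.run (sum_squares_of (Lwt.return [1; 2; 3]))
```

Both values are 14. Note that neither wrapper introduces a cooperation point: from the scheduler's point of view the computation still runs in one go.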
Cooperation and concrete example
So Lwt.both is the one to use when we want to run two processes "at the same time". For example, ocaml-git attempts both to retrieve a repository and to analyse it. This can be seen in this snippet of code.
In our example with ocaml-git, the problem "shouldn't" appear because, in this case, both the left and right sides do I/O (the left side reads from a socket while the right side saves Git objects in your file system). So, in our tests with Git_unix, we were able to see that the analysis (right-hand side) was properly executed and 'interleaved' with the reception of objects from the network.
Composability
However, if we go back to our initial problem, we were talking about our opam-mirror unikernel. As you might expect, there is no standalone MirageOS file system (and many of our unikernels don't need one). So, in the case of opam-mirror, we use the ocaml-git memory implementation: Git_mem.
Git_mem is different in that Git objects are simply stored in a Hashtbl. There is no cooperation point when it comes to obtaining Git objects from this Hashtbl. So let's return to our original advice:
  don't wrap code in Lwt if it doesn't do I/O.
And, of course, Git_mem doesn't do I/O. It does, however, require the process to be able to work with Lwt. In this case, Git_mem wraps the results in Lwt as late as possible (as explained above, so as not to slow down our processes unnecessarily). This choice inevitably means that the right-hand side can no longer offer cooperation points. And this is where our problem begins: composition.
In fact, we had something like:
let clone socket git =
  Lwt.both (receive_pack socket) (analyse_pack git) >>= fun ((), ()) ->
  Lwt.return_unit
However, our analyse_pack function is injected through a functor representing the Git backend. In other words, Git_unix or Git_mem:
module Make (Git : Git.S) = struct
  let clone socket git =
    Lwt.both (receive_pack socket) (Git.analyse_pack git) >>= fun ((), ()) ->
    Lwt.return_unit
end
Composability poses a problem here because even if Git_unix and Git_mem offer the same function (so both modules can be used), the fact remains that one will always offer a certain availability to other services (such as an HTTP service) while the other will offer an Lwt function which will try to go as far as possible, to the point of making other services unavailable.
Composing with one or the other therefore does not produce the same behaviour.
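The effect of the backend on a composed program can be reproduced with a small model. This is a sketch under stated assumptions: STORE, Eager_store, Paused_store and serve are hypothetical stand-ins (for the Git backend signature, a Git_mem without cooperation points, a backend that yields, and an HTTP-like service), not the ocaml-git API.

```ocaml
(* Sketch: build with the lwt and lwt.unix libraries. *)
module type STORE = sig
  val analyse_pack : (string -> unit) -> int -> unit Lwt.t
end

(* No cooperation point: the result is wrapped in Lwt at the very end. *)
module Eager_store : STORE = struct
  let analyse_pack log n =
    for i = 1 to n do log (Printf.sprintf "analyse %d" i) done;
    Lwt.return_unit
end

(* Same signature, but yields after each object. *)
module Paused_store : STORE = struct
  let analyse_pack log n =
    let open Lwt.Infix in
    let rec go i =
      if i > n then Lwt.return_unit
      else begin
        log (Printf.sprintf "analyse %d" i);
        Lwt.pause () >>= fun () -> go (i + 1)
      end in
    go 1
end

(* Another service that needs the scheduler before each request. *)
let serve log n =
  let open Lwt.Infix in
  let rec go i =
    if i > n then Lwt.return_unit
    else
      Lwt.pause () >>= fun () ->
      begin log (Printf.sprintf "serve %d" i); go (i + 1) end in
  go 1

module Make (S : STORE) = struct
  let run n =
    let events = ref [] in
    let log s = events := s :: !events in
    let http = serve log n in
    let git = S.analyse_pack log n in
    let ((), ()) = Lwt_main.run (Lwt.both http git) in
    List.rev !events
end

module With_eager = Make (Eager_store)
module With_paused = Make (Paused_store)

let eager_events = With_eager.run 3
let paused_events = With_paused.run 3
```

With Eager_store, all three "analyse" events are recorded before the first "serve" event, which mirrors the unavailability of the HTTP service. With Paused_store, "serve 1" is recorded before "analyse 3". The functor and its signature are identical in both cases; only the backend's cooperation points differ.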
Where to put Lwt.pause?
In this case, our analyse_pack does read/write on the Git store. As far as Git_mem is concerned, we said that these read/write accesses were just accesses to a Hashtbl.
Thanks to Hannes' help, it took us an afternoon to work out where we needed to add cooperation points in Git_mem so that analyse_pack could give another service such as HTTP the opportunity to work. Basically, this series of commits shows where we needed to add Lwt.pause.
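To give an idea of what such a change can look like, here is a minimal sketch of a Hashtbl-backed store whose accessors yield. It is not the actual Git_mem code nor the content of those commits: the store type and the read and write functions are hypothetical.

```ocaml
(* Sketch: build with the lwt and lwt.unix libraries. *)
open Lwt.Infix

(* Hypothetical in-memory store: hash -> object contents. *)
type store = (string, string) Hashtbl.t

(* The write itself is immediate; the pause gives other promises
   (an HTTP handler, for instance) a chance to run afterwards. *)
let write (t : store) key value =
  Hashtbl.replace t key value;
  Lwt.pause ()

(* Yield first, then perform the lookup. *)
let read (t : store) key =
  Lwt.pause () >|= fun () -> Hashtbl.find_opt t key

let roundtrip =
  let t : store = Hashtbl.create 16 in
  Lwt_main.run (write t "hash" "blob" >>= fun () -> read t "hash")
```

Here roundtrip is Some "blob". Pausing on every access has a cost, since each pause waits for the next iteration of the main loop, so a real implementation may prefer to yield only at selected places.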
However, this points to a number of problems:
- It is not necessarily true that on the basis of composability alone (by functor or by value), Lwt reacts in the same way.
- Subtly, you have to dig into the code to find the right places to put Lwt.pause by hand.
- In the end, Lwt has no mechanisms for ensuring the availability of a service (this is something that must be taken into account by the implementer).
In-depth knowledge of Lwt
I haven't mentioned another problem we encountered with Armael when implementing multipart_form, where the use of a stream meant that Lwt didn't interleave the two processes and the use of a bounded stream was required. Again, even when it comes to I/O, Lwt always tries to go as far as possible in one of the two branches of a Lwt.both.
This allows us to conclude that beyond the monad, Lwt has subtleties in its behaviour which may be different from another scheduler such as Async (hence the incompatibility between the two, which is not just a matter of the 'a t type).
Digression on Miou
That's why we put so much emphasis on the notion of availability when it comes to Miou: to avoid repeating the mistakes of the past. The choices that can be made with regard to this notion in particular have a major impact, and can be unsatisfactory to the user in certain cases (for example, so-called pure calculations could take longer with Miou than with another scheduler).
In this sense, we have tried to constrain ourselves in the development of Miou through the use of Effect.Shallow, which requires us to always re-attach our handler (our scheduler) as soon as an effect is produced, unlike Effect.Deep which can re-use the same handler for several effects. In other words, and as we've described here, an effect yields!
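The idea that "an effect yields" can be sketched with a toy round-robin scheduler built on Effect.Shallow. This is not Miou's implementation: the Step effect, run and task are hypothetical, and the sketch assumes OCaml 5 (where the Effect module may trigger an "unstable" alert at compile time).

```ocaml
(* Requires OCaml 5. [Step] is a hypothetical effect for illustration. *)
type _ Effect.t += Step : string -> unit Effect.t

let run (tasks : (unit -> unit) list) : string list =
  let log = ref [] in
  let queue : (unit, unit) Effect.Shallow.continuation Queue.t =
    Queue.create () in
  List.iter (fun f -> Queue.push (Effect.Shallow.fiber f) queue) tasks;
  let handler : (unit, unit) Effect.Shallow.handler =
    { Effect.Shallow.retc = (fun () -> ());
      exnc = raise;
      effc = (fun (type c) (eff : c Effect.t) ->
        match eff with
        | Step msg ->
          Some (fun (k : (c, unit) Effect.Shallow.continuation) ->
            (* The task is suspended here: record the step and put the
               continuation at the back of the queue. *)
            log := msg :: !log;
            Queue.push k queue)
        | _ -> None) } in
  let rec loop () =
    match Queue.take_opt queue with
    | None -> ()
    | Some k ->
      (* With shallow handlers, the handler must be given again each
         time a continuation is resumed. *)
      Effect.Shallow.continue_with k () handler;
      loop ()
  in
  loop ();
  List.rev !log

let task name () =
  for i = 1 to 2 do
    Effect.perform (Step (Printf.sprintf "%s%d" name i))
  done

let order = run [ task "a"; task "b" ]
```

Here order is a1, b1, a2, b2: every performed effect hands control back to the scheduler loop, which then picks the next task in the queue before resuming the first one.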
Conclusion
As far as opam-mirror is concerned, we now have a unikernel that remains available even while it clones a Git repository and saves Git objects in memory. At least, an HTTP service can co-exist with ocaml-git!
I hope we'll be able to use it at the next retreat, which I invite you to attend to talk more about Lwt, schedulers, Git and unikernels!