Added editing remarks from @christinerose https://github.com/ocaml/ocaml.org/pull/1806

Hannes Mehnert 2023-11-29 13:57:48 +01:00
parent 5316960bcb
commit 1f46f74964


@@ -19,7 +19,7 @@ To dive a bit more into [network semantics](https://www.cl.cam.ac.uk/~pes20/Nets
In 2014 I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the [journal paper](https://dl.acm.org/doi/10.1145/3243650) ([author's copy](http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf)) published. At the same time, the [MirageOS](https://mirage.io) melting pot was happening at the University of Cambridge, where I contributed OCaml-TLS and other libraries together with David.
My intention was to understand TCP better and to use the specification as a basis for a TCP stack for MirageOS - the [existing one](https://github.com/mirage/mirage-tcpip) (which is still used) has technical debt: a high ratio of issues to lines of code, the Lwt monad is ubiquitous (which makes testing and debugging pretty hard), utilizing multiple cores with OCaml multicore won't be easy, it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network and with well-behaved traffic. It doesn't work that well on the wild Internet with its variety of broken implementations. Apart from the resource leaks, which made me implement things such as restart-on-failure in [Albatross](https://github.com/robur-coop/albatross), there are certain connection states that will never be exited.
# The rise of [µTCP](https://github.com/robur-coop/utcp)
@@ -37,11 +37,11 @@ Now, after switching over to µTCP, graphed below, there's much fewer network ut
[<img src="/static/img/a.ns.mtcp-utcp.png" width="750" />](/static/img/a.ns.mtcp-utcp.png)
Investigating the involved parts showed that an unestablished TCP connection had been registered at the MirageOS layer, but the pure core did not expose an event, upon the received RST, that the connection had been cancelled. This means the MirageOS layer piles up all the connection attempts and doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side effects (IO) and (b) meeting the needs of the [TCP.S](https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml) module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the [required code changes](https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c) was straightforward. The graph shows that the fix was deployed at 15:25: the memory usage is constant afterwards, but the network utilization increased enormously.
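To make the shape of that fix concrete, here is a minimal sketch, assuming Lwt and Cstruct; it is not the actual µTCP interface, and the type and function names are made up: the pure core returns events alongside the segments to emit, and the effectful boilerplate keeps the wakeners of pending `create_connection` calls in a table, resolving them with an error when the core reports a reset instead of leaving them pending forever.

```ocaml
(* Hedged sketch only: [event], [handle_segment], and [Connection_reset]
   are made-up names for illustration, not the real µTCP API. *)
type connection_id = int

type event =
  | Established of connection_id
  | Connection_reset of connection_id  (* e.g. caused by a received RST *)

(* The pure core: no IO, it only computes the new state, the segments to
   send out, and the events the effectful layer must act upon. *)
module type CORE = sig
  type state
  val handle_segment :
    state -> Cstruct.t -> state * Cstruct.t list * event list
end

(* In the effectful MirageOS boilerplate, each pending [create_connection]
   has an Lwt wakener stored in [waiters]; an event resolves it. *)
let act_on_event waiters = function
  | Established id ->
    (match Hashtbl.find_opt waiters id with
     | None -> ()
     | Some u -> Hashtbl.remove waiters id; Lwt.wakeup_later u (Ok ()))
  | Connection_reset id ->
    (* Without such an event, nothing ever resolved the waiter, so the
       connection attempts piled up; with it, the application learns that
       the connection couldn't be established. *)
    (match Hashtbl.find_opt waiters id with
     | None -> ()
     | Some u -> Hashtbl.remove waiters id; Lwt.wakeup_later u (Error `Refused))
```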
[<img src="/static/img/a.ns.utcp-ev.png" width="750" />](/static/img/a.ns.utcp-ev.png)
Now, this network utilization is unwanted. It had been hidden by the application waiting forever for the TCP connection to be established. Our bugfix uncovered another issue, a tight loop (sketched after the list below):
- the nameserver attempts to connect to the other nameserver (`request`);
- this results in a `TCP.create_connection` which errors after one roundtrip;
- this leads to a `close`, which attempts a `request` again.
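In pseudo-OCaml, with made-up function names (`connect`, `close`, `handle`) standing in for the actual nameserver and TCP code, the loop looks roughly like this:

```ocaml
open Lwt.Infix

(* Rough sketch of the tight loop described above; [connect], [close], and
   [handle] are placeholders, not the real mirage/dns functions. *)
let rec request ~connect ~close ~handle dst =
  connect dst >>= function
  | Ok flow -> handle flow
  | Error _ ->
    (* the connection attempt errors after one roundtrip (the peer resets
       it), which leads to a close ... *)
    close dst >>= fun () ->
    (* ... and the close path immediately issues the next request; nothing
       in between waits or backs off, hence the tight loop and the network
       utilization visible in the graph above *)
    request ~connect ~close ~handle dst
```

The recursion here stands in for `close` re-issuing the `request`, as in the third step of the list above.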