diff --git a/Posts/TCP-ns b/Posts/TCP-ns
index 118f3d3..921c8c1 100644
--- a/Posts/TCP-ns
+++ b/Posts/TCP-ns
@@ -1,5 +1,5 @@
-Re-developing TCP from the grounds up

Re-developing TCP from the grounds up

Written by hannes
Classified under: mirageosprotocoltcp
Published: 2023-11-28 (last updated: 2023-11-28)

The Transmission Control Protocol (TCP) is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).

+Re-developing TCP from the grounds up

Re-developing TCP from the grounds up

Written by hannes
Classified under: mirageosprotocoltcp
Published: 2023-11-28 (last updated: 2023-11-29)

The Transmission Control Protocol (TCP) is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).

As common for Internet protocols, also TCP is specified in a series of so-called requests for comments (RFC), the latest revised version from August 2022 is RFC 9293, the initial one was RFC 793 from September 1981.

My brief personal TCP story

My interest in TCP started back in 2006 when we worked on a network stack in Dylan (these days abandoned) - ever since then I wanted to understand the implementation tradeoffs in more detail - including attacks and how to prevent a TCP stack from being vulnerable.

@@ -15,7 +15,7 @@

Now, after switching over to µTCP, graphed below, there's much fewer network utilization and the memory limit is only reached after 36 hours, which is a great result. Though, still it is not very satisfying that the unikernel leaks memory. Both graphs contain on their left side a few hours of mirage-tcpip, and shortly after 20:00 on Nov 23rd µTCP got deployed.

-Investigating the involved parts showed that a TCP connection that was never established has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Once this was well understood, developing the required code changes was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.

+Investigating the involved parts showed that a TCP connection that was never established has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the TCP.S module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the required code changes was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.

Now, the network utilization is unwanted. This was hidden by the application waiting forever that the TCP connection getting established. Our bugfix uncovered another issue, a tight loop:

diff --git a/atom b/atom
index 26f5e45..056576d 100644
--- a/atom
+++ b/atom
@@ -1,4 +1,4 @@
-urn:uuid:981361ca-e71d-4997-a52c-baeee78e4156full stack engineer2023-11-28T21:26:50-00:00<p>Core Internet protocols require operational experiments, even if formally specified</p>
+urn:uuid:981361ca-e71d-4997-a52c-baeee78e4156full stack engineer2023-11-29T12:45:29-00:00<p>Core Internet protocols require operational experiments, even if formally specified</p>
 2023-11-28T21:17:01-00:00<p>The <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).</p>
 <p>As common for Internet protocols, also TCP is specified in a series of so-called requests for comments (RFC), the latest revised version from August 2022 is <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC 9293</a>, the initial one was <a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> from September 1981.</p>
 <h1 id="my-brief-personal-tcp-story">My brief personal TCP story</h1>
@@ -15,7 +15,7 @@
 <p><a href="/static/img/a.ns.mtcp.png"><img src="/static/img/a.ns.mtcp.png" width="750" /></a></p>
 <p>Now, after switching over to µTCP, graphed below, there's much fewer network utilization and the memory limit is only reached after 36 hours, which is a great result. Though, still it is not very satisfying that the unikernel leaks memory. Both graphs contain on their left side a few hours of mirage-tcpip, and shortly after 20:00 on Nov 23rd µTCP got deployed.</p>
 <p><a href="/static/img/a.ns.mtcp-utcp.png"><img src="/static/img/a.ns.mtcp-utcp.png" width="750" /></a></p>
-<p>Investigating the involved parts showed that a TCP connection that was never established has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.</p>
+<p>Investigating the involved parts showed that a TCP connection that was never established has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the <a href="https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml">TCP.S</a> module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.</p>
 <p><a href="/static/img/a.ns.utcp-ev.png"><img src="/static/img/a.ns.utcp-ev.png" width="750" /></a></p>
 <p>Now, the network utilization is unwanted. This was hidden by the application waiting forever that the TCP connection getting established. Our bugfix uncovered another issue, a tight loop:</p>
 <ul>
@@ -35,7 +35,7 @@
 <p>We'll take some more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully soon are ready for an initial public release. In the meantime, you can follow our development repository.</p>
 <p>We at <a href="https://robur.coop">robur</a> are working as a collective since 2018, on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continuing our work by <a href="https://aenderwerk.de/donate/">donating</a> (select robur from the drop-down, or put &quot;donation robur&quot; in the purpose of the bank transfer).</p>
 <p>If you have any questions, reach us best via eMail to team AT robur DOT coop.</p>
-urn:uuid:96688956-0808-5d44-b795-1d64cbb4f947Re-developing TCP from the grounds up2023-11-28T21:26:50-00:00hannes<p>fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</p>
+urn:uuid:96688956-0808-5d44-b795-1d64cbb4f947Re-developing TCP from the grounds up2023-11-29T12:45:29-00:00hannes<p>fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</p>
 2022-11-17T12:41:11-00:00<p>EDIT (2023-05-16): Updated with albatross release version 2.0.0.</p>
 <h2 id="deploying-mirageos-unikernels">Deploying MirageOS unikernels</h2>
 <p>More than five years ago, I posted <a href="/Posts/VMM">how to deploy MirageOS unikernels</a>. My motivation to work on this topic is that I'm convinced of reduced complexity, improved security, and more sustainable resource footprint of MirageOS unikernels, and want to ease deployment thereof. More than one year ago, I described <a href="/Posts/Deploy">how to deploy reproducible unikernels</a>.</p>