updated from main (commit 1b759a1698)

Canopy bot 2023-11-29 13:31:16 +00:00
parent c5c0de33cb
commit 09e9834897
7 changed files with 48 additions and 48 deletions

@ -1,38 +1,38 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Re-developing TCP from the grounds up</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="Re-developing TCP from the grounds up" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="post"><h2>Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span><br/><div class="tags">Classified under: <a href="/tags/mirageos" class="tag">mirageos</a><a href="/tags/protocol" class="tag">protocol</a><a href="/tags/tcp" class="tag">tcp</a></div><span class="date">Published: 2023-11-28 (last updated: 2023-11-29)</span><article><p>The <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).</p>
<p>As common for Internet protocols, also TCP is specified in a series of so-called requests for comments (RFC), the latest revised version from August 2022 is <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC 9293</a>, the initial one was <a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> from September 1981.</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Redeveloping TCP from the ground up</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="Redeveloping TCP from the ground up" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="post"><h2>Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span><br/><div class="tags">Classified under: <a href="/tags/mirageos" class="tag">mirageos</a><a href="/tags/protocol" class="tag">protocol</a><a href="/tags/tcp" class="tag">tcp</a></div><span class="date">Published: 2023-11-28 (last updated: 2023-11-29)</span><article><p>The <a href="https://en.wikipedia.org/wiki/Transmission_Control_Protocol">Transmission Control Protocol (TCP)</a> is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).</p>
<p>As common for Internet protocols, TCP is specified in a series of so-called requests for comments (RFC). The latest revised version from August 2022 is <a href="https://datatracker.ietf.org/doc/html/rfc9293">RFC 9293</a>; the initial one was <a href="https://datatracker.ietf.org/doc/html/rfc793">RFC 793</a> from September 1981.</p>
<h1 id="my-brief-personal-tcp-story">My brief personal TCP story</h1>
<p>My interest in TCP started back in 2006 when we worked on a <a href="https://github.com/dylan-hackers/network-night-vision">network stack in Dylan</a> (these days abandoned) - ever since then I wanted to understand the implementation tradeoffs in more detail - including attacks and how to prevent a TCP stack from being vulnerable.</p>
<p>In 2012 I attended ICFP in Copenhagen - while being a PhD student at ITU Copenhagen - and there <a href="https://www.cl.cam.ac.uk/~pes20/">Peter Sewell</a> gave an invited talk &quot;Tales from the jungle&quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). Working on formal specifications myself (<a href="https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf">my dissertation</a>), and having a strong interest in real systems, I was immediately hooked by his perspective.</p>
<p>To dive a bit more into <a href="https://www.cl.cam.ac.uk/~pes20/Netsem/">network semantics</a> - the work done on TCP by Peter Sewell et al - is a formal specification (or a model) of TCP/IP and the Unix sockets API developed in HOL4. It is a label transition system with non-deterministic choices, and the model itself is executable. It has been validated with the real world by collecting thousands of traces on Linux, Windows, and FreeBSD - which have been checked by the model for validity - this copes with the different implementations of the English prose of the RFCs. The network semantics research found several issues in existing TCP stack and reported them upstream to have them fixed (though, there still is some special treatment, e.g. for the &quot;BSD listen bug&quot;).</p>
<p>In 2014 I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the <a href="https://dl.acm.org/doi/10.1145/3243650">journal paper</a> (<a href="http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf">author's copy</a>) published. At the same time the <a href="https://mirage.io">MirageOS</a> melting pot was happening at University of Cambridge, where I contributed OCaml-TLS etc. with David.</p>
<p>My intention was to understand TCP better, and use the specification as a basis for a TCP stack for MirageOS - the <a href="https://github.com/mirage/mirage-tcpip">existing one</a> (which is still used) has technical debt: a high issue to number of lines ratio, the lwt monad is ubiquitous which makes testing and debugging pretty hard, utilizing multiple cores with OCaml multicore won't be easy, it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network, and with well-behaved traffic. It doesn't work that well on the wild Internet with a variety of broken implementations. Apart from resource leakage, which made me to implement things such as restart-on-failure in <a href="https://github.com/robur-coop/albatross">Albatross</a>, there are certain connection states which will never be exited.</p>
<p>My interest in TCP started back in 2006 when we worked on a <a href="https://github.com/dylan-hackers/network-night-vision">network stack in Dylan</a> (these days abandoned). Ever since, I wanted to understand the implementation tradeoffs in more detail, including attacks and how to prevent a TCP stack from being vulnerable.</p>
<p>In 2012, I attended ICFP in Copenhagen while a PhD student at ITU Copenhagen. There, <a href="https://www.cl.cam.ac.uk/~pes20/">Peter Sewell</a> gave an invited talk &quot;Tales from the jungle&quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). Working on formal specifications myself (in <a href="https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf">my dissertation</a>), and having a strong interest in real systems, I was immediately hooked by his perspective.</p>
<p>To dive a bit more into <a href="https://www.cl.cam.ac.uk/~pes20/Netsem/">network semantics</a>, the work done on TCP by Peter Sewell, et al., is a formal specification (or a model) of TCP/IP and the Unix sockets API developed in HOL4. It is a labelled transition system with nondeterministic choices, and the model itself is executable. It has been validated with the real world by collecting thousands of traces on Linux, Windows, and FreeBSD, which have been checked by the model for validity. This copes with the different implementations of the English prose of the RFCs. The network semantics research found several issues in existing TCP stacks and reported them upstream to have them fixed (though there is still some special treatment, e.g., for the &quot;BSD listen bug&quot;).</p>
<p>In 2014, I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the <a href="https://dl.acm.org/doi/10.1145/3243650">journal paper</a> (<a href="http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf">author's copy</a>) published. At the same time, the <a href="https://mirage.io">MirageOS</a> melting pot was happening at the University of Cambridge, where I contributed OCaml-TLS and other things together with David.</p>
<p>My intention was to understand TCP better and use the specification as a basis for a TCP stack for MirageOS. The <a href="https://github.com/mirage/mirage-tcpip">existing one</a> (which is still used) has technical debt: a high issue to number of lines ratio. The Lwt monad is ubiquitous, which makes testing and debugging pretty hard, and also utilising multiple cores with OCaml Multicore won't be easy. Plus it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network, and with well-behaved traffic. It doesn't work that well on the wild Internet with a variety of broken implementations. Apart from resource leakage, which made me implement things such as restart-on-failure in <a href="https://github.com/robur-coop/albatross">Albatross</a>, there are certain connection states which will never be exited.</p>
<h1 id="the-rise-of-µtcp">The rise of <a href="https://github.com/robur-coop/utcp">µTCP</a></h1>
<p>Back in Cambridge I didn't manage to write a TCP stack based on the model, but in 2019 I re-started that work and got µTCP (the formal model manually translated to OCaml) to compile and do TCP session setup and teardown. Since it was a model that uses non-determinism, this couldn't be translated one-to-one into an executable program, but there are places where decisions have to be done. Due to other projects, I worked only briefly in 2021 and 2022 on µTCP, but finally in the summer 2023 I motivated myself to push µTCP into a usable state - so far I spend 25 days in 2023 on µTCP. Thanks to <a href="https://tarides.com">Tarides</a> for supporting my work.</p>
<p>Since late August we are running some unikernels using µTCP, e.g. the <a href="https://retreat.mirage.io">retreat</a> website. This allows us to observe µTCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (i.e. in the model there is no retransmit timer in the close wait state - which avoids proper session teardowns). We report statistics about how many TCP connections are in which state to an influx time series database and view graphs rendered by grafana. If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful for us to find out where to look for resource leaks. Still, there's work to understand the behaviour, look at what the model does, what µTCP does, what the RFC says, and eventually what existing deployed TCP stacks do.</p>
<p>Back in Cambridge, I didn't manage to write a TCP stack based on the model, but in 2019, I restarted that work and got µTCP (the formal model manually translated to OCaml) to compile and do TCP session setup and teardown. Since it was a model that uses nondeterminism, this couldn't be translated one-to-one into an executable program, but there are places where decisions have to be made. Due to other projects, I worked only briefly in 2021 and 2022 on µTCP, but finally in the summer of 2023, I motivated myself to push µTCP into a usable state. So far I've spent 25 days in 2023 on µTCP. Thanks to <a href="https://tarides.com">Tarides</a> for supporting my work.</p>
<p>Since late August, we have been running some unikernels using µTCP, e.g., the <a href="https://retreat.mirage.io">retreat</a> website. This allows us to observe µTCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (e.g., there is no retransmit timer in the close wait state, which prevents proper session teardowns). We report statistics about how many TCP connections are in which state to an Influx time series database and view graphs rendered by Grafana. If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful in finding out where to look for resource leaks. Still, there's work to understand the behaviour, look at what the model does, what µTCP does, what the RFC says, and eventually what existing deployed TCP stacks do.</p>
<h1 id="the-secondary-nameserver-issue">The secondary nameserver issue</h1>
<p>One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus it replies to each SYN packet a corresponding RST. Below I graphed the network utilization (send data/packets is positive y-axis, receive part on the negative) over time (on the x-axis) on the left and memory usage (bytes on y-axis) over time (x-axis) on the right of our nameserver - you can observe that both increases over time, and roughly every 3 hours the unikernel hits its configured memory limit (64 MB), crashes with out of memory, and is restarted. The graph below is using the mirage-tcpip stack.</p>
<p>One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus, the remote host answers each SYN packet with a corresponding RST. Below I graphed the network utilisation (send data/packets is positive y-axis, receive part on the negative) over time (on the x-axis) on the left and memory usage (bytes on y-axis) over time (x-axis) on the right of our nameserver. You can observe that both increase over time, and roughly every 3 hours, the unikernel hits its configured memory limit (64 MB), crashes with <em>out of memory</em>, and is restarted. The graph below is using the <code>mirage-tcpip</code> stack.</p>
<p><a href="/static/img/a.ns.mtcp.png"><img src="/static/img/a.ns.mtcp.png" width="750" /></a></p>
<p>Now, after switching over to µTCP, graphed below, there's much fewer network utilization and the memory limit is only reached after 36 hours, which is a great result. Though, still it is not very satisfying that the unikernel leaks memory. Both graphs contain on their left side a few hours of mirage-tcpip, and shortly after 20:00 on Nov 23rd µTCP got deployed.</p>
<p>Now, after switching over to µTCP, graphed below, there's much less network utilisation and the memory limit is only reached after 36 hours, which is a great result. Still, it is not very satisfying that the unikernel leaks memory. On their left side, both graphs contain a few hours of <code>mirage-tcpip</code>, and shortly after 20:00 on Nov 23rd, µTCP got deployed.</p>
<p><a href="/static/img/a.ns.mtcp-utcp.png"><img src="/static/img/a.ns.mtcp-utcp.png" width="750" /></a></p>
<p>Investigating the involved parts showed that an unestablished TCP connection has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the <a href="https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml">TCP.S</a> module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.</p>
<p>Investigating the involved parts showed that an unestablished TCP connection has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and it doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the <a href="https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml">TCP.S</a> module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the <a href="https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c">required code changes</a> was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilisation increased enormously.</p>
<p><a href="/static/img/a.ns.utcp-ev.png"><img src="/static/img/a.ns.utcp-ev.png" width="750" /></a></p>
<p>Now, the network utilization is unwanted. This was hidden by the application waiting forever while the TCP connection getting established. Our bugfix uncovered another issue, a tight loop:</p>
<p>Now, the network utilisation is unwanted. This was hidden by the application waiting forever for the TCP connection to get established. Our bug fix uncovered another issue, a tight loop:</p>
<ul>
<li>the nameserver attempts to connect to the other nameserver (<code>request</code>);
<li>The nameserver attempts to connect to the other nameserver (<code>request</code>);
</li>
<li>this results in a <code>TCP.create_connection</code> which errors after one roundtrip;
<li>This results in a <code>TCP.create_connection</code> which errors after one roundtrip;
</li>
<li>this leads to a <code>close</code>, which attempts a <code>request</code> again.
<li>This leads to a <code>close</code>, which attempts a <code>request</code> again.
</li>
</ul>
<p>This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on <a href="https://github.com/mirage/ocaml-dns/pull/347">the fix</a> and re-deployed the nameserver again. The graph has on the left edge the tight loop (so you have a comparison), at 16:05 we deployed the fix - since then it looks pretty smooth, both in memory usage and in network utilization.</p>
<p>This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on <a href="https://github.com/mirage/ocaml-dns/pull/347">the fix</a> and redeployed the nameserver. On the left edge, the graph shows the tight loop (so you have a baseline for comparison), and at 16:05, we deployed the fix. Since then it looks pretty smooth, both in memory usage and in network utilisation.</p>
<p><a href="/static/img/a.ns.utcp-fixed.png"><img src="/static/img/a.ns.utcp-fixed.png" width="750" /></a></p>
<p>To give you the entire picture, below is the graph where you can spot the mirage-tcpip stack (lots of network, restarting every 3 hours), µTCP-without-informing-application (run for 3 * ~36 hours), dns-server-high-network-utilization (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.</p>
<p>To give you the entire picture, below is the graph where you can spot the <code>mirage-tcpip</code> stack (lots of network, restarting every 3 hours), µTCP-without-informing-application (run for 3 * ~36 hours), DNS-server-high-network-utilization (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.</p>
<p><a href="/static/img/a.ns.all.png"><img src="/static/img/a.ns.all.png" width="750" /></a></p>
<h1 id="conclusion">Conclusion</h1>
<p>What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the mirage-tcpip was running with the dns-server that had a tight reconnect loop. But, below the line: should such an application lead to memory leaks? I don't think so. My approach is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of it. When one TCP connection returns an error (and thus is destroyed), the TCP stack should have no more resources used for that connection.</p>
<p>We'll take some more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully soon are ready for an initial public release. In the meantime, you can follow our development repository.</p>
<p>We at <a href="https://robur.coop">robur</a> are working as a collective since 2018, on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continuing our work by <a href="https://aenderwerk.de/donate/">donating</a> (select robur from the drop-down, or put &quot;donation robur&quot; in the purpose of the bank transfer).</p>
<p>What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the <code>mirage-tcpip</code> was running with the DNS-server that had a tight reconnect loop. But, below the line: should such an application lead to memory leaks? I don't think so. My approach is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of it. When one TCP connection returns an error (and thus is destroyed), the TCP stack should have no more resources used for that connection.</p>
<p>We'll take more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully soon will be ready for an initial public release. In the meantime, you can follow our development repository.</p>
<p>We at <a href="https://robur.coop">Robur</a> have been working as a collective since 2018 on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust, and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continue our work by <a href="https://aenderwerk.de/donate/">donating</a> (select Robur from the drop-down or put &quot;donation Robur&quot; in the purpose of the bank transfer).</p>
<p>If you have any questions, you can best reach us via email at team AT robur DOT coop.</p>
</article></div></div></main></body></html>

@ -1,5 +1,5 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
</p></a><a href="/Posts/Albatross" class="list-group-item"><h2 class="list-group-item-heading">Deploying reproducible unikernels with albatross</h2><span class="author">Written by hannes</span> <time>2022-11-17</time><br/><p class="list-group-item-text abstract"><p>fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</p>
</p></a><a href="/Posts/OpamMirror" class="list-group-item"><h2 class="list-group-item-heading">Mirroring the opam repository and all tarballs</h2><span class="author">Written by hannes</span> <time>2022-09-29</time><br/><p class="list-group-item-text abstract"><p>Re-developing an opam cache from scratch, as a MirageOS unikernel</p>
</p></a><a href="/Posts/Monitoring" class="list-group-item"><h2 class="list-group-item-heading">All your metrics belong to influx</h2><span class="author">Written by hannes</span> <time>2022-03-08</time><br/><p class="list-group-item-text abstract"><p>How to monitor your MirageOS unikernel with albatross and monitoring-experiments</p>

44
atom

@ -1,41 +1,41 @@
<feed xmlns="http://www.w3.org/2005/Atom"><link href="https://hannes.robur.coop/atom" rel="self"/><id>urn:uuid:981361ca-e71d-4997-a52c-baeee78e4156</id><title type="text">full stack engineer</title><updated>2023-11-29T12:57:48-00:00</updated><entry><summary type="html">&lt;p&gt;Core Internet protocols require operational experiments, even if formally specified&lt;/p&gt;
<feed xmlns="http://www.w3.org/2005/Atom"><link href="https://hannes.robur.coop/atom" rel="self"/><id>urn:uuid:981361ca-e71d-4997-a52c-baeee78e4156</id><title type="text">full stack engineer</title><updated>2023-11-29T13:31:13-00:00</updated><entry><summary type="html">&lt;p&gt;Core Internet protocols require operational experiments, even if formally specified&lt;/p&gt;
</summary><published>2023-11-28T21:17:01-00:00</published><link href="/Posts/TCP-ns" rel="alternate"/><content type="html">&lt;p&gt;The &lt;a href=&quot;https://en.wikipedia.org/wiki/Transmission_Control_Protocol&quot;&gt;Transmission Control Protocol (TCP)&lt;/a&gt; is one of the main Internet protocols. Usually spoken on top of the Internet Protocol (legacy version 4 or version 6), it provides a reliable, ordered, and error-checked stream of octets. When an application uses TCP, they get these properties for free (in contrast to UDP).&lt;/p&gt;
&lt;p&gt;As common for Internet protocols, also TCP is specified in a series of so-called requests for comments (RFC), the latest revised version from August 2022 is &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9293&quot;&gt;RFC 9293&lt;/a&gt;, the initial one was &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc793&quot;&gt;RFC 793&lt;/a&gt; from September 1981.&lt;/p&gt;
&lt;p&gt;As common for Internet protocols, TCP is specified in a series of so-called requests for comments (RFC). The latest revised version from August 2022 is &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc9293&quot;&gt;RFC 9293&lt;/a&gt;; the initial one was &lt;a href=&quot;https://datatracker.ietf.org/doc/html/rfc793&quot;&gt;RFC 793&lt;/a&gt; from September 1981.&lt;/p&gt;
&lt;h1 id=&quot;my-brief-personal-tcp-story&quot;&gt;My brief personal TCP story&lt;/h1&gt;
&lt;p&gt;My interest in TCP started back in 2006 when we worked on a &lt;a href=&quot;https://github.com/dylan-hackers/network-night-vision&quot;&gt;network stack in Dylan&lt;/a&gt; (these days abandoned) - ever since then I wanted to understand the implementation tradeoffs in more detail - including attacks and how to prevent a TCP stack from being vulnerable.&lt;/p&gt;
&lt;p&gt;In 2012 I attended ICFP in Copenhagen - while being a PhD student at ITU Copenhagen - and there &lt;a href=&quot;https://www.cl.cam.ac.uk/~pes20/&quot;&gt;Peter Sewell&lt;/a&gt; gave an invited talk &amp;quot;Tales from the jungle&amp;quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). Working on formal specifications myself (&lt;a href=&quot;https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf&quot;&gt;my dissertation&lt;/a&gt;), and having a strong interest in real systems, I was immediately hooked by his perspective.&lt;/p&gt;
&lt;p&gt;To dive a bit more into &lt;a href=&quot;https://www.cl.cam.ac.uk/~pes20/Netsem/&quot;&gt;network semantics&lt;/a&gt; - the work done on TCP by Peter Sewell et al - is a formal specification (or a model) of TCP/IP and the Unix sockets API developed in HOL4. It is a label transition system with non-deterministic choices, and the model itself is executable. It has been validated with the real world by collecting thousands of traces on Linux, Windows, and FreeBSD - which have been checked by the model for validity - this copes with the different implementations of the English prose of the RFCs. The network semantics research found several issues in existing TCP stack and reported them upstream to have them fixed (though, there still is some special treatment, e.g. for the &amp;quot;BSD listen bug&amp;quot;).&lt;/p&gt;
&lt;p&gt;In 2014 I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3243650&quot;&gt;journal paper&lt;/a&gt; (&lt;a href=&quot;http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf&quot;&gt;author's copy&lt;/a&gt;) published. At the same time the &lt;a href=&quot;https://mirage.io&quot;&gt;MirageOS&lt;/a&gt; melting pot was happening at University of Cambridge, where I contributed OCaml-TLS etc. with David.&lt;/p&gt;
&lt;p&gt;My intention was to understand TCP better, and use the specification as a basis for a TCP stack for MirageOS - the &lt;a href=&quot;https://github.com/mirage/mirage-tcpip&quot;&gt;existing one&lt;/a&gt; (which is still used) has technical debt: a high issue to number of lines ratio, the lwt monad is ubiquitous which makes testing and debugging pretty hard, utilizing multiple cores with OCaml multicore won't be easy, it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network, and with well-behaved traffic. It doesn't work that well on the wild Internet with a variety of broken implementations. Apart from resource leakage, which made me to implement things such as restart-on-failure in &lt;a href=&quot;https://github.com/robur-coop/albatross&quot;&gt;Albatross&lt;/a&gt;, there are certain connection states which will never be exited.&lt;/p&gt;
&lt;p&gt;My interest in TCP started back in 2006 when we worked on a &lt;a href=&quot;https://github.com/dylan-hackers/network-night-vision&quot;&gt;network stack in Dylan&lt;/a&gt; (these days abandoned). Ever since, I wanted to understand the implementation tradeoffs in more detail, including attacks and how to prevent a TCP stack from being vulnerable.&lt;/p&gt;
&lt;p&gt;In 2012, I attended ICFP in Copenhagen while a PhD student at ITU Copenhagen. There, &lt;a href=&quot;https://www.cl.cam.ac.uk/~pes20/&quot;&gt;Peter Sewell&lt;/a&gt; gave an invited talk &amp;quot;Tales from the jungle&amp;quot; about rigorous methods for real-world infrastructure (C semantics, hardware (concurrency) behaviour of CPUs, TCP/IP, and likely more). Working on formal specifications myself (&lt;a href=&quot;https://en.itu.dk/-/media/EN/Research/PhD-Programme/PhD-defences/2013/130731-Hannes-Mehnert-PhD-dissertation-finalpdf.pdf&quot;&gt;my dissertation&lt;/a&gt;), and having a strong interest in real systems, I was immediately hooked by his perspective.&lt;/p&gt;
&lt;p&gt;To dive a bit more into &lt;a href=&quot;https://www.cl.cam.ac.uk/~pes20/Netsem/&quot;&gt;network semantics&lt;/a&gt;: the work done on TCP by Peter Sewell et al. is a formal specification (or a model) of TCP/IP and the Unix sockets API developed in HOL4. It is a labelled transition system with nondeterministic choices, and the model itself is executable. It has been validated against the real world by collecting thousands of traces on Linux, Windows, and FreeBSD, which have been checked by the model for validity. This copes with the different implementations of the English prose of the RFCs. The network semantics research found several issues in existing TCP stacks and reported them upstream to have them fixed (though there still is some special treatment, e.g., for the &amp;quot;BSD listen bug&amp;quot;).&lt;/p&gt;
&lt;p&gt;In 2014, I joined Peter's research group in Cambridge to continue their work on the model: updating to more recent versions of HOL4 and PolyML, revising the test system to use DTrace, updating to a more recent FreeBSD network stack (from FreeBSD 4.6 to FreeBSD 10), and finally getting the &lt;a href=&quot;https://dl.acm.org/doi/10.1145/3243650&quot;&gt;journal paper&lt;/a&gt; (&lt;a href=&quot;http://www.cl.cam.ac.uk/~pes20/Netsem/paper3.pdf&quot;&gt;author's copy&lt;/a&gt;) published. At the same time, the &lt;a href=&quot;https://mirage.io&quot;&gt;MirageOS&lt;/a&gt; melting pot was happening at the University of Cambridge, where I contributed OCaml-TLS and other things together with David.&lt;/p&gt;
&lt;p&gt;My intention was to understand TCP better and use the specification as a basis for a TCP stack for MirageOS. The &lt;a href=&quot;https://github.com/mirage/mirage-tcpip&quot;&gt;existing one&lt;/a&gt; (which is still used) has technical debt: a high ratio of issues to lines of code. The Lwt monad is ubiquitous, which makes testing and debugging pretty hard, and utilising multiple cores with OCaml Multicore won't be easy. Plus, it has various resource leaks, and there is no active maintainer. But honestly, it works fine on a local network and with well-behaved traffic. It doesn't work that well on the wild Internet with a variety of broken implementations. Apart from resource leakage, which made me implement things such as restart-on-failure in &lt;a href=&quot;https://github.com/robur-coop/albatross&quot;&gt;Albatross&lt;/a&gt;, there are certain connection states which will never be exited.&lt;/p&gt;
&lt;h1 id=&quot;the-rise-of-µtcp&quot;&gt;The rise of &lt;a href=&quot;https://github.com/robur-coop/utcp&quot;&gt;µTCP&lt;/a&gt;&lt;/h1&gt;
&lt;p&gt;Back in Cambridge I didn't manage to write a TCP stack based on the model, but in 2019 I re-started that work and got µTCP (the formal model manually translated to OCaml) to compile and do TCP session setup and teardown. Since it was a model that uses non-determinism, this couldn't be translated one-to-one into an executable program, but there are places where decisions have to be done. Due to other projects, I worked only briefly in 2021 and 2022 on µTCP, but finally in the summer 2023 I motivated myself to push µTCP into a usable state - so far I spend 25 days in 2023 on µTCP. Thanks to &lt;a href=&quot;https://tarides.com&quot;&gt;Tarides&lt;/a&gt; for supporting my work.&lt;/p&gt;
&lt;p&gt;Since late August we are running some unikernels using µTCP, e.g. the &lt;a href=&quot;https://retreat.mirage.io&quot;&gt;retreat&lt;/a&gt; website. This allows us to observe µTCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (i.e. in the model there is no retransmit timer in the close wait state - which avoids proper session teardowns). We report statistics about how many TCP connections are in which state to an influx time series database and view graphs rendered by grafana. If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful for us to find out where to look for resource leaks. Still, there's work to understand the behaviour, look at what the model does, what µTCP does, what the RFC says, and eventually what existing deployed TCP stacks do.&lt;/p&gt;
&lt;p&gt;Back in Cambridge, I didn't manage to write a TCP stack based on the model, but in 2019, I restarted that work and got µTCP (the formal model manually translated to OCaml) to compile and do TCP session setup and teardown. Since it was a model that uses nondeterminism, this couldn't be translated one-to-one into an executable program, but there are places where decisions have to be made. Due to other projects, I worked only briefly in 2021 and 2022 on µTCP, but finally in the summer of 2023, I motivated myself to push µTCP into a usable state. So far I've spent 25 days in 2023 on µTCP. Thanks to &lt;a href=&quot;https://tarides.com&quot;&gt;Tarides&lt;/a&gt; for supporting my work.&lt;/p&gt;
&lt;p&gt;Since late August, we have been running some unikernels using µTCP, e.g., the &lt;a href=&quot;https://retreat.mirage.io&quot;&gt;retreat&lt;/a&gt; website. This allows us to observe µTCP and find and solve issues that occur in the real world. It turned out that the model is not always correct (e.g., in the model there is no retransmit timer in the close wait state, which prevents proper session teardowns). We report statistics about how many TCP connections are in which state to an Influx time series database and view graphs rendered by Grafana. If there are connections that are stuck for multiple hours, this indicates a resource leak that should be addressed. Grafana was tremendously helpful for finding out where to look for resource leaks. Still, there's work to understand the behaviour, look at what the model does, what µTCP does, what the RFC says, and eventually what existing deployed TCP stacks do.&lt;/p&gt;
&lt;h1 id=&quot;the-secondary-nameserver-issue&quot;&gt;The secondary nameserver issue&lt;/h1&gt;
&lt;p&gt;One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus it replies to each SYN packet a corresponding RST. Below I graphed the network utilization (send data/packets is positive y-axis, receive part on the negative) over time (on the x-axis) on the left and memory usage (bytes on y-axis) over time (x-axis) on the right of our nameserver - you can observe that both increases over time, and roughly every 3 hours the unikernel hits its configured memory limit (64 MB), crashes with out of memory, and is restarted. The graph below is using the mirage-tcpip stack.&lt;/p&gt;
&lt;p&gt;One of our secondary nameservers attempts to receive zones (via AXFR using TCP) from another nameserver that is currently not running. Thus, the remote host replies to each SYN packet with a corresponding RST. Below I graphed the network utilisation (sent data/packets on the positive y-axis, received on the negative) over time (on the x-axis) on the left and memory usage (bytes on y-axis) over time (x-axis) on the right of our nameserver. You can observe that both increase over time, and roughly every 3 hours, the unikernel hits its configured memory limit (64 MB), crashes with &lt;em&gt;out of memory&lt;/em&gt;, and is restarted. The graph below is using the &lt;code&gt;mirage-tcpip&lt;/code&gt; stack.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/static/img/a.ns.mtcp.png&quot;&gt;&lt;img src=&quot;/static/img/a.ns.mtcp.png&quot; width=&quot;750&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now, after switching over to µTCP, graphed below, there's much fewer network utilization and the memory limit is only reached after 36 hours, which is a great result. Though, still it is not very satisfying that the unikernel leaks memory. Both graphs contain on their left side a few hours of mirage-tcpip, and shortly after 20:00 on Nov 23rd µTCP got deployed.&lt;/p&gt;
&lt;p&gt;Now, after switching over to µTCP, graphed below, there's much less network utilisation and the memory limit is only reached after 36 hours, which is a great result. Still, it is not very satisfying that the unikernel leaks memory. On their left side, both graphs contain a few hours of &lt;code&gt;mirage-tcpip&lt;/code&gt;, and shortly after 20:00 on Nov 23rd, µTCP got deployed.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/static/img/a.ns.mtcp-utcp.png&quot;&gt;&lt;img src=&quot;/static/img/a.ns.mtcp-utcp.png&quot; width=&quot;750&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Investigating the involved parts showed that an unestablished TCP connection has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) effectful side-effects (IO) and (b) meeting the needs of the &lt;a href=&quot;https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml&quot;&gt;TCP.S&lt;/a&gt; module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the &lt;a href=&quot;https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c&quot;&gt;required code changes&lt;/a&gt; was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilization increased enormously.&lt;/p&gt;
&lt;p&gt;Investigating the involved parts showed that an unestablished TCP connection has been registered at the MirageOS layer, but the pure core does not expose an event from the received RST that the connection has been cancelled. This means the MirageOS layer piles up all the connection attempts, and it doesn't inform the application that the connection couldn't be established. Note that the MirageOS layer is not code derived from the formal model, but boilerplate for (a) side effects (IO) and (b) meeting the needs of the &lt;a href=&quot;https://github.com/mirage/mirage-tcpip/blob/v8.0.0/src/core/tcp.ml&quot;&gt;TCP.S&lt;/a&gt; module type (so that µTCP can be used as a drop-in replacement for mirage-tcpip). Once this was well understood, developing the &lt;a href=&quot;https://github.com/robur-coop/utcp/commit/67fc49468e6b75b96a481ebe44dd11ce4bb76e6c&quot;&gt;required code changes&lt;/a&gt; was straightforward. The graph shows that the fix was deployed at 15:25. The memory usage is constant afterwards, but the network utilisation increased enormously.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/static/img/a.ns.utcp-ev.png&quot;&gt;&lt;img src=&quot;/static/img/a.ns.utcp-ev.png&quot; width=&quot;750&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;Now, the network utilization is unwanted. This was hidden by the application waiting forever while the TCP connection getting established. Our bugfix uncovered another issue, a tight loop:&lt;/p&gt;
&lt;p&gt;Now, the network utilisation is unwanted. This was hidden by the application waiting forever for the TCP connection to get established. Our bug fix uncovered another issue, a tight loop:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;the nameserver attempts to connect to the other nameserver (&lt;code&gt;request&lt;/code&gt;);
&lt;li&gt;The nameserver attempts to connect to the other nameserver (&lt;code&gt;request&lt;/code&gt;);
&lt;/li&gt;
&lt;li&gt;this results in a &lt;code&gt;TCP.create_connection&lt;/code&gt; which errors after one roundtrip;
&lt;li&gt;This results in a &lt;code&gt;TCP.create_connection&lt;/code&gt; which errors after one roundtrip;
&lt;/li&gt;
&lt;li&gt;this leads to a &lt;code&gt;close&lt;/code&gt;, which attempts a &lt;code&gt;request&lt;/code&gt; again.
&lt;li&gt;This leads to a &lt;code&gt;close&lt;/code&gt;, which attempts a &lt;code&gt;request&lt;/code&gt; again.
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on &lt;a href=&quot;https://github.com/mirage/ocaml-dns/pull/347&quot;&gt;the fix&lt;/a&gt; and re-deployed the nameserver again. The graph has on the left edge the tight loop (so you have a comparison), at 16:05 we deployed the fix - since then it looks pretty smooth, both in memory usage and in network utilization.&lt;/p&gt;
&lt;p&gt;This is unnecessary since the DNS server code has a timer to attempt to connect to the remote nameserver periodically (but takes a break between attempts). After understanding this behaviour, we worked on &lt;a href=&quot;https://github.com/mirage/ocaml-dns/pull/347&quot;&gt;the fix&lt;/a&gt; and redeployed the nameserver. On the left edge, the graph shows the tight loop (so you have a baseline for comparison), and at 16:05, we deployed the fix. Since then it looks pretty smooth, both in memory usage and in network utilisation.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/static/img/a.ns.utcp-fixed.png&quot;&gt;&lt;img src=&quot;/static/img/a.ns.utcp-fixed.png&quot; width=&quot;750&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;p&gt;To give you the entire picture, below is the graph where you can spot the mirage-tcpip stack (lots of network, restarting every 3 hours), µTCP-without-informing-application (run for 3 * ~36 hours), dns-server-high-network-utilization (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.&lt;/p&gt;
&lt;p&gt;To give you the entire picture, below is the graph where you can spot the &lt;code&gt;mirage-tcpip&lt;/code&gt; stack (lots of network, restarting every 3 hours), µTCP-without-informing-application (run for 3 * ~36 hours), DNS-server-high-network-utilization (which only lasted for a brief period, thus it is more a point in the graph), and finally the unikernel with both fixes applied.&lt;/p&gt;
&lt;p&gt;&lt;a href=&quot;/static/img/a.ns.all.png&quot;&gt;&lt;img src=&quot;/static/img/a.ns.all.png&quot; width=&quot;750&quot; /&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1 id=&quot;conclusion&quot;&gt;Conclusion&lt;/h1&gt;
&lt;p&gt;What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the mirage-tcpip was running with the dns-server that had a tight reconnect loop. But, below the line: should such an application lead to memory leaks? I don't think so. My approach is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of it. When one TCP connection returns an error (and thus is destroyed), the TCP stack should have no more resources used for that connection.&lt;/p&gt;
&lt;p&gt;We'll take some more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully soon are ready for an initial public release. In the meantime, you can follow our development repository.&lt;/p&gt;
&lt;p&gt;We at &lt;a href=&quot;https://robur.coop&quot;&gt;robur&lt;/a&gt; are working as a collective since 2018, on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continuing our work by &lt;a href=&quot;https://aenderwerk.de/donate/&quot;&gt;donating&lt;/a&gt; (select robur from the drop-down, or put &amp;quot;donation robur&amp;quot; in the purpose of the bank transfer).&lt;/p&gt;
&lt;p&gt;What can we learn from that? Choosing convenient tooling is crucial for effective debugging. Also, fixing one issue may uncover other issues. And of course, the &lt;code&gt;mirage-tcpip&lt;/code&gt; stack was running with the DNS server that had a tight reconnect loop. But the bottom line is: should such an application lead to memory leaks? I don't think so. My approach is that all core network libraries should work in a non-resource-leaky way with any kind of application on top of them. When one TCP connection returns an error (and thus is destroyed), the TCP stack should have no more resources used for that connection.&lt;/p&gt;
&lt;p&gt;We'll take more time to investigate issues of µTCP in production, plan to write further documentation and blog posts, and hopefully soon will be ready for an initial public release. In the meantime, you can follow our development repository.&lt;/p&gt;
&lt;p&gt;We at &lt;a href=&quot;https://robur.coop&quot;&gt;Robur&lt;/a&gt; have been working as a collective since 2018 on public funding, commercial contracts, and donations. Our mission is to get sustainable, robust, and secure MirageOS unikernels developed and deployed. Running your own digital communication infrastructure should be easy, including trustworthy binaries and smooth upgrades. You can help us continue our work by &lt;a href=&quot;https://aenderwerk.de/donate/&quot;&gt;donating&lt;/a&gt; (select Robur from the drop-down or put &amp;quot;donation Robur&amp;quot; in the purpose of the bank transfer).&lt;/p&gt;
&lt;p&gt;If you have any questions, reach us best via eMail to team AT robur DOT coop.&lt;/p&gt;
</content><category scheme="https://hannes.robur.coop/tags/tcp" term="tcp"/><category scheme="https://hannes.robur.coop/tags/protocol" term="protocol"/><category scheme="https://hannes.robur.coop/tags/mirageos" term="mirageos"/><id>urn:uuid:96688956-0808-5d44-b795-1d64cbb4f947</id><title type="text">Re-developing TCP from the grounds up</title><updated>2023-11-29T12:57:48-00:00</updated><author><name>hannes</name></author></entry><entry><summary type="html">&lt;p&gt;fleet management for MirageOS unikernels using a mutually authenticated TLS handshake&lt;/p&gt;
</content><category scheme="https://hannes.robur.coop/tags/tcp" term="tcp"/><category scheme="https://hannes.robur.coop/tags/protocol" term="protocol"/><category scheme="https://hannes.robur.coop/tags/mirageos" term="mirageos"/><id>urn:uuid:96688956-0808-5d44-b795-1d64cbb4f947</id><title type="text">Redeveloping TCP from the ground up</title><updated>2023-11-29T13:31:13-00:00</updated><author><name>hannes</name></author></entry><entry><summary type="html">&lt;p&gt;fleet management for MirageOS unikernels using a mutually authenticated TLS handshake&lt;/p&gt;
</summary><published>2022-11-17T12:41:11-00:00</published><link href="/Posts/Albatross" rel="alternate"/><content type="html">&lt;p&gt;EDIT (2023-05-16): Updated with albatross release version 2.0.0.&lt;/p&gt;
&lt;h2 id=&quot;deploying-mirageos-unikernels&quot;&gt;Deploying MirageOS unikernels&lt;/h2&gt;
&lt;p&gt;More than five years ago, I posted &lt;a href=&quot;/Posts/VMM&quot;&gt;how to deploy MirageOS unikernels&lt;/a&gt;. My motivation to work on this topic is that I'm convinced of reduced complexity, improved security, and more sustainable resource footprint of MirageOS unikernels, and want to ease deployment thereof. More than one year ago, I described &lt;a href=&quot;/Posts/Deploy&quot;&gt;how to deploy reproducible unikernels&lt;/a&gt;.&lt;/p&gt;

View file

@ -1,5 +1,5 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
</p></a><a href="/Posts/Albatross" class="list-group-item"><h2 class="list-group-item-heading">Deploying reproducible unikernels with albatross</h2><span class="author">Written by hannes</span> <time>2022-11-17</time><br/><p class="list-group-item-text abstract"><p>fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</p>
</p></a><a href="/Posts/OpamMirror" class="list-group-item"><h2 class="list-group-item-heading">Mirroring the opam repository and all tarballs</h2><span class="author">Written by hannes</span> <time>2022-09-29</time><br/><p class="list-group-item-text abstract"><p>Re-developing an opam cache from scratch, as a MirageOS unikernel</p>
</p></a><a href="/Posts/Monitoring" class="list-group-item"><h2 class="list-group-item-heading">All your metrics belong to influx</h2><span class="author">Written by hannes</span> <time>2022-03-08</time><br/><p class="list-group-item-text abstract"><p>How to monitor your MirageOS unikernel with albatross and monitoring-experiments</p>

View file

@ -1,5 +1,5 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
</p></a><a href="/Posts/Albatross" class="list-group-item"><h2 class="list-group-item-heading">Deploying reproducible unikernels with albatross</h2><span class="author">Written by hannes</span> <time>2022-11-17</time><br/><p class="list-group-item-text abstract"><p>fleet management for MirageOS unikernels using a mutually authenticated TLS handshake</p>
</p></a><a href="/Posts/OpamMirror" class="list-group-item"><h2 class="list-group-item-heading">Mirroring the opam repository and all tarballs</h2><span class="author">Written by hannes</span> <time>2022-09-29</time><br/><p class="list-group-item-text abstract"><p>Re-developing an opam cache from scratch, as a MirageOS unikernel</p>
</p></a><a href="/Posts/Monitoring" class="list-group-item"><h2 class="list-group-item-heading">All your metrics belong to influx</h2><span class="author">Written by hannes</span> <time>2022-03-08</time><br/><p class="list-group-item-text abstract"><p>How to monitor your MirageOS unikernel with albatross and monitoring-experiments</p>

View file

@ -1,5 +1,5 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
</p></a><a href="/Posts/Traceroute" class="list-group-item"><h2 class="list-group-item-heading">Traceroute</h2><span class="author">Written by hannes</span> <time>2020-06-24</time><br/><p class="list-group-item-text abstract"><p>A MirageOS unikernel which traces the path between itself and a remote host.</p>
</p></a><a href="/Posts/DnsServer" class="list-group-item"><h2 class="list-group-item-heading">Deploying authoritative OCaml-DNS servers as MirageOS unikernels</h2><span class="author">Written by hannes</span> <time>2019-12-23</time><br/><p class="list-group-item-text abstract"><p>A tutorial on how to deploy authoritative name servers, use Let's Encrypt, and update entries from Unix services.</p>
</p></a><a href="/Posts/DNS" class="list-group-item"><h2 class="list-group-item-heading">My 2018 contains robur and starts with re-engineering DNS</h2><span class="author">Written by hannes</span> <time>2018-01-11</time><br/><p class="list-group-item-text abstract"><p>New year brings new possibilities and a new environment. I've been working on the most widely deployed key-value store, the domain name system. Primary and secondary name services are available, including dynamic updates, notify, and tsig authentication.</p>


@ -1,3 +1,3 @@
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Re-developing TCP from the grounds up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>full stack engineer</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="full stack engineer" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="flex-container"><div class="list-group listing"><a href="/Posts/TCP-ns" class="list-group-item"><h2 class="list-group-item-heading">Redeveloping TCP from the ground up</h2><span class="author">Written by hannes</span> <time>2023-11-28</time><br/><p class="list-group-item-text abstract"><p>Core Internet protocols require operational experiments, even if formally specified</p>
</p></a></div></div></div></main></body></html>