<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml"><head><title>Counting Bytes</title><meta charset="UTF-8"/><link rel="stylesheet" href="/static/css/style.css"/><link rel="stylesheet" href="/static/css/highlight.css"/><script src="/static/js/highlight.pack.js"></script><script>hljs.initHighlightingOnLoad();</script><link rel="alternate" href="/atom" title="Counting Bytes" type="application/atom+xml"/><meta name="viewport" content="width=device-width, initial-scale=1, viewport-fit=cover"/></head><body><nav class="navbar navbar-default navbar-fixed-top"><div class="container"><div class="navbar-header"><a class="navbar-brand" href="/Posts">full stack engineer</a></div><div class="collapse navbar-collapse collapse"><ul class="nav navbar-nav navbar-right"><li><a href="/About"><span>About</span></a></li><li><a href="/Posts"><span>Posts</span></a></li></ul></div></div></nav><main><div class="flex-container"><div class="post"><h2>Counting Bytes</h2><span class="author">Written by hannes</span><br/><div class="tags">Classified under: <a href="/tags/mirageos" class="tag">mirageos</a><a href="/tags/background" class="tag">background</a></div><span class="date">Published: 2016-06-11 (last updated: 2021-11-19)</span><article><p>I was busy writing code, text, talks, and also spent a week without Internet, where I ground and brewed 15kg of espresso.</p>
<h2 id="size-of-a-mirageos-unikernel">Size of a MirageOS unikernel</h2>
<p>There have been lots of claims and myths around the concrete size of MirageOS unikernels.  In this article I'll take some measurements which overapproximate the binary sizes.  The tools used for the visualisations are available online, and will hopefully soon be upstreamed into the mirage tool.  This article uses mirage-2.9.0 (which might be outdated at the time of reading).</p>
<p>Let us start with a very minimal unikernel, consisting of a <code>unikernel.ml</code>:</p>
<pre><code class="language-OCaml">module Main (C: V1_LWT.CONSOLE) = struct
  let start c = C.log_s c &quot;hello world&quot;
end
</code></pre>
<p>and the following <code>config.ml</code>:</p>
<pre><code class="language-OCaml">open Mirage

let () =
  register &quot;console&quot; [
    foreign &quot;Unikernel.Main&quot; (console @-&gt; job) $ default_console
  ]
</code></pre>
<p>If we <code>mirage configure --unix</code> and <code>mirage build</code>, we end up (at least on a 64bit FreeBSD-11 system with OCaml 4.02.3) with a 2.8MB <code>main.native</code>, dynamically linked against <code>libthr</code>, <code>libm</code> and <code>libc</code> (<code>ldd</code> ftw), or a 4.5MB Xen virtual image (built on a 64bit Linux computer).</p>
<p>In the <code>_build</code> directory, we can find some object files and their byte sizes:</p>
<pre><code class="language-bash"> 7144 key_gen.o
14568 main.o
 3552 unikernel.o
</code></pre>
<p>These do not sum up to 2.8MB ;)</p>
<p>We did not specify any dependencies ourselves, thus all bits have been injected automatically by the <code>mirage</code> tool.  Let us dig a bit deeper into what we actually used.  <code>mirage configure</code> generates a <code>Makefile</code> which lists the OCaml libraries we depend on, and the packages which are used:</p>
<pre><code class="language-Makefile">LIBS   = -pkgs functoria.runtime, mirage-clock-unix, mirage-console.unix, mirage-logs, mirage-types.lwt, mirage-unix, mirage.runtime
PKGS   = functoria lwt mirage-clock-unix mirage-console mirage-logs mirage-types mirage-types-lwt mirage-unix
</code></pre>
<p>I explained bits of our configuration DSL <a href="/Posts/Functoria">Functoria</a> earlier.  The <a href="https://github.com/mirage/mirage-clock">mirage-clock</a> device is automatically injected by mirage, providing an implementation of the <code>CLOCK</code> device.  We use a <a href="https://github.com/mirage/mirage-console">mirage-console</a> device, where we print the <code>hello world</code>.  Since <code>mirage-2.9.0</code> the logging library (and its reporter, <a href="https://github.com/mirage/mirage-logs">mirage-logs</a>) is automatically injected as well, which actually uses the clock.  Also, the <a href="https://github.com/mirage/mirage/tree/master/types">mirage type signatures</a> are required.  The <a href="https://github.com/mirage/mirage-platform/tree/master/unix">mirage-unix</a> contains a <code>sleep</code>, a <code>main</code>, and provides the argument vector <code>argv</code> (all symbols in the <code>OS</code> module).</p>
<p>Looking into the archive files of those libraries, we end up with ~92KB (NB <code>mirage-types</code> only contains types, and thus no runtime data):</p>
<pre><code class="language-bash">15268 functoria/functoria-runtime.a
 3194 mirage-clock-unix/mirage-clock.a
12514 mirage-console/mirage_console_unix.a
24532 mirage-logs/mirage_logs.a
14244 mirage-unix/OS.a
21964 mirage/mirage-runtime.a
</code></pre>
<p>This still does not sum up to 2.8MB since we're missing the transitive dependencies.</p>
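<p>To make the gap concrete, here is a minimal OCaml sketch that adds up the numbers from the two listings above.  The byte sizes are copied from the listings; the names <code>objects</code>, <code>archives</code> and <code>total</code> are only chosen for this illustration.</p>
<pre><code class="language-OCaml">(* Byte sizes copied from the two listings above. *)
let objects = [ &quot;key_gen.o&quot;, 7144; &quot;main.o&quot;, 14568; &quot;unikernel.o&quot;, 3552 ]

let archives = [
  &quot;functoria/functoria-runtime.a&quot;, 15268;
  &quot;mirage-clock-unix/mirage-clock.a&quot;, 3194;
  &quot;mirage-console/mirage_console_unix.a&quot;, 12514;
  &quot;mirage-logs/mirage_logs.a&quot;, 24532;
  &quot;mirage-unix/OS.a&quot;, 14244;
  &quot;mirage/mirage-runtime.a&quot;, 21964;
]

(* Sum the byte sizes of a list of (name, size) pairs. *)
let total files = List.fold_left (fun acc (_, size) -&gt; acc + size) 0 files

let () =
  Printf.printf &quot;objects:  %d bytes\n&quot; (total objects);
  Printf.printf &quot;archives: %d bytes\n&quot; (total archives);
  Printf.printf &quot;together: %d bytes\n&quot; (total objects + total archives)
</code></pre>
<p>Objects and archives together account for 116980 bytes, roughly 117KB, which is only a small fraction of the 2.8MB <code>main.native</code>.</p>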
<h3 id="visualising-recursive-dependencies">Visualising recursive dependencies</h3>
<p>Let's use a different approach: first recursively find all dependencies.  We do this by using <code>ocamlfind</code> to read <code>META</code> files which contain a list of dependent libraries in their <code>requires</code> line.  As input we use <code>LIBS</code> from the Makefile snippet above.  The code (OCaml script) is <a href="https://gist.github.com/hannesm/bcbe54c5759ed5854f05c8f8eaee4c79">available here</a>.  The colour scheme is red for pieces of the OCaml distribution, yellow for input packages, and orange for the dependencies.</p>
<p><a href="/static/img/mirage-console.svg"><img src="/static/img/mirage-console.svg" title="UNIX dependencies of hello world" width="700" /></a></p>
<p>This is the UNIX version only; the Xen version looks similar, but is worth showing as well.</p>
<p><a href="/static/img/mirage-console-xen.svg"><img src="/static/img/mirage-console-xen.svg" title="Xen dependencies of hello world" width="700" /></a></p>
<p>You can spot at the right that <code>mirage-bootvar</code> uses <code>re</code>, which provoked me to <a href="https://github.com/mirage/mirage-bootvar-xen/pull/19">open a PR</a>, but Jon Ludlam <a href="https://github.com/mirage/mirage-bootvar-xen/pull/18">already had a nicer PR</a> which is now merged (and a <a href="https://github.com/mirage/mirage-bootvar-xen/pull/20">new release is in preparation</a>).</p>
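<p>The recursive lookup described at the start of this section boils down to a transitive closure over the <code>requires</code> relation.  The following is a minimal sketch of that idea, not the linked script: the <code>requires</code> function is a hypothetical in-memory table with made-up library names, whereas the real script asks <code>ocamlfind</code> for the contents of the <code>META</code> files.</p>
<pre><code class="language-OCaml">module S = Set.Make (String)

(* Hypothetical dependency table with made-up names; a real tool would
   read the &quot;requires&quot; line of each library's META file instead. *)
let requires = function
  | &quot;input-a&quot; -&gt; [ &quot;dep-x&quot;; &quot;dep-y&quot; ]
  | &quot;input-b&quot; -&gt; [ &quot;dep-y&quot; ]
  | &quot;dep-x&quot; -&gt; [ &quot;dep-z&quot; ]
  | &quot;dep-y&quot; -&gt; [ &quot;dep-z&quot; ]
  | _ -&gt; []

(* Walk the work list, adding every library not seen before and queueing
   its direct dependencies.  The [seen] set also guards against cycles. *)
let rec closure requires seen = function
  | [] -&gt; seen
  | lib :: rest -&gt;
    if S.mem lib seen then closure requires seen rest
    else closure requires (S.add lib seen) (requires lib @ rest)

let () =
  closure requires S.empty [ &quot;input-a&quot;; &quot;input-b&quot; ]
  |&gt; S.iter print_endline
</code></pre>
<p>Starting from the two input libraries, the result contains both inputs plus <code>dep-x</code>, <code>dep-y</code> and <code>dep-z</code>, each exactly once even though <code>dep-z</code> is reachable via two paths.</p>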
<h3 id="counting-bytes">Counting bytes</h3>
<p>While a dependency graph gives the big picture of which libraries a MirageOS unikernel is composed of, we also want to know how many bytes each of them contributes to the unikernel.  The dependency graph only contains the OCaml-level dependencies, but MirageOS has in addition to that a <code>pkg-config</code> universe of the libraries written in C (such as mini-os, openlibm, ...).</p>
<p>We overapproximate the sizes here by assuming that a linker simply concatenates all required object files.  This is not what actually happens: empirically, the sum of all object sizes is about twice the actual size of the unikernel.</p>
<p>I developed a pie chart visualisation, but a friend of mine reminded me that such a chart is pretty useless, since the human brain is bad at comparing slices.  I spent some more time developing a treemap visualisation to satisfy the brain.  The implemented algorithm is based on <a href="http://www.win.tue.nl/~vanwijk/stm.pdf">squarified treemaps</a>, but does not use implicit mutable state.  In addition, the <a href="https://gist.github.com/hannesm/c8a9b2e75bb4f98b5100a838ea125f3b">provided script</a> parses common linker flags (<code>-o -L -l</code>) and collects the arguments to be linked in.  It can be passed to <code>ocamlopt</code> as the C linker; more instructions are at the end of <code>treemap.ml</code> (which should be cleaned up and integrated into the mirage tool, as mentioned earlier).</p>
<p><a href="/static/img/mirage-console-bytes.svg"><img src="/static/img/mirage-console-bytes.svg" title="byte sizes of hello-world (UNIX)" width="700" /></a></p>
<p><a href="/static/img/mirage-console-xen-bytes-full.svg"><img src="/static/img/mirage-console-xen-bytes-full.svg" title="byte sizes of hello-world (Xen)" width="700" /></a></p>
<p>As mentioned above, this is an overapproximation.  The <code>libgcc.a</code> is only needed on Xen (see <a href="https://github.com/mirage/mirage/commit/c17f2f60a6309322ba45cecb00a808f62f05cf82#commitcomment-17573123">this comment</a>).  I have not yet tracked down why there is both a <code>libasmrun.a</code> and a <code>libxenasmrun.a</code>.</p>
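<p>To give an idea of the linker-flag handling mentioned above, here is a minimal sketch of how the <code>-o</code>, <code>-L</code> and <code>-l</code> arguments could be collected.  This is an illustration under my own assumptions, not the code from <code>treemap.ml</code>: the record type and the function names are hypothetical, all other flags are ignored, and a flag that takes a separate value (other than these three) would have its value wrongly counted as an object file.</p>
<pre><code class="language-OCaml">(* What we want to know about one linker invocation (hypothetical type). *)
type link = {
  output : string option;   (* argument of -o *)
  dirs : string list;       (* arguments of -L, in order *)
  libs : string list;       (* arguments of -l, in order *)
  objects : string list;    (* everything that is not a flag *)
}

let empty = { output = None; dirs = []; libs = []; objects = [] }

(* &quot;-lfoo&quot; becomes Some (&quot;-l&quot;, &quot;foo&quot;); arguments shorter than two
   characters yield None. *)
let split_flag arg =
  match String.sub arg 0 2, String.sub arg 2 (String.length arg - 2) with
  | flag, value -&gt; Some (flag, value)
  | exception Invalid_argument _ -&gt; None

let rec parse acc = function
  | [] -&gt; acc
  | &quot;-o&quot; :: file :: rest -&gt; parse { acc with output = Some file } rest
  | &quot;-L&quot; :: dir :: rest -&gt; parse { acc with dirs = acc.dirs @ [ dir ] } rest
  | &quot;-l&quot; :: lib :: rest -&gt; parse { acc with libs = acc.libs @ [ lib ] } rest
  | arg :: rest -&gt;
    let acc =
      match split_flag arg with
      | Some ((&quot;-L&quot; | &quot;-l&quot; | &quot;-o&quot;), &quot;&quot;) -&gt; acc  (* flag without a value *)
      | Some (&quot;-L&quot;, dir) -&gt; { acc with dirs = acc.dirs @ [ dir ] }
      | Some (&quot;-l&quot;, lib) -&gt; { acc with libs = acc.libs @ [ lib ] }
      | Some (&quot;-o&quot;, file) -&gt; { acc with output = Some file }
      | Some (flag, _) when flag.[0] = '-' -&gt; acc  (* some other flag *)
      | _ -&gt; { acc with objects = acc.objects @ [ arg ] }
    in
    parse acc rest

(* Paths where the static archive for &quot;-l lib&quot; might live. *)
let archive_candidates dirs lib =
  List.map (fun dir -&gt; Filename.concat dir (&quot;lib&quot; ^ lib ^ &quot;.a&quot;)) dirs
</code></pre>
<p>Calling <code>parse empty</code> on the argument list yields the object files and libraries whose sizes are then summed up, which is exactly the concatenation assumption behind the overapproximation.</p>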
<h3 id="more-complex-examples">More complex examples</h3>
<p>Besides the hello world, I used the same tools on our <a href="http://ownme.ipredator.se">BTC Piñata</a>.</p>
<p><a href="/static/img/pinata-deps.svg"><img src="/static/img/pinata-deps.svg" title="Piñata dependencies" width="700" /></a></p>
<p><a href="/static/img/pinata-bytes.svg"><img src="/static/img/pinata-bytes.svg" title="Piñata byte sizes" width="700" /></a></p>
<h3 id="conclusion">Conclusion</h3>
<p>OCaml does not yet do dead code elimination, but there <a href="https://github.com/ocaml/ocaml/pull/608">is a PR</a> based on the flambda middle-end which does so.  I haven't yet investigated numbers using that branch.</p>
<p>These counting statistics could go into more detail (e.g. using <code>nm</code> to count the sizes of concrete symbols, which opens the possibility to see which symbols are present in the objects, but not in the final binary).  Also, collecting the numbers for each module in a library would be great to have.  In the end, it would be great to easily spot the source fragments which are responsible for a huge binary size (and to get rid of them).</p>
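<p>As a rough sketch of the <code>nm</code> idea (nothing of this exists in the scripts linked above): assuming the output of GNU <code>nm -S</code>, where each sized symbol is printed as address, size, type and name, the sizes can be summed and the symbol names of the objects compared with those of the final binary.  The function names and the sample lines in the usage note are made up, and the code needs a more recent OCaml than 4.02.3 for <code>String.split_on_char</code> and <code>int_of_string_opt</code>.</p>
<pre><code class="language-OCaml">module S = Set.Make (String)

(* Parse one line of the form &quot;address size type name&quot; (sizes are hex).
   Lines with fewer fields, such as undefined symbols, yield None. *)
let parse_line line =
  match List.filter (fun s -&gt; s &lt;&gt; &quot;&quot;) (String.split_on_char ' ' line) with
  | [ _addr; size; kind; name ] -&gt;
    (match int_of_string_opt (&quot;0x&quot; ^ size) with
     | Some bytes -&gt; Some (name, kind, bytes)
     | None -&gt; None)
  | _ -&gt; None

(* Sum of the sizes of all sized symbols. *)
let total lines =
  List.fold_left
    (fun acc line -&gt;
       match parse_line line with
       | Some (_, _, bytes) -&gt; acc + bytes
       | None -&gt; acc)
    0 lines

(* Names of all sized symbols. *)
let names lines =
  List.fold_left
    (fun acc line -&gt;
       match parse_line line with
       | Some (name, _, _) -&gt; S.add name acc
       | None -&gt; acc)
    S.empty lines

(* Symbols present in the objects, but not in the final binary. *)
let dropped ~objects ~binary = S.diff (names objects) (names binary)
</code></pre>
<p>Given the made-up object lines <code>0000000000000000 0000000000000010 T caml_used</code> and <code>0000000000000010 0000000000000020 T caml_unused</code>, and a binary containing only <code>caml_used</code>, <code>total</code> on the object lines is 48 bytes and <code>dropped</code> contains just <code>caml_unused</code>.</p>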
<p>I'm interested in feedback, either via
<a href="https://twitter.com/h4nnes">twitter</a> or via eMail.</p>
</article></div></div></main></body></html>