From c24756463813c87f8705c03cd9f93a34597106b2 Mon Sep 17 00:00:00 2001 From: The Robur Team Date: Wed, 18 Dec 2024 11:32:31 +0000 Subject: [PATCH] Pushed by YOCaml 2 from ec0dec16ef37517b8e979c093d7f2edeeed07482-dirty --- articles/2024-10-29-ptt.html | 206 ++++++++++++++++++++++++ articles/2024-12-04-github-sponsor.html | 62 +++++++ articles/gptar-update.html | 110 +++++++++++++ articles/miragevpn-testing.html | 67 ++++++++ 4 files changed, 445 insertions(+) create mode 100644 articles/2024-10-29-ptt.html create mode 100644 articles/2024-12-04-github-sponsor.html create mode 100644 articles/gptar-update.html create mode 100644 articles/miragevpn-testing.html diff --git a/articles/2024-10-29-ptt.html b/articles/2024-10-29-ptt.html new file mode 100644 index 0000000..0c525a6 --- /dev/null +++ b/articles/2024-10-29-ptt.html @@ -0,0 +1,206 @@ + + + + + + + + Robur's blog - Postes, télégraphes et téléphones, next steps + + + + + + + + +
+

blog.robur.coop

+
+ The Robur cooperative blog. +
+
+
Back to index + +
+

Postes, télégraphes et téléphones, next steps

+

As you know from our article on Robur's +finances, we've just received +funding for our email project. This project +started when I was doing my internship in Cambridge and it's great to see that +it's been able to evolve over time and remain functional. This article will +introduce you to the latest changes to our PTT +project and how far we've got towards providing +an OCaml mailing list service.

+

A Git repository or a simple block device as a database?

+

One issue that came up quickly in our latest experiments with our SMTP stack was +the database of users with an email address. Since we had decided to ‘break +down’ the various stages of an email submission to offer simple unikernels, we +ended up having to deploy 4 unikernels to have a service that worked.

+
  • a unikernel for authentication
  • a unikernel DKIM-signing the incoming email
  • one unikernel as primary DNS server
  • one unikernel sending the signed email to its real destination
+

And this only covers the submission of an email; reception +goes through another ‘pipe’.

+

The problem with such an architecture is that some unikernels need to have the +same data: the users. In this case, the first unikernel needs to know the user's +password in order to verify authentication. The final unikernel needs to know +the real destinations of the users.

+

Let's take the example of two users: foo@robur.coop and bar@robur.coop. The +first points to hannes@foo.org and the second to reynir@example.com.

+

If Hannes wants to send a message to bar@robur.coop under the identity of +foo@robur.coop, he will need to authenticate himself to our first unikernel. +This first unikernel must therefore:

+
  1. check that the user foo exists
  2. check that the hash of the password supplied by Hannes matches the one in the database
+

Next, the email will be signed by our second unikernel. It will then forward the +email to the last unikernel, which will do the actual translation of the +recipients and DNS resolution. In other words:

+
  1. it will see that one (the only) recipient is bar@robur.coop
  2. check that bar@robur.coop exists and obtain its real address
  3. it will obtain reynir@example.com and perform DNS resolution on +example.com to find out the email server for this domain
  4. finally send the email signed by foo@robur.coop to reynir@example.com!
+

So the first and last unikernels need to have the same information about our +users. The first needs the passwords, the last needs the real email addresses.
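To make these two lookups concrete, here is a minimal sketch in Python (not the OCaml code our unikernels run). The table layout, the field names and the hashing scheme (a plain SHA-256 of the password) are assumptions made for illustration only; the point is simply that both functions read the same user table.

```python
import hashlib
import hmac

# Hypothetical shared user table. Field names and the SHA-256 scheme are
# assumptions for this sketch, not the format used by the real unikernels.
USERS = {
    "foo": {
        "password_hash": hashlib.sha256(b"hunter2").hexdigest(),
        "destination": "hannes@foo.org",
    },
    "bar": {
        "password_hash": hashlib.sha256(b"s3cret").hexdigest(),
        "destination": "reynir@example.com",
    },
}


def authenticate(local_part, password):
    """Submission side: the user must exist and the password hash must match."""
    user = USERS.get(local_part)
    if user is None:
        return False
    candidate = hashlib.sha256(password.encode()).hexdigest()
    # Constant-time comparison of the two hex digests.
    return hmac.compare_digest(candidate, user["password_hash"])


def resolve(recipient, domain="robur.coop"):
    """Delivery side: map a local address to its real destination, or None."""
    local_part, _, rcpt_domain = recipient.partition("@")
    if rcpt_domain != domain:
        return None
    user = USERS.get(local_part)
    return None if user is None else user["destination"]
```

With this table, `authenticate("foo", "hunter2")` succeeds and `resolve("bar@robur.coop")` yields `reynir@example.com`, while an unknown user fails both lookups.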

+

But as you know, we're talking about unikernels that exist independently of each +other. What's more, they can't share files and the possibility of them sharing +block-devices remains an open question (and a complex one where parallel access +may be involved). In short, the only way to ‘synchronise’ these unikernels in +relation to common data is with a Git repository.

+

Git has the advantage of being widely used for our unikernels +(primary-git, pasteur, unipi and +contruno). It lets you track changes, modify +files and notify the unikernel to update itself (using nsupdate, a simple ping +or an HTTP request to the unikernel).

+

The problem is that this requires certain skills. Even if it's ‘simple’ to set +up a Git server and then deploy our unikernels, we can restructure our +architecture and simplify the deployment of an SMTP stack!

+

Elit and OneFFS

+

We have therefore decided to merge the email exchange service and email +submission into a single unikernel, so that it is the only one that needs the user information.

+

So we decided to use OneFFS as the file system for our database, +which will be a plain JSON file. This is perhaps one of the advantages of +MirageOS: you can decide exactly what you need in order to meet +a specific objective.

+

In this case, those with experience of Postfix, LDAP or MariaDB could confirm +that configuring an email service should be ‘simpler’ than implementing a +multitude of pipes between different applications and authentication methods.

+

The JSON file is therefore very simple and so is the creation of a OneFFS +image:

+
$ cat >database.json<<EOF
+> [ { "name": "din"
+>   , "password": "xxxxxx"
+>   , "mailboxes": [ "romain.calascibetta@gmail.com" ] } ]
+> EOF
+$ opam install oneffs
+$ oneffs create -i database.json -o database.img
+
+

All you have to do is register this image as a block-device with albatross and launch +our Elit unikernel with this block-device.

+
$ albatross-client create-block --data=database.img database 1024
+$ albatross-client create --net=service:br0 --block=database:database \
+    elit elit.hvt \
+    --arg=...
+
+

At this stage, and if we add our unikernel signing incoming emails, we have more +or less the same thing as what I've described in my previous articles on +deploying an email service.

+

Multiplex receiving & sending emails

+

The PTT project is a toolkit for implementing SMTP servers. It gives developers +the choice of implementing their logic as they see fit:

+
  • sign an email
  • resolve destinations according to a database
  • check SPF information
  • annotate the email as spam or not
  • etc.
+

Previously, PTT was split into 2 parts:

+
  1. management of incoming clients/emails
  2. the logic to be applied to incoming emails and their delivery
+

The second point was becoming increasingly complex, however, and errors in +sending emails are legion (DMARC non-alignment, the email is too big for the +destination, the destination doesn't exist, etc.). All the more so since, up to +now, PTT could only report these errors via the logs...

+

Hannes immediately mentioned the possibility of separating the logic of the +unikernel from the delivery. This will allow us to deal with temporary failures +(greylisting) as well. So a fundamental change was made:

+
  • improve the sendmail and sendmail-lwt packages (as well as proposing +sendmail-miou!) when sending or submitting an email
  • improve PTT so that there are now 3 distinct jobs: receiving emails, deciding what to do with +incoming emails, and sending emails
+

SMTP

+

This finally allows us to describe a clearer error management policy that is +independent of what we want to do with incoming emails. At this stage, we can +look for the Return-Path in emails that we haven't managed to send and notify +the senders!
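As an illustration of what such a policy could look like, here is a small Python sketch. It relies only on the SMTP convention that 2yz replies mean success, 4yz replies are temporary failures (which is what greylisting typically produces) and 5yz replies are permanent failures. The `Outcome` type, the `next_step` function and the retry budget are hypothetical names and values chosen for this example; they are not part of PTT.

```python
from typing import NamedTuple, Optional


class Outcome(NamedTuple):
    action: str            # "done", "retry" or "bounce"
    notify: Optional[str]  # address to notify about the failure, if any


MAX_ATTEMPTS = 5  # hypothetical retry budget, not a value taken from PTT


def next_step(reply_code, attempts, return_path):
    """Decide what the sending job does after one delivery attempt."""
    if 200 <= reply_code < 300:
        return Outcome("done", None)
    if 400 <= reply_code < 500 and attempts < MAX_ATTEMPTS:
        # Temporary failure (e.g. greylisting): keep the email and try later.
        return Outcome("retry", None)
    # Permanent failure, unexpected reply, or retry budget exhausted:
    # notify the Return-Path, unless it is empty (no one to notify).
    return Outcome("bounce", return_path or None)
```

For example, `next_step(451, 1, "foo@gmail.com")` asks for a retry, whereas `next_step(550, 1, "foo@gmail.com")` produces a bounce addressed to `foo@gmail.com`.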

+

All this is still in the experimental stage and practical cases are needed to +observe how we should handle errors and how others do.

+

Insights & Next goals

+

We're already starting to have a bit of fun with email and we can start sending +and receiving emails right away.

+

We're also already seeing hacking attempts on our unikernel:

+
  • people trying to authenticate themselves without STARTTLS (or with it, +depending on how clever the bot is)
  • people trying to send emails as non-existent users in our database
  • we're also seeing content that has nothing to do with SMTP
+

Above all, this shows that, very early on, bots try to usurp the identity linked +to your server (in our case, osau.re) in order to send spam, authenticate +themselves or simply send ‘stuff’ and observe what happens. In +all the cases mentioned, Elit (and PTT) reacts well: in other words, it simply +cuts off the connection.

+

We were also able to observe how services such as Gmail work. In particular, for +the purposes of a mailing list, email forwarding breaks DMARC verification +(specifically, SPF verification). The case is very simple:

+

foo@gmail.com tries to reply to robur@osau.re. robur@osau.re is a mailing list +that forwards to several addresses (one of them is bar@gmail.com). The unikernel will receive +the email and send it to bar@gmail.com. The problem is the alignment between +the From field (which corresponds to foo@gmail.com) and our osau.re server. +From gmail.com's point of view, there is a misalignment between these two +pieces of information and it therefore refuses to receive the email.
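The misalignment can be shown with a toy model in Python. It is heavily simplified and every name in it is an assumption: real SPF matches the sending IP address against a policy published in DNS, real DMARC can also pass through an aligned DKIM signature, and its relaxed mode compares organizational domains rather than exact ones. The host names below are made up.

```python
# Toy SPF policies: which sending hosts each domain authorises.
# Real SPF works on IP addresses and DNS records; these names are invented.
SPF_POLICY = {
    "gmail.com": {"mail.gmail.com"},
    "osau.re": {"mail.osau.re"},
}


def domain_of(address):
    """Return the lowercased domain part of an email address."""
    return address.rpartition("@")[2].lower()


def spf_pass(envelope_from, sending_host):
    """SPF: is the sending host authorised by the envelope sender's domain?"""
    return sending_host in SPF_POLICY.get(domain_of(envelope_from), set())


def dmarc_spf_ok(header_from, envelope_from, sending_host):
    """DMARC via SPF: SPF must pass AND its domain must match the From field."""
    aligned = domain_of(header_from) == domain_of(envelope_from)
    return spf_pass(envelope_from, sending_host) and aligned
```

In this model, an email sent directly by `mail.gmail.com` passes. Once our server forwards it, the check fails either way: if the envelope sender stays `foo@gmail.com`, SPF fails because `mail.osau.re` is not authorised for gmail.com; if the envelope sender is rewritten to `robur@osau.re`, SPF passes but no longer aligns with the From field.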

+

This is where our next objectives come in:

+
  • finish our DMARC implementation
  • implement ARC so that our server can attest that, on our side, the DMARC +check went well and that gmail.com should trust us on this.
+

There is another way of solving the problem, perhaps a little more problematic: +modifying the incoming email, in particular the From field. Although this +could be done quite simply with mrmime, it's better to concentrate on +DMARC and ARC so that we can send our emails as they are and never alter them +(especially as altering them would invalidate previous DKIM signatures!).

+

Conclusion

+

It's always satisfying to see your projects working ‘more or less’ correctly. +This article will surely be the start of a series on the intricacies of email +and the difficulty of deploying such a service at home.

+

We hope that this NLnet-funded work will enable us to replace our current email +system with unikernels. We're already past the stage where we can, more or less +(without DMARC checking), send emails to each other, which is a big step!

+

So follow our work on our blog and if you like what we're producing (which +involves a whole bunch of protocols and formats - much more than just SMTP), you +can make a donation here!

+ +
+ +
+ + + + diff --git a/articles/2024-12-04-github-sponsor.html b/articles/2024-12-04-github-sponsor.html new file mode 100644 index 0000000..dc1baac --- /dev/null +++ b/articles/2024-12-04-github-sponsor.html @@ -0,0 +1,62 @@ + + + + + + + + Robur's blog - Sponsor us via GitHub + + + + + + + + +
+

+

Sponsor us via GitHub

+

We're delighted to announce the possibility of helping our cooperative through +the GitHub Sponsors platform. The link is available here:

+

https://github.com/sponsors/robur-coop

+

We would also like to reiterate the possibility of making a donation[1] to our +cooperative via the IBAN of Änderwerk available here (if you need +a tax-deductible donation receipt, please use this form).

+
Account holder: Änderwerk gGmbH
+Subject: robur
+IBAN: DE46 4306 0967 1289 8604 00
+BIC: GENODEM1GLS
+Bank: GLS Gemeinschaftsbank, Christstrasse 9, 44789 Bochum, Germany
+
+

More generally, you can refer to our article which explains our +funding since the creation of Robur and we would like to point out that, +despite our funding, part of our work remains unfunded: in particular with +regard to the maintenance of certain software as well as certain services made +available to our users.

+

We would therefore be delighted if users of our software and services could +finance our work according to their means. GitHub in particular offers an +easy-to-use platform for funding us (even if, in all transparency, it takes a +certain amount from each transaction).

+
  1. In fact, this method is preferable to us, as the donation then goes directly to us instead of through GitHub and Stripe, who take a small cut of it in fees.
+ +
+ +
+ + + + diff --git a/articles/gptar-update.html b/articles/gptar-update.html new file mode 100644 index 0000000..8bde6d8 --- /dev/null +++ b/articles/gptar-update.html @@ -0,0 +1,110 @@ + + + + + + + + Robur's blog - GPTar (update) + + + + + + + + +
+

+

GPTar (update)

+

In a previous post I described how I craft a hybrid GUID partition table (GPT) and tar archive by exploiting the fact that the areas of a 512-byte block that matter to tar headers and to the protective master boot records used in GPT are disjoint. +I recommend reading it first for context if you haven't already.

+

After writing the above post I read an excellent and fun and totally normal article by Emily on how she created executable tar archives. +Therein I learned a clever hack: +GNU tar has a tar extension for volume headers. +These are essentially labels for your tape archives when you're forced to split an archive across multiple tapes. +They can (seemingly) hold any text as label including shell scripts. +What's more, GNU tar and bsdtar do not extract these as files! +This is excellent, because I don't actually want to extract or list the GPT header when using GNU tar or bsdtar. +This prompted me to use a different link indicator.

+

This worked pretty great. +Listing the archive using GNU tar I still get GPTAR, but with verbose listing it's displayed as a --Volume Header--:

+
$ tar -tvf disk.img
+Vr-------- 0/0           16896 1970-01-01 01:00 GPTAR--Volume Header--
+-rw-r--r-- 0/0              14 1970-01-01 01:00 test.txt
+
+

And more importantly the GPTAR entry is ignored when extracting:

+
$ mkdir tmp
+$ cd tmp/
+$ tar -xf ../disk.img
+$ ls
+test.txt
+
+

BSD tar / libarchive

+

Unfortunately, this broke bsdtar!

+
$ bsdtar -tf disk.img
+bsdtar: Damaged tar archive
+bsdtar: Error exit delayed from previous errors.
+
+

This is annoying because we run FreeBSD on the host for opam.robur.coop, our instance of opam-mirror. +This autumn we updated opam-mirror to use the hybrid GPT+tar GPTar tartition table[1] instead of hard-coded or boot-parameter-specified disk offsets for the different partitions - which was extremely brittle! +So we were no longer able to inspect the contents of the tar partition from the host! +Unacceptable! +So I started to dig into libarchive, where bsdtar comes from. +To my surprise, after building bsdtar from the git clone of the source code it ran perfectly fine!

+
$ ./bsdtar -tf ../gptar/disk.img
+test.txt
+
+

I eventually figured out that this change fixed it for me. +I got in touch with Emily to let her know that bsdtar recently fixed this (ab)use of GNU volume headers. +Her reply was basically "as of when I wrote the article, I was pretty sure bsdtar ignored it." +And indeed it did. +Examining the diff further revealed that it ignored the GNU volume header - just not "correctly" when the GNU volume header was abused to carry file content as I did:

+
 /*
+  * Interpret 'V' GNU tar volume header.
+  */
+ static int
+ header_volume(struct archive_read *a, struct tar *tar,
+     struct archive_entry *entry, const void *h, size_t *unconsumed)
+ {
+-       (void)h;
++       const struct archive_entry_header_ustar *header;
++       int64_t size, to_consume;
++
++       (void)a; /* UNUSED */
++       (void)tar; /* UNUSED */
++       (void)entry; /* UNUSED */
+
+-       /* Just skip this and read the next header. */
+-       return (tar_read_header(a, tar, entry, unconsumed));
++       header = (const struct archive_entry_header_ustar *)h;
++       size = tar_atol(header->size, sizeof(header->size));
++       to_consume = ((size + 511) & ~511);
++       *unconsumed += to_consume;
++       return (ARCHIVE_OK);
+ }
+
+
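The effect of the patched function can be imitated with a short Python sketch. It is not libarchive's code: it only walks 512-byte headers using the standard tar field offsets (name at byte 0, octal size at byte 124, type flag at byte 156), rounds each size up to a whole number of blocks with the same `(size + 511) & ~511` expression, and skips entries whose type flag is `V`. The helper names are hypothetical, and checksums and every other tar extension are ignored.

```python
BLOCK = 512


def padded(size):
    """Round size up to the next multiple of 512, as the patch does."""
    return (size + 511) & ~511


def make_header(name, size, typeflag):
    """Build a bare-bones tar header for this sketch (no checksum)."""
    header = bytearray(BLOCK)
    header[0:len(name)] = name                               # name field
    header[124:136] = format(size, "011o").encode() + b"\0"  # octal size field
    header[156:157] = typeflag                               # type flag
    return bytes(header)


def list_files(image):
    """List file names, skipping 'V' volume headers together with their payload."""
    names = []
    offset = 0
    while offset + BLOCK <= len(image):
        header = image[offset:offset + BLOCK]
        if header == bytes(BLOCK):      # an all-zero block ends the archive
            break
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)
        offset += BLOCK + padded(size)  # consume the header and its padded payload
        if header[156:157] == b"V":
            continue                    # volume header: not a file
        names.append(header[0:100].split(b"\0", 1)[0].decode())
    return names


# A volume header carrying 16896 bytes of payload, followed by one small file.
image = (
    make_header(b"GPTAR", 16896, b"V") + b"\x01" * 16896
    + make_header(b"test.txt", 14, b"0") + b"Hello, world!\n".ljust(BLOCK, b"\0")
    + bytes(2 * BLOCK)
)
print(list_files(image))  # prints ['test.txt']
```

A reader that treats the volume header as having no payload would instead try to parse the first payload block as the next header, which is where the "Damaged tar archive" style of failure comes from.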

So thanks to the above change we can expect a release of libarchive supporting further flavors of abuse of GNU volume headers! +🥳

+
  1. Emily came up with the much better term "tartition table" than what I had come up with - "GPTar".
+ +
+ +
+ + + + diff --git a/articles/miragevpn-testing.html b/articles/miragevpn-testing.html new file mode 100644 index 0000000..a63961a --- /dev/null +++ b/articles/miragevpn-testing.html @@ -0,0 +1,67 @@ + + + + + + + + Robur's blog - Testing MirageVPN against OpenVPN™ + + + + + + + + +
+

+

Testing MirageVPN against OpenVPN™

+

As our last milestone for the EU NGI Assure funded MirageVPN project (for now) we have been working on testing MirageVPN, our OpenVPN™-compatible VPN implementation, against the upstream OpenVPN™. +During the development we have conducted many manual tests. +However, this scales poorly and it is easy to forget testing certain cases. +Therefore, we designed and implemented interoperability testing, driving the C implementation on the one side, and our OCaml implementation on the other side. The input for such a test is a configuration file that both implementations can use. +Thus we test establishment of the tunnel as well as the tunnel itself.

+

While conducting the tests, our instrumented binaries expose code coverage information. We use that to decide which other configurations are worth testing. Our goal is to achieve a high code coverage rate while using a small number of different configurations. These interoperability tests run fast enough to be executed on each commit by CI.

+

A nice property of this test setup is that it runs with an unmodified OpenVPN binary. +This means we can use an off-the-shelf OpenVPN binary from the package repository and do not have to maintain an OpenVPN fork. +Testing against a future version of OpenVPN becomes trivial. +We do not just test a single part of our implementation but achieve an end-to-end test. +The same configuration files are used for both our implementation and the C implementation, and each configuration is used twice: once with our implementation acting as the client, once as the server.
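The resulting test matrix is small enough to write down. The following Python sketch only enumerates the runs; it does not start any binary, and the function name, the labels and the configuration file names are hypothetical.

```python
def interop_runs(configs):
    """Each configuration is used twice, with the client and server roles swapped."""
    runs = []
    for config in configs:
        runs.append({"config": config, "client": "miragevpn", "server": "openvpn"})
        runs.append({"config": config, "client": "openvpn", "server": "miragevpn"})
    return runs


# Hypothetical configuration file names, for illustration only.
configs = ["a.conf", "b.conf", "c.conf", "d.conf"]
for run in interop_runs(configs):
    print(run["config"], run["client"], "->", run["server"])
```

Four configurations therefore give eight end-to-end runs, and every run involves both implementations.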

+

We added a flag to our client and our recently finished server applications, --test, which makes them exit once a tunnel is established and an ICMP echo request from the client has been replied to by the server. +Our client and server can be run without a tun device which otherwise would require elevated privileges. +Unfortunately, OpenVPN requires privileges to at least configure a tun device. +Our MirageVPN implementation does IP packet parsing in userspace. +We test our protocol implementation, not the entire unikernel - but the unikernel code is a tiny layer on top of the purely functional protocol implementation.

+

We explored unit testing the packet decoding and decryption with our implementation and the C implementation. +Specifically, we encountered a packet whose message authentication code (MAC) was deemed invalid by the C implementation. +It helped us discover that the MAC computation was correct but the packet encoding was truncated - both implementations agreed that the MAC was bad. +The test was very tedious to write and would not easily scale to cover a large portion of the code. +If you are interested, take a look at our modifications to OpenVPN and modifications to MirageVPN.

+

The end-to-end testing complements our unit tests, our fuzz testing and our benchmarking binary.

+

Our results are that with 4 configurations we achieve above 75% code coverage in MirageVPN. +While investigating the code coverage results, we found various pieces of code that were never executed, and we were able to remove them. +Code that does not exist is bug-free :D +With these tests in place future maintenance is less daunting as they will help guard us against breaking the code.

+

At the moment we do not exercise the error paths very well in the code. +This is much less straightforward to test in this manner, and is important future work. +We plan to develop a client and server that inject faults at various stages of the protocol to test these error paths. +OpenVPN built with debugging enabled also comes with a --gremlin mode that injects faults, and would be interesting to investigate.

+ +
+ +
+ + + +