blog.robur.coop/articles/gptar-update.md

| title | date | description | tags | author |
|---|---|---|---|---|
| GPTar (update) | 2024-10-28 | libarchive vs hybrid GUID partition table and GNU tar volume header | OCaml, gpt, tar, mbr, persistent storage | Reynir Björnsson (reynir@reynir.dk, https://reyn.ir/) |

In a previous post I described how I crafted a hybrid GUID partition table (GPT) and tar archive by exploiting the fact that the parts of a 512-byte block that matter to a tar header and the parts that matter to the protective master boot record (MBR) used in GPT are disjoint. If you haven't read it yet, I recommend reading it first for context.
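The idea can be illustrated with a minimal C sketch (not the author's code). It checks whether one 512-byte block reads both as a tar header, by recomputing the header checksum, and as a protective MBR, by checking the 0x55 0xAA boot signature at offset 510 and the GPT protective partition type 0xEE in the first partition entry. The offsets follow the ustar and MBR layouts; `build_hybrid_block` and the other function names are hypothetical, and only a handful of fields are filled in.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define BLOCK_SIZE 512

/* ustar header: 8-byte checksum field at offset 148. */
#define TAR_CHKSUM_OFF 148
#define TAR_CHKSUM_LEN 8

/* MBR: first partition entry at 446 (type byte at +4), signature at 510. */
#define MBR_PART0_OFF 446
#define MBR_PART_TYPE_OFF 4
#define MBR_SIG_OFF 510
#define GPT_PROTECTIVE_TYPE 0xEE

/* Simple sum of all 512 bytes, counting the checksum field as spaces. */
static unsigned tar_compute_checksum(const uint8_t *block)
{
    unsigned sum = 0;
    for (size_t i = 0; i < BLOCK_SIZE; i++) {
        int in_chksum = i >= TAR_CHKSUM_OFF && i < TAR_CHKSUM_OFF + TAR_CHKSUM_LEN;
        sum += in_chksum ? (unsigned)' ' : block[i];
    }
    return sum;
}

/* Store the checksum as six octal digits followed by NUL and space. */
static void tar_write_checksum(uint8_t *block)
{
    char digits[12];
    unsigned sum = tar_compute_checksum(block);
    assert(sum <= 0777777u); /* 512 * 255 always fits in six octal digits */
    snprintf(digits, sizeof digits, "%06o", sum);
    memcpy(block + TAR_CHKSUM_OFF, digits, 6);
    block[TAR_CHKSUM_OFF + 6] = '\0';
    block[TAR_CHKSUM_OFF + 7] = ' ';
}

/* Parse the stored octal checksum and compare it with the computed one. */
static int tar_checksum_ok(const uint8_t *block)
{
    size_t i = TAR_CHKSUM_OFF, end = TAR_CHKSUM_OFF + TAR_CHKSUM_LEN;
    unsigned stored = 0;
    while (i < end && block[i] == ' ')
        i++;
    if (i == end || block[i] < '0' || block[i] > '7')
        return 0;
    while (i < end && block[i] >= '0' && block[i] <= '7')
        stored = stored * 8 + (unsigned)(block[i++] - '0');
    return stored == tar_compute_checksum(block);
}

/* Protective MBR: boot signature 0x55 0xAA and partition type 0xEE. */
static int is_protective_mbr(const uint8_t *block)
{
    return block[MBR_SIG_OFF] == 0x55 && block[MBR_SIG_OFF + 1] == 0xAA &&
           block[MBR_PART0_OFF + MBR_PART_TYPE_OFF] == GPT_PROTECTIVE_TYPE;
}

/* Hypothetical helper: fill in a few fields of each structure. */
static void build_hybrid_block(uint8_t *block)
{
    memset(block, 0, BLOCK_SIZE);
    memcpy(block, "GPTAR", 5);       /* tar: file name at offset 0 */
    memcpy(block + 257, "ustar", 5); /* tar: magic at offset 257 */
    block[MBR_PART0_OFF + MBR_PART_TYPE_OFF] = GPT_PROTECTIVE_TYPE;
    block[MBR_SIG_OFF] = 0x55;
    block[MBR_SIG_OFF + 1] = 0xAA;
    /* The tar checksum covers the MBR bytes too, so compute it last. */
    tar_write_checksum(block);
}
```

One detail worth noting: the tar checksum is a plain sum over all 512 bytes (with the checksum field counted as spaces), so it also covers the MBR bytes and has to be computed after they are written.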

After writing the previous post I read an excellent and fun and totally normal article by Emily on how she created executable tar archives. Therein I learned a clever hack: GNU tar has an extension for volume headers. These are essentially labels for your tape archives when you're forced to split an archive across multiple tapes. They can (seemingly) hold any text as a label, including shell scripts. What's more, GNU tar and bsdtar do not extract these as files! This is excellent, because I don't actually want to extract or list the GPT header when using GNU tar or bsdtar. This prompted me to use a different link indicator for the GPTAR entry: 'V', the GNU volume header type.
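Switching the link indicator amounts to setting the type flag byte at offset 156 of the tar header to 'V' and recomputing the checksum. The C sketch below builds such a GNU volume header entry. `make_volume_header` and its helpers are hypothetical names, the mode 0400 is chosen to match the verbose listing below, the size 16896 in the usage example is taken from that same listing, and "ustar  " (two spaces) is the magic used by GNU tar's gnu format.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE 512
#define TAR_NAME_OFF 0
#define TAR_NAME_LEN 100
#define TAR_MODE_OFF 100
#define TAR_MODE_LEN 8
#define TAR_SIZE_OFF 124
#define TAR_SIZE_LEN 12
#define TAR_CHKSUM_OFF 148
#define TAR_CHKSUM_LEN 8
#define TAR_TYPEFLAG_OFF 156
#define TAR_MAGIC_OFF 257
#define GNUTYPE_VOLHDR 'V'

/* Write value as zero-padded octal digits filling len - 1 bytes, then NUL. */
static void tar_put_octal(uint8_t *field, size_t len, unsigned long value)
{
    for (size_t i = len - 1; i-- > 0;) {
        field[i] = (uint8_t)('0' + (value & 7));
        value >>= 3;
    }
    assert(value == 0); /* value must fit in the field */
    field[len - 1] = '\0';
}

/* Read an octal field, skipping leading spaces, stopping at the first non-digit. */
static unsigned long tar_get_octal(const uint8_t *field, size_t len)
{
    unsigned long value = 0;
    size_t i = 0;
    while (i < len && field[i] == ' ')
        i++;
    while (i < len && field[i] >= '0' && field[i] <= '7')
        value = value * 8 + (unsigned long)(field[i++] - '0');
    return value;
}

/* Checksum: sum of all bytes with the checksum field set to spaces. */
static void tar_update_checksum(uint8_t *block)
{
    unsigned long sum = 0;
    memset(block + TAR_CHKSUM_OFF, ' ', TAR_CHKSUM_LEN);
    for (size_t i = 0; i < BLOCK_SIZE; i++)
        sum += block[i];
    tar_put_octal(block + TAR_CHKSUM_OFF, 7, sum); /* six digits + NUL */
    block[TAR_CHKSUM_OFF + 7] = ' ';
}

/* Hypothetical: build a GNU volume header whose `size` data bytes follow it. */
static void make_volume_header(uint8_t *block, const char *label, unsigned long size)
{
    size_t n = strlen(label);
    memset(block, 0, BLOCK_SIZE);
    assert(n < TAR_NAME_LEN);
    memcpy(block + TAR_NAME_OFF, label, n);
    tar_put_octal(block + TAR_MODE_OFF, TAR_MODE_LEN, 0400);
    tar_put_octal(block + TAR_SIZE_OFF, TAR_SIZE_LEN, size);
    block[TAR_TYPEFLAG_OFF] = GNUTYPE_VOLHDR;     /* the link indicator */
    memcpy(block + TAR_MAGIC_OFF, "ustar  ", 8);  /* GNU magic + version */
    tar_update_checksum(block);
}
```

For example, `make_volume_header(block, "GPTAR", 16896)` stores the size field as the octal string `00000041000`.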

This worked pretty well. When listing the archive with GNU tar I still get GPTAR, but the verbose listing displays it as a --Volume Header--:

```
$ tar -tvf disk.img
Vr-------- 0/0           16896 1970-01-01 01:00 GPTAR--Volume Header--
-rw-r--r-- 0/0              14 1970-01-01 01:00 test.txt
```

More importantly, the GPTAR entry is ignored when extracting:

```
$ mkdir tmp
$ cd tmp/
$ tar -xf ../disk.img
$ ls
test.txt
```

## BSD tar / libarchive

Unfortunately, this broke bsdtar!

```
$ bsdtar -tf disk.img
bsdtar: Damaged tar archive
bsdtar: Error exit delayed from previous errors.
```

This is annoying because we run FreeBSD on the host for opam.robur.coop, our instance of opam-mirror. This autumn we updated opam-mirror to use the hybrid GPT+tar GPTar tartition table¹ instead of hard-coded or boot-parameter-specified disk offsets for the different partitions, which were extremely brittle! Now we were no longer able to inspect the contents of the tar partition from the host! Unacceptable! So I started digging into libarchive, the project bsdtar comes from. To my surprise, after I built bsdtar from a git clone of the source code, it ran perfectly fine!

```
$ ./bsdtar -tf ../gptar/disk.img
test.txt
```

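I eventually figured out that this change fixed it for me. I got in touch with Emily to let her know that bsdtar had recently fixed this (ab)use of GNU volume headers. Her reply was basically "as of when I wrote the article, I was pretty sure bsdtar ignored it." And indeed it did. Examining the diff further revealed that bsdtar did ignore the GNU volume header, just not "correctly" when the volume header was abused to carry file content, as I did. The old code went straight on to read the next header without consuming the entry's data, so the first 512-byte block of that data was parsed as if it were a header. The new code parses the size field and skips the data, rounded up to a whole number of 512-byte blocks: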

```diff
 /*
  * Interpret 'V' GNU tar volume header.
  */
 static int
 header_volume(struct archive_read *a, struct tar *tar,
     struct archive_entry *entry, const void *h, size_t *unconsumed)
 {
-       (void)h;
+       const struct archive_entry_header_ustar *header;
+       int64_t size, to_consume;
+
+       (void)a; /* UNUSED */
+       (void)tar; /* UNUSED */
+       (void)entry; /* UNUSED */

-       /* Just skip this and read the next header. */
-       return (tar_read_header(a, tar, entry, unconsumed));
+       header = (const struct archive_entry_header_ustar *)h;
+       size = tar_atol(header->size, sizeof(header->size));
+       to_consume = ((size + 511) & ~511);
+       *unconsumed += to_consume;
+       return (ARCHIVE_OK);
 }
```
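To see why the old code tripped over GPTAR, here is a simplified C model of the skip logic. It is not libarchive's code: `parse_octal` is a hypothetical octal-only stand-in for `tar_atol` (the real function handles more encodings), and `next_header_offset` is a hypothetical helper. Assuming the GPTAR header sits at offset 0 with the 16896-byte size from the verbose listing above, the pre-fix logic looks for the next header at offset 512, inside the GPTAR data, while the patched logic skips to offset 17408.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_SIZE 512

/* Hypothetical octal-only stand-in for libarchive's tar_atol(). */
static int64_t parse_octal(const char *p, size_t len)
{
    int64_t v = 0;
    size_t i = 0;
    while (i < len && p[i] == ' ')
        i++;
    while (i < len && p[i] >= '0' && p[i] <= '7')
        v = v * 8 + (p[i++] - '0');
    return v;
}

/* Round a data size up to whole 512-byte blocks, as the patch does. */
static int64_t padded_size(int64_t size)
{
    return (size + 511) & ~(int64_t)511;
}

/* Offset of the header following a volume header at `offset`.
 * old_behaviour != 0: skip only the header block itself (pre-fix).
 * old_behaviour == 0: also skip the entry's padded data (post-fix). */
static int64_t next_header_offset(int64_t offset, const char *size_field,
                                  int old_behaviour)
{
    assert(offset % BLOCK_SIZE == 0); /* tar headers are block-aligned */
    int64_t next = offset + BLOCK_SIZE;
    if (!old_behaviour)
        next += padded_size(parse_octal(size_field, 12));
    return next;
}
```

The mask trick works because 512 is a power of two: adding 511 and then clearing the low nine bits rounds up to the next multiple of 512, so a 14-byte file such as test.txt still occupies one full block.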

So thanks to that change, we can expect a future libarchive release that supports even more flavors of GNU volume header abuse! 🥳


  1. Emily came up with the much better term "tartition table", compared to what I had come up with: "GPTar".