Merge pull request 'Add "GPTar update" article' (#22) from gptar-update into main

Reviewed-on: #22
Reynir Björnsson 2024-10-29 11:12:06 +00:00
commit e993307d83

articles/gptar-update.md

---
title: GPTar (update)
date: 2024-10-28
description: libarchive vs hybrid GUID partition table and GNU tar volume header
tags:
- OCaml
- gpt
- tar
- mbr
- persistent storage
author:
name: Reynir Björnsson
email: reynir@reynir.dk
link: https://reyn.ir/
---
In a [previous post][gptar-post] I described how I craft a hybrid GUID partition table (GPT) and tar archive by exploiting the fact that the areas of a 512-byte *block* that matter to tar headers and to the *protective* master boot record used in GPT are disjoint.
If you haven't already, I recommend reading it first for context.
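
To make the "disjoint areas" idea concrete, here is a minimal sketch in Python. It assumes only the original (v7) tar header fields and the classic MBR layout; the dictionary and function names are made up for illustration, and the previous post remains the authority on which fields GPTar actually relies on.

```python
# Byte ranges [start, end) inside one 512-byte block.
# Assumption: only the original (v7) tar header fields are considered.
TAR_V7_FIELDS = {
    "name": (0, 100),
    "mode": (100, 108),
    "uid": (108, 116),
    "gid": (116, 124),
    "size": (124, 136),
    "mtime": (136, 148),
    "chksum": (148, 156),
    "typeflag": (156, 157),  # also known as the link indicator
    "linkname": (157, 257),
}

# Classic MBR layout; the boot code area before offset 446 is left out
# on the assumption that a protective MBR does not need it.
MBR_FIELDS = {
    "partition_entries": (446, 510),  # 4 entries of 16 bytes each
    "boot_signature": (510, 512),     # 0x55 0xAA
}


def overlaps(a, b):
    """Return True if the half-open ranges a and b share at least one byte."""
    return a[0] < b[1] and b[0] < a[1]


def conflicts(left, right):
    """List every (left_name, right_name) pair whose byte ranges overlap."""
    return [
        (left_name, right_name)
        for left_name, left_range in left.items()
        for right_name, right_range in right.items()
        if overlaps(left_range, right_range)
    ]


print(conflicts(TAR_V7_FIELDS, MBR_FIELDS))  # prints []
```

Note that the ustar `prefix` field (bytes 345 up to 500) would show up as a conflict with the partition entries if it were added to the table, so this sketch says nothing about how that field is handled in practice.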
After writing the above post I read an excellent and fun *and totally normal* article by Emily on how [she created **executable** tar archives][tar-executable].
Therein I learned a clever hack:
GNU tar has a tar extension for *volume headers*.
These are essentially labels for your tape archives when you're forced to split an archive across multiple tapes.
They can (seemingly) hold any text as a label, including shell scripts.
What's more, GNU tar and bsdtar **do not** extract these as files!
This is excellent, because I don't actually want to extract or list the GPT header when using GNU tar or bsdtar.
This prompted me to [use a different link indicator](https://github.com/reynir/gptar/pull/1): `V`, the one GNU tar uses for volume headers.
This worked pretty great.
Listing the archive using GNU tar I still get `GPTAR`, but with verbose listing it's displayed as a `--Volume Header--`:
```shell
$ tar -tvf disk.img
Vr-------- 0/0 16896 1970-01-01 01:00 GPTAR--Volume Header--
-rw-r--r-- 0/0 14 1970-01-01 01:00 test.txt
```
And more importantly the `GPTAR` entry is ignored when extracting:
```shell
$ mkdir tmp
$ cd tmp/
$ tar -xf ../disk.img
$ ls
test.txt
```
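
For the curious, here is a rough Python sketch of what such a header block could look like. It fills in only the classic tar header fields, with values taken from the verbose listing above (mode `r--------`, owner `0/0`, timestamp at the Unix epoch). The helper names are hypothetical, the protective MBR part of the real GPTar block is left out entirely, and I have not checked the result against GNU tar or bsdtar.

```python
BLOCK = 512


def octal(value, width):
    """Zero-padded octal digits filling `width` bytes, NUL-terminated."""
    return format(value, "0{}o".format(width - 1)).encode("ascii") + b"\0"


def volume_header(label, payload_size):
    """Build a 512-byte tar header whose link indicator is 'V'."""
    if len(label) > 99:
        raise ValueError("label too long for the name field")
    block = bytearray(BLOCK)
    block[0:len(label)] = label
    block[100:108] = octal(0o400, 8)          # mode: r--------
    block[108:116] = octal(0, 8)              # uid
    block[116:124] = octal(0, 8)              # gid
    block[124:136] = octal(payload_size, 12)  # bytes following the header
    block[136:148] = octal(0, 12)             # mtime: the Unix epoch
    block[148:156] = b" " * 8                 # checksum field counts as spaces
    block[156:157] = b"V"                     # GNU volume header
    block[148:156] = format(sum(block), "06o").encode("ascii") + b"\0 "
    return bytes(block)


def checksum_ok(block):
    """Recompute the header checksum and compare it with the stored one."""
    stored = int(block[148:154], 8)
    blanked = block[:148] + b" " * 8 + block[156:]
    return stored == sum(blanked)


hdr = volume_header(b"GPTAR", 16896)
print(hdr[156:157], int(hdr[124:135], 8), checksum_ok(hdr))
# prints: b'V' 16896 True
```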
## BSD tar / libarchive
Unfortunately, this broke bsdtar!
```shell
$ bsdtar -tf disk.img
bsdtar: Damaged tar archive
bsdtar: Error exit delayed from previous errors.
```
This is annoying because we run FreeBSD on the host for [opam.robur.coop](https://opam.robur.coop), our instance of [opam-mirror][opam-mirror].
This autumn we updated [opam-mirror][opam-mirror] to use the hybrid GPT+tar GPTar *tartition table*[^tartition] instead of hard-coded or boot-parameter-specified disk offsets for the different partitions, which was extremely brittle!
So we were no longer able to inspect the contents of the tar partition from the host!
Unacceptable!
So I started to dig into libarchive, where bsdtar comes from.
To my surprise, after building bsdtar from a git clone of the source code it ran perfectly fine!
```shell
$ ./bsdtar -tf ../gptar/disk.img
test.txt
```
I eventually figured out that [this change][libarchive-pr] fixed it for me.
I got in touch with Emily to let her know that bsdtar recently fixed this (ab)use of GNU volume headers.
Her reply was basically "as of when I wrote the article, I was pretty sure bsdtar ignored it."
And indeed it did.
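Examining the diff further revealed that bsdtar did ignore the GNU volume header, just not "correctly" when the header was abused to carry file content as I did.
The old code skipped only the header block and parsed whatever followed as the next header, while the new code also skips the payload announced in the header's size field: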
```diff
 /*
  * Interpret 'V' GNU tar volume header.
  */
 static int
 header_volume(struct archive_read *a, struct tar *tar,
     struct archive_entry *entry, const void *h, size_t *unconsumed)
 {
-    (void)h;
+    const struct archive_entry_header_ustar *header;
+    int64_t size, to_consume;
+
+    (void)a; /* UNUSED */
+    (void)tar; /* UNUSED */
+    (void)entry; /* UNUSED */
-    /* Just skip this and read the next header. */
-    return (tar_read_header(a, tar, entry, unconsumed));
+    header = (const struct archive_entry_header_ustar *)h;
+    size = tar_atol(header->size, sizeof(header->size));
+    to_consume = ((size + 511) & ~511);
+    *unconsumed += to_consume;
+    return (ARCHIVE_OK);
 }
```
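
The difference is easy to reproduce with a toy tar reader. The Python sketch below is not libarchive's code: `round_up`, `header` and `list_names` are hypothetical names, the payload is a stand-in for the real GPT data, and the error message merely mimics bsdtar's (where exactly the real reader gives up is an assumption on my part). The `skip_volume_payload` flag switches between the old and the new handling of a `V` entry.

```python
BLOCK = 512


def round_up(size):
    """Round size up to a multiple of 512, like ((size + 511) & ~511) in C."""
    return (size + BLOCK - 1) & ~(BLOCK - 1)


def header(name, size, typeflag):
    """Minimal header block: only name, size and type flag are filled in.

    The checksum is left out because the toy reader below never checks it.
    """
    block = bytearray(BLOCK)
    block[0:len(name)] = name
    block[124:136] = format(size, "011o").encode("ascii") + b"\0"
    block[156:157] = typeflag
    return bytes(block)


def list_names(archive, skip_volume_payload):
    """Walk the archive and return the names of all non-volume entries."""
    names = []
    offset = 0
    while offset + BLOCK <= len(archive):
        block = archive[offset:offset + BLOCK]
        offset += BLOCK
        if block == bytes(BLOCK):  # an all-zero block ends the archive
            break
        try:
            size = int(block[124:135], 8)
        except ValueError:
            raise ValueError("Damaged tar archive") from None
        if block[156:157] == b"V":
            # Old behaviour: ignore the size, read the next block as a header.
            # New behaviour: also skip the payload announced in the size field.
            if skip_volume_payload:
                offset += round_up(size)
            continue
        names.append(block[0:100].split(b"\0", 1)[0].decode("ascii"))
        offset += round_up(size)
    return names


archive = (
    header(b"GPTAR", 16896, b"V") + b"G" * 16896  # stand-in for the GPT data
    + header(b"test.txt", 14, b"0") + b"Hello, GPTar!\n".ljust(BLOCK, b"\0")
    + bytes(2 * BLOCK)
)

print(list_names(archive, skip_volume_payload=True))  # prints ['test.txt']
try:
    list_names(archive, skip_volume_payload=False)
except ValueError as error:
    print(error)  # prints Damaged tar archive
```

With an ordinary volume header, whose size is zero, both variants behave the same, which would explain why the old code was fine until someone put a payload there.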
So thanks to the above change we can expect a release of libarchive supporting further flavors of abuse of GNU volume headers!
🥳

[gptar-post]: gptar.html
[tar-executable]: https://uni.horse/executable-tarballs.html
[opam-mirror]: https://git.robur.coop/robur/opam-mirror/
[libarchive-pr]: https://github.com/libarchive/libarchive/pull/2127
[^tartition]: Emily came up with the term "tartition table", which is much better than what I had come up with: "GPTar".