blog.robur.coop/articles/gptar-update.html
2024-12-17 15:36:32 +00:00

<!doctype html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="x-ua-compatible" content="ie=edge">
<meta name="viewport" content="width=device-width, initial-scale=1">
<title>
Robur's blog - GPTar (update)
</title>
<meta name="description" content="libarchive vs hybrid GUID partition table and GNU tar volume header">
<link type="text/css" rel="stylesheet" href="/css/hl.css">
<link type="text/css" rel="stylesheet" href="/css/style.css">
<script src="/js/hl.js"></script>
<link rel="alternate" type="application/rss+xml" href="/feed.xml" title="blog.robur.coop">
</head>
<body>
<header>
<h1>blog.robur.coop</h1>
<blockquote>
The <strong>Robur</strong> cooperative blog.
</blockquote>
</header>
<main><a href="/index.html">Back to index</a>
<article>
<h1>GPTar (update)</h1>
<ul class="tags-list"><li><a href="/tags.html#tag-OCaml">OCaml</a></li><li><a href="/tags.html#tag-gpt">gpt</a></li><li><a href="/tags.html#tag-tar">tar</a></li><li><a href="/tags.html#tag-mbr">mbr</a></li><li><a href="/tags.html#tag-persistent storage">persistent storage</a></li></ul><p>In a <a href="gptar.html">previous post</a> I described how I craft a hybrid GUID partition table (GPT) and tar archive by exploiting the fact that the areas of a 512-byte <em>block</em> that matter to tar headers and to the <em>protective</em> master boot record used in GPT are disjoint.
For context, I recommend reading it first if you haven't already.</p>
<p>After writing the above post I read an excellent and fun <em>and totally normal</em> article by Emily on how <a href="https://uni.horse/executable-tarballs.html">she created <strong>executable</strong> tar archives</a>.
Therein I learned a clever hack:
GNU tar has a tar extension for <em>volume headers</em>.
These are essentially labels for your tape archives when you're forced to split an archive across multiple tapes.
They can (seemingly) hold any text as a label, including shell scripts.
What's more, GNU tar and bsdtar <strong>do not</strong> extract these as files!
This is excellent, because I don't actually want to extract or list the GPT header when using GNU tar or bsdtar.
This prompted me to <a href="https://github.com/reynir/gptar/pull/1">use a different link indicator</a>: <code>V</code>, which marks a GNU volume header.</p>
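<p>To make the link indicator concrete, here is a minimal Python sketch (not code from gptar, which is written in OCaml) that reads the three fields of a 512-byte tar header block that matter here.
The field offsets follow the common ustar/GNU header layout; the names <code>TarHeader</code>, <code>parse_header</code> and <code>is_volume_header</code> are made up for illustration, and the sketch assumes plain octal size fields and ignores the checksum and magic.</p>
<pre><code class="language-python">from typing import NamedTuple

BLOCK = 512  # tar headers and data are laid out in 512-byte blocks


class TarHeader(NamedTuple):
    name: str
    size: int
    typeflag: str


def parse_header(block):
    """Read the name, size and link indicator of one tar header block."""
    if len(block) != BLOCK:
        raise ValueError("a tar header is exactly 512 bytes")
    # name: bytes 0..99, NUL-padded
    name = block[0:100].split(b"\0", 1)[0].decode("ascii")
    # size: bytes 124..135, octal digits terminated by NUL or space
    # (assumption: plain octal, no base-256 extension)
    size = int(block[124:136].rstrip(b"\0 ").decode("ascii") or "0", 8)
    # link indicator (typeflag): byte 156
    typeflag = block[156:157].decode("ascii")
    return TarHeader(name, size, typeflag)


def is_volume_header(header):
    """True if the link indicator marks a GNU volume header."""
    return header.typeflag == "V"
</code></pre>
<p>Applied to the first block of the <code>disk.img</code> listed below, this should report the name <code>GPTAR</code>, a size of 16896 bytes and the link indicator <code>V</code>.</p>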
<p>This worked pretty great.
Listing the archive using GNU tar I still get <code>GPTAR</code>, but with verbose listing it's displayed as a <code>--Volume Header--</code>:</p>
<pre><code class="language-shell">$ tar -tvf disk.img
Vr-------- 0/0 16896 1970-01-01 01:00 GPTAR--Volume Header--
-rw-r--r-- 0/0 14 1970-01-01 01:00 test.txt
</code></pre>
<p>And more importantly the <code>GPTAR</code> entry is ignored when extracting:</p>
<pre><code class="language-shell">$ mkdir tmp
$ cd tmp/
$ tar -xf ../disk.img
$ ls
test.txt
</code></pre>
<h2 id="bsd-tar--libarchive"><a class="anchor" aria-hidden="true" href="#bsd-tar--libarchive"></a>BSD tar / libarchive</h2>
<p>Unfortunately, this broke bsdtar!</p>
<pre><code class="language-shell">$ bsdtar -tf disk.img
bsdtar: Damaged tar archive
bsdtar: Error exit delayed from previous errors.
</code></pre>
<p>This is annoying because we run FreeBSD on the host for <a href="https://opam.robur.coop">opam.robur.coop</a>, our instance of <a href="https://git.robur.coop/robur/opam-mirror/">opam-mirror</a>.
This autumn we updated <a href="https://git.robur.coop/robur/opam-mirror/">opam-mirror</a> to use the hybrid GPT+tar GPTar <em>tartition table</em><sup><a href="#fn-tartition" id="ref-1-fn-tartition" role="doc-noteref" class="fn-label">[1]</a></sup> instead of hard-coded or boot-parameter-specified disk offsets for the different partitions, which was extremely brittle!
So we were no longer able to inspect the contents of the tar partition from the host!
Unacceptable!
So I started to dig into libarchive where bsdtar comes from.
To my surprise, after building bsdtar from the git clone of the source code it ran perfectly fine!</p>
<pre><code class="language-shell">$ ./bsdtar -tf ../gptar/disk.img
test.txt
</code></pre>
<p>I eventually figured out that <a href="https://github.com/libarchive/libarchive/pull/2127">this change</a> fixed it for me.
I got in touch with Emily to let her know that bsdtar recently fixed this (ab)use of GNU volume headers.
Her reply was basically &quot;as of when I wrote the article, I was pretty sure bsdtar ignored it.&quot;
And indeed it did.
Examining the diff further revealed that bsdtar did ignore the GNU volume header, just not &quot;correctly&quot; when the header was abused to carry file content, as I did:</p>
<pre><code class="language-diff"> /*
  * Interpret 'V' GNU tar volume header.
  */
 static int
 header_volume(struct archive_read *a, struct tar *tar,
     struct archive_entry *entry, const void *h, size_t *unconsumed)
 {
-    (void)h;
+    const struct archive_entry_header_ustar *header;
+    int64_t size, to_consume;
+
+    (void)a; /* UNUSED */
+    (void)tar; /* UNUSED */
+    (void)entry; /* UNUSED */
-    /* Just skip this and read the next header. */
-    return (tar_read_header(a, tar, entry, unconsumed));
+    header = (const struct archive_entry_header_ustar *)h;
+    size = tar_atol(header-&gt;size, sizeof(header-&gt;size));
+    to_consume = ((size + 511) &amp; ~511);
+    *unconsumed += to_consume;
+    return (ARCHIVE_OK);
 }
</code></pre>
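<p>The effect of the change can be modelled with a small Python walker over 512-byte blocks.
This is only a sketch of the failure mode as it appears from the diff, not libarchive's reader: all names (<code>padded</code>, <code>make_header</code>, <code>list_entries</code>, <code>demo_archive</code>) are hypothetical, the headers carry only name, size, checksum and link indicator, and the archive contents are placeholders.</p>
<pre><code class="language-python">BLOCK = 512


def padded(size):
    """Round size up to a whole number of 512-byte blocks.

    Same result as the mask-based rounding in the diff above,
    written with integer division.
    """
    return (size + BLOCK - 1) // BLOCK * BLOCK


def checksum(block):
    """Sum of all header bytes, with the checksum field read as spaces."""
    return sum(block[:148]) + 8 * ord(" ") + sum(block[156:])


def make_header(name, size, typeflag):
    """Build a bare-bones header: only name, size, checksum, typeflag."""
    block = bytearray(BLOCK)
    block[0:len(name)] = name
    block[124:136] = b"%011o\0" % size
    block[156:157] = typeflag
    block[148:156] = b"%06o\0 " % checksum(block)
    return bytes(block)


def list_entries(data, skip_volume_data):
    """Return entry names; raise ValueError on a block that is no header."""
    names = []
    offset = 0
    while True:
        block = data[offset:offset + BLOCK]
        if len(block) != BLOCK or not any(block):
            return names  # end of data or zero block: end of archive
        stored = block[148:156].rstrip(b"\0 ")
        if not stored.isdigit() or int(stored, 8) != checksum(block):
            raise ValueError("Damaged tar archive")
        size = int(block[124:136].rstrip(b"\0 "), 8)
        offset += BLOCK
        if block[156:157] == b"V":
            # Old behaviour: treat the very next block as a header.
            # New behaviour: first consume the volume header's padded data.
            if skip_volume_data:
                offset += padded(size)
            continue
        names.append(block[0:100].split(b"\0", 1)[0].decode("ascii"))
        offset += padded(size)


def demo_archive():
    """A stand-in for disk.img: volume header with data, then one file."""
    gpt = b"EFI PART".ljust(16896, b"\0")  # placeholder, not a real GPT
    text = b"Hello, world!\n".ljust(BLOCK, b"\0")  # made-up file content
    return (make_header(b"GPTAR", 16896, b"V") + gpt
            + make_header(b"test.txt", 14, b"0") + text
            + bytes(2 * BLOCK))
</code></pre>
<p>With <code>skip_volume_data=False</code> the walker tries to parse the first block of GPT data as a tar header and fails, much like the packaged bsdtar did; with <code>skip_volume_data=True</code> it steps over the 16896 bytes and lists only <code>test.txt</code>.</p>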
<p>So thanks to the above change we can expect a release of libarchive supporting further flavors of abuse of GNU volume headers!
🥳</p>
<section role="doc-endnotes"><ol>
<li id="fn-tartition">
<p>Emily came up with the term &quot;tartition table&quot;, which is much better than what I had come up with: &quot;GPTar&quot;.</p>
<span><a href="#ref-1-fn-tartition" role="doc-backlink" class="fn-label">↩︎︎</a></span></li></ol></section>
</article>
</main>
<footer>
<a href="https://github.com/xhtmlboi/yocaml">Powered by <strong>YOCaml</strong></a>
<br />
</footer>
<script>hljs.highlightAll();</script>
</body>
</html>