From 575cc7f0951625af50fc347592037462994ecf5d Mon Sep 17 00:00:00 2001
From: Calascibetta Romain
Date: Mon, 13 Jan 2025 18:10:16 +0100
Subject: [PATCH] Add a paragraph about patches and the limit on them

---
 articles/2025-01-07-carton-and-cachet.md | 39 ++++++++++++++++++++++++
 1 file changed, 39 insertions(+)

diff --git a/articles/2025-01-07-carton-and-cachet.md b/articles/2025-01-07-carton-and-cachet.md
index 3af96b7..acc0fad 100644
--- a/articles/2025-01-07-carton-and-cachet.md
+++ b/articles/2025-01-07-carton-and-cachet.md
@@ -328,6 +328,45 @@ also the content of another GitHub notification email. This shows that Carton is
 well on its way to finding the best candidate for the patch, which should be
 similar content, moreover, another GitHub notification.
 
+The idea is to sacrifice a little computing time (in the reconstruction of
+objects via their patches) to gain in compression ratio. It's fair to say that
+a very long patch chain can degrade performance. However, there is a limit in
+Git and Carton: a chain can't be longer than 50. Another point is the search for
+the candidate source for the patch, which is often physically close to the patch
+(within a few bytes): reading the PACK file by page (thanks to [Cachet][cachet])
+sometimes gives access to 3 or 4 objects, which have a certain chance of being
+patched together.
+
+Let's take the example of Carton and a Git object:
+
+```shell
+$ carton get pack-*.idx eaafd737886011ebc28e6208e03767860c22e77d
+...
+cache misses: 62
+cache hits: 758
+tree: 160720bb
+ Δ 160ae4bc
+ Δ 160ae506
+ Δ 160ae575
+ Δ 160ae5be
+ Δ 160ae5fc
+ Δ 160ae62f
+ Δ 160ae667
+ Δ 160ae6a5
+ Δ 160ae6db
+ Δ 160ae72a
+ Δ 160ae766
+ Δ 160ae799
+ Δ 160ae81e
+ Δ 160ae858
+ Δ 16289943
+```
+
+We can see here that we had to load 62 pages, but that we also reused the pages
+we'd already read 758 times. We can also see that the offset of the patches
+(which can be seen in Tree) is always close (the objects often follow each
+other).
+
 ### Mbox and real emails
 
 In a way, the concrete cases we use here are my emails. There may be a fairly