Add a paragraph about patches and the limit on them

Calascibetta Romain 2025-01-13 18:10:16 +01:00
parent 04e71b2485
commit 575cc7f095

@@ -328,6 +328,45 @@ also the content of another GitHub notification email. This shows that Carton
is well on its way to finding the best candidate for the patch, which should be
similar content: here, another GitHub notification.

The idea is to sacrifice a little computing time (spent reconstructing objects
from their patches) to gain in compression ratio. A very long patch chain can
degrade performance, since every patch in the chain must be applied to rebuild
the object. However, there is a limit in Git and Carton: a chain can't be longer
than 50 patches (in Git, this is the default value of `pack.depth`). Another
point is the search for the candidate source for the patch, which is often
physically close to the patch (within a few bytes): because the PACK file is
read page by page (thanks to [Cachet][cachet]), a single page sometimes gives
access to 3 or 4 objects, which have a fair chance of being patches of one
another.

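
To make the trade-off concrete, here is a minimal Python sketch of rebuilding an object through a chain of patches, with the limit of 50 mentioned above. It is not Carton's code: the `objects` dictionary, the `apply_patch` and `reconstruct` functions and the tuple-based patch format are all hypothetical. Git's real delta encoding also relies on copy and insert instructions, but in a compact binary form.

```python
MAX_CHAIN_DEPTH = 50  # the limit quoted in the text


class ChainTooLong(Exception):
    """Raised when an object sits behind more than MAX_CHAIN_DEPTH patches."""


def apply_patch(source: bytes, ops: list) -> bytes:
    """Rebuild a target from `source` using ("copy", offset, length)
    and ("insert", data) instructions (a made-up, simplified format)."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, offset, length = op
            out += source[offset:offset + length]
        elif op[0] == "insert":
            out += op[1]
        else:
            raise ValueError(f"unknown instruction: {op[0]!r}")
    return bytes(out)


def reconstruct(objects: dict, oid: str) -> bytes:
    """Walk from `oid` down to its base object, then apply the patches
    from the base outward."""
    patches = []
    current = objects[oid]
    while current["kind"] == "patch":
        if len(patches) == MAX_CHAIN_DEPTH:
            raise ChainTooLong(oid)
        patches.append(current["ops"])
        current = objects[current["source"]]
    data = current["data"]
    for ops in reversed(patches):
        data = apply_patch(data, ops)
    return data


# Three similar "notifications": one base object and two patches.
PREFIX = b"Hello GitHub notification #"
objects = {
    "base": {"kind": "base", "data": PREFIX + b"1\n"},
    "p1": {"kind": "patch", "source": "base",
           "ops": [("copy", 0, len(PREFIX)), ("insert", b"2\n")]},
    "p2": {"kind": "patch", "source": "p1",
           "ops": [("copy", 0, len(PREFIX)), ("insert", b"3\n")]},
}
print(reconstruct(objects, "p2"))
```

Each patch stores only what differs from its source, which is where the compression gain comes from; the price is that reading `p2` requires applying two patches. Bounding the chain length bounds that worst case.
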
Let's take the example of Carton extracting a Git object:
```shell
$ carton get pack-*.idx eaafd737886011ebc28e6208e03767860c22e77d
...
cache misses: 62
cache hits: 758
tree: 160720bb
Δ 160ae4bc
Δ 160ae506
Δ 160ae575
Δ 160ae5be
Δ 160ae5fc
Δ 160ae62f
Δ 160ae667
Δ 160ae6a5
Δ 160ae6db
Δ 160ae72a
Δ 160ae766
Δ 160ae799
Δ 160ae81e
Δ 160ae858
Δ 16289943
```

We can see here that we had to load 62 pages (cache misses), but that we also
reused pages we had already read 758 times (cache hits), which is roughly 92% of
all page accesses. We can also see that the offsets of the patches (listed under
`tree`) are almost always close to one another: the objects often follow each
other in the PACK file.

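
The locality can be checked directly from the offsets printed above. The following Python sketch is only an illustration: the 4096-byte page size is an assumption (not necessarily what Cachet uses), `simulate_page_cache` is a hypothetical helper, and touching a single byte per entry is a simplification. The real figures (62 misses, 758 hits) come from actually decoding the objects.

```python
# Offsets copied from the `carton get` output above (hexadecimal).
TREE_OFFSET = 0x160720BB
PATCH_OFFSETS = [
    0x160AE4BC, 0x160AE506, 0x160AE575, 0x160AE5BE, 0x160AE5FC,
    0x160AE62F, 0x160AE667, 0x160AE6A5, 0x160AE6DB, 0x160AE72A,
    0x160AE766, 0x160AE799, 0x160AE81E, 0x160AE858, 0x16289943,
]

PAGE_SIZE = 4096  # assumption, for illustration only


def gaps(offsets):
    """Distance in bytes between each pair of consecutive offsets."""
    return [b - a for a, b in zip(offsets, offsets[1:])]


def simulate_page_cache(offsets, page_size=PAGE_SIZE):
    """Touch one byte at each offset and count page hits and misses,
    assuming pages are never evicted."""
    cached = set()
    hits = misses = 0
    for offset in offsets:
        page = offset // page_size
        if page in cached:
            hits += 1
        else:
            cached.add(page)
            misses += 1
    return hits, misses


patch_gaps = gaps(PATCH_OFFSETS)
hits, misses = simulate_page_cache([TREE_OFFSET] + PATCH_OFFSETS)
reported_hit_ratio = 758 / (758 + 62)

print("gaps between patches:", patch_gaps)
print("simulated hits/misses:", hits, misses)
print(f"reported hit ratio: {reported_hit_ratio:.1%}")
```

Under these assumptions, all but one of the patches fall within a few dozen bytes of the previous one and share a single page, so most lookups are served from pages already in memory.
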
### Mbox and real emails
In a way, the concrete cases we use here are my emails. There may be a fairly