Add a paragraph about patches and the limit on them
parent 04e71b2485
commit 575cc7f095
1 changed file with 39 additions and 0 deletions
@@ -328,6 +328,45 @@ also the content of another GitHub notification email. This shows that Carton
is well on its way to finding the best candidate for the patch, which should
have similar content: here, another GitHub notification.

The idea is to sacrifice a little computing time (spent reconstructing objects
from their patches) to gain in compression ratio. It's fair to say that a very
long patch chain can degrade performance. However, both Git and Carton put a
limit on it: a chain can't be longer than 50 patches. Another point is the
search for the candidate source for the patch, which is often physically close
to the patch (within a few bytes): reading the PACK file by page (thanks to
[Cachet][cachet]) sometimes gives access to 3 or 4 objects at once, which have a
good chance of being patched against one another.

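To make the trade-off concrete, here is a minimal Python sketch of a patch chain with a depth limit. It is not Carton's or Git's actual implementation: the `Entry` type, the `copy`/`insert` operations, the integer "offsets" and the naive common-prefix patch are all assumptions made for illustration. Only the limit of 50 comes from the text above.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

MAX_DEPTH = 50  # chain limit mentioned in the text


@dataclass
class Entry:
    # Hypothetical layout: an entry is either a full object (base)
    # or a patch (ops) to apply on the entry stored at `source`.
    base: Optional[bytes] = None
    source: Optional[int] = None
    ops: Optional[List[Tuple]] = None


def apply_patch(source: bytes, ops: List[Tuple]) -> bytes:
    """Rebuild an object from its source and a list of copy/insert ops."""
    out = bytearray()
    for op in ops:
        if op[0] == "copy":
            _, off, length = op
            out += source[off:off + length]
        else:  # ("insert", data)
            out += op[1]
    return bytes(out)


def depth(pack: Dict[int, Entry], offset: int) -> int:
    """Number of patches to apply before reaching a full object."""
    d = 0
    while pack[offset].source is not None:
        offset = pack[offset].source
        d += 1
    return d


def reconstruct(pack: Dict[int, Entry], offset: int) -> bytes:
    """Walk the chain down to the base, then apply patches back up."""
    chain = []
    while pack[offset].source is not None:
        chain.append(pack[offset].ops)
        offset = pack[offset].source
    data = pack[offset].base
    for ops in reversed(chain):
        data = apply_patch(data, ops)
    return data


def add(pack: Dict[int, Entry], offset: int, content: bytes,
        source_offset: Optional[int] = None) -> None:
    """Store `content` as a patch only if the chain stays within MAX_DEPTH."""
    if source_offset is not None and depth(pack, source_offset) < MAX_DEPTH:
        src = reconstruct(pack, source_offset)
        n = 0  # naive patch: keep the common prefix, insert the rest
        while n < min(len(src), len(content)) and src[n] == content[n]:
            n += 1
        ops = [("copy", 0, n), ("insert", content[n:])]
        pack[offset] = Entry(source=source_offset, ops=ops)
    else:
        pack[offset] = Entry(base=content)


# 60 similar objects, each one patched against the previous one if allowed.
pack: Dict[int, Entry] = {}
for i in range(60):
    content = b"notification " + str(i).encode()
    add(pack, i, content, source_offset=i - 1 if i > 0 else None)
```

In this sketch, once the chain reaches the limit, the next object is stored in full and starts a new chain. That bounds the cost of `reconstruct` at the price of one less compact entry.
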
Let's take the example of Carton and a Git object:

```shell
$ carton get pack-*.idx eaafd737886011ebc28e6208e03767860c22e77d
...
cache misses: 62
cache hits: 758
tree: 160720bb
Δ 160ae4bc
Δ 160ae506
Δ 160ae575
Δ 160ae5be
Δ 160ae5fc
Δ 160ae62f
Δ 160ae667
Δ 160ae6a5
Δ 160ae6db
Δ 160ae72a
Δ 160ae766
Δ 160ae799
Δ 160ae81e
Δ 160ae858
Δ 16289943
```

We can see here that we had to load 62 pages (cache misses), but that we also
reused pages we had already read 758 times (cache hits). We can also see that
the offsets of the patches (listed under `tree`) are mostly close to one
another: the objects often follow each other in the PACK file.

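The hit and miss counters can be illustrated with a small page cache. The Python sketch below is not Cachet's API: the `PageCache` class, its 4096-byte page size and its unbounded dictionary of pages are assumptions chosen to show why reads at nearby offsets mostly turn into hits.

```python
class PageCache:
    """Hypothetical page cache over an in-memory 'file'."""

    def __init__(self, data: bytes, page_size: int = 4096):
        self.data = data
        self.page_size = page_size
        self.pages = {}   # page index -> page content
        self.hits = 0
        self.misses = 0

    def _page(self, index: int) -> bytes:
        # A miss loads the whole page; a hit reuses a page already loaded.
        if index in self.pages:
            self.hits += 1
        else:
            self.misses += 1
            start = index * self.page_size
            self.pages[index] = self.data[start:start + self.page_size]
        return self.pages[index]

    def read(self, offset: int, length: int) -> bytes:
        """Read `length` bytes at `offset`, going through pages only."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, within = divmod(offset, self.page_size)
            chunk = self._page(index)[within:within + (end - offset)]
            if not chunk:
                break  # reached the end of the data
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Three reads at nearby offsets: the last one straddles two pages.
data = bytes(range(256)) * 64  # 16384 bytes
cache = PageCache(data)
first = cache.read(0, 100)
second = cache.read(100, 100)
third = cache.read(4000, 200)
print("cache misses:", cache.misses)
print("cache hits:", cache.hits)
```

When objects follow each other in the file, as the offsets above suggest, most reads land on a page that is already loaded.
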
### Mbox and real emails

In a way, the concrete cases we use here are my emails. There may be a fairly