diff --git a/articles/2025-01-07-carton-and-cachet.md b/articles/2025-01-07-carton-and-cachet.md
index 432b67b..3af96b7 100644
--- a/articles/2025-01-07-carton-and-cachet.md
+++ b/articles/2025-01-07-carton-and-cachet.md
@@ -28,8 +28,9 @@ Furthermore, if we have 2 blobs (2 versions of a file), one of which contains
 'A' and the other contains 'A+B', the second blob will probably be saved in the
 form of a patch requiring the contents of the first blob and adding '+B'. At a
 higher level and according to our use of Git, we understand that this second
-level of compression is very interesting: we generally just add/remove new
-keywords in our files of our project.
+level of compression is very interesting: we generally just add/remove few lines
+(like introduce a new function) or delete some (removing code) in our files of
+our project.
 
 However, there is a bias in what Git does and what we perceive. We often think
 that when it comes to patching in the case of Git, we think of the
@@ -39,14 +40,14 @@ been added or deleted between two files, they are not necessarily optimal for
 producing a _small_ patch.
 
 In reality, what interests us in the case of the storage and transmission of
-these patches over the network is not a form of lisibility in these patches but
+these patches over the network is not a form of readability in these patches but
 an optimality in what can be considered as common between two files and what is
 not. It is at this stage that the use of [duff][duff] is introduced.
 
 This is a small library which can generate a patch between two files according
 to the series of bytes common to both files. We're talking about 'series of
 bytes' here because these elements common to our two files are not necessary
-humanly readable. To find these series of common bytes, we use [Rabin's
+human readable. To find these series of common bytes, we use [Rabin's
 fingerprint][rabin] algorithm: [a rolling hash][rolling-hash] used since time
 immemorial.
 
@@ -110,7 +111,7 @@ using a method (such as compression). Isomorphism ensures that we can 'undo'
 this method and obtain exactly the same result again:
 
 ```
- x == decode(encode(x))
+ decode(encode(x)) == x
 ```
 
 This property is very important for emails because signatures exist in your
@@ -392,8 +393,8 @@ to move a little faster, especially in terms of serializing our emails. Our
 experience with ocaml-git also enabled us to identify the benefits of the PACK
 file for emails.
 
-But the development of Miou was particularly helpful in satisfying us in terms
-of program execution time, thanks to the ability to parallelize certain
+But the development of [Miou][miou] was particularly helpful in satisfying us in
+terms of program execution time, thanks to the ability to parallelize certain
 calculations quite easily.
 
 The format is still a little rough and not quite ready for the development of a
@@ -419,3 +420,4 @@ So, if you like what we're doing and want to help, you can make a donation via
 [mbox]: https://en.wikipedia.org/wiki/Mbox
 [donate-github]: https://github.com/sponsors/robur-coop
 [donate-iban]: https://robur.coop/Donate
+[miou]: https://github.com/robur-coop/miou