Does it make any sense to somehow store an "uncompressed" version of normally compressed files in the repository?
If so, is there a standard way to implement this? (Perhaps a standard pre-commit hook that uncompresses each such file into a specially named folder, plus a post-checkout hook that compresses each such folder back into the compressed file that LibreOffice knows how to read and write? Something like the process described in "Should I decompress zips before I archive?") (Or perhaps hacking the code of the version-control software so it automagically decompresses the old and new versions and stores the diff between the decompressed files, and, if that fails or doesn't offer a significant improvement, falls back on the original behavior of storing the direct diff between the original files, or simply storing the file whole?)
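The two halves of the hook idea can be sketched in Python, assuming the files are OpenDocument packages. `explode` and `rebuild` are hypothetical helper names, not part of Mercurial, git, or LibreOffice, and how they get invoked (a pre-commit hook, a post-checkout/update hook, or by hand) is left out. The one ODF-specific detail is that the packaging rules expect the `mimetype` entry to come first and be stored without compression.

```python
import os
import shutil
import zipfile


def explode(odf_path, out_dir):
    """Unzip an ODF package into out_dir so its XML parts can be committed as text."""
    # Start from an empty folder so parts deleted from the document don't linger.
    if os.path.isdir(out_dir):
        shutil.rmtree(out_dir)
    with zipfile.ZipFile(odf_path) as zf:
        zf.extractall(out_dir)


def rebuild(src_dir, odf_path):
    """Zip an exploded folder back into an ODF package."""
    with zipfile.ZipFile(odf_path, "w") as zf:
        # ODF packaging expects 'mimetype' as the first entry, stored uncompressed.
        zf.write(os.path.join(src_dir, "mimetype"), "mimetype",
                 compress_type=zipfile.ZIP_STORED)
        for root, dirs, files in os.walk(src_dir):
            dirs.sort()  # walk subfolders in a stable order
            for name in sorted(files):
                full = os.path.join(root, name)
                arcname = os.path.relpath(full, src_dir).replace(os.sep, "/")
                if arcname == "mimetype":
                    continue  # already written above
                zf.write(full, arcname, compress_type=zipfile.ZIP_DEFLATED)
```

Only the exploded folder would be committed; the `.odt`/`.ods` file itself would be treated as a build product (for example, listed in `.hgignore`) and regenerated with `rebuild` after checkout.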
I have a collection of OpenOffice/LibreOffice files that are frequently edited. I am storing them in a version-control repository, as recommended by "Should images be stored in a git repository?", although I happen to be using TortoiseHg or SourceTree to access my repositories rather than git.
I happen to know that OpenOffice files are actually zip-compressed containers holding a few XML files. (I hear that many other popular application "binary file formats" are also some form of zip-compressed file.)
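That claim is easy to check with Python's standard `zipfile` module. The helper below (`describe_odf` is a hypothetical name) reads the `mimetype` entry, which in an ODF package holds the document type as plain text, and lists the other members.

```python
import zipfile


def describe_odf(path):
    """Return (mimetype, member names) for an OpenDocument file.

    The document body itself lives in content.xml inside the archive.
    """
    with zipfile.ZipFile(path) as zf:
        mimetype = zf.read("mimetype").decode("ascii").strip()
        return mimetype, zf.namelist()
```

Run against a typical LibreOffice `.odt`, the member list usually includes `content.xml`, `styles.xml`, `meta.xml`, `settings.xml` and `META-INF/manifest.xml`, with the mimetype `application/vnd.oasis.opendocument.text`.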
My understanding is that even the smallest change to such a "binary" file causes the entire new file to be stored in the repository, whereas a small change to a "text" file causes only the change itself to be stored and transmitted.
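One way to see why compression hurts delta storage: compress two texts that differ by a single word and measure how much of the output differs. This sketch uses a crude metric (bytes not covered by a shared prefix and suffix) as a stand-in; real version-control delta algorithms are smarter, but the effect is similar. `make_doc` builds a hypothetical stand-in for `content.xml`, not real LibreOffice output.

```python
import zlib


def changed_span(a: bytes, b: bytes) -> int:
    """Bytes of b not covered by the prefix and suffix it shares with a."""
    n = min(len(a), len(b))
    p = 0
    while p < n and a[p] == b[p]:
        p += 1
    s = 0
    while s < n - p and a[-1 - s] == b[-1 - s]:
        s += 1
    return len(b) - p - s


def make_doc(word: str) -> bytes:
    """Hypothetical document: 300 paragraphs, with one word inserted in paragraph 1."""
    lines = [f"<text:p>Paragraph {i}: the quick brown fox {i * 7 % 13}</text:p>"
             for i in range(300)]
    lines[1] = lines[1].replace("fox", f"fox {word}")
    return "\n".join(lines).encode("utf-8")


old, new = make_doc("cat"), make_doc("elephant")
raw_span = changed_span(old, new)  # only the edited word differs
packed_span = changed_span(zlib.compress(old), zlib.compress(new))  # most of the stream differs
print(raw_span, packed_span)
```

In the uncompressed text the differing region is just a few bytes; after compression, nearly everything following the edit (and the trailing checksum) changes, so a delta between the compressed files is close to the full file size.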
In theory, storing the uncompressed contents would have these advantages:
- Where the change is only a few words, I could see the exact words that changed in the "diff" view of the change log (rather than the uninformative "binary file changed" message).
- When several different people independently edit version 14 of a file, it's much easier to merge all of their improvements into version 16 of the file without regressions.
- Faster synchronization with the remote repository: only the short changes need to be transmitted, rather than the entire (compressed) file.
- Possibly a smaller repository, in terms of disk space: after a few hundred changes, I would expect a relatively small repository containing a few hundred small changes, rather than a relatively large one containing a few hundred complete copies of these files. (I list this advantage last because it is nearly irrelevant in these days of cheap disk space.)
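The first advantage, a readable diff, can be sketched without changing what is stored: extract `content.xml` from both versions and diff it as text. LibreOffice typically writes `content.xml` as one very long line, so this sketch pretty-prints it with `xml.dom.minidom` first to get one element per line. `readable_content` and `odf_diff` are hypothetical names; hooking something like this into the diff viewer (for example git's `textconv` diff drivers, or Mercurial's `extdiff` extension) is not shown.

```python
import difflib
import zipfile
from xml.dom import minidom


def readable_content(odf_path):
    """Return content.xml from an ODF file, pretty-printed and split into lines."""
    with zipfile.ZipFile(odf_path) as zf:
        xml_bytes = zf.read("content.xml")
    return minidom.parseString(xml_bytes).toprettyxml(indent="  ").splitlines()


def odf_diff(old_path, new_path):
    """Unified diff of the document body between two ODF files."""
    return list(difflib.unified_diff(
        readable_content(old_path), readable_content(new_path),
        fromfile=old_path, tofile=new_path, lineterm=""))
```

Pretty-printing only affects what is displayed; the stored files are unchanged, so this addresses the "binary file changed" message but not merging or repository size.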
It makes sense, especially if you need branching and diffing.
This old thread summarizes the situation.