Is there a compression tool that will let you output its dictionary (or similar) separate from the compressed output such that the dictionary can be re-used on a subsequent compression? The idea would be to transfer the dictionary one time, or use a reference dictionary at a remote site, and make a compressed file even smaller to transfer.
I've looked at the docs for a bunch of common compression tools, and I can't really find one that supports this. But most common compression tools aren't straight dictionary compression.
The usage I have in mind is:
compress_tool --dictionary compressed.dict -o compressed.data uncompressed
decompress_tool --dictionary compressed.dict -o uncompressed compressed.data
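For comparison, zlib's preset-dictionary feature has roughly this shape. Below is a minimal sketch using Python's standard `zlib` module and its `zdict` parameter. The helper names `compress_with_dict` and `decompress_with_dict` and the sample byte strings are made up for illustration and do not correspond to any existing tool's interface.

```python
import zlib


def compress_with_dict(data: bytes, dictionary: bytes) -> bytes:
    """Deflate `data` with the window primed from `dictionary` (hypothetical helper)."""
    # zdict seeds the compressor's history; the dictionary bytes themselves
    # are not written to the output, only a checksum identifying them.
    comp = zlib.compressobj(level=9, zdict=dictionary)
    return comp.compress(data) + comp.flush()


def decompress_with_dict(blob: bytes, dictionary: bytes) -> bytes:
    """Inverse of compress_with_dict; needs byte-for-byte the same dictionary."""
    decomp = zlib.decompressobj(zdict=dictionary)
    return decomp.decompress(blob) + decomp.flush()


if __name__ == "__main__":
    # Illustrative data only: a dictionary both sides already hold,
    # and a message that shares most of its substrings with it.
    shared = b'{"user": "", "status": "active", "roles": ["admin", "viewer"]}'
    message = b'{"user": "alice", "status": "active", "roles": ["viewer"]}'

    blob = compress_with_dict(message, shared)
    assert decompress_with_dict(blob, shared) == message
    print("raw:", len(message),
          "plain zlib:", len(zlib.compress(message, 9)),
          "with dictionary:", len(blob))
```

The dictionary is never part of the compressed stream, so it only has to be transferred once, or not at all if both sides already have it.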
To expand on my use case, I have a binary 500MB file F I want to copy over a slow network. Compressing the file alone yields a size of 200MB, which is still larger than I'd like. However, both my source and destination have a file F' which is very similar to F, but sufficiently different that binary diff tools don't work well. I was thinking that if I compress F' on both sites and then re-use information about that compression to compress F on the source, I could possibly eliminate some information from the transfer that could be rebuilt on the destination using F'.
I've created dicflate exactly for this purpose: https://github.com/hrobeers/dicflate
Preset dictionaries aren't really useful for files that size. They're great for small data (think compressing fields in a database, RPC queries/responses, snippets of XML or JSON, etc.), but for larger files like yours, the algorithm builds up its own dictionary very quickly.
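A related limit for deflate in particular: its history window is at most 32 KiB, so zlib can only use the tail of a larger preset dictionary. The sketch below illustrates this with Python's standard `zlib` module, using seeded pseudo-random bytes as a stand-in for incompressible file content. The helper names `deflate_size` and `noise` and all sizes are arbitrary choices for the demonstration, not measurements from the files in the question.

```python
import random
import zlib
from typing import Optional


def deflate_size(data: bytes, dictionary: Optional[bytes] = None) -> int:
    """Return the zlib-compressed size of `data`, optionally with a preset dictionary."""
    if dictionary is None:
        comp = zlib.compressobj(level=9)
    else:
        comp = zlib.compressobj(level=9, zdict=dictionary)
    return len(comp.compress(data) + comp.flush())


rng = random.Random(0)  # fixed seed so runs are repeatable


def noise(n: int) -> bytes:
    """Pseudo-random bytes: effectively incompressible on their own."""
    return bytes(rng.getrandbits(8) for _ in range(n))


payload = noise(4096)              # stands in for a chunk of F
near = noise(1024) + payload       # the chunk sits inside the last 32 KiB of the dictionary
far = payload + noise(64 * 1024)   # the chunk is more than 32 KiB from the dictionary's end

if __name__ == "__main__":
    print("no dictionary:      ", deflate_size(payload))
    print("match within 32 KiB:", deflate_size(payload, near))
    print("match beyond 32 KiB:", deflate_size(payload, far))
```

With the matching bytes inside the window the output shrinks to a handful of back-references. Once they are further than 32 KiB from the end of the dictionary, the result is about the same size as with no dictionary at all. For a 500MB reference file, that means deflate could draw on only a tiny fraction of it.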
That said, it just so happens that I was playing with preset dictionaries in Squash fairly recently, and I do have some code which does pretty much what you're talking about for the zlib plugin. I'm not going to push it to master (I have a different API in mind if I decide to support preset dictionaries), but I've just pushed it to the 'deflate-dictionary-file' branch if you want to take a look. To compress, do something like
To decompress,
AFAIK there is nothing in zlib which supports building a dictionary--you have to do that yourself. The zlib documentation describes the "format":
For testing I was using something like this (YMMV):