I need to release some software quite frequently, and it is packaged as a VMware disk image, i.e., a .vmdk file.
What I want is a binary diff and patch utility that makes the generated delta as small as possible.
Question:
Answer 1:
Let me start off with tried-and-true approaches, then point out some more recent approaches.
approaches that I have seen work with binary files
A long time ago, people expanded the old and the new versions of a binary file into temporary "text" files (every byte expanded to 3 bytes: 2 hex digits and a newline). They then ran these two "text" files through an old version of "diff" (one that definitely couldn't handle binary files) to make a patch file, and transmitted that "text" patch file over communication lines that were not yet 8-bit-clean. On the receiving end, one expanded the old binary file into a temporary text version, patched that old text file, and then packed the new text file back into a binary file (collapsing each pair of hex digits into one byte, and throwing away the newlines and any carriage returns that might have crept in).
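The hex-expansion round trip described above can be sketched in a few lines of Python. This is only an illustration under assumptions: the function names are hypothetical, and `difflib.SequenceMatcher` stands in for the old "diff" program. It is far too slow for anything the size of a .vmdk file.

```python
import difflib

def to_hex_lines(data: bytes) -> list:
    """Expand each byte into one 'text' line of two hex digits."""
    return [f"{b:02x}" for b in data]

def from_hex_lines(lines: list) -> bytes:
    """Pack hex lines back into bytes, ignoring blank lines and stray CR/LF."""
    return bytes(int(line.strip(), 16) for line in lines if line.strip())

def make_patch(old: bytes, new: bytes) -> list:
    """Return a list of (start, end, replacement_lines) edits against old."""
    old_lines, new_lines = to_hex_lines(old), to_hex_lines(new)
    matcher = difflib.SequenceMatcher(None, old_lines, new_lines, autojunk=False)
    patch = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            patch.append((i1, i2, new_lines[j1:j2]))
    return patch

def apply_patch(old: bytes, patch: list) -> bytes:
    """Apply the edits to the hex-expanded old file, then pack to binary."""
    lines = to_hex_lines(old)
    # Apply from the end so that earlier offsets stay valid.
    for i1, i2, replacement in reversed(patch):
        lines[i1:i2] = replacement
    return from_hex_lines(lines)

old = b"\x00\x01\x02\x03hello"
new = b"\x00\x01\xff\x03hello!"
patch = make_patch(old, new)
rebuilt = apply_patch(old, patch)
```

Only the changed hex lines end up in `patch`, which is the part that would have been sent over the wire.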
More recently, I have been using rsync (or some utility built on top of it such as Unison). It handles arbitrary binary files just fine. I generally do a live update, with Unison running on my local machine and rsync running on the file server, talking back and forth to each other.
No matter how a patch file is generated, you can use any data compression utility to compress that file.
approaches that, as far as I know, ought to work with binary files
StackOverflow: "how to create a PATCH file for the binary difference output file" suggests using bsdiff.
Another StackOverflow question implies that "vimdiff" seems to handle arbitrary bytes adequately.
StackOverflow: "Useful Binary Diff Tool" mentions a few other binary difference tools.
I hear that some tools based on rsync -- "rdiff" and "rdiff-backup" and "duplicity" -- allow you to create a patch file. A person who receives that patch file can then use it to update their old binary file to the new binary file.
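For rdiff, the workflow is usually three steps: make a signature of the old file, make a delta from that signature plus the new file, then apply the delta to the old file. Here is a rough sketch that builds those command lines in Python. The file names and the helper functions are hypothetical, and I have not tried this on a .vmdk, so treat it as a starting point.

```python
import shutil
import subprocess

def rdiff_commands(old, new, sig, delta, rebuilt):
    """Build the three rdiff invocations as argv lists (nothing is run here)."""
    return [
        ["rdiff", "signature", old, sig],      # summarize the old file
        ["rdiff", "delta", sig, new, delta],   # sender: signature + new -> delta
        ["rdiff", "patch", old, delta, rebuilt],  # receiver: old + delta -> new
    ]

def run_all(commands):
    """Run the commands in order; raises if rdiff is missing or a step fails."""
    if shutil.which("rdiff") is None:
        raise RuntimeError("rdiff is not installed")
    for argv in commands:
        subprocess.run(argv, check=True)

# Hypothetical file names, for illustration only.
cmds = rdiff_commands("old.vmdk", "new.vmdk", "old.sig", "update.delta", "rebuilt.vmdk")
```

Only the delta file needs to be shipped; the receiver already has the old file.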
Wikipedia claims that recent versions of the standard "diff" and "patch" utilities support binary files. Have you tried that?
cutting-edge research in executable file compression
If you are interested in cutting-edge research on making the delta file as small as possible when updating executable files, you'll want to check out "How Courgette works" by Stephen Adams (2009) at The Chromium Projects.
Among other things, the computer that receives the patch "disassembles" the old application, converting all absolute addresses and offsets into symbols; then patches the disassembled code; then "reassembles" the patched code into the new version of the application.
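To see why turning addresses into symbols helps, here is a toy model in Python. It is not Courgette's actual code or format: the tiny "instruction set", the label naming, and all function names are made up for illustration. Inserting one instruction shifts an absolute jump target, so the raw listings differ in more places than the symbolic ones.

```python
import difflib

def raw_listing(program):
    """Render instructions with absolute jump targets."""
    return [f"{op} {arg}" for op, arg in program]

def symbolic_listing(program):
    """Replace absolute jump targets with labels placed at the target."""
    targets = sorted({arg for op, arg in program if op == "jmp"})
    label = {addr: f"L{k}" for k, addr in enumerate(targets)}
    lines = []
    for addr, (op, arg) in enumerate(program):
        if addr in label:
            lines.append(f"{label[addr]}:")
        lines.append(f"jmp {label[arg]}" if op == "jmp" else f"{op} {arg}")
    return lines

def changed_lines(a, b):
    """Count lines touched by the non-equal opcodes of a diff."""
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    return sum(max(i2 - i1, j2 - j1)
               for tag, i1, i2, j1, j2 in matcher.get_opcodes()
               if tag != "equal")

old = [("load", 1), ("jmp", 3), ("add", 2), ("store", 0), ("jmp", 0)]
# New version: one "nop" inserted at address 2, so the first jump target moves.
new = [("load", 1), ("jmp", 4), ("nop", 0), ("add", 2), ("store", 0), ("jmp", 0)]

raw_changes = changed_lines(raw_listing(old), raw_listing(new))
symbolic_changes = changed_lines(symbolic_listing(old), symbolic_listing(new))
```

In a real executable, one small insertion can shift thousands of addresses, which is where the symbolic form pays off.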
Answer 2:
Try xdelta.
I was looking for some binary diff tools for very large files (one LVM logical volume and its snapshots, because LVM doesn't support snapshots of snapshots yet) and xdelta works for me.
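In case it helps, this is roughly how the encode and decode steps look, assuming the xdelta3 command-line tool (other xdelta versions may use different syntax). The file names and the helper function are hypothetical.

```python
def xdelta3_commands(old, new, delta, rebuilt):
    """Build xdelta3 encode/decode invocations as argv lists (nothing is run)."""
    encode = ["xdelta3", "-e", "-s", old, new, delta]      # old + new -> delta
    decode = ["xdelta3", "-d", "-s", old, delta, rebuilt]  # old + delta -> new
    return encode, decode

# Hypothetical file names, for illustration only.
encode_cmd, decode_cmd = xdelta3_commands(
    "release-1.vmdk", "release-2.vmdk", "1-to-2.xdelta", "rebuilt.vmdk"
)
```

Each list can be passed to `subprocess.run(..., check=True)` on a machine where xdelta3 is installed.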