I would love some help with a Bash script loop that will show all the differences between two binary files. Using just
cmp file1 file2
only shows the first change. I would like to use cmp because it gives an offset and a line number for where each change is, but if you think there's a better command, I'm open to it :) Thanks.
I think cmp -l file1 file2
might do what you want. From the manpage:
-l --verbose
Output byte numbers and values of all differing bytes.
The output is a table of the offset, the byte value in file1 and the value in file2 for all differing bytes. It looks like this:
4531 66 63
4532 63 65
4533 64 67
4580 72 40
4581 40 55
[...]
So the first difference is at offset 4531 (a decimal byte number, counted from 1), where file1's byte value is octal 66 and file2's is octal 63.
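If hex is easier to read than octal, the cmp -l output can be post-processed. This is only a minimal sketch, assuming cmp -l prints three whitespace-separated columns as in the table above; cmp_hex is a made-up helper name, not something cmp provides:

```shell
# cmp_hex is a hypothetical helper, not part of cmp itself.
cmp_hex() {
    # cmp -l prints: <1-based decimal offset> <octal byte of $1> <octal byte of $2>
    cmp -l "$1" "$2" | while read -r offset a b; do
        # A leading 0 makes printf parse the argument as an octal number.
        printf '%d 0x%02x 0x%02x\n' "$offset" "0$a" "0$b"
    done
}
```

With file1 containing abc and file2 containing aXc, cmp_hex file1 file2 should print 2 0x62 0x58.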
The more efficient workaround I've found is to translate the binary files to some form of text using od. Then any flavour of diff works fine.
Method that works for byte addition / deletion
diff <(od -An -tx1 -w1 -v file1) \
<(od -An -tx1 -w1 -v file2)
Generate a test case with a single removal of byte 64:
for i in $(seq 128); do printf "%02x" "$i"; done | xxd -r -p > file1
for i in $(seq 128); do if [ "$i" -ne 64 ]; then printf "%02x" "$i"; fi; done | xxd -r -p > file2
Output:
64d63
< 40
If you also want to see the ASCII version of the character:
bdiff() (
  f() (
    od -An -tx1c -w1 -v "$1" | paste -d '' - -
  )
  diff <(f "$1") <(f "$2")
)
bdiff file1 file2
Output:
64d63
< 40 @
Tested on Ubuntu 16.04.
I prefer od over xxd because:
- it is POSIX; xxd is not (it comes with Vim)
- it has -An to remove the address column without awk.
Command explanation:
- -An removes the address column. This is important, otherwise all lines would differ after a byte addition / removal.
- -w1 puts one byte per line, so that diff can consume it. It is crucial to have one byte per line, or else every line after a deletion would become out of phase and differ. Unfortunately, this is not POSIX, but it is present in GNU.
- -tx1 is the representation you want; change it to any possible value, as long as you keep 1 byte per line.
- -v prevents the asterisk repetition abbreviation (*), which might interfere with the diff.
- paste -d '' - - joins every two lines. We need it because the hex and ASCII go into separate adjacent lines. Taken from: Concatenating every other line with the next
- we use parentheses () to define bdiff instead of {} to limit the scope of the inner function f; see also: How to define a function inside another function in bash
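Since -w1 is a GNU extension, one possible way to get one byte per line without it is to let od use its default line width and split the output afterwards. This is a rough sketch, not something I have tested across many systems; od_bytes is a hypothetical helper name:

```shell
# od_bytes is a hypothetical helper: prints one hex byte per line without -w1.
od_bytes() {
    # od separates the hex bytes with spaces; turn each space into a newline,
    # let tr -s squeeze the resulting runs of newlines, then drop the empty
    # line produced by the leading space.
    od -An -tx1 -v "$1" | tr -s ' ' '\n' | sed '/^$/d'
}

# Usage in Bash: diff <(od_bytes file1) <(od_bytes file2)
```

Unlike the -w1 version, the lines carry no leading space, so the diff output would show "< 40" only after diff adds its own prefix.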
See also:
- https://superuser.com/questions/125376/how-do-i-compare-binary-files-in-linux
- https://unix.stackexchange.com/questions/59849/diff-binary-files-of-different-sizes