I created two C programs
Program 1
int main() { }
Program 2
int main()
{
    //Some Harmless comments
}
AFAIK, when compiling, the compiler (gcc) should ignore the comments and redundant whitespace, and hence the output must be similar.
But when I checked the md5sums of the output binaries, they didn't match. I also tried compiling with optimisation -O3 and -Ofast, but they still didn't match.
What is happening here?
EDIT: the exact commands and their md5sums are (t1.c is program 1 and t2.c is program 2):
gcc ./t1.c -o aaa
gcc ./t2.c -o bbb
98c1a86e593fd0181383662e68bac22f aaa
c10293cbe6031b13dc6244d01b4d2793 bbb
gcc ./t2.c -Ofast -o bbb
gcc ./t1.c -Ofast -o aaa
2f65a6d5bc9bf1351bdd6919a766fa10 aaa
c0bee139c47183ce62e10c3dbc13c614 bbb
gcc ./t1.c -O3 -o aaa
gcc ./t2.c -O3 -o bbb
564a39d982710b0070bb9349bfc0e2cd aaa
ad89b15e73b26e32026fd0f1dc152cd2 bbb
And yes, the md5sums match across multiple compilations with the same flags.
BTW my system is gcc (GCC) 5.2.0
and Linux 4.2.0-1-MANJARO #1 SMP PREEMPT x86_64 GNU/Linux
Note: remember that the source file name goes into the unstripped binary, so two programs coming from differently named source files will have different hashes.
In similar situations, should the above not apply, you can try:
- running strip against the binary to remove some fat. If the stripped binaries are the same, then the difference was some metadata that isn't essential to the program's operation.
- running strings on both binaries, or dumping both programs to hex and running a diff on the two hex dumps. Once you have located the difference(s), you might try and see whether there's some rhyme or reason to them (PID, timestamps, source file timestamp...). For example, you might have a routine storing the timestamp at compile time for diagnostic purposes.

It's because the file names are different (although the strings output is the same). If you try modifying the file itself (rather than having two files), you'll notice that the output binaries are no longer different. As both Jens and I said, it's because GCC dumps a whole load of metadata into the binaries it builds, including the exact source filename (and AFAICS so does clang).
Try this:
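A minimal sketch of such an experiment, assuming gcc and md5sum are on the PATH. The file names code.c and code2.c, the output names, and the scratch directory are hypothetical; the two sources are byte-for-byte identical and differ only in their name.

```shell
#!/bin/sh
# Sketch: identical source text compiled under two different file names.
# code.c, code2.c, a, a_again and b are made-up names for illustration.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

printf 'int main() { }\n' > code.c
cp code.c code2.c              # same contents, different file name

gcc code.c  -o a               # first build of code.c
gcc code.c  -o a_again         # second build of the very same file
gcc code2.c -o b               # same text, different source file name

# Print the three checksums side by side for comparison.
md5sum a a_again b
```

If the source file name is what ends up in the binary, a and a_again should share a checksum while b should not.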
This explains why your md5sums don't change between builds, but are different between different files. If you want, you can do what Jens suggested and compare the output of strings for each binary; you'll notice that the filenames are embedded in the binary. If you want to "fix" this, you can strip the binaries and the metadata will be removed.

The most common reasons are file names and time stamps added by the compiler (usually in the debug info part of the ELF sections).
Try running strings on both binaries and you might see the reason. I once used this to find why the same source would produce different code when compiled in different directories. The finding was that the __FILE__ macro expanded to an absolute file name, different in both trees.
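To illustrate the __FILE__ effect, here is a small sketch, assuming gcc is on the PATH. The names file_demo.c, rel and abs are hypothetical. It compiles one source file twice, once via a relative path and once via an absolute path, and prints what __FILE__ expanded to in each build.

```shell
#!/bin/sh
# Sketch: __FILE__ expands to the path as it was passed to the compiler.
# file_demo.c, rel and abs are hypothetical names used only for this example.
demo_dir=$(mktemp -d)
cd "$demo_dir" || exit 1

cat > file_demo.c <<'EOF'
#include <stdio.h>
int main(void) { puts(__FILE__); return 0; }
EOF

gcc file_demo.c        -o rel    # relative path on the command line
gcc "$PWD/file_demo.c" -o abs    # absolute path on the command line

./rel    # prints the relative name
./abs    # prints the absolute path, which depends on the directory

# The expanded string is stored in the program data, so the files differ.
cmp rel abs || echo "binaries differ"
```

Building the same tree in two different directories with absolute paths would therefore bake a different string into each binary, even though the source text is identical.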