Typical approaches recommend reading the binary via FileStream and comparing it byte-by-byte.
- Would a checksum comparison such as CRC be faster?
- Are there any .NET libraries that can generate a checksum for a file?
If you only need to compare two files, I guess the fastest way would be (in C, I don't know if it's applicable to .NET)
OTOH, if you need to find out whether a set of N files contains duplicates, then the fastest way is undoubtedly to hash each file, which avoids N-way bit-by-bit comparisons.
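A minimal sketch of that duplicate search, assuming SHA-256 as the hash and a cheap length check first so that only same-sized files are hashed. The class and method names (`DuplicateFinder`, `FindDuplicates`, `HashFile`) are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class DuplicateFinder
{
    // Returns groups of paths whose contents produce the same SHA-256 hash.
    // Files are bucketed by length first, so files with a unique size are never read.
    public static List<List<string>> FindDuplicates(IEnumerable<string> paths)
    {
        var result = new List<List<string>>();
        var sameLengthGroups = paths
            .GroupBy(p => new FileInfo(p).Length)
            .Where(g => g.Count() > 1);

        foreach (var group in sameLengthGroups)
        {
            var sameHashGroups = group
                .GroupBy(p => HashFile(p))
                .Where(g => g.Count() > 1);
            result.AddRange(sameHashGroups.Select(g => g.ToList()));
        }
        return result;
    }

    // Hypothetical helper: hashes the whole file and returns the digest as Base64.
    private static string HashFile(string path)
    {
        using (var sha = SHA256.Create())
        using (var stream = File.OpenRead(path))
        {
            return Convert.ToBase64String(sha.ComputeHash(stream));
        }
    }
}
```

A matching hash is very strong evidence rather than strict proof, so if you need certainty you could confirm each reported group with a direct byte comparison.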
Another improvement for large files of identical length might be to not read the files sequentially, but rather to compare more or less random blocks.
You can use multiple threads, starting at different positions in the file and comparing either forwards or backwards.
This way you can detect changes in the middle or at the end of the file faster than you would get there using a sequential approach.
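A rough sketch of the multi-threaded idea, under a few assumptions: the file is split into a fixed number of equal segments (rather than truly random blocks), each worker opens its own pair of streams, and all workers stop as soon as one finds a difference. `SegmentedComparer`, the default of 4 segments and the 64 KB buffer are arbitrary choices for illustration:

```csharp
using System;
using System.IO;
using System.Threading.Tasks;

public static class SegmentedComparer
{
    // Compares two files by checking several segments concurrently.
    public static bool FilesAreEqual(string pathA, string pathB, int segments = 4)
    {
        if (segments < 1) throw new ArgumentOutOfRangeException(nameof(segments));
        long length = new FileInfo(pathA).Length;
        if (length != new FileInfo(pathB).Length) return false;
        if (length == 0) return true;

        long segmentSize = (length + segments - 1) / segments;
        bool equal = true;

        Parallel.For(0, segments, (i, state) =>
        {
            long start = i * segmentSize;
            if (start >= length) return;
            long end = Math.Min(start + segmentSize, length);
            if (!SegmentsAreEqual(pathA, pathB, start, end, state))
            {
                equal = false;
                state.Stop(); // ask the other workers to give up early
            }
        });
        return equal;
    }

    private static bool SegmentsAreEqual(string pathA, string pathB,
                                         long start, long end, ParallelLoopState state)
    {
        var bufA = new byte[64 * 1024];
        var bufB = new byte[64 * 1024];
        using (var a = new FileStream(pathA, FileMode.Open, FileAccess.Read, FileShare.Read))
        using (var b = new FileStream(pathB, FileMode.Open, FileAccess.Read, FileShare.Read))
        {
            a.Seek(start, SeekOrigin.Begin);
            b.Seek(start, SeekOrigin.Begin);
            long pos = start;
            while (pos < end && !state.IsStopped)
            {
                int want = (int)Math.Min(bufA.Length, end - pos);
                if (!ReadFully(a, bufA, want) || !ReadFully(b, bufB, want)) return false;
                for (int k = 0; k < want; k++)
                    if (bufA[k] != bufB[k]) return false;
                pos += want;
            }
        }
        return true;
    }

    // Stream.Read may return fewer bytes than requested, so loop until 'count' bytes arrive.
    private static bool ReadFully(Stream s, byte[] buffer, int count)
    {
        int total = 0;
        while (total < count)
        {
            int n = s.Read(buffer, total, count - total);
            if (n == 0) return false;
            total += n;
        }
        return true;
    }
}
```

Whether this actually beats a sequential read depends heavily on the storage: on a spinning disk the extra seeking may well make it slower, so it is worth measuring before relying on it.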
The slowest possible method is to compare two files byte by byte. The fastest I've been able to come up with is a similar comparison, but instead of one byte at a time, you read the bytes in chunks the size of an Int64 (8 bytes), convert each chunk to an Int64, and then compare the resulting numbers.
Here's what I came up with:
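A minimal sketch of the Int64-chunk idea (not necessarily the exact code that was benchmarked). `Int64Comparer` and its buffer size are assumptions for illustration. It reads both files into buffers, zero-pads the last partial 8-byte word, and compares the buffers one `long` at a time via `BitConverter.ToInt64`:

```csharp
using System;
using System.IO;

public static class Int64Comparer
{
    // Buffer size is a multiple of 8 so it always holds whole Int64 words.
    private const int BufferSize = 4096 * sizeof(long);

    public static bool FilesAreEqual(string pathA, string pathB)
    {
        var infoA = new FileInfo(pathA);
        var infoB = new FileInfo(pathB);
        if (infoA.Length != infoB.Length) return false;

        var bufA = new byte[BufferSize];
        var bufB = new byte[BufferSize];
        using (var a = infoA.OpenRead())
        using (var b = infoB.OpenRead())
        {
            while (true)
            {
                int readA = Fill(a, bufA);
                int readB = Fill(b, bufB);
                if (readA != readB) return false;
                if (readA == 0) return true; // both streams exhausted without a mismatch

                // Zero the tail so a final partial word compares cleanly.
                int padded = (readA + 7) & ~7;
                Array.Clear(bufA, readA, padded - readA);
                Array.Clear(bufB, readA, padded - readA);

                for (int i = 0; i < padded; i += sizeof(long))
                    if (BitConverter.ToInt64(bufA, i) != BitConverter.ToInt64(bufB, i))
                        return false;
            }
        }
    }

    // Stream.Read may return fewer bytes than requested; keep reading until
    // the buffer is full or the stream ends. Returns the number of bytes read.
    private static int Fill(Stream s, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int n = s.Read(buffer, total, buffer.Length - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```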
In my testing, this outperformed a straightforward ReadByte() scenario by almost 3:1. Averaged over 1000 runs, this method took 1063ms, and the method below (straightforward byte-by-byte comparison) took 3031ms. Hashing always came back sub-second, at an average of around 865ms. This testing was with an ~100MB video file.
Here are the ReadByte and hashing methods I used, for comparison purposes:
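For reference, a sketch of what such baseline methods could look like. The hash algorithm used in the benchmark is not stated, so SHA-256 here is an assumption, as are the names `BaselineComparers`, `EqualByReadByte` and `EqualByHash`:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class BaselineComparers
{
    // Straightforward comparison: one ReadByte() call per byte on each stream.
    public static bool EqualByReadByte(string pathA, string pathB)
    {
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            if (a.Length != b.Length) return false;
            int byteA;
            while ((byteA = a.ReadByte()) != -1)
                if (byteA != b.ReadByte()) return false;
            return true;
        }
    }

    // Hash comparison: hash each file in full, then compare the two digests.
    public static bool EqualByHash(string pathA, string pathB)
    {
        using (var sha = SHA256.Create()) // assumed algorithm, see note above
        using (var a = File.OpenRead(pathA))
        using (var b = File.OpenRead(pathB))
        {
            byte[] hashA = sha.ComputeHash(a);
            byte[] hashB = sha.ComputeHash(b);
            return hashA.SequenceEqual(hashB);
        }
    }
}
```

Note that hashing always reads both files to the end, whereas a direct comparison can stop at the first differing byte.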
Yet another answer, derived from @chsh. MD5 with usings and shortcuts for the same file, non-existent files and differing lengths:
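A sketch of how those shortcuts might be ordered before falling back to MD5. `Md5Comparer` is a hypothetical name, and returning `false` when either file is missing is an assumption about the intended behaviour:

```csharp
using System;
using System.IO;
using System.Linq;
using System.Security.Cryptography;

public static class Md5Comparer
{
    public static bool FilesAreEqual(string pathA, string pathB)
    {
        var a = new FileInfo(pathA);
        var b = new FileInfo(pathB);

        // Shortcut 1: a missing file is never equal to anything (assumed behaviour).
        if (!a.Exists || !b.Exists) return false;

        // Shortcut 2: the same path is trivially the same file.
        // Ordinal comparison may miss paths differing only in case; that only costs time.
        if (string.Equals(a.FullName, b.FullName, StringComparison.Ordinal)) return true;

        // Shortcut 3: different lengths cannot have equal contents.
        if (a.Length != b.Length) return false;

        // Fall back to comparing MD5 digests of the full contents.
        using (var md5 = MD5.Create())
        using (var streamA = a.OpenRead())
        using (var streamB = b.OpenRead())
        {
            byte[] hashA = md5.ComputeHash(streamA);
            byte[] hashB = md5.ComputeHash(streamB);
            return hashA.SequenceEqual(hashB);
        }
    }
}
```

MD5 is fine for spotting accidental differences, but it is not collision-resistant, so it should not be relied on where someone might craft files deliberately.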
Something (hopefully) reasonably efficient:
Edit: This method would not work for comparing binary files!
In .NET 4.0, the File class has the following two new methods:
Which means you could use: