I am writing in C using the OpenSSL library.
How can I calculate the MD5 hash of a large file?
As far as I know, I need to load the whole file into RAM as a char array and then call the hash function. But what if the file is about 4 GB? That sounds like a bad idea.
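SOLVED: Thanks to askovpen, I found my bug. I had used
    while ((bytes = fread (data, 1, 1024, inFile)) != 0)
        MD5_Update (&mdContext, data, 1024);
instead of
    while ((bytes = fread (data, 1, 1024, inFile)) != 0)
        MD5_Update (&mdContext, data, bytes);
The last fread usually returns fewer than 1024 bytes, so the first version also hashed stale bytes left in the buffer.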
You don't have to load the entire file in memory at once. You can use the functions MD5_Init(), MD5_Update() and MD5_Final() to process it in chunks to produce the hash. If you are worried about making it an "atomic" operation, it may be necessary to lock the file to prevent someone else changing it during the operation.
First, MD5 is a hashing algorithm. It doesn't encrypt anything.
Anyway, you can read the file in chunks of whatever size you like. Call MD5_Init once, then call MD5_Update with each chunk of data you read from the file. When you're done, call MD5_Final to get the result.
example
gcc -g -Wall -o file file.c -lssl -lcrypto
result:
The top answer is correct, but one point is worth spelling out: the value of the hash does not depend on the buffer size. MD5 is computed over the byte stream, so feeding MD5_Update the same data 1 byte at a time, 1024 bytes at a time, or all at once produces the same digest. A hash computed now with one buffer size can be compared against a hash of the same data computed later with another.
If you do see different hashes for different buffer sizes, the likely cause is the bug from the question: passing the buffer size to MD5_Update instead of the number of bytes fread actually returned, which hashes stale bytes after a short final read. A correct implementation will match what online hashing websites report, whatever buffer size it uses. It is perfectly acceptable to use a buffer length of 1 to hash a large file; it will just take longer.
So my rule of thumb is to choose the buffer length purely for speed (a few kilobytes or more for a large file), since it has no effect on whether the hash plays nicely with other systems.