I have over half a million files to hash across multiple folders. An md5/crc hash is taking too long, and some files are 1 GB to 11 GB in size. I'm thinking of just hashing part of each file using head.
The command below works when it comes to finding and hashing everything:
find . -type f -exec sha1sum {} \;
I'm just not sure how to take this a step further and hash only the first, say, 256kB of each file, e.g.
find . -type f -exec head -c 256kB | sha1sum
I'm not sure whether head is okay to use in this instance or whether dd would be better. The above command doesn't work, so I'm looking for ideas on how I can do this.
I would like the output to be the same as what is seen with a native md5sum, i.e. in the below format (going to a text file):
<Hash> <file name>
I'm not sure if the above is possible in a single line, or whether a for/do loop will be needed. Performance is key, using bash on RHEL 6.
It is unclear where your limitation is. Do you have a slow disk or a slow CPU?
If your disk is not the limitation, you are probably limited by using a single core. GNU Parallel can help with that:
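A minimal sketch of that (assuming GNU Parallel is available on the RHEL 6 box; the output file name is just an example, and -print0/-0 are only there to cope with odd filenames):

# one sha1sum job per CPU core (GNU Parallel's default), keeping the
# usual "<hash>  <file>" output lines
find . -type f -print0 | parallel -0 sha1sum > full-hashes.txt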
If the limitation is disk I/O, then your idea of head makes perfect sense:
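Here is a sketch of how that could be wired up so the output still looks like native md5sum/sha1sum output (the helper name hashfirst, the 256kB cutoff and the output file name are just examples; GNU coreutils head is assumed):

# hash only the first 256kB of each file, but print md5sum-style
# "<hash>  <filename>" lines
hashfirst() {
    printf '%s  %s\n' "$(head -c 256kB -- "$1" | sha1sum | awk '{print $1}')" "$1"
}
export -f hashfirst

# -j10 = run 10 jobs at a time; tune this for your disk (see below)
find . -type f -print0 | parallel -0 -j10 hashfirst {} > partial-hashes.txt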
The optimal value of -j10 depends on your disk system, so try adjusting it until you find the optimal value (which can be as low as -j1).