Surely there must be a way to do this easily!
I've tried the Linux command-line apps such as sha1sum and md5sum, but they seem only to be able to compute hashes of individual files and output a list of hash values, one for each file.
I need to generate a single hash for the entire contents of a folder (not just the filenames).
I'd like to do something like
sha1sum /folder/of/stuff > singlehashvalue
Edit: To clarify, my files are at multiple levels in a directory tree; they're not all sitting in the same root folder.
Here's a simple, short variant in Python 3 that works fine for small-sized files (e.g. a source tree or something, where every file individually can fit into RAM easily), ignoring empty directories, based on the ideas from the other solutions:
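A minimal sketch of that idea (not necessarily the original code; the path is just the example from the question):

import hashlib
from pathlib import Path

def hash_directory(directory, hash_factory=hashlib.sha1):
    """Hash the relative path and contents of every file under directory."""
    digest = hash_factory()
    root = Path(directory)
    # Sort the paths so the result does not depend on traversal order.
    for path in sorted(p for p in root.rglob("*") if p.is_file()):
        # Mix in the relative path so renames and moves change the hash.
        digest.update(str(path.relative_to(root)).encode())
        # Small files only: each file is read into RAM in one piece.
        digest.update(path.read_bytes())
    return digest.hexdigest()

print(hash_directory("/folder/of/stuff"))                 # SHA-1 by default
print(hash_directory("/folder/of/stuff", hashlib.md5))    # or any other hashlib constructor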
It works like this:
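It walks the whole tree, sorts the file paths so the result does not depend on traversal order, and feeds each file's relative path and contents into one running digest. Files are read whole (hence the small-file caveat), and empty directories simply contribute nothing.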
You can pass in a different hash function as the second parameter if SHA-1 is not your cup of tea.
You can do
tar -c /path/to/folder | sha1sum
One possible way would be:
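For a flat folder, a sketch (the path is the one from the question):

# hash each file, then hash the resulting list of per-file hashes
sha1sum /folder/of/stuff/* | sha1sum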
If there is a whole directory tree, you're probably better off using find and xargs. One possible command would be
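something along these lines (the exact flags are one reasonable choice):

# checksum every regular file in the tree, then checksum the list of checksums
find /folder/of/stuff -type f -print0 | xargs -0 sha1sum | sha1sum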
Edit: Good point, it's probably a good thing to sort the list of files, so:
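for example, using the NUL-separated mode of GNU sort:

find /folder/of/stuff -type f -print0 | sort -z | xargs -0 sha1sum | sha1sum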
And, finally, if you also need to take account of permissions and empty directories:
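A sketch of that combination, using GNU stat's %n (name) and %a (octal permissions) format:

(find /folder/of/stuff -type f -print0 | sort -z | xargs -0 sha1sum;
 find /folder/of/stuff \( -type f -o -type d \) -print0 | sort -z | xargs -0 stat -c '%n %a') | sha1sum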
The arguments to stat will cause it to print the name of the file, followed by its octal permissions. The two finds will run one after the other, causing double the amount of disk IO, the first finding all file names and checksumming the contents, the second finding all file and directory names, printing name and mode. The list of "file names and checksums", followed by "names and directories, with permissions", will then be checksummed, for a smaller checksum.

A robust and clean approach
This is what I have off the top of my head; anyone who has spent some time working on this practically would have caught other gotchas and corner cases.
Here's a tool that is very light on memory and addresses most cases; it might be a bit rough around the edges, but it has been quite helpful.
An example usage and output of dtreetrawl, with a snippet of its human-friendly output.
I had to check a whole directory tree for file changes, but excluding timestamps and directory ownership. The goal is to get a sum that is identical anywhere the files are identical, including when they are hosted on other machines: nothing should matter except the files themselves, or a change in them.
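A sketch of such a recipe with standard GNU tools (relative paths, so the same tree gives the same sum on any machine):

# hash every file, sort the per-file hashes by path, then hash that whole list
cd /folder/of/stuff && find . -type f -exec md5sum {} + | sort -k 2 | md5sum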
It generates a list of hashes, one per file, then hashes that concatenated list into a single value.
This is way faster than the tar method.
For stronger privacy of our hashes, we can use sha512sum with the same recipe. The hashes are likewise identical everywhere using sha512sum, and there is no known way to reverse them.
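For example, the same pipeline with sha512sum:

cd /folder/of/stuff && find . -type f -exec sha512sum {} + | sort -k 2 | sha512sum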
There is a Python script for that:
http://code.activestate.com/recipes/576973-getting-the-sha-1-or-md5-hash-of-a-directory/
If you change the names of files without changing their alphabetical order, the hash script will not detect it. But if you change the order of the files or the contents of any file, running the script will give you a different hash than before.
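A minimal sketch of the same idea (not the linked recipe itself): hash file contents in sorted path order, so filenames only matter through their ordering.

import hashlib
import os

def directory_hash(directory):
    # Hash file contents in sorted path order; the names themselves are not hashed.
    digest = hashlib.sha1()
    for root, dirs, files in os.walk(directory):
        dirs.sort()                      # recurse into subdirectories in a fixed order
        for name in sorted(files):
            with open(os.path.join(root, name), "rb") as f:
                for chunk in iter(lambda: f.read(65536), b""):
                    digest.update(chunk)
    return digest.hexdigest()

print(directory_hash("/folder/of/stuff"))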