Context: I downloaded a file (Audirvana 0.7.1.zip) from code.google to my Macbook Pro (Mac OS X 10.6.6).
I wanted to verify the checksum, which for that particular file is posted as 862456662a11e2f386ff0b24fdabcb4f6c1c446a (SHA-1). git hash-object gave me a different hash, but openssl sha1 returned the expected 862456662a11e2f386ff0b24fdabcb4f6c1c446a.
The following experiment seems to rule out any possible download corruption or newline differences and to indicate that there are actually two different algorithms at play:
$ echo A > foo.txt
$ cat foo.txt
A
$ git hash-object foo.txt
f70f10e4db19068f79bc43844b49f3eece45c4e8
$ openssl sha1 foo.txt
SHA1(foo.txt)= 7d157d7c000ae27db146575c08ce30df893d3a64
What's going on?
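You see a difference because git hash-object doesn't just take a hash of the bytes in the file - it prepends the string "blob " followed by the file size and a NUL to the file's contents before hashing. There are more details in this other answer on Stack Overflow.
Or, to convince yourself, hash that header together with the file's contents using a plain SHA-1 tool: for foo.txt above, the input is the 9 bytes "blob 2", NUL, "A", newline, and its SHA-1 is the f70f10e4db19068f79bc43844b49f3eece45c4e8 that git hash-object printed.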
The SHA1 digest is calculated over a header string followed by the file data. The header consists of the object type, a space and the object length in bytes as decimal. This is separated from the data by a null byte.
So:
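As a minimal sketch of that header in shell, the functions below rebuild the input that git hash-object is described as hashing, using only a plain SHA-1 tool. The names sha1_of_stdin and git_blob_sha1 are hypothetical helpers, not part of git, and the sketch assumes that either openssl or sha1sum is on the PATH.

```shell
# Hypothetical helper: print the plain SHA-1 of stdin as bare hex.
# Assumption: openssl or sha1sum is installed. The digest is the last
# field of openssl's output and the first field of sha1sum's.
sha1_of_stdin() {
  if command -v openssl >/dev/null 2>&1; then
    openssl sha1 | awk '{print $NF}'
  else
    sha1sum | awk '{print $1}'
  fi
}

# Hypothetical helper: hash "blob <size>" + NUL + the file's bytes,
# i.e. the header described above followed by the data.
git_blob_sha1() {
  size=$(wc -c < "$1" | tr -d ' ')
  { printf 'blob %s\0' "$size"; cat "$1"; } | sha1_of_stdin
}

echo A > foo.txt
git_blob_sha1 foo.txt     # compare with: git hash-object foo.txt
sha1_of_stdin < foo.txt   # compare with: openssl sha1 foo.txt
```

For the foo.txt from the question, the first call should reproduce the f70f10e4... value reported by git hash-object and the second the 7d157d7c... value reported by openssl sha1.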
One consequence of this is that "the" empty tree and "the" empty blob have different IDs. That is:
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391 always means "empty file"
4b825dc642cb6eb9a060e54bf8d69288fbee4904 always means "empty directory"
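The two IDs differ only because the object type in the header differs; there is no content after the NUL in either case. A small sketch, assuming openssl or sha1sum is available (sha1_of_stdin and empty_object_id are hypothetical helper names):

```shell
# Hypothetical helper: plain SHA-1 of stdin as bare hex.
# Assumption: openssl or sha1sum is installed.
sha1_of_stdin() {
  if command -v openssl >/dev/null 2>&1; then
    openssl sha1 | awk '{print $NF}'
  else
    sha1sum | awk '{print $1}'
  fi
}

# Hypothetical helper: ID of a zero-length object of the given type,
# i.e. the SHA-1 of "<type> 0" followed by a single NUL byte.
empty_object_id() {
  printf '%s 0\0' "$1" | sha1_of_stdin
}

empty_object_id blob   # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391
empty_object_id tree   # 4b825dc642cb6eb9a060e54bf8d69288fbee4904
```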
You will find that you can in fact do git ls-tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 in a new git repository with no objects registered, because it is recognised as a special case and never actually stored (with modern Git versions). By contrast, if you add an empty file to your repo, a blob "e69de29bb2d1d6434b8b29ae775ad8c2e48c5391" will be stored.
The answer lies here:
How to assign a Git SHA1's to a file without Git?
git calculates the hash over a short header (object type and content size) plus the contents, not just the contents.
That is a good enough answer for now, and the takeaway is that git is not the tool for checksumming downloads.
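For checking a download, a plain SHA-1 of the file is what needs to match the posted value. A rough sketch, assuming openssl or sha1sum is installed and that the posted checksum is lowercase hex (verify_sha1 is a hypothetical helper name):

```shell
# Hypothetical helper: succeed only if the plain SHA-1 of FILE ($1)
# equals the EXPECTED hex digest ($2). No git header is involved.
verify_sha1() {
  if command -v openssl >/dev/null 2>&1; then
    actual=$(openssl sha1 < "$1" | awk '{print $NF}')
  else
    actual=$(sha1sum < "$1" | awk '{print $1}')
  fi
  [ "$actual" = "$2" ]
}

# Demonstration with the small file from the question.
echo A > foo.txt
if verify_sha1 foo.txt 7d157d7c000ae27db146575c08ce30df893d3a64; then
  echo "checksum OK"
else
  echo "checksum MISMATCH"
fi
```

The same call with the path to the downloaded zip and the posted 862456662a11e2f386ff0b24fdabcb4f6c1c446a would check the download itself.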