I'm just curious as to why this choice was made. It basically rules out changing the compression algorithm used by Git, because the object name is the SHA1 of the compressed data rather than of the raw blob. Perhaps there is some efficiency consideration here. Maybe zlib is faster at compressing a file than SHA1 is at hashing it, so compressing first and then hashing the smaller output is faster overall?
Here is a link to the original Git README by Linus: http://git.kernel.org/?p=git/git.git;a=blob;f=README;h=27577f76849c09d3405397244eb3d8ae1d11b0f3;hb=e83c5163316f89bfbde7d9ab23ca2e25604af290
And here is the relevant paragraph:
"There are several kinds of objects in the content-addressable collection database. They are all in deflated with zlib, and start off with a tag of their type, and size information about the data. The SHA1 hash is always the hash of the compressed object, not the original one."
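To make the concern concrete, here is a minimal Python sketch of the scheme that paragraph describes, next to the alternative of hashing the uncompressed bytes. The header layout ("blob", the size, then a NUL byte) and the function names are my own assumptions for illustration and are not taken from Git's source.

```python
import hashlib
import zlib


def with_header(data: bytes) -> bytes:
    # Assumed header layout for illustration: type tag, size, NUL separator.
    return b"blob " + str(len(data)).encode("ascii") + b"\0" + data


def id_of_compressed(data: bytes, level: int) -> str:
    # Scheme described in the README paragraph: hash the deflated bytes.
    return hashlib.sha1(zlib.compress(with_header(data), level)).hexdigest()


def id_of_raw(data: bytes) -> str:
    # Alternative: hash the uncompressed bytes, so compression is only a
    # storage detail and never affects the object name.
    return hashlib.sha1(with_header(data)).hexdigest()


data = b"hello world\n" * 100

fast = id_of_compressed(data, 1)  # zlib level 1 (fastest)
best = id_of_compressed(data, 9)  # zlib level 9 (smallest output)

# Same content, different compression settings, different object names.
print("compressed ids differ:", fast != best)

# Hashing the raw bytes gives one name regardless of how the data is stored.
print("raw id:", id_of_raw(data))
```

If this sketch is right, even changing the zlib compression level changes the name of an otherwise identical object, never mind switching to a different algorithm, which is exactly what puzzles me about the choice.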