Why does Git use the SHA1 of the *compressed* obje

Posted 2019-06-15 12:55

I'm just curious as to why this choice was made - it basically rules out changing the compression algorithm used by Git - because it doesn't use the SHA1 of the raw blobs. Perhaps there is some efficiency consideration here. Maybe ZLIB is faster at compressing a file than the SHA1 algorithm is at creating the hash, so therefore compressing before hashing is faster?

Here is a link to the original Git README by Linus: http://git.kernel.org/?p=git/git.git;a=blob;f=README;h=27577f76849c09d3405397244eb3d8ae1d11b0f3;hb=e83c5163316f89bfbde7d9ab23ca2e25604af290

And here is the relevant paragraph:

"There are several kinds of objects in the content-addressable collection database. They are all in deflated with zlib, and start off with a tag of their type, and size information about the data. The SHA1 hash is always the hash of the compressed object, not the original one."

Tags: git hash compression blob sha1

1 answer

别忘想泡老子

#2 · 2019-06-15 13:26

As you said, that is the original README, from when Git was first started. Since then, Git has been changed so that the SHA1 is computed before compressing. The Git User Manual puts it this way:

It’s worth noting that the SHA-1 hash that is used to name the object is the hash of the original data plus this header, so 'sha1sum' file does not match the object name for file. (Historical note: in the dawn of the age of git the hash was the SHA-1 of the compressed object.)

http://schacon.github.com/git/user-manual.html#object-details
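
To make the "original data plus this header" part concrete, here is a minimal Python sketch of how a blob's name can be reproduced by hand. It assumes the header has the form `blob <size>\0`, which is what current Git uses for blobs; the helper names `git_blob_id` and `loose_object_bytes` are made up for this illustration and are not part of Git.

```python
import hashlib
import zlib


def git_blob_id(data: bytes) -> str:
    """Hash the uncompressed header + content, the way Git names a blob today."""
    # Assumed header layout: object type, a space, the decimal size, a NUL byte.
    header = b"blob %d\0" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def loose_object_bytes(data: bytes, level: int = 6) -> bytes:
    """Hypothetical helper: the zlib-deflated bytes of header + content."""
    header = b"blob %d\0" % len(data)
    return zlib.compress(header + data, level)


content = b"hello\n"

# The object name depends only on the uncompressed header + content.
print(git_blob_id(content))

# A plain SHA-1 of the file differs, because it lacks the header.
print(hashlib.sha1(content).hexdigest())

# Different compression levels can give different stored bytes,
# but they inflate to the same payload, so the name stays the same.
fast = loose_object_bytes(content, level=1)
best = loose_object_bytes(content, level=9)
print(zlib.decompress(fast) == zlib.decompress(best))
```

The first printed value should match what `echo hello | git hash-object --stdin` reports. Since the hash is taken before deflating, the compression level (or even the compressor) could change without renaming any object, which is exactly the concern raised in the question.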
