I watched Linus (creator of git) give a talk on git. At one point he talked about how git is safer. He also said that other SCMs can't deal with data corruption. So I googled it and found out that this is not true.
For example, this link talks about "Replace the offending commit with a new commit altogether, re-creating approximately the same changes."
Maybe I misunderstood him. Any idea what he meant?
He said, many times, that git is the ONLY SCM that lets you check out the same data you put in.
Linus was referring to the fact that git commits are identifiable by their hash.
Git trees are objects consisting of multiple entries, each of which is another tree or a blob (read: blob = file, roughly).
The cryptographic hash of a parent node is a hash over the hashes of all underlying trees/blobs, recursively. Such trees are known as Merkle (hash) trees
and have the interesting property that the top-level hash is a cryptographically strong hash that uniquely identifies the whole tree.
Note that the hash of a commit includes the commit attributes, and these include the parent ids. That is, if some file in some revision ever changes, the hash of the blob changes, therefore the hash(es) of the containing trees change, the hash of the snapshot (root tree) changes, the hash of the commit changes, and with it the hashes of all child commits have to change, and so on. All subsequent history will be altered.
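The propagation described above can be illustrated with a minimal Python sketch. The header layout (`<type> <size>\0` followed by the body) and the tree and commit body layouts follow git's object format, but the helper names (`object_id`, `blob_id`, `tree_id`, `commit_id`, `history`), the fixed identity line, and the flat single-directory tree are simplifying assumptions for illustration, not git's actual code.

```python
import hashlib

# Hypothetical fixed identity so that commit ids are reproducible in this sketch.
IDENT = "A U Thor <author@example.com> 0 +0000"

def object_id(kind, body):
    """SHA-1 over '<kind> <size>' + NUL + body, which is how git names objects."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def blob_id(content):
    return object_id("blob", content)

def tree_id(files):
    """files maps a file name to its blob id (hex).
    Assumption: a flat directory of regular files (mode 100644);
    real trees can also contain subtrees and other modes."""
    body = b"".join(
        b"100644 " + name.encode() + b"\x00" + bytes.fromhex(oid)
        for name, oid in sorted(files.items(), key=lambda kv: kv[0].encode())
    )
    return object_id("tree", body)

def commit_id(tree, parents, message):
    """The commit body names the root tree and every parent commit."""
    lines = [f"tree {tree}"]
    lines += [f"parent {p}" for p in parents]
    lines += [f"author {IDENT}", f"committer {IDENT}", "", message, ""]
    return object_id("commit", "\n".join(lines).encode())

def history(first_readme):
    """Two commits; only the content of the FIRST snapshot varies."""
    first = commit_id(tree_id({"README": blob_id(first_readme)}), [], "initial")
    # The second snapshot is identical in both histories; only its parent differs.
    second_tree = tree_id({"README": blob_id(b"final text\n")})
    second = commit_id(second_tree, [first], "rewrite README")
    return first, second

original = history(b"hello\n")
tampered = history(b"hellp\n")  # one byte changed in the first revision
print(original)
print(tampered)
```

Both ids differ between the two runs. The second commit's tree is byte-for-byte the same in each history, so its id changes only because the parent id recorded inside it changed.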
If any object no longer matches its hash, it will be trivially detectable:
- the hash of a single tree is deterministically verifiable in O(n), where n is the number of objects in the root tree
- the integrity of a full branch history is deterministically verifiable in O(n), where n is the number of nodes in the revision chain.
In fact, git-verify-tag and git fsck are useful commands to do the checking explicitly. Besides that, verification automatically occurs on git subcommands (send-pack, receive-pack, read-tree, write-tree etc.).
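The principle behind such a check can be sketched in a few lines of Python. This is not how git fsck is implemented; it only shows why corruption in a content-addressed store is detectable in a single linear pass. The in-memory dictionary `store` and the helpers `object_id`, `put`, and `find_corrupt` are hypothetical names chosen for this example.

```python
import hashlib

def object_id(kind, body):
    """SHA-1 over '<kind> <size>' + NUL + body, as git does for its objects."""
    header = f"{kind} {len(body)}".encode() + b"\x00"
    return hashlib.sha1(header + body).hexdigest()

def put(store, kind, body):
    """Store an object under the hash of its content and return that hash."""
    oid = object_id(kind, body)
    store[oid] = (kind, body)
    return oid

def find_corrupt(store):
    """Re-hash every object once (linear in the number of objects) and
    return the ids whose stored bytes no longer match their name."""
    return sorted(
        oid for oid, (kind, body) in store.items()
        if object_id(kind, body) != oid
    )

store = {}
readme = put(store, "blob", b"hello\n")
notes = put(store, "blob", b"release notes\n")
print(find_corrupt(store))  # nothing is corrupt yet

# Simulate a flipped bit on disk: the bytes change, the name does not.
store[readme] = ("blob", b"hellp\n")
print(find_corrupt(store) == [readme])
```

Because an object's name is derived from its bytes, a damaged object cannot keep matching its name, so the check needs no separate checksum file or reference copy.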
Re: Replace the offending commit thread
In this first post by Linus he already deconstructs/defuses the bomb:
Hmm. Scary. That should not have been successful with a corrupt repo.
Unless you have done a .grafts file to hide the corruption, or something
like that?
Which is immediately confirmed by Denis Bueno in the response.
I think he was referring to the fact that git uses a cryptographic hash to ensure data correctness, and that it stores snapshots rather than changesets. Saying that git is the only SCM that does so is probably an overstatement today, but it might have been true in the past, before the advent of DVCSs. Note that the term "snapshot" does not mean that it stores the entire files. See this answer for details.