Why does Git not store the branch name as part of

Published 2019-02-09 00:04

Question:

Please note: I'm not trying to restart the argument whether Mercurial or Git is better, I just have a technical question that I, as a Mercurial user, don't understand. I'm also not sure whether SO is the right place to ask such a question, but it is programming related.

There have been many discussions about how the two version control systems Git and Mercurial differ from each other from a user's point of view (e.g. What is the Difference Between Mercurial and Git? and http://felipec.wordpress.com/2011/01/16/mercurial-vs-git-its-all-in-the-branches/ ), and the major difference is the handling of branches. I have read through many of these discussions, but I keep asking myself this question:

Why does Git not store the branch name as part of the commit?

I don't really see a good reason for not doing that; storing the name would mean that data can't simply vanish just because there is no reference (tag, branch, whatever) pointing to it.

I see storing the branch in the commit as a big plus for Mercurial, because that makes it more difficult to lose data.

The main point of the Git crowd in favor of Git's branching model, that you can simply delete branches, does not prevent Git from storing the name of the branch as part of each commit: If the commits of a branch are deleted, so are the references to that branch. It will also not interfere with the "cheap branching" argument: branches will not be more expensive to manage. And I don't think that the additional storage needed should be of concern: it's just a couple of bytes per commit.

Answer 1:

One of the definitive sources about branches in Git and Mercurial is the SO question:

"Git and Mercurial - Compare and Contrast"

In Git, references (branches, remote-tracking branches and tags) reside outside the DAG of commits.

(That makes it possible to manage separate branch namespaces for local and remote branches.)

Mercurial has a similar notion in its bookmark branches (which can be pushed and pulled).
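To make the separation concrete, here is a minimal sketch of a toy model, not Git's actual object format. The helper names (`commit_id`, `add_commit`, `rename_ref`) are hypothetical, and the only assumption is that a commit's identity is a hash of its own content and its parents, while the references sit in a separate table with one namespace for local branches and one for remote-tracking branches.

```python
import hashlib

def commit_id(message, parents):
    """Toy stand-in for an object hash: covers the message and parent ids only."""
    payload = message + "\n" + "\n".join(parents)
    return hashlib.sha1(payload.encode("utf-8")).hexdigest()

# The DAG: commit id -> (message, parent ids). No branch name is stored here.
commits = {}

def add_commit(message, parents=()):
    cid = commit_id(message, parents)
    commits[cid] = (message, tuple(parents))
    return cid

root = add_commit("initial")
fix = add_commit("fix typo", [root])

# References live outside the DAG, in their own namespaces.
refs = {
    "refs/heads/topic": fix,             # local branch
    "refs/remotes/origin/master": root,  # remote-tracking branch
}

def rename_ref(refs, old, new):
    """Renaming a branch only touches the reference table."""
    refs[new] = refs.pop(old)

dag_before = dict(commits)
rename_ref(refs, "refs/heads/topic", "refs/heads/bugfix")
```

In this model, renaming the branch leaves `commits` identical to `dag_before`; only the name pointing at `fix` changes.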

Note that in Git, the data won't "vanish" because there is no reference: you still have the reflog to retrieve those unreferenced commits.
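The reflog point can be sketched with a toy reachability check. This is a simplified model under stated assumptions: the commit labels `a`, `b`, `c` and the `head_reflog` list are hypothetical, and a real reflog is a per-reference log whose entries expire after a configurable period, so the safety net is not permanent.

```python
def reachable(commits, tips):
    """Return every commit id reachable from the given tips by following parents."""
    seen = set()
    stack = list(tips)
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(commits[cid])
    return seen

# commit id -> parent ids (hypothetical short labels instead of real hashes)
commits = {"a": [], "b": ["a"], "c": ["b"]}
refs = {"master": "a", "topic": "c"}

# Commits that HEAD has pointed at, oldest first (toy reflog).
head_reflog = ["a", "b", "c", "a"]

# Delete the only branch that pointed at b and c.
del refs["topic"]

by_refs = reachable(commits, refs.values())
by_refs_and_reflog = reachable(commits, list(refs.values()) + head_reflog)
lost_from_refs = set(commits) - by_refs
```

Here `lost_from_refs` contains `b` and `c`, yet both are still reachable once the reflog entries are counted as starting points, which is why they can be recovered.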

Why does Git not store the branch name as part of the commit?
I don't really see a good reason for not doing that

The idea is to separate what has changed (the commits) from why, i.e. from the context of the change (the name of the branch).
Since you can fast-forward merge a branch, commits from one branch can be part of another at any time.
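A small sketch of why a fast-forward blurs any per-commit branch label. It uses a toy model with hypothetical helpers (`ancestors`, `fast_forward`) and made-up commit labels, assuming a branch is nothing more than a name pointing at a tip commit.

```python
def ancestors(commits, tip):
    """Return the tip and everything reachable from it through parent links."""
    seen = set()
    stack = [tip]
    while stack:
        cid = stack.pop()
        if cid in seen:
            continue
        seen.add(cid)
        stack.extend(commits[cid])
    return seen

def fast_forward(commits, refs, target, source):
    """Move `target` to `source`'s tip, allowed only if no merge commit is needed."""
    if refs[target] not in ancestors(commits, refs[source]):
        raise ValueError("not a fast-forward: histories have diverged")
    refs[target] = refs[source]

# commit id -> parent ids; b and c were created while working on 'topic'
commits = {"a": [], "b": ["a"], "c": ["b"]}
refs = {"master": "a", "topic": "c"}

fast_forward(commits, refs, "master", "topic")

branches_containing_c = sorted(
    name for name, tip in refs.items() if "c" in ancestors(commits, tip)
)
```

After the fast-forward no new commit exists, `master` and `topic` point at the same tip, and `c` belongs to both branches equally; a branch name baked into `c` would describe only where it was created.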

That is why Jakub Narębski questioned the design of Mercurial "named branches" (with branch names embedded in changeset metadata), especially since they share a global namespace, which is not well suited to a distributed version control system.
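One consequence of embedding the name in the changeset can be shown with a toy hash. This is a sketch only: `changeset_id` is a hypothetical function, not Mercurial's or Git's real format, and it assumes that whatever metadata is stored in the changeset is also covered by its identifier.

```python
import hashlib

def changeset_id(message, parents, branch=None):
    """Toy identifier: hashes the message, the parents and, if given, the branch name."""
    fields = [message, ",".join(parents)]
    if branch is not None:
        fields.append("branch=" + branch)
    return hashlib.sha1("\n".join(fields).encode("utf-8")).hexdigest()

# The same change recorded under two different embedded branch names.
same_work_a = changeset_id("fix typo", ["p1"], branch="bugfix")
same_work_b = changeset_id("fix typo", ["p1"], branch="hotfix")

# The same change with the branch name kept outside the changeset.
unnamed_a = changeset_id("fix typo", ["p1"])
unnamed_b = changeset_id("fix typo", ["p1"])
```

Under that assumption the two named changesets get different identifiers even though the work is identical, while the unnamed ones match, so an embedded name cannot be changed later without rewriting history.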

You create a branch to isolate a development effort (see "When should you branch?"), but with a DVCS, that development effort (the set of commits) should be publishable under any branch name. The local context (branch name) you have defined might not be valid once published to another Git repo.



Answer 2:

Mercurial’s basic model of operation is very simple: anonymous branches that form a directed acyclic graph (DAG). As such, branch names carry little importance and you will be dealing with them a lot less. Named branches are there mostly for organizational purposes (release branches, etc.), for which I would argue a global namespace makes more sense or at least is less objectionable.

Git has a more complicated and managed branching model than Mercurial, where even your local changes are treated like a separate named branch, and facing such a plethora of named branches you have to introduce namespaces to manage them. For the same reason Git has the concept of fast-forward merges, something that does not apply to Mercurial because you wouldn’t have created a separate branch in the first place.

Both of these concepts add extra complexity, and at the same time block useful features like storing the branch name along with the commit. This is because you cannot store namespaced branches without some global space, and Git has none.

The flaw in the argument for namespaces that VonC referenced above is that it assumes there is a problem if you and I both create a branch called ‘x’. There isn’t, just like there is no problem when you create a named branch, merge it, and later create another one with the same name. A well-chosen name describes what the branch does no matter who the author is, and if you need to differentiate further, the author is stored right along with the branch name, and all of this is permanent.

I think it was a very good decision of the Mercurial project to store the branch name along with the commit. Just like the commit message and the author, the context of the work (the branch) is important and useful meta-information. It makes it much easier to see the flow of the changesets when inspecting history, because you can see the context they were made in. In Git I have experienced that without this information, history quickly becomes a mess of jumbly lines that is hard to make heads or tails of. I think having this is well worth the trade-off of having no namespace.

I think you could say the philosophical difference is that Mercurial treats named branches as permanent meta-data that gives some extra information about a line of commits, whereas for Git branches are an un-versioned managing system for developers on top of the DAG. Mercurial also has these under the name ‘bookmarks’ by the way (as of version 1.8 a core feature), but they are really more of a tracking tool and treated like labels instead of branches.