DVCS partial merge (Git, Hg merge tracking)

Published 2019-04-24 17:04

Question:

I have a question about DVCSs in general, including Git and Hg.

In both Git and Hg, merge tracking is done at the "commit" level instead of the "file/directory" level.

One of the "side effects" is that you can't easily do a "partial merge":

  • You've modified 30 files in your branch "feature_branch_x"
  • You want to merge ONLY the files under (let's say) /kernel/gui

With "item-based merge tracking" (Perforce, ClearCase, Plastic SCM <= 3.0) you can select just a few files to merge, check in, and later repeat the merge; the files that are still pending will show up.

With Hg and Git, once you merge (there are ways to keep files out of the merge), the "tracking" is set, and if you repeat the merge there are no candidates left to merge.
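
The difference can be sketched with a toy model. This is only an illustration, not how any of the tools named above store their metadata; the class names, file paths, and the commit label `feature_branch_x@tip` are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ItemLevelTracker:
    """Hypothetical per-file tracking: remembers which files were merged."""
    merged: set = field(default_factory=set)

    def pending(self, changed_files):
        # Files changed on the source branch that have not been merged yet.
        return sorted(set(changed_files) - self.merged)

    def merge(self, files):
        self.merged.update(files)


@dataclass
class CommitLevelTracker:
    """Hypothetical per-commit tracking: a merge covers the whole changeset."""
    merged_commits: set = field(default_factory=set)

    def pending(self, commit_id, changed_files):
        # Once the commit is recorded as merged, nothing is offered again.
        return [] if commit_id in self.merged_commits else sorted(changed_files)

    def merge(self, commit_id):
        self.merged_commits.add(commit_id)


changed = ["kernel/gui/window.c", "kernel/gui/menu.c", "kernel/net/socket.c"]

# Item level: merge only kernel/gui now, the rest stays pending.
item = ItemLevelTracker()
item.merge([f for f in changed if f.startswith("kernel/gui/")])
item_pending = item.pending(changed)

# Commit level: the merge is recorded for the whole commit.
commit = CommitLevelTracker()
commit.merge("feature_branch_x@tip")
commit_pending = commit.pending("feature_branch_x@tip", changed)
```

In this model the item-level tracker still offers `kernel/net/socket.c` on the next merge, while the commit-level tracker offers nothing.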

My question is: how do you feel about it?

Are there cases where you feel a "partial merge" is mandatory? Can you live without it? (Merging with commit/cset-level tracking is much faster.)

DISCLAIMER: I work for Plastic SCM and we've moved to "cset" level tracking in 4.0, but we're wondering whether it could be a good idea to stay with "item level merge tracking" or even allow both.

Answer 1:

My feeling is that wanting to do a partial merge of a branch is a sign that too much was put in one branch in the first place. The correct way to deal with that situation is to split the branch into two branches, thereby correcting the original error, not compounding the error by trying to keep track of a partial merge. I would favor an SCM feature that made splitting a branch easier to do.



Answer 2:

The whole-tree merge in Mercurial and Git comes from their shared philosophy of tracking only the state of a whole tree. Neither Mercurial nor Git can store in its metadata that a merge was partial, since both SCMs record only the parents of the merge and the resulting tree. The advantage of this view is that you are much less likely to get the repo into an unstable state by committing half-baked merges.
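
A minimal sketch of why a second merge finds nothing to do, assuming a toy history in which each commit holds only its parents and a whole-tree snapshot. The commit names, file names, and helper functions are made up for illustration and are not Git or Mercurial internals.

```python
def ancestors(commits, start):
    """Return every commit reachable from `start` via parent links, itself included."""
    seen, stack = set(), [start]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(commits[c]["parents"])
    return seen


def needs_merge(commits, target, source):
    # A merge has work to do only if `source` is not already an ancestor of `target`.
    return source not in ancestors(commits, target)


# Toy history: each commit stores only its parents and a whole-tree snapshot.
commits = {
    "base":    {"parents": [], "tree": {"gui.c": "v1", "net.c": "v1"}},
    "feature": {"parents": ["base"], "tree": {"gui.c": "v2", "net.c": "v2"}},
    "main":    {"parents": ["base"], "tree": {"gui.c": "v1", "net.c": "v1"}},
}
before = needs_merge(commits, "main", "feature")

# A "partial" merge that takes only gui.c is still recorded with both parents,
# so nothing in the metadata says that net.c was left out.
commits["merge"] = {
    "parents": ["main", "feature"],
    "tree": {"gui.c": "v2", "net.c": "v1"},
}
after = needs_merge(commits, "merge", "feature")
```

Here `before` is true and `after` is false: once the merge commit lists `feature` as a parent, the ancestry check reports nothing left to merge, even though `net.c` was never taken.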

Think of a situation where files were moved from one subdirectory into another, and those paths are also hard-coded in a source file. If you merge only the files in one subdirectory, the files get moved correctly during the merge, but the references in the source file still point to the old directory. When you commit this partial merge, you have a broken state in the VCS. You would need some way to mark the final merge commit as the completing one (I don't know if Plastic SCM has such a semantic), to prevent others from checking out such a work-in-progress state.

Of course it is nasty to merge branches that have diverged over a long time. In the DVCS world, people try to mitigate these monster merges by merging diverging branches early and continuously (say one is a feature branch and the other a stable one; it is a good idea to merge stable->feature often). Git also has the ability to record merge conflict resolutions (called rerere), which helps to ease the pain when you do the same merge several times (for example when you "practice" the merge before the final one).
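
The idea behind reusing recorded resolutions can be sketched as a small cache keyed by the conflicting content. This is a simplified illustration of the concept only, not Git's actual rerere implementation; the class and method names are hypothetical.

```python
import hashlib


class ResolutionCache:
    """Toy cache of conflict resolutions, keyed by the conflicting content."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def _key(ours, theirs):
        # Hash both sides so the same conflict always maps to the same key.
        h = hashlib.sha1()
        h.update(ours.encode("utf-8"))
        h.update(b"\0")
        h.update(theirs.encode("utf-8"))
        return h.hexdigest()

    def record(self, ours, theirs, resolution):
        self._store[self._key(ours, theirs)] = resolution

    def replay(self, ours, theirs):
        # Returns the remembered resolution, or None if this conflict is new.
        return self._store.get(self._key(ours, theirs))


cache = ResolutionCache()
# Practice merge: resolve the conflict once and record the result.
cache.record('path = "old/dir"', 'path = "new/dir"', 'path = "new/dir"')
# Final merge: the same conflict shows up again and is resolved from the cache.
replayed = cache.replay('path = "old/dir"', 'path = "new/dir"')
unknown = cache.replay("a", "b")
```

A conflict that was never recorded returns `None`, so it would still have to be resolved by hand.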



Answer 3:

Pablo, here's a real-world case supporting item-level merge: take a main branch M and a customer branch C. Branch C was forked a while ago, possibly weeks or months, and M has evolved significantly in the meantime. Now you need to do a hotfix for the customer and hence modify code in C. You also need to introduce the change in M.

Our workflow would be to do a full, proper fix in M, test M, and deliver a new general-purpose release of the product. Then I'd need to merge the relevant parts of the fix into C to be able to provide a custom build to the customer affected by the problem. Hence I need to transfer some changes from M into C. Such an operation has the following aspects:

  • some files are merged “as is”,
  • some files are merged manually, but it's very important to know that a merge occurred,
  • files merged from M need not come from the latest changeset recorded in M; they may span several changesets.

So, to be able to track such an operation when inspecting the repository's history, 'data (code) flows' between files should be recorded on a file-by-file basis. The resulting 'merge changeset' would consist of 1:1 automatic merges as well as manually adjusted merges.
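
One way such a 'merge changeset' could be represented is sketched below. The record layout, field names, file paths, and changeset labels such as `M@101` are assumptions made for illustration; they do not describe any particular product's format.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class FileMerge:
    path: str
    source_changeset: str  # changeset on M the content came from
    manual: bool           # True if a person adjusted the merged result


@dataclass
class MergeChangeset:
    target_branch: str
    items: List[FileMerge]

    def manual_paths(self):
        # Files where it matters to know that a hand-made merge occurred.
        return [i.path for i in self.items if i.manual]

    def source_changesets(self):
        # The merged files may come from several changesets, not just the latest.
        return sorted({i.source_changeset for i in self.items})


hotfix = MergeChangeset("C", [
    FileMerge("src/parser.c", "M@101", manual=False),  # merged "as is"
    FileMerge("src/report.c", "M@104", manual=True),   # merged by hand
])
```

With a record like this, history inspection can answer both "which files were merged by hand?" and "which changesets on M did this hotfix draw from?".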

UPDATE: Speed vs. usability tradeoff: As I understand it, your product bets on features like really good merging. And I bet most users won't care about super speed, but they will care about a really good merge.

About a decade ago, adding 1000 files to ClearCase took 10 minutes. It took 1 minute to add them to Subversion. That's why we chose Subversion over ClearCase. However, if it had taken ClearCase 2 minutes, we would most likely have chosen ClearCase because of its then-better features.

If I get good, working features that support real-world commercial software development scenarios, I won't care whether it is 50 % faster or slower than my current VCS (Subversion). On the other hand, if you provide poor features and/or something that's a usability blocker in comparison to other VCS tools, users won't switch.

To conclude on changeset-level vs. item-level merge: stick to freedom, and that's, at least from my point of view, item-level merge.



Answer 4:

From "Merge some files now and some later":

Consider commits to be made up of sub-commits with changes to only one file each, and then support file-based operations. It's clear that people don't know at branch/commit time what they will want to merge in the future, and this would solve that. Git could be a content AND file tracker.
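
A rough sketch of that proposal, assuming a commit is represented as a plain dictionary of per-file changes. The function names, the commit id, and the paths are hypothetical, and Git does not work this way today.

```python
def split_into_subcommits(commit):
    """Split one commit into hypothetical sub-commits, one per changed file."""
    return [
        {"parent_commit": commit["id"], "path": path, "change": change}
        for path, change in sorted(commit["changes"].items())
    ]


def select(subcommits, prefix):
    # Pick the sub-commits under a directory; the rest stay available for later.
    chosen = [s for s in subcommits if s["path"].startswith(prefix)]
    remaining = [s for s in subcommits if not s["path"].startswith(prefix)]
    return chosen, remaining


commit = {
    "id": "abc123",
    "changes": {
        "kernel/gui/menu.c": "+1 -0",
        "kernel/net/socket.c": "+4 -2",
    },
}
now, later = select(split_into_subcommits(commit), "kernel/gui/")
```

Each sub-commit keeps a link back to the commit it came from, so the decision about what to merge can be made at merge time rather than at commit time.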



Answer 5:

IMHO the SCM that got branch/merge most "right" for this was PRCS, which supported painless partial merges with full per-item history. It allowed you to have things like per-platform branches and merge between them without fear of platform-specific changes "bleeding" into the other branch. The algorithm it used is described here:

http://prcs.sourceforge.net/merge.html

The thing I miss most is its handling of merges where the base and both branches differed: it would do a three-way merge per file, prompt you the first time for "yours", "theirs", or "merge", and remember your response as the default for the next merge.

When I was using it I found branch/merge so painless and intuitive that I used them all the time, and it never felt complicated. For some reason branch/merge with hg and git has never felt quite so painless.
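
The "remember my answer" behaviour can be illustrated with a small store of per-file decisions. This is a toy model of the idea as described above, not PRCS's actual algorithm or data format; the class name and the file path are hypothetical.

```python
class MergeChoices:
    """Toy store of per-file merge decisions, reused as defaults next time."""

    VALID = {"yours", "theirs", "merge"}

    def __init__(self):
        self._defaults = {}

    def default_for(self, path):
        # No earlier answer means the user has to be asked.
        return self._defaults.get(path)

    def answer(self, path, choice):
        if choice not in self.VALID:
            raise ValueError(f"unknown choice: {choice!r}")
        self._defaults[path] = choice


choices = MergeChoices()
first = choices.default_for("platform/win32.c")   # first merge: nothing remembered
choices.answer("platform/win32.c", "yours")       # user keeps the platform-specific file
second = choices.default_for("platform/win32.c")  # next merge: "yours" is the default
```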