In a Git cherry-pick or rebase merge conflict, how

Posted 2019-01-30 20:15

Question:

In a normal Git merge conflict, the three versions of a file in play for the three-way merge are roughly as follows:

  • LOCAL: the version from my branch
  • REMOTE: the version from the other branch
  • BASE: the version from the common ancestor of the two branches (in particular, the common ancestor of my branch's HEAD and the other branch's HEAD)

When a Git cherry-pick generates a merge conflict, there is no common ancestor, properly speaking, so how are these things determined? The same could be asked about rebase.

Answer 1:

cherry-pick

Unless I have misled myself, if you run "git cherry-pick <commit C>", you get:

  • LOCAL: the commit you're merging on top of (i.e. the HEAD of your branch)
  • REMOTE: the commit you're cherry-picking (i.e. <commit C>)
  • BASE: the parent of the commit you're cherry-picking (i.e. C^, the parent of C)

If it's not immediately clear why BASE should be C^, see the "why" section below.

Meanwhile, let's take an example and see that BASE can be, but often won't be, a common ancestor during a cherry-pick. Suppose the commit graph looks like this:

E <-- master
|
D 
| C <-- foo_feature(*)
|/
B
|
A

and you are on branch foo_feature (hence the asterisk). If you do "git cherry-pick <commit D>", then BASE for that cherry-pick will be commit B, which is a common ancestor of C and D. (C will be LOCAL and D will be REMOTE.) However, if you instead do "git cherry-pick <commit E>", then BASE will be commit D, which is not an ancestor of C. (C will be LOCAL and E will be REMOTE.)
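One way to check the second case is to rebuild the example graph in a throwaway repository and look at the index stages Git records for the conflicted file: stage 1 is what merge tools receive as BASE, stage 2 as LOCAL, and stage 3 as REMOTE. The following is a minimal sketch rather than a definitive procedure. The file name file.txt, the commit helper, and the placeholder identity are assumptions made for illustration.

```shell
# Build the example graph in a throwaway repository.
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master      # fix the branch name regardless of init.defaultBranch
git config user.name "Example"               # placeholder identity (assumption)
git config user.email "example@example.com"
git config commit.gpgsign false

commit() {  # hypothetical helper: write $2 into file.txt and commit it with message $1
    printf '%s\n' "$2" > file.txt
    git add file.txt
    git commit -q -m "$1"
}

commit A "version A"
commit B "version B"
git branch foo_feature                       # foo_feature forks from B
commit D "version D"
commit E "version E"                         # master now points at E
git checkout -q foo_feature
commit C "version C"

# Cherry-pick E (the tip of master) onto C; a conflict is expected here.
git cherry-pick master >/dev/null 2>&1 || true

# Index stages of the conflicted path: 1 = BASE, 2 = LOCAL, 3 = REMOTE.
base=$(git show :1:file.txt)
local_side=$(git show :2:file.txt)
remote_side=$(git show :3:file.txt)
echo "BASE=$base LOCAL=$local_side REMOTE=$remote_side"
```

This prints "BASE=version D LOCAL=version C REMOTE=version E", matching the labels described above: BASE is D, the parent of the picked commit, even though D is not an ancestor of C.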

rebase

For background context, rebase is approximately iterated cherry-picking. In particular, rebasing topic on top of master (i.e. "git checkout topic; git rebase master") means approximately:

git checkout master                    # switch to master's HEAD commit
git checkout -b topic_rebased          # create a new branch rooted there
for C in $(git rev-list --reverse master..topic); do   # each topic commit not already in master, oldest first
    git cherry-pick "$C"               # bring it over to the new branch
done
git branch -f topic topic_rebased      # finally, forget what "topic" used to mean and redefine it as the HEAD of topic_rebased
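The approximation can be sanity-checked by replaying a small branch by hand and comparing the resulting tree with what "git rebase" produces. This is only a sketch under simple assumptions (linear history, no conflicts, no merge commits). The add_file helper, the file names, and the placeholder identity are hypothetical.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master      # fix the branch name regardless of init.defaultBranch
git config user.name "Example"               # placeholder identity (assumption)
git config user.email "example@example.com"
git config commit.gpgsign false

add_file() {  # hypothetical helper: create a file named $1 and commit it
    echo "$1" > "$1"
    git add "$1"
    git commit -q -m "add $1"
}

add_file root
git branch topic
add_file m1                                  # master moves ahead of the fork point
git checkout -q topic
add_file t1
add_file t2

# Manual replay: cherry-pick each topic commit onto a new branch rooted at master.
git checkout -q -b topic_rebased master
for c in $(git rev-list --reverse master..topic); do
    git cherry-pick "$c" >/dev/null
done
manual_tree=$(git rev-parse 'HEAD^{tree}')

# The real thing: rebase topic onto master.
git checkout -q topic
git rebase -q master
rebased_tree=$(git rev-parse 'HEAD^{tree}')

[ "$manual_tree" = "$rebased_tree" ] && echo "same tree"
```

Only the trees are compared, because the commit hashes differ whenever the committer timestamps differ.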

The labels that apply during this process are extensions of the normal cherry-pick rules:

  • LOCAL: the commit you're cherry-picking on top of
    • This is the HEAD of the new topic_rebased branch
    • For the first commit only, this will be the same as the HEAD of master
  • REMOTE: the commit you're cherry-picking (i.e. <commit C>)
  • BASE: the parent of the commit you're cherry-picking (i.e. C^, the parent of C)

This implies something to keep in mind about LOCAL vs REMOTE, if you want to avoid confusion:

Even though you were on branch topic when you initiated the rebase, LOCAL never refers to a commit on the topic branch while a rebase is in progress. Instead, LOCAL always refers to a commit on the new branch being created (topic_rebased).

(If one fails to keep this in mind, then during a nasty merge one may start asking oneself, "Wait, why is it saying these are local changes? I swear they were changes made on master, not on my branch.")

To be more concrete, here is an example:

Say we have commit graph

D <-- foo_feature(*)
|
| C <-- master
B |
|/
|
A

and we are currently on branch foo_feature (indicated by "*"). If we run "git rebase master", the rebase will proceed in two steps:

First, changes from B will be replayed on top of C. During this, C is LOCAL, B is REMOTE, and A is BASE. Note that A is a real common ancestor of B and C. After this first step, you have a graph approximately like so:

   B' <-- foo_feature
D  |
|  |
|  C <-- master
B /
|/
|
A

(In real life, B and D might have already been pruned out of the tree at this point, but I'm leaving them in here, in order to make it easier to spot any potential common ancestors.)

Second, changes from D will be replayed on top of B'. During this, B' is LOCAL, D is REMOTE, and B is BASE. Note that B is not a relevant common ancestor of anything. (For example, it's not a common ancestor of the current LOCAL and REMOTE, B' and D. And it's not a common ancestor of the original branch heads, C and D). After this step, you have a branch approximately like so:

   D' <-- foo_feature
   |
   B'
D  |
|  |
|  C <-- master
B /
|/
|
A

For completeness, note that by the end of the rebase B and D are no longer reachable from any branch, so the graph is effectively:

D' <-- foo_feature
|
B'
|
C <-- master
|
A
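The second step of this walkthrough can be reproduced in a throwaway repository. The sketch below is an illustration under assumptions, not the only way to set it up: the file layout (a "top" line and a "bottom" line separated by padding), the write helper, and the placeholder identity are all hypothetical. B changes only the bottom line, while C and D both change the top line, so B replays cleanly and D conflicts.

```shell
repo=$(mktemp -d) && cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master      # fix the branch name regardless of init.defaultBranch
git config user.name "Example"               # placeholder identity (assumption)
git config user.email "example@example.com"
git config commit.gpgsign false

write() {  # hypothetical helper: set top=$1 and bottom=$2 in file.txt, commit with message $3
    printf 'top=%s\npad1\npad2\npad3\nbottom=%s\n' "$1" "$2" > file.txt
    git add file.txt
    git commit -q -m "$3"
}

write a a A
git branch foo_feature
write c a C                                  # master: C changes the top line
git checkout -q foo_feature
write a b B                                  # B changes the bottom line
write d b D                                  # D changes the top line
old_B=$(git rev-parse HEAD~1)                # remember the original B

# B replays cleanly onto C; replaying D then stops with a conflict.
git rebase master >/dev/null 2>&1 || true

base=$(git show :1:file.txt)                       # BASE: stage 1
local_top=$(git show :2:file.txt | head -n 1)      # LOCAL: stage 2
remote_top=$(git show :3:file.txt | head -n 1)     # REMOTE: stage 3
echo "LOCAL has $local_top, REMOTE has $remote_top"
[ "$base" = "$(git show "$old_B":file.txt)" ] && echo "BASE matches the original B"
```

LOCAL carries "top=c", which is the change made on master, while REMOTE carries "top=d" from foo_feature. This is the reversal of intuition described above.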

Why is BASE defined as it is?

As noted above, both for a cherry-pick and for a rebase, BASE is the parent (C^) of the commit C being pulled in. In the general case C^ isn't a common ancestor, so why call it BASE? (In a normal merge BASE is a common ancestor. And part of git's success in merging is due to its ability to find a good common ancestor.)

Essentially, one does this as a way to implement "patch" functionality via the normal three-way merge algorithm. In particular you get these "patchy" properties:

  • If <commit C> doesn't modify a given region of the file, then the version of that region from your branch will prevail. (That is, regions that the "patch" doesn't call for changing don't get patched.)
  • If <commit C> modifies a given region of the file and your branch leaves that region alone, then the version of that region from <commit C> will prevail. (That is, regions that the "patch" calls for changing get patched.)
  • If <commit C> modifies a given region of the file but your branch has also modified that region, then you get a merge conflict.
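These three properties can be seen in isolation with "git merge-file", which runs a three-way merge on plain files without needing a repository. The sketch below is illustrative only. The file names (base, remote, local, local2) and their contents are assumptions standing in for C^, C, and two possible states of your branch.

```shell
dir=$(mktemp -d) && cd "$dir"

printf 'alpha=1\npad1\npad2\npad3\nbeta=1\n' > base     # C^: parent of the picked commit
printf 'alpha=1\npad1\npad2\npad3\nbeta=2\n' > remote   # C: the commit modifies only beta
printf 'alpha=9\npad1\npad2\npad3\nbeta=1\n' > local    # your branch modified only alpha

# Usage: git merge-file -p <current> <base> <other>; -p prints the result to stdout.
merged=$(git merge-file -p local base remote)
echo "$merged"

# Now suppose your branch had also modified beta: both sides touch the same region.
printf 'alpha=1\npad1\npad2\npad3\nbeta=7\n' > local2
conflicts=0
git merge-file -p local2 base remote >/dev/null 2>&1 || conflicts=$?
echo "conflicts: $conflicts"                 # the exit status is the number of conflicts
```

In the first merge the result keeps "alpha=9" from your branch (the commit did not touch it) and takes "beta=2" from the commit (your branch did not touch it). The second merge reports one conflict.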