Find first common child of two commits

Posted 2020-03-15 05:19

Question:

           :
           A
T         / \
i        B   C
m        :   :
e        D   E
          \ /
|          F
V          :

git merge-base B E finds the common ancestor A of the two commits. Is there a way to find the commit F where the two branches are merged again?

Answer 1:

Oops. Didn't read that carefully enough.

The only information in a commit is the id of its parent (or parents). You cannot get to a child from a parent commit (this is the directed part of the repository being a DAG).
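Git can still work out children for the commits it visits: git rev-list --children walks back from the starting points you give it and prints each commit followed by its children among the commits it walked. Here is a minimal sketch, assuming the children you care about are reachable from some ref (--all starts from every ref plus HEAD). children_of is a hypothetical helper name, not a Git command:

```shell
# children_of COMMIT: print the children of COMMIT that are reachable from
# any ref or HEAD. Hypothetical helper; children no ref can reach are missed.
children_of() {
    local c
    c=$(git rev-parse --verify -q "$1^{commit}") || return 1
    # --children prints "commit child1 child2 ..." for every commit walked;
    # pick the line for COMMIT and print the remaining fields.
    git rev-list --all --children |
        awk -v c="$c" '$1 == c { for (i = 2; i <= NF; i++) print $i }'
}
```

For example, children_of A lists B and C in the question's graph, as long as some branch still points at or after each of them.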

On a closer look, the --ancestry-path option for git log can do this. For instance, given:

* 85d26ab When compiling vim, also compile & install gvim
*   3146e5d Merge remote-tracking branch 'origin/devel' into deve
|\
| * 28d08e5 rebasing-merge: specify all commits explicitly
* | 006d11d Help 'file' find its magic file
|/
* e68531d (tag: Git-1.7.6-preview20110720) Update submodules

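we can list the commits that descend from B on the way to a later commit, such as the tip of the branch that contains the merge, using

git log --oneline --ancestry-path B..branch

(--ancestry-path only keeps commits that are descendants of the left-hand commit and ancestors of the right-hand one, so B..E by itself is empty when E is not a descendant of B). If you then reverse this list and pick off the first commit that also has E as an ancestor, that is F. In the graph above, the first descendant of 28d08e5 is already the merge, so taking the first line is enough:

git rev-list --reverse --ancestry-path 28d08e5..85d26ab | head -1

In my case that returns 3146e5d.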
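The same idea can be written as a small shell function that does not rely on the merge being the very first descendant. It walks the descendants of one commit oldest-first toward a tip and stops at the first one that also has the other commit as an ancestor. This is a sketch: first_common_child is a hypothetical name, TIP is assumed to be a branch (or any commit) that already contains the merge, and, as with merge bases, there may be other equally good answers that it does not report.

```shell
# first_common_child X Y TIP: print the first commit, walking oldest-first
# from X toward TIP, that also has Y as an ancestor. Hypothetical helper.
first_common_child() {
    local x y c
    x=$(git rev-parse --verify -q "$1^{commit}") || return 1
    y=$(git rev-parse --verify -q "$2^{commit}") || return 1
    # --reverse --topo-order lists parents before their children, so the
    # first hit has no earlier common descendant on this path.
    for c in $(git rev-list --reverse --topo-order --ancestry-path "$x..$3"); do
        if git merge-base --is-ancestor "$y" "$c"; then
            echo "$c"
            return 0
        fi
    done
    return 1   # nothing between X and TIP descends from Y
}
```

For the question's graph this would be first_common_child B E branch, where branch is any branch that contains F.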



Answer 2:

There isn't necessarily a unique answer to this problem, so you have to decide on a few constraints and/or heuristics, or accept the possibility of more than one "downstream" merge. The heart of the problem is the same as the problem of multiple merge base candidates: use git merge-base --all to list them all; otherwise Git just picks whichever one pops up first in its algorithm. We can do the same, or find all of the best merge candidates.

You've drawn what I usually prefer to render sideways as, e.g.:

  B--...--D
 /         \
A           F--G--H   <-- branch1
 \         /
  C--...--E   <-- branch2

but we might have this:

  B--C---D--E--...   <-- branch1
 /    \ /
A      X
 \    / \
  F--G---H--I--...   <-- branch2

In this case both merges D and H are equally good candidates for "the place where the branches re-merge" if you allow both branch1 and branch2 to be considered. Even if you don't, if branch2 merges back into branch1 later:

  B--C---D--E---J--...   <-- branch1
 /    \ /      /
A      X      /
 \    / \    /
  F--G---H--I--...   <-- branch2

then just starting from (or ending at) branch1, both D and H are equally good candidates.

In any case, what we need here is to enumerate the commits that lead to one or all of the branches you want to consider. To do that, we can use, e.g.:

git rev-list --ancestry-path ^B ^E branch1 branch2

This finds commits that are ancestors of branch1 or branch2, and are also descendants of commit B or of commit E.

To really get the right answer, we want to add --children. That way we'll get the hash ID of each commit, along with those of its children that go in this same direction. Git implements --children by inverting the child-to-parent links as it traverses them, which is good enough; but we won't see commits B or E, since they are excluded from the walk. This is kind of a problem. To get them shown, we can add --boundary. This is not ideal: --boundary sometimes includes some commits we don't want. Fortunately, boundary commits are all prefixed with -, so we can drop the extra ones by knocking out any that aren't the commits we care about.

I'm not going to show any of that, but if you did, you would now have a list, one entry per line, of each node (vertex) and the edges that connect it to its children. You can now ask: what is the LCA of the DAG formed by these (V, E) sets?
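For anyone who does want to see it, here is one way to produce that list. It is a sketch that assumes rev-list prints boundary commits as "-<hash> <children...>"; list_child_edges is a hypothetical helper name. It keeps every non-boundary line and only those boundary lines that belong to the two starting commits:

```shell
# list_child_edges X Y TIP...: print "commit child..." lines for commits that
# descend from X or Y and lead to a TIP, plus X and Y themselves.
# Hypothetical helper name.
list_child_edges() {
    local x y
    x=$(git rev-parse --verify -q "$1^{commit}") || return 1
    y=$(git rev-parse --verify -q "$2^{commit}") || return 1
    shift 2
    git rev-list --children --boundary --ancestry-path "^$x" "^$y" "$@" |
    while read -r id rest; do
        case $id in
        -*) # boundary commit: keep it only if it is X or Y
            id=${id#-}
            [ "$id" = "$x" ] || [ "$id" = "$y" ] || continue ;;
        esac
        echo "$id${rest:+ $rest}"
    done
}
```

Each output line is a vertex followed by the vertices it has edges to, which is the (V, E) input an LCA routine would need.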

It would be nice if we could just use Git's LCA algorithm, but Git has no way to invoke it on arbitrary graphs: we can only invoke it on commits, and actual commits record their parents, not their children. So you will have to write your own. See Algorithm to find lowest common ancestor in directed acyclic graph? (which, unfortunately, has no accepted answer). This algorithm looks correct at first blush; it uses one of the two standard definitions of LCA in a graph.
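If the range is small, you can sidestep a general LCA routine and let Git answer the ancestry questions directly. The sketch below swaps in a brute-force approach rather than a proper LCA algorithm: it collects the commits that descend from both starting commits and keeps those that have no other such commit as an ancestor, so when there are several equally good answers (like D and H above) it prints all of them. lowest_common_children is a hypothetical helper name, and the quadratic number of git merge-base --is-ancestor calls makes it suitable only for modest ranges:

```shell
# lowest_common_children X Y TIP...: print every commit that descends from
# both X and Y, leads to a TIP, and has no other such commit as an ancestor.
# Hypothetical helper; brute force, not a real LCA algorithm.
lowest_common_children() {
    local x y c d both lowest
    x=$(git rev-parse --verify -q "$1^{commit}") || return 1
    y=$(git rev-parse --verify -q "$2^{commit}") || return 1
    shift 2
    # Common descendants: on the ancestry path from X or Y, descended from both.
    both=$(git rev-list --ancestry-path "^$x" "^$y" "$@" |
        while read -r c; do
            if git merge-base --is-ancestor "$x" "$c" &&
               git merge-base --is-ancestor "$y" "$c"; then
                echo "$c"
            fi
        done)
    # Keep the minimal ones: no other common descendant is their ancestor.
    for c in $both; do
        lowest=yes
        for d in $both; do
            if [ "$d" != "$c" ] && git merge-base --is-ancestor "$d" "$c"; then
                lowest=no
                break
            fi
        done
        if [ "$lowest" = yes ]; then echo "$c"; fi
    done
}
```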

If we're willing to settle for a not-nearly-as-good answer, though, we can get something that's probably sufficient in most cases by adding --topo-order (to make sure parents come out after all their children) and --merges (to omit everything that's not a merge commit). This will get a list of all merges.

I made a test repository here with a simple case:

$ git log --all --decorate --oneline --graph
* 91fcef6 (HEAD -> master) J
* d1e5905 I
*   5bf18a0 merge
|\  
| * 49b2ba7 (sidebr) D
| * 725e5ea C
| * 36b830d (tag: B) B
* | 198a982 (tag: G) G
* | 216bc01 F
* | e905e59 E
|/  
* 5df9428 initial
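If you want to experiment with the same shape of history, the following sketch builds it from empty commits. make_test_repo is a hypothetical helper name, it assumes your Git identity (user.name and user.email) is configured, and the hashes will differ from the ones shown above:

```shell
# make_test_repo DIR: build a repository shaped like the graph above.
# Hypothetical helper; runs in a subshell so the caller's cwd is unchanged.
make_test_repo() (
    set -e
    git init -q "$1"
    cd "$1"
    git symbolic-ref HEAD refs/heads/master
    git commit -q --allow-empty -m initial
    git checkout -q -b sidebr          # side branch: B, C, D
    git commit -q --allow-empty -m B
    git tag B
    git commit -q --allow-empty -m C
    git commit -q --allow-empty -m D
    git checkout -q master             # main line: E, F, G
    git commit -q --allow-empty -m E
    git commit -q --allow-empty -m F
    git commit -q --allow-empty -m G
    git tag G
    git merge -q --no-ff -m merge sidebr
    git commit -q --allow-empty -m I
    git commit -q --allow-empty -m J
)
```

After make_test_repo /tmp/demo, running the rev-list command below inside /tmp/demo should print the single merge commit.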

Since B and G are tagged, I can name those commits simply as B and G, and the branch I want for "move in this direction" is just master. So:

$ git rev-list --topo-order --merges --ancestry-path ^B ^G master
5bf18a0797dfd78107928a9a4095f357cfabe914

The last line here is the merge that's "closest" to the two commits. In this case, that's also the only line, and that's the merge we want.

The flaw here is clear enough once we draw it. Suppose I had a more complex graph, such as:

      I--J
     /    \
    H      M--N
   / \    /    \
  /   K--L      \
 /               \
A                 P--Q  <-- master
 \               /
  \   C--D      /
   \ /    \    /
    B      G--O
     \    /
      E--F

If I now run git rev-list --topo-order --merges --ancestry-path ^B ^H master, I'll enumerate commit P, then both G and M in some order. So the last line will either be commit G or commit M, and while both of these are merges, they don't meet the right criterion: they don't merge B and H. Only commit P does that.

Hence, to check whether you have a right answer (without handling the multiple-LCA issue), you should take each of the output lines from this git rev-list command, probably in reverse order (consider adding --reverse), and check whether both commits are ancestors of it. "Internal" merges like G and M will have only one of the two commits as an ancestor. To do the is-ancestor test, use git merge-base --is-ancestor:

if git merge-base --is-ancestor "$commit1" "$mergecommit" &&
       git merge-base --is-ancestor "$commit2" "$mergecommit"; then
    echo "$mergecommit"   # we've found a correct candidate
else
    :   # not a correct candidate; move on to another one
fi


Answer 3:

Adapt all.awk from this answer to also carry the line number for each ref, then when you've encountered both parents look at the refs they have in common.