I have to create code reviews from unmerged branches.
In looking for solutions, let's leave local branches out of it, as this will run on a server: there will be just the origin remote, I will always run git fetch origin before any other command, and when we talk about branches, we mean origin/branch-name.
If the setup were simple and each branch that originated from master continued on its own way, we could just run:
git rev-list origin/branch-name --not origin/master --no-merges
for each unmerged branch and add the resulting commits to each review per branch.
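As a rough illustration of this simple case, here is a minimal Python sketch that runs the command above for one branch. The helper names (rev_list_args, unmerged_commits) and the injectable run parameter are assumptions made for the example, not part of Git.

```python
import subprocess


def rev_list_args(branch, base="origin/master"):
    """Build the rev-list command from the question for one branch."""
    return ["git", "rev-list", branch, "--not", base, "--no-merges"]


def unmerged_commits(branch, base="origin/master", run=subprocess.check_output):
    """Return the hashes reachable from `branch` but not from `base`.

    `run` is injectable so the function can be exercised without a
    repository; by default it shells out to git.
    """
    output = run(rev_list_args(branch, base), text=True)
    return output.split()
```

Calling unmerged_commits("origin/branch-name") once per unmerged branch gives the per-branch lists, but a commit that reached several branches through merges will show up in more than one of them.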
The problem arises when there are merges between two or three branches and work continues on some of them. As I said, I want to create a code review for each branch programmatically, and I don't want to include a commit in multiple reviews.
The problem mainly reduces to finding the original branch of each commit.
Or, to put it more simply: finding all unmerged commits, grouped by the branch they were most probably created on.
Let's focus on a simple example:
* b4 - branch2's head
* | a4 - branch1's head
| * b3
* | merge branch2 into branch1
* |\ | m3 - master's head
| * \| a3
| | |
| | * b2
| * | merge master into branch1
* /| | m2
|/ | * merge branch1 into branch2
| * /| a2
| |/ |
| | * b1
| | /
| |/
| /|
|/ |
| * a1
* / m1
|/
|
* start
and what I want to obtain is:
- branch1: a1, a2, a3, a4
- branch2: b1, b2, b3, b4
The best solution I found so far is to run:
git show-branch --topo-order --topics origin/master origin/branch1 origin/branch2
and parse the result:
* [master] m3
 ! [branch1] a4
  ! [branch2] b4
---
  + [branch2] b4
  + [branch2^] b3
 +  [branch1] a4
 ++ [branch2~2] b2
 -- [branch2~3] Merge branch 'branch1' into branch2
 ++ [branch2~4] b1
 +  [branch1~2] a3
 +  [branch1~4] a2
 ++ [branch1~5] a1
*++ [branch2~5] m1
Output interpretation is like this:
- the first n lines are the n branches analyzed
- one separator line of dashes
- one line for each commit, with a plus (or a minus for merge commits) in the n-th column if that commit is on the n-th branch
- the last line is the merge base for all branches analyzed
For point 3, the commit name starts with a branch name and, from what I see, this branch corresponds to the branch the commit was created on, probably because name resolution prefers paths reached via the first parent.
As I'm not interested in merge commits, I'll ignore them.
I'll then resolve each branch-relative commit name to its hash with rev-parse.
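The parsing described above could look roughly like the following Python sketch. It assumes the output keeps its column alignment (one flag column per branch, in header order) and that the first branch listed is the base (master); the function name group_topic_commits is made up for this example.

```python
import re
from collections import defaultdict


def group_topic_commits(show_branch_output):
    """Group commit names from `git show-branch --topics` by leading branch name."""
    lines = show_branch_output.splitlines()
    # The separator is the first line made only of dashes; every line
    # before it is a branch header, so its index is the number of branches.
    sep = next(i for i, line in enumerate(lines) if line and set(line) == {"-"})
    width = sep
    groups = defaultdict(list)
    for line in lines[sep + 1:]:
        flags, rest = line[:width], line[width:].strip()
        match = re.match(r"\[([^\]]+)\]", rest)
        if match is None:
            continue
        if "-" in flags:
            continue  # merge commits are marked with '-', skip them
        if flags[:1] != " ":
            continue  # also reachable from the base branch (e.g. the merge base)
        name = match.group(1)
        # Everything before the first '~' or '^' is the branch name.
        branch = re.split(r"[~^]", name, maxsplit=1)[0]
        groups[branch].append(name)
    return dict(groups)
```

Each collected name (for example branch2~4) can then be passed to git rev-parse to obtain the commit hash.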
How can I handle this situation?
The repository could be cloned with --mirror, which creates a bare repository that can be used as a mirror of the original repository and can be updated with git remote update --prune, after which all the tags should be deleted for this feature.
I implemented it this way:
1. get a list of branches not merged into master
2. for each branch get a list of revisions on that branch and not in master branch
If the list is empty, remove the branch from the list of branches
3. for each revision, determine the original branch with
and match the result against the regex
^([^\~\^]*)([\~\^].*)?$
The first group is the branch name; the second is the relative path from the branch. If the branch name found is not equal to the initial branch, remove the revision from the list.
At the end I obtained a list of branches and for each of them a list of commits.
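Step 3 can be sketched in Python as below. This assumes each revision has already been given a branch-relative name such as branch1~5 (for instance by a naming tool like git name-rev; the exact command is not shown above). The function name keep_own_revisions and the (revision, name) pair format are hypothetical.

```python
import re

# The regex from step 3: group 1 is the branch name, group 2 the relative path.
NAME_RE = re.compile(r"^([^\~\^]*)([\~\^].*)?$")


def keep_own_revisions(branch, named_revisions):
    """Keep the revisions whose resolved name starts with `branch`.

    `named_revisions` is an iterable of (revision_hash, name) pairs,
    where `name` looks like "branch1", "branch1^" or "branch1~5".
    """
    kept = []
    for revision, name in named_revisions:
        match = NAME_RE.match(name)
        if match and match.group(1) == branch:
            kept.append(revision)
    return kept
```

Running this once per unmerged branch leaves every commit in at most one list, provided each commit resolves to a single name.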
After some more bash research, it can be done all in one line with:
The result is an output in the form
which can be read, parsed, ordered, grouped, or whatever.
I would suggest doing it kind of the way you described it. But I would work on the output of
git log --format="%H:%P:%s" ^origin/master origin/branch1 origin/branch2
so you can do better tree-walking. Walk the graph starting from the branch heads (get their hashes with git rev-parse). Mark every commit with the names of the head you came from and its distance, keeping a commit -> known-name mapping as you go.
Now for each of your commits, you will have a list of distance values (that might be negative) to your branch heads. For each commit, the branch with the least distance is the one the commit was most likely created on.
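A simplified Python sketch of this distance idea follows. It is not the linked script: the cost model (one unit per step, plus a penalty of 100 for following a non-first parent) is an assumption chosen for illustration, and the names parse_log and assign_commits are made up.

```python
import heapq


def parse_log(text):
    """Parse `git log --format="%H:%P:%s"` output into {hash: [parent hashes]}."""
    parents = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        commit, parent_field, _subject = line.split(":", 2)
        parents[commit] = parent_field.split()
    return parents


def assign_commits(parents, heads, side_penalty=100):
    """Assign each commit to the head with the smallest walking distance.

    `heads` maps branch name -> head hash. Parents missing from `parents`
    (for example commits already on master) are not visited.
    """
    best = {}  # commit -> (distance, branch)
    for branch, head in heads.items():
        if head not in parents:
            continue
        dist = {head: 0}
        queue = [(0, head)]
        while queue:
            d, commit = heapq.heappop(queue)
            if d > dist[commit]:
                continue  # stale queue entry
            for index, parent in enumerate(parents[commit]):
                if parent not in parents:
                    continue
                # Assumed cost: first-parent steps are cheap, merged-in sides are not.
                step = 1 + (side_penalty if index > 0 else 0)
                if d + step < dist.get(parent, float("inf")):
                    dist[parent] = d + step
                    heapq.heappush(queue, (d + step, parent))
        for commit, d in dist.items():
            if commit not in best or (d, branch) < best[commit]:
                best[commit] = (d, branch)
    return {commit: branch for commit, (_d, branch) in best.items()}
```

The penalty makes a commit that is only reachable through a merged-in side look far away from that head, so it is attributed to the branch that reaches it along first parents.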
If you have time, you might want to walk the whole history and then subtract the history of master; that might give slightly better results if your branches have been merged into master before.
Couldn't resist: I made a Python script that does what I described, with one change: with every normal step, the distance is not increased but decreased. This has the effect that branches that lived longer after a merge point are preferred, which I personally like more. Here it is: https://gist.github.com/Chronial/5275577
Usage: simply run
git-annotate-log.py ^origin/master origin/branch1 origin/branch2
and check the quality of the results (it will output a git log tree with annotations).
If I grasp your problem space, I think you can use --sha1-name to list what you are interested in, then run the commits through git-what-branch and format the report to suit your needs?
There is no correct answer to this question because it is underspecified.
Git history is simply a directed acyclic graph (DAG), and it's generally impossible to determine semantic relationships between two arbitrary nodes in a DAG unless the nodes are sufficiently labeled. Unless you can guarantee that the commit messages in your example graph follow a reliable, machine-parseable pattern, the commits are not sufficiently labeled—it's impossible to automatically identify the commits you are interested in without additional context (e.g., guarantees that your developers follow certain best practices).
Here's an example of what I mean. You say that commit a1 is associated with branch1, but this can't be determined with certainty just by looking at the nodes of your example graph. It's possible that once upon a time your example repository history looked like this:
Note that branch1 doesn't even exist yet in the above graph. The above graph could have arisen from the following sequence of events:
1. branch2 is created at start in the shared repository
2. user#1 creates a1 on his/her local branch2 branch
3. user#2 creates m1 and b1 on his/her local branch2 branch
4. user#1 pushes his/her local branch2 branch to the shared repository, causing the branch2 ref in the shared repository to point to a1
5. user#2 tries to push his/her local branch2 branch to the shared repository, but this fails with a non-fast-forward error (branch2 currently points to a1 and can't be fast-forwarded to b1)
6. user#2 runs git pull, merging a1 into b1
7. user#2 runs git commit --amend -m "merge branch1 into branch2" for some inexplicable reason
Some time later, user#1 creates branch1 off of a1 and creates a2, while user#2 fast-forward merges m1 into master, resulting in the following commit history:
Given that this sequence of events is technically possible (although unlikely), how can a human, let alone Git, tell you which commits "belong" to which branch?
Parsing Merge Commit Messages
If you can guarantee that users don't change merge commit messages (they always accept the Git default), and that Git has never and will never change the default merge commit message format, then the merge commit's commit message can be used as a clue that a1 started off on branch1. You'll have to write a script to parse the commit messages; there are no simple Git one-liners to do this for you.
If Merges are Always Intentional
Alternatively, if your developers follow best practices (each merge is intentional and is meant to bring in a differently-named branch, resulting in a repository without those stupid merge commits created by git pull), and you are not interested in the commits from a completed child branch, then the commits you're interested in are on the first-parent path. If you know which branch is the parent of the branch you are analyzing, you can do the following:
git rev-list --first-parent --no-merges parent-branch-ref..branch-ref
This command lists the SHA1 identifiers for the commits that are reachable from branch-ref, excluding the commits reachable from parent-branch-ref and the commits that were merged in from child branches.
In your example graph above, assuming parent order is determined by your annotations and not by the order of the lines going into a merge commit, git rev-list --first-parent --no-merges master..branch1 would print the SHA1 identifiers for commits a4, a3, a2, and a1 (in that order; use --reverse if you want the opposite order), and git rev-list --first-parent --no-merges master..branch2 would print the SHA1 identifiers for commits b4, b3, b2, and b1 (again, in that order).
If Branches Have Clear Parent/Child Relationships
If your developers do not follow best practices and your branches are littered with those stupid merges created by git pull (or an equivalent operation), but you have clear parent/child branch relationships, then writing a script to perform the following algorithm may work for you:
1. Find all commits reachable from the branch of interest excluding all commits from its parent branch, its parent's parent branch, its parent's parent's branch, etc., and save the results. For example:
2. Do the same for all child, grandchild, etc. branches of the branch of interest. For example, assuming branch2 is considered to be a child of branch1:
3. Filter out the results of step #2 from the results of step #1. For example:
The trouble with this approach is that once a child branch is merged into its parent, those commits are considered to be part of the parent even if development on the child branch continues. Although this makes sense semantically, it does not produce the result you say you want.
Some Best Practices
Here are some best practices to make this particular problem easier to solve in the future. Most if not all of these can be enforced via clever use of hooks in the shared repository.
git pull
--no-ff
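. If it does have children branches, you can still rebase, but please preserve the --no-ff merges of the children branches (this is trickier than it should be).
If all of your developers follow these rules, then a simple: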
merges of the children branches (this is trickier than it should be).If all of your developers follow these rules, then a simple:
is all you need to see the commits that were made on that branch minus the commits made on its children branches.