How does --no-ff merge break bisect and blame?

2020-04-09 19:11发布

问题:

Understanding the Git Workflow article says,

So you add a new rule: “When you merge in your feature branch, use –no-ff to force a new commit.” This gets the job done, and you move on.

Then one day you discover a critical bug in production, and you need to track down when it was introduced. You run bisect but keep landing on checkpoint commits. You give up and investigate by hand.

You narrow the bug to a single file. You run blame to see how it changed in the last 48 hours. You know it’s impossible, but blame reports the file hasn’t been touched in weeks. It turns out blame reports changes for the time of the initial commit, not when merged. Your first checkpoint commit modified this file weeks ago, but the change was merged in today.

The no-ff band-aid, broken bisect, and blame mysteries are all symptoms that you’re using a screwdriver as a hammer.

git merge --no-ff is a case when you prevent fast-forward merge explicitly. But, if one commit is not the direct ancestor of another, fast-forward doesn't even take place. It's a rare scenario in development. In other words, most of the merges are non-fast-forward type. Then how does passing --no-ff break the functionality of bisect and blame?

回答1:

tl;dr

Merge commits and --no-ff don't have anything to do with git bisect or git blame per se.

Mixing public and private histories is what makes it harder to make sense of the changes introduced by different commits. This becomes painfully obvious when using git bisect and git blame for debugging.

Merge commits and git bisect

As @torek said, forcing the creation of merge commits by passing --no-ff doesn't do anything to git bisect.

The real problem comes from polluting the public history of a repository with interim commits – sometimes called checkpoint commits – that were used by the programmers to keep track of their own work locally.

Due to their temporary nature, these commits tend to be inconsistent – potentially introducing errors in the code base – and poorly documented. Landing on such a commit in the middle of a debugging session with git bisect isn't a pleasant experience; the patch could be hard to interpret or – even worse – it could break the code in the working tree.

I think Linus Torvalds said it best:

I want clean history, but that really means (a) clean and (b) history.

Regarding the "clean" part, he goes on to elaborate:

Keep your own history readable.

Some people do this by just working things out in their head first, and not making mistakes. But that's very rare, and for the rest of us, we use "git rebase" etc. while we work on our problems.

Don't expose your crap.

Merge commits and git blame

When it comes to git blame, merge commits do, in fact, have an impact on the results you get.

git blame always shows the original author of a given line of code; in other words who added it to the file. From the point of view of Git, a merge commit doesn't add anything to a file, it simply represents the snapshot of the repository's directories and files that results from combining two or more lines of history. Unless it's an evil merge, that is.

This means that running git blame on a merged file is going to show you the original author of each line, regardless of who did the merge commit.

That's why merge commits can make it harder to track who did a particular change and when. But then again, you shouldn't be merging private commits into public branches anyway, since they won't make sense to anyone else but the original author.



回答2:

Then how does passing --no-ff break the functionality of bisect and blame?

I don't think it does. Moreover, I don't think the blog post you're quoting says it does either. Let me re-excerpt and add a bit of emphasis:

You run bisect but keep landing on checkpoint commits. You give up and investigate by hand.

You've given up on the bisection because of the checkpoint commits. Bisect must test them since they are commits that may have introduced the bug, but because they're checkpoint commits, they're likely not actually test-able and you must tell bisect to skip them. If you were to persist with bisect, it would probably eventually find that the bug is in a commit within the large range, and you'd fix it by fixing the line(s) broken by the large range of commits, which you might examine by doing git diff A B, where A is the commit just before the start of the large range, and B is the commit at the end of the large range. In which case, why not just have one commit going from A to B directly?

(I won't go into git blame here.)

What Ben Sandofsky (author of the blog post) suggests you do here is not so much "avoid --no-ff" but rather: "When preparing a series of small checkpoint commits to be published and pushed back to the actual project, don't just shove all the raw small-commits unchanged directly into the project." He then outlines several options for cleaning them up, the simplest being to squash all the small commits into one single all-in-one commit.

I'll leave his additional options (which make more sense for larger features) to his blog post, but assuming you're looking at many small checkpoint commits that should simply be squashed down to a single commit, it should be pretty trivial for you to do that squash as a single commit that appends to the current history, and therefore there's no need for a branch to hold it and a merge to bring it in. The commands that do the "take big range of commits from private branch and get ready for a new single tip-of-published-branch commit" are git merge --squash, which doesn't make a merge (nor a commit),1 followed by git commit.


1Thus it probably should not be spelled git merge in the first place, but that's git for you: if there were a command git eatcake, it would have one option that eats cake, a second option that just grinds wheat into flour while also separating eggs in case you want to make the cake yourself first, and a third option that tests whether the eggs it would separate, if it were doing the second mode, have gone bad.