Smarter rebase avoiding redundant work?

Posted 2020-06-08 15:18

Question:

One issue I run into with long rebases is having to resolve redundant conflicts. Say I have a branch with a sequence of commits that keeps modifying a function, and the final commit removes the function entirely.

When I run git rebase master, Git naively applies each of the commits in turn. That means I need to resolve conflicts for each of these commits against the tip of master, even though that work is ultimately wasted.

What's a good way to deal with this situation? Perhaps I should just generate a single patch for the whole branch, and apply that against master? If so, is there any way to preserve some history? Thoughts, suggestions, etc.

Answer 1:

You want to use git rerere combined with teaching the rerere database from historical commits using rerere-train.sh (you may already have it at /usr/share/doc/git/contrib/rerere-train.sh). This allows git to automatically use merge conflict resolutions learned from the history.

Warning: you're basically making git rewrite the source code by blindly using historical string replacements to fix the conflicting merge. You should review all conflicting merges after the rebase. I find that gitk works fine for this (it will show only the conflict resolution as the patch for merges). I've had only good experiences with rerere, but you might not be that lucky. Basically, if your history contains broken merges (that is, merges that were done incorrectly and then fixed in later commits), you do not want to train rerere from the history, unless you want similarly broken merges done automatically for you.

Long story short, you just run

git config --global rerere.enabled 1
bash /usr/share/doc/git/contrib/rerere-train.sh --all

followed by the rebase you really want to do and it should just magically work.

After you have enabled rerere globally, you no longer need to learn from the history in the future. Training from history is needed only when the conflicts were already resolved before rerere was enabled.
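To see what rerere does in isolation, here is a minimal sketch in a throwaway repository. The file name, branch name and temp directory are made up for the demo, and rerere is enabled only for that repository rather than globally. It does not use rerere-train.sh; it just resolves one conflict by hand and then repeats the same rebase.

```shell
#!/bin/sh
# Sketch: rerere records one hand-made resolution and replays it later.
# file.txt, "feature" and the temp repo are hypothetical demo names.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is "master"
git config user.name "Demo"
git config user.email "demo@example.com"
git config rerere.enabled true            # repo-local here; the answer uses --global

printf 'line one\n' > file.txt
git add file.txt
git commit -q -m "base"

git checkout -q -b feature
printf 'line one (feature)\n' > file.txt
git commit -q -am "feature change"
orig=$(git rev-parse HEAD)                # remember the un-rebased tip

git checkout -q master
printf 'line one (master)\n' > file.txt
git commit -q -am "master change"

# First rebase: stops on a conflict, which we resolve by hand.
git checkout -q feature
git rebase master >/dev/null 2>&1 || true
printf 'line one (master + feature)\n' > file.txt
git rerere                                # record the resolution explicitly
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1

# Throw the rebased branch away and run the same rebase again.
git checkout -q -B feature "$orig"
git rebase master >/dev/null 2>&1 || true
replayed=$(cat file.txt)                  # working-tree content at the stop
echo "file.txt when the second rebase stops: $replayed"
git add file.txt
GIT_EDITOR=true git rebase --continue >/dev/null 2>&1
```

The second rebase still stops, because rerere only fixes up the working tree; you review the file, git add it and continue. That stop is a natural place to do the review the warning above asks for.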

PS. I found a similar answer to another question: https://stackoverflow.com/a/4155237/334451



Answer 2:

You could use the git rerere feature.

You have to enable it using git config --global rerere.enabled 1. After that, every conflict you resolve gets stored for later use, and the resolution is reapplied whenever the same conflict shows up again.

While you are resolving a conflict, you can check the current state of the resolution with git rerere diff.

Take a look at this tutorial for more information.



Answer 3:

Why not squash the redundant patches together in an initial interactive rebase (re-ordering them first so they are adjacent), so that you clean out the 'modify then delete' parts of the sequence? You can be selective with the hunks within a commit during this stage (e.g. using git gui). This would then give you a better sequence for a final clean rebase.
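As a rough sketch of that clean-up pass, the script below builds a throwaway branch where a helper is added, tweaked and finally removed, with an unrelated commit in between. Normally you would edit the todo list by hand in your editor; here a small awk script (my own stand-in, passed via GIT_SEQUENCE_EDITOR) rewrites the list so the three helper commits sit together and collapse into one. All file names and commit messages are hypothetical.

```shell
#!/bin/sh
# Sketch: re-order and squash "add / tweak / remove" commits before the real rebase.
# File names, commit messages and the todo-rewriting script are hypothetical.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
git commit -q --allow-empty -m "base"

git checkout -q -b feature
echo "api" > api.txt; echo "helper v1" > helper.txt
git add .; git commit -q -m "add API"          # commit 1 (also adds the helper)
echo "feature" > feature.txt
git add .; git commit -q -m "add feature"      # commit 2
echo "helper v2" > helper.txt
git commit -q -am "tweak helper"               # commit 3
git rm -q helper.txt
git commit -q -m "remove helper"               # commit 4

# Stand-in for hand-editing the todo list: Git passes the todo file as $1.
# New order is 1, 3, 4, 2, with commits 3 and 4 folded into 1 as fixups.
editor=$(mktemp)
cat > "$editor" <<'EOF'
awk '/^pick /{ n++; line[n] = $0 }
     END { print line[1]
           for (i = 3; i <= 4; i++) { sub(/^pick/, "fixup", line[i]); print line[i] }
           print line[2] }' "$1" > "$1.new" && mv "$1.new" "$1"
EOF
GIT_SEQUENCE_EDITOR="sh $editor" git rebase -i master >/dev/null 2>&1

# List the commits that remain on the branch, newest first.
git log --format=%s master..feature
```

After this pass the helper never appears in the branch history at all, so a later rebase onto a moved master has nothing helper-related left to conflict with.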



Answer 4:

(This is my second answer to the question. On second reading I think the original problem might have been a bit different from what I first understood.)

I understand the question as you having a development branch parallel to master. Branches like this are usually called feature branches, and I definitely encourage using them.

One should always try to keep feature branches clean. In practice, you want a feature branch whose commits are the ones you would have made if you had never made any mistakes. For me, that means committing a lot and then using git rebase -i to fix the mistakes once I learn about them.

By the time your feature branch is ready, it should look like

  1. Add API to do thing X
  2. Fix existing API Y for corner case Z
  3. Add feature B using X and Y (works in case Z, too!)
  4. Improve feature B: do magic stuff E

Instead of

  1. WIP
  2. WIP2
  3. Add API
  4. Move API to do X
  5. Add feature B
  6. On second thought, rename the parameters for X
  7. Fix feature B
  8. Fix APi for X
  9. Fix corner case Z
  10. Fix corner case Z for API Y, too
  11. do magic stuff E
  12. commit missing fILE

If you then rebase your feature branch onto the latest master, chances are high that only the commit "Fix existing API Y for corner case Z" can cause conflicts. If that commit is a minimal change to the existing API, fixing the conflict should be easy. In addition, the conflict arises only if some other commit has modified exactly the lines touched by your minimal change.

If you use feature branches and rebase them instead of merging, you definitely want to clean them up before rebasing them onto master. Not only will your rebases be easier, but the history will also stay readable in the long run. (My preferred style is to rebase so that a fast-forward would be possible and then do git checkout master && git merge --no-ff feature-branch-x, documenting the whole thing in the merge commit. That keeps the full history of the branch and lets GUI tools easily navigate around the feature if needed.)
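The rebase-then-merge --no-ff style mentioned above can be tried out in a scratch repository. This is only a sketch: the file names and the merge message are invented, and the commit messages are borrowed from the clean example list earlier in this answer.

```shell
#!/bin/sh
# Sketch: rebase a clean feature branch, then record it with a --no-ff merge.
# File names and the merge message are hypothetical.
set -eu

demo=$(mktemp -d)
cd "$demo"
git init -q .
git symbolic-ref HEAD refs/heads/master
git config user.name "Demo"
git config user.email "demo@example.com"
echo "base" > base.txt
git add .; git commit -q -m "base"

git checkout -q -b feature-branch-x
echo "x" > x.txt
git add .; git commit -q -m "Add API to do thing X"
echo "b" > b.txt
git add .; git commit -q -m "Add feature B using X"

# Meanwhile master moves on with unrelated work.
git checkout -q master
echo "other" > other.txt
git add .; git commit -q -m "unrelated work on master"
old_master=$(git rev-parse HEAD)

# Rebase so that a fast-forward would be possible...
git checkout -q feature-branch-x
git rebase -q master

# ...but force a merge commit anyway and describe the feature in it.
git checkout -q master
git merge -q --no-ff -m "Merge feature-branch-x: add feature B" feature-branch-x

git log --graph --oneline
```

Because the branch was rebased first, the merge itself cannot conflict; the merge commit exists only to group the feature's commits and carry the description.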

So in the above example one could run git rebase -i <old-enough-sha1> and then re-order the commits as 3+4+6+8, 10, 1+2+5+7+9, 11+12, where + means squash. Git allows splitting and editing existing commits, too, but it's usually easier to keep commits really small and then squash some of them later. Note that in this example even the original commit number 10 ends up before the original first commit. This is normal and reflects the reality that your implementation was not perfect. That does not need to be stored in version history, though.

In your case, it sounds like you have a feature branch where multiple commits add and remove the same stuff. Squash those commits into a single commit (it may end up as no change, which is okay). Rebase your feature branch onto master only when the feature branch looks clean. Definitely learn to use git gui or some other tool that makes it easy to commit changed lines instead of whole files. Every commit should be a change that modifies a sane collection of stuff. If you add a new feature X, the same commit must not fix existing function Y or add missing documentation about Z, not even if those changes were made to the same file. To me, this is the kind of stuff that Linus Torvalds meant when he said "files do not matter".