Can I safely rebase one branch into other and then

Published 2020-05-06 10:53

Question:

I have two development branches, and I found that branch B depends on code from branch A.

I want to rebase A into B so I can keep on developing B.

Soon A will be merged to master (before B), but not now. Then, when I merge B, will it break the reference from having A rebased on it?

Can I just then rebase B on master and all will be fine or do I need to do any special step?

Answer 1:

Note that git (and pretty much all other version control systems) calls this rebasing onto, not into.

You may indeed do this kind of rebase. However, it's important to know that when you rebase some commit(s), git actually copies them to new commits. This affects everyone who shares the code, if they already have the old commits.

The picture

Let's suppose that what you have right now can be drawn this way:

...--N--O--P--Q--R   <-- master
      \     \
       \     J--K    <-- A
        \
         E--F--G     <-- B

Here branch A points to commit K, branch B points to commit G, and branch master points to commit R. (Each commit has a one-letter code here so that we can identify it more easily; in reality, commit names are SHA-1 hashes like bfcd84f529b....) Commits N through R are therefore on master (along with all commits hidden off to the left of N); commits N, O, P, J, and K (and all commits left of N as before) are on branch A; and commits N, E, F, and G (and all commits left of N as always) are on branch B.
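To make "which commits are on which branch" concrete, here is a minimal Python sketch that models the graph above. The PARENTS and BRANCHES tables and the reachable helper are illustrative names invented for this example, not anything git exposes, and commit N is treated as the root because the commits to its left are not drawn.

```python
# Hypothetical model of the example graph: each commit maps to its parent(s).
# N is treated as the root; the commits to the left of N are omitted.
PARENTS = {
    "N": [], "O": ["N"], "P": ["O"], "Q": ["P"], "R": ["Q"],
    "J": ["P"], "K": ["J"],
    "E": ["N"], "F": ["E"], "G": ["F"],
}
# A branch is just a name pointing at one commit (its tip).
BRANCHES = {"master": "R", "A": "K", "B": "G"}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


for name, tip in BRANCHES.items():
    print(name, sorted(reachable(tip, PARENTS)))
```

A commit is "on" a branch exactly when it can be reached from the branch tip, which is why N belongs to all three branches at once.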

Merge bases

Commit N is the merge base of branches B and master, and commit P is the merge base of branches A and master. The merge base is simply¹ the commit after which the branches diverge, which becomes easy to see if you draw the graph as we just did. This matters for your git rebase because the default method git uses is to find this merge base commit and copy the commits that come after that point. (This also means that before you do a rebase, it may be a good idea to draw the graph, so that you can see what you will be doing.)
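As a rough illustration of the idea, a merge base can be found in a toy model by intersecting the two sets of reachable commits and keeping the "closest" one. This is a sketch under the same assumptions as the drawing (N as root, hypothetical PARENTS table and helper names); real git computes this far more efficiently, and you can ask it directly with git merge-base B master.

```python
# Hypothetical model of the example graph (child -> parents); N is the root.
PARENTS = {
    "N": [], "O": ["N"], "P": ["O"], "Q": ["P"], "R": ["Q"],
    "J": ["P"], "K": ["J"],
    "E": ["N"], "F": ["E"], "G": ["F"],
}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(tip_a, tip_b, parents):
    """Common commits that are not ancestors of any other common commit."""
    common = reachable(tip_a, parents) & reachable(tip_b, parents)
    return {
        c for c in common
        if not any(c != d and c in reachable(d, parents) for d in common)
    }


print(merge_bases("G", "R", PARENTS))  # B vs master
print(merge_bases("K", "R", PARENTS))  # A vs master
print(merge_bases("G", "K", PARENTS))  # B vs A
```

The function returns a set because, as the footnote says, more complex graphs can have more than one merge base; in this graph each pair has exactly one.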

Copying commits

To copy a single commit yourself, use the command git cherry-pick. You don't need to do this now—rebase will do it for you—but it's a good idea to know how it works, so let's cover that briefly.

In git, each commit is a fully self-contained snapshot. For instance, consider commit F on branch B. Commit F contains the complete source tree associated with your project, as of the time you made commit F. This is also true of commits E and G. If we compare commit E vs commit F, we will see what changed between "the tree for commit E" and "the tree for commit F". Similarly, if we compare F vs G, we will find out what changed in going from F to G.

Converting a commit to a set of changes—called a changeset—allows us to copy a commit. Suppose, for instance, we see what happened from N to E. Then suppose further that we check out commit K and make the same change, and then commit it on a new, temporary branch:

...--N--O--P--Q--R   <-- master
      \     \
       \     J--K    <-- A
        \        \
         \        E'   <-- temp branch
          \
           E--F--G   <-- B

I called the new commit E', rather than giving it a new single letter, because it's a copy of commit E. It's not exactly the same as commit E, of course. For one thing, it has a different parent (it has commit K as its parent); and it probably has a bunch of other differences from E: specifically whatever happened in commits O, P, J, and K, since we've now applied "the changes from N to E" onto "what was in K".
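The snapshot-to-changeset idea can be sketched in a few lines of Python. This is a deliberately simplified model: snapshots are dictionaries from file name to content, the file names and contents are made up, and changes are applied to whole files. Real git cherry-pick performs a three-way merge and can stop on conflicts, which this sketch does not attempt.

```python
def diff(old, new):
    """Changeset from snapshot old to snapshot new: files to write, files to delete."""
    changed = {path: text for path, text in new.items() if old.get(path) != text}
    deleted = {path for path in old if path not in new}
    return changed, deleted


def apply_changeset(tree, changeset):
    """Return a new snapshot: tree with the changeset applied (whole files only)."""
    changed, deleted = changeset
    result = {path: text for path, text in tree.items() if path not in deleted}
    result.update(changed)
    return result


# Made-up snapshots for commits N, E, and K.
tree_N = {"core.py": "v1"}
tree_E = {"core.py": "v1", "feature_b.py": "b1"}   # E added feature_b.py
tree_K = {"core.py": "v2", "feature_a.py": "a1"}   # A's work since N

# "What changed from N to E", applied on top of "what was in K", gives E'.
tree_E_prime = apply_changeset(tree_K, diff(tree_N, tree_E))
print(tree_E_prime)
```

E' ends up with B's new file plus everything that was already in K, which is why it differs from the original E even though it carries the same change.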

Rebase is just repeated copies

git rebase works by:

  1. Identifying the commits to copy;
  2. Copying them using a temporary branch; and
  3. Re-pointing the original branch.

If you git checkout B and then git rebase A, the commits identified in step 1 are those after the merge base between B and A, up to the tip of branch B. The A and B here are because your current branch is B and you said git rebase A.

Exercise: look at the graph and identify the merge-base of B and A. It might be a bit hard to spot, but what if we re-draw the graph like this?

             Q--R    <-- master
            /
...--N--O--P--J--K   <-- A
      \
       E--F--G       <-- B

(This drawing is, topologically speaking, exactly the same as the original graph we drew, but now it's super-obvious that commit N is the merge base.)

So the commits to copy are E, F, and G: everything after merge base N, up to the tip of B. Rebase then goes on to step 2 and copies those three commits, placing the copies after ("onto") the tip of branch A (because you said git rebase A).

Thus, at this point in the rebase process, we now have:

             Q--R            <-- master
            /
...--N--O--P--J--K           <-- A
      \           \
       \           E'-F'-G'  <-- temp
        \
         E--F--G             <-- B

There's just one last thing for rebase to do now, which is to re-point the current branch B, to the last commit just copied. That's pretty easy:

             Q--R            <-- master
            /
...--N--O--P--J--K           <-- A
      \           \
       \           E'-F'-G'  <-- B
        \
         E--F--G             [abandoned]

The original commits, E through G, are mostly gone at this point (they're still retained through the reflog for branch B but unless you specifically look there, you won't see them any more). The new copies, E' through G', have been added on just past the tip of branch A.
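The three steps can be put together in a toy model. The sketch below assumes the same hypothetical PARENTS and BRANCHES tables as the drawings, assumes the branch being rebased has no merge commits of its own, and only tracks the graph shape (it does not model file contents or conflicts).

```python
# Hypothetical model of the example graph (child -> parents); N is the root.
PARENTS = {
    "N": [], "O": ["N"], "P": ["O"], "Q": ["P"], "R": ["Q"],
    "J": ["P"], "K": ["J"],
    "E": ["N"], "F": ["E"], "G": ["F"],
}
BRANCHES = {"master": "R", "A": "K", "B": "G"}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def rebase(branch, upstream, parents, branches):
    """Toy version of: git checkout <branch>; git rebase <upstream>."""
    upstream_commits = reachable(branches[upstream], parents)
    # Step 1: identify the commits to copy, walking back from the branch tip
    # until we hit a commit the upstream already has (assumes linear history).
    to_copy, commit = [], branches[branch]
    while commit not in upstream_commits:
        to_copy.append(commit)
        commit = parents[commit][0]
    to_copy.reverse()  # oldest first
    # Step 2: copy them one by one onto the upstream tip.
    new_tip = branches[upstream]
    for old in to_copy:
        copy = old + "'"
        parents[copy] = [new_tip]
        new_tip = copy
    # Step 3: re-point the branch; the originals stay in the graph, unreferenced.
    branches[branch] = new_tip
    return to_copy


copied = rebase("B", "A", PARENTS, BRANCHES)
print(copied, BRANCHES["B"], PARENTS["E'"])
```

Note that E, F, and G are never deleted from PARENTS; only the branch name moves, which mirrors how the originals linger in the reflog.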

Why all this matters

All of this copying and shuffling-about takes place in your repository, and not in anyone else's copy of your repository. This means that any co-worker or centralized server does not have these copies yet. If you git push the copies to a centralized server, the server will get the copies—but if the server had the original, now abandoned, commits, and if the server called that "branch B", you will have to force-push to the server.

Doing this force-push will make the server discard its copies of the original commits, and put in place the copied commits and call those "branch B". This is true even if, on the server, there is a commit H subsequent to commit G, with branch B pointing to commit H. In other words, this can "lose" commits from the server. That's the first wrinkle: you must coordinate with everyone else using the server, so that you do not lose commits.

Further, after doing a force push, and/or if your co-workers are sharing your repository (by fetching/pulling from it), your co-workers may need to adjust their repositories to account for the copy-commits-and-move-branch-label. To them, this is an upstream rebase and they must recover from it. This is the second wrinkle: you must coordinate with everyone else using the branch, so that they know to recover from the upstream rebase.

In short, make sure all your co-developers are OK with this.

If you have others working with you, and they are sharing the branches you will rebase, just make sure they all know about it. If these commits that you will rebase are not published—if they're purely private to your own repository—then this coordination requires no effort at all, as there is no one else to inform, so there is no one who can object to it.

Merging, and maybe rebasing again

You asked: "Soon A will be merged to master (before B), but not now. Then, when I merge B, will it break the reference from having A rebased on it?"

Again, the way to answer these questions is to start by drawing the commit graph. Let's use the new, rebased drawing, then add the merge commit that merges A into master. Since we want to show the merge, we need to re-draw the graph a bit yet again, to make it easier to connect the rows:

...--N--O--P-Q-R             <-- master
            \
             J--K            <-- A
                 \
                  E'-F'-G'   <-- B

Now we'll run git checkout master and git merge A to make new merge commit M:

...--N--O--P-Q-R--M          <-- master
            \    /
             J--K            <-- A
                 \
                  E'-F'-G'   <-- B

We did not have to change anything in B here, and we can continue developing on it as usual. But, if we want to—and if all our co-developers agree—we can now rebase B again, this time onto master.

If we git checkout B; git rebase master, the rebase will, as usual, start by finding the merge base between B and master. What is the merge base? Look at the graph and find the point where master and B join up. It's definitely not the new merge commit M, but it's not N any more either.

The commits that are on branch B are N, O, P, J, K, E', F', and G' (and any commits we can't see that are to the left of N). This part is straightforward enough. The tricky part is the commits that are on branch master. These are N, O, P, Q, R, and M, but also J and K. This is because those two commits can be reached by following commit M's second parent.

Therefore, the merge base—which is defined as the "closest" commit that's on both branches—is actually commit K. This is exactly what we want! To rebase B onto master, we want git to copy E', F', and G'. The new copies E'', F'', and G'' will go after M, and we will get this:

                    E''-F''-G''   <-- B
                   /
...--N--O--P-Q-R--M               <-- master
            \    /
             J--K                 <-- A
                 \
                  E'-F'-G'        [abandoned]
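The same toy model can be used to check the claim about the merge base after the merge. The sketch below uses a hypothetical PARENTS table for the post-merge graph, where merge commit M has two parents (R first, K second); the helper names are invented for the example.

```python
# Hypothetical model of the graph after "git merge A" on master.
PARENTS = {
    "N": [], "O": ["N"], "P": ["O"], "Q": ["P"], "R": ["Q"],
    "J": ["P"], "K": ["J"],
    "E'": ["K"], "F'": ["E'"], "G'": ["F'"],
    "M": ["R", "K"],  # merge commit: first parent R, second parent K
}
BRANCHES = {"master": "M", "A": "K", "B": "G'"}


def reachable(tip, parents):
    """Return every commit reachable from tip by following parent links."""
    seen, stack = set(), [tip]
    while stack:
        commit = stack.pop()
        if commit not in seen:
            seen.add(commit)
            stack.extend(parents[commit])
    return seen


def merge_bases(tip_a, tip_b, parents):
    """Common commits that are not ancestors of any other common commit."""
    common = reachable(tip_a, parents) & reachable(tip_b, parents)
    return {
        c for c in common
        if not any(c != d and c in reachable(d, parents) for d in common)
    }


on_master = reachable(BRANCHES["master"], PARENTS)
on_b = reachable(BRANCHES["B"], PARENTS)

print(sorted(on_master))                   # includes J and K, via M's second parent
print(merge_bases(BRANCHES["B"], BRANCHES["master"], PARENTS))
print(sorted(on_b - on_master))            # the commits a rebase onto master would copy
```

Because J and K are already reachable from master, only E', F', and G' are left to copy, so A's commits are not duplicated by the second rebase.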

The real lesson here is: draw the graph, as it will help you figure out what is going on.


¹ At least, it's simple when there is a single such point. In more complex graphs, there can be more than one merge base. Rebasing with complex graphs requires a lot more than I can write in this particular SO posting, though.