Difference between 'rebase master' and 

Posted 2019-01-21 07:38

Question:

Given the following branch structure:

  *------*---*
Master        \
               *---*--*------*
               A       \
                        *-----*-----*
                        B         (HEAD)

If I want to merge my B changes (and only my B changes, no A changes) into master, what is the difference between these two sets of commands?

>(B)      git rebase master
>(B)      git checkout master
>(master) git merge B

>(B)      git rebase --onto master A B
>(B)      git checkout master
>(master) git merge B

I'm mainly interested in learning if code from Branch A could make it into master if I use the first way.

Answer 1:

Bear with me for a while before I answer the question as asked. One of the earlier answers is right, but there are labeling and other relatively minor (but potentially confusing) issues, so I want to start with branch drawings and branch labels. Also, people coming from other systems, or maybe even just new to revision control and git, often think of branches as "lines of development" rather than "traces of history" (git implements them as the latter, so a commit is not necessarily on any specific "line of development").

First, there is a minor problem with the way you drew your graph:

  *------*---*
Master        \
               *---*--*------*
               A       \
                        *-----*-----*
                        B         (HEAD)

Here's the exact same graph, but with the labels drawn in differently and some more arrow-heads added (and I've numbered the commit nodes for use below):

0 <- 1 <- 2         <-------------------- master
           \
            3 <- 4 <- 5 <- 6      <------ A
                       \
                        7 <- 8 <- 9   <-- HEAD=B

Why this matters is that git is quite loose about what it means for a commit to be "on" some branch—or perhaps a better phrase is to say that some commit is "contained in" some set of branches. Commits cannot be moved or changed, but branch labels can and do move.

More specifically, a branch name like master, A, or B points to one specific commit. In this case, master points to commit 2, A points to commit 6, and B points to commit 9. The first few commits 0 through 2 are contained within all three branches; commits 3, 4, and 5 are contained within both A and B; commit 6 is contained only within A; and commits 7 through 9 are contained only in B. (Incidentally, multiple names can point to the same commit, and that's normal when you make a new branch.)
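As a rough way to check this containment yourself, the sketch below rebuilds the example graph in a throwaway repository and asks git which branch names contain a given commit. The demo identity, the `commit` and `containing` helpers, and the one-file-per-commit layout are assumptions made for illustration; they are not part of the question.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

commit 0 2                       # commits 0, 1, 2 on master
git checkout -q -b A
commit 3 6                       # commits 3..6 on A
git checkout -q -b B HEAD~       # B starts at commit 5
commit 7 9                       # commits 7..9 on B

# Hypothetical helper: list the branch names whose history contains a commit.
containing () {
    git branch --format='%(refname:short)' --contains "$1" | LC_ALL=C sort | xargs
}

in_all=$(containing master~1)    # commit 1
in_a_b=$(containing A~2)         # commit 4
only_a=$(containing A)           # commit 6
only_b=$(containing B~1)         # commit 8

echo "commit 1 is contained in: $in_all"
echo "commit 4 is contained in: $in_a_b"
echo "commit 6 is contained in: $only_a"
echo "commit 8 is contained in: $only_b"
```

The same commit shows up under several names, which is the sense in which a commit is "contained in" a set of branches rather than "on" one of them.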

Before we proceed, let me re-draw the graph yet one more way:

0
 \
  1
   \
    2     <-- master
     \
      3 - 4 - 5
              |\
              | 6   <-- A
               \
                7
                 \
                  8
                   \
                    9   <-- HEAD=B       

This just emphasizes that it's not a horizontal line of commits that matters, but rather the parent/child relationships. The branch label points to a starting commit, and then (at least the way these graphs are drawn) we move left, maybe also going up or down as needed, to find parent commits.


When you rebase commits, you're actually copying those commits.

Git can never change any commit

There's one "true name" for any commit (or indeed any object in a git repository), which is its SHA-1: that 40-hex-digit string like 9f317ce... that you see in git log for instance. The SHA-1 is a cryptographic[1] checksum of the contents of the object. The contents are the author and committer (name and email), time stamps, a source tree, and the list of parent commits. The parent of commit #7 is always commit #5. If you make a mostly-exact copy of commit #7, but set its parent to commit #2 instead of commit #5, you get a different commit with a different ID. (I've run out of single digits at this point. Normally I use single uppercase letters to represent commit IDs, but with branches named A and B I thought that would be confusing. So I'll call a copy of #7, #7a, below.)
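To see that the ID really depends on the parent, here is a small sketch that writes a second commit object with the same tree and message as commit #7 but with commit #2 as its parent. It uses the plumbing command git commit-tree purely for illustration; this is not how rebase builds its copies (rebase re-applies the diff, so its copy also gets a different tree). The demo identity and the `commit` helper are assumptions.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

commit 0 2
git checkout -q -b A
commit 3 6
git checkout -q -b B HEAD~
commit 7 9

orig=$(git rev-parse B~2)              # commit 7; its parent is commit 5
tree=$(git rev-parse "B~2^{tree}")     # the source tree recorded in commit 7

# Same tree and message, but the parent is commit 2 (master) instead of commit 5.
# commit-tree only writes a new object; no branch label moves.
copy=$(git commit-tree "$tree" -p master -m C7)

echo "original: $orig (parent $(git rev-parse --short "$orig^"))"
echo "copy:     $copy (parent $(git rev-parse --short "$copy^"))"
```

The original commit #7 is untouched; the "copy" is simply another object with its own ID.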

What git rebase does

When you ask git to rebase a chain of commits—such as commits #7-8-9 above—it has to copy them, at least if they're going to move anywhere (if they're not moving it can just leave the originals in place). It defaults to copying commits from the currently-checked-out branch, so git rebase needs just two extra pieces of information:

  • Which commits should it copy?
  • Where should the copies land? That is, what's the target parent-ID for the first-copied commit? (Additional commits simply point back to the first-copied, second-copied, and so on.)

When you run git rebase <upstream>, you let git figure out both parts from one single piece of information. When you use --onto, you get to tell git about the two parts separately: you still supply an <upstream>, but git no longer computes the target from it; it only computes the commits to copy from <upstream>. (Incidentally, I think <upstream> is not a good name, but it's what rebase uses and I don't have anything much better, so let's stick with it here. Rebase calls the target <newbase>, but I think target is a much better name.)

Let's take a look at these two options first. Both assume that you're on branch B in the first place:

  1. git rebase master
  2. git rebase --onto master A

With the first command, the <upstream> argument to rebase is master. With the second, it's A.

Here's how git computes which commits to copy: it hands the current branch to git rev-list, and it also hands <upstream> to git rev-list, but using --not—or more precisely, with the equivalent of the two-dot exclude..include notation. This means we need to know how git rev-list works.

While git rev-list is extremely complicated—most git commands end up using it; it's the engine for git log, git bisect, rebase, filter-branch, and so on—this particular case is not too hard: with the two-dot notation, rev-list lists every commit reachable from the right-hand side (including that commit itself), excluding every commit reachable from the left-hand side.

In this case, git rev-list HEAD finds all commits reachable from HEAD—that is, almost all commits: commits 0-5 and 7-9—and git rev-list master finds all commits reachable from master, which is commit #s 0, 1, and 2. Subtracting 0-through-2 from 0-5,7-9 leaves 3-5,7-9. These are the candidate commits to copy, as listed by git rev-list master..HEAD.

For our second command, we have A..HEAD instead of master..HEAD, so the commits to subtract are 0-6. Commit #6 doesn't appear in the HEAD set, but that's fine: subtracting away something that's not there, leaves it not there. The resulting candidates-to-copy is therefore 7-9.
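The two candidate sets can be listed directly with the two-dot notation. The sketch below rebuilds the example graph in a throwaway repository (the demo identity and the `commit` helper are assumptions for illustration) and prints the commit subjects in each range, oldest first.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

commit 0 2
git checkout -q -b A
commit 3 6
git checkout -q -b B HEAD~
commit 7 9                        # HEAD is now B, as in the question

# Commits reachable from HEAD, minus those reachable from the left-hand side.
first=$(git log --reverse --format=%s master..HEAD | xargs)
second=$(git log --reverse --format=%s A..HEAD | xargs)
first_count=$(git rev-list --count master..HEAD)
second_count=$(git rev-list --count A..HEAD)

echo "master..HEAD ($first_count commits): $first"
echo "A..HEAD      ($second_count commits): $second"
```

Commit 6 appears in neither list: it is reachable from A but not from HEAD.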

That still leaves us with figuring out the target of the rebase, i.e., where should copied commits land? With the second command, the answer is "the commit identified by the --onto argument". Since we said --onto master, that means the target is commit #2.

rebase #1

git rebase master

With the first command, though, we didn't specify a target directly, so git uses the commit identified by <upstream>. The <upstream> we gave was master, which points to commit #2, so the target is commit #2.

The first command is therefore going to start by copying commit #3 with whatever minimal changes are needed so that its parent is commit #2. Its parent is already commit #2. Nothing has to change, so nothing changes, and rebase just re-uses the existing commit #3. It must then copy #4 so that its parent is #3, but the parent is already #3, so it just re-uses #4. Likewise, #5 is already good. It completely ignores #6 (that's not in the set of commits to copy); it checks #s 7-9 but they're all good as well, so the whole rebase ends up just re-using all the original commits. You can force copies anyway with -f, but you didn't, so this whole rebase ends up doing nothing.

rebase #2

git rebase --onto master A

The second rebase command used --onto to select #2 as its target, but told git to copy just commits 7-9. Commit #7's parent is commit #5, so this copy really has to do something.[2] So git makes a new commit (let's call this #7a) that has commit #2 as its parent. The rebase moves on to commit #8: its copy, #8a, needs #7a as its parent. Finally, the rebase moves on to commit #9, whose copy, #9a, needs #8a as its parent. With all commits copied, the last thing rebase does is move the label B so that it points to #9a (remember, labels move and change!). This gives a graph like this:

          7a - 8a - 9a       <-- HEAD=B
         /
0 - 1 - 2                    <-- master
         \
          3 - 4 - 5 - 6      <-- A
                    \
                     7 - 8 - 9   [abandoned]

OK, but what about git rebase --onto master A B?

This is almost the same as git rebase --onto master A. The difference is that extra B at the end. Fortunately, this difference is very simple: if you give git rebase that one extra argument, it runs git checkout on that argument first.[3]
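A quick way to try this out is to run the three-argument form while some other branch is checked out. The sketch below does that in a throwaway repository (the demo identity and the `commit` helper are assumptions for illustration) and then checks where HEAD ended up and what B now contains.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

commit 0 2
git checkout -q -b A
commit 3 6
git checkout -q -b B HEAD~
commit 7 9

old_tip=$(git rev-parse B)             # the original commit 9
git checkout -q master                 # deliberately start somewhere other than B
git rebase -q --onto master A B        # the trailing B is checked out before rebasing

now_on=$(git symbolic-ref --short HEAD)
copied=$(git log --reverse --format=%s master..B | xargs)
new_tip=$(git rev-parse B)             # the copy, 9a

echo "HEAD is now on: $now_on"
echo "master..B:      $copied"
echo "old tip $old_tip still exists, but B points to $new_tip"
```

The old tip is no longer reachable from any branch name, yet the object is still in the repository, which matches the "[abandoned]" row in the graph above.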

Your original commands

In your first set of commands, you ran git rebase master while on branch B. As noted above, this is a big no-op: since nothing needs to move, git copies nothing at all (unless you use -f / --force, which you didn't). You then checked out master and used git merge B, which, if it is told to[4], creates a new commit with the merge. Therefore Dherik's answer, as of the time I saw it at least, is correct here: the merge commit has two parents, one of which is the tip of branch B, and that branch reaches back through three commits that are on branch A, so some of what's on A winds up being merged into master.

With your second command sequence, you first checked out B (you were already on B so this was redundant, but was part of the git rebase). You then had rebase copy three commits, producing the final graph above, with commits 7a, 8a, and 9a. You then checked out master and made a merge commit with B (see footnote 4 again). Again Dherik's answer is correct: the only thing missing is that the original, abandoned commits are not drawn-in and it's not as obvious that the new merged-in commits are copies.


[1] This only matters in that it's extraordinarily difficult to target a particular checksum. That is, if someone you trust tells you "I trust the commit with ID 1234567...", it's almost impossible for someone else (someone you may not trust so much) to come up with a commit that has that same ID, but has different contents. The chances of it happening by accident are 1 in 2^160, which is much less likely than you having a heart attack while being struck by lightning while drowning in a tsunami while being abducted by space aliens. :-)

[2] The actual copy is made using the equivalent of git cherry-pick: git compares the commit's tree with its parent's tree to get a diff, then applies the diff to the new parent's tree.

[3] This is actually, literally true at this time: git rebase is a shell script that parses your options, then decides which kind of internal rebase to run: the non-interactive git-rebase--am or the interactive git-rebase--interactive. After it's figured out all the arguments, if there's the one left-over branch name argument, the script does git checkout <branch-name> before starting the internal rebase.

[4] Since master points to commit 2 and commit 2 is an ancestor of commit 9, this would normally not make a merge commit after all, but instead do what Git calls a fast-forward operation. You can instruct Git not to do these fast-forwards using git merge --no-ff. Some interfaces, such as GitHub's web interface and perhaps some GUIs, may separate the different kinds of operations, so that their "merge" forces a true merge like this.

With a fast-forward merge, the final graph for the first case is:

0 <- 1 <- 2         [master used to be here]
           \
            3 <- 4 <- 5 <- 6      <------ A
                       \
                        7 <- 8 <- 9   <-- master, HEAD=B

In this case, commits 0 through 5 and 7 through 9 are now on both branches, master and B. The difference, compared to a true merge, is that with a true merge you can see from the graph that a merge took place.

In other words, the advantage to a fast-forward merge is that it leaves no trace of what is otherwise a trivial operation. The disadvantage of a fast-forward merge is, well, that it leaves no trace. So the question of whether to allow the fast-forward is really a question of whether you want to leave an explicit merge in the history formed by the commits.
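The two behaviours can be compared side by side. In the sketch below, the branch names ff and noff are hypothetical stand-ins for master, so that both kinds of merge can be shown in one throwaway repository; the demo identity and the `commit` helper are likewise assumptions.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

cd "$(mktemp -d)"
git init -q
git symbolic-ref HEAD refs/heads/master   # make sure the first branch is called master

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

commit 0 2
git checkout -q -b A
commit 3 6
git checkout -q -b B HEAD~
commit 7 9

# Stand-in for master, merged the default way: the label just slides to commit 9.
git checkout -q -b ff master
git merge -q B

# Second stand-in for master, with fast-forwarding disabled: a real merge commit.
git checkout -q -b noff master
git merge -q --no-ff --no-edit B

ff_parents=$(git log -1 --format=%P ff | wc -w | tr -d ' ')
noff_parents=$(git log -1 --format=%P noff | wc -w | tr -d ' ')

echo "ff tip has $ff_parents parent(s) and is the same commit as B"
echo "noff tip has $noff_parents parent(s) and is a new merge commit"
```

Either way, the merged result reaches back through commits 3, 4, and 5; the choice only affects whether the history records that a merge happened.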



Answer 2:

Before any of the given operations your repository looks like this

           o---o---o---o---o  master
                \
                 x---x---x---x---x  A
                                  \
                                   o---o---o  B

After a standard rebase (without --onto master) the structure will be:

           o---o---o---o---o  master
               |            \
               |             x'--x'--x'--x'--x'--o'--o'--o'  B
                \
                 x---x---x---x---x  A

...where the x' are commits from the A branch. (Note how they're now duplicated at the base of branch B.)

Instead, a rebase with --onto master will create the following cleaner and simpler structure:

           o---o---o---o---o  master
               |            \
               |             o'--o'--o'  B
                \
                 x---x---x---x---x  A


Answer 3:

The differences:

First set

  • (B) git rebase master

    *---*---* [master]
             \
              *---*---*---* [A]
                       \
                        *---*---* [B](HEAD)
    

Nothing happened. There are no new commits on the master branch since the B branch was created.

  • (B) git checkout master

    *---*---* [master](HEAD)
             \
              *---*---*---* [A]
                       \
                        *---*---* [B]
    
  • (master) git merge B

    *---*---*-----------------------* [master](HEAD)
             \                     /
              *---*---*---* [A]   /
                       \         /
                        *---*---* [B]
    

Second set

  • (B) git rebase --onto master A B

    *---*---*-- [master]
            |\
            | *---*---*---* [A]
            |
            *---*---* [B](HEAD)
    
  • (B) git checkout master

    *---*---*-- [master](HEAD)
            |\
            | *---*---*---* [A]
            |
            *---*---* [B]
    
  • (master) git merge B

    *---*---*----------------------* [master](HEAD)
            |\                    /
            | *---*---*---* [A]  /
            |                   /  
            *---*--------------* [B]
    

I want to merge my B changes (and only my B changes, no A changes) into master

Be careful about what you mean by "only my B changes".

In the first set, the B branch is (before the final merge):

 *---*---*
          \
           *---*---*
                    \
                     *---*---* [B]

And in the second set your B branch is:

*---*---*
        |
        |
        |
        *---*---* [B]

If I understand correctly, what you want is only the B commits that are not in the A branch. So the second set is the right choice for you before the merge.
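One way to confirm which set carries A's work into master is to run both command sequences on fresh copies of the graph and list the files that end up in master. This is only a sketch: the `build` and `commit` helpers, the demo identity, and the convention that commit N adds a file named N are assumptions for illustration.

```shell
#!/bin/bash
set -e
# Throwaway identity so commits work without any global git config (assumed values).
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# Hypothetical helper: create commits C$1..C$2, one new file per commit.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -q -m "C$i"
    done
}

# Hypothetical helper: build the question's graph in a new temp dir, ending on B.
build () {
    cd "$(mktemp -d)"
    git init -q
    git symbolic-ref HEAD refs/heads/master
    commit 0 2
    git checkout -q -b A
    commit 3 6
    git checkout -q -b B HEAD~
    commit 7 9
}

# First set of commands.
build
git rebase -q master
git checkout -q master
git merge -q B
first_files=$(git ls-tree --name-only master | xargs)

# Second set of commands.
build
git rebase -q --onto master A B
git checkout -q master
git merge -q B
second_files=$(git ls-tree --name-only master | xargs)

echo "first set,  files in master: $first_files"
echo "second set, files in master: $second_files"
```

Files 3, 4, and 5 come from commits that B shares with A, and they reach master only in the first set.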



Answer 4:

git log --graph --decorate --oneline A B master (or an equivalent GUI tool) can be used after each git command to visualize the changes.

This is the initial state of the repository, with B as the current branch.

(B) git log --graph --oneline --decorate A B master
* 5a84c72 (A) C6
| * 9a90b7c (HEAD -> B) C9
| * 2968483 C8
| * 187c9c8 C7
|/  
* 769014a C5
* 6b8147c C4
* 9166c60 C3
* 0aaf90b (master) C2
* 8c46dcd C1
* 4d74b57 C0

Here is a script to create a repository in this state.

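#!/bin/bash

# Create commits C$1 through C$2, each adding one new file named after its number.
commit () {
    for i in $(seq "$1" "$2"); do
        echo "article $i" > "$i"
        git add "$i"
        git commit -m "C$i"
    done
}

git init
# Name the initial branch master even if init.defaultBranch is set to something else.
git symbolic-ref HEAD refs/heads/master
commit 0 2                 # C0..C2 on master

git checkout -b A
commit 3 6                 # C3..C6 on A

git checkout -b B HEAD~    # B starts at C5, the parent of A's tip
commit 7 9                 # C7..C9 on B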

The first rebase command does nothing.

(B) git rebase master
Current branch B is up to date.

Checking out master and merging B simply points master at the same commit as B, (i.e. 9a90b7c). No new commits are created.

(B) git checkout master
Switched to branch 'master'

(master) git merge B
Updating 0aaf90b..9a90b7c
Fast-forward
<... snipped diffstat ...>

(master) git log --graph --oneline --decorate A B master
* 5a84c72 (A) C6
| * 9a90b7c (HEAD -> master, B) C9
| * 2968483 C8
| * 187c9c8 C7
|/  
* 769014a C5
* 6b8147c C4
* 9166c60 C3
* 0aaf90b C2
* 8c46dcd C1
* 4d74b57 C0

The second rebase command copies the commits in the range A..B and points them at master. The three commits in this range are 9a90b7c C9, 2968483 C8, and 187c9c8 C7. The copies are new commits with their own commit IDs: 7c0e241, 40b105d, and 5b0bda1. The branches master and A are unchanged.

(B) git rebase --onto master A B
First, rewinding head to replay your work on top of it...
Applying: C7
Applying: C8
Applying: C9

(B) git log --graph --oneline --decorate A B master
* 7c0e241 (HEAD -> B) C9
* 40b105d C8
* 5b0bda1 C7
| * 5a84c72 (A) C6
| * 769014a C5
| * 6b8147c C4
| * 9166c60 C3
|/  
* 0aaf90b (master) C2
* 8c46dcd C1
* 4d74b57 C0

As before, checking out master and merging B simply points master at the same commit as B, (i.e. 7c0e241). No new commits are created.

The original chain of commits that B was pointing at still exists.

git log --graph --oneline --decorate A B master 9a90b7c
* 7c0e241 (HEAD -> master, B) C9
* 40b105d C8
* 5b0bda1 C7
| * 5a84c72 (A) C6
| | * 9a90b7c C9    <- NOTE: This is what B used to be
| | * 2968483 C8
| | * 187c9c8 C7
| |/  
| * 769014a C5
| * 6b8147c C4
| * 9166c60 C3
|/  
* 0aaf90b C2
* 8c46dcd C1
* 4d74b57 C0


Answer 5:

You can try it yourself and see. You can create a local git repository to play with:

#! /bin/bash
set -e
mkdir repo
cd repo

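git init
# Name the initial branch master even if init.defaultBranch is set to something else.
git symbolic-ref HEAD refs/heads/master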
touch file
git add file
git commit -m 'init'

echo a > file0
git add file0
git commit -m 'added a to file'

git checkout -b A
echo b >> fileA
git add fileA
git commit -m 'b added to file'
echo c >> fileA
git add fileA
git commit -m 'c added to file'

git checkout -b B
echo x >> fileB
git add fileB
git commit -m 'x added to file'
echo y >> fileB
git add fileB
git commit -m 'y added to file'
cd ..

git clone repo rebase
cd rebase
git checkout master
git checkout A
git checkout B
git rebase master
cd ..

git clone repo onto
cd onto
git checkout master
git checkout A
git checkout B
git rebase --onto master A B
cd ..

diff <(cd rebase; git log --graph --all) <(cd onto; git log --graph --all)