What is the exact difference when you execute `git

Posted 2019-07-30 11:21

Question:

I know the difference between git fetch and git pull. git pull is basically a git fetch + git merge in one command.

However, I was researching how to update my fork (master branch) from upstream without checking out the master branch. I came across this SO answer: Merge, update and pull Git branches without checkouts

But when I ran git fetch upstream master:master while master was already checked out, I ran into this error:

fatal: Refusing to fetch into current branch refs/heads/master of non-bare repository

So, I tried git pull upstream master:master and it worked. What is interesting is that doing git pull upstream master:master updates my fork with upstream regardless of whether I am on master or not. Whereas git fetch upstream master:master only works when I am NOT on master branch.

It would be very interesting to read an explanation of this from the knowledgeable folks here.

Answer 1:

git pull is basically a git fetch + git merge in one command

Yes—but, as you suspected, there is more to it than that.

Bennett McElwee's comment on the answer you linked to actually has one of the key items. He mentions that you can:

Use fetch origin branchB:branchB, which will fail safely if the merge isn't fast-forward.

Another is not very well documented: it's the -u aka --update-head-ok option in git fetch, which git pull sets. The documentation does define what it does, but is a bit mysterious and scary:

By default git fetch refuses to update the head which corresponds to the current branch. This flag disables the check. This is purely for the internal use for git pull to communicate with git fetch, and unless you are implementing your own Porcelain you are not supposed to use it.

This gets us to your observation:

So, I tried git pull upstream master:master and it worked. What is interesting is that doing git pull upstream master:master updates my fork with upstream regardless of whether I am on master or not. Whereas git fetch upstream master:master only works when I am NOT on master branch.

This is due to that -u flag. If you ran git fetch -u upstream master:master, that would work, for some sense of the word "work", but it would leave you with a different problem. The warning is there for a reason. Let's look at what that reason is, and see whether the warning is overly harsh. Warning: there is a lot here! Much of the complication below is to make up for historical mistakes, while retaining backwards compatibility.

Branch names, references, and fast-forwarding

First, let's talk about references and fast-forward operations.

In Git, a reference is just a fancy way of talking about a branch name like master, or a tag name like v1.2, or a remote-tracking name like origin/master, or, well, any number of other names, all in one common and sensible fashion: we group each specific kind of name into a name space, or as a single word, namespace. Branch names live under refs/heads/, tag names live under refs/tags/, and so on, so that master is really just refs/heads/master.

Every one of these names, all of which start with refs/, is a reference. There are a few extra references that don't start with refs as well, although Git is a little bit erratic internally in deciding whether names like HEAD and ORIG_HEAD and MERGE_HEAD are actually references.[1] In the end, though, a reference mainly serves as a way to have a useful name for a Git object hash ID. Branch names in particular have a funny property: they move from one commit to another, typically in a way that Git refers to as a fast forward.

That is, given a branch with some commits, represented by uppercase letters here, and a second branch with more commits that include all the commits on the first branch:

...--E--F--G   <-- branch1
            \
             H--I   <-- branch2

Git is allowed to slide the name branch1 forward so that it points to either of the commits that were, before, reachable only through the name branch2.[2] Compare this to, say:

...--E--F--G------J   <-- branch1
            \
             H--I   <-- branch2

If we were to move the name branch1 to point to commit I instead of commit J, what would happen to commit J itself?[3] This kind of motion, which leaves a commit behind, is a non-fast-forward operation on the branch name.
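The two pictures above can be checked mechanically: moving a name from its old commit to a new one is a fast-forward exactly when the old commit is an ancestor of the new one. Here is a minimal sketch in a throwaway repository. The temporary directory, the "demo" identity, and the empty commits are all invented for illustration, and it assumes a reasonably recent git on your PATH:

```shell
set -eu
# Throwaway repository; every name below is illustrative only.
repo=$(mktemp -d)
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/branch1   # call the first branch branch1
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m G           # ...--G        <-- branch1
git checkout -q -b branch2
git commit -q --allow-empty -m H
git commit -q --allow-empty -m I           # ...--G--H--I  <-- branch2

# First picture: branch1 (G) is an ancestor of branch2 (I), so sliding
# branch1 forward to I would be a fast-forward.
if git merge-base --is-ancestor branch1 branch2; then
    before=fast-forward
else
    before=non-fast-forward
fi

git checkout -q branch1
git commit -q --allow-empty -m J           # ...--G--J     <-- branch1

# Second picture: J is not an ancestor of I, so moving branch1 to I
# would leave J behind.
if git merge-base --is-ancestor branch1 branch2; then
    after=fast-forward
else
    after=non-fast-forward
fi

echo "before adding J: $before; after adding J: $after"
```

The git merge-base --is-ancestor test is only one way to ask the question; Git's own fetch and push code performs an equivalent ancestry check internally.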

These names can be shortened by leaving off the refs/ part, or often, even the refs/heads/ part or the refs/tags/ part or whatever. Git will look in its reference-name database[4] for the first one that matches, using the six-step rules described in the gitrevisions documentation. If you have a refs/tags/master and a refs/heads/master, for instance, and say master, Git will match refs/tags/master first and use the tag.[5]
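To see the name expansion and the tag-before-branch ordering for yourself, something like the following sketch should do. It builds a scratch repository (the directory, identity, and commit messages are made up) and deliberately creates a tag that shares its name with the branch, which you would not normally want to do:

```shell
set -eu
repo=$(mktemp -d)                          # scratch repository, illustrative only
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git commit -q --allow-empty -m first
first=$(git rev-parse HEAD)
git commit -q --allow-empty -m second

# The short name master expands to a fully-qualified reference name.
full=$(git rev-parse --symbolic-full-name master)
echo "master is really $full"

# Now add a tag that is also called master, pointing at the older commit.
git tag master "$first"

# The short name is ambiguous; Git warns on stderr and resolves it
# using the gitrevisions ordering, in which refs/tags/ comes first.
short=$(git rev-parse master 2>/dev/null)
branch=$(git rev-parse refs/heads/master)
tag=$(git rev-parse refs/tags/master)
echo "short name -> $short"
echo "branch     -> $branch"
echo "tag        -> $tag"
```

Spelling out refs/heads/master or refs/tags/master, as the last lines do, is the way around the ambiguity.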


[1] If a reference is a name that has, or can have, a reflog, then HEAD is a reference while ORIG_HEAD and the other *_HEAD names are not. It's all a little fuzzy at the edges here, though.

[2] These commits might be reachable through more names. The important thing is that they weren't reachable through branch1 before the fast-forward, and are afterward.

[3] The immediate answer is actually that nothing happens, but eventually, if commit J is not reachable through some name, Git will garbage collect the commit.

[4] This "database" is really just the combination of the directory .git/refs plus the file .git/packed-refs, at least for the moment. If Git finds both a packed-refs entry and a file under .git/refs for the same name, the hash in the file overrides the one in packed-refs.

[5] Exception: git checkout tries the argument as a branch name first, and if that works, treats master as a branch name. Everything else in Git treats it as a tag name, since prefixing with refs/tags is step 3, vs step 4 for a branch name.


Refspecs

Now that we know that a reference is just a name pointing to a commit, and a branch name is a specific kind of reference for which fast forwards are normal everyday things, let's look at the refspec. Let's start with the most common and explainable form, which is just two reference names separated by a colon, such as master:master or HEAD:branch.

Git uses refspecs whenever you connect two Gits to each other, such as during git fetch and during git push. The name on the left is the source and the name on the right is the destination. If you are doing git fetch, the source is the other Git repository, and the destination is your own. If you are doing git push, the source is your repository, and the destination is theirs. (In the special case of using ., which means this repository, both source and destination are yourself, but everything still works just as if your Git is talking to another, separate Git.)

If you use fully-qualified names (starting with refs/), you know for sure which one you will get: branch, tag, or whatever. If you use partially-qualified or unqualified names, Git will usually figure out what you mean anyway. You will occasionally run into a case where Git can't figure out what you mean; in that case, use a fully qualified name.

You can simplify a refspec even further by omitting one of the two names. Git knows which name you omit by which side of the colon is gone: :dst has no source name, while src: has no destination name. If you write just name, with no colon at all, Git treats that as name: (a source with no destination).

What these mean varies. An empty source for git push means delete: git push origin :branch has your Git ask their Git to delete the name entirely. An empty destination for git push means use the default, which is normally the same branch name: git push origin branch pushes your branch by asking their Git to set their branch named branch.[6] Note that it's normal to git push to their branch directly: you send them your commits, then ask them to set their refs/heads/branch. This is quite different from the normal fetch!

For git fetch, an empty destination means don't update any of my references. A non-empty destination means update the reference I supply. Unlike git push, though, the usual destination you might use here is a remote-tracking name: you would fetch their refs/heads/master into your own refs/remotes/origin/master. That way, your branch name master—your refs/heads/master—is left untouched.

For historical reasons, though, the usual form of git fetch is just written as git fetch remote branch, omitting the destination. In this case, Git does something seemingly self-contradictory:

  • It writes the branch name update nowhere. The lack of a destination means that no (local) branch gets updated.
  • It writes the hash ID into .git/FETCH_HEAD. Everything git fetch fetches always goes here. This is where and how git pull finds out what git fetch did.
  • It updates the remote-tracking name, such as refs/remotes/origin/master, even though it was not told to do so. Git calls this an opportunistic update.

(Much of this is actually controlled through a default refspec that you will find in your .git/config file.)
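The three effects of the destination-less form are easy to observe. The sketch below sets up two scratch repositories in a temporary directory (the names upstream and mine, and the "demo" identity, are mine, not anything Git requires), adds the first as the remote origin, and runs git fetch origin master. It assumes a Git new enough to do the opportunistic update (1.8.4 or later, as far as I know):

```shell
set -eu
work=$(mktemp -d)                          # scratch area, illustrative only
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository, with one commit on master.
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m A
theirs=$(git -C upstream rev-parse master)

# "Our" repository, empty, with upstream added as the remote origin.
git init -q mine
cd mine
git remote add origin "$work/upstream"

# The default refspec that git remote add wrote into .git/config.
refspec=$(git config --get remote.origin.fetch)
echo "default fetch refspec: $refspec"

# Fetch with a source but no destination.
git fetch -q origin master

fetch_head=$(git rev-parse FETCH_HEAD)
tracking=$(git rev-parse refs/remotes/origin/master)
if git rev-parse -q --verify refs/heads/master >/dev/null; then
    local_branch=created
else
    local_branch=absent
fi
echo "FETCH_HEAD:            $fetch_head"
echo "origin/master:         $tracking"
echo "local master:          $local_branch"
```

FETCH_HEAD and origin/master both end up holding their master's hash ID, while no local branch named master is created.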

You can also complicate a refspec by adding a leading plus sign +. This sets the "force" flag, which overrides the default "fast forward" check for branch name motion. This is the normal case for your remote-tracking names: you want your Git to update your refs/remotes/origin/master to match their Git's refs/heads/master even if that's a non-fast-forward change, so that your Git always remembers where their master was, the last time your Git talked with their Git.

Note that the leading plus only makes sense if there is a destination to update. There are three possibilities here:

  • You're creating a new name. This is generally OK.[7]
  • You're making no change to the name: it used to map to commit hash H and the request says to set it to map to commit hash H. This is obviously OK.
  • You are changing the name. This one breaks down into three more sub-possibilities:
    • It's not a branch-like name at all, e.g., it's a tag and should not move. You will need a force flag to override the default rejection.[8]
    • It's a branch-like name, and the branch motion is a fast-forward. You won't need the force flag.
    • It's a branch-like name, but the motion is not a fast-forward. You will need the force flag.

This covers all the rules for updating references, except for one last rule, for which we need yet more background.
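Here is a small sketch of the branch-like cases. It fetches into a made-up local branch name, copy, which is not the current branch: first as a brand-new name, then after "their" history has been rewritten so that the update is no longer a fast-forward. The repository names and the "demo" identity are invented for the demonstration:

```shell
set -eu
work=$(mktemp -d)                          # scratch area, illustrative only
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
git -C upstream commit -q --allow-empty -m A

git init -q mine
cd mine

# Creating a new name: accepted without any force flag.
git fetch -q "$work/upstream" master:copy

# Rewrite their history so that their master no longer contains our copy.
git -C "$work/upstream" commit -q --amend --allow-empty -m "A, rewritten"
theirs=$(git -C "$work/upstream" rev-parse master)

# Non-fast-forward update without the plus sign: rejected.
if git fetch -q "$work/upstream" master:copy 2>/dev/null; then
    plain=accepted
else
    plain=rejected
fi

# Same update with the leading plus sign: forced through.
git fetch -q "$work/upstream" +master:copy
forced=$(git rev-parse copy)

echo "without +: $plain"
echo "with +:    copy now at $forced"
```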


[6] You can complicate this by setting push.default to upstream. In this case, if your branch fred has its upstream set to origin/barney, git push origin fred asks their Git to set their branch named barney.

[7] For various cases of updates, you can write hooks that do whatever you like to verify names and/or updates.

[8] In Git versions before 1.8.3, Git accidentally used branch rules for tag updates. So this only applies to 1.8.3 and later.


HEAD is very special

Remember that a branch name like master just identifies some particular commit hash:

$ git rev-parse master
468165c1d8a442994a825f3684528361727cd8c0

You have also seen that git checkout branchname behaves one way, and git checkout --detach branchname or git checkout hash behaves another way, giving a scary warning about a "detached HEAD". While HEAD acts like a reference in most ways, in a few, it's very special. In particular, HEAD is normally a symbolic reference, meaning that it contains the full name of a branch. That is:

$ git checkout master
Switched to branch 'master'
$ cat .git/HEAD
ref: refs/heads/master

tells us that the current branch name is master: that HEAD is attached to master. But:

$ git checkout --detach master
HEAD is now at 468165c1d... Git 2.17
$ cat .git/HEAD
468165c1d8a442994a825f3684528361727cd8c0

after which git checkout master puts us back on master as usual.

What this means is that when we have a detached HEAD, Git knows which commit we have checked out, because the correct hash ID is right there, in the name HEAD. If we were to make some arbitrary change to the value stored in refs/heads/master, Git would still know which commit we have checked out.

But if HEAD just contains the name master, the only way that Git knows that the current commit is, say, 468165c1d8a442994a825f3684528361727cd8c0, is that refs/heads/master maps to 468165c1d8a442994a825f3684528361727cd8c0. If we did something that changed refs/heads/master to some other hash ID, Git would think that we have that other commit checked out.
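If you would rather not peek at .git/HEAD directly, git symbolic-ref asks the same question through Git itself. A minimal sketch, again in a scratch repository with invented names:

```shell
set -eu
repo=$(mktemp -d)                          # scratch repository, illustrative only
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com
git commit -q --allow-empty -m A

# Attached: HEAD holds a branch name, and the hash comes from that branch.
attached=$(git symbolic-ref HEAD)
echo "HEAD is attached to $attached"

# Detached: HEAD holds a raw hash ID, so symbolic-ref has nothing to print
# and exits non-zero (-q keeps it quiet about that).
git checkout -q --detach master
if git symbolic-ref -q HEAD >/dev/null; then
    state=attached
else
    state=detached
fi
echo "after --detach, HEAD is $state at $(git rev-parse HEAD)"
```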

Does this matter? Yes, it does! Let's see why:

$ git status
... nothing to commit, working tree clean
$ git rev-parse master^
1614dd0fbc8a14f488016b7855de9f0566706244
$ echo 1614dd0fbc8a14f488016b7855de9f0566706244 > .git/refs/heads/master
$ git status
...
Changes to be committed:
...
        modified:   GIT-VERSION-GEN
$ echo 468165c1d8a442994a825f3684528361727cd8c0 > .git/refs/heads/master
$ git status
...
nothing to commit, working tree clean

Changing the hash ID stored in master changed Git's idea of the status!

The status involves HEAD vs index plus index vs work-tree

The git status command runs two git diffs (well, two git diff --name-status comparisons, internally):

  • compare HEAD vs index
  • compare index vs work-tree

Remember, the index, aka the staging area or the cache, holds the contents of the current commit until we start modifying it to hold the contents of the next commit we will make. The work-tree is merely a minor helper for this whole update the index, then commit process. We only need it because the files in the index are in the special Git-only format, that most of the programs on our systems cannot use.

If HEAD holds the raw hash ID for the current commit, then comparing HEAD vs index stays the same regardless of what we do with our branch names. But if HEAD holds one specific branch name, and we change that one specific branch name's value, and then do the comparison, we'll compare a different commit to our index. The index and work-tree will be unchanged, but Git's idea of the relative difference between the (different) current commit and the index will change.

This is why git fetch refuses to update the current branch name by default. It's also why you cannot push to the current branch of a non-bare repository: that non-bare repository has an index and work-tree whose contents are probably intended to match the current commit. If you change that Git's idea of what the current commit is, by changing the hash stored in the branch name, the index and work-tree are likely to stop matching the commit.
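The refusal, and what happens if you override it, can be reproduced in a couple of scratch repositories. In this sketch the repository names, the file names, and the "demo" identity are all made up; the only real things are git fetch and its -u (--update-head-ok) option:

```shell
set -eu
work=$(mktemp -d)                          # scratch area, illustrative only
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

# "Their" repository with one commit, and our clone of it (on master).
git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m A
git clone -q "$work/upstream" mine

# They add a commit that we do not have yet.
echo two > upstream/new.txt
git -C upstream add new.txt
git -C upstream commit -q -m B
theirs=$(git -C upstream rev-parse master)

cd mine
# Plain fetch into the current branch: refused.
if git fetch -q origin master:master 2>/dev/null; then
    plain=accepted
else
    plain=refused
fi

# With -u the branch name moves, but index and work-tree stay put.
git fetch -q -u origin master:master
master_now=$(git rev-parse master)
out_of_sync=$(git status --porcelain)

echo "plain fetch: $plain"
echo "after -u, master = $master_now"
echo "git status --porcelain now reports:"
echo "$out_of_sync"
```

After the -u fetch, git status compares the index against the new commit and so reports staged changes that nobody actually staged, which is the mismatch described above.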

That's not fatal—not at all, in fact. That's precisely what git reset --soft does: it changes the branch name to which HEAD is attached, without touching the contents in the index and the work-tree. Meanwhile git reset --mixed changes the branch name and the index, but leaves the work-tree untouched, and git reset --hard changes the branch name, the index, and the work-tree all in one go.
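A quick way to see the three modes side by side is to watch git status --porcelain after each one. This sketch uses a scratch repository with two commits; the file names a.txt and b.txt and the "demo" identity are invented:

```shell
set -eu
repo=$(mktemp -d)                          # scratch repository, illustrative only
cd "$repo"
git init -q .
git symbolic-ref HEAD refs/heads/master
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

echo a > a.txt; git add a.txt; git commit -q -m A
echo b > b.txt; git add b.txt; git commit -q -m B
tip=$(git rev-parse HEAD)

# --soft: only the branch name moves back to A. The index still holds
# b.txt, so it shows up as staged ("A  b.txt").
git reset -q --soft HEAD^
soft=$(git status --porcelain)

# --hard: branch name, index, and work-tree all go to B. Nothing to report.
git reset -q --hard "$tip"
hard=$(git status --porcelain)

# --mixed: branch name and index go back to A, the work-tree is untouched,
# so b.txt is now merely an untracked file ("?? b.txt").
git reset -q --mixed HEAD^
mixed=$(git status --porcelain)

echo "soft:  [$soft]"
echo "hard:  [$hard]"
echo "mixed: [$mixed]"
```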

A fast-forward "merge" is basically a git reset --hard

When you use git pull to run git fetch and then git merge, the git merge step is very often able to do what Git calls a fast-forward merge. This is not a merge at all, though: it's a fast-forward operation on the current branch name, followed immediately by updating the index and work-tree contents to the new commit, the same way git reset --hard would. The key difference is that git pull checks (well, is supposed to check[9]) that no in-progress work will be destroyed by this git reset --hard, while git reset --hard itself deliberately does not check, to let you throw away in-progress work that you no longer want.


[9] Historically, git pull keeps getting this wrong, and it gets fixed after someone loses a bunch of work. Avoid git pull!


Putting all this together

When you run git pull upstream master:master, Git first runs:

git fetch --update-head-ok upstream master:master

which has your Git call up another Git at the URL listed for upstream and collect commits from them, as found via their name master—the left side of the master:master refspec. Your Git then updates your own master, presumably refs/heads/master, using the right side of the refspec. The fetch step would normally fail if master is your current branch—if your .git/HEAD contains ref: refs/heads/master—but the -u or --update-head-ok flag prevents the failure.

(If all goes well, your git pull will run its second, git merge, step:

git merge -m <message> <hash ID extracted from .git/FETCH_HEAD>

but let's finish with the first step first.)

The fast-forward rules make sure that your master update is a fast-forward operation. If not, the fetch fails and your master is unchanged, and the pull stops here. So we're OK so far: your master is fast-forwarded if and only if that's possible given the new commit(s), if any, obtained from upstream.

At this point, if your master has been changed and it's your current branch, your repository is now out of sync: your index and work-tree no longer match your master. However, git fetch has left the correct hash ID in .git/FETCH_HEAD as well, and your git pull now goes on to the reset-like update. This actually uses the equivalent of git read-tree rather than git reset, but as long as it succeeds—given the pre-pull checks, it should succeed—the end effect is the same: your index and work-tree will match the new commit.

Alternatively, perhaps master is not your current branch. Perhaps your .git/HEAD contains instead ref: refs/heads/branch. In this case, your refs/heads/master is safely fast-forwarded the way git fetch would have done even without --update-head-ok. Your .git/FETCH_HEAD contains the same hash ID as your updated master, and your git pull runs git merge to attempt a merge—which may or may not be a fast-forward operation, depending on the commit to which your branch name branch points right now. If the merge succeeds, Git either makes a commit (real merge) or adjusts index and work-tree as before (fast-forward "merge") and writes the appropriate hash ID into .git/refs/heads/branch. If the merge fails, Git stops with a merge conflict and makes you clean up the mess as usual.

The last possible case is that your HEAD is detached, but this works in the same way as for the ref: refs/heads/branch case. The only difference is that the new hash ID, when all is said and done, goes straight into .git/HEAD rather than into .git/refs/heads/branch.
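As a rough end-to-end check of the first two cases, the sketch below runs git pull origin master:master while on master, then git fetch origin master:master while on a different branch. Everything about the setup (repository names, the topic branch, file names, the "demo" identity) is made up, and I use a clone so the remote is called origin rather than upstream:

```shell
set -eu
work=$(mktemp -d)                          # scratch area, illustrative only
cd "$work"
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

git init -q upstream
git -C upstream symbolic-ref HEAD refs/heads/master
echo one > upstream/file.txt
git -C upstream add file.txt
git -C upstream commit -q -m A
git clone -q "$work/upstream" mine

# Case 1: we are on master, and they have a new commit B.
echo two > upstream/two.txt
git -C upstream add two.txt
git -C upstream commit -q -m B
cd mine
git pull -q origin master:master           # Git may print a warning here
after_pull=$(git rev-parse master)
theirs_b=$(git -C "$work/upstream" rev-parse master)
clean=$(git status --porcelain)            # empty if index and work-tree match

# Case 2: we are on another branch, and they have a new commit C.
git checkout -q -b topic
echo three > "$work/upstream/three.txt"
git -C "$work/upstream" add three.txt
git -C "$work/upstream" commit -q -m C
git fetch -q origin master:master          # no -u needed: master is not current
after_fetch=$(git rev-parse master)
theirs_c=$(git -C "$work/upstream" rev-parse master)
topic=$(git rev-parse topic)

echo "on master, after pull:  master=$after_pull status=[$clean]"
echo "on topic, after fetch:  master=$after_fetch topic=$topic"
```

In the first case the pull moves master and brings the index and work-tree along; in the second, the plain fetch fast-forwards master while topic and the checked-out files stay where they were.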