git pull is basically a git fetch + git merge in one command
Yes—but, as you suspected, there is more to it than that.
Bennett McElwee's comment, in the answer you linked to, actually has one of the key items. He mentions that you can:
Use fetch origin branchB:branchB, which will fail safely if the merge isn't fast-forward.
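A minimal way to watch that fail-safe behavior, assuming git and a POSIX shell are available. To avoid needing a real origin, the sketch uses "." (the repository itself) as the remote; the branch names branchA and branchB are made up for the illustration.

```shell
#!/bin/sh
# Sketch: "git fetch . src:dst" fast-forwards dst, or refuses and leaves it alone.
# Assumptions: a scratch repo; "." stands in for origin; branchA/branchB are hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m base
git -C "$repo" branch branchB                    # branchB -> base
git -C "$repo" checkout -q -b branchA
git -C "$repo" commit -q --allow-empty -m ahead  # branchA -> ahead (child of base)
git -C "$repo" checkout -q master                # stay off branchA and branchB

# base -> ahead is a fast-forward, so this update is accepted.
if git -C "$repo" fetch -q . branchA:branchB; then ff=accepted; else ff=rejected; fi

# Give master its own child of base. Moving branchB from "ahead" to it would
# leave "ahead" behind, so fetch refuses and branchB stays where it was.
git -C "$repo" commit -q --allow-empty -m diverge
before=$(git -C "$repo" rev-parse branchB)
if git -C "$repo" fetch -q . master:branchB 2>/dev/null; then nonff=accepted; else nonff=rejected; fi
after=$(git -C "$repo" rev-parse branchB)

echo "fast-forward: $ff; non-fast-forward: $nonff"
```

The scratch repository is left in place (under the system temp directory) so you can poke at it afterward.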
Another is not very well documented: it's the -u, aka --update-head-ok, option to git fetch, which git pull sets. The documentation does define what it does, but it is a bit mysterious and scary:
By default git fetch refuses to update the head which corresponds
to the current branch. This flag disables the check. This is purely
for the internal use for git pull to communicate with git fetch,
and unless you are implementing your own Porcelain you are not
supposed to use it.
This gets us to your observation:
So, I tried git pull upstream master:master and it worked. What is interesting is that doing git pull upstream master:master updates my fork with upstream regardless of whether I am on master or not. Whereas git fetch upstream master:master only works when I am NOT on master branch.
This is due to that -u flag. If you ran git fetch -u upstream master:master, that would work, for some meaning of the word "work", but it would leave you with a different problem. The warning is there for a reason. Let's look at what that reason is, and see whether the warning is overly harsh. Warning: there is a lot here! Much of the complication below is to make up for historical mistakes while retaining backwards compatibility.
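The difference between plain fetch and fetch -u can be reproduced locally. This is a sketch under stated assumptions: a scratch repository, "." as the remote, and a hypothetical branch named newer that is one commit ahead of master. It shows the refusal, then the -u update, then the "different problem": git status reports a staged change, because the index and work-tree were never updated.

```shell
#!/bin/sh
# Sketch: plain fetch refuses to move the checked-out branch; -u lets it,
# after which the index no longer matches HEAD.
# Assumptions: scratch repo, "." as the remote, hypothetical branch "newer".
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo one > "$repo/file.txt"
git -C "$repo" add file.txt
git -C "$repo" commit -q -m one
git -C "$repo" checkout -q -b newer
echo second > "$repo/file.txt"
git -C "$repo" commit -q -a -m second
git -C "$repo" checkout -q master        # HEAD -> master; file.txt says "one"

# Without -u, fetch will not update the branch that HEAD is attached to.
if git -C "$repo" fetch -q . newer:master 2>/dev/null; then plain=updated; else plain=refused; fi

# With -u (--update-head-ok) the fast-forward happens, but nothing touches
# the index or work-tree, so status now shows a staged "change".
git -C "$repo" fetch -q -u . newer:master
porcelain=$(git -C "$repo" status --porcelain)
echo "plain fetch: $plain; status after -u: $porcelain"
```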
Branch names, references, and fast-forwarding
First, let's talk about references and fast-forward operations.
In Git, a reference is just a fancy way of talking about a branch name like master, or a tag name like v1.2, or a remote-tracking name like origin/master, or, well, any number of other names, all in one common and sensible fashion: we group each specific kind of name into a name space, or as a single word, namespace. Branch names live under refs/heads/, tag names live under refs/tags/, and so on, so that master is really just refs/heads/master.
Every one of these names, all of which start with refs/, is a reference. There are a few extra references that don't start with refs as well, although Git is a little bit erratic internally in deciding whether names like HEAD and ORIG_HEAD and MERGE_HEAD are actually references.[1] In the end, though, a reference mainly serves as a way to have a useful name for a Git object hash ID. Branch names in particular have a funny property: they move from one commit to another, typically in a way that Git refers to as a fast forward.
That is, given a branch with some commits, represented by uppercase letters here, and a second branch with more commits that include all the commits on the first branch:
...--E--F--G   <-- branch1
            \
             H--I   <-- branch2
Git is allowed to slide the name branch1 forward so that it points to either of the commits that were, before, reachable only through the name branch2.[2] Compare this to, say:
...--E--F--G------J   <-- branch1
            \
             H--I   <-- branch2
If we were to move the name branch1 to point to commit I instead of commit J, what would happen to commit J itself?[3] This kind of motion, which leaves a commit behind, is a non-fast-forward operation on the branch name.
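The fast-forward test can be checked directly with git merge-base --is-ancestor: moving a name to a commit is a fast-forward exactly when the name's current commit is an ancestor of the target. The sketch below rebuilds the two drawings above in a scratch repository (commit messages mirror the letters); is_ff is a hypothetical helper name, not a Git command.

```shell
#!/bin/sh
# Sketch: moving branch1 to commit X is a fast-forward iff branch1's current
# commit is an ancestor of X. is_ff is a hypothetical helper.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/branch1
for c in E F G; do git -C "$repo" commit -q --allow-empty -m "$c"; done
git -C "$repo" checkout -q -b branch2
for c in H I; do git -C "$repo" commit -q --allow-empty -m "$c"; done
git -C "$repo" checkout -q branch1

# is_ff NAME TARGET: succeeds if moving NAME to TARGET would be a fast-forward.
is_ff() {
    git -C "$repo" merge-base --is-ancestor "$1" "$2"
}

# First drawing: G is an ancestor of I, so branch1 -> I is a fast-forward.
if is_ff branch1 branch2; then first=fast-forward; else first=non-fast-forward; fi

# Second drawing: after J, moving branch1 to I would leave J behind.
git -C "$repo" commit -q --allow-empty -m J
if is_ff branch1 branch2; then second=fast-forward; else second=non-fast-forward; fi

echo "before J: $first; after J: $second"
```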
These names can be shortened by leaving off the refs/ part, or often, even the refs/heads/ part or the refs/tags/ part or whatever. Git will look in its reference-name database[4] for the first one that matches, using the six-step rules described in the gitrevisions documentation. If you have a refs/tags/master and a refs/heads/master, for instance, and say master, Git will match refs/tags/master first and use the tag.[5]
[1] If a reference is a name that has, or can have, a reflog, then HEAD is a reference while ORIG_HEAD and the other *_HEAD names are not. It's all a little fuzzy at the edges here, though.
[2] These commits might be reachable through more names. The important thing is that they weren't reachable through branch1 before the fast-forward, and are afterward.
[3] The immediate answer is actually that nothing happens, but eventually, if commit J is not reachable through some name, Git will garbage collect the commit.
[4] This "database" is really just the combination of the directory .git/refs plus the file .git/packed-refs, at least for the moment. If a reference exists both as a loose file under .git/refs and as an entry in .git/packed-refs, the hash ID in the loose file overrides the one in the packed-refs file.
[5] Exception: git checkout tries the argument as a branch name first, and if that works, treats master as a branch name. Everything else in Git treats it as a tag name, since prefixing with refs/tags is step 3, vs step 4 for a branch name.
Refspecs
Now that we know that a reference is just a name pointing to a commit, and a branch name is a specific kind of reference for which fast forwards are normal everyday things, let's look at the refspec. Let's start with the most common and explainable form, which is just two reference names separated by a colon, such as master:master or HEAD:branch.
Git uses refspecs whenever you connect two Gits to each other, such as during git fetch and during git push. The name on the left is the source and the name on the right is the destination. If you are doing git fetch, the source is the other Git repository, and the destination is your own. If you are doing git push, the source is your repository, and the destination is theirs. (In the special case of using ., which means this repository, both source and destination are yourself, but everything still works just as if your Git is talking to another, separate Git.)
If you use fully-qualified names (starting with refs/), you know for sure which one you will get: branch, tag, or whatever. If you use partially-qualified or unqualified names, Git will usually figure out what you mean anyway. You will occasionally run into a case where Git can't figure out what you mean; in that case, use a fully qualified name.
You can simplify a refspec even further by omitting one of the two names. Git knows which name you omit by which side of the colon is gone: :dst has no source name, while src: has no destination name. If you write name, Git treats that as name:, a source with no destination.
What these mean varies. An empty source for git push means delete: git push origin :branch has your Git ask their Git to delete the name entirely. An empty destination for git push means use the default, which is normally the same branch name: git push origin branch pushes your branch by asking their Git to set their branch named branch.[6] Note that it's normal to git push to their branch directly: you send them your commits, then ask them to set their refs/heads/branch. This is quite different from the normal fetch!
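Both push forms can be tried against a local bare repository standing in for origin. This is a sketch under stated assumptions: git on the PATH, no remote.<name>.push configuration (none exists for a plain path), and a hypothetical branch called topic.

```shell
#!/bin/sh
# Sketch: "git push <repo> topic" (no destination) sets their topic;
# "git push <repo> :topic" (no source) asks them to delete it.
# Assumptions: a local bare repo stands in for origin; "topic" is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

bare=$(mktemp -d)
git init -q --bare "$bare"
repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m first
git -C "$repo" branch topic

git -C "$repo" push -q "$bare" topic        # no ":dst", so their topic is set
pushed=$(git -C "$bare" rev-parse refs/heads/topic)

git -C "$repo" push -q "$bare" :topic       # empty source: delete their topic
if git -C "$bare" rev-parse --verify -q refs/heads/topic >/dev/null; then deleted=no; else deleted=yes; fi
echo "pushed $pushed; deleted: $deleted"
```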
For git fetch, an empty destination means don't update any of my references. A non-empty destination means update the reference I supply. Unlike git push, though, the usual destination you might use here is a remote-tracking name: you would fetch their refs/heads/master into your own refs/remotes/origin/master. That way, your branch name master (your refs/heads/master) is left untouched.
For historical reasons, though, the usual form of git fetch is just written as git fetch remote branch, omitting the destination. In this case, Git does something seemingly self-contradictory:
- It writes the branch name update nowhere. The lack of a destination means that no (local) branch gets updated.
- It writes the hash ID into .git/FETCH_HEAD. Everything git fetch fetches always goes here. This is where and how git pull finds out what git fetch did.
- It updates the remote-tracking name, such as refs/remotes/origin/master, even though it was not told to do so. Git calls this an opportunistic update.
(Much of this is actually controlled through a default refspec that you will find in your .git/config file.)
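All three effects, plus the default refspec itself, can be observed with a local repository playing origin. A sketch under stated assumptions: git and a POSIX shell, and a Git new enough (1.8.4 or later) to do the opportunistic update of origin/master.

```shell
#!/bin/sh
# Sketch: "git fetch origin master" (no destination) leaves local master alone,
# records the hash in .git/FETCH_HEAD, and opportunistically updates
# origin/master. Assumption: a local repo acts as origin; Git 1.8.4 or later.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

up=$(mktemp -d)
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" commit -q --allow-empty -m first

clone=$(mktemp -d)/clone
git clone -q "$up" "$clone"
refspec=$(git -C "$clone" config remote.origin.fetch)   # the default refspec

git -C "$up" commit -q --allow-empty -m second           # origin moves ahead
new=$(git -C "$up" rev-parse master)
old=$(git -C "$clone" rev-parse master)

git -C "$clone" fetch -q origin master                   # no destination given
fetch_head=$(cut -f1 "$clone/.git/FETCH_HEAD")           # first field is the hash
tracking=$(git -C "$clone" rev-parse origin/master)
local_master=$(git -C "$clone" rev-parse master)
echo "refspec: $refspec"
echo "FETCH_HEAD: $fetch_head; origin/master: $tracking; master: $local_master"
```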
You can also complicate a refspec by adding a leading plus sign, +. This sets the "force" flag, which overrides the default "fast forward" check for branch name motion. This is the normal case for your remote-tracking names: you want your Git to update your refs/remotes/origin/master to match their Git's refs/heads/master even if that's a non-fast-forward change, so that your Git always remembers where their master was the last time your Git talked with their Git.
Note that the leading plus only makes sense if there is a destination to update. There are three possibilities here:
- You're creating a new name. This is generally OK.[7]
- You're making no change to the name: it used to map to commit hash H and the request says to set it to map to commit hash H. This is obviously OK.
- You are changing the name. This one breaks down into three more sub-possibilities:
  - It's not a branch-like name at all, e.g., it's a tag and should not move. You will need a force flag to override the default rejection.[8]
  - It's a branch-like name, and the branch motion is a fast-forward. You won't need the force flag.
  - It's a branch-like name, but the motion is not a fast-forward. You will need the force flag.
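The branch cases in the list above can be exercised with git fetch and "." as the remote. A sketch under stated assumptions: a scratch repository in which master and a hypothetical branch copy have diverged, plus a second hypothetical name, fresh, that does not exist yet. Tags are left out because fetch's handling of tag updates has varied between Git versions.

```shell
#!/bin/sh
# Sketch of the update rules for branch names, using "." as the remote:
# creating a name and no-op updates are fine; a non-fast-forward needs "+".
# Assumption: hypothetical branch names "copy" and "fresh" in a scratch repo.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m base
git -C "$repo" checkout -q -b copy
git -C "$repo" commit -q --allow-empty -m only-on-copy
git -C "$repo" checkout -q master
git -C "$repo" commit -q --allow-empty -m only-on-master   # master and copy diverge

# A new name: allowed.
if git -C "$repo" fetch -q . master:fresh; then create=ok; else create=rejected; fi
# Same name, same hash: allowed, nothing changes.
if git -C "$repo" fetch -q . master:fresh; then same=ok; else same=rejected; fi
# Non-fast-forward without "+": rejected.
if git -C "$repo" fetch -q . master:copy 2>/dev/null; then plain=ok; else plain=rejected; fi
# Non-fast-forward with "+": forced through.
if git -C "$repo" fetch -q . '+master:copy'; then forced=ok; else forced=rejected; fi

echo "create=$create same=$same plain=$plain forced=$forced"
```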
This covers all the rules for updating references, except for one last rule, for which we need yet more background.
[6] You can complicate this by setting push.default to upstream. In this case, if your branch fred has its upstream set to origin/barney, git push origin fred asks their Git to set their branch named barney.
[7] For various cases of updates, you can write hooks that do whatever you like to verify names and/or updates.
[8] In Git versions before 1.8.3, Git accidentally used branch rules for tag updates. So this only applies to 1.8.3 and later.
HEAD is very special
Remember that a branch name like master just identifies some particular commit hash:
$ git rev-parse master
468165c1d8a442994a825f3684528361727cd8c0
You have also seen that git checkout branchname behaves one way, and git checkout --detach branchname or git checkout hash behaves another way, giving a scary warning about a "detached HEAD". While HEAD acts like a reference in most ways, in a few ways it's very special. In particular, HEAD is normally a symbolic reference, in which case it contains the full name of a branch. That is:
$ git checkout master
Switched to branch 'master'
$ cat .git/HEAD
ref: refs/heads/master
tells us that the current branch name is master: that HEAD is attached to master. But:
$ git checkout --detach master
HEAD is now at 468165c1d... Git 2.17
$ cat .git/HEAD
468165c1d8a442994a825f3684528361727cd8c0
after which git checkout master puts us back on master as usual.
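The same attached/detached distinction can be checked from a script with git symbolic-ref, which succeeds only while HEAD is symbolic. A sketch under stated assumptions: a scratch repository using the default "files" ref backend, where .git/HEAD is a plain text file that can be read with cat.

```shell
#!/bin/sh
# Sketch: an attached HEAD is a symbolic ref naming a branch; a detached HEAD
# holds a raw hash ID. Assumption: the default "files" ref backend.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
git -C "$repo" commit -q --allow-empty -m first
hash=$(git -C "$repo" rev-parse master)

attached=$(git -C "$repo" symbolic-ref -q HEAD)   # the branch HEAD names
file_attached=$(cat "$repo/.git/HEAD")            # "ref: refs/heads/master"

git -C "$repo" checkout -q --detach master
if git -C "$repo" symbolic-ref -q HEAD >/dev/null; then symbolic=yes; else symbolic=no; fi
file_detached=$(cat "$repo/.git/HEAD")            # now the raw hash ID
echo "$attached | $file_attached | symbolic after detach: $symbolic | $file_detached"
```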
What this means is that when we have a detached HEAD, Git knows which commit we have checked out, because the correct hash ID is right there, in the name HEAD. If we were to make some arbitrary change to the value stored in refs/heads/master, Git would still know which commit we have checked out.
But if HEAD just contains the name master, the only way that Git knows that the current commit is, say, 468165c1d8a442994a825f3684528361727cd8c0, is that refs/heads/master maps to 468165c1d8a442994a825f3684528361727cd8c0. If we did something that changed refs/heads/master to some other hash ID, Git would think that we have that other commit checked out.
Does this matter? Yes, it does! Let's see why:
$ git status
... nothing to commit, working tree clean
$ git rev-parse master^
1614dd0fbc8a14f488016b7855de9f0566706244
$ echo 1614dd0fbc8a14f488016b7855de9f0566706244 > .git/refs/heads/master
$ git status
...
Changes to be committed:
...
modified: GIT-VERSION-GEN
$ echo 468165c1d8a442994a825f3684528361727cd8c0 > .git/refs/heads/master
$ git status
...
nothing to commit, working tree clean
Changing the hash ID stored in master changed Git's idea of the status!
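The same experiment can be repeated in a throwaway repository. This sketch swaps the echo into .git/refs/heads/master for git update-ref, which also works when master has been packed into .git/packed-refs; the file name f.txt is hypothetical.

```shell
#!/bin/sh
# Sketch of the experiment above with "git update-ref" instead of echo.
# Assumptions: a scratch repo; hypothetical file name f.txt.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo one > "$repo/f.txt"
git -C "$repo" add f.txt
git -C "$repo" commit -q -m one
echo second > "$repo/f.txt"
git -C "$repo" commit -q -a -m second

tip=$(git -C "$repo" rev-parse master)
parent=$(git -C "$repo" rev-parse "master^")

git -C "$repo" update-ref refs/heads/master "$parent"  # HEAD still says "master"
moved=$(git -C "$repo" status --porcelain)             # index ("second") vs HEAD ("one")
git -C "$repo" update-ref refs/heads/master "$tip"     # put it back
restored=$(git -C "$repo" status --porcelain)
echo "after move: '$moved'; after restore: '$restored'"
```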
The status involves HEAD vs index plus index vs work-tree
The git status command runs two git diff operations (well, git diff --name-status operations, internally):
- compare HEAD vs index
- compare index vs work-tree
Remember, the index, aka the staging area or the cache, holds the contents of the current commit until we start modifying it to hold the contents of the next commit we will make. The work-tree is merely a minor helper for this whole "update the index, then commit" process. We only need it because the files in the index are in a special Git-only format that most of the programs on our systems cannot use.
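Both comparisons can be run by hand: git diff --cached compares HEAD with the index, and plain git diff compares the index with the work-tree. A sketch in a scratch repository; the file names staged.txt and unstaged.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: the two comparisons behind git status, run by hand.
# Assumptions: a scratch repo; hypothetical file names.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo a > "$repo/staged.txt"
echo a > "$repo/unstaged.txt"
git -C "$repo" add staged.txt unstaged.txt
git -C "$repo" commit -q -m first

echo changed > "$repo/staged.txt"
git -C "$repo" add staged.txt            # index now differs from HEAD
echo changed > "$repo/unstaged.txt"      # work-tree now differs from index

head_vs_index=$(git -C "$repo" diff --cached --name-status)
index_vs_tree=$(git -C "$repo" diff --name-status)
printf 'HEAD vs index:\n%s\nindex vs work-tree:\n%s\n' "$head_vs_index" "$index_vs_tree"
```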
If HEAD holds the raw hash ID for the current commit, then the result of comparing HEAD vs index stays the same regardless of what we do with our branch names. But if HEAD holds one specific branch name, and we change that one specific branch name's value, and then do the comparison, we'll compare a different commit to our index. The index and work-tree will be unchanged, but Git's idea of the relative difference between the (different) current commit and the index will change.
This is why git fetch refuses to update the current branch name by default. It's also why, by default, you cannot push to the current branch of a non-bare repository: that non-bare repository has an index and work-tree whose contents are probably intended to match the current commit. If you change that Git's idea of what the current commit is, by changing the hash stored in the branch name, the index and work-tree are likely to stop matching the commit.
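The push side of this protection can be seen with two local repositories. A sketch under stated assumptions: the receiving repository keeps the default receive.denyCurrentBranch setting (refuse), and the directory names target and work are hypothetical.

```shell
#!/bin/sh
# Sketch: pushing to the checked-out branch of a non-bare repository is
# refused by default. Assumptions: local repos only; default
# receive.denyCurrentBranch; "target" and "work" are hypothetical names.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

target=$(mktemp -d)
git init -q "$target"
git -C "$target" symbolic-ref HEAD refs/heads/master
git -C "$target" commit -q --allow-empty -m first
before=$(git -C "$target" rev-parse master)

work=$(mktemp -d)/work
git clone -q "$target" "$work"
git -C "$work" commit -q --allow-empty -m second

# target has master checked out, so its receiving side rejects the update.
if git -C "$work" push -q origin master 2>/dev/null; then push=accepted; else push=refused; fi
after=$(git -C "$target" rev-parse master)
echo "push: $push"
```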
That's not fatal. Not at all, in fact. That's precisely what git reset --soft does: it changes the branch name to which HEAD is attached, without touching the contents in the index and the work-tree. Meanwhile git reset --mixed changes the branch name and the index, but leaves the work-tree untouched, and git reset --hard changes the branch name, the index, and the work-tree all in one go.
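The three reset modes can be compared through git status --porcelain, where the first column reflects HEAD vs index and the second reflects index vs work-tree. A sketch in a scratch repository with a hypothetical file f.txt; each case starts from the same tip commit.

```shell
#!/bin/sh
# Sketch: what each reset mode does, seen through git status --porcelain.
# Assumptions: a scratch repo; hypothetical file f.txt.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo one > "$repo/f.txt"
git -C "$repo" add f.txt
git -C "$repo" commit -q -m one
echo second > "$repo/f.txt"
git -C "$repo" commit -q -a -m second
tip=$(git -C "$repo" rev-parse master)

git -C "$repo" reset -q --soft "master^"    # branch name only
soft=$(git -C "$repo" status --porcelain)
git -C "$repo" reset -q --hard "$tip"       # back to a clean starting point

git -C "$repo" reset -q --mixed "master^"   # branch name and index
mixed=$(git -C "$repo" status --porcelain)
git -C "$repo" reset -q --hard "$tip"

git -C "$repo" reset -q --hard "master^"    # branch name, index, and work-tree
hard=$(git -C "$repo" status --porcelain)
printf 'soft: [%s]\nmixed: [%s]\nhard: [%s]\n' "$soft" "$mixed" "$hard"
```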
A fast-forward "merge" is basically a git reset --hard
When you use git pull to run git fetch and then git merge, the git merge step is very often able to do what Git calls a fast-forward merge. This is not a merge at all, though: it's a fast-forward operation on the current branch name, followed immediately by updating the index and work-tree contents to the new commit, the same way git reset --hard would. The key difference is that git pull checks (well, is supposed to check[9]) that no in-progress work will be destroyed by this git reset --hard, while git reset --hard itself deliberately does not check, to let you throw away in-progress work that you no longer want.
[9] Historically, git pull has gotten this wrong repeatedly, and each time it gets fixed after someone loses a bunch of work. Avoid git pull!
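The difference between the checked and unchecked versions can be seen by comparing git merge --ff-only with git reset --hard when there is an uncommitted edit in the way. A sketch in a scratch repository; the branch newer and the file f.txt are hypothetical.

```shell
#!/bin/sh
# Sketch: a fast-forward merge updates the index and work-tree like
# reset --hard, but first checks for local changes it would clobber.
# Assumptions: a scratch repo; hypothetical branch "newer" and file f.txt.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

repo=$(mktemp -d)
git init -q "$repo"
git -C "$repo" symbolic-ref HEAD refs/heads/master
echo one > "$repo/f.txt"
git -C "$repo" add f.txt
git -C "$repo" commit -q -m one
git -C "$repo" checkout -q -b newer
echo second > "$repo/f.txt"
git -C "$repo" commit -q -a -m second
git -C "$repo" checkout -q master
echo "local edit" > "$repo/f.txt"                  # uncommitted work

# The fast-forward would overwrite f.txt, so merge refuses.
if git -C "$repo" merge -q --ff-only newer >/dev/null 2>&1; then merge=done; else merge=refused; fi
kept=$(cat "$repo/f.txt")

# reset --hard does no such check: the edit is discarded.
git -C "$repo" reset -q --hard newer
now=$(cat "$repo/f.txt")
echo "merge: $merge; kept: $kept; after reset --hard: $now"
```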
Putting all this together
When you run git pull upstream master:master, Git first runs:
git fetch --update-head-ok upstream master:master
which has your Git call up another Git at the URL listed for upstream and collect commits from them, as found via their name master: the left side of the master:master refspec. Your Git then updates your own master, presumably refs/heads/master, using the right side of the refspec. The fetch step would normally fail if master is your current branch (that is, if your .git/HEAD contains ref: refs/heads/master), but the -u or --update-head-ok flag prevents the failure.
(If all goes well, your git pull will run its second, git merge, step:
git merge -m <message> <hash ID extracted from .git/FETCH_HEAD>
but let's finish with the first step first.)
The fast-forward rules make sure that your master update is a fast-forward operation. If not, the fetch fails, your master is unchanged, and the pull stops here. So we're OK so far: your master is fast-forwarded if and only if that's possible given the new commit(s), if any, obtained from upstream.
At this point, if your master has been changed and it's your current branch, your repository is now out of sync: your index and work-tree no longer match your master. However, git fetch has left the correct hash ID in .git/FETCH_HEAD as well, and your git pull now goes on to the reset-like update. This actually uses the equivalent of git read-tree rather than git reset, but as long as it succeeds (given the pre-pull checks, it should succeed), the end effect is the same: your index and work-tree will match the new commit.
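The whole sequence for the case where master is the current branch can be reproduced with a local repository playing upstream. A sketch under stated assumptions: git and a POSIX shell, a clone whose remote is named upstream, and --no-rebase so that any pull.rebase setting in your own configuration does not change the flow. Git prints a "fetch updated the current branch head" warning on stderr during the pull.

```shell
#!/bin/sh
# Sketch: "git pull upstream master:master" while on master.
# Assumptions: a local repo plays "upstream"; --no-rebase overrides pull.rebase.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

up=$(mktemp -d)
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
echo one > "$up/f.txt"
git -C "$up" add f.txt
git -C "$up" commit -q -m one

fork=$(mktemp -d)/fork
git clone -q -o upstream "$up" "$fork"   # remote named "upstream"; on master

echo second > "$up/f.txt"
git -C "$up" commit -q -a -m second      # upstream's master moves ahead

# fetch --update-head-ok moves master, then pull brings index and work-tree along.
git -C "$fork" pull -q --no-rebase upstream master:master
porcelain=$(git -C "$fork" status --porcelain)
content=$(cat "$fork/f.txt")
echo "status: [$porcelain]; f.txt: $content"
```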
Alternatively, perhaps master is not your current branch. Perhaps your .git/HEAD contains ref: refs/heads/branch instead. In this case, your refs/heads/master is safely fast-forwarded the way git fetch would have done even without --update-head-ok. Your .git/FETCH_HEAD contains the same hash ID as your updated master, and your git pull runs git merge to attempt a merge, which may or may not be a fast-forward operation, depending on the commit to which your branch name branch points right now. If the merge succeeds, Git either makes a commit (real merge) or adjusts the index and work-tree as before (fast-forward "merge") and writes the appropriate hash ID into .git/refs/heads/branch. If the merge fails, Git stops with a merge conflict and makes you clean up the mess as usual.
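The same pull run from a different branch shows both steps at work: the fetch fast-forwards master, then the merge updates the current branch. A sketch under stated assumptions: a local repository as upstream, a hypothetical branch feature with no commits of its own (so the merge step is itself a fast-forward), and --no-rebase as before.

```shell
#!/bin/sh
# Sketch: "git pull upstream master:master" while on another branch.
# Assumptions: a local repo plays "upstream"; "feature" is hypothetical.
export GIT_AUTHOR_NAME=demo GIT_AUTHOR_EMAIL=demo@example.com
export GIT_COMMITTER_NAME=demo GIT_COMMITTER_EMAIL=demo@example.com

up=$(mktemp -d)
git init -q "$up"
git -C "$up" symbolic-ref HEAD refs/heads/master
git -C "$up" commit -q --allow-empty -m one

fork=$(mktemp -d)/fork
git clone -q -o upstream "$up" "$fork"
git -C "$fork" checkout -q -b feature          # HEAD -> refs/heads/feature

git -C "$up" commit -q --allow-empty -m second # upstream's master moves ahead
new=$(git -C "$up" rev-parse master)

# Fetch step: refs/heads/master is fast-forwarded (it is not checked out).
# Merge step: FETCH_HEAD is merged into feature, here as a fast-forward.
git -C "$fork" pull -q --no-rebase upstream master:master
echo "master: $(git -C "$fork" rev-parse master)"
echo "feature: $(git -C "$fork" rev-parse feature)"
```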
The last possible case is that your HEAD is detached, but this works in the same way as for the ref: refs/heads/branch case. The only difference is that the new hash ID, when all is said and done, goes straight into .git/HEAD rather than into .git/refs/heads/branch.