git checkout --ours does not remove files from unm

Posted 2019-01-31 14:11

Question:

Hi I need to merge two branches like this.

This is just an example of what is happening; I work with hundreds of files which need resolution.

git merge branch1
...conflicts...
git status
....
# Unmerged paths:
#   (use "git add/rm <file>..." as appropriate to mark resolution)
#
#   both added:   file1
#   both added:   file2
#   both added:   file3
#   both added:   file4
git checkout --ours file1
git checkout --theirs file2
git checkout --ours file3
git checkout --theirs file4
git commit -a -m "this should work"
U   file1
fatal: 'commit' is not possible because you have unmerged files.
Please, fix them up in the work tree, and then use 'git add/rm <file>' as
appropriate to mark resolution and make a commit, or use 'git commit -a'.

When I run git mergetool, there is the correct content just from the 'ours' branch and when I save it, the file disappears from the unmerged list. But since I have hundreds of files like this, this is not an option.

I thought that this approach will bring me where I want to be - easily say which file from which branch I want to keep.

But I guess I misunderstood the concept of the git checkout --ours/theirs commands after a merge.

Could you please provide me some info, how to handle this situation? I use git 1.7.1

Answer 1:

It's mostly a quirk of how git checkout works internally. The Git folks have a tendency to let implementation dictate interface.

The end result is that after git checkout with --ours or --theirs, if you want to resolve the conflict, you must also git add the same paths:

git checkout --ours -- path/to/file
git add path/to/file

But this is not the case with other forms of git checkout:

git checkout HEAD -- path/to/file

or:

git checkout MERGE_HEAD -- path/to/file

(these are subtly different in multiple ways). In some cases this means the fastest way is to use the middle command, git checkout HEAD -- path/to/file. (Incidentally, the -- here is to make sure Git can distinguish between a path name and an option or branch name. For instance, if you have a file named --theirs, it will look like an option, but -- will tell Git that no, it's really a path name.)
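For the many-files situation in the question, the checkout-then-add pairing can be sketched as a small script. This is only an illustration under assumptions: it builds a throwaway repository in a temporary directory, and the branch name branch1 and the file names file1 through file4 simply mirror the question.

```shell
#!/bin/sh
# Sketch: reproduce four add/add conflicts, then resolve each file by
# picking a side AND staging it. All names here are illustrative.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master    # name the first branch portably
git config user.name "Example"
git config user.email "example@example.com"
git commit -q --allow-empty -m "base"

git checkout -q -b branch1
for f in file1 file2 file3 file4; do echo "theirs $f" > "$f"; done
git add . && git commit -q -m "add files on branch1"

git checkout -q master
for f in file1 file2 file3 file4; do echo "ours $f" > "$f"; done
git add . && git commit -q -m "add files on master"

git merge branch1 >/dev/null 2>&1 || true  # stops with add/add conflicts

# Pick a side per file, then stage the same paths to mark them resolved.
git checkout --ours   -- file1 file3
git checkout --theirs -- file2 file4
git add file1 file2 file3 file4

unresolved=$(git ls-files -u | wc -l | tr -d ' ')  # entries left in slots 1-3
git commit -q -m "merge branch1, choosing a side per file"
echo "unresolved entries before commit: $unresolved"
```

With hundreds of files, the two path lists could come from whatever rule decides which side wins; the important part is that every path given to git checkout --ours or --theirs is also given to git add.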

To see how this all works internally, and why you need the separate git add except when you don't, read on. :-) First, let's do a quick review of the merge process.

Merge, part 1: how merge begins

When you run:

$ git merge commit-or-branch

the first thing Git does is find the merge base between the named commit and the current (HEAD) commit. (Note that if you supply a branch name here, as in git merge otherbranch, Git translates that to a commit-ID, namely the tip of the branch. It saves the branch name argument for the eventual merge log message, but needs the commit ID to find the merge base.)

Having found a suitable merge base,1 Git then produces two git diff listings: one from the merge base to HEAD, and one from the merge base to the commit you identified. This gets "what you changed" and "what they changed", which Git now has to combine.

For files where you made a change and they didn't, Git can just take your version.

For files where they made a change and you didn't, Git can just take their version.

For files where you both made changes, Git must do some real merge work. It compares the changes, line by line, to see if it can combine them. If it can combine them, it does so. If the changes seem to conflict (based, again, on purely line-by-line comparisons), Git declares a "merge conflict" for that file (and goes ahead and tries to merge anyway, but leaves conflict markers in place).

Once Git has merged everything it can, it either finishes the merge—because there were no conflicts—or stops with a merge conflict.


1The merge base is obvious if you draw the commit graph. Without drawing the graph, it's kind of mysterious. This is why I always tell people to draw the graph, or at least, as much of it as needed to make sense.

The technical definition is that the merge base is the "lowest common ancestor" (LCA) node in the commit graph. In less technical terms, it's the most recent commit where your current branch joins up with the branch you're merging. That is, by recording each merge's parent commit IDs, Git is able to find the last time the two branches were together, and hence figure out both what you did, and what they did. For this to work at all, though, Git has to record each merge. Specifically, it has to write both (or all, for so-called "octopus" merges) parent IDs into the new merge commit.

In some cases, there's more than one suitable merge base. The process then depends on your merge strategy. The default recursive strategy will merge the multiple merge bases to produce a "virtual merge base". This is rare enough that you can ignore it for now.
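If you would rather ask Git than draw the graph, git merge-base reports the commit it would use, and two git diff runs approximate the "what you changed" and "what they changed" listings. A minimal sketch, assuming a scratch repository with one commit on each side; the file names are made up.

```shell
#!/bin/sh
# Sketch: find the merge base and list what each side changed since then.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "shared" > f.txt
git add f.txt && git commit -q -m "base"
base=$(git rev-parse HEAD)

git checkout -q -b branch1
echo "their work" > theirs.txt
git add theirs.txt && git commit -q -m "their change"

git checkout -q master
echo "our work" > ours.txt
git add ours.txt && git commit -q -m "our change"

mb=$(git merge-base HEAD branch1)                    # the common ancestor
ours_changed=$(git diff --name-only "$mb" HEAD)      # base -> our tip
theirs_changed=$(git diff --name-only "$mb" branch1) # base -> their tip

echo "merge base:   $mb"
echo "we changed:   $ours_changed"
echo "they changed: $theirs_changed"
```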


Merge, part 2: stopping with a conflict, and Git's "index"

When Git does stop this way, it needs to give you a chance to resolve the conflicts. But this also means that it needs to record the conflicts, and this is where Git's "index"—also called "the staging area", and sometimes "the cache"—really earns its existence.

For every staged file in your work-tree, the index has up to four entries, rather than just one entry. At most three of these are ever actually in use, but there are four slots, numbered 0 through 3.

Slot zero is used for resolved files. When you're working with Git and not doing merges, only slot zero gets used. When you edit a file in the work tree, it has "unstaged changes", and then you git add the file and the changes are written to the repository, updating slot zero; your changes are now "staged".

Slots 1-3 are used for unresolved files. When git merge has to stop with a merge conflict, it leaves slot zero empty, and writes everything to slots 1, 2, and 3. The merge base version of the file is recorded in slot 1, the --ours version is recorded in slot 2, and the --theirs version is recorded in slot 3. These nonzero slot entries are how Git knows that the file is unresolved.2

As you resolve files, you git add them, which erases all the slot 1-3 entries and writes a slot-zero, staged-for-commit entry. This is how Git knows the file is resolved and ready for a new commit. (Or, in some cases, you git rm the file, in which case Git writes a special "removed" value to slot zero, again erasing slots 1-3.)
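You can watch the slots directly with git ls-files --stage, which prints the slot number as the third field of each line. The sketch below assumes a scratch repository with a single modify/modify conflict on a made-up file f.txt.

```shell
#!/bin/sh
# Sketch: show which index slots are occupied during a conflict and after
# git add.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "line" > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b branch1
echo "their line" > f.txt && git commit -q -am "theirs"
git checkout -q master
echo "our line" > f.txt && git commit -q -am "ours"

git merge branch1 >/dev/null 2>&1 || true   # stops with a conflict

# Each output line is: mode, blob hash, slot number, path.
git ls-files --stage f.txt
slots_during=$(git ls-files --stage f.txt | awk '{print $3}' | tr -d '\n')

git add f.txt                               # mark the file resolved
slots_after=$(git ls-files --stage f.txt | awk '{print $3}' | tr -d '\n')

echo "slots during conflict: $slots_during"
echo "slots after git add:   $slots_after"
```

Here git add stages the file with its conflict markers still in place, which is fine for showing the slots but not something you would want to commit.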


2There are a few cases where one of these three slots is also empty. Suppose file new does not exist in the merge base and is added in both ours and theirs. Then :1:new is left empty and :2:new and :3:new record the add/add conflict. Or, suppose file f does exist in the base, is modified in our HEAD branch, and is removed in their branch. Then :1:f records the base file, :2:f records our version of the file, and :3:f is empty, recording the modify/delete conflict.

For modify/modify conflicts, all three slots are occupied; only when one file is missing is one of these slots empty. It's logically impossible to have two empty slots: there's no such thing as a delete/delete conflict, nor a nocreate/add conflict. But there is some weirdness with rename conflicts, which I've omitted here as this answer is long enough! In any case, it's the very existence of some value(s) in slots 1, 2, and/or 3 that mark the file as unresolved.
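The two partly-empty cases from this footnote can be reproduced in one merge. This is a sketch under assumptions: a scratch repository, with file names new and f chosen to match the footnote.

```shell
#!/bin/sh
# Sketch: an add/add conflict leaves slot 1 empty; a modify/delete conflict
# leaves slot 3 empty.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base content of f" > f
git add f && git commit -q -m "base"

git checkout -q -b branch1
git rm -q f                                  # they delete f ...
echo "their brand new file" > new            # ... and add new
git add new && git commit -q -m "remove f, add new"

git checkout -q master
echo "our extra line" >> f                   # we modify f ...
echo "our own new file" > new                # ... and also add new
git add f new && git commit -q -m "edit f, add new"

git merge branch1 >/dev/null 2>&1 || true

new_slots=$(git ls-files -u new | awk '{print $3}' | tr -d '\n')
f_slots=$(git ls-files -u f | awk '{print $3}' | tr -d '\n')
echo "slots used by new (add/add):     $new_slots"
echo "slots used by f (modify/delete): $f_slots"
```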


Merge, part 3: finishing the merge

Once all files are resolved—all entries are only in the zero-numbered slots—you can git commit the merge result. If git merge is able to do the merge without assistance, it normally runs git commit for you, but the actual commit is still done by running git commit.

The commit command works the same way as it always does: it turns the index contents into tree objects and writes a new commit. The only thing special about a merge commit is that it has more than one parent commit ID.3 The extra parents come from a file git merge leaves behind. The default merge message also comes from a file (a separate file in practice, although in principle they could have been combined).

Note that in all cases, the new commit's contents are determined by the index's contents. Moreover, once the new commit is done, the index is still full: it still contains the same contents. By default, git commit won't make another new commit at this point because it sees that the index matches the HEAD commit. It calls this "empty" and requires --allow-empty to make an extra commit, but the index is not empty at all. It's still quite full—it just is full of the same thing as the HEAD commit.
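To see the pieces mentioned here, you can look inside the .git directory while the merge is stopped. The sketch below assumes a scratch repository with one conflicted file; .git/MERGE_HEAD is the file that holds the extra parent, and the names used are illustrative.

```shell
#!/bin/sh
# Sketch: the extra parent comes from .git/MERGE_HEAD, and a second commit
# right after the merge is refused as "empty".
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base version" > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b branch1
echo "their version" > f.txt && git commit -q -am "theirs"
git checkout -q master
echo "our version" > f.txt && git commit -q -am "ours"

git merge branch1 >/dev/null 2>&1 || true
merge_head=$(cat .git/MERGE_HEAD)            # ID of the commit being merged
theirs_tip=$(git rev-parse branch1)

git checkout --ours -- f.txt && git add f.txt
git commit -q -m "merge branch1"

# rev-list --parents prints the commit ID followed by its parent IDs.
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))
second_parent=$(git rev-parse HEAD^2)

# The index now matches HEAD, so another commit is refused without --allow-empty.
if git commit -q -m "again" >/dev/null 2>&1; then extra=made; else extra=refused; fi

echo "parents: $parent_count, extra commit: $extra"
```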


3This assumes you are making a real merge, not a squash merge. When making a squash merge, git merge deliberately does not write the extra parent ID to the extra file, so that the new merge commit has only a single parent. (For some reason, git merge --squash also suppresses the automatic commit, as if it included the --no-commit flag as well. It's not clear why, since you could just run git merge --squash --no-commit if you want the automatic commit suppressed.)

A squash merge does not record its other parent(s). This means that if we go to merge again, some time later, Git won't know where to start the diffs from. This means you should generally only squash-merge if you plan to abandon the other branch. (There are some tricky ways to combine squash merges and real merges but they're well out of the scope of this answer.)
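For comparison, here is a rough sketch of a squash merge in a scratch repository (branch and file names are invented). It checks that no MERGE_HEAD is left behind and that the resulting commit has a single parent.

```shell
#!/bin/sh
# Sketch: git merge --squash stages the result but records no second parent.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "shared" > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b branch1
echo "their work" > theirs.txt
git add theirs.txt && git commit -q -m "their change"
git checkout -q master
echo "our work" > ours.txt
git add ours.txt && git commit -q -m "our change"

git merge --squash branch1 >/dev/null 2>&1   # stops before committing
if [ -e .git/MERGE_HEAD ]; then merge_head=present; else merge_head=absent; fi

git commit -q -m "squash branch1"
parent_count=$(( $(git rev-list --parents -n 1 HEAD | wc -w) - 1 ))

echo "MERGE_HEAD: $merge_head, parents: $parent_count"
```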


How git checkout branch uses the index

With all that out of the way, we then have to look at how git checkout uses Git's index, too. Remember, in normal usage, only slot zero is occupied, and the index has one entry for every staged file. Moreover, that entry matches the current (HEAD) commit unless you've modified the file and git add-ed the result. It also matches the file in the work-tree unless you've modified the file.4

If you are on some branch and you git checkout some other branch, Git tries to switch to the other branch. For this to succeed, Git has to replace the index entry for each file with the entry that goes with the other branch.

Let's say, just for concreteness, that you're on master and you are doing git checkout branch. Git will compare each current index entry with the index entry it would need to be on the tip-most commit of branch branch. That is, for file README.txt, are the master contents the same as those for branch, or are they different?

If the contents are the same, Git can take it easy and just move on to the next file. If the contents are different, Git has to do something to the index entry. (It's around this point that Git checks to see if the work-tree file differs from the index entry, too.)

Specifically, in the case where branch's file differs from master's, git checkout has to replace the index entry with the version from branch—or, if README.txt doesn't exist in the tip commit of branch, Git has to remove the index entry. Moreover, if git checkout is going to modify or remove the index entry, it also needs to modify or remove the work-tree file. Git makes sure this is a safe thing to do, i.e., that the work-tree file matches the master commit's file, before it will let you switch branches.

In other words, this is how (and why) Git finds out whether it's OK to change branches—whether you have modifications that would be clobbered by switching from master to branch. If you have modifications in your work-tree, but the modified files are the same in both branches, Git can just leave the modifications in the index and work-tree. It can and will alert you to these modified files "carried over" into the new branch: easy, since it had to check for this anyway.

Once all the tests have passed and Git has decided that it's OK to switch from master to branch (or if you specified --force), git checkout actually updates the index with all the changed (or removed) files, and updates the work-tree to match.

Note that all this action has used slot zero. There are no slot 1-3 entries at all, so that git checkout does not have to remove any such things. You're not in the middle of a conflicted merge, and you ran git checkout branch to not just check out one file, but rather an entire set of files and switch branches.
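The safety check described above is easy to try out. The following is a sketch in a scratch repository, with made-up file names: shared.txt is identical in both branches, while README.txt differs.

```shell
#!/bin/sh
# Sketch: uncommitted edits are carried across a branch switch only when
# the edited file is the same in both branches.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "same everywhere" > shared.txt
echo "version one" > README.txt
git add . && git commit -q -m "base"
git checkout -q -b branch
echo "version two, on branch" > README.txt
git commit -q -am "change README on branch"
git checkout -q master

# Case 1: edit a file that is identical in both branches.
echo "local edit" >> shared.txt
if git checkout -q branch 2>/dev/null; then case1=switched; else case1=refused; fi
carried=$(tail -n 1 shared.txt)              # the edit is still in the work-tree
git checkout -q master

# Case 2: edit a file that differs between the branches.
echo "local edit" >> README.txt
if git checkout -q branch 2>/dev/null; then case2=switched; else case2=refused; fi

echo "case 1: $case1 (last line: $carried), case 2: $case2"
```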

Note also that you can, instead of checking out a branch, check out a specific commit. For instance, this is how you might look at a previous commit:

$ git log
... peruse log output ...
$ git checkout f17c393 # let's see what's in this commit

The action here is the same as for checking out a branch, except that instead of using the tip commit of the branch, Git checks out an arbitrary commit. Instead of now being "on" the new branch, you're now on no branch:5 Git gives you a "detached HEAD". To reattach your head, you must git checkout master or git checkout branch to get back "on" the branch.
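One way to check for a detached HEAD from a script is git symbolic-ref -q HEAD, which fails when HEAD does not point at a branch. A small sketch in a scratch repository:

```shell
#!/bin/sh
# Sketch: check out a commit by ID, detect the detached HEAD, then reattach.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "first" > f.txt
git add f.txt && git commit -q -m "first"
first=$(git rev-parse HEAD)
echo "second version" > f.txt && git commit -q -am "second"

git checkout -q "$first"                     # a commit, not a branch
if git symbolic-ref -q HEAD >/dev/null; then state1=attached; else state1=detached; fi
content=$(cat f.txt)                         # work-tree matches the old commit

git checkout -q master                       # get back "on" the branch
if git symbolic-ref -q HEAD >/dev/null; then state2=attached; else state2=detached; fi

echo "after checking out the commit: $state1, after checkout master: $state2"
```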


4The index entry may not match the work-tree version if Git is doing special CR-LF ending modifications, or applying smudge filters. This gets pretty advanced and the best thing is to ignore this case for now. :-)

5More accurately, this puts you on an anonymous (unnamed) branch that will grow from the current commit. You will stay in detached HEAD mode if you make new commits, and as soon as you git checkout some other commit or branch, you'll switch there and Git will "abandon" the commits you've made. The point of this detached HEAD mode is both to let you look around and to let you make new commits that will just go away if you don't take special action to save them. For anyone relatively new to Git, though, having commits "just go away" is not so good—so make sure you know that you're in this "detached HEAD" mode, whenever you are in it.

The git status command will tell you if you're in detached HEAD mode. Use it often.6 If your Git is old (the OP's is 1.7.1, which is very old now), git status is not as helpful as it is in modern versions of Git, but it's still better than nothing.

6Some programmers like to have key git status information encoded into each command-prompt. I personally do not go this far, but it can be a good idea.


Checking out specific files, and why it sometimes resolves merge conflicts

The git checkout command has other modes of operation, though. In particular, you can run git checkout [flags etc] -- path [path ...] to check out specific files. This is where things get weird. Note that when you use this form of the command, Git does not check to make sure you are not overwriting your files.7

Now, instead of changing branches, you're telling Git to get some particular file(s) from somewhere, and drop them into the work-tree, overwriting whatever is there, if anything. The tricky question is: just where is Git getting these files?

Generally speaking, there are three places that Git keeps files:

  • in commits;8
  • in the index;
  • and in the work-tree.

The checkout command can read from either of the first two places, and always writes the result to the work-tree.

When git checkout gets a file from a commit, it first copies it to the index. Whenever it does this, it writes the file to slot zero. Writing to slot zero wipes out slots 1-3, if they are occupied. When git checkout gets a file from the index, it does not have to copy it to the index. (Of course not: it's already there!) This is how git checkout works when you are not in the middle of a merge: you can git checkout -- path/to/file to get the index version back.9

Suppose, though, that you are in the middle of a conflicted merge and are going to git checkout some path, maybe with --ours. (If you are not in the middle of a merge, there's nothing in slots 1-3, and --ours makes no sense.) So you run git checkout --ours -- path/to/file.

This git checkout gets the file from the index—in this case, from index slot 2. Since this is already in the index, Git does not write to the index, just to the work-tree. So the file is not resolved!

The same goes for git checkout --theirs: it gets the file from the index (slot 3), and does not resolve anything.

But: if you git checkout HEAD -- path/to/file, you are telling git checkout to extract from the HEAD commit. Since this is a commit, Git starts by writing the file contents to the index. This writes slot 0 and erases 1-3. And now the file is resolved!

Since, during a conflicted merge, Git records the being-merged commit's ID in MERGE_HEAD, you can also git checkout MERGE_HEAD -- path/to/file to get the file from the other commit. This, too, extracts from a commit, so it writes to the index, resolving the file.
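The difference between the three forms shows up if you count the unmerged index entries after each one. This sketch assumes a scratch repository with three conflicted files whose names are invented for the example.

```shell
#!/bin/sh
# Sketch: --ours reads from the index and resolves nothing; HEAD and
# MERGE_HEAD read from a commit, write slot zero, and resolve the file.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

for f in a.txt b.txt c.txt; do echo "base $f" > "$f"; done
git add . && git commit -q -m "base"
git checkout -q -b branch1
for f in a.txt b.txt c.txt; do echo "theirs $f" > "$f"; done
git commit -q -am "theirs"
git checkout -q master
for f in a.txt b.txt c.txt; do echo "ours $f" > "$f"; done
git commit -q -am "ours"

git merge branch1 >/dev/null 2>&1 || true    # three conflicted files

git checkout --ours -- a.txt                 # from index slot 2
a_left=$(git ls-files -u a.txt | wc -l | tr -d ' ')

git checkout HEAD -- b.txt                   # from the HEAD commit
b_left=$(git ls-files -u b.txt | wc -l | tr -d ' ')

git checkout MERGE_HEAD -- c.txt             # from the commit being merged
c_left=$(git ls-files -u c.txt | wc -l | tr -d ' ')

echo "unmerged entries left: a.txt=$a_left b.txt=$b_left c.txt=$c_left"
```

After this, a.txt holds our content in the work-tree but still needs git add before the merge can be committed.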


7I often wish Git used a different front-end command for this, since we could then say, unequivocally, that git checkout is safe, that it won't overwrite files without --force. But this kind of git checkout does overwrite files, on purpose!

8This is a bit of a lie, or at least a stretch: commits don't contain files directly. Instead, commits contain a (single) pointer to a tree object. This tree object contains the IDs of additional tree objects and of blob objects. The blob objects contain the actual file contents.

The same is, in fact, true of the index as well. Each index slot contains, not the actual file contents, but rather the hash IDs of blob objects in the repository.

For our purposes, though, this doesn't really matter: we just ask Git to retrieve commit:path and it finds the trees and the blob ID for us. Or, we ask Git to retrieve :n:path and it finds the blob ID in the index entry for path for slot n. Then it gets us the file's contents, and we're good to go.

This colon-and-number syntax works everywhere in Git, while the --ours and --theirs flags only work in git checkout. The funny colon syntax is described in gitrevisions.
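As a quick illustration of the colon syntax, git show can print each slot of a conflicted file, as well as the copy in a commit. The repository and file contents below are assumptions made up for the sketch.

```shell
#!/bin/sh
# Sketch: read the base, ours, and theirs versions of a conflicted file
# with :n:path, and the committed version with commit:path.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "base version" > f.txt
git add f.txt && git commit -q -m "base"
git checkout -q -b branch1
echo "their version" > f.txt && git commit -q -am "theirs"
git checkout -q master
echo "our version" > f.txt && git commit -q -am "ours"

git merge branch1 >/dev/null 2>&1 || true

base_v=$(git show :1:f.txt)                  # slot 1: merge base
ours_v=$(git show :2:f.txt)                  # slot 2: --ours
theirs_v=$(git show :3:f.txt)                # slot 3: --theirs
head_v=$(git show HEAD:f.txt)                # the file as stored in HEAD

echo "base=$base_v | ours=$ours_v | theirs=$theirs_v | HEAD=$head_v"
```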

9The use-case for git checkout -- path is this: suppose, whether or not you are merging, you made some changes to a file, tested, found those changes worked, then ran git add on the file. Then you decided to make more changes, but have not run git add again. You test the second set of changes and find they are wrong. If only you could get the work-tree version of the file set back to the version you git add-ed just a moment ago.... Aha, you can: you git checkout -- path and Git copies the index version, from slot zero, back to the work-tree.
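That use-case, written out as a sketch in a scratch repository (the file name and contents are invented):

```shell
#!/bin/sh
# Sketch: git checkout -- path copies the staged (slot zero) version back
# to the work-tree, discarding later unstaged edits.
repo=$(mktemp -d) && cd "$repo" || exit 1
git init -q
git symbolic-ref HEAD refs/heads/master
git config user.name "Example"
git config user.email "example@example.com"

echo "original" > f.txt
git add f.txt && git commit -q -m "base"

echo "good change" > f.txt
git add f.txt                                # tested, then staged
echo "bad change" > f.txt                    # second edit, never staged

git checkout -- f.txt                        # back to the staged version
restored=$(cat f.txt)
echo "work-tree now contains: $restored"
```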


Subtle behavior warning

Note, though, that using --ours or --theirs has another subtle difference besides just the "extract from index and therefore don't resolve" behavior. Suppose that, in our conflicted merge, Git has detected that some file was renamed. That is, in the merge base, we had file doc.txt, but now in HEAD we have Documentation/doc.txt. The path we need for git checkout --ours is Documentation/doc.txt. This is also the path in the HEAD commit, so it's OK to git checkout HEAD -- Documentation/doc.txt.

But what if, in the commit we're merging, doc.txt did not get renamed? In this case, we should10 be able to git checkout --theirs -- Documentation/doc.txt to get their doc.txt from the index. But if we try to git checkout MERGE_HEAD -- Documentation/doc.txt, Git won't be able to find the file: it's not in Documentation, in the MERGE_HEAD commit. We have to git checkout MERGE_HEAD -- doc.txt to get their file ... and that would not resolve Documentation/doc.txt. In fact, it would just create ./doc.txt (if it was renamed there's almost certainly no ./doc.txt, hence "create" is a better guess than "overwrite").

Because merging uses HEAD's names, it's generally safe enough to git checkout HEAD -- path to extract-and-resolve in one step. And if you're working on resolving files and have been running git status, you should know whether they have a renamed file, and therefore whether it's safe to git checkout MERGE_HEAD -- path to extract-and-resolve in one step by discarding your own changes. But you should still be aware of this, and know what to do if there is a rename to be concerned with.


10I say "should" here, not "can", because Git currently forgets the rename a little bit too soon. So if using --theirs to get a file that you renamed in HEAD, you have to use the old name here too, and then rename the file in the work-tree.