.gitattributes: merge=ours strategy vs. fast-forward

Published 2020-03-29 01:06

Question:

Suppose I am in the following Git situation:

* da6a750 (A) Further in A, okay for merging back into master
*   bf27b58 Merge branch 'master' into A
|\  
| * 86294d1 (HEAD -> master) Development on master
* | abe6b8a Welcome to branch A
|/  
* 589517c First commit

On the master branch there are three files:

./development:

development on master
initial

./specific:

master branch

./.gitattributes:

specific merge=ours

On branch A there are three files as well:

./development:

development on master
initial
development in A
further in A, okay for merging into master

./specific:

Welcome to branch A

./.gitattributes:

specific merge=ours

When I merged master into A to produce commit bf27b58, I was happy that the ./specific file was not changed, because I need it to be kept as it is on branch A.

However, I would now like to merge further changes from A into master in such a way that, in the newly produced commit, the ./specific file is the same as in commit 86294d1.


My guess was that specific merge=ours would guarantee this, but that does not seem to be the case. I tried running:

$ git merge A
Updating 86294d1..da6a750
Fast-forward
 development | 2 ++
 specific    | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

... with no success. A fast-forward was triggered, and I ended up with A and master both pointing to da6a750. In other words, master now points to a commit whose ./specific file displays Welcome to branch A:

* da6a750 (HEAD -> master, A) Further in A, okay for merging into master
*   bf27b58 Merge branch 'master' into A
|\  
| * 86294d1 Development on master
* | abe6b8a Welcome to branch A
|/  
* 589517c First commit

... which is not what I want.


Instead, I tried running:

$ git merge --no-ff A
Merge made by the 'recursive' strategy.
 development | 2 ++
 specific    | 2 +-
 2 files changed, 3 insertions(+), 1 deletion(-)

... with no success either: a new commit, 02c9c10, is produced with the same problem:

*   02c9c10 (HEAD -> master) Merge branch 'A'
|\  
| * da6a750 (A) Further in A, okay for merging into master
| *   bf27b58 Merge branch 'master' into A
| |\  
| |/  
|/|   
* | 86294d1 Development on master
| * abe6b8a Welcome to branch A
|/  
* 589517c First commit

... which is not what I want, because ./specific in 02c9c10 (master) displays Welcome to branch A.

(So 02c9c10 and da6a750 contain exactly the same files; I guess they only hash differently because the commits themselves differ, e.g. in the commit message and parents.)


Why does specific merge=ours seem not to be taken into account in this case?
How can I make it work without having to manually run git merge --no-ff --no-commit A && git checkout master specific && git commit?

Answer 1:

TL;DR: it's not actually the fast forward itself

Your question comes down to: "why isn't Git obeying my custom merge direction?" In fact, this problem can occur with any merge, and any custom merge driver. The fact that this merge can be done as a fast forward operation merely guarantees that you (with your particular case) will hit the problem.

The reason boils down to the fact that any custom .gitattributes merge driver, including merge=ours, is invoked only when Git believes there is "something to merge". This does not seem so bad until you realize what it takes for Git to have such a belief.

Sidebar: merge strategies

As a sidebar, it's worth mentioning Git's -s strategy argument to git merge here. These strategies take over the whole process, including the "find the merge base" step plus everything after that, and hence can do their own thing, which includes ignoring .gitattributes entirely. Obviously, if a strategy ignores your .gitattributes, setting a custom merge driver or mode there won't help.

Therefore, we're looking only at the -s strategies that do use a merge base and two of what Git calls heads (which we'll label "ours" and "theirs"), and do use .gitattributes. There are three of those built into Git (recursive, resolve, and subtree), but they all work the same here, with respect to what gets merged and what happens with custom merge drivers. (The other two built-in merge strategies, ours and octopus, either don't bother with a merge base and a "theirs" at all, or, in the case of octopus, have more than two heads, so that there is no clear notion of "ours" and "theirs".)

One merge base and two heads

So, now that we have settled on the built-in merges that have one merge base commit and two head commits, we can look at what it means for Git to think, in its tiny little pre-programmed Gitty way, that there is something to merge.

The two heads are easier to define. One of them, the one we call "ours", is just HEAD itself. The other is whatever argument we pass to git merge:

git merge A

means "ours" is HEAD and "theirs" is the commit identified by A.

Here is your git log --all --decorate --oneline --graph output again (thanks, by the way, for including that—it's critical for most merges!):

* da6a750 (A) Further in A, okay for merging back into master
*   bf27b58 Merge branch 'master' into A
|\  
| * 86294d1 (HEAD -> master) Development on master
* | abe6b8a Welcome to branch A
|/  
* 589517c First commit

so we can say that the two heads are commit 86294d1 (HEAD or master or just "ours") and commit da6a750 (A or just "theirs").

The merge base is the first commit the two heads share in their graph history. That is, starting from both heads, work backwards through history as needed until you find a commit they have in common: one that you can reach from both heads. So we start from da6a750, work backwards one step to bf27b58, then work backwards one more step to both 86294d1 and abe6b8a. Meanwhile, we start from 86294d1 and ... oh look, we've hit a common commit already! :-)
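The backwards walk just described can be sketched in Python. This is a toy model, not Git's implementation: the PARENTS table is hand-copied from the graph above, and ancestors and merge_base are hypothetical helper names. merge_base keeps the common ancestors that are not themselves ancestors of another common ancestor, which is how the git merge-base documentation describes a "best" common ancestor.

```python
# Toy copy of the question's commit graph: commit -> list of parent commits.
PARENTS = {
    "589517c": [],
    "abe6b8a": ["589517c"],
    "86294d1": ["589517c"],
    "bf27b58": ["abe6b8a", "86294d1"],
    "da6a750": ["bf27b58"],
}

def ancestors(commit):
    """Every commit reachable from `commit` by following parents, itself included."""
    seen, stack = set(), [commit]
    while stack:
        c = stack.pop()
        if c not in seen:
            seen.add(c)
            stack.extend(PARENTS[c])
    return seen

def merge_base(ours, theirs):
    """Common ancestors that are not an ancestor of some other common ancestor."""
    common = ancestors(ours) & ancestors(theirs)
    best = [c for c in common
            if not any(c != other and c in ancestors(other) for other in common)]
    return sorted(best)  # criss-cross histories can yield more than one

print(merge_base("86294d1", "da6a750"))  # ['86294d1']
```

The result is the "ours" head itself, which is exactly the situation the next paragraph discusses.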

Since the merge base is one of the two heads, normally we'd either get a fast forward, or a complaint that there is nothing to merge. Since the merge base is the "ours" head, of those two options, Git would pick the fast forward operation. Using --no-ff tells Git: don't pick that, go ahead and do a full blown merge after all.
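The decision in the previous paragraph can be summarized as a small classifier. This is a simplified model, assuming the merge base has already been found; merge_kind is a hypothetical name, its arguments are plain commit IDs, and the no_ff flag merely stands in for git merge --no-ff.

```python
def merge_kind(base, ours, theirs, no_ff=False):
    """Classify a two-head merge given its merge base (simplified model)."""
    if base == theirs:
        return "already up to date"  # theirs is already contained in ours
    if base == ours and not no_ff:
        return "fast-forward"        # just move the branch name to theirs
    return "true merge"              # make a new commit with two parents

print(merge_kind("86294d1", "86294d1", "da6a750"))              # fast-forward
print(merge_kind("86294d1", "86294d1", "da6a750", no_ff=True))  # true merge
```

Note that with --no-ff the merge base is still the "ours" commit; only the kind of result changes.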

Now, the fact that the merge base is the "ours" commit guarantees we will have your problem, but in fact, we could have your problem even if the merge base were not the "ours" commit. Let's take a look at what's inside a commit, at the next level down of what Git needs and does when it works on both git diff and git merge—but first, let's think about what git merge is supposed to do.

The goal of a merge is to combine work

As a general rule, the idea when running git merge is that we want to take two sets of work—things we did on our branch in our commits, and things "they", whoever they are, did on their branch in their commits—and produce a new commit that is the best of both worlds: that takes any good stuff we did, plus any good stuff they did.

If we draw the graph horizontally instead of vertically, with older commits at the left and newer ones at the right, we can draw this:

          o--o--o--...--H   <-- ours
         /
...--o--B
         \
          o-----...-----T   <-- theirs

where each o is a commit, and so are B, H, and T. Commit B is the merge base, where the two forks in this graph rejoin in the "past" (leftward) direction. H is our (HEAD) commit and T is the head / tip commit of their branch. How, then, can we combine our work with their work?

Git's answer is to run two git diffs:

git diff B H     # find out what we did
git diff B T     # find out what they did

Then it can combine these two diffs:

  • Wherever we added something—some lines of text—to some files, Git should make the final result have those added lines in those files. Wherever we deleted some lines of text in some files, it should make the final result have those lines deleted.

    Because git diff expresses the differences as "delete this and add that" (even for differences that change this to that), that covers everything git diff says.

  • Likewise, wherever they added lines, Git should make the final result have the added lines. Wherever they deleted lines, Git should make the final result have those same deletions.

  • To take care of a very common case, if we and they made the exact same change—deleting the same original lines, and/or adding the same replacements—Git takes only one copy of this.

  • And of course, if there's a place where we both touched the same lines, but in different ways, Git just throws up its metaphorical hands, exclaims "Oy vey!", and declares a merge conflict.

    (It's these merge conflicts that give us the most headaches, so most of the twisty knobs Git gives us are designed for dealing with those conflicts in some way. That's mostly true of .gitattributes merge attributes, too—though that's not directly relevant to our problem here.)
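To make the combining rules above concrete, here is a deliberately simplified line-based three-way merge. It uses Python's standard difflib rather than Git's own diff code, and the names changes, overlaps and merge3 are made up for this illustration. Git's real merge differs in details, for example in how it treats changes that merely sit next to each other.

```python
import difflib

def changes(base, side):
    """Hunks (start, end, new_lines) meaning: replace base[start:end] with new_lines."""
    sm = difflib.SequenceMatcher(a=base, b=side, autojunk=False)
    return [(i1, i2, side[j1:j2])
            for tag, i1, i2, j1, j2 in sm.get_opcodes() if tag != "equal"]

def overlaps(h, k):
    """True if two hunks touch the same region of the base file."""
    (h1, h2, _), (k1, k2, _) = h, k
    return h1 == k1 or (h1 < k2 and k1 < h2)

def merge3(base, ours, theirs):
    """Apply both sides' changes to base; identical changes are taken once."""
    kept = []
    for hunk in changes(base, ours) + changes(base, theirs):
        clashes = [k for k in kept if overlaps(k, hunk)]
        if not clashes:
            kept.append(hunk)        # only one side touched this region
        elif clashes == [hunk]:
            continue                 # both sides made the exact same change
        else:
            raise ValueError(f"merge conflict at base line {hunk[0] + 1}")
    result, pos = [], 0
    for start, end, new_lines in sorted(kept, key=lambda h: (h[0], h[1])):
        result += base[pos:start] + new_lines  # unchanged lines, then the change
        pos = end
    return result + base[pos:]

base = ["a", "b", "c", "d"]
print(merge3(base, ["a", "B", "c", "d"], ["a", "b", "c", "D"]))  # ['a', 'B', 'c', 'D']
```

Changing line b to "X" on one side and to "Y" on the other raises ValueError, the toy version of "Oy vey!".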

Now, all this combining is a lot of work, so to make Git go fast, there's a short-cut.

What's inside a raw commit for git merge to git diff

We can look at any commit object, or indeed any Git object at all, with git cat-file -p:

$ git cat-file -p HEAD
tree 5bc304073b94505cd3f6716829c4cec5a7474762
parent 29257c2c82dca881c4cc65765392a32e46264fbe
author Chris Torek <chris.torek@gmail.com> 1490287144 -0700
committer Chris Torek <chris.torek@gmail.com> 1490297185 -0700

insert early footnote on Git branch creation

In the "about version control" chapter section that introduces

(I snipped the rest off here).

The more interesting part here is actually the tree, so let's view some of that:

$ git cat-file -p 5bc304073b94505cd3f6716829c4cec5a7474762
100644 blob 8d1519c435c4da5a65228785fa7ba7033fe011ff    .gitignore
100644 blob 66c9d22a735ee9d8da7f7ed49599583aa642842f    Makefile
100644 blob c9c824fa6668e45976c4fe8a10e4d5c25e272f0c    about.tex
100644 blob 1757109f5aa921ecf9a8051180c25f09e1496c07    aboutvc.tex

(again I snipped things off here).

Each of those raw hash IDs for each blob object—i.e., stored file version—tells Git which version goes with this commit. (More precisely, that's the file version for this tree object, but this tree goes with this commit, so it amounts to the same thing.)

Git can, and in fact has to, extract these blob hash IDs for each of the three commits: the merge base, "ours", and "theirs". The hash IDs are how it will be able to diff the old and new versions of files like aboutvc.tex (in my case) or specific (in yours). But there is an interesting thing about these hash IDs: they're based entirely on the contents of the object.[1] If two files in two different commits are exactly, completely, 100% bit-for-bit identical, they have the same hash and are stored in the repository just once. This means that no matter how many commits have a copy of that particular version of that file, there's only one copy stored in the database.

[1] In fact, they are cryptographic hashes of the object contents, including the little type-and-size header Git sticks on the front of each object. That header is why the now-famous SHA-1 hash collision is not an immediate problem for Git.
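The content-addressing described above can be reproduced in a few lines. This sketch assumes a repository using Git's default SHA-1 object format; git_blob_id is a hypothetical helper that hashes the "blob <size>" header, a NUL byte, and then the file's bytes, which should match what git hash-object reports for the same contents.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """SHA-1 blob ID: hash of a 'blob <size>' header, a NUL byte, then the contents."""
    header = b"blob %d\x00" % len(content)
    return hashlib.sha1(header + content).hexdigest()

print(git_blob_id(b"hello\n"))  # ce013625030ba8dba906f756967f9e9ca394464a
```

Because the ID depends only on the bytes, every commit whose specific file reads "master branch" records the same blob ID for it, and comparing two such IDs is enough to know the file did not change.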


Same hash => problem

This fast hash comparison—the fact that the same hash means "same version of that file"—means that git diff and git merge can immediately and easily tell that there's no change to some file, from base to ours, or base to theirs ... and this is precisely where merge=ours goes wrong. Git looks at base-vs-ours, and base-vs-theirs. One pair has the same hash. One pair has a different hash.

At this point, Git simply assumes that the right answer, regardless of merge strategy or turney-knob setting in .gitattributes, is to take the file from whichever head has a different hash. For most files, in most cases, that's the right answer. But if we have defined a custom merge driver, or set merge=ours, it might be the wrong answer.
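The per-file shortcut can be modeled like this. It is an illustration of the behavior described here, not Git's source: merge_file and keep_ours are hypothetical names, the blob IDs are placeholder strings, and driver stands in for whatever content merge would run (for example a merge=ours driver). The key point is that the driver is reached only when both sides changed the file.

```python
def merge_file(path, base_id, ours_id, theirs_id, driver):
    """Pick a result blob for one path; call `driver` only if both sides changed it."""
    if ours_id == theirs_id:
        return ours_id     # both sides agree (or neither changed the file)
    if base_id == ours_id:
        return theirs_id   # only theirs changed: take theirs, driver never runs
    if base_id == theirs_id:
        return ours_id     # only ours changed: keep ours, driver never runs
    return driver(path, base_id, ours_id, theirs_id)

calls = []  # records which paths actually reached the custom driver

def keep_ours(path, base_id, ours_id, theirs_id):
    """Stand-in for a merge=ours driver: always keep our version."""
    calls.append(path)
    return ours_id

# The question's case: the merge base is HEAD, so base and ours share a blob ID.
print(merge_file("specific", "blob-master", "blob-master", "blob-A", keep_ours))  # blob-A
print(calls)  # []
```

Only when base, ours and theirs all differ (say "blob-base", "blob-master", "blob-A") does keep_ours run and return "blob-master".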

When the one head that's different is "theirs", and the custom merge direction is "keep ours", it's the wrong answer. That's true no matter what commit is chosen as the merge base, but when the merge base is HEAD—is our commit—then all the hashes, in the diff from base to ours, are the same, and the result is always "their version of the file".

That, in fact, is why a fast forward is possible in the first place: the final merged tree is always just their tree. Git, in effect, ignores all the custom directions in .gitattributes. That remains true even if you force a real merge rather than a fast-forward-non-merge "merge".

Perhaps Git should check for custom merge drivers or merge=ours directives, and disable this short-cut, at least for real (non-fast-forward) merges. But it doesn't, and therefore you will have this problem. You will also have this problem for other cases, where there's a real merge to be done, but the file is modified only in the base-to-theirs comparison.

One last sidebar: don't do this for configuration files

People often want to use this merge=ours to make sure that configuration files stored on a branch are kept the way they are on that branch. This is nearly always the wrong overall strategy: configuration files should instead be omitted entirely from version control, or at least from the version control of this particular repository. Instead of committing, e.g., config.ini or config.php, commit a config.ini.sample or config.default.php or some such. Copy this sample to the "real" configuration, or read it as a fallback if the "real" configuration is missing or incomplete.

This gives you a way to version configurations (sample and/or default ones) in general, without versioning the specific run-time configuration of someone using this repository as the place from which they run the software / app itself. Should the user wish to version-control her particular configuration, she can store that in a separate repository, and replace config.ini with (e.g.) a symbolic link to ../myconfigs/fooapp.ini, which is where she has her configurations versioned.
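As one possible way to "read it as a fallback", here is a minimal Python sketch using the standard configparser module. The file names config.ini.sample and config.ini come from the answer; load_config and its app_dir argument are hypothetical. ConfigParser.read() silently skips files that do not exist, and values from files read later replace values from earlier ones, so the committed sample supplies defaults and the user's uncommitted file overrides them.

```python
import configparser
from pathlib import Path

def load_config(app_dir):
    """Read the committed sample first, then let the user's own config.ini override it."""
    config = configparser.ConfigParser()
    app_dir = Path(app_dir)
    # Missing files are skipped; later files win for keys they define.
    config.read([app_dir / "config.ini.sample", app_dir / "config.ini"])
    return config
```

With this layout, config.ini would typically be listed in .gitignore, so there is never a versioned copy for a merge to overwrite.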

(A similar trick is to get the configuration from $HOME/.gitconfig or /usr/local/etc/fooapp.ini. That is, store the configuration separately in the first place. Again, if you want or need some sort of default configuration, you can keep that versioned with the software, but the user's own configuration is separate, and not under your own version control at all.)