Line Endings: Git merge creates duplicates without

Posted 2019-04-10 23:06

Question:

Git Auto Merge Issue:

The same code was committed to the same file on two different branches, but on one of the branches the code has an extra CRLF/LF at the start. When the branches are merged, Git auto-merges the file and creates duplicates without reporting any conflict. Please advise.

The image below shows all the symbols in the text file. Note: Branch A does not have a line feed (line 245). The automated merge below creates duplicates without showing a conflict.

Answer 1:

(Note: line endings are not the culprit here.)

This case is interesting. The problem seems to be that the two sets of added lines are, in git's algorithm anyway, added at two different places. Git has no understanding of the code and simply decides that since the two (somewhat different) changes add lines to different sections of the original, it is OK to just add both of those differing changes.

One of the lessons you should take away from this is that git is not smart. It simply follows a set of simple rules that usually work; just because it thinks it merged two sets of diffs successfully does not mean that the result is correct. This is yet another reason why automated testing is a good idea.
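As a rough illustration of the kind of automated check meant here, the sketch below fails when a declaration that should be unique appears more than once in a file. The helper name check_unique and the sample file are made up for this example; they are not part of git or of the original project.

```shell
#!/bin/sh
# Hypothetical post-merge check: fail if a declaration that should be
# unique shows up more than once in a file.
# check_unique is an illustrative helper, not a git command.
check_unique() {
    file=$1
    needle=$2
    # grep -c prints the number of matching lines; it exits non-zero
    # when that number is 0, so "|| true" keeps the count usable.
    count=$(grep -c -F -- "$needle" "$file" || true)
    if [ "$count" -gt 1 ]; then
        echo "duplicate: '$needle' appears on $count lines of $file" 1>&2
        return 1
    fi
    return 0
}

# Demo on a throwaway file that mimics the bad merge result.
sample=$(mktemp)
cat > "$sample" << 'END'
        // ADSO-3530
        public Decimal? tradeItemDryPhyWetQty

        // ADSO-3530
        public Decimal? tradeItemDryPhyWetQty
END

if check_unique "$sample" 'public Decimal? tradeItemDryPhyWetQty'; then
    result=unique
else
    result=duplicate
fi
echo "check result: $result"
rm -f "$sample"
```

In a real project a compiler or test suite would normally catch this kind of duplicate; the point is only that something other than git has to look at the merged result.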

Note: complete steps, via a script, to re-create the problem appear at the bottom of this answer.

Let's take a look at the state of BranchA and BranchB just before we ask git to merge. The crucial part is what we get with git diff when we compare the merge base (the tip of branch common, in this particular setup) with the two actual tip commits. To see these diffs, I use the three-dot form of git diff:

$ git diff BranchB...BranchA
diff --git a/demo-file b/demo-file
index 1d822d4..d222dc7 100644
--- a/demo-file
+++ b/demo-file
@@ -11,6 +11,8 @@ Note that CR-LF is not an issue
             get { return valueForKey<int?>("realPortNum") ; }
             set { takeValueForKey("realPortNum", value); }
         }
+        // ADSO-3530
+        public Decimal? tradeItemDryPhyWetQty

         public string riskMktCode
         {
$ git diff BranchA...BranchB
diff --git a/demo-file b/demo-file
index 1d822d4..52802fa 100644
--- a/demo-file
+++ b/demo-file
@@ -12,6 +12,9 @@ Note that CR-LF is not an issue
             set { takeValueForKey("realPortNum", value); }
         }

+        //ADSO-3530
+        public Decimal? tradeItemDryPhyWetQty
+
         public string riskMktCode
         {
             get { return valueForKey<string>("riskMktCode") ; }
$ 

The first command, git diff BranchB...BranchA, tells git:

  1. Find the commits identified by BranchB and BranchA. (These are the two tip-most commits on BranchB and BranchA respectively. In this case, they are also the only commits on those two branches that are not already on the common branch, since we made only the one commit exclusively on BranchA and one commit exclusively on BranchB. In many real-world situations, there might be 10, 20, or more commits on one branch and 2, 5, or even 50 or more commits on the other, but git just finds the two tip-most commits, for this step.)

  2. Find the merge base for these two commits. The merge base is a place where the two branches rejoin in history. In this case, the merge base is quite obvious: it's the commit at the tip of branch common, which is where the two branches BranchA and BranchB emerge as separate branches. The commit at the tip of common is on all three branches (and any other branches, such as the default master branch).

  3. Diff the merge-base against the second commit, i.e., the tip of BranchA.

The second command, git diff BranchA...BranchB, works very similarly. The only change is that the two input commits are selected in the other order. Git finds the same merge base, but now diffs that commit against the tip commit of BranchB.
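The three steps can be checked by hand. The sketch below builds a tiny throwaway repository (the file name f, its contents, and the demo identity are invented for this example) and compares the three-dot diff with an explicit diff from the merge base to the tip of BranchA.

```shell
#!/bin/sh
# Sketch: "git diff BranchB...BranchA" versus an explicit merge-base diff.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
# Throwaway identity so that commits work on an unconfigured machine.
git config user.name "Demo User"
git config user.email "demo@example.com"

# One commit that both branches share; "common" marks it.
printf 'one\ntwo\nthree\n' > f
git add f
git commit -q -m 'common base'
git branch common

# One commit only on BranchA.
git checkout -q -b BranchA common
printf 'one\ntwo\nthree\nfrom A\n' > f
git commit -q -am 'change on A'

# One commit only on BranchB.
git checkout -q -b BranchB common
printf 'from B\none\ntwo\nthree\n' > f
git commit -q -am 'change on B'

# Steps 1 and 2: resolve both tips and find their merge base.
base=$(git merge-base BranchB BranchA)
# Step 3: diff the merge base against the second-named tip.
explicit=$(git diff "$base" BranchA)
three_dot=$(git diff BranchB...BranchA)

if [ "$three_dot" = "$explicit" ]; then
    echo "three-dot diff matches the merge-base diff"
else
    echo "the two diffs differ"
fi
```

Swapping the branch names in the three-dot form keeps the same merge base and diffs it against the tip of BranchB instead.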

Take another look at the diffs quoted above. There are two different diff results.

The first diff shows that git should modify a block beginning with context at line 11. There are three "above" lines of context (lines 11, 12, and 13, these being the get, set, and close brace lines), then we added the comment and function declaration lines, and then there are three lines of "below" context.

The second diff shows that git should add three lines of text in a block beginning at line 12 (not line 11). The three "above" lines of context are the set, close-brace, and blank lines, and those lines are not themselves going to be changed by the first diff (though they will have some text inserted between them). Then we added three lines (comment, function declaration, and blank line) and then we have the trailing context.

Note that git has decided that our newly added initial blank line was already present (it treats it as the existing blank line 14) and that we, instead, added a subsequent blank line, with our additions happening after line 14, not after line 13. This explains why the two additions do not conflict: as far as git is concerned, the BranchA change is "add two lines at line 11+3" and the BranchB change is "add three lines at old-line-12+3 (which is now line 14+3 after adding two lines)". The unchanged blank line sits between the two insertion points, so the two changes do not overlap.

The result is that git adds both blocks of text, even though they are very similar.
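The same effect can be seen without a repository by using git merge-file, which does a three-way merge of three plain files. This is a reduced sketch, not the full reproduction: the file names base, a, b, and merged are arbitrary, and the contents are just the lines around the insertion point.

```shell
#!/bin/sh
# Sketch: reproduce the duplicate with a three-way merge of plain files.
# The names base, a, b, and merged are arbitrary.
work=$(mktemp -d)
cd "$work" || exit 1

# Merge base: the lines around the insertion point.
cat > base << 'END'
            get { return valueForKey<int?>("realPortNum") ; }
            set { takeValueForKey("realPortNum", value); }
        }

        public string riskMktCode
        {
END

# Side A: two lines added directly after the closing brace.
cat > a << 'END'
            get { return valueForKey<int?>("realPortNum") ; }
            set { takeValueForKey("realPortNum", value); }
        }
        // ADSO-3530
        public Decimal? tradeItemDryPhyWetQty

        public string riskMktCode
        {
END

# Side B: a blank line first, then the same two lines.
cat > b << 'END'
            get { return valueForKey<int?>("realPortNum") ; }
            set { takeValueForKey("realPortNum", value); }
        }

        // ADSO-3530
        public Decimal? tradeItemDryPhyWetQty

        public string riskMktCode
        {
END

# git merge-file -p <current> <base> <other> prints the merged result;
# it exits 0 when there are no conflicts.
if git merge-file -p a base b > merged; then
    status=clean
else
    status=conflict
fi

# Count how many times the declaration ended up in the result.
copies=$(grep -c 'tradeItemDryPhyWetQty' merged || true)
echo "merge status: $status"
echo "copies of the declaration: $copies"
```

If your git behaves like the one described in this answer, this should report a clean merge with two copies of the declaration, separated by the blank line that both sides left untouched.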

A script to reproduce the problem is below.

#! /bin/sh

tdir=/tmp/mergetest

die() {
    # "$*" joins all arguments into a single message string.
    echo "fatal: $*" 1>&2
    exit 1
}

set -e
[ -d "$tdir" ] && die "$tdir: already exists -- hint: rm -rf $tdir"

mkdir "$tdir"
cd "$tdir"
git init
echo "This repository is for demonstrating git merge." > README
git add README
git commit -m initial

# Create common file on common branch.
git checkout -b common
cat << END > demo-file
This is a demo file,
meant to illustrate how git merge works,
why git merge is not very bright,
and why it is therefore necessary to INSPECT THE MERGE RESULTS
(automated tests are good).
The next few lines are not line 241 through 250 here,
but do match the original sample input.
Note that CR-LF is not an issue
(this host Unix-ish system uses simple newlines).
        {
            get { return valueForKey<int?>("realPortNum") ; }
            set { takeValueForKey("realPortNum", value); }
        }

        public string riskMktCode
        {
            get { return valueForKey<string>("riskMktCode") ; }
            set { 
                takeValueForKey("riskMktCode", value);
Finally, we have some
trailing text so as to provide
plenty of context area for git,
when it is doing its comparisons of the
merge-base version of the file
against the two branch versions.
END
git add demo-file
git commit -m 'create common base'

# Set variable to two-line form that we will add to both files.
samepart="        // ADSO-3530
        public Decimal? tradeItemDryPhyWetQty"

# Make version on BranchA with two added lines.
git checkout -b BranchA
ed - demo-file << END
13a
$samepart
.
w
q
END
git add demo-file
git commit -m 'branch A: add declaration for tradeItemDryPhyWetQty'

# Make alternate version on BranchB with three added lines;
# note that we start from the common base.
git checkout -b BranchB common
ed - demo-file << END
13a

$samepart
.
w
q
END
git add demo-file
git commit -m 'branch B: add declaration for tradeItemDryPhyWetQty'

# Show which commit is the merge-base.
mergebase=$(git merge-base BranchA BranchB)
echo "The merge base is commit $(git rev-parse --short $mergebase)".

# View diffs.  Could use "git diff $mergebase BranchA" here.
echo "Here is what we added in BranchA, vs the common base:"
git diff BranchB...BranchA

# Could use "git diff $mergebase BranchB" here.
echo "And, here is what we added in BranchB, vs the common base:"
git diff BranchA...BranchB

echo "Now we merge the two (on BranchA in this case)"
git checkout BranchA
git merge --no-edit BranchB

echo "Comparing the result to the merge base, we get:"
git diff $mergebase HEAD