Running git clone git@github.com:erocarrera/pydot (35a8d858b) on Debian with git config core.autocrlf input shows:
modified: test/graphs/b545.dot
modified: test/graphs/b993.dot
modified: test/graphs/cairo.dot
These files have CRLF line endings, for example:
$ file test/graphs/cairo.dot
test/graphs/cairo.dot: UTF-8 Unicode text, with CRLF line terminators
The .gitattributes file contains:
*.py eol=lf
*.dot eol=lf
*.txt eol=lf
*.md eol=lf
*.yml eol=lf
*.png binary
*.ps binary
Changing core.autocrlf has no effect on the status of these files. Deleting the .gitattributes has no effect either. Changing these files with dos2unix does not change their status (as expected), and changing them back with unix2dos shows no difference with diff versus an older copy. File permissions look unchanged with ls -lsa. Also, the files have uniform line endings as far as I can tell with vi -b (so it shouldn't be the case that unix2dos or dos2unix converts from mixed to uniform line endings, which could have explained this strange behavior). I'm using git version 2.11.0.
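The check for uniform line endings can also be done by counting terminators directly. The following is a minimal sketch in Python; the function name line_ending_counts and the sample bytes are made up for illustration and are not taken from the pydot repository.

```python
def line_ending_counts(data: bytes) -> dict:
    """Count CRLF, bare LF and bare CR terminators in raw file contents."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}

# Hypothetical sample; for a real file, read it in binary mode:
#   data = open("test/graphs/cairo.dot", "rb").read()
sample = b"digraph g {\r\n}\r\n"
print(line_ending_counts(sample))  # {'crlf': 2, 'lf': 0, 'cr': 0}
```

A file with uniform CRLF endings has zero counts for the other two kinds.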
What does git think has changed?
Somewhat relevant:
- Git status shows files as changed even though contents are the same
- Files showing as modified directly after git clone
- Cloning a git repo, and it already has a dirty working directory... Whaaaaa?
I didn't find an answer that explains this behavior during my search over several discussions. This issue arose from pydot #163.
In more detail:
git status
On branch master
Your branch is up-to-date with 'origin/master'.
Changes not staged for commit:
(use "git add <file>..." to update what will be committed)
(use "git checkout -- <file>..." to discard changes in working directory)
modified: test/graphs/b545.dot
modified: test/graphs/b993.dot
modified: test/graphs/cairo.dot
no changes added to commit (use "git add" and/or "git commit -a")
git diff test/graphs/b993.dot
warning: CRLF will be replaced by LF in test/graphs/b993.dot.
The file will have its original line endings in your working directory.
diff --git a/test/graphs/b993.dot b/test/graphs/b993.dot
index e87e112..8aa0872 100644
--- a/test/graphs/b993.dot
+++ b/test/graphs/b993.dot
@@ -1,10 +1,10 @@
-diGraph G{
-graph [charset="utf8"]
-1[label="Umlaut"];
-2[label="ü"];
-3[label="ä"];
-4[label="ö"];
-1->2;
-1->3;
-1->4;
-}
+diGraph G{
+graph [charset="utf8"]
+1[label="Umlaut"];
+2[label="ü"];
+3[label="ä"];
+4[label="ö"];
+1->2;
+1->3;
+1->4;
+}
UPDATE:
Out of curiosity, I committed one of these files, dumped git log -1 -p > diff, and vi -b diff shows that git normalized the line endings (CRLF to LF):
commit 2021d6adc1bc8978fa08d729b3f4d565f9b89651
Author:
Date:

DRAFT: experiment to see what changed

diff --git a/test/graphs/b545.dot b/test/graphs/b545.dot
index ebd3e8f..2c33f91 100644
--- a/test/graphs/b545.dot
+++ b/test/graphs/b545.dot
@@ -1,9 +1,9 @@
-digraph g {^M
-^M
-"N11" ^M
- [^M
- shape = record^M
- label = "<p0>WFSt|1571 as Ref: 1338 D"^M
-]^M
-N11ne -> N11:p0^M
-}^M
+digraph g {
+
+"N11"
+ [
+ shape = record
+ label = "<p0>WFSt|1571 as Ref: 1338 D"
+]
+N11ne -> N11:p0
+}
Other weird observations: git checkout of any of these files after cloning does not have any effect. After the above commit, the file b545.dot continued to have CRLF line endings in the working directory. Applying dos2unix followed by unix2dos didn't make git think that it has changed (whereas before the commit it did, probably because the committed file had CRLF line endings).
Thanks to @torek for the explanation (which agrees with my conjecture).
In summary, the asymmetric git configuration leads to commit(checkout(Index)) not being the identity mapping. With CRLF in the index, this particular configuration checked out CRLF, but after the input transformations in effect (eol=lf), git would commit LF instead of CRLF.
The root cause of this confusion was comparing the:
This doesn't show whether the file has changed. What one should compare is what git will commit after applying the input transformations with what is already committed. Clearly, if those two items differ, then the file has changed.
Following this reasoning, one could declare the repository "unstable", in that it regards itself as modified in the absence of interaction with the world. This supports avoiding this state by changing the committed files to LF, or changing the .gitattributes (I prefer committing LF).
In this situation, git would commit LF for both LF and CRLF in the working directory, so dos2unix and unix2dos would have no effect on the commit outcome, and thus none on the file's status.
This happens precisely because those files are committed with CRLF endings, yet the .gitattributes file says to commit them with LF-only endings.
Git can and will do CRLF-vs-LF-only conversion in two places:
- During extraction from index to work-tree. A file stored in a commit or in the index is always assumed to be in a "clean" state, but when extracting that file from the index to the work-tree, Git should apply any conversions directed by .gitattributes in the form of "change LF-only to CRLF", for instance, and also in the form of what Git calls smudge filters.
- During the copy of a file from work-tree back to index. A file stored in the work-tree is in the "smudged" state, so at this point, Git should apply any "cleaning" conversions: for instance, changing CR-LF to LF-only, and applying clean filters.
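The two conversion points can be sketched as a pair of functions. This is a toy model in Python, not Git's actual implementation: the names smudge, clean and is_modified are hypothetical, the sample bytes are made up, and only the eol=lf setting from the question's .gitattributes is modeled. It illustrates why commit(checkout(Index)) is not the identity when the index holds CRLF, and why dos2unix and unix2dos leave the status unchanged.

```python
# Toy model of end-of-line handling for a path with eol=lf.
# Function names are illustrative; this is not Git's real code.

def smudge(index_blob: bytes) -> bytes:
    """Index -> work-tree: with eol=lf no conversion is applied."""
    return index_blob

def clean(worktree_bytes: bytes) -> bytes:
    """Work-tree -> index: CRLF is normalized to LF."""
    return worktree_bytes.replace(b"\r\n", b"\n")

def is_modified(index_blob: bytes, worktree_bytes: bytes) -> bool:
    """Compare what would be committed with what is already in the index."""
    return clean(worktree_bytes) != index_blob

crlf_blob = b"digraph g {\r\n}\r\n"  # content committed with CRLF endings
lf_blob = b"digraph g {\n}\n"        # same content committed with LF endings

worktree = smudge(crlf_blob)             # a fresh checkout keeps the CRLF
print(clean(worktree) == crlf_blob)      # False: the round trip is not the identity
print(is_modified(crlf_blob, worktree))  # True: "modified" right after clone

# Running dos2unix on the work-tree copy does not change the outcome.
print(is_modified(crlf_blob, worktree.replace(b"\r\n", b"\n")))  # True

# Once LF is committed, both work-tree variants compare as unchanged.
print(is_modified(lf_blob, crlf_blob))  # False
print(is_modified(lf_blob, lf_blob))    # False
```

In this model the status depends only on clean(work-tree) versus the index, which is the comparison the summary above recommends.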
Note that there are two points at which these conversions can occur. This does not mean that they will occur at both points, just that these are the two possible places. As the .gitattributes documentation notes, the actual conversions are:
- eol=lf: none on index -> work-tree; CR-LF to LF-only on work-tree -> index
- eol=crlf: LF-only to CR-LF on index -> work-tree; CR-LF to LF-only on work-tree -> index
Now, a file that's actually in the repository, stored in a commit, is purely read-only. It can never change inside that commit. More precisely, the commit identifies (by hash ID) a tree that identifies (by hash ID) a blob that has whatever contents it has. These hash IDs are themselves cryptographic checksums of the object contents, so they are naturally all read-only: if we try to change the contents, what we get is instead a new, different object with a new, different hash ID.
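To make the point about hash IDs concrete, here is a small sketch that computes a blob's ID the way Git does for the default SHA-1 object format: the SHA-1 of the header "blob <size>", a NUL byte, and then the content. The helper name git_blob_id and the sample contents are made up for illustration; repositories using the SHA-256 object format would give different IDs.

```python
import hashlib

def git_blob_id(content: bytes) -> str:
    """Blob hash ID: SHA-1 over 'blob <size>' + NUL + content."""
    header = b"blob " + str(len(content)).encode("ascii") + b"\0"
    return hashlib.sha1(header + content).hexdigest()

crlf = b"digraph g {\r\n}\r\n"  # hypothetical content with CRLF endings
lf = b"digraph g {\n}\n"        # same text with LF endings

print(git_blob_id(crlf))
print(git_blob_id(lf))
print(git_blob_id(crlf) != git_blob_id(lf))  # True: different bytes, different object
```

The result for a given file should match what git hash-object prints for it when no conversion is in effect, so stripping the carriage returns necessarily yields a different blob.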
Because git checkout actually works by copying the raw hash IDs from the commit's tree(s) to the index, the versions of files stored in the index are necessarily identical to those stored in the commit.
Hence, if somehow (regardless of the how) the committed files are in a form that disagrees with what .gitattributes directs Git to do, the files will become "dirty" in the work-tree regardless of the fact that you haven't done anything to them! If you were to git add the three files in question, that would copy them from work-tree to index, and hence delete the carriage returns from their line endings. Hence they are, in git status terms, modified but not yet staged for commit.
Stripping out the carriage returns in the work-tree versions leaves them in the same state: they're modified with respect to what's in the index, because git add will now leave their LF-only line endings unchanged, producing new files that differ from those in the index.
A more interesting question is: How did they get into the commit(s) in the wrong state? This is not something we can answer: only those who made those commits can produce that answer. We can only speculate. One way to achieve this is to add and commit the files without a .gitattributes in effect, then to put the .gitattributes into effect without git add-ing the files again. This way, the CR-LF endings get into someone's index and hence get into that user's commits, even though the .gitattributes file now says (but did not earlier say) that any new git add should strip away the carriage returns.
It should, but only after cloning again:
That would deactivate core.autocrlf globally, but this is just for testing here.