What is the difference between `* text=auto` and

Posted 2020-06-04 13:02

Question:

I have read the .gitattributes documentation again and again, but I cannot find a clear answer on the difference between these two:

* text=auto

* text eol=lf

Also, is text=auto intended only for use with *, or can it also be used with specific extensions? In that case, what is the difference?

*.txt text=auto

*.txt text eol=lf

Answer 1:

TL;DR

The eol=lf setting overrides any text setting, and since you have chosen to apply this to every path, only the eol=lf setting will matter, if you use that.

Full explanation

Let's start with this and work outwards:

Also, is text=auto intended only for use with *, or can it also be used with specific extensions?

Patterns may include extensions. The text=auto part is an attribute setting, and the patterns select which attributes to apply to which file(s).

How Git reads a .gitattributes file

Each line in .gitattributes matches, or does not match, some path name such as dir1/dir2/file.ext or README.md or whatever. As the gitattributes documentation says:

Each line in gitattributes file is of form:

pattern attr1 attr2 ...

That is, a pattern followed by an attributes list, separated by whitespaces. Leading and trailing whitespaces are ignored. Lines that begin with # are ignored. Patterns that begin with a double quote are quoted in C style. When the pattern matches the path in question, the attributes listed on the line are given to the path.

Hence, * is the pattern. These patterns are the same as those in .gitignore files, except that negative patterns are disallowed. Thus, you can use patterns like *.txt and *.jpg to match file name extensions, or patterns like dir1/* to match files within a specific directory. There is another way to do the latter: as with .gitignore files, you can have a .gitattributes file per directory, in which case it applies to files in that directory and its subdirectories, but not to paths higher in the tree.

Now, for text vs text=auto, and for eol=lf or not, we find the following:

Each attribute can be in one of these states for a given path:

  • Set

    The path has the attribute with special value "true"; this is specified by listing only the name of the attribute in the attribute list.

  • Unset [details snipped, but see below]

  • Set to a value

    The path has the attribute with specified string value; this is specified by listing the name of the attribute followed by an equal sign = and its value in the attribute list.

  • Unspecified

    No pattern matches the path, and nothing says if the path has or does not have the attribute, the attribute for the path is said to be Unspecified.

(This last one's wording is particularly poor, in my opinion. It really means "of all patterns matching the path, none said anything about this attribute.")
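The states described above can be modeled in a few lines of Python. This is only a sketch of the documented line format, not Git's own parser: the function name parse_line and the use of True, False, and plain strings to stand for Set, Unset, and Set to a value are choices made for this illustration, and C-style quoted patterns are not handled. Unspecified is simply an attribute name that never appears in the result.

```python
def parse_line(line):
    """Split one .gitattributes-style line into (pattern, {attribute: state}).

    States in this sketch (hypothetical representation, not Git's):
      True  -> Set             (bare name, e.g. "text")
      False -> Unset           (leading dash, e.g. "-text")
      str   -> Set to a value  (name=value, e.g. "eol=lf")
    An attribute missing from the dict is Unspecified.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None  # blank lines and comment lines carry no attributes
    pattern, *tokens = stripped.split()
    attrs = {}
    for token in tokens:
        if token.startswith("-"):
            attrs[token[1:]] = False
        elif "=" in token:
            name, value = token.split("=", 1)
            attrs[name] = value
        else:
            attrs[token] = True
    return pattern, attrs


print(parse_line("* text=auto"))      # ('*', {'text': 'auto'})
print(parse_line("* text eol=lf"))    # ('*', {'text': True, 'eol': 'lf'})
print(parse_line("* -text"))          # ('*', {'text': False})
print(parse_line("*.txt text=false")) # ('*.txt', {'text': 'false'})
```

Note that text=false comes out as the string 'false', not as the Unset state; only -text produces Unset.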

So for text, the attribute is set, and for text=auto, the attribute is set to a value. The value part in this case is auto. Since the pattern is *, it applies to all files.

The same logic applies to the eol=lf item. If eol=lf appears on some line, and that line's pattern matches the file in question, then the eol attribute is set to a value, and the value is lf. Since your suggested line was * text eol=lf, this would make eol set to a value, and would make text set, but not set to a value.

If you write, in a single .gitattributes file, the two-line sequence:

* text=auto
* text eol=lf

the second line's text overrides the first one's, so that text is set (but not to a value) and eol is set to a value, with the value being lf. Both lines matched, and the second line overrode the first.

If you reverse the two lines:

* text eol=lf
* text=auto

then again both lines match but now the second line only overrides the text setting, so now you have text set to auto and eol set to lf.
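To make the ordering rule concrete, here is a small Python model of "a later matching line overrides an earlier one, attribute by attribute". It is a sketch under stated assumptions: resolve is a hypothetical helper, and pattern matching is approximated with fnmatch (slash-free patterns are compared against the file name only), which is looser than Git's real .gitignore-style matching. On a real repository, git check-attr reports the attributes Git itself resolves for a path.

```python
from fnmatch import fnmatch
from posixpath import basename


def resolve(lines, path):
    """Return {attribute: state} for path; later matching lines win per attribute."""
    attrs = {}
    for line in lines:
        parts = line.split()
        if not parts or parts[0].startswith("#"):
            continue
        pattern, tokens = parts[0], parts[1:]
        # Simplification: a pattern without "/" is matched against the file name.
        target = path if "/" in pattern else basename(path)
        if not fnmatch(target, pattern):
            continue
        for token in tokens:
            if token.startswith("-"):
                attrs[token[1:]] = False          # Unset
            elif "=" in token:
                name, value = token.split("=", 1)
                attrs[name] = value               # Set to a value
            else:
                attrs[token] = True               # Set
    return attrs


print(resolve(["* text=auto", "* text eol=lf"], "README.md"))
# {'text': True, 'eol': 'lf'}
print(resolve(["* text eol=lf", "* text=auto"], "README.md"))
# {'text': 'auto', 'eol': 'lf'}
```

In the second call only text is overridden, so eol keeps the value lf from the first line.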

How the text attribute applies to files

The very next section of the gitattributes documentation says:

This attribute [text] enables and controls end-of-line normalization. ... [If it is]

  • Set ... enables end-of-line normalization and marks the path as a text file ...

  • Unset ... tells Git not to attempt any end-of-line conversion upon checkin or checkout ...

  • Set to string value "auto" ... If Git decides that the content is text ...

  • Unspecified ... Git uses the core.autocrlf configuration variable ...

(which means you have to go chase down the git config documentation to find out what core.autocrlf does, if you leave text unspecified).

You have chosen to either set it for every file, or set it to auto for every file. The former means "do conversion for every file". The latter, the auto setting, means: "Hey, Git, please decide for me whether the file is text or not. If you decide that it is text, do the conversion."

How eol=lf applies to files

Just below the description for the text setting is this description for the eol setting:

This attribute sets a specific line-ending style to be used in the working directory. It enables end-of-line conversion without any content checks, effectively setting the text attribute.

  • Set to string value "crlf" ... [snipped because you set lf]

  • Set to string value "lf"

    This setting forces Git to normalize line endings to LF on checkin and prevents conversion to CRLF when the file is checked out.

So, if you have eol=lf set for a path (and with * as the pattern, it will be set for every path), Git will treat every file as text, and do conversion from CR-LF line endings to LF-only line endings on "checkin". This is badly phrased, again: the conversion actually occurs during the git add step. Git will do nothing during checkout. This too is not perfectly phrased: the conversion, or in this case non-conversion, happens during extraction from the index to the work-tree.
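As a rough illustration of the two directions under eol=lf, the sketch below models the conversion on plain byte strings. The function names are hypothetical and this is not Git's code; it only mirrors the behavior described above: CR-LF pairs become LF on the way into the index, and nothing changes on the way back out.

```python
def to_index_with_eol_lf(worktree_bytes: bytes) -> bytes:
    """Model of the `git add` direction: each CR-LF pair becomes a single LF."""
    return worktree_bytes.replace(b"\r\n", b"\n")


def to_worktree_with_eol_lf(index_bytes: bytes) -> bytes:
    """Model of the index-to-work-tree direction: eol=lf means no conversion."""
    return index_bytes


stored = to_index_with_eol_lf(b"one\r\ntwo\r\nthree\n")
print(stored)                           # b'one\ntwo\nthree\n'
print(to_worktree_with_eol_lf(stored))  # b'one\ntwo\nthree\n'
```

Because the second step is a no-op, a file that had CR-LF endings in the work-tree before git add comes back with LF-only endings the next time it is extracted.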

If you use different patterns you get different results

Note that if you choose a pattern like *.txt, then these attributes are set only for paths that match the pattern. For other paths, these attributes remain unspecified. You should therefore look back at the documentation and see what happens when these attributes are unspecified.

You can, of course, do this:

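* -text
*.txt text eol=lf

The first line will explicitly unset text on all files, leaving eol unspecified on all files. The second line then sets text, and sets eol to the value lf, for *.txt files, overriding the unset text from the first line. Now Git will apply the eol=lf rules to all files whose name matches *.txt, and use the unspecified-eol and unset-text rules for all remaining files. The text on the second line matters: the current gitattributes documentation says that eol has effect only if text is set or set to auto, and that specifying eol implies text only when text was left unspecified. A bare *.txt eol=lf would therefore leave the explicit -text from the first line in force.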

This special -text syntax is the stuff I snipped above. Using text=false does not unset text: it leaves text set to the string value false. This has the same effect as leaving text unspecified (not specifically unset). Using -text gives it the special unset setting.

The difference between an unset text and an unspecified text is that when text is unspecified, Git could attempt to guess—based on the core.* settings like core.autocrlf—whether to do conversions. When text is specifically unset, Git will not do any guessing: it will just not do any conversion at all, for that file.
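The four cases for text, plus the text=false trap, can be summarized as a small lookup. This is a sketch of the rules as quoted in this answer, not of Git's implementation: text_rule is a hypothetical name, the attribute dictionary uses True for Set, False for Unset, a string for Set to a value, and a missing key for Unspecified, and any eol attribute is deliberately ignored.

```python
def text_rule(attrs):
    """Describe which end-of-line rule applies, looking only at the text attribute."""
    if "text" not in attrs:
        return "unspecified: defer to core.autocrlf"
    state = attrs["text"]
    if state is True:
        return "set: always convert"
    if state is False:
        return "unset: never convert"
    if state == "auto":
        return "auto: convert only if Git decides the content is text"
    # Any other string, such as "false", is not a recognized value here, so
    # (per the answer above) it behaves like an unspecified attribute.
    return "unspecified: defer to core.autocrlf"


print(text_rule({"text": True}))     # set: always convert
print(text_rule({"text": False}))    # unset: never convert
print(text_rule({"text": "auto"}))   # auto: convert only if Git decides the content is text
print(text_rule({"text": "false"}))  # unspecified: defer to core.autocrlf
print(text_rule({}))                 # unspecified: defer to core.autocrlf
```

The last two calls give the same answer, which is the point of the warning about text=false: only -text, modeled here as False, turns conversion off outright.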