Paths not excluded from GitHub language statistics

Posted 2019-05-15 12:31

Question:

I've already read the related SO threads here and here, as well as the GitHub Linguist manual on overrides, but I can't seem to exclude some top-level directories from the language statistics.

In its latest version, this repo shows a predominance of HTML code. Clicking through to the HTML details lists two HTML files:

  • packages/NUnit.2.5.7.10213/NUnitFitTests.html
    Last indexed on 30 Dec 2016.

  • packages/NUnit.2.5.7.10213/Tools/NUnitFitTests.html
    Last indexed on 30 Dec 2016.

But those should fall under the paths excluded in .gitattributes:

.nuget/* linguist-vendored
libs/* linguist-vendored
NUnit.Runners.2.6.4/* linguist-vendored
packages/* linguist-vendored             <--- this one in particular
RubyInstallationFiles/* linguist-vendored

Yet on the same details page, the ranking at the bottom left clearly shows HTML lower down, with C# at the top.

What am I doing wrong?

Side question: among the many changes, I also removed the comments from the .gitattributes file, as I could not find any reference saying whether they are allowed. Does anyone know if you can have comments in there, and in which format? Thanks.

Answer 1:

You can check the attributes with git-check-attr and verify they're set the way you think they are.

$ git check-attr --all -- packages/NUnit.2.5.7.10213/NUnitFitTests.html
$

It seems the file has no attributes. The problem appears to be that packages/* is not recursive: it matches the directory directly under packages/, but not the files inside it.

$ git check-attr --all -- packages/NUnit.2.5.7.10213/
packages/NUnit.2.5.7.10213/: linguist-vendored: set

So what are the rules for patterns? Same as for gitignore.

The rules how the pattern matches paths are the same as in .gitignore files; see gitignore(5). Unlike .gitignore, negative patterns are forbidden.

What you're looking for is /**.

A trailing "/**" matches everything inside. For example, "abc/**" matches all files inside directory "abc", relative to the location of the .gitignore file, with infinite depth.

Putting that fix in...

$ cat .gitattributes 
.nuget/** linguist-vendored
libs/** linguist-vendored
NUnit.Runners.2.6.4/** linguist-vendored
packages/** linguist-vendored
RubyInstallationFiles/** linguist-vendored

And now we're good.

$ git check-attr --all packages/NUnit.2.5.7.10213/NUnitFitTests.html
packages/NUnit.2.5.7.10213/NUnitFitTests.html: linguist-vendored: set
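To reproduce the difference without touching the real repository, here is a minimal sketch that assumes only that git is installed. It creates a throwaway repository in a temporary directory and queries the same kind of path under both patterns; the variable names are illustrative, and git check-attr does not need the file to exist.

```shell
#!/bin/sh
set -e

# Throwaway repository so the real one is left alone.
repo=$(mktemp -d)
cd "$repo"
git init -q .

file=packages/NUnit.2.5.7.10213/Tools/NUnitFitTests.html

# Single asterisk: only matches entries directly under packages/.
printf '%s\n' 'packages/* linguist-vendored' > .gitattributes
single=$(git check-attr linguist-vendored -- "$file")

# Trailing /**: matches everything inside packages/, at any depth.
printf '%s\n' 'packages/** linguist-vendored' > .gitattributes
double=$(git check-attr linguist-vendored -- "$file")

echo "$single"   # ...NUnitFitTests.html: linguist-vendored: unspecified
echo "$double"   # ...NUnitFitTests.html: linguist-vendored: set
```

Asking for one named attribute instead of --all makes the negative case visible: git prints "unspecified" rather than nothing.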

That also answers your question about comments...

A line starting with # serves as a comment. Put a backslash ("\") in front of the first hash for patterns that begin with a hash.
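As a quick check of the comment rule, the sketch below (again in a throwaway repository, assuming git is installed) writes a .gitattributes with comment lines on their own and confirms the pattern underneath still applies. The comment wording and the path packages/a/b.html are made up for illustration.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Comment lines start with '#' in the first column.
cat > .gitattributes <<'EOF'
# Third-party packages restored by NuGet
packages/** linguist-vendored
# Bundled helper libraries
libs/** linguist-vendored
EOF

vendored=$(git check-attr linguist-vendored -- packages/a/b.html)
plain=$(git check-attr linguist-vendored -- src/Program.cs)

echo "$vendored"   # packages/a/b.html: linguist-vendored: set
echo "$plain"      # src/Program.cs: linguist-vendored: unspecified
```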



Answer 2:

Several things can be happening:

Language statistics weren't updated yet. The language detection job runs as a low-priority background job. Language statistics may take some time to update (up to a day).

You've missed some HTML file(s). Search results showing files for each language are cached and not always up-to-date. Therefore, there may be some HTML files in your repository that you forgot to vendor.


How to debug? Your best option is to run Linguist locally. If you have a working Ruby environment, this is as simple as:

gem install github-linguist
linguist /path/to/your/repository --breakdown

This command will output Linguist results with the files detected for each language and the computed percentages.


Note: Your .gitattributes syntax is correct; there is no need to double the asterisks. Double asterisks are not needed at the end of a path for Linguist. However, you may need them to match several directories at the beginning of a wildcarded path, e.g.:

**/NSpec/Domain/Formatters/Templates/*
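To see how git itself resolves a leading **/ pattern like this one, a small sketch along the following lines can help. It assumes git is installed, and the src/ prefix and report.html file name are hypothetical. It shows only git's attribute matching, not what Linguist reports.

```shell
#!/bin/sh
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .

# Leading **/ lets the rest of the pattern match at any directory depth.
printf '%s\n' '**/NSpec/Domain/Formatters/Templates/* linguist-vendored' > .gitattributes

nested=$(git check-attr linguist-vendored -- src/NSpec/Domain/Formatters/Templates/report.html)
other=$(git check-attr linguist-vendored -- src/NSpec/Domain/report.html)

echo "$nested"   # src/NSpec/Domain/Formatters/Templates/report.html: linguist-vendored: set
echo "$other"    # src/NSpec/Domain/report.html: linguist-vendored: unspecified
```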