How to make git diff ignore comments

Posted 2019-03-09 19:36

Question:

I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top of the file, and since this commit introduces a new version, every file has changed.

I don't care about the changed comments, so I would like to have git diff ignore all lines that match ^\s*\*.*$, as these are all comments (part of /* */).

I cannot find any way to tell git diff to ignore specific lines.

I have already tried setting a textconv attribute so that git passes the files through sed before diffing them, letting sed strip out the offending lines. The problem with this is that git diff --name-status does not actually diff the files; it just compares the hashes, and of course all the hashes have changed.

Is there any way to do this?

Answer 1:

git diff -G <regex>

And specify a regex that does NOT match your version number line.
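
As a rough sketch of how this could look for the question's case, the script below builds a throwaway repository in which two files get a new version comment but only one gets a code change. The file names, the commit messages and the regex itself are my own assumptions, not something from the question; the regex is meant to match any changed line whose first non-blank character is not a *.

```shell
#!/bin/sh
set -e
# Throwaway repository; names and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf '/*\n * Version 1\n */\nint a = 1;\n' > a.c
printf '/*\n * Version 1\n */\nint b = 1;\n' > b.c
git add a.c b.c
git commit -q -m "version 1"

# Both files get the new version comment, only b.c gets a code change.
printf '/*\n * Version 2\n */\nint a = 1;\n' > a.c
printf '/*\n * Version 2\n */\nint b = 2;\n' > b.c
git commit -q -am "version 2"

# List only files with a changed line whose first non-blank character is not '*'.
changed=$(git diff --name-only -G'^[[:space:]]*[^[:space:]*]' HEAD~1 HEAD)
echo "$changed"
```

With --name-only the output is just the list of files, which is what the question asked for.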



Answer 2:

Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.

It is basically the same solution as above, but specifically for comments that start with a * or a #, sometimes with a space before the *. It still needs to show changes to #ifdef, #include, etc.

Look-ahead and look-behind do not seem to be supported by the -G option. In my tests ? did not work either, and I have had problems with using *, too. + seems to be working well, though.

(note, tested on Git v2.7.0)

Multi-Line Comment Version

git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'
  • -w ignore whitespace
  • -G only show files whose added/removed lines match the following regex
  • (^[^\*# /]) any line that does not start with a star, a hash, a space, or a slash
  • (^#\w) any line that starts with # followed by a word character (so #ifdef and #include still count)
  • (^\s+[^\*#/]) any line that starts with some whitespace followed by a character that is not a comment character
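
To see which changed lines the three alternatives treat as "real" changes, here is a small sketch that runs the same regex through grep -E on a few sample lines. It assumes GNU grep (for \s and \w) and uses grep only as a stand-in for what -G does; the sample lines and the matches helper are made up for illustration.

```shell
#!/bin/sh
# Same regex as in the git diff command above. Assumes GNU grep for \s and \w.
regex='(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'

# Hypothetical helper: succeeds when a line would count as a real change.
matches() {
    printf '%s\n' "$1" | grep -Eq "$regex"
}

for line in 'int x = 1;' '#include <stdio.h>' '    return x;' \
            ' * Version 2' '# shell comment' '/* start'; do
    if matches "$line"; then
        echo "shown:   $line"
    else
        echo "ignored: $line"
    fi
done
```

One caveat worth checking with this helper: the last alternative does not exclude whitespace from its character class, so a comment line indented by two or more whitespace characters (for example '  * text') still matches, because the regex can stop after the first space. Adding [:space:] to that class, as in (^\s+[^\*#/[:space:]]), should close the gap.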

Basically, an svn hook currently rewrites the multi-line comment block of every file on the way in and out. Now I can diff my changes against svn without the FYI info that svn drops into the comments.

Technically this still lets Python and bash comments like #TODO show up in the diff, and a division operator that starts a new line in C++ could be ignored:

a = b
    / c;

Also the documentation on -G in git seemed pretty lacking, so the info here should help:

git diff -G<regex>

-G<regex>

Look for differences whose patch text contains added/removed lines that match <regex>.

To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>,
consider a commit with the following diff in the same file:

+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);

While git log -G"regexec\(regexp" will show this commit,
git log -S"regexec\(regexp" --pickaxe-regex will not
(because the number of occurrences of that string did not change).

See the pickaxe entry in gitdiffcore(7) for more information.

(note, tested on Git v2.7.0)

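  • -G uses POSIX extended regex (the pattern is compiled with REG_EXTENDED, see the update below).
  • In my tests ?, *, !, {, } did not behave as expected, even though ?, * and {} are part of the extended syntax.
  • Grouping with () works, and so does OR-ing groups with |.
  • Character-class escapes such as \s, \W, etc. are supported.
  • Look-ahead and look-behind are not supported.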
  • Beginning and ending line anchors ^$ work.
  • Feature has been available since Git 1.7.4.

Excluded Files v Excluded Diffs

Note that the -G option only filters which files are diffed.

Once a file is selected, its whole diff is shown, including the changed lines that the regex was meant to exclude.
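
A minimal sketch of that behaviour, using a throwaway repository (file name, contents and the '^int' regex are assumptions for illustration): the file is selected because a code line changed, and the version comment still appears in the patch.

```shell
#!/bin/sh
set -e
# Throwaway repository; names and contents are made up for illustration.
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example"
git config user.email "example@example.com"

printf '/*\n * Version 1\n */\nint a = 1;\n' > a.c
git add a.c
git commit -q -m "version 1"

# Change both the version comment and a code line in the working tree.
printf '/*\n * Version 2\n */\nint a = 2;\n' > a.c

# -G selects a.c because of the 'int' line, then prints the full patch.
patch=$(git diff -G'^int' -- a.c)
printf '%s\n' "$patch" | grep -E '^[+-][^+-]'
```

The printed lines include both the int change and the Version change, so hiding the comment lines from the patch itself needs a separate step, such as filtering the output.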

Examples

Only show file differences with at least one line that mentions foo.

git diff -G'foo'

Show files with at least one changed line that does not start with a #

git diff -G'^[^#]'

Show files that have differences mentioning FIXME or TODO

git diff -G'(FIXME)|(TODO)'

See also git log -G, git grep, git log -S, --pickaxe-regex, --pickaxe-all

UPDATE: Which regex tool is in use by the -G option?

https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=

https://github.com/git/git/blob/master/diffcore-pickaxe.c

if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;

// and in the regcomp_or_die function
regcomp(regex, needle, cflags);

http://man7.org/linux/man-pages/man3/regexec.3.html

   REG_EXTENDED
          Use POSIX Extended Regular Expression syntax when interpreting
          regex.  If not set, POSIX Basic Regular Expression syntax is
          used.

// ...

   REG_NEWLINE
          Match-any-character operators don't match a newline.

          A nonmatching list ([^...])  not containing a newline does not
          match a newline.

          Match-beginning-of-line operator (^) matches the empty string
          immediately after a newline, regardless of whether eflags, the
          execution flags of regexec(), contains REG_NOTBOL.

          Match-end-of-line operator ($) matches the empty string
          immediately before a newline, regardless of whether eflags
          contains REG_NOTEOL.
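
To make the practical difference between the two syntaxes concrete, here is a small sketch that uses grep as a stand-in (grep -E for extended, plain grep for basic). It only illustrates the syntax difference; it is not what git runs internally, and the sample words are made up.

```shell
#!/bin/sh
# grep is only a stand-in for the two POSIX syntaxes, not what git calls.
# In extended syntax '?' means "optional"; in basic syntax it is a literal '?'.
ere_hits=$(printf 'color\ncolour\n' | grep -c -E '^colou?r$')
bre_hits=$(printf 'color\ncolour\n' | grep -c '^colou?r$' || true)
echo "extended: $ere_hits, basic: $bre_hits"
```

Since the pattern given to -G is compiled with REG_EXTENDED, it should behave like the first case.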

Hope that helps everyone.



Answer 3:

I found it easiest to use git difftool to launch an external diff:

git difftool -y -x "diff -I '<regex>'"
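
For reference, a quick sketch of what diff -I does on its own, outside of git: it suppresses a change when every changed line in it matches the regex. The two files and the regex here are assumptions for illustration.

```shell
#!/bin/sh
# Two made-up files that differ only in the version comment.
dir=$(mktemp -d)
printf '/*\n * Version 1\n */\nint a = 1;\n' > "$dir/old.c"
printf '/*\n * Version 2\n */\nint a = 1;\n' > "$dir/new.c"

# -I ignores changes whose lines all match the regex (lines starting with '*').
if diff -I '^[[:space:]]*\*' "$dir/old.c" "$dir/new.c" >/dev/null; then
    result="only comment lines differ"
else
    result="code differs"
fi
echo "$result"
```

Passing the same regex in place of <regex> in the difftool command should give the same filtering per file.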


Answer 4:

Found a solution. I can use this command:

git diff --numstat --minimal <commit> <commit> | sed '/^[1-]\s\+[1-]\s\+.*/d'

This drops every file whose numstat line shows exactly one added and one deleted line (or -, as for binary files), which eliminates the files whose only change was the version number in the comments.
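
To check what the sed filter keeps, it can be fed some hand-written numstat lines. The paths below are hypothetical and the sed expression assumes GNU sed (for \s and \+).

```shell
#!/bin/sh
# Hand-written sample in --numstat format: added<TAB>deleted<TAB>path.
# The paths are hypothetical.
kept=$(printf '1\t1\tsrc/only_version.c\n5\t3\tsrc/real_change.c\n-\t-\timg/logo.png\n' |
    sed '/^[1-]\s\+[1-]\s\+.*/d')
echo "$kept"
```

Only the src/real_change.c line survives. Note that a file with a single real one-line edit and no version bump would be dropped as well.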



Answer 5:

Perhaps a bash script like this (I didn't test the code, let me know if you could make it work or not)

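#!/bin/bash
# Assumes a textconv filter that strips the comment lines is configured for these files.
git diff --name-only "$@" | while IFS= read -r FPATH ; do
    # Count added/removed lines left after textconv, skipping the ---/+++ file headers.
    LINES_COUNT=$(git diff --textconv "$@" -- "$FPATH" | grep -c -E '^[+-]([^+-]|$)')
    if [ "$LINES_COUNT" -gt 0 ] ; then
        printf '%s\t%s\n' "$LINES_COUNT" "$FPATH"
    fi
done | sort -n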


Answer 6:

Using 'grep' on the 'git diff' output:

git diff -w | grep -c -E "(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)"

the number of changed comment lines can be counted. (A)

Using 'git diff --stat' output:

git diff -w --stat

the number of all changed lines can be counted. (B)

To get Non Comment Source Line changes (NCSL) count, subtract (A) from (B).

Explanation: in the 'git diff' output (with whitespace changes ignored),

  • Look for lines that start with either '+' or '-', which marks a modified line.
  • Optional whitespace characters may follow. '\s*'
  • Then look for the comment line pattern '/*' (or) just '*' (or) '//'.
  • Since the '-c' option is given to grep, only the count is printed. Remove the '-c' option to see the comment lines themselves in the diff.
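
The subtraction can be sketched on a saved patch like this. The patch text is a made-up sample, the comment pattern is the one from above written without the redundant backslashes, and (B) is counted here with a second grep on the patch instead of reading it off --stat; all of that is an assumption of this sketch and assumes GNU grep for \s.

```shell
#!/bin/sh
# Made-up patch text standing in for the output of 'git diff -w'.
patch=$(cat <<'EOF'
--- a/abc.h
+++ b/abc.h
@@ -1,5 +1,5 @@
 /*
- * Version 1
+ * Version 2
  */
-#define FOO 7
+#define FOO 105
 // end
EOF
)

# (A) changed comment lines: '+'/'-' followed by optional whitespace and '/*', '*' or '//'.
comment_changes=$(printf '%s\n' "$patch" | grep -c -E '(^[+-]\s*/?\*)|(^[+-]\s*//)')
# (B) all changed lines, skipping the ---/+++ file headers.
all_changes=$(printf '%s\n' "$patch" | grep -c -E '^[+-]([^+-]|$)')

ncsl=$((all_changes - comment_changes))
echo "NCSL changes: $ncsl"
```

For this sample the two #define lines are what remains after the two Version lines are subtracted.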

NOTE: The comment line count can contain minor errors because of the following assumptions, so the result should be taken as a ballpark figure.

  • 1.) Source files are written in C. Makefiles and shell scripts use a different convention, '#', to mark comment lines, and if they are part of the diff set, their comment lines won't be counted.

  • 2.) Git's convention for line changes: if a line is modified, Git treats it as one line deleted and a new line inserted, so it may look like 2 lines changed when in reality one line was modified.

    In the below example, new definition of 'FOO' looks like two line change.

    $ git diff --stat -w abc.h
    ...
    -#define FOO 7
    +#define FOO 105
    ...
    1 files changed, 1 insertions(+), 1 deletions(-)
    $

  • 3.) Valid comment lines not matching the pattern (or) Valid source code lines matching the pattern can cause errors in the calculation.

    In the below example, "+ blah blah" line which doesn't start with '*' won't be detected as a comment line.

           + /*   
           +  blah blah  
           + *   
           + */
    

    In the below example, "+ *ptr" line will be counted as comment line as it starts with *, though it is a valid source code line.

            + printf("\n %p",  
            +         *ptr);