I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top of the file, and since this commit introduces a new version, every file has changed.
I don't care about the changed comments, so I would like to have git diff ignore all lines that match ^\s*\*.*$, as these are all comments (part of /* */).
I cannot find any way to tell git diff to ignore specific lines.
I have already tried setting a textconv attribute so that git passes the files through sed before diffing them, letting sed strip out the offending lines. The problem is that git diff --name-status does not actually diff the files; it just compares the hashes, and of course all the hashes have changed.
Is there any way to do this?
git diff -G <regex>
And specify a regex that does NOT match your version number line.
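As a rough sketch of how this answers the question, the script below builds a throwaway repository and lists only the files whose changes include a non-comment line. The file names, contents, and the `code_line` pattern are assumptions made up for the demonstration; adapt the pattern to your own comment style.

```shell
#!/bin/bash
# Sketch: list only the files whose changes include a non-comment line.
# The repository contents below are made up for the demonstration.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

printf '/*\n * version 1\n */\nint a = 1;\n' > a.c
printf '/*\n * version 1\n */\nint b = 1;\n' > b.c
git add a.c b.c && commit "version 1"

# Version 2 touches the comment in both files, but real code only in a.c.
printf '/*\n * version 2\n */\nint a = 2;\n' > a.c
printf '/*\n * version 2\n */\nint b = 1;\n' > b.c
git add a.c b.c && commit "version 2"

# Matches a changed line whose first non-blank character is not "*".
code_line='^[[:space:]]*[^[:space:]*]'
changed=$(git diff --name-only -G"$code_line" HEAD~1 HEAD)
echo "$changed"
```

Only a.c should be listed here, because b.c changed nothing except its version comment.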
Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.
It is basically the same solution as above, but specifically for comments that start with a * or a #, sometimes with a space before the *. It still needs to let changes to #ifdef, #include, etc. through.
Look-ahead and look-behind do not seem to be supported by the -G option, nor does ? in general, and I have had problems using *, too. + seems to work well, though.
(note, tested on Git v2.7.0)
Multi-Line Comment Version
git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'
-w: ignore whitespace
-G: only show files whose diff contains an added or removed line matching the following regex
(^[^\*# /]): any line that does not start with a star, a hash, a space, or a slash
(^#\w): any line that starts with # followed by a word character (so #ifdef and #include still count)
(^\s+[^\*#/]): any line that starts with some whitespace followed by a character that is not a comment character
The background: an svn hook currently rewrites every file on the way in and out and modifies the multi-line comment block in each one. Now I can diff my changes against svn without the FYI info that svn drops into the comments.
Technically this will allow Python and Bash comments like #TODO to be shown in the diff, and if a division operator started on a new line in C++ it could be ignored:
a = b
/ c;
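One way to sanity-check a pattern like this before handing it to -G is to run it over a few sample lines with grep -E. This is only a sketch: grep -E is a stand-in and may not behave identically to git's regex engine, and the sample lines are made up.

```shell
#!/bin/bash
# Sketch: preview which lines a -G pattern selects, using grep -E as a stand-in.
pattern='(^[^\*# /])|(^#\w)|(^\s+[^\*#/])'

# Hypothetical sample lines: a block-comment line, a preprocessor line,
# a shell-style comment, and two ordinary code lines.
matched=$(printf '%s\n' \
    ' * version 1.2' \
    '#include <stdio.h>' \
    '# generated file' \
    'int x;' \
    '    return 0;' | grep -E "$pattern")
echo "$matched"
```

The two comment lines are filtered out and the other three are kept. Note that a comment line indented by two or more spaces (for example "  * text") still matches the third group, because \s+ can stop one space short and the next space is not an excluded character.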
Also, the documentation on -G in git seemed pretty lacking, so the info here should help:
git diff -G<regex>
-G<regex>
Look for differences whose patch text contains added/removed lines that match <regex>.
To illustrate the difference between -S<regex> --pickaxe-regex and -G<regex>, consider a commit with the following diff in the same file:
+    return !regexec(regexp, two->ptr, 1, &regmatch, 0);
...
-    hit = !regexec(regexp, mf2.ptr, 1, &regmatch, 0);
While git log -G"regexec\(regexp" will show this commit, git log -S"regexec\(regexp" --pickaxe-regex will not (because the number of occurrences of that string did not change).
See the pickaxe entry in gitdiffcore(7) for more information.
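The difference can be reproduced in a throwaway repository. The sketch below uses a made-up file (grep.c) and made-up commit messages; the second commit moves the call to a different line without changing how many times the string occurs.

```shell
#!/bin/bash
# Sketch: compare git log -G with git log -S --pickaxe-regex.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .
commit() { git -c user.name=demo -c user.email=demo@example.com commit -q -m "$1"; }

printf 'hit = regexec(regexp, one);\n' > grep.c
git add grep.c && commit "add call"

# Same number of "regexec(regexp" occurrences, but on a rewritten line.
printf 'return regexec(regexp, two);\n' > grep.c
git add grep.c && commit "rewrite call"

with_G=$(git log --format=%s -G'regexec\(regexp')
with_S=$(git log --format=%s -S'regexec\(regexp' --pickaxe-regex)
printf 'with -G:\n%s\n' "$with_G"
printf 'with -S:\n%s\n' "$with_S"
```

-G reports both commits because each one adds or removes a matching line, while -S reports only the commit that changed the occurrence count.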
(note, tested on Git v2.7.0)
In my tests -G behaved like a basic regex engine, although the update below shows that the pattern is compiled with REG_EXTENDED:
- No support for ?, *, !, {, } regex syntax in my tests.
- Grouping with () and OR-ing groups with | works.
- Character-class shorthands such as \s, \W, etc. are supported.
- Look-ahead and look-behind are not supported.
- Beginning and ending line anchors ^ and $ work.
- Feature has been available since Git 1.7.4.
Excluded Files v Excluded Diffs
Note that the -G option filters which files are diffed, not which lines are shown.
If a file passes the filter, its whole diff is shown, including the lines the regex was meant to exclude.
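A small sketch of that behaviour, using made-up files in a throwaway repository: b.c is dropped because none of its changed lines match, but the version comment in a.c still appears in the output because a.c was selected as a whole.

```shell
#!/bin/bash
# Sketch: -G selects files; it does not hide lines inside a selected file.
repo=$(mktemp -d)
cd "$repo" || exit 1
git init -q .

printf '/* version 1 */\nint a = 1;\n' > a.c
printf '/* version 1 */\nint b = 1;\n' > b.c
git add a.c b.c
git -c user.name=demo -c user.email=demo@example.com commit -q -m "version 1"

# Uncommitted edits: comment changes in both files, code change only in a.c.
printf '/* version 2 */\nint a = 2;\n' > a.c
printf '/* version 2 */\nint b = 1;\n' > b.c

# Only a.c has a changed line starting with "int".
patch=$(git diff -G'^int')
echo "$patch"
```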
Examples
Only show file differences with at least one line that mentions foo.
git diff -G'foo'
Show file differences for everything except lines that start with a #
git diff -G'^[^#]'
Show files that have differences mentioning FIXME or TODO
git diff -G'(FIXME)|(TODO)'
See also git log -G, git grep, git log -S, --pickaxe-regex, --pickaxe-all
UPDATE: Which regex tool is in use by the -G option?
https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=
https://github.com/git/git/blob/master/diffcore-pickaxe.c
if (opts & (DIFF_PICKAXE_REGEX | DIFF_PICKAXE_KIND_G)) {
    int cflags = REG_EXTENDED | REG_NEWLINE;
    if (DIFF_OPT_TST(o, PICKAXE_IGNORE_CASE))
        cflags |= REG_ICASE;
    regcomp_or_die(&regex, needle, cflags);
    regexp = &regex;
// and in the regcomp_or_die function
regcomp(regex, needle, cflags);
http://man7.org/linux/man-pages/man3/regexec.3.html
REG_EXTENDED
Use POSIX Extended Regular Expression syntax when interpreting
regex. If not set, POSIX Basic Regular Expression syntax is
used.
// ...
REG_NEWLINE
Match-any-character operators don't match a newline.
A nonmatching list ([^...]) not containing a newline does not
match a newline.
Match-beginning-of-line operator (^) matches the empty string
immediately after a newline, regardless of whether eflags, the
execution flags of regexec(), contains REG_NOTBOL.
Match-end-of-line operator ($) matches the empty string
immediately before a newline, regardless of whether eflags
contains REG_NOTEOL.
Hope that helps everyone.
I found it easiest to use git difftool to launch an external diff:
git difftool -y -x "diff -I '<regex>'"
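To see what the external diff's -I option does on its own, here is a minimal sketch with two made-up files that differ only in the version comment. It assumes a diff implementation that supports -I (GNU diff does); the regex is an example, not a universal pattern.

```shell
#!/bin/bash
# Sketch: diff -I ignores hunks whose changed lines all match the regex.
dir=$(mktemp -d)
printf '/*\n * version 1\n */\nint a = 1;\n' > "$dir/old.c"
printf '/*\n * version 2\n */\nint a = 1;\n' > "$dir/new.c"

# A plain diff reports the version line ...
plain=$(diff "$dir/old.c" "$dir/new.c")
# ... while -I suppresses it, leaving no output at all.
filtered=$(diff -I '^ \* version' "$dir/old.c" "$dir/new.c")

printf 'plain diff:\n%s\n' "$plain"
printf 'filtered diff: [%s]\n' "$filtered"
```

If a hunk also contains a changed line that does not match the regex, the hunk is shown in full.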
Found a solution. I can use this command:
git diff --numstat --minimal <commit> <commit> | sed '/^[1-]\s\+[1-]\s\+.*/d'
This shows the files that have more than 1 line changed between the commits, which eliminates files whose only change was the version number in the comments.
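To illustrate what the sed filter keeps and drops, the sketch below feeds it some made-up --numstat style lines (added count, deleted count, path, separated by tabs) instead of real git output. It assumes a sed that understands \s and \+, as GNU sed does.

```shell
#!/bin/bash
# Sketch: apply the sed filter to hypothetical --numstat output.
numstat=$(printf '1\t1\tsrc/a.c\n5\t2\tsrc/b.c\n-\t-\tlogo.png\n10\t1\tsrc/c.c\n')

# Drop entries whose added and deleted columns are each exactly "1" or "-".
kept=$(printf '%s\n' "$numstat" | sed '/^[1-]\s\+[1-]\s\+.*/d')
echo "$kept"
```

src/a.c (one line replaced) is dropped, and so is logo.png, since binary files are reported with "-" in both columns. Counts such as 10 are kept because the pattern requires whitespace right after the first character.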
Perhaps a bash script like this (I didn't test the code, let me know if you could make it work or not)
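#!/bin/bash
# For each changed file, count the diff lines left after textconv filtering.
git diff --name-only "$@" | while IFS= read -r FPATH ; do
    # Revisions go first, the path goes after "--".
    LINES_COUNT=$(git diff --textconv "$@" -- "$FPATH" | wc -l)
    # Files whose filtered diff is empty are skipped.
    if [ "$LINES_COUNT" -gt 0 ] ; then
        printf '%s\t%s\n' "$LINES_COUNT" "$FPATH"
    fi
done | sort -n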
Using 'grep' on the 'git diff' output, the comment line changes alone can be counted (A):
git diff -w | grep -c -E "(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)"
Using the 'git diff --stat' output, all line changes can be counted (B):
git diff -w --stat
To get the Non Comment Source Line (NCSL) change count, subtract (A) from (B).
Explanation:
In the 'git diff' output (in which whitespace changes are ignored),
- Look for a line that starts with either '+' or '-', which means a modified line.
- There can be optional white-space characters following this: '\s*'.
- Then look for the comment line pattern '/*' (or) just '*' (or) '//'.
- Since the '-c' option is given to grep, only the count is printed. Remove the '-c' option to see the comment lines themselves in the diffs.
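The subtraction can be sketched on a small made-up diff. Note one swapped-in step: instead of reading (B) from 'git diff --stat', this sketch counts the '+'/'-' lines directly and skips the '---'/'+++' file headers, which is an assumption of mine and may miscount unusual lines.

```shell
#!/bin/bash
# Sketch: NCSL = all changed lines (B) minus changed comment lines (A).
# The diff text below is hypothetical sample input.
sample_diff=$(cat <<'EOF'
--- a/abc.c
+++ b/abc.c
@@ -1,4 +1,4 @@
-/* version 1 */
+/* version 2 */
- * old note
+ // new note
-int x = 1;
+int x = 2;
EOF
)

comment_re='(^[+-]\s*(\/)?\*)|(^[+-]\s*\/\/)'
# (A) changed lines that look like comments.
comments=$(printf '%s\n' "$sample_diff" | grep -c -E "$comment_re")
# (B) all changed lines, excluding the ---/+++ file headers.
total=$(printf '%s\n' "$sample_diff" | grep -E '^[+-]' | grep -c -v -E '^(\+\+\+|---) ')
ncsl=$((total - comments))
echo "comments=$comments total=$total ncsl=$ncsl"
```

Newer grep versions may print a warning about the escaped slashes in the pattern; the counts are unaffected.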
NOTE: The comment line count can be slightly off because of the following assumptions, so the result should be taken as a ballpark figure.
1.) Source files are assumed to be C. Makefiles and shell scripts use a different convention ('#') to denote comment lines, so if they are part of the diff set, their comment lines won't be counted.
2.) Git's convention for line changes: if a line is modified, Git sees it as that line being deleted and a new line being inserted, so it may look like 2 lines changed when in reality one line was modified.
In the example below, the new definition of 'FOO' looks like a two-line change.
$ git diff --stat -w abc.h
...
-#define FOO 7
+#define FOO 105
...
1 files changed, 1 insertions(+), 1 deletions(-)
$
3.) Valid comment lines that do not match the pattern, or valid source code lines that do match it, can cause errors in the calculation.
In the example below, the "+ blah blah" line, which doesn't start with '*', won't be detected as a comment line.
+ /*
+ blah blah
+ *
+ */
In the example below, the "+ *ptr" line will be counted as a comment line because it starts with '*', though it is a valid source code line.
+ printf("\n %p",
+ *ptr);