I am trying to produce a list of the files that were changed in a specific commit. The problem is that every file has the version number in a comment at the top of the file, and since this commit introduces a new version, every file has changed.
I don't care about the changed comments, so I would like to have git diff ignore all lines that match ^\s*\*.*$, as these are all comments (part of /* */).
I cannot find any way to tell git diff to ignore specific lines.
I have already tried setting a textconv attribute so that git passes the files through sed before diffing them, letting sed strip out the offending lines. The problem with this is that git diff --name-status does not actually diff the files; it just compares the hashes, and of course all the hashes have changed.
Is there any way to do this?
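For reference, the textconv filter described above can be sketched in Python instead of sed. This is a minimal sketch, assuming a hypothetical script name strip_comments.py and a hypothetical diff driver called nocomments, wired up with a .gitattributes line such as *.c diff=nocomments and git config diff.nocomments.textconv "python3 strip_comments.py". Git runs the textconv command with the path of a (temporary) file as its only argument and diffs whatever it prints. As the question notes, this only changes patch output (git diff, git log -p); --name-status still compares blob hashes.

```python
import re
import sys

# Same pattern as in the question: optional indentation, then '*',
# i.e. the body lines of a /* ... */ block (the opening "/*" line is kept).
COMMENT_LINE = re.compile(r"^\s*\*.*$")


def strip_comment_lines(text: str) -> str:
    """Return text with every line matching COMMENT_LINE removed."""
    return "".join(
        line for line in text.splitlines(keepends=True)
        if not COMMENT_LINE.match(line)
    )


def main(path: str) -> None:
    # git calls the textconv command as `<command> <file>` and diffs its stdout.
    with open(path, encoding="utf-8", errors="replace") as fh:
        sys.stdout.write(strip_comment_lines(fh.read()))


# Only act when git (or a user) passes exactly one file path.
if __name__ == "__main__" and len(sys.argv) == 2:
    main(sys.argv[1])
```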
Here is a solution that is working well for me. I've written up the solution and some additional missing documentation on the git (log|diff) -G<regex> option.

Basically, I'm using the same solution as above, but specifically for comments that start with a * or a #, and sometimes a space before the *. It still needs to allow #ifdef, #include, etc. changes.

Look-ahead and look-behind do not seem to be supported by the -G option, nor does ? in general, and I have had problems with using *, too. + seems to be working well, though. (Note: tested on Git v2.7.0.)
Multi-Line Comment Version
-w ignores whitespace.
-G only shows diffs whose added or removed lines match the following regex, which ORs three groups together:
(^[^\*# /]) matches any line that does not start with a star, a hash, a space, or a slash.
(^#\w) matches any line that starts with # followed by a word character (e.g. #include, #ifdef).
(^\s+[^\*#/]) matches any line that starts with some whitespace followed by a non-comment character.

Basically, an SVN hook currently modifies every file on the way in and out, rewriting the multi-line comment blocks in each one. Now I can diff my changes against SVN without the FYI info that SVN drops into the comments.
Technically this will allow Python and Bash comments like #TODO to be shown in the diff, and if a division operator started a new line in C++, that line could be ignored.
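The full command isn't reproduced here; assembled from the listed pieces it was presumably something like git diff -w -G'(^[^\*# /])|(^#\w)|(^\s+[^\*#/])' (my assumption about how the groups were joined). The Python sketch below tries that combined pattern on a few sample lines to show which changed lines would count, including the #TODO and division caveats. Python's re module is not Git's POSIX regex engine, but for these simple anchors and character classes the results should be the same. The helper name is_interesting is hypothetical.

```python
import re

# The three groups from the list above, OR-ed together with '|'.
# Git matches each added/removed line without its leading '+' or '-'.
INTERESTING = re.compile(r"(^[^\*# /])|(^#\w)|(^\s+[^\*#/])")


def is_interesting(line: str) -> bool:
    """True if a changed line would satisfy the -G pattern."""
    return INTERESTING.search(line) is not None


if __name__ == "__main__":
    samples = [
        " * Version 1.2.3",      # comment body: ignored
        "int x = 1;",            # code at column 0: shown
        "#include <stdio.h>",    # preprocessor directive: shown
        "#TODO fix this",        # script-style comment: also shown
        "    return 0;",         # indented code: shown
        "/ c;",                  # division continued on a new line: ignored
    ]
    for s in samples:
        print(f"{is_interesting(s)!s:<5} {s!r}")
```

One quirk worth knowing: because \s+ may match fewer whitespace characters than are present, a comment line indented by two or more spaces (for example "    * note") also satisfies the third group, so the pattern works best with the usual single-space " * " comment style.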
Also, the documentation on -G in Git seemed pretty lacking, so the info here should help:

git diff -G<regex>
(Note: tested on Git v2.7.0.)
In my testing, -G behaved like a basic regex:
Not supported: ?, *, !, {, } regex syntax.
Grouping with () and OR-ing groups with | works.
\s, \W, etc. are supported.
^ and $ work.

Excluded Files vs. Excluded Diffs
Note that the -G option filters the files that will be diffed. But once a file gets "diffed", all of its changed lines will be shown, including the ones your regex was meant to exclude or include.
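To make the distinction concrete, here is a rough Python model of the selection behavior described above. It is not Git's implementation: it treats each file's diff as a list of '+'/'-'/' ' prefixed lines and keeps the whole list whenever any added or removed line matches. The function name select_like_G is hypothetical.

```python
import re
from typing import Dict, List


def select_like_G(file_diffs: Dict[str, List[str]], pattern: str) -> Dict[str, List[str]]:
    """Keep a file's entire diff if ANY added/removed line matches `pattern`.

    Lines inside a kept diff are not filtered, which is why "excluded"
    lines still show up in the output.
    """
    rx = re.compile(pattern)
    selected = {}
    for path, lines in file_diffs.items():
        # Strip the +/- marker, as Git does before matching.
        changed = (line[1:] for line in lines if line.startswith(("+", "-")))
        if any(rx.search(text) for text in changed):
            selected[path] = lines
    return selected


if __name__ == "__main__":
    diffs = {
        "a.c": ["- * Version 1.0", "+ * Version 1.1"],
        "b.c": ["- * Version 1.0", "+ * Version 1.1", "+int added;"],
    }
    for path, lines in select_like_G(diffs, r"^[^\*# /]").items():
        print(path)
        print("\n".join(lines))
```

Running it prints only b.c, together with both version-number lines: the file was selected because of "int added;", and nothing inside its diff was filtered.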
Examples

Only show file differences with at least one line that mentions foo.
Show file differences for everything except lines that start with a #.
Show files that have differences mentioning FIXME or TODO.
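The commands for these examples aren't shown here. As a hedged sketch, the three searches could be driven from Python like this; the patterns foo, ^[^#], and (FIXME)|(TODO) are my guesses at what each example used, and the helper names are hypothetical. Passing an argument list to subprocess avoids shell-quoting problems with characters such as ^, |, and ().

```python
import re
import subprocess
from typing import List

# Guessed patterns for the three examples (not the original commands).
EXAMPLES = {
    "mentions foo": "foo",
    "changed lines not starting with #": "^[^#]",
    "mentions FIXME or TODO": "(FIXME)|(TODO)",
}


def git_diff_g_cmd(pattern: str, *revs: str) -> List[str]:
    """Build `git diff --name-only -G<pattern> [revs...]` as an argv list."""
    return ["git", "diff", "--name-only", f"-G{pattern}", *revs]


def preview(pattern: str, line: str) -> bool:
    """Rough local check with Python's re (close to, not identical to, Git's regex)."""
    return re.search(pattern, line) is not None


def files_matching(pattern: str, *revs: str) -> List[str]:
    """Run the command in the current repository and return the listed file names."""
    result = subprocess.run(git_diff_g_cmd(pattern, *revs),
                            capture_output=True, text=True, check=True)
    return result.stdout.splitlines()
```

For example, files_matching(EXAMPLES["mentions FIXME or TODO"], "HEAD~1", "HEAD") would list the files whose changes between the last two commits add or remove a line mentioning FIXME or TODO.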
See also: git log -G, git grep, git log -S, --pickaxe-regex, --pickaxe-all
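Since -S and -G are easy to confuse, here is a small Python model of the difference as the git log documentation describes it: -S<string> reports a change when the number of occurrences of the string differs between the two versions, while -G<regex> reports it when any added or removed line matches. It uses difflib rather than Git's diff algorithm, and the function names are hypothetical.

```python
import difflib
import re


def picked_by_S(old: str, new: str, needle: str) -> bool:
    """-S<string>: the occurrence count of `needle` changed."""
    return old.count(needle) != new.count(needle)


def picked_by_G(old: str, new: str, pattern: str) -> bool:
    """-G<regex>: some added or removed line matches `pattern`."""
    rx = re.compile(pattern)
    for line in difflib.ndiff(old.splitlines(), new.splitlines()):
        # ndiff prefixes removed lines with "- " and added lines with "+ ".
        if line.startswith(("- ", "+ ")) and rx.search(line[2:]):
            return True
    return False


if __name__ == "__main__":
    # Moving a call: the string still occurs once, but the lines changed.
    old = "a\nhit = frotz(nitfol, mf2.ptr, 1, 0);\nb\n"
    new = "a\nb\nreturn frotz(nitfol, two->ptr, 1, 0);\n"
    print(picked_by_S(old, new, "frotz(nitfol"))    # False
    print(picked_by_G(old, new, r"frotz\(nitfol"))  # True
```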
UPDATE: Which regex engine does the -G option use? Git compiles the pattern through the POSIX regcomp() interface and matches lines with regexec(); see:
https://github.com/git/git/search?utf8=%E2%9C%93&q=regcomp&type=
https://github.com/git/git/blob/master/diffcore-pickaxe.c
http://man7.org/linux/man-pages/man3/regexec.3.html
Hope that helps everyone.
Use the -G<regex> option and specify a regex that does NOT match your version-number line.
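As a hedged illustration, suppose the version line looks like " * Version 1.2.3" (a hypothetical format). Because -G has no look-ahead, the pattern has to exclude that line using anchors and character classes. The Python check below, with a hypothetical helper name and candidate pattern, confirms that a candidate skips the version lines but still catches ordinary code changes; Python's re stands in for Git's regex engine here.

```python
import re
from typing import Iterable

# Hypothetical candidate for -G: a line starting with a non-space, or with a
# space followed by something other than '*'. It cannot match " * Version 1.2.3".
CANDIDATE = r"^([^ ]| [^*])"


def pattern_ok(pattern: str, version_lines: Iterable[str], code_lines: Iterable[str]) -> bool:
    """True if `pattern` matches none of the version lines and all of the code lines."""
    rx = re.compile(pattern)
    return (not any(rx.search(v) for v in version_lines)
            and all(rx.search(c) for c in code_lines))


if __name__ == "__main__":
    print(pattern_ok(CANDIDATE,
                     [" * Version 1.2.3", " * Version 1.2.4"],
                     ["int x = 1;", "    return 0;", "#include <stdio.h>"]))
```

With Git, such a pattern would presumably be passed as git diff --name-only -G'^([^ ]| [^*])' followed by the two commits.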
I found it easiest to use git difftool to launch an external diff.

Found a solution. I can use a command that shows the files that have more than one line changed between commits, which eliminates files whose only change was the version number in the comments.
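The command itself isn't shown here. One way to get the same effect, sketched in Python as an assumption rather than the original command, is to parse git diff --numstat, which prints the added count, deleted count, and path separated by tabs (binary files show - for both counts), and keep files with more than one added or deleted line. A version bump alone produces 1 and 1, so those files drop out. The threshold and the choice to keep binary files are my own; the function names are hypothetical.

```python
import subprocess
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], Optional[int], str]


def parse_numstat(text: str) -> List[Row]:
    """Parse `git diff --numstat` output into (added, deleted, path) rows."""
    rows: List[Row] = []
    for line in text.splitlines():
        if not line.strip():
            continue
        added, deleted, path = line.split("\t", 2)
        # Binary files are reported as "-<TAB>-<TAB>path": no line counts.
        rows.append((None if added == "-" else int(added),
                     None if deleted == "-" else int(deleted),
                     path))
    return rows


def files_with_real_changes(text: str, threshold: int = 1) -> List[str]:
    """Paths with more than `threshold` added or deleted lines (binary files kept)."""
    return [path for added, deleted, path in parse_numstat(text)
            if added is None or deleted is None
            or added > threshold or deleted > threshold]


def numstat(commit: str) -> str:
    """Run `git diff --numstat <commit>^ <commit>` in the current repository."""
    return subprocess.run(["git", "diff", "--numstat", f"{commit}^", commit],
                          capture_output=True, text=True, check=True).stdout
```

Usage would be along the lines of files_with_real_changes(numstat("HEAD")).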
Using grep on the git diff output, the comment-line changes alone can be counted. (A)
Using the git diff --stat output, all line changes can be counted. (B)
To get the Non-Comment Source Line (NCSL) change count, subtract (A) from (B).
Explanation: in the git diff -w output (in which whitespace changes are ignored), added and removed lines whose content starts with '*' (after optional whitespace) are counted as comment lines.
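The original grep and --stat commands aren't preserved here. As a rough sketch of the same arithmetic, the Python below reads the text of a unified diff (for example from git diff -w <commit>^ <commit>), counts every added or removed line inside hunks (B, which should agree with the insertion and deletion totals --stat reports for text files), counts those whose content starts with '*' after optional whitespace (A, an assumed comment pattern), and returns B - A. The function name is hypothetical.

```python
import re
from typing import Tuple

# Assumed comment test: after the '+'/'-' marker, optional whitespace, then '*'.
COMMENT_CHANGE = re.compile(r"^[+-]\s*\*")


def ncsl_changes(diff_text: str) -> Tuple[int, int, int]:
    """Return (all changed lines B, comment lines A, NCSL = B - A) for a git diff."""
    total = comments = 0
    in_hunk = False
    for line in diff_text.splitlines():
        if line.startswith("diff "):
            in_hunk = False            # new file header: '---'/'+++' lines follow
        elif line.startswith("@@"):
            in_hunk = True             # hunk body starts after this header
        elif in_hunk and line[:1] in ("+", "-"):
            total += 1
            if COMMENT_CHANGE.match(line):
                comments += 1
    return total, comments, total - comments
```

Applied to a diff where one version comment and the FOO definition change, it reports (4, 2, 2); a changed "+ *ptr = 0;" line is counted as a comment, which is exactly the kind of error described in note 3 below.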
NOTE: There can be minor errors in the comment-line count due to the following assumptions, so the result should be taken as a ballpark figure.
1.) Source files are assumed to be C. Makefiles and shell scripts use '#' to denote comment lines, so if they are part of the diff set, their comment lines won't be counted.
2.) Git's convention for line changes: if a line is modified, Git treats it as the old line being deleted and a new line being inserted in its place, so it may look like 2 lines changed when in reality one line was modified.
In the example below, the new definition of 'FOO' looks like a two-line change.
$ git diff --stat -w abc.h
...
-#define FOO 7
+#define FOO 105
...
1 files changed, 1 insertions(+), 1 deletions(-)
$
3.) Valid comment lines that don't match the pattern, or valid source code lines that do match it, can cause errors in the calculation.
For example, a "+ blah blah" line inside a comment block won't be detected as a comment line, because it doesn't start with '*'.
Conversely, a "+ *ptr" line will be counted as a comment line because it starts with '*', even though it is valid source code.
Perhaps a bash script like this would work (I haven't tested the code; let me know whether you can make it work).