
How to have git-diff ignore all whitespace-change

Posted 2020-07-11 06:51

Question:

In another post I found the very neat

git diff --color-words='[^[:space:]]|([[:alnum:]]|UTF_8_GUARD)+'

which does a great job of compressing git diff's output to the essentials while remaining legible (especially when adding --word-diff=plain, which additionally surrounds deletions with [-...-] and additions with {+...+}). While this does include whitespace changes, the output does not highlight them in any noticeable way. For example, changing the indentation of a line of Python code (a severe change) shows up as that line with the longer indentation (before or after), but without any highlighting whatsoever.

How can whitespace changes be highlighted correctly? Perhaps by replacing whitespace with Unicode characters such as · (and similar symbols for the other whitespace characters), or with something closer to the {+ +} markers of git diff --word-diff-regex=. but with the smarter word separation?
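A minimal sketch that reproduces the behavior described above. Assumptions: the file names before.py and after.py are hypothetical, git diff --no-index is used so that no repository is needed, and the UTF_8_GUARD part of the regex is left out because it is assumed to be a placeholder from the other post rather than a literal pattern.

```shell
# Two hypothetical files that differ only in the indentation of one line.
tmp=$(mktemp -d)
printf 'def f(x):\n    return x\n' > "$tmp/before.py"
printf 'def f(x):\n        return x\n' > "$tmp/after.py"

# --no-index compares two plain files, so no repository is needed.
# git exits with status 1 when the files differ, hence "|| true".
# The regex is the question's one with the UTF_8_GUARD placeholder dropped.
out=$(git diff --no-index --no-color --word-diff=plain \
        --word-diff-regex='[^[:space:]]|[[:alnum:]]+' \
        "$tmp/before.py" "$tmp/after.py" || true)
printf '%s\n' "$out"
```

Under these assumptions the hunk still appears and shows the return x line, but without any [- or {+ markers, because both versions consist of exactly the same words.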

Answer 1:

I couldn't solve your problem, but I worry that Git might be working against you here. Recall that --color-words=<regex> is a combination of --word-diff=color and --word-diff-regex=<regex>. The git diff man page says:

   --word-diff-regex=<regex>
       Use <regex> to decide what a word is, instead of considering runs
       of non-whitespace to be a word. Also implies --word-diff unless it
       was already enabled.

       Every non-overlapping match of the <regex> is considered a word.
       Anything between these matches is considered whitespace and
       ignored(!) for the purposes of finding differences. You may want
       to append |[^[:space:]] to your regular expression to make sure
       that it matches all non-whitespace characters. A match that
       contains a newline is silently truncated(!) at the newline.

       The regex can also be set via a diff driver or configuration
       option, see gitattributes(1) or git-config(1). Giving it
       explicitly overrides any diff driver or configuration setting.
       Diff drivers override configuration settings.

Note this part of the middle paragraph: "Anything between these matches is considered whitespace and ignored(!) for the purposes of finding differences." So it sounds like Git tries to treat whitespace specially here, and that might be the problem.
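The man-page sentence quoted above can be checked with a small experiment. In this sketch the letters-only regex [[:alpha:]]+ and the file names are hypothetical, chosen only to make the effect obvious: a changed digit falls between two matches, so Git counts it as whitespace and prints no markers, until |[^[:space:]] is appended as the documentation suggests.

```shell
tmp=$(mktemp -d)
printf 'a = 1\n' > "$tmp/old.txt"
printf 'a = 2\n' > "$tmp/new.txt"

# Letters only: "1" and "2" lie between matches and are treated as whitespace.
letters=$(git diff --no-index --no-color --word-diff=plain \
            --word-diff-regex='[[:alpha:]]+' \
            "$tmp/old.txt" "$tmp/new.txt" || true)

# With |[^[:space:]] appended, every non-whitespace character is a word.
fixed=$(git diff --no-index --no-color --word-diff=plain \
          --word-diff-regex='[[:alpha:]]+|[^[:space:]]' \
          "$tmp/old.txt" "$tmp/new.txt" || true)

printf '%s\n--\n%s\n' "$letters" "$fixed"
```

The same mechanism applies to whitespace itself: as long as the regex never matches spaces or tabs, a pure indentation change lies entirely between words and cannot be marked.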



Answer 2:

The best I can get so far is

git diff --color-words='[[:space:]]|([[:alnum:]]|UTF_8_GUARD)+' --word-diff=plain

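Note the removed ^ in front of [:space:]! With it gone, every single whitespace character is matched as a word of its own, so added or removed indentation now takes part in the word diff.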
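This answer's regex can be tried on an indentation-only change. A sketch under stated assumptions: the file names are hypothetical, git diff --no-index replaces a repository, the UTF_8_GUARD placeholder is left out, and --word-diff=plain --word-diff-regex=... stands in for the answer's --color-words=... --word-diff=plain, which should select the same regex and output mode.

```shell
tmp=$(mktemp -d)
printf 'def f(x):\n    return x\n' > "$tmp/before.py"
printf 'def f(x):\n        return x\n' > "$tmp/after.py"

# [[:space:]] (no ^) turns each whitespace character into a word,
# so the four extra spaces become four inserted words.
out=$(git diff --no-index --no-color --word-diff=plain \
        --word-diff-regex='[[:space:]]|[[:alnum:]]+' \
        "$tmp/before.py" "$tmp/after.py" || true)
printf '%s\n' "$out"
```

Under these assumptions the added indentation should appear as a {+ ... +} insertion made of spaces in front of return, which is visible in plain mode even though the inserted text itself is blank.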



Answer 3:

Here's an alternative that uses the substitution suggested at the end of the question. First, configure the pager:

git config --global core.pager 'less --raw-control-chars'

so that Unicode symbols are displayed correctly instead of as raw bytes such as <c3>. Then add the following to your Git configuration:

[diff "txt"]
    textconv = unwhite.sh

and, lacking a global solution, add something like the following to .gitattributes:

*.py diff=txt
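Whether the .gitattributes line actually assigns the txt driver to a file can be checked with git check-attr. A sketch in a scratch repository; example.py is a hypothetical path, and the file does not need to exist for the check.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q

# Same attribute line as above, in a throwaway repository.
printf '*.py diff=txt\n' > .gitattributes

# Prints "<path>: <attribute>: <value>" for the given path.
attr=$(git check-attr diff -- example.py)
printf '%s\n' "$attr"
```

If the value shows up as unspecified instead of txt, the pattern in .gitattributes does not match the path, and the textconv driver will not be used for that file.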

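Finally, create unwhite.sh (Git runs the textconv command through the shell, so the script must be executable and on your PATH):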

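#!/bin/bash
# Append an inverted "\n" to every line (\033 is ESC), then show spaces as ␣ and tabs as → (\t in sed needs GNU sed)
awk 1 ORS='\033[7m\\n\033[27m\n' "$1" | sed -e 's/ /␣/g' -e 's/\t/→/g'

The \033 sequences are octal escapes for the ESC character; awk fails to support \e, but it does understand octal escapes, so no raw escape characters need to be pasted into the script. I display the newline-indicating \n in inverted colors to distinguish it from a literal \n in the file. If your terminal does not render the inversion, try a Unicode newline symbol instead.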

I deviated from the original Unicode symbols because they failed to display correctly on msysgit.
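The steps of this answer can be tried end to end in a throwaway repository. This is a sketch under several assumptions: it uses repository-local configuration instead of --global, points textconv at the script by absolute path instead of relying on PATH, writes unwhite.sh with \033 octal escapes rather than raw ESC bytes, uses a hypothetical file example.py, and skips the core.pager setting because the output is captured rather than paged.

```shell
tmp=$(mktemp -d)
cd "$tmp" || exit 1
git init -q

# The textconv filter: \033 stands for ESC, \t in sed assumes GNU sed.
cat > unwhite.sh <<'EOF'
#!/bin/bash
awk 1 ORS='\033[7m\\n\033[27m\n' "$1" | sed -e 's/ /␣/g' -e 's/\t/→/g'
EOF
chmod +x unwhite.sh

# Repository-local config, so the demo leaves your global setup alone.
git config diff.txt.textconv "$tmp/unwhite.sh"
printf '*.py diff=txt\n' > .gitattributes

printf 'def f(x):\n    return x\n' > example.py
git add example.py      # the index now holds the 4-space version
printf 'def f(x):\n        return x\n' > example.py

# Index vs. working tree; textconv runs on both sides before the line diff.
out=$(git diff --no-color -- example.py)
printf '%s\n' "$out"
```

Because the filter runs before the line diff, the indentation change now shows up as a removed line with four ␣ and an added line with eight ␣. To see the inverted \n markers, run git diff in that repository through the pager configured as above.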