git: number of lines *not* changed since specific commit?

Published 2019-07-25 17:32

Question:

There are plenty of answers with great command line fu to find changes (or change statistics), but I'd like to find the opposite: how many lines (per file) have not changed since a particular commit?

The closest I could find is "How to find which files have not changed since commit?", but I'd like to know how many lines (ideally, per file) have survived unchanged, not which files.

So, basically: can git diff --stat output unchanged lines in addition to insertions and deletions?

Alternatively, I'd imagine that git ls-files, git blame and some awk magic might do the trick, but I haven't figured it out yet. For example, rather than labeling each line with the hash of the commit that last changed it, can I get git blame to indicate whether that change happened before or after a given commit? Together with grep and wc -l, that would get me there.

Answer 1:

Figured it out. The key is that git blame accepts revision ranges (see https://git-scm.com/docs/git-blame, section "SPECIFYING RANGES"). Assume 123456 is the commit I want to compare against. With

git blame 123456..

"lines that have not changed since the range boundary [...] are blamed for that range boundary commit", that is, it will show everything that hasn't changed since that commit as "^123456". Thus, per file, the answer to my question is

git blame 123456.. -- "$file" | grep -c '^\^123456'    # unchanged since
git blame 123456.. -- "$file" | grep -vc '^\^123456'   # new since

Wrapped into a bash script that goes over all files in the repo (git ls-files) and prints the results in columns:

#!/bin/bash
# Per tracked file: total lines, lines unchanged since a base commit, and new lines.

# Abbreviated hash of the base commit. Keep it short (e.g. 6 characters) so it
# matches the prefix git blame prints after "^" on boundary lines.
base=123456

total_lines=0
total_lines_unchanged=0
total_lines_new=0

echo "--- total unchanged new filename ---"

# -z and read -d '' keep file names with spaces intact. To filter files, pass a
# pathspec instead of grepping, e.g.: git ls-files -z -- '*.c'
while IFS= read -r -d '' file; do
  # Run git blame once; count boundary ("^$base") lines vs. all other lines.
  read -r lines_unchanged lines_new < <(
    git blame "$base".. -- "$file" |
      awk -v b="^$base" 'index($0, b) == 1 { u++; next } { n++ } END { print u + 0, n + 0 }'
  )
  lines=$((lines_unchanged + lines_new))

  # print pretty
  printf "%6d %6d %6d %s\n" "$lines" "$lines_unchanged" "$lines_new" "$file"

  # add to total
  total_lines=$((total_lines + lines))
  total_lines_unchanged=$((total_lines_unchanged + lines_unchanged))
  total_lines_new=$((total_lines_new + lines_new))
done < <(git ls-files -z)

# print total
echo "--- total unchanged new ---"
printf "%6d %6d %6d TOTAL\n" "$total_lines" "$total_lines_unchanged" "$total_lines_new"

Thanks to Gregg for his answer, which had me look into the options for git-blame more!
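A hedged variant of the `^123456` grep: instead of matching the abbreviated hash that git blame prints after `^` (which only works if the prefix you grep for is short enough), it reads `git blame --line-porcelain` output, where the header block of every line blamed on the range boundary contains a `boundary` line. This is a minimal sketch assuming bash and awk; `count_unchanged` is a hypothetical helper name, and it counts the lines of the committed (HEAD) version of the file, not uncommitted edits.

```shell
#!/bin/bash
# count_unchanged <commit> <file>   (hypothetical helper, not a git command)
# Prints "<unchanged> <new> <file>" for the version of <file> in HEAD.
count_unchanged() {
  local base=$1 file=$2 counts
  # --line-porcelain repeats the commit details for every line, and lines blamed
  # on the range boundary carry a "boundary" line. --root keeps root commits
  # that are not ancestors of <commit> from being reported as boundaries too.
  counts=$(git blame --root --line-porcelain "$base".. -- "$file" |
    awk '/^\t/ { total++ }                 # each source line is TAB-prefixed
         $0 == "boundary" { unchanged++ }
         END { print unchanged + 0, total - unchanged }')
  printf '%s %s\n' "$counts" "$file"
}
```

Because it prints both counts on one line, it can be called once per file from a loop over `git ls-files`.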



Answer 2:

git diff --name-only HEAD~ HEAD                                 # files that changed in the last commit
git rev-parse HEAD                                              # hash of the current revision
wc -l <filename>                                                # total lines
git blame <filename> | grep -v -c -e "<first-8-chars-of-hash>"  # unchanged lines (not touched by that commit)
git blame <filename> | grep -c -e "<first-8-chars-of-hash>"     # changed lines (touched by that commit)
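The grep commands above rely on copying the right number of hash characters by hand. A minimal sketch of the same count that compares full hashes instead, assuming bash and awk; `blame_vs_head` is a hypothetical helper name, and "changed" means "last touched by the HEAD commit", as in the commands above. It reads `git blame --line-porcelain`, where each line's block starts with the full commit hash and ends with the TAB-prefixed source line.

```shell
#!/bin/bash
# blame_vs_head <file>   (hypothetical helper, not a git command)
# Prints "<unchanged> <changed>": lines of <file> in HEAD that were last touched
# by an older commit vs. by the HEAD commit itself.
blame_vs_head() {
  local file=$1 head
  head=$(git rev-parse HEAD)              # full hash, the form porcelain prints
  git blame --line-porcelain HEAD -- "$file" |
    awk -v h="$head" '
      BEGIN { expect_header = 1 }
      # every line block starts with "<full-sha> <orig-line> <final-line> ..."
      expect_header { sha = $1; expect_header = 0; next }
      # the TAB-prefixed source line ends the block
      /^\t/ { if (sha == h) changed++; else unchanged++; expect_header = 1 }
      END { print unchanged + 0, changed + 0 }'
}
```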


Answer 3:

I tried it with Python:

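import subprocess

def git(*args):
    """Run a git command and return its output as a list of lines."""
    out = subprocess.run(["git", *args], check=True,
                         capture_output=True, text=True).stdout
    return out.splitlines()

start = git("rev-parse", "HEAD")[0]                    # remember the current commit
roots = git("log", "--pretty=%H", "--max-parents=0")   # root commit(s)
result = set()                                         # files (not lines) unchanged since a root
try:
    for root in roots:
        git("reset", "-q", root)                       # point HEAD and index at the root; working tree untouched
        all_files = set(git("ls-files"))               # files tracked in the root commit
        modified = set(git("ls-files", "--modified"))  # ...whose working-tree copy differs
        result |= all_files - modified
finally:
    # Restore HEAD and index. Unlike --hard, a mixed reset keeps uncommitted edits in the working tree.
    git("reset", "-q", start)
print(result)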
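The script above finds whole files (not lines) that are unchanged since the root commit(s), and it does so by moving HEAD and the index with `git reset`. A sketch that should list the same files without touching HEAD or the index, by comparing each root commit directly with the working tree. It assumes bash (for process substitution), `comm` and `sort`, and is meant to be run from the top level of the repository; `unchanged_files_since_roots` is a hypothetical name.

```shell
#!/bin/bash
# unchanged_files_since_roots   (hypothetical helper, not a git command)
# Lists files from each root commit whose working-tree copy still matches it.
# Run from the repository's top level: git ls-tree prints paths relative to the
# current directory, while git diff --name-only prints them from the top.
unchanged_files_since_roots() {
  local root
  for root in $(git log --pretty=%H --max-parents=0); do
    # comm -23 keeps names that appear only in the first sorted list:
    # files in the root commit minus files that differ from it now
    comm -23 <(git ls-tree -r --name-only "$root" | sort) \
             <(git diff --name-only "$root" | sort)
  done | sort -u
}
```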


Answer 4:

$ wc -l main.c
718 main.c
$ git diff --numstat v2.0.0 main.c
152     70      main.c

That is, 152 lines of the current main.c were changed or added since v2.0.0, so 718 - 152 = 566 lines are unchanged since then.

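# usage: lines-unchanged-in-since <file> <commit>
lines-unchanged-in-since () {
        local total added
        total=$(wc -l < "$1")
        # --numstat prints "added<TAB>deleted<TAB>path"; no output means no change
        added=$(git diff --numstat "$2" -- "$1" | cut -f1)
        echo "$((total - ${added:-0})) lines unchanged in $1 since $2"
}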
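The question asks for per-file numbers across the repository. A sketch that applies the same arithmetic (current line count minus lines added since the commit) to every tracked file, assuming bash; `unchanged_since_numstat` is a hypothetical name. It inherits the limits of `--numstat`: a modified line counts as added, and binary files (reported as `-`) are skipped.

```shell
#!/bin/bash
# unchanged_since_numstat <commit>   (hypothetical helper, not a git command)
# For every tracked file: current line count minus lines added since <commit>.
unchanged_since_numstat() {
  local base=$1 file total added
  while IFS= read -r -d '' file; do
    [ -f "$file" ] || continue                  # skip files missing from the working tree
    total=$(wc -l < "$file")
    # empty numstat output means the file is unchanged since <commit>
    added=$(git diff --numstat "$base" -- "$file" | cut -f1)
    [ "$added" = "-" ] && continue              # binary file: numstat prints "-"
    printf '%6d %s\n' "$((total - ${added:-0}))" "$file"
  done < <(git ls-files -z)
}
```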