Git - get all commits and blobs they created

Published 2019-01-17 01:55

Question:

Is there a git command that can output for every commit:

  1. id
  2. subject
  3. the blobs it created, with their path and size (like git ls-tree -l -r <commit>, but only for created blobs)

Answer 1:

To list all commits, one line per commit:

git rev-list --all --pretty=oneline

Then split each line at the first space (a split limit of 2) to get the commit id and the subject.

To get the blobs created by a commit (recurse into subdirectories, show merge commits, detect renames and copies, and omit the commit id on the first line):
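
The split described above can be sketched in plain shell. This is only an illustration: the function name split_oneline is hypothetical, and it assumes each input line has the "<sha> <subject>" shape that git rev-list --all --pretty=oneline prints.

```shell
# Read "<sha> <subject>" lines on stdin and print "<sha><TAB><subject>".
# Only the first space is used as the separator, so subjects may contain spaces.
split_oneline() {
  while IFS= read -r line; do
    sha=${line%% *}       # everything before the first space
    subject=${line#* }    # everything after the first space
    # A line without any space has no subject at all
    [ "$sha" = "$line" ] && subject=""
    printf '%s\t%s\n' "$sha" "$subject"
  done
}

# Usage (not run here):
#   git rev-list --all --pretty=oneline | split_oneline
```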

git diff-tree -r -c -M -C --no-commit-id <commit-sha>

After parsing each line and excluding the entries that are not additions, we get the list of new blobs and their paths for the commit.

The last step is to get the blob sizes:
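
The parsing step could look roughly like the sketch below. It assumes that "created" means entries with status A in the raw diff-tree output, whose lines have the form ":<old mode> <new mode> <old sha> <new sha> <status><TAB><path>". The function name created_blobs is hypothetical, and paths with unusual characters (which git quotes unless -z is used) are not handled.

```shell
# Read raw `git diff-tree -r --no-commit-id <commit>` output on stdin and
# print "<blob-sha><TAB><path>" for every added entry (status A).
created_blobs() {
  awk -F'\t' '
    /^:[0-9]/ {                     # skip combined "::" lines of merge commits
      split($1, meta, " ")          # :oldmode newmode oldsha newsha status
      if (meta[5] == "A") print meta[4] "\t" $2
    }'
}

# Usage (not run here):
#   git diff-tree -r --no-commit-id <commit-sha> | created_blobs
```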

git cat-file --batch-check < <list-of-blob-shas>

This output needs a bit of parsing as well.
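
As a rough sketch of that last parsing step: git cat-file --batch-check prints one "<sha> <type> <size>" line per object by default (or "<sha> missing" for unknown ids). The helper name blob_sizes and the file name blob-shas.txt are assumptions made for this example.

```shell
# Read `git cat-file --batch-check` output on stdin and print
# "<size><TAB><sha>" for blobs only, largest first.
blob_sizes() {
  awk '$2 == "blob" { print $3 "\t" $1 }' | sort -rn
}

# Usage (not run here), with one blob id per line in blob-shas.txt:
#   git cat-file --batch-check < blob-shas.txt | blob_sizes
```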



Answer 2:

Relying on git rev-list is not always enough because it

List[s] commits that are reachable by following the parent links from the given commit(s) [..]

(git help rev-list)

Thus, unless you pass --all, it does not list commits that are only on another branch, and in any case it does not list commits that are not reachable from any branch (for example, commits left behind by a rebase or by work on a detached HEAD).

Similarly, git log by default just follows the parent links from the currently checked-out commit. Again, you don't see commits that are only referenced by other branches or that are dangling.

To really get all commits, you can use a command like this:

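for i in $( { find .git/objects/?? -type f 2>/dev/null |
                sed 's@^.*objects/\(..\)/\(.*\)$@\1\2@' ;
              git verify-pack -v .git/objects/pack/*.idx 2>/dev/null |
                grep ' commit ' |
                cut -f1 -d' ' ; } | sort -u )
do
  # Loose objects can also be trees or blobs, so keep only commits
  [ "$(git cat-file -t "$i" 2>/dev/null)" = commit ] || continue
  git log -1 --pretty=format:'%H %P %ai %s%n' "$i"
done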

To keep it simple, the loop body prints one line per commit containing its hash, the parent hash(es), the date, and the subject. Note that to iterate over all commits you need to consider both packed and not-yet-packed (loose) objects.

You can print the referenced blobs (and only the created ones) by calling git diff-tree -r $i from the loop body and grepping for a capital A in the fifth column.
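
As an alternative to scanning .git/objects by hand, newer Git versions can enumerate every object in the repository with git cat-file --batch-all-objects --batch-check. This is a swapped-in technique, not the one used in the loop above, and the sketch below is only a hedged illustration: the helper name only_commits is hypothetical, and it assumes the default "<sha> <type> <size>" output format.

```shell
# Read "<sha> <type> <size>" lines on stdin and print the ids of commits only.
only_commits() {
  awk '$2 == "commit" { print $1 }'
}

# Usage (not run here):
#   git cat-file --batch-all-objects --batch-check | only_commits |
#     while read -r sha; do
#       git log -1 --pretty=format:'%H %P %ai %s%n' "$sha"
#     done
```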



Answer 3:

You can get everything but size out of the box. This one is pretty close:

git log --name-status
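
To narrow that output down to created files only, a small filter along these lines may help. It is a sketch that assumes the default git log layout (a "commit <sha>" header line, then "<status><TAB><path>" lines after the indented message); the function name added_paths is hypothetical.

```shell
# Read `git log --name-status` output on stdin and print
# "<commit-sha><TAB><path>" for every file added (status A) by a commit.
added_paths() {
  awk -F'\t' '
    /^commit / { split($0, f, " "); sha = f[2]; next }   # remember current commit
    $1 == "A"  { print sha "\t" $2 }'
}

# Usage (not run here):
#   git log --name-status | added_paths
```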


Answer 4:

One solution based on tig's answer:

#!/usr/bin/perl
use strict;
use warnings;

# For every commit, sum the sizes of the blobs it introduces
foreach my $rev (`git rev-list --all --pretty=oneline`) {
  my $tot = 0;
  (my $sha = $rev) =~ s/\s.*$//s;        # keep only the commit id
  foreach my $line (`git diff-tree -r -c -M -C --no-commit-id $sha`) {
    my $blob = (split /\s/, $line)[3];   # new blob id (non-merge entries)
    next if $blob eq "0000000000000000000000000000000000000000"; # Deleted
    my $info = `echo $blob | git cat-file --batch-check`;
    my $size = (split /\s/, $info)[2];   # "<sha> <type> <size>"
    $tot += $size if defined $size;      # undefined for "<sha> missing"
  }
  print "$tot $rev" if $tot > 1000000; # Show only if > 1 MB
}

Maybe not the best code, but it should get you most of the way.



Answer 5:

Another useful command when searching for lost commits:

git fsck --lost-found

It will show dangling commits. I needed to use this to find a commit that I had wiped with an ill-timed reset --hard.

But don't take my word for it:

https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html
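
A small filter can pull just the commit ids out of the fsck report. This is a sketch that assumes the report contains lines of the form "dangling commit <sha>"; the function name dangling_commits is hypothetical.

```shell
# Read `git fsck --lost-found` output on stdin and print the ids of
# dangling commits, ignoring dangling blobs, trees, and tags.
dangling_commits() {
  awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Usage (not run here): show a one-line summary of each dangling commit
#   git fsck --lost-found 2>/dev/null | dangling_commits |
#     while read -r sha; do git log -1 --oneline "$sha"; done
```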



Answer 6:

You can also get a list of commits, including dangling ones that are still recorded in the reflog, with:

git log --walk-reflogs | grep -E -o '[0-9a-f]{40}'

Include this line in the settings for a new view in gitk (in the last input field, the command to generate additional commits) and you will get a tree that also shows the 'forgotten history' of the project.



Tags: git blob commit