Git - get all commits and blobs they created

Posted 2019-01-17 01:44

Is there a git command that can output for every commit:

  1. id
  2. subject
  3. the blobs it created, with their paths and sizes (like git ls-tree -l -r <commit>, but only for created blobs)

Tags: git blob commit
6 answers
forever°为你锁心
#2 · 2019-01-17 01:51

One solution based on tig's answer:

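#!/usr/bin/perl
use strict;
use warnings;

# Print every commit that introduces more than 1 MB of new blob data.
foreach my $rev (`git rev-list --all --pretty=oneline`) {
  my $tot = 0;
  (my $sha = $rev) =~ s/\s.*$//s;
  foreach my $line (`git diff-tree -r -c -M -C --root --no-commit-id $sha`) {
    # Metadata precedes the tab; the resulting blob id is the field just before the status.
    my @meta = split /\s+/, (split /\t/, $line)[0];
    my $blob = $meta[-2];
    next if $blob eq "0000000000000000000000000000000000000000"; # Deleted (string compare, not ==)
    # Output is "<sha> <type> <size>", or "<sha> missing" (e.g. for submodule entries).
    my $size = `echo $blob | git cat-file --batch-check`;
    $size = (split /\s+/, $size)[2];
    next unless defined $size;
    $tot += int($size);
  }
  print "$tot $rev" if $tot > 1000000; # Show only if > 1 MB (1,000,000 bytes)
}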

Maybe not the best code, but should get you most of the way.

爱情/是我丢掉的垃圾
#3 · 2019-01-17 01:52

Another useful command when searching for commits:

git fsck --lost-found

It will show dangling commits. I needed this to find a commit that I had wiped out with an ill-timed reset --hard.

But don't take my word for it:

https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html

Viruses.
#4 · 2019-01-17 01:58

To get all commits, with one line of output per commit:

git rev-list --all --pretty=oneline

Then split each line on the first space (a split with a limit of 2) to get the commit id and the commit message.
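The split with a limit of 2 can be sketched in shell as follows. This is only an illustration: the helper name split_oneline is made up, and it assumes the "<sha> <subject>" lines that --pretty=oneline prints.

```shell
# Hypothetical helper: reads "<sha> <subject>" lines on stdin and prints
# "<sha><TAB><subject>". `read` puts the first word into sha and the whole
# remainder of the line into subject, which is a split with a limit of 2.
split_oneline() {
  while read -r sha subject; do
    printf '%s\t%s\n' "$sha" "$subject"
  done
}

# Usage inside a repository:
#   git rev-list --all --pretty=oneline | split_oneline
```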

To get the blobs created by a commit (recurse into subdirectories, show merge commits, detect renames and copies, don't show the commit id on the first line):

git diff-tree -r -c -M -C --no-commit-id <commit-sha>

Parse each line, exclude the entries that are not additions, and you have the list of new blobs and their paths for that commit.
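That parsing step might look roughly like the sketch below. It assumes the plain raw format that git diff-tree -r prints for an ordinary commit (":<old-mode> <new-mode> <old-sha> <new-sha> <status>", then a tab, then the path); combined merge output from -c has more columns and is skipped here. The helper name added_blobs is hypothetical.

```shell
# Hypothetical helper: reads raw `git diff-tree -r` lines on stdin and prints
# "<blob-sha><TAB><path>" for every entry whose status is A (added).
added_blobs() {
  awk -F'\t' '{
    # Text before the first tab: ":<old-mode> <new-mode> <old-sha> <new-sha> <status>"
    n = split($1, meta, " ")
    if (n == 5 && meta[5] == "A") print meta[4] "\t" $2
  }'
}

# Usage inside a repository (commit id passed as the first script argument):
#   git diff-tree -r --root --no-commit-id "$1" | added_blobs
```

Splitting on the tab first keeps paths that contain spaces intact. The --root flag in the usage line is an addition so that the very first commit is compared against an empty tree.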

The last step is to get the blob sizes:

git cat-file --batch-check < <list-of-blob-shas>

This output again needs a bit of parsing.
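For the size step, git cat-file --batch-check prints one "<sha> <type> <size>" line per object name it reads, or "<sha> missing" when the object is not in the repository. A minimal sketch of the remaining parsing, with a made-up helper name (blob_sizes) and an assumed input file name in the usage line:

```shell
# Hypothetical helper: reads `git cat-file --batch-check` output on stdin and
# prints "<size> <sha>" for blobs only, skipping "missing" and non-blob lines.
blob_sizes() {
  awk 'NF == 3 && $2 == "blob" { print $3, $1 }'
}

# Usage inside a repository, with one blob id per line in blob-shas.txt
# (an assumed file name), largest blobs first:
#   git cat-file --batch-check < blob-shas.txt | blob_sizes | sort -rn
```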

爷的心禁止访问
#5 · 2019-01-17 01:59

You can also get a list of all commits (including dangling ones that are still recorded in the reflog) with:

git log --walk-reflogs | grep -E -o '[0-9a-f]{40}'

Include this line in the settings for a new view in gitk (in the last input field, the command to generate additional commits) and you will get a tree that also shows the 'forgotten history' of the project.

看我几分像从前
#6 · 2019-01-17 02:00

Relying on git rev-list is not always enough because it

List[s] commits that are reachable by following the parent links from the given commit(s) [..]

(git help rev-list)

Thus, unless you pass --all, it does not list commits that exist only on other branches, and even with --all it does not list commits that are unreachable from every ref (for example, leftovers from a rebase or from work on a detached HEAD).

Similarly, git log just follows the parent links from the currently checked-out commit. Again, you don't see commits that are referenced only by other branches or that are dangling.

You can really get all commits with a command like this:

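for i in `(find .git/objects -type f -path '.git/objects/[0-9a-f][0-9a-f]/*' |
             sed 's@^.*objects/\(..\)/\(.\+\)$@\1\2@' ;
           git verify-pack -v .git/objects/pack/*.idx  |
             grep commit |
             cut -f1 -d' '; ) | sort -u`
  do
  # Loose objects can also be blobs or trees; print only the commits.
  [ "$(git cat-file -t $i 2>/dev/null)" = commit ] &&
    git log -1 --pretty=format:'%H %P %ai %s%n'  $i
done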

To keep it simple, the loop body prints one line per commit containing its hash, the parent hash(es), the date and the subject. Note that to iterate over all commits you need to consider both packed and not-yet-packed (loose) objects.

You can print the referenced blobs (and only the created ones) by calling git diff-tree $i from the loop body and grepping for a capital A in the fifth column.

爷、活的狠高调
#7 · 2019-01-17 02:17

You can get everything but size out of the box. This one is pretty close:

git log --name-status