Is there a git command that can output for every commit:
- id
- subject
- blobs it created, with their path and size (like git ls-tree -l -r <commit> but only for created blobs)
To get all commits, one line of output per commit:
git rev-list --all --pretty=oneline
Then split each line on the first space (limit of 2) to get the commit id and the message.
To get the blobs created by a commit (recurse into subdirs, show merge commits, detect renames and copies, don't show the commit id on the first line):
git diff-tree -r -c -M -C --no-commit-id <commit-sha>
After a bit of parsing of every line, and excluding some of them, we get the list of new blobs and their paths for the commit.
The last step is to get the blob sizes:
git cat-file --batch-check < <list-of-blob-shas>
Then, once more, a bit of parsing.
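The three steps above can be strung together roughly as follows. This is a minimal sketch, not a polished tool. The function names added_blobs and report_commit are made up for illustration. It drops -c, -M and -C so the raw output stays in its simple single-parent form (merge commits then print no blobs), it adds --root so the initial commit is reported too, and it assumes paths that git does not need to quote (no tabs, newlines or similar).

```shell
# Filter `git diff-tree -r` raw output (stdin) down to added blobs: "<sha> <path>".
# Raw lines look like ":<oldmode> <newmode> <oldsha> <newsha> <status>\t<path>".
added_blobs() {
    awk -F'\t' '/^:[0-9]/ {
        split($1, meta, " ")
        # Status A means "added"; mode 160000 is a submodule link, not a blob.
        if (meta[5] == "A" && meta[2] != "160000") print meta[4], $2
    }'
}

# Print "<id> <subject>", then one "<size> <path>" line per blob the commit adds.
report_commit() {
    git log -1 --format='%H %s' "$1"
    git diff-tree -r --root --no-commit-id "$1" | added_blobs |
    while read -r sha path; do
        printf '  %s %s\n' "$(git cat-file -s "$sha")" "$path"
    done
}

# Usage (inside a repository):
#   git rev-list --all | while read -r c; do report_commit "$c"; done
```

Calling git cat-file -s once per blob keeps the sketch short; for large histories a single git cat-file --batch-check process would likely be faster.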
Relying on git rev-list is not always enough because it "List[s] commits that are reachable by following the parent links from the given commit(s) [..]" (git help rev-list).
Thus, unless you pass --all, it does not list commits that are only on other branches, and in any case it does not list commits that are not reachable from any branch (perhaps created by some rebase and/or detached-HEAD actions).
Similarly, git log just follows the parent links from the currently checked-out commit. Again, you don't see commits referenced only by other branches or commits in a dangling state.
You can really get all commits with a command like this:
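# Collect ids of loose and packed objects, keep only commits, de-duplicate.
for i in `(find .git/objects -type f |
    sed -n 's@^.*objects/\([0-9a-f]\{2\}\)/\([0-9a-f]\{38,\}\)$@\1\2@p' |
    git cat-file --batch-check |
    grep ' commit ' ;
  git verify-pack -v .git/objects/pack/*.idx |
    grep ' commit ' ) | cut -f1 -d' ' | sort -u`
do
  # One line per commit: hash, parent hash(es), author date, subject.
  git log -1 --pretty=format:'%H %P %ai %s%n' $i
done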
To keep it simple, the loop body prints for each commit one line containing its hash, the parent hash(es), date and subject. Note, to iterate over all commits you need to consider packed and not-yet packed objects.
You can print the referenced blobs (only the created ones) by calling git diff-tree -r $i from the loop body and grepping for a capital A in the fifth column.
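On a reasonably recent Git, the walk over loose and packed objects can probably be left to git cat-file itself. The following is only a sketch under that assumption: it relies on the --batch-all-objects option being available in your Git version, and the helper name commit_ids is hypothetical.

```shell
# Keep only commit ids from `git cat-file --batch-check` output,
# whose lines have the form "<sha> <type> <size>".
commit_ids() {
    awk '$2 == "commit" { print $1 }'
}

# Usage (inside a repository); --batch-all-objects covers loose and packed objects:
#   git cat-file --batch-all-objects --batch-check | commit_ids |
#   while read -r c; do git log -1 --format='%H %P %ai %s' "$c"; done
```

This avoids depending on the layout of the .git/objects directory, at the cost of requiring a newer Git.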
You can get everything but size out of the box. This one is pretty close:
git log --name-status
One solution based on tig's answer:
#!/usr/bin/perl
use strict;
use warnings;

# For every commit, sum the sizes of the new-side blobs in its diff.
foreach my $rev (`git rev-list --all --pretty=oneline`) {
    my $tot = 0;
    (my $sha = $rev) =~ s/\s.*$//s;    # keep only the commit id
    foreach my $line (`git diff-tree -r -c -M -C --no-commit-id $sha`) {
        my $blob = (split /\s/, $line)[3];
        # String comparison: a numeric == would treat most hashes as 0.
        next if $blob eq "0000000000000000000000000000000000000000"; # Deleted
        my $size = `echo $blob | git cat-file --batch-check`;
        $size = (split /\s/, $size)[2];
        $tot += int($size // 0);
    }
    print "$tot $rev" if $tot > 1000000; # Show only if > 1 MB
}
Maybe not the best code, but should get you most of the way.
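Starting one git cat-file process per blob gets slow on large histories. A possible variation, sketched here rather than benchmarked, feeds all blob ids of a commit through a single --batch-check process and sums the sizes with awk. The helper name sum_blob_sizes is hypothetical.

```shell
# Sum the sizes reported by `git cat-file --batch-check`.
# Input lines look like "<sha> <type> <size>"; non-blob lines
# (for example "<sha> missing" for the all-zero id) are ignored.
sum_blob_sizes() {
    awk '$2 == "blob" { total += $3 } END { print total + 0 }'
}

# Usage (inside a repository), total size of the new-side blobs of one commit:
#   git diff-tree -r --no-commit-id <commit-sha> | awk '{ print $4 }' |
#   git cat-file --batch-check | sum_blob_sizes
```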
Another useful command when searching for lost commits:
git fsck --lost-found
It will show dangling commits. I needed to use this to find a commit I wiped with an ill-timed reset --hard.
But don't take my word for it:
https://www.kernel.org/pub/software/scm/git/docs/git-fsck.html
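To turn the fsck output into something readable, the dangling commit ids can be filtered out and passed to git log. This is a small sketch that assumes the usual "dangling commit <sha>" line format; the helper name dangling_commits is made up for illustration.

```shell
# Keep only the ids of dangling commits from `git fsck` output
# (lines of the form "dangling commit <sha>").
dangling_commits() {
    awk '$1 == "dangling" && $2 == "commit" { print $3 }'
}

# Usage (inside a repository):
#   git fsck --lost-found 2>/dev/null | dangling_commits |
#   while read -r c; do git log -1 --format='%H %ai %s' "$c"; done
```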
You can also get a list of all commits recorded in the reflogs (including dangling ones that no branch points to any more) with:
git log --walk-reflogs | grep -E -o '[0-9a-f]{40}'
Include this line in the settings for a new view in gitk (in the last input field, the command to generate additional commits) and you will get a tree that also shows the 'forgotten history' of the project.