Is there a better way of getting a raw list of SHA1s for ALL objects in a repository than doing ls .git/objects/??/* and cat .git/objects/pack/*.idx | git show-index?
I know about git rev-list --all, but that only lists commit objects that are referenced by .git/refs, and I'm looking for everything, including unreferenced objects that are created by git-hash-object, git-mktree, etc.
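To make the gap concrete, here is a minimal sketch that writes a blob with git hash-object -w in a throwaway repository and checks that git rev-list --all --objects does not report it. The function name demo_unreferenced_blob and the temporary repository are assumptions made purely for illustration.

```shell
#!/bin/sh
# Hypothetical helper: prints the ID of a blob that exists in the object
# store but is not reported by `git rev-list --all --objects`.
demo_unreferenced_blob() {
    repo=$(mktemp -d) || return 1
    git init -q "$repo" || return 1
    (
        cd "$repo" || exit 1
        # Write a blob straight into the object store; no ref points at it.
        blob=$(printf 'orphan\n' | git hash-object -w --stdin)
        # rev-list starts from refs, so the blob should not show up here.
        if git rev-list --all --objects 2>/dev/null | grep -q "$blob"; then
            echo "unexpected: $blob is reachable" >&2
            exit 1
        fi
        # The object is nevertheless present in the repository.
        git cat-file -e "$blob" && echo "$blob"
    )
    status=$?
    rm -rf "$repo"
    return $status
}
```

Calling demo_unreferenced_blob prints the ID of the orphaned blob, which is exactly the kind of object the question is about.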
Edit: Aristotle posted an even better answer, which should be marked as correct.
Edit: the script contained a syntax error: a missing backslash at the end of the grep -v line.
Mark's answer worked for me, after a few modifications:
- Used --git-dir instead of --show-cdup to support bare repos
- Avoided error when there are no packs
- Used perl because OS X Mountain Lion's BSD-style sed doesn't support -r
#!/bin/sh
set -e
cd "$(git rev-parse --git-dir)"
# Find all the objects that are in packs:
find objects/pack -name 'pack-*.idx' | while IFS= read -r p ; do
    # show-index prints "<offset> <sha1> (<crc>)"; keep only the SHA1.
    git show-index < "$p" | cut -f 2 -d ' '
done
# And now find all loose objects (objects/ab/<38 hex digits>):
find objects/ \
    | grep -E '[0-9a-f]{38}' \
    | grep -v /pack/ \
    | perl -pe 's:^.*([0-9a-f][0-9a-f])/([0-9a-f]{38}):$1$2:' \
;
Try
git rev-list --objects --all
Edit: Josh made a good point:
git rev-list --objects -g --no-walk --all
lists objects reachable from the reflogs.
To see all objects in unreachable commits as well:
git rev-list --objects --no-walk \
$(git fsck --unreachable |
grep '^unreachable commit' |
cut -d' ' -f3)
Putting it all together, to really get all objects in the output format of rev-list --objects, you need something like
{
git rev-list --objects --all
git rev-list --objects -g --no-walk --all
git rev-list --objects --no-walk \
$(git fsck --unreachable |
grep '^unreachable commit' |
cut -d' ' -f3)
} | sort | uniq
To sort the output in a slightly more useful way (by path for trees/blobs, commits first), use an additional | sort -k2, which will group all different blobs (revisions) for identical paths.
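As a rough sketch of the whole pipeline with that extra sort step, the commands can be wrapped in a function. The name all_objects_by_path is invented here, and the guard around the last rev-list call is an addition of this sketch: it assumes rev-list should simply be skipped when git fsck reports no unreachable commits, since an empty argument list would otherwise make it fail.

```shell
#!/bin/sh
# Hypothetical wrapper around the combined listing; run it inside a repository.
all_objects_by_path() {
    {
        git rev-list --objects --all
        git rev-list --objects -g --no-walk --all
        # Collect the IDs of unreachable commits; the list may be empty.
        unreachable=$(git fsck --unreachable |
            grep '^unreachable commit' |
            cut -d' ' -f3)
        # Only call rev-list when there is something to pass. The variable
        # is left unquoted on purpose so each ID becomes its own argument.
        if [ -n "$unreachable" ]; then
            git rev-list --objects --no-walk $unreachable
        fi
    } | sort -u | sort -k2
}
```

Lines without a path (commits and root trees) have an empty second field and therefore sort ahead of the path-bearing entries.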
I don’t know since when this option has existed, but you can run
git cat-file --batch-check --batch-all-objects
This gives you, according to the man page,
all objects in the repository and any alternate object stores (not just reachable objects)
(emphasis mine).
By default this yields the object type and its size together with each hash, but you can easily remove this information, e.g. with
git cat-file --batch-check --batch-all-objects | cut -d' ' -f1
or by giving a custom format to --batch-check.
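The custom-format route might look like the sketch below. %(objectname) is one of the format atoms documented for --batch-check; the wrapper name list_all_object_ids and its optional path argument are only illustrative.

```shell
#!/bin/sh
# Hypothetical wrapper: print one object ID per line for every object in the
# repository at "$1" (default: current directory), reachable or not.
list_all_object_ids() {
    git -C "${1:-.}" cat-file --batch-check='%(objectname)' --batch-all-objects
}
```

For example, list_all_object_ids /path/to/repo | wc -l would count every object Git knows about in that repository.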
This is a more correct, simpler, and faster rendition of the script from the answers by Mark and by willkill.
It uses rev-parse --git-path to find the objects directory even in a more complex Git repository setup (e.g. in a multi-worktree situation or whatnot).
It avoids all unnecessary use of find, grep, perl, sed.
It works gracefully even if you have no loose objects or no packs (or neither… if you’re inclined to run this on a fresh repository).
It does, however, require a Bash from this millennium