Is there a better way of getting a raw list of SHA1s for ALL objects in a repository than doing ls .git/objects/??/* and cat .git/objects/pack/*.idx | git show-index?
I know about git rev-list --all, but that only lists commit objects that are referenced by .git/refs, and I'm looking for everything, including unreferenced objects that are created by git-hash-object, git-mktree etc.
Edit: Aristotle posted an even better answer, which should be marked as correct.
Edit: the script contained a syntax error, a missing backslash at the end of the grep -v line.
Mark's answer worked for me, after a few modifications:
- --git-dir instead of --show-cdup, to support bare repos
- perl, because OS X Mountain Lion's BSD-style sed doesn't support -r
I don't know since when this option exists, but you can run git cat-file --batch-check --batch-all-objects.
This gives you, according to the man page, "all objects in the repository and any alternate object stores (not just reachable objects)" (emphasis mine).
By default this yields the object type and its size together with each hash, but you can easily remove this information, e.g. by cutting off the extra fields or by giving a custom format to --batch-check.

The git cat-file --batch-check --batch-all-objects command, suggested in Erki Der Loony's answer, can be made faster with the new Git 2.19 (Q3 2018) option --unordered.
The API to iterate over all objects learned to optionally list objects in the order they appear in packfiles, which helps locality of access if the caller accesses these objects as they are enumerated.
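As a rough sketch of the variants discussed above: the function names are made up for illustration, and piping through cut is only one assumed way of dropping the type and size fields. All three are meant to be run from inside a repository.

```shell
# Print every object in the object store as "<hash> <type> <size>",
# whether or not anything references it.
list_objects_with_info() {
    git cat-file --batch-check --batch-all-objects
}

# Hashes only, by cutting off the type and size fields.
list_object_hashes_cut() {
    git cat-file --batch-check --batch-all-objects | cut -d' ' -f1
}

# Hashes only, via a custom format. --unordered needs Git 2.19 or later
# and emits objects in pack order instead of sorted by hash.
list_object_hashes_unordered() {
    git cat-file --batch-check='%(objectname)' --batch-all-objects --unordered
}
```

If the sorted order matters, pipe the last variant through sort, or drop --unordered.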
See commit 0889aae, commit 79ed0a5, commit 54d2f0d, commit ced9fff (14 Aug 2018), and commit 0750bb5, commit b1adb38, commit aa2f5ef, commit 736eb88, commit 8b36155, commit a7ff6f5, commit 202e7f1 (10 Aug 2018) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit 0c54cda, 20 Aug 2018)
It is even faster in Git 2.20 (Q4 2018) with:
See commit 8c84ae6, commit 8b2f8cb, commit 9249ca2, commit 22a1646, commit bf73282 (04 Oct 2018) by René Scharfe (rscharfe).
(Merged by Junio C Hamano -- gitster -- in commit 82d0a8c, 19 Oct 2018)
And with this patch:
Git 2.21 (Q1 2019) optimizes further the codepath to write out commit-graph, by following the usual pattern of visiting objects in in-pack order.
See commit d7574c9 (19 Jan 2019) by Ævar Arnfjörð Bjarmason (avar).
(Merged by Junio C Hamano -- gitster -- in commit 04d67b6, 05 Feb 2019)
Git 2.23 (Q3 2019) improves "git rev-list --objects", which learned the "--no-object-names" option to squelch the path to the object that is used as a grouping hint for pack-objects.
See commit 42357b4 (19 Jun 2019) by Emily Shaffer (nasamuffin).
(Merged by Junio C Hamano -- gitster -- in commit f4f7e75, 09 Jul 2019)
So that is the difference between the default output, where each tree or blob ID is followed by its path, and the output with --no-object-names, which prints only the object IDs.

Try
Edit: Josh made a good point: also list objects reachable from the reflogs.
To see all objects in unreachable commits as well:
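A rough sketch of these listings, under a few assumptions: the function names are hypothetical, --reflog is used here to pull in reflog entries, and the unreachable commits are taken from the "unreachable commit <hash>" lines that git fsck --unreachable prints. It is not necessarily what the answer originally used.

```shell
# Objects reachable from any ref, as "<hash> [<path>]".
list_reachable_objects() {
    git rev-list --objects --all
}

# Objects reachable from reflog entries (assumption: --reflog).
list_reflog_objects() {
    git rev-list --objects --reflog
}

# Objects belonging to commits that nothing references any more.
# --stdin lets rev-list cope with an empty list of commits.
list_unreachable_commit_objects() {
    git fsck --unreachable 2>/dev/null |
        awk '$1 == "unreachable" && $2 == "commit" { print $3 }' |
        git rev-list --objects --no-walk --stdin
}

# Union of the three listings with duplicates removed.
list_all_known_objects() {
    {
        list_reachable_objects
        list_reflog_objects
        list_unreachable_commit_objects
    } | sort -u
}
```

Note that this only reaches unreferenced blobs and trees that some commit points at; an object written directly with git hash-object and never committed will not show up here.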
Putting it all together, to really get all objects in the output format of rev-list --objects, you need to combine all of the above.
To sort the output in a slightly more useful way (by path for trees/blobs, commits first), use an additional | sort -k2, which will group all different blobs (revisions) for identical paths.

I don't know of an obviously better way than just looking at all the loose object files and the indices of all pack files. The format of the git repository is very stable, and with this method you don't have to rely on having exactly the right options to git fsck, which is classed as porcelain. I think this method is faster, as well. The following script shows all the objects in a repository:
(My original version of this script was based on this useful script to find the largest objects in your pack files, but I switched to using git show-index, as suggested in your question.)
I've made this script into a GitHub gist.
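A minimal sketch of that approach, not the script from the gist: the function name is hypothetical, and it assumes a SHA-1 repository whose objects live under the directory reported by git rev-parse --git-dir (so no alternates, linked worktrees or GIT_OBJECT_DIRECTORY overrides).

```shell
# List every object hash by reading the object store directly.
# The body runs in a subshell so its variables do not leak.
list_objects_from_files() (
    gitdir=$(git rev-parse --git-dir) || exit 1
    {
        # Loose objects: objects/ab/cdef... is the object abcdef...
        for f in "$gitdir"/objects/[0-9a-f][0-9a-f]/*; do
            [ -f "$f" ] || continue
            dir=${f%/*}
            printf '%s%s\n' "${dir##*/}" "${f##*/}"
        done
        # Packed objects: show-index prints "<offset> <hash> (<crc>)".
        for idx in "$gitdir"/objects/pack/*.idx; do
            [ -f "$idx" ] || continue
            git show-index < "$idx" | cut -d' ' -f2
        done
    } | grep -E '^[0-9a-f]{40}$' | sort -u   # drop temp files and duplicates
)
```

Because it uses --git-dir rather than a path relative to the working tree, the same function should also work in a bare repository.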
Another useful option is to use git verify-pack -v <packfile>.
verify-pack -v lists all objects in the given pack along with their object type.
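As an illustration, the sketch below runs verify-pack over every pack index and keeps only the hash and type columns. The function name and the awk filter, which drops the summary lines verify-pack prints at the end, are assumptions. Loose objects are not covered.

```shell
# Print "<hash> <type>" for every object stored in a packfile.
list_packed_objects_with_type() (
    gitdir=$(git rev-parse --git-dir) || exit 1
    for idx in "$gitdir"/objects/pack/pack-*.idx; do
        [ -f "$idx" ] || continue
        git verify-pack -v "$idx"
    done | awk '$2 ~ /^(commit|tree|blob|tag)$/ { print $1, $2 }'
)
```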