Git - how to list ALL objects in the database

Posted 2020-01-24 19:59

Question:

Is there a better way of getting a raw list of SHA1s for ALL objects in a repository than doing ls .git/objects/??/\* and cat .git/objects/pack/*.idx | git show-index?

I know about git rev-list --all but that only lists commit objects that are referenced by .git/refs, and I'm looking for everything including unreferenced objects that are created by git-hash-object, git-mktree etc.

Answer 1:

Edit: Aristotle posted an even better answer, which should be marked as correct.

Edit: the script contained a syntax error (a missing backslash at the end of the grep -v line), which is now fixed.

Mark's answer worked for me, after a few modifications:

  • Used --git-dir instead of --show-cdup to support bare repos
  • Avoided error when there are no packs
  • Used perl because OS X Mountain Lion's BSD-style sed doesn't support -r

#!/bin/sh

set -e

cd "$(git rev-parse --git-dir)"

# Find all the objects that are in packs:

find objects/pack -name 'pack-*.idx' | while read -r p ; do
    # Quote the path so index files with unusual names are handled safely
    git show-index < "$p" | cut -f 2 -d ' '
done

# And now find all loose objects:

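# grep -E replaces the deprecated egrep; $1$2 is the idiomatic Perl
# form of the \1\2 backreferences in the replacement text.
find objects/ \
    | grep -E '[0-9a-f]{38}' \
    | grep -v /pack/ \
    | perl -pe 's:^.*([0-9a-f][0-9a-f])/([0-9a-f]{38}):$1$2:' \
;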


Answer 2:

Try

 git rev-list --objects --all

Edit: Josh made a good point:

 git rev-list --objects -g --no-walk --all

lists objects reachable from the reflogs.

To see all objects in unreachable commits as well:

 git rev-list --objects --no-walk \
      $(git fsck --unreachable |
        grep '^unreachable commit' |
        cut -d' ' -f3)

Putting it all together, to really get all objects in the output format of rev-list --objects, you need something like

{
    git rev-list --objects --all
    git rev-list --objects -g --no-walk --all
    git rev-list --objects --no-walk \
        $(git fsck --unreachable |
          grep '^unreachable commit' |
          cut -d' ' -f3)
} | sort | uniq

To sort the output in a slightly more useful way (commits first, then trees and blobs by path), append an additional | sort -k2, which groups all the different blobs (revisions) that share a path.
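
As a rough illustration of what sort -k2 does to rev-list --objects style output, the sketch below feeds it a few made-up lines. The IDs and paths are placeholders, not real objects, and LC_ALL=C is set here only to make the ordering predictable.

```shell
#!/bin/sh
# Placeholder lines in "ID path" form; the line with no path stands in for a commit.
# These IDs and paths are invented for illustration only.
sorted=$(printf '%s\n' \
    'cccc3333 src/main.c' \
    'aaaa1111' \
    'bbbb2222 src/main.c' \
    'dddd4444 README' \
    | LC_ALL=C sort -u \
    | LC_ALL=C sort -k2)

# Lines without a path sort first; lines sharing a path end up adjacent.
echo "$sorted"
```

Lines with an empty second field (commits) come out first, and the two src/main.c entries end up next to each other.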



Answer 3:

I don’t know when this option was introduced, but you can run

git cat-file --batch-check --batch-all-objects

This gives you, according to the man page,

all objects in the repository and any alternate object stores (not just reachable objects)

(emphasis mine).

By default this yields the object type and its size together with each hash, but you can easily remove this information, e.g. with

git cat-file --batch-check --batch-all-objects | cut -d' ' -f1

or by giving a custom format to --batch-check.
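
A minimal sketch of the custom-format variant, assuming a Git version that supports --batch-all-objects. It uses the %(objectname) format atom from the git cat-file documentation and runs in a throwaway repository (the temporary directory and the blob content are only for illustration), so it also shows that an unreferenced blob written with git hash-object is listed.

```shell
#!/bin/sh
set -e

# Scratch repository in a temporary directory (illustrative only).
repo=$(mktemp -d)
git init -q "$repo"
cd "$repo"

# Write a blob that no ref, commit, or tree points to.
blob=$(echo 'dangling content' | git hash-object -w --stdin)

# %(objectname) prints just the object ID, so no cut is needed.
all_ids=$(git cat-file --batch-check='%(objectname)' --batch-all-objects)

echo "$all_ids"
```

In this fresh repository the only object is the unreferenced blob, so the listing should consist of that single ID.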



Answer 4:

This is a more correct, simpler, and faster rendition of the script from the answers by Mark and by willkill.

  • It uses rev-parse --git-path to find the objects directory even in a more complex Git repository setup (e.g. in a multi-worktree situation or whatnot).

  • It avoids all unnecessary use of find, grep, perl, sed.

  • It works gracefully even if you have no loose objects or no packs (or neither… if you’re inclined to run this on a fresh repository).

  • It does, however, require a Bash from this millennium