I'm working on a git course and wanted to mention that lost refs are not really lost until running git gc. But verifying this, I found out that this is not the case. Even after running git gc --prune=all --aggressive the lost refs are still there.
Clearly I misunderstood something. And before saying something incorrect in the course, I want to get my facts straight! Here is an example script that illustrates the effect:
#!/bin/bash
git init
# add 10 dummy commits
for i in {1..10}; do
    date > foo.txt
    git add foo.txt
    git commit -m "bump" foo.txt
    sleep 1   # make sure the next `date` output differs
done
CURRENT=$(git rev-parse HEAD)
echo "HEAD before reset: ${CURRENT}"
# rewind
git reset --hard HEAD~5
# add another 10 commits
for i in {1..10}; do
    date > foo.txt
    git add foo.txt
    git commit -m "bump" foo.txt
    sleep 1
done
This script will add 10 dummy commits, reset to 5 commits in the past and add another 10 commits. Just before resetting, it will print the hash of its current HEAD.
I would expect to lose the object in CURRENT after running git gc --prune=all. Yet, I can still run git show on that hash.
I do understand that after running git reset and adding new commits, I have essentially created a new branch. But my original branch no longer has any reference, so it does not show up in git log --all. It also would not be pushed to any remote, I suppose.
My understanding of git gc was that it removes those objects. This does not seem to be the case.
Why? And when exactly does git gc remove objects?
For an object to be pruned, it must meet two criteria. One is date/time related: it must have been created[1] long enough ago to be ripe for collection. The "long enough ago" part is what you are setting with --prune=all: you're overriding the normal "at least two weeks old" setting.
The second criterion is where your experiment is going wrong. To be pruned, the object must also be unreachable. As twalberg noted in a comment, each of your ostensibly-abandoned commits (and hence their corresponding trees and blobs) is actually referenced, through Git's "reflog" entries.
There are two reflog entries for each such commit: one for HEAD, and one for the branch name to which HEAD itself referred at the time the commit was made (in this case, the reflog for refs/heads/master, i.e., branch master). Each reflog entry has its own time-stamp, and git gc also expires reflog entries for you, although with a more complex set of rules than the simple "14 days" default for object expiry.[2]
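Hence, git gc could first delete all reflog entries that are keeping the old object around, then prune the object. It just is not happening here, because the reflog entries in your test repository are only seconds old and so are nowhere near their own expiry time.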
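To see both criteria at work, here is a minimal sketch of the experiment, assuming a throwaway repository created with mktemp. The variable names (repo, old, before, after) and the demo identity are mine, not from the question. It uses --prune=now and --expire=now, which behave like the "all" spelling for this purpose.

```shell
#!/bin/bash
# Sketch: the reflog, not the object's age, is what keeps the abandoned commit alive.
repo=$(mktemp -d)          # throwaway repository (assumption: mktemp is available)
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

# three commits with distinct content, so no sleep is needed
for i in 1 2 3; do
    echo "$i" > foo.txt
    git add foo.txt
    git commit -q -m "bump $i"
done
old=$(git rev-parse HEAD)  # the commit we are about to abandon
git reset -q --hard HEAD~2

# First attempt: the age criterion is overridden, but the reflog still refers to $old
git gc --quiet --prune=now
if git cat-file -e "$old" 2>/dev/null; then before=present; else before=pruned; fi

# Second attempt: drop every reflog entry first, then collect
git reflog expire --expire=now --all
git gc --quiet --prune=now
if git cat-file -e "$old" 2>/dev/null; then after=present; else after=pruned; fi

echo "with reflog entries: $before; after expiring them: $after"
```

The git cat-file -e check only tests whether the object exists, which is a quieter way of asking the same question as git show in the original experiment.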
To view, or even delete, reflog entries manually, use git reflog. Note that git reflog displays entries by running git log with the -g / --walk-reflogs option (plus some additional display formatting options). You can run git reflog expire --expire=all --all to clear everything out, though this is a bludgeon when a scalpel may be more appropriate. Use --expire-unreachable for a bit more selectivity. For more about this, see the git log documentation and of course the git reflog documentation.
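As a rough illustration of the scalpel, the sketch below expires only the reflog entries whose commits are no longer reachable from the current tip, and leaves the rest of the reflog alone. The repository layout and variable names (repo, old, entries_before, entries_after, status) are assumptions made up for the demo.

```shell
#!/bin/bash
# Sketch: expire only the unreachable reflog entries, then collect garbage.
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for i in 1 2 3; do
    echo "$i" > foo.txt
    git add foo.txt
    git commit -q -m "bump $i"
done
old=$(git rev-parse HEAD)
git reset -q --hard HEAD~2  # abandons the last two commits

entries_before=$(git reflog show HEAD | wc -l | tr -d ' ')

# Only entries pointing at commits unreachable from the current tip are dropped;
# entries for reachable commits keep their normal expiry time.
git reflog expire --expire-unreachable=now --all

entries_after=$(git reflog show HEAD | wc -l | tr -d ' ')

git gc --quiet --prune=now
if git cat-file -e "$old" 2>/dev/null; then status=present; else status=pruned; fi

echo "HEAD reflog entries: $entries_before before, $entries_after after; old tip is $status"
```

The history you can still reach from the branch keeps its reflog trail, which is usually what you want in a repository you care about.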
[1] Some Unix-y file systems do not store file creation ("birth") time at all: the st_ctime field of a stat structure is the inode change time, not the creation time. If there is a creation time, it is in st_birthtime or st_birthtimespec.[3] However, every Git object is read-only, so the file's creation time is also its modification time. Hence st_mtime, which is always available, gives the creation time for the object.
[2] The exact rules are described in the git gc documentation, but I think "by default, 30 days for unreachable commits and 90 days for reachable commits" is a decent summary. The definition of reachable here is unusual, though: it means reachable from the current value of the reference for which this reflog holds old values. That is, if we're looking at the reflog for master, we find the commit that master identifies (e.g., 1234567), then see if each reflog entry for master (e.g., master@{27}) is reachable from that particular commit (1234567 again).
[3] This particular name confusion is brought to you by the POSIX standardization folks. :-) The st_birthtimespec field is a struct timespec, which records both seconds and nanoseconds.
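The reachability test from the footnote on reflog expiry can be approximated by hand. This is only a sketch of the idea, not what git gc runs internally: it walks the branch's reflog with git log -g and asks git merge-base --is-ancestor whether each recorded commit is still reachable from the branch's current tip. The demo repository and the counters (reachable, unreachable) are assumptions for illustration.

```shell
#!/bin/bash
# Sketch: classify each reflog entry of the current branch as reachable or not
# from that branch's current tip.
repo=$(mktemp -d)          # throwaway repository
cd "$repo"
git init -q
git config user.name "Demo"
git config user.email "demo@example.com"

for i in 1 2 3; do
    echo "$i" > foo.txt
    git add foo.txt
    git commit -q -m "bump $i"
done
git reset -q --hard HEAD~2
branch=$(git symbolic-ref --short HEAD)   # master or main, depending on configuration

reachable=0
unreachable=0
# git log -g prints one line per reflog entry, newest first
for oid in $(git log -g --format=%H "$branch"); do
    if git merge-base --is-ancestor "$oid" "$branch"; then
        reachable=$((reachable + 1))      # would get the longer expiry (90 days by default)
    else
        unreachable=$((unreachable + 1))  # would get the shorter expiry (30 days by default)
    fi
done

echo "$branch reflog: $reachable reachable, $unreachable unreachable entries"
```

Here the first commit shows up twice in the branch reflog (once when it was made, once as the target of the reset), and both of those entries count as reachable; the two abandoned commits do not.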