Question:
Is there any difference between `git gc` and `git repack -ad; git prune`?
If yes, what additional steps will be done by `git gc` (or vice versa)?
Which one is better to use in regard to space optimization or safety?
Answer 1:
Is there any difference between `git gc` and `git repack -ad; git prune`?
The difference is that `git gc --auto` is very conservative about which housekeeping tasks are needed. For example, it won't run `git repack` unless the number of loose objects in the repository is above a certain threshold (configurable via the `gc.auto` variable). Also, `git gc` runs more tasks than just `git repack` and `git prune`.
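To make the loose-object threshold concrete, here is a minimal Python sketch; the function names are hypothetical and not part of Git. `count_loose_objects` counts loose objects exactly by scanning the two-hex-digit fan-out directories under `.git/objects`. `too_many_loose_objects` mimics the cheaper estimate used in Git's `gc.c`: it looks only at `objects/17` and compares that count with `gc.auto / 256` rounded up, relying on object hashes being evenly distributed. It assumes SHA-1 (40 hex digits) and Git's documented `gc.auto` default of 6700.

```python
import os

DEFAULT_GC_AUTO = 6700  # Git's documented default for gc.auto
HEX_DIGITS = frozenset("0123456789abcdef")


def is_loose_object_name(name: str, hexsz: int = 40) -> bool:
    """A loose object is stored as objects/XX/<remaining hexsz - 2 hex digits>."""
    return len(name) == hexsz - 2 and set(name) <= HEX_DIGITS


def count_loose_objects(objects_dir: str) -> int:
    """Exact (slow) count over every two-hex-digit fan-out directory."""
    total = 0
    for entry in os.listdir(objects_dir):
        fanout = os.path.join(objects_dir, entry)
        if len(entry) == 2 and set(entry) <= HEX_DIGITS and os.path.isdir(fanout):
            total += sum(1 for name in os.listdir(fanout) if is_loose_object_name(name))
    return total


def too_many_loose_objects(objects_dir: str, gc_auto: int = DEFAULT_GC_AUTO) -> bool:
    """Estimate from objects/17 only, with the threshold scaled down by 256."""
    sample = os.path.join(objects_dir, "17")
    if not os.path.isdir(sample):
        return False
    scaled_limit = -(-gc_auto // 256)  # ceiling division
    found = sum(1 for name in os.listdir(sample) if is_loose_object_name(name))
    return found > scaled_limit
```

Sampling one of the 256 fan-out directories keeps the check cheap enough to run after everyday commands, at the cost of being approximate.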
If yes, what additional steps will be done by `git gc` (or vice versa)?
According to the documentation, `git gc` runs:

- `git-prune`
- `git-reflog`
- `git-repack`
- `git-rerere`
More specifically, by looking at the source code of `gc.c` (lines 338-343)[1], we can see that it invokes at most the following commands:
- `pack-refs --all --prune`
- `reflog expire --all`
- `repack -d -l`
- `prune --expire`
- `worktree prune --expire`
- `rerere gc`
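As a rough illustration, the sequence above can be written down as a Python function that returns each subcommand's arguments in order, switching to `repack -A -d -l` when there are too many packs. This is a hypothetical sketch, not how `gc.c` builds its argument arrays. The expiry dates that the list leaves out are taken as parameters rather than guessed; in Git they come from `gc.pruneExpire` and `gc.worktreePruneExpire`.

```python
from typing import List


def gc_plan(prune_expire: str, worktree_expire: str, too_many_packs: bool) -> List[List[str]]:
    """Return, in order, the git subcommands listed above (a plan only; nothing is run)."""
    repack = ["repack", "-d", "-l"]
    if too_many_packs:
        repack.insert(1, "-A")  # also consolidate the existing packs
    return [
        ["pack-refs", "--all", "--prune"],
        ["reflog", "expire", "--all"],
        repack,
        ["prune", "--expire", prune_expire],
        ["worktree", "prune", "--expire", worktree_expire],
        ["rerere", "gc"],
    ]
```

Keeping the plan as data makes it easy to compare with `git repack -ad; git prune`, which covers only the third and fourth steps.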
Depending on the number of packs (lines 121-126), it may run `repack` with the `-A` option instead (lines 203-212):
```c
	/*
	 * If there are too many loose objects, but not too many
	 * packs, we run "repack -d -l".  If there are too many packs,
	 * we run "repack -A -d -l".  Otherwise we tell the caller
	 * there is no need.
	 */
	if (too_many_packs())
		add_repack_all_option();
	else if (!too_many_loose_objects())
		return 0;
```
Notice on lines 211-212 of the `need_to_gc` function that if there aren't enough loose objects in the repository (and not too many packs), `git gc --auto` does no work at all.
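The decision in the excerpt above, together with the `gc.auto` and `gc.autoPackLimit` settings, can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the function name is hypothetical, it takes exact counts as input (Git only estimates the loose-object count), and it uses Git's documented defaults of 6700 for `gc.auto` and 50 for `gc.autoPackLimit`. A return value of `None` means `git gc --auto` would exit without repacking.

```python
from typing import List, Optional


def auto_gc_repack_args(num_loose: int, num_packs: int,
                        gc_auto: int = 6700,
                        auto_pack_limit: int = 50) -> Optional[List[str]]:
    """Return the repack arguments `git gc --auto` would use, or None for "no work"."""
    if gc_auto <= 0:
        return None  # gc.auto = 0 disables automatic housekeeping entirely
    too_many_packs = auto_pack_limit > 0 and num_packs > auto_pack_limit
    too_many_loose = num_loose > gc_auto
    if too_many_packs:
        return ["repack", "-A", "-d", "-l"]  # consolidate existing packs too
    if not too_many_loose:
        return None  # neither threshold exceeded: nothing to do
    return ["repack", "-d", "-l"]  # pack the loose objects only
```

For example, `auto_gc_repack_args(7000, 3)` yields `["repack", "-d", "-l"]`, while `auto_gc_repack_args(100, 3)` yields `None`.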
This is further clarified in the documentation:
Housekeeping is required if there are too many loose objects or too many packs in the repository. If the number of loose objects exceeds the value of the `gc.auto` configuration variable, then all loose objects are combined into a single pack using `git repack -d -l`. Setting the value of `gc.auto` to 0 disables automatic packing of loose objects.

If the number of packs exceeds the value of `gc.autoPackLimit`, then existing packs (except those marked with a `.keep` file) are consolidated into a single pack by using the `-A` option of `git repack`.
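The `.keep` exception in that passage can be sketched as follows: only `pack-*.pack` files without a matching `pack-*.keep` file count toward `gc.autoPackLimit`. This Python helper is hypothetical and works on a plain directory listing (normally `.git/objects/pack`); it assumes the documented default limit of 50 and that a limit of 0 disables the check.

```python
import os


def count_consolidatable_packs(pack_dir: str) -> int:
    """Count pack-*.pack files that are not protected by a matching .keep file."""
    count = 0
    for name in os.listdir(pack_dir):
        if name.startswith("pack-") and name.endswith(".pack"):
            keep_file = os.path.join(pack_dir, name[: -len(".pack")] + ".keep")
            if not os.path.exists(keep_file):
                count += 1
    return count


def too_many_packs(pack_dir: str, auto_pack_limit: int = 50) -> bool:
    """True when the unprotected packs exceed gc.autoPackLimit."""
    if auto_pack_limit <= 0:
        return False  # gc.autoPackLimit = 0 disables pack consolidation
    return count_consolidatable_packs(pack_dir) > auto_pack_limit
```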
As you can see, `git gc` strives to do the right thing based on the state of the repository.
Which one is better to use in regard to space optimization or safety?
In general, it's better to run `git gc --auto`, simply because it does the least amount of work necessary to keep the repository in good shape, safely and without wasting too many resources.
However, keep in mind that garbage collection may already be triggered automatically after certain commands, unless this behavior is disabled by setting the `gc.auto` configuration variable to 0.
From the documentation:
`--auto`
With this option, `git gc` checks whether any housekeeping is required; if not, it exits without performing any work. Some git commands run `git gc --auto` after performing operations that could create many loose objects.
So for most repositories you shouldn't need to explicitly run `git gc` all that often, since it will already be taken care of for you.
[1] As of commit a0a1831, made on 2016-08-08.
Answer 2:
`git help gc` contains a few hints:

The optional configuration variable `gc.rerereresolved` indicates how long records of conflicted merge you resolved earlier are kept.
The optional configuration variable `gc.rerereunresolved` indicates how long records of conflicted merge you have not resolved are kept.

I believe those are not done if you only run `git repack -ad; git prune`.
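To illustrate what those two settings control, here is a simplified Python model of the expiry rule applied by `git rerere gc`: a recorded resolution is dropped once it is older than `gc.rerereresolved` days, and an unresolved conflict record once it is older than `gc.rerereunresolved` days. The defaults of 60 and 15 days are Git's documented ones; the function and its signature are hypothetical, and Git's real bookkeeping (file timestamps under `.git/rr-cache`) is abstracted into a single `recorded_at` value.

```python
from datetime import datetime, timedelta

RERERE_RESOLVED_DAYS = 60    # documented default of gc.rerereresolved
RERERE_UNRESOLVED_DAYS = 15  # documented default of gc.rerereunresolved


def rerere_record_expired(recorded_at: datetime, resolved: bool, now: datetime,
                          resolved_days: int = RERERE_RESOLVED_DAYS,
                          unresolved_days: int = RERERE_UNRESOLVED_DAYS) -> bool:
    """True if `rerere gc` would drop this record under this simplified model."""
    max_age = timedelta(days=resolved_days if resolved else unresolved_days)
    return now - recorded_at > max_age
```

A 20-day-old record is therefore dropped if the conflict was never resolved, but kept if it was.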
Answer 3:
Note that `git prune`, which is run by `git gc`, has evolved with Git 2.22 (Q2 2019):
"git prune" has been taught to take advantage of reachability bitmap when able.
See commit cc80c95, commit c2bf473, commit fde67d6, commit d55a30b (14 Feb 2019) by Jeff King (peff).
(Merged by Junio C Hamano -- gitster -- in commit f7213a3, 07 Mar 2019)
prune: use bitmaps for reachability traversal

Pruning generally has to traverse the whole commit graph in order to see which objects are reachable.
This is the exact problem that reachability bitmaps were meant to solve, so let's use them (if they're available, of course).
See reachability bitmap here.
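The idea behind the commit can be shown with a toy Python model. The names and data layout are hypothetical and far simpler than Git's on-disk bitmaps, which are EWAH-compressed and stored only for selected commits. Each object is a bit position, and a bitmap for a ref tip is computed once ahead of time (for example at repack time) by walking the graph. At prune time, the reachable set is then just the bitwise OR of the tips' bitmaps, and every object whose bit is clear is a prune candidate, with no graph traversal needed.

```python
from collections import deque
from typing import Dict, List, Set

Graph = Dict[int, List[int]]  # object id -> ids of the objects it points to


def reachable_by_walk(graph: Graph, tips: List[int]) -> Set[int]:
    """Classic approach: breadth-first traversal of everything reachable from the tips."""
    seen: Set[int] = set(tips)
    queue = deque(tips)
    while queue:
        obj = queue.popleft()
        for child in graph.get(obj, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def build_bitmap(graph: Graph, tip: int) -> int:
    """Done once, ahead of time: encode the objects reachable from tip as set bits."""
    bits = 0
    for obj in reachable_by_walk(graph, [tip]):
        bits |= 1 << obj
    return bits


def prune_candidates(bitmaps: Dict[int, int], tips: List[int], num_objects: int) -> List[int]:
    """With bitmaps: OR the tips' bitmaps, then report every object whose bit is clear."""
    reachable = 0
    for tip in tips:
        reachable |= bitmaps[tip]
    return [obj for obj in range(num_objects) if not (reachable & (1 << obj))]
```

With `graph = {5: [0, 6], 0: [1], 1: [2], 3: [4]}` and a single ref pointing at object 5, objects 3 and 4 are the prune candidates.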
Here are timings on git.git:
```
Test                         HEAD^             HEAD
------------------------------------------------------------------------
5304.6: prune with bitmaps   3.65(3.56+0.09)   1.01(0.92+0.08) -72.3%
```
And on linux.git:
```
Test                         HEAD^               HEAD
--------------------------------------------------------------------------
5304.6: prune with bitmaps   35.05(34.79+0.23)   3.00(2.78+0.21) -91.4%
```
The tests show a pretty optimal case, as we'll have just repacked and should have pretty good coverage of all refs with our bitmaps. But that's actually pretty realistic: normally prune is run via "gc" right after repacking.

Notes on the implementation: the change is actually in reachable.c, so it would improve reachability traversals by "reflog expire --stale-fix", as well.
Those aren't performed regularly, though (a normal "git gc" doesn't use --stale-fix), so they're not really worth measuring. There's a low chance of regressing that caller, since the use of bitmaps is totally transparent from the caller's perspective.
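The last column of the timing tables is the relative change in wall-clock seconds from `HEAD^` to `HEAD` (the parenthesized numbers are user+system CPU seconds). A quick Python check reproduces the reported percentages:

```python
def percent_change(before: float, after: float) -> float:
    """Relative change in percent; negative means the new version is faster."""
    return (after - before) / before * 100


print(f"git.git:   {percent_change(3.65, 1.01):.1f}%")   # git.git:   -72.3%
print(f"linux.git: {percent_change(35.05, 3.00):.1f}%")  # linux.git: -91.4%
```

Expressed as speedups, that is roughly 3.6x on git.git (3.65 / 1.01) and 11.7x on linux.git (35.05 / 3.00).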